Project Story — A Real-World SPM Agent

About the Project

This project is a locally deployed AI agent that controls a scientific instrument. I started it in Google AI Studio, but the complexity soon grew beyond what AI Studio could handle, so the rest of the coding was done in OpenCode with gemini-3-pro-preview as the model. The project itself is an AI agent, and it is intrinsically powered by Gemini 3 Flash Preview.

This project is an attempt to answer a question that kept bothering me throughout years of experimental research:

How do we automate complex software systems that are intentionally designed for human interaction and judgment?

Scientific instruments are one example — but the same problem appears in industrial control software, legacy GUIs, and safety-critical operational systems.

I built an autonomous Scanning Probe Microscopy (SPM) agent that operates directly inside real laboratory control software. It does not execute predefined scripts or follow a fixed workflow. Instead, it acts, observes what actually happened, updates its internal state, and then decides what to do next.

The goal is not speed or convenience. The goal is to make experimental judgment reproducible and constrained, without assuming the world behaves as expected.


Motivation

In real experiments, failure rarely comes from lack of knowledge. It comes from a mismatch between expectation and reality.

  • Scripts assume success
  • Workflows assume stability
  • Humans adapt when neither is true

Noise, drift, partial failure, hysteresis, and recovery are not edge cases — they are the default. Most AI automation systems implicitly assume a closed world. Experimental science violates that assumption constantly.

I wanted to test whether a general-purpose LLM, without any domain-specific fine-tuning, could operate safely in such an environment if the behavioral structure around it was designed correctly.


What I Built

At the core is a strict ReAct-style control loop:

Action --> Observe --> Update Memory --> Next Action

This loop is hard-coded at the system level. The model cannot skip steps, assume success, or hallucinate state.
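
As a rough sketch, the enforcement looks something like the following. All names here (run_agent, propose_action, execute, observe, record, goal_reached) are illustrative placeholders under my own simplifying assumptions, not the project's actual API:

    # Minimal sketch of the hard-coded ReAct loop (hypothetical names).
    # The orchestrator, not the model, enforces the cycle
    # Action -> Observe -> Update Memory -> Next Action.
    def run_agent(model, instrument, memory, max_steps=50):
        for _ in range(max_steps):
            # 1. Action: the model proposes exactly one action given memory.
            action = model.propose_action(memory.summary())

            # 2. Execute against the real UI; the outcome is not trusted yet.
            instrument.execute(action)

            # 3. Observe: read back the relevant readouts regardless of what
            #    the model believes happened. Success must be seen, not assumed.
            observation = instrument.observe(action)

            # 4. Update memory: only observed facts enter the agent's state.
            memory.record(action, observation)

            # The loop, not the model, decides whether to continue.
            if memory.goal_reached():
                break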

Key design choices

  • Hard-coded orchestration: The agent is forced to observe after every action. Success must be seen, not assumed.

  • ROI-based observation: Instead of full screenshots, the agent observes small, well-defined regions of interest (ROIs) on the control panel. Each ROI corresponds to a single, unambiguous instrument readout (bias, current, scan state, etc.).

  • Structured memory as ground truth: The agent’s state is defined by what was observed and stored, not by what it says.

  • Memory compression for long sessions: Long experimental sessions are compressed into stable, low-token summaries without losing causal structure, enabling long-horizon reasoning without context collapse. A minimal sketch of this observation-and-memory layout follows the list.

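To make the ROI and memory ideas concrete, here is a minimal sketch of what an ROI-grounded observation and a structured, compressible memory could look like. The data model below (ROI, Observation, StructuredMemory, compress) is my own illustrative assumption, not the project's actual schema:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ROI:
        # One small screen region tied to a single unambiguous readout,
        # e.g. the bias field on the control panel.
        name: str
        x: int
        y: int
        w: int
        h: int

    @dataclass(frozen=True)
    class Observation:
        roi: str          # which readout was observed
        value: str        # parsed readout, e.g. "1.20 V" or "scanning"
        timestamp: float

    class StructuredMemory:
        """Agent state is exactly what was observed, never what was claimed."""

        def __init__(self):
            self.entries = []  # (action, Observation) pairs in causal order

        def record(self, action, observation):
            self.entries.append((action, observation))

        def compress(self, keep_last=10):
            # Older steps collapse into a stable, low-token summary while the
            # causal chain (what was tried, what was seen) is preserved.
            old, recent = self.entries[:-keep_last], self.entries[-keep_last:]
            summary = "; ".join(f"{a} -> {o.value}" for a, o in old)
            return summary, recent

In this framing, the model never writes to memory directly: only parsed observations do, which is what rules out hallucinated state.
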
While this demo targets SPM control software, the architecture is domain-agnostic. Any system where actions must be verified through observation, rather than assumed to have succeeded, fits this model.


What This Demo Shows

In the live demo, the agent operates the actual Nanonis SPM control software used in experimental labs. For safety, the software is connected to a sandbox simulator rather than a real STM system.

The agent:

  • Issues real UI actions
  • Observes real control-panel feedback
  • Adapts when outcomes differ from expectations
  • Maintains coherent state over time

No scripted paths. No hidden resets. No idealized assumptions.


What I Learned

The most important lesson was simple:

You don’t need a smarter model to get smarter behavior.

Once the agent is forced to ground every decision in observation:

  • Hallucinated success disappears
  • Rigid plans give way to local, recoverable reasoning
  • Partial failure becomes manageable rather than fatal

Experimental intelligence is not about choosing the “best” action. It is about maintaining coherence over time in an unstable world.


Challenges

  • Latency vs. reliability: ReAct-style loops are slower than blind workflows, but dramatically more robust.

  • Memory design: Free-form conversation memory quickly becomes unusable. Structured, compressible memory was essential.

  • Safety constraints: In physical systems, trust must come from orchestration, not from trusting the model. A sketch of such an orchestration-level guard follows this list.
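
To illustrate what "trust from orchestration" can mean, here is a minimal sketch of a guard layer that sits between the model and the instrument. The action names and limits are invented for this example, not taken from the actual system:

    # Hypothetical orchestration-level safety guard: hard limits live outside
    # the model, so no generated action can exceed them.
    ALLOWED_ACTIONS = {"set_bias", "set_setpoint", "start_scan", "stop_scan"}
    LIMITS = {
        "set_bias": (-2.0, 2.0),      # volts (example values only)
        "set_setpoint": (0.0, 1e-9),  # amps (example values only)
    }

    def guard(action, value=None):
        """Raise before anything reaches the instrument if the action is unsafe."""
        if action not in ALLOWED_ACTIONS:
            raise PermissionError(f"action {action!r} is not whitelisted")
        if action in LIMITS and value is not None:
            lo, hi = LIMITS[action]
            if not lo <= value <= hi:
                raise ValueError(f"{action} value {value} outside [{lo}, {hi}]")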


Why This Matters

Real AI-for-science — and AI-for-industry — progress will not come from larger models alone. It will come from systems that respect how physical environments and legacy software actually behave.

Automation that works only when nothing goes wrong is not automation. This project is a small step toward agents that can survive contact with reality.
