NanoRange: An Agentic Framework for Microscopy Image Analysis
Inspiration
Microscopy image analysis is a critical step in scientific research, from biology and materials science to nanotechnology. But anyone who has worked with microscopy images knows the pain: there are dozens of techniques and ML models for enhancing and segmenting images, each with a set of parameters that need careful tuning. Worse, getting good results often requires chaining multiple tools together into complex pipelines (denoising, contrast enhancement, segmentation, morphological analysis) and tweaking each step until the output looks right.
The field is growing fast. The microscopy image analysis software market is projected to expand from USD 2.41 billion in 2024 to USD 5.89 billion by 2031, driven largely by the integration of AI and deep learning into imaging workflows. Yet most researchers still spend more time setting up and configuring tools than doing their actual analysis. We wanted to change that.
The idea behind NanoRange was simple: what if an AI agent could do all of this for you? What if you could just hand it an image, describe what you need, and let it figure out the rest: which tools to use, what parameters to set, and how to chain them together?
What It Does
NanoRange is an agentic framework that automates the entire microscopy image analysis workflow. The user provides an image along with some instructions, and the system takes over:
- Planning Phase: A planner agent reviews the image, selects the appropriate tools, builds a processing pipeline, verifies it, and proposes it to the user for feedback.
- Iterative Execution Loop: Once the user confirms the pipeline, an executor agent runs each tool, a critic agent reviews the output, and a parameter optimizer tunes the settings, rerunning as needed (up to T=3 rounds) until the desired result is achieved.
- Delivery: The system delivers the final processed images along with a detailed report.
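The iterative execution loop can be sketched as a simple control flow. This is a hedged, minimal illustration, not the actual NanoRange implementation: the `execute`, `critique`, and `optimize` callables stand in for the executor, critic, and parameter-optimizer agents, and the pipeline format is hypothetical.

```python
MAX_ROUNDS = 3  # corresponds to the T=3 retry budget described above

def run_pipeline(image, pipeline, execute, critique, optimize):
    """Run each pipeline step, letting a critic agent reject the output
    and a parameter optimizer retune settings for up to MAX_ROUNDS tries."""
    for step in pipeline:
        params = dict(step["params"])
        for _ in range(MAX_ROUNDS):
            output = execute(step["tool"], image, params)
            accepted, feedback = critique(step["tool"], output)
            if accepted:
                break
            # Critic rejected the result: ask the optimizer for new parameters.
            params = optimize(step["tool"], params, feedback)
        image = output  # feed this step's output into the next tool
    return image
```

The key design choice mirrored here is that the critic gates every tool invocation, so a bad parameter choice is caught and retuned locally instead of propagating down the pipeline.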
What makes NanoRange particularly powerful is that VLMs, including Gemini 3 Pro Image Preview, are included as tools in the toolbox. This means the agent can not only run traditional image processing and ML algorithms but also leverage vision-language model capabilities to enhance, edit, and reason about images directly.
How We Built It
NanoRange is built using Google ADK (Agent Development Kit) and powered by Gemini 3.0. The architecture consists of:
- Multi-agent system: Separate agents for planning, execution, critique, and parameter optimization, each with specialized roles and instructions.
- Tool integration: We equipped Gemini with a comprehensive set of image processing tools including VLMs (Gemini 3 Pro Image Preview), preprocessing tools, segmentation models (Cellpose, watershed), morphological operations, measurement tools, and more, along with detailed descriptions of how each tool works and how its parameters affect the output.
- VLMs as tools: Gemini 3 Pro Image Preview is registered as one of the available tools, so the agent can call it to enhance, edit, and reason about images the same way it calls any other tool.
- Extensible toolbox: The framework is designed so that adding a new tool is as simple as writing a function and describing it to the agent; no changes to the core architecture are needed.
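The tool-registration pattern described above can be sketched as follows. This is an illustrative registry, not NanoRange's actual code: the `register_tool` decorator, the `TOOLBOX` dict, and the `gaussian_denoise` example tool are all hypothetical names, shown only to convey the idea that a plain function plus a natural-language docstring is all a new tool needs.

```python
import inspect

# name -> (callable, natural-language description shown to the planner agent)
TOOLBOX = {}

def register_tool(fn):
    """Register a function as a tool; its docstring becomes the
    description the agent reasons over when planning a pipeline."""
    TOOLBOX[fn.__name__] = (fn, inspect.getdoc(fn))
    return fn

@register_tool
def gaussian_denoise(image, sigma: float = 1.0):
    """Smooth an image with a Gaussian kernel.

    sigma: kernel width; larger values remove more noise but blur edges.
    """
    # A real implementation would wrap e.g. scipy.ndimage.gaussian_filter.
    return image
```

Because the description travels with the function, the planner can select and parameterize tools it was never explicitly programmed to use.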
Challenges We Faced
- Parameter sensitivity: Many microscopy tools are highly sensitive to parameter choices. Getting the critic agent to reliably evaluate output quality and guide the optimizer toward better parameters required significant prompt engineering and iteration.
- Pipeline verification: Ensuring the planner agent builds valid pipelines, where the output of one tool is compatible with the input of the next, was a non-trivial challenge, especially with diverse tool interfaces.
- Balancing autonomy and control: We wanted the system to be autonomous enough to be useful, but still give researchers control over the pipeline before execution. Finding the right balance between automation and user oversight was an ongoing design consideration.
- Tool diversity: Each ML model and image processing technique has its own API, input format, and output format. Wrapping them all into a consistent interface that the agent can reason about required careful abstraction.
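One way to tackle the pipeline-verification challenge above is to have every tool declare the data type it consumes and produces, then check adjacent steps for compatibility before execution. The sketch below is a hypothetical illustration, with made-up tool names and type labels, of that check:

```python
# Hypothetical I/O contracts: (input_type, output_type) per tool.
TOOL_IO = {
    "gaussian_denoise": ("grayscale", "grayscale"),
    "cellpose_segment": ("grayscale", "label_mask"),
    "measure_regions":  ("label_mask", "table"),
}

def verify_pipeline(pipeline):
    """Check that each tool's output type matches the next tool's input type."""
    for current, nxt in zip(pipeline, pipeline[1:]):
        produced = TOOL_IO[current][1]
        expected = TOOL_IO[nxt][0]
        if produced != expected:
            raise ValueError(
                f"{current} produces {produced!r} but {nxt} expects {expected!r}"
            )
    return True
```

A check like this lets the planner reject invalid orderings (e.g. measuring regions before any segmentation) before the executor spends time running tools.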
What We Learned
- The multi-agent architecture pattern (planner → executor → critic → optimizer) is remarkably effective for complex, multi-step tasks where quality matters.
- Describing tools and their parameters in natural language to an LLM is a powerful way to build flexible, extensible systems: the agent can reason about tools it has never explicitly been programmed to use.
- Gemini 3.0's multimodal capabilities make it uniquely suited for this kind of task, as it can both reason about images and generate/edit them.
What's Next
- Batch processing: Scaling the framework to process hundreds of images in parallel, turning hours of manual work into minutes.
- Dataset generation: Running pipelines at scale to generate domain-specific annotated datasets for training new ML models for specialized microscopy use cases.
- Community tools: Opening up the toolbox so the microscopy community can contribute their own tools and share pipelines.
NanoRange — so researchers can focus on discovery, not configuration.