Inspiration
The idea for imagEdit came from a simple realization: powerful image editing tools are often too complex for quick, everyday tasks. I wanted to see if I could use multimodal reasoning to bridge that gap. Instead of forcing users to navigate through hidden menus and sliders, I wanted to allow them to simply describe their intent and let the AI handle the technical execution.
What it does
imagEdit is an autonomous image orchestration agent that turns plain-English instructions into precise technical edits. Users can upload an image and give natural language commands like "remove the background and make it a 500x500 sticker." The app doesn't just "chat" about the image; it analyzes the request, plans the necessary steps, and executes the actual image manipulation tools to deliver a downloadable result.
How we built it
I designed the application using a "Brain and Tools" architecture:
- The Brain: Gemini 3 Flash acts as the central orchestrator, analyzing both the user's prompt and the image context simultaneously.
- The Blueprint: To bridge the gap between AI thought and code, I used Structured Outputs. The model generates a precise JSON blueprint that specifies which local tools to trigger.
- The Tools: I developed a suite of local Python modules using `rembg` for AI-powered segmentation and `PIL` (Pillow) for various image processing tasks.
- The UI: The frontend is built with Streamlit, using session state management to ensure that the AI's reasoning and the user's edited assets persist throughout the session.
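The "Blueprint" step can be sketched roughly as follows. This is a minimal illustration, not imagEdit's actual schema: the tool names, the JSON shape, and the stand-in tool functions (which would really wrap `rembg` and Pillow) are all hypothetical.

```python
import json

# Hypothetical structured-output "blueprint" the model is asked to return.
# Tool names and argument shapes are illustrative only.
BLUEPRINT = """
{
  "steps": [
    {"tool": "remove_background", "args": {}},
    {"tool": "resize", "args": {"width": 500, "height": 500}}
  ]
}
"""

# Local tool registry: maps a tool name from the blueprint to a callable.
# Real implementations would call rembg / Pillow; these stand-ins just
# transform a small dict so the dispatch logic is visible.
def remove_background(image, **_):
    return {**image, "background": None}

def resize(image, width, height):
    return {**image, "size": (width, height)}

TOOLS = {"remove_background": remove_background, "resize": resize}

def execute_blueprint(blueprint_json, image):
    """Run each planned step in order, threading the image through."""
    plan = json.loads(blueprint_json)
    for step in plan["steps"]:
        image = TOOLS[step["tool"]](image, **step["args"])
    return image

result = execute_blueprint(BLUEPRINT, {"background": "white", "size": (1024, 768)})
```

The key design choice is that the model never touches pixels: it only emits a plan, and deterministic local code executes it.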
Challenges we ran into
Building with cutting-edge multimodal models presented several real-world engineering hurdles:
- API Resilience: To handle `429 (Rate Limit)` and `503 (Server Overload)` errors during peak traffic, I implemented an exponential backoff strategy using the `tenacity` library.
- Schema Handling: The model occasionally returned data in inconsistent formats, such as lists rather than dictionaries. I built a defensive parsing layer to sanitize and normalize these responses, preventing application crashes.
- State Management: Ensuring that the image data remained accessible across various Streamlit reruns required a careful implementation of stateful logic.
- Deployment Sync: I worked through GitHub-to-Streamlit deployment hurdles to ensure a stable hosting environment.
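The first two challenges above can be sketched in plain Python. In the app, `tenacity`'s `retry` decorator (with `wait_exponential` and `stop_after_attempt`) handles the backoff declaratively; this hand-rolled version just shows the underlying idea. The `ServerBusy` exception, the defaults, and the normalization rules are illustrative assumptions, not the project's exact code.

```python
import json
import random
import time

class ServerBusy(Exception):
    """Stand-in for a 429/503 response from the API."""

def call_with_backoff(fn, max_attempts=5, base=1.0, cap=30.0):
    # Exponential backoff with jitter: sleep up to base * 2**attempt
    # (capped), retrying only on transient "server busy" errors.
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise
            time.sleep(min(cap, base * 2 ** attempt) * random.random())

def normalize_blueprint(raw):
    """Defensive parsing: accept a dict, a list wrapping a dict, or a
    JSON string, and always return a dict with a 'steps' list."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    if isinstance(raw, list):  # model wrapped the object in a list
        raw = raw[0] if raw else {}
    if not isinstance(raw, dict):
        raw = {}
    raw.setdefault("steps", [])
    return raw
```

The point of the normalization layer is that downstream code can rely on one canonical shape, no matter which of the model's occasional format variations came back.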
Accomplishments that we're proud of
I am particularly proud of creating a tool that feels truly "agentic." It's rewarding to see the system correctly interpret an ambiguous command and orchestrate multiple technical tools to achieve the result. Successfully implementing production-grade retry logic that keeps the app feeling stable, even when the underlying API is under heavy load, was a significant personal milestone.
What we learned
This project taught me the importance of defensive programming when working with LLMs. I learned that you cannot always assume the model will follow a schema perfectly, and building "safety nets" in the code is essential for a smooth user experience. I also gained a deeper understanding of multimodal prompt engineering and how to effectively structure AI outputs for tool use.
What's next for imagEdit
imagEdit is currently a powerful prototype, but I plan to expand its "toolbelt." Future updates will include generative fill for object replacement, automated batch processing for entire folders of images, and more advanced filters. My ultimate goal is to turn it into a full-scale creative assistant that can handle complex design workflows through a single conversation.
Built With
- dotenv
- gemini-3-flash-api
- pillow
- python
- rembg
- streamlit
- tenacity