Inspiration
The idea for imagEdit came from a simple realization: powerful image editing tools are often too complex for quick, everyday tasks. I wanted to see if I could use multimodal reasoning to bridge that gap. Instead of forcing users to navigate through hidden menus and sliders, I wanted to allow them to simply describe their intent and let the AI handle the technical execution.
What it does
imagEdit is an autonomous image orchestration agent that turns plain-English instructions into precise technical edits. Users can upload an image and give natural language commands like "remove the background and make it a 500x500 sticker." The app doesn't just "chat" about the image; it analyzes the request, plans the necessary steps, and executes the actual image manipulation tools to deliver a downloadable result.
How we built it
I designed the application using a "Brain and Tools" architecture:
- The Brain: Gemini 3 Flash acts as the central orchestrator, analyzing both the user's prompt and the image context simultaneously.
- The Blueprint: To bridge the gap between AI thought and code, I used Structured Outputs. The model generates a precise JSON blueprint that specifies which local tools to trigger.
- The Tools: I developed a suite of local Python modules using `rembg` for AI-powered segmentation and `PIL` (Pillow) for various image processing tasks.
- The UI: The frontend is built with Streamlit, using session state management to ensure that the AI's reasoning and the user's edited assets persist throughout the session.
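The "Blueprint" step can be sketched roughly as follows. This is a minimal illustration, not imagEdit's actual schema: the tool names, the JSON shape, and the stand-in tool functions (which would really wrap `rembg` and Pillow) are all hypothetical.

```python
import json

# Hypothetical structured-output "blueprint" the model is asked to return.
# Tool names and argument shapes are illustrative only.
BLUEPRINT = """
{
  "steps": [
    {"tool": "remove_background", "args": {}},
    {"tool": "resize", "args": {"width": 500, "height": 500}}
  ]
}
"""

# Local tool registry: maps a tool name from the blueprint to a callable.
# Real implementations would call rembg / Pillow; these stand-ins just
# transform a small dict so the dispatch logic is visible.
def remove_background(image, **_):
    return {**image, "background": None}

def resize(image, width, height):
    return {**image, "size": (width, height)}

TOOLS = {"remove_background": remove_background, "resize": resize}

def execute_blueprint(blueprint_json, image):
    """Run each planned step in order, threading the image through."""
    plan = json.loads(blueprint_json)
    for step in plan["steps"]:
        image = TOOLS[step["tool"]](image, **step["args"])
    return image

result = execute_blueprint(BLUEPRINT, {"background": "white", "size": (1024, 768)})
```

The key design choice is that the model never touches pixels: it only emits a plan, and deterministic local code executes it.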
Challenges we ran into
Building with cutting-edge multimodal models presented several real-world engineering hurdles:
- API Resilience: To handle `429 (Rate Limit)` and `503 (Server Overload)` errors during peak traffic, I implemented an exponential backoff strategy using the `tenacity` library.
- Schema Handling: The model occasionally returned data in inconsistent formats, such as lists rather than dictionaries. I built a defensive parsing layer to sanitize and normalize these responses, preventing application crashes.
- State Management: Ensuring that the image data remained accessible across various Streamlit reruns required a careful implementation of stateful logic.
- Deployment Sync: I worked through GitHub-to-Streamlit deployment hurdles to ensure a stable hosting environment.
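The first two challenges above can be sketched in plain Python. In the app, `tenacity`'s `retry` decorator (with `wait_exponential` and `stop_after_attempt`) handles the backoff declaratively; this hand-rolled version just shows the underlying idea. The `ServerBusy` exception, the defaults, and the normalization rules are illustrative assumptions, not the project's exact code.

```python
import json
import random
import time

class ServerBusy(Exception):
    """Stand-in for a 429/503 response from the API."""

def call_with_backoff(fn, max_attempts=5, base=1.0, cap=30.0):
    # Exponential backoff with jitter: sleep up to base * 2**attempt
    # (capped), retrying only on transient "server busy" errors.
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise
            time.sleep(min(cap, base * 2 ** attempt) * random.random())

def normalize_blueprint(raw):
    """Defensive parsing: accept a dict, a list wrapping a dict, or a
    JSON string, and always return a dict with a 'steps' list."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    if isinstance(raw, list):  # model wrapped the object in a list
        raw = raw[0] if raw else {}
    if not isinstance(raw, dict):
        raw = {}
    raw.setdefault("steps", [])
    return raw
```

The point of the normalization layer is that downstream code can rely on one canonical shape, no matter which of the model's occasional format variations came back.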
Accomplishments that we're proud of
I am particularly proud of creating a tool that feels truly "agentic." It's rewarding to see the system correctly interpret an ambiguous command and orchestrate multiple technical tools to achieve the result. Successfully implementing production-grade retry logic that keeps the app feeling stable, even when the underlying API is under heavy load, was a significant personal milestone.
What we learned
This project taught me the importance of defensive programming when working with LLMs. I learned that you cannot always assume the model will follow a schema perfectly, and building "safety nets" in the code is essential for a smooth user experience. I also gained a deeper understanding of multimodal prompt engineering and how to effectively structure AI outputs for tool use.
What's next for imagEdit
imagEdit is currently a powerful prototype, but I plan to expand its "toolbelt." Future updates will include generative fill for object replacement, automated batch processing for entire folders of images, and more advanced filters. My ultimate goal is to turn it into a full-scale creative assistant that can handle complex design workflows through a single conversation.
Built With
- dotenv
- gemini-3-flash-api
- pillow
- python
- rembg
- streamlit
- tenacity