Inspiration

Eidos was born from a frustration with the "black box" nature of modern AI image generation. While tools like Midjourney and DALL-E 3 are powerful, they often lack the precision needed for professional workflows. We wanted to build a tool where the AI acts not just as a generator but as a collaborator—an agent that understands intent and provides tactile, semantic controls. We envisioned a world where you could manipulate an image as easily as a 3D scene, with lighting, composition, and style at your fingertips.

What it does

Eidos is a multi-agent image editing platform that bridges the gap between natural language prompting and professional-grade editing.

  • Agentic Collaboration: A backend powered by Gemini 3 orchestrates multiple specialized agents to analyze images, enhance prompts, and execute complex edits.
  • 3D Spatial Control: A real-time 3D "Composer" allows users to set up lighting, camera angles, and object positions, which are then translated directly into semantic prompt parameters.
  • Dynamic Semantic UIs: Instead of generic text boxes, Eidos generates custom sliders, dropdowns, and color pickers tailored to the specific generation model and intent (see the control-spec sketch after this list).
  • Pro-grade Workflows: Multi-model support (fal.ai, Replicate), image-to-image refinement, and a seamless desktop experience built with Electron.
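
To make the dynamic-UI idea concrete, here is a minimal sketch of what an agent-generated control spec could look like. The ControlSpec shape, field names, and toPromptFragment helper are illustrative assumptions, not Eidos's actual schema:

```typescript
// Hypothetical shape of an agent-generated control spec
// (names are illustrative, not Eidos's real schema).
type ControlSpec =
  | { kind: "slider"; id: string; label: string; min: number; max: number; step: number; value: number }
  | { kind: "dropdown"; id: string; label: string; options: string[]; value: string }
  | { kind: "color"; id: string; label: string; value: string }; // hex color

// Example payload the backend might emit after analyzing an image:
const controls: ControlSpec[] = [
  { kind: "slider", id: "rim_light", label: "Rim light intensity", min: 0, max: 1, step: 0.05, value: 0.4 },
  { kind: "dropdown", id: "style", label: "Film stock", options: ["Portra 400", "Ektachrome", "CineStill 800T"], value: "Portra 400" },
  { kind: "color", id: "key_tint", label: "Key light tint", value: "#ffd9a0" },
];

// Each widget edit is folded back into the prompt parameters.
function toPromptFragment(c: ControlSpec): string {
  switch (c.kind) {
    case "slider":   return `${c.label.toLowerCase()}: ${c.value.toFixed(2)}`;
    case "dropdown": return `${c.label.toLowerCase()}: ${c.value}`;
    case "color":    return `${c.label.toLowerCase()}: ${c.value}`;
  }
}
```

Because the renderer only switches on `kind`, the backend can invent new controls for a given model and intent without any frontend changes.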

How we built it

We leveraged a modern, high-performance stack to ensure a fluid creative experience:

  • Frontend: A React-based renderer using Tailwind CSS. We used Three.js (React Three Fiber) for the 3D controls and Framer Motion for smooth, interactive transitions.
  • Backend: A robust FastAPI service managing the agentic lifecycle. We integrated pydantic-ai and Agno for agent orchestration, using Gemini 3 as the primary "brain" and Nano Banana Pro for image generation.

Challenges we ran into

  • 3D-to-Text Mapping: Translating 3D spatial properties (like directional lighting and tilt) into text that AI models actually respect took nontrivial vector math and careful prompt engineering (a sketch of the idea follows this list).
  • WebGPU & Threading: Implementing high-performance client-side features required navigating the complexities of WebGPU and Emscripten, specifically offloading heavy tasks to pthreads to keep the UI responsive.
  • State Synchronicity: Orchestrating state between a real-time 3D canvas, a dynamic chat interface, and a remote agent backend demanded a carefully structured Zustand store and resilient WebSocket handling (a minimal store sketch also follows this list).
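
For the 3D-to-text challenge, here is a simplified sketch of the kind of mapping involved: quantizing a light's direction vector into the nearest named direction a text-to-image model reliably understands. The thresholds, vocabulary, and describeLight helper are our own illustrative assumptions, not the production mapping:

```typescript
type Vec3 = { x: number; y: number; z: number };

function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v.x, v.y, v.z) || 1;
  return { x: v.x / len, y: v.y / len, z: v.z / len };
}

function describeLight(dir: Vec3): string {
  const d = normalize(dir);
  // Horizontal angle picks left/right/front/back; elevation picks high/low.
  const azimuth = Math.atan2(d.x, d.z); // radians, 0 = from the front
  const elevation = Math.asin(d.y);     // radians, positive = from above
  const side =
    Math.abs(azimuth) < Math.PI / 8 ? "front" :
    Math.abs(azimuth) > (7 * Math.PI) / 8 ? "behind (backlit)" :
    azimuth > 0 ? "camera left" : "camera right";
  const height =
    elevation > Math.PI / 6 ? "high" :
    elevation < -Math.PI / 6 ? "low" : "eye-level";
  return `soft key light from ${height} ${side}`;
}

// e.g. describeLight({ x: -0.5, y: 0.8, z: 0.3 })
//   -> "soft key light from high camera right"
```

The real mapping also covers camera angle and subject position, but the pattern is the same: continuous 3D state in, a small controlled vocabulary of prompt fragments out.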
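
And for the state-synchronicity challenge, a minimal Zustand sketch of the pattern we mean: one store as the single source of truth, fed by one WebSocket. The state shape, event types, and endpoint below are hypothetical, not Eidos's actual store:

```typescript
import { create } from "zustand";

// Illustrative state tying the 3D canvas, chat, and agent backend together.
interface EidosState {
  lightDirection: [number, number, number];
  messages: { role: "user" | "agent"; text: string }[];
  generating: boolean;
  setLightDirection: (d: [number, number, number]) => void;
  applyAgentEvent: (ev: { type: string; payload: unknown }) => void;
}

export const useEidosStore = create<EidosState>()((set) => ({
  lightDirection: [0, 1, 0],
  messages: [],
  generating: false,
  setLightDirection: (d) => set({ lightDirection: d }),
  // A single entry point for server events keeps canvas, chat, and
  // agent status from drifting out of sync.
  applyAgentEvent: (ev) =>
    set((s) => {
      switch (ev.type) {
        case "message":
          return { messages: [...s.messages, { role: "agent" as const, text: String(ev.payload) }] };
        case "generation:start":
          return { generating: true };
        case "generation:done":
          return { generating: false };
        default:
          return s;
      }
    }),
}));

// One WebSocket feeds the store; the 3D canvas and chat UI subscribe
// to slices of it rather than talking to the socket directly.
const ws = new WebSocket("ws://localhost:8000/agent"); // hypothetical endpoint
ws.onmessage = (e) => useEidosStore.getState().applyAgentEvent(JSON.parse(e.data));
```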

Accomplishments that we're proud of

We've created an experience where "AI" doesn't just mean "typing a prompt." Being able to rotate a light source in 3D and watch the image update is a game-changer. Our system doesn't just generate; it reasons. It can look at an image, identify its flaws, and suggest specific semantic controls to fix them. We built a design language that feels premium and focused, minimizing distraction and maximizing the creative "flow."

What we learned

We learned that users don't just want "better" images; they want their images. Precision is the most valuable feature in generative AI. Designing interfaces for agents requires a shift from static forms to dynamic, progressive disclosure: you have to design for what the agent might need next. And in generative AI especially, where model latency is high, a snappy UI with immediate feedback (like the 3D canvas) is crucial for maintaining the illusion of direct manipulation.

What's next for Eidos

  • Multi-Object Compositions: Expanding the 3D composer to handle multiple subjects with individual controls.
  • Local-First Capabilities: Moving more model execution to the edge using WebGPU for near-instant latency.
  • Fine-Tuning Integration: Allowing users to train or bring their own LoRAs directly into the agentic workflow.
  • Collaborative Editing: Real-time multi-user "whiteboarding" with AI agents.
