Inspiration

We were tired of "Chatbot Fatigue." LLMs are powerful, but interacting with them through a linear text stream feels limiting. Real-world problem solving isn't linear; it involves planning, branching paths, delegating sub-tasks, and visualizing results. We wanted to build a system that feels less like a text editor and more like a Magic Lamp: you state a high-level intent, and an intelligent "General Contractor" figures out how to do it, spawns the necessary workforce, and delivers a rich, multimodal result. With the release of Gemini 3.0, we finally had the reasoning capabilities and speed to make this "Agentic Orchestration" happen in real time.

What it does

WishForge is a Generative UI Agentic Workflow engine. Instead of just answering a question, it does three things:

Orchestrates: The "Genie" (Gemini 3.0 Pro) analyzes your abstract wish (e.g., "Plan a Japan trip" or "Design a startup"), detects the underlying intent, and creates a strategic execution plan.

Spawns Agents: It dynamically creates specialized AI agents (e.g., "Logistics Expert", "UX Designer", "Risk Analyst") to execute tasks in parallel or in sequence.

Synthesizes Reality: Instead of returning only text, it uses Generative UI. The system decides how to present the answer and constructs interactive Widgets: high-fidelity mobile app mockups, data charts, locations pinned on Google Maps, diagrams generated with Mermaid.js, and even audio narrations.
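To make the Generative UI idea concrete, here is a minimal sketch of how model-chosen widgets could be represented and validated on the client. The `Widget` shapes, field names, and the `parseWidget` helper are assumptions made for illustration; the write-up does not show WishForge's actual widget schema.

```typescript
// Hypothetical widget shapes for illustration; the project's real schema is not shown.
type Widget =
  | { kind: "chart"; title: string; points: { label: string; value: number }[] }
  | { kind: "map"; title: string; pins: { name: string; lat: number; lng: number }[] }
  | { kind: "diagram"; title: string; mermaid: string }
  | { kind: "text"; title: string; body: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

const isNum = (v: unknown): v is number => typeof v === "number" && Number.isFinite(v);

// Turn untrusted model output into a Widget. Anything unrecognized or
// malformed degrades to a plain text widget instead of throwing.
function parseWidget(raw: unknown): Widget {
  const fallback = (): Widget => ({
    kind: "text",
    title: "Result",
    body: typeof raw === "string" ? raw : JSON.stringify(raw ?? null),
  });
  if (!isRecord(raw)) return fallback();
  const { kind, title, points, pins, mermaid, body } = raw;
  if (typeof title !== "string") return fallback();

  switch (kind) {
    case "chart": {
      if (!Array.isArray(points)) return fallback();
      // Keep only well-formed data points; drop the rest.
      const valid = points.filter(
        (p): p is { label: string; value: number } =>
          isRecord(p) && typeof p.label === "string" && isNum(p.value),
      );
      return valid.length > 0 ? { kind: "chart", title, points: valid } : fallback();
    }
    case "map": {
      if (!Array.isArray(pins)) return fallback();
      const valid = pins.filter(
        (p): p is { name: string; lat: number; lng: number } =>
          isRecord(p) && typeof p.name === "string" && isNum(p.lat) && isNum(p.lng),
      );
      return valid.length > 0 ? { kind: "map", title, pins: valid } : fallback();
    }
    case "diagram":
      return typeof mermaid === "string" ? { kind: "diagram", title, mermaid } : fallback();
    case "text":
      return typeof body === "string" ? { kind: "text", title, body } : fallback();
    default:
      return fallback(); // unknown widget kind
  }
}
```

A renderer can then switch on `widget.kind` and hand each case to the matching component. Because every failure path ends in a text widget, an unexpected payload shows up as readable output rather than a crash.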

How we built it

We built WishForge as a client-side React application powered entirely by the Google GenAI SDK:

The Brain: We use gemini-3-pro-preview with a strict JSON Schema (responseSchema) for the Orchestrator, so that complex dependency logic and conflict detection come back in a predictable structure.

The Muscle: We use gemini-3-flash-preview for the agents to keep parallel execution fast.

The Creatives: We integrated gemini-2.5-flash-image for generating UI mockups and concept art, and gemini-2.5-flash-preview-tts for voiceovers.

The UI: We used Tailwind CSS for a cinematic glassmorphism aesthetic and Mermaid.js for rendering the live thought-process diagrams.

Grounding: We implemented Google Search and Maps grounding so that the agents fetch real-world data instead of hallucinating it.
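The split between one Orchestrator and many fast agents implies some scheduler that respects task dependencies. The sketch below shows one way that could work: group the plan into "waves" whose dependencies are already satisfied, then run each wave in parallel. The `PlanTask` shape, `planWaves`, and `runPlan` are hypothetical names for illustration, and `runAgent` stands in for whatever model call the app actually makes.

```typescript
// Hypothetical plan shape; assumes task ids are unique within a plan.
interface PlanTask {
  id: string;
  agent: string; // persona name, e.g. "UX Designer"
  dependsOn: string[]; // ids of tasks that must finish first
}

// Group tasks into waves: every task in a wave depends only on tasks
// from earlier waves, so the tasks inside one wave can run in parallel.
function planWaves(tasks: PlanTask[]): PlanTask[][] {
  const ids = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!ids.has(dep)) throw new Error(`Task "${t.id}" depends on unknown task "${dep}"`);
    }
  }

  const done = new Set<string>();
  let remaining = [...tasks];
  const waves: PlanTask[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    // No task is runnable but some remain: the plan contains a cycle.
    if (ready.length === 0) throw new Error("Plan contains a dependency cycle");
    waves.push(ready);
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

// Run a plan wave by wave. `runAgent` is supplied by the caller
// (in a real app it would wrap the model request for that agent).
async function runPlan<T>(
  tasks: PlanTask[],
  runAgent: (task: PlanTask) => Promise<T>,
): Promise<Map<string, T>> {
  const results = new Map<string, T>();
  for (const wave of planWaves(tasks)) {
    const pairs = await Promise.all(
      wave.map(async (task) => [task.id, await runAgent(task)] as const),
    );
    for (const [id, output] of pairs) results.set(id, output);
  }
  return results;
}
```

Rejecting unknown dependencies and cycles before any agent starts means a malformed plan fails early, rather than leaving the run stuck halfway through.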

Challenges we ran into

Orchestration Logic: Teaching the model to understand dependencies (e.g., "Don't design the UI until the Features are defined") required significant prompt engineering and schema definition.

Polymorphic Rendering: Handling the "Unknown." Since we don't know in advance which widgets the AI will choose to build, the frontend had to be flexible enough to render Maps, Charts, or Audio players dynamically without crashing.

JSON Stability: Getting Gemini to generate valid JSON for complex nested structures (like mobile mockups with multiple screens) was tricky. We solved it by enforcing strict types in the responseSchema.

Race Conditions: Managing state when five different agents return data, images, and audio simultaneously required a robust React state management strategy.
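As an illustration of the race-condition problem, one common pattern is to funnel every agent update through a single pure reducer keyed by agent id, so updates that arrive concurrently never overwrite each other. This is a sketch under assumptions, not the project's actual state code: the `AgentEvent` and `BoardState` shapes and the `boardReducer` name are invented here.

```typescript
// Hypothetical state and event shapes for illustration.
type AgentOutput = { text?: string; imageUrl?: string; audioUrl?: string };
type AgentState = AgentOutput & { status: "running" | "done" | "failed"; error?: string };
type BoardState = Readonly<Record<string, AgentState>>;

type AgentEvent =
  | { type: "started"; agentId: string }
  | { type: "output"; agentId: string; patch: AgentOutput }
  | { type: "finished"; agentId: string }
  | { type: "failed"; agentId: string; error: string };

// Pure reducer: each event touches only its own agent's slot, and every
// update is built from the latest state, so interleaved results from
// different agents merge cleanly.
function boardReducer(state: BoardState, event: AgentEvent): BoardState {
  const prev: AgentState | undefined = state[event.agentId];
  switch (event.type) {
    case "started":
      return { ...state, [event.agentId]: { status: "running" } };
    case "output":
      // Ignore output for agents that were never started.
      return prev ? { ...state, [event.agentId]: { ...prev, ...event.patch } } : state;
    case "finished":
      return prev ? { ...state, [event.agentId]: { ...prev, status: "done" } } : state;
    case "failed":
      return prev
        ? { ...state, [event.agentId]: { ...prev, status: "failed", error: event.error } }
        : state;
  }
}
```

In a React app this function could be passed to `useReducer`, with each agent's promise handler dispatching events as text, images, or audio arrive. Late events for an agent that was never started return the same state object, so they trigger no re-render.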

Accomplishments that we're proud of

The "Generative UI" Engine: We are most proud that the app designs itself based on the content. If you ask for numbers, it builds a chart. If you ask for a location, it builds a map. The UI is fluid intelligence.

Transparent Reasoning: The "Workflow Visualizer" node graph lets you watch Gemini's "brain" work in real time. Seeing it spawn nodes, execute tasks, and merge branches demystifies AI for the user.

Gemini 3.0 Native: We successfully leveraged the specific strengths of the 3.0 series (Thinking Configs and Flash speed) to create an experience that feels instantaneous despite the complexity.

What we learned

Schemas are Superpowers: The responseSchema feature in the Gemini API is the single most important tool for building agentic apps. It turns vague LLM ramblings into executable code.

Agent Personas Matter: An agent told to be a "Critical Risk Analyst" produces vastly different (and better) work than a generic AI prompt.

Multimodality is the Future: Combining Text, Audio, and Image generation in a single request flow creates a "Wow" factor that text alone simply cannot match.

What's next for WishForge

Tool Use & Action: Currently, agents "research" and "design." The next step is "doing": giving agents API keys to actually book the flight, deploy the code to GitHub, or send the email.

Collaborative Forging: A multiplayer mode where a team of humans can interact with the team of AI agents in the same workspace.

Long-Term Memory: Implementing a vector database so the Genie remembers your preferences, past projects, and business context across sessions.
