Inspiration
The "Action Era" signals a shift from AI as a chatbot to AI as an autonomous orchestrator. We were inspired by the massive gap between a raw creative spark (a messy sketch or a handheld video) and a professional-grade brand identity. Most AI tools require perfect prompts; we wanted to build an agent that understands the mess, reasons through the geometry, and autonomously plans a high-fidelity visual system without human hand-holding.
What it does
OMNI-VIBE is an Autonomous Creative Director. It takes multimodal inputs—hand-drawn sketches, raw product videos, or mood boards—and uses Gemini 3’s spatial-temporal reasoning to:
- Analyze Spatial Intent: Interpret the "vibe" and physical geometry of objects.
- Autonomous Planning: Generate a multi-step execution plan for a brand's visual identity.
- Self-Correction: Critically evaluate its own design choices (e.g., rejecting a clashing color palette and explaining why).
- Execute Action Triggers: Orchestrate technical prompts for Logo Marks, Social Canvases, and Product Mockups that are ready for high-fidelity generation (a minimal API sketch follows this list).
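To make the pipeline concrete, here is a minimal sketch of the multimodal intake step using the google-genai Python SDK. The model id, file name, and prompt wording are our illustrative assumptions, not the exact prompts used in OMNI-VIBE.

```python
# Minimal sketch of the multimodal intake step (google-genai SDK).
# Model id, file name, and prompt text are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("sketch.png", "rb") as f:  # hypothetical hand-drawn input
    sketch_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id
    contents=[
        types.Part.from_bytes(data=sketch_bytes, mime_type="image/png"),
        "Analyze the spatial intent of this sketch, then produce a "
        "multi-step plan for a brand identity: palette, logo mark, mockups.",
    ],
)
print(response.text)
```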
How we built it
We architected OMNI-VIBE directly within the Google AI Studio Build Tab, leveraging the full power of Gemini 3 Pro.
- Context Management: We utilized the 1M token context window to store entire "Brand Bibles," ensuring long-term consistency across different asset requests.
- Reasoning Engine: We implemented Thought Signatures in the System Instructions, forcing the model to work through explicit "Thinking Levels" (Analyze → Conceptualize → Verify → Execute); a system-instruction sketch follows this list.
- Multimodal Pipeline: The system processes image/video inputs to extract "Chromatic Signatures" and "Spatial Logic," which are then mapped to professional design frameworks (like Swiss Grid or Industrial Baroque aesthetics).
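As a sketch of how the Thinking Levels can be encoded, the staged protocol fits naturally into a system instruction passed through the SDK's GenerateContentConfig. The wording below is an assumption for illustration; the project's actual System Instructions live in AI Studio.

```python
# Illustrative system instruction encoding the four Thinking Levels.
# The wording is an assumption, not the project's exact prompt.
from google import genai
from google.genai import types

SYSTEM_INSTRUCTION = """You are an Autonomous Creative Director.
For every request, work through four labeled stages:
1. ANALYZE: extract the Chromatic Signature and Spatial Logic of the input.
2. CONCEPTUALIZE: map the findings to a design framework (e.g. Swiss Grid).
3. VERIFY: critique your own concept and revise any clashing choices.
4. EXECUTE: emit final technical prompts for logos, canvases, and mockups."""

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id
    contents="Brand brief: a handmade ceramics studio, earthy but modern.",
    config=types.GenerateContentConfig(system_instruction=SYSTEM_INSTRUCTION),
)
print(response.text)
```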
Challenges we ran into
The primary architectural challenge was moving beyond "Simple Prompt Wrapping." We had to engineer the Self-Correction loop, ensuring the AI didn't just agree with the user but acted as a "Director" that could reject suboptimal ideas. Achieving Spatial-Temporal consistency (making sure the logo rationale matched the 3D shape in a video) required precise instruction tuning to prevent hallucinations and maintain high-fidelity output.
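Here is a minimal sketch of the kind of generate-critique-revise loop described above, again assuming the google-genai SDK; the critique prompt, the APPROVE/REJECT convention, and the iteration cap are our illustrative assumptions:

```python
# Sketch of a self-correction loop: propose, critique, revise.
# Prompt wording, APPROVE/REJECT convention, and round cap are assumptions.
from google import genai

client = genai.Client()
MODEL = "gemini-3-pro-preview"  # assumed model id

def direct(brief: str, max_rounds: int = 3) -> str:
    concept = client.models.generate_content(
        model=MODEL, contents=f"Propose a brand concept for: {brief}"
    ).text
    for _ in range(max_rounds):
        verdict = client.models.generate_content(
            model=MODEL,
            contents=(
                "Act as a strict Creative Director. If this concept has a "
                "clashing palette or a weak rationale, reply REJECT plus "
                f"reasons; otherwise reply APPROVE.\n\n{concept}"
            ),
        ).text
        if verdict.strip().startswith("APPROVE"):
            break  # the Director accepts the concept
        concept = client.models.generate_content(
            model=MODEL,
            contents=(
                "Revise the concept to address this critique.\n\n"
                f"Concept:\n{concept}\n\nCritique:\n{verdict}"
            ),
        ).text
    return concept
```

Capping the rounds keeps the Director bounded while still letting it genuinely reject work instead of rubber-stamping the first draft.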
Accomplishments that we're proud of
We successfully built a system that demonstrates Autonomous Agency. We are particularly proud of the "Thinking Log" feature, where the AI exposes its internal reasoning and self-correction process. Seeing the agent transform a 10-second shaky video of a mundane object into a "Bespoke Obsidian and Gold" luxury brand concept was a "Wow Factor" moment for us.
What we learned
Building OMNI-VIBE taught us that the future of AI is not in generating more content, but in orchestrating quality. We learned how Gemini 3’s massive context window fundamentally changes the RAG (Retrieval-Augmented Generation) paradigm—allowing us to keep the entire design history "live" in the agent's memory for perfect brand continuity.
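As a sketch of that long-context pattern (assuming the SDK's chat interface, with the file name and prompts as illustrative placeholders): rather than retrieving guideline chunks, the full Brand Bible is loaded once and the entire design history stays live in the conversation.

```python
# Long-context pattern: keep the entire Brand Bible in the conversation
# instead of retrieving chunks. File name and prompts are assumptions.
from google import genai

client = genai.Client()
brand_bible = open("brand_bible.md", encoding="utf-8").read()  # fits in 1M tokens

chat = client.chats.create(model="gemini-3-pro-preview")  # assumed model id
chat.send_message(
    "Here is the complete Brand Bible. Follow it for every asset request "
    f"in this session:\n\n{brand_bible}"
)
print(chat.send_message("Draft a launch-banner prompt for our new mug line.").text)
```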
What's next for OMNI-VIBE: Autonomous High-Fidelity Brand Director
Our roadmap includes:
- API Integration: Directly connecting OMNI-VIBE to Nano Banana Pro and Imagen for one-click final asset generation.
- The Marathon Agent: Expanding the agent's capability to manage 30-day social media campaigns autonomously, including real-time "Vibe-Shifts" based on market trends.
- Browser-Based Verification: Using Google Antigravity to let the agent test and verify web-based design artifacts in a live environment.