Inspiration

The pre-visualization process is a crucial step in content creation and filmmaking. For high-end productions, building a cinematic workflow can be both time-intensive and costly.

Creative professionals and content creators increasingly use AI tools to accelerate their workflows, but many of these tools remain scattered across multiple platforms. This fragmentation forces users to manually assemble creative assets, scenes, and edits, slowing production and disrupting a smooth, unified creative pipeline from concept to final output.

The result?

Increased cognitive load and creative fatigue, as creators spend more time managing tools than focusing on their vision. Meanwhile, the demand for personalized content continues to rise and the creator economy is booming, which makes streamlined, efficient workflows more important than ever.

What it does

🗣️🎙️🎬🤖🚀

WONDER is an AI-powered creative tool designed to accelerate creative workflows. It bridges the gap between an initial idea and a fully realized cinematic sequence, acting as a high-velocity solution for professional and creative projects.

The platform features an intuitive voice agent that allows users to 🗣️simply speak their ideas. From there, the system seamlessly integrates text, images, and audio into a unified output stream within an immersive and easy-to-use dashboard or storyboard display.

The platform can serve several functions, including:

👀Pre-Visualization – Rapidly generate a “rough cut” of a scene, complete with narration, to help visualize ideas before full production.

🎙️Narrative Iteration and Version Control – Compare different takes, experiment with alternate aesthetics and character choices, and quickly refine storylines.

💡Accessibility and Multi-Modality – Support diverse creative inputs and outputs, making content creation more versatile.

🧙‍♂️Immersive & Personalized Storytelling – Enable audiences to experience content as a dynamic, interactive story, increasing engagement and connection.

How we built it

The application follows a decoupled architecture, separating a custom Next.js frontend from a high-performance Python backend.

Google ADK and FastAPI Server

The core logic is built with the Google ADK (Agent Development Kit), which supports complex task delegation and structured reasoning. The Root Agent, which uses the gemini-live-2.5-flash-native-audio model, is wrapped in a FastAPI server that provides an asynchronous entry point for WebSocket communication.
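The bidirectional flow behind that entry point can be sketched with two concurrent tasks: one forwarding client input upstream to the agent, one relaying agent events back downstream. This is a minimal illustration using only asyncio queues, not the project's actual code. The queues stand in for the WebSocket and the live agent session, and fake_agent, run_session, and the None end-of-stream marker are all hypothetical names chosen for this example.

```python
import asyncio


async def upstream(client_in: asyncio.Queue, agent_in: asyncio.Queue) -> None:
    """Forward client messages (e.g. audio chunks) to the agent."""
    while True:
        msg = await client_in.get()
        await agent_in.put(msg)
        if msg is None:  # None is this sketch's end-of-stream marker
            break


async def fake_agent(agent_in: asyncio.Queue, agent_out: asyncio.Queue) -> None:
    """Hypothetical stand-in for the live agent: turns each input into an event."""
    while True:
        msg = await agent_in.get()
        if msg is None:
            await agent_out.put(None)  # propagate end-of-stream downstream
            break
        await agent_out.put(f"event:{msg}")


async def downstream(agent_out: asyncio.Queue, received: list) -> None:
    """Relay agent events back to the client (collected in a list here)."""
    while True:
        event = await agent_out.get()
        if event is None:
            break
        received.append(event)


async def run_session(messages: list) -> list:
    """Run one session: all three tasks execute concurrently until the stream ends."""
    client_in, agent_in, agent_out = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    received: list = []
    for m in messages:
        client_in.put_nowait(m)
    client_in.put_nowait(None)
    await asyncio.gather(
        upstream(client_in, agent_in),
        fake_agent(agent_in, agent_out),
        downstream(agent_out, received),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(run_session(["hello", "world"])))
```

In a real server, the upstream task would presumably read from the WebSocket and the downstream task would write to it, so neither direction blocks the other.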

🤖Agent Orchestration & Models Used

imagen-4.0-fast-generate-001 is used to generate the main image or cover image for the story.

gemini-2.5-flash-image (Nano Banana) is used to generate the subsequent images with interleaved output. The model takes the main image and the parsed text files of the story as input to generate consistent, context-aware character outputs.

gemini-2.5-flash-preview-tts is used to generate the audio narration.

veo-3.1-fast-generate-001 has also been used in testing to generate video from the main image.
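The ordering of these generation steps can be sketched as a simple job plan. The model IDs below come from the list above; everything else (the stage names, the Job dataclass, plan_story, and the "cover_image" reference string) is a hypothetical illustration, and no model is actually called. The sketch also assumes the cover image is prompted from the first scene, which the write-up does not specify.

```python
from dataclasses import dataclass, field

# Model IDs as named in the write-up; the stage names are illustrative.
MODELS = {
    "cover_image": "imagen-4.0-fast-generate-001",
    "scene_image": "gemini-2.5-flash-image",
    "narration": "gemini-2.5-flash-preview-tts",
}


@dataclass
class Job:
    stage: str
    model: str
    inputs: list = field(default_factory=list)


def plan_story(scenes: list) -> list:
    """Return generation jobs in dependency order for a list of scene texts."""
    if not scenes:
        raise ValueError("at least one scene is required")
    # Assumption: the cover image is prompted from the first scene.
    jobs = [Job("cover_image", MODELS["cover_image"], [scenes[0]])]
    for text in scenes:
        # Each scene image is conditioned on the cover image plus the scene text,
        # which is how the write-up describes keeping characters consistent.
        jobs.append(Job("scene_image", MODELS["scene_image"], ["cover_image", text]))
        jobs.append(Job("narration", MODELS["narration"], [text]))
    return jobs


if __name__ == "__main__":
    for job in plan_story(["A fox finds a lantern.", "The lantern begins to glow."]):
        print(job.stage, job.model, job.inputs)
```

Planning the cover image first reflects the dependency described above: every later image takes it as input, so it has to exist before the scene images are requested.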

🎨Custom UI Flexibility

By decoupling the agent logic from the interface, the architecture allows for deep customization of the front-end user interface and supports a low-latency experience for real-time AI interactions.

Challenges we ran into

Integrating a real-time bidirectional streaming interface over WebSocket communication. The tutorial for the ADK Gemini Live API Toolkit demo application was useful in informing a custom adaptation for the system's architecture.

Model hallucinations. When these occur, refreshing the browser and creating a new session serves as a workaround.

Accomplishments that we're proud of

🚀Successful deployment on Google Cloud Run ✅.

WONDER can inform the future of generative entertainment. By simply speaking with an intuitive voice agent, creators can move from concept to fully realized and personalized multimedia stories.

What we learned

How to use WebSocket communication to support bidirectional streaming with ADK.

What's next for Wonder

🎞️Integrating video display on the front end (setup for this has already begun on the backend with the agent, and testing has been conducted).

Integrating text suggestions for camera angles and actions.

Ability to download assets directly from the dashboard.

Ability to change the direction of the narrative by speaking.
