[Gallery: homepage screenshot, followed by images generated while narrating the story]
Inspiration
Ancient myths and epics, like the Mahabharata, were originally passed down through oral tradition, told by master storytellers around a fire. We wanted to bring this ancient experience into the modern era. Our inspiration was to combine the immersive, dramatic power of a live storyteller with the visual excitement of modern manga and comic books. We asked ourselves: what if an AI could not only tell you a mythic story with real emotion, but also draw a comic book of it in real time as you listen?
What it does
Shivu is a legendary, interactive mythic storyteller. Using real-time voice interaction, you can talk to Shivu and ask him to tell you epic tales.
As Shivu speaks—using dramatic pauses, whispers for secrets, and high energy for battles—he proactively uses his "Vision" to generate full, multi-panel comic book pages that illustrate the exact scene he is describing. The visuals appear dynamically on your screen, creating a seamless audio-visual storytelling experience. Shivu never breaks character, ensuring you are completely immersed in the legend.
How we built it
- Frontend: Built with React, Vite, and Tailwind CSS to create a dark, atmospheric, and immersive user interface.
- Voice & Persona: Powered by the Google Gemini Live API. We engineered a strict system prompt to give Shivu his unique affective performance, ensuring he acts like a true storyteller rather than an AI assistant.
- Real-time Visuals: We used Gemini 3.1 Pro Preview, integrated via function calling. As the Live API streams the story, it triggers a generate_visual_elaboration tool that renders 4-5 panel manga-style comic pages in the background without interrupting the audio stream.
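To illustrate the function-calling step, here is a minimal sketch of how a tool like generate_visual_elaboration could be declared and dispatched without blocking the audio stream. Only the tool name comes from this project; the scene_description parameter, the ToolCall and ToolResponse shapes, and the renderPage callback are assumptions for illustration and should be checked against the Gemini SDK you are using.

```typescript
// Hypothetical argument shape: the write-up names the tool, not its parameters.
interface VisualArgs {
  scene_description: string;
}

// Function declaration in the JSON-schema style used for Gemini function calling.
const generateVisualElaboration = {
  name: "generate_visual_elaboration",
  description:
    "Render a 4-5 panel manga-style comic page for the scene being narrated.",
  parameters: {
    type: "OBJECT",
    properties: {
      scene_description: {
        type: "STRING",
        description: "What is happening in the story right now.",
      },
    },
    required: ["scene_description"],
  },
} as const;

// Assumed shapes for a tool call and the acknowledgement sent back to the model.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface ToolResponse {
  id: string;
  name: string;
  response: { status: string };
}

// Starts rendering in the background and returns an acknowledgement right away,
// so narration does not have to wait for the image.
function handleToolCall(
  call: ToolCall,
  renderPage: (args: VisualArgs) => Promise<string>,
  onPage: (imageUrl: string) => void,
  onError: (err: unknown) => void,
): ToolResponse | null {
  if (call.name !== generateVisualElaboration.name) return null; // not our tool
  const scene = call.args["scene_description"];
  if (typeof scene !== "string" || scene.trim() === "") {
    return { id: call.id, name: call.name, response: { status: "invalid_arguments" } };
  }
  // Fire and forget: the promise settles later and updates the UI via callbacks.
  renderPage({ scene_description: scene }).then(onPage).catch(onError);
  return { id: call.id, name: call.name, response: { status: "rendering_started" } };
}
```

The key design choice is that handleToolCall returns its acknowledgement synchronously while the image request is still in flight. The caller can forward that acknowledgement to the live session immediately and show the page whenever onPage fires.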
Challenges we ran into
Building a real-time, multimodal AI application came with several unique hurdles:
- API Rate Limits: Initially, Shivu was generating images too frequently, hitting the free-tier quota limits (429 errors). We solved this by instructing the model to generate multi-panel comic pages every 4-5 sentences, rather than single images constantly, balancing visual pacing with API limits.
- Image Text Hallucinations: When telling cultural stories (like Indian epics), the image model would try to be "authentic" by hallucinating unreadable native scripts (like pseudo-Sanskrit) in the comic panels. We fixed this by injecting a hardcoded, strict override into the image prompt to enforce English-only text and speech bubbles.
- Continuous Tool Calling: The Live API initially wanted to generate one image at the start of a story and then talk for 2 minutes straight. We had to heavily refine the system instructions to force the model to interleave its tool calls—pausing its internal generation every few sentences to trigger a new image before continuing to speak.
- Deployment Environment Variables: We faced issues with Vite not bundling system environment variables on Vercel, which broke the image generation in production. We resolved this by explicitly mapping process.env.GEMINI_API_KEY in the Vite config.
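Two of the fixes above lend themselves to a short sketch: appending a fixed English-only rule to every image prompt, and pacing image requests. The override wording, the function names, and the interval are assumptions for illustration. The client-side gate is a complementary safeguard, not the prompt-level pacing fix described above.

```typescript
// Hypothetical wording: the write-up says an English-only override was injected,
// but does not quote it.
const ENGLISH_ONLY_OVERRIDE =
  "STRICT RULE: All captions and speech bubbles must be written in English only. " +
  "Do not render text in any other script.";

// Appends the fixed rule last so the scene description cannot displace it.
function buildImagePrompt(scene: string): string {
  return `${scene.trim()}\n\n${ENGLISH_ONLY_OVERRIDE}`;
}

// Returns a function that answers "may we request an image now?".
// At most one request is allowed per minIntervalMs; the clock is injectable
// so the gate can be exercised without waiting.
function createImageGate(
  minIntervalMs: number,
  now: () => number = Date.now,
): () => boolean {
  let lastAllowed = Number.NEGATIVE_INFINITY;
  return function allow(): boolean {
    const t = now();
    if (t - lastAllowed < minIntervalMs) return false; // too soon, skip this one
    lastAllowed = t;
    return true;
  };
}
```

A caller would check the gate before each image request and skip the tool call when it returns false. That keeps a burst of tool calls from exhausting the quota even if the model ignores its pacing instructions.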
Accomplishments that we're proud of
We are incredibly proud of the immersion factor. By successfully running real-time audio streaming alongside asynchronous image generation, we created an experience where the user truly feels like they are sitting across from a master storyteller. Getting the AI to maintain its persona while juggling complex background tool calls was a massive win.
What we learned
- Deep insights into managing WebRTC/WebSocket connections for the Gemini Live API.
- Advanced prompt engineering techniques to control both voice modulation (affective performance) and image generation styles simultaneously.
- How to handle state management in React when dealing with asynchronous, real-time AI streams.
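As an illustration of the state-management point, one common pattern is to keep the comic pages in a pure reducer, which could then be passed to React's useReducer. This is a sketch under assumptions rather than the code used in Shivu: the ComicPage shape and the action names are hypothetical.

```typescript
type PageStatus = "pending" | "ready" | "failed";

interface ComicPage {
  id: string; // e.g. the id of the tool call that requested the page
  status: PageStatus;
  imageUrl?: string;
}

type PageAction =
  | { type: "requested"; id: string }
  | { type: "resolved"; id: string; imageUrl: string }
  | { type: "failed"; id: string };

// Pure reducer: pages stay in request order even when images finish out of order.
function pagesReducer(
  state: readonly ComicPage[],
  action: PageAction,
): readonly ComicPage[] {
  switch (action.type) {
    case "requested":
      // Ignore duplicate requests for the same id.
      if (state.some((p) => p.id === action.id)) return state;
      return [...state, { id: action.id, status: "pending" }];
    case "resolved":
      return state.map((p): ComicPage =>
        p.id === action.id
          ? { ...p, status: "ready", imageUrl: action.imageUrl }
          : p,
      );
    case "failed":
      return state.map((p): ComicPage =>
        p.id === action.id ? { ...p, status: "failed" } : p,
      );
  }
}
```

Because every update goes through the reducer, a slow image that resolves after a later one still lands in its original slot, and the story panels stay in narrative order.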
What's next for Shivu
- Comic Book Export: Allowing users to download the generated comic pages as a PDF graphic novel at the end of the story.
- Expanded Mythologies: Adding specific knowledge bases for Greek, Norse, and Egyptian mythologies.
- Dynamic Soundscapes: Integrating background music and sound effects that change based on the mood of the generated story.
Built With
- gemini-2.5-flash
- gemini-live-api
- google-gemini-api
- react
- tailwind-css
- typescript
- vercel
- vite