Inspiration
"What if?" is the question children never stop pestering you with. With PlotPix, answering it becomes simple! PlotPix creates a story from just a single prompt and generates pictures and audio to go with it, which helps children immerse themselves in a world of their own, created just for them, by them.
What it does
- Page-by-Page Narrative: PlotPix creates a dynamic story one page at a time using Gemini 2.5 Flash-Lite, which remembers previous events to maintain a consistent plot.
- It utilizes Nano Banana to generate high-fidelity, watercolor-style art for every page.
- The system allows users to upload reference images, which the AI then uses to keep characters looking the same throughout the entire book.
- Every story snippet is automatically converted into audio using Text-to-Speech, making the book accessible for children who are still learning to read.
- A playful UI designed just for children, to keep them engaged.
How we built it
- Gemini 2.5 Flash-Lite: We used this model to handle the narrative logic. It transforms simple user ideas into structured story snippets and detailed visual descriptions.
- Nano Banana: This model generates high-fidelity watercolor illustrations. We implemented a Character Anchor system by passing previous frames as reference images to maintain visual continuity.
- gTTS: We added a text-to-speech integration, which acts as the narrator and makes the book usable for children who are still learning to read.
- Gradio: We built a custom UI using CSS to ensure the technology felt like a toy, not a terminal.
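The narrative memory described above can be sketched roughly as follows. This is a minimal illustration, not PlotPix's actual code: the `StoryMemory` class, its field names, the prompt wording, and the `generate` callable (a stand-in for the call to the text model) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StoryMemory:
    """Hypothetical helper: stores written pages and replays them in each prompt."""

    premise: str
    pages: List[str] = field(default_factory=list)
    max_pages_in_prompt: int = 5  # assumption: only recent pages are replayed

    def build_prompt(self) -> str:
        # Take the most recent pages (none if the cap is zero or the book is empty).
        n = self.max_pages_in_prompt
        recent = self.pages[-n:] if n > 0 else []
        first_number = len(self.pages) - len(recent) + 1

        lines = [f"Story premise: {self.premise}"]
        for offset, text in enumerate(recent):
            lines.append(f"Page {first_number + offset}: {text}")
        lines.append(
            f"Write page {len(self.pages) + 1}. "
            "Keep characters and events consistent with the pages above."
        )
        return "\n".join(lines)

    def next_page(self, generate: Callable[[str], str]) -> str:
        # `generate` stands in for the model call; it maps a prompt to page text.
        text = generate(self.build_prompt())
        self.pages.append(text)
        return text
```

Passing the model call in as a plain function keeps the memory logic testable without network access; in a real app it would wrap whichever client is used to reach the text model.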
Challenges we ran into
Nano Banana is not free, so under our constraints we could not demonstrate image generation. For now the demo works with text and audio only. The image generation logic is implemented and should work once API access is available, and it would add a whole new dimension to the stories.
Managing a digital book is much harder than a simple chatbot because the app has to "remember" the order of pages while allowing the user to move backward.
Solution: We built a robust "history_state" index that separates "Read Mode" from "Write Mode".
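One way such a page index could work is sketched below. The `BookState` class, its method names, and the rule that "Write Mode" applies only on the last page are assumptions for illustration; the actual "history_state" structure may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BookState:
    """Hypothetical page tracker separating re-reading from writing new pages."""

    pages: List[str] = field(default_factory=list)
    index: int = -1  # -1 means the book has no pages yet

    @property
    def mode(self) -> str:
        # On the last page (or in an empty book) "Next" must write a new page;
        # anywhere earlier it only re-reads what already exists.
        return "write" if self.index >= len(self.pages) - 1 else "read"

    def current(self) -> Optional[str]:
        return self.pages[self.index] if self.index >= 0 else None

    def back(self) -> None:
        # Never move before the first page.
        if self.index > 0:
            self.index -= 1

    def next_page(self, generate: Callable[[], str]) -> str:
        if self.mode == "write":
            # Only call the (possibly slow, possibly paid) generator here.
            self.pages.append(generate())
        self.index += 1
        return self.pages[self.index]
```

With this shape, going back and then forward again replays stored pages without triggering another generation, which is the main difference from a plain chatbot history.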
One of the biggest struggles in AI-generated storybooks is "character drift." In early versions, a character might be a cat on Page 1 but turn into a boy on Page 2.
Solution: We solved this by implementing a Reference Feedback Loop. When generating Page n+1, we don't just send text; we send the image from Page n back into the Nano Banana model to act as a visual anchor.
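The loop can be sketched as below. The function name `illustrate_book`, its parameters, and the `generate_image` callable (a stand-in for the image model call that accepts a description plus reference images) are hypothetical; combining an uploaded reference with the previous page is also an assumption about how the two anchors interact.

```python
from typing import Callable, List, Optional, Sequence

ImageBytes = bytes  # assumption: images are passed around as raw bytes


def illustrate_book(
    descriptions: Sequence[str],
    generate_image: Callable[[str, List[ImageBytes]], ImageBytes],
    uploaded_reference: Optional[ImageBytes] = None,
) -> List[ImageBytes]:
    """Generate one image per page, feeding each result into the next request."""
    images: List[ImageBytes] = []
    previous: Optional[ImageBytes] = None

    for description in descriptions:
        # References: the user's upload (if any) plus the previous page's image.
        references = [
            img for img in (uploaded_reference, previous) if img is not None
        ]
        image = generate_image(description, references)
        images.append(image)
        previous = image  # Page n becomes the visual anchor for Page n+1

    return images
```

The first page has no previous frame, so it is anchored only by the uploaded reference when one exists.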
Accomplishments that we're proud of
- We engineered a Custom CSS theme that transforms standard buttons into "Candy Clouds" and the interface into a "Book Frame."
- We integrated all three aspects, text, speech, and visualization, into a single app.
- Instead of letting the app crash when hitting API quotas, we built logic to detect limits and provide a "Magic Prompt" to the user.
- We successfully built a state-aware navigation system that tracks a user's position in the story.
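The quota handling mentioned above might look something like this sketch. Everything here is hypothetical: the `QuotaExceeded` exception stands in for whatever error the real client raises when a limit is hit, and the example assumes the "Magic Prompt" is the page's visual description shown to the user in place of the picture.

```python
from typing import Callable, Dict, Optional


class QuotaExceeded(Exception):
    """Hypothetical error type representing an API rate or quota limit."""


def illustrate_or_fallback(
    description: str,
    generate_image: Callable[[str], bytes],
) -> Dict[str, Optional[object]]:
    """Try to draw the page; on a quota error, return the prompt text instead."""
    try:
        return {"image": generate_image(description), "magic_prompt": None}
    except QuotaExceeded:
        # Assumption: the user sees the description so the page is not blank
        # and the story can continue with text and audio.
        return {"image": None, "magic_prompt": description}
```

Only the quota error is caught, so unrelated bugs still surface instead of being silently hidden behind the fallback.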
What we learned
- We learned that an LLM is only as good as its memory. To keep a story from wandering, we discovered how to feed the model its own previous narrative beats.
- We learned to implement a Feedback Loop where we passed the previous page's image as a reference to the Nano Banana model. This taught us how to leverage image-to-image capabilities to anchor visual identity.
- We learned to custom-style Gradio with CSS to hide the technical parts of the interface, like redundant submit buttons or paperclip icons.
- We learned to provide Sensory Feedback by combining visual art with gTTS audio, and found that multimodal output is much more engaging than text alone.
- We learned to build Graceful Fallbacks, ensuring that even if the image generation paused, the "Magic Prompt" and story logic remained active so the child's creativity was never punished.
What's next for PlotPix: My Magical Picture Book
If we had more time, we would focus on improving the user experience, adding more features based on user feedback, and exploring deeper integration with Google Gemini to provide more accurate and contextual responses. We would also improve error handling, add user authentication, and deploy the application to a cloud platform for broader accessibility.
Built With
- google-gemini-2.5-flash
- gradio
- gtts
- nano-banana
- pillow
- python
- python-dotenv