Inspiration
I am a father of two boys, aged five and two. I am always curious about their day, but when I ask how it went, the usual responses I get are, "It was fine," "Great," or "Okay." Any further enquiry results in unenthusiastic groans or an outright refusal to talk.
This got me thinking. What if I used their favorite wind-down activity, bedtime stories, to understand a little bit more about what they did? Could I use this nightly ritual as an incentive to uncover their day-to-day challenges and achievements?
What it does
The app provides an intuitive child-parent interface where parents and children can work together to create a story that reflects their unique style and, more importantly, their daily events.
The first step is setting up the experience. Children have plenty of options to shape their journey. They can choose to star in the story themselves, or pick from characters like a prince, princess, mermaid, wizard, or several others. They can also choose whether the story will be in English or Spanish. The best part is that if something significant happened to them that day, they can enter those details and have them seamlessly woven into the plot.
Once this quick setup is complete, the app generates three different journeys they can take, each with different core values attached based on their choices and daily events.
Once a journey is chosen, the family is presented with an immersive experience of custom narration and imagery. It is a story perfectly tailored to them, delivering the exact values they prefer.
How we built it
Core Tech Stack
- Backend: FastAPI and Python
- Frontend: Next.js and React
- Hosting: DigitalOcean App Platform
Agent Architecture
For the agent workflow, I implemented a sequential agentic pattern utilizing three distinct agents:
- The Story Seed Agent: This agent generates the initial three story options based on the user's customization choices and daily activities. It also selects the core values that will be woven into the narrative.
- The Story Agent: Once a user selects their preferred journey, this agent takes the response from the Seed Agent to craft the narrative. It creates vivid environments and ensures that character consistency is preserved throughout the entire story.
- The Image Prompt Agent: Finally, this agent takes the scene and environment details generated by the Story Agent and creates highly optimized prompts tailored specifically for the FAL Schnell API to generate the illustrations.
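The hand-off between the three agents can be pictured with a small sketch. This is not the app's actual code: the model calls are replaced with plain functions, and every name here (`StorySeed`, `Scene`, `Story`, `seed_agent`, `story_agent`, `image_prompt_agent`, `character_sheet`) is a hypothetical stand-in. It only illustrates the sequential flow and how repeating the same character and setting description in every image prompt helps keep pages consistent.

```python
from dataclasses import dataclass


@dataclass
class StorySeed:
    title: str
    core_value: str


@dataclass
class Scene:
    text: str
    setting: str


@dataclass
class Story:
    character_sheet: str  # fixed description reused on every page
    scenes: list[Scene]


def seed_agent(character: str, daily_event: str) -> list[StorySeed]:
    """Stand-in for the Story Seed Agent: returns three journey options."""
    values = ["courage", "kindness", "curiosity"]  # hypothetical core values
    return [
        StorySeed(title=f"{character} and the day of {value}", core_value=value)
        for value in values
    ]


def story_agent(seed: StorySeed, character: str, daily_event: str) -> Story:
    """Stand-in for the Story Agent: expands the chosen seed into scenes."""
    sheet = f"{character}, drawn the same way on every page"
    setting = "a sunny schoolyard"  # one setting shared by all scenes
    scenes = [
        Scene(text=f"{character} remembers: {daily_event}.", setting=setting),
        Scene(text=f"{character} learns about {seed.core_value}.", setting=setting),
    ]
    return Story(character_sheet=sheet, scenes=scenes)


def image_prompt_agent(story: Story) -> list[str]:
    """Stand-in for the Image Prompt Agent: builds one prompt per scene.

    The character sheet and setting are repeated in every prompt so the
    image model receives the same description for each page.
    """
    return [
        f"{story.character_sheet}. Setting: {scene.setting}. Scene: {scene.text}"
        for scene in story.scenes
    ]


# Sequential flow: seeds -> chosen seed -> story -> image prompts
seeds = seed_agent("a young wizard", "I scored a goal at recess")
story = story_agent(seeds[0], "a young wizard", "I scored a goal at recess")
prompts = image_prompt_agent(story)
```

In a real pipeline each function would wrap a model call and validate the structured output before passing it to the next agent.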
Inference & Tooling
I leveraged a variety of specialized models and tools for different parts of the pipeline:
- Story Seeding: Anthropic Claude Haiku
- Story Generation: Llama 70B
- Text-to-Speech: FAL AI TTS Multilingual v2 (with timestamp generation)
- Text-to-Image: FAL AI Schnell
- Agent Orchestration & Validation: PydanticAI
- Observability & Cost Monitoring: Logfire
- Code Assistant: Claude Code
Challenges we ran into
This was the first time I tried to build an agentic application. It was also my first time using the DigitalOcean App Platform and serverless inference APIs. While this unfamiliarity naturally created a steep learning curve, the core technical challenges I encountered and ultimately solved are as follows:
- Parallel execution and latency: Initially, I adopted a polling-based approach where the front-end repeatedly asked the backend for the status of the story generation. This created bottlenecks and added latency. The app now pushes updates to the front-end over Server-Sent Events (SSE), fed by a temporary queue.
- Maintaining visual and narrative consistency: Once a journey is selected, it has to stay consistent until the very end of the story. When I initially began, the images varied wildly from page to page. The environment was not preserved, and characters would look completely different on page two compared to page one, even though their names stayed the same. I fixed this by ensuring character and environment details were preserved and passed into the individual prompts for every subsequent API call.
- Lack of observability: Once an inference call was made, it was difficult to see exactly which prompts were being sent, what the responses were, and how long each call took. This visibility was essential for debugging errors and fixing inconsistent experiences. To solve this, I adopted the PydanticAI framework and used Logfire, which integrates with PydanticAI out of the box.
- TTS and word highlighting: This was an especially tricky challenge. The text highlighting stopped working after the first few sentences. Digging deeper into the logs, I noticed that the TTS API was truncating the timestamp data after a certain number of characters. To work around this, I split the story text across multiple calls: the first call carries the first 280 characters, and the rest is sent in subsequent calls. Finally, I stitched the audio and timestamps back together on the front-end.
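The splitting and stitching described in the last item can be sketched as follows. The app does the stitching on the front-end; this Python version only illustrates the logic. It assumes each TTS call returns word timings relative to the start of its own audio clip, plus the clip duration. `WordTiming`, `split_text`, and `stitch_timings` are hypothetical names, and splitting on word boundaries is an assumption rather than the exact cut the app makes.

```python
from dataclasses import dataclass

MAX_CHARS = 280  # limit mentioned above; the real cutoff depends on the TTS API


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the clip
    end: float


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept as its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stitch_timings(
    per_chunk: list[tuple[list[WordTiming], float]],
) -> list[WordTiming]:
    """Merge per-chunk timings into one timeline.

    Each item is (timings, clip_duration_seconds). Timings from later
    chunks are shifted by the total duration of the clips before them.
    """
    merged: list[WordTiming] = []
    offset = 0.0
    for timings, duration in per_chunk:
        for t in timings:
            merged.append(WordTiming(t.word, t.start + offset, t.end + offset))
        offset += duration
    return merged
```

With the timings on a single timeline, the highlighter can look up the current word from the playback position of the concatenated audio.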
Accomplishments we are proud of
My son now shares much more about his day at school. He prompts me for a story at the end of every day and always demands more stories!
What we learned
Building this project taught me a tremendous amount about agentic development, integrating DigitalOcean inference APIs, working with app platforms, and designing a cohesive user experience.
What is next for Storytime
- Guided Prompts: Using targeted prompts instead of open-ended questions to encourage children to share more specific details about their day.
- Logging and Journaling: Creating a space to save memories.
- Email and Summaries: Automatically generating and sending summaries of the created stories.
- Database Support: Allowing for story replays and a "Favorites" section.
- Architecture Upgrades: Transitioning to a Redis cache instead of using a temporary queue for SSE (Server-Sent Events).
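For context on the last item, a queue-backed SSE stream can look roughly like the sketch below. It is a minimal illustration using only the standard library, not the app's implementation: the event names `page` and `done`, the payload shape, and the `None` sentinel are assumptions. In a FastAPI app, an async generator like `event_stream` could be handed to a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import asyncio
import json
from typing import AsyncIterator


def format_sse(event: str, data: dict) -> str:
    """Encode one Server-Sent Event: event name, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def event_stream(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield SSE frames until the producer puts None on the queue."""
    while True:
        item = await queue.get()
        if item is None:  # hypothetical end-of-story sentinel
            yield format_sse("done", {})
            return
        yield format_sse("page", item)


async def demo() -> list[str]:
    # The producer side: in the app this would be the story pipeline.
    queue: asyncio.Queue = asyncio.Queue()
    for page in (1, 2):
        await queue.put({"page": page, "status": "ready"})
    await queue.put(None)
    return [frame async for frame in event_stream(queue)]


frames = asyncio.run(demo())
```

An in-process queue like this lives only as long as the server process, which is one reason a shared store such as Redis is a natural next step.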