Inspiration
This project was inspired by my 3-year-old, who asks to be told a bedtime story several times a day. He comes up with such varied combinations of characters for each plot that I turned to LLMs for their creative side, using Gemini models and the Google Agent Development Kit (ADK) to build this.
What it does
The application starts with a greeter agent, which greets the user and asks for a topic for the bedtime story. The topic shared by the user is saved in the session's state object and passed on to the next agent in line, the story creation agent. The story creation agent is an LLM agent built on 'gemini-2.0-flash-001' that writes the story based on the topic. It then passes the task of narrating the generated story to a storytelling agent. This agent is equipped with a generate_audio tool and produces audio output that narrates the story to the user in a friendly, expressive voice. The audio is generated using the ElevenLabs API. For orchestration, I have a Sequential Agent (the story pipeline agent) that carries out these tasks in order.
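To make the hand-off between agents concrete, here is a minimal sketch of the sequential pattern in plain Python. It does not use the ADK or call any model: the `Step` and `SequentialPipeline` classes, the state keys (`user_input`, `topic`, `story`, `narration`), and the placeholder story and narration strings are all assumptions made up for illustration. It only shows the idea of each step reading from and writing to one shared state object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Shared state passed from step to step (stands in for the session state).
State = Dict[str, str]


@dataclass
class Step:
    """One stage of the pipeline: a name plus a function that updates the state."""
    name: str
    run: Callable[[State], None]


@dataclass
class SequentialPipeline:
    """Runs its steps in order, each one seeing what the previous ones wrote."""
    steps: List[Step]

    def run(self, state: State) -> State:
        for step in self.steps:
            step.run(state)
        return state


def greeter(state: State) -> None:
    # Hypothetical: take the raw user input and store it as the story topic.
    state["topic"] = state["user_input"].strip()


def story_creator(state: State) -> None:
    # Placeholder for the LLM call: build a tiny story from the saved topic.
    state["story"] = f"Once upon a time, there was {state['topic']}. The end."


def story_teller(state: State) -> None:
    # Placeholder for audio generation: record what would be narrated.
    state["narration"] = f"[audio placeholder for {len(state['story'])} characters]"


story_pipeline = SequentialPipeline(
    steps=[
        Step("greeter", greeter),
        Step("story_creator", story_creator),
        Step("story_teller", story_teller),
    ]
)

final_state = story_pipeline.run({"user_input": "  a dragon and a robot  "})
print(final_state["story"])
```

In the real app the middle step would be the Gemini-backed agent and the last step would call the audio tool; the ordering and the shared state are the parts this sketch is meant to show.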
How I built it
I implemented the project with the Google ADK. There are two implementations of this application on GitHub.
The folder named 'ADK_Agent' demonstrates the use of the ADK web interface to get a story topic from the user and to narrate the story back. Here Sequential Agents are used to orchestrate the tasks between the greeter and the story_telling_agent.
The folder 'Bedtime_app' demonstrates the explicit definition of the session, memory management, and runner, which gives the developer much more control and flexibility over the agents' execution lifecycle, session management, and state persistence. I coded a basic Streamlit front end to provide a friendly interface. The user enters a topic for the story in the app, which in turn calls the runner, and the agents are executed sequentially as in the first setup.
Challenges I ran into
- I had to troubleshoot a number of issues along the way, starting with basic ones such as the ADK web interface not detecting the agents. This made me realize that the folder structure matters.
- Not defining an agent as the root_agent also caused a problem, which I was able to fix later.
- Lastly, the way I first declared the audio_generation function resulted in an error, and the error log suggested using a more explicit function declaration. Following the suggestion and making the signature more explicit got rid of the issue.
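As an illustration of the last point, here is a sketch of what a "more explicit" tool function could look like. The original and fixed signatures are not shown in this write-up, so everything here is an assumption: the parameter names `voice_id` and `output_path`, the returned dictionary, and the helper `has_explicit_signature` are hypothetical, and the function body is a stub that does not call ElevenLabs. My understanding is that the ADK builds the tool's declaration from the function's signature and docstring, which is why fully annotated parameters with no `*args` or `**kwargs` tend to be the safe choice.

```python
import inspect
from typing import Callable, Dict


def generate_audio(story_text: str, voice_id: str, output_path: str) -> Dict[str, str]:
    """Narrate a story and save the audio to a file.

    Args:
        story_text: The full text of the story to narrate.
        voice_id: Identifier of the voice to use (hypothetical parameter).
        output_path: Where the audio file should be written (hypothetical parameter).

    Returns:
        A dictionary with a "status" key and either "output_path" or "message".
    """
    if not story_text.strip():
        return {"status": "error", "message": "story_text is empty"}
    # Stub: the real tool would call the text-to-speech API here.
    return {"status": "success", "output_path": output_path}


def has_explicit_signature(func: Callable) -> bool:
    """Check that every parameter is annotated and there are no *args/**kwargs."""
    sig = inspect.signature(func)
    for param in sig.parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            return False
        if param.annotation is inspect.Parameter.empty:
            return False
    return sig.return_annotation is not inspect.Signature.empty


print(has_explicit_signature(generate_audio))
```

A quick check like `has_explicit_signature` can be run before registering a function as a tool, so a loosely declared function is caught before the agent starts.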
Accomplishments that we're proud of
While there's a lot more to do here, like adding video and a background score to the story narration, I am happy at this stage with the overall output it produces. I tried out both approaches: the ADK web interface, and an explicitly created session object and runner.
What we learned
I loved the Google ADK documentation and the trial projects provided. They taught me a lot about the 'how to' of things, including how the ADK handles sessions and in-memory management, and how we could leverage the power of LLMs and agents even further through MCP connections, functions as tools, agents themselves as tools, and so on.
What's next for Bedtime Story app
- Video generation using models like Veo2
- Adding a background score to the narration
- I plan to extend it later into more of an educational app that fetches and reads out news, has built-in trivia and quizzes, and includes preparatory modules for Year 1, Year 2, and so on. The main idea is to eventually make it a fun and engaging way of teaching kids, rather than limiting it to bedtime storytelling.