Inspiration
Every night, my younger brother would ask my father to tell him a story about a different topic, and my father would sometimes struggle to come up with one. I saw an opportunity for Generative AI to produce engaging, custom stories on demand.
In the US alone, schools are underfunded by 150 billion dollars annually (The Century Foundation), which leaves many students without access to quality educational materials. I believed that with this tool, a single device and an Internet connection, any teacher could amplify the learning experiences of their students.
What it does
This tool first takes a few details about the story from the user, such as the character*, target age range and moral. The story is then generated in English and, if the user requests it, translated into another language. Next, a human-like audio version of the story is produced, in the translated language if a translation was requested and in English otherwise. Finally, a vivid image of an interesting scene from the story is created and displayed.
*The character is optional, and an LLM call is made to generate an interesting character if the field is left blank.
How I built it
The user interface was built using Streamlit, while a range of Generative AI models are used to produce the story, image and audio.
- Story text: GPT-3.5-turbo API
- Image: Dall-e-3 API
- Audio: ElevenLabs API for English, Google Translate TTS for other languages
- Translation: Google Translate API
Challenges I ran into
Making sure the stories are safe was a key consideration, since the target audience is especially vulnerable to inappropriate outputs. I also wanted to make sure the tool does not generate politically insensitive content. To address both concerns, I added a separate LLM call that checks that the character input is not objectionable.
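The character check described above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the prompt wording, the function name `is_character_allowed`, and the `ask_llm` callable (standing in for whichever LLM client is used) are all assumptions.

```python
from typing import Callable

# Hypothetical prompt wording; the project's real prompt is not shown in this write-up.
CHECK_PROMPT = (
    "You are screening input for a children's story generator. "
    "Answer with exactly one word, SAFE or UNSAFE.\n"
    "Character: {character}"
)


def is_character_allowed(character: str, ask_llm: Callable[[str], str]) -> bool:
    """Return True only if the LLM explicitly answers SAFE.

    `ask_llm` is an assumed wrapper that sends a prompt to an LLM
    and returns the reply text.
    """
    reply = ask_llm(CHECK_PROMPT.format(character=character))
    # Fail closed: anything other than a clear SAFE is treated as a rejection.
    return reply.strip().upper() == "SAFE"
```

Treating every reply other than a clear SAFE as a rejection means an ambiguous or malformed model answer blocks the input rather than letting it through, which seems the safer default for a young audience.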
In addition, some of the APIs could raise errors if many people used the tool simultaneously or the rate limits were otherwise exceeded. To keep the user experience seamless, I added backup models. For example, the backup for Dall-e-3 is Google's Imagen-006.
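One way to wire up backup models is to try each provider in order and return the first result that succeeds. The sketch below assumes each provider is wrapped in a plain Python callable; the function name `generate_with_fallback` and the idea of passing providers as a list are illustrative choices, not the project's confirmed design.

```python
from typing import Callable, Sequence


def generate_with_fallback(
    prompt: str, providers: Sequence[Callable[[str], bytes]]
) -> bytes:
    """Try each provider in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # e.g. a rate-limit or timeout error from the API
            errors.append(exc)
    # Every provider failed, so surface all collected errors to the caller.
    raise RuntimeError(f"All {len(providers)} providers failed: {errors!r}")
```

With this shape, the primary image model and its backup would be passed as `[primary_image_call, backup_image_call]` (both hypothetical wrappers), and the same helper could be reused for the text and audio steps.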
Accomplishments that I am proud of
- I am awed by the quality of the stories and images, which was in part due to the prompt engineering I implemented.
- When I showed it to some of my brother's friends, they were amazed by the quality of the story and image, which confirmed that the target audience did indeed like the tool.
What I learned
- How to use a variety of different APIs to handle different output formats (text, audio, images)
- How to use advanced computer science principles, like multi-threading, to make the audio and image API calls in parallel.
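The parallel audio and image calls mentioned above can be sketched with Python's standard `concurrent.futures` module. This is a minimal example under assumptions: `make_audio` and `make_image` are hypothetical wrappers around the respective API calls, and the project may use a different threading approach.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def generate_media(
    story: str,
    make_audio: Callable[[str], bytes],
    make_image: Callable[[str], bytes],
) -> Tuple[bytes, bytes]:
    """Run the audio and image calls concurrently and wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(make_audio, story)
        image_future = pool.submit(make_image, story)
        # result() blocks until the call finishes and re-raises any worker exception.
        return audio_future.result(), image_future.result()
```

Threads suit this case because both calls spend most of their time waiting on network responses, so the total wait is close to the slower of the two calls rather than their sum.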
What's next for MOSAIC (MOral Stories by AI for Children)
I would like to put this tool to work in the real world, helping students with their learning. I would love to get feedback from educational professionals and anyone else who knows how this tool could make a difference.
Built With
- google-generativeai
- openai
- python
- streamlit