Daily Tale AI

Kid's name in audio story
Daily Generated
Voice Cloning
Readable Story
Learning Lessons

Inspiration

The idea for Daily Tale AI was born from watching my cousin struggle every night to find a new bedtime story for his 3-year-old son. Like many children, his son quickly recognized and complained about repetitive tales, making bedtime a stressful experience. On top of that, the boy's mom was away on a business trip, and he constantly said, “Mom tells stories better.” This made me realize that bedtime, which should be calming, had become a challenge for them. With the advances in AI, I saw an opportunity to help.

What it Does

Daily Tale AI delivers fresh, personalized bedtime stories every night, with the child as the main character. The app also includes an AI voice cloning feature, allowing parents to easily clone their voices so their children can still hear stories narrated by them, even when they’re away. This not only keeps bedtime exciting but also creates a sense of comfort and consistency for both the child and the parents.

How We Built It

I developed Daily Tale AI using Ionic for the user interface and Node.js with TypeScript for the backend API, which is deployed via Docker Compose on a DigitalOcean droplet. I evaluated nearly all cloud text-to-speech (TTS) services and several local solutions, and ultimately selected OpenAI's TTS because it sounded the most natural. The generated audio is then processed with FFmpeg to slow the playback speed, which gives the narration a calmer tone that promotes relaxation. FFmpeg is also used to mix in soothing background audio, so each story has a unique and calming atmosphere.
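The FFmpeg post-processing described above can be sketched as a small helper that builds the command-line arguments. This is only an illustration under assumptions: the function name, the option names, and the example tempo and volume values are hypothetical, since the writeup does not give the actual settings. It relies on FFmpeg's `atempo` filter (which changes speed without changing pitch), the `volume` filter, and the `amix` filter.

```typescript
// Hypothetical options; the real project's values are not given in the writeup.
interface NarrationMixOptions {
  narrationPath: string;    // TTS audio file to slow down
  backgroundPath: string;   // soothing background track
  outputPath: string;       // final mixed file
  tempo: number;            // 1.0 = unchanged, 0.9 = 10% slower
  backgroundVolume: number; // linear gain for the background, e.g. 0.2
}

// Builds the argument list for an `ffmpeg` invocation that slows the
// narration and mixes it with a looped background track.
function buildFfmpegArgs(opts: NarrationMixOptions): string[] {
  // Stay inside the range that every FFmpeg version accepts for atempo.
  if (opts.tempo < 0.5 || opts.tempo > 2.0) {
    throw new RangeError("tempo must be between 0.5 and 2.0");
  }
  const filter = [
    `[0:a]atempo=${opts.tempo}[voice]`,             // slow the narration
    `[1:a]volume=${opts.backgroundVolume}[bg]`,     // lower the background
    `[voice][bg]amix=inputs=2:duration=first[out]`, // stop when narration ends
  ].join(";");

  return [
    "-y",                                            // overwrite the output file
    "-i", opts.narrationPath,
    "-stream_loop", "-1", "-i", opts.backgroundPath, // loop the background input
    "-filter_complex", filter,
    "-map", "[out]",
    opts.outputPath,
  ];
}
```

The resulting array could be passed to `spawn("ffmpeg", args)` from `node:child_process`. Building the arguments as an array rather than a shell string avoids quoting problems with file paths.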

For AI voice cloning, I built an in-house solution using Applio on a computer equipped with an Nvidia GPU. This setup handles the voice cloning process, creating a model of the parent’s voice. Each day, the cloned voice model runs inference over the OpenAI-generated TTS audio, converting the narration into the parent’s voice and providing a personalized and familiar listening experience for the child.
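The base narration that the cloned voice model later converts starts as a request to OpenAI's speech endpoint. A minimal sketch of building that request follows. The helper name, the `tts-1` model, and the `nova` voice are assumptions for illustration; the writeup does not say which model or voice the project uses.

```typescript
// Shape of a request that can be handed to fetch(url, init).
interface SpeechRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a text-to-speech request for one story.
// The model and the default voice are assumed values, not taken from the project.
function buildSpeechRequest(
  storyText: string,
  apiKey: string,
  voice: string = "nova",
): SpeechRequest {
  const text = storyText.trim();
  if (text.length === 0) {
    throw new Error("Story text is empty");
  }
  return {
    url: "https://api.openai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "tts-1",
        voice,
        input: text,
        response_format: "mp3",
      }),
    },
  };
}
```

Sending it would look like `const res = await fetch(req.url, req.init)`, after which the audio bytes from `res.arrayBuffer()` can be saved and handed to the voice conversion step. Keeping the request construction separate from the network call makes it easy to test without an API key.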

Challenges We Ran Into

One of our main challenges was integrating the AI voice cloning in a way that feels natural and user-friendly for parents. Maintaining the quality and consistency of the cloned voices was critical, especially since the voices need to sound convincingly real to provide comfort and familiarity for children.

Achieving a natural rhythm for TTS was particularly challenging. I solved it with several FFmpeg processing steps that run after the core audio is received: cleaning it, slowing it down, and adding background audio without introducing artifacts. Voice cloning is still a relatively new technology, and I could not find affordable cloud providers, so I set up a home server running Applio, reachable through a DuckDNS address. My API communicates with it through the Gradio endpoints exposed by Applio's local setup.

Accomplishments That We're Proud Of

I am proud of the seamless personalization I’ve achieved with the stories, ensuring each tale feels unique and engaging for every child. The AI voice cloning feature is also a major accomplishment, as it provides a comforting experience for children, allowing them to hear their favorite storyteller even when a parent is away. Most importantly, I’ve created a tool that not only helps parents but also turns bedtime into a magical, stress-free time for families.

Additionally, maintaining the consistency of images was challenging. While the app is intended to assist parents and not be used directly by children, it's still important that the images shown align with the audio and text being narrated. Achieving this consistency has been a significant technical feat. The images are also optimized on the API side with the Sharp library, which adjusts their resolution and file size so they load quickly.

What We Learned

The basic tech stack was familiar to me from previous side projects, but I had to dive deep into audio technologies, which were the most important aspect of this project.

Text-to-Speech (TTS): I did extensive research into the most realistic models for TTS generation.
Audio Cloning: I also had to learn about model generation and audio inference for voice cloning. Setting this up locally at high quality was a complex but rewarding learning experience, and it was necessary for the cloned voice to sound realistic.

What's Next for Daily Tale AI

The next step is to improve accents. The languages currently available are English (basic accent) and Spanish (with a Latin accent). The next iteration will focus on refining accents and adding more languages. For relaxation and sleep, it is critical that the voice is of high quality and has an accent the child is familiar with, so this will be a key priority moving forward.