💡 Inspiration

We wanted to turn the experience of telling a story into something kids and families could immediately enjoy as a finished picture book. The idea was to make the creative process feel natural: speak the story out loud, choose a style and tone, and let the app handle the rest. That led us to build a system that could take a raw voice recording and transform it into something structured, visual, and ready to read.


🔍 What It Does

DreamsComeTrue records a story in the browser, sends the audio to a backend, and turns it into a multi-page illustrated picture book. The user can pick a visual style, reading level, and tone before recording. From there, the app:

  • Transcribes the speech
  • Cleans and structures the transcript into picture-book pages
  • Generates one illustration per page

The UI reports progress back to the user as the job runs, so pages appear as they are completed.
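One way to let pages appear as they are completed is a job record with one empty slot per page. Each slot is filled in when that page's illustration finishes, and the UI renders whatever is ready. The sketch below is a minimal illustration under that assumption. The `BookJob` and `Page` shapes, the status names, and the helper functions are hypothetical and do not come from the project's code.

```typescript
// Hypothetical options the user picks before recording.
type StoryOptions = {
  style: string;        // visual style, e.g. "watercolor"
  readingLevel: string; // e.g. "early reader"
  tone: string;         // e.g. "gentle"
};

// One finished page: structured text plus its illustration.
type Page = { text: string; imageUrl: string };

// Hypothetical status names for the stages of a job.
type JobStatus = "transcribing" | "structuring" | "illustrating" | "done" | "failed";

// A job record; each page slot stays null until that page is illustrated.
type BookJob = {
  id: string;
  options: StoryOptions;
  status: JobStatus;
  pages: (Page | null)[];
};

// Pages the UI can render right now, in page order (unfinished slots skipped).
function readyPages(job: BookJob): Page[] {
  return job.pages.filter((p): p is Page => p !== null);
}

// Fraction of pages finished, for a progress bar (0 until pages exist).
function progress(job: BookJob): number {
  if (job.pages.length === 0) return 0;
  return readyPages(job).length / job.pages.length;
}
```

For example, a four-page job with two illustrated pages reports a progress of 0.5, and the UI shows those two pages while the rest are still being generated.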


🛠️ How We Built It

We split the project into three parts:

  • Frontend: React + Vite
  • Backend: Express
  • ML service: FastAPI

The frontend handles the storytelling experience and sends audio plus selected filters to the backend. The backend creates and tracks jobs, then orchestrates the full pipeline. The ML service keeps provider credentials out of the browser and handles transcription, cleanup, and image generation.

Pipeline:

  • ElevenLabs Scribe v2 — speech transcription
  • K2 Think v2 — transcript cleanup and page structuring
  • Gemini — illustration generation, page by page
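The three stages above run in sequence, with illustrations generated one page at a time. The following TypeScript sketch shows that orchestration. It assumes the backend reaches each stage through some client, represented here by a hypothetical `StageClients` interface (for example, HTTP calls to the ML service). The real ElevenLabs, K2 Think, and Gemini calls are deliberately left out, and none of these names come from the project.

```typescript
type StoryOptions = { style: string; readingLevel: string; tone: string };
type Page = { text: string; imageUrl: string };

// Hypothetical per-stage clients; in practice these would call the ML service.
interface StageClients {
  transcribe(audio: Uint8Array): Promise<string>;
  structurePages(transcript: string, options: StoryOptions): Promise<string[]>;
  illustrate(pageText: string, options: StoryOptions): Promise<string>; // image URL
}

// Transcription -> cleanup/structuring -> per-page illustration.
// Each page is reported through onPage as soon as its illustration is ready.
async function runPipeline(
  audio: Uint8Array,
  options: StoryOptions,
  clients: StageClients,
  onPage: (index: number, page: Page) => void,
): Promise<Page[]> {
  const transcript = await clients.transcribe(audio);
  const pageTexts = await clients.structurePages(transcript, options);
  const pages: Page[] = [];
  for (let i = 0; i < pageTexts.length; i++) {
    // Sequential on purpose: page i is reported before page i + 1 starts.
    const imageUrl = await clients.illustrate(pageTexts[i], options);
    const page: Page = { text: pageTexts[i], imageUrl };
    pages.push(page);
    onPage(i, page);
  }
  return pages;
}
```

Illustrating pages sequentially keeps progress updates in reading order. Issuing the illustration calls concurrently would likely finish sooner overall, but pages could then complete out of order.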

The job flow is asynchronous by design. Because transcription and image generation take time, the backend responds quickly instead of holding the request open, and the frontend polls for updates until the book is ready.
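On the client, the polling side of this flow can be a small loop. It asks for the job's status at a fixed interval until the job reaches a terminal state. The sketch assumes a hypothetical `getStatus` function, for example a `fetch` to a job-status endpoint on the Express backend. The status names are also hypothetical, and the interval and attempt limit are illustrative values.

```typescript
type JobStatus = "transcribing" | "structuring" | "illustrating" | "done" | "failed";

// Resolves after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job is done or failed, passing each observed status to onUpdate.
// Gives up with an error after maxAttempts polls so a stuck job cannot hang the UI.
async function pollJob(
  getStatus: () => Promise<JobStatus>,
  onUpdate: (status: JobStatus) => void,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    onUpdate(status);
    if (status === "done" || status === "failed") return status;
    await sleep(intervalMs);
  }
  throw new Error(`job did not finish after ${maxAttempts} polls`);
}
```

Polling keeps the backend simple because it only has to answer "what is the state of job X right now". The tradeoff is some delay between a page finishing and the UI noticing it, bounded by the polling interval.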


🚧 Challenges We Ran Into

  • Coordinating multiple AI services into one smooth flow without exposing sensitive provider keys in the browser
  • Handling the inherent latency of transcription and image generation, which called for a job-based architecture rather than a simple request-response model
  • Surfacing clear progress in the UI so users could watch the story take shape page by page

🏆 Accomplishments We're Proud Of

  • The app produces a complete picture book experience from a spoken story — not just a transcript
  • Getting live job progress working so the interface feels alive during processing
  • A clean separation between UI, orchestration, and ML concerns, which makes the system easier to reason about and deploy

📚 What We Learned

  • How much structure is needed to turn unstructured speech into something that reads like a children's book
  • How to reliably coordinate transcription, cleanup, and illustration generation in a single pipeline
  • The value of isolating secrets in a dedicated service and using async jobs for long-running tasks

🚀 What's Next for Dreams Come True

  • Improve the quality and speed of the story-to-book pipeline
  • Refine how generated pages appear and how the story progresses from recording to finished book
  • Deepen the existing style, reading level, and tone options for a more personalized picture-book output
