Dreams Come True

💡 Inspiration

We wanted to turn the experience of telling a story into something kids and families could immediately enjoy as a finished picture book. The idea was to make the creative process feel natural: speak the story out loud, choose a style and tone, and let the app handle the rest. That led us to build a system that could take a raw voice recording and transform it into something structured, visual, and ready to read.

🔍 What It Does

Dreams Come True records a story in the browser, sends the audio to a backend, and turns it into a multi-page illustrated picture book. The user can pick a visual style, reading level, and tone before recording. From there, the app:

Transcribes the speech
Cleans and structures the transcript into picture-book pages
Generates one illustration per page

The UI streams progress back to the user so pages appear as they are completed.
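
The three steps above can be sketched as a generator that yields each page as soon as its illustration is ready, which is what would let a UI show pages incrementally. This is a minimal Python sketch under stated assumptions, not the project's actual code: `transcribe`, `structure_pages`, and `illustrate` are hypothetical callables standing in for the real provider calls, and the `Page` type is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    number: int
    text: str
    image: str  # e.g. a URL or file path for the illustration (assumption)


def build_book(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    structure_pages: Callable[[str], List[str]],
    illustrate: Callable[[str], str],
) -> Iterator[Page]:
    """Run the three stages in order, yielding each finished page."""
    transcript = transcribe(audio)            # step 1: speech to text
    page_texts = structure_pages(transcript)  # step 2: clean up and split into pages
    for number, text in enumerate(page_texts, start=1):
        # step 3: one illustration per page; yield immediately so the
        # caller can report progress before the whole book is done
        yield Page(number=number, text=text, image=illustrate(text))


# Stub stages for demonstration only (hypothetical, no real API calls).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_structure(transcript: str) -> List[str]:
    return [s.strip() + "." for s in transcript.split(".") if s.strip()]


def fake_illustrate(text: str) -> str:
    return f"image-for:{text}"


pages = list(build_book(b"A fox woke up. It found a kite.",
                        fake_transcribe, fake_structure, fake_illustrate))
```

Passing the stages in as callables keeps the orchestration logic testable without network access; the real stages would wrap the transcription, cleanup, and image-generation providers.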

🛠️ How We Built It

We split the project into three parts:

Layer	Tech
Frontend	React + Vite
Backend	Express
ML Service	FastAPI

The frontend handles the storytelling experience and sends audio plus selected filters to the backend. The backend creates and tracks jobs, then orchestrates the full pipeline. The ML service keeps provider credentials out of the browser and handles transcription, cleanup, and image generation.
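
To make the "creates and tracks jobs" role concrete, here is a minimal in-memory job registry. It is written in Python for illustration only; the actual backend is Express, so the real code would be JavaScript and may be organized quite differently. The `Job` fields, the status names, and the example filter values are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical status names; the real backend may use different ones.
STATUSES = ("queued", "transcribing", "structuring", "illustrating", "done", "failed")


@dataclass
class Job:
    id: str
    filters: Dict[str, str]  # e.g. style, reading level, tone
    status: str = "queued"
    pages: List[dict] = field(default_factory=list)
    error: Optional[str] = None


class JobStore:
    """In-memory job registry (a real deployment would likely persist this)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def create(self, filters: Dict[str, str]) -> Job:
        job = Job(id=uuid.uuid4().hex, filters=dict(filters))
        self._jobs[job.id] = job
        return job

    def get(self, job_id: str) -> Optional[Job]:
        return self._jobs.get(job_id)

    def set_status(self, job_id: str, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self._jobs[job_id].status = status

    def add_page(self, job_id: str, page: dict) -> None:
        self._jobs[job_id].pages.append(page)

    def fail(self, job_id: str, message: str) -> None:
        job = self._jobs[job_id]
        job.status, job.error = "failed", message


# Illustrative usage with made-up filter values.
store = JobStore()
job = store.create({"style": "watercolor", "reading_level": "early", "tone": "playful"})
store.set_status(job.id, "illustrating")
store.add_page(job.id, {"number": 1, "text": "A fox woke up."})
```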

Pipeline:

ElevenLabs Scribe v2 — speech transcription
K2 Think v2 — transcript cleanup and page structuring
Gemini — illustration generation, page by page

The job flow is asynchronous by design. Because transcription and image generation take time, the backend responds right away instead of holding the request open, and the frontend polls for updates until the book is ready.
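
A polling client for this kind of job flow might look like the sketch below. It is a Python illustration of the idea rather than the project's React code: `fetch_status` is a hypothetical callable that returns the latest job snapshot, and the `status` and `pages` field names are assumptions.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    on_page: Callable[[Dict], None],
    interval_s: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it finishes, reporting each new page exactly once."""
    seen = 0
    for _ in range(max_polls):
        snapshot = fetch_status()
        pages = snapshot.get("pages", [])
        for page in pages[seen:]:  # only pages not yet reported
            on_page(page)
        seen = len(pages)
        if snapshot["status"] in ("done", "failed"):
            return snapshot
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")


# Scripted snapshots stand in for real backend responses.
snapshots = iter([
    {"status": "transcribing", "pages": []},
    {"status": "illustrating", "pages": [{"number": 1}]},
    {"status": "done", "pages": [{"number": 1}, {"number": 2}]},
])
shown = []
final = poll_until_done(lambda: next(snapshots), shown.append, sleep=lambda s: None)
```

Tracking how many pages have already been reported lets the UI append new pages as they arrive without re-rendering earlier ones, and the poll budget keeps a stuck job from being polled forever.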

🚧 Challenges We Ran Into

Coordinating multiple AI services into one smooth flow without exposing sensitive provider keys in the browser
Handling the inherent latency of transcription and image generation, which required a job-based architecture over a simple request-response model
Surfacing clear progress in the UI so users could watch the story take shape page by page
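
One common way to handle the first challenge is to load provider keys only from the server's environment and to whitelist the fields that are ever returned to the browser. The sketch below assumes this approach; the environment variable names and the `public_job_view` helper are hypothetical, not taken from the project.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; the real service may name these differently.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "K2_API_KEY", "GEMINI_API_KEY")


def load_provider_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read provider keys from the server environment, failing fast if any is missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing provider keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


def public_job_view(job: Mapping[str, object]) -> Dict[str, object]:
    """Return only the fields that are safe to send to the browser."""
    allowed = ("id", "status", "pages")
    return {k: job[k] for k in allowed if k in job}


# Demonstration with a fake environment instead of real secrets.
fake_env = {name: "dummy-value" for name in REQUIRED_KEYS}
keys = load_provider_keys(fake_env)
view = public_job_view({"id": "abc", "status": "done", "pages": [], "provider_key": "dummy-value"})
```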

🏆 Accomplishments We're Proud Of

Producing a complete picture-book experience from a spoken story, not just a transcript
Getting streaming job progress working so the interface feels alive during processing
Keeping a clean architectural separation between UI, orchestration, and ML concerns, which makes the system easier to reason about and deploy

📚 What We Learned

How much structure is needed to turn unstructured speech into something that reads like a children's book
How to reliably coordinate transcription, cleanup, and illustration generation in a single pipeline
The value of isolating secrets in a dedicated service and using async jobs for long-running tasks

🚀 What's Next for Dreams Come True

Improve the quality and speed of the story-to-book pipeline
Refine how generated pages appear and how the story progresses from recording to finished book
Deepen the existing style, reading level, and tone options for a more personalized picture-book output

Built With

elevenlabs
express
fastapi
gemini
k2
node.js
react
render
vercel
vite

Submitted to

HackPrinceton Spring '26

Created by

I worked on the front-end part of the project.

CHARVI MISHRA
I engineered the backend architecture and ML pipeline, including integrating ElevenLabs, K2, and Gemini, while deploying both the frontend and backend services.

Brayden Uglione
Hi there, I'm Brayden. I'm a computer science student at Rutgers with a passion for Machine Learning.
Ananya Jha
Sayan Gupta

Updates

Ananya Jha started this project — Apr 19, 2026 04:07 AM EDT
