Inspiration
We wanted to make storytime more engaging and useful for early learning. Children’s books are rich in narrative, vocabulary, and visual cues, but they’re often static.
This project aims to build a full pipeline that can:
- Read a story
- Extract scene and visual cues
- Generate or select matching images
- Align audio narration and visuals
- Expose an interactive player that highlights words and keeps media in sync
We were inspired by how modern generative models (e.g., Gemini for text-to-prompt, Stability for image generation) can be combined with simple, robust frontend playback logic to produce something accessible for teachers and kids — no heavy tooling required.
What it does
This project turns children’s picture books and short stories into interactive, educational movies. Each story is split into segments, and each segment gets an illustrated background, an audio narration that acts as the master clock, and a muted video track that follows the audio so the scene “plays” in time with the narration. The UI is a storyboard viewer where teachers, parents, and students can play, scrub, and step through segments while words highlight and media stay in sync.
How we built it
Stack
- Frontend: Vite + React + TypeScript + Tailwind + shadcn/ui
- Backend: Node.js + TypeScript (file storage, story parsing, image generation hooks)
- Assets: Local media under /data during dev
Architecture
- index.tsx: loads the sample story JSON, discovers local media, and builds an ordered list of segments (videoUrl, audioUrl).
- StoryboardViewer.tsx: the interactive player that handles play/pause/next/prev actions, audio-video synchronization, and segment navigation.
- backend/src/services/*: Node services for file storage, story parsing, image generation, and audio/video processing.
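As a rough illustration of how an entry point like index.tsx might turn discovered media files into an ordered segment list, the sketch below pairs audio and video files by a numeric index in the filename. The naming scheme (audio_N.mp3, video_N.mp4), the Segment type, and the buildSegments helper are assumptions made for this example, not the project's actual code.

```typescript
// Hypothetical segment shape; videoUrl and audioUrl are the fields named in the write-up.
interface Segment {
  index: number;
  audioUrl: string;
  videoUrl: string;
}

// Assumed naming scheme: "audio_<n>.mp3" and "video_<n>.mp4". The patterns match
// the whole name, so stray copies such as "audio_1 copy.mp3" are skipped.
const AUDIO_RE = /^audio_(\d+)\.mp3$/;
const VIDEO_RE = /^video_(\d+)\.mp4$/;

// Map each matching file to the number embedded in its name.
function indexByNumber(files: string[], pattern: RegExp): Map<number, string> {
  const byIndex = new Map<number, string>();
  for (const file of files) {
    const match = pattern.exec(file);
    if (match) byIndex.set(Number(match[1]), file);
  }
  return byIndex;
}

// Pair audio and video by index, keep only complete pairs, and sort numerically
// so that segment 10 comes after segment 2 (a plain string sort would not).
function buildSegments(files: string[], baseUrl = "/data"): Segment[] {
  const audio = indexByNumber(files, AUDIO_RE);
  const video = indexByNumber(files, VIDEO_RE);
  const segments: Segment[] = [];
  audio.forEach((audioFile, index) => {
    const videoFile = video.get(index);
    if (videoFile === undefined) return; // no matching video, skip this segment
    segments.push({
      index,
      audioUrl: `${baseUrl}/${encodeURIComponent(audioFile)}`,
      videoUrl: `${baseUrl}/${encodeURIComponent(videoFile)}`,
    });
  });
  return segments.sort((a, b) => a.index - b.index);
}
```

Matching on the full filename is a deliberate choice here: a segment with a missing or oddly named file is dropped instead of being paired with the wrong media.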
Pipeline overview:
- Story parsing and partitioning — the story JSON is parsed into structured segments based on sentence or scene boundaries.
- Prompt generation — for each partition, a visual prompt is generated using Gemini, describing the scene in a way suitable for Veo 3 video generation.
- Audio synthesis — for each partition’s narration text, we generate realistic voiceover using ElevenLabs, producing synchronized audio segments.
- Video generation — the Gemini prompts are fed into Veo 3 to produce matching short video clips for each segment.
- Media stitching — for each partition, the generated audio and video are combined and aligned to create cohesive, time-synced scenes.
- Interactive playback — the frontend loads these stitched segments and provides a storyboard interface where users can scrub, step, and play through the story while keeping narration, visuals, and text in sync.
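The first pipeline step can be sketched as a small pure function. This is a minimal example that assumes the story arrives as plain text and that segments are fixed-size groups of sentences; the StorySegment type, the partitionStory helper, and the sentencesPerSegment parameter are hypothetical names, and the later steps (Gemini, ElevenLabs, Veo 3) are not shown.

```typescript
// Hypothetical output of the partitioning step: one entry per narrated segment.
interface StorySegment {
  id: number;
  text: string;
}

// Split text into sentences on ., !, or ? (keeping a closing quote if present),
// then group a fixed number of sentences into each segment.
function partitionStory(text: string, sentencesPerSegment = 2): StorySegment[] {
  const size = Math.max(1, Math.floor(sentencesPerSegment));
  const matches = text.match(/[^.!?]+[.!?]+["'\u201D\u2019]?|[^.!?]+$/g) || [];
  const sentences = matches.map((s) => s.trim()).filter((s) => s.length > 0);

  const segments: StorySegment[] = [];
  for (let i = 0; i < sentences.length; i += size) {
    segments.push({
      id: segments.length,
      text: sentences.slice(i, i + size).join(" "),
    });
  }
  return segments;
}
```

A regex split like this is only a heuristic: abbreviations such as "Mr." would be treated as sentence ends, so a real parser would likely need scene markers or a smarter tokenizer.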
Challenges we ran into
We ran into challenges with Veo 3 and API credits. Generating consistent, high-quality visuals required multiple iterations and careful prompt tuning, and running out of generation credits occasionally blocked testing. To stay productive, we added local caching and fallback options so the system could still function when external API calls failed.
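One way to structure that kind of caching and fallback is a thin wrapper around the generation call. The sketch below is an assumption about the shape of such a wrapper, not the project's implementation: it uses an in-memory cache (the real one may well be on disk), and the Generate type, GenerationCache class, and generateWithFallback helper are hypothetical names.

```typescript
// Hypothetical generator: takes a prompt and resolves to a media URL.
type Generate = (prompt: string) => Promise<string>;

interface GenerationResult {
  url: string;
  source: "cache" | "api" | "fallback";
}

// Normalize prompts so trivially different strings share one cache entry.
function cacheKey(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

// In-memory cache keyed by normalized prompt (assumption: a disk cache would work the same way).
class GenerationCache {
  private readonly entries = new Map<string, string>();

  get(prompt: string): string | undefined {
    return this.entries.get(cacheKey(prompt));
  }

  set(prompt: string, url: string): void {
    this.entries.set(cacheKey(prompt), url);
  }
}

// Try the cache first, then the external API, then a local placeholder asset.
async function generateWithFallback(
  prompt: string,
  generate: Generate,
  cache: GenerationCache,
  fallbackUrl: string,
): Promise<GenerationResult> {
  const cached = cache.get(prompt);
  if (cached !== undefined) return { url: cached, source: "cache" };
  try {
    const url = await generate(prompt);
    cache.set(prompt, url);
    return { url, source: "api" };
  } catch {
    // Out of credits, network failure, and similar errors all end up here.
    return { url: fallbackUrl, source: "fallback" };
  }
}
```

Returning the source alongside the URL makes it easy to show in the UI or logs when a segment is using a placeholder instead of freshly generated media.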
Accomplishments that we're proud of
We’re proud of how the project came together in the end. Each part, from the audio-driven playback and video sync logic to the generative image pipeline and interactive storyboard viewer, started as a separate piece and gradually merged into a cohesive experience. Seeing all these components work in sync to turn a static story into an engaging, interactive movie was one of the most rewarding moments of the project.
What we learned
Frontend Playback Nuances
- Browsers block or reject play() promises without user interaction.
  → Fixed using await play() and event-based gating (canplay/playing).
- Making audio the master clock simplifies sync: aligning video to audio minimizes drift.
- Filenames and spaces can break assumptions (audio_1.mp3 vs audio_1 copy.mp3).
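The first two points can be sketched in a few lines. This is a minimal example under stated assumptions: the Playable and Seekable interfaces are stand-ins that HTMLAudioElement and HTMLVideoElement satisfy in a browser, the helper names are hypothetical, and the 0.15 second tolerance is an illustrative value, not one taken from the project.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface Playable {
  play(): Promise<void>;
}
interface Seekable {
  currentTime: number;
}

// Await play() and report whether playback started, so a rejected promise
// (for example, autoplay blocked before any user gesture) is handled explicitly.
async function safePlay(media: Playable): Promise<boolean> {
  try {
    await media.play();
    return true;
  } catch {
    return false;
  }
}

// Audio is the master clock: return the time the video should seek to,
// or null when the drift is within tolerance and no correction is needed.
function driftCorrection(
  audioTime: number,
  videoTime: number,
  toleranceSec = 0.15,
): number | null {
  return Math.abs(audioTime - videoTime) > toleranceSec ? audioTime : null;
}

// Apply the correction; returns true when the video was re-seeked.
function syncVideoToAudio(audio: Seekable, video: Seekable, toleranceSec = 0.15): boolean {
  const target = driftCorrection(audio.currentTime, video.currentTime, toleranceSec);
  if (target === null) return false;
  video.currentTime = target;
  return true;
}
```

In a browser, safePlay would typically be called from a click handler, and syncVideoToAudio from the audio element's timeupdate event. Seeking only when drift exceeds a tolerance avoids constant small seeks, which can make video stutter.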
Working with Generative Models
- Text prompt generation differs from image consumption — both need careful fallbacks.
- Video generation still needs a lot of work: getting consistent, high-quality output took multiple iterations and careful prompt tuning.
- The system runs fine when the API keys for optional features are missing.
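Running without API keys can be handled by deriving feature flags at startup. The sketch below is only an illustration: the environment variable names (GEMINI_API_KEY, ELEVENLABS_API_KEY, VEO_API_KEY), the FeatureFlags type, and the resolveFeatures helper are all hypothetical and not taken from the project.

```typescript
// Hypothetical flags for the optional, key-dependent parts of the pipeline.
interface FeatureFlags {
  promptGeneration: boolean;
  narration: boolean;
  videoGeneration: boolean;
}

// Enable a feature only when its (assumed) API key is present and non-empty.
function resolveFeatures(env: Record<string, string | undefined>): FeatureFlags {
  const has = (name: string): boolean => (env[name] || "").trim().length > 0;
  return {
    promptGeneration: has("GEMINI_API_KEY"),
    narration: has("ELEVENLABS_API_KEY"),
    videoGeneration: has("VEO_API_KEY"),
  };
}
```

On the backend this could be called once with process.env, and each service could then check its flag and fall back to local media when the flag is off.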
Built With
- elevenlabs
- gemini
- react
- typescript
- veo
