Using Google Cloud AI

Pashabook transforms children's drawings into animated video storybooks using Google Cloud AI. A user uploads a drawing, and the system generates a 3-page story with illustrations (Gemini 2.0 Flash + Imagen 3), animations (Veo 3.1 Fast + FFmpeg), and narration (Cloud TTS).

The app is built with a React Native (Expo) frontend and a Node.js backend on Cloud Run, using Firestore for job tracking and Cloud Tasks for asynchronous processing. Videos are stored in Cloud Storage with a 24-hour TTL. The app supports Japanese and English, shows real-time progress, and mixes in background music.

Key learnings: managing Cloud Run timeouts for video processing, implementing efficient polling patterns to prevent API storms, and running narration and animation in parallel to cut generation time from 3 minutes to about 2 minutes.
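One common way to keep progress polling from turning into an API storm is to back off between requests. The sketch below is a minimal illustration of that idea, not Pashabook's actual client code: the `Job` shape, `nextDelayMs`, `pollJob`, and the default intervals are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical job shape; field names are assumptions, not Pashabook's schema.
type JobStatus = "pending" | "processing" | "done" | "error";

interface Job {
  status: JobStatus;
  progress: number; // 0..1
}

// Exponential backoff capped at maxMs: base, 2*base, 4*base, ...
function nextDelayMs(attempt: number, baseMs = 1000, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job reaches a terminal state or maxAttempts is exhausted.
// fetchJob is injected so the caller decides how the status is read
// (for example, an HTTP call to the backend).
async function pollJob(
  fetchJob: () => Promise<Job>,
  onProgress: (job: Job) => void = () => {},
  maxAttempts = 30,
  baseMs = 1000,
): Promise<Job> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    onProgress(job);
    if (job.status === "done" || job.status === "error") {
      return job;
    }
    // Wait longer after each unfinished check to reduce request volume.
    await sleep(nextDelayMs(attempt, baseMs));
  }
  throw new Error("Polling gave up before the job finished");
}
```

Capping the delay keeps the progress display reasonably fresh while still bounding the number of requests a single client can make during a multi-minute job.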

Inspiration

We wanted to turn children's imagination into reality and strengthen parent-child bonds through digital storytelling.

What it does

Pashabook analyzes a child's hand-drawn illustration and automatically generates a narrated, animated video storybook in under 3 minutes.

How we built it

The frontend is React Native (Expo); the backend runs on Cloud Run with Firestore and calls Google Cloud AI services (Gemini 2.5 Flash Image, Cloud TTS, Veo 3.1 Fast). Gemini's interleaved output generates the story text and the illustrations in a single API call.
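An interleaved response arrives as an ordered list of parts, some carrying text and some carrying inline image data. The sketch below shows one way such a list could be grouped into story pages. It assumes each page's text comes before its image, which is an assumption about ordering rather than something the project states. The `Part` interface is a simplified local stand-in for the response parts, and `StoryPage` and `groupIntoPages` are hypothetical names.

```typescript
// Simplified local view of a response part: either text or inline image data.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // data is base64
}

interface StoryPage {
  text: string;
  imageBase64: string | null;
}

// Groups an ordered list of parts into pages.
// Assumption: text for a page precedes the image that illustrates it.
function groupIntoPages(parts: Part[]): StoryPage[] {
  const pages: StoryPage[] = [];
  let pendingText = "";

  for (const part of parts) {
    if (part.text) {
      // Accumulate text until the next image closes the page.
      pendingText += part.text;
    } else if (part.inlineData) {
      pages.push({
        text: pendingText.trim(),
        imageBase64: part.inlineData.data,
      });
      pendingText = "";
    }
  }

  // Trailing text without an image becomes a page with no illustration.
  if (pendingText.trim()) {
    pages.push({ text: pendingText.trim(), imageBase64: null });
  }
  return pages;
}
```

A caller would likely want to validate the result, for example by checking that exactly three pages came back and that each has an image, before starting the animation step.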

Challenges we ran into

Working around Imagen 3 quota limitations by migrating to the experimental Gemini 2.5 Flash Image model. Achieving sub-3-minute generation through parallel processing.
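The parallel step can be sketched with `Promise.all`: narration and animation for a page do not depend on each other, so both can start at once and the page waits only for the slower of the two. This is a minimal sketch under that assumption. `buildPageAssets`, `PageAssets`, and the two injected functions are hypothetical placeholders for the real Cloud TTS and Veo calls.

```typescript
interface PageAssets {
  narrationUrl: string;
  animationUrl: string;
}

// Runs narration and animation concurrently for one page.
// Both workers are injected, so this sketch makes no claim about the
// real service calls; each just returns a URL for its output.
async function buildPageAssets(
  pageText: string,
  synthesizeNarration: (text: string) => Promise<string>,
  animatePage: (text: string) => Promise<string>,
): Promise<PageAssets> {
  // Both promises are created before either is awaited, so the work overlaps.
  const [narrationUrl, animationUrl] = await Promise.all([
    synthesizeNarration(pageText),
    animatePage(pageText),
  ]);
  return { narrationUrl, animationUrl };
}
```

`Promise.all` rejects as soon as either task fails, so a production pipeline might prefer `Promise.allSettled` when a partial result is still worth keeping.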

Accomplishments that we're proud of

Implementing Gemini interleaved output as designed. Character-specific voices with BGM mixing. High-speed generation pipeline under 3 minutes.

What we learned

Gemini multimodal API capabilities. Importance of quota management. Prompt engineering for child-appropriate content generation.

What's next for Pashabook

Imagen 3 fallback implementation. Expanding to 5-6 pages. Enhanced multilingual support. Parent library features.
