StoryPal

Inspiration

Every child loves hearing a story where they are the hero. But personalized, educational storybooks are often expensive, hard to find, or time-consuming to create.

Teachers and parents also struggle to explain complex topics—like how digestion works or why we feel angry—in a way that is engaging for a young child.

We asked ourselves: what if any parent could create a fully personalized, illustrated, narrated storybook in under 60 seconds?

That question became StoryPal.

What It Does

StoryPal is an AI-powered web app that generates personalized educational storybooks for children aged 3–12.

You enter your child's name, age, and appearance, then choose a topic. Within a minute, you get:

A multi-page story where your child is the main character
Unique watercolor-style illustrations for every page
Voice narration read aloud page by page
"Did you know?" fun fact boxes
An interactive quiz to reinforce learning
A downloadable PDF to print and keep

Topics span 35+ curated subjects including Science, Health, Emotions, Life Skills, and History—plus unlimited custom topics.

How We Built It

StoryPal uses a three-stage AI pipeline:

1. Story Generation: We use Groq (Llama 3.3 70B) to generate a structured JSON story (title, pages, facts, quiz) in about 3 seconds. Prompts dynamically inject the child’s details into the narrative.

2. Image Generation: Each page is converted into an image prompt, tuned for a consistent watercolor children's-book style, and sent to Cloudflare Workers AI (SDXL Lightning).

3. Voice Narration: We integrated Kokoro-82M (ONNX) using kokoro-js. Text is chunked into sentence-level segments, converted to audio, and merged into seamless page narration.
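
As a rough illustration of the chunk-and-merge step in stage 3, the sketch below packs whole sentences into chunks under a character limit and then joins the resulting audio segments back to back. It is not the project's actual code: the 200-character limit and the names `chunkBySentence` and `concatSamples` are assumptions made for illustration, and the kokoro-js call that turns each chunk into audio is left out.

```typescript
// Hypothetical limit: the write-up does not state Kokoro's real input limit.
const MAX_CHARS = 200;

/** Split text into sentences, then pack whole sentences into chunks of at most maxChars. */
function chunkBySentence(text: string, maxChars: number = MAX_CHARS): string[] {
  // A sentence is a run ending in . ! or ? (plus any closing quote or bracket),
  // or whatever text remains at the end. Decimals such as "3.5" would be split.
  const sentences = text.match(/[^.!?]+[.!?]+["')\]]*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current !== "") chunks.push(current);
      // A single sentence longer than the limit is kept whole in this sketch.
      current = sentence;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

/** Join mono PCM segments (same sample rate) back to back, with no padding between them. */
function concatSamples(segments: Float32Array[]): Float32Array {
  const total = segments.reduce((n, s) => n + s.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const s of segments) {
    merged.set(s, offset);
    offset += s.length;
  }
  return merged;
}
```

Cutting only at sentence boundaries keeps each chunk a natural unit of speech, and joining raw samples before writing a single WAV header avoids stitching together several headers.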

Frontend: React + Vite + Tailwind CSS
UI/UX: Framer Motion + shadcn/ui
Storage: IndexedDB
PDF Export: jsPDF

Challenges We Faced

TTS Chunking: Kokoro has strict input limits. We implemented sentence-aware chunking and precise WAV concatenation so the merged narration has no audible gaps.
Consistent Image Style: SDXL initially produced inconsistent outputs. We refined prompts heavily to maintain a uniform watercolor aesthetic.
JSON Reliability: LLM responses were sometimes malformed. We added schema validation and retry logic.
No Backend Storage: We used Base64 encoding and IndexedDB to persist full stories (text, images, and audio) in the browser while avoiding quota issues.
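
The schema-validation-and-retry idea from the JSON Reliability item could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the `Story` field layout (especially the quiz item fields), the names `parseStory` and `generateStory`, and the default of three attempts are all hypothetical, and `callModel` stands in for whatever function actually requests the story from the LLM.

```typescript
// Assumed shapes: the write-up names title, pages, facts, and quiz,
// but the quiz item fields below are hypothetical.
interface QuizItem { question: string; options: string[]; answerIndex: number; }
interface Story { title: string; pages: string[]; facts: string[]; quiz: QuizItem[]; }

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

const isQuizItem = (v: unknown): v is QuizItem => {
  if (typeof v !== "object" || v === null) return false;
  const { question, options, answerIndex } = v as Record<string, unknown>;
  return (
    typeof question === "string" &&
    isStringArray(options) &&
    typeof answerIndex === "number" &&
    Number.isInteger(answerIndex) &&
    answerIndex >= 0 &&
    answerIndex < options.length
  );
};

/** Return a typed Story, or null if the text is not valid JSON or misses the schema. */
function parseStory(raw: string): Story | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // malformed JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { title, pages, facts, quiz } = data as Record<string, unknown>;
  if (typeof title !== "string") return null;
  if (!isStringArray(pages) || pages.length === 0) return null;
  if (!isStringArray(facts)) return null;
  if (!Array.isArray(quiz) || !quiz.every(isQuizItem)) return null;
  return { title, pages, facts, quiz };
}

/** Call the model until a response passes validation, up to maxAttempts times. */
async function generateStory(
  callModel: () => Promise<string>, // hypothetical: wraps the real LLM request
  maxAttempts: number = 3,
): Promise<Story> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const story = parseStory(await callModel());
    if (story !== null) return story;
  }
  throw new Error(`No valid story after ${maxAttempts} attempts`);
}
```

Validating before rendering means a malformed response costs one extra request rather than a broken storybook page.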

What We Learned

Groq’s inference speed enables real-time storytelling UX.
TTS architecture decisions depend on latency vs. model size tradeoffs.
Accessibility (WCAG compliance, touch targets, keyboard navigation) needs to be built into children’s apps from day one.