Questory

Inspiration

Approximately 1 in 6 children is neurodivergent, and for kids with ADHD, passive forms of learning, like sitting still and listening to stories, can be tough. We created Questory to give kids with ADHD a more engaging way to learn. Questory is an AI-powered storybook platform that turns traditional storytime into an interactive learning experience.

What it does

🔊 Voice Cloning: Storytime is foundational to early development, yet busy family schedules mean not every child has consistent access to it. Questory allows parents to record their voices, which are then recreated using ElevenLabs and used to narrate every story, so stories can be read by any loved one, anywhere, at any time.

📚 AI Story Generation: Questory allows kids to choose their own topics and generates customized stories based on their interests. Stories are generated by Gemini 3.1 Image Flash and feature multi-page narratives with colorful illustrations on each page.

🎤 Narration & Conversation: Stories are read aloud by a live conversational narrator: Finn the Fox, Sally the Sloth, or a cloned family voice. Kids can ask questions or chat with the narrator at any time. If the child disengages for a long time, the narrator audibly checks in.

🔍 Interactive Features: Branching Choices allow kids to shape the story themselves by choosing what happens next. The "What is This?" Magnifying Glass, powered by Gemini, lets kids explore more information about any object in the illustrations.

⚡ Proactive Re-engagement: A 30-second inactivity timer triggers the narrator to speak up and re-engage the child.
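The inactivity timer can be sketched as a small resettable countdown. This is a minimal illustration, not the project's actual code: the `InactivityTimer` class, its method names, and the `onIdle` callback are hypothetical, and only the 30-second default comes from the description above.

```typescript
// Hypothetical helper: the class and method names are illustrative assumptions.
type TimerHandle = ReturnType<typeof setTimeout>;

class InactivityTimer {
  private handle: TimerHandle | null = null;
  private readonly onIdle: () => void;
  private readonly timeoutMs: number;

  // timeoutMs defaults to the 30 seconds mentioned in the write-up.
  constructor(onIdle: () => void, timeoutMs: number = 30_000) {
    this.onIdle = onIdle;
    this.timeoutMs = timeoutMs;
  }

  // Call on every interaction (tap, speech, page turn) to restart the countdown.
  reset(): void {
    this.stop();
    this.handle = setTimeout(() => {
      this.handle = null;
      this.onIdle();
    }, this.timeoutMs);
  }

  // Cancel the countdown, for example when the story ends.
  stop(): void {
    if (this.handle !== null) {
      clearTimeout(this.handle);
      this.handle = null;
    }
  }

  get isRunning(): boolean {
    return this.handle !== null;
  }
}
```

In a React component, `reset()` would typically be wired to interaction events and `stop()` called on unmount, with `onIdle` asking the narrator to check in.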

How we built it

💻 Frontend: Next.js + React power dynamic, visually pleasing components for narration, choices, and visual interaction.

🤖 AI: Gemini 3.1 Image Flash generates structured prompts, scenes, branches, and visuals. Images are generated from scene metadata by Gemini and stored via Cloudinary. The magic Magnifying Glass is also powered by Gemini.

🗣️ Voice: ElevenLabs handles three jobs. Text-to-Speech narrates every page with word-level timestamps, which enables synchronized script highlighting so the kid can follow along. Conversational agents power the live narrator. Parent voice cloning lets parents record a passage and instantly create a custom narrator voice.

📦 Image Storage: Cloudinary stores and delivers all generated images with automatic format optimization.

🎨 Design: Early prototyping with Figma Make; final design created in Figma and converted to code with Claude.

Challenges we ran into

Consistent image style across parallel generation: Because all images are generated simultaneously, each call has no shared context. We solved this by embedding strict, detailed style guides and fixed character descriptors into every prompt to maintain visual consistency.
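A rough sketch of that approach is shown here, assuming a hypothetical `Scene` shape and an injected `generateImage` function so that no particular SDK call is implied. The style and character strings are invented placeholders, not the prompts the project used.

```typescript
// Hypothetical scene shape; the real scene metadata is not described in detail.
interface Scene {
  page: number;
  description: string;
}

// Placeholder text, repeated verbatim in every prompt so independent calls share one style.
const STYLE_GUIDE =
  "Soft watercolor children's book illustration, warm pastel palette, thick outlines.";
const CHARACTER_SHEET =
  "Mira: seven-year-old girl, curly black hair, yellow raincoat, red boots.";

function buildImagePrompt(scene: Scene): string {
  return [
    `STYLE: ${STYLE_GUIDE}`,
    `CHARACTERS: ${CHARACTER_SHEET}`,
    `SCENE (page ${scene.page}): ${scene.description}`,
  ].join("\n");
}

// The image call is passed in; it stands for whatever model client is used
// and is assumed to resolve to an image URL.
type GenerateImage = (prompt: string) => Promise<string>;

async function generateAllImages(
  scenes: Scene[],
  generateImage: GenerateImage,
): Promise<string[]> {
  // Promise.all starts every request at once and keeps results in page order.
  return Promise.all(scenes.map((scene) => generateImage(buildImagePrompt(scene))));
}
```

Because each request carries the full style guide and character sheet, no call depends on the output of another, which is what allows them to run in parallel.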

Word-level highlight sync: Updating React state at high frequency caused flickering. We avoided this by using a requestAnimationFrame loop to directly update DOM styles only when the active word changes.
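The idea can be sketched as follows. This is an illustration under assumptions rather than the project's code: the `WordTiming` shape, the function names, and the `"active"` class are hypothetical, and the frame scheduler is passed in as a parameter so the logic does not depend on a browser.

```typescript
// Hypothetical timestamp shape: one entry per word, times in seconds, sorted by start.
interface WordTiming {
  start: number;
  end: number;
}

// Binary search for the last word whose start time is <= t; returns -1 before the first word.
function findActiveWord(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid]!.start <= t) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Minimal structural type so the sketch does not require DOM typings.
interface Highlightable {
  classList: { add(name: string): void; remove(name: string): void };
}

function startHighlightLoop(
  audio: { currentTime: number; ended: boolean },
  wordEls: Highlightable[],
  timings: WordTiming[],
  schedule: (cb: () => void) => void,
): void {
  let active = -1;
  const tick = (): void => {
    const next = findActiveWord(timings, audio.currentTime);
    if (next !== active) {
      // Touch the DOM only when the active word changes, not on every frame.
      if (active >= 0) wordEls[active]?.classList.remove("active");
      if (next >= 0) wordEls[next]?.classList.add("active");
      active = next;
    }
    if (!audio.ended) schedule(tick);
  };
  schedule(tick);
}
```

In the browser, `schedule` would be `(cb) => requestAnimationFrame(cb)`, `audio` an `HTMLAudioElement`, and `wordEls` the word `<span>` elements held in refs. React state is never updated inside the loop, so no re-render happens per frame.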

Accomplishments that we're proud of

Seamless voice system with ElevenLabs: We unified TTS, conversational AI, and voice cloning into a single consistent character experience across narration, Q&A, and re-engagement.

Human-centered, interactive UI/UX: We designed the interface in Figma for kids, with readable fonts, clear visuals, and features like word highlighting to support early readers and attention retention.

Robust image generation pipeline: Parallelized generation with tightly controlled prompts ensures both speed and stylistic consistency.

What we learned

Coordinating parallel AI pipelines is critical: Generating text, images, and audio concurrently while keeping them synchronized is key to achieving real-time performance.

Prompt design directly impacts system reliability: Highly structured prompts (especially for characters and style) are necessary to reduce variance across independent model calls.

Voice systems require orchestration, not just APIs: Creating a consistent “character” meant carefully aligning multiple ElevenLabs endpoints and managing state across them.