Inspiration
The heart of VIVI lies in a deeply personal story. One of our teammates, Daniel, tutors a bright, curious lad on the autism spectrum: someone who doesn’t lack imagination but struggles to express it in ways the traditional school system expects. Within his own family, he’s seen how conditions like aphantasia and ADHD can quietly complicate learning, particularly when lessons rely heavily on visualisation or sustained attention.
That got us thinking. What if we could make it easier for neurodivergent children to express what’s in their minds, not with worksheets or strict instructions, but simply by looking and speaking? What if we could turn spontaneous, fleeting thoughts into joyful, personalised visuals? We wanted to build a tool that gave children back the magic of storytelling, without needing them to picture it in their heads. With VIVI, that’s exactly what we set out to do.
What it does
VIVI is an AI-powered storytelling companion that turns imagination into imagery. It’s designed for neurodivergent learners, especially children who struggle with traditional forms of reading, writing, and visual thinking.
Here's how it works:

- Gaze detection waits until the user is looking attentively, then begins the session.
- Speech recognition picks up what the user says and transcribes it.
- Prompt enrichment reformulates short or fragmented speech into full prompts.
- DALL·E generates vibrant illustrations based on the user's story.
- A Chrome extension allows visualisation of highlighted online text, ideal for non-verbal or shy users.
- Twilio makes it possible for fictional characters to call the user and speak in character.
- Analytics track engagement, vocabulary, word count per session, and other educational metrics.

It's imagination made visible and storytelling made effortless.
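Stripped of the gaze and analytics plumbing, the core loop boils down to three model calls. Here's a rough Python sketch of that flow; the models match what we used, but the function name, system prompt, and file handling are illustrative rather than our exact implementation:

```python
# Rough sketch of the voice -> prompt -> image core, using the OpenAI Python SDK (>= 1.0).
# enrich_and_illustrate() and the system prompt are illustrative, not our exact code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def enrich_and_illustrate(audio_path: str) -> str:
    # 1. Speech-to-text with Whisper
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        ).text

    # 2. Prompt enrichment: turn a short utterance into a full illustration prompt
    enriched = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": (
                "Rewrite the child's words as a rich, friendly prompt for a "
                "storybook illustration. Keep it gentle and colourful."
            )},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content

    # 3. Image generation with DALL-E 3
    image = client.images.generate(
        model="dall-e-3", prompt=enriched, size="1024x1024", n=1
    )
    return image.data[0].url
```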
How we built it
This project required a bit of orchestration across several technologies:

- Frontend: React + TypeScript, built with Vite for quick iteration
- Backend: FastAPI in Python, managing real-time gaze tracking, voice processing, and data analytics
- Gaze detection: MediaPipe + OpenCV to detect focused eye contact (sketched below)
- Speech-to-text: OpenAI Whisper, highly accurate even with soft-spoken or accented input
- Translation & NLP: GPT-3.5 Turbo handles translation and prompt refinement
- Image generation: DALL·E 3 brings ideas to life in vivid, storybook-like illustrations
- Chrome extension: Manifest V3, with a Node.js Express backend acting as a CORS proxy
- Voice simulation: the Twilio API lets story characters phone you and chat dynamically
- Analytics: we log session stats like average words per query, total words, and sentence distribution
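For the gaze-detection piece, MediaPipe's Face Mesh (with refined iris landmarks) plus OpenCV is enough for a simple "is the child looking at the screen" check. The landmark indices and thresholds below are illustrative; the real heuristic was tuned by hand:

```python
# Minimal sketch of an attention check with MediaPipe Face Mesh + OpenCV.
# Landmark indices and the 0.35/0.65 thresholds are illustrative values.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)  # enables iris landmarks

def looking_at_screen(frame) -> bool:
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return False
    lm = results.multi_face_landmarks[0].landmark
    # Right eye: outer corner 33, inner corner 133, iris centre 468.
    corner_outer, corner_inner, iris = lm[33], lm[133], lm[468]
    # Where the iris sits between the corners (0 = outer edge, 1 = inner edge).
    ratio = (iris.x - corner_outer.x) / (corner_inner.x - corner_outer.x + 1e-6)
    return 0.35 < ratio < 0.65  # roughly centred -> treat as "looking"

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok and looking_at_screen(frame):
    print("User is looking, start the session")
cap.release()
```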
We stitched these together into one cohesive user flow: gaze, voice, and visuals, all in real time.
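In broad strokes, that stitching is a single FastAPI route: audio comes in, an image URL and a couple of session stats come out. The route name, response shape, and placeholder helpers below are assumptions for illustration; the real helpers wrap the Whisper, GPT-3.5, and DALL·E calls shown earlier.

```python
# Hypothetical FastAPI route sketching the stitched flow.
# Route name, response shape, and the placeholder helpers are illustrative.
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

# Stand-ins so the sketch runs on its own; in the real app these wrap the
# Whisper / GPT-3.5 Turbo / DALL-E 3 calls shown above.
def transcribe(raw_audio: bytes) -> str:
    return "then a dragon came!"

def enrich(transcript: str) -> str:
    return f"A warm storybook illustration of this moment: {transcript}"

def illustrate(prompt: str) -> str:
    return "https://example.com/placeholder-image.png"

class StoryResponse(BaseModel):
    transcript: str
    image_url: str
    words_this_query: int

@app.post("/story", response_model=StoryResponse)
async def tell_story(audio: UploadFile) -> StoryResponse:
    raw = await audio.read()
    transcript = transcribe(raw)
    image_url = illustrate(enrich(transcript))
    return StoryResponse(
        transcript=transcript,
        image_url=image_url,
        words_this_query=len(transcript.split()),  # fed into the session analytics
    )
```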
Challenges we ran into
- Synchronising gaze detection and voice without overwhelming the user took several iterations.
- Generating a visual from short, sometimes vague utterances (e.g. "then a dragon came!") required prompt wizardry (see the sketch after this list).
- Handling non-English audio input and making sure it retained tone and nuance post-translation was tricky.
- The Chrome extension presented cross-origin headaches that needed an Express-based proxy to resolve.
- And of course, trying to get it all to work smoothly in under 36 hours meant skipping a bit of sleep here and there.
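Most of that prompt wizardry came down to giving GPT-3.5 Turbo the story so far, so a fragment like "then a dragon came!" expands into a full scene rather than a floating dragon. A simplified version of the idea, with an illustrative (not production) system prompt:

```python
# Illustrative sketch: expand a fragment into a full illustration prompt
# using the running story as context. The system prompt wording is a stand-in.
from openai import OpenAI

client = OpenAI()

def expand_fragment(fragment: str, story_so_far: list[str]) -> str:
    context = " ".join(story_so_far[-5:])  # keep the last few story beats for context
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": (
                "You turn a child's short story fragment into one vivid, "
                "self-contained prompt for a storybook illustration. Use the "
                "story so far to fill in characters and setting."
            )},
            {"role": "user", "content": f"Story so far: {context}\nNew fragment: {fragment}"},
        ],
    )
    return response.choices[0].message.content

story = ["A girl called Mia found a glowing map in her garden."]
print(expand_fragment("then a dragon came!", story))
```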
Accomplishments that we're proud of
- We created a pipeline that truly helps neurodivergent kids express themselves, without needing to write or draw.
- Got fictional AI characters to ring you up and have a natter, in character. Magic! (The call trigger is sketched below.)
- Built a live Chrome extension that turns any webpage into a visual storytelling playground.
- Tracked meaningful metrics, like vocabulary growth and session length, so parents and educators can see progress over time.
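The character calls ride on Twilio's Programmable Voice. The sketch below shows only the simplest piece: placing an outbound call that speaks one in-character line via TwiML. The dynamic back-and-forth layers the GPT pipeline on top, and every number and line of dialogue here is a placeholder.

```python
# Illustrative outbound "character call" with Twilio Programmable Voice.
# Credentials, phone numbers, and dialogue are placeholders.
import os
from twilio.rest import Client

client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

call = client.calls.create(
    to="+447700900000",      # the child's (or parent's) phone, placeholder number
    from_="+447700900001",   # your Twilio number, placeholder
    twiml=(
        "<Response><Say>"
        "Hello! It's Spark the dragon from your story. "
        "What happened after I flew over the mountain?"
        "</Say></Response>"
    ),
)
print("Call started:", call.sid)
```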
What we learned
- Designing for accessibility is not about simplifying; it's about meeting users where they are.
- AI models are powerful, but it's how you stitch them together that brings real magic.
- Prompt engineering is its own kind of craft, vital when translating spontaneous voice into stunning imagery.
- Gaze detection, voice input, and generation need to be thoughtfully choreographed to feel natural.
- Building with empathy first makes even the most technical challenges worth it.
What's next for VIVI
- Test VIVI in real-world classrooms and with neurodivergent families
- Expand translation to support multilingual voice narration and character calls
- Enhance analytics: track engagement trends, emotional tone, and vocabulary growth
- Allow children to co-create entire visual stories, page by page, and save them as books
- Give users more control over style: let them choose colours, characters, or tone
- Partner with SEN educators, autism support groups, and accessibility organisations
Built With
- chrome
- dotenv
- elevenlabs
- express.js
- fastapi
- git/github
- mediapipe
- mongodb
- node.js
- oauth
- openai
- opencv
- pydantic
- python
- react
- twilio
- typescript
- vite
