Inspiration

WordWave started from an audio-visual interactive workshop I joined during summer school at Stanford. We built a simple poem generator that reacted to sound and movement, and it made me realize how powerful it is when kids can feel language instead of just reading it on a worksheet. I wanted to bring that same playful, embodied experience to vocabulary, emotions, and mental health for kids, and so WordWave was born.

What it does

Kids start by choosing a word like “curiosity.” WordWave generates a real-time melody and floats related words, synonyms, and opposites across the sky. Using hand gestures, they “grab” the words they like, then make an OK sign to tell the system, “I’m done, start singing!” WordWave then sings their chosen words, and after each loop, kids get a turn to sing along. If they’re unsure what a word means, they can say “Hey Lulu” to call an AI agent that explains the word in kid-friendly language and connects it to feelings and mental wellbeing.

How we built it

  • Frontend & visuals: React + Vite with three, @react-three/fiber, and @react-three/drei to render a playful 3D world (sky, trains, trees, balloons). Word clouds float as interactive “notes” kids can grab.

  • Motion & gestures: react-webcam + MediaPipe Hands drive hand tracking. Gestures (OK sign, pinches, sways) trigger actions like selecting words, starting the song, and moving the scene—so it works hands-free.

  • Music generation: tone.js runs the transport, beat, and synths. @magenta/music generates melodies and grooves; we quantize and align syllables so words land on beat.

  • Voice & audio pipeline: ElevenLabs TTS is called through Convex actions (keeping API keys server-side) to “sing” or speak chosen words. Returned audio buffers and character timings are scheduled precisely with tone.js.

  • AI agent (“Lulu”): ElevenLabs Conversational AI listens for “Hey Lulu” and replies in kid-friendly language—explaining words, giving examples, and guiding play. Wake-word listening is client-side; keys stay protected via Convex for TTS.

  • Word semantics: Synonyms and opposites are fetched and color-coded so kids can tap to hear them, see relationships, and weave them into the song.
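As a rough illustration of the gesture side, the OK-sign trigger can be reduced to distance tests on MediaPipe's 21 hand landmarks (thumb tip = index 4, index fingertip = index 8, with the other fingertips at 12/16/20 and their knuckles at 9/13/17). This is a minimal sketch with illustrative thresholds, not the exact logic in the app:

```javascript
// Distance between two normalized landmarks {x, y} in [0, 1] coords.
function dist(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// `landmarks` is the 21-point array MediaPipe Hands returns per hand.
// Thresholds are illustrative and need tuning per camera setup.
function isOkSign(landmarks, touchThreshold = 0.05, extendThreshold = 0.1) {
  const thumbTip = landmarks[4];
  const indexTip = landmarks[8];
  // Thumb and index tips must touch to form the "ring".
  if (dist(thumbTip, indexTip) > touchThreshold) return false;
  // The remaining fingers should be extended: tip clearly away from its knuckle.
  const pairs = [[12, 9], [16, 13], [20, 17]]; // [tip, MCP] for middle/ring/pinky
  return pairs.every(([tip, mcp]) =>
    dist(landmarks[tip], landmarks[mcp]) > extendThreshold
  );
}
```

Pinch selection works the same way with just the thumb-index distance test; in practice the thresholds have to be tuned, which is exactly where lighting and camera-angle variation bites.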
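The beat alignment mentioned above boils down to snapping each syllable onset to the nearest subdivision of the transport grid. A minimal sketch, assuming 4/4 time and a 16th-note grid (the helper names are ours, not from the codebase):

```javascript
// Snap a time (in seconds) to the nearest grid step for a given tempo.
// One beat = 60 / bpm seconds; subdivision = 4 gives a 16th-note grid.
function quantizeToGrid(timeSec, bpm, subdivision = 4) {
  const step = 60 / bpm / subdivision;
  return Math.round(timeSec / step) * step;
}

// Quantize a list of syllable onsets so each word lands on the grid.
function quantizeSyllables(onsets, bpm) {
  return onsets.map((t) => quantizeToGrid(t, bpm));
}
```

The quantized onsets can then be handed to the sequencer so sung words line up with the Magenta-generated groove.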
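The character timings that come back with the TTS audio can be folded into per-word start offsets before scheduling. A sketch under the assumption that characters and their start times arrive as parallel arrays (the actual response shape may differ):

```javascript
// Group per-character timings into words with their start offsets.
// `chars` and `startTimes` are assumed to be parallel arrays.
function wordOffsets(chars, startTimes) {
  const words = [];
  let current = null;
  chars.forEach((ch, i) => {
    if (ch === ' ') {
      current = null; // a space ends the current word
      return;
    }
    if (current === null) {
      current = { word: ch, start: startTimes[i] };
      words.push(current);
    } else {
      current.word += ch;
    }
  });
  return words;
}
```

Each word's start offset can then be scheduled relative to the transport so the sung words land where the melody expects them.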
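On the client, the wake-word check can be as simple as scanning rolling speech-recognition transcripts for the phrase. A sketch, assuming we receive transcript strings on the client (e.g. from the Web Speech API); the real trigger wiring into the conversational agent is more involved:

```javascript
// Normalize a transcript and check for the wake phrase.
// Punctuation and extra whitespace are stripped so "Hey, Lulu!" still matches.
function hasWakeWord(transcript, wakeWord = 'hey lulu') {
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z ]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
  return normalized.includes(wakeWord);
}
```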

Challenges we ran into

  • Getting gesture detection to be reliable in different lighting conditions and camera angles.
  • Syncing music, visuals, and gestures so that each visual responds at the right musical moment.
  • Balancing fun vs. cognitive overload: too many words or effects can distract from learning, so we had to simplify the experience.

Accomplishments that we're proud of

We built a working prototype where kids can literally grab words out of the air with their hands.

  • The system generates melodies in real time and sings the chosen words back to the user.
  • We implemented a gesture-based flow: turn on camera → pick words with pinch motions → OK sign to start singing → singalong loop.
  • We created an AI helper (“Lulu”) that can explain words in a supportive, non-judgmental way to encourage curiosity and emotional exploration.
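The gesture flow above is effectively a small state machine. A sketch with illustrative state and event names (not the exact ones in the app):

```javascript
// Transition table: FLOW[state][event] -> next state.
// Unknown events leave the state unchanged.
const FLOW = {
  idle:      { cameraOn: 'picking' },
  picking:   { pinch: 'picking', okSign: 'singing' }, // pinches keep adding words
  singing:   { loopEnd: 'singalong' },
  singalong: { loopEnd: 'singing' }, // kids and the app alternate each loop
};

function nextState(state, event) {
  return (FLOW[state] && FLOW[state][event]) || state;
}
```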

What's next for WordWave

  • Add progression and mini-quests, so kids can unlock new melodies or skies as they learn more words.
  • Introduce multi-kid / classroom modes, so groups can build “word songs” together.
  • Explore basic progress dashboards so parents/teachers can see what kids have been exploring, while keeping the experience fun-first and low-pressure for the kids.
