NurtureLink: A Voice for Every Parent, in Their Own Voice

The story

Stephen Hawking communicated through a single cheek muscle for more than a decade at the end of his life. Today, three quieter audiences still cannot reliably reach their own children:
Deaf or mute parents: many never speak their child's name aloud.
Paralyzed parents (ALS, locked-in syndrome, late-stage Parkinson's): fully cognitive but can't physically respond when the child runs to them.
Working / absent parents: physically able, but in another room exactly when something goes wrong.

For each of them, the gap is the same: the child can't hear them when it matters most. Assistive technology has built solutions for one audience at a time. None of them sound like Mom.
NurtureLink is one Gemma-4-powered app that closes the gap for all three — and every AI response is rendered in the parent's own cloned voice.
Features

We shipped ten features. The three that matter most for judging:
🛡 GuardianWatch: the unique flagship

An always-on AI guardian that checks on a child (or elder) by sampling a webcam frame every 12 seconds. Gemini 2.5 Flash analyzes each frame for hazards: sharp objects, open flames, climbing, choking, falls, hand-on-chest cardiac signs, unfamiliar people. When it detects danger, it does three things within 2 seconds:

Speaks a calm intervention in the parent's own cloned voice: "Aarav, please put down the scissors. Mama is coming."
Logs the moment with a thumbnail.
On repeated danger readings, escalates to a full emergency overlay with GPS coordinates, a one-click WhatsApp deep-link pre-filled with the incident, and a Gemini-written emergency call script naming the right local number (911 / 108 / 112).

This is the bridge between safety camera and parent voice. No competitor combines all three.
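The escalation behavior described above can be sketched as a small state machine plus a link builder. This is a minimal illustration, not the app's actual code: the threshold `ESCALATE_AFTER`, the names `GuardianState`, `next_action`, and `whatsapp_link`, and the map URL in the message are all assumptions. The only external convention used is WhatsApp's click-to-chat format, `https://wa.me/<number>?text=<url-encoded text>`.

```python
from dataclasses import dataclass
from urllib.parse import quote

POLL_INTERVAL_S = 12   # from the write-up: one frame every 12 seconds
ESCALATE_AFTER = 2     # assumption: consecutive danger readings before escalating


@dataclass
class GuardianState:
    """Tracks how many danger verdicts have arrived in a row."""
    consecutive_danger: int = 0


def next_action(state: GuardianState, is_danger: bool) -> str:
    """Return 'idle', 'speak', or 'escalate' for the latest frame verdict."""
    if not is_danger:
        state.consecutive_danger = 0  # a safe frame resets the streak
        return "idle"
    state.consecutive_danger += 1
    if state.consecutive_danger >= ESCALATE_AFTER:
        return "escalate"  # show the emergency overlay
    return "speak"         # first reading: voice intervention only


def whatsapp_link(phone_digits: str, incident: str, lat: float, lon: float) -> str:
    """Build a wa.me click-to-chat link with the incident text pre-filled.

    phone_digits is the number in international format, digits only.
    """
    message = f"{incident} Location: https://maps.google.com/?q={lat:.5f},{lon:.5f}"
    return f"https://wa.me/{phone_digits}?text={quote(message, safe='')}"
```

Resetting the counter on any safe frame keeps a single misread frame from triggering the overlay; a real deployment might prefer a sliding window over the last few readings instead.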
📖 StoryWeaver: emotional payoff

Gemma 4, running locally on the laptop, writes a four-scene bedtime story tuned to the child's age and the parent's caregiver note. Each scene's illustration prompt is fed to Pollinations.ai for a watercolor children's-book picture. Every line is then narrated in the parent's matched voice, while their photo pulses with sonar rings on screen. A locked-in father can put his child to bed with a story he and Gemma wrote together.
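One way to picture the story-to-illustration hand-off is a small scene record and a URL builder. This is a hedged sketch: the `Scene` fields, `STYLE_SUFFIX`, and both function names are hypothetical, and the base URL assumes Pollinations accepts the prompt as a URL-encoded path segment, which should be checked against its current documentation.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Assumption: the image endpoint takes the prompt in the URL path.
POLLINATIONS_BASE = "https://image.pollinations.ai/prompt/"
STYLE_SUFFIX = "watercolor children's book illustration"  # hypothetical style tag


@dataclass
class Scene:
    text: str                 # the narrated lines for this scene
    illustration_prompt: str  # the picture prompt the model wrote


def illustration_url(scene: Scene) -> str:
    """Append the shared style tag and URL-encode the full prompt."""
    prompt = f"{scene.illustration_prompt}, {STYLE_SUFFIX}"
    return POLLINATIONS_BASE + quote(prompt, safe="")


def validate_story(scenes: list[Scene]) -> list[Scene]:
    """Reject model output that is not exactly four scenes."""
    if len(scenes) != 4:
        raise ValueError(f"expected 4 scenes, got {len(scenes)}")
    return scenes
```

Validating the scene count before rendering means a malformed model response fails early, before any image or narration request is made.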
👶 Parent Bridge: the cry classifier

Records 7 seconds of audio and runs a hybrid pipeline: an Audio Spectrogram Transformer first decides whether it's a baby cry, then a custom acoustic-feature classifier (pitch, zero-crossing rate, spectral rolloff, MFCC) trained on the donateacry corpus picks one of five categories: hunger, gas, tired, burping, discomfort. In our test it classified 5 of 5 labeled samples correctly. Gemma 4 then reasons over the audio + a baby photo + feed-and-diaper context to write a parent-tone suggestion: not just "hunger," but "Try offering a feed. The rhythmic pattern with sucking motions is the most common hunger signature."
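The two-stage structure of this pipeline can be sketched in plain Python. The example below is only an outline under stated assumptions: `is_baby_cry` and `pick_category` are hypothetical callables standing in for the Audio Spectrogram Transformer and the trained feature classifier, and only one of the four features (zero-crossing rate) is actually computed.

```python
from typing import Callable, Optional, Sequence

CATEGORIES = ("hunger", "gas", "tired", "burping", "discomfort")


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def classify_cry(
    samples: Sequence[float],
    is_baby_cry: Callable[[Sequence[float]], bool],  # stand-in for the AST detector
    pick_category: Callable[[dict], str],            # stand-in for the trained classifier
) -> Optional[str]:
    """Stage 1 gates on cry detection; stage 2 labels the cry from features."""
    if not is_baby_cry(samples):
        return None  # not a cry, so skip the category classifier entirely
    # Pitch, spectral rolloff, and MFCC are omitted from this sketch.
    features = {"zcr": zero_crossing_rate(samples)}
    label = pick_category(features)
    if label not in CATEGORIES:
        raise ValueError(f"unknown category: {label}")
    return label
```

Gating on the detector first means background noise or speech never reaches the five-way classifier, which would otherwise be forced to pick a category for audio that is not a cry at all.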
Plus

Voice Setup: 10 s recording matched to 14 neural voices, 98 % avg match
EyeBridge: head-pose + iris dwell typing, MediaPipe Face Landmarker, One-Euro filter, inline sign.mt ASL avatar
SignSpeak: real-time MediaPipe Hands, rule-based classifier on 21 landmarks, 18 signs, ~2 s response
EarBridge: reverse direction; hearing child speaks → captions + emoji emotion → ASL avatar
CalmCue: mic-only auto-comfort; detect cry → play parent's pre-saved soothing phrase in their voice
Child Voice Check-in: visual emotional check for non-verbal children
LifeGuardianAI bridge: links to my sibling AI Studio project

Architecture

Frontend (nurturelink_web/)
React 19 + TypeScript + Vite + Tailwind v4. Talks to Gemma + Gemini APIs directly from the browser using fetch and FormData. All ML for hands and face landmarks runs in the browser via MediaPipe Tasks Vision (no API calls, no quota).
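Raw landmark coordinates from an in-browser tracker jitter from frame to frame, which is why EyeBridge lists a One-Euro filter. The following is a minimal sketch of that published algorithm, written in Python to match the backend rather than the frontend's TypeScript. The default parameters are illustrative assumptions, not the app's tuned values.

```python
import math
from typing import Optional


class OneEuroFilter:
    """Adaptive low-pass filter: smooths heavily when the signal is slow,
    lightly when it moves fast, trading jitter against lag."""

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.0,
                 d_cutoff: float = 1.0) -> None:
        self.min_cutoff = min_cutoff  # baseline cutoff frequency (Hz)
        self.beta = beta              # how strongly speed raises the cutoff
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate (Hz)
        self._x_prev: Optional[float] = None
        self._dx_prev = 0.0
        self._t_prev = 0.0

    @staticmethod
    def _alpha(cutoff: float, dt: float) -> float:
        """Smoothing factor of a first-order low-pass at this cutoff and step."""
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, t: float, x: float) -> float:
        if self._x_prev is None:      # first sample passes through unchanged
            self._x_prev, self._t_prev = x, t
            return x
        dt = t - self._t_prev
        if dt <= 0:                   # ignore out-of-order timestamps
            return self._x_prev
        # Smooth the speed estimate, then derive the adaptive cutoff from it.
        dx = (x - self._x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev, self._dx_prev, self._t_prev = x_hat, dx_hat, t
        return x_hat
```

In use, one filter instance would be kept per coordinate (for example, one for the iris x position and one for y) and fed the frame timestamp in seconds alongside each new value.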
Backend (nurturelink_backend/)
FastAPI + PyTorch + Hugging Face Transformers. Hosts google/gemma-3n-E2B-it multimodal (float16, low CPU memory) for sign recognition, story generation, child check-ins, and parent-bridge diagnosis. Hosts the AST + feature-based cry classifier. Wraps Microsoft Edge TTS for free 14-voice neural synthesis. Auto-converts browser webm/opus uploads to WAV via pydub + imageio-ffmpeg.
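The webm/opus to WAV step can be illustrated by calling ffmpeg directly. Note that this swaps in a plain subprocess call in place of the pydub wrapper the backend uses; the function names and the 16 kHz mono target are assumptions chosen for illustration, not values taken from the project.

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_command(ffmpeg_exe: str, src: Path, dst: Path,
                       sample_rate: int = 16000) -> list[str]:
    """Build the ffmpeg argument list for a mono 16-bit PCM WAV conversion."""
    return [
        ffmpeg_exe,
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input container and codec are auto-detected
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (assumed 16 kHz)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(ffmpeg_exe: str, src: Path, dst: Path) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_wav_command(ffmpeg_exe, src, dst),
                   check=True, capture_output=True)
    return dst
```

The executable path is passed in rather than hard-coded; with imageio-ffmpeg installed it could come from `imageio_ffmpeg.get_ffmpeg_exe()`, which avoids requiring a system-wide ffmpeg install.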
Standalone demo — cry_demo.html, single self-contained file with Tailwind via CDN, opens in any browser, has an instant-mode for video pacing.
Why Gemma 4 specifically

Three reasons:
True multimodal: audio bytes + image + text context in one prompt. Other free models force three separate API calls and lose the cross-modal reasoning.
Runs on a laptop: gemma-3n-E2B-it at float16 fits in ~5 GB RAM. Privacy-friendly, no recurring cost, no rate limit, no internet dependency for the core features.
Reasoning, not just classification: the difference between "hunger" and "Baby sounds hungry. The rhythmic build matches the feeding interval. Try offering a feed." That second response is what an exhausted parent at 2 AM actually needs.

What we'd build next

True voice cloning (XTTS-v2 once it stabilizes on Windows + Python 3.12) instead of neural-voice matching
A Forever Voice mode where a parent with terminal illness records voice + answers to common child questions while they still can, so the child can converse with them after they're gone
On-device fine-tuned hazard model so GuardianWatch doesn't depend on Gemini API quota
Native mobile app via Capacitor