Inspiration

This project was inspired by the experience of a first-time parent on our team. During long nights, repeated baby cries often led to guesswork and rising anxiety. We wanted a calm, explainable assistant that helps parents decide what to try next, without offering a medical diagnosis.

What it does

WhyMyBabyCries combines baby cry audio with recent care history to infer the most likely cause of crying and recommend the next best action. It provides:

- Cry transcription and cause probabilities
- Clear, confidence-aware next-step guidance
- Low-latency hints during live recording
- Continuous personalization via context memory

How we built it

We use Gemini 3 as the central multimodal reasoning engine. For each crying event, we send Gemini 3 a combined payload: the crying audio plus structured care context (feeding, diaper, sleep, and prior feedback). Gemini 3 returns strict JSON with:

- `audio_analysis` (transcription + probabilities)
- `ai_guidance` (cause, alternatives, actions, notice)
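A minimal sketch of how the app might validate that contract before trusting it in the UI. The field values, sub-keys, and the `parse_response` helper are illustrative assumptions, not the actual schema; only the two top-level keys come from the text above.

```python
import json

# Hypothetical example of the strict JSON we expect back from the model.
# Sub-fields and values are illustrative, not the real schema.
RESPONSE = """
{
  "audio_analysis": {
    "transcription": "rhythmic, rising cry with short pauses",
    "probabilities": {"hunger": 0.62, "discomfort": 0.23, "sleepiness": 0.15}
  },
  "ai_guidance": {
    "cause": "hunger",
    "alternatives": ["discomfort"],
    "actions": ["offer a feed", "check the diaper"],
    "notice": "Not medical advice; contact a clinician if you are worried."
  }
}
"""

def parse_response(raw: str) -> dict:
    """Validate the strict-JSON contract before using it downstream."""
    data = json.loads(raw)
    for key in ("audio_analysis", "ai_guidance"):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    probs = data["audio_analysis"]["probabilities"]
    # Guard against malformed model output: probabilities should sum to 1.
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("cause probabilities must sum to 1")
    return data

result = parse_response(RESPONSE)
top_cause = max(result["audio_analysis"]["probabilities"],
                key=result["audio_analysis"]["probabilities"].get)
```

Enforcing a strict schema at the boundary is what lets the UI show confidence and alternatives without ever rendering free-form model text directly.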

Live audio is streamed in chunks with partial reasoning for early hints. Context memory is fed into future requests, allowing Gemini 3 to act as a continuously adapting caregiver assistant rather than a stateless classifier.
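The streaming loop above can be sketched as follows. Everything here is a hedged mock: `analyze_chunk` stands in for the real partial-reasoning call, and the confidence curve and threshold are invented to illustrate how hints are withheld until the running estimate firms up.

```python
# Sketch of the live-hint loop: audio arrives in chunks, each partial estimate
# is appended to the context memory, and a hint is surfaced only once the
# running confidence clears a floor, so partial reasoning never yields
# premature or overconfident guidance.

CONFIDENCE_FLOOR = 0.6  # below this we keep listening instead of guessing

def analyze_chunk(history: list[dict], chunk_id: int) -> dict:
    # Hypothetical stand-in for the real streaming model call. The estimate
    # firms up as more context accumulates (a made-up curve for illustration).
    confidence = min(0.95, 0.3 + 0.2 * len(history))
    return {"cause": "hunger", "confidence": confidence}

def live_hints(chunks: list[bytes]) -> list[str]:
    history: list[dict] = []  # context memory carried into later requests
    hints: list[str] = []
    for i, _chunk in enumerate(chunks):
        estimate = analyze_chunk(history, i)
        history.append(estimate)
        if estimate["confidence"] >= CONFIDENCE_FLOOR:
            hints.append(f"likely {estimate['cause']} "
                         f"({estimate['confidence']:.0%})")
    return hints

hints = live_hints([b"..."] * 4)  # four audio chunks; first two stay silent
```

The key design choice the sketch illustrates is that early chunks update the context memory even when no hint is shown, so later requests reason over the full history rather than starting stateless each time.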

Challenges we ran into

We balanced multiple input signals without over-weighting any single modality. Streaming partial reasoning required careful prompts to avoid premature or overconfident guidance.

What we learned

Gemini 3 excels at reasoning with context under uncertainty. Explicitly surfacing confidence and alternatives significantly improves trust in sensitive scenarios.

What's next

We plan to explore optional video-based context and parent-provided comfort audio as additional non-diagnostic signals. Future work will continue to prioritize privacy, transparency, and responsible AI use.
