Inspiration
Watching a caregiver navigate a child's autistic sensory meltdown is heartbreaking and intense. In those high-stress moments, parents are entirely focused on keeping their child safe—they simply don't have the hands or the mental bandwidth to type symptoms into a text-based chatbot. Caregivers need a "second set of eyes and ears" and a calm, empathetic voice to guide them instantly.
That realization was the spark for AnakUnggul AI: a real-time, hands-free, multimodal caregiving companion designed to restore calm when every second matters.
What it does
AnakUnggul AI is not just an app; it is a proactive caregiving ecosystem. When a meltdown occurs, the caregiver simply starts a "Live Session" to receive:
- Real-Time Voice Guidance: Powered by the Gemini Multimodal Live API, the app listens to the environment and talks directly to the caregiver, offering step-by-step calming techniques in a soothing, natural voice.
- Silent "Observer" Sensors: While the AI talks, our custom on-device ML models (Video & Audio) quietly monitor for distress cues like high-pitched cries or repetitive movements. These "Observer Notes" are whispered to Gemini behind the scenes, giving the AI deep situational awareness without alarming the child.
- Contextual Memory (RAG): It remembers. By accessing a centralized "Medical Record" in Firestore, the AI retrieves hyper-personalized context—such as specific triggers or successful past interventions—to tailor its advice.
- Proactive Check-ins: If the system detects a severe or repeated trigger pattern, a "Time-Delayed Engine" automatically schedules a follow-up notification hours later via Firebase Cloud Messaging (FCM) to check on the caregiver’s well-being.
- Crowdsourced Intelligence: Our community agent ("anakunggul") learns from parental discussions on platforms like Moltbook, extracting safe, practical tips to enrich the AI's knowledge base.
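The Proactive Check-ins decision above can be sketched in a few lines. This is a minimal illustration, not the project's code: the `TriggerEvent` shape, the 1 to 5 severity scale, both thresholds, and the three-hour delay are all hypothetical, and the actual FCM scheduling call is left out.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TriggerEvent:
    trigger: str   # e.g. "loud_noise" (hypothetical label)
    severity: int  # hypothetical 1..5 scale


# All three values are assumptions chosen for illustration.
SEVERE_THRESHOLD = 4
REPEAT_THRESHOLD = 3
FOLLOW_UP_DELAY = timedelta(hours=3)


def plan_follow_up(events: List[TriggerEvent], session_end: datetime) -> Optional[datetime]:
    """Return when to send a check-in, or None if no follow-up is needed."""
    severe = any(e.severity >= SEVERE_THRESHOLD for e in events)
    counts = Counter(e.trigger for e in events)
    repeated = any(n >= REPEAT_THRESHOLD for n in counts.values())
    if severe or repeated:
        return session_end + FOLLOW_UP_DELAY
    return None
```

The returned timestamp would then be handed to whatever scheduler sends the notification.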
How we built it: The 3-Agent Architecture
We designed a sophisticated 3-Agent system orchestrated through Google Cloud Run and synchronized via Cloud Firestore:
- The Community Agent (Moltbook): Acts as a "digital detective." Every 30 minutes, it scans community discussions to extract practical insights (e.g., "weighted blankets helped my child during noise-related meltdowns") and stores them as `community_insights`.
- The Live Assistant (Agent FA): The "On-Call Doctor." This agent orchestrates the real-time session. It fetches child profiles and community insights, filters them for relevance, and injects them into the Gemini Multimodal Live API via persistent WebSockets. Crucially, it also processes inputs from our custom audio and video observer models, which were prototyped in Jupyter Notebooks (`.ipynb`) and deployed as lightweight Keras models at the edge, to silently monitor physical distress cues and feed real-time contextual "whispers" directly to the AI.
- The Clinical Specialist (Agent A2A): A specialized consultant accessible via HTTP. Triggered by an external orchestrator, it handles complex tasks such as drafting therapist handover notes, finding local ASD resources, and assessing caregiver mental health through structured clinical frameworks.
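To make the "filters them for relevance" step concrete, here is a rough sketch that uses plain tag overlap. The write-up does not say how the real filter scores relevance, so this is a deliberately simpler stand-in, and the `ChildProfile` and `Insight` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ChildProfile:
    triggers: Set[str]                            # known triggers for this child
    avoid: Set[str] = field(default_factory=set)  # interventions that must not be suggested


@dataclass
class Insight:
    text: str
    triggers: Set[str]       # triggers the tip is about
    interventions: Set[str]  # interventions the tip recommends


def filter_relevant(profile: ChildProfile, insights: List[Insight]) -> List[Insight]:
    """Keep tips that match a known trigger and recommend nothing on the avoid list."""
    relevant = []
    for insight in insights:
        if insight.interventions & profile.avoid:
            continue  # tip conflicts with the child's profile
        if insight.triggers & profile.triggers:
            relevant.append(insight)
    return relevant
```

Only the surviving tips would be added to the context sent to the live session.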
Challenges we ran into
- Taming the 46KB Audio Burst: Receiving raw PCM 24kHz audio chunks from WebSockets caused severe buffer underruns in the mobile app. We had to write custom Android Kotlin code (`AudioTrack`) to "sub-chunk" data into 5760-byte pieces for smooth, human-like delivery.
- Asyncio Event Loop Blocking: Heavy AI processing on the backend occasionally blocked the FastAPI event loop. We resolved this by offloading blocking calls to separate threads using `asyncio.to_thread` to ensure a flawless live stream.
- Information Retrieval Relevance: With community insights, we faced the risk of irrelevant advice. We implemented a "Relevance Filter Agent" that cross-references crowdsourced tips with the child's specific profile before they are ever mentioned by the Live Assistant.
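The sub-chunking fix was written in Kotlin against Android's `AudioTrack`; the sketch below shows only the slicing idea, in Python. It assumes 16-bit mono PCM (the write-up states only the 24kHz sample rate), which would make each 5760-byte piece 120 ms of audio.

```python
from typing import Iterator

SUB_CHUNK_BYTES = 5760   # piece size quoted in the write-up
SAMPLE_RATE_HZ = 24_000  # sample rate quoted in the write-up
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def sub_chunk(burst: bytes, size: int = SUB_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield fixed-size slices of a burst; the last slice may be shorter."""
    for start in range(0, len(burst), size):
        yield burst[start:start + size]


def chunk_duration_ms(n_bytes: int) -> float:
    """Playback time of n_bytes of audio under the format assumed above."""
    return 1000.0 * n_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

Under these assumptions a 46,080-byte burst splits into eight equal pieces, so the player receives a steady series of short writes instead of one large one.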
Accomplishments that we're proud of
- Sophisticated Agent-to-Agent (A2A) Ecosystem: We successfully engineered a multi-agent environment where specialized AI agents collaborate asynchronously. By using Firestore as a shared "Medical Record", our Community Agent seamlessly passes filtered real-world insights to our Live Assistant Agent, while our Clinical Specialist Agent stands by to execute complex health assessments—all without blocking the real-time voice pipeline.
- Successfully engineered a Native Android Audio Pipeline to achieve conversational-level latency on a cross-platform (Flutter) application.
- Implemented a Proactive Health Pipeline that bridges the gap between emergency response and long-term care through automated, time-delayed follow-ups.
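Keeping the real-time voice pipeline unblocked, as described above, relies on moving blocking work off the event loop with `asyncio.to_thread`. A minimal sketch of that pattern follows; `heavy_inference` and `stream_audio` are hypothetical stand-ins for the real model call and streaming loop.

```python
import asyncio
import time
from typing import List, Tuple


def heavy_inference(frame_id: int) -> str:
    """Stand-in for a blocking model call (hypothetical)."""
    time.sleep(0.2)
    return f"observer-note-{frame_id}"


async def stream_audio(ticks: List[float]) -> None:
    """Stand-in for the live audio loop; records each iteration it runs."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def handle_session() -> Tuple[str, List[float]]:
    ticks: List[float] = []
    # The blocking call runs in a worker thread, so the audio loop keeps ticking.
    note, _ = await asyncio.gather(
        asyncio.to_thread(heavy_inference, 1),
        stream_audio(ticks),
    )
    return note, ticks


if __name__ == "__main__":
    print(asyncio.run(handle_session()))
```

Calling `heavy_inference` directly inside a coroutine would instead freeze the audio loop for the full duration of the call.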
What we learned
- The Paradigm of A2A Communication: We learned that effective Agent-to-Agent coordination doesn't always require rigid, direct API calls between agents. Utilizing a robust, stateful database (Firestore) as a shared memory layer allows multiple agents to read, write, and react independently, vastly improving system resilience and scalability.
- Pacing over Prompting: In voice-first AI, managing audio buffers, interruption handling (barge-in), and pacing is just as critical as the LLM prompt itself.
- The Power of Ground Truth: The most valuable data comes from the caregiver's feedback loop—knowing which specific intervention actually worked for a specific child.
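The shared-memory idea from the first lesson can be illustrated with an in-memory stand-in for the database. This is only a sketch: `SharedStore` is a hypothetical class, not the Firestore client, and the document fields are invented for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List


class SharedStore:
    """In-memory stand-in for a document database such as Firestore."""

    def __init__(self) -> None:
        self._collections: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def write(self, collection: str, doc_id: str, data: Dict[str, Any]) -> None:
        self._collections[collection][doc_id] = dict(data)

    def read_all(self, collection: str) -> List[Dict[str, Any]]:
        return list(self._collections[collection].values())


def community_agent_run(store: SharedStore) -> None:
    """Writer: publishes a tip without knowing who will read it."""
    store.write(
        "community_insights",
        "tip-1",
        {"text": "Weighted blankets helped during noise-related meltdowns", "trigger": "noise"},
    )


def live_assistant_context(store: SharedStore, trigger: str) -> List[str]:
    """Reader: pulls whatever tips exist for a trigger, with no call to the writer."""
    return [d["text"] for d in store.read_all("community_insights") if d["trigger"] == trigger]
```

Neither function references the other. The collection name is the only contract between them, which is why one agent can fail or be redeployed without stalling the other.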
What's next for AnakUnggul AI
- FHIR Interoperability Integration: We plan to implement Fast Healthcare Interoperability Resources (FHIR) standards so that our longitudinal data and AI-generated therapist handover notes can seamlessly and securely integrate with official hospital Electronic Health Record (EHR) systems.
- Context-Aware Clinical Routing: Integrating healthcare directory APIs to automatically route caregivers to the nearest ASD clinics based on session severity.
- Longitudinal Analytics Dashboard: Exporting Firestore data to BigQuery to help therapists visualize trigger trends and progress over months.
- Data Anonymization Research: Building a pipeline to strip PII (Personally Identifiable Information) so our unique real-world caregiving dataset can support academic research into ASD.
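As one possible starting point for the anonymization pipeline, the sketch below redacts emails and phone numbers with regular expressions and replaces child IDs with a keyed hash. Every name here is hypothetical. Regexes will not catch personal names or addresses, and a keyed hash is pseudonymization rather than full anonymization, so this covers only the first step.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real data would need broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def pseudonymize_id(child_id: str, secret_key: bytes) -> str:
    """Map an ID to a stable token that cannot be reversed without the key."""
    return hmac.new(secret_key, child_id.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

For example, `redact("Call +62 812-3456-7890 or mail ibu@example.com")` returns `"Call [PHONE] or mail [EMAIL]"`.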
(Note: Our underlying technical architecture is hosted under the repository name 'neuro-decode', but we present this project to the world as **AnakUnggul AI** to better reflect our human-centric mission.)