NUMA - AI Companion for Cognitive Disorders

Problem

Individuals living with Alzheimer’s and Parkinson’s often struggle with memory lapses, confusion, and disrupted daily routines. Caregivers face the constant challenge of providing reminders, emotional reassurance, and medical consistency, often across long distances or busy schedules. Existing tools tend to be transactional (like reminder apps or health trackers) and lack the warmth, adaptability, and context-awareness needed for cognitive care.

Solution

We built NUMA, an AI-powered companion that combines emotional intelligence with everyday assistance. NUMA acts as a supportive presence for patients while keeping families informed through subtle behavioral insights. It is designed as a modular system composed of three specialized agents:

Memory Agent – Detects repeated questions, learns new ones from caregivers, and tracks short-term memory decline patterns.

Routine Agent – Provides warm, human-sounding voice reminders for meals, medications, and appointments.

Mood Agent – Analyzes tone of voice to detect anxiety or confusion, triggering reassurance messages or caregiver notifications.

Our current prototype demonstrates all three agents. The long-term vision is to integrate NUMA into wearable devices, enabling continuous, context-aware care throughout the day.

Tech Stack

AI Models

OpenAI GPT-4o / GPT-4-turbo – Used for natural conversation, semantic similarity detection (memory recall), and generating empathetic responses.

OpenAI Whisper / GPT-4o-mini-transcribe – For accurate and multilingual speech-to-text (patient and caregiver voice inputs).

Text-to-Speech (TTS) – Converts AI responses into natural, human-like audio feedback, enabling NUMA to “speak.”

Embeddings (text-embedding-3-small) – Power the memory recall and question similarity logic that lets NUMA “learn” and reuse past answers.
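
The question-similarity logic described above can be sketched roughly as follows. This is a minimal illustration, not NUMA's actual code: the function names, the `memory` structure, and the 0.85 threshold are all assumptions, and toy 3-dimensional vectors stand in for real `text-embedding-3-small` embeddings. It is written with the standard library only so it runs anywhere; the prototype lists NumPy for the same computation.

```python
import math

# Hypothetical cutoff; a real value would need tuning on actual patient questions.
SIMILARITY_THRESHOLD = 0.85


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_stored_answer(query_vec, memory, threshold=SIMILARITY_THRESHOLD):
    """Return the answer of the most similar stored question, or None.

    `memory` is a list of (embedding, answer) pairs.
    """
    best_score, best_answer = 0.0, None
    for vec, answer in memory:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None


# Toy vectors standing in for embeddings of previously answered questions.
memory = [
    ([1.0, 0.0, 0.0], "Your appointment is on Tuesday."),
    ([0.0, 1.0, 0.0], "Lunch is at noon."),
]

print(find_stored_answer([0.9, 0.1, 0.0], memory))  # close to the first question
print(find_stored_answer([0.0, 0.0, 1.0], memory))  # unrelated, so None
```

A question that falls below the threshold would be treated as new, which is the point where a caregiver-supplied answer could be stored for later reuse.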

Backend & Logic

FastAPI – High-performance Python framework for handling all API routes (/ask, /answer, /stt), integrating seamlessly with the AI models.

Uvicorn – ASGI web server used to run the FastAPI app efficiently.

SQLite + SQLAlchemy – Local database for persistent Q&A memory; lightweight yet reliable for rapid prototyping.

FFmpeg (via Subprocess) – Converts user audio into a consistent format before transcription, ensuring compatibility and reducing errors.

Python-dotenv, NumPy, Datetime, UUID – Supporting utilities for configuration, similarity computation, and unique session tracking.
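
As an illustration of the FFmpeg step, the audio normalization could look something like the sketch below. The target format (16 kHz mono WAV), the function names, and the file names are assumptions made for this example; the writeup only says audio is converted to a consistent format before transcription.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build the FFmpeg argument list for converting `src` into `dst`.

    The 16 kHz mono defaults are an assumed target format, not a documented one.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input file (e.g. a browser recording)
        "-ac", str(channels),      # number of audio channels
        "-ar", str(sample_rate),   # audio sample rate in Hz
        str(dst),
    ]


def normalize_audio(src, dst):
    """Run FFmpeg; raises subprocess.CalledProcessError if conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)


# Only builds the command here, so this line runs without FFmpeg installed.
print(build_ffmpeg_command("upload.webm", "upload.wav"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied file names.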

Automation & Orchestration

n8n (Low-code Workflow Engine) – Automates inter-agent communication (Memory, Routine, Mood) and handles scheduling, reminders, and notifications.

Webhooks – Enable real-time communication between the agents, backend, and front-end interface.

Event-driven Design – Each module listens for triggers like “new question,” “emotion detected,” or “routine missed,” ensuring modular extensibility.
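
The event-driven idea can be illustrated with a small in-process dispatcher. In the prototype this role is played by n8n workflows and webhooks, so the `EventBus` class, the handler functions, and the payload fields below are hypothetical stand-ins meant only to show how modules subscribe to named triggers.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe dispatcher keyed by event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        """Register `handler` to be called whenever `event_name` is emitted."""
        self._handlers[event_name].append(handler)

    def emit(self, event_name, payload):
        """Call every handler for `event_name` and collect their return values."""
        return [handler(payload) for handler in self._handlers.get(event_name, [])]


# Hypothetical handlers standing in for the agents' reactions.
def reassure_patient(payload):
    return "Reassurance message queued"


def notify_caregiver(payload):
    return f"Caregiver notified: {payload['detail']}"


bus = EventBus()
bus.on("emotion_detected", reassure_patient)
bus.on("emotion_detected", notify_caregiver)
bus.on("routine_missed", notify_caregiver)

print(bus.emit("routine_missed", {"detail": "missed 9:00 medication"}))
print(bus.emit("emotion_detected", {"detail": "anxious tone"}))
```

Adding a new behavior means registering another handler for an existing trigger, without touching the modules that emit it.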

Frontend

Vanilla HTML, CSS, and JavaScript (ES6) – Lightweight, responsive interface divided into three panels (Patient, Caregiver, Numa).

MediaRecorder API – Captures real-time voice input from the user.

Web Speech API – Used for real-time text-to-speech playback of NUMA’s responses.

Flexbox & CSS Animations – Create a fluid, gradient-based 3-panel layout with dynamic play/pause reverberation effects.

CORS Middleware (FastAPI) – Lets the browser-based frontend call the backend from a different origin, so the two can be hosted separately while requests stay restricted to allowed origins.

(Future upgrade) Next.js 14 + Vercel – For scaling into a web application with a talking, glowing avatar that reacts to speech in real time.
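
For reference, enabling CORS in a FastAPI backend might look like the sketch below. The origin URL, the allowed methods, and the `create_app` factory are assumptions for illustration; the writeup does not state which origins or settings the prototype uses.

```python
# Hypothetical origin of the separately hosted frontend; replace with the real one.
ALLOWED_ORIGINS = ["http://localhost:5500"]

# Keyword arguments accepted by FastAPI's CORSMiddleware.
CORS_OPTIONS = {
    "allow_origins": ALLOWED_ORIGINS,
    "allow_methods": ["GET", "POST"],
    "allow_headers": ["*"],
}


def create_app():
    """Build a FastAPI app that accepts cross-origin requests from the frontend."""
    # Imported here so the settings above can be inspected without FastAPI installed.
    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI()
    app.add_middleware(CORSMiddleware, **CORS_OPTIONS)
    return app
```

Assuming the code lives in a module named `main`, the app could be served with `uvicorn main:create_app --factory`. Listing explicit origins instead of `"*"` keeps the backend closed to pages it was not meant to serve.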

Integrations

API Connectors: Webhooks between agents (Memory, Routine, Mood).

Voice & Emotion Analytics: Planned integration with Azure Cognitive Services or OpenAI Realtime API for detecting stress or confusion.

Data Sync: Secure synchronization across devices, enabling family members or doctors to view insights remotely.

Challenges

Synchronizing voice output and animation timing between n8n, backend, and UI layers.

Handling CORS and MIME-type issues when streaming live audio from AI responses.

Crafting responses that sound empathetic, not robotic — balancing tone, pacing, and warmth.

Managing cross-platform data flow (speech, text, emotion) without noticeable latency.

Designing an interface intuitive enough for cognitively impaired users — less command-driven, more conversational.

Accomplishments

Built and deployed three functional AI agents across different platforms in under 40 hours.

Designed a working front-end that plays live AI-generated audio reminders with synchronized visual feedback.

Created a modular, scalable system architecture that can evolve into a wearable device prototype.

Demonstrated how behavioral analysis and conversational AI can coexist in one emotionally intelligent system.

What We Learned

Empathy-first AI design matters: tone, pacing, and warmth are as critical as factual accuracy.

Lightweight workflows and webhooks proved easier to experiment with and extend than a monolithic app would have been.

Prototyping with real voice data early helps validate emotional resonance, not just functional success.

Designing for cognitively impaired users requires simplicity, predictability, and gentle feedback rather than complex interactions.

In Essence

NUMA isn’t just a medical tool — it’s a companion. It represents a step toward emotionally intelligent AI that bridges the gap between clinical care and human connection, offering support to both patients and caregivers through small, meaningful interactions.
