MediMate — A Voice-First AI Companion for Daily Health

Inspiration

Sometimes the best ideas come from stepping outside our comfort zone.

I joined the GenAI Genesis Hackathon with no expectation of winning anything. My goal was simple: to learn something new and to build again as I used to years ago.

Walking into the venue was a strange and funny experience. Most participants were undergraduate students from the University of Toronto, speaking in fast-moving startup slang about agents, APIs, and frameworks I had barely explored yet. For a moment, I genuinely felt old. At the same time, I felt young again, because the energy in the room reminded me of the first time I discovered programming.

About ten years ago, I started experimenting with Android SDK, Java, and Eclipse, building small apps just out of curiosity. This hackathon gave me that same feeling again — the joy of tinkering with technology simply to see what is possible.

The idea behind MediMate came from considering a very real problem: many elderly people struggle to keep track of their medications, daily exercise, and nutrition. Existing apps often assume users are comfortable navigating complex interfaces. But for many seniors, the most natural interface is simply voice.

So the question became:

What if a health companion were as simple as having a conversation?

That idea evolved into MediMate — a voice-first AI assistant that helps manage medications, daily activities, and food suggestions while keeping caregivers informed.


What I Built

MediMate is designed as a voice-driven personal health companion.

Instead of navigating complicated menus, the user interacts with the system by simply speaking to it. The AI can:

  • Remind users to take medications
  • Log daily health activities (walking, exercise)
  • Suggest meals and nutrition guidance
  • Maintain a daily health log
  • Provide caregivers with a simple dashboard summarizing activity

The app focuses on simplicity.

The interface intentionally contains only a few core components:

  • A large central voice button that initiates conversation
  • A chat interface as fallback
  • A daily log view summarizing what happened during the day
  • A caregiver dashboard that shows adherence to medications and activities

The system acts as a personal companion loop:

  1. The AI interacts with the user via voice.
  2. The user responds naturally.
  3. Interactions are logged.
  4. Logs are summarized for caregivers.

Conceptually, this creates a feedback cycle:

$$ \text{User Interaction} \rightarrow \text{AI Interpretation} \rightarrow \text{Health Log} \rightarrow \text{Caregiver Insights} $$

Key Features Built

| Feature | Description |
| --- | --- |
| Voice-to-voice interaction | Speak → transcribe → AI reply → spoken aloud |
| Multi-intent voice recognition | One button handles meds, walks, meals, and chat |
| Medication reminders | Overdue detection with time-based scheduling |
| RAG-powered nutrition | Medication-aware recipe recommendations |
| Distress detection | Keyword detection triggers caregiver alert instantly |
| Caregiver dashboard | Web view with wellness score, alert history, activity |
| 5-screen mobile app | Home, Timeline, Chat, Profile, Nutrition |
| Dual LLM (local + cloud fallback) | Ollama LLaMA 3.2 primary, Gemini 2.5 Flash fallback |
| Fully local STT | faster-whisper base model, no cloud upload |
| Email alerts | HTML caregiver emails via Gmail SMTP |
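
The dual-LLM feature listed above is a primary/fallback arrangement. Here is a minimal sketch of that pattern, assuming each backend is wrapped as a plain function that takes a prompt and returns text. The names `generate_with_fallback`, `local_llama`, and `cloud_gemini` are hypothetical stand-ins for illustration, not the real Ollama or Gemini client calls.

```python
from typing import Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, reply) from the first success."""
    errors = []
    for name, call in backends:
        try:
            reply = call(prompt)
            if reply and reply.strip():
                return name, reply
            errors.append(f"{name}: empty reply")
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))


# Hypothetical stand-ins: the local model is unreachable, the cloud model answers.
def local_llama(prompt: str) -> str:
    raise ConnectionError("local model server not reachable")


def cloud_gemini(prompt: str) -> str:
    return f"(cloud) You asked: {prompt}"


used, reply = generate_with_fallback(
    "Did I take my pills?",
    [("ollama", local_llama), ("gemini", cloud_gemini)],
)
print(used, "->", reply)
```

Keeping the backends in an ordered list means the local model is always tried first, so a request only reaches the cloud when the local path fails.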

How We Built It

MediMate was built as a full-stack prototype combining a mobile application, open-source AI models, and a lightweight retrieval system. The mobile app was developed using React Native (Expo + TypeScript) with a voice-first interface that lets users log medications, record activities, and ask questions via voice or chat.

The backend is powered by FastAPI and integrates Ollama-hosted LLaMA models for conversation and faster-whisper for local speech-to-text processing. Health logs and user information are stored in a SQLite database. Nutrition suggestions are generated using a simple RAG pipeline over a local recipes.json dataset: the pipeline retrieves relevant recipes and injects them into the LLM context to provide grounded food recommendations.

A minimal caregiver dashboard summarizes medication adherence, activity logs, and alerts. The entire system was designed with a privacy-first approach, running open-source models locally so sensitive health data does not need to leave the device.
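
The retrieve-then-inject step of the nutrition pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the recipe fields (`name`, `tags`), the inline sample data standing in for recipes.json, and the keyword-overlap scoring are all hypothetical, and the real pipeline may rank recipes differently.

```python
import json
import re

# Hypothetical sample standing in for recipes.json; field names are assumptions.
RECIPES_JSON = """
[
  {"name": "Oatmeal with berries", "tags": ["breakfast", "fiber"]},
  {"name": "Grilled salmon", "tags": ["dinner", "omega", "protein"]},
  {"name": "Lentil soup", "tags": ["lunch", "fiber", "protein"]}
]
"""


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, recipes, k=2):
    """Return up to k recipes ranked by how many query words they share."""
    query_words = tokenize(query)
    scored = []
    for recipe in recipes:
        recipe_words = tokenize(recipe["name"] + " " + " ".join(recipe["tags"]))
        score = len(query_words & recipe_words)
        if score > 0:
            scored.append((score, recipe))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps file order on ties
    return [recipe for _, recipe in scored[:k]]


def build_prompt(query, hits):
    """Inject the retrieved recipes into the text sent to the LLM."""
    context = "\n".join(f"- {r['name']} (tags: {', '.join(r['tags'])})" for r in hits)
    return f"Suggest a meal using only these recipes:\n{context}\n\nUser question: {query}"


recipes = json.loads(RECIPES_JSON)
hits = retrieve("high fiber breakfast ideas", recipes)
prompt = build_prompt("high fiber breakfast ideas", hits)
print(prompt)
```

Because the model only sees the retrieved recipes, its suggestions stay grounded in the local dataset instead of whatever it might invent.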


Challenges

This project was also a journey through the rapidly evolving AI tooling ecosystem. One of the biggest challenges was simply figuring out how everything fits together.

The modern AI stack includes:

  • LLM APIs
  • voice models
  • retrieval systems
  • agent frameworks
  • frontend integrations

Understanding how these components communicate required reading documentation, experimenting, breaking things, and rebuilding them. In many ways, it felt like going back in time. When I first started programming years ago, I spent countless hours exploring documentation, testing small snippets, and gradually assembling working systems. This hackathon recreated that exact experience. I often found myself vibe coding (and finally understood the craze around it): exploring ideas, reading docs, trying APIs, and iterating quickly until something worked.

Concrete technical challenges encountered:

  • Whisper latency on CPU — The default openai-whisper was too slow for a real-time demo. Switched to faster-whisper with int8 quantization and beam_size=3, reducing transcription from ~4 seconds to under 1 second for a 5-second clip on an M4 Mac.

  • Voice-to-chat state continuity — The user could start a conversation by voice and then switch to the chat tab. Without server-side memory, context was lost. Solved with a per-user rolling 10-message history stored in the backend, shared across both /voice/transcribe and /chat endpoints.

  • LLM JSON reliability — When asking LLaMA to return structured JSON for the Nutrition screen, the model would sometimes include prose around the JSON or hallucinate fields. Added a regex extraction pass to strip surrounding text, and a hardcoded fallback recipe set for when parsing still fails.

  • Expo Audio on Android — Recording permissions and audio format (m4a) required careful configuration. The Whisper server explicitly handles m4a input since that is what Expo-Audio produces on both iOS and Android.

  • Intent detection without an NLU model — Rather than adding a second model, intent detection in /voice/transcribe is keyword-based: medication/pill/took for log_medication, walked/steps for log_walk, food/recipe/eat for chat_food, and a catch-all for general chat. Fast, explainable, and zero latency.
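
The rolling history from the voice-to-chat item above can be sketched with a bounded deque per user. This is a minimal in-memory illustration; the names `remember` and `history_for` are hypothetical, and the real backend may persist history differently.

```python
from collections import defaultdict, deque

MAX_MESSAGES = 10  # rolling window size described above

# One bounded deque per user; the oldest message drops off when the window is full.
_histories = defaultdict(lambda: deque(maxlen=MAX_MESSAGES))


def remember(user_id, role, content):
    """Append a message to the user's history. Both endpoints would call this."""
    _histories[user_id].append({"role": role, "content": content})


def history_for(user_id):
    """Return the user's recent messages, oldest first, to prepend to the next prompt."""
    return list(_histories[user_id])


# Simulate twelve turns; only the last ten survive.
for i in range(12):
    remember("user-1", "user", f"message {i}")

recent = history_for("user-1")
print(len(recent), recent[0]["content"], recent[-1]["content"])
```

Since both endpoints read and write the same store, a conversation started by voice carries over when the user switches to the chat tab.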
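
The JSON clean-up from the LLM reliability item above might look roughly like this. It is a sketch under assumptions: the expected shape (`{"recipes": [...]}`), the allowed keys, and the fallback recipe are hypothetical, since the actual schema is not described here.

```python
import json
import re

# Hypothetical schema and fallback; the real Nutrition screen fields may differ.
ALLOWED_KEYS = {"name", "ingredients"}
FALLBACK_RECIPES = [{"name": "Oatmeal with banana", "ingredients": ["oats", "banana"]}]


def parse_recipes(raw):
    """Pull the JSON payload out of a chatty model reply, or return the fallback set."""
    # Grab the outermost {...} or [...] span and ignore any prose around it.
    match = re.search(r"\{.*\}|\[.*\]", raw, re.DOTALL)
    if not match:
        return FALLBACK_RECIPES
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return FALLBACK_RECIPES
    if not isinstance(data, dict) or not isinstance(data.get("recipes"), list):
        return FALLBACK_RECIPES
    # Drop fields the model made up; keep only entries that at least have a name.
    cleaned = [
        {key: value for key, value in item.items() if key in ALLOWED_KEYS}
        for item in data["recipes"]
        if isinstance(item, dict) and "name" in item
    ]
    return cleaned or FALLBACK_RECIPES


noisy = 'Sure! Here is your plan:\n{"recipes": [{"name": "Lentil soup", "mood": "cozy"}]}\nEnjoy!'
print(parse_recipes(noisy))
print(parse_recipes("I cannot help with that."))
```

The fallback path means the screen always has something to render, even when the model's reply cannot be parsed.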
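
The keyword-based intent detection from the last item above can be sketched as an ordered rule table. The keywords and the three intent names come from the description; the catch-all label `chat_general`, the first-match-wins ordering, and matching at the start of a word are assumptions made for this example.

```python
import re

# Ordered rules: the first intent whose keywords appear in the transcript wins.
INTENT_RULES = [
    ("log_medication", ("medication", "pill", "took")),
    ("log_walk", ("walked", "steps")),
    ("chat_food", ("food", "recipe", "eat")),
]
DEFAULT_INTENT = "chat_general"  # hypothetical name for the catch-all


def detect_intent(transcript):
    """Map a transcript to an intent by keyword lookup, with no model involved."""
    text = transcript.lower()
    for intent, keywords in INTENT_RULES:
        # \b anchors each keyword to the start of a word, so "pills" and "eating"
        # match but "great" and "weather" do not trigger the "eat" rule.
        pattern = r"\b(?:" + "|".join(re.escape(word) for word in keywords) + ")"
        if re.search(pattern, text):
            return intent
    return DEFAULT_INTENT


for phrase in [
    "I took my blood pressure pills",
    "I walked around the block",
    "What should I eat tonight?",
    "The weather is great today",
]:
    print(phrase, "->", detect_intent(phrase))
```

One limitation of first-match-wins ordering is that a phrase like "I took a walk" lands in log_medication because of "took". Keyword rules stay fast and explainable, but they need care with overlapping vocabulary.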

Another challenge was scope management.

Initially, the system included:

  • multi-user architecture
  • complex caregiver analytics
  • extended interaction workflows

But for the hackathon demo, I realized the product needed to be simplified drastically. The final design focuses on a single-user scenario so the core interaction can be demonstrated clearly, although it still has some bugs to fix.


What I Learned

This hackathon was less about the final product and more about learning the modern AI development landscape.

Some of the key things I explored include:

  • Integrating open-source LLMs into applications
  • Building a voice interaction pipeline
  • Implementing RAG-based information retrieval
  • Connecting AI reasoning to real-world workflows
  • Rapid prototyping with React Native + AI APIs

More importantly, I learned that even as technology evolves quickly, the core joy of building remains the same. Stepping into a room full of younger developers (some of them still in high school) initially made me feel a little out of place. That feeling did not last once I started building.

I was simply a developer again — learning, building, and having fun.


Future Work

This prototype can be extended in many directions:

  • Multi-user support for families
  • Medication verification via computer vision
  • Personalized health models trained on individual patterns
  • Continuous monitoring via wearable devices
  • Privacy-preserving local AI deployment
  • Better voice AI models
  • Wearable integration — Apple Watch / Fitbit heart rate + step data logged automatically
  • Offline-first mobile — full AsyncStorage sync so the app works without WiFi
  • Multi-language support — Whisper already handles 99 languages; add UI localisation
  • Pharmacy integration — auto-import prescriptions via QR code or label photo

The long-term vision is to build a truly personalized AI health companion that runs locally, protects user privacy, and integrates seamlessly into everyday life.


Final Thoughts

Participating in this hackathon took me truly out of my comfort zone. It was a reminder that sometimes the best way to learn something new is simply to jump in, explore, and build.


Acknowledgements

  • The developers and communities behind the many open-source tools and libraries that made this project possible, along with their excellent documentation.
  • Anthropic Claude and OpenAI models, which assisted during development by helping generate code, debug issues, and explore implementation ideas.
