DrDejaVu — Your AI Medical Memory
Inspiration
We've all walked out of a doctor's office and immediately forgotten half of what was said. "Was my A1c improving? What medication did they adjust? What diet advice did they give six months ago?" Patients lose critical health context between visits, and paper aftercare summaries don't cut it. We wanted to build something that listens, remembers, and speaks back — a medical memory powered by voice AI.
What We Learned
- Orchestrating 4 Eigen AI models (Higgs ASR V3.0, Higgs Audio V2.5, Higgs Audio Understanding V3.5, gpt-oss-120b) into a single coherent pipeline
- Building a voice-first RAG system — embedding spoken consultations with Sentence-Transformers, storing the vectors in ChromaDB, and retrieving them with semantic search
- The importance of chunking strategies — splitting transcripts at sentence boundaries (~1000 chars) dramatically improved retrieval quality (a chunking sketch follows this list), with chunks ranked by cosine similarity:
$$\text{similarity}(q, d) = \frac{q \cdot d}{|q| \cdot |d|}$$
- Converting LLM markdown output into voice-friendly text for natural TTS delivery
- Using Higgs Audio V2.5's voice cloning to make AI responses sound like the patient's actual doctor — building familiarity and trust
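Here is a minimal sketch of that sentence-boundary chunking, assuming a plain regex sentence splitter and the ~1000-character cap mentioned above; the helper name is illustrative, not our exact code:

```python
import re

def chunk_transcript(text: str, max_chars: int = 1000) -> list[str]:
    """Split a transcript at sentence boundaries, keeping chunks near max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk once adding this sentence would exceed the cap
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```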
How We Built It
Architecture: A full-stack voice RAG pipeline in ~1,300 lines of code.
Voice Input → Higgs ASR V3.0 (transcribe)
→ gpt-oss-120b (summarize)
→ ChromaDB (index as vectors)
→ Patient asks a question (voice/text)
→ RAG retrieval (top-10 cosine similarity)
→ gpt-oss-120b (generate contextual answer)
→ Higgs Audio V2.5 (speak the answer in doctor's cloned voice)
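To make the answer path above concrete, here is a runnable skeleton of the chain; the four model calls are local stand-ins for our Eigen API wrappers (not the real SDK), so the async flow can be read end to end:

```python
import asyncio

# Stand-ins for the four Eigen AI calls; the real versions hit the Eigen API.
async def transcribe(audio: bytes) -> str:                      # Higgs ASR V3.0
    return "How has my A1c changed since my last visit?"

async def retrieve(patient_id: str, query: str) -> list[str]:  # ChromaDB top-10
    return ["March consultation: A1c discussed, medication dose adjusted."]

async def generate(query: str, context: list[str]) -> str:     # gpt-oss-120b
    return "At your March visit your doctor noted your A1c was improving."

async def speak(text: str) -> bytes:                            # Higgs Audio V2.5
    return text.encode()  # placeholder for cloned-voice audio

async def answer_question(patient_id: str, question_audio: bytes) -> bytes:
    question = await transcribe(question_audio)
    context = await retrieve(patient_id, question)
    answer = await generate(question, context)
    return await speak(answer)

if __name__ == "__main__":
    print(asyncio.run(answer_question("patient-001", b"...")))
```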
Stack:
- Frontend: React 18 + TypeScript + Vite — Dashboard, Upload, and History pages with real-time voice recording via MediaRecorder API
- Backend: FastAPI (Python) — async endpoints for transcription, chat, and RAG queries
- Vector DB: ChromaDB with cosine similarity + Sentence-Transformers embeddings
- Metadata: SQLite for consultation records
- Deployment: Docker Compose — one command to run everything
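A minimal sketch of the indexing and retrieval layer described in the stack list, assuming a generic Sentence-Transformers checkpoint (the model name and collection settings are placeholders; the values in our repo may differ):

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint
client = chromadb.Client()  # in production, a persistent instance behind Docker Compose
collection = client.create_collection(
    name="consultations", metadata={"hnsw:space": "cosine"}  # cosine similarity
)

def index_chunks(patient_id: str, date: str, chunks: list[str]) -> None:
    """Embed transcript chunks and store them with patient/date metadata."""
    collection.add(
        ids=[f"{patient_id}-{date}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"patient_id": patient_id, "date": date} for _ in chunks],
    )

def retrieve_chunks(patient_id: str, question: str, top_k: int = 10) -> list[str]:
    """Return the top-k cosine-similar chunks, scoped to a single patient."""
    result = collection.query(
        query_embeddings=embedder.encode([question]).tolist(),
        n_results=top_k,
        where={"patient_id": patient_id},  # patient-scoped filtering
    )
    return result["documents"][0]
```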
Eigen AI Models Used:

| Model | Role |
|-------|------|
| Higgs ASR V3.0 | Speech-to-text transcription (9.12% WER) |
| Higgs Audio V2.5 | Text-to-speech with voice cloning — responses sound like the patient's own doctor (~150ms latency) |
| Higgs Audio Understanding V3.5 | Tone, sentiment & wellbeing analysis from patient voice |
| gpt-oss-120b | Summarization + RAG-powered chat completions |
Voice Cloning Flow: During consultation upload, the doctor's voice is captured from the audio recording. Higgs Audio V2.5 clones this voice profile (with permission) so that when the patient later asks a question, the AI answer is delivered in their doctor's familiar voice — making the experience feel like a real follow-up conversation, not a robotic chatbot.
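The cloning prompt comes from the consultation audio itself. Below is a hedged sketch of how the doctor's segments can be spliced into a single reference clip (pydub and the helper name are assumptions for illustration; the actual Higgs Audio V2.5 cloning call is not shown because it goes through the Eigen API):

```python
from pydub import AudioSegment  # assumed audio library for this sketch

def extract_doctor_reference(
    consult_path: str,
    doctor_segments: list[tuple[float, float]],  # (start_s, end_s) from the diarized transcript
    out_path: str = "doctor_reference.wav",
) -> str:
    """Concatenate the doctor's speech segments into one clean reference clip,
    which is then used as the voice-cloning prompt for Higgs Audio V2.5."""
    audio = AudioSegment.from_file(consult_path)
    reference = AudioSegment.empty()
    for start_s, end_s in doctor_segments:
        reference += audio[int(start_s * 1000):int(end_s * 1000)]  # pydub slices in milliseconds
    reference.export(out_path, format="wav")
    return out_path
```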
Challenges We Faced
- Audio pipeline orchestration — Coordinating transcribe → summarize → index → retrieve → generate → speak across 4 different models required careful async handling
- Voice cloning quality — Extracting a clean doctor voice profile from two-speaker consultation audio required isolating the doctor's segments to use as the cloning input
- TTS formatting — LLM responses contain markdown, emojis, and bullet points that sound terrible when spoken aloud. We built a conversion layer to produce voice-friendly text (a sketch follows this list)
- RAG retrieval quality — Early attempts returned irrelevant chunks. Adding date metadata and hybrid retrieval (transcript chunks + summaries) with patient-scoped filtering (where: {patient_id}) fixed it
- Latency budget — A voice-first UX demands fast responses. Higgs Audio V2.5's ~150ms first-token latency was critical for keeping the experience conversational
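A condensed version of the TTS conversion layer mentioned above; the regexes are illustrative and our full layer handles more cases:

```python
import re

def to_voice_friendly(markdown: str) -> str:
    """Flatten LLM markdown into plain sentences that read naturally when spoken."""
    text = re.sub(r"`{3}.*?`{3}", "", markdown, flags=re.DOTALL)      # drop code blocks
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)              # keep link text, drop URLs
    text = re.sub(r"[*_`#>]+", "", text)                              # strip markdown symbols
    text = re.sub(r"^\s*[-•]\s*", "", text, flags=re.MULTILINE)       # remove bullet markers
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # strip common emoji ranges
    text = re.sub(r"\s*\n+\s*", ". ", text)                           # join lines into sentences
    text = re.sub(r"\.\s*\.", ".", text)                              # collapse doubled periods
    return re.sub(r"\s{2,}", " ", text).strip()
```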
Built With
- all-eigen-ai-integrations
- python
- react
- typescript
- vite