Project Story

Inspiration

Psychiatric misdiagnosis is a major source of patient suffering in mental health, with 77% of patients with non-specific diagnoses receiving inadequate care. We were struck by the fact that neuropsychiatrists aren't missing diagnoses due to incompetence; they're being asked to do the impossible. During a patient interview, they must simultaneously conduct therapy, build rapport, observe behavior, and recognize subtle diagnostic patterns across hundreds of DSM-5 criteria.

The human brain simply can't hold every diagnostic possibility while actively engaging in therapeutic conversation. This cognitive overload leads to anchoring bias (fixating on the first diagnosis that comes to mind) and premature closure (stopping the diagnostic process too soon). We wondered: what if AI could act as a real-time safety net, catching patterns psychiatrists might miss while the patient is still in the room?

That's when we realized we could build the diagnostic copilot psychiatry has never had.


What It Does

Charcot is a multimodal AI copilot for neuropsychiatric diagnosis that provides real-time diagnostic support during live patient interviews by analyzing both conversational and nonverbal cues.

Core Capabilities:

1. Multimodal Behavioral Analysis

  • Audio Processing: Real-time speech-to-text transcription using AssemblyAI with psychiatric vocabulary optimization
  • Facial Tracking: MediaPipe FaceMesh detects 468 facial landmarks to extract behavioral metrics
  • Eye Contact Monitoring: Calculates eye contact percentage using nose-tip deviation from camera center
  • Gaze Stability Analysis: Tracks head movement variance over 2-second windows to detect dissociation or anxiety indicators
  • Breathing Rate Detection: Monitors nose-to-mouth distance changes to estimate respiratory rate (8-30 bpm range)
  • Facial Expression Analysis: face-api.js integration for micro-expression detection

2. Real-Time Diagnostic Support

  • Analyzes patient conversations for diagnostic patterns through live transcription
  • Cross-references symptoms against DSM-5 criteria
  • Displays alternative differential diagnoses to consider
  • Suggests specific follow-up questions to ask while the patient is still present

3. Behavioral Red Flag Detection

  • Hyperventilation Alert: Breathing rate > 25 bpm triggers critical alert
  • Dissociation Indicator: Gaze stability > 95% (frozen stare) after 90 seconds
  • Anxiety Indicator: Gaze stability < 30% (erratic movement)
  • Avoidance Behavior: Eye contact < 20% after 45 seconds
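The red-flag rules above can be expressed as a small table of thresholds, each with its own persistence timer. The following is a minimal Python sketch under a few assumptions. It expects one reading of all three metrics per update, with a timestamp in seconds. It reads "after N seconds" as "sustained for N seconds", which is how the eye-contact gate is described under Challenges. The `warning` severity label and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Metrics = Dict[str, float]  # e.g. {"breathing_bpm": 14.0, "gaze_stability": 60.0, "eye_contact": 35.0}


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str                          # "critical" / "warning" (labels assumed)
    triggered: Callable[[Metrics], bool]   # threshold test on the current reading
    min_duration_s: float                  # how long the condition must persist


# Thresholds from the list above; "after N seconds" read as "sustained for N seconds".
RULES = (
    Rule("hyperventilation", "critical", lambda m: m["breathing_bpm"] > 25, 0.0),
    Rule("dissociation", "warning", lambda m: m["gaze_stability"] > 95, 90.0),
    Rule("anxiety", "warning", lambda m: m["gaze_stability"] < 30, 0.0),
    Rule("avoidance", "warning", lambda m: m["eye_contact"] < 20, 45.0),
)


class RedFlagMonitor:
    def __init__(self, rules: Sequence[Rule] = RULES) -> None:
        self.rules = rules
        # Time (seconds) at which each rule's condition started holding, or None.
        self._onset: Dict[str, Optional[float]] = {r.name: None for r in rules}

    def update(self, t_s: float, metrics: Metrics) -> List[str]:
        """Return "severity:name" for each rule whose condition has held long enough."""
        alerts: List[str] = []
        for rule in self.rules:
            if not rule.triggered(metrics):
                self._onset[rule.name] = None  # condition broken: restart its timer
                continue
            onset = self._onset[rule.name]
            if onset is None:
                onset = self._onset[rule.name] = t_s
            if t_s - onset >= rule.min_duration_s:
                alerts.append(f"{rule.severity}:{rule.name}")
        return alerts
```

In this sketch an alert is re-reported on every update while its condition persists. A UI layer would typically de-duplicate it rather than re-displaying it each frame.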

What Makes Charcot Unique:

Unlike ambient scribes (MDHub, Nabla) that only document conversations, or screening tools (Limbic, Kintsugi) that patients use before appointments, Charcot is, to our knowledge, the only platform providing real-time, multimodal differential diagnosis support during live psychiatric interviews, and the first to combine audio analysis with visual behavioral tracking specifically for psychiatric diagnosis.


How We Built It

Tech Stack:

Audio/Video Input:

  • AssemblyAI - Real-time speech-to-text transcription with medical vocabulary
  • MediaPipe - 468-point facial landmark detection
  • TensorFlow.js - In-browser machine learning runtime with WebGL acceleration
  • face-api.js - Facial expression analysis

Frontend & UI:

  • React.js - Component-based UI architecture
  • Vite - Fast build tool and dev server
  • Tailwind CSS - Utility-first styling system
  • Recharts - Real-time data visualization for behavioral metrics
  • Lucide React - Icon system

Backend & Additional Libraries:

  • Node.js - Backend services
  • Flask - API endpoints
  • OpenCV - Advanced video processing
  • NumPy - Numerical computations for behavioral algorithms
  • DeepFace - Enhanced facial analysis

Behavioral Metrics Engine: We developed custom algorithms for three core behavioral metrics:

  1. Eye Contact Calculation:

    • Uses the nose-tip landmark (index 1) as a gaze proxy
    • Calculates the 2D deviation of the nose tip from the center of the camera frame
    • Returns percentage score based on deviation distance
  2. Gaze Stability Tracking:

    • Monitors nose position over approximately 60 frames (2 seconds at 30 FPS)
    • Computes movement variance across tracking window
    • Converts to stability score where higher values indicate more stable gaze
  3. Breathing Rate Estimation:

    • Tracks nose-to-mouth Euclidean distance over 10-second windows
    • Detects peaks in distance changes (inhalation cycles)
    • Converts peak count to breaths per minute, clamped to physiological range [8, 30]
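The three metrics above can be sketched in a few functions. This is a minimal Python version; the in-browser implementation would be the same arithmetic in JavaScript. It assumes that FaceMesh landmarks arrive in normalized [0, 1] image coordinates and that the feed runs at 30 FPS. The following details are illustrative assumptions, not Charcot's actual tuning:

- the linear score mappings;
- the `max_deviation` and `max_variance` constants;
- the simple local-maximum peak detector.

```python
import math
import statistics
from collections import deque
from typing import Deque, Optional, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y) landmark position in [0, 1]


def eye_contact_score(nose: Point, max_deviation: float = 0.25) -> float:
    """Eye-contact proxy: 100 when the nose tip (landmark 1) is at the frame
    center, falling linearly to 0 at `max_deviation` (assumed constant)."""
    deviation = math.hypot(nose[0] - 0.5, nose[1] - 0.5)
    return 100.0 * max(0.0, 1.0 - deviation / max_deviation)


class GazeStability:
    """Stability score from nose-position variance over a sliding window."""

    def __init__(self, window: int = 60, max_variance: float = 0.0025) -> None:
        self.positions: Deque[Point] = deque(maxlen=window)  # ~2 s at 30 FPS
        self.max_variance = max_variance  # assumed: variance that maps to a score of 0

    def update(self, nose: Point) -> float:
        self.positions.append(nose)
        xs = [p[0] for p in self.positions]
        ys = [p[1] for p in self.positions]
        variance = statistics.pvariance(xs) + statistics.pvariance(ys)
        return 100.0 * max(0.0, 1.0 - variance / self.max_variance)  # higher = steadier


def breathing_rate_bpm(distances: Sequence[float], fps: float = 30.0,
                       min_bpm: float = 8.0, max_bpm: float = 30.0) -> Optional[float]:
    """Estimate breaths/min from one window (~10 s) of nose-to-mouth distances."""
    n = len(distances)
    if n < 3:
        return None
    mean = statistics.fmean(distances)
    # Peaks closer together than the fastest allowed breath are treated as jitter.
    min_gap = int(fps * 60.0 / max_bpm)
    peaks, last_peak = 0, -min_gap
    for i in range(1, n - 1):
        d = distances[i]
        is_local_max = d > distances[i - 1] and d >= distances[i + 1]
        if is_local_max and d > mean and i - last_peak >= min_gap:
            peaks += 1
            last_peak = i
    bpm = peaks * 60.0 / (n / fps)  # peaks per window scaled to one minute
    return max(min_bpm, min(max_bpm, bpm))  # clamp to the physiological range
```

Counting peaks in a 10-second window quantizes the breathing estimate to steps of 6 bpm, because each detected breath adds 60 / 10 = 6 bpm. Longer windows trade responsiveness for finer resolution.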

Privacy-First Architecture:

  • 100% local video processing - No video data transmitted or stored
  • Browser-only computation - TensorFlow.js runs entirely client-side
  • HIPAA-compliant audio APIs - Secure transmission for transcription
  • No session recording - Real-time analysis only

Challenges We Ran Into

1. MediaPipe Model Initialization

Problem: MediaPipe FaceMesh took 3-5 seconds to initialize, causing blank screens and user confusion.

Solution: Implemented loading states and async initialization with proper error handling. Added visual feedback during model download.

2. DroidCam Green Screen Bug

Problem: When using DroidCam (iPhone as webcam), the video element showed a green screen despite receiving correct pixel data.

Solution: Implemented a canvas-based rendering pipeline that draws video frames to a canvas before face detection. This workaround bypasses the browser rendering issue while preserving the actual pixel data.

3. Video Element Persistence Across Tab Navigation

Problem: Switching tabs unmounted the video element, breaking camera stream and stopping all computer vision processing.

Solution: Changed the architecture to keep the video element mounted in the DOM at all times, toggling CSS visibility instead of rendering it conditionally. Critical lesson: the video ref must keep continuous access to the camera stream.

4. False Positives in Behavioral Alerts

Problem: Initial threshold-based alerts triggered constantly (e.g., every blink triggered "low eye contact" warning).

Solution: Implemented time-gated alerts (e.g., eye contact warnings only after 45 seconds of sustained low contact) and baseline comparison logic to detect changes rather than absolute values.
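Baseline comparison means judging each reading against the patient's own behavior early in the session rather than against a fixed cutoff. Below is a minimal sketch of that idea combined with a sustain gate, assuming one reading per second for a single metric. All of the following are illustrative assumptions:

- the class name;
- the calibration length;
- the z-score threshold;
- the minimum spread;
- the 45-reading sustain window, chosen to mirror the 45-second eye-contact gate.

```python
import statistics
from typing import List, Optional


class BaselineDetector:
    """Flags sustained departures from a patient's own baseline for one metric.

    Assumed parameters (one reading per second): the first `calibration_samples`
    readings set the baseline; an alert needs |z| >= `z_threshold` for
    `sustain_samples` consecutive readings.
    """

    def __init__(self, calibration_samples: int = 30, z_threshold: float = 2.0,
                 sustain_samples: int = 45, min_stdev: float = 1.0) -> None:
        self.calibration_samples = calibration_samples
        self.z_threshold = z_threshold
        self.sustain_samples = sustain_samples
        self.min_stdev = min_stdev  # floor so a very steady baseline isn't hypersensitive
        self._calibration: List[float] = []
        self.mean: Optional[float] = None
        self.stdev: Optional[float] = None
        self._run = 0  # consecutive readings outside the baseline band

    def update(self, value: float) -> bool:
        """Feed one reading; return True once a deviation has been sustained."""
        if self.mean is None or self.stdev is None:
            self._calibration.append(value)
            if len(self._calibration) == self.calibration_samples:
                self.mean = statistics.fmean(self._calibration)
                self.stdev = max(statistics.pstdev(self._calibration), self.min_stdev)
            return False  # still calibrating
        z = (value - self.mean) / self.stdev
        self._run = self._run + 1 if abs(z) >= self.z_threshold else 0
        return self._run >= self.sustain_samples
```

A single blink or glance away resets nothing but one reading, and it only counts toward an alert if it departs from this patient's norm. This is the behavior the fixed thresholds lacked.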

5. Real-Time Performance Optimization

Problem: Processing every video frame (30 FPS) caused CPU/GPU throttling on lower-end devices.

Solution (planned): We still process every frame via requestAnimationFrame, but we have identified this as a future optimization target: running face detection at 10-15 FPS while keeping video playback smooth.
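The planned reduction could take the form of a small scheduler that decides, per rendered frame, whether to run face detection. Below is a Python sketch of that logic. It assumes a millisecond timestamp per frame, such as the high-resolution timestamp requestAnimationFrame passes to its callback. The class name and the `jitter_ms` tolerance are hypothetical.

```python
from typing import Optional


class FrameThrottle:
    """Decide which rendered frames get face detection, targeting `target_fps`.

    Call once per rendered frame with that frame's timestamp in milliseconds.
    """

    def __init__(self, target_fps: float = 15.0, jitter_ms: float = 5.0) -> None:
        self.interval_ms = 1000.0 / target_fps
        self.jitter_ms = jitter_ms  # tolerate frames that arrive slightly early
        self._next_due_ms: Optional[float] = None

    def should_process(self, timestamp_ms: float) -> bool:
        if self._next_due_ms is not None and timestamp_ms < self._next_due_ms - self.jitter_ms:
            return False  # too soon: still render the frame, but skip detection
        if self._next_due_ms is None:
            self._next_due_ms = timestamp_ms
        # Advance from the schedule, not the arrival time, so rounding doesn't drift.
        self._next_due_ms += self.interval_ms
        if self._next_due_ms < timestamp_ms:
            # Fell far behind (e.g. the tab was hidden): resynchronize to now.
            self._next_due_ms = timestamp_ms + self.interval_ms
        return True
```

With a 30 FPS feed and a 15 FPS target, this runs detection on every second frame. Without the jitter tolerance, a frame that arrives a millisecond early would be skipped, which can push the effective rate toward every third frame.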

6. Integrating Multiple Data Streams

Problem: Synchronizing audio transcription from AssemblyAI with real-time video analysis from MediaPipe while maintaining low latency.

Solution: Implemented an event-driven architecture, using WebSockets for audio streaming, and optimized the frame-processing pipeline so both streams stay temporally aligned for accurate multimodal analysis.
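One way to line the two streams up is to stamp both with a shared session clock. Each finalized transcript segment can then be annotated with the behavioral readings captured while it was spoken. Below is a minimal sketch. It assumes every segment carries start and end times in milliseconds and every metric sample carries a timestamp on the same clock. The dataclass and field names are hypothetical and do not reflect AssemblyAI's actual message format.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional, Tuple


@dataclass
class TranscriptSegment:   # hypothetical shape of a finalized transcript chunk
    start_ms: float
    end_ms: float
    text: str


@dataclass
class MetricSample:        # one behavioral reading from the video pipeline
    t_ms: float
    eye_contact: float
    gaze_stability: float


def align(segments: List[TranscriptSegment],
          samples: List[MetricSample]) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Attach the mean behavioral metrics observed while each segment was spoken.

    Assumes `samples` is sorted by t_ms and both streams share one clock.
    """
    times = [s.t_ms for s in samples]
    aligned = []
    for seg in segments:
        lo = bisect_left(times, seg.start_ms)   # first sample at or after the start
        hi = bisect_right(times, seg.end_ms)    # one past the last sample at or before the end
        window = samples[lo:hi]
        if window:
            aligned.append((seg.text,
                            fmean(s.eye_contact for s in window),
                            fmean(s.gaze_stability for s in window)))
        else:
            aligned.append((seg.text, None, None))  # no video readings in that span
    return aligned
```

Binary search keeps each lookup logarithmic in the number of samples, so the join stays cheap even as a session accumulates thousands of frames.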


Accomplishments That We're Proud Of

Built a fully functional multimodal diagnostic system in 12 hours - Complete integration of audio transcription and video analysis working end-to-end

Achieved true privacy-preserving architecture - 100% local video processing means no video data ever leaves the user's device

Solved the DroidCam compatibility problem - Canvas-based rendering allows clinicians to use phones as high-quality webcams

Created genuinely novel multimodal diagnostic support - First platform combining real-time audio conversation analysis with behavioral tracking specifically for psychiatric diagnosis

Designed an intuitive clinical interface - Clean, professional UI that psychiatrists could actually use in practice without disrupting workflow

Validated technical feasibility - Proved that browser-based ML combined with cloud transcription can handle complex real-time multimodal analysis

Demonstrated competitive advantage - Only solution checking all five boxes: real-time in-session, multimodal (audio + video), differential diagnosis, actionable prompting, and behavioral red flag detection


What We Learned

Technical Lessons:

  • Browser-based ML is production-ready: TensorFlow.js + WebGL can handle real-time computer vision without backend infrastructure
  • Multimodal integration requires careful architecture: Synchronizing audio and video streams while maintaining low latency demands event-driven design
  • State management matters: Video element lifecycle management taught us the importance of understanding React's rendering model deeply
  • Privacy isn't optional: Building local-first from day one shaped our entire architecture - and made it better
  • Facial landmarks are surprisingly expressive: 468 points provide enough signal for breathing, gaze, and attention metrics

Domain Insights:

  • Psychiatric diagnosis is uniquely challenging: No blood tests, no imaging - purely conversational diagnosis makes pattern recognition incredibly difficult
  • Nonverbal cues matter enormously: Frozen gaze (dissociation), erratic movement (anxiety), avoidance (trauma) - these patterns are diagnostically relevant but hard to track manually
  • Real-time intervention is critical: The difference between "while the patient is in the room" and "after the patient leaves" is the difference between actionable insight and missed opportunity
  • Multimodal data is powerful: Combining what patients say with how they say it and their nonverbal behavior provides richer diagnostic signals than any single modality alone

Product Thinking:

  • Copilot > Autopilot: Psychiatrists don't need AI to replace them - they need augmentation that catches what they miss while preserving their clinical judgment
  • The interface is the intervention: Suggestions only help if they're glanceable, non-intrusive, and integrated into natural workflow
  • Timing is everything: Tools that work after the patient leaves miss the critical window for intervention

What's Next for Charcot

Immediate Technical Roadmap:

1. Enhanced AI Diagnostic Engine (Priority 1)

  • Fine-tune an LLM (GPT-4/Claude) on 100,000+ psychiatric case studies
  • Build DSM-5 criteria matching and differential diagnosis generation
  • Implement real-time diagnostic suggestion UI with confidence scores and reasoning explanations

2. Clinical Integration (Priority 2)

  • Integrate with EHR systems (Epic, Cerner) via FHIR APIs for patient history access
  • Build bi-directional data flow for seamless clinical workflow
  • Develop specialty modules for different psychiatric domains

3. Enhanced Behavioral Analysis

  • Implement voice prosody analysis (speech rate, pitch variance) for mood indicators
  • Add MediaPipe Pose for chest-based breathing estimation (expected to be more accurate than facial landmarks)
  • Expand micro-expression detection capabilities

Clinical Validation:

Pilot Study at UIUC

  • Partner with campus counseling services for initial testing
  • Collect feedback from 5-10 psychiatrists on interface usability
  • Measure diagnostic suggestion accuracy against gold-standard diagnoses
  • Obtain IRB approval for the research study

Data & Model Improvement

  • Continue training on anonymized psychiatric case studies
  • Build specialty modules: child psychiatry, geriatric, addiction medicine
  • Validate cross-disorder pattern recognition (bipolar vs depression, ADHD vs anxiety)

Regulatory & Compliance:

HIPAA Compliance

  • Complete comprehensive security audit
  • Implement end-to-end encryption for all data transmission
  • Build audit logging and access controls
  • Achieve full HIPAA compliance

FDA Pathway (Long-term)

  • Pursue Class II medical device clearance (Clinical Decision Support)
  • 510(k) pathway as "software as a medical device"
  • Clinical trial data for efficacy validation

Go-to-Market Strategy:

Phase 1: Academic Medical Centers (Months 1-6)

  • Target psychiatry residency programs for adoption
  • Position as educational tool for junior psychiatrists
  • Collect validation data and testimonials

Phase 2: Private Practice Integration (Months 6-12)

  • Direct licensing to psychiatric practices
  • Per-seat SaaS pricing model ($200-300/month per clinician)
  • EHR integration and workflow optimization

Phase 3: Enterprise Partnerships (Year 2+)

  • White-label integration with ambient scribe providers (Nabla, MDHub)
  • Revenue-share model: they own documentation, we add diagnostic layer
  • Hospital system contracts

Vision:

Our ultimate goal is to make Charcot the diagnostic safety net that every neuropsychiatrist has at their side - catching patterns they might miss, suggesting alternatives they might not consider, and ensuring that patients get the right diagnosis in the first visit, not ten years later. We're building the future where psychiatric diagnosis is as evidence-based and AI-augmented as radiology or pathology - where technology amplifies human expertise rather than replacing it.

Charcot: Preventing psychiatric misdiagnoses, one conversation at a time.

Shrishant, Tanush, Sreehaas, Sadkrith
