About the Project

The Problem

By some estimates, rare diseases affect 1 in 10 people globally, yet patients wait an average of 7-10 years for a correct diagnosis. During this "diagnostic odyssey," they are often misdiagnosed with common conditions such as fibromyalgia, anxiety or depression, chronic fatigue syndrome, and IBS. Meanwhile, conditions like endometriosis (which affects about 1 in 10 women), PCOS, Ehlers-Danlos syndrome, and lupus go undetected, causing unnecessary suffering and delayed treatment.

What Inspired Me

I was inspired by countless patient stories of being dismissed by doctors with phrases like "it's just stress" or "it's all in your head" when they had serious underlying conditions. The medical system is optimized to find common diseases, but rare diseases require pattern recognition that AI can help with. Real example: Endometriosis takes 7+ years to diagnose on average, despite affecting 200 million women worldwide. The symptoms are dismissed as "bad periods" when they're actually a debilitating disease.

How I Built It

Tech Stack:

  • Frontend: React.js with modern CSS
  • AI/ML: Hugging Face Inference API (Mistral-7B-Instruct)
  • Algorithm: Custom multi-stage symptom matching system

Architecture:

  1. Symptom Extraction Layer

    • AI-powered extraction using Mistral-7B with medical terminology prompts
    • Enhanced fallback with 40+ regex patterns for colloquial symptom descriptions
    • Captures severity, duration, and context
  2. Smart Matching Algorithm

$$\text{Score} = \sum (\text{MatchConfidence} \times \text{SymptomWeight} \times \text{SeverityMultiplier} \times \text{DurationMultiplier})$$

Where:

  • $\text{MatchConfidence} \in [0, 1]$ (fuzzy string matching + medical term mapping)
  • $\text{SymptomWeight} \in [1, 10]$ (diagnostic significance)
  • $\text{SeverityMultiplier} = 1.8$ if the symptom is severe
  • $\text{DurationMultiplier} = 1.4$ if the symptom is chronic
  3. Confidence Calculation

$$\text{Confidence} = \min(45, \text{Coverage} \times 0.45) + \min\left(30, \frac{\text{MatchedSymptoms}}{4} \times 30\right) + \min(25, \text{HighValueMatches} \times 12)$$

  4. Explainability Layer
    • AI-generated reasoning for each match
    • Comparison with common diagnoses
    • Specific next steps and testing recommendations
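
As a rough illustration of steps 2 and 3, the two formulas can be written as plain functions. This is a minimal sketch under stated assumptions, not the project's actual code: the field names (`confidence`, `weight`, `severe`, `chronic`, `coveragePct`) are hypothetical, 1.8 is read as the severity multiplier and 1.4 as the duration multiplier, each multiplier is assumed to be 1.0 when its condition does not apply, and Coverage is assumed to be a percentage from 0 to 100.

```javascript
// Multipliers from the write-up; the neutral default of 1.0 is an assumption.
const SEVERE_MULTIPLIER = 1.8;
const CHRONIC_MULTIPLIER = 1.4;

// matches: array of { confidence: 0..1, weight: 1..10, severe, chronic }
// (hypothetical field names). Returns the summed weighted score.
function matchScore(matches) {
  return matches.reduce((total, m) => {
    const severity = m.severe ? SEVERE_MULTIPLIER : 1.0;
    const duration = m.chronic ? CHRONIC_MULTIPLIER : 1.0;
    return total + m.confidence * m.weight * severity * duration;
  }, 0);
}

// Three capped terms: coverage (max 45), matched-symptom count (max 30),
// and high-value matches (max 25). coveragePct is assumed to be 0..100.
function confidenceScore({ coveragePct, matchedSymptoms, highValueMatches }) {
  const coveragePart = Math.min(45, coveragePct * 0.45);
  const countPart = Math.min(30, (matchedSymptoms / 4) * 30);
  const highValuePart = Math.min(25, highValueMatches * 12);
  return coveragePart + countPart + highValuePart;
}

// Illustrative inputs only; these are not values from the project's database.
const scoredMatches = [
  { confidence: 1.0, weight: 9, severe: true, chronic: true },
  { confidence: 0.5, weight: 2, severe: false, chronic: false },
];
console.log(matchScore(scoredMatches).toFixed(2)); // 9 * 1.8 * 1.4 + 1 = 23.68

const fewStrong = { coveragePct: 20, matchedSymptoms: 2, highValueMatches: 2 };
const manyWeak = { coveragePct: 60, matchedSymptoms: 10, highValueMatches: 0 };
console.log(confidenceScore(fewStrong)); // 9 + 15 + 24 = 48
console.log(confidenceScore(manyWeak)); // 27 + 30 + 0 = 57
```

Under these assumptions, each term is capped, so no single factor can push the confidence to 100 on its own: a profile with many low-value matches tops out at 75 unless at least one high-value symptom is also present.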

Database:

  • Curated 30+ diseases (15 rare, 15 common) with 200+ symptoms
  • Each symptom weighted by diagnostic significance (1-10)
  • Includes diagnostic criteria, misdiagnoses, and medical sources
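
To make the database description concrete, here is one possible shape for a single entry, plus a small check that weights stay in the 1-10 range. Everything here is hypothetical: the field names, the `validateEntry` helper, and all values are placeholders for illustration, not the project's real schema or data.

```javascript
// Hypothetical entry shape. All values are placeholders, not medical data.
const exampleEntry = {
  name: "Example Condition",
  category: "rare", // "rare" or "common"
  symptoms: [
    { term: "dyspareunia", weight: 8 }, // placeholder weight
    { term: "fatigue", weight: 2 }, // placeholder weight
  ],
  diagnosticCriteria: ["placeholder criterion"],
  commonMisdiagnoses: ["placeholder misdiagnosis"],
  sources: ["placeholder citation"],
};

// Returns a list of problems found in an entry; an empty list means it passed.
function validateEntry(entry) {
  const errors = [];
  if (!entry.name) errors.push("missing name");
  if (!["rare", "common"].includes(entry.category)) {
    errors.push("category must be 'rare' or 'common'");
  }
  const symptoms = entry.symptoms || [];
  if (symptoms.length === 0) errors.push("no symptoms listed");
  for (const s of symptoms) {
    const inRange = typeof s.weight === "number" && s.weight >= 1 && s.weight <= 10;
    if (!inRange) errors.push(`weight out of range for ${s.term}`);
  }
  return errors;
}

console.log(validateEntry(exampleEntry)); // []
```

A check like this is cheap to run over the whole dataset at load time, which helps when entries are curated by hand.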

What I Learned

Technical:

  • How to build effective medical AI that doesn't require massive datasets
  • Importance of fallback systems (the AI extraction step failed roughly 10% of the time)
  • Fuzzy matching algorithms for medical terminology
  • Balancing precision vs recall in symptom matching
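
The precision/recall trade-off in fuzzy matching can be sketched with a similarity threshold. The write-up does not say which fuzzy-matching method the project uses, so Levenshtein edit distance stands in here as an assumption, and the function names and the 0.8 default threshold are hypothetical.

```javascript
// Classic dynamic-programming edit distance, keeping one previous row.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Normalizes edit distance to a similarity in [0, 1], case-insensitively.
function similarity(a, b) {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Returns the closest known term at or above the threshold, or null.
function bestMatch(input, knownTerms, threshold = 0.8) {
  let best = null;
  for (const term of knownTerms) {
    const score = similarity(input, term);
    if (score >= threshold && (best === null || score > best.confidence)) {
      best = { term, confidence: score };
    }
  }
  return best;
}

// A misspelling one edit away from "dyspareunia" still matches.
console.log(bestMatch("dyspareuna", ["dyspareunia", "dysmenorrhea"]));
```

Raising the threshold favors precision (fewer spurious matches), while lowering it favors recall (more misspellings and variants accepted).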

Medical:

  • Rare disease diagnostic criteria and patterns
  • Why misdiagnosis happens (symptom overlap, cognitive biases)
  • The importance of patient advocacy in diagnosis

UX/Ethics:

  • How to present AI health information responsibly
  • Importance of disclaimers without undermining utility
  • Designing for empowerment vs. causing health anxiety

Challenges I Faced

  1. Symptom Extraction Accuracy

    • Problem: Patients describe symptoms colloquially ("sex hurts") but databases use medical terms ("dyspareunia")
    • Solution: Built a two-tier system with medical term mapping and synonym dictionaries
  2. False Positives

    • Problem: Fatigue matches almost every disease
    • Solution: Implemented weighted scoring where rare/specific symptoms (weight ≥8) count more than common ones
  3. AI Reliability

    • Problem: Hugging Face API throttling and inconsistent outputs
    • Solution: Robust fallback extraction with 40+ regex patterns that works without AI
  4. Ethical Concerns

    • Problem: Risk of patients self-diagnosing incorrectly
    • Solution: Added prominent disclaimers, showed common diagnoses alongside rare ones for comparison, worded results as "discuss with your doctor" rather than "you have this", and framed the app as an advocacy tool rather than a diagnostic tool
  5. Scoring Calibration

    • Problem: How to balance "2 perfect symptoms" vs "10 weak symptoms"?
    • Solution: Multi-factor confidence score weighing coverage, specificity, and high-value matches
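
Challenges 1 and 3 meet in the fallback path: when the AI call fails, regex patterns map colloquial phrases to medical terms. The sketch below shows the general idea under stated assumptions. Only the "sex hurts" to "dyspareunia" pair comes from this write-up; the other patterns, the function names, and the injected `aiExtract` callback are hypothetical, and the real system uses 40+ patterns.

```javascript
// Illustrative pattern table; only the first mapping is from the write-up.
const SYMPTOM_PATTERNS = [
  { term: "dyspareunia", pattern: /\b(sex|intercourse)\s+(hurts|is painful)\b/i },
  { term: "dysmenorrhea", pattern: /\b(painful|bad)\s+periods?\b/i },
  { term: "fatigue", pattern: /\b(exhausted|tired all the time|no energy)\b/i },
];
const SEVERITY_PATTERN = /\b(severe|unbearable|excruciating)\b/i;
const DURATION_PATTERN = /\bfor\s+(\d+)\s+(week|month|year)s?\b/i;

// Regex-only extraction. Simplification: severity and duration are read once
// from the whole text and attached to every symptom found.
function extractSymptomsFallback(text) {
  const severe = SEVERITY_PATTERN.test(text);
  const durationMatch = text.match(DURATION_PATTERN);
  const duration = durationMatch
    ? { value: Number(durationMatch[1]), unit: durationMatch[2].toLowerCase() }
    : null;
  return SYMPTOM_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ term }) => ({ term, severe, duration }));
}

// Tries the AI extractor first (aiExtract is a hypothetical async function
// supplied by the caller) and falls back to regex on error or empty output.
async function extractSymptoms(text, aiExtract) {
  try {
    const result = await aiExtract(text);
    if (Array.isArray(result) && result.length > 0) return result;
  } catch (err) {
    // Throttling or malformed output: fall through to the regex path.
  }
  return extractSymptomsFallback(text);
}

console.log(extractSymptomsFallback("Sex hurts and I've had severe bad periods for 3 years"));
```

Passing the AI extractor in as a parameter keeps the fallback testable without network access, since a stub that throws exercises the same code path as a throttled API.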

Impact & Future Work

Current Impact:

  • Helps patients identify rare diseases to discuss with doctors
  • Aims to shorten the time to a correct diagnosis
  • Empowers patient advocacy with specific testing recommendations

Future Improvements:

  • Add 50+ more rare diseases
  • Implement demographic filtering (age, gender, ethnicity)
  • Add symptom timeline visualization
  • Multi-language support
  • Integration with medical literature APIs for real-time citation
