Inspiration
M2D2AI was born from a deep frustration shared by patients, clinicians, and researchers alike: the diagnostic odyssey faced by children with rare diseases. Millions of families spend years searching for answers, often encountering misdiagnoses and dead ends. As a researcher in bioengineering and AI, I saw a clear opportunity—what if we could harness AI to bridge the gap between complex patient data and clinical decision-making?
The spark came during my work analyzing unstructured clinical notes and genetic test results. I started talking to my doctor friends, and they all said the same thing—even with years of training, reviewing thousands of notes and absorbing all that data quickly is nearly impossible under time pressure. I realized that while data was abundant, actionable insights were not. This motivated me to create a platform that could truly support doctors in identifying rare conditions early and accurately.
What it does
M2D2AI is a clinical decision support platform that helps doctors detect rare diseases faster and more accurately. It takes in messy, multimodal data—clinical notes, genetic test results, and images—and distills it into a clear, explainable summary. Our AI model doesn't just list potential diagnoses; it prioritizes them based on evidence pulled from expert-curated medical databases, and even suggests next steps like further testing or specialist referrals. It’s like giving every clinician a supercharged assistant that never misses a detail and always cites its sources.
We designed it with real-world workflows in mind, so general practitioners and specialists alike can use it without changing how they already work. The end result is a streamlined, trustworthy tool that saves time, reduces misdiagnosis, and gives patients a better shot at getting the answers they need.
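To make the idea of a "clear, explainable summary" concrete, here is a minimal sketch of what such a diagnostic report object might look like. This is an illustration only, not M2D2AI's actual schema; the condition names, confidence scores, database IDs, and next steps below are placeholder examples (OMIM:154700 is Marfan syndrome, but its pairing with this hypothetical case is invented).

```python
from dataclasses import dataclass, field

@dataclass
class RankedDiagnosis:
    """One candidate rare-disease diagnosis plus its supporting evidence."""
    condition: str                  # e.g. an Orphanet or OMIM disease name
    confidence: float               # evidence-backed priority score in [0, 1]
    evidence: list[str] = field(default_factory=list)    # citations into curated sources
    next_steps: list[str] = field(default_factory=list)  # suggested tests / referrals

@dataclass
class DiagnosticReport:
    """The explainable summary handed back to the clinician."""
    patient_id: str
    candidates: list[RankedDiagnosis]

    def top(self) -> RankedDiagnosis:
        # Candidates are prioritized by their evidence-backed confidence.
        return max(self.candidates, key=lambda d: d.confidence)

report = DiagnosticReport(
    patient_id="case-001",
    candidates=[
        RankedDiagnosis("Marfan syndrome", 0.81,
                        evidence=["OMIM:154700"],
                        next_steps=["echocardiogram", "genetics referral"]),
        RankedDiagnosis("Homocystinuria", 0.42,
                        evidence=["curated variant match"]),
    ],
)
print(report.top().condition)  # highest-priority, evidence-cited candidate
```

The key design point is that every candidate carries its citations and next steps with it, so a clinician can audit the reasoning rather than take a bare ranking on faith.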
How we built it
Data Integration: We unified structured and unstructured data—clinical notes, ICD-10 codes, genetic test results, and imaging reports.
Knowledge Curation: We relied solely on expert-reviewed databases (e.g., OMIM, Orphanet, ClinVar) to supply the evidence our models use when answering clinical questions.
Model Development: We used both public foundation models (e.g., Llama) and our own large language models, fine-tuned with reinforcement learning from human feedback (RLHF) provided by clinicians, ensuring outputs were not just accurate but clinically meaningful.
User Experience: We built an intuitive UI that outputs a clear diagnostic report, complete with evidence-backed reasoning and next-step recommendations.
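The retrieval-grounding step above can be sketched in miniature. This is a toy illustration, not M2D2AI's implementation: it uses naive keyword overlap in place of a real retriever, and the two knowledge-base entries are hypothetical snippets keyed by real OMIM IDs (154700 is Marfan syndrome, 236200 is homocystinuria). The point it demonstrates is that the generator's prompt only ever contains passages drawn from a curated knowledge base, each tagged with its source.

```python
# Toy curated knowledge base: source ID -> expert-reviewed text snippet.
CURATED_KB = {
    "OMIM:154700": "Marfan syndrome: FBN1 variants, aortic dilation, lens dislocation.",
    "OMIM:236200": "Homocystinuria: CBS deficiency, lens dislocation, thromboembolism.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank curated entries by naive keyword overlap with the clinical note."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().replace(",", "").replace(".", "").split())), src, text)
        for src, text in CURATED_KB.items()
    ]
    scored.sort(reverse=True)  # highest-overlap entries first
    return [(src, text) for score, src, text in scored[:k] if score > 0]

def build_prompt(note: str) -> str:
    """Assemble an LLM prompt whose evidence section cites only curated sources."""
    cited = "\n".join(f"[{src}] {text}" for src, text in retrieve(note))
    return f"Clinical note:\n{note}\n\nEvidence (cite sources in your answer):\n{cited}"

prompt = build_prompt("teenager with aortic dilation and lens dislocation")
print(prompt)
```

Because the model is asked to cite the bracketed source IDs, every claim in its output can be traced back to a verified database entry, which is what makes the report auditable.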
Challenges we ran into
Clinical Trust: We had to design every part of the system to earn and maintain clinician trust—explainability, traceability, and accuracy were non-negotiable. In addition, we needed to convince customers to trust our product over competitors'.
Multimodal Fusion: Integrating different data types in a meaningful way without overwhelming the model required extensive experimentation.
Pretraining Foundation Models: Pretraining requires a significant amount of time and computing resources. We are still running experiments on how to tackle this.
Balancing Specificity and Sensitivity: Tuning our model to avoid overfitting while catching subtle signs was a constant challenge.
Iterative Feedback Loops: Getting clinicians to provide detailed feedback took time and could be expensive.
Accomplishments that we're proud of
First, we managed to successfully integrate and interpret multiple types of patient data (clinical notes, images, and basic results from genetic reports). Clinical notes, imaging summaries, and genomic data all speak different "languages," and bringing them together into a single, usable pipeline was a huge technical and design challenge.
Second, we built our pipeline on a foundation of trust. That meant only using expert-curated knowledge bases. We still utilized public foundation models to extract information and generate responses; however, all information used in generation comes from verified sources, reducing hallucinations.
Finally, we’re proud that this tool has the potential to make a real impact. Early benchmarking shows that M2D2AI outperforms current publicly available state-of-the-art (SOTA) models in identifying causal genes for 4,662 patients (75% accuracy) and underlying conditions for 4,983 patients (80% accuracy) with rare diseases. Meanwhile, the SOTA models reach only 10–20% for gene identification and 20–30% for disease diagnosis. While still early, the results are promising, with plenty of room to grow.
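The accuracy figures above are simple hit rates: the fraction of patients for whom the model's top prediction matches the confirmed causal gene or condition. A minimal sketch of that metric (the patient IDs and gene symbols below are illustrative, not from our benchmark):

```python
def accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of patients whose top prediction matches the confirmed label."""
    hits = sum(predictions.get(pid) == label for pid, label in gold.items())
    return hits / len(gold)

# Hypothetical four-patient example: three of four top predictions are correct.
gold  = {"p1": "FBN1", "p2": "CBS", "p3": "CFTR", "p4": "SMN1"}
preds = {"p1": "FBN1", "p2": "CBS", "p3": "TTN",  "p4": "SMN1"}
print(accuracy(preds, gold))  # 0.75
```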
What we learned
Throughout this journey, we learned that:
Evidence matters most—expert-curated databases made a massive difference.
Explainability is essential—our models had to justify every suggestion they made.
Human-in-the-loop feedback enhances reliability—trust is built through iterative validation.
Starting a business is not easy—throughout the course, I learned how to take a research idea and turn it into a real-world product, including how to think strategically about commercialization and business planning.
🚀 What’s Next for M2D2AI: AI-Driven Insights for Rare Conditions
Next, we will complete and improve our current models so they can suggest follow-up medical tests and next steps, grounded in real-time evidence (e.g., government webpages, hospital news, and other databases). We will attend conferences to promote our product and further validate our models in real-world applications. We'll keep refining our models with clinician feedback, expanding our knowledge base, and improving explainability. Our goal is to move beyond diagnosis—to become a core decision support tool for precision medicine, from early detection to treatment planning.
Built With
- llm
- medical
- python