Inspiration

Medical diagnosis often requires analyzing both imaging (like X-rays or MRIs) and patient clinical records. Doctors spend significant time correlating these two sources, which can be slow and prone to human error. We wanted to create an AI system that assists healthcare professionals by combining visual and textual data to provide fast, accurate, and explainable diagnostic insights.

What it does

MedAI simultaneously analyzes medical images (X-rays, MRIs, CT scans) and clinical text (patient symptoms and history) to generate diagnostic predictions. It provides:

- Visual Interpretability: highlights the regions of an image that influenced the prediction.
- Textual Interpretability: shows which symptoms or items of patient history impacted the diagnosis.
- Multi-Modal Insights: combines both sources for a more accurate overall prediction.
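The visual-interpretability idea above can be sketched with a Grad-CAM-style heatmap: weight a CNN's feature maps by their pooled gradients for the predicted class. The `TinyCNN` below is a toy stand-in, not our actual imaging branch, and the function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Toy stand-in for an imaging branch (illustrative only)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        fmap = self.features(x)          # (B, 16, H, W) feature maps
        pooled = fmap.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled), fmap

def grad_cam(model, image, target_class):
    """Coarse Grad-CAM: weight feature maps by their pooled gradients."""
    logits, fmap = model(image)
    fmap.retain_grad()                   # keep gradients on a non-leaf tensor
    logits[0, target_class].backward()
    weights = fmap.grad.mean(dim=(2, 3), keepdim=True)  # per-channel importance
    cam = F.relu((weights * fmap).sum(dim=1))           # (B, H, W) heatmap
    return (cam / (cam.max() + 1e-8)).detach()          # normalize to [0, 1)

model = TinyCNN()
image = torch.randn(1, 1, 32, 32)        # fake single-channel "X-ray"
heatmap = grad_cam(model, image, target_class=0)
```

The normalized heatmap can then be overlaid on the input image (e.g. with Matplotlib's `imshow` and an `alpha` channel) so a clinician can see which regions drove the prediction.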

How we built it

- Data Sources: publicly available medical imaging datasets (X-ray, MRI) and anonymized patient records with symptoms and history.
- Model Architecture: a multi-modal neural network combining:
  - CNNs for image analysis (PyTorch/TensorFlow)
  - NLP models for text analysis
  - Fusion layers to integrate features from both modalities
- Frameworks & Tools: TensorFlow and PyTorch for modeling, Streamlit for the UI, and NumPy, pandas, scikit-learn, and Matplotlib for preprocessing and visualization.
- Training Approach: the model was trained on images and text simultaneously with cross-entropy loss, with evaluation metrics tracking both per-modality accuracy and combined diagnostic accuracy.
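The architecture above (CNN branch, text branch, fusion layers, cross-entropy loss) can be sketched in PyTorch as follows. All layer sizes, the vocabulary size, and the mean-pooled embedding text encoder are illustrative assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    """Minimal sketch: CNN image branch + text branch + fusion classifier."""
    def __init__(self, vocab_size=1000, n_classes=2):
        super().__init__()
        # Imaging branch: small CNN ending in global average pooling -> (B, 32)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Text branch: token embeddings mean-pooled over the sequence -> (B, 32)
        self.embed = nn.Embedding(vocab_size, 32)
        # Fusion: concatenate both feature vectors, then classify
        self.fusion = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, n_classes),
        )

    def forward(self, image, tokens):
        img_feat = self.cnn(image)                    # (B, 32)
        txt_feat = self.embed(tokens).mean(dim=1)     # (B, 32)
        return self.fusion(torch.cat([img_feat, txt_feat], dim=1))

model = MultiModalNet()
images = torch.randn(4, 1, 64, 64)                   # fake image batch
tokens = torch.randint(0, 1000, (4, 12))             # fake tokenized notes
logits = model(images, tokens)                       # (4, n_classes)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (4,)))
```

Because both branches are trained through a shared loss, the fusion layers can learn cross-modal interactions (e.g. a symptom description sharpening an ambiguous image finding) rather than just averaging two independent predictions.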

Challenges we ran into

Data Integration: Aligning image and text data for each patient required careful preprocessing.
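A simple version of this alignment step is an inner join on a shared patient identifier, which also surfaces patients missing one of the two modalities. The column names and records below are hypothetical.

```python
import pandas as pd

# Hypothetical image index and clinical notes, keyed by patient ID
images = pd.DataFrame({
    "patient_id": ["p01", "p02", "p03"],
    "image_path": ["xray_p01.png", "xray_p02.png", "mri_p03.png"],
})
notes = pd.DataFrame({
    "patient_id": ["p02", "p01", "p04"],
    "symptoms": ["persistent cough", "chest pain", "headache"],
})

# Inner join keeps only patients who have BOTH an image and a clinical record;
# rows dropped here (p03, p04) indicate unpaired data needing attention.
paired = images.merge(notes, on="patient_id", how="inner")
```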

Interpretability: Ensuring predictions were explainable to healthcare professionals was non-trivial.

Limited Datasets: Public datasets often lacked paired image-text records, so some text data had to be simulated while maintaining realism.

Performance: Achieving balanced performance across both modalities required fine-tuning hyperparameters.

Accomplishments that we're proud of

- Developed a working multi-modal diagnostic system with 75% overall accuracy.
- Built a Streamlit interface for healthcare professionals to interact with the model and visualize explanations.
- Successfully combined computer vision and NLP techniques in a single diagnostic workflow.

What we learned

- Multi-modal AI can provide more accurate and context-aware insights than single-modality systems.
- Explainable AI is critical in healthcare to foster trust in automated recommendations.
- Data preprocessing and alignment are often more challenging than the model building itself.

What's next for MedAI Multi-Modal Diagnostic Assistant

- Expand datasets: integrate real-world anonymized patient records with matched imaging.
- Increase accuracy: explore transformer-based models for text and vision transformers for images.
- Deployment: integrate into hospital systems with secure data handling and real-time diagnostics.
- Additional modalities: incorporate lab test results, wearable sensor data, and genomics for comprehensive diagnostics.
