Inspiration

Healthcare triage is one of the biggest bottlenecks in medicine. People often Google their symptoms and get answers that are wrong, frightening, or contradictory. We wanted to build a tool that:

- understands symptom descriptions in natural language
- converts them into structured medical features
- predicts the most likely diseases using an AI model
- outputs clear, interpretable results

The goal wasn’t to replace doctors, but to build a first-step tool that helps users understand what might be happening and encourages proper follow-up with real medical professionals.

What it does

It takes as input a few sentences from someone who is sick and doesn't know what disease they have, for example: “I have had a dry cough since yesterday, my head hurts a lot, and I am very sensitive to light.” From that text we extract all known symptoms and feed them to our model. We then output the top 3 most probable diseases the user might have, with a probability score for each.

How we built it

The system has three major components:

  1. Symptom Extractor (NLP Engine)

Users type anything they want, like: “Crushing chest pain spreading to my arm and sweating a lot”. This text is processed with:

- Normalization
- Literal and fuzzy matching
- A synonym dictionary
- Semantic embedding similarity

This allows detection of symptoms even when they are phrased differently (e.g., “light hurts my eyes” → photophobia).
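A minimal sketch of how the first three stages could fit together, using only the Python standard library. The vocabulary, the synonym table, the `cutoff` value, and all function names here are hypothetical and chosen for illustration; the semantic embedding stage is left out, and the project's actual extractor may work differently.

```python
import difflib
import re

# Hypothetical mini-vocabulary; the real system covers 157 symptoms.
SYMPTOMS = ["chest pain", "sweating", "photophobia", "dry cough", "headache"]

# Hypothetical synonym dictionary mapping lay phrases to canonical symptoms.
SYNONYMS = {
    "light hurts my eyes": "photophobia",
    "sensitive to light": "photophobia",
    "head hurts": "headache",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def ngrams(words, n):
    """All runs of n consecutive words, joined back into strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_symptoms(text, cutoff=0.85):
    """Return the set of canonical symptoms detected in free text."""
    clean = normalize(text)
    padded = f" {clean} "
    found = set()

    # 1. Literal matching on whole words, for canonical names and synonyms.
    for name in SYMPTOMS:
        if f" {name} " in padded:
            found.add(name)
    for phrase, name in SYNONYMS.items():
        if f" {phrase} " in padded:
            found.add(name)

    # 2. Fuzzy matching: compare each symptom with word windows of equal length.
    words = clean.split()
    for name in SYMPTOMS:
        for gram in ngrams(words, len(name.split())):
            if difflib.SequenceMatcher(None, gram, name).ratio() >= cutoff:
                found.add(name)
    return found
```

With this toy vocabulary, the chest pain sentence above yields `chest pain` and `sweating`, and a misspelling such as "dry couhg" is still caught by the fuzzy stage.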

  2. Disease Prediction Neural Network

Once symptoms are extracted, they are converted into a 157-dimensional binary vector, where each dimension corresponds to one symptom. Example: [0,1,0,0,1, ...]
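The encoding step can be sketched as follows. The five-symptom vocabulary and its ordering are hypothetical stand-ins for the 157-symptom list, and `encode` is an illustrative name rather than the project's actual function.

```python
# Hypothetical 5-symptom vocabulary; the project uses 157 symptoms.
SYMPTOM_INDEX = {
    name: i
    for i, name in enumerate(
        ["chest pain", "dry cough", "headache", "photophobia", "sweating"]
    )
}


def encode(symptoms):
    """Turn detected symptom names into a binary (multi-hot) vector."""
    vec = [0] * len(SYMPTOM_INDEX)
    for name in symptoms:
        idx = SYMPTOM_INDEX.get(name)
        if idx is not None:  # silently skip names outside the vocabulary
            vec[idx] = 1
    return vec


print(encode({"dry cough", "sweating"}))  # [0, 1, 0, 0, 1]
```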

The neural network architecture:

- Input: 157 symptoms
- Hidden layers: 512 -> 256 -> 128
- Activation: ReLU
- Regularization: Dropout = 0.3
- Output: 61 diseases
- Softmax converts the outputs into probabilities

Model training included hyperparameter optimization to find the best configuration automatically.
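One way to express this architecture in PyTorch is sketched below. The class name, the placement of dropout after every hidden layer, and the example input are assumptions; the weights here are untrained, so the printed probabilities carry no meaning.

```python
import torch
from torch import nn

N_SYMPTOMS, N_DISEASES = 157, 61


class DiseaseClassifier(nn.Module):
    """Feedforward net: 157 -> 512 -> 256 -> 128 -> 61 (hypothetical name)."""

    def __init__(self, dropout=0.3):
        super().__init__()
        sizes = [N_SYMPTOMS, 512, 256, 128]
        layers = []
        for in_f, out_f in zip(sizes, sizes[1:]):
            # Assumption: dropout is applied after each hidden activation.
            layers += [nn.Linear(in_f, out_f), nn.ReLU(), nn.Dropout(dropout)]
        layers.append(nn.Linear(sizes[-1], N_DISEASES))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # raw logits; softmax is applied at inference


model = DiseaseClassifier()
model.eval()  # disables dropout for prediction

x = torch.zeros(1, N_SYMPTOMS)
x[0, [1, 4]] = 1.0  # two arbitrary symptoms switched on

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)

top_probs, top_ids = probs.topk(3, dim=1)  # three most probable diseases
print(top_ids.tolist(), top_probs.tolist())
```

Returning logits from `forward` keeps the model compatible with `nn.CrossEntropyLoss`, which applies log-softmax internally during training.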

  3. Web Application

- Backend: Flask REST API
- Endpoint /api/chat runs extraction + model prediction
- Frontend: custom HTML, CSS, and JavaScript
- Real-time UI that returns top predictions and detected symptoms

This allowed us to demonstrate the model interactively.

Challenges we ran into

At the beginning, we used a publicly available symptom-disease dataset where almost every disease had only one patient entry. This created massive issues:

- the model could only memorize one symptom pattern per disease
- missing even one symptom caused total failure
- there was no variability in symptoms

The result was overfitting and poor real-world performance.

The solution: we built an entirely new synthetic dataset based on medical knowledge, with:

- real symptom probabilities
- careful symptom clusters
- prevalence modeling (rare diseases stay rare)
- 35,000 total patient rows, 61 realistic diseases, and 157 symptoms

This improved training stability and prediction accuracy.
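The generation idea can be sketched as a two-step sampler: pick a disease in proportion to its prevalence, then switch each symptom on with a disease-specific probability. The disease names and every number below are made up for illustration; they are not medical data and not the project's actual tables.

```python
import random

# Hypothetical knowledge table: prevalence weight and P(symptom | disease).
DISEASES = {
    "common_disease": {
        "prevalence": 0.9,
        "symptoms": {"dry cough": 0.8, "headache": 0.4},
    },
    "rare_disease": {
        "prevalence": 0.1,
        "symptoms": {"photophobia": 0.9, "headache": 0.7},
    },
}
ALL_SYMPTOMS = sorted({s for d in DISEASES.values() for s in d["symptoms"]})


def sample_patient(rng):
    """Draw one synthetic patient as (binary symptom row, disease label)."""
    names = list(DISEASES)
    weights = [DISEASES[n]["prevalence"] for n in names]
    disease = rng.choices(names, weights=weights, k=1)[0]  # rare stays rare
    probs = DISEASES[disease]["symptoms"]
    # Each symptom is sampled independently, so rows for one disease vary.
    row = [1 if rng.random() < probs.get(s, 0.0) else 0 for s in ALL_SYMPTOMS]
    return row, disease


rng = random.Random(0)  # fixed seed for reproducibility
rows = [sample_patient(rng) for _ in range(1000)]
```

Because symptoms are sampled rather than fixed, the same disease produces many different symptom patterns, which is what the original one-row-per-disease dataset lacked.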

Accomplishments that we're proud of

- High top-3 accuracy with realistic case descriptions
- A symptom extractor that can understand natural language extremely well
- A complete end-to-end medical AI system, not just a model
- A real-time, interactive web demo that anyone can use
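For reference, top-3 accuracy counts a prediction as correct when the true disease appears among the three highest-probability outputs. A small sketch of the metric, with an illustrative function name and toy inputs:

```python
def top_k_accuracy(prob_rows, true_labels, k=3):
    """Fraction of samples whose true label is among the k top-scored classes."""
    hits = 0
    for probs, label in zip(prob_rows, true_labels):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(true_labels)


# Toy example with 4 classes and 2 samples (made-up numbers).
toy_probs = [[0.1, 0.2, 0.3, 0.4], [0.7, 0.1, 0.1, 0.1]]
toy_labels = [0, 0]
print(top_k_accuracy(toy_probs, toy_labels, k=3))  # 0.5
```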

What we learned

This project required us to combine machine learning, NLP, full-stack development, and dataset engineering. Some core things we learned are:

Natural Language Processing:

- How to extract symptoms from unstructured text
- Semantic similarity with Sentence-BERT (MiniLM)
- Fuzzy and literal string matching
- Building a hybrid extractor using multiple signals

Neural Networks:

- Designing multi-layer feedforward models
- Activation functions like ReLU
- Regularization techniques: dropout, weight decay, early stopping
- How to use PyTorch to build a neural network
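Of these techniques, early stopping is the one that usually needs a little hand-written logic. A minimal sketch, assuming validation loss is checked once per epoch; the class name, `patience` value, and loss numbers are illustrative only.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after the third epoch.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 5 0.6
```

Weight decay, by contrast, is typically a single optimizer argument in PyTorch, for example `weight_decay` on `torch.optim.Adam`.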

Full-Stack Deployment:

- API development with Flask
- Frontend integration with HTML/CSS/JS
- Handling JSON predictions in real time

What's next for Healthcare assistant

  1. Emergency Override System

Hard-coded rules for symptoms of life-threatening conditions such as myocardial infarction (MI) and stroke.
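Such an override could be as simple as a rule table checked before the model runs. The sketch below is purely illustrative: the rule contents, alert strings, and function name are hypothetical, and any real rule set would need clinical review.

```python
# Hypothetical rule table: (symptoms that must all be present, alert message).
EMERGENCY_RULES = [
    ({"chest pain", "sweating"}, "possible heart attack"),
    ({"facial drooping", "slurred speech"}, "possible stroke"),
]


def emergency_override(symptoms):
    """Return alerts for every rule whose required symptoms are all present."""
    present = set(symptoms)
    return [alert for required, alert in EMERGENCY_RULES if required <= present]


print(emergency_override(["chest pain", "sweating", "headache"]))
# ['possible heart attack']
```

Because the check is a plain subset test, it stays predictable and auditable regardless of what the neural network outputs.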

  2. More Advanced NLP

Use a transformer like DistilBERT for even better symptom extraction.

  3. Better Dataset

Instead of relying on a synthetic dataset, which works well but is not ideal, we could use a real medical database. By cleaning and preparing this data, we can make it suitable for our project and achieve more reliable results.
