- Preview of the breast cancer dataset showing key clinical features used for training the AI model.
- Confusion matrix demonstrating the model’s classification performance on test data.
- Detailed classification report showing accuracy, precision, recall and F1-score of the XGBoost classifier.
- Top influential medical features identified by the XGBoost model for tumor prediction.
💡 Inspiration
Breast cancer remains one of the leading causes of death among women, especially in India, where many cases are detected only in late stages due to lack of screening facilities, shortage of radiologists, and slow manual diagnosis. I wanted to build a solution that is fast, affordable, explainable, and accessible to everyone, even in rural areas without mammography equipment. This inspired me to develop an AI-driven system that can help doctors detect cancer early, reduce diagnostic workload, and provide immediate, data-driven insights.
⚙️ What it does
The AI-Based Breast Cancer Detection System performs:
Tumor Prediction: Classifies whether the tumor is Benign or Malignant using clinical features.
Stage Approximation: Estimates the probable cancer stage (0–IV) using medical parameters.
Real-Time Manual Input Mode: Allows doctors to enter clinical measurements and get an instant prediction.
Explainability: Shows the most influential medical features using feature importance and permutation methods.
Automated Medical Report: Generates a downloadable report with results, confidence score, stage, and model information.
Voice Feedback (Optional): Reads out the diagnosis to support quick communication.
It is a complete, end-to-end diagnostic assistant, not just a model.
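For the manual input mode, a few values typed by a doctor have to be expanded into the full feature vector the model was trained on, and scaled the same way as the training data. A minimal sketch, assuming missing features are filled with training-set means and scaling is plain z-scoring; the feature names, the statistics, and the `build_input_vector` helper are hypothetical and not taken from the project:

```python
from typing import Dict, List, Tuple

# Hypothetical training statistics (mean, standard deviation) per feature.
# In a real pipeline these would be read from the fitted scaler, not hard-coded.
TRAIN_STATS: Dict[str, Tuple[float, float]] = {
    "mean_radius": (14.0, 3.5),
    "mean_texture": (19.0, 4.0),
    "mean_perimeter": (92.0, 24.0),
}
FEATURE_ORDER: List[str] = list(TRAIN_STATS)  # column order used at training time


def build_input_vector(manual: Dict[str, float]) -> List[float]:
    """Expand partial manual input into a full, standardized feature vector."""
    unknown = set(manual) - set(TRAIN_STATS)
    if unknown:
        raise KeyError(f"Unknown feature(s): {sorted(unknown)}")
    vector = []
    for name in FEATURE_ORDER:
        mean, std = TRAIN_STATS[name]
        raw = manual.get(name, mean)       # fall back to the training mean
        vector.append((raw - mean) / std)  # same z-scoring as at training time
    return vector


print(build_input_vector({"mean_radius": 17.5}))
```

Features the doctor did not enter land on 0.0 after scaling, so they sit exactly at the training average and do not push the prediction in either direction.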
🛠️ How we built it
Started by importing essential ML libraries (NumPy, Pandas, Scikit-Learn, XGBoost, SHAP, Matplotlib).
Loaded a clinically validated dataset, then performed preprocessing, feature scaling, and a train-test split.
Trained an XGBoost classifier for high-accuracy prediction.
Developed a manual input pipeline so doctors can test with real patient data.
Implemented simple stage-approximation logic based on medical heuristic ranges.
Added explainability using feature importance and permutation-based analysis.
Built an automated report generator to produce complete diagnosis summaries.
Tested and validated the workflow inside Google Colab for easy deployment.
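The loading, scaling, splitting, and training steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's notebook: the Wisconsin dataset bundled with scikit-learn stands in for the project's dataset, the hyperparameters are placeholders, and scikit-learn's `GradientBoostingClassifier` is swapped in only if `xgboost` is not installed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # stand-in so the sketch still runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Assumption: scikit-learn's bundled dataset replaces the project's data.
# Labels in this dataset: 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, so no test statistics leak in.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Illustrative hyperparameters, not the project's settings.
model = Booster(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=42)
model.fit(X_train_s, y_train)

y_pred = model.predict(X_test_s)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

Tree ensembles do not strictly need scaled inputs, but keeping the fitted `scaler` alongside the model means manual inputs can later be transformed with exactly the same statistics.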
⚠️ Challenges we ran into
SHAP explainability initially caused errors because the tree explainer didn’t support the model in the Colab environment.
Balancing feature scaling with manual input values required careful reconstruction of the full input vector.
Choosing meaningful parameters for stage approximation while ensuring it stayed clinically sensible.
Optimizing the pipeline so it runs smoothly even on low-end devices without GPU.
Making the system simple, clean, and doctor-friendly.
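When SHAP's tree explainer is unavailable, permutation importance is a model-agnostic fallback: shuffle one feature column at a time and measure how much accuracy drops. A minimal pure-Python sketch, assuming a toy threshold rule in place of the trained XGBoost model; `toy_predict`, the random data, and this `permutation_importance` function are hypothetical illustrations (scikit-learn provides a full version as `sklearn.inspection.permutation_importance`):

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for the trained model: predicts 1 when feature 0 exceeds 0.5
# and ignores feature 1 entirely.
def toy_predict(row: Row) -> int:
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

scores = permutation_importance(toy_predict, X, y)
print(scores)  # feature 0 dominates; the ignored feature 1 scores 0.0
```

Because the method only calls the prediction function, it works with any classifier and does not depend on explainer support in a particular environment.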
🏆 Accomplishments that we're proud of
Successfully built a high-accuracy diagnostic model using XGBoost.
Added explainability, which many AI medical tools lack.
Designed a clean prediction interface that works even on basic laptops.
Generated fully formatted automated medical reports, ready for hospital documentation.
Created a tool that fits real medical workflows rather than remaining a theoretical ML project.
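The automated report can be as simple as a formatted text block built from the prediction, confidence score, stage, and model information. A rough sketch using only the standard library; the `Diagnosis` fields, the `render_report` function, the layout, and the sample values are all hypothetical, since the project's actual report format is not described here:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Diagnosis:
    # Hypothetical schema for illustration only.
    patient_id: str
    prediction: str      # "Benign" or "Malignant"
    confidence: float    # model probability for the predicted class, 0..1
    stage: str           # approximate stage label, e.g. "II"
    model_name: str = "XGBoost classifier"


def render_report(d: Diagnosis, on: date) -> str:
    """Return a plain-text report that can be saved as a downloadable file."""
    lines = [
        "BREAST CANCER SCREENING REPORT",
        f"Date:        {on.isoformat()}",
        f"Patient ID:  {d.patient_id}",
        f"Prediction:  {d.prediction}",
        f"Confidence:  {d.confidence:.1%}",
        f"Est. stage:  {d.stage}",
        f"Model:       {d.model_name}",
    ]
    return "\n".join(lines)


# Sample values are made up for the example.
report = render_report(Diagnosis("P-001", "Malignant", 0.973, "II"), date(2025, 1, 15))
print(report)
```

Passing the date in explicitly, instead of calling `date.today()` inside the function, keeps the output reproducible and easy to test.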
📚 What we learned
How to design practical ML solutions for healthcare, not just run models.
The importance of explainability in medical AI: doctors need to trust the output.
How small feature changes affect sensitivity and diagnosis accuracy.
How to convert a notebook into a complete, usable product.
How to communicate and document AI results clearly for clinical use.
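Sensitivity is the recall on the malignant class, so it can be read directly off a confusion matrix together with accuracy, precision, and F1. The sketch below shows why it deserves separate attention: a couple of extra missed malignant cases barely move accuracy but move sensitivity noticeably. The `binary_metrics` helper and the counts are illustrative placeholders, not the project's results.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Metrics for the positive (malignant) class from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0; no zero-division handling.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Placeholder counts for illustration only.
before = binary_metrics(tp=40, fp=2, fn=2, tn=70)
# Same test set, but two more malignant cases are missed.
after = binary_metrics(tp=38, fp=2, fn=4, tn=70)

print(f"accuracy: {before['accuracy']:.3f} -> {after['accuracy']:.3f}")
print(f"recall:   {before['recall']:.3f} -> {after['recall']:.3f}")
```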
🚀 What’s next for the AI-Based Breast Cancer Detection System
Integrating the system with a mobile app or telemedicine platform for remote diagnosis.
Adding an image-based module using mammogram ML models for hybrid diagnosis.
Deploying the system as a cloud API that hospitals can plug into their EMR/EHR systems.
Expanding to detect other cancers using similar feature-based ML pipelines.
Enhancing stage detection using advanced medical datasets and real hospital training data.
My goal is to make this system a low-cost screening tool used across India to save lives through early detection.
Built With
- data
- google-colab
- joblib
- machine-learning
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- shap
- standardscaler
- xgboost