Inspiration
Breast cancer remains one of the leading causes of cancer-related deaths globally, with outcomes often worsened by late diagnosis, particularly in low- and middle-income regions. While modern medical imaging and specialist expertise have improved detection in some settings, access to these resources is still uneven.
This project was inspired by a simple but important question: How can AI be used responsibly to support early breast cancer risk assessment and awareness, without overstepping clinical boundaries?
Rather than attempting to replace medical professionals, the goal was to explore how machine learning systems can act as decision-support tools, helping to analyze diagnostic patterns and highlight risk signals in a transparent and reproducible way.
What it does
I developed an end-to-end machine learning system that classifies breast tumors as benign or malignant using standardized numerical features derived from diagnostic imaging analysis (such as radius, texture, concavity, smoothness, and related measurements).
The final output is a cloud-deployed API that:
Accepts structured tumor feature inputs
Applies the same preprocessing used during training
Returns both a prediction and a confidence score
This system is explicitly designed as a proof of concept and research artifact, not a diagnostic tool.
How we built it
The project followed a structured workflow:
Problem Framing & Data Understanding: I began by studying the clinical context of breast cancer diagnosis and exploring a dataset containing standardized tumor feature measurements commonly used in academic research.
Exploratory Data Analysis (EDA): I analyzed feature distributions, correlations, and class separability to understand which characteristics were most associated with malignancy.
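The kind of separability check described here can be sketched with the Wisconsin breast cancer dataset (the standard research dataset containing these radius/texture/concavity measurements); the exact analysis steps are illustrative:

```python
# Hedged EDA sketch: how strongly does each feature track the class label?
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 features plus a 'target' column (0 = malignant, 1 = benign)

# Correlation with the label is a quick first look at separability:
# strongly negative values mark features that rise with malignancy.
corr = df.corr()["target"].drop("target").sort_values()
print(corr.head())  # the features most associated with malignancy
```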
Model Training & Evaluation: Multiple machine learning models were trained and compared. A Support Vector Machine (SVM) emerged as the best-performing model based on accuracy, AUC, and generalization performance.
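A minimal sketch of this training-and-evaluation step, assuming the Wisconsin dataset and scikit-learn; the specific split ratio and default hyperparameters here are illustrative, not the tuned configuration:

```python
# Hedged sketch: train and evaluate an SVM on the Wisconsin dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling inside a pipeline keeps train-time preprocessing
# identical to what the deployed API must apply at inference.
model = make_pipeline(StandardScaler(), SVC(probability=True))
model.fit(X_train, y_train)

preds = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print(f"accuracy: {accuracy_score(y_test, preds):.3f}")
print(f"AUC: {roc_auc_score(y_test, proba):.3f}")
```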
Model Interpretability: To address trust and transparency, I applied SHAP (SHapley Additive exPlanations) to identify which features most influenced predictions. This step helped align model behavior with known medical insights, such as the importance of tumor concavity and boundary irregularities.
Deployment: Moving beyond notebooks, I deployed the model as a FastAPI-based REST service, first locally and then in the cloud (Render). This made the system accessible for real-time testing and future frontend integration.
Challenges we ran into
Several challenges shaped the project:
Data Generalization: The model was trained on a standardized dataset, which may not fully capture variation across populations, imaging devices, or clinical protocols.
Clinical Realism: The system relies on pre-extracted numerical features rather than raw medical images, which limits how closely it mirrors real clinical workflows.
Ethical Boundaries: A major challenge was ensuring the project did not overclaim its capabilities. Clear disclaimers and responsible framing were essential to avoid misuse or misinterpretation.
Deployment Learning Curve: Transitioning from notebook experimentation to a production-style API required learning version control, environment management, and cloud deployment practices.
Accomplishments that we're proud of
Built a complete end-to-end machine learning system, progressing from exploratory analysis and model training to a fully deployed, cloud-accessible API.
Successfully transitioned from notebook-based experimentation to a production-style workflow using FastAPI and cloud deployment.
Achieved strong predictive performance while maintaining responsible framing as a proof of concept rather than a diagnostic tool.
Integrated model interpretability techniques (SHAP) to better understand and communicate how key features influence predictions.
Designed the system to be modular and UI-agnostic, enabling future frontend, mobile, or service-level integrations.
Maintained ethical awareness by clearly stating limitations and avoiding overclaims, particularly in a sensitive healthcare context.
What we learned
Deployment fundamentally changes how machine learning models are designed, tested, and evaluated.
High accuracy alone is insufficient in healthcare-related applications; interpretability and trust are equally important.
Real-world data variability and population differences present significant challenges to generalization.
Clear communication of limitations is essential when building AI systems with potential societal impact.
Building APIs rather than tightly coupled interfaces enables scalability and collaboration.
Responsible AI development requires continuous consideration of ethical, clinical, and policy implications from the earliest stages.
What's next for Breast Cancer Classifier
Future iterations could include:
Broader, multi-center datasets for better generalization
Stronger validation and bias analysis
Deeper explainability tools for clinical trust
Integration with a frontend interface
Exploration of computer vision approaches using raw medical images (with appropriate safeguards)
Ultimately, this project represents a foundational step toward building responsible, deployable AI systems for healthcare impact.