We were inspired to tackle one of the most critical health challenges affecting millions of women worldwide. Breast cancer is the most commonly diagnosed cancer among women globally, and early detection can dramatically improve survival rates. We wanted to build a tool that shows how machine learning can assist in medical diagnosis while keeping the technology accessible and helping users understand how AI works in healthcare.

Our website implements a real machine learning model that predicts whether a breast tumor is benign or malignant from cell nuclei measurements in the Wisconsin Breast Cancer dataset. Users can input 30 features: the mean, standard error, and worst value of each of ten cell characteristics such as radius, texture, perimeter, area, smoothness, compactness, and concavity. The trained logistic regression model, which reached 96.5% accuracy in our evaluation, then returns a prediction. The interface includes sample data for testing, export/import functionality, and detailed information about the dataset and model performance.
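To make the 30-feature input concrete, here is a minimal sketch of how the feature list could be generated and validated. The identifiers (`featureNames`, `toFeatureVector`) and the `measurement_statistic` naming scheme are assumptions for illustration, not the project's actual code; the ten measurements and the mean / standard error / worst grouping follow the dataset's documented layout.

```typescript
// Ten cell-nucleus measurements recorded in the Wisconsin dataset.
const MEASUREMENTS = [
  "radius", "texture", "perimeter", "area", "smoothness",
  "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
] as const;

// Each measurement appears three times: mean, standard error, worst value.
const STATISTICS = ["mean", "se", "worst"] as const;

/** Build the 30 feature keys, grouped by statistic (hypothetical naming). */
function featureNames(): string[] {
  const names: string[] = [];
  for (const stat of STATISTICS) {
    for (const measurement of MEASUREMENTS) {
      names.push(measurement + "_" + stat);
    }
  }
  return names;
}

/** Turn a keyed form submission into an ordered numeric vector. */
function toFeatureVector(input: Record<string, number>): number[] {
  return featureNames().map((name) => {
    const value = input[name];
    if (typeof value !== "number" || !isFinite(value)) {
      throw new Error("Missing or invalid feature: " + name);
    }
    return value;
  });
}
```

Keeping a single ordered list of names means the form, the sample data, and the model coefficients can all index features the same way.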

We built this with React, TypeScript, and Tailwind CSS on the frontend, which gave us a clean, responsive interface. The core machine learning implementation is a custom logistic regression algorithm with L2 regularization, trained on the UCI Wisconsin Breast Cancer dataset of 569 cases. We standardize the inputs with z-score normalization and use all 30 features from the original dataset. The model applies the sigmoid function to compute a probability, and we measured its performance with 5-fold cross-validation.
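The prediction path described above (z-score standardization followed by a sigmoid over a weighted sum) can be sketched roughly as follows. All names here (`Scaler`, `fitScaler`, `standardize`, `predictProbability`) are hypothetical, and the weights and bias are assumed to come from a previously trained model; this is an illustration of the technique, not the project's implementation.

```typescript
/** Per-feature mean and standard deviation learned from training rows. */
interface Scaler {
  mean: number[];
  std: number[];
}

/** Fit a z-score scaler: one mean and one population std per column. */
function fitScaler(rows: number[][]): Scaler {
  const n = rows.length;
  const d = rows[0].length;
  const mean = rows[0].map(() => 0);
  const std = rows[0].map(() => 0);
  for (const row of rows) {
    for (let j = 0; j < d; j++) mean[j] += row[j] / n;
  }
  for (const row of rows) {
    for (let j = 0; j < d; j++) std[j] += Math.pow(row[j] - mean[j], 2) / n;
  }
  for (let j = 0; j < d; j++) {
    // Fall back to 1 for constant columns to avoid dividing by zero.
    std[j] = Math.sqrt(std[j]) || 1;
  }
  return { mean, std };
}

/** Apply z = (x - mean) / std to each feature. */
function standardize(row: number[], scaler: Scaler): number[] {
  return row.map((x, j) => (x - scaler.mean[j]) / scaler.std[j]);
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

/** Probability of the positive class for one raw (unscaled) input row. */
function predictProbability(
  row: number[],
  scaler: Scaler,
  weights: number[],
  bias: number
): number {
  const scaled = standardize(row, scaler);
  const z = scaled.reduce((sum, x, j) => sum + weights[j] * x, bias);
  return sigmoid(z);
}
```

The scaler must be fit on the training data only and then reused unchanged at prediction time, so that user input is scaled the same way the model saw during training.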

One of the biggest challenges was implementing the logistic regression algorithm from scratch rather than using pre-built libraries, which meant getting the sigmoid function and the coefficient calculations mathematically right ourselves. We also had to present a 30-feature input form without overwhelming users. Balancing educational content with usability was tricky: we needed to explain complex medical and ML concepts while keeping the interface intuitive. Finally, keeping the data standardization correct and the model accurate while running everything in the browser required careful optimization.
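As an illustration of what a from-scratch implementation involves, here is a minimal sketch of L2-regularized logistic regression trained with batch gradient descent. The optimizer choice, the function names, and the default hyperparameters (`lr`, `lambda`, `epochs`) are all assumptions; the write-up does not say how the coefficients were actually fitted. The sigmoid is written in a branch form that avoids overflow for large negative inputs.

```typescript
/** Sigmoid that never evaluates Math.exp on a large positive argument. */
function stableSigmoid(z: number): number {
  if (z >= 0) return 1 / (1 + Math.exp(-z));
  const e = Math.exp(z);
  return e / (1 + e);
}

interface LogisticModel {
  weights: number[];
  bias: number;
}

/**
 * Batch gradient descent on the mean log-loss plus (lambda / 2) * ||w||^2.
 * X is assumed to be standardized already; y holds 0/1 labels.
 */
function trainLogistic(
  X: number[][],
  y: number[],
  lr: number = 0.1,       // assumed learning rate
  lambda: number = 0.01,  // assumed L2 strength
  epochs: number = 500    // assumed iteration count
): LogisticModel {
  const n = X.length;
  const d = X[0].length;
  const weights = X[0].map(() => 0);
  let bias = 0;

  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW = weights.map(() => 0);
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      let z = bias;
      for (let j = 0; j < d; j++) z += weights[j] * X[i][j];
      const err = stableSigmoid(z) - y[i]; // derivative of log-loss w.r.t. z
      for (let j = 0; j < d; j++) gradW[j] += err * X[i][j];
      gradB += err;
    }
    for (let j = 0; j < d; j++) {
      // The L2 penalty shrinks each weight toward zero; the bias is not penalized.
      weights[j] -= lr * (gradW[j] / n + lambda * weights[j]);
    }
    bias -= lr * (gradB / n);
  }
  return { weights, bias };
}
```

Leaving the bias out of the penalty is a common convention, since regularizing it would tie the model's baseline prediction to the scale of the labels rather than to the data.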

We're incredibly proud of achieving 96.5% accuracy with our custom logistic regression implementation, a result we consider on par with what pre-built ML libraries deliver. Integrating real medical data with an intuitive user interface without compromising on functionality is a major achievement for us. We successfully created an educational tool that makes machine learning in healthcare accessible to non-technical users while maintaining scientific rigor. The feature set, including sample data, export/import capabilities, and detailed model performance metrics, makes the application feel complete.

This project taught us the intricate details of implementing machine learning algorithms from scratch, giving us deep insights into how logistic regression actually works under the hood. We learned about the critical importance of data preprocessing and standardization in medical ML applications. Working with real medical data helped us understand the ethical considerations and responsibilities involved in healthcare AI. We also gained valuable experience in creating user interfaces for complex data input while maintaining accessibility and usability.

We plan to expand the model capabilities by implementing additional algorithms like Random Forest and SVM for comparison and ensemble predictions. Future versions will include data visualization features showing feature importance and prediction confidence intervals. We want to add more educational content about breast cancer awareness and the role of AI in medical diagnosis. Integration with medical imaging analysis and expanding to other cancer prediction models using different datasets are also on our roadmap. We're also considering adding multi-language support to make this educational tool accessible globally.
