MaligNet

Inspiration

This project was born out of a desire to provide doctors with a personal AI companion that leverages data analytics and machine learning to improve breast cancer treatment recommendations. Inspired by Hacklytics GT 2025, we aimed to develop a highly accurate predictive model to assist clinicians in identifying key risk factors and optimizing treatment decisions.

After iterative training and hyperparameter tuning, we arrived at a Random Forest model that achieves over 90% accuracy in predicting patient survival outcomes, with feature importances that keep its predictions interpretable for clinicians.

What it does

This project analyzes clinical and protein data to provide insights into breast cancer patient outcomes. The model:

Predicts patient survival risk based on various clinical features.
Identifies key contributing factors using feature importance analysis.
Enhances clinical decision-making by offering data-driven insights.
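
As a rough illustration of the feature importance analysis mentioned above, here is a minimal sketch using scikit-learn's RandomForestClassifier. The data is synthetic and the feature names are hypothetical stand-ins; the project's actual columns and settings are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names standing in for the clinical and protein columns.
feature_names = ["age", "tumour_stage", "protein1", "protein2", "protein3", "protein4"]

# Synthetic data in place of the real patient table (assumption for illustration).
X, y = make_classification(
    n_samples=300,
    n_features=len(feature_names),
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ holds one impurity-based score per column; the scores sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:>14}: {score:.3f}")
```

Impurity-based importances are quick to compute but can favor high-cardinality features, so they are best read as a ranking hint rather than a causal statement.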

How we built it

Data Sourcing & Integration: We started with a well-known breast cancer dataset from Kaggle, which provided a solid foundation of biomarkers and clinical outcomes. This dataset was then combined with other relevant clinical data to form a comprehensive repository.
Machine Learning & Analytics: Using Python and scikit-learn, with Streamlit for visualization, we built and validated predictive models that turn raw data into actionable treatment insights.
User Interface Development: Our front-end, built with Next.js, shadcn/ui, and TypeScript, offers an accessible and intuitive experience that lets clinicians easily interpret the recommendations.
Backend Architecture: Designed for scalability and security, our backend uses MongoDB for storage and Flask to handle API calls, managing data processing efficiently while safeguarding sensitive patient information.
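
To make the Flask side of this architecture concrete, here is a minimal sketch of a prediction endpoint. The route name, the required fields, and the predict_risk placeholder are all assumptions for illustration; the real API and model loading code are not described above, and MongoDB storage is left out.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; the real model's feature list is not given.
REQUIRED_FIELDS = ("age", "tumour_stage", "protein1")


def predict_risk(features):
    """Placeholder for the trained model: returns a dummy score in [0, 1]."""
    return min(1.0, max(0.0, float(features["age"]) / 100.0))


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True yields None instead of raising on a malformed JSON body.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    return jsonify({"risk": predict_risk(payload)})
```

Calling app.run() would start Flask's development server; in a setup like this one, the Next.js front-end would POST JSON to the endpoint and render the returned risk score.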

Challenges we ran into

Connecting the Frontend and Backend: We were not able to connect the two dynamically during the hackathon; this integration is planned for the future.
Balancing the Dataset: Implementing SMOTE was crucial to handle class imbalances in patient outcomes.

Hyperparameter Tuning: Optimizing n_estimators, max_depth, and feature selection was an iterative process.
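
The two modeling steps above can be sketched roughly as follows. The smote_oversample helper is a hand-rolled, binary-only illustration of the SMOTE idea (interpolating between minority samples and their nearest minority neighbours), not the library implementation the team presumably used, and the data and parameter grid are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, minority_label, k=5, seed=0):
    """Add synthetic minority rows until both classes have the same count."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Each synthetic row lies on the segment between a sample and its neighbour.
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority_label)])
    return X_out, y_out


# Synthetic imbalanced data (roughly 80/20) in place of the patient outcomes.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the training split only, so the test split stays untouched.
X_bal, y_bal = smote_oversample(X_train, y_train, minority_label=1)

# Illustrative grid over the two hyperparameters named above.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1")
search.fit(X_bal, y_bal)

print(search.best_params_)
print(f"held-out F1: {search.score(X_test, y_test):.3f}")
```

One caveat with this simplified version: oversampling before cross-validation lets synthetic points derived from a validation fold leak into the training folds, which can inflate the cross-validated score. Resampling inside each fold, for example with imbalanced-learn's Pipeline, avoids that.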

Accomplishments that we're proud of

Optimized Machine Learning Model: Achieved 90%+ accuracy with robust cross-validation performance.
Feature Engineering Success: Created new features that significantly improved prediction accuracy.
Beautiful Console Logging for ML Training: Developed a clear, structured debugging process for model interpretation.
Interdisciplinary Collaboration: Brought together machine learning, software development, and UI design to enhance the final product.
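
The engineered features included age-group bins and histology-stage interactions. A minimal pandas sketch of that kind of feature engineering might look like this; the column names, bin edges, and category values are hypothetical, since the actual ones are not listed.

```python
import pandas as pd

# Toy rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "age": [34, 47, 58, 66, 79],
    "histology": ["ductal", "lobular", "ductal", "mucinous", "ductal"],
    "tumour_stage": ["I", "II", "II", "III", "I"],
})

# Age-group bins over right-closed intervals (0, 40], (40, 60], (60, 120].
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 40, 60, 120],
    labels=["40_or_under", "41_to_60", "over_60"],
)

# Histology-stage interaction: one categorical value per (histology, stage) pair.
df["histology_stage"] = df["histology"] + "_" + df["tumour_stage"]

# One-hot encode the new categorical features for use in a tree ensemble.
features = pd.get_dummies(df[["age_group", "histology_stage"]])
print(features.columns.tolist())
```

Combining histology and stage into a single category lets the model assign a distinct effect to each pair, at the cost of more sparse columns on a small dataset.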

What we learned

Advanced ML Techniques: Fine-tuning Random Forest models significantly impacts accuracy.
Readable Debugging: Clear console outputs improve workflow efficiency and model interpretability.
UI/UX for Machine Learning: Making complex ML results accessible through intuitive visualizations enhances usability.

What's next for MaligNet

Next, we plan to fine-tune the predictive algorithms on real-world inputs to make treatment recommendations more precise. After that, MaligNet plans to integrate generative AI for a more tailored experience, where healthcare teams can get further explanations and recommendations and can even input data not covered by the dataset. With generative AI in place, MaligNet also aims to become multi-modal, since providers usually hold information in multiple formats.

ML model explanation: link

Built With

flask
kaggle
matplotlib
next.js
numpy
pandas
plotly
python
scikit-learn
streamlit
typescript

Submitted to

Hacklytics 2025: Jurassic Age

Created by

I worked on the ML model and back-end. Developed a machine learning model for breast cancer prognosis, achieving 90%+ accuracy using Random Forest, with SMOTE to balance the dataset. Built a Flask API to serve real-time ML predictions, laying the groundwork for seamless future integration with clinical decision support. Tuned hyperparameters via GridSearchCV and engineered new features, including histology-stage interactions and age-group bins, for improved predictions.

Sebastian Lian Carmagnola
JhayBae
Celeste Echols

Updates

Sebastian Lian Carmagnola started this project — Feb 22, 2025 09:23 PM EST
