Smarter Diagnosis, Safer Data

Group: Privacy Prescribers (Nishka Govil, Mirelle George, Metehan Berker)

Inspiration

There is untapped potential in training an AI model on patient data from multiple hospitals, which together hold extensive information on patients’ symptoms and diagnoses. The key issue is patient privacy: if hospitals share patient data to train a model, how can patients’ records be kept secure? Our solution is a machine learning method called federated learning, in which multiple parties collaborate to train a shared model without sharing their raw (and often sensitive) data.

What it does

Our platform enables hospitals to contribute their patient data to the training of AI models for medical purposes, and then to use the resulting models to assist in evaluating individual patients. For the scope of this hackathon, the platform trains and runs inference for heart disease prediction only. Our product’s key offering is a way for a large amount of patient data, collected from any number of hospitals, to be used to train medical models for a variety of purposes while protecting patient privacy and respecting data regulations. Training on data from more hospitals gives the model more examples to learn from, which can improve its accuracy. We protect each patient’s privacy through federated learning.

Behind the scenes, when a hospital (i.e., a client) uploads its patients’ data, the client trains the model on that hospital’s local data. Only the trained model parameters, not the raw records, are then shared with the server, which aggregates the parameters from every client into a global model. The server then sends the global model parameters back to each client. This process repeats for many rounds until the model converges.
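
The server's aggregation step can be sketched in a few lines. The write-up does not say which aggregation rule was used, so this sketch assumes a weighted average in which each client's parameters count in proportion to its number of local samples (the approach commonly known as federated averaging). The function name `federated_average` and the flat-list parameter format are hypothetical, chosen for illustration only, and are not taken from the project's code.

```python
from typing import List, Sequence, Tuple


def federated_average(
    client_updates: Sequence[Tuple[List[float], int]],
) -> List[float]:
    """Combine client parameters into one global parameter vector.

    Each update is (parameters, num_local_samples). Weighting by sample
    count is an assumption of this sketch, not something the write-up states.
    """
    if not client_updates:
        raise ValueError("need at least one client update")

    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_updates[0][0])
    global_params = [0.0] * dim

    for params, num_samples in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must send the same number of parameters")
        weight = num_samples / total_samples
        # Add this client's weighted contribution to the global model.
        for i, value in enumerate(params):
            global_params[i] += weight * value

    return global_params
```

For example, `federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])` gives the second client three times the influence of the first. Note that the server only ever sees parameter vectors and sample counts, never patient records.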

How we built it

We divided the project into three parts: (1) creating the predictive model that our user interface demonstrates, (2) developing the architecture for federated learning, and (3) creating the user interface. With three team members, each person took one part. We started by building the predictive model, a logistic regression model that predicts heart disease, alongside the code to run federated learning. Next, we integrated the predictive model with the federated learning code so that each hospital runs the same type of model on its own patient data. Finally, we built the user interface and integrated all of our prior code into an interactive display for users.
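
To illustrate what "each hospital runs the same type of model on their patient data" can look like on the client side, here is a minimal sketch of one local training pass for a logistic regression model. It is written in plain Python with gradient descent as a stand-in; the project itself used Scikit-learn and Flower, and this is not the team's code. The names `local_train` and `predict`, the `[bias, w1, ..., wd]` parameter layout, and the default learning rate and epoch count are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_train(
    global_params: Sequence[float],
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.1,
    epochs: int = 50,
) -> List[float]:
    """Train on one hospital's data, starting from the global parameters.

    Parameters are laid out as [bias, w1, ..., wd] (an assumed layout).
    Returns new parameters; the input list is left unchanged.
    """
    params = list(global_params)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(params)
        for x, y in zip(features, labels):
            z = params[0] + sum(w * xi for w, xi in zip(params[1:], x))
            error = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        # Full-batch gradient descent step on the mean gradient.
        params = [p - learning_rate * g / n for p, g in zip(params, grad)]
    return params


def predict(params: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    z = params[0] + sum(w * xi for w, xi in zip(params[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0
```

In a federated round, each client would call something like `local_train` with the parameters it received from the server and send back only the returned list. Because every client uses the same parameter layout, the server can combine the results element by element.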

Challenges we ran into

We originally intended to use a histogram-based gradient boosting classifier (HistGradientBoostingClassifier), but realized it was incompatible with the tool we were using to run federated learning. We addressed this by switching our predictive model to logistic regression and then tuning its hyperparameters to maximize accuracy.

We were originally a group of five people, but two people dropped out midway through the project. We took this in stride and reduced our scope of work so that our prototype was something three people could practically develop.

Accomplishments that we're proud of

One of our goals was to actually implement federated learning, which was our avenue for protecting the privacy of hospitals and their data. After training various models and tuning hyperparameters, we found a good prediction model that exceeded the baseline accuracy we measured with other models. We also developed a UI that presents all of this information in a way that is visually appealing and easy to understand, while giving hospitals a place to enter their data.

What we learned

Throughout our project, we learned to apply federated learning to protect the privacy of patients and of the organizations that hold their data. We found that the best way to split the work was into three categories: model searching and tuning, setting up the server and clients, and building the UI visualization. During the model search, the various models we trained all offered solid accuracy. The HistGradientBoostingClassifier model initially stood out as a high-accuracy option, but it was incompatible with the federated learning framework, so we had to pivot. We prioritized keeping accuracy high while staying compatible with the framework, and we then successfully implemented federated learning in the project we delivered.

What's next for our Project

There are several next steps we would like to explore. One idea is to use more complex models, such as neural networks, or to stack several models, so that more complex relationships can be captured. In this project we used a heart disease dataset, but the same federated learning approach could be extended to other kinds of data, such as X-rays. Another avenue is to integrate a chatbot that allows patients to discuss diagnoses. This could act as a preliminary diagnosis for patients before they even meet with medical providers, which would free up resources. In addition, incorporating feedback from doctors and other clinical professionals could help shape the chatbot toward their medical expertise and assist with integrating this project into hospital systems.

Built With

Languages: Python
Packages/Tools: Flower, Scikit-learn, Pandas, Streamlit, Flask