The amount of data collected grows exponentially every year, and the possibilities that come with it grow too. We decided to apply the machine learning tools available in Python to a critical part of the health industry: cancer diagnosis.

What it does

Our algorithm diagnoses a patient from features of their biopsy lab results. Using cell-level breast cancer data, we trained a learning algorithm that reaches 99% accuracy on our test set. We also worked to reduce the number of false negative diagnoses the algorithm makes, and brought them down to 0.4%.
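As a rough illustration of how the two figures quoted above can be computed, here is a minimal sketch in plain Python. The label encoding (1 = malignant, 0 = benign), the function names, and the toy labels are all assumptions made for the example and are not taken from our project code. The writeup does not say whether the 0.4% is measured against malignant cases only or against the whole test set; the sketch uses the conventional definition, false negatives divided by all truly positive cases.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels.

    Assumed encoding: 1 = malignant (positive), 0 = benign (negative).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed malignant case
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def false_negative_rate(y_true, y_pred):
    """Fraction of truly malignant cases that were predicted benign."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    positives = tp + fn
    return fn / positives if positives else 0.0


# Toy labels for illustration only (not our data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))
print(false_negative_rate(y_true, y_pred))
```

Tracking the false negative rate separately from accuracy matters here because a missed malignant case is the costliest mistake a diagnostic model can make, and a high overall accuracy can hide it.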

How we built it

For data, we used the breast cancer dataset from UCI's Machine Learning Repository. We then used Python and several of its packages to clean and visualize the data. Next, we modeled the data in TensorFlow with three machine learning algorithms: logistic regression, softmax regression, and neural networks. We trained and tested the models on a 60% / 40% split of the data.
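To make the split-then-train workflow concrete, here is a minimal sketch of a 60% / 40% split followed by logistic regression trained with gradient descent. It is not our TensorFlow code: it uses NumPy so that it runs without extra setup, the data is a synthetic two-cluster stand-in rather than the UCI dataset, and every name and hyperparameter (split_60_40, lr, epochs, threshold) is an assumption chosen for illustration.

```python
import numpy as np


def split_60_40(X, y, seed=0):
    """Shuffle the rows, then use the first 60% for training and the rest for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(0.6 * len(X))
    train, test = order[:cut], order[cut:]
    return X[train], y[train], X[test], y[test]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a sample 1 when its predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Synthetic stand-in data: two Gaussian clusters, NOT the UCI breast cancer data.
rng = np.random.default_rng(42)
n = 200
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(n // 2, 2)),  # class 0
    rng.normal(2.0, 1.0, size=(n // 2, 2)),   # class 1
])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

X_train, y_train, X_test, y_test = split_60_40(X, y)
w, b = train_logistic_regression(X_train, y_train)
test_accuracy = float(np.mean(predict(X_test, w, b) == y_test))
print(len(X_train), len(X_test), test_accuracy)
```

One design note: the threshold argument in predict is a knob that could be lowered to trade false negatives for false positives. Whether our models were tuned this way is not stated here, so treat it as one possible approach rather than a description of what we did.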

Challenges we ran into

The breast cancer dataset that we used contained only 539 instances. At the beginning, we had hoped for a larger dataset that could train a more sophisticated model. Instead, we had to make do with a smaller model, but we still managed to achieve great results.

Accomplishments that we're proud of

Tate and I are both incredibly proud of ourselves for coming this far at all. This is the first hackathon at which either of us has submitted a project. Furthermore, neither of us had attempted a project in this field before, and we found that our respective knowledge of machine learning and TensorFlow complemented each other and pushed us both to a new level.

What we learned

Throughout TreeHacks, we experienced the effects of extreme sleep deprivation, poor diet, and high strain. We vow to pack acai bowls and an air mattress for the next hackathon we go to. Jokes aside, we threw ourselves into the deep end by analyzing and modeling learning algorithms in TensorFlow with little prior experience. We also went beyond the typical matplotlib visuals in Python and experimented with Seaborn for next-level visualizations.

What's next for Breast Cancer Classifier

We look to expand to bigger datasets.
