I wanted to encourage people to get screened for cancer earlier than they think they need to. The idea is that predicting the age at which someone's cancer was diagnosed gives a reference point for extending the predictor to people who do not have cancer but carry some number N of mutations. Given their rate of mutation, it could prompt them to get screened earlier so they can be treated earlier.

What it does

Given a genome sequence, it predicts the age at which someone was diagnosed with cancer.

How I built it

Using PyTorch, I built several different models (an RNN, a feed-forward neural network, and logistic and linear classifiers) to see how each would behave on the TCGA dataset.
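As a rough illustration of the feed-forward variant, here is a minimal PyTorch sketch. It is not the project's actual code. It assumes each sample has already been turned into a fixed-length feature vector and that the diagnosis age has been grouped into bins. The names `FeedForwardAgeModel`, `NUM_FEATURES`, and `NUM_AGE_BINS`, the layer sizes, and the random placeholder data are all hypothetical.

```python
import torch
from torch import nn

NUM_FEATURES = 100   # hypothetical length of the per-sample feature vector
NUM_AGE_BINS = 5     # hypothetical number of diagnosis-age bins


class FeedForwardAgeModel(nn.Module):
    """One hidden layer that maps a feature vector to age-bin logits."""

    def __init__(self, num_features: int, num_bins: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_bins),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


torch.manual_seed(0)
# Placeholder data standing in for TCGA-derived features and labels.
features = torch.rand(8, NUM_FEATURES)
age_bins = torch.randint(0, NUM_AGE_BINS, (8,))

model = FeedForwardAgeModel(NUM_FEATURES, NUM_AGE_BINS)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A few training steps, just to show the shape of the loop.
for _ in range(5):
    optimizer.zero_grad()
    logits = model(features)
    loss = loss_fn(logits, age_bins)
    loss.backward()
    optimizer.step()

predicted_bins = model(features).argmax(dim=1)
```

Dropping the hidden layer and the ReLU leaves a single `nn.Linear`, which with the same loss behaves like a multinomial logistic classifier, so one training loop can be reused to compare the simpler models.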

Challenges I ran into

Debugging PyTorch and getting the node sizes right so the network would pool down correctly. In hindsight I should have used a CNN: about 99 percent of DNA is the same from one genome to the next, so a convolution could filter out the shared parts. Reading in the data from TCGA was also a challenge.
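The CNN idea could look something like the sketch below. This was not built for the project. It assumes the input is a raw DNA string that gets one-hot encoded over the four bases. The helper `one_hot`, the class `ConvEncoder`, and the channel and kernel sizes are hypothetical. The adaptive pooling layer is one way around the sizing problem, because it always pools down to a fixed-size output whatever the sequence length.

```python
import torch
from torch import nn

BASES = "ACGT"


def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a (4, length) tensor; unknown bases stay all zero."""
    out = torch.zeros(len(BASES), len(seq))
    for position, base in enumerate(seq.upper()):
        row = BASES.find(base)
        if row >= 0:
            out[row, position] = 1.0
    return out


class ConvEncoder(nn.Module):
    """1D convolution over the sequence, then pooling to a fixed-size vector."""

    def __init__(self, channels: int = 8, kernel_size: int = 5):
        super().__init__()
        # Padding keeps the output length equal to the input length.
        self.conv = nn.Conv1d(len(BASES), channels, kernel_size,
                              padding=kernel_size // 2)
        # Pools every channel down to one value, whatever the input length.
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(self.conv(x))     # (batch, channels, length)
        return self.pool(hidden).squeeze(-1)  # (batch, channels)


# Two toy sequences that differ at a single position.
batch = torch.stack([one_hot("ACGTACGTAC"), one_hot("ACGTTCGTAC")])
encoder = ConvEncoder()
summary = encoder(batch)
```

The resulting `summary` vector could then feed a small classifier head. Whether learned filters would actually suppress the shared regions on TCGA data is untested here.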

Accomplishments that I'm proud of

It works with 80 percent accuracy.

What I learned

A lot of ML

What's next for Cancer Predictor

Use it to predict ahead of time, before a diagnosis, instead of relying on data that already has labels.
