The effectiveness of a cancer treatment depends heavily on the tumour's genome and protein sequences. As technologies such as DNA sequencing improve (advanced, for example, by the Oxford spinout Oxford Nanopore Technologies), it is becoming easier to gather genomic and proteomic data on cancer. We wanted to find a way to process this data to gain further insight into cancer genomics (e.g. how the data relates to cancer type) and to help doctors easily identify the best treatment for a patient.
What it does
CancerPred explores the possibility of classifying cancer histology and tissue of origin from mutations and protein occurrences.
How I built it
Azure Machine Learning and Python.
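To illustrate the kind of input a classifier like this needs, here is a minimal sketch of turning per-sample mutation lists into fixed-length occurrence vectors. It is not the project's actual code: the function names `build_vocabulary` and `encode`, the toy samples, and the choice of a binary (present/absent) encoding are all assumptions made for illustration.

```python
def build_vocabulary(samples):
    """Return a sorted list of every gene name seen across all samples."""
    return sorted({gene for genes in samples for gene in genes})


def encode(genes, vocabulary):
    """Binary occurrence vector: 1 if the gene is mutated in the sample, else 0."""
    present = set(genes)
    return [1 if gene in present else 0 for gene in vocabulary]


# Hypothetical toy data: each inner list names the mutated genes in one sample.
# These are illustrative values only, not data from the project.
samples = [
    ["TP53", "KRAS"],
    ["BRCA1", "TP53"],
    ["KRAS"],
]

vocabulary = build_vocabulary(samples)
features = [encode(sample, vocabulary) for sample in samples]
```

Each row of `features` has one column per gene in `vocabulary`, so rows of this shape could be paired with histology or tissue labels and passed to a model. Genes that are not in the vocabulary are simply ignored by `encode`.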
Challenges I ran into
The data was not easily accessible and was not well formatted. There were many variables to consider and many assumptions we could have made.
Accomplishments that I'm proud of
What I learned
A biochemist and a materials scientist switched roles with two computer scientists, and each learned about the other team members' disciplines, including data analysis, web scraping, protein sequences in DNA, protein mutations, and types of cancer and mutations.
What's next for CancerPred
Gather much more data to improve the prediction accuracy of the machine learning model (a neural network), and make the Python scripts more robust. Finish the web interface so that a protein sequence, or even an entire genome sequence, can be entered easily.
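One possible step towards more robust scripts is to validate a protein sequence before it reaches the model. The sketch below is an assumption about how that could look, not part of the existing code: `clean_protein_sequence` is a hypothetical helper, and it accepts only the 20 standard one-letter amino acid codes.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw):
    """Strip whitespace, upper-case the input, and reject non-standard residues."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty protein sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError("unexpected characters: " + ", ".join(invalid))
    return sequence
```

Rejecting ambiguity codes such as `X` or `B` is a deliberate simplification here; a real interface might choose to accept them.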