Inspiration

Recent spikes in influenza cases in the United States, along with science's ongoing fight against rapidly mutating and evolving viruses, inspired us to take on the problem of influenza vaccine production and pursue an algorithmic, computational solution.

What it does

Using genomic data on hundreds of influenza strains from the past 30 years, our algorithms tackle the problem of vaccine production in two ways. First, our statistical analysis of particular strains over time enables us to identify common mutation sites in a given influenza strain from year to year. This tool helps researchers visualize the changes occurring in the genetic sequences of these viruses. By understanding which mutations change the virus the most, researchers can build on previous vaccines to develop more precisely targeted vaccine formulations.
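The year-over-year mutation analysis can be illustrated with a minimal sketch. It assumes the sequences for one strain have already been aligned to a common length (with `-` marking gaps) and simply counts how often each position changes between consecutive years. The function name `mutation_sites` and the toy sequences are hypothetical, not taken from the project's code or from real NCBI records.

```python
from collections import Counter


def mutation_sites(seqs_by_year: dict[int, str]) -> Counter:
    """Count how often each alignment position changes between consecutive years.

    Assumes every sequence is already aligned to the same length, with '-' for gaps.
    """
    years = sorted(seqs_by_year)
    if len({len(seqs_by_year[yr]) for yr in years}) > 1:
        raise ValueError("sequences must be aligned to a common length")
    counts: Counter = Counter()
    for prev_year, next_year in zip(years, years[1:]):
        old, new = seqs_by_year[prev_year], seqs_by_year[next_year]
        for pos, (before, after) in enumerate(zip(old, new)):
            # Only count substitutions; positions involving a gap are skipped.
            if before != after and "-" not in (before, after):
                counts[pos] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sequences, not real NCBI records.
    toy = {
        2007: "ATGAAGGCA",
        2008: "ATGAAGACA",
        2009: "ATGCAGACA",
        2010: "ATGCAGGCA",
    }
    for pos, n in mutation_sites(toy).most_common():
        print(f"position {pos}: changed in {n} of {len(toy) - 1} transitions")
```

Positions with the highest counts are the candidate "common sites of mutation"; a real pipeline would also need to handle insertions and deletions more carefully than simply skipping gapped positions.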

The second tool is a predictive machine learning model that uses logistic regression to help government officials and vaccine developers predict the transmissibility and other epidemiological characteristics of a particular viral strain over the next few years. Drawing on historical data from previous influenza pandemics, the tool predicts mutations that could significantly alter the disease characteristics of influenza in a given year. In particular, it leverages large sets of strain genomic data, with samples from over 37 countries, collected in the lead-up to the 2009 global flu pandemic.
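To show the mechanics of logistic regression, here is a dependency-free sketch that fits weights by full-batch gradient descent on the mean log loss. The team reports using scikit-learn, whose `LogisticRegression` would be the idiomatic choice in practice; this version is only meant to make the model transparent. The features (substitution counts) and the binary "high-transmission" label are hypothetical stand-ins, not the project's actual inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that sample x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 5000):
    """Fit weights and bias by full-batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log loss with respect to the logit: p - y
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


if __name__ == "__main__":
    # Hypothetical features per strain: [substitutions at antigenic sites, substitutions elsewhere]
    # Hypothetical label: 1 if the following season was flagged as high-transmission.
    X = [[0, 1], [1, 0], [1, 2], [4, 1], [5, 3], [6, 2]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic(X, y)
    print(f"P(high transmission | [5, 1]) = {predict_proba(w, b, [5, 1]):.3f}")
```

The output is a probability rather than a hard yes/no, which suits the stated goal of estimating outbreak likelihood; with real data, features would typically be standardized and the model evaluated on held-out seasons.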

How we built it

Python scripts were written to parse public influenza genome data from NCBI. Pairwise sequence alignment algorithms were used to compare successive sequences of a strain over time. Scikit-learn and NumPy were used to develop and train the predictive model. JavaScript and HTML were used to build our website and research database. Various graphics components were used for data visualization.
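The write-up does not name the alignment algorithm. As one standard option, here is a minimal sketch of Needleman-Wunsch global alignment by dynamic programming; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not the project's parameters.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a: list[str] = []
    out_b: list[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, s = needleman_wunsch("ATGAAGGCA", "ATGAAGACA")
    print(top)
    print(bottom)
    print("score:", s)
```

This runs in O(len(a) x len(b)) time and memory, which is fine for single influenza gene segments; for larger workloads, Biopython's `Bio.Align.PairwiseAligner` provides an optimized implementation.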

Challenges we ran into

Parsing and statistically processing data from the NCBI government database was probably the most difficult part of our hack. We also had trouble developing the machine learning model and identifying the most accurate dataset for projecting future outbreak probabilities.

Accomplishments that we're proud of

Developed an original algorithm for rigorously processing and statistically analyzing genomic data from NCBI. Developed an interactive user interface that serves as the foundation for a research database.

What we learned

Parsing data and retrieving it through REST APIs. Formatting and data visualization for the web. Developing a machine learning model (logistic regression).
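The write-up does not say which REST API supplied the data. Assuming it was NCBI's E-utilities `efetch` endpoint returning FASTA text from the `nuccore` database, a minimal sketch of building the request URL and parsing the response might look like this. The helper names are hypothetical and the accession IDs in the example are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions: list[str]) -> str:
    """Build one efetch request for several nucleotide records in FASTA format."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's first header token (usually the accession) to its sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # drop the free-text description
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # normalize case across sources
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    print(efetch_fasta_url(["PLACEHOLDER_1", "PLACEHOLDER_2"]))
    print(parse_fasta(">example_1 toy record\nATGAAG\nGCA\n"))
```

To actually download records, the URL can be passed to `urllib.request.urlopen` and the decoded response handed to `parse_fasta`. NCBI rate-limits E-utilities requests, so batching many accessions into one request is generally preferable to one request per record.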

What's next for Flulytics

Our next step is to build out the platform more extensively. One goal is to integrate more closely with the NCBI genomic database and extract key statistics that would be useful to researchers. Another is to retrain our model continuously as new data comes in.

Future Business Models

Model 1: Business to Business: One model we could pursue is technology licensing to vaccine producers and pharmaceutical companies. By helping them optimize their production lines and improve profit in the short and long term, our technology would give these companies the tools to grow.

Model 2: Business to Government: Another potential business model we are exploring is contracting with the United States government or other government entities. By helping agencies such as the CDC and NIH predict the severity of future outbreaks, our technology could help them avert major public health crises and prepare adequately for future outbreaks.
