Inspiration
Last summer I was introduced to a conservation effort for an endangered species of bird, the Roseate Tern (Sterna dougallii). I had briefly done research and prototyped some ideas, but decided this hackathon would be a perfect place to restart, collaborate with people interested in the project, and tackle the challenge head-on.
What it does
The overarching goal of this application is to take genetic samples from different birds of the same species and locate sections of their DNA that are unique to specific geographic origins, to assist researchers studying migration patterns. (Basically 23andMe for a specific endangered species of bird.)
The application we built during the hackathon takes raw text sourced from the National Center for Biotechnology Information (NCBI) website and parses it into a .csv file using Python and regular expressions. This structured format allowed me to use Excel to select 3 genetic sequences out of 112. Each of these sequences was saved as a FASTA file, which contains metadata about the sequence as well as a section of the genetic sequence from different organisms. Using the SeqIO (Sequence Input/Output) module of the Biopython library, I split each FASTA file into two FASTA files. This allowed me to import the FASTA files into ClustalX2, where I could begin aligning the genetic sequences in order to compare them.
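The parsing step could look something like the sketch below. It is not the code from the hackathon: the header layout (accession, then a two-word organism name, then a free-text description), the `HEADER_RE` pattern, the function names, and the sample records are all assumptions made up for illustration, and real NCBI text would likely need a different pattern.

```python
import csv
import io
import re

# Assumed (hypothetical) header layout: ">ACCESSION Genus species description"
HEADER_RE = re.compile(
    r"^>(?P<accession>\S+)\s+"
    r"(?P<organism>[A-Z][a-z]+ [a-z]+)\s+"
    r"(?P<description>.+)$"
)


def parse_headers(raw_text):
    """Return one dict per line that matches the assumed header layout."""
    rows = []
    for line in raw_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def write_csv(rows, handle):
    """Write the parsed rows as CSV with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=["accession", "organism", "description"]
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample text; the accessions and sequences are not real records.
SAMPLE = """>XX000001.1 Sterna dougallii isolate A control region, partial sequence
ACGTACGT
>XX000002.1 Sterna dougallii isolate B cytochrome b gene, partial cds
ACGTTCGT
"""

rows = parse_headers(SAMPLE)
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead of an in-memory buffer only requires passing `open("sequences.csv", "w", newline="")` as the handle (the file name is a placeholder). The resulting file opens directly in Excel for filtering.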
How we built it
We broke down the application into several stages:
- Source the genetic data
- Align/merge the different fragments of genetic data into one large file
- Identify where SNPs (Single Nucleotide Polymorphisms) occur across the different organisms
- Cross-reference the SNPs with geographic location to identify which SNPs are specific to a subset of geographic regions
- Create a GUI application that allows researchers to detect geographic origin by uploading a genetic sequence
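The SNP and cross-referencing stages can be sketched in plain Python once the sequences are aligned. This is a toy illustration under stated assumptions, not what we implemented: the function names, sample names, region labels, and sequences are invented, and it assumes the alignment tool has already padded every sequence to the same length with `-` for gaps.

```python
from collections import defaultdict

# Characters treated as "no call" rather than as a real base (assumption).
IGNORED = {"-", "N"}


def find_snp_columns(aligned):
    """Return alignment positions where more than one real base appears.

    aligned: dict mapping sample name -> aligned sequence of equal length.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(lengths.pop()):
        bases = {seq[pos] for seq in aligned.values()} - IGNORED
        if len(bases) > 1:
            snps.append(pos)
    return snps


def region_specific_alleles(aligned, regions, snp_positions):
    """Return (position, base, region) for bases seen in only one region."""
    result = []
    for pos in snp_positions:
        regions_by_base = defaultdict(set)
        for name, seq in aligned.items():
            base = seq[pos]
            if base not in IGNORED:
                regions_by_base[base].add(regions[name])
        for base in sorted(regions_by_base):
            seen_in = regions_by_base[base]
            if len(seen_in) == 1:
                result.append((pos, base, next(iter(seen_in))))
    return result


# Invented example data: four samples from two hypothetical regions.
aligned = {
    "bird1": "ACGTAC",
    "bird2": "ACGTGC",
    "bird3": "ACCTAC",
    "bird4": "ACCTA-",
}
regions = {"bird1": "north", "bird2": "north", "bird3": "south", "bird4": "south"}

snps = find_snp_columns(aligned)
private = region_specific_alleles(aligned, regions, snps)
```

With only a handful of samples, a base that shows up in a single region may just be chance, so any real analysis would need far more birds per region before treating an allele as a geographic marker.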
The first few hours of the hackathon went to research, since our biological knowledge was quite limited. We used my university's website to collect a list of reading materials that could help us make progress on the application, and Eli and I read through them to get started. For the remainder of the time we tried to work through the stages listed above.
Challenges we ran into
The field of bioinformatics is virtually brand new to me, so there was a lot of learning and going down rabbit holes: downloading software that we didn't know how to install or use, reading texts and documentation full of technical knowledge that went over my head, and overall trying not to get discouraged when we weren't making the progress we had hoped for on the application.
Accomplishments that we're proud of
I stuck with the project from the start to the end of the hackathon. We got involved in the extracurricular activities hosted by WesHack, partly because I was working in the same room where the events were taking place. We were able to meet a lot of new people and watch some fellow Eastern CT State University students participate in their first hackathon.
What we learned
It's okay to try things that don't end up working out. There's a quote that kept recurring in my mind while working on this project: "If you don't succeed at first, change your definition of success."
What's next for CUT & SNP
I will keep working on this project, and we hope this application opens up the possibilities of what researchers can do with a little bit of Python knowledge and a whole lot of coffee.
Built With
- biopython
- clustalx
- python