We came up with this idea when we tried running an MSA (multiple sequence alignment) tool on 64 genomes: it ran for more than a couple of hours and then crashed my computer. We realized that many people working with sequence alignment will run into the same problem, so our plan is to build a faster tool that trades some accuracy for speed.
What it does
We start by generating a population of candidate alignments at random. We then calculate the fitness of each candidate to see which combination of gapped sequences gives the most accurate alignment. We keep the best few candidates and apply mutations and crossovers to them. This produces the members of the next generation, and we repeat the process until we reach a decent result.
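A minimal sketch of that loop in plain Python follows. It assumes a sum-of-pairs fitness (match +1, mismatch -1, gap -2), truncation selection that keeps the top `elite` alignments, row-wise crossover, and a mutation that moves a single gap. These operators, their parameters, and every function name here are illustrative choices, not necessarily what GAMSA itself uses.

```python
import random

GAP = "-"


def random_alignment(seqs, extra, rng):
    """Pad every sequence to a common length by inserting single gaps at uniformly random positions."""
    target = max(len(s) for s in seqs) + extra
    rows = []
    for s in seqs:
        row = list(s)
        while len(row) < target:
            row.insert(rng.randint(0, len(row)), GAP)
        rows.append("".join(row))
    return rows


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Fitness (assumed scoring): sum-of-pairs over all columns; gap-gap pairs score 0."""
    score = 0
    for col in zip(*alignment):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == GAP and b == GAP:
                    continue
                if a == GAP or b == GAP:
                    score += gap
                elif a == b:
                    score += match
                else:
                    score += mismatch
    return score


def mutate(alignment, rng):
    """Move one gap in one randomly chosen row; residue order and row length are preserved."""
    rows = list(alignment)
    i = rng.randrange(len(rows))
    row = list(rows[i])
    gap_positions = [k for k, c in enumerate(row) if c == GAP]
    if gap_positions:
        row.pop(rng.choice(gap_positions))
        row.insert(rng.randint(0, len(row)), GAP)
        rows[i] = "".join(row)
    return rows


def crossover(parent_a, parent_b, rng):
    """Row-wise crossover: each sequence's gapped row is copied from one parent or the other."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]


def evolve(seqs, pop_size=50, generations=100, elite=10, extra=2, seed=0):
    """Hypothetical GA driver: random initial pool, then selection, crossover and mutation."""
    rng = random.Random(seed)
    population = [random_alignment(seqs, extra, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fittest `elite` alignments survive unchanged (elitism).
        population.sort(key=sum_of_pairs, reverse=True)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < 0.5:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    return max(population, key=sum_of_pairs)
```

For example, `evolve(["ACGTTGCA", "ACGTGCA", "AGTTGCA"])` returns three gapped rows of equal length. Because the elite are carried over unchanged, the best score never decreases between generations; the trade-off is that the search can stall on a local optimum, which is where the accuracy sacrifice comes from.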
How we built it
We built it using Python and the Biopython library, and we got our data from NCBI Datasets.
Challenges we ran into
Our original idea was to create a visualization tool for aligned DNA sequences. However, when we tried aligning our sequences with existing MSA tools such as MUSCLE or MAFFT, they ran for a very long time and never finished. At the last minute, we changed our idea to focus on making a faster MSA tool.
What we learned
This was our first time working with DNA sequence datasets and Biopython. We learned a lot about what bioinformatics work involves and the problems that people in the field face.
What's next for Using Genetic Algorithms to Solve MSA
Next for GAMSA, we need to add constraints so that the insertion of gaps is not completely random. Currently, the chance of finding a good alignment by randomly generating the initial pool is very slim, so that is something we will work on in the future.
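One possible constraint, sketched here under our own assumptions rather than as a finished design: instead of scattering single gaps uniformly, pad each row with at most `max_blocks` contiguous gap runs, since real insertions and deletions usually span several bases. The function `constrained_alignment` and its `max_blocks` parameter are hypothetical.

```python
import random

GAP = "-"


def constrained_alignment(seqs, extra, rng, max_blocks=2):
    """Pad each sequence to a common length using at most `max_blocks`
    contiguous gap runs per row, instead of scattering single gaps."""
    target = max(len(s) for s in seqs) + extra
    rows = []
    for s in seqs:
        missing = target - len(s)
        if missing == 0:
            rows.append(s)
            continue
        n_blocks = min(max_blocks, missing)
        # Split the missing gaps into n_blocks runs, each of positive length.
        cuts = sorted(rng.sample(range(1, missing), n_blocks - 1))
        sizes = [hi - lo for lo, hi in zip([0] + cuts, cuts + [missing])]
        # Choose where in the ungapped sequence each run goes; runs that land
        # at the same position merge, so a row never has more than n_blocks runs.
        positions = sorted(rng.randint(0, len(s)) for _ in range(n_blocks))
        pieces, prev = [], 0
        for pos, size in zip(positions, sizes):
            pieces.append(s[prev:pos])
            pieces.append(GAP * size)
            prev = pos
        pieces.append(s[prev:])
        rows.append("".join(pieces))
    return rows
```

With `max_blocks=1`, every row gets a single gap run, which amounts to assuming one indel per sequence; larger values loosen the constraint back toward fully random gap placement.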