Inspiration

HaploRecstr is a haplotype reconstruction program written in C++. Its method is based on the algorithm introduced by Rastas et al. [Rastas, P., Koivisto, M. et al. (2005). Algorithms in Bioinformatics: 5th International Workshop. Berlin, Heidelberg: Springer. 145-151]

What it does

The program uses a Hidden Markov Model (HMM) to reconstruct haplotypes. By default, the model is initialized by scanning the data to select the major alleles and assign initial parameter values. The EM algorithm is then used to maximize the likelihood function, and the Viterbi algorithm is used to reconstruct the haplotype data.
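The decoding step can be illustrated with a generic log-space Viterbi routine for a discrete HMM. This is only a minimal sketch, not HaploRecstr's actual code: the function name `viterbiPath` and the three table parameters are hypothetical, and the model in Rastas et al. has its own state and emission structure that is not reproduced here.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Generic log-space Viterbi decoder (illustrative; names are hypothetical).
//   logInit[s]     : log-probability of starting in state s
//   logTrans[s][t] : log-probability of moving from state s to state t
//   logEmit[i][s]  : log-probability of the observation at position i in state s
// Returns the most probable state path, one state index per position.
std::vector<std::size_t> viterbiPath(
    const std::vector<double>& logInit,
    const std::vector<std::vector<double>>& logTrans,
    const std::vector<std::vector<double>>& logEmit)
{
    const std::size_t n = logEmit.size();   // number of positions
    const std::size_t k = logInit.size();   // number of hidden states
    if (n == 0 || k == 0) return {};

    const double negInf = -std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> score(n, std::vector<double>(k, negInf));
    std::vector<std::vector<std::size_t>> back(n, std::vector<std::size_t>(k, 0));

    // Initialization at the first position.
    for (std::size_t s = 0; s < k; ++s)
        score[0][s] = logInit[s] + logEmit[0][s];

    // Recursion: keep the best predecessor for every state at every position.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t t = 0; t < k; ++t) {
            double best = negInf;
            std::size_t arg = 0;
            for (std::size_t s = 0; s < k; ++s) {
                const double cand = score[i - 1][s] + logTrans[s][t];
                if (cand > best) { best = cand; arg = s; }
            }
            score[i][t] = best + logEmit[i][t];
            back[i][t] = arg;
        }
    }

    // Termination: pick the best final state, then follow the back-pointers.
    std::size_t last = 0;
    for (std::size_t s = 1; s < k; ++s)
        if (score[n - 1][s] > score[n - 1][last]) last = s;

    std::vector<std::size_t> path(n);
    path[n - 1] = last;
    for (std::size_t i = n - 1; i > 0; --i)
        path[i - 1] = back[i][path[i]];
    return path;
}
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence of markers.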
The program takes genotype data as input. It produces two outputs: 1) a set of reconstructed haplotypes; 2) a summary of the frequencies of all possible haplotypes in the population, sorted in descending order.
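The second output can be sketched as follows. This is an illustrative assumption rather than the program's real code: `haplotypeFrequencies` is a hypothetical helper, haplotypes are assumed to be stored as strings of allele characters, and frequencies are simple relative counts over the reconstructed haplotypes (the program itself may derive them from the fitted model instead).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tally reconstructed haplotypes and return
// (haplotype, relative frequency) pairs sorted by descending frequency.
std::vector<std::pair<std::string, double>> haplotypeFrequencies(
    const std::vector<std::string>& haplotypes)
{
    typedef std::pair<std::string, double> Entry;

    // Count how often each distinct haplotype occurs.
    std::map<std::string, std::size_t> counts;
    for (const std::string& h : haplotypes)
        ++counts[h];

    // Convert counts to relative frequencies.
    std::vector<Entry> freqs;
    freqs.reserve(counts.size());
    for (const auto& kv : counts)
        freqs.push_back(Entry(kv.first,
                              static_cast<double>(kv.second) / haplotypes.size()));

    // Most frequent first; stable_sort keeps ties in lexicographic order.
    std::stable_sort(freqs.begin(), freqs.end(),
                     [](const Entry& a, const Entry& b) { return a.second > b.second; });
    return freqs;
}
```

Each individual contributes two haplotypes, so the input vector would hold twice as many entries as there are genotyped individuals.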

How I built it

C++

Challenges

Manipulating 3-dimensional matrices and keeping run time manageable on large datasets.
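One common way to make 3-dimensional tables cheaper to traverse is to store them in a single contiguous buffer instead of nested vectors. The class below is a sketch of that idea under stated assumptions: `Array3D` is a hypothetical name, and what the three dimensions represent in the program is not specified here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 3-D array stored in one contiguous, row-major buffer.
class Array3D {
public:
    Array3D(std::size_t d0, std::size_t d1, std::size_t d2, double init = 0.0)
        : d1_(d1), d2_(d2), data_(d0 * d1 * d2, init) {}

    // Element access: the last index varies fastest in memory.
    double& at(std::size_t i, std::size_t j, std::size_t k) {
        return data_[(i * d1_ + j) * d2_ + k];
    }
    const double& at(std::size_t i, std::size_t j, std::size_t k) const {
        return data_[(i * d1_ + j) * d2_ + k];
    }

    // Total number of stored elements.
    std::size_t size() const { return data_.size(); }

private:
    std::size_t d1_;
    std::size_t d2_;
    std::vector<double> data_;
};
```

A single allocation keeps neighbouring elements close in memory, so loops that iterate with the last index innermost tend to be more cache-friendly than loops over a vector of vectors of vectors.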

Accomplishments that I'm proud of

It's our first attempt to implement the EM algorithm for an HMM.

What's next for HapRecstr

Improve phasing accuracy and reduce memory usage.
