I saw the Genome Link competition and thought that playing with it could teach me something about using genome information. My sister is editor for Genome magazine, and I had chatted with her about new apps in this area. I'm also always looking for ways to broaden my experience with Alexa skills, and thought this would be a good opportunity to write one in Python instead of JavaScript, which Amazon supports with its Node.js SDK.

What it does

The program is an Alexa skill that is meant to connect to Genome Link to access genome data for different users. It gathers reports on all the personality traits and stores them in an Amazon DynamoDB table along with a name supplied by the user. The user can then select pairs of saved genome reports to compare, looking for matches in high or low levels of traits.
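As a rough illustration of the comparison step, here is a minimal sketch. It assumes each saved report has been reduced to a dict mapping a trait name to an integer score on a 0 to 4 scale; the thresholds, the function names, and the sample data are hypothetical and are not taken from the skill's actual code.

```python
from typing import Dict, List

# Hypothetical cut-offs on an assumed 0-4 score scale.
LOW_MAX = 1
HIGH_MIN = 3


def level(score: int) -> str:
    """Bucket a numeric score into 'low', 'intermediate', or 'high'."""
    if score <= LOW_MAX:
        return "low"
    if score >= HIGH_MIN:
        return "high"
    return "intermediate"


def matching_traits(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, List[str]]:
    """Return the traits on which both reports are high, or both are low."""
    matches: Dict[str, List[str]] = {"high": [], "low": []}
    # Only traits present in both reports can be compared.
    for trait in sorted(set(a) & set(b)):
        bucket = level(a[trait])
        if bucket != "intermediate" and bucket == level(b[trait]):
            matches[bucket].append(trait)
    return matches


# Made-up sample reports for two saved names.
alice = {"agreeableness": 4, "neuroticism": 0, "openness": 2, "extraversion": 3}
bob = {"agreeableness": 3, "neuroticism": 1, "openness": 2, "extraversion": 0}

print(matching_traits(alice, bob))
```

Intermediate scores are ignored here on purpose, since the comparison is only about shared high or shared low levels.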

How I built it

Coded in Python in the Atom editor. Did local testing with the standard unittest module, then uploaded to AWS Lambda and used the new Alexa Simulator in the developer console for testing, monitoring output in the CloudWatch logs.

Challenges I ran into

It took me a while to get OAuth working between the Amazon Alexa app and the Genome Link servers. Adam Jones (@domdomegg) had useful instructions on his GitHub, although I still couldn't get it working until a forum discussion on the last day made me realize there was a discrepancy in the OAuth scope. That is fixed, but now I'm getting unusual tokens and the SessionEndedRequest isn't firing. Sigh.

Accomplishments that I'm proud of

The program basically works. I didn't have time to go through Amazon certification, which means this won't be available in the store, but I can make anyone a beta tester if they want to launch it.

What I learned

A lot about OAuth, including the differences between implicit grant and code grant, both of which the Alexa app can request, although Genome Link only takes the latter (I presume for increased security). I also learned about some of the GWAS studies they use as the basis for their work, which give some insight into how their algorithms are derived and the accuracy/inaccuracy of some of their descriptors. Being familiar with the Big Five model, for example, I felt I needed to rephrase some of the automated summary text in the reports that come out of their API.
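To make the two grant types concrete, here is a small sketch of how the initial authorization request differs between them under OAuth 2.0 (RFC 6749): the code grant asks for `response_type=code` and later exchanges that code for a token server-side, while the implicit grant asks for `response_type=token` and receives the token directly in the redirect. The endpoint URL, client ID, redirect URI, and scope string below are all placeholders, not Genome Link's or Amazon's real values.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint; substitute the provider's real authorize URL.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"

# response_type values defined by RFC 6749 for each grant type.
RESPONSE_TYPES = {"code": "code", "implicit": "token"}


def authorization_url(grant: str, client_id: str, redirect_uri: str,
                      scope: str, state: str) -> str:
    """Build the URL the user is sent to in order to approve access."""
    query = urlencode({
        "response_type": RESPONSE_TYPES[grant],
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,   # must match what the provider expects
        "state": state,   # opaque value echoed back to guard against CSRF
    })
    return f"{AUTHORIZE_URL}?{query}"


# Hypothetical values for illustration only.
url = authorization_url(
    grant="code",
    client_id="my-client-id",
    redirect_uri="https://example.com/callback",
    scope="report:example-trait-a report:example-trait-b",
    state="abc123",
)
params = parse_qs(urlparse(url).query)
print(params["response_type"], params["scope"])
```

A mismatch in the `scope` value between what the client sends and what the provider has registered is exactly the kind of discrepancy that can make the whole flow fail.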

My medical training makes me a little leery of over-interpreting the meaning behind some of these association studies, and I'm still not sure of the best way to make sure the general public sufficiently understands their limitations. Still, this information will be used in many different ways, and I think most people recognize that.

What's next for Genome Match

Pass certification (and add an icon!). Once through those basics, consider improvements like:

  • figure out if there is a better way to get people to log in and out when collecting multiple OAuth tokens for multiple data sets, maybe by resetting the state somehow using the API gateway?
  • add more types of reports (probably the 'intelligence' and 'sports' sets next)
  • break down long lists into shorter categories
  • add more variation in the language and niceties like proper pluralization
  • give the warning information that is sent with the report summary
  • set up for other locales like UK, Canada, India
  • check data for duplicates if the same token is used
  • add a rename function for data sets
  • add a progressive response message during data report download
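
For the pluralization item in the list above, a helper along these lines could build the spoken sentence. This is only a sketch: the function name and the wording of the output are hypothetical, not what the skill currently says.

```python
from typing import List


def match_sentence(level: str, traits: List[str]) -> str:
    """Build a spoken sentence with correct singular/plural wording."""
    count = len(traits)
    if count == 0:
        return f"You have no matching {level} traits."
    noun = "trait" if count == 1 else "traits"
    if count == 1:
        listed = traits[0]
    elif count == 2:
        listed = " and ".join(traits)
    else:
        # Three or more items: comma-separated with a final "and".
        listed = ", ".join(traits[:-1]) + ", and " + traits[-1]
    return f"You match on {count} {level} {noun}: {listed}."


print(match_sentence("high", ["openness"]))
print(match_sentence("low", ["neuroticism", "extraversion", "anger"]))
```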