Ultra-high throughput processing of SRA data on AWS


About our project

Big-data analysis of RNA-sequencing data to gain insights for developing vaccines and drugs against the spread of the coronavirus.

Where does SARS-CoV-2 come from?

The novel coronavirus, SARS-CoV-2, is believed to have emerged through zoonotic transfer from bats and pangolins to humans at a wet market in Wuhan, China, in 2019.

Impact of the pandemic

The resulting pandemic has infected millions and has already crippled the global economy. While there is an intense research effort to sequence virus isolates to understand the evolution of the virus in real time, our understanding of where it originated is limited by the sparse characterization of other members of the Coronaviridae family.

What we are doing

We are re-analyzing all RNA-sequencing data in the NCBI Sequence Read Archive (SRA) to discover new members of Coronaviridae. Our initial focus is to re-analyze all vertebrate and metatranscriptomic RNA-sequencing libraries: more than 1.1 million samples, or 5,720,000 gigabytes (about 5.7 petabytes) of data.

Why is this data so important?

The resulting Coronaviridae sequence database will be the definitive characterization of this viral family. This free and public database will assist the global research effort by providing the deepest evolutionary conservation data possible, offering critical insight into the origins and evolution of this scourge.

Potentially impactful findings could answer questions such as:

  • Are there viruses more closely related to SARS-CoV-2 than those currently known, specifically strains capable of recombining (mixing) with SARS-CoV-2, which would hinder the ongoing vaccine effort?
  • What is the observed recombination rate between different CoV species, and how frequently is recombination expected to occur?
  • What are the animal/environmental reservoirs for SARS-CoV-2 or similar viruses? Do we need to limit contact with certain species?
  • Can we identify evolutionarily conserved (and thus functional) regulatory motifs and/or RNA secondary structures across Coronaviridae?

How can you participate in this project?

Serratus is an Open Science project; we welcome all scientists and developers to contribute.

In addition, we are looking to raise capital so that the scientists working on Serratus can do so full-time, and to expand our team.

Visit our development page to learn more, or reach out to Artem Babaian at ababaian {at} bccrc {dot} ca.
