Guava

Guava = Genomes for User Accessible Variant Annotation

Hospitals around the world are beginning to sequence patient DNA. These data, referred to collectively as patient genomes, are large data sets that require interpretation. Once interpreted, genomic data can inform many medical decisions, including predicting disease and predisposition to diseases such as cancer, predicting drug responses through pharmacogenomics, and identifying genetic causes of disease.

Many annotation tools exist for genomes, but no open-source web applications have emerged for clinical annotation of genomes and presentation of results.

We have developed a comprehensive API that processes patient genomic sequence data from standard Variant Call Format (VCF) files and annotates the variants they contain. Our annotation pipeline uses a message queue, which lets it scale robustly and makes it ready for cloud infrastructure.
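To make the idea of a queue-backed pipeline concrete, here is a minimal sketch in Python. It is not Guava's implementation. It assumes an in-process `queue.Queue` as a stand-in for whatever message broker a deployment uses, and the names `Variant`, `parse_vcf_lines`, `annotate` and `run_pipeline` are hypothetical. It reads only the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT) and emits one record per ALT allele.

```python
import queue
import threading
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines):
    """Yield one Variant per ALT allele from VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information (##) and the column header (#CHROM).
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A VCF record may list several comma-separated ALT alleles.
        for allele in alt.split(","):
            yield Variant(chrom, int(pos), ref, allele)


def annotate(variant):
    """Hypothetical placeholder; a real worker would call an annotation tool."""
    return {"variant": variant, "annotations": {}}


def run_pipeline(lines, n_workers=2):
    """Push parsed variants through a queue consumed by worker threads."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                break
            annotated = annotate(item)
            with lock:
                results.append(annotated)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for variant in parse_vcf_lines(lines):
        tasks.put(variant)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Because the producer and the workers share nothing but the queue, the same shape carries over to a networked broker: adding capacity means starting more workers, without changing the parsing step. Results arrive in completion order, not file order.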

The Guava API is designed for the clinic. We have implemented the technology to function independently on a local hospital cluster. A key-based authentication system gives authorized clinical users secure access to their own patients' data only. Genomic variant queries run rapidly, which is critical for whole-genome sequence data, where a single patient can have upwards of 5 million variant rows. Variants are annotated with gene effects, allele frequency, clinical significance and Human Phenotype Ontology terms, and this list can be expanded using the open source ANNOVAR pipeline.
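The access rule described above (a valid key, and only the user's own patients) can be illustrated with a short Python sketch. This is an assumption about how such a check might look, not Guava's actual authentication code. The `AccessRegistry` class, its methods and the in-memory storage are all hypothetical; a real deployment would persist key hashes in a database.

```python
import hashlib
import hmac
import secrets


class AccessRegistry:
    """Hypothetical in-memory store mapping users to API keys and patients."""

    def __init__(self):
        self._key_hashes = {}  # user -> SHA-256 hex digest of the API key
        self._patients = {}    # user -> set of patient IDs the user may view

    def issue_key(self, user, patient_ids):
        """Create a random key; only its hash is stored."""
        key = secrets.token_urlsafe(32)
        self._key_hashes[user] = hashlib.sha256(key.encode()).hexdigest()
        self._patients[user] = set(patient_ids)
        return key

    def can_access(self, user, key, patient_id):
        """Return True only for a valid key and a patient assigned to the user."""
        stored = self._key_hashes.get(user)
        if stored is None:
            return False
        supplied = hashlib.sha256(key.encode()).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(stored, supplied):
            return False
        return patient_id in self._patients[user]
```

Every variant query would pass through a check like `can_access` before any rows are returned, so a valid key for one clinician never exposes another clinician's patients.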

As a proof of concept, we have constructed a web application that lets authenticated users upload and annotate their genomic VCF files. Once files are uploaded, clinical users can access all annotated patient data.

We intend to beta test this service in the clinic.