Inspiration

In the UK alone, a combined 30 million people suffer from rare or chronic diseases. Most of these diseases have genetic underpinnings that remain poorly understood. We need tools that link genetic and molecular mechanisms to disease. We built Geneius, a bioinformatics tool powered by Claude, to do just this.

What it does

Geneius is an innovative tool that streamlines scientific research by rapidly generating a contextual understanding based on simple user input prompts. To initiate the process, users only need to provide a disease, and optionally, any hypothetical genes associated with that disease. The conversion of these inputs into an effective query is seamlessly managed within Geneius's backend infrastructure.

The resulting contextual information is then employed to instruct Claude, our intelligent assistant, to execute one of two specific tasks, tailored to the user's preferences:

Solution 1) Disease-Gene Link Validation: When presented with a hypothetical disease-gene link, Geneius searches the scientific literature for evidence supporting this association. It then provides a molecular explanation, complete with citations to the relevant research papers.

Solution 2) Disease-Gene Link Hypothesis Generation: For users in need of insights about a specific disease, Geneius constructs a comprehensive disease context by extracting relevant information from the scientific literature. Claude searches and retrieves this literature, presenting users with a curated list of genes implicated in the disease based on scientific research. Furthermore, it elucidates the molecular mechanisms underpinning the involvement of these genes in the disease.

Geneius presents this information in an easily digestible format, and does so quickly. To put its efficiency into perspective, a human researcher reading at an average speed of 240 words per minute would need 10 hours of uninterrupted reading to get through 500 pieces of scientific literature. Geneius accomplishes this in around 5 minutes, by using Claude's 100k-token context window to assemble the scientific context. This greatly speeds up the way scientific hypotheses can be generated and validated.
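The figures above can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted in the text. The implied length per item (about 288 words) is roughly the size of a title plus abstract, which is our reading of "pieces of scientific literature" rather than something stated explicitly.

```python
# Back-of-the-envelope check of the reading-time comparison.
WORDS_PER_MINUTE = 240   # human reading speed quoted in the text
READING_HOURS = 10       # human reading time quoted in the text
NUM_ITEMS = 500          # pieces of literature quoted in the text
GENEIUS_MINUTES = 5      # approximate Geneius run time quoted in the text

# Total words a human reads in the quoted time.
total_words = WORDS_PER_MINUTE * READING_HOURS * 60

# Implied average length of one item (assumed to be a title plus abstract).
words_per_item = total_words / NUM_ITEMS

# Ratio of human reading time to Geneius run time.
speedup = (READING_HOURS * 60) / GENEIUS_MINUTES

print(f"total words read: {total_words}")
print(f"implied words per item: {words_per_item:.0f}")
print(f"speed-up over a human reader: {speedup:.0f}x")
```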

How we built it

Geneius was built in Python, using the Anthropic Claude API. We used Claude-2 to search and retrieve information from PubMed papers, sorted by relevance. Concretely, we query the title, abstract and DOI of each paper from PubMed, and combine these into a context that is then fed into Claude-2. Our prompt constrains Claude to return structured output. For Solution 1, the output is (i) DOIs and titles of papers supporting the claims, and (ii) an explanation of the gene-disease link. For Solution 2, the output is (i) a list of genes implicated in the disease, (ii) DOIs and titles of papers supporting the claims, and (iii) explanations detailing the molecular mechanisms relating the hypothesized genes to the disease.
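The context and prompt assembly described above could look roughly like the sketch below. It is not the actual Geneius code: the names Paper, build_context and build_prompt, the character budget standing in for the 100k-token window, the <papers> wrapper and the prompt wording are all assumptions made for illustration. Fetching records from PubMed and calling the Claude API are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    """One PubMed record reduced to the three fields that are queried."""
    title: str
    abstract: str
    doi: str


def build_context(papers: List[Paper], max_chars: int = 300_000) -> str:
    """Join records in the order given (assumed to be relevance order).

    max_chars is a crude, assumed stand-in for the model's token limit:
    records are added until the next one would exceed the budget.
    """
    chunks = []
    used = 0
    for paper in papers:
        chunk = f"Title: {paper.title}\nDOI: {paper.doi}\nAbstract: {paper.abstract}\n"
        if used + len(chunk) > max_chars:
            break
        chunks.append(chunk)
        used += len(chunk) + 1  # +1 for the newline that joins chunks
    return "\n".join(chunks)


def build_prompt(disease: str, context: str, gene: Optional[str] = None) -> str:
    """Build a Solution 1 prompt if a gene is given, else a Solution 2 prompt.

    The wording is hypothetical; only the requested outputs follow the text.
    """
    if gene is not None:
        task = (
            f"Using only the papers above, assess the link between {gene} and {disease}. "
            "Return (i) the DOIs and titles of supporting papers and "
            "(ii) an explanation of the gene-disease link."
        )
    else:
        task = (
            f"Using only the papers above, identify genes involved in {disease}. "
            "Return (i) a list of implicated genes, "
            "(ii) the DOIs and titles of supporting papers, and "
            "(iii) an explanation of the molecular mechanism for each gene."
        )
    return f"<papers>\n{context}\n</papers>\n\n{task}"


if __name__ == "__main__":
    # Placeholder records, not real papers or DOIs.
    demo = [
        Paper("Placeholder title 1", "Placeholder abstract 1", "10.0000/placeholder-1"),
        Paper("Placeholder title 2", "Placeholder abstract 2", "10.0000/placeholder-2"),
    ]
    print(build_prompt("example disease", build_context(demo), gene="GENE1"))
```

Keeping the DOI next to each title and abstract in the context is what allows the model to cite its sources, so the user can check each claim against the original paper.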

Challenges we ran into

Our initial idea ran into multiple issues: the retrieved literature often did not relate to the specific findings of the published papers we were trying to verify. We agreed to pivot, realising that the tool would perform better, give the end user more context, and be applicable to a wider audience if it searched the literature for more general, commonly queried terms such as genes and diseases.

We also had trouble formulating prompts that returned information in the specific format we required and that were as useful as possible to the end user, for example by providing DOIs so that the user can independently verify any returned information. We took inspiration from the prompt-based workshop, together with a lot of trial and error, to arrive at an effective prompt.

Accomplishments that we're proud of

We are proud to have built, in such a short time, a tool that we would use in our everyday lives as PhD students in AI for drug discovery. We believe this work can be extended into a tool with multiple applications, and that it directly addresses a key pain point of modern research in a field where AI increasingly overlaps with biology in meaningful ways.

What we learned

We learned how the Claude API works and how to integrate it into Python. We also came to appreciate how vast the scientific literature is, and the potential LLMs have to efficiently search and retrieve information from it.

What's next for Geneius

Currently, our tool works for hypothesis generation in relation to the genetic underpinnings of disease. We want to expand the capabilities of Geneius beyond this, so that it can validate and generate hypotheses in any scientific field, making expansive literature search accessible and usable for everyone.
