Hybrid Cubed Paper Similarity Search Tool


Inspiration

In the growing field of hybrid organic-inorganic perovskite materials, the Hybrid Cubed Database holds a large collection of research papers and their metadata. It is a valuable resource for researchers, providing access to studies on topics such as bandgap engineering, exciton binding, and material synthesis techniques. However, as the database expands, quickly identifying the relevant papers becomes increasingly difficult. Our goal was to address this challenge.

What it does

Our Hybrid Cubed Paper Similarity Search tool addresses this need by allowing researchers to input a phrase or topic and instantly retrieve a ranked list of the most relevant papers from the database. Using AI techniques, including SciBERT embeddings and cosine similarity calculations, the tool analyzes and compares paper metadata components like title, abstract, keywords, and authors. This component-wise similarity scoring provides a breakdown of each paper’s relevance to the search query, enabling researchers to efficiently discover related studies and draw connections across the Hybrid Cubed database.
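As a rough illustration of the component-wise breakdown, the sketch below averages per-component similarity scores and sorts papers by that average. The function name `rank_papers`, the paper identifiers, and the score values are all hypothetical and chosen only for illustration; they are not taken from our codebase or from real query results.

```python
from statistics import mean

# The four metadata components the tool compares against the query.
COMPONENTS = ("title", "abstract", "keywords", "authors")


def rank_papers(component_scores):
    """Rank papers by the average of their per-component similarities.

    component_scores maps a paper id to a dict of
    {component name: similarity score}. This layout is an assumption
    made for the sketch.
    """
    ranked = []
    for paper_id, scores in component_scores.items():
        # Keep the per-component breakdown so it can be shown to the user.
        breakdown = {name: scores[name] for name in COMPONENTS}
        ranked.append(
            {
                "paper": paper_id,
                "average": mean(breakdown.values()),
                "breakdown": breakdown,
            }
        )
    # Highest average similarity first.
    ranked.sort(key=lambda row: row["average"], reverse=True)
    return ranked


# Made-up scores for two hypothetical papers.
example_scores = {
    "paper_b": {"title": 0.55, "abstract": 0.60, "keywords": 0.65, "authors": 0.28},
    "paper_a": {"title": 0.82, "abstract": 0.78, "keywords": 0.70, "authors": 0.30},
}

results = rank_papers(example_scores)
for row in results:
    print(row["paper"], round(row["average"], 3), row["breakdown"])
```

A plain average weights all four components equally. A weighted average, for example one that counts the abstract more than the author list, would be a possible variation.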

How we built it

First, we created a script that fetches the metadata for each research paper, using its DOI as the lookup parameter. The script pulls the information from the CrossRef API and writes it to a JSON file, which the main processing script then reads. A pretrained SciBERT model generates embeddings for four components: title, abstract, keywords, and authors. Cosine similarity is then calculated between the embedded query and each component embedding. It measures the angle between the vectors, which we use as a proxy for semantic similarity, rather than the magnitude-sensitive separation that Euclidean distance measures. The component scores are averaged to rank the papers, and the similarity scores are displayed on a locally hosted webpage.
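To make the cosine-versus-Euclidean point concrete, here is a minimal sketch using small hand-written vectors in place of real SciBERT embeddings. The vectors and function names are illustrative assumptions, not our actual code. Two vectors pointing the same way get a cosine similarity of 1 even when their lengths differ, while their Euclidean distance is nonzero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v; sensitive to magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy stand-ins for embeddings (real SciBERT vectors are much longer).
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # query scaled by 2
perpendicular = [3.0, 0.0, -1.0]   # dot product with query is 0

cos_same = cosine_similarity(query, same_direction)
dist_same = euclidean_distance(query, same_direction)
cos_perp = cosine_similarity(query, perpendicular)

print(f"same direction: cosine={cos_same:.3f}, euclidean={dist_same:.3f}")
print(f"perpendicular:  cosine={cos_perp:.3f}")
```

Because cosine similarity ignores vector length, texts of very different lengths (a short title versus a long abstract) can be compared to the query on the same scale.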

Challenges we ran into

Our first challenge was reevaluating our topic. We initially set out to train an AI assistant to decipher Fortran electronic structure theory code, but we ran into the limitations of the model and had to pivot. Wanting to stay in the field of materials science, we decided to create a tool that enhances the capabilities of the Hybrid Cubed database (a database being developed by our research group).

Accomplishments that we're proud of

Given limited backgrounds in computer science, we are proud to have produced a working tool to aid researchers in the hybrid organic-inorganic perovskite field.

What we learned

We learned how AI can be applied to assist research. By applying AI to scientific literature, we discovered how embeddings and similarity measures can be used to identify patterns, find connections, and surface relevant studies from large datasets.

What's next for Hybrid Cubed Paper Similarity Search Tool

We would like to verify its efficacy with larger datasets, and potentially integrate more functionality so users can input their own datasets and find the most similar papers to their queries.

Updates

Andy Kapoor started this project — Nov 02, 2024 10:44 PM EDT
