PaperParser

Inspiration

The research process is exciting, but the literature search can be daunting. Uncovering and understanding the most relevant papers demands time and patience. As researchers ourselves, we understand the struggle all too well: tirelessly sifting through search results, drowning in a sea of tabs, and grappling with complex GitHub repositories. To address this, we have engineered a solution to revolutionize the literature review process for academics: introducing PaperParser! Gone are the days of countless Google searches and endless scrolling through papers and GitHub repos. With PaperParser, we've harnessed the power of AI to streamline the literature search journey, ultimately enhancing the educational experience for researchers, from undergraduates to postdoctoral researchers.

PaperParser accelerates the review process by allowing you to upload a paper and instantly receive a concise summary of its abstract. But we don't stop there. We recognize the pivotal role that code plays in modern research, which is why PaperParser provides direct access to the code associated with the paper, along with succinct code summaries. These two ingredients, with an emphasis on the codebase summaries, give you a comprehensive understanding of the paper's contributions. Dive deep into the practical implications of the research with insightful code analysis, while simultaneously grasping its theoretical contributions through the abstract summary. Say goodbye to tedious paper skimming and hello to efficiency. With PaperParser, you can swiftly evaluate the relevance of a paper to your own research objectives without ever having to open it. To further elevate your experience, we've integrated an AI chatbot directly into our platform. Now, you can interact with research papers, pose questions, and seek clarification, all from PaperParser’s user-friendly interface. The chatbot uses vector embeddings to determine which parts of the paper are most relevant to your question. Empower your research endeavors with PaperParser – the ultimate partner for code-based literature review.
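As a rough illustration of how embeddings can pick out the relevant parts of a paper, the sketch below ranks chunks by cosine similarity to the question. The function names, the `{ text, embedding }` chunk shape, and the toy two-dimensional vectors are assumptions made for the example; this is not necessarily how PaperParser scores chunks.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the question.
// Each chunk is assumed to look like { text, embedding } (hypothetical shape).
function topKChunks(questionEmbedding, chunks, k = 3) {
  return chunks
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(questionEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy usage with made-up 2-D "embeddings".
const demoChunks = [
  { text: "a", embedding: [1, 0] },
  { text: "b", embedding: [0, 1] },
  { text: "c", embedding: [1, 1] },
];
console.log(topKChunks([1, 0], demoChunks, 2).map((c) => c.text));
```

In a real setup the vectors would come from an embeddings model and have hundreds of dimensions, but the ranking step stays the same.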

What it does

How we built it

Our tech stack is built in JavaScript and combines Express.js, Node.js, and React. Together, these tools gave us an end-to-end pipeline between frontend and backend for parsing dense academic papers and their GitHub-hosted codebases. Beyond the framework, we make extensive use of the GitHub API and the Together.ai API, serving the Mixtral-8x7B-Instruct-v0.1 LLM, to power the web app's paper and code summary generation as well as the chatbot that lets users probe the literature and specific GitHub program files more closely. We also used ConvertAPI to extract text from research article PDFs, and LangChain to split papers into chunks, which we converted into embeddings using Together.ai's embeddings model.
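To give a feel for the chunking step, here is a minimal sketch of splitting extracted paper text into overlapping chunks before embedding. It is a plain-JavaScript stand-in, not LangChain's splitter; the function name, the character-based sizes, and the default values are all assumptions for illustration.

```javascript
// Split text into chunks of at most `chunkSize` characters, where each chunk
// starts `chunkSize - overlap` characters after the previous one, so that
// neighbouring chunks share `overlap` characters of context.
function splitIntoChunks(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Tiny usage example with small sizes so the overlap is visible.
console.log(splitIntoChunks("abcdefghij", 4, 2));
```

The overlap helps keep a sentence that straddles a chunk boundary retrievable from at least one chunk; the right sizes depend on the embeddings model and are worth tuning.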

Challenges we ran into

Choosing a stack: Early on, we were torn between plain React; React with Express and Node; or Next.js. Later on, from talking to mentors, it became clear that the stack of any project matters far less than we thought.
CSS: We spent an inordinate amount of time early in the hackathon trying to bring our visual plan to reality. Our final, simple UI (built entirely from scratch) took hours of iteration and documentation reading, eventually culminating in a tangible form that our entire team was extremely proud of.
Chatbot implementation: After learning about retrieval-augmented generation from Hassan, we spent hours studying the technology before we could implement it successfully.
GitHub link extraction from papers: Papers with some kind of computational background or application usually host their code on GitHub. Despite this commonality, these papers can be remarkably inconsistent in their format, and the API we were using to extract text from PDFs (ConvertAPI) would sometimes insert newline characters and spaces in the middle of a URL. We had to decide wh
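One possible way to cope with whitespace injected into URLs is sketched below. It is a heuristic written for illustration, not the approach the team settled on: it tolerates spaces and newlines only around the dot and slashes of a github.com link, and the pattern and function names are assumptions. Whitespace inserted in the middle of an owner or repository name cannot be recovered this way.

```javascript
// Matches "github.com/<owner>/<repo>" while allowing whitespace (including
// newlines) around the "." and "/" separators. Names themselves must be intact.
const GITHUB_REPO_PATTERN =
  /github\s*\.\s*com\s*\/\s*([A-Za-z0-9-]+)\s*\/\s*([A-Za-z0-9._-]+)/gi;

// Return a de-duplicated list of normalized repository URLs found in the text.
function extractGitHubRepos(text) {
  const seen = new Set();
  const repos = [];
  for (const match of text.matchAll(GITHUB_REPO_PATTERN)) {
    const owner = match[1];
    // Drop sentence-ending dots first, then a trailing ".git" suffix.
    const repo = match[2].replace(/\.+$/, "").replace(/\.git$/i, "");
    if (!repo) continue;
    const url = `https://github.com/${owner}/${repo}`;
    // GitHub owner and repo names are case-insensitive, so compare lowercased.
    const key = url.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      repos.push(url);
    }
  }
  return repos;
}

// Usage with text that mimics the extraction damage described above.
const damaged =
  "Code is available at https://github.\ncom/ octocat /hello-world.\n" +
  "See also github.com/octocat/Hello-World.git";
console.log(extractGitHubRepos(damaged));
```

Being strict about where whitespace is allowed trades recall for precision: it avoids gluing unrelated words onto a URL, at the cost of missing links that were broken inside a name.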

Accomplishments that we're proud of

Learning an entirely new technology (Express.js

What we learned

Flow with the wind: At one point, we sank an hour and a half into trying to use Together.AI's Node SDK for ChatCompletions before finally going down to their table for help. It turned out that the SDK was not well maintained, and they recommended we use an ordinary fetch call. Although I had already gotten the fetch calls to work earlier, I had been adamant about using the SDK because it felt more proper. Instead of hyperfixating on something small, think of the easiest path to get the most done.
RAG (retrieval-augmented generation): Through conversation with Hassan, an employee at Together.AI, I learned about RAG. RAG lets you augment a general-purpose LLM with additional relevant information, retrieved using vector embeddings. This is useful when you want to ask questions about information the model was not trained on. Resources on RAG were sparse on the internet, but through lots of searching and reading we were eventually able to implement RAG for the PaperParser chatbot, and we learned a lot in the process.
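The two lessons can be sketched together: put the retrieved excerpts into the prompt, then send it with an ordinary fetch call. This is an illustrative sketch under assumptions, not our exact code: the endpoint URL, the model identifier, the `max_tokens` value, the OpenAI-style response shape, and all function names are assumed here and should be checked against Together.ai's current documentation.

```javascript
// Assumed endpoint and model identifier; verify against Together.ai's docs.
const TOGETHER_CHAT_URL = "https://api.together.xyz/v1/chat/completions";
const MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1";

// Build a single user message that carries the retrieved excerpts (the
// "augmentation" in RAG) followed by the user's question.
function buildRagMessages(question, retrievedChunks) {
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");
  const content =
    "Answer using only the numbered excerpts from the paper. " +
    "If they do not contain the answer, say so.\n\n" +
    `Excerpts:\n${context}\n\nQuestion: ${question}`;
  return [{ role: "user", content }];
}

// Describe the HTTP request without sending it, which keeps it easy to test.
function buildChatRequest(apiKey, messages) {
  return {
    url: TOGETHER_CHAT_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: MODEL, messages, max_tokens: 512 }),
    },
  };
}

// Send the request with plain fetch (global in Node 18+ and in browsers).
// `fetchImpl` is injectable so the function can be exercised without a network.
async function askPaper(apiKey, question, retrievedChunks, fetchImpl = fetch) {
  const messages = buildRagMessages(question, retrievedChunks);
  const { url, options } = buildChatRequest(apiKey, messages);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  const data = await response.json();
  // Assumes an OpenAI-compatible response body.
  return data.choices[0].message.content;
}
```

Here `retrievedChunks` is assumed to be an array of strings already selected by embedding similarity. Keeping request construction separate from the network call is a design choice for testability, not something the API requires.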

What's next for PaperParser

PaperParser will start being used by researchers at MIT and we will iterate on their feedback.

Built With

Updates

Divya Shyamal started this project — Feb 18, 2024 12:12 PM EST

