Inspiration

Academic papers are valuable sources of information, and being able to read them means that you can stay up to date with the latest happenings in your field. However, academic jargon, needlessly dense structure, and sometimes intentional obfuscation are prevalent. In our experience, understanding a paper in a field that you aren’t deeply familiar with can take more than a week of dedicated, focused reading.

The individual topics in a paper are never unapproachable, though. The difficulty comes from identifying topics, which topics are prerequisites to others, and often what the paper is attempting to introduce.

What it does

This is where PaperParser comes in. PaperParser looks at a paper, and does all of the boring stuff. Once it is done, you are presented with a graph representing the topics covered in the paper, their relationships, and where to start.

PaperParser doesn’t summarize the topics, though. Instead, you research individual topics, and test your understanding off of the summary rater, which determines your grasp of the subject. When applicable, it will link you to useful resources, but it will never think for you.

Under the hood

PaperParser uses the Gemini API to build a multi-step pipeline for submitted papers. Papers are scanned for topics, with no regard for relation or relevance. Then, topics are qualitatively linked, with relationships created between them. On the frontend, this allows the topics to be sorted into a hierarchy, showing you where to start. Then, user summaries are forwarded to the Gemini API, which determines their level of understanding.

Challenges we ran into

Designing a system that actually pulled relevant topics from papers was a significant challenge, as both model-based and NLP approaches often focused on the wrong elements of the paper.

Accomplishments that we're proud of

PaperParser works very well so far with topics that we are familiar with, and while that cannot be guaranteed to extend to topics we don’t understand, it’s a good sign.

A very cool thing that PaperParser has been able to do is reconstruct curriculum order from documents that do not mention it, which proves that it has some idea of order and prerequisites.

What we learned

We learned how to integrate AI both into our workflow and our project. None of us had vibecoded an app before, and while it was not by any means easy, it changed the way that we did work. Instead of writing code, we described features and debugged the result, which is a different way to approach software development.

Working with the Openrouter and Gemini APIs was another learning curve, since they use different formats.

What's next for PaperParser

We hope to add a login/signup account system, so that users can keep track of their knowledge across multiple topics.

The ideal endpoint for this project is as an Obsidian plugin, because Obsidian’s ecosystem is very supportive of what PaperParser aims to do. Notes could be created and referenced, and topics could be built upon existing user knowledge.

Built With

Share this project:

Updates