Inspiration
Andrej Karpathy, one of the best-known figures in AI, built arXiv Sanity, a simple but powerful tool that helped researchers keep up with new papers in their field. It quickly became a favorite in the research community. As Karpathy got busier, maintaining it became difficult, and the project was eventually discontinued in 2021. Our project is a modern rebuild of that idea: a cleaner, faster, and more stable version that restores the usefulness of the original tool while adding new features.
What it does
Latent Arxiv Sanity makes it easy to organize and explore research papers. You can follow your favorite research areas, discover related work, and get recommendations for similar papers, all in one place.
How we built it
We use Sentence Transformers to capture semantic similarity between papers, along with TF-IDF for traditional keyword-based similarity. The backend runs on Flask, while Streamlit helped us quickly prototype ideas and test models. Data from the arXiv API is processed, cleaned, and stored locally so the app can search and compare papers efficiently.
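
As a rough illustration of how keyword and semantic similarity can be combined, here is a minimal sketch. It is not our exact code: the function names, the `alpha` weight, and the toy two-dimensional embeddings are all assumptions made for the example. In practice the embeddings would come from a Sentence Transformers model rather than being typed in by hand.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def hybrid_similarity(texts, embeddings, alpha=0.5):
    """Blend embedding and TF-IDF cosine similarities into one matrix.

    alpha is a hypothetical weight: 1.0 uses embeddings only, 0.0 uses TF-IDF only.
    """
    # Keyword view: sparse TF-IDF vectors with common English words dropped.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    keyword_sim = cosine_similarity(tfidf)
    # Semantic view: dense vectors, one row per paper.
    semantic_sim = cosine_similarity(np.asarray(embeddings, dtype=float))
    return alpha * semantic_sim + (1.0 - alpha) * keyword_sim


def most_similar(sim, index, k=3):
    """Return the indices of the k papers most similar to paper `index`."""
    order = np.argsort(-sim[index])
    return [int(i) for i in order if i != index][:k]


texts = [
    "Attention mechanisms for neural machine translation",
    "Transformer attention for machine translation",
    "Bayesian optimization of hyperparameters",
]
# Toy stand-in vectors (assumption). With the sentence-transformers package,
# something like SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
# would produce real embeddings instead.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])

sim = hybrid_similarity(texts, embeddings)
print(most_similar(sim, 0, k=1))
```

Blending the two scores lets exact keyword overlap and looser semantic closeness both count toward a recommendation, and the weight can be tuned depending on which signal proves more useful.
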
Challenges we ran into
Merging parts of the old arXiv Sanity code with our new design was tougher than expected. The data formats didn’t always align, and preprocessing the metadata so that it worked with both TF-IDF and transformer embeddings took longer than planned. Balancing speed, accuracy, and usability was another ongoing challenge.
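
To give a sense of the preprocessing involved, here is a minimal sketch of turning arXiv API output (an Atom feed) into flat records. The record layout, the helper names, and the sample entry are hypothetical and only illustrate the idea of producing one cleaned `text` field that both TF-IDF and an embedding model can consume.

```python
import re
import xml.etree.ElementTree as ET

# The arXiv API returns Atom XML, so elements live in the Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def clean(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip()


def parse_entries(feed_xml):
    """Turn an Atom feed string into a list of flat paper records."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = clean(entry.findtext("atom:title", default="", namespaces=ATOM))
        abstract = clean(entry.findtext("atom:summary", default="", namespaces=ATOM))
        authors = [
            clean(author.findtext("atom:name", default="", namespaces=ATOM))
            for author in entry.findall("atom:author", ATOM)
        ]
        papers.append({
            "id": clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
            "title": title,
            "abstract": abstract,
            "authors": authors,
            # Single text field shared by both similarity methods (assumption).
            "text": f"{title}. {abstract}",
        })
    return papers


# Hypothetical sample entry, not a real paper.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A  Hypothetical
      Paper Title</title>
    <summary>  First line of the abstract.
      Second line.  </summary>
    <author><name>Ada Example</name></author>
  </entry>
</feed>"""

for paper in parse_entries(SAMPLE):
    print(paper["title"], "|", paper["authors"])
```
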
Accomplishments we’re proud of
We’re proud that it actually works! The system fetches papers, processes them, and serves meaningful recommendations, all without breaking. Rebuilding such a widely loved tool and getting it to a stable point felt very rewarding.
What we learned
We learned a lot about how modern embedding models like Sentence Transformers can work hand in hand with classic NLP methods such as TF-IDF. We also learned how to build cleaner, more maintainable backend systems and how small UX improvements can make a research tool much more enjoyable to use.
What’s next for Latent Arxiv Sanity
Next, we want to make the recommendations more personalized, possibly by adapting them to a user’s reading history. We’re also planning to add topic-based browsing, trend insights, and a simple way for users to share or bookmark interesting papers.
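
One way reading-history personalization could work is sketched below. This is an idea we have not built yet, so the function name, the averaging approach, and the toy vectors are all assumptions: it averages the embeddings of papers a user has read and ranks unread papers by how closely they point in the same direction.

```python
import numpy as np


def rank_for_reader(paper_vectors, read_indices, k=5):
    """Rank unread papers by similarity to the average of the papers already read."""
    read = list(read_indices)
    if not read:
        return []  # no history yet, so nothing to personalize
    vectors = np.asarray(paper_vectors, dtype=float)
    # Scale every row to unit length so dot products behave like cosine scores.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    # The reader profile is the mean direction of the papers they have read.
    profile = unit[read].mean(axis=0)
    scores = unit @ profile
    scores[read] = -np.inf  # never recommend a paper that was already read
    order = np.argsort(-scores)
    return [int(i) for i in order[:k] if np.isfinite(scores[i])]


# Toy two-dimensional vectors standing in for real paper embeddings.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(rank_for_reader(vectors, read_indices=[0], k=2))
```
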
Built With
- flask
- javascript
- python
- sentence-transformers
- scikit-learn
- streamlit
