Inspiration

As researchers, we spend much of our time each day looking through papers. In the ML field specifically, papers are being published at an exponentially increasing rate. While this is exciting, it makes finding the papers that are actually worth your time increasingly challenging and time-consuming. We want to help fix this problem.

On OpenReview, thousands of submissions to top conferences are reviewed by experts and made publicly available. We aim to analyze these reviews and turn them into clear, useful reports that highlight which papers are worth reading and which parts deserve the most attention.

What it does

Veros aggregates review scores and comments from OpenReview and uses them to compute a deterministic Veros Score for each paper. The score weights each reviewer’s rating by their confidence to estimate the overall quality of the paper, while also considering signals such as novelty, technical quality, clarity, and impact. When these categories can be derived directly from structured reviewer scores, Veros uses those values; when they cannot, it falls back on LLM analysis of the review text.
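
To make the core idea concrete, here is a minimal sketch of the confidence-weighted part of the score, assuming reviews arrive as (rating, confidence) pairs already normalized to a common scale; the full Veros Score also folds in consensus, acceptance status, and review volume:

```python
from dataclasses import dataclass

@dataclass
class Review:
    rating: float      # reviewer rating, normalized to a 0-10 scale
    confidence: float  # reviewer confidence, e.g. on a 1-5 scale

def confidence_weighted_score(reviews: list[Review]) -> float:
    """Average the ratings, weighting each by the reviewer's confidence,
    so confident reviews pull the score harder than unsure ones."""
    total_weight = sum(r.confidence for r in reviews)
    if total_weight == 0:
        return 0.0
    return sum(r.rating * r.confidence for r in reviews) / total_weight

# Example: a confident 7 and an unsure 4 average to 6.0, not 5.5.
print(confidence_weighted_score([Review(7, 4), Review(4, 2)]))
```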

Veros also generates an AI summary for each paper and identifies which sections are most valuable to read, helping users decide whether to skim, dive deeper, or move on. Instead of requiring users to parse every review manually, Veros turns peer-review data into a concise report about the paper’s strengths, weaknesses, and relevance.

Beyond individual papers, Veros includes researcher rankings based on the average Veros Score of papers in our database. Users can search for specific authors to view their average score and compare research impact across the platform.

Veros also includes an Explore section designed to guide users through a topic. Users can enter an area they want to learn about, and Veros generates a sequence of papers that starts with introductory work and gradually builds toward more advanced research.

How we built it

We started with a high-level design for Veros, then tested the OpenReview API with Python scripts to understand what review data we could reliably access and build around. That early experimentation shaped the core workflow: fetch a paper, extract its reviews, score it, summarize it, and present the result in a way that is useful for researchers.
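
A rough sketch of that first step with the openreview-py client (the venue ID is illustrative, and content fields differ slightly between OpenReview API versions):

```python
import openreview

# Read-only client for OpenReview's API v2; no credentials needed for public notes.
client = openreview.api.OpenReviewClient(baseurl="https://api2.openreview.net")

# Fetch all submissions for a venue (venue ID shown is an example).
submissions = client.get_all_notes(content={"venueid": "ICLR.cc/2024/Conference"})

for paper in submissions[:5]:
    title = paper.content["title"]["value"]
    # Forum replies include the official reviews, comments, and decisions.
    replies = client.get_notes(forum=paper.forum)
    print(f"{title}: {len(replies)} forum notes")
```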

Claude Design helped us brainstorm the product direction and generate the initial website template. We then built the frontend with Next.js, React, TypeScript, and Tailwind CSS, focusing on a clean reading experience for papers, Veros Scores, AI summaries, reviewer quotes, and saved papers.

The backend was built with FastAPI, PostgreSQL, SQLModel, Redis, and Celery. It handles OpenReview ingestion, background processing, paper and review storage, deterministic score computation, and AI insight generation. The Veros Score combines review ratings, reviewer confidence, consensus, acceptance status, and review volume into a single 0–10 score.
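
As a sketch of how the background processing fits together, ingestion can be pushed onto a Redis-backed Celery worker roughly like this (the broker URLs and the helper functions are placeholders for our actual pipeline):

```python
from celery import Celery

# Redis acts as both the message broker and the result backend (URLs illustrative).
app = Celery("veros", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task(bind=True, max_retries=3)
def ingest_paper(self, forum_id: str):
    """Fetch a paper's reviews, score it, and store the results,
    retrying with exponential backoff if OpenReview is unavailable."""
    try:
        reviews = fetch_reviews(forum_id)       # hypothetical helper
        score = compute_veros_score(reviews)    # hypothetical helper
        store_paper(forum_id, reviews, score)   # hypothetical helper
    except ConnectionError as exc:
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
```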

We also added semantic search using sentence-transformers and pgvector so users can discover related papers beyond exact keyword matching. The final system ties together OpenReview data, background workers, scoring logic, LLM-generated insights, and a polished web interface into one end-to-end paper exploration tool.
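
In condensed form, the semantic search path looks roughly like this (the model name, table schema, and connection string are illustrative; in practice the pgvector Python adapter can pass embeddings without the string cast):

```python
from sentence_transformers import SentenceTransformer
import psycopg2

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

def search_similar(query: str, limit: int = 10):
    """Embed the query and rank papers by cosine distance (pgvector's <=> operator)."""
    embedding = model.encode(query).tolist()
    conn = psycopg2.connect("dbname=veros")  # illustrative connection string
    with conn, conn.cursor() as cur:
        cur.execute(
            """SELECT title, abstract
               FROM papers
               ORDER BY embedding <=> %s::vector
               LIMIT %s""",
            (str(embedding), limit),
        )
        return cur.fetchall()
```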

Challenges we ran into

  • Search algorithm: We had to figure out how to return papers that were truly relevant to a user’s keywords, not just papers with exact word matches. This led us to combine keyword search with semantic similarity.
  • Scoring methodology: Different conferences use different review scales and decision formats, so we needed a way to standardize scores across venues. We built a normalized Veros Score that combines ratings, confidence, consensus, acceptance status, and review volume (see the normalization sketch after this list).
  • Website performance optimization: With thousands of papers in the database, keeping search results and paper pages fast became a challenge. We used background processing, stored outputs, and database-backed retrieval to avoid recomputing everything on demand.
  • Hardware limitations: Running strong models locally was difficult because paper summarization and review analysis require capable models and large context windows. We had to balance model quality, speed, and available hardware.
  • Model capabilities and cost: We needed a model that could accurately summarize papers, identify meaningful reviewer feedback, and stay affordable to run repeatedly. Prompt design and model choice became important parts of the system.
  • Paper similarity: Related papers often do not share the same keywords, so simple text matching was not enough. We used embeddings and vector search to find papers that are conceptually similar.
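
For the normalization step mentioned above, a minimal sketch: each venue's rating scale is described by its bounds, and raw ratings are mapped linearly onto 0–10 (the venues and bounds shown are examples, not our full table):

```python
# Rating-scale bounds per venue (illustrative examples).
VENUE_SCALES = {
    "ICLR.cc/2024/Conference": (1, 10),
    "example.venue/2024/Conference": (1, 5),
}

def normalize_rating(raw: float, venue: str) -> float:
    """Map a raw rating from the venue's own scale onto a common 0-10 scale."""
    lo, hi = VENUE_SCALES[venue]
    return 10 * (raw - lo) / (hi - lo)

# Example: a 4 on a 1-5 scale becomes 7.5 on the common scale.
print(normalize_rating(4, "example.venue/2024/Conference"))
```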

Accomplishments that we're proud of

We’re proud of building a product that feels approachable despite working with dense academic data. Veros presents papers, review summaries, reviewer feedback, and ranking signals in a clean interface that is fast, responsive, and easy to navigate. Instead of overwhelming users with raw review data, the site helps them quickly understand why a paper may be worth reading.

We’re also proud of the ranking methodology behind the product. Review data varies widely across conferences, so we built a transparent scoring system that normalizes those differences and combines multiple signals, including reviewer ratings, confidence, consensus, acceptance status, and review volume. The result is a score that is easier to interpret while still being grounded in the underlying peer-review process.

What we learned

We learned how challenging it can be to work with messy real-world data, especially when review formats vary across papers and conferences. Parsing OpenReview data forced us to handle inconsistent fields, missing values, and different scoring systems.

We also learned a lot about performance optimization. As our dataset grew, we had to think carefully about database queries, background processing, caching and stored outputs, and how to keep the website responsive while working with thousands of papers.

On the frontend, we learned how to turn dense academic review data into a clean user experience with readable summaries, intuitive navigation, and clear visualizations. We also gained experience with new frameworks and tools across the stack, including Next.js, FastAPI, PostgreSQL, Redis, Celery, and vector search.

What's next for Veros

Next, we want to expand Veros beyond its current dataset to cover a wider range of conferences, fields, and research topics. A broader corpus would make the platform more useful for researchers exploring unfamiliar areas or comparing work across domains.

We also hope to grow Veros into a centralized space where users can contribute their own perspectives on papers they have read, discuss emerging ideas, and surface useful context that may not appear in formal reviews alone.

Finally, we plan to continue refining our scoring methodology. As we add more venues and review formats, we want to make the Veros Score as fair, transparent, and unbiased as possible while preserving its grounding in real peer-review signals.

Built With

Next.js, React, TypeScript, Tailwind CSS, FastAPI, PostgreSQL, SQLModel, Redis, Celery, sentence-transformers, and pgvector.