The motive
Research is the lifeline of innovation.
As we enter an age where we’re not just researching AI but using AI to research, it’s becoming easier and easier for false claims or flawed logic to slip through the cracks. One of our teammates experienced this firsthand when their company discovered a mistake in a research paper they had relied on. That single oversight created setbacks in their own work and exposed a larger issue: when research fails, everything built on top of it is at risk. That’s what inspired us to create CHECKR.
We need guardrails to protect the lifeline of innovation.
We need CHECKR.
Up and running
What does CHECKR do?
CHECKR puts your paper through the ultimate research boot camp. Every claim, every proof, every line of code is sent down a carefully engineered pipeline: analyzed, verified, and stress-tested. CHECKR actually runs the code in the paper, redoes the math, traces the logic, and pinpoints exactly where things might have gone off track based on the authors’ claims. By the end, you don’t just get a verdict of “verified” or “not verified”; you get a clear map showing where the reasoning holds, where it wobbles, and where it might have gone off the rails. CHECKR also comes equipped with a chatbot you can interrogate about any part of the paper. And if you prefer talking over typing, a conversational AI lets you walk through the research as if you had a lab partner sitting next to you.
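To make that map concrete, here is a minimal sketch, using Pydantic (part of our stack), of how a per-claim verdict might be modeled. The field names, verdict labels, and example claim are illustrative assumptions, not CHECKR’s actual schema:

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field


class Verdict(str, Enum):
    """Illustrative verdict labels; CHECKR's real labels may differ."""
    HOLDS = "holds"      # reasoning verified end to end
    WOBBLES = "wobbles"  # partially supported, needs attention
    FAILS = "fails"      # contradicted by code execution or proof


class ClaimReport(BaseModel):
    """One entry in the per-paper 'map' of claims."""
    claim: str = Field(description="The claim as stated in the paper")
    location: str = Field(description="Where the claim appears")
    verdict: Verdict
    evidence: List[str] = []     # e.g. executed code output, proof status
    notes: Optional[str] = None  # where the reasoning went off track


# Hypothetical example of a single checked claim:
report = ClaimReport(
    claim="The algorithm converges in O(n log n) steps",
    location="Section 4.2",
    verdict=Verdict.WOBBLES,
    evidence=["Benchmark matched for n <= 10^4 but diverged beyond that"],
)
print(report.model_dump_json(indent=2))
```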
We made sure that CHECKR doesn’t replace human judgment; it amplifies it by providing clarity, transparency, and actionable insight. It helps researchers trust the work they rely on and feel confident that AI is supporting, not taking over, the peer review process.
The urge to break our laptops
Challenges we ran into
Building CHECKR wasn’t always smooth, and although the process was occasionally painful, we know it was necessary.
One of our biggest challenges was verifying the math itself. Math demands precision, and translating complex proofs and quantitative claims into something that can be formally validated pushed us to rethink how we structured our pipeline. We ultimately integrated Lean into our workflow, allowing us to formally verify complex claims without losing their original context.
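To give a flavor of what that looks like, here is a toy stand-in (not a claim from any real paper) for the kind of statement the pipeline hands to Lean; the claims we actually verify are far more involved:

```lean
-- Toy stand-in for a quantitative claim extracted from a paper:
-- "for every natural number n, n + n equals 2 * n".
-- A claim is translated into a theorem statement like this,
-- and Lean only accepts it if the proof actually checks.
theorem claim_n_plus_n (n : Nat) : n + n = 2 * n := by
  omega
```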
And verifying math was only part of the challenge. With multiple agents analyzing, verifying, and reasoning in parallel, the challenge wasn’t just generating feedback; it was synthesizing it. Everything needed to be consolidated into clear, actionable outputs, and more importantly, that feedback had to loop back into the agents themselves, allowing them to refine, re-check, and execute again. Building a system that could think together took a lot of iteration and hard trade-offs, but each question we posed about the efficiency of our architecture strengthened CHECKR and pushed us toward a system that delivers verification you can trust.
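To give a sense of that loop, here is a minimal LangGraph sketch of a verify-then-refine cycle. The node names, state fields, and pass/fail condition are simplified stand-ins for our actual agents, not CHECKR’s real graph:

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class ReviewState(TypedDict):
    claims: List[str]    # claims extracted from the paper
    feedback: List[str]  # consolidated feedback from the verifiers
    attempts: int
    verified: bool


def verify(state: ReviewState) -> dict:
    # Stand-in for the real verification agents (code runner, Lean, SymPy).
    ok = state["attempts"] >= 1  # pretend the second pass succeeds
    return {"verified": ok, "attempts": state["attempts"] + 1}


def synthesize(state: ReviewState) -> dict:
    # Consolidate verifier output into feedback that loops back to the agents.
    return {"feedback": state["feedback"] + ["tighten assumptions in proof 3"]}


def should_continue(state: ReviewState) -> str:
    # Loop back for another refine/re-check pass until verification succeeds.
    return "done" if state["verified"] else "retry"


graph = StateGraph(ReviewState)
graph.add_node("verify", verify)
graph.add_node("synthesize", synthesize)
graph.set_entry_point("verify")
graph.add_conditional_edges("verify", should_continue,
                            {"retry": "synthesize", "done": END})
graph.add_edge("synthesize", "verify")  # feedback loops back into verification

app = graph.compile()
result = app.invoke({"claims": ["claim 1"], "feedback": [],
                     "attempts": 0, "verified": False})
print(result["feedback"])
```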
Red Bull, code, repeat
How we made CHECKR
We started by sketching a brief architecture diagram in Lucid, mapping out the agent workflow, the workflow model, the role of each agent, and the supporting services we needed, such as OCR. Then we spent a lot of time talking and looking through the sponsors to see which of them offered services that would help with the implementation. We settled on Google's Document AI API for paper OCR, Gemini LLM models, and a simple plan-and-execute ReAct agent pattern. We used LangGraph heavily to direct the workflow and update each node's status. For code verification we wanted to use both context and syntax, simply running the paper's code with the necessary context; for validating math claims we wanted formal proofs, and our research led us to Lean and SymPy. Finally, we stored all of the paper analysis in Supabase to power a simple RAG agent with chat context for talking more about the paper.
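For the lighter-weight symbolic checks, a SymPy pass like the following sketch is enough. The identity being checked here is a hypothetical example, not one pulled from a reviewed paper:

```python
import sympy as sp

# Hypothetical quantitative claim from a paper:
# "the variance of a fair n-sided die is (n**2 - 1) / 12".
n, k = sp.symbols("n k", positive=True, integer=True)

mean = sp.summation(k, (k, 1, n)) / n
second_moment = sp.summation(k**2, (k, 1, n)) / n
variance = sp.simplify(second_moment - mean**2)

claimed = (n**2 - 1) / 12

# simplify(lhs - rhs) == 0 means the identity holds symbolically.
assert sp.simplify(variance - claimed) == 0
print("claim verified:", variance)
```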
Goodbye yellow brick road
What's next for CHECKR
After speaking with Stanford post-docs, alumni, and students, we know CHECKR’s end goal is simple: make their lives easier.
Behind the scenes, that means continuously refining our backend architecture, strengthening our verification models, and optimizing performance at scale. Each iteration moves us closer to a system researchers can trust without hesitation. At the same time, we know the best product decisions won’t come from a whiteboard; they’ll come from the field. We’ll keep working directly with academics, lab teams, and industry R&D groups to better understand their workflows, pain points, and blind spots.
We believe the future of research shouldn’t just be faster. It should be smarter, safer, and more accountable.
And CHECKR is how we get there.
Built With
- fastapi
- googledocumentaiapi
- katex
- langchain
- langgraph
- lean
- ocrmypdf
- pydantic
- python
- supabase
- sympy
- uvicorn
- vertexai

