Inspiration

As a student, I start every morning reading academic papers. I previously relied on NotebookLM to help with this, but it wasn't designed specifically for paper reading — I still had to read through the entire paper myself and then manually ask questions about what I'd read. The workflow felt backwards: why should I struggle through dense notation and jargon before I can even ask for help understanding it?

I wanted something different. An AI agent that could give me the big picture first, then guide me step-by-step through the content — from a high-level overview all the way down to equations, methodology, and code. Something that understood my background and interests, and explained things at the right level for me.

So I built it myself.

How I Built It

Architecture

Layer       Technology
Frontend    React 19 + TypeScript + styled-components
Backend     FastAPI (async Python) + SQLAlchemy
AI Engine   Google Gemini with tool calling
Database    PostgreSQL
Storage     AWS S3
Auth        JWT (access + refresh tokens)
Deployment  Docker Compose + Nginx

Development Process

I started with the agent logic. The core reading methodology — how to break a paper into stages, what to explain at each stage, how to handle multi-section survey papers differently from single-method papers — this was the hardest design problem. I wrote the stage prompts, built the agent with Gemini tool calling, and tested it extensively before touching any UI.

I used an AI coding agent throughout the process. The workflow was: design the agent's conversational logic and prompts first, write tests, then iterate based on evaluation results. At the same time, I built out the frontend and backend infrastructure.

Once I had a working prototype, I set up a public Git repository and shared it with PhD candidate friends for real-world testing. Their feedback — on which explanations were helpful, where the agent got confused, which reading stages felt redundant — directly shaped the prompt design.

Finally, I containerized everything with Docker, set up S3 for file storage, configured Nginx with proper caching and security headers, and deployed the full production stack.
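The Nginx caching and security-header setup looks roughly like this. An illustrative fragment only: the domain, paths, cache lifetimes, and upstream name are assumptions, not the deployed config:

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # Cache fingerprinted static assets aggressively.
    location /assets/ {
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Common security headers on every response.
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;

    # Proxy API traffic to the FastAPI container.
    location /api/ {
        proxy_pass http://backend:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```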

Challenges

Scaling from Prototype to Product

It's surprisingly easy to build a working prototype as a solo developer with AI assistance. The real challenge hits when the project scales up. Suddenly, the frontend expects data in one format, the backend returns it in another, the agent generates markdown with image paths that don't resolve in production, and the Docker deployment has different environment variables than local development.
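One way to pin down this kind of contract drift is to route every response through a single serializer, so the frontend's expectations live in exactly one place. A sketch with hypothetical field names, not the app's real API:

```python
from dataclasses import asdict, dataclass


@dataclass
class PaperSummary:
    # Illustrative contract: field names are assumptions.
    paper_id: str
    title: str
    image_base_url: str  # absolute URL so markdown image paths resolve in prod


def _camel(snake: str) -> str:
    head, *rest = snake.split("_")
    return head + "".join(w.capitalize() for w in rest)


def to_api(summary: PaperSummary) -> dict:
    # One serializer shared by every endpoint: if the frontend expects
    # camelCase, the conversion happens here and nowhere else.
    return {_camel(k): v for k, v in asdict(summary).items()}
```

With FastAPI, the same idea is usually expressed as Pydantic response models with aliases, which also generates the OpenAPI schema the frontend can type-check against.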

I learned the hard way that I needed to write documentation to manage myself. I created a file that serves as a single source of truth for the project's architecture, API contracts, database schema, and development patterns. Every time I context-switch between frontend, backend, and agent design, this document keeps everything aligned.

Evaluating Agent Quality

For an AI agent product, the most time-consuming part wasn't writing code: it was evaluating agent behavior. How do you know if one system prompt produces better explanations than another? How do you test that tool calling works reliably across different paper types?

I spent significant time:

  • Comparing different prompt designs to find which produced the most helpful reading experiences
  • Testing tool calling reliability (does the agent extract images at the right time? does it transition stages correctly?)
  • Evaluating across paper types: math-heavy ML papers, survey papers with dozens of sections, biology papers with dense figures
  • Tuning the reading plan generation to produce sensible stage orderings for different paper structures
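The mechanical parts of this evaluation can be scripted: replay recorded agent runs against per-paper-type expectations and tally failure patterns. A sketch with assumed fixture names and checks, not the actual harness:

```python
from collections import Counter

# Expected stage orderings per paper type (illustrative fixtures).
FIXTURES = {
    "math_heavy": ["overview", "method", "equations", "experiments"],
    "survey":     ["overview", "taxonomy", "section_walkthrough", "open_problems"],
}


def check_transcript(expected_stages: list[str], transcript: list[tuple[str, list[str]]]) -> list[str]:
    """Return failure tags for one recorded run.

    `transcript` is a list of (stage, tool_calls) pairs captured from the agent.
    """
    failures = []
    seen = [stage for stage, _ in transcript]
    if seen != expected_stages:
        failures.append("stage_order")
    for stage, tools in transcript:
        # Images should be extracted while walking figures or methodology,
        # not during the high-level overview.
        if stage == "overview" and "extract_images" in tools:
            failures.append("early_image_extraction")
    return failures


def evaluate(runs: list[tuple[str, list[tuple[str, list[str]]]]]) -> Counter:
    # Aggregate failure counts across runs to compare prompt variants.
    tally = Counter()
    for paper_type, transcript in runs:
        tally.update(check_transcript(FIXTURES[paper_type], transcript))
    return tally
```

This only catches the checkable failures (stage order, tool timing); judging whether an explanation is actually helpful still required reading the responses.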

There's no unit test for "this explanation is helpful." It required reading hundreds of agent responses, identifying failure patterns, and iterating on prompts — a very different kind of engineering than writing code.

Full-Stack Solo Development

This was my first time building a complete product end-to-end: frontend design, backend API, database schema, authentication, file storage, AI integration, containerization, and deployment. Each layer has its own best practices and pitfalls, and learning them all simultaneously meant I had to think about the product at a systems level — how every piece connects and where the failure modes are.

The upside: I now understand every layer of the stack and can make informed trade-offs. The downside: there are always more things to improve than hours in the day.
