Research-RAG

The login page of Research-RAG
Research-RAG website home screen

Inspiration

Every research project starts with a folder of PDFs that just keeps growing. You know the answer is in there somewhere, but Cmd+F fails and Google doesn't know which papers you have. We wanted a tool that reads your papers, answers in plain English, and always shows you where it got the answer from.

What it does

Drop your PDFs into a web app, ask a question, get a written answer back with every claim linked to the exact passage it came from. You can filter by year or section, and your library stays private to you.

How we built it

Frontend: React + Vite + Tailwind — upload, ask, read. Backend: a small Node/Express server that handles uploads and talks to Gemini. Database: MongoDB Atlas, because it gave us both vector search and keyword search in one place.

Challenges we ran into

"The biggest one: the RAG kept returning the wrong answers. At first we built it the way most "chat with your docs" tools are built — pure semantic search, just comparing the meaning of your question to the meaning of each passage. It sounded smart in theory and it was fine for vague questions, but it kept missing the obvious stuff. Ask for a specific author, a specific number, a specific technical term, and it would hand back passages that were vaguely related instead of the one that literally said the thing.

The fix was to stop relying on similarity alone. We added a second search that looks for exact words the old-fashioned way, ran both searches together, and merged the rankings so a passage wins if either method thought it was good

Accomplishments that we're proud of

It actually cites its sources — every answer links back to a real passage. The hybrid search runs in a single database query, so the whole thing stays small and fast. From hitting enter to seeing a cited answer is a couple of seconds. It runs entirely on free tiers.

What we learned

Pure semantic search isn't enough on its own — you need exact-word search too. The order matters: filter before you rank, not after. Citations aren't a nice-to-have. If the user can't double-check you, the tool is worse. The clever retrieval was the quick part. The UI, the error messages, and the upload flow took just as long.

What's next for Research-RAG

Support more than PDFs (arXiv links, notes, Word docs). Follow-up questions that remember the last one. Better paper metadata via DOI / arXiv / Semantic Scholar lookups. Inline highlighting in the source PDF when you click a citation. Shared libraries so collaborators can ask questions across the same corpus