Inspiration

When learning a new technology, the most common advice is RTFM! However, the people who write the manual often do not make it very accessible. To find out where you actually need to look, you end up reading portions of the docs you are not interested in and do not even need to know. This is a big waste of time, and I find it quite frustrating. A common workaround is to paste the docs into an LLM, but a large manual can easily overwhelm its context window.

What it does

This application transforms your technical documentation into an interactive, AI-powered assistant. Simply upload your documentation as a PDF, and you’ll gain a natural-language interface to explore it. The system uses Retrieval-Augmented Generation (RAG) to break the docs into meaningful chunks, embed them into a vector database, and deliver fast, accurate, and context-aware responses from a powerful LLM.

You can continue prompting seamlessly — no need to re-upload — until you get the answers you need.

How we built it

We used Chonkie to parse and segment PDF documentation into semantically meaningful chunks. These chunks were embedded and stored in Pinecone, a vector database, for fast retrieval. For each user query, the most relevant chunks were retrieved and passed as context to Gemini 2.0 Flash, our chosen LLM, which generated the answer.
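In a pipeline like this, what usually reaches the LLM is the text of the retrieved chunks placed in the prompt, not the vectors themselves. The sketch below shows one way that last step could look. The `build_prompt` helper, its character budget, and the prompt wording are all assumptions for illustration, not the project's actual code.

```python
from __future__ import annotations


def build_prompt(question: str, retrieved_chunks: list[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved chunk text (hypothetical helper)."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(retrieved_chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop adding context once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the documentation excerpts below. "
        "If the answer is not in the excerpts, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The returned string would then be sent to the LLM as a normal text prompt. Capping the context keeps a long manual from overflowing the model's context window, which was the original motivation for the project.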

On the backend, we built a lightweight yet scalable API using FastAPI, which handled PDF uploads, query processing, and response generation. The frontend was developed using React, providing users with an intuitive and interactive interface to upload documents, pose questions, and receive AI-generated responses.

We collaborated using GitHub, Git, and VS Code, which allowed us to iterate quickly and work efficiently as a team. This project also gave us an opportunity to learn best practices in building full-stack AI tools — from vectorized search to UI integration — with technologies that were largely new to us at the start of the hackathon.

Challenges we ran into

One of the most significant challenges we faced was implementing a chunking strategy that maintained semantic relevance. Since embeddings convert text into high-dimensional vectors based on meaning, it’s crucial that the chunks themselves preserve coherent and contextually rich units of information. Early approaches to chunking led to fragmented or overly broad sections, which resulted in irrelevant or imprecise LLM responses. After extensive research and testing, we adopted a method that better captured semantic relationships within the document, greatly improving retrieval and response quality.
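To make the trade-off concrete, here is a minimal sketch of a sentence-aware chunker with overlap, written in plain Python. It is not the method we ended up with and it does not use Chonkie's API. The function name, the `max_chars` limit, and the one-sentence overlap are assumptions chosen for illustration.

```python
from __future__ import annotations

import re


def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Greedily pack whole sentences into chunks, repeating `overlap` sentences between chunks."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry its last sentences forward as shared context.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries avoids cutting a thought in half, and the overlap lets neighboring chunks share a little context. Note that a chunk can exceed `max_chars` when a single sentence, or the carried-over sentences plus the next one, is longer than the limit.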

Another technical hurdle was managing CORS (Cross-Origin Resource Sharing) issues between our FastAPI backend and React frontend, which were hosted on separate servers during development. Resolving these required fine-tuning headers and configuring our backend to properly handle preflight requests — a necessary step to ensure smooth communication between the UI and API.
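For readers who have not hit this before, the sketch below shows roughly what a server has to decide when the browser sends a preflight `OPTIONS` request. It is a plain-Python illustration, not our actual configuration: the allowed origin (a React dev server on `http://localhost:3000`), the methods, and the header list are all assumptions. In a FastAPI app this logic is normally handled by the built-in `CORSMiddleware` rather than written by hand.

```python
from __future__ import annotations

# Assumed values for illustration only.
ALLOWED_ORIGINS = {"http://localhost:3000"}  # e.g. a React dev server
ALLOWED_METHODS = {"GET", "POST", "OPTIONS"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def preflight_headers(origin: str, method: str, request_headers: str = "") -> dict[str, str] | None:
    """Return CORS response headers for a preflight request, or None to reject it."""
    if origin not in ALLOWED_ORIGINS or method.upper() not in ALLOWED_METHODS:
        return None
    # Access-Control-Request-Headers arrives as a comma-separated list.
    requested = {h.strip().lower() for h in request_headers.split(",") if h.strip()}
    if not requested <= ALLOWED_HEADERS:
        return None
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(ALLOWED_HEADERS)),
        "Vary": "Origin",
    }
```

If the browser does not get these headers back for the origin, method, and headers it asked about, it blocks the real request before it is ever sent, which is why the frontend can fail even though the API works fine from a tool like curl.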

Accomplishments that we're proud of

One of our biggest accomplishments was learning how to make LLMs more intelligent and grounded by connecting them with custom data — specifically, enabling them to answer questions about information not available on the public web. This shift from generic LLM outputs to context-aware, document-grounded responses was eye-opening and helped us understand the power of RAG systems in real-world applications.

We’re also proud of the fact that, despite limited prior experience with AI and full-stack development, we built a working product from scratch — complete with PDF parsing, semantic search, vector database integration, and a clean React frontend. It was a full learning journey, and we walked away not only with a functioning app but with a much deeper understanding of modern AI pipelines.

What we learned

One of the most valuable moments came from a workshop we attended, which broke down the fundamentals of Retrieval-Augmented Generation (RAG) in a way that truly clicked. We learned that each word or phrase can be represented as a numeric embedding in a high-dimensional vector space, capturing its semantic meaning. The vectors for the document chunks are compared to the user’s query vector, and the closest matches are retrieved and passed to the LLM. That comparison is the basis of how RAG operates.
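The comparison step can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: the three-dimensional vectors and chunk ids are made up, cosine similarity is assumed as the metric, and a real vector database would run this search over far larger vectors with an index rather than a full scan.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k closest matches."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for three hypothetical chunks.
chunk_vecs = {
    "install": [1.0, 0.0, 0.0],
    "config": [0.0, 1.0, 0.0],
    "api": [0.9, 0.1, 0.0],
}
matches = top_k([1.0, 0.05, 0.0], chunk_vecs, k=2)
```

Here the query vector points almost entirely along the first axis, so the "install" and "api" chunks rank ahead of "config". The text of those top-ranked chunks is what gets handed to the LLM.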

On the frontend side, we explored several options before landing on React. Initially, we experimented with Gradio, a Python-based frontend framework. While easy to use, it lacked flexibility and visual polish. We then tried HTMX, which aims to reduce the need for hand-written JavaScript. However, we quickly ran into limitations, particularly that it expects HTML fragments from the server rather than JSON responses, which are foundational to modern web APIs. We came to understand why JavaScript is so widely used on the frontend.

What's next for Dino Docs

Hopefully adding the project, introducing better chunking, separate namespaces, and more granular models for more accurate queries regardless of doc size, and maybe a bit more flair on the front end. Perhaps even a VS Code plugin.
