Inspiration
We wanted to build a PDF report advisor because current LLMs struggle to navigate and query long, structured documents efficiently. By applying a RAG pipeline, we can index and retrieve only the most relevant chunks, reducing latency and cost.
What it does
- Upload & extract: Users POST a PDF; we parse every page with PyMuPDF and combine the text.
- RAG backend: We split the text into chunks, embed them with HuggingFace embeddings, and store/retrieve vectors in ChromaDB. On a question, we retrieve the top‑n relevant passages and feed those with the query into Google Gemini.
- Stateless chat API: Two endpoints (`/upload-pdf`, `/chat`) that handle file ingestion, retrieval, and LLM querying.
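The retrieval flow above can be sketched end to end in plain Python. This toy version swaps the real HuggingFace embeddings and ChromaDB for a bag-of-words vector and a cosine-similarity scan, so it shows only the mechanics of "retrieve top-n, then prompt", not the production stack:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, n: int = 2) -> list[str]:
    # Score every chunk against the query and keep the n most similar.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:n]

chunks = [
    "Revenue grew 12% year over year.",
    "The report covers fiscal year 2023.",
    "Operating costs fell due to automation.",
]
top = retrieve(chunks, "How much did revenue grow?", n=1)
# Only the retrieved passage (not the whole document) goes to the LLM.
prompt = f"Context:\n{top[0]}\n\nQuestion: How much did revenue grow?"
```

In the real pipeline the embedding and nearest-neighbour search are handled by HuggingFace embeddings and ChromaDB; only the overall shape is the same.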
How we built it
- Flask for routing and API endpoints.
- Flask‑CORS to enable cross‑origin calls for a decoupled front-end.
- PyMuPDF to extract raw text from PDFs.
- LangChain orchestrating text splitting and retrieval logic.
- HuggingFaceEmbeddings + ChromaDB for vector store and similarity search.
- Google Gemini SDK (`google.genai`) as the LLM inference engine.
- dotenv for managing environment variables securely.
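Wiring these pieces together, the two stateless routes can be sketched as below. The helper functions are stubs standing in for the real PyMuPDF / LangChain / Gemini code, and their names are illustrative, not the ones in the project:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
# In the real app, flask_cors.CORS(app) is called here so the
# decoupled front-end can reach these routes cross-origin.

# --- stub helpers (illustrative names, not the project's actual code) ---
def extract_text(pdf_file):
    # Real app: PyMuPDF parses every page and the text is combined.
    return pdf_file.read().decode(errors="ignore")

def build_index(text):
    # Real app: LangChain splits into chunks, embeds, stores in ChromaDB.
    return [text]

def retrieve(store, question):
    # Real app: top-n similarity search against the vector store.
    return store[:3]

def ask_gemini(question, passages):
    # Real app: google.genai call with the retrieval-augmented prompt.
    return f"(answer grounded in {len(passages)} passage(s))"

index = {}  # holds the vector-store handle between requests

@app.route("/upload-pdf", methods=["POST"])
def upload_pdf():
    text = extract_text(request.files["file"])
    index["store"] = build_index(text)
    return jsonify({"status": "indexed"})

@app.route("/chat", methods=["POST"])
def chat():
    question = request.json["question"]
    passages = retrieve(index["store"], question)
    return jsonify({"answer": ask_gemini(question, passages),
                    "sources": passages})
```

Keeping the routes stateless means the only server-side state is the vector-store handle; each `/chat` call carries everything else it needs.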
Challenges we ran into
- Token limits: Feeding entire documents to the LLM failed on longer reports; chunking and retrieval solved this.
- State management: Balancing chat history growth vs. context freshness.
- Error handling: Corrupted PDF pages and rate limits required robust exception handling.
- Frontend integration: We implemented the full RAG pipeline on the backend but haven’t yet wired the React/Next.js UI to consume the `/chat` and retrieval endpoints.
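The chunking fix for the token-limit crashes can be illustrated with a simple fixed-size splitter. The real app uses LangChain's text splitter, and the sizes below are arbitrary examples:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so no single LLM call exceeds the model's context window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 250
chunks = split_text(doc, chunk_size=100, overlap=20)
# Chunks start at offsets 0, 80, 160, 240: each chunk repeats the last
# 20 characters of the previous one, so sentences split at a boundary
# still appear whole in at least one chunk.
```

The overlap is what keeps retrieval quality up: without it, a passage cut in half at a chunk boundary may match the query poorly in both halves.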
Accomplishments that we’re proud of
- RAG pipeline: Chunking, embedding, and similarity search fully functional, drastically cutting down context size.
- End‑to‑end prototype: From file upload to LLM response in under 500 lines of clean, modular Python.
- LLM integration: Smooth hot‑reload of API keys and prompt templates via `.env`.
What we learned
- Efficient retrieval: How chunk size, overlap, and embedding model selection affect relevance.
- Prompt engineering: Crafting retrieval‑augmented prompts improved answer accuracy.
- Integration patterns: Decoupling the retrieval layer from the chat history, setting up CORS, and designing stateless APIs.
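A retrieval-augmented prompt of the kind described can be as simple as the template below. The wording is illustrative, not the one we shipped:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    # Number the retrieved passages so answers can cite their sources,
    # and instruct the model not to answer beyond the given context.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt("What drove cost savings?",
                      ["Operating costs fell due to automation."])
```

Grounding the model this way, with an explicit "say so if insufficient" instruction, is what made the biggest difference to answer accuracy for us.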
What’s next for Prism
- Frontend hookup: Build/extend our frontend to call our `/upload-pdf` and `/chat` endpoints, displaying retrieved passages and chat bubbles.
- UX improvements: Highlight source passages in the PDF viewer for transparency.
- Persistence: Add user authentication and save past reports/chat histories in a database.
- Performance tuning: Experiment with alternative vector stores (FAISS) and embedding models for speed/cost trade‑offs.