Inspiration

We wanted to build a PDF report advisor because current LLMs are inefficient at navigating and querying long, structured documents. With a RAG pipeline, we can index a document once and retrieve only the chunks most relevant to each question, reducing latency and cost.

What it does

  • Upload & extract: Users POST a PDF; we parse every page with PyMuPDF and combine the text.
  • RAG backend: We split the text into chunks, embed them with HuggingFace embeddings, and store/retrieve vectors in ChromaDB. On a question, we retrieve the top‑n relevant passages and feed those with the query into Google Gemini.
  • Stateless chat API: Two endpoints (/upload-pdf, /chat) that handle file ingestion, retrieval, and LLM querying.
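The retrieve-then-generate flow can be sketched in miniature. This is a hypothetical in-memory version: the real backend uses ChromaDB with HuggingFace embeddings, while the toy `embed` below just counts a few keywords so the example stays self-contained.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model: keyword counts.
    # The actual app uses HuggingFace sentence embeddings.
    vocab = ["revenue", "risk", "growth", "cost"]
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_n(query, chunks, n=2):
    # Rank stored chunks by similarity to the query embedding and
    # keep the n best — these become the context for the LLM call.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:n]

chunks = [
    "Revenue grew 12% year over year.",
    "The board met in March.",
    "Cost of goods sold rose due to supply risk.",
]
print(top_n("What happened to revenue and growth?", chunks, n=1))
```

In the real pipeline, ChromaDB performs this similarity search over persisted vectors, so only the retrieved passages (not the whole report) are sent to Gemini.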

How we built it

  • Flask for routing and API endpoints.
  • Flask‑CORS to enable cross‑origin calls for a decoupled front-end.
  • PyMuPDF to extract raw text from PDFs.
  • LangChain orchestrating text splitting and retrieval logic.
  • HuggingFaceEmbeddings + ChromaDB for vector store and similarity search.
  • Google Gemini SDK (google.genai) as the LLM inference engine.
  • dotenv for managing environment variables securely.
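The splitting step can be sketched as fixed-size character windows with overlap. This is a simplified sketch: the real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which also tries to break on paragraph and sentence boundaries.

```python
def split_text(text, chunk_size=500, overlap=50):
    # Slide a fixed-size window over the text; consecutive chunks
    # share `overlap` characters so a sentence cut at a boundary
    # still appears whole in at least one chunk.
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = split_text("a" * 1200, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```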

Challenges we ran into

  • Token limits: Feeding entire documents to the LLM crashed on longer reports; chunking and retrieval solved this.
  • State management: Balancing chat history growth vs. context freshness.
  • Error handling: Corrupted PDF pages and rate limits required robust exception handling.
  • Frontend integration: We implemented the full RAG pipeline on the backend but haven’t yet wired the React/Next.js UI to consume the /chat and retrieval endpoints.
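The per-page error handling can be sketched like this. The names are illustrative: `pages` is anything iterable yielding objects with a `get_text()` method, so in production it would be a PyMuPDF `fitz.Document`; here fake pages keep the example self-contained.

```python
def extract_text(pages):
    # Extract text page by page, catching exceptions per page so one
    # corrupted page does not abort ingestion of the whole PDF.
    parts, skipped = [], []
    for i, page in enumerate(pages):
        try:
            parts.append(page.get_text())
        except Exception:
            skipped.append(i)  # record the bad page index and move on
    return "\n".join(parts), skipped

class GoodPage:
    def get_text(self):
        return "fine text"

class BadPage:
    def get_text(self):
        raise RuntimeError("corrupted page stream")

text, skipped = extract_text([GoodPage(), BadPage(), GoodPage()])
print(skipped)  # index of the corrupted page
```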

Accomplishments that we’re proud of

  • RAG pipeline: Chunking, embedding, and similarity search fully functional, drastically cutting down context size.
  • End‑to‑end prototype: From file upload to LLM response in under 500 lines of clean, modular Python.
  • LLM integration: Smooth hot‑reload of API keys and prompt templates via .env.

What we learned

  • Efficient retrieval: How chunk size, overlap, and embedding model selection affect relevance.
  • Prompt engineering: Crafting retrieval‑augmented prompts improved answer accuracy.
  • Integration patterns: Decoupling the retrieval layer from the chat history, setting up CORS, and designing stateless APIs.
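A retrieval-augmented prompt of the kind described above can be sketched as follows; the template wording is illustrative, not our exact prompt.

```python
def build_prompt(question, passages):
    # Number the retrieved passages so the model can refer to them,
    # and instruct it to stay grounded in the supplied context.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If the answer is not in them, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What drove cost increases?",
    ["Cost of goods sold rose due to supply risk."],
)
print(prompt)
```

Grounding instructions like the one above noticeably reduced off-context answers in our testing.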

What’s next for Prism

  • Frontend hookup: Build/extend our frontend to call our /upload-pdf and /chat endpoints, displaying retrieved passages and chat bubbles.
  • UX improvements: Highlight source passages in the PDF viewer for transparency.
  • Persistence: Add user authentication and save past reports/chat histories in a database.
  • Performance tuning: Experiment with alternative vector stores (FAISS) and embedding models for speed/cost trade‑offs.
