PocketRAG

Inspiration

While working with research papers and technical notes, I found it time-consuming to locate specific details or summarize dense material. I wanted a simple, portable tool that could quickly summarize a document and answer context-specific questions, keeping uploaded files and their indexes on a server I control and not depending on complex infrastructure. That idea became PocketRAG: a compact Retrieval-Augmented Generation (RAG) web app designed for speed, simplicity, and privacy.

What it does

PocketRAG lets users upload a PDF and instantly:

Generate a concise AI summary.
Ask context-aware questions about the uploaded content.
Retrieve and display the text passages most relevant to each question.
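Retrieving relevant passages comes down to a nearest-neighbor search over chunk embeddings. The sketch below does an exact, brute-force L2 search in pure Python. It produces the same ranking as a flat FAISS index such as `faiss.IndexFlatL2`, which reports squared distances. The toy vectors and the `top_k_passages` name are illustrative assumptions, not PocketRAG's actual code.

```python
from __future__ import annotations

import heapq
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_passages(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Exact nearest-neighbor search: return the k chunks closest to the query, nearest first."""
    scored = ((l2_distance(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks))
    # Rank on distance only, so ties never fall back to comparing passage text.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real index would hold model-generated vectors.
    chunks = ["intro", "methods", "results"]
    vecs = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
    for dist, text in top_k_passages([1.0, 0.0], vecs, chunks, k=2):
        print(f"{dist:.3f}  {text}")
```

Run directly, this prints `methods` (distance 0) followed by `results`. Brute force is fine for a single PDF's worth of chunks. FAISS's value is doing the same search over much larger vector sets quickly.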

Each upload creates its own local index and can be revisited anytime, making it ideal for quick reviews, research assistance, and academic note-taking.
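One way to give each upload its own index that can be reopened later is to key it by a hash of the file's bytes and persist it to disk. The sketch below is an assumption about how this could work, not PocketRAG's actual storage layout. It writes chunks and vectors to JSON under a hypothetical `root/<doc_id>/index.json` path. A real FAISS setup would persist the vectors with `faiss.write_index` and reload them with `faiss.read_index`.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Optional


def upload_id(pdf_bytes: bytes) -> str:
    """Derive a stable ID from the file contents, so re-uploading the same PDF maps to the same index."""
    return hashlib.sha256(pdf_bytes).hexdigest()[:16]


def save_index(root: Path, doc_id: str, chunks: list[str], vectors: list[list[float]]) -> Path:
    """Persist one upload's chunks and embeddings under root/<doc_id>/index.json (hypothetical layout)."""
    doc_dir = root / doc_id
    doc_dir.mkdir(parents=True, exist_ok=True)
    path = doc_dir / "index.json"
    path.write_text(json.dumps({"chunks": chunks, "vectors": vectors}), encoding="utf-8")
    return path


def load_index(root: Path, doc_id: str) -> Optional[dict]:
    """Reload a previously saved index, or return None if this upload has not been indexed yet."""
    path = root / doc_id / "index.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        doc_id = upload_id(b"%PDF-1.7 example bytes")
        print(load_index(root, doc_id))  # None: this file has not been indexed yet
        save_index(root, doc_id, ["chunk one"], [[0.1, 0.2]])
        print(load_index(root, doc_id)["chunks"])
```

Hashing the contents, rather than using the uploaded filename, means two different files with the same name never share an index. Re-uploading an identical PDF can also skip re-embedding.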

How I built it

Frontend & Backend: Flask served via Gunicorn for production stability.
Vector Indexing: FAISS for fast similarity search over document embeddings.
AI Model: Gemini 2.5 Flash via the Google Generative AI API for summarization and Q&A.
Deployment: Dockerized the entire stack and hosted it on AWS EC2 for simplicity and control.
File Handling: Used the pypdf library and standard file I/O to extract, preprocess, and chunk uploaded content.
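The preprocessing, chunking, and prompting steps in this stack can be sketched in plain Python. Raw text would first come out of pypdf, for example by iterating over `PdfReader(path).pages` and calling each page's `extract_text()`. The character-based window size, the overlap, the prompt wording, and the names `clean_text`, `chunk_text`, and `build_prompt` are all illustrative assumptions, not PocketRAG's actual code.

```python
from __future__ import annotations


def clean_text(raw: str) -> str:
    """Collapse the ragged whitespace and line breaks that PDF text extraction tends to produce."""
    return " ".join(raw.split())


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def build_prompt(question: str, passages: list[str]) -> str:
    """Ground the model's answer in numbered retrieved passages."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    page_text = "PocketRAG  splits\nextracted   text into\noverlapping chunks."
    chunks = chunk_text(clean_text(page_text), size=24, overlap=6)
    print(chunks)
    print(build_prompt("What does PocketRAG do with text?", chunks[:2]))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. The prompt string is what would be sent to Gemini. With the `google-generativeai` SDK that would look roughly like `genai.GenerativeModel("gemini-2.5-flash").generate_content(prompt)`, though the exact call depends on the SDK version in use.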

Challenges I ran into

Deployment Troubles: Configuring AWS App Runner’s health checks and networking proved trickier than expected, so I migrated to a simpler EC2 + Docker setup.
API Stability: Balancing low inference latency against meaningful output required model tuning.
Vector Storage: My first approach to indexing large PDFs in FAISS used memory inefficiently.
CORS and File Limits: Keeping file uploads stable in Flask, with working CORS handling and upload size limits, while keeping the app minimal.
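For the memory issue with large PDFs, one common pattern is to stream chunks through the embedding model in fixed-size batches. Each batch is added to the index as it arrives, which works because FAISS indexes such as `IndexFlatL2` accept repeated `add()` calls. This is not necessarily the fix PocketRAG adopted. The sketch below assumes a hypothetical `embed` function and passes in an `add` callback. The usage example uses a plain list's `extend` in place of a FAISS index, whose `add` would additionally need the vectors as a float32 NumPy array.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def index_in_batches(
    chunks: Iterable[str],
    embed: Callable[[list[str]], list[list[float]]],
    add: Callable[[list[list[float]]], None],
    batch_size: int = 64,
) -> int:
    """Embed and index chunks batch by batch; returns the number of chunks indexed."""
    total = 0
    for batch in batched(chunks, batch_size):
        # Only this batch's text and freshly computed vectors are held here at once.
        add(embed(batch))
        total += len(batch)
    return total


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding model: one 1-D vector per chunk."""
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    stored: list[list[float]] = []  # stands in for a FAISS index
    pages = (f"page {n} text" for n in range(1, 6))  # a generator, so pages are produced lazily
    count = index_in_batches(pages, fake_embed, stored.extend, batch_size=2)
    print(count, len(stored))
```

The index itself still grows with the document. What batching avoids is holding every extracted chunk and every intermediate embedding in memory at the same time before indexing starts.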

What I learned

For small projects, end-to-end deployment with AWS EC2 + Docker offers far more control than managed services.
How to integrate Google’s Gemini API efficiently into a RAG pipeline.
The importance of good health checks and logs in debugging deployment issues.
How to create production-ready apps under tight hackathon timelines.