Inspiration

People consume massive amounts of information through PDFs—textbooks, research papers, documentation, and lecture notes—but PDFs are static, hard to search deeply, and inefficient for real learning. Reading is passive, context is fragmented, and extracting usable knowledge takes too much time.
LearnLM exists to turn PDFs into an interactive learning surface where users can actively interrogate, understand, and apply the content through chat.

What it does

LearnLM converts PDFs into a conversational learning experience. Users upload a PDF, and the app:

  • Extracts and normalizes text from PDFs (including long and complex documents)
  • Analyzes content with AI to identify key concepts and document structure
  • Provides a context-locked chat interface that answers questions strictly from the uploaded PDF
  • Supports explanations, summaries, examples, and concept breakdowns on demand
  • Generates actionable study tasks with estimated completion time
  • References exact sections or pages used to answer each question
  • Tracks progress across documents
  • Saves PDFs to a personal library for resume-and-review workflows
  • Exports summaries, notes, and study plans to PDF

How we built it

LearnLM is built using a modern full-stack architecture:

  • Frontend: Next.js 14 (App Router), TypeScript, Tailwind CSS, shadcn/ui
  • AI: OpenAI GPT-4o-mini for document analysis and chat
  • PDF Processing: Text extraction, chunking, and semantic indexing
  • Retrieval: Vector embeddings with a RAG pipeline for grounded responses
  • State Management: React hooks with localStorage persistence
  • Authentication: Civic Auth SDK
  • UI/UX: PDF viewer, highlighted references, progress indicators, responsive layouts
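The chunking-and-retrieval step above can be sketched as follows. This is a minimal illustration, not the actual LearnLM code: the function and type names (`chunkPage`, `topK`, etc.) are hypothetical, and it assumes fixed-size overlapping chunks ranked by cosine similarity over embedding vectors (the embeddings themselves would come from an API such as OpenAI's).

```typescript
// Illustrative sketch of the chunking + retrieval layer. Names are
// hypothetical; embeddings are assumed to be produced elsewhere.

interface Chunk {
  page: number; // page number kept so answers can cite their source
  text: string;
}

// Split one page's extracted text into fixed-size overlapping chunks.
function chunkPage(page: number, text: string, size = 500, overlap = 100): Chunk[] {
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({ page, text: text.slice(start, start + size) });
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank pre-embedded chunks against a query embedding; keep the top k
// to place in the model's context window.
function topK(
  query: number[],
  embedded: { chunk: Chunk; vec: number[] }[],
  k = 3,
): { chunk: Chunk; score: number }[] {
  return embedded
    .map(e => ({ chunk: e.chunk, score: cosine(query, e.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The page number travels with every chunk, which is what makes page-level references in answers possible later.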

Challenges we ran into

  • PDF parsing inconsistencies across different document formats
  • Preventing hallucinations by strictly enforcing document-only context
  • Designing an effective chunking and retrieval strategy
  • Maintaining fast response times on large PDFs
  • Forcing consistent structured outputs from the LLM
  • Building a clear UX that shows where answers come from in the document
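One common way to tackle the hallucination challenge above is to ground the prompt in the retrieved excerpts and instruct the model to refuse anything outside them. The sketch below illustrates that pattern; the prompt wording and helper names are hypothetical, not LearnLM's actual prompts.

```typescript
// Illustrative context-locking prompt builder: the model only ever sees
// retrieved excerpts, plus an explicit refusal instruction.

interface RetrievedChunk {
  page: number;
  text: string;
}

function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map(c => `[page ${c.page}] ${c.text}`)
    .join("\n---\n");
  return [
    "Answer using ONLY the excerpts below from the uploaded PDF.",
    "If the excerpts do not contain the answer, say the document does not cover it.",
    "Cite the page number of every excerpt you rely on.",
    "",
    "Excerpts:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Tagging each excerpt with `[page N]` in the prompt is also what lets the structured output carry page references back to the UI.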

Accomplishments that we're proud of

  • End-to-end PDF → Chat → Study workflow
  • Accurate, source-grounded AI responses with page-level references
  • Fast and reliable chat on large technical documents
  • Persistent document library with progress tracking
  • Exportable summaries and study materials
  • Clean, modular, and scalable architecture
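The progress tracking mentioned above could be persisted with localStorage roughly as follows. This is a sketch under assumptions: the key scheme, the `Progress` shape, and the helper names are all illustrative, and a `KVStore` interface stands in for `window.localStorage` so the logic is testable outside the browser.

```typescript
// Illustrative per-document progress persistence. In the app the store
// would be window.localStorage; the interface here is a stand-in.

interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Progress {
  lastPage: number;
  completedTasks: string[];
}

const KEY_PREFIX = "learnlm:progress:"; // hypothetical key scheme

function saveProgress(store: KVStore, docId: string, p: Progress): void {
  store.setItem(KEY_PREFIX + docId, JSON.stringify(p));
}

function loadProgress(store: KVStore, docId: string): Progress {
  const raw = store.getItem(KEY_PREFIX + docId);
  // Fall back to a fresh state for documents never opened before.
  return raw ? (JSON.parse(raw) as Progress) : { lastPage: 1, completedTasks: [] };
}
```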

What we learned

  • Building reliable RAG systems for long-form documents
  • Advanced prompt engineering to reduce hallucinations
  • PDF text normalization and layout handling
  • Designing intuitive chat-based learning interfaces
  • Managing complex client-side state efficiently

What's next for LearnLM

  • Multi-PDF and cross-document chat
  • Flashcards with spaced repetition scheduling
  • Automatic quiz and exam generation
  • Collaborative document workspaces
  • Learning analytics and knowledge gap detection
  • Native mobile applications
  • LMS and Notion integrations

Built With

  • nextjs
  • openai
  • yttranscript