Inspiration
Research today is incredibly powerful, but also incredibly dense. Whether it’s a neuroscience paper, a legal case, or an engineering report, understanding even a small section often requires significant time and prior knowledge. We were inspired by a simple frustration: why does understanding research still feel so slow and manual in an age of AI? We wanted to build something that makes research feel interactive, visual, and intuitive: something that puts users inside the paper itself and helps them instantly grasp its meaning.
What it does
Gist is a tool that converts highlighted text from any research article or paper into structured, interactive visualizations in real time.
When a user highlights text, Gist:
- Parses it semantically
- Selects the most informative representation
- Generates structured output
- Renders an interactive visualization
Key features:
- Explain View (local understanding):
For any highlighted text, Gist provides:
- A plain-language explanation
- Multiple visualization options:
  - Concept Map → relationships
  - Data Chart → trends
  - Timeline → progression
  - Process Flow → steps
- A recommended visualization
- Architecture View (global understanding):
Gist extracts the paper’s logical structure as a directed graph:
- Nodes → key sections or concepts
- Edges → relationships (supports, leads to, depends on)

This gives a high-level view of the paper’s reasoning flow.
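The nodes-and-edges structure above can be sketched as plain data. The field names and section labels here are illustrative assumptions, not the actual payload Gist emits:

```python
# Hypothetical shape of the Architecture View graph: sections/concepts as
# nodes, typed relationships as edges. All names are illustrative.
architecture = {
    "nodes": [
        {"id": "intro", "label": "Introduction"},
        {"id": "method", "label": "Methodology"},
        {"id": "results", "label": "Results"},
    ],
    "edges": [
        {"source": "intro", "target": "method", "relation": "leads to"},
        {"source": "method", "target": "results", "relation": "supports"},
    ],
}

def outgoing(graph: dict, node_id: str) -> list[str]:
    """Return ids of nodes reachable in one hop from node_id."""
    return [e["target"] for e in graph["edges"] if e["source"] == node_id]
```

A frontend renderer only needs this source/target/relation triple to draw the reasoning-flow diagram.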
- Chat View (interactive understanding):
Users can ask questions about the paper or a highlighted section. Gist responds using:
- the selected context
- semantic understanding of the content

This turns the paper into a queryable knowledge system, not just static text.
How we built Gist (PDF Lens)
Gist is an AI-powered PDF reader that helps users understand complex documents through explanations, visuals, and chat. It’s built as a React frontend + FastAPI backend, with Google Gemini handling the AI.
Architecture
Frontend
- Built with React and TypeScript
- Uses Vite for development and bundling
- Styled with Tailwind CSS
- Routing handled by React Router
The frontend provides a PDF-like interface where users can select text, view explanations, explore document structure, and chat with the content.
Backend
- Built with FastAPI
- Served using Uvicorn
- Uses httpx to communicate with the AI
- Data validation handled by Pydantic
The backend is responsible for generating prompts, calling the Gemini API, and returning structured responses to the frontend.
How it works
- The user selects text in the document.
- The frontend sends a request to the backend (e.g. `/api/v2/explain`).
- The backend sends a prompt to Gemini (`gemini-2.5-flash`).
- Gemini returns structured JSON (explanation + optional visual).
- The frontend renders the explanation and visual.
Features
Explain
- Converts selected text into a plain-language explanation
- Generates visuals like diagrams, SVGs, or tables
- Supports follow-up refinements (simpler, more detail, analogy)
Architecture
- Extracts the structure of the document
- Returns a tree of sections that is rendered visually
Chat
- Lets users ask questions about the document
- Uses document context to generate grounded responses
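Grounding a chat answer mostly comes down to inlining the relevant context into the prompt. A rough sketch, where the wording and the truncation limit are our illustrative assumptions:

```python
def build_chat_prompt(question: str, context: str, max_context: int = 4000) -> str:
    """Assemble a grounded prompt: the selected/document context is inlined
    so the model answers from the paper rather than from memory."""
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context[:max_context]}\n\n"
        f"Question: {question}"
    )
```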
Key design decisions
- Server-side AI only: All Gemini calls happen in the backend. The API key is never exposed to the client.
- Structured outputs: The backend enforces JSON responses so the frontend can reliably render visuals and explanations.
- Separation of concerns: The frontend handles UI and interaction, while the backend handles AI logic.
Summary
Gist combines:
- A React-based interface for interacting with documents
- A FastAPI backend for handling AI requests
- Gemini for generating explanations, visuals, and structure
The result is a system where users can read, understand, and explore complex PDFs more effectively.
Challenges we ran into
- LLM Output Consistency: Gemini often returned:
  - plain text instead of JSON
  - incomplete structures

  We solved this with:
  - strict prompt engineering
  - defensive parsing (`safe_json_parse`)
  - fallback defaults
- Visualization Selection: Choosing the “best” representation is ambiguous.

  We addressed this by:
  - combining LLM classification with heuristics
  - returning multiple options plus a recommended view
  - incorporating UI/UX design principles
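One way the heuristics-plus-LLM blend can work is simple keyword scoring with a bonus for the model's pick. The keyword lists and the bonus weight here are illustrative, not the actual selection logic:

```python
def pick_visualization(text: str, llm_choice: str) -> dict:
    """Rank visualization types by cheap text heuristics, then blend in
    the LLM's classification. Keywords and weights are illustrative."""
    heuristics = {
        "timeline": ["year", "decade", "era", "then", "later"],
        "data_chart": ["%", "increase", "decrease", "rate", "average"],
        "process_flow": ["step", "first", "next", "finally", "process"],
        "concept_map": ["relates", "depends", "causes", "between"],
    }
    lowered = text.lower()
    scores = {
        kind: sum(lowered.count(kw) for kw in kws)
        for kind, kws in heuristics.items()
    }
    # The LLM's pick gets a fixed bonus so it wins ties and weak signals
    scores[llm_choice] = scores.get(llm_choice, 0) + 2
    recommended = max(scores, key=scores.get)
    options = sorted(scores, key=scores.get, reverse=True)
    return {"recommended": recommended, "options": options}
```

Returning the full ranked `options` list is what lets the UI offer alternatives alongside the recommended view.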
Accomplishments that we're proud of
We built a system that turns unstructured text into structured visualizations in real time. Along the way, we also:
- Created an adaptive visualization engine, not a fixed UI
- Developed the AI Data Sketcher, which converts qualitative descriptions into visualizable datasets
- Designed a multi-level understanding system: local (Explain), global (Architecture), interactive (Chat).
- Achieved a smooth highlight → response interaction loop, which is the core user experience
What we learned
- LLMs require guardrails, not just prompts
- UX determines perceived intelligence as much as the model itself
- Real-time systems require latency-aware design
- Structured outputs are significantly harder than free-form generation
- Integration (not generation) is the hardest part of building AI products
What's next for Gist
Short-term goals:
- Implement embedding-based retrieval (RAG) for chat
- Improve classification confidence scoring
- Enhance visualization fidelity (better layouts, interactions)
Long-term goals:
- Full-paper ingestion + indexing
- Persistent knowledge graphs per paper
- Domain-specific optimization (legal, medical, engineering)
- Multi-document comparison and synthesis