Inspiration
Research today is incredibly powerful, but also incredibly dense. Whether it’s a neuroscience paper, a legal case, or an engineering report, understanding even a small section often requires significant time and prior knowledge. We were inspired by a simple frustration: why does understanding research still feel so slow and manual in an age of AI? We wanted to build something that makes research feel interactive, visual, and intuitive: something that puts users inside the paper itself and helps them instantly grasp its meaning.
What it does
Gist is a tool that converts highlighted text from any research article or paper into structured, interactive visualizations in real time.
When a user highlights text, Gist:
- Parses it semantically
- Selects the most informative representation
- Generates structured output
- Renders an interactive visualization
Key features:
- Explain View (local understanding):
For any highlighted text, Gist provides:
- A plain-language explanation
- Multiple visualization options:
  - Concept Map → relationships
  - Data Chart → trends
  - Timeline → progression
  - Process Flow → steps
- A recommended visualization
- Architecture View (global understanding):
Gist extracts the paper’s logical structure as a directed graph:
- Nodes → key sections or concepts
- Edges → relationships (supports, leads to, depends on)

This gives a high-level view of the paper’s reasoning flow.
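The nodes-and-edges structure above can be sketched as plain data. The field names and section labels here are illustrative assumptions, not the actual payload Gist emits:

```python
# Hypothetical shape of the Architecture View graph: sections/concepts as
# nodes, typed relationships as edges. All names are illustrative.
architecture = {
    "nodes": [
        {"id": "intro", "label": "Introduction"},
        {"id": "method", "label": "Methodology"},
        {"id": "results", "label": "Results"},
    ],
    "edges": [
        {"source": "intro", "target": "method", "relation": "leads to"},
        {"source": "method", "target": "results", "relation": "supports"},
    ],
}

def outgoing(graph: dict, node_id: str) -> list[str]:
    """Return ids of nodes reachable in one hop from node_id."""
    return [e["target"] for e in graph["edges"] if e["source"] == node_id]
```

A frontend renderer only needs this source/target/relation triple to draw the reasoning-flow diagram.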
- Chat View (interactive understanding):
Users can ask questions about the paper or a highlighted section. Gist responds using:
- the selected context
- semantic understanding of the content

This turns the paper into a queryable knowledge system, not just static text.
How we built Gist (PDF Lens)
Gist is an AI-powered PDF reader that helps users understand complex documents through explanations, visuals, and chat. It’s built as a React frontend + FastAPI backend, with Google Gemini handling the AI.
Architecture
Frontend
- Built with React and TypeScript
- Uses Vite for development and bundling
- Styled with Tailwind CSS
- Routing handled by React Router
The frontend provides a PDF-like interface where users can select text, view explanations, explore document structure, and chat with the content.
Backend
- Built with FastAPI
- Served using Uvicorn
- Uses httpx to communicate with the AI
- Data validation handled by Pydantic
The backend is responsible for generating prompts, calling the Gemini API, and returning structured responses to the frontend.
How it works
- The user selects text in the document.
- The frontend sends a request to the backend (e.g. `/api/v2/explain`).
- The backend sends a prompt to Gemini (`gemini-2.5-flash`).
- Gemini returns structured JSON (explanation + optional visual).
- The frontend renders the explanation and visual.
Features
Explain
- Converts selected text into a plain-language explanation
- Generates visuals like diagrams, SVGs, or tables
- Supports follow-up refinements (simpler, more detail, analogy)
Architecture
- Extracts the structure of the document
- Returns a tree of sections that is rendered visually
Chat
- Lets users ask questions about the document
- Uses document context to generate grounded responses
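Grounding a chat answer mostly comes down to inlining the relevant context into the prompt. A rough sketch, where the wording and the truncation limit are our illustrative assumptions:

```python
def build_chat_prompt(question: str, context: str, max_context: int = 4000) -> str:
    """Assemble a grounded prompt: the selected/document context is inlined
    so the model answers from the paper rather than from memory."""
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context[:max_context]}\n\n"
        f"Question: {question}"
    )
```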
Key design decisions
- Server-side AI only: All Gemini calls happen in the backend. The API key is never exposed to the client.
- Structured outputs: The backend enforces JSON responses so the frontend can reliably render visuals and explanations.
- Separation of concerns: The frontend handles UI and interaction, while the backend handles AI logic.
Summary
Gist combines:
- A React-based interface for interacting with documents
- A FastAPI backend for handling AI requests
- Gemini for generating explanations, visuals, and structure
The result is a system where users can read, understand, and explore complex PDFs more effectively.
Challenges we ran into
- LLM Output Consistency: Gemini often returned:
  - plain text instead of JSON
  - incomplete structures

  We solved this with:
  - strict prompt engineering
  - defensive parsing (`safe_json_parse`)
  - fallback defaults
- Visualization Selection: Choosing the “best” representation is ambiguous.

  We addressed this by:
  - combining LLM classification with heuristics
  - returning multiple options plus a recommended view
  - incorporating UI/UX design principles
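One way the heuristics-plus-LLM blend can work is simple keyword scoring with a bonus for the model's pick. The keyword lists and the bonus weight here are illustrative, not the actual selection logic:

```python
def pick_visualization(text: str, llm_choice: str) -> dict:
    """Rank visualization types by cheap text heuristics, then blend in
    the LLM's classification. Keywords and weights are illustrative."""
    heuristics = {
        "timeline": ["year", "decade", "era", "then", "later"],
        "data_chart": ["%", "increase", "decrease", "rate", "average"],
        "process_flow": ["step", "first", "next", "finally", "process"],
        "concept_map": ["relates", "depends", "causes", "between"],
    }
    lowered = text.lower()
    scores = {
        kind: sum(lowered.count(kw) for kw in kws)
        for kind, kws in heuristics.items()
    }
    # The LLM's pick gets a fixed bonus so it wins ties and weak signals
    scores[llm_choice] = scores.get(llm_choice, 0) + 2
    recommended = max(scores, key=scores.get)
    options = sorted(scores, key=scores.get, reverse=True)
    return {"recommended": recommended, "options": options}
```

Returning the full ranked `options` list is what lets the UI offer alternatives alongside the recommended view.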
Accomplishments that we're proud of
We built a system that turns unstructured text into structured visualizations in real time. Along the way, we also:
- Created an adaptive visualization engine, not a fixed UI
- Developed the AI Data Sketcher, which converts qualitative descriptions into visualizable datasets
- Designed a multi-level understanding system: local (Explain), global (Architecture), interactive (Chat).
- Achieved a smooth highlight → response interaction loop, which is the core user experience
What we learned
- LLMs require guardrails, not just prompts
- UX determines perceived intelligence as much as the model itself
- Real-time systems require latency-aware design
- Structured outputs are significantly harder than free-form generation
- Integration (not generation) is the hardest part of building AI products
What's next for Gist
Short-term goals:
- Implement embedding-based retrieval (RAG) for chat
- Improve classification confidence scoring
- Enhance visualization fidelity (better layouts, interactions)
Long-term goals:
- Full-paper ingestion + indexing
- Persistent knowledge graphs per paper
- Domain-specific optimization (legal, medical, engineering)
- Multi-document comparison and synthesis