Inspiration

PDFs are the digital equivalent of paper: static, silent, and often impenetrable. Even in an age of intelligent assistants, our most important documents remain a one-way conversation in which we can only read. Professionals wading through dense financial reports, students deciphering complex academic papers, and anyone handling sensitive contracts face the same frustrating barrier. The tools that promise intelligence demand a steep price in privacy: they require us to upload our most confidential information to the cloud.

What if we could give our documents a voice? What if we could have a secure, intelligent conversation with a PDF, right on our own device? This project was born from that vision: an AI assistant that respects your privacy as much as it enhances your productivity.

What it does

PDF Assistant transforms static documents into dynamic, conversational partners. It's a Chrome extension that reimagines the PDF viewing experience by embedding a private AI model directly in your browser. Unlike cloud-based tools, it keeps your documents on your device, so you get AI assistance without giving up control of your data.

At its heart, PDF Assistant uses Chrome's built-in Gemini Nano model to work with your documents in three ways:

  • Understand & Summarize: Instantly grasp the core message of any text selection or the entire document, turning pages of dense information into concise, actionable insights.
  • Extract & Analyze: Turn static tables and charts into structured data. Draw a box around a table to extract its contents as JSON or CSV, or ask questions about a chart or diagram using both its text and its image.
  • Create & Modify: Go beyond reading by actively co-creating with your documents. The AI can generate new text or code and seamlessly insert it into a new, downloadable version of the PDF.
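
As an illustration of the extraction step, here is a minimal sketch of turning a model reply into CSV. It assumes the table prompt asks Gemini Nano to answer with a JSON array of rows, which is a hypothetical format rather than necessarily the one PDF Assistant uses; `stripCodeFence`, `parseTable`, `csvField`, and `toCsv` are illustrative names. Validating the reply before exporting it matters because a small on-device model can occasionally return JSON that is wrapped in a Markdown fence or malformed.

```typescript
// Assumed reply format: a JSON array of rows, each row an array of cells.
type TableRows = string[][];

// Small models sometimes wrap JSON in a Markdown code fence; remove it if present.
function stripCodeFence(reply: string): string {
  const trimmed = reply.trim();
  const match = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  return match ? match[1] : trimmed;
}

// Parse and validate the reply, rejecting anything that is not an array of arrays.
function parseTable(reply: string): TableRows {
  const data: unknown = JSON.parse(stripCodeFence(reply));
  if (!Array.isArray(data) || !data.every((row) => Array.isArray(row))) {
    throw new Error("Expected a JSON array of rows");
  }
  // Coerce cells to strings so numbers and nulls serialize predictably.
  return (data as unknown[][]).map((row) =>
    row.map((cell) => (cell === null || cell === undefined ? "" : String(cell))),
  );
}

// Quote a field per RFC 4180 when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join rows with CRLF, the line ending RFC 4180 specifies.
function toCsv(rows: TableRows): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```

For example, a fenced reply containing `[["Year","Revenue"],[2023,"1,200"]]` becomes the two CSV lines `Year,Revenue` and `2023,"1,200"`, with the comma-containing cell quoted.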

Because everything runs on Chrome's built-in AI, the whole process works offline once Chrome has downloaded the model, avoids network round trips, and keeps your documents completely private.

How we built it

PDF Assistant is a carefully orchestrated set of on-device technologies, built on two foundational principles: privacy by design and a frictionless user experience.

  • The Viewer (The Stage): We used PDF.js to render documents within a custom interface and built an interaction layer on top of it that precisely captures text selections and rectangular area selections drawn over the page.
  • The Brain (The Intelligence): The core AI capabilities are powered by Chrome's LanguageModel API (Gemini Nano). We developed a system of adaptive prompts that, depending on the user's input, instruct the model to perform a specific task such as summarization, table formatting, or multimodal analysis.
  • The Editor (The Magic Wand): For content modification, a background service worker uses the pdf-lib library. This was the most complex piece: it programmatically deconstructs the original PDF, "erases" the old content, and meticulously redraws the AI-generated text in its place, even preserving code formatting.
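
The redraw step has to fit AI-generated text into the erased region. The sketch below shows one way to lay out the lines, under stated assumptions: prose is reflowed word by word, while code keeps its line breaks and indentation and is only hard-wrapped when a line is wider than the box. The width function is injected; with pdf-lib it could be `(t) => font.widthOfTextAtSize(t, fontSize)`, using a monospaced font such as `StandardFonts.Courier` for code, and each resulting line could then be drawn with `page.drawText` while stepping down by the line height. Note that "erasing" by painting a filled `page.drawRectangle` over old content hides it visually but leaves the original text in the page's content stream. The function names here are hypothetical.

```typescript
// Width of `text` in PDF points at the chosen font size. With pdf-lib this
// could be (t) => font.widthOfTextAtSize(t, fontSize).
type TextMeasure = (text: string) => number;

// Prose: reflow words greedily so each line fits within maxWidth.
// A single word wider than maxWidth is kept whole on its own line.
function wrapProse(text: string, maxWidth: number, measure: TextMeasure): string[] {
  const lines: string[] = [];
  for (const paragraph of text.split("\n")) {
    let current = "";
    for (const word of paragraph.split(/\s+/).filter(Boolean)) {
      const candidate = current ? `${current} ${word}` : word;
      if (current && measure(candidate) > maxWidth) {
        lines.push(current);
        current = word;
      } else {
        current = candidate;
      }
    }
    lines.push(current); // an empty paragraph yields an empty line
  }
  return lines;
}

// Code: keep every source line and its indentation; only hard-break lines that
// are wider than the box, cutting at the longest prefix that fits (minimum one
// character, so the loop always makes progress).
function wrapCode(text: string, maxWidth: number, measure: TextMeasure): string[] {
  const lines: string[] = [];
  for (const sourceLine of text.split("\n")) {
    let rest = sourceLine;
    while (rest.length > 1 && measure(rest) > maxWidth) {
      let cut = rest.length - 1;
      while (cut > 1 && measure(rest.slice(0, cut)) > maxWidth) cut--;
      lines.push(rest.slice(0, cut));
      rest = rest.slice(cut);
    }
    lines.push(rest);
  }
  return lines;
}
```

Keeping the two wrappers separate reflects the requirement to preserve code formatting: reflowing code the way prose is reflowed would destroy indentation that carries meaning.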

These components work together so that the experience feels responsive and native to the browser.
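
The write-up does not say how the viewer and the background service worker talk to each other; in a Manifest V3 extension this is commonly done with `chrome.runtime.sendMessage` and `chrome.runtime.onMessage`. As a hedged sketch, a small typed protocol keeps both sides in agreement. The message kinds, fields, and the `dispatch` helper below are hypothetical, and the dispatcher avoids `chrome.*` calls so it can run anywhere.

```typescript
// Hypothetical protocol between the viewer page and the background service worker.
type BoxRegion = { x: number; y: number; width: number; height: number };

type ViewerRequest =
  | { kind: "summarize"; text: string }
  | { kind: "extractTable"; pageIndex: number; region: BoxRegion }
  | { kind: "replaceRegion"; pageIndex: number; region: BoxRegion; newText: string };

type WorkerReply = { ok: true; result: string } | { ok: false; error: string };

// One async handler per request kind; the mapped type makes the compiler
// reject a handler table that forgets a kind.
type Handlers = {
  [K in ViewerRequest["kind"]]: (req: Extract<ViewerRequest, { kind: K }>) => Promise<string>;
};

// Route a request to its handler and turn thrown errors into a plain reply
// object, so the viewer always receives something serializable.
async function dispatch(req: ViewerRequest, handlers: Handlers): Promise<WorkerReply> {
  try {
    // TypeScript cannot correlate req.kind with the handler's parameter type, hence the cast.
    const handler = handlers[req.kind] as (r: ViewerRequest) => Promise<string>;
    return { ok: true, result: await handler(req) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

In the service worker, an `onMessage` listener could call `dispatch(message, handlers).then(sendResponse)` and return `true`, which tells Chrome to keep the message channel open for the asynchronous reply.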

Challenges and Breakthroughs

Building an AI that lives entirely inside a browser and manipulates PDFs presented unique hurdles:

  • The Challenge of Editing the Uneditable: PDFs are notoriously difficult to modify. Our breakthrough was a pipeline that accurately maps a user's visual selection on the rendered <canvas> to the PDF's internal coordinate system, so replacement content lands precisely where the user selected.
  • The Challenge of On-Device Prompt Engineering: Local models like Gemini Nano are capable but need precise instructions. We spent significant time iterating on prompts so the AI could reliably extract structured data, such as tables, from an image alone, a task that pushes the limits of on-device AI.
  • The Breakthrough (A Truly Interactive PDF): Our proudest accomplishment is closing the loop. Users don't just get an answer in a chat box; they get a new, improved document. This turns the tool from a simple reader into a powerful workstation.
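
The heart of the mapping problem is that the canvas and the PDF disagree on units and axes: canvas pixels grow rightward and downward from the top-left corner, while PDF user space is measured in points (1/72 inch) with the origin at the bottom-left and y growing upward. Here is a minimal sketch, assuming an unrotated page whose MediaBox starts at (0, 0) and a selection measured in CSS pixels at a known render `scale`. PDF.js's `PageViewport.convertToPdfPoint(x, y)` performs this conversion and also accounts for rotation and offsets; the helpers below are illustrative only.

```typescript
// A box in CSS pixels, relative to the canvas's top-left corner (y grows downward).
interface CanvasRect { left: number; top: number; width: number; height: number }

// The same box in PDF user space: points, origin at bottom-left, y grows upward.
interface PdfRect { x: number; y: number; width: number; height: number }

// Normalize a mouse drag so it works in any direction.
function rectFromDrag(x0: number, y0: number, x1: number, y1: number): CanvasRect {
  return {
    left: Math.min(x0, x1),
    top: Math.min(y0, y1),
    width: Math.abs(x1 - x0),
    height: Math.abs(y1 - y0),
  };
}

// Assumes an unrotated page whose MediaBox starts at (0, 0), rendered at `scale`
// CSS pixels per point. Use the CSS scale here, not the devicePixelRatio-multiplied
// scale of the canvas backing store, because mouse coordinates are in CSS pixels.
function canvasToPdfRect(r: CanvasRect, scale: number, pageHeightPt: number): PdfRect {
  return {
    x: r.left / scale,
    // Flip the y axis: the box's bottom edge on the canvas is at top + height.
    y: pageHeightPt - (r.top + r.height) / scale,
    width: r.width / scale,
    height: r.height / scale,
  };
}
```

For a US Letter page (612 by 792 pt) rendered at `scale = 2`, a box dragged from (150, 240) to (100, 200) becomes `{ x: 50, y: 672, width: 25, height: 20 }`. pdf-lib's drawing calls also use bottom-left-origin coordinates, so a rectangle in this form can be passed to them directly.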

Impact and Future

PDF Assistant is more than a tool; it's a new way to interact with private information. By showing that demanding AI tasks such as summarization, table extraction, and document editing can run securely on-device, we aim to help students, researchers, lawyers, and financial analysts work smarter without sacrificing confidentiality.

This is just the beginning. We're excited to expand the assistant's capabilities with:

  • Conversational Memory: Allow users to ask follow-up questions and have a continuous dialogue throughout a document.
  • Cross-Document Intelligence: Enable the AI to compare and synthesize information from multiple PDFs at once.
  • Automated Data Annotation: Have the AI automatically identify and tag key information such as names, dates, and financial figures across a document.

Technical Innovation

PDF Assistant showcases the power of Chrome's on-device AI through:

  • On-Device Multimodal Analysis for PDFs
  • Local AI Interaction with No External API Calls or Network Latency
  • A Novel Pipeline for Programmatic PDF Reconstruction in JavaScript
  • Seamless Interception and Re-rendering of Browser Navigation
  • Adaptive Prompting System for Context-Aware AI Responses
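
To make the idea of adaptive prompting concrete, here is a hypothetical sketch in which the prompt changes with both the task and what the user selected: long passages get bullet-point summaries, table extraction falls back to the image when there is no text, and figure questions require a boxed image. The task names, wording, and the 1,500-character threshold are assumptions for illustration, not PDF Assistant's actual prompts. The `system` string would typically become the session's standing instruction, for example as the system entry in `initialPrompts` when creating a Chrome Prompt API session.

```typescript
// Hypothetical task kinds; PDF Assistant's real categories may differ.
type Task = "summarize" | "extractTable" | "askAboutFigure";

interface PromptInput {
  text: string;      // text from the selection or text layer ("" if none)
  hasImage: boolean; // true when the user drew a box, so a cropped image is available
}

interface BuiltPrompt {
  system: string;       // standing instruction for the session
  user: string;         // the per-request message
  attachImage: boolean; // whether to send the cropped image alongside the text
}

// Illustrative threshold, in characters, above which summaries become bullet lists.
const LONG_SELECTION = 1500;

function buildPrompt(task: Task, input: PromptInput, question = ""): BuiltPrompt {
  switch (task) {
    case "summarize": {
      const shape =
        input.text.length > LONG_SELECTION ? "as 3 to 5 bullet points" : "in one or two sentences";
      return {
        system: "You summarize excerpts from PDF documents faithfully and concisely.",
        user: `Summarize the following ${shape}:\n\n${input.text}`,
        attachImage: false,
      };
    }
    case "extractTable": {
      if (!input.text && !input.hasImage) throw new Error("extractTable needs text or an image");
      const imageNote = input.hasImage ? " The attached image shows its layout." : "";
      return {
        system:
          "You convert tables to JSON. Reply with only a JSON array of rows, each an array of cell strings.",
        // Use the text layer when there is one; otherwise rely on the image alone.
        user: input.text
          ? `Extract the table from the text below.${imageNote}\n\n${input.text}`
          : "Extract the table shown in the attached image.",
        attachImage: input.hasImage,
      };
    }
    case "askAboutFigure": {
      if (!input.hasImage) throw new Error("askAboutFigure needs a boxed image region");
      return {
        system: "You answer questions about charts and diagrams from PDF documents.",
        user: input.text ? `${question}\n\nNearby text:\n${input.text}` : question,
        attachImage: true,
      };
    }
    default: {
      // Exhaustiveness check: adding a Task without a case is a compile error.
      const unreachable: never = task;
      throw new Error(`Unknown task: ${String(unreachable)}`);
    }
  }
}
```

Keeping prompt construction in one pure function makes it easy to iterate on wording, which matters for small on-device models that are sensitive to precise instructions.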

By merging advanced document manipulation with cutting-edge private AI, PDF Assistant aims to make working with documents on the web smarter, safer, and more productive.
