đź’ˇ Inspiration
One day I was talking with my friend Ben, a musician who lost his sight in his twenties, and sent him a link to a New York Times article. The next day I asked, "Hey, did you see the article I shared?" and he said, "Nope. It's so hard for us to read those." He then showed me just how difficult a content-heavy article is to get through with a screen reader; for him it was nearly impossible. It was a groundbreaking moment: the modern, AI-enhanced browsing experience is built for regular users, and it isn't even close to working for visually impaired people or the broader audience with reading disabilities.
Watching the AI giants move toward browser agents and even their own browsers, like Claude for Chrome, Comet, and Atlas, I was impressed by their ability to control the browser and manage information overload. Yet those capabilities often come at the expense of user privacy and accessibility: many require sending entire page contents to the cloud, which is a non-starter for sensitive research, corporate documents, or users who simply value their data.
Our hackathon goal became simple: create an AI browsing agent that is 100% on-device, transparent, and built for those the browsing experience has left behind. We wanted to empower users like Ben, who need to quickly parse dense articles and navigate complex web structures without fatigue, and without sacrificing personal or corporate data. Accessibility and privacy aren't afterthoughts in Erpa; they are the foundation.
✨ What it Does
Project Erpa is an ethical browsing companion that helps assistive-technology users, knowledge workers, and students process web content faster and more securely. It operates primarily as a WCAG 2.2 AA-minded side panel featuring:
- On-page Summarization & Q&A: Concise summaries and natural-language answers grounded in the current page's content, with inline citations and anchor links.
- Read Aloud (TTS): Synchronized highlighting with adjustable voice parameters via the browser’s Web Speech API.
- Outline View: A clean, screen-reader-friendly tree view of headings/landmarks for rapid keyboard navigation.
- Semantic Search: Local embeddings and in-browser vector search to find content by meaning, not just keywords.
- Agentic Actions: An AI agent that can navigate, search, read, and summarize using safe, declarative functions.
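Conceptually, the semantic search above boils down to ranking page chunks by vector similarity to the query. Here is a minimal TypeScript sketch of that idea; in Erpa the embeddings come from a local Transformers.js model and live in PGLite, but for illustration a plain in-memory array stands in for the vector store, and the names (`Chunk`, `topK`) are hypothetical.

```typescript
// Minimal on-device semantic search sketch: rank chunks by cosine
// similarity to a query embedding. A plain array stands in for the
// real vector store (PGLite in the actual extension).

interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // pre-computed, same dimensionality as the query
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

Because everything here is a local function call over local data, no page content ever has to leave the device.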
🛠️ How We Built It
Building a fast, reliable, privacy-first AI agent that lives entirely in the browser presented some unique technical hurdles.
Architecture Highlights
- Local-first by default: Semantic search runs fully on-device using Xenova Transformers (embeddings) and PGLite (in-browser PostgreSQL) for vector search. Reasoning is powered by Chrome’s Prompt API with responses grounded strictly in current-page context.
- Accessibility Core: Designed for full keyboard navigation, correct focus management, ARIA roles, and respectful use of live regions for streaming updates.
- Extraction Pipeline: A robust content script snapshots the DOM, runs readability/section detection, and generates a structured outline used for navigation, TTS, semantic search, and grounding.
- Agent Functions: A constrained set of functions (navigate, semantic search, read aloud, extract, summarize) gives the agent safe, auditable capabilities.
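To make the "safe, auditable capabilities" idea concrete, here is a hedged sketch of what a constrained action registry can look like. The handler names and return values are illustrative stand-ins, not Erpa's exact internals; the point is that the agent can only invoke allow-listed actions, and every invocation is logged before it runs.

```typescript
// Sketch of a constrained agent-action registry: the model may only
// request actions from this allow-list, and each call is recorded in
// an audit log before the handler executes.

type ActionHandler = (args: Record<string, string>) => string;

const actions = new Map<string, ActionHandler>();

// Illustrative handlers; real ones would drive the tab, TTS, etc.
actions.set("navigate", ({ url }) => `navigated to ${url}`);
actions.set("summarize", ({ sectionId }) => `summary of ${sectionId}`);

const auditLog: string[] = [];

function dispatch(name: string, args: Record<string, string>): string {
  const handler = actions.get(name);
  if (!handler) throw new Error(`unknown or disallowed action: ${name}`);
  auditLog.push(`${name}(${JSON.stringify(args)})`); // auditable trail
  return handler(args);
}
```

Anything the model asks for outside the registry is rejected outright, which keeps the agent's capabilities declarative and reviewable.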
🛑 Challenges We Ran Into
1. The Local Inference Dilemma
Challenge: Achieving low-latency, high-quality answers while staying local. Embedding generation and vector search must be fast across hardware from low-power laptops to desktops. We mitigated this with lightweight models, chunking, and incremental indexing.
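The chunking step mentioned above can be sketched simply: split the page text into overlapping windows so a match near a boundary is never lost. This version chunks by character count for clarity (the function name and sizes are illustrative; a production pipeline might chunk by sentence or token count instead).

```typescript
// Illustrative text chunker for incremental indexing: overlapping
// windows ensure content straddling a chunk boundary still matches.

function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Smaller chunks embed faster on low-power hardware, at the cost of more vectors to search, which is exactly the latency trade-off described above.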
2. Implementing function calls and agentic behavior
Challenge: Chrome’s Prompt API doesn’t expose traditional tool-calling. We achieved reliable "function-call-like" behavior via strict schema prompts, deterministic system instructions, and response validation before executing any action.
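The validation step can be illustrated with a small sketch: the model is prompted to emit strict JSON, and nothing executes unless that JSON parses and names an allow-listed action. The schema and names here are assumptions for illustration, not Erpa's exact prompt contract.

```typescript
// "Function-call-like" behavior without native tool calling: parse the
// model's raw text as JSON and validate it before any action runs.
// Returning null means "refuse to act" rather than guessing.

const ALLOWED = new Set(["navigate", "semanticSearch", "readAloud", "summarize"]);

interface AgentCall {
  action: string;
  args: Record<string, string>;
}

function parseAgentResponse(raw: string): AgentCall | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // model drifted from the JSON contract
  }
  const call = parsed as Partial<AgentCall>;
  if (typeof call.action !== "string" || !ALLOWED.has(call.action)) return null;
  if (typeof call.args !== "object" || call.args === null) return null;
  return { action: call.action, args: call.args as Record<string, string> };
}
```

Failing closed like this is what makes schema-prompted "tool calls" safe: a malformed or out-of-policy response simply does nothing.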
3. Connecting Screenshot and DOM Structure for Section Extraction
Challenge: We tried to link a screenshot with the DOM structure to extract meaningful page sections within 2 seconds, but were unsuccessful. The main difficulty was aligning visual regions in the image with the corresponding semantic DOM nodes in real time, fast enough to drive navigation and summarization. Despite several optimization attempts, the process could not meet the sub-2-second performance target.
âś… Accomplishments That We're Proud Of
We are most proud of establishing privacy and accessibility as parallel, non-negotiable core features. This included:
- Successfully implementing adaptive throttling and streaming responses to overcome local inference latency, significantly improving perceived speed.
- Rigorously integrating accessibility into every design phase, including comprehensive testing with NVDA and VoiceOver.
- Establishing transparency through a clear visual "Local Processing" badge and an opt-in telemetry module that allows local inspection of collected data.
- Creating a separate open-source library to encapsulate our Prompt API utility patterns, making it easier for others to build privacy-first, local AI tools on top of Chrome's new APIs in the future. (Check out @ahnopologetic/use-prompt-api)
đź§ What We Learned
We implemented adaptive throttling and streaming responses. Streaming the output provides partial results quickly, significantly improving the perceived latency and user experience even when the local processing takes a few extra seconds.
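The UI side of that throttling can be sketched as a simple coalescer: rather than repainting (and re-announcing via live regions) on every streamed token, tokens are buffered and flushed in batches. The batch-count trigger here is an illustrative stand-in for the adaptive policy; names like `createStreamCoalescer` are hypothetical.

```typescript
// Sketch of token coalescing for streamed model output: buffer tokens
// and flush them in batches so the UI (and screen reader) is updated
// at a readable pace instead of per token.

function createStreamCoalescer(flushEvery: number, onFlush: (text: string) => void) {
  let buffer = "";
  let pending = 0;
  return {
    push(token: string) {
      buffer += token;
      pending += 1;
      if (pending >= flushEvery) this.flush();
    },
    flush() { // also called once at end-of-stream
      if (buffer) onFlush(buffer);
      buffer = "";
      pending = 0;
    },
  };
}
```

A time-based or adaptive trigger can replace the fixed count without changing the shape of the code.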
We integrated accessibility into every design phase. We rigorously tested with NVDA and VoiceOver, focusing on correct focus order, using live regions for streaming updates, and ensuring that the agent never steals focus from the main page, respecting the user's current flow.
Transparency is key. We made Erpa open-source and added a clear visual "Local Processing" badge to the UI.
🔑 Quick Usage Highlights
- Open/close side panel: Cmd+Shift+Y (Mac) / Ctrl+Shift+Y (Windows/Linux)
- Focus semantic search: Cmd+Shift+F / Ctrl+Shift+F
- Toggle voice input: Ctrl+Cmd+Enter (Mac)
- Pause/resume TTS: Ctrl+Cmd+Option+Space (Mac)
- Stop TTS: Ctrl+Cmd+Option+Enter (Mac)
- ...and more
See the README for full setup, loading the unpacked extension, and a complete keyboard reference.
🚀 What's Next for Erpa
Erpa has moved beyond a mere feature into a helpful, ethical browsing companion. Near-term goals:
- Implement fully "visual" section detection and accessibility augmentation using screenshots and the Prompt API. Several papers explore this approach, but we found it very hard to implement while meeting our latency target.
- Achieve 5,000 installs and keep Erpa 100% free for users with visual disabilities (verified via SSA and similar programs)
- Secure 5+ community PRs in the first 60 days post open-source launch
- Complete a targeted WCAG 2.2 audit and publish findings
Built With
- plasmo
- prompt-api
- rag
- react
- transformer.js
- typescript
- wasm