Inspiration

We were inspired by the gap between AI capabilities and everyday web browsing. AI assistants exist, but they are disconnected from the pages people actually browse. We envisioned users conversing naturally with their browser: asking questions about content, controlling pages by voice, and customizing their experience instantly. Accessibility was a key driver, because many users struggle with small text, cluttered layouts, or complex navigation. We wanted to democratize web access through voice, making browsing intuitive for everyone, from users with disabilities to power users seeking efficiency. The recent release of Chrome's built-in AI made this vision achievable without external APIs.

What it does

Lavio is a Chrome extension that brings voice-powered AI assistance directly to any webpage. Users hold a button and speak naturally; the AI uses the page as context, answers questions about its content, and executes commands. Key features include:
- Voice interaction with real-time speech recognition and text-to-speech responses.
- Smart actions such as "click on search," "scroll down," or "type my name" that work across any website.
- Page customization through voice: adjust text size, enable dark mode, hide ads, or activate reader mode instantly.
- Context-aware conversations in which the AI analyzes page content to provide relevant answers.
- Universal translation supporting 30+ languages.
- An intuitive onboarding experience that showcases all capabilities.

How we built it

We built Lavio in vanilla JavaScript as a Chrome extension on Manifest V3. The architecture has three core parts: content.js injects the UI and handles user interactions, background.js manages Chrome's built-in AI sessions via the Prompt API, and specialized modules handle element detection, action execution, and page manipulation. For voice, we used the Web Speech API for both recognition and synthesis, which provides real-time transcription and spoken responses. Chrome's experimental built-in language model, accessed through the Prompt API, handles intent detection, element matching, and conversational responses. An intent classification system distinguishes questions from actions before anything is executed. The UI features CSS-in-JS styling, markdown rendering for rich text responses, smooth animations, and a comprehensive onboarding flow. We focused on modularity, creating separate classes for ElementDetector, ActionExecutor, and PageManipulator to keep the code maintainable.
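
The hand-off from a transcribed utterance to the AI session can be sketched roughly as below. This is an illustration under assumptions, not Lavio's actual code: `buildIntentPrompt`, `askModel`, and `stubSession` are hypothetical names, and the session is only assumed to expose an async `prompt(text)` method, as sessions created through Chrome's Prompt API do. The session is passed in as a parameter so the same function also runs against a stub outside the browser.

```javascript
// Hypothetical helper: builds the instruction sent to the model for one utterance.
function buildIntentPrompt(utterance, pageTitle) {
  return [
    'Classify the user request as a question or an action.',
    'Reply with JSON only: {"isAction": boolean, "actionType": string | null}.',
    `Page title: ${pageTitle}`,
    `Request: ${utterance}`,
  ].join('\n');
}

// Sends the prompt through any object that has an async prompt(text) method.
// In the extension this would be a built-in AI session; here it is injected.
async function askModel(session, utterance, pageTitle) {
  const prompt = buildIntentPrompt(utterance, pageTitle);
  try {
    const raw = await session.prompt(prompt);
    return { ok: true, raw };
  } catch (err) {
    // Availability or quota problems with the model surface here.
    return { ok: false, error: String(err && err.message ? err.message : err) };
  }
}

// Stub session standing in for the real model during local testing.
const stubSession = {
  async prompt(text) {
    return text.includes('scroll')
      ? '{"isAction": true, "actionType": "scroll"}'
      : '{"isAction": false, "actionType": null}';
  },
};
```

Returning a result object instead of throwing keeps the caller's error handling in one place, which matters when the model may be unavailable on a given machine.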

Challenges we ran into

The biggest challenge was intent classification: distinguishing "can you tell me" (a question) from "can you make" (an action). Initially the AI misclassified polite requests, which required a multi-layered fix with improved prompts, heuristic overrides, and fallback logic. AI response consistency was also tricky. The model sometimes returned malformed JSON or contradictory values (actionType set but isAction false), so we needed robust parsing and auto-correction. Element matching proved complex on dynamic websites. Simple text matching failed for generic terms like "pull request tab," which led us to implement AI-powered fuzzy matching with smart filtering to stay within token limits. Protecting the Lavio widget from its own modifications (text size, dark mode) required careful CSS selector exclusion and double-inversion techniques. Persistence management was challenging because users wanted customizations to reset on refresh rather than persist globally. We also struggled with floating-point precision in CSS calculations.
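
A layered classifier of the kind described above might look like the following sketch. The verb list, the filler words, the direction of the auto-correction (treating a set actionType as proof of an action), and all function names are assumptions made for illustration; the write-up does not specify them.

```javascript
// Verbs that signal a command even when the request is phrased politely.
// Illustrative list only, not the one Lavio ships with.
const ACTION_VERBS = ['click', 'scroll', 'type', 'open', 'make', 'hide', 'enable', 'increase'];

// Layer 1: pull the outermost {...} span out of the model reply and parse it.
function parseModelJson(raw) {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
}

// Layer 2: heuristic check on the utterance itself.
function looksLikeAction(utterance) {
  const words = utterance.toLowerCase().replace(/[^a-z\s]/g, ' ').split(/\s+/).filter(Boolean);
  // Skip polite lead-ins such as "can you please" before inspecting the verb.
  const fillers = new Set(['can', 'could', 'would', 'you', 'please']);
  const firstVerb = words.find((w) => !fillers.has(w));
  return ACTION_VERBS.includes(firstVerb);
}

// Combines the layers and repairs contradictory fields.
function classifyIntent(utterance, rawModelReply) {
  const parsed = parseModelJson(rawModelReply);
  if (parsed === null) {
    // Layer 3: fallback when the reply is unusable.
    return { isAction: looksLikeAction(utterance), actionType: null, source: 'fallback' };
  }
  const actionType =
    typeof parsed.actionType === 'string' && parsed.actionType ? parsed.actionType : null;
  let isAction = parsed.isAction === true;
  let source = 'model';
  // Auto-correct (assumed direction): an actionType implies an action.
  if (actionType && !isAction) {
    isAction = true;
    source = 'corrected';
  }
  // Heuristic override for polite commands the model labelled as questions.
  if (!isAction && looksLikeAction(utterance)) {
    isAction = true;
    source = 'heuristic';
  }
  return { isAction, actionType, source };
}
```

The `source` field records which layer decided the outcome, which makes misclassifications easier to trace during testing.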

Accomplishments that we're proud of

We're proud of achieving seamless voice-to-action functionality that works reliably across any website. The intent classification system reaches 95%+ accuracy through a three-layer approach combining AI, heuristics, and smart fallbacks. Our AI-powered element matching can find "pull request tab" even when the actual text is just "PRs." The page manipulation system is comprehensive yet simple to use: users can customize any site with short voice commands while Lavio's UI stays intact. We built a complete onboarding experience that introduces features progressively. The entire system requires zero configuration and works offline. Most importantly, everything runs on Chrome's native AI without external APIs or servers, making it fast, private, and free. We're especially proud of the accessibility impact: making the web usable through voice for those who need it most.
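
One way to keep the widget intact under a filter-based dark mode is double inversion: invert the whole page, then apply the same filter again to the elements that should keep their original colors. The sketch below only builds the stylesheet text. `buildDarkModeCss` and the `#lavio-widget` id are hypothetical, and the write-up does not say which filter values Lavio actually uses.

```javascript
// Builds a dark-mode stylesheet that inverts the page and then re-inverts
// the listed selectors so they keep their original colors.
function buildDarkModeCss(protectedSelectors) {
  const invert = 'filter: invert(1) hue-rotate(180deg);';
  const rules = [`html { ${invert} }`];
  if (protectedSelectors.length > 0) {
    // Applying the same filter a second time cancels the first one.
    rules.push(`${protectedSelectors.join(', ')} { ${invert} }`);
  }
  return rules.join('\n');
}

// '#lavio-widget' is a hypothetical id for the assistant's root element.
const css = buildDarkModeCss(['#lavio-widget', 'img', 'video']);
```

In a content script the resulting string could be placed in a `<style>` element appended to the document, and removing that element restores the original appearance.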

What we learned

We learned that building reliable AI systems requires multiple layers of defense, so we never trust a single classification method. User intent is nuanced; "can you X" can mean different things depending on the verb. We discovered that Chrome's built-in AI is powerful but requires careful prompt engineering and robust error handling. CSS filter tricks like double inversion solve otherwise complex problems cleanly. The Web Speech API is surprisingly reliable for recognition but needs proper state management for synthesis. Smart token management is crucial: filtering elements before AI processing keeps prompts within token limits and improves accuracy. Modular architecture proved invaluable; separating concerns into ElementDetector, ActionExecutor, and PageManipulator made debugging easier and enabled rapid iteration. Most importantly, real user testing reveals edge cases that no amount of planning can predict. Iterative development with continuous user feedback was essential.
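
The pre-filtering step can be illustrated with a small ranking function that trims candidate elements to a token budget before they are sent to the model. Everything here is an assumption made for the sketch: the four-characters-per-token estimate is a rule of thumb rather than the model's tokenizer, word overlap is just one possible ranking signal, and the element objects with `id` and `label` fields are hypothetical.

```javascript
// Rough token estimate: about four characters per token for English text.
// A rule of thumb, not the model's real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Scores a candidate by how many query words appear in its label.
function overlapScore(query, label) {
  const labelWords = new Set(label.toLowerCase().split(/\W+/).filter(Boolean));
  return query.toLowerCase().split(/\W+/).filter((w) => w && labelWords.has(w)).length;
}

// Keeps the best-scoring candidates that fit inside the token budget.
function selectCandidates(query, elements, tokenBudget) {
  const ranked = elements
    .filter((el) => el.label.trim().length > 0) // drop unlabeled elements
    .map((el) => ({ ...el, score: overlapScore(query, el.label) }))
    .sort((a, b) => b.score - a.score);
  const kept = [];
  let used = 0;
  for (const el of ranked) {
    const cost = estimateTokens(el.label);
    if (used + cost > tokenBudget) continue; // too long for the remaining budget
    kept.push(el);
    used += cost;
  }
  return kept;
}
```

Low-scoring elements are ranked rather than discarded outright, so a short label such as "PRs" can still reach the model for fuzzy matching when the budget allows.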

What's next for Lavio AI Assistant

We plan to expand Lavio with per-site preference memory, letting users save customizations for specific domains. Advanced page manipulation will include custom CSS injection, font changes, and layout modifications. We're exploring multi-modal capabilities that combine voice with screenshots for visual understanding. Workflow automation is next, for example: "Every time I visit this site, hide the sidebar and enable dark mode." We also want smart suggestions based on browsing patterns ("It's 10 PM, enable dark mode?"), integration with browser history for context-aware responses, expanded language support for speech recognition beyond English, and collaborative features that let users share voice command sets. Finally, we want to make Lavio cross-browser compatible (Firefox, Safari, Edge) and explore mobile support, bringing voice-powered browsing to smartphones and tablets.
