✨ Inspiration

The inspiration for this project came from my interest in voice-based automation and LLM-powered web browsing. I've always been fascinated by how natural language interfaces can simplify complex user interactions. I wanted to build something that could help users navigate the web, search, and interact with content, making browsing more accessible, intelligent, and effortless.

📚 What I Learned

Throughout the journey of this project, I learned:

How to work with Chrome Extensions, Manifest V3, and Web Speech API

How to integrate Groq (gemma2-9b-it) for natural language understanding

Better problem-solving and debugging techniques, especially with asynchronous APIs

How to design a user-friendly voice UI

Real-world practices for browser scripting, content scripts, and LLM prompt engineering
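As a concrete example of the Web Speech API piece above, here is a minimal sketch of capturing a spoken command in the popup. This is browser-only code; `webkitSpeechRecognition` is Chrome's prefixed constructor, and `finalTranscript` is a small hypothetical helper for pulling the finished text out of a result event.

```javascript
// Extract the final (non-interim) transcript from a SpeechRecognition result event.
function finalTranscript(event) {
  let text = "";
  for (const result of event.results) {
    if (result.isFinal) text += result[0].transcript;
  }
  return text.trim();
}

// Start listening and hand the recognized command to a callback.
// Sketch only: runs in a Chrome page/popup context, not in Node.
function startListening(onCommand) {
  const recognition = new webkitSpeechRecognition();
  recognition.lang = "en-US";
  recognition.interimResults = false; // only deliver finished phrases
  recognition.onresult = (event) => onCommand(finalTranscript(event));
  recognition.onerror = (event) => console.error("Speech error:", event.error);
  recognition.start();
}
```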

🛠️ How I Built It

I used the following tools and technologies to bring the project to life:

Frontend/UI: HTML, CSS, Vanilla JavaScript

LLM Integration: Groq via the Completions API

Browser APIs: Web Speech API for speech-to-text, Chrome Tabs/Scripting API

Voice Command Engine: Custom prompt-to-action mapping using Groq

Deployment: Chrome Extension (packed and loaded locally for now)
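The Groq integration above can be sketched as a plain `fetch` call against Groq's OpenAI-compatible chat completions endpoint. The system prompt wording and the `buildRequest` helper are assumptions for illustration; `apiKey` is a placeholder for your own Groq key.

```javascript
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Build the request payload for a spoken command. Separated from the network
// call so the shape is easy to inspect and test.
function buildRequest(command, apiKey) {
  return {
    url: GROQ_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gemma2-9b-it",
        messages: [
          { role: "system", content: "Map the user's spoken command to a JSON action." },
          { role: "user", content: command },
        ],
      }),
    },
  };
}

// Send the command and return the model's reply text.
async function askGroq(command, apiKey) {
  const { url, options } = buildRequest(command, apiKey);
  const res = await fetch(url, options);
  const data = await res.json();
  return data.choices[0].message.content;
}
```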

Development Process: I started by defining the core interaction: voice-to-command. I built a minimal UI to capture speech using the Web Speech API, then used Groq to analyze the command and decide whether to search, navigate to a URL, summarize the page, or answer a question about the current page. The final step was to execute these actions using browser scripting and dynamically injected scripts.
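The final "execute these actions" step might look like the dispatcher below. The action shape (`{ type, query, url }`) is an assumed contract with the LLM, not the project's exact schema; the `chrome.tabs` and `chrome.scripting` calls are standard Manifest V3 APIs.

```javascript
// Build a search-results URL from a free-text query.
function toSearchUrl(query) {
  return "https://www.google.com/search?q=" + encodeURIComponent(query);
}

// Route a parsed LLM action to the matching browser operation.
function dispatch(action) {
  switch (action.type) {
    case "search":
      chrome.tabs.create({ url: toSearchUrl(action.query) });
      break;
    case "navigate":
      chrome.tabs.create({ url: action.url });
      break;
    case "summarize":
      // Inject a function into the active tab to collect its text for summarization.
      chrome.tabs.query({ active: true, currentWindow: true }, ([tab]) => {
        chrome.scripting.executeScript({
          target: { tabId: tab.id },
          func: () => document.body.innerText,
        });
      });
      break;
    default:
      console.warn("Unknown action:", action);
  }
}
```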

🧗 Challenges I Faced

Some of the key challenges I encountered were:

Debugging asynchronous API calls and managing fetch requests with proper headers

Getting content scripts to load dynamically and ensuring they had access across all pages

Parsing and handling structured LLM outputs reliably (JSON format from Groq)

Extracting the full page's innerHTML safely and summarizing it efficiently

Designing a flexible prompt that handles open-ended user speech while keeping response format consistent

Navigating the restrictions of Manifest V3, especially regarding background scripts and permissions
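For the structured-output challenge above, one defensive pattern is to extract the first JSON object from the reply before parsing, since models sometimes wrap JSON in markdown fences or surrounding prose. The `parseAction` helper is a sketch, not the project's actual parser.

```javascript
// Return the parsed action object from an LLM reply, or null if no valid
// JSON object can be found. Tolerates ```json fences and leading/trailing prose.
function parseAction(reply) {
  const match = reply.match(/\{[\s\S]*\}/); // grab the first-to-last brace span
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null; // braces present but not valid JSON
  }
}
```

Returning `null` instead of throwing lets the caller fall back to re-prompting or a default action when the model drifts off format.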
