Note: in the demo video I sometimes forgot to say "test" or used the wrong prompt. I cut out the worst of those mistakes, but the video is still rough around the edges.
Inspiration
While building a workflow automation tool for repetitive browser tasks, I had a realization: if this system were connected to voice, it could fundamentally change how people with visual impairments or motor disabilities interact with the web.
Most current accessibility tools are rigid, requiring precise commands or complex setups. I wanted to build something that understood intent, not just keywords. Voxium was born from the idea that accessibility shouldn't be a niche feature—it should make technology more fluid for everyone.
What it does
Voxium is an AI-powered browser control system that translates natural speech into intelligent web actions. Instead of memorizing commands, users interact with their browser naturally.
Key capabilities include:
- Natural Language Navigation: "Open YouTube" or "Scroll to comments."
- Content Manipulation: "Replace 'disabled' with 'differently abled'."
- AI Insights: "Summarize this page" to get instant TL;DRs.
- Safety First: Confirmation prompts for potentially destructive actions.
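To make the idea concrete, here is a minimal sketch of what commands like those above could resolve to once parsed. The field names (`action`, `target`) and values are illustrative assumptions, not Voxium's actual schema.

```javascript
// Hypothetical utterance → intent pairs; the schema shown here is an
// assumption for illustration, not the project's real format.
const examples = [
  { utterance: "Open YouTube",
    intent: { action: "navigate", target: "https://www.youtube.com" } },
  { utterance: "Scroll to comments",
    intent: { action: "scrollTo", target: "comments" } },
  { utterance: "Summarize this page",
    intent: { action: "summarize", target: "page" } },
];
```

A structured shape like this is what lets a content script dispatch on `action` instead of pattern-matching raw speech.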
How we built it
Voxium is a Chrome Extension built with a focus on real-time interaction and robust architecture:
- Logic & UI: JavaScript, HTML, and CSS.
- Voice Engine: Web Speech API coupled with an Offscreen Document for continuous, background recognition.
- Brain: AI APIs for intent parsing, summarization, and cleaning misinterpreted speech.
- Automation: Custom Content Scripts for direct DOM interaction and element targeting.
- Development Workflow: AI-assisted ("vibe coded") with Claude and GitHub Copilot.
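The voice engine piece above can be sketched roughly as follows. This is a hedged approximation, assuming the standard Chrome Offscreen API and the Web Speech API's auto-restart pattern; the file name `offscreen.html` and the function names are hypothetical.

```javascript
// background.js (sketch): ensure an offscreen document exists to host
// speech recognition while the popup is closed. Returns false outside
// an extension context.
async function ensureOffscreen() {
  if (typeof chrome === "undefined" || !chrome.offscreen) return false;
  const has = await chrome.offscreen.hasDocument();
  if (!has) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html", // hypothetical file name
      reasons: ["USER_MEDIA"],
      justification: "Continuous speech recognition in the background",
    });
  }
  return true;
}

// offscreen.js (sketch): the Web Speech API stops itself periodically,
// so restart on "end" to keep listening continuously.
function startListening(RecognitionCtor, onTranscript) {
  const rec = new RecognitionCtor();
  rec.continuous = true;
  rec.interimResults = false;
  rec.onresult = (e) => {
    const last = e.results[e.results.length - 1];
    onTranscript(last[0].transcript.trim());
  };
  rec.onend = () => rec.start(); // auto-restart keeps the "ears" open
  rec.start();
  return rec;
}
```

The restart-on-`end` trick is the key to persistent listening, since browsers end recognition sessions after silence.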
Challenges we ran into
- Speech Misinterpretation: Recognition engines often trip over accents or background noise. I implemented a preprocessing layer to "sanitize" text before it hits the AI.
- Background Execution: Keeping the "ears" open while the popup was closed required navigating the complexities of Chrome’s extension lifecycle and offscreen documents.
- Dynamic DOM Targeting: Converting an abstract thought like "click the first result" into a reliable CSS selector across millions of different site structures required building adaptive querying logic.
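The adaptive querying idea can be sketched as a fallback chain: try progressively looser selectors until one matches. The selector lists and the `findTarget` name here are assumptions for illustration, not Voxium's actual logic.

```javascript
// Hedged sketch: map a natural-language target to a ranked list of
// candidate selectors, and return the first element that matches.
function findTarget(root, description) {
  const strategies = {
    "first result": [
      "a[href] h3",          // search-result-style title link
      "[role='listitem'] a", // generic list item link
      "main a[href]",        // loosest: any link in main content
    ],
  };
  for (const sel of strategies[description] ?? []) {
    const el = root.querySelector(sel);
    if (el) return el; // first strategy that matches wins
  }
  return null; // nothing matched; caller can ask the user to rephrase
}
```

Ordering selectors from most to least specific keeps behavior predictable on well-structured sites while still degrading gracefully on unusual ones.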
Accomplishments that we're proud of
- Built a fully functional AI automation engine in under 12 hours.
- Successfully moved beyond "keyword matching" to true intent parsing.
- Implemented persistent listening, allowing for a hands-free experience.
- Created a real, demo-ready system—not just a mockup.
What we learned
- AI Prompt Engineering: How to extract structured JSON intent from messy human speech.
- Extension Architecture: A deep dive into background scripts, permissions, and secure API key management.
- UX for Accessibility: The critical importance of input preprocessing; even a 1% error rate in speech can break the user's trust in automation.
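The "structured JSON from messy speech" lesson can be sketched as a two-part pattern: a prompt that constrains the model's output, plus defensive parsing of the reply. The prompt wording, field names, and `parseIntent` function are illustrative assumptions.

```javascript
// Sketch of constraining a model to a JSON intent (schema assumed).
const SYSTEM_PROMPT = `You control a web browser. Reply ONLY with JSON:
{"action": "navigate|scrollTo|replaceText|summarize", "target": "...", "value": "..."}`;

// Models often wrap JSON in markdown fences or add chatter, so
// extract the first {...} span defensively before parsing.
function parseIntent(modelReply) {
  const match = modelReply.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null; // malformed JSON: treat as "no intent recognized"
  }
}
```

Returning `null` instead of throwing lets the caller fall back to re-prompting or asking the user to repeat, which matters when even a small error rate can break trust.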
What's next for Voxium
- Precision Tuning: Improving intent accuracy through advanced prompt engineering.
- Custom Training: Allowing users to "teach" Voxium specific routines or nicknames for sites.
- Latency Reduction: Optimizing the speech-to-action pipeline for near-instant response times.
- Accessibility Presets: Creating profiles tailored to specific disability needs.
Built With
- cometapi
- gemini
- html
- javascript
- json
- minimax-m2.5