Xtension – Hands-Free Browser Extension with Cloned Voice!

Inspiration

I was inspired by the idea of a completely hands-free web. I wanted to build a browser extension that not only automates websites through voice commands but also replies intelligently in a cloned voice that feels personal. Most assistants sound generic, but by combining Gemini for the answers with ElevenLabs for the voice, I could build one that’s both smart and uniquely mine.

What it does

Xtension is a Chrome extension that:

  • Automates browsing with voice commands (scroll, click, navigate).
  • Listens for the keyword “Gemini” to activate AI responses.
  • Clones and uses a personal voice for replies via ElevenLabs TTS.
  • Lets me clone a new voice or paste an existing voice_id directly from a simple webpage.
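
The split between page-automation commands and the "Gemini" keyword can be sketched as a small transcript router. This is a minimal Python illustration, not the extension's code: the extension does this in JavaScript, and the command phrases (scroll up/down, click <text>, go to <site>) and the route_transcript name are hypothetical.

```python
import re
from typing import Optional

WAKE_WORD = "gemini"  # keyword that hands the request to the AI

def route_transcript(transcript: str) -> Optional[dict]:
    """Map a transcribed utterance to an action dict.

    The command vocabulary below is hypothetical; the real extension's phrases may differ.
    """
    text = transcript.strip().lower()
    # Anything addressed to the wake word goes to Gemini; drop the keyword itself.
    match = re.search(rf"\b{WAKE_WORD}\b[,:]?\s*(.*)", text)
    if match:
        return {"type": "ask", "query": match.group(1).strip()}
    if text in ("scroll down", "scroll up"):
        return {"type": "scroll", "direction": text.split()[1]}
    if text.startswith("click "):
        return {"type": "click", "target": text[len("click "):].strip()}
    if text.startswith("go to "):
        return {"type": "navigate", "url": text[len("go to "):].strip()}
    return None  # not a recognized command: ignore it
```

Because the wake-word check runs first, an utterance such as "Gemini, scroll down" is treated as a question rather than a scroll command; a real implementation might choose the opposite priority.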

How I built it

  • Extension (HTML/JS): Captures voice, handles browser automation, and manages the cloned voice_id.
  • Flask server (Python): Handles voice-sample uploads and voice cloning on the backend.
  • Assistant script (Python):
    • Records microphone input.
    • Transcribes with ElevenLabs STT (scribe_v1).
    • Responds through Gemini when triggered.
    • Speaks back using the active cloned voice.
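
The assistant script's record, transcribe, answer, speak cycle can be sketched with the external services passed in as plain functions. This is an assumption-laden outline: assistant_turn and its parameters are hypothetical, and the real calls to ElevenLabs STT (scribe_v1), Gemini, and ElevenLabs TTS are represented only by the transcribe, ask_gemini, and speak callables rather than by any specific SDK method.

```python
import re
from typing import Callable, Optional

def assistant_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical wrapper around ElevenLabs STT (scribe_v1)
    ask_gemini: Callable[[str], str],    # hypothetical wrapper around a Gemini model call
    speak: Callable[[str, str], None],   # hypothetical wrapper around ElevenLabs TTS
    voice_id: str,                       # the active cloned voice
    wake_word: str = "gemini",
) -> Optional[str]:
    """Run one transcribe -> (maybe) answer -> speak cycle on a recorded chunk."""
    transcript = transcribe(audio)
    match = re.search(rf"\b{re.escape(wake_word)}\b[,:]?\s*(.*)", transcript, re.IGNORECASE)
    if match is None:
        return None  # no wake word: leave the utterance to the browser-automation side
    reply = ask_gemini(match.group(1).strip())
    speak(reply, voice_id)  # reply is voiced with the cloned voice
    return reply
```

Injecting the three service calls keeps the control flow testable without network access; real implementations would wrap the ElevenLabs and Gemini clients, whose exact call signatures are not shown here.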

Challenges I ran into

  • Linking the Chrome extension with the Python assistant to share the voice_id.
  • Converting browser audio recordings (webm) into formats usable by ElevenLabs APIs.
  • Debugging API rate limits and errors in real time during voice cloning.
  • Making the assistant respond only when “Gemini” is invoked while still supporting web automation commands.
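
The WebM conversion step can be handled by shelling out to ffmpeg. The sketch below assumes ffmpeg is installed and that 16 kHz mono WAV is an acceptable target; that sample rate and the helper names webm_to_wav_command and convert_webm_to_wav are illustrative choices, not requirements taken from the ElevenLabs documentation or from the project's code.

```python
import shutil
import subprocess
from pathlib import Path

def webm_to_wav_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes a browser WebM recording to mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # WebM recording captured by the extension
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption, not an API requirement)
        str(dst),                 # output format inferred from the .wav extension
    ]

def convert_webm_to_wav(src: Path, dst: Path) -> Path:
    """Run the conversion; requires the ffmpeg binary on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(webm_to_wav_command(src, dst), check=True, capture_output=True)
    return dst
```

Keeping the command builder separate from the subprocess call makes the flags easy to inspect and test without ffmpeg present.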

Accomplishments that I’m proud of

  • Built a working hands-free extension that combines voice automation, AI answers, and cloned speech.
  • Successfully integrated browser-level JavaScript with backend Python services.
  • Made the assistant feel more human by giving it a customizable cloned voice.

What I learned

I learned how to:

  • Build and debug a full Chrome extension with persistent settings.
  • Work with real-time audio capture and APIs like ElevenLabs and Gemini.
  • Bridge a web interface and a Python process smoothly.
  • Think about accessibility: hands-free browsing and personalized voices make the web more usable.

What’s next for Xtension

  • Add support for continuous streaming transcription instead of fixed chunks.
  • Allow multiple voice profiles for different moods or contexts.
  • Expand automation features for more complex multi-step browsing tasks.
  • Package the project for release on the Chrome Web Store so anyone can use it.
