Inspiration

We wanted to build something that felt like a real AI assistant — not a chatbot, but an agent that could act in the world. The idea was simple: what if you could just talk to your computer and have it browse the web for you? No typing, no clicking — just intent.

What it does

Friday is a wake-word-activated voice assistant that controls a real browser using AI. Say "hey Friday" (or press the mic button), give it a task like "find me flights to Tokyo next weekend" or "order my usual from Amazon", and Friday takes over: it navigates, clicks, fills forms, and reports back when it's done.

Key features:

  • 🎙️ Wake-word detection — always listening for "hey Friday", hands-free
  • 🧠 Gemini-powered browser agent — uses vision + planning to navigate any website
  • 🖥️ Floating GUI — a persistent HUD showing the agent's live status and action log
  • 🔊 Text-to-speech feedback — Friday speaks its status back to you
  • ⏹️ Cancellable tasks — stop mid-flight with a single button press
  • 🔇 Mute toggle — silence voice output without stopping the agent
  • ⌨️ Browser shortcuts — back, refresh, new tab, home — all from the GUI

How we built it

The core stack:

  • browser-use — the browser automation framework that gives the agent eyes (vision) and hands (actions)
  • Google Gemini via langchain-google-genai — the LLM brain that plans and executes steps
  • Vosk / SpeechRecognition — offline wake-word detection + command recording
  • pyttsx3 / gTTS — text-to-speech for spoken feedback
  • Tkinter — the floating HUD GUI that shows live agent state and logs
  • pyautogui — keyboard shortcut passthrough to the browser window

The architecture runs two threads: a Tkinter GUI thread (main) and a background agent loop thread that owns an asyncio event loop. Voice input, wake detection, and all browser-use async tasks live in the agent thread. The GUI communicates via callbacks and a shared threading.Event for manual mic triggers. Tasks are wrapped as asyncio.Task objects so they can be cancelled cleanly mid-execution when the user hits Stop.
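The thread layout above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code; the names (AgentRuntime, submit, on_status) are hypothetical.

```python
import asyncio
import threading

class AgentRuntime:
    """Background thread that owns its own asyncio event loop.

    Tkinter stays on the main thread; all async browser-agent work
    is scheduled onto this loop from the GUI via thread-safe calls.
    """

    def __init__(self, on_status):
        self.on_status = on_status            # pre-registered GUI callback
        self.mic_trigger = threading.Event()  # set by the manual mic button
        self.loop = asyncio.new_event_loop()
        self.current_task = None
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # This thread owns the event loop for its whole lifetime.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        """Schedule an agent coroutine from the GUI thread."""
        def _start():
            # Wrap as a Task so Stop can cancel it mid-execution.
            self.current_task = self.loop.create_task(coro)
        self.loop.call_soon_threadsafe(_start)

    def cancel(self):
        """Stop button handler: safe to call from the GUI thread."""
        if self.current_task is not None:
            self.loop.call_soon_threadsafe(self.current_task.cancel)
```

The key rule is that the GUI thread never touches the loop directly; everything crosses the boundary through call_soon_threadsafe.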

Challenges

Threading + asyncio was the biggest challenge. Tkinter is not thread-safe, and browser-use requires asyncio. Bridging the two meant carefully isolating the event loop on the background thread and only touching GUI state through pre-registered callbacks.
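One common shape for that bridge, shown here as a sketch rather than the project's exact callback mechanism, is a thread-safe queue that the worker writes to and the Tkinter main loop drains with after():

```python
import queue
import tkinter as tk

# The agent thread never touches widgets directly; it only enqueues text.
ui_queue: "queue.Queue[str]" = queue.Queue()

def post_status(text: str) -> None:
    """Called from the background agent thread; Queue.put is thread-safe."""
    ui_queue.put(text)

def poll(root: tk.Tk, label: tk.Label) -> None:
    """Runs on the Tkinter main loop, draining pending status updates."""
    try:
        while True:
            label.config(text=ui_queue.get_nowait())
    except queue.Empty:
        pass
    root.after(100, poll, root, label)  # re-schedule every 100 ms
```

Whether the hand-off is a queue or pre-registered callbacks, the invariant is the same: widget state is only ever mutated on the main thread.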

Wake-word reliability in noisy environments required an explicit calibration step on startup — sampling ambient noise to set a dynamic energy threshold before listening begins.
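In SpeechRecognition this is what adjust_for_ambient_noise(source, duration=...) does; the underlying idea can be sketched independently of any audio hardware. The function names and the 1.5x margin below are illustrative assumptions, not the project's actual values:

```python
import math
import struct

def rms(chunk: bytes) -> float:
    """Root-mean-square energy of 16-bit little-endian PCM samples."""
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(ambient_chunks: list[bytes], margin: float = 1.5) -> float:
    """Dynamic energy threshold: a margin above the loudest ambient chunk.

    Sample ~1 s of room noise at startup, then treat only audio louder
    than this threshold as candidate speech for wake-word detection.
    """
    return margin * max(rms(c) for c in ambient_chunks)
```

A fixed threshold fails in both directions: too low and fan noise triggers the recognizer, too high and quiet speech is missed.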

Task cancellation needed to be graceful: cancelling an asyncio.Task mid-browser-action could leave the browser in a partial state, so the agent's max_failures and loop_detection_enabled settings help recover cleanly.
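A minimal sketch of graceful cancellation, assuming a hypothetical cleanup coroutine (e.g. closing tabs opened mid-task); the real recovery path runs through browser-use's own settings as described above:

```python
import asyncio

async def run_task(agent_coro, cleanup):
    """Run the agent, guaranteeing cleanup if the task is cancelled."""
    try:
        return await agent_coro
    except asyncio.CancelledError:
        await cleanup()  # restore the browser to a known state
        raise            # re-raise so the Task is still marked cancelled
```

Re-raising CancelledError matters: swallowing it would make the Stop button report success for a task that never finished.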

pyautogui failsafe had to be explicitly disabled (FAILSAFE = False) because the browser agent moves the cursor to corners of the screen during navigation — which would otherwise abort the process.

What we learned

  • How to architect a hybrid sync/async/GUI app without deadlocks
  • The power of browser-use's vision + planning pipeline for real-world web automation
  • How close we are to truly ambient, always-on AI agents that live alongside your workflow

Built With

  • python
  • browser-use
  • gemini
  • langchain
  • vosk
  • pyttsx3
  • tkinter
  • pyautogui
