Inspiration

Our Macs do real work: coding, writing, designing, communicating. But the moment we step away from the desk, that capability goes dark. You're at the gym, in a meeting room, or on a walk, and your Mac just... waits.

What if your Mac were reachable through every channel you already use: text, voice, agent ecosystems, not just one? What if a hands-free workflow weren't a special mode, but the default?

That became MacBuddy.

What it does

MacBuddy turns your iPhone into a remote for your Mac. You can:

  • Text it via iMessage: "open any file in your folder and send it to me via Gmail"
  • Call it via FaceTime: speak naturally, hear it speak back
  • Talk to it through ASI:One: it's a registered agent on Fetch.ai's Agentverse marketplace
  • Watch it execute multi-step GUI tasks in real time on a live dashboard
  • Get files emailed from your Mac or Google Drive without touching a keyboard

Every reply comes back as text, voice through FaceTime, or audio through your Mac's speakers; the channel matches the input.

How we built it

The system is a five-service Python stack orchestrated by one launcher script:

  • Router (FastAPI): the central hub. It classifies intent with a local Gemma 3 4B model (running via Ollama), falling back to regex keyword matching, then dispatches to the right lane. It also exposes a WebSocket at /events for the live UI.

  • FaceTime Lane: multi-step GUI control via Claude Sonnet 4.6 Computer Use. It takes screenshots, plans, executes via pyautogui, and loops until the task completes, with up to 12 reasoning iterations per request.

  • Orbit Lane: structured Google Workspace operations, including Drive search/read, Gmail send/list/search, Calendar events, file delivery, and screenshot capture plus email.

  • Voice Daemon: real-time STT and TTS over FaceTime audio. It captures audio via sounddevice, transcribes with Groq Whisper-large-v3-turbo, and synthesizes replies with ElevenLabs Turbo v2.5.

  • iMessage Bridge: polls chat.db every 2 seconds, deduplicates messages, sends replies via AppleScript, and optionally speaks replies through the Mac's speakers via ElevenLabs.

  • Agentverse Wrapper: a separate uAgent process built on Fetch.ai's uagents library with the standard Chat Protocol. It exposes the system on Agentverse behind sender and action allowlists, so ASI:One queries route through the same router but with hard security guardrails. Innovation Lab badge included.

The UI is a frameless Tauri 2 native window showing "what you said → what MacBuddy did".

Challenges we ran into

  • macOS Tahoe broke BlackHole audio loopback. Direct play+record returned 0.000 RMS even on a fresh install. We fell back to using the MacBook's built-in mic to pick up phone speaker audio.
  • Anthropic rate limits (30k input tokens/min on Tier 1) were hit fast when sending multiple commands in quick succession, since every Claude iteration includes a screenshot. We capped iterations at 12 and added pacing discipline.
  • Agentverse mailbox auth refused to acquire a token despite valid keys. We pivoted to direct endpoint mode via ngrok, which worked cleanly.
  • Zombie bridge processes from prior sessions caused triple message processing. We added rowid deduplication plus a pkill cleanup step in the launcher.
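The deduplication fix from the last bullet boils down to tracking the highest ROWID already handled, so a restarted (or zombie) bridge never re-delivers old messages. A minimal sketch with hypothetical names, using a read-only connection to chat.db:

```python
import sqlite3

def poll_new_messages(db_path: str, last_rowid: int):
    """Return inbound messages newer than last_rowid, plus the new watermark.

    Illustrative sketch: the real bridge polls ~/Library/Messages/chat.db
    every 2 seconds; opening it read-only avoids touching Messages' data.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT ROWID, text FROM message "
            "WHERE ROWID > ? AND is_from_me = 0 ORDER BY ROWID",
            (last_rowid,),
        ).fetchall()
    finally:
        conn.close()
    if rows:
        last_rowid = rows[-1][0]  # advance the dedup watermark
    return rows, last_rowid
```

Because the watermark only moves forward, polling the same rows twice (or from two processes sharing the watermark) can't produce duplicate replies.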

Accomplishments we're proud of

  • Three input channels (text, voice, and agent-to-agent) routed through one classifier, with no duplication of execution logic.
  • A two-layer security model (sender allowlist + action allowlist) that makes public Agentverse exposure genuinely safe.
  • A working ngrok-tunneled uAgent: registered, active, and discoverable by ASI:One on Agentverse.
  • MacBuddy is, to our knowledge, the first AI agent on Agentverse that can remotely control your Mac through the ASI:One platform.
  • All voice replies use ElevenLabs Turbo v2.5; the Mac sounds like it has a personality.
  • A genuinely multi-step GUI controller powered by Claude Sonnet 4.6: not shortcuts, but real screenshot-and-act loops.
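At its core, the screenshot-and-act loop is a bounded plan-execute cycle. A stripped-down sketch with illustrative callback names; in the real system, plan_next_action would call Claude's Computer Use API and execute would drive pyautogui:

```python
MAX_ITERATIONS = 12  # same cap the FaceTime Lane uses

def run_task(task, take_screenshot, plan_next_action, execute):
    """Bounded screenshot -> plan -> act loop (names are illustrative).

    take_screenshot()            -> current screen image
    plan_next_action(task, shot) -> next action dict, or {"type": "done"}
                                    (the real system asks Claude here)
    execute(action)              -> performs the action, e.g. via pyautogui
    """
    for _ in range(MAX_ITERATIONS):
        shot = take_screenshot()
        action = plan_next_action(task, shot)
        if action.get("type") == "done":
            return True   # model says the task is complete
        execute(action)
    return False          # iteration cap reached first
```

Re-screenshotting before every step is what lets the model recover from minor failures: it plans against what the screen actually shows, not what it expected.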

What we learned

  • Agentic computer use is real. Claude Sonnet 4.6 reliably opens apps, navigates, types, and recovers from minor failures across multi-step tasks. The bottleneck isn't intelligence anymore; it's rate limits and audio I/O.
  • Local + cloud is the right shape for routing. Gemma running locally for classification keeps the high-frequency routing step fast and free, while the cloud models handle the heavy lifting.

Built With

  • anthropic
  • asi-one
  • claude-sonnet-4-6
  • elevenlabs
  • elevenlabs-turbo-v2.5
  • fastapi
  • fetch.ai-agentverse
  • gemma-3-4b
  • javascript
  • pyautogui
  • python
  • rust
  • sounddevice
  • tauri
  • uagents