Inspiration
Our Macs do real work (coding, writing, designing, communicating), but the moment we step away from the desk, that capability goes dark. You're at the gym, in a meeting room, on a walk, and your Mac just... waits.
What if your Mac were reachable through every channel you already use (text, voice, agent ecosystems), not just one? What if a hands-free workflow wasn't a special mode, but the default?
That became MacBuddy.
What it does
MacBuddy turns your iPhone into a remote for your Mac. You can:
- Text it via iMessage: "open any file in your folder and send it to me via Gmail"
- Call it via FaceTime: speak naturally, hear it speak back
- Talk to it through ASI:One: it's a registered agent on Fetch.ai's Agentverse marketplace
- Watch it execute multi-step GUI tasks in real time on a live dashboard
- Get files emailed from your Mac or Google Drive without touching a keyboard
Every reply comes back as text, voice through FaceTime, or audio through your Mac's speakers; the reply channel always matches the input channel.
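The "reply channel matches input channel" rule can be sketched as a simple dispatch table. All names here (`reply_imessage`, `CHANNEL_REPLY`, and the treatment of ASI:One queries as text replies) are illustrative assumptions, not the real code:

```python
# Hypothetical sketch: each inbound channel maps to a sender on the same
# channel, so a texted request is answered by text, a call by voice, etc.

def reply_imessage(text: str):
    return ("imessage", text)        # stand-in for the AppleScript sender

def reply_facetime(text: str):
    return ("facetime_tts", text)    # stand-in for TTS over the call audio

CHANNEL_REPLY = {
    "imessage": reply_imessage,
    "facetime": reply_facetime,
    "asi_one": reply_imessage,       # assumption: agent queries answered as text
}

def dispatch_reply(channel: str, text: str):
    # Unknown channels fall back to a plain text reply.
    return CHANNEL_REPLY.get(channel, reply_imessage)(text)
```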
How we built it
The system is a five-service Python stack orchestrated by one launcher script:
- Router (FastAPI): the central hub. Classifies intent using a local Gemma 3 4B model (running via Ollama) with a regex keyword fallback, then dispatches to the right lane. Also exposes a WebSocket at `/events` for the live UI.
- FaceTime Lane: multi-step GUI control via Claude Sonnet 4.6 Computer Use. Takes screenshots, plans, executes via pyautogui, and loops until the task completes, supporting up to 12 reasoning iterations per request.
- Orbit Lane: structured Google Workspace operations: Drive search/read, Gmail send/list/search, Calendar events, file delivery, and screenshot capture + email.
- Voice Daemon: real-time STT + TTS over FaceTime audio. Captures audio via sounddevice, transcribes with Groq Whisper-large-v3-turbo, and synthesizes replies with ElevenLabs Turbo v2.5.
- iMessage Bridge: polls `chat.db` every 2 seconds, deduplicates messages, sends replies via AppleScript, and optionally speaks replies through the Mac's speakers via ElevenLabs.
- Agentverse Wrapper: a separate uAgent process using Fetch.ai's `uagents` library with the standard Chat Protocol. Exposes the system on Agentverse with sender and action allowlists, so ASI:One queries route through the same router but behind hard security guardrails. Innovation Lab badge included.
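The router's two-tier classification (local LLM first, regex keywords as a fallback) can be sketched roughly as follows. The lane names, keywords, and `llm` callable are all illustrative assumptions; the real router calls Gemma via Ollama and has its own lane taxonomy:

```python
import re

# Hypothetical lane keywords for the regex fallback (not the real config).
LANE_KEYWORDS = {
    "orbit": re.compile(r"\b(email|gmail|drive|calendar|send|file)\b", re.I),
    "facetime": re.compile(r"\b(open|click|type|screenshot|app)\b", re.I),
}

def classify(text: str, llm=None) -> str:
    """Pick a lane for a message.

    llm is an optional callable standing in for the local Gemma call;
    if it fails or returns an unknown lane, fall back to keyword matching.
    """
    if llm is not None:
        try:
            lane = llm(text).strip().lower()
            if lane in LANE_KEYWORDS or lane == "chat":
                return lane
        except Exception:
            pass  # LLM unavailable: fall through to the regex fallback
    for lane, pattern in LANE_KEYWORDS.items():
        if pattern.search(text):
            return lane
    return "chat"  # default conversational lane
```

The point of the fallback is that routing keeps working even when Ollama is down or slow, at the cost of cruder matches.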
The UI is a frameless Tauri 2 native window showing "what you said → what MacBuddy did".
Challenges we ran into
- macOS Tahoe broke BlackHole audio loopback. Direct play+record returned 0.000 RMS even on a fresh install. We fell back to using the MacBook's built-in mic to pick up phone speaker audio.
- Anthropic rate limits (30k input tokens/min on Tier 1) were hit fast when sending multiple commands in quick succession, since every Claude iteration includes a screenshot. We capped iterations at 12 and added pacing discipline.
- Agentverse mailbox auth refused to acquire a token despite valid keys. We pivoted to direct endpoint mode via ngrok, which worked cleanly.
- Zombie bridge processes from prior sessions caused triple message processing. We added rowid deduplication plus a `pkill` cleanup step in the launcher.
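The rowid-deduplication fix can be sketched like this: remember the highest `chat.db` ROWID already handled, so a restarted (or zombie) bridge never reprocesses the same message. The `message` table name mirrors macOS's schema, but the simplified query and function name are assumptions:

```python
import sqlite3

def poll_new_messages(db_path: str, last_rowid: int):
    """Return (rows, new_last_rowid) for messages newer than last_rowid.

    Persisting last_rowid between polls is what makes a second bridge
    process (or a restart) skip everything already handled.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT ROWID, text FROM message WHERE ROWID > ? ORDER BY ROWID",
            (last_rowid,),
        ).fetchall()
    finally:
        conn.close()
    new_last = rows[-1][0] if rows else last_rowid
    return rows, new_last
```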
Accomplishments we're proud of
- Three input channels (text, voice, and agent-to-agent) routed through one classifier, with no duplicated execution logic.
- Two-layer security model (sender allowlist + action allowlist) makes public Agentverse exposure genuinely safe.
- A working ngrok-tunneled uAgent registered, active, and ASI:One discoverable on Agentverse.
- MacBuddy is the first AI agent on Agentverse that can remotely control your Mac through the ASI:One platform.
- All voice replies use ElevenLabs Turbo v2.5; the Mac sounds like it has a personality.
- A genuinely multi-step GUI controller powered by Claude Sonnet 4.6 not shortcuts, real screenshot-and-act loops.
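The screenshot-and-act loop with its 12-iteration cap reduces to a small control structure. Here is a hedged sketch with the screenshot, planning, and execution steps injected as callables, since the real versions wrap pyautogui and the Anthropic Computer Use API; all names are illustrative:

```python
MAX_ITERATIONS = 12  # cap from the write-up, kept low to respect rate limits

def run_task(task, take_screenshot, plan_step, execute) -> bool:
    """Loop screenshot -> plan -> act until done or the cap is hit.

    plan_step stands in for a Claude Computer Use call that sees the
    latest screenshot; execute stands in for pyautogui actions.
    Returns True if the model reported completion, False if capped.
    """
    for _ in range(MAX_ITERATIONS):
        shot = take_screenshot()
        action = plan_step(task, shot)
        if action.get("done"):
            return True
        execute(action)
    return False  # gave up after the iteration cap
```

Capping the loop (rather than retrying indefinitely) is what kept each request's token spend bounded, since every iteration ships a fresh screenshot to the model.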
What we learned
- Agentic computer use is real. Claude Sonnet 4.6 reliably opens apps, navigates, types, and recovers from minor failures across multi-step tasks. The bottleneck isn't intelligence anymore; it's rate limits and audio I/O.
- Local + cloud is the right shape for routing. Gemma running locally for classification keeps the h
Built With
- anthropic
- asi-one
- claude-sonnet-4-6
- elevenlabs
- elevenlabs-turbo-v2.5
- fastapi
- fetch.ai-agentverse
- gemma-3-4b
- javascript
- pyautogui
- python
- rust
- sounddevice
- tauri
- uagents