Inspiration

We wear smart glasses every day, but the AI assistants available for them are basic — they're either locked to a single cloud provider or can't access any real tools. We wanted an AI on our glasses that could do everything our desktop AI agent can do: search the web, run code, manage emails, query knowledge bases — all hands-free through voice. OpenClaw already gives us a powerful self-hosted AI gateway with 20+ channel integrations (Discord, Telegram, Slack), so we asked: why can't smart glasses be just another channel?

What it does

Edith is a voice AI assistant for smart glasses. Say "Hey Edith" and ask anything — she responds through your glasses speakers using your own OpenClaw agent, with full access to your tools, memory, and integrations.

  • Voice conversations — Ask questions, get answers spoken back through your glasses
  • Vision — "What am I looking at?" captures a photo from the glasses camera and sends it to your AI agent
  • Knowledge search — Powered by Senso.ai, query your document knowledge base hands-free
  • Code generation — Powered by Augment Code's Auggie CLI, build and debug code by voice
  • Secure multi-user — Powered by Unkey, per-user API keys authenticate every connection to the relay, with rate limiting and usage analytics (see the sketch after this list)
  • Full agent access — Anything your OpenClaw agent can do (browse the web, execute code, manage files, send messages), Edith can do through voice
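
The Unkey piece is small enough to show. Below is a minimal sketch of how the relay could verify a key before accepting a client, using the verifyKey helper from Unkey's @unkey/api package. The x-api-key header name and the EDITH_UNKEY_API_ID variable are placeholder conventions, not anything Unkey prescribes.

```typescript
// Minimal sketch: verify an incoming relay client with Unkey before upgrading.
// The header name and env var are placeholders; only verifyKey is Unkey's API.
import { verifyKey } from "@unkey/api";

export async function isAuthorized(
  headers: Record<string, string | undefined>,
): Promise<boolean> {
  const key = headers["x-api-key"];
  if (!key) return false;

  const { result, error } = await verifyKey({
    key,
    apiId: process.env.EDITH_UNKEY_API_ID!, // which Unkey API namespace to check
  });

  // Fail closed on SDK or network errors instead of letting the socket through.
  if (error || !result) return false;

  // result.valid already reflects revocation and Unkey-side rate limits,
  // which is where the per-user rate limiting above comes from.
  return result.valid;
}
```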

How we built it

The architecture has two components connected by WebSocket:

  1. Edith App (hosted on DigitalOcean) — A Bun/TypeScript server using the Mentra SDK. Handles wake word detection ("Hey Edith"), speech transcription, camera capture, and text-to-speech output. Acts as a WebSocket relay between the glasses and OpenClaw.

  2. OpenClaw Plugin (runs on user's machine) — A channel plugin that connects outbound to the Edith app via WebSocket, just like Discord and Telegram plugins connect outbound to their platforms. When a message arrives, it dispatches through OpenClaw's full agent pipeline and sends the response back.

The key insight: the plugin connects outbound to the cloud app, so users never need to expose ports or set up tunnels. We integrated Senso.ai for knowledge search, Unkey for API key verification on the WebSocket relay, and Augment Code's Auggie CLI for hands-free code generation — all as OpenClaw skills that any user can install.
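
To make the zero-config claim concrete, here is a trimmed sketch of the plugin side of that handshake. The relay URL, header name, and message shapes are illustrative placeholders, not the published protocol; the point is that the only connection is an outbound one.

```typescript
// Sketch of the OpenClaw plugin dialing *out* to the Edith app.
import WebSocket from "ws";

const RELAY_URL = "wss://edith.example.com/openclaw-ws"; // placeholder URL

export function connectToRelay(
  apiKey: string,
  linkCode: string,
  dispatch: (text: string) => Promise<string>, // hands the message to the agent pipeline
) {
  const ws = new WebSocket(RELAY_URL, { headers: { "x-api-key": apiKey } });

  ws.on("open", () => {
    // Tell the relay which glasses session this plugin serves.
    ws.send(JSON.stringify({ type: "link", code: linkCode }));
  });

  ws.on("message", async (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== "user_message") return;
    const reply = await dispatch(msg.text); // full agent pipeline runs here
    ws.send(JSON.stringify({ type: "agent_reply", id: msg.id, text: reply }));
  });

  // Outbound-only networking: if the socket drops, just dial out again.
  // No inbound ports, tunnels, or firewall rules on the user's machine.
  ws.on("close", () => setTimeout(() => connectToRelay(apiKey, linkCode, dispatch), 3_000));
}
```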

Challenges we ran into

  • WebSocket upgrade conflicts — The Mentra SDK has its own WebSocket handlers that were intercepting our /openclaw-ws upgrade requests and closing them. We had to prepend our upgrade listener ahead of the SDK's to stop this (see the sketch after this list).
  • Link code race condition — The OpenClaw plugin often connects before the glasses session starts, so the link code isn't registered yet. We solved this by auto-registering any link code on first connection rather than validating against a pre-registered set.
  • OpenClaw plugin API compatibility — Functions like resolveStorePath and dispatchReplyFromConfig had different signatures than we expected; getting the dispatch pipeline working took a careful read of the OpenClaw source.
  • Photo capture — The Mentra SDK routes camera requests through the phone app via BLE, so photos fail when the phone is asleep. Added defensive error handling so vision queries gracefully fall back to text-only.
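
For the upgrade conflict, the fix fits in a few lines. This is a minimal sketch assuming the SDK exposes its underlying Node http.Server (the sdkHttpServer parameter below is a placeholder for however you reach it) and that the SDK's own listener leaves already-handled sockets alone:

```typescript
// Claim /openclaw-ws upgrades *before* the Mentra SDK's own listeners run.
// sdkHttpServer is a placeholder for the SDK's underlying Node http.Server.
import type { Server } from "node:http";
import { WebSocketServer } from "ws";

export function attachRelay(sdkHttpServer: Server): WebSocketServer {
  const wss = new WebSocketServer({ noServer: true });

  // prependListener puts our handler ahead of any 'upgrade' listeners the
  // SDK registered, so our handler sees the request first.
  sdkHttpServer.prependListener("upgrade", (req, socket, head) => {
    const { pathname } = new URL(req.url ?? "/", "http://localhost");
    if (pathname !== "/openclaw-ws") return; // not ours; leave it to the SDK

    wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
  });

  return wss;
}
```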

Accomplishments that we're proud of

  • It actually works end-to-end — You put on glasses, say "Hey Edith, what's up?" and your OpenClaw agent responds through the speakers. The full pipeline: voice → transcription → WebSocket → OpenClaw agent → response → TTS → glasses.
  • Published as a real product — The plugin is on npm (openclaw-edith-glasses), the setup skill is on ClawHub and Shipables, and the app is on the Mentra app store. Anyone with OpenClaw and Mentra glasses can use it today.
  • Zero-config networking — No port forwarding, no tunnels, no firewall rules. The plugin connects outbound like Discord and Telegram bots do.
  • Four sponsor integrations — Each one solves a real problem in the product, not just a checkbox demo.

What we learned

  • OpenClaw's channel plugin architecture is remarkably well-designed — once you understand the ChannelPlugin interface and the dispatch pipeline, adding a new channel is mostly config and plumbing (a sketch of the shape we mean follows this list).
  • The Agent Skills open standard (SKILL.md) is powerful for distributing AI capabilities. A skill is just a markdown file, but it gives the LLM everything it needs to use a new tool autonomously.
  • Smart glasses are a surprisingly natural interface for AI — the always-on microphone and camera make them the perfect "ambient AI" device. The hard part isn't the AI, it's the plumbing between the glasses and your agent.
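
To make the "config and plumbing" point concrete, this is roughly the surface a new channel has to fill in. The names below are illustrative, not OpenClaw's actual API:

```typescript
// Illustrative only: a hypothetical shape for a channel plugin. Method and
// type names are ours, not OpenClaw's actual ChannelPlugin interface.
interface IncomingMessage {
  userId: string;
  text: string;
}

interface ChannelPlugin {
  name: string;                                            // e.g. "edith-glasses"
  start(): Promise<void>;                                  // open the outbound connection
  stop(): Promise<void>;                                   // tear it down cleanly
  onMessage(handler: (msg: IncomingMessage) => Promise<string>): void;
  sendReply(userId: string, text: string): Promise<void>;  // back out to the transport
}
```

Once the transport is mapped onto a surface like this, the agent pipeline, memory, and skills all come from the gateway for free.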

What's next for Edith

  • Streaming responses — Currently Edith waits for the full response before speaking. We want to stream TTS as the agent generates, for near-instant responses.
  • Proactive notifications — Have Edith tap your glasses and speak when something important happens (email from your boss, calendar reminder, build failure).
  • Multi-modal context — Continuous camera feed analysis, not just single photos. "Edith, keep an eye on this and tell me when it changes."
  • More glasses platforms — Even Realities G1, Vuzix Z100, and future AR devices.
