Inspiration
Smart glasses give you eyes and ears everywhere, but no hands. You can see and hear the world through Meta Ray-Bans, yet when the AI says "I'll look that up" or "let me check that website," it has no way to actually do it. Voice assistants today can talk, but they can't act. We wanted to bridge that gap: a fully on-device AI assistant that sees what you see through your glasses, hears what you say, and can take real actions — browsing the web, looking up information, executing tasks — all running locally on your phone without any cloud server.
What it does
AndroidClaw turns a Pixel 9 phone paired with Meta Ray-Ban smart glasses into a hands-free AI agent that can both perceive and act:
- See — The glasses stream live video to the phone via Meta's Wearables DAT SDK. Frames are sent at ~1 fps to Gemini Live for visual understanding (see the frame-forwarding sketch after this list).
- Listen & Speak — Bidirectional voice: the user speaks through the glasses' mic and Gemini responds with spoken audio, with echo cancellation and noise suppression for natural conversation.
- Act — When Gemini decides it needs to take action (browse a website, look something up, execute a task), it issues a tool call that routes to the OpenClaw gateway running in an on-device Linux VM. OpenClaw launches a real Chromium browser, navigates pages, extracts information, and returns results — all happening locally on the phone.
- Report back — The tool result feeds back into Gemini, which speaks the answer to the user through the glasses.
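
As a sketch of the perception path, the frame forwarding can be throttled with a sampled Flow so that Gemini Live sees roughly one frame per second. `LiveVideoSink` and `FrameForwarder` here are hypothetical stand-ins for the app's actual `GeminiLiveSession` plumbing, not its real interface:

```kotlin
import kotlinx.coroutines.FlowPreview
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.flow.sample

// Hypothetical seam onto the app's GeminiLiveSession, which speaks
// Gemini's BidiGenerateContent WebSocket protocol.
interface LiveVideoSink {
    suspend fun sendVideoFrame(jpegBytes: ByteArray)
}

class FrameForwarder(private val sink: LiveVideoSink) {
    // Downsample the glasses' video stream to ~1 fps: scene understanding
    // doesn't need full frame rate, and this keeps bandwidth and
    // realtime-input usage low.
    @OptIn(FlowPreview::class)
    suspend fun forward(frames: Flow<ByteArray>) {
        frames
            .sample(periodMillis = 1_000) // keep at most one frame per second
            .collect { jpegBytes -> sink.sendVideoFrame(jpegBytes) }
    }
}
```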
How we built it
The system has two Android repos working together:
meta-wearables-dat-android — A fork of Meta's Wearables DAT SDK sample app, extended with:
- `GeminiLiveSession` — WebSocket client for Gemini's `BidiGenerateContent` API, with tool declarations for action execution
- `GeminiLiveViewModel` — manages the full lifecycle: mic capture (16 kHz PCM), speaker playback (24 kHz PCM), video frame forwarding, tool call routing, and the conversation transcript
- `OpenClawClient` — HTTP client that forwards tool calls to the OpenClaw gateway via its OpenAI-compatible `/v1/chat/completions` endpoint (sketched below)
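
To make the tool call routing concrete, here is a minimal sketch of the `OpenClawClient` idea using OkHttp. The gateway address, the `openclaw` model name, and the flattened request/response shapes are illustrative assumptions, not the app's exact wire format:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

// Simplified sketch: forwards a Gemini tool call to the OpenClaw gateway
// in the VM via its OpenAI-compatible chat completions endpoint.
class OpenClawClient(
    private val baseUrl: String = "http://192.168.0.2:18790", // VM bridge address (assumed)
    private val http: OkHttpClient = OkHttpClient(),
) {
    fun execute(task: String): String {
        val body = JSONObject()
            .put("model", "openclaw") // assumed model name
            .put("messages", JSONArray().put(
                JSONObject().put("role", "user").put("content", task)))
            .toString()
            .toRequestBody("application/json".toMediaType())

        val request = Request.Builder()
            .url("$baseUrl/v1/chat/completions")
            .post(body)
            .build()

        // Blocking for clarity; the real client runs on an IO dispatcher.
        http.newCall(request).execute().use { response ->
            val json = JSONObject(response.body!!.string())
            return json.getJSONArray("choices")
                .getJSONObject(0)
                .getJSONObject("message")
                .getString("content")
        }
    }
}
```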
androidclaw — The on-device AI gateway:
- A Debian arm64 Linux VM running under Android's pKVM hypervisor (AVF)
- Node.js 22, OpenClaw 2026.2.9, and Playwright with Chromium — all running natively inside the VM
- The gateway exposes `ws://192.168.0.2:18790` over the VM bridge network (a reachability sketch follows this list)
- The browser tool works end-to-end: Playwright launches headless Chromium, navigates real web pages, and returns structured results
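
Because the VM and its bridge network boot independently of the app, a reachability probe against the gateway address helps avoid routing tool calls into a dead socket. A minimal sketch using OkHttp's WebSocket client; whether the gateway accepts a bare upgrade at the root path is an assumption here:

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response
import okhttp3.WebSocket
import okhttp3.WebSocketListener

// Quick reachability probe for the gateway inside the VM; the bridge
// network comes up asynchronously with the VM boot.
fun probeGateway(onResult: (Boolean) -> Unit) {
    val client = OkHttpClient()
    val request = Request.Builder().url("ws://192.168.0.2:18790").build()
    client.newWebSocket(request, object : WebSocketListener() {
        override fun onOpen(webSocket: WebSocket, response: Response) {
            onResult(true)
            webSocket.close(1000, "probe") // normal closure after the handshake
        }
        override fun onFailure(webSocket: WebSocket, t: Throwable, response: Response?) {
            onResult(false) // VM not booted yet, or bridge not up
        }
    })
}
```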
We initially tried embedding Node.js directly in the Android app using nodejs-mobile (Phase 1–2), but hit a wall: Playwright and Chromium can't run inside the constrained nodejs-mobile sandbox. The pivot to AVF gave us a full Linux environment where everything runs upstream, unmodified.
Challenges we ran into
- nodejs-mobile limitations — Node 18 with small-ICU meant no Unicode property escapes, no ES2023 array methods, no `Intl`, no `File` global. We wrote 24 regex replacements, polyfills, and post-build transforms before realizing the browser tool was fundamentally impossible in this environment.
- Browser session stale state — Playwright's singleton browser connection would go stale after failed requests, causing all subsequent browser calls to time out. We traced it to corrupted user-data directories and built a cleanup routine.
- Speaker/mic race condition — Concurrent `AudioTrack.write()` calls from multiple coroutines crashed the native audio layer. Fixed by funneling all playback through a single dedicated coroutine draining a `Channel` (see the sketch after this list).
- SSH relay fragility — The `nc`-based relay from the Mac to the VM handles only one connection at a time, and stale processes silently block the port. Hours of "why can't I connect" debugging turned out to be zombie `nc` processes.
- Android process freezing — Android 14+ aggressively freezes background processes even with foreground services and wake locks. This required a battery-optimization exemption and, on AOSP, disabling the app freezer entirely.
- Gemini model latency — The initial model (Gemini 2.5 Flash) spent 5 minutes "thinking" before deciding to use the browser tool, causing app-side timeouts. Switching to Gemini 3 Pro Preview resolved this.
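
The speaker/mic fix is worth showing in shape, since the race is easy to reintroduce. A minimal sketch of the pattern (names are illustrative): producers enqueue PCM buffers into a Channel, and one dedicated coroutine is the only caller of `AudioTrack.write()`:

```kotlin
import android.media.AudioTrack
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch

// All playback funnels through one coroutine so AudioTrack.write() is
// never called concurrently; concurrent writes crashed the native layer.
class SerializedPlayback(
    scope: CoroutineScope,
    private val track: AudioTrack, // 24 kHz PCM output, configured elsewhere
) {
    private val buffers = Channel<ByteArray>(capacity = Channel.UNLIMITED)

    init {
        scope.launch {
            for (pcm in buffers) {            // drains buffers in arrival order
                track.write(pcm, 0, pcm.size) // sole writer: no races
            }
        }
    }

    // Producers (e.g. Gemini's audio chunks) enqueue instead of writing.
    fun enqueue(pcm: ByteArray) {
        buffers.trySend(pcm)
    }
}
```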
Accomplishments that we're proud of
- Full browser tool on a phone — A real headless Chromium controlled by Playwright, running inside a Linux VM on Android, executing web browsing tasks triggered by voice through smart glasses. The entire loop — glasses → voice → Gemini → tool call → OpenClaw → Chromium → result → voice response — works end-to-end on-device.
- Zero cloud infrastructure — No servers, no proxy, no relay. The AI gateway, browser, and model API calls all originate from the phone itself.
- Upstream unmodified — After the AVF pivot, OpenClaw runs as-is from npm. No patches, no stubs, no post-build transforms: `sudo npm install -g openclaw` just works.
- Natural voice conversation with actions — The Gemini Live integration supports real-time bidirectional audio with echo cancellation, live transcription, interruptibility, and seamless tool call handoff — it feels like talking to an assistant that can actually do things.
What we learned
- Don't fight the platform — We spent days making nodejs-mobile work with polyfills and stubs. The moment we pivoted to a proper Linux VM, everything just worked. Sometimes the right abstraction layer matters more than clever workarounds.
- AVF is underappreciated — Android's pKVM hypervisor gives you a real Linux environment on modern Pixels. It's like having a VPS in your pocket.
- Audio is harder than it looks — Getting bidirectional voice right with echo cancellation, noise suppression, and proper stream lifecycle management required careful attention to Android's audio pipeline quirks.
- Browser automation is fragile — Playwright's singleton connection model means one bad request can poison the entire session. Defensive cleanup patterns are essential for reliability.
What's next for AndroidClaw
- App-managed VM lifecycle — The Android app should launch, configure, and manage the AVF VM automatically instead of requiring manual SSH setup.
- Persistent API key in VM — Store the Gemini API key in the VM filesystem so it survives reboots without manual re-entry.
- Streaming responses — Wire up SSE/streaming from the gateway through to the glasses for faster time-to-first-word on tool results.
- Multi-modal tool results — Return screenshots and visual artifacts from browser sessions back through Gemini to describe what the page looks like.
- More tools — File management, calendar integration, smart home control — anything the user can describe, the agent should be able to execute.
Built With
- android-15
- android-virtualization-framework-(avf)
- c++
- chromium
- debian-arm64
- gemini
- gemini-live-api
- javascript
- jetpack-compose
- kotlin
- kotlin-coroutines
- meta-ray-ban-smart-glasses
- meta-wearables-dat-sdk
- node.js
- okhttp
- openclaw
- playwright
- typescript