Inspiration

Humane AI Pin raised $230M and failed. Rabbit raised $180M. Limitless, Tab AI, Plaud - hundreds of millions in funding, none delivered a product people actually use.

They all built expensive hardware looking for a use case. We flipped it: build the use case first, then wrap it in a $30 box you can 3D-print at home.

What it does

ARIA is a wearable AI assistant. Press the button to start listening, say "ARIA" to begin a conversation, and hear the answer in your audio output — in your own cloned voice.

But it's not just Q&A. ARIA acts:

  • Detects the wake word "ARIA" in ambient conversation and starts a session automatically, no second button press needed
  • Searches the web and X/Twitter in real time
  • Remembers facts about you automatically (extracts memories from ambient conversation every 15 minutes)
  • Manages your calendar with natural language ("add the F1 race on Sunday at 2pm")
  • Sets reminders that speak to you through your audio output
  • Recalls what you discussed hours ago from full ambient transcripts
  • Chains up to 6 tool calls per query - search the web, save a memory, check your calendar, and compose a single informed response
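
The tool-chaining described above can be sketched as a bounded agentic loop. This is a minimal illustration, not the actual implementation: `llm_step`, the action format, and the tool registry are assumed names, standing in for the real GPT tool-calling plumbing.

```python
# Sketch of a bounded agentic loop (hypothetical names; the real system
# drives GPT tool calls over the OpenAI API).
MAX_ITERATIONS = 6  # matches the "up to 6 tool calls per query" budget

def run_agent(llm_step, tools, query):
    """Repeatedly ask the model what to do next, executing tool calls
    until it returns a final answer or the iteration budget runs out."""
    context = [{"role": "user", "content": query}]
    for _ in range(MAX_ITERATIONS):
        action = llm_step(context)  # model decides: tool call or final answer
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["tool"]](**action["args"])
        context.append({"role": "tool", "name": action["tool"], "content": result})
    return "Sorry, I couldn't finish that request."  # budget exhausted
```

Capping the loop keeps one query from spiralling into unbounded tool calls while still allowing multi-step plans like "search, save a memory, check the calendar, answer".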

The wearable box glows and animates: blue when idle, green when listening, yellow when thinking, magenta when speaking. The TFT display shows ARIA's responses. A live dashboard shows transcripts, tool activity, and stored memories in real time.

It works on any network, anywhere. The Pi creates its own WiFi hotspot. The wearable auto-connects. Tether your phone, plug into hotel WiFi, use campus ethernet. No setup, no router, no app install. ARIA just works.

How we built it

Three compute layers. Seven APIs. Two physical devices. One seamless voice experience.

Layer 1 - The Wearable (ESP32): Arduino Nano ESP32 in a custom 3D-printed enclosure (designed in Autodesk Inventor, printed during the hackathon). ST7789 TFT display with word-wrapped text rendering. 4 PWM-animated RGB LEDs with sine-wave breathing patterns. Push button triggers a WebSocket state machine that syncs with the base station in real time.
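
On the ESP32 the breathing animation runs in Arduino C++; the underlying math is simple enough to sketch in Python. The period and duty range below are assumptions for illustration.

```python
import math

def breathing_duty(t, period=3.0, max_duty=255):
    """Sine-wave 'breathing' brightness for one PWM LED channel:
    rises from 0 to max_duty and back over `period` seconds."""
    phase = 2 * math.pi * (t % period) / period
    # Shift by -pi/2 so brightness starts at 0, peaks mid-cycle
    return int((math.sin(phase - math.pi / 2) + 1) / 2 * max_duty)
```

Feeding the wall clock through this function each frame gives the smooth pulse; swapping `max_duty` per RGB channel produces the idle-blue, listening-green, and other state colors.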

Layer 2 - The Brain (Raspberry Pi 5): FastAPI server running 3 WebSocket channels. OpenAI Realtime API for voice-to-voice at 24kHz with server-side VAD. GPT-5.2 agentic loop with 10 tools and 6-iteration depth. ElevenLabs Flash v2.5 for voice-cloned TTS (with OpenAI TTS and espeak as fallbacks). Whisper for ambient transcription. SQLite with 5 tables for persistent memory, calendar, transcripts, and reminders.
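
The persistence layer mentioned above (SQLite, five tables) might look roughly like this. The table and column names here are assumptions; the write-up names the data domains but not the exact schema.

```python
import sqlite3

# Hypothetical schema sketch covering the five data domains the brain
# persists: memories, calendar, transcripts, reminders, conversations.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories     (id INTEGER PRIMARY KEY, fact TEXT, created_at TEXT);
CREATE TABLE IF NOT EXISTS calendar     (id INTEGER PRIMARY KEY, title TEXT, starts_at TEXT);
CREATE TABLE IF NOT EXISTS transcripts  (id INTEGER PRIMARY KEY, text TEXT, recorded_at TEXT);
CREATE TABLE IF NOT EXISTS reminders    (id INTEGER PRIMARY KEY, message TEXT, fire_at TEXT);
CREATE TABLE IF NOT EXISTS conversations(id INTEGER PRIMARY KEY, role TEXT, content TEXT);
"""

def open_db(path=":memory:"):
    """Open (or create) the ARIA database and ensure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

SQLite is a good fit here: single file, zero configuration, and more than fast enough for one wearable's worth of writes on a Pi 5.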

Layer 3 - The Dashboard (Browser): Dark-themed real-time interface with animated waveform, live transcript feed, conversation history, tool activity log, and memory viewer. All pushed via WebSocket - zero polling.
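
The zero-polling push model reduces to a small broadcast hub. In the real system each `send` callable would be a WebSocket send (e.g. FastAPI's `ws.send_json`); this framework-free sketch shows the shape of it.

```python
class DashboardHub:
    """Push-based event hub: the server broadcasts every event to all
    connected dashboard clients, so the browser never polls."""

    def __init__(self):
        self.clients = set()

    def connect(self, send):
        # `send` is any callable taking one event dict
        self.clients.add(send)

    def disconnect(self, send):
        self.clients.discard(send)

    def broadcast(self, event):
        for send in list(self.clients):
            try:
                send(event)
            except Exception:
                # A failed send means a dead connection: drop it quietly
                self.disconnect(send)
```

Every transcript line, tool invocation, and new memory gets pushed through one `broadcast` call, which is why the dashboard updates instantly with no polling loop.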

7 APIs integrated: OpenAI Realtime, GPT-5.2, Whisper, OpenAI TTS, ElevenLabs, Brave Search (DuckDuckGo fallback), Twitter/X via Tweepy.

Every component has graceful fallbacks. If ElevenLabs is down, it falls back to OpenAI TTS, then espeak. If Brave fails, DuckDuckGo takes over. If the wearable disconnects from WiFi, state and text are queued and delivered on reconnect. Nothing crashes. ARIA keeps working.
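
The fallback pattern described above is the same for TTS and search: try backends in priority order, remember failures, and only give up when every option is exhausted. A minimal sketch, with hypothetical backend callables:

```python
def synthesize_with_fallbacks(text, backends):
    """Try each TTS backend in priority order (e.g. ElevenLabs ->
    OpenAI TTS -> espeak); return (name, audio) from the first success."""
    errors = []
    for name, synth in backends:
        try:
            return name, synth(text)
        except Exception as exc:
            errors.append((name, exc))  # remember the failure, move on
    raise RuntimeError(f"all TTS backends failed: {errors}")
```

Because the caller only sees "audio came back", a dead ElevenLabs account degrades voice quality instead of crashing the demo.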

Challenges we overcame

  • Hardware prototyping without PCBs: We had no access to copper board or protoboard at the venue, so every electrical connection had to be hand-routed with individual jumper wires and point-to-point soldering. What would normally be a clean, structured layout became a spatial puzzle, routing power, data, and signal lines between the ESP32, TFT display, LEDs, and button in a compact 3D-printed enclosure with zero margin for shorts. We mapped every connection by hand, heat-shrunk each joint individually, and stress-tested the assembly through dozens of reconnection cycles before we were confident in its reliability.
  • ESP32 WebSocket reliability: JSON buffer overflows crashed the connection. We increased buffers to 512 bytes, added keepalive pings, and built a reconnection queue so no state is lost.
  • State flickering: The wearable would rapidly toggle between states. We added deduplication and a 90-second timeout to prevent stuck states.
  • Voice latency: Our first pipeline (record -> transcribe -> think -> synthesize -> play) had 4+ seconds of latency. Switching to OpenAI's Realtime API for bidirectional streaming brought it under a second.
  • Whisper hallucinations: On silence, Whisper would transcribe phantom phrases like "Thank you" or "Subscribe". We built a hallucination filter that catches these before they hit the brain.
  • 3D printing the enclosure: Designed and printed the housing during the hackathon with tight tolerances for the display cutout, button access, and LED diffusion.
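
The hallucination filter from the list above can be illustrated in a few lines. The phantom-phrase list here is an assumption; these are commonly reported Whisper artifacts on silent audio, not necessarily the exact entries the real filter uses.

```python
# Assumed list of phantom phrases Whisper tends to emit on silence.
PHANTOMS = {"thank you", "thanks for watching", "subscribe", "you"}

def is_hallucination(transcript):
    """Drop transcripts that are empty or match a known phantom phrase,
    so they never reach the agentic brain."""
    text = transcript.strip().lower().strip(".!?")
    return not text or text in PHANTOMS
```

Running every ambient transcript through this gate before memory extraction keeps silence from polluting the stored memories.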

What we learned

  • Edge AI is viable today. A Raspberry Pi 5 can run a sophisticated AI pipeline with multiple fallbacks and real-time voice - no cloud GPU needed.
  • The agentic tool-use pattern (let the LLM decide which tools to call and chain them) is dramatically more useful than simple Q&A. ARIA feels intelligent because it acts, not just answers.
  • Voice cloning changes the UX completely. When ARIA responds in your own voice, the experience shifts from "talking to a device" to "thinking out loud."
  • Resilience beats speed. Our fastest path (Realtime API) was our least reliable. Our fallback pipeline never failed once. Shipping both with automatic switching meant the demo always worked.

Market validation

  • 60+ people joined our waitlist within 24 hours of announcing
  • 100+ B2B prospects reached via LinkedIn outreach during the hackathon
  • Direct interest from professionals wanting ambient AI for meetings, sales calls, and accessibility

Unit economics: $16 BOM, $149 retail kit = 89% gross margin. No recurring cloud costs for self-hosted users. Open-source core drives adoption; optional SaaS tier ($9/mo) for hosted users.

Competitors have raised $435M combined and failed to ship what we built in 24 hours for $30.

What's next

  • Custom PCB to replace the hand-wired prototype
  • Multi-user support with speaker diarisation, separate memory profiles per person
  • On-device transcription (faster-whisper / Whisper.cpp) to reduce cloud API dependency
  • Smartphone companion app
  • Integration with smart home devices
  • Enterprise tier for meeting intelligence and sales coaching
  • Full offline mode — everything processing locally on the Pi
  • Available as both prebuilt and DIY kits, with the full design open-sourced for anyone to build their own
