Inspiration
The world’s most advanced Artificial Intelligence is currently "homeless." We interact with models like Gemini 2.5 Flash through glass screens or static smart speakers, but these assistants lack agency—they cannot follow us, they cannot see our physical environment, and they cannot act upon the world.
We were inspired by a simple question: Why build a robot with a bulky, expensive onboard computer when everyone already carries a supercomputer in their pocket? AURA-1 was born from the desire to bridge the gap between mobile compute and physical robotics, giving the world's smartest "brain" a way to walk.
What it does
AURA-1 is a standalone, moving personal assistant: a smartphone mounted on a custom robotic chassis, with our app transforming the device into a Mobile Cortex.
Vision & Audio: Using the Gemini 2.5 Flash Native Audio Preview, it perceives the world in real time, recognizing faces, obstacles, and gestures.
Embodied Interaction: It doesn't just talk; it moves. It can follow a user, perform "dances" to express emotion, and navigate domestic spaces.
Low-Latency Control: Through a direct USB tunnel, the cloud-based AI sends motor commands to the robot's "legs" (the chassis) with near-zero lag.
How we built it
The architecture of AURA-1 is a "Cloud-to-Cable" pipeline.
The Brain: Developed using React and the Google Generative AI SDK, specifically the Multimodal Live API. We utilized the Gemini 2.5 Flash Native Audio model to handle simultaneous audio and video streams.
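For context, here is a minimal sketch of how such a Live session can be opened with the @google/genai SDK. The model id, environment variable, and exact sendRealtimeInput shape are assumptions (they vary across preview SDK versions), not a verbatim excerpt of our code.

```typescript
import { GoogleGenAI, Modality } from "@google/genai";

// Assumed setup: the API key is injected by Vite at build time, and the
// model id points at the native-audio preview family named above.
const ai = new GoogleGenAI({ apiKey: import.meta.env.VITE_GEMINI_API_KEY });

export async function startLiveSession() {
  const session = await ai.live.connect({
    model: "gemini-2.5-flash-native-audio-preview", // assumed model id
    config: { responseModalities: [Modality.AUDIO] },
    callbacks: {
      onopen: () => console.log("Live session open"),
      // Server messages carry audio chunks, transcripts, and tool calls.
      onmessage: (msg) => console.log("server message", msg),
      onerror: (e) => console.error("live error", e),
      onclose: () => console.log("Live session closed"),
    },
  });

  // Microphone audio is streamed up as base64-encoded 16 kHz PCM chunks.
  const sendAudioChunk = (base64Pcm: string) =>
    session.sendRealtimeInput({
      audio: { data: base64Pcm, mimeType: "audio/pcm;rate=16000" },
    });

  return { session, sendAudioChunk };
}
```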
The Nervous System: We implemented a USB-ADB port forwarding tunnel. This allows the mobile browser to communicate with the local motor controller (Raspberry Pi/Microcontroller) via WebSockets on 127.0.0.1:5005.
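A simplified sketch of that link, assuming the tunnel is opened from the Pi side with `adb reverse` and a made-up `{cmd, left, right}` message schema:

```typescript
// One-time setup on the Raspberry Pi (the USB host):
//   adb reverse tcp:5005 tcp:5005
// After that, 127.0.0.1:5005 on the phone tunnels over the USB cable to
// the Pi's WebSocket motor server. The message schema below is illustrative.

const motorSocket = new WebSocket("ws://127.0.0.1:5005");

motorSocket.addEventListener("open", () => {
  console.log("USB tunnel to chassis established");
});

// Hypothetical command: left/right wheel duty cycles in [-1, 1].
function drive(left: number, right: number): void {
  if (motorSocket.readyState === WebSocket.OPEN) {
    motorSocket.send(JSON.stringify({ cmd: "drive", left, right }));
  }
}

drive(0.5, 0.5); // roll forward
drive(0, 0);     // stop
```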
The HUD: A custom-built Head-Up Display using CSS and Framer Motion provides real-time telemetry, including orientation data calculated from the phone's onboard motion sensors.
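A stripped-down illustration of that telemetry source, reading the browser's DeviceOrientationEvent into a Framer Motion element; the component name, classes, and layout are hypothetical:

```tsx
import { useEffect, useState } from "react";
import { motion } from "framer-motion";

// Hypothetical HUD fragment: orientation comes straight from the phone's
// sensor fusion via DeviceOrientationEvent (no extra hardware needed).
export function OrientationHud() {
  const [heading, setHeading] = useState(0);
  const [pitch, setPitch] = useState(0);

  useEffect(() => {
    const onOrientation = (e: DeviceOrientationEvent) => {
      setHeading(e.alpha ?? 0); // rotation around the vertical axis, degrees
      setPitch(e.beta ?? 0);    // front-back tilt, degrees
    };
    window.addEventListener("deviceorientation", onOrientation);
    return () => window.removeEventListener("deviceorientation", onOrientation);
  }, []);

  return (
    <motion.div
      className="fixed top-2 left-2 font-mono text-xs"
      animate={{ rotate: -heading }} // counter-rotate a compass readout
    >
      HDG {heading.toFixed(0)}° / PITCH {pitch.toFixed(0)}°
    </motion.div>
  );
}
```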
Hardware: A custom-designed 3D-printed chassis housing high-torque DC motors, linked to the phone via a single USB-C data bridge.
Challenges we ran into
The "Tether" Trap: Early versions required a laptop to "stream" the app to the phone. Removing the laptop was our biggest hurdle, requiring us to host the logic locally on the phone while maintaining a secure WebSocket link to the hardware.
Mobile Browser Sandbox: Browsers are designed to be secure, often blocking camera and audio access. We had to implement "Secure Context" overrides and user-gesture triggers to ensure the Gemini Live session could initialize automatically.
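In sketch form, the gesture-gated startup looks roughly like this (the constraints and element id are illustrative):

```typescript
// Browsers grant mic/camera only in a secure context and, in practice,
// only after a user gesture, so session start is gated behind a tap.
async function onStartButtonTap(): Promise<MediaStream> {
  if (!window.isSecureContext) {
    throw new Error("getUserMedia requires HTTPS or localhost");
  }
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 16000, channelCount: 1 },
    video: { facingMode: "environment" }, // rear camera watches the room
  });
  // From here the tracks feed the Gemini Live pipeline.
  return stream;
}

document.getElementById("start")?.addEventListener("click", () => {
  onStartButtonTap().catch(console.error);
});
```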
Audio PCM Decoding: Gemini 2.5 Flash sends native audio as 24kHz, 16-bit PCM. Converting this raw buffer into high-fidelity speech on a mobile device without stuttering required careful tuning of the AudioContext and ScriptProcessorNode pipeline.
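A condensed sketch of the playback path, shown here with plain AudioBuffer scheduling rather than the ScriptProcessorNode route described above (the principle, queuing chunks gaplessly on a 24 kHz context, is the same):

```typescript
// Convert a raw 16-bit little-endian PCM chunk from the model into an
// AudioBuffer and schedule it back to back on a 24 kHz context.
const ctx = new AudioContext({ sampleRate: 24000 });
let playhead = 0; // next scheduled start time, in context seconds

function playPcmChunk(raw: ArrayBuffer): void {
  const int16 = new Int16Array(raw);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // scale samples to [-1, 1)
  }

  const buffer = ctx.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);

  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);

  // Queue chunks seamlessly so speech never stutters or overlaps.
  playhead = Math.max(playhead, ctx.currentTime);
  src.start(playhead);
  playhead += buffer.duration;
}
```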
Accomplishments that we're proud of
We successfully achieved true standalone autonomy. AURA-1 operates without a laptop, using only the phone's 5G/Wi-Fi to "think" in the cloud and its USB port to "act" in the physical world. We managed to keep the total system latency—from "seeing" an obstacle to "stopping" the motors—under 200ms.
What we learned
We learned that embodied AI is as much about networking as it is about robotics. Managing the data flow between the Google Cloud and a local USB serial port taught us deep lessons in asynchronous programming, WebSocket stability, and mobile sensor calibration. We also discovered the power of Function Calling (Tool Use) in LLMs, allowing the AI to treat a motor controller as just another API.
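As an illustration, a motor controller exposed as a tool might be declared like this; the name and parameter schema are hypothetical, not AURA-1's exact contract:

```typescript
import { Type, FunctionDeclaration } from "@google/genai";

// Hypothetical tool schema that lets the model "call" the chassis.
const setMotorSpeed: FunctionDeclaration = {
  name: "set_motor_speed",
  description: "Drive the robot chassis by setting wheel duty cycles.",
  parameters: {
    type: Type.OBJECT,
    properties: {
      left: { type: Type.NUMBER, description: "Left wheel, -1.0 to 1.0" },
      right: { type: Type.NUMBER, description: "Right wheel, -1.0 to 1.0" },
    },
    required: ["left", "right"],
  },
};

// Passed into the Live session config; when the model emits a matching
// toolCall we forward it over the USB WebSocket and return a toolResponse,
// so the motor controller really is just another API to the model.
export const liveConfig = {
  tools: [{ functionDeclarations: [setMotorSpeed] }],
};
```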
What's next for AURA-1
The next stage for AURA-1 is Spatial Memory. We plan to implement SLAM (Simultaneous Localization and Mapping) so the robot can build a persistent map of a home. We are also looking into "Shared Context," where the robot remembers your preferences and past interactions to become a more proactive personal assistant.
Built With
- adb
- google-generative-ai-sdk-(gemini-2.5-flash)
- node.js
- react
- tailwind-css
- typescript
- vite
- web-audio-api
- websockets