Inspiration
300K+ people starred OpenClaw in 6 weeks because they want a personal AI that actually does things - sends emails, runs code, browses the web, learns new skills. But every open-source "personal AI computer" requires Docker, SSH, terminal commands, and a cloud bill. The demand is proven. The accessibility isn't.
We asked: what if the personal AI computer wasn't on your desk - it was on your phone, powered by Gemini, and you just talked to it?
What it does
Elora is a live voice-and-vision agent that acts on the real world from your phone. She handles interruptions naturally (full barge-in via Gemini Live API), has a distinct voice persona, and maintains persistent memory across sessions. This is a live agent, not a turn-based chatbot.
Voice-First, Barge-In Native. "Hey Elora" wakes from any screen. Full duplex audio via Gemini Live API - interrupt her mid-sentence, she interrupts you back. During calls, she sees your camera feed in real time and proactively speaks up when she notices something relevant. The experience is fluid and context-aware, never disjointed or turn-based.
She Sees Your World. During live voice calls, Elora observes your camera every 3 seconds and responds to what she sees without being asked. Point your phone at anything - she identifies objects, reads text, and recognizes faces. "This is my friend Maya" - she stores the face reference, and next time Maya appears on camera, she knows.
40+ Real Tools, Real Actions. Gmail (read, send, archive, batch manage), Google Calendar (full CRUD), Playwright browser automation with live screenshot stream, Google Docs and Slides creation, SMS via Twilio, code execution in a personal sandbox, and a web browser that works even on sites blocked in your country. These aren't mock tools - she sends actual emails, texts actual people, books actual calendar events.
Self-Extending Skill System. Tell Elora "I need to track crypto prices" - she searches her skill registry, installs the right skill, and executes it in your personal sandbox. If no skill exists, she writes one from scratch, tests it in the sandbox, and saves it to your library permanently. 7 skills ship bundled (weather, exchange rates, Hacker News, Wikipedia, crypto, RSS, Ethiopian power outage estimator). Users can create unlimited custom skills by voice.
Per-User Isolated Sandbox. Every user gets their own persistent cloud VM via E2B. Install Python packages, create files, run code - it all persists across conversations. When Elora runs a skill or executes code, it runs in YOUR sandbox, isolated from everyone else.
Proactive - Acts Before You Ask. A background engine runs every 5 minutes: meeting alerts 15 minutes before events, birthday nudges, stale-contact check-ins, morning briefings, Gmail push alerts. All quality-gated by Gemini Flash. She doesn't wait for instructions.
People Memory + Face Recognition. Elora knows the people in your life - names, relationships, birthdays, contact info, faces. "Text my girlfriend" resolves the person, finds the phone number, sends the SMS.
3-Layer Persistent Memory (MemU). Powered by MemU (92% LoCoMo benchmark accuracy, 10x lower always-on cost): raw facts via vector search, a compacted user profile, and session summaries. She remembers what you told her three weeks ago.
Security Built In. The Agntor trust protocol runs on every message: prompt injection detection (12 regex patterns + 3 heuristic checks + structural analysis), PII/secret redaction, tool guardrails with blocklist and confirmation policies, and SSRF protection with DNS resolution. Verified agent identity at /agent/identity.
Multi-Agent Architecture. Google ADK with 5 specialist sub-agents: WebResearcher, BrowserWorker, EmailCalendar, FileMemory, and a self-verifying ResearchLoop. The root orchestrator delegates to the right specialist automatically.
How we built it
Backend: FastAPI + Python 3.11 on Google Cloud Run. The ADK agent hierarchy has a root orchestrator and 5 specialist sub-agents, each with dedicated tools. Every tool function is wired into three places simultaneously: the ADK agent (text mode), the Gemini Live API declarations (voice mode), and the LiveKit voice agent.
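One plausible shape for that three-way wiring is to define each tool once as an ordinary Python function and derive the voice-mode declaration from it. This is an illustrative sketch, not Elora's actual code: `to_declaration`, its type map, and the `send_sms` stub are all hypothetical, and (as noted under Challenges) the real Live API declarations were written as manual JSON schemas.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python annotations to JSON-schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_declaration(fn) -> dict:
    """Derive a JSON-schema-style function declaration (the general shape
    Gemini Live API tools use) from a plain Python tool function, so one
    definition can serve both the ADK agent and the voice path."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params,
                       "required": list(params)},
    }

def send_sms(to: str, body: str) -> str:
    """Send an SMS via Twilio."""
    ...  # stub; the real tool would call the Twilio client
```

Keeping one source of truth per tool is what makes a 40+ tool surface maintainable across text, voice, and LiveKit modes.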
Voice: Gemini Live API (gemini-2.5-flash-native-audio) handles real-time bidirectional audio streaming with native barge-in support. The mobile app maintains three concurrent WebSocket connections: text chat (ADK agent), live audio (Gemini Live API), and always-on wake word detection. The wake word system streams 800ms WAV clips from the microphone to the backend, where Gemini detects "Hey Elora" and triggers a call.
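The wake-word path streams fixed 800 ms WAV clips; a minimal stdlib sketch of framing raw PCM into such clips, assuming 16 kHz 16-bit mono capture (the write-up only specifies the clip length, not the audio format):

```python
import io
import wave

SAMPLE_RATE = 16_000   # assumed capture rate; not stated in the write-up
SAMPLE_WIDTH = 2       # 16-bit PCM, also an assumption
CLIP_MS = 800
CLIP_BYTES = SAMPLE_RATE * SAMPLE_WIDTH * CLIP_MS // 1000

def pcm_to_wav_clips(pcm: bytes) -> list[bytes]:
    """Slice a mono PCM byte stream into 800 ms WAV clips ready to send
    over the wake-word WebSocket. Trailing partial audio is dropped."""
    clips = []
    for start in range(0, len(pcm) - CLIP_BYTES + 1, CLIP_BYTES):
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(SAMPLE_WIDTH)
            w.setframerate(SAMPLE_RATE)
            w.writeframes(pcm[start:start + CLIP_BYTES])
        clips.append(buf.getvalue())
    return clips
```

Each clip is then a self-contained WAV file the backend can hand to Gemini for "Hey Elora" detection.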
Skill System: Skills are YAML+code definitions stored per-user in Firestore. The skill engine supports: search (bundled + community registry), install (Firestore + sandbox deployment), create (Elora writes code, validates in sandbox, saves permanently), execute (runs in user's personal E2B sandbox), and publish. 7 bundled skills ship with Elora.
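The search → install → execute lifecycle can be sketched with an in-memory dict standing in for Firestore and a plain `exec` standing in for the E2B sandbox. Every name here is illustrative, not the real skill engine:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    code: str            # Python source that defines a run() entry point
    description: str = ""

@dataclass
class SkillEngine:
    """Toy version of the lifecycle. A dict stands in for the Firestore
    skill collection, and exec() stands in for deploying and running the
    code in the user's personal E2B sandbox."""
    registry: dict[str, Skill]                    # bundled + community skills
    installed: dict[str, Skill] = field(default_factory=dict)

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return [n for n, s in self.registry.items()
                if q in n.lower() or q in s.description.lower()]

    def install(self, name: str) -> Skill:
        skill = self.registry[name]               # would also deploy to sandbox
        self.installed[name] = skill
        return skill

    def execute(self, name: str, **kwargs):
        if name not in self.installed:
            self.install(name)
        scope: dict = {}
        exec(self.installed[name].code, scope)    # sandbox stand-in
        return scope["run"](**kwargs)
```

The "create" path is the same shape with one extra step: Elora writes `Skill.code` herself, validates it by executing in the sandbox, and only then persists it.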
Personal Sandbox: Each user gets a persistent E2B sandbox VM with auto-pause. Sandbox IDs are tracked in Firestore for reconnection. Pre-installed packages persist. Files persist. The sandbox has full network access - can clone repos, make changes, commit, and push to GitHub.
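The pause/resume lifecycle reduces to "try to resume the tracked ID, otherwise create fresh and re-track." A sketch under stated assumptions: `SandboxClient` is a made-up stand-in for the E2B SDK surface, and a dict stands in for the Firestore document that tracks sandbox IDs.

```python
from typing import Optional, Protocol

class SandboxClient(Protocol):
    """Minimal illustrative surface; method names are NOT the real e2b API."""
    def create(self) -> str: ...
    def resume(self, sandbox_id: str) -> bool: ...

def get_or_create_sandbox(user_id: str, store: dict, client: SandboxClient) -> str:
    """Return the user's sandbox ID, resuming a paused VM when possible.
    `store` stands in for the Firestore mapping of user -> sandbox ID."""
    sandbox_id: Optional[str] = store.get(user_id)
    if sandbox_id and client.resume(sandbox_id):
        return sandbox_id              # paused VM came back up
    sandbox_id = client.create()       # graceful fallback: fresh VM
    store[user_id] = sandbox_id
    return sandbox_id
```

The fallback branch is what makes the experience degrade gracefully when a paused sandbox can no longer be resumed.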
Security: The Agntor trust protocol runs as middleware on every incoming message. Five layers: (1) prompt injection guard with 12 regex patterns + 3 heuristic checks + structural analysis, (2) PII/secret redaction for API keys, tokens, credit cards, SSNs, (3) tool guardrails with a blocklist (shell.exec, eval) and confirmation list (send_email, delete_file), (4) SSRF protection validating URLs against private IP ranges with DNS resolution, (5) agent identity endpoint exposing capabilities and security posture.
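Layer (4) can be illustrated with the stdlib alone: resolve the hostname, then reject any URL that lands on a non-routable address. This is a sketch of the idea, not Agntor's actual guard, which may check more cases:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(url: str) -> bool:
    """SSRF guard sketch: resolve the hostname via DNS and reject private,
    loopback, link-local, reserved, and multicast destinations."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False                      # unresolvable hosts are rejected
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast):
            return False                  # one bad resolution fails the URL
    return True
```

Checking resolved addresses rather than hostnames is the key point: `http://169.254.169.254/` style metadata-endpoint attacks and DNS tricks both fail this test.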
Mobile: Expo/React Native (TypeScript). Immersive call UI with live camera feed, floating controls, animated avatar states, and real-time chat transcript visible during voice calls - so users see tool execution happening as they speak.
Infrastructure: Terraform IaC provisions Cloud Run, Artifact Registry, GCS, Firestore, and IAM. GitHub Actions CI/CD auto-deploys on push. 50+ Cloud Run revisions showing real development history.
Challenges we ran into
- The Gemini Live API doesn't support the ADK tool-calling protocol natively, so we built a parallel tool declaration and dispatch system (manual JSON schemas + function mapping) that mirrors the ADK agent's 40+ capabilities for voice mode.
- E2B sandbox persistence required careful lifecycle management - auto-pause, Firestore ID tracking, reconnection logic, and graceful fallback when a paused sandbox can't be resumed.
- Per-user data isolation across 40+ tools required a consistent ContextVar pattern - every tool call reads the current user ID from a context variable set at WebSocket connection time.
- Memory compaction at scale: merging and deduplicating raw memory facts into a structured user profile without losing important details required iterative prompt engineering with Gemini Flash.
- Binary data from generate_music/generate_image tools was being stored in ADK session history, blowing up context windows on subsequent messages. Solved with a ContextVar side-channel that drains binary payloads outside the session state.
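Both the per-user isolation and the binary side-channel fixes above lean on the same primitive. A minimal sketch of the ContextVar pattern (names illustrative): the ID is set once when the connection is handled, and every tool running in that task's context reads it implicitly, without leaking across concurrent users.

```python
import asyncio
from contextvars import ContextVar

# Set once per WebSocket connection; every tool call reads it implicitly.
current_user_id: ContextVar[str] = ContextVar("current_user_id")

async def send_email_tool(to: str, body: str) -> str:
    # Any of the 40+ tools can resolve the caller without threading
    # user_id through every function signature.
    user = current_user_id.get()
    return f"[{user}] queued email to {to}"

async def handle_connection(user_id: str) -> str:
    current_user_id.set(user_id)          # bound to this task's own context
    return await send_email_tool("maya@example.com", "hi")

async def main() -> list[str]:
    # Concurrent connections: asyncio gives each task a context copy,
    # so the two set() calls never see each other.
    return await asyncio.gather(handle_connection("alice"),
                                handle_connection("bob"))
```

The binary-payload fix uses the same mechanism in reverse: tools write image/audio bytes into a ContextVar that the response handler drains, so they never enter ADK session history.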
Accomplishments that we're proud of
- It's real. 40+ tools, all wired, all deployed, all working on a live Cloud Run instance with 50+ revisions of real development history. Nothing is mocked.
- The skill system works end to end. Elora can search for a skill, install it, execute it in the user's sandbox, and return results - or write a brand new skill from scratch, test it in the sandbox, and save it permanently.
- Per-user sandbox isolation. Every user gets their own persistent cloud VM. Files, packages, and state persist across conversations.
- Barge-in is real. You can interrupt Elora mid-sentence and she responds to your new question naturally. She can also interrupt you if she sees something important on camera.
- GitHub push by voice. During a power outage, you can tell Elora by voice to clone a repo, edit a file, commit, and push to GitHub - all from your phone, executed in the E2B sandbox.
- Security is built in, not bolted on. The Agntor trust protocol runs on every message before it reaches the agent.
What we learned
- The gap between "AI chatbot" and "personal AI computer" is not more LLM calls - it's isolation, persistence, and extensibility. A sandbox that persists and a skill system that learns changes the entire product category.
- Security can't be an afterthought for personal AI. Prompt injection, PII leakage, and SSRF are real attack vectors when an agent has access to email, files, and code execution.
- Google's ADK multi-agent architecture is powerful for real products - the one-parent-per-agent constraint forces clean separation of concerns.
- Gemini Live API's native barge-in makes voice agents feel fundamentally different from turn-based chatbots. The "live" factor is not a feature - it's a paradigm shift.
What's next
- Community skill marketplace. Users share, rate, and monetize skills - turning Elora into a platform.
- MCP (Model Context Protocol) support. Connect Elora to any MCP-compatible tool server, massively expanding the tool ecosystem.
- On-device wake word. Move detection to on-device ML to eliminate the always-on WebSocket.
- App Store launch. Elora is built as a real product, not a demo. The goal is to ship it.
URLs
- GitHub: https://github.com/Garinmckayl/elora
- Live Backend: https://elora-backend-qf7tbdhnnq-uc.a.run.app
- Agent Identity: https://elora-backend-qf7tbdhnnq-uc.a.run.app/agent/identity
- Skill Registry: https://elora-backend-qf7tbdhnnq-uc.a.run.app/agent/skills
- GCP Proof: https://youtu.be/W9jnF3Cvj6E
- Blog Post: https://dev.to/zeshama/i-built-a-personal-ai-computer-with-gemini-heres-how-934
- GDG Profile: https://gdg.community.dev/u/m4z26f/#/about
Third-Party Integrations
- E2B (e2b.dev): Per-user cloud sandboxes for code execution and skill runtime. Used under E2B's standard API terms.
- Agntor (github.com/agntor/agntor): Open-source trust protocol for agent security. MIT license.
- MemU (github.com/NevaMind-AI/memU): Open-source memory engine. MIT license.
- Twilio: SMS messaging. Used under Twilio's standard API terms.
- LiveKit: WebRTC voice transport. Used under LiveKit's open-source license.
Architecture Diagram
(architecture diagram image)
Proof of Google Cloud Deployment
Screen recording: YouTube - Cloud Run, Firestore, Cloud Storage walkthrough
Shows Cloud Run console with elora-backend service and 50+ revisions, Firestore collections, and Cloud Storage buckets.
Code links (additional proof):
- core/main.py - FastAPI server deployed on Cloud Run with all endpoints
- infra/main.tf - Terraform IaC provisioning Cloud Run, GCS, Firestore, IAM
- .github/workflows/deploy.yml - CI/CD auto-deploying to Cloud Run on push
Bonus Points Checklist
- [x] Blog post (+0.6): I Built a Personal AI Computer With Gemini - Here's How (published on dev.to, includes #GeminiLiveAgentChallenge)
- [x] Infrastructure as Code (+0.2): Terraform in infra/main.tf + GitHub Actions CI/CD in .github/workflows/deploy.yml
- [x] GDG membership (+0.2): GDG Profile
Total bonus: +1.0 on a 6.0 max scale
Built With
- agntor
- cloud-storage
- e2b
- expo.io
- fastapi
- firebase-auth
- firestore
- gemini-2.0-flash
- gemini-2.5-flash
- gemini-live-api
- gmail-api
- google-adk
- google-calendar-api
- google-cloud-run
- google-genai-sdk
- google-slides-api
- google-workspace-api
- imagen
- livekit
- memu
- playwright
- python
- react-native
- terraform
- twilio
- typescript
