Inspiration
We kept asking ourselves the same question: why do we need 14 apps to manage one life? A calendar here, a journal there, a habit tracker somewhere else, a notes app we forgot the password to. None of them talk to each other. None of them know us.
We wanted one place where you could say "I need to get back in shape and stop forgetting to call my mom" and have something actually do something about it. Not just take a note. Not just set a timer. Actually plan it, track it, remember it, and hold you accountable. That's how Jems was born — an AI life assistant where four specialized agents collaborate behind the scenes so you never have to context-switch between apps again.
What it does
Jems gives you four AI agents (we call them jems) that work together as a team:
Noor — your main conversational partner. She routes your messages to the right specialist and handles daily briefings, web search, and general chat.
Kai — your scheduler. Creates tasks, reminders, and time-blocked plans. Decomposes goals into milestones automatically.
Echo — your memory keeper. Curates an infinite journal from voice notes, photos, thoughts, and reflections. Detects mood patterns over time.
Sage — your growth coach. Tracks goals, analyzes progress, and powers a social layer where friends share journal highlights.

You talk in one thread (the Hub), and the right jem answers. Say "plan my week and journal about last week" and both Kai and Echo respond. Tasks can require photo proof for accountability. Journal entries can be tagged shareable so friends see them in the Lounge. The whole thing runs through a floating dock with a 3D agent sphere you can swipe, tap, or double-tap for live voice mode.
How we built it
The frontend is a Flutter app with a spatial white glassmorphism design — radial gradient agent spheres, frosted glass dock, soft shadows, no dark mode. State management with Riverpod, routing with go_router, and freezed for immutable data models.
The backend is a Python FastAPI service running a Google ADK multi-agent orchestration system. Noor is the root agent on Gemini 2.5 Flash (for voice streaming), and the three sub-agents run on Gemini 2.5 Pro for richer tool calling. We built 40+ tool functions across 11 modules — task management, goal decomposition, reminders, journal, memory, social, planning, analysis, and a cross-agent context bus so agents stay aware of each other's actions.
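To make the routing idea concrete, here is a minimal, framework-free sketch of how a root agent can dispatch one message to several specialists at once. This is illustrative only — the real system uses Google ADK's orchestration and Gemini-driven intent understanding, not keyword matching, and all names here are stand-ins:

```python
# Illustrative sketch of Noor's routing idea (NOT the actual ADK code):
# the root agent matches a message against each specialist and can
# dispatch to more than one of them in a single turn.

from dataclasses import dataclass, field


@dataclass
class Jem:
    name: str
    keywords: set  # toy stand-in for real intent classification

    def handle(self, message: str) -> str:
        return f"{self.name}: handled '{message}'"


@dataclass
class RootAgent:
    sub_agents: list = field(default_factory=list)

    def route(self, message: str) -> list:
        words = set(message.lower().split())
        matched = [a for a in self.sub_agents if a.keywords & words]
        # General chat falls back to the root agent itself.
        if not matched:
            return [f"noor: handled '{message}'"]
        return [a.handle(message) for a in matched]


noor = RootAgent(sub_agents=[
    Jem("kai", {"plan", "task", "schedule", "remind"}),
    Jem("echo", {"journal", "memory", "mood"}),
    Jem("sage", {"goal", "progress", "friends"}),
])

# One message, two specialists: both Kai and Echo match and respond.
replies = noor.route("plan my week and journal about last week")
```

In production the "matching" is an LLM decision rather than a keyword set, but the shape is the same: one entry point, many possible responders.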
Data lives in Firestore. Vector memory (for semantic recall) uses Gemini text-embedding-004 and is stored in GCS. Authentication is Firebase Auth. Subscriptions go through RevenueCat. The whole backend deploys to Google Cloud Run via Terraform, with Artifact Registry, Secret Manager, Cloud Scheduler, and GCS buckets all provisioned as code.
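The semantic-recall step boils down to ranking stored memories by cosine similarity to the query embedding. In the real pipeline the vectors come from Gemini text-embedding-004 and live in GCS; the tiny hand-made vectors below are stand-ins so the ranking logic is visible:

```python
# Sketch of semantic recall: embed the query, rank stored memories by
# cosine similarity. Real vectors come from text-embedding-004; these
# 3-dimensional toy vectors are illustrative stand-ins.

import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


memories = {
    "called mom on Sunday":     [0.9, 0.1, 0.0],
    "leg day at the gym":       [0.1, 0.9, 0.2],
    "journaled about the trip": [0.2, 0.2, 0.9],
}


def recall(query_vec, top_k=1):
    # Sort memory keys by similarity to the query, best first.
    ranked = sorted(memories,
                    key=lambda m: cosine(query_vec, memories[m]),
                    reverse=True)
    return ranked[:top_k]


print(recall([0.85, 0.15, 0.05]))  # → ['called mom on Sunday']
```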
Real-time chat runs over WebSocket with REST fallback. The context bus uses Firestore with a 24-hour TTL so agents can read each other's recent actions without direct coupling.
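The context bus pattern is simple enough to sketch in a few lines. The production bus is backed by Firestore; this in-memory version just shows the contract — agents publish events, other agents read them, and anything older than the TTL is treated as expired:

```python
# Minimal in-memory sketch of the context bus (the real one lives in
# Firestore): publish/read with a TTL, no direct agent-to-agent calls.

import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, matching the production bus


class ContextBus:
    def __init__(self):
        self._events = []  # list of (timestamp, agent, payload)

    def publish(self, agent, payload):
        self._events.append((time.time(), agent, payload))

    def recent(self, now=None):
        """Return payloads newer than the TTL; stale events are ignored."""
        now = time.time() if now is None else now
        return [p for (ts, a, p) in self._events if now - ts < TTL_SECONDS]


bus = ContextBus()
bus.publish("kai", {"type": "task_created", "goal": "get fit"})

# Sage reads Kai's event without either agent knowing about the other.
fresh = bus.recent()
```

In Firestore the expiry happens via the stored timestamp rather than an in-process check, but the decoupling is identical: writers never call readers.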
Challenges we ran into
Multi-agent orchestration was the big one. Getting four agents to collaborate without stepping on each other's toes — or worse, duplicating work — required building a context bus from scratch. When Kai creates a task linked to a goal, Sage needs to know about it to update progress, and Echo might want to journal about it. Coordinating that without creating circular dependencies took serious iteration.
Voice streaming over WebSocket with Gemini 2.5 Flash Live was another beast. Handling bidirectional audio, managing session state, and keeping latency low enough for natural conversation pushed us to rethink our connection architecture multiple times.
The spatial UI was deceptively hard. Glassmorphism with BackdropFilter in Flutter is expensive. Getting the 3D clay-like agent spheres to look right with radial gradients, inner shadows, and kawaii faces while keeping scroll performance smooth required a lot of profiling and optimization.
Accomplishments that we're proud of
The orchestration actually works. You send one message, and the right agent (or agents) respond. No manual routing, no "talk to the planner" commands. Noor just figures it out.
The context bus is clean. Agents publish events, other agents read them, stale events auto-expire. It's simple, decoupled, and it makes the whole system feel like a team instead of four isolated chatbots.
40+ tools across four agents, all deployed and working on Cloud Run. Goal decomposition automatically creates milestones, tasks, and reminders in one call. The adapt_plan tool analyzes velocity and detects when you're falling behind. It feels like the agents genuinely know your life.
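The velocity check behind adapt_plan reduces to comparing your actual completion rate against the pace needed to finish on time. The function below is a hedged sketch of that idea — names and thresholds are illustrative, not the deployed logic:

```python
# Illustrative sketch of the adapt_plan velocity check (names and
# thresholds are NOT the deployed logic): you are "falling behind" when
# your completion rate trails the pace required to finish on time.


def behind_schedule(done, total, days_elapsed, days_total):
    """True when milestone completion rate trails the required pace."""
    if days_elapsed == 0:
        return False  # nothing to measure yet
    actual_rate = done / days_elapsed      # milestones per day so far
    required_rate = total / days_total     # pace needed to finish on time
    return actual_rate < required_rate


# 2 of 10 milestones done at the halfway mark of a 30-day goal:
print(behind_schedule(done=2, total=10, days_elapsed=15, days_total=30))
# → True (≈0.13/day actual vs ≈0.33/day required)
```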
And the proof-based accountability system — tasks that ask you to take a gym selfie or check in — is something we haven't seen anywhere else. Three months from now, you'll have the receipts.
What we learned
Agent orchestration is a design problem, not just an engineering one. The hardest part wasn't making agents call tools — it was deciding which agent should own which responsibility, and how they should communicate without creating chaos.
We learned that a context bus with TTL is way better than direct agent-to-agent calls. Loose coupling wins. We also learned that Gemini 2.5 Flash is surprisingly capable as a router/orchestrator, and that reserving Pro for the sub-agents that need deep reasoning is the right cost/performance tradeoff.
On the Flutter side, we learned that spatial UI design requires thinking in layers — glass, shadows, gradients, blur — and that every BackdropFilter has a performance cost you need to budget for.
What's next for Jems
Agent Marketplace — discover and add specialized third-party agents (fitness coach, language tutor, finance advisor) that plug into the existing orchestration.
A2A protocol — friend-to-friend agent communication: your Noor can talk to your friend's Noor to coordinate plans.
MCP integrations — connect your calendar, GitHub, Notion, and other tools as dynamic agent capabilities.
iOS launch and cross-platform sync.
Proactive intelligence — agents that reach out first: morning briefings, evening reflections, nudges when goals stall, and celebrations when you hit milestones.
Voice-first experience — making live voice mode the primary interaction, not just a feature.
VR and smart glasses — bringing Jems to VR headsets and Ray-Ban Meta glasses, tapping into Gemini Live's video streaming capability.