HomePlanet - Project Story

Inspiration

"Where did I put my keys?" This universal frustration becomes a major barrier for people with visual or cognitive impairments. While AI assistants can describe what's in front of a camera right now, none remember where things were yesterday or detect when something has moved.

We envisioned an AI that doesn't just see: it remembers. HomePlanet is a spatial memory assistant that builds persistent understanding of your physical space and improves through continuous learning.

What it does

HomePlanet is a spatial memory assistant that turns uploaded images into a searchable, intelligent database of objects. Here's what makes it special:

🎯 Fast Image Segmentation - Upload a photo and Gemini Vision rapidly identifies and segments every object with bounding boxes

✂️ Smart Object Extraction - Each detected object is automatically cropped into its own first-class entity with precise coordinates

🧠 Persistent Memory - Mastra stores every object in PostgreSQL with metadata: what it is, where it was, when it was seen, and context

🔍 Intelligent Research - Ask about any object and the agent uses Tavily + Browserbase to fetch relevant information and enrich its understanding

💬 Conversational Interface - CopilotKit provides a chat UI where you can ask "What objects did I upload?" or "Tell me more about that lamp" and the agent remembers everything

The system doesn't just analyze images: it builds a growing knowledge base of your physical world that gets smarter over time.
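To make "searchable database of objects" concrete, here is a minimal sketch of what one stored object record might look like, with an in-memory search standing in for the real PostgreSQL query. The type and field names are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical shape of one stored object record (illustrative only;
// the real schema lives in Mastra's PostgreSQL storage).
interface ObjectRecord {
  id: string;
  label: string;          // e.g. "lamp"
  description: string;    // Gemini's description of the object
  box: [number, number, number, number]; // [yMin, xMin, yMax, xMax], normalized
  parentImage: string;    // source photo the object was cropped from
  seenAt: Date;           // timestamp of the upload
}

// Minimal in-memory search over records, standing in for a SQL query.
function findObjects(records: ObjectRecord[], query: string): ObjectRecord[] {
  const q = query.toLowerCase();
  return records.filter(
    (r) =>
      r.label.toLowerCase().includes(q) ||
      r.description.toLowerCase().includes(q)
  );
}
```

Treating each object as a record like this (rather than a blob of per-image analysis) is what makes queries such as "What objects did I upload?" possible.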

How we built it

We integrated six sponsor technologies to demonstrate modern AI agent development:

1. Daytona (Development Environment)

  • Standardized dev environment with all dependencies pre-configured
  • Used for image processing pipeline - handling image uploads, cropping, and manipulation
  • Enabled instant team collaboration without "works on my machine" issues

2. Google Cloud - Gemini (Vision AI)

  • Lightning-fast object detection - Gemini Vision segments images in seconds
  • Returns precise bounding boxes for every detected object
  • Provides rich descriptions and confidence scores
  • The speed is genuinely impressive - full image analysis in ~2-3 seconds
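The detection step above hinges on getting structured output back from Gemini. A common approach is to prompt the model to return a JSON array of labeled boxes; the sketch below assumes that convention (entries of `{ label, box_2d }` with coordinates normalized to a 0-1000 grid) and defensively strips the markdown fence Gemini sometimes wraps around JSON. The exact response shape depends on your prompt and model version.

```typescript
// Assumed response format: we prompt Gemini for a JSON array of
// { label, box_2d } entries, where box_2d is [yMin, xMin, yMax, xMax]
// normalized to a 0-1000 grid (a common Gemini convention -- verify
// against your own prompt and model output).
interface Detection {
  label: string;
  box_2d: [number, number, number, number];
}

function parseDetections(raw: string): Detection[] {
  // Gemini sometimes wraps JSON in a markdown code fence; strip it first.
  const json = raw.replace(/^```(?:json)?\s*/m, "").replace(/```\s*$/m, "");
  const parsed = JSON.parse(json) as Detection[];
  // Keep only well-formed entries so one bad box doesn't sink the batch.
  return parsed.filter(
    (d) =>
      typeof d.label === "string" &&
      Array.isArray(d.box_2d) &&
      d.box_2d.length === 4
  );
}
```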

3. Daytona + Image Processing

  • Automatically crops each bounding box into individual object images
  • Each object becomes a first-class entity with its own cropped image file
  • Creates a visual catalog that's both human-readable and machine-queryable
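The cropping step boils down to converting each normalized bounding box into a pixel rectangle for the image library to cut out. A sketch of that conversion, assuming the 0-1000 normalization mentioned above:

```typescript
// Converts a normalized box ([yMin, xMin, yMax, xMax] on a 0-1000 grid --
// an assumption; check your model's actual output scale) into the pixel
// rectangle handed to the cropping step.
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function toPixelRect(
  box: [number, number, number, number],
  imageWidth: number,
  imageHeight: number
): CropRect {
  const [yMin, xMin, yMax, xMax] = box;
  const x = Math.round((xMin / 1000) * imageWidth);
  const y = Math.round((yMin / 1000) * imageHeight);
  return {
    x,
    y,
    width: Math.round((xMax / 1000) * imageWidth) - x,
    height: Math.round((yMax / 1000) * imageHeight) - y,
  };
}
```

The resulting rectangle can be passed to any image library's crop/extract call to produce the per-object image file.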

4. Mastra (Agent Framework & Memory)

  • Persistent object memory - Every detected object stored in PostgreSQL
  • Tracks metadata: object type, description, coordinates, timestamp, parent image
  • Agent can recall "What objects have I seen?" across conversations
  • Built custom tools for image analysis, web research, and memory queries
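In our build, Mastra's storage layer handles the actual persistence; the sketch below just shows the equivalent parameterized INSERT for one detected object, so the metadata listed above is visible in one place. The table and column names are hypothetical, and the `$1..$5` placeholders follow node-postgres conventions.

```typescript
// Hypothetical record for one detected object (mirrors the metadata we
// track: type, description, coordinates, parent image, timestamp).
interface DetectedObject {
  label: string;
  description: string;
  box: [number, number, number, number];
  parentImage: string;
}

// Builds a parameterized INSERT (node-postgres style) rather than
// string-concatenating values, so object descriptions can't inject SQL.
function buildInsert(
  obj: DetectedObject,
  seenAt: Date
): { text: string; values: unknown[] } {
  return {
    text:
      "INSERT INTO objects (label, description, box, parent_image, seen_at) " +
      "VALUES ($1, $2, $3, $4, $5)",
    values: [
      obj.label,
      obj.description,
      JSON.stringify(obj.box),
      obj.parentImage,
      seenAt.toISOString(),
    ],
  };
}
```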

5. Tavily + Browserbase (Intelligent Research)

  • When you ask about an object, agent uses Tavily to search for relevant info
  • Browserbase automates web interactions to gather enriched context
  • Agent remembers this research and associates it with the object in memory
  • Next time you ask, it recalls both the visual data and learned context
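The research step can be sketched as building a search query from what Gemini already knows about the object. The request shape below targets Tavily's REST search endpoint; the field names reflect its documented JSON body, but verify against the current Tavily docs before relying on them.

```typescript
// Request body for Tavily's search endpoint (https://api.tavily.com/search).
// Field names assumed from Tavily's REST API; confirm against current docs.
interface TavilyRequest {
  api_key: string;
  query: string;
  max_results: number;
}

function buildResearchQuery(
  label: string,
  description: string,
  apiKey: string
): TavilyRequest {
  return {
    api_key: apiKey,
    // Combine the label with what Gemini saw so results stay on-topic.
    query: `${label}: ${description}`,
    max_results: 5,
  };
}

// Usage (network call elided):
// const res = await fetch("https://api.tavily.com/search", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildResearchQuery("lamp", "brass desk lamp", key)),
// });
```

The results come back as snippets the agent can attach to the object's memory record, so the next query about that object recalls both the visual data and the research.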

6. CopilotKit by ag-ui (Agent UI)

  • Clean chat interface connected to Mastra agents via @ag-ui/mastra adapter
  • Real-time agent state synchronization
  • Users can query objects conversationally: "What's in my office?" or "Tell me about that chair"

Challenges we ran into

Multi-Provider Integration - Orchestrating six different services with varying APIs required careful abstraction design. We used Mastra's tool system to create consistent interfaces across all integrations.

Agent Memory Architecture - Designing persistent memory that maintains context across sessions while staying performant required careful PostgreSQL schema design and efficient state management.

CopilotKit-Mastra Connection - Getting the @ag-ui/mastra adapter working with shared state and proper type safety took iteration, but the result is seamless agent-UI communication.

Environment Consistency - Managing multiple API keys and services across team members was simplified by Daytona's standardized dev environments.

Accomplishments that we're proud of

🎯 Object-Level Intelligence - We don't just store images; we extract and catalog every individual object as a searchable entity with cropped images and metadata

⚡ Blazing Fast Vision - Gemini's segmentation speed is genuinely impressive - full scene analysis in seconds

🧠 True Memory - Mastra's PostgreSQL integration means the agent actually remembers objects across sessions, not just within a conversation

🔍 Self-Enriching Knowledge - Objects gain context over time as the agent researches them with Tavily/Browserbase and stores learnings

🏗️ End-to-End Pipeline - From image upload → Gemini segmentation → Daytona crop processing → Mastra storage → conversational retrieval, everything works together

✨ Six Sponsors, One Cohesive System - Daytona, Gemini, Mastra, Tavily, Browserbase, and CopilotKit all contribute meaningfully to the demo
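The end-to-end pipeline above can be sketched as one function with its steps injected, so the flow is visible without any of the real services. All names here are hypothetical stand-ins for the Gemini, Daytona, and Mastra integrations.

```typescript
// Hypothetical pipeline steps, injected so the flow can be read (and tested)
// without the real Gemini / Daytona / Mastra services behind them.
interface PipelineDeps {
  segment: (image: Uint8Array) => Promise<{ label: string }[]>; // Gemini
  crop: (image: Uint8Array, index: number) => Promise<Uint8Array>; // Daytona
  store: (label: string, crop: Uint8Array) => Promise<string>; // Mastra/Postgres
}

// upload -> segment -> crop each object -> store each record;
// returns the stored record ids for conversational retrieval later.
async function processUpload(
  image: Uint8Array,
  deps: PipelineDeps
): Promise<string[]> {
  const detections = await deps.segment(image);
  const ids: string[] = [];
  for (let i = 0; i < detections.length; i++) {
    const cropped = await deps.crop(image, i);
    ids.push(await deps.store(detections[i].label, cropped));
  }
  return ids;
}
```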

What we learned

Gemini Vision is seriously fast - The speed of object detection and segmentation exceeded expectations. Processing complex scenes in 2-3 seconds enables real-time applications.

Treating objects as first-class entities changes everything - By cropping and storing each object individually, we created a queryable visual database rather than just analyzed images.

Mastra's memory system is production-ready - PostgreSQL storage with proper schema design enables genuine cross-session memory that feels magical to users.

Daytona isn't just for containers - Using it for image manipulation and processing pipelines showed how standardized environments help beyond just code.

Research + Memory = Intelligence - The combination of Tavily/Browserbase for research and Mastra for storage creates agents that genuinely learn and improve.

CopilotKit makes agents feel native - The @ag-ui/mastra adapter eliminated weeks of UI plumbing work.

What's next for HomePlanet

Complete Vision Pipeline - Finish integrating Gemini for full image analysis and object detection capabilities

Temporal Tracking - Build the knowledge graph to track object locations over time and detect changes

User Feedback Loop - Implement correction mechanisms where user feedback improves future predictions

Mobile Companion - Create a mobile app for easy photo capture throughout the day

Accessibility Features - Add voice input/output and screen reader support for visually impaired users

Pattern Learning - Use historical data to predict where objects should be based on user routines


HomePlanet demonstrates how modern AI agent frameworks (Mastra + CopilotKit), vision AI (Gemini), web automation (Browserbase + Tavily), and standardized dev environments (Daytona) combine to build assistive technology with real-world impact. 🌍

Built With

  • Daytona
  • Google Gemini
  • Mastra
  • PostgreSQL
  • Tavily
  • Browserbase
  • CopilotKit
