HomePlanet - Project Story
Inspiration
"Where did I put my keys?" This universal frustration becomes a major barrier for people with visual or cognitive impairments. While AI assistants can describe what's in front of a camera right now, none remember where things were yesterday or detect when something has moved.
We envisioned an AI that doesn't just see; it remembers. HomePlanet is a spatial memory assistant that builds a persistent understanding of your physical space and improves through continuous learning.
What it does
HomePlanet is a spatial memory assistant that turns uploaded images into a searchable, intelligent database of objects. Here's what makes it special:
🎯 Fast Image Segmentation - Upload a photo and Gemini Vision rapidly identifies and segments every object with bounding boxes
✂️ Smart Object Extraction - Each detected object is automatically cropped into its own first-class entity with precise coordinates
🧠 Persistent Memory - Mastra stores every object in PostgreSQL with metadata: what it is, where it was, when it was seen, and context
🔍 Intelligent Research - Ask about any object and the agent uses Tavily + Browserbase to fetch relevant information and enrich its understanding
💬 Conversational Interface - CopilotKit provides a chat UI where you can ask "What objects did I upload?" or "Tell me more about that lamp" and the agent remembers everything
The system doesn't just analyze images; it builds a growing knowledge base of your physical world that gets smarter over time.
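Concretely, each catalogued object can be pictured as a record like the one below. This is a hypothetical sketch (field names are our illustration, not HomePlanet's actual schema), with a substring search standing in for the real SQL query:

```typescript
// Hypothetical shape of a catalogued object (field names are our
// illustration, not HomePlanet's actual schema).
interface ObjectRecord {
  id: string;
  label: string;            // what it is, e.g. "desk lamp"
  description: string;
  box: { x: number; y: number; width: number; height: number }; // pixel coords in the parent image
  parentImage: string;      // the photo the crop was taken from
  seenAt: string;           // ISO timestamp of when it was observed
  notes: string[];          // research findings accumulated over time
}

// Minimal substring search over the catalog, standing in for the SQL query
// the agent would run against PostgreSQL.
function findObjects(catalog: ObjectRecord[], query: string): ObjectRecord[] {
  const q = query.toLowerCase();
  return catalog.filter(
    (o) =>
      o.label.toLowerCase().includes(q) ||
      o.description.toLowerCase().includes(q)
  );
}
```

In the real system the catalog lives in PostgreSQL and retrieval is a parameterized query; the in-memory version only shows the retrieval contract.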
How we built it
We integrated six sponsor technologies to demonstrate modern AI agent development:
1. Daytona (Development Environment)
- Standardized dev environment with all dependencies pre-configured
- Used for image processing pipeline - handling image uploads, cropping, and manipulation
- Enabled instant team collaboration without "works on my machine" issues
2. Google Cloud - Gemini (Vision AI)
- Lightning-fast object detection - Gemini Vision segments images in seconds
- Returns precise bounding boxes for every detected object
- Provides rich descriptions and confidence scores
- The speed is genuinely impressive - full image analysis in ~2-3 seconds
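The detection step can be sketched as follows. Gemini's object-detection prompts conventionally return boxes as `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 scale, so a small converter maps them to pixel coordinates before cropping (the helper name and shape are our illustration, not the project's code):

```typescript
interface PixelBox { x: number; y: number; width: number; height: number }

// Convert a normalized [ymin, xmin, ymax, xmax] box (0-1000 scale, the
// convention Gemini's object-detection prompts typically use) into pixel
// coordinates for an image of the given size.
function toPixelBox(
  box: [number, number, number, number],
  imageWidth: number,
  imageHeight: number
): PixelBox {
  const [ymin, xmin, ymax, xmax] = box;
  return {
    x: Math.round((xmin / 1000) * imageWidth),
    y: Math.round((ymin / 1000) * imageHeight),
    width: Math.round(((xmax - xmin) / 1000) * imageWidth),
    height: Math.round(((ymax - ymin) / 1000) * imageHeight),
  };
}
```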
3. Daytona + Image Processing
- Automatically crops each bounding box into individual object images
- Each object becomes a first-class entity with its own cropped image file
- Creates a visual catalog that's both human-readable and machine-queryable
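One detail worth noting in the cropping step: vision models occasionally return boxes that spill a few pixels past the image edge, and crop libraries reject out-of-range regions, so boxes are clamped to the image bounds first. A minimal sketch of that guard (our illustration, not the project's code):

```typescript
interface Box { x: number; y: number; width: number; height: number }

// Clamp a detection box to the image bounds so the crop region is always
// valid, even when the model's box overshoots the edge slightly.
function clampBox(box: Box, imageWidth: number, imageHeight: number): Box {
  const x = Math.min(Math.max(box.x, 0), imageWidth);
  const y = Math.min(Math.max(box.y, 0), imageHeight);
  return {
    x,
    y,
    width: Math.max(0, Math.min(box.width, imageWidth - x)),
    height: Math.max(0, Math.min(box.height, imageHeight - y)),
  };
}
```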
4. Mastra (Agent Framework & Memory)
- Persistent object memory - Every detected object stored in PostgreSQL
- Tracks metadata: object type, description, coordinates, timestamp, parent image
- Agent can recall "What objects have I seen?" across conversations
- Built custom tools for image analysis, web research, and memory queries
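As a rough sketch of the memory contract, here is a tiny in-memory stand-in for the PostgreSQL-backed store (names are illustrative; the real system persists rows through Mastra's storage layer):

```typescript
// Illustrative stand-in for the PostgreSQL-backed object memory.
interface StoredObject {
  id: string;
  label: string;
  parentImage: string;
  seenAt: string; // ISO timestamp
}

class ObjectMemory {
  private rows = new Map<string, StoredObject>();

  // Upsert an object by id, as a SQL INSERT ... ON CONFLICT would.
  remember(obj: StoredObject): void {
    this.rows.set(obj.id, obj);
  }

  // Answer "What objects have I seen?" -- most recent first.
  recallAll(): StoredObject[] {
    return [...this.rows.values()].sort((a, b) =>
      b.seenAt.localeCompare(a.seenAt)
    );
  }
}
```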
5. Tavily + Browserbase (Intelligent Research)
- When you ask about an object, agent uses Tavily to search for relevant info
- Browserbase automates web interactions to gather enriched context
- Agent remembers this research and associates it with the object in memory
- Next time you ask, it recalls both the visual data and learned context
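The enrichment step can be pictured as merging deduplicated research snippets into the object's stored notes, so later questions draw on both the visual data and the learned context (an illustrative sketch, not the project's code):

```typescript
// Merge research findings (e.g. Tavily/Browserbase snippets) into an
// object's notes, trimming whitespace and dropping duplicates.
interface Enrichable { id: string; notes: string[] }

function enrich(obj: Enrichable, findings: string[]): Enrichable {
  const merged = new Set([...obj.notes, ...findings.map((f) => f.trim())]);
  return { ...obj, notes: [...merged] };
}
```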
6. CopilotKit by ag-ui (Agent UI)
- Clean chat interface connected to Mastra agents via the @ag-ui/mastra adapter
- Real-time agent state synchronization
- Users can query objects conversationally: "What's in my office?" or "Tell me about that chair"
Challenges we ran into
Multi-Provider Integration - Orchestrating six different services with varying APIs required careful abstraction design. We used Mastra's tool system to create consistent interfaces across all integrations.
Agent Memory Architecture - Designing persistent memory that maintains context across sessions while staying performant required careful PostgreSQL schema design and efficient state management.
CopilotKit-Mastra Connection - Getting the @ag-ui/mastra adapter working with shared state and proper type safety took iteration, but the result is seamless agent-UI communication.
Environment Consistency - Managing multiple API keys and services across team members was simplified by Daytona's standardized dev environments.
Accomplishments that we're proud of
🎯 Object-Level Intelligence - We don't just store images; we extract and catalog every individual object as a searchable entity with cropped images and metadata
⚡ Blazing Fast Vision - Gemini's segmentation speed is genuinely impressive - full scene analysis in seconds
🧠 True Memory - Mastra's PostgreSQL integration means the agent actually remembers objects across sessions, not just within a conversation
📚 Self-Enriching Knowledge - Objects gain context over time as the agent researches them with Tavily/Browserbase and stores learnings
🏗️ End-to-End Pipeline - From image upload → Gemini segmentation → Daytona crop processing → Mastra storage → conversational retrieval, everything works together
✨ Six Sponsors, One Cohesive System - Daytona, Gemini, Mastra, Tavily, Browserbase, and CopilotKit all contribute meaningfully to the demo
What we learned
Gemini Vision is seriously fast - The speed of object detection and segmentation exceeded expectations. Processing complex scenes in 2-3 seconds enables real-time applications.
Treating objects as first-class entities changes everything - By cropping and storing each object individually, we created a queryable visual database rather than just analyzed images.
Mastra's memory system is production-ready - PostgreSQL storage with proper schema design enables genuine cross-session memory that feels magical to users.
Daytona isn't just for containers - Using it for image manipulation and processing pipelines showed how standardized environments help beyond just code.
Research + Memory = Intelligence - The combination of Tavily/Browserbase for research and Mastra for storage creates agents that genuinely learn and improve.
CopilotKit makes agents feel native - The @ag-ui/mastra adapter eliminated weeks of UI plumbing work.
What's next for HomePlanet
Complete Vision Pipeline - Finish integrating Gemini for full image analysis and object detection capabilities
Temporal Tracking - Build the knowledge graph to track object locations over time and detect changes
User Feedback Loop - Implement correction mechanisms where user feedback improves future predictions
Mobile Companion - Create a mobile app for easy photo capture throughout the day
Accessibility Features - Add voice input/output and screen reader support for visually impaired users
Pattern Learning - Use historical data to predict where objects should be based on user routines
HomePlanet demonstrates how modern AI agent frameworks (Mastra + CopilotKit), vision AI (Gemini), web automation (Browserbase + Tavily), and standardized dev environments (Daytona) combine to build assistive technology with real-world impact. 🚀
