HomePlanet - Project Story

Inspiration

"Where did I put my keys?" This universal frustration becomes a major barrier for people with visual or cognitive impairments. While AI assistants can describe what's in front of a camera right now, none remember where things were yesterday or detect when something has moved.

We envisioned an AI that doesn't just see: it remembers. HomePlanet is a spatial memory assistant that builds persistent understanding of your physical space and improves through continuous learning.

What it does

HomePlanet is a spatial memory assistant that turns uploaded images into a searchable, intelligent database of objects. Here's what makes it special:

🎯 Fast Image Segmentation - Upload a photo and Gemini Vision rapidly identifies and segments every object with bounding boxes

✂️ Smart Object Extraction - Each detected object is automatically cropped into its own first-class entity with precise coordinates

🧠 Persistent Memory - Mastra stores every object in PostgreSQL with metadata: what it is, where it was, when it was seen, and context

🔍 Intelligent Research - Ask about any object and the agent uses Tavily + Browserbase to fetch relevant information and enrich its understanding

💬 Conversational Interface - CopilotKit provides a chat UI where you can ask "What objects did I upload?" or "Tell me more about that lamp" and the agent remembers everything

The system doesn't just analyze images: it builds a growing knowledge base of your physical world that gets smarter over time.
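To make "searchable database of objects" concrete, here is a minimal sketch of what one stored object record might look like, with an in-memory search standing in for the real PostgreSQL query. The type and field names are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical shape of one stored object record (illustrative only;
// the real schema lives in Mastra's PostgreSQL storage).
interface ObjectRecord {
  id: string;
  label: string;          // e.g. "lamp"
  description: string;    // Gemini's description of the object
  box: [number, number, number, number]; // [yMin, xMin, yMax, xMax], normalized
  parentImage: string;    // source photo the object was cropped from
  seenAt: Date;           // timestamp of the upload
}

// Minimal in-memory search over records, standing in for a SQL query.
function findObjects(records: ObjectRecord[], query: string): ObjectRecord[] {
  const q = query.toLowerCase();
  return records.filter(
    (r) =>
      r.label.toLowerCase().includes(q) ||
      r.description.toLowerCase().includes(q)
  );
}
```

Treating each object as a record like this (rather than a blob of per-image analysis) is what makes queries such as "What objects did I upload?" possible.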

How we built it

We integrated six sponsor technologies to demonstrate modern AI agent development:

1. Daytona (Development Environment)

  • Standardized dev environment with all dependencies pre-configured
  • Used for image processing pipeline - handling image uploads, cropping, and manipulation
  • Enabled instant team collaboration without "works on my machine" issues

2. Google Cloud - Gemini (Vision AI)

  • Lightning-fast object detection - Gemini Vision segments images in seconds
  • Returns precise bounding boxes for every detected object
  • Provides rich descriptions and confidence scores
  • The speed is genuinely impressive - full image analysis in ~2-3 seconds
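The detection step above hinges on getting structured output back from Gemini. A common approach is to prompt the model to return a JSON array of labeled boxes; the sketch below assumes that convention (entries of `{ label, box_2d }` with coordinates normalized to a 0-1000 grid) and defensively strips the markdown fence Gemini sometimes wraps around JSON. The exact response shape depends on your prompt and model version.

```typescript
// Assumed response format: we prompt Gemini for a JSON array of
// { label, box_2d } entries, where box_2d is [yMin, xMin, yMax, xMax]
// normalized to a 0-1000 grid (a common Gemini convention -- verify
// against your own prompt and model output).
interface Detection {
  label: string;
  box_2d: [number, number, number, number];
}

function parseDetections(raw: string): Detection[] {
  // Gemini sometimes wraps JSON in a markdown code fence; strip it first.
  const json = raw.replace(/^```(?:json)?\s*/m, "").replace(/```\s*$/m, "");
  const parsed = JSON.parse(json) as Detection[];
  // Keep only well-formed entries so one bad box doesn't sink the batch.
  return parsed.filter(
    (d) =>
      typeof d.label === "string" &&
      Array.isArray(d.box_2d) &&
      d.box_2d.length === 4
  );
}
```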

3. Daytona + Image Processing

  • Automatically crops each bounding box into individual object images
  • Each object becomes a first-class entity with its own cropped image file
  • Creates a visual catalog that's both human-readable and machine-queryable
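The cropping step boils down to converting each normalized bounding box into a pixel rectangle for the image library to cut out. A sketch of that conversion, assuming the 0-1000 normalization mentioned above:

```typescript
// Converts a normalized box ([yMin, xMin, yMax, xMax] on a 0-1000 grid --
// an assumption; check your model's actual output scale) into the pixel
// rectangle handed to the cropping step.
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function toPixelRect(
  box: [number, number, number, number],
  imageWidth: number,
  imageHeight: number
): CropRect {
  const [yMin, xMin, yMax, xMax] = box;
  const x = Math.round((xMin / 1000) * imageWidth);
  const y = Math.round((yMin / 1000) * imageHeight);
  return {
    x,
    y,
    width: Math.round((xMax / 1000) * imageWidth) - x,
    height: Math.round((yMax / 1000) * imageHeight) - y,
  };
}
```

The resulting rectangle can be passed to any image library's crop/extract call to produce the per-object image file.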

4. Mastra (Agent Framework & Memory)

  • Persistent object memory - Every detected object stored in PostgreSQL
  • Tracks metadata: object type, description, coordinates, timestamp, parent image
  • Agent can recall "What objects have I seen?" across conversations
  • Built custom tools for image analysis, web research, and memory queries
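In our build, Mastra's storage layer handles the actual persistence; the sketch below just shows the equivalent parameterized INSERT for one detected object, so the metadata listed above is visible in one place. The table and column names are hypothetical, and the `$1..$5` placeholders follow node-postgres conventions.

```typescript
// Hypothetical record for one detected object (mirrors the metadata we
// track: type, description, coordinates, parent image, timestamp).
interface DetectedObject {
  label: string;
  description: string;
  box: [number, number, number, number];
  parentImage: string;
}

// Builds a parameterized INSERT (node-postgres style) rather than
// string-concatenating values, so object descriptions can't inject SQL.
function buildInsert(
  obj: DetectedObject,
  seenAt: Date
): { text: string; values: unknown[] } {
  return {
    text:
      "INSERT INTO objects (label, description, box, parent_image, seen_at) " +
      "VALUES ($1, $2, $3, $4, $5)",
    values: [
      obj.label,
      obj.description,
      JSON.stringify(obj.box),
      obj.parentImage,
      seenAt.toISOString(),
    ],
  };
}
```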

5. Tavily + Browserbase (Intelligent Research)

  • When you ask about an object, agent uses Tavily to search for relevant info
  • Browserbase automates web interactions to gather enriched context
  • Agent remembers this research and associates it with the object in memory
  • Next time you ask, it recalls both the visual data and learned context
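The research step can be sketched as building a search query from what Gemini already knows about the object. The request shape below targets Tavily's REST search endpoint; the field names reflect its documented JSON body, but verify against the current Tavily docs before relying on them.

```typescript
// Request body for Tavily's search endpoint (https://api.tavily.com/search).
// Field names assumed from Tavily's REST API; confirm against current docs.
interface TavilyRequest {
  api_key: string;
  query: string;
  max_results: number;
}

function buildResearchQuery(
  label: string,
  description: string,
  apiKey: string
): TavilyRequest {
  return {
    api_key: apiKey,
    // Combine the label with what Gemini saw so results stay on-topic.
    query: `${label}: ${description}`,
    max_results: 5,
  };
}

// Usage (network call elided):
// const res = await fetch("https://api.tavily.com/search", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildResearchQuery("lamp", "brass desk lamp", key)),
// });
```

The results come back as snippets the agent can attach to the object's memory record, so the next query about that object recalls both the visual data and the research.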

6. CopilotKit by ag-ui (Agent UI)

  • Clean chat interface connected to Mastra agents via @ag-ui/mastra adapter
  • Real-time agent state synchronization
  • Users can query objects conversationally: "What's in my office?" or "Tell me about that chair"

Challenges we ran into

Multi-Provider Integration - Orchestrating six different services with varying APIs required careful abstraction design. We used Mastra's tool system to create consistent interfaces across all integrations.

Agent Memory Architecture - Designing persistent memory that maintains context across sessions while staying performant required careful PostgreSQL schema design and efficient state management.

CopilotKit-Mastra Connection - Getting the @ag-ui/mastra adapter working with shared state and proper type safety took iteration, but the result is seamless agent-UI communication.

Environment Consistency - Managing multiple API keys and services across team members was simplified by Daytona's standardized dev environments.

Accomplishments that we're proud of

🎯 Object-Level Intelligence - We don't just store images; we extract and catalog every individual object as a searchable entity with cropped images and metadata

⚡ Blazing Fast Vision - Gemini's segmentation speed is genuinely impressive - full scene analysis in seconds

🧠 True Memory - Mastra's PostgreSQL integration means the agent actually remembers objects across sessions, not just within a conversation

🔍 Self-Enriching Knowledge - Objects gain context over time as the agent researches them with Tavily/Browserbase and stores learnings

🏗️ End-to-End Pipeline - From image upload → Gemini segmentation → Daytona crop processing → Mastra storage → conversational retrieval, everything works together

✨ Six Sponsors, One Cohesive System - Daytona, Gemini, Mastra, Tavily, Browserbase, and CopilotKit all contribute meaningfully to the demo
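The end-to-end pipeline above can be sketched as one function with its steps injected, so the flow is visible without any of the real services. All names here are hypothetical stand-ins for the Gemini, Daytona, and Mastra integrations.

```typescript
// Hypothetical pipeline steps, injected so the flow can be read (and tested)
// without the real Gemini / Daytona / Mastra services behind them.
interface PipelineDeps {
  segment: (image: Uint8Array) => Promise<{ label: string }[]>; // Gemini
  crop: (image: Uint8Array, index: number) => Promise<Uint8Array>; // Daytona
  store: (label: string, crop: Uint8Array) => Promise<string>; // Mastra/Postgres
}

// upload -> segment -> crop each object -> store each record;
// returns the stored record ids for conversational retrieval later.
async function processUpload(
  image: Uint8Array,
  deps: PipelineDeps
): Promise<string[]> {
  const detections = await deps.segment(image);
  const ids: string[] = [];
  for (let i = 0; i < detections.length; i++) {
    const cropped = await deps.crop(image, i);
    ids.push(await deps.store(detections[i].label, cropped));
  }
  return ids;
}
```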

What we learned

Gemini Vision is seriously fast - The speed of object detection and segmentation exceeded expectations. Processing complex scenes in 2-3 seconds enables real-time applications.

Treating objects as first-class entities changes everything - By cropping and storing each object individually, we created a queryable visual database rather than just analyzed images.

Mastra's memory system is production-ready - PostgreSQL storage with proper schema design enables genuine cross-session memory that feels magical to users.

Daytona isn't just for containers - Using it for image manipulation and processing pipelines showed how standardized environments help beyond just code.

Research + Memory = Intelligence - The combination of Tavily/Browserbase for research and Mastra for storage creates agents that genuinely learn and improve.

CopilotKit makes agents feel native - The @ag-ui/mastra adapter eliminated weeks of UI plumbing work.

What's next for HomePlanet

Complete Vision Pipeline - Finish integrating Gemini for full image analysis and object detection capabilities

Temporal Tracking - Build the knowledge graph to track object locations over time and detect changes

User Feedback Loop - Implement correction mechanisms where user feedback improves future predictions

Mobile Companion - Create a mobile app for easy photo capture throughout the day

Accessibility Features - Add voice input/output and screen reader support for visually impaired users

Pattern Learning - Use historical data to predict where objects should be based on user routines


HomePlanet demonstrates how modern AI agent frameworks (Mastra + CopilotKit), vision AI (Gemini), web automation (Browserbase + Tavily), and standardized dev environments (Daytona) combine to build assistive technology with real-world impact. 🌍

Built With

  • Daytona
  • Google Gemini
  • Mastra
  • PostgreSQL
  • Tavily
  • Browserbase
  • CopilotKit
