Domus: A Multimodal Household Memory Agent

Inspiration

Households generate a surprising amount of unstructured information every day: sticky notes, appointment cards, receipts, grocery lists, screenshots of bills, and quick reminders shared in chat. Much of this information ends up scattered across different apps or forgotten entirely.

I wanted to explore how an AI system could bridge the gap between messy real-world information and structured digital organization. Instead of manually copying information between tools, what if an AI agent could interpret everyday inputs and convert them into structured memory?

Domus was created to explore that idea: a shared household memory assistant that can interpret messages and images and convert them into organized household information.


What It Does

Domus is a multimodal household memory assistant that transforms everyday inputs into structured household records.

Users can provide inputs such as:

  • Photos of notes or appointment cards
  • Screenshots of reminders or bills
  • Receipts or grocery lists
  • Natural language messages about household tasks

Domus interprets these inputs using Gemini’s multimodal capabilities and converts them into structured entries such as:

  • Household tasks
  • Shared reminders
  • Upcoming events
  • Grocery items

These records are stored in a shared household memory system, allowing users to ask questions like:

“What do we need to do this week?”
“What reminders do we have?”
“What should we buy at the store?”

Instead of simply generating text responses, Domus connects the AI model to backend tools that store and retrieve structured information.
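The retrieval side can be made concrete with a minimal sketch showing how two of the example questions could be answered from stored entries. The `Entry` type and its fields are hypothetical. Reading "this week" as today plus the next six days is also an assumption. In Domus, the model would call a retrieval tool and phrase the answer itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical entry shape, for illustration only.
@dataclass
class Entry:
    kind: str                   # "task", "reminder", "event", or "grocery"
    title: str
    due: Optional[date] = None  # grocery items usually have no due date


def due_this_week(entries: list[Entry], today: date) -> list[Entry]:
    """'What do we need to do this week?': dated entries from today through
    the next six days (an assumed reading of 'this week'), soonest first."""
    end = today + timedelta(days=6)
    upcoming = [e for e in entries if e.due is not None and today <= e.due <= end]
    return sorted(upcoming, key=lambda e: e.due)


def shopping_list(entries: list[Entry]) -> list[str]:
    """'What should we buy at the store?': titles of all grocery entries."""
    return [e.title for e in entries if e.kind == "grocery"]
```

Filtering on structured fields like `kind` and `due`, rather than searching free text, is what makes these answers predictable once the data has been stored as records.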


How We Built It

Domus was built for the hackathon using Gemini models, Google’s Agent Development Kit (ADK), and Google Cloud services.

The system follows a simple agent-tool architecture:

User → Frontend → FastAPI → Gemini Agent (ADK) → Firestore

Core Components

Frontend

  • Lightweight web interface
  • Allows users to submit messages and images
  • Sends requests to the backend API

Agent Layer

  • Built using Google’s Agent Development Kit (ADK)
  • Uses Gemini 2.5 Flash for reasoning and multimodal interpretation
  • Determines which tool to call based on user intent
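In ADK, ordinary Python functions can be registered as tools. The framework builds the description the model sees from each function's name, docstring, and parameters. Gemini then chooses a tool by matching the user's intent against those descriptions. The sketch below uses hypothetical tool names (`add_memory`, `list_memories`) and a simplified `describe_tool` helper written with `inspect`. It shows roughly what information is derived from a function and is not ADK's actual declaration format.

```python
import inspect
from typing import Optional


# Hypothetical tools; their docstrings are what the model reads when choosing.
def add_memory(kind: str, title: str, due: Optional[str] = None) -> dict:
    """Save a household task, reminder, event, or grocery item.

    Args:
        kind: One of "task", "reminder", "event", "grocery".
        title: Short description, e.g. "Pay water bill".
        due: Optional ISO date such as "2025-03-12".
    """
    return {"status": "ok", "saved": {"kind": kind, "title": title, "due": due}}


def list_memories(kind: Optional[str] = None) -> dict:
    """List stored household entries, optionally filtered by kind."""
    return {"status": "ok", "items": []}


def describe_tool(fn) -> dict:
    """Build a simplified declaration (name, summary, parameters) from a function."""
    params = {
        name: {"required": p.default is inspect.Parameter.empty}
        for name, p in inspect.signature(fn).parameters.items()
    }
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }
```

Because the model only sees this kind of summary, a clear first docstring line and obvious required parameters matter as much as the function body.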

Backend API

  • Built using FastAPI
  • Exposes endpoints that function as agent tools
  • Handles memory creation, updates, and retrieval
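The three backend operations (create, update, retrieve) can be sketched without any cloud dependencies by using an in-memory dictionary as a stand-in for Firestore. The class name, method names, and fields such as `household_id` and `done` are assumptions for illustration. A real implementation would map each method onto Firestore document writes and queries behind the FastAPI endpoints.

```python
import uuid
from typing import Optional


class MemoryStore:
    """In-memory stand-in (hypothetical) for a Firestore collection of entries."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def create(self, household_id: str, kind: str, title: str,
               due: Optional[str] = None) -> str:
        """Store a new entry and return its generated document ID."""
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {"household_id": household_id, "kind": kind,
                              "title": title, "due": due, "done": False}
        return doc_id

    def update(self, doc_id: str, **fields) -> bool:
        """Merge fields into an existing entry; return False if it does not exist."""
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(fields)
        return True

    def query(self, household_id: str, kind: Optional[str] = None,
              include_done: bool = False) -> list[dict]:
        """Return one household's entries, optionally filtered by kind and status."""
        return [
            {"id": doc_id, **doc}
            for doc_id, doc in self._docs.items()
            if doc["household_id"] == household_id
            and (kind is None or doc["kind"] == kind)
            and (include_done or not doc["done"])
        ]
```

Scoping every query by `household_id` keeps each household's memory separate, which matters once more than one household shares the same backend.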

Database

  • Firestore stores structured household memory records
  • Entries include tasks, reminders, events, grocery items, and notes

Agent Workflow

  1. A user submits a message or image.
  2. The request is sent to the backend API.
  3. Gemini analyzes the request and determines the user’s intent.
  4. The ADK agent selects the appropriate tool.
  5. Backend tools store or retrieve structured data from Firestore.
  6. Domus responds with useful context based on the stored household memory.
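When an ADK agent runs, the framework executes the tool calls the model requests on its own. The sketch below spells out steps 4 and 5 by hand to show the shape of that loop. It assumes the model's choice arrives as a JSON object with `name` and `args` fields, a hypothetical simplified format. The tools are toy versions that use a module-level list in place of Firestore. The returned dictionary is what would be handed back to Gemini to phrase the reply in step 6.

```python
import json
from typing import Callable

# Toy storage standing in for Firestore (illustration only).
_MEMORY: list[dict] = []


def add_memory(kind: str, title: str) -> dict:
    """Hypothetical tool: save one entry."""
    _MEMORY.append({"kind": kind, "title": title})
    return {"status": "ok", "count": len(_MEMORY)}


def list_memories(kind: str = "") -> dict:
    """Hypothetical tool: list entry titles, optionally filtered by kind."""
    items = [m["title"] for m in _MEMORY if not kind or m["kind"] == kind]
    return {"status": "ok", "items": items}


# Registry mapping the tool names the model may request to functions.
TOOLS: dict[str, Callable[..., dict]] = {f.__name__: f for f in (add_memory, list_memories)}


def run_tool_call(call_json: str) -> dict:
    """Execute one model-selected tool call and return its result."""
    call = json.loads(call_json)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return {"status": "error", "error": f"unknown tool: {call.get('name')}"}
    try:
        return tool(**call.get("args", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return {"status": "error", "error": str(exc)}
```

Returning errors as data, rather than raising, gives the model something it can read and react to, for example by asking the user for a missing title.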

This architecture allows the AI to perform actions through tools, rather than only generating conversational responses.


Challenges

Designing an Agent Architecture

One challenge was moving beyond a typical chatbot design and instead building a system where the model could take structured actions through tools. This required designing a clear API layer that the agent could call reliably.

Structuring Household Memory

Household information can vary widely in format. Designing a schema that works for reminders, notes, tasks, and events while remaining flexible required experimentation.
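One way to balance structure and flexibility is to give every entry a small set of shared fields, plus an open-ended `details` map for kind-specific data such as a grocery quantity or an event location. This is an assumed design, not necessarily Domus's actual schema. The field names and the `KINDS` set below are illustrative.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

# Entry kinds mentioned in this write-up; the exact set is an assumption.
KINDS = {"task", "reminder", "event", "grocery", "note"}


@dataclass
class MemoryEntry:
    kind: str
    title: str
    due: Optional[str] = None   # ISO date, if the item has one
    source: str = "text"        # where it came from: "text" or "image"
    details: dict[str, Any] = field(default_factory=dict)  # kind-specific extras

    def __post_init__(self) -> None:
        # Reject entries that would be unqueryable later.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if not self.title.strip():
            raise ValueError("title must not be empty")
```

For example, `MemoryEntry("grocery", "Milk", details={"quantity": "2 L"})` and `MemoryEntry("event", "Dentist appointment", due="2025-03-20", source="image")` share the same top-level shape, so `asdict(...)` produces documents that can be filtered by `kind` and `due` regardless of what sits in `details`.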

Interpreting Multimodal Inputs

Images often contain multiple pieces of information. For example, a photo of a note might contain both a task and a date. The agent must interpret which information should become structured memory.
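A common pattern, assumed here rather than confirmed as Domus's design, is to ask Gemini to return every item it finds in an image as a JSON array. Each item is then validated before anything is stored. This keeps one bad field, such as a date written as "next Tuesday", from blocking the rest of the photo's contents. The field names and the `parse_extraction` helper are hypothetical.

```python
import json
from datetime import date

# Entry kinds mentioned in this write-up; the exact set is an assumption.
ALLOWED_KINDS = {"task", "reminder", "event", "grocery", "note"}


def parse_extraction(model_output: str) -> tuple[list[dict], list[str]]:
    """Split a JSON array of extracted items into valid entries and problems."""
    try:
        items = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(items, list):
        return [], ["expected a JSON array of items"]

    entries, problems = [], []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        kind, title, due = item.get("kind"), item.get("title"), item.get("due")
        if not isinstance(kind, str) or kind not in ALLOWED_KINDS:
            problems.append(f"item {i}: unknown kind {kind!r}")
            continue
        if not isinstance(title, str) or not title.strip():
            problems.append(f"item {i}: missing title")
            continue
        if due is not None:
            try:
                date.fromisoformat(due)  # require an ISO date like "2025-03-20"
            except (TypeError, ValueError):
                problems.append(f"item {i}: bad due date {due!r}")
                continue
        entries.append({"kind": kind, "title": title.strip(), "due": due})
    return entries, problems
```

The `problems` list can be returned to the model, or shown to the user, so that ambiguous items are confirmed instead of silently dropped.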

Tool Reliability

For the agent to function correctly, tools must be predictable and easy for the model to select. Keeping tool definitions and response formats consistent across all tools was important for reliability.
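One way to enforce consistent responses is to wrap every tool so that success and expected failures come back in the same shape. Errors are then reported as data the model can read, rather than as raw exceptions. This is an illustrative pattern, not necessarily what Domus does. The `tool_response` decorator and the `complete_task` tool are hypothetical.

```python
import functools
from typing import Any, Callable


def tool_response(fn: Callable[..., Any]) -> Callable[..., dict]:
    """Wrap a tool so every call returns the same {"status", "data", "error"} shape."""
    # wraps() copies __name__ and __doc__ and sets __wrapped__, so
    # inspect.signature() still reports fn's original parameters.
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> dict:
        try:
            return {"status": "ok", "data": fn(*args, **kwargs), "error": None}
        except ValueError as exc:  # expected, user-facing failures
            return {"status": "error", "data": None, "error": str(exc)}
    return wrapper


# Hypothetical task data used to demonstrate the wrapper.
_TASKS = {"t1": "Pay water bill"}


@tool_response
def complete_task(task_id: str) -> str:
    """Mark a household task as done."""
    if task_id not in _TASKS:
        raise ValueError(f"no task with id {task_id}")
    # A real tool would also update the stored entry here.
    return f"completed: {_TASKS[task_id]}"
```

Only `ValueError` is converted into an error response. Genuine bugs still raise, so they surface during development instead of being reported to the model as ordinary tool failures.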


What We Learned

Building Domus reinforced several important ideas about AI agent design.

  • Multimodal inputs expand how users interact with software.
  • Agents become significantly more useful when connected to real systems and structured storage.
  • Well-designed tools make it easier for models to perform reliable actions.

The project demonstrated how AI can help manage everyday information by turning unstructured inputs into structured memory that can be queried later.


Future Improvements

There are many directions Domus could evolve:

  • Background scheduling for reminders and notifications
  • Calendar and email integrations
  • Voice input and audio responses
  • Automatic task prioritization
  • Household member roles and permissions

A future version could also support multiple groups, allowing the same system to manage not only households but also teams, shared activities, or community groups.

The long-term vision is a multimodal memory assistant that helps people coordinate real-world tasks and information through AI.
