Domus: A Multimodal Household Memory Agent

[Figure: Domus architecture]

Inspiration

Households generate a surprising amount of unstructured information every day: sticky notes, appointment cards, receipts, grocery lists, screenshots of bills, and quick reminders shared in chat. Much of this information ends up scattered across different apps or forgotten entirely.

I wanted to explore how an AI system could bridge the gap between messy real-world information and structured digital organization. Instead of manually copying information between tools, what if an AI agent could interpret everyday inputs and convert them into structured memory?

Domus was created to explore that idea: a shared household memory assistant that can interpret messages and images and convert them into organized household information.

What It Does

Domus is a multimodal household memory assistant that transforms everyday inputs into structured household records.

Users can provide inputs such as:

Photos of notes or appointment cards
Screenshots of reminders or bills
Receipts or grocery lists
Natural language messages about household tasks

Domus interprets these inputs using Gemini’s multimodal capabilities and converts them into structured entries such as:

Household tasks
Shared reminders
Upcoming events
Grocery items

These records are stored in a shared household memory system, allowing users to ask questions like:

“What do we need to do this week?”
“What reminders do we have?”
“What should we buy at the store?”

Instead of simply generating text responses, Domus connects the AI model to backend tools that store and retrieve structured information.
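As a rough illustration of the retrieval side, the Python sketch below answers two of the example questions from structured records. The `Memory` class, its fields, and the reading of "this week" as "open tasks and reminders due in the next seven days" are assumptions for illustration, not Domus's actual schema or query logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Memory:
    """Hypothetical, simplified household memory record."""
    kind: str                  # "task", "reminder", "event", or "grocery"
    title: str
    due: Optional[date] = None
    done: bool = False


def due_this_week(memories: list[Memory], today: date) -> list[Memory]:
    """Open tasks and reminders due from today through six days from now."""
    end = today + timedelta(days=6)
    hits = [
        m for m in memories
        if m.kind in ("task", "reminder")
        and not m.done
        and m.due is not None
        and today <= m.due <= end
    ]
    return sorted(hits, key=lambda m: m.due)


def shopping_list(memories: list[Memory]) -> list[str]:
    """Answer "What should we buy at the store?" from open grocery items."""
    return [m.title for m in memories if m.kind == "grocery" and not m.done]


if __name__ == "__main__":
    today = date(2026, 3, 16)
    memories = [
        Memory("task", "Pay electricity bill", date(2026, 3, 18)),
        Memory("reminder", "Dentist appointment", date(2026, 3, 20)),
        Memory("task", "Renew car registration", date(2026, 4, 2)),
        Memory("grocery", "Milk"),
        Memory("grocery", "Eggs", done=True),
    ]
    print([m.title for m in due_this_week(memories, today)])
    # ['Pay electricity bill', 'Dentist appointment']
    print(shopping_list(memories))
    # ['Milk']
```

Because the answer comes from filtering stored records rather than from free-form generation, the same question asked twice returns the same items.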

How We Built It

Domus was built for the Gemini Live Agent Challenge using Gemini models, Google’s Agent Development Kit (ADK), and Google Cloud services.

The system follows a simple agent-tool architecture:

User → Frontend → FastAPI → Gemini Agent (ADK) → Firestore

Core Components

Frontend

Lightweight web interface
Allows users to submit messages and images
Sends requests to the backend API

Agent Layer

Built using Google’s Agent Development Kit (ADK)
Uses Gemini 2.5 Flash for reasoning and multimodal interpretation
Determines which tool to call based on user intent
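Tool selection depends on the model seeing a clear description of each tool. ADK lets ordinary Python functions serve as tools and describes them to the model from their signatures and docstrings; the sketch below is a simplified, hypothetical illustration of that idea, not ADK's implementation. `add_task`, `list_tasks`, and `declare` are invented names, and the type mapping covers only basic types.

```python
import inspect
from typing import Any, Callable

# Simplified mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def add_task(title: str, due_date: str = "") -> dict:
    """Add a household task. due_date is an optional ISO date (YYYY-MM-DD)."""
    return {"title": title, "due_date": due_date}


def list_tasks(include_done: bool = False) -> dict:
    """List household tasks, optionally including completed ones."""
    return {"tasks": [], "include_done": include_done}


def declare(fn: Callable[..., Any]) -> dict:
    """Describe a tool by its name, docstring, and typed parameters."""
    properties: dict[str, dict[str, str]] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unrecognized types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


if __name__ == "__main__":
    for tool in (add_task, list_tasks):
        print(declare(tool))
```

The docstring doubles as the tool's description, so precise wording there (for example, stating the expected date format) directly affects how well the model chooses and fills in a tool.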

Backend API

Built using FastAPI
Exposes endpoints that function as agent tools
Handles memory creation, updates, and retrieval
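A minimal sketch of create, update, and retrieve tools of this kind, with a plain dictionary standing in for the Firestore collection. The function names and fields are hypothetical; in the real backend these operations would be exposed through FastAPI and persisted in Firestore.

```python
import uuid
from typing import Any, Optional

# Plain dict standing in for a Firestore collection: document id -> fields.
_STORE: dict[str, dict[str, Any]] = {}


def create_memory(kind: str, title: str, due_date: Optional[str] = None) -> dict:
    """Create a record and return it together with its generated id."""
    doc_id = uuid.uuid4().hex
    _STORE[doc_id] = {"kind": kind, "title": title, "due_date": due_date, "done": False}
    return {"id": doc_id, **_STORE[doc_id]}


def update_memory(memory_id: str, title: Optional[str] = None,
                  due_date: Optional[str] = None,
                  done: Optional[bool] = None) -> dict:
    """Apply a partial update; arguments left as None are not changed."""
    doc = _STORE.get(memory_id)
    if doc is None:
        raise KeyError(f"no memory with id {memory_id}")
    changes = {"title": title, "due_date": due_date, "done": done}
    doc.update({k: v for k, v in changes.items() if v is not None})
    return {"id": memory_id, **doc}


def list_memories(kind: Optional[str] = None, include_done: bool = False) -> list[dict]:
    """Retrieve records, optionally filtered by kind; completed ones are hidden by default."""
    return [
        {"id": doc_id, **doc}
        for doc_id, doc in _STORE.items()
        if (kind is None or doc["kind"] == kind) and (include_done or not doc["done"])
    ]
```

Treating `None` as "leave unchanged" keeps the update tool easy for the model to call, at the cost that a due date cannot be cleared through it.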

Database

Firestore stores structured household memory records
Entries include tasks, reminders, events, and notes
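One possible shape for a single schema covering all of these entry types is a shared record with a `kind` field, validated before it is written. The field names below, and the choice to store dates as ISO strings, are assumptions for illustration rather than Domus's actual schema.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional

# Entry types mentioned in the write-up; the exact set is an assumption.
KINDS = {"task", "reminder", "event", "grocery", "note"}


@dataclass
class MemoryRecord:
    kind: str
    title: str
    due_date: Optional[date] = None
    details: str = ""
    done: bool = False

    def __post_init__(self) -> None:
        # Reject malformed entries before they reach the database.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if not self.title.strip():
            raise ValueError("title must not be empty")

    def to_document(self) -> dict:
        """Flatten to plain fields, storing the date as an ISO string."""
        doc = asdict(self)
        doc["due_date"] = self.due_date.isoformat() if self.due_date else None
        return doc

    @classmethod
    def from_document(cls, doc: dict) -> "MemoryRecord":
        """Rebuild a record from stored fields."""
        data = dict(doc)
        if data.get("due_date"):
            data["due_date"] = date.fromisoformat(data["due_date"])
        return cls(**data)
```

Keeping one record type with a `kind` discriminator, rather than a separate collection per type, lets a single query answer mixed questions such as "what do we need to do this week?"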

Agent Workflow

A user submits a message or image.
The request is sent to the backend API.
Gemini analyzes the request and determines the user’s intent.
The ADK agent selects the appropriate tool.
Backend tools store or retrieve structured data from Firestore.
Domus replies using the household memory it has just stored or retrieved.

This architecture allows the AI to perform actions through tools, rather than only generating conversational responses.
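The workflow above can be sketched end to end in a few lines. Here `fake_model` is a hypothetical keyword-based stand-in for Gemini's intent detection, and the two reminder tools write to an in-memory list rather than Firestore; only the control flow (interpret, select a tool, execute it, reply) mirrors the steps described.

```python
from typing import Any, Callable

# In-memory list standing in for Firestore.
MEMORY: list[dict[str, Any]] = []


def add_reminder(text: str) -> dict:
    """Hypothetical tool: store a shared reminder."""
    MEMORY.append({"kind": "reminder", "text": text})
    return {"saved": text}


def list_reminders() -> dict:
    """Hypothetical tool: return all stored reminders."""
    return {"reminders": [m["text"] for m in MEMORY if m["kind"] == "reminder"]}


TOOLS: dict[str, Callable[..., dict]] = {
    "add_reminder": add_reminder,
    "list_reminders": list_reminders,
}


def fake_model(message: str) -> dict:
    """Stand-in for Gemini: map a message to a tool call (name + args)."""
    prefix = "remind us to"
    lowered = message.lower()
    if lowered.startswith(prefix):
        return {"name": "add_reminder", "args": {"text": message[len(prefix):].strip()}}
    if "reminders" in lowered:
        return {"name": "list_reminders", "args": {}}
    return {"name": None, "args": {}}


def handle(message: str) -> str:
    # Interpret the request and pick a tool.
    call = fake_model(message)
    tool = TOOLS.get(call["name"] or "")
    if tool is None:
        return "Sorry, I could not tell what to do with that."
    # Run the tool against stored memory.
    result = tool(**call["args"])
    # Turn the tool result into a reply.
    if "saved" in result:
        return f"Saved reminder: {result['saved']}"
    if result["reminders"]:
        return "Reminders: " + ", ".join(result["reminders"])
    return "No reminders yet."
```

In the real agent the model, not a keyword check, chooses the tool and would presumably also phrase the final reply; the point of the sketch is that every answer passes through a tool call on stored data.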

Challenges

Designing an Agent Architecture

One challenge was moving beyond a typical chatbot design and instead building a system where the model could take structured actions through tools. This required designing a clear API layer that the agent could call reliably.

Structuring Household Memory

Household information can vary widely in format. Designing a schema that works for reminders, notes, tasks, and events while remaining flexible required experimentation.

Interpreting Multimodal Inputs

Images often contain multiple pieces of information. For example, a photo of a note might contain both a task and a date. The agent must interpret which information should become structured memory.
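One way to handle this, assuming the agent asks Gemini to return its findings as JSON with an `items` list (the prompt format and field names are hypothetical), is to validate each extracted item separately so that one unusable field does not discard everything else found in the photo.

```python
import json
from datetime import date
from typing import Any

ALLOWED_KINDS = {"task", "reminder", "event", "grocery"}


def parse_extraction(raw: str) -> tuple[list[dict[str, Any]], list[str]]:
    """Split a model's JSON extraction into valid records and rejection notes."""
    records: list[dict[str, Any]] = []
    rejected: list[str] = []
    for i, item in enumerate(json.loads(raw).get("items", [])):
        kind = item.get("kind")
        title = (item.get("title") or "").strip()
        if kind not in ALLOWED_KINDS or not title:
            rejected.append(f"item {i}: missing or unknown kind/title")
            continue
        due = item.get("due_date")
        if due:
            try:
                # Normalize to YYYY-MM-DD; vague dates like "next Thurs" fail here.
                due = date.fromisoformat(due).isoformat()
            except ValueError:
                rejected.append(f"item {i}: bad date {due!r}")
                continue
        records.append({"kind": kind, "title": title, "due_date": due or None})
    return records, rejected
```

Rejected items could be sent back to the model or shown to the user for confirmation instead of being silently dropped.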

Tool Reliability

For the agent to function correctly, tools must be predictable and easy for the model to select. Structuring tools and responses in a consistent way was important to improve reliability.
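A common way to keep tools predictable is to give every tool the same response envelope and to report expected failures as data rather than exceptions, so the model always receives something it can reason about. The decorator and the `complete_task` tool below are a hypothetical sketch of that pattern, not Domus's code.

```python
import functools
from typing import Any, Callable


def tool_response(fn: Callable[..., Any]) -> Callable[..., dict]:
    """Wrap a tool so every call returns {"ok", "data", "error"}."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> dict:
        try:
            return {"ok": True, "data": fn(*args, **kwargs), "error": None}
        except (KeyError, ValueError) as exc:
            # Report expected failures to the model instead of raising.
            return {"ok": False, "data": None, "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


# Hypothetical task store for the example.
TASKS = {"t1": {"title": "Water plants", "done": False}}


@tool_response
def complete_task(task_id: str) -> dict:
    """Mark a household task as done."""
    task = TASKS[task_id]  # raises KeyError for an unknown id
    task["done"] = True
    return {"id": task_id, **task}
```

`functools.wraps` keeps the original name and docstring, and `inspect.signature` follows it back to the original parameters, which matters if tool declarations are generated from them.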

What We Learned

Building Domus reinforced several important ideas about AI agent design.

Multimodal inputs expand how users interact with software.
Agents become significantly more useful when connected to real systems and structured storage.
Well-designed tools make it easier for models to perform reliable actions.

The project demonstrated how AI can help manage everyday information by turning unstructured inputs into structured memory that can be queried later.

Future Improvements

There are many directions Domus could evolve:

Background scheduling for reminders and notifications
Calendar and email integrations
Voice input and audio responses
Automatic task prioritization
Household member roles and permissions

A future version could also support multiple groups, allowing the same system to manage not only households but also teams, shared activities, or community groups.

The long-term vision is a multimodal memory assistant that helps people coordinate real-world tasks and information through AI.

Built With

adk
fastapi
firestore
gemini
next.js
python
typescript

Submitted to

Gemini Live Agent Challenge

Created by

I designed and built the full prototype for Domus, including the frontend interface, backend API, agent logic, and multimodal processing pipeline. The system uses a Next.js frontend and a FastAPI backend integrated with Google’s Agent Development Kit and Gemini 2.5 Flash to interpret user messages and images, convert them into structured household memory, and store them in Firestore. I also implemented the memory schema, retrieval/update tools, and the demo workflow used in the project submission.

Clarissa Fonseca Chen
Victor Chen

Updates

Clarissa Fonseca Chen started this project — Mar 16, 2026 06:59 PM EDT
