GuardianMind Architecture Diagram
Detailed Architecture Diagram
Cloud Run is used to deploy the services
Home page of the GuardianMind UI
Dashboard of the GuardianMind UI
Saving a photo to the memory bank with its details
Live Assistance, which uses audio, video, and GPS to assist the user
GitHub README file
Code snippet showing the use of the Gemini Live API
Automated deployment pipeline: Terraform creates the base resources, and a terraform-job creates the Cloud Run services after the image is built

GuardianMind

Inspiration

The idea for GuardianMind came from a personal experience. I watched my uncle begin to struggle with early Alzheimer’s, where even simple tasks like finding the right room or remembering where he was going became difficult. Those moments showed how a little guidance at the right time could make a huge difference in maintaining a patient’s independence and safety.

This experience made us think: what if modern AI could act as a real-time companion that helps people navigate their surroundings when memory becomes unreliable? With recent advances in multimodal models like Gemini, it is now possible to build an assistant that can see, listen, and respond to the world in real time.

GuardianMind was created with a simple goal: build a low-cost AI assistant that helps Alzheimer’s patients stay safe, confident, and connected to their families.

What it does

GuardianMind is a real-time AI companion designed to support people with early Alzheimer’s or memory-related challenges.

The system uses camera input and voice interaction to understand a patient’s surroundings and provide contextual assistance.

Key capabilities include:

Real-time environment understanding using camera input
Voice interaction so patients can ask questions naturally
Guidance and orientation assistance when users feel confused
Personal memory bank where familiar places, objects, and people can be stored
Emergency alerts that notify family members if the patient needs help

For example, a patient can ask:

“Where am I?”
“How do I get to the kitchen?”
“Call my daughter.”

GuardianMind analyzes the environment and responds with helpful guidance, or connects the patient with caregivers when assistance is needed.

How we built it

GuardianMind is built as a multimodal Live Agent powered by Google Gemini and deployed on Google Cloud.

System Architecture

The system consists of several components working together.

Frontend (Web Application)

A web-based interface allows caregivers and patients to:

Upload photos of familiar places or people
Configure emergency contacts
Start a live Guardian session with camera and microphone access

This interface is intentionally simple so it can be used easily by elderly users.

Backend (FastAPI)

A Python FastAPI backend manages:

Authentication and user management
Image upload and processing
AI orchestration with Gemini APIs
Emergency alert workflows

AI Layer (Gemini Live API)

Gemini provides the intelligence behind the system. It processes:

Live camera frames
Voice input
Context from stored memories

This allows GuardianMind to interpret scenes and provide natural conversational responses.
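
One way to picture how "context from stored memories" could reach the model is to fold the memory descriptions into the instruction text sent at the start of a session. The sketch below is only an illustration under that assumption: the `build_system_instruction` helper, the prompt wording, and the `label`/`description` keys are hypothetical and are not taken from the project's code.

```python
def build_system_instruction(patient_name, memories, max_memories=10):
    """Fold stored memory descriptions into one instruction string.

    `memories` is a list of dicts with hypothetical keys
    "label" and "description".
    """
    lines = [
        f"You are a calm, patient companion assisting {patient_name}.",
        "Speak in short, simple sentences.",
        "Known places, objects, and people:",
    ]
    # Cap the number of memories so the instruction stays compact.
    for memory in memories[:max_memories]:
        lines.append(f"- {memory['label']}: {memory['description']}")
    if not memories:
        lines.append("- (no memories stored yet)")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"label": "Kitchen", "description": "White fridge, blue kettle."}]
    print(build_system_instruction("Sam", sample))
```

Capping the list keeps the instruction short, which matters for a low-latency live session; a real system might instead select the memories most relevant to the current scene.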

Memory Bank

Caregivers can upload images of familiar environments such as rooms, objects, or family members.

These images are:

Stored in Google Cloud Storage
Analyzed using Gemini vision capabilities
Indexed in Firestore with descriptive metadata

This creates a personalized visual memory bank that helps the AI recognize familiar places for each patient.
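
As a rough illustration of the indexing step, the sketch below assembles the kind of metadata record that could be stored alongside each image. The field names, the `build_memory_record` helper, and the bucket and object names are hypothetical; the actual Firestore schema is not shown in this write-up, and no cloud calls are made here.

```python
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    """Hypothetical metadata kept for one uploaded image."""
    patient_id: str
    label: str          # caregiver-provided name, e.g. "Kitchen"
    description: str    # text produced by the vision model
    storage_uri: str    # location of the image in Cloud Storage
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def build_memory_record(patient_id, label, description, bucket, object_name):
    """Return a plain dict that a document database could store as-is."""
    record = MemoryRecord(
        patient_id=patient_id,
        label=label.strip(),
        description=description.strip(),
        storage_uri=f"gs://{bucket}/{object_name}",
    )
    return asdict(record)


if __name__ == "__main__":
    print(build_memory_record(
        "patient-1", "Kitchen", "A room with a white fridge.",
        "example-bucket", "patient-1/kitchen.jpg",
    ))
```

Keeping the image in object storage and only a URI plus text description in the document store keeps each record small and makes the descriptions easy to query.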

Safety System

If a user expresses distress (for example, by saying they are lost or need help), the system can:

Detect the emergency intent
Retrieve caregiver contact information
Send alerts via email with contextual details
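
The three steps above can be sketched with the Python standard library alone. This is a simplified stand-in, not the project's implementation: the keyword list replaces whatever intent detection the model performs, the `is_distress` and `build_alert_email` helpers are hypothetical, and the message is only composed, not sent.

```python
from email.message import EmailMessage

# Hypothetical phrases; a real system would more likely rely on the
# model's own intent detection than on keyword matching.
DISTRESS_PHRASES = ("i am lost", "i'm lost", "help me", "i need help")


def is_distress(utterance):
    """Return True if the utterance contains a known distress phrase."""
    text = utterance.lower()
    return any(phrase in text for phrase in DISTRESS_PHRASES)


def build_alert_email(sender, caregiver_email, patient_name, utterance,
                      location=None):
    """Compose (but do not send) an alert email with contextual details."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = caregiver_email
    message["Subject"] = f"GuardianMind alert: {patient_name} may need help"
    where = location or "unknown"
    message.set_content(
        f'{patient_name} said: "{utterance}"\n'
        f"Last known location: {where}\n"
    )
    return message


if __name__ == "__main__":
    heard = "I'm lost, where am I?"
    if is_distress(heard):
        print(build_alert_email("alerts@example.com", "family@example.com",
                                "Sam", heard))
```

Separating detection from message composition makes each step easy to test, and the delivery mechanism can be swapped without touching the detection logic.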

Cloud Infrastructure

The backend runs on Google Cloud, with infrastructure provisioned using Terraform and deployments automated through GitHub Actions.

Architecture Diagram

+------------------------+      +-------------------------+      +-----------------------+
|   User's Device        |      |   Backend (FastAPI)     |      |   Google Cloud        |
| (React Web App)        |      |   (on Cloud Run)        |      |   Services            |
+------------------------+      +-------------------------+      +-----------------------+
           |                           |                            |
           |--(1) HTTPS (REST API)---->|                            |
           |     (Photo Uploads)       |---(2) Gemini Vision API--->|   Gemini 2.5 Flash    |
           |                           |---(3) Save Metadata------->|   Cloud Firestore     |
           |                           |---(4) Store Image--------->|   Cloud Storage       |
           |                           |                            |
           |                           |                            |
           |<--(5) WebSocket (WSS)---->|                            |
           |      (Live Stream)        |                            |
           |                           |<--(6) Gemini Live API----->|   Gemini Live         |
           |                           |      (Audio/Video/Tools)   |                       |
           |                           |                            |
           |                           |---(7) Function Calls------>|   Firestore (Read)    |
           |                           |                            |   Storage (Read)      |
           |                           |                            |   Gmail API (Send)    |
           |                           |                            |   Secret Manager      |
           |                           |                            |
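
Step (7) in the diagram, where the live session triggers function calls against backend services, can be pictured as a small dispatcher that maps a requested tool name to a Python handler. The sketch below is an assumption about how such routing might look: the tool names, the handlers, and the returned data are hypothetical placeholders, and the real Firestore and Gmail calls are replaced with stand-ins.

```python
def get_emergency_contact(patient_id):
    """Stand-in for a Firestore read; returns fixed illustrative data."""
    return {"patient_id": patient_id, "email": "caregiver@example.com"}


def send_alert(patient_id, message):
    """Stand-in for an email send; reports what would be sent."""
    return {"status": "queued", "patient_id": patient_id, "message": message}


# Registry mapping tool names (as the model would request them) to handlers.
TOOL_HANDLERS = {
    "get_emergency_contact": get_emergency_contact,
    "send_alert": send_alert,
}


def dispatch_tool_call(name, args):
    """Run the handler for a requested tool and wrap the outcome in a dict."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": handler(**args)}
    except TypeError as exc:  # wrong or missing arguments in the request
        return {"error": str(exc)}


if __name__ == "__main__":
    print(dispatch_tool_call("get_emergency_contact", {"patient_id": "p1"}))
    print(dispatch_tool_call("send_alert", {"patient_id": "p1"}))
```

Returning an error dict instead of raising lets the backend report a failed call back to the model, which can then retry or respond to the user gracefully.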

Challenges we ran into

Building a real-time AI assistant presented several challenges.

Real-time multimodal processing
Handling live audio, video, and conversational context simultaneously required careful integration with the Gemini Live API.

Reliable scene understanding
Ensuring the AI could interpret environments meaningfully while still keeping the prototype lightweight required careful prompt design and contextual memory usage.

Designing for elderly users
Creating an interface that remains simple and intuitive for people with cognitive challenges required us to rethink typical UI patterns.

Privacy considerations
Since the system uses camera input, we needed to design the architecture in a way that minimizes unnecessary data storage and respects user privacy.

Accomplishments that we're proud of

We are proud that we were able to:

Build a working multimodal AI companion
Integrate real-time vision and voice interaction
Create a personalized memory bank for patients
Deploy a scalable backend using Google Cloud
Demonstrate how AI can assist vulnerable populations

Most importantly, we built a prototype that shows how AI can move beyond chatbots and become a real-world assistant that supports people in everyday life.

What we learned

Through this project, we learned how powerful multimodal AI agents can be when applied to real-world problems.

Combining vision, voice, and contextual memory allows AI systems to interact with users in a much more natural way.

We also learned the importance of designing technology for accessibility, especially for elderly users. Interfaces must remain simple, clear, and supportive rather than overwhelming.

Finally, we gained valuable experience integrating Gemini Live capabilities with cloud infrastructure, enabling real-time AI experiences that can scale.

What's next for GuardianMind

GuardianMind is currently a prototype, but we see many opportunities to expand it further.

Future improvements could include:

Indoor navigation assistance
Integration with wearable devices or smart glasses
Advanced distress detection using behavioral signals
Support for multiple languages
A caregiver dashboard for monitoring and alerts
Improved environment recognition using richer visual context

Our long-term vision is to turn GuardianMind into a trusted AI companion that helps Alzheimer’s patients maintain independence while giving families peace of mind.