- GuardianMind architecture diagram
- Detailed architecture diagram
- Cloud Run is used to deploy the services
- Home page of the GuardianMind UI
- Dashboard of the GuardianMind UI
- Saving a photo to the memory bank with its details
- Live assistance, which uses audio, video, and GPS to assist the user
- GitHub README file
- Code snippet showing the use of the Gemini Live API
- Automated deployment pipeline: Terraform creates the base resources, and terraform-job creates the Cloud Run services after the image is built
GuardianMind
Inspiration
The idea for GuardianMind came from a personal experience. I watched my uncle begin to struggle with early Alzheimer’s, where even simple tasks like finding the right room or remembering where he was going became difficult. Those moments showed how small guidance at the right time could make a huge difference in maintaining a patient’s independence and safety.
This experience made us think: what if modern AI could act as a real-time companion that helps people navigate their surroundings when memory becomes unreliable? With recent advances in multimodal models like Gemini, it is now possible to build an assistant that can see, listen, and respond to the world in real time.
GuardianMind was created with a simple goal: build a low-cost AI assistant that helps Alzheimer’s patients stay safe, confident, and connected to their families.
What it does
GuardianMind is a real-time AI companion designed to support people with early Alzheimer’s or memory-related challenges.
The system uses camera input and voice interaction to understand a patient’s surroundings and provide contextual assistance.
Key capabilities include:
- Real-time environment understanding using camera input
- Voice interaction so patients can ask questions naturally
- Guidance and orientation assistance when users feel confused
- Personal memory bank where familiar places, objects, and people can be stored
- Emergency alerts that notify family members if the patient needs help
For example, a patient can ask:
“Where am I?”
“How do I get to the kitchen?”
“Call my daughter.”
GuardianMind analyzes the environment and responds with helpful guidance, or connects the patient with caregivers when assistance is needed.
How we built it
GuardianMind is built as a multimodal Live Agent powered by Google Gemini and deployed on Google Cloud.
System Architecture
The system consists of several components working together.
Frontend (Web Application)
A web-based interface allows caregivers and patients to:
- Upload photos of familiar places or people
- Configure emergency contacts
- Start a live Guardian session with camera and microphone access
This interface is intentionally simple so it can be used easily by elderly users.
Backend (FastAPI)
A Python FastAPI backend manages:
- Authentication and user management
- Image upload and processing
- AI orchestration with Gemini APIs
- Emergency alert workflows
AI Layer (Gemini Live API)
Gemini provides the intelligence behind the system. It processes:
- Live camera frames
- Voice input
- Context from stored memories
This allows GuardianMind to interpret scenes and provide natural conversational responses.
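As a rough illustration of how context from stored memories might reach the model, the sketch below folds memory descriptions into a system instruction. The `Memory` class, the `build_system_instruction` function, and the prompt wording are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory-bank entry: a label plus a short description."""
    label: str
    description: str


def build_system_instruction(patient_name: str, memories: list[Memory]) -> str:
    """Combine a fixed persona with the patient's stored memories."""
    lines = [
        f"You are GuardianMind, a calm companion assisting {patient_name}.",
        "Speak in short, reassuring sentences.",
    ]
    # Only add the memory section when there is something to list.
    if memories:
        lines.append("Familiar places, objects, and people:")
        lines.extend(f"- {m.label}: {m.description}" for m in memories)
    return "\n".join(lines)
```

The resulting string could be supplied as the session's system instruction when a live session starts, so the model can refer to familiar rooms or people by name.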
Memory Bank
Caregivers can upload images of familiar environments such as rooms, objects, or family members.
These images are:
- Stored in Google Cloud Storage
- Analyzed using Gemini vision capabilities
- Indexed in Firestore with descriptive metadata
This creates a personalized visual memory bank that helps the AI recognize familiar places for each patient.
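The metadata step can be pictured with a small sketch. The field names, the `patients/<id>/memories/<file>` path layout, and the `make_record` helper are hypothetical; the example only builds the dictionary that would be written to the metadata store and does not call any Google Cloud API.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    """Hypothetical metadata document for one uploaded image."""
    patient_id: str
    label: str
    description: str   # e.g. text produced by a vision model
    storage_path: str  # object path inside a storage bucket
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def make_record(patient_id: str, label: str, description: str, filename: str) -> dict:
    """Build the dictionary that would be written to the metadata store."""
    path = f"patients/{patient_id}/memories/{filename}"
    return asdict(MemoryRecord(patient_id, label, description.strip(), path))
```

Keeping the image path and the generated description in one record means a later lookup needs only the metadata store to find both.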
Safety System
If a user expresses distress (for example, by saying they are lost or need help), the system can:
- Detect the emergency intent
- Retrieve caregiver contact information
- Send alerts via email with contextual details
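The three steps above can be sketched as follows. This is a simplified stand-in: the phrase list, `is_emergency`, and `build_alert` are hypothetical, and a keyword match is used here only for illustration where the real system presumably relies on the model to detect intent. The example composes the message with Python's standard `email` package and does not send it.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical trigger phrases, used only for this illustration.
DISTRESS_PHRASES = ("i am lost", "i'm lost", "help me", "i need help", "emergency")


def is_emergency(utterance: str) -> bool:
    """Return True when the utterance contains a known distress phrase."""
    text = utterance.lower()
    return any(phrase in text for phrase in DISTRESS_PHRASES)


def build_alert(
    patient_name: str,
    caregiver_email: str,
    utterance: str,
    location: Optional[str] = None,
) -> EmailMessage:
    """Compose (but do not send) an alert email with contextual details."""
    msg = EmailMessage()
    msg["To"] = caregiver_email
    msg["Subject"] = f"GuardianMind alert: {patient_name} may need help"
    body = [f'{patient_name} said: "{utterance}"']
    if location:
        body.append(f"Last known location: {location}")
    msg.set_content("\n".join(body))
    return msg
```

Separating detection from message composition keeps each piece easy to test and lets the delivery channel change without touching the detection logic.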
Cloud Infrastructure
The backend runs on Google Cloud, with infrastructure provisioned using Terraform and deployments automated through GitHub Actions.
Architecture Diagram
+------------------------+  +-------------------------+   +-----------------------+
|     User's Device      |  |    Backend (FastAPI)    |   |     Google Cloud      |
|    (React Web App)     |  |     (on Cloud Run)      |   |       Services        |
+------------------------+  +-------------------------+   +-----------------------+
            |                            |                            |
            |--(1) HTTPS (REST API)----->|                            |
            |   (Photo Uploads)          |---(2) Gemini Vision API--->| Gemini 2.5 Flash |
            |                            |---(3) Save Metadata------->| Cloud Firestore  |
            |                            |---(4) Store Image--------->| Cloud Storage    |
            |                            |                            |
            |                            |                            |
            |<--(5) WebSocket (WSS)----->|                            |
            |   (Live Stream)            |                            |
            |                            |<--(6) Gemini Live API----->| Gemini Live      |
            |                            |   (Audio/Video/Tools)      |                  |
            |                            |                            |
            |                            |---(7) Function Calls------>| Firestore (Read) |
            |                            |                            | Storage (Read)   |
            |                            |                            | Gmail API (Send) |
            |                            |                            | Secret Manager   |
            |                            |                            |
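Step (5) in the diagram streams audio and video between the browser and the backend over a WebSocket. One possible way to frame those chunks is a small JSON envelope with a base64 payload, sketched below. The envelope fields (`type`, `mime_type`, `data`) and both helper functions are assumptions for illustration, not the project's actual wire format.

```python
import base64
import json


def encode_chunk(kind: str, payload: bytes, mime_type: str) -> str:
    """Wrap raw audio or video bytes in a JSON text message."""
    if kind not in ("audio", "video"):
        raise ValueError(f"unsupported kind: {kind}")
    return json.dumps({
        "type": kind,
        "mime_type": mime_type,
        # base64 keeps binary data safe inside a JSON string
        "data": base64.b64encode(payload).decode("ascii"),
    })


def decode_chunk(message: str) -> tuple[str, str, bytes]:
    """Recover (kind, mime_type, raw bytes) from a JSON text message."""
    obj = json.loads(message)
    return obj["type"], obj["mime_type"], base64.b64decode(obj["data"])
```

A JSON envelope is easy to inspect while debugging; sending binary WebSocket frames instead would avoid the base64 size overhead.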
Challenges we ran into
Building a real-time AI assistant presented several challenges.
Real-time multimodal processing
Handling live audio, video, and conversational context simultaneously required careful integration with the Gemini Live API.
Reliable scene understanding
Ensuring the AI could interpret environments meaningfully while still keeping the prototype lightweight required careful prompt design and contextual memory usage.
Designing for elderly users
Creating an interface that remains simple and intuitive for people with cognitive challenges required us to rethink typical UI patterns.
Privacy considerations
Since the system uses camera input, we needed to design the architecture in a way that minimizes unnecessary data storage and respects user privacy.
Accomplishments that we're proud of
We are proud that we were able to:
- Build a working multimodal AI companion
- Integrate real-time vision and voice interaction
- Create a personalized memory bank for patients
- Deploy a scalable backend using Google Cloud
- Demonstrate how AI can assist vulnerable populations
Most importantly, we built a prototype that shows how AI can move beyond chatbots and become a real-world assistant that supports people in everyday life.
What we learned
Through this project, we learned how powerful multimodal AI agents can be when applied to real-world problems.
Combining vision, voice, and contextual memory allows AI systems to interact with users in a much more natural way.
We also learned the importance of designing technology for accessibility, especially for elderly users. Interfaces must remain simple, clear, and supportive rather than overwhelming.
Finally, we gained valuable experience integrating Gemini Live capabilities with cloud infrastructure, enabling real-time AI experiences that can scale.
What's next for GuardianMind
GuardianMind is currently a prototype, but we see many opportunities to expand it further.
Future improvements could include:
- Indoor navigation assistance
- Integration with wearable devices or smart glasses
- Advanced distress detection using behavioral signals
- Support for multiple languages
- A caregiver dashboard for monitoring and alerts
- Improved environment recognition using richer visual context
Our long-term vision is to turn GuardianMind into a trusted AI companion that helps Alzheimer’s patients maintain independence while giving families peace of mind.
🌟 Our Guiding Principle
Technology should not only make life more efficient — it should also make life safer, more supportive, and more humane.
Built With
- cloud-run
- fastapi
- firestore
- gemini-ai
- gmail-api
- google-cloud
- googlecloud-secret-manager
- googlecloudstorage
- gps
- python
- terraform
- web-speech
- websockets