Inspiration
Most AI assistants today are limited to a text box. You type a question, get a response, and the interaction ends there. We were inspired to explore a more natural way of interacting with computers: one where an AI assistant can see what's on your screen, hear your voice, and help you perform actions in real time.
Sentri was created to demonstrate how AI agents can move beyond static chatbots and become interactive desktop companions that assist users directly within their workflow.
What it does
Sentri is a multimodal desktop AI agent that helps users interact with their computer using natural language, voice, and visual understanding.
Key capabilities include:
Voice Interaction: Users can talk to Sentri and receive spoken responses.
Screen Understanding: Sentri can capture a screenshot and analyze what is on the screen using Gemini multimodal capabilities.
UI Interaction: Sentri can perform actions such as clicking on the screen or controlling windows.
File Search: Users can quickly locate files on their system through natural language commands.
Context-Aware AI: Sentri uses a structured memory system to maintain context and improve responses.
Together, these features create an assistant that can see, hear, speak, and act.
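As an illustration of the file search capability, here is a minimal sketch of what such a tool could look like. It is not Sentri's actual implementation: the function name `search_files`, its parameters, and the wildcard-based matching are assumptions made for this example, and it presumes the natural language command has already been reduced to a file name pattern.

```python
import fnmatch
import os
from pathlib import Path


def search_files(root: str, pattern: str, limit: int = 20) -> list[str]:
    """Return up to `limit` paths under `root` whose file name matches `pattern`.

    Hypothetical helper: matching is case-insensitive and uses
    shell-style wildcards such as "*.pdf".
    """
    matches: list[str] = []
    wanted = pattern.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if fnmatch.fnmatch(name.lower(), wanted):
                matches.append(str(Path(dirpath) / name))
                if len(matches) >= limit:
                    return matches  # stop early so large trees stay responsive
    return matches
```

A call such as `search_files(str(Path.home()), "*report*.pdf")` would return matching paths that the assistant can then read back or display to the user.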
How we built it
Sentri was built as a modular desktop AI system using Python.
The architecture consists of several key components:
Desktop UI built with PySide6
Brain Controller responsible for routing user requests
Tool System for handling actions like file search, window control, and screenshot analysis
Memory System that stores structured identity and context data
Multimodal AI reasoning powered by Gemini
Speech-to-text using faster-whisper
Text-to-speech using pyttsx3
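To make the memory component more concrete, the sketch below shows one way structured identity and context data could be stored and persisted. The `Memory` class, its field names, the JSON file format, and the cap on stored context are all assumptions for illustration, not a description of Sentri's real memory system.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Memory:
    """Hypothetical structured memory: stable identity facts plus recent context."""

    identity: dict[str, str] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)
    max_context: int = 10

    def remember(self, note: str) -> None:
        """Append a context note, dropping the oldest ones beyond the cap."""
        self.context.append(note)
        overflow = len(self.context) - self.max_context
        if overflow > 0:
            del self.context[:overflow]

    def save(self, path: Path) -> None:
        """Write the memory to disk as JSON."""
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "Memory":
        """Read memory from disk, or start empty if the file does not exist."""
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

Keeping identity separate from rolling context means long-lived facts (such as the user's name) are never evicted when recent notes are trimmed.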
The system follows a layered architecture:
User → Sentri UI → Controller → Brain Layer → Tools + Memory → Gemini Multimodal AI
This modular design allows Sentri to combine reasoning, memory, and action-based tools in a single agent.
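The routing idea behind this layered design can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Controller` class, keyword-prefix matching, and the `fake_model` stand-in for the Gemini call are hypothetical and chosen only to show how a request might reach either a tool or the model.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]


@dataclass
class Controller:
    """Routes a request to a registered tool, or falls back to the model."""

    model: Callable[[str, dict], str]  # stand-in for the real multimodal model call
    tools: dict[str, Tool] = field(default_factory=dict)
    memory: dict[str, str] = field(default_factory=dict)

    def register(self, keyword: str, tool: Tool) -> None:
        """Associate a command prefix (hypothetical routing rule) with a tool."""
        self.tools[keyword.lower()] = tool

    def handle(self, request: str) -> str:
        text = request.strip()
        lowered = text.lower()
        for keyword, tool in self.tools.items():
            if lowered.startswith(keyword):
                # Pass everything after the keyword to the tool as its argument.
                return tool(text[len(keyword):].strip())
        # No tool matched: let the model answer, with memory as extra context.
        return self.model(text, self.memory)


def fake_model(prompt: str, memory: dict) -> str:
    """Placeholder for the model; a real system would call the Gemini API here."""
    return f"[model] {prompt} (user={memory.get('name', 'unknown')})"


controller = Controller(model=fake_model, memory={"name": "Ada"})
controller.register("find file", lambda arg: f"[file_search] {arg}")
print(controller.handle("Find file budget.xlsx"))
print(controller.handle("What is on my screen?"))
```

Prefix matching is the simplest possible router; a production agent would more likely let the model itself choose which tool to call.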
Challenges we ran into
Building a real-time desktop AI agent introduced several challenges:
Operating system restrictions (especially on Wayland) limited certain automation features like mouse control and screen capture.
Integrating multiple modalities (voice, vision, and automation) into a unified agent architecture required careful design.
Managing API limits and latency while maintaining responsive interactions was another challenge.
Designing a stable controller architecture that could route requests between tools and the AI model without crashing required multiple iterations.
These challenges pushed us to refine the architecture and build a more robust tool system.
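Two of the challenges above, API limits and tool failures that could crash the controller, are commonly handled with retries and defensive wrappers. The sketch below shows that general pattern; the helper names `call_with_retry` and `safe_tool_call` and the backoff values are assumptions for illustration, not Sentri's actual code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying with exponential backoff if it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))  # wait 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


def safe_tool_call(tool: Callable[[str], str], argument: str) -> str:
    """Run a tool and turn any exception into a message instead of a crash."""
    try:
        return tool(argument)
    except Exception as exc:
        return f"Tool failed: {exc}"
```

Wrapping a model request as `call_with_retry(lambda: ask_model(prompt))`, where `ask_model` is whatever function performs the API call, keeps transient rate-limit errors from surfacing to the user, while `safe_tool_call` lets the agent report a failed action (for example, blocked screen capture on Wayland) and keep running.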
Accomplishments that we're proud of
We are proud that Sentri evolved from a simple chatbot idea into a fully interactive desktop AI agent.
Key achievements include:
Successfully integrating vision, voice, and automation into one assistant.
Designing a memory-aware architecture that maintains structured user information.
Building a modular tool system that allows the agent to extend its capabilities.
Creating a working prototype that can analyze the screen and perform real actions.
Most importantly, Sentri demonstrates a new interaction paradigm where AI becomes an active participant in the user's workflow.
What we learned
Throughout this project we learned:
How to design agent architectures that combine reasoning and tool usage.
The importance of structured memory and context management for AI assistants.
Practical challenges of building multimodal systems that combine vision, voice, and automation.
How to integrate modern AI models into real applications rather than simple demos.
This project also reinforced how powerful multimodal AI can be when combined with real-world interfaces.
What's next for Sentri
Future development of Sentri could expand its capabilities significantly.
Planned improvements include:
Continuous real-time screen awareness
More advanced voice interaction and interruption handling
Deeper application control and automation
A smarter long-term memory system
Integration with cloud services for collaborative workflows
Ultimately, the goal is to evolve Sentri into a true personal AI agent that can understand, assist, and collaborate with users directly within their digital environment.