Sentri

Inspiration

Most AI assistants today are confined to a text box: you type a question, get a response, and the interaction ends there. We wanted to explore a more natural way of interacting with computers, one where an AI assistant can see what's on your screen, hear your voice, and help you perform actions in real time.

Sentri was created to demonstrate how AI agents can move beyond static chatbots and become interactive desktop companions that assist users directly within their workflow.

What it does

Sentri is a multimodal desktop AI agent that helps users interact with their computer using natural language, voice, and visual understanding.

Key capabilities include:

🎤 Voice Interaction – Users can talk to Sentri and receive spoken responses.

👁 Screen Understanding – Sentri can capture a screenshot and analyze what is on the screen using Gemini's multimodal capabilities.

🖱 UI Interaction – Sentri can perform actions such as clicking on the screen or controlling windows.

📂 File Search – Users can quickly locate files on their system through natural language commands.

🧠 Context-Aware AI – Sentri uses a structured memory system to maintain context and improve responses.

Together, these features create an assistant that can see, hear, speak, and act.
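
As an illustration of the File Search capability, here is a minimal sketch using only the Python standard library. It assumes the natural-language request has already been reduced to a few search terms (for example by the model); the function name `search_files` and its parameters are hypothetical and are not taken from Sentri's code.

```python
from pathlib import Path
from typing import List


def search_files(root: Path, query: str, limit: int = 10) -> List[Path]:
    """Return files under `root` whose names contain every term in `query`.

    Hypothetical helper: matching is a case-insensitive substring test on
    the file name only, which is an assumption made for this sketch.
    """
    terms = query.lower().split()
    matches: List[Path] = []
    for path in root.rglob("*"):
        if path.is_file() and all(term in path.name.lower() for term in terms):
            matches.append(path)
            if len(matches) >= limit:
                break  # stop early so large directory trees stay responsive
    return sorted(matches)  # sorted for stable, predictable output
```

A request such as "find my budget report" could then be served by something like `search_files(Path.home(), "budget report")`.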

How we built it

Sentri was built as a modular desktop AI system using Python.

The architecture consists of several key components:

Desktop UI built with PySide6

Brain Controller responsible for routing user requests

Tool System for handling actions like file search, window control, and screenshot analysis

Memory System that stores structured identity and context data

Multimodal AI reasoning powered by Gemini

Speech-to-text using faster-whisper

Text-to-speech using pyttsx3
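
To make the Memory System component more concrete, the sketch below shows one way structured identity and context data could be stored and turned into prompt text. The `Memory` class, its fields, and the prompt layout are all assumptions made for illustration; the write-up does not describe Sentri's actual memory format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Memory:
    """Hypothetical structured memory: stable identity facts plus recent turns."""

    identity: Dict[str, str] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)
    max_turns: int = 5

    def remember(self, turn: str) -> None:
        """Append a conversation turn and drop the oldest ones beyond max_turns."""
        self.context.append(turn)
        del self.context[:-self.max_turns]

    def to_prompt(self) -> str:
        """Render the memory as a short text prefix for a model prompt."""
        facts = "; ".join(f"{k}={v}" for k, v in sorted(self.identity.items()))
        recent = " | ".join(self.context)
        return f"[identity] {facts}\n[recent] {recent}"

    def to_json(self) -> str:
        """Serialize to JSON so the memory can be saved between sessions."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Memory":
        """Rebuild a Memory from the output of to_json()."""
        return cls(**json.loads(raw))
```

Keeping identity facts separate from the rolling context means the recent-turn window can be trimmed without losing long-lived information about the user.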

The system follows a layered architecture:

User
↓
Sentri UI
↓
Controller
↓
Brain Layer
↓
Tools + Memory
↓
Gemini Multimodal AI

This modular design allows Sentri to combine reasoning, memory, and action-based tools in a single agent.
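
The routing step in this layered design can be sketched roughly as follows. This is a simplified illustration, not Sentri's implementation: the `BrainController` class, the keyword-based matching, and the `fake_model` stand-in for a Gemini call are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A tool (or the model fallback) takes the user's request and returns text.
ToolFn = Callable[[str], str]


@dataclass
class BrainController:
    """Hypothetical router: send a request to a matching tool, else to the model."""

    fallback: ToolFn
    tools: Dict[str, ToolFn] = field(default_factory=dict)
    keywords: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, keyword: str, fn: ToolFn) -> None:
        """Register a tool under `name`, triggered when `keyword` appears."""
        self.tools[name] = fn
        self.keywords[keyword.lower()] = name

    def route(self, request: str) -> str:
        """Run the first tool whose keyword occurs in the request."""
        lowered = request.lower()
        for keyword, name in self.keywords.items():
            if keyword in lowered:
                return self.tools[name](request)
        return self.fallback(request)  # no tool matched: ask the model


def fake_model(prompt: str) -> str:
    # Stand-in for a real multimodal model call (assumption for this sketch).
    return f"model: {prompt}"


controller = BrainController(fallback=fake_model)
controller.register("file_search", "find file", lambda r: "tool: file_search")
controller.register("screenshot", "screen", lambda r: "tool: screenshot")
```

Keyword matching is used here only to keep the example short; a real agent might instead let the model choose which tool to call.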

Challenges we ran into

Building a real-time desktop AI agent introduced several challenges:

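Operating system restrictions limited certain automation features such as mouse control and screen capture. This was most noticeable on Linux under Wayland, which by design prevents an application from injecting input into, or capturing, other applications' windows without going through the compositor.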

Integrating multiple modalities (voice, vision, and automation) into a unified agent architecture required careful design.

Managing API limits and latency while maintaining responsive interactions was another challenge.

Designing a stable controller architecture that could route requests between tools and the AI model without crashing required multiple iterations.

These challenges pushed us to refine the architecture and build a more robust tool system.
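
One common way to handle API limits without crashing the controller is to retry failed calls with exponential backoff and return a safe value when every attempt fails. The sketch below illustrates that general pattern under stated assumptions; `call_with_retry` and its parameters are hypothetical and do not describe how Sentri itself handles failures.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    fn: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fn`, retrying with exponential backoff; return None if all attempts fail.

    `sleep` is injectable so the wait can be skipped or recorded in tests.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # A real implementation would catch only the API's specific
            # rate-limit or timeout errors rather than every exception.
            if attempt == attempts - 1:
                return None  # give up without raising, so the caller keeps running
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s, ...
    return None
```

Returning `None` instead of raising lets the caller fall back to a spoken "please try again" message rather than terminating the assistant.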

Accomplishments that we're proud of

We are proud that Sentri evolved from a simple chatbot idea into a fully interactive desktop AI agent.

Key achievements include:

Successfully integrating vision, voice, and automation into one assistant.

Designing a memory-aware architecture that maintains structured user information.

Building a modular tool system that allows the agent to extend its capabilities.

Creating a working prototype that can analyze the screen and perform real actions.

Most importantly, Sentri demonstrates a new interaction paradigm where AI becomes an active participant in the user’s workflow.

What we learned

Throughout this project we learned:

How to design agent architectures that combine reasoning and tool usage.

The importance of structured memory and context management for AI assistants.

Practical challenges of building multimodal systems that combine vision, voice, and automation.

How to integrate modern AI models into real applications rather than simple demos.

This project also reinforced how powerful multimodal AI can be when combined with real-world interfaces.

What's next for Sentri

Future development of Sentri could expand its capabilities significantly.

Planned improvements include:

Continuous real-time screen awareness

More advanced voice interaction and interruption handling

Deeper application control and automation

A smarter long-term memory system

Integration with cloud services for collaborative workflows

Ultimately, the goal is to evolve Sentri into a true personal AI agent that can understand, assist, and collaborate with users directly within their digital environment.

Built With

api
gemini
pyside6
python

Updates

Raj Raj Ranjan started this project — Mar 15, 2026 02:28 AM EDT
