Inspiration
Modern computers are powerful, but interacting with them is still largely manual — opening files, searching through folders, managing apps, and understanding what is happening on the screen all require constant user effort.
We were inspired by the idea of a true desktop AI assistant: something that doesn't just answer questions, but can actually see your screen, understand it, and help you use your computer in real time.
Projects like Microsoft Copilot and ChatGPT show how powerful AI can be, but most assistants are still limited to chat interfaces.
We wanted to build something more interactive: an intelligent desktop companion that can understand your screen, control apps, find files, and assist you directly while you work.
That idea became Sentri.
What it does
Sentri is an AI-powered desktop assistant that lives directly on your screen as a floating companion.
Key features include:
Screen Understanding: Sentri can analyze screenshots of the area around your cursor and explain what is happening on the screen using vision AI.
File Search Assistant: Quickly locate files on your system using natural language.
Desktop Control: Sentri can perform actions such as clicking and minimizing, maximizing, or closing windows.
AI Chat Interface: Users can ask questions or request help directly from the assistant.
PDF Summarization: Sentri can read and summarize PDF documents.
Floating Assistant UI: A lightweight, draggable desktop mascot that stays accessible without interrupting your workflow.
The assistant is powered by Google Gemini, specifically Gemini 2.5 Flash-Lite, which provides fast responses and multimodal (text and image) capabilities.
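Long PDFs may not fit comfortably into a single model request. As a minimal sketch, assuming the PDF's text has already been extracted into a string (which extraction library Sentri uses is not stated), the text can be split into overlapping character chunks that are each summarized separately. The helper name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters, so text cut at a chunk
    boundary reappears with some context at the start of the next chunk.
    (Hypothetical helper; the default sizes are assumptions.)
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    chunks = []
    step = max_chars - overlap  # how far each new chunk advances
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

One possible approach (not necessarily Sentri's) is to send each chunk to Gemini for a partial summary and then ask for a final summary of the partial summaries.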
How we built it
Sentri was built using a modular architecture designed for extensibility.
Core components
Desktop Interface
Built with PySide6 for the floating UI and control center.
Includes animated mascot states and a chat interface.
AI Brain
Powered by Google Gemini using the Gemini API.
Handles reasoning, natural language understanding, and screen analysis.
Vision System
Screenshots are captured using system tools.
Images are processed with Gemini's vision capabilities to explain screen content.
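Sentri analyzes screenshots of the area around the cursor rather than the whole screen. A minimal sketch of choosing that region, assuming a fixed square crop (the 512-pixel default and the names `Box` and `crop_box_around_cursor` are hypothetical), centers a box on the cursor and shifts it so it never extends past the screen edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop_box_around_cursor(cx: int, cy: int, screen_w: int, screen_h: int,
                           size: int = 512) -> Box:
    """Return a size x size box centered on the cursor, shifted to stay on screen.

    If the screen is smaller than `size` in a dimension, the box spans the
    whole screen in that dimension. (Hypothetical helper; size is assumed.)
    """
    w = min(size, screen_w)
    h = min(size, screen_h)
    # Center on the cursor, then clamp so the box stays inside the screen.
    left = min(max(cx - w // 2, 0), screen_w - w)
    top = min(max(cy - h // 2, 0), screen_h - h)
    return Box(left, top, left + w, top + h)
```

The resulting coordinates could be passed to a screenshot library, for example Pillow's `ImageGrab.grab(bbox=(left, top, right, bottom))`, although support for screen grabbing on Linux varies, especially under Wayland.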
File Intelligence
Semantic search is implemented with SentenceTransformers using the all-MiniLM-L6-v2 embedding model.
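Embedding-based search works by encoding the query and the candidates as vectors and ranking candidates by cosine similarity. The sketch below assumes that only file paths are embedded (Sentri may also index file contents); the names `rank_files` and `minilm_embed` are hypothetical. The ranking function takes any embedding function, so it can be tried without downloading the model.

```python
import math
from functools import lru_cache
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(query: str, paths: list[str],
               embed: Callable[[list[str]], Sequence[Sequence[float]]],
               top_k: int = 5) -> list[tuple[str, float]]:
    """Embed the query and the paths, then return the top_k most similar paths."""
    vectors = embed([query] + paths)
    q, docs = vectors[0], vectors[1:]
    scored = [(p, cosine(q, v)) for p, v in zip(paths, docs)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


@lru_cache(maxsize=1)
def _load_minilm():
    # Imported lazily; the model is downloaded on first use and then reused.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2")


def minilm_embed(texts: list[str]):
    """Embedding function backed by all-MiniLM-L6-v2."""
    return _load_minilm().encode(texts)
```

With the model installed, a call such as `rank_files("tax documents from last year", paths, minilm_embed)` would return the best-matching paths with their similarity scores.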
Automation Tools
Mouse automation and window control are implemented through Python-based desktop interaction tools.
Architecture
UI (Floating Assistant)
↓
Controller
↓
Brain System
↓
Tools (File Search | Screen Explain | Automation | PDF Reader)
↓
Gemini AI
This architecture allows new tools to be easily added to the assistant.
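The exact interface between Sentri's brain and its tools is not described in this write-up. As a minimal sketch, assuming each tool is a function that takes a text argument and returns text, a registry lets the brain dispatch a request to a tool by name. `ToolRegistry`, the tool names, and the placeholder handlers are all hypothetical.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Maps tool names to handler functions so new tools can be plugged in."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str):
        """Decorator that registers a handler under `name`."""
        def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
            if name in self._tools:
                raise ValueError(f"tool {name!r} already registered")
            self._tools[name] = func
            return func
        return decorator

    def run(self, name: str, argument: str) -> str:
        """Dispatch a request to the named tool."""
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        return self._tools[name](argument)

    def names(self) -> list[str]:
        return sorted(self._tools)


registry = ToolRegistry()


@registry.register("file_search")
def file_search(query: str) -> str:
    # Placeholder: a real tool would call the semantic search component.
    return f"searching files for: {query}"


@registry.register("screen_explain")
def screen_explain(question: str) -> str:
    # Placeholder: a real tool would capture a screenshot and ask the model.
    return f"explaining screen: {question}"
```

Under this design, adding a tool such as a PDF reader means writing one more decorated function; the UI and controller do not need to change.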
Challenges we ran into
Building a real desktop assistant introduced several technical challenges:
Operating System Restrictions: Modern Linux environments (especially Wayland) limit screen capture and automation, which required workarounds for taking screenshots and controlling the mouse.
API Limitations: Staying within API quotas during development required optimizing requests and choosing lightweight models.
Real-time UI Integration: Keeping the interface from freezing while the AI was working required threaded communication between the UI and the AI brain.
Multimodal Processing: Combining text interaction, screen understanding, and system control into a single pipeline required careful architectural design.
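The write-up states only that the UI and the AI brain communicate through threads. A minimal sketch of that pattern using the standard library, assuming a blocking `ask` function that sends a prompt to the model: a daemon worker thread takes prompts from one queue and posts replies to another, so the UI thread never waits on the network. In a PySide6 app the same role is often filled by a `QThread` with signals; whether Sentri does this is not stated. `BrainWorker` and `ask` are hypothetical names.

```python
import queue
import threading
from typing import Callable


class BrainWorker:
    """Runs slow AI calls on a background thread so the UI thread never blocks.

    The UI puts prompts on `requests`; the worker puts (prompt, reply) pairs
    on `replies`, which the UI can drain periodically.
    """

    def __init__(self, ask: Callable[[str], str]) -> None:
        self._ask = ask  # hypothetical: any blocking function that queries the model
        self.requests: "queue.Queue[str | None]" = queue.Queue()
        self.replies: "queue.Queue[tuple[str, str]]" = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self.requests.put(None)  # sentinel tells the loop to exit
        self._thread.join()

    def _loop(self) -> None:
        while True:
            prompt = self.requests.get()
            if prompt is None:
                break
            try:
                reply = self._ask(prompt)
            except Exception as exc:  # report errors instead of killing the thread
                reply = f"error: {exc}"
            self.replies.put((prompt, reply))
```

On the UI side, a periodic timer could call `replies.get_nowait()` and catch `queue.Empty`, updating the chat only when a reply has arrived.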
Accomplishments that we're proud of
Some key achievements during development include:
Building a fully functional floating AI desktop assistant
Implementing screen understanding with vision AI
Creating a modular AI tool architecture
Integrating semantic document and file search
Designing a clean interactive UI with animations
Successfully connecting the assistant to Google Gemini AI
Most importantly, Sentri demonstrates that AI assistants can move beyond chat and become real productivity companions on the desktop.
What we learned
Through building Sentri, we learned:
How to design AI tool architectures
How to integrate multimodal AI systems (vision + text)
How to build responsive desktop applications with threaded AI processing
The challenges of building AI systems that interact with operating systems
How to optimize AI usage under real-world API limitations
This project helped us understand what it takes to build practical AI assistants rather than just chatbots.
What's next for Sentri
Sentri is only the beginning. Future improvements include:
Voice interaction
Context-aware screen monitoring
Task automation workflows
Learning user preferences
Smart reminders and scheduling
Deeper OS integration
Cross-platform support (Windows, Linux, macOS)
Our long-term goal is to evolve Sentri into a true AI desktop companion that can understand your work environment and proactively assist you.