Inspiration

Modern computers are powerful, but interacting with them is still largely manual — opening files, searching through folders, managing apps, and understanding what is happening on the screen all require constant user effort.

We were inspired by the idea of a true desktop AI assistant — something that doesn’t just answer questions, but can actually see, understand, and help with your computer in real time.

Projects like Microsoft Copilot and ChatGPT show how powerful AI can be, but most assistants are still limited to chat interfaces.

We wanted to build something more interactive: an intelligent desktop companion that can understand your screen, control apps, find files, and assist you directly while you work.

That idea became Sentri.

What it does

Sentri is an AI-powered desktop assistant that lives directly on your screen as a floating companion.

Key features include:

Screen Understanding: Sentri can analyze screenshots around your cursor and explain what is happening on the screen using vision AI.

File Search Assistant: Quickly locate files on your system using natural language.

Desktop Control: Sentri can perform actions like clicking, minimizing, maximizing, and closing windows.

AI Chat Interface: Users can ask questions or request help directly from the assistant.

PDF Summarization: Sentri can read and summarize documents.

Floating Assistant UI: A lightweight draggable desktop mascot that stays accessible without interrupting your workflow.

The assistant is powered by Google Gemini, specifically Gemini 2.5 Flash-Lite, which provides fast reasoning and multimodal capabilities.

How we built it

Sentri was built using a modular architecture designed for extensibility.

Core components

Desktop Interface

Built with PySide6 for the floating UI and control center.

Includes animated mascot states and chat interface.

AI Brain

Powered by Google Gemini using the Gemini API.

Handles reasoning, natural language understanding, and screen analysis.

Vision System

Screenshots captured using system tools.

Images processed with Gemini Vision to explain screen content.
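To illustrate the "screenshots around your cursor" idea, here is a minimal sketch of how a capture region could be computed before the image is sent to the vision model. It is not Sentri's actual code: the function name, the `Box` type, and the 600-pixel default are assumptions made for the example, and the capture itself is left out because it depends on the system tools available.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def region_around_cursor(cursor_x, cursor_y, screen_w, screen_h, size=600):
    """Return a box of up to size x size pixels centered on the cursor.

    The box is shifted where needed so it never extends past the screen edges.
    """
    width = min(size, screen_w)
    height = min(size, screen_h)
    # Center on the cursor, then clamp so the box stays fully on screen.
    left = min(max(cursor_x - width // 2, 0), screen_w - width)
    top = min(max(cursor_y - height // 2, 0), screen_h - height)
    return Box(left, top, left + width, top + height)
```

Cropping to a region like this keeps the image small, which should also help with request size when working under API quotas.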

File Intelligence

Semantic search implemented with SentenceTransformers using the all-MiniLM-L6-v2 embedding model.
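As a rough sketch of how embedding-based file search can work, the example below ranks files by cosine similarity between a query and a short description of each file. The index format (a path mapped to a description), the function names, and the `top_k` default are assumptions for illustration, not Sentri's actual implementation. The embedding function is passed in, so the ranking logic can be tried without downloading a model; `minilm_embed` shows how the all-MiniLM-L6-v2 model could be plugged in if the sentence-transformers package is installed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_files(query, descriptions, embed, top_k=3):
    """Return up to top_k file paths, most similar to the query first.

    descriptions maps a file path to a short text (hypothetical index format).
    embed takes a list of strings and returns one vector per string.
    """
    paths = list(descriptions)
    vectors = embed([query] + [descriptions[p] for p in paths])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), path) for vec, path in zip(doc_vecs, paths)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored[:top_k]]


def minilm_embed(texts):
    """Embed texts with all-MiniLM-L6-v2 (needs the sentence-transformers package)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    # Loading the model on every call is slow; a real app would load it once.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```

With the model installed, a call would look like `rank_files("last year's tax invoice", index, minilm_embed)`, where `index` is the path-to-description mapping.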

Automation Tools

Mouse automation and window control implemented through Python-based desktop interaction tools.

Architecture

UI (Floating Assistant)
↓
Controller
↓
Brain System
↓
Tools (File Search | Screen Explain | Automation | PDF Reader)
↓
Gemini AI

This architecture allows new tools to be easily added to the assistant.
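One common way to make tools easy to add is a registry that maps tool names to functions, so the brain can dispatch a request by name. The sketch below shows that pattern under assumptions of our own: the `ToolRegistry` class, the tool names, and the placeholder tool bodies are hypothetical and are not taken from Sentri's code.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Maps tool names to callables so a controller can dispatch by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str):
        """Decorator that adds a function to the registry under `name`."""
        def decorator(func):
            self._tools[name] = func
            return func
        return decorator

    def names(self):
        """Return the registered tool names in alphabetical order."""
        return sorted(self._tools)

    def run(self, name: str, **kwargs) -> str:
        """Call the named tool, or return a message if it is not registered."""
        if name not in self._tools:
            return f"Unknown tool: {name}"
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("file_search")
def file_search(query: str) -> str:
    # Placeholder body; a real tool would query the file index.
    return f"searching files for {query!r}"


@registry.register("pdf_reader")
def pdf_reader(path: str) -> str:
    # Placeholder body; a real tool would extract and summarize the text.
    return f"summarizing {path}"
```

With this layout, adding a tool means writing one decorated function; the controller and the brain do not need to change.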

Challenges we ran into

Building a real desktop assistant introduced several technical challenges:

Operating System Restrictions: Modern Linux environments (especially Wayland) limit screen capture and automation features, which required workarounds for screenshot capture and mouse control.

API Limitations: Working within API quotas while developing an AI-powered system required optimizing requests and selecting lightweight models.

Real-time UI Integration: Ensuring the AI assistant could run without freezing the interface required implementing threaded communication between the UI and the AI brain.

Multimodal Processing: Combining text interaction, screen understanding, and system control into a single pipeline required careful architectural design.
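The threaded communication mentioned under Real-time UI Integration can be sketched with a worker thread and two queues: the UI thread submits a prompt and returns immediately, then polls for the reply. This example uses only the Python standard library so it runs on its own; the class and method names are assumptions, and in a PySide6 app the same idea would more typically be expressed with `QThread` and signals.

```python
import queue
import threading


class BackgroundBrain:
    """Runs slow requests on a worker thread; the UI polls for results."""

    def __init__(self, handler):
        self._handler = handler            # slow function, e.g. a model call
        self._requests = queue.Queue()
        self._results = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        while True:
            prompt = self._requests.get()
            if prompt is None:             # sentinel: stop the worker
                break
            try:
                self._results.put((prompt, self._handler(prompt)))
            except Exception as exc:       # report errors instead of killing the thread
                self._results.put((prompt, f"error: {exc}"))

    def ask(self, prompt):
        """Called from the UI thread; queues the prompt and returns immediately."""
        self._requests.put(prompt)

    def poll(self):
        """Called from a UI timer; returns a (prompt, reply) pair or None."""
        try:
            return self._results.get_nowait()
        except queue.Empty:
            return None

    def stop(self):
        """Finish any queued prompts, then shut the worker down."""
        self._requests.put(None)
        self._worker.join()
```

Because `ask` only enqueues work, the interface stays responsive while the model call runs; the UI picks up the reply the next time it calls `poll`.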

Accomplishments that we're proud of

Some key achievements during development include:

Building a fully functional floating AI desktop assistant

Implementing screen understanding with vision AI

Creating a modular AI tool architecture

Integrating semantic document and file search

Designing a clean interactive UI with animations

Successfully connecting the assistant to Google Gemini AI

Most importantly, Sentri demonstrates that AI assistants can move beyond chat and become real productivity companions on the desktop.

What we learned

Through building Sentri we learned:

How to design AI tool architectures

How to integrate multimodal AI systems (vision + text)

How to build responsive desktop applications with threaded AI processing

The challenges of building AI systems that interact with operating systems

How to optimize AI usage under real-world API limitations

This project helped us understand what it takes to build practical AI assistants rather than just chatbots.
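One way to stretch a limited API quota is to cache replies and track a request budget, so repeated prompts cost nothing. The sketch below is an assumption about how that could look, not a description of what Sentri does: `CachedClient`, `call_model`, and `max_requests` are hypothetical names, and `call_model` stands in for whatever function actually sends the prompt to the model.

```python
import hashlib


class CachedClient:
    """Wraps a model call with a response cache and a simple request budget."""

    def __init__(self, call_model, max_requests):
        self._call_model = call_model      # hypothetical function: prompt -> reply
        self._remaining = max_requests
        self._cache = {}

    def ask(self, prompt):
        # Normalize whitespace and case so trivially different prompts share a key.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]        # repeated prompt: no request spent
        if self._remaining <= 0:
            raise RuntimeError("request budget exhausted")
        self._remaining -= 1
        reply = self._call_model(prompt)
        self._cache[key] = reply
        return reply

    @property
    def remaining(self):
        """Number of uncached requests still allowed."""
        return self._remaining
```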

What's next for Sentri

Sentri is only the beginning. Future improvements include:

Voice interaction

Context-aware screen monitoring

Task automation workflows

Learning user preferences

Smart reminders and scheduling

Deeper OS integration

Cross-platform support (Windows, Linux, macOS)

Our long-term goal is to evolve Sentri into a true AI desktop companion that can understand your work environment and proactively assist you.
