Visor — Your AI Desktop Guide
Inspiration
Modern computers are powerful, but navigating them isn’t easy.
Many people, especially beginners, older adults, and non-technical users, struggle with essential tasks like:
- Creating accounts
- Navigating system settings
- Installing software
- Managing files
Traditional chatbots only provide text answers.
Tutorial videos aren’t interactive.
And none of these solutions respond to your actual screen.
We wanted something fundamentally different:
An AI that sees your screen, understands your intent, and literally points to what you need to click—in real time.
That became Visor.
What It Does
Visor is an intelligent desktop assistant that visually guides users through any task on their computer.
It works by analyzing screenshots, understanding user intent, and drawing arrows, circles, and tooltips directly on the screen.
Visor has four core features:
1. Real-time Visual Guidance
Visor:
- Captures a live desktop screenshot
- Sends it (plus the user’s goal) to an LLM via OpenRouter
- Receives structured instructions
- Draws an on-screen overlay (circle/arrow/box) to highlight what to click
The user follows the guidance, then presses Done to move to the next step.
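The loop above can be sketched roughly as follows. This is a minimal illustration, not Visor's actual code: every name here (`GuidanceDeps`, `runGuidance`, the `done` flag, the `maxSteps` cap) is hypothetical, and the capture, model, and overlay calls are injected so the control flow can be read on its own.

```typescript
// Hypothetical types: the write-up only names the three JSON fields below.
type Shape = "circle" | "arrow" | "box";

interface GuidanceStep {
  step_description: string;
  shape: Shape;
  bounding_box: { x: number; y: number; width: number; height: number };
  done?: boolean; // assumed flag the model could set when the goal is reached
}

// Injected dependencies (all assumptions) so the loop itself stays testable.
interface GuidanceDeps {
  captureScreen(): Promise<Uint8Array>;
  askModel(
    screenshot: Uint8Array,
    goal: string,
    previous: GuidanceStep | null
  ): Promise<GuidanceStep>;
  drawOverlay(step: GuidanceStep): void;
  waitForDone(): Promise<void>; // resolves when the user presses Done
}

async function runGuidance(
  goal: string,
  deps: GuidanceDeps,
  maxSteps = 20
): Promise<GuidanceStep[]> {
  const history: GuidanceStep[] = [];
  let previous: GuidanceStep | null = null;

  for (let i = 0; i < maxSteps; i++) {
    const screenshot = await deps.captureScreen();
    const step = await deps.askModel(screenshot, goal, previous);
    if (step.done) break; // nothing left to highlight

    deps.drawOverlay(step); // highlight the target, never click it
    history.push(step);
    await deps.waitForDone();
    previous = step;
  }
  return history;
}
```

The `maxSteps` cap is a design choice in this sketch to keep a confused model from looping forever.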
2. Conversational AI That Understands Tasks
Users can describe tasks naturally, such as:
- “Help me find my CompArch folder.”
- “How do I change my display resolution?”
- “Guide me through creating a Google account.”
Visor interprets the goal and generates a step-by-step workflow dynamically.
3. Automatic Multi-Step Progression
After each user action:
- Visor detects UI changes via screenshot differences
- Determines the next step automatically
- Continues guiding until the task is complete
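One simple way to detect UI changes from screenshot differences is to count how many pixels differ between two frames. The sketch below assumes both frames are same-sized RGBA byte buffers. The function names, the per-pixel `tolerance`, and the 1% `threshold` are illustrative assumptions, not values taken from Visor.

```typescript
// Fraction of pixels whose RGB values differ by more than `tolerance`.
// Both buffers are assumed to be RGBA, 4 bytes per pixel, same dimensions.
function changedFraction(
  prev: Uint8Array,
  next: Uint8Array,
  tolerance = 16
): number {
  if (prev.length !== next.length || prev.length % 4 !== 0) {
    throw new Error("Frames must be same-sized RGBA buffers");
  }
  const pixels = prev.length / 4;
  if (pixels === 0) return 0;

  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    // Sum of absolute channel differences; alpha (i + 3) is ignored.
    const delta =
      Math.abs(prev[i] - next[i]) +
      Math.abs(prev[i + 1] - next[i + 1]) +
      Math.abs(prev[i + 2] - next[i + 2]);
    if (delta > tolerance) changed++;
  }
  return changed / pixels;
}

// Treat the screen as changed once enough pixels differ (threshold is a guess).
function screenChanged(
  prev: Uint8Array,
  next: Uint8Array,
  threshold = 0.01
): boolean {
  return changedFraction(prev, next) >= threshold;
}
```

A small tolerance keeps cursor blinks and compression noise from counting as a step change. A real pipeline would likely downscale the frames first to keep the comparison cheap.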
No manual setup. No pre-scripted workflows.
4. Visual Overlay Engine
A cross-platform floating overlay that:
- Sits above all applications
- Renders transparent, click-through arrows and highlights
- Updates based on screen changes
- Never blocks user interactions
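In Electron, an overlay with these properties is typically built from a frameless, transparent, always-on-top window that ignores mouse events. The helper below is a sketch of plausible window options. The name `overlayWindowOptions` is hypothetical and the exact settings Visor uses are not stated in this write-up. The option keys themselves are standard `BrowserWindow` constructor options.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Builds options for a full-display overlay window (illustrative values).
function overlayWindowOptions(display: Bounds) {
  return {
    ...display,          // cover the whole display
    transparent: true,   // let the desktop show through
    frame: false,        // no title bar or window chrome
    hasShadow: false,    // avoid a shadow around the invisible window
    alwaysOnTop: true,   // sit above other applications
    skipTaskbar: true,   // keep the overlay out of the taskbar
    focusable: false,    // never steal keyboard focus
    resizable: false,
  };
}

// Usage inside Electron's main process (not executed in this sketch):
//   import { BrowserWindow, screen } from "electron";
//   const win = new BrowserWindow(
//     overlayWindowOptions(screen.getPrimaryDisplay().bounds)
//   );
//   win.setAlwaysOnTop(true, "screen-saver");          // raise above most windows
//   win.setIgnoreMouseEvents(true, { forward: true }); // clicks pass through
```

`setIgnoreMouseEvents(true, { forward: true })` is what makes the window click-through, so the highlights never intercept the user's own clicks.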
How We Built It
Frontend & Overlay
- Electron handles desktop packaging, global hotkeys, and multi-window rendering
- React + TypeScript power the chatbox and UI
- HTML Canvas draws precise shapes and highlights over the screen
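As a rough idea of how a bounding box could become a Canvas highlight, the sketch below draws either a circle around the box or an outline of it. `HighlightContext` is a hypothetical minimal interface listing only the Canvas 2D methods used here, so a real `CanvasRenderingContext2D` can be passed in. The colour, line width, and padding are assumptions, and arrows are left out for brevity.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Subset of the Canvas 2D context used by this sketch.
interface HighlightContext {
  lineWidth: number;
  strokeStyle: unknown; // typed loosely so a real 2D context is accepted
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  arc(x: number, y: number, radius: number, start: number, end: number): void;
  stroke(): void;
  strokeRect(x: number, y: number, w: number, h: number): void;
}

function drawHighlight(
  ctx: HighlightContext,
  canvasWidth: number,
  canvasHeight: number,
  shape: "circle" | "box",
  box: Box,
  padding = 6
): void {
  // Erase the previous step's highlight before drawing the new one.
  ctx.clearRect(0, 0, canvasWidth, canvasHeight);
  ctx.lineWidth = 4;
  ctx.strokeStyle = "#ff3b30";

  if (shape === "circle") {
    // Circle centred on the box, large enough to enclose its longer side.
    const cx = box.x + box.width / 2;
    const cy = box.y + box.height / 2;
    const radius = Math.max(box.width, box.height) / 2 + padding;
    ctx.beginPath();
    ctx.arc(cx, cy, radius, 0, 2 * Math.PI);
    ctx.stroke();
  } else {
    // Rectangle outline slightly larger than the target.
    ctx.strokeRect(
      box.x - padding,
      box.y - padding,
      box.width + 2 * padding,
      box.height + 2 * padding
    );
  }
}
```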
Backend Logic
- Node.js + Electron IPC for screenshot capture, window control, and message routing
- A custom high-resolution screenshot service optimized for minimal latency
AI Engine
Using GPT-4o models through OpenRouter, Visor analyzes:
- The screenshot
- The user’s goal
- The previous step
- UI context
The model returns structured JSON including:
- step_description
- shape
- bounding_box
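Because model output is not guaranteed to be well formed, a response with these three fields would usually be validated before anything is drawn. The sketch below is one possible parser. The field names come from the list above, but the `{x, y, width, height}` layout of `bounding_box`, the allowed shape values, and the names `parseStep` and `ModelStep` are assumptions.

```typescript
type OverlayShape = "circle" | "arrow" | "box";

interface ModelStep {
  step_description: string;
  shape: OverlayShape;
  // Assumed layout: pixel coordinates with the origin at the top-left.
  bounding_box: { x: number; y: number; width: number; height: number };
}

const ALLOWED_SHAPES: readonly string[] = ["circle", "arrow", "box"];

function requireNumber(obj: Record<string, unknown>, key: string): number {
  const value = obj[key];
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error(`bounding_box.${key} must be a finite number`);
  }
  return value;
}

const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);

// Parses the raw model reply and clamps the box to the screen bounds.
function parseStep(raw: string, screenWidth: number, screenHeight: number): ModelStep {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Response is not a JSON object");
  }
  const fields = data as Record<string, unknown>;

  const description = fields["step_description"];
  if (typeof description !== "string") {
    throw new Error("step_description must be a string");
  }
  const shape = fields["shape"];
  if (typeof shape !== "string" || ALLOWED_SHAPES.indexOf(shape) === -1) {
    throw new Error("shape must be one of: " + ALLOWED_SHAPES.join(", "));
  }
  const rawBox = fields["bounding_box"];
  if (typeof rawBox !== "object" || rawBox === null) {
    throw new Error("bounding_box must be an object");
  }
  const box = rawBox as Record<string, unknown>;
  const x = requireNumber(box, "x");
  const y = requireNumber(box, "y");
  const width = requireNumber(box, "width");
  const height = requireNumber(box, "height");

  // Clamp each edge so a slightly off prediction never draws off-screen.
  const left = clamp(x, 0, screenWidth);
  const top = clamp(y, 0, screenHeight);
  const right = clamp(x + width, 0, screenWidth);
  const bottom = clamp(y + height, 0, screenHeight);

  return {
    step_description: description,
    shape: shape as OverlayShape,
    bounding_box: { x: left, y: top, width: right - left, height: bottom - top },
  };
}
```

Rejecting malformed replies early means the overlay only ever receives a box it can draw. A caller could retry the model request when `parseStep` throws.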
Challenges We Ran Into
- Getting accurate bounding boxes from vision models
- Designing a safe system that never clicks for the user
- Managing multi-window transparency and click-through behavior in Electron
Accomplishments We’re Proud Of
- Built a fully functional desktop assistant with Electron
- Achieved reliable screenshot → AI → overlay loops
- Created a transparent, click-through guidance system on macOS
What We Learned
- Building transparent, floating overlays on macOS
- Prompting LLMs to generate stable bounding boxes
- Engineering fast, efficient screenshot capture pipelines
What’s Next for Visor
1. Advanced AI Interaction
- On-screen OCR and text understanding
- Memory of past UI states
- Automatic flow detection (e.g., Settings → Display → Resolution)
- Ensemble models for higher bounding-box accuracy
2. Real Automation
- Optional auto-click modes with strong safety constraints
- Keyboard shortcut detection + recommendations
- System-level integrations for power users