Visor — Your AI Desktop Guide

Inspiration

Modern computers are powerful—but navigating them isn’t.
Many people, especially beginners, older adults, and non-technical users, struggle with essential tasks like:

  • Creating accounts
  • Navigating system settings
  • Installing software
  • Managing files

Traditional chatbots provide only text answers, and tutorial videos aren’t interactive.
Neither responds to what is actually on your screen.

We wanted something fundamentally different:

An AI that sees your screen, understands your intent, and literally points to what you need to click—in real time.

That became Visor.


What It Does

Visor is an intelligent desktop assistant that visually guides users through any task on their computer.
It works by analyzing screenshots, understanding user intent, and drawing arrows, circles, and tooltips directly on the screen.

Visor has four core features:


1. Real-time Visual Guidance

Visor:

  • Captures a live desktop screenshot
  • Sends it (plus the user’s goal) to an LLM via OpenRouter
  • Receives structured instructions
  • Draws an on-screen overlay (circle/arrow/box) to highlight what to click

The user follows the guidance, then presses Done to move to the next step.
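
The capture, ask, draw, wait-for-Done cycle above can be pictured as a small state machine. The sketch below is illustrative only: the phase names, the event names, and the `taskComplete` flag are hypothetical and are not taken from Visor's code.

```typescript
// Hypothetical model of one guidance session; all names are assumptions.
type Phase = "capturing" | "thinking" | "showing" | "finished";

interface Session {
  goal: string;
  phase: Phase;
  stepIndex: number;          // number of steps the user has completed
  currentHint: string | null; // text shown next to the overlay, if any
}

type SessionEvent =
  | { kind: "screenshotReady" }
  | { kind: "instructionReceived"; hint: string; taskComplete: boolean }
  | { kind: "donePressed" };

function startSession(goal: string): Session {
  return { goal, phase: "capturing", stepIndex: 0, currentHint: null };
}

// Pure transition function: events that do not fit the current phase are ignored.
function advance(s: Session, e: SessionEvent): Session {
  switch (e.kind) {
    case "screenshotReady":
      return s.phase === "capturing" ? { ...s, phase: "thinking" } : s;
    case "instructionReceived":
      if (s.phase !== "thinking") return s;
      return e.taskComplete
        ? { ...s, phase: "finished", currentHint: null }
        : { ...s, phase: "showing", currentHint: e.hint };
    case "donePressed":
      // The user acted on the hint, so capture a fresh screenshot for the next step.
      return s.phase === "showing"
        ? { ...s, phase: "capturing", stepIndex: s.stepIndex + 1, currentHint: null }
        : s;
  }
}
```

Keeping the transitions pure makes the loop easy to test without a real screen or model; the side effects (capturing, calling the model, drawing) would be triggered by whatever code observes the phase change.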


2. Conversational AI That Understands Tasks

Users can describe tasks naturally, such as:

  • “Help me find my CompArch folder.”
  • “How do I change my display resolution?”
  • “Guide me through creating a Google account.”

Visor interprets the goal and generates a step-by-step workflow dynamically.


3. Automatic Multi-Step Progression

After each user action:

  • Visor detects UI changes via screenshot differences
  • Determines the next step automatically
  • Continues guiding until the task is complete

No manual setup. No pre-scripted workflows.
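
One simple way to detect a UI change from screenshot differences is to count how many pixels differ between two frames. The following is a minimal sketch under assumptions: both frames are raw RGBA buffers of the same size, and the `tolerance` and `minFraction` thresholds are illustrative values, not numbers from Visor.

```typescript
// Fraction of pixels whose RGB values differ by more than `tolerance`
// between two same-sized RGBA frames. Alpha is ignored.
function changedFraction(prev: Uint8Array, next: Uint8Array, tolerance = 16): number {
  if (prev.length !== next.length || prev.length % 4 !== 0) {
    throw new Error("frames must be RGBA buffers of identical size");
  }
  const pixels = prev.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    const delta =
      Math.abs(prev[i] - next[i]) +
      Math.abs(prev[i + 1] - next[i + 1]) +
      Math.abs(prev[i + 2] - next[i + 2]);
    if (delta > tolerance) changed++;
  }
  return changed / pixels;
}

// Treat the UI as changed once at least `minFraction` of the pixels moved.
function uiChanged(prev: Uint8Array, next: Uint8Array, minFraction = 0.01): boolean {
  return changedFraction(prev, next) >= minFraction;
}
```

A per-pixel tolerance keeps small rendering noise (cursor blink, antialiasing) from counting as a change; a real pipeline would likely downscale the frames first to keep this cheap.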


4. Visual Overlay Engine

A cross-platform floating overlay that:

  • Sits above all applications
  • Renders transparent, click-through arrows and highlights
  • Updates based on screen changes
  • Never blocks user interactions
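
In Electron, an overlay with these properties is usually a frameless, transparent, always-on-top window that ignores mouse events. The sketch below only builds the window options; the helper name `overlayWindowOptions` is hypothetical, and the exact set of flags Visor uses is an assumption.

```typescript
interface DisplayBounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Options for a full-display overlay window. Every key is a standard
// Electron BrowserWindow constructor option.
function overlayWindowOptions(display: DisplayBounds) {
  return {
    ...display,          // cover the whole display
    transparent: true,   // only the drawn shapes are visible
    frame: false,        // no title bar or borders
    alwaysOnTop: true,   // sit above other applications
    skipTaskbar: true,   // do not show up as a separate app window
    hasShadow: false,
    focusable: false,    // never steal keyboard focus
    resizable: false,
  };
}

// In the Electron main process this would be used roughly as:
//   const win = new BrowserWindow(overlayWindowOptions(screen.getPrimaryDisplay().bounds));
//   win.setIgnoreMouseEvents(true, { forward: true }); // clicks pass through to the app below
```

`setIgnoreMouseEvents(true)` is what makes the window click-through; transparency alone does not let clicks reach the application underneath.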

How We Built It

Frontend & Overlay

  • Electron handles desktop packaging, global hotkeys, and multi-window rendering
  • React + TypeScript power the chatbox and UI
  • HTML Canvas draws precise shapes and highlights over the screen
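
Before anything is drawn on the canvas, a box measured in screenshot pixels has to be converted to overlay coordinates, because a high-resolution screenshot can have more pixels than the window has CSS pixels. A minimal sketch of that conversion, assuming an `{x, y, width, height}` box and a known display scale factor (the function name and the `padding` default are hypothetical):

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

type Highlight =
  | { kind: "box"; x: number; y: number; width: number; height: number }
  | { kind: "circle"; cx: number; cy: number; radius: number };

// Convert a box in screenshot pixels into canvas geometry in CSS pixels.
// scaleFactor is e.g. 2 on a Retina display; padding adds breathing room.
function toHighlight(box: Box, shape: "box" | "circle", scaleFactor = 1, padding = 6): Highlight {
  const x = box.x / scaleFactor;
  const y = box.y / scaleFactor;
  const w = box.width / scaleFactor;
  const h = box.height / scaleFactor;
  if (shape === "box") {
    return { kind: "box", x: x - padding, y: y - padding, width: w + 2 * padding, height: h + 2 * padding };
  }
  // Circle centered on the box, large enough to enclose its corners.
  return { kind: "circle", cx: x + w / 2, cy: y + h / 2, radius: Math.hypot(w, h) / 2 + padding };
}

// Drawing with a CanvasRenderingContext2D `ctx` would then be roughly:
//   box:    ctx.strokeRect(h.x, h.y, h.width, h.height);
//   circle: ctx.beginPath(); ctx.arc(h.cx, h.cy, h.radius, 0, 2 * Math.PI); ctx.stroke();
```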

Backend Logic

  • Node.js + Electron IPC for screenshot capture, window control, and message routing
  • A custom high-resolution screenshot service optimized for minimal latency

AI Engine

Using GPT-4o models through OpenRouter, Visor analyzes:

  • The screenshot
  • The user’s goal
  • The previous step
  • UI context
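
These inputs can be packed into a single multimodal chat request. The sketch below assumes OpenRouter's OpenAI-compatible chat completions format with the screenshot passed as a base64 data URL; the prompt wording, the helper name `buildChatRequest`, and the `openai/gpt-4o` model slug are assumptions rather than Visor's actual values.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

interface GuidanceInput {
  screenshotPngBase64: string; // PNG bytes, base64-encoded
  goal: string;
  previousStep?: string;
  uiContext?: string;
}

// Build the JSON body for one guidance request.
function buildChatRequest(input: GuidanceInput): ChatRequest {
  const lines = [`Goal: ${input.goal}`];
  if (input.previousStep) lines.push(`Previous step: ${input.previousStep}`);
  if (input.uiContext) lines.push(`UI context: ${input.uiContext}`);
  lines.push("Reply with JSON only, using the keys step_description, shape, bounding_box.");
  return {
    model: "openai/gpt-4o", // assumed model slug
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: lines.join("\n") },
          { type: "image_url", image_url: { url: `data:image/png;base64,${input.screenshotPngBase64}` } },
        ],
      },
    ],
  };
}

// The body would be POSTed to https://openrouter.ai/api/v1/chat/completions
// with an "Authorization: Bearer <API key>" header.
```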

The model returns structured JSON including:

  • step_description
  • shape
  • bounding_box
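
Because model output is not guaranteed to be well formed, it helps to validate the reply before drawing anything. This is a minimal sketch, assuming `bounding_box` is an `{x, y, width, height}` object in pixels and `shape` is one of circle, arrow, or box; the source does not specify the exact schema, so both are assumptions.

```typescript
type Shape = "circle" | "arrow" | "box";

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface GuidanceStep {
  step_description: string;
  shape: Shape;
  bounding_box: BoundingBox;
}

const SHAPES: readonly string[] = ["circle", "arrow", "box"];

function isFiniteNumber(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v);
}

// Returns a validated step, or null if the reply is unusable.
function parseGuidanceStep(raw: string): GuidanceStep | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (typeof d.step_description !== "string") return null;
  if (typeof d.shape !== "string" || !SHAPES.includes(d.shape)) return null;
  const box = d.bounding_box;
  if (typeof box !== "object" || box === null) return null;
  const { x, y, width, height } = box as Record<string, unknown>;
  if (!isFiniteNumber(x) || !isFiniteNumber(y)) return null;
  if (!isFiniteNumber(width) || !isFiniteNumber(height)) return null;
  if (width <= 0 || height <= 0) return null; // an empty box cannot be highlighted
  return {
    step_description: d.step_description,
    shape: d.shape as Shape,
    bounding_box: { x, y, width, height },
  };
}
```

Returning null instead of throwing lets the caller decide whether to retry the request or ask the user to rephrase the goal.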

Challenges We Ran Into

  • Getting accurate bounding boxes from vision models
  • Designing a safe system that never clicks for the user
  • Managing multi-window transparency and click-through behavior in Electron

Accomplishments We’re Proud Of

  • Built a fully functional desktop assistant with Electron
  • Achieved reliable screenshot → AI → overlay loops
  • Created a transparent, click-through guidance system on macOS

What We Learned

  • How to build transparent, floating overlays on macOS
  • How to prompt LLMs for stable bounding boxes
  • How to engineer fast, efficient screenshot capture pipelines

What’s Next for Visor

1. Advanced AI Interaction

  • On-screen OCR and text understanding
  • Memory of past UI states
  • Automatic flow detection (e.g., Settings → Display → Resolution)
  • Ensemble models for higher bounding-box accuracy

2. Real Automation

  • Optional auto-click modes with strong safety constraints
  • Keyboard shortcut detection + recommendations
  • System-level integrations for power users
