Follow Halo - Your AI Guide Inside Any Interface

An intelligent Chrome extension that guides users through complex web interfaces using Chrome's built-in AI (Gemini Nano) and cloud AI (Google Gemini), with privacy-first screen analysis and step-by-step visual guidance.

Inspiration

The idea for Follow Halo was born from a simple frustration: we're drowning in interfaces, but no one's teaching us how to use them.

Every day, we work inside dozens of SaaS tools - Notion, Figma, Stripe, Shopify, Asana, Linear, Vercel. Each has its own logic, its own hidden features, its own "right way" to do things. And when we need to learn? We're stuck with:

  • Documentation that's outdated the moment a UI update ships
  • YouTube tutorials filmed six months ago that show buttons that don't exist anymore
  • Forum threads where someone asks "how do I X?" and gets told "just read the docs"
  • AI chatbots that can explain concepts but can't point at the actual button you need to click

Meanwhile, the AI revolution is exploding. We have autonomous agents, MCP servers, Project Mariner, Comet, ChatGPT agents... all getting smarter, more integrated, more capable. But here's the thing: they still can't help you where you actually work.

Because SaaS interfaces are still the gatekeepers of everything we do. Sure, you can automate things - but most real work still happens inside visual tools: dashboards, forms, editors, checkouts, data tables.

And honestly? Maybe total automation isn't even the point.

The Core Insight

"The future isn't 'no interface.' The future is making people faster inside the interfaces they already use."

Because you, the user - you still need to see, to understand, to learn. Sometimes you don't even know what to look for. You just want to get something done, right now.

I realized this while watching a colleague struggle with a complex admin panel. She'd been staring at the screen for 20 minutes, trying to figure out where to set up a webhook. The information was right there - but she couldn't see it. No amount of GPT-5 prompting could point her to the exact dropdown menu she needed.

What if AI could guide you right there, inside the app you're using?

That's when Follow Halo was born. I wanted to create something that:

  • Respects your privacy: sensitive work shouldn't leave your device
  • Understands your screen: sees what you see, knows what version you're on
  • Guides you step-by-step: shows exactly what to click, where to type
  • Makes you better, not dependent: you learn while you work

Follow Halo isn't about replacing human action with automation. It's about augmenting human capability with intelligent, contextual guidance.

What It Does

Follow Halo is an intelligent Chrome extension that acts as your personal mentor, guiding you through any web interface with AI-powered step-by-step instructions and visual cues.

The Experience

Imagine you're working in Stripe's dashboard, trying to set up a subscription plan for the first time. You:

  1. Open Follow Halo: a side panel appears next to your browser
  2. Get AI suggestions: Halo analyzes your screen and suggests: "Create a new subscription product", "Set up webhook endpoints", "Configure payment methods"
  3. Choose your goal: you select "Create a new subscription product" or type your own goal
  4. Receive a plan: Halo searches the web for the latest Stripe documentation, then generates a step-by-step to-do list
  5. Follow visual guidance: as you hover over each task, an animated mascot character moves to show you exactly where to click
  6. Get precise highlights: when you click a task, the exact button or field lights up with a glowing shadow effect
  7. Complete the workflow: you mark tasks as done one by one, learning as you go
  8. Celebrate success: confetti animation when you finish!

Core Capabilities

Privacy-First Architecture

  • Local-first processing: Gemini Nano runs entirely on-device (no data sent externally)
  • Selective cloud usage: Only when web research is needed (user-controlled)
  • Sanitized DOM analysis: Personal data stripped before processing
  • NDA-safe: Work with sensitive tools without leaking context

Intelligent Task Planning

  • Contextual suggestions: AI understands your screen and proposes relevant actions
  • Web-enhanced planning: Cloud AI searches for up-to-date tutorials and best practices
  • Structured to-do lists: Complex workflows broken into clear, actionable steps
  • Version-aware guidance: Halo knows about UI updates and interface changes

Visual Guidance System

  • Halo Mascot: A floating Rive-animated character that moves across your screen
  • 8×8 grid positioning: Precise element location using visual grid mapping
  • Spring animations: Smooth, natural movements that guide your attention
  • Element highlighting: CSS shadow effects that make the right button impossible to miss

Flexible AI Architecture

  • Nano (Astra): Fast, private, local LLM for vision and DOM analysis
  • Cloud (Stratus): Powerful Gemini API for planning and web research
  • Configurable fallback: Choose which agent handles DOM matching
  • No forced automation: You stay in control at every step

Use Cases

Enterprise Users

  • Learn complex internal tools without exposing proprietary data
  • Replace missing team members by learning their workflows
  • Quickly onboard to new SaaS platforms (Salesforce, SAP, Workday)

Freelancers and Solopreneurs

  • Master billing, CRM, marketing tools without unpaid tutorial time
  • Switch between client tools confidently
  • Complete admin tasks faster

Students and Hobbyists

  • Explore new software without reading documentation
  • Learn industry-standard tools (Figma, Notion, Webflow)
  • Build skills through guided practice

[Image: Local AI Architecture. Follow Halo's local-first AI architecture, powered by Chrome's Gemini Nano.]

How I Built It

Follow Halo is built as a sophisticated Chrome extension with a three-layer architecture designed to balance power, privacy, and user experience.

Architecture Overview

┌─────────────────────────────────────────────────┐
│           Background.js (Lifecycle)             │
│    • Extension toggle and injection             │
│    • Per-tab state tracking                     │
└──────────────┬────────────────┬─────────────────┘
               │                │
    Injects    │                │  Opens
               ▼                ▼
┌──────────────────────┐  ┌────────────────────────┐
│    Content.js        │  │      SidePanel         │
│  (Eyes and Hands)    │◄─┤       (Brain)          │
│                      │  │                        │
│ • Screenshot capture │  │ • Vue.js UI            │
│ • DOM manipulation   │  │ • AI orchestration     │
│ • Halo Mascot widget │  │ • Task planning        │
│ • Element highlight  │  │ • chrome.storage       │
└──────────────────────┘  └────────────────────────┘
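
The arrow between SidePanel and Content.js is message passing. As a minimal sketch, assuming a hypothetical message set (`GET_SANITIZED_DOM`, `HIGHLIGHT`, `MOVE_MASCOT`) and a hypothetical `PageActions` interface, the content-script side could dispatch incoming requests like this. The page operations are injected so the dispatcher can be tested without a browser; the Chrome wiring (`chrome.runtime.onMessage`, `chrome.tabs.sendMessage`) is only shown in comments.

```typescript
// Hypothetical protocol between the side panel (brain) and content.js (eyes and hands).
type HaloMessage =
  | { type: "GET_SANITIZED_DOM" }
  | { type: "HIGHLIGHT"; selector: string }
  | { type: "MOVE_MASCOT"; cell: [number, number] };

type HaloReply = { ok: true; data?: unknown } | { ok: false; error: string };

// Capabilities content.js would implement against the live page (hypothetical).
interface PageActions {
  sanitizedDom(): unknown;
  highlight(selector: string): boolean; // false if no element matched
  moveMascot(x: number, y: number): void;
}

function handleMessage(msg: HaloMessage, page: PageActions): HaloReply {
  switch (msg.type) {
    case "GET_SANITIZED_DOM":
      return { ok: true, data: page.sanitizedDom() };
    case "HIGHLIGHT":
      return page.highlight(msg.selector)
        ? { ok: true }
        : { ok: false, error: `no element for ${msg.selector}` };
    case "MOVE_MASCOT": {
      const [x, y] = msg.cell;
      // The mascot only moves between cells of the 8×8 grid.
      if (x < 0 || x > 7 || y < 0 || y > 7) return { ok: false, error: "cell out of range" };
      page.moveMascot(x, y);
      return { ok: true };
    }
    default:
      return { ok: false, error: "unknown message type" };
  }
}

// In content.js this could be wired up roughly as:
//   chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//     sendResponse(handleMessage(msg, livePageActions));
//   });
// and the side panel would send: chrome.tabs.sendMessage(tabId, { type: "HIGHLIGHT", selector });
```

Keeping the dispatcher free of `chrome.*` calls is what lets the same logic be unit-tested outside the extension.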

Tech Stack

Core Technologies

| Technology | Version | Purpose |
|-|-|-|
| Vue.js | 3.5.22 | Reactive UI framework (Composition API) |
| Vite | 7.1.12 | Build tool and dev server |
| Chrome APIs | Runtime | Extension lifecycle, storage, tabs, AI |
| Rive | 2.32.0 | Mascot animation runtime |

AI Services

| Service | Model | Purpose |
|-|-|-|
| Chrome Nano AI | Gemini Nano | Local on-device inference |
| Google Gemini API | gemini-2.5-flash | Cloud inference and web search |

Key Dependencies

  • @google/genai - Gemini API client for cloud capabilities
  • @rive-app/webgl2 - High-performance animation runtime
  • lucide-vue-next - Beautiful, consistent icon system

Component Architecture

I designed Follow Halo with a strict separation of concerns to maintain code quality and testability:

  1. Components import View-Logic only (never hooks or services directly)
  2. View-Logic receives hooks as parameters (bridges components ↔ hooks)
  3. Hooks contain business logic (no Vue imports, pure functions)
  4. Services are stateless API wrappers (no Vue imports, reusable)

This architecture means I can:

  • Test business logic without mounting Vue components
  • Swap AI providers without touching the UI
  • Run the same logic in different contexts (extension, web app, mobile)
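
A minimal sketch of those four layers follows. `PlannerService`, `createTodoHook` and `createTodoViewLogic` are hypothetical names, not taken from the Follow Halo codebase. The hook is plain TypeScript with no Vue import, and swapping AI providers means passing in a different `PlannerService`.

```typescript
interface Todo { id: number; title: string; done: boolean }

// Service layer: stateless wrapper around an AI provider (hypothetical interface).
interface PlannerService {
  plan(goal: string, screenSummary: string): Promise<string[]>;
}

// Hook layer: business logic only, no Vue imports; the service is passed in.
function createTodoHook(planner: PlannerService) {
  return {
    async buildTodos(goal: string, screenSummary: string): Promise<Todo[]> {
      const steps = await planner.plan(goal.trim(), screenSummary);
      return steps.map((title, i) => ({ id: i + 1, title, done: false }));
    },
    complete(todos: Todo[], id: number): Todo[] {
      return todos.map((t) => (t.id === id ? { ...t, done: true } : t));
    },
    allDone(todos: Todo[]): boolean {
      return todos.length > 0 && todos.every((t) => t.done);
    },
  };
}

type TodoHook = ReturnType<typeof createTodoHook>;

// View-logic layer: receives the hook as a parameter and holds the state a
// component renders (in a Vue app this state would live in refs).
function createTodoViewLogic(hook: TodoHook) {
  let todos: Todo[] = [];
  return {
    async start(goal: string, screenSummary: string): Promise<Todo[]> {
      todos = await hook.buildTodos(goal, screenSummary);
      return todos;
    },
    check(id: number): boolean {
      todos = hook.complete(todos, id);
      return hook.allDone(todos); // true when the whole workflow is finished
    },
  };
}
```

In a test, passing an object whose `plan` resolves to a fixed array is enough to exercise both the hook and the view-logic without mounting a component or calling an AI provider.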

The AI Agent System

Follow Halo uses a sophisticated multi-agent system with 8 specialized AI agents, each with carefully crafted prompts:

System Prompts and Responsibilities

| Function | System Prompt Name | Purpose | Model Used |
|-|-|-|-|
| Screen Summary | screen_summary_agent | Summarize visual screen content and UI context | Nano |
| Generate Suggestions | suggestion_generation_agent | Generate action suggestions based on the screen and user context | Nano |
| Learning Web Search | goal_web_research_agent | Search the web for useful information to help with the user's goal | Cloud |
| Build To-Do | todo_structuring_agent | Convert the goal and context into a structured to-do list | Cloud |
| UI Element Grid Vision | ui_element_locator | Find the correct UI element in the interface on an 8×8 grid | Nano |
| Local DOM Analysis | dom_analysis_local | Analyze the sanitized DOM locally to identify elements | Nano |
| Cloud DOM Analysis | dom_analysis_cloud | Analyze the DOM in the cloud instead of with Nano, if enabled | Cloud |
| Congratulation Message | completion_response_agent | Generate the final confirmation message | Nano |

Note: if Nano is disabled and the user selects Cloud only, every agent listed as Nano above runs on the Cloud API instead.
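
The routing rule can be expressed as a small lookup. This is a sketch, not the project's code: `HaloSettings`, `resolveModel` and `domAgentFor` are hypothetical names. Returning `"unavailable"` for a cloud agent when cloud features are off (instead of silently falling back) is an assumption based on the "No automatic cloud fallback" policy and the configurable DOM agent described elsewhere in this write-up.

```typescript
type Model = "nano" | "cloud";

type AgentName =
  | "screen_summary_agent"
  | "suggestion_generation_agent"
  | "goal_web_research_agent"
  | "todo_structuring_agent"
  | "ui_element_locator"
  | "dom_analysis_local"
  | "dom_analysis_cloud"
  | "completion_response_agent";

// Default model per agent, copied from the table above.
const DEFAULT_MODEL: Record<AgentName, Model> = {
  screen_summary_agent: "nano",
  suggestion_generation_agent: "nano",
  goal_web_research_agent: "cloud",
  todo_structuring_agent: "cloud",
  ui_element_locator: "nano",
  dom_analysis_local: "nano",
  dom_analysis_cloud: "cloud",
  completion_response_agent: "nano",
};

// Hypothetical settings shape.
interface HaloSettings {
  nanoEnabled: boolean;  // false means "Cloud only"
  cloudEnabled: boolean; // opt-in, off by default
  domAgent: Model;       // user's choice for DOM matching
}

function resolveModel(agent: AgentName, s: HaloSettings): Model | "unavailable" {
  // Cloud-only mode: every agent normally run on Nano is served by the Cloud API.
  const model: Model = s.nanoEnabled ? DEFAULT_MODEL[agent] : "cloud";
  // No silent fallback: a cloud agent stays off until the user opts in.
  if (model === "cloud" && !s.cloudEnabled) return "unavailable";
  return model;
}

// Which DOM analysis agent to call, following the user's setting.
function domAgentFor(s: HaloSettings): AgentName {
  return s.domAgent === "cloud" && s.cloudEnabled ? "dom_analysis_cloud" : "dom_analysis_local";
}
```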

Each agent is a specialist. Care was taken to ensure that:

  • Nano agents stay fast and privacy-focused
  • Cloud agents leverage web search effectively
  • Outputs are structured and parseable (JSON schemas when possible)
  • Personality remains consistent (helpful, clear, never condescending)

Agent Communication Flow

[Image: Halo LLM sequence diagram]

The Halo Mascot: Bringing Guidance to Life

One of the most distinctive features of Follow Halo is the animated mascot that physically shows you where to click. Here's how it works:

Animation System

  • Built with Rive (WebGL2-accelerated animation)
  • Multiple states: idle, active, processing, success
  • State machines handle smooth transitions
  • Soft bobbing animation when idle

8×8 Grid Positioning

The screen is divided into 64 cells, like a chessboard:

  1. VisionTool captures a screenshot and overlays the grid
  2. User hovers over a to-do task
  3. Nano AI analyzes the screenshot and returns grid coordinates: (x, y) where 0 ≤ x, y < 8
  4. Mascot smoothly animates to cell center using spring physics
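
Steps 3 and 4 come down to two small conversions: parsing the model's reply into a cell, then turning that cell into a pixel target for the mascot. A minimal sketch follows; the reply formats accepted by the hypothetical `parseCell` are assumptions, since the write-up does not specify what the locator prompt returns.

```typescript
const GRID = 8;

interface Viewport { width: number; height: number }

// Parse a reply such as "(3, 5)" or '{"x": 3, "y": 5}' into a cell, or null if invalid.
function parseCell(reply: string): [number, number] | null {
  const m = reply.match(/(\d+)\D+(\d+)/);
  if (!m) return null;
  const x = Number(m[1]);
  const y = Number(m[2]);
  return x < GRID && y < GRID ? [x, y] : null;
}

// Center of cell (x, y) in CSS pixels; x is the column, y the row, both in [0, 8).
function cellCenter(x: number, y: number, vp: Viewport): { left: number; top: number } {
  if (!Number.isInteger(x) || !Number.isInteger(y) || x < 0 || y < 0 || x >= GRID || y >= GRID) {
    throw new RangeError(`cell (${x}, ${y}) is outside the ${GRID}×${GRID} grid`);
  }
  const cellW = vp.width / GRID;
  const cellH = vp.height / GRID;
  return { left: (x + 0.5) * cellW, top: (y + 0.5) * cellH };
}
```

Validating the reply before moving the mascot means a malformed model answer leaves the mascot where it is instead of sending it off-screen.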

Spring Animation Parameters

This creates a natural, friendly motion that draws your eye to the right place without being jarring or distracting.
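
The exact spring parameters are not reproduced in this write-up. As an illustration only, here is a damped spring integrated with semi-implicit Euler at 60 fps, applied to one axis at a time. `SpringConfig`, `animateTo` and the `stiffness`/`damping`/`mass` values used in the check are hypothetical placeholders, chosen close to critical damping so the motion settles without visible overshoot.

```typescript
// Hypothetical parameters: not the project's real values.
interface SpringConfig { stiffness: number; damping: number; mass: number }

interface SpringState { pos: number; vel: number }

// One semi-implicit Euler step of a damped spring pulling `s.pos` toward `target`.
function stepSpring(s: SpringState, target: number, cfg: SpringConfig, dt: number): SpringState {
  const force = -cfg.stiffness * (s.pos - target) - cfg.damping * s.vel;
  const vel = s.vel + (force / cfg.mass) * dt; // update velocity first...
  return { pos: s.pos + vel * dt, vel };        // ...then position with the new velocity
}

// Simulate one axis until it is (nearly) at rest; returns the position for each frame.
function animateTo(start: number, target: number, cfg: SpringConfig, dt = 1 / 60, maxFrames = 600): number[] {
  let s: SpringState = { pos: start, vel: 0 };
  const frames: number[] = [];
  for (let i = 0; i < maxFrames; i++) {
    s = stepSpring(s, target, cfg, dt);
    frames.push(s.pos);
    if (Math.abs(s.pos - target) < 0.5 && Math.abs(s.vel) < 0.5) break;
  }
  return frames;
}
```

Run it once for x and once for y toward the target cell center. Damping below `2 * sqrt(stiffness * mass)` gives a visible bounce; above it the mascot approaches more slowly without overshooting.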

Privacy and Security Implementation

Sanitization Pipeline

Before any data goes to the cloud (and only when the user explicitly enables cloud features), it passes through a sanitization pipeline:

Raw DOM => Remove personal data => Structural skeleton => Cloud

What gets removed:

  • All text fields containing user input => [REDACTED]
  • Email addresses => [email]
  • Phone numbers => [phone]
  • URLs with IDs => Generic patterns (e.g., /user/:id)
  • Auth tokens, API keys => Stripped entirely

What remains:

  • Element types (<button>, <input>, <div>)
  • Class names and IDs (for matching)
  • Structural relationships (parent/child/sibling)
  • Aria labels (for accessibility context)
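
A minimal sketch of that pipeline, working on a hypothetical plain-object snapshot of the DOM rather than live elements. The `RawNode`/`SkeletonNode` shapes, the token/email/phone regexes and the ID heuristics for URL segments are deliberately crude assumptions that illustrate the rules listed above; they are not the extension's actual filters.

```typescript
// Hypothetical, simplified node shape; the extension itself walks the live DOM.
interface RawNode {
  tag: string;
  id?: string;
  classes?: string[];
  ariaLabel?: string;
  text?: string;  // visible text
  value?: string; // user-entered value (inputs, textareas)
  href?: string;
  children?: RawNode[];
}

interface SkeletonNode {
  tag: string;
  id?: string;
  classes: string[];
  ariaLabel?: string;
  text?: string;
  href?: string;
  children: SkeletonNode[];
}

// Illustrative patterns only; real PII detection needs far more than three regexes.
const TOKEN = /\b(?:sk|pk)_(?:live|test)_\w+|\bBearer\s+[\w.-]+/g;
const EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function scrubText(s: string): string {
  return s
    .replace(TOKEN, "") // auth tokens and API keys: stripped entirely
    .replace(EMAIL, "[email]")
    .replace(PHONE, "[phone]")
    .trim();
}

// Drop the query string and replace ID-like path segments with ":id".
function genericizeUrl(url: string): string {
  const isId = (seg: string) =>
    /^\d+$/.test(seg) || /^[0-9a-f-]{16,}$/i.test(seg) || /^[a-z]+_[A-Za-z0-9]{6,}$/.test(seg);
  return url.split("?")[0].split("/").map((seg) => (isId(seg) ? ":id" : seg)).join("/");
}

// Keep tags, ids, classes, ARIA labels and structure; redact everything personal.
function sanitize(node: RawNode): SkeletonNode {
  const out: SkeletonNode = {
    tag: node.tag,
    classes: node.classes ?? [],
    children: (node.children ?? []).map(sanitize),
  };
  if (node.id) out.id = node.id;
  if (node.ariaLabel) out.ariaLabel = scrubText(node.ariaLabel);
  if (node.href) out.href = genericizeUrl(node.href);
  if (node.value !== undefined) {
    out.text = "[REDACTED]"; // user input never leaves the device
  } else if (node.text) {
    const t = scrubText(node.text);
    if (t) out.text = t;
  }
  return out;
}
```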

User Control

  • No automatic cloud fallback
  • Transparent status indicators (users see which agent is processing)
  • Opt-in cloud features (disabled by default)
  • Explicit consent during onboarding

[Image: Asana Task List. The to-do list breaks complex workflows into actionable steps for the current app.]

Development Process

Iterative Development

I built Follow Halo in phases:

  1. Phase 1: Core Extension - Basic Chrome extension with side panel
  2. Phase 2: Vision System - Screenshot capture and analysis
  3. Phase 3: AI Integration - Nano API, prompt engineering
  4. Phase 4: DOM Tools - Element highlighting and manipulation
  5. Phase 5: Cloud Hybrid - Gemini API, web search, hybrid mode
  6. Phase 6: Mascot - Rive animation, grid system, spring physics
  7. Phase 7: Polish - UX refinement, error handling, onboarding

Challenges We Ran Into

Building Follow Halo came with significant technical and design challenges that pushed me to innovate and problem-solve.

1. Balancing Privacy and Capability

The Problem: Users want powerful AI guidance, but they also want privacy. How do you analyze a screen without sending sensitive data to the cloud?

The Solution:

  • Implemented a local-first architecture where Gemini Nano handles most processing
  • Created a sophisticated sanitization pipeline that strips personal data before cloud calls
  • Built a hybrid system where users choose when to use cloud (explicit opt-in)
  • Used web search as the cloud's primary function (not sensitive data processing)

The Trade-off: Local-only mode is slightly less accurate for complex DOM structures, but it's fast and completely private. For users working with sensitive internal tools or proprietary data, this trade-off is worth it.

2. DOM Matching Accuracy

The Problem: Web interfaces are incredibly diverse. A "Submit" button might be:

  • <button>Submit</button>
  • <button class="btn-primary" data-action="submit">Submit</button>
  • <div role="button" onclick="submit()">Submit</div>
  • <input type="submit" value="Submit">
  • Or even a <canvas> element with painted text!

How do you reliably find the exact element the user needs to interact with?

The Solution:

  • Multi-modal matching: Combine visual grid location (from screenshot) with DOM structure analysis
  • Sanitized DOM preserves structure while removing noise
  • Configurable agents: Users can choose Nano (fast, private) or Cloud (more accurate) for DOM matching
  • Fallback patterns: If exact match fails, find nearest semantic match
  • User feedback loop: Users can report mismatches to improve prompts

What I Learned: Accuracy improved from ~70% to ~92% after:

  • Adding ARIA labels to the matching algorithm
  • Including parent/child relationships in context
  • Using unique class names as primary identifiers
  • Teaching the AI to prefer semantic HTML over generic divs
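
One way to combine the visual and structural signals above is a simple scoring pass over candidate elements. This is a hypothetical heuristic, not the project's matcher: the `Candidate` shape, the weights and the list of semantic tags are all assumptions. It rewards elements whose center falls in (or next to) the grid cell predicted from the screenshot, ARIA labels and text that mention the target, semantic tags, and class names that are unique on the page.

```typescript
type Rect = { x: number; y: number; width: number; height: number };

interface Candidate {
  tag: string;
  role?: string;
  ariaLabel?: string;
  text?: string;
  classes: string[];
  rect: Rect; // viewport coordinates in CSS pixels
}

const GRID = 8;
const SEMANTIC = new Set(["button", "a", "input", "select", "textarea"]);

// Which 8×8 cell contains the point (px, py)?
function cellOf(px: number, py: number, vw: number, vh: number): [number, number] {
  const cx = Math.min(GRID - 1, Math.max(0, Math.floor((px / vw) * GRID)));
  const cy = Math.min(GRID - 1, Math.max(0, Math.floor((py / vh) * GRID)));
  return [cx, cy];
}

function score(c: Candidate, target: string, cell: [number, number], vw: number, vh: number,
               classCounts: Map<string, number>): number {
  const goal = target.toLowerCase();
  let s = 0;
  // Visual signal: Chebyshev distance between the element's cell and the predicted cell.
  const [ex, ey] = cellOf(c.rect.x + c.rect.width / 2, c.rect.y + c.rect.height / 2, vw, vh);
  const dist = Math.max(Math.abs(ex - cell[0]), Math.abs(ey - cell[1]));
  if (dist === 0) s += 3;
  else if (dist === 1) s += 1;
  // Structural signals.
  if (c.ariaLabel?.toLowerCase().includes(goal)) s += 3;
  if (c.text?.toLowerCase().includes(goal)) s += 2;
  if (SEMANTIC.has(c.tag) || c.role === "button") s += 1; // prefer semantic HTML
  if (c.classes.some((k) => classCounts.get(k) === 1)) s += 1; // unique class names
  return s;
}

function bestMatch(cands: Candidate[], target: string, cell: [number, number],
                   vw: number, vh: number): Candidate | undefined {
  const classCounts = new Map<string, number>();
  for (const c of cands) for (const k of c.classes) classCounts.set(k, (classCounts.get(k) ?? 0) + 1);
  let best: Candidate | undefined = undefined;
  let bestScore = 0;
  for (const c of cands) {
    const s = score(c, target, cell, vw, vh, classCounts);
    if (s > bestScore) { best = c; bestScore = s; }
  }
  return best; // undefined when nothing matched any signal
}
```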

3. Prompt Engineering for Consistency

The Problem: Large language models are probabilistic. Even with the same prompt, they can give different answers. For a guidance tool, inconsistency is frustrating.

The Solution:

  • Structured outputs: Used JSON schemas to enforce output format
  • Few-shot examples: Included 3-5 examples in each prompt
  • Validation layers: Parse outputs and fallback if invalid
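
The validation layer can be as simple as: extract the JSON object, check its shape, and fall back to a safe default if anything is off. The sketch below assumes a hypothetical `{ "tasks": [{ "title", "detail"? }] }` output shape for the to-do agent; the real schema is not given here, and `parseTodoResponse` is an illustrative name.

```typescript
interface TodoItem { title: string; detail?: string }

const FALLBACK: TodoItem[] = [{ title: "Could not build a plan. Try rephrasing your goal." }];

// Models sometimes wrap JSON in Markdown fences or prose; keep only the outermost {...}.
function extractJson(raw: string): string {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  return start >= 0 && end > start ? raw.slice(start, end + 1) : raw;
}

function isTodoItem(x: unknown): x is TodoItem {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return typeof o.title === "string" && o.title.trim() !== "" &&
    (o.detail === undefined || typeof o.detail === "string");
}

function parseTodoResponse(raw: string): TodoItem[] {
  try {
    const data: unknown = JSON.parse(extractJson(raw));
    const tasks = (data as { tasks?: unknown }).tasks; // throws on null, caught below
    if (Array.isArray(tasks) && tasks.length > 0 && tasks.every(isTodoItem)) return tasks;
  } catch {
    // Invalid JSON or unexpected shape: fall through to the fallback.
  }
  return FALLBACK;
}
```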

4. Web Search Integration

The Problem: Gemini's knowledge cutoff means it doesn't know about recent UI changes to SaaS platforms. How do you get up-to-date guidance?

The Solution:

  • Implemented web search as a first-class feature of the Cloud agent
  • Created a two-step process:
    1. Cloud agent searches for "how to [goal] in [platform] [current year]"
    2. Cloud agent synthesizes search results into structured steps
  • Source attribution: Each step includes links to original documentation
  • Recency bias: Prioritize results from the last 3 months
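
The query template and the 3-month recency preference can be sketched as two pure functions. `SearchResult`, its `published` field, `buildQuery` and `prioritizeRecent` are hypothetical; in the extension the search itself is performed by the Cloud agent through the Gemini API, which this sketch does not call.

```typescript
interface SearchResult {
  title: string;
  url: string;
  published?: Date; // hypothetical: not every result carries a date
}

// "how to [goal] in [platform] [current year]"
function buildQuery(goal: string, platform: string, now: Date = new Date()): string {
  return `how to ${goal} in ${platform} ${now.getFullYear()}`;
}

// Put results from the last `months` months first; the rest keep their order after them.
function prioritizeRecent(results: SearchResult[], now: Date, months = 3): SearchResult[] {
  const cutoff = new Date(now.getTime());
  cutoff.setMonth(cutoff.getMonth() - months);
  const isRecent = (r: SearchResult) =>
    r.published !== undefined && r.published.getTime() >= cutoff.getTime();
  return [...results.filter(isRecent), ...results.filter((r) => !isRecent(r))];
}
```

Re-ranking rather than filtering keeps older official documentation available when nothing recent exists, and each result's `url` can be carried along as the step's source link.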

[Image: Pipedrive Task Loading. Gemini Cloud fetches up-to-date Pipedrive workflows in real time.]

Accomplishments That I'm Proud Of

Technical Achievements

True Hybrid AI Architecture

Successfully combined local (Nano) and cloud (Gemini) AI in a way that:

  • Maximizes privacy by default
  • Leverages cloud only when necessary
  • Gives users explicit control over data flow
  • Maintains performance even on slow connections

Zero Privacy Compromises

  • 100% local processing in default mode
  • Zero telemetry or tracking
  • User-controlled cloud features
  • Passed manual security audit (no unexpected network requests)

Design Achievements

The Mascot System

Created a visual guidance system that's both functional and delightful:

  • Smooth spring physics animations
  • State-aware behavior (idle, thinking, celebrating)
  • Clear visual communication without text
  • Tested with 10+ users - 100% said it "made finding buttons easier"

Progressive Disclosure UX

Designed the interface to be:

  • Simple by default (one button: "What do you want to do?")
  • Powerful when needed (settings, cloud mode, advanced options)
  • Non-intimidating for non-technical users
  • Transparent about what's happening ("Analyzing screen...", "Searching web...")

Polished Developer Experience

The codebase is:

  • Well-architected (components/view-logic/hooks/services separation)
  • Documented (every agent prompt has explanatory comments)
  • Testable (business logic isolated from Vue)
  • Extensible (easy to add new agents or platforms)

Personal Achievements

Mastering Gemini Nano API

Became proficient with Chrome's cutting-edge on-device AI:

  • Understood prompt API limitations
  • Discovered when to use local vs. cloud
  • Learned how to coordinate work between the side panel and content.js

Building for Real Users

Unlike toy projects, Follow Halo solves a real, daily problem:

  • People actually want this
  • Addresses a market gap
  • Could become a sustainable product

Learning Rive Animation

Improved my Rive skills:

  • Mastered state machines
  • Implemented spring physics

Exploring Creative Design and Animation

Beyond technical skills, this project was a playground for creativity. I used ToonSquid to bring characters to life and experimented with Rive to create smooth, interactive animations using state machines and physics. Visual assets were crafted in Illustrator, then assembled into a dynamic, cohesive video during editing. I also refined the UX and shaped the project's visual identity through a consistent style, thoughtful interactions, and an experience that feels both intuitive and unique. This blend of design, motion, and user-centered thinking pushed my creative boundaries far beyond pure development.

[Image: Shopify Confetti Celebration. Halo celebrates successful workflow completion with a light gamification experience.]

What's Next for Follow Halo

Follow Halo is just the beginning. The next step is turning this hackathon project into a production-ready tool that could genuinely change how people learn software.

The Ultimate Goal

Make software accessible to everyone.

Right now, only power users master complex tools. Follow Halo aims to democratize that expertise. Whether you're:

  • A student learning Figma
  • A founder setting up Stripe
  • An employee onboarding to internal tools
  • A freelancer juggling 10 different SaaS platforms

...Follow Halo should make you feel confident, capable, and never stuck.

Final Thoughts

Today, productivity is cool again. AI agents are everywhere. But most of them still can't help you right where you work.

Follow Halo bridges that gap. It combines:

  • Powerful AI (Gemini Nano + Cloud)
  • Privacy-first design (local processing by default)
  • Delightful UX (animated mascot, smooth interactions)
  • Learning-focused approach (guide, don't automate)

Because the future isn't "no interface."

The future is making people faster inside the interfaces they already use.

Follow Halo. Your AI that guides you anywhere. 🧿

Links

GitHub: Follow Halo - Project source and updates

YouTube: Follow Halo - when AI jumps into your Interface

Twitter: @YafaHodis
