Inspiration

One in five college students has a disability, and they often get blocked by complex software interfaces before they even reach their actual coursework. The accessibility gap isn't just one app; it's their whole daily stack, from Canvas assignments to Excel spreadsheets to Lightroom editing to FAFSA forms. And assistive AI subscriptions priced for professionals ($20–$30/mo) shut out the very students on financial aid who need them most. We wanted to build an affordable, immediate, and patient companion. Our core philosophy: Guidance, not automation. Screen readers give you access to reading; NarrAIt gives you access to navigating.

What it does

NarrAIt is an AI-powered assistive macOS menu bar app that helps disabled students navigate inaccessible software. Instead of automating the mouse and doing the work for them, which leaves the student dependent, NarrAIt teaches the interface.

  • Hover-Explain: Hold the Option key and point at anything confusing. NarrAIt reads the display, explains the control in plain English via a streaming text bubble, and speaks it aloud.
  • Voice Q&A & Pointing: Hold Cmd+Option to ask a question by voice. NarrAIt answers it and drops a green visual marker exactly where you need to click next, without ever seizing control of your mouse.
  • Access Profiles: Switch modes for Blind/Low Vision (richer spatial descriptions, magnifier, slower speech), Dyslexia (shorter sentences), or Language Support (jargon translation); a sketch of how these map to concrete settings follows this list.
  • Academic Guardrails: It operates as an assistive tool, not a cheat code. It strictly refuses to solve graded coursework (like math problems or coding assignments) but readily explains the software used for the assignment.
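Under the hood, each profile is just a bundle of settings the rest of the app reads. Here's a minimal sketch of that shape; the type, cases, and values are illustrative assumptions, not our exact shipped code:

```swift
/// Illustrative sketch: each profile bundles the knobs the rest of
/// the app reads (values here are placeholders, not shipped defaults).
enum AccessProfile {
    case blindLowVision, dyslexia, languageSupport

    /// AVSpeechUtterance rate (0.5 is the system default).
    var speechRate: Float {
        self == .blindLowVision ? 0.4 : 0.5   // slower speech for B/LV
    }

    /// Rough cap on sentence length in generated explanations.
    var maxWordsPerSentence: Int {
        self == .dyslexia ? 12 : 25           // shorter sentences
    }

    var wantsSpatialDescriptions: Bool { self == .blindLowVision }
    var wantsJargonTranslation: Bool   { self == .languageSupport }
}
```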

How we built it

We built a native macOS app using Swift 5.9, SwiftUI, and AppKit. To keep the codebase predictable, our architecture relies on a Single Orchestrator (ActivationCoordinator) state machine where API clients and UI components never call each other directly.
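A simplified sketch of that orchestrator (state names and signatures are illustrative): every hotkey event flows into one place, which owns both the current state and the in-flight work, so cancellation is a single line rather than something every component handles separately.

```swift
import Foundation

/// Illustrative sketch of the single-orchestrator pattern: UI and
/// API clients talk to this coordinator, never to each other.
@MainActor
final class ActivationCoordinator {
    enum State { case idle, capturing, listening, explaining, pointing }

    private(set) var state: State = .idle
    private var inFlight: Task<Void, Never>?

    /// Hotkey pressed: start a flow, cancelling whatever was running.
    func begin(_ next: State, work: @escaping () async -> Void) {
        inFlight?.cancel()          // kill stale network/audio work
        state = next
        inFlight = Task {
            await work()
            if !Task.isCancelled { self.state = .idle }
        }
    }

    /// Hotkey released: drop back to idle immediately.
    func cancel() {
        inFlight?.cancel()
        state = .idle
    }
}
```

Because every transition passes through `begin` and `cancel`, the race conditions described under Challenges below have exactly one place to hide.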

Cost-Optimized Architecture: To keep NarrAIt affordable on a student budget, we engineered a smart routing architecture that minimizes expensive LLM tool calls. Gemini 3.0 Flash acts as a high-speed, low-cost router: it handles hover-explanations and general voice queries instantly (~$0.003/call) and only escalates to the heavier, more expensive Claude Sonnet Computer Use model when precise screen-coordinate pointing is strictly necessary.
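The routing decision itself is deliberately simple. A sketch, with the request type and flag name as assumptions for illustration:

```swift
/// Illustrative routing sketch: cheap model by default, the expensive
/// computer-use model only when we must return screen coordinates.
enum Route { case geminiFlash, claudeComputerUse }

struct UserRequest {
    let transcript: String
    /// True only for "point to where I click next"-style asks.
    let needsScreenCoordinates: Bool
}

func route(for request: UserRequest) -> Route {
    // Hover-explains and general Q&A never need pixel coordinates.
    request.needsScreenCoordinates ? .claudeComputerUse : .geminiFlash
}
```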

Media Pipeline: SCScreenshotManager grabs multi-display captures. We use Groq's Whisper Large v3 for lightning-fast speech-to-text (~180ms) and the local macOS system voice for near-instant spoken responses. Input events are handled natively via CGEventTap.
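Detecting "hold Option" without ever swallowing the user's input comes down to a listen-only event tap. A sketch (the hand-off to our coordinator is elided; macOS requires an input-monitoring/Accessibility grant for this):

```swift
import CoreGraphics

/// Illustrative sketch: a listen-only tap on modifier-flag changes so
/// we can detect "Option held" without consuming any input events.
let mask = CGEventMask(1 << CGEventType.flagsChanged.rawValue)

let tap = CGEvent.tapCreate(
    tap: .cgSessionEventTap,
    place: .headInsertEventTap,
    options: .listenOnly,              // observe, never block or modify
    eventsOfInterest: mask,
    callback: { _, _, event, _ in
        let optionHeld = event.flags.contains(.maskAlternate)
        _ = optionHeld                 // hand off to the coordinator here
        return Unmanaged.passUnretained(event)
    },
    userInfo: nil
)

if let tap {
    let source = CFMachPortCreateRunLoopSource(nil, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetMain(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
}
```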

Challenges we ran into

  • Coordinate Translation: Claude's Computer Use returns coordinates in the submitted screenshot's pixel space. Translating those back into AppKit global screen coordinates, while accounting for multi-display layouts and Retina scaling, required precise math and rigorous testing (see the sketch after this list).
  • Pipeline Latency: Chaining ScreenCaptureKit to Groq Whisper, then to a routing LLM (Gemini), and potentially to another LLM (Claude) before hitting a TTS engine meant battling latency at every step to maintain a real-time "companion" feel.
  • State Management: Orchestrating complex asynchronous flows and instantly canceling in-flight network or audio requests the millisecond a user releases a hotkey was tricky and led to race conditions until we perfected our state machine.
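For the coordinate translation in particular, the working version boils down to three steps: undo the upload downscale, convert pixels to points, then flip into AppKit's bottom-left-origin global space. A sketch, with names that are ours for illustration:

```swift
import AppKit

/// Illustrative sketch of the coordinate round-trip.
/// - modelPoint:  (x, y) from the LLM, in the *submitted* image's
///                pixel space (top-left origin).
/// - sentSize:    pixel size of the downscaled image we uploaded.
/// - captureSize: pixel size of the original screenshot.
/// - screen:      the NSScreen the screenshot came from.
func globalPoint(modelPoint: CGPoint, sentSize: CGSize,
                 captureSize: CGSize, screen: NSScreen) -> CGPoint {
    // 1. Undo the downscale: back to full capture pixels.
    let px = CGPoint(x: modelPoint.x * captureSize.width  / sentSize.width,
                     y: modelPoint.y * captureSize.height / sentSize.height)

    // 2. Pixels -> points (Retina displays are typically 2x).
    let s = screen.backingScaleFactor
    let local = CGPoint(x: px.x / s, y: px.y / s)

    // 3. Top-left-origin local points -> AppKit global points, whose
    //    origin is the bottom-left of the main display, y increasing up.
    return CGPoint(x: screen.frame.minX + local.x,
                   y: screen.frame.maxY - local.y)
}
```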

Accomplishments that we're proud of

  • Cost Optimization via Tool Call Minimization: By intelligently routing requests and only invoking Claude's Computer Use when necessary, we dropped the estimated operating cost to just ~$6 a month. We proved that premium assistive tech shouldn't require a $25/month subscription; capable computer-use offerings like Claude Computer Use or Computer Use by Perplexity cost ~$200.
  • 80% Image Cost Reduction via Resolution Scaling: Screenshots are downscaled to a maximum of 1280px before being sent to any LLM, cutting image input tokens by over 80% on a MacBook Pro. The AI returns coordinates in the downscaled pixel space; we then upscale them using the original display dimensions so the marker lands precisely on the target at full Retina resolution (see the sketch after this list).
  • The Academic Boundary: We successfully implemented a content-type-based refusal system rather than a brittle keyword filter. NarrAIt can intelligently distinguish between "What's the answer to this calculus question?" (Refuse) and "How do I format this heading?" (Allow).
  • True Guidance: We resisted the urge to automate the mouse. Getting the green marker to land perfectly on the target, without ever seizing control of the user's cursor, proves our foundational philosophy works.
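The downscale itself is a plain CoreGraphics redraw; what matters is keeping the scale factor around so the coordinate translation sketched above can invert it. A sketch (function name and signature are ours for illustration):

```swift
import CoreGraphics

/// Illustrative sketch: cap the screenshot at 1280px wide before
/// upload, returning the scale so coordinates can be mapped back.
func downscale(_ image: CGImage, maxWidth: Int = 1280)
        -> (image: CGImage, scale: CGFloat)? {
    guard image.width > maxWidth else { return (image, 1.0) }

    let scale  = CGFloat(maxWidth) / CGFloat(image.width)
    let height = Int(CGFloat(image.height) * scale)

    guard let ctx = CGContext(
        data: nil, width: maxWidth, height: height,
        bitsPerComponent: 8, bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }

    ctx.interpolationQuality = .high
    ctx.draw(image, in: CGRect(x: 0, y: 0, width: maxWidth, height: height))

    // Callers divide LLM coordinates by `scale` to get back to
    // full-resolution capture pixels.
    return ctx.makeImage().map { ($0, scale) }
}
```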

What we learned

  • Accessibility requires granular control. Realizing we needed distinct profiles to alter text density, TTS speed, and spatial descriptions fundamentally changed our UI approach.
  • Working with beta models like Claude's Computer Use requires robust error handling, especially when mapping AI-generated coordinates to physical screen pixels.
  • Audio handling in macOS combined with real-time API streaming is incredibly powerful when properly managed through a strict, centralized state machine.

What's next for NarrAIt

Our immediate next step is turning our single-click "cursor-buddy" into a multi-step software walkthrough. We want NarrAIt to guide the user to the first click, use ScreenCaptureKit to verify the click actually happened, and then seamlessly continue to the next step until the entire task is finished. The ultimate goal is to evolve NarrAIt into a patient teaching assistant for every software workflow a student encounters.
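A rough sketch of the loop we have in mind; every helper here (`captureScreen`, `nextStep`, `waitForUserClick`, `verify`, `showMarker`, `speak`) is hypothetical and unbuilt:

```swift
/// Planned walkthrough loop, sketched with hypothetical helpers:
/// capture -> point -> wait for the user's own click -> verify -> next.
func runWalkthrough(task: String) async throws {
    while true {
        let shot = try await captureScreen()           // ScreenCaptureKit
        guard let step = try await nextStep(for: task, screenshot: shot)
        else { break }                                 // task complete

        showMarker(at: step.target)                    // green marker
        speak(step.instruction)                        // local TTS

        try await waitForUserClick(near: step.target)  // the user acts, not us
        let after = try await captureScreen()
        let ok = try await verify(step: step, before: shot, after: after)
        if !ok { speak("That didn't seem to work. Let's look again.") }
    }
}
```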

Built With

Swift, SwiftUI, AppKit, ScreenCaptureKit, Gemini 3.0 Flash, Claude Sonnet (Computer Use), Groq Whisper Large v3