Inspiration

Switching between a tutorial and the actual app you're learning kills focus. We wanted something that just lives on your screen and shows you exactly where to click like a friend looking over your shoulder.

What it does

Press Ctrl + Option, ask anything about what's on your screen, and Pointy answers out loud while its cursor flies to the exact element it's talking about. No windows, no alt-tabbing: it lives entirely in your menu bar.

How we built it

Swift + SwiftUI for the macOS app, AssemblyAI for real-time voice transcription over WebSocket, Claude Sonnet 4.6 for vision + reasoning, Apple Speech for audio output, and a Cloudflare Worker as a proxy so no API keys ever ship in the binary.
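The key-safety piece works because the app only ever talks to the Worker, which holds the real Anthropic key server-side. A minimal sketch of the client side, assuming a hypothetical proxy URL and request shape (names here are illustrative, not the app's actual API):

```swift
import Foundation

// Hedged sketch: the Worker URL and JSON fields are assumptions.
// The binary contains no secret; the Cloudflare Worker injects the
// Anthropic API key and forwards the request upstream.
func askClaude(prompt: String, screenshotPNG: Data) async throws -> URLSession.AsyncBytes {
    var req = URLRequest(url: URL(string: "https://pointy-proxy.example.workers.dev/v1/chat")!)
    req.httpMethod = "POST"
    req.setValue("application/json", forHTTPHeaderField: "Content-Type")
    req.httpBody = try JSONEncoder().encode([
        "prompt": prompt,
        "image_base64": screenshotPNG.base64EncodedString()
    ])
    // The response is an SSE byte stream; the caller parses it line by line.
    let (bytes, _) = try await URLSession.shared.bytes(for: req)
    return bytes
}
```

Rotating or revoking the key then becomes a Worker-side change, with no app update required.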

Challenges we ran into

Getting multi-monitor coordinate mapping right: Claude's response contains pixel coordinates from a screenshot, and translating those to the correct physical screen across arbitrarily positioned displays took a lot of math. We also hit a wall with ElevenLabs TTS being blocked on Cloudflare Worker IPs on the free tier, so we switched to Apple's on-device speech.
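The core of that translation can be sketched as follows, under two assumptions: the screenshot covers a single known display, and its coordinates are top-left-origin pixels (function and parameter names are ours, not the app's):

```swift
import AppKit

// Hedged sketch: maps a point from one display's screenshot pixel space
// into macOS's global (virtual desktop) coordinate space.
// Two conversions are needed:
//  1. Scale: a Retina screenshot has 2x the points (backingScaleFactor).
//  2. Origin: screenshots are top-left-origin, AppKit is bottom-left-origin,
//     and each screen's frame is offset within the virtual desktop.
func globalPoint(fromScreenshot p: CGPoint, on screen: NSScreen) -> CGPoint {
    let scale = screen.backingScaleFactor   // e.g. 2.0 on Retina displays
    let frame = screen.frame                // screen's rect in the virtual desktop
    let x = frame.origin.x + p.x / scale
    let y = frame.origin.y + frame.height - p.y / scale  // flip the y-axis
    return CGPoint(x: x, y: y)
}
```

With displays arranged arbitrarily (above, below, negative offsets), `frame.origin` does the heavy lifting: each screen's frame already encodes its position in the shared space.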

Accomplishments that we're proud of

The pointing actually feels magical. Watching the cursor arc across the screen to land on exactly the button Claude is describing: that moment lands every single time someone sees it for the first time.

What we learned

Streaming matters more than speed. Claude's SSE streaming means users hear the first word before the full answer is generated; that alone makes it feel instant. We also learned that on-device speech (Apple Speech) beats cloud TTS for latency-sensitive UX.
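The "hear the first word early" trick amounts to flushing speech on sentence boundaries as deltas arrive, rather than waiting for the stream to finish. A sketch of that idea, assuming the SSE deltas have already been parsed into an `AsyncStream<String>` (a stand-in name, not the app's actual type):

```swift
import AVFoundation

// Hedged sketch: speak each sentence as soon as the stream completes it.
// `sseDeltas` stands in for a sequence of text chunks from the model.
func speakStreamed(_ sseDeltas: AsyncStream<String>) async {
    let synth = AVSpeechSynthesizer()
    var buffer = ""
    for await delta in sseDeltas {
        buffer += delta
        // Flush on sentence-ending punctuation so the first words
        // start playing while the rest is still generating.
        while let end = buffer.firstIndex(where: { ".!?".contains($0) }) {
            let sentence = String(buffer[...end])
            buffer = String(buffer[buffer.index(after: end)...])
            synth.speak(AVSpeechUtterance(string: sentence))
        }
    }
    // Speak any trailing fragment once the stream closes.
    if !buffer.isEmpty { synth.speak(AVSpeechUtterance(string: buffer)) }
}
```

`AVSpeechSynthesizer` queues utterances internally, so sentences flushed early play back in order without extra bookkeeping.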

What's next for Pointy

Multi-app memory so Pointy learns your workflow, proactive help when you look stuck, and a mode for walking through onboarding flows in any app automatically.
