Inspiration
Switching between a tutorial and the actual app you're learning kills focus. We wanted something that just lives on your screen and shows you exactly where to click, like a friend looking over your shoulder.
What it does
Press ctrl + option, ask anything about what's on your screen, and Pointy answers out loud while its cursor flies to the exact element it's talking about. No windows, no alt-tabbing: it lives entirely in your menu bar.
How we built it
Swift + SwiftUI for the macOS app, AssemblyAI for real-time voice transcription over WebSocket, Claude Sonnet 4.6 for vision + reasoning, Apple Speech for audio output, and a Cloudflare Worker as a proxy so no API keys ever ship in the binary.
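The key-hiding proxy is the piece worth sketching: the API key lives as a Worker secret and gets attached server-side, so the app binary only ever knows the Worker's URL. A simplified sketch (endpoint, headers, and the secret name are illustrative, not our exact code):

```typescript
// Minimal key-hiding proxy sketch. The key is a Worker secret
// (set via `wrangler secret put`), never shipped in the app.
interface Env {
  ANTHROPIC_API_KEY: string;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }
    // Forward the app's request body to the upstream API,
    // attaching the secret key on the server side only.
    return fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": env.ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
      },
      body: request.body,
    });
  },
};
// In the real Worker module this object is the default export.
```

The Worker also gives us one place to add rate limiting later without shipping an app update.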
Challenges we ran into
Getting multi-monitor coordinate mapping right was the hardest part: Claude's response contains pixel coordinates from a screenshot, and translating those to the correct physical screen across arbitrarily positioned displays took a lot of math. We also hit a wall with ElevenLabs TTS being blocked on Cloudflare Worker IPs on the free tier, so we switched to Apple's on-device speech.
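The core of the mapping boils down to two steps: divide the screenshot pixel by that display's backing scale factor (2x on Retina), then offset by the display's origin in the shared global coordinate space. A simplified sketch of just that math (the `Display` fields are illustrative, not our actual types, and this ignores the AppKit bottom-left-origin flip):

```typescript
// Hypothetical display descriptor: origin in global points
// (top-left convention here), scale = screenshot pixels per point.
interface Display {
  originX: number;
  originY: number;
  scale: number;
}

// Map a pixel coordinate from a screenshot of one display to a
// point in the global coordinate space shared by all displays.
function screenshotToGlobal(
  px: number,
  py: number,
  display: Display
): { x: number; y: number } {
  return {
    x: display.originX + px / display.scale,
    y: display.originY + py / display.scale,
  };
}
```

The subtle bug we kept hitting was applying the scale factor of the wrong display when monitors had different DPIs, so the scale has to come from the display that was actually screenshotted.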
Accomplishments that we're proud of
The pointing actually feels magical. Watching the cursor arc across the screen and land on exactly the button Claude is describing is a moment that lands every single time someone sees it for the first time.
What we learned
Streaming matters more than speed. Claude's SSE streaming means users hear the first word before the full answer is generated; that alone makes it feel instant. We also learned that on-device AI (Apple Speech) beats the cloud for latency-sensitive UX.
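The streaming loop amounts to picking `text_delta` events out of the SSE lines as they arrive and handing each fragment straight to speech output. A simplified sketch of that extraction step, assuming Anthropic's published `content_block_delta` event shape:

```typescript
// Pull text fragments out of a chunk of Anthropic-style SSE lines,
// so speech can start on the first fragment instead of waiting for
// the complete answer.
function extractTextDeltas(sseChunk: string): string[] {
  const out: string[] = [];
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    try {
      const event = JSON.parse(line.slice("data: ".length));
      if (
        event.type === "content_block_delta" &&
        event.delta?.type === "text_delta"
      ) {
        out.push(event.delta.text);
      }
    } catch {
      // Ignore non-JSON data lines (e.g. sentinel payloads).
    }
  }
  return out;
}
```

In the app the same idea runs incrementally over the response stream; this sketch just shows why the first word is available long before the message finishes.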
What's next for Pointy
Multi-app memory so Pointy learns your workflow, proactive help when you look stuck, and a mode for walking through onboarding flows in any app automatically.
Built With
- anthropic
- api
- appkit
- apple
- assemblyai
- avfoundation
- claude
- cloudflare
- elevenlabs
- screencapturekit
- speech
- streaming
- swift
- swiftui
- typescript
- workers
- wrangler
- xcode