Nora — Devpost story
Inspiration
Every AI product today forces a context switch. You're stuck in Photoshop, mid-slide, reading a paper — and the help lives in another tab. Open ChatGPT, screenshot, paste, switch back. The friction kills most of the use cases.
Existing answers hit two extremes: chat-only overlays (alt-tab to ask) and agents that take over your mouse (Operator, Computer Use). Both break flow. We wanted a middle ground:
AI points, human clicks. You learn the tool while it helps.
What it does
Hold ⌃Q anywhere on your Mac. Pick one of three things — they all share one memory.
- Coach — points at the next click target while you keep your real mouse. Learn any app by doing it yourself.
- Compose — type "send the participant waiver to imad" and it drafts the email, attaches the file, and sends. No Gmail tab, no compose button.
- Chat — ask anything about what's on your screen right now, or recall something you did last week.
The three surfaces share working memory: people you've emailed, file aliases, style preferences per recipient, skills you've practiced. The second time you do something is faster than the first.
How we built it
| Layer | Choice |
|---|---|
| Language | Swift 6 (strict concurrency, @MainActor everywhere) |
| UI | SwiftUI + AppKit (transparent always-on-top NSPanels, click-through overlays) |
| Hotkey | Carbon RegisterEventHotKey |
| Screen capture | ScreenCaptureKit with sharingType = .none so Nora never sees its own UI |
| Storage | SQLite via GRDB |
| Vision | Anthropic Computer Use beta on claude-sonnet-4-6 |
| Detectors | claude-haiku-4-5 (intent, glossary, preference parsing) |
| Email send | Composio v3 /api/v3/tools/execute/GMAIL_SEND_EMAIL |
| Email attachments | Composio v3 3-step presigned-URL flow (/files/upload/request → PUT → s3key) |
| Image transforms | Cloudinary unsigned upload + URL transforms (e_upscale) |
| Persistent memory | Backboard memories + threads/messages with memory: "Auto" |
| Security audit | Polarity Paragon — five [paragon-fix] commits merged |
Coach drives a per-tick vision loop with four prompt-cache breakpoints (system / tools / static text / initial image URL), keeping ~3,300 tokens cached and ~600 tokens new per tick. Compose bypasses vision entirely — it routes the user's one-line goal through a Haiku detector that returns a structured FollowUp (intent, recipient, subject, body), then resolves the recipient against a local SQLite people-cache (sub-ms hot path) backed by Backboard for cold starts. Chat's regex escalator routes recall queries to Backboard's auto-RAG and screen-aware questions to Anthropic vision.
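The Compose path described above can be sketched in a few lines. This is a minimal illustration, not Nora's actual code: the `FollowUp` field names come from the story, but their types, the in-memory `PeopleCache` (standing in for the SQLite people-cache), the `Resolution` enum, and the `remoteLookup` closure (standing in for the Backboard cold-start lookup) are all assumptions.

```swift
import Foundation

// Hypothetical shape of the detector's structured output. Field names follow
// the story (intent, recipient, subject, body); the types are assumed.
struct FollowUp: Codable, Equatable {
    let intent: String
    let recipient: String
    let subject: String
    let body: String
}

// In-memory stand-in for the local people-cache (SQLite in the real app).
struct PeopleCache {
    var emailsByAlias: [String: String] = [:]

    init() {}

    mutating func remember(alias: String, email: String) {
        emailsByAlias[alias.lowercased()] = email
    }

    func lookup(_ alias: String) -> String? {
        emailsByAlias[alias.lowercased()]
    }
}

enum Resolution: Equatable {
    case localHit(String)      // answered from the local cache
    case remoteHit(String)     // answered by the cold-start fallback
    case needsClarification    // nobody knows this recipient yet
}

// Decodes the detector's JSON reply into a FollowUp.
func decodeFollowUp(_ json: String) throws -> FollowUp {
    try JSONDecoder().decode(FollowUp.self, from: Data(json.utf8))
}

// Local cache first. Only on a miss do we call the (assumed) remote lookup,
// then write the answer back so the next request stays local.
func resolveRecipient(
    _ alias: String,
    cache: inout PeopleCache,
    remoteLookup: (String) -> String?
) -> Resolution {
    if let email = cache.lookup(alias) { return .localHit(email) }
    guard let email = remoteLookup(alias) else { return .needsClarification }
    cache.remember(alias: alias, email: email)
    return .remoteHit(email)
}
```

The write-back on a remote hit is what would make the second "send hi to imad" skip the network entirely; a recipient that neither layer knows falls through to a clarification prompt.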
Challenges we ran into
- ScreenCaptureKit + transparent overlays. Making Nora never see its own UI required `sharingType = .none` on every overlay window, a flag that's barely documented and silently ignored if you set it on the wrong window class.
- Self-signed code signing. Without a stable signing identity, every rebuild re-prompts Accessibility + Screen Recording. We shipped a "Nora Local Signing" cert in the login keychain so TCC grants survive incremental builds.
- Composio's attachment protocol isn't in the public REST docs. The 3-step presigned-URL flow (request → PUT to S3 → reference `s3key` in the action body) only shows up in `/api/v3/openapi.json`. Reading the spec was faster than waiting on docs.
- Backboard's `chat()` silently returned base-model answers until we sent `memory: "Auto"` alongside `assistant_id`. The default is `"off"`. One line of payload, an hour of confusion.
- Per-tick SQLite write on the `@MainActor` hot path was blocking Coach for the duration of every disk write. Paragon's audit caught it; the fix was a single utility-priority `Task.detached`.
- Two-pass click resolution. Toolbar icons (eyedropper, single-letter labels) were too small for the model to coordinate-pin. We added a `Zoom: x,y,w,h` instruction the model can emit, fed it through Cloudinary's `e_upscale` URL transform, and re-asked on the upscaled crop. Coordinates map back into the original image space.
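The two-pass click resolution in the last item comes down to parsing the `Zoom:` line and undoing the upscale afterwards. The sketch below shows one way that mapping could look. It assumes the rectangle is given in original-screenshot pixels with a top-left origin and that the crop is scaled uniformly; `ZoomRect`, `parseZoom`, and `mapToOriginal` are hypothetical names, not Nora's API.

```swift
import Foundation

// A crop rectangle in original-screenshot pixels, as carried by a
// "Zoom: x,y,w,h" line. Top-left origin is an assumption.
struct ZoomRect: Equatable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
}

// Parses a line such as "Zoom: 1100,180,80,28". Returns nil unless the line
// has exactly four numeric fields and a positive width and height.
func parseZoom(_ line: String) -> ZoomRect? {
    let trimmed = line.trimmingCharacters(in: .whitespaces)
    guard trimmed.hasPrefix("Zoom:") else { return nil }
    let fields = trimmed.dropFirst("Zoom:".count)
        .split(separator: ",", omittingEmptySubsequences: false)
        .map { Double($0.trimmingCharacters(in: .whitespaces)) }
    guard fields.count == 4,
          let x = fields[0], let y = fields[1],
          let w = fields[2], let h = fields[3],
          w > 0, h > 0 else { return nil }
    return ZoomRect(x: x, y: y, width: w, height: h)
}

// Maps a point the model picked on the upscaled crop back into the original
// image: divide by the scale factor, then offset by the crop origin.
func mapToOriginal(
    cropX: Double,
    cropY: Double,
    crop: ZoomRect,
    scale: Double
) -> (x: Double, y: Double) {
    (crop.x + cropX / scale, crop.y + cropY / scale)
}
```

For the 80×28 crop at (1100, 180) upscaled 4×, a second-pass click at the center of the 320×112 image, (160, 56), maps back to (1140, 194) in the original screenshot.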
Accomplishments that we're proud of
- Three AI surfaces sharing one memory. "send hi to imad" the second time skips clarification entirely — sub-millisecond local cache hit, transparent Backboard fallback for cross-device.
- Real Gmail attachments delivered server-side, not links pasted into the body — same protocol the Composio Python/TS SDKs use under the hood.
- Two-pass zoom for tiny click targets. Vision returns `Zoom: 1100,180,80,28` → Cloudinary upscale 4× → second pass nails the coordinate.
- Steady-state Coach tick: ~4,000 tokens, ~$0.005, ~3.5–4.5s end-to-end.
- Five real security fixes from Polarity Paragon's review, including a path-traversal in `BackboardClient.assistantId`, a force-unwrap crash in `CloudinaryClient.cloudName`, and the SQLite-on-main-thread Coach issue. Every commit body lists severity, file:line, and the "why."
- Per-recipient email style preferences with contradiction resolution: "for imad, include occasional typos" coexists with "always sign Sarah's emails 'Cheers'"; a Haiku reconciler replaces directly contradicting rules but keeps harmonious ones.
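The steady-state tick figure above can be sanity-checked with a small cost function. Everything numeric in this sketch is an assumption: the per-million-token prices are placeholders (with cache reads at a tenth of the base input price and cache writes at 1.25×), and the 100 output tokens are a guess that makes up the gap between ~3,900 input tokens and the ~4,000 total. `Pricing`, `TickUsage`, and `cost(of:pricing:)` are hypothetical names.

```swift
import Foundation

// Per-million-token prices in USD. Placeholder assumptions for illustration;
// check the provider's current price list before relying on them.
struct Pricing {
    var input: Double = 3.00        // uncached input
    var cacheRead: Double = 0.30    // input served from the prompt cache
    var cacheWrite: Double = 3.75   // input written to the cache
    var output: Double = 15.00
}

// Token counts for one Coach tick, split by how each token is billed.
struct TickUsage {
    var cachedInput: Int
    var freshInput: Int
    var cacheWriteInput: Int = 0
    var output: Int
}

// Sums each token bucket at its own rate and converts to dollars.
func cost(of tick: TickUsage, pricing: Pricing = Pricing()) -> Double {
    let cached = Double(tick.cachedInput) * pricing.cacheRead
    let fresh = Double(tick.freshInput) * pricing.input
    let written = Double(tick.cacheWriteInput) * pricing.cacheWrite
    let generated = Double(tick.output) * pricing.output
    return (cached + fresh + written + generated) / 1_000_000
}

// Steady state: ~3,300 tokens read from cache, ~600 new, ~100 generated.
let steadyTick = TickUsage(cachedInput: 3_300, freshInput: 600, output: 100)
// The same tick with no caching at all, for comparison.
let uncachedTick = TickUsage(cachedInput: 0, freshInput: 3_900, output: 100)
```

Under these assumed rates `cost(of: steadyTick)` comes to about $0.0043, in line with the ~$0.005 quoted, while `cost(of: uncachedTick)` is about $0.013.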
What we learned
- Vision-driving-your-mouse and chat-only Q&A are both wrong defaults. The middle ground (point, don't click) is a better product for most cases — users want to learn the tool, not be locked out of it.
- Memory only works if you flip the right switch. `memory: "Auto"` turned Chat from "generic ChatGPT in a side panel" into "answers from your own history." That single field was the difference between a memory feature and the illusion of one.
- Prompt caching pays for itself fast. Four breakpoints kept ~3,300 tokens cached per Coach tick; within the first session the cumulative savings easily covered the first-tick cache-write cost.
- A real security audit catches things you'd never find by re-reading your own code. Five HIGH/MEDIUM findings in a project this small surprised us.
- Regex first, LLM second. Path extraction, recipient hints, and chat recall escalation are all regex-driven. Faster than Haiku, deterministic, free.
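A regex-first router of the kind described in the last item might look like the sketch below. The patterns, the `ChatRoute` cases, and the `route` function are illustrative assumptions; the real escalator's rules are not shown in this write-up.

```swift
import Foundation

enum ChatRoute: Equatable {
    case memoryRecall   // answer from stored history
    case screenVision   // answer from the current screenshot
    case plainChat      // neither pattern matched
}

// Hypothetical patterns, kept in a namespace so they are plain constants.
enum RoutePatterns {
    static let recall =
        #"\b(last (week|month|time)|yesterday|earlier|remember|did i)\b"#
    static let screen =
        #"\b(on (my|the) screen|this (window|button|error|page)|what am i looking at)\b"#
}

// Case-insensitive regex test using Foundation's string search.
func matches(_ pattern: String, in text: String) -> Bool {
    text.range(of: pattern, options: [.regularExpression, .caseInsensitive]) != nil
}

// Recall is checked first, so a query that mentions both the past and the
// screen goes to memory. That ordering is a design choice of this sketch.
func route(_ query: String) -> ChatRoute {
    if matches(RoutePatterns.recall, in: query) { return .memoryRecall }
    if matches(RoutePatterns.screen, in: query) { return .screenVision }
    return .plainChat
}
```

Because the decision is a pair of string searches, it costs no tokens and gives the same answer every time for the same query, which is the trade the item above is pointing at.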
What's next for Nora
- More direct actions in Compose — calendar invites, Slack DMs, Notion pages, Linear issues. The detector + preview-then-confirm pattern generalizes; we just need more Composio actions wired.
- Cross-device memory. Backboard already stores everything — picking up a half-finished session on a different Mac is mostly UI.
- Voice mode. Was on the original plan; we cut it to fit the demo. Push-to-talk + Whisper STT + a TTS reply on the same memory backbone.
- Adaptive Coach. Detect when the user did something different from what was suggested and adjust the next pointer instead of re-pointing at the same target.
- Skill-tree-driven onboarding. The constellation already auto-populates from Coach completions; the next step is using it as a "what should I learn next?" surface.
Built With
- anthropic
- appkit
- backboard
- claude
- cloudinary
- composio
- macos
- polarity
- sqlite
- swift
- xcode