Nora — Devpost story

Inspiration

Every AI product today forces a context switch. You're stuck in Photoshop, mid-slide, reading a paper — and the help lives in another tab. Open ChatGPT, screenshot, paste, switch back. The friction kills most of the use cases.

Existing tools sit at two extremes: chat-only overlays (you alt-tab to ask) and agents that take over your mouse (Operator, Computer Use). Both break flow. We wanted a middle ground:

AI points, human clicks. You learn the tool while it helps.

What it does

Hold ⌃Q anywhere on your Mac. Pick one of three things — they all share one memory.

  • Coach — points at the next click target while you keep your real mouse. Learn any app by doing it yourself.
  • Compose — type "send the participant waiver to imad" and it drafts the email, attaches the file, and sends. No Gmail tab, no compose button.
  • Chat — ask anything about what's on your screen right now, or recall something you did last week.

The three surfaces share working memory: people you've emailed, file aliases, style preferences per recipient, skills you've practiced. The second time you do something is faster than the first.

How we built it

Layer              Choice
-----------------  ------------------------------------------------------------
Language           Swift 6 (strict concurrency, @MainActor everywhere)
UI                 SwiftUI + AppKit (transparent always-on-top NSPanels, click-through overlays)
Hotkey             Carbon RegisterEventHotKey
Screen capture     ScreenCaptureKit with sharingType = .none so Nora never sees its own UI
Storage            SQLite via GRDB
Vision             Anthropic Computer Use beta on claude-sonnet-4-6
Detectors          claude-haiku-4-5 (intent, glossary, preference parsing)
Email send         Composio v3 /api/v3/tools/execute/GMAIL_SEND_EMAIL
Email attachments  Composio v3 3-step presigned-URL flow (/files/upload/request → PUT → s3key)
Image transforms   Cloudinary unsigned upload + URL transforms (e_upscale)
Persistent memory  Backboard memories + threads/messages with memory: "Auto"
Security audit     Polarity Paragon, five [paragon-fix] commits merged
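
The UI and screen-capture rows above can be sketched roughly as follows. This is a minimal illustration, not Nora's actual code: the function name makeOverlayPanel, the .floating window level, and the collection behavior are assumptions. Only the NSPanel base, the transparent click-through always-on-top behavior, and sharingType = .none come from the table.

```swift
import AppKit

// Hypothetical helper: builds a transparent, click-through, always-on-top
// panel that is excluded from screen capture.
@MainActor
func makeOverlayPanel(frame: NSRect) -> NSPanel {
    let panel = NSPanel(
        contentRect: frame,
        styleMask: [.borderless, .nonactivatingPanel],
        backing: .buffered,
        defer: false
    )
    panel.isOpaque = false                 // allow a see-through background
    panel.backgroundColor = .clear
    panel.hasShadow = false
    panel.level = .floating                // assumed level; stays above normal windows
    panel.ignoresMouseEvents = true        // clicks pass through to the app below
    panel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
    panel.sharingType = .none              // keep the overlay out of captured frames
    return panel
}
```

Setting the flag in one factory function is one way to make sure every overlay window gets it, since a single missed window would leak the UI back into the screenshots.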

Coach drives a per-tick vision loop with four prompt-cache breakpoints (system / tools / static text / initial image URL), keeping ~3,300 tokens cached and ~600 tokens new per tick. Compose bypasses vision entirely — it routes the user's one-line goal through a Haiku detector that returns a structured FollowUp (intent, recipient, subject, body), then resolves the recipient against a local SQLite people-cache (sub-ms hot path) backed by Backboard for cold starts. Chat's regex escalator routes recall queries to Backboard's auto-RAG and screen-aware questions to Anthropic vision.
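
A minimal sketch of the Compose path described above, assuming the Haiku detector replies with JSON carrying exactly the four FollowUp fields. PeopleResolver, ResolutionSource, and parseFollowUp are hypothetical names; the dictionary stands in for the SQLite people-cache and the coldLookup closure stands in for a Backboard query.

```swift
import Foundation

// Structured detector result; the four fields follow the list above.
struct FollowUp: Codable, Equatable {
    let intent: String
    let recipient: String
    let subject: String
    let body: String
}

// Where a resolved address came from (hypothetical, for illustration).
enum ResolutionSource: Equatable {
    case localCache
    case coldStart
}

// Cache-first lookup. The dictionary stands in for the SQLite people-cache
// and `coldLookup` stands in for Backboard; both are assumptions.
struct PeopleResolver {
    private var cache: [String: String]
    private let coldLookup: (String) -> String?

    init(seed: [String: String] = [:], coldLookup: @escaping (String) -> String?) {
        self.cache = seed
        self.coldLookup = coldLookup
    }

    // Returns the address and its source, or nil if nobody matches the hint.
    mutating func resolve(_ hint: String) -> (email: String, source: ResolutionSource)? {
        let key = hint.trimmingCharacters(in: .whitespaces).lowercased()
        if let hit = cache[key] {
            return (email: hit, source: ResolutionSource.localCache)
        }
        guard let found = coldLookup(key) else { return nil }
        cache[key] = found  // warm the cache so the next call skips the cold path
        return (email: found, source: ResolutionSource.coldStart)
    }
}

// Decodes the detector's JSON reply; returns nil on malformed output.
func parseFollowUp(_ json: String) -> FollowUp? {
    guard let data = json.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(FollowUp.self, from: data)
}
```

The first lookup for a new recipient takes the cold path and writes the result back, so a repeat of the same request is served from the local cache.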

Challenges we ran into

  • ScreenCaptureKit + transparent overlays. Making Nora never see its own UI required sharingType = .none on every overlay window — a flag that's barely documented and silently ignored if you set it on the wrong window class.
  • Self-signed code signing. Without a stable signing identity, every rebuild re-prompts Accessibility + Screen Recording. We shipped a "Nora Local Signing" cert in the login keychain so TCC grants survive incremental builds.
  • Composio's attachment protocol isn't in the public REST docs. The 3-step presigned-URL flow (request → PUT to S3 → reference s3key in the action body) only shows up in /api/v3/openapi.json. Reading the spec was faster than waiting on docs.
  • Backboard's chat() silently returned base-model answers until we sent memory: "Auto" alongside assistant_id. The default is "off". One line of payload, an hour of confusion.
  • Per-tick SQLite writes on the @MainActor hot path. Each write blocked Coach until it reached disk. Paragon's audit caught it; the fix was a single utility-priority Task.detached.
  • Two-pass click resolution. Toolbar icons (eyedropper, single-letter labels) were too small for the model to coordinate-pin. We added a Zoom: x,y,w,h instruction the model can emit, fed it through Cloudinary's e_upscale URL transform, and re-asked on the upscaled crop. Coordinates map back into the original image space.
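
The SQLite fix in the list above can be sketched like this. It is an illustration under assumptions, not the project's code: TickRecord, TickStore, and CoachLoop are hypothetical names, and the actor's in-memory array stands in for the real GRDB write. Only the utility-priority Task.detached and the @MainActor caller come from the write-up.

```swift
import Foundation

// Hypothetical per-tick record that Coach persists.
struct TickRecord: Sendable {
    let tick: Int
    let summary: String
}

// Hypothetical store. The array append stands in for the SQLite write;
// the actor serializes writes away from the main actor.
actor TickStore {
    private(set) var rows: [TickRecord] = []

    func append(_ record: TickRecord) {
        rows.append(record)
    }
}

@MainActor
final class CoachLoop {
    let store = TickStore()

    // Called at the end of each tick. Returns immediately; the write runs
    // in a detached, utility-priority task instead of on the main actor.
    @discardableResult
    func finishTick(_ record: TickRecord) -> Task<Void, Never> {
        let store = self.store
        return Task.detached(priority: .utility) {
            await store.append(record)
        }
    }
}
```

Returning the task handle is optional in practice; it is here so a caller or test can await the write when it needs to.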

Accomplishments that we're proud of

  • Three AI surfaces sharing one memory. "send hi to imad" the second time skips clarification entirely — sub-millisecond local cache hit, transparent Backboard fallback for cross-device.
  • Real Gmail attachments delivered server-side, not links pasted into the body — same protocol the Composio Python/TS SDKs use under the hood.
  • Two-pass zoom for tiny click targets. Vision returns Zoom: 1100,180,80,28 → Cloudinary upscale 4× → second pass nails the coordinate.
  • Steady-state Coach tick: ~4,000 tokens, ~$0.005, ~3.5–4.5s end-to-end.
  • Five real security fixes from Polarity Paragon's review — including a path-traversal in BackboardClient.assistantId, a force-unwrap crash in CloudinaryClient.cloudName, and the SQLite-on-main-thread Coach issue. Every commit body lists severity, file:line, and the "why."
  • Per-recipient email style preferences with contradiction resolution. "for imad, include occasional typos" coexists with "always sign Sarah's emails 'Cheers'"; a Haiku reconciler replaces directly contradicting rules but keeps compatible ones.
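
The two-pass zoom in the list above comes down to parsing the crop rectangle and undoing the upscale. A minimal sketch, assuming the rectangle is in original-screenshot pixels with a top-left origin and that the second pass reports coordinates in the upscaled crop; ZoomRegion, parseZoom, and mapToOriginal are hypothetical names.

```swift
import Foundation

// Crop rectangle in original-screenshot pixels (assumed top-left origin).
struct ZoomRegion: Equatable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
}

// Parses a line such as "Zoom: 1100,180,80,28"; returns nil for anything else.
func parseZoom(_ line: String) -> ZoomRegion? {
    let prefix = "Zoom:"
    let trimmed = line.trimmingCharacters(in: .whitespaces)
    guard trimmed.hasPrefix(prefix) else { return nil }
    let fields = trimmed.dropFirst(prefix.count)
        .split(separator: ",", omittingEmptySubsequences: false)
    guard fields.count == 4 else { return nil }
    let values = fields.compactMap { Double($0.trimmingCharacters(in: .whitespaces)) }
    guard values.count == 4, values[2] > 0, values[3] > 0 else { return nil }
    return ZoomRegion(x: values[0], y: values[1], width: values[2], height: values[3])
}

// Maps a point found in the upscaled crop back into original-image coordinates.
func mapToOriginal(x: Double, y: Double, in region: ZoomRegion, scale: Double) -> (x: Double, y: Double) {
    return (x: region.x + x / scale, y: region.y + y / scale)
}
```

For the Zoom: 1100,180,80,28 example at 4×, a hit at (160, 56) in the upscaled crop maps back to (1140, 194) in the original screenshot.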

What we learned

  • Vision-driving-your-mouse and chat-only Q&A are both wrong defaults. The middle ground (point, don't click) is a better product for most cases — users want to learn the tool, not be locked out of it.
  • Memory only works if you flip the right switch. memory: "Auto" turned Chat from "generic ChatGPT in a side panel" into "answers from your own history." That single field was the difference between a memory feature and the illusion of one.
  • Prompt caching pays for itself fast. Four breakpoints kept ~3,300 tokens cached per Coach tick — within the first session the cumulative savings easily covered the first-tick cache-write cost.
  • A real security audit catches things you'd never find by re-reading your own code. Five HIGH/MEDIUM findings in a project this small surprised us.
  • Regex first, LLM second. Path extraction, recipient hints, and chat recall escalation are all regex-driven. Faster than Haiku, deterministic, free.
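
A rough sketch of the regex-first routing described above. The patterns, the ChatRoute enum, and the plain fallback case are assumptions made up for illustration; the write-up only states that recall queries go to Backboard and screen-aware questions go to vision.

```swift
import Foundation

enum ChatRoute: Equatable {
    case recall  // answer from stored memory (Backboard in the write-up)
    case screen  // answer from the current screenshot (Anthropic vision)
    case plain   // hypothetical fallback when neither pattern matches
}

// Deterministic routing with no model call. Patterns are illustrative only.
func routeChat(_ query: String) -> ChatRoute {
    let recallPattern = #"\b(last (week|month|time)|yesterday|previously|remember|did i)\b"#
    let screenPattern = #"\b(on (my|the) screen|this (window|page|slide|button|icon)|what am i looking at)\b"#
    let options: String.CompareOptions = [.regularExpression, .caseInsensitive]

    if query.range(of: recallPattern, options: options) != nil {
        return .recall
    }
    if query.range(of: screenPattern, options: options) != nil {
        return .screen
    }
    return .plain
}
```

Checking recall before screen is a design choice in this sketch: a query that mentions both past activity and the current screen is treated as a memory lookup.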

What's next for Nora

  • More direct actions in Compose — calendar invites, Slack DMs, Notion pages, Linear issues. The detector + preview-then-confirm pattern generalizes; we just need more Composio actions wired.
  • Cross-device memory. Backboard already stores everything — picking up a half-finished session on a different Mac is mostly UI.
  • Voice mode. Was on the original plan; we cut it to fit the demo. Push-to-talk + Whisper STT + a TTS reply on the same memory backbone.
  • Adaptive Coach. Detect when the user did something different from what was suggested and adjust the next pointer instead of re-pointing at the same target.
  • Skill-tree-driven onboarding. The constellation already auto-populates from Coach completions; the next step is using it as a "what should I learn next?" surface.
