Nova | Devpost

Inspiration

I've always been frustrated by how siloed AI tools are — you open a chat window, describe your problem from scratch, and get a generic answer that has no idea what's actually on your screen. I wanted to build something that felt like a genuinely smart friend sitting next to you at your computer: one that could see what you're looking at, remember what you'd been working on all day, and respond in a way that was visible and physical — not just text in a box.

What It Does

Nova is a floating macOS AI assistant that activates with a global hotkey (Ctrl+Option) from any app. It captures your screen, passes it to Gemini 2.5 Flash along with a rolling memory of your session, and responds by:

Speaking an answer out loud
Animating a cursor to the most relevant element on screen
Sliding in a glassmorphic response panel for anything worth reading
Updating a live Context Web — a graph that maps your session and draws edges between queries that Gemini determines are genuinely semantically related

How I Built It

The app is written in Swift using SwiftUI and native macOS frameworks — ScreenCaptureKit for screen capture, CGEvent for cursor animation, AVSpeechSynthesizer for text-to-speech, and NSVisualEffectView for the glassmorphic UI. The AI backbone is Gemini 2.5 Flash, which handles vision, conversation, and session graph reasoning all in a single call.
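To illustrate the cursor animation step, here is a minimal sketch of computing an eased path between two screen points. It is not Nova's actual code: the `Point` type, the `easedPath` function, and the choice of smoothstep easing are all assumptions for illustration. On macOS each point would then be fed to whatever moves the cursor (for example a CGEvent mouse-move event), which is left out here so the sketch only needs Foundation.

```swift
import Foundation

// Hypothetical stand-in for CGPoint so the sketch has no CoreGraphics dependency.
struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Returns `steps + 1` points from `start` to `end`, eased with smoothstep
/// so the motion starts and ends slowly instead of jumping linearly.
func easedPath(from start: Point, to end: Point, steps: Int) -> [Point] {
    precondition(steps > 0, "steps must be positive")
    return (0...steps).map { i -> Point in
        let t = Double(i) / Double(steps)
        let eased = t * t * (3 - 2 * t)   // smoothstep: 0 at t = 0, 1 at t = 1
        return Point(x: start.x + (end.x - start.x) * eased,
                     y: start.y + (end.y - start.y) * eased)
    }
}

// Example: five points from the origin to (100, 50).
let demoPath = easedPath(from: Point(x: 0, y: 0), to: Point(x: 100, y: 50), steps: 4)
print(demoPath.map { "(\($0.x), \($0.y))" }.joined(separator: " "))
```

Keeping the path computation separate from the event posting makes the easing easy to test without touching the real cursor.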

Every Gemini response is a structured action payload: spoken text, display text, cursor coordinates, and metadata for the Context Web, including which prior session entries are semantically connected to the current query and why.
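A payload like this could be modeled with Codable types along the following lines. This is a sketch under assumptions, not Nova's real schema: the type names (`ActionPayload`, `CursorTarget`, `ContextLink`), the snake_case JSON keys, and the choice of which fields are optional are all hypothetical.

```swift
import Foundation

// Hypothetical link to a prior session entry, with the model's stated reason.
struct ContextLink: Codable, Equatable {
    let entryIndex: Int
    let reason: String
}

// Hypothetical cursor target in screen coordinates.
struct CursorTarget: Codable, Equatable {
    let x: Double
    let y: Double
}

// Assumed payload shape: the field names are illustrative only.
struct ActionPayload: Codable {
    let spokenText: String
    let displayText: String?   // nil when nothing is worth showing in the panel
    let cursor: CursorTarget?  // nil when no on-screen element is relevant
    let links: [ContextLink]
}

/// Decodes a JSON string with snake_case keys into an ActionPayload.
func decodePayload(_ json: String) throws -> ActionPayload {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(ActionPayload.self, from: Data(json.utf8))
}

let sampleJSON = """
{"spoken_text": "The error is on line 12.",
 "display_text": null,
 "cursor": {"x": 640, "y": 212},
 "links": [{"entry_index": 0, "reason": "Same stack trace as the earlier query"}]}
"""

do {
    let payload = try decodePayload(sampleJSON)
    print(payload.spokenText, payload.links.count)
} catch {
    print("Malformed payload: \(error)")
}
```

Decoding into typed fields means a malformed response fails loudly at one place instead of surfacing later as a missing cursor move or an empty panel.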

Challenges

The biggest challenge was making the Context Web actually meaningful. An early version connected session entries by closeness in time and keyword overlap, which looked active but was mostly noise. The breakthrough was delegating the connection logic entirely to Gemini: on each query, it reviews the indexed session history, decides which prior entries are related, and returns a specific reason for each link. That turned the graph from decoration into a useful map of the session.
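The division of labor described above can be sketched as follows: the model proposes links by entry index, and plain code stores them after checking that each index refers to a real earlier entry. The `ContextWeb`, `SessionEntry`, and `Edge` types are hypothetical names for illustration, and the validation rule is an assumption rather than Nova's documented behavior.

```swift
import Foundation

struct SessionEntry {
    let index: Int
    let query: String
}

struct Edge: Equatable {
    let from: Int      // the newer entry
    let to: Int        // the prior entry it relates to
    let reason: String // the model's stated reason for the link
}

struct ContextWeb {
    private(set) var entries: [SessionEntry] = []
    private(set) var edges: [Edge] = []

    /// Appends a query and links it to the prior entries the model named.
    @discardableResult
    mutating func add(query: String, links: [(index: Int, reason: String)]) -> Int {
        let newIndex = entries.count
        entries.append(SessionEntry(index: newIndex, query: query))
        var seen = Set<Int>()
        for link in links {
            // Skip indices that do not point at an earlier entry, and duplicates.
            guard link.index >= 0, link.index < newIndex,
                  seen.insert(link.index).inserted else { continue }
            edges.append(Edge(from: newIndex, to: link.index, reason: link.reason))
        }
        return newIndex
    }
}

var demoWeb = ContextWeb()
demoWeb.add(query: "Why does this build fail?", links: [])
demoWeb.add(query: "What is this linker flag?",
            links: [(index: 0, reason: "Follow-up on the same build error"),
                    (index: 7, reason: "Refers to an entry that does not exist")])
print(demoWeb.edges.count)
```

Validating indices in code keeps a hallucinated reference from drawing an edge to nowhere, while the judgment about relatedness stays with the model.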

What I Learned

Gemini 2.5 Flash's 1M-token context window is underutilized in most apps. You can pass a full screen capture, rich session history, and a detailed system prompt in a single call, and in my experience the quality of reasoning improved with the amount of relevant context provided. The real design challenge isn't the model's capacity; it's deciding what to ask the model to reason about versus what to handle in code.
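As one example of work that can stay in code, here is a sketch of assembling the text portion of a request from a system prompt, an indexed session history, and the current query. The `buildPrompt` function, the prompt layout, and the character budget (a crude stand-in for a token count) are assumptions for illustration, not Nova's actual prompt format.

```swift
import Foundation

/// Builds the text part of a request. Keeps only the most recent history
/// entries that fit `historyBudget` characters, and labels each with its
/// original index so the model can refer back to entries by number.
func buildPrompt(system: String, history: [String], query: String, historyBudget: Int) -> String {
    var kept: [(index: Int, text: String)] = []
    var used = 0
    // Walk backwards so the newest entries survive when the budget is tight.
    for (index, text) in history.enumerated().reversed() {
        guard used + text.count <= historyBudget else { break }
        used += text.count
        kept.append((index: index, text: text))
    }
    // Restore chronological order before formatting.
    let lines: [String] = kept.reversed().map { "[\($0.index)] \($0.text)" }
    let header: [String] = [system, "Session history:"]
    let footer: [String] = ["Current query: \(query)"]
    return (header + lines + footer).joined(separator: "\n")
}

let demoPrompt = buildPrompt(system: "You are Nova.",
                             history: ["Asked about a build error", "Asked about a linker flag"],
                             query: "What should I try next?",
                             historyBudget: 10_000)
print(demoPrompt)
```

Trimming and indexing are deterministic, so they are cheap to do locally; the model is only asked for the part that needs judgment.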

Built With

2.5
avspeechsynthesizer
cgevent
flash
gemini
macos
nsvisualeffectview
screencapturekit
swift
swiftui

Updates

Renil Gupta started this project — Apr 12, 2026 06:44 PM EDT
