ARCHITECTURE

Cursivis — Project Story

💡 Inspiration

Modern computing workflows are fragmented. Whenever we work with digital content (reading an article, reviewing code, drafting emails, filling forms, or analyzing screenshots), we end up switching between tools: search engines, AI chatbots, translators, documentation, and productivity apps.

This friction led to a simple question:

What if AI could understand what you selected on screen, infer your intent, and immediately perform the most useful action?

That idea became Cursivis.

Cursivis turns any selection on your screen into context for an AI agent. The user simply selects text, highlights an image region, or uses voice input, and the system automatically understands the task and performs the appropriate action.

The project evolved into a multimodal agentic system powered by Amazon Nova models that can reason about context, plan actions, and automate real workflows directly in the browser.

🚀 What Cursivis Does

Cursivis is an AI workflow agent that operates directly on what the user selects.

The key principle is:

$$ \text{Selection} = \text{Context} $$

$$ \text{Trigger} = \text{Intent} $$

$$ \text{AI Model} = \text{Intelligence} $$

Instead of asking users to manually prompt an AI, Cursivis infers the task automatically.

For example:

User Selection	AI Action
Long article text	Summarize
Foreign language	Translate
Code snippet	Explain or debug
Screenshot	Analyze
Email text	Generate reply
Research notes	Expand into content

The result is a context-aware AI assistant integrated directly into everyday workflows.
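The table above can be read as a lookup from a content category to a default action. Here is a minimal TypeScript sketch of that idea, assuming the category itself is produced by the model. The names `ContentCategory`, `DefaultAction`, and `defaultActionFor` are hypothetical and only illustrate the mapping; they are not taken from the Cursivis codebase.

```typescript
// Hypothetical category and action names that mirror the table rows.
type ContentCategory =
  | "long_article"
  | "foreign_language"
  | "code_snippet"
  | "screenshot"
  | "email_text"
  | "research_notes";

type DefaultAction =
  | "summarize"
  | "translate"
  | "explain_or_debug"
  | "analyze"
  | "generate_reply"
  | "expand";

const DEFAULT_ACTIONS: Record<ContentCategory, DefaultAction> = {
  long_article: "summarize",
  foreign_language: "translate",
  code_snippet: "explain_or_debug",
  screenshot: "analyze",
  email_text: "generate_reply",
  research_notes: "expand",
};

// Returns the default action, or null when the category is not recognised,
// so the caller can fall back to asking the user.
function defaultActionFor(category: string): DefaultAction | null {
  return Object.prototype.hasOwnProperty.call(DEFAULT_ACTIONS, category)
    ? DEFAULT_ACTIONS[category as ContentCategory]
    : null;
}
```

Returning `null` for unknown categories keeps the fallback decision explicit instead of silently picking an action.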

🧠 How Amazon Nova Powers Cursivis

The intelligence layer of Cursivis is powered by Amazon Nova models via Amazon Bedrock.

Nova 2 Lite — Reasoning Engine

Nova 2 Lite handles:

intent inference
contextual reasoning
task planning
response generation

It acts as the core agent brain that decides what action should be performed.
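As an illustration of intent inference, the sketch below builds a request object shaped like the input of the Bedrock Converse API (`modelId`, `system`, `messages`, `inferenceConfig`) and parses a one-word reply. The intent labels, the system prompt wording, and the helper names are assumptions, and the model ID is left to the caller because the exact Nova 2 Lite identifier is not given here.

```typescript
// Hypothetical intent labels; the real set used by Cursivis is not documented here.
const INTENTS = ["summarize", "translate", "explain_code", "draft_reply", "expand"] as const;
type Intent = (typeof INTENTS)[number];

// Plain-data shape following the Converse API input fields.
interface ConverseRequest {
  modelId: string;
  system: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
  inferenceConfig: { maxTokens: number; temperature: number };
}

// Builds a request asking the model to answer with exactly one intent label.
function buildIntentRequest(modelId: string, selectedText: string): ConverseRequest {
  return {
    modelId,
    system: [
      {
        text:
          "Infer the most useful action for the user's selection. " +
          `Reply with exactly one of: ${INTENTS.join(", ")}.`,
      },
    ],
    messages: [{ role: "user", content: [{ text: selectedText }] }],
    // A low temperature and a small token budget keep the reply short and stable.
    inferenceConfig: { maxTokens: 16, temperature: 0 },
  };
}

// Maps the raw model reply to a known intent, or null if it is not recognised.
function parseIntent(reply: string): Intent | null {
  const cleaned = reply.trim().toLowerCase();
  const match = INTENTS.find((intent) => intent === cleaned);
  return match ?? null;
}
```

In a Node.js backend, an object like this could be passed to `ConverseCommand` from `@aws-sdk/client-bedrock-runtime`, and the reply text read from `output.message.content` in the response.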

Nova 2 Sonic — Voice Interaction

Nova 2 Sonic handles voice input, enabling real-time interaction in which users speak commands or ask questions about the selected content.

This allows Cursivis to function as a conversational AI workflow assistant.

Agentic Action Planning

Instead of only returning text, the Nova backend generates structured action plans such as:

browser automation steps
form autofill
research workflows
email drafting

These plans are executed by the browser execution layer, allowing Cursivis to automate real tasks.
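To show what a structured action plan might look like, here is a hedged sketch that parses a model reply as JSON and validates each step before anything is executed. The step types (`click`, `fill`, `navigate`, `draft_email`) and their fields are hypothetical. They are chosen to match the plan kinds listed above, not copied from the actual Cursivis schema.

```typescript
// Hypothetical plan steps; field names are illustrative assumptions.
type PlanStep =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "navigate"; url: string }
  | { type: "draft_email"; to: string; subject: string; body: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates one raw step and returns a typed copy, throwing on any mismatch.
function parseStep(raw: unknown): PlanStep {
  if (!isRecord(raw)) throw new Error("step must be an object");
  const obj: Record<string, unknown> = raw;
  const str = (key: string): string => {
    const value = obj[key];
    if (typeof value !== "string") throw new Error(`step.${key} must be a string`);
    return value;
  };
  switch (obj["type"]) {
    case "click":
      return { type: "click", selector: str("selector") };
    case "fill":
      return { type: "fill", selector: str("selector"), value: str("value") };
    case "navigate":
      return { type: "navigate", url: str("url") };
    case "draft_email":
      return { type: "draft_email", to: str("to"), subject: str("subject"), body: str("body") };
    default:
      throw new Error(`unknown step type: ${String(obj["type"])}`);
  }
}

// Parses the model's JSON reply into a list of validated steps.
function parsePlan(json: string): PlanStep[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("plan must be a JSON array");
  return parsed.map(parseStep);
}
```

Rejecting unknown step types up front means a malformed plan fails before any browser action runs.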

🏗 How We Built It

Cursivis is built as a layered architecture combining desktop interaction, AI reasoning, and browser automation.

1️⃣ User Interaction Layer

Users interact with Cursivis through:

text selection
lasso image selection
voice input
Logitech MX Creative Console trigger
Orb UI and result panel

The trigger acts as an intent signal, telling the system to analyze the current context.

2️⃣ Companion Application

A Windows companion application (WPF / .NET) captures user context:

clipboard text
screen selections
screenshots
voice input
trigger events

It prepares structured context for the AI backend.
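The structured context could be a small JSON envelope. The sketch below describes it from the backend's point of view in TypeScript, even though the companion app itself is written in .NET. Every field name (`trigger`, `capturedAt`, `selection`, `voiceTranscript`) is an assumption made for illustration.

```typescript
// Hypothetical envelope describing what was captured and how it was triggered.
type Trigger = "console_button" | "voice";

type CapturedSelection =
  | { kind: "text"; text: string }
  | { kind: "image"; mimeType: "image/png" | "image/jpeg"; base64: string };

interface ContextEnvelope {
  trigger: Trigger;
  capturedAt: string; // ISO 8601 timestamp
  selection: CapturedSelection;
  voiceTranscript?: string;
}

// Builds an envelope, rejecting empty selections and dropping blank transcripts.
function makeEnvelope(
  trigger: Trigger,
  selection: CapturedSelection,
  capturedAt: Date,
  voiceTranscript?: string,
): ContextEnvelope {
  if (selection.kind === "text" && selection.text.trim() === "") {
    throw new Error("empty text selection");
  }
  if (selection.kind === "image" && selection.base64 === "") {
    throw new Error("empty image selection");
  }
  const envelope: ContextEnvelope = {
    trigger,
    capturedAt: capturedAt.toISOString(),
    selection,
  };
  const transcript = voiceTranscript?.trim();
  if (transcript) envelope.voiceTranscript = transcript;
  return envelope;
}
```

Tagging the selection with a `kind` field lets the backend branch on the modality without inspecting the payload.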

3️⃣ Nova Agent Backend

The backend runs on Node.js and calls Amazon Bedrock through the AWS SDK.

Responsibilities include:

intent inference
multimodal analysis
reasoning
action planning
result generation

Nova models transform raw user context into structured responses and workflow actions.

4️⃣ Browser Execution Layer

Cursivis integrates with the browser through:

Chromium extension
localhost bridge
DOM extraction
automated UI actions

This enables the system to perform real operations such as:

clicking elements
filling forms
generating email drafts
inserting generated content.
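A minimal sketch of how such operations might be executed: steps run in order against an injected driver, and execution stops at the first step that fails. The `UiStep` and `PageDriver` shapes are hypothetical. In a real extension content script the driver would presumably wrap DOM calls such as `document.querySelector`.

```typescript
// Hypothetical UI steps and driver interface, for illustration only.
type UiStep =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string };

interface PageDriver {
  // Each method returns false when no matching element is found.
  click(selector: string): boolean;
  fill(selector: string, value: string): boolean;
}

interface RunResult {
  completed: number; // number of steps that succeeded
  failedAt: number | null; // index of the first failing step, if any
}

function applyStep(step: UiStep, driver: PageDriver): boolean {
  switch (step.type) {
    case "click":
      return driver.click(step.selector);
    case "fill":
      return driver.fill(step.selector, step.value);
  }
}

// Runs steps in order; findIndex stops at the first failure,
// so later steps are never attempted on a page in an unexpected state.
function runSteps(steps: UiStep[], driver: PageDriver): RunResult {
  const failedAt = steps.findIndex((step) => !applyStep(step, driver));
  return failedAt === -1
    ? { completed: steps.length, failedAt: null }
    : { completed: failedAt, failedAt };
}
```

Injecting the driver keeps the sequencing logic testable without a browser.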

5️⃣ Output Layer

The system returns results via:

clipboard output
insert/replace text
email drafts
browser UI actions
visual feedback in the Cursivis UI.

🔧 Challenges We Faced

1️⃣ Context Understanding

Selections can vary widely:

text
code
screenshots
structured documents

Designing a system that can reliably infer intent from minimal context was one of the biggest challenges.

2️⃣ Agent Decision Making

Instead of simple prompt responses, the AI had to decide:

what task the user likely wants
whether automation is needed
which execution path to use.

This required building a structured action planning pipeline.

3️⃣ Multimodal Integration

Combining text, images, and voice input into a unified reasoning pipeline required careful design of the backend interface.
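One way to picture a unified pipeline is to normalize every input into the same list of content blocks before it reaches the model. The sketch below maps text, image, and voice inputs to blocks shaped like Converse API content blocks (`{ text }` and `{ image: { format, source: { bytes } } }`). Treating voice as an already transcribed string is a simplifying assumption, and the `ModalInput` type is hypothetical.

```typescript
// Hypothetical tagged inputs coming from the capture layer.
type ModalInput =
  | { kind: "text"; text: string }
  | { kind: "image"; format: "png" | "jpeg"; bytes: Uint8Array }
  | { kind: "voice"; transcript: string };

// Content blocks shaped like those accepted by the Converse API.
type ContentBlock =
  | { text: string }
  | { image: { format: "png" | "jpeg"; source: { bytes: Uint8Array } } };

// Converts mixed inputs into one ordered list of content blocks.
function toContentBlocks(inputs: ModalInput[]): ContentBlock[] {
  return inputs.map((input): ContentBlock => {
    switch (input.kind) {
      case "text":
        return { text: input.text };
      case "image":
        return { image: { format: input.format, source: { bytes: input.bytes } } };
      case "voice":
        // Assumption: speech has already been transcribed upstream.
        return { text: `Spoken instruction: ${input.transcript}` };
    }
  });
}
```

With a single block list, the rest of the backend does not need to know which modality an input came from.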

4️⃣ UI Automation Reliability

Browser automation must be stable and predictable.

The system prioritizes:

executing actions in the current browser tab
falling back to managed automation only when needed.
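The priority order above can be sketched as a small decision function. The inputs it relies on (whether the extension bridge is connected, the active tab URL, and an optional required origin) are assumptions about what "when needed" could mean, not a description of the actual Cursivis logic.

```typescript
type ExecutionPath = "current_tab" | "managed";

// Hypothetical snapshot of what the backend knows about the browser.
interface BrowserState {
  extensionConnected: boolean;
  activeTabUrl: string | null;
}

// True when the URL belongs to the given origin (scheme + host, no trailing slash).
function isOnOrigin(url: string, origin: string): boolean {
  return url === origin || url.startsWith(origin + "/");
}

// Prefers the current tab; falls back to managed automation only when the
// extension is unavailable or the active tab is on the wrong site.
function choosePath(state: BrowserState, requiredOrigin: string | null): ExecutionPath {
  if (!state.extensionConnected || state.activeTabUrl === null) return "managed";
  if (requiredOrigin !== null && !isOnOrigin(state.activeTabUrl, requiredOrigin)) {
    return "managed";
  }
  return "current_tab";
}
```

Matching on `origin + "/"` rather than a bare prefix avoids treating a lookalike host such as `https://mail.example.com.evil.test` as the required site.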

📚 What We Learned

Building Cursivis taught us several important lessons:

AI Should Be Context-Driven

The most powerful AI experiences happen when the system understands user context automatically, instead of requiring prompts.

Agents Need Structure

Agentic systems work best when responses are structured plans, not just text.

This enables real-world automation.

Multimodal AI Unlocks Better Workflows

Combining text, images, and voice allows AI to operate much closer to how humans interact with computers.

🌍 Potential Impact

Cursivis demonstrates how AI agents can augment everyday digital workflows.

Possible future applications include:

research assistants
developer productivity tools
automated documentation workflows
accessibility tools for complex interfaces
knowledge management systems.

By integrating AI directly into the user's environment, Cursivis moves toward a future where AI becomes an intelligent layer across all applications.

🔮 Future Work

Future improvements could include:

deeper Nova Act integration for complex workflow automation
cross-application agents
persistent context memory
collaborative agent workflows.

🧠 Final Thought

Cursivis explores a simple but powerful idea:

AI should not wait for prompts — it should understand what you’re already doing.

By combining selection-based context, agentic reasoning, and UI automation, Cursivis turns everyday interactions into intelligent workflows powered by Amazon Nova.