Cursivis — Project Story

💡 Inspiration

Modern computing workflows are fragmented. Whenever we interact with digital content (reading an article, reviewing code, drafting emails, filling in forms, or analyzing screenshots), we end up switching between tools: search engines, AI chatbots, translators, documentation, and productivity apps.

This friction led to a simple question:

What if AI could understand what you selected on screen, infer your intent, and immediately perform the most useful action?

That idea became Cursivis.

Cursivis turns any selection on your screen into context for an AI agent. The user simply selects text, highlights an image region, or uses voice input, and the system automatically understands the task and performs the appropriate action.

The project evolved into a multimodal agentic system powered by Amazon Nova models that can reason about context, plan actions, and automate real workflows directly in the browser.


🚀 What Cursivis Does

Cursivis is an AI workflow agent that operates directly on what the user selects.

The key principle is:

$$ \text{Selection} = \text{Context} $$

$$ \text{Trigger} = \text{Intent} $$

$$ \text{AI Model} = \text{Intelligence} $$

Instead of asking users to manually prompt an AI, Cursivis infers the task automatically.

For example:

| User Selection | AI Action |
| --- | --- |
| Long article text | Summarize |
| Foreign language | Translate |
| Code snippet | Explain or debug |
| Screenshot | Analyze |
| Email text | Generate reply |
| Research notes | Expand into content |

The result is a context-aware AI assistant integrated directly into everyday workflows.
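
The selection-to-action examples above can be sketched as a static lookup. This is only an illustration: in Cursivis the action is inferred by the Nova model, not looked up, and the `SelectionKind` and `SuggestedAction` names and the `defaultAction` helper are hypothetical.

```typescript
// Hypothetical names for illustration; not taken from the Cursivis codebase.
type SelectionKind =
  | "long_article"
  | "foreign_language"
  | "code_snippet"
  | "screenshot"
  | "email_text"
  | "research_notes";

type SuggestedAction =
  | "summarize"
  | "translate"
  | "explain_or_debug"
  | "analyze"
  | "generate_reply"
  | "expand_into_content";

// One default action per selection kind, mirroring the examples listed above.
const DEFAULT_ACTIONS: Record<SelectionKind, SuggestedAction> = {
  long_article: "summarize",
  foreign_language: "translate",
  code_snippet: "explain_or_debug",
  screenshot: "analyze",
  email_text: "generate_reply",
  research_notes: "expand_into_content",
};

// Returns the default action for a selection that has already been classified.
function defaultAction(kind: SelectionKind): SuggestedAction {
  return DEFAULT_ACTIONS[kind];
}
```

A table like this could serve as a fallback when the model's answer is missing or malformed; the classification step itself is where the model does the real work.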


🧠 How Amazon Nova Powers Cursivis

The intelligence layer of Cursivis is powered by Amazon Nova models accessed through Amazon Bedrock.

Nova 2 Lite — Reasoning Engine

Nova 2 Lite handles:

  • intent inference
  • contextual reasoning
  • task planning
  • response generation

It acts as the core agent brain that decides what action should be performed.


Nova 2 Sonic — Voice Interaction

Nova 2 Sonic handles voice input, enabling real-time interaction in which users speak commands or ask questions about the selected content.

This allows Cursivis to function as a conversational AI workflow assistant.


Agentic Action Planning

Instead of only returning text, the Nova backend generates structured action plans such as:

  • browser automation steps
  • form autofill
  • research workflows
  • email drafting

These plans are executed by the browser execution layer, allowing Cursivis to automate real tasks.
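
A structured plan of this kind might look like the sketch below, which parses a model reply into typed steps and rejects anything it does not recognize. The schema (`ActionPlan`, `ActionStep`, the three step kinds) and the `parsePlan` helper are assumptions for illustration; the actual Cursivis plan format is not described in this write-up.

```typescript
// Hypothetical plan schema; the real Cursivis format may differ.
type ActionStep =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "insert_text"; text: string };

interface ActionPlan {
  intent: string;
  steps: ActionStep[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates one raw step and narrows it to a known step kind.
function parseStep(raw: unknown): ActionStep {
  if (!isRecord(raw)) throw new Error("step must be an object");
  const { kind, selector, value, text } = raw;
  if (kind === "click" && typeof selector === "string") {
    return { kind: "click", selector };
  }
  if (kind === "fill" && typeof selector === "string" && typeof value === "string") {
    return { kind: "fill", selector, value };
  }
  if (kind === "insert_text" && typeof text === "string") {
    return { kind: "insert_text", text };
  }
  throw new Error(`unsupported step: ${JSON.stringify(raw)}`);
}

// Parses the model's JSON reply; throws instead of executing an unknown step.
function parsePlan(json: string): ActionPlan {
  const data: unknown = JSON.parse(json);
  if (!isRecord(data)) throw new Error("plan must be an object");
  const { intent, steps } = data;
  if (typeof intent !== "string" || !Array.isArray(steps)) {
    throw new Error("plan needs an intent string and a steps array");
  }
  return { intent, steps: steps.map((step) => parseStep(step)) };
}
```

Validating before execution means a malformed or unexpected reply fails loudly rather than triggering a partial sequence of browser actions.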


🏗 How We Built It

Cursivis is built as a layered architecture combining desktop interaction, AI reasoning, and browser automation.

1️⃣ User Interaction Layer

Users interact with Cursivis through:

  • text selection
  • lasso image selection
  • voice input
  • Logitech MX Creative Console trigger
  • Orb UI and result panel

The trigger acts as an intent signal, telling the system to analyze the current context.


2️⃣ Companion Application

A Windows companion application (WPF / .NET) captures user context:

  • clipboard text
  • screen selections
  • screenshots
  • voice input
  • trigger events

It prepares structured context for the AI backend.
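
As a rough sketch, the structured context the backend receives might be shaped as follows. The companion app itself is WPF / .NET; the TypeScript below only illustrates a possible payload from the Node.js side, and every field name, the `TriggerSource` values, and the `buildContext` helper are assumptions.

```typescript
// Hypothetical payload shape; field names are assumptions for illustration.
type TriggerSource = "mx_creative_console" | "voice" | "other";

interface CapturedContext {
  trigger: TriggerSource;
  capturedAt: string; // ISO 8601 timestamp
  selectedText?: string;
  screenshotPngBase64?: string;
  voiceTranscript?: string;
}

interface CapturedParts {
  selectedText?: string;
  screenshotPngBase64?: string;
  voiceTranscript?: string;
}

// Trims text fields, drops empty ones, and rejects a capture with no content.
function buildContext(
  trigger: TriggerSource,
  parts: CapturedParts,
  capturedAt: Date
): CapturedContext {
  const ctx: CapturedContext = { trigger, capturedAt: capturedAt.toISOString() };
  const text = parts.selectedText?.trim();
  if (text) ctx.selectedText = text;
  if (parts.screenshotPngBase64) ctx.screenshotPngBase64 = parts.screenshotPngBase64;
  const voice = parts.voiceTranscript?.trim();
  if (voice) ctx.voiceTranscript = voice;
  if (!ctx.selectedText && !ctx.screenshotPngBase64 && !ctx.voiceTranscript) {
    throw new Error("nothing was captured");
  }
  return ctx;
}
```

Keeping empty fields out of the payload lets the backend tell "no screenshot" apart from "empty screenshot" without extra checks.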


3️⃣ Nova Agent Backend

The backend runs on Node.js and calls Amazon Bedrock through the AWS SDK.

Responsibilities include:

  • intent inference
  • multimodal analysis
  • reasoning
  • action planning
  • result generation

Nova models transform raw user context into structured responses and workflow actions.
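
How the backend might assemble a request is sketched below. The layout (`modelId`, `system`, `messages`, `inferenceConfig`, text and image content blocks) follows the Amazon Bedrock Converse API. The model ID placeholder, the system prompt, the inference settings, and the `buildConverseInput` helper are assumptions, and the SDK call itself is left out so the example stays self-contained.

```typescript
// Content blocks in the shape used by the Bedrock Converse API.
type NovaContentBlock =
  | { text: string }
  | { image: { format: "png" | "jpeg"; source: { bytes: Uint8Array } } };

interface ConverseInputSketch {
  modelId: string;
  system: { text: string }[];
  messages: { role: "user" | "assistant"; content: NovaContentBlock[] }[];
  inferenceConfig: { maxTokens: number; temperature: number };
}

// Placeholder: substitute the Nova model ID enabled in your Bedrock account.
const MODEL_ID = "<nova-model-id>";

// Assumed prompt wording; the prompt Cursivis actually uses is not documented here.
const SYSTEM_PROMPT =
  "Infer the user's intent from the selection and reply with a JSON action plan.";

// Combines whichever modalities were captured into a single user message.
function buildConverseInput(
  selectedText: string | undefined,
  screenshotPng: Uint8Array | undefined
): ConverseInputSketch {
  const content: NovaContentBlock[] = [];
  if (selectedText) {
    content.push({ text: `Selected text:\n${selectedText}` });
  }
  if (screenshotPng) {
    content.push({ image: { format: "png", source: { bytes: screenshotPng } } });
  }
  if (content.length === 0) throw new Error("no context to send");
  return {
    modelId: MODEL_ID,
    system: [{ text: SYSTEM_PROMPT }],
    messages: [{ role: "user", content }],
    inferenceConfig: { maxTokens: 1024, temperature: 0.2 },
  };
}
```

With the AWS SDK for JavaScript, an object of this shape would be passed to `ConverseCommand` from `@aws-sdk/client-bedrock-runtime` and sent with a `BedrockRuntimeClient`.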


4️⃣ Browser Execution Layer

Cursivis integrates with the browser through:

  • Chromium extension
  • localhost bridge
  • DOM extraction
  • automated UI actions

This enables the system to perform real operations such as:

  • clicking elements
  • filling forms
  • generating email drafts
  • inserting generated content
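
The operations listed above could be dispatched roughly as follows. The `BrowserStep` and `PageDriver` types are hypothetical, and `RecordingDriver` is a stand-in that only logs calls; in the real system the driver role would be played by the Chromium extension acting on the page.

```typescript
// Hypothetical step and driver types for illustration.
type BrowserStep =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "insert_text"; text: string };

interface PageDriver {
  click(selector: string): void;
  fill(selector: string, value: string): void;
  insertText(text: string): void;
}

// Runs each step in order against the driver and returns how many were executed.
function executeSteps(steps: BrowserStep[], driver: PageDriver): number {
  let executed = 0;
  for (const step of steps) {
    switch (step.kind) {
      case "click":
        driver.click(step.selector);
        break;
      case "fill":
        driver.fill(step.selector, step.value);
        break;
      case "insert_text":
        driver.insertText(step.text);
        break;
    }
    executed += 1;
  }
  return executed;
}

// Stand-in driver that records calls instead of touching a real page.
class RecordingDriver implements PageDriver {
  log: string[] = [];
  click(selector: string): void {
    this.log.push(`click ${selector}`);
  }
  fill(selector: string, value: string): void {
    this.log.push(`fill ${selector}=${value}`);
  }
  insertText(text: string): void {
    this.log.push(`insert ${text}`);
  }
}
```

Putting the page behind a small interface keeps the planning logic testable without a browser, which matters when the plan comes from a model and needs to be checked before it touches a live page.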

5️⃣ Output Layer

The system returns results via:

  • clipboard output
  • insert/replace text
  • email drafts
  • browser UI actions
  • visual feedback in the Cursivis UI

🔧 Challenges We Faced

1️⃣ Context Understanding

Selections can vary widely:

  • text
  • code
  • screenshots
  • structured documents

Designing a system that can reliably infer intent from minimal context was one of the biggest challenges.


2️⃣ Agent Decision Making

Instead of simple prompt responses, the AI had to decide:

  • what task the user likely wants
  • whether automation is needed
  • which execution path to use

This required building a structured action planning pipeline.


3️⃣ Multimodal Integration

Combining text, images, and voice input into a unified reasoning pipeline required careful design of the backend interface.


4️⃣ UI Automation Reliability

Browser automation must be stable and predictable.

The system prioritizes:

  1. executing actions in the current browser tab
  2. falling back to managed automation only when needed
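
This priority order can be sketched as a simple fallback loop. The `ExecutionPath` and `ExecutionOutcome` types and the `runWithFallback` helper are hypothetical, and the paths are synchronous here for brevity; real browser automation would almost certainly be asynchronous.

```typescript
// Hypothetical types for illustration.
type ExecutionOutcome =
  | { ok: true; via: string }
  | { ok: false; reason: string };

interface ExecutionPath {
  name: string;
  run(): boolean; // true on success; may also throw on failure
}

// Tries each path in order and stops at the first one that succeeds.
function runWithFallback(paths: ExecutionPath[]): ExecutionOutcome {
  const failures: string[] = [];
  for (const candidate of paths) {
    try {
      if (candidate.run()) return { ok: true, via: candidate.name };
      failures.push(`${candidate.name}: reported failure`);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${candidate.name}: ${message}`);
    }
  }
  return { ok: false, reason: failures.join("; ") };
}
```

Listing the current-tab path first and managed automation second reproduces the priority described above, and the collected failure reasons make it clear why a fallback was taken.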

📚 What We Learned

Building Cursivis taught us several important lessons:

AI Should Be Context-Driven

The most powerful AI experiences happen when the system understands user context automatically, instead of requiring prompts.


Agents Need Structure

Agentic systems work best when responses are structured plans, not just text.

This enables real-world automation.


Multimodal AI Unlocks Better Workflows

Combining text, images, and voice allows AI to operate much closer to how humans interact with computers.


🌍 Potential Impact

Cursivis demonstrates how AI agents can augment everyday digital workflows.

Possible future applications include:

  • research assistants
  • developer productivity tools
  • automated documentation workflows
  • accessibility tools for complex interfaces
  • knowledge management systems

By integrating AI directly into the user's environment, Cursivis moves toward a future where AI becomes an intelligent layer across all applications.


🔮 Future Work

Future improvements could include:

  • deeper Nova Act integration for complex workflow automation
  • cross-application agents
  • persistent context memory
  • collaborative agent workflows

🧠 Final Thought

Cursivis explores a simple but powerful idea:

AI should not wait for prompts — it should understand what you’re already doing.

By combining selection-based context, agentic reasoning, and UI automation, Cursivis turns everyday interactions into intelligent workflows powered by Amazon Nova.
