Cursivis — Project Story
💡 Inspiration
Modern computing workflows are fragmented. Whenever we work with digital content (reading an article, reviewing code, drafting emails, filling in forms, or analyzing screenshots), we keep switching between tools: search engines, AI chatbots, translators, documentation, and productivity apps.
This friction led to a simple question:
What if AI could understand what you selected on screen, infer your intent, and immediately perform the most useful action?
That idea became Cursivis.
Cursivis turns any selection on your screen into context for an AI agent. The user simply selects text, highlights an image region, or uses voice input, and the system automatically understands the task and performs the appropriate action.
The project evolved into a multimodal agentic system powered by Amazon Nova models that can reason about context, plan actions, and automate real workflows directly in the browser.
🚀 What Cursivis Does
Cursivis is an AI workflow agent that operates directly on what the user selects.
The key principle is:
$$ \text{Selection} = \text{Context} $$
$$ \text{Trigger} = \text{Intent} $$
$$ \text{AI Model} = \text{Intelligence} $$
Instead of asking users to manually prompt an AI, Cursivis infers the task automatically.
For example:
| User Selection | AI Action |
|---|---|
| Long article text | Summarize |
| Foreign language | Translate |
| Code snippet | Explain or debug |
| Screenshot | Analyze |
| Email text | Generate reply |
| Research notes | Expand into content |
The result is a context-aware AI assistant integrated directly into everyday workflows.
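To make the mapping in the table concrete, here is a minimal sketch of how a selection could be routed to a default action. It is only an illustration: in Cursivis the decision is made by a Nova model, whereas this stand-in uses crude keyword heuristics. The names `Selection`, `Action`, and `inferAction`, the regular expressions, and the thresholds are all assumptions made for this example.

```typescript
// Hypothetical types; not the actual Cursivis data model.
type Action =
  | "summarize"
  | "translate"
  | "explain_code"
  | "analyze_image"
  | "draft_reply"
  | "expand";

interface Selection {
  kind: "text" | "image";
  text?: string;
}

// Crude heuristics standing in for model-based intent inference.
function inferAction(selection: Selection): Action {
  if (selection.kind === "image") return "analyze_image";

  const text = (selection.text ?? "").trim();

  // Code-like punctuation or keywords suggest a code snippet.
  if (/[{};]|\b(function|def|class|import|const|return)\b/.test(text)) {
    return "explain_code";
  }

  // A greeting at the start suggests an email.
  if (/^(hi|hello|dear)\b/i.test(text)) return "draft_reply";

  // Mostly non-ASCII characters: treat as foreign-language text
  // (assumes an English-speaking user; 0.3 is an arbitrary threshold).
  let nonAscii = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 127) nonAscii++;
  }
  if (text.length > 0 && nonAscii / text.length > 0.3) return "translate";

  // Long text is summarized, short notes are expanded (150 words is arbitrary).
  return text.split(/\s+/).length > 150 ? "summarize" : "expand";
}
```

A rule-based pass like this could also serve as a cheap fallback when the model is unavailable, though that is a design idea and not something the project description states.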
🧠 How Amazon Nova Powers Cursivis
The intelligence layer of Cursivis is powered by Amazon Nova models via Amazon Bedrock.
Nova 2 Lite — Reasoning Engine
Nova 2 Lite handles:
- intent inference
- contextual reasoning
- task planning
- response generation
It acts as the core agent brain that decides what action should be performed.
Nova 2 Sonic — Voice Interaction
Nova 2 Sonic handles voice input, enabling real-time interaction in which users speak commands or ask questions about selected content.
This allows Cursivis to function as a conversational AI workflow assistant.
Agentic Action Planning
Instead of only returning text, the Nova backend generates structured action plans such as:
- browser automation steps
- form autofill
- research workflows
- email drafting
These plans are executed by the browser execution layer, allowing Cursivis to automate real tasks.
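One way to picture a structured action plan is as JSON that the backend validates before handing it to the execution layer. The sketch below assumes a hypothetical schema (`ActionPlan`, `PlanStep`, and the step types `click`, `fill`, `insert_text`); the actual plan format used by Cursivis is not described here, so treat every field name as illustrative.

```typescript
// Hypothetical plan schema; field names are illustrative only.
type PlanStep =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "insert_text"; text: string };

interface ActionPlan {
  goal: string;
  steps: PlanStep[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Check that a parsed value has the fields its step type requires.
function isPlanStep(v: unknown): v is PlanStep {
  if (!isRecord(v)) return false;
  switch (v.type) {
    case "click":
      return typeof v.selector === "string";
    case "fill":
      return typeof v.selector === "string" && typeof v.value === "string";
    case "insert_text":
      return typeof v.text === "string";
    default:
      return false;
  }
}

// Parse raw model output; return null instead of throwing on malformed plans.
function parseActionPlan(raw: string): ActionPlan | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;

  const goal = data.goal;
  const rawSteps = data.steps;
  if (typeof goal !== "string" || !Array.isArray(rawSteps)) return null;

  const steps: PlanStep[] = [];
  for (const step of rawSteps) {
    if (!isPlanStep(step)) return null; // reject the whole plan on any bad step
    steps.push(step);
  }
  return { goal, steps };
}
```

Rejecting the whole plan when a single step is malformed is a conservative choice for this sketch: a partially valid plan could leave a form half filled.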
🏗 How We Built It
Cursivis is built as a layered architecture combining desktop interaction, AI reasoning, and browser automation.
1️⃣ User Interaction Layer
Users interact with Cursivis through:
- text selection
- lasso image selection
- voice input
- Logitech MX Creative Console trigger
- Orb UI and result panel
The trigger acts as an intent signal, telling the system to analyze the current context.
2️⃣ Companion Application
A Windows companion application (WPF / .NET) captures user context:
- clipboard text
- screen selections
- screenshots
- voice input
- trigger events
It prepares structured context for the AI backend.
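As a rough illustration of what "structured context" might look like on the wire, the sketch below normalizes a raw capture into a JSON-friendly payload. The companion app itself is written in C#; this TypeScript version only models the payload shape as the backend might see it. `RawCapture`, `ContextPayload`, `buildContextPayload`, and the `MAX_TEXT_CHARS` limit are assumptions for the example, not the project's real contract.

```typescript
// Hypothetical capture and payload shapes; not the real Cursivis contract.
type Trigger = "hotkey" | "console_button" | "voice";
type Modality = "text" | "image" | "voice";

interface RawCapture {
  trigger: Trigger;
  clipboardText?: string;
  screenshotPngBase64?: string;
  voiceTranscript?: string;
}

interface ContextPayload {
  trigger: Trigger;
  modalities: Modality[];
  capturedAt: string; // ISO 8601 timestamp
  text?: string;
  imageBase64?: string;
  voiceTranscript?: string;
}

const MAX_TEXT_CHARS = 8000; // arbitrary cap to keep requests small

function buildContextPayload(raw: RawCapture, now: Date = new Date()): ContextPayload {
  const payload: ContextPayload = {
    trigger: raw.trigger,
    modalities: [],
    capturedAt: now.toISOString(),
  };

  // Include only the modalities that actually carry content.
  const text = (raw.clipboardText ?? "").trim();
  if (text.length > 0) {
    payload.text = text.slice(0, MAX_TEXT_CHARS);
    payload.modalities.push("text");
  }
  if (raw.screenshotPngBase64) {
    payload.imageBase64 = raw.screenshotPngBase64;
    payload.modalities.push("image");
  }
  const transcript = (raw.voiceTranscript ?? "").trim();
  if (transcript.length > 0) {
    payload.voiceTranscript = transcript;
    payload.modalities.push("voice");
  }
  return payload;
}
```

Listing the modalities explicitly lets the backend decide up front whether it needs multimodal analysis, without probing each optional field.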
3️⃣ Nova Agent Backend
The backend runs on Node.js and calls Amazon Bedrock through the AWS SDK.
Responsibilities include:
- intent inference
- multimodal analysis
- reasoning
- action planning
- result generation
Nova models transform raw user context into structured responses and workflow actions.
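The following sketch shows how user context could be turned into a request shaped like the input of the Bedrock Converse API (`modelId`, `system`, `messages`, `inferenceConfig`). The local types only mirror the parts of that shape used here, the system prompt wording is invented, and the model ID is left as a caller-supplied placeholder because the exact identifier used by Cursivis is not given. No network call is made.

```typescript
// Local types mirroring a subset of the Converse request shape.
interface TextBlock {
  text: string;
}
interface ImageBlock {
  image: { format: "png" | "jpeg"; source: { bytes: Uint8Array } };
}
type ContentBlock = TextBlock | ImageBlock;

interface ConverseRequest {
  modelId: string;
  system: TextBlock[];
  messages: Array<{ role: "user" | "assistant"; content: ContentBlock[] }>;
  inferenceConfig: { maxTokens: number; temperature: number };
}

// Hypothetical helper: builds the request but does not send it.
function buildIntentRequest(
  modelId: string, // placeholder; supply the real Nova model ID
  selectionText: string,
  screenshotPng?: Uint8Array
): ConverseRequest {
  const content: ContentBlock[] = [
    { text: "Selected content:\n" + selectionText },
  ];
  // Attach the screenshot only when one was captured.
  if (screenshotPng) {
    content.push({ image: { format: "png", source: { bytes: screenshotPng } } });
  }
  return {
    modelId,
    // Invented prompt wording, for illustration only.
    system: [
      { text: "Infer the most useful action for the selection. Reply with JSON only." },
    ],
    messages: [{ role: "user", content }],
    // Low temperature keeps structured output more predictable.
    inferenceConfig: { maxTokens: 512, temperature: 0 },
  };
}
```

In a real backend, an object like this would typically be passed to `ConverseCommand` from `@aws-sdk/client-bedrock-runtime` and sent with a `BedrockRuntimeClient`; whether Cursivis uses Converse or another Bedrock operation is an assumption here.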
4️⃣ Browser Execution Layer
Cursivis integrates with the browser through:
- Chromium extension
- localhost bridge
- DOM extraction
- automated UI actions
This enables the system to perform real operations such as:
- clicking elements
- filling forms
- generating email drafts
- inserting generated content.
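The operations above can be sketched as a small step runner that talks to the page through an abstract driver. This is a simplified, synchronous model: a real extension would work asynchronously through content scripts and the localhost bridge. `UiStep`, `PageDriver`, `runSteps`, and the in-memory `FakePage` are hypothetical names introduced for this example.

```typescript
// Hypothetical step and driver types for illustration.
type UiStep =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

interface PageDriver {
  click(selector: string): boolean; // false when the element is not found
  fill(selector: string, value: string): boolean;
}

interface StepResult {
  step: UiStep;
  ok: boolean;
}

// Run steps in order and stop at the first failure.
function runSteps(driver: PageDriver, steps: UiStep[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    const ok =
      step.kind === "click"
        ? driver.click(step.selector)
        : driver.fill(step.selector, step.value);
    results.push({ step, ok });
    if (!ok) break;
  }
  return results;
}

// In-memory stand-in for a page, useful for testing plans without a browser.
class FakePage implements PageDriver {
  clicked: string[] = [];
  values: { [selector: string]: string } = {};
  private known: { [selector: string]: boolean } = {};

  constructor(selectors: string[]) {
    for (const s of selectors) this.known[s] = true;
  }

  click(selector: string): boolean {
    if (!this.known[selector]) return false;
    this.clicked.push(selector);
    return true;
  }

  fill(selector: string, value: string): boolean {
    if (!this.known[selector]) return false;
    this.values[selector] = value;
    return true;
  }
}
```

Stopping at the first failed step avoids acting on a page that is not in the state the plan expected.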
5️⃣ Output Layer
The system returns results via:
- clipboard output
- insert/replace text
- email drafts
- browser UI actions
- visual feedback in the Cursivis UI.
🔧 Challenges We Faced
1️⃣ Context Understanding
Selections can vary widely:
- text
- code
- screenshots
- structured documents
Designing a system that can reliably infer intent from minimal context was one of the biggest challenges.
2️⃣ Agent Decision Making
Instead of simple prompt responses, the AI had to decide:
- what task the user likely wants
- whether automation is needed
- which execution path to use.
This required building a structured action planning pipeline.
3️⃣ Multimodal Integration
Combining text, images, and voice input into a unified reasoning pipeline required careful design of the backend interface.
4️⃣ UI Automation Reliability
Browser automation must be stable and predictable.
The system prioritizes:
- executing actions in the current browser tab
- falling back to managed automation only when needed.
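The prioritization above amounts to a try-then-fall-back pattern. The sketch below is a minimal version under stated assumptions: both execution paths are modeled as plain synchronous functions, and `Executor`, `ExecResult`, and `executeWithFallback` are hypothetical names, not part of the Cursivis codebase.

```typescript
// Hypothetical executor abstraction for illustration.
interface ExecResult {
  ok: boolean;
  detail: string;
}
type Executor = (task: string) => ExecResult;

interface RoutedResult extends ExecResult {
  path: "current_tab" | "managed";
}

// Try the current tab first; use managed automation only if that fails or throws.
function executeWithFallback(
  task: string,
  currentTab: Executor,
  managed: Executor
): RoutedResult {
  let first: ExecResult = { ok: false, detail: "not attempted" };
  try {
    first = currentTab(task);
  } catch (err) {
    first = { ok: false, detail: String(err) };
  }
  if (first.ok) {
    return { path: "current_tab", ok: true, detail: first.detail };
  }
  const second = managed(task);
  return { path: "managed", ok: second.ok, detail: second.detail };
}
```

Returning the chosen path alongside the result makes it easy to log how often the fallback is actually needed.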
📚 What We Learned
Building Cursivis taught us several important lessons:
AI Should Be Context-Driven
The most powerful AI experiences happen when the system understands user context automatically, instead of requiring prompts.
Agents Need Structure
Agentic systems work best when responses are structured plans, not just text.
This enables real-world automation.
Multimodal AI Unlocks Better Workflows
Combining text, images, and voice allows AI to operate much closer to how humans interact with computers.
🌍 Potential Impact
Cursivis demonstrates how AI agents can augment everyday digital workflows.
Possible future applications include:
- research assistants
- developer productivity tools
- automated documentation workflows
- accessibility tools for complex interfaces
- knowledge management systems.
By integrating AI directly into the user's environment, Cursivis moves toward a future where AI becomes an intelligent layer across all applications.
🔮 Future Work
Future improvements could include:
- deeper Nova Act integration for complex workflow automation
- cross-application agents
- persistent context memory
- collaborative agent workflows.
🧠 Final Thought
Cursivis explores a simple but powerful idea:
AI should not wait for prompts — it should understand what you’re already doing.
By combining selection-based context, agentic reasoning, and UI automation, Cursivis turns everyday interactions into intelligent workflows powered by Amazon Nova.
Built With
- actions
- amazon
- amazon-web-services
- apis
- bedrock
- c#
- chromium
- css
- express.js
- extension
- git
- html
- iam
- javascript
- lite
- logitech
- node.js
- nova
- playwright
- sdk
- sonic
- typescript
- v3)
- wpf