Inspiration

Across India, many NGOs, sustainability startups, and workforce returners spend hours navigating legacy portals that lack APIs. I noticed talented people spending more time clicking through forms than creating real impact. Instead of building another chatbot, I wanted to explore how agentic AI could remove repetitive digital work directly from the screen. Nova CoPilot for Screens was inspired by a simple idea: automation should learn from real human workflows, not force users to design complex scripts.


What it does

Nova CoPilot for Screens is an agentic UI automation system powered by Amazon Nova. It observes browser activity, learns repetitive workflows using Nova 2 Lite reasoning and multimodal embeddings, and safely executes automation through Nova Act.

Key capabilities include:

  • Watch & Learn Mode to capture real workflows
  • Workflow discovery through reasoning and pattern clustering
  • Ghost Mode to preview actions with confidence scores
  • Safe automation that reduces manual effort and errors

Instead of brittle macros, Nova CoPilot builds explainable workflows that adapt to changing interfaces.


How we built it

The project follows a multi-agent architecture:

  • Observer Agent: Records DOM structure, navigation paths, and interaction patterns.
  • Planner Agent (Nova 2 Lite): Generalizes workflows into reusable task graphs.
  • Executor Agent (Nova Act): Executes UI actions reliably with adaptive selectors.
  • Nova Multimodal Embeddings: Align visual layout and semantic meaning for robust UI understanding.
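The three agents above can be sketched as a minimal pipeline of classes. This is an illustrative stand-in, not the project's actual code: `UIEvent`, `ObserverAgent`, `PlannerAgent`, and `ExecutorAgent` are hypothetical names, and the executor simply formats steps where the real system would call Nova Act.

```python
from dataclasses import dataclass

# Hypothetical event record captured by the Observer Agent.
@dataclass
class UIEvent:
    selector: str   # CSS selector of the element interacted with
    action: str     # e.g. "click", "type", "navigate"
    value: str = ""

class ObserverAgent:
    """Records raw interaction events from the browser session."""
    def __init__(self):
        self.events: list[UIEvent] = []

    def record(self, event: UIEvent) -> None:
        self.events.append(event)

class PlannerAgent:
    """Generalizes recorded events into a reusable task graph
    (simplified here to an ordered step list)."""
    def plan(self, events: list[UIEvent]) -> list[dict]:
        return [{"step": i, "action": e.action, "target": e.selector, "value": e.value}
                for i, e in enumerate(events)]

class ExecutorAgent:
    """Replays a planned task graph; a real system would drive Nova Act here."""
    def execute(self, task_graph: list[dict]) -> list[str]:
        return [f"{step['action']} -> {step['target']}" for step in task_graph]
```

The separation matters: the Observer never interprets, the Planner never touches the browser, and the Executor only runs plans it was handed, which keeps each stage independently testable.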

Tech Stack:

  • Frontend: React + Browser Extension
  • Backend: Node.js / Python services on AWS
  • AI Stack: Amazon Nova 2 Lite, Nova Act, Nova Multimodal Embeddings

Workflow pipeline:

Browser Recorder → Session Analysis → Agent Planning → Ghost Mode → Safe Automation
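The five pipeline stages compose naturally as functions, with Ghost Mode as a preview-only step before anything executes. A minimal sketch with hypothetical function names and a toy "noise" filter standing in for real session analysis:

```python
# Each stage takes the previous stage's output; nothing runs until approval.
def record_session(events):           # Browser Recorder
    return {"raw_events": events}

def analyze_session(session):         # Session Analysis (toy noise filter)
    session["patterns"] = [e for e in session["raw_events"] if e != "noise"]
    return session

def plan_workflow(session):           # Agent Planning (confidence is illustrative)
    return [{"action": p, "confidence": 0.9} for p in session["patterns"]]

def ghost_preview(plan):              # Ghost Mode: annotate for review, never execute
    return [dict(step, preview=True) for step in plan]

def execute_safely(plan, approved):   # Safe Automation: run only after user approval
    return [s["action"] for s in plan] if approved else []

def run_pipeline(events, approved):
    plan = plan_workflow(analyze_session(record_session(events)))
    ghost_preview(plan)               # surfaced to the user before execution
    return execute_safely(plan, approved)
```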


Challenges we ran into

One major challenge was building trust. Early prototypes felt too autonomous, which made users hesitant. Introducing Ghost Mode — where the agent previews actions without executing them — helped users feel in control.
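The gating behind Ghost Mode can be sketched in a few lines. The threshold value and field names below are assumptions for illustration: actions are previewed with their confidence scores, and only user-approved actions above a minimum confidence are queued for execution.

```python
CONFIDENCE_FLOOR = 0.75  # assumed threshold; tuned per deployment in practice

def preview_actions(actions):
    """Render a human-readable preview; nothing is executed here."""
    return [f"[{a['confidence']:.0%}] {a['description']}" for a in actions]

def approve_for_execution(actions, user_approved: bool):
    """Gate execution on both explicit user approval and per-action confidence."""
    if not user_approved:
        return []
    return [a for a in actions if a["confidence"] >= CONFIDENCE_FLOOR]
```

Keeping the preview and the gate as separate steps means a low-confidence action is still visible to the user, it just never reaches the executor.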

Another challenge was handling UI variability. Traditional automation breaks when layouts change. Using multimodal embeddings allowed the system to identify elements semantically rather than relying on fixed coordinates.
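Semantic element matching reduces to a nearest-neighbor lookup in embedding space. A toy sketch, with hand-made vectors standing in for Nova Multimodal Embeddings: instead of a stored coordinate, the system keeps the target element's embedding and, at run time, picks the on-screen candidate with the highest cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_element(target_embedding, candidates):
    """candidates: list of (element_id, embedding) pairs for the current page.
    Returns the id of the closest semantic match to the learned target."""
    return max(candidates, key=lambda c: cosine(target_embedding, c[1]))[0]
```

Because matching is by meaning rather than position, a button that moves or is restyled still resolves to the same element as long as its embedding stays close to the learned one.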

Balancing automation speed with safety checks was also critical, especially for workflows involving sensitive data.


Accomplishments that we're proud of

  • Successfully built a screen-native agent that learns workflows instead of relying on scripts.
  • Implemented Ghost Mode with confidence scoring to improve transparency.
  • Demonstrated end-to-end automation across multiple tools in a single flow.
  • Designed the system with real community use cases in mind, especially NGOs and workforce returners.

What we learned

Building with Amazon Nova showed that separating reasoning from execution improves reliability and safety. Multimodal embeddings made automation far more resilient than expected, and visual feedback helped non-technical users understand AI decisions.

We also learned that community impact requires simplicity — automation must feel approachable, not intimidating.


What's next for Nova CoPilot for Screens

Future plans include:

  • Voice-triggered workflows using Nova Sonic
  • Localization for regional language interfaces
  • Community pilots with NGOs and social enterprises
  • Expanding automation templates for ESG reporting and returnship programs

The long-term vision is to make agentic automation accessible to organizations that traditionally cannot afford complex RPA tools, helping them reclaim time for meaningful work.
