Inspiration

Across India, many NGOs, sustainability startups, and workforce returners spend hours navigating legacy portals that lack APIs. I noticed talented people spending more time clicking through forms than creating real impact. Instead of building another chatbot, I wanted to explore how agentic AI could remove repetitive digital work directly from the screen. Nova CoPilot for Screens was inspired by a simple idea: automation should learn from real human workflows, not force users to design complex scripts.


What it does

Nova CoPilot for Screens is an agentic UI automation system powered by Amazon Nova. It observes browser activity, learns repetitive workflows using Nova 2 Lite reasoning and multimodal embeddings, and safely executes automation through Nova Act.

Key capabilities include:

  • Watch & Learn Mode to capture real workflows
  • Workflow discovery through reasoning and pattern clustering
  • Ghost Mode to preview actions with confidence scores
  • Safe automation that reduces manual effort and errors

Instead of brittle macros, Nova CoPilot builds explainable workflows that adapt to changing interfaces.


How we built it

The project follows a multi-agent architecture:

  • Observer Agent: Records DOM structure, navigation paths, and interaction patterns.
  • Planner Agent (Nova 2 Lite): Generalizes workflows into reusable task graphs.
  • Executor Agent (Nova Act): Executes UI actions reliably with adaptive selectors.
  • Nova Multimodal Embeddings: Align visual layout and semantic meaning for robust UI understanding.
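The three agents above can be sketched as a minimal pipeline of classes. This is an illustrative stand-in, not the project's actual code: `UIEvent`, `ObserverAgent`, `PlannerAgent`, and `ExecutorAgent` are hypothetical names, and the executor simply formats steps where the real system would call Nova Act.

```python
from dataclasses import dataclass

# Hypothetical event record captured by the Observer Agent.
@dataclass
class UIEvent:
    selector: str   # CSS selector of the element interacted with
    action: str     # e.g. "click", "type", "navigate"
    value: str = ""

class ObserverAgent:
    """Records raw interaction events from the browser session."""
    def __init__(self):
        self.events: list[UIEvent] = []

    def record(self, event: UIEvent) -> None:
        self.events.append(event)

class PlannerAgent:
    """Generalizes recorded events into a reusable task graph
    (simplified here to an ordered step list)."""
    def plan(self, events: list[UIEvent]) -> list[dict]:
        return [{"step": i, "action": e.action, "target": e.selector, "value": e.value}
                for i, e in enumerate(events)]

class ExecutorAgent:
    """Replays a planned task graph; a real system would drive Nova Act here."""
    def execute(self, task_graph: list[dict]) -> list[str]:
        return [f"{step['action']} -> {step['target']}" for step in task_graph]
```

The separation matters: the Observer never interprets, the Planner never touches the browser, and the Executor only runs plans it was handed, which keeps each stage independently testable.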

Tech Stack:

  • Frontend: React + Browser Extension
  • Backend: Node.js / Python services on AWS
  • AI Stack: Amazon Nova 2 Lite, Nova Act, Nova Multimodal Embeddings

Workflow pipeline:

Browser Recorder → Session Analysis → Agent Planning → Ghost Mode → Safe Automation
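The five pipeline stages compose naturally as functions, with Ghost Mode as a preview-only step before anything executes. A minimal sketch with hypothetical function names and a toy "noise" filter standing in for real session analysis:

```python
# Each stage takes the previous stage's output; nothing runs until approval.
def record_session(events):           # Browser Recorder
    return {"raw_events": events}

def analyze_session(session):         # Session Analysis (toy noise filter)
    session["patterns"] = [e for e in session["raw_events"] if e != "noise"]
    return session

def plan_workflow(session):           # Agent Planning (confidence is illustrative)
    return [{"action": p, "confidence": 0.9} for p in session["patterns"]]

def ghost_preview(plan):              # Ghost Mode: annotate for review, never execute
    return [dict(step, preview=True) for step in plan]

def execute_safely(plan, approved):   # Safe Automation: run only after user approval
    return [s["action"] for s in plan] if approved else []

def run_pipeline(events, approved):
    plan = plan_workflow(analyze_session(record_session(events)))
    ghost_preview(plan)               # surfaced to the user before execution
    return execute_safely(plan, approved)
```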


Challenges we ran into

One major challenge was building trust. Early prototypes felt too autonomous, which made users hesitant. Introducing Ghost Mode — where the agent previews actions without executing them — helped users feel in control.
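The gating behind Ghost Mode can be sketched in a few lines. The threshold value and field names below are assumptions for illustration: actions are previewed with their confidence scores, and only user-approved actions above a minimum confidence are queued for execution.

```python
CONFIDENCE_FLOOR = 0.75  # assumed threshold; tuned per deployment in practice

def preview_actions(actions):
    """Render a human-readable preview; nothing is executed here."""
    return [f"[{a['confidence']:.0%}] {a['description']}" for a in actions]

def approve_for_execution(actions, user_approved: bool):
    """Gate execution on both explicit user approval and per-action confidence."""
    if not user_approved:
        return []
    return [a for a in actions if a["confidence"] >= CONFIDENCE_FLOOR]
```

Keeping the preview and the gate as separate steps means a low-confidence action is still visible to the user, it just never reaches the executor.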

Another challenge was handling UI variability. Traditional automation breaks when layouts change. Using multimodal embeddings allowed the system to identify elements semantically rather than relying on fixed coordinates.
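Semantic element matching reduces to a nearest-neighbor lookup in embedding space. A toy sketch, with hand-made vectors standing in for Nova Multimodal Embeddings: instead of a stored coordinate, the system keeps the target element's embedding and, at run time, picks the on-screen candidate with the highest cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_element(target_embedding, candidates):
    """candidates: list of (element_id, embedding) pairs for the current page.
    Returns the id of the closest semantic match to the learned target."""
    return max(candidates, key=lambda c: cosine(target_embedding, c[1]))[0]
```

Because matching is by meaning rather than position, a button that moves or is restyled still resolves to the same element as long as its embedding stays close to the learned one.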

Balancing automation speed with safety checks was also critical, especially for workflows involving sensitive data.


Accomplishments that we're proud of

  • Successfully built a screen-native agent that learns workflows instead of relying on scripts.
  • Implemented Ghost Mode with confidence scoring to improve transparency.
  • Demonstrated end-to-end automation across multiple tools in a single flow.
  • Designed the system with real community use cases in mind, especially NGOs and workforce returners.

What we learned

Building with Amazon Nova showed that separating reasoning from execution improves reliability and safety. Multimodal embeddings made automation far more resilient than expected, and visual feedback helped non-technical users understand AI decisions.

We also learned that community impact requires simplicity — automation must feel approachable, not intimidating.


What's next for Nova CoPilot for Screens

Future plans include:

  • Voice-triggered workflows using Nova Sonic
  • Localization for regional language interfaces
  • Community pilots with NGOs and social enterprises
  • Expanding automation templates for ESG reporting and returnship programs

The long-term vision is to make agentic automation accessible to organizations that traditionally cannot afford complex RPA tools, helping them reclaim time for meaningful work.
