Inspiration

Modern work increasingly happens through web interfaces such as expense portals, CRM systems, government forms, and countless online dashboards. While these systems promise efficiency, users still spend hours repeating the same manual workflows. Traditional automation tools such as RPA are brittle and break when the UI changes.

We wanted to explore a different idea: what if AI could simply watch you perform a task once and then automate it for you?

What it does

ObserveAI observes a user completing a task in the browser and converts it into a reusable workflow. The system interprets the UI using multimodal analysis and DOM context, then repeats the workflow later, even if the interface changes slightly. A Ghost Mode preview shows the steps before execution, giving users confidence and control.

How we built it

ObserveAI combines generative AI reasoning with UI automation to create an adaptive workflow learning system.

We used Amazon Nova models to interpret UI context and user actions. Nova Act powers reliable browser actions like clicking, typing, and navigation. A lightweight React + TypeScript dashboard stores recorded workflows and displays Ghost Mode previews. Multimodal embeddings help the system understand both visual layout and DOM structure, making the automation resilient to UI changes.
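
To illustrate how visual and DOM signals might be combined to find an element again after a UI change, here is a minimal sketch. It is not the project's actual implementation: the `attrs` and `embedding` fields, the `best_match` helper, and the equal weighting are all assumptions made for illustration, and the embedding vectors are hand-written stand-ins for whatever a multimodal model would produce.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dom_overlap(recorded: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Fraction of the recorded DOM attributes that the candidate still matches."""
    if not recorded:
        return 0.0
    same = sum(1 for key, value in recorded.items() if candidate.get(key) == value)
    return same / len(recorded)


def best_match(
    recorded: Dict[str, Any],
    candidates: List[Dict[str, Any]],
    dom_weight: float = 0.5,  # hypothetical weighting, not taken from the project
) -> Dict[str, Any]:
    """Pick the candidate with the highest blended DOM + visual score."""
    def score(candidate: Dict[str, Any]) -> float:
        dom = dom_overlap(recorded["attrs"], candidate["attrs"])
        visual = cosine(recorded["embedding"], candidate["embedding"])
        return dom_weight * dom + (1 - dom_weight) * visual

    return max(candidates, key=score)


# Hypothetical data: the submit button's id changed after a redesign.
recorded = {
    "attrs": {"tag": "button", "id": "submit", "text": "Submit"},
    "embedding": [1.0, 0.0],
}
candidates = [
    {"name": "renamed_submit",
     "attrs": {"tag": "button", "id": "submit-btn", "text": "Submit"},
     "embedding": [0.9, 0.1]},
    {"name": "cancel",
     "attrs": {"tag": "button", "id": "cancel", "text": "Cancel"},
     "embedding": [0.0, 1.0]},
]
match = best_match(recorded, candidates)
```

In this sketch the renamed button still wins because two of its three attributes and its visual embedding stay close to the recording, which is the kind of resilience a pure selector-based replay would lack.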

Challenges we ran into

One of the biggest challenges was designing a system that balances automation with user trust.

Fully autonomous automation can feel risky, especially when interacting with real interfaces. To address this, we designed the Ghost Mode preview, allowing users to review and approve AI-generated workflows before execution.
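
The preview-then-approve idea can be sketched as follows. This is an illustrative outline under assumed names, not the project's code: the `(action, target, value)` step format and the `ghost_preview` and `run_if_approved` helpers are hypothetical, and `execute` stands in for whatever component actually drives the browser.

```python
from typing import Callable, List, Tuple

# Hypothetical step format: (action, target, value)
WorkflowStep = Tuple[str, str, str]


def ghost_preview(steps: List[WorkflowStep]) -> List[str]:
    """Render each step as a human-readable line without executing anything."""
    lines = []
    for number, (action, target, value) in enumerate(steps, start=1):
        if action == "type":
            lines.append(f'{number}. type "{value}" into {target}')
        elif action == "navigate":
            lines.append(f"{number}. navigate to {target}")
        else:
            lines.append(f"{number}. {action} {target}")
    return lines


def run_if_approved(
    steps: List[WorkflowStep],
    approved: bool,
    execute: Callable[[WorkflowStep], None],
) -> int:
    """Execute the steps only after explicit approval; return how many ran."""
    if not approved:
        return 0
    for step in steps:
        execute(step)
    return len(steps)


steps = [
    ("navigate", "https://example.com", ""),
    ("type", "#amount", "42"),
    ("click", "#submit", ""),
]
preview_lines = ghost_preview(steps)

executed: List[WorkflowStep] = []
skipped_count = run_if_approved(steps, approved=False, execute=executed.append)
```

Keeping the preview a pure function of the recorded steps means the user reviews exactly what would run, and nothing touches the real interface until approval is given.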

Another challenge was translating raw browser interactions into meaningful workflow steps that an AI model can interpret and execute reliably.
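
One way to picture this translation is a small pass that collapses low-level events into higher-level steps. The sketch below is an assumption-laden illustration rather than the project's pipeline: the event dictionaries, the `Step` dataclass, and the `events_to_steps` helper are hypothetical, and a real recorder would capture far richer context.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    action: str      # "click", "type", or "navigate"
    target: str      # CSS selector or URL (hypothetical representation)
    value: str = ""  # text typed, if any


def events_to_steps(events: List[Dict[str, str]]) -> List[Step]:
    """Collapse raw browser events into a shorter list of workflow steps."""
    steps: List[Step] = []
    for event in events:
        kind = event["type"]
        if kind == "keydown":
            # Merge consecutive keystrokes on the same element into one "type" step.
            last = steps[-1] if steps else None
            if last and last.action == "type" and last.target == event["selector"]:
                last.value += event["key"]
            else:
                steps.append(Step("type", event["selector"], event["key"]))
        elif kind == "click":
            steps.append(Step("click", event["selector"]))
        elif kind == "navigate":
            steps.append(Step("navigate", event["url"]))
        # Any other event type (mousemove, scroll, ...) is treated as noise here.
    return steps


raw_events = [
    {"type": "navigate", "url": "https://example.com/expenses"},
    {"type": "click", "selector": "#amount"},
    {"type": "keydown", "selector": "#amount", "key": "4"},
    {"type": "mousemove", "selector": "body"},
    {"type": "keydown", "selector": "#amount", "key": "2"},
    {"type": "click", "selector": "#submit"},
]
workflow = events_to_steps(raw_events)
```

Here six raw events reduce to four steps, with the two keystrokes merged into a single "type" step that a model or a reviewer can read at a glance.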

Accomplishments that we're proud of

We successfully built a system that can watch a workflow once and replay it autonomously. The Ghost Mode preview adds transparency, allowing users to verify steps before execution. Most importantly, the system demonstrates how AI agents can make automation accessible without complex scripting or setup.

What we learned

We learned that combining visual understanding with structured DOM data dramatically improves reliability for UI automation. We also discovered that users trust automation more when they can preview and validate the AI’s actions before execution.

What's next for ObserveAI?

In the future, we envision ObserveAI becoming a full AI workflow assistant capable of learning complex tasks across different applications and platforms.

Potential improvements include:

  1. Multimodal understanding of screens and documents
  2. Voice-driven workflow commands
  3. Adaptive learning from user feedback
  4. Enterprise integrations for CRM, finance, and HR systems

Built With

  • amazon-nova-2-lite
  • fastapi
  • javascript
  • mcp/github
  • nova-act
  • playwright
  • synthetic