Inspiration

The current automation ecosystem is brittle, fragmented, and inaccessible to non-developers. Tools like Zapier and Make break down the moment a task needs screen-level interaction, while chat-based AI systems like ChatGPT cannot execute tasks on their own: they hallucinate, lose context, and lack real-world grounding.

Operari was inspired by this gap:

“What if we had intelligent agents that could use a browser like we do — scroll, click, type — and learn workflows just by watching a screen?”

From DeFi farming to cross-site research to community operations, we envisioned a future where the browser itself becomes an interface for orchestrating intelligent, flexible agents — without APIs, and without code.

What it does

Operari is a browser-native automation engine powered by CUAs (Computer-Using Agents). It allows users to:

  • Describe a workflow using natural language or screen recordings
  • Train Operari to repeat tasks across dashboards, forms, dApps, and websites
  • Execute those tasks just like a real user: clicking, scrolling, typing, navigating, and parsing UI
  • Securely log in to platforms via ephemeral session sandboxes
  • Create modular workflows that can be scheduled, reused, and even shared

This means no APIs, no brittle scripts, and no more dead-end automations.

How we built it

Operari is structured around a modular five-layer architecture:

Knowledge Layer — Pulls in web content, whitepapers, dashboards, and Twitter threads as context

Orchestration Layer — Translates video/text inputs into multistep execution plans with branching logic

Capability Layer — Houses the CUAs, which interpret pixel-level screen data and interact like users

Authentication Layer — Stores ephemeral login states (cookies, tokens) securely in sandboxed memory

Execution Layer — Spins up visual agents in isolated VM sessions for privacy and auditability

The CUAs don’t hallucinate — they see the screen, parse layouts visually, and adapt in real time. We built lightweight interfaces for users to submit workflows via screen recordings or prompts, and integrated secure browser sandboxes to prevent leakage or persistence.
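
To make the Orchestration Layer's "multistep execution plans with branching logic" more concrete, here is a minimal sketch of how such a plan could be represented and walked. It is an illustration under our own assumptions, not Operari's actual implementation: `Step`, `run_plan`, the step names, and the `login_form_visible` flag are all hypothetical, and the actions are only recorded rather than performed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One UI action in a plan (hypothetical structure, for illustration only)."""
    name: str
    action: str                       # e.g. "navigate", "click", "type"
    target: str                       # human-readable description of the UI element
    next_step: Optional[str] = None   # default successor; None ends the plan
    # Optional branch: if the predicate holds for the observed screen state,
    # the plan jumps to `branch_to` instead of `next_step`.
    condition: Optional[Callable[[dict], bool]] = None
    branch_to: Optional[str] = None


def run_plan(steps: dict, start: str, screen: dict, max_steps: int = 50) -> list:
    """Walk the plan and return the names of the steps visited, in order."""
    trace = []
    current = start
    while current is not None and len(trace) < max_steps:
        step = steps[current]
        trace.append(step.name)  # a real agent would perform the action here
        if step.condition is not None and step.condition(screen):
            current = step.branch_to
        else:
            current = step.next_step
    return trace


# Hypothetical plan: log in only if a login form is visible, then read a balance.
plan = {
    "open": Step("open", "navigate", "dashboard URL", next_step="check_login"),
    "check_login": Step(
        "check_login", "observe", "login form",
        next_step="read_balance",
        condition=lambda s: s.get("login_form_visible", False),
        branch_to="login",
    ),
    "login": Step("login", "type", "credential fields", next_step="read_balance"),
    "read_balance": Step("read_balance", "parse", "balance widget"),
}

print(run_plan(plan, "open", {"login_form_visible": True}))
print(run_plan(plan, "open", {"login_form_visible": False}))
```

The `max_steps` cap is a simple guard against plans that loop; a real orchestrator would likely also re-observe the screen after every action instead of using one fixed `screen` snapshot.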

Challenges we ran into

  • Screen variability: Different screen resolutions and layout shifts required dynamic element matching, not static selectors
  • Login flow handling: Securely storing credentials without persistent state forced us to build an encrypted, ephemeral auth engine
  • Pixel data processing: Interpreting interfaces visually, especially dynamic dashboards, meant developing OCR and DOM-agnostic interaction
  • User input fusion: Merging video training and prompt-based workflows was non-trivial and needed a robust orchestration layer
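
As an illustration of the screen variability point, dynamic element matching can be sketched as scoring each detected element by label similarity plus resolution-independent position, instead of relying on a fixed selector or pixel coordinate. This is a simplified sketch, not the matching logic Operari ships: `Element`, `match_element`, the 0.7/0.3 weights, and the 0.6 threshold are assumptions chosen for the example, and the element list stands in for whatever an OCR or vision step would produce.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Element:
    text: str   # label text, e.g. as read by OCR
    x: float    # center x in pixels
    y: float    # center y in pixels


def match_element(target_text, target_pos, candidates, screen_size, min_score=0.6) -> Optional[Element]:
    """Return the candidate that best matches the target, or None.

    target_pos is (x, y) as fractions of screen width and height, so the
    same recorded target can be reused at any resolution.
    """
    width, height = screen_size
    best, best_score = None, min_score
    for el in candidates:
        # Fuzzy label similarity in [0, 1], tolerant of case and small OCR errors.
        text_sim = SequenceMatcher(None, target_text.lower(), el.text.lower()).ratio()
        # Distance between normalized positions, turned into a similarity.
        dx = el.x / width - target_pos[0]
        dy = el.y / height - target_pos[1]
        pos_sim = max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5)
        score = 0.7 * text_sim + 0.3 * pos_sim  # assumed weights
        if score > best_score:
            best, best_score = el, score
    return best


# The same recorded target is located on two differently sized screens.
large = [Element("Connect Wallet", 1700, 60), Element("Swap", 960, 500)]
small = [Element("Connect Wallet", 1133, 40), Element("Swap", 640, 333)]
print(match_element("Connect wallet", (0.9, 0.05), large, (1920, 1080)))
print(match_element("Connect wallet", (0.9, 0.05), small, (1280, 720)))
```

Weighting text above position lets the match survive layout shifts that move a button, while the position term helps choose between several elements with the same label.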

Accomplishments that we're proud of

  • Built a first-of-its-kind authentication system that gives AI agents access to your logged-in state
  • Added support for multi-step workflows across Web3 tools and web apps such as Virtuals, DeFi dashboards, Notion, and Twitter
  • Designed and implemented a zero-persistence, sandboxed login system
  • Enabled no-code automation without ever touching an API — just show it or describe it once
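
The zero-persistence idea behind the login system can be illustrated with a small in-memory session store whose entries expire and are never written to disk. This is a sketch under stated assumptions, not the actual auth engine: `EphemeralSessionStore` and its methods are hypothetical names, and the encryption and VM sandboxing described above are deliberately left out.

```python
import time
from typing import Optional


class EphemeralSessionStore:
    """Keeps login state (cookies, tokens) in memory only, with a time limit.

    Hypothetical sketch: nothing here touches disk, and encryption is omitted.
    """

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._sessions = {}      # session_id -> (expires_at, cookies)

    def put(self, session_id: str, cookies: dict) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._sessions[session_id] = (self._clock() + self._ttl, dict(cookies))

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, cookies = entry
        if self._clock() >= expires_at:
            # Expired state is dropped on access rather than kept around.
            del self._sessions[session_id]
            return None
        return dict(cookies)

    def destroy(self, session_id: str) -> None:
        # Explicit teardown, e.g. when the agent's sandbox shuts down.
        self._sessions.pop(session_id, None)


store = EphemeralSessionStore(ttl_seconds=900)
store.put("run-1", {"session": "example-token"})
print(store.get("run-1"))
store.destroy("run-1")
print(store.get("run-1"))
```

Tying each session to a single run and destroying it on teardown keeps the user in control of how long an agent can act with their logged-in state.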

What we learned

  • The browser is the new OS for automation — APIs are helpful but not necessary
  • Users want flexibility without setup — every time we removed config steps, engagement went up
  • Context is not enough — execution is the real differentiator in AI agent design
  • Designing for ephemerality and user control is vital in automation, especially in Web3 and finance

What's next for Operari AI

Custom Workflow Builder — A modular drag-and-drop interface to chain Operari actions across tabs, apps, and platforms

Hosted CUAs — Monetizable, shareable agent templates (e.g. “Wallet Checker”, “Airdrop Hunter”, “Protocol Screener”)

Integration with Agent Ecosystems (ACP) — Operari will serve as the execution layer for agent frameworks via A2C or A2A models

Confluence-style Research Layer — Agents that summarize, synthesize, and act on browser content autonomously
