Inspiration
The current automation ecosystem is brittle, fragmented, and inaccessible to non-developers. Tools like Zapier and Make break down the moment you need screen-level interaction, while chat-based AI systems like ChatGPT can't actually execute tasks on their own: they hallucinate, lose context, and lack real-world grounding.
Operari was inspired by this gap:
“What if we had intelligent agents that could use a browser like we do — scroll, click, type — and learn workflows just by watching a screen?”
From DeFi farming to cross-site research to community operations, we envisioned a future where the browser itself becomes an interface for orchestrating intelligent, flexible agents — without APIs, and without code.
What it does
Operari is a browser-native automation engine powered by CUAs (Computer-Using Agents). It allows users to:
- Describe a workflow using natural language or screen recordings
- Train Operari to repeat tasks across dashboards, forms, dApps, and websites
- Execute those tasks just like a real user: clicking, scrolling, typing, navigating, and parsing UI
- Securely log in to platforms via ephemeral session sandboxes
- Create modular workflows that can be scheduled, reused, and even shared
This means no APIs, no brittle scripts, and no more dead-end automations.
How we built it
Operari is structured around a modular five-layer architecture:
Knowledge Layer — Pulls in web content, whitepapers, dashboards, and Twitter threads as context
Orchestration Layer — Translates video/text inputs into multistep execution plans with branching logic
Capability Layer — Houses the CUAs, which interpret pixel-level screen data and interact like users
Authentication Layer — Stores ephemeral login states (cookies, tokens) securely in sandboxed memory
Execution Layer — Spins up visual agents in isolated VM sessions for privacy and auditability
Because the CUAs act on what is actually rendered on screen, they are grounded in real UI state rather than guessing: they see the screen, parse layouts visually, and adapt in real time. We built lightweight interfaces for users to submit workflows via screen recordings or prompts, and integrated secure browser sandboxes to prevent credential leakage or session persistence.
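As a rough illustration of what a "multistep execution plan with branching logic" could look like, here is a minimal sketch. Every name in it (`Step`, `Plan`, `Screen`, and the simulated actions) is hypothetical and chosen for this example; it is not Operari's actual code, and the "screen" is a plain dictionary standing in for what a CUA would observe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for what an agent observes, e.g. {"title": "Login"}.
Screen = Dict[str, str]


@dataclass
class Step:
    name: str
    # Performs a (simulated) UI action and returns the resulting screen.
    action: Callable[[Screen], Screen]
    # Branching: choose the next step name from the resulting screen,
    # or return None to end the plan.
    next_step: Callable[[Screen], Optional[str]] = lambda screen: None


@dataclass
class Plan:
    steps: Dict[str, Step]
    start: str

    def run(self, screen: Screen, max_steps: int = 20) -> List[str]:
        """Execute steps until a branch returns None; return the step trace."""
        trace: List[str] = []
        current: Optional[str] = self.start
        while current is not None and len(trace) < max_steps:
            step = self.steps[current]
            screen = step.action(screen)
            trace.append(step.name)
            current = step.next_step(screen)
        return trace


# Simulated actions (assumptions for the demo, not real browser calls).
def open_site(screen: Screen) -> Screen:
    title = "Dashboard" if screen.get("session") == "active" else "Login"
    return {**screen, "title": title}


def log_in(screen: Screen) -> Screen:
    return {**screen, "session": "active", "title": "Dashboard"}


def read_balance(screen: Screen) -> Screen:
    return {**screen, "balance": "read"}


plan = Plan(
    steps={
        "open": Step("open", open_site,
                     lambda s: "login" if s["title"] == "Login" else "read"),
        "login": Step("login", log_in, lambda s: "read"),
        "read": Step("read", read_balance),
    },
    start="open",
)

trace_fresh = plan.run({})                         # no session: must log in
trace_logged_in = plan.run({"session": "active"})  # session exists: skip login
```

The same plan takes a different path depending on what the agent sees after each action, which is the kind of adaptivity a fixed script lacks.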
Challenges we ran into
- Screen variability: Different screen resolutions and layout shifts required dynamic element matching, not static selectors
- Login flow handling: Securely storing credentials without persistent state forced us to build an encrypted, ephemeral auth engine
- Pixel data processing: Interpreting interfaces visually, especially dynamic dashboards, meant developing OCR + DOM-agnostic interaction
- User input fusion: Merging video training and prompt-based workflows was non-trivial; needed a robust orchestration layer
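To make the "dynamic element matching, not static selectors" point concrete, here is a small sketch. It assumes an OCR pass has already produced text boxes with pixel coordinates; the `TextBox` type, the `find_click_point` helper, and the 0.8 similarity threshold are illustrative assumptions, not Operari's implementation. It fuzzy-matches a target label against the OCR text and returns a click point normalized to the screen size, so the result does not depend on resolution.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class TextBox:
    """One piece of text found by a (hypothetical) OCR pass, in pixels."""
    text: str
    x: int
    y: int
    w: int
    h: int


def find_click_point(
    boxes: List[TextBox],
    label: str,
    screen_size: Tuple[int, int],
    threshold: float = 0.8,
) -> Optional[Tuple[float, float]]:
    """Return the normalized (0..1) center of the box best matching `label`,
    or None if nothing is similar enough."""
    target = label.lower().strip()
    best, best_score = None, 0.0
    for box in boxes:
        score = SequenceMatcher(None, box.text.lower().strip(), target).ratio()
        if score > best_score:
            best, best_score = box, score
    if best is None or best_score < threshold:
        return None
    width, height = screen_size
    return ((best.x + best.w / 2) / width, (best.y + best.h / 2) / height)


# OCR output often contains small misreads ("1" for "l"); fuzzy matching
# tolerates them where an exact-text or CSS selector would fail.
boxes = [
    TextBox("Swap", 100, 40, 80, 40),
    TextBox("Connect Wa1let", 1700, 40, 160, 40),
]
point = find_click_point(boxes, "Connect Wallet", (1920, 1080))
missing = find_click_point(boxes, "Settings", (1920, 1080))
```

Returning normalized coordinates is one possible design choice; an agent could equally re-run detection on every frame and click in raw pixels.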
Accomplishments that we're proud of
- Built a first-of-its-kind authentication system that gives AI agents access to your logged-in state
- Operari now supports multi-step workflows across Web3 tools like Virtuals and DeFi dashboards, as well as Notion, Twitter, and more
- Designed and implemented a zero-persistence, sandboxed login system
- Enabled no-code automation without ever touching an API — just show it or describe it once
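The idea behind a zero-persistence login state can be sketched with a context manager that keeps cookies in memory only and discards them when the task ends. This is a simplified illustration under stated assumptions: `EphemeralSession` and `ephemeral_session` are hypothetical names, the sketch omits the encryption and VM isolation described above, and clearing a Python dict does not guarantee the underlying memory is zeroed.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class EphemeralSession:
    """Holds login state in memory only; nothing is written to disk."""

    def __init__(self, cookies: Dict[str, str]) -> None:
        self._cookies = dict(cookies)  # private copy, never persisted
        self.closed = False

    def cookie_header(self) -> str:
        """Build a Cookie header value for requests made during the task."""
        if self.closed:
            raise RuntimeError("session already destroyed")
        return "; ".join(f"{k}={v}" for k, v in self._cookies.items())

    def destroy(self) -> None:
        self._cookies.clear()
        self.closed = True


@contextmanager
def ephemeral_session(cookies: Dict[str, str]) -> Iterator[EphemeralSession]:
    session = EphemeralSession(cookies)
    try:
        yield session
    finally:
        # Runs even if the task raises, so state never outlives the task.
        session.destroy()


with ephemeral_session({"sid": "abc"}) as session:
    header = session.cookie_header()

# After the block, the login state is gone and cannot be reused.
try:
    session.cookie_header()
    reuse_blocked = False
except RuntimeError:
    reuse_blocked = True
```

Tying cleanup to the end of the `with` block means a crashed or cancelled workflow still drops its credentials.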
What we learned
- The browser is the new OS for automation — APIs are helpful but not necessary
- Users want flexibility without setup — every time we removed config steps, engagement went up
- Context is not enough — execution is the real differentiator in AI agent design
- Designing for ephemerality and user control is vital in automation, especially in Web3 and finance
What's next for Operari AI
Custom Workflow Builder — A modular drag-and-drop interface to chain Operari actions across tabs, apps, and platforms
Hosted CUAs — Monetizable, shareable agent templates (e.g. “Wallet Checker”, “Airdrop Hunter”, “Protocol Screener”)
Integration with Agent Ecosystems (ACP) — Operari will serve as the execution layer for agent frameworks via A2C or A2A models
Confluence-style Research Layer — Agents that summarize, synthesize, and act on browser content autonomously
Built With
- crewai
- openai
- python
- scrapybara