Inspiration

Arca Continental's sales teams spend hours every day manually re-entering purchase orders from client portals into internal systems. We asked: what if an AI could watch a human do this once and then do it autonomously, forever? That idea, borrowed from imitation learning in robotics, became ArcaVision.

What it does

ArcaVision watches a user complete a procurement workflow once, learns the field mappings, and then executes the entire process autonomously on new data. It opens the browser, navigates the portal, fills the forms, submits the order, and closes everything by itself. At the end, it generates encrypted financial reports and sends a branded email ticket automatically.

How we built it

The stack combines several AI and automation tools working together:

  • Claude Opus (Anthropic): analyzes screen recordings and audio to extract the workflow and generate a structured execution plan with Bayesian confidence scores.
  • Gemini (Google): supports multimodal analysis in the post-processing pipeline.
  • ElevenLabs: converts AI-generated feedback and status updates into voice output.
  • Groq + Whisper: fast audio transcription of the user's narration during recording.
  • browser-use + Playwright: browser agent that navigates portals via DOM, with no hardcoded CSS selectors.
  • FastAPI + SQLite: backend API and database storing sessions, learned plans, field mappings, orders, and errors.
  • Fernet encryption (AES-128-CBC + HMAC-SHA256): all financial data is encrypted before it is saved to the database or attached to emails.
  • Solana Devnet: SHA-256 order hashes anchored on-chain for tamper-proof audit trails.
  • ReportLab + openpyxl: automated PDF and Excel report generation.
  • Plotly + Monte Carlo simulation: economic impact dashboard with 95% confidence intervals on time and cost savings.
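
To illustrate the audit-trail item above, here is a minimal sketch of how an order could be reduced to a SHA-256 hash before anchoring. It assumes orders are plain dictionaries serialized as canonical JSON (sorted keys, compact separators); the function names and the order fields are hypothetical, and the step that writes the hash to Solana Devnet is left out.

```python
import hashlib
import json


def order_hash(order: dict) -> str:
    """Return the hex SHA-256 digest of a canonical JSON encoding of the order.

    Sorting keys and fixing separators makes the digest independent of
    dictionary insertion order, so the same order always hashes the same way.
    """
    canonical = json.dumps(
        order, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_order(order: dict, anchored_hash: str) -> bool:
    """Check a stored order against a previously anchored hash."""
    return order_hash(order) == anchored_hash


# Hypothetical order record, for illustration only.
example_order = {"sku": "X-1", "qty": 3, "total": "12.50"}
example_digest = order_hash(example_order)
```

Any later change to a field (for example, editing `qty`) produces a different digest, so comparing a recomputed hash with the anchored one is enough to detect tampering.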

Challenges we ran into

Tuali has no public API, so we built the agent to work via UI automation against a controlled sandbox instead of hitting production. Getting Claude's JSON responses to parse reliably was tricky, so we built a balanced-brace extractor that handles markdown fences, nested JSON, and trailing text. We also had to prevent the agent from entering infinite scroll loops and filter out the recorder's own actions from the learned workflow.
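
A balanced-brace extractor of the kind described above can be sketched as follows. This is an illustration under our own assumptions, not the project's actual code: the function name is hypothetical, and it only looks for a top-level JSON object (not a top-level array). It scans from each `{`, tracks nesting depth while skipping braces that appear inside string literals, and tries to parse the first balanced span.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first parseable top-level {...} object in text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Inside a string literal: braces do not count toward depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not valid JSON; try the next "{"
        start = text.find("{", start + 1)
    return None
```

Because the scan starts at a brace and stops at its matching close, markdown fences before the object and commentary after it are ignored without any special handling.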

Accomplishments that we're proud of

We got a full end-to-end pipeline working in under 24 hours: screen recording, autonomous browser execution, encrypted financial reports, and email delivery. The agent uses zero hardcoded selectors and improves with each run through Bayesian field mapping updates.
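
One simple way to realize a Bayesian field mapping update is a Beta-Bernoulli model, sketched below. This is an assumption on our part for illustration: the write-up does not specify the model, and the class name and the uniform Beta(1, 1) prior are hypothetical choices. Each run either confirms or contradicts a mapping, and the confidence is the posterior mean.

```python
from dataclasses import dataclass


@dataclass
class MappingBelief:
    """Beta(alpha, beta) belief that a source-field to form-field mapping is correct."""

    alpha: float = 1.0  # pseudo-count of confirmed runs (assumed uniform prior)
    beta: float = 1.0   # pseudo-count of contradicted runs

    def update(self, success: bool) -> None:
        """Fold one run's outcome into the belief."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def confidence(self) -> float:
        """Posterior mean probability that the mapping is correct."""
        return self.alpha / (self.alpha + self.beta)


# Three confirmed runs starting from the uniform prior.
belief = MappingBelief()
for outcome in (True, True, True):
    belief.update(outcome)
```

Starting from the uniform prior, three confirmed runs move the confidence from 0.5 to 4/5 = 0.8, and a single failure afterwards pulls it back to 4/6.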

What we learned

Imitation learning is surprisingly practical for enterprise UI automation when paired with a strong vision-language model. DOM-based navigation beats coordinate-clicking for reliability. And building for real enterprise constraints (closed systems, no APIs, compliance requirements) forces much better architecture decisions.

What's next for ArcaVision

Next, we plan to expand the workflow library across all of Arca Continental's client portals, add an active learning loop in which low-confidence mappings trigger human review, and integrate purchase history to move from automation to predictive ordering.
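
The human-review trigger in such an active learning loop could be as simple as a confidence cutoff. The sketch below is hypothetical: the 0.8 threshold, the function name, and the field names are placeholders for illustration, not values from the project.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.8  # hypothetical cutoff; would need tuning on real runs


def split_for_review(
    confidences: Dict[str, float], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Partition field mappings into (auto-approved, needs human review)."""
    auto = sorted(f for f, c in confidences.items() if c >= threshold)
    review = sorted(f for f, c in confidences.items() if c < threshold)
    return auto, review


# Hypothetical per-field confidence scores.
scores = {"sku": 0.95, "quantity": 0.91, "delivery_date": 0.62}
auto_fields, review_fields = split_for_review(scores)
```

Here only `delivery_date` falls below the cutoff, so it would be queued for a person to confirm while the other two fields run unattended.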
