Deck Anti-Anti-Bot

Inspiration

Our project was inspired by the Deck Dropout Challenge, which asked participants to build an autonomous agent that could reliably withdraw a student from a hostile mock university portal. The challenge explicitly emphasized that modern websites are not designed to be automation-friendly, and that the real difficulty comes from friction, instability, and anti-bot defenses rather than from simple navigation.

What motivated us most was that the problem was not framed as a single-button-click task. The challenge required discovering hidden flows, interpreting unclear UI state, and staying resilient under adversarial conditions. It felt much closer to real-world automation engineering than typical scraping or form-filling problems.

We also wanted to explore whether modern vision models and lightweight semantic models could be combined with traditional browser automation to build something that behaves more like an adaptive agent than a brittle script.


What it does

Deck Anti-Anti-Bot is a state-driven autonomous agent that logs into a mock university portal, navigates the internal UI, handles blockers and confirmations, and completes a full withdrawal flow end to end.

At a high level the agent:

• Authenticates automatically with human-like interaction patterns
• Discovers and navigates enrollment and withdrawal flows dynamically
• Solves multiple forms of interactive friction, such as modals, verification steps, and multi-step confirmations
• Handles a vision-based CAPTCHA that uses semantic trickery rather than simple object detection
• Extracts and submits time-sensitive MFA codes from ephemeral UI notifications
• Completes a payment flow using DOM state verification and popup window control
• Verifies that the final portal state is Withdrawn

The entire system is implemented as a modular asynchronous state machine, where each state returns a boolean that controls whether the orchestrator proceeds to the next phase or aborts safely.


How we built it

1. State machine architecture

The core design is a linear but resilient state machine:

LOGIN → CAPTCHA → MFA → DASHBOARD → DROP COURSES → PAYMENT → WITHDRAWN

Each state is implemented as an async function with the signature:

execute(browser, optional services) → {True, False}

This structure makes the flow explicit, testable, and easy to instrument. Every state validates that the expected UI conditions are met before proceeding, rather than assuming that a click succeeded.
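The orchestration loop described above can be sketched as a minimal asyncio pipeline. The state names follow the diagram; the stub states and the `run_pipeline` helper are illustrative, not the project's actual code:

```python
import asyncio
from typing import Awaitable, Callable

# A state is an async callable that receives the browser controller and
# returns True to continue or False to abort the pipeline.
State = Callable[[object], Awaitable[bool]]

async def run_pipeline(browser: object, states: list[tuple[str, State]]) -> str:
    """Run states in order; stop at the first failure and report where."""
    for name, state in states:
        ok = await state(browser)
        if not ok:
            return f"ABORTED at {name}"
    return "WITHDRAWN"

# Stub states for illustration only; real states drive the browser.
async def login(browser): return True
async def captcha(browser): return True
async def mfa(browser): return False  # simulate a mid-pipeline failure
```

Because every state shares the same boolean contract, the orchestrator never needs to know what a state does internally, only whether it is safe to proceed.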


2. Browser abstraction layer

All interaction with Playwright is routed through a BrowserController abstraction that provides:

• human_type: per-character typing with jittered delays
• human_click: bounding-box-based clicking with random offsets
• random_delay: entropy injection to avoid deterministic timing
• screenshot: checkpoint observability
• get_all_text: page-wide text extraction for state verification

This isolates low level UI behavior from business logic and makes it possible to tune realism and performance centrally.
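Two of these helpers can be sketched with duck typing so the idea stands on its own. The `page` parameter is any object with an async `type(selector, text)` method (Playwright's Page exposes a compatible call); the exact delay bounds are assumptions:

```python
import asyncio
import random

async def random_delay(low: float = 0.05, high: float = 0.25) -> float:
    """Sleep for a random interval to break deterministic timing."""
    d = random.uniform(low, high)
    await asyncio.sleep(d)
    return d

async def human_type(page, selector: str, text: str) -> None:
    """Type one character at a time with a jittered delay between keys."""
    for ch in text:
        await page.type(selector, ch)
        await random_delay(0.01, 0.08)  # bounds chosen for illustration
```

Centralizing the jitter in `random_delay` means realism can be tuned in one place without touching any state logic.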


3. Resilient navigation and selectors

Instead of brittle CSS paths, the project relies on:

• Role-based selectors
• Text-based selectors with regex
• Attribute-prefix selectors such as id^="class-"
• Guarded waits such as :not([disabled])

Navigation steps always wait on semantic state changes, for example a button becoming enabled or a heading appearing, instead of relying on fixed sleep intervals.
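The selector patterns above can be captured in small string-building helpers. The helper names here are hypothetical, but the selector syntax they emit is standard CSS and Python regex:

```python
import re

def attr_prefix(attr: str, prefix: str) -> str:
    """Build a CSS attribute-prefix selector, e.g. [id^="class-"]."""
    return f'[{attr}^="{prefix}"]'

def enabled(css: str) -> str:
    """Guard a selector so a wait only matches an enabled element."""
    return f"{css}:not([disabled])"

# Text-based matching via regex survives whitespace and casing changes.
DROP_COURSE = re.compile(r"drop\s+course", re.IGNORECASE)
```

Selectors built this way survive cosmetic DOM churn because they anchor on stable prefixes and semantics rather than full CSS paths.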


4. MFA extraction pipeline

The MFA state implements a tiered extraction strategy:

  1. DOM first parsing of ephemeral toast notifications
  2. Regex scanning of page wide text
  3. OCR fallback using a vision service

This prioritizes speed and reliability while still handling purely visual edge cases.
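Tier 2 of the pipeline, the page-wide regex scan, can be sketched as follows. The six-digit code format is an assumption; tier 1 (DOM toast parsing) and tier 3 (OCR) are omitted here:

```python
import re
from typing import Optional

MFA_CODE = re.compile(r"\b\d{6}\b")  # assumed format: a six-digit code

def extract_mfa_code(page_text: str) -> Optional[str]:
    """Scan page-wide text for a code-shaped token; None if absent,
    which signals the caller to fall through to the OCR tier."""
    m = MFA_CODE.search(page_text)
    return m.group(0) if m else None
```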


5. Vision based CAPTCHA classification

The Sun CAPTCHA module is implemented as a standalone image classification pipeline.

A CLIP ViT-B/32 model is loaded locally and used to compute normalized embeddings for:

• Human-related text prompts
• Sun-related text prompts

For each CAPTCHA image, cosine similarity is computed against both prompt sets:

s_human = max(E_img · E_human^T), s_sun = max(E_img · E_sun^T)

The classification decision is:

is_human = (s_human > s_sun)

When CLIP is unavailable or confidence is low, the system falls back to a vision LLM that returns a strict JSON list of grid indices.

All images, results, and timings are written to timestamped debug folders to make failures reproducible.
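The similarity decision above reduces to max dot products over unit-norm vectors, which can be shown with plain Python. The embeddings here are synthetic stand-ins; real code would obtain normalized vectors from the ViT-B/32 model:

```python
def max_cosine(img: list[float], prompts: list[list[float]]) -> float:
    """Max dot product between one unit-norm image embedding and a set of
    unit-norm prompt embeddings; with unit vectors this equals cosine
    similarity, matching CLIP's normalized outputs."""
    return max(sum(a * b for a, b in zip(img, p)) for p in prompts)

def is_human(img, human_prompts, sun_prompts) -> bool:
    """Decision rule: the image is 'human' when it is closer to the
    human prompt set than to the sun prompt set."""
    return max_cosine(img, human_prompts) > max_cosine(img, sun_prompts)
```

Taking the max over each prompt set means a strong match to any single phrasing is enough, which is what makes the prompt-engineering side effective.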


6. Payment and popup orchestration

The payment state is implemented as a coordinator plus helper functions.

Key techniques include:

• Scroll-search loops to find CTAs without assuming position
• DOM state verification after every critical input
• Context-based popup capture using expect_page
• Temporary page swapping to reuse shared helpers
• Final confirmation validation before returning success

This allows the agent to safely handle multi-window flows without losing control of the browser context.
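Playwright's context.expect_page() awaits the new window; the temporary swap can then be expressed as a context manager. The `controller` here is any object with a `page` attribute, following this project's description rather than a library API:

```python
from contextlib import contextmanager

@contextmanager
def swapped_page(controller, popup):
    """Temporarily point the controller at a popup page so shared helpers
    (human_click, get_all_text, ...) operate on the new window, then
    restore the original page even if an error occurs."""
    original = controller.page
    controller.page = popup
    try:
        yield controller
    finally:
        controller.page = original
```

Restoring the original page in a finally block is what prevents a failed payment step from leaving later states pointed at a dead popup.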


Challenges we ran into

1. Ephemeral UI state

The MFA code appeared in a toast notification that could disappear in under two seconds. The solution required immediate screenshots, DOM first extraction, and multiple fallback layers.


2. DOM instability

Class names and layout structure changed across runs. This forced a move away from CSS path selectors toward role, text, and attribute prefix selectors.


3. Adversarial friction

The challenge intentionally introduced blockers that only triggered under automation. These did not fail deterministically, which required retry loops, state validation, and robust guards on every transition.


4. Vision classification reliability

The CAPTCHA was intentionally semantic rather than visual. The word Sun referred to a person rather than a star. This required prompt engineering and confidence based fallback logic instead of naive object detection.


5. Multi-window control

The payment flow spawned new tabs. Without explicit context management, Playwright would silently continue operating on the wrong page. This was solved using event based page capture and explicit page swapping.


What we learned

• Deterministic sleeps are the enemy of reliability
• State validation is more important than selector precision
• Vision models can replace brittle heuristics for semantic classification tasks
• Debug artifacts are not optional when building adversarial automation
• A clean state machine architecture drastically reduces debugging time


Closing note

Deck Anti-Anti-Bot is not a button-clicking script. It is a resilient autonomous agent that combines browser automation, semantic vision, and state-based control flow to operate reliably in hostile UI environments. The project demonstrates that modern automation engineering increasingly looks like systems engineering rather than scraping.

Built With

  • abstraction
  • aiohttp
  • anthropic-claude-vision-api
  • asyncio
  • browsercontroller
  • custom
  • openai-clip
  • pillow
  • playwright
  • python
  • python-dotenv
  • pytorch
  • regex