Inspiration

My core inspiration was a simple thought. AI agents are getting good at deciding what to do, but there is no standardized way to control whether they should actually do it. Most teams building with agents rely on guardrails made of hardcoded if-statements or manual approval queues. The gap between "the agent decided to move $5,000" and "the money left the account" is where things break down. I built OK Beamer to fill that gap with a formal control plane. It combines deterministic business rules with Gemini 3-powered contextual reasoning and applies them consistently before any AI agent action executes.

What it does

OK Beamer sits between AI agents and the real-world actions they attempt to take. When an agent proposes an action -- creating a payment link, sending an email, posting a Slack message -- OK Beamer intercepts it. It then evaluates the action through a deterministic policy engine and an LLM-powered risk assessor (Gemini 3) and gates the outcome: allow it to execute, require human confirmation, or block it entirely.

The core mechanism is a graduated autonomy model. A human operator sets a baseline autonomy level (0 through 4), and the system enforces it. Gemini 3 can dynamically reduce the effective autonomy for a specific action when it detects elevated risk or intent drift, but it can never raise it above what the operator has set. Deterministic policies always take precedence over the LLM -- if a policy says block, the action is blocked regardless of what Gemini recommends. The system connects to real external services (Stripe for payments, Slack for messaging, Resend for email) and executes against their APIs when configured, or falls back to simulation when keys are absent. Every action, decision, and execution is logged to an audit trail with full reasoning attached.
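The "reduce but never increase" rule can be sketched as a small clamp. This is a minimal illustration rather than the project's actual code: the names effectiveAutonomy, AutonomyLevel, and the numeric clamp are assumptions, while the 0 to 4 range and the {0, -1, -2} adjustments come from this writeup.

```typescript
// Hypothetical types: the writeup states levels 0 through 4 and adjustments of 0, -1, or -2.
type AutonomyLevel = 0 | 1 | 2 | 3 | 4;

// Returns the autonomy that applies to one action.
// `adjustment` is whatever the LLM asked for; it is clamped into [-2, 0] first,
// so a positive request can never lift the result above the operator's baseline.
// Assumes `adjustment` is a finite number.
function effectiveAutonomy(baseline: AutonomyLevel, adjustment: number): AutonomyLevel {
  const capped = Math.min(0, Math.max(-2, Math.trunc(adjustment)));
  const lowered = Math.max(0, baseline + capped); // floor at level 0
  return lowered as AutonomyLevel;
}
```

For example, a baseline of 3 with an adjustment of -1 yields 2, and a (malformed) adjustment of +1 leaves the baseline unchanged.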

OK Beamer answers the question: "How do you let AI agents do useful work without giving them unchecked access to production systems?"

How I built it

The Control Plane: A Next.js application with server-side route handlers.

The Pipeline: Actions hit a /api/decide endpoint that runs a three-stage pipeline:

  1. Deterministic policy evaluation.
  2. Gemini 3 risk assessment (via Zod-validated JSON).
  3. A decision resolver applying strict precedence rules.
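The third stage can be sketched as a resolver that checks the policy verdict first and only then looks at autonomy. This is a hedged sketch under assumptions: the type names, the decidedBy field, and the AUTO_EXECUTE_LEVEL threshold are all hypothetical, since the writeup does not say which autonomy level permits unattended execution.

```typescript
type Outcome = "allow" | "confirm" | "block";

interface PolicyResult { verdict: Outcome; reasons: string[] }
interface RiskReview { riskScore: number; autonomyAdjustment: 0 | -1 | -2 }
interface Decision { outcome: Outcome; effectiveAutonomy: number; decidedBy: "policy" | "autonomy" }

// Hypothetical threshold: the lowest effective autonomy at which an action runs unattended.
const AUTO_EXECUTE_LEVEL = 3;

function resolveDecision(policy: PolicyResult, review: RiskReview, baseline: number): Decision {
  // The LLM adjustment can only lower the operator's baseline, never raise it.
  const effectiveAutonomy = Math.max(0, baseline + Math.min(0, review.autonomyAdjustment));

  // 1. Deterministic policy wins outright; the LLM review is not consulted here.
  if (policy.verdict !== "allow") {
    return { outcome: policy.verdict, effectiveAutonomy, decidedBy: "policy" };
  }

  // 2. Otherwise the (possibly lowered) autonomy level decides.
  const outcome: Outcome = effectiveAutonomy >= AUTO_EXECUTE_LEVEL ? "allow" : "confirm";
  return { outcome, effectiveAutonomy, decidedBy: "autonomy" };
}
```

Because the policy branch returns before the review is read, a low risk score cannot turn a policy block into an allow.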

Execution: Validated actions hit /api/execute, dispatching to real-world integrations like Stripe (payments), Slack (messaging), and Resend (email).
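The dispatch step, together with the fallback to simulation when keys are absent, might look roughly like the following. Everything here is illustrative: the Connector interface, the action type names, and the live callback are hypothetical stand-ins, and no real Stripe, Slack, or Resend SDK calls are shown.

```typescript
type ActionType = "payment_link" | "slack_message" | "email";

interface ExecutionResult { mode: "live" | "simulated"; detail: string }

// Hypothetical connector: wraps one external service.
// `live` is only invoked when an API key has been configured.
interface Connector {
  apiKey?: string;
  live: (apiKey: string, payload: Record<string, unknown>) => Promise<string>;
}

async function execute(
  connectors: Record<ActionType, Connector>,
  type: ActionType,
  payload: Record<string, unknown>,
): Promise<ExecutionResult> {
  const connector = connectors[type];
  if (!connector.apiKey) {
    // No key configured: report a simulated result instead of failing.
    return { mode: "simulated", detail: `simulated ${type}` };
  }
  const detail = await connector.live(connector.apiKey, payload);
  return { mode: "live", detail };
}
```

Keeping the key check in one place means each integration degrades the same way, and the result records whether an execution was live or simulated.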

The Simulator: The frontend is a scenario-based dashboard where users watch agents execute multi-step missions (e.g., invoice processing) while the control plane intercepts and gates every move in real time.

Challenges I ran into

Schema Constraints: Gemini’s response schema enums only accept strings, but my autonomy adjustments needed to be integers (-2, -1, 0). I had to map string enums to numeric values during the parsing phase, a small but critical fix for the decision engine.
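One way to do this mapping is a constant lookup table applied after the response is parsed. The enum labels below (NONE, REDUCE_ONE, REDUCE_TWO) are hypothetical, since the actual labels are not given here, and falling back to the most cautious value for an unknown label is an assumption, not a documented behavior of the project.

```typescript
// Hypothetical string labels for the schema enum, mapped to the integers the engine needs.
const ADJUSTMENT_BY_LABEL = {
  NONE: 0,
  REDUCE_ONE: -1,
  REDUCE_TWO: -2,
} as const;

type AdjustmentLabel = keyof typeof ADJUSTMENT_BY_LABEL;
type AutonomyAdjustment = (typeof ADJUSTMENT_BY_LABEL)[AdjustmentLabel];

function parseAdjustment(label: string): AutonomyAdjustment {
  // Own-property check so inherited keys such as "toString" are not treated as labels.
  if (Object.prototype.hasOwnProperty.call(ADJUSTMENT_BY_LABEL, label)) {
    return ADJUSTMENT_BY_LABEL[label as AdjustmentLabel];
  }
  // Assumption: an unrecognized label fails closed to the strongest reduction.
  return -2;
}
```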

State Persistence: During development, Next.js Turbopack hot-reloads would wipe the in-memory store. I solved this by pinning the store to globalThis to ensure it survived module re-evaluations.
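The globalThis pattern can be sketched as follows. The property name __okBeamerStore and the AuditEntry shape are hypothetical; the point is only that the Map lives on the global object, so re-evaluating the module finds the existing instance instead of creating a fresh one.

```typescript
// Hypothetical shape of one stored record.
interface AuditEntry { id: string; outcome: string }

// Hypothetical global key under which the store is pinned.
type StoreHolder = typeof globalThis & { __okBeamerStore?: Map<string, AuditEntry> };

function getStore(): Map<string, AuditEntry> {
  const holder = globalThis as StoreHolder;
  if (!holder.__okBeamerStore) {
    // First evaluation only: later module reloads reuse the same Map.
    holder.__okBeamerStore = new Map<string, AuditEntry>();
  }
  return holder.__okBeamerStore;
}
```

A module-level `const store = new Map()` would be rebuilt on every hot reload, which is the behavior this avoids.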

The "Demo" UX: The first version felt like a CRUD app. I realized that to show the value of a control plane, the user shouldn't be the one clicking "Send." I pivoted to Scenario Simulation, where the agent acts autonomously and the system intervenes. This shift from "User-triggered" to "Agent-triggered" made the value proposition click.

Accomplishments that I am proud of

I successfully moved beyond the "Chatbot" paradigm. While most projects focus on how an agent decides to act, I built the infrastructure that governs whether it can act. Creating a separate Control Plane that intercepts and evaluates intent in real time, rather than just wrapping an LLM in a loop, is a significant shift for me.

I was able to push Gemini 3 beyond simple text generation and into the role of a Mission Critical Auditor. By utilizing high-reasoning thinking levels and strict response schemas, I turned the LLM into a "Risk Engine" that provides consistent, machine-parseable safety scores for every autonomous action.

One of my biggest technical wins was the Strict Precedence Model. I solved the common issue where LLMs try to "negotiate" their way around rules. By architecting a system where deterministic code (Zod + TypeScript) runs before the LLM and its verdict is resolved ahead of anything Gemini recommends, I ensured that the model has no path to override business-critical guardrails.

As a solo developer, I implemented "self-healing" API patterns. From using globalThis to preserve state across hot-reloads to building graceful degradation for external integrations (like Stripe and Resend), the system is built to handle the chaotic nature of both AI and real-world APIs.

What I learned

1. Policy vs. Reasoning: The Precedence Model

Policy and LLMs serve fundamentally different purposes. Early in development, the temptation was to let Gemini handle everything, from risk scoring to enforcement to decision-making. However, LLMs are probabilistic. A payment threshold of $2,000 shouldn't be "usually enforced."

The Insight: Strict separation. The policy engine runs first and returns non-negotiable blocks.

The Hierarchy: Gemini operates in the advisory layer. It can add caution (reduce autonomy, flag risks) but can never override a hard policy.

The Logic: Policy > Autonomy > LLM Reasoning
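The deterministic side of this split can be illustrated with rules written as plain functions. This is a sketch under assumptions: the rule and type names are hypothetical, and treating a payment above the $2,000 threshold as a hard block (rather than a confirmation) is a guess at the intended behavior.

```typescript
type Verdict = "allow" | "confirm" | "block";

interface ActionIntent { type: string; amountUsd?: number }
interface PolicyFinding { rule: string; verdict: Verdict }
type PolicyRule = (action: ActionIntent) => PolicyFinding | null;

// The $2,000 figure is the example threshold from the text above.
const PAYMENT_LIMIT_USD = 2000;

// Hypothetical rule: payments above the limit are blocked, every time.
const paymentLimit: PolicyRule = (action) =>
  action.type === "payment_link" && (action.amountUsd ?? 0) > PAYMENT_LIMIT_USD
    ? { rule: "payment-limit", verdict: "block" }
    : null;

// Strictness order used to combine findings: block > confirm > allow.
const STRICTNESS: Record<Verdict, number> = { allow: 0, confirm: 1, block: 2 };

function evaluatePolicies(action: ActionIntent, rules: PolicyRule[]): Verdict {
  let verdict: Verdict = "allow";
  for (const rule of rules) {
    const finding = rule(action);
    if (finding && STRICTNESS[finding.verdict] > STRICTNESS[verdict]) {
      verdict = finding.verdict;
    }
  }
  return verdict;
}
```

The same input always produces the same verdict, which is the property a probabilistic model cannot offer for a hard limit.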

2. Type-Safe Intelligence

Using Gemini’s structured output with Zod validation changed the game. By enforcing a strict response schema, every LLM response became machine-parseable and type-safe.

No Prompt Hacks: Eliminated "please respond in JSON" prompts and fragile regex parsing.

Functional AI: This made it possible to treat the LLM like a function with a typed return value, which is essential when its output feeds directly into a decision engine.
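The "typed return value" idea can be sketched as a parse step that either yields a fully typed object or an explicit failure. The project uses Zod for this; to keep the sketch dependency-free, a hand-written check stands in for the Zod schema. The field names and the 0 to 1 score range are assumptions for illustration.

```typescript
// Hypothetical response shape; field names and the 0..1 range are assumed.
interface RiskReview { riskScore: number; alignmentScore: number; rationale: string }

type ParseResult =
  | { ok: true; value: RiskReview }
  | { ok: false; error: string };

function isScore(v: unknown): v is number {
  return typeof v === "number" && v >= 0 && v <= 1;
}

function parseRiskReview(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "response is not an object" };
  }
  const { riskScore, alignmentScore, rationale } = data as Record<string, unknown>;
  if (!isScore(riskScore)) return { ok: false, error: "riskScore must be a number in [0, 1]" };
  if (!isScore(alignmentScore)) return { ok: false, error: "alignmentScore must be a number in [0, 1]" };
  if (typeof rationale !== "string") return { ok: false, error: "rationale must be a string" };
  return { ok: true, value: { riskScore, alignmentScore, rationale } };
}
```

Callers branch on ok, so a malformed response becomes a handled case (for example, falling back to requiring confirmation) rather than an exception deep in the decision engine.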

3. Autonomy is a Spectrum

Autonomy is not binary. I implemented a five-level model ranging from Observe-Only to Full Autonomy.

Dynamic Throttling: By allowing per-action adjustments, Gemini can "throttle" an agent’s freedom in real-time.

Risk Mitigation: If an action is deemed high-risk, the system automatically drops the autonomy level for that specific step, requiring human confirmation without killing the entire workflow.

Why Gemini 3

Structured output with schema enforcement. OK Beamer needs the LLM to return a precise JSON structure every time -- risk scores, alignment scores, autonomy adjustments, enums. Gemini's responseSchema parameter lets you define the exact JSON schema the model must conform to, so the output structure is enforced by the API instead of depending on the model following a prompt instruction. This is critical for a control plane where a malformed response could mean a missed block.

Reasoning over structured context. The review prompt sends Gemini a composite input: the raw instruction, the normalized action intent, the baseline autonomy level, and the full deterministic policy result. Gemini needs to reason across all of those signals -- not just summarize them -- to produce a calibrated risk score and decide whether to reduce autonomy. The model's reasoning capabilities handle this well, particularly the ability to weigh policy findings against contextual risk factors.

Adaptive autonomy as a constrained optimization. The autonomy adjustment is deliberately constrained to {0, -1, -2}. Gemini operates within that constraint: it can only add caution, never relax it. This is a design choice that works well with Gemini's structured output -- the enum constraint is enforced at the API level, not just via prompt engineering.

Practical considerations. The @google/genai SDK provides a clean server-side integration for Next.js route handlers. The free tier is generous enough for demos and development. The structured output + Zod validation pipeline gives a reliable fallback path when the model produces edge-case outputs.

What's next for OK Beamer

I plan to expand OK Beamer into a Multi-Agent Control Plane. This would allow a "Lead Agent" to delegate tasks to "Worker Agents," with OK Beamer managing the permissions and communication between them to prevent a "rogue sub-agent" from executing unauthorized actions.

By acting as an MCP host, OK Beamer can provide a unified safety layer for any third-party tool or data source an agent connects to, making the "OK Beamer" guardrails portable across different ecosystems.

Additionally, I’m exploring a "Lite" version of the control plane that can run on edge devices. If an autonomous drone or warehouse robot is powered by an agentic brain, OK Beamer would serve as the physical safety governor, ensuring high-level reasoning never violates low-level safety "kill-switches."

Built With

  • axios
  • axios@1.13.5
  • geminiapi
  • google
  • google/genai@1.40.0
  • next.js
  • next@16.1.6
  • react
  • react-dom@19.2.3
  • react@19.2.3
  • resend
  • resend@6.9.1
  • stripe
  • stripe@20.3.1
  • tailwind
  • tailwindcss@4
  • typescript
  • zod
  • zod@4.3.6