Inspiration
Autonomous AI agents like OpenClaw are incredibly powerful, capable of executing real-world tasks such as file operations, API calls, and web interactions. However, this power comes with a critical limitation: a lack of built-in security, control, and accountability.
We were driven by a simple question:
How useful is efficiency if the system itself cannot be trusted?
Most AI systems attempt to detect or mitigate harmful behavior after it occurs. We took a fundamentally different approach by designing a system where unsafe actions are structurally impossible, rather than conditionally prevented.
This led to the creation of ShieldClaw, a secure and interpretable evolution of autonomous AI agents.
What it does
ShieldClaw is a secure, context-aware AI agent framework that embeds authentication, authorization, and observability directly into the execution pipeline.
It guarantees that:
- Every request is cryptographically verified before processing
- Every action is authorized prior to execution
- Every decision is logged, traceable, and auditable
Even in scenarios such as prompt injection or agent manipulation, the system enforces hard constraints that prevent unauthorized or destructive behavior. Actions outside defined permissions cannot be executed because they are not representable within the authorization model.
Through Backboard integration, ShieldClaw also provides:
- Structured, threaded execution contexts
- Persistent memory across interactions
- A complete audit trail of agent behavior and security decisions
How we built it
Auth0 Integration (Security and Identity Layer)
JWKS and JWT Verification
ShieldClaw validates every incoming request using Auth0’s JSON Web Key Set (JWKS). Tokens are verified for signature integrity, issuer, audience, and expiration before any application logic is executed. This ensures that only authentic, untampered requests enter the system.
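Signature verification itself is typically delegated to a JWKS-aware library (PyJWT's `PyJWKClient`, for example); the claims check that follows it can be sketched in a few lines. The function and parameter names here are illustrative, not ShieldClaw's actual code:

```python
import time

def validate_claims(payload: dict, expected_iss: str, expected_aud: str) -> bool:
    """Check issuer, audience, and expiration on a JWT payload whose
    signature has already been verified against the JWKS."""
    if payload.get("iss") != expected_iss:
        return False
    # The "aud" claim may be a single string or a list of audiences
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        return False
    # Reject anything at or past its expiration time
    if payload.get("exp", 0) <= time.time():
        return False
    return True
```

Only when all three checks pass does the request proceed into application logic.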
Client Credentials Flow (Agent Identity)
ShieldClaw agents operate under dedicated non-human identities using Auth0’s Client Credentials flow. Each agent receives its own scoped access token, ensuring that all downstream actions are cryptographically attributable to the agent itself, not the user.
Token Vault (Secure Credential Handling)
All third-party credentials are stored in Auth0’s Token Vault rather than in environment variables or application code. When the agent requires access to an external service, it requests a short-lived token from Auth0 at runtime. This approach ensures:
- No raw secrets are ever exposed to the agent
- Credentials are never persisted locally
- Compromised sessions cannot exfiltrate long-lived secrets
Fine-Grained Authorization (FGA)
ShieldClaw uses Auth0 FGA (based on OpenFGA) to model permissions as explicit relationship tuples. Every resource and action is governed by a declarative authorization model.
Before any data is accessed or action is executed, an FGA check is performed. If the relationship does not exist, the action is denied.
This enforces strict guarantees such as:
- The agent cannot modify its own code or execution logic
- The agent cannot access or delete resources it has not been explicitly granted
- The agent cannot initiate actions (e.g., purchases or API calls) outside user-defined intent
Because these constraints are enforced at the authorization layer, they are not bypassable through prompt manipulation or adversarial input. The agent is structurally incapable of performing disallowed operations.
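The deny-by-default tuple check at the heart of this model can be sketched as follows. This is a simplified stand-in for OpenFGA's relationship model, with invented tuple names rather than ShieldClaw's actual authorization store:

```python
# Relationship tuples of the form (subject, relation, object).
# An action is representable only if its exact tuple has been granted.
TUPLES = {
    ("agent:shieldclaw-1", "can_read", "doc:report"),
    ("agent:shieldclaw-1", "can_call", "api:weather"),
}

def check(subject: str, relation: str, obj: str) -> bool:
    """Deny by default: allow only if the relationship tuple exists."""
    return (subject, relation, obj) in TUPLES
```

No prompt, however adversarial, can make `check` return `True` for a tuple that was never granted, which is what makes disallowed actions unrepresentable rather than merely filtered.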
Backboard Integration (Execution and Observability Layer)
Threaded Execution
Each interaction is executed within a Backboard thread, which serves as a persistent execution context. This allows the system to maintain state, track intermediate steps, and support structured multi-step workflows.
Persistent Memory
Backboard extracts and stores relevant information from interactions, enabling the agent to retain context across sessions and make consistent decisions over time.
Security Evaluation and Audit Trail
All actions, decisions, and associated metadata (including risk signals) are logged through Backboard. This creates a transparent and queryable audit trail, allowing developers to inspect how and why the agent performed specific actions.
Challenges we ran into
Designing the system so that authorization checks occur before any data access required restructuring the entire request lifecycle. We also faced challenges in maintaining a clean separation between agent identity and user identity across all services.
Another key difficulty was balancing strict security constraints with practical usability, ensuring the agent remained functional while operating within tightly enforced permission boundaries.
Accomplishments that we're proud of
We built an AI agent system where security is enforced at the architectural level rather than as an afterthought. By moving enforcement into the authorization layer, we made prompt injection attacks ineffective by design.
We also achieved full traceability of agent behavior, enabling real-time inspection of execution flows, memory usage, and security decisions.
Finally, we successfully unified identity, authorization, and observability into a single cohesive framework for secure AI agents.
What we learned
We learned that securing AI systems is not about adding more rules or filters, but about eliminating the possibility of unsafe actions entirely.
Treating authorization as infrastructure rather than application logic fundamentally changes how AI agents are designed. It enables systems that are not only more secure, but also more predictable and easier to reason about.
We also saw the importance of stateful execution. By combining memory, context, and observability, we were able to build an agent that is both capable and accountable.
What's next for ShieldClaw
Client-Initiated Backchannel Authentication (CIBA)
We plan to introduce real-time user approval for high-risk actions. When such an action is triggered, the user will receive a push notification and must explicitly approve the request within a limited time window. If no approval is given, the action is automatically canceled.
Adaptive Risk Scoring
Permissions will dynamically adjust based on context, behavior, and historical signals to further strengthen security.
Multi-Agent Secure Orchestration
We aim to support multiple collaborating agents, each with isolated identities and strictly enforced permission boundaries.
Developer SDK
We plan to release a developer toolkit to enable others to build secure AI applications on top of ShieldClaw.
Blogpost
ShieldClaw: Designing Secure Autonomous AI Agents with Identity, Authorization, and Observability
Autonomous AI agents are rapidly evolving from experimental tools into systems capable of executing real-world tasks—interacting with APIs, modifying files, and making decisions across complex workflows.
Frameworks like OpenClaw demonstrate what is possible when large language models are given agency. However, they also expose a fundamental issue:
Autonomy without security is inherently unsafe.
As agents gain the ability to act, the question is no longer just what can they do, but what should they be allowed to do—and who decides that?
This is the problem we set out to solve with ShieldClaw.
Rethinking AI Agent Security
Most existing approaches to AI agent safety rely on reactive mechanisms:
- Prompt filtering
- Output validation
- Post-execution monitoring
While useful, these approaches share a common weakness: they operate after intent has already been formed.
In adversarial scenarios—such as prompt injection—this model is insufficient. If an agent can be persuaded to attempt a harmful action, the system must rely on detection and intervention, which is neither deterministic nor reliable.
ShieldClaw takes a fundamentally different approach:
Security is enforced before execution, at the level of identity and authorization.
Instead of attempting to detect unsafe behavior, we ensure that such behavior is not representable within the system’s permission model.
System Overview
ShieldClaw is a secure, stateful AI agent framework built around three core pillars:
- Identity — Every agent operates under a distinct, verifiable identity
- Authorization — Every action is explicitly permitted before execution
- Observability — Every decision is recorded, traceable, and explainable
This architecture transforms the agent from an opaque executor into a controlled, auditable system component.
Agent Identity: Treating AI Agents as First-Class Principals
A key design decision in ShieldClaw is to treat each AI agent as an independent machine identity, rather than an extension of a human user.
To achieve this, we integrated Auth0’s Machine-to-Machine (M2M) capabilities.
Dynamic Agent Registration
When a new agent is created, ShieldClaw programmatically registers a corresponding M2M application via the Auth0 Management API. This process generates a unique client_id and client_secret for the agent.
Each agent is represented internally as:
```python
from dataclasses import dataclass

@dataclass
class AgentRegistration:
    agent_id: str
    agent_name: str
    auth0_client_id: str
    owner_sub: str
    scopes: list[str]
    created_at: float
    revoked: bool = False
```
This design ensures:
- Isolation: Agents do not share credentials
- Accountability: Every action is tied to a specific identity
- Revocability: Individual agents can be disabled without affecting users
By decoupling agent identity from user identity, we establish a clear boundary between who requests an action and who executes it.
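As a sketch of the registration step, the request body for creating such an M2M application through the Management API's `POST /api/v2/clients` endpoint might look like the following. The helper function and naming scheme are ours for illustration, not ShieldClaw's actual code:

```python
def m2m_client_payload(agent_name: str) -> dict:
    """Request body for creating a machine-to-machine application via
    Auth0's Management API (POST /api/v2/clients). Auth0 responds with
    the generated client_id and client_secret for the new identity."""
    return {
        "name": f"shieldclaw-agent-{agent_name}",
        "app_type": "non_interactive",          # Auth0's type for M2M apps
        "grant_types": ["client_credentials"],  # the only flow the agent may use
    }
```

Because each agent gets its own client, revoking one agent is a single client deletion with no impact on users or other agents.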
Authentication: Secure, Short-Lived Access via OAuth2
Once an agent has an identity, it must authenticate securely.
ShieldClaw implements the OAuth2 Client Credentials flow, allowing agents to obtain short-lived access tokens directly from Auth0.
The process is encapsulated in the AgentTokenClient:
```python
resp = await client.post(
    f"https://{self.domain}/oauth/token",
    json={
        "grant_type": "client_credentials",
        "client_id": self.client_id,
        "client_secret": self.client_secret,
        "audience": self.audience,
    },
)
```
Key characteristics:
- Tokens are time-bound and automatically refreshed
- No reliance on user sessions or shared credentials
- Each request carries a verifiable Bearer token
This ensures that all interactions with ShieldClaw are authenticated at the protocol level.
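The refresh behavior can be sketched with a small cache that re-fetches a token shortly before expiry. This is an illustrative pattern, not the actual `AgentTokenClient` internals; `fetch` stands in for the OAuth2 request above:

```python
import time

class TokenCache:
    """Cache a short-lived access token and refresh it shortly before expiry.

    `fetch` is any callable returning (access_token, expires_in_seconds),
    e.g. a wrapper around the client-credentials request.
    """

    def __init__(self, fetch, leeway: float = 30.0):
        self._fetch = fetch
        self._leeway = leeway      # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when no token is cached or expiry is inside the leeway window
        if self._token is None or time.time() >= self._expires_at - self._leeway:
            self._token, expires_in = self._fetch()
            self._expires_at = time.time() + expires_in
        return self._token
```

The leeway window avoids the race where a token expires mid-request; within its validity window, repeated `get()` calls reuse the cached token instead of hitting Auth0 again.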
Authorization: Enforcing Constraints Before Execution
Authentication establishes identity, but authorization defines capability.
ShieldClaw integrates fine-grained authorization using a relationship-based model inspired by systems like OpenFGA. Every action an agent attempts is evaluated against a predefined permission graph.
Pre-Execution Authorization Model
Before any operation is performed:
- The agent’s identity is verified
- The requested action is mapped to a resource
- An authorization check is executed
- Only if permitted does execution proceed
This eliminates entire classes of vulnerabilities.
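The four-step gate above can be sketched as a single function; all names here are illustrative, not ShieldClaw's actual API:

```python
class PermissionDenied(Exception):
    """Raised when the authorization check fails; execution never starts."""

def run_action(request: dict, identities: dict, permissions: set, handlers: dict):
    # 1. Verify the agent's identity (token -> agent id)
    agent = identities.get(request["token"])
    if agent is None:
        raise PermissionDenied("unknown identity")
    # 2. Map the requested action to a resource
    action, resource = request["action"], request["resource"]
    # 3. Run the authorization check before touching any data
    if (agent, action, resource) not in permissions:
        raise PermissionDenied(f"{agent} may not {action} {resource}")
    # 4. Only if permitted does execution proceed
    return handlers[action](resource)
```

The crucial property is ordering: the check sits between identity and execution, so no handler code runs, and no data is read, for a denied request.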
Examples of Enforced Constraints
- Agents cannot modify their own execution logic
- Access to sensitive data is explicitly scoped
- Destructive operations require prior authorization
Because these rules are enforced structurally, they are not bypassable through prompt manipulation or adversarial input.
This represents a shift from “prevent bad behavior when detected” to “make bad behavior impossible to express.”
Secure Credential Handling: Eliminating Secret Exposure
Traditional systems often expose API keys or credentials directly to the runtime environment, creating a significant attack surface.
ShieldClaw avoids this by implementing a token-based access model:
- Secrets are stored securely outside the agent runtime
- Agents request ephemeral, scoped tokens when needed
- No long-lived credentials are ever exposed
This approach ensures that even if an agent is compromised, it cannot exfiltrate reusable secrets.
Execution and Observability: Building Transparent AI Systems
Beyond security, a critical requirement for real-world AI systems is explainability.
ShieldClaw integrates Backboard to provide structured execution and observability.
Threaded Execution Context
Each request is executed within a persistent thread, allowing the system to:
- Maintain context across multiple steps
- Track intermediate decisions
- Support complex workflows
Persistent Memory
Relevant information from interactions is stored and reused, enabling:
- Consistent decision-making
- Personalization
- Long-term reasoning
Audit Trail and Security Logging
Every action is logged with associated metadata:
- Agent identity
- Requested operation
- Authorization decision
- Contextual signals
This creates a fully traceable execution history, enabling developers to inspect and understand agent behavior at any point.
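Setting Backboard's actual API aside, the shape of such an audit record can be sketched as a plain dataclass mirroring the fields listed above (names illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class AuditEntry:
    agent_id: str    # which identity acted
    operation: str   # what was requested
    decision: str    # "allow" or "deny" from the authorization check
    signals: dict    # contextual / risk metadata
    timestamp: float = field(default_factory=time.time)

class AuditTrail:
    """Append-only log of agent actions, queryable by agent identity."""

    def __init__(self):
        self._entries: list[AuditEntry] = []

    def record(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def by_agent(self, agent_id: str) -> list[AuditEntry]:
        return [e for e in self._entries if e.agent_id == agent_id]
```

Because denials are recorded alongside allowed actions, the trail answers not only “what did the agent do?” but also “what did it try and fail to do?”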
Key Challenges
Building ShieldClaw required addressing several architectural challenges:
- Pre-access authorization: Ensuring that permission checks occur before any data retrieval required restructuring the request pipeline
- Identity separation: Maintaining strict boundaries between user and agent identities across services
- Usability vs. security: Designing a system that remains functional while enforcing strict constraints
Each of these challenges reinforced the importance of treating security as a first-class design concern.
What This Changes
ShieldClaw demonstrates that it is possible to build AI agents that are:
- Secure by construction
- Fully auditable
- Deterministic in their capabilities
By embedding identity and authorization into the core architecture, we eliminate reliance on probabilistic safeguards.
Future Work
We are continuing to extend ShieldClaw in several directions:
- Backchannel authentication (CIBA) for user approval of high-risk actions
- Adaptive authorization models based on contextual risk
- Multi-agent orchestration with isolated identities and permissions
- A developer SDK for building secure agent-based applications
Conclusion
As AI agents become more capable, the systems that govern them must evolve accordingly.
The traditional model of reactive safeguards is no longer sufficient for systems that can act autonomously in real-world environments.
ShieldClaw represents a shift toward a more principled approach:
Security is not something added to AI systems—it is something they must be built upon.
By combining identity, authorization, and observability, we can move toward a future where AI agents are not only powerful, but also trustworthy by design.
Built With
- auth0
- html
- javascript
- python