Inspiration
Autonomous AI agents like OpenClaw are incredibly powerful, capable of executing real-world tasks such as file operations, API calls, and web interactions. However, this power comes with a critical limitation: a lack of built-in security, control, and accountability.
We were driven by a simple question:
How useful is efficiency if the system itself cannot be trusted?
Most AI systems attempt to detect or mitigate harmful behavior after it occurs. We took a fundamentally different approach by designing a system where unsafe actions are structurally impossible, rather than conditionally prevented.
This led to the creation of ShieldClaw, a secure and interpretable evolution of autonomous AI agents.
What it does
ShieldClaw is a secure, context-aware AI agent framework that embeds authentication, authorization, and observability directly into the execution pipeline.
It guarantees that:
- Every request is cryptographically verified before processing
- Every action is authorized prior to execution
- Every decision is logged, traceable, and auditable
Even in scenarios such as prompt injection or agent manipulation, the system enforces hard constraints that prevent unauthorized or destructive behavior. Actions outside defined permissions cannot be executed because they are not representable within the authorization model.
Through Backboard integration, ShieldClaw also provides:
- Structured, threaded execution contexts
- Persistent memory across interactions
- A complete audit trail of agent behavior and security decisions
How we built it
Auth0 Integration (Security and Identity Layer)
JWKS and JWT Verification
ShieldClaw validates every incoming request using Auth0’s JSON Web Key Set (JWKS). Tokens are verified for signature integrity, issuer, audience, and expiration before any application logic is executed. This ensures that only authentic, untampered requests enter the system.
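Signature verification itself is typically delegated to a JWKS-aware library (PyJWT's `PyJWKClient`, for example); the claims check that follows it can be sketched in a few lines. The function and parameter names here are illustrative, not ShieldClaw's actual code:

```python
import time

def validate_claims(payload: dict, expected_iss: str, expected_aud: str) -> bool:
    """Check issuer, audience, and expiration on a JWT payload whose
    signature has already been verified against the JWKS."""
    if payload.get("iss") != expected_iss:
        return False
    # The "aud" claim may be a single string or a list of audiences
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        return False
    # Reject anything at or past its expiration time
    if payload.get("exp", 0) <= time.time():
        return False
    return True
```

Only when all three checks pass does the request proceed into application logic.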
Client Credentials Flow (Agent Identity)
ShieldClaw agents operate under dedicated non-human identities using Auth0’s Client Credentials flow. Each agent receives its own scoped access token, ensuring that all downstream actions are cryptographically attributable to the agent itself, not the user.
Token Vault (Secure Credential Handling)
All third-party credentials are stored in Auth0’s Token Vault rather than in environment variables or application code. When the agent requires access to an external service, it requests a short-lived token from Auth0 at runtime. This approach ensures:
- No raw secrets are ever exposed to the agent
- Credentials are never persisted locally
- Compromised sessions cannot exfiltrate long-lived secrets
Fine-Grained Authorization (FGA)
ShieldClaw uses Auth0 FGA (based on OpenFGA) to model permissions as explicit relationship tuples. Every resource and action is governed by a declarative authorization model.
Before any data is accessed or action is executed, an FGA check is performed. If the relationship does not exist, the action is denied.
This enforces strict guarantees such as:
- The agent cannot modify its own code or execution logic
- The agent cannot access or delete resources it has not been explicitly granted
- The agent cannot initiate actions (e.g., purchases or API calls) outside user-defined intent
Because these constraints are enforced at the authorization layer, they are not bypassable through prompt manipulation or adversarial input. The agent is structurally incapable of performing disallowed operations.
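The deny-by-default tuple check at the heart of this model can be sketched as follows. This is a simplified stand-in for OpenFGA's relationship model, with invented tuple names rather than ShieldClaw's actual authorization store:

```python
# Relationship tuples of the form (subject, relation, object).
# An action is representable only if its exact tuple has been granted.
TUPLES = {
    ("agent:shieldclaw-1", "can_read", "doc:report"),
    ("agent:shieldclaw-1", "can_call", "api:weather"),
}

def check(subject: str, relation: str, obj: str) -> bool:
    """Deny by default: allow only if the relationship tuple exists."""
    return (subject, relation, obj) in TUPLES
```

No prompt, however adversarial, can make `check` return `True` for a tuple that was never granted, which is what makes disallowed actions unrepresentable rather than merely filtered.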
Backboard Integration (Execution and Observability Layer)
Threaded Execution
Each interaction is executed within a Backboard thread, which serves as a persistent execution context. This allows the system to maintain state, track intermediate steps, and support structured multi-step workflows.
Persistent Memory
Backboard extracts and stores relevant information from interactions, enabling the agent to retain context across sessions and make consistent decisions over time.
Security Evaluation and Audit Trail
All actions, decisions, and associated metadata (including risk signals) are logged through Backboard. This creates a transparent and queryable audit trail, allowing developers to inspect how and why the agent performed specific actions.
Challenges we ran into
Designing the system so that authorization checks occur before any data access required restructuring the entire request lifecycle. We also faced challenges in maintaining a clean separation between agent identity and user identity across all services.
Another key difficulty was balancing strict security constraints with practical usability, ensuring the agent remained functional while operating within tightly enforced permission boundaries.
Accomplishments that we're proud of
We built an AI agent system where security is enforced at the architectural level rather than as an afterthought. By moving enforcement into the authorization layer, we made prompt injection attacks ineffective by design.
We also achieved full traceability of agent behavior, enabling real-time inspection of execution flows, memory usage, and security decisions.
Finally, we successfully unified identity, authorization, and observability into a single cohesive framework for secure AI agents.
What we learned
We learned that securing AI systems is not about adding more rules or filters, but about eliminating the possibility of unsafe actions entirely.
Treating authorization as infrastructure rather than application logic fundamentally changes how AI agents are designed. It enables systems that are not only more secure, but also more predictable and easier to reason about.
We also saw the importance of stateful execution. By combining memory, context, and observability, we were able to build an agent that is both capable and accountable.
What's next for ShieldClaw
Client-Initiated Backchannel Authentication (CIBA)
We plan to introduce real-time user approval for high-risk actions. When such an action is triggered, the user will receive a push notification and must explicitly approve the request within a limited time window. If no approval is given, the action is automatically canceled.
Adaptive Risk Scoring
Permissions will dynamically adjust based on context, behavior, and historical signals to further strengthen security.
Multi-Agent Secure Orchestration
We aim to support multiple collaborating agents, each with isolated identities and strictly enforced permission boundaries.
Developer SDK
We plan to release a developer toolkit to enable others to build secure AI applications on top of ShieldClaw.
Blogpost
ShieldClaw: Designing Secure Autonomous AI Agents with Identity, Authorization, and Observability
Autonomous AI agents are rapidly evolving from experimental tools into systems capable of executing real-world tasks—interacting with APIs, modifying files, and making decisions across complex workflows.
Frameworks like OpenClaw demonstrate what is possible when large language models are given agency. However, they also expose a fundamental issue:
Autonomy without security is inherently unsafe.
As agents gain the ability to act, the question is no longer just what can they do, but what should they be allowed to do—and who decides that?
This is the problem we set out to solve with ShieldClaw.
Rethinking AI Agent Security
Most existing approaches to AI agent safety rely on reactive mechanisms:
- Prompt filtering
- Output validation
- Post-execution monitoring
While useful, these approaches share a common weakness: they operate after intent has already been formed.
In adversarial scenarios—such as prompt injection—this model is insufficient. If an agent can be persuaded to attempt a harmful action, the system must rely on detection and intervention, which is neither deterministic nor reliable.
ShieldClaw takes a fundamentally different approach:
Security is enforced before execution, at the level of identity and authorization.
Instead of attempting to detect unsafe behavior, we ensure that such behavior is not representable within the system’s permission model.
System Overview
ShieldClaw is a secure, stateful AI agent framework built around three core pillars:
- Identity — Every agent operates under a distinct, verifiable identity
- Authorization — Every action is explicitly permitted before execution
- Observability — Every decision is recorded, traceable, and explainable
This architecture transforms the agent from an opaque executor into a controlled, auditable system component.
Agent Identity: Treating AI Agents as First-Class Principals
A key design decision in ShieldClaw is to treat each AI agent as an independent machine identity, rather than an extension of a human user.
To achieve this, we integrated Auth0’s Machine-to-Machine (M2M) capabilities.
Dynamic Agent Registration
When a new agent is created, ShieldClaw programmatically registers a corresponding M2M application via the Auth0 Management API. This process generates a unique client_id and client_secret for the agent.
Each agent is represented internally as:
```python
from dataclasses import dataclass

@dataclass
class AgentRegistration:
    agent_id: str
    agent_name: str
    auth0_client_id: str
    owner_sub: str
    scopes: list[str]
    created_at: float
    revoked: bool = False
```
This design ensures:
- Isolation: Agents do not share credentials
- Accountability: Every action is tied to a specific identity
- Revocability: Individual agents can be disabled without affecting users
By decoupling agent identity from user identity, we establish a clear boundary between who requests an action and who executes it.
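As a sketch of the registration step, the request body for creating such an M2M application through the Management API's `POST /api/v2/clients` endpoint might look like the following. The helper function and naming scheme are ours for illustration, not ShieldClaw's actual code:

```python
def m2m_client_payload(agent_name: str) -> dict:
    """Request body for creating a machine-to-machine application via
    Auth0's Management API (POST /api/v2/clients). Auth0 responds with
    the generated client_id and client_secret for the new identity."""
    return {
        "name": f"shieldclaw-agent-{agent_name}",
        "app_type": "non_interactive",          # Auth0's type for M2M apps
        "grant_types": ["client_credentials"],  # the only flow the agent may use
    }
```

Because each agent gets its own client, revoking one agent is a single client deletion with no impact on users or other agents.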
Authentication: Secure, Short-Lived Access via OAuth2
Once an agent has an identity, it must authenticate securely.
ShieldClaw implements the OAuth2 Client Credentials flow, allowing agents to obtain short-lived access tokens directly from Auth0.
The process is encapsulated in the AgentTokenClient:
```python
resp = await client.post(
    f"https://{self.domain}/oauth/token",
    json={
        "grant_type": "client_credentials",
        "client_id": self.client_id,
        "client_secret": self.client_secret,
        "audience": self.audience,
    },
)
```
Key characteristics:
- Tokens are time-bound and automatically refreshed
- No reliance on user sessions or shared credentials
- Each request carries a verifiable Bearer token
This ensures that all interactions with ShieldClaw are authenticated at the protocol level.
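The refresh behavior can be sketched with a small cache that re-fetches a token shortly before expiry. This is an illustrative pattern, not the actual `AgentTokenClient` internals; `fetch` stands in for the OAuth2 request above:

```python
import time

class TokenCache:
    """Cache a short-lived access token and refresh it shortly before expiry.

    `fetch` is any callable returning (access_token, expires_in_seconds),
    e.g. a wrapper around the client-credentials request.
    """

    def __init__(self, fetch, leeway: float = 30.0):
        self._fetch = fetch
        self._leeway = leeway      # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when no token is cached or expiry is inside the leeway window
        if self._token is None or time.time() >= self._expires_at - self._leeway:
            self._token, expires_in = self._fetch()
            self._expires_at = time.time() + expires_in
        return self._token
```

The leeway window avoids the race where a token expires mid-request; within its validity window, repeated `get()` calls reuse the cached token instead of hitting Auth0 again.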
Authorization: Enforcing Constraints Before Execution
Authentication establishes identity, but authorization defines capability.
ShieldClaw integrates fine-grained authorization using a relationship-based model inspired by systems like OpenFGA. Every action an agent attempts is evaluated against a predefined permission graph.
Pre-Execution Authorization Model
Before any operation is performed:
- The agent’s identity is verified
- The requested action is mapped to a resource
- An authorization check is executed
- Only if permitted does execution proceed
This eliminates entire classes of vulnerabilities.
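The four-step gate above can be sketched as a single function; all names here are illustrative, not ShieldClaw's actual API:

```python
class PermissionDenied(Exception):
    """Raised when the authorization check fails; execution never starts."""

def run_action(request: dict, identities: dict, permissions: set, handlers: dict):
    # 1. Verify the agent's identity (token -> agent id)
    agent = identities.get(request["token"])
    if agent is None:
        raise PermissionDenied("unknown identity")
    # 2. Map the requested action to a resource
    action, resource = request["action"], request["resource"]
    # 3. Run the authorization check before touching any data
    if (agent, action, resource) not in permissions:
        raise PermissionDenied(f"{agent} may not {action} {resource}")
    # 4. Only if permitted does execution proceed
    return handlers[action](resource)
```

The crucial property is ordering: the check sits between identity and execution, so no handler code runs, and no data is read, for a denied request.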
Examples of Enforced Constraints
- Agents cannot modify their own execution logic
- Access to sensitive data is explicitly scoped
- Destructive operations require prior authorization
Because these rules are enforced structurally, they are not bypassable through prompt manipulation or adversarial input.
This represents a shift from “prevent bad behavior when detected” to “make bad behavior impossible to express.”
Secure Credential Handling: Eliminating Secret Exposure
Traditional systems often expose API keys or credentials directly to the runtime environment, creating a significant attack surface.
ShieldClaw avoids this by implementing a token-based access model:
- Secrets are stored securely outside the agent runtime
- Agents request ephemeral, scoped tokens when needed
- No long-lived credentials are ever exposed
This approach ensures that even if an agent is compromised, it cannot exfiltrate reusable secrets.
Execution and Observability: Building Transparent AI Systems
Beyond security, a critical requirement for real-world AI systems is explainability.
ShieldClaw integrates Backboard to provide structured execution and observability.
Threaded Execution Context
Each request is executed within a persistent thread, allowing the system to:
- Maintain context across multiple steps
- Track intermediate decisions
- Support complex workflows
Persistent Memory
Relevant information from interactions is stored and reused, enabling:
- Consistent decision-making
- Personalization
- Long-term reasoning
Audit Trail and Security Logging
Every action is logged with associated metadata:
- Agent identity
- Requested operation
- Authorization decision
- Contextual signals
This creates a fully traceable execution history, enabling developers to inspect and understand agent behavior at any point.
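Setting Backboard's actual API aside, the shape of such an audit record can be sketched as a plain dataclass mirroring the fields listed above (names illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class AuditEntry:
    agent_id: str    # which identity acted
    operation: str   # what was requested
    decision: str    # "allow" or "deny" from the authorization check
    signals: dict    # contextual / risk metadata
    timestamp: float = field(default_factory=time.time)

class AuditTrail:
    """Append-only log of agent actions, queryable by agent identity."""

    def __init__(self):
        self._entries: list[AuditEntry] = []

    def record(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def by_agent(self, agent_id: str) -> list[AuditEntry]:
        return [e for e in self._entries if e.agent_id == agent_id]
```

Because denials are recorded alongside allowed actions, the trail answers not only “what did the agent do?” but also “what did it try and fail to do?”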
Key Challenges
Building ShieldClaw required addressing several architectural challenges:
- Pre-access authorization: Ensuring that permission checks occur before any data retrieval required restructuring the request pipeline
- Identity separation: Maintaining strict boundaries between user and agent identities across services
- Usability vs. security: Designing a system that remains functional while enforcing strict constraints
Each of these challenges reinforced the importance of treating security as a first-class design concern.
What This Changes
ShieldClaw demonstrates that it is possible to build AI agents that are:
- Secure by construction
- Fully auditable
- Deterministic in their capabilities
By embedding identity and authorization into the core architecture, we eliminate reliance on probabilistic safeguards.
Future Work
We are continuing to extend ShieldClaw in several directions:
- Backchannel authentication (CIBA) for user approval of high-risk actions
- Adaptive authorization models based on contextual risk
- Multi-agent orchestration with isolated identities and permissions
- A developer SDK for building secure agent-based applications
Conclusion
As AI agents become more capable, the systems that govern them must evolve accordingly.
The traditional model of reactive safeguards is no longer sufficient for systems that can act autonomously in real-world environments.
ShieldClaw represents a shift toward a more principled approach:
Security is not something added to AI systems—it is something they must be built upon.
By combining identity, authorization, and observability, we can move toward a future where AI agents are not only powerful, but also trustworthy by design.
Built With
- auth0
- html
- javascript
- python