-
-
Showing successful verification, unlocks tools, APIs.
-
Shows how signing keys are kept in the browser (client-side) and then public key is registered so the agent can verify the message
-
Shows successful detection of prompt injection and then properly checking if there's a signed message (there isn't)
-
Showing failed verification, does not unlock tools, keys, APIs
-
Built-in Demo Mode (to show how flow works)
-
Built-in Demo Mode (to show how flow works)
Inspiration
A while ago, I noticed a weird event popping up in my calendar. No matter what I did to delete it, it would pop up again. Turns out in an email, there was hidden code to add the event to the calendar! This was before the age of agents who could read your email and calendar. Prompt injection risk is a very real, there could be hidden prompts in web pages that drastically change your agent's behaviour. Or text "hidden" in emails that tell your agent to ignore previous instructions and leak your credentials.
What it does
It prevents prompt injections by signing the original prompts with the HUMAN's private key, the agent verifies it with the public key. Anything that isn't signed does not get access to tools/plugins. Furthermore, it checks for injection detection (you can swap in and out regex, local LLM, whatever) and then wraps a fence (XML-based) around it.
For the business/enterprise-y folks, it combine's Auth0's authentication with Ed25519 signatures so that only authorized users can reach the agent and the actual request can't be manipulated by attackers.
How I built it
Was built in python because it has the most libraries and flexibility to do things.
Challenges I ran into
Getting the demo correctly. The main principle of this was to show how signed messages are trusted and unsigned messages/injections are not. How do you do that except to mock it? But then in mocking it you are introducing exceptions and special-case handling. This is obviously a work-in-progress project.
Accomplishments that I'm proud of
I'm not super familiar with python, so there was a lot of trial and error to understand what to do.
What I learned
There's so many layers to this, from how signing and verification works in practice (I understood the concept but not the actual thing). Then there's understanding how the Auth0 Library works, along with prompt injection, middle layer building for agents, etc.
What's next for Prompt Injection Checker (aka prompt fencing) via Signatures
It's very much a hackathon project right now which means it's firmly in "Proof of Concept" - I tried to think of what it would need to be integrated into systems like OpenClaw but ran out of time.
Built With
- auth0
- python

Log in or sign up for Devpost to join the conversation.