PromptGuard

Inspiration

Coding agents can now execute commands, not just suggest code. That opens a new attack surface: a single prompt injection or a single typo can trick an agent into running a dangerous pip install.

Recent incidents, including malicious and typosquatted packages, the Shai Hulud npm worm, and Bar Lanyado's huggingface-cli experiment, showed us how quickly a single install can turn into a security incident. We wanted a guardrail that works at execution time, when it matters most.

What it does

PromptGuard is a runtime safety layer for coding agents.
When an agent attempts to install a dependency, PromptGuard checks the request and:

detects typosquatting / confusingly similar package names,
blocks installs that look risky or malicious,
prevents prompt-injected install commands from being executed.
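The typosquatting check above can be sketched with plain edit distance against a list of popular packages. This is a minimal sketch, not PromptGuard's actual code: the package list, the function names normalize, levenshtein and typosquat_targets, and the threshold of 2 edits are all illustrative assumptions. Names are normalized the way PEP 503 does before comparing, so "Flask" and "flask" count as the same project.

```python
import re

# Hypothetical list of well-known packages; a real deployment would load a
# much larger one (for example, the most-downloaded projects on PyPI).
POPULAR_PACKAGES = {"requests", "numpy", "pandas", "django", "flask", "urllib3"}


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase and collapse runs of -, _ and . to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def typosquat_targets(name: str, popular=POPULAR_PACKAGES, max_distance: int = 2) -> list[str]:
    """Return the popular packages that `name` closely resembles without matching exactly."""
    candidate = normalize(name)
    known = {normalize(p): p for p in popular}
    if candidate in known:
        return []  # it *is* a well-known package, not a lookalike
    return sorted(
        original for norm, original in known.items()
        if levenshtein(candidate, norm) <= max_distance
    )
```

For example, typosquat_targets("reqeusts") returns ["requests"], while typosquat_targets("requests") returns an empty list. A fixed threshold is crude for short names ("flask" and "black" are also two edits apart), so scaling it with name length would reduce false positives.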

The goal is simple: make agent-driven development safer without requiring teams to redesign their workflow.

How we built it

We implemented a two-pillar architecture:

1) Verifier/Resolver: given a package name, it evaluates risk using deterministic signals (package age, similarity to well-known packages, and publisher metadata proxies).
2) Enforcer: the agent must get authorization before providing or executing install steps; risky actions are blocked or quarantined.
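One way the two pillars could fit together is sketched below, under several assumptions. Package metadata is taken as already fetched into a hypothetical PackageInfo record. "Publisher metadata proxies" are reduced to a single boolean. difflib's SequenceMatcher ratio stands in for the similarity signal. The 30-day and 0.85 thresholds are illustrative. The names Decision, verify and enforce are hypothetical and not taken from PromptGuard.

```python
import re
import shlex
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = 0
    QUARANTINE = 1  # hold for human review
    BLOCK = 2


@dataclass
class PackageInfo:
    """Hypothetical metadata the resolver has fetched for one package."""
    name: str
    age_days: int                 # days since the first release
    has_verified_publisher: bool  # stand-in for "publisher metadata proxies"


# Hypothetical allowlist; a real deployment would use a much larger one.
WELL_KNOWN = {"requests", "numpy", "pandas", "django", "flask"}


def verify(info: PackageInfo) -> tuple[Decision, list[str]]:
    """Verifier/Resolver pillar: deterministic signals in, decision plus reasons out."""
    name = info.name.lower()
    if name in WELL_KNOWN:
        return Decision.ALLOW, ["well-known package"]
    lookalikes = [k for k in sorted(WELL_KNOWN)
                  if SequenceMatcher(None, name, k).ratio() >= 0.85]
    if lookalikes:
        return Decision.BLOCK, [f"name resembles {', '.join(lookalikes)}"]
    reasons = []
    if info.age_days < 30:
        reasons.append(f"only {info.age_days} days old")
    if not info.has_verified_publisher:
        reasons.append("publisher could not be verified")
    return (Decision.QUARANTINE, reasons) if reasons else (Decision.ALLOW, ["no risk signals"])


def enforce(command: str, resolve: Callable[[str], PackageInfo]) -> Decision:
    """Enforcer pillar: the agent runs `command` only if this returns ALLOW."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Decision.QUARANTINE  # malformed quoting is suspicious in itself
    if "install" not in tokens:
        return Decision.ALLOW  # not an install command; out of scope for this sketch
    i = tokens.index("install")
    if i == 0 or tokens[i - 1] not in ("pip", "pip3"):
        return Decision.ALLOW  # not a pip install; out of scope for this sketch
    if any(op in command for op in ("&&", "||", ";", "|", "`", "$(")):
        return Decision.QUARANTINE  # chained or substituted commands need review
    args = tokens[i + 1:]
    if not args or any(a.startswith("-") for a in args):
        # Options such as -r or --index-url change what gets installed: send to review.
        return Decision.QUARANTINE
    worst = Decision.ALLOW
    for arg in args:
        # Drop extras and version specifiers, e.g. "requests[socks]>=2.0" -> "requests".
        name = re.split(r"[\[<>=!~@]", arg, maxsplit=1)[0]
        decision, _reasons = verify(resolve(name))
        if decision.value > worst.value:
            worst = decision
    return worst
```

With a resolver that returns a 3-day-old, unverified record for reqeusts, enforce("pip install reqeusts", resolve) returns Decision.BLOCK, because the name is a close match for requests. Every signal here is a plain comparison, so the same input always yields the same decision, and the reasons list from verify can be shown to the developer to explain it.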

Challenges we faced

Narrowing the scope to something we could ship in ~24 hours without losing the “wow” moment.
Making the demo stable under hackathon conditions (network, time pressure, reliability).
Keeping decisions explainable and deterministic.

What we learned

Autonomous agents need the same thing every powerful system needs: guardrails at the point of action.
Even simple, high-signal controls can meaningfully reduce risk when agents interact with package managers and external tooling.