Inspiration
Coding agents can now execute commands, not just suggest code. That’s a new attack surface: a single prompt injection or a single typo in a package name can lead an agent to run a dangerous pip install.
Recent supply-chain incidents (e.g., malicious and typosquatted packages, the Shai Hulud npm worm, and Bar Lanyado's huggingface-cli experiment) showed us how fast an install can become an incident. We wanted a guardrail that works at execution time, when it matters most.
What it does
PromptGuard is a runtime safety layer for coding agents.
When an agent attempts to install a dependency, PromptGuard checks the request and:
- detects typosquatting / confusingly similar package names,
- blocks installs that look risky or malicious,
- prevents prompt-injected install commands from being executed.
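To make the checks above concrete, here is a minimal sketch of the first step: pulling package names out of an agent's install command and flagging names that sit close to a well-known one. This is an illustration, not PromptGuard's actual code. The function names, the tiny WELL_KNOWN list, and the 0.85 similarity threshold are all assumptions made for the example.

```python
import re
import shlex
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical reference list; a real check would use a much larger one.
WELL_KNOWN = ("requests", "numpy", "pandas", "flask", "django")

# pip flags whose next token is a value, not a package name.
VALUE_FLAGS = {"-r", "--requirement", "-c", "--constraint", "-i", "--index-url",
               "--extra-index-url", "-e", "--editable", "-t", "--target"}


def extract_pip_packages(command: str) -> list[str]:
    """Return the package names requested by a `pip install ...` command."""
    tokens = shlex.split(command)
    pip_at = next((i for i, t in enumerate(tokens) if t in ("pip", "pip3")), None)
    if pip_at is None or "install" not in tokens[pip_at + 1:]:
        return []
    args = iter(tokens[tokens.index("install", pip_at + 1) + 1:])
    names = []
    for arg in args:
        if arg in VALUE_FLAGS:
            next(args, None)  # skip the flag's value
        elif not arg.startswith("-"):
            # Drop version specifiers, extras and markers: "pkg==1.0" -> "pkg".
            names.append(re.split(r"[=<>!~\[;@]", arg, maxsplit=1)[0].lower())
    return names


def closest_well_known(name: str, threshold: float = 0.85) -> Optional[str]:
    """Return the well-known package `name` resembles, or None if it is not a lookalike."""
    if name in WELL_KNOWN:
        return None  # exact match is the real package, not a lookalike
    best = max(WELL_KNOWN, key=lambda known: SequenceMatcher(None, name, known).ratio())
    if SequenceMatcher(None, name, best).ratio() >= threshold:
        return best
    return None


if __name__ == "__main__":
    for pkg in extract_pip_packages("pip install reqeusts==2.0 numpy"):
        print(pkg, "->", closest_well_known(pkg))
```

String similarity alone will miss some attacks and flag some legitimate packages, so a check like this is best treated as one signal among several.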
The goal is simple: make agent-driven development safer without requiring teams to redesign their workflow.
How we built it
We implemented a two-pillar architecture:
1) Verifier/Resolver: given a package name, it evaluates risk using deterministic signals (package age, similarity to well-known packages, and publisher metadata proxies).
2) Enforcer: the agent must get authorization before providing or executing install steps; risky actions are blocked or quarantined.
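The two pillars can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the PackageInfo fields, the 90-day age cutoff, the 0.85 similarity threshold, and the use of a missing repository URL as a publisher-metadata proxy are all hypothetical choices. Package metadata is passed in directly so the example runs offline and stays deterministic.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Callable

WELL_KNOWN = ("requests", "numpy", "pandas")  # hypothetical reference list
MIN_AGE_DAYS = 90                             # assumed policy value
SIMILARITY_THRESHOLD = 0.85                   # assumed policy value


@dataclass(frozen=True)
class PackageInfo:
    name: str
    first_release: date
    has_repo_url: bool  # assumed stand-in for a publisher metadata proxy


@dataclass(frozen=True)
class Verdict:
    decision: str  # "allow", "quarantine", or "block"
    reasons: tuple


def verify(pkg: PackageInfo, today: date) -> Verdict:
    """Verifier: score a package using deterministic signals only."""
    reasons = []
    lookalike = pkg.name not in WELL_KNOWN and any(
        SequenceMatcher(None, pkg.name, known).ratio() >= SIMILARITY_THRESHOLD
        for known in WELL_KNOWN
    )
    if lookalike:
        reasons.append("name resembles a well-known package")
    if (today - pkg.first_release).days < MIN_AGE_DAYS:
        reasons.append("package is very new")
    if not pkg.has_repo_url:
        reasons.append("no source repository listed")
    if lookalike:
        return Verdict("block", tuple(reasons))
    if reasons:
        return Verdict("quarantine", tuple(reasons))
    return Verdict("allow", ())


def enforce(pkg: PackageInfo, today: date, install: Callable[[str], None]) -> Verdict:
    """Enforcer: run the install callback only when the verifier allows it."""
    verdict = verify(pkg, today)
    if verdict.decision == "allow":
        install(pkg.name)
    return verdict


if __name__ == "__main__":
    suspect = PackageInfo("reqeusts", date(2024, 12, 20), has_repo_url=False)
    print(enforce(suspect, date(2025, 1, 1), install=print))
```

Returning the reasons alongside the decision keeps each verdict explainable, and taking today's date as a parameter means the same inputs always produce the same result.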
Challenges we faced
- Narrowing the scope to something we could ship in ~24 hours without losing the “wow” moment.
- Making the demo stable under hackathon conditions (network, time pressure, reliability).
- Keeping decisions explainable and deterministic.
What we learned
Autonomous agents need the same thing every powerful system needs: guardrails at the point of action.
Even simple, high-signal controls can meaningfully reduce risk when agents interact with package managers and external tooling.
What’s next
- Support more ecosystems (starting with npm) and add CI/CD integration.
- Stronger policies (lockfile enforcement, org allowlists).
- Expand detection coverage to more cybersecurity threats.