Inspiration

Coding agents can now execute commands, not just suggest code. That opens a new attack surface: a single prompt injection or a single typo can lead an agent to run a dangerous pip install.

Recent incidents (malicious and typosquatted packages, the Shai-Hulud npm worm, Bar Lanyado's huggingface-cli experiment with hallucinated package names) showed us how fast an install can become an incident. We wanted a guardrail that works at execution time, when it matters most.

What it does

PromptGuard is a runtime safety layer for coding agents.
When an agent attempts to install a dependency, PromptGuard checks the request and:

  • detects typosquatting / confusingly similar package names,
  • blocks installs that look risky or malicious,
  • prevents prompt-injected install commands from being executed.

The goal is simple: make agent-driven development safer without requiring teams to redesign their workflow.
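As a rough illustration of the typosquatting check listed above, the sketch below pulls package names out of a pip install command and flags names that closely resemble a well-known package. It is a minimal sketch, not PromptGuard's actual code: the WELL_KNOWN set, the 0.85 threshold, and the function names install_targets and lookalike_of are all assumptions made for this example.

```python
import re
import shlex
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical reference list; a real deployment would load a much larger one.
WELL_KNOWN = {"requests", "numpy", "pandas", "urllib3", "huggingface-hub"}

# pip options that consume the following token as their value.
VALUE_OPTIONS = {"-r", "--requirement", "-c", "--constraint",
                 "-i", "--index-url", "--extra-index-url"}
SHELL_OPERATORS = {"&&", "||", ";", "|"}


def install_targets(command: str) -> List[str]:
    """Return normalized package names requested by a `pip install` command."""
    tokens = shlex.split(command)
    if "install" not in tokens or not {"pip", "pip3"} & set(tokens):
        return []
    names, skip_next = [], False
    for arg in tokens[tokens.index("install") + 1:]:
        if arg in SHELL_OPERATORS:
            break  # stop at the end of the pip command
        if skip_next:
            skip_next = False
            continue
        if arg in VALUE_OPTIONS:
            skip_next = True
            continue
        if arg.startswith("-"):
            continue
        # Drop version specifiers, extras, and markers: "pkg==1.0" -> "pkg".
        bare = re.split(r"[=<>!~\[;@]", arg, maxsplit=1)[0]
        # Normalize runs of "-", "_", "." to "-" and lowercase the name.
        names.append(re.sub(r"[-_.]+", "-", bare).lower())
    return names


def lookalike_of(name: str, threshold: float = 0.85) -> Optional[str]:
    """Return the well-known package that `name` imitates, or None."""
    if name in WELL_KNOWN:
        return None  # exact match: this is the real package
    best = max(WELL_KNOWN, key=lambda known: SequenceMatcher(None, name, known).ratio())
    if SequenceMatcher(None, name, best).ratio() >= threshold:
        return best
    return None


for pkg in install_targets("pip install requets==1.0 --upgrade"):
    print(pkg, "->", lookalike_of(pkg))
```

A string-similarity ratio is only one possible signal; it is deterministic and easy to explain, but it will miss lookalikes that differ by more than a character or two.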

How we built it

We implemented a two-pillar architecture:

  1) Verifier/Resolver: given a package name, it evaluates risk using deterministic signals (package age, similarity to well-known packages, and publisher metadata proxies).
  2) Enforcer: the agent must get authorization before providing or executing install steps; risky actions are blocked or quarantined.
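The two pillars can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the Signals fields, the 30-day age cutoff, the "publisher has at most one package" proxy, and the rule for choosing between block and quarantine are all hypothetical. Signals are passed in as plain data so the decision stays deterministic and each verdict carries its reasons.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Decision(Enum):
    ALLOW = "allow"
    QUARANTINE = "quarantine"
    BLOCK = "block"


@dataclass(frozen=True)
class Signals:
    """Hypothetical inputs, gathered elsewhere (e.g. from registry metadata)."""
    package_age_days: int
    lookalike_of: Optional[str]       # well-known name this package resembles
    publisher_package_count: int      # proxy for publisher reputation


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    reasons: List[str]


def verify(signals: Signals) -> Verdict:
    """Verifier: turn deterministic signals into an explainable verdict."""
    reasons = []
    if signals.lookalike_of is not None:
        reasons.append(f"name resembles '{signals.lookalike_of}'")
    if signals.package_age_days < 30:  # illustrative cutoff
        reasons.append("published fewer than 30 days ago")
    if signals.publisher_package_count <= 1:
        reasons.append("publisher has no other packages")

    if signals.lookalike_of is not None and len(reasons) >= 2:
        decision = Decision.BLOCK       # lookalike plus another risk signal
    elif reasons:
        decision = Decision.QUARANTINE  # some risk: hold for review
    else:
        decision = Decision.ALLOW
    return Verdict(decision, reasons)


def enforce(package: str, signals: Signals,
            install: Callable[[str], None]) -> Verdict:
    """Enforcer: run the install callback only when the verdict allows it."""
    verdict = verify(signals)
    if verdict.decision is Decision.ALLOW:
        install(package)
    return verdict


installed: List[str] = []
print(enforce("requets", Signals(3, "requests", 1), installed.append))
print(enforce("requests", Signals(4000, None, 20), installed.append))
print(installed)
```

Keeping the install step behind a callback means the agent never holds a direct path to the package manager; every install has to pass through enforce first.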

Challenges we faced

  • Narrowing the scope to something we could ship in ~24 hours without losing the “wow” moment.
  • Making the demo stable under hackathon conditions (network, time pressure, reliability).
  • Keeping decisions explainable and deterministic.

What we learned

Autonomous agents need the same thing every powerful system needs: guardrails at the point of action.
Even simple, high-signal controls can meaningfully reduce risk when agents interact with package managers and external tooling.

What’s next

  • Support more ecosystems (npm) and add CI/CD integration.
  • Stronger policies (lockfile enforcement, org allowlists).
  • Expand detection coverage to more cybersecurity threats.
