Inspiration AI code reviewers like GitHub Copilot and CodeRabbit are becoming standard in modern dev workflows. We realized that if an LLM is making trust decisions on code, it becomes a target — a malicious contributor can hide instructions directly in their PR to manipulate the AI into approving dangerous code. Prompt injection is a well-documented LLM vulnerability, but nobody is defending against it at the PR level. We built Sentinel to close that gap.
What it does Sentinel is a GitHub App that scans every pull request for prompt injection attacks before any AI reviewer sees the diff. It runs two layers of detection: a pattern scan that catches known injection phrases, and a live GPT-4o probe that tests whether the payload would actually fool an undefended AI reviewer. If an attack is detected, Sentinel blocks the PR via GitHub Checks, posts a detailed security report as a comment, and prevents the merge button from being clicked.
How we built it We built Sentinel as a Node.js GitHub App using the GitHub Checks and Comments APIs. The injection scanner runs a regex pattern layer over the raw diff, then sends the diff to GPT-4o twice — once with no defenses to confirm the attack works, and once with a hardened system prompt to verify Sentinel catches it. The whole pipeline is triggered by GitHub webhooks on pull request events and deployed on Render.
Challenges we ran into Getting the GitHub Checks API to block the merge button reliably took a lot of trial and error. The hardest part was designing the hardened prompt — it needed to be specific enough to catch injections without flagging legitimate code comments as attacks. We also had to make sure the scanner ran before any other LLM stage so an injection could never influence the pipeline verdict.
Accomplishments that we're proud of Getting a live end-to-end demo working where a PR with an injection payload gets blocked in real time on GitHub. The moment where the report shows "undefended GPT-4o: FOOLED" next to "Sentinel: BLOCKED" is exactly what we wanted — it makes the threat concrete and visible rather than theoretical.
What we learned Prompt injection is a lot more subtle than we expected. Simple "ignore previous instructions" strings are obvious, but real attacks can be encoded, split across lines, or hidden in variable names. We also learned that defending an LLM is fundamentally different from securing traditional software — you can't just sanitize inputs the way you would for SQL injection.
What's next for Sentinel Real semantic diffing — running the diff through GPT-4o with and without the payload and comparing verdicts to prove behavioral manipulation, not just pattern matching. We also want to add detection for obfuscated and stealth payloads, a dashboard showing attack attempts across repos over time, and support for other AI code review tools beyond GitHub.
Built With
- express.js
- github-apps
- javascript
- neo4j
- node.js
- openai-gpt-4o-api
- react
- render
- supabase
- tavily
- vite
Log in or sign up for Devpost to join the conversation.