SecurePrompt: AI-Powered Prompt Injection Detection

Introduction to the Prompt Security Detector within GitLab
Agent configuration (agent.yml) defines AI behavior and security logic
Flow configuration (flow.yml) sets up the multi-stage analysis pipeline
User navigates to the Merge Request to initiate analysis
Prompt Security Detector is selected from the available agents
User clicks Run Agent, or sends a chat message, to trigger the analysis
A session is automatically created to start processing
Stage 1 begins by analyzing the prompt and reading code changes
Agent uses tools to inspect files, diffs, and repository data
Agent analyzes patterns using a 5-layer security pipeline
Code is analyzed to detect prompt injection and unsafe patterns
Detector returns structured results: safety, category, and reasoning
System classifies risk and prepares actionable insights
A comprehensive security report is generated
Results are posted directly to the Merge Request as comments
Sample vulnerability pattern classification
Sample prompt asking for database credentials, and the agent's response
Layer 4: contextual risk assessment sample
Detailed risk breakdown
Attack pattern
Recommendations
Final output
Risk scoring with CVSS score

Inspiration

As AI-powered development tools become deeply integrated into modern workflows, a new class of vulnerabilities has emerged: prompt injection attacks. These attacks can manipulate AI behavior, leak sensitive data, or override system instructions.

We noticed that while traditional security tools focus on code vulnerabilities, AI prompt security is largely unaddressed, especially within developer workflows like Merge Requests.

This inspired us to build SecurePrompt, a solution that brings real-time prompt injection detection directly into GitLab, ensuring developers can catch and fix issues before they reach production.

What it does

SecurePrompt is an AI-powered security agent that integrates into GitLab Merge Requests to:

- Analyze code changes for prompt injection patterns
- Classify attack types and intent
- Use a custom Python detection engine for validation
- Generate structured security reports with risk scores
- Post inline comments and summaries directly in the MR

It runs when a user clicks “Run Agent” and delivers actionable feedback in the Merge Request within seconds.
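To make the report step concrete, here is a minimal sketch of how a structured finding could be turned into comment text. The `safety`, `category`, and `reasoning` fields come from the detector output described above; `Finding`, `file_path`, `risk_score`, `severity_band`, and `render_report` are hypothetical names, and the 0 to 10 scale mapped onto the CVSS v3.x qualitative bands is an assumption about how the risk score is expressed, not the project's confirmed scoring method.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One detector result (hypothetical container for illustration)."""
    file_path: str     # assumed: file where the snippet was found
    safety: str        # "safe" or "unsafe"
    category: str      # illustrative label, e.g. "instruction_override"
    reasoning: str
    risk_score: float  # assumed 0.0-10.0 scale


def severity_band(score: float) -> str:
    """Map a 0-10 score to the CVSS v3.x qualitative rating bands."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0.0 and 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def render_report(findings: list[Finding]) -> str:
    """Build a plain-text summary, highest risk first, for an MR comment."""
    unsafe = [f for f in findings if f.safety == "unsafe"]
    lines = [f"SecurePrompt report: {len(unsafe)} unsafe of {len(findings)} findings"]
    for f in sorted(unsafe, key=lambda f: f.risk_score, reverse=True):
        lines.append(
            f"[{severity_band(f.risk_score)} {f.risk_score:.1f}] "
            f"{f.file_path}: {f.category} ({f.reasoning})"
        )
    return "\n".join(lines)
```

Posting the text to the Merge Request is left out on purpose, since that part is handled by the built-in GitLab tools rather than by custom code.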

How we built it

We built SecurePrompt using a combination of:

GitLab AI Agents framework:

- agent.yml to define behavior, tools, and system prompts
- flow.yml to orchestrate a multi-stage pipeline: analyze prompt, classify attack, isolate and log, generate report

A custom Python detector:

    from src.detector import PromptInjectionDetector

    detector = PromptInjectionDetector()
    result = detector.detect(code_snippet)  # code_snippet holds the changed code under review

Built-in GitLab tools to:

- Read MR files and diffs
- Analyze repository context
- Post results back into Merge Requests
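The four stages that flow.yml orchestrates (analyze prompt, classify attack, isolate and log, generate report) can be pictured as functions run in sequence over a shared context. The sketch below is plain Python, not GitLab flow syntax; every function name, the context keys, and the single hard-coded phrase check are hypothetical stand-ins for the real stage logic.

```python
from typing import Callable

# Each stage takes the shared context dict and returns it with new keys added.
Stage = Callable[[dict], dict]


def analyze_prompt(ctx: dict) -> dict:
    # Placeholder analysis: flag one well-known override phrase.
    text = ctx["snippet"].lower()
    ctx["suspicious"] = "ignore previous instructions" in text
    return ctx


def classify_attack(ctx: dict) -> dict:
    ctx["category"] = "instruction_override" if ctx["suspicious"] else "none"
    return ctx


def isolate_and_log(ctx: dict) -> dict:
    # Keep a copy of the flagged snippet for auditing.
    ctx["log"] = [ctx["snippet"]] if ctx["suspicious"] else []
    return ctx


def generate_report(ctx: dict) -> dict:
    verdict = "unsafe" if ctx["suspicious"] else "safe"
    ctx["report"] = f"{verdict}: {ctx['category']}"
    return ctx


PIPELINE: list[Stage] = [analyze_prompt, classify_attack, isolate_and_log, generate_report]


def run_pipeline(snippet: str) -> dict:
    """Run every stage in order and return the final context."""
    ctx: dict = {"snippet": snippet}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

Keeping the stages as an ordered list makes it easy to see what each later stage depends on: classification needs the analysis flag, and the report needs the category.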

SecurePrompt's hybrid approach combines AI reasoning with deterministic validation from the Python detector for higher accuracy.
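One way to picture the combination of AI reasoning and deterministic validation is a rule layer that can override the model's verdict. The sketch below is an assumption about how such a merge could work, not the actual `PromptInjectionDetector` internals: the two regex rules, the `Verdict` class, and the "a rule match always wins" policy are all illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative rules only; a real rule set would be larger and tuned.
RULES = {
    "instruction_override": re.compile(
        r"ignore (all |any )?(previous|prior) instructions", re.I
    ),
    "credential_request": re.compile(
        r"(reveal|print|show).{0,40}(password|credentials|api key)", re.I
    ),
}


@dataclass
class Verdict:
    safety: str      # "safe" or "unsafe"
    category: str
    reasoning: str


def rule_check(snippet: str) -> Optional[Verdict]:
    """Deterministic layer: return a verdict if any rule matches, else None."""
    for category, pattern in RULES.items():
        match = pattern.search(snippet)
        if match:
            return Verdict("unsafe", category, f"matched rule text: {match.group(0)!r}")
    return None


def combine(ai_verdict: Verdict, snippet: str) -> Verdict:
    """Assumed policy: a rule match always wins; otherwise the AI verdict stands."""
    rule_verdict = rule_check(snippet)
    return rule_verdict if rule_verdict is not None else ai_verdict
```

With this policy the rules can only add detections, never suppress one the model raised, which trades some extra false positives for fewer missed attacks. A different merge rule would be just as easy to swap in.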

Challenges we ran into

- Designing reliable detection logic for ambiguous prompt injection patterns
- Balancing false positives vs. real threats
- Integrating seamlessly with GitLab’s agent and flow architecture
- Mapping AI analysis to actionable developer feedback
- Ensuring fast execution within real-time workflows

Accomplishments that we're proud of

- Built an end-to-end working prototype integrated with GitLab
- Achieved real-time analysis within Merge Requests
- Implemented a multi-layer detection pipeline (AI + Python engine)
- Delivered clear, structured security reports with risk scoring
- Bridged the gap between AI security and developer workflows

What we learned

- Prompt injection is a critical and evolving security challenge
- AI alone isn’t enough; combining it with rule-based validation improves reliability
- Developers prefer security feedback directly in their workflow, not in external tools
- Performance and usability are just as important as detection accuracy

What's next for SecurePrompt: AI-Powered Prompt Injection Detection

- Expand detection coverage with more advanced attack patterns
- Improve accuracy using feedback-driven learning
- Integrate with CI/CD pipelines for continuous security validation
- Extend support beyond GitLab to other platforms
- Build a centralized dashboard for security insights and trends