Inspiration
We recognized a critical vulnerability in the AI safety landscape: while system prompts are the first line of defense in AI systems, there's no systematic way to test and improve them against adversarial attacks. Spectra was born from the need to provide developers with deep visibility into their AI agents' behavior, performance, and security vulnerabilities, but more importantly, from the need to automatically identify, test, and patch vulnerabilities in system prompts through continuous red teaming and monitoring.
What it does
Spectra is a specialized AI security and observability platform designed to continuously monitor and test AI agents and system prompts against adversarial misuse. It acts as a security-focused wrapper around agentic workflows, tracking agent behavior in real time, surfacing failures, and actively red teaming prompt boundaries.
Spectra is distributed as a lightweight Python package spectra-agent-security that developers can wrap around any AI agent or workflow with a single line of code. It seamlessly integrates into tools like LangGraph or LangChain, providing continuous safety monitoring and prompt-level threat detection out of the box.
Key features:
- Real-time monitoring of agent executions, tool use, and LLM outputs
- Automated adversarial prompt injection to test system prompt robustness
- Vulnerability detection and classification across execution traces
- Prompt scoring system based on observed failure modes and behavior
- Automated prompt improvement suggestions based on observed weaknesses
- Visual dashboard for auditing agent thinking, vulnerabilities, and exploits
- Seamless integration with frameworks like LangGraph and LangChain
How we built it
- Core Engine: A Python-based wrapper system that tracks agent workflows, injects red team prompts, and logs all interactions
- Vulnerability Detection: Heuristic and pattern-based parsing of execution traces to flag unsafe or unexpected behavior
- Red Teaming System: A set of adversarial prompt cases and test harnesses that run alongside normal workflows
- Prompt Scoring: Lightweight evaluation system that measures prompt robustness based on failure frequency and severity
- Suggestion Engine: LLM-driven subsystem that proposes safer prompt revisions in response to failures
- Frontend Dashboard: Built with Tailwind + Shadcn UI, showing categorized vulnerabilities and timeline visualizations
Live Demo
Spectra dashboard running on example agentic workflow

Vulnerabilities page

Challenges we ran into
- Designing adversarial prompts that consistently challenge real-world system prompts
- Creating meaningful and interpretable safety metrics for prompt failures
- Balancing the overhead of red teaming with real-time monitoring performance
- Avoiding overfitting or hallucination in the prompt suggestion system
- Visualizing complex multi-step agent failures in an accessible way
Accomplishments that we're proud of
- Built a real-time red teaming system that runs in parallel with any AI workflow
- Developed a vulnerability logging pipeline that captures and classifies errors and failures
- Created a live dashboard that surfaces safety issues and allows detailed inspection
- Implemented a first-pass prompt scoring system to quantify robustness
- Generated and tested automated suggestions for hardening unsafe prompts
What we learned
- Prompt failures often happen in subtle, compounding ways — not all at once
- Automated red teaming can catch edge cases missed during manual QA
- System prompts are fragile and benefit from lifecycle-aware improvement tools
- Continuous observability is key for understanding the dynamics of AI agents
- Visual context is essential for debugging agentic workflows and prompt logic
What's next for Spectra
- More intelligent adversarial prompt generation using LLMs and simulation-based red teaming
- Expanded pattern recognition for unknown or emergent vulnerabilities
- Deeper integration with more frameworks and agent runtime environments
- Launching a shared database of known prompt vulnerabilities and exploits
- Auto-hardening system that applies fixes, retests, and closes the loop
- Defining and publishing community standards for prompt security benchmarks
Built With
- bolt
- fastapi
- langchain
- langgraph
- nextjs
- orchids
- python





Log in or sign up for Devpost to join the conversation.