Spectra

Spectra landing page
Architecture
Dashboard
Vulnerabilities
Analytics
Red-teaming console, idle
Red-teaming console (running an agentic workflow)

Inspiration

We recognized a critical vulnerability in the AI safety landscape: while system prompts are the first line of defense in AI systems, there's no systematic way to test and improve them against adversarial attacks. Spectra was born from the need to provide developers with deep visibility into their AI agents' behavior, performance, and security vulnerabilities, but more importantly, from the need to automatically identify, test, and patch vulnerabilities in system prompts through continuous red teaming and monitoring.

What it does

Spectra is a specialized AI security and observability platform designed to continuously monitor and test AI agents and system prompts against adversarial misuse. It acts as a security-focused wrapper around agentic workflows, tracking agent behavior in real time, surfacing failures, and actively red teaming prompt boundaries.

Spectra is distributed as a lightweight Python package spectra-agent-security that developers can wrap around any AI agent or workflow with a single line of code. It seamlessly integrates into tools like LangGraph or LangChain, providing continuous safety monitoring and prompt-level threat detection out of the box.

Key features:

Real-time monitoring of agent executions, tool use, and LLM outputs
Automated adversarial prompt injection to test system prompt robustness
Vulnerability detection and classification across execution traces
Prompt scoring system based on observed failure modes and behavior
Automated prompt improvement suggestions based on observed weaknesses
Visual dashboard for auditing agent thinking, vulnerabilities, and exploits
Seamless integration with frameworks like LangGraph and LangChain

How we built it

Core Engine: A Python-based wrapper system that tracks agent workflows, injects red team prompts, and logs all interactions
Vulnerability Detection: Heuristic and pattern-based parsing of execution traces to flag unsafe or unexpected behavior
Red Teaming System: A set of adversarial prompt cases and test harnesses that run alongside normal workflows
Prompt Scoring: Lightweight evaluation system that measures prompt robustness based on failure frequency and severity
Suggestion Engine: LLM-driven subsystem that proposes safer prompt revisions in response to failures
Frontend Dashboard: Built with Tailwind + Shadcn UI, showing categorized vulnerabilities and timeline visualizations

Live Demo

Spectra dashboard running on example agentic workflow
Vulnerabilities page

Challenges we ran into

Designing adversarial prompts that consistently challenge real-world system prompts
Creating meaningful and interpretable safety metrics for prompt failures
Balancing the overhead of red teaming with real-time monitoring performance
Avoiding overfitting or hallucination in the prompt suggestion system
Visualizing complex multi-step agent failures in an accessible way

Accomplishments that we're proud of

Built a real-time red teaming system that runs in parallel with any AI workflow
Developed a vulnerability logging pipeline that captures and classifies errors and failures
Created a live dashboard that surfaces safety issues and allows detailed inspection
Implemented a first-pass prompt scoring system to quantify robustness
Generated and tested automated suggestions for hardening unsafe prompts

What we learned

Prompt failures often happen in subtle, compounding ways — not all at once
Automated red teaming can catch edge cases missed during manual QA
System prompts are fragile and benefit from lifecycle-aware improvement tools
Continuous observability is key for understanding the dynamics of AI agents
Visual context is essential for debugging agentic workflows and prompt logic

What's next for Spectra

More intelligent adversarial prompt generation using LLMs and simulation-based red teaming
Expanded pattern recognition for unknown or emergent vulnerabilities
Deeper integration with more frameworks and agent runtime environments
Launching a shared database of known prompt vulnerabilities and exploits
Auto-hardening system that applies fixes, retests, and closes the loop
Defining and publishing community standards for prompt security benchmarks