Project Story: Boundry
Inspiration
As AI agents move from experimental sandboxes into production workflows, the "messy intake" problem has become a critical bottleneck for security teams. AI risk reports don't arrive as clean JSON payloads; they arrive as fragmented screenshots of chat history, repository links, raw execution logs, and plain-language concerns like "this prompt feels a bit off."
We were inspired to build Boundry because we realized that for agentic security to be operational, it needs a deterministic system of record that can bridge the gap between flexible AI orchestration (the "brain") and auditable remediation (the "memory"). We wanted to move away from "security by chat transcript" and toward a structured, traceable, and approval-gated triage model.
What it does
Boundry is a production-style Python backend that serves as the control plane for AI security and safety triage. It integrates directly with Airia as the orchestration layer and GitHub for remediation.
- Evidence Normalization: It takes heterogeneous inputs—prompts, logs, repo URLs, and configs—and normalizes them into a typed CaseRecord (sketched just after this list).
- Finding Generation: Using a heuristic-driven analysis pipeline, it maps evidence to specific AI risk categories like prompt-injection, unsafe-command-execution, and data-exposure.
- Approval-Gated Remediation: No external action (like opening a GitHub issue) happens without an explicit human-in-the-loop approval.
- Operational Follow-Through: It tracks owners, analyst queues, and SLA states to ensure that "identified" risks actually become "resolved" vulnerabilities.
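To make the intake contract concrete, here is a minimal Pydantic sketch of what a typed CaseRecord could look like; the field names, enum values, and statuses below are illustrative assumptions rather than the exact schema:

```python
# Illustrative sketch only: field names, enum values, and defaults are
# assumptions, not the exact CaseRecord schema.
from datetime import datetime, timezone
from enum import Enum
from typing import Literal
from uuid import uuid4

from pydantic import BaseModel, Field


class EvidenceKind(str, Enum):
    PROMPT = "prompt"
    LOG = "log"
    REPO_URL = "repo_url"
    CONFIG = "config"


class EvidenceItem(BaseModel):
    kind: EvidenceKind
    content: str                # raw text, URL, or config snippet
    source: str | None = None   # e.g. "slack", "analyst-upload"


class CaseRecord(BaseModel):
    case_id: str = Field(default_factory=lambda: uuid4().hex)
    status: Literal["intake", "review", "approved", "resolved"] = "intake"
    reported_by: str
    evidence: list[EvidenceItem]
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```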
How we built it
We built Boundry using a modern, lightweight Python stack optimized for transparency and auditability:
- FastAPI: Provides the stable API surface for intake and operational views (a minimal intake endpoint sketch follows this list).
- Pydantic: Enforces strict schema contracts for every evidence type and finding.
- SQLite: Serves as the default, single-instance system of record, enabling easy deployment and local testing.
- Mermaid.js: Used for generating live architecture and user journey diagrams.
- GitHub API: Orchestrates remediation through issue dispatch and real-time state synchronization.
- Airia Integration: Boundry is designed to sit behind Airia, allowing it to benefit from multi-agent routing while providing the reliable backend state needed for enterprise compliance.
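As a rough illustration of how FastAPI and Pydantic fit together at the intake boundary, a minimal endpoint could look like the following; the route path, model names, and stubbed persistence are assumptions for the sketch:

```python
# Minimal intake sketch; the route path and models are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Boundry (sketch)")


class IntakeRequest(BaseModel):
    reported_by: str
    evidence: list[dict]   # heterogeneous raw evidence payloads


class IntakeResponse(BaseModel):
    case_id: str
    status: str


@app.post("/cases", response_model=IntakeResponse)
def create_case(req: IntakeRequest) -> IntakeResponse:
    # Normalization would validate each raw payload into a typed evidence item
    # and persist the resulting CaseRecord; this stub only echoes a placeholder.
    return IntakeResponse(case_id="case-123", status="intake")
```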
We modeled the triage logic as a series of transformations:
$$ \text{Evidence} \xrightarrow{\text{Normalization}} \text{Case} \xrightarrow{\text{Analysis}} \text{Findings} \xrightarrow{\text{Approval}} \text{Action} $$
Challenges we ran into
One of the primary challenges was maintaining idempotency when syncing with external systems. We didn't want to flood a repository with duplicate issues if the same risk was reported multiple times, so we embedded a hidden traceability marker in each GitHub issue body:
`<!-- boundry:idempotency={key} -->`
This allows the system to "self-heal" and link existing issues back to internal cases without requiring a complex external state map.
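A simplified sketch of how such a marker can be derived and matched against existing issue bodies (the key derivation and helper names are assumptions, not the exact implementation):

```python
# Sketch of idempotency-marker handling; the key derivation and helpers are
# illustrative assumptions.
import hashlib

MARKER_TEMPLATE = "<!-- boundry:idempotency={key} -->"


def idempotency_key(repo: str, risk_category: str, evidence_fingerprint: str) -> str:
    """Derive a stable key so the same reported risk always maps to one issue."""
    raw = f"{repo}:{risk_category}:{evidence_fingerprint}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def find_existing_issue(issue_bodies: dict[int, str], key: str) -> int | None:
    """Return the issue number already carrying this marker, if any."""
    marker = MARKER_TEMPLATE.format(key=key)
    for number, body in issue_bodies.items():
        if marker in body:
            return number
    return None


def issue_body_with_marker(summary: str, key: str) -> str:
    """Append the hidden marker so future syncs can self-heal the linkage."""
    return f"{summary}\n\n{MARKER_TEMPLATE.format(key=key)}"
```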
Another challenge was maintaining the separation between evidence and inference. In security auditing, it is vital to know exactly what was observed (e.g., a specific log line) versus what was inferred (e.g., a high-severity prompt injection risk). We designed the Finding model to explicitly track evidence_refs and inference_basis as separate fields.
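A minimal sketch of that separation in the Finding model (the field names and categories here are illustrative, not the exact model):

```python
# Sketch of a Finding that keeps observation and inference separate;
# field names and enum values are illustrative assumptions.
from enum import Enum

from pydantic import BaseModel


class RiskCategory(str, Enum):
    PROMPT_INJECTION = "prompt-injection"
    UNSAFE_COMMAND_EXECUTION = "unsafe-command-execution"
    DATA_EXPOSURE = "data-exposure"


class Finding(BaseModel):
    case_id: str
    category: RiskCategory
    severity: str               # e.g. "low" / "medium" / "high"
    evidence_refs: list[str]    # what was observed: IDs of the specific evidence items
    inference_basis: str        # what was inferred: the heuristic or rule that fired
```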
Accomplishments that we're proud of
We are proud of building a system that feels "production-ready" from day one. Boundry isn't just a prototype; it includes:
- Automatic Schema Migrations: A built-in migration manager keeps the SQLite backend's schema current (a minimal sketch follows this list).
- SLA Tracking: Real-time monitoring of how long cases stay in "intake" or "review" lanes.
- End-to-End Traceability: A complete audit trail for every case, from the initial WhatsApp/Slack report to the final GitHub issue closure.
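For the SQLite migration manager, a minimal version-tracking approach could look like the sketch below; the table name and migration statements are assumptions:

```python
# Minimal sketch of a version-tracked SQLite migration runner;
# the migration statements and table name are illustrative assumptions.
import sqlite3

MIGRATIONS: list[tuple[int, str]] = [
    (1, "CREATE TABLE cases (case_id TEXT PRIMARY KEY, status TEXT, created_at TEXT)"),
    (2, "ALTER TABLE cases ADD COLUMN owner TEXT"),
]


def migrate(conn: sqlite3.Connection) -> None:
    """Apply any migrations newer than the recorded schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()
```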
The seamless handoff between Airia (handling the "human" interaction) and Boundry (handling the "workflow" state) demonstrates a powerful model for multi-agent operating systems.
What we learned
We learned that determinism is a feature, not a limitation, in agentic systems. While large language models are excellent at reasoning, they can be unpredictable for state management. By pushing the "reasoning" to the orchestration layer (Airia) and keeping the "state" in a structured backend (Boundry), we created a system that is both intelligent and reliable.
We also reinforced our belief in "Least Privilege" for AI agents. Boundry's approach of generating approval-gated actions ensures that even if an agent correctly identifies a fix, a human expert remains the ultimate authority before code changes are dispatched.
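In code terms, the approval gate boils down to refusing any external side effect that has not been explicitly signed off; the statuses and names in this sketch are assumptions:

```python
# Sketch of an approval-gated dispatch; statuses, exception, and the execute
# callable are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    case_id: str
    description: str                    # e.g. "Open GitHub issue for prompt-injection finding"
    status: str = "pending_approval"    # becomes "approved" only after human sign-off


class ApprovalRequired(Exception):
    """Raised when an action is dispatched without explicit human approval."""


def dispatch(action: ProposedAction, execute: Callable[[ProposedAction], None]) -> None:
    """Run an external side effect (e.g. issue creation) only once approved."""
    if action.status != "approved":
        raise ApprovalRequired(f"Action for case {action.case_id} has not been approved")
    execute(action)
```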
What's next for Boundry
The future of Boundry is about expanding the "remediation ecosystem":
- Direct Jira and Slack Integrations: Moving beyond GitHub to support enterprise ticketing and notification flows.
- Deeper Static Analysis: Integrating tools like Semgrep or Bandit directly into the analysis pipeline for deeper repository inspection.
- Scaling to Postgres: Moving beyond SQLite to support horizontal scaling for high-volume security teams.
- LLM-Augmented Analysis: Using Airia to provide even richer finding summaries and remediation suggestions while maintaining the core deterministic fallback.
Boundry is just the beginning of the "System of Record" for the agentic era.
