💡 Inspiration
As Large Language Models (LLMs) like Gemini and GPT-4 become critical infrastructure, passive monitoring is no longer enough. We realized that companies don't just need to know when an AI fails—they need a system that actively prevents cascading failures.
We asked ourselves: "What if we could build a 'Circuit Breaker' for AI?"
This inspired the AI Trust Control Plane—a real-time decision engine that sits between users and AI models. Instead of just logging errors, it calculates a deterministic "Trust Score" and enforces active guardrails (like a Kill Switch) to block traffic when trust degrades.
⚙️ What it does
The platform provides a single pane of glass for AI reliability. It operates on a "Green → Red → Green" loop:
- Monitor: Detects Rate Limits (429s) and Latency Spikes in real-time.
- Score: Calculates a dynamic Trust Score (0-100). For example, a rate limit drops the score by 30 points immediately.
- Act: If the score drops below 60, the Kill Switch Guardrail activates and blocks outbound API traffic to prevent cost overruns or bad user experiences.
- Audit: Every action (resolution, policy change) is hashed with SHA-256 and stored in an immutable ledger for compliance.
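The Audit step hashes each action with SHA-256 via the Web Crypto API. Below is a minimal sketch of one way to make such a ledger tamper-evident. It assumes each entry's hash also covers the previous entry's hash (hash chaining), so editing any past entry breaks verification. The chaining, the entry fields, and the names `AuditEntry`, `appendEntry`, and `verifyLedger` are assumptions, not the project's code.

```typescript
// Sketch: hash-chained audit ledger using the Web Crypto API (crypto.subtle).
// Assumption: chaining each entry to the previous hash; all names are hypothetical.

interface AuditEntry {
  timestamp: string;
  action: string;   // e.g. "resolution" or "policy_change"
  actor: string;    // e.g. "SRE"
  prevHash: string; // hash of the previous entry (GENESIS_HASH for the first)
  hash: string;     // SHA-256 over this entry's fields plus prevHash
}

const GENESIS_HASH = "0".repeat(64);

// Hex-encoded SHA-256 of a UTF-8 string.
async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Canonical serialization of the hashed fields (fixed order).
function payloadOf(e: Omit<AuditEntry, "hash">): string {
  return JSON.stringify([e.timestamp, e.action, e.actor, e.prevHash]);
}

async function appendEntry(
  ledger: AuditEntry[],
  action: string,
  actor: string,
  timestamp: string = new Date().toISOString(),
): Promise<AuditEntry> {
  const prevHash = ledger.length > 0 ? ledger[ledger.length - 1].hash : GENESIS_HASH;
  const partial = { timestamp, action, actor, prevHash };
  const entry: AuditEntry = { ...partial, hash: await sha256Hex(payloadOf(partial)) };
  ledger.push(entry);
  return entry;
}

// Recompute every hash; an edited entry breaks the chain from that point on.
async function verifyLedger(ledger: AuditEntry[]): Promise<boolean> {
  let prevHash = GENESIS_HASH;
  for (const entry of ledger) {
    if (entry.prevHash !== prevHash) return false;
    if ((await sha256Hex(payloadOf(entry))) !== entry.hash) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

A hash only proves integrity. Anyone can recompute SHA-256, so attributing an entry to a specific actor would additionally need a signature (for example with `crypto.subtle.sign`).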
🛠️ How we built it
We built a local-first, fail-safe architecture to ensure the Control Plane survives even if the backend falters.
- Frontend: Built with Next.js 14 (App Router) and TypeScript for type safety. We used shadcn/ui and Tailwind CSS for the mission-control aesthetic.
- State Management: We encountered a challenge where rapid API failures caused React state glitches. We solved this by implementing a useRef-based "Instant Memory" system that tracks score degradation (100 → 70 → 40) with zero lag.
- Datadog Integration: The system pushes both Logs and Real-time Events to Datadog's US5 region via a Next.js API proxy, allowing us to visualize the "Staircase Degradation" effect on Datadog dashboards.
- Security: We implemented a mock RBAC (Role-Based Access Control) system. The "SRE" role requires a security PIN to elevate privileges, while the "Auditor" role is read-only, illustrating the role separation an enterprise deployment would need.
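The write-up does not say exactly what the React state glitch was. One common cause is a stale snapshot: handlers created during one render all read that render's score, so two failures before the next render both subtract from the same value. The framework-free sketch below contrasts that with the ref pattern the "Instant Memory" system is built on. A plain `{ current }` object stands in for React's `useRef` so it runs without React. `makeSnapshotHandler`, `recordFailure`, and `ScoreRef` are hypothetical names.

```typescript
// Sketch of the "Instant Memory" idea. In the component, scoreRef would come
// from React's useRef(100); a plain { current } object stands in here.

interface ScoreRef {
  current: number;
}

// Problematic pattern: the handler captures the score from the render that
// created it, so two quick failures both compute from the same stale value.
function makeSnapshotHandler(scoreAtRender: number, setScore: (s: number) => void) {
  return (penalty: number): void => setScore(Math.max(0, scoreAtRender - penalty));
}

// Ref pattern: read and write ref.current synchronously (clamped at 0), then
// mirror the value into React state via onChange purely for rendering.
function recordFailure(ref: ScoreRef, penalty: number, onChange: (s: number) => void): number {
  ref.current = Math.max(0, ref.current - penalty);
  onChange(ref.current);
  return ref.current;
}

// In a component (sketch):
//   const scoreRef = useRef(100);
//   const [score, setScore] = useState(100);
//   recordFailure(scoreRef, 30, setScore); // 100 -> 70, then 70 -> 40
```

React's functional updater, `setScore((s) => s - 30)`, also avoids the stale read. A ref additionally lets non-render code, such as a guardrail check before an outbound call, read the latest score synchronously.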
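The RBAC layer is described as a mock. The sketch below shows one way the two roles and the PIN step could be modeled. The role names come from the text. The action names, `can`, `elevate`, and the PIN comparison are hypothetical. A real deployment would verify the PIN server-side rather than against a value the client knows.

```typescript
// Sketch of a mock RBAC model: "Auditor" is read-only; "SRE" may perform
// write actions only after a PIN-verified elevation. Names are hypothetical.

type Role = "SRE" | "Auditor";
type Action = "view_dashboard" | "view_audit_log" | "toggle_kill_switch" | "change_policy";

interface Session {
  role: Role;
  elevated: boolean;
}

const READ_ONLY: ReadonlySet<Action> = new Set<Action>(["view_dashboard", "view_audit_log"]);

const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  Auditor: READ_ONLY,
  SRE: new Set<Action>(["view_dashboard", "view_audit_log", "toggle_kill_switch", "change_policy"]),
};

// Mock only: compares against a known PIN. Auditors can never elevate, and a
// wrong PIN leaves (or returns) the SRE session unelevated.
function elevate(session: Session, pin: string, expectedPin: string): Session {
  if (session.role !== "SRE") return { ...session, elevated: false };
  return { ...session, elevated: pin === expectedPin };
}

// The role must grant the action; write actions also need an elevated session.
function can(session: Session, action: Action): boolean {
  if (!PERMISSIONS[session.role].has(action)) return false;
  return READ_ONLY.has(action) || session.elevated;
}
```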
🚧 Challenges we faced
The hardest part was the "Natural Degradation" logic. Initially, our Datadog graphs would jump straight from 100 to 0. To fix this, we wrote a custom scoring engine that subtracts penalties cumulatively (e.g., -15 for latency, -30 for rate limits). This created a realistic, organic "step-down" pattern in our observability metrics that mirrors real-world outages.
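The cumulative scoring and the Kill Switch can be sketched together. The penalty sizes (-30 for a rate limit, -15 for a latency spike) and the "below 60" threshold come from the write-up. The class and function names, the 2000 ms latency budget used to classify a spike, and the assumption that a resolution restores the score to 100 (the Red → Green half of the loop) are hypothetical.

```typescript
// Sketch of a cumulative trust-score engine with a kill-switch guardrail.

type Signal = "rate_limit" | "latency_spike";

const PENALTY: Record<Signal, number> = { rate_limit: 30, latency_spike: 15 };
const KILL_SWITCH_THRESHOLD = 60; // guardrail is active while score < 60
const MAX_SCORE = 100;

// Map one model response to a degradation signal, or null if it looks healthy.
// The 2000 ms budget is an assumed default, not a value from the write-up.
function classify(status: number, latencyMs: number, latencyBudgetMs = 2000): Signal | null {
  if (status === 429) return "rate_limit";
  if (latencyMs > latencyBudgetMs) return "latency_spike";
  return null;
}

class TrustEngine {
  private score = MAX_SCORE;

  get current(): number {
    return this.score;
  }

  get killSwitchActive(): boolean {
    return this.score < KILL_SWITCH_THRESHOLD;
  }

  // Subtract the penalty from the running score (never below 0). Repeated
  // signals produce a staircase such as 100 -> 85 -> 55, not a jump to 0.
  record(signal: Signal): number {
    this.score = Math.max(0, this.score - PENALTY[signal]);
    return this.score;
  }

  // Assumption: an operator resolution restores full trust.
  resolve(): void {
    this.score = MAX_SCORE;
  }

  // Wrap an outbound model call and refuse it while the kill switch is active.
  async guard<T>(call: () => Promise<T>): Promise<T> {
    if (this.killSwitchActive) {
      throw new Error(`Kill switch active: trust score ${this.score} < ${KILL_SWITCH_THRESHOLD}`);
    }
    return call();
  }
}
```

With these numbers, a single rate limit (100 → 70) leaves traffic flowing, while a second one (70 → 40) trips the guardrail. A latency spike followed by a rate limit (100 → 85 → 55) trips it as well.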
🏆 Accomplishments that we're proud of
- The Kill Switch: Seeing the "GUARDRAIL ACTIVE" red overlay trigger automatically when the score hit Critical was a huge win.
- Immutable Audit Logs: Integrating the Web Crypto API to generate real SHA-256 hashes for every log entry makes the system feel truly "Audit-Ready."
- Datadog Sync: Successfully connecting our local simulation to the Datadog cloud and seeing the graphs move in real-time.
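Here is one way the Next.js proxy could forward a score update to Datadog. The two URLs are Datadog's US5 intake endpoints for Logs (v2) and Events (v1), both authenticated with the `DD-API-KEY` header. The `TrustUpdate` shape, tags, service name, and function names are hypothetical. Keeping the API key in a server-side route instead of the browser is the usual reason for such a proxy.

```typescript
// Sketch: build and send Datadog Logs (v2) and Events (v1) requests for the US5 site.

interface TrustUpdate {
  score: number;
  signal: string;      // e.g. "rate_limit"
  killSwitch: boolean;
}

const DD_LOGS_URL = "https://http-intake.logs.us5.datadoghq.com/api/v2/logs";
const DD_EVENTS_URL = "https://api.us5.datadoghq.com/api/v1/events";

function buildDatadogRequests(update: TrustUpdate, apiKey: string): Array<{ url: string; init: RequestInit }> {
  const headers: Record<string, string> = { "Content-Type": "application/json", "DD-API-KEY": apiKey };
  const tags = [`signal:${update.signal}`, `kill_switch:${update.killSwitch}`];
  // Logs intake accepts an array of log objects; ddtags is a comma-separated string.
  const logs = [{
    ddsource: "ai-trust-control-plane",
    service: "ai-trust-control-plane",
    ddtags: tags.join(","),
    message: JSON.stringify(update),
  }];
  // Events API v1 requires title and text; alert_type drives the event's color.
  const event = {
    title: `Trust score ${update.score}`,
    text: `Signal ${update.signal}; kill switch ${update.killSwitch ? "ACTIVE" : "off"}`,
    tags,
    alert_type: update.killSwitch ? "error" : "info",
  };
  return [
    { url: DD_LOGS_URL, init: { method: "POST", headers, body: JSON.stringify(logs) } },
    { url: DD_EVENTS_URL, init: { method: "POST", headers, body: JSON.stringify(event) } },
  ];
}

// Send both requests; fetchImpl is injectable so the sketch can be tested offline.
async function forwardToDatadog(update: TrustUpdate, apiKey: string, fetchImpl: typeof fetch = fetch): Promise<number[]> {
  const responses = await Promise.all(
    buildDatadogRequests(update, apiKey).map(({ url, init }) => fetchImpl(url, init)),
  );
  return responses.map((r) => r.status);
}

// In an App Router route handler (e.g. app/api/datadog/route.ts), a sketch:
//   export async function POST(req: Request) {
//     const update = (await req.json()) as TrustUpdate;
//     const statuses = await forwardToDatadog(update, process.env.DD_API_KEY ?? "");
//     return Response.json({ statuses });
//   }
```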
Built With
- datadog
- google-cloud-gemini-api
- javascript
- local-storage-api
- next.js
- react
- sha256
- shadcn-ui
- tailwind-css
- typescript
- web-crypto-api