FaultLine

Inspiration

When AI agents fail, debugging is manual and time-consuming. Engineers spend hours tracing through logs, tool calls, and model outputs to find root causes. There's no automated way to identify the exact point of failure, understand contributing factors, or get actionable fix suggestions.

We built FaultLine to solve this problem. It is an automated root-cause analysis platform that uses Gemini 3 to produce evidence-backed forensic reports for AI agent failures.

What it does

FaultLine captures every step of agent execution (user inputs, tool calls, model outputs, memory operations) and automatically analyzes traces to produce:

Evidence-backed root cause analysis: every claim links to specific trace events (clickable in the UI)
Causal graph visualization: an interactive React Flow graph showing how the failure propagated
Actionable fix suggestions: categorized by prompt, tooling, memory, and orchestration issues
Confidence scores: the root cause and each contributing factor are ranked by a confidence score from 0 to 1
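
To make the causal graph output more concrete, here is a minimal TypeScript sketch of one way to represent nodes and edges and to locate a first divergence point. The CausalNode and CausalEdge shapes, the status field, and the definition of divergence used here (the earliest failed node with no failed upstream cause) are assumptions for illustration, not FaultLine's actual schema.

```typescript
// Hypothetical shapes; FaultLine's real graph schema may differ.
interface CausalNode {
  id: string;
  stepIndex: number; // position of the step in the trace timeline
  status: "ok" | "failed";
}

interface CausalEdge {
  from: string; // id of the upstream (cause) node
  to: string; // id of the downstream (effect) node
}

// Returns the earliest failed node that has no failed upstream node,
// or undefined when nothing in the graph failed.
function findFirstDivergence(
  nodes: CausalNode[],
  edges: CausalEdge[],
): CausalNode | undefined {
  const byId = new Map<string, CausalNode>();
  for (const node of nodes) byId.set(node.id, node);

  // Ids of nodes that sit directly downstream of a failed node.
  const hasFailedCause = new Set<string>();
  for (const edge of edges) {
    if (byId.get(edge.from)?.status === "failed") hasFailedCause.add(edge.to);
  }

  return nodes
    .filter((n) => n.status === "failed" && !hasFailedCause.has(n.id))
    .sort((a, b) => a.stepIndex - b.stepIndex)[0];
}
```

A UI could use the returned node's id to highlight the matching graph node and scroll the timeline to that step.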

Key Features:

Clickable evidence links that scroll to timeline steps
Interactive causal graphs highlighting first divergence point
Counterfactual analysis ("If X, then Y")
Production-ready: rate limiting, retries, caching, metrics
Security: secret/PII redaction, access control
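
As an illustration of the secret/PII redaction feature, the sketch below scrubs string values in an event payload before storage. The three patterns (email addresses, bearer tokens, and "sk-" style API keys) and the function names are assumptions chosen for the example; the rules FaultLine actually applies are not described here.

```typescript
// Illustrative patterns only; a real redactor needs a broader, audited rule set.
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "BEARER_TOKEN", pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "API_KEY", pattern: /\bsk-[A-Za-z0-9]{16,}\b/g },
];

// Replaces every match of every rule with a "[REDACTED:<label>]" placeholder.
function redactText(input: string): string {
  let output = input;
  for (const { label, pattern } of REDACTION_RULES) {
    output = output.replace(pattern, `[REDACTED:${label}]`);
  }
  return output;
}

// Walks a JSON-like payload and redacts every string value it contains.
// Non-string leaves (numbers, booleans, null) are returned unchanged.
function redactPayload(value: unknown): unknown {
  if (typeof value === "string") return redactText(value);
  if (Array.isArray(value)) return value.map((item) => redactPayload(item));
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      result[key] = redactPayload(inner);
    }
    return result;
  }
  return value;
}
```

Running redaction before events are persisted means the stored trace, and anything later sent to the model, never contains the raw secret.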

How we built it

Architecture:

Frontend: Next.js 16, React 19, Tailwind CSS, React Flow for causal graphs
Backend: Next.js API routes, BullMQ for job queuing
Worker: Node.js worker processes traces with Gemini 3
Storage: Redis for events, reports, and artifacts
SDK: TypeScript instrumentation library for easy integration

Gemini Integration:

Uses structured outputs (responseMimeType: "application/json" + responseSchema) for type-safe verdicts
Multi-step reasoning to analyze entire trace timelines
Evidence linking — references specific step IDs ("Step 1", "Step 2") in all claims
Causal graph generation — builds nodes/edges showing dependencies and contradictions
Confidence scoring for root cause and contributing factors
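
The evidence-linking requirement can be checked mechanically after the model responds. The sketch below verifies that every claim cites at least one step, that each cited step exists in the trace, and that confidence stays within 0 to 1. The Verdict and Claim shapes and the evidenceStepIds field name are hypothetical, and the checks are hand-written to keep the example dependency-free rather than to mirror the project's Zod schemas.

```typescript
// Hypothetical verdict shape; field names are assumptions for illustration.
interface Claim {
  text: string;
  confidence: number; // expected to lie in [0, 1]
  evidenceStepIds: string[]; // e.g. ["Step 1", "Step 2"]
}

interface Verdict {
  rootCause: Claim;
  contributingFactors: Claim[];
}

// Collects human-readable problems instead of throwing, so a caller can
// decide whether to retry the model call or reject the verdict.
function checkEvidenceLinks(verdict: Verdict, knownStepIds: Set<string>): string[] {
  const problems: string[] = [];
  const claims = [verdict.rootCause, ...verdict.contributingFactors];
  for (const claim of claims) {
    if (claim.evidenceStepIds.length === 0) {
      problems.push(`Claim "${claim.text}" cites no evidence`);
    }
    for (const stepId of claim.evidenceStepIds) {
      if (!knownStepIds.has(stepId)) {
        problems.push(`Claim "${claim.text}" cites unknown step "${stepId}"`);
      }
    }
    // Written as a negated range check so that NaN is also rejected.
    if (!(claim.confidence >= 0 && claim.confidence <= 1)) {
      problems.push(`Claim "${claim.text}" has confidence outside [0, 1]`);
    }
  }
  return problems;
}
```

An empty result means every claim is backed by a step the user can click through to; a non-empty result is a reasonable trigger for a retry.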

Performance Optimizations:

Evidence slicer — sends only relevant events to Gemini for efficiency
Token budget management (maxOutputTokens: 4096)
Caching — skips Gemini if trace unchanged
Rate limiting (100/min per IP)
Dead-letter queue for failed jobs
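
The "skip Gemini if the trace is unchanged" optimization can be sketched as a cache keyed by the trace contents. This is a simplified illustration under stated assumptions: an in-memory Map stands in for Redis, the serialized events are used directly as the key instead of a hash, and the ReportCache, TraceEvent, and Report names are hypothetical.

```typescript
// Hypothetical event and report shapes for illustration.
interface TraceEvent {
  type: string;
  payload: unknown;
}

type Report = { summary: string };

class ReportCache {
  private readonly entries = new Map<string, Report>();

  // Uses the serialized events as the key. A real system would more likely
  // store a cryptographic hash of this string to keep keys short.
  // Note: JSON.stringify is sensitive to object key order.
  private static fingerprint(events: TraceEvent[]): string {
    return JSON.stringify(events);
  }

  // Returns the cached report when the trace is unchanged; otherwise runs
  // the (expensive) analysis once and stores its result.
  getOrAnalyze(events: TraceEvent[], analyze: (e: TraceEvent[]) => Report): Report {
    const key = ReportCache.fingerprint(events);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    const report = analyze(events);
    this.entries.set(key, report);
    return report;
  }
}
```

Here the analyze callback stands in for the model call; any new or changed event produces a different key and therefore a fresh analysis.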

Monorepo Structure:

apps/web: Next.js app (API + UI)
apps/worker: BullMQ worker (forensics)
packages/sdk: TypeScript instrumentation SDK
packages/shared: Zod schemas, types, utilities

Challenges we ran into

Structured Output Parsing: Ensuring Gemini returns valid JSON verdicts with all required fields — solved by using Zod schemas for validation
Evidence Linking: Making sure every claim references specific trace events that users can verify — required careful prompt engineering
Causal Graph Generation: Building accurate dependency graphs from unstructured trace data — used Gemini to infer relationships
Performance: Optimizing token usage while maintaining analysis quality — implemented evidence slicer and caching
Real-time Processing: Handling async job processing with BullMQ while keeping UI responsive — used job queuing with status polling
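
One plausible way to implement the evidence slicer mentioned under Performance is to keep only a window of events around the first error. The sketch below shows that idea; the isError flag, the default window sizes, and the fallback to the most recent events are assumptions, since the actual slicing heuristics are not described here.

```typescript
// Hypothetical minimal event shape; only the fields the slicer needs.
interface SliceableEvent {
  stepId: string;
  isError: boolean;
}

// Keeps `before` events ahead of the first error and `after` events behind it.
// Falls back to the last `before + after + 1` events when no error is flagged.
function sliceEvidence<T extends SliceableEvent>(
  events: T[],
  before = 3,
  after = 1,
): T[] {
  const firstError = events.findIndex((e) => e.isError);
  if (firstError === -1) {
    return events.slice(-(before + after + 1));
  }
  const start = Math.max(0, firstError - before);
  const end = Math.min(events.length, firstError + after + 1);
  return events.slice(start, end);
}
```

Because the slice preserves the original stepId values, claims in the verdict can still link back to the full timeline even though the model only saw part of it.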

Accomplishments that we're proud of

Evidence-backed analysis: Every claim in the verdict links to specific trace events, so there is no black-box analysis
Production-ready: Built with rate limiting, retries, caching, metrics, and security features
Multiple failure types: Handles date format errors, memory failures, rate limits, timeouts, authentication issues, and model contradictions
Clean architecture: Monorepo with SDK, web app, and worker — easy to integrate and deploy
Comprehensive documentation: Integration guide, deployment guide, demo script — ready for developers to use
Gemini structured outputs: Successfully leveraged Gemini 3's structured outputs for type-safe analysis

What we learned

Gemini 3's structured outputs are powerful for generating type-safe analysis: the responseSchema feature keeps the JSON responses consistent, and we still validate each verdict against our Zod schemas
Evidence-backed analysis requires careful prompt engineering — we had to explicitly instruct Gemini to reference specific step IDs in all claims
Causal graphs help visualize complex failure chains that text alone can't convey — React Flow made this interactive and intuitive
Building production-ready observability tools requires careful attention to rate limiting, caching, and error handling — not just the core analysis logic
Multi-step reasoning works well for analyzing trace timelines — Gemini can identify patterns across multiple events
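
To show what the rate-limiting concern looks like in code, here is a minimal fixed-window limiter sized to the 100 requests per minute per IP figure. The fixed-window algorithm, the in-memory Map (a shared store such as Redis would be needed across multiple instances), and the class name are all assumptions; which algorithm FaultLine uses is not stated.

```typescript
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  // Defaults match the "100 per minute" figure; both values are configurable.
  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed, false when the key is over limit.
  // `now` is injectable so the behavior can be tested without real waiting.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

An API route would call allow with the client IP as the key and respond with HTTP 429 when it returns false.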

What's next for FaultLine

Scale infrastructure: Migrate to Postgres + S3 for production scale (currently Redis-only)
Export capabilities: PDF export for reports, JSON/CSV downloads
Real-time updates: SSE/WebSocket for live trace updates
Search and filtering: Full-text search across traces, filter by failure type, date range
Team collaboration: Multi-user support, comments on traces, shared dashboards
Advanced analytics: Failure trend analysis, common root causes dashboard
SDK improvements: More language support (Python, Go), better batching, offline mode

Try it out

Live Demo: [Your Vercel URL]
GitHub: https://github.com/ashutosh887/faultline
Documentation: See README.md, INTEGRATION.md, DEPLOYMENT.md

Quick Start:

npm install github:ashutosh887/FaultLine#packages/sdk
const tracer = new Tracer({ ingestUrl: "https://faultline.vercel.app" });
tracer.emit({ type: "user_input", payload: { text: "..." } });

Visit /runs to see 7 demo failure scenarios with full analysis!

Built With

bullmq
gemini3
next.js-16
next.js-api-routes
react-19
reactflow
redis
tailwind-css

Updates

Ashutosh Jha started this project — Feb 09, 2026 02:39 PM EST
