Inspiration

When AI agents fail, debugging is manual and time-consuming. Engineers spend hours tracing through logs, tool calls, and model outputs to find root causes. There's no automated way to identify the exact point of failure, understand contributing factors, or get actionable fix suggestions.

We built FaultLine to solve this problem. It is an automated root-cause analysis platform that uses Gemini 3 to produce evidence-backed forensic reports for AI agent failures.


What it does

FaultLine captures every step of agent execution (user inputs, tool calls, model outputs, memory operations) and automatically analyzes traces to produce:

  • Evidence-backed root cause analysis: every claim links to specific trace events (clickable in the UI)
  • Causal graph visualization: an interactive React Flow graph showing how the failure propagates
  • Actionable fix suggestions: categorized by prompt, tooling, memory, and orchestration issues
  • Confidence scores: the root cause and contributing factors are ranked by a confidence value between 0 and 1
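
To make the outputs above concrete, here is a minimal sketch of what a trace event and a verdict might look like. Apart from the "user_input" event type and the "Step N" ID format, every name here (TraceEvent, Finding, Verdict, rankFactors, the other event type strings) is an assumption made for illustration, and the sample verdict is made-up data; FaultLine's real schema may differ.

```typescript
// Hypothetical shapes for illustration only; the real schema may differ.
type TraceEventType = "user_input" | "tool_call" | "model_output" | "memory_op";

interface TraceEvent {
  stepId: string; // e.g. "Step 1"
  type: TraceEventType;
  payload: Record<string, unknown>;
}

interface Finding {
  claim: string;
  confidence: number; // between 0 and 1
  evidence: string[]; // step IDs that back the claim
}

interface Verdict {
  rootCause: Finding;
  contributingFactors: Finding[];
}

// Rank contributing factors from most to least confident.
// The input array is copied first so the verdict itself is not mutated.
function rankFactors(verdict: Verdict): Finding[] {
  return [...verdict.contributingFactors].sort((a, b) => b.confidence - a.confidence);
}

// Made-up sample data, loosely modeled on a date format failure.
const exampleVerdict: Verdict = {
  rootCause: {
    claim: "The tool received a date in an unexpected format",
    confidence: 0.9,
    evidence: ["Step 2"],
  },
  contributingFactors: [
    { claim: "The prompt did not specify a date format", confidence: 0.4, evidence: ["Step 1"] },
    { claim: "The tool error was not surfaced to the model", confidence: 0.7, evidence: ["Step 3"] },
  ],
};

const ranked = rankFactors(exampleVerdict);
```

Keeping the evidence as plain step IDs is what would let a UI turn each claim into a link back to the timeline.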

Key Features:

  • Clickable evidence links that scroll to timeline steps
  • Interactive causal graphs highlighting the first divergence point
  • Counterfactual analysis ("If X, then Y")
  • Production-ready: rate limiting, retries, caching, metrics
  • Security: secret/PII redaction, access control
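
As an illustration of the secret/PII redaction listed above, the sketch below walks a JSON-like payload and masks strings that look like email addresses or API keys. The two patterns and the function names (redactText, redactPayload) are assumptions chosen for this example, not FaultLine's actual rules; a real deployment would likely need a much broader pattern set.

```typescript
// Hypothetical patterns for illustration; not an exhaustive or official list.
const REDACTION_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[REDACTED_EMAIL]"],
  [/\b(?:sk|pk|api)[-_][A-Za-z0-9_-]{16,}\b/g, "[REDACTED_KEY]"],
];

// Apply every pattern in turn to a single string.
function redactText(text: string): string {
  return REDACTION_PATTERNS.reduce(
    (acc, [pattern, label]) => acc.replace(pattern, label),
    text,
  );
}

// Recursively redact every string inside a JSON-like value.
// Numbers, booleans, and null pass through unchanged.
function redactPayload(value: unknown): unknown {
  if (typeof value === "string") return redactText(value);
  if (Array.isArray(value)) return value.map(redactPayload);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = redactPayload(source[key]);
    }
    return out;
  }
  return value;
}
```

Running redaction before events are stored, rather than at display time, would keep secrets out of storage and out of the model prompt.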

How we built it

Architecture:

  • Frontend: Next.js 16, React 19, Tailwind CSS, React Flow for causal graphs
  • Backend: Next.js API routes, BullMQ for job queuing
  • Worker: Node.js worker processes traces with Gemini 3
  • Storage: Redis for events, reports, and artifacts
  • SDK: TypeScript instrumentation library for easy integration

Gemini Integration:

  • Uses structured outputs (responseMimeType: "application/json" + responseSchema) for type-safe verdicts
  • Multi-step reasoning to analyze entire trace timelines
  • Evidence linking — references specific step IDs ("Step 1", "Step 2") in all claims
  • Causal graph generation — builds nodes/edges showing dependencies and contradictions
  • Confidence scoring for root cause and contributing factors
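
The structured-output setup described above might look roughly like the sketch below. Only responseMimeType, responseSchema, and maxOutputTokens come from this write-up; the schema fields (rootCause, contributingFactors, claim, confidence, evidence) and the parseVerdict helper are assumptions for illustration. The hand-written type guard stands in for the Zod validation the project uses, so that the example has no dependencies.

```typescript
// Field names inside the schema are hypothetical. The uppercase type names
// follow the Gemini API schema enum.
const findingSchema = {
  type: "OBJECT",
  properties: {
    claim: { type: "STRING" },
    confidence: { type: "NUMBER" },
    evidence: { type: "ARRAY", items: { type: "STRING" } },
  },
  required: ["claim", "confidence", "evidence"],
} as const;

const generationConfig = {
  responseMimeType: "application/json",
  maxOutputTokens: 4096,
  responseSchema: {
    type: "OBJECT",
    properties: {
      rootCause: findingSchema,
      contributingFactors: { type: "ARRAY", items: findingSchema },
    },
    required: ["rootCause", "contributingFactors"],
  },
} as const;

interface Finding {
  claim: string;
  confidence: number;
  evidence: string[];
}

interface Verdict {
  rootCause: Finding;
  contributingFactors: Finding[];
}

// Runtime check that a parsed value has the Finding shape.
function isFinding(v: unknown): v is Finding {
  if (typeof v !== "object" || v === null) return false;
  const f = v as Record<string, unknown>;
  const confidence = f["confidence"];
  const evidence = f["evidence"];
  return (
    typeof f["claim"] === "string" &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 &&
    Array.isArray(evidence) &&
    evidence.every((e) => typeof e === "string")
  );
}

// Parse the model's response text; return null instead of throwing on bad output.
function parseVerdict(responseText: string): Verdict | null {
  let data: unknown;
  try {
    data = JSON.parse(responseText);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const root = d["rootCause"];
  const factors = d["contributingFactors"];
  if (!isFinding(root) || !Array.isArray(factors) || !factors.every(isFinding)) return null;
  return { rootCause: root, contributingFactors: factors };
}
```

Validating after parsing still matters even with a response schema, because the schema shapes the JSON but cannot enforce rules such as the 0 to 1 confidence range.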

Performance Optimizations:

  • Evidence slicer — sends only relevant events to Gemini for efficiency
  • Token budget management (maxOutputTokens: 4096)
  • Caching — skips Gemini if trace unchanged
  • Rate limiting (100/min per IP)
  • Dead-letter queue for failed jobs
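
The evidence slicer and cache above could be sketched as follows. The relevance rule (keep each error event plus a small window of events before it), the isError flag, and all function names are assumptions made for this example; the write-up does not say how FaultLine decides which events are relevant. The FNV-1a fingerprint is a dependency-free stand-in for a proper cryptographic hash, and the analysis call is synchronous here only to keep the sketch short.

```typescript
interface TraceEvent {
  stepId: string;
  type: string;
  payload: Record<string, unknown>;
  isError?: boolean; // hypothetical flag marking a failing step
}

// Assumed relevance rule: keep each error event and the `window` events before it.
function sliceEvidence(events: TraceEvent[], window = 2): TraceEvent[] {
  const keep = new Set<number>();
  events.forEach((event, i) => {
    if (!event.isError) return;
    for (let j = Math.max(0, i - window); j <= i; j++) keep.add(j);
  });
  return events.filter((_, i) => keep.has(i));
}

// 32-bit FNV-1a over the serialized trace. Not collision-resistant;
// a production cache key would more likely use SHA-256.
function fingerprint(events: TraceEvent[]): string {
  const text = JSON.stringify(events);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// In-memory stand-in for the Redis-backed report cache.
const reportCache = new Map<string, string>();

// Skip the (expensive) analysis when the same trace has been seen before.
function analyzeWithCache(
  events: TraceEvent[],
  analyze: (slice: TraceEvent[]) => string,
): string {
  const key = fingerprint(events);
  const cached = reportCache.get(key);
  if (cached !== undefined) return cached;
  const report = analyze(sliceEvidence(events));
  reportCache.set(key, report);
  return report;
}
```

Fingerprinting the full trace rather than the slice means any change to the trace, even outside the window, invalidates the cached report.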

Monorepo Structure:

  • apps/web: Next.js app (API + UI)
  • apps/worker: BullMQ worker (forensics)
  • packages/sdk: TypeScript instrumentation SDK
  • packages/shared: Zod schemas, types, utilities

Challenges we ran into

  1. Structured Output Parsing: Ensuring Gemini returns valid JSON verdicts with all required fields — solved by using Zod schemas for validation
  2. Evidence Linking: Making sure every claim references specific trace events that users can verify — required careful prompt engineering
  3. Causal Graph Generation: Building accurate dependency graphs from unstructured trace data — used Gemini to infer relationships
  4. Performance: Optimizing token usage while maintaining analysis quality — implemented evidence slicer and caching
  5. Real-time Processing: Handling async job processing with BullMQ while keeping UI responsive — used job queuing with status polling
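
Prompting alone cannot guarantee the evidence linking described in item 2, so a post-check is one way to back it up. The sketch below flags claims that cite no evidence or cite step IDs missing from the trace. The names (Finding, EvidenceProblem, checkEvidence) are hypothetical, and whether FaultLine runs a check like this is an assumption.

```typescript
interface Finding {
  claim: string;
  evidence: string[]; // step IDs such as "Step 1"
}

interface EvidenceProblem {
  claim: string;
  reason: string;
}

// Report every claim that has no evidence or cites a step not present in the trace.
function checkEvidence(findings: Finding[], knownStepIds: string[]): EvidenceProblem[] {
  const known = new Set(knownStepIds);
  const problems: EvidenceProblem[] = [];
  for (const finding of findings) {
    if (finding.evidence.length === 0) {
      problems.push({ claim: finding.claim, reason: "no evidence cited" });
      continue;
    }
    for (const stepId of finding.evidence) {
      if (!known.has(stepId)) {
        problems.push({ claim: finding.claim, reason: "unknown step: " + stepId });
      }
    }
  }
  return problems;
}
```

A non-empty result could be used to reject the verdict or to retry the model call with the problems appended to the prompt.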

Accomplishments that we're proud of

  • Evidence-backed analysis: Every claim in the verdict links to specific trace events, so the analysis is never a black box
  • Production-ready: Built with rate limiting, retries, caching, metrics, and security features
  • Multiple failure types: Handles date format errors, memory failures, rate limits, timeouts, authentication issues, and model contradictions
  • Clean architecture: Monorepo with SDK, web app, and worker — easy to integrate and deploy
  • Comprehensive documentation: Integration guide, deployment guide, demo script — ready for developers to use
  • Gemini structured outputs: Successfully leveraged Gemini 3's structured outputs for type-safe analysis

What we learned

  • Gemini 3's structured outputs are powerful for generating type-safe analysis: the responseSchema feature constrains responses to a consistent JSON shape, which we still validate with Zod schemas
  • Evidence-backed analysis requires careful prompt engineering — we had to explicitly instruct Gemini to reference specific step IDs in all claims
  • Causal graphs help visualize complex failure chains that text alone can't convey — React Flow made this interactive and intuitive
  • Building production-ready observability tools requires careful attention to rate limiting, caching, and error handling — not just the core analysis logic
  • Multi-step reasoning works well for analyzing trace timelines — Gemini can identify patterns across multiple events
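
The rate limiting called out above (100 requests per minute per IP in the current setup) can be illustrated with a fixed-window counter. This is a minimal in-memory sketch, not FaultLine's implementation: the class name and the fixed-window strategy are assumptions, and a multi-instance deployment would keep the counters in shared storage such as Redis.

```typescript
// Fixed-window limiter: each IP gets `limit` requests per `windowMs` window.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(ip: string, now: number = Date.now()): boolean {
    const current = this.windows.get(ip);
    // Start a fresh window for a new IP or once the previous window has expired.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is simple but allows short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more bookkeeping.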

What's next for FaultLine

  • Scale infrastructure: Migrate to Postgres + S3 for production scale (currently Redis-only)
  • Export capabilities: PDF export for reports, JSON/CSV downloads
  • Real-time updates: SSE/WebSocket for live trace updates
  • Search and filtering: Full-text search across traces, filter by failure type, date range
  • Team collaboration: Multi-user support, comments on traces, shared dashboards
  • Advanced analytics: Failure trend analysis, common root causes dashboard
  • SDK improvements: More language support (Python, Go), better batching, offline mode

Try it out

Quick Start:

Install the SDK:

npm install github:ashutosh887/FaultLine#packages/sdk

Then emit events from your agent:

const tracer = new Tracer({ ingestUrl: "https://faultline.vercel.app" });
tracer.emit({ type: "user_input", payload: { text: "..." } });

Visit /runs to see 7 demo failure scenarios with full analysis!

Built With

  • bullmq
  • gemini3
  • next.js-16
  • next.js-api-routes
  • react-19
  • reactflow
  • redis
  • tailwind-css