Inspiration
When AI agents fail, debugging is manual and time-consuming. Engineers spend hours tracing through logs, tool calls, and model outputs to find root causes. There's no automated way to identify the exact point of failure, understand contributing factors, or get actionable fix suggestions.
We built FaultLine to solve this problem — an automated root-cause analysis platform that uses Gemini 3 to produce evidence-backed forensic reports for AI agent failures.
What it does
FaultLine captures every step of agent execution (user inputs, tool calls, model outputs, memory operations) and automatically analyzes traces to produce:
- Evidence-backed root cause analysis — Every claim links to specific trace events (clickable in UI)
- Causal graph visualization — Interactive React Flow graph showing failure propagation
- Actionable fix suggestions — Categorized by prompt, tooling, memory, and orchestration issues
- Confidence scores — Root cause and contributing factors ranked with confidence (0-1)
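The verdict guarantees above can be made concrete. This is a minimal sketch of a verdict shape and the checks those guarantees imply: every claim cites trace steps, every cited step exists, and confidence stays within 0 to 1. The field names (rootCause, contributingFactors, evidence, confidence) and the helpers checkVerdict and rankFactors are hypothetical. FaultLine keeps its Zod schemas in packages/shared, and this dependency-free version only illustrates the idea.

```typescript
// Hypothetical verdict shape; FaultLine's real Zod schema may use different names.
interface Claim {
  text: string;
  evidence: string[]; // trace step IDs such as "Step 2"
  confidence: number; // expected to lie in [0, 1]
}

interface Verdict {
  rootCause: Claim;
  contributingFactors: Claim[];
}

// Returns human-readable problems; an empty array means every check passed.
function checkVerdict(verdict: Verdict, knownStepIds: Set<string>): string[] {
  const problems: string[] = [];
  const claims = [verdict.rootCause, ...verdict.contributingFactors];
  claims.forEach((claim, i) => {
    // Written as a negated range check so NaN is rejected too.
    if (!(claim.confidence >= 0 && claim.confidence <= 1)) {
      problems.push(`claim ${i}: confidence ${claim.confidence} is outside [0, 1]`);
    }
    if (claim.evidence.length === 0) {
      problems.push(`claim ${i}: cites no evidence`);
    }
    for (const id of claim.evidence) {
      if (!knownStepIds.has(id)) {
        problems.push(`claim ${i}: cites unknown step "${id}"`);
      }
    }
  });
  return problems;
}

// Orders contributing factors from most to least confident without mutating the verdict.
function rankFactors(verdict: Verdict): Claim[] {
  return [...verdict.contributingFactors].sort((a, b) => b.confidence - a.confidence);
}
```

Rejecting a verdict that cites a step absent from the trace is what keeps the clickable evidence links from pointing at nothing.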
Key Features:
- Clickable evidence links that scroll to timeline steps
- Interactive causal graphs highlighting first divergence point
- Counterfactual analysis ("If X, then Y")
- Production-ready: rate limiting, retries, caching, metrics
- Security: secret/PII redaction, access control
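Secret and PII redaction before storage might look like the sketch below. It assumes redaction walks each event payload and masks both sensitive keys and sensitive-looking string values. The patterns (email addresses, an "sk-" style API key prefix, bearer tokens) and the key list are illustrative assumptions, not FaultLine's actual rules.

```typescript
// Illustrative patterns only; real redaction needs a broader, tested rule set.
const VALUE_RULES: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[REDACTED_EMAIL]"],
  [/\bsk-[A-Za-z0-9]{16,}/g, "[REDACTED_KEY]"],
  [/(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/gi, "$1[REDACTED_TOKEN]"],
];

// Keys whose values are masked wholesale, compared case-insensitively.
const SENSITIVE_KEYS = new Set(["password", "apikey", "api_key", "authorization", "token", "secret"]);

function redactString(s: string): string {
  return VALUE_RULES.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), s);
}

// Recursively copies a payload, masking sensitive keys and sensitive-looking strings.
function redact(value: unknown): unknown {
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

Redacting at ingest, before events reach Redis or Gemini, means neither storage nor the model ever sees the raw values.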
How we built it
Architecture:
- Frontend: Next.js 16, React 19, Tailwind CSS, React Flow for causal graphs
- Backend: Next.js API routes, BullMQ for job queuing
- Worker: Node.js process that analyzes traces with Gemini 3
- Storage: Redis for events, reports, and artifacts
- SDK: TypeScript instrumentation library for easy integration
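The SDK turns agent activity into ordered events. This sketch assumes an event envelope with a run ID, a sequential "Step N" ID (the form Gemini cites as evidence), a type, a timestamp, and a payload, and it buffers events into batches before sending them. RecordingTracer, the envelope fields, and every event type except user_input (which appears in the Quick Start) are hypothetical; the real Tracer API is described in INTEGRATION.md.

```typescript
// Hypothetical event types; only "user_input" appears in FaultLine's Quick Start.
type EventType = "user_input" | "tool_call" | "model_output" | "memory_op";

interface TraceEvent {
  runId: string;
  stepId: string; // "Step 1", "Step 2", ... the IDs a verdict cites as evidence
  type: EventType;
  timestamp: number; // milliseconds since the Unix epoch
  payload: Record<string, unknown>;
}

// Hypothetical recorder, not the SDK's Tracer: numbers events and sends them in batches.
class RecordingTracer {
  private step = 0;
  private buffer: TraceEvent[] = [];
  private readonly runId: string;
  private readonly send: (batch: TraceEvent[]) => void;
  private readonly batchSize: number;

  constructor(runId: string, send: (batch: TraceEvent[]) => void, batchSize = 20) {
    this.runId = runId;
    this.send = send; // e.g. an HTTP POST to the ingest endpoint
    this.batchSize = batchSize;
  }

  emit(event: { type: EventType; payload: Record<string, unknown> }): TraceEvent {
    this.step += 1;
    const recorded: TraceEvent = {
      runId: this.runId,
      stepId: `Step ${this.step}`,
      type: event.type,
      timestamp: Date.now(),
      payload: event.payload,
    };
    this.buffer.push(recorded);
    if (this.buffer.length >= this.batchSize) this.flush();
    return recorded;
  }

  // Sends whatever is buffered; call it when a run ends so no events are lost.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```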
Gemini Integration:
- Uses structured outputs (responseMimeType: "application/json" + responseSchema) for type-safe verdicts
- Multi-step reasoning to analyze entire trace timelines
- Evidence linking — references specific step IDs ("Step 1", "Step 2") in all claims
- Causal graph generation — builds nodes/edges showing dependencies and contradictions
- Confidence scoring for root cause and contributing factors
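With the Gemini API, structured output is requested through generationConfig. This sketch builds a request body for the REST endpoint https://generativelanguage.googleapis.com/v1beta/models/MODEL:generateContent (MODEL is a placeholder). It pairs responseMimeType and responseSchema with the 4096-token output cap and adds a system instruction asking for step-ID citations. The schema fields and prompt wording are assumptions, not FaultLine's actual prompt.

```typescript
// Schema types use the uppercase names the Gemini API expects ("OBJECT", "STRING", ...).
// Field names are assumptions for illustration, not FaultLine's actual schema.
const claimSchema = {
  type: "OBJECT",
  properties: {
    text: { type: "STRING" },
    evidence: { type: "ARRAY", items: { type: "STRING" } },
    confidence: { type: "NUMBER" },
  },
  required: ["text", "evidence", "confidence"],
};

const verdictSchema = {
  type: "OBJECT",
  properties: {
    rootCause: claimSchema,
    contributingFactors: { type: "ARRAY", items: claimSchema },
  },
  required: ["rootCause", "contributingFactors"],
};

// Builds the JSON body for a generateContent call that must return a verdict object.
function buildForensicsRequest(traceText: string) {
  return {
    systemInstruction: {
      parts: [{ text: 'Cite trace step IDs (e.g. "Step 2") as evidence for every claim.' }],
    },
    contents: [{ role: "user", parts: [{ text: `Analyze this agent trace:\n${traceText}` }] }],
    generationConfig: {
      responseMimeType: "application/json",
      responseSchema: verdictSchema,
      maxOutputTokens: 4096,
    },
  };
}
```

The request is authenticated with an API key (for example in the x-goog-api-key header). The JSON verdict arrives as text in candidates[0].content.parts[0].text and is still worth validating before use.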
Performance Optimizations:
- Evidence slicer — sends only relevant events to Gemini for efficiency
- Token budget management (maxOutputTokens: 4096)
- Caching — skips Gemini if trace unchanged
- Rate limiting (100/min per IP)
- Dead-letter queue for failed jobs
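The evidence slicer and trace cache fit naturally together. This sketch assumes events carry an error flag and keeps the first event plus a fixed window of events before each error. It keys the cache on a hash of the slice actually sent to Gemini, so an unchanged trace hits the cache and skips the model call. The window heuristic, the error field, and the helper names are hypothetical. FNV-1a only keeps the block dependency-free; a cryptographic hash such as SHA-256 would be the more usual choice.

```typescript
interface SliceEvent {
  stepId: string;
  type: string;
  error?: boolean; // hypothetical flag set when a step failed
  payload: unknown;
}

// Keeps the first event (usually the user's request), each error, and the windowSize events before it.
function sliceEvidence(events: SliceEvent[], windowSize = 3): SliceEvent[] {
  const keep = new Set<number>();
  if (events.length > 0) keep.add(0);
  events.forEach((event, i) => {
    if (!event.error) return;
    for (let j = Math.max(0, i - windowSize); j <= i; j++) keep.add(j);
  });
  return events.filter((_, i) => keep.has(i));
}

// 32-bit FNV-1a over the UTF-16 code units of the serialized slice.
function cacheKey(events: SliceEvent[]): string {
  const text = JSON.stringify(events);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Calls analyze (e.g. the Gemini request) only when this slice has not been analyzed before.
async function analyzeCached(
  events: SliceEvent[],
  cache: Map<string, string>,
  analyze: (slice: SliceEvent[]) => Promise<string>,
): Promise<string> {
  const slice = sliceEvidence(events);
  const key = cacheKey(slice);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const report = await analyze(slice);
  cache.set(key, report);
  return report;
}
```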
Monorepo Structure:
- apps/web: Next.js app (API + UI)
- apps/worker: BullMQ worker (forensics)
- packages/sdk: TypeScript instrumentation SDK
- packages/shared: Zod schemas, types, utilities
Challenges we ran into
- Structured Output Parsing: Ensuring Gemini returns valid JSON verdicts with all required fields — solved by using Zod schemas for validation
- Evidence Linking: Making sure every claim references specific trace events that users can verify — required careful prompt engineering
- Causal Graph Generation: Building accurate dependency graphs from unstructured trace data — used Gemini to infer relationships
- Performance: Optimizing token usage while maintaining analysis quality — implemented evidence slicer and caching
- Real-time Processing: Keeping the UI responsive while traces are analyzed asynchronously; BullMQ runs the analysis jobs in the worker and the UI polls job status
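Client-side status polling can be a short loop. This sketch assumes a hypothetical getState function (for example, a fetch to a job-status API route) that resolves to one of the BullMQ job states listed. The interval and attempt limit are illustrative defaults.

```typescript
// A subset of BullMQ job states; anything other than completed/failed counts as pending.
type JobState = "waiting" | "active" | "delayed" | "completed" | "failed";

// Polls until the job finishes or the attempt budget runs out.
async function pollJob(
  getState: () => Promise<JobState>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobState> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await getState();
    if (state === "completed" || state === "failed") return state;
    // Skip the final sleep so a timeout is reported as soon as the last poll returns.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`job still pending after ${maxAttempts} attempts`);
}
```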
Accomplishments that we're proud of
- Evidence-backed analysis: Every claim in the verdict links to specific trace events — no black box analysis
- Production-ready: Built with rate limiting, retries, caching, metrics, and security features
- Multiple failure types: Handles date format errors, memory failures, rate limits, timeouts, authentication issues, and model contradictions
- Clean architecture: Monorepo with SDK, web app, and worker — easy to integrate and deploy
- Comprehensive documentation: Integration guide, deployment guide, demo script — ready for developers to use
- Gemini structured outputs: Successfully leveraged Gemini 3's structured outputs for type-safe analysis
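A fixed-window counter is one simple way to enforce the 100-requests-per-minute-per-IP limit. The in-memory sketch below illustrates that technique and is not FaultLine's code. With several server instances the counters would need shared storage such as Redis (for example, INCR plus EXPIRE on a per-IP key). Fixed windows can admit up to twice the limit across a window boundary; a sliding window smooths that out.

```typescript
// Hypothetical in-memory fixed-window limiter keyed by client IP.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request from ip at time now (ms) is allowed.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(ip);
    // First request, or the previous window has expired: start a new window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```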
What we learned
- Gemini 3's structured outputs are powerful for generating type-safe analysis: the responseSchema feature ensures consistent JSON responses
- Evidence-backed analysis requires careful prompt engineering: we had to explicitly instruct Gemini to reference specific step IDs in all claims
- Causal graphs help visualize complex failure chains that text alone can't convey — React Flow made this interactive and intuitive
- Building production-ready observability tools requires careful attention to rate limiting, caching, and error handling — not just the core analysis logic
- Multi-step reasoning works well for analyzing trace timelines — Gemini can identify patterns across multiple events
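Gemini's causal graph still has to be laid out for React Flow, which renders nodes with an id, a position, and a data object, and edges with an id, source, and target. This sketch assumes the model returns nodes tagged with step IDs and edges labeled "causes" or "contradicts". Each node goes in a column set by its longest causal chain, and a hypothetical first-divergence node gets a red outline. The input shape, toFlow, and the styling are illustrative; the local FlowNode and FlowEdge types mirror fields React Flow reads rather than importing them.

```typescript
// Assumed shape of the model's causal graph; names are illustrative.
interface CausalNode { id: string; stepId: string; label: string; }
interface CausalEdge { from: string; to: string; kind: "causes" | "contradicts"; }

// Local mirrors of the fields React Flow reads from nodes and edges.
interface FlowNode {
  id: string;
  position: { x: number; y: number };
  data: { label: string };
  style?: Record<string, string>;
}
interface FlowEdge { id: string; source: string; target: string; label?: string; animated?: boolean; }

function toFlow(
  nodes: CausalNode[],
  edges: CausalEdge[],
  divergenceId?: string,
): { nodes: FlowNode[]; edges: FlowEdge[] } {
  // Column = length of the longest causal chain reaching the node (assumes no cycles).
  const depth = new Map<string, number>(nodes.map((n): [string, number] => [n.id, 0]));
  for (let pass = 0; pass < nodes.length; pass++) {
    for (const e of edges) {
      const candidate = (depth.get(e.from) ?? 0) + 1;
      if (candidate > (depth.get(e.to) ?? 0)) depth.set(e.to, candidate);
    }
  }

  const rowsUsed = new Map<number, number>();
  const flowNodes = nodes.map((n): FlowNode => {
    const column = depth.get(n.id) ?? 0;
    const row = rowsUsed.get(column) ?? 0;
    rowsUsed.set(column, row + 1);
    const node: FlowNode = {
      id: n.id,
      position: { x: column * 250, y: row * 100 },
      data: { label: `${n.stepId}: ${n.label}` },
    };
    // Outline the first divergence point so it stands out in the graph.
    if (n.id === divergenceId) node.style = { border: "2px solid red" };
    return node;
  });

  const flowEdges = edges.map((e, i): FlowEdge => ({
    id: `e${i}-${e.from}-${e.to}`,
    source: e.from,
    target: e.to,
    label: e.kind,
    animated: e.kind === "contradicts",
  }));

  return { nodes: flowNodes, edges: flowEdges };
}
```

The returned arrays can be passed to a ReactFlow component's nodes and edges props.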
What's next for FaultLine
- Scale infrastructure: Migrate to Postgres + S3 for production scale (currently Redis-only)
- Export capabilities: PDF export for reports, JSON/CSV downloads
- Real-time updates: SSE/WebSocket for live trace updates
- Search and filtering: Full-text search across traces, filter by failure type, date range
- Team collaboration: Multi-user support, comments on traces, shared dashboards
- Advanced analytics: Failure trend analysis, common root causes dashboard
- SDK improvements: More language support (Python, Go), better batching, offline mode
Try it out
- Live Demo: [Your Vercel URL]
- GitHub: https://github.com/ashutosh887/faultline
- Documentation: See README.md, INTEGRATION.md, DEPLOYMENT.md
Quick Start:
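```bash
npm install github:ashutosh887/FaultLine#packages/sdk
```

```typescript
// Import Tracer from the SDK installed above (package name defined in packages/sdk).
const tracer = new Tracer({ ingestUrl: "https://faultline.vercel.app" });
tracer.emit({ type: "user_input", payload: { text: "..." } });
```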
Visit /runs to see 7 demo failure scenarios with full analysis!
Built With
- bullmq
- gemini3
- next.js-16
- next.js-api-routes
- react-19
- reactflow
- redis
- tailwind-css