RootCause

Inspiration

Production incidents are chaotic. Logs are noisy, alerts are vague, and engineers are forced to manually scan stack traces while users are impacted. We’ve all experienced the stress of trying to quickly answer: What broke? How bad is it? How do we fix it?

We built RootCause to act like a digital Incident Commander - structured, calm, and analytical. Instead of copying logs into search engines or guessing at solutions, we wanted a system that transforms raw logs into clear, actionable intelligence in seconds.

At its core, incident response is pattern recognition:

$$ \text{Logs} + \text{Context} \rightarrow \text{Root Cause} $$

RootCause automates that reasoning process.

What it does

RootCause is an AI-powered incident analysis platform.

Users paste raw server or application logs into a terminal-style interface. The system:

Analyzes logs using Gemini AI
Identifies the likely root cause
Assigns severity levels
Generates a confidence score
Extracts supporting evidence
Produces step-by-step remediation guidance

Each analysis becomes a structured incident record stored in PostgreSQL. Users can:

Track incident status (open/resolved)
Tag and favorite incidents
Perform bulk actions
Continue AI-guided remediation via chat
Export enhanced, watermarked, searchable PDF reports
Access a developer API with secure API keys

The result is a complete incident lifecycle tool — not just a log analyzer.

How we built it

RootCause is a full-stack TypeScript application.

Frontend

React + TypeScript (Vite)
Tailwind CSS v4 (dark terminal aesthetic)
shadcn/ui components
Framer Motion for transitions
TanStack React Query for server state
Supabase Magic Link authentication

Backend

Express 5 on Node.js
PostgreSQL database
Drizzle ORM for schema management
Zod for shared schema validation
JWT authentication via Supabase

AI Layer

Gemini 2.5 Flash for structured log analysis
Strict prompt engineering to enforce consistent outputs
Fallback regex-based analysis if AI is unavailable

Conceptually:

$$ f(\text{raw logs}) = \{\text{root cause}, \text{severity}, \text{confidence}, \text{fix steps}\} $$
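To make the mapping concrete, here is a minimal sketch of what a regex-based fallback could look like when the AI is unavailable. The `Analysis` shape, the rule list, and the confidence values are illustrative assumptions, not RootCause's actual schema or rules.

```typescript
// Hypothetical result shape; field names are illustrative only.
type Severity = "low" | "medium" | "high" | "critical";

interface Analysis {
  rootCause: string;
  severity: Severity;
  confidence: number; // assumed to be in the range 0..1
  fixSteps: string[];
}

interface Rule {
  pattern: RegExp;
  rootCause: string;
  severity: Severity;
  fixSteps: string[];
}

// Example rules only; a real rule set would be tuned to the target stack.
const RULES: Rule[] = [
  {
    pattern: /ECONNREFUSED|connection refused/i,
    rootCause: "A dependency refused the connection",
    severity: "high",
    fixSteps: [
      "Check that the target service is running",
      "Verify host and port configuration",
    ],
  },
  {
    pattern: /out of memory|heap limit/i,
    rootCause: "The process ran out of memory",
    severity: "critical",
    fixSteps: [
      "Inspect memory usage around the crash",
      "Raise the memory limit or fix the leak",
    ],
  },
];

// Returns the first matching rule's verdict, with a deliberately low
// placeholder confidence because no model reasoning was involved.
function fallbackAnalyze(rawLogs: string): Analysis {
  for (const rule of RULES) {
    if (rule.pattern.test(rawLogs)) {
      return {
        rootCause: rule.rootCause,
        severity: rule.severity,
        confidence: 0.4,
        fixSteps: rule.fixSteps,
      };
    }
  }
  return {
    rootCause: "Unknown",
    severity: "low",
    confidence: 0.1,
    fixSteps: ["Review the logs manually"],
  };
}
```

Under these assumptions, a log containing `connect ECONNREFUSED 127.0.0.1:5432` would match the first rule, while unrecognized input falls through to the generic "Unknown" result.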

Additional Systems

API key generation with SHA-256 hashing
Rate limiting (100 requests/day per key)
Tagging and favorites system
Bulk incident operations
Foxit PDF integration for searchable, watermarked exports
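The per-key daily limit can be illustrated with a fixed-window counter. This is a sketch under stated assumptions: the in-memory `Map`, the `allowRequest` name, and the choice to reset at UTC midnight are hypothetical, and a real deployment would more likely persist counts in the database.

```typescript
const DAILY_LIMIT = 100; // from the write-up: 100 requests/day per key

// Hypothetical in-memory store keyed by an API key identifier.
const usage = new Map<string, { day: string; count: number }>();

// Calendar day in UTC, e.g. "2025-01-31"; the reset boundary is an assumption.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Returns true if the request is allowed and records it; false once the
// key has used up its quota for the current UTC day.
function allowRequest(keyId: string, now: Date = new Date()): boolean {
  const day = utcDay(now);
  const entry = usage.get(keyId);
  if (!entry || entry.day !== day) {
    // First request of a new day starts a fresh window.
    usage.set(keyId, { day, count: 1 });
    return true;
  }
  if (entry.count >= DAILY_LIMIT) return false;
  entry.count += 1;
  return true;
}
```

A fixed window is simple to reason about but allows bursts around the reset boundary; a sliding window would smooth that out at the cost of more bookkeeping.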

Challenges we ran into

1. Making AI output structured and reliable

Raw LLM responses are not production-ready. We had to:

Enforce strict formatting rules
Validate responses using Zod
Reject non-log input
Implement fallback analysis logic
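The validation step can be sketched as follows. To stay dependency-free, this example uses a hand-written type guard as a stand-in for the Zod schema mentioned above; the `Analysis` fields and the `parseModelOutput` name are assumptions for illustration.

```typescript
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical result shape; the project's real schema may differ.
interface Analysis {
  rootCause: string;
  severity: Severity;
  confidence: number;
  fixSteps: string[];
}

// Hand-written guard standing in for a schema library's parse step.
function isAnalysis(value: unknown): value is Analysis {
  if (typeof value !== "object" || value === null) return false;
  const { rootCause, severity, confidence, fixSteps } = value as Record<string, unknown>;
  return (
    typeof rootCause === "string" &&
    typeof severity === "string" &&
    (SEVERITIES as readonly string[]).includes(severity) &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 &&
    Array.isArray(fixSteps) &&
    fixSteps.every((step) => typeof step === "string")
  );
}

// Returns a validated Analysis, or null when the model replied with
// prose, malformed JSON, or JSON that does not match the expected shape.
function parseModelOutput(text: string): Analysis | null {
  try {
    const parsed: unknown = JSON.parse(text);
    return isAnalysis(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON at all
  }
}
```

A `null` result is the natural trigger for the fallback path: the caller can retry the model or drop down to the non-AI analysis instead of storing an unvalidated record.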

2. Handling noisy log input

Logs often contain duplicate traces, irrelevant warnings, and partial failures. Extracting high-signal evidence required careful prompting and structured parsing.
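One way to reduce noise before prompting is to collapse repeated lines and keep only those that look like failures. The sketch below assumes lines start with an ISO-8601-style timestamp and that error lines mention `ERROR`, `FATAL`, or an exception name; both heuristics and all function names are hypothetical.

```typescript
// Strips a leading ISO-8601-like timestamp so repeated messages compare equal.
const TIMESTAMP =
  /^\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\]?\s*/;

interface LogGroup {
  message: string;
  count: number;
}

// Groups identical messages (ignoring timestamps) and counts occurrences.
function collapseDuplicates(rawLogs: string): LogGroup[] {
  const counts = new Map<string, number>();
  for (const line of rawLogs.split(/\r?\n/)) {
    const message = line.replace(TIMESTAMP, "").trim();
    if (message === "") continue; // skip blank lines
    counts.set(message, (counts.get(message) ?? 0) + 1);
  }
  // Map preserves insertion order, so groups appear in first-seen order.
  return Array.from(counts, ([message, count]) => ({ message, count }));
}

// Keeps only groups that look like failures; the pattern is a rough heuristic.
function highSignal(groups: LogGroup[]): LogGroup[] {
  return groups.filter((g) => /\b(?:ERROR|FATAL)\b|exception/i.test(g.message));
}
```

Passing the collapsed groups with their counts to the model, rather than the raw text, keeps the prompt shorter while preserving how often each failure occurred.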

3. Security and API key design

We didn’t want to store raw API keys. We implemented:

SHA-256 hashing before storage
Prefix-only display
One-time visibility
Revocation support
Usage tracking and rate limiting
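The key-handling measures above can be sketched with Node's built-in `crypto` module. The `rc_` prefix, key length, 8-character display prefix, and the `StoredKey` shape are assumptions made for illustration; only the hash and prefix would be persisted, and the raw key is returned once at creation time.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical record shape: what would be persisted instead of the raw key.
interface StoredKey {
  prefix: string; // safe to display in the UI
  hash: string; // hex-encoded SHA-256 of the raw key
  revoked: boolean;
}

function sha256(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Generates a random key. The caller shows rawKey to the user once and
// stores only the returned record.
function generateApiKey(): { rawKey: string; stored: StoredKey } {
  const rawKey = "rc_" + randomBytes(32).toString("hex");
  return {
    rawKey,
    stored: { prefix: rawKey.slice(0, 8), hash: sha256(rawKey), revoked: false },
  };
}

// Hashes the presented key and compares it to the stored hash in constant time.
function verifyApiKey(candidate: string, stored: StoredKey): boolean {
  if (stored.revoked) return false;
  const a = Buffer.from(sha256(candidate), "hex");
  const b = Buffer.from(stored.hash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

A fast hash such as SHA-256 is a reasonable choice here because the keys are long and random; unlike passwords, they do not need a slow, salted hash to resist guessing.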

4. Time pressure

RootCause was built under hackathon constraints. Balancing AI integration, authentication, database persistence, and export features in limited time required strong prioritization.