Inspiration

Ops and support teams struggle with fragmented signals (support tickets, error logs, uptime pings) and have no unified incident view. Fireline aggregates these signals into incidents, auto-triages them, and generates communications.

What it does

Fireline is a real-time incident room that:

  • Ingests heterogeneous support events via Redpanda (Kafka-compatible streaming)
  • Aggregates events into incidents grouped by service and environment
  • Uses LLM-based triage to assign severity (SEV1/SEV2/SEV3), owner, and summaries
  • Generates internal and external status updates
  • Sends Slack notifications to assigned engineers and escalates incidents rated SEV2 or higher
  • Provides a real-time web UI showing incidents and status updates
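
The aggregation step above (events grouped into incidents by service and environment) can be sketched roughly as follows. This is a minimal in-memory illustration, not Fireline's actual Akka components: the SupportEvent and IncidentKey records, their field names, and the IncidentAggregator class are all hypothetical, since the write-up does not show the real event schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event shape; the real Fireline schema is not shown in this write-up.
record SupportEvent(String id, String service, String environment, String kind) {}

// Grouping key: one incident per (service, environment) pair.
record IncidentKey(String service, String environment) {}

final class IncidentAggregator {
    // Insertion-ordered so incidents are listed in the order they were first seen.
    private final Map<IncidentKey, List<SupportEvent>> incidents = new LinkedHashMap<>();

    // Adds the event to the incident for its service/environment, creating the incident if needed.
    IncidentKey ingest(SupportEvent event) {
        IncidentKey key = new IncidentKey(event.service(), event.environment());
        incidents.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
        return key;
    }

    // Returns an immutable snapshot of the events attached to one incident.
    List<SupportEvent> eventsFor(IncidentKey key) {
        return List.copyOf(incidents.getOrDefault(key, List.of()));
    }

    int incidentCount() {
        return incidents.size();
    }
}
```

With this sketch, a ticket and an error log for the same service in the same environment land in one incident, while the same service in a different environment opens a second one.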

How we built it

  • Backend: Java 21 + Akka SDK (Agents, Consumers, KeyValueEntities, Views, HTTP Endpoints)
  • Streaming: Redpanda for event ingestion
  • LLM: OpenAI API for triage and communications generation
  • Frontend: Next.js + TypeScript for the incident room UI
  • Observability: Distributed tracing with Jaeger (OpenTelemetry)
  • Notifications: Slack integration for real-time alerts

Challenges we ran into

  • Tracing setup: Switched from Akka Console to Jaeger for reliable trace visualization
  • Prompt injection: Implemented input validation and structured JSON parsing to mitigate LLM prompt injection
  • Distributed testing: Used Akka SDK TestKit patterns for testing event-driven components
  • Component orchestration: Coordinated Agents, Entities, and Consumers via ComponentClient
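
The structured-output side of the prompt injection mitigation could look something like the sketch below. It assumes the LLM's JSON reply has already been parsed into a flat map of string fields, and it rejects anything outside a fixed severity set or a known owner list. The field names, the 500-character summary cap, and the TriageValidator class are illustrative assumptions, not taken from the Fireline code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Validated triage result; field names are illustrative.
record TriageResult(String severity, String owner, String summary) {}

final class TriageValidator {
    private static final Set<String> SEVERITIES = Set.of("SEV1", "SEV2", "SEV3");
    private static final int MAX_SUMMARY_LENGTH = 500; // arbitrary cap chosen for this sketch

    private final Set<String> knownOwners;

    TriageValidator(Set<String> knownOwners) {
        this.knownOwners = Set.copyOf(knownOwners);
    }

    // Takes already-parsed JSON fields and returns empty if any field is missing or unexpected.
    Optional<TriageResult> validate(Map<String, String> fields) {
        String severity = fields.get("severity");
        String owner = fields.get("owner");
        String summary = fields.get("summary");
        // Null checks come first because Set.of(...).contains(null) throws.
        if (severity == null || !SEVERITIES.contains(severity)) return Optional.empty();
        if (owner == null || !knownOwners.contains(owner)) return Optional.empty();
        if (summary == null || summary.isBlank() || summary.length() > MAX_SUMMARY_LENGTH) {
            return Optional.empty();
        }
        return Optional.of(new TriageResult(severity, owner, summary.strip()));
    }
}
```

Treating the model's reply as untrusted input and accepting only an allow-listed shape means an injected instruction in a support ticket can, at worst, produce a rejected triage rather than an arbitrary owner or severity.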

Accomplishments that we're proud of

  • End-to-end pipeline: Events → incidents → triage → comms → UI, all working together
  • Distributed tracing: Full visibility into the incident processing pipeline with Jaeger
  • Comprehensive testing: Unit, integration, and agent tests with 90%+ coverage
  • Slack integration: Real-time notifications with team-based routing and escalation
  • Production-ready architecture: Durable state with KeyValueEntities, proper error handling, and observability
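
As a rough illustration of the team-based routing and escalation mentioned above, the sketch below picks Slack channels for an incident. It assumes that escalation applies to SEV1 and SEV2 and that unknown teams fall back to a general channel; the channel names and the NotificationRouter class are hypothetical, and no Slack API calls are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Declared from most to least severe, so SEV1 compares lower than SEV2.
enum Severity { SEV1, SEV2, SEV3 }

final class NotificationRouter {
    private final Map<String, String> teamChannels; // owner team -> Slack channel
    private final String fallbackChannel;
    private final String escalationChannel;

    NotificationRouter(Map<String, String> teamChannels, String fallbackChannel, String escalationChannel) {
        this.teamChannels = Map.copyOf(teamChannels);
        this.fallbackChannel = fallbackChannel;
        this.escalationChannel = escalationChannel;
    }

    // Returns the channels to notify: the owner's channel, plus escalation for SEV1 and SEV2.
    List<String> channelsFor(String ownerTeam, Severity severity) {
        List<String> targets = new ArrayList<>();
        targets.add(teamChannels.getOrDefault(ownerTeam, fallbackChannel));
        if (severity.compareTo(Severity.SEV2) <= 0) {
            targets.add(escalationChannel);
        }
        return List.copyOf(targets);
    }
}
```

Keeping the routing decision as a pure function, separate from the code that actually posts to Slack, makes it straightforward to unit test without a network connection.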

What we learned

  • Akka SDK patterns: Agents, KeyValueEntities, Views, and component communication
  • Distributed tracing: OpenTelemetry integration and trace visualization with Jaeger
  • LLM safety: Prompt injection mitigation and structured output validation
  • Event-driven architecture: Building resilient systems with Kafka-compatible streaming

What's next for Fireline

  • More event types: Integrate monitoring systems (Datadog, New Relic) and chat platforms
  • Richer UI: Real-time updates via WebSockets, incident timeline visualization, and status page integration
  • Production hardening: Rate limiting, authentication, and multi-tenant support
  • Advanced triage: ML-based incident correlation and automated remediation suggestions

Built With

  • akka-sdk-3.5.9
  • docker
  • jaeger
  • java-21
  • junit
  • maven
  • next.js-16
  • openai-gpt-4o-mini
  • python
  • react-19
  • redpanda
  • slack-api
  • tailwind-css-4
  • typescript-5

Updates

Post-Mortem: Tracing Implementation

What Broke

During development, we initially attempted to use the Akka Console for distributed tracing visualization. The Akka SDK provides built-in OpenTelemetry support, and we configured tracing export to what we thought was the Akka Console endpoint.

Issue: The Akka Console tracing endpoint couldn't be reached or wasn't properly configured for our local development setup. Traces were being generated but weren't visible in any UI, making it difficult to debug the incident processing pipeline.

Symptoms:

  • Tracing code executed without errors
  • Spans were created (verified via logs)
  • No traces appeared in Akka Console UI
  • Connection errors when trying to export traces to Akka Console endpoint

How We Fixed It

We switched to Jaeger, a well-established, production-ready distributed tracing system, for trace visualization.

Solution:

  1. Added Jaeger to docker-compose.yml as a service
  2. Configured OpenTelemetry export to Jaeger's OTLP gRPC endpoint (http://localhost:4317)
  3. Updated backend/run.sh to set tracing environment variables pointing to Jaeger
  4. Verified traces appear correctly in Jaeger UI at http://localhost:16686
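
Part of the verification in the steps above can be automated with a small reachability probe, which might have caught the original export failure sooner. The sketch below only checks that a TCP port accepts connections; it does not confirm that traces are actually stored or displayed. The TracingSmokeCheck class and its method name are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class TracingSmokeCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host all count as unreachable.
            return false;
        }
    }
}
```

Calling `TracingSmokeCheck.isReachable("localhost", 4317, 500)` and `TracingSmokeCheck.isReachable("localhost", 16686, 500)` after `docker compose up` would cover the OTLP endpoint and the Jaeger UI from the steps above.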

Why This Worked:

  • Jaeger is a standard OpenTelemetry-compatible backend
  • Easy to run locally via Docker
  • Provides excellent trace visualization and querying
  • Better suited for demonstrating observability to judges

Lessons Learned

  1. Use battle-tested tools for demos: Jaeger is more reliable and better documented than a custom-configured tracing endpoint
  2. Test observability early: We should have verified trace visibility sooner in development
  3. Docker Compose simplifies setup: Having Jaeger in docker-compose makes it easy to start/stop and share with the team

Current State

✅ Traces are working correctly in Jaeger
✅ End-to-end traces show: process-support-event → ingest-event → triage-incident → generate-comms
✅ Span attributes include: event.id, incident.id, severity, owner
✅ Error spans are properly recorded with exception details

The tracing implementation now provides excellent observability for debugging and demonstrating the incident processing pipeline.
