SignalSage

Inspiration

We were motivated by bragging rights, the chance of prize money, education, experience, the pursuit of knowledge, the opportunity to refine our skills and discover new ones, and the chance to contribute to the world of networking and cybersecurity.

What It Does

SignalSage is an AI-powered incident investigation copilot that connects to your live Splunk instance and automates the entire workflow from detection to resolution.

When an incident occurs, you simply point SignalSage at a service and time window. It then:

Automatically generates and executes 12 targeted SPL queries in parallel
Collects evidence across:
- Logs
- Metrics
- Traces
- Deployment events
Normalizes all evidence into a unified timeline

It then performs ML-powered analysis using Splunk's Machine Learning Toolkit:

Anomaly detection (z-score)
Log clustering
Cross-signal correlation
Latency distribution analysis
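
To make the z-score check above concrete, here is a minimal sketch in plain TypeScript. It is not the MLTK implementation that SignalSage calls; the function name `zScoreAnomalies` and the default threshold of 3 are assumptions chosen for illustration.

```typescript
// Hypothetical helper: flags points whose z-score exceeds a threshold.
// Illustrative only; SignalSage delegates this analysis to Splunk MLTK.
interface Anomaly {
  index: number;
  value: number;
  zScore: number;
}

function zScoreAnomalies(values: number[], threshold = 3): Anomaly[] {
  if (values.length < 2) return [];
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  // Population standard deviation of the window.
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const stdDev = Math.sqrt(variance);
  if (stdDev === 0) return []; // flat series: nothing stands out

  const anomalies: Anomaly[] = [];
  values.forEach((value, index) => {
    const zScore = (value - mean) / stdDev;
    if (Math.abs(zScore) > threshold) {
      anomalies.push({ index, value, zScore });
    }
  });
  return anomalies;
}
```

With a population standard deviation, a single outlier among n points can reach a z-score of at most sqrt(n - 1), so short windows need a lower threshold than long ones.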

Output

SignalSage produces:

Ranked root cause hypotheses with confidence scores
Prioritized remediation playbooks, including:
- Risk levels
- Estimated resolution times
- Human approval gates for high-risk actions
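
One way to picture the approval gate is as a filter applied while ordering playbook steps. The sketch below is an assumption about how such a gate could work, not SignalSage's actual code; `PlaybookStep`, `planExecution`, and the three risk levels are hypothetical names.

```typescript
// Hypothetical model of a remediation step; field names are assumptions.
type Risk = "low" | "medium" | "high";

interface PlaybookStep {
  id: string;
  action: string;
  risk: Risk;
  estimatedMinutes: number;
}

interface ExecutionPlan {
  runnable: PlaybookStep[];
  awaitingApproval: PlaybookStep[];
}

const RISK_ORDER: Record<Risk, number> = { low: 0, medium: 1, high: 2 };

// Orders steps by risk (then by estimated time) and holds back any
// high-risk step that a human has not explicitly approved.
function planExecution(
  steps: PlaybookStep[],
  approvedIds: string[]
): ExecutionPlan {
  const ordered = steps.slice().sort(
    (a, b) =>
      RISK_ORDER[a.risk] - RISK_ORDER[b.risk] ||
      a.estimatedMinutes - b.estimatedMinutes
  );
  const plan: ExecutionPlan = { runnable: [], awaitingApproval: [] };
  for (const step of ordered) {
    const gated = step.risk === "high" && approvedIds.indexOf(step.id) === -1;
    (gated ? plan.awaitingApproval : plan.runnable).push(step);
  }
  return plan;
}
```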

Key Features

“Remediate Now”: Demonstrates autonomous AI-agent-driven incident response
Real-time monitoring dashboard: Auto-refreshes from Splunk
Ask Splunk Assistant:
- Query data in plain English
- Receive intelligent explanations (not raw tables)
Post-incident report generator:
- One-click export
- Markdown output (Confluence/Jira-ready)

Impact

SignalSage reduces Mean Time to Understand (MTTU) by replacing:

Manual dashboard switching
Writing SPL queries by hand
Mental cross-signal correlation

Result: A 45-minute investigation becomes a 30-second automated pipeline

How We Built It

We built SignalSage using:

Next.js 14
TypeScript
Tailwind CSS

Backend Architecture

Connects to Splunk Enterprise via REST API (port 8089)
Uses JWT authentication
Executes SPL queries via:
- Traditional search job lifecycle
- Faster oneshot export mode
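
As a rough illustration of the synchronous path, the sketch below posts a search to the `search/jobs` endpoint with `exec_mode=oneshot`, which returns results in the same response instead of a job ID to poll. The writeup does not say whether SignalSage uses this endpoint or the separate export endpoint, so treat the choice as an assumption; `runOneshot`, `HttpPost`, and `OneshotOptions` are hypothetical names. The HTTP function is passed in so the sketch can be exercised with a stub.

```typescript
// Minimal shape of the HTTP function the sketch needs. The global fetch
// in Node 18+ satisfies it, and a stub can stand in during tests.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface OneshotOptions {
  baseUrl: string; // management endpoint, e.g. "https://localhost:8089" (assumed)
  token: string; // Splunk authentication token (JWT)
  spl: string; // must begin with "search" or "|"
  earliest: string; // e.g. "-15m"
  latest: string; // e.g. "now"
}

function encodeForm(fields: Record<string, string>): string {
  return Object.keys(fields)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(fields[k])}`)
    .join("&");
}

// Runs one search synchronously and returns its result rows.
async function runOneshot(
  post: HttpPost,
  opts: OneshotOptions
): Promise<Record<string, unknown>[]> {
  const response = await post(`${opts.baseUrl}/services/search/jobs`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.token}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: encodeForm({
      search: opts.spl,
      exec_mode: "oneshot",
      output_mode: "json",
      earliest_time: opts.earliest,
      latest_time: opts.latest,
    }),
  });
  if (!response.ok) {
    throw new Error(`Splunk search failed: HTTP ${response.status}`);
  }
  const payload = (await response.json()) as {
    results?: Record<string, unknown>[];
  };
  return payload.results ?? [];
}
```

In a Node 18+ runtime this could be called as `runOneshot(fetch, { ... })`.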

Investigation Pipeline

Query Generator → 12 targeted SPL queries per incident
Live Evidence Collector → Executes queries in parallel
Evidence Normalizer → Converts results into typed data
Root Cause Analyzer:
- Uses 7 scoring models
- Enhanced with Splunk MLTK: anomaly detection, clustering, forecasting, outlier detection
Remediation Engine → Maps hypotheses to playbooks
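
The collector and normalizer stages above could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `collectEvidence`, `PlannedQuery`, and `TimelineEvent` are hypothetical names, and each row is assumed to carry a `_time` field that `Date.parse` can read (Splunk's JSON output uses ISO 8601 timestamps).

```typescript
type SignalKind = "log" | "metric" | "trace" | "deployment";

interface PlannedQuery {
  id: string;
  kind: SignalKind;
  spl: string;
}

interface TimelineEvent {
  timestamp: number; // epoch milliseconds
  kind: SignalKind;
  queryId: string;
  fields: Record<string, unknown>;
}

// Any function that runs one SPL query and resolves to its result rows.
type RunQuery = (spl: string) => Promise<Record<string, unknown>[]>;

async function collectEvidence(
  queries: PlannedQuery[],
  runQuery: RunQuery
): Promise<{ timeline: TimelineEvent[]; failed: string[] }> {
  const failed: string[] = [];
  // Start every query at once. A failing query is recorded and skipped
  // so that it does not abort the rest of the investigation.
  const perQuery = await Promise.all(
    queries.map(async (q) => {
      try {
        const rows = await runQuery(q.spl);
        const events: TimelineEvent[] = [];
        for (const row of rows) {
          const timestamp = Date.parse(String(row["_time"]));
          // Rows without a parseable timestamp cannot be placed on the timeline.
          if (!isNaN(timestamp)) {
            events.push({ timestamp, kind: q.kind, queryId: q.id, fields: row });
          }
        }
        return events;
      } catch {
        failed.push(q.id);
        return [];
      }
    })
  );
  // Merge all signals into one chronologically ordered timeline.
  const timeline: TimelineEvent[] = [];
  for (const events of perQuery) timeline.push(...events);
  timeline.sort((a, b) => a.timestamp - b.timestamp);
  return { timeline, failed };
}
```

Because the queries run concurrently, total collection time is bounded by the slowest query rather than the sum of all of them.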

Additional Components

Splunk MCP server (development-time querying)
Natural language → SPL interface

Frontend

Tabbed investigation workflow
Real-time dashboard (auto-refresh)
Conversational AI assistant

Data Ingestion

Uses HEC (HTTP Event Collector)
Custom scripts generate realistic observability data:
- Logs, metrics, traces, deployments
All events use timestamps relative to “now” for freshness
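
A generator along these lines could produce the relative timestamps described above. The sketch only builds the payload; HEC's event endpoint (`/services/collector/event`, with an `Authorization: Splunk <token>` header) accepts a `time` field in epoch seconds and several JSON objects stacked in one request body. The helper names and the sourcetype in the usage note are hypothetical.

```typescript
// Shape of one HEC event envelope (subset of the fields HEC accepts).
interface HecEvent {
  time: number; // epoch seconds
  sourcetype: string;
  event: Record<string, unknown>;
}

// Builds an event stamped `secondsAgo` before "now", so that freshly
// generated data always lands inside recent search windows.
function buildHecEvent(
  secondsAgo: number,
  sourcetype: string,
  event: Record<string, unknown>,
  nowMs: number = Date.now()
): HecEvent {
  return {
    time: Math.floor(nowMs / 1000) - secondsAgo,
    sourcetype,
    event,
  };
}

// Serializes a batch as newline-separated JSON objects for one POST body.
function toHecBody(events: HecEvent[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n");
}
```

For example, `buildHecEvent(300, "app:logs", { level: "ERROR" })` would describe an error logged five minutes ago, whenever the script happens to run.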

Challenges We Ran Into

Splunk AI Assistant Integration

Successfully decoded cloud token and tenant/API structure
Blocked by OAuth2 issue:
- client_credentials rejected tenant ID as client_id
Python SDK (splunk-cloud-sdk) incompatible with Python 3.13

✅ Integration code is complete, but blocked on authentication

Performance Issues

The app frequently froze due to heavy UI effects:

backdrop-blur on many elements
Global will-change usage
50 confetti elements
Staggered animations across 100+ components

✅ Solution: Removed GPU-heavy effects and replaced with lighter alternatives

Query Performance

The traditional search job lifecycle relies on polling:
- 1 status request per second
- Up to 60 seconds of latency

✅ Fixed using oneshot export mode

UI Glitch

Pulsing green border caused white flashes due to:

Hover state conflicts
Brightness filters
Inset box-shadow interactions

✅ Required multiple iterations to resolve

Accomplishments We’re Proud Of

✅ Fully connected to a real Splunk instance (not a demo)
✅ 12-query parallel pipeline produces meaningful results
✅ ML-powered root cause analysis works on live data
✅ “Remediate Now” demonstrates autonomous incident response
✅ Natural language assistant explains results clearly

Performance Milestone

End-to-end workflow completes in under 30 seconds:

Evidence collection
Root cause ranking
Remediation playbooks
Post-incident report generation

Production Readiness

Input validation
SPL injection prevention
Credential masking
Time window limits
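
The safeguards listed above might look something like this in outline. The allowlist pattern, the 24-hour limit, and the function names are all assumptions made for illustration; the writeup does not give SignalSage's actual rules.

```typescript
// Assumed allowlist: letters, digits, underscore, dot, and hyphen only.
// Anything else (quotes, pipes, brackets) never reaches the SPL string.
const SERVICE_NAME_PATTERN = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,62}$/;

// Assumed upper bound on the investigation window.
const MAX_WINDOW_MS = 24 * 60 * 60 * 1000;

interface InvestigationRequest {
  service: string;
  startMs: number; // epoch milliseconds
  endMs: number; // epoch milliseconds
}

// Returns a list of validation errors; an empty list means the request
// is safe to turn into queries.
function validateRequest(req: InvestigationRequest): string[] {
  const errors: string[] = [];
  if (!SERVICE_NAME_PATTERN.test(req.service)) {
    errors.push("service name contains characters outside the allowlist");
  }
  if (!(req.endMs > req.startMs)) {
    errors.push("time window must end after it starts");
  } else if (req.endMs - req.startMs > MAX_WINDOW_MS) {
    errors.push("time window exceeds the maximum");
  }
  return errors;
}

// Masks a credential for logs and UI, keeping only the last 4 characters.
function maskToken(token: string): string {
  return token.length <= 4 ? "****" : "****" + token.slice(-4);
}
```

An allowlist is used rather than escaping because it is easier to reason about: input either matches a known-safe shape or is rejected outright.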

What We Learned

The gap between a demo and a product is performance
Heavy UI effects (blur, glassmorphism, animations):
- Look good in screenshots
- Hurt real-world usability

Key Technical Learnings

Splunk REST API is:
- Powerful
- Designed for asynchronous workflows

✅ Required:

Oneshot export mode
Parallel execution

Rule-based NL → SPL works for ~80% of use cases
Users value clear explanations over perfect query translation
Splunk cloud AI is:
- Powerful
- Difficult to integrate compared to on-prem

What’s Next for SignalSage

Immediate Next Step

Complete Splunk AI Assistant integration
- Awaiting proper OAuth2 credentials
Enables:
- LLM-powered SPL generation
- Advanced explanations

Near-Term Roadmap

Make “Remediate Now” fully functional:
- Kubernetes rollbacks
- Feature flag toggles
- Connection pool scaling
- Human approval workflows
Add real-time alerting:
- Auto-trigger investigations
- Shift to proactive operations

Longer-Term Vision

Multi-tenant support
Team collaboration:
- Shared investigations
- @mentions
- Handoffs
Continuous learning system:
- Improve root cause scoring from confirmed cases
- Build institutional knowledge
- Accelerate future incident resolution

Built With

Languages: TypeScript, JavaScript, SPL (Search Processing Language), CSS
Frameworks: Next.js 14 (React 18, App Router), Tailwind CSS 3, Node.js 24, Jest (testing)
Platforms: Splunk Enterprise 10.2.3 (local instance), Railway (deployment)
APIs & Services:
- Splunk REST API (port 8089) → search job creation, polling, oneshot export
- Splunk HTTP Event Collector (port 8088) → data ingestion
- Splunk Machine Learning Toolkit (MLTK v5.7.4) → anomaly detection, clustering, forecasting
- Splunk AI Assistant (cloud-connected, pending OAuth2 approval)
- OpenAI API (gpt-4o-mini) → fallback AI summaries and explanations
- Splunk MCP server → development-time query interface
- Web Audio API → synthesized UI sound effects
Libraries: Zod (runtime validation), uuid, sharp (image processing), OpenAI SDK
Key Techniques:
- JWT token authentication
- Parallel query execution
- Oneshot synchronous search mode
- Rule-based NL-to-SPL conversion with AI explanation layer
- ML-boosted confidence scoring (z-score anomaly detection, log clustering, cross-signal correlation)
- SPL injection prevention (allowlist regex)

Updates

Josh Lee started this project — Jun 15, 2026 11:53 AM EDT
