TXENT: Agentic Observability Memory System

Built for the Splunk Agentic Ops Hackathon 2026



📐 Architecture Diagram

System Architecture Overview

[Figure: TXENT architecture diagram. Full-resolution image: architecture_diagram.png in the GitHub repository]

Data Flow & Component Interactions

┌─────────────────────────────────────────────────────────────────────────────┐
│                          TXENT COMMAND CENTER                               │
│                  (frontend/txent_final.html — Browser UI)                    │
│  ┌──────────┐  ┌──────────────┐  ┌────────────┐  ┌──────────────────────┐  │
│  │ Incident  │  │  Memory      │  │ Knowledge  │  │ Investigation       │  │
│  │ Trigger   │  │  Retrieval   │  │ Graph      │  │ Results Panel       │  │
│  └─────┬─────┘  └──────┬───────┘  └─────┬──────┘  └──────────┬──────────┘  │
└────────┼───────────────┼────────────────┼──────────────────────┼─────────────┘
         │               │                │                      │
         ▼               ▼                ▼                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        FastAPI Backend (api/main.py)                         │
│                                                                              │
│   POST /api/ingest/incident    POST /retrieve    GET /graph                  │
│   POST /api/actions/execute    GET /api/dashboard GET /api/splunk/readings    │
│   GET  /api/splunk/status      POST /api/splunk/mcp/search                   │
└──────────────────────────────────┬───────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    Orchestrator (core/orchestrator.py)                        │
│                                                                              │
│  Coordinates all memory layers, kick detection, and agent investigations     │
└───┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────────────┘
    │          │          │          │          │          │
    ▼          ▼          ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌─────────────────┐
│   L1   │ │   L2   │ │   L3   │ │   L4   │ │  KICK  │ │  INVESTIGATOR   │
│Surface │ │Assoc.  │ │Struct. │ │Archety │ │Detector│ │  AGENT          │
│Memory  │ │Graph   │ │Patterns│ │pes     │ │        │ │                 │
│        │ │        │ │        │ │        │ │Contra- │ │ Autonomous      │
│Vector  │ │Service │ │7 obser │ │Deep    │ │diction │ │ investigation   │
│Search  │ │Topology│ │vability│ │ops     │ │& diver │ │ with evidence   │
│Qdrant  │ │Network │ │pattern │ │priors  │ │gence   │ │ collection      │
│+ SBert │ │X+spaCy │ │matching│ │        │ │scoring │ │ + LLM reasoning │
└────────┘ └────────┘ └────────┘ └────────┘ └───┬────┘ └────────┬────────┘
                                                 │               │
                                   ┌─────────────┘               │
                                   │  If contradiction detected  │
                                   │  → trigger investigation ───┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                   Splunk Connector (connectors/splunk.py)                     │
│                                                                              │
│   ┌────────────────────┐  ┌──────────────────┐  ┌─────────────────────┐     │
│   │ Splunk REST API    │  │ Splunk MCP Server │  │ Simulation Engine  │     │
│   │ (port 8089)        │  │ (Model Context    │  │ (High-fidelity     │     │
│   │                    │  │  Protocol)        │  │  telemetry for     │     │
│   │ • SPL searches     │  │                   │  │  demo/dev mode)    │     │
│   │ • Index queries    │  │ • splunk_search   │  │                    │     │
│   │ • Server info      │  │ • splunk_indexes  │  │ • CPU/Memory spikes│     │
│   │ • Saved searches   │  │ • splunk_kvstore  │  │ • Error rate surges│     │
│   └────────┬───────────┘  └────────┬──────────┘  │ • DB pool exhaust  │     │
│            │                       │              └─────────┬──────────┘     │
│            │                       │                        │                │
│            ▼                       ▼                        ▼                │
│   ┌────────────────────────────────────────────────────────────────┐         │
│   │              Bidirectional Data Flow Logic                     │         │
│   │  • Ingest: Pull logs/alerts from Splunk indexes               │         │
│   │  • Query: Execute SPL searches for evidence gathering          │         │
│   │  • Enhance: Call AI Toolkit hosted models                      │         │
│   │  • Feedback: Push investigation results back via HEC           │         │
│   └────────────────────────────────────────────────────────────────┘         │
└──────────────────────────────┬───────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        SPLUNK ENTERPRISE                                     │
│                                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │ MCP Server   │  │ AI Toolkit   │  │ AI Assistant │  │ Indexes      │    │
│  │ App (#7931)  │  │ App (#2890)  │  │ App (#7245)  │  │ (main, etc.) │    │
│  │              │  │              │  │              │  │              │    │
│  │ Model Context│  │ Hosted Models│  │ Natural Lang │  │ Log/Metric   │    │
│  │ Protocol     │  │ • Foundation │  │ SPL queries  │  │ storage      │    │
│  │ integration  │  │   Sec 8B     │  │              │  │              │    │
│  │              │  │ • Cisco Deep │  │              │  │              │    │
│  │              │  │   Time Series│  │              │  │              │    │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
│                                                                              │
│  HTTP Event Collector (HEC) ← TXENT pushes investigation results back       │
└─────────────────────────────────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────────────────────────┐
│                          AI MODELS INTEGRATION                               │
└──────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────┐     ┌──────────────────────┐     ┌──────────────────┐
│  Embedding Models   │     │  LLM Backends        │     │ Splunk AI Toolkit│
│                     │     │                      │     │ Hosted Models    │
│ • Sentence-BERT     │────▶│ • Google Gemini      │────▶│                  │
│ (all-mpnet-base-v2) │     │   (gemini-2.5-flash) │     │ • Foundation-Sec │
│                     │     │                      │     │   1.1-8B-Instruct│
│ • Used in L1, L3    │     │ • OpenAI GPT-4o-mini │     │   (Security LLM) │
│   for semantic      │     │                      │     │                  │
│   similarity        │     │ • Local LLMs via     │     │ • Cisco Deep Time│
│                     │     │   OpenAI-compatible  │     │   Series Model   │
│ • Pure Python       │     │   endpoints          │     │   (Anomaly       │
│   fallback if not   │     │                      │     │   forecasting)   │
│   installed         │     │ • Optional: Works    │     │                  │
│                     │     │   without LLM        │     │ • Cloud-hosted   │
│                     │     │   (retrieval-based)  │     │   (no GPU needed)│
└─────────────────────┘     └──────────────────────┘     └──────────────────┘
         │                            │                             │
         └────────────────────────────┼─────────────────────────────┘
                                      │
                                      ▼
                        ┌──────────────────────────┐
                        │  Investigation Agent     │
                        │  (agents/investigator.py)│
                        │                          │
                        │  Uses all three:         │
                        │  1. Embeddings for       │
                        │     semantic search      │
                        │  2. LLM for narrative    │
                        │     RCA generation       │
                        │  3. Splunk AI for        │
                        │     classification +     │
                        │     forecasting          │
                        └──────────────────────────┘

Key Data Flow Sequences

1️⃣ Alert Ingestion Flow

Splunk Alert → REST API → TXENT /ingest → L1 Surface Memory
                                        → L2 Graph Extraction
                                        → Metadata Enrichment
                                        → Memory Persistence

2️⃣ Kick Detection Flow

User Query → /retrieve → L1 Vector Search → Top-K Results
                       → L2 Graph Traversal → Service Topology
                       → L3 Pattern Matching → Structural Signatures
                       → Kick Detector:
                         • Semantic Similarity (L1 ↔ L3)
                         • Topological Analysis (L2 paths)
                         • Contradiction Scoring
                       → If Score > Threshold (0.42):
                         🔥 KICK FIRES → Launch Agent
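
As a rough illustration of the scoring step above, the sketch below blends a semantic divergence term with a topological term and compares the result against the 0.42 default. The equal weighting, the `upstream_of` mapping, and the function names are assumptions made for this example; the source does not specify how the two signals are combined.

```python
import math

KICK_THRESHOLD = 0.42  # default threshold quoted in the flow above


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kick_score(alert_vec, pattern_vec, alerted_service, pattern_root, upstream_of,
               semantic_weight=0.5):
    """Blend semantic divergence with a topological signal (hypothetical weighting)."""
    # Semantic term: how far the surface alert is from the matched L3 pattern.
    semantic_divergence = 1.0 - max(0.0, cosine_similarity(alert_vec, pattern_vec))
    # Topological term: 1.0 when the pattern's root service sits upstream of the
    # alerting service, i.e. the alert looks like a downstream symptom.
    is_upstream = pattern_root in upstream_of.get(alerted_service, set())
    topological = 1.0 if is_upstream else 0.0
    return semantic_weight * semantic_divergence + (1.0 - semantic_weight) * topological


def kick_fires(score, threshold=KICK_THRESHOLD):
    """A Kick fires only when the score is strictly above the threshold."""
    return score > threshold


# Example: a database alert whose best-matching pattern is rooted in the cache.
upstream_of = {"postgres-db": {"redis-cache", "payment-api"}}
score = kick_score([1.0, 0.0], [0.0, 1.0], "postgres-db", "redis-cache", upstream_of)
print(score, kick_fires(score))
```

Keeping the two terms separate makes it straightforward to report each one in the Kick result, so an operator can see whether the semantic or the topological signal drove the decision.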

3️⃣ Autonomous Investigation Flow

Kick Fires → Investigation Agent Triggered
           → Step 1: Query Splunk REST API
             • Search logs for affected services
             • Gather metrics snapshots (CPU, memory, connections)
           → Step 2: Query Splunk MCP Server
             • Execute structured SPL searches
             • Retrieve saved search results
           → Step 3: Call Splunk AI Toolkit
             • Foundation-Sec: Security classification
             • Deep Time Series: Anomaly forecasting
           → Step 4: L2 Graph Traversal
             • Identify upstream/downstream dependencies
             • Check health status of related services
           → Step 5: Evidence Synthesis
             • Build timeline with timestamps
             • Correlate metrics with patterns
             • Generate root cause hypothesis
           → Step 6: LLM Narrative Generation
             • Call Gemini/OpenAI for RCA report
             • Format actionable recommendations
           → Step 7: Push Results to Splunk HEC
             • Create txent:investigation event
             • Close feedback loop
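
Step 7 can be sketched as building a standard HEC JSON envelope. This is a minimal sketch: it assumes `txent:investigation` is used as the sourcetype, and the `index`, `host`, and `completed_at` names are placeholders invented for the example. The request is only constructed here, not sent.

```python
import json
import time


def build_hec_event(investigation, index="main", host="txent"):
    """Wrap an investigation result in the HEC JSON event envelope."""
    return {
        # Epoch seconds; falls back to "now" if the result carries no timestamp.
        "time": investigation.get("completed_at", time.time()),
        "host": host,
        "sourcetype": "txent:investigation",  # assumed to be the sourcetype
        "index": index,
        "event": investigation,
    }


def build_hec_request(base_url, token, investigation):
    """Return (url, headers, body) for a POST to the HEC event endpoint."""
    url = base_url.rstrip("/") + "/services/collector/event"
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(build_hec_event(investigation))
    return url, headers, body


# Hypothetical investigation result; HEC typically listens on port 8088.
url, headers, body = build_hec_request(
    "https://splunk.example:8088/",
    "00000000-0000-0000-0000-000000000000",
    {"incident_id": "INC-2026-001", "root_cause": "Cache Saturation",
     "completed_at": 1767236400},
)
print(url)
```

Separating request construction from the network call keeps the payload shape easy to unit test without a Splunk instance.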

4️⃣ Bidirectional Splunk Loop

┌─────────────┐
│   SPLUNK    │
│  ENTERPRISE │
└──────┬──────┘
       │
       │ ① Ingest: Alert/Log data flows INTO TXENT
       ▼
┌─────────────┐
│    TXENT    │
│   Memory +  │
│    Agent    │
└──────┬──────┘
       │
       │ ② Analyze: Kick fires, Agent investigates
       ▼
┌─────────────┐
│ Investigation│
│   Results   │
└──────┬──────┘
       │
       │ ③ Feedback: Investigation pushed back to Splunk via HEC
       ▼
┌─────────────┐
│   SPLUNK    │
│   HEC Index │
│ (txent:inv) │
└─────────────┘

Component Responsibilities

| Component | Technology | Purpose |
|---|---|---|
| L1 Surface Memory | Qdrant + Sentence-BERT | Semantic vector search over raw logs/alerts |
| L2 Associative Graph | NetworkX + spaCy | Service topology and dependency mapping |
| L3 Structural Patterns | Sentence-BERT + Cosine Similarity | Match incidents to 7 known anti-patterns |
| L4 Operational Archetypes | Rule Engine | Apply senior SRE wisdom and operational priors |
| Kick Detector | Hybrid Semantic + Graph Analysis | Detect contradictions between layers |
| Investigation Agent | Async Python + LLM + Splunk APIs | Autonomous root cause investigation |
| Splunk Connector | httpx + REST + MCP + HEC | Unified Splunk integration layer |
| FastAPI Backend | FastAPI + uvicorn | REST API and SSE streaming |
| Command Center UI | Vanilla JS + Canvas | Real-time monitoring dashboard |

AI Model Integration Details

Embedding Models (Local)

  • Model: sentence-transformers/all-mpnet-base-v2
  • Usage: L1 surface memory vector encoding, L3 pattern matching
  • Dimension: 768-dimensional embeddings
  • Fallback: Pure Python hash-based vectors if library unavailable
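
One way a hash-based fallback like the one above could work is feature hashing: each token is hashed into one of 768 buckets and the vector is normalized. This is a sketch of that general technique, not the project's actual fallback; the tokenizer and the use of MD5 are assumptions for the example.

```python
import hashlib
import math
import re

DIM = 768  # matches the embedding width listed above


def hash_embed(text, dim=DIM):
    """Feature-hashing embedding: each token adds +1 or -1 to one bucket."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    # L2-normalize so a plain dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


alert = hash_embed("database timeout on postgres-db")
print(len(alert), round(cosine(alert, alert), 6))
```

Hashed vectors only capture token overlap, not meaning, so similarity scores from such a fallback would be noticeably coarser than those from a trained sentence encoder.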

LLM Backends (Cloud/Local)

  • Google Gemini: gemini-2.5-flash for narrative RCA generation
  • OpenAI: gpt-4o-mini as alternative backend
  • Local Models: Any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio)
  • Usage: Generate investigation narratives, explain contradictions
  • Optional: System works with retrieval-based answers if no LLM is configured
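
The optional-LLM behavior above could be expressed as a small selection function. The setting names (`GEMINI_API_KEY`, `OPENAI_API_KEY`, `LOCAL_LLM_BASE_URL`) and the priority order are assumptions for this sketch; the project's real configuration keys may differ.

```python
def choose_backend(env):
    """Pick an answer backend from a mapping of settings (for example os.environ)."""
    if env.get("GEMINI_API_KEY"):
        return {"backend": "gemini", "model": "gemini-2.5-flash"}
    if env.get("OPENAI_API_KEY"):
        return {"backend": "openai", "model": "gpt-4o-mini"}
    if env.get("LOCAL_LLM_BASE_URL"):
        # Any OpenAI-compatible server (vLLM, Ollama, LM Studio).
        return {"backend": "local", "base_url": env["LOCAL_LLM_BASE_URL"]}
    # No LLM configured: fall back to retrieval-based answers.
    return {"backend": "retrieval"}


print(choose_backend({}))
print(choose_backend({"LOCAL_LLM_BASE_URL": "http://localhost:11434/v1"}))
```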

Splunk AI Toolkit (Cloud)

  • Foundation-Sec-1.1-8B: Security-focused incident classification
  • Cisco Deep Time Series: Anomaly detection and trend forecasting
  • Usage: Enhanced analysis during autonomous investigations
  • Benefit: No local GPU required, enterprise-grade models

🎯 Inspiration

Modern observability platforms excel at detecting anomalies but struggle with understanding them. When a database timeout alert fires at 3 AM, on-call engineers face a critical question: Is this the root cause or just a symptom?

Traditional monitoring tools treat every incident in isolation, lacking memory of past patterns. A timeout might indicate database overload, or it could be a downstream symptom of cache saturation, network partition, or deployment drift. TXENT was born from a simple insight: observability systems should learn like humans do — building intuition over time and recognizing when surface symptoms contradict deeper operational patterns.

We asked ourselves: What if an AI agent could remember every incident, understand service dependencies, detect contradictions between symptoms and structural patterns, and autonomously investigate the true root cause?

That question led us to build TXENT — an observability system with persistent memory, structural pattern recognition, and autonomous investigation capabilities powered by Splunk Enterprise.


🚀 What it does

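TXENT transforms traditional reactive monitoring into agentic, memory-driven observability. Instead of simply displaying alerts, TXENT remembers past incidents, detects when an alert contradicts known structural patterns, and investigates the likely root cause on its own.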

Core Capabilities

🧠 4-Layer Memory Architecture

  • L1 Surface Memory: Stores raw logs, alerts, and telemetry with semantic vector search capabilities
  • L2 Associative Graph: Maps service relationships and dependencies in a knowledge graph (e.g., payment-api → redis-cache → postgres-db)
  • L3 Structural Patterns: Matches incidents against 7 pre-defined observability anti-patterns (Cache Saturation, Cascading Failure, Network Partition, Configuration Drift, etc.)
  • L4 Operational Archetypes: Encodes deep operational wisdom — the kind of knowledge senior SREs accumulate over years
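
To make the L2 layer concrete, the sketch below stores the example chain from the list above as a plain adjacency mapping and walks it in both directions. It uses only the standard library; the edge direction ("caller → callee") and the function names are assumptions for the example.

```python
from collections import deque

# Directed "calls" edges taken from the example above.
CALLS = {
    "payment-api": ["redis-cache"],
    "redis-cache": ["postgres-db"],
    "postgres-db": [],
}


def downstream(graph, start):
    """Breadth-first list of every service reachable from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(graph, target):
    """Services that can reach `target`, found by walking reversed edges."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream(reverse, target)


print(downstream(CALLS, "payment-api"))  # ['redis-cache', 'postgres-db']
print(upstream(CALLS, "postgres-db"))    # ['redis-cache', 'payment-api']
```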

⚡ The Kick Mechanism (Our Key Innovation)

  • Continuously compares incoming surface alerts against learned structural patterns
  • Calculates contradiction scores using semantic similarity and topological analysis
  • Fires a "Kick" when divergence exceeds the threshold (default 0.42), signaling that the alert is likely a downstream symptom, not the root cause
  • Example: A "Database Timeout" alert arrives, but the Kick detector recognizes the topological signature of "Cache Saturation" — database timeouts are the effect, not the cause

🤖 Autonomous Investigation Agent

  • Automatically launches when the Kick mechanism fires
  • Queries Splunk Enterprise via REST API and MCP Server for evidence
  • Traverses service dependency graphs to identify upstream/downstream impacts
  • Builds evidence timelines with timestamps
  • Calls Splunk hosted AI models (Foundation-Sec, Deep Time Series) for enhanced analysis
  • Generates root cause reports with actionable remediation steps
  • Pushes investigation results back to Splunk via HTTP Event Collector (HEC) to complete the feedback loop
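
The evidence timeline mentioned above can be sketched as a chronological sort over small evidence records. The `Evidence` fields and the "earliest anomaly" heuristic are illustrative assumptions, and the sample records are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Evidence:
    timestamp: datetime
    source: str   # e.g. "splunk_rest", "mcp", "graph"
    service: str
    detail: str


def build_timeline(items):
    """Sort evidence chronologically and drop exact duplicates."""
    return sorted(set(items), key=lambda e: (e.timestamp, e.source, e.service, e.detail))


def earliest_anomaly(timeline):
    """First service to show trouble: a simple root-cause candidate heuristic."""
    return timeline[0].service if timeline else None


# Hypothetical evidence gathered by two collection steps.
db_timeout = Evidence(datetime(2026, 1, 1, 3, 2, tzinfo=timezone.utc),
                      "splunk_rest", "postgres-db", "connection timeouts")
cache_evictions = Evidence(datetime(2026, 1, 1, 3, 0, tzinfo=timezone.utc),
                           "mcp", "redis-cache", "eviction rate spike")
timeline = build_timeline([db_timeout, cache_evictions, db_timeout])
print(earliest_anomaly(timeline))
```

Here the cache anomaly precedes the database timeouts, which matches the "Database Timeout" versus "Cache Saturation" example used earlier.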

🔗 Bidirectional Splunk Integration

  • Ingest: Pulls logs, alerts, and metrics from Splunk Enterprise indexes
  • Query: Executes SPL searches via REST API and MCP Server
  • Enhance: Leverages Splunk AI Toolkit hosted models (Foundation-Sec-1.1-8B, Cisco Deep Time Series)
  • Feedback: Pushes investigation results back to Splunk via HEC
  • Graceful Fallback: High-fidelity simulation mode when Splunk is unavailable (perfect for demos and development)
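
For the Query path above, a search against the Splunk REST API can be prepared as a URL plus form fields. The sketch targets the `search/jobs/export` endpoint on the management port (8089) and only builds the request; the helper name and the default time window are assumptions. Sending it would be a separate step (for example with an httpx client and suitable authentication).

```python
def build_export_search(base_url, spl, earliest="-15m", latest="now"):
    """Return (url, form_fields) for a streaming search against the REST API."""
    query = spl.strip()
    # The REST search endpoints expect the query to begin with a command,
    # so prepend "search" unless it already starts with one or with a pipe.
    if not query.startswith(("search ", "|")):
        query = "search " + query
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    fields = {
        "search": query,
        "earliest_time": earliest,
        "latest_time": latest,
        "output_mode": "json",
    }
    return url, fields


url, fields = build_export_search(
    "https://localhost:8089", 'index=main service="payment-api" error'
)
print(url)
print(fields["search"])
```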

📊 Real-Time Command Center

  • Interactive dashboard with live Splunk readings
  • Topology graph visualization with service health status
  • Sparkline charts for CPU, memory, latency, and error rates
  • Investigation timeline with evidence collection logs
  • One-click remediation action execution
  • Sentinel chat interface for natural language queries

🛠️ How we built it

Technology Stack

Backend Core

  • Python 3.10+ with FastAPI for high-performance async API
  • Sentence Transformers (all-mpnet-base-v2) for semantic embeddings with pure Python fallback
  • Qdrant (in-memory vector database) for L1 surface memory with mock fallback
  • spaCy (en_core_web_lg) for entity extraction with intelligent fallback
  • NetworkX for knowledge graph operations with mock implementation
  • httpx for async Splunk API communication

LLM Integration

  • Flexible LLM backend supporting:
    • Google Gemini (gemini-2.5-flash) via API
    • OpenAI (gpt-4o-mini) via API
    • Local models via OpenAI-compatible endpoints
  • Retrieval-based answers work even without an LLM

Splunk Enterprise Integration

  • REST API (port 8089): SPL searches, index queries, server info
  • MCP Server (Model Context Protocol): Structured agent-Splunk communication
  • HTTP Event Collector (HEC): Pushes investigation results back into Splunk as events
  • AI Toolkit: Hosted models (Foundation-Sec, Deep Time Series forecasting)
  • Simulation Engine: High-fidelity fallback that simulates a 6-service microservice architecture
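
A simulation engine of this kind could generate telemetry as a bounded random walk with an injected spike. The sketch below shows that idea for a single metric; the baseline, drift, and spike sizes are made-up parameters, not values from the project.

```python
import random


def simulate_metric(steps, baseline=40.0, drift=1.5, spike_at=None, spike_size=45.0,
                    low=0.0, high=100.0, seed=None):
    """Bounded random walk around `baseline`, with an optional one-off spike."""
    rng = random.Random(seed)
    value, series = baseline, []
    for step in range(steps):
        # Small random drift, pulled gently back toward the baseline.
        value += rng.uniform(-drift, drift) + 0.1 * (baseline - value)
        reading = value + (spike_size if step == spike_at else 0.0)
        # Clamp to the valid range (for example 0-100 for CPU percent).
        series.append(round(min(high, max(low, reading)), 2))
    return series


cpu = simulate_metric(steps=12, spike_at=5, seed=7)
print(cpu)
```

Passing a `seed` makes a demo run repeatable, while omitting it gives fresh values on every call.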

Frontend

  • Single-file HTML/CSS/JavaScript (no build tools required)
  • Canvas-based topology visualization
  • Real-time polling (/api/dashboard every 5 seconds)
  • Server-Sent Events (SSE) for streaming responses
  • Demo mode with realistic drifting metrics when backend is offline

Architecture Design

We structured TXENT around separation of concerns:

  1. Memory Layers (layers/) - Each layer handles a specific depth of reasoning
  2. Core Orchestrator (core/orchestrator.py) - Coordinates all layers and manages lifecycle
  3. Kick Detector (core/kick.py) - Contradiction detection logic
  4. Investigation Agent (agents/investigator.py) - Autonomous analysis engine
  5. Splunk Connector (connectors/splunk.py) - Unified Splunk integration with graceful degradation
  6. FastAPI Backend (api/main.py) - REST API with 30+ endpoints
  7. Command Center UI (frontend/txent_final.html) - Real-time monitoring dashboard

Development Journey

Phase 1: Foundation (Days 1-2)

  • Built L1 surface memory with vector embeddings
  • Implemented L2 knowledge graph with service topology
  • Created L3 structural pattern matching with 7 signature schemas
  • Developed L4 operational archetypes with wisdom rules

Phase 2: Intelligence (Days 3-4)

  • Designed and implemented the Kick mechanism
  • Built contradiction detection using semantic + topological analysis
  • Created autonomous investigation agent with evidence collection
  • Integrated multiple LLM backends (Gemini, OpenAI, local)

Phase 3: Splunk Integration (Days 5-6)

  • Connected Splunk REST API for SPL queries
  • Implemented MCP Server integration
  • Added HTTP Event Collector feedback loop
  • Integrated Splunk AI Toolkit hosted models
  • Built high-fidelity simulation engine for graceful fallback

Phase 4: User Experience (Days 7-8)

  • Designed and built interactive Command Center UI
  • Implemented real-time dashboard polling
  • Added sparkline metrics visualization
  • Created investigation timeline display
  • Built Sentinel chat interface for natural language queries

💪 Challenges we ran into

1. Balancing Precision with Explainability

The Kick mechanism needed to detect contradictions reliably without overwhelming users with false positives. We solved this by:

  • Combining semantic similarity (cosine similarity) with topological graph analysis
  • Implementing a tunable threshold system (default 0.42 after extensive testing)
  • Providing detailed reasoning in the Kick result payload

2. Graceful Degradation Across Dependencies

Not everyone has access to Splunk Enterprise, GPU resources, or commercial LLM APIs. We addressed this with:

  • Pure Python fallbacks for sentence-transformers, spaCy, NetworkX, and Qdrant
  • High-fidelity simulation mode that mimics real Splunk behavior
  • LLM-optional architecture (retrieval-based answers work without external models)
  • Zero-Docker, zero-GPU installation
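
The fallback approach described above is commonly written as a guarded import plus a small stand-in class. The sketch below shows the pattern for NetworkX; `MiniDiGraph` is a hypothetical stand-in that covers only the three methods used here, not a full replacement.

```python
try:
    import networkx as nx  # preferred when installed
except ImportError:
    nx = None


class MiniDiGraph:
    """Tiny stand-in exposing the few DiGraph methods this sketch needs."""

    def __init__(self):
        self._succ = {}

    def add_edge(self, u, v):
        self._succ.setdefault(u, set()).add(v)
        self._succ.setdefault(v, set())

    def successors(self, node):
        return iter(self._succ.get(node, ()))

    def number_of_nodes(self):
        return len(self._succ)


def make_graph():
    """Return a real networkx.DiGraph when available, else the fallback."""
    return nx.DiGraph() if nx is not None else MiniDiGraph()


graph = make_graph()
graph.add_edge("payment-api", "redis-cache")
graph.add_edge("redis-cache", "postgres-db")
print(sorted(graph.successors("payment-api")), graph.number_of_nodes())
```

Because the stand-in mirrors the method names of the real class, calling code does not need to know which one it received.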

3. Real-Time Bidirectional Splunk Integration

Creating a complete feedback loop (Splunk → TXENT → Investigation → Splunk) required:

  • Handling three different Splunk integration points (REST, MCP, HEC)
  • Dealing with varying authentication schemes
  • Managing SSE streaming for MCP Server responses
  • Implementing connection health monitoring with automatic simulation fallback
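
Connection health monitoring with automatic fallback can be sketched as a wrapper that tries the live source, switches to the simulator on failure, and retries the live source after a cool-down. The class name, the 30-second default, and the injected callables are assumptions made for this example.

```python
import time


class FallbackSource:
    """Serve readings from a live source, switching to a simulator on failure."""

    def __init__(self, live_fetch, simulated_fetch, retry_after=30.0,
                 clock=time.monotonic):
        self._live = live_fetch
        self._sim = simulated_fetch
        self._retry_after = retry_after
        self._clock = clock
        self._down_since = None  # None means the live source is considered healthy

    @property
    def mode(self):
        return "live" if self._down_since is None else "simulation"

    def fetch(self):
        cooling_down = (
            self._down_since is not None
            and self._clock() - self._down_since < self._retry_after
        )
        if not cooling_down:
            try:
                result = self._live()
                self._down_since = None
                return result
            except Exception:
                # Any failure marks the live source unhealthy and starts the cool-down.
                self._down_since = self._clock()
        return self._sim()


def unreachable():
    raise ConnectionError("Splunk not reachable")


source = FallbackSource(unreachable, lambda: {"cpu": 41.0, "simulated": True})
print(source.fetch(), source.mode)
```

Injecting the clock and both fetch functions keeps the wrapper testable without real network calls or sleeps.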

4. Maintaining Context Across Memory Layers

Coordinating retrieval across 4 layers while preserving metadata (incident_id, severity, service, timestamp) was complex. We solved this by:

  • Standardizing payload structures across all layers
  • Implementing metadata propagation through the orchestrator
  • Creating a unified context builder that merges L1–L4 results
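
A unified context builder along these lines might merge per-layer results while carrying the four metadata fields forward. In this sketch the `LayerResult` shape and the "first value wins" rule are assumptions; the source only states that metadata is propagated and that L1–L4 results are merged.

```python
from dataclasses import dataclass, field

META_KEYS = ("incident_id", "severity", "service", "timestamp")


@dataclass
class LayerResult:
    layer: str   # "L1" .. "L4"
    items: list
    metadata: dict = field(default_factory=dict)


def build_context(results):
    """Merge per-layer results into one dict, propagating shared metadata."""
    context = {"metadata": {}, "layers": {}}
    for result in results:
        context["layers"][result.layer] = list(result.items)
        for key in META_KEYS:
            # First layer to supply a value wins; later layers cannot overwrite it.
            if key in result.metadata and key not in context["metadata"]:
                context["metadata"][key] = result.metadata[key]
    return context


context = build_context([
    LayerResult("L1", ["db timeout log"],
                {"incident_id": "INC-2026-001", "service": "postgres-db"}),
    LayerResult("L3", ["Cache Saturation"],
                {"service": "redis-cache", "severity": "high", "extra": 1}),
])
print(context["metadata"])
```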

5. Building a Production-Ready Dashboard Without Modern Frameworks

We deliberately avoided React/Vue/Angular to keep TXENT dependency-free. This meant:

  • Hand-coding canvas rendering for topology graphs
  • Implementing custom sparkline chart rendering
  • Building our own SSE event handling for streaming responses
  • Creating a polling architecture that gracefully handles backend failures

🏆 Accomplishments that we're proud of

Technical Achievements

The Kick Mechanism - A novel contradiction detection system that combines semantic embeddings, graph topology, and operational pattern matching to identify when surface alerts are masking deeper root causes

Full Bidirectional Splunk Integration - Complete data flow: ingest from Splunk → analyze with TXENT → investigate with agent → push results back to Splunk via HEC

Zero-Dependency Fallbacks - Runs on any Python 3.10+ environment without GPU, Docker, Splunk, or commercial LLM access

Autonomous Investigation Agent - Executes real Splunk queries, builds evidence timelines, correlates metrics, and generates actionable root cause reports

Splunk AI Toolkit Integration - Leverages hosted Foundation-Sec and Deep Time Series models for security classification and anomaly forecasting

Production-Grade API - 30+ REST endpoints with proper error handling, SSE streaming, health checks, and comprehensive documentation

Real-Time Command Center - Single-file dashboard with live topology, sparklines, investigation timelines, and natural language chat

Design Achievements

4-Layer Memory Architecture - A novel approach inspired by human cognitive patterns: Surface → Associative → Structural → Archetypal reasoning

High-Fidelity Simulation - Realistic simulation of a 6-service microservice architecture with spiking metrics that enables demos without infrastructure

Explainable AI - Every Kick decision includes detailed reasoning, divergence scores, and evidence chains

Extensible Pattern System - L3 structural patterns and L4 archetypes are defined in simple Python dictionaries, making it easy to add new operational patterns
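
A dictionary-based pattern registry could look like the sketch below. The pattern names come from the L3 list earlier in this document, but the `signals` sets are invented for illustration, and matching uses simple Jaccard keyword overlap as a stand-in for the Sentence-BERT similarity the project uses.

```python
PATTERNS = {
    "cache_saturation": {
        "name": "Cache Saturation",
        "signals": {"eviction", "cache", "miss", "latency", "memory"},
    },
    "network_partition": {
        "name": "Network Partition",
        "signals": {"unreachable", "partition", "packet", "loss", "timeout"},
    },
}


def register_pattern(registry, key, name, signals):
    """Add a new pattern without touching the matching code."""
    registry[key] = {"name": name, "signals": set(signals)}


def best_match(registry, observed_terms):
    """Rank patterns by Jaccard overlap between observed terms and pattern signals."""
    observed = set(observed_terms)
    scored = []
    for key, pattern in registry.items():
        union = observed | pattern["signals"]
        score = len(observed & pattern["signals"]) / len(union) if union else 0.0
        scored.append((score, key))
    return max(scored) if scored else (0.0, None)


registry = dict(PATTERNS)
register_pattern(registry, "configuration_drift", "Configuration Drift",
                 ["config", "version", "mismatch"])
print(best_match(registry, ["cache", "miss", "latency", "spike"]))
```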


📚 What we learned

About Observability

  • Symptoms vs. Root Causes: Most alerts are downstream symptoms. Detecting contradictions requires understanding service topology and historical patterns
  • Memory Matters: Stateless monitoring systems repeat the same diagnostic work on every incident. Persistent memory dramatically improves investigation speed
  • Human Intuition is Codifiable: Senior SRE knowledge (L4 archetypes) can be encoded as rules and matched against structural patterns

About Splunk

  • MCP Server is Powerful: Model Context Protocol gives the agent structured tool calls (splunk_search, splunk_indexes, splunk_kvstore) instead of hand-assembled SPL requests
  • AI Toolkit Opens New Possibilities: Hosted models (Foundation-Sec, Deep Time Series) eliminate GPU requirements while providing specialized analysis
  • REST API + HEC = Complete Loop: Bidirectional integration creates a self-improving system where investigation results feed back into Splunk

About AI Agents

  • Retrieval Before Generation: Vector search + graph traversal often provides better answers than pure LLM generation
  • Threshold Tuning is Critical: The Kick threshold (0.42) required extensive testing to balance precision and recall
  • Evidence Chains Build Trust: Showing the agent's reasoning process (timeline, queries executed, metrics analyzed) is essential for adoption

About System Design

  • Graceful Degradation > Hard Dependencies: Every external service (Splunk, LLM, GPU libraries) has a fallback path
  • Single-File Dashboards Work: A standalone HTML file with embedded CSS/JS simplifies deployment dramatically
  • Mock Implementations Enable Testing: Pure Python fallbacks for spaCy, NetworkX, and Qdrant allow CI/CD without heavy dependencies

🔮 What's next for TXENT: Agentic Observability Memory System

Short-Term Roadmap

Enhanced Pattern Learning

  • [ ] Automatic L3 pattern discovery from historical incidents (unsupervised clustering)
  • [ ] User-defined custom patterns via UI
  • [ ] Pattern confidence scoring based on remediation success rates

Advanced Agent Capabilities

  • [ ] Multi-agent collaboration (specialized agents for network, database, cache, etc.)
  • [ ] Automated remediation execution with rollback capability
  • [ ] Predictive incident detection (trigger investigations before alerts fire)

Expanded Splunk Integration

  • [ ] Splunk AI Assistant integration for natural language SPL generation
  • [ ] Real-time webhook alerts (Splunk → TXENT automatic ingestion)
  • [ ] ITSI (IT Service Intelligence) integration for business service mapping

Production Features

  • [ ] Multi-tenancy with workspace isolation
  • [ ] RBAC (Role-Based Access Control)
  • [ ] Audit logging for all agent actions
  • [ ] Kubernetes deployment manifests

Long-Term Vision

Self-Improving Memory - Build a reinforcement learning layer that tracks remediation outcomes and automatically adjusts Kick thresholds, pattern weights, and archetype rules based on resolution success rates.

Incident Knowledge Graph - Extend L2 to store not just service relationships but historical incident connections: which incidents co-occur, which remediations work for which patterns, which teams resolved similar issues.

Natural Language Operations - Enable full natural language control: "Show me all cache saturation incidents from last month" or "Execute the recommended remediation for incident INC-2026-001".

Open Source Observability Plugins - Build connectors for Prometheus, Grafana, Datadog, New Relic, and other platforms to make TXENT's memory architecture available beyond Splunk.

Commercial Offering - Package TXENT as a Splunk Enterprise app on Splunkbase with:

  • One-click installation
  • Pre-configured saved searches and dashboards
  • Hosted deployment option (SaaS)
  • Enterprise support and custom pattern development services

🎓 Key Learnings for the Community

  1. Memory Transforms Monitoring: Stateless alerting systems treat every incident as novel. Adding persistent memory with semantic search and graph relationships enables the kind of pattern recognition that experienced engineers rely on.

  2. Contradictions are Signals: When a surface alert contradicts historical structural patterns, that divergence is valuable information — it often indicates the alert is a symptom, not the cause.

  3. Retrieval + LLM > Pure LLM: Vector search and knowledge graphs provide grounded, explainable context that pure LLM generation cannot match. Hybrid architectures are the future.

  4. Graceful Degradation Enables Adoption: Making every dependency optional (Splunk, GPU, LLM APIs) dramatically lowers barriers to trying the system.

  5. Observability is Cognitive Work: The L1→L2→L3→L4 progression mirrors how human experts reason: surface facts → relationships → patterns → wisdom. Building AI systems that match this structure feels natural to operators.


🙏 Acknowledgments

Built with ❤️ for the Splunk Agentic Ops Hackathon 2026

Special thanks to:

  • Splunk Engineering for the MCP Server, AI Toolkit, and comprehensive REST APIs
  • Open Source Community for sentence-transformers, spaCy, FastAPI, and NetworkX
  • The SRE Community for sharing operational wisdom that inspired the L4 Archetypes

📦 Try TXENT Today

# Clone the repository
git clone https://github.com/your-org/txent.git
cd txent

# Install dependencies
pip install -r requirements.txt

# Configure (optional - simulation mode works without config)
cp .env.example .env
# Edit .env with your Splunk credentials and LLM API keys

# Start the server
python -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

# Open the Command Center
# http://localhost:8000

No Splunk? No GPU? No problem! The high-fidelity simulator lets you explore TXENT's full capabilities including the Kick mechanism, autonomous investigation, and real-time dashboard with zero infrastructure.



TXENT: Because observability should have a memory. Built for the Splunk Agentic Ops Hackathon 2026 🏆

Built With

  • fastapi
  • google-gemini
  • html/css/javascript
  • knowledge-graphs
  • openai-gpt
  • python
  • rest-apis
  • server-sent-events
  • splunk-enterprise
  • splunk-hec
  • splunk-mcp
  • uvicorn
  • vector