zPilot AI — AI-Native Operational Intelligence for Enterprise z/OS Systems

Inspiration

Modern enterprises still rely heavily on mainframe systems for critical workloads like banking transactions, payroll processing, insurance claims, airline reservations, and government infrastructure. Although these platforms power some of the world’s most important systems, the operational tooling around them often remains fragmented, reactive, and highly dependent on senior specialists.

We wanted to explore what a modern AI-native operational experience for z/OS could look like.

The idea behind zPilot AI was to combine:

  • AI-driven operational reasoning
  • enterprise observability
  • incident intelligence
  • batch workflow visibility
  • Zowe CLI integration
  • operational replay systems

into a unified operational intelligence platform designed specifically for enterprise mainframe environments.

Instead of treating AI as just a chatbot, we wanted AI to behave like an active operational participant capable of correlating incidents, analyzing failures, and guiding recovery workflows.


What zPilot AI Does

zPilot AI is an enterprise operational intelligence platform built for z/OS environments.

The platform includes:

Operational Monitoring

  • Real-time operational health dashboards
  • Alert intelligence and incident clustering
  • Infrastructure anomaly detection
  • Batch execution monitoring
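
As an illustration of incident clustering, here is a minimal sketch that groups alerts on the same resource when they arrive close together in time. The `Alert` shape, the field names, and the `clusterAlerts` function are assumptions made for this example, not the platform's actual schema.

```typescript
// Hypothetical alert shape; field names are illustrative only.
interface Alert {
  id: string;
  resource: string;  // e.g. a job name or subsystem
  timestamp: number; // epoch milliseconds
}

interface IncidentCluster {
  resource: string;
  alerts: Alert[];
}

// Group alerts on the same resource when each one arrives within
// `windowMs` of the previous alert for that resource.
function clusterAlerts(alerts: Alert[], windowMs: number): IncidentCluster[] {
  const sorted = [...alerts].sort((a, b) => a.timestamp - b.timestamp);
  const open = new Map<string, IncidentCluster>(); // latest cluster per resource
  const clusters: IncidentCluster[] = [];

  for (const alert of sorted) {
    const current = open.get(alert.resource);
    const last = current ? current.alerts[current.alerts.length - 1] : undefined;
    if (current && last && alert.timestamp - last.timestamp <= windowMs) {
      current.alerts.push(alert);
    } else {
      const cluster: IncidentCluster = { resource: alert.resource, alerts: [alert] };
      open.set(alert.resource, cluster);
      clusters.push(cluster);
    }
  }
  return clusters;
}
```

A sliding window per resource keeps a burst of related alerts in one incident while still opening a new incident when the same resource fails again much later.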

AI-Powered ABEND Analysis

The ABEND Explainer analyzes:

  • JESMSGLG logs
  • SYSOUT data
  • JCL failures
  • dataset issues
  • DB2 and VSAM operational problems

It identifies:

  • probable root causes
  • failing job steps
  • operational dependencies
  • business impact
  • recovery recommendations
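
To make the first step of that analysis concrete, the sketch below scans JESMSGLG text for `IEF450I` abend messages and attaches a short hint for a few well-known system completion codes. It is a simplified illustration: the `findAbends` function, the `AbendFinding` shape, and the small hint table are assumptions for this example, and the pattern only handles the basic `jobname stepname - ABEND=Sxxx` message layout.

```typescript
// A few well-known system completion codes; a real explainer would cover far more.
const ABEND_HINTS: Record<string, string> = {
  S0C4: "Protection exception (invalid storage reference)",
  S0C7: "Data exception (invalid numeric data)",
  S322: "CPU time limit exceeded",
  S806: "Load module not found",
  SB37: "Data set out of space",
};

// Hypothetical result shape for this sketch.
interface AbendFinding {
  jobName: string;
  stepName: string;
  code: string;
  hint: string;
}

// Matches lines shaped like: IEF450I JOBNAME STEPNAME - ABEND=S0C7 U0000 REASON=...
// Assumption: no procedure step name appears between the job and step names.
const IEF450I_PATTERN = /IEF450I\s+(\S+)\s+(\S+)\s+-\s+ABEND=(S[0-9A-F]{3})/;

function findAbends(jesmsglg: string): AbendFinding[] {
  const findings: AbendFinding[] = [];
  for (const line of jesmsglg.split(/\r?\n/)) {
    const match = IEF450I_PATTERN.exec(line);
    if (!match) continue;
    const jobName = match[1] ?? "";
    const stepName = match[2] ?? "";
    const code = match[3] ?? "";
    const hint = ABEND_HINTS[code] ?? "Unknown code; consult IBM documentation";
    findings.push({ jobName, stepName, code, hint });
  }
  return findings;
}
```

The structured findings (job, failing step, code) are the kind of input an AI reasoning layer could then enrich with dependencies, business impact, and recovery suggestions.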

Job Tracker

An interactive enterprise batch visualization system showing:

  • job dependencies
  • critical paths
  • execution delays
  • downstream impact
  • SLA risks
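
One way to derive the critical path from job dependencies is a longest-path search over the dependency graph. The sketch below assumes a hypothetical `BatchJob` model with a duration in minutes and a list of prerequisite job names; it is an illustration of the technique rather than the Job Tracker's actual code.

```typescript
// Hypothetical job model: duration in minutes and the jobs that must finish first.
interface BatchJob {
  name: string;
  durationMin: number;
  dependsOn: string[];
}

interface CriticalPath {
  path: string[];
  totalMin: number;
}

// Longest-duration chain through an acyclic dependency graph,
// computed with a memoized depth-first search.
function criticalPath(jobs: BatchJob[]): CriticalPath {
  const byName = new Map(jobs.map((j) => [j.name, j] as const));
  const memo = new Map<string, CriticalPath>();

  const finish = (name: string, visiting: Set<string>): CriticalPath => {
    const cached = memo.get(name);
    if (cached) return cached;
    const job = byName.get(name);
    if (!job) throw new Error(`Unknown job: ${name}`);
    if (visiting.has(name)) throw new Error(`Dependency cycle at ${name}`);
    visiting.add(name);

    // Pick the prerequisite chain that finishes latest.
    let best: CriticalPath = { path: [], totalMin: 0 };
    for (const dep of job.dependsOn) {
      const candidate = finish(dep, visiting);
      if (candidate.totalMin > best.totalMin) best = candidate;
    }
    visiting.delete(name);

    const result: CriticalPath = {
      path: [...best.path, name],
      totalMin: best.totalMin + job.durationMin,
    };
    memo.set(name, result);
    return result;
  };

  let overall: CriticalPath = { path: [], totalMin: 0 };
  for (const job of jobs) {
    const candidate = finish(job.name, new Set<string>());
    if (candidate.totalMin > overall.totalMin) overall = candidate;
  }
  return overall;
}
```

Comparing `totalMin` against a batch window deadline is one simple way such a view could flag SLA risk: any delay to a job on the returned path pushes the whole chain later.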

Knowledge Graph

A live operational dependency map connecting:

  • jobs
  • DB2 systems
  • IMS databases
  • datasets
  • APIs
  • infrastructure services

This allows operators to visualize blast radius and cascading operational failures.
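
Blast radius can be computed as a breadth-first traversal from the failed component to everything that depends on it, directly or transitively. The sketch below assumes the graph is stored as a map from each component to its direct dependents; the `DependencyGraph` type and `blastRadius` function are names chosen for this example.

```typescript
// Maps a component to the components that directly depend on it,
// so a failure in the key can reach every entry in its value list.
type DependencyGraph = Map<string, string[]>;

// Breadth-first traversal: returns every component reachable from `failed`,
// ordered by distance from the failure.
function blastRadius(graph: DependencyGraph, failed: string): string[] {
  const seen = new Set<string>([failed]);
  const queue: string[] = [failed];
  const impacted: string[] = [];

  while (queue.length > 0) {
    const node = queue.shift() as string;
    for (const dependent of graph.get(node) ?? []) {
      if (seen.has(dependent)) continue;
      seen.add(dependent);
      impacted.push(dependent);
      queue.push(dependent);
    }
  }
  return impacted;
}
```

Because the result is ordered by distance from the failure, a visualization can highlight directly affected components first and cascade outward.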

AI Operational Agents

The platform includes multiple AI operational agents:

  • Signal Extractor
  • Root Cause Analyst
  • Dependency Mapper
  • Recovery Planner
  • Batch Optimizer
  • JCL Intelligence Agent

Each agent continuously analyzes operational events (simulated in the current prototype) and contributes contextual recommendations.
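
A simple way to picture this is a shared agent interface where every agent inspects each event and optionally returns a recommendation. The sketch below uses rule-based stand-ins for two of the agents; the `Agent`, `OperationalEvent`, and `Recommendation` types and the matching rules are assumptions for illustration, and a real agent would call an AI model rather than a regular expression.

```typescript
// Hypothetical shapes for this sketch.
interface OperationalEvent {
  source: string; // e.g. job name
  text: string;   // raw message text
}

interface Recommendation {
  agent: string;
  message: string;
}

interface Agent {
  name: string;
  analyze(event: OperationalEvent): Recommendation | null;
}

// Rule-based stand-in: flags any event that carries an abend code.
const signalExtractor: Agent = {
  name: "Signal Extractor",
  analyze: (e) =>
    /ABEND=/.test(e.text)
      ? { agent: "Signal Extractor", message: `ABEND signal from ${e.source}` }
      : null,
};

// Rule-based stand-in: SB37 indicates a data set ran out of space.
const recoveryPlanner: Agent = {
  name: "Recovery Planner",
  analyze: (e) =>
    /ABEND=SB37/.test(e.text)
      ? { agent: "Recovery Planner", message: `Review space allocation for ${e.source} before restarting` }
      : null,
};

// Fan one event out to every agent and collect whatever they contribute.
function dispatch(event: OperationalEvent, agents: Agent[]): Recommendation[] {
  const recommendations: Recommendation[] = [];
  for (const agent of agents) {
    const recommendation = agent.analyze(event);
    if (recommendation) recommendations.push(recommendation);
  }
  return recommendations;
}
```

Keeping every agent behind the same interface means new agents can be added without changing how events are routed.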

Zowe CLI Integration Layer

We also implemented a simulated enterprise integration layer inspired by real Zowe CLI workflows.

The system supports operational flows such as:

  • job inspection
  • spool viewing
  • dataset operations
  • DB2 query execution
  • console commands
  • TSO command execution

This helps bridge traditional mainframe operations with modern AI-native experiences.
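
The sketch below shows how such a simulated layer might map each operational flow to the Zowe CLI command it mirrors without executing anything. The `ZoweOperation` type, `toZoweCommand`, and `runSimulated` are hypothetical names for this example. The command strings follow the Zowe CLI `zos-jobs`, `zos-files`, `zos-console`, and `zos-tso` command groups; the `db2` group assumes the separate Db2 plug-in is installed.

```typescript
// Operations mirrored by the simulated layer, as a discriminated union.
type ZoweOperation =
  | { kind: "jobStatus"; jobId: string }
  | { kind: "spoolFiles"; jobId: string }
  | { kind: "listDataSets"; pattern: string }
  | { kind: "db2Query"; sql: string }
  | { kind: "console"; command: string }
  | { kind: "tso"; command: string };

// Wrap a value in double quotes, escaping any embedded double quotes.
const quote = (value: string): string => `"${value.replace(/"/g, '\\"')}"`;

// Build the Zowe CLI command string that corresponds to an operation.
function toZoweCommand(op: ZoweOperation): string {
  switch (op.kind) {
    case "jobStatus":
      return `zowe zos-jobs view job-status-by-jobid ${op.jobId}`;
    case "spoolFiles":
      return `zowe zos-jobs list spool-files-by-jobid ${op.jobId}`;
    case "listDataSets":
      return `zowe zos-files list data-set ${quote(op.pattern)}`;
    case "db2Query":
      return `zowe db2 execute sql --query ${quote(op.sql)}`; // requires the Db2 plug-in
    case "console":
      return `zowe zos-console issue command ${quote(op.command)}`;
    case "tso":
      return `zowe zos-tso issue command ${quote(op.command)}`;
  }
}

interface SimulatedResult {
  command: string;
  simulated: true;
  output: string;
}

// Simulation only: nothing is executed and no z/OS system is contacted.
function runSimulated(op: ZoweOperation): SimulatedResult {
  return {
    command: toZoweCommand(op),
    simulated: true,
    output: `[mock] no live z/OS call made for ${op.kind}`,
  };
}
```

Showing the operator the exact command that would run keeps the simulated experience close to real workflows and leaves a clear seam for swapping in live execution later.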


Enterprise Incident Simulation Engine

One of the most exciting parts of the project is the live Incident Simulation Engine.

Operators can trigger scenarios like:

  • payroll batch failures
  • DB2 subsystem degradation
  • storage exhaustion
  • dataset corruption
  • cascading batch delays

The platform then simulates:

  • alert storms
  • dependency propagation
  • operational degradation
  • AI-driven recovery workflows
  • business impact escalation

This creates a cinematic operational “war room” experience for enterprise incident management.
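
Dependency propagation over time can be sketched as a tick-based simulation in which degradation spreads one hop per tick from the failing resource to its direct dependents, raising an alert at each step. The `simulateIncident` function, the `SimulatedAlert` shape, and the severity labels are assumptions for this example rather than the engine's actual design.

```typescript
// Hypothetical alert emitted by the simulation.
interface SimulatedAlert {
  tick: number;
  resource: string;
  severity: "critical" | "warning";
}

// `dependents` maps a resource to the resources that directly depend on it.
// Each tick, degradation spreads one hop from already-affected resources.
function simulateIncident(
  dependents: Map<string, string[]>,
  origin: string,
  maxTicks: number,
): SimulatedAlert[] {
  const timeline: SimulatedAlert[] = [{ tick: 0, resource: origin, severity: "critical" }];
  const affected = new Set<string>([origin]);
  let frontier: string[] = [origin];

  for (let tick = 1; tick <= maxTicks && frontier.length > 0; tick++) {
    const next: string[] = [];
    for (const resource of frontier) {
      for (const dependent of dependents.get(resource) ?? []) {
        if (affected.has(dependent)) continue;
        affected.add(dependent);
        next.push(dependent);
        timeline.push({ tick, resource: dependent, severity: "warning" });
      }
    }
    frontier = next;
  }
  return timeline;
}
```

Replaying the returned timeline tick by tick is one way a dashboard could animate an alert storm as it unfolds, and capping `maxTicks` lets a scenario stop at a chosen stage of escalation.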


How We Built It

We used MeDo as an agentic development environment to rapidly prototype and orchestrate the application architecture.

The project was built using:

  • React
  • Tailwind CSS
  • component-driven frontend architecture
  • AI-assisted workflow generation
  • mocked enterprise operational datasets
  • simulated Zowe CLI integrations
  • interactive operational visualizations

A large focus was placed on:

  • enterprise-grade UX
  • operational realism
  • AI orchestration
  • animated dashboards
  • believable operational workflows

Rather than building isolated AI features, we focused on creating a cohesive AI-native operational system.


Challenges We Faced

One of the biggest challenges was balancing:

  • realism
  • technical depth
  • visual polish
  • development speed

Mainframe environments are highly specialized, so we needed to create operational flows that felt authentic while remaining understandable to a broader audience.

Another challenge was designing AI interactions that felt operationally meaningful rather than simply conversational. We wanted the AI agents to behave like active participants in incident response workflows instead of generic assistants.

Designing realistic operational simulations and dependency visualizations within hackathon constraints was also a major challenge.


What We Learned

Through this project, we learned:

  • how powerful agentic AI development workflows can be
  • how modern UX can transform enterprise tooling
  • how operational intelligence systems benefit from multi-agent reasoning
  • how AI can bridge the gap between legacy enterprise infrastructure and modern developer experiences

We also explored how enterprise observability, incident response, and AI reasoning can be unified into a single operational platform.


Future Scope

Future plans for zPilot AI include:

  • live Zowe CLI execution
  • real z/OS connectivity
  • streaming operational telemetry
  • predictive incident analytics
  • automated remediation workflows
  • AI-generated operational runbooks
  • real-time mainframe observability pipelines
  • enterprise collaboration workflows

Our long-term vision is to build a truly AI-native operational intelligence layer for enterprise mainframe systems.
