opscart-k8s-watcher

Version: 0.5.2
Purpose: Production-grade Kubernetes security auditing with multi-cluster support, HTML reporting, network policy analysis, and waste detection
Focus: CIS benchmark scoring (Pod Security subset), HTML reports, network isolation, waste detection, and multi-cluster analysis


Important Disclaimer

This is a security awareness and troubleshooting tool - NOT for:

  • Compliance auditing (use kube-bench for CIS compliance)
  • Financial decision-making (consult cloud architects for cost analysis)
  • Production security decisions (consult security professionals)

What it IS for:

  • Quick security posture checks
  • Multi-cluster health monitoring
  • Resource optimization opportunities
  • War room troubleshooting
  • Executive-ready HTML reports

What's New in v0.5.2

HTML Reports for Waste Detection

The waste command now supports HTML output alongside CLI format.

# Generate HTML report (same professional format as security reports)
./opscart-scan waste --cluster prod --format html

# CLI output (default - unchanged)
./opscart-scan waste --cluster prod

HTML report includes:

  • Visual scorecard showing all 9 waste categories at a glance
  • Color-coded severity (red=critical, orange=warning, blue=success)
  • Detailed findings with kubectl investigation commands
  • Separate "Housekeeping" section for Old ReplicaSets (not counted in total)
  • Kubernetes blue theme for professional/corporate environments

Reports saved to: reports/YYYY-MM-DD/opscart-waste-HHMM.html
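
Because the date folder and the HHMM suffix both sort lexicographically, the newest waste report can be found with a plain glob. The helper below is a minimal sketch and not part of the tool: the name latest_waste_report is hypothetical, and it assumes the reports/YYYY-MM-DD/opscart-waste-HHMM.html layout shown above.

```shell
# Hypothetical helper: print the path of the newest waste report.
# The ( ... ) body runs in a subshell, so its variables stay local.
latest_waste_report() (
  reports_dir=${1:-reports}
  latest=""
  # Globs expand in sorted order, so the last match is the newest report.
  for f in "$reports_dir"/????-??-??/opscart-waste-????.html; do
    [ -f "$f" ] && latest=$f
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
)
```

Called with no argument it searches ./reports; it returns a non-zero status when no waste report exists yet.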


What's New in v0.5

Waste & Drift Detection (waste command)

Detects forgotten, idle, and orphaned resources. Suggestions only - never modifies the cluster.

  • Abandoned namespaces - Old namespaces with no running pods (dev-john, test-2024, poc-ai)
  • Zombie pods - CrashLoopBackOff, ImagePullBackOff, OOMKilled for days
  • Unmanaged pods - Bare pods with no controller (forgotten kubectl run sessions)
  • Orphaned PVCs - Unbound, released, or bound-but-no-pod (silent storage cost leaks)
  • Stale Jobs/CronJobs - Completed jobs not cleaned up, CronJobs that never ran, no history limits set
  • Zero-replica workloads - Deployments and StatefulSets scaled to 0
  • Old ReplicaSets - Leftover rollout artifacts accumulating over time
  • Services with no endpoints - LoadBalancers flagged with cloud cost warning
  • Broken Ingresses - Backends pointing to services with no endpoints
  • Misconfigured HPAs - Scaling disabled or always stuck at minReplicas

Every finding includes: observed data, reason it's suspicious, and a kubectl command to investigate.

./opscart-scan waste --cluster prod                        # default: 7+ days old
./opscart-scan waste --cluster prod --min-age-days 30      # stricter threshold
./opscart-scan waste --cluster prod --namespace staging    # single namespace
./opscart-scan waste --all-clusters --min-age-days 14      # all clusters
./opscart-scan waste --cluster CLUSTER 2>/dev/null         # Corporate clusters: suppress harmless klog warnings
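
The --min-age-days threshold can be read as a simple age comparison. The sketch below illustrates that reading only; is_older_than_days is a hypothetical helper, and the tool's actual boundary handling (for example, whether exactly 7 days qualifies) is an assumption here.

```shell
# Hypothetical helper: succeed when a resource is at least MIN_DAYS old.
# $1 = creation time, $2 = current time (both Unix epoch seconds), $3 = MIN_DAYS.
is_older_than_days() {
  # Integer division drops partial days: 6 days 23 hours counts as 6 days.
  [ $(( ($2 - $1) / 86400 )) -ge "$3" ]
}
```

With the default threshold of 7, a namespace created 604800 seconds ago would be reported under this reading, while one created a second later would not.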

Troubleshooting

Corporate Cluster Warnings

When scanning corporate AKS/EKS clusters, you may see Kubernetes client library warnings:

W0217 11:00:42.760152 warnings.go:70] Use tokens from the TokenRequest API...

Workaround: Redirect stderr to suppress these warnings (they're harmless):

./opscart-scan waste --cluster CLUSTER 2>/dev/null
./opscart-scan network --cluster CLUSTER 2>/dev/null
./opscart-scan security --cluster CLUSTER 2>/dev/null

These warnings come from the Kubernetes client library (klog) and don't affect functionality.
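
Redirecting all of stderr also hides genuine errors such as authentication failures. As an alternative, the sketch below drops only lines that look like the klog warning shown above and passes everything else through. The function name and the regular expression are assumptions based on that single sample line, not something the tool provides.

```shell
# Hypothetical filter: remove klog warning lines (e.g. "W0217 11:00:42.760152 warnings.go:70] ...")
# from stdin and pass all other lines through unchanged.
filter_klog_warnings() {
  # grep -v exits with status 1 when every line was filtered out; treat that as success.
  grep -Ev '^W[0-9]{4} [0-9:.]+ .*warnings\.go:[0-9]+\]' || true
}

# Usage (bash only, relies on process substitution):
#   ./opscart-scan waste --cluster CLUSTER 2> >(filter_klog_warnings >&2)
```

Real errors still reach the terminal with this approach, at the cost of requiring bash rather than a plain POSIX shell.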


Example waste scorecard:

WASTE SCORECARD
  🔴 Abandoned Namespaces:           1
  🔴 Zombie Pods (CrashLoop/OOM):    2
  🔴 Unmanaged Pods (no controller): 1
  ✅ Orphaned PVCs:                  0
  🟢 Old ReplicaSets:                2
  🟢 Misconfigured HPAs:             1
  Total waste items found:  7

What's New in v0.4

Network Policy Detection

  • Namespace coverage analysis - Which namespaces have NetworkPolicies and which don't
  • Smart infrastructure filtering - Auto-skips system namespaces using 3 strategies (no manual list needed):
    • Pattern-based - Covers kube-*, istio-*, calico-*, tigera-*, cert-manager, ingress-nginx, flux-system, argocd, velero, longhorn-*, cattle-*, openshift-*, gke-*, azure-*, karpenter, crossplane-*
    • Label-based - Detects pod-security.kubernetes.io/enforce=privileged system namespaces
    • User-defined - --skip-namespaces ns1,ns2 for anything not covered by patterns
  • Risk-based sorting - HIGH risk (production/staging) shown first, sorted by pod count
  • Coverage percentage bar - Visual indicator of cluster-wide policy coverage
  • Default-deny template - Ready-to-apply kubectl policy in recommendations
  • Multi-cluster support - Works with --all-clusters and --cluster-group

# Scan single cluster
./opscart-scan network --cluster prod

# All clusters
./opscart-scan network --all-clusters

# Cluster group
./opscart-scan network --cluster-group production

# Skip additional namespaces not covered by auto-detection
./opscart-scan network --cluster prod --skip-namespaces monitoring,vault

# Specific namespace only
./opscart-scan network --cluster prod --namespace production

Example output:

NETWORK POLICY SUMMARY
Total Namespaces:         8
Protected (policies):     0
Unprotected (no policy):  8
High Risk Namespaces:     3

Coverage: [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0% 🔴 Poor

🔴 UNPROTECTED NAMESPACES (sorted by risk):
  🔴 [PROD] production (10 pods) - HIGH RISK
  🔴 [SYS]  monitoring (5 pods)  - HIGH RISK
  🔴 [STAGE] staging (3 pods)    - HIGH RISK
  🟢 [DEV]  development (2 pods) - LOW RISK
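
The coverage figure in the summary is presumably the number of protected namespaces divided by the total. The sketch below shows that arithmetic and a 40-character bar in the same style. The function names, the integer rounding, and the filled-block character are assumptions, not taken from the tool's source.

```shell
# Hypothetical helper: coverage percentage, rounded down; 0 when there are no namespaces.
# $1 = protected namespaces, $2 = total namespaces.
coverage_pct() {
  if [ "$2" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 * 100 / $2 ))
  fi
}

# Hypothetical helper: render a bar for a percentage. $1 = percent, $2 = width (default 40).
# The ( ... ) body runs in a subshell, so its variables stay local.
coverage_bar() (
  pct=$1
  width=${2:-40}
  filled=$(( pct * width / 100 ))
  bar=""
  i=0
  while [ "$i" -lt "$width" ]; do
    if [ "$i" -lt "$filled" ]; then bar="${bar}█"; else bar="${bar}░"; fi
    i=$(( i + 1 ))
  done
  printf '[%s] %d%%\n' "$bar" "$pct"
)
```

For the example above, coverage_pct 0 8 yields 0, which renders as an entirely empty bar.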

What's New in v0.3

HTML Report Generation

  • Security HTML Reports - Professional security audit reports with CIS compliance scoring
  • Comprehensive HTML Reports - Full cluster health reports with real security data
  • Date-organized storage - Reports auto-organized as reports/YYYY-MM-DD/
  • Real data extraction - All reports use actual cluster data (validated against kubectl)

Enhanced Security Reporting

  • Deduplicated pod names - Shows "pod-name (4 issues)" for multiple issues per pod
  • Top 5 affected resources per finding type
  • Recommended actions in priority order
  • Validation steps for remediation
  • Issue count breakdown table
  • Validated accuracy - All counts match kubectl queries exactly

Helper Scripts

  • scripts/view-latest.sh - Open most recent report in browser
  • scripts/cleanup-reports.sh - Remove old reports (configurable retention)
  • scripts/daily-reports.sh - Generate reports for all clusters

New Commands

# Security HTML report
./opscart-scan security --cluster prod --format=html

# Security HTML for all clusters
./opscart-scan security --all-clusters --format=html

# Comprehensive cluster report
./opscart-scan report --cluster prod --monthly-cost 5000

# Comprehensive report for cluster group
./opscart-scan report --cluster-group production --monthly-cost 50000

Features

🌐 Multi-Cluster Support (v0.2)

  • Config management - Centralized cluster configuration
  • Multi-cluster scanning - Scan all clusters with --all-clusters
  • Cluster groups - Scan by environment with --cluster-group production
  • Side-by-side comparison - Compare security posture with --compare=a,b
  • Sequential execution - Clear, readable output for multiple clusters

🗑️ Waste & Drift Detection (v0.5)

  • 9 resource types - namespaces, pods, PVCs, jobs, deployments, ReplicaSets, services, ingresses, HPAs
  • Data-driven findings - every result shows observed data, not assumptions
  • Smart filtering - auto-skips infrastructure namespaces (same patterns as network command)
  • Configurable threshold - --min-age-days (default: 7)
  • HTML reports - --format html for visual dashboards (v0.5.2)
  • Suggestions only - never modifies the cluster

🌐 Network Policy Detection (v0.4)

  • Namespace coverage analysis - Protected vs unprotected namespaces
  • Smart infrastructure filtering - Auto-skips 15+ known infrastructure patterns
  • Risk-based prioritization - HIGH/LOW risk with clear reasoning per namespace
  • Actionable output - Ready-to-apply kubectl default-deny policy template
  • User-defined skip list - --skip-namespaces for custom infrastructure namespaces
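
To get a feel for which namespaces the pattern-based strategy would skip, the patterns listed in the v0.4 notes can be written as shell globs. This is a sketch only: is_infra_namespace is a hypothetical helper, and treating the patterns as shell-style globs is an assumption about how the tool matches them.

```shell
# Hypothetical helper: succeed when a namespace name matches one of the
# infrastructure patterns listed in the v0.4 notes.
is_infra_namespace() {
  case "$1" in
    # Prefix patterns
    kube-*|istio-*|calico-*|tigera-*|longhorn-*|cattle-*|openshift-*|gke-*|azure-*|crossplane-*)
      return 0 ;;
    # Exact names
    cert-manager|ingress-nginx|flux-system|argocd|velero|karpenter)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A namespace such as monitoring or vault falls through to the last branch, which is the case --skip-namespaces is meant to cover.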

📊 HTML Reports (v0.3)

  • Security Reports - CIS compliance, findings, remediation steps
  • Comprehensive Reports - Security + resources + cost analysis
  • Date-organized storage - Easy archival and retention management
  • Professional templates - Executive-ready presentations

Security Auditing

  • CIS Kubernetes Benchmark scoring (Pod Security subset)
  • 8 security check types - Validated against kubectl
  • Environment-aware analysis (PRODUCTION vs DEVELOPMENT)
  • Actionable remediation steps

Checks performed:

  • Privileged containers (CIS 5.2.1)
  • Host namespace sharing (CIS 5.2.2-5.2.4)
  • Root containers (CIS 5.2.6)
  • Privilege escalation
  • Resource limits
  • Security contexts
  • Service account usage
  • Added capabilities

Emergency Scanner

  • Crash looping pods
  • Pending pods
  • Image pull failures
  • High restart counts

Cost Optimization

  • Idle resource detection
  • Spot instance recommendations
  • Resource right-sizing opportunities
  • Potential savings estimation

Resource Search

  • Find resources by type (pod, deployment, service)
  • Filter by name pattern or status
  • Multi-cluster search support

Installation

# Clone repository
git clone https://github.com/opscart/opscart-k8s-watcher.git
cd opscart-k8s-watcher

# Checkout v0.5.2
git checkout v0.5.2

# Build
go build -o opscart-scan cmd/opscart-scan/main.go

# Initialize config for multi-cluster
./opscart-scan config init

# Run
./opscart-scan --help

Quick Start

1. Configure Clusters (v0.2)

# Initialize cluster config
./opscart-scan config init

# Shows your kubeconfig clusters and lets you organize them into groups
# Creates: ~/.opscart/clusters.yaml

# View configuration
./opscart-scan config show

2. Security Audit

CLI Output:

# Single cluster
./opscart-scan security --cluster prod

# All clusters
./opscart-scan security --all-clusters

# By cluster group
./opscart-scan security --cluster-group production

HTML Report (v0.3):

# Single cluster HTML report
./opscart-scan security --cluster prod --format=html
# Output: reports/2026-02-05/prod-security-1430.html

# All clusters HTML reports
./opscart-scan security --all-clusters --format=html
# Output: reports/2026-02-05/prod-security-1430.html
#         reports/2026-02-05/staging-security-1431.html
#         reports/2026-02-05/dev-security-1432.html

HTML Report Includes:

  • CIS compliance score with progress bar (e.g., 41/100)
  • Pods scanned and issues found (e.g., 47 pods, 181 issues)
  • Deduplicated pod names (e.g., "kube-apiserver (4 issues)")
  • Critical findings and warnings
  • Recommended actions in priority order
  • Validation steps
  • Issue count breakdown table

3. Comprehensive Cluster Report (v0.3)

# Full HTML report (security + resources + cost)
./opscart-scan report --cluster prod --monthly-cost 5000
# Output: reports/2026-02-05/prod-report-1431.html

# All clusters
./opscart-scan report --all-clusters --monthly-cost 50000

Comprehensive Report Includes:

  • Real CIS security score (e.g., 41/100 from actual cluster scan)
  • Security findings with pod counts (3 privileged, 31 hostPath, etc.)
  • Cost analysis and potential savings ($1,200-$1,800/month)
  • Overall health score
  • Professional HTML template

Note: Per-namespace breakdown and resource metrics matching the CLI detail level were originally planned for v0.4; the per-namespace breakdown is now on the v0.6 roadmap.

4. Compare Clusters (v0.2)

# Compare two clusters side-by-side
./opscart-scan security --compare=prod,staging

# Shows:
# - CIS score difference
# - Issue count deltas
# - Environment-specific findings

5. Network Policy Analysis (v0.4)

# Check network isolation across all namespaces
./opscart-scan network --cluster prod

# All clusters
./opscart-scan network --all-clusters

# Skip namespaces not caught by auto-detection
./opscart-scan network --cluster prod --skip-namespaces monitoring,vault

6. Waste & Drift Detection (v0.5)

# Detect forgotten/idle/orphaned resources (default: 7+ days old)
./opscart-scan waste --cluster prod

# Generate HTML report (v0.5.2)
./opscart-scan waste --cluster prod --format html

# Adjust age threshold
./opscart-scan waste --cluster prod --min-age-days 30

# Focus on specific namespace
./opscart-scan waste --cluster prod --namespace staging

# All clusters
./opscart-scan waste --all-clusters --min-age-days 14

Commands

Config Management (v0.2)

# Initialize cluster configuration
./opscart-scan config init

# Show current configuration
./opscart-scan config show

Security Audit

# CLI output (default)
./opscart-scan security --cluster CLUSTER

# HTML report (NEW in v0.3)
./opscart-scan security --cluster CLUSTER --format=html

# JSON output
./opscart-scan security --cluster CLUSTER --format=json

# All clusters
./opscart-scan security --all-clusters

# Cluster group
./opscart-scan security --cluster-group production

# Compare two clusters
./opscart-scan security --compare=prod,staging

Comprehensive Report (NEW in v0.3)

# HTML report (default)
./opscart-scan report --cluster CLUSTER --monthly-cost 5000

# JSON report
./opscart-scan report --cluster CLUSTER --format=json

# CSV report
./opscart-scan report --cluster CLUSTER --format=csv

# All clusters
./opscart-scan report --all-clusters --monthly-cost 50000

# Cluster group
./opscart-scan report --cluster-group production --monthly-cost 50000

Waste & Drift Detection (NEW in v0.5)

./opscart-scan waste --cluster CLUSTER
./opscart-scan waste --cluster CLUSTER --format html  # HTML report (v0.5.2)
./opscart-scan waste --cluster CLUSTER --min-age-days 30
./opscart-scan waste --cluster CLUSTER --namespace NAMESPACE
./opscart-scan waste --all-clusters
./opscart-scan waste --cluster-group production --min-age-days 14

Network Policy Analysis (NEW in v0.4)

# Scan single cluster
./opscart-scan network --cluster CLUSTER

# All clusters
./opscart-scan network --all-clusters

# Cluster group
./opscart-scan network --cluster-group production

# Specific namespace only
./opscart-scan network --cluster CLUSTER --namespace production

# Skip namespaces not auto-detected
./opscart-scan network --cluster CLUSTER --skip-namespaces monitoring,vault

Other Commands

# Resource analysis
./opscart-scan resources --cluster CLUSTER

# Cost analysis
./opscart-scan costs --cluster CLUSTER --monthly-cost 5000

# Emergency scan
./opscart-scan emergency --cluster CLUSTER

# Find specific resources
./opscart-scan find pod --cluster CLUSTER --name nginx

# Cluster snapshot
./opscart-scan snapshot --cluster CLUSTER

Helper Scripts (v0.3)

View Latest Report

./scripts/view-latest.sh
# Opens most recent HTML report in default browser

Cleanup Old Reports

./scripts/cleanup-reports.sh 30
# Removes reports older than 30 days

Daily Reports for All Clusters

./scripts/daily-reports.sh
# Generates security reports for all configured clusters
# Useful for scheduled cron jobs:
# 0 6 * * * /path/to/opscart-k8s-watcher/scripts/daily-reports.sh

Report Storage Structure (v0.3)

Reports are automatically organized by date:

reports/
├── 2026-02-05/
│   ├── prod-aks-security-1430.html
│   ├── prod-aks-report-1431.html
│   ├── staging-aks-security-1432.html
│   └── dev-aks-security-1433.html
├── 2026-02-04/
└── 2026-02-03/

Benefits:

  • Easy archival and retention management
  • Clear chronological organization
  • Simple to find reports by date
  • Cleanup scripts work on date folders

Note: reports/ directory is in .gitignore
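
Since every folder name is a date, retention checks reduce to comparing folder names against a cutoff. The sketch below lists the folders a cleanup could remove without deleting anything. The function name is hypothetical, and this is not necessarily how scripts/cleanup-reports.sh is implemented.

```shell
# Hypothetical helper: list report date folders strictly older than a cutoff date.
# $1 = reports directory, $2 = cutoff as YYYY-MM-DD.
# The ( ... ) body runs in a subshell, so its variables stay local.
expired_report_dirs() (
  reports_dir=$1
  # Strip the dashes so dates compare as plain integers (2026-02-05 -> 20260205).
  cutoff=$(printf '%s' "$2" | tr -d '-')
  for dir in "$reports_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
    [ -d "$dir" ] || continue
    stamp=$(printf '%s' "${dir##*/}" | tr -d '-')
    if [ "$stamp" -lt "$cutoff" ]; then
      printf '%s\n' "$dir"
    fi
  done
)
```

On systems with GNU date, a 30-day cutoff could be computed with date -d '30 days ago' +%F and passed as the second argument; BSD and macOS date use a different syntax.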


Validating Report Accuracy (v0.3)

All security counts can be validated against kubectl queries:

# Validate privileged containers count (any() counts each pod once, even with several privileged containers)
kubectl get pods --all-namespaces -o json | \
  jq '[.items[] | select(any(.spec.containers[]?; .securityContext.privileged == true))] | length'
# Should match tool output: 3

# Validate host path volumes (each pod counted once, even with several hostPath volumes)
kubectl get pods --all-namespaces -o json | \
  jq '[.items[] | select(any(.spec.volumes[]?; .hostPath != null))] | length'
# Should match tool output: 31

# Validate host network usage
kubectl get pods --all-namespaces -o json | \
  jq '[.items[] | select(.spec.hostNetwork == true)] | length'
# Should match tool output: 11

# Validate missing resource limits
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.containers[] | (.resources.limits == null or .resources.limits == {})) | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u | wc -l
# Should match tool output: 33

Result: All counts match exactly


Use Cases

Weekly Waste Review (v0.5)

./opscart-scan waste --all-clusters --min-age-days 30

# Finds real issues like:
# - Namespace 'data-processing': 9 pods, none Running, 30 days old
# - Pod 'kubernetes-dashboard': CrashLoopBackOff, 7792 restarts
# - HPA 'worker': FailedGetResourceMetric - autoscaling silently broken
# - Bare pod 'webtest-34210': no controller, sitting in default namespace

Network Policy Audit (v0.4)

# Weekly network isolation check across all clusters
./opscart-scan network --all-clusters

# Focus on production only
./opscart-scan network --cluster-group production

# Shows:
# - Which namespaces have NetworkPolicies
# - Risk level per namespace (HIGH/LOW)
# - Ready-to-apply default-deny policy template
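
For reference, a default-deny NetworkPolicy in its standard Kubernetes form looks like the manifest printed by the sketch below. The helper name and the policy name default-deny-all are hypothetical, and the template the tool prints in its recommendations may differ (for example, it might deny ingress only).

```shell
# Hypothetical helper: print a default-deny NetworkPolicy for the namespace given as $1.
# An empty podSelector selects every pod in the namespace; listing both policy
# types with no rules denies all ingress and egress traffic.
default_deny_manifest() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

The output can be checked without changing the cluster via default_deny_manifest production | kubectl apply --dry-run=client -f -. Denying egress also blocks DNS lookups, so allow rules are normally needed before applying such a policy to a live namespace.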

Multi-Cluster Security Review (v0.2 + v0.3)

# Generate HTML reports for all production clusters
./opscart-scan security --cluster-group production --format=html

# Email reports to security team
# Reports saved in reports/2026-02-05/

Cluster Health Comparison (v0.2)

# Compare prod vs staging security posture
./opscart-scan security --compare=prod,staging

# Shows:
# - CIS score: prod 73 vs staging 45
# - Critical issues: prod 2 vs staging 8
# - Recommendations for staging improvements

Executive Dashboard (v0.3)

# Monthly comprehensive reports for all clusters
./opscart-scan report --all-clusters --monthly-cost 100000

# Generates professional HTML reports showing:
# - Overall security posture across all clusters
# - Cost optimization opportunities
# - Potential savings aggregated

CI/CD Security Gate

# Gate deployment based on security score (fails closed if the score is missing or not a number)
SCORE=$(./opscart-scan security --cluster staging --format=json | jq -r '.cis_score')
if ! [ "$SCORE" -ge 60 ] 2>/dev/null; then
  echo "Security score too low or unavailable: $SCORE"
  exit 1
fi

Configuration File

After running config init, clusters are stored in ~/.opscart/clusters.yaml:

clusters:
  - name: prod-aks-01
    context: prod-aks-01-context
    groups:
      - production
      - critical
  - name: staging-aks
    context: staging-aks-context
    groups:
      - staging
  - name: dev-local
    context: minikube
    groups:
      - development

This enables powerful multi-cluster workflows with --all-clusters and --cluster-group.
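
When a wrapper script needs the cluster names behind a group (for example, to post-process each cluster's report separately), they can be pulled from the file with awk. This is a rough sketch that assumes the exact layout shown above; it is not a YAML parser, and clusters_in_group is a hypothetical name.

```shell
# Hypothetical helper: print the names of clusters that belong to a group.
# $1 = path to clusters.yaml, $2 = group name.
clusters_in_group() {
  awk -v group="$2" '
    # "- name: prod-aks-01" starts a new cluster entry; remember its name.
    $1 == "-" && $2 == "name:" { name = $3; next }
    # "- production" is a group list item; print the current cluster on a match.
    $1 == "-" && NF == 2 && $2 == group { print name }
  ' "$1"
}
```

With the configuration above, clusters_in_group ~/.opscart/clusters.yaml production would print prod-aks-01. For plain scans, --cluster-group remains the simpler route.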


Version History

v0.5.2 (Current - February 2026)

HTML Reports for Waste Detection:

  • --format html flag for waste command
  • Visual scorecard with all 9 waste categories
  • Color-coded severity (red/orange/blue Kubernetes theme)
  • Detailed findings with kubectl commands
  • Old ReplicaSets shown separately (not counted in total)
  • Same professional format as security reports

v0.5.1 (February 2026)

Bug Fixes:

  • Fixed context cancellation leak in waste detector
  • Fixed PVC detection failing when pod listing errors
  • Fixed HPA detection on older Kubernetes clusters (< 1.23)
  • Added v1 HPA API fallback

v0.5 (February 2026)

Waste & Drift Detection:

  • waste command - detects forgotten, idle, and orphaned resources across 9 types
  • Abandoned namespaces, zombie pods, unmanaged bare pods
  • Orphaned PVCs, stale jobs, zero-replica workloads, old ReplicaSets
  • Services with no endpoints, broken ingresses, misconfigured HPAs
  • Data-driven findings with kubectl investigation commands
  • Smart infrastructure namespace filtering (same patterns as network command)
  • Configurable age threshold (--min-age-days, default: 7)
  • Suggestions only - never modifies the cluster

v0.4 (February 2026)

Network Policy Detection:

  • Namespace coverage analysis (protected vs unprotected)
  • Smart infrastructure filtering - auto-skips 15+ patterns (kube-*, istio-*, calico-*, tigera-*, cert-manager, ingress-nginx, flux-system, argocd, velero, longhorn-*, cattle-*, openshift-*, gke-*, azure-*, karpenter, crossplane-*)
  • Label-based detection (pod-security.kubernetes.io/enforce=privileged)
  • User-defined skip list via --skip-namespaces
  • Risk-based sorting (HIGH/LOW) with clear reasoning
  • Coverage percentage bar
  • Ready-to-apply default-deny policy template in recommendations
  • Full multi-cluster support

v0.3 (February 2026)

HTML Report Generation:

  • Security HTML reports with CIS scoring
  • Comprehensive cluster reports with real data
  • Date-organized storage (reports/YYYY-MM-DD/)
  • Helper scripts (view-latest, cleanup, daily-reports)

Enhanced Security Reporting:

  • Deduplicated pod names with issue counts
  • Top 5 affected resources per finding
  • Recommended actions and validation steps
  • Validated accuracy against kubectl

Format Separation:

  • Separate securityFormat and reportFormat variables
  • Security defaults to CLI table output
  • Report defaults to HTML output

v0.2 (Multi-Cluster Support)

Major Features:

  • Centralized cluster configuration (config init)
  • Multi-cluster scanning (--all-clusters)
  • Cluster groups (--cluster-group production)
  • Side-by-side comparison (--compare=a,b)
  • Sequential execution with clear output

Real-World Findings:

  • Found production namespace idle for 70+ days
  • Found staging namespace idle for 21+ days
  • Identified spot instance optimization opportunities
  • Scan time: ~200ms per cluster

v0.1 (Initial Release)

Security Improvements:

  • Removed unvalidated financial risk calculations
  • Added CIS Kubernetes Benchmark scoring
  • Environment-aware recommendations
  • Specific resource identification
  • Issue count validation

Roadmap

v0.6 (Next)

  • Full diff view for cluster comparison (promised in v0.2)
  • Per-namespace breakdown in comprehensive HTML reports
  • Historical trend tracking

v0.7 (Future)

  • Prometheus integration for CPU/memory idle detection
  • Grafana dashboard templates
  • Webhook notifications (Slack, email)
  • Custom policy definitions
  • Multi-cluster aggregated dashboard

Contributing

Key areas for contribution:

  1. Additional security checks
  2. Enhanced report templates
  3. Waste and cleanup detection
  4. Cluster comparison diff view
  5. Integration with other tools

License

MIT License - See LICENSE file for details


Support

  • Issues: GitHub Issues
  • Documentation: opscart.com
  • Author: Shamsher Khan (IEEE Senior Member)

Version: v0.5.2
Status: Dev/Staging/Production-ready for multi-cluster security auditing, network policy detection, and waste detection
Last Updated: February 2026
