This system solves:

downtime that costs millions
high MTTR (mean time to resolution)

because

incident commanders
network operators
SREs
coordinators

can trust AI to solve problems.

Primary Market

Large organizations operating complex infrastructure:

Cisco-heavy environments
hybrid cloud organizations
network operations centers (NOCs)
security operations centers (SOCs)

Downtime Is Extremely Expensive

Approximate enterprise outage costs:

Company Type	Downtime Cost
Mid-size SaaS	$5k–$50k/hour
Enterprise cloud	$100k+/hour
Banks	millions/hour
Telecoms	millions/hour
Healthcare systems	operational risk + compliance risk

Reducing outage duration even modestly creates significant ROI.
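As a rough worked example of that arithmetic, the sketch below multiplies an hourly outage cost by the time saved. The function name and the 30-minute scenario are illustrative assumptions; only the $100k/hour figure comes from the table above.

```python
def outage_savings(cost_per_hour: float, minutes_saved: float) -> float:
    """Return the cost avoided by shortening one outage by `minutes_saved`."""
    return cost_per_hour * minutes_saved / 60.0


# Hypothetical scenario: an outage costing $100k/hour is resolved
# 30 minutes sooner than it otherwise would have been.
saved = outage_savings(100_000, 30)
print(f"${saved:,.0f} avoided")  # prints "$50,000 avoided"
```

This is per incident; annual impact would also depend on how often outages occur, which this page does not state.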

1. Faster Incident Resolution

AI automatically:

correlates signals
identifies likely root causes
summarizes telemetry
coordinates responders

2. Lower Staffing Burden

by helping

junior engineers operate faster
small teams manage larger systems
companies reduce operational overhead

3. This system stores:

incident histories
remediation patterns
organizational knowledge

4. This system can generate:

business impact summaries
executive briefings
customer-facing explanations

Automatically.

Voice allows:

hands-free interaction
rapid querying
continuous narration
ambient operational awareness

AI Agents

investigate
reason
coordinate
explain

Together, this amounts to autonomous operational reasoning.

Competitors or adjacent platforms:

Company	Category
Datadog	observability
Splunk	logging/SIEM
PagerDuty	alerting
New Relic	monitoring
Cisco	network ops
Palantir	operational intelligence

Our system

autonomously coordinates
reasons using AI
listens and responds to voice in realtime
performs multi-agent workflows
utilizes operational memory

GTM

Start as:

assistant layer
summarization layer
incident coordination tool

Add:

remediation recommendations
workflow automation
runbook execution

Eventually:

self-healing infrastructure
AI-managed failovers
autonomous remediation agents

Mid-market Infrastructure Teams

MSPs
cloud-native startups
telecom operators
regional healthcare systems

Likely enterprise SaaS pricing.

Per Incident Seat

$50–300/user/month

Infrastructure-Based

priced by:

nodes
endpoints
incidents
telemetry volume

Enterprise Licensing

$50k–500k+/year

depending on scale.

1. Does the project tackle a problem in a fresh or unexpected way? Does it go beyond obvious solutions or existing tools?

Partially:

It addresses the well-known problem of incident response, where tools like PagerDuty, Datadog, and Opsgenie already exist. However, it approaches the problem from a different angle: most solutions focus on detection and alerting and leave remediation to humans, while this project attempts to close the loop with autonomous, voice-driven remediation.

The fresh element is voice-controlled multi-agent orchestration, where an engineer speaks natural language and AI agents execute coordinated responses. Voice, multi-agent workflows, and real-time collaboration are integrated in one package.

2. Did the team demonstrate strong engineering skill? Consider the difficulty of implementation, use of APIs/tools, and how much was built within the hackathon timeframe.

Strengths:

Successfully integrated 6 different sponsor APIs (Redis, VoiceOS, GetStream, ButterBase, AdaL, Tencent Cloud)
Built a working 7-agent orchestration system with Redis state management
Implemented WebSocket real-time updates with reconnection logic
Created a 3D topology visualization with Three.js (non-trivial)
Delivered a working Docker Compose setup with all services

Limitations:

Some features rely on mock data or fallback modes (e.g., ButterBase offline mode)
VoiceOS integration uses mock mode in the provided code (API keys not fully implemented)
The agent "reasoning" is partially deterministic rather than fully LLM-driven due to API constraints
Built over weeks, not strictly 24 hours (assumption based on code volume)
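The WebSocket reconnection logic mentioned above is not shown on this page. One common approach is exponential backoff, and the sketch below computes the wait before each reconnect attempt. The function name, parameters, and default values are assumptions for illustration, not taken from the project's code.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the delay in seconds before each reconnect attempt.

    Delays double each attempt, starting at `base` and never exceeding `cap`
    before jitter is applied. `jitter` is a fraction of the delay added at
    random so that many clients do not reconnect at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delay += rng.uniform(0, jitter * delay)  # adds 0.0 when jitter is 0
        delays.append(delay)
    return delays


# With jitter disabled the schedule is deterministic.
print(backoff_delays(5))  # prints [0.5, 1.0, 2.0, 4.0, 8.0]
```

A client loop would sleep for each delay in turn, retry the connection, and reset the schedule after a successful reconnect.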

3. Is the product intuitive, polished, and easy to use? Does the interface reflect thoughtful design choices that serve the end user?

Strengths:

Clean, modern dark-themed UI with glass morphism effects
3D topology visualization is visually impressive
Voice commands work via simulation with clear feedback
Real-time agent activity stream shows progress
Incident timeline and executive summary provide clear information hierarchy

Limitations:

Voice demo requires a quiet environment (microphone not fully integrated)
Some UI elements are pre-seeded with demo data rather than live
Learning curve for first-time users (many features)
Mobile responsiveness limited (desktop-focused)

4. Does this solve a meaningful problem or create real value? Consider the breadth of people it could benefit and the significance of the pain point addressed.

Strengths:

Infrastructure failures cost enterprises $300k+ per hour (real)
NOC engineers experience high burnout from on-call rotations (real)
Mean time to resolution (MTTR) is a critical metric (real)
Reducing MTTR from hours to minutes has clear ROI
Voice control reduces context switching
Automated remediation reduces human error

Risks:

AI hallucination risk in critical infrastructure (unacceptable)
Enterprise trust in autonomous remediation is very low
Would require extensive testing before production use
Competition from established players with AI roadmaps

Solves a real problem, but adoption barriers are significant.

5. Did the team clearly communicate what they built, why it matters, and how it works? Was the video demo compelling and easy to follow?

Strengths:

Comprehensive README and architecture documentation
Clear sponsor integration points explained
Visual demo with crisis mode simulation

Limitations:

Live demo requires stable internet and working API keys
Some complexity is hard to convey in 3 minutes
Voice demo requires a quiet environment
The difference between "simulated" and "real" AI reasoning could confuse viewers

6. Does the project genuinely use multiple sponsors, or does it feel forced? Would the product be meaningfully worse without each sponsor's tool?

VoiceOS, Redis, and GetStream are genuinely core. ButterBase and AdaL add value but aren't irreplaceable. Tencent Cloud is infrastructure, not product-differentiating.

Sponsor	Integration	Would product be worse without it?
Redis	State management, event streams, pub/sub	Yes - core dependency
VoiceOS	Voice commands, TTS	Yes - voice is differentiator
GetStream	Collaboration channels, threading	Yes - team collaboration essential
ButterBase	Incident memory, patterns	Partially - could use Redis alone
AdaL	Analytics, anomaly detection	Partially - could use basic metrics
Tencent Cloud	Deployment, infrastructure	No - could deploy elsewhere

7. Did the team spot a clear gap, inefficiency, or user pain point worth solving?

Not the first to identify this gap (e.g., PagerDuty's AI offerings, BigPanda, Moogsoft).

The gap identified is the "detection-remediation gap" where tools alert but don't fix. NOC engineers spend hours investigating, correlating, and executing runbooks. This is a real, well-documented pain point in SRE literature.

Built agent that investigates, not just alerts
Added executive summaries for business impact
Included human escalation path (acknowledging AI limits)
Focused on MTTR reduction as key metric

8. Does the extension deliver real value to users? (Time-saving, boosting creativity, etc.)

Demonstrating the value of automated runbook execution needs:
Real incident data (not simulated)
Comparison against manual response times
User feedback from actual NOC engineers
False positive and false negative rates
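To make the missing measurements concrete, the sketch below shows one way to compute them once real data exists. All names are hypothetical and the numbers are placeholders, not results from this project.

```python
from typing import Sequence, Tuple


def mean_minutes(durations: Sequence[float]) -> float:
    """Mean time to resolution over a set of incidents, in minutes."""
    if not durations:
        raise ValueError("need at least one incident")
    return sum(durations) / len(durations)


def error_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    A "positive" here means the system flagged or acted on an incident.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Placeholder values only, to show the shape of the comparison.
manual = mean_minutes([90, 60, 30])
assisted = mean_minutes([45, 30, 15])
print(f"MTTR: {manual:.0f} min manual vs {assisted:.0f} min assisted")
print(error_rates(tp=8, fp=2, tn=18, fn=2))  # prints (0.1, 0.2)
```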

9. Did the team demonstrate initiative, creative thinking, and user-first design?

More demo-focused than user-research-focused.

Initiative:

Tackled a complex multi-agent system in a hackathon
Integrated 6 sponsor technologies
Built 3D visualization beyond basic UI

Creative thinking:

Voice as primary interface for NOC (unconventional)
Agent handoff pattern with confidence scoring
Crisis mode simulation for demo theatrics
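The agent handoff pattern with confidence scoring could look roughly like the sketch below: each agent returns a finding with a confidence, work passes to the next agent until one clears a threshold, and otherwise the incident escalates to a human. The `Finding` type, the agent functions, and the 0.8 threshold are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    agent: str
    summary: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


Agent = Callable[[dict], Finding]


def run_with_handoffs(
    incident: dict, agents: List[Agent], threshold: float = 0.8
) -> Tuple[Optional[Finding], bool]:
    """Run agents in order; return (best finding, needs_human).

    Stops at the first finding whose confidence meets the threshold.
    If no agent is confident enough, the best finding is returned
    together with a flag asking for human escalation.
    """
    best: Optional[Finding] = None
    for agent in agents:
        finding = agent(incident)
        if best is None or finding.confidence > best.confidence:
            best = finding
        if finding.confidence >= threshold:
            return finding, False
    return best, True


# Hypothetical agents with hard-coded answers, for illustration only.
def triage_agent(incident: dict) -> Finding:
    return Finding("triage", "packet loss on core link", 0.55)


def network_agent(incident: dict) -> Finding:
    return Finding("network", "BGP session flapping", 0.9)


finding, needs_human = run_with_handoffs({}, [triage_agent, network_agent])
print(finding.agent, needs_human)  # prints "network False"
```

Returning the escalation flag explicitly keeps the human in the loop whenever no agent is confident enough to proceed.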

User-first design:

Executive summaries for managers
Collaboration channels for teams
Human escalation path (not fully autonomous)

10. Could this idea be expanded or monetized with further development?

Viable but challenging market.

Expansion paths:

Add more incident types and remediation playbooks
Integrate with more monitoring tools (Prometheus, Datadog)
Build mobile app for on-call engineers
Add Slack/Teams integration
Create marketplace for community playbooks

Monetization models:

SaaS subscription ($500-2000/month per customer)
Enterprise on-premise deployment
Usage-based pricing (events per month)
Professional services for custom playbooks

Market reality:

Competitive space (PagerDuty, Opsgenie, BigPanda)
Enterprise sales cycles are long (6-12 months)
Security and compliance requirements are high

11. Did they pitch their idea clearly and persuasively like real entrepreneurs?

Problem statement (MTTR too high)
Solution (autonomous AI agents), minimal
Explanation of how it differs from existing AI operations tools
Address trust and safety concerns
Realistic go-to-market strategy

12. Did they talk to users, show personas, or consider real needs?

The team built features that address real NOC pain points:

On-call engineer (persona) → voice commands, escalation
Incident commander → agent orchestration view
Executive → summary dashboard, business impact

Missing evidence:

No user research artifacts in repo
No testimonials or pilot user feedback
Personas not explicitly documented

13. Is there something that makes this stand out from other solutions?

Differentiators:

Voice-first interface for hands-free operation
7 specialized agents with handoffs (not single LLM)
Real-time collaboration with GetStream
Cinematic demo experience (crisis mode)
Event-driven architecture with Redis streams
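For readers unfamiliar with stream-based event handling, the sketch below models the idea with an in-memory append-only log: producers append events, and each consumer remembers the last ID it processed. This is a stand-in written for illustration, not the project's code and not the Redis client API; in Redis the corresponding commands are XADD and XREAD.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, Dict[str, str]]


class InMemoryStream:
    """Append-only event log with increasing IDs (a stand-in for a stream)."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, fields: Dict[str, str]) -> int:
        """Append an event and return its ID."""
        entry_id = len(self._entries) + 1
        self._entries.append((entry_id, dict(fields)))
        return entry_id

    def read_after(self, last_id: int, count: int = 10) -> List[Entry]:
        """Return up to `count` events with an ID greater than `last_id`."""
        return [e for e in self._entries if e[0] > last_id][:count]


stream = InMemoryStream()
stream.add({"type": "alert.raised", "node": "core-router-1"})
stream.add({"type": "remediation.proposed", "action": "failover"})

# A consumer tracks its own position, so it can resume after a restart.
last_seen = 0
for entry_id, fields in stream.read_after(last_seen):
    print(entry_id, fields["type"])
    last_seen = entry_id
```

Because each consumer keeps its own offset, several agents can read the same events independently.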

Not unique:

AI for incident response (multiple vendors)
Voice control (emerging feature)
Automated remediation (existing tools)

Built With

adal
getstream
py
redis
tencent
tsx
voiceos

Updates

Cathy l started this project — May 17, 2026 10:21 AM EDT
