- Initial Dashboard: Starting a fresh infrastructure scan to map out our cloud nodes.
- Infrastructure Health: Real-time monitoring showing critical pressure on our API and Auth services.
- Failure Prediction: Using AI to predict a server crash caused by a 93% memory leak.
- Cost Optimization: Pinpointing $775 in monthly cloud waste from idle server instances.
- Security Risks: Identifying exposed S3 buckets and open database ports for immediate patching.
- AI Assistant: Using the Nova 2 Lite chat to get deep insights into system performance.
- Agent Automation: Triggering the AI Agent to autonomously fix security gaps and scale resources.
- Voice AI: Asking Nova 2 Sonic for a prioritized verbal report on the next likely failure.
$ sentinelops --story
INSPIRATION
The 3 AM wake-up call.
A critical alert wakes you up, but by the time you log in, the damage is already done.
This lag between “something is wrong” and “someone is fixing it” is the real problem.
Traditional monitoring tools only report history. SentinelOps was built to remove that delay and transform cloud infrastructure from a reactive system into a self-healing organism.
WHAT SENTINELOPS DOES
Predictive Failure Detection
Isolation Forest models analyze telemetry patterns and forecast service failures 15–30 minutes before they occur.
Security & Cost Intelligence
Continuous scanning detects vulnerabilities and identifies unused resources silently draining budgets.
Nova-Powered Assistant
Natural language infrastructure queries powered by Nova-2-Lite.
Example: “Why is the payment API slow?”
Instead of raw logs, the system provides contextual explanations.
Voice AI Interface
Hands-free infrastructure management powered by Nova-2-Sonic for situations when you are away from the keyboard but need immediate status.
Autonomous Remediation
The Nova Act agent executes DevOps workflows automatically.
It can scale services, restart systems, and isolate failures without human approval for known-safe operations.
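The idea of skipping human approval only for known-safe operations can be sketched as a small policy gate. This is a minimal illustration, not the project's actual code: the `Action` type, the `KNOWN_SAFE_ACTIONS` allowlist, and the `MAX_REPLICA_DELTA` cap are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical allowlist: action kinds assumed safe to run unattended.
KNOWN_SAFE_ACTIONS = {"restart_service", "scale_replicas", "isolate_instance"}

# Hypothetical cap so an automated scaling step cannot grow without bound.
MAX_REPLICA_DELTA = 3


@dataclass(frozen=True)
class Action:
    kind: str               # e.g. "restart_service"
    target: str             # e.g. "jwt-worker"
    replica_delta: int = 0  # only meaningful for "scale_replicas"


def requires_approval(action: Action) -> bool:
    """Return True when a human should confirm the action before it runs."""
    if action.kind not in KNOWN_SAFE_ACTIONS:
        return True
    if action.kind == "scale_replicas" and abs(action.replica_delta) > MAX_REPLICA_DELTA:
        return True
    return False
```

With a gate like this, `Action("restart_service", "jwt-worker")` would run unattended, while anything outside the allowlist, or an unusually large scaling step, would be routed to a person first.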
HOW IT WAS BUILT
Frontend
Next.js 15 with Tailwind CSS and Framer Motion, designed as a real-time command center with pulsing metrics, fluid transitions, and instant feedback.
Intelligence Layer (Amazon Nova Suite)
Nova-2-Lite
Used for chat assistant capabilities and log summarization with reasoning.
Nova-2-Sonic
Provides voice synthesis for the hands-free interface.
Nova Act
The core reasoning engine responsible for autonomous remediation and safe execution planning.
Nova Multimodal Embeddings
Used for semantic search across infrastructure topology so the system understands relationships between services instead of analyzing isolated metrics.
Backend
FastAPI with scikit-learn running Isolation Forest anomaly detection models.
A custom Bedrock HTTP client was built to keep Nova inference calls below 100 milliseconds.
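As a rough sketch of the anomaly detection step, the snippet below fits scikit-learn's `IsolationForest` on synthetic "healthy" telemetry and flags a sample far outside that range. The two features (CPU and memory percentage), the synthetic data, and the `is_anomalous` helper are assumptions for illustration; the real feature set is not described here, and this shows point anomaly detection only, not the 15–30 minute forecast.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=0)

# Synthetic "healthy" telemetry (assumed): columns are CPU % and memory %.
healthy = rng.normal(loc=[40.0, 50.0], scale=[5.0, 5.0], size=(500, 2))

# Fit on normal behavior; random_state makes the forest reproducible.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(healthy)


def is_anomalous(cpu_pct: float, memory_pct: float) -> bool:
    """Return True when the model labels the sample an outlier (-1)."""
    sample = np.array([[cpu_pct, memory_pct]])
    return bool(model.predict(sample)[0] == -1)
```

Lower values from `model.score_samples` indicate more anomalous samples, so that score could also serve as a continuous risk signal instead of the binary label.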
THE HARDEST PROBLEM
The biggest challenge was teaching Nova Act to make safe decisions without human supervision.
The solution was Nova Multimodal Embeddings. The system vectorizes the entire infrastructure state including services, dependencies, and historical incidents. This gives the agent contextual awareness.
Instead of simply detecting high CPU on a machine, the system understands the broader relationship between services and past failures.
Example reasoning:
CPU is high on instance X.
That instance powers the authentication layer used by the payment API.
The last time this pattern occurred it was caused by a memory leak in the JWT validation service.
Based on this context, the system proposes and executes preventive actions such as restarting specific services or scaling replicas.
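The reasoning chain above combines two lookups: which services sit downstream of the affected instance, and which past incident most resembles the current state. The sketch below illustrates both with plain Python. The dependency map, the incident records, and the toy 3-dimensional vectors are all hypothetical; in the described system the vectors would come from Nova Multimodal Embeddings rather than being written by hand.

```python
import math

# Hypothetical dependency map: node -> services that depend on it.
DEPENDENTS = {
    "instance-x": ["auth-service"],
    "auth-service": ["payment-api"],
}

# Hypothetical past incidents with toy embeddings (illustrative values only).
PAST_INCIDENTS = [
    {"cause": "memory leak in JWT validation service", "vector": [0.9, 0.1, 0.3]},
    {"cause": "expired TLS certificate", "vector": [0.1, 0.9, 0.2]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def downstream_services(node):
    """Walk the dependency map and collect everything affected by `node`."""
    affected, stack = [], [node]
    while stack:
        for dependent in DEPENDENTS.get(stack.pop(), []):
            if dependent not in affected:
                affected.append(dependent)
                stack.append(dependent)
    return affected


def most_similar_incident(state_vector):
    """Return the past incident whose embedding is closest to the current state."""
    return max(
        PAST_INCIDENTS,
        key=lambda incident: cosine_similarity(state_vector, incident["vector"]),
    )
```

Here `downstream_services("instance-x")` reaches the authentication layer and then the payment API, and a state vector close to the first stored incident retrieves the JWT memory leak as the likely cause.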
DEMO ENVIRONMENT
For the hackathon demo, a smart mock AWS environment was built.
It simulates realistic infrastructure behavior and telemetry while avoiding the need for live AWS credentials. Judges can observe predictions, explanations, and automated responses as if the system were connected to a real production environment.
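One way such a mock environment could generate telemetry is a seeded simulator that injects a fault partway through a run. The sketch below is an assumption about the approach, not the demo's actual code: the function name, baseline values, noise levels, and leak rate are all invented for illustration.

```python
import random


def simulate_memory_leak(ticks=100, leak_start=50, leak_rate=0.5, seed=42):
    """Generate per-tick CPU and memory samples with a leak after `leak_start`.

    All baselines and noise levels are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so every demo run looks the same
    samples = []
    for tick in range(ticks):
        # Memory grows linearly once the simulated leak begins.
        leaked = max(0, tick - leak_start) * leak_rate
        memory = 40.0 + leaked + rng.gauss(0, 1.0)
        cpu = 35.0 + rng.gauss(0, 3.0)
        samples.append({
            "tick": tick,
            "cpu_pct": min(100.0, max(0.0, cpu)),
            "memory_pct": min(100.0, max(0.0, memory)),
        })
    return samples
```

Seeding the generator keeps the demo repeatable, which matters when the same scenario has to be shown to several judges.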
PROUDEST FEATURE
The Failure Prediction Engine.
Most monitoring tools provide only a probability score.
For example:
Risk score = 0.87
SentinelOps instead produces a narrative explanation:
“The payment service has a 94% probability of failure within the next 20 minutes based on memory fragmentation patterns matching three previous incidents. Recommended action is to restart the JWT worker pod and scale the replica set by two. Execution will begin in 10 seconds unless cancelled.”
This combination of Isolation Forest anomaly detection and Nova-2-Lite reasoning creates trust in autonomous decision-making.
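The "execution will begin in 10 seconds unless cancelled" behavior can be sketched with a cancellable timer from the Python standard library. Both helper functions below are hypothetical names for illustration; in the described system the narrative text would come from Nova-2-Lite rather than a fixed template.

```python
import threading


def describe_prediction(service, probability, minutes, action, delay_seconds):
    """Build a short narrative from a structured prediction (template is illustrative)."""
    return (
        f"The {service} has a {probability:.0%} probability of failure "
        f"within the next {minutes} minutes. Recommended action is to {action}. "
        f"Execution will begin in {delay_seconds:g} seconds unless cancelled."
    )


def schedule_with_cancel_window(action, delay_seconds=10.0):
    """Run `action` after `delay_seconds` unless the returned timer is cancelled."""
    timer = threading.Timer(delay_seconds, action)
    timer.daemon = True  # do not keep the process alive just for the countdown
    timer.start()
    return timer
```

An operator-facing cancel button would simply call `.cancel()` on the returned timer before the delay elapses.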
WHAT WAS LEARNED
DevOps is shifting from automation to agency.
Traditional systems use predefined rules such as: If CPU usage exceeds 90 percent, send an alert.
SentinelOps recognizes complex patterns, predicts cascading failures, and acts before outages occur.
The difference is moving from a monitoring tool to an intelligent operational teammate.
WHAT'S NEXT
Live Data Streams
Integrating real CloudWatch and CloudTrail telemetry streams.
Shadow Execution
The system will test proposed fixes in cloned environments and verify results before applying them to production systems.
Collective Intelligence
Embedding incident post-mortems so the system learns from every failure across multiple infrastructures.
Built With
- Languages: Python 3.11
- Frameworks: Next.js 15, FastAPI, Tailwind CSS, Framer Motion
- AI/ML: Amazon Bedrock (Nova 2 Lite, Sonic, Act, Embeddings), scikit-learn (Isolation Forest)
- Cloud: AWS Bedrock, Netlify (frontend), Render (backend)