Inspiration

Modern cloud microservices can be compromised by ransomware in milliseconds, and the malware can spread laterally before standard passive monitoring alerts an on-call engineer. Traditional Site Reliability Engineering (SRE) workflows center on observability and passive alerting: a human has to look at a chart, triage logs, and then execute a remediation path.

We realized that human response teams are fundamentally too slow to catch automated malware. This inspired us to build RanSafe: an autonomous, AI-driven cybersecurity circuit breaker that shifts enterprise defense from passive monitoring into immediate, real-time cloud state mutation.


What it does

RanSafe functions as an autonomous AI SRE that protects Google Cloud infrastructure against ransomware outbreaks. It continuously analyzes live, full-stack compute and storage telemetry through a custom Model Context Protocol (MCP) server and flags indicators of compromise in under a second.
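As a rough illustration of the telemetry path, an MCP client invokes a server tool with a JSON-RPC 2.0 tools/call request and reads the result from the returned content items. The sketch below only builds and parses such messages. The tool name get_node_telemetry, its arguments, and the telemetry field names are hypothetical and are not taken from RanSafe's mcp_server.js.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


def extract_telemetry(response_text: str) -> dict:
    """Decode the JSON payload held in the first text content item of a tool result."""
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(f"tool call failed: {response['error']}")
    first_item = response["result"]["content"][0]
    return json.loads(first_item["text"])


# Hypothetical tool name and argument, for illustration only.
request_text = build_tool_call("get_node_telemetry", {"node_id": "node-1"})
```

Keeping the transport this thin means the agent sees one small, typed payload per call rather than a raw log stream.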

Instead of throwing a passive alert, the system actively mitigates threats in real time:

  • Isolates Blast Radii: Instantly modifies Google Cloud Armor policy rulesets to network-airgap infected cloud nodes.
  • Drops Attack Paths: Severs container network namespaces to instantly halt lateral ransomware encryption.
  • Restores Balance: Provisions clean application replicas from verified backups and reroutes live customer traffic away from the isolated nodes, all in the background.
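The ordering of these three steps could be sketched roughly as follows. This is a dry-run illustration only: the Action type, the step names, and the target identifiers are hypothetical, and no Google Cloud API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    step: str    # hypothetical step name, not a real API method
    target: str  # identifier of the node or service acted on


def plan_mitigation(infected_nodes: list[str], service: str) -> list[Action]:
    """Build an ordered plan: isolate first, then sever, then restore."""
    plan: list[Action] = []
    # 1. Contain: cut every infected node off before doing anything else.
    for node in infected_nodes:
        plan.append(Action("isolate_network", node))
    # 2. Stop lateral movement from each infected node.
    for node in infected_nodes:
        plan.append(Action("sever_container_network", node))
    # 3. Recover only if something was actually isolated.
    if infected_nodes:
        plan.append(Action("provision_clean_replica", service))
        plan.append(Action("reroute_traffic", service))
    return plan
```

Producing the plan as data before executing it leaves room to log or review each step, which matters when the actions change live infrastructure.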

How we built it

We engineered RanSafe across four decoupled, parallel execution domains:

| Node ID | Operational Domain | Responsibility | Key Deliverables |
|---|---|---|---|
| Node 1 | /sandbox | Target Environment & Attack Simulation | microservice.js, malware_sim.sh |
| Node 2 | /observability | Cloud Ops & Telemetry Extraction Bridge | Cloud Run Deployment, mcp_server.js |
| Node 3 | /agent | AI Core Prompting & System Reasoning | validator.py, Grounding Config |
| Node 4 | /execution | Infrastructure State-Mutation Daemon & UI | handler.py, Terminal Status Dashboard |

We used the following anomaly thresholds to govern the agent's prompt logic:

$$\text{Ransomware Vector } (R_v) = \begin{cases} \text{CRITICAL THREAT}, & \text{if } U_{\text{cpu}} > 85\% \land W_{\text{ops}} > 200/\text{s} \land E_{\text{coef}} > 0.80 \\ \text{NOMINAL STATE}, & \text{otherwise} \end{cases}$$

Where:

  • $U_{\text{cpu}}$ is processor utilization.
  • $W_{\text{ops}}$ is the file-write rate in operations per second.
  • $E_{\text{coef}}$ is the encryption entropy coefficient of the data being written.
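The rule can be expressed as a small predicate. The sketch below is a minimal Python rendering of the thresholds stated here; the Telemetry type and its field names are hypothetical, and in RanSafe itself the rule governs the agent prompt logic rather than necessarily living in code like this.

```python
from dataclasses import dataclass

CPU_THRESHOLD_PCT = 85.0     # U_cpu bound from the rule
WRITE_OPS_THRESHOLD = 200.0  # W_ops bound, writes per second
ENTROPY_THRESHOLD = 0.80     # E_coef bound


@dataclass(frozen=True)
class Telemetry:
    cpu_pct: float
    write_ops_per_s: float
    entropy_coef: float


def classify(sample: Telemetry) -> str:
    """Return CRITICAL THREAT only when all three bounds are exceeded."""
    is_critical = (
        sample.cpu_pct > CPU_THRESHOLD_PCT
        and sample.write_ops_per_s > WRITE_OPS_THRESHOLD
        and sample.entropy_coef > ENTROPY_THRESHOLD
    )
    return "CRITICAL THREAT" if is_critical else "NOMINAL STATE"
```

Because the three conditions are joined by a logical AND with strict inequalities, a node at exactly 85% CPU, or a busy node writing low-entropy data, stays in the nominal state.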

Challenges we ran into

Our main technical pivot was migrating from local machine scripts to a strict, cloud-hosted architecture to comply with hackathon rules. The move surfaced two unexpected challenges:

GCP IAM & Registry Access Blocks: Automated container builds failed with storage permission-denied errors. We resolved them by auditing our build service accounts in Cloud Shell and manually binding roles/artifactregistry.writer and the required logging roles.

Data-Grounding Latency: Structuring complex, real-time JSON-RPC telemetry streams so that the LLM core could reason over them and emit schema-compliant infrastructure commands with minimal delay required every component to agree on message formats. We solved this by defining rigid API contracts between our nodes from day one.
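One way to picture such a contract is an allow-list check applied to every command the model emits before it reaches the execution daemon. The sketch below uses only the Python standard library. The action names and field names are hypothetical and are not taken from RanSafe's validator.py.

```python
import json

# Hypothetical contract: each permitted action and the fields it must carry.
ALLOWED_ACTIONS = {
    "isolate_node": {"node_id"},
    "restore_replica": {"service", "backup_id"},
}


def validate_command(raw: str) -> dict:
    """Parse a model-emitted command and reject anything outside the contract."""
    try:
        command = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(command, dict):
        raise ValueError("command must be a JSON object")

    action = command.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")

    params = command.get("params")
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    required = ALLOWED_ACTIONS[action]
    if set(params) != required:
        raise ValueError(f"params must be exactly {sorted(required)}")
    if not all(isinstance(value, str) and value for value in params.values()):
        raise ValueError("every param must be a non-empty string")
    return command
```

Rejecting unknown actions and unexpected fields outright, rather than ignoring them, keeps a malformed or hallucinated command from ever reaching infrastructure.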


Accomplishments that we're proud of

  • [x] True State Mutation: We moved completely beyond typical conversational chat widgets to build an agent that safely alters live infrastructure topology and firewalls in production.
  • [x] End-to-End Automation: We connected a real-time loop in which a simulated microservice attack shows up in live telemetry, passes through MCP to Gemini, and triggers automated Cloud Armor mitigation within seconds.
  • [x] Container Weight Optimization: We kept our multi-stage Docker builds slim (~150 MB) by using node:20-slim layers, which prevents operational pipeline lag.

What we learned

We mastered building applications with the open-source Model Context Protocol (MCP) to supply language models with targeted domain data in real time. We also gained crucial practical experience with Google Cloud Agent Builder for multi-tiered context grounding, secure cloud API credential management with Secret Manager, and implementing structured JSON schemas to govern autonomous execution gates safely.


What's next for RanSafe

  • Enterprise Cluster Meshes: We plan to scale RanSafe from single cloud runtimes into high-availability, multi-region Google Kubernetes Engine (GKE) environments.
  • Proactive Threat Hunting: We plan to move beyond reactive remediation by adding security layers that audit the environment continuously.
  • AI Drift Monitoring: Deploy Arize components to monitor vector embedding health, detecting prompt drift and tracking LLM evaluation stability.
  • Going Open Source: Release the core platform codebase as an open-source framework for autonomous, self-healing cloud microservice immunity.
