About the Project (Devpost Submission Draft)

Inspiration

We built SysMind because looking at Grafana dashboards at 3 AM is not a hobby.

Every developer knows the pain: a pager alert fires in the middle of the night. You wake up, connect to the VPN, SSH into a server, run top, grep through endless logs, and try to guess why the CPU is at 100%. It’s tedious, stressful, and error-prone.

We realized that modern LLMs like Gemini 1.5 Pro/Flash (and the upcoming Gemini 3.0) have two superpowers that perfectly map to this problem:

  1. Multimodal Vision: They can "see" charts and graphs just like a human engineer does.
  2. Massive Context Window: They can ingest 10MB+ log files to find the one error line that grep missed.

We wanted to build an agent that doesn't just chat about infrastructure but actually fixes it.

What it does

SysMind is an autonomous Site Reliability Engineer (SRE) agent. It automates the entire Detect → Diagnose → Fix → Verify loop for Linux systems.

  • It Sees: You feed it a screenshot of a monitoring dashboard (Grafana/Datadog). It identifies anomalies visually (e.g., "The CPU spiked vertically at 14:02 UTC").
  • It Investigates: It connects to the server (via Docker socket), checks running processes, and cross-references them with the dashboard data.
  • It Reads: It can ingest enormous log files (syslog, app logs) to correlate the spike with specific application errors.
  • It Fixes (Safely): It proposes a remediation (e.g., killing a stuck process), but pauses for Human-in-the-Loop approval if the action is destructive.

How we built it

The architecture is designed for safety and realism:

  1. The Brain (Gemini 3.0 via Google GenAI SDK): We use Gemini's native Function Calling to give the agent "hands." It can execute tools like list_processes, read_log, grep_file, and restart_service. We leverage the Vision capabilities to analyze dashboard screenshots, creating a truly multimodal troubleshooter.

  2. The Body (Python + Docker): The agent is a Python application that runs in a container. It manages a separate "Target" container (an Ubuntu sandbox) via the Docker Socket. This allows us to simulate catastrophic failures (fork bombs, CPU stress tests) safely without risking the host machine.

  3. The Interface (Rich + Antigravity): We built a beautiful CLI using the Rich library to visualize the agent's Chain of Thought (CoT). The agent displays its reasoning in real-time:

    • 🔵 THOUGHT: "I see a spike. I need to check PID 1234."
    • 🔴 RISK ANALYSIS: "Killing PID 1234 is HIGH RISK. Asking for permission."
    • 🟢 ACTION: "Process terminated. Verifying fix..."

  4. Google Search Grounding: We integrated Google Search to help the agent understand obscure error codes or configuration flags that aren't in its training data, preventing hallucinations.
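As a rough illustration of how tools like list_processes and grep_file can plug into Gemini function calling: the tool bodies below are simplified stand-ins (not our production tools), and the commented-out call sketches the google-genai SDK usage, which accepts plain Python functions as tools.

```python
import subprocess

def list_processes() -> str:
    """Return a snapshot of running processes (simplified: local ps output)."""
    return subprocess.run(["ps", "-eo", "pid,comm,%cpu"],
                          capture_output=True, text=True).stdout

def grep_file(path: str, pattern: str) -> list[str]:
    """Return lines in `path` containing `pattern` (simplified grep)."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in f if pattern in line]

# With the google-genai SDK, Python functions can be registered as tools and
# the model emits function calls that the SDK executes automatically.
# (Requires an API key; shown here for illustration only.)
#
# from google import genai
# client = genai.Client()
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents="Why is the CPU at 100%?",
#     config=genai.types.GenerateContentConfig(tools=[list_processes, grep_file]),
# )
```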

Challenges we ran into

  • Safety vs. Autonomy: Giving an AI root access is scary. We had to implement a strict Risk Protocol. The agent classifies every action (Low/Medium/High Risk). High-risk actions (like kill, rm, restart) trigger a mandatory input() block that waits for a human to type "y".
  • Hallucinations: Early versions would "invent" PIDs to kill. We solved this by implementing a Verify-First loop: the agent must list_processes and confirm the PID exists immediately before attempting to kill it.
  • Context Management: Sending 100MB logs is expensive and slow. We implemented a "smart tailing" strategy where the agent reads the last 200 lines first, and only requests the full file if it detects a pattern that requires deep context.
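The smart-tailing strategy from the last bullet boils down to a bounded read plus a trigger heuristic. A minimal sketch, with trigger keywords that are illustrative rather than our exact rules:

```python
from collections import deque

def smart_tail(path: str, n: int = 200) -> list[str]:
    """Read only the last n lines of a potentially huge log file.

    A deque with maxlen keeps memory bounded at n lines even for
    multi-hundred-MB files; only when these lines look suspicious
    does the agent escalate to a deeper read.
    """
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))

def needs_deep_context(tail_lines: list[str]) -> bool:
    """Heuristic trigger for requesting the full file (keywords illustrative)."""
    markers = ("Traceback", "OOM", "segfault", "panic")
    return any(m in line for line in tail_lines for m in markers)
```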

Accomplishments that we're proud of

  • The "Titanium" User Experience: The terminal interface looks like something out of a sci-fi movie. It's not just functional; it builds trust by showing exactly what the AI is thinking.
  • Speed: In our benchmarks, SysMind can go from "Alert" to "Fix" in about 35 seconds. A human usually takes 15-45 minutes.
  • The Vision Integration: Watching the agent correctly identify a "memory leak pattern" just by looking at a PNG of a chart was a magic moment for us.

What we learned

  • Agents need "Eyes": Text logs are not enough. Infrastructure is visual. Adding Vision capabilities changed the agent from a "log reader" to a "system analyst."
  • Prompt Engineering for SRE: We learned that SRE work requires a very specific "persona." We had to instruct Gemini to use the USE Method (Utilization, Saturation, Errors) to stop it from guessing randomly.
  • The Power of Google Grounding: Connecting the agent to live Google Search documentation meant we didn't have to hardcode knowledge about every Linux package. It just "looks it up."

What's next for SysMind

  • Kubernetes Integration: Moving from Docker containers to full K8s pod management.
  • Predictive Maintenance: Analyzing trends to catch failures before they happen, instead of only fixing what's already broken.
  • Voice Interface: Fully integrating the "Jarvis-like" voice feedback for a hands-free war room experience.

Updates

The Evolution of SysMind: A Deep Dive

Here is the complete development timeline of SysMind during the hackathon. It's been a busy week!

Phase 1: The Core (Feb 1)

  • Abstractions First: Implemented OSStrategy pattern to decouple the agent logic from the underlying operating system commands.
  • Transport Layer: Switched from SSH to docker-exec for lower latency and better reliability in containerized environments.
  • Cognition Engine: Integrated google-genai SDK with an exponential backoff strategy to handle API rate limits gracefully.
  • Diagnostic Tools: Built the initial toolset (list_processes, read_file, netstat) to give the agent basic "senses".
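A minimal sketch of how the OSStrategy pattern and the docker-exec transport from Phase 1 might fit together. The class and method names are our assumptions for the example, not the exact code:

```python
from abc import ABC, abstractmethod

class OSStrategy(ABC):
    """Decouples agent logic from OS-specific commands."""
    @abstractmethod
    def list_processes_cmd(self) -> list[str]: ...
    @abstractmethod
    def read_file_cmd(self, path: str) -> list[str]: ...

class LinuxStrategy(OSStrategy):
    def list_processes_cmd(self) -> list[str]:
        return ["ps", "-eo", "pid,comm,%cpu", "--sort=-%cpu"]
    def read_file_cmd(self, path: str) -> list[str]:
        return ["cat", "--", path]

def docker_exec_argv(container: str, argv: list[str]) -> list[str]:
    """Wrap a command for the docker-exec transport (argv list, no shell)."""
    return ["docker", "exec", container] + argv

strategy = LinuxStrategy()
argv = docker_exec_argv("target-ubuntu", strategy.list_processes_cmd())
print(argv[:3])  # ['docker', 'exec', 'target-ubuntu']
```

Keeping commands as argv lists (rather than shell strings) is also what makes the later shlex hardening straightforward.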

Phase 2: Platinum Upgrade (Feb 1 - Late)

  • Robustness: Fixed entry points and added robust path handling to prevent crashes on edge cases.
  • Exploratory Tools: Added find_file and grep_file to help the agent locate logs dynamically.
  • Safety Timeouts: Implemented strict execution timeouts to prevent the agent from hanging on long-running commands.
  • Native Persona: Fine-tuned the system prompt to adopt a professional "SRE" persona, focusing on the USE Method (Utilization, Saturation, Errors).
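The safety-timeout idea is essentially subprocess execution with a hard deadline. A sketch (function name and message format are illustrative):

```python
import subprocess
import sys

def run_with_timeout(argv: list[str], timeout_s: float = 5.0) -> str:
    """Execute a diagnostic command, refusing to hang past timeout_s."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout_s)
        return result.stdout
    except subprocess.TimeoutExpired:
        return f"[agent] command exceeded {timeout_s}s and was killed"

# A command that would hang forever is cut off cleanly:
out = run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"],
                       timeout_s=0.3)
print(out)  # [agent] command exceeded 0.3s and was killed
```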

Phase 3: Titanium Hardening (Feb 2)

  • Context Optimization: Refactored memory management to handle large context windows (1M+ tokens) without blowing up the API costs.
  • Air-Gapped Audit: Implemented a local audit.json log that records every "Thought", "Risk Assessment", and "Action" for post-mortem review.
  • Injection Defense: Sanitized all shell commands with shlex to prevent command-injection attacks.
  • Hybrid Resilience: Added a deterministic "Mock Engine" (Simulation Mode) for testing the agent without burning API credits.
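The shlex sanitization and the audit trail both reduce to small helpers. A sketch; the audit.json schema shown here is an assumption for illustration:

```python
import json
import shlex
import time

def safe_shell_line(binary: str, *args: str) -> str:
    """Build a shell-safe command line: every argument is quoted, so
    attacker-controlled strings (e.g. '$(rm -rf /)') stay inert."""
    return " ".join(shlex.quote(part) for part in [binary, *args])

def audit_record(kind: str, detail: str) -> str:
    """One JSON line for the local audit trail (schema assumed)."""
    return json.dumps({"ts": time.time(), "kind": kind, "detail": detail})

line = safe_shell_line("grep", "error; rm -rf /", "/var/log/syslog")
print(line)  # grep 'error; rm -rf /' /var/log/syslog
```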

Phase 4: Operation Grand Prize (Feb 7)

  • Multimodal Vision: The game-changer. Integrated gemini-2.0-flash Vision to analyze graphical dashboards (PNGs) for anomaly detection.
  • Rich TUI: Built the "Sci-Fi" terminal interface using the Rich library to visualize the OODA loop in real-time.
  • Interactive Safety: The "Human-in-the-Loop" protocol now pauses execution for high-risk commands and waits for user confirmation (Y/N).
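The Human-in-the-Loop gate can be sketched as a risk check plus an injectable prompt, which is also how a Simulation Mode could stub out the human. The keyword rules and names here are illustrative, not the real protocol:

```python
RISK_HIGH = {"kill", "rm", "restart", "reboot", "shutdown"}

def classify_risk(argv: list[str]) -> str:
    """Crude keyword-based risk classifier (illustrative rules)."""
    return "HIGH" if argv and argv[0] in RISK_HIGH else "LOW"

def confirm_if_risky(argv: list[str], ask=input) -> bool:
    """Pause for human approval before any HIGH-risk action.

    `ask` is injectable so tests (and a simulation mode) can stub it.
    """
    if classify_risk(argv) != "HIGH":
        return True
    answer = ask(f"Run {' '.join(argv)}? [y/N] ")
    return answer.strip().lower() == "y"

# Low-risk commands pass through; high-risk ones wait for an explicit "y":
assert confirm_if_risky(["ps", "-ef"])
assert not confirm_if_risky(["kill", "-9", "1234"], ask=lambda _: "n")
```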

We are incredibly proud of how far this project has come in just a few days!

#Gemini3 #BuildingInPublic #Python #DevLog #OpenSource
