About the Project (Devpost Submission Draft)
Inspiration
We built SysMind because looking at Grafana dashboards at 3 AM is not a hobby.
Every developer knows the pain: a pager alert fires in the middle of the night. You wake up, connect to VPN, SSH into a server, run top, grep through infinite logs, and try to guess why the CPU is at 100%. It’s tedious, stressful, and error-prone.
We realized that modern LLMs like Gemini 1.5 Pro/Flash (and the upcoming Gemini 3.0) have two superpowers that perfectly map to this problem:
- Multimodal Vision: They can "see" charts and graphs just like a human engineer does.
- Massive Context Window: They can ingest 10MB+ log files to find the one error line that grep missed.
We wanted to build an agent that doesn't just chat about infrastructure but actually fixes it.
What it does
SysMind is an autonomous Site Reliability Engineer (SRE) agent. It automates the entire Detect → Diagnose → Fix → Verify loop for Linux systems.
- It Sees: You feed it a screenshot of a monitoring dashboard (Grafana/Datadog). It identifies anomalies visually (e.g., "The CPU spiked vertically at 14:02 UTC").
- It Investigates: It connects to the server (via Docker socket), checks running processes, and cross-references them with the dashboard data.
- It Reads: It can ingest enormous log files (syslog, app logs) to correlate the spike with specific application errors.
- It Fixes (Safely): It proposes a remediation (e.g., killing a stuck process), but pauses for Human-in-the-Loop approval if the action is destructive.
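
The Detect → Diagnose → Fix → Verify loop described above can be sketched as a small driver function. This is an illustrative skeleton, not SysMind's actual code: the `Fix` dataclass, `run_incident`, and the four callbacks are hypothetical names, and the real detect and diagnose steps would be backed by model calls rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Fix:
    """A proposed remediation (hypothetical shape, for illustration only)."""
    description: str
    destructive: bool
    apply: Callable[[], None]


def run_incident(
    detect: Callable[[], Optional[str]],
    diagnose: Callable[[str], Optional[Fix]],
    verify: Callable[[str], bool],
    approve: Callable[[str], bool],
) -> str:
    """Run one pass of the loop and return a short status string."""
    anomaly = detect()                      # Detect: e.g. from a dashboard screenshot
    if anomaly is None:
        return "no anomaly"
    fix = diagnose(anomaly)                 # Diagnose: processes + logs
    if fix is None:
        return "no fix proposed"
    # Fix: destructive actions wait for a human decision first.
    if fix.destructive and not approve(fix.description):
        return "fix declined"
    fix.apply()
    # Verify: `verify` returns True when the anomaly is gone.
    return "resolved" if verify(anomaly) else "fix applied, anomaly persists"
```

Passing the approval step in as a callback keeps the loop testable without a terminal: a real run could hand it a function that prompts the operator, while a test can hand it a lambda.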
How we built it
The architecture is designed for safety and realism:
The Brain (Gemini 3.0 via Google GenAI SDK): We use Gemini's native Function Calling to give the agent "hands." It can execute tools like list_processes, read_log, grep_file, and restart_service. We leverage the Vision capabilities to analyze dashboard screenshots, creating a truly multimodal troubleshooter.
The Body (Python + Docker): The agent is a Python application that runs in a container. It manages a separate "Target" container (an Ubuntu sandbox) via the Docker Socket. This allows us to simulate catastrophic failures (fork bombs, CPU stress tests) safely without risking the host machine.
The Interface (Rich + Antigravity): We built a beautiful CLI using the Rich library to visualize the agent's Chain of Thought (CoT). The agent displays its reasoning in real-time:
- 🔵 THOUGHT: "I see a spike. I need to check PID 1234."
- 🔴 RISK ANALYSIS: "Killing PID 1234 is HIGH RISK. Asking for permission."
- 🟢 ACTION: "Process terminated. Verifying fix..."
Google Search Grounding: We integrated Google Search to help the agent understand obscure error codes or configuration flags that aren't in its training data, which reduces hallucinations.
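
To make the "hands" idea concrete, here is a rough sketch of how a function call requested by the model might be routed to one of the four tools. It deliberately leaves out the GenAI SDK and the Docker client: the `Runner` type, the `dispatch` function, and the `{"name": ..., "args": ...}` call shape are assumptions for illustration, and the exact shell commands (including `service <name> restart`) are guesses about what the sandbox supports.

```python
from typing import Callable, Dict, List

# A Runner executes an argv list inside the target container and returns stdout.
# In a real agent this would wrap the Docker socket; here it is injected.
Runner = Callable[[List[str]], str]


def list_processes(run: Runner) -> str:
    return run(["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"])


def read_log(run: Runner, path: str, lines: int = 200) -> str:
    return run(["tail", "-n", str(lines), path])


def grep_file(run: Runner, pattern: str, path: str) -> str:
    return run(["grep", "-n", "--", pattern, path])


def restart_service(run: Runner, name: str) -> str:
    return run(["service", name, "restart"])


# Tool names match the ones listed in the write-up.
TOOLS: Dict[str, Callable[..., str]] = {
    "list_processes": list_processes,
    "read_log": read_log,
    "grep_file": grep_file,
    "restart_service": restart_service,
}


def dispatch(call: dict, run: Runner) -> str:
    """Route one model-requested call to a tool; errors go back as text."""
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        return fn(run, **call.get("args", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return f"error: bad arguments for {name}: {exc}"
```

Returning errors as strings instead of raising lets the result be fed back to the model as the tool response, so it can correct a malformed call on the next turn.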
Challenges we ran into
- Safety vs. Autonomy: Giving an AI root access is scary. We had to implement a strict Risk Protocol. The agent classifies every action (Low/Medium/High Risk). High-risk actions (like kill, rm, restart) trigger a mandatory input() block that waits for a human to type "y".
- Hallucinations: Early versions would "invent" PIDs to kill. We solved this by implementing a Verify-First loop: the agent must list_processes and confirm the PID exists immediately before attempting to kill it.
- Context Management: Sending 100MB logs is expensive and slow. We implemented a "smart tailing" strategy where the agent reads the last 200 lines first, and only requests the full file if it detects a pattern that requires deep context.
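
The three safeguards above could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the project's code: the medium-risk set, the `DEEP_CONTEXT` trigger patterns, and all function names are hypothetical, and in the real agent the model (not a regex) decides when the full log is needed.

```python
from collections import deque
import re
from typing import Callable, Set

# "kill", "rm", and "restart" are the high-risk actions named in the write-up;
# the medium tier is a guess added for illustration.
HIGH_RISK = {"kill", "rm", "restart"}
MEDIUM_RISK = {"chmod", "mv"}


def classify_risk(action: str) -> str:
    if action in HIGH_RISK:
        return "HIGH"
    if action in MEDIUM_RISK:
        return "MEDIUM"
    return "LOW"


def approved(action: str, detail: str, ask: Callable[[str], str] = input) -> bool:
    """High-risk actions block until a human types "y"; others pass through."""
    if classify_risk(action) != "HIGH":
        return True
    return ask(f"HIGH RISK: {detail}. Proceed? [y/N] ").strip().lower() == "y"


def live_pids(ps_output: str) -> Set[int]:
    """Collect PIDs from ps-style output whose first column is the PID."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def safe_kill(pid: int, list_processes, kill, ask=input) -> str:
    """Verify-First: re-list processes right before killing, then ask."""
    if pid not in live_pids(list_processes()):
        return f"refused: PID {pid} is not in the current process list"
    if not approved("kill", f"kill PID {pid}", ask):
        return "refused: operator declined"
    kill(pid)
    return f"killed PID {pid}"


# Hypothetical patterns that suggest the tail alone is not enough context.
DEEP_CONTEXT = re.compile(r"Traceback|out of memory|segfault", re.IGNORECASE)


def smart_tail(path: str, n: int = 200) -> str:
    """Read the last n lines; fall back to the full file on a trigger pattern."""
    with open(path, errors="replace") as f:
        tail = list(deque(f, maxlen=n))
    if any(DEEP_CONTEXT.search(line) for line in tail):
        with open(path, errors="replace") as f:
            return f.read()
    return "".join(tail)
```

`safe_kill` checks the PID before prompting, so an invented PID is rejected without ever bothering the operator.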
Accomplishments that we're proud of
- The "Titanium" User Experience: The terminal interface looks like something out of a sci-fi movie. It's not just functional; it builds trust by showing exactly what the AI is thinking.
- Speed: In our benchmarks, SysMind can go from "Alert" to "Fix" in about 35 seconds. A human usually takes 15-45 minutes.
- The Vision Integration: Watching the agent correctly identify a "memory leak pattern" just by looking at a PNG of a chart was a magic moment for us.
What we learned
- Agents need "Eyes": Text logs are not enough. Infrastructure is visual. Adding Vision capabilities changed the agent from a "log reader" to a "system analyst."
- Prompt Engineering for SRE: We learned that SRE work requires a very specific "persona." We had to instruct Gemini to use the USE Method (Utilization, Saturation, Errors) to stop it from guessing randomly.
- The Power of Google Grounding: Connecting the agent to live documentation through Google Search meant we didn't have to hardcode knowledge about every Linux package. It just "looks it up."
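
As an illustration of the USE Method persona mentioned above, a system instruction might be assembled along these lines. The wording of the prompt, the function name `build_system_instruction`, and the resource list are all hypothetical; the only part taken from the method itself is the rule of checking utilization, saturation, and errors for every resource.

```python
from typing import Iterable

USE_CHECKS = ("utilization", "saturation", "errors")


def build_system_instruction(resources: Iterable[str]) -> str:
    """Build a checklist-style persona prompt (illustrative wording only)."""
    lines = [
        "You are a Site Reliability Engineer. Follow the USE Method:",
        "for each resource, check utilization, saturation, and errors.",
    ]
    for resource in resources:
        for check in USE_CHECKS:
            lines.append(f"- {resource}: {check}")
    lines.append("Do not name a root cause until every item has evidence.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_instruction(["cpu", "memory", "disk"]))
```

Spelling out every resource and check as an explicit list gives the model a concrete set of items to work through instead of a one-line reminder of the method's name.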
What's next for SysMind
- Kubernetes Integration: Moving from Docker containers to full K8s pod management.
- Predictive Maintenance: Instead of fixing broken things, analyzing trends to fix them before they break.
- Voice Interface: Fully integrating the "Jarvis-like" voice feedback for a hands-free war room experience.