The Spark: Beyond Error Logging

The inspiration for Talos came from a specific, recurring frustration: "Notification Fatigue." Like many developers, I was tired of CI/CD pipelines acting like smoke detectors—loudly alerting me to a fire (a failed build) but doing nothing to put it out. I realized that the current generation of DevOps tools consists of passive observers. They tell you what broke, but the mental load of context switching, reading stack traces, and writing the patch still falls on the human. I wanted to build something that didn't just report errors but actually fixed them. I wanted a "Digital Employee"—an autonomous agent that treats a broken build as a "pain signal" and initiates a biological-style "healing response" without my intervention.

The Architecture: Anatomy of an Agent

Building Talos required moving beyond simple "chatbot" architecture into an agentic loop. I structured the system into three distinct biological components:

  1. The Nervous System (FastAPI + Supabase) I needed a robust backend to manage the state of "healing runs." Using FastAPI (Python 3.11), I built an event-driven core that listens for GitHub webhooks. When a workflow_run fails, the system calculates a "Pain Signal" intensity.
    • Storage: I used Supabase to store the history of "Patient Zero" (the root cause files) and the "Thought Process" logs.
    • Cognition: The brain is Google's Gemini 3 (gemini-3-flash-preview). I chose it for its massive context window, allowing me to feed it entire dependency graphs and error logs.
  2. The Hands (E2B Sandbox) This was the most critical integration. You cannot let an AI run rm -rf / on your production server. I utilized E2B to create ephemeral, secure sandboxes. When Talos attempts a fix, it clones the repo into this isolated environment, installs dependencies, and runs the tests there. If the tests fail, the "patient" dies in the sandbox, not in production.
  3. The Visual Cortex (Playwright) Code isn't just logic; it's also presentation. A common challenge with AI code generation is that it might fix a logic error but break the UI (e.g., changing a CSS class). I implemented a "Visual Cortex" using Playwright. Before submitting a fix, Talos takes a screenshot of the running app to ensure no visual regressions occurred.
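To make the "Pain Signal" idea concrete, here is a minimal sketch of how a failed workflow_run webhook payload might be scored. The payload fields follow GitHub's webhook schema, but the branch set, weights, and thresholds are illustrative assumptions, not Talos's actual values:

```python
# Hypothetical sketch: deriving a "Pain Signal" intensity from a
# GitHub workflow_run webhook payload. The weights and the set of
# critical branches are assumptions for illustration.

CRITICAL_BRANCHES = {"main", "release"}

def pain_signal(payload: dict) -> float:
    """Return a 0..1 pain intensity for a workflow_run event."""
    run = payload.get("workflow_run", {})
    if run.get("conclusion") != "failure":
        return 0.0  # only failed runs generate pain

    intensity = 0.5  # base pain for any red build
    if run.get("head_branch") in CRITICAL_BRANCHES:
        intensity += 0.3  # failures on protected branches hurt more
    if run.get("run_attempt", 1) > 1:
        intensity += 0.2  # repeated failures escalate the signal

    return min(intensity, 1.0)
```

In the real system this function would sit behind the FastAPI webhook endpoint and its output would decide whether a healing run is enqueued.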

The Logic: Quantifying the Fix

One of the hardest parts was determining when the agent should be confident enough to open a Pull Request. I modeled this as a weighted scoring function.

Let $$S_{fix}$$ be the confidence score of a generated patch. We define the acceptance threshold $$\theta$$ such that a PR is only opened if $$S_{fix} > \theta$$.

$$ S_{fix} = \alpha \cdot T_{pass} + \beta \cdot (1 - \Delta_{UI}) + \gamma \cdot C_{sem} $$

Where:

  • $$T_{pass} \in \{0, 1\}$$ is the binary result of the unit tests in the sandbox.
  • $$\Delta_{UI}$$ represents the visual divergence (pixel difference) detected by the Visual Cortex, normalized between 0 and 1.
  • $$C_{sem}$$ is the semantic consistency score returned by the LLM (does this code look like the rest of the repo?).
  • $$\alpha, \beta, \gamma$$ are weights tuning the strictness of the agent.

For Talos, I prioritized logical correctness: $$\alpha$$ dominates the other weights, and a failed test run ($$T_{pass} = 0$$) acts as a hard gate, zeroing the score regardless of the other terms.
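The scoring function above translates directly into code. This is a minimal sketch; the specific weight and threshold values are assumptions, but the hard-gate behaviour on failed tests matches the description:

```python
# Illustrative implementation of the S_fix scoring function.
# ALPHA, BETA, GAMMA, and THETA are assumed values for the sketch.

ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2
THETA = 0.75  # acceptance threshold

def fix_confidence(t_pass: int, delta_ui: float, c_sem: float) -> float:
    """S_fix = alpha*T_pass + beta*(1 - delta_UI) + gamma*C_sem,
    with failed tests acting as a hard gate on the score."""
    if t_pass == 0:
        return 0.0  # failed sandbox tests zero the score outright
    return ALPHA * t_pass + BETA * (1.0 - delta_ui) + GAMMA * c_sem

def should_open_pr(t_pass: int, delta_ui: float, c_sem: float) -> bool:
    """Open a PR only when the confidence exceeds the threshold."""
    return fix_confidence(t_pass, delta_ui, c_sem) > THETA
```

With these weights, even a patch that passes all tests still needs reasonable visual and semantic scores to clear the threshold, which keeps the agent conservative.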

Challenges Faced

  1. The "Hallucination" Loop Early versions of Talos would get stuck in infinite loops. The AI would suggest a fix, the test would fail with a new error, and the AI would suggest the same fix again.
    • Solution: I implemented a "Healing History" context. The prompt sent to Gemini includes the previous failed attempts in the current run, effectively telling it: "You already tried X and it caused Y. Try something else."
  2. User Trust (The "Black Box" Problem) Developers don't trust AI that works in the dark. If a bot just opens a PR, you wonder, "How did it get here?"
    • Solution: I built the Neural Dashboard using Next.js 15 and Server-Sent Events (SSE). This created a "Glass Box" experience. Users can watch the "Neural Stream" in real-time—seeing the agent parse the stack trace, "think" about the dependency graph, and execute terminal commands—building trust through transparency.
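The "Healing History" fix for the hallucination loop can be sketched as a prompt builder that folds every failed attempt in the current run back into the next request. The prompt wording and attempt structure here are hypothetical:

```python
# Sketch of the "Healing History" context: prior failed attempts from
# the current run are included so the model does not repeat them.
# The prompt format is an illustrative assumption.

def build_healing_prompt(error_log: str, attempts: list[dict]) -> str:
    """attempts: [{"patch": str, "result": str}, ...] from this run."""
    sections = [f"Current failure:\n{error_log}"]
    for i, attempt in enumerate(attempts, start=1):
        sections.append(
            f"Previously tried fix #{i}:\n{attempt['patch']}\n"
            f"It failed with:\n{attempt['result']}\n"
            "Do not repeat this approach."
        )
    sections.append("Propose a different fix.")
    return "\n\n".join(sections)
```

Each retry therefore sees a strictly growing record of what was already tried and why it failed, which is what breaks the loop of resubmitting the same patch.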

What I Learned

Building Talos taught me that Agentic AI is fundamentally different from Generative AI.

  • Context is King: The quality of the fix is directly proportional to the quality of the "Patient Zero" context you provide.
  • Sandboxing is Non-Negotiable: To give an agent agency, you must give it a safe playground. E2B was the enabler that turned a text generator into a code executor.
  • Visuals Matter: For frontend debugging, text-based logs are insufficient. Giving the AI "eyes" (Playwright) drastically reduced false positives.

Talos isn't just a tool; it's a proof of concept for a future where developers oversee "digital squads" rather than writing every line of code themselves.
