The Inspiration
In the modern enterprise, automation is the central nervous system. However, as we rely more on tools like Zapier, Make, and internal RAG pipelines, we encounter a frustrating reality: automations are fragile. A single API change, a renamed JSON field, or an unexpected null value can quietly kill a mission-critical workflow.

The inspiration for AutoFlow Doctor came from the "Silent Failure" problem. Most platforms tell you that something broke, but they rarely tell you why in plain English, and they never show you the "cure." I wanted to build an Autonomous Site Reliability Engineer (SRE) that doesn't just monitor logs: it "watches" your recordings, "speaks" the diagnosis, and "renders" the fix.
How I Built It
Building a system that functions as a "Doctor" required a multi-layered AI stack built on the Gemini ecosystem:
- The Brain (Gemini 3 Pro): I used Gemini 3 Pro’s multimodal capabilities to analyze both structured logs and raw video recordings of UI failures. Given a screen recording of a broken workflow, the model can identify visual error toast messages that might not appear in the backend logs at all.
- The Cure (Veo 3.1): To provide "Visual Proof," I integrated Veo to generate synthetic video simulations of the repair process. This lets users see how the data nodes will be re-connected before they hit "Apply."
- The Voice (Gemini 2.5 Flash TTS): I implemented a specialized "SRE Persona" using the Charon voice. By processing raw PCM audio streams, the app provides real-time AI narration that explains complex remapping logic in a reassuring, professional tone.
- Grounding (Google Search): To handle external API changes (e.g., "Shopify changed their API version"), I implemented Google Search Grounding to provide live documentation citations for every fix.

The Science of Triage
We calculate the System Health Index from the probability of node success and the entropy of the current configuration, together with the "Mapping Distance" between source and target fields. When the index crosses our threshold, the Doctor automatically initiates a Triage Protocol.
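To make the raw PCM handling behind "The Voice" more concrete, here is a minimal TypeScript sketch of the decoding step. It is not the project's actual code: the helper name `pcm16ToFloat32` is hypothetical, and it assumes the TTS stream is 16-bit signed little-endian mono PCM, which should be checked against the format the API actually returns.

```typescript
// Hypothetical helper: converts raw 16-bit signed little-endian PCM bytes
// into Float32 samples in the range [-1, 1), the format Web Audio buffers use.
// Assumption: mono audio, 2 bytes per sample.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Ignore a trailing odd byte, which can appear when a stream chunk is split mid-sample.
  const sampleCount = Math.floor(bytes.length / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In a browser, the resulting samples could then be copied into an `AudioBuffer` created with `AudioContext.createBuffer`, using whatever sample rate the TTS output declares.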
Challenges I Faced
- Audio/Video Synchronization: Handling raw PCM data from the TTS engine and syncing it with Veo-generated video in a browser environment was a significant engineering hurdle. I solved this by implementing a custom AudioContext scheduler that tracks nextStartTime to ensure gapless playback.
- Multimodal Prompting: Designing a prompt that could accurately bridge the gap between a human's "visual recording" of a bug and a computer's "code snippet" of the fix required extensive iteration on the system instruction set.
- Race Conditions: Managing API key selection states during high-latency video generation required a robust state-machine approach to ensure a smooth user experience.
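The gapless-playback idea from the first challenge can be sketched as follows. This is an illustration under stated assumptions rather than the scheduler used in the project: the `Clock` interface and `GaplessScheduler` class are hypothetical names, and only the `nextStartTime` bookkeeping is taken from the description above. The clock is abstracted so the logic can run outside a browser; an `AudioContext` fits the same shape through its `currentTime` property.

```typescript
// Hypothetical minimal clock shape; a browser AudioContext exposes currentTime too.
interface Clock {
  readonly currentTime: number;
}

// Hypothetical scheduler: decides when each audio chunk should start so that
// consecutive chunks play back to back without gaps or overlaps.
class GaplessScheduler {
  private clock: Clock;
  private nextStartTime = 0;

  constructor(clock: Clock) {
    this.clock = clock;
  }

  // Returns the start time (in seconds on the clock's timeline) for a chunk
  // of the given duration, and reserves that time span.
  schedule(durationSeconds: number): number {
    // If playback has fallen behind (or has not begun), start from "now";
    // otherwise queue directly after the previously scheduled chunk.
    const startAt = Math.max(this.nextStartTime, this.clock.currentTime);
    this.nextStartTime = startAt + durationSeconds;
    return startAt;
  }

  // Forget the queue, for example when narration is interrupted.
  reset(): void {
    this.nextStartTime = 0;
  }
}
```

In a browser, each returned value would be passed to `AudioBufferSourceNode.start(startAt)` for the corresponding chunk. Taking the maximum of `nextStartTime` and `currentTime` is the design choice that matters: it keeps chunks contiguous while the stream keeps up, and avoids scheduling in the past after a network stall.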
What I Learned
This project pushed me to think about AI as a maintainer, not just a creator. While most AI tools help you write new code, the real value in production is keeping existing systems alive. I learned that multimodal models are uniquely suited for debugging because they can "see" the context that logs often miss. AutoFlow Doctor proves that with the right AI orchestration, we can move from "Reactive Alerting" to "Autonomous Healing."
Built With
- ai
- gemini3