Inspiration
I was frustrated by the "Black Box" nature of current AI agent frameworks. Most multi-agent systems operate as opaque loops—you give them a prompt, wait five minutes, and hope they don't hallucinate or crash. When they inevitably failed, I had no visibility into where or why.
I realized that treating AI agents like chatbots is a dead end for complex engineering tasks. They needed to be treated like microservices.
Inspired by Kubernetes and distributed systems, I asked myself: What if I separated the "Brain" (Orchestration) from the "Mind" (Inference)? I wanted to build a system where the runtime graph wasn't static, but recursive—where an agent could realize a task was too big, request help, and the system would dynamically spin up new sub-agents to handle the load, all visualized in real-time.
What it does
RARO (Recursive Agentic Runtime Operator) is a "Kubernetes for Cognitive Workflows." It is a visual command center that orchestrates Google Gemini 3.0 agents to perform complex, multi-step tasks.
- Dynamic Graph Splicing: Unlike static workflows, RARO agents can self-modify their execution graph. If a "Researcher" agent realizes a topic is too broad, it outputs a JSON `delegation_request`. The Rust Kernel intercepts this and instantly "splices" new sub-agents into the live DAG (Directed Acyclic Graph) without stopping execution.
- Cortex Safety Layer: I implemented a "nervous system" that sits between the AI and the tools. It pattern-matches every tool call against a safety registry. If an agent tries to delete a file or execute risky code, the system pauses execution and summons the human operator for approval via a physical-style "Intervention Ticket" in the UI.
- Glass-Box Observability: The Svelte 5 console visualizes the "living graph." You see exactly which agent is thinking, what tools they are using, and how data flows between them.
- RFS (Raro File System): Agents share a secure, tiered file system. One agent can download a PDF to the `session` volume, and the next agent can mount it, read it, and generate a Python script to analyze it, all within a strictly typed infrastructure.
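To make the delegation flow concrete, here is a minimal sketch of how a kernel loop might intercept a `delegation_request` emitted by an agent. The payload fields (`reason`, `subtasks`) are illustrative assumptions, not RARO's actual protocol, and the real interception happens in the Rust Kernel:

```python
import json

# Hypothetical agent output containing a delegation request.
# The field names here are assumptions for illustration.
AGENT_OUTPUT = """
{"delegation_request": {
    "reason": "Topic too broad",
    "subtasks": ["History of RISC-V", "RISC-V vector extensions"]
}}
"""

def intercept(raw_output: str):
    """Return the requested subtasks if the agent asked to delegate, else None."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # plain-text output: no delegation requested
    request = payload.get("delegation_request")
    return request["subtasks"] if request else None

print(intercept(AGENT_OUTPUT))
# ['History of RISC-V', 'RISC-V vector extensions']
```

On a hit, the kernel would then splice one new sub-agent node per subtask into the live DAG.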
How I built it
I utilized a "Split-Brain" Architecture to leverage the best tools for specific jobs:
- The Brain (Rust Kernel): I built the orchestrator in Rust using `Axum` and `Tokio`. This handles the DAG scheduling, topological sorting, concurrency, and WebSocket streams. Choosing Rust ensured that even if an agent hallucinates, the runtime itself never crashes.
- The Muscle (Python Agent Service): I used Python and FastAPI to interface with Google's Gemini 3.0. This layer handles the multimodal inputs (PDFs, images) and the "thinking" logic. It remains stateless and purely reactive.
- The Face (Svelte 5): I built a "Tactical Arctic" UI using Svelte 5 (Runes). It connects to the Kernel via WebSockets to render DAG updates at 60fps. I wrote a custom layout engine to visualize dynamic node injection in real-time.
- The Ghost (Debug Puppet): To test complex topologies without burning API credits, I built a specialized debugging service that intercepts agent calls and injects mock responses, allowing me to unit-test the graph mutation logic.
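The Kernel's topological sorting can be illustrated with Kahn's algorithm. This is a Python sketch of the idea only (the real scheduler is Rust/Tokio, and agent names here are made up); nodes whose dependencies are satisfied become "ready" and could be dispatched concurrently:

```python
from collections import deque

def topo_order(nodes, edges):
    """Kahn's algorithm. edges: (upstream, downstream) pairs. Raises on cycles."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for a, b in edges:
        downstream[a].append(b)
        indegree[b] += 1
    # Agents with no unmet dependencies are immediately runnable.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: DAG invariant violated")
    return order

print(topo_order(["planner", "researcher", "writer"],
                 [("planner", "researcher"), ("researcher", "writer")]))
# ['planner', 'researcher', 'writer']
```

The cycle check doubles as a validation gate: any graph mutation that would introduce a cycle can be rejected before it is committed.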
Challenges I ran into
- Dynamic Graph Mutation: Allowing an agent to modify the very graph it is currently traversing was difficult. I had to implement strict cycle detection and "orphan capture" logic in Rust to ensure that when a node splits into three sub-nodes, the downstream dependencies are correctly rewired to the new outputs without breaking the chain.
- The "Context Drought": In a DAG, if Agent A fails to produce output, Agent B (who depends on A) wakes up to empty input. I implemented a "Context Drought" detection system in the Kernel that pauses the run and alerts me if an agent is about to run with insufficient data.
- UI Synchronization: Because the graph changes shape in real-time on the backend, keeping the frontend visualization in sync without jarring "jumps" required designing a custom diffing algorithm in my Svelte store logic.
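The "orphan capture" rewiring can be sketched as a pure edge-rewrite: when a node splits, its upstream edges fan out to the new sub-nodes and its downstream dependents are repointed at a join node so nothing is left reading from a deleted parent. This Python version is an assumption-laden illustration (the real logic is Rust, and the `:join` naming is hypothetical):

```python
def splice(edges, target, subtasks):
    """Replace `target` with sub-nodes feeding a join node; rewire all edges."""
    join = f"{target}:join"
    new_edges = []
    for a, b in edges:
        if b == target:
            # Upstream producers now fan out to every sub-node.
            new_edges += [(a, s) for s in subtasks]
        elif a == target:
            # Orphan capture: downstream dependents read from the join instead.
            new_edges.append((join, b))
        else:
            new_edges.append((a, b))
    # Sub-node outputs converge on the join node.
    new_edges += [(s, join) for s in subtasks]
    return new_edges

edges = [("planner", "research"), ("research", "writer")]
print(splice(edges, "research", ["research:a", "research:b"]))
```

In practice this rewrite would be followed by the cycle check before the mutated graph replaces the live one.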
Accomplishments that I'm proud of
- Type-Safe Orchestration: I successfully decoupled the control plane from the inference plane. The Rust kernel enforces the protocol, while Python handles the creativity.
- The "Living Graph": Seeing the UI visibly expand and rearrange itself the first time an agent requested delegation was a magical moment. It felt like watching the system "think" structurally.
- Secure Code Sandboxing: I integrated the E2B Code Interpreter to ensure that when agents write and execute Python code, it runs in a secure, ephemeral cloud sandbox rather than on the host machine.
- Human-in-the-Loop UX: I moved beyond simple chat. The "Approval Card" interface that pops up when the Safety Cortex triggers makes the AI feel like a powerful tool I control, rather than a black box I hope works.
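The Safety Cortex flow described above can be sketched as a registry of patterns that either clears a tool call or emits an intervention ticket for the operator. The patterns, tool names, and ticket fields below are all invented for illustration:

```python
import re

# Hypothetical safety registry: (pattern, reason) pairs matched against tool calls.
SAFETY_REGISTRY = [
    (r"^fs\.delete", "destructive file operation"),
    (r"^shell\.exec.*rm\s+-rf", "dangerous shell command"),
]

def check_tool_call(call: str):
    """Return an 'Intervention Ticket' dict if the call is risky, else None."""
    for pattern, reason in SAFETY_REGISTRY:
        if re.search(pattern, call):
            # Execution pauses here until a human approves or rejects the ticket.
            return {"status": "paused", "call": call, "reason": reason}
    return None  # safe: execute immediately

print(check_tool_call("fs.delete('/workspace/report.pdf')"))
print(check_tool_call("fs.read('/workspace/report.pdf')"))  # None
```

Keeping the check in the kernel, outside the model's reach, is what makes the guarantee hold even when the agent misbehaves.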
What I learned
- Structure > Prompting: No amount of prompt engineering solves a bad architecture. Giving agents a structured environment (RFS, Tools, Delegation Protocol) makes them infinitely more reliable than just asking them to "be careful."
- State Management is Hard: Managing the state of a distributed system where the "workers" (AI agents) are non-deterministic requires defensive programming at every layer.
- Rust + Python is a Superpower: Using Rust for the heavy lifting of state management allowed my Python code to be incredibly simple and focused purely on the AI logic.
What's next for RARO - Recursive Agent Runtime Orchestrator
- Streaming Tokens: Currently, I stream log events. I plan to implement token-level streaming from Python -> Rust -> UI for that satisfying "typewriter" effect during inference.
- Marketplace: I want to allow users to save their "Graph Templates" (e.g., "Deep Research Team", "Code Refactor Team") and share them.
- Vector Long-Term Memory: Integrating a vector database into the RFS so agents can recall findings from workflows that ran weeks ago.
Built With
- e2b
- gemini
- python
- rust
- svelte
- tavily
- typescript
