## What It Does
Bob is a multi-agent AI system that continuously monitors your cloud infrastructure and acts on what it finds:
| Stage | What Happens |
|---|---|
| Anomaly Detection | Scans AWS cost records and CloudWatch metrics for spending spikes, unusual patterns, and resource anomalies across all monitored services |
| Cost Forecasting | Predicts future costs using Amazon Chronos and Lag-Llama time-series ML models, flagging services trending upward before they blow the budget |
| Root Cause Investigation | Correlates anomalies with GitHub commits, application logs, and deployment timelines — building a "commit → deploy → spike → error" timeline |
| Automated Solutions | Generates prioritized recommendations with estimated savings, and can auto-create GitHub Pull Requests with concrete code/config fixes |
| Slack Alerts | Automatically notifies the team via Slack webhook when critical or high-severity anomalies are detected |
| Interactive Chat | Real-time streaming chat (SSE) with the orchestrator agent — ask questions in natural language and get answers backed by live data, charts, and Plotly visualizations |
The Background Analyst runs autonomously on a 3-minute cycle: scan → forecast → recommend → investigate → enrich → alert. Every finding is pushed to the frontend in real-time via Server-Sent Events.
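As a rough sketch (agent class names and wiring here are assumed, not the project's actual code), the whole cycle fits in one background thread pushing into an in-memory feed:

```ruby
# Hypothetical sketch of the 3-minute analyst cycle; CostScanner, Forecaster,
# and Recommender are illustrative names, not the real agents.
ANALYST_FINDINGS = []        # in-memory feed, drained by the SSE endpoint
ANALYST_MUTEX    = Mutex.new

Thread.new do
  loop do
    begin
      findings = CostScanner.scan                    # scan costs + metrics
      findings += Forecaster.flag_upward_trends      # forecast
      findings.each { |f| Recommender.enrich(f) }    # recommend + enrich
      ANALYST_MUTEX.synchronize { ANALYST_FINDINGS.concat(findings) }
    rescue StandardError => e
      Rails.logger.error("Analyst cycle failed: #{e.message}")
    end
    sleep 180                                        # 3-minute cycle
  end
end
```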
## Key Design Decisions
- LangchainRB + Gemini: We chose to run all agent logic in Ruby to keep the stack unified. The agents use LangchainRB's tool-calling system with Google Gemini as the LLM — fast, cheap, and surprisingly good at structured JSON output.
- Background Analyst as a Ruby Thread: Instead of a separate worker process, the analyst runs as a single thread inside Puma. Simple, zero-infrastructure, and pushes alerts to an in-memory queue that the SSE endpoint drains.
- Python only for ML: The only Python code is the forecasting microservice. Chronos and Lag-Llama require PyTorch, which doesn't exist in Ruby — so we isolated them behind a clean FastAPI boundary.
- Agents create GitHub PRs: The Solution Agent doesn't just recommend fixes — it can call `create_pull_request` to commit file changes to a branch and open a real PR on your repo (sketch below).
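A tool like that could be backed by Octokit along these lines; the branch naming, method shape, and single-file edit are assumptions for illustration, not the project's actual implementation:

```ruby
require "octokit"

# Hypothetical tool body: cut a branch, commit one file change, open a PR.
def create_pull_request(repo:, path:, new_content:, title:, body:)
  client = Octokit::Client.new(access_token: ENV["GITHUB_TOKEN"])

  base_sha = client.ref(repo, "heads/main").object.sha
  branch   = "bob/fix-#{Time.now.to_i}"              # illustrative naming scheme
  client.create_ref(repo, "heads/#{branch}", base_sha)

  existing = client.contents(repo, path: path, ref: branch)
  client.update_contents(repo, path, title, existing.sha,
                         new_content, branch: branch) # commit the proposed fix

  client.create_pull_request(repo, "main", branch, title, body)
end
```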
## Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, Vite 6, TailwindCSS 4, Recharts, Plotly.js, Lucide icons, Motion |
| Backend | Ruby on Rails 8.1, Puma, SSE streaming, SQLite |
| AI/LLM | Google Gemini 2.0 Flash via LangchainRB, multi-agent orchestration |
| ML Forecasting | Python FastAPI, Amazon Chronos (tiny/small), Lag-Llama (probabilistic) |
| Integrations | AWS Cost Explorer, CloudWatch, GitHub API (commits + PRs), Slack webhooks |
| Data | SQLite with 6 tables: cost_records, cloudwatch_metrics, log_events, commits, anomalies_detected, resource_tags |
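For concreteness, the anomalies table might look something like the migration below; every column here is our guess, not the actual schema:

```ruby
# db/migrate/xxxx_create_anomalies_detected.rb — illustrative columns only.
class CreateAnomaliesDetected < ActiveRecord::Migration[8.1]
  def change
    create_table :anomalies_detected do |t|
      t.string   :service       # AWS service where the anomaly was found
      t.string   :severity      # e.g. "critical", "high", "medium"
      t.decimal  :cost_delta    # observed spend vs. baseline
      t.text     :description
      t.datetime :detected_at
      t.timestamps
    end
  end
end
```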
## 🚧 Challenges We Ran Into
Getting agents to produce clean JSON: LLMs love to wrap JSON in markdown code fences, add explanatory prose, or hallucinate extra fields. We spent significant time crafting prompts that reliably produce raw JSON arrays/objects, and wrote robust `extract_json_array`/`extract_json_object` parsers as fallbacks (simplified sketch below).

Agent timeout management: Each sub-agent can call multiple tools in a chain (list resources → query metrics → query costs → generate chart). On a cold start or with large datasets, this can exceed 2 minutes. We added per-agent timeouts with graceful degradation — if an agent times out, it returns whatever partial result it has.
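A fallback in that spirit strips code fences and slices out the outermost bracket span; this is a simplified sketch, not the project's actual `extract_json_array`:

```ruby
require "json"

# Simplified fenced-JSON fallback parser (the real one is more robust).
def extract_json_array(llm_output)
  text  = llm_output.gsub(/`{3}(?:json)?/, "")   # strip markdown code fences
  first = text.index("[")
  last  = text.rindex("]")
  return nil unless first && last && first < last

  JSON.parse(text[first..last])                  # parse just the bracketed slice
rescue JSON::ParserError
  nil                                            # caller falls back to retry/default
end
```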
SSE streaming in Rails: Implementing real-time Server-Sent Events for both the chat interface and the background analyst feed required careful thread-safety work. The background analyst pushes findings to a mutex-protected in-memory array, and the SSE controller drains it with a polling loop so it never blocks Puma threads (sketch below).
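Reusing the mutex-protected array from the earlier sketch, an illustrative version of that controller (names assumed, findings assumed to be plain hashes) looks like:

```ruby
# Illustrative SSE endpoint that drains the analyst's in-memory feed.
class AnalystFeedController < ApplicationController
  include ActionController::Live

  def stream
    response.headers["Content-Type"] = "text/event-stream"
    loop do
      batch = ANALYST_MUTEX.synchronize { ANALYST_FINDINGS.shift(ANALYST_FINDINGS.size) }
      batch.each { |finding| response.stream.write("data: #{finding.to_json}\n\n") }
      sleep 1                 # short poll; never hold the mutex while writing
    end
  rescue IOError
    # client disconnected mid-stream
  ensure
    response.stream.close
  end
end
```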
Chronos + Lag-Llama on CPU: The ML models are designed for GPU inference. Getting Amazon Chronos and Lag-Llama to run acceptably on CPU-only hackathon machines required careful model size selection (tiny/small) and batching strategies.
Multi-agent coordination: The orchestrator needs to decide which sub-agents to call and in what order, passing context between them. Getting the orchestrator to reliably follow the "scan → investigate → solve → forecast" pipeline without skipping steps or going in loops was one of the trickiest prompt engineering challenges.
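One fallback we can sketch is pinning the step order down in code rather than leaving it entirely to the prompt; the agent class names below are assumed, and the real orchestrator chooses steps via the LLM:

```ruby
# Hypothetical hard-coded pipeline order with shared context passed along.
PIPELINE = [ScanAgent, InvestigateAgent, SolveAgent, ForecastAgent].freeze

def run_pipeline(context = {})
  PIPELINE.each_with_object(context) do |agent, ctx|
    ctx.merge!(agent.new.run(ctx))   # each agent enriches the shared context
  end
end
```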
## 🏆 Accomplishments We're Proud Of
Fully autonomous analysis loop: The Background Analyst runs 24/7 with zero human intervention — scanning, forecasting, investigating, and enriching findings every 3 minutes. It's not a chatbot you have to ask; it finds problems on its own.
End-to-end from anomaly to PR: Bob can detect a cost spike, trace it to a specific commit, generate a fix, and open a GitHub Pull Request — completely autonomously. That's the full DevOps loop closed by AI.
Real-time streaming UI: Every agent thought, tool call, finding, and recommendation streams live to the frontend via SSE. You can watch the AI think in real-time — see it scan resources, query metrics, correlate commits, and build its analysis step by step.
Multi-model forecasting: We didn't settle for one model. Users get predictions from both Amazon Chronos (deterministic) and Lag-Llama (probabilistic with confidence intervals), with a comparison view to assess forecast reliability.
Clean multi-agent architecture: 5 agents, each with a focused role, composable tools, and a shared database — all orchestrated by a single conductor that maintains conversation context across sessions.
## What We Learned
- Ruby is underrated for AI agents: LangchainRB + Gemini is a surprisingly productive combo. Ruby's expressiveness made the agent code readable and maintainable, and Rails' conventions kept the project organized even as it grew to 5 agents, 8 tools, and 6 services.
- Isolate Python behind a clean boundary: Putting Chronos and Lag-Llama behind a simple FastAPI boundary was the best architectural decision we made. The Ruby app doesn't need to know about PyTorch tensors — it just POSTs a JSON request and gets a forecast back.
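Since the boundary is plain HTTP, the Rails side of that call stays tiny; this sketch assumes a `/forecast` endpoint and payload shape that are illustrative, not the service's actual contract:

```ruby
require "net/http"
require "json"

# Hypothetical client for the Python forecasting service; endpoint, payload,
# and response shape are all assumptions for illustration.
def fetch_forecast(series, horizon: 14)
  uri  = URI("http://localhost:8000/forecast")
  body = { values: series, horizon: horizon, model: "chronos-tiny" }.to_json
  res  = Net::HTTP.post(uri, body, "Content-Type" => "application/json")
  JSON.parse(res.body)   # e.g. { "forecast" => [...], "quantiles" => {...} }
end
```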