What It Does

Bob is a multi-agent AI system that continuously monitors your cloud infrastructure and acts on what it finds:

  • Anomaly Detection: Scans AWS cost records and CloudWatch metrics for spending spikes, unusual patterns, and resource anomalies across all monitored services
  • Cost Forecasting: Predicts future costs using the Amazon Chronos and Lag-Llama time-series ML models, flagging services that are trending upward before they blow the budget
  • Root Cause Investigation: Correlates anomalies with GitHub commits, application logs, and deployment timelines, building a "commit → deploy → spike → error" timeline
  • Automated Solutions: Generates prioritized recommendations with estimated savings, and can auto-create GitHub Pull Requests with concrete code/config fixes
  • Slack Alerts: Notifies the team via Slack webhook when critical or high-severity anomalies are detected
  • Interactive Chat: Real-time streaming chat (SSE) with the orchestrator agent; ask questions in natural language and get answers backed by live data, charts, and Plotly visualizations

The Background Analyst runs autonomously on a 3-minute cycle: scan → forecast → recommend → investigate → enrich → alert. Every finding is pushed to the frontend in real time via Server-Sent Events (SSE).
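In code, that cycle is little more than a timed loop in a thread. A condensed sketch, with hypothetical stage-method names and Ruby's thread-safe Queue standing in for the mutex-protected array the SSE endpoint actually drains:

```ruby
# Sketch of the Background Analyst cycle; method names are illustrative,
# not the project's actual identifiers.
FINDINGS_QUEUE = Queue.new  # drained by the SSE endpoint

Thread.new do
  loop do
    begin
      findings  = scan_anomalies       # AWS cost records + CloudWatch metrics
      findings += forecast_costs       # Chronos / Lag-Llama via the FastAPI service
      findings += recommend_fixes      # prioritized recommendations with savings
      findings += investigate_causes   # correlate commits, logs, deploy timelines
      enrich(findings)                 # attach charts, severity, context
      alert_slack(findings)            # webhook for critical/high severity
      findings.each { |f| FINDINGS_QUEUE << f }
    rescue => e
      Rails.logger.error("analyst cycle failed: #{e.message}")
    end
    sleep 180  # 3-minute cycle
  end
end
```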


Key Design Decisions

  • LangchainRB + Gemini: We chose to run all agent logic in Ruby to keep the stack unified. The agents use LangchainRB's tool-calling system with Google Gemini as the LLM — fast, cheap, and surprisingly good at structured JSON output (a tool-calling sketch follows this list).
  • Background Analyst as a Ruby Thread: Instead of a separate worker process, the analyst runs as a single thread inside Puma. Simple, zero-infrastructure, and it pushes alerts to an in-memory queue that the SSE endpoint drains (see the controller sketch after this list).
  • Python only for ML: The only Python code is the forecasting microservice. Chronos and Lag-Llama require PyTorch, which doesn't exist in Ruby — so we isolated them behind a clean FastAPI boundary.
  • Agents create GitHub PRs: The Solution Agent doesn't just recommend fixes — it can call create_pull_request to commit file changes to a branch and open a real PR on your repo.
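To make the first decision concrete, here is the general shape of a LangchainRB tool-backed agent talking to Gemini. The tool, its schema, and the prompt are hypothetical, and the exact Assistant/ToolDefinition API may vary by langchainrb version:

```ruby
require "langchain"

# Hypothetical tool; the project's real tools and schemas differ.
class CostScanTool
  extend Langchain::ToolDefinition

  define_function :scan_costs, description: "Scan cost records for spikes" do
    property :service, type: "string", description: "AWS service name", required: true
  end

  def scan_costs(service:)
    { service: service, anomalies: [] }  # would query cost_records here
  end
end

llm = Langchain::LLM::GoogleGemini.new(api_key: ENV["GEMINI_API_KEY"])

assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You are a cloud cost analyst. Reply with raw JSON only.",
  tools: [CostScanTool.new]
)

assistant.add_message(content: "Scan EC2 for cost spikes")
assistant.run(auto_tool_execution: true)
```

And for the second decision, the draining side can be a Rails streaming controller. A single-consumer simplification built on ActionController::Live, reusing the FINDINGS_QUEUE from the loop sketch above:

```ruby
class FindingsController < ApplicationController
  include ActionController::Live

  # GET /findings/stream: long-lived SSE connection
  def stream
    response.headers["Content-Type"] = "text/event-stream"
    sse = ActionController::Live::SSE.new(response.stream, event: "finding")
    loop do
      sse.write(FINDINGS_QUEUE.pop) until FINDINGS_QUEUE.empty?  # JSON-encodes each finding
      sleep 1  # cheap poll so the Puma thread isn't spinning
    end
  rescue ActionController::Live::ClientDisconnected, IOError
    # client went away; nothing to clean up beyond the stream
  ensure
    sse&.close
  end
end
```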

Tech Stack

  • Frontend: React 19, Vite 6, TailwindCSS 4, Recharts, Plotly.js, Lucide icons, Motion
  • Backend: Ruby on Rails 8.1, Puma, SSE streaming, SQLite
  • AI/LLM: Google Gemini 2.0 Flash via LangchainRB, multi-agent orchestration
  • ML Forecasting: Python FastAPI, Amazon Chronos (tiny/small), Lag-Llama (probabilistic)
  • Integrations: AWS Cost Explorer, CloudWatch, GitHub API (commits + PRs), Slack webhooks
  • Data: SQLite with 6 tables: cost_records, cloudwatch_metrics, log_events, commits, anomalies_detected, resource_tags

🚧 Challenges We Ran Into

  1. Getting agents to produce clean JSON: LLMs love to wrap JSON in markdown code fences, add explanatory prose, or hallucinate extra fields. We spent significant time crafting prompts that reliably produce raw JSON arrays/objects, and wrote robust extract_json_array / extract_json_object parsers as fallbacks (a reconstruction of such a parser follows this list).

  2. Agent timeout management: Each sub-agent can call multiple tools in a chain (list resources → query metrics → query costs → generate chart). On a cold start or with large datasets, this can exceed 2 minutes. We added per-agent timeouts with graceful degradation — if an agent times out, it returns whatever partial result it has (see the timeout sketch after this list).

  3. SSE streaming in Rails: Implementing real-time Server-Sent Events for both the chat interface and the background analyst feed required careful attention to thread safety. The background analyst pushes to a mutex-protected in-memory array, and the SSE controller polls it on an interval to avoid blocking Puma threads.

  4. Chronos + Lag-Llama on CPU: The ML models are designed for GPU inference. Getting Amazon Chronos and Lag-Llama to run acceptably on CPU-only hackathon machines required careful model size selection (tiny/small) and batching strategies.

  5. Multi-agent coordination: The orchestrator needs to decide which sub-agents to call and in what order, passing context between them. Getting the orchestrator to reliably follow the "scan → investigate → solve → forecast" pipeline without skipping steps or going in loops was one of the trickiest prompt engineering challenges.
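For challenge 1, the fallback-parser pattern is: strip code fences, then parse the first balanced span. This is our reconstruction of an extract_json_object-style helper, not the project's actual code:

```ruby
require "json"

# Reconstruction of an extract_json_object-style fallback: strip markdown
# code fences, then parse the first balanced {...} span in the reply.
def extract_json_object(text)
  cleaned = text.gsub(/`{3}(?:json)?/i, "")  # drop code fences
  start = cleaned.index("{")
  return nil unless start

  depth = 0
  cleaned.chars.each_with_index do |ch, i|
    next if i < start
    depth += 1 if ch == "{"
    depth -= 1 if ch == "}"
    return JSON.parse(cleaned[start..i]) if depth.zero?
  end
  nil
rescue JSON::ParserError
  nil
end
```

For challenge 2, Ruby's stdlib Timeout gives the basic shape of a per-agent budget with graceful degradation; the agent interface here (run, partial_result) is hypothetical:

```ruby
require "timeout"

AGENT_BUDGET_SECONDS = 120  # illustrative per-agent budget

# Run an agent under a hard deadline; on timeout, fall back to whatever
# partial result it accumulated before the deadline hit.
def run_with_deadline(agent)
  Timeout.timeout(AGENT_BUDGET_SECONDS) { agent.run }
rescue Timeout::Error
  agent.partial_result
end
```

Note that Timeout raises asynchronously inside the block, so in practice you also want timeouts on the underlying HTTP clients; this sketch only shows the degradation path.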


🏆 Accomplishments We're Proud Of

  • Fully autonomous analysis loop: The Background Analyst runs 24/7 with zero human intervention — scanning, forecasting, investigating, and enriching findings every 3 minutes. It's not a chatbot you have to ask; it finds problems on its own.

  • End-to-end from anomaly to PR: Bob can detect a cost spike, trace it to a specific commit, generate a fix, and open a GitHub Pull Request — completely autonomously. That's the full DevOps loop closed by AI.

  • Real-time streaming UI: Every agent thought, tool call, finding, and recommendation streams live to the frontend via SSE. You can watch the AI think in real-time — see it scan resources, query metrics, correlate commits, and build its analysis step by step.

  • Multi-model forecasting: We didn't settle for one model. Users get predictions from both Amazon Chronos (deterministic) and Lag-Llama (probabilistic with confidence intervals), with a comparison view to assess forecast reliability.

  • Clean multi-agent architecture: 5 agents, each with a focused role, composable tools, and a shared database — all orchestrated by a single conductor that maintains conversation context across sessions.


What We Learned

  • Ruby is underrated for AI agents: LangchainRB + Gemini is a surprisingly productive combo. Ruby's expressiveness made the agent code readable and maintainable, and Rails' conventions kept the project organized even as it grew to 5 agents, 8 tools, and 6 services.

  • Isolating ML behind an HTTP boundary pays off: Putting Chronos and Lag-Llama behind a simple FastAPI boundary was the best architectural decision we made. The Ruby app doesn't need to know about PyTorch tensors; it just POSTs a JSON request and gets a forecast back (sketched below).
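That boundary is plain HTTP + JSON, so the Ruby side stays tiny. A minimal sketch using only the standard library; the endpoint path, model names, and payload shape are assumptions for illustration, not the service's documented contract:

```ruby
require "json"
require "net/http"

# Hypothetical forecast request: recent daily costs in, predictions out.
uri = URI("http://localhost:8000/forecast")  # the FastAPI microservice
payload = {
  model:   "chronos-tiny",                 # or "lag-llama" (illustrative names)
  series:  [112.4, 118.9, 131.7, 129.2],   # recent daily costs in USD
  horizon: 7                               # days to forecast
}

response = Net::HTTP.post(uri, payload.to_json, "Content-Type" => "application/json")
forecast = JSON.parse(response.body)
puts forecast
```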