Datadog Dashboard

Project Story: Grocery AI Agent

About the Project

Grocery AI Agent is an intelligent kitchen management system that goes beyond simple list-making. It actively manages your kitchen inventory, acts as an AI sous-chef for recipe suggestions, and—most importantly—autonomously restocks your pantry by purchasing items on Amazon when supplies run low. It bridges the gap between digital "smart lists" and physical world actions.

Inspiration

We've all been there: staring at an empty fridge at 7 PM, realizing we forgot the one ingredient needed for dinner. I wanted to build a system that didn't just tell me I was out of milk, but actually did something about it. The inspiration was to create a truly "agentic" experience where the AI takes ownership of a task from start to finish—from "I need eggs" to "Order Confirmed".

What I Learned

The Power of Observability: Building an autonomous agent is scary. Is it ordering 500 eggs? Is it leaking my address? Integrating Datadog taught me how to gain confidence in AI agents by tracking every decision they make.
AI orchestration: Coordinating Vision models (for inventory scanning) and LLMs (for chat/logic) with traditional code requires robust error handling and fallback strategies.
Real-world Automation is messy: Websites change, selectors break. I learned to build resilient Selenium scripts that can handle dynamic DOMs.

How I Built It

The application is a modern full-stack implementation with a heavy focus on agentic workflows and observability:

1. The Core Stack

Frontend: Built with React and Vite for a snappy, premium user experience, styled with Tailwind CSS.
Backend: FastAPI (Python) serving as the brain, managing state with Firebase authentication and Firestore database.
AI Brain: Google Gemini models used for both Computer Vision (analyzing photos of groceries) and Conversational Logic (chatting with the user).

2. The Agent (Restocker)

Automation: A Python-based Selenium agent that logs into Amazon, searches for low-stock items, and adds them to the cart. It runs as a background task, triggered automatically by inventory levels.

3. Datadog Integration (The "Secret Sauce")

APM Tracing: Every API call, database query, and AI generation is traced. We can see exactly how long Gemini takes to respond versus how long Firebase takes to fetch data.
Custom Metrics: We track business metrics like cart.order_value, privacy.pii_redacted, and security.attacks_blocked.
AI Monitoring: We tag traces with user.id and prompt.tokens to monitor AI costs and usage patterns per user.
RUM: Real User Monitoring tracks the frontend experience, ensuring the "Add to Cart" button actually works for users.

Datadog Dashboard & Observability Deep Dive

The heart of this project is the Grocery AI - Observability & Security Dashboard. Here is exactly what we added to maintain 99.9% reliability:

1. Real User Monitoring (RUM) & User Sentiment

What it tracks: Every click on the frontend, including "Frustration Signals" like Rage Clicks (when a user repeatedly clicks a non-responsive button) and Dead Clicks.
Why it helps: This is our early warning system for "AI Hallucination" on the UI side. If a user tries to click "Confirm Order" but the AI generated invalid JSON that broke the button, RUM captures the user's frustration immediately, allowing us to revert the bad model prompt.

2. LLM Monitoring: Cost & Token Usage

We treat the "AI Brain" as a measurable infrastructure component:

Token Usage Tracking: We log llm.usage.tokens for every interaction.
Cost Control: By correlating User IDs with High Token usage on the dashboard, we can identify "Power Users" or "Stuck Loops" that are draining the budget.
Quality vs. Cost: The dashboard shows a direct correlation between Gemini API Latency and Token Count, helping us optimize our system prompts to be concise yet effective.

3. Service Level Objectives (SLOs)

We defined two critical SLOs to ensure we meet user expectations:

[Grocery AI] Backend Availability SLO: Target 99.0%. This ensures that our FastAPI server isn't crashing or returning 500 errors.
[Grocery AI] Gemini API Latency SLO: Target 95% of requests < 2 seconds. Since users expect instant chat responses, this tracks the performance of the Google Gemini model.

3. Security Monitoring System & Case Management

We implemented a robust security layer that doesn't just alert, but opens Datadog Cases for incident tracking. By appending @case to our monitor messages, we move from "Alert Fatigue" to structured "Incident Response."

We deployed 3 Critical Monitors to cover the AI Safety triad:

1. [Security] PII Leak Detected:
- Trigger: If our defined regex detects a phone number or email in the LLM prompt that wasn't successfully redacted by the code.
- Action: Opens a Medium Severity Case. The team investigates why the PII scrubber failed and manually purges the trace data.
2. [Security] Prompt Injection Blocked:
- Trigger: Detects adversarial attempts (e.g., "Ignore all previous instructions" or "System Override").
- Action: Opens a Critical Severity Case. This is an active attack. We immediately block the user.id and analyze the adversarial prompt to harden our System Instructions.
3. [Business] High Value Fraud Suspicion ("The Rogue Agent Stopper"):
- Trigger: If the AI Agent attempts to add an item >$1000 (like Gold Bars or high-end electronics) to the Amazon Cart.
- Action: Opens a High Severity Case for manual approval. The automation is paused until a human verifies it's a legitimate user request, preventing financial loss.

Challenges Faced & Solutions

Challenge 1: The "Black Box" of AI

Problem: In the beginning, it was impossible to know why the AI suggested a wrong recipe or misidentified a fruit.
Solution: I implemented Datadog APM to trace the exact prompt sent to Gemini and the raw response received. I created a dashboard to correlate "Poor Bot Responses" with specific prompt templates, allowing me to iterate on the system prompt rapidly.

Challenge 2: Trust & Security

Problem: Users are hesitant to let an AI handle purchasing or see personal data.
Solution: I built a "Traffic Generator" to simulate bad actors (Hackers, PII leakers) and implemented a PII scrubbing layer. Using Datadog Custom Metrics, I visualized blocked attacks and redacted phone numbers in real-time, proving the system is secure.

Challenge 3: Flaky Automation

Problem: The Amazon restocker script would sometimes fail silently if a CSS selector changed.
Solution: I wrapped the Selenium agent with ddtrace. Now, if the agent fails to find the "Add to Cart" button, it throws a tagged error span in Datadog. I set up a Monitor to alert me immediately if the agent.failure_rate spikes, turning a silent failure into a managed incident.

How to Run It

Want to try it yourself? Run it locally. Here is how to spin up the full Agentic Stack:

Prerequisites

Python 3.10+
Node.js 18+
Datadog API & APP Keys
Google Gemini API Key
Firebase Credentials (serviceAccountKey.json)

1. Backend Setup

cd backend
python -m venv .venv
# Windows: .venv\Scripts\activate
# Mac/Linux: source .venv/bin/activate
pip install -r requirements.txt

# Set Environment Variables
$env:DD_API_KEY="<your_key>"
$env:GOOGLE_API_KEY="<your_key>"
# Run with Datadog Tracer
ddtrace-run uvicorn main:app --reload

2. Frontend Setup

cd frontend
npm install
npm run dev
# App will operate at localhost:5173

3. Generate Traffic (Simulate Attacks)

# This script simulates Normal users, Hackers, and Fraudsters!
ddtrace-run python traffic_gen.py

Vote of Thanks

A huge thank you to Google used for the powerful Gemini Models that serve as the brain of this agent, making complex reasoning possible with incredible speed.

And a special thanks to Datadog for the observability platform. Without Datadog, this agent would be a black box. The ability to trace a request from a React button click, through a FastAPI backend, into a Google Gemini API call, and back—all while monitoring for security threats—is a game changer for building Agentic AI.

Built With

Updates

Arkajyoti Dey started this project — Dec 31, 2025 04:58 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.