Inspiration

Inspired by the revenue e-commerce businesses lose to undetected anomalies in recommendation systems, such as latency spikes that lead to cart abandonment. We aimed to automate monitoring with Vertex AI and Datadog to speed up remediation.

What it does

Detects anomalies in Vertex AI recommendation engines, such as latency spikes (>200 ms), model drift, and fraud patterns. Streams telemetry to Datadog, visualizes it on dashboards, and triggers detection rules that open incidents with context for engineers.
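As a minimal sketch of two of these checks, the Python below flags requests slower than the 200 ms latency threshold and estimates drift in a model's score distribution with the population stability index (PSI). PSI is our illustrative choice of drift metric, not necessarily what the project used; the function names, the bin count, and the sample data are hypothetical. It uses only the standard library.

```python
import math
from typing import List, Sequence


def latency_spikes(latencies_ms: Sequence[float], threshold_ms: float = 200.0) -> List[int]:
    """Return the indices of requests slower than the latency threshold."""
    return [i for i, v in enumerate(latencies_ms) if v > threshold_ms]


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline and a current sample, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps log() finite when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical data: recommendation scores spread over [0, 1) vs. scores collapsed near 0.9.
baseline_scores = [i / 100 for i in range(100)]
drifted_scores = [0.9] * 100
print(latency_spikes([120, 250, 199, 201]))  # [1, 3]
print(population_stability_index(baseline_scores, baseline_scores))  # 0.0
print(population_stability_index(baseline_scores, drifted_scores) > 0.25)  # True
```

A PSI above roughly 0.25 is a commonly cited rule of thumb for significant drift, though the right cutoff depends on the data and how many bins are used.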

How we built it

Used Vertex AI with Gemini models for the recommendation engine. Integrated the Datadog API for metric streaming, dashboard creation, and rule-based alerts. Simulated data with Python, used scikit-learn for detection, and deployed on Google Cloud, with a demo on Vercel.
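A minimal sketch of the metric-streaming step, assuming Datadog's v2 metrics submission endpoint (`POST /api/v2/series`) on the US1 site and plain `urllib` instead of Datadog's official client library, which the project may have used. The metric name `recsys.latency_ms`, the tags, and the helper `build_series_request` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Iterable, Optional

# Datadog metrics intake (v2) for the US1 site; other Datadog sites use a different host.
DD_SERIES_URL = "https://api.datadoghq.com/api/v2/series"
GAUGE = 3  # v2 metric type codes: 0=unspecified, 1=count, 2=rate, 3=gauge


def build_series_request(
    api_key: str,
    metric: str,
    value: float,
    tags: Iterable[str],
    timestamp: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST that submits one gauge point to Datadog."""
    point_time = int(time.time()) if timestamp is None else int(timestamp)
    payload = {
        "series": [
            {
                "metric": metric,
                "type": GAUGE,
                "points": [{"timestamp": point_time, "value": float(value)}],
                "tags": list(tags),
            }
        ]
    }
    return urllib.request.Request(
        DD_SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


# Hypothetical metric name and tags for a recommendation-latency gauge.
req = build_series_request(
    api_key="<your-api-key>",
    metric="recsys.latency_ms",
    value=231.0,
    tags=["env:demo", "service:recommendations"],
)
# Sending would be urllib.request.urlopen(req); it is not called here so the sketch runs offline.
```

Building the request separately from sending it keeps the payload easy to unit-test; a production streamer would also batch points and retry on transient failures.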

Challenges we ran into

Integrating real-time telemetry from Vertex AI into Datadog; handling noisy data so that anomaly rules stay accurate; simulating realistic e-commerce anomalies without access to production data.
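One common way to keep noisy latency data from flooding the alert channel is to require several consecutive threshold breaches before alerting. The sketch below assumes the 200 ms threshold and a hypothetical `min_consecutive=3`; it simulates a latency series with a one-off outlier and a sustained spike, and only the sustained spike fires. The simulator and both function names are illustrative, not the project's code.

```python
import random
from typing import List, Sequence


def simulate_latencies(n: int = 100, seed: int = 7) -> List[float]:
    """Hypothetical simulator: ~120 ms baseline noise, one isolated outlier, one sustained spike."""
    if n < 60:
        raise ValueError("n must be at least 60 to fit the injected spike")
    rng = random.Random(seed)
    series = [max(0.0, rng.gauss(120.0, 15.0)) for _ in range(n)]
    series[20] = 500.0  # a single slow request: noise, not an incident
    for i in range(50, 60):
        series[i] = 400.0 + rng.gauss(0.0, 20.0)  # ten slow requests in a row
    return series


def debounced_alerts(
    latencies_ms: Sequence[float],
    threshold_ms: float = 200.0,
    min_consecutive: int = 3,
) -> List[int]:
    """Return the index at which each run of consecutive breaches first reaches min_consecutive."""
    alerts: List[int] = []
    run = 0
    for i, value in enumerate(latencies_ms):
        if value > threshold_ms:
            run += 1
            if run == min_consecutive:
                alerts.append(i)  # alert once per episode, not once per slow request
        else:
            run = 0
    return alerts


print(debounced_alerts(simulate_latencies()))
```

Raising `min_consecutive` trades detection delay for fewer false positives, which is the same alert-versus-noise balance the thresholds have to strike.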

Accomplishments that we're proud of

Built end-to-end observability for LLM apps; achieved a low false-positive rate in detection; created actionable dashboards that cut incident response time by 50%.

What we learned

Deepened our knowledge of Vertex AI telemetry, Datadog workflows, and anomaly detection techniques; learned how much threshold tuning matters for balancing useful alerts against noise.

What's next for AI-Powered Anomaly Detection for E-Commerce

Add ML-based adaptive thresholds; integrate with more partners; expand to fraud detection in transactions; open-source components for community contributions.
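The roadmap mentions ML-based adaptive thresholds. As a simpler statistical stand-in (not ML, and not the planned design), the sketch below adapts the alert band to recent behavior using an exponentially weighted moving average (EWMA) of the mean and variance, and flags points more than `k` standard deviations from the mean. `ewma_anomalies` and its defaults for `alpha`, `k`, and `warmup` are hypothetical.

```python
import math
from typing import List, Sequence


def ewma_anomalies(
    values: Sequence[float],
    alpha: float = 0.1,
    k: float = 3.0,
    warmup: int = 10,
) -> List[int]:
    """Flag points more than k EWMA standard deviations away from the EWMA mean."""
    mean = float(values[0])
    var = 0.0
    flagged: List[int] = []
    for i in range(1, len(values)):
        x = float(values[i])
        std = math.sqrt(var)
        if i >= warmup and std > 0 and abs(x - mean) > k * std:
            flagged.append(i)
            continue  # keep anomalies out of the baseline so one spike doesn't widen the band
        # Incremental exponentially weighted mean and variance.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return flagged


# Hypothetical latency samples (ms) with one spike at index 12.
latencies = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 300, 100]
print(ewma_anomalies(latencies))  # [12]
```

Skipping flagged points keeps a single spike from inflating the band, but a genuine level shift would then keep firing; a real adaptive-threshold system would need a rule for re-baselining.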
