🔍 Inspiration

As AI adoption accelerates, engineering teams are flying blind: they have no visibility into what their LLMs are actually doing in production. How much are we spending? Which model is faster? Why did that agent fail? I built LLMWatch to answer these questions.

🏗️ What I Built

LLMWatch is a full-stack B2B LLM observability and orchestration platform featuring:

  • Multi-Model Routing — Switch between self-hosted Qwen3.5-35B (via vLLM on AWS EC2) and Google Gemini 3 Flash with a single toggle (see the routing sketch after this list)
  • Real-Time Analytics — Track cost, latency, request volume, and error rates live
  • Reasoning Mode — See the LLM's chain-of-thought alongside responses
  • Autonomous ReAct Agent — 4 tools (web search, code execution, DB query, doc analysis) with real-time SSE streaming
  • Agent Trace Viewer — Full execution traces stored in DynamoDB with timeline visualization
  • MLflow Integration — Every LLM call logged for experiment tracking and model comparison
  • Multi-Tenant Security — JWT auth with company-scoped data isolation
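
The model toggle is simpler than it sounds because vLLM exposes an OpenAI-compatible API. Here is a minimal sketch of the routing logic; the endpoint URL, model ids, and the `use_self_hosted` flag are illustrative placeholders, not the exact production config:

```python
# Minimal routing sketch. The EC2 host, model ids, and toggle name are
# illustrative assumptions, not LLMWatch's actual configuration.
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI


def get_llm(use_self_hosted: bool):
    if use_self_hosted:
        # vLLM serves an OpenAI-compatible API, so ChatOpenAI can talk to it.
        return ChatOpenAI(
            base_url="http://my-ec2-host:8000/v1",  # hypothetical vLLM endpoint
            api_key="EMPTY",                        # vLLM ignores the key by default
            model="Qwen/Qwen3.5-35B",               # model id as served by vLLM
        )
    # Managed fallback: Gemini via the Google GenAI integration
    # (requires GOOGLE_API_KEY in the environment).
    return ChatGoogleGenerativeAI(model="gemini-3-flash")  # hypothetical model id


llm = get_llm(use_self_hosted=True)
print(llm.invoke("Summarize today's error rate.").content)
```

Because both branches return the same LangChain chat-model interface, the rest of the pipeline (agents, MLflow logging, analytics) never needs to know which backend served the call.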

⚙️ How I Built It

Backend: FastAPI + LangChain + MLflow + AWS DynamoDB + vLLM
Frontend: React 19 + TypeScript + TailwindCSS v4 + shadcn/ui + Framer Motion
Infrastructure: AWS EC2 (GPU for Qwen) + DynamoDB + Docker + Nginx
AI Tools: Google Antigravity (Gemini 3.1 Pro + Claude Sonnet 4.6)

🚧 Challenges

  • Implementing real-time SSE streaming for the ReAct agent while still persisting complete traces to DynamoDB (first sketch below)
  • Self-hosting Qwen3.5-35B-A3B on EC2 with vLLM and 4-bit quantization (second sketch below)
  • Building a multi-tenant architecture where every query is company-scoped (third sketch below)
  • Completing a production-grade full-stack platform solo in 24 hours
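
For the SSE challenge, the pattern that worked was a single async generator that both streams events to the browser and buffers them for one DynamoDB write at the end. A minimal sketch, assuming a hypothetical `agent_traces` table and a stand-in `run_agent` generator:

```python
# SSE streaming sketch. The table name, event shape, and run_agent generator
# are assumptions for illustration, not the real agent loop.
import json

import boto3
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
table = boto3.resource("dynamodb").Table("agent_traces")  # hypothetical table


async def run_agent(prompt: str):
    # Stand-in for the real ReAct loop; yields one dict per step.
    yield {"type": "thought", "text": f"Planning for: {prompt}"}
    yield {"type": "final", "text": "Done."}


@app.get("/agent/stream")
async def stream(prompt: str, trace_id: str):
    async def event_source():
        steps = []
        async for event in run_agent(prompt):
            steps.append(event)                     # buffer for the trace record
            yield f"data: {json.dumps(event)}\n\n"  # SSE wire format
        # One write at the end; a production version would offload this
        # blocking boto3 call to a thread.
        table.put_item(Item={"trace_id": trace_id, "steps": steps})

    return StreamingResponse(event_source(), media_type="text/event-stream")
```

For the self-hosting challenge, vLLM's quantization support does the heavy lifting. A sketch of the offline API, assuming a hypothetical 4-bit AWQ checkpoint (the production deployment would serve the same model behind vLLM's OpenAI-compatible server instead):

```python
# Quantized-model loading sketch. The checkpoint name and memory settings
# are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-35B-A3B-AWQ",  # hypothetical 4-bit AWQ checkpoint
    quantization="awq",                # use vLLM's AWQ kernels
    max_model_len=8192,                # cap context to fit GPU memory
    gpu_memory_utilization=0.9,        # leave a little headroom
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

For multi-tenancy, the rule is that the company id comes from the verified JWT, never from the request body, and it doubles as the DynamoDB partition key. A sketch with hypothetical table, secret, and claim names:

```python
# Company-scoped query sketch. The table, signing secret, and claim name
# are assumptions for illustration.
import boto3
import jwt  # PyJWT
from boto3.dynamodb.conditions import Key
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
table = boto3.resource("dynamodb").Table("llm_requests")  # hypothetical table


def current_company(authorization: str = Header(...)) -> str:
    try:
        claims = jwt.decode(
            authorization.removeprefix("Bearer "),
            "change-me",            # hypothetical HS256 signing secret
            algorithms=["HS256"],
        )
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
    return claims["company_id"]


@app.get("/requests")
def list_requests(company_id: str = Depends(current_company)):
    # Partition key = company_id, so a tenant can only ever read its own rows.
    resp = table.query(KeyConditionExpression=Key("company_id").eq(company_id))
    return resp["Items"]
```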

📚 What I Learned

  • MLOps patterns for production LLM deployments
  • LangChain ReAct agent architecture with custom callback handlers (see the sketch below)
  • AWS infrastructure design for AI workloads
  • The power of AI-assisted development with Google Antigravity
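
The callback-handler lesson in particular is worth a sketch: LangChain lets you observe a ReAct agent's tool calls without touching the agent code itself. Event and field names here are my own, not LLMWatch's exact trace schema:

```python
# Custom callback sketch: record tool activity as trace steps.
# The step dict shape is an illustrative assumption.
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler


class TraceHandler(BaseCallbackHandler):
    def __init__(self) -> None:
        self.steps: list[dict[str, Any]] = []

    def on_tool_start(self, serialized: dict, input_str: str, **kwargs: Any) -> None:
        # Fires when the agent invokes any of its tools.
        self.steps.append({
            "event": "tool_start",
            "tool": serialized.get("name"),
            "input": input_str,
        })

    def on_tool_end(self, output: Any, **kwargs: Any) -> None:
        self.steps.append({"event": "tool_end", "output": str(output)})


# Usage: pass the handler at invocation time, then persist handler.steps:
# agent_executor.invoke({"input": "..."}, config={"callbacks": [handler]})
```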
