🔍 Inspiration

As AI adoption accelerates, engineering teams are flying blind: they have no visibility into what their LLMs are actually doing in production. How much are we spending? Which model is faster? Why did that agent fail? I built LLMWatch to answer these questions.

🏗️ What I Built

LLMWatch is a full-stack B2B LLM observability and orchestration platform featuring:

  • Multi-Model Routing — Switch between self-hosted Qwen3.5-35B (via vLLM on AWS EC2) and Google Gemini 3 Flash with a single toggle (see the routing sketch after this list)
  • Real-Time Analytics — Track cost, latency, request volume, and error rates live
  • Reasoning Mode — See the LLM's chain-of-thought alongside responses
  • Autonomous ReAct Agent — 4 tools (web search, code execution, DB query, doc analysis) with real-time SSE streaming
  • Agent Trace Viewer — Full execution traces stored in DynamoDB with timeline visualization
  • MLflow Integration — Every LLM call logged for experiment tracking and model comparison
  • Multi-Tenant Security — JWT auth with company-scoped data isolation
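
The model toggle is simpler than it sounds because vLLM exposes an OpenAI-compatible API. Here is a minimal sketch of the routing logic; the endpoint URL, model ids, and the `use_self_hosted` flag are illustrative placeholders, not the exact production config:

```python
# Minimal routing sketch. The EC2 host, model ids, and toggle name are
# illustrative assumptions, not LLMWatch's actual configuration.
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI


def get_llm(use_self_hosted: bool):
    if use_self_hosted:
        # vLLM serves an OpenAI-compatible API, so ChatOpenAI can talk to it.
        return ChatOpenAI(
            base_url="http://my-ec2-host:8000/v1",  # hypothetical vLLM endpoint
            api_key="EMPTY",                        # vLLM ignores the key by default
            model="Qwen/Qwen3.5-35B",               # model id as served by vLLM
        )
    # Managed fallback: Gemini via the Google GenAI integration
    # (requires GOOGLE_API_KEY in the environment).
    return ChatGoogleGenerativeAI(model="gemini-3-flash")  # hypothetical model id


llm = get_llm(use_self_hosted=True)
print(llm.invoke("Summarize today's error rate.").content)
```

Because both branches return the same LangChain chat-model interface, the rest of the pipeline (agents, MLflow logging, analytics) never needs to know which backend served the call.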

⚙️ How I Built It

Backend: FastAPI + LangChain + MLflow + AWS DynamoDB + vLLM
Frontend: React 19 + TypeScript + TailwindCSS v4 + shadcn/ui + Framer Motion
Infrastructure: AWS EC2 (GPU for Qwen) + DynamoDB + Docker + Nginx
AI Tools: Google Antigravity (Gemini 3.1 Pro + Claude Sonnet 4.6)

🚧 Challenges

  • Implementing real-time SSE streaming for the ReAct agent while still persisting complete traces to DynamoDB (first sketch below)
  • Self-hosting Qwen3.5-35B-A3B on EC2 with vLLM and 4-bit quantization (second sketch below)
  • Building a multi-tenant architecture where every query is company-scoped (third sketch below)
  • Completing a production-grade full-stack platform solo in 24 hours
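
For the SSE challenge, the pattern that worked was a single async generator that both streams events to the browser and buffers them for one DynamoDB write at the end. A minimal sketch, assuming a hypothetical `agent_traces` table and a stand-in `run_agent` generator:

```python
# SSE streaming sketch. The table name, event shape, and run_agent generator
# are assumptions for illustration, not the real agent loop.
import json

import boto3
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
table = boto3.resource("dynamodb").Table("agent_traces")  # hypothetical table


async def run_agent(prompt: str):
    # Stand-in for the real ReAct loop; yields one dict per step.
    yield {"type": "thought", "text": f"Planning for: {prompt}"}
    yield {"type": "final", "text": "Done."}


@app.get("/agent/stream")
async def stream(prompt: str, trace_id: str):
    async def event_source():
        steps = []
        async for event in run_agent(prompt):
            steps.append(event)                     # buffer for the trace record
            yield f"data: {json.dumps(event)}\n\n"  # SSE wire format
        # One write at the end; a production version would offload this
        # blocking boto3 call to a thread.
        table.put_item(Item={"trace_id": trace_id, "steps": steps})

    return StreamingResponse(event_source(), media_type="text/event-stream")
```

For the self-hosting challenge, vLLM's quantization support does the heavy lifting. A sketch of the offline API, assuming a hypothetical 4-bit AWQ checkpoint (the production deployment would serve the same model behind vLLM's OpenAI-compatible server instead):

```python
# Quantized-model loading sketch. The checkpoint name and memory settings
# are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-35B-A3B-AWQ",  # hypothetical 4-bit AWQ checkpoint
    quantization="awq",                # use vLLM's AWQ kernels
    max_model_len=8192,                # cap context to fit GPU memory
    gpu_memory_utilization=0.9,        # leave a little headroom
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

For multi-tenancy, the rule is that the company id comes from the verified JWT, never from the request body, and it doubles as the DynamoDB partition key. A sketch with hypothetical table, secret, and claim names:

```python
# Company-scoped query sketch. The table, signing secret, and claim name
# are assumptions for illustration.
import boto3
import jwt  # PyJWT
from boto3.dynamodb.conditions import Key
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
table = boto3.resource("dynamodb").Table("llm_requests")  # hypothetical table


def current_company(authorization: str = Header(...)) -> str:
    try:
        claims = jwt.decode(
            authorization.removeprefix("Bearer "),
            "change-me",            # hypothetical HS256 signing secret
            algorithms=["HS256"],
        )
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
    return claims["company_id"]


@app.get("/requests")
def list_requests(company_id: str = Depends(current_company)):
    # Partition key = company_id, so a tenant can only ever read its own rows.
    resp = table.query(KeyConditionExpression=Key("company_id").eq(company_id))
    return resp["Items"]
```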

📚 What I Learned

  • MLOps patterns for production LLM deployments
  • LangChain ReAct agent architecture with custom callback handlers (see the sketch below)
  • AWS infrastructure design for AI workloads
  • The power of AI-assisted development with Google Antigravity
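
The callback-handler lesson in particular is worth a sketch: LangChain lets you observe a ReAct agent's tool calls without touching the agent code itself. Event and field names here are my own, not LLMWatch's exact trace schema:

```python
# Custom callback sketch: record tool activity as trace steps.
# The step dict shape is an illustrative assumption.
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler


class TraceHandler(BaseCallbackHandler):
    def __init__(self) -> None:
        self.steps: list[dict[str, Any]] = []

    def on_tool_start(self, serialized: dict, input_str: str, **kwargs: Any) -> None:
        # Fires when the agent invokes any of its tools.
        self.steps.append({
            "event": "tool_start",
            "tool": serialized.get("name"),
            "input": input_str,
        })

    def on_tool_end(self, output: Any, **kwargs: Any) -> None:
        self.steps.append({"event": "tool_end", "output": str(output)})


# Usage: pass the handler at invocation time, then persist handler.steps:
# agent_executor.invoke({"input": "..."}, config={"callbacks": [handler]})
```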
