ObservAI

Open-source AI observability platform designed to make large language models (LLMs) transparent, reliable, and actionable in real-world systems.


🧠 Inspiration

While building production AI applications with Gemini / Vertex AI, I realized something scary:
LLMs fail silently.

Prompts silently consume thousands of extra tokens, costs spike without warning, hallucinations creep in, and semantic drift happens — yet most teams ship without any visibility. Traditional observability tools stop at infra metrics like latency and error codes. LLMs are probabilistic systems, and they need a new layer of observability that understands AI behavior, not just server metrics.

That gap is what inspired ObservAI.

This challenge became even more relevant in the context of global mobility and digital inclusion — where reliable, efficient AI can help build tools for translation, documentation, collaboration, and intelligent workflows that make it easier for people to work and interact across borders.


🚀 What ObservAI Does

ObservAI is an enterprise-grade LLM observability platform for production AI systems.

It provides comprehensive tracking across your AI infrastructure:

  • 📊 Token usage & latency metrics
  • 💰 Real-time cost attribution per request
  • 🎯 Quality metrics (coherence, toxicity, hallucination risk)
  • 🔍 40 AI/ML detection rules for anomalies
  • 🤖 Lyra — AI-powered prompt optimizer

All data is processed through a secure backend for real-time analytics, alerts, and actionable insights to help teams build AI services that are trustworthy and scalable.


🏗️ How I Built It

Architecture overview:

```
User App → ObservAI SDK → Supabase Edge Function → PostgreSQL → Dashboard
```

This architecture ensures zero developer friction while enabling deep LLM observability.

1. SDK Layer

  • Lightweight TypeScript/JavaScript SDK (npm: @observai/sdk)
  • Wraps AI calls and intercepts all LLM interactions
  • Collects telemetry (tokens, latency, cost, quality scores)
  • Batches and sends telemetry asynchronously for minimal overhead
  • Cross-platform support: frontend, backend, AWS Lambda, Docker
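The wrapping pattern described above can be sketched as follows. This is an illustrative sketch, not the actual `@observai/sdk` API: the names (`withTelemetry`, `TelemetryEvent`) and the token-count callback are assumptions, but the shape — time the call, extract telemetry, queue the event for batched async delivery — matches the design described here.

```typescript
// Hypothetical sketch of the SDK's wrapping pattern: measure latency
// around any async LLM call and queue a telemetry event for batched,
// asynchronous delivery off the hot path.

interface TelemetryEvent {
  model: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  timestamp: number;
}

const queue: TelemetryEvent[] = [];

async function withTelemetry<T>(
  model: string,
  call: () => Promise<T>,
  countTokens: (result: T) => { input: number; output: number }
): Promise<T> {
  const start = Date.now();
  const result = await call();
  const tokens = countTokens(result);
  queue.push({
    model,
    latencyMs: Date.now() - start,
    inputTokens: tokens.input,
    outputTokens: tokens.output,
    timestamp: start,
  });
  // the queue is flushed in batches by a background sender,
  // so the wrapped call returns without waiting on the network
  return result;
}
```

Because the wrapper is generic over the call's return type, it can sit in front of any provider client without changing application code.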

2. Quality Analysis Engine

  • Lightweight heuristic + NLP scoring
  • Semantic awareness for coherence and hallucination risk
  • Scores normalized to the range [0, 1]
  • Toxicity and prompt injection detection
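As a minimal sketch of what a lightweight heuristic score can look like, the function below uses the unique-word ratio as a crude coherence proxy and clamps the result into [0, 1]. The real engine combines several signals; this single heuristic is an illustrative assumption, not the shipped scoring code.

```typescript
// Illustrative heuristic quality score, normalized to [0, 1].
// Highly repetitive output (a common degenerate LLM failure mode)
// scores low; varied output scores high.

function clamp01(x: number): number {
  return Math.max(0, Math.min(1, x));
}

function coherenceScore(text: string): number {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const unique = new Set(words).size;
  // ratio of unique words to total words as a repetition penalty
  return clamp01(unique / words.length);
}
```

Keeping every signal on the same [0, 1] scale is what lets the engine combine heuristics and compare scores across requests.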

3. Ingestion Backend

  • Supabase Edge Functions for serverless ingestion
  • PostgreSQL with strong row-level security
  • Batched writes with validation
  • Real-time anomaly detection hooks

4. Real-Time Dashboard

  • Modern React/Vite/Tailwind web interface
  • Live metrics visualization per request
  • Token usage, latency percentiles, cost attribution
  • Alerts and root-cause insights
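Latency percentiles of the kind the dashboard shows (p50/p95/p99 over recent requests) can be computed with the nearest-rank method, sketched below. This is an illustrative helper under that assumption, not the dashboard's actual implementation.

```typescript
// Nearest-rank percentile over a window of latency samples (ms).
// Sorts a copy so the caller's sample buffer is left untouched.

function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

Percentiles matter more than averages here: a healthy p50 can hide a p99 that has quietly tripled.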

5. AI-Specific Detection Rules & Lyra

  • Suite of 40 AI/ML detection rules that catch:
    • Hallucination risk
    • Prompt injection attempts
    • Cost spikes and token inefficiencies
    • Latency anomalies
    • Semantic drift over time
    • Toxic output patterns
  • Lyra — data-driven prompt optimizer using live metrics to suggest improved prompts
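A detection rule can be modeled as a small interface over a window of recent metrics, sketched below. The `DetectionRule` shape and the cost-spike example (flag when the latest request costs more than 3× the recent average) are hypothetical illustrations of the pattern, not one of the 40 shipped rules verbatim.

```typescript
// Illustrative shape for an AI/ML detection rule: each rule inspects
// a sliding window of recent metrics and reports whether it fired.

interface MetricWindow {
  costsUsd: number[]; // per-request cost in USD, oldest first
}

interface DetectionRule {
  name: string;
  detect(window: MetricWindow): boolean;
}

const costSpikeRule: DetectionRule = {
  name: "cost-spike",
  detect({ costsUsd }) {
    if (costsUsd.length < 2) return false;
    const latest = costsUsd[costsUsd.length - 1];
    const prior = costsUsd.slice(0, -1);
    const avg = prior.reduce((sum, c) => sum + c, 0) / prior.length;
    // fire when the latest request costs more than 3x the window average
    return latest > 3 * avg;
  },
};
```

Because every rule shares one interface, new checks (semantic drift, latency anomalies, toxicity patterns) plug into the same evaluation loop.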

⚠️ Challenges Faced

  • Defining meaningful AI quality metrics
    Traditional monitoring tools lack semantic awareness. I had to design new metrics (coherence, hallucination risk, toxicity) that correlate with real AI quality, not just infrastructure health.

  • Balancing performance and overhead
    Collecting telemetry without slowing down AI inference required efficient batching and optional async sending — making observability invisible to developers.

  • Security & Privacy
    Ensuring telemetry doesn't leak sensitive user data while still providing useful insights required careful design of row-level security and data handling.

  • SDK Usability
    Making the SDK drop-in and cross-platform required careful API design and strong TypeScript typing to ensure developers could adopt it in minutes.

Each challenge pushed me to think about AI observability as fundamentally different from traditional observability — it's behavioral first, not infrastructure first.


📚 What I Learned

  • AI behavior must be monitored at a semantic level, not just as requests and responses
  • LLMs should be treated as production systems, not just APIs
  • Developers need not just alerts, but actionable insights — a key reason why Lyra is part of this system
  • Open-source tools must be simple to adopt: this is why ObservAI's SDK installation can be done in minutes
  • Small SDK design decisions massively affect developer adoption

I also dove deeply into AI evaluation techniques, client-side instrumentation patterns, and secure serverless backend design.

Most importantly, I learned how to design AI-native infrastructure.


🧩 Built With

Languages

  • TypeScript
  • JavaScript

Frontend

  • React
  • Vite
  • Tailwind CSS

LLM Platform

  • Google Gemini on Vertex AI
  • Compatible with multiple LLM providers

Backend / Infrastructure

  • Supabase Edge Functions
  • PostgreSQL
  • Serverless architecture

SDK & Tooling

  • ObservAI SDK (@observai/sdk)
  • Node.js
  • npm package distribution

🌍 Impact & Global Relevance

ObservAI is designed to help build the next generation of AI-powered tools that are:

  • More cost-efficient through prompt optimization and usage insights
  • More reliable in production with real-time anomaly detection
  • Safer by detecting toxicity and prompt injections
  • Better at providing meaningful responses through quality monitoring

These improvements matter especially for tools that support global mobility, cross-cultural communication, and collaborative workflows. By providing developers with real insights into how their AI behaves in the real world, ObservAI enables creation of AI services that are trustworthy and scalable — critical for global accessibility and inclusion.


🧾 What's Next

  • Automated AI anomaly mitigation with smart alerting
  • Plugin ecosystem for custom observability rules
  • Expanded LLM support (OpenAI, Anthropic, and more)
  • Predictive AI failure warnings using historical patterns
  • Live dashboards & real-time alerts for production teams
  • Prompt regression detection to catch quality degradation
  • Team & org-level analytics for enterprise use cases

ObservAI is built for teams shipping real AI to real users.


📦 Installation

```
npm install @observai/sdk
```

📄 License

Open source (MIT)


Built with ❤️ for the VisaVerse AI Hackathon
