Usher-in

Inspiration

We built this because LLM integration is currently a "black box" for developers. Many teams have little real-time visibility into why a Gemini request is slow or how many tokens are being burned per user session. We wanted to turn Gemini 2.5 Pro from a blind API call into a production-monitored asset.

What it does

Our project, LLM Pulse, provides a high-fidelity "War Room" for AI operations. It captures every request to the Gemini API and exposes critical health metrics:

Operational Success: Real-time monitoring of API success vs. failure rates.

Performance Benchmarking: P95 latency tracking to identify bottlenecks in generative content production.

Cost Management: Aggregated token consumption tracking to prevent budget overruns.
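As a rough illustration of the three metrics above, the sketch below computes a success rate, a nearest-rank P95 latency, and a token total from a batch of request records. The record keys (`ok`, `latency_s`, `total_tokens`) and the function name are assumptions made for this example; in the actual project these aggregations are done by Datadog, not by application code.

```python
def summarize(requests):
    """Summarize a batch of request records.

    Each record is a dict with hypothetical keys:
    'ok' (bool), 'latency_s' (float), 'total_tokens' (int).
    """
    if not requests:
        raise ValueError("need at least one request")

    n = len(requests)
    latencies = sorted(r["latency_s"] for r in requests)

    # Nearest-rank P95: the smallest sample such that at least 95% of
    # samples are at or below it. Integer ceiling avoids float rounding.
    rank = (95 * n + 99) // 100
    p95 = latencies[rank - 1]

    success_rate = sum(1 for r in requests if r["ok"]) / n
    total_tokens = sum(r["total_tokens"] for r in requests)

    return {
        "success_rate": success_rate,
        "p95_latency_s": p95,
        "total_tokens": total_tokens,
    }
```

P95 is used here rather than the mean because a handful of very slow generations can hide behind a healthy-looking average.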

How we built it

The system is built on FastAPI and integrated with Google Vertex AI. We used the Datadog ddtrace library for deep instrumentation.

Telemetry: We engineered custom span attributes to capture total_tokens and prompt_length.
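A minimal sketch of how such attributes might be attached, assuming the span object exposes a `set_metric(key, value)` method as ddtrace spans do. The helper name `annotate_llm_span` and the `RecordingSpan` stand-in are hypothetical and exist only so the example runs without a Datadog agent.

```python
def annotate_llm_span(span, prompt, total_tokens):
    """Attach prompt_length and total_tokens to a span as numeric values.

    `span` is any object with a set_metric(key, value) method.
    Values are coerced to int so they are never sent as strings.
    """
    span.set_metric("prompt_length", len(prompt))
    span.set_metric("total_tokens", int(total_tokens))


class RecordingSpan:
    """Stand-in span for local testing; stores metrics in a dict."""

    def __init__(self):
        self.metrics = {}

    def set_metric(self, key, value):
        self.metrics[key] = value


span = RecordingSpan()
# The token count arrives as a string here to show the int() coercion.
annotate_llm_span(span, "Summarize this report.", "128")
print(span.metrics)
```

In a FastAPI handler, the span would presumably come from something like `with tracer.trace("gemini.generate") as span:` using ddtrace's tracer; the operation name shown is a placeholder.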

Visualization: We built a custom dashboard using Datadog’s Query Value and Timeseries widgets to aggregate these spans into actionable business intelligence. We quantify performance density as the ratio of total tokens to total latency in seconds: $$\text{Performance Density} = \frac{\sum \text{total\_tokens}}{\sum \text{latency}_{\text{sec}}}$$
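The performance density ratio can be checked by hand with a few lines of Python. This is a sketch under the assumption that each span is available as a dict with `total_tokens` and `latency_sec` keys; those key names and the function name are illustrative, not part of the project.

```python
def performance_density(spans):
    """Return sum(total_tokens) / sum(latency_sec) over the given spans."""
    total_tokens = sum(s["total_tokens"] for s in spans)
    total_latency = sum(s["latency_sec"] for s in spans)
    if total_latency <= 0:
        raise ValueError("total latency must be positive")
    return total_tokens / total_latency


spans = [
    {"total_tokens": 300, "latency_sec": 2.0},
    {"total_tokens": 900, "latency_sec": 4.0},
]
print(performance_density(spans))  # 1200 tokens / 6.0 s = 200.0
```

Because this is a ratio of sums, it weights long requests more heavily than an average of per-request ratios would: the two spans above have individual ratios of 150 and 225, whose mean is 187.5 rather than 200.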

Challenges we ran into

The biggest technical hurdle was data type integrity. Datadog initially treated our numeric token data as "Dimension" strings rather than numbers, which blocked mathematical aggregation. We had to redefine our trace facets as Measures to enable the Sum and Rate functions. We also had to debug large latency spikes (over 1 minute) by isolating cold-start traces from actual model failures.

Accomplishments that we're proud of

We achieved end-to-end observability of the request path. We can trace a single user prompt from the initial HTTP call all the way through the LLM generation and back, with the token consumption and latency of every request accounted for on a single dashboard.

What we learned

We learned that simply "sending data" to an observability platform is useless if it isn't structured for math. We also learned that in the world of GenAI, Latency is the new Uptime. A request that takes 1.26 minutes (about 76 seconds) is a failure, even if the status code says "200 OK."
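One way to encode "Latency is the new Uptime" is to count a request as successful only if it both returns a 2xx status and finishes within a latency budget. The sketch below assumes a 30-second budget purely for illustration; the function name and the threshold are not taken from the project.

```python
def is_effective_success(status_code, latency_s, slo_s=30.0):
    """Treat a request as successful only if it is 2xx AND within the latency budget.

    slo_s is an assumed threshold; tune it to the product's tolerance.
    """
    return 200 <= status_code < 300 and latency_s <= slo_s


# 1.26 minutes is 75.6 seconds: a "200 OK" that still counts as a failure.
print(is_effective_success(200, 1.26 * 60))
print(is_effective_success(200, 2.0))
```

Feeding this flag into the success-rate metric, instead of the raw status code, keeps slow responses from inflating the reported health of the service.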

What's next for LLM Pulse

We plan to implement Automated Cost-Killswitches. Using the token consumption data we've unlocked, we will build a circuit breaker that automatically throttles users or switches to smaller models (like Gemini Flash) if a specific budget threshold is reached in a single hour.
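A minimal sketch of what such a circuit breaker could look like, assuming a fixed one-hour window and a single global budget. The class name, the model identifiers passed in, and the injectable clock are all assumptions for illustration; per-user budgets or throttling would need additional state.

```python
import time


class HourlyTokenBudget:
    """Fixed-window token budget that falls back to a smaller model when exhausted."""

    WINDOW_SECONDS = 3600

    def __init__(self, limit, primary_model, fallback_model, clock=time.monotonic):
        self.limit = limit
        self.primary_model = primary_model
        self.fallback_model = fallback_model
        self._clock = clock  # injectable so the window can be tested without waiting
        self._window_start = clock()
        self._used = 0

    def _roll_window(self):
        # Reset the counter once the current one-hour window has elapsed.
        now = self._clock()
        if now - self._window_start >= self.WINDOW_SECONDS:
            self._window_start = now
            self._used = 0

    def record(self, total_tokens):
        """Add the tokens consumed by a completed request to the current window."""
        self._roll_window()
        self._used += total_tokens

    def choose_model(self):
        """Return the primary model until the budget is reached, then the fallback."""
        self._roll_window()
        if self._used >= self.limit:
            return self.fallback_model
        return self.primary_model
```

A caller would ask `choose_model()` before each request and call `record()` with the `total_tokens` value already captured on the span. A fixed window is the simplest choice; a sliding window would avoid the burst that can occur right after each reset.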

Built With

datadog
ddtrace
fastapi
gemini-api
google-cloud-aiplatform
google-vertex-ai
python

Updates

Yafet Melka started this project — Dec 31, 2025 03:18 PM EST
