Inspiration

Every Gemini project I built last year ended up with the same four problems once it left my laptop. I couldn't tell which prompt was burning my Vertex AI bill. I couldn't see when p95 latency started creeping up after a release. I couldn't tell when the model's output got longer or noisier. And I couldn't audit which external hosts my agent's tools had actually called. There are tools for each piece, but they either lock you into a hosted backend or ask you to install a giant APM agent. GeminiLens is the smallest thing that solves all four, locally, before you have to commit to a vendor.

What it does

GeminiLens wraps any Vertex AI Gemini client and produces a Trace record per call with prompt, response, token counts, latency, USD cost, and a list of tool invocations. It includes a rolling-vs-baseline drift report so you can see when latency, cost, or output length is shifting. An httpx-based egress allowlist enforces that agent tools can only reach approved hosts. A Streamlit dashboard renders the traces with live metrics, drift cards, and a timeline. For production, an optional Dynatrace exporter pushes every trace as a structured log event with full gen_ai.usage.* semantic conventions.
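As a rough illustration of the per-call record described above, here is a minimal sketch. The field names, the `observe` helper, and the stand-in model function are assumptions made for this example, not GeminiLens's actual API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Hypothetical per-call record; field names are illustrative."""
    prompt: str
    response: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    tool_calls: list[str] = field(default_factory=list)


def observe(call: Callable[[str], tuple[str, int, int]],
            prompt: str,
            price_fn: Callable[[int, int], float]) -> Trace:
    """Time one model call and package the result as a Trace."""
    start = time.perf_counter()
    text, tokens_in, tokens_out = call(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return Trace(prompt, text, tokens_in, tokens_out,
                 latency_ms, price_fn(tokens_in, tokens_out))


def fake_model(prompt: str) -> tuple[str, int, int]:
    """Stand-in for a real Gemini call: returns text plus token counts."""
    return ("pong", len(prompt.split()), 1)


# Price function is a placeholder here; a real one would use a cost table.
trace = observe(fake_model, "ping the model", lambda i, o: 0.0)
```

Keeping the record a plain dataclass means a dashboard or exporter can consume it without depending on the client that produced it.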

How I built it

  • google-genai for Vertex AI Gemini 2.5 calls
  • Streamlit + pandas for the dashboard
  • httpx custom transport for the egress allowlist
  • Pure stdlib for cost math and drift, so the math is reviewable
  • pytest covering cost, observer, guard, Azure adapter, and Dynatrace exporter
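The host check behind an egress allowlist can be sketched with the stdlib alone. In an httpx custom transport, a check like this would run inside `handle_request` before delegating to the wrapped transport. The names `EgressBlocked`, `check_egress`, and the example hosts are hypothetical, not the project's real identifiers.

```python
from urllib.parse import urlsplit


class EgressBlocked(Exception):
    """Raised when a tool tries to reach a host that is not approved."""


def check_egress(url: str, allowed_hosts: frozenset[str]) -> str:
    """Return the normalized host if it is allowlisted, else raise."""
    # urlsplit lowercases the hostname and strips port and userinfo.
    host = (urlsplit(url).hostname or "").lower()
    # Exact match only, so "api.example.com.evil.net" is rejected.
    if host not in allowed_hosts:
        raise EgressBlocked(f"egress to {host!r} is not on the allowlist")
    return host


ALLOWED = frozenset({"api.example.com"})
```

Exact-match comparison is the conservative choice: suffix or wildcard matching is more convenient but easier to get wrong.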

The Gemini cost table is hand-curated from Google's published pricing and is checked into the repo so reviewers can audit it without leaving GitHub. There's also an Azure OpenAI adapter that wraps the same Trace shape for projects that mix Gemini and Azure OpenAI.
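The cost math itself is small enough to sketch here. The per-million-token prices below are assumed figures for gemini-2.5-flash used only for illustration, and the table and function names are hypothetical. Check current published pricing before relying on them.

```python
# Assumed USD prices per 1M tokens as (input, output); illustrative only.
PRICES_PER_MILLION_USD = {
    "gemini-2.5-flash": (0.30, 2.50),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token counts times per-million prices, converted to USD."""
    price_in, price_out = PRICES_PER_MILLION_USD[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# 40 input tokens and 6 output tokens:
# (40 * 0.30 + 6 * 2.50) / 1e6 = 27 / 1e6
print(f"{cost_usd('gemini-2.5-flash', 40, 6):.6f}")  # prints 0.000027
```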

Challenges I ran into

google-genai's usage_metadata shape varies between client versions and between Vertex AI and the public Gemini API. The observer handles both. Drift on a small trace history is noisy, so I expose the window sizes and sample counts explicitly in the report rather than hiding them. Streamlit's WebSocket-driven rendering meant headless Chrome screenshots needed virtual-time budgets to capture a populated dashboard.
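To make the drift point concrete, here is one way a rolling-vs-baseline comparison could report its sample counts alongside the result. This is a sketch under my own assumptions (mean comparison, non-overlapping windows, the `drift_report` name and defaults), not necessarily the exact statistic GeminiLens computes.

```python
from statistics import fmean
from typing import Optional


def drift_report(values: list[float],
                 baseline_size: int = 50,
                 window_size: int = 20) -> dict:
    """Compare the most recent window against an earlier baseline."""
    if window_size < 1 or baseline_size < 1:
        raise ValueError("window sizes must be at least 1")
    recent = values[-window_size:]
    # Baseline is drawn only from values older than the recent window.
    baseline = values[:-window_size][:baseline_size]
    pct_change: Optional[float] = None
    if baseline and recent and fmean(baseline) != 0:
        base = fmean(baseline)
        pct_change = (fmean(recent) - base) / base * 100.0
    return {
        "baseline_n": len(baseline),   # exposed so small samples are visible
        "recent_n": len(recent),
        "baseline_mean": fmean(baseline) if baseline else None,
        "recent_mean": fmean(recent) if recent else None,
        "pct_change": pct_change,      # None when history is too short
    }


latencies_ms = [100.0] * 10 + [150.0] * 5
report = drift_report(latencies_ms, baseline_size=10, window_size=5)
```

Returning `None` instead of a number when there is no baseline avoids showing a confident-looking drift figure computed from almost no data.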

Accomplishments that I'm proud of

  • 19 passing tests covering the full public API
  • Cost calculator returned $0.000027 for a real 40-in/6-out gemini-2.5-flash call, matching the published price table to the last displayed digit
  • Auto-seed on cold dashboard load so a reviewer opening the URL fresh sees a populated UI with realistic drift cards
  • Self-observation: GeminiLens can wrap itself, recording the cost of its own demo runs

What I learned

The OpenInference and OpenTelemetry GenAI semantic conventions are still diverging. Picking conservative attribute names (gen_ai.usage.input_tokens) that work in both worlds keeps the Dynatrace exporter future-proof.
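As an illustration of the conservative naming, a trace could be flattened into log-event attributes roughly like this. Only the `gen_ai.*` keys come from the OpenTelemetry GenAI conventions. The `geminilens.*` keys and the `to_log_attributes` helper are hypothetical names for this sketch, and no Dynatrace API is shown.

```python
import json


def to_log_attributes(model: str, input_tokens: int, output_tokens: int,
                      latency_ms: float, cost_usd: float) -> dict:
    """Flatten one trace into a dict of log-event attributes."""
    return {
        # OpenTelemetry GenAI semantic-convention attribute names.
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Hypothetical custom namespace for fields without a standard name.
        "geminilens.latency_ms": latency_ms,
        "geminilens.cost_usd": cost_usd,
    }


attrs = to_log_attributes("gemini-2.5-flash", 40, 6, 812.5, 0.000027)
payload = json.dumps(attrs, sort_keys=True)  # JSON body for a log event
```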

What's next for GeminiLens

  • Multi-process trace stitching for distributed agents
  • Vector-store retrieval drift signal
  • One-click Arize Phoenix export
  • TrueFoundry exporter for the LLM observability sponsor track in this hackathon

Built With

  • azure-openai
  • dynatrace
  • gemini
  • gemini-2.5
  • google-genai
  • httpx
  • llm-observability
  • opentelemetry
  • pandas
  • python
  • streamlit
  • vertex-ai