Siamese

A secure incident analysis tool powered by Google's Gemini.

Siamese started from a practical SRE pain point: incident response is often slowed down by manual log parsing, scattered evidence, and pressure to produce clear root-cause summaries quickly. I wanted a tool that could take raw artifacts (logs, metrics, config changes), analyze them in one place, and return a structured incident report that is immediately useful to engineers and stakeholders.

I built Siamese as a browser-first workspace using React + TypeScript + Vite, with Gemini as the analysis engine. The app ingests uploaded artifacts, combines them with a focused incident question, and asks the model for a strict JSON report with these fields: summary, timeline, root_causes, evidence, mitigations, follow_ups, and confidence. The app also supports model selection so outputs can be compared across different Gemini variants.
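To keep the report dependable on the client side, the app can validate each response against the expected shape before rendering it. The sketch below shows one way to do that: the field names match the report fields listed above, but the exact types (for example, string arrays for the timeline and a 0 to 1 confidence score) are assumptions for illustration, not the app's actual definitions.

```typescript
// Assumed shape of the structured incident report; field names match the
// prompt's schema, field types are illustrative.
interface IncidentReport {
  summary: string;
  timeline: string[];
  root_causes: string[];
  evidence: string[];
  mitigations: string[];
  follow_ups: string[];
  confidence: number; // assumed to be a score in [0, 1]
}

// Runtime type guard: rejects any response missing a field or using a
// wrong type, so malformed model output never reaches the UI.
function isIncidentReport(value: unknown): value is IncidentReport {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const stringArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((s) => typeof s === "string");
  return (
    typeof v.summary === "string" &&
    stringArray(v.timeline) &&
    stringArray(v.root_causes) &&
    stringArray(v.evidence) &&
    stringArray(v.mitigations) &&
    stringArray(v.follow_ups) &&
    typeof v.confidence === "number" &&
    v.confidence >= 0 &&
    v.confidence <= 1
  );
}
```

A guard like this pairs naturally with `JSON.parse` on the raw model text: parse first, then validate, and surface a clear error state if either step fails.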

A major part of the work was improving reliability and reproducibility, not just UI polish. I added deterministic model settings, validation and error-handling improvements, and a testing pipeline with unit/component tests plus Playwright smoke tests. I also ran side-by-side model evaluations (Gemini 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash Lite, and 2.5 Flash) to measure quality differences and tune prompts toward cleaner evidence formatting and more realistic confidence scoring.
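"Deterministic model settings" here mostly means pinning every sampling knob before a comparison run. The helper below is a minimal sketch of that idea; the field names (`temperature`, `topP`, `seed`, `responseMimeType`) follow common Gemini-style generation configs, but the exact SDK surface and the project's real values are assumptions.

```typescript
// Illustrative generation-config shape; not the project's actual types.
interface GenerationConfig {
  temperature: number;
  topP: number;
  seed: number;
  responseMimeType: string;
}

// Build a pinned config so repeated runs on identical inputs are comparable
// across models. seed only helps on providers that honor it.
function deterministicConfig(seed = 42): GenerationConfig {
  return {
    temperature: 0, // no sampling randomness
    topP: 1, // full distribution (moot at temperature 0)
    seed, // fixed seed for repeatable comparison demos
    responseMimeType: "application/json", // request strict JSON output
  };
}
```

Centralizing the config in one function also means every model under evaluation receives identical settings, so output differences reflect the model, not the knobs.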

Key challenges:

  • Keeping outputs consistently structured across different models.
  • Preventing noisy or fragmented evidence formatting in generated JSON.
  • Balancing detail vs readability for demo/judging use.
  • Hardening runtime behavior for missing config/env and provider errors.
  • Ensuring deterministic, repeatable behavior suitable for comparison demos.
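The config-hardening bullet above boils down to failing fast with a clear message instead of letting a missing key surface as an opaque provider error. A minimal sketch of that pattern, where the variable name `VITE_GEMINI_API_KEY` is a plausible placeholder rather than the project's actual setting:

```typescript
// Read a required config value or throw a descriptive startup error.
// Passing the env object in (rather than reading import.meta.env directly)
// keeps the helper testable outside the Vite runtime.
function requireEnv(
  env: Record<string, string | undefined>,
  key: string
): string {
  const value = env[key];
  if (!value || value.trim() === "") {
    throw new Error(
      `Missing required config "${key}". Set it in your .env file before starting the app.`
    );
  }
  return value;
}

// Example call site (placeholder key name):
// const apiKey = requireEnv(import.meta.env, "VITE_GEMINI_API_KEY");
```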

What I learned:

  • Prompt constraints and schema validation are critical for dependable LLM outputs.
  • Different models can vary significantly in signal-to-noise even on identical inputs.
  • Determinism and robust fallback/error handling matter as much as raw model quality in production-style workflows.
  • Testing AI-assisted products still benefits from classic software testing discipline (unit + UI + e2e).
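One concrete instance of the fallback-handling lesson: models sometimes wrap otherwise-valid JSON in a markdown fence, so a tolerant extraction step avoids discarding a usable report. The function below is an illustrative sketch of that pattern, not the app's real code.

```typescript
// Tolerantly extract JSON from raw model text: strip an optional
// markdown code fence, then parse; return null so the caller can
// fall back to an error state or a retry.
function extractJson(raw: string): unknown {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = (fenced ? fenced[1] : raw).trim();
  try {
    return JSON.parse(candidate);
  } catch {
    return null;
  }
}
```

Because the function is pure, it is trivially unit-testable, which is exactly where the classic testing discipline mentioned above pays off.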