Inspiration

  • Empty shelves and overstock waste kill margins in grocery. We wanted an agent that learns seasonality, lead-time drift, and its own past mistakes, then proves the improvement with measurable evals.

What it does

  • Seeds a realistic supermarket dataset (2 years, seasonal demand, multiple SKUs/suppliers).
  • Shows live stock + recent POs in a browser (demo.html) and triggers the agent via Airia.
  • Scores agent performance with Braintrust (lead-time accuracy, stockouts, waste) across six historical cycles to show improvement.

How we built it

  • simulate.py generates products, suppliers, sales history, and six learning cycles, then loads everything into Supabase.
  • demo.html (Supabase JS) pulls live stock/orders and calls Airia through a tiny Python proxy.
  • braintrust_eval.py pulls delivered orders from Supabase, computes per-cycle scores, and pushes experiments to Braintrust.
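
The per-cycle scoring step can be sketched roughly as below. This is a minimal illustration, not the actual braintrust_eval.py: the field names (cycle, predicted_lead_days, actual_lead_days, stocked_out, units_wasted, units_ordered) and the exact score formulas are assumptions, and the Supabase query and Braintrust upload are left out so the scoring logic stands on its own.

```python
from collections import defaultdict
from statistics import mean


def score_cycles(orders):
    """Group delivered orders by cycle and compute three scores in [0, 1].

    All field names and formulas here are hypothetical placeholders.
    """
    by_cycle = defaultdict(list)
    for order in orders:
        by_cycle[order["cycle"]].append(order)

    scores = {}
    for cycle, rows in sorted(by_cycle.items()):
        # Lead-time accuracy: 1 minus the mean relative error, floored at 0.
        lead_error = mean(
            abs(r["predicted_lead_days"] - r["actual_lead_days"])
            / max(r["actual_lead_days"], 1)
            for r in rows
        )
        # Share of orders in this cycle that ended in a stockout.
        stockout_rate = mean(1.0 if r["stocked_out"] else 0.0 for r in rows)
        # Wasted units as a fraction of everything ordered in this cycle.
        waste_rate = sum(r["units_wasted"] for r in rows) / max(
            sum(r["units_ordered"] for r in rows), 1
        )
        scores[cycle] = {
            "lead_time_accuracy": max(0.0, 1.0 - lead_error),
            "stockout_avoidance": 1.0 - stockout_rate,
            "waste_avoidance": 1.0 - waste_rate,
        }
    return scores
```

Framing all three metrics as "higher is better" values between 0 and 1 is a design choice made for this sketch; it makes the improvement across cycles easy to read on a single chart.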

Challenges we ran into

  • Getting Airia to return clean JSON arrays consistently; responses could arrive wrapped in Markdown code fences or error wrappers.
  • Balancing demand realism (seasonal spikes, perishables) with fast seeding for demos.
  • Keeping evals “live” (computed from delivered orders on every run) while avoiding hardcoded scores.
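
For the JSON problem, one way to harden the parsing side is to strip any code fence, unwrap a known envelope, and then check the shape before trusting the result. The sketch below is an assumption about how that could look, not the project's proxy code: the "orders" envelope key and the required keys sku and quantity are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime to keep this example fence-safe
FENCE_RE = re.compile(rf"^{FENCE}[\w-]*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)
REQUIRED_KEYS = {"sku", "quantity"}  # hypothetical order schema


def parse_orders(raw: str) -> list:
    """Return a validated list of order dicts, or raise ValueError."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        # Keep only the content between the opening and closing fence.
        text = match.group(1).strip()

    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError

    # Unwrap a hypothetical envelope such as {"orders": [...]}.
    if isinstance(data, dict) and isinstance(data.get("orders"), list):
        data = data["orders"]
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of orders")

    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each order must be a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"order is missing keys: {sorted(missing)}")
    return data
```

Anything that fails these checks, including an error wrapper with no orders array, raises ValueError, so the caller can retry or skip the cycle instead of writing a malformed PO.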

Accomplishments that we're proud of

  • A complete loop: seeded data → live agent trigger → database updates → eval scores that show a clear improvement arc (cycles 1–6).
  • Plain-English agent reasoning stored on each PO for judge transparency.
  • Seasonal intelligence baked into the simulator (milk in Dec, flu meds in Oct–Dec, soup in autumn, etc.).
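
The seasonal behaviour can be illustrated with a simple month-based multiplier. This is only a sketch of the idea, not the logic in simulate.py: the category names, the boost values, and the noise level are invented for illustration, and autumn is assumed to mean September through November.

```python
import random

# Hypothetical boost factors per category and month (1 = Jan ... 12 = Dec).
SEASONAL_BOOST = {
    "milk": {12: 1.4},
    "flu_meds": {10: 1.5, 11: 1.8, 12: 1.6},
    "soup": {9: 1.2, 10: 1.4, 11: 1.3},
}


def expected_daily_demand(category: str, base_units: float, month: int) -> float:
    """Baseline demand scaled by the category's seasonal boost, if any."""
    boost = SEASONAL_BOOST.get(category, {}).get(month, 1.0)
    return base_units * boost


def simulate_day(category: str, base_units: float, month: int,
                 rng: random.Random, noise: float = 0.15) -> int:
    """Draw one day of unit sales around the seasonal expectation."""
    mean_units = expected_daily_demand(category, base_units, month)
    # Gaussian noise proportional to the mean, clamped so sales are never negative.
    return max(0, round(rng.gauss(mean_units, noise * mean_units)))


# Usage: a seeded generator keeps demo data reproducible between runs.
rng = random.Random(42)
december_milk = [simulate_day("milk", 100, 12, rng) for _ in range(7)]
```

Passing in a seeded random.Random instead of using the module-level functions keeps seeding fast and repeatable, which matters when the same dataset has to be rebuilt for every demo.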

What we learned

  • Strict output contracts matter for agent pipelines: validating against a schema and stripping code fences avoids brittle parses.
  • Eval clarity beats model complexity: three simple metrics (lead time, stockouts, waste) tell a convincing story.
  • Simulated data needs intentional “interesting” states (one SKU always near red) to make demos compelling.

What's next for StockPulse

  • Re-enable the hourly/Airia simulation loop once the webhook returns validated orders arrays.
  • Add anomaly alerts (supplier slippage, sudden demand spikes) and push to Slack.
  • Swap the public Supabase key in demo.html to an anon key for safer sharing.
  • Expand Braintrust runs to cover fresh incoming cycles automatically (nightly cron).

Built With

  • airia
  • braintrust
  • google-deepmind
  • supabase