Inspiration
- Empty shelves and overstock waste kill margins in grocery. We wanted an agent that learns seasonality, lead-time drift, and past mistakes—then proves it with measurable evals.
What it does
- Seeds a realistic supermarket dataset (2 years, seasonal demand, multiple SKUs/suppliers).
- Shows live stock and recent purchase orders (POs) in a browser (demo.html) and triggers the agent via Airia.
- Scores agent performance with Braintrust (lead-time accuracy, stockouts, waste) across six historical cycles to show improvement.
How we built it
- simulate.py generates products, suppliers, sales history, and six learning cycles, then loads everything into Supabase.
- demo.html (Supabase JS) pulls live stock and orders and calls Airia through a tiny Python proxy.
- braintrust_eval.py pulls delivered orders from Supabase, computes per-cycle scores, and pushes experiments to Braintrust.
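The writeup does not show the "tiny Python proxy". Below is a minimal standard-library sketch of what such a proxy might look like. It assumes the browser POSTs JSON that is forwarded unchanged to an Airia webhook, and that the webhook URL comes from a hypothetical AIRIA_WEBHOOK_URL environment variable. Any authentication headers Airia requires are omitted. The CORS headers let demo.html call the proxy from the browser, and the webhook URL stays server-side instead of in the page source.

```python
"""Minimal CORS-enabled proxy from the browser to an Airia webhook (sketch)."""
import json
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Hypothetical variable name; the real webhook URL and auth are not published.
AIRIA_WEBHOOK_URL = os.environ.get("AIRIA_WEBHOOK_URL", "")


def forward_to_airia(body: bytes) -> bytes:
    """POST the raw JSON body to the webhook and return the raw response body."""
    req = urllib.request.Request(
        AIRIA_WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def make_handler(forward: Callable[[bytes], bytes]) -> type[BaseHTTPRequestHandler]:
    """Build a request handler that relays POST bodies through `forward`."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def _cors(self) -> None:
            # Allow demo.html (opened from any origin) to call this proxy.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
            self.send_header("Access-Control-Allow-Headers", "Content-Type")

        def do_OPTIONS(self) -> None:
            # CORS preflight: headers only, no body.
            self.send_response(204)
            self._cors()
            self.end_headers()

        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                payload = forward(body)
                status = 200
            except Exception as exc:  # report upstream failures as 502 JSON
                payload = json.dumps({"error": str(exc)}).encode()
                status = 502
            self.send_response(status)
            self._cors()
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return ProxyHandler


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the proxy until interrupted."""
    ThreadingHTTPServer((host, port), make_handler(forward_to_airia)).serve_forever()
```

To try it locally, call serve() and point the page's fetch at http://127.0.0.1:8000/. Passing the forwarding function into make_handler keeps the upstream call swappable, so the relay and CORS behaviour can be tested with a stub instead of a live webhook.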
Challenges we ran into
- Getting Airia to return clean JSON arrays consistently (the LLM output often arrived wrapped in Markdown code fences or error wrappers).
- Balancing demand realism (seasonal spikes, perishables) with fast seeding for demos.
- Keeping evals “live” while avoiding hardcoded scores.
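The JSON challenge above is usually handled by stripping fences and then validating against a fixed contract before anything touches the database. The following is a minimal sketch of that step, not StockPulse's actual parser. It assumes the agent is asked for a JSON array of orders with hypothetical fields sku, supplier_id, quantity and reasoning (the real schema is not published). It also assumes an "error wrapper" is a JSON object with an error key, or one that nests the array under an orders key.

```python
import json
import re
from typing import Any

# Hypothetical order contract: field name -> required Python type.
ORDER_FIELDS: dict[str, type] = {
    "sku": str,
    "supplier_id": str,
    "quantity": int,
    "reasoning": str,
}

# Finds the first ```lang ... ``` (or bare ```) fenced block anywhere in the reply.
_FENCE = re.compile(r"```[a-zA-Z0-9_-]*\s*(.*?)\s*```", re.DOTALL)


def strip_fences(text: str) -> str:
    """Return the contents of the first fenced block, or the trimmed text."""
    match = _FENCE.search(text)
    return match.group(1) if match else text.strip()


def parse_orders(raw: str) -> list[dict[str, Any]]:
    """Parse an agent reply into a validated list of order dicts.

    Raises ValueError (json.JSONDecodeError is a subclass) on anything that
    breaks the contract, so a malformed reply never reaches the database.
    """
    data = json.loads(strip_fences(raw))
    # Reject {"error": ...} envelopes and unwrap {"orders": [...]}.
    if isinstance(data, dict):
        if "error" in data:
            raise ValueError(f"agent returned an error: {data['error']}")
        data = data.get("orders")
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of orders")
    for i, order in enumerate(data):
        if not isinstance(order, dict):
            raise ValueError(f"order {i} is not an object")
        for field, typ in ORDER_FIELDS.items():
            value = order.get(field)
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, typ) or isinstance(value, bool):
                raise ValueError(f"order {i}: {field!r} missing or not {typ.__name__}")
        if order["quantity"] <= 0:
            raise ValueError(f"order {i}: quantity must be positive")
    return data
```

Failing loudly with ValueError, rather than returning a partial list, makes a bad agent run visible instead of silently writing half an order batch.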
Accomplishments that we're proud of
- A complete loop: seeded data → live agent trigger → database updates → eval scores that show a clear improvement arc (cycles 1–6).
- Plain-English agent reasoning stored on each PO for judge transparency.
- Seasonal intelligence baked into the simulator (milk in Dec, flu meds in Oct–Dec, soup in autumn, etc.).
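The simulator's seasonal logic is not published. One simple way to encode the examples above is a per-category monthly multiplier on a baseline daily demand, with Poisson noise for day-to-day variation. The category names, multiplier values, baseline and function names below are illustrative assumptions, not StockPulse's actual parameters.

```python
import math
import random
from datetime import date, timedelta

# Illustrative month -> demand multiplier per category (1.0 = baseline).
# Values are assumptions chosen to mirror the examples in the text.
SEASONALITY: dict[str, dict[int, float]] = {
    "milk": {12: 1.4},                         # December spike
    "flu_meds": {10: 1.6, 11: 1.8, 12: 2.0},   # Oct-Dec flu season
    "soup": {9: 1.3, 10: 1.5, 11: 1.5},        # autumn (Sep-Nov, northern hemisphere)
}


def seasonal_multiplier(category: str, day: date) -> float:
    """Demand multiplier for a category on a given day (1.0 if not seasonal)."""
    return SEASONALITY.get(category, {}).get(day.month, 1.0)


def poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's method; fine for small daily rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_daily_sales(
    category: str, baseline: float, start: date, days: int, seed: int = 0
) -> list[tuple[date, int]]:
    """Generate (day, units_sold) rows with seasonal demand and Poisson noise."""
    rng = random.Random(seed)  # seeded so demo data is reproducible
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        rows.append((day, poisson(baseline * seasonal_multiplier(category, day), rng)))
    return rows
```

A fixed seed keeps re-seeded demo databases identical between runs. Knuth's method is only practical for modest rates (its cost grows with the rate), which suits per-SKU daily sales.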
What we learned
- Strict output contracts matter for agent pipelines: a schema plus fence stripping avoids brittle parsing.
- Eval clarity beats model complexity: three simple metrics (lead time, stockouts, waste) tell a convincing story.
- Simulated data needs intentionally "interesting" states (for example, one SKU always hovering near the red low-stock state) to make demos compelling.
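The writeup names the three metrics but not their formulas. Below is a minimal sketch of per-cycle scoring over delivered POs. It assumes hypothetical per-PO fields (cycle, predicted_lead_days, actual_lead_days, stockout_days, units_ordered, units_wasted) and maps each metric to a 0-1 score where higher is better, the range Braintrust scores use. The definitions and the 14-day stockout window are illustrative choices, and pushing the results to Braintrust is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class DeliveredPO:
    cycle: int
    predicted_lead_days: float
    actual_lead_days: float
    stockout_days: int   # hypothetical: days at zero stock while this PO was open
    units_ordered: int
    units_wasted: int    # hypothetical: units that expired before sale


def lead_time_accuracy(po: DeliveredPO) -> float:
    """1.0 for a perfect prediction, falling linearly to 0 at 100% relative error."""
    if po.actual_lead_days <= 0:
        return 1.0 if po.predicted_lead_days <= 0 else 0.0
    err = abs(po.predicted_lead_days - po.actual_lead_days) / po.actual_lead_days
    return max(0.0, 1.0 - err)


def stockout_score(po: DeliveredPO, window_days: int = 14) -> float:
    """Fraction of an assumed 14-day window without a stockout."""
    return max(0.0, 1.0 - po.stockout_days / window_days)


def waste_score(po: DeliveredPO) -> float:
    """Fraction of ordered units that were not wasted."""
    if po.units_ordered == 0:
        return 1.0
    return max(0.0, 1.0 - po.units_wasted / po.units_ordered)


def score_cycles(pos: list[DeliveredPO]) -> dict[int, dict[str, float]]:
    """Average each metric per learning cycle, ordered by cycle number."""
    by_cycle: dict[int, list[DeliveredPO]] = defaultdict(list)
    for po in pos:
        by_cycle[po.cycle].append(po)
    return {
        cycle: {
            "lead_time_accuracy": mean(lead_time_accuracy(p) for p in group),
            "stockouts": mean(stockout_score(p) for p in group),
            "waste": mean(waste_score(p) for p in group),
        }
        for cycle, group in sorted(by_cycle.items())
    }
```

Keeping every score in the same 0-1, higher-is-better range is what lets three very different quantities (days of error, days out of stock, wasted units) sit on one improvement chart across cycles 1-6.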
What's next for StockPulse
- Re-enable the hourly/Airia simulation loop once the webhook returns validated orders arrays.
- Add anomaly alerts (supplier slippage, sudden demand spikes) and push them to Slack.
- Swap the Supabase key currently exposed in demo.html for an anon key, for safer sharing.
- Expand Braintrust runs to cover fresh incoming cycles automatically (for example, with a nightly cron job).
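Anomaly alerts are listed as future work, so nothing here reflects an existing implementation. One common starting point, sketched under assumed inputs, is a trailing z-score on daily sales for demand spikes and an average relative-lateness check on supplier lead times. The function names, the 28-day window, the 3-sigma threshold and the 25% tolerance are hypothetical; posting to Slack (for example through an incoming webhook) is left out.

```python
from statistics import mean, stdev


def demand_spikes(
    daily_sales: list[int], window: int = 28, z_threshold: float = 3.0
) -> list[int]:
    """Indices of days whose sales exceed the trailing mean by z_threshold std devs."""
    spikes = []
    for i in range(window, len(daily_sales)):
        history = daily_sales[i - window:i]
        mu, sigma = mean(history), stdev(history)  # sample std dev; needs window >= 2
        if sigma > 0 and (daily_sales[i] - mu) / sigma >= z_threshold:
            spikes.append(i)
    return spikes


def supplier_slippage(
    promised_days: list[float], actual_days: list[float], tolerance: float = 0.25
) -> bool:
    """True if recent deliveries average more than `tolerance` later than promised."""
    lateness = [(a - p) / p for p, a in zip(promised_days, actual_days) if p > 0]
    return bool(lateness) and mean(lateness) > tolerance
```

A trailing window adapts to seasonality (a December milk surge raises the baseline it is compared against), which helps keep expected seasonal peaks from flooding the alert channel.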
Built With
- airia
- braintrust
- google-deepmind
- supabase