LineWise — OEE Decision Support for Damm Canning Lines
HackDAMM 2026 entry. A decision-support layer for production planning on Damm's lines 14, 17 and 19 at El Prat. LineWise enriches Damm's existing theoretical planning (Blue Yonder) with evidence from what actually happened: it enforces hard physical constraints, flags historically-toxic SKU transitions, predicts weekly OEE, and gives planners a risk-banded view of any urgent-demand insertion.
TL;DR — the value, in three claims that survive scrutiny
- Three parallel data rules flag 33% of feasible urgent-demand insertion options (332 / 995 across 108 held-out test scenarios), on top of the format-compatibility safety net. The flags come from (a) historically bottom-decile OEE pairs, (b) pairs whose actual changeover is ≥1.5× the line median, and (c) cross-format insertions on multi-format lines. In 26% of scenarios there is no clean option on any line, so the system tells the planner to pick the least-bad option, with reasons. (reports/urgent_demand_backtest_summary.txt)
- The weekly OEE forecaster predicts next-week 4-wk trailing OEE per line at R² = 0.82 — a calibrated capacity signal at the level the data supports. (models/framing_comparison.png)
- Annualised soft-rule value range: €49k–€99k per year across 3 lines (planner-time only; the HARD format check is excluded from this number because Blue Yonder is assumed to already enforce it — see "Why not just Blue Yonder?" below).
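The forecaster's target is the next week's 4-week trailing OEE for each line. A minimal sketch of how such a target could be built from one line's weekly OEE series; the function name and the None-padding are illustrative assumptions, not the code in src/weekly_forecast.py:

```python
from __future__ import annotations

from statistics import mean


def next_week_trailing_target(weekly_oee: list[float], window: int = 4) -> list[float | None]:
    """Hypothetical target builder: for week t, the mean OEE over the
    `window` weeks ending at week t + 1.

    Weeks without a following week, or without enough history, get None so
    they can be dropped before training.
    """
    targets: list[float | None] = []
    for t in range(len(weekly_oee)):
        end = t + 1                    # the forecast looks one week ahead
        start = end - window + 1       # first week inside the trailing window
        if end >= len(weekly_oee) or start < 0:
            targets.append(None)
        else:
            targets.append(mean(weekly_oee[start:end + 1]))
    return targets
```

Three of the four weeks inside each target window are already observed at forecast time, which is one reason a trailing-mean target is far more predictable than a single day's or week's OEE.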
The daily run-level OEE model (R² ≈ 0.40, MAE ≈ 0.10) is deliberately not the headline. We tested it two independent ways on held-out data:
- Predicted-lift backtest: 95% CI on within-day reordering lift is [-0.0016, +0.0023] — statistically rules out any meaningful daily reordering effect.
- Realised-OEE agreement backtest: no significant correlation between optimizer agreement and realised OEE (Spearman ρ = -0.17, p = 0.09).
Both findings are documented honestly in POST_MORTEM.md and the reports/ folder. Within-day sequence reordering is not where the project's value lives, and we don't claim it does.
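The README does not say how the reorder-lift CI was computed. One common way to get such an interval is a percentile bootstrap over per-scenario lifts; the sketch below assumes that method, a plain list of per-scenario lift values, and 10,000 resamples. The function name is hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_ci(lifts, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for the mean per-scenario OEE lift."""
    rng = random.Random(seed)                      # fixed seed: reproducible CI
    n = len(lifts)
    boot_means = sorted(
        mean(rng.choices(lifts, k=n))              # resample scenarios with replacement
        for _ in range(n_boot)
    )
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

A narrow interval that straddles zero, like [-0.0016, +0.0023], bounds any plausible mean lift to about 0.2 OEE points on a 0–1 scale, which is why the daily reordering effect is treated as negligible.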
What's in the box
| Capability | Where it lives | Validation |
|---|---|---|
| Hard line-format constraint layer (L14: 1/3+1/2 · L17: 1/3 only · L19: 1/3+1/2+2/5) | src/optimizer.py (LINE_FORMATS, line_can_run) | 100% catch; 15.8% (187 / 1,182) of test-period urgent-demand options blocked |
| Worst-decile transition avoidance (data filter: 16 known-toxic pairs with mean OEE ≤ 0.38) | src/optimizer.py (worst_decile_transitions) | Surfaced in the urgent-demand tab as flagged options |
| Urgent-demand triage (rank by safety → then OEE band → then changeover) | src/simulator.py (inject_urgent_demand) | reports/urgent_demand_backtest_summary.txt |
| Weekly OEE forecaster (next-week 4-wk trailing mean per line) | src/weekly_forecast.py | R² = 0.82 on held-out weeks |
| Line-relative risk bands (per-line μ/σ, not absolute 0.70/0.80) | src/simulator.py (_classify_risk) | Makes the risk traffic-light meaningful given mean OEE of 0.40–0.53 |
| Daily-run OEE predictor (CatBoost + XGBoost + Combined ensemble) | src/catboost_model.py, src/xgboost_model.py, src/predict.py | Test MAE ≈ 0.10, R² ≈ 0.40; feeds the simulator as a tiebreaker |
| Quantile prediction intervals (q10 / q50 / q90) | src/xgboost_model.py (train_quantile) | ~71% empirical coverage (80% target) |
| What-if simulator + scenario comparison | src/simulator.py | UI: Sequence Planner + Scenario Comparison tabs |
| Three-framing R² ceiling diagnostic (AS-IS / LEAKAGE / WEEKLY) | scripts/evaluate_model.py + src/framings.py | Headline plot: models/framing_comparison.png |
| Data post-mortem (12-panel EDA) | scripts/eda_report.py | reports/eda_report.png, reports/eda_findings.txt |
| Held-out backtest #1: within-day sequence reordering | scripts/backtest_recommender.py | Statistical CI framing in reports/backtest_summary.txt |
| Held-out backtest #2: optimizer-agreement vs realised OEE | scripts/backtest_similarity.py | reports/backtest_similarity_summary.txt |
| Held-out backtest #3: urgent-demand triage counterfactual | scripts/backtest_urgent_demand.py | reports/urgent_demand_backtest_summary.txt |
| FastAPI bridge (5 data endpoints + Gemini chatbot stream) | api/server.py | /lines, /skus, /history, /simulate, /urgent, /chat |
| React frontend (4 routes + floating Gemini assistant) | web/ | TanStack Start + shadcn/ui + typed API client in src/lib/api/ |
| Gemini-backed assistant (system prompt grounded in this repo) | src/chatbot.py | Used by POST /chat; UI widget in web/src/components/linewise/Chatbot.tsx |
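The line-relative risk bands (src/simulator.py, _classify_risk) can be sketched as a per-line z-score. Only the 0.40 / 0.53 / 0.47 line means come from this README, and their mapping to L14 / L17 / L19 is assumed; the σ values, the z cut-offs and the names below are hypothetical placeholders.

```python
# Hypothetical per-line (mu, sigma). The means follow this README's
# 0.40 / 0.53 / 0.47 line means (L14 / L17 / L19 order assumed);
# the sigmas are placeholders, not Damm's figures.
LINE_BASELINES_SKETCH = {
    14: (0.40, 0.12),
    17: (0.53, 0.10),
    19: (0.47, 0.11),
}


def classify_risk(line: int, predicted_oee: float) -> str:
    """Band a prediction by its z-score against its own line's history."""
    mu, sigma = LINE_BASELINES_SKETCH[line]
    z = (predicted_oee - mu) / sigma
    if z < -1.0:           # well below what this line usually achieves
        return "high"
    if z < 0.0:            # below this line's mean
        return "medium"
    return "low"
```

Under an absolute "<0.70 = high risk" rule every one of these predictions would be high risk; here the same 0.45 is low risk on L14 but medium risk on L17.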
Repository layout
HackDAMM2026/
├── README.md ← this file
├── POST_MORTEM.md ← ceiling story + documented negative results
├── requirements.txt
├── LICENSE
│
├── data/
│ ├── raw/ ← original Damm CSV/Excel exports
│ ├── parsed/ ← normalised per-source DataFrames
│ └── processed/ ← runs_df.csv, changeover_matrix.csv, product_meta.csv
│
├── src/ ← all library code (importable as `src.X`)
│ ├── pipeline.py ← raw → parsed → processed (runs_df builder)
│ ├── features.py ← 37 leakage-free features + LabelEncoders
│ ├── framings.py ← AS-IS / LEAKAGE / WEEKLY problem framings
│ ├── xgboost_model.py ← XGB train/tune/predict + quantile models
│ ├── catboost_model.py ← CatBoost train/tune/predict
│ ├── predict.py ← Combined model + predict_oee_any dispatcher
│ ├── weekly_forecast.py ← weekly panel + forecaster (R² 0.82)
│ ├── optimizer.py ← LINE_FORMATS + worst_pair + OR-Tools TSP
│ ├── simulator.py ← what-if + urgent-demand + worst-pair flagging
│ └── chatbot.py ← Gemini wrapper used by /chat
│
├── scripts/ ← entry-point scripts (runnable from anywhere)
│ ├── evaluate_model.py ← train all 3 framings × 3 model variants
│ ├── backtest_recommender.py ← within-day reorder predicted-lift backtest
│ ├── backtest_similarity.py ← optimizer-agreement vs realised OEE
│ ├── backtest_urgent_demand.py ← rule-layer counterfactual on urgent demand
│ └── eda_report.py ← 12-panel data post-mortem
│
├── api/
│ ├── __init__.py
│ └── server.py ← FastAPI bridge (5 data endpoints + /chat SSE)
│
├── web/ ← React (TanStack Start) frontend
│ ├── src/lib/api/ ← typed fetch clients + reference store
│ ├── src/components/linewise/ ← LineWise UI components (incl. Chatbot)
│ └── src/routes/ ← /, /forensic, /optimizado, /urgente
│
├── models/ ← trained artefacts (.pkl, .json, .png)
├── reports/ ← backtest + EDA outputs
└── notebooks/ ← exploratory notebooks
Setup (one-time)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Running the project
All commands assume you are at the repo root with .venv activated.
One-shot demo prep (recommended before showing a judge)
# Trains models + runs all three backtests. ~3-5 min total.
make demo
# or, if make isn't available:
python scripts/evaluate_model.py --no-shap
python scripts/backtest_recommender.py
python scripts/backtest_similarity.py
python scripts/backtest_urgent_demand.py
Then start the app in two terminals:
# Terminal 1 — FastAPI (models + Gemini chatbot bridge) on :8000
make api
# Terminal 2 — React frontend on :8080
make ui
Individual commands
# Train + diagnose all 3 framings × 3 model variants (~2-3 min)
python scripts/evaluate_model.py
python scripts/evaluate_model.py --no-shap # skip SHAP for speed
python scripts/evaluate_model.py --tune --trials 60 # re-tune via Optuna (~15 min)
# The three held-out backtests
python scripts/backtest_recommender.py # within-day reorder (predicted lift)
python scripts/backtest_similarity.py # optimizer agreement vs realised OEE
python scripts/backtest_urgent_demand.py # rule-layer counterfactual (headline)
# 12-panel data post-mortem
python scripts/eda_report.py
React frontend + FastAPI bridge
The React app (web/) is now the only UI; it talks to the Python models through api/server.py. The API loads the Combined (XGBoost + CatBoost) ensemble by default, falling back to CatBoost and then XGBoost.
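The Combined → CatBoost → XGBoost fallback amounts to trying an ordered list of loaders until one succeeds. A minimal sketch; the loader callables, the function name and the choice of which exceptions mean "not available" are assumptions, not the code in api/server.py:

```python
from __future__ import annotations

from typing import Callable, Sequence


def load_first_available(
    loaders: Sequence[tuple[str, Callable[[], object]]],
) -> tuple[str, object]:
    """Try each (name, loader) in order; return the first model that loads."""
    errors: list[str] = []
    for name, loader in loaders:
        try:
            return name, loader()
        except (FileNotFoundError, ImportError) as exc:  # missing artefact or package
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no OEE model could be loaded; " + "; ".join(errors))
```

In practice each loader would deserialise an artefact from models/; returning the model's name alongside it lets the API report which variant is actually serving predictions.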
| URL | What it shows |
|---|---|
| / | Home: today's Gantt across L14/L17/L19 + day highlights |
| /forensic | Historical OEE per line + worst-decile transition heatmap |
| /optimizado | Plan A (alternated baseline) vs Plan B (brand-grouped) with real /simulate predictions |
| /urgente | Insert urgent SKUs, see per-rule evidence flags from /urgent |
| floating bot button | Ask LineWise: Gemini-powered assistant streaming from /chat |
The frontend resolves the API URL from VITE_API_URL — copy
web/.env.example to .env.local if you change the port.
Enable the Gemini chatbot
echo "GOOGLE_API_KEY=your-gemini-api-key" >> .env
make api # restart so the API picks up the env var
Then click the bot button in the bottom-right of the React app. The dot is green when the assistant is reachable and amber when offline (e.g. key missing) — the offline reason is shown inline.
Docker (one command, no local Python or Node needed)
The whole stack — Python ML backend + React frontend — runs in two containers
wired together by docker-compose.yml.
# Optional: enable the chatbot
echo "GOOGLE_API_KEY=your-gemini-key" > .env
# Build + launch both services in the background
make docker-up # or: docker compose up --build -d
# Open the app
open http://localhost:8080
| Service | Container | Port | What it runs |
|---|---|---|---|
| api | linewise-api | 8000 | Python 3.11 + uvicorn + the Combined ML ensemble + the Gemini bridge |
| web | linewise-web | 8080 | Node 22 + Vite dev server serving the React UI |
The first build downloads ~1.2 GB of Python ML wheels (catboost / xgboost / shap / ortools) and ~250 MB of npm modules; subsequent builds reuse the layer cache and complete in seconds.
make docker-logs # tail combined logs
make docker-down # stop and remove containers
make docker-rebuild # force a fresh build (no cache)
Customise ports / API URL by exporting env vars before bringing the stack up:
API_PORT=9000 WEB_PORT=3000 VITE_API_URL=http://localhost:9000 \
docker compose up --build
The four routes of the React app
- / (Home): line KPIs and today's Gantt across L14/L17/L19 with brand colours and risk bands.
- /forensic (Cockpit Forensic): historical OEE per line, inefficient-transition heatmap, drill-down per shift.
- /optimizado (Plan Optimizado): two candidate weekly sequences run through the real /simulate endpoint, with KPI deltas, drag-to-reorder, and accept-into-store.
- /urgente (Demanda Urgente): drop urgent SKUs into a queue and replan against the live plan via /urgent. Format-incompatible lines are filtered out; per-rule evidence flags (low-OEE, friction, cross-format) are visible in the breakdown.
A floating bot button on every route opens the Asistente LineWise, a Gemini-streamed chat scoped to the LineWise tool. The assistant accepts a line context (14/17/19), so its answers are line-aware. It is intentionally narrow: its system prompt refuses off-topic questions.
The legacy Streamlit app (app/app.py) has been removed in favour of the React frontend + FastAPI bridge.
Chatbot system prompt
The Gemini assistant is grounded by the system prompt in src/chatbot.py. The model is gemini-2.5-flash. Both env-var names work for the key: GOOGLE_API_KEY or GEMINI_API_KEY. The API auto-loads .env at the repo root, so a single line in .env is enough — no extra dependency.
If the key is missing, the chat button stays visible with an amber dot and the side panel shows a clear offline reason — the rest of the app keeps working.
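Loading a .env file "with no extra dependency" only needs a small KEY=VALUE parser. A minimal sketch of that plus the two-name key lookup; the function names, the repo-root default path, and the precedence of GOOGLE_API_KEY over GEMINI_API_KEY are assumptions, not the code in src/chatbot.py:

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_repo_dotenv(path: Path = Path(".env")) -> None:
    """Copy .env values into os.environ without overriding variables already set."""
    if path.is_file():
        for key, value in parse_dotenv(path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


def resolve_gemini_key(env: Mapping[str, str] = os.environ) -> str | None:
    """Accept either variable name; GOOGLE_API_KEY wins if both are set (assumed)."""
    return env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY") or None
```

Returning None for a missing or empty key is what lets the UI show the amber "offline" state instead of failing the whole API.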
Why not just Blue Yonder?
Blue Yonder is Damm's existing theoretical planner. It almost certainly already enforces format compatibility (the HARD rule in our urgent-demand triage). The HARD rule in LineWise is belt-and-suspenders — its purpose is to be a safety net, not a new feature.
LineWise's new value is the empirical layer that Blue Yonder cannot infer because it plans from theoretical changeover times, not from how those changeovers actually behaved on the shop floor:
| Rule | What Blue Yonder sees | What LineWise adds |
|---|---|---|
| Soft 1 — low-OEE pair | This transition takes 20 min of theoretical changeover, OEE assumed nominal. | This specific (line, from→to) pair runs at mean OEE 0.34 across 3 historical observations — the theoretical OEE never materialises. |
| Soft 2 — high-friction pair | Theoretical changeover = 20 min. | Median actual changeover on this pair has been 50 min — 2.5× the theoretical figure. |
| Soft 3 — cross-format pair | "L19 can run all three formats, no flag." | "But the 1/3 → 1/2 transition on L19 historically loses 1.8 OEE points to setup overhead — flag for review." |
On the held-out test period, the soft-rule layer flags 332 of the 995 feasible options (33%) across 108 urgent-demand scenarios, about 3 flagged options per scenario. In 26% of scenarios there is no clean option on any of the three lines, meaning the planner is told "every option here trips at least one historical-evidence flag; pick the least-bad."
That information is impossible to derive from theoretical changeover matrices alone. That's the value layer.
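The hard filter and the three soft rules can be combined into a single triage pass. A minimal sketch under stated assumptions: LINE_FORMATS mirrors the capability table, the 0.38 / n ≥ 2 and 1.5× thresholds are the ones quoted in this README, the data shapes are those in the docstring, and reading "safety" as "fewest soft flags" is our interpretation. Option and triage are hypothetical names, not the signature of inject_urgent_demand.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hard rule: formats each line can physically run (from the capability table).
LINE_FORMATS = {14: {"1/3", "1/2"}, 17: {"1/3"}, 19: {"1/3", "1/2", "2/5"}}
BAND_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Option:
    line: int
    prev_sku: str            # SKU the urgent order would follow
    prev_format: str
    urgent_sku: str
    urgent_format: str
    risk_band: str           # "low" / "medium" / "high" from a line-relative classifier
    changeover_min: float    # expected changeover into the urgent SKU
    flags: list[str] = field(default_factory=list)


def line_can_run(line: int, fmt: str) -> bool:
    return fmt in LINE_FORMATS.get(line, set())


def triage(options, pair_oee, pair_changeover, line_median_changeover,
           low_oee_max=0.38, min_obs=2, friction_ratio=1.5):
    """Hard-filter options, attach soft-rule flags (in place), then rank them.

    pair_oee: {(line, from_sku, to_sku): (mean_oee, n_obs)} from the training period
    pair_changeover: {(line, from_sku, to_sku): median actual changeover, minutes}
    """
    feasible = [o for o in options if line_can_run(o.line, o.urgent_format)]
    for o in feasible:
        key = (o.line, o.prev_sku, o.urgent_sku)
        mean_oee, n_obs = pair_oee.get(key, (None, 0))
        if mean_oee is not None and n_obs >= min_obs and mean_oee <= low_oee_max:
            o.flags.append("low-OEE")        # soft rule 1
        median_co = pair_changeover.get(key)
        if median_co is not None and median_co >= friction_ratio * line_median_changeover[o.line]:
            o.flags.append("friction")       # soft rule 2
        if o.prev_format != o.urgent_format and len(LINE_FORMATS[o.line]) > 1:
            o.flags.append("cross-format")   # soft rule 3
    # Fewest flags first, then better risk band, then shorter changeover.
    return sorted(feasible, key=lambda o: (len(o.flags), BAND_ORDER[o.risk_band], o.changeover_min))
```

With this ordering, a flag-free option always outranks a flagged one even if its predicted band is worse, which matches treating the evidence flags as the primary safety signal and the OEE model only as a tiebreaker.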
Key design choices (and why they matter to a judge)
- Line-format constraints are hard physical rules, not data-derived. L17 cannot run a 50 cl SKU even if a stray row exists in the history. Encoded in LINE_FORMATS (src/optimizer.py) and enforced in both recommend_line and inject_urgent_demand.
- Real product metadata. 183 SKUs in data/processed/product_meta.csv with brand/format/color, so features like same_brand, same_format and color_change carry real domain signal, not synthetic stubs.
- No leakage in features. Every rolling/serial feature uses shift(1) on a chronological sort within (line, SKU) groups. The LEAKAGE framing in evaluate_model.py deliberately shows what R² would look like if we did leak (~0.99 with availability × performance ≈ OEE), as a ceiling diagnostic.
- Three framings, not one. Daily OEE has an irreducible ~0.40 R² ceiling from within-(line, SKU) noise. We make this explicit rather than overclaiming.
- Risk bands are line-relative, not absolute. With line means at 0.40 / 0.53 / 0.47, a threshold like "<0.70 = high risk" flags every single prediction. Bands are anchored to per-line μ/σ instead; see LINE_BASELINES in src/simulator.py.
- Worst-decile transitions are a data rule, not a model output. 16 historically-bad SKU pairs (mean OEE ≤ 0.38 with n ≥ 2 observations) are pre-computed from the training period and flagged at urgent-demand time. Defensible regardless of model accuracy.
- Optimiser uses real costs. build_cost_matrix mixes 0.7 × historical actual changeover + 0.3 × (1 − historical OEE) per transition. Theoretical times are only the fallback, padded by 20%.
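The 0.7 / 0.3 cost blend can be sketched as follows. The README does not say how minutes and (1 − OEE) are put on a common scale, so this sketch assumes changeover minutes are divided by the matrix maximum; default_oee (used when a pair has no OEE history) is also an assumption, and build_cost_matrix_sketch is a hypothetical name, not the real build_cost_matrix.

```python
from __future__ import annotations


def build_cost_matrix_sketch(skus, actual_co, theoretical_co, hist_oee, default_oee,
                             w_co=0.7, w_oee=0.3, pad=1.2):
    """Blend changeover time and historical OEE loss into a transition-cost matrix.

    actual_co / theoretical_co: {(from_sku, to_sku): minutes}
    hist_oee: {(from_sku, to_sku): mean OEE observed after that transition}
    """
    n = len(skus)
    minutes = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(skus):
        for j, b in enumerate(skus):
            if i != j:
                # Prefer history; fall back to the theoretical time padded by 20%.
                if (a, b) in actual_co:
                    minutes[i][j] = actual_co[(a, b)]
                else:
                    minutes[i][j] = pad * theoretical_co[(a, b)]
    # Assumption: minutes are scaled to [0, 1] so they can be mixed with (1 - OEE).
    max_minutes = max(max(row) for row in minutes) or 1.0
    cost = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(skus):
        for j, b in enumerate(skus):
            if i != j:
                oee = hist_oee.get((a, b), default_oee)
                cost[i][j] = w_co * minutes[i][j] / max_minutes + w_oee * (1.0 - oee)
    return cost
```

If this matrix feeds the OR-Tools TSP listed in the repository layout, the costs would need to be scaled and rounded to integers, because OR-Tools routing transit callbacks return integer values.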
Headline numbers (last full run)
| Metric | Value | Source |
|---|---|---|
| HARD-rule format blocks (test period) | 187 / 1,182 (15.8%) | reports/urgent_demand_backtest_summary.txt |
| SOFT rule 1: low-OEE-pair firings | 2 / 995 feasible (0.2%) | ↑ same |
| SOFT rule 2: high-friction firings | 10 / 995 feasible (1.0%) | ↑ same |
| SOFT rule 3: cross-format firings | 321 / 995 feasible (32.3%) | ↑ same |
| SOFT rule (any): options flagged | 332 (33.4% of feasible) | ↑ same |
| Scenarios with NO clean option | 28 / 108 (25.9%) | ↑ same |
| Annualised soft-rule value (3 lines, BY-excluded) | €49k – €99k / year | ↑ same |
| Weekly OEE forecaster, test R² | 0.823 | models/framing_comparison.png |
| Daily OEE model (Combined ensemble), test MAE / R² | 0.099 / 0.398 | models/framing_metrics.json |
| Daily within-day reorder lift, 95% CI | [-0.0016, +0.0023] | reports/backtest_summary.txt |
| Optimizer-agreement ↔ realised-OEE Spearman ρ | -0.17 (p = 0.09, n.s.) | reports/backtest_similarity_summary.txt |

Honest read of the rule distribution: rule 3 (cross-format) fires on 97% of flagged options, rule 2 (high-friction) on 3%, and rule 1 (low-OEE) on 0.6%. The three rules are complementary, not equal-weight. Rule 3 dominates because cross-format setups are common on multi-format lines and are structurally costly. Rules 1 and 2 catch edge cases the format predicate misses: same-format pairs that nevertheless run at bottom-decile OEE, and same-format pairs whose actual changeover has overrun the line median by ≥1.5×. Without them, these would slip past the predicate.
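The headline rates can be re-derived from the raw counts in the table. The sketch only restates those counts; the one inference it adds is that the per-rule firings (2 + 10 + 321 = 333) exceed the 332 flagged options by one, so exactly one option tripped two rules.

```python
blocked, total_options = 187, 1_182
feasible = total_options - blocked                 # options that pass the HARD format filter
rule_firings = {"low-OEE": 2, "friction": 10, "cross-format": 321}
flagged_any = 332                                  # union of the three soft rules
scenarios, no_clean = 108, 28

hard_block_rate = blocked / total_options
soft_flag_rate = flagged_any / feasible
# Sum minus union counts the extra flags on options that tripped more than one rule.
extra_flags = sum(rule_firings.values()) - flagged_any
# Share of flagged options on which each rule fired (shares can sum past 100%).
rule_share = {rule: n / flagged_any for rule, n in rule_firings.items()}
flags_per_scenario = flagged_any / scenarios
no_clean_rate = no_clean / scenarios

print(f"feasible={feasible}  hard={hard_block_rate:.1%}  soft={soft_flag_rate:.1%}")
print({rule: f"{share:.1%}" for rule, share in rule_share.items()}, f"extra={extra_flags}")
print(f"{flags_per_scenario:.2f} flagged options/scenario, no clean option in {no_clean_rate:.1%}")
```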
Demo scenario (90-second walkthrough)
Setup. In two terminals: make api and make ui. Open http://localhost:8080.
Step 1 — /forensic. Show the per-line OEE history and the worst-decile transition heatmap. "This is what Damm has today — descriptive."
Step 2 — /urgente. Add a 1/2 SKU urgent demand and click Replanificar con Damm.
- The system shows L14 and L19 as feasible; L17 is filtered out because it can't run 1/2.
- The Damm replan panel shows per-rule evidence flags from the backend (low-OEE / friction / cross-format).
- The recommendation explains why in plain language.
Step 3 — /optimizado. Optimize the week and walk the two Gantts side by side. KPI deltas are real /simulate predictions, not pre-baked numbers. Drag to reorder and see the comparative KPIs update.
Step 4 — Ask LineWise (bot button, bottom-right). Open the assistant. Pick Tren 17. Ask: "Why isn't Line 17 showing up for my 50 cl SKU?" — the answer streams from Gemini, grounded in the system prompt that knows the hard format rules.
Known limitations + next steps
See POST_MORTEM.md for the full ceiling story. Headline limitations:
- ~60% of OEE variance lives WITHIN (line, SKU) cells, driven by operator skill, material lot quality, and micro-stops — none of which are in the dataset. Daily R² > ~0.42 is unreachable without those.
- No € ROI claim for daily reordering — the lift is too small relative to the model's MAE (0.10) to convert to currency meaningfully without Damm's internal economic data (hl/OEE-point sensitivity × €/hl margin).
- Quantile coverage 71% vs 80% target — could be lifted with conformal recalibration.
- No external data signal yet (weather, holidays). The brief explicitly says external data is encouraged but not required; we focused on getting maximum signal from the operational history first.
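The "~60% of variance within (line, SKU) cells" figure is a variance decomposition: total squared deviation from the grand mean, split into a between-cell part and a within-cell part. A minimal stdlib sketch; the function name is hypothetical, and the README does not say whether its figure was computed exactly this way.

```python
from collections import defaultdict
from statistics import fmean


def within_cell_variance_share(rows):
    """Fraction of OEE variance left inside (line, SKU) cells.

    rows: iterable of (line, sku, oee). Returns SS_within / SS_total.
    """
    rows = list(rows)
    grand_mean = fmean(oee for _, _, oee in rows)
    cells = defaultdict(list)
    for line, sku, oee in rows:
        cells[(line, sku)].append(oee)
    cell_means = {cell: fmean(values) for cell, values in cells.items()}
    ss_total = sum((oee - grand_mean) ** 2 for _, _, oee in rows)
    ss_within = sum((oee - cell_means[(line, sku)]) ** 2 for line, sku, oee in rows)
    return ss_within / ss_total if ss_total else 0.0
```

One minus the returned share is the in-sample R² of predicting every run by its (line, SKU) cell mean, a rough benchmark for how much cell identity alone can explain.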
Next steps if Damm wants to deploy:
- Integrate Damm's real planned_changeover_min field (we currently substitute actual changeover times), removing the documented train/inference distribution shift.
- Integrate operator and material-lot IDs, lifting the daily R² ceiling.
- Calibrate quantile intervals via conformal prediction — reaches 80% coverage.
- Wire the urgent-demand tool into Blue Yonder via a REST hook — present blocked / flagged options as advisory annotations on theoretical schedules.
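For the quantile intervals, one standard conformal variant is conformalized quantile regression (CQR): score each held-out calibration run by how far its realised OEE falls outside its [q10, q90] interval, then widen every interval by the appropriately ranked score. Under exchangeability this gives at least 1 − alpha marginal coverage. A minimal stdlib sketch; the function names are hypothetical, and the README does not specify which conformal variant it would use.

```python
import math


def cqr_margin(q_lo, q_hi, y, alpha=0.2):
    """Split-conformal margin for quantile intervals (CQR).

    q_lo, q_hi: calibration-set q10 / q90 predictions; y: realised OEE.
    alpha=0.2 targets 80% coverage, matching the q10/q90 band.
    """
    # Conformity score: how far y falls outside its interval (negative if inside).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(q_lo, q_hi, y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))       # finite-sample corrected rank
    if k > n:
        return math.inf                        # calibration set too small for this alpha
    return scores[k - 1]


def calibrated_interval(lo, hi, margin):
    """Widen (or, if margin < 0, shrink) a raw interval by the conformal margin."""
    return lo - margin, hi + margin
```

The calibration runs must not be used to train the quantile models; otherwise the scores are optimistic and the coverage guarantee no longer holds.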
🍺 Damm × Engineering HUB Hackathon 2026