Project Story

About the Project

Airfare headlines kept citing “mystery surcharges,” but most articles lacked route-level evidence. That gap pushed me to build a transparent analytics stack that traces each fare back to measurable structural drivers (distance, competition, and hub power) rather than vague narratives. The Data Marathon brief also nudged me to turn a static dataset into a story-first dashboard that stakeholders can explore, filter, and validate for themselves.

What I Learned

  • Translating aviation economics into engineered features matters: deriving price-per-mile, demand bins, and hub premiums made the model’s behavior far more interpretable.
  • Building a clean MLOps loop (data → XGBoost training → FastAPI serving) enforced disciplined reproducibility: versioned CSVs, pinned dependencies, and deterministic seeds.
  • Story-first UX beats chart dumps: curating 16 endpoints into 5 dashboard sections helped surface the “so what” behind each metric instead of overwhelming users with raw plots.
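The feature-engineering point above can be made concrete with a small sketch. It assumes a hypothetical `Route` record (origin, destination, average fare, nonstop miles, passenger count). The field names, the demand-bin edges, and the definition of hub premium as a ratio of median price-per-mile are illustrative assumptions, not the project's actual schema or formulas. It uses only the standard library.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Route:
    # Hypothetical schema; the real dataset's column names may differ.
    origin: str
    dest: str
    fare: float       # average fare, USD
    miles: float      # nonstop distance
    passengers: int   # demand proxy


def price_per_mile(route: Route) -> float:
    """Fare normalized by distance; NaN for zero-distance rows instead of a crash."""
    return route.fare / route.miles if route.miles > 0 else float("nan")


def demand_bin(passengers: int, edges: list) -> int:
    """Index of the first edge the value is below, so edges [1000, 4000]
    map to bins 0 (<1000), 1 (1000-3999) and 2 (>=4000)."""
    for i, edge in enumerate(edges):
        if passengers < edge:
            return i
    return len(edges)


def hub_premium(routes: list, hub: str) -> float:
    """Assumed definition: median price-per-mile on routes touching `hub`
    divided by the median on all other routes (>1 means a premium)."""
    at_hub = [price_per_mile(r) for r in routes if hub in (r.origin, r.dest)]
    elsewhere = [price_per_mile(r) for r in routes if hub not in (r.origin, r.dest)]
    return median(at_hub) / median(elsewhere)


routes = [
    Route("ATL", "BOS", 300.0, 1000.0, 5000),
    Route("ATL", "MIA", 200.0, 600.0, 3000),
    Route("BOS", "MIA", 250.0, 1250.0, 800),
    Route("DEN", "SEA", 200.0, 1000.0, 400),
]
print(demand_bin(routes[0].passengers, [1000, 4000]))  # 2
print(round(hub_premium(routes, "ATL"), 3))            # 1.583
```

Using a ratio of medians keeps a single extreme route from dominating the premium, which supports the interpretability goal; a regression-based premium would be a reasonable alternative.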

How I Built It

  1. Data pipeline: clean_data.py ingests ~14k rows, coerces currency strings to numbers, imputes missing city metrics, and exports both visualization-ready and ML-ready tables. Market concentration is computed via the Herfindahl–Hirschman Index:

$$ HHI = \sum_i s_i^2 $$

where \( s_i \) is carrier \( i \)’s market share (using a three-share proxy when full carrier breakdowns aren’t available).

  2. Modeling layer: backend/app/models/train_cli.py runs a 100-trial XGBoost hyperparameter search (hist tree method, time-based train/validation split) and logs deployment artifacts, including the tuned hyperparameters and evaluation outputs.

  3. Service + UX: FastAPI exposes 16 visualization routes plus /predict/fare, while a Vite + Tailwind React dashboard consumes them with live filters and an interactive fare predictor for “what-if” exploration.
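The data-pipeline step (item 1 above) can be sketched in a few standard-library functions. This is an illustrative sketch, not clean_data.py itself: `coerce_currency`, `impute_median`, `hhi`, and `hhi_proxy` are hypothetical names, and shares are expressed as fractions (so HHI runs from 0 to 1 rather than 0 to 10,000). The proxy is an assumption about how a "three-share proxy" could work: the available shares are clipped to [0, 1], rescaled if they overshoot 100%, and topped up with a single residual "other carriers" bucket.

```python
from statistics import median
from typing import Iterable, List, Optional


def coerce_currency(value) -> Optional[float]:
    """Turn values like "$1,234.50" or 1234.5 into floats; unparseable -> None."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Fill missing entries with the median of the observed ones
    (assumes at least one value is present)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def hhi(shares: Iterable[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared market shares (fractions)."""
    return sum(s * s for s in shares)


def hhi_proxy(known_shares: Iterable[float]) -> float:
    """Hypothetical proxy when only a few carrier shares are reported."""
    clipped = [min(max(s, 0.0), 1.0) for s in known_shares]  # drop out-of-range noise
    total = sum(clipped)
    if total > 1.0:  # overlapping or duplicated shares: rescale to sum to 1
        clipped = [s / total for s in clipped]
        total = 1.0
    residual = max(0.0, 1.0 - total)
    # A single residual bucket is the most concentrated split of the unknown
    # share, so with accurate known shares this never understates the true HHI.
    return hhi(clipped + [residual])


print(coerce_currency("$1,234.50"))       # 1234.5
print(impute_median([1.0, None, 3.0]))    # [1.0, 2.0, 3.0]
print(round(hhi_proxy([0.5, 0.3]), 4))    # 0.38
```

Clipping before squaring matters because HHI is quadratic: one mis-parsed share above 1.0 would otherwise dominate the index for that route.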
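For the modeling step (item 2 above), the shape of a time-based split and a seeded 100-trial search can be sketched without the training code. This sketch swaps in plain random search as a stand-in, since the source does not say which search strategy train_cli.py uses; the search space, the `evaluate` callback, the `quarter` field, and the seeds are hypothetical. The parameter names match XGBoost's scikit-learn-style arguments (`tree_method="hist"` selects histogram-based split finding), but here they are only sampled, not passed to a model.

```python
import random
from typing import Callable, Dict, List, Tuple


def time_based_split(rows: List[dict], period_key: str, cutoff) -> Tuple[List[dict], List[dict]]:
    """Train on periods strictly before `cutoff` and validate on the rest,
    so validation data always comes from later periods than training data."""
    train = [r for r in rows if r[period_key] < cutoff]
    valid = [r for r in rows if r[period_key] >= cutoff]
    return train, valid


def sample_params(rng: random.Random) -> Dict[str, object]:
    """Draw one candidate configuration from a hypothetical search space."""
    return {
        "tree_method": "hist",
        "max_depth": rng.randint(3, 10),
        "learning_rate": 10 ** rng.uniform(-2.5, -0.5),  # log-uniform, ~0.003 to ~0.32
        "subsample": rng.uniform(0.6, 1.0),
        "colsample_bytree": rng.uniform(0.6, 1.0),
        "n_estimators": rng.choice([200, 400, 800]),
    }


def random_search(
    evaluate: Callable[[Dict[str, object]], float],
    n_trials: int = 100,
    seed: int = 42,
) -> Tuple[Dict[str, object], float]:
    """Return the lowest-scoring configuration over `n_trials` draws.
    A dedicated, seeded RNG makes the whole search reproducible."""
    rng = random.Random(seed)
    best_params: Dict[str, object] = {}
    best_score = float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        # In a real pipeline, evaluate() would fit a model built from params
        # on the training split and return a validation error such as RMSE.
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


rows = [{"quarter": q, "fare": 200 + 5 * q} for q in range(1, 9)]
train, valid = time_based_split(rows, "quarter", cutoff=7)
print(len(train), len(valid))  # 6 2


def toy_objective(params: Dict[str, object]) -> float:
    """Stand-in for validation RMSE, for demonstration only."""
    return abs(params["max_depth"] - 6) + params["learning_rate"]


best, score = random_search(toy_objective, n_trials=100, seed=7)
print(best["max_depth"], round(score, 3))
```

Splitting by time rather than at random keeps later quarters out of training, which gives a more honest estimate of how the model will do on fares it has not seen yet.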

Challenges

  • HHI estimation: Limited carrier breakdowns required a proxy that wouldn’t oscillate across routes. Clipping shares and smoothing medians stabilized the concentration signal without scrambling the relative ranking of routes.
  • GPU/CPU parity: Training on GPU introduced slight numeric drift. Exporting the tuned parameters and re-running with fallback_to_cpu kept serving behavior consistent and made results easier to reproduce.
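Both challenges above lend themselves to small checks. The sketch below is one interpretation, not the project's code: it reads "smoothing medians" as taking each route's median HHI across periods, returns a route ranking that can be compared before and after smoothing, and compares GPU- and CPU-trained predictions within a tolerance. The function names, sample values, and tolerances (a one-cent absolute tolerance on fares) are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Sequence, Tuple


def smooth_by_route(observations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse per-period (route, hhi) pairs to one median HHI per route,
    so a single noisy period cannot swing a route's concentration score."""
    by_route: Dict[str, List[float]] = defaultdict(list)
    for route, value in observations:
        by_route[route].append(value)
    return {route: median(values) for route, values in by_route.items()}


def rank_routes(scores: Dict[str, float]) -> List[str]:
    """Routes ordered from most to least concentrated."""
    return sorted(scores, key=lambda route: scores[route], reverse=True)


def predictions_match(
    gpu_preds: Sequence[float],
    cpu_preds: Sequence[float],
    rel_tol: float = 1e-6,
    abs_tol: float = 0.01,
) -> bool:
    """True when both runs agree element-wise within tolerance
    (abs_tol=0.01 is one cent if predictions are fares in USD)."""
    return len(gpu_preds) == len(cpu_preds) and all(
        math.isclose(g, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, c in zip(gpu_preds, cpu_preds)
    )


observations = [
    ("ATL-BOS", 0.40), ("ATL-BOS", 0.90), ("ATL-BOS", 0.42),  # one outlier period
    ("DEN-SEA", 0.30), ("DEN-SEA", 0.31), ("DEN-SEA", 0.29),
]
smoothed = smooth_by_route(observations)
print(smoothed)               # {'ATL-BOS': 0.42, 'DEN-SEA': 0.3}
print(rank_routes(smoothed))  # ['ATL-BOS', 'DEN-SEA']
print(predictions_match([210.004, 355.10], [210.0, 355.099]))  # True
```

A tolerance-based comparison like this can run in CI so that any drift beyond the chosen threshold fails loudly instead of silently changing served fares.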

Project Updates — OJLK Team 15: Airfare Markets Under Pressure
