Inspiration

Flight delays ripple through everything: gate availability, crew rotations, passenger connections, and even runway and stand capacity. In practice, the earlier you know a flight is likely to leave late (or early), the more options you have, e.g. reassigning gates and communicating updates before the aircraft is even in the air. We wanted to build something that uses real historical delay patterns to produce a clear operational prediction, expressed as actual timings that teams and individuals can plan around.

What it does

PlaneHack is a flight delay prediction system designed for operational decision-making. Given flight details, it outputs a delay status classification and the delay's magnitude, expressed as an expected departure time and an expected arrival time. A simple, easy-to-interpret UI shows what is likely to happen before departure rather than after the delay has already occurred.
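As a rough illustration of how a predicted delay could be turned into a status and concrete timings, here is a minimal Python sketch. It is not the project's actual code: the `to_timings` function, the `FlightPrediction` type, the status labels, and the 15-minute on-time window are all assumptions made for the example, and the delay values are taken as already produced by the model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FlightPrediction:
    """Hypothetical output shape: a status label plus planned-around timings."""
    status: str
    expected_departure: datetime
    expected_arrival: datetime


def to_timings(
    scheduled_departure: datetime,
    scheduled_arrival: datetime,
    predicted_dep_delay_min: float,
    predicted_arr_delay_min: float,
    on_time_window_min: float = 15.0,  # assumed threshold, not from the project
) -> FlightPrediction:
    """Convert predicted delays (in minutes, negative = early) into timings."""
    # Classify on the departure delay; anything inside the window is "on time".
    if predicted_dep_delay_min > on_time_window_min:
        status = "delayed"
    elif predicted_dep_delay_min < -on_time_window_min:
        status = "early"
    else:
        status = "on time"

    # Shift each scheduled time by its own predicted delay.
    return FlightPrediction(
        status=status,
        expected_departure=scheduled_departure
        + timedelta(minutes=predicted_dep_delay_min),
        expected_arrival=scheduled_arrival
        + timedelta(minutes=predicted_arr_delay_min),
    )
```

Keeping the departure and arrival delays as separate inputs leaves room for a dedicated arrivals model, since time can be gained or lost in the air.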

How we built it

We began by building a preprocessing pipeline for our primary baseline dataset. Once the data was ready, we benchmarked multiple modelling and decision-making strategies, including sequence-based approaches (LSTMs), XGBoost, and regression models such as linear and polynomial regression. Further research led us to our final model choice, the FT-Transformer, which we found in a research paper and trained using PyTorch. We then integrated the model into a Flask-style Python backend that loads the trained weights and encoders on startup. For the frontend, we built a lightweight UI with Vite, React, TypeScript, and Tailwind CSS, with a strong focus on clarity and simplicity. Finally, we extended the product with a Chrome extension for quick access and light workflows.

Challenges we ran into

Our dataset was the first challenge: it was large but noisy, which limited the usable signal, and delays are affected by many hidden variables that are not always represented in the data. We were also held back by hardware and resource constraints, especially when training the models, and by API constraints, mainly from the aviation API.

Accomplishments that we're proud of

The task was daunting and technically challenging, but we delivered a complete end-to-end system. We properly benchmarked many model families and showed strong research skills in finding and deploying a research-based tabular model under the constraints we faced. In doing so, we produced outputs that are operationally interpretable.

What we learned

We learned that tabular prediction is often data-limited rather than just model-limited, and that a clean inference pipeline is as important as the model itself. Each of us deepened our knowledge of machine learning and data processing through an applied, practical project, and gained an understanding of the underlying process of data-driven decision-making.

What's next for PlaneHack

We plan to work with different datasets and different features. One consideration is dimensionality reduction: we want to try PCA to reduce noise and improve generalisation. We also plan to add model monitoring and retraining as patterns change, and to polish our arrivals model for better integration into the Chrome extension.
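One simple way the monitoring idea could work is to track the rolling error between predicted and actual delays and flag the model for retraining when that error drifts too high. The sketch below is only an illustration under assumptions: the `DelayErrorMonitor` class, the window size, and the threshold are hypothetical and not part of the current system.

```python
from collections import deque
from typing import Optional


class DelayErrorMonitor:
    """Hypothetical rolling-error tracker for delay predictions."""

    def __init__(self, window: int = 500, mae_threshold_min: float = 20.0):
        # Keep only the most recent `window` absolute errors (in minutes).
        self.errors = deque(maxlen=window)
        self.mae_threshold_min = mae_threshold_min  # assumed threshold

    def record(self, predicted_delay_min: float, actual_delay_min: float) -> None:
        """Store the absolute error once the real delay is known."""
        self.errors.append(abs(predicted_delay_min - actual_delay_min))

    def rolling_mae(self) -> Optional[float]:
        """Mean absolute error over the window, or None if nothing is recorded."""
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def needs_retraining(self) -> bool:
        """Flag retraining only when the window is full and the error is high."""
        mae = self.rolling_mae()
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and mae is not None and mae > self.mae_threshold_min
```

Waiting for a full window before raising the flag avoids reacting to a handful of unusual flights.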
