Inspiration
We set out to apply machine learning methods to interesting real-world datasets. We found United States Government data on flight traffic and delays and decided to build a model that predicts flight delays. With this wealth of data, we also hoped to gain insight into the nature of flight delays and the air transportation network.
What it does
Predicts the deviation from the scheduled arrival time of a flight, given information regarding the Airline, Airport, Flight Plan, and Date of Departure.
How I built it
Implemented a multiple linear regression in R on data of flights from 2014 across the United States. We trained our model on 240,000 flights, using 10 predictor variables.
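As a rough illustration of what such a fit looks like in R, here is a minimal sketch on synthetic data. The column names (airline, origin, distance, dep_month, arr_delay) and the values are made up for the example and are not our actual dataset or our exact set of 10 predictors.

```r
# Synthetic stand-in for the flight data; every column name here is hypothetical.
set.seed(1)
n <- 200
flights <- data.frame(
  airline   = factor(sample(c("AA", "DL", "UA"), n, replace = TRUE)),
  origin    = factor(sample(c("ATL", "ORD", "SFO"), n, replace = TRUE)),
  distance  = runif(n, 200, 2500),
  dep_month = factor(sample(1:12, n, replace = TRUE))
)
# Made-up response: arrival delay in minutes relative to schedule.
flights$arr_delay <- 5 + 0.004 * flights$distance + rnorm(n, sd = 15)

# Multiple linear regression: one numeric response, several predictors.
# R expands each factor into dummy variables automatically.
fit <- lm(arr_delay ~ airline + origin + distance + dep_month, data = flights)

# Predict the delay for a single new flight, reusing the training levels.
new_flight <- data.frame(
  airline   = factor("DL", levels = levels(flights$airline)),
  origin    = factor("ORD", levels = levels(flights$origin)),
  distance  = 600,
  dep_month = factor(7, levels = levels(flights$dep_month))
)
predicted_delay <- predict(fit, newdata = new_flight)
```

Calling summary(fit) on a model like this shows one coefficient per non-reference factor level, which is why a model with many categorical variables ends up with far more coefficients than predictors.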
Challenges I ran into
Our data had many categorical variables. In R, a categorical variable is represented as a 'factor', and each factor has some number of levels. For example, the factor 'color' could have the levels ('red', 'blue', 'green'). A linear model learns a separate coefficient for each level it sees during training, so if you ask for a prediction on a level that never appeared in the training data (such as an exotic airport location), the model cannot make one. It has no prior data for that level to base its prediction on.
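The problem can be reproduced on a toy example. This is a minimal sketch with made-up airports and delays, not our real data; the workaround shown (mapping unseen levels to NA so those rows get no prediction) is one possible way to handle it, not necessarily the best one.

```r
# Tiny made-up training set: arrival delay by origin airport.
train <- data.frame(
  origin    = factor(c("ATL", "ORD", "SFO", "ATL", "ORD", "SFO")),
  arr_delay = c(12, 30, 5, 8, 25, 9)
)
fit <- lm(arr_delay ~ origin, data = train)

# "XYZ" is a hypothetical airport that never appears in the training data.
new_flights <- data.frame(origin = c("ORD", "XYZ"), stringsAsFactors = FALSE)

# predict() stops with an error because the factor has a new level;
# tryCatch captures the message instead of aborting the script.
result <- tryCatch(
  predict(fit, newdata = new_flights),
  error = function(e) conditionMessage(e)
)

# One possible workaround: restrict new data to the levels the model knows.
# Unseen levels become NA, and predict() returns NA for those rows.
known_levels <- fit$xlevels$origin
new_flights$origin <- factor(new_flights$origin, levels = known_levels)
preds <- predict(fit, newdata = new_flights)
```

With this approach the flight from the known airport still gets a prediction, while the unseen airport is flagged as missing and can be handled separately, for example by falling back to an overall average delay.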
Accomplishments that I'm proud of
We got something working.
What I learned
It's hard to fit a good model to real-world data.
What's next for Fly
Train on a larger dataset, and try regularization and variable selection methods such as Ridge regression or the Lasso.