Inspiration
This project was inspired by the idea of processing large data sets that are freely available on the internet. In particular, it uses a data set of roughly 5,000,000 US flights from 2015, which includes information such as flight delays.
What it does
This application creates a decision tree using a subset of the flight data set, which is then used to predict whether or not a flight the user enters into a web form will be delayed.
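To make the idea concrete, here is a minimal Python sketch of building and querying a decision tree over categorical features. It is an illustration only, not our C# implementation: the feature names (`airline`, `month`), the Gini impurity criterion, and the tiny hand-written training set are all assumptions chosen for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def build_tree(rows, labels, features):
    """Recursively build a tree over categorical features.

    A leaf is a class label. An internal node is a dict holding the
    feature to test, one subtree per observed value, and a fallback
    label for values never seen during training.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_impurity(feature):
        # Weighted impurity of the groups produced by splitting on feature.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=split_impurity)
    remaining = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return {"feature": best, "children": children, "default": majority}

def classify(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical training data: True means the flight was delayed.
rows = [
    {"airline": "AA", "month": "Jan"},
    {"airline": "AA", "month": "Jul"},
    {"airline": "DL", "month": "Jan"},
    {"airline": "DL", "month": "Jul"},
]
labels = [True, True, False, False]
tree = build_tree(rows, labels, ["airline", "month"])
```

Calling `classify(tree, {"airline": "AA", "month": "Mar"})` walks the tree for a flight entered by the user and returns the predicted label.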
How we built it
We have a web server and decision tree back end, implemented in C#, which is in charge of building a tree, training it, testing it, and classifying unknown flights using the built tree. This back end also receives and processes requests sent by a web front end implemented in React.
Challenges we ran into
The flight data set that we are using is huge, at over 5,000,000 entries. Because it is hosted in a remote database, we ran into problems both with issuing that many requests and with building a tree from the flights the database returned. To improve our test accuracy (currently around 70-80%), we have tried several approaches, including sampling only a small subset of the database and random sampling.
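One common way to draw a uniform random sample from a source too large to hold in memory is reservoir sampling. The Python sketch below shows the general technique under the assumption that rows arrive as a stream; it is not necessarily the sampling method we used, and the function name `reservoir_sample` is hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable.

    Uses Algorithm R: the stream is read once and only k items are
    kept in memory, so the total number of rows need not be known.
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Stand-in for rows streamed from the remote database.
training_rows = reservoir_sample(range(5_000_000), 1000, random.Random(0))
```

Every row has the same chance of ending up in the sample, which avoids the bias of taking only the first rows the database returns.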
Accomplishments that we're proud of
We are proud of how much we got done in about 11 hours: a decision tree classifier with reasonable accuracy and a web front end that interfaces with the server.
What we learned
We learned a lot about handling large data sets, decision tree learning, and React apps.
What's next for Flight Delay Predictor
We are working to further improve the accuracy of our decision tree classifier.