Over 20% of all flights arrive late, according to U.S. Department of Transportation statistics. This can disrupt your plans if not accounted for. Moreover, bad weather accounts for nearly half of all delayed or cancelled flights.

As frequent flyers, we felt that an app based on actual facts and figures could help travellers like us make informed decisions about upcoming travel. Knowing the weather beforehand, along with detailed awareness of airline and airport on-time performance, can help us plan itineraries effectively. Current events, be they weather alerts or happenings in the departure and destination cities, can also help in preparing for the journey.

And so, My Flight Buddy was designed and developed to leverage Big Data Analytics and Machine Learning.

What it does

My Flight Buddy provides the following features:

  • Analytics on Routes, Airlines and Airports: Performance analytics on 8000+ routes in the US, 15 airlines and 200+ airports. Get insights on when flights are most likely to be delayed, by hour of day or day of week.
  • Weather Forecasting: For your flight, get the current weather, hourly weather and the forecast for the next 10 days at the departure and arrival locations.
  • Delay Prediction: With state-of-the-art analytics and machine learning, get an estimate of the arrival delay for your flight based on weather conditions on your day of travel.
  • Live Alerts: Live weather alerts, Twitter feeds and historical Twitter sentiment are provided within the app to give you the latest information on your flight and on the departure and arrival cities.
  • Responsive Application: My Flight Buddy is designed and developed so that it can be used seamlessly across web browsers on PCs as well as on mobile and tablet devices.

My Flight Buddy uses on-time performance data for all US domestic flights over approximately the last two years to gather insights on the performance of airlines, airports and routes.

The delay data is then correlated with the actual weather conditions on the day of each flight to build a linear regression model of arrival delay against precipitation, snowfall and temperatures. This model is then used to give the user an estimate of the approximate delay their flight may incur, based on weather data for their day of travel.
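The regression step described above can be illustrated with a minimal plain-Python sketch. It is not the project's code (the project used Spark's machine learning library); it fits an ordinary least squares model through the normal equations. The function names, the feature order (precipitation, snowfall, temperature) and the sample rows are all assumptions, and the sample delays are synthetic values generated from a known linear rule so the fit can be checked.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix: each row operation updates both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_delay_model(features: Sequence[Sequence[float]],
                    delays: Sequence[float]) -> List[float]:
    """Least squares fit; returns [intercept, w_precip, w_snow, w_temp]."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, delays)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_delay(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Estimated arrival delay in minutes for one day's weather."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical rows: (precipitation in inches, snowfall in inches, temperature in F).
SAMPLE_FEATURES = [
    (0.0, 0.0, 70.0), (0.5, 0.0, 60.0), (0.0, 2.0, 30.0),
    (1.0, 1.0, 40.0), (0.2, 0.0, 80.0), (0.0, 4.0, 20.0),
]
# Synthetic delays from a made-up rule, not real flight data.
SAMPLE_DELAYS = [5.0 + 20.0 * p + 8.0 * s - 0.05 * t for p, s, t in SAMPLE_FEATURES]

WEIGHTS = fit_delay_model(SAMPLE_FEATURES, SAMPLE_DELAYS)
ESTIMATE = predict_delay(WEIGHTS, (0.3, 1.0, 35.0))
```

Because the synthetic delays follow the rule exactly, the fitted weights should match it up to rounding. Real delay data is far noisier, which is consistent with the modest fit the write-up reports later.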

How we built it

The IBM Bluemix platform was used to build and host the My Flight Buddy application. The development cycle consisted of the following steps:

  • The past 22 months of flight on-time performance data were imported into IBM Object Storage.
  • Daily weather data was pre-processed and stored in IBM Object Storage for correlation with the flight performance data.
  • Using IBM Analytics for Apache Spark, the input data was analysed for each airport, airline and route. The analysis was stored in Cloudant DB; document-based storage was chosen for easy retrieval of data.
  • Using Spark MLlib on IBM Analytics for Apache Spark, the arrival delay was modelled against precipitation, snowfall and temperatures. The model was then stored in Cloudant DB.
  • A Node.js-based web application was built and hosted on IBM Bluemix to interface with users.
  • The web application provides a UI for route, airline and airport analytics and predictions.
  • The web application also integrates with FlightAware™ for current delays and further information.
  • Through the web application, users can see the current weather and a 10-day forecast, fetched using IBM Insights for Weather.
  • IBM Insights for Twitter is used for sentiment analytics, which are displayed on the web app for cities and airlines.
  • The latest tweets are returned by the Twitter Search API and displayed on the web app for live tracking of events.
  • Weather alerts are fetched from the NOAA ATOM feed and displayed on the web app for the departure and arrival cities.
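The step in the list above that matches flight performance data with daily weather might look roughly like the following plain-Python sketch. It is an illustration only: the project ran this on Apache Spark, and the record layouts, field names and sample values here are hypothetical. It also assumes that weather is matched on the origin airport and the flight date, which the write-up does not specify.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

# Made-up sample rows; the real flight and weather files use different columns.
FLIGHTS: List[Record] = [
    {"date": "2016-01-23", "origin": "JFK", "dest": "ORD", "arr_delay_min": 95.0},
    {"date": "2016-01-23", "origin": "SFO", "dest": "LAX", "arr_delay_min": 4.0},
    {"date": "2016-01-24", "origin": "JFK", "dest": "BOS", "arr_delay_min": 30.0},
]
WEATHER: List[Record] = [
    {"date": "2016-01-23", "airport": "JFK", "precip_in": 1.2, "snow_in": 10.0, "temp_f": 27.0},
    {"date": "2016-01-23", "airport": "SFO", "precip_in": 0.1, "snow_in": 0.0, "temp_f": 55.0},
]


def join_flights_with_weather(flights: List[Record],
                              weather: List[Record]) -> List[Record]:
    """Attach the day's weather at the origin airport to each flight."""
    # Index weather by (airport, date) so each flight needs one lookup.
    weather_by_key: Dict[Tuple[object, object], Record] = {
        (w["airport"], w["date"]): w for w in weather
    }
    joined: List[Record] = []
    for flight in flights:
        match = weather_by_key.get((flight["origin"], flight["date"]))
        if match is None:
            continue  # no weather observation for this airport and day
        joined.append({
            **flight,
            "precip_in": match["precip_in"],
            "snow_in": match["snow_in"],
            "temp_f": match["temp_f"],
        })
    return joined


JOINED = join_flights_with_weather(FLIGHTS, WEATHER)
```

Flights with no matching weather row are dropped here, which is only one possible choice; a real pipeline might prefer to keep them and mark the weather fields as missing.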

Currently, My Flight Buddy analyses more than 2.5 GB of data covering the last two years. Going forward, the architecture can be based on a scheduled job on IBM Analytics for Apache Spark (via spark-submit) that imports new data each month and writes the updated insights to the Cloudant database, so that users always get up-to-date information.

Insights from Analytics

The on-time performance data was analysed against various parameters, and we found the following key insights:

  • Delays are shorter earlier in the day. This is probably because delays accumulate as the day progresses.
  • Late aircraft is one of the major reasons for flight delays. Weather and carrier issues also play an important role.
  • June and December are the months with the highest percentage of delayed flights for most airlines. Is it because of the holiday season rush?
  • A larger airport does not necessarily mean more flight delays.
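The hour-of-day insight in the list above comes from a simple grouped average. A minimal plain-Python sketch of that kind of aggregation follows; the function name, the input layout of (scheduled departure hour, arrival delay in minutes) and the sample values are assumptions made for illustration, not the project's Spark code or real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_delay_by_hour(flights: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Mean arrival delay in minutes, grouped by scheduled departure hour."""
    totals: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for hour, delay in flights:
        totals[hour][0] += delay  # running sum of delays for this hour
        totals[hour][1] += 1      # number of flights seen for this hour
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


# Made-up sample: (departure hour, arrival delay in minutes).
SAMPLE_FLIGHTS = [(6, 2.0), (6, 4.0), (12, 10.0), (18, 20.0), (18, 30.0)]
DELAY_BY_HOUR = average_delay_by_hour(SAMPLE_FLIGHTS)
```

The same grouping key can be swapped for day of week, month, airline or airport to produce the other insights.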

Accomplishments that we're proud of

This was our first web application built with the technologies mentioned, and we are proud of learning them and completing the entry on time.

What we learned

For sure, we learnt that Apache Spark is a brilliant platform for big data analysis. Computations that took hours on other platforms we have used were completed in minutes on Apache Spark hosted on IBM Bluemix.

Almost all the technologies used on the server side and for analytics were a first for us, so this was a great learning experience.

What's next for My Flight Buddy

The current R-squared value for the Delay Prediction model is 0.27. This can be improved by incorporating other explanatory variables into the model, such as the hour of the flight and rolling averages of precipitation and snowfall.
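For readers unfamiliar with the metric, R-squared is the share of the variance in the observed delays that the model explains: 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of the standard formula follows; the function name is an assumption and this is not the project's evaluation code.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

A value of 1.0 means the predictions match the observations exactly, while 0.0 means the model does no better than always predicting the mean delay.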

Also, there are many improvements that can be incorporated, like:

  • Global Presence: My Flight Buddy can use flight on-time performance data for other countries, where available, and provide analytics for them as well. Spark RDDs will be a great feature to use!
  • More Analytics: The BTS On-Time Performance data has many more fields that can be reported on
  • More Data Inputs: Other APIs that provide relevant information to travellers can be incorporated, making this a one-stop app for all details regarding their flight
  • Weather Data Unit Conversion: Currently, weather data is shown in imperial units. Conversion to metric units can be provided
  • Improve the Apache Spark Python code for performance and parallel processing
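The unit conversion item in the list above is straightforward to sketch. The helper names below are hypothetical, and the choice of fields (temperature, precipitation or snowfall depth, and wind speed) is an assumption about which values the app displays.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inches_to_millimetres(depth_in: float) -> float:
    """Convert precipitation or snowfall depth from inches to millimetres."""
    return depth_in * 25.4


def mph_to_kmh(speed_mph: float) -> float:
    """Convert a wind speed from miles per hour to kilometres per hour."""
    return speed_mph * 1.609344
```

Keeping the stored data in one unit system and converting only at display time would avoid having to re-run the analysis for metric users.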

