Predictive Regression analysis for Delay Management and Improved Customer experience on Trains
What is PRDIC-T
PRDIC-T uses historical journey data from Deutsche Bahn to predict delays on the rail network, in terms of both likelihood and scale. These predictions are delivered through an interactive, easy-to-use GIS tool.
Motivation behind PRDIC-T
Tired of buying train tickets only to find out at the station that your train has been delayed? Tired of annoying your customers through no fault of your own, simply because delays happen? Well then, worry no more! PRDIC-T is a solution for both railway companies and their customers. With PRDIC-T you can check the likelihood of a delay on your journey before you even buy your ticket! PRDIC-T also lets railway companies prepare in advance for major events and unpredictable weather conditions in order to minimise delays.
How we built it
Our datasets include:
- Deutsche Bahn historical delay data for Stuttgart
- Weather estimates from The Weather Underground
- Data on major events hosted in Stuttgart
At the core of PRDIC-T is a logistic regression model that is fed the data listed above, segmented by peak/off-peak times, in order to produce the most accurate predictions possible.
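To illustrate the idea of a logistic regression segmented by peak/off-peak times, here is a minimal sketch in plain Python. It is not the project's actual code: the peak-hour windows, the two features (`rain_mm`, `major_event`), the helper names and the toy records are all assumptions made up for the example. It trains one small model per segment by gradient descent and returns a delay probability.

```python
import math

# Hypothetical peak windows; the write-up does not say which hours count as peak.
PEAK_HOURS = set(range(6, 10)) | set(range(16, 20))

def segment(hour):
    return "peak" if hour in PEAK_HOURS else "off_peak"

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w[0] is the bias, w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def fit_by_segment(records):
    """Train one model per segment. Each record is (hour, features, delayed)."""
    grouped = {}
    for hour, features, delayed in records:
        rows, labels = grouped.setdefault(segment(hour), ([], []))
        rows.append(features)
        labels.append(delayed)
    return {name: fit_logistic(rows, labels)
            for name, (rows, labels) in grouped.items()}

# Made-up toy data: features are [rain_mm, major_event (0/1)], label 1 = delayed.
records = [
    (8, [0.0, 0], 0), (8, [5.0, 0], 1), (17, [0.5, 1], 1),
    (7, [0.2, 0], 0), (18, [6.0, 1], 1), (9, [0.0, 0], 0),
    (13, [0.0, 0], 0), (22, [7.0, 0], 1), (11, [0.3, 0], 0),
    (14, [6.5, 1], 1), (23, [0.0, 0], 0), (12, [0.1, 1], 0),
]
models = fit_by_segment(records)

# Delay probability for a rainy peak-hour journey during an event vs. a dry one.
p_wet = predict_proba(models["peak"], [6.0, 1])
p_dry = predict_proba(models["peak"], [0.0, 0])
```

On real data one would more likely reach for an established library implementation and hold out a test set; the hand-rolled fit above only keeps the example dependency-free.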
Challenges we ran into
The project was far from easy: we had to overcome hurdles at what seemed to be every other step. Since none of us was particularly well versed in data science and machine learning, it was very much an educational experience for all of us. The key challenges were:
- Cleaning the Deutsche Bahn dataset provided to us
- Merging it with other datasets collected from external sources
- Making sense of the data and the information it contains
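The cleaning and merging steps listed above could look roughly like the sketch below. It is only an illustration under assumptions: the field names (`date`, `train`, `delay_min`, `rain_mm`), the sample rows and the choice to join on the calendar date are invented for the example and do not reflect the real Deutsche Bahn or Weather Underground schemas.

```python
from datetime import date

# Made-up sample rows standing in for the three data sources.
delays = [
    {"date": "2017-05-01", "train": "T1", "delay_min": "12"},
    {"date": "2017-05-01", "train": "T2", "delay_min": ""},   # missing value
    {"date": "2017-05-02", "train": "T1", "delay_min": "0"},
    {"date": "2017-05-02", "train": "T1", "delay_min": "0"},  # exact duplicate
]
weather = {"2017-05-01": {"rain_mm": 4.2}, "2017-05-02": {"rain_mm": 0.0}}
event_days = {"2017-05-01"}

def clean(rows):
    """Drop rows without a delay value, remove duplicates, parse types."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["delay_min"].strip():
            continue  # skip rows where the delay was not recorded
        key = (row["date"], row["train"], row["delay_min"])
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({
            "date": date.fromisoformat(row["date"]),
            "train": row["train"],
            "delay_min": int(row["delay_min"]),
        })
    return cleaned

def merge(rows, weather_by_day, event_days):
    """Left-join weather and event information onto each delay row by date."""
    merged = []
    for row in rows:
        day = row["date"].isoformat()
        merged.append({
            **row,
            # None marks days for which no weather estimate is available.
            "rain_mm": weather_by_day.get(day, {}).get("rain_mm"),
            "major_event": day in event_days,
        })
    return merged

merged = merge(clean(delays), weather, event_days)
```

Keeping the delay rows as the left side of the join means a journey is never dropped just because weather or event data is missing for that day.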
Accomplishments that we're proud of
- Working with a real dataset: from cleaning to processing and interpreting, these were all skills we learnt on the spot, and we believe we did a pretty decent job.
- Trying different predictive models in order to find the best one
- Making great new friends, having lots of fun and an awesome experience!
What's next for Lo-Ki: PRDIC-T
Twenty-four hours are clearly not enough for a fully fleshed-out project. Our plans for the future include polishing the dashboard and UI, identifying and collecting more relevant external data to add to our ever-growing dataset, and optimising the predictive model until we get it to peak performance!
And we all know that when it comes to software, scaling up is always important. We believe this would be easy to achieve for PRDIC-T, since most of the hard work has already been done. The next logical step is to move from large static datasets to feeding real-time data into the algorithm. PRDIC-T was also not developed with a single company in mind: it is a piece of software that could easily be adapted to work with the data that any railway company collects.