We believe that train delays can be caused by many external factors, for example the weather or passenger volumes driven by major public events.
What it does
Our model combines weather data for Stuttgart with station, service and delay data from DB. It predicts the likelihood of a journey on a specific service being delayed on a specific day.
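The writeup does not say which model was used, so the sketch below uses a deliberately simple substitute: a smoothed frequency estimate of P(delayed | service, weather) from historical records. The record shape, the `delay_likelihood` function and the sample rows are hypothetical and only illustrate what "likelihood of delay on a service under given conditions" can mean.

```python
from collections import defaultdict

# Hypothetical record shape: (service_id, weather_condition, was_delayed).
Record = tuple[str, str, bool]


def delay_likelihood(records: list[Record], alpha: float = 1.0) -> dict[tuple[str, str], float]:
    """Estimate P(delayed | service, weather) from historical counts with add-alpha smoothing."""
    delayed: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for service, weather, was_delayed in records:
        key = (service, weather)
        total[key] += 1
        delayed[key] += int(was_delayed)
    # Smoothing pulls estimates for rarely seen combinations towards 0.5 instead of 0 or 1.
    return {key: (delayed[key] + alpha) / (total[key] + 2 * alpha) for key in total}


# Illustrative history, not real DB data.
history = [
    ("S1", "rain", True),
    ("S1", "rain", True),
    ("S1", "rain", False),
    ("S1", "clear", False),
]
probs = delay_likelihood(history)
```

With two of three rainy runs delayed, the smoothed estimate for `("S1", "rain")` is (2 + 1) / (3 + 2) = 0.6. A trained classifier would replace this in practice, but counting is a useful baseline to compare against.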
How we built it
We scraped weather data for Stuttgart and imported it, along with the DB datasets, into Python.
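Once scraped, daily weather is easiest to join onto train records if it is indexed by date. A minimal sketch, assuming the scraped data was saved as CSV with hypothetical `date`, `precip_mm` and `temp_c` columns; the values are illustrative, not real measurements.

```python
import csv
import io
from datetime import date

# Hypothetical CSV layout for the scraped weather; values are illustrative, not real measurements.
WEATHER_CSV = """date,precip_mm,temp_c
2020-01-01,0.0,8.5
2020-01-02,12.4,6.1
"""


def load_weather(text: str) -> dict[date, dict[str, float]]:
    """Parse daily weather rows and index them by calendar date for later joins."""
    weather: dict[date, dict[str, float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        day = date.fromisoformat(row["date"])
        weather[day] = {"precip_mm": float(row["precip_mm"]), "temp_c": float(row["temp_c"])}
    return weather


weather_by_day = load_weather(WEATHER_CSV)
```

Keying by `date` rather than the raw string means later lookups from parsed train timestamps (`timestamp.date()`) match directly.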
We merged the train schedule dataset with the dataset detailing delays.
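Merging the schedule with the delay data amounts to a join on whatever identifies a single scheduled stop. The sketch below assumes a hypothetical key of train, day and station, and hypothetical column names; the real DB schema is not described here.

```python
# Hypothetical rows; the real DB schema is not described in this writeup.
schedule = [
    {"train_id": "T1", "date": "2020-01-01", "station_id": "S1", "planned_arrival": "2020-01-01T08:00"},
    {"train_id": "T1", "date": "2020-01-01", "station_id": "S2", "planned_arrival": "2020-01-01T08:30"},
]
delays = [
    {"train_id": "T1", "date": "2020-01-01", "station_id": "S1", "actual_arrival": "2020-01-01T08:07"},
]


def stop_key(row: dict) -> tuple[str, str, str]:
    """Identify one scheduled stop by train, day and station (an assumed key)."""
    return (row["train_id"], row["date"], row["station_id"])


def merge_schedule_with_delays(schedule: list[dict], delays: list[dict]) -> list[dict]:
    """Left-join delay reports onto scheduled stops; stops with no report get actual_arrival=None."""
    reports = {stop_key(d): d for d in delays}
    merged = []
    for stop in schedule:
        report = reports.get(stop_key(stop), {})
        merged.append({**stop, "actual_arrival": report.get("actual_arrival")})
    return merged


merged = merge_schedule_with_delays(schedule, delays)
```

A left join keeps every scheduled stop. Whether a missing report means "on time" or "unknown" depends on how the delay dataset was collected, so leaving it as `None` avoids guessing.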
We joined station data onto each record.
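Joining station data is a lookup from a station identifier to its attributes. A minimal sketch, assuming a hypothetical `station_id` field and a `Station` record with placeholder names and coordinates:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Station:
    name: str
    lat: float
    lon: float


# Hypothetical station lookup keyed by station_id; names and coordinates are placeholders.
STATIONS = {
    "S1": Station("Station A", 48.78, 9.18),
    "S2": Station("Station B", 48.40, 9.98),
}


def attach_stations(records: list[dict], stations: dict[str, Station]) -> list[dict]:
    """Add station name and coordinates to each record; unknown IDs get None so gaps stay visible."""
    out = []
    for rec in records:
        st: Optional[Station] = stations.get(rec["station_id"])
        out.append({
            **rec,
            "station_name": st.name if st else None,
            "lat": st.lat if st else None,
            "lon": st.lon if st else None,
        })
    return out


enriched = attach_stations([{"station_id": "S1"}, {"station_id": "S9"}], STATIONS)
```

Returning `None` for unknown stations, rather than dropping the record, makes it easy to count how many rows lack station data.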
We calculated each delay as the difference between the actual and the scheduled arrival time.
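Computing the delay from full timestamps, rather than times of day alone, avoids wrong results when a late train arrives after midnight. A minimal sketch, assuming both arrivals are available as ISO 8601 strings that include the date:

```python
from datetime import datetime, timedelta


def delay_minutes(planned: str, actual: str) -> float:
    """Return actual minus scheduled arrival in minutes (negative means early).

    Assumes ISO 8601 timestamps that include the date, so arrivals that
    slip past midnight are still measured correctly.
    """
    diff = datetime.fromisoformat(actual) - datetime.fromisoformat(planned)
    return diff / timedelta(minutes=1)


late_night = delay_minutes("2020-01-01T23:55", "2020-01-02T00:10")  # 15.0
early = delay_minutes("2020-01-01T08:00", "2020-01-01T07:58")  # -2.0
```

Dividing one `timedelta` by another yields a float, which keeps sub-minute delays rather than truncating them.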
We added geospatial data for each journey and then visualised the data using D3.
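GeoJSON is a common hand-off format between a Python pipeline and D3, because D3's geographic helpers draw GeoJSON directly. The sketch below is an assumption about how the geospatial step could be exported, not a description of the team's code: each journey becomes a `LineString` feature with a hypothetical `delay_min` property, and the coordinates are illustrative.

```python
import json


def journey_to_feature(journey_id: str, stops: list[tuple[float, float]], delay_min: float) -> dict:
    """Build a GeoJSON LineString Feature from ordered (lat, lon) stops.

    GeoJSON (RFC 7946) orders positions as [longitude, latitude], so each pair is swapped.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [[lon, lat] for lat, lon in stops]},
        "properties": {"journey_id": journey_id, "delay_min": delay_min},
    }


# Placeholder journey with illustrative coordinates.
collection = {
    "type": "FeatureCollection",
    "features": [journey_to_feature("J1", [(48.78, 9.18), (48.40, 9.98)], 7.0)],
}
geojson_text = json.dumps(collection)
```

In the browser, a file like this can be loaded with `d3.json` and drawn with `d3.geoPath` and a projection such as `d3.geoMercator`; colouring each line by `delay_min` is one way to show where delays concentrate.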
Challenges we ran into
We initially looked at ways of exploiting the Network Rail geometry data. Unfortunately, although there was a large volume of data, we felt it offered little opportunity to produce something meaningful, particularly as much of it lacked geospatial values. Other challenges included obtaining the data over poor Wi-Fi for much of the first 24 hours, understanding and translating the data (the headers were in German), working out how the data represented a journey, and deciding which data was relevant.
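One low-effort way to handle German headers is a single mapping applied to every row as it is read. The column names below are hypothetical, since the actual DB headers are not listed in this writeup; "Soll" and "Ist" are the usual German terms for planned and actual values.

```python
# Hypothetical header map: the real DB column names are not listed in this writeup.
# "Soll" and "Ist" are the usual German terms for planned and actual values.
HEADER_MAP = {
    "Zugnummer": "train_number",
    "Bahnhof": "station",
    "Ankunft_Soll": "planned_arrival",
    "Ankunft_Ist": "actual_arrival",
}


def translate_headers(row: dict[str, str]) -> dict[str, str]:
    """Rename known German headers to English; unrecognised headers are kept unchanged."""
    return {HEADER_MAP.get(key, key): value for key, value in row.items()}


translated = translate_headers({"Bahnhof": "S1", "Ankunft_Ist": "08:07", "Gleis": "4"})
```

Passing unknown headers through unchanged (here `Gleis`, "platform") makes untranslated columns easy to spot instead of silently dropping them.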
Accomplishments that we're proud of
Built a model with good accuracy using open data.
What we learned
Access to good quality data can allow accurate forecasting of issues affecting the rail industry.
What's next for DBD3
Introduce more open datasets to develop the model and improve forecasting.