Empower ORR with omniscience

ORR visits every Network Rail project every quarter to check whether it is reaching its milestones.

Yet ORR already collects monthly data on performance and financial indicators that can be used to predict underperformance without the need for a visit.

Our project enables ORR to use this data to:

  1. Improve efficiency by visiting only the sites that really need it
  2. Improve effectiveness during visits by targeting its questions
  3. Stop problems before they develop

How it works

The framework is split into three sections:

  1. Data pre-processing and cleansing
  2. Data processing and model training
  3. Data visualisation

Currently, two sources of data are used:

  1. Time-series financial data which both monitors and forecasts spend against budget
  2. Time-series records of completed and missed project milestones

Both are passed through the initial data pre-processing and cleansing stages - this simply requires the relevant spreadsheets to be dropped into the correct folders.

Data source 1 is used to derive the features (X) of the machine learning model, while data source 2 provides the outcomes (y). The two data sources are inner joined to ensure they cover the same projects over the same time range.
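
As a rough illustration of this step, assuming the pipeline is implemented in Python with pandas and that both spreadsheets share a project identifier column (the file and column names below are illustrative, not the real ones):

```python
import pandas as pd

# Illustrative file and column names; the real spreadsheets dropped into the
# input folders may be laid out differently.
spend = pd.read_excel("input/financial/spend_vs_budget.xlsx")      # features source
milestones = pd.read_excel("input/milestones/milestone_log.xlsx")  # outcomes source

# Inner join on the project identifier, keeping only projects present in both
# sources (i.e. those that overlap the same time range).
data = spend.merge(milestones, on="project_id", how="inner")
```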

The specific features used for the basic model were the fluctuations of spend away from the given baseline for Control Period 5 (CP5), years 1 to 5. For example, the fluctuation for CP5 Year 1 was (CP5Y1 Baseline) - (CP5Y1 Actual/Estimated Spend). This gives 5 features for this model. A variety of further features could be added to improve prediction accuracy, depending on the sources of data available, such as “Project on a Page” RAG status, number of milestones, and the size and type of project.
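
A minimal sketch of that feature derivation, continuing the pandas example above; the column names and the outcome encoding are assumptions made for illustration only:

```python
# One fluctuation feature per CP5 year: baseline minus actual/estimated spend.
for year in range(1, 6):
    baseline = data[f"CP5Y{year}_baseline"]
    actual = data[f"CP5Y{year}_actual_or_estimated_spend"]
    data[f"CP5Y{year}_fluctuation"] = baseline - actual

feature_cols = [f"CP5Y{year}_fluctuation" for year in range(1, 6)]
X = data[feature_cols]        # features
y = data["milestone_missed"]  # outcome: 1 = missed, 0 = completed (assumed encoding)
```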

These features and outcomes are then split into a training set and a validation set, to check that the model is not exhibiting signs of bias or overfitting. The training data is then passed into several machine learning classification algorithms (a sketch of this step follows the list below):

  1. Naive Bayes
  2. Random Forest
  3. Logistic Regression
  4. Kernel Support Vector Machine (Kernel SVM)
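
A minimal sketch of the split-and-train step, assuming scikit-learn (which provides all four classifiers) and reusing the X and y from the sketches above; the split ratio and hyperparameters shown are illustrative defaults rather than those used in the prototype:

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hold back a validation set that the models never see during training.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Kernel SVM": SVC(kernel="rbf", probability=True),
}
for name, model in models.items():
    model.fit(X_train, y_train)
```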

Once training is complete, the models are tested on the validation data, which has not yet been seen by the system. A confusion matrix is calculated for each algorithm, comparing its predictions against the actual outcomes.
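
Continuing the scikit-learn sketch above, the confusion matrices could be produced like this:

```python
from sklearn.metrics import confusion_matrix

for name, model in models.items():
    y_pred = model.predict(X_val)
    # Rows are actual outcomes, columns are predicted outcomes.
    print(name)
    print(confusion_matrix(y_val, y_pred))
```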

Given the same set of features, these models can then be used to prioritise projects which are projected to miss future milestones. On the front-end, this is shown as a table ordered by the probability of this occurring - i.e. how confident the model is that a project is likely to fail.
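
A hedged sketch of how that ranking could be produced, assuming the chosen classifier exposes class probabilities (as the scikit-learn models above do via predict_proba) and continuing the earlier example:

```python
# Probability that each project misses its milestone, assuming class 1 = "missed".
chosen = models["Naive Bayes"]
data["p_missed"] = chosen.predict_proba(X)[:, 1]

# Front-end table ordered by how confident the model is that a project will fail.
priority_table = data.sort_values("p_missed", ascending=False)[["project_id", "p_missed"]]
```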

In addition to the machine learning outputs, the data is reformatted as JSON for easier visualisation of multidimensional arrays. When a user selects a project with a high likelihood of failure, they are presented with a set of easy-to-read charts indicating why the project has been classified as a potential underperformer. This helps the user decide whether a project warrants further investigation, possibly leading to a site visit. As the data is served as a web page, it is easily accessible on-site, allowing the user to carry out a concise investigation.
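
A minimal sketch of the JSON reformatting, again assuming pandas; the output path and record layout are illustrative rather than the project's actual schema:

```python
# One JSON record per project (features plus predicted probability), so the
# front-end charts can read the multidimensional data without re-parsing
# spreadsheets. Continues the sketches above.
data.to_json("output/projects.json", orient="records", indent=2)
```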

Modelling Results

Below are the results from the machine learning algorithms described above.

Naive Bayes was the highest-performing algorithm according to the confusion matrices (an example is shown below). In tests it correctly predicted, on average, 7 of the 8 validation data points, having been trained on 31 data points.

|                   | Predicted: Completed | Predicted: Missed |
| ----------------- | -------------------- | ----------------- |
| Actual: Completed | 5                    | 1                 |
| Actual: Missed    | 0                    | 2                 |

The one incorrect result was consistently a project that the model predicted would miss a milestone but that was actually completed. This is the better error case: at worst it would have triggered an investigation into a project where there was no issue, rather than skipping an investigation and missing a serious issue.

Random Forest, Logistic Regression and Kernel SVM all performed similarly, predicting 6 of the 8 projects correctly. However, their errors were the worse kind: the two incorrect results were projects where the model predicted a successfully completed milestone when the milestone was actually missed. This could have led to no investigation where there should have been one.

Given these initial prototype results, we suggest that a Naive Bayes model be taken forward. This algorithm also lends itself to root cause analysis of which variables are most influential in causing projects to miss milestones.
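
As a hedged illustration of that root-cause angle: a Gaussian Naive Bayes model (e.g. scikit-learn's GaussianNB, as assumed in the sketches above) stores the per-class mean of each feature, so comparing those means hints at which spend fluctuations best separate completed from missed milestones:

```python
import numpy as np

nb = models["Naive Bayes"]
# theta_ holds the mean of each feature for each class; a large gap between the
# "completed" and "missed" class means suggests an influential variable.
mean_gap = np.abs(nb.theta_[0] - nb.theta_[1])
for col, gap in sorted(zip(feature_cols, mean_gap), key=lambda t: -t[1]):
    print(f"{col}: {gap:.2f}")
```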

Limitations

1. Sample size

Due to the limited provision of data, our sample size was extremely small. We were working with 36 data points (derived from the overlap of the two sources described above), which is far too few to develop a statistically robust model. The methods employed scale well with data, and prediction accuracy would likely improve with a larger sample.

2. Features available in the data

Missing a milestone on a major project is a complex event. It may be triggered by many factors, so many features are required to predict it accurately. With a domain expert on the team, we believed there is enough data to predict this; however, we were unable to acquire that data due to the extensive pre-processing required. Therefore, the features used in the model relate to financial data only, which is a stable predictor but may not be sensitive enough.

Future research

Data availability and time constraints have inevitably limited our project. However, there is great potential to develop it further:

  1. Broader availability of data and a greater variety of features would improve the accuracy of the prediction models. Obtaining more varied features from administrative data held by Network Rail, converted to machine-readable format, would be a major step forward.
  2. Information acquired during visits completed following a prediction of underperformance could feed back into the model to improve accuracy, creating an iterative learning system.
  3. As new projects emerge, trends may change and the models will need to respond to this. Developing the project to allow new project entries will keep the model up to date and relevant.
  4. Dynamic analysis of the underlying financial data, tracking how revisions change closer to deadlines, could reveal ‘rigging’ patterns and how to spot them.
  5. Developing a method to cross-validate the data used to train the model, to reduce any inherent biases, as a model trained on biased data will itself be biased (a minimal sketch follows below).
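
On point 5, one common approach would be k-fold cross-validation; a minimal sketch, assuming scikit-learn and reusing the X and y from the earlier sketches:

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# 5-fold cross-validation gives a less biased estimate of out-of-sample
# accuracy than a single train/validation split, which matters at this
# sample size.
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```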

This project has been designed specifically for ORR's purposes; however, because Network Rail data is the underlying data source, it is highly applicable to Network Rail too. Further, given the nature of performance monitoring across many regulatory bodies, the approach can be applied to other sectors as well.
