E-PASS 宜-PASS

Inspiration

Several studies suggest that severe traffic jams in many countries carry economic costs that are often invisible to policymakers and economists. Urban residents in busy cities probably feel the impact of traffic congestion in their day-to-day lives. A solution that even partially mitigates traffic congestion can thus translate into appreciable economic value.

We surveyed a number of road users in Taiwan, many of whom frequently use Freeway 5, and realized how difficult it is to direct them to alternative roads such as Highway 2 and Highway 9. Most people choose not to use Highway 2 because it takes much longer (even when the Hsuehshan Tunnel is clogged with traffic), and they avoid Highway 9 for safety reasons. Hence, a more feasible and practical approach is to spread traffic across different times and interchanges.

To mitigate the traffic congestion on Freeway 5 (especially the part through the Hsuehshan Tunnel), the Taiwanese government currently has a so-called "High Occupancy Vehicle" road policy in effect: during certain peak traffic hours on weekends, only vehicles with at least three passengers are allowed on Freeway 5. However, the policy has so far not had the desired effect of reducing traffic congestion, partly because most vehicles travelling on Freeway 5 on weekends have more than two passengers anyway.

According to our survey results, many Freeway 5 users are willing to change the times they hit the road to avoid the traffic. Nevertheless, they mostly rely on their past experience of the traffic situation at different times. Existing mobile apps such as i68, while providing real-time traffic information, are not particularly useful in helping drivers avoid upcoming traffic if they are already on or about to get on the freeway. With historical traffic data made available by Taiwan's National Freeway Bureau, we propose to build software applications that give users more predictive information before they hit the road or even plan their trip. In addition, we suggest combining such predictive information with soft traffic control policies to tackle the traffic congestion problem.

What it does

Our solution delivers three major components:

Accurate predictive traffic information through the use of sophisticated machine learning techniques.
- prediction of the traffic volume at different hours and interchanges for the two coming weekends.
- prediction of the traffic congestion and expected travel time at different hours from different freeway interchanges.
Intuitive ways of visualizing the predictive results to help drivers make more informed decisions.
- define a mathematically simple yet effective traffic congestion index, based on traffic volume and average vehicle speed, that intuitively reflects how congested the road is.
- the congestion index is on a scale from 0 to 10, with 0 meaning no traffic and 10 indicating heavy congestion.
Soft traffic control policies that integrate nicely with the software components.
- require drivers to reserve an electronic pass (E-PASS henceforward) to use Freeway 5 through the Hsuehshan Tunnel.
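
To make the congestion index in the second component more concrete, here is a minimal Python sketch of one way a 0-to-10 index could combine traffic volume and average speed. The formula, the function name `congestion_index`, and the constants (`capacity`, `free_flow_kmh`, `weight`) are all assumptions for illustration; they are not the index we actually defined in the project.

```python
def congestion_index(volume, avg_speed_kmh, capacity=2000.0,
                     free_flow_kmh=90.0, weight=0.5):
    """Hypothetical 0-10 congestion index (illustrative formula only).

    volume        -- vehicles per hour observed on the segment
    avg_speed_kmh -- average vehicle speed on the segment
    capacity, free_flow_kmh, weight -- assumed tuning constants
    """
    if volume <= 0:
        return 0.0  # no vehicles on the road means no traffic
    # Share of the assumed road capacity in use, capped at 1.
    load = min(volume / capacity, 1.0)
    # How far the average speed has dropped below free flow, clamped to [0, 1].
    slowdown = min(max(1.0 - avg_speed_kmh / free_flow_kmh, 0.0), 1.0)
    # Weighted blend of the two signals, scaled to 0-10.
    return round(10.0 * (weight * load + (1.0 - weight) * slowdown), 1)


print(congestion_index(1000, 45))  # half loaded, half speed
```

Blending the two signals keeps the index simple while letting either a full road or a slow road push the value up; a real index would need its constants calibrated against observed data.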

Here we describe a real usage scenario of our solution from the perspective of a prospective road user. Upon launching the software, the user is prompted to enter information such as vehicle licence plate ID, name, and citizenship ID to reserve an E-PASS for driving through the Hsuehshan Tunnel. After the information is verified, the user is presented with simple bar charts showing the predicted traffic volume at different times and interchanges on Freeway 5 for the two coming weekends. The same bar charts show whether E-PASSes are still available for the user's desired travel time; if not, the user is asked to book an E-PASS for a different time, date, or interchange. Our proposed policy is to limit the number of E-PASSes for several rush hours. Drivers who use the road without an E-PASS reserved for that time will incur an additional toll fee. A "History" page in our software shows records of the user's past E-PASS applications.
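
The reservation flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `EPassBook`, the slot format `(date, hour, interchange)`, the quota, and the toll amounts are all hypothetical and do not reflect our actual implementation or any official toll schedule.

```python
from collections import defaultdict


class EPassBook:
    """Hypothetical in-memory E-PASS ledger with a fixed quota per slot."""

    def __init__(self, quota_per_slot):
        self.quota = quota_per_slot
        # Maps a slot, e.g. (date, hour, interchange), to the reserved plates.
        self.reserved = defaultdict(set)

    def available(self, slot):
        """Number of E-PASSes still free for this slot."""
        return self.quota - len(self.reserved.get(slot, ()))

    def reserve(self, plate, slot):
        """Reserve a pass; return False when the slot is already full."""
        if plate in self.reserved.get(slot, ()):
            return True  # this vehicle already holds a pass for the slot
        if self.available(slot) <= 0:
            return False  # caller should offer another time or interchange
        self.reserved[slot].add(plate)
        return True

    def toll(self, plate, slot, base_toll, surcharge):
        """Drivers without a pass for the slot pay the additional fee."""
        has_pass = plate in self.reserved.get(slot, ())
        return base_toll if has_pass else base_toll + surcharge


book = EPassBook(quota_per_slot=2)          # assumed tiny quota for the demo
slot = ("2015-11-21", 9, "Pinglin")         # example slot value
print(book.reserve("ABC-1234", slot), book.available(slot))
```

When `reserve` returns `False`, the UI would prompt the user to pick a different time, date, or interchange, as in the scenario above.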

Apart from the need to reserve an E-PASS at rush hours, a separate page shows the user the predicted traffic congestion (via a congestion index we defined) at different hours of the current day, as well as the predicted travel time from different interchanges to either end of Freeway 5 through the Hsuehshan Tunnel. For hours earlier than the current time of day, we automatically update the displayed congestion index values once every hour with values calculated from data collected in real time from the online repository of Taiwan's National Freeway Bureau. This can be especially useful when the user is planning a return trip from Yilan back to Taipei and does not have a pre-set leaving time or interchange in mind.

Finally, our solution integrates with Google Maps and propagates the user's reserved E-PASS information into the navigation system. Currently this only allows the navigation to suggest a route that avoids Freeway 5 until the user's reserved interchange. We envision that navigation systems (either mobile-based or vehicle-embedded) will become capable of using such information to plan a route between an arbitrary origin and destination that gets on Freeway 5 at the reserved interchange.

How we built it

We briefly describe the technology behind the most crucial components forming the backbone of our solution, as well as the resources that we have used.

Predictive analytics

We built our predictive models using several machine learning libraries in Python. (Most of these libraries are implemented in other languages such as C++ for efficiency, but they provide convenient Python bindings.) The learning algorithm we used most is a decision tree-based ensemble method called gradient boosting, which compares favorably with, and often outperforms, many more widely used methods such as SVMs and random forests. The technique itself is familiar to many machine learning and data science practitioners. Our sophistication lies instead in the smooth combination of effective data preprocessing, feature engineering, and a principled approach to model validation and hyperparameter tuning.
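
For readers unfamiliar with gradient boosting, the following is a minimal, dependency-free Python sketch of the idea for squared-error regression: each round fits a one-split decision tree (a "stump") to the current residuals and adds a shrunken copy of it to the ensemble. It is a teaching sketch, not our production pipeline, which relied on existing libraries; the function names, the toy features (hour of day, weekend flag), and the synthetic travel times are all made up for illustration.

```python
def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: constant
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def predict_stump(stump, row):
    feature, thr, left_value, right_value = stump
    return left_value if row[feature] <= thr else right_value


def fit_gradient_boosting(rows, targets, n_rounds=100, learning_rate=0.1):
    base = sum(targets) / len(targets)  # start from the mean prediction
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Synthetic toy data (NOT project data): [hour of day, is_weekend] -> minutes.
rows = [[hour, weekend] for hour in range(0, 24, 2) for weekend in (0, 1)]
minutes = [30 + (15 if 8 <= h <= 18 else 0) + (10 if w else 0) for h, w in rows]
model = fit_gradient_boosting(rows, minutes)
print(round(predict(model, [12, 1]), 1), round(predict(model, [2, 0]), 1))
```

In practice the number of rounds, the learning rate, and the tree depth are hyperparameters that would be tuned on validation data rather than fixed as here.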

In addition to the data provided on the official hackathon website, we made use of the historical traffic data from Taiwan's National Freeway Bureau. In particular, we processed the raw trip data to produce the travel time data needed for our predictive modelling. We also incorporated weather information (obtained from Taiwan's Central Weather Bureau Open Data Platform), such as the precipitation amounts in Yilan and Taipei in 2014 and 2015 (up to October), when constructing our predictive models. In addition, we run a server that pulls traffic data from the National Freeway Bureau's online repository every five minutes, parses and preprocesses the data, performs the calculations, and updates the relevant information in our software application every hour.
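
As an illustration of turning raw trip records into travel time data, the sketch below groups trips by the hour in which they entered the freeway and takes the median duration per hour. The record format (pairs of ISO 8601 entry and exit timestamps) and the function name `hourly_travel_minutes` are assumptions made for this example; the Bureau's actual data layout is different and needs more cleaning than shown here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def hourly_travel_minutes(trips):
    """Median travel time in minutes, keyed by the hour of freeway entry.

    trips -- iterable of (entry_iso, exit_iso) timestamp strings
             (a hypothetical, simplified record format)
    """
    buckets = defaultdict(list)
    for entry_iso, exit_iso in trips:
        entered = datetime.fromisoformat(entry_iso)
        left = datetime.fromisoformat(exit_iso)
        minutes = (left - entered).total_seconds() / 60.0
        if minutes <= 0:
            continue  # drop inconsistent records (exit not after entry)
        hour = entered.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(minutes)
    return {hour: median(values) for hour, values in sorted(buckets.items())}


sample = [
    ("2015-10-03T08:05:00", "2015-10-03T08:35:00"),
    ("2015-10-03T08:40:00", "2015-10-03T09:20:00"),
    ("2015-10-03T09:10:00", "2015-10-03T09:05:00"),  # bad record, skipped
]
print(hourly_travel_minutes(sample))
```

The median is used here because it is less sensitive than the mean to the occasional erroneous or extreme trip record.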

User interface (UI)

On the user interface front, we developed a simple, easy-to-use UI in C# with several Microsoft development APIs. Using these APIs, our software displays the results from our predictive models and gives users an intuitive visualization of the predicted traffic situations of interest. We also used Google's API to embed Google Maps into our software for possible future integration with travel navigation systems.

Challenges we ran into

In the course of this hackathon project, we have encountered numerous challenges that are common in most software and data science projects. Without presenting all the technical detail, we name a few particular challenges:

Clichéd as it sounds, the most important challenge, as in almost every project, was probably the very first step: coming up with a solution that draws upon our central idea (e.g., data-driven decision support), helps mitigate the target traffic problem, and is implementable within the short timeframe of this hackathon.
Many challenges are associated with handling data: errors and inconsistencies in the historical data from Taiwan's National Freeway Bureau; how to make the best use of the available data; and experimentation with various statistical, predictive modelling, and machine learning techniques. Moreover, the timeliness of data arrival significantly affects which features we are able to implement.
On the software development side, piecing together all the individual components presented many challenges, as always. For example, loading the predictive results produced by an underlying model often gave a visual effect different from what we had hoped for. Additionally, setting up the automated data collection and calculation tasks required a lot of care, as we have interleaved jobs that repeat every 5 minutes, every hour, and every day.

Accomplishments that we are proud of

Within the short period of this hackathon, we were able to build predictive models with respectable accuracy. We held out a portion of the data (from October 28th to November 13th, inclusive) and excluded it from model building. On this hold-out dataset of about sixteen days, the mean absolute error of our travel time prediction is 4 minutes and 9 seconds, translating to roughly 12% relative error. We are particularly proud of our implementation of the entire machine learning pipeline, which runs from the very first step of data gathering and cleaning all the way to deployment of the predictive models.
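
For clarity on how such error figures can be computed, here is a short Python sketch of the mean absolute error and one common way of expressing it as a relative error (MAE divided by the mean actual travel time). That definition of relative error is an assumption made for illustration, and the sample numbers are invented rather than taken from our hold-out set.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    """MAE as a fraction of the mean actual value (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    return mean_absolute_error(actual, predicted) / mean_actual


# Invented travel times in minutes, for illustration only.
actual = [30, 40, 50]
predicted = [33, 36, 55]
print(mean_absolute_error(actual, predicted))  # absolute errors 3, 4, 5
print(relative_error(actual, predicted))
```

Under this assumed definition, an MAE of 4 minutes 9 seconds at roughly 12% relative error would correspond to a mean travel time of about 35 minutes.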

Moreover, we achieved a clean integration of the data science component with an intuitive UI display. Many existing software applications share the common problem of presenting too much information: although the information is abundant, users are left to work out which parts are actually helpful to them. Our solution, by contrast, consistently presents what we consider the most useful piece of predictive knowledge in a fashion that is straightforward and easy to digest.

One rarely mentioned and often underappreciated accomplishment is the initial process of transforming an abstract idea into a solid design of a working software solution. We are proud of our early-stage brainstorming process which, beyond the technical aspects, considered the problem from a user-oriented perspective and specifically targeted the traffic problem. Additionally, we invited several interested friends to join our early thought exercises, resulting in a more well-rounded final solution.

What we learned

We have learned from essentially every technological piece involved in the making of this project. Instead of listing all the technical frameworks employed, we here discuss the more general knowledge we learned about freeway usage and driver behaviour that was somewhat unexpected to us.

First and foremost, our survey results suggest that most people using Freeway 5 have little tolerance for wasting time in traffic jams, yet on weekends (especially long weekends, when a public holiday falls on a Friday or Monday) road users spend an enormous amount of time stuck in traffic on Freeway 5. This points to a need for more predictive information on the user side. If prospective road users can receive accurate predictive information before they even plan their trips, they may be able to change the time they get on the freeway or even plan a completely different trip that does not use Freeway 5.

Another important finding from the survey is that a small increase in freeway tolls may have a surprisingly large effect in lowering people's willingness to use freeways. Of our survey respondents, 62 percent would consider not using Freeway 5 if a toll of NTD$100 were in place for vehicles travelling through the Hsuehshan Tunnel. This proportion increases to 77 percent if the toll is NTD$150. Coincidentally, the Taiwanese government recently implemented a traffic policy, in effect during the Dragon Boat holidays, that slightly increased the toll fee for Freeway 5. This increase was later found to result in a disproportionately large reduction in the number of vehicles on the road during the holidays. This is consistent with our survey finding and led us to consider the associated traffic policy of charging an additional toll to drivers without E-PASSes.

What's next for E-PASS (宜-PASS)

We see several distinct directions for future development. First, on the side of data analytics and software development we have only touched the tip of the iceberg of data science in this short two-month period. We outline some possible future directions.

Experiment with a lot more techniques, methods, data processing pipelines, etc. to try to improve the predictive accuracy: feature engineering, comparing more models, blending different models' outputs, etc.
Periodically re-train our models as sufficient new data arrives (e.g., once every month) so that our predictive models adapt to drivers' evolving behaviour. This is rather straightforward to implement but may bring significant improvement, as we expect drivers' behaviour to change in response to governmental policies and to the growing population of our software users.
Make more timely (e.g., real-time) predictions if data arrives fast enough, or at least adjust and calibrate our prediction results according to the real-time data.
Employ a distributed computing framework (e.g., Spark or Hadoop) to process data efficiently as the data volume grows.
Integrate more tightly with existing apps or vehicle-embedded systems, e.g., Google Maps and navigation systems.
Integrate with Google Calendar so that the E-PASS reservation information will be added to the user's calendar.

Another direction for future work is to design more effective traffic policies with the aid of our predictive results:

Control the number of E-PASSes for different times based on our prediction of traffic volume.
Adaptively move the current High Occupancy Vehicle hours according to our traffic estimates.