Bike sharing programs can introduce new people to bicycle commuting by providing fun, safe, and secure bikes. Bike sharing can encourage new demographics (those who wouldn’t normally ride a bike) to start using bikes for transportation. In this project, we introduce a new bike-sharing company in the city of Austin and a strategy for it to succeed. We start by putting ourselves in the shoes of the company's top management and brainstorming the key attributes we need to know in order to be successful. The first is the demand at each kiosk. The second is an estimate of how many people will travel from each kiosk to the other kiosks at a particular time of day. If we can estimate these values, we can answer the subsequent questions that are necessary for ABC (Austin B-cycle) to become successful in Austin.
What it does
Given the checkout and return times at each kiosk, trip durations, and the GPS locations of the kiosks, our model forecasts the demand at each kiosk for the next 24 hours and also gives the transition probability matrix between nodes (kiosks) at any given hour. These outputs can help us with:
- Deciding how to manage this demand
- Determining the overall profitability of the network
- Selecting new hot-spot stops
- Introducing a new pricing strategy for nodes with low demand at a particular hour of the day
How we built it
Before forecasting the demand, let's visualize the data and check whether we can draw some inferences. Because this is time-series data, it is useful to check for trend and seasonality. Figure 1 shows that the dataset does have both. Trip counts have increased steadily over the years, which supports our hypothesis of a trend. Trip counts also rise in March and October, which can be attributed to SXSW, held every March, and ACL, held every October. Note that in 2018 the trip count increased manyfold because the number of B-cycles and kiosks was increased by almost 1.5x; one possible reason for that expansion is an increase in demand for B-cycles. At a finer granularity, Figure 2 shows that demand for ABC is highest from 11 am to 3 pm, which makes intuitive sense because that is when students go to their classes.
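The trend and seasonality checks behind Figures 1 and 2 come down to counting trips per month and per hour of day. A minimal stdlib sketch, assuming each trip has been reduced to its checkout timestamp; the sample timestamps below are made up for illustration and are not from the ABC dataset:

```python
from collections import Counter
from datetime import datetime

# Hypothetical checkout timestamps (illustration only, not real ABC data).
checkouts = [
    datetime(2018, 3, 10, 11, 15),
    datetime(2018, 3, 10, 12, 40),
    datetime(2018, 3, 11, 12, 5),
    datetime(2018, 10, 6, 13, 30),
    datetime(2018, 10, 6, 20, 10),
]

def trips_by(key, timestamps):
    """Count trips per value of key(timestamp), e.g. per hour of day or per month."""
    return Counter(key(ts) for ts in timestamps)

hourly = trips_by(lambda ts: ts.hour, checkouts)               # hour-of-day profile (Figure 2 style)
monthly = trips_by(lambda ts: (ts.year, ts.month), checkouts)  # monthly series (Figure 1 style)

peak_hour, peak_count = hourly.most_common(1)[0]
print(peak_hour, peak_count)
```

On the full dataset, plotting `monthly` in date order would show the trend and the March/October bumps, and `hourly` would show the midday peak.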
Because the data show trend and seasonality, ARIMA modeling is a natural choice for forecasting. However, this approach has a bottleneck: we have data for 95 kiosks, and building a time-series model for each of the 95 kiosks was not feasible. We therefore used the Random Forest algorithm, which, although it does not take into account the temporal dependence in the dataset, gives satisfactory results that can be used for further analysis. The Random Forest results were then used to build the transition probability matrix (Figure 3). To build this matrix for a given day and hour, we used the data from the same day and hour over the past 4 weeks. For example, the matrix for Tuesday at 3:00 pm was built from the data for 3:00 pm on each of the past 4 Tuesdays.
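The same-weekday, same-hour windowing can be sketched as below. This only shows the empirical-frequency part (counting trips from i to j in the matching slots of the past 4 weeks and normalizing each row); it does not reproduce how the Random Forest output feeds into the matrix. The trip tuple layout, the function name `transition_matrix`, and the rule that a kiosk with no observed departures keeps its bikes (P_ii = 1) are all assumptions for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def transition_matrix(trips, kiosks, weekday, hour, ref_date, weeks=4):
    """Estimate P[i][j] for one weekday/hour slot from the `weeks` weeks before ref_date.

    trips: iterable of (start_kiosk, end_kiosk, checkout_datetime) tuples (hypothetical layout).
    weekday: 0 = Monday ... 6 = Sunday, as returned by datetime.weekday().
    """
    window_start = ref_date - timedelta(weeks=weeks)
    counts = defaultdict(lambda: defaultdict(int))
    for start, end, ts in trips:
        # Keep only trips in the matching weekday and hour within the look-back window.
        if window_start <= ts < ref_date and ts.weekday() == weekday and ts.hour == hour:
            counts[start][end] += 1

    matrix = {}
    for i in kiosks:
        total = sum(counts[i].values())
        if total == 0:
            # Assumption: with no observed departures, bikes are treated as staying at i.
            matrix[i] = {j: 1.0 if j == i else 0.0 for j in kiosks}
        else:
            matrix[i] = {j: counts[i][j] / total for j in kiosks}
    return matrix

# Tuesday 3:00 pm matrix built from the 4 Tuesdays before 2019-04-30 (made-up trips).
trips = [
    ("A", "B", datetime(2019, 4, 23, 15, 10)),
    ("A", "B", datetime(2019, 4, 16, 15, 45)),
    ("A", "C", datetime(2019, 4, 9, 15, 5)),
    ("A", "A", datetime(2019, 4, 2, 15, 30)),
    ("B", "C", datetime(2019, 4, 23, 15, 20)),
    ("A", "C", datetime(2019, 3, 26, 15, 0)),   # 5 weeks back: outside the window
    ("A", "C", datetime(2019, 4, 23, 16, 0)),   # 4:00 pm: wrong hour
]
P = transition_matrix(trips, ["A", "B", "C"], weekday=1, hour=15,
                      ref_date=datetime(2019, 4, 30))
print(P["A"])
```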
The transition probability matrix P gives the probability P_ij that a bike starting at node i will end up at node j in the next hour. It therefore tells us how the bikes at node i will be distributed across the network by the end of the next hour.
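In matrix terms, if x holds the number of bikes at each node now, the expected number at node j after one hour is the sum over i of x_i P_ij (the row vector x multiplied by P). A minimal sketch, assuming each bike moves independently according to P; the 3-node matrix and bike counts are made up:

```python
def next_hour_distribution(bikes, P):
    """Expected bikes at each node after one hour: y[j] = sum_i bikes[i] * P[i][j]."""
    n = len(bikes)
    return [sum(bikes[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-node example; each row of P sums to 1.
P = [
    [0.5, 0.25, 0.25],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
bikes = [8, 4, 2]
print(next_hour_distribution(bikes, P))  # → [4.0, 4.0, 6.0]
```

The total number of bikes (14) is preserved because every row of P sums to 1; only their expected locations change.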
Using the demand forecast and the transition probability matrix, we can address key questions such as:
What should the relocation strategy be? If we know ahead of time that the demand at node j will be high at 3:00 pm on a Tuesday, we can make sure that enough bikes are available at node j to satisfy that demand. This can be done manually, but also through dynamic pricing: by reducing prices at lower-demand kiosks, we increase the likelihood that a customer picks up a bike there and drops it off at a kiosk where high demand is forecast.
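One simple way to turn the two model outputs into a relocation signal is to compare, for each kiosk, the expected number of bikes (for example, current inventory pushed through P) with the forecast demand. This is a sketch under stated assumptions: the function name `relocation_plan`, the dictionary inputs, and the numbers are hypothetical, and a real plan would also account for docking capacity and costs:

```python
def relocation_plan(expected_bikes, forecast_demand):
    """Split kiosks into shortfalls (need bikes) and surpluses (have spare bikes)."""
    shortfall = {k: forecast_demand[k] - expected_bikes[k]
                 for k in forecast_demand if forecast_demand[k] > expected_bikes[k]}
    surplus = {k: expected_bikes[k] - forecast_demand[k]
               for k in forecast_demand if expected_bikes[k] > forecast_demand[k]}
    return shortfall, surplus

# Hypothetical inputs: expected bikes after the next hour vs. forecast demand.
expected_bikes = {"A": 4.0, "B": 4.0, "C": 6.0}
forecast_demand = {"A": 2, "B": 7, "C": 5}
shortfall, surplus = relocation_plan(expected_bikes, forecast_demand)
print(shortfall, surplus)
```

Kiosks in `surplus` are natural candidates for the reduced prices described above, while kiosks in `shortfall` are where bikes need to arrive, either by manual rebalancing or by customers responding to those discounts.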
What should the pricing be, and what is the total profitability of the network? If we know the holding costs, the transportation costs of bicycles, and so on, we can calculate the cost associated with each kiosk, then optimize to maximize revenue and decide on a pricing strategy. We can also use the stationary distribution of P to calculate what our optimal distribution of cycles should be.
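The stationary distribution is the vector pi satisfying pi = pi P with entries summing to 1, i.e. the long-run share of bikes at each node if they kept moving according to P. A minimal power-iteration sketch; it assumes P is irreducible and aperiodic (otherwise the iteration may not converge and simply stops at `max_iter`), and the 2-node matrix is made up:

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeatedly apply pi <- pi P until the change falls below tol."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 2-node chain; the exact answer is [5/6, 1/6].
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = stationary_distribution(P)
fleet = 60
print([round(fleet * p, 6) for p in pi])
```

Multiplying pi by the fleet size gives a long-run target count of bikes per kiosk (about 50 and 10 here), which can serve as a baseline when deciding how to distribute cycles.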
Accomplishments that we're proud of
We are proud that we were able to apply all the concepts we are currently being taught in our classes. In the limited time available, we understood the data, came up with a unique solution, and implemented it, which was highly satisfying.
What's next for Dynamic Solutions
Next, we would like to improve upon the assumptions that we made:
- The demand at any kiosk at a given hour also depends on the demand in previous hours, and we should take that into account.
- Applying exponential smoothing to the weights given to the past 4 days' data and the past few hours' data.
- Building a time-series model (ARIMA) that gives the demand forecast for all the kiosks.
- Designing a coupon system that rewards customers who use a bicycle from a low-demand kiosk.
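The exponential-smoothing item could weight recent observations more heavily than older ones instead of treating them alike. A minimal sketch of an exponentially weighted average over the same hourly slot in previous periods; the function name, the smoothing factor `alpha=0.5`, and the sample counts are assumptions for illustration, not tuned values:

```python
def smoothed_estimate(observations, alpha=0.5):
    """Exponentially weighted average of past observations, ordered most recent first.

    The k-th most recent observation gets weight alpha * (1 - alpha) ** k,
    and the weights are normalized so they sum to 1.
    """
    weights = [alpha * (1 - alpha) ** k for k in range(len(observations))]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, observations)) / total

# Hypothetical trip counts at one kiosk for the same hour over the past 4 periods.
recent_first = [12, 10, 8, 6]
est = smoothed_estimate(recent_first)
print(round(est, 4))
```

With `alpha` close to 0 the weights become nearly equal and the estimate approaches the plain average (9.0 here); larger values of `alpha` make the estimate track the most recent observations more closely.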