Goals
Uber has a large number of drivers who sign up on the platform, but many of them never complete their first ride. The goal of our project is to identify which factors are significant in determining whether a driver who signs up actually ends up driving for Uber.
EDA
From our EDA, we found that a large number of potential drivers did not complete the background check or register a vehicle on the platform. Over 20,000 signed-up drivers never completed the background check and therefore never completed a first ride. Similarly, another 20,000 who did complete the background check never registered a vehicle with Uber. Of the remaining 13,000 candidates, only approximately 6,000 completed their first ride.
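The stage-by-stage drop-off above can be counted with a few boolean masks. The sketch below is a minimal pandas version under stated assumptions: the column names (`signup_date`, `bgc_date`, `vehicle_added_date`, `first_completed_date`) are hypothetical stand-ins for the dataset's fields, each date is assumed to be missing when a stage was never reached, the stages are treated as nested (a ride only counts after a background check and a vehicle), and the toy rows are invented.

```python
import pandas as pd


def onboarding_funnel(df: pd.DataFrame) -> pd.Series:
    """Count how many signups remain at each onboarding stage.

    Column names are hypothetical; a missing date means the stage was not reached.
    """
    bgc_done = df["bgc_date"].notna()
    vehicle_done = bgc_done & df["vehicle_added_date"].notna()
    drove = vehicle_done & df["first_completed_date"].notna()
    return pd.Series({
        "signed_up": len(df),
        "background_check": int(bgc_done.sum()),
        "vehicle_added": int(vehicle_done.sum()),
        "first_ride": int(drove.sum()),
    })


# Invented toy data: five signups on the same day.
toy = pd.DataFrame({
    "signup_date": pd.to_datetime(["2016-01-01"] * 5),
    "bgc_date": pd.to_datetime([None, "2016-01-03", "2016-01-04", "2016-01-02", "2016-01-20"]),
    "vehicle_added_date": pd.to_datetime([None, None, "2016-01-06", "2016-01-05", "2016-01-25"]),
    "first_completed_date": pd.to_datetime([None, None, None, "2016-01-09", None]),
})

funnel = onboarding_funnel(toy)
print(funnel)
```

Taking `funnel.diff()` on the result gives the number lost at each stage, which corresponds to the drops described above.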
To investigate these drop-offs at each stage, we looked at the time from signup to background check completion and to vehicle registration. Those who completed their first ride tended to finish the background check promptly, almost all within the first 10 days of signing up. Those who did not become drivers tended to have longer gaps between signup and background check completion. We saw a similar pattern in the time from signup to vehicle registration.
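A minimal sketch of the timing comparison, assuming the same hypothetical date columns: compute the days from signup to a stage, then compare the median gap and the share finishing within 10 days for drivers versus non-drivers. Users who never reached the stage are excluded from the timing summary, and the toy rows are invented for illustration.

```python
import pandas as pd


def days_to_stage(df: pd.DataFrame, stage_col: str) -> pd.Series:
    """Days from signup to a later stage; NaN if the stage was never reached."""
    return (df[stage_col] - df["signup_date"]).dt.days


def timing_by_outcome(df: pd.DataFrame, stage_col: str) -> pd.DataFrame:
    """Median gap and share within 10 days, split by whether the user drove."""
    gaps = pd.DataFrame({
        "group": df["first_completed_date"].notna().map({True: "driver", False: "non-driver"}),
        "days": days_to_stage(df, stage_col),
    }).dropna(subset=["days"])  # keep only users who actually reached the stage
    return gaps.groupby("group")["days"].agg(
        median_days="median",
        share_within_10=lambda s: (s <= 10).mean(),
    )


# Invented toy data: three drivers and four non-drivers (one never finished the check).
toy = pd.DataFrame({
    "signup_date": pd.to_datetime(["2016-01-01"] * 7),
    "bgc_date": pd.to_datetime([
        "2016-01-03", "2016-01-04", "2016-01-06",          # drivers
        "2016-01-02", "2016-01-16", "2016-01-31", None,    # non-drivers
    ]),
    "first_completed_date": pd.to_datetime([
        "2016-01-10", "2016-01-12", "2016-01-15", None, None, None, None,
    ]),
})

stats = timing_by_outcome(toy, "bgc_date")
print(stats)
```

The same call with a hypothetical `vehicle_added_date` column covers the signup-to-registration comparison.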
Additionally, we found that while most signups came through ads/paid promotions, only 4.7% of those who signed up through those channels ended up driving, compared with 20.1% of those who came through referrals. So while paid promotions bring in many signups, very few of those users follow through; referrals are far more effective at turning signups into active drivers.
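Conversion by channel is a group-by over the binary label. In this sketch, the `signup_channel` column, its values (`Paid`, `Referral`, `Organic`), and the toy rows are all hypothetical. It shows the kind of computation behind per-channel rates like the ones above, not the project's exact code.

```python
import pandas as pd


def conversion_by_channel(df: pd.DataFrame) -> pd.DataFrame:
    """Signup count and first-ride conversion rate per (hypothetical) signup channel."""
    drove = df["first_completed_date"].notna().astype(int)  # 1 = completed a first ride
    return (
        drove.groupby(df["signup_channel"])
        .agg(signups="size", conversion_rate="mean")
        .sort_values("conversion_rate", ascending=False)
    )


# Invented toy data.
toy = pd.DataFrame({
    "signup_channel": ["Paid", "Paid", "Paid", "Paid", "Referral", "Referral", "Organic"],
    "first_completed_date": pd.to_datetime(
        [None, None, None, "2016-01-20", "2016-01-12", None, None]
    ),
})

rates = conversion_by_channel(toy)
print(rates)
```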
The other area of interest was the make, model, and year of the submitted vehicle, since approximately half of the users who submitted a vehicle did not complete their first ride. Most submitted vehicles were from 2000 or newer, with some older. Owners of vehicles from before 2000 almost never became drivers, which aligns with Uber's requirement that vehicles be no more than 16 years old. To investigate whether owners of specific types of vehicles were more likely to complete their first ride, we sought to bring in data on vehicle specs (type, number of doors, etc.); however, we were unable to finish this analysis.
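The vehicle-year comparison can be sketched the same way: split submitted vehicles at the year 2000 and compute the share of owners in each bucket who completed a first ride. The `vehicle_year` column, the `cutoff_year` parameter, and the toy rows are hypothetical.

```python
import pandas as pd


def conversion_by_vehicle_era(df: pd.DataFrame, cutoff_year: int = 2000) -> pd.Series:
    """First-ride rate for vehicles before vs. from `cutoff_year` (hypothetical columns)."""
    with_vehicle = df.dropna(subset=["vehicle_year"])  # only users who submitted a vehicle
    era = (with_vehicle["vehicle_year"] >= cutoff_year).map(
        {True: f"{cutoff_year}+", False: f"pre-{cutoff_year}"}
    )
    drove = with_vehicle["first_completed_date"].notna().astype(int)
    return drove.groupby(era).mean()


# Invented toy data; the last user never submitted a vehicle.
toy = pd.DataFrame({
    "vehicle_year": [1995, 1998, 2005, 2010, 2015, None],
    "first_completed_date": pd.to_datetime(
        [None, None, "2016-02-01", None, "2016-02-03", None]
    ),
})

era_rates = conversion_by_vehicle_era(toy)
print(era_rates)
```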
Data Processing
For data preprocessing, we created multiple features from the onboarding stages (signup, background check, vehicle added). These included the number of days from the signup date to the background check date and to the vehicle-added date, along with indicators for whether the user had completed a background check or added a vehicle. Signup channel, OS, and vehicle make and model were one-hot encoded. The label was a binary indicator that was positive if the user had completed a ride. For imputation, we filled missing numerical values with the median, for example the number of days between the signup date and the other stage dates.
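The preprocessing steps above map naturally onto a scikit-learn `ColumnTransformer`. This is a minimal sketch, not the project's code: the column names and toy rows are hypothetical, and imputing missing categorical values with the most frequent category is borrowed from the logistic regression description that follows. `sparse_threshold=0` forces a dense output so the result is easy to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def add_stage_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive day gaps, stage indicators, and the label from hypothetical date columns."""
    out = df.copy()
    out["days_to_bgc"] = (out["bgc_date"] - out["signup_date"]).dt.days
    out["days_to_vehicle"] = (out["vehicle_added_date"] - out["signup_date"]).dt.days
    out["has_bgc"] = out["bgc_date"].notna().astype(int)
    out["has_vehicle"] = out["vehicle_added_date"].notna().astype(int)
    out["drove"] = out["first_completed_date"].notna().astype(int)  # binary label
    return out


NUMERIC = ["days_to_bgc", "days_to_vehicle", "has_bgc", "has_vehicle"]
CATEGORICAL = ["signup_channel", "signup_os", "vehicle_make", "vehicle_model"]

preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OneHotEncoder(handle_unknown="ignore")), CATEGORICAL),
    ],
    sparse_threshold=0,  # always return a dense array; unlisted columns are dropped
)

# Invented toy data.
toy = pd.DataFrame({
    "signup_date": pd.to_datetime(["2016-01-01"] * 4),
    "bgc_date": pd.to_datetime(["2016-01-03", None, "2016-01-05", "2016-01-11"]),
    "vehicle_added_date": pd.to_datetime(["2016-01-04", None, None, "2016-01-12"]),
    "first_completed_date": pd.to_datetime(["2016-01-06", None, None, None]),
    "signup_channel": ["Referral", "Paid", "Paid", "Organic"],
    "signup_os": ["ios web", "android web", np.nan, "ios web"],
    "vehicle_make": ["Toyota", np.nan, np.nan, "Honda"],
    "vehicle_model": ["Prius", np.nan, np.nan, "Civic"],
})

features = add_stage_features(toy)
X = preprocess.fit_transform(features)
y = features["drove"].to_numpy()
print(X.shape, y)
```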
Logistic Regression
For this model, we replaced all dates with the number of days from the first drive date to each of the other dates. Missing numerical values were replaced with the median, and missing non-numeric values were replaced with the most common value. The model estimates the likelihood that a user completes a first ride given the categorical and numerical features. We ultimately achieved an accuracy of 0.935, an area under the precision-recall curve of 0.717, and an ROC-AUC of 0.945.
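The following sketch shows one way to compute the three reported metrics with scikit-learn. It trains on synthetic data from `make_classification` with roughly 11% positives as a stand-in for the real features, so its numbers will not match those above. It uses `average_precision_score` as the estimate of the area under the precision-recall curve. That is one common choice; the project may have computed it differently, for example with trapezoidal `auc` over the curve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: about 89% negatives, mirroring the class imbalance.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, weights=[0.89], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of becoming a driver

metrics = {
    "accuracy": accuracy_score(y_test, model.predict(X_test)),
    "pr_auc": average_precision_score(y_test, proba),  # PR-curve summary
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
```

On imbalanced data, accuracy alone can look high even for a weak model, which is why the PR-curve summary is worth reporting alongside it.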
Decision Tree
Based on the EDA, we concluded that the following features would be most useful: vehicle year, days between signup date and background check date, days between signup date and vehicle added date, signup channel, and signup OS.
Since our data is imbalanced, with only ~11% of users becoming drivers, we chose precision to measure the performance of our model. We tried different values for max depth and found that a max depth of 4 achieved the highest precision, 0.7099, with an accuracy of 0.9364.
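The max-depth search can be sketched as a cross-validated sweep scored by precision. Synthetic data again stands in for the five selected features (all numeric here, whereas channel and OS would need encoding), so the best depth it finds need not be 4. Precision is computed with `zero_division=0`, so a tree that predicts no positives in a fold scores 0 instead of raising a warning.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the five selected features, with ~11% positives.
X, y = make_classification(
    n_samples=3000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.89], random_state=0,
)
precision = make_scorer(precision_score, zero_division=0)

# Mean 5-fold (stratified) precision for each candidate depth.
scores = {}
for depth in range(1, 9):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores[depth] = cross_val_score(tree, X, y, cv=5, scoring=precision).mean()

best_depth = max(scores, key=scores.get)
print(best_depth, round(scores[best_depth], 4))
```

Because precision alone can be raised by a model that predicts the positive class rarely, it is worth checking recall for the chosen depth as well.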
Findings
Our analysis highlights the importance of timeliness in the Uber driver onboarding process. The time between signup and the later onboarding stages was indicative of whether a user drove. For both drivers and non-drivers, most completed the background check soon after signing up, but many non-drivers had larger gaps between signup and background check completion. A quick turnaround from signup to background check completion could help ensure that users end up driving for Uber.
Additionally, our analysis of signup channels suggests that paid promotions are far less effective than referrals at getting drivers to complete their first ride. To take advantage of this, it may be effective to incentivize current drivers to refer their friends.