Challenge
"Hackers, we want you to work with Uber’s ride data. This dataset must be used. This challenge is intended to be a very open-ended challenge. We want you to explore the Uber ride data set and come up with new trends or find anomalies. You’re allowed to use other datasets to supplement this analysis. We’ve added some additional resources for datasets that you can use, but you’re not limited to only these resources. You can supplement it with whatever datasets that you find! After you finish your analysis, we want you to visualize it. Show us what you’ve found."
Inspiration
Taking a break from my Main Project, I wanted to see what sort of insights I could uncover using Uber's ride data. I had an awesome time exploring this dataset during a midnight coding binge. If given even more time, I would have really enjoyed taking a deeper dive into this analysis!
The Data
Due to data privacy concerns, many valuable features were dropped from the dataset. This left very little room for exploration. The methods I use in this project are great for working with data that can be described by a connected graph.
This dataset contains many data points but very few features to work with (~4 million rows and 7 features). The features are source id, destination id, hour of day, arithmetic mean travel time, geometric mean travel time, arithmetic standard deviation, and geometric standard deviation. Each arithmetic/geometric pair is heavily correlated, so it is only worth keeping one feature from each pair, which leaves 5 usable features for this analysis.
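As a rough illustration of that feature reduction, here is a minimal sketch in plain Python. The column names and the toy values are assumptions made up for this example, not taken from the actual dataset or notebook. It checks how strongly the arithmetic and geometric means correlate, then drops the geometric columns.

```python
from math import sqrt

# Toy rows: column names are hypothetical and the values are invented for illustration.
rows = [
    {"sourceid": 1, "dstid": 2, "hod": 8, "mean_travel_time": 600.0,
     "geometric_mean_travel_time": 570.0, "std_travel_time": 120.0,
     "geometric_std_travel_time": 1.30},
    {"sourceid": 1, "dstid": 3, "hod": 9, "mean_travel_time": 900.0,
     "geometric_mean_travel_time": 850.0, "std_travel_time": 200.0,
     "geometric_std_travel_time": 1.45},
    {"sourceid": 2, "dstid": 1, "hod": 17, "mean_travel_time": 300.0,
     "geometric_mean_travel_time": 290.0, "std_travel_time": 60.0,
     "geometric_std_travel_time": 1.20},
    {"sourceid": 3, "dstid": 1, "hod": 22, "mean_travel_time": 1200.0,
     "geometric_mean_travel_time": 1100.0, "std_travel_time": 310.0,
     "geometric_std_travel_time": 1.60},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Correlation between the arithmetic and geometric mean travel times.
r = pearson(
    [row["mean_travel_time"] for row in rows],
    [row["geometric_mean_travel_time"] for row in rows],
)

# Keep one feature from each correlated pair (here, the arithmetic versions).
REDUNDANT = ("geometric_mean_travel_time", "geometric_std_travel_time")
reduced = [{k: v for k, v in row.items() if k not in REDUNDANT} for row in rows]
```

Keeping the arithmetic columns rather than the geometric ones is an arbitrary choice in this sketch; either half of each pair would do.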
What I Did
• Anomaly detection using Uber ride data modeled as a Barabási-Albert directed graph. Its preferential attachment mechanism allowed me to identify relatively high-density areas.
• Estimate the probability of a destination given a source location by constructing an adjacency matrix. Because the normalized matrix is stochastic, it can also serve as the transition matrix of a random walk.
• Optimize ride times by outputting the optimal time of day to request an Uber given a rider's source and desired destination.
• Arbitrarily map the most lively areas using the most-traveled-to destinations for each hour of the day.
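Two of the ideas above, destination probabilities and the best hour to ride, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the notebook's actual code: the `trips` records, the zone labels, and the function names are hypothetical, and each record is weighted equally when counting, which may differ from how the real analysis weights source/destination pairs.

```python
from collections import defaultdict

# Hypothetical toy records: (source, destination, hour_of_day, mean_travel_time_seconds).
trips = [
    ("A", "B", 8, 900.0),
    ("A", "B", 14, 600.0),
    ("A", "C", 8, 700.0),
    ("B", "A", 9, 800.0),
    ("C", "A", 22, 500.0),
]


def transition_probabilities(records):
    """Row-normalize source -> destination counts into P(destination | source)."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst, _hour, _seconds in records:
        counts[src][dst] += 1  # assumption: every record counts equally
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


def best_hour(records, src, dst):
    """Return the hour with the lowest mean travel time for a pair, or None."""
    candidates = [(seconds, hour) for s, d, hour, seconds in records
                  if s == src and d == dst]
    if not candidates:
        return None
    return min(candidates)[1]


probs = transition_probabilities(trips)
hour_a_to_b = best_hour(trips, "A", "B")
```

Each row of `probs` sums to 1, which is what lets the same structure act as the transition table of a random walk: starting from a source, repeatedly sample the next zone from that source's row.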
The Code
Feel free to review the code here to get a better understanding of the methods used in my analysis!
Note: This project is an IPython Notebook, so it helps to have Jupyter Notebook installed on your machine when viewing it.
Built With
- jupyter-notebook
- python
