What we tried to figure out
We want to know whether the taxi passenger flow in Manhattan is related to subway station features such as station location and subway passenger flow. The other question that interests us is whether the taxi passenger flow affects the number of vehicle collisions (TAXI involved) that happen at the same time.
What we built
We visualize these relationships by plotting the data on the city map.
TAXI Pick-up vs. Subway (PickUpMap.py)
- Data Time Period: 2011-MAY
- Title: The date & time
- Each Frame: The TAXI pick-ups and the passenger flow of each subway station for the current hour
- Symbol Info:
  - Blue point: a single TAXI pick-up that happened in the current hour
  - Orange circle: the size of the circle indicates the passenger flow of this subway station when it was NOT RAINY
  - Purple circle: the size of the circle indicates the passenger flow of this subway station when it was RAINY
- Our intention is to build an application that helps taxi drivers learn where they are most likely to find customers in a given period of time. This gave us the idea of building an animation that shows the distribution of taxi pick-ups on the map over time. We were also interested in whether the number of subway exits increases the number of taxi pick-ups nearby. However, judging from the animation, the correlation between the two appears to be poor.
- We also had a heuristic hypothesis that rainy weather increases the number of TAXI pick-ups, since people might prefer to shelter in a cab rather than walk in the rain. However, in the animated pick-up distribution that we generated from the dataset of all Yellow Taxis in May 2011, we found very little correlation between taxi pick-ups and rain.
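The rainy versus dry comparison above was made by eye, and it could also be checked numerically. The following is a minimal sketch, not the code in PickUpMap.py: it bins pick-up timestamps by hour and compares the mean hourly count for rainy and dry hours. The timestamp format, the `rainy_hours` set, the function names, and the sample values are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; adjust to match the actual CSV.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def hourly_counts(timestamps):
    """Count events per hour bucket (a datetime truncated to the hour)."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, TIME_FORMAT)
        counts[t.replace(minute=0, second=0)] += 1
    return counts


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def mean_by_rain(counts, rainy_hours):
    """Return (mean pick-ups per rainy hour, mean pick-ups per dry hour)."""
    rainy = [n for hour, n in counts.items() if hour in rainy_hours]
    dry = [n for hour, n in counts.items() if hour not in rainy_hours]
    return _mean(rainy), _mean(dry)


# Tiny made-up sample, only to show the shape of the inputs.
sample_pickups = [
    "2011-05-01 08:05:00",
    "2011-05-01 08:40:00",
    "2011-05-01 09:15:00",
    "2011-05-01 10:01:00",
    "2011-05-01 10:30:00",
    "2011-05-01 10:59:00",
]
sample_rainy_hours = {datetime(2011, 5, 1, 10)}  # hypothetical rainy hour

counts = hourly_counts(sample_pickups)
rainy_mean, dry_mean = mean_by_rain(counts, sample_rainy_hours)
print(rainy_mean, dry_mean)
```

Note that hours with no pick-ups at all never appear in `counts`, so they are left out of both means. A fairer comparison would also control for the hour of day, since rush hours dominate the pick-up counts regardless of weather.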
TAXI Pick-up vs. collision (collision_pickup_map.py)
- Data Time Period: 2016-MAY
- Title: The date & time
- Each Frame: The TAXI pick-ups and the collisions (TAXI involved) for the current hour
- Symbol Info:
  - Blue point: a single TAXI pick-up that happened in the current hour
  - Red point: a collision (TAXI involved) that happened in the current hour
- At first, we thought the number of vehicle collisions would increase as taxi pick-ups increase (since the vehicle flow increases). However, the animation suggests that the two are unrelated: when the taxi pick-ups increase, the number of collisions does not increase significantly. Collisions are also spread more or less uniformly across the New York City boroughs, while taxi pick-ups cluster in Manhattan. This is possibly due to a limitation of the dataset, since the taxi pick-up data covers only Yellow Cabs, which mainly operate in Manhattan and at the airports.
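One way to put a number on the "unrelated" observation above is a Pearson correlation between the hourly pick-up counts and the hourly collision counts. This is a sketch under stated assumptions, not the analysis in the notebook: the `pearson` helper is our own illustration, and the two lists are made-up values that only show the expected input shape.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; report 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up hourly counts, aligned so index i is the same hour in both lists.
hourly_pickups = [120, 340, 560, 610, 480, 200]
hourly_collisions = [2, 3, 2, 3, 2, 3]

r = pearson(hourly_pickups, hourly_collisions)
print(round(r, 3))
```

A value of `r` near 0 would support the visual impression, while a value near 1 or -1 would contradict it. Because pick-ups cluster in Manhattan, it may be more informative to restrict both series to the same area before comparing them.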
One plot analysis: TAXI Pick-up vs. collision (ColiisionAnalysis.ipynb)
What is still in progress
The analysis of the subway station information against the number of taxi pick-ups. (SubwayAnalysis.py)
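As a rough idea of what such an analysis could look like, the sketch below counts the pick-ups that fall within a fixed radius of each station, using the haversine distance. It is not the content of SubwayAnalysis.py. The station names, the coordinates, the 200 m radius, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def pickups_near_stations(stations, pickups, radius_m=200):
    """Map each station name to the number of pick-ups within radius_m."""
    return {
        name: sum(
            1
            for lat, lon in pickups
            if haversine_m(s_lat, s_lon, lat, lon) <= radius_m
        )
        for name, (s_lat, s_lon) in stations.items()
    }


# Hypothetical stations and pick-up points, for illustration only.
stations = {
    "Station A": (40.7527, -73.9772),
    "Station B": (40.7359, -73.9906),
}
pickups = [
    (40.7530, -73.9770),
    (40.7525, -73.9775),
    (40.7360, -73.9905),
    (40.8000, -73.9500),
]

nearby = pickups_near_stations(stations, pickups)
print(nearby)
```

This brute-force version compares every station with every pick-up, which is fine for a small sample but slow for a full month of trips. A grid or other spatial index would likely be needed at that scale.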
Challenges we ran into
Neither of us had any background in data analysis. The Manhattan taxi dataset is also too large for us to process efficiently.
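One common way to cope with a file that is too large to load at once is to stream it row by row and keep only the running aggregates. The sketch below does this with the standard `csv` module. It is an illustration rather than what we actually ran: the column name `pickup_datetime` and its `YYYY-MM-DD HH:MM:SS` layout are assumptions about the trip data, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def count_pickups_by_hour(lines, time_column="pickup_datetime"):
    """Stream CSV rows and count pick-ups per 'YYYY-MM-DD HH' bucket.

    `lines` is any iterable of text lines, such as an open file object,
    so only one row is held in memory at a time.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # The first 13 characters of 'YYYY-MM-DD HH:MM:SS' identify the hour.
        counts[row[time_column][:13]] += 1
    return counts


# Stand-in for open("yellow_tripdata_2011-05.csv", newline="").
# The column names here are assumed, not taken from the real file.
sample_csv = io.StringIO(
    "pickup_datetime,passenger_count\n"
    "2011-05-01 08:05:00,1\n"
    "2011-05-01 08:40:00,2\n"
    "2011-05-01 09:15:00,1\n"
)

hourly = count_pickups_by_hour(sample_csv)
print(hourly)
```

Memory use then depends on the number of distinct hours rather than on the number of trips, at the cost of one full pass over the file per aggregate.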
Accomplishments that we're proud of
Everything we have done.
Related datasets:
- nyc-subway-turnstile-and-weather.csv
- NYC-vehicle-collisions.csv
- yellow_tripdata_2011-05.csv
- yellow_tripdata_2016-05.csv