Inspiration
Current navigation applications give us little flexibility in choosing our routes. Most often they pick the most efficient one, masking other ways we might want to travel. For instance, if you're walking downtown, which areas might you want to avoid along the way? We decided to develop an application that gives users flexibility in choosing their routes, sourcing publicly available crime data to help them make informed decisions.
What it does
A user specifies their current location and where they want to go, then chooses either the most efficient path or the safest path. The resulting path is drawn on the map.
How we built it
- Data preprocessing - We downloaded data from Tucson Data Hub and processed it in Python using pandas, scikit-learn, seaborn and a handful of other data science libraries. We used KMeans and TfidfVectorizer to group crime descriptions into 10 clusters, identified what each cluster represented, and assigned weights based on the severity of the crimes to help with our routing. The modified data with these groupings was exported to be processed by our route-finding app.
- Route-finding web app - We used Python's Flask library to handle the backend of the site, pure JavaScript to build the map interface, and an open map API for route finding (vector files are downloaded by a script for quick access).
- For the safe route calculation, we take the coordinates from the modified data, which includes the crime categories, and apply a weight to each node based on the severity of the crime. However, values on the nodes themselves were less helpful for route calculation than values on edges, so we applied a calculation that spreads each node's value as a gradient along its edges. Edges along the most efficient path are then avoided depending on how high their values are.
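The node-to-edge weighting described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes that an edge's risk is the average of its two endpoint risks (the mean of a linear gradient between them) and that the routing cost is the edge length scaled by `1 + alpha * risk`. The names `edge_risk`, `build_costs`, `shortest_path`, and `alpha`, as well as the toy graph, are all hypothetical.

```python
import heapq


def edge_risk(node_risk, u, v):
    """Assumed rule: an edge's risk is the mean of its endpoint risks."""
    return (node_risk.get(u, 0.0) + node_risk.get(v, 0.0)) / 2.0


def build_costs(edges, node_risk, alpha):
    """Build an undirected adjacency map with cost = length * (1 + alpha * risk)."""
    graph = {}
    for u, v, length in edges:
        cost = length * (1.0 + alpha * edge_risk(node_risk, u, v))
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    return graph


def shortest_path(graph, start, goal):
    """Plain Dijkstra; returns the node list from start to goal, or [] if unreachable."""
    best = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(queue, (cand, nxt))
    if goal not in best:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy graph (hypothetical): A-B-D is shorter, but node B carries a high crime weight.
edges = [("A", "B", 1.0), ("B", "D", 1.0), ("A", "C", 1.5), ("C", "D", 1.5)]
node_risk = {"B": 5.0}

fastest = shortest_path(build_costs(edges, node_risk, alpha=0.0), "A", "D")
safest = shortest_path(build_costs(edges, node_risk, alpha=1.0), "A", "D")
```

With `alpha=0.0` the risk is ignored and the route goes through B. With `alpha=1.0` the two edges touching B become expensive enough that the route goes through C instead. Making `alpha` too large is one way a route could end up going around a whole area rather than through it, so it would need tuning.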
Challenges we ran into
- Categorizing crimes was difficult, as there were 800 unique descriptions to sort into 10 or fewer groups. We tried using some text search methods to categorize the data, but ended up with far too many items in the "other" category. We considered using a chatbot to do it, but the results were not ideal: it gave us a bunch of if/else statements. We decided to try a clustering algorithm (KMeans) combined with a vectorizer to create 10 distinct clusters (categories), and then interpreted those as crime groups. We used these groups in our weights to decide the safest route.
- Route finding was difficult, as values attached to nodes turned out to be less useful than values calculated on edges. We applied a gradient method to address this, placing values on the edges to assist with routing.
- The coordinate system used by OpenStreetMap was different from the one used by the Tucson Data Hub. The coordinates had to be shifted into alignment before the two could work together.
Accomplishments that we're proud of
- Using clustering to speed up our categorization of crimes. Our initial approach took longer and was more error-prone and subjective. Using KMeans to cluster vectorized text sped things up a lot.
- The pathfinding algorithm went through many iterations: solving issues with different data sources and coordinate systems, tuning the severity of crimes so that the path doesn't go around Tucson instead of through it, and so on.
What we learned
A ton of information about vectors! Whether we used them to classify text or find routes, they were a major part of most steps of the project.
What's next for Safe Route
More features! We'd like to give the user more options to customize their route, and add more data sources to base decisions on. We'd also like to integrate an agent through AWS that references this data as a knowledge base and summarizes for users exactly why certain areas are avoided.