Often, people get so busy with global news and activities that they overlook things happening in their own neighborhood and nearby cities. Hence, we wanted to build something that makes people aware of safety in their locality.
What it does
It scrapes local news articles, analyzes the type of crime, and extracts the location of each incident. Each identified crime is weighted and aggregated with the number of occurrences and casualties, producing a severity score (Sev_score) ranging from 0 to 1. The closer the score is to 1, the more unsafe the locality/city is classified as. We plot the results on a map with color-coded pins for each location with a reported crime or threat: red indicates a severe state (Sev_score from 0.8 to 1), orange indicates unsafe (0.6 to 0.79), yellow indicates moderate (0.3 to 0.59), and green indicates safe (0 to 0.29). The webpage is updated every 24 hours with the latest available local news.
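The score-to-color bands above can be sketched as a simple threshold function. This is a minimal illustration of the binning logic, not the project's actual code; the function name is hypothetical.

```python
def pin_color(sev_score: float) -> str:
    """Map a severity score in [0, 1] to a map-pin color.

    Thresholds follow the bands described in the text; the
    function name and structure are illustrative only.
    """
    if sev_score >= 0.8:
        return "red"      # severe
    elif sev_score >= 0.6:
        return "orange"   # unsafe
    elif sev_score >= 0.3:
        return "yellow"   # moderate
    else:
        return "green"    # safe
```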
How we built it
We wrote an NLP pipeline in a Jupyter notebook that scraped local news articles. We extracted each article's headline, used the Google Cloud Natural Language API to extract the location of the incident, and converted that location to latitude and longitude with the Geopy geocoders. We hand-coded a dictionary that mapped categorical crime topics to severity scores based on findings from our literature review. We ran sentiment analysis on the headline keywords and compared the major identified words against the dictionary using tools from the NLTK library. Our algorithm summed the severity scores of the major words in each news title and divided the total by the number of major words, which scaled every score into the range 0 to 1. We then used the Google Maps API to visualize our results with the appropriate color coding, and used React along with HTML/CSS to build and style the webpage.
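The dictionary-based scoring step described above can be sketched as follows. The dictionary entries, scores, and names here are illustrative placeholders (the real pipeline used a larger hand-built dictionary and NLTK tokenization), and a plain `split()` stands in for the actual keyword extraction.

```python
# Hypothetical excerpt of the hand-coded crime-severity dictionary;
# the actual categories and scores came from a literature review.
SEVERITY = {
    "murder": 1.0,
    "shooting": 0.9,
    "robbery": 0.7,
    "theft": 0.5,
    "vandalism": 0.3,
}

def sev_score(headline: str) -> float:
    """Average the dictionary scores of the crime keywords found in
    a headline, keeping the result in [0, 1]; 0.0 if none match."""
    words = headline.lower().split()
    hits = [SEVERITY[w] for w in words if w in SEVERITY]
    return sum(hits) / len(hits) if hits else 0.0
```

Dividing the summed scores by the number of matched keywords is what keeps the aggregate bounded by the largest dictionary value, so the result always stays in the 0-to-1 range.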
Challenges we ran into
Because there were not enough labeled data sets of criminal news at the local level, we had to build the dictionary ourselves. The available sentiment analysis tools could only categorize an event as crime or not; to determine the severity of a crime, we had to assign scores to the categories manually. The crime locations extracted from the news headlines had to be cleaned multiple times. Likewise, the local news reports did not include enough data to run deeper analyses.
Accomplishments that we're proud of
We were able to represent our results on a map for the major cities in Mississippi and their neighborhoods. We are happy that in our first hackathon project, and in very limited time, we accomplished the targeted portions of our project: first learning, and then leveraging, natural language processing with some of the best APIs available in the industry.
What we learned
We learned a great deal about using industry-level APIs to accomplish tasks efficiently.
What's next for Our Safe-Neighborhood
We will finish setting up the NLP pipeline on a server. Right now, we have only classified crime severity for one portion of the state. We look forward to extending that to multiple states with broader data sets from sources like Twitter and The New York Times. One of our major goals is also to elaborate and extend the dictionary we use to classify crimes for a given location.