We realized that the lack of good data sets on gun violence is a major problem in the USA and may be exacerbating shootings. We saw that many of the online databases are relatively small and require manual labor from users. Computer automation could significantly improve these data sets.
What it does
This program parses articles from CNN, Google News, the New York Post, NBC News, Fox News, and the Gun Violence Archive. It identifies the articles that discuss gun violence using the newspaper library and natural language processing, then writes the text of those articles to a text file. This text file is then analyzed using pattern recognition to extract critical information, including the shooter, victim, and city. This information can then be contributed to the gun-violence database, which has tasks that include scanning headlines and identifying people.
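The filtering step could look roughly like the sketch below. It is not the project's actual code: the keyword list, the `min_hits` threshold, and the function names `is_gun_violence` and `fetch_article` are all assumptions made for illustration, and a simple keyword count stands in for whatever natural language processing the project used. Only `fetch_article` touches the newspaper library, and it needs network access.

```python
import re

# Hypothetical keyword list (an assumption, not the project's real terms).
GUN_TERMS = ("shooting", "shot", "gunman", "gunfire", "firearm", "gun violence")

# One case-insensitive pattern that matches any of the terms as whole words.
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in GUN_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_gun_violence(title, text, min_hits=2):
    """Return True when title and text together contain at least min_hits terms."""
    hits = _TERM_RE.findall(f"{title}\n{text}")
    return len(hits) >= min_hits


def fetch_article(url):
    """Download and parse one article with newspaper; returns (title, text)."""
    from newspaper import Article  # third-party, imported only when needed

    article = Article(url)
    article.download()
    article.parse()
    return article.title, article.text
```

Requiring more than one hit is a cheap way to skip articles that use a word like "shot" in an unrelated sense; the threshold would need tuning against real articles.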
How we built it
We built it all using Python and multiple Python libraries. We used newspaper to parse news articles and re to handle the searching and the logic of analyzing the text files.
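As a rough illustration of the re-based analysis, the sketch below pulls a shooter, a victim, and a city out of article text. The patterns, the `PATTERNS` table, and the `extract_fields` function are hypothetical and far simpler than anything that would cope with real news writing; they only show the general shape of the approach.

```python
import re

# A capitalized first name followed by one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"

# Hypothetical patterns (assumptions for illustration only).
PATTERNS = {
    "shooter": re.compile(rf"(?:suspect|gunman|shooter),? (?:identified as )?({NAME})"),
    "victim": re.compile(rf"victim,? (?:identified as )?({NAME})"),
    # A capitalized place name after "in", followed by ", State".
    "city": re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*), [A-Z]"),
}


def extract_fields(text):
    """Return a dict with the first match for each field, or None if absent."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


sample = (
    "Police said the suspect, identified as John Doe, fired several rounds "
    "in Oak Park, Illinois. The victim, identified as Jane Roe, was taken "
    "to a hospital."
)
print(extract_fields(sample))
```

Keeping the patterns in one dictionary makes it easy to add a field or to swap in a stricter pattern without touching the extraction loop.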
Challenges we ran into
We had a difficult time designing the logic of the pattern-recognition program. Additionally, parsing articles was slow at times.
Accomplishments that we're proud of
We're proud of the data we collected by scraping through a thousand articles and finding almost a hundred on gun violence.
What we learned
We learned how to use new Python libraries, how to deal with big data, and how to work out the logic of text pattern recognition.
What's next for Expanding Data for Gun Violence
We want to build a more precise algorithm that uses more sophisticated machine learning to extract relevant information from articles. We also plan to design a web application so that members of the public can parse articles and contribute to databases. Our software can be extended to other news sources, so it can include much more data, which will improve our model.