During the information session about Sensor Journalism, the lecturer brought up the four fundamental questions of journalism. The first of those questions is, "What question does the public want answered?" Answering that question can be quite a lot of work, so we wanted to develop a tool to make the process easier.
When speaking with the professional journalists, we found that a tool that helps them identify which stories are worth reporting could be useful. We searched online and asked some of the professionals, but couldn't find any existing solution that did what we aimed to do. So we built it.
What it does
Meta-Journalism is a sensor journalism project that uses the internet as its sensor. It takes data from news sources, social media, and trending searches on Google to give a journalist a supply vs. demand relationship for articles on a given topic, which helps determine which topics are worth writing about.
Social Media Score: Derived from how frequently new tweets featuring the keywords are posted. This score is used to scale the potential popularity of a topic based on expected reader engagement.
Press Coverage Score: Derived from how many news articles mention the keywords, weighted by frequency of publication. This score is used to measure the current supply of news stories on a topic.
Search Popularity Score: Derived from the search popularity data supplied by Google Trends. This score is used to measure the current demand for news stories on a topic.
M-Score: Our cumulative score, taking all of the above into account. The larger the M-Score, the more opportunity there is for articles on the subject. Small or negative M-Scores imply that the market is oversaturated with stories, so there are fewer opportunities for articles on that subject. Topics with M-Scores below 5 probably aren't worth writing about, whereas larger M-Scores are more likely to appeal to an undersaturated market.
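To make the scoring idea concrete, here is a minimal sketch of how the pieces could fit together. It is not our actual algorithm: the formula in m_score, the 24-hour window, and the helper names tweets_per_hour, m_score, and verdict are all assumptions made up for illustration, and the inputs are assumed to be already normalized to comparable scales. Only the threshold of 5 and the supply vs. demand roles of each score come from the description above.

```python
from datetime import datetime, timedelta


def tweets_per_hour(timestamps, now, window_hours=24.0):
    """Rate of new tweets in the trailing window.

    Hypothetical input to the Social Media Score; the window length
    is an assumption, not something the project specifies.
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [t for t in timestamps if cutoff <= t <= now]
    return len(recent) / window_hours


def m_score(social, press, search):
    """Assumed combination, for illustration only.

    Demand (search popularity) is scaled by expected engagement
    (social media), then supply (press coverage) is subtracted, so an
    oversaturated topic ends up small or negative.
    """
    return search * social - press


def verdict(score, threshold=5.0):
    """Apply the rule of thumb that scores below 5 are probably not worth it."""
    if score < threshold:
        return "probably not worth writing about"
    return "possible opportunity"


if __name__ == "__main__":
    now = datetime(2020, 1, 2, 12, 0)
    stamps = [now - timedelta(hours=h) for h in (1, 2, 30)]
    print(tweets_per_hour(stamps, now))  # counts only the two recent tweets

    for social, press, search in [(1.5, 20.0, 30.0), (0.5, 40.0, 30.0)]:
        score = m_score(social, press, search)
        print(score, verdict(score))
```

In this sketch the first topic has high demand and modest coverage, so it scores well above the threshold, while the second has heavy coverage and weak engagement, so it goes negative.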
How we built it
We created a separate web scraper for each of the 3 sites we're interfacing with. Only Google Trends had a useful API that we were able to integrate. We took the data gathered by the scrapers and plugged it into specialized algorithms we created, which determine a score from each scraper's data individually and from all of the scrapers combined.
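The overall shape of that pipeline can be sketched roughly as follows. This is an illustration under assumptions, not our real code: the three source functions return canned numbers instead of scraping anything, and the names SOURCES, combine_scores, and build_report, as well as the combining formula, are hypothetical.

```python
from typing import Callable, Dict

# Each source takes a keyword and returns one numeric score.
ScoreSource = Callable[[str], float]


def social_media_score(keyword: str) -> float:
    # Placeholder: a real version would scrape recent posts for the keyword.
    return 1.5


def press_coverage_score(keyword: str) -> float:
    # Placeholder: a real version would count articles mentioning the keyword.
    return 20.0


def search_popularity_score(keyword: str) -> float:
    # Placeholder: a real version would query search trend data.
    return 30.0


SOURCES: Dict[str, ScoreSource] = {
    "social": social_media_score,
    "press": press_coverage_score,
    "search": search_popularity_score,
}


def combine_scores(scores: Dict[str, float]) -> float:
    # Assumed formula for illustration: demand scaled by engagement, minus supply.
    return scores["search"] * scores["social"] - scores["press"]


def build_report(keyword: str,
                 sources: Dict[str, ScoreSource] = SOURCES,
                 combine: Callable[[Dict[str, float]], float] = combine_scores,
                 ) -> Dict[str, float]:
    """Run every source for the keyword, then add the combined score."""
    scores = {name: fetch(keyword) for name, fetch in sources.items()}
    report = dict(scores)
    report["m_score"] = combine(scores)
    return report


if __name__ == "__main__":
    print(build_report("election"))
```

Keeping each source behind the same keyword-to-number interface means another site can be added by registering one more function, without touching the code that builds the report.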
Challenges we ran into
Building the algorithms was very difficult. Finding a way to turn the very different, and often unrelated, values we got from the three sites into something meaningful wasn't easy. The algorithms we use still aren't perfect, but we've found they give results that, in theory, make sense based on the input data. Also, getting Flask to work and figuring out a way to host the website was a pain. There seemed to be a lot of outdated documentation on the internet, and filtering what worked from what didn't took a lot of development time.
Accomplishments that we're proud of
Getting all our base functionality done with time to spare. Solving a variety of problems in creative ways after google-fu failed.
What we learned
Tyler: I learned a lot of web development stuff, and had to teach myself Flask from the ground up to get things working.
Lonnie: I became a lot more familiar with Python through this experience.
Selina: I gained a lot of experience with HTML, and got an idea of how Python scripts and HTML can be integrated.
What's next for Meta-Journalism
Add more sites to the press coverage and social media scores. Further refine our algorithms to make them useful. Get a real, permanent web hosting solution, so I don't need to run everything off my local hard drive.
The demo site will only be live for the duration of Profhacks, but if it's well received we'll move to a more permanent hosting solution.