ScrapeTheLines

Hacked by Steven Feng (Rice '22), with mentorship from Albert Wang (NJIT '22). ScrapeTheLines is a Python script that uses BeautifulSoup and urllib2 to scrape headlines from several news websites and count the keywords that repeat across them. The script then returns the ten most frequent keywords with their relative frequencies, along with the frequencies of the less common keywords. The intention is for users to search those keywords themselves on the news outlets they prefer, which should reduce at least some of the bias introduced by search algorithms on the Web.
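To make the counting step concrete, here is a minimal sketch of how the top keywords and their relative frequencies could be computed once the headlines have been scraped. It is not the actual ScrapeTheLines code: the function name `keyword_frequencies`, the `STOP_WORDS` list, and the rule of ignoring words shorter than three letters are all assumptions made for illustration, and the fetching and parsing step is left out so the example runs with the standard library alone.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not taken from the project).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for",
              "is", "at", "as", "with", "by"}


def keyword_frequencies(headlines, top_n=10):
    """Return up to top_n (keyword, relative_frequency) pairs.

    Relative frequency is the keyword's count divided by the total
    number of keywords kept after filtering.
    """
    words = []
    for headline in headlines:
        # Lowercase the headline and pull out runs of letters/apostrophes.
        for word in re.findall(r"[a-z']+", headline.lower()):
            # Skip stop words and very short tokens (assumed filter).
            if word not in STOP_WORDS and len(word) > 2:
                words.append(word)

    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(word, count / total) for word, count in counts.most_common(top_n)]


# Made-up headlines standing in for scraped data.
sample_headlines = [
    "Senate passes budget bill",
    "Budget talks stall in Senate",
    "Storm hits coast",
]
top_keywords = keyword_frequencies(sample_headlines, top_n=2)
```

With real data, `headlines` would be the list of headline strings pulled from each site, and the default `top_n=10` matches the top ten keywords described above.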

Sources

Here are the articles I referenced for simple facts about the different news channels:
https://en.wikipedia.org/wiki/BBC
https://en.wikipedia.org/wiki/Fox_Broadcasting_Company
https://en.wikipedia.org/wiki/CNN
https://en.wikipedia.org/wiki/NBC
https://en.wikipedia.org/wiki/NPR

Here is a neat graphic of the political leanings of various news outlets, courtesy of the Washington Post: (image: News Networks)

Future Goals

My first and foremost goal is to optimize the scraping algorithm so it's more efficient and less biased. Next steps are to implement some sort of data visualization and find some way to export the scraped data to an Excel spreadsheet.
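As a rough idea of what the export step might look like, here is a small sketch that writes keyword frequencies as CSV, a format Excel can open directly. This is an assumption about one possible approach, not something the project does yet: the function name `write_keywords_csv`, the column names, and the choice of CSV over a native Excel file are all hypothetical.

```python
import csv


def write_keywords_csv(rows, fileobj):
    """Write (keyword, relative_frequency) pairs to an open text file as CSV."""
    writer = csv.writer(fileobj)
    # Header row; the column names are illustrative.
    writer.writerow(["keyword", "relative_frequency"])
    for keyword, frequency in rows:
        # Round to four decimal places to keep the spreadsheet readable.
        writer.writerow([keyword, f"{frequency:.4f}"])
```

To save to disk, the file would be opened with `open("keywords.csv", "w", newline="", encoding="utf-8")` (the filename is just an example) and passed in as `fileobj`; `newline=""` is what the `csv` module documentation recommends so that rows are not double-spaced on Windows.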

Built With

python

Updates

Steven Feng started this project — Mar 10, 2019 09:04 AM EDT
