Inspiration
There are many existing Chrome extensions on the marketplace that override the new tab page, ranging from pages of beautiful backgrounds to productivity tools. Since the new tab page is one of the pages people see most often each day, we thought it would be a great platform for delivering relevant news in an easily digestible and unobtrusive manner.
What it does
News Cloud is a Chrome extension that aggregates text from a variety of news sites (NPR, CNN, BBC, Ars Technica, Reddit, and others) and overrides the new tab page to display shared trending phrases based on word frequency. We take these trending phrases and display them in a word cloud composed of hyperlinks sized according to how often they occur. Each hyperlink redirects to a Google News search for that keyword or phrase.
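As a rough illustration of the link target, the sketch below builds a Google News search URL for a phrase. The base URL and the q parameter are assumptions about the search endpoint, and google_news_url is a hypothetical helper name, not taken from the project.

```python
from urllib.parse import urlencode

# Assumed search endpoint; the project's actual link format is not documented here.
GOOGLE_NEWS_SEARCH = "https://news.google.com/search"


def google_news_url(phrase):
    """Return a search URL for the phrase, with spaces and symbols escaped."""
    return GOOGLE_NEWS_SEARCH + "?" + urlencode({"q": phrase})
```

Calling google_news_url("climate summit") yields a URL whose query string is q=climate+summit.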
How we built it
The extension was built concurrently in two parts. The code dealing with news aggregation and text processing was written entirely in Python. The script scrapes raw text from various sources and puts it through a three-step process to determine the final data set for the word cloud. In step one, spaCy filters the raw text, removing comparatively meaningless parts of speech such as conjunctions and prepositions, as well as common words such as “man” or “best”. In step two, each word, along with a bigram formed from [prev word] + “ ” + [curr word], is added to or updated in the frequency chart. In step three, after all sources have been scraped, duplicate and similar keywords are removed, and roughly the top 30 key phrases and their frequencies are passed on.
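The three steps can be sketched roughly as follows. This is a minimal illustration rather than the project's code: a small hard-coded word set stands in for the spaCy part-of-speech filter, bigrams are assumed to be formed after filtering, and the duplicate rule (drop a single word when a bigram containing it is at least as frequent) is only one guess at what removing "duplicates and similar keywords" could mean. All function names are hypothetical.

```python
import string
from collections import Counter

# Hypothetical stand-in for the spaCy filter: conjunctions, prepositions,
# articles, and a couple of overly common words.
FILTERED_WORDS = {"the", "a", "an", "and", "or", "but", "of", "in", "on",
                  "to", "for", "with", "man", "best"}


def filter_tokens(text):
    """Step one: lowercase, strip punctuation, drop filtered words."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    return [w for w in words if w and w not in FILTERED_WORDS]


def update_counts(counts, tokens):
    """Step two: count each word and the bigram it forms with the previous word."""
    prev = None
    for word in tokens:
        counts[word] += 1
        if prev is not None:
            counts[prev + " " + word] += 1
        prev = word


def top_phrases(counts, limit=30):
    """Step three: drop single words covered by a bigram, keep the top entries."""
    bigrams = {p: c for p, c in counts.items() if " " in p}
    kept = {}
    for phrase, count in counts.items():
        if " " not in phrase:
            # Assumed duplicate rule: a bigram containing this word is at
            # least as frequent, so the single word adds nothing.
            if any(phrase in b.split() and c >= count for b, c in bigrams.items()):
                continue
        kept[phrase] = count
    # Highest frequency first; ties broken alphabetically for stable output.
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

A caller would create one Counter, run update_counts(counts, filter_tokens(text)) for each scraped source, and then call top_phrases(counts) once at the end.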
Meanwhile, the website and word cloud generation scripts were written in HTML/CSS and JavaScript. The word cloud script takes the output of the Python script and sizes each phrase relative to its frequency. The input list of phrases is first sorted in descending order of frequency and then normalized to text sizes appropriate for the browser window. The word cloud is then generated by placing each phrase link onto the page along a spiral path beginning in the middle of the page. Each time a link is about to be placed, a simple collision function checks for overlap with the links already on the page and adjusts the position of the new phrase.
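The sizing and placement logic described above might look something like the sketch below. It is written in Python for illustration, whereas the project's script is JavaScript. The pixel range, the spiral parameters, and the width estimate of 0.6 times the font size per character are all assumptions, and the function names are hypothetical.

```python
import math


def normalize_sizes(phrases, min_px=14, max_px=64):
    """Sort (text, count) pairs by count, descending, and map counts to font sizes."""
    if not phrases:
        return []
    ordered = sorted(phrases, key=lambda p: -p[1])
    high, low = ordered[0][1], ordered[-1][1]
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return [(text, min_px + (count - low) * (max_px - min_px) / span)
            for text, count in ordered]


def overlaps(a, b):
    """Axis-aligned rectangle collision test; boxes are (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_on_spiral(sized, center=(640, 360), step=0.2, growth=2.0, max_steps=20000):
    """Walk each phrase outward along a spiral until its box hits nothing."""
    placed = []
    for text, size in sized:
        width = 0.6 * size * len(text)  # rough text-width estimate (assumption)
        height = 1.2 * size
        for i in range(max_steps):
            angle = i * step
            radius = growth * angle  # Archimedean spiral: radius grows with angle
            x = center[0] + radius * math.cos(angle) - width / 2
            y = center[1] + radius * math.sin(angle) - height / 2
            box = (x, y, width, height)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((text, box))
                break
        # A phrase that finds no free spot within max_steps is skipped.
    return placed
```

Because the list is sorted first, the most frequent phrase lands at the center and less frequent ones are pushed outward along the spiral.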
To allow the Python and JavaScript code to interact with one another in a single web app, we used the Flask framework.
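One common way to connect the two sides with Flask is to expose the Python results as JSON and let the page's JavaScript fetch them. The sketch below assumes that arrangement. The /phrases route and the get_top_phrases stub are hypothetical and stand in for the project's real scraping and frequency code.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def get_top_phrases():
    """Hypothetical stub for the scraping and frequency pipeline."""
    return [("climate summit", 12), ("election results", 9)]


@app.route("/phrases")
def phrases():
    # Return the phrases as JSON so the word cloud script can request them.
    payload = [{"text": text, "count": count} for text, count in get_top_phrases()]
    return jsonify(phrases=payload)
```

On the browser side, the word cloud script would request /phrases and pass the returned list to its sizing and placement code.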
Challenges we ran into
One of the main issues that we faced was figuring out how to use Python and JavaScript in one product. A browser cannot run a Python script directly on a web page. Through our research, we determined that the Flask framework was the best fit for our needs. Having never even heard of the framework before, we went through many tutorials and debugging sessions, and in the end we were able to use it effectively to complete our project.
Another issue that we faced was determining what the word cloud should be composed of. We originally planned on having only one-word phrases, but we quickly realized that they were too short to provide meaningful context. We resolved this issue by using bigrams (two-word phrases) drawn from multiple sources and part-of-speech (POS) tagging to filter out less meaningful words.
Accomplishments that we're proud of
One of our main accomplishments was learning and effectively using Flask, a technology previously unknown to us, to resolve the core issue of communication between the Python and JavaScript scripts.
Another aspect of the project that we are proud of is the general polish that News Cloud possesses, from the surprisingly effective web scraping and frequency analysis that generate relevant key phrases to the UI design and cloud generation algorithm.
What we learned
Beyond the technical knowledge we picked up on the fly (e.g., Flask, web scraping, PaaS, general web development), we also experienced a collaborative workflow and time constraints more representative of real-world situations. Through this experience, we realized that the success of a project often depends more on the degree of collaboration than on individual contribution.
What's next for News Cloud
We are proud of what we were able to accomplish with News Cloud, especially as our first hackathon project. That being said, there is no such thing as a completed piece of software. There are several aspects of the project that can be improved upon.
The web scraping and frequency calculation algorithm creates a significant loading time whenever the site is opened, which limits its effectiveness as a new-tab extension. This was also the main reason we were unable to deploy the Flask project onto a PaaS such as AWS, GCP, or Heroku: the HTTP request would always time out.
Another feature that we would like to include is a customization menu that allows the user to add more news sources, change the background image, and adjust the word cloud generation settings.
We are excited to continue developing News Cloud into the useful and convenient news source that we conceived of.
Notes
News Cloud logo is original. Background image belongs to Firewatch.