We spend a lot of time trying to stay caught up with the news, but it can be hard to keep track of everything. We live in New York City, where we can walk by any bodega and get an idea of what’s going on in the news just by glancing at the newspapers on the stand.

What it does

A Thousand Words displays the cover photos and headlines of your favorite newspapers and magazines, recreating that bodega news experience in app form. It provides a smooth user experience with customization so you can have your favorite news sources front and center.

There is also a companion data visualization web app that displays a word cloud of descriptions from the past year of The New York Times’ front page images. Clicking on a word opens one of the images that was labeled with that word.

How we built it

The Android app is written in Java in Android Studio, using standard Android libraries. We made the UI as simple as possible: the front page is just a vertical layout with a series of images on it. This keeps the app lightweight, so it runs on any size phone with very low power usage.

The backend is written in Python and runs on Google App Engine. We used the standard App Engine webapp2 framework, and urllib to scrape images from the web.

For the data visualization we used Google BigQuery, the Google Cloud Vision API, and JavaScript. First we used Google BigQuery to organize the past year of New York Times headline photos. Once we had the data and images gathered, we used the Google Cloud Vision API to generate a description of each headline image. Using Python, we ranked each label by how common it is and used that ranking to generate the word cloud. The word cloud is written in JavaScript, using the D3.js data visualization library. We chose a dynamic word cloud in JavaScript so that users can interact with it and explore different themes. We host the data visualization and theme explorer on Google App Engine.
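The label-ranking step could look roughly like the sketch below. It is an illustration, not our actual code: the function names, the shape of the input dictionary, and the "text"/"size" keys in the output are all assumptions made for the example. It counts how many images carry each label and shapes the result as a list a word cloud layout could consume.

```python
from collections import Counter


def rank_labels(image_labels):
    """Count how many images carry each label, most common first.

    image_labels maps an image identifier to the list of labels
    returned for it (a hypothetical structure chosen for this sketch).
    """
    counts = Counter()
    for labels in image_labels.values():
        # Normalize case and whitespace, and count a label once per image.
        counts.update({label.strip().lower() for label in labels})
    # Sort by count descending, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_word_cloud_data(ranked, limit=100):
    """Shape the ranking as a list of dicts, keeping the top `limit` labels."""
    return [{"text": label, "size": count} for label, count in ranked[:limit]]


if __name__ == "__main__":
    # Made-up sample data, not real labels from the project.
    sample = {
        "2016-01-01.jpg": ["crowd", "person", "city"],
        "2016-01-02.jpg": ["person", "speech"],
        "2016-01-03.jpg": ["Person", "crowd"],
    }
    print(to_word_cloud_data(rank_labels(sample)))
```

Counting each label once per image means the size of a word reflects how many front pages it appeared on, rather than how many times a single image repeated it.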

Challenges we ran into

It was a challenge to download high quality images quickly and efficiently, because the news can turn on a dime, or it can stay the same for hours. We created a caching system, along with a number of enhancements in the cloud, that made the app use much less data and battery power. We also spent a lot of time getting the Google Cloud Vision API to do what we needed it to do.
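To give a sense of the caching idea, here is a minimal sketch under stated assumptions. It is not our actual caching system: the CoverCache class, its method names, the five-minute default lifetime, and the use of a SHA-256 digest to detect changes are all hypothetical choices for illustration. An entry is refetched only once it is older than its time-to-live, and a client that already holds the current image gets nothing new to download.

```python
import hashlib
import time


class CoverCache:
    """Hypothetical in-memory cache of front-page images keyed by source name."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.time):
        self._fetch = fetch      # callable: source name -> image bytes
        self._ttl = ttl_seconds  # how long an entry counts as fresh
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # source -> (fetched_at, digest, data)

    def get(self, source):
        """Return (digest, data), refetching only when the entry is stale."""
        now = self._clock()
        entry = self._entries.get(source)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], entry[2]
        data = self._fetch(source)
        digest = hashlib.sha256(data).hexdigest()
        self._entries[source] = (now, digest, data)
        return digest, data

    def get_if_changed(self, source, known_digest):
        """Return (digest, data), or None if the caller's copy is current."""
        digest, data = self.get(source)
        if digest == known_digest:
            return None
        return digest, data
```

With something like this, a phone that sends the digest of the image it already has would only pay for a download when the front page has actually changed.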

Accomplishments that we're proud of

Plenty of data visualization projects have looked at newspapers and the things that newspapers say. But what about the cover photos that play such a crucial role in documenting the world around us? We decided we could visualize that data as well. First we collected all the front cover images of the New York Times from the year 2016. Then we used Google’s Cloud Vision API to describe each picture with a couple of labels. Using these labels, we were able to build a fascinating word cloud of the most prevalent themes of the cover photos from the New York Times in 2016. We have always felt that this would be awesome to do, and now that we can see it in action, it is definitely something that we will use in the future!

What we learned

We learned how to create smooth, clutter-free user interfaces in Android to make viewing the news as effortless as possible. We learned how to use Google App Engine and other Google Cloud APIs to build powerful backend web services. We learned about data visualization using CSS, JavaScript, and the D3 library.

What's next for A Thousand Words

We plan to continue working on the Android app, soon to be followed by an iPhone version. The Google Cloud Vision API was a great start in our attempt to describe front page images; however, we believe we can do an even better job in representing what images actually mean. In addition, we’d like to continue the data collection process so we can make more awesome data visualizations, maybe even connecting it back to the phone app in some way!
