Since the last presidential election in the United States, it has become clear how much fake news is produced by fringe media outlets and other malicious actors. By simplifying the reader's news experience, we hope to bring more people back to reputable news outlets. We do this by clustering last week's stories into the 7 most-written-about topics and showing a selection of the 7 most important stories for each cluster.
What it does
7news loads its content from the Thomson Reuters Media Express API. We analyze the news items of the last 7 days with hierarchical clustering to extract the 7 most-written-about topics. Within each topic, the news items are ranked according to data received from Thomson Reuters Open Calais.
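The clustering and selection step described above can be sketched roughly as follows. This is a minimal illustration in Python, not our Scala implementation: the Jaccard distance on word sets, the average linkage, the `max_distance` cut-off, and the `text` and `score` fields are all assumptions made for the example (in 7news the ranking signal comes from Open Calais data, which is not modelled here).

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two term sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, terms):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(terms[i], terms[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_items(terms, max_distance):
    """Agglomerative clustering: start from singletons and keep merging
    the closest pair until no pair is closer than max_distance."""
    clusters = [[i] for i in range(len(terms))]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], terms),
        )
        if average_linkage(clusters[a], clusters[b], terms) > max_distance:
            break
        # a < b is guaranteed by combinations, so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def top_topics(items, k=7, per_topic=7, max_distance=0.7):
    """Return the k largest clusters, each cut to its per_topic
    highest-scored items (the score field is a hypothetical stand-in)."""
    terms = [set(item["text"].lower().split()) for item in items]
    clusters = cluster_items(terms, max_distance)
    clusters.sort(key=len, reverse=True)
    return [
        sorted((items[i] for i in cluster), key=lambda it: it["score"], reverse=True)[:per_topic]
        for cluster in clusters[:k]
    ]
```

Calling `top_topics(items)` with a list of dictionaries carrying `text` and `score` keys yields up to 7 lists of up to 7 items each. The pairwise search is cubic in the number of items, which is acceptable for a sketch but would need a smarter implementation for larger data sets.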
How I built it
Our system is split into a backend service, written in Scala with the Play framework, and a frontend implemented in Angular2 and TypeScript. The backend exposes a REST API to the frontend and caches the news articles on the server. We tuned the parameters of our algorithms based on hallway testing of the results at the hackathon.
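To illustrate the server-side caching, here is a minimal sketch of a time-based cache placed in front of an upstream fetch. It is written in Python for brevity rather than in the Scala of our backend, and the class name `TtlCache`, the `fetch` callback, and the expiry time are hypothetical; the actual caching strategy in 7news may differ.

```python
import time


class TtlCache:
    """Caches the result of a fetch function for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable that loads the articles
        self._ttl = ttl_seconds      # assumed expiry time, not taken from 7news
        self._clock = clock          # injectable clock, which makes testing easy
        self._value = None
        self._expires_at = None

    def get(self):
        """Return the cached value, refetching only once the entry has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A request handler would call `cache.get()` instead of hitting the upstream API directly, so repeated frontend requests within the expiry window cost only one upstream call.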
Challenges I ran into
The biggest technical challenge was the rate limit of the shared account used to access the Thomson Reuters APIs. Beyond that, the challenge to overcome at any hackathon is the lack of sleep...
Accomplishments that I'm proud of
- Strength of clustering
- System architecture
What I learned
Some members of our group were new to Scala and Angular2 and got up to speed quickly in these new environments. Reviewing and debating the different clustering and ranking algorithms gave us good insight into some new techniques.
What's next for 7news
- Further refinement of the ML techniques
- Scaling up the system to cover longer time periods and much more data
- Building a frontend that is more visually appealing
- Personalization of clustering and ranking algorithms