Ever seen Mr. Robot? What if we could uncover similar hidden insights from data, but legally? One source of data that often goes ignored is news articles, which can reveal relationships between people, places, and companies.
What it does
It streams real-time data from news articles (in our case, from the New York Times) using Apache Kafka, then tags the text with NLTK and Apache Spark to find relationships between entities (people, places, and companies). With these relationships, we can build a graph of connections between entities and figure out which bankers have been connected with which frauds, which politicians have been connected with which other politicians, and so on.
Enter the name of an article or entity, and you get a graph in which the nodes represent entities and the edges represent the articles that relate them.
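To make the graph idea concrete, here is a minimal sketch in plain Python of how entities that appear in the same article could be linked. It is not our actual Spark job: the function names `build_entity_graph` and `neighbors` and the sample data are made up for illustration, and the entity lists are assumed to have already come out of the tagging step.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_graph(tagged_articles):
    """Map each unordered entity pair to the set of article titles mentioning both."""
    edges = defaultdict(set)
    for title, entities in tagged_articles.items():
        # sorted() gives every pair a canonical order, so (A, B) and (B, A) share one edge
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)].add(title)
    return edges


def neighbors(edges, entity):
    """Return {connected entity: articles linking the two} for a single entity."""
    result = {}
    for (a, b), titles in edges.items():
        if entity == a:
            result[b] = titles
        elif entity == b:
            result[a] = titles
    return result


# Hypothetical tagger output, invented for this example
tagged_articles = {
    "Article 1": ["Alice Example", "Example Bank", "New York"],
    "Article 2": ["Alice Example", "Bob Sample"],
    "Article 3": ["Example Bank", "Alice Example"],
}

graph = build_entity_graph(tagged_articles)
print(neighbors(graph, "Alice Example"))
```

Looking up an entity then amounts to collecting every edge that touches it, and the article titles on each edge explain why two entities are connected.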
How we built it
- New York Times API for retrieving news articles
- Apache Spark + Apache Kafka for real-time data-streaming
- NLTK for natural language processing
- neo4j for storing relationships between entities
- ElasticSearch for quick searching of entities and articles
- Python + Flask for building a RESTful API used to query ElasticSearch
- TypeScript + Angular 2 for building a client-side web application
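As a rough illustration of how one relationship might be written to neo4j, the sketch below builds a parameterized Cypher statement for a single edge. The `Entity` label, the `MENTIONED_WITH` relationship type, the property names, and the `edge_to_cypher` helper are all assumptions made for this example, not necessarily the schema we used.

```python
def edge_to_cypher(entity_a, entity_b, article_title):
    """Build a Cypher statement and its parameters for one entity-entity edge.

    The label, relationship type, and property names are hypothetical.
    """
    # MERGE creates a node or relationship only if it does not already exist,
    # so running this again for the same article does not duplicate anything.
    query = (
        "MERGE (a:Entity {name: $a}) "
        "MERGE (b:Entity {name: $b}) "
        "MERGE (a)-[:MENTIONED_WITH {article: $article}]->(b)"
    )
    params = {"a": entity_a, "b": entity_b, "article": article_title}
    return query, params


query, params = edge_to_cypher("Alice Example", "Example Bank", "Article 1")
print(query)
print(params)
```

The query and parameters could then be handed to a neo4j client, for example through the official Python driver's `session.run(query, params)`. Passing values as parameters rather than formatting them into the string keeps entity names that contain quotes from breaking the statement.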
Challenges we ran into
Frontend, frontend, frontend! Learning to build with Angular was a real struggle, especially for a team with no frontend experience.
Accomplishments that we're proud of
Getting the backend to work properly, and being able to connect it with our frontend to render some data.
What we learned
Lots of new technologies! (At a fundamental level, of course!)
What's next for Nexus
We can integrate other news sources to capture a wider representation of public media. We could also run sentiment analysis to determine whether a relationship between two entities is positive or negative. There are lots of interesting directions to go from here.