The NewsAlpha CLI lets users run queries on social news posts in near real-time
A LUIS model is used to analyze social news posts and recognize intents to do harm or violence
The Gab and 4chan aggregators run as Azure Functions with logging
Social news posts are stored in CosmosDB
An Azure Search service indexes social news posts and data enrichments
One really cool feature of Visual Studio is being able to stream logs from your Azure Functions and view your Azure services as you code.
In light of recent events involving mass shootings and the unmoderated use of Internet forums, and recent discussions about monitoring social media sites to detect threats in advance, NewsAlpha was created partly to investigate how feasible such a technology would be, and partly to demonstrate that open-source AI software can be equivalent, and in most cases preferable, to closed-source software, especially in an ethically perilous scenario like monitoring user communications.
What it does
NewsAlpha aggregates social news posts from different Internet forums and social media sites in near real-time and analyzes them. It extracts entities like geographic locations, person names and dates, then uses an NLU model to predict whether a post expresses an intent to do violence or harm to others, or to self-harm. The data is indexed by a full-text search service, and from the CLI (we ❤️ CLIs) an operator can rapidly query millions of forum or social media posts for entities and possible intents to harm others or oneself. NewsAlpha currently indexes 4chan's /pol/ forum and the Gab microblogging site.
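As an illustration of the analysis step, here is a minimal sketch (not NewsAlpha's actual code) that turns an NLU prediction into an enrichment record and flags a post for review. It assumes the post has already been sent to a LUIS v2 prediction endpoint and that the JSON response uses the v2 shape (`topScoringIntent`, `entities`). The intent names `ThreatOfViolence` and `SelfHarm`, the 0.6 threshold, the post id and the sample response are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical intent names: a real LUIS app defines its own.
HARM_INTENTS = {"ThreatOfViolence", "SelfHarm"}
# Hypothetical confidence cut-off for flagging a post for review.
THRESHOLD = 0.6


@dataclass
class Enrichment:
    post_id: str
    intent: str
    score: float
    entities: Dict[str, List[str]] = field(default_factory=dict)
    flagged: bool = False


def enrich(post_id: str, luis_response: dict) -> Enrichment:
    """Turn a LUIS v2-style prediction into an enrichment record for indexing."""
    top = luis_response.get("topScoringIntent", {})
    intent = top.get("intent", "None")  # "None" is LUIS's default fallback intent
    score = float(top.get("score", 0.0))
    entities: Dict[str, List[str]] = {}
    for ent in luis_response.get("entities", []):
        # Group recognized entity text by type (places, person names, dates, ...).
        entities.setdefault(ent["type"], []).append(ent["entity"])
    # Flag only when the top intent is a harm intent AND the model is confident.
    flagged = intent in HARM_INTENTS and score >= THRESHOLD
    return Enrichment(post_id, intent, score, entities, flagged)


# Illustrative, trimmed response in the LUIS v2 format; all values are invented.
sample = {
    "query": "<post text>",
    "topScoringIntent": {"intent": "ThreatOfViolence", "score": 0.82},
    "entities": [
        {"entity": "chicago", "type": "builtin.geographyV2.city"},
        {"entity": "saturday", "type": "builtin.datetimeV2.date"},
    ],
}

result = enrich("pol-123456", sample)
print(result.flagged, result.entities)
# True {'builtin.geographyV2.city': ['chicago'], 'builtin.datetimeV2.date': ['saturday']}
```

Keeping the score next to the flag lets the operator later re-query with a stricter or looser cut-off instead of trusting a single yes/no decision.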
How I built it
NewsAlpha is the spiritual successor to the work I started in the OLAF cyber-forensics program. OLAF is a desktop monitoring tool that uses image classification and Azure Cognitive Services to detect and classify, in near real-time, the images users download to their PC. NewsAlpha instead runs in the cloud and monitors Internet forums and social media sites, aggregating, analyzing, and indexing their posts so that an operator can scan millions of social media posts or more for possible threats of harm to others or of self-harm.
NewsAlpha is built using the following Azure services:
- Azure Functions: NewsAlpha's aggregators run as Azure Functions, eliminating the need to run and manage a server.
- CosmosDB: Social media posts are stored in a CosmosDB database. CosmosDB can store millions of news posts or more cheaply, with high-performance data access.
- Language Understanding (LUIS): LUIS is used to create an NLU model that extracts entities like person names and geographic locations and predicts intents to do harm or violence.
- Azure Search: Full-text indexes are created with Azure Search for the posts from each forum or social media site, allowing rapid searches across large numbers of data items.
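To make the search side concrete, here is a minimal sketch of how a CLI query might map onto the Azure Search REST API's Search Documents operation (`GET https://<service>.search.windows.net/indexes/<index>/docs`). It only builds the URL; a real request would also need the service's `api-key` header. The service name `newsalpha`, the index name `pol-posts` and the filterable `intent` field are hypothetical, since NewsAlpha's actual index schema is not described here. The API version is pinned to `2019-05-06`, one generally available version; use whichever version your service supports.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit

API_VERSION = "2019-05-06"  # one generally available Azure Search REST API version


def build_search_url(service: str, index: str, text: str,
                     intent: Optional[str] = None, top: int = 50) -> str:
    """Build a Search Documents (GET .../docs) URL.

    The `intent` field name is hypothetical: it assumes the index stores the
    NLU model's top-scoring intent as a filterable string field.
    """
    params = {
        "api-version": API_VERSION,
        "search": text,    # full-text query over the index's searchable fields
        "$top": str(top),  # maximum number of results to return
        "$count": "true",  # also return the total number of matches
    }
    if intent is not None:
        # OData string literals use single quotes; an embedded quote is doubled.
        escaped = intent.replace("'", "''")
        params["$filter"] = f"intent eq '{escaped}'"
    # Keep '$' literal in parameter names; percent-encode everything else.
    query = urlencode(params, safe="$", quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


url = build_search_url("newsalpha", "pol-posts", "chicago", intent="ThreatOfViolence")
print(url)
print(parse_qs(urlsplit(url).query))
```

Combining a full-text `search` term (for example a place name extracted as an entity) with an OData `$filter` on the predicted intent is what lets an operator narrow millions of indexed posts down to a short list worth reading.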
Challenges I ran into
As always, Murphy's Law kicked in: on the last day to submit, my Azure subscription was cancelled because I had used up all my credit. Fortunately, I was able to re-create the services quickly. The LUIS service is expensive to run at the scale I need, so I may disable it a few days after submission to keep costs down.
Accomplishments that I'm proud of
I was able to build an end-to-end cloud AI app on my own and I learned a lot of different things.
What I learned
This was my first time using many Azure services, and I was quite impressed by how easy they are to set up, configure, and deploy. LUIS made creating an NLU model almost effortless, and the ability to dynamically train the model using utterances received in real time at the service endpoints is quite wonderful. Although I do prefer the open-source way of putting together the components one needs from scratch, paying for cloud services like LUIS and Azure Search is definitely a viable, and in some cases preferable, option for delivering projects.
What's next for NewsAlpha
I really enjoyed working on NewsAlpha and plan to continue working on it for some time. Some (out of many) additions I have planned:
- More aggregators for Reddit, Twitter and other forums and social media sites
- Better NLU models for detecting threats, identity hate, etc.
- Web interface in addition to CLI