The Goal
The project aims to detect hate speech against individuals, communities, organizations, and companies on social media, and to use that data for analytics. Alongside hate speech detection, the project also performs sentiment analysis on news media articles about any of the above-mentioned entities and presents the resulting data in a dashboard.
Project Architecture

- Data Producers: There are two producers, one producing tweets and the other sending news articles. The producer services send their messages via a Kafka topic.
- Data Consumers: Likewise, there are two consumers, one for tweets and one for news articles. On the consumer side, the Expert.ai NLP API is called to classify the received messages and detect hate speech. The classified data is then sent to the Elasticsearch indexes, which Kibana uses for dashboarding.
- Apache Kafka: Has two topics, "topic-1" and "topic-2", each with 2 partitions and 2 replicas.
- Elasticsearch & Kibana: Two separate indexes store news articles and tweets. Kibana accesses the data for visualization.
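To make the flow concrete, here is a minimal sketch of what a classified record might look like by the time it reaches an Elasticsearch index. The field names here are illustrative assumptions, not the project's actual schema:

```python
from datetime import datetime, timezone

def build_record(source, text, sentiment, hate_labels):
    """Assemble an illustrative document for an Elasticsearch index.

    `source` is "tweet" or "news", `sentiment` is a signed score, and
    `hate_labels` lists the hate-speech categories detected (possibly empty).
    """
    return {
        "source": source,
        "text": text,
        "sentiment": sentiment,
        "hate_speech": hate_labels,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

record = build_record("tweet", "example text", -0.5, ["Personal insult"])
```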
Inspiration
While browsing the internet, I came across some negative/abusive comments against a community. People from around the world were abusing those who are different from them. Being different doesn't give anyone the right to spread hate. Some people have learned to accept others, but many cases of racism, social abuse, and social media bullying are still reported.
So I thought: why not build a real-time analytics pipeline where we can see how a community or an individual is treated on social media and the internet? Using the resulting data and analytics, we can take effective measures against such actions and make the world a better place.
What it does
The concept is simple: a pipeline that takes data from different sources, such as the Twitter and news APIs, sends it via Kafka brokers to an Elasticsearch index, which Kibana finally uses for visual representation. This data can then be used for analytics and for studying human behaviour on social media and the internet.
The Expert.ai NL APIs play a major role in classifying and analyzing the content. The Sentiment Analysis and Hate Speech Detection APIs run on the data made available via Kafka, and the results are stored in the Elasticsearch indexes.
Kibana visualizes the data from Elasticsearch, which the Data Science team can use for further analytics and to extract valuable insights.
How I built it
- The main components are containerized using Docker. Thus, a single docker-compose command fires up the entire pipeline.
- Containers include "Apache Kafka", "Zookeeper", "Elasticsearch", and "Kibana".
- The producer and consumer services are built using "FastAPI".
- The Machine Learning section is handled by the Expert.ai APIs, which include Sentiment Analysis and Hate Speech Detection. Expert.ai makes building the ML pipeline easy and hassle-free.
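The compose file itself is not reproduced here, but a minimal sketch of the container setup might look like the following. Image names, versions, and ports are assumptions for illustration, not the project's actual configuration:

```yaml
version: "3"
services:
  zookeeper:
    image: bitnami/zookeeper:latest        # assumed image
    ports: ["2181:2181"]
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: bitnami/kafka:latest            # assumed image
    ports: ["9092:9092"]
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
    depends_on: [zookeeper]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    ports: ["9200:9200"]
    environment:
      - discovery.type=single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.0
    ports: ["5601:5601"]
    depends_on: [elasticsearch]
```

With a file along these lines, a single `docker-compose up -d` brings up the whole stack.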
Structure
```
Sender
| - producer.py
| - main.py
Receiver
| - consumer.py
| - model.py
| - elk.py
| - main.py
```
Sender: Its main job is to send the data (messages) via Apache Kafka. The producer.py module handles Apache Kafka using the kafka-python package.
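As a rough illustration (the function name and topic are assumptions, not the project's actual code), producer.py's job could be sketched like this, with a kafka-python KafkaProducer passed in and each message serialized to JSON:

```python
import json

def publish(producer, topic, message):
    """Serialize a message dict to JSON and send it on a Kafka topic.

    `producer` is expected to expose kafka-python's `send(topic, value=...)`
    interface, e.g. KafkaProducer(bootstrap_servers="localhost:9092").
    """
    payload = json.dumps(message).encode("utf-8")
    producer.send(topic, value=payload)
    return payload
```

In the real service, `producer.flush()` would also be called before shutdown so that buffered messages are actually delivered.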
Receiver: This is where the majority of work takes place.
- The consumer.py module handles the messages sent via Kafka.
- The model.py module takes care of the Expert.ai NL APIs.
- The elk.py module deals with Elasticsearch using the elasticsearch package.
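Putting those pieces together, the receiver loop might be sketched as follows. The function name and document fields are assumptions, and the `es.index(index=..., document=...)` call assumes the elasticsearch-py 8.x client signature:

```python
import json

def process_messages(messages, model, es, index):
    """Classify each Kafka message and index the result in Elasticsearch.

    `messages` yields raw JSON bytes (as consumed from a topic), `model`
    exposes the hate_speech/sentiment_analysis methods from model.py, and
    `es` is an elasticsearch client exposing index(index=..., document=...).
    """
    for raw in messages:
        doc = json.loads(raw)
        doc["hate_speech"] = model.hate_speech(doc["text"])
        doc["sentiment"] = model.sentiment_analysis(doc["text"])
        es.index(index=index, document=doc)
```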
Expert.ai NL API
The Expert.ai API code is inside model.py. A Model class has been defined to handle all NL tasks.
```python
class Model:
    def __init__(self):
        pass

    def hate_speech(self, text):
        # hate speech code
        pass

    def sentiment_analysis(self, text):
        # sentiment analysis code
        pass
```
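The stubs might be filled in along the following lines. This is only a sketch: the resource and detector names follow the expert.ai Python SDK (expertai-nlapi) documentation as I understand it, the "hate-speech" detector name in particular is an assumption, and the client is injected so it can be swapped for a stub in tests:

```python
class Model:
    """Wraps the expert.ai NL API calls used by the pipeline.

    A client object can be injected for testing; by default the real
    ExpertAiClient from the expertai-nlapi package is created (it reads
    EAI_USERNAME / EAI_PASSWORD from the environment).
    """

    def __init__(self, client=None):
        if client is None:
            from expertai.nlapi.cloud.client import ExpertAiClient
            client = ExpertAiClient()
        self.client = client

    def hate_speech(self, text):
        # "hate-speech" as the detector name is an assumption.
        return self.client.detection(
            body={"document": {"text": text}},
            params={"detector": "hate-speech", "language": "en"},
        )

    def sentiment_analysis(self, text):
        output = self.client.specific_resource_analysis(
            body={"document": {"text": text}},
            params={"language": "en", "resource": "sentiment"},
        )
        # The SDK exposes the overall sentiment score on the response object.
        return output.sentiment.overall
```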
The Hate Speech detector aims at detecting and classifying instances of direct hate speech delivered through private messages, comments, social media posts and other short texts.
The detector can identify three main categories of hate speech based on purpose:
- Personal insult
- Discrimination and harassment
- Threat and violence
Sentiment analysis is a type of document analysis that determines how positive or negative the text's tone is.
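The sentiment resource returns a signed overall score, where negative values mean negative tone. A small helper (my own convention, not part of the Expert.ai API) can bucket that score into coarse labels for the dashboard:

```python
def sentiment_label(score, threshold=1.0):
    """Map a signed sentiment score to a coarse label for dashboarding.

    The threshold is an arbitrary choice; scores above it count as
    positive, below its negation as negative, otherwise neutral.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```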
Tools used
- Twitter API v2 (via Tweepy)
- NewsAPI
- Expert.ai
- Apache Kafka
- Elasticsearch
- Kibana
- Docker
What I learned
Before this project, I had never worked with real-time systems; I was only involved with handling ML models. But thanks to Expert.ai's hackathon, I have learned a lot about data ingestion pipelines and the fundamentals of Data Engineering. This is also the first time I got the chance to explore Apache Kafka and Elasticsearch. After building this project, I at least have an idea of why and how to deal with a data pipeline.
What's next for Sense Media
I want to make this a standalone tool that can be integrated with any project, irrespective of the tech stack. I want to make it modular so that anyone can set up a real-time pipeline in no time.
Built With
- docker
- elasticsearch
- expertai
- fastapi
- kafka
- kibana
- python
