Inspiration

In an era dominated by social media, accounts with large followings can easily sway public opinion. Our investigation is a preliminary analysis of the digital presence of Chinese influencers across platforms such as Twitter, TikTok, Facebook, YouTube, and others.

What it does

Our project analyzes what kinds of messages Chinese influencers are spreading and where their regional focus lies.

How we built it

We built our visualizations with Matplotlib's pyplot, GeoPandas, and WordCloud. The analysis was done with scikit-learn and NLTK.
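To illustrate the counting step that sits behind a word cloud, here is a minimal sketch using only the Python standard library. It is not the project's code: the `term_frequencies` helper, the tiny `STOPWORDS` set, and the sample texts are all invented for illustration, and a real run would more likely pass cleaned text or a frequency table to the WordCloud library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a
# fuller list from an NLP library.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def term_frequencies(texts):
    """Count lowercase word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Invented placeholder texts, not taken from the dataset.
sample_texts = [
    "Sharing the beauty of travel in China",
    "News and culture for a global audience",
    "Travel, food and culture",
]

freqs = term_frequencies(sample_texts)
print(freqs.most_common(3))
```

The resulting word-to-count mapping is the kind of input a word cloud renderer scales font sizes by.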

Challenges we ran into

The dataset offered very little data that we could use for our analysis. We decided to try to obtain tweets for each user, but due to the changes to the Twitter API in April 2023 we would have had to pay astronomical amounts of money to use it.

Accomplishments that we're proud of

In the beginning, we did not think we could do much given the limitations of the dataset. We were still able to apply machine learning techniques such as clustering, NLP techniques such as sentiment analysis, and data visualizations such as word clouds and heatmaps.
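For readers new to sentiment analysis, the sketch below shows the basic idea of a lexicon-based scorer in plain Python. It is a toy stand-in, not the method used in this project: the `LEXICON` weights, the `threshold` value, and both function names are assumptions made up for this example, and established tools use far larger lexicons and extra rules.

```python
import re

# Hypothetical toy lexicon mapping words to polarity weights.
LEXICON = {
    "great": 1.0, "love": 1.0, "beautiful": 1.0,
    "bad": -1.0, "hate": -1.0, "terrible": -1.0,
}

def sentiment_score(text):
    """Average polarity of lexicon words found in the text; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def sentiment_label(text, threshold=0.25):
    """Map the score to a coarse label; the threshold is an arbitrary choice."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# Invented example sentences.
for sentence in ["I love this beautiful city", "A report on trade"]:
    print(sentence, "->", sentiment_label(sentence))
```

Averaging rather than summing keeps long and short texts on the same scale, which matters when the available text per account varies a lot.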

What we learned

Coming into this hackathon, our background in data analysis centered on SQL and theoretical learning. Working with Natural Language Processing (NLP) was a big leap; we were able to grasp the concepts starting from scratch: understanding how to collect data, preprocess the text, evaluate models, and make predictions.

Moreover, the hackathon shed light on the real challenges of gathering ample and meaningful data for NLP projects. We realized firsthand how tough it can be to source diverse and substantial text data. It became clear that the quantity and quality of data play a huge role in the success of any sentiment analysis or NLP endeavor.

What's next for Analyzing China's Media Influence

Given more data, such as the content of the users' posts, we could label the users using sentiment analysis and then train a classifier on those labels. We could also use the outputs of further NLP analyses to augment our feature vectors, look for other patterns, and create other data visualizations such as network graphs between users.
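As a rough picture of the label-then-classify idea, the sketch below trains a small multinomial Naive Bayes text classifier in plain Python. Everything in it is an assumption for illustration: the `NaiveBayes` class is hand-written rather than taken from scikit-learn, the choice of Naive Bayes is ours, and the training texts and labels are invented placeholders standing in for posts that would first be labeled by sentiment analysis.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            logp = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                smoothed = self.word_counts[label][t] + 1
                logp += math.log(smoothed / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Invented placeholder posts and labels, not taken from the dataset.
train_texts = [
    "love this beautiful place",
    "great food great people",
    "terrible service",
    "hate the bad weather",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("beautiful food"))
```

With real post content, the same two-step flow would apply: derive labels first, then fit a classifier on features extracted from the text.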
