News Analytics and Classifier - Bloomberg Industry Group

The app frontend

Task 1: Embedding preimage predictions:

Opening of a space park in London with a lot of music and fanfare.
A timeline of the Covid pandemic recession in North America and Europe
Deaths resulting from the Covid pandemic and implications for vaccination campaigns
Description of the US withdrawal from Afghanistan and conflict in the Middle East.
Effects of global warming contributing to Hurricane Ian and devastation caused in Florida

Task 2: News article classifier

I used a Kaggle dataset [1] about news articles of several different categories to train a classifier. From the raw data, the preprocessing step was really simple because of the embedding API provided to us. For each news article, there was a short description field, which I directly fed into the embedding to obtain a numerical representation of the data. For simplicity, I only selected 7 of the several categories available in this dataset due to the time limitations of the API. Then, I converted these categories into numerical values and fed them along with their respective embeddings into a random forest classifier. After training, I adjusted the maximum depth to be 7, which provided the best testing accuracy.

To make a convenient frontend, I deployed the random forest classifier model to the Streamlit cloud

References

[1] https://www.kaggle.com/datasets/rmisra/news-category-dataset?resource=download

Built With

Updates

Rishi Phatak started this project — Oct 09, 2022 11:22 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.