NewsBERT
If you want to stay up to date on technical discussions, you probably browse several sources of information: Reddit, Twitter, Medium, and various programming blogs.
Inspiration
Over the past two years, transformer models have driven rapid progress in NLP. One remarkable property of these pretrained language models is that they can be applied to tasks such as zero-shot learning.
Zero-shot learning for text mining is essentially unsupervised classification where the classes themselves are specified as text.
What it does
We tackle the problem of organizing information from different social media feeds into a single wall that can be sorted by topic.
The app pulls articles from RSS feeds and lets the user filter them by topic classes.
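As a rough sketch of the ingestion step (the feed URLs and field names below are illustrative, not the exact sources the app ships with), pulling articles from RSS with the feedparser library can look like this:

```python
import feedparser  # pip install feedparser

# Hypothetical feed list; the actual app loads its own set of sources.
FEEDS = [
    "https://hnrss.org/frontpage",
    "https://www.reddit.com/r/programming/.rss",
]

def fetch_articles(feed_urls):
    """Pull entries from each RSS feed and normalize them to dicts."""
    articles = []
    for url in feed_urls:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            articles.append({
                "title": entry.get("title", ""),
                "text": entry.get("summary", ""),
                "link": entry.get("link", ""),
            })
    return articles

articles = fetch_articles(FEEDS)
```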
How we built it
The app is built with Streamlit. We used pretrained models from the Hugging Face Transformers and deepset Haystack libraries to extract topic scores.
More precisely, we use Natural Language Inference (NLI) models: for each topic we construct the pair (text, "text is on {topic}") and take the model's confidence that the text entails the hypothesis "text is on {topic}". This entailment confidence is our topic match score.
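A minimal sketch of this scoring step using the Transformers zero-shot classification pipeline (the specific checkpoint and topic list are assumptions; any off-the-shelf NLI model works here):

```python
from transformers import pipeline

# Off-the-shelf NLI checkpoint; the exact model choice is an assumption.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

topics = ["machine learning", "databases", "security"]

def topic_scores(text):
    """Return {topic: entailment confidence} for one article."""
    # multi_label=True scores each hypothesis independently,
    # so one article can match several topics at once.
    result = classifier(
        text,
        candidate_labels=topics,
        hypothesis_template="text is on {}.",
        multi_label=True,
    )
    return dict(zip(result["labels"], result["scores"]))
```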
Our implementation uses deepset's Haystack library to reduce zero-shot learning to a search problem: for each topic, we retrieve the top-k documents that match the query "text is on {topic}".
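The sketch below shows the same reduction in plain Python, reusing `topic_scores` and `articles` from above; it illustrates the idea (one ranking per topic) rather than the actual Haystack pipeline code:

```python
import heapq

def rank_articles_by_topic(articles, topics, k=10):
    """For each topic, return the top-k articles by entailment score."""
    # Score every article against all topics in a single pass.
    scored = [(article, topic_scores(article["text"])) for article in articles]
    walls = {}
    for topic in topics:
        # Highest entailment confidence first; the k best matches
        # form that topic's wall.
        walls[topic] = heapq.nlargest(
            k, scored, key=lambda pair: pair[1].get(topic, 0.0)
        )
    return walls

walls = rank_articles_by_topic(articles, topics, k=5)
```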
What's next for NewsBERT
We want to research ways to obtain better topic scores, for example with approaches similar to those proposed in Pattern-Exploiting Training (PET).
We also want to check whether the classes specified by topic names correspond to clusters that can be extracted with topic modeling.