The Transformer architecture has changed the way we work with text data in NLP. It is behind many recent NLP developments, including Google's BERT, OpenAI's GPT and Facebook's BART. After learning how the Transformer works and how it relates to language modeling, I realised that we are under-utilizing its capabilities, and so I developed "Transformerly".
What it does
The key modules of this project are:
- Zero-shot classification of LIVE Twitter feeds - Zero-shot classification works very well on general-topic text such as everyday tweets. The web app lets the user enter the number of tweets to fetch, a keyword, and the labels to classify against. Running on Colab's GPU, the web app can fetch (using snscrape) and classify (using BART) live tweets within seconds (2500 tweets in 44 seconds). The user is free to supply any number and type of labels (the beauty of zero-shot). The user can also set a start date in the format YYYY-mm-dd and a language code (100 languages are supported). A real-time graph built with Chart.js visualizes the results.
- DATASET LABELLER - For researchers who need a labelled dataset but only have a pile of unclassified text, Transformerly provides an easy-to-use solution. The user uploads an Excel/CSV file to the web app and enters the labels to classify against, and within a few seconds a new labelled Excel file is available for download.
- CONTENT GENERATOR - This module is designed for creative writers who have run out of ideas and are open to AI help in generating content such as: (a) poems, (b) Q/A jokes, (c) stories, (d) play scripts and (e) conversational dialogue. The user can adjust the length and temperature of the generated text.
- REDDIT CONTENT ANALYSER - A live Reddit analysis tool classifies Reddit comments, replies and other content for a given topic against labels chosen by the user (similar to the Twitter analyser).
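The classification step shared by the Twitter, Reddit and dataset-labelling modules can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`load_classifier`, `classify_texts`, `count_labels`) are hypothetical, and the `facebook/bart-large-mnli` checkpoint is an assumption, since the write-up only says that BART is used. The per-label counts are the kind of data a chart could be drawn from.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def load_classifier(model_name: str = "facebook/bart-large-mnli"):
    """Build a Hugging Face zero-shot pipeline (the checkpoint is an assumption)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_name)


def classify_texts(
    texts: Sequence[str], labels: Sequence[str], classifier: Callable
) -> List[Dict]:
    """Attach the highest-scoring user-defined label to each text."""
    results = []
    for text in texts:
        # The pipeline returns a dict with parallel "labels" and "scores" lists.
        out = classifier(text, candidate_labels=list(labels))
        best_label, best_score = max(
            zip(out["labels"], out["scores"]), key=lambda pair: pair[1]
        )
        results.append({"text": text, "label": best_label, "score": best_score})
    return results


def count_labels(results: Sequence[Dict], labels: Sequence[str]) -> Dict[str, int]:
    """Count how many texts fell under each label (zero for unused labels)."""
    counts = Counter(r["label"] for r in results)
    return {label: counts.get(label, 0) for label in labels}
```

With the real model, a call such as `classify_texts(tweets, ["positive", "negative"], load_classifier())` would label each fetched tweet; passing the classifier in as an argument keeps the helpers independent of where the text comes from.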
How I built it
ZSC model: The zero-shot classification (ZSC) model uses Google Colab's GPU runtime and the Transformers library to classify content fetched from Twitter, Reddit or the user's own input against user-defined labels.
GPT-2 model: OpenAI's GPT-2 model is fine-tuned on different types of datasets to generate content live.
Flask app: The Flask web app ties everything together into the user-facing web application.
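As a rough sketch of the generation step, the snippet below wraps a Hugging Face text-generation pipeline with the two controls the content generator exposes, length and temperature. The function names are hypothetical, and the base `gpt2` checkpoint is only a stand-in assumption: the fine-tuned checkpoints used in the project are not named in this write-up.

```python
from typing import Callable


def load_generator(model_name: str = "gpt2"):
    """Build a text-generation pipeline (base "gpt2" is a placeholder assumption)."""
    # Imported lazily so generate_text can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)


def generate_text(
    prompt: str,
    generator: Callable,
    max_length: int = 100,
    temperature: float = 0.8,
) -> str:
    """Generate one continuation of `prompt` with user-chosen length and temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    outputs = generator(
        prompt,
        max_length=max_length,    # total length in tokens, prompt included
        temperature=temperature,  # higher values give more varied text
        do_sample=True,           # sampling is needed for temperature to have an effect
        num_return_sequences=1,
    )
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"]
```

A Flask route could then call something like `generate_text(prompt, load_generator(), max_length=..., temperature=...)` with values taken from the user's form input, loading the model once at start-up rather than per request.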
Challenges I ran into
There were multiple challenges and risks involved in implementing Transformerly. The following is an analysis of them:
Accomplishments I'm proud of
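Transformerly is an all-in-one application that brings live social media classification, dataset labelling and content generation together in one place, and it could change the way we analyse social media.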
Who is this for
- It can help the sales department of a product company analyze brand feedback on Twitter/Reddit: enter the brand name as the keyword and labels such as "positive,negative" (or literally anything) to analyse user opinions.
- It can help a state police department by extracting the past n tweets from a location and classifying them as "emergency crime" or "peace".
- It can help ML enthusiasts who lack labelled datasets by providing a one-click solution (the Dataset Labeller tool).
- It can help content writers shape their ideas with the content generator feature.
The possibilities for using Transformerly are endless! It is compatible with 100 different languages.