What is the problem you are solving?
Given the coherent text generation capabilities of the GPT-2 model, we aim to use it to produce sarcastic replies to a specified user comment, since that's a fun way for humans to interact with machines.
Why did you choose this problem?
We see that chatbots are usually boring and not worth talking to. However, with a tiny dose of humor, wit, and sarcasm, it can become fun to actually have a chatbot around! With this thought, we decided to create a simple yet fun Discord bot that generates a sarcastic reply to any comment addressed to it.
Briefly explain the models you experimented with to solve the problem.
Since we wanted human-like texts, training a language model from scratch was not an option. So, we chose to fine-tune the GPT-2 model available in the Hugging Face library.
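For causal-LM fine-tuning, each comment/reply pair has to be serialized into a single training string that GPT-2 can learn to continue. A minimal sketch of that preprocessing is below; the separator tokens are hypothetical placeholders, not the exact format we used.

```python
# Hypothetical preprocessing sketch: serialize a (comment, reply) pair
# into one training string for causal-LM fine-tuning. The special tokens
# here are assumptions; any consistent separator scheme works, as long
# as the same tokens are registered with the tokenizer.
BOS, SEP, EOS = "<|startoftext|>", "<|sarcasm|>", "<|endoftext|>"

def make_example(comment: str, reply: str) -> str:
    # GPT-2 is trained to continue text, so after fine-tuning the model
    # learns to emit a sarcastic reply once it sees the separator token.
    return f"{BOS}{comment}{SEP}{reply}{EOS}"

print(make_example("Nice weather today.", "Oh yes, truly groundbreaking news."))
```

At inference time, the bot would feed `BOS + comment + SEP` as the prompt and decode until `EOS`.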
What challenges did you face?
The first challenge was learning how to approach such a task. We came across a paper that used a pre-trained transformer model for text summarization, which is very similar to what we had in mind, not on a semantic level but on the representation level, so we followed that method for our implementation. This was particularly challenging since Hugging Face is a HUGE library with innumerable methods, settings, etc. It was a nice learning experience to finally get the model to start training!

Further, due to the limited amount of computational resources and the size of the model itself, we had to load data carefully so that training wouldn't crash. We used a batch size of 1 together with gradient accumulation: instead of calling the optimizer step on every sample, we call it only after a certain number of samples have been processed. This turned out to be lighter on RAM usage and worked well.
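The accumulation trick above can be sketched in plain PyTorch. This is a minimal illustration with a tiny stand-in model, not our actual training loop; the accumulation step count is an assumed value.

```python
import torch

ACCUM_STEPS = 8  # samples per effective batch (assumed value)

model = torch.nn.Linear(4, 1)  # tiny stand-in for the full GPT-2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# dummy (input, target) pairs standing in for real training samples
samples = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(16)]

optimizer_steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(samples):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # scale the loss so the accumulated gradient matches a true
    # batch of ACCUM_STEPS samples averaged together
    (loss / ACCUM_STEPS).backward()
    if (i + 1) % ACCUM_STEPS == 0:
        optimizer.step()       # one update per ACCUM_STEPS samples
        optimizer.zero_grad()  # reset gradients for the next batch
        optimizer_steps += 1

print(optimizer_steps)  # 16 samples / 8 per batch = 2 optimizer steps
```

Only one sample's activations are ever in memory at a time, which is what kept RAM usage manageable.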
Finally, since we host the bot on a private server without GPU access, inference takes a while (~10 seconds per reply). It took even longer at first, but we now load the model once at startup and keep it in memory rather than reloading it for each request.
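The load-once pattern is a simple module-level cache. The sketch below uses a placeholder loader rather than the real GPT-2 checkpoint load, which is the slow step it stands in for.

```python
_model = None  # loaded once at startup, then kept in memory

def load_model():
    # placeholder for loading the fine-tuned GPT-2 checkpoint from disk;
    # the real load takes several seconds, so it must happen only once
    return object()

def get_model():
    """Return the cached model, loading it on the first call only."""
    global _model
    if _model is None:
        _model = load_model()
    return _model
```

Every Discord message handler then calls `get_model()` and pays the load cost only on the very first reply.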
What is the result you are the most satisfied with (accuracy, other metrics, etc)?
We are quite satisfied with how the model performs. Sometimes, it generates really witty replies that are fun to read!
If you were to continue this project, what would you explore doing / improve?
We will definitely try training the model on a larger subset of the available data. Moreover, we'll also try to mitigate the biases the model has learned by carefully filtering the training data to exclude obscene or racist comments/replies. Lastly, we would like to deploy it on scalable web hosting and publish it as a public bot that can be added to other Discord servers as well.
How would you recommend approaching similar projects?
We'd definitely recommend starting by exploring the literature and previous work in areas related to the problem you are solving. Once you have a definitive strategy penned down, you can move on to explore the tools available to implement the project. This ordering matters: it's often easier to find tools than to define a roadmap. Finally, keep iterating until perfection (which doesn't exist, unfortunately)! It's also very important to keep in mind the effects and consequences of making the service available to the public.
Built With
- discord
- hugging-face
- pytorch