We were first inspired to build a chatbot that could hold realistic conversations. Among the many applications of this technology, we all recognized the usefulness of a tool that could extract opinions and answer questions about topics discussed on podcasts. So we created PodBot, a generative chatbot that is trained on podcast transcripts and offers realistic conversation on a range of interesting topics.
What it does
Our chatbot answers questions about topics discussed on the Tim Ferriss Show. These topics include entrepreneurship, cryptocurrencies, creating healthier lifestyles, and more. Our chatbot is able to hold a realistic conversation with users and provide insightful opinions.
How we built it
We used transfer learning on the pre-trained Blenderbot 90M model, a chatbot that Facebook created and trained on the pushshift.io (Reddit comments) dataset. The pre-training of Blenderbot on Reddit comments is what allows the chatbot to generate realistic, contextually accurate, and grammatically correct responses.
We scraped 150 transcripts from the Tim Ferriss Show to create a custom dataset, which we used for transfer learning on Blenderbot. One reason we chose the smallest Blenderbot variant was that we wanted the transcripts to have the largest possible relative influence on the model's parameters. Time and resource constraints also made the smallest variant the most logical option.
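To illustrate the dataset step, here is a minimal sketch of how scraped transcript turns could be converted into training examples. It assumes the transcripts have already been parsed into (speaker, utterance) pairs and that the target is ParlAI's tab-separated dialog text format (`text:`, `labels:`, `episode_done:True`). The function names `merge_turns` and `to_parlai_lines` are hypothetical, and this is not necessarily the exact pipeline we used.

```python
def merge_turns(turns):
    """Join consecutive utterances by the same speaker into one turn.

    `turns` is a list of (speaker, utterance) tuples in transcript order.
    Whitespace is collapsed because tabs and newlines act as separators
    in the assumed output format.
    """
    merged = []
    for speaker, utterance in turns:
        text = " ".join(utterance.split())
        if not text:
            continue  # skip empty utterances
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


def to_parlai_lines(turns):
    """Pair each prompt turn with the reply that follows it.

    Returns one line per example; the last line of the episode is marked
    with episode_done:True. A trailing turn with no reply is dropped.
    """
    merged = merge_turns(turns)
    lines = []
    # Step by two so every turn is used in at most one example.
    for i in range(0, len(merged) - 1, 2):
        prompt, reply = merged[i][1], merged[i + 1][1]
        lines.append(f"text:{prompt}\tlabels:{reply}")
    if lines:
        lines[-1] += "\tepisode_done:True"
    return lines
```

Each transcript would become one episode, and the resulting lines could be written to a text file for fine-tuning. Pairing turns this way treats the first speaker's lines as prompts and the second speaker's lines as the responses the model should learn to produce.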
Finally, we deployed our model to Google Cloud using WebSockets and Docker.
Challenges we ran into
We were limited by the availability of datasets with accurate transcripts. We spent a significant amount of time on data preprocessing. The ParlAI documentation was quite complicated, which made both transfer learning and deploying the model challenging.
Accomplishments that we're proud of
We are happy that our chatbot can hold realistic conversations about interesting podcast topics. We are proud that we were able to learn how to use ParlAI and apply it to a chatbot in such a short time frame. We are also proud that the bot is available as an online service.
What we learned
We all learned a lot about cutting-edge NLP techniques. We became familiar with Facebook's ParlAI framework and HuggingFace. We also grew more confident using cloud services and Docker.
What's next for PodBot
The next steps for PodBot would be to expand our dataset and train our model on more podcast transcripts.