Miami Hack Week 2021 Q&A bot

“Where can I find free beer?”
“Which house throws the best parties?”
“What are the best technologies to hack on?”

Inspiration

Google is a huge company built on traditional information retrieval technology: it returns a list of webpages, and you often still have to hunt through them for the information you’re looking for. In the future, I believe information retrieval will be more conversational: the search engine will understand your questions better than it does now and reply directly with an answer.

A few months ago, at a hack night at the HF0 house in Taiwan, I met someone who told me that ColBERT is (was?) the state-of-the-art language model for these sorts of applications, so I wanted to build a project to try it out and see how well it works.

What it does

Answers questions about Miami Hack Week based on a corpus of messages from our Slack workspace.
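
For anyone curious how the ranking works under the hood: ColBERT scores a message against a question by "late interaction", where each question token is matched to its most similar message token and those best matches are summed. Below is a toy sketch of that scoring rule only. The function names and the tiny hand-written vectors are made up for illustration; in the real model the vectors are normalized BERT-based token embeddings, and this is not the code from the ColBERT repo.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def rank(query_vecs, docs):
    """docs maps a (hypothetical) message id to its token vectors.

    Returns the message ids ordered from best to worst match.
    """
    return sorted(
        docs,
        key=lambda doc_id: maxsim_score(query_vecs, docs[doc_id]),
        reverse=True,
    )
```

A message that covers every question token well beats one that only matches some of them, which is roughly why this approach works for short Slack messages.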

How we built it

Built a Slack scraper in Python to dump messages from the Miami Hack Week Slack workspace into a SQLite database
Downloaded the ColBERT code from https://github.com/stanford-futuredata/ColBERT
Downloaded a pre-trained ColBERT model from https://huggingface.co/vespa-engine/colbert-medium/tree/main
Wrote some glue code to index the Slack messages and a frontend to do retrieval
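
The storage step above can be sketched roughly like this. It is a minimal sketch, not the actual scraper: the table layout and the function names `save_messages` and `load_corpus` are assumptions, and the message dicts are assumed to carry the `ts`, `user`, and `text` fields that Slack messages normally have. Fetching from the Slack API is left out entirely.

```python
import sqlite3

# Hypothetical schema: the real scraper's table layout is not shown in this write-up.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    channel TEXT NOT NULL,
    ts      TEXT NOT NULL,
    user    TEXT,
    text    TEXT NOT NULL,
    PRIMARY KEY (channel, ts)
)
"""


def save_messages(conn, channel, messages):
    """Store Slack-style message dicts; returns how many non-empty ones were submitted."""
    conn.execute(SCHEMA)
    rows = [
        (channel, m["ts"], m.get("user"), m["text"])
        for m in messages
        if m.get("text")  # skip messages with no text
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            # INSERT OR IGNORE makes re-running the scraper safe: duplicates are skipped
            "INSERT OR IGNORE INTO messages (channel, ts, user, text) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def load_corpus(conn):
    """Return message texts in a stable order, ready to hand to an indexer."""
    cur = conn.execute("SELECT text FROM messages ORDER BY channel, ts")
    return [row[0] for row in cur]
```

Keying on channel plus timestamp means the scraper can be re-run without piling up duplicate rows, and `load_corpus` gives the indexing glue code a plain list of strings to work with.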

Challenges we ran into

The pretrained model was not quite in the format that the ColBERT code was expecting, and it took quite a bit of messing around to get it working.
My MacBook Pro can’t run CUDA, so I had to comment out a bunch of CUDA code to get the model running on my CPU.
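
In hindsight, a gentler alternative to commenting out CUDA calls is to pick the device once at startup and use it everywhere. The sketch below is an assumption about how that could look, not what I actually did and not how the ColBERT repo is written; `pick_device` is a made-up helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed at all: nothing to run on a GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


DEVICE = pick_device()
# Model code would then use model.to(DEVICE) and tensor.to(DEVICE)
# in place of hard-coded .cuda() calls.
```

On a MacBook without CUDA this resolves to "cpu", so the same script could run unmodified on a laptop or on a GPU machine.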

Accomplishments that we're proud of

I think the result quality is reasonably good, and would be even better with a larger corpus of messages.

What we learned

Maybe I would’ve been better off working in a Colab or a VM or something that can run CUDA
Using pretrained models isn’t always as straightforward as one might hope
If I’d partied less, I maybe could’ve built a prettier frontend

What's next for Miami Hack Week 2021 Q&A bot

A couple of possible directions if people think this shows promise:

Finish the Slack app and embed it into the Slack UI
Build a version for Notion documents
(long shot) Try to make the language model more conversational, like GPT-3, and build a search engine

Updates

Ryan Landay started this project — Aug 06, 2021 01:39 PM EDT
