Inspiration

Google is a huge company built on traditional information retrieval technology: it returns a list of webpages, and you often still have to hunt through them for the information you’re looking for. In the future, I believe information retrieval will be more conversational: the search engine will understand your questions better than it does now and reply directly with an answer.

A few months ago, at a hack night at the HF0 house in Taiwan, I met someone who told me that ColBERT is (was?) the state-of-the-art language model for these sorts of applications, so I wanted to build a project to try it out and see how well it works.

What it does

Answers questions about Miami Hack Week based on a corpus of messages from our Slack workspace.

How we built it

  1. Built a Slack scraper in Python to dump messages from the Miami Hack Week Slack workspace into a SQLite database
  2. Downloaded the ColBERT code from https://github.com/stanford-futuredata/ColBERT
  3. Downloaded a pre-trained ColBERT model from https://huggingface.co/vespa-engine/colbert-medium/tree/main
  4. Wrote some glue code to index the Slack messages and a frontend to do retrieval
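
As a rough illustration of steps 1 and 4, here is a minimal sketch of storing scraped messages in SQLite and reading them back as a list of passages to hand to an indexer. It is not the project's actual code: the `messages` table, its columns, and the `save_messages` and `load_passages` helpers are all hypothetical names chosen for the example, and the message dicts are assumed to carry `channel`, `ts`, `user`, and `text` keys.

```python
import sqlite3


def save_messages(conn, messages):
    """Store message dicts in a hypothetical `messages` table, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "channel TEXT NOT NULL, ts TEXT NOT NULL, user TEXT, text TEXT, "
        "PRIMARY KEY (channel, ts))"
    )
    # `with conn` commits on success and rolls back if an insert raises.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO messages (channel, ts, user, text) "
            "VALUES (:channel, :ts, :user, :text)",
            messages,
        )


def load_passages(conn):
    """Return the non-empty message texts in a stable order, one passage per message."""
    rows = conn.execute(
        "SELECT text FROM messages "
        "WHERE text IS NOT NULL AND text != '' "
        "ORDER BY channel, ts"
    )
    return [row[0] for row in rows]
```

Keying on `(channel, ts)` means the scraper can be re-run without inserting the same message twice, and `load_passages` gives the indexing step a plain list of strings, independent of how the messages were fetched.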

Challenges we ran into

  • The pretrained model was not quite in the format that the ColBERT code was expecting, and it took quite a bit of messing around to get it working.
  • My MacBook Pro can’t run CUDA, so I had to comment out a bunch of CUDA code to get the model running on my CPU.
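
For the CUDA issue, one common alternative to commenting code out is to pick the device at runtime. The sketch below assumes a PyTorch-based codebase; `pick_device` is a hypothetical helper, not something from the ColBERT repository, and it is not the change that was actually made in this project.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: the model code is built on PyTorch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# With PyTorch installed, models and tensors can then be moved with `.to(device)`
# in place of hard-coded `.cuda()` calls.
```

On a machine without a CUDA-capable GPU, this returns `"cpu"`, so the same script can run on a laptop and on a GPU VM without edits.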

Accomplishments that we're proud of

I think the result quality is reasonably good, and would be even better with a larger corpus of messages.

What we learned

  • Maybe I would’ve been better off working in a Colab or a VM or something that can run CUDA
  • Using pretrained models isn’t always as straightforward as one might hope
  • If I’d partied less, I maybe could’ve built a prettier frontend

What's next for Miami Hack Week 2021 Q&A bot

A couple of possible directions, if people think this shows promise:

  • Finish the Slack app and embed it into the Slack UI
  • Build a version for Notion documents
  • (long shot) Try to make the language model more conversational, like GPT-3, and build a search engine
