CATALYST

We created Catalyst to make life easier for engineers, salespeople, HR staff and others in a commercial environment. Catalyst continuously gathers an organisation's internal knowledge and makes it accessible to everyone. This saves time, broadens what each employee knows and reduces stress on the humans in the loop. Ultimately, it catalyzes business processes across the organisation.

Inspiration

We saw that employees of large businesses often spend a lot of time hunting for information, sitting in meetings and struggling with problems that someone else in the company has already solved. We also recognized that internal wikis and recordings of online meetings are a valuable resource but often remain underutilized because of their sheer volume and lack of organization. We therefore wanted to build a tool that gives quick, easy access to this information so that employees can learn from their colleagues' experience.

What it does

Catalyst is a platform made up of four key components:

  1. Data Collection: Catalyst gathers and stores internal and external company data from various sources and supplements it with content from internal meetings.

  2. Data Pre-processing: After cleaning the data, the system extracts the relevant details and organizes them into discrete facts. These facts are stored in a database that users can search with text queries.

  3. Chat Bot: An AI model acts as an intermediary between the user and the data. It interprets the user's questions and formulates answers based on information retrieved from the database.

  4. User Interface, Trustworthiness, and Explainability: The user interface is the front-end website through which users interact with the system. To increase transparency, it shows how the chat bot arrived at each answer, including which database entries were used and their confidence scores.
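A minimal sketch of the pre-processing step that turns raw text into discrete facts, assuming the language model is asked to return one fact per bulleted line. The prompt wording, the `Fact` record and the helper names `build_fact_prompt` and `parse_facts` are hypothetical, not Catalyst's code, and the actual ChatGPT call is left out so the parsing logic can run on its own. Keeping the source next to each fact is one way to support the explainability view described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str    # one self-contained statement
    source: str  # where it came from, e.g. a transcript file name (hypothetical)


# Hypothetical prompt wording; the prompt Catalyst actually used is not documented.
FACT_PROMPT = (
    "Clean up the following text, then list every standalone fact it contains, "
    "one per line, each line starting with '- '.\n\n{text}"
)

# Matches "- fact", "* fact", "\u2022 fact", "1. fact" or "1) fact" and captures the fact text.
_BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+(.*\S)\s*$")


def build_fact_prompt(text: str) -> str:
    """Insert the raw document or meeting transcript into the prompt template."""
    return FACT_PROMPT.format(text=text.strip())


def parse_facts(model_output: str, source: str) -> list[Fact]:
    """Turn a bulleted or numbered model reply into Fact records, skipping duplicates."""
    facts: list[Fact] = []
    seen: set[str] = set()
    for line in model_output.splitlines():
        match = _BULLET.match(line)
        if match is None:
            continue  # ignore preambles such as "Here are the facts:"
        text = match.group(1)
        if text.lower() not in seen:  # case-insensitive de-duplication
            seen.add(text.lower())
            facts.append(Fact(text=text, source=source))
    return facts
```

In a full pipeline, the reply returned by the OpenAI API for `build_fact_prompt(transcript)` would be passed to `parse_facts`; that wiring is omitted here because it depends on the client library version.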

How we built it

Here's an overview of the key technologies and components used in Catalyst:

System Design

  • Data Collection: We implemented a Python bot using Selenium that keeps track of scheduled meetings on Google Calendar and automatically joins them on Google Meet. Meeting participants simply need to allow the bot to join. The bot records the meeting, transcribes it and stores the result in Google Drive. We complement this information with Sika-specific external documents that we obtained from Sika USA's website.

  • Data Pre-processing: We clean all textual data, summarize it and split it into individual facts using OpenAI's ChatGPT API. We tried storing the facts on their own as well as in a question-and-answer format (hoping the latter would retain more context), but saw no qualitative difference.

  • Database & Chat Bot: Inspired by works such as DB-GPT, we also encode all data in a vector database (FAISS) to make it searchable via chat. We used the LangChain framework to instantiate a context-aware ChatGPT model, which queries the database with the user's question and uses the retrieved information to reason about the answer.

  • User Interface: We created a web app with a Flask backend and a React frontend. It offers a chat interface for our conversational AI, an option to upload PDF files to the database and a choice of profiles that determine which security level of data the user can access.

  • Trustworthiness & Explainability: When querying the vector database, we use the distance between the query and each result to compute a confidence score. For every chat bot answer, the UI visualises the confidence score of each item retrieved from the database.
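The retrieval and confidence steps described above can be sketched with the standard library alone. This is not Catalyst's code: the hashed bag-of-words `embed` function is a toy stand-in for a real embedding model, `FlatIndex` mimics a brute-force FAISS index by computing the squared L2 distance to every stored vector, and the mapping in `confidence` is a hypothetical choice, since the exact formula Catalyst uses is not documented. Because the vectors are scaled to unit length, the squared distance d satisfies d = 2 - 2cos, so 1 - d/2 recovers the cosine similarity, clipped here to [0, 1].

```python
import math
import re
import zlib

DIM = 256  # toy embedding size (assumption); real embedding models use far more dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0  # deterministic bucket per word
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def confidence(sq_dist: float) -> float:
    """Hypothetical score: for unit vectors, 1 - d/2 equals cosine similarity; clip to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - sq_dist / 2.0))


class FlatIndex:
    """Brute-force nearest-neighbour search over stored facts (similar in spirit to a flat FAISS index)."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float, float]]:
        """Return (fact, squared distance, confidence) for the k closest facts, nearest first."""
        q = embed(query)
        ranked = sorted((squared_l2(q, v), t) for t, v in zip(self.texts, self.vectors))
        return [(t, d, confidence(d)) for d, t in ranked[:k]]
```

With a real setup the shape stays the same: `embed` becomes a call to an embedding model and the linear scan is replaced by a vector index once the number of facts grows. The per-item confidence values returned by `search` are what a UI could display next to each retrieved entry.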

Challenges we ran into

On the model side, the vector database proved challenging. Although vector databases have been used successfully on large datasets, in our case the retrieved text embeddings often did not match the user's question well. We tried embedding the knowledge in different ways (as facts, as Q&A, ...), but this did little to improve retrieval. In the end, the model can still answer questions about Sika, but only because ChatGPT's training data already covers the company.
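The write-up does not say how retrieval quality was judged when comparing encodings. One way to make such comparisons measurable is a small set of questions, each labelled with the fact that should answer it, scored by the hit rate at k. The sketch below is hypothetical: the `Retriever` signature, the fact ids and the stub retrievers are assumptions, and a real retriever (for example one built on the vector database) would have to be wrapped to match that signature.

```python
from typing import Callable, Sequence

# Hypothetical interface: given a question and k, return the ids of the k retrieved facts.
Retriever = Callable[[str, int], Sequence[str]]


def hit_rate_at_k(retrieve: Retriever, labelled: Sequence[tuple[str, str]], k: int = 3) -> float:
    """Fraction of (question, expected fact id) pairs whose fact appears in the top-k results."""
    if not labelled:
        raise ValueError("need at least one labelled question")
    hits = sum(1 for question, expected in labelled if expected in retrieve(question, k))
    return hits / len(labelled)


def compare_encodings(
    retrievers: dict[str, Retriever], labelled: Sequence[tuple[str, str]], k: int = 3
) -> dict[str, float]:
    """Score several encodings (e.g. plain facts vs. Q&A pairs) on the same question set."""
    return {name: hit_rate_at_k(r, labelled, k) for name, r in retrievers.items()}


# Usage with stub retrievers standing in for two differently encoded indexes.
labelled = [("Who owns the pump test?", "fact-1"), ("Where are results stored?", "fact-2")]
answers = {
    "Who owns the pump test?": ["fact-1", "fact-7"],
    "Where are results stored?": ["fact-9", "fact-2"],
}


def as_facts(question: str, k: int) -> list[str]:
    return answers[question][:k]


def as_qa(question: str, k: int) -> list[str]:
    return ["fact-1"][:k]


print(compare_encodings({"facts": as_facts, "qa": as_qa}, labelled, k=2))
# prints {'facts': 1.0, 'qa': 0.5}
```

Because every encoding is scored on the same labelled questions, a change such as switching from plain facts to Q&A pairs shows up as a concrete difference in hit rate rather than an impression from a few chat sessions.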

Accomplishments that we're proud of

We are proud of the following achievements:

  • Building a fully functional web app in less than 40 hours while experimenting with context-aware LLMs.
  • Integrating data from multiple sources, which required building a bot for Google Meet.
  • Developing a chat assistant that provides instant answers to common issues.
  • Being part of HackZurich 2023.

What we learned

During HackZurich and while building Catalyst, we learned the importance of:

  • Planning the hackathon project backwards from the demo: first determine what the user needs, then consider what you want to and can show, and finally derive the features that must be implemented.
  • Working with LLMs and vector databases is easy thanks to frameworks like LangChain, but the out-of-the-box performance we got was underwhelming, and we would like to explore this further.
  • What to pack for the next hackathon :).

What's next for Catalyst

In the future, we plan to:

  1. Improve how data is encoded in the vector database. Other projects show that retrieval can work better, so we expect that further reading and experimentation will close the gap.
  2. Expand data sources and integrations to enhance the richness of information available.
  3. Extend the chat assistant's capabilities to handle a wider range of queries and tasks, such as multi-modal inputs (text, voice, etc.).

We believe that Catalyst has the potential to revolutionize the way people work and learn.

Challenge: Sika (New Approaches to Sika Knowledge Management)

Link to Deployed Website: http://34.65.18.22:3000
