We got our inspiration from GPT, a mainstream LLM, and from how popular ChatGPT has become. It is incredibly useful for many things, such as having conversations, writing code, accessing resources, and answering personalized queries that are hard to Google.
Hence, we wanted to create a specialized LLM chatbot for UMass that would act as a central access point for all UMass information, such as the latest events and course prerequisites. This would be incredibly helpful for incoming students and freshmen with non-specialized queries. Speaking from personal experience, it can be very time-consuming to find the right bit of information, since it is spread across so many pages. The chatbot makes that much easier.
We scraped the data from UMass websites and were about to fine-tune a pre-trained LLM on it (training 0.12% of the weights). Fine-tuning would make the chatbot respond better to UMass-related queries. However, we were unable to finish this in time, so we are currently using a more rudimentary LLM.
We used multiple web scraping techniques to scrape thousands of websites and parsed the data. We then preprocessed the data and fed it to the LLM. In parallel, we built a web app that lets the user interact with the LLM and make queries. At the very end, we linked the output of the LLM to the interface.
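As a rough illustration of the preprocessing step, here is a minimal sketch of splitting scraped text into overlapping chunks small enough to hand to a model. The function name `chunk_text` and the `max_words` and `overlap` values are hypothetical, chosen only for this example; the actual preprocessing in the project may well have worked differently.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper: the chunk size and overlap are illustrative
    assumptions, not the values used in the project.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks.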
Scraping the web was quite challenging, especially because the websites are all structured so differently and everything has to end up in a single text format. Writing the LLM was particularly challenging because of the many libraries that had to work together, the many different data formats, and the inherent complexity of the architecture.
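To give a sense of what getting everything into a single text format involves, here is a minimal sketch that reduces an HTML page to plain text. It uses Python's standard-library `html.parser` as a stand-in so that it runs without extra dependencies; our scrapers used Beautiful Soup and Scrapy, and the names `TextExtractor` and `html_to_text` are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from a page, ignoring scripts and styles."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a tag we want to ignore
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse all runs of whitespace so every page has the same shape.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<body><h1>Events</h1><p>Fall  career fair</p><script>var x=1;</script></body>"
    print(html_to_text(page))
```

Real pages also need site-specific handling, such as dropping navigation menus and footers, which this sketch does not attempt.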
We learnt a lot about LLMs, web scraping, databases and web development.
We will try to fine-tune the model and expand the dataset so that it returns more specific results, and we also plan to add near-real-time data updates.
Built With
- beautiful-soup
- javascript
- llm
- python
- remix
- scrapy
- sql
- tailwind
- typescript