Inspiration
Our initial inspiration for this project was the rapid growth of NLPs like ChatGPT. However, as most people realize at some point, large language models do not know everything. And in many cases, especially in academic, there are entire gaps in its knowledge, since a lot information is not posted publically online. So with that in mind, we created QuereadAI, which allows users to create their own chat bot, that can use information not available online, fine tuned by users themselves.
What it does
QuereadAI is a powerful tool that is able to read through large amounts of text-based files (pdf, txt, ppt, etc.), and use them as context as a chat bot. Essentially, it is able to act as large language models like ChatGPT, with the bonus that it uses the files uploaded by the user as the primary context. This allows for extremely specific questions like "what day does CS 1332 have exam 2?" to have accurate responses, given the right information inputted by the user. However, it still retains the powerful natural language processing capabilities, being able to reform the information to any number of things like summaries, key points, and even more complex things like stories and poems. As a result, this is an extremely powerful tool in many circumstances, but especially in academia. This is because students could upload documents provided, which would allow them to not only skip the arduous process of searching to get one bit of information, but QuereadAI can generate study guides, summaries of units, example questions and more! However, this is not just limited to school, and the possibilities are endless given its capabilities.
How we built it
QuereadAI uses LangChain, as well as OpenAI's natural language processing as a basis to generate responses to users who submit any form of text file (pdf, txt, ppt, etc.). Based on the data that is submitted, which is stored on a chroma vector database specific to each user, we are able to display the output through Gradio, a lightweight framework that allows for seamless display of ML models in web apps. However, accessing this functionality would not be possible without our surrounding web development, which was done using bootstrap, tailwind, scss and flask, as well as OAuth, allowing us to keep track of user history, and retain their files across instances. We of course additionally hosted back-end functionality on AWS, allowing us to not only store user data, but also ensure security.
Challenges we ran into
One of the biggest challenges we encountered was our usage of Gradio. While it is an excellent library due to its simplicity, it is also its biggest downfall. Not only that, but it is also a very new project, so the documentation is lackluster. These factors led to us having a lot of problems with integrating it with flask, as they both run on localhost ports, and using both in a web development environment was a nightmare. To make things worse, since our specific use case was specific, and Gradio as new as it is, we were basically alone on figuring things out.
Accomplishments that we're proud of
Our biggest accomplishment was definitely being able to integrate all the functionality we had planned from the beginning. Going into the project, we definitely did not expect to be able to complete everything we set out to do, but we ended up cutting out nothing. Specifically though, we are happy about being able to utilize very new libraries like chroma and Gradio in our project, which make our product stand out. Finally, we all learned a new technology or two along the way, which is a great bonus to having a final product we are happy with.
What we learned
For most of us, it was our first natural language processing project, which was both exciting and daunting. However, in the end we feel as though we understand not only the technicalities of how an NLP works, but also how to implement one in a practical way. Additionally, it was the first time all of us used OAuth, which was similarly daunting and caused a lot of headaches, but in the end was a worthy endeavor.
What's next for QuereadAI
The first step after the hackathon will be to make it public facing. Right now it is only accessible if you have the source code, and run it locally; however, we hope to make it accessible through a browser, so anyone can go to the website and use our product. We also want to make the file manager better, as right now it is impossible to remove a file without clearing them all. Also, we hope to add some functionality to view the files, so you know which is which, without needing to know their names.
Built With
- amazon-web-services
- bootstrap
- chromadb
- css
- flask
- gradio
- html5
- javascript
- langchain
- oauth
- openai
- python
- scss
- tailwind
Log in or sign up for Devpost to join the conversation.