Inspiration
Our inspiration for this project was the Xficient challenge to build an application that parses a university course description PDF. As an all-freshman team, we know firsthand the struggle of choosing classes from a selection of over 2,000. Not to mention, all of us have had experiences where we needed to talk to a counselor, only to find their schedule booked for over a month. That desperation fueled the creation of AI Counselor: a solution to help guide college students on their academic journey. While the challenge only required the application to process the first 100 pages, for it to truly serve as a tool for students, our application parses the entire 800-page PDF.
What it does
AI Counselor is an AI assistant that takes in a university course listing PDF and responds to any question the user might have. Retrieving information at this scale often requires knowledge of SQL, which leaves most users unable to access the resource. Our application, however, takes in natural language queries and outputs a result within 30 seconds. With an elegant and intuitive interface, critical course info is accessible to all.
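To illustrate the kind of translation the assistant performs, here is a minimal sketch of how a natural-language question maps to a SQL query over a parsed catalogue. The table schema, column names, and sample rows are assumptions for illustration, not the project's actual database:

```python
import sqlite3

# Hypothetical schema for the parsed catalogue (assumed, not the real one).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (code TEXT, title TEXT, units INTEGER, department TEXT)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [
        ("CS 101", "Intro to Programming", 4, "Computer Science"),
        ("CS 201", "Data Structures", 4, "Computer Science"),
        ("HIST 110", "World History", 3, "History"),
    ],
)

# A question like "Which computer science classes are worth 4 units?"
# might be translated by the model into a query such as:
generated_sql = (
    "SELECT code, title FROM courses "
    "WHERE department = 'Computer Science' AND units = 4"
)
rows = conn.execute(generated_sql).fetchall()
print(rows)  # → [('CS 101', 'Intro to Programming'), ('CS 201', 'Data Structures')]
```

Hiding this query-writing step behind a natural-language prompt is what makes the catalogue usable for people who have never touched SQL.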
How we built it
The core of our application started with parsing a PDF file and returning a readable JSON file. Given the complexity of the algorithms involved, our group decided on Python for its high development velocity. We experimented with various PDF parsing libraries but landed on PyMuPDF for its processing speed. We then split into groups to work on converting natural language into SQL queries, exporting data from the PDF into a JSON file, and the web application framework that integrates each individual program. The most complex part was the conversion from natural language into SQL queries, for which we used Meta's Llama LLM. However, Llama doesn't output SQL queries well out of the box, so one of our group members fine-tuned the model to work better with our SQL queries.
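The extraction step above can be sketched roughly as follows. In the real pipeline, PyMuPDF (`fitz`) would supply the raw page text; the sample text, line format, and regex below are illustrative assumptions, not the catalogue's actual layout:

```python
import json
import re

# In the real pipeline this text would come from PyMuPDF, e.g.:
#   import fitz
#   page_text = "".join(page.get_text() for page in fitz.open("catalog.pdf"))
# The course-line format here is an assumption for illustration.
page_text = """\
CS 101 Intro to Programming (4 units)
CS 201 Data Structures (4 units)
HIST 110 World History (3 units)
"""

# Hypothetical pattern: "<DEPT> <number> <title> (<n> units)"
COURSE_RE = re.compile(r"^([A-Z]+ \d+) (.+) \((\d+) units\)$")

def parse_courses(text: str) -> list[dict]:
    """Turn catalogue lines into JSON-ready course records."""
    records = []
    for line in text.splitlines():
        match = COURSE_RE.match(line.strip())
        if match:
            records.append(
                {
                    "code": match.group(1),
                    "title": match.group(2),
                    "units": int(match.group(3)),
                }
            )
    return records

# Export the parsed records as JSON for the downstream SQL step.
print(json.dumps(parse_courses(page_text), indent=2))
```

The JSON output is what the web application layer loads into the database that the fine-tuned model queries against.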
Challenges we ran into
I would love to say it was smooth sailing all throughout, but I would be lying. We had our ups and downs, and by that I mean it was mostly downs. Our morning started with a team discussion in which everyone admitted that we didn't really know how to use Git. So instead of starting development, our team learned Git on a test repository. After an arduous learning process, we finally started on our project. We then realized that we had no idea how to structure a large-scale application like this. It took a lot of fumbling through folders and redundant files to finally organize our Git repository. After looking through countless forums and documentation pages, we were able to watch our project come together like a puzzle. There was a moment around 2 in the afternoon when we wondered if we should just walk away, but we persevered.
Accomplishments that we're proud of
Our program runs mostly locally and does not require any external API access such as GPT-4. The PDF processing is extremely fast and deceptively simple. We also expanded the scope of the initial problem, covering the entire 800-page PDF rather than the recommended 100 pages. Most importantly, we persevered. We initially started with low hopes for our project and believed it would be a broken mess by the time we were done with it. Regardless, we gave it our best shot and are genuinely proud of what we created. From using technologies we were not necessarily familiar with to powering through and squashing all of the bugs we encountered, we were truly able to work together as a team to make something beautiful.
What we learned
Sometimes you have to shoot for the moon, because you never know where you will end up. It might be cliché, but we really didn't expect to achieve anything close to the final product we have today. There were many times we thought we would end up with a broken mess, but we pushed through it all and coded for about 30 hours. This experience taught us about the development cycle and what it takes to create a large-scale application like this. And while we are incredibly proud of our final product, the experience we gained on this journey is worth far more than any prize could be.
What's next for AI Counselor
We were limited by our hardware due to the size of the LLM, and had to take shortcuts to work around our limited processing power. Another feature we wanted to implement was parsing multiple PDFs so the model has more context to draw from. We would also like to auto-update the course catalogue to reflect courses that change over the semester. We could go on and on, but those are the main features we would implement next.