InstiGPT

Home screen of InstiGPT

Inspiration

Generative AI models have taken the world by storm, making information that was previously hard to find and understand easily available. InstiGPT aims to do the same for information about IIT Bombay, covering both public information and internal information that is not accessible to outsiders, and make students' lives easier!

What it does

InstiGPT uses state-of-the-art retrieval-augmented generation (RAG) techniques to find information related to IIT Bombay efficiently and accurately and to answer students' queries. It can answer questions about courses (their content, faculty and TAs, grading in previous semesters, and much more), internships and placements (statistics and how to prepare for particular companies), fun facts about the institute, including campus slang (especially useful for freshers), and many other questions that most students face regularly. It can even compare courses from other institutes with the ones offered by IIT Bombay, which has proved very useful for students planning to take part in student exchange programs.

How we built it

InstiGPT is built on top of RAG and uses state-of-the-art techniques to improve on the naive RAG approach. Here are some notable points about our implementation:

We curated information and generated vector embeddings for it, which are stored in a vector database.
Our chain combines a vector database (Azure Cosmos DB), in which we store internal documents such as course information (grading, handouts, faculty information, etc.), with a live web search, so that it can still answer well when a question is not covered by the curated information in our database.
We use a FastAPI-based backend REST API to handle text generation and database operations.
The frontend is a Next.js app written in TypeScript and is fully responsive (desktop, tablet, and mobile).
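
To make the "vector database plus live web search" idea concrete, here is a minimal sketch of how such a fallback could work. It is not the actual InstiGPT chain: the in-memory store, the `retrieve` function, the `min_score` threshold, and the `web_search` callback are all hypothetical names chosen for illustration, and the real system would query Azure Cosmos DB and use a real embedding model.

```python
import math
from typing import Callable

# Toy in-memory stand-in for the vector database: (text, embedding) pairs.
# Assumption: the real embeddings come from an embedding model.
Doc = tuple[str, list[float]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    query_vec: list[float],
    store: list[Doc],
    web_search: Callable[[str], list[str]],
    k: int = 2,
    min_score: float = 0.75,  # hypothetical cut-off for "covered by curated data"
) -> tuple[str, list[str]]:
    """Return (source, context). Prefer curated documents, else fall back to the web."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in store), reverse=True
    )
    hits = [text for score, text in scored[:k] if score >= min_score]
    if hits:
        return "vector_db", hits
    # Nothing in the curated store is similar enough, so use the live search.
    return "web", web_search(query)
```

The returned context would then be placed in the prompt sent to the LLM. A similarity threshold is only one possible way to decide when to fall back; a router prompt or always merging both sources are equally plausible designs.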

Challenges we ran into

One of the most difficult challenges was making sure that the LLM does not hallucinate. We researched many techniques to reduce hallucination, such as improving the prompt and rechecking the LLM's answer by passing it back to the LLM and asking whether it was correct.
Data curation (retrieval and cleaning) was also a huge task. We had to scrape all sorts of documents, such as PDFs, JSON files, HTML pages, and XML files.
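
The self-check idea mentioned above (passing the answer back to the LLM and asking whether it is correct) can be sketched as follows. This is an illustrative outline under stated assumptions, not our production code: `answer_with_self_check`, the prompt wording, and the `FALLBACK` message are hypothetical, and `llm` stands for any function that takes a prompt string and returns the model's text.

```python
from typing import Callable

# Hypothetical message returned when the check fails.
FALLBACK = "I don't know the answer to that."


def answer_with_self_check(
    question: str,
    context: str,
    llm: Callable[[str], str],
) -> str:
    """Draft an answer, then ask the model whether the context supports it."""
    draft = llm(
        "Answer using only this context.\n"
        f"Context: {context}\nQuestion: {question}"
    )
    verdict = llm(
        "Is the answer fully supported by the context? Reply YES or NO.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer: {draft}"
    )
    # Keep the draft only on an explicit YES; otherwise admit uncertainty.
    return draft if verdict.strip().upper().startswith("YES") else FALLBACK
```

Treating anything other than an explicit YES as a failure is a deliberately conservative choice: it trades some answered questions for fewer unsupported ones, at the cost of a second LLM call per query.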

Accomplishments that we're proud of

Hallucinations by the LLM have been reduced a lot, and we have gotten it to say that it does not know the answer when it does not.
It can answer most student queries (over 6k students at IIT Bombay are now using it).
We created a helper script that adds new data from various file formats, such as PDF, JSON, and HTML, to the vector database. Adding new data is now insanely easy and can be done by anyone, even without knowledge of the application, which helps avoid stale or missing data.
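
As a rough illustration of how a format-agnostic ingestion script like this might be organized, the sketch below dispatches on file extension and splits the extracted text into overlapping chunks ready for embedding. All names here (`LOADERS`, `load_text`, `chunk`) and the chunk sizes are hypothetical, only standard-library parsers are used, and PDF handling is left out because it needs a third-party library. The embedding and database write steps are omitted.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects the visible text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def json_to_text(raw: str) -> str:
    # Re-serialize with sorted keys so the same data always yields the same text.
    return json.dumps(json.loads(raw), sort_keys=True)


# Registry of loaders keyed by extension; supporting a new format means
# adding one entry here.
LOADERS: dict[str, Callable[[str], str]] = {
    ".html": html_to_text,
    ".json": json_to_text,
    ".txt": lambda raw: raw,
}


def load_text(filename: str, raw: str) -> str:
    """Pick a loader from the file extension and return plain text."""
    suffix = Path(filename).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"no loader registered for {suffix!r}")
    return LOADERS[suffix](raw)


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and written to the vector database. Keeping the loaders in a registry is what lets someone add a format without touching the rest of the pipeline.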

What's next for InstiGPT

We aim to integrate InstiGPT into other apps of our institute, where it could take context from the part of the app the user is in and help them out. For example, we have an app called ResoBin, which contains documents related to courses, such as previous years' exam papers, course handouts and notes, course content, and much more. We wish to add InstiGPT as an add-on to this app so that it can guide students better when they face a problem, for example by helping them find answers to particular questions from the papers or by comparing course contents to decide which course would suit them better.

Built With

azure
azurecosmosdb
langchain
mongodb
next.js
openai
python
typescript

Updates

Geet Sethi started this project — Jun 26, 2024 01:19 PM EDT
