As students, we find few things more frustrating than spending hours poring over documents and textbooks and still struggling to understand the material. This was the inspiration behind DocuBot: a study tool that simplifies the learning process and answers any question based on the material you provide.
What it does
With DocuBot, students can upload any document, and AI technology will analyze it to provide insightful and relevant answers based on the context of the document. This means students can spend less time searching for information and more time understanding and retaining the material. DocuBot was designed with the modern student in mind, recognizing the demands of a fast-paced academic environment and providing the tools needed to succeed. With DocuBot, students can unlock their full potential and achieve academic success with ease.
How we built it
DocuBot answers questions about documents in several steps. First, it uses a combination of Optical Character Recognition (OCR) and Natural Language Processing (NLP) to understand the context of the document. Next, the documents are vectorized and uploaded to a vector database to be indexed. When a question is received, DocuBot compares it against the indexed vectors using cosine similarity to find the relevant context. That context is then fed into a primary and a secondary Large Language Model to answer the question. See the White Paper below for more information.
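To make the retrieval step concrete, here is a minimal sketch of ranking document chunks by cosine similarity to a question. It is not DocuBot's actual code: the `embed` function is a toy word-count stand-in for whatever embedding model is really used, an in-memory list stands in for the vector database, and the names `embed`, `cosine_similarity`, and `top_k_chunks` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best match first."""
    q = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(q, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# A plain list stands in for the indexed vector database (assumption).
chunks = [
    "the derivative of x squared is two x",
    "photosynthesis converts light into chemical energy",
    "the integral of two x is x squared plus a constant",
]

# The best-matching chunk is what would be passed to the language models.
context = top_k_chunks("what is the derivative of x squared", chunks, k=1)
print(context)
```

In a real pipeline the word counts would be replaced by dense embeddings and the sort by a nearest-neighbor query against the vector database, but the ranking idea is the same.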
Challenges we ran into
We were all very new to the machine-learning field and had to learn as the project progressed. One of our biggest challenges was finding a library to perform OCR on the documents. While some libraries, like Tesseract, perform well on natural language, we could not find any that support math symbols and formulas. We solved this by writing our own OCR algorithm, which converts PDF documents to LaTeX with 90% accuracy according to our non-scientific tests.
Accomplishments that we're proud of
Despite being new to the machine-learning field, we were able to overcome several challenges and achieve some significant accomplishments with DocuBot. One of our proudest accomplishments was developing our own OCR algorithm to convert PDF documents to LaTeX. This was a crucial step in enabling DocuBot to accurately analyze and understand complex documents containing math symbols and formulas. Our algorithm has achieved an impressive 90% accuracy rate in our non-scientific tests, which is a testament to the dedication and hard work of our team. We are proud to have developed a solution that was not available in existing libraries, and we believe it represents a major step forward in the field of document analysis and understanding.
What's next for DocuBot
In the future, we hope to add support for images, diagrams, and charts, and to score DocuBot on the M3C task. See the White Paper below for more information.
The White Paper
For more information, please see the White Paper for DocuBot.