Inspiration
Whenever a research paper or book left us confused in class, we turned to ChatGPT and other chatbots for summaries and for pointers on the parts we did not understand. But when we uploaded the file and asked a question partway through our reading, we kept receiving information from beyond our current page, which we could not follow without reading further. Even worse, when we were reading fiction, asking the chatbot about a character would spoil the story.
That's where LitPal, our website, comes in. We wanted to make sure that when you have a question about a book or an article, the response is tailored to the page you are on and easy to understand.
What it does
Our project offers you access to relevant literature and a chatbot. With user authentication, you get a personalized experience, with your relevant and favorite literature stored right at your fingertips.
Users can search for literature and read it directly on the website. As you scroll through an article, our custom LLM is updated with the content you have read so far, and it answers questions based only on the text up to that point. You can easily navigate through the pages of the passage and ask the chatbot any questions you have as you go.
This isn't just about saving you time and effort; it's about helping you really understand what you're reading. We're talking about comprehending those complex topics with ease and getting a deeper understanding of the subject. So, whether you're a student, researcher, or just a curious soul, our project is here to make reading a breeze.
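The page-aware behavior described above can be illustrated with a minimal sketch. It assumes the document has already been split into chunks tagged with page numbers; the names `Chunk`, `visible_chunks`, and `build_prompt` are hypothetical and are not taken from the LitPal codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A passage of extracted text tagged with the page it came from."""
    page: int
    text: str


def visible_chunks(chunks, current_page):
    """Return only the chunks on pages the reader has already reached."""
    return [c for c in chunks if c.page <= current_page]


def build_prompt(chunks, current_page, question):
    """Assemble a prompt whose context stops at the reader's current page."""
    context = "\n".join(
        f"[p.{c.page}] {c.text}" for c in visible_chunks(chunks, current_page)
    )
    return (
        f"Answer using only the text below (pages 1-{current_page}).\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Hypothetical three-page story; the reader is on page 2.
book = [
    Chunk(1, "Ana arrives in the city."),
    Chunk(2, "Ana meets a stranger named Cole."),
    Chunk(3, "Cole is revealed to be her brother."),
]
prompt = build_prompt(book, current_page=2, question="Who is Cole?")
```

Because the page 3 chunk never reaches the prompt, the model has no spoiler to repeat. Filtering before the model sees the text is simpler to reason about than asking the model to ignore later pages.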
How we built it
Frontend: The entire frontend is built on React.js and TailwindCSS. We used React Router for page routing and handled user authentication with JWT and React Auth Kit. We custom-built the chat UI and used a PDF viewer component to render the file. We also used AJAX for REST API requests to the backend.
Backend: To extract data from Google Scholar articles, we created a custom web scraper that uses Python's Beautiful Soup library to scrape the most relevant results for a search query. We then downloaded each PDF and converted it into text that our custom LLM could process. Next, we created vector embeddings from this text and stored them in our Chroma vector database. Finally, we used these embeddings with our custom large language model, a Hugging Face model tuned on this specific knowledge base, to answer queries about the document. To make the backend accessible to the frontend, we used Flask to create API endpoints.
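As a rough illustration of the embed, store, and query steps in the backend, here is a toy sketch that uses only the Python standard library. It is not the Chroma or Hugging Face code the project used: the bag-of-words `embed` function stands in for a real embedding model, and `TinyVectorStore` is a hypothetical in-memory stand-in for a vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # each row is (embedding, text, page)

    def add(self, text, page):
        self._rows.append((embed(text), text, page))

    def query(self, question, max_page, k=2):
        """Return up to k most similar chunks from pages <= max_page."""
        q = embed(question)
        scored = [
            (cosine(q, vec), text)
            for vec, text, page in self._rows
            if page <= max_page
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


# Made-up sample chunks, as if extracted from a downloaded PDF.
store = TinyVectorStore()
store.add("Transformers use self-attention to weigh tokens.", page=1)
store.add("The dataset contains ten thousand labeled abstracts.", page=2)
store.add("Results show attention heads specialize by layer.", page=5)

hits = store.query("How does attention work?", max_page=2, k=1)
```

In a real deployment the retrieved chunks would be passed to the language model as context, and a Flask endpoint would wrap the `query` call; both are omitted here to keep the sketch self-contained.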
Challenges we ran into
Backend: We were unable to obtain usable PDFs of books from the Library Genesis API because the PDFs came out distorted. We also tried several OCR algorithms before settling on an efficient and reliable library for our use case. To implement our LLM, we struggled to find an effective solution through HuggingFace and OpenAI, tuning multiple models on our knowledge base before arriving at the one that produced the best results. This tuning and selection process took many hours and the majority of our time on this project.
Frontend: Many of the React development challenges arose because not everyone on our team was experienced with React. As a result, we had to hand most of the frontend work to one person. There were also many bugs that we could only resolve through trial and error during development.
Accomplishments that we're proud of
General: Teamwork. We switched tasks when teammates were struggling, communicated those struggles, and learned how to ask one another for help.
Sid: Parsed Google Scholar for PDFs and article data. Fed them into OCR models to convert from PDF to text. Made a vector database with Chroma and trained it on Mosaic7B from HuggingFace.
Brice: Learned git commands, debugged pieces of the LLM, worked on the presentation, and provided academic resources to facilitate the project.
Harman: Set up Flask endpoints for the frontend. Built the middle tier with a Redis database. Converted our .txt files to .csv files and vectorized the data.
Vish: Set up the frontend with React.js and TailwindCSS. Implemented user authentication with JWTs. Used React Router to create a routing system for the pages. Set up the Flask REST API to provide communication with the frontend and to hold the application logic. Integrated all the services we built as clean, callable functions. Used Pinecone, OpenAI embeddings, and a custom HuggingFace model. Fine-tuned the model on our knowledge base.
What we learned
Brice: Explored the theory of NLP and vector analysis. Learned how to use Figma to create presentations.
Harman: Learned to use Flask in a more free-flowing environment. Learned core ML concepts and applied them when working with the Redis database.
Sid: Learned how to use HuggingFace, the OpenAI API and embeddings, vector databases, and web scraping with BeautifulSoup, and how to integrate Redis into a workflow.
Vish: Honed knowledge of fine-tuning models and embeddings. Also honed app development skills.
What's next for HackSC-X: Education- LitPal
Our current project is primarily focused on academic literature, but we are looking to expand its utility to reach a broader audience. To achieve this, we are exploring the possibility of incorporating more traditional books, beyond academic texts. One avenue we're considering is adding contextual information from sources like SparkNotes, which is widely used for understanding classic and popular literature. By doing so, we aim to support both academic and recreational reading. We also want the literature to be easier to access: by linking directly to a DOI or showing books by their covers, we can make the move from local readers to our website much smoother. Additionally, we intend to take a personalized approach to learning by creating tailored learning plans for users based on their specific struggles and areas of difficulty. We hope this approach will help users improve their comprehension of and engagement with the material they read, whether it is academic or recreational. Improving the user experience remains at the forefront of our project's future development.