Inspiration

Studying by reading a textbook is painful. Studying is also often non-linear, and it almost never follows the exact order in which the textbook places its topics. Unfortunately, textbooks are what at least 250 million children have to study from every day because they can't afford to go to school, and being unable to work through the course content efficiently hurts their ability to compete academically with their better-off peers. Not only is the world losing talent, but these kids are also losing their one chance at making it out of poverty. Curricula, especially international ones such as IB, AP, and A-level, are therefore effectively shutting these marginalized groups out.

Furthermore, even for the average student living in a developed country, lessons customized to their curriculum are often helpful. Instead of spending effort on every potentially irrelevant detail mentioned in a textbook, many students, including myself, prefer to just grasp the core concepts. In other cases, the textbook may not be enough for students who are exceptionally curious about the course content. In almost all cases, the linear approach of the textbook confuses students.

What it does

SnapLearn takes a textbook from you and transforms it into a course that works for you. Specifically, you upload a textbook PDF file, and SnapLearn analyzes it and suggests a learning plan based on how much work you want to do every day and how the topics relate to each other. It then generates the course itself by summarizing and rephrasing the textbook in simpler terms and supplementing it with practice questions. This could mean boiling the course down to its core concepts if you just want bite-sized daily lessons, or it could mean expanding well beyond the textbook's content into professional-level material if you want to study extensively. Often, it also means regrouping the topics covered in the textbook so that you can learn them more efficiently and in a more sensible order.

How we built it

We used spaCy and pdfplumber to extract the table of contents of each textbook and determine its core topics, which are then fed into an NLP model to determine the prerequisite relationships (i.e. dependencies) between these topics. Since a topic can have multiple such relationships, we use a DAG (directed acyclic graph) to represent the learning journey of the user, and use a simple algorithm to find the most efficient order in which to traverse every node. Then, based on an estimate of how much time the user needs to cover each topic, we built another algorithm that groups the topics into lessons of the size the user wants, while also fitting in practice questions and handling cases where a lesson's size is not ideal. Finally, we use Gemini to summarize the original textbook text and regenerate course content that is helpful for the user.
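As a rough illustration of the ordering step, here is a minimal sketch using Kahn's algorithm for topological sorting. This is an assumed technique, not necessarily the algorithm SnapLearn uses; the function name `learning_order` and the topic names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def learning_order(prerequisites: Dict[str, List[str]]) -> List[str]:
    """Order topics so that every prerequisite comes before its dependents.

    `prerequisites` maps a topic to the topics that must be learned first.
    This is an illustrative sketch, not SnapLearn's actual implementation.
    """
    # Collect every topic, including ones that only appear as prerequisites.
    topics = set(prerequisites)
    for prereqs in prerequisites.values():
        topics.update(prereqs)

    # unmet[t]: number of prerequisites of t not yet scheduled.
    # unlocks[p]: topics that list p as a prerequisite.
    unmet = {t: 0 for t in topics}
    unlocks: Dict[str, List[str]] = {t: [] for t in topics}
    for topic, prereqs in prerequisites.items():
        for p in set(prereqs):
            unmet[topic] += 1
            unlocks[p].append(topic)

    # Start from topics with no prerequisites (sorted for a stable result).
    ready = deque(sorted(t for t in topics if unmet[t] == 0))
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in sorted(unlocks[current]):
            unmet[nxt] -= 1
            if unmet[nxt] == 0:
                ready.append(nxt)

    # If some topic was never scheduled, the graph is not a DAG.
    if len(order) != len(topics):
        raise ValueError("prerequisite graph contains a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical topics for demonstration only.
    print(learning_order({
        "Limits": [],
        "Derivatives": ["Limits"],
        "Integrals": ["Derivatives"],
        "Series": ["Limits"],
    }))
```

Any order produced this way respects every prerequisite edge. Choosing among the valid orders (for example, to keep related topics adjacent) would need an extra scoring step that this sketch leaves out.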

The frontend pages, including the welcome page, login page, and course dashboard, are built with Next.js, React, and TypeScript. Login validation is handled by Auth0, and the UI design is based on Mantine UI.

The user data is stored securely with Auth0, while the progress of the user is stored in a PostgreSQL server.

Challenges we ran into

The primary challenge was handling the PDFs. Since there is no guaranteed metadata, this required a lot of effort on NLP models and fine-tuning. We overcame it through sheer effort, banging our heads against the problem again and again until the results became satisfactory.

Second, the response time from the Gemini API was often slow, especially for lessons longer than 5 minutes. This meant we had to stream tokens as they were generated, something neither of us had done before. We overcame this issue by learning how to use the NDJSON format.
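To show what consuming an NDJSON stream involves, here is a minimal sketch of a parser that turns arbitrary text chunks into one JSON object per line. It is not our production code: the function name `iter_ndjson` and the `token` and `done` fields are hypothetical, and they do not describe the Gemini API's actual response format.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per newline-delimited record.

    Network chunks can end in the middle of a record, so incomplete
    text is buffered until the newline that terminates it arrives.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail is not.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            if line.strip():  # skip blank lines between records
                yield json.loads(line)
    # A final record may arrive without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Simulated stream: the second record is split across two chunks.
    stream = ['{"token": "Photo"}\n{"tok', 'en": "synthesis"}\n', '{"done": true}']
    for record in iter_ndjson(stream):
        print(record)
```

Because each record is a complete JSON document on its own line, the client can render tokens as soon as a newline arrives instead of waiting for the whole lesson.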

Finally, the logic for splitting and carrying topics across lessons was difficult to get right. We had to keep lessons engaging without exceeding the time limit set by the user, while still using every second we could. We reworked the algorithm over and over until we were satisfied with it.
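The splitting and carrying idea can be sketched as a greedy packer. This is a simplified assumption about the approach, not our actual algorithm: it ignores practice questions and engagement, and the name `pack_lessons` and the `min_piece` threshold are hypothetical.

```python
from typing import List, Tuple

Piece = Tuple[str, int]  # (topic name, minutes spent on it in this lesson)


def pack_lessons(topics: List[Piece], budget: int, min_piece: int = 5) -> List[List[Piece]]:
    """Greedily pack ordered topics into lessons of at most `budget` minutes.

    A topic that does not fit is split, and the remainder is carried into
    the next lesson. If the time left in a lesson is shorter than
    `min_piece`, the lesson is closed instead of starting a tiny fragment.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")

    lessons: List[List[Piece]] = []
    current: List[Piece] = []
    remaining = budget

    for name, minutes in topics:
        left = minutes
        while left > 0:
            # Too little room for a useful fragment: start a new lesson.
            if current and left > remaining and remaining < min_piece:
                lessons.append(current)
                current, remaining = [], budget
                continue
            piece = min(left, remaining)
            current.append((name, piece))
            left -= piece
            remaining -= piece
            if remaining == 0:  # lesson is exactly full
                lessons.append(current)
                current, remaining = [], budget

    if current:
        lessons.append(current)
    return lessons


if __name__ == "__main__":
    # Hypothetical time estimates, in minutes.
    plan = pack_lessons([("Cells", 20), ("Genetics", 25), ("Evolution", 10)], budget=30)
    for number, lesson in enumerate(plan, start=1):
        print(number, lesson)
```

In this example "Genetics" is split: 10 minutes fill out the first lesson and the remaining 15 are carried into the second, so no lesson exceeds the 30-minute budget.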

Accomplishments that we're proud of

Most significantly, we never gave up despite the challenges. Beyond that, we are proud that we can now successfully extract the table of contents from any PDF textbook and give reliable estimates of the time needed to finish it. We are proud of our comprehensive and dynamic algorithm, and of our seamless integration of many APIs to get our work done.

What we learned

We learned a lot about algorithms, design, full-stack development, and machine learning.
