Inspiration
OpenStax has created an incredible library of free, peer-reviewed textbooks, helping over 1.7 million students. But there's a major bottleneck: for schools to actually adopt and use these books, teachers need to know exactly which government standards they cover. Currently, mapping a textbook to these standards is a tedious, manual process that involves people reading every single page and tagging it by hand. We wanted to build a tool that automates this part, allowing OpenStax to scale faster and helping teachers find the exact content they need without the headache.
What it does
StandardSync is a classification tool that reads textbook content, like section text, examples, and exercise titles, and automatically predicts the correct educational standard.
It handles the nuance of educational data with a hybrid approach. First, it performs a semantic similarity analysis to understand the text's actual meaning. Then it applies a logical layer that checks contextual metadata (grade, domain, cluster) to make a high-precision prediction.
The React web application includes voice-based assistance (using ElevenLabs) and lets a teacher enter an educational objective and get it mapped to an educational standard from a textbook (by running our classification model behind the scenes).
How we built it
We built a Python pipeline that mimics how a human would solve the problem:
- The provided file was a set of deeply nested, recursive JSON objects, so we wrote a parser to flatten everything out, ensuring that every single math problem "remembered" where it came from (its Grade, Domain, and Cluster).
- We used the all-mpnet-base-v2 sentence transformer to turn the text into vectors, which lets the system match concepts by meaning (semantic search).
- We also noticed that certain standards only ever show up in specific chapters, so we built a statistical engine that uses the metadata to estimate the probability of a standard appearing in a given domain.
- The model was initially guessing too many answers. To fix this, we wrote a dynamic threshold function: the model picks the top match and adds a second one only if it's 95% confident.
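The flattening step above can be sketched as a recursive walk that carries inherited metadata down to each leaf, so every exercise keeps its Grade/Domain/Cluster context. The key names (`grade`, `domain`, `cluster`, `exercise`) are illustrative placeholders, not the actual OpenStax schema:

```python
def flatten(node, context=None, out=None):
    """Recursively flatten nested JSON, propagating metadata to each leaf."""
    context = dict(context or {})
    out = [] if out is None else out
    if isinstance(node, dict):
        # Pick up any context keys defined at this level of the tree.
        for key in ("grade", "domain", "cluster"):
            if key in node:
                context[key] = node[key]
        if "exercise" in node:
            # A leaf exercise: emit it together with its inherited context.
            out.append({**context, "text": node["exercise"]})
        for value in node.values():
            flatten(value, context, out)
    elif isinstance(node, list):
        for item in node:
            flatten(item, context, out)
    return out
```

Because each recursive call copies the context dict before extending it, sibling branches never leak metadata into each other.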
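A minimal sketch of the scoring and dynamic-threshold logic, assuming the standards are already embedded as vectors (in our pipeline, by all-mpnet-base-v2) and that the statistical engine's output is available as a lookup table of P(standard | domain). Interpreting the 95% rule as "the runner-up scores at least 95% of the top match" is one plausible reading, not the exact production logic:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def predict(query_vec, standard_vecs, domain, domain_priors, second_pick_ratio=0.95):
    """Blend semantic similarity with the metadata prior, then apply the
    dynamic threshold: keep the top match, and a runner-up only if it is
    nearly as strong."""
    scores = {
        std: cosine(query_vec, vec) * domain_priors.get((std, domain), 1e-3)
        for std, vec in standard_vecs.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    picks = [ranked[0]]
    if len(ranked) > 1 and scores[ranked[1]] >= second_pick_ratio * scores[ranked[0]]:
        picks.append(ranked[1])
    return picks
```

The multiplicative prior is what lets the domain metadata veto semantically similar but contextually implausible standards, which is the "logical layer" described above.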
Challenges we ran into
- Our first model tried to be helpful by suggesting 3 or 4 standards per question. However, we realized the competition metric required exact matches, meaning our broad guessing strategy resulted in a 0% score. We had to completely rethink our logic to prioritize precision (finding the one right answer) over recall.
- The OpenStax data is deeply nested, so extracting a specific exercise without losing the fact that it belongs to a certain chapter required writing some recursive functions to keep the context intact.
- Many standards sound fairly similar. A plain text search couldn't tell them apart, so we had to rely on our statistical engine to differentiate them based on the chapter structure.
Accomplishments that we're proud of
We're proud of not backing down despite initial setbacks, eventually reaching ~95% accuracy. We went above and beyond with a React web app that lets teachers use voice features (with ElevenLabs) to instantly classify their content against textbook standards, and to browse standards and find the closest match to their classroom content, sparing them manual tasks like combing through text-heavy books.
What we learned
We learned a lot about text classification and analysis. We found that combining multiple signals (semantic similarity, domain patterns, concept patterns, keyword matching) significantly improves accuracy over any single approach.
What's next for StandardSync
We hope to experiment with fine-tuning the sentence transformer model on educational content. We also hope to take our application further and add more functionalities, such as collaborative features, support for other educational standards frameworks, and a mobile app.