Inspiration

The average school counselor has 491 students on their caseload. During scheduling season, that's about 12 minutes per kid to read their file, check prerequisites, and place them into classes. Every year, students get missed. Usually the quiet ones. We wanted to build something that reads the files so the counselor doesn't have to.

What it does

Distill compresses student records down to only the facts that matter for scheduling, automatically catches prerequisite conflicts, and builds a complete master schedule for 250 students in under 30 seconds, with zero period conflicts and zero constraint violations.

How we built it

We built it in three layers. First, an ML compression pipeline: we chunk each student file, embed it using a local sentence transformer model, and use an algorithm called Maximal Marginal Relevance (MMR) to pick the most relevant, least redundant sentences under a strict token budget. No text is generated; only real sentences from the original file are kept. Second, a prerequisite checker that walks the course catalog and flags any student requesting a course they haven't earned yet. Third, a constraint solver (Google OR-Tools CP-SAT) that places all 250 students simultaneously, respecting period conflicts, seat caps, and ranked preferences to find the globally optimal schedule. The whole thing runs locally with no internet connection required.
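
To make the first layer concrete, here is a minimal sketch of MMR selection under a token budget. It is an illustration under stated assumptions, not Distill's actual pipeline: `embed` is a toy bag-of-words stand-in for the sentence transformer, tokens are counted by whitespace, and the query string, `lam=0.7` weight, and sample sentences are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence-transformer embedding: bag-of-words counts."""
    return Counter(text.lower().replace(".", "").replace(",", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mmr_select(sentences, query, token_budget, lam=0.7):
    """Greedily pick verbatim sentences by MMR until nothing else fits the budget."""
    q = embed(query)
    vecs = [embed(s) for s in sentences]
    cost = [len(s.split()) for s in sentences]  # assumption: whitespace token count
    chosen, used = [], 0
    remaining = set(range(len(sentences)))
    while True:
        fits = [i for i in sorted(remaining) if used + cost[i] <= token_budget]
        if not fits:
            break

        def score(i):
            # Relevance to the query minus similarity to anything already kept.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in chosen), default=0.0)
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(fits, key=score)
        chosen.append(best)
        used += cost[best]
        remaining.remove(best)
    # Nothing is rewritten: return the original sentences in document order.
    return [sentences[i] for i in sorted(chosen)]

# Hypothetical student-file sentences.
sentences = [
    "Student cannot be scheduled in Period 7.",
    "Student cannot take any class in Period 7.",
    "Completed Algebra 1 with a B.",
    "Enjoys playing chess at lunch.",
]
query = "scheduling constraints period prerequisites completed courses"
picked = mmr_select(sentences, query, token_budget=14)
```

Because the function only ever returns items from `sentences`, the output is verbatim by construction; the budget check happens before each pick, so the total never exceeds `token_budget`.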

Challenges we ran into

Getting the compression to reliably keep hard constraints like "no Period 7" was tricky: early versions would drop them in favor of transcript entries because the model thought grades were more relevant. We also hit a macOS bug where the solver would silently freeze when using multiple threads, which took a while to track down. The fix was one line of code; finding it wasn't.

Accomplishments that we're proud of

87.8% token reduction with zero AI-generated text: every output is verbatim from the source
250 students placed with 0 conflicts, found by a solver considering everyone simultaneously
100% prerequisite contradiction detection before the schedule even runs
Runs completely offline, costs $0 to operate, no IT setup required
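
The prerequisite check that runs before the solver can be sketched as a walk over the catalog. This is a minimal illustration, not Distill's implementation: the catalog layout (course name mapped to a list of direct prerequisites), the course names, and the function names are all assumptions.

```python
# Hypothetical catalog: course -> direct prerequisites.
catalog = {
    "Algebra 1": [],
    "Geometry": ["Algebra 1"],
    "Algebra 2": ["Geometry"],
    "Calculus": ["Algebra 2"],
}

def unmet_prerequisites(course, catalog, completed):
    """Return prerequisites of `course` the student has not completed.

    A completed prerequisite is trusted and not expanded further; a missing
    one is reported and its own prerequisites are checked in turn.
    """
    unmet, seen = [], set()
    stack = list(catalog.get(course, []))
    while stack:
        prereq = stack.pop()
        if prereq in seen:
            continue
        seen.add(prereq)
        if prereq not in completed:
            unmet.append(prereq)
            stack.extend(catalog.get(prereq, []))
    return sorted(unmet)

def flag_conflicts(requests, transcripts, catalog):
    """Map each student to the requested courses that have unmet prerequisites."""
    flags = {}
    for student, courses in requests.items():
        completed = transcripts.get(student, set())
        problems = {
            c: unmet_prerequisites(c, catalog, completed) for c in courses
        }
        problems = {c: missing for c, missing in problems.items() if missing}
        if problems:
            flags[student] = problems
    return flags

# Hypothetical requests and transcripts.
requests = {"ana": ["Calculus"], "ben": ["Geometry"]}
transcripts = {"ana": {"Algebra 1"}, "ben": {"Algebra 1"}}
flags = flag_conflicts(requests, transcripts, catalog)
```

Running this check first means the solver never spends time on a request that could not be granted anyway.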

What we learned

Picking real sentences instead of generating new ones is actually the right tool for high-stakes tasks where a paraphrase can cause a real mistake. And constraint programming is way more powerful than people realize. For a scheduling problem with hundreds of interdependent rules, it finds the optimal answer with hard guarantees, something a greedy approach or an LLM simply can't do.
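
A two-student toy case shows why placing one student at a time can fail where a global search succeeds. The data is hypothetical, and the brute-force search below only stands in for a real solver: it enumerates every combination, which CP-SAT avoids.

```python
from itertools import product

# Hypothetical data: ranked section options per student, one seat per section.
options = {"ana": ["art_p1", "art_p2"], "ben": ["art_p1"]}
seats = {"art_p1": 1, "art_p2": 1}

def greedy(options, seats):
    """Give each student, in order, their best section that still has a seat."""
    left = dict(seats)
    placed = {}
    for student, ranked in options.items():
        for section in ranked:
            if left[section] > 0:
                placed[student] = section
                left[section] -= 1
                break
    return placed

def exhaustive(options, seats):
    """Try every full assignment; keep the feasible one with the best total rank."""
    students = list(options)
    best, best_cost = None, None
    for combo in product(*(options[s] for s in students)):
        if any(combo.count(sec) > cap for sec, cap in seats.items()):
            continue  # violates a seat cap
        cost = sum(options[s].index(sec) for s, sec in zip(students, combo))
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(students, combo)), cost
    return best

greedy_result = greedy(options, seats)
global_result = exhaustive(options, seats)
```

Greedy gives ana her first choice and leaves ben with no seat; the global search moves ana to her second choice so both students are placed.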

What's next for Distill

Connect to the student information systems schools already use (PowerSchool, Infinite Campus) so counselors don't have to copy-paste anything. Add a district-level dashboard so administrators can compare equity metrics across schools. And let counselors override placements with notes that feed back into the next scheduling cycle as soft rules.
