What inspired this
I'm a junior doing a double major. I take 19 credits every semester, and outside class I'm juggling 3 jobs, the internship hunt, ML research, and cooking. A normal week has more moving parts than I can hold in my head.
So I tracked it badly, planned late at night, and missed deadlines.
I tried using an LLM to plan my week. It felt smart for a day. Then I noticed the schedules were wrong: two things in the same hour, work scheduled after its deadline. It wasn't thinking about my week. It was generating text that looked like a plan. An LLM doesn't know whether a schedule is feasible. It just writes one that sounds plausible.
That's why Donna exists. Planning a semester isn't a language problem; it's a constraint problem.
What it does
EVERYTHING.
You upload your syllabi. Donna extracts every deadline, learns how long each kind of assignment takes you, and builds a weekly schedule that fits around classes, sleep, and the things you won't give up. Ask it "can I take Friday night off?" and instead of guessing, it re-solves and tells you exactly what would break.
The LLM is still there, but it only translates: it turns your sentence into a structured request and explains the result back. It never decides anything.
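To make the "translate only" role concrete, here is a minimal sketch of what a validated structured request could look like. The schema, the field names, and `parse_request` are hypothetical and only for illustration; the point is that the LLM's output is checked as data before anything reaches the solver.

```python
import json
from dataclasses import dataclass

# Hypothetical request schema; these names are illustrative, not Donna's actual API.
ALLOWED_ACTIONS = {"block_time", "move_task", "check_feasibility"}


@dataclass(frozen=True)
class ScheduleRequest:
    action: str
    day: str
    start_hour: int
    end_hour: int


def parse_request(llm_output: str) -> ScheduleRequest:
    """Validate the LLM's JSON so only well-formed requests reach the solver."""
    data = json.loads(llm_output)
    req = ScheduleRequest(
        action=data["action"],
        day=data["day"],
        start_hour=int(data["start_hour"]),
        end_hour=int(data["end_hour"]),
    )
    if req.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {req.action}")
    if not 0 <= req.start_hour < req.end_hour <= 24:
        raise ValueError("invalid time window")
    return req


# "Can I take Friday night off?" might be translated into something like:
raw = '{"action": "block_time", "day": "friday", "start_hour": 18, "end_hour": 24}'
request = parse_request(raw)
```

Anything that fails validation gets rejected instead of guessed at, which keeps the decision-making on the solver side.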
How I built it
The core is a Python sidecar running the algorithms, under the existing React + Express app.
Scheduling is an integer optimization model solved with CP-SAT:
$$\min \sum_k w_k \cdot \text{penalty}_k(x) \quad \text{s.t. hard feasibility constraints}$$
Hard constraints cover deadlines, sleep, and class times; soft penalties discourage hard work late at night and excessive context-switching.
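The split between hard constraints and weighted soft penalties can be sketched on a toy day. This is not the CP-SAT model: it is a brute-force search in plain Python, small enough to read, and every name, slot, and weight in it is a made-up assumption. It enforces deadlines and one-task-per-hour as hard constraints and minimizes a weighted sum of two penalties.

```python
from itertools import permutations

# Toy data (all hypothetical): hours of the day not taken by class or sleep.
FREE_SLOTS = [8, 11, 12, 14, 15, 20, 21, 22]
TASKS = [  # (name, hours needed, deadline hour, is hard work)
    ("pset", 2, 16, True),
    ("reading", 1, 23, False),
]
WEIGHTS = {"late_hard_work": 5, "context_switch": 1}
LATE_HOUR = 21  # hard work at or after this hour is penalized


def solve(tasks, free_slots, weights):
    """Return (schedule, cost) with minimal weighted penalty, or (None, None) if infeasible."""
    # Expand each task into one-hour units.
    units = [(name, deadline, hard)
             for name, hours, deadline, hard in tasks
             for _ in range(hours)]
    best, best_cost = None, None
    # permutations() never repeats a slot, so "no two things in the same hour" holds.
    for slots in permutations(free_slots, len(units)):
        # Hard constraint: each one-hour unit must end by its deadline.
        if any(slot + 1 > deadline for slot, (_, deadline, _) in zip(slots, units)):
            continue
        ordered = sorted(zip(slots, units))
        late = sum(1 for slot, (_, _, hard) in ordered if hard and slot >= LATE_HOUR)
        names = [name for _, (name, _, _) in ordered]
        switches = sum(1 for a, b in zip(names, names[1:]) if a != b)
        cost = weights["late_hard_work"] * late + weights["context_switch"] * switches
        if best_cost is None or cost < best_cost:
            best = {slot: name for slot, (name, _, _) in ordered}
            best_cost = cost
    return best, best_cost


schedule, cost = solve(TASKS, FREE_SLOTS, WEIGHTS)
```

A real solver replaces the exhaustive loop, but the shape is the same: infeasible assignments are never scored, and everything else competes on weighted penalty.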
Time estimation is a hierarchical Bayesian model: it guesses reasonably for a new user, then personalizes as real completion data arrives:
$$\log(\text{hours}) \sim \mathcal{N}(\mu_u + \alpha_t + \beta_c + \gamma^\top x,\ \sigma^2)$$
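Two consequences of that formula are easy to show in code. Because the model is normal on the log scale, the median estimate is the exponential of the linear predictor and the mean adds a $\sigma^2/2$ correction. The second function is a simplified stand-in for full posterior inference: a conjugate normal update of the user offset $\mu_u$ alone, with everything else held fixed. All parameter values and function names are assumptions for illustration.

```python
import math


def predict_hours(mu_u, alpha_t, beta_c, gamma, x, sigma):
    """Median and mean hours implied by log(hours) ~ N(m, sigma^2)."""
    m = mu_u + alpha_t + beta_c + sum(g * xi for g, xi in zip(gamma, x))
    median = math.exp(m)
    mean = math.exp(m + sigma ** 2 / 2)
    return median, mean


def update_user_offset(prior_mean, prior_var, residuals, sigma):
    """Conjugate normal update for mu_u.

    residuals are observed log-hours minus the non-user terms of the predictor.
    This is a simplification: the other parameters are treated as known.
    """
    n = len(residuals)
    if n == 0:
        return prior_mean, prior_var  # new user: fall back to the population prior
    post_var = 1.0 / (1.0 / prior_var + n / sigma ** 2)
    post_mean = post_var * (prior_mean / prior_var + sum(residuals) / sigma ** 2)
    return post_mean, post_var


# A new user starts at the prior; one observation pulls the offset halfway
# toward the data when prior variance and noise variance are equal.
post_mean, post_var = update_user_offset(0.0, 1.0, [1.0], 1.0)
```

With no data the estimate is the population guess, and each logged completion shrinks the variance and moves the offset toward that user's own pace.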
When a request is infeasible, Donna runs an IIS-style search to return the minimal set of conflicting constraints, so you know what to move. There's also large-neighborhood search for fast rescheduling, a Cox model for procrastination risk, a Thompson-sampling bandit for notification timing, and a CRF that checks the LLM's syllabus extraction. Eight algorithms total; the LLM does three narrow language jobs.
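One common way to do an IIS-style search is a deletion filter: drop constraints one at a time and keep each one only if removing it makes the problem feasible again. The sketch below assumes a feasibility oracle and a toy capacity model that are both hypothetical. Note that the result is irreducible (no constraint in it can be dropped), which is not always the smallest possible conflict.

```python
def minimal_conflict(constraints, is_feasible):
    """Deletion filter: shrink an infeasible set to an irreducible conflicting subset."""
    if is_feasible(constraints):
        return []  # nothing conflicts
    core = list(constraints)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if not is_feasible(trial):
            core = trial  # still infeasible without it, so it is not part of the conflict
        else:
            i += 1        # removing it restores feasibility, so it stays in the core
    return core


# Toy oracle (hypothetical): tasks demand hours, blocks remove hours from a 12-hour budget.
CAPACITY = 12


def fits(constraints):
    demand = sum(h for kind, _, h in constraints if kind == "task")
    blocked = sum(h for kind, _, h in constraints if kind == "block")
    return demand <= CAPACITY - blocked


constraints = [
    ("task", "pset", 5),
    ("task", "essay", 6),
    ("block", "friday_night", 4),
    ("task", "reading", 1),
]
core = minimal_conflict(constraints, fits)
```

Here the reading is dropped from the conflict because the pset, the essay, and the Friday-night block already fail on their own, so those three are what you would need to move.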
What I learned
How easy it is to look finished. Every endpoint returned 200 and it felt done. Then I audited each algorithm against known inputs and found the feasibility checker returning "feasible" for obviously impossible inputs: the solver was quietly allowing unlimited overflow. It passed every smoke test and was completely wrong.
That's the difference between "the code runs" and "the algorithm is correct." I ended up writing explicit acceptance tests with numeric pass conditions for all eight algorithms, because "it returns something" is not evidence.
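An acceptance test with a numeric pass condition can be as small as a table of known inputs and exact expected outputs. The checker below is a hypothetical stand-in, not Donna's code, and the cases are invented, but it shows the kind of test that catches an "always feasible" bug that a status-code smoke test never will.

```python
def is_feasible(task_hours, free_hours):
    """Hypothetical checker: work fits only if total demand does not exceed free time."""
    return sum(task_hours) <= free_hours


def overflow(task_hours, free_hours):
    """Hours of work that do not fit; must be exactly 0 for a feasible input."""
    return max(0, sum(task_hours) - free_hours)


ACCEPTANCE_CASES = [
    # (task hours, free hours, expected feasible, expected overflow)
    ([3, 4], 10, True, 0),
    ([5, 5], 10, True, 0),    # boundary: exactly fits
    ([6, 5], 10, False, 1),   # one hour over
    ([40], 8, False, 32),     # obviously impossible
]


def run_acceptance(check=is_feasible):
    """Return the list of failing cases; an empty list means the checker passes."""
    failures = []
    for tasks, free, want_ok, want_over in ACCEPTANCE_CASES:
        got = (check(tasks, free), overflow(tasks, free))
        if got != (want_ok, want_over):
            failures.append((tasks, free, got))
    return failures


failures = run_acceptance()
```

Passing in a checker that always answers "feasible" fails the last two cases, which is exactly the bug described above.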
The challenges
The audit-and-fix loop. Several early implementations were broken in ways that only showed up under specific inputs: the feasibility bug, a reoptimizer that reshuffled the whole schedule instead of patching locally, and a router that sent every request down the same path. Each passed lint and built fine. Fixing them meant going algorithm by algorithm with real test cases.
The other was honesty about models that need data. The Cox model, the bandit, and the forecaster are real implementations, but they only get good after weeks of real user data. Instead of claiming they were "trained," I built the retraining loop properly and stayed straight about where they are.
Built With
- docker
- express.js
- fastapi
- groq
- javascript
- node.js
- numpy
- oauth
- postgresql
- pymc
- python
- react
- recharts
- redis
- scikit-learn
- sql
- timescaledb
- vite
