EV Dojo

Inspiration

"It’s insane we don’t have rapid feedback for this already! If you’re learning piano, you immediately hear when you press a wrong key; ... You experiment, adjust, until you figure out how to get closer to your goal. But most of the time your behavior is causing concrete, visceral impacts ... that you'll never hear about. It’s like trying to learn how to play a piano with earplugs in. All your moves fall into some great void. Nobody else is doing this. Someone should do it. I think it would be good for the world."

As it turns out, most things in life don't give unambiguous feedback, and technology should help bridge this gap.

This is where EV Dojo comes in. We want dojos, training arenas where we can fail and improve, so that we can raise the expected value of our skills in areas where a feedback loop would not otherwise exist.

What it does

We already have a useful way of solving the problem of a lack of clear feedback: coaches. Experts in a specific topic are great for giving insightful comments, but they tend to be costly and too individualized to scale to a wider public.

We can, however, leverage the implicit knowledge that experts and coaches have to create our own mini-judges for each topic through a particular system.

Since our goal is a general way to find these judging functions for each topic and let people interact with them in real time, we need a way for experts to work together to train a function that approximates their decision-making.

This is where the Bradley-Terry (BT) pairwise preference model comes into play. If we have two choices, i and j, from some set D, and we want an approximate ranking of all elements in D, we can give each element a latent score s_i and fit the scores so that the win probabilities predicted by the BT model match our crowdsourced preferences.

$$ P(i \succ j) \;=\; \sigma(s_i - s_j) \;=\; \frac{1}{1 + e^{-(s_i - s_j)}}. $$

We also scale σ(s_i - s_j) by each rater's "trust value" to make the fit more robust.
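
As a rough illustration, the scores could be fit by gradient ascent on a trust-weighted BT log-likelihood. This is a minimal sketch, not the project's actual code: the function name `fit_bradley_terry`, the `(winner, loser, trust)` tuple layout, the learning rate, and the L2 penalty are all assumptions, and "trust" is interpreted here as a weight on each comparison's log-likelihood term.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_bradley_terry(comparisons, n_items, lr=0.1, epochs=500, l2=0.01):
    """Fit one latent score per item from pairwise preferences.

    comparisons: list of (winner, loser, trust) tuples (assumed layout).
    trust weights each comparison's contribution (assumed interpretation).
    """
    s = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for winner, loser, trust in comparisons:
            p_win = sigmoid(s[winner] - s[loser])
            # Gradient of trust * log(sigmoid(s_w - s_l)) w.r.t. s_w
            g = trust * (1.0 - p_win)
            grad[winner] += g
            grad[loser] -= g
        for i in range(n_items):
            # Small L2 penalty keeps scores finite for undefeated items
            s[i] += lr * (grad[i] - l2 * s[i])
    return s

# Hypothetical crowdsourced votes: item 0 beats 1 and 2, item 1 beats 2.
comparisons = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (0, 1, 0.5)]
scores = fit_bradley_terry(comparisons, n_items=3)
```

With these votes the fitted scores come out ordered as item 0, then item 1, then item 2, which matches the preferences that were fed in.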

With an s_i score for each input, we then turn each score into a probability. Roughly, this reads as "input i is good with probability p(i)," so we can take the highest- and lowest-probability inputs as the "ticks" that matter most for feedback.

To turn a score into a probability, we use Platt scaling:

$$ \hat p \;=\; \sigma(a s_i + b) \;=\; \frac{1}{1+e^{-(a s_i + b)}}. $$

Platt scaling normally fits a and b on held-out labels. Here we simply fix a = 1 and b = 0, which simplifies our score-to-probability mapping to

$$ \hat p \;=\; \sigma(s_i) \;=\; \frac{1}{1+e^{-s_i}}. $$
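
The mapping and the choice of "ticks" might look something like the sketch below. The behavior names and score values are made up for illustration, and `select_ticks` with its `k` parameter is a hypothetical helper, not something from the project.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def score_to_probability(score: float, a: float = 1.0, b: float = 0.0) -> float:
    """Sigmoid mapping from score to probability; a=1, b=0 as in the text."""
    return sigmoid(a * score + b)

def select_ticks(scores: dict[str, float], k: int = 1):
    """Return the k most-likely-good and k least-likely-good inputs."""
    probs = {name: score_to_probability(s) for name, s in scores.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k], ranked[-k:]

# Hypothetical interview behaviors with made-up BT scores.
scores = {
    "steady_eye_contact": 1.8,
    "clear_structure": 0.9,
    "neutral_pause": 0.0,
    "filler_words": -1.1,
    "interrupting": -2.0,
}
best, worst = select_ticks(scores, k=2)
```

A score of 0 maps to a probability of exactly 0.5, so inputs near the middle carry little signal, and only the two ends of the ranking are kept as ticks.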

Now that we have learned these "value mini-judges," we can use them to evaluate people on their topics in real time. To do this, we use a classification algorithm (specific to the topic's modality: video, audio, text, et cetera) to give a confidence (probability) that a certain input is being exhibited. We then multiply that confidence by our learned probability and alert the user when we are sure that they are doing something good or bad.
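
One way to read the multiplication step is sketched below. It assumes that "good" evidence is confidence times p and "bad" evidence is confidence times (1 - p). The write-up does not spell this out, so the split, the function name `alert_signal`, and the 0.6 threshold are all assumptions.

```python
from typing import Optional

def alert_signal(confidence: float, p_good: float,
                 threshold: float = 0.6) -> Optional[str]:
    """Combine classifier confidence with the learned probability.

    confidence: classifier's probability that the input is being exhibited.
    p_good: learned probability that the input is good.
    threshold: assumed value; keeping it above 0.5 means "good" and "bad"
    can never both fire, because the two evidence terms sum to confidence.
    """
    good_evidence = confidence * p_good
    bad_evidence = confidence * (1.0 - p_good)
    if good_evidence >= threshold:
        return "good"
    if bad_evidence >= threshold:
        return "bad"
    return None  # not sure enough either way, so stay silent

strong_good = alert_signal(confidence=0.9, p_good=0.9)   # evidence 0.81
strong_bad = alert_signal(confidence=0.9, p_good=0.1)    # evidence 0.81
uncertain = alert_signal(confidence=0.5, p_good=0.9)     # evidence 0.45
```

Under this reading, a weak detection never triggers an alert, even for an input the judges rate very highly or very poorly.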

To further improve robustness, I modeled the alert logic on neuron action potentials. An event fires only after consecutive inputs pass the threshold (persistence: how many times in a row an input must pass the threshold before firing). Once an event fires, hysteresis makes the trigger slightly less sensitive (the threshold is raised or lowered), so the user doesn't get constantly alerted while teetering on the threshold.
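
A small state machine can capture both ideas. The sketch below is illustrative only: the class name `PersistentTrigger` and the threshold values are assumptions, and hysteresis is implemented in one common form, where a fired trigger re-arms only after the signal drops below a lower release threshold.

```python
class PersistentTrigger:
    """Fire after `persistence` consecutive hits, then stay quiet until
    the signal falls below a lower release threshold (hysteresis)."""

    def __init__(self, fire_threshold: float = 0.6,
                 release_threshold: float = 0.4, persistence: int = 3):
        self.fire_threshold = fire_threshold
        self.release_threshold = release_threshold
        self.persistence = persistence
        self.active = False  # True between firing and release
        self.streak = 0      # consecutive samples at or above fire_threshold

    def update(self, value: float) -> bool:
        """Feed one sample; return True only on the step the alert fires."""
        if self.active:
            # Already fired: ignore everything until the signal clearly drops.
            if value < self.release_threshold:
                self.active = False
                self.streak = 0
            return False
        if value >= self.fire_threshold:
            self.streak += 1
        else:
            self.streak = 0
        if self.streak >= self.persistence:
            self.active = True
            self.streak = 0
            return True
        return False

# Hypothetical stream of combined evidence values.
stream = [0.7, 0.7, 0.5, 0.7, 0.7, 0.7, 0.65, 0.55, 0.7, 0.3, 0.7, 0.7, 0.7]
trigger = PersistentTrigger()
fired_at = [i for i, v in enumerate(stream) if trigger.update(v)]
```

In this stream the first two hits are reset by the dip to 0.5. The alert fires once the third consecutive hit arrives, stays silent while the value hovers around the threshold, and can fire again only after the drop to 0.3 re-arms it.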

Challenges and Achievements

I worked solo, and I'm surprised at the quality of the result.

This was my first time working with a web application, so there was a big learning curve.

Future

I plan to stick with this project and see if I can use the platform of the #1 education person on Substack to crowdsource my solution.

The power of this app lies in its inherent scalability. In our demo, we demonstrated signals for interviews, but with enough crowdsourcing, we can learn the value functions of any number of difficult skills that lack feedback. Imagine a world where you can fail and learn from the comfort of your home, without any social pressures or messy feedback to lead you astray.

That's the world I envision with EV Dojo. If you want to help build this world, you can help train our A/B choices on whatever unique skill you can think of, and be a part of a system that can change the way we approach problem-solving, optimization, and improvement.

Updates

Hunter H. started this project — Oct 19, 2025 11:40 AM EDT
