Inspiration

We were inspired by the demo at the opening ceremony that compared different LLMs' abilities to solve Wordle. We saw Minute Cryptic as a chance to go further. Cryptic clues test whether an LLM can make the lateral, not strictly logical leaps needed to turn a clue into an answer. They also test whether it knows to take a hint when it genuinely cannot make a guess.

What it does

Clue-Less benchmarks most major LLMs against all 813 Minute Cryptic puzzles we had access to. For each model it tracks the success rate, the score under or over par, and even the hint strategy.
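To make the tracked metrics concrete, here is a minimal Python sketch of how per-attempt results could be rolled up into a per-model success rate and average score relative to par. The `Attempt` record, the `summarize` helper, and the scoring rule (score = hints used minus the puzzle's par, so a negative number means under par) are hypothetical assumptions for illustration, not Clue-Less's actual code. The sample records are made-up toy data, not our results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    model: str       # model identifier, e.g. "model-a" (hypothetical)
    solved: bool     # whether the model reached the correct answer
    hints_used: int  # hints the model requested before answering
    par: int         # assumed par for the puzzle, counted in hints


def summarize(attempts):
    """Return {model: (success_rate, mean_score_vs_par)}.

    Assumption: score vs par = hints_used - par (negative = under par),
    averaged over solved puzzles only; None if a model solved nothing.
    """
    by_model = defaultdict(list)
    for a in attempts:
        by_model[a.model].append(a)

    summary = {}
    for model, rows in by_model.items():
        solved = [r for r in rows if r.solved]
        success_rate = len(solved) / len(rows)
        mean_score = (
            sum(r.hints_used - r.par for r in solved) / len(solved)
            if solved
            else None
        )
        summary[model] = (success_rate, mean_score)
    return summary


# Toy records (made up for illustration only).
toy = [
    Attempt("model-a", True, 3, 2),
    Attempt("model-a", True, 1, 2),
    Attempt("model-b", True, 0, 2),
    Attempt("model-b", False, 1, 2),
]
result = summarize(toy)
# model-a: (1.0, 0.0)  solves everything, exactly at par on average
# model-b: (0.5, -2.0) solves half, but well under par when it does
```

Averaging the par score over solved puzzles only keeps unsolved attempts from skewing it; those are already reflected in the success rate, which mirrors the trade-off between persistence and hint use described below.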

What we learned

Most LLMs usually reach the answer under par, but they differ in how persistent they are. For example, Gemini used fewer hints but also solved fewer puzzles, whereas gpt-4o solved every clue but often needed more hints.
