As I was building RiffRoll, I realized that the LLM I was using through Groq was not very funny. It kept talking about existential dread (which is probably a different problem with LLMs), and was sometimes so absurd as to not make sense.
To solve this problem, I started testing different language models to see which was funniest. However, I don't trust myself to know what's funny, so I started feeding the results back to other models to get a peer consensus.
Well, that turned into an entirely separate effort, and I ended up building and submitting another hackathon project: https://modelmash.site
In case you're wondering, xAI's Grok is the best at one-liners and dark humor, having been trained on billions of tweets, and some DeepSeek models are better at observational humor. Claude 3.5 Haiku seemed to be best at puns.
Of course, this warrants further study!
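For anyone curious what the peer-consensus idea looks like in code, here is a minimal sketch. It is not the actual ModelMash implementation. The names `Judge`, `peer_consensus`, and `rank` are hypothetical, and it assumes each judge is just a function that takes a joke and returns a score from 1 to 10. In practice each judge would wrap a call to a model's API, which is left out here.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical type: a judge takes a joke and returns a funniness score (1 to 10).
# In a real setup this would wrap an LLM API call; here it is any callable.
Judge = Callable[[str], float]


def peer_consensus(jokes: Dict[str, str], judges: Dict[str, Judge]) -> Dict[str, float]:
    """Average each joke's score across every judge except the model that wrote it."""
    results: Dict[str, float] = {}
    for author, joke in jokes.items():
        # Skip self-judging so a model cannot vote for its own joke.
        scores = [judge(joke) for name, judge in judges.items() if name != author]
        if scores:
            results[author] = mean(scores)
    return results


def rank(consensus: Dict[str, float]) -> List[str]:
    """Return model names ordered from funniest to least funny."""
    return sorted(consensus, key=consensus.get, reverse=True)


if __name__ == "__main__":
    # Stub judges with fixed scores, standing in for real model calls.
    jokes = {"model_a": "joke from A", "model_b": "joke from B"}
    judges = {
        "model_a": lambda joke: 5.0,
        "model_b": lambda joke: 7.0,
        "model_c": lambda joke: 9.0,
    }
    print(rank(peer_consensus(jokes, judges)))
```

Excluding each model from judging its own joke is a design choice made for this sketch, on the assumption that a model may favor its own output.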