Bots Against Alignment

Inspiration

We were inspired by the idea of multiple ChatGPT bots playing a game against each other to target an arbitrary alignment goal. Also, ChatGPT tends to tell bad jokes but is often unintentionally funny. This fits well with the Cards Against Humanity play style.

What it does

It's a game webpage where multiple users each enter an alignment target. These targets are appended together in random order to produce the aligner's goal. Because the append order is random and the combined goal stays secret during the game, tactics such as proposing reverse prompts become interesting. Users then give their bot a prompt to help it generate a possible response for the aligner. They can edit this prompt a limited number of times, based on how well their bot is performing.
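As a rough illustration of how the secret aligner goal could be assembled, here is a minimal sketch in Python. The function name build_aligner_goal, the space separator, and the optional rng parameter are assumptions made for this example; they are not taken from the project's actual code.

```python
import random


def build_aligner_goal(targets, rng=None):
    """Append the players' alignment targets in a random order.

    Hypothetical helper: the real game may join or format targets differently.
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(targets)   # copy so the caller's list is not mutated
    rng.shuffle(shuffled)      # random append order, unknown to the players
    return " ".join(t.strip() for t in shuffled)


if __name__ == "__main__":
    player_targets = [
        "Be polite.",
        "Talk like a pirate.",
        "Never mention cats.",
    ]
    print(build_aligner_goal(player_targets))
```

Passing a seeded random.Random instance makes the order reproducible for testing, while the default gives a fresh order each game.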

How we built it

It's a webpage with multiple ChatGPT Turbo instances running.

Challenges we ran into

Tying the front end and back end together was surprisingly difficult in the limited time frame.

Accomplishments that we're proud of

Multiple bots interacting is pretty awesome. Also, the alignment script being a random mishmash of user targets is interesting.

What we learned

You can't really treat an LLM like a program; it's really more like a very dumb agent.

What's next for Bots Against Alignment

Building in short-term memory so bots can adjust their behavior to the aligner turn by turn, and letting a human serve as the aligner.

Built With

fastapi
node.js
python
render.com
svelte
vercel

Updates

Conor Cox started this project — Apr 15, 2023 07:28 PM EDT
