Inspiration

AI has the power to do great good in the world, and great harm. While OpenAI and other companies have put safeguards in place to prevent misuse, many people have discovered prompt injection attacks that circumvent them. DAN (Do Anything Now) inspired me to see if AI could generate its own prompt injection attacks.

By generating prompt injection attacks automatically, we can find vulnerabilities in AI faster and patch them before malicious users abuse them.

What it does

IAN uses GPT4 to generate novel prompt injection attacks, tests each one against GPT4, and then checks whether the response to the attack violates OpenAI's rules. If the attack succeeds, it is added to the list of known attacks, which is fed back into the prompt used to generate new attacks.
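The generate, test, and judge cycle described above can be sketched roughly as follows. This is a minimal illustration and not IAN's actual code: the names `Generate`, `Target`, `Judge`, and `runLoop` are hypothetical, and the three model calls are passed in as plain functions so the loop itself makes no assumptions about how GPT4 is prompted.

```typescript
// Hypothetical types: none of these names come from the original project.
// Each function stands in for one call to a language model.
type Generate = (knownAttacks: string[]) => Promise<string>; // propose a new attack
type Target = (attack: string) => Promise<string>;           // model under test
type Judge = (response: string) => Promise<boolean>;         // true if rules were violated

interface LoopResult {
  attacks: string[];  // seed attacks plus every newly successful one
  attempts: number;   // number of candidates tried
  successes: number;  // number of candidates the judge flagged
}

async function runLoop(
  seedAttacks: string[],
  generate: Generate,
  target: Target,
  judge: Judge,
  rounds: number,
): Promise<LoopResult> {
  const attacks = [...seedAttacks]; // copy so the caller's list is not mutated
  let successes = 0;

  for (let i = 0; i < rounds; i++) {
    // 1. Ask the generator for a candidate, showing it the attacks known so far.
    const candidate = await generate(attacks);
    // 2. Send the candidate to the model under test.
    const response = await target(candidate);
    // 3. Keep the candidate only if the judge says the response broke the rules.
    if (await judge(response)) {
      attacks.push(candidate);
      successes++;
    }
  }

  return { attacks, attempts: rounds, successes };
}
```

Under these assumptions, a success rate is simply `successes / attempts`, and each successful candidate becomes part of the input to the next round of generation.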

How we built it

IAN is an Angular web app that calls the GPT4 API. The bulk of the project is prompt engineering.
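As a rough idea of what a call from a web app to the GPT4 API might look like, the sketch below builds a request for OpenAI's Chat Completions endpoint. The write-up does not say which endpoint or model name the project used, so the URL, the `gpt-4` model string, and the helper `buildChatRequest` are assumptions made for illustration only.

```typescript
// Assumption: the app talks to OpenAI's Chat Completions endpoint.
const CHAT_URL = "https://api.openai.com/v1/chat/completions";

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the request without sending it,
// so the payload can be inspected or passed to fetch() later.
function buildChatRequest(
  apiKey: string,
  prompt: string,
  model: string = "gpt-4", // assumed model name
): ChatRequest {
  return {
    url: CHAT_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // A single user message carrying the prompt text.
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest(key, "Hello");
//   const data = await (await fetch(url, init)).json();
//   const text = data.choices[0].message.content;
```

Separating request construction from the network call keeps the prompt-building logic easy to test without an API key.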

Challenges we ran into

The main challenges were prompt engineering and finding known-good prompt injection attacks to seed the initial prompt.

Accomplishments that we're proud of

Running IAN, I have discovered ~500 novel prompt injection attacks. IAN has a success rate of 77%.

What we learned

GPT4 is a criminal mastermind if you ask it the right way.

What's next for IAN (Inject Anything Now)

Offer white-hat AI hacking to companies that want to discover their vulnerabilities before the public does.
