IGGY


Project Introduction

Dear Grandma is all about pushing AI agents to their limits, responsibly. Our goal is to break, probe, and really understand how these models behave when challenged, so they can be made safer before reaching the real world.

Project Aims

We work as a responsible red team to tackle the hidden weaknesses of advanced AI. Our main goal is to find and understand critical risks and behaviours in modern models, including black-box ones.

Given seven black-box animal agents, we probe their behaviour with adversarial queries. Each agent is mapped to its technical framework, underlying model, and architectural design based on observable responses, latency, and error patterns.
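The mapping step above can be sketched as a simple fingerprinting heuristic. Everything below is illustrative: the framework names, latency values, and error signatures are hypothetical stand-ins, not measurements from the actual agents.

```python
import statistics

# Hypothetical fingerprint table: mean response latency (seconds) and a
# characteristic error substring per candidate stack (illustrative only).
FINGERPRINTS = {
    "langchain-llama3.3": {"latency": 2.1, "error": "OutputParserException"},
    "bedrock-novapremier": {"latency": 0.9, "error": "ThrottlingException"},
}

def fingerprint(latencies, error_text):
    """Guess an agent's stack from observed latencies and error messages.

    An error-signature match wins outright; otherwise fall back to the
    candidate whose expected mean latency is closest to what we observed.
    """
    for name, fp in FINGERPRINTS.items():
        if fp["error"] in error_text:
            return name
    mean = statistics.mean(latencies)
    return min(FINGERPRINTS, key=lambda n: abs(FINGERPRINTS[n]["latency"] - mean))
```

In practice the table would be built from repeated probes of each agent, with the error column populated from whatever stack traces or provider errors leak through.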

Methodologies

Our methodology emphasises comprehensive coverage and measurable outcomes. We conduct repeated attacks across all relevant categories and document the Attack Success Rate (ASR) for each agent, ensuring findings are quantifiable and reproducible.
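The per-agent ASR bookkeeping described above reduces to a simple ratio: successful jailbreaks over total attempts, tracked per (agent, attack category) pair. A minimal sketch (the agent and category names are placeholders):

```python
def attack_success_rate(outcomes):
    """ASR = successful jailbreaks / total attack attempts."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)

# Illustrative bookkeeping: one boolean per attack attempt,
# keyed by (agent, attack category).
results = {("agent_1", "role-play"): [True, False, True, True]}
asr = {key: attack_success_rate(v) for key, v in results.items()}
# asr[("agent_1", "role-play")] → 0.75
```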

Red-Team AI Security Testing: uncovering vulnerabilities with the following methods:

  • PAIR
  • Role-play
  • STI Jinja
  • DAN
  • Please Attack

Success Rate Calculation

Responses are parsed in the following order to determine whether a prompt successfully jailbroke the system:

  • RegEx: key-phrase detection as an initial heuristic pass over the response
  • Semantic Analysis: the Sentence Transformer all-MiniLM-L6-v2 detects refusal semantics, e.g. "I'm sorry, I can't..."
  • LLM as a Judge: Gemini-2.0-flash-001 is called with the attack prompt and response to determine whether the jailbreak succeeded

Built With

  • all-minilm-l6-v2
  • bedrock
  • gemini-2.0-flash-001
  • llama3.3
  • novapremier