Inspiration
These days, AI systems are very effective at agreeing with users. While this improves fluency and usability, it introduces a serious issue: confidence is often mistaken for correctness.
Many incorrect answers, whether in exams, design decisions, or logical reasoning, appear convincing at first glance but fall apart under closer inspection. Most AI tools behave as validators rather than skeptics, whereas human reasoning improves through challenge, debate, and counter-arguments.
This led to a simple but powerful question: What if an AI tried to disprove me before helping me be right?
That question became the foundation of REFUTE, a system designed to apply adversarial reasoning and accept answers only after they survive logical stress tests.
What it does
REFUTE is an adversarial reasoning engine where users submit a claim, solution, or response. Instead of validating the input, the system actively attempts to refute it by:
- Exposing hidden assumptions
- Generating edge cases and counter-examples
- Identifying logical inconsistencies and contradictions
After each challenge round, users can revise their response. Only when the input passes multiple rounds of adversarial analysis does the system issue an acceptance verdict, along with a step-by-step logical justification.
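To make that flow concrete, here is a minimal sketch of the challenge loop in Python. The round count, verdict field names, and helper signatures are my assumptions for illustration, not the actual implementation:

```python
from typing import Callable

def run_refutation(
    claim: str,
    challenge: Callable[[str], dict],        # model attempts to refute the claim
    revise: Callable[[str, list[str]], str], # user revises given counter-arguments
    rounds_required: int = 3,                # assumed number of survival rounds
) -> str:
    """Accept a claim only after it survives several consecutive
    adversarial rounds; any successful refutation sends it back
    to the user for revision and restarts the streak."""
    survived = 0
    while survived < rounds_required:
        verdict = challenge(claim)
        if verdict["refuted"]:
            claim = revise(claim, verdict["counter_arguments"])
            survived = 0   # a revision must face the full gauntlet again
        else:
            survived += 1  # claim withstood this round
    return claim           # acceptance verdict is issued for this claim
```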
How I built it
REFUTE is built as a layered web application with a structured reasoning pipeline. User inputs are processed by a backend service that constructs adversarial prompts for the Gemini 3 API. Gemini 3 is instructed to return responses in a strict JSON schema, separating verdicts, arguments, counter-arguments, and reasoning steps.
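As a rough illustration of that pipeline step, the sketch below uses the google-genai Python SDK's structured-output support, with a Pydantic model standing in for the strict schema. The model id is a placeholder and the schema fields simply mirror the separation described above; neither is taken from the actual codebase:

```python
from google import genai
from google.genai import types
from pydantic import BaseModel

class Refutation(BaseModel):
    verdict: str                  # e.g. "refuted" or "survived" (assumed values)
    arguments: list[str]          # points supporting the claim
    counter_arguments: list[str]  # points attacking the claim
    reasoning_steps: list[str]    # step-by-step justification

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def challenge(claim: str) -> Refutation:
    """Build an adversarial prompt and return a schema-validated verdict."""
    response = client.models.generate_content(
        model="gemini-3-flash",  # placeholder id for whichever Gemini model is used
        contents=(
            "Attempt to refute the following claim. Expose hidden assumptions, "
            "generate edge cases and counter-examples, and identify logical "
            f"inconsistencies.\n\nClaim: {claim}"
        ),
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=Refutation,  # SDK constrains output to this shape
        ),
    )
    return Refutation.model_validate_json(response.text)
```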
The backend validates and parses this structured output before storing results and rendering them in the frontend. This approach ensures that AI reasoning remains controlled, explainable, and auditable rather than free-form.
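A small wrapper along these lines keeps any free-form or malformed output from ever reaching storage or the frontend; the retry budget is an assumption for the sketch:

```python
from pydantic import ValidationError

def validated_challenge(claim: str, retries: int = 2) -> Refutation:
    """Re-prompt the model if its output violates the JSON schema,
    so only parseable, auditable verdicts are stored and rendered."""
    last_error: Exception | None = None
    for _ in range(retries + 1):
        try:
            return challenge(claim)  # from the sketch above; raises on bad JSON
        except ValidationError as err:
            last_error = err         # schema violation: retry the generation
    raise RuntimeError("model failed to produce schema-valid output") from last_error
```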
Challenges I ran into
One major challenge was preventing the AI from agreeing too quickly with the user’s input. Designing prompts that consistently encouraged skepticism and counter-reasoning required multiple iterations. Another challenge was enforcing structured JSON outputs from a generative model while maintaining reasoning quality.
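For a sense of what skepticism-forcing instructions can look like, here is a hypothetical reconstruction; this is illustrative only, not the prompt REFUTE actually ships with:

```python
# Hypothetical system prompt illustrating the skepticism constraints.
ADVERSARIAL_SYSTEM_PROMPT = """\
You are a rigorous skeptic. Never validate the user's claim outright.
For every claim you must:
1. List its hidden assumptions.
2. Construct at least two edge cases or counter-examples.
3. Mark the claim "survived" only if every attack you generate fails.
Respond strictly in the requested JSON schema; no free-form prose.
"""
```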
Accomplishments that I'm proud of
- Built a working adversarial reasoning engine rather than a standard AI assistant.
- Successfully transformed Gemini 3 into a logical challenger, not just a responder.
- Designed an explainable system that separates verdicts, arguments, and reasoning.
What I learnt
I learnt that strong AI systems are not defined by agreement, but by their ability to challenge assumptions. Structured prompting and schema enforcement significantly improve reliability, transparency, and trust in AI-generated reasoning.
What's next for REFUTE
Next, I plan to add scoring for reasoning strength, domain-specific reasoning modes (education, software design, policy analysis), and support for collaborative debates where multiple claims can be evaluated simultaneously.