CogniShield: Confidence Is Not Accuracy

Inspiration

CogniShield began with a shared observation: AI tools often respond with striking confidence, even when the information may be incomplete or incorrect. During casual experimentation, we noticed something subtle but important. When we asked neutral questions, the AI responded carefully. But when we confidently framed a false claim as fact, the tone of the response sometimes shifted. It became softer. More aligned. Less corrective. That moment raised a bigger concern. If AI mirrors the confidence of the user, what happens in professional environments where confidence already signals authority? CogniShield was born from that question.

What it does

CogniShield highlights the gap between fluency and factual accuracy in AI-generated responses. It demonstrates how conversational AI systems are designed to be helpful, polite, and aligned, which can sometimes result in agreement bias. Through a short animated explainer and structured testing, the project shows how phrasing, tone, and user confidence influence AI responses. The goal is not to criticize AI systems, but to encourage users to approach them with awareness and critical thinking.

How we built it

We developed CogniShield as a lightweight Chrome extension that integrates directly into AI chat interfaces. The extension analyzes AI responses in real time and displays a side-panel diagnostic view, highlighting behavioral indicators such as sycophancy score, concessive agreement, emotional anchoring, and PII risk. When the sycophancy score exceeds a defined threshold, CogniShield provides a prompt refinement suggestion to help users reframe their query more critically. Users can accept, modify, or dismiss the suggested prompt, maintaining full control over the interaction. The extension also enables users to fact-check responses and submit corrections, which strengthens both user awareness and system evaluation. Rather than restricting AI outputs, CogniShield promotes responsible usage through guided intervention and user-driven validation.
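
The threshold-and-suggestion flow described above can be sketched roughly as follows. This is an illustrative sketch, not the extension's actual code: the names `SYCOPHANCY_THRESHOLD`, `suggestRefinement`, and `resolvePrompt`, the 0.7 cutoff, the assumption that the score is normalized to the range 0 to 1, and the wording of the suggested prompt are all hypothetical.

```typescript
// Hypothetical cutoff; the real threshold is not stated in this write-up.
const SYCOPHANCY_THRESHOLD = 0.7;

// The three choices a user has when shown a suggested prompt.
type UserDecision =
  | { kind: "accept" }
  | { kind: "modify"; editedPrompt: string }
  | { kind: "dismiss" };

// Returns a reframed prompt only when the score exceeds the threshold,
// otherwise null (no intervention). The score is assumed to lie in [0, 1].
function suggestRefinement(
  originalPrompt: string,
  sycophancyScore: number
): string | null {
  if (sycophancyScore <= SYCOPHANCY_THRESHOLD) {
    return null;
  }
  // Illustrative template only.
  return (
    "Evaluate the following critically and point out any errors " +
    "before answering: " + originalPrompt
  );
}

// Applies the user's decision so that the user keeps full control
// over which prompt is actually sent.
function resolvePrompt(
  originalPrompt: string,
  suggestion: string | null,
  decision: UserDecision
): string {
  if (suggestion === null) {
    return originalPrompt;
  }
  switch (decision.kind) {
    case "accept":
      return suggestion;
    case "modify":
      return decision.editedPrompt;
    case "dismiss":
      return originalPrompt;
  }
}
```

Keeping the suggestion step separate from the decision step means the extension never rewrites a prompt on its own; it only proposes one.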

Challenges we ran into

One major challenge was balance. We did not want to present AI systems as unreliable or deceptive. That would be misleading. At the same time, ignoring their limitations would undermine the purpose of CogniShield. Striking a responsible, evidence-based tone required careful framing. Another challenge was explaining how large language models work without oversimplifying them. These systems generate responses based on patterns and probabilities, not factual verification. Communicating this clearly, without reducing it to “AI just guesses,” required thoughtful language. We also faced a technical challenge: large language models are non-deterministic. The same prompt can produce different responses across attempts. This variability made it difficult to consistently measure behaviors like sycophancy or agreement bias, since outputs are not fixed. Designing a scoring mechanism that accounts for this unpredictability required iteration and testing. Finally, tone was critical. Too dramatic, and the project risks sounding alarmist. Too neutral, and the message loses urgency. Finding the right balance between caution and clarity was one of the most important challenges we faced.
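
One common way to reduce sensitivity to non-deterministic outputs is to score several responses to the same prompt and look at the average and the spread, rather than trusting a single score. The sketch below illustrates that general idea only; it is not the scoring mechanism we shipped, and `summarizeScores`, `isConsistentlyAbove`, and their parameters are hypothetical names chosen for the example.

```typescript
// Summary of sycophancy scores collected from repeated runs of one prompt.
interface ScoreSummary {
  mean: number;
  stdDev: number; // population standard deviation
  samples: number;
}

// Aggregates per-response scores into a mean and a spread.
function summarizeScores(scores: number[]): ScoreSummary {
  if (scores.length === 0) {
    throw new Error("at least one score is required");
  }
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) /
    scores.length;
  return { mean, stdDev: Math.sqrt(variance), samples: scores.length };
}

// Flags a prompt only when the average score is above the threshold AND
// the runs agree with each other closely enough (hypothetical rule).
function isConsistentlyAbove(
  summary: ScoreSummary,
  threshold: number,
  maxStdDev: number
): boolean {
  return summary.mean > threshold && summary.stdDev <= maxStdDev;
}
```

For example, `summarizeScores([0.9, 0.9])` yields a mean of 0.9 with zero spread, so it would be flagged against a 0.7 threshold, while a set of widely scattered scores would not be flagged even if its mean were high.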

Accomplishments that we're proud of

We’re proud that CogniShield sparks reflection rather than fear. Instead of attacking AI, it reframes the conversation around responsibility. We successfully translated two technical behaviors, probabilistic text generation and alignment tuning, into something understandable in under a minute. We’re also proud of creating a professional, polished explainer that communicates a nuanced issue clearly and concisely. Most importantly, we started a conversation about critical thinking in the age of intelligent tools.

What we learned

We learned that AI’s power lies in its fluency, and that fluency can be persuasive. We learned that humans are naturally influenced by confident language, even when accuracy isn’t guaranteed. We also learned that responsible AI usage isn’t just about better models. It’s about better users. Technology evolves quickly. Judgment must evolve with it.

What's next for CogniShield

Next, we will refine our scoring models so that detection is more reliable across varied AI responses and less sensitive to non-deterministic outputs. We plan to incorporate structured user feedback to strengthen evaluation accuracy and introduce clearer contextual explanations when high-alignment or sycophantic behavior is detected. Our long-term objective is to establish CogniShield as a lightweight trust layer for generative AI, supporting more deliberate, evidence-based decision-making in professional settings.
