veri-TA-serum: Deception Mirror (Made by Tanmay and Ayan)

Inspiration

We noticed that people often make bold claims about money, fitness, or relationships that collapse under scrutiny.
Traditional apps track behavior, but very few challenge self-deception directly.
Inspired by research on model honesty, interpretability, and adversarial debate, we designed a system that pairs linear probing of activations with symbolic sanity checks and debate agents.
Our vision: help users test their own narratives with supportive reframes, evidence, and counter-perspectives.


What it does

  • Claim Input: Accepts text or voice claims, with optional context.
  • Self-Deception Radar: Probes hidden activations to assign a deception risk score.
  • Counter-Narrative Generation: Produces supportive reframes, evidence, and counter-perspectives that challenge the claim.
  • Symbolic Program Execution: Abstracts claims into schemas, normalizes units, and executes domain checks (finance, fitness, career, history, relationships); a sketch of the abstraction schema follows this list.
  • Debate Mode: Advocate vs. Skeptic agents stress-test assumptions and produce verdicts.
  • Mirror Log: Stores abstractions, probe scores, debates, and checks. Users can save, delete, or export.
  • Vertical Selection: Tailors checks and abstractions by domain (finance, fitness, history, career, relationships).
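
As a rough illustration of the schema abstraction step, here is a minimal TypeScript sketch of what an abstracted claim might look like; the field names (vertical, quantities, timeframeDays) are illustrative assumptions, not the project's actual types.

```typescript
// Hypothetical shape of an abstracted claim; field names are illustrative
// assumptions, not the project's actual types.
type Vertical = "finance" | "fitness" | "history" | "career" | "relationships";

interface Quantity {
  value: number; // numeric amount extracted from the claim
  unit: string;  // e.g. "kg", "%/month", "kcal"
}

interface AbstractedClaim {
  rawText: string;        // original user claim
  vertical: Vertical;     // selected or inferred domain
  quantities: Quantity[]; // normalized figures fed into the domain checks
  timeframeDays?: number; // optional horizon, normalized to days
}

// Example abstraction of the fitness claim used later in this write-up.
const example: AbstractedClaim = {
  rawText: "I'll lose 10 kg in 3 days",
  vertical: "fitness",
  quantities: [{ value: 10, unit: "kg" }],
  timeframeDays: 3,
};
```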

How we built it

  • Frontend: Next.js (App Router) with TypeScript, Tailwind CSS, and shadcn/ui components.
  • AI Core: Genkit.ai flows using GPT-OSS models for probing, abstraction, and debates.
  • Validation: Zod schemas for input/output contracts, integrated with React Hook Form (a hedged contract sketch follows this list).
  • State & Logs: Managed via custom hooks like useMirrorLog.
  • Animations: Framer Motion; Icons from lucide-react.
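
For context, a hedged sketch of what such Zod contracts could look like; the schema names and fields here are assumptions for illustration, not the project's actual code.

```typescript
import { z } from "zod";

// Hypothetical input contract for a claim; exact fields are assumptions.
export const ClaimInputSchema = z.object({
  claim: z.string().min(1, "Enter a claim to test"),
  context: z.string().optional(),
  vertical: z.enum(["finance", "fitness", "history", "career", "relationships"]),
});
export type ClaimInput = z.infer<typeof ClaimInputSchema>;

// Hypothetical output contract for a probe-plus-abstraction flow.
export const MirrorResultSchema = z.object({
  riskScore: z.number().min(0).max(1), // calibrated deception risk
  reframe: z.string(),                 // supportive counter-narrative
  checks: z.array(z.string()),         // symbolic check verdicts
});
export type MirrorResult = z.infer<typeof MirrorResultSchema>;

// On the client, the same input schema can back React Hook Form via
// useForm<ClaimInput>({ resolver: zodResolver(ClaimInputSchema) }).
```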

Probe math

We trained a lightweight logistic probe to estimate deception risk:

$$ \text{risk} = \sigma\!\left(\mathbf{w}^\top \mathbf{h} + b\right), \quad \sigma(x) = \frac{1}{1+e^{-x}} $$

  • Calibration: Temperature scaling + Platt scaling
  • Final Score: Blends probe and symbolic results

$$ s_{\text{final}} = \alpha\, s_{\text{probe}} + (1-\alpha)\, s_{\text{symbolic}}, \quad \alpha \in [0,1] $$
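
A minimal numeric sketch of this scoring path, assuming the hidden state $\mathbf{h}$ is already extracted as a plain vector; the temperature and $\alpha$ values below are illustrative placeholders, not the calibrated ones.

```typescript
// Sketch of the probe-plus-blend scoring; weights, temperature, and alpha
// are illustrative placeholders, not the trained or calibrated values.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function probeRisk(h: number[], w: number[], b: number, temperature = 1.5): number {
  // Linear probe logit: w^T h + b
  const logit = h.reduce((sum, hi, i) => sum + w[i] * hi, 0) + b;
  // Temperature scaling divides the logit before the sigmoid; Platt scaling
  // would instead refit a scale and offset on held-out labels.
  return sigmoid(logit / temperature);
}

function finalScore(sProbe: number, sSymbolic: number, alpha = 0.6): number {
  // s_final = alpha * s_probe + (1 - alpha) * s_symbolic, with alpha in [0, 1]
  return alpha * sProbe + (1 - alpha) * sSymbolic;
}

// Toy usage:
const risk = probeRisk([0.2, -1.1, 0.7], [0.5, -0.3, 0.8], -0.1);
const blended = finalScore(risk, 0.9); // symbolic checks also flagged the claim
```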

Symbolic checks

  • Finance: normalize to monthly units, check compounding and rate plausibility.

    • Example: if $r_{\text{monthly}} > 30\%$ without leverage → flag risk.
  • Fitness: caloric feasibility.

    • Example: claiming to lose $10\ \text{kg}$ in $3$ days requires

$$ 10 \times 7700 = 77{,}000 \ \text{kcal deficit} \approx 25{,}667 \ \text{kcal/day}, $$

      which violates physiology.
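
A hedged sketch of how the two checks above could be coded; the 30% monthly-rate cap and the 7,700 kcal/kg figure come from the examples, while the 2,000 kcal/day feasible-deficit ceiling is an assumed placeholder rather than the project's actual threshold.

```typescript
// Illustrative symbolic checks. The 30% rate cap and 7,700 kcal/kg come from
// the examples above; the daily-deficit ceiling is an assumed placeholder.
const KCAL_PER_KG = 7700;
const MAX_FEASIBLE_DAILY_DEFICIT_KCAL = 2000; // assumption, not from the write-up

interface CheckResult {
  flagged: boolean;
  reason?: string;
}

function financeRateCheck(monthlyReturnPct: number, usesLeverage: boolean): CheckResult {
  if (monthlyReturnPct > 30 && !usesLeverage) {
    return { flagged: true, reason: `${monthlyReturnPct}%/month without leverage is implausible` };
  }
  return { flagged: false };
}

function fitnessDeficitCheck(kgToLose: number, days: number): CheckResult {
  const dailyDeficit = (kgToLose * KCAL_PER_KG) / days;
  if (dailyDeficit > MAX_FEASIBLE_DAILY_DEFICIT_KCAL) {
    return {
      flagged: true,
      reason: `Requires ~${Math.round(dailyDeficit)} kcal/day deficit, beyond physiological limits`,
    };
  }
  return { flagged: false };
}

// Usage: the 10 kg in 3 days claim above.
console.log(fitnessDeficitCheck(10, 3)); // flagged, ~25,667 kcal/day
```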


Challenges we ran into

  • Activation access: Aligning tokenization and hidden states across inference runs.
  • Calibration: Mapping probe scores to meaningful real-world thresholds.
  • Coverage: Designing symbolic templates that handle messy human claims.
  • Latency: Balancing debate depth with response times.
  • UX Clarity: Ensuring reframes are constructive, not adversarial.
  • Privacy: Giving users total control of Mirror Logs.

Accomplishments that we're proud of

  • Integrated linear probes + symbolic checks + debate in one pipeline.
  • Built vertical-specific abstractions (finance, fitness, history, career, relationships).
  • Delivered a clean Next.js + Tailwind + shadcn/ui app with Genkit flows.
  • Designed a Mirror Log for transparency, control, and auditability.

What we learned

  • Honest-by-design AI emerges from hybrid symbolic + neural methods.
  • Simple probes, when calibrated, are surprisingly effective for deception triage.
  • Debates with unit conversions make users more open to revising claims.
  • Domain-specific abstractions reduce false positives and increase usefulness.

What’s next for veri-TA-serum

  • Expand unit libraries and symbolic templates per vertical.
  • Support on-device probing for privacy-preserving inference.
  • Build an evaluation harness with deception-specific benchmarks.
  • Add plugin API for extensibility (e.g., medicine, politics).
  • Explore human-in-the-loop adjudication for high-stakes claims.

References & Related Work

Zhou, B., Jain, S., Zhang, Y., Ning, Q., Wang, S., Benajiba, Y., & Roth, D. (2025). Self-supervised Analogical Learning using Language Models.

Golechha, S., & Garriga-Alonso, A. (2025). Among Us: A Sandbox for Agentic Deception.


Key takeaway: veri-TA-serum blends neural probes, symbolic checks, and adversarial debate to help users challenge their own claims — turning self-deception into an opportunity for reflection.
