About The Project

Inspiration

Academic integrity tools often punish first and ask questions later. We wanted the opposite: verify the author, not just the answer. The idea for Gotcha came from watching classmates and lecturers struggle—detectors flagging genuine work, students losing trust, and staff drowning in manual checks. Our key insight: a student already has a signature style. If we can baseline that style under invigilated conditions and compare later work to it, we can flag suspicious drift with context and keep a human in the loop.

What it does

Gotcha builds a per-student style profile and uses it to flag anomalies in future submissions.

• Baseline mode: extracts stylometric/code features from a student’s first invigilated submission and stores a style profile.
• Compare mode: for new work, computes features again, compares them to the baseline, and calls a Vertex AI AutoML Tabular classifier to estimate “AI-assist likelihood.”
• Returns a structured JSON summary with:
  • an AI usage score (probability mapped to %),
  • style-drift evidence (e.g., sudden jumps in vocabulary sophistication or code formatting),
  • a cautious verdict (unlikely / partial / likely), and
  • suggested next steps (e.g., live reproduction, oral check).
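In practice, a Compare-mode response might look something like the sketch below. The field names and values are illustrative only, not the exact schema Gotcha returns:

```python
import json

# Hypothetical shape of a Compare-mode response (field names are
# illustrative; the real schema may differ).
result = {
    "ai_usage_score": 72,          # model probability mapped to a percentage
    "verdict": "likely",           # one of: unlikely / partial / likely
    "evidence": [
        "type-token ratio jumped well above the student's baseline",
        "comment ratio dropped sharply versus invigilated work",
    ],
    "next_steps": ["live reproduction", "oral check"],
    "model_version": "automl-tabular-v3",   # logged for audit trails
}

print(json.dumps(result, indent=2))
```

Keeping the verdict coarse (three buckets) and pairing it with evidence and next steps is what lets the output read as a signal for a human reviewer rather than an accusation.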

How we built it

• Feature extraction (server-side)
  • Essays: token counts, sentence stats, type-token ratio, punctuation & caps ratios, simple “burstiness,” etc.
  • Code: LOC, comment/blank ratios, identifier lengths, naming patterns (camelCase vs snake_case), import counts, a cyclomatic-complexity proxy, and nesting depth.
• Training (Vertex AI AutoML Tabular)
  • CSVs with a label plus tabular features only (no raw text/code).
  • Predefined split via a split column (TRAIN/VALIDATE/TEST) to prevent empty-validation errors.
  • Auto-augmentation (demo-only) to satisfy AutoML’s ~1,000-row minimum.
  • Endpoint reuse plus evaluation printing (AUC/PR, selected confidence points).
• Serving
  • Deployed the trained model to a Vertex AI Endpoint.
  • A /predict route lets a teacher pick any uploaded file, choose Baseline or Compare, and get JSON results.
  • We added request timeouts and a baseline fast path (skipping the Vertex call when just registering style).
• Governance
  • Stored per-student baselines in MongoDB.
  • Logged features and the model version with each decision to support audits and human review.
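The essay side of the feature extractor can be sketched in a few lines of Python. The function and feature names here are illustrative, not the exact server-side implementation:

```python
import re

def essay_features(text: str) -> dict:
    """Illustrative stylometric features for an essay.

    A sketch of the idea, not the production extractor: real feature
    names and tokenization rules may differ.
    """
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    n_tokens = max(len(tokens), 1)
    return {
        "token_count": len(tokens),
        "sentence_count": len(sentences),
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(set(tokens)) / n_tokens,   # vocabulary richness
        "punct_ratio": sum(c in ",.;:!?" for c in text) / n_chars,
        "caps_ratio": sum(c.isupper() for c in text) / n_chars,
    }
```

Because every feature is a plain number, the output rows drop straight into the tabular CSVs used for AutoML training, and no raw text ever leaves the server.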

Challenges we ran into

• AutoML’s 1,000-row minimum: our initial sets were tiny, so training failed (“Too few rows”). We solved this by (a) generating richer synthetic data for the demo and (b) adding bootstrapped augmentation with tiny numeric noise to reach the floor.
• Split configuration gotcha: AutoML expects predefined_split_column_name="split" (not the older parameter). Using the wrong kwarg threw an error mid-hackathon.
• Endpoint/network hangs: long-running prediction calls made the server feel frozen. We added AbortController timeouts and made Baseline mode skip predictions entirely.
• Distribution-shift risks: invigilated baselines and take-home assignments can differ in topic and length. We addressed this with threshold tuning and by framing outputs as signals, not verdicts.
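The bootstrapped augmentation can be sketched as follows. This is a demo-only stand-in, assuming rows of numeric features plus a string label; the function name and noise level are hypothetical:

```python
import random

def augment_rows(rows, target=1000, noise=0.01, seed=42):
    """Bootstrap-resample feature rows, jittering numeric fields with
    small Gaussian noise, until the AutoML row floor is reached.

    Demo-only: synthetic rows inflate the count but add no real signal,
    so this should never be used on production training data.
    """
    rng = random.Random(seed)
    out = list(rows)
    while len(out) < target:
        base = rng.choice(rows)
        out.append({
            k: v * (1 + rng.gauss(0, noise))
            if isinstance(v, (int, float)) and not isinstance(v, bool)
            else v  # labels and other non-numeric fields pass through
            for k, v in base.items()
        })
    return out
```

Seeding the RNG keeps the augmented CSV reproducible across training runs, which made debugging the split configuration much easier.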

Accomplishments that we’re proud of

• A full baseline → compare → score loop powered by Vertex AI and usable from a simple web UI.
• A feature pipeline that works for both essays and code, without storing raw content in the model.
• Clear, human-friendly outputs that surface evidence instead of black-box scores.
• Pragmatic engineering: endpoint reuse, proper splits, batch-prediction support, and resilient server behavior.

What we learned

• Model ≠ decision. The most valuable piece is the operating threshold and how you present uncertainty to humans.
• Simple features go far. Clean, interpretable tabular features plus AutoML make a strong baseline.
• MLOps matters early. Endpoint reuse, health checks, timeouts, and schema locking saved us from demo-day surprises.
• Ethics by design. It’s crucial to keep a human in the loop and to avoid punitive framing—“style inconsistency” is not “cheating.”

What’s next for Gotcha

• Richer, privacy-safe features: add transformer-based style embeddings computed locally, storing only vectors, not text.
• Group-aware splitting: strict student-wise splits to further reduce leakage; optionally per-course baselines.
• Adaptive thresholds: course- or cohort-specific calibration using validation curves.
• Evaluation dashboards: precision/recall curves, confusion matrices, and drift plots over time.
• Policy hooks: integrations for oral checks, IDE keystroke timelines, or LMS plagiarism APIs—always opt-in and transparent.
• Custom training path (still on Vertex): if real datasets are small, offer an XGBoost/scikit-learn route (no 1,000-row floor) alongside AutoML.
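The adaptive-threshold idea can be sketched with a simple grid search over validation scores. This is a minimal illustration, assuming per-submission model scores and binary labels; pick_threshold is a hypothetical helper, not part of the current codebase:

```python
def pick_threshold(scores, labels, grid=None):
    """Pick the operating threshold that maximizes F1 on validation data.

    A stand-in for per-course calibration: each course (or cohort) would
    run this on its own validation set to get its own threshold.
    """
    grid = grid or [i / 100 for i in range(1, 100)]

    def f1(t):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        return 2 * tp / max(2 * tp + fp + fn, 1)

    return max(grid, key=f1)
```

F1 is just one choice of objective; a deployment that fears false accusations more than misses would instead fix a precision floor and maximize recall under it.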

Built With
