Inspiration
Developer tools often assume the AI knows what it's doing, but what if the human doesn't? We built VibeCheck to ensure that AI-assisted code changes come with genuine understanding and not just autocomplete.
What it does
VibeCheck is a knowledge gate and QA loop scaffold for Claude Code that:
- Intercepts code mutations before they're applied
- Evaluates complexity against the user's demonstrated competence
- Generates targeted questions using LLMs that probe understanding of why the change works, not just what it does
- Adapts difficulty across 3 scaffolding levels (conceptual → guided → hinting)
- Tracks competence over time in a persistent YAML model
Small, safe changes pass automatically. Complex changes (concurrency, async patterns, multiprocessing) trigger an interactive QA session that either validates understanding or applies a competence penalty.
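The gate decision and attempt-based scaffolding described above might look roughly like the sketch below. This is an illustration under stated assumptions, not VibeCheck's actual code: the names `gate`, `GateDecision`, `Scaffold`, and `scaffold_for_attempt`, the shared 0.0 to 1.0 score scale, and the `margin` parameter are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Scaffold(Enum):
    """The three scaffolding levels, in the order attempts escalate."""
    CONCEPTUAL = 1
    GUIDED = 2
    HINTING = 3


@dataclass(frozen=True)
class GateDecision:
    passed: bool  # True means the change is applied without a QA session
    reason: str


def gate(complexity: float, competence: float, margin: float = 0.0) -> GateDecision:
    """Pass automatically when competence covers the change's complexity.

    Assumption: both scores share a 0.0-1.0 scale (hypothetical).
    """
    if complexity <= competence + margin:
        return GateDecision(True, "complexity within demonstrated competence")
    return GateDecision(False, "complexity exceeds demonstrated competence")


def scaffold_for_attempt(attempt: int) -> Scaffold:
    """Map a 1-based attempt number to a scaffolding level, capped at HINTING."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return Scaffold(min(attempt, len(Scaffold)))
```

In this sketch a failed gate would start a QA session at `Scaffold.CONCEPTUAL` and move one level per attempt, so a third attempt gets the most support.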
How we built it
Built with a Python-first architecture, LangChain for structured outputs, and OpenRouter for model access.
Challenges we ran into
- LLM consistency: Structured outputs from LLMs required careful prompt engineering and output parsing
- Question scaffolding: Balancing difficulty across 3 attempt levels without making questions trivial
- Evaluation strictness: Finding the right criteria for different question types (true/false vs. plain English vs. faded examples)
- Test isolation: Mocking the OpenRouter client while maintaining realistic test coverage
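Two of these challenges, output parsing and test isolation, can be illustrated together. The sketch below normalizes a model response that may arrive as a dict or as an attribute-style object, and exercises it with a `unittest.mock.Mock` instead of a real client. The `Question` shape, `coerce_question`, `ask`, and the `generate_question` method are hypothetical names chosen for this example, not the project's real interfaces.

```python
from dataclasses import dataclass
from typing import Any
from unittest.mock import Mock


@dataclass(frozen=True)
class Question:
    kind: str    # e.g. "true_false", "plain_english", "faded_example"
    prompt: str


def coerce_question(raw: Any) -> Question:
    """Accept either a mapping or an attribute-style object and validate it."""
    if isinstance(raw, dict):
        kind, prompt = raw.get("kind"), raw.get("prompt")
    else:
        kind, prompt = getattr(raw, "kind", None), getattr(raw, "prompt", None)
    if not isinstance(kind, str) or not isinstance(prompt, str) or not prompt.strip():
        raise ValueError(f"malformed question payload: {raw!r}")
    return Question(kind=kind.strip().lower(), prompt=prompt.strip())


def ask(client: Any, diff: str) -> Question:
    """Call a hypothetical client.generate_question(diff) and normalize the result."""
    return coerce_question(client.generate_question(diff))


# Test double standing in for the real model client: no network calls are made.
fake_client = Mock()
fake_client.generate_question.return_value = {
    "kind": " True_False ",
    "prompt": "Does asyncio.gather run coroutines in parallel threads?",
}
question = ask(fake_client, "diff --git a/worker.py b/worker.py")
```

Because the mock records its calls, a test can assert both on the normalized `question` and on what was sent to the client, while malformed payloads fail loudly with `ValueError` rather than propagating into the QA loop.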
Accomplishments that we're proud of
- Spec-first development: Started with finalized_MVP_spec.md and built the entire system from it
- Clean architecture: Separate concerns between gate logic, Q&A orchestration, and persistence
- Interactive demo: test_script.py simulates realistic VibeCheck runs with complex concurrency patterns
- Comprehensive testing: a multitude of tests covering gate, normalization, aggregation, and QA loop
What we learned
- LLM integration patterns: Structured outputs via LangChain's with_structured_output() are powerful but require defensive type handling
- Competence tracking: Simple score adjustments (delta-based) with evidence logging create an auditable learning record
- Python packaging: uv as a unified tool for dependency management, testing, and linting
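As a rough illustration of delta-based competence tracking with an evidence log, consider the sketch below. It uses only the standard library. The `ConceptRecord` and `Evidence` classes, the starting score of 0.5, the clamping to the range 0 to 1, and the example deltas are all assumptions made for this example, not the project's real model.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Evidence:
    timestamp: str
    delta: float
    note: str


@dataclass
class ConceptRecord:
    score: float = 0.5  # hypothetical neutral starting point
    evidence: list[Evidence] = field(default_factory=list)

    def apply(self, delta: float, note: str) -> float:
        """Adjust the score by delta, clamp it to [0, 1], and log why."""
        self.score = max(0.0, min(1.0, self.score + delta))
        self.evidence.append(
            Evidence(datetime.now(timezone.utc).isoformat(), delta, note)
        )
        return self.score


record = ConceptRecord()
record.apply(+0.1, "answered conceptual question on first attempt")
record.apply(-0.2, "failed all three attempts on an asyncio change")
snapshot = asdict(record)  # plain dicts and lists, ready for a YAML dumper
```

Since `snapshot` holds only plain dicts, lists, floats, and strings, it could be written to disk with PyYAML's `yaml.safe_dump(snapshot)`, and each score change stays traceable to a timestamped note.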
What's next for VibeCheck
- Vector-based concept hierarchy for nested competence tracking (e.g., 'python.async' under 'python.concurrency')
- Batch mode for validating multiple related changes at once
- Analytics dashboard showing competence trends over time
Built With
- gpt-4o
- langchain
- openrouter
- pyright
- pytest
- python
- pyyaml
- ruff
- uv

