PromptLock — Budget-guaranteed context compression + smart model routing
Compress. Route. Save.
PromptLock is a two-layer system that guarantees your context fits within a strict token budget (2K/4K/8K/16K) and automatically routes it to the most cost-effective LLM for the job.
Inspiration
Every developer using LLMs runs into the same painful constraint: context windows are limited and tokens are expensive.
We’ve all been there — debugging a production issue at 2 AM, pasting 500 lines of logs into an LLM, then hitting the context limit. So you truncate… and the real error was in the last 20 lines. Gone.
Or you're doing a code review, pasting an entire diff into a top-tier model and burning $0.50 in tokens for a task a smaller model could handle for $0.02.
The current workflow is broken:
- Naive truncation deletes the most important information
- Manual selection is slow and error-prone
- One-model-for-everything wastes money on simple tasks
So we built PromptLock to solve all three at once.
What it does
PromptLock optimizes LLM usage through two layers:
Layer 1 — Semantic Compression (Budget Guarantee)
PromptLock ingests context (logs, code, diffs, docs) and:
- Splits it into structured chunks (~750 tokens)
- Classifies each chunk (error, log, diff, api, config, docs, etc.)
- Prioritizes content based on your selected task mode (debug/review/build/docs)
- Compresses using Token Company’s bear-1 API
- Packs output into a hard budget target: ≤ 2K / 4K / 8K / 16K tokens, always
✅ Set 4096 tokens → get ≤4096 tokens. No exceptions.
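Here is a minimal sketch of the chunking step. It assumes a whitespace word count as a stand-in for a real tokenizer, and it splits only on line boundaries so stack traces and code lines stay intact. `count_tokens` and `chunk_text` are hypothetical names, not PromptLock's actual API.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer: counts whitespace-separated words.
    # An exact budget needs the target model's own tokenizer instead.
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 750) -> list[str]:
    """Split text into chunks of at most max_tokens, breaking on line boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for line in text.splitlines():
        words = line.split()
        if len(words) <= max_tokens:
            pieces = [line]  # keep the line verbatim; indentation matters in code
        else:
            # Rare case: one line longer than a whole chunk is split on word boundaries.
            pieces = [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
        for piece in pieces:
            n = count_tokens(piece)
            # Close the current chunk if this piece would overflow it.
            if current and current_tokens + n > max_tokens:
                chunks.append("\n".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on lines rather than at a fixed token offset means a chunk never ends halfway through a log entry, at the cost of chunks that are sometimes slightly under the target size.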
Layer 2 — Intelligent Routing (Cheapest-best model)
After compression, PromptLock estimates complexity and routes to the best model automatically:
- Complex debugging with stack traces → GPT-4o / Claude 3.5 Sonnet
- Standard code review → GPT-4o-mini
- Simple documentation lookup → Claude Haiku / Gemini Flash
It also shows cost estimates before you send, so developers can see savings instantly.
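The routing step can be sketched as a complexity score mapped onto price tiers. This is a simplified heuristic, not RouteLLM, which the project uses for the actual model selection. The tier names, score thresholds, stack-trace regex, and per-token prices are all hypothetical placeholders, not real price quotes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    models: str
    usd_per_million_tokens: float  # placeholder price, not a real quote


# Hypothetical tiers mirroring the routing table above.
ECONOMY = Tier("economy", "Claude Haiku / Gemini Flash", 0.25)
STANDARD = Tier("standard", "GPT-4o-mini", 0.60)
PREMIUM = Tier("premium", "GPT-4o / Claude 3.5 Sonnet", 5.00)

# Python tracebacks or Java-style "at pkg.Class.method(File.java:12)" frames.
STACK_TRACE = re.compile(r"Traceback \(most recent call last\)|^\s+at \S+\(.*\)$", re.MULTILINE)
BASE_SCORE = {"docs": 0.1, "build": 0.3, "review": 0.4, "debug": 0.5}


def complexity_score(prompt: str, mode: str) -> float:
    """Crude 0..1 complexity heuristic: task mode plus a bump for stack traces."""
    score = BASE_SCORE.get(mode, 0.3)
    if STACK_TRACE.search(prompt):
        score += 0.4
    return min(score, 1.0)


def route(prompt: str, mode: str, n_tokens: int) -> tuple[Tier, float]:
    """Return the chosen tier and an estimated input cost in USD."""
    score = complexity_score(prompt, mode)
    if score >= 0.7:
        tier = PREMIUM
    elif score >= 0.35:
        tier = STANDARD
    else:
        tier = ECONOMY
    return tier, n_tokens / 1_000_000 * tier.usd_per_million_tokens
```

Because the cost estimate is computed from the compressed token count, the pre-send comparison can show both savings at once.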
Results
Across realistic dev inputs, PromptLock achieves:
- 66% average token reduction from compression
- 20–40% additional savings from routing
- Up to 80% total cost savings with zero semantic loss
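The two savings figures compound rather than add. This assumes the routing savings apply to the cost of the already-compressed prompt and that input tokens dominate the bill. A quick check shows how 66% compression plus 20–40% routing lands near the 80% ceiling:

```python
def combined_savings(compression: float, routing: float) -> float:
    """Fraction of cost saved when routing savings apply to the compressed prompt."""
    remaining = (1 - compression) * (1 - routing)  # fraction of the original cost still paid
    return 1 - remaining


for routing in (0.20, 0.40):
    print(f"routing {routing:.0%}: total savings {combined_savings(0.66, routing):.1%}")
```

With 20% routing savings this gives 1 - 0.34 × 0.8 = 0.728. With 40% it gives 1 - 0.34 × 0.6 = 0.796, which is the "up to 80%" figure.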
Key features
- Budget Guarantee: output always fits the token limit you set
- Task-Aware Modes: debug prioritizes errors, review prioritizes diffs, docs prioritizes documentation
- Transparency: see exactly what was dropped + why, restore with one click
- Protected Patterns: never compresses critical tokens (IPs, ports, URLs, file paths, code fences, user:/assistant: markers)
- Cost Calculator: real-time savings estimation
- Observability: full tracing of compression + routing decisions in Arize Phoenix
- MCP Integration: use directly inside dev workflows (Claude Code / Cursor)
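Task-aware modes can be expressed as per-mode priority maps over the chunk labels listed earlier. The weights below are illustrative guesses, and `PRIORITY_MAPS` and `rank_chunks` are hypothetical names. The project lists bring-your-own priority maps as future work, so the real format may differ.

```python
# Hypothetical priority maps: higher weight = kept first. Weights are illustrative.
PRIORITY_MAPS: dict[str, dict[str, int]] = {
    "debug":  {"error": 100, "log": 80, "config": 60, "diff": 40, "api": 30, "docs": 10},
    "review": {"diff": 100, "error": 70, "api": 60, "config": 50, "log": 20, "docs": 20},
    "build":  {"api": 100, "config": 80, "diff": 60, "docs": 50, "error": 40, "log": 10},
    "docs":   {"docs": 100, "api": 80, "config": 40, "diff": 20, "error": 20, "log": 10},
}


def rank_chunks(chunks: list[tuple[str, str]], mode: str) -> list[tuple[str, str]]:
    """Order (label, text) chunks by the selected mode's priority, highest first.

    sorted() stays stable with reverse=True, so equal-priority chunks keep
    their original document order.
    """
    weights = PRIORITY_MAPS[mode]
    return sorted(chunks, key=lambda c: weights.get(c[0], 0), reverse=True)
```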
How we built it
PromptLock is a full-stack system designed like a real developer platform.
Backend (FastAPI + Python)
Pipeline:
- Chunker — splits input into structured ~750-token chunks
- Classifier — labels chunks (error/log/diff/api/config/docs) with priority scoring
- Packer — greedy bin-packing algorithm enforcing the strict budget invariant
- Renderer — produces a clean “Prompt Pack”: System + Context + Key Facts + Actions + Questions
- Router — complexity scoring + RouteLLM model selection + cost comparison
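One way to render the five-part Prompt Pack is a small dataclass that emits sections in a fixed order and skips empty ones. The `PromptPack` class and the `## Section` heading format are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PromptPack:
    """Hypothetical container for the five Prompt Pack sections."""
    system: str
    context: list[str] = field(default_factory=list)
    key_facts: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Emit sections in a fixed order, skipping any that are empty."""
        sections = [
            ("System", [self.system]),
            ("Context", self.context),
            ("Key Facts", [f"- {fact}" for fact in self.key_facts]),
            ("Actions", [f"- {action}" for action in self.actions]),
            ("Questions", [f"- {q}" for q in self.questions]),
        ]
        parts = [
            f"## {title}\n" + "\n".join(body)
            for title, body in sections
            if any(line.strip() for line in body)
        ]
        return "\n\n".join(parts)
```

A fixed section order also keeps the template overhead predictable, which matters when that overhead has to be counted against the token budget.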
Frontend (Next.js + TypeScript)
We built a devtools-style UI that feels fast and transparent:
- Budget selector (2K/4K/8K/16K) + mode dropdown
- Aggressiveness dial for compression strength
- Before/after token metrics
- Dropped-content report with restore flow
- Copy-ready prompt pack output
Integrations
- Token Company bear-1 — semantic compression that preserves meaning
- RouteLLM — intelligent routing to the optimal model
- Arize Phoenix — full observability for chunking/classification/packing/routing
- Model Context Protocol (MCP) — native IDE workflow integration
- LeanMCP — hosted deployment of our MCP server for instant usage
Challenges we faced
1) The budget guarantee problem
Guaranteeing token budgets sounds simple until edge cases appear:
- token counting must be exact
- prompt template overhead must be included
- compression ratios vary depending on content type
- sometimes high-priority chunks don’t fit cleanly
We solved it with a conservative packing algorithm, reserved headroom, and post-compression validation.
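The three safeguards (conservative packing, reserved headroom, post-render validation) could fit together as shown below. This sketch makes several assumptions. A whitespace tokenizer stands in for the real one. Chunks arrive already compressed and carry a numeric priority. `TEMPLATE`, `pack`, and the 5% headroom are hypothetical. The greedy loop keeps scanning after a chunk fails to fit, so smaller lower-priority chunks can still use the leftover space.

```python
TEMPLATE = "You are a helpful engineering assistant.\n\nContext:\n{context}"


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (whitespace words). A real budget check must use
    # the target model's tokenizer so the count is exact.
    return len(text.split())


def pack(chunks: list[tuple[int, str]], budget: int,
         headroom_ratio: float = 0.05) -> tuple[str, list[str]]:
    """Greedily pack (priority, text) chunks into a prompt of at most `budget` tokens.

    Returns the rendered prompt and the texts of the dropped chunks.
    """
    overhead = count_tokens(TEMPLATE.format(context=""))  # template cost counts too
    headroom = int(budget * headroom_ratio)                # margin for tokenizer drift
    available = budget - overhead - headroom
    if available <= 0:
        raise ValueError("budget is too small for the prompt template")

    selected: list[str] = []
    dropped: list[str] = []
    used = 0
    # Highest priority first; keep scanning so smaller chunks can fill leftover space.
    for _, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        n = count_tokens(text)
        if used + n <= available:
            selected.append(text)
            used += n
        else:
            dropped.append(text)

    # Post-render validation: re-count the final string and shed the
    # lowest-priority selected chunk until the hard budget holds.
    prompt = TEMPLATE.format(context="\n\n".join(selected))
    while count_tokens(prompt) > budget and selected:
        dropped.append(selected.pop())
        prompt = TEMPLATE.format(context="\n\n".join(selected))
    return prompt, dropped
```

With a whitespace tokenizer the counts are additive, so the validation loop never fires here. It exists for real subword tokenizers, where joining chunks can shift the total by a few tokens.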
2) Classification accuracy
Early versions misclassified logs/config/diffs frequently. We improved this with hierarchical classification: structure → keyword signals → surrounding context patterns.
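A hierarchical classifier along those lines might check cheap structural signatures first, then keyword signals, and only then fall back to the label of the neighboring chunk. The regexes, thresholds, and the `docs` fallback are hypothetical; this is not PromptLock's actual rule set.

```python
import re
from typing import Optional

DIFF_HEADER = re.compile(r"^(diff --git |@@ -\d)", re.MULTILINE)
KEY_VALUE = re.compile(r"^\s*[\w.\-]+\s*[:=]\s*\S")
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}")
ERROR_WORDS = re.compile(r"Traceback|Exception|\bERROR\b|\bFATAL\b|panic:")
API_WORDS = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE) /|\bHTTP/\d|Content-Type:")


def _share(lines: list[str], pattern: re.Pattern) -> float:
    """Fraction of lines whose start matches the pattern."""
    if not lines:
        return 0.0
    return sum(bool(pattern.match(line)) for line in lines) / len(lines)


def classify(chunk: str, previous_label: Optional[str] = None) -> str:
    lines = [line for line in chunk.splitlines() if line.strip()]
    # 1) Structure: cheap, high-precision shape checks.
    if DIFF_HEADER.search(chunk):
        return "diff"
    if _share(lines, KEY_VALUE) >= 0.8:
        return "config"
    # 2) Keyword signals (error keywords outrank plain timestamped log lines).
    if ERROR_WORDS.search(chunk):
        return "error"
    if API_WORDS.search(chunk):
        return "api"
    if _share(lines, TIMESTAMP) >= 0.5:
        return "log"
    # 3) Surrounding context: an ambiguous chunk inherits its neighbor's label.
    return previous_label if previous_label is not None else "docs"
```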
3) Preserving semantic meaning
Compression can destroy important details (ports, IP addresses, file paths). We implemented protected patterns to keep critical tokens intact.
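A common way to protect such tokens is to swap them for opaque placeholders before compression and put them back afterwards. This sketch assumes the compressor (bear-1 in PromptLock) passes placeholders such as `⟦0⟧` through unchanged; that is an assumption, not documented behavior. The regexes are illustrative. They cover IPv4 addresses with optional ports, `localhost` ports, URLs, Unix-style paths, code fences, and `user:`/`assistant:` markers.

```python
import re

# Illustrative protected patterns; alternatives are tried left to right.
PROTECTED = re.compile(
    r"`{3}.*?`{3}"                                   # fenced code blocks
    r"|https?://[^\s)]+"                             # URLs
    r"|\b\d{1,3}(?:\.\d{1,3}){3}(?::\d{1,5})?\b"     # IPv4, optional :port
    r"|\blocalhost:\d{1,5}\b"                        # localhost ports
    r"|(?:\.{0,2}/)?(?:[\w.-]+/)+[\w.-]+"            # Unix-style file paths
    r"|\b(?:user|assistant):",                       # chat role markers
    re.DOTALL,
)
PLACEHOLDER = re.compile(r"⟦(\d+)⟧")


def protect(text: str) -> tuple[str, list[str]]:
    """Swap protected spans for numbered placeholders; return masked text and originals."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"⟦{len(saved) - 1}⟧"

    return PROTECTED.sub(stash, text), saved


def restore(text: str, saved: list[str]) -> str:
    """Put the original spans back after compression."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)
```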
4) Real-time performance
Large inputs initially took 3–5 seconds. We optimized with parallel classification, caching, streaming updates, and debounced UI handling.
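Parallel classification with a cache might look like the following, using only the standard library. The `classify_cached` body is a placeholder. Threads only help when classification waits on I/O, such as a remote model call. CPU-bound pure-Python work is serialized by the GIL and would need a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=4096)
def classify_cached(chunk: str) -> str:
    # Placeholder for the real classifier. Results are memoized by chunk text,
    # so repeated chunks and re-runs over unchanged chunks skip reclassification.
    return "error" if "Traceback" in chunk else "log"


def classify_all(chunks: list[str], workers: int = 8) -> list[str]:
    """Classify chunks concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_cached, chunks))
```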
Accomplishments we’re proud of
- Budget guarantee works reliably (fuzz-tested across thousands of random inputs)
- 66% average compression without meaning loss
- A demo that instantly proves why task-aware compression beats truncation
- Full tracing in Phoenix for every decision
- MCP integration that makes PromptLock usable without leaving your editor
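A fuzz harness for the budget invariant can be as simple as generating random chunk lists and budgets, then asserting that the packed output never exceeds the limit. This sketch uses the standard library's `random` module with a fixed seed for reproducibility. The packer and tokenizer are minimal stand-ins rather than the project's code, and the team's actual harness is not described.

```python
import random


def count_tokens(text: str) -> int:
    # Stand-in tokenizer; a real harness should use the production tokenizer.
    return len(text.split())


def pack(chunks: list[str], budget: int) -> str:
    """Minimal greedy packer under test (a stand-in, not PromptLock's packer)."""
    kept, used = [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if used + n <= budget:
            kept.append(chunk)
            used += n
    return "\n".join(kept)


VOCAB = ["error", "at", "main.py:42", "GET", "/api/v1", "0x7f", "{", "}", "=", "--"]


def fuzz_budget_invariant(trials: int = 1000, seed: int = 0) -> None:
    """Throw random chunk lists at the packer and assert the budget always holds."""
    rng = random.Random(seed)  # fixed seed makes any failure reproducible
    for _ in range(trials):
        budget = rng.choice([2048, 4096, 8192, 16384])
        chunks = [
            " ".join(rng.choices(VOCAB, k=rng.randint(0, 3000)))
            for _ in range(rng.randint(0, 20))
        ]
        packed = pack(chunks, budget)
        assert count_tokens(packed) <= budget, (budget, count_tokens(packed))
```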
What we learned
- Token counting is harder than it looks across model ecosystems
- Transparency builds trust (developers want to know what got removed and why)
- Most dev context is noise — task-aware prioritization is the real win
- Routing multiplies savings beyond compression alone
- MCP makes LLM tooling feel native to real workflows
What’s next
Short-term:
- Browser extension for one-click compression anywhere
- VS Code extension with inline compression
- Multimodal/image context support
- Team analytics dashboard
Medium-term:
- Learn from user restore patterns to improve prioritization
- Custom mode creation (bring your own priority maps)
- API key management + shared team deployments
Long-term:
- Fine-tuned compression model for developer context
- Predictive compression before you hit the context limit
- Full multi-turn conversation compression
- Enterprise deployment with SSO + audit logs
MCP Integration (Live on LeanMCP)
PromptLock is deployed as a live MCP server on LeanMCP so developers can call it directly from IDE workflows.
- MCP Endpoint: https://promptlock-nexhacks.leanmcp.app/mcp
- Health Check: https://promptlock-nexhacks.leanmcp.app/health
- Dashboard: https://ship.leanmcp.com/projects/2b4ec4c3-6f3c-4856-b12b-4a5c53435f2e
