Inspiration

As AI tools become more widely used, the environmental cost behind every prompt and computation multiplies. According to EESI, large data centers can consume up to 5 million gallons of water per day, equivalent to the water use of a town of 10,000 to 50,000 people. Most users don't realize that inefficient prompts contribute to this problem: they trigger unnecessary computation, which drives up consumption.

What it does

A pipeline that takes messy user prompts and rewrites them into tight, well-framed prompts — using two small local models and a retrieval layer for style hints.

Stack:

  • Extractor: qwen2.5:3b — pulls a 6-field skeleton (INTENT, TASK, SUBJECT, OUTPUT, CONSTRAINTS, PROMPT)
  • Reviser: gemma3:4b — rewrites using the skeleton + retrieved exemplars
  • Retrieval: HumanDelta vector DB over fka/awesome-chatgpt-prompts + a custom corpus
  • Scorer: heuristic rubric (concision, preservation, no-leak, no-headers, compression) over 12 labeled test prompts
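
The extraction step, as a minimal sketch: it assumes the ollama Python client, and the system prompt and parser here are simplified stand-ins for the real ones (which carry in-context examples and the anti-copy rules described under Challenges).

```python
# Simplified sketch of the extractor call; prompt wording is illustrative.
import ollama

SKELETON_FIELDS = ["INTENT", "TASK", "SUBJECT", "OUTPUT", "CONSTRAINTS", "PROMPT"]

def extract_skeleton(raw_prompt: str) -> dict:
    """Ask qwen2.5:3b for the 6-field skeleton and parse KEY: value lines."""
    system = (
        "Extract these fields from the user's prompt, one per line, in this "
        "exact order: " + ", ".join(SKELETON_FIELDS) + ". Use 'none' for "
        "fields that don't apply. Output nothing else."
    )
    resp = ollama.chat(
        model="qwen2.5:3b",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": raw_prompt}],
    )
    skeleton = {field: "none" for field in SKELETON_FIELDS}
    for line in resp["message"]["content"].splitlines():
        key, _, value = line.partition(":")
        if key.strip().upper() in skeleton and value.strip():
            skeleton[key.strip().upper()] = value.strip()
    return skeleton
```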

How It Works

  1. User gives a raw prompt (could be 3 words or 3 paragraphs)
  2. Extractor distills it into a structured skeleton
  3. Gating logic decides whether to retrieve (length, intent, similarity threshold; sketched after this list)
  4. Reviser rewrites — borrowing structure only from retrieved examples, never nouns
  5. Scorer compares "with retrieval" vs "without retrieval" across categories
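
Step 3's gate deserves a sketch, since gating turned out to matter more than retrieval quality (see below). The cutoffs and the trivial-intent set here are illustrative stand-ins for the tuned values:

```python
# Hypothetical gate for step 3: retrieve only when it is likely to help.
def should_retrieve(raw_prompt: str, skeleton: dict, top_similarity: float) -> bool:
    if len(raw_prompt.split()) <= 5:          # trivial input: templates tend to hurt
        return False
    if skeleton["INTENT"].lower() in {"casual", "everyday"}:
        return False
    return top_similarity >= 0.55             # only borrow from close exemplars
```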

What We Learned

  • Small models need scaffolding, not trust. A 3B model won't follow a schema just because you hand it one. It needs concrete examples, canonical order enforcement, and a regex safety net.
  • RAG is portable leverage. Adding new style patterns = dropping text into a corpus. No fine-tuning, no retraining.
  • Retrieval isn't always better. For trivial inputs ("tie a tie"), retrieved "act as a stylist" templates actively hurt. Gating matters more than retrieval quality.
  • Structure-only borrowing beats content borrowing. Pulling "Act as a " from an exemplar is safe. Pulling nouns, verbs, or domain words leaks contamination (Ethereum prompts drifting into messenger-app territory); a crude leak check is sketched after this list.
  • Deterministic fallbacks cover for LLM weaknesses. A regex constraint sweep caught the "3-4 hours a week" budget that the extractor kept dropping.
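
The structure-only rule is cheap to enforce mechanically. A crude sketch of the leak check, assuming whitespace tokenization and an illustrative stopword list:

```python
# Crude no-leak check: content words in the revised prompt should come from
# the user's original prompt, never from the retrieved exemplar.
STOPWORDS = {"a", "an", "the", "act", "as", "to", "and", "of", "for", "in", "with"}

def leaked_words(original: str, exemplar: str, revised: str) -> set[str]:
    def tokens(s: str) -> set[str]:
        return {w.strip(".,!?").lower() for w in s.split()} - STOPWORDS
    return (tokens(revised) & tokens(exemplar)) - tokens(original)
```

On the Ethereum failure, domain nouns from the exemplar show up in the returned set, while the structural "act as a" prefix does not.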

Challenges

1. The extractor kept copying its own examples

Give qwen an in-context example containing "write a python function..." and ask it to process "tie a tie", and the skeleton comes back with PROMPT: write a python function. Fixed by: two diverse examples, an explicit "NEVER copy values verbatim" rule, and a short-input rule ("if ≤5 words, copy to PROMPT verbatim").
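
The fixed in-context section reads roughly like this (wording reconstructed, not verbatim):

```python
# Reconstructed shape of the anti-echo fix: two deliberately different
# examples, an explicit no-copy rule, and the short-input rule.
EXTRACTOR_EXAMPLES = """
NEVER copy values from the examples verbatim; they show format only.
If the input is 5 words or fewer, copy it to PROMPT verbatim.

EXAMPLE 1 (code task)
Input: write a python function that merges two sorted lists
PROMPT: write a python function that merges two sorted lists

EXAMPLE 2 (everyday task)
Input: plan a week of cheap dinners
PROMPT: plan a week of cheap dinners
"""
```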

2. The reviser over-prompted trivial tasks

"Act as a professional stylist to tie a tie" — nobody wants that. Added a QUICK TEST to Rule 1: role prefix only for tasks needing specialized professional expertise. Everyday tasks get step-by-step framing instead.

3. Synonym leaks

The extractor would turn "tie" into "necktie" in the SUBJECT field, and the reviser would propagate it. Fixed by hard-coding "NEVER substitute synonyms" into the extraction prompt and "if the original says X, don't write Y" into the revision prompt.

4. Dropped load-bearing constraints

"I can work out 3-4 hours a week" → skeleton CONSTRAINTS: none → revised prompt drops the budget. Fixed in layers:

  • Added CONSTRAINTS field to skeleton
  • Added SCAN-FOR-CONSTRAINTS list to extractor prompt
  • Added a _sweep_constraints() regex safety net that injects missed constraints before revision (sketched below)
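
A minimal sketch of that safety net; the pattern list is illustrative (the real one covers more constraint types):

```python
import re

# Illustrative patterns; the real list also covers budgets, deadlines, tools, etc.
CONSTRAINT_PATTERNS = [
    r"\b\d+(?:-\d+)?\s*(?:hours?|minutes?|days?)\s+(?:a|per)\s+(?:day|week|month)\b",
    r"\bunder\s+\$?\d+\b",
    r"\bwithout\s+\w+\b",
]

def _sweep_constraints(raw_prompt: str, skeleton: dict) -> dict:
    """Inject constraint phrases the extractor missed before revision."""
    hits = []
    for pattern in CONSTRAINT_PATTERNS:
        hits += re.findall(pattern, raw_prompt, flags=re.IGNORECASE)
    missed = [h for h in hits if h.lower() not in skeleton["CONSTRAINTS"].lower()]
    if missed:
        base = "" if skeleton["CONSTRAINTS"] == "none" else skeleton["CONSTRAINTS"] + "; "
        skeleton["CONSTRAINTS"] = base + "; ".join(missed)
    return skeleton
```

On the failing test, this recovers "3-4 hours a week" even when the skeleton comes back with CONSTRAINTS: none.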

The Loop That Shipped It

Every failure mode followed the same rhythm:

  1. Run test prompt
  2. Spot a new failure (schema echo, noun leak, dropped constraint)
  3. Tighten the rule
  4. Re-run the full test set — does it regress anything else?
  5. Add a safety net in code if the rule can't be trusted

The scorer made step 4 cheap. Without it, we'd have been guessing whether each fix actually helped.
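
The rubric itself is a handful of cheap boolean checks per prompt, averaged per category. A hypothetical shape, with illustrative thresholds:

```python
import re

# Hypothetical shape of the heuristic rubric, run over the 12 labeled tests
# with retrieval on and off; thresholds are illustrative.
def score(original: str, revised: str, exemplar_words: set[str]) -> dict:
    ow, rw = original.split(), revised.split()
    figures = re.findall(r"\d+(?:-\d+)?", original)        # load-bearing numbers
    leaks = ({w.lower() for w in rw} & exemplar_words) - {w.lower() for w in ow}
    return {
        "concision":    len(rw) <= max(len(ow), 40),           # didn't balloon
        "preservation": all(f in revised for f in figures),    # constraints kept
        "no_leak":      not leaks,
        "no_headers":   not revised.lstrip().startswith(("INTENT", "#")),
        "compression":  len(ow) < 15 or len(rw) < len(ow),     # long inputs shrink
    }
```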

Takeaway

Small local models + RAG can produce strong results, but only with defensive engineering. The LLM is one component in a pipeline — schema cleaners, regex sweepers, retrieval gates, and scoring loops are the other 70%.

Built With

  • fast-api
  • human-delta
  • jupyter-notebook
  • neon-db
  • next-js
  • ollama
  • python