ContextShop

Inspiration

Context engineering — not model size, not prompts — is the #1 lever for agent quality in 2025.

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy

"Context engineering has become effectively the #1 job of engineers building AI agents." — Cognition team (Devin)

Yet most retail agents still use naive RAG: same query → same retrieval → same mediocre answer every time. No memory. No learning. We wanted to prove this gap with numbers, showing that context engineering quality directly determines answer quality.

What it does

ContextShop is a retail Q&A agent that self-improves its context over time using Qdrant as its memory backbone.

A user chats with a shopping assistant for sports & outdoor gear. Under the hood, three Qdrant collections (setup sketched after this list) feed a token-budgeted context window:

  • products/ — 5K product catalog, searched with Qdrant's hybrid SPLADE sparse + dense vectors (RRF fusion)
  • user_preferences/ — facts learned about the user each turn (budget, brands, terrain) via LLM extraction
  • episodic_memory/ — compressed summaries of past conversation turns
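
The setup fits in a few lines of qdrant-client. A minimal sketch, with some assumptions: the vector names ("dense", "splade"), the 384-dim dense size, and the localhost URL are illustrative, not the project's actual config. The session_id keyword index is the one session isolation depends on (see "Challenges" below).

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

for name in ["products", "user_preferences", "episodic_memory"]:
    client.create_collection(
        collection_name=name,
        # Named dense vector; 384 dims matches e.g. BAAI/bge-small-en-v1.5 (assumed).
        vectors_config={"dense": models.VectorParams(size=384, distance=models.Distance.COSINE)},
        # Named sparse vector slot for SPLADE term weights.
        sparse_vectors_config={"splade": models.SparseVectorParams()},
    )
    # Payload filtering on session_id needs an explicit keyword index.
    client.create_payload_index(
        collection_name=name,
        field_name="session_id",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )
```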

The same query "I need a good running shoe" at Turn 0 vs Turn 5:

                    Turn 0 (cold)       Turn 5 (warm)
  Knows budget?     No (asks for it)    Yes
  Knows brand?      No                  Yes (Nike/Adidas)
  Knows terrain?    No                  Yes (trail/muddy)
  Retrieval score   0.290               0.417 (+44%)

How we built it

A Python backend implements all four context engineering strategies:

  • SELECT — Qdrant hybrid search: SPLADE sparse vectors (exact brand/spec matching) + dense vectors (semantic meaning), fused with RRF. Outperforms pure cosine similarity for product queries (retrieval sketch after this list).
  • WRITE — After each turn, an LLM extracts user preference facts (budget, brands, use-case) and upserts them to user_preferences/ (write-path sketch after this list).
  • COMPRESS — Every 3 turns, conversation history is summarized and upserted to episodic_memory/.
  • ISOLATE — Three separate Qdrant collections with session-namespaced payload filtering. No bleed between users or sessions.
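
A sketch of the retrieval side (SELECT plus ISOLATE), using qdrant-client's Query API with fastembed. The model names, vector names, and helper signatures are assumptions for illustration; what carries over from the description above is the RRF-fused hybrid query and the session_id payload filter.

```python
from fastembed import TextEmbedding, SparseTextEmbedding
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")             # assumed model
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")  # assumed model

def select_products(query: str, limit: int = 5):
    """SELECT: hybrid SPLADE + dense search over products/, fused with RRF."""
    dense = next(iter(dense_model.embed([query])))
    sparse = next(iter(sparse_model.embed([query])))
    return client.query_points(
        collection_name="products",
        prefetch=[
            models.Prefetch(
                query=models.SparseVector(indices=sparse.indices.tolist(),
                                          values=sparse.values.tolist()),
                using="splade", limit=20,
            ),
            models.Prefetch(query=dense.tolist(), using="dense", limit=20),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
    ).points

def search_memory(collection: str, query: str, session_id: str, limit: int = 3):
    """ISOLATE: every memory lookup is filtered to the current session."""
    dense = next(iter(dense_model.embed([query])))
    return client.query_points(
        collection_name=collection,
        query=dense.tolist(),
        using="dense",
        query_filter=models.Filter(must=[
            models.FieldCondition(key="session_id",
                                  match=models.MatchValue(value=session_id)),
        ]),
        limit=limit,
    ).points
```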
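The write side follows the same pattern. A sketch of WRITE, where extract_facts stands in for the LLM extraction call (a hypothetical name); COMPRESS does the same upsert into episodic_memory/ with a conversation summary every third turn.

```python
import uuid

def write_preferences(session_id: str, turn_text: str) -> None:
    """WRITE: persist LLM-extracted preference facts to user_preferences/."""
    facts = extract_facts(turn_text)  # hypothetical: LLM returns e.g. ["budget: under $150"]
    for fact in facts:
        dense = next(iter(dense_model.embed([fact])))
        client.upsert(
            collection_name="user_preferences",
            points=[models.PointStruct(
                id=str(uuid.uuid4()),
                vector={"dense": dense.tolist()},
                payload={"session_id": session_id, "fact": fact},
            )],
        )
```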

A token budget manager allocates the 8K context window: 60% products / 25% episodic memory / 15% preferences.
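
The budget manager itself can be simple proportional packing. A sketch, where fit_to_budget and count_tokens are assumed names standing in for the project's actual manager and tokenizer:

```python
BUDGET_TOKENS = 8192
ALLOCATION = {"products": 0.60, "episodic_memory": 0.25, "user_preferences": 0.15}

def fit_to_budget(chunks_by_source: dict[str, list[str]], count_tokens) -> list[str]:
    """Greedily pack each source's retrieved chunks into its share of the window."""
    context: list[str] = []
    for source, share in ALLOCATION.items():
        allowance = int(BUDGET_TOKENS * share)
        used = 0
        for chunk in chunks_by_source.get(source, []):
            cost = count_tokens(chunk)
            if used + cost > allowance:
                break  # this source's slice is full; move on
            context.append(chunk)
            used += cost
    return context
```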

The Streamlit UI shows a live metrics panel: retrieval scores, memory chunk counts, and a score progression chart — so you can watch the context improve in real time.

Challenges we ran into

  • Qdrant payload filtering requires explicit indexes — session isolation broke silently until we added keyword indexes on session_id at collection setup time.
  • LLMs wrap JSON in markdown fences even when instructed not to — added fence stripping to the fact extraction parser (sketch after this list).
  • RRF fusion scores are normalized differently from cosine scores — reframed the improvement metric around answer quality, not raw score delta.
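
The fence fix is a few lines. A sketch of the kind of stripping involved (the project's actual parser may differ):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Strip markdown code fences the model sometimes adds, then parse JSON."""
    cleaned = re.sub(r"^```(?:json)?\s*", "", raw.strip())
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return json.loads(cleaned)
```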

Accomplishments that we're proud of

  • Clean proof of the thesis: same query, measurably better answer after 5 turns
  • Qdrant hybrid search (SPLADE + dense) working natively — not just cosine similarity
  • Full session isolation: each conversation namespaced, no memory bleed between users
  • One-command setup: git clone + bash setup.sh → running app in ~2 minutes

What we learned

Qdrant's native hybrid search with SPLADE sparse vectors significantly outperforms pure dense search for retail queries. Exact brand names, SKUs, and spec terms are caught by the sparse component while semantic meaning is handled by dense vectors.

Context engineering is systems-level work, not prompt work. The interesting decisions are about token budget allocation, when to compress, what to persist, and how to isolate concerns — not what words to put in the system prompt.

What's next for ContextShop

  • Temporal validity for product facts (prices change, stock runs out)
  • Multi-modal: image embeddings for visual product search
  • User feedback loop: thumbs up/down drives re-ranking and re-indexing
  • Cross-session memory: opt-in persistent user profile across conversations

Built With

  • claude-haiku
  • claude-sonnet
  • fastembed
  • openrouter
  • python
  • qdrant
  • streamlit