ContextShop

Inspiration

Context engineering — not model size, not prompts — is the #1 lever for agent quality in 2025.

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy

"Context engineering has become effectively the #1 job of engineers building AI agents." — Cognition team (Devin)

Yet most retail agents still use naive RAG: same query → same retrieval → same mediocre answer every time. No memory. No learning. We wanted to prove this gap with numbers, showing that context engineering quality directly determines answer quality.

What it does

ContextShop is a retail Q&A agent that self-improves its context over time using Qdrant as its memory backbone.

A user chats with a shopping assistant for sports & outdoor gear. Under the hood, three Qdrant collections (setup sketched after this list) feed a token-budgeted context window:

  • products/ — 5K product catalog, searched with Qdrant's hybrid SPLADE sparse + dense vectors (RRF fusion)
  • user_preferences/ — facts learned about the user each turn (budget, brands, terrain) via LLM extraction
  • episodic_memory/ — compressed summaries of past conversation turns
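
The setup fits in a few lines of qdrant-client. A minimal sketch, with some assumptions: the vector names ("dense", "splade"), the 384-dim dense size, and the localhost URL are illustrative, not the project's actual config. The session_id keyword index is the one session isolation depends on (see "Challenges" below).

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

for name in ["products", "user_preferences", "episodic_memory"]:
    client.create_collection(
        collection_name=name,
        # Named dense vector; 384 dims matches e.g. BAAI/bge-small-en-v1.5 (assumed).
        vectors_config={"dense": models.VectorParams(size=384, distance=models.Distance.COSINE)},
        # Named sparse vector slot for SPLADE term weights.
        sparse_vectors_config={"splade": models.SparseVectorParams()},
    )
    # Payload filtering on session_id needs an explicit keyword index.
    client.create_payload_index(
        collection_name=name,
        field_name="session_id",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )
```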

The same query "I need a good running shoe" at Turn 0 vs Turn 5:

                    Turn 0 (cold)       Turn 5 (warm)
  Knows budget?     No (asks for it)    Yes
  Knows brand?      No                  Yes (Nike/Adidas)
  Knows terrain?    No                  Yes (trail/muddy)
  Retrieval score   0.290               0.417 (+44%)

How we built it

A Python backend implements all four context engineering strategies:

  • SELECT — Qdrant hybrid search: SPLADE sparse vectors (exact brand/spec matching) + dense vectors (semantic meaning), fused with RRF. Outperforms pure cosine similarity for product queries (retrieval sketch after this list).
  • WRITE — After each turn, an LLM extracts user preference facts (budget, brands, use-case) and upserts them to user_preferences/ (write-path sketch after this list).
  • COMPRESS — Every 3 turns, conversation history is summarized and upserted to episodic_memory/.
  • ISOLATE — Three separate Qdrant collections with session-namespaced payload filtering. No bleed between users or sessions.
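
A sketch of the retrieval side (SELECT plus ISOLATE), using qdrant-client's Query API with fastembed. The model names, vector names, and helper signatures are assumptions for illustration; what carries over from the description above is the RRF-fused hybrid query and the session_id payload filter.

```python
from fastembed import TextEmbedding, SparseTextEmbedding
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")             # assumed model
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")  # assumed model

def select_products(query: str, limit: int = 5):
    """SELECT: hybrid SPLADE + dense search over products/, fused with RRF."""
    dense = next(iter(dense_model.embed([query])))
    sparse = next(iter(sparse_model.embed([query])))
    return client.query_points(
        collection_name="products",
        prefetch=[
            models.Prefetch(
                query=models.SparseVector(indices=sparse.indices.tolist(),
                                          values=sparse.values.tolist()),
                using="splade", limit=20,
            ),
            models.Prefetch(query=dense.tolist(), using="dense", limit=20),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
    ).points

def search_memory(collection: str, query: str, session_id: str, limit: int = 3):
    """ISOLATE: every memory lookup is filtered to the current session."""
    dense = next(iter(dense_model.embed([query])))
    return client.query_points(
        collection_name=collection,
        query=dense.tolist(),
        using="dense",
        query_filter=models.Filter(must=[
            models.FieldCondition(key="session_id",
                                  match=models.MatchValue(value=session_id)),
        ]),
        limit=limit,
    ).points
```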
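The write side follows the same pattern. A sketch of WRITE, where extract_facts stands in for the LLM extraction call (a hypothetical name); COMPRESS does the same upsert into episodic_memory/ with a conversation summary every third turn.

```python
import uuid

def write_preferences(session_id: str, turn_text: str) -> None:
    """WRITE: persist LLM-extracted preference facts to user_preferences/."""
    facts = extract_facts(turn_text)  # hypothetical: LLM returns e.g. ["budget: under $150"]
    for fact in facts:
        dense = next(iter(dense_model.embed([fact])))
        client.upsert(
            collection_name="user_preferences",
            points=[models.PointStruct(
                id=str(uuid.uuid4()),
                vector={"dense": dense.tolist()},
                payload={"session_id": session_id, "fact": fact},
            )],
        )
```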

A token budget manager allocates the 8K context window: 60% products / 25% episodic memory / 15% preferences.
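
The budget manager itself can be simple proportional packing. A sketch, where fit_to_budget and count_tokens are assumed names standing in for the project's actual manager and tokenizer:

```python
BUDGET_TOKENS = 8192
ALLOCATION = {"products": 0.60, "episodic_memory": 0.25, "user_preferences": 0.15}

def fit_to_budget(chunks_by_source: dict[str, list[str]], count_tokens) -> list[str]:
    """Greedily pack each source's retrieved chunks into its share of the window."""
    context: list[str] = []
    for source, share in ALLOCATION.items():
        allowance = int(BUDGET_TOKENS * share)
        used = 0
        for chunk in chunks_by_source.get(source, []):
            cost = count_tokens(chunk)
            if used + cost > allowance:
                break  # this source's slice is full; move on
            context.append(chunk)
            used += cost
    return context
```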

The Streamlit UI shows a live metrics panel: retrieval scores, memory chunk counts, and a score progression chart — so you can watch the context improve in real time.

Challenges we ran into

  • Qdrant payload filtering requires explicit indexes — session isolation broke silently until we added keyword indexes on session_id at collection setup time.
  • LLMs wrap JSON in markdown fences even when instructed not to — added fence stripping to the fact extraction parser (sketch after this list).
  • RRF fusion scores are normalized differently from cosine scores — reframed the improvement metric around answer quality, not raw score delta.
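
The fence fix is a few lines. A sketch of the kind of stripping involved (the project's actual parser may differ):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Strip markdown code fences the model sometimes adds, then parse JSON."""
    cleaned = re.sub(r"^```(?:json)?\s*", "", raw.strip())
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return json.loads(cleaned)
```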

Accomplishments that we're proud of

  • Clean proof of the thesis: same query, measurably better answer after 5 turns
  • Qdrant hybrid search (SPLADE + dense) working natively — not just cosine similarity
  • Full session isolation: each conversation namespaced, no memory bleed between users
  • One-command setup: git clone + bash setup.sh → running app in ~2 minutes

What we learned

Qdrant's native hybrid search with SPLADE sparse vectors significantly outperforms pure dense search for retail queries. Exact brand names, SKUs, and spec terms are caught by the sparse component while semantic meaning is handled by dense vectors.

Context engineering is systems-level work, not prompt work. The interesting decisions are about token budget allocation, when to compress, what to persist, and how to isolate concerns — not what words to put in the system prompt.

What's next for ContextShop

  • Temporal validity for product facts (prices change, stock runs out)
  • Multi-modal: image embeddings for visual product search
  • User feedback loop: thumbs up/down drives re-ranking and re-indexing
  • Cross-session memory: opt-in persistent user profile across conversations

Built With

  • claude-haiku
  • claude-sonnet
  • fastembed
  • openrouter
  • python
  • qdrant
  • streamlit