Inspiration -When we prototype multi-agent systems, each agent tends to “win” its local objective—great discounts, friendly clauses, ambitious forecasts—while accidentally drifting from enterprise goals (policy, risk, budget). We wanted a design where context itself is engineered: pinned policies, explicit token budgets, and a post-hoc critic that rewrites outputs to align with the organization’s OKRs.

The Wildcard category felt right because the idea isn’t a new model—it’s a reusable engineering pattern for any agentic workflow.

What it does

• Context is a first-class surface area. If you budget tokens across tiers, you can predictably preserve what matters (policy) and prune what doesn’t (ephemera). • Critic passes beat prompt bloat. A lightweight, policy-only Global Policy Critic (GPCritic) can auto-correct local wins that violate enterprise rules. • CPU-friendly demos are viable. With Ollama and gpt-oss-20B, we delivered a credible local story; for speed, you can substitute a quantized instruct model, but the design stays the same.


How we built it

Runtime & Model • Model: gpt-oss-20b via Ollama (HTTP: /v1/chat/completions, with native /api/chat fallback). • Service: FastAPI with three agents: negotiation, compliance, forecast. • Config: .env holds enterprise guardrails (policy thresholds, prohibited/required clauses) and layer budgets. Advanced Context Engineering • We assemble a layered prompt on every call:

  1. GPC — Global Policy Context (pinned): OKRs, budget variance limits, prohibited & required clauses.
  2. DSC — Domain Strategy Context: category playbook (summarized).
  3. TSC — Task/Session Context: recent turns; recency-biased and summarized to prevent overfit.
  4. ETC — Ephemeral Tool Context: one-shot payloads (quotes, budgets). • Token budgets (example): b=(bGPC,bDSC,bTSC,bETC)=(0.25,0.25,0.40,0.10),∑bi=1\mathbf{b} = (b_{\text{GPC}}, b_{\text{DSC}}, b_{\text{TSC}}, b_{\text{ETC}}) = (0.25, 0.25, 0.40, 0.10),\quad \sum b_i = 1b=(bGPC,bDSC,bTSC,bETC)=(0.25,0.25,0.40,0.10),∑bi=1 Pruning order under pressure: ETC→TSC→DSC→GPC\text{ETC} \rightarrow \text{TSC} \rightarrow \text{DSC} \rightarrow \text{GPC}ETC→TSC→DSC→GPC (GPC survives). • Global Policy Critic (GPCritic): We re-call the model with only GPC+DSC\text{GPC} + \text{DSC}GPC+DSC plus the agent’s draft. The critic: o Inserts required warranties, o Rewrites or removes prohibited clauses, o Enforces budget variance explanations or trims plans. Objective sketch (conceptual) We treat alignment as a constrained multi-objective: min⁡draft  α V(draft,GPC)+β T(prompt)s.t.layer_tokens≤bi⋅context_budget\min_{\text{draft}} \; \alpha\,V(\text{draft}, \text{GPC}) + \beta\,T(\text{prompt}) \quad \text{s.t.}\quad \text{layer_tokens} \le b_i \cdot \text{context_budget}draftminαV(draft,GPC)+βT(prompt)s.t.layer_tokens≤bi⋅context_budget where VVV measures policy violations and TTT proxy-counts tokens. The critic reduces VVV if the primary agent over-optimized locally. Endpoints • POST /agent/negotiation → price/terms proposal (+ warranties if missing) • POST /agent/compliance → clause check & rewrite • POST /agent/forecast → plan vs budget with trade-offs • Browser demo.html calls the three endpoints directly (CORS enabled). ________________________________________ 🔬 What’s actually happening (mini examples) • Negotiation: Agent chases an 18% discount. GPCritic ensures required warranties and rejects any risky termination language. Compliance: If a clause hints at “terminate at will without cure”, the critic rewrites it to meet policy. • Forecast: If planned spend exceeds budget by >τ%> \tau\%>τ% (from .env), response must include trade-offs or bring the plan within threshold

Challenges we ran into

• Local vs Enterprise tension: It’s easy to write longer prompts; it’s hard to guarantee policy survives pruning. Budgets + pruning order solved this. • Latency on CPU: gpt-oss-20B on CPU is heavy. We kept the pattern intact so teams can swap models or deploy GPU later without changing the alignment logic. • Prompt creep: We added rolling summaries to TSC so the system doesn’t overfit to recent chatter.

Accomplishments that we're proud of

• Clear separation of concerns: agents optimize locally; the critic enforces globally. • Deterministic knobs: budgets & thresholds live in .env, so teams can tighten alignment without refactoring. • A tiny, explainable blueprint that other teams can lift into their own agentic stacks.


What's next for ProcureSense: Advanced Context Engineering(gpt-oss-20B) :• Quantitative evals (violation counters, token accounting) baked into CI.

• RAG with policy-aware re-ranking so DSC remains concise but high-signal. • Red-team suites to probe adversarial clauses and long-tail compliance issues

Built With

  • fastapi
  • gpt-oss-20b
  • huggingface
  • ollama
  • python
Share this project:

Updates