NexPath

Inspiration

Vibe coding has changed how software gets built. AI coding agents can generate a whole feature from a one-sentence prompt — but the speed of generation often outpaces the discipline of process. We kept watching ourselves skip reviews, forget tests, and ship without acceptance checks. Not out of laziness — out of momentum.

We'd been researching MCP and coding agent workflows for a while, and during the AI Hackfest's "Hacking with GitHub Copilot + MCP" workshop, the specific idea clicked: what if we could build a tool that gives developers meaningful direction and helpful suggestions right inside their coding sessions — matching their momentum instead of fighting it? NexPath was built from that point — during the hackathon — as an independent project.

What it does

NexPath is a behaviour-guidance system that sits alongside AI coding agents and surfaces a decision session at the right moments — giving developers meaningful input, helpful direction, and interesting suggestions without ever forcing their hand. It is fully supported on Claude Code as of v0.1.1, with config detection and MCP registration implemented for Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode (end-to-end testing for those agents is planned for v0.1.3).

Captures every prompt via an MCP server that the agent runs as a child process
Classifies the prompt into one of 8 development stages (idea → PRD → architecture → task breakdown → implementation → review/testing → release → feedback loop) using a two-tier cascade (keyword matching → TF-IDF scoring) with a separate LLM cross-confirmation before firing
Detects the right moment — stage transitions, absence signals, low-confidence flags — and fires a short, non-intrusive terminal popup with relevant suggestions
Offers 3 levels of easier options — pick one, hit "Show simpler options →", or skip (replayed later by nexpath optimize)
Calibrates tone to the developer via a Nature + Mood classifier reading the last 5–20 prompts, shifting phrasing between beginner, cool_geek, hardcore_pro, and pro_geek_soul
Stays local: all prompts, session state, and the skipped-session queue live in a SQLite DB at ~/.nexpath/prompt-store.db. No telemetry and no cloud sync; the only network traffic is the optional Stage 2 and pinch LLM calls.
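A minimal sketch of the "right moment" gate described in the list above, modelling only stage transitions. The stage identifiers, the SessionState and Stage2Verdict shapes, and the function names are hypothetical and not taken from the NexPath source. The 0.60 confidence gate and the once-per-event-per-session rule are the ones described elsewhere in this write-up. The LLM verdict is supplied through a callback so the sketch runs offline and the callback is only invoked after the local signal passes.

```typescript
// The eight stages named above, in pipeline order (identifiers are illustrative).
const STAGES = [
  "idea", "prd", "architecture", "task_breakdown",
  "implementation", "review_testing", "release", "feedback_loop",
] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical shapes: the real NexPath types are not shown in this write-up.
interface SessionState {
  currentStage: Stage | null;
  firedEvents: Set<string>; // events that already produced a popup this session
}

interface Stage2Verdict {
  fire: boolean;
  confidence: number; // 0..1, as returned by the LLM cross-confirmation
}

const STAGE2_CONFIDENCE_GATE = 0.6;

// Stage 1: a cheap local signal. Only stage transitions are modelled here.
function detectTransition(state: SessionState, next: Stage): string | null {
  if (state.currentStage === null || state.currentStage === next) return null;
  return `transition:${state.currentStage}->${next}`;
}

// Combines the local signal, the once-per-session rule, and the Stage 2 verdict.
function shouldFire(
  state: SessionState,
  next: Stage,
  confirm: () => Stage2Verdict,
): boolean {
  const event = detectTransition(state, next);
  state.currentStage = next;
  if (event === null || state.firedEvents.has(event)) return false;
  const verdict = confirm(); // only reached when the local gate passes
  if (!verdict.fire || verdict.confidence < STAGE2_CONFIDENCE_GATE) return false;
  state.firedEvents.add(event);
  return true;
}
```

In this sketch a transition that the LLM rejects is not marked as fired, so it can be reconsidered later in the session. Whether NexPath behaves that way is not stated in the source.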

How we built it

Runtime: Node.js 18+, TypeScript (ESM), built with tsc and run via tsx in dev
CLI: commander for the command tree (install, uninstall, init, optimize, status, log, config, store), @clack/prompts for the terminal UI
MCP server: @modelcontextprotocol/sdk over stdio. Each AI coding agent spawns nexpath-serve as a child process and calls the capture_prompt tool on every user message
Storage: sql.js (WASM SQLite) with a custom schema (5 tables: prompts, config, projects, session_states, skipped_sessions); 100 MB soft ceiling with age-based eviction
Classifier cascade:
- Tier 1 — keyword matcher (<1 ms, confidence ≥ 0.65 short-circuits)
- Tier 2 — TF-IDF via the natural library (<5 ms, confidence ≥ 0.40 short-circuits)
- A third tier using MiniLM embeddings (@xenova/transformers) is implemented in source but not yet wired into the production pipeline
Stage 2 cross-confirmation: gpt-4o-mini via the openai SDK, CO-STAR-style prompt, JSON-only response, confidence gate at 0.60 before firing the decision session
Pinch generator: a separate gpt-4o-mini call produces the 2–3 word header that opens each decision session; on any failure it falls back to a static label
Advisory pipeline: the full nexpath auto pipeline (classifier → session state → absence detection → Stage 2 LLM → decision session UI) runs automatically on every user prompt via a Claude Code UserPromptSubmit hook
Profiling: a lightweight 2-axis NatureClassifier (precision × playfulness) plus a rule-based MoodClassifier (focused / rushed / excited / frustrated / methodical / casual) running over the last 5–20 prompts
Language detection: tinyld after a preprocessing pass that strips code-heavy lines, splits camelCase, and drops English programming keywords
Logging: structured file-based logger at ~/.nexpath/nexpath.log with 5 MB rotation and a nexpath log command for inspection
Validation: runtime schemas with zod; unit tests with vitest (~9,050 test lines for ~5,430 source lines — 1.67x ratio)
Multi-agent install: one nexpath install command detects every supported agent on disk and writes the correct MCP entry — different schema per agent (standard, Cline/Roo, KiloCode, OpenCode)
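The classifier cascade listed above can be sketched as a loop over tiers, each with its own short-circuit threshold (0.65 for keywords, 0.40 for TF-IDF). This is an illustrative sketch, not the NexPath implementation: the Tier and Classification shapes are hypothetical, the two toy tiers stand in for the real keyword matcher and the natural-based TF-IDF scorer, and returning the best low-confidence result when no tier clears its threshold is an assumption.

```typescript
// Hypothetical result shape; stage labels are plain strings here for brevity.
interface Classification {
  stage: string;
  confidence: number; // 0..1
  tier: string;
}

interface Tier {
  name: string;
  threshold: number; // confidence at or above which this tier short-circuits
  classify: (prompt: string) => { stage: string; confidence: number };
}

// Run tiers in order and return the first result that clears its threshold.
// If none does, return the most confident result seen so the caller can
// treat it as a low-confidence flag (assumed behaviour).
function classifyCascade(prompt: string, tiers: Tier[]): Classification {
  let best: Classification = { stage: "unknown", confidence: 0, tier: "none" };
  for (const tier of tiers) {
    const { stage, confidence } = tier.classify(prompt);
    const result: Classification = { stage, confidence, tier: tier.name };
    if (confidence >= tier.threshold) return result;
    if (confidence > best.confidence) best = result;
  }
  return best;
}

// Toy stand-in for the keyword matcher.
const keywordTier: Tier = {
  name: "keyword",
  threshold: 0.65,
  classify: (p) =>
    /\b(unit test|code review)\b/i.test(p)
      ? { stage: "review/testing", confidence: 0.9 }
      : { stage: "unknown", confidence: 0 },
};

// Toy stand-in for the TF-IDF scorer: the score is simply the fraction of a
// small stage vocabulary present in the prompt, not a real TF-IDF weight.
const tfidfTier: Tier = {
  name: "tfidf",
  threshold: 0.4,
  classify: (p) => {
    const vocab = ["endpoint", "function", "implement", "refactor"];
    const lower = p.toLowerCase();
    const hits = vocab.filter((w) => lower.includes(w)).length;
    return { stage: "implementation", confidence: hits / vocab.length };
  },
};
```

Calling classifyCascade(prompt, [keywordTier, tfidfTier]) runs the second tier only when the first one fails to clear 0.65, which is what keeps the common case cheap.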

Challenges we ran into

Prompt-intent classification under a latency budget: we needed every captured prompt classified in well under 30 ms to stay invisible. Pure LLM calls were too slow and too expensive. The two-tier cascade (keyword → TF-IDF) with confidence thresholds lets us short-circuit ~80% of prompts at Tier 1 or Tier 2 and keep the pipeline under 5 ms for local classification.
Detecting "the right moment" without being annoying: firing the decision session too often kills trust; firing it too rarely makes the tool invisible. We built a two-stage gate: fast local signals (stage transition, absence flags, low-confidence + active absence) in Stage 1, then a single gpt-4o-mini cross-confirmation in Stage 2 that returns a strict JSON verdict. We also enforce "once per event per session" so a given transition can never re-fire.
Rendering interactive UI from a hook subprocess: the advisory pipeline runs as a hook subprocess, which means stdin/stdout are owned by the parent agent. We had to build a custom TTY select function (TtySelectFn.ts) that opens /dev/tty directly to render the decision session UI — with error capture, cleanup, and graceful fallback when TTY isn't available.
Language detection on mixed code/English prompts: tinyld returned English for almost every prompt because developers sprinkle const, function, identifiers, etc. into non-English text. We wrote a preprocessor that drops code-heavy lines (>50% non-letter), splits camelCase tokens, and filters a small English-keyword stoplist — plus a sticky-fallback and accuracy/gap threshold to avoid flip-flopping.
MCP install across 7 agents: each agent stores its MCP config at a different path and with a different schema: some need disabled and alwaysAllow fields, OpenCode uses an array command under an mcp key, KiloCode requires a type: 'stdio' field, and Claude Code expects registration through its CLI. Writing a clean detection + install + uninstall layer with full Windows/macOS/Linux path resolution took more time than we budgeted.
Keeping everything local: we refused to add cloud sync or telemetry, which meant doing retention, eviction, and cleanup in sql.js ourselves with crash-safety on every prompt capture.
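The language-detection preprocessing described above (drop code-heavy lines, split camelCase, filter an English-keyword stoplist) might look roughly like the following. This is a sketch under assumptions: the stoplist contents and function names are hypothetical, the 50% ratio is measured over non-whitespace characters, and the tinyld call, sticky fallback, and accuracy/gap threshold are left out.

```typescript
// Small illustrative stoplist; the real list is not given in this write-up.
const CODE_KEYWORDS = new Set([
  "const", "let", "var", "function", "return", "import", "export",
  "class", "if", "else", "for", "while", "async", "await",
]);

// A line counts as code-heavy when more than half of its non-whitespace
// characters are not letters (brackets, digits, operators, and so on).
function isCodeHeavy(line: string): boolean {
  const stripped = line.replace(/\s/g, "");
  const total = Array.from(stripped).length;
  if (total === 0) return false;
  const letters = (stripped.match(/\p{L}/gu) ?? []).length;
  return (total - letters) / total > 0.5;
}

// fetchUserData -> "fetch User Data"
function splitCamelCase(text: string): string {
  return text.replace(/(\p{Ll})(\p{Lu})/gu, "$1 $2");
}

// Returns the text that would be handed to the language detector.
function preprocessForLanguageDetection(prompt: string): string {
  return prompt
    .split("\n")
    .filter((line) => !isCodeHeavy(line))
    .flatMap((line) => splitCamelCase(line).split(/\s+/))
    .filter((word) => word.length > 0 && !CODE_KEYWORDS.has(word.toLowerCase()))
    .join(" ");
}
```

Using the Unicode letter class \p{L} rather than [a-z] matters here: a line written in a non-Latin script should count as prose, not as code.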

Accomplishments that we're proud of

A fully working advisory pipeline on Claude Code — the decision session fires automatically between user prompts via the UserPromptSubmit hook, with no manual intervention required
A nexpath install command that registers the MCP server correctly across Claude Code, Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode — each with its own config format
A two-tier classifier that hits its latency budget (<5 ms) and degrades gracefully when no LLM key is available
A decision-session UX with a 3-level easier-options cascade, ANSI-styled pinch labels, Ctrl-C = skip (recorded for later), and no back-navigation — it respects developer flow
nexpath optimize — replays every skipped decision session so nothing is permanently lost
nexpath status — shows MCP connections, hook registration, prompt store stats, and config at a glance
A privacy posture we're genuinely happy with: everything local, 100 MB soft ceiling, nexpath store delete / nexpath store prune as first-class commands, no network calls except the optional Stage 2 and pinch LLM calls
~9,050 lines of tests for ~5,430 lines of source (1.67x ratio) across classifier, store, decision-session, pinch generator, language detection, and CLI
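The skip-and-replay behaviour behind Ctrl-C and nexpath optimize can be illustrated with an in-memory queue. This is a sketch only: the SkippedSession shape, the SkipQueue class, and the rule that a session skipped again stays queued are assumptions, and the real tool persists this data in its skipped_sessions table rather than in memory.

```typescript
// Hypothetical record shape; the real skipped_sessions schema is not shown.
interface SkippedSession {
  id: number;
  event: string; // e.g. a stage-transition label
  options: string[]; // suggestions that were on screen when the user skipped
}

type Outcome = { kind: "picked"; option: string } | { kind: "skipped" };

class SkipQueue {
  private items: SkippedSession[] = [];
  private nextId = 1;

  // Called when the user skips a decision session (for example with Ctrl-C).
  record(event: string, options: string[]): SkippedSession {
    const item: SkippedSession = { id: this.nextId++, event, options };
    this.items.push(item);
    return item;
  }

  // Replays every queued session in order. Answered sessions are removed;
  // sessions skipped again stay queued for the next replay (assumed rule).
  replay(ask: (session: SkippedSession) => Outcome): string[] {
    const picked: string[] = [];
    const remaining: SkippedSession[] = [];
    for (const item of this.items) {
      const outcome = ask(item);
      if (outcome.kind === "picked") picked.push(outcome.option);
      else remaining.push(item);
    }
    this.items = remaining;
    return picked;
  }

  get size(): number {
    return this.items.length;
  }
}
```

The ask callback is where an interactive prompt would go; passing it in keeps the queue logic testable without a terminal.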

What we learned

Small models + cheap heuristics in a cascade beat one big model call when you have a latency budget and a bounded decision space
"Non-intrusive" is a UX invariant, not a feature. Every design choice — pinch labels, 3-level cascade, skip-without-punishment, optimize replay — comes from treating interruption as a cost
The MCP spec is well-designed, but real-world agent configs are a zoo. An install command is only as good as its platform-detection and JSON-merge code
Rendering interactive UI from a subprocess that doesn't own the terminal is harder than it sounds: macOS, Linux, and Windows each have their own TTY edge cases
Calibrating tone to the developer (nature × mood) made the terminal prompts feel less like a compliance checklist and more like a colleague's nudge. This mattered more than we expected

What's next for NexPath

v0.1.2 — Fix existing MCP server issues and stabilise the advisory pipeline for production reliability
v0.1.3 — Expand end-to-end testing and support to Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode
Publish to npm as nexpath and nexpath-serve so npm install -g nexpath + nexpath install is the full onboarding path
Wire the MiniLM embedding tier into the production classifier pipeline for improved accuracy on ambiguous prompts
Integrate with ReviewDuel (our peer-review layer for coding-agent workflows) so NexPath's behaviour guidance and ReviewDuel's artifact reviews share one data layer
Expand classifier vocabulary and explore replacing gpt-4o-mini with whatever the best cheap-reasoning model is at the time