Inspiration
Vibe coding has changed how software gets built. AI coding agents can generate a whole feature from a one-sentence prompt — but the speed of generation often outpaces the discipline of process. We kept watching ourselves skip reviews, forget tests, and ship without acceptance checks. Not out of laziness — out of momentum.
We'd been researching MCP and coding agent workflows for a while, and during the AI Hackfest's "Hacking with GitHub Copilot + MCP" workshop, the specific idea clicked: what if we could build a tool that gives developers meaningful direction and helpful suggestions right inside their coding sessions — matching their momentum instead of fighting it? NexPath was built from that point — during the hackathon — as an independent project.
What it does
NexPath is a behaviour-guidance system that sits alongside AI coding agents and surfaces a decision session at the right moments — giving developers meaningful input, helpful direction, and interesting suggestions without ever forcing their hand. It is fully supported on Claude Code as of v0.1.1, with config detection and MCP registration implemented for Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode (end-to-end testing for those agents is planned for v0.1.3).
- Captures every prompt via an MCP server that the agent runs as a child process
- Classifies the prompt into one of 8 development stages (idea → PRD → architecture → task breakdown → implementation → review/testing → release → feedback loop) using a two-tier cascade (keyword matching → TF-IDF scoring) with a separate LLM cross-confirmation before firing
- Detects the right moment — stage transitions, absence signals, low-confidence flags — and fires a short, non-intrusive terminal popup with relevant suggestions
- Offers 3 levels of easier options: pick one, hit "Show simpler options →", or skip (replayed later by `nexpath optimize`)
- Calibrates tone to the developer via a Nature + Mood classifier reading the last 5–20 prompts, shifting phrasing between `beginner`, `cool_geek`, `hardcore_pro`, and `pro_geek_soul`
- Stays fully local: all prompts, session state, and skipped-session queue live in a SQLite DB at `~/.nexpath/prompt-store.db`. No telemetry, no cloud.
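As a rough illustration of the "detects the right moment" step, the sketch below shows one way a local gate could turn a stream of classified stages into at most one trigger per event. It is not NexPath's source: `SessionGate`, `Trigger`, and the single absence rule (release work starting with no review/testing prompt seen) are assumptions made for the example, and the low-confidence signal is left out.

```typescript
// The eight stages named above, in pipeline order.
type DevStage =
  | "idea" | "prd" | "architecture" | "task_breakdown"
  | "implementation" | "review_testing" | "release" | "feedback_loop";

type Trigger =
  | { kind: "absence"; missing: DevStage }
  | { kind: "stage_transition"; from: DevStage; to: DevStage };

// Stable key used to make sure each event fires at most once per session.
function triggerKey(t: Trigger): string {
  return t.kind === "absence"
    ? `absence:${t.missing}`
    : `transition:${t.from}->${t.to}`;
}

// Hypothetical per-session state: stages seen so far plus events already fired.
class SessionGate {
  private readonly seen = new Set<DevStage>();
  private readonly fired = new Set<string>();
  private current: DevStage | null = null;

  // Feed the stage of each classified prompt; returns a trigger or null.
  observe(stage: DevStage): Trigger | null {
    const previous = this.current;
    this.current = stage;
    this.seen.add(stage);

    const candidates: Trigger[] = [];
    // Assumed absence rule: release work begins with no review/testing prompt seen.
    if (stage === "release" && !this.seen.has("review_testing")) {
      candidates.push({ kind: "absence", missing: "review_testing" });
    }
    if (previous !== null && previous !== stage) {
      candidates.push({ kind: "stage_transition", from: previous, to: stage });
    }

    // Return the first candidate that has not fired yet in this session.
    for (const candidate of candidates) {
      const key = triggerKey(candidate);
      if (!this.fired.has(key)) {
        this.fired.add(key);
        return candidate;
      }
    }
    return null;
  }
}
```

Keeping the fired-event set inside the session object means a repeated idea → implementation hop stays silent the second time, while a new kind of event can still surface.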
How we built it
- Runtime: Node.js 18+, TypeScript (ESM), built with `tsc` and run via `tsx` in dev
- CLI: `commander` for the command tree (`install`, `uninstall`, `init`, `optimize`, `status`, `log`, `config`, `store`), `@clack/prompts` for the terminal UI
- MCP server: `@modelcontextprotocol/sdk` over stdio. Each AI coding agent spawns `nexpath-serve` as a child process and calls the `capture_prompt` tool on every user message
- Storage: `sql.js` (WASM SQLite) with a custom schema (5 tables: prompts, config, projects, session_states, skipped_sessions); 100 MB soft ceiling with age-based eviction
- Classifier cascade:
  - Tier 1: keyword matcher (<1 ms, confidence ≥ 0.65 short-circuits)
  - Tier 2: TF-IDF via the `natural` library (<5 ms, confidence ≥ 0.40 short-circuits)
  - A third tier using MiniLM embeddings (`@xenova/transformers`) is implemented in source but not yet wired into the production pipeline
- Stage 2 cross-confirmation: `gpt-4o-mini` via the `openai` SDK, CO-STAR-style prompt, JSON-only response, confidence gate at 0.60 before firing the decision session
- Pinch generator: a separate `gpt-4o-mini` call produces the 2–3 word header that opens each decision session; on any failure it falls back to a static label
- Advisory pipeline: the full `nexpath auto` pipeline (classifier → session state → absence detection → Stage 2 LLM → decision session UI) runs automatically via a Claude Code `UserPromptSubmit` hook between user prompts
- Profiling: a lightweight 2-axis `NatureClassifier` (precision × playfulness) plus a rule-based `MoodClassifier` (focused / rushed / excited / frustrated / methodical / casual) running over the last 5–20 prompts
- Language detection: `tinyld` after a preprocessing pass that strips code-heavy lines, splits camelCase, and drops English programming keywords
- Logging: structured file-based logger at `~/.nexpath/nexpath.log` with 5 MB rotation and a `nexpath log` command for inspection
- Validation: runtime schemas with `zod`; unit tests with `vitest` (~9,050 test lines for ~5,430 source lines, a 1.67x ratio)
- Multi-agent install: one `nexpath install` command detects every supported agent on disk and writes the correct MCP entry, with a different schema per agent (standard, Cline/Roo, KiloCode, OpenCode)
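The classifier cascade described in the list above can be sketched as a loop over tiers with per-tier thresholds. This is a minimal illustration under stated assumptions, not the project's code: `runCascade`, `buildTiers`, the toy keyword lists, and the confidence formula are invented for the example, and the Tier 2 scorer is passed in as a plain function instead of calling the `natural` library. Only the 0.65 and 0.40 thresholds come from the write-up.

```typescript
interface StageGuess {
  stage: string;
  confidence: number; // expected range 0..1
}

interface Tier {
  label: string;
  threshold: number; // confidence at or above this short-circuits the cascade
  classify: (prompt: string) => StageGuess | null;
}

interface CascadeResult extends StageGuess {
  decidedBy: string;
  lowConfidence: boolean;
}

// Run tiers in order; the first guess that clears its tier's threshold wins.
// If none does, return the strongest guess flagged as low-confidence.
function runCascade(prompt: string, tiers: Tier[]): CascadeResult | null {
  let fallback: CascadeResult | null = null;
  for (const tier of tiers) {
    const guess = tier.classify(prompt);
    if (guess === null) continue;
    if (guess.confidence >= tier.threshold) {
      return { ...guess, decidedBy: tier.label, lowConfidence: false };
    }
    if (fallback === null || guess.confidence > fallback.confidence) {
      fallback = { ...guess, decidedBy: tier.label, lowConfidence: true };
    }
  }
  return fallback;
}

// Toy Tier 1: exact keyword hits. Word lists and scoring are invented for the demo.
const STAGE_KEYWORDS: Record<string, string[]> = {
  review_testing: ["test", "review", "coverage"],
  release: ["deploy", "release", "publish"],
};

function keywordTier(prompt: string): StageGuess | null {
  const words = prompt.toLowerCase().split(/\W+/);
  let best: StageGuess | null = null;
  for (const [stage, keywords] of Object.entries(STAGE_KEYWORDS)) {
    const hits = words.filter((w) => keywords.includes(w)).length;
    const confidence = Math.min(1, 0.25 + 0.25 * hits);
    if (hits > 0 && (best === null || confidence > best.confidence)) {
      best = { stage, confidence };
    }
  }
  return best;
}

// Thresholds are the ones quoted above; the Tier 2 scorer is supplied by the caller.
function buildTiers(tfidfTier: Tier["classify"]): Tier[] {
  return [
    { label: "keyword", threshold: 0.65, classify: keywordTier },
    { label: "tfidf", threshold: 0.4, classify: tfidfTier },
  ];
}
```

Because a confident Tier 1 hit returns immediately, the slower scorer only runs for prompts the keyword matcher cannot settle, which is what keeps the common case cheap.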
Challenges we ran into
- Prompt-intent classification under a latency budget: we needed every captured prompt classified in well under 30 ms to stay invisible. Pure LLM calls were too slow and too expensive. The two-tier cascade (keyword → TF-IDF) with confidence thresholds lets us short-circuit ~80% of prompts at Tier 1/2 and keep local classification under 5 ms.
- Detecting "the right moment" without being annoying: firing the decision session too often kills trust; firing it too rarely makes the tool invisible. We built a two-stage gate: fast local signals (stage transition, absence flags, low-confidence + active absence) on Stage 1, then a single `gpt-4o-mini` cross-confirmation on Stage 2 that returns a strict JSON verdict. We also enforce "once per event per session" so a given transition can never re-fire.
- Rendering interactive UI from a hook subprocess: the advisory pipeline runs as a hook subprocess, which means stdin/stdout are owned by the parent agent. We had to build a custom TTY select function (`TtySelectFn.ts`) that opens `/dev/tty` directly to render the decision session UI, with error capture, cleanup, and graceful fallback when TTY isn't available.
- Language detection on mixed code/English prompts: `tinyld` returned English for almost every prompt because developers sprinkle `const`, `function`, identifiers, etc. into non-English text. We wrote a preprocessor that drops code-heavy lines (>50% non-letter), splits camelCase tokens, and filters a small English-keyword stoplist, plus a sticky-fallback and accuracy/gap threshold to avoid flip-flopping.
- MCP install across 7 agents: each agent stores its MCP config in a different path, with a different schema (some need `disabled` and `alwaysAllow`, OpenCode uses an array command under an `mcp` key, KiloCode requires a `type: 'stdio'` field, Claude Code wants the CLI). Writing a clean detection + install + uninstall layer with full Windows/macOS/Linux path resolution took more time than we budgeted.
- Keeping everything local: we refused to add cloud sync or telemetry, which meant doing retention, eviction, and cleanup in `sql.js` ourselves with crash-safety on every prompt capture.
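The language-detection preprocessing mentioned in the list above (drop code-heavy lines, split camelCase, filter a keyword stoplist) might look roughly like this. It is a sketch under assumptions: the function names and the stoplist contents are hypothetical, whitespace is ignored when computing the non-letter ratio, and the sticky-fallback and accuracy/gap thresholds are not shown. The cleaned text would then be handed to the language detector.

```typescript
// Hypothetical stoplist; the project's actual list is not given in the write-up.
const CODE_KEYWORDS = new Set([
  "const", "let", "var", "function", "return",
  "import", "export", "async", "await", "class",
]);

// Built with the RegExp constructor so Unicode letters in any script count as letters.
const LETTER_RE = new RegExp("\\p{L}", "u");
const EDGE_NON_LETTERS = new RegExp("^[^\\p{L}]+|[^\\p{L}]+$", "gu");

// A line is "code-heavy" when more than half of its non-whitespace characters
// are not letters. Blank lines are treated as droppable too.
function isCodeHeavy(line: string): boolean {
  const chars = Array.from(line).filter((ch) => ch.trim() !== "");
  if (chars.length === 0) return true;
  const nonLetters = chars.filter((ch) => !LETTER_RE.test(ch)).length;
  return nonLetters / chars.length > 0.5;
}

// Returns prose-like text that is safer to feed to a language detector.
function preprocessForLangDetect(prompt: string): string {
  return prompt
    .split(/\r?\n/)
    .filter((line) => !isCodeHeavy(line))
    // Split ASCII camelCase: "getUserName" becomes "get User Name".
    .map((line) => line.replace(/([a-z0-9])([A-Z])/g, "$1 $2"))
    .join(" ")
    .split(/\s+/)
    .filter((word) => {
      // Compare against the stoplist without surrounding punctuation.
      const bare = word.toLowerCase().replace(EDGE_NON_LETTERS, "");
      return word.length > 0 && !CODE_KEYWORDS.has(bare);
    })
    .join(" ");
}
```

For a Spanish prompt with a pasted statement on its own line, the statement line is dropped by the ratio check and the stray `function` / `const` tokens in the prose line are filtered, so the detector sees mostly Spanish words.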
Accomplishments that we're proud of
- A fully working advisory pipeline on Claude Code: the decision session fires automatically between user prompts via the `UserPromptSubmit` hook, with no manual intervention required
- A `nexpath install` command that registers the MCP server correctly across Claude Code, Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode, each with its own config format
- A two-tier classifier that hits its latency budget (<5 ms) and degrades gracefully when no LLM key is available
- A decision-session UX with a 3-level easier-options cascade, ANSI-styled pinch labels, Ctrl-C = skip (recorded for later), and no back-navigation, so it respects developer flow
- `nexpath optimize`: replays every skipped decision session so nothing is permanently lost
- `nexpath status`: shows MCP connections, hook registration, prompt store stats, and config at a glance
- A privacy posture we're genuinely happy with: everything local, 100 MB soft ceiling, `nexpath store delete` / `nexpath store prune` as first-class commands, no network calls except the optional Stage 2 and pinch LLM calls
- ~9,050 lines of tests for ~5,430 lines of source (1.67x ratio) across classifier, store, decision-session, pinch generator, language detection, and CLI
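To make the Stage 2 gate and the "degrades gracefully" point concrete, here is a hedged sketch of validating a JSON-only verdict before firing. The verdict shape (`fire`, `confidence`) and the function names are assumptions, since the write-up only specifies a strict JSON response and a 0.60 confidence gate. Treating a missing or malformed reply as "stay silent" is one possible degradation policy, not necessarily the one NexPath uses.

```typescript
// Assumed verdict shape; the real schema is not given in the write-up.
interface Stage2Verdict {
  fire: boolean;
  confidence: number;
}

const STAGE2_MIN_CONFIDENCE = 0.6; // gate quoted in this write-up

// Parse and validate the model's raw reply; any deviation yields null.
function parseVerdict(raw: string): Stage2Verdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // reply was not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const fire = record["fire"];
  const confidence = record["confidence"];
  if (typeof fire !== "boolean" || typeof confidence !== "number") return null;
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) return null;
  return { fire, confidence };
}

// raw is null when no API key is configured or the call failed.
function shouldFireSession(raw: string | null): boolean {
  if (raw === null) return false; // assumed policy: stay silent without a verdict
  const verdict = parseVerdict(raw);
  return verdict !== null && verdict.fire && verdict.confidence >= STAGE2_MIN_CONFIDENCE;
}
```

Funnelling every failure mode into the same quiet outcome keeps the hook path free of thrown errors, which matters when the caller is another tool's subprocess.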
What we learned
- Small models + cheap heuristics in a cascade beat one big model call when you have a latency budget and a bounded decision space
- "Non-intrusive" is a UX invariant, not a feature. Every design choice (pinch labels, 3-level cascade, skip-without-punishment, `optimize` replay) comes from treating interruption as a cost
- The MCP spec is well-designed, but real-world agent configs are a zoo. An install command is only as good as its platform-detection and JSON-merge code
- Rendering interactive UI from a subprocess that doesn't own the terminal is harder than it sounds: TTY handling on macOS, Linux, and Windows each has its own edge cases
- Calibrating tone to the developer (nature × mood) made the terminal prompts feel less like a compliance checklist and more like a colleague's nudge. This mattered more than we expected
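The JSON-merge point above is easy to get wrong, so a small sketch may help. It assumes the common `mcpServers` layout that several agents use, and the server name `nexpath` and the helper `mergeMcpEntry` are hypothetical. Agents with other schemas (the Cline/Roo, KiloCode, and OpenCode variants mentioned earlier) would need their own entry shapes.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

type JsonObject = Record<string, unknown>;

function isJsonObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge one server entry into an agent's existing config text without touching
// unrelated keys or other registered servers. Pass null when no file exists yet.
function mergeMcpEntry(
  existingText: string | null,
  serverName: string,
  entry: McpServerEntry,
): string {
  let config: JsonObject = {};
  if (existingText !== null && existingText.trim() !== "") {
    // JSON.parse throws on a malformed file, so a broken config is never overwritten.
    const parsed: unknown = JSON.parse(existingText);
    if (!isJsonObject(parsed)) {
      throw new Error("config root must be a JSON object");
    }
    config = parsed;
  }
  const existingServers = config["mcpServers"];
  const servers: JsonObject = isJsonObject(existingServers) ? existingServers : {};
  const merged = {
    ...config,
    mcpServers: { ...servers, [serverName]: entry },
  };
  return JSON.stringify(merged, null, 2) + "\n";
}
```

A call such as `mergeMcpEntry(currentText, "nexpath", { command: "nexpath-serve", args: [] })` returns the new file contents; writing them to the agent-specific path is left to the caller.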
What's next for NexPath
- v0.1.2 — Fix existing MCP server issues and stabilise the advisory pipeline for production reliability
- v0.1.3 — Expand end-to-end testing and support to Cursor, Windsurf, Cline, Roo Code, KiloCode, and OpenCode
- Publish to npm as `nexpath` and `nexpath-serve` so `npm install -g nexpath` + `nexpath install` is the full onboarding path
- Wire the MiniLM embedding tier into the production classifier pipeline for improved accuracy on ambiguous prompts
- Integrate with ReviewDuel (our peer-review layer for coding-agent workflows) so NexPath's behaviour guidance and ReviewDuel's artifact reviews share one data layer
- Expand classifier vocabulary and explore replacing `gpt-4o-mini` with whatever the best cheap-reasoning model is at the time
Built With
- anthropic
- clack
- claude-code
- cline
- commander
- cursor
- gpt-4o-mini
- kilocode
- mcp
- minilm
- model-context-protocol
- natural
- node.js
- openai
- roo-code
- sql.js
- sqlite
- tfidf
- tinyld
- tsx
- typescript
- vitest
- windsurf
- xenova-transformers
- zod

