
Team: Error: Skill Not Found!

Our earlier builds (v1/v2) ran on flan-t5-base and flan-t5-large, encoder-decoder models that were fast and free but had a fatal flaw: they echoed our prompt templates back literally. Milestones would say things like "[specific action]" instead of an actual action, and first steps read "I will [[start]] at [[time]]." Technically correct format, zero real content.
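The bracket-echo failure is easy to catch mechanically. The sketch below is not how the team measures leakage (their benchmark uses bart-large-mnli). It is a simpler regex check, and the names `find_placeholders` and `leaks_template` are hypothetical. It flags any single- or double-bracketed span, so it assumes real plans never contain legitimate brackets.

```python
import re

# Hypothetical leakage check: matches "[specific action]" as well as "[[start]]",
# i.e. one or two opening brackets, non-bracket content, one or two closing brackets.
PLACEHOLDER = re.compile(r"\[\[?[^\[\]]+\]\]?")


def find_placeholders(text: str) -> list[str]:
    """Return every bracketed placeholder left in a generated output."""
    return PLACEHOLDER.findall(text)


def leaks_template(text: str) -> bool:
    """True if the output still contains at least one unfilled placeholder."""
    return bool(find_placeholders(text))


print(find_placeholders("I will [[start]] at [[time]]."))  # ['[[start]]', '[[time]]']
print(leaks_template("Milestone 1: [specific action]"))  # True
print(leaks_template("Milestone 1: ship a portfolio site by March."))  # False
```

Legitimate brackets, such as citation markers or array indices in a code snippet, would also trip this check. In practice it works best as a cheap pre-filter ahead of a model-based score.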

So for v3.0 we swapped out the generation model entirely and moved to Qwen2.5-7B-Instruct, a 7.6B-parameter decoder-only model quantized to 4 bits so it runs on a free Colab T4 GPU (~5 GB of VRAM). Unlike T5, it hasn't echoed a template in any of our test runs, and its reasoning is fluent and specific. We kept facebook/bart-large-mnli for zero-shot confidence scoring (assumption risk %, milestone achievability %, path uncertainty %) and added gTTS so the full plan can be narrated aloud.
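Here is a back-of-the-envelope check on why 4-bit quantization is what makes the T4 viable. It counts weight storage only and assumes all 7.6B parameters are stored at a uniform bit width. The helper name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.6e9  # Qwen2.5-7B-Instruct parameter count, as quoted in the post

# fp16 weights alone leave almost nothing on a 16 GB T4 for activations and the KV cache.
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
# fp16: 15.2 GB
# 8-bit: 7.6 GB
# 4-bit: 3.8 GB
```

The gap between ~3.8 GB of 4-bit weights and the ~5 GB observed plausibly comes from layers left unquantized, quantization metadata, the CUDA context, and the KV cache. The post does not break that figure down.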

We also stopped trusting vibes and built an automated benchmark. It uses bart-large-mnli to score every output on specificity, relevance, and template leakage, and we ran it on three diverse test ideas: an AI career switch, a UX freelancing path, and a biotech transition. Results: specificity averaged around 63%, relevance averaged 94%, and template leakage was 0% for all three. The v1/v2 bracket-echo problem is gone.
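For readers unfamiliar with zero-shot NLI scoring, this is roughly how an MNLI model such as bart-large-mnli turns a (text, label) pair into a percentage under the Hugging Face zero-shot-classification recipe. The generated text is the premise, and each candidate label becomes a hypothesis such as "This example is {label}.". The neutral logit is discarded. The post doesn't say which mode or labels the benchmark uses, and the logits below are made up. Only the arithmetic is shown.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_label_score(contradiction: float, entailment: float) -> float:
    """Each label scored independently: softmax over (contradiction, entailment), neutral dropped."""
    return softmax([contradiction, entailment])[1]


def single_label_scores(entailment_logits: list[float]) -> list[float]:
    """Labels compete: softmax over the entailment logits of all candidate labels."""
    return softmax(entailment_logits)


# Made-up logits for one output against the hypothesis "This example is specific."
print(f"{multi_label_score(contradiction=-1.0, entailment=1.0):.1%}")  # 88.1%

# Made-up entailment logits for two competing labels, e.g. "specific" vs "vague"
print([f"{p:.1%}" for p in single_label_scores([2.0, 0.0])])  # ['88.1%', '11.9%']
```

Multi-label mode suits independent criteria like specificity and relevance, because one score doesn't push another down. Single-label mode forces the labels to compete for a fixed probability budget.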

Full pipeline timing from a real GPU run: Stage 1 (Idea Clarifier) 77s, Stage 2 (Assumption Miner) 115s, Stage 3 (Milestone Builder) 79s, Stage 4 (First Step Forge) 22s, and Stage 5 (Responsible AI) 25s, followed by a 3.2MB audio narration generated at the end. The five stages sum to 318s (about 5.3 minutes); end to end, the pipeline takes about 6 minutes on a free Colab T4.
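The following tally of the reported stage timings shows where the time goes. The numbers are the ones reported for the GPU run. The dictionary layout is only for illustration.

```python
# Stage timings in seconds, as reported for one run on a free Colab T4.
STAGE_SECONDS = {
    "Idea Clarifier": 77,
    "Assumption Miner": 115,
    "Milestone Builder": 79,
    "First Step Forge": 22,
    "Responsible AI": 25,
}

total = sum(STAGE_SECONDS.values())
print(f"Generation stages: {total} s ({total / 60:.1f} min)")  # 318 s (5.3 min)

# Each stage's share of the generation time
for stage, secs in STAGE_SECONDS.items():
    print(f"{stage:<18} {secs:>4} s  {secs / total:6.1%}")
```

The Assumption Miner alone accounts for over a third of generation time. The remaining ~0.7 minutes of the ~6-minute total is presumably narration plus overhead such as model loading.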

The system is still grounded in the same three behavioral science frameworks: Cognitive Load Theory (Sweller, 1988) for the Idea Clarifier, Assumption-Based Planning (Dewar et al., 2002) for the Assumption Miner, and Implementation Intentions (Gollwitzer, 1999) for the First Step Forge. The frameworks haven't changed. What has changed is that the model can now actually reason through them instead of filling in blanks.
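Implementation intentions (Gollwitzer, 1999) take an if-then form that links a specific situational cue to a specific action. The following is a minimal sketch of how a first step could be held to that structure. It assumes a time and a place serve as the cue. The class and field names are hypothetical, and rejecting bracket characters is a crude stand-in for a real leakage check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImplementationIntention:
    """Hypothetical if-then plan: a concrete situational cue plus a concrete action."""

    when: str    # cue part 1: a specific time
    where: str   # cue part 2: a specific place
    action: str  # the goal-directed behavior to start when the cue occurs

    def __post_init__(self) -> None:
        # Reject empty slots and anything that still looks like a template placeholder.
        for name in ("when", "where", "action"):
            value = getattr(self, name).strip()
            if not value or "[" in value or "]" in value:
                raise ValueError(f"slot {name!r} is empty or an unfilled placeholder")

    def render(self) -> str:
        """Format the plan in the if-then shape of an implementation intention."""
        return f"If it is {self.when} and I am at {self.where}, then I will {self.action}."


step = ImplementationIntention(
    when="7:00 on Monday",
    where="my desk",
    action="open lesson 1 of an online ML course",
)
print(step.render())
# If it is 7:00 on Monday and I am at my desk, then I will open lesson 1 of an online ML course.
```

Validating the slots at construction time means a half-filled template like "I will [[start]] at [[time]]" fails loudly instead of reaching the user.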
