The ML Part
Training Strategy:
With only 70 training examples (40 real deals plus 30 LLM-generated synthetic ones), we used XGBoost regression with 3-fold cross-validation to maximize predictive power while guarding against overfitting.
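A minimal sketch of that training setup, with illustrative stand-in data; the hyperparameters and the synthetic X/y below are assumptions, not the project's actual values:

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data: 70 examples with 8 engineered features (values illustrative,
# not the real deal dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(70, 8))
y = 5000 + 3000 * X[:, 0] + rng.normal(scale=500, size=70)

# Shallow, regularized trees are a sensible default at this dataset size;
# the exact hyperparameters here are assumed.
model = XGBRegressor(
    n_estimators=200,
    max_depth=3,
    learning_rate=0.05,
    random_state=42,
)

cv = KFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"3-fold CV R2: {scores.mean():.3f} +/- {scores.std():.3f}")
```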
Feature Engineering:
- Integration complexity scoring
- Scope quantification (feature count, custom logic requirements)
- Technical stack requirements (platform complexity, data volume)
- Client segment indicators (industry, company size)
- Client pain severity (how painful and urgent the client's problem is; see the feature-builder sketch after this list)
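A hypothetical sketch of how these features could be encoded; the field names ("integration", "pain_level", "industry", ...) and the ordinal complexity scale are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Assumed ordinal scale for integration complexity
COMPLEXITY = {"low": 1, "medium": 2, "high": 3}

def build_features(deals: pd.DataFrame) -> pd.DataFrame:
    """Turn raw deal records into the model's numeric feature matrix."""
    feats = pd.DataFrame(index=deals.index)
    feats["integration_complexity"] = deals["integration"].map(COMPLEXITY)
    feats["feature_count"] = deals["feature_count"]
    feats["custom_logic"] = deals["needs_custom_logic"].astype(int)
    feats["data_volume_gb"] = deals["data_volume_gb"]
    # Pain severity as an interaction of pain and urgency (assumed encoding)
    feats["pain_severity"] = deals["pain_level"] * deals["urgency"]
    # One-hot encode the categorical client-segment fields
    feats = feats.join(pd.get_dummies(deals["industry"], prefix="industry"))
    feats = feats.join(pd.get_dummies(deals["company_size"], prefix="size"))
    return feats
```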
Model Performance:
| Metric | Training Set | Test Set | Cross-Validation |
|---|---|---|---|
| R² | 0.937 | 0.807 | 0.816 ± 0.060 |
| MAE | $2,328 | $3,688 | $3,898 ± $629 |
| RMSE | $2,874 | $4,720 | - |
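For reference, these metrics can be computed with scikit-learn; the y_test/y_pred values below are illustrative placeholders, not the project's actual predictions:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_test = np.array([12000.0, 8500.0, 15000.0])   # held-out prices (illustrative)
y_pred = np.array([11200.0, 9100.0, 14100.0])   # model outputs (illustrative)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R2={r2:.3f}  MAE=${mae:,.0f}  RMSE=${rmse:,.0f}")
```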
Key Results:
- 80.7% explained variance on unseen data
- Predictions land within ±$3,688 of the actual price on average (test MAE), accurate enough for real quotes
- CV-Test consistency: Test R² (0.807) aligns with CV estimate (0.816), confirming reliable generalization
- Stable across folds: R² ranged from 0.74-0.88, showing model robustness
Deployment:
- Serialized via joblib, served through FastAPI (see the serving sketch after this list)
- Real-time inference with <100ms latency
- Integrated as a tool call within the Gemini 3 agent workflow
- Returns predictions with confidence intervals for quote ranges
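A minimal serving sketch under stated assumptions: the model file name, endpoint path, request schema, and the ±MAE quote band are illustrative choices, not the project's actual API.

```python
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Assumes the trained model was saved with joblib.dump(model, "quote_model.joblib")
model = joblib.load("quote_model.joblib")
TEST_MAE = 3688.0  # from the evaluation above; one plausible basis for a quote range

class QuoteRequest(BaseModel):
    features: list[float]  # engineered feature vector, in training order

@app.post("/predict")
def predict(req: QuoteRequest):
    x = np.asarray(req.features, dtype=float).reshape(1, -1)
    price = float(model.predict(x)[0])
    return {
        "predicted_price": round(price, 2),
        # Simple ±MAE band as the quote range (assumed CI method)
        "quote_range": [round(price - TEST_MAE, 2), round(price + TEST_MAE, 2)],
    }
```

Loading the model once at module import keeps per-request work to a single predict call, which is how sub-100ms latency is typically achieved for a model this small.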
The model shows that even with limited training data, aggressive feature engineering and careful cross-validation can produce a production-ready ML system for high-stakes business decisions.