Project Idea
ML Experiment Autopilot is an autonomous agent that designs, executes, and iterates on machine learning experiments without human supervision, explaining every decision along the way.
Inspiration
Traditional AutoML tools operate as black boxes — they select models and tune hyperparameters but never explain why. Data scientists lose insight into what works and what doesn't. ML Experiment Autopilot takes a fundamentally different approach: it thinks like a researcher, forming hypotheses, testing them, analyzing results, and adapting its strategy based on what it learns.
Description
At its core, the system uses Gemini 3 Flash Preview with Thought Signatures to maintain reasoning continuity across the entire experiment session. Four cognitive components — ExperimentDesigner, ResultsAnalyzer, HypothesisGenerator, and ReportGenerator — share a single multi-turn conversation, allowing Gemini to reference early iteration results when designing later experiments. Temperature is set to 1.0 with thinking level "high" for complex reasoning tasks.
The workflow is fully autonomous: data profiling, baseline establishment, then an iterative loop of Gemini-designed experiments executed via Jinja2-generated Python scripts in isolated subprocesses. The agent detects performance plateaus, balances exploration vs. exploitation, and generates publication-ready Markdown reports with visualizations.
Development Tech Stack
Built with scikit-learn, XGBoost, LightGBM, MLflow, and Rich, the system is designed for the Marathon Agent track — long-running autonomous tasks with self-correction and continuous reasoning.
Built With
- gemini-3-flash-preview-(google-generativeai)
- jinja
- lightgbm
- matplotlib
- mlflow
- numpy
- pandas
- pydantic
- pytest
- python
- python-dotenv
- rich
- scikit-learn
- typer
- xgboost

Log in or sign up for Devpost to join the conversation.