Inspiration
Machine learning has enormous potential to accelerate scientific discovery, yet many researchers lack the programming background needed to build reliable ML workflows. In practice, we saw scientists relying on spreadsheets, ad-hoc scripts, or fragmented LLM-generated code that was hard to understand and validate. This inspired us to build an AI research assistant that not only generates ML pipelines, but does so in a structured, transparent, and scientifically defensible way.
What it does
Metric Mode is an AI agent that helps researchers turn their datasets into clear, reproducible machine learning workflows. Users upload their data and describe their research goal in natural language. The agent profiles the dataset, asks clarifying questions about the modelling task, validates key assumptions, and then generates structured ML pipelines. It produces two reproducible scripts: one for training and validating the model, and another for testing it on unseen data. The system also explains each step so researchers can understand and trust the process.
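To make the profiling step more concrete, here is a minimal sketch of what a dataset profile could look like. It is written in plain Python for brevity and is not taken from Metric Mode itself; the function name `profile_rows`, the report fields, and the sample data are all hypothetical, and the real agent may collect different statistics.

```python
import csv
import io


def _is_number(value):
    """Return True if the value parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_rows(rows):
    """Summarise each column: inferred kind, missing count, distinct count.

    Hypothetical helper for illustration only; not Metric Mode's profiler.
    """
    profile = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row.get(col, "") for row in rows]
        # Treat empty strings and None as missing values.
        present = [v for v in values if v not in ("", None)]
        all_numeric = bool(present) and all(_is_number(v) for v in present)
        profile[col] = {
            "kind": "numeric" if all_numeric else "categorical",
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return profile


# Made-up sample data, used only to exercise the sketch.
sample = "age,group,score\n34,control,0.71\n,treated,0.64\n29,treated,\n"
rows = list(csv.DictReader(io.StringIO(sample)))
report = profile_rows(rows)
```

A summary like this gives the agent something specific to ask about, for example whether rows with a missing `score` should be dropped or imputed.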
How we built it
We built Metric Mode as a structured AI agent using Koog, JetBrains’ Kotlin framework for intelligent agents. Koog allowed us to design the system as a multi-step workflow rather than a single prompt.
Challenges we ran into
One of the biggest challenges was designing the agent’s dialogue flow. We needed to gather enough information to build a valid ML pipeline without overwhelming the user. Deciding when to ask questions, how to confirm assumptions, and how to detect potential modelling risks required careful workflow design. Another challenge was balancing automation with transparency. We wanted the system to be powerful, but also explainable, so that researchers could follow the logic behind each modelling decision.
Accomplishments that we're proud of
We are proud of building an agent that turns unstructured user input into a structured, reproducible ML workflow. The checkpoint system ensures that key modelling decisions are explicitly confirmed before any code is generated. We also successfully integrated conversational guidance, data profiling, and pipeline generation into a coherent user experience powered by Koog.
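The checkpoint idea can be illustrated with a small sketch: code generation is refused until every proposed modelling decision has been explicitly confirmed. This is an assumption-laden illustration in plain Python, not the project's Kotlin/Koog implementation; `Checkpoint`, `generate_pipeline`, and the decision names are hypothetical.

```python
from dataclasses import dataclass, field


class UnconfirmedDecisionError(Exception):
    """Raised when generation is requested with unconfirmed decisions."""


@dataclass
class Checkpoint:
    """Tracks proposed modelling decisions and which ones the user confirmed."""
    decisions: dict = field(default_factory=dict)  # decision name -> value
    confirmed: set = field(default_factory=set)

    def propose(self, name, value):
        self.decisions[name] = value
        # A new or changed proposal must be confirmed again.
        self.confirmed.discard(name)

    def confirm(self, name):
        if name not in self.decisions:
            raise KeyError(f"no such decision: {name}")
        self.confirmed.add(name)

    def pending(self):
        return sorted(set(self.decisions) - self.confirmed)


def generate_pipeline(checkpoint):
    """Return the confirmed decisions, standing in for real code generation."""
    pending = checkpoint.pending()
    if pending:
        raise UnconfirmedDecisionError(
            "confirm before generating: " + ", ".join(pending)
        )
    return dict(checkpoint.decisions)


# Hypothetical decisions, used only to exercise the sketch.
checkpoint = Checkpoint()
checkpoint.propose("task", "regression")
checkpoint.propose("target_column", "score")
checkpoint.confirm("task")
still_pending = checkpoint.pending()  # "target_column" is not yet confirmed
```

Keeping the gate in one function means there is a single place where generation can be blocked, which makes the confirmation rule easy to audit.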
What we learned
We learned that many challenges in applying ML to research are not about algorithms, but about workflow structure and validation. LLMs can generate code quickly, but without a structured agent framework, the results are often hard to trust. We also learned how important observability and state management are when building multi-step AI agents.
What's next for Metric Mode
Next, we would like to expand Metric Mode to support more data modalities and more advanced modelling tasks. We also aim to integrate richer visualizations of the generated workflows and provide deeper explanations of model behavior. In the long term, we see Metric Mode evolving into a platform that helps standardize responsible machine learning practices in scientific research.
Built With
- chatgpt
- github
- koog
- kotlin
- lovable
- openai