Inspiration

I currently work in a research lab on campus, and one of the hardest parts of research is knowing what to try next. When working with health data especially, cloud compute is often disallowed outright, or becomes significantly more expensive once it meets data storage and privacy requirements. That makes it cost prohibitive to just throw things at the wall and see what sticks. On top of that, there is so much state-of-the-art research that simply doing a literature review to see what already exists can take weeks, and trying and iterating on other people's work can take even longer.

What it does

Auto Research is an agent that solves this. It is an intelligent research agent that runs 24/7 and collaborates with you on literature review, proposing architectures, and running both small tests and full-scale training runs. The ASUS Ascent GX10 provides enough compute to run both training and LLM inference with Nemotron locally, which also keeps your data private and on-device.

Auto Research helps across the research process end to end. First, it can search arXiv, retrieve relevant papers and preprints, and tabulate the results cleanly so you can compare the tradeoffs and accuracy of modern methods. Second, it acts as a hypothesis engine alongside you: Auto Research attempts to find gaps in the literature, combine methods from the review, and propose end-to-end ideas. You can also propose your own ideas, and Auto Research will scan arXiv again to check whether they have been attempted before and reality-check their likely effectiveness. Third, Auto Research acts as the experiment executor. Once you decide on a specific experiment or hypothesis, it works end to end: writing code, debugging dependencies, and tuning hyperparameters as it goes. It can also analyze validation and test set results and modify the architecture to compensate for how the model is performing, for example if it is overfitting or simply not converging. Lastly, Auto Research helps you compile your results along with the papers that were referenced.
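The writeup doesn't show the literature-review code, but the first step above can be sketched against arXiv's public export API, which returns an Atom feed. Everything here is illustrative: the function names and result fields are assumptions, not Auto Research's actual implementation.

```python
# Hypothetical sketch of the literature-review step: build an arXiv API
# query URL, then parse the Atom feed it returns into rows an agent
# could tabulate for comparison. Uses only the Python standard library.
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace

def build_query_url(terms, max_results=10):
    """Compose an arXiv export API URL for a keyword search."""
    query = urllib.parse.urlencode({
        "search_query": "all:" + " AND all:".join(terms),
        "start": 0,
        "max_results": max_results,
    })
    return "http://export.arxiv.org/api/query?" + query

def parse_feed(atom_xml):
    """Extract (title, summary) rows from an Atom response body."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.findall(ATOM + "entry"):
        rows.append({
            "title": entry.findtext(ATOM + "title", "").strip(),
            "summary": entry.findtext(ATOM + "summary", "").strip(),
        })
    return rows
```

An agent would fetch `build_query_url(...)` with an HTTP client, pass the body to `parse_feed`, and hand the rows to the LLM for tabulation.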

How we built it

Auto Research was built around a centralized Research Coordinator agent that manages the overall research workflow and remembers what has and has not been tried. Domain-specific agents (such as one integrated with arXiv) are then orchestrated to do their portion of the work and edit code. Computationally, the system integrates with both OpenClaw and Ollama. Ollama handles local inference and was chosen for its privacy and ease of use. While we used the NVIDIA Nemotron models in this experiment, any open-source model available on Ollama can be used.
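The coordinator pattern above can be sketched as a class that tracks attempted experiments and dispatches prompts to a local model via Ollama's REST API (a `POST` to `/api/chat` on `localhost:11434`). The class and method names are assumptions for illustration, not the project's actual code.

```python
# Minimal sketch of a Research Coordinator: remembers what has been
# tried, and builds non-streaming chat requests for a local Ollama
# server. Only the request is constructed here; sending it requires a
# running Ollama instance.
import json
import urllib.request

OLLAMA_CHAT = "http://localhost:11434/api/chat"

class ResearchCoordinator:
    def __init__(self, model="nemotron"):
        self.model = model
        self.tried = set()  # experiments already attempted

    def already_tried(self, experiment: str) -> bool:
        return experiment in self.tried

    def record(self, experiment: str) -> None:
        self.tried.add(experiment)

    def build_request(self, prompt: str) -> urllib.request.Request:
        """Build one non-streaming chat turn for the local model."""
        body = json.dumps({
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        }).encode()
        return urllib.request.Request(
            OLLAMA_CHAT, data=body,
            headers={"Content-Type": "application/json"},
        )
```

In use, the coordinator would consult `already_tried` before proposing an experiment, dispatch the request with `urllib.request.urlopen`, and `record` the experiment once it runs.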

Challenges we ran into

One of the primary challenges we ran into was testing the agent. While the OpenClaw agent can easily find datasets on its own, it often cannot download them itself because they are CAPTCHA protected. We solved this by allowing the agent to request human intervention at specific points and by making its iteration loop easier to interrupt. Additionally, the agent often struggled with longer training runs: it could not tell whether a run had actually finished, so it would just keep checking the output logs, expending context for no benefit. We solved this with a combination of subagents and memory, so that the model could anticipate when to check for updates and delegate log checks to subagents rather than clouding its own context window more than necessary.
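The log-polling fix described above can be sketched as a simple scheduling policy: instead of re-reading the full log every turn, the agent estimates when the run should finish and polls rarely while far from that estimate, more often near it, checking only the tail of the log. The names and the exact timing policy here are assumptions for illustration.

```python
# Hedged sketch of anticipatory log polling: wait longer while the run
# is far from its estimated finish, ramp the delay down near the end,
# and decide completion from only the last lines of the log.

def next_check_delay(elapsed, estimated_total, min_delay=30, max_delay=600):
    """Seconds to wait before the next log check (all times in seconds)."""
    remaining = max(estimated_total - elapsed, 0)
    # Poll no slower than max_delay, no faster than min_delay,
    # halving the remaining time as the run approaches its estimate.
    return max(min_delay, min(remaining / 2, max_delay))

def training_done(log_tail: str) -> bool:
    """Cheap completion check against only the tail of the log."""
    return "Training complete" in log_tail or "epochs finished" in log_tail
```

A subagent running this loop keeps the bulky log text out of the coordinator's context, reporting back only a done/not-done signal.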

Accomplishments that we're proud of

Successfully building an agentic research system in 24 hours was very difficult. Relying entirely on local hardware sometimes made inference slow and introduced more bugs. Even so, the system is very capable: it can manage the entire research process from start to finish, with varying levels of independence depending on how much the human needs to be in the loop at each stage. It also lets research move faster in areas where data storage is sensitive, like health, where accelerating research even a little is invaluable.

What we learned

This was the first time I had ever developed with OpenClaw. Coming up with a product that integrated effectively with OpenClaw and took advantage of its tools was a large learning process.

Additionally, the amount of reasoning needed for research was higher than expected. LLMs seem to be a decent way off from doing research fully autonomously: without humans to sanity-check the work, and without a very specialized harness like this project to control what they do and give them explicit abilities, it is very difficult for them to work effectively.

What's next for Auto Research

While Auto Research in its current form focuses on local compute, adding a cloud compute arm would be incredibly useful. The agent could check different cloud providers for pricing, spin up Kubernetes pods, monitor their cost and performance, and tear them down when training fails, all while managing cost and choosing experiments. That would be invaluable for researchers who need more powerful cloud compute.
