> “The agent acted like a hyperparameter optimization algorithm with some basic reasoning baked in.”
Good lens.
The crux of the auto research repo is basically one file, program.md, which is a system prompt that can be summarized as “do this in a loop: improve train.py, run the training, run the evals, record the result. Favor simplicity.” The other files are an arbitrary ML model being trained.
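A minimal sketch of that loop, assuming hypothetical stand-ins (`propose_edit`, `run_train`, `run_evals`) for what the agent actually does at each step:

```python
def research_loop(propose_edit, run_train, run_evals, iterations=3):
    """Sketch of the program.md loop: improve -> train -> eval -> record.
    All three callables are hypothetical stand-ins, not the repo's real tooling."""
    history = []
    best = None
    for i in range(iterations):
        edit = propose_edit(history)      # LLM proposes a change to train.py
        run_train(edit)                   # run the training with that change
        score = run_evals(edit)           # run evals to get a metric
        history.append({"iteration": i, "edit": edit, "score": score})
        if best is None or score > best["score"]:
            best = history[-1]            # record the best result so far
    return best, history
```

"Favor simplicity" is the part that lives in the prompt, not the loop: the driver itself is just propose, run, score, record.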
_pdp_today at 7:53 PM
Take some working code. Ask an LLM to fix bugs. Measure performance and test coverage. Feed the results back into the LLM. Repeat.
This has been the standard approach for more complex LLM deployments for a while now in our shop.
Using different models across iterations is also something I've found useful in my own experiments. It's like getting a fresh pair of eyes.
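That fix-measure-feed-back loop, including the model rotation, could be sketched roughly like this (the function names and signatures are hypothetical stand-ins, not any particular shop's tooling):

```python
import itertools

def feedback_loop(models, fix_bugs, measure, code, rounds=4):
    """Sketch: ask an LLM to fix bugs, measure performance/coverage,
    feed the results back, repeat -- rotating models each round for
    a 'fresh pair of eyes'. All callables are hypothetical stand-ins."""
    model_cycle = itertools.cycle(models)
    feedback = None
    for _ in range(rounds):
        model = next(model_cycle)              # different model each iteration
        code = fix_bugs(model, code, feedback) # LLM proposes a fixed version
        feedback = measure(code)               # perf + test-coverage results
    return code, feedback
```

The cycle means no single model's blind spots get baked into every iteration.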
datsci_est_2015today at 7:15 PM
I often use LLMs to explore prior art and maybe find some alternative ways of thinking of problems. About 90% of what it tells me is useless or inapplicable to my domain due to a technicality it could not have known, but the other 10% is nice and has helped me learn some great new things.
I can’t imagine letting an agent try everything the LLM chatbot recommends ($$$). The recommendations often include poorly maintained, niche libraries that have a lot of content written about them but, I can only imagine, very limited use in real production environments.
On the other hand, we have domain expert “consultants” in our leadership’s ears making equally absurd recommendations that we constantly have to disprove. Maybe an agent can occupy those consultants and let us do our work in peace.
jpcompartirtoday at 7:35 PM
There are better techniques for hyper-parameter optimisation, right? I fear I have missed something important: why has Autoresearch blown up so much?
The bottleneck in AI/ML/DL is always data (volume & quality) or compute.
Does/can Autoresearch help improve large-scale datasets?
Is it more compute efficient than humans?
1970-01-01today at 8:18 PM
> The original paper used several medical X-ray datasets which I don’t have access to anymore, so I needed a new dataset with spatial annotations to test the expert attention mechanism. I picked the Ukiyo-eVG dataset: ~11K Japanese woodblock prints
So... It did work. It found bugs (that he didn't know about) and it did optimization (that he hadn't done).
dvttoday at 7:25 PM
Ok, so looking at the commit log[1], I was mostly interested in seeing what the "moonshot ideas" implementations looked like, but basically everything is just hyperparameter tuning. Which is nice, but likely not worth the $$$ spent on the tokens. Am I missing something here?
I've tried something similar with a small project of mine and had very similar results overall.
lucasaytoday at 8:05 PM
This feels less like automated research and more like structured trial and error with a decent feedback loop. Still useful, but I think the real bottleneck is how good your eval metric is. If that’s weak, the whole loop just optimizes for the wrong thing faster.
lamrogertoday at 7:15 PM
Awesome breakdown! It really feels like a hyper-hyper parameter search + bug fixer.
I started looking at Kaggle again and autoresearch seems to converge to many of the solution vibes there.
Wild ensembles, squeezing a bit of loss out. More engineering than research IMO
BrokenCogstoday at 7:19 PM
Does autoresearch work for projects that are not LLM-based? E.g. in Karpathy's example he is optimizing nanoGPT. What if I wanted to improve a U-Net for image segmentation?