Hugging Face Open-Sources "ml-intern": An AI Agent That Replicates ML Research Workflows
Hugging Face open-sourced 'ml-intern', an AI agent that replicates ML research workflows. The agent completed a post-training internship task by implementing DeepMind research on test-time compute scaling, improving accuracy from 45% to 65% (+20pp) with a last-step PRM prediction scoring strategy that outperformed greedy, majority-vote, and standard Best-of-N baselines. Released artifacts include a model trained on Math500, a weighted-results dataset, a Docker Space running on a T4 GPU, and the take-home test template on GitHub.
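The last-step PRM selection strategy described above can be sketched in a few lines. Everything here is illustrative: the function names, candidate answers, and step scores are hypothetical placeholders for samples from a policy model scored by a process reward model (PRM).

```python
def last_step_prm_score(step_scores):
    """Score a candidate solution by its final PRM step prediction.

    step_scores: hypothetical per-step correctness probabilities
    produced by a process reward model.
    """
    return step_scores[-1]

def best_of_n(candidates):
    """Best-of-N selection: keep the candidate whose last PRM step
    score is highest.

    candidates: list of (final_answer, step_scores) pairs.
    """
    return max(candidates, key=lambda c: last_step_prm_score(c[1]))[0]

# Hypothetical sampled solutions for one Math500 problem:
candidates = [
    ("42", [0.9, 0.7, 0.4]),   # promising start, weak final step
    ("17", [0.6, 0.8, 0.95]),  # strongest final step -> selected
    ("42", [0.5, 0.5, 0.6]),
]
print(best_of_n(candidates))  # -> 17
```

Scoring only the final step is what distinguishes this strategy from aggregations that average or take the product of all step scores.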
Integration Strategy
When to Use This?
- Research acceleration: Automating baseline replication for new papers
- Benchmark evaluation: Systematic comparison of inference strategies
- Post-training research: Iterating on reward model and scoring strategies
- Reproducibility pipelines: Standardized research workflow automation
The take-home test template (github.com/huggingface/post-training-takehome) provides a reusable framework for evaluating agent capabilities on ML research tasks.
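The baseline strategies compared in the report are straightforward to prototype. A hedged sketch: `majority_vote` and `weighted_vote` are hypothetical names, and summing verifier scores per distinct answer is one common reading of the "weighted" aggregation; consult the full technical report for the exact method used.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Pick the most frequent final answer across sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def weighted_vote(answers, scores):
    """Sum a verifier score per distinct answer; pick the highest total.

    scores are hypothetical PRM-derived values in [0, 1].
    """
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

answers = ["42", "17", "42", "17", "42"]
scores = [0.2, 0.9, 0.3, 0.8, 0.1]
print(majority_vote(answers))          # -> 42 (3 of 5 votes)
print(weighted_vote(answers, scores))  # -> 17 (1.7 vs 0.6)
```

The example shows why verifier weighting can beat raw counting: the majority answer wins on frequency, while the weighted strategy favors the answer backed by higher-confidence samples.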
How to Integrate?
Available Artifacts:
| Artifact | Link | Purpose |
|---|---|---|
| Documentation | huggingface.co/blog/cmpatino/ml-intern-takehome | Full technical report |
| Trained Model | huggingface.co/cmpatino/math500-bon-exercise | PRM model on Math500 |
| Results Dataset | huggingface.co/datasets/cmpatino/math500-bon-weighted-results | Experimental data |
| Docker Space | Hugging Face Spaces (T4 GPU) | Live demo deployment |
| Test Template | github.com/huggingface/post-training-takehome | Evaluation framework |
Deployment: The Docker Space runs on T4 GPU hardware, so you can experiment with inference without a local GPU.
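For context on how such a Space is packaged, a minimal Docker Space configuration looks roughly like the sketch below. The `app.py` entry point and `requirements.txt` are hypothetical placeholders; GPU hardware such as the T4 is selected in the Space settings, not in the Dockerfile.

```dockerfile
# Minimal sketch of a Hugging Face Docker Space (hypothetical app).
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Docker Spaces route traffic to the port declared as app_port
# in the Space's README metadata (7860 by default).
EXPOSE 7860
CMD ["python", "app.py"]
```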
Compatibility
- Framework: Hugging Face ecosystem (Transformers, PEFT)
- Evaluation: Math500 benchmark
- Infrastructure: Docker containers, Hugging Face Spaces
- Integration points: Model Hub, Datasets Hub, Spaces deployment
Source
Source: @huggingface
Reference: ml-intern Full Documentation
Reference: Math500 Trained Model
Reference: Weighted Results Dataset
Reference: Post-Training Takehome Test
Published: November 2024
DevRadar Analysis Date: 2026-04-23
Tags: #OpenSource, #LLM, #Inference, #Research, #Agentic