Hugging Face Launches Dedicated PyTorch MPS Optimization Team Targeting 100x Performance Gains
Hugging Face is forming a specialized team to optimize PyTorch's Metal Performance Shaders (MPS) backend, targeting up to 100x performance improvements for core operations. First deliverables include torch.sort and torch.multinomial reimplemented as native MPS shaders, with flex attention support in development, plus 5x faster safetensors loading on Apple Silicon. The team is actively soliciting community input on which operations to prioritize next.
Integration Strategy
When to Use This?
Ideal Use Cases:
- Local LLM inference on MacBook Pro/Mac Studio (7B-13B parameter models)
- Interactive ML development on Apple Silicon without cloud dependency
- Privacy-sensitive inference workloads requiring on-device processing
- Researchers prototyping on personal hardware
Industry Fit:
- Academic/research institutions with Apple hardware mandates
- Mobile app developers doing on-device ML
- Privacy-first enterprises (healthcare, legal, finance)
How to Integrate?
Current Status:
```python
import torch

# Verify MPS availability (requires Apple Silicon and macOS 12.3+)
print(torch.backends.mps.is_available())  # Must be True

# Current usage pattern
device = torch.device("mps")
model = model.to(device)
```
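In practice the device choice should degrade gracefully when MPS is unavailable. A minimal sketch of that selection pattern (the `Linear` layer here is a stand-in for a real model):

```python
import torch

# Prefer MPS on Apple Silicon; fall back to CUDA, then CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(16, 4).to(device)   # stand-in for a real model
x = torch.randn(2, 16, device=device)
y = model(x)
print(y.shape)  # torch.Size([2, 4])
```

The same script then runs unchanged on a Mac, a CUDA box, or a CPU-only CI runner.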
Migration Path: For developers currently on CPU, or falling back to CPU for specific operations:
- Monitor Hugging Face announcements for operation coverage
- Test workloads on the `torch.mps` device after shader updates
- Fall back to CUDA/CPU only where MPS coverage is incomplete
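While shader coverage is still incomplete, PyTorch's documented `PYTORCH_ENABLE_MPS_FALLBACK` environment variable routes unsupported operations to the CPU with a warning instead of raising an error. A minimal sketch (the variable must be set before torch is imported):

```python
import os

# Must be set before torch is imported; unsupported MPS ops then
# fall back to CPU with a warning instead of raising NotImplementedError.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# torch.multinomial is one of the ops being reimplemented as an MPS shader;
# with the fallback enabled it works either way.
probs = torch.tensor([0.1, 0.2, 0.7], device=device)
sample = torch.multinomial(probs, num_samples=1)
print(sample.item())  # an index in 0..2
```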
Framework Integration:
- Transformers: `model.to("mps")` works today
- Diffusers: MPS support in progress for many pipelines
- Safetensors: already benefits from the 5x loading improvements
Compatibility
| Component | Status |
|---|---|
| PyTorch Version | 2.5+ required for flex attention |
| Apple Silicon | M1/M2/M3/M4 (Metal 3+) |
| Rosetta 2 | Not supported; must be native ARM64 |
| CUDA fallback | Automatic when MPS unavailable |
Source: @huggingface | Published: 2025 | DevRadar Analysis Date: 2026-04-24
Analysis Note: Primary source is a tweet/RT. Specific benchmark methodology, operation coverage roadmap, and timeline details are not publicly disclosed. Inferred details are marked as such.