Hugging Face Launches Dedicated PyTorch MPS Optimization Team Targeting 100x Performance Gains
Hugging Face is forming a specialized team to optimize PyTorch's Metal Performance Shaders (MPS) backend, targeting up to 100x performance improvements for core operations. First deliverables include torch.sort and torch.multinomial reimplemented as native MPS shaders, with flex attention support in development, plus 5x faster safetensors loading on Apple Silicon. The team is actively soliciting community input on which operations to prioritize next.
Integration Strategy
When to Use This?
Ideal Use Cases:
- Local LLM inference on MacBook Pro/Mac Studio (7B-13B parameter models)
- Interactive ML development on Apple Silicon without cloud dependency
- Privacy-sensitive inference workloads requiring on-device processing
- Researchers prototyping on personal hardware
Industry Fit:
- Academic/research institutions with Apple hardware mandates
- Mobile app developers doing on-device ML
- Privacy-first enterprises (healthcare, legal, finance)
How to Integrate?
Current Status:
```python
import torch

# Verify MPS availability (requires Apple Silicon and macOS 12.3+)
print(torch.backends.mps.is_available())  # Must be True

# Current usage pattern
device = torch.device("mps")
model = model.to(device)
```
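In practice the device choice should degrade gracefully when MPS is unavailable. A minimal sketch of that selection pattern (the `Linear` layer here is a stand-in for a real model):

```python
import torch

# Prefer MPS on Apple Silicon; fall back to CUDA, then CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(16, 4).to(device)   # stand-in for a real model
x = torch.randn(2, 16, device=device)
y = model(x)
print(y.shape)  # torch.Size([2, 4])
```

The same script then runs unchanged on a Mac, a CUDA box, or a CPU-only CI runner.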
Migration Path: For developers currently on CPU, or falling back to CPU for specific operations:
- Monitor Hugging Face announcements for operation coverage
- Test workloads on the `torch.mps` device after shader updates
- Fall back to CUDA/CPU only where MPS coverage is incomplete
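While shader coverage is still incomplete, PyTorch's documented `PYTORCH_ENABLE_MPS_FALLBACK` environment variable routes unsupported operations to the CPU with a warning instead of raising an error. A minimal sketch (the variable must be set before torch is imported):

```python
import os

# Must be set before torch is imported; unsupported MPS ops then
# fall back to CPU with a warning instead of raising NotImplementedError.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# torch.multinomial is one of the ops being reimplemented as an MPS shader;
# with the fallback enabled it works either way.
probs = torch.tensor([0.1, 0.2, 0.7], device=device)
sample = torch.multinomial(probs, num_samples=1)
print(sample.item())  # an index in 0..2
```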
Framework Integration:
- Transformers: `model.to("mps")` works today
- Diffusers: MPS support in progress for many pipelines
- Safetensors: already benefits from the 5x loading improvements
Compatibility
| Component | Status |
|---|---|
| PyTorch Version | 2.5+ required for flex attention |
| Apple Silicon | M1/M2/M3/M4 (Metal 3+) |
| Rosetta 2 | Not supported; must be native ARM64 |
| CUDA fallback | Automatic when MPS unavailable |
Source: @huggingface | Published: 2025 | DevRadar Analysis Date: 2026-04-24
Analysis Note: Primary source is a tweet/RT. Specific benchmark methodology, operation coverage roadmap, and timeline details are not publicly disclosed. Inferred details are marked as such.