
Hugging Face Launches Dedicated PyTorch MPS Optimization Team Targeting 100x Performance Gains

Hugging Face is forming a dedicated team for PyTorch Metal Performance Shaders (MPS) optimization. The initial focus is the torch.mps backend, targeting significant performance improvements. First deliverables include torch.sort and torch.multinomial reimplemented as MPS shaders, with flex attention support in development. The team has also achieved 5x faster safetensors loading on MPS and is soliciting community input on priority operations for future optimization work.

Lysandre · Friday, April 24, 2026 · Original source


Summary

Hugging Face is forming a specialized team to optimize PyTorch's Metal Performance Shaders (MPS) backend, with initial targets of 100x performance improvement for core operations. Early wins include reimplementing torch.sort and torch.multinomial as native MPS shaders, plus 5x faster safetensors loading on Apple Silicon. The team is actively soliciting community input on priority operations.

Integration Strategy

When to Use This?

Ideal Use Cases:

  • Local LLM inference on MacBook Pro/Mac Studio (7B-13B parameter models)
  • Interactive ML development on Apple Silicon without cloud dependency
  • Privacy-sensitive inference workloads requiring on-device processing
  • Researchers prototyping on personal hardware

Industry Fit:

  • Academic/research institutions with Apple hardware mandates
  • Mobile app developers doing on-device ML
  • Privacy-first enterprises (healthcare, legal, finance)

How to Integrate?

Current Status:

# Verify MPS availability
import torch
print(torch.backends.mps.is_available())  # True only on Apple Silicon with a Metal-capable PyTorch build

# Current usage pattern, with a CPU fallback when MPS is absent
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device)
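As a rough sketch of how the shader-backed operations named in the announcement (torch.sort, torch.multinomial) could be timed locally, the helper below measures wall-clock time per call; the `time_op` helper and all sizes are illustrative, not from the announcement, and no specific speedup is implied:

```python
import time
import torch

def time_op(fn, device, warmup=3, iters=10):
    """Crude wall-clock timing; synchronizes MPS so queued Metal work is included."""
    for _ in range(warmup):
        fn()
    if device == "mps":
        torch.mps.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "mps":
        torch.mps.synchronize()
    return (time.perf_counter() - start) / iters

device = "mps" if torch.backends.mps.is_available() else "cpu"
x = torch.rand(1_000_000, device=device)
probs = torch.rand(10_000, device=device)

# torch.sort and torch.multinomial are among the first ops reimplemented as MPS shaders
print(f"sort:        {time_op(lambda: torch.sort(x), device):.6f} s/iter")
print(f"multinomial: {time_op(lambda: torch.multinomial(probs, 100), device):.6f} s/iter")
```

Running the same script before and after a PyTorch upgrade gives a quick read on whether a given op has picked up a native MPS shader.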

Migration Path: For developers currently using CPU or falling back to CPU for specific operations:

  1. Monitor Hugging Face announcements for operation coverage
  2. Test workloads with torch.mps device after shader updates
  3. Fall back to CPU (or to CUDA on non-Apple hardware) only where MPS coverage is incomplete
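Step 3 can be handled per operation rather than per workload: PyTorch's `PYTORCH_ENABLE_MPS_FALLBACK` environment variable routes individual ops without an MPS kernel to the CPU instead of raising an error. A minimal sketch:

```python
# Opt in to per-op CPU fallback for MPS ops that are not yet implemented.
# The variable must be set before torch initializes the MPS backend.
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.rand(8, 8, device=device)
# Ops with native MPS shaders (e.g. torch.sort) run on the GPU;
# uncovered ops silently fall back to CPU instead of erroring out.
values, indices = torch.sort(x, dim=-1)
print(values.device)
```

The fallback trades speed for coverage, so it is best treated as a bridge until the team's shader work lands for the ops a workload needs.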

Framework Integration:

  • Transformers library: model.to("mps") works today
  • Diffusers: MPS support in progress for many pipelines
  • Safetensors: Already benefits from 5x loading improvements

Compatibility

| Component | Status |
| --- | --- |
| PyTorch version | 2.5+ for flex attention; earlier 2.x for the base MPS backend |
| Apple Silicon | M1/M2/M3/M4 (Metal 3+) |
| Rosetta 2 | Not supported; PyTorch must run as native ARM64 |
| CPU fallback | Per-op via PYTORCH_ENABLE_MPS_FALLBACK=1 when an MPS kernel is missing |
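The compatibility matrix above can be checked in a few lines; this sketch only inspects the local environment and prints what it finds:

```python
import platform
import torch

# Quick environment check mirroring the compatibility matrix:
# Rosetta 2 is unsupported, so platform.machine() must report "arm64" on macOS.
print("torch version:", torch.__version__)
print("native arm64: ", platform.machine() == "arm64")
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())
```

`is_built()` reports whether this PyTorch binary was compiled with MPS support, while `is_available()` additionally requires a Metal-capable device at runtime, so the two can differ on Intel Macs or in x86 Python environments.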

Source: @huggingface · Published: 2025 · DevRadar Analysis Date: 2026-04-24

Analysis Note: Primary source is a tweet/RT. Specific benchmark methodology, operation coverage roadmap, and timeline details are not publicly disclosed. Inferred details are marked as such.