DevRadar
🤗 Hugging Face · Significant

IBM Granite Releases ModernBERT-Based Multilingual Embedding Models: 97M and 311M Parameters

IBM Granite released two multilingual embedding models based on the ModernBERT architecture: a 97M parameter variant and a 311M parameter variant. The models support 200+ languages with a 32K context window and are designed for retrieval, search, similarity, and code-related tasks. Day-zero support in Text Embeddings Inference (TEI) indicates these models are optimized for efficient inference deployment.

Alvaro Bartolome · Wednesday, April 29, 2026 · Original source

Summary

IBM has released two multilingual embedding models based on the ModernBERT architecture—a 97M parameter base variant and a 311M parameter large variant—supporting 200+ languages with a 32K token context window, optimized for retrieval, search, similarity, and code tasks. Both models ship with day-zero Text Embeddings Inference (TEI) support for production-ready deployment.

Integration Strategy

When to Use This?

Strong Fit Scenarios:

  • Multilingual RAG systems requiring broad language coverage
  • Code search and retrieval within large repositories
  • Enterprise knowledge bases spanning multiple languages
  • Semantic search requiring longer context windows
  • Cost-sensitive deployments where smaller models suffice

Consider Alternatives If:

  • Operating in a single language with existing best-in-class options (e.g., E5, BGE variants)
  • Requiring state-of-the-art English-only performance
  • Needing explicit training data or fine-tuning control
  • Operating under strict on-premise constraints requiring full model auditing

How to Integrate?

Via Hugging Face Hub: The models are available through Hugging Face's model repository, enabling standard HF ecosystem integration:

from transformers import AutoModel, AutoTokenizer

# 97M variant
model_97m = AutoModel.from_pretrained("ibm-granite/granite-embedding-97m-multilingual")
tokenizer_97m = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-97m-multilingual")

# 311M variant
model_311m = AutoModel.from_pretrained("ibm-granite/granite-embedding-311m-multilingual")
tokenizer_311m = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-311m-multilingual")

Via Text Embeddings Inference (Recommended for Production): TEI provides optimized inference with automatic batching and quantization support:

# Pull and run via TEI (for GPU inference with the CUDA image, add --gpus all and a -v volume mount to cache model weights)
model=ibm-granite/granite-embedding-311m-multilingual
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest --model-id $model
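
Once the container is running, embeddings are served over HTTP via TEI's /embed route. A minimal client sketch in Python, assuming the container started above is reachable on localhost:8080 (the example texts are placeholders):

import requests

# Assumes the TEI container from the command above is listening on localhost:8080
TEI_URL = "http://localhost:8080/embed"

# TEI accepts a single string or a list of strings under "inputs"
payload = {"inputs": ["¿Cómo funciona la búsqueda semántica?", "How does semantic search work?"]}

response = requests.post(TEI_URL, json=payload, timeout=30)
response.raise_for_status()

embeddings = response.json()  # one embedding vector (list of floats) per input
print(len(embeddings), len(embeddings[0]))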

API Complexity: Low. Standard embedding generation workflow: tokenize, encode, pool, and normalize (a sketch follows below).
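
A minimal sketch of that workflow with the transformers snippet above; CLS-token pooling is an assumption here, since the recommended pooling for these models is not confirmed in the available sources (check the model card and switch to mean pooling if advised):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-granite/granite-embedding-311m-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

texts = ["What is a multilingual embedding model?", "Qu'est-ce qu'un modèle d'embedding multilingue ?"]

# Tokenize; truncation is kept on as a safeguard even with the 32K context window
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Encode without gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pool: CLS-token pooling assumed; replace with mean pooling if the model card recommends it
embeddings = outputs.last_hidden_state[:, 0]

# Normalize so that dot products equal cosine similarities
embeddings = F.normalize(embeddings, dim=1)
print(embeddings.shape)  # (batch_size, hidden_size)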

Migration Path: For teams currently using other embedding models, the primary changes involve updating model IDs and potentially adjusting embedding dimensions (specific dimensions are not confirmed in available sources; one way to check is sketched below).
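
Because the output dimension is not confirmed, one low-risk check before re-embedding a corpus or resizing a vector index is to read it from the model config. A sketch, assuming hidden_size matches the embedding width (the usual case for encoder-only models, but verify against the model card):

from transformers import AutoConfig

# hidden_size is typically the embedding dimension for encoder-only models;
# confirm against the model card before resizing vector store schemas.
config = AutoConfig.from_pretrained("ibm-granite/granite-embedding-97m-multilingual")
print(config.hidden_size)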

Compatibility

Component     Status
PyTorch       ✓ Standard support
ONNX          Expected (TEI uses Safetensors)
vLLM          Likely (encoder-only compatibility varies)
TEI           ✓ Day-zero support confirmed
LangChain     ✓ Via Hugging Face embeddings integration
LlamaIndex    ✓ Via Hugging Face embeddings integration
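
For the LangChain and LlamaIndex rows, integration goes through their standard Hugging Face embedding wrappers. A minimal LangChain sketch, assuming the langchain-huggingface and sentence-transformers packages are installed (HuggingFaceEmbeddings loads the model locally via sentence-transformers):

from langchain_huggingface import HuggingFaceEmbeddings

# Loads the model locally; normalizing makes dot product equivalent to cosine similarity
embeddings = HuggingFaceEmbeddings(
    model_name="ibm-granite/granite-embedding-97m-multilingual",
    encode_kwargs={"normalize_embeddings": True},
)

query_vector = embeddings.embed_query("semantic search over multilingual docs")
doc_vectors = embeddings.embed_documents(["primer documento", "second document"])
print(len(query_vector), len(doc_vectors))

The LlamaIndex path is analogous via the HuggingFaceEmbedding class in the llama-index-embeddings-huggingface package.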

Source: @huggingface
Reference: IBM Granite Model Release (Hugging Face Hub)
Published: Recent (2025)
DevRadar Analysis Date: 2026-04-29