IBM Granite Releases ModernBERT-Based Multilingual Embedding Models: 97M and 311M Parameters
IBM Granite released two multilingual embedding models based on the ModernBERT architecture: a 97M parameter variant and a 311M parameter variant. The models support 200+ languages with a 32K token context window and are designed for retrieval, search, similarity, and code-related tasks. Day-zero support on Text Embeddings Inference (TEI) indicates these models are optimized for efficient inference deployment.
Integration Strategy
When to Use This?
Strong Fit Scenarios:
- Multilingual RAG systems requiring broad language coverage
- Code search and retrieval within large repositories
- Enterprise knowledge bases spanning multiple languages
- Semantic search requiring longer context windows
- Cost-sensitive deployments where smaller models suffice
Consider Alternatives If:
- Operating in a single language with existing best-in-class options (e.g., E5, BGE variants)
- Requiring state-of-the-art English-only performance
- Needing explicit training data or fine-tuning control
- Operating under strict on-premise constraints requiring full model auditing
How to Integrate?
Via Hugging Face Hub: The models are available through Hugging Face's model repository, enabling standard HF ecosystem integration:
```python
from transformers import AutoModel, AutoTokenizer

# 97M variant
model_97m = AutoModel.from_pretrained("ibm-granite/granite-embedding-97m-multilingual")
tokenizer_97m = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-97m-multilingual")

# 311M variant
model_311m = AutoModel.from_pretrained("ibm-granite/granite-embedding-311m-multilingual")
tokenizer_311m = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-311m-multilingual")
```
Via Text Embeddings Inference (Recommended for Production): TEI provides optimized inference with automatic batching and quantization support:
```shell
# Pull and run via TEI
model=ibm-granite/granite-embedding-311m-multilingual
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest --model-id $model
```
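Once the container is up, embeddings can be requested over HTTP. TEI exposes an `/embed` endpoint that accepts a JSON body; a minimal client call, assuming the port mapping from the docker command above (the example sentences are illustrative):

```shell
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": ["What is Granite?", "Que es Granite?"]}'
```

The response is a JSON array of embedding vectors, one per input string, which can be indexed directly into a vector store.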
API Complexity: Low. Standard embedding generation workflow—tokenize, encode, pool/normalize.
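The pool/normalize steps can be sketched in NumPy. This is a minimal illustration of mean pooling over non-padding tokens followed by L2 normalization, a common default for embedding models; whether Granite uses mean or CLS pooling should be verified against the model card, and the toy arrays below are purely illustrative:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden) array from the encoder.
    attention_mask:   (batch, seq_len) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy input: batch of 1, seq_len 3 (last position is padding), hidden size 2.
emb = np.array([[[1.0, 0.0], [3.0, 0.0], [99.0, 99.0]]])
mask = np.array([[1, 1, 0]])
pooled = l2_normalize(mean_pool(emb, mask))  # padding position is excluded
```

Normalizing up front lets downstream retrieval rank candidates with a plain dot product instead of recomputing cosine similarity per query.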
Migration Path: For teams currently using other embedding models, the primary changes involve model ID updates and potential embedding dimension adjustments (specific dimensions not confirmed in available sources).
Compatibility
| Component | Status |
|---|---|
| PyTorch | ✓ Standard support |
| ONNX | Expected (unconfirmed; TEI serves Safetensors weights directly) |
| vLLM | Likely (encoder-only compatibility varies) |
| TEI | ✓ Day-zero support confirmed |
| LangChain | ✓ Via Hugging Face embeddings integration |
| LlamaIndex | ✓ Via Hugging Face embeddings integration |
Source: @huggingface
Reference: IBM Granite Model Release (Hugging Face Hub)
Published: Recent (2025)
DevRadar Analysis Date: 2026-04-29