Google Cloud Eighth-Generation TPUs: Inference and Reasoning Variants, Gemini Embedding 2 GA, and Decoupled DiLoCo
Google Cloud announced several technically substantive updates: (1) Eighth-generation TPUs now include TPUt (inference-optimized) and TPUi (reasoning-optimized) variants; (2) Gemini Embedding 2, a natively multimodal embedding model, reached general availability via Gemini API and Gemini Enterprise Agent Platform; (3) Stitch by Google open-sourced the DESIGN.md draft specification for cross-platform use; (4) Google DeepMind introduced Decoupled DiLoCo, a distributed training method for coordinating AI model training across multiple data centers with improved resilience and flexibility.
Integration Strategy
When to Use This?
TPUt (Inference-Optimized):
- High-volume production inference workloads
- Real-time serving with strict latency SLAs
- Batch embedding generation at scale
- Cost-optimized serving for established models
TPUi (Reasoning-Optimized):
- Complex agentic workflows requiring multi-step reasoning
- Chain-of-thought applications
- Mathematical and logical problem-solving
- Research and exploration tasks where depth matters more than throughput
Gemini Embedding 2:
- Cross-modal retrieval systems (finding images from text queries, etc.)
- RAG systems requiring semantic understanding across modalities
- Recommendation systems incorporating multiple content types
- Unified embedding spaces for heterogeneous data warehouses
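A cross-modal retrieval step like the ones above reduces to nearest-neighbor search in a shared embedding space. A minimal sketch of that ranking step, using synthetic placeholder vectors rather than real model output (the actual embeddings would come from the embedding API):

```python
# Toy cross-modal retrieval: rank candidate items by cosine similarity to a
# query embedding. Real vectors would come from a multimodal embedding model;
# these are synthetic placeholders in a small shared space.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

query = rng.normal(size=dim)            # e.g. a text-query embedding
candidates = rng.normal(size=(5, dim))  # e.g. image embeddings

def cosine_sim(a, b):
    # Cosine similarity between one vector and a batch of vectors.
    return (b @ a) / (np.linalg.norm(b, axis=-1) * np.linalg.norm(a))

scores = cosine_sim(query, candidates)
ranking = np.argsort(scores)[::-1]      # best match first
print("ranking:", ranking, "top score:", scores[ranking[0]])
```

The same scoring loop serves text-to-image, image-to-text, or mixed retrieval, since all modalities land in one vector space.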
Decoupled DiLoCo:
- Training runs exceeding single-datacenter capacity
- Privacy-constrained training (data stays regional; only periodic model updates cross site boundaries)
- Disaster recovery for large-scale training runs
- Organizations with distributed GPU/TPU infrastructure
How to Integrate?
TPU Access:
- Google Cloud console for provisioning
- TensorFlow, JAX, and PyTorch support (via PyTorch/XLA)
- Migration from fifth- and sixth-generation TPUs is expected to be straightforward for standard workloads
Gemini Embedding 2 API:
- Gemini API with embedding-specific endpoints
- Gemini Enterprise Agent Platform for integrated agent workflows
- Standard cosine or dot-product similarity for downstream tasks
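The two similarity choices above coincide once embeddings are L2-normalized, so either metric can back the same retrieval index. A quick check with random stand-in vectors (the API's actual response format is not shown here):

```python
# For unit-normalized vectors, dot product equals cosine similarity.
# Random stand-in vectors; real embeddings would come from the embedding API.
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=16), rng.normal(size=16)

# Cosine similarity on the raw vectors.
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Dot product after L2 normalization.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
dot = a_hat @ b_hat

assert np.isclose(cos, dot)  # identical up to floating-point error
print(cos)
```

This is why many vector databases normalize at ingest and use the cheaper dot product at query time.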
Decoupled DiLoCo:
- Google DeepMind research publication provides methodological details
- Implementation requires custom integration (likely open-sourced tooling forthcoming)
- Best suited for organizations with existing distributed training infrastructure
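The published DiLoCo recipe (long stretches of local optimization punctuated by infrequent outer synchronization) can be sketched in a few lines. This toy version uses plain SGD inner steps and an averaged pseudo-gradient with classical momentum on quadratic losses; it illustrates the general pattern only, not Google's implementation, and the hyperparameters are arbitrary:

```python
# Toy DiLoCo-style loop: each "data center" runs local SGD on its own shard,
# then an outer step applies the averaged pseudo-gradient (global - local).
# Quadratic per-site losses stand in for real model training.
import numpy as np

rng = np.random.default_rng(42)
targets = rng.normal(size=(4, 3))        # one optimum per simulated site
w_global = np.zeros(3)
momentum = np.zeros(3)

inner_steps, inner_lr = 20, 0.1          # local work between syncs
outer_lr, beta = 0.7, 0.9                # outer SGD with classical momentum
                                         # (the DiLoCo paper uses Nesterov)

for _ in range(100):                     # outer synchronization rounds
    deltas = []
    for t in targets:                    # each site starts from w_global
        w = w_global.copy()
        for _ in range(inner_steps):
            w -= inner_lr * 2 * (w - t)  # gradient of ||w - t||^2
        deltas.append(w_global - w)      # pseudo-gradient for this site
    pseudo_grad = np.mean(deltas, axis=0)
    momentum = beta * momentum + pseudo_grad
    w_global = w_global - outer_lr * momentum

print(w_global, targets.mean(axis=0))    # should be close after convergence
```

The key property for cross-datacenter use is that only `deltas` cross site boundaries, and only once per outer round, so the inter-site link can be slow or intermittent relative to the inner training loop.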
Compatibility
- Frameworks: TensorFlow, JAX, PyTorch (via XLA)
- Cloud Platform: Google Cloud exclusively (TPU is Google-specific silicon)
- Existing Tooling: Compatible with Vertex AI pipelines, likely supports standard MLOps tooling
- Migration Path: existing fifth- and sixth-generation TPU deployments can move forward without major changes
Source: @GoogleAI
Reference: Google Cloud Next 2025 Announcements
Published: 2025
DevRadar Analysis Date: 2026-04-25