DevRadar
🌐 Google AI · Significant

Google Cloud Eighth-Generation TPUs: Inference and Reasoning Variants, Gemini Embedding 2 GA, and Decoupled DiLoCo

Google Cloud announced several technically substantive updates: (1) Eighth-generation TPUs now include TPUt (inference-optimized) and TPUi (reasoning-optimized) variants; (2) Gemini Embedding 2, a natively multimodal embedding model, reached general availability via Gemini API and Gemini Enterprise Agent Platform; (3) Stitch by Google open-sourced the DESIGN.md draft specification for cross-platform use; (4) Google DeepMind introduced Decoupled DiLoCo, a distributed training method for coordinating AI model training across multiple data centers with improved resilience and flexibility.

Google AI · @GoogleCloud · Saturday, April 25, 2026 · Original source


Summary

Google Cloud announced eighth-generation TPUs split into TPUt (inference-optimized) and TPUi (reasoning-optimized) variants, Gemini Embedding 2 reaching general availability as a natively multimodal embedding model, and Decoupled DiLoCo—a resilient distributed training method for coordinating AI model training across multiple data centers. Stitch by Google also open-sourced the DESIGN.md specification for cross-platform use.

Integration Strategy

When to Use This?

TPUt (Inference-Optimized):

  • High-volume production inference workloads
  • Real-time serving with strict latency SLAs
  • Batch embedding generation at scale
  • Cost-optimized serving for established models

TPUi (Reasoning-Optimized):

  • Complex agentic workflows requiring multi-step reasoning
  • Chain-of-thought applications
  • Mathematical and logical problem-solving
  • Research and exploration tasks where depth matters more than throughput

Gemini Embedding 2:

  • Cross-modal retrieval systems (finding images from text queries, etc.)
  • RAG systems requiring semantic understanding across modalities
  • Recommendation systems incorporating multiple content types
  • Unified embedding spaces for heterogeneous data warehouses

Decoupled DiLoCo:

  • Training runs exceeding single-datacenter capacity
  • Privacy-constrained training (training data stays regional; only model updates are exchanged)
  • Disaster recovery for large-scale training runs
  • Organizations with distributed GPU/TPU infrastructure

How to Integrate?

TPU Access:

  • Google Cloud console for provisioning
  • TensorFlow, JAX, and PyTorch support (via PyTorch/XLA)
  • Migration from v5/v6 TPUs should be straightforward for standard workloads
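
The provisioning step above can be sketched with the gcloud CLI. This is a hedged example: the accelerator type and runtime version strings for eighth-generation TPUt/TPUi slices were not published in the announcement, so the capitalized values below are placeholders to be replaced with real ones from the Cloud TPU documentation.

```shell
# Sketch: creating a TPU VM via the gcloud CLI.
# ACCELERATOR_TYPE and RUNTIME_VERSION are placeholders -- the
# identifiers for TPUt/TPUi slices must come from the Cloud TPU docs.
gcloud compute tpus tpu-vm create my-tpu \
  --zone=ZONE \
  --accelerator-type=ACCELERATOR_TYPE \
  --version=RUNTIME_VERSION
```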

Gemini Embedding 2 API:

  • Gemini API with embedding-specific endpoints
  • Gemini Enterprise Agent Platform for integrated agent workflows
  • Standard cosine similarity or dot-product similarity for downstream tasks
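
The downstream similarity step is framework-independent. A minimal sketch, assuming the embedding API has already returned fixed-length vectors (the vector values and document IDs below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_by_similarity(query_vec, corpus):
    """Return (doc_id, score) pairs sorted by descending cosine similarity."""
    scores = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in corpus.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For normalized embeddings, cosine similarity and dot product give the same ranking, so either works for retrieval; check the model docs for whether returned vectors are pre-normalized.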

Decoupled DiLoCo:

  • Google DeepMind's research publication provides the methodological details
  • Implementation currently requires custom integration (open-source tooling may follow)
  • Best suited for organizations with existing distributed training infrastructure
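
The core DiLoCo idea can be sketched in a few lines: each site runs several local optimizer steps on its own shard, then only parameter deltas are exchanged and averaged in an outer update. This is a simplified, synchronous toy (plain SGD inner steps, plain averaging outer step); the published recipe uses different inner/outer optimizers, and the "Decoupled" variant further relaxes synchronization, so treat this as a sketch of the communication pattern only.

```python
def local_steps(params, grads_fn, lr, steps):
    """Run `steps` of plain SGD locally; return updated params."""
    p = list(params)
    for _ in range(steps):
        g = grads_fn(p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p

def outer_round(global_params, workers, lr=0.1, inner_steps=5):
    """One communication round: local training on each 'data center',
    then averaging of parameter deltas into the global model."""
    deltas = []
    for grads_fn in workers:
        new_p = local_steps(global_params, grads_fn, lr, inner_steps)
        deltas.append([n - g for n, g in zip(new_p, global_params)])
    # Outer update: apply the mean delta (outer learning rate 1.0 here).
    mean_delta = [sum(ds) / len(ds) for ds in zip(*deltas)]
    return [g + d for g, d in zip(global_params, mean_delta)]
```

Because communication happens only once per `inner_steps` local updates, bandwidth between sites drops by roughly that factor, which is what makes cross-datacenter coordination practical.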

Compatibility

  • Frameworks: TensorFlow, JAX, PyTorch (via XLA)
  • Cloud Platform: Google Cloud exclusively (TPU is Google-specific silicon)
  • Existing Tooling: Compatible with Vertex AI pipelines, likely supports standard MLOps tooling
  • Migration Path: Forward-compatible from fifth and sixth-generation TPU deployments

Source: @GoogleAI · Reference: Google Cloud Next 2025 Announcements · Published: 2025 · DevRadar Analysis Date: 2026-04-25