DevRadar
🤗 HuggingFace · Significant

Poolside Laguna XS.2: Open-Weight MoE Model for Agentic Coding

Poolside releases Laguna XS.2, an open-weight Mixture of Experts model with 33B total parameters and 3B active parameters per token. It is designed specifically for agentic coding workflows and long-horizon task completion, was trained entirely in-house on proprietary infrastructure, and uses an architecture optimized for single-GPU deployment. The model is released under the Apache 2.0 license, with weights available on HuggingFace and API access via the poolside platform.

poolside · Tuesday, April 28, 2026 · Original source


Summary

Poolside's Laguna XS.2 is an Apache 2.0-licensed Mixture of Experts model with 33B total parameters and 3B active parameters per token, optimized for single-GPU deployment in agentic coding workflows. The roughly 11:1 ratio of total to active parameters means each token costs about as much compute as a 3B dense model, which is what makes capable coding assistance feasible on modest hardware.
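
A quick back-of-envelope calculation (our own estimate, not a figure from the announcement) shows why the MoE split matters for single-GPU deployment: all 33B weights must be resident in memory regardless of sparsity, so 16-bit weights alone occupy about 66 GB and realistically require quantization to fit a single 24-48 GB card, while per-token compute scales only with the 3B active parameters.

# Back-of-envelope memory and compute estimates (assumptions, not official figures)
TOTAL_PARAMS = 33e9   # every expert must be loaded, even though few fire per token
ACTIVE_PARAMS = 3e9   # parameters actually used per token

for bits, label in [(16, "bf16"), (8, "int8"), (4, "int4")]:
    weight_gb = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{label}: ~{weight_gb:.0f} GB of weights (plus KV cache and activations)")

# Per-token compute is driven by active parameters only:
print(f"active per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%} of parameters")  # ~9%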

Integration Strategy

When to Use This?

Laguna XS.2 is purpose-built for scenarios where:

  • Agentic coding pipelines require models that maintain coherent context across long task sequences
  • Local/private deployment is non-negotiable (healthcare, finance, defense contractors)
  • Cost efficiency matters—sparse activation reduces token-level compute by ~90% versus dense 33B
  • On-premise inference without GPU clusters is a requirement

Less Suitable For:

  • Extremely latency-sensitive real-time autocomplete (where smaller dense models like Phi-3-mini excel)
  • Environments with only CPU inference capability
  • Situations requiring maximum benchmark performance (GPT-4 class models remain superior)

How to Integrate?

Via HuggingFace Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "poolside/Laguna-XS.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place weights automatically; targets a single GPU
)
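
From there, generation follows the standard Transformers pattern. A minimal sketch, assuming the tokenizer ships with a chat template (unconfirmed; adjust the prompt format to whatever the model card specifies):

# Minimal generation example; the chat-template assumption is ours, not from the release.
messages = [{"role": "user", "content": "Write a Python function that parses an ISO 8601 date."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))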

Via Poolside API: Direct API access available at platform.poolside.ai for teams preferring managed inference without deployment overhead.
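
The announcement does not document the API's shape. As a purely hypothetical sketch, assuming the platform exposes an OpenAI-compatible chat completions endpoint (a common convention for managed inference, not confirmed here; the URL, model ID, and payload are placeholders):

import os
import requests

# Hypothetical endpoint and payload; consult poolside's API docs for the real contract.
resp = requests.post(
    "https://platform.poolside.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['POOLSIDE_API_KEY']}"},
    json={
        "model": "laguna-xs.2",
        "messages": [{"role": "user", "content": "Refactor this function to be iterative."}],
    },
    timeout=60,
)
print(resp.json())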

Quantization Path: Given the single-GPU target, 4-bit or 8-bit quantization variants (GPTQ, AWQ, GGUF) will likely emerge from the community for even tighter memory constraints.
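
Until community quants appear, on-the-fly 4-bit loading via bitsandbytes is one option, assuming the architecture is supported by a current Transformers release (a sketch, not tested against this model):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute; shrinks 33B of weights to roughly a quarter
# of their bf16 footprint (~16-17 GB, fitting a single 24 GB card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "poolside/Laguna-XS.2",
    quantization_config=bnb_config,
    device_map="auto",
)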

Compatibility

  • Transformers: Standard AutoModelForCausalLM compatibility expected
  • vLLM: Support likely after community testing and upstream integration (see the serving sketch after this list)
  • llama.cpp: GGUF conversion will enable CPU inference for edge cases
  • PyTorch: Required backend; CUDA or ROCm for GPU inference
  • Licensing: Apache 2.0 eliminates the non-commercial restrictions plaguing some open models
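
If and when vLLM upstreams the architecture, serving should follow vLLM's standard offline-inference pattern; a sketch using its existing Python API (untested against this model):

from vllm import LLM, SamplingParams

# Standard vLLM usage; works only once Laguna XS.2's architecture is upstreamed.
llm = LLM(model="poolside/Laguna-XS.2")
params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Write a unit test for an LRU cache."], params)
print(outputs[0].outputs[0].text)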

Source: Poolside Announcement · Reference: HuggingFace Model Card · Published: 2026-04-28 · DevRadar Analysis Date: 2026-04-28