🤗 HuggingFace · Significant · merve
Gemma 4 MTP Speculative Decoding Delivers Up to 3x Inference Speedup
Gemma 4 now supports speculative decoding with MTP (Multi-Token Prediction) drafters, reaching up to 3x higher tokens/sec throughput while producing identical output. Day-0 support confirmed ac…
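
Why a drafter can raise throughput without changing the output is easiest to see in a toy model. The sketch below is a minimal, generic greedy speculative decoding loop, not Gemma 4's MTP implementation: `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_decode` are hypothetical names made up for illustration, and the "models" are simple arithmetic functions. The drafter proposes `k` tokens, the target checks them, and the first mismatch is replaced by the target's own token, so the final sequence matches plain greedy decoding with the target.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Hypothetical stand-in for the large target model (deterministic)."""
    return (seq[-1] * 3 + len(seq)) % 17


def toy_draft(seq: List[int]) -> int:
    """Hypothetical cheap drafter: agrees with the target except at some positions."""
    guess = toy_target(seq)
    return guess if len(seq) % 7 else (guess + 1) % 17


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of verify passes."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(seq) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. The target verifies the proposal. A real system scores all k
        #    positions in one batched forward pass; this toy counts it as one.
        verify_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
        else:
            accepted.append(target(seq + accepted))  # all accepted: one bonus token
        seq.extend(accepted)
    return seq[:goal], verify_passes


baseline = greedy_decode(toy_target, [1], 20)
spec, passes = speculative_decode(toy_target, toy_draft, [1], 20, k=4)
print(spec == baseline, passes)
```

In this toy, every emitted token is one the target itself would have chosen, which is why the output is unchanged; the gain comes from needing fewer target passes than generated tokens. The actual speedup in a real deployment depends on how often the drafter's proposals are accepted and on the drafter's own cost.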