Kimi AI Open-Sources FlashKDA: CUTLASS-Based Delta Attention Delivers 1.72×–2.22× Prefill Speedup on NVIDIA H20
Kimi.ai open-sources FlashKDA, a CUTLASS-based implementation of Kimi Delta Attention kernels designed for high-performance LLM inference. The implementation delivers a 1.72×–2.22× prefill speedup on NVIDIA H20 hardware co…