🤗 Hugging Face · Significant · Daily Papers
Qwen-Scope: Open Interpretability Toolkit Exposes 81k Learned Features via Sparse Autoencoders
Qwen released Qwen-Scope, an interpretability toolkit using Sparse Autoencoders on Qwen3.5-27B. The toolkit exposes 81k learned features across 64 transformer layers, enabling steerable inference (behavior modification w…
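For readers unfamiliar with the technique, here is a minimal toy sketch of how a sparse autoencoder (SAE) turns an activation vector into feature activations, and how a single feature's decoder direction can be added back to steer behavior. This is not Qwen-Scope's API. The sizes, the random weights, and the names `encode`, `decode`, and `steer` are all assumptions made for illustration, and a real SAE would be trained on model activations rather than randomly initialised.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration only (assumption, not the real model's dimensions).
d_model, n_features = 16, 64

# Hypothetical SAE parameters. A real SAE is trained; these are random stand-ins.
W_enc = rng.normal(size=(n_features, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features))
W_dec /= np.linalg.norm(W_dec, axis=0, keepdims=True)  # unit-norm feature directions
b_dec = np.zeros(d_model)


def encode(x):
    """Map an activation vector to non-negative feature activations (ReLU encoder)."""
    return np.maximum(0.0, W_enc @ (x - b_dec) + b_enc)


def decode(f):
    """Reconstruct an activation vector from feature activations."""
    return W_dec @ f + b_dec


def steer(x, feature_idx, strength):
    """Push an activation along one feature's decoder direction (hypothetical helper)."""
    return x + strength * W_dec[:, feature_idx]


x = rng.normal(size=d_model)       # stand-in for one layer's activation vector
features = encode(x)               # which features fire, and how strongly
x_hat = decode(features)           # SAE reconstruction of the activation
x_steered = steer(x, feature_idx=3, strength=4.0)
```

In this sketch, steering is just vector addition: the activation moves by `strength` along a unit-norm feature direction, and the modified activation would then be passed on to the remaining layers.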