![[Paper Review] Massive Activations in Large Language Models](https://eric-mingjie.github.io/massive-activations/assets/main_teaser_final.png)
[Paper Review] Massive Activations in Large Language Models
Paper link: [arXiv:2402.17762v2](https://arxiv.org/abs/2402.17762v2)

Massive Activations, Hidden Biases: A Reinterpretation of Self-Attention's Secrets

TL;DR: Just 4–10 extreme scalar values …
20 minute read
Tags: 2402.17762v2 · Transformer · SelfAttention · BiasMechanism · RepresentationLearning · Interpretability · NeuralMechanisms · Massive Activations · Explicit Attention Bias
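The TL;DR points at the paper's core observation: a handful of scalar activations whose magnitudes dwarf everything else in the hidden state. Below is a minimal sketch of how one might surface such outliers; the model choice (`gpt2`), the prompt, and the max-to-median ratio diagnostic are illustrative assumptions, not the paper's exact measurement protocol.

```python
# A minimal sketch: look for "massive activations" by comparing the single
# largest |activation| in each layer's hidden state to the median magnitude.
# Assumptions: the model ("gpt2"), the prompt, and the ratio diagnostic are
# illustrative, not the paper's exact procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper also studies larger LLM families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Massive activations are rare but enormous.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, hidden]
for layer, h in enumerate(outputs.hidden_states):
    flat = h.abs().flatten()
    top = flat.max().item()
    median = flat.median().item()
    print(f"layer {layer:2d}: max |act| = {top:9.2f}  "
          f"median |act| = {median:6.3f}  ratio = {top / median:8.1f}")
```

If the paper's finding holds for the model under test, a few layers should show ratios orders of magnitude above the rest, driven by a small, fixed set of feature dimensions.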