![[Paper Review] Massive Activations in Large Language Models](https://eric-mingjie.github.io/massive-activations/assets/main_teaser_final.png)
# [Paper Review] Massive Activations in Large Language Models

Paper Link: [arXiv:2402.17762v2](https://arxiv.org/abs/2402.17762v2)

Massive Activations, Hidden Biases: A Reinterpretation of Self-Attention's Secrets

TL;DR: Just 4–10 extreme scalar values …

20 minute read
Tags: 2402.17762v2, Transformer, SelfAttention, BiasMechanism, RepresentationLearning, Interpretability, NeuralMechanisms, Massive Activations, Explicit Attention Bias