[Paper Review] Peri-LN: Revisiting Normalization Layer in the Transformer Architecture
Paper link: Peri-LayerNorm: A Third Option Beyond Post-LN and Pre-LN
TL;DR: By simply adding another LayerNorm right after the residual … (see the sketch after the tag list below)
arXiv: 2502.02732v3
LayerNorm
Transformer Architecture
Training Stability
Large Language Models
FP16 Training
Empirical Evaluation
Gradient Explosion
Benchmark Evaluation
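
Before the detailed review, here is a minimal sketch (not the authors' code) of what a Peri-LN style block could look like, assuming the extra LayerNorm is applied to each sub-layer's output before it is added back into the residual stream, on top of the usual Pre-LN input normalization. The class name `PeriLNBlock`, the module sizes, and the PyTorch details are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a Peri-LN style block (illustrative, not the paper's code).
# Assumption: each sub-layer gets a LayerNorm on its input (as in Pre-LN) and an
# extra LayerNorm on its output before the residual addition.
import torch
import torch.nn as nn


class PeriLNBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Pre-LN style input normalization ...
        self.ln_in_attn = nn.LayerNorm(d_model)
        self.ln_in_mlp = nn.LayerNorm(d_model)
        # ... plus the "peri" output normalization before each residual add.
        self.ln_out_attn = nn.LayerNorm(d_model)
        self.ln_out_mlp = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln_in_attn(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.ln_out_attn(attn_out)   # normalize the branch output, then add
        x = x + self.ln_out_mlp(self.mlp(self.ln_in_mlp(x)))
        return x


if __name__ == "__main__":
    block = PeriLNBlock()
    y = block(torch.randn(2, 16, 512))       # (batch, seq_len, d_model)
    print(y.shape)
```

Under this reading, Pre-LN leaves each branch's raw output free to grow before it enters the residual stream, while the extra output LayerNorm keeps every contribution at a controlled scale, which is presumably why the tags above mention training stability, gradient explosion, and FP16 training.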