![[paper review] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training](https://cdn-uploads.huggingface.co/production/uploads/66c0a08bac74db25de8427ec/Tb20E3IJSV6PjcD9Nkvfg.png)
# [Paper Review] SageAttention 3 & SageBwd: FP4-Powered Inference and 8-bit Training

Paper link: https://arxiv.org/abs/2505.11594v1
Tags: LowPrecision, FP4, INT8Training, EfficientAttention, SageAttention, TransformerOptimization, BlackwellGPU, Quantization, InferenceAcceleration, TrainingEfficiency, CUDA, Triton