![[Paper Review] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training](https://cdn-uploads.huggingface.co/production/uploads/66c0a08bac74db25de8427ec/Tb20E3IJSV6PjcD9Nkvfg.png)
[Paper Review] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training
Paper link. SageAttention 3 & SageBwd: run on FP4, train in 8-bit. 📝 One-line summary (TL;DR): SageAttention 3 (inference) and SageBwd (training), designed to make 100% use of the FP4 Tensor Cores in Blackwell-generation GPUs …
26 min