DeepSeek-V3 Technical Report
Paper link. One-line summary (TL;DR): DeepSeek-V3 combines an auxiliary-loss-free load-balancing bias, FP8 mixed-precision training, and Multi-Token Prediction in a 671B-parameter MoE LLM, …
35 min
2412.19437v2
MoE
FP8
Open-source LLM
Model Efficiency
DeepSeek
CausalLM
AI Research