DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper link DeepSeek-R1: Reproducing o1-level reasoning with an open RL-only pipeline TL;DR: DeepSeek-R1, with a pipeline of critic-less GRPO RL + a small amount of cold-start SFT + multi-stage RL/SFT + knowledge distillation …
30 min
2501.12948v1
DeepSeek
Large Language Models
Reinforcement Learning
GRPO
Math Reasoning
Knowledge Distillation
Causal LM
Open Source Models
Self-Evolution
SOTA Benchmarking