DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
논문 링크 한 줄 요약 (TL;DR) “끊어 쓰고(Truncate-and-Resume) 탐험하며(내재 보상 MCTS) Lean 4 증명을 찾자” – DeepSeek-Prover-V1.5는 7 B 모델로 miniF2F 63.5 % / *ProofNet …
29 분
2408.08152v1
FormalProof
AutomatedTheoremProving
Lean4
LLMforProof
ProofSearch
WholeProof
MCTS
RMaxTS
IntrinsicReward
TruncateAndResume
MathReasoning
AI4Math
LanguageModel
ReinforcementLearning
DeepSeekProver