DeepSeek

All posts tagged 'DeepSeek'

11 posts in total


DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Subgoal Curriculum + CoT Consistency: How DeepSeek-Prover-V2 Upends Automated Theorem Proving. TL;DR: DeepSeek-Prover-V2 “splits the problem into small pieces, then solves them exactly as split, all the way to the end” …

July 8, 2025

23 min read

2504.21801v1 formal-theorem-proving automated-reasoning math-llm chain-of-thought proof-verification formal-mathematics DeepSeek

Inference-Time Scaling for Generalist Reward Modeling

Inference-Time Scaling: How DeepSeek-GRM Surpasses Far Larger Models. One-line summary (TL;DR): “27B model × 32 samples”: Generative Reward Model (GRM) and k-Vote …

July 8, 2025

22 min read

2504.02495v2 Reward Modeling Generative Reward Model LLM Evaluation Preference Modeling Reinforcement Learning from Human Feedback (RLHF) DeepSeek

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

DeepSeek-V3: What It Means to Run a 405B-Class LLM on 2,048 H800s. TL;DR (one-line summary): Multi-Head Latent Attention (MLA) + FP8 MoE + Dual-Pipe + two-layer MPFT …

July 8, 2025

26 min read

2505.09343v1 Large Language Models Mixture of Experts FP8 Training Transformer Optimization Memory Efficiency Distributed Training Inference Acceleration DeepSeek

Code I/O: Condensing Reasoning Patterns via Code Input-Output Prediction

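CODE I/O: Extending Code Input/Output + Natural-Language CoT to General Reasoning, Lifting 7B-30B LLMs by +2 Points on Average Through Data Design Alone. TL;DR: “Code function → input/output prediction + systematic Chain-of-Thought (CoT)”, a single da …

July 7, 2025

31 min read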

2502.07316v4 DeepSeek LLM Code Reasoning Chain-of-Thought I/O Prediction Execution Feedback Data-Centric AI Instruction Tuning Transformer Long Context

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

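Native Sparse Attention (NSA): 11× Faster Even at 64k Tokens, with Accuracy Unchanged. One-line summary (TL;DR): NSA combines three-branch sparse attention (‘compression → selection → sliding window’) with GQA/MQA-friendly kernels, and 64k con …

July 7, 2025

31 min read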

2502.11089v2 Sparse Attention Long Context Transformer Optimization Efficient LLM GPU Acceleration FlashAttention Memory Efficiency Inference Speedup Trainable Sparsity Triton Kernel Deep Learning Language Models DeepSeek

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Reproducing o1-Level Reasoning with an Open, RL-Only Pipeline. TL;DR: DeepSeek-R1, through a pipeline of critic-less GRPO RL + small-scale cold-start SFT + multi-stage RL/SFT + knowledge distillation …

July 6, 2025

30 min read

2501.12948v1 DeepSeek Large Language Models Reinforcement Learning GRPO Math Reasoning Knowledge Distillation Causal LM Open Source Models Self-Evolution SOTA Benchmarking

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Janus-Pro 7B: Dual-Encoder Multimodal LLM That Outsmarts Bigger Models. One-line summary (TL;DR): After fully decoupling the SigLIP understanding encoder from the VQ generation encoder, 7B …

July 6, 2025

31 min read

DeepSeek 2501.17811v1 Janus-Pro Dual-Encoder Multimodal Learning Vision-Language Models Text-to-Image Image Understanding Large Language Models Adapter Networks Visual Tokenization GenEval MMBench DPG-Bench DeepSeek-LLM Efficient Training Synthetic Data

DeepSeek-V3 Technical Report

One-line summary (TL;DR): DeepSeek-V3 combines an aux-loss-free load-balancing bias, FP8 mixed-precision training, and Multi-Token Prediction in a 671B-parameter MoE LLM, …

July 5, 2025

35 min read

2412.19437v2 MoE FP8 Open-source LLM Model Efficiency DeepSeek CausalLM AI Research

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

DeepSeek-VL2: A Multimodal LLM That Is “Small and Fast, Yet Accurate Even at High Resolution”. One-line summary (TL;DR): Through a design built on three axes, Dynamic Tiling × MLA-MoE × 800B of VL data, 4.5B …

July 5, 2025

31 min read

2412.10302v1 DeepSeek Multimodal Learning Vision-Language Models High-Resolution Image Processing Dynamic Tiling Mixture of Experts (MoE) KV-Cache Compression Multi-head Latent Attention (MLA) Visual Grounding OCR Parameter Efficiency LLM Inference Optimization Edge AI Open Source Models Document Understanding Infographic QA Chart and Table QA Visual Reasoning Multilingual VQA Conversational AI with Images

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

One-line summary (TL;DR): Loss-Free Balancing (LFB) removes the auxiliary loss entirely and, with only a one-line per-expert bias update, the Mixture-of-Experts (MoE) ‘load balance ↔ performance’ dilem …

July 1, 2025

28 min read

2408.15664v1 DeepSeek Mixture-of-Experts Load Balancing Loss-Free Learning DeepSeek-MoE Routing Transformer MaxVio

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

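One-line summary (TL;DR): By automatically determining the model, data, and hyperparameters with a single DeepSeek Scaling Law redefined in terms of FLOPs/token, a 67B-parameter model trained on only 2T tokens, relative to LLaMA-2 70B on code, math, …

June 29, 2025

25 min read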

2401.02954v1 LLM DeepSeek
