DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Paper link: DeepSeek-VL2, a multimodal LLM that is "small and fast, yet accurate even at high resolution." One-line summary (TL;DR): built on three design pillars of Dynamic Tiling × MLA-MoE × 800B VL data, with 4.5B …
31 min
2412.10302v1
DeepSeek
Multimodal Learning
Vision-Language Models
High-Resolution Image Processing
Dynamic Tiling
Mixture of Experts (MoE)
KV-Cache Compression
Multi-head Latent Attention (MLA)
Visual Grounding
OCR
Parameter Efficiency
LLM Inference Optimization
Edge AI
Open Source Models
Document Understanding
Infographic QA
Chart and Table QA
Visual Reasoning
Multilingual VQA
Conversational AI with Images