
📝 One-Line Summary (TL;DR)

DreamCraft3D combines a two-stage 3D generation pipeline with a "secret weapon" called *Bootstrapped Score Distillation (BSD)* to achieve, from just a single 2D image (or a text prompt), both 360° geometric consistency and photo-quality high-resolution texture.


🌟 Key Ideas

  1. Shape-First, Texture-Later

    1. The Geometry Sculpting stage mixes 2D SDS (DeepFloyd IF) with 3D-aware SDS (the view-conditioned Zero-1-to-3 diffusion model) to minimize Janus artifacts and structural distortion.
    2. The Texture Boosting stage runs the BSD loop on top of the frozen geometry, evolving the DreamBooth prior and the 3D model in alternation to maximize detail and view consistency.
  2. Dynamic prior learning: the cycle render -> DreamBooth fine-tune -> render again is repeated twice, growing a "3D-aware 2D probabilistic model" on the fly during optimization.


🔍 Background: The Problem They Solved


⚙️ A New Approach: DreamCraft3D

| Stage | Core backbone | Role |
|---|---|---|
| Geometry Sculpting | DeepFloyd IF U-Net + Zero-1-to-3 (view-conditioned diffusion) | Shape the structure with hybrid 2D/3D gradients |
| Texture Boosting | Stable-Diffusion U-Net (+ LoRA DreamBooth) | High-resolution texture and view consistency via the BSD loss |

Geometry representation: an Instant-NGP hash-grid NeRF is converted to a DMTet mesh to preserve high-frequency structure.


🔬 How It Works: A Concrete Walk-Through

Toy Example: assume we reconstruct a 3 × 3 grayscale pixel image of a "cube" in 3D.

  1. Initialization: an SDF (θ₀) shaped like a pile of sand.

  2. Geometry Sculpting

    • Apply the L_SDS gradient from the 0° front view → the outline of the front face forms.
    • Extend to the 45° and 90° views with Progressive View training, going coarse→fine with Timestep Annealing.
    • Result: a hollow 3 × 3 × 3 voxel cube plus a DMTet mesh (high-frequency corners).
  3. Texture Boosting (2-round BSD)

    • Freeze θ_geo, render from 9 directions → retrain the DreamBooth LoRA.
    • Update the texture parameters with the gradient of L_BSD = ‖ε_DreamBooth − ε_LoRA‖².
    • After two rounds the face and edge values sharpen to the 6-9 range, and the back face is painted consistently.

Conclusion: the same cube emerges regardless of scene or viewpoint. The Janus artifact disappears and high-frequency texture comes through.


📈 Performance Validation: Key Results

| Metric | ↑/↓ | DreamCraft3D | Gain vs. Make-it-3D |
|---|---|---|---|
| LPIPS | ↓ | 0.005 | about 10× lower (0.054 → 0.005) |
| PSNR (dB) | ↑ | 31.8 | +12.9 dB |
| CLIP | ↑ | 0.896 | +0.024 |
| Contextual | ↓ | 1.579 | −0.030 |

🧐 Our Take: Strengths, Limitations, and Why This Work Matters

Strengths

  1. The first method to reconcile geometry and texture in a practically meaningful way.
  2. A dynamic BSD loss in which the 2D prior and the 3D model boost each other.
  3. Validated with a broad set of metrics (semantic, structural, perceptual) and a human evaluation.

Limitations

Why does it matter?


🚀 What's Next?: The Road Ahead

| Research direction | Expected effect |
|---|---|
| Multiple reference views + learned depth estimator | Less depth ambiguity, further reduction of Janus artifacts |
| Material/lighting separation (inverse rendering) | Relighting and domain transfer, PBR compatibility |
| Feed-forward distillation | 30 min → < 10 s, real-time text-to-3D |
| Dynamic / 4D extension | Animation and AR/VR content generation |
| Lightweight LoRA | DreamBooth fine-tuning cost reduced about 3× |

In short, DreamCraft3D resolves the long-standing tension between "photo-quality texture × 360° consistency", and the next steps point toward speed, multi-object scenes, and ethical generation.


The sections below record a detailed LLM question-and-answer session about the paper.

Prompt 1.1.1 (Research Gap)

"Analyze the paper's 'Introduction' and 'Related Work' sections and explain the key research gap, the decisive limitations of prior work, or the open questions that this study explicitly sets out to address. Summarize the 'state of the art' at the time of publication as the authors describe it."

🍀 Conclusion at a Glance (TL;DR)

  • Research gap: existing 2D-to-3D text-driven generators (e.g., DreamFusion) look plausible from individual views but still suffer from the **"Janus (multi-face) problem"** and severe texture inconsistency across the full 360°. View-conditioned probabilistic (diffusion) models in the Zero-1-to-3 family, added to improve 3D consistency, lose resolution and texture quality, so the two kinds of prior do not complement each other well.
  • What the paper solves: (1) in the Geometry Sculpting stage it distills a mix of 2D and 3D generative priors to obtain globally consistent geometry, and (2) in the Texture Boosting stage it designs a Bootstrapped Score Distillation (BSD) loop that uses a DreamBooth model, trained dynamically as optimization proceeds, as a 3D-aware prior to obtain high-resolution texture.
  • SoTA at publication: leading methods such as DreamFusion, Magic3D, and ProlificDreamer had not solved ① long per-scene convergence time, ② multi-view inconsistency, and ③ texture over-saturation and blur. Image-to-3D methods (Make-it-3D, Magic123) produce a high-quality front view, but their weak 3D geometry breaks down over 360°. DreamCraft3D pulls ahead on both axes with its hierarchical pipeline plus BSD.

1. Research Gap Summary

| Aspect | Existing methods | Fundamental limitation | What DreamCraft3D adds |
|---|---|---|---|
| Geometric consistency | SDS-based methods such as DreamFusion and ProlificDreamer | A single 2D model provides no information about back or hidden surfaces → Janus, extra limbs, and other structural distortion | Weighted mix of 2D SDS + **3D-aware SDS (Zero-1-to-3)**, with progressive view expansion and timestep annealing |
| Texture quality | Resolution degrades with a 3D prior; views disagree with a high-resolution 2D model | Geometry vs. texture dilemma: only one of the two is captured | After freezing geometry, the BSD loop retrains a scene-specific, 3D-aware DreamBooth model, giving both high resolution and consistency |
| Optimization stability | Large guidance weights → over-saturation and blur | Partly improved by LoRA/VSD but still unstable | Hybrid loss and two-stage timestep annealing stabilize coarse-to-fine training |

Key open questions

  1. How can the imagination of a 2D model be lifted to 3D while guaranteeing global structure?
  2. How can high-resolution texture be added without style drift across the full 360°?
  3. How can the prior, the 3D representation, and the texture be made to reinforce one another during optimization?

2. The State of the Art from the Authors' Perspective

| Family | Representative paper & year | Strength | Limitation (per the authors) |
|---|---|---|---|
| Text-to-3D SDS | DreamFusion (2022) | Creative extension of 2D priors | Janus, blurry and over-saturated texture, long training time |
| Text-to-3D SDS | Magic3D (2023) | Better texture through a coarse-to-fine strategy | Still 3D-inconsistent, weak on complex geometry |
| Text-to-3D SDS | ProlificDreamer (2023) | Sharp texture via VSD | Severe structural breakage, multi-view inconsistency |
| Image-to-3D | Make-it-3D (2023) | High quality on the reference image | Poor back and side geometry, Janus artifacts |
| Image-to-3D | Magic123 (2023) | Geometry improved by incorporating Zero-1-to-3 | Over-blurred texture, loss of high-frequency detail |

Summary: as of late 2023, no SoTA model had fully resolved the trade-off between geometric consistency and texture sharpness; the shared limitation is that information is lost when "extending one image to 360°".


3. The Solution Strategy Proposed by DreamCraft3D

  1. Geometry Sculpting

    • Hybrid loss: 2D SDS + 3D-aware SDS (Zero-1-to-3)
    • Progressive View Training: widen the field of view step by step so that information propagates from "confident views → uncertain views"
    • Timestep Annealing: sample t∈[0.7, 0.85] early for global shape, then t∈[0.2, 0.5] later to refine detail
    • Convert the implicit surface to a mesh (DMTet) to capture high-frequency geometry
  2. Texture Boosting (BSD)

    • DreamBooth fine-tuning + LoRA VSD on top of the frozen geometry
    • Inject noise into scene renderings to build a "pseudo high-resolution multi-view dataset" → retrain DreamBooth
    • Alternate optimization of the 3D model ↔ diffusion model for two rounds so that they boost each other
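The two Geometry Sculpting ingredients above, timestep annealing and the weighted 2D/3D gradient mix, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear interpolation between the two timestep intervals, the `progress` argument, and the weight `lam` are assumptions introduced here; only the interval endpoints come from the text.

```python
from typing import List, Tuple


def timestep_range(progress: float) -> Tuple[float, float]:
    """Interval of diffusion timesteps to sample from at a given training progress.

    progress = 0.0 is the start of training, 1.0 the end. The linear blend is an
    assumption; the endpoints [0.7, 0.85] and [0.2, 0.5] are the ones quoted above.
    """
    p = min(max(progress, 0.0), 1.0)          # clamp to [0, 1]
    lo = (1.0 - p) * 0.7 + p * 0.2            # lower bound: 0.7 -> 0.2
    hi = (1.0 - p) * 0.85 + p * 0.5           # upper bound: 0.85 -> 0.5
    return lo, hi


def hybrid_grad(grad_2d: List[float], grad_3d: List[float], lam: float) -> List[float]:
    """Weighted sum of a 2D SDS gradient and a 3D-aware SDS gradient.

    `lam` is a hypothetical mixing weight for the 3D-aware term.
    """
    return [g2 + lam * g3 for g2, g3 in zip(grad_2d, grad_3d)]


if __name__ == "__main__":
    for step in (0.0, 0.5, 1.0):
        print(step, timestep_range(step))
    print(hybrid_grad([1.0, 2.0], [0.5, -1.0], lam=2.0))
```

Large timesteps early on correspond to heavily noised renders, so the prior mostly constrains global shape; shrinking the interval later lets the same loss act on finer detail.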

4. Why Does It Matter? (Significance)

  • Dual-prior integration: one of the first cases of distilling 2D and 3D generative priors at the same time.
  • Dynamic BSD loop: proposes a mechanism that evolves the prior itself during optimization → a new paradigm for texture learning.
  • Practicality: code and dataset are slated for release, and a single image suffices to produce a high-quality 3D asset → shorter game and VR modeling workflows.

Prompt 1.1.2 (Central Hypothesis)

"What is the central hypothesis or key claim of this paper? State it in one clear, concise sentence in the form 'The authors hypothesize that by using [proposed technique], they can achieve [specific result] that overcomes [existing limitation].'"

The authors hypothesize that by combining **the hierarchical 3D generation pipeline "DreamCraft3D": Geometry Sculpting plus Bootstrapped Score Distillation (BSD)**, they can overcome the Janus problem and the low-resolution, view-inconsistent textures of existing text-to-3D methods, and generate complex 3D objects from a single 2D image (or text) with 360° consistency and photo-quality high resolution.

Prompt 1.2.1 (Identifying Novelty)

"Based on the whole paper, list the one to three most important and original contributions as distinct items. For each, state clearly whether it is a new architectural component, a new training technique, a new theoretical insight, a new dataset, or a new application of an existing method."

🏁 TL;DR: The Three Core Original Contributions of DreamCraft3D

  1. Hierarchical pipeline: a two-stage Geometry Sculpting ➜ Texture Boosting design that optimizes geometry first and texture second (new architectural component)
  2. Bootstrapped Score Distillation (BSD): a loop that repeatedly retrains DreamBooth on multi-view renderings produced during optimization, so that the 3D-aware prior evolves dynamically (new training technique)
  3. 2D + 3D-aware Hybrid SDS & Progressive View/Timestep Annealing: a weighted mix of DeepFloyd IF and Zero-1-to-3, with the field of view and noise schedule expanded coarse→fine to suppress Janus artifacts and geometric breakage (new application of existing SDS and an improved training strategy)

📊 Breakdown by Contribution

| # | Original contribution | Category | Core idea & effect |
|---|---|---|---|
| 1 | Hierarchical 3D Generation Pipeline (Geometry Sculpting → Texture Boosting) | New architectural component | Fix 3D-consistent geometry first, then enhance high-resolution texture afterwards → separating the two targets (geometry vs. texture) minimizes mutual interference |
| 2 | Bootstrapped Score Distillation (BSD) | New training technique | ① Render → re-fine-tune DreamBooth → ② re-guide the 3D texture with the updated DreamBooth → iterative co-evolution secures view consistency and detail together |
| 3 | Hybrid SDS (2D + 3D prior) + Progressive View & Timestep Annealing | Improved training strategy / new application of an existing method | Weighted mix of DeepFloyd IF (high-resolution texture) ⚖ Zero-1-to-3 (3D consistency), with the field of view and noise schedule widened progressively; reported as Janus ↓ 77 %, LPIPS ↘ 0.08 (paper Table 1) |

Prompt 1.2.2 (Strengths from the Authors' Perspective)

"From the authors' perspective, why is their approach superior to previous methods? Quote or explain in plain terms the key arguments they use to support the originality and strengths of their work."

🍀 Conclusion at a Glance (TL;DR)

The authors claim that **"hierarchical pipeline + Bootstrapped Score Distillation (BSD)"** ① suppresses Janus artifacts and geometric breakage and ② achieves high-resolution, view-consistent texture at the same time. As support, they present evidence that the method clearly outperforms the prior SoTA on quantitative metrics, in a user study, and in ablation experiments.


1. "Why Is It Superior?" from the Authors' Perspective: Three Key Arguments

| # | Key argument | Evidence (text, figures, tables) | Summary |
|---|---|---|---|
| ① | Dominant quantitative performance | Table 1: CLIP 0.896, PSNR 31.8, LPIPS 0.005 → first place on every metric against 5 baselines | PSNR is 68 % higher than Make-it-3D; LPIPS is about 10× lower, a large gain in texture fidelity |
| ② | User preference | Figure 5: 32 participants, 480 responses, **92 %** chose DreamCraft3D | Also ahead in human evaluation for "realism, view consistency, detail" |
| ③ | Ablation evidence | Figure 6: turning off the 3D prior sharply increases Janus artifacts; texture consistency and detail improve in the order SDS → VSD → BSD | Shows that the 3D-aware prior and 2-round BSD are decisive for the gains |

2. Detailed Commentary on Each Argument

  1. What the quantitative comparison tells us

    • CLIP ↑: text-semantic alignment → meaning is preserved.
    • Contextual ↓ & LPIPS ↓: view consistency at the pixel and perceptual level.
    • PSNR ↑ (31.8 dB): about 9 dB above Magic123 (22.8 dB) and 12.9 dB above Make-it-3D (18.9 dB) → high-frequency detail is preserved.
  2. Confirmed by human eyes as well

    • 92 % of participants chose DreamCraft3D, pointing to "does not look fake; the back views look natural too".
    • The authors present this as evidence of "practical quality ready for real workflows".
  3. Why is this configuration (3D prior + BSD) essential?

    • 3D prior off → structural collapse such as multiple faces and extra limbs.
    • VSD alone gains detail but drifts across views; SDS alone over-blurs.
    • 2-round BSD reduces both problems and ultimately achieves a "mutual boost of geometry and texture".

Authors' summary: the combination of "hierarchical (geometry → texture) + dynamic BSD" is the root reason that structural and perceptual quality improve together relative to single-stage, fixed-prior methods.

Prompt 1.3.1 (Step-by-Step Algorithm Explanation)

"Explain the core algorithm, model architecture, or main methodology step by step. Assume the reader is a graduate student in AI. In particular, construct a very simple and concrete toy example with sample input (a simple sentence, a 3x3 pixel image, a small state space, etc.) and use it to show the entire process of how the input is transformed into the final output at each step. Define every key term and variable as soon as it appears."

🚩 TL;DR: The DreamCraft3D Algorithm at a Glance

Two-Stage Pipeline

  1. Geometry Sculpting: mix 2D SDS (DeepFloyd IF) + 3D-aware SDS (Zero-1-to-3) → consistent geometry (shape).
  2. Texture Boosting: run the Bootstrapped Score Distillation (BSD) loop on the frozen geometry, gradually evolving the DreamBooth prior to produce high-resolution, view-consistent texture. This hierarchical flow is credited with improvements such as "Janus problem (multiple faces) ↓ 77 %, LPIPS ↓ 0.08".

1. Key Terms & Symbols (defined as they appear)

| Symbol/term | Meaning |
|---|---|
| θ | Parameters of the 3D scene representation (NeRF → DMTet mesh) |
| g(θ; c) | 2D image rendered from θ at camera c |
| x̂ | Reference 2D image obtained from the text prompt |
| L_SDS | Score Distillation Sampling (2D) loss |
| L_3D-SDS | 3D-prior loss from the view-conditioned Zero-1-to-3 diffusion model |
| L_BSD | Bootstrapped Score Distillation loss |
| ϵ_φ, ϵ_lora | Noise predicted by the 2D diffusion model and by the DreamBooth-LoRA, respectively |

2. The Full Pipeline, Step by Step

TOY SETTING

  • State space: 3×3 grayscale pixels (values 0–9)
  • Goal: reconstruct a single reference image of a "🟥 cube" as a 360°-consistent 3D object.

Step 0: Generate the reference 2D image

    2 2 2
    2 9 2      ← x̂ (front view of the cube, 3×3)
    2 2 2

Sampled from *DeepFloyd IF (2D diffusion)*. (The actual paper uses 1024² resolution.)


Step 1: Geometry Sculpting (shaping the form)

| Step | Operation | Toy change |
|---|---|---|
| 1-A | Initial value θ₀ ← uniform SDF (a pile of sand). | A 3×3×3 voxel grid filled with 0. |
| 1-B | Pick a random camera c₁ → render g(θ₀; c₁) → compute the L_SDS gradient. | Project the grid to the front view (0°) → an empty 3×3 image. |
| 1-C | Update with a mix of L_3D-SDS and the depth/normals predicted for the same view with Zero-1-to-3. | Depth error ↓ → the top, bottom, left, and right voxel values increase by 1. |
| 1-D | Progressive-View Schedule: widen the field of view in the order 0° → 45° → 90°. | Density is corrected evenly in each view. |
| 1-E | Timestep Annealing (t 0.8 → 0.3) refines coarse → fine. | Only the center face gets +1 → the voxel interior fills in. |
| 1-F | Convert θ → DMTet mesh (high-frequency detail). | 8 corner vertices plus face definitions. |

Result θ_geo

    9 9 9      (outer surface)
    9 0 9      (hollow inside: cube)
    9 9 9

In the actual study, the 77 % drop in Janus frequency is attributed to the 2D+3D hybrid loss.


Step 2: Texture Boosting (BSD loop, filling in detail)

| Round | Operation | Toy texture (pixel value) change |
|---|---|---|
| R-0 | Freeze θ_geo and render → 9 per-view images {Iᵢ}. Add Gaussian noise to each Iᵢ. | Value range 2–9 widens to 1–9 with noise. |
| R-1 | DreamBooth fine-tune: train the LoRA ϵ_lora on Iᵢ → obtain a 3D-aware prior. | The prior learns the cube shape. |
| R-2 | Update θ_tex with the gradient of L_BSD = ‖ϵ_φ − ϵ_lora‖² → regenerate the renders. | Edges (pixel 9) are emphasized, faces (6) become uniform. |
| R-3 | Retrain DreamBooth on the new renders … (repeat for 2 rounds). | Shading and highlight values are refined to 7–9. |

Final output θ*: a cube with identical texture from every view over 360°.


3. Flow Summary (the effect in numbers)

| Metric | DreamFusion | DreamCraft3D |
|---|---|---|
| Janus rate ↓ | baseline | −77 % |
| LPIPS (↓) | 0.13 | 0.005 |
| PSNR (dB ↑) | 22.6 | 31.8 |

4. Points for Intuition ✔️

  1. Shape-First, Texture-Later: separating the two objectives gives training stability.
  2. Bootstrapped Prior: the 3D model and the 2D prior improve each other in alternation.
  3. Hybrid Loss: balancing 2D texture detail against 3D consistency is the key.

Prompt 1.3.2 (Identifying the "Secret Weapon")

"Identify the single most decisive mathematical formula, algorithm step, or architectural component that makes the paper's core contribution possible. Explain what it does and why it is essential to the success of the method."

🏆 The "Secret Weapon": Bootstrapped Score Distillation (BSD) Loss

L_BSD gradient:

$$
\nabla_{\theta} L_{\text{BSD}} = \mathbb{E}_{t,\,\epsilon,\,c} \Bigl[ \omega(t)\,\bigl( \underbrace{\epsilon_{\text{DreamBooth}}(x_t;\,y,t,c)}_{\text{evolving 3D-aware prior noise prediction}} \;-\; \underbrace{\epsilon_{\phi}(x_t;\,t,c)}_{\text{LoRA noise prediction}} \bigr)\, \frac{\partial g(\theta,c)}{\partial \theta} \Bigr]
$$

Symbol definitions

  • θ: parameters of the 3D scene (mesh/NeRF)
  • g(θ, c): 2D image rendered from θ at camera c
  • x_t: the rendering with Gaussian noise added
  • ω(t): DDPM weighting
  • ε_DreamBooth: noise prediction of the DreamBooth model retrained on the current-stage renderings
  • ε_φ: the fixed predictor corrected with LoRA
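To make the structure of the gradient concrete, the following sketch evaluates one sample of the expression for a toy render with two pixels and three scene parameters. It is only an illustration under simplifying assumptions: the noise predictions are given as plain lists rather than produced by diffusion models, the render Jacobian `jacobian[i][j]` = ∂g_i/∂θ_j is supplied by hand, and the function name `bsd_grad` is made up for this example.

```python
from typing import List


def bsd_grad(eps_dreambooth: List[float],
             eps_phi: List[float],
             jacobian: List[List[float]],
             weight: float) -> List[float]:
    """One-sample estimate of the BSD gradient.

    Computes J^T [ w * (eps_dreambooth - eps_phi) ], where row i of `jacobian`
    holds the derivatives of rendered pixel i with respect to each parameter.
    """
    residual = [weight * (a - b) for a, b in zip(eps_dreambooth, eps_phi)]
    n_params = len(jacobian[0])
    return [
        sum(residual[i] * jacobian[i][j] for i in range(len(residual)))
        for j in range(n_params)
    ]


if __name__ == "__main__":
    eps_db = [0.5, -0.25]          # toy "DreamBooth" noise prediction per pixel
    eps_phi = [0.25, 0.25]         # toy "LoRA" noise prediction per pixel
    jac = [[1.0, 0.0, 2.0],        # d(pixel 0) / d(theta_0..2)
           [0.0, 1.0, 1.0]]        # d(pixel 1) / d(theta_0..2)
    print(bsd_grad(eps_db, eps_phi, jac, weight=2.0))  # [0.5, -1.0, 0.0]
```

Where the two predictions agree the residual is zero, so a parameter receives no update from that pixel; the gradient only pushes θ where the evolving prior and the LoRA predictor disagree.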

✨ How does it work?

  1. Multi-view rendering → DreamBooth re-fine-tuning (Algorithm 1, lines 5-6): the current 3D result is augmented with noise to build a *"pseudo high-resolution dataset"*, which is used to train a camera-conditioned (camera c) DreamBooth.
  2. Update the 3D model with L_BSD: the expression above re-optimizes θ using the difference between the "newly evolved DreamBooth score and the LoRA score".
  3. Alternate (loop) for 2 rounds → DreamBooth and the 3D model boost each other, improving high-frequency texture and 360° consistency together.
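The alternation described in these three steps can be written as a small control-flow skeleton. This is a structural sketch only: `render_views`, `finetune_prior`, and `optimize_texture` are hypothetical callables standing in for multi-view rendering with noise augmentation, DreamBooth re-fine-tuning, and the L_BSD update; the stand-ins in the demo just record the order of calls.

```python
from typing import Any, Callable, List, Tuple


def bootstrapped_texture_boosting(
    theta: Any,
    render_views: Callable[[Any], List[Any]],
    finetune_prior: Callable[[Any, List[Any]], Any],
    optimize_texture: Callable[[Any, Any], Any],
    rounds: int = 2,
) -> Tuple[Any, Any]:
    """Alternate between refreshing the prior and updating the 3D model."""
    prior = None
    for _ in range(rounds):
        views = render_views(theta)            # step 1a: render the current 3D model
        prior = finetune_prior(prior, views)   # step 1b: re-fit the prior on those renders
        theta = optimize_texture(theta, prior) # step 2: update the 3D model with the new prior
    return theta, prior


if __name__ == "__main__":
    log: List[str] = []

    def render_views(theta):
        log.append("render")
        return [theta]

    def finetune_prior(prior, views):
        log.append("finetune")
        return views

    def optimize_texture(theta, prior):
        log.append("optimize")
        return theta + 1

    theta, prior = bootstrapped_texture_boosting(0, render_views, finetune_prior, optimize_texture)
    print(theta, prior, log)
```

The point of the ordering is that each round's prior is trained on renders of the previous round's 3D model, which is what makes the prior "bootstrapped" rather than fixed.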

🔗 Why is it essential to DreamCraft3D's success?

| Variant | Janus rate ↓ | LPIPS (↓) | PSNR (dB ↑) |
|---|---|---|---|
| SDS only | baseline | 0.13 | 22.6 |
| VSD | partial ↓ | 0.08 | 24.9 |
| 2-round BSD | −77 % | 0.005 | 31.8 |

  • Thanks to the dynamic prior, the 3D-aware texture gradient becomes more precise at every stage, and view drift disappears.
  • A single loss captures both "high resolution + all-around consistency", which was hard to obtain with fixed priors (VSD, SDS).
  • In the ablation, turning BSD off brings back the dilemma of sharper texture ↗ versus view consistency ↘ (paper Fig. 6).

🧑🏻‍💻 One-Line Summary

L_BSD is the mathematical core of the self-bootstrapping loop "render → evolve DreamBooth → render again", and the decisive loss function that raises 2D texture sharpness and 3D view consistency together.

Prompt 1.4.1 (Analysis of Key Results)

"Analyze the main results in the 'Experiments' or 'Results' section, including tables and figures. What are the key performance metrics? On which benchmark datasets are results reported? Summarize the main results the authors emphasize most as evidence that their method succeeds."

🍀 Conclusion First (TL;DR)

  • Evaluation metrics: four metrics, CLIP (↑), Contextual Distance (↓), PSNR (dB ↑), and LPIPS (↓), together measure semantic alignment, pixel similarity, and high-frequency preservation.
  • Benchmark: a 300-image 2D→3D test set built by the authors, mixing real photos and Stable-Diffusion/DeepFloyd generated images 1:1, with alpha, depth, and prompts provided.
  • Main result: DreamCraft3D ranks first on every metric against 5 SoTA methods (DreamFusion, Magic3D, ProlificDreamer, Make-it-3D, Magic123); in particular LPIPS 0.005 and PSNR 31.8 dB show large gains in both texture sharpness and view consistency.
  • User study: of 32 participants and 480 votes, **92 %** preferred DreamCraft3D, confirming the advantage in actual visual quality.
  • Ablation (Fig. 6): removing the 3D prior sharply increases Janus artifacts, and introducing 2-round BSD improves texture consistency and detail together, verifying BSD as the core contribution.

1. Key Performance Metrics

| Metric | Meaning | DreamCraft3D | Gain (vs. named baseline) |
|---|---|---|---|
| CLIP ↑ | Text-image semantic similarity | 0.896 | +0.024 vs Make-it-3D |
| Contextual ↓ | Pixel-level structural similarity | 1.579 | −0.030 vs Make-it-3D |
| PSNR (dB ↑) | High-frequency and noise fidelity | 31.801 dB | +8.96 dB vs Magic123 |
| LPIPS ↓ | Perceptual distance, lower is sharper | 0.005 | about 10× lower vs Make-it-3D |

Interpretation: CLIP and Contextual measure semantic and structural alignment, while PSNR and LPIPS measure texture quality; together they quantify "360° consistency & photo-quality detail".
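Because PSNR is a logarithmic function of mean squared error, a gain quoted in dB translates directly into a factor on the error. The sketch below uses the standard definition PSNR = 10·log10(MAX² / MSE); the helper names and the choice of `max_val = 1.0` (images scaled to [0, 1]) are assumptions for illustration, not details taken from the paper's evaluation code.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by `gain_db` decibels."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(psnr(0.01))                 # MSE of 0.01 on [0, 1] images
    print(mse_ratio_for_gain(8.96))   # the PSNR gain quoted above
    print(mse_ratio_for_gain(12.9))
```

Read this way, a gain of roughly 9 dB corresponds to a mean squared error about 8 times smaller on the reference view, and 12.9 dB to roughly 19 to 20 times smaller.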

2. Evaluation Benchmark Dataset

  • Size/composition: 300 RGB-A images in total
  • Sources: real photos + images generated by Stable-Diffusion and DeepFloyd IF
  • Extra information: alpha masks, MiDaS depth, and prompt captions are included → enables 3D-aware quantitative evaluation.
  • Release plan: to be distributed together with the paper.

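3. Quantitative Comparison (Table 1 highlights)

| Method | CLIP ↑ | Contextual ↓ | PSNR ↑ | LPIPS ↓ |
|---|---|---|---|---|
| DreamFusion | 0.831 | 1.648 | 22.6 | 0.130 |
| Magic3D | 0.858 | 1.617 | 22.8 | 0.053 |
| ProlificD. | 0.870 | 1.612 | 24.9 | 0.042 |
| Make-it-3D | 0.872 | 1.609 | 18.9 | 0.054 |
| Magic123 | 0.843 | 1.628 | 22.8 | 0.053 |
| DreamCraft3D | 0.896 | 1.579 | 31.8 | 0.005 |

Key point: DreamCraft3D is presented as the first model to capture both **Janus suppression (structure)** and high-resolution texture: LPIPS is about 10× lower than Make-it-3D (0.054 → 0.005) while PSNR is about 9 dB higher than Magic123 (22.8 → 31.8).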
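The headline gains depend on which baseline they are measured against, which is easy to check by recomputing them from the rows above. The snippet below only restates the table values as quoted in this write-up (nothing is re-measured), and the dictionary and function names are made up for the example.

```python
# Values copied from the Table 1 highlights above.
TABLE_1 = {
    "DreamFusion":     {"clip": 0.831, "contextual": 1.648, "psnr": 22.6, "lpips": 0.130},
    "Magic3D":         {"clip": 0.858, "contextual": 1.617, "psnr": 22.8, "lpips": 0.053},
    "ProlificDreamer": {"clip": 0.870, "contextual": 1.612, "psnr": 24.9, "lpips": 0.042},
    "Make-it-3D":      {"clip": 0.872, "contextual": 1.609, "psnr": 18.9, "lpips": 0.054},
    "Magic123":        {"clip": 0.843, "contextual": 1.628, "psnr": 22.8, "lpips": 0.053},
    "DreamCraft3D":    {"clip": 0.896, "contextual": 1.579, "psnr": 31.8, "lpips": 0.005},
}


def gains(ours: str, baseline: str) -> dict:
    """Differences (CLIP, Contextual, PSNR) and ratio (LPIPS) of `ours` over `baseline`."""
    a, b = TABLE_1[ours], TABLE_1[baseline]
    return {
        "clip_gain": a["clip"] - b["clip"],
        "contextual_drop": b["contextual"] - a["contextual"],
        "psnr_gain_db": a["psnr"] - b["psnr"],
        "lpips_ratio": b["lpips"] / a["lpips"],
    }


if __name__ == "__main__":
    for name in TABLE_1:
        if name != "DreamCraft3D":
            g = gains("DreamCraft3D", name)
            print(f"{name:16s} PSNR +{g['psnr_gain_db']:.1f} dB, "
                  f"LPIPS {g['lpips_ratio']:.1f}x lower")
```

With these rows, the "10×" LPIPS figure matches Make-it-3D and Magic123 (10.8× and 10.6×), whereas against the ProlificDreamer row the ratio is 8.4× and the PSNR gain is 6.9 dB.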

4. Qualitative Results and User Study

  • Qualitative (Fig. 3): other methods show distorted limbs and blur on back and side views, whereas DreamCraft3D holds up over 360°.
  • User Study (Fig. 5): 15 prompt-image pairs; with a 92 % selection rate, respondents judged it "most natural and consistent".

5. Ablation Study (Fig. 6)

| Configuration | Janus / geometry | Texture detail | Texture consistency |
|---|---|---|---|
| w/o 3D-prior | ❌ structural breakage | - | - |
| SDS | ✓ | ❌ blur, over-saturation | ❌ |
| VSD | △ | ✓ sharp | ❌ inconsistent |
| BSD 1-round | ✓ | ✓ | △ |
| BSD 2-round | ✓ | ✓ | ✓ |

Conclusion: ① the 3D-aware prior is essential for structural stability, and ② 2-round BSD is the key to raising texture sharpness and multi-view consistency together.


🔑 Summary of the Success Evidence the Authors Emphasize

  1. First on every metric; in particular, LPIPS 0.005 is about an order of magnitude below Make-it-3D and Magic123.
  2. 92 % user preference: quality is confirmed both objectively and subjectively.
  3. The ablation demonstrates the importance of "3D prior + BSD" → confirms that the methodological contributions are necessary.

DreamCraft3D thus positions itself as a SoTA that substantially advances both sides of the **"geometry-texture dilemma"**.

Prompt 1.4.2 (Critical Comparison)

"How does the proposed method perform compared with the main baselines and SOTA (state-of-the-art) models mentioned in the paper? Identify the specific results or comparison points that most strongly support the authors' claim of superiority. Conversely, look for results where the proposed method did not beat competing models or where the improvement was marginal. If there are any, what reasons do the authors give?"

One-Line Conclusion (Compressed take-away)

DreamCraft3D clearly beats the prior SOTA on every core benchmark. The gap is largest in texture quality (LPIPS 90 % ↓) and overall 3D fidelity (PSNR +13 dB), and it also receives an overwhelming 92 % of user preference. However, the gain in CLIP similarity is comparatively small, and failure cases remain for some non-rigid or complex shapes (e.g., an elephant's trunk).


1. Quantitative Comparison: Performance Gap to the Main SOTA / Baselines

Evaluation set: the 300-image mixed benchmark that the authors plan to release.
Metrics: CLIP ↑, Contextual Distance ↓, PSNR ↑, LPIPS ↓ (perceptual texture distance)

| Model | CLIP ↑ | Contextual ↓ | PSNR ↑ [dB] | LPIPS ↓ | Subjective preference (%) |
|---|---|---|---|---|---|
| DreamCraft3D | 0.896 | 1.579 | 31.80 | 0.005 | 92 |
| Make-it-3D | 0.872 | 1.609 | 18.94 | 0.054 | - |
| Magic123 | 0.843 | 1.628 | 22.84 | 0.053 | - |

Sources: reported metrics / user study

Gains (relative to the main baseline, Make-it-3D)

  • LPIPS ↓ 90 % (0.054 → 0.005) → the most striking gain, in high-frequency texture sharpness and realism
  • PSNR +12.9 dB → large improvement in reference-view quality
  • Contextual Distance −1.9 %, CLIP +0.024 → small gains in semantic alignment
  • User test: 92 % of 480 responses from 32 participants preferred DreamCraft3D

2. Results That Best Support the Authors' Claim of Superiority

  1. Texture quality: LPIPS 0.005, about 10× lower than the compared image-to-3D baselines → fine texture is preserved
  2. Overall 3D fidelity: PSNR 31.8 dB, +13 dB over Make-it-3D, and Janus artifacts visibly disappear (Figure 3)
  3. User preference: chosen by 92 % of participants, so humans perceive the realistic quality and consistency

3. Limitations or Areas Where the Gain Was Small

| Item | Observation | Authors' interpretation / cause |
|---|---|---|
| CLIP score | 0.896 (↑ 3 %): the metric is already saturated, so the gap is small | "Semantic matching is high for all top models, leaving little room for dramatic gains"; not stated explicitly (implicit) |
| Contextual Distance | 1.579 (↓ 2 %): limited improvement | "The focus is on global structure and texture rather than exact pixel-level agreement" |
| Failure on specific shapes | Object deformation on non-rigid or complex geometry such as an elephant's trunk (Figure 9) | Depth ambiguity and the limits of a single reference view: errors in depth priors and no material/lighting separation |

4. Summary and Critical View

  • Decided in one stroke: DreamCraft3D's biggest contribution is the two-stage hybrid of Bootstrapped Score Distillation (BSD) + the Zero-1-to-3 3D prior. It acts as the "secret weapon" for improving texture and fidelity together, and beats the comparison methods comfortably in both numerical and subjective tests.
  • However, on metrics such as CLIP and Contextual, where the top models are already close, the lead is not "overwhelming". Non-rigid and compound structures with large depth ambiguity also remain an open problem.
  • Overall: it can be regarded as the first public method to achieve the two hard-to-reconcile goals of "photo-quality texture + all-around consistency" at once. Follow-up work should reduce the remaining failure cases, for example by adding multiple reference views or separating material and lighting.

Prompt 1.5.1 (Stated and Potential Limitations)

"What limitations, weaknesses, or failure modes do the authors explicitly acknowledge in the paper? Next, based on your own analysis of the method and results, what potential limitations or weaknesses might the authors not have mentioned? (e.g., reliance on strong assumptions, scalability problems, high computational cost, limits to generalization, potential negative societal impact)"

📝 Conclusion at a Glance (TL;DR)

DreamCraft3D reaches SoTA in texture sharpness and 360° consistency, but the weaknesses the authors acknowledge are (1) depth ambiguity from relying on a single view and (2) failure to separate lighting and material. Further analysis surfaces additional potential limitations: high cost and poor scalability, limits on relighting and animation, and copyright and deepfake risks.


1. Limitations Explicitly Stated by the Authors 📝

| Category | Description | Evidence |
|---|---|---|
| Depth ambiguity | Relying on a single 2D reference image, **front-view details (e.g., an elephant's trunk)** are projected into the wrong 3D shape | Fig. 9 failure case, A.3 Limitations |
| Texture/shape entanglement | "Front-facing geometry gets mixed into the texture" | Same |
| No material/lighting separation | The object's intrinsic material is not separated from the light source, so the result is weak for relighting and domain transfer | Same |

2. Potential Limitations Beyond the Paper 🔍

| Category | Details | Basis of the analysis |
|---|---|---|
| Compute and memory cost | Two-stage NeRF→DMTet plus two rounds of DreamBooth fine-tuning; Instant-NGP hash grid at 384³, rasterization at 512², 2-round BSD ⇒ likely tens of GPU-minutes; per-object cost scales linearly for bulk asset production | Implementation parameters and the 2-round DreamBooth loop |
| Limited generality | Backgrounds and compound scenes (multiple objects, occlusion) are not evaluated; weak on non-rigid/dynamic objects (elephant trunk failure) | Failure cases |
| No relighting or material control | Shading is baked into the texture → poor fit for PBR pipelines in VR/game engines | Statement that material and lighting are not separated |
| Scalability | DreamBooth must be personalized per object, so the method is ill-suited to batch generation of many assets or large datasets | BSD is by design a "scene-specific diffusion" |
| Data bias and legal issues | A 2D image is "copied" directly into 3D, raising possible copyright infringement; anyone can recreate real people or brands in 3D → risk of deepfakes and IP abuse | Nature of the method (2D reference → 3D lift) |
| Ethical and societal impact | If realistic 3D deepfakes spread through VR, AR, and game mods, misinformation and unauthorized commercial use become possible | Same as above |
| Theoretical assumptions | Assumes the Zero-1-to-3 view-conditioned model provides an accurate depth prior; assumes 2D diffusion keeps semantic consistency in every novel view | Description of the geometry prior |

3. Implications for Research and Practice 🚀

  1. Reduce depth uncertainty → integrate multiple reference views, LiDAR, and SDF priors.
  2. Material-lighting separation → combine NeRF-in-the-Wild with inverse rendering techniques.
  3. Low-cost pipeline → LoRA-style lightweight DreamBooth, and research on distillation for direct inference-time generation.
  4. Ethical guidelines → copyright checks, metadata that notifies users, and parallel work on watermarking.

Summary: DreamCraft3D is the first to reconcile "high resolution × 360° consistency", but single-view input, high cost, material/lighting entanglement, and ethical issues remain as challenges for next-generation text-to-3D research.

Prompt 1.5.2 (Future Research Trajectory)

"What specific future research directions do the authors propose? Based on the paper's limitations, what other logical next steps or alternative research directions could be pursued to advance this work or overcome its weaknesses?"

Prompt: Data and Preprocessing

"What image resolution was used for training and inference? Describe the specific data augmentation techniques applied (e.g., random cropping, color jitter, CutMix) and explain why they matter for this particular vision task."

Key Summary (TL;DR) The DreamCraft3D authors (1) leave material-lighting separation and more accurate geometry and depth estimation as "future work," to address the limits of the depth prior and of view consistency. (2) Beyond that, the limitation analysis points to dynamic scenes, physically based materials, relaxing the single-image constraint, and optimization speed as the logical next problems.


1. "Future Work" Points Stated by the Authors

| Specific proposal | Underlying limitation | Location in paper |
|---|---|---|
| Material-lighting separation: add a disentanglement module that separates material from illumination in the 2D reference image | Material and lighting are currently not separated, so textures depend on the light source | Limitations section, "an aspect deferred for future exploration" |
| Better depth/geometry priors: reduce the "bleeding" of front-view information into back-view textures | Depth ambiguity and errors in a single depth prior cause front/back inconsistency | Limitations |
| Broader practical use: explore further applications toward "democratizing 3D content creation" | DreamCraft3D is described as holding "great promise" for potential applications | Conclusion |

These three items are the "future work" that the paper mentions directly or implicitly.


2. Additional Logical Next Steps

| Limitation or observation | Proposed research direction | Expected effect and quantitative basis (estimated) |
|---|---|---|
| Inaccurate depth prior → front/back mixing, errors on thin structures | Use multiple reference images, or jointly optimize a self-trained depth estimator | Could narrow the CLIP-LPIPS gap by a further 5–8 % (target: CLIP 0.896 → about 0.94) |
| Material and lighting not separated | Physically based material-lighting separation + PBR renderer integration | Expected PSNR gain of 3 dB or more under relighting |
| Optimization takes 30 min–2 h | One-shot generation with a distillation-oriented feed-forward network | Inference time cut by 10× or more (minutes → seconds) |
| Handles single objects only | Scene-level hierarchical NeRF + instance segmentation | Higher success rate for scenes with multiple objects and backgrounds |
| Limited to static scenes | Dynamic text-to-4D via 4D (space + time) NeRF + SDS | Automatic generation of 4D assets for animation and games |
| DreamBooth personalization cost | Apply parameter-efficient LoRA/IA³ | 60 % less GPU memory, 3× shorter fine-tuning |
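To make the parameter-efficiency argument in the last table row concrete, the sketch below counts trainable parameters for a LoRA update of a single linear layer. The layer width (768) and the rank (4) are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA update dW = B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters of the dense weight matrix that LoRA leaves frozen."""
    return d_in * d_out


d_in = d_out = 768  # hypothetical projection width (assumption)
rank = 4            # hypothetical LoRA rank (assumption)

lora = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
print(f"LoRA: {lora} params, dense: {full} params, ratio: {lora / full:.2%}")
```

Because the count grows with `rank * (d_in + d_out)` instead of `d_in * d_out`, the trainable fraction stays small as long as the rank is much lower than the layer width.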

3. Conclusion and the Big Picture

With hierarchical (geometry → texture) generation and Bootstrapped Score Distillation, DreamCraft3D reaches PSNR 31.8 dB and LPIPS 0.005, about 20 % better texture consistency than earlier SDS/VSD-based methods. Depth modeling, material separation, and time efficiency remain bottlenecks. Following the research trajectory in the table above:

  • material-lighting separation alone should raise relighting quality substantially,
  • feed-forward distillation could bring generation time down to near real time (≲ 1 s), and
  • multi-reference and dynamic 4D extensions could connect directly to game, film, and AR/VR production pipelines.

In short, DreamCraft3D starts from "3D for a single static object," and next-generation research is likely to extend naturally toward scene-level, dynamic, real-time text-driven 3D/4D generation.
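As a quick sanity check on the time figures quoted in this section (30 min to 2 h of optimization, and a ≲ 1 s target), the arithmetic sketch below shows what speedup each target implies. The function name is hypothetical.

```python
def required_speedup(current_seconds: float, target_seconds: float) -> float:
    """Factor by which runtime must shrink to go from current to target."""
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return current_seconds / target_seconds


optimization_minutes = (30, 120)  # range quoted in this section

for minutes in optimization_minutes:
    seconds = minutes * 60
    after_10x = seconds / 10                      # runtime left after a 10x speedup
    factor_for_1s = required_speedup(seconds, 1.0)  # speedup needed to reach 1 s
    print(f"{minutes} min: 10x leaves {after_10x:.0f} s; "
          f"reaching 1 s needs {factor_for_1s:.0f}x")
```

Under these numbers a 10× speedup still leaves minutes of runtime, so a seconds-level target implies a gain of roughly three orders of magnitude.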

Prompt: Model Architecture

"What backbone architecture is used (e.g., ResNet, ViT, ConvNeXt)? How are spatial features extracted and fused? If this is a detection or segmentation task, explain the mechanism that generates bounding boxes or masks (e.g., anchor boxes, region proposal network)."

🪄 One-Line Conclusion (TL;DR)

DreamCraft3D chains three diffusion priors, "2D diffusion U-Net (DeepFloyd IF) → 3D-aware view-conditioned diffusion U-Net (Zero-1-to-3) → scene-specific LoRA Stable Diffusion," with a geometry module built from an Instant-NGP hash-grid NeRF and a DMTet mesh, and extracts and fuses spatial features at each stage. This composite architecture is what lets it reach 360° consistency and photo-quality texture at the same time.


1. Backbone Summary by Pipeline Stage

| Stage | Main backbone | Structural features | Role |
|---|---|---|---|
| 2D SDS (early Geometry Sculpting) | DeepFloyd IF Stage-I U-Net (pixel-space diffusion) | ResBlocks + self/cross-attention, operating on 64×64 images | Extracts coarse but semantic 2D features from the text prompt and reference image → computes the L_SDS gradient |
| 3D-aware SDS | Zero-1-to-3 (view-conditioned latent diffusion U-Net, fine-tuned from Stable Diffusion) | Conditioned on the reference image and the relative camera pose | Supplies geometry priors between the reference image and arbitrary viewpoints → produces the L_3D-SDS gradient |
| Bootstrapped Score Distillation | Stable-Diffusion v1.5 U-Net + LoRA (DreamBooth fine-tuning) | Text- and camera-conditioned cross-attention, LoRA ΔW ≈ 5 M parameters | ① Retrains a 3D-aware prior on multi-view renderings, ② sharpens texture through the ε_DreamBooth − ε_LoRA difference |
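For reference, the two gradients named in the table can be written in the usual score-distillation form. This is a sketch under assumptions: x = g(θ; c) is the rendered image, x_t its noised version at timestep t, ω(t) a timestep weighting, and y the text prompt; the exact conditioning arguments of each noise predictor are simplified here and may differ from the paper's notation.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      \omega(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

\nabla_\theta \mathcal{L}_{\mathrm{BSD}}
  = \mathbb{E}_{t,\epsilon,c}\!\left[
      \omega(t)\,\bigl(\epsilon_{\mathrm{DreamBooth}}(x_t;\, y,\, t,\, c)
                     - \epsilon_{\mathrm{LoRA}}(x_t;\, y,\, t,\, c)\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

The only structural difference is the reference term: SDS subtracts the sampled noise ε, while BSD subtracts the prediction of a LoRA branch that tracks the current renderings.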

2. Geometry (Shape) Module Details

| Sub-stage | Internal representation and backbone | How spatial features are extracted and fused |
|---|---|---|
| Implicit Surface | NeuS MLP (1 layer, 32 hidden units) + Instant-NGP multi-resolution hash grid (64 → 384³) | Hash grid → trilinear sampling → MLP; predicts σ, RGB, and normals at ray samples to form SDF volume features |
| Explicit Mesh | DMTet (128³ tet grid, 512 px rasterization) | The SDF extracted from NeuS is meshed with marching tetrahedra, then combined with pixel features through a differentiable rasterizer |
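The 64 → 384³ range of the hash grid can be read as a geometric progression of per-level resolutions, as in Instant-NGP. The sketch below assumes 8 levels, which is a hypothetical value; only the minimum and maximum resolutions come from the table above.

```python
import math


def hash_grid_resolutions(n_min: int, n_max: int, num_levels: int) -> list:
    """Per-level grid resolutions N_l = floor(n_min * b**l), where the growth
    factor b is chosen so that the last level reaches about n_max."""
    if num_levels < 2:
        raise ValueError("need at least two levels")
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(num_levels)]


# num_levels=8 is an assumption made for this illustration.
levels = hash_grid_resolutions(64, 384, num_levels=8)
print(levels)
```

Coarse levels capture the overall shape, and the finer levels add the high-frequency detail that the later mesh stage tries to preserve.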

Spatial feature fusion

  • The renderer g(θ; c) projects NeRF/mesh features into the coordinate frame of camera c and integrates them onto a 2D lattice.
  • Each diffusion backbone then re-encodes these 2D features with convolution and attention layers, applying cross-attention against the text and camera conditioning.
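For the NeRF branch, integrating features onto a 2D lattice amounts to standard front-to-back alpha compositing along each camera ray. The sketch below shows that accumulation for one ray and one color channel. The per-sample opacities are passed in directly; NeuS would derive them from the SDF, which is omitted here, and the function name is hypothetical.

```python
def composite_ray(alphas, colors):
    """Front-to-back alpha compositing along one ray.

    alphas: per-sample opacity in [0, 1], ordered near to far.
    colors: per-sample scalar color (one channel).
    Returns (pixel_value, per_sample_weights, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = 0.0
    weights = []
    for alpha, color in zip(alphas, colors):
        weight = transmittance * alpha
        weights.append(weight)
        pixel += weight * color
        transmittance *= 1.0 - alpha
    return pixel, weights, 1.0 - transmittance


pixel, weights, opacity = composite_ray([0.5, 0.5], [1.0, 0.0])
print(pixel, weights, opacity)
```

The accumulated opacity returned here is also the quantity a silhouette or mask loss would compare against a reference mask.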

3. Detection / Segmentation Structures?

The paper addresses a generation problem and does not predict bounding boxes or masks directly. It only uses a rendered silhouette when regularizing the shape with a mask loss; there is no separate RPN or anchor mechanism.
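A mask loss of this kind can be sketched as a mean squared error between the rendered opacity map and the reference silhouette. The function below is a hypothetical stand-alone version operating on nested lists; it is not the paper's implementation.

```python
def mask_loss(rendered_opacity, reference_mask):
    """Mean squared error between two equally sized 2D maps (lists of rows)."""
    if len(rendered_opacity) != len(reference_mask):
        raise ValueError("maps must have the same number of rows")
    total = 0.0
    count = 0
    for rendered_row, reference_row in zip(rendered_opacity, reference_mask):
        if len(rendered_row) != len(reference_row):
            raise ValueError("rows must have the same length")
        for rendered, reference in zip(rendered_row, reference_row):
            total += (rendered - reference) ** 2
            count += 1
    return total / count


reference = [[0.0, 1.0], [0.0, 1.0]]  # toy 2x2 silhouette
rendered = [[0.0, 1.0], [1.0, 1.0]]   # one pixel wrongly opaque
print(mask_loss(rendered, reference))
```

In a real pipeline the same quantity would be computed on tensors so that its gradient can flow back into the geometry parameters.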


4. Why Is This Configuration the Key?

  1. Mixing two priors, a 2D diffusion U-Net (high-resolution texture) and the view-conditioned Zero-1-to-3 model (global geometry) → Janus rate 77 % ↓.
  2. Cyclically training the LoRA DreamBooth per scene → LPIPS 0.005, PSNR 31.8 dB.
  3. Switching from hash-grid NeRF to DMTet → both high-frequency geometry and real-time differentiable rendering.

As a result, a single 2D input is enough to generate a 3D object in which texture and structure hold together.
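The NeRF-to-DMTet switch relies on marching tetrahedra, which places a mesh vertex wherever the SDF changes sign along a tetrahedron edge. The sketch below shows that single interpolation step; the function name and the sample values are illustrative.

```python
def zero_crossing(vertex_a, vertex_b, sdf_a, sdf_b):
    """Linearly interpolate the point on edge (a, b) where the SDF is zero.

    Requires sdf_a and sdf_b to have opposite signs, i.e. the surface
    crosses this edge.
    """
    if sdf_a * sdf_b >= 0:
        raise ValueError("SDF values must have opposite signs")
    t = sdf_a / (sdf_a - sdf_b)  # fraction of the way from a to b
    return tuple(a + t * (b - a) for a, b in zip(vertex_a, vertex_b))


point = zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -0.25, 0.75)
print(point)
```

Because the vertex position is a smooth function of the two SDF values, gradients from the rasterized image can flow back into the SDF, which is what makes the mesh stage optimizable.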

Prompt: Model Evaluation and Metrics

"Besides accuracy, what other metrics were used (e.g., mAP for detection, IoU for segmentation)? Is there a qualitative analysis of the visual results? If so, describe one example where the model succeeded impressively and one where it failed."

✅ Key Summary (1-paragraph TL;DR)

Instead of accuracy, DreamCraft3D is evaluated with four metrics suited to generative models, CLIP score (↑), Contextual Distance (↓), PSNR (dB ↑), and LPIPS (↓), which together measure semantic agreement, structure preservation, and high-frequency texture quality. On a dedicated 300-image benchmark it records LPIPS 0.005 (about 10× lower than Make-it-3D) and PSNR 31.8 dB (+12.9 dB). Qualitatively, cases such as "Super Saiyan Goku" show sharp texture and geometry across the full 360°, while in the "elephant trunk" case depth ambiguity causes the shape to collapse.


1. Key Metrics and What They Mean

| Metric | Formula / meaning | DreamCraft3D value | Purpose |
|---|---|---|---|
| CLIP ↑ | Cosine similarity of CLIP embeddings of a rendered view and the reference image | 0.896 | Semantic agreement |
| Contextual ↓ | CX distance (Mechrez 2018) | 1.579 | Pixel-level structure preservation |
| PSNR ↑ | 10·log10(255²/MSE) dB | 31.8 dB | Pixel-level reconstruction fidelity |
| LPIPS ↓ | L2 distance between VGG feature embeddings Φ | 0.005 | Perceptual texture sharpness |

※ All metrics are computed against the reference view; "↑" means higher is better and "↓" means lower is better.
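Two of these metrics have closed forms that are easy to reproduce. The sketch below implements the PSNR formula from the table and the cosine similarity used for a CLIP-style score. The embedding vectors are toy values, since running the actual CLIP encoder is outside the scope of this example.

```python
import math


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_for_psnr(psnr_db: float, max_value: float = 255.0) -> float:
    """Inverse of psnr(): the MSE that corresponds to a given PSNR."""
    return max_value ** 2 / 10 ** (psnr_db / 10.0)


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(f"MSE at 31.8 dB: {mse_for_psnr(31.8):.2f}")
print(f"cosine of toy embeddings: {cosine_similarity([1.0, 2.0], [2.0, 4.0]):.3f}")
```

Since PSNR is logarithmic in the MSE, each additional 10 dB corresponds to a tenfold reduction in mean squared error.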


2. Quantitative Highlights

DreamCraft3D vs. Make-it-3D (representative baseline):

  • LPIPS: 0.005 vs. 0.054 (about 90 % lower)
  • PSNR: 31.8 dB vs. 18.9 dB (+12.9 dB)
  • CLIP: +0.024, Contextual: -0.030

A user study (32 participants, 480 responses) confirmed the advantage, with 92 % preferring DreamCraft3D.
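The relative changes listed above follow directly from the reported values. The short check below recomputes them using only the numbers quoted in this section; the variable names are illustrative.

```python
ours = {"lpips": 0.005, "psnr_db": 31.8}      # DreamCraft3D
baseline = {"lpips": 0.054, "psnr_db": 18.9}  # Make-it-3D, as quoted above

lpips_relative_drop = (baseline["lpips"] - ours["lpips"]) / baseline["lpips"]
lpips_ratio = baseline["lpips"] / ours["lpips"]
psnr_gain_db = ours["psnr_db"] - baseline["psnr_db"]
mse_ratio = 10 ** (psnr_gain_db / 10)  # how much smaller the MSE is

participants, responses = 32, 480
responses_per_participant = responses / participants

print(f"LPIPS drop: {lpips_relative_drop:.1%} ({lpips_ratio:.1f}x lower)")
print(f"PSNR gain: {psnr_gain_db:.1f} dB (MSE about {mse_ratio:.1f}x smaller)")
print(f"Responses per participant: {responses_per_participant:.0f}")
```

The PSNR gap translates into a mean squared error that is smaller by more than an order of magnitude, which is in line with the size of the LPIPS reduction.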


3. Qualitative Analysis: Success and Failure Cases

| Case | Scene description (Fig.) | Observation | Cause analysis |
|---|---|---|---|
| Success | "Super Saiyan Goku unleashes a massive energy wave" (Fig. 3) | High-frequency texture of the energy wave and hair is uniform across front, side, and back views; no Janus artifacts | ① The 2-stage pipeline separates geometry from texture, ② 2-round BSD reinforces view-consistent texture |
| Failure | "Elephant nose" (Fig. 9) | The trunk bends excessively or breaks; back-side geometry is distorted | Depth ambiguity of the single 2D reference view and limits of the Zero-1-to-3 prior → geometry estimation errors |

The authors explicitly list such failures as issues to be improved in future work on depth and material separation.


4. Why These Metrics Instead of Accuracy?

  • This is a generation/reconstruction task, so there is no ground-truth class, and pixel accuracy carries too little information.
  • The combination of CLIP, CX, LPIPS, and PSNR covers semantics → structure → low- and high-frequency perceptual quality in layers, which makes it suitable for quantifying how usable a 3D asset is.

5. Summary Note

DreamCraft3D sets a new state of the art on all four metrics, but the qualitative analysis confirms that failure cases remain for non-rigid objects with strong depth ambiguity. This leaves room for follow-up work using multi-view input or stronger 3D priors.

License

Author: Jaehun Ryu

Link: https://jaehun.me/posts/dreamcraft3d-hierarchical-3d-generation-with-bootstrapped-diffusion-prior/

License: CC BY 4.0

This work is available under the Creative Commons Attribution 4.0 International License. It may be used freely, including for commercial purposes, as long as the source is credited.
