[25.08.07] Denoising Diffusion Implicit Models - 2

Paper Reading Study Notes

General Information

  • Paper Title: DENOISING DIFFUSION IMPLICIT MODELS
  • Authors: Jiaming Song, Chenlin Meng & Stefano Ermon
  • Published In: ICLR
  • Year: 2021
  • Link: https://arxiv.org/abs/2010.02502
  • Date of Discussion: August 7, 2025

Summary

  • Research Problem: Denoising Diffusion Probabilistic Models (DDPMs) produce high-quality images but are extremely slow to sample from, as they require simulating a Markov chain for thousands of small steps. This paper aims to significantly accelerate the sampling process.
  • Key Contributions: The paper introduces Denoising Diffusion Implicit Models (DDIMs), a more efficient class of generative models. DDIMs generalize the Markovian diffusion process of DDPMs to non-Markovian ones. This allows for a much faster sampling trajectory (e.g., 10-50 steps instead of 1000) without retraining the model. It also enables deterministic generation, which allows for meaningful latent space interpolation.
  • Methodology/Approach: The core idea is to define a new, non-Markovian forward process that still yields the same training objective as DDPM, so a pre-trained DDPM model can be reused for DDIM sampling. The generative process is then modified to take deterministic steps (by setting the noise scale σ to 0) along a short subsequence of the original timesteps, effectively "skipping" most of the DDPM sequence (a minimal sketch of this update follows the list).
  • Results: DDIMs achieve a significantly better trade-off between computation and sample quality. With as few as 20-100 steps, DDIMs can generate samples of comparable quality to a 1000-step DDPM, resulting in a 10-50x speedup. The deterministic nature of DDIMs also allows for consistent reconstruction from latent codes and semantic image interpolation.
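
To make the "larger, deterministic steps" concrete, below is a minimal sketch of one DDIM update in PyTorch, assuming a pretrained DDPM noise predictor `eps_model` and the cumulative products `alpha_bar` (ᾱ_t) from training; the function and variable names are ours, not the authors' code.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, t_prev, eps_model, alpha_bar, eta=0.0):
    """One DDIM update from timestep t to an earlier timestep t_prev.

    eps_model : pretrained DDPM noise predictor eps_theta(x_t, t)
    alpha_bar : 1-D tensor of cumulative products of alpha_s up to each step
    eta=0 gives the deterministic DDIM sampler; eta=1 gives a DDPM-like one.
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)

    # Predicted clean image x0_hat from the current noisy sample.
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()

    # Noise scale: sigma = 0 when eta = 0 (fully deterministic step).
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()

    # "Direction pointing to x_t" term plus optional fresh noise.
    dir_xt = (1 - a_prev - sigma**2).sqrt() * eps
    noise = sigma * torch.randn_like(x_t)
    return a_prev.sqrt() * x0_hat + dir_xt + noise
```

Sampling runs this step along a short, strictly decreasing subsequence of the original 1000 timesteps (e.g., 50 of them), which is where the 10-50x speedup comes from; with `eta=0` the same starting noise always decodes to the same image.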

Discussion Points

  • Strengths:

    • Efficiency: The massive speedup in sampling is the most compelling advantage. The group recognized this as the primary solution to DDPM's main bottleneck.
    • No Retraining: The ability to use an existing, pre-trained DDPM model is a huge practical benefit.
    • Deterministic Latent Space: The deterministic nature of DDIM (σ=0) allows for consistent generation from a given latent variable and enables semantic interpolation, which is not possible with the stochastic DDPM process.
  • Weaknesses:

    • Theoretical Complexity: The group found the theoretical justification, particularly the unified variational objective in Section 3.2 and Theorem 1, difficult to grasp intuitively. They felt the mathematical argument was a bit forced, "like piling one stretch of logic on top of another."
    • Potential for Mode Collapse: A question was raised whether taking very large steps could lead to issues like mode collapse, although the paper's results suggest it performs well in practice.
  • Key Questions:

    • What is the deep intuition behind why the non-Markovian process leads to the same training objective, allowing the model to be reused?
    • How exactly does conditioning on the initial state x_0 in the forward process justify taking larger, more direct steps in the reverse (generative) process?
    • In the experimental results (Table 1), what is the exact difference between the DDPM with η = 1 and the one denoted σ̂? The group was confused by the performance difference (see the formulas reconstructed after this list).
    • The connection to Score-Based Models and SDEs was noted as fascinating but not fully understood, pointing to a knowledge gap the group wanted to address.
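
For reference, here is our reconstruction of the objects behind these questions (we write ᾱ_t for the cumulative product that the DDIM paper simply calls α_t, so check the notation against the paper). The forward family is defined through a posterior conditioned on x_0; Theorem 1 states that its variational objective J_σ equals the DDPM objective L_γ up to a constant, which is why the pretrained ε_θ can be reused as-is; and η just rescales σ_t between the deterministic DDIM sampler (η = 0) and a DDPM-like one (η = 1). As we read it, σ̂ in Table 1 refers to the larger posterior variance (β_t rather than β̃_t) used in the original DDPM paper.

```math
q_\sigma(x_{t-1}\mid x_t, x_0)
  = \mathcal{N}\!\left(
    \sqrt{\bar\alpha_{t-1}}\,x_0
    + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^{2}}\,
      \frac{x_t-\sqrt{\bar\alpha_t}\,x_0}{\sqrt{1-\bar\alpha_t}},\;
    \sigma_t^{2}\mathbf{I}\right),
\qquad
\sigma_t(\eta)=\eta\,
  \sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\,
  \sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}
```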
  • Applications:

    • Fast, high-quality image generation for practical applications where latency is critical.
    • Creative tools that leverage semantic latent space interpolation for smooth transitions between images.
    • Image reconstruction and manipulation via encoding an image into a latent code and then decoding it (a rough sketch of this and the interpolation use case follows this list).
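
A rough sketch of how the last two applications would look in code, reusing the `ddim_step` sketch from the Summary section; `slerp` and `ddim_sample` are names we made up for illustration. The paper interpolates between latents x_T with spherical linear interpolation and decodes each point deterministically.

```python
import torch

def slerp(z1, z2, alpha):
    """Spherical linear interpolation between two latent codes."""
    cos = ((z1 * z2).sum() / (z1.norm() * z2.norm())).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.arccos(cos)
    return (torch.sin((1 - alpha) * theta) * z1
            + torch.sin(alpha * theta) * z2) / torch.sin(theta)

@torch.no_grad()
def ddim_sample(x_T, timesteps, eps_model, alpha_bar):
    """Deterministically decode a latent x_T along a decreasing timestep subsequence."""
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t, t_prev, eps_model, alpha_bar, eta=0.0)
    return x

# Semantic interpolation: decode several points between two latents.
# x_a, x_b would come from encoding two images (running the update in reverse)
# or simply from two noise samples.
# frames = [ddim_sample(slerp(x_a, x_b, a), timesteps, eps_model, alpha_bar)
#           for a in torch.linspace(0.0, 1.0, steps=8)]
```

Because η = 0 removes all sampling noise, the same x_T always maps to the same image, which is what makes reconstruction and interpolation consistent.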
  • Connections:

    • The discussion highlighted the fascinating connection between seemingly different models. DDIM acts as a bridge between DDPMs and Score-Based Models (via SDEs/ODEs).
    • The generative process of DDIM was seen as analogous to solving an ODE, where the model predicts the "flow" or direction towards the final image (the Euler-step form is written out below).
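
The Euler-step reading mentioned above, as we reconstruct it from Section 4.3 of the paper (again in our ᾱ notation, so details may differ from the original): rearranging the deterministic update gives

```math
\frac{x_{t-\Delta t}}{\sqrt{\bar\alpha_{t-\Delta t}}}
  = \frac{x_t}{\sqrt{\bar\alpha_t}}
  + \left(\sqrt{\frac{1-\bar\alpha_{t-\Delta t}}{\bar\alpha_{t-\Delta t}}}
        - \sqrt{\frac{1-\bar\alpha_t}{\bar\alpha_t}}\right)
    \epsilon_\theta(x_t, t)
```

which is exactly an Euler step of an ODE in the rescaled variable x/√ᾱ. In the continuous limit this connects to the probability-flow ODE view of score-based models, and taking fewer, larger sampling steps amounts to using a coarser discretization of that ODE.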

Notes and Reflections

  • Interesting Insights:

    • The group found it "magical" that a complex, intractable reverse process could be so effectively approximated and accelerated by a neural network. They saw DDIM as a clever "hack" that breaks the strict Markovian assumption of DDPMs by looking at the overall trajectory from noise to data.
    • The progression from DDPM to DDIM and its connection to score models was seen as a process of repeated generalization: first generalizing data to a noise distribution, and then generalizing the path back.
  • Lessons Learned:

    • The key takeaway was understanding the speed-quality trade-off. DDIM is not necessarily superior at 1000 steps, but it is vastly more practical because it achieves high quality with far fewer steps.
    • The discussion solidified the understanding that DDPM is stochastic, while DDIM can be made deterministic, which is the source of its unique properties like interpolation and consistency.
  • Future Directions:

    • The group concluded that a deeper study of Stochastic Differential Equations (SDEs) is necessary to fully understand the theoretical underpinnings of modern generative models.
    • Investigating the limits of the step-skipping process in DDIM could be an interesting direction. How many steps can be skipped before generation quality severely degrades?