[25.06.30] Denoising Diffusion Implicit Models (DDIM)
Paper Reading Study Notes
General Information
- Paper Title: Denoising Diffusion Implicit Models (DDIM)
- Authors: Jiaming Song, Chenlin Meng & Stefano Ermon
- Published In: ICLR
- Year: 2021
- Link: https://arxiv.org/abs/2010.02502
- Date of Discussion: 2025.06.30
Summary
- Research Problem: Denoising Diffusion Probabilistic Models (DDPMs) produce high-quality images but suffer from extremely slow sampling times, as they require simulating a Markov chain for thousands of steps to generate a single sample. This makes them impractical for many applications.
- Key Contributions:
- It generalizes the DDPM framework by introducing a class of non-Markovian diffusion processes that share the same training objective as DDPMs.
- It enables a much faster sampling procedure (10-50x speed-up) by allowing the model to skip steps in the reverse (generation) process.
- It introduces a deterministic generative process (when the parameter η = 0), which allows for semantically meaningful interpolation in the latent space, a feature not possible with the stochastic DDPM sampler.
- Methodology/Approach: The core idea is to reformulate the generative process. Instead of being strictly Markovian (where x_{t-1} depends only on x_t), the DDIM process is non-Markovian, conditioning the prediction of x_{t-1} on both x_t and the predicted clean image x_0. Since the DDPM objective can be interpreted as training a model to predict x_0 from x_t, the exact same pre-trained DDPM model can be used for DDIM sampling without any re-training. The speed-up comes from defining a "sampling trajectory" over a sub-sequence of the original timesteps (see the sketch below).
- Results: DDIM significantly outperforms DDPM in sample quality (FID) when using a small number of sampling steps. While DDPM's quality degrades rapidly with fewer steps, DDIM maintains high quality, offering a much better trade-off between computation and sample quality.
Discussion Points
- Strengths:
- The paper's main strength is achieving a massive efficiency gain without needing to retrain the model. The discussion highlighted this as a "romantic" and elegant approach to research.
- The idea of decoupling the training objective from the inference procedure was seen as highly innovative. It shows that the DDPM model was being used "inefficiently."
- The method is a true generalization of DDPMs; by setting the parameter η = 1, the DDIM sampling process becomes equivalent to the DDPM process.
- Weaknesses:
- The mathematical derivations, particularly the connection to ODEs and the details in the appendix, were found to be complex and not immediately intuitive.
- The discussion noted that while the results are impressive, the performance of DDIM with very few steps is still slightly worse than with the full step count, indicating a clear trade-off.
- Key Questions:
- The central question during the discussion was: "How is it possible to use the same pre-trained DDPM model for a different sampling process?"
- The conclusion reached was that the DDPM model is trained on a general task: predicting the original image x_0 (or, equivalently, the noise ε) from a noisy input x_t. This task is independent of the sampling path. DDIM provides a new, more efficient path that can still leverage this learned denoising function (see the worked step after this list).
- Applications:
- Makes diffusion models practical for real-world use cases where generation speed is critical.
- Enables semantic image editing and interpolation by manipulating the latent variable x_T, thanks to the deterministic generation process (a minimal interpolation sketch follows this list).
- Connections:
- This work is a direct improvement on DDPMs.
- It helps close the performance gap between high-quality but slow diffusion models and fast but often unstable GANs.
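To spell out the answer to the key question above: in the DDPM forward process (using ᾱ_t for the cumulative noise schedule, which the DDIM paper writes simply as α_t), a noisy sample is a fixed combination of the clean image and the injected noise, so an ε-prediction immediately yields an x_0-prediction:

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon
\;\;\Longrightarrow\;\;
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \varepsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}
$$

Nothing in this relation fixes which timesteps are visited during generation, so any trajectory, Markovian or not, can reuse the same trained network ε_θ.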
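As a small illustration of the interpolation application above: a common recipe, also used in the paper's interpolation experiments, is spherical linear interpolation (slerp) between two latent codes x_T, followed by deterministic (η = 0) DDIM decoding. The helper below is a generic slerp sketch under that assumption; the function name and tensor shapes are illustrative.

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float) -> torch.Tensor:
    """Spherical interpolation between two latent noise tensors x_T (alpha in [0, 1])."""
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    # Angle between the two latents; clamp for numerical safety. Assumes they are not (anti)parallel.
    cos_theta = torch.dot(z0_flat, z1_flat) / (z0_flat.norm() * z1_flat.norm())
    theta = torch.arccos(cos_theta.clamp(-1.0, 1.0))
    return (torch.sin((1 - alpha) * theta) * z0 + torch.sin(alpha * theta) * z1) / torch.sin(theta)

# Usage sketch: interpolate between two x_T draws, then decode each frame with a
# deterministic (eta = 0) DDIM sampler started from that x_T instead of fresh noise.
# x_a, x_b = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
# frames = [slerp(x_a, x_b, a.item()) for a in torch.linspace(0, 1, 8)]
```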
Notes and Reflections
- Interesting Insights:
- The most surprising insight was that a model's fundamental limitation (slow sampling) could be solved by reframing the underlying mathematical process rather than changing the model or training.
- The discussion emphasized how DDIM reveals that the Markovian property is a constraint that can be relaxed, leading to a more flexible and efficient generative process.
- Lessons Learned:
- A deep theoretical understanding of a model can unlock significant practical improvements.
- A single trained model can be versatile; its utility is not limited to the specific inference procedure it was originally designed for.
- Future Directions:
- The discussion did not delve deep into future work, but the paper's approach suggests that exploring other non-Markovian processes or applying more advanced ODE solvers could further improve sampling efficiency and quality.