[25.06.30] Denoising Diffusion Implicit Models (DDIM)

Paper Reading Study Notes

General Information

  • Paper Title: Denoising Diffusion Implicit Models (DDIM)
  • Authors: Jiaming Song, Chenlin Meng & Stefano Ermon
  • Published In: ICLR
  • Year: 2021
  • Link: https://arxiv.org/abs/2010.02502
  • Date of Discussion: 2025.06.30

Summary

  • Research Problem: Denoising Diffusion Probabilistic Models (DDPMs) produce high-quality images but suffer from extremely slow sampling times, as they require simulating a Markov chain for thousands of steps to generate a single sample. This makes them impractical for many applications.
  • Key Contributions:
    1. It generalizes the DDPM framework by introducing a class of non-Markovian diffusion processes that share the same training objective as DDPMs.
    2. It enables a much faster sampling procedure (10-50x speed-up) by allowing the model to skip steps in the reverse (generation) process.
    3. It introduces a deterministic generative process (when parameter η=0), which allows for semantically meaningful interpolation in the latent space, a feature not possible with the stochastic DDPMs.
  • Methodology/Approach: The core idea is to reformulate the generative process. Instead of being strictly Markovian (where x_{t-1} depends only on x_t), the DDIM process is non-Markovian: the prediction of x_{t-1} is conditioned on both x_t and the predicted clean image x_0. Since the DDPM objective can be interpreted as training the model to predict the added noise ε (and hence x_0) from x_t, the exact same pre-trained DDPM model can be used for DDIM sampling without any re-training. The speed-up comes from defining the sampling trajectory over a short sub-sequence of the original timesteps; a minimal sketch of one sampling step follows this list.
  • Results: DDIM significantly outperforms DDPM in sample quality (FID) when using a small number of sampling steps. While DDPM's quality degrades rapidly with fewer steps, DDIM maintains high quality, offering a much better trade-off between computation and sample quality.
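
The sketch below illustrates the update described above; it is not the authors' reference implementation. A pretrained noise-prediction network `eps_theta` and the cumulative noise schedule `alpha_bar` (the paper's α_t) are assumed to be given, and all names are illustrative.

```python
import torch

def ddim_step(x_t, t, t_prev, eps_theta, alpha_bar, eta=0.0):
    """One DDIM update from timestep t to an earlier timestep t_prev.

    eps_theta : pretrained DDPM noise-prediction network (reused unchanged)
    alpha_bar : 1-D tensor of cumulative alphas (the paper's alpha_t)
    eta       : 0.0 -> deterministic DDIM, 1.0 -> DDPM-style stochastic sampling
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]

    # Predict the noise, then the clean image x_0 implied by it.
    eps = eps_theta(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()

    # Noise scale sigma; eta interpolates between DDIM (0) and DDPM (1).
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()

    # Move toward x_{t_prev}: rescaled x_0 prediction + "direction to x_t" term (+ noise).
    dir_xt = (1 - a_prev - sigma**2).sqrt() * eps
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return a_prev.sqrt() * x0_pred + dir_xt + noise

def ddim_sample(x_T, eps_theta, alpha_bar, timesteps, eta=0.0):
    """Sample along a decreasing sub-sequence of the original timesteps."""
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t, t_prev, eps_theta, alpha_bar, eta)
    return x
```

Passing a short `timesteps` list, e.g. 50 roughly evenly spaced values out of the original 1,000, is where the reported 10-50x speed-up comes from.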

Discussion Points

  • Strengths:

    • The paper's main strength is achieving a massive efficiency gain without needing to retrain the model. The discussion highlighted this as a "romantic" and elegant approach to research.
    • The idea of decoupling the training objective from the inference procedure was seen as highly innovative. It shows that the DDPM model was being used "inefficiently."
    • The method is a true generalization of DDPMs: setting the noise parameter η=1 (the sigma term in the sketch above) makes the DDIM sampling process equivalent to the DDPM process.
  • Weaknesses:

    • The mathematical derivations, particularly the connection to ODEs and the details in the appendix, were found to be complex and not immediately intuitive (a one-line restatement of the ODE view is included after the Connections bullets below).
    • The discussion noted that while the results are impressive, the performance of DDIM with very few steps is still slightly worse than with the full step count, indicating a clear trade-off.
  • Key Questions:

    • The central question during the discussion was: "How is it possible to use the same pre-trained DDPM model for a different sampling process?"
    • The conclusion reached was that the DDPM model is trained on a general task: predicting the original image (x_0), or equivalently the added noise (ε), from a noisy input (x_t), roughly x_0 ≈ (x_t − √(1−α_t)·ε_θ(x_t, t)) / √(α_t) in the paper's notation. This task is independent of the sampling path, so DDIM simply provides a new, more efficient path that leverages the same learned denoising function.
  • Applications:

    • Makes diffusion models practical for real-world use cases where generation speed is critical.
    • Enables semantic image editing and interpolation by manipulating the latent variable x_T, thanks to the deterministic generation process (a small interpolation sketch appears after the Connections bullets below).
  • Connections:

    • This work is a direct improvement on DDPMs.
    • It helps close the performance gap between high-quality but slow diffusion models and fast but often unstable GANs.
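
On the ODE point raised under Weaknesses, an informal restatement in the paper's notation (α_t denotes the cumulative product), not a full derivation:

```latex
% Deterministic (eta = 0) DDIM update, divided through by sqrt(alpha_{t-1}):
% with \bar{x}_t = x_t / \sqrt{\alpha_t} and \sigma_t = \sqrt{(1-\alpha_t)/\alpha_t},
\bar{x}_{t-1} = \bar{x}_t + (\sigma_{t-1} - \sigma_t)\,\epsilon_\theta(x_t, t)
% i.e. an Euler step of the ODE  d\bar{x} = \epsilon_\theta \, d\sigma,
% which is why taking larger steps (fewer sampling iterations) remains well-behaved.
```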
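And a minimal sketch of the latent interpolation mentioned under Applications, assuming the hypothetical `ddim_sample` helper from the earlier sketch; spherical interpolation (slerp) is used, as in the paper's interpolation experiments.

```python
import torch

def slerp(z1, z2, lam):
    """Spherical interpolation between two Gaussian latents, lam in [0, 1]."""
    theta = torch.arccos(torch.sum(z1 * z2) / (z1.norm() * z2.norm()))
    return (torch.sin((1 - lam) * theta) * z1
            + torch.sin(lam * theta) * z2) / torch.sin(theta)

# With eta = 0 the map x_T -> x_0 is deterministic, so interpolating the
# latents yields a smooth interpolation between the decoded images:
# x_a, x_b = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
# frames = [ddim_sample(slerp(x_a, x_b, lam), eps_theta, alpha_bar, timesteps, eta=0.0)
#           for lam in torch.linspace(0, 1, 8)]
```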

Notes and Reflections

  • Interesting Insights:

    • The most surprising insight was that a model's fundamental limitation (slow sampling) could be solved by reframing the underlying mathematical process rather than changing the model or training.
    • The discussion emphasized how DDIM reveals that the Markovian property is a constraint that can be relaxed, leading to a more flexible and efficient generative process.
  • Lessons Learned:

    • A deep theoretical understanding of a model can unlock significant practical improvements.
    • A single trained model can be versatile; its utility is not limited to the specific inference procedure it was originally designed for.
  • Future Directions:

    • The discussion did not delve deeply into future work, but the paper's approach suggests that exploring other non-Markovian processes or applying more advanced ODE solvers could further improve sampling efficiency and quality.