[25.06.30] Denoising Diffusion Implicit Models (DDIM)

Paper Reading Study Notes

General Information

  • Paper Title: Denoising Diffusion Implicit Models (DDIM)
  • Authors: Jiaming Song, Chenlin Meng & Stefano Ermon
  • Published In: ICLR
  • Year: 2021
  • Link: https://arxiv.org/abs/2010.02502
  • Date of Discussion: 2025.06.30

Summary

  • Research Problem: Denoising Diffusion Probabilistic Models (DDPMs) produce high-quality images but suffer from extremely slow sampling times, as they require simulating a Markov chain for thousands of steps to generate a single sample. This makes them impractical for many applications.
  • Key Contributions:
    1. It generalizes the DDPM framework by introducing a class of non-Markovian diffusion processes that share the same training objective as DDPMs.
    2. It enables a much faster sampling procedure (10-50x speed-up) by allowing the model to skip steps in the reverse (generation) process.
    3. It introduces a deterministic generative process (when parameter η=0), which allows for semantically meaningful interpolation in the latent space, a feature not possible with the stochastic DDPMs.
  • Methodology/Approach: The core idea is to reformulate the generative process. Instead of being strictly Markovian (where x_{t-1} depends only on x_t), the DDIM process is non-Markovian: the prediction of x_{t-1} is conditioned on both x_t and the predicted clean image x_0. Since the DDPM objective can be interpreted as training the model to predict the added noise ε (and hence x_0) from x_t, the exact same pre-trained DDPM model can be used for DDIM sampling without any re-training. The speed-up comes from defining the sampling trajectory over a short sub-sequence of the original timesteps; a minimal sketch of one sampling step follows this list.
  • Results: DDIM significantly outperforms DDPM in sample quality (FID) when using a small number of sampling steps. While DDPM's quality degrades rapidly with fewer steps, DDIM maintains high quality, offering a much better trade-off between computation and sample quality.
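
The sketch below illustrates the update described above; it is not the authors' reference implementation. A pretrained noise-prediction network `eps_theta` and the cumulative noise schedule `alpha_bar` (the paper's α_t) are assumed to be given, and all names are illustrative.

```python
import torch

def ddim_step(x_t, t, t_prev, eps_theta, alpha_bar, eta=0.0):
    """One DDIM update from timestep t to an earlier timestep t_prev.

    eps_theta : pretrained DDPM noise-prediction network (reused unchanged)
    alpha_bar : 1-D tensor of cumulative alphas (the paper's alpha_t)
    eta       : 0.0 -> deterministic DDIM, 1.0 -> DDPM-style stochastic sampling
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]

    # Predict the noise, then the clean image x_0 implied by it.
    eps = eps_theta(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()

    # Noise scale sigma; eta interpolates between DDIM (0) and DDPM (1).
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()

    # Move toward x_{t_prev}: rescaled x_0 prediction + "direction to x_t" term (+ noise).
    dir_xt = (1 - a_prev - sigma**2).sqrt() * eps
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return a_prev.sqrt() * x0_pred + dir_xt + noise

def ddim_sample(x_T, eps_theta, alpha_bar, timesteps, eta=0.0):
    """Sample along a decreasing sub-sequence of the original timesteps."""
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t, t_prev, eps_theta, alpha_bar, eta)
    return x
```

Passing a short `timesteps` list, e.g. 50 roughly evenly spaced values out of the original 1,000, is where the reported 10-50x speed-up comes from.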

Discussion Points

  • Strengths:

    • The paper's main strength is achieving a massive efficiency gain without needing to retrain the model. The discussion highlighted this as a "romantic" and elegant approach to research.
    • The idea of decoupling the training objective from the inference procedure was seen as highly innovative. It shows that the DDPM model was being used "inefficiently."
    • The method is a true generalization of DDPMs: setting the noise parameter η=1 (the sigma term in the sketch above) makes the DDIM sampling process equivalent to the DDPM process.
  • Weaknesses:

    • The mathematical derivations, particularly the connection to ODEs and the details in the appendix, were found to be complex and not immediately intuitive (a one-line restatement of the ODE view is included after the Connections bullets below).
    • The discussion noted that while the results are impressive, the performance of DDIM with very few steps is still slightly worse than with the full step count, indicating a clear trade-off.
  • Key Questions:

    • The central question during the discussion was: "How is it possible to use the same pre-trained DDPM model for a different sampling process?"
    • The conclusion reached was that the DDPM model is trained on a general task: predicting the original image (x_0), or equivalently the added noise (ε), from a noisy input (x_t), roughly x_0 ≈ (x_t − √(1−α_t)·ε_θ(x_t, t)) / √(α_t) in the paper's notation. This task is independent of the sampling path, so DDIM simply provides a new, more efficient path that leverages the same learned denoising function.
  • Applications:

    • Makes diffusion models practical for real-world use cases where generation speed is critical.
    • Enables semantic image editing and interpolation by manipulating the latent variable x_T, thanks to the deterministic generation process (a small interpolation sketch appears after the Connections bullets below).
  • Connections:

    • This work is a direct improvement on DDPMs.
    • It helps close the performance gap between high-quality but slow diffusion models and fast but often unstable GANs.
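
On the ODE point raised under Weaknesses, an informal restatement in the paper's notation (α_t denotes the cumulative product), not a full derivation:

```latex
% Deterministic (eta = 0) DDIM update, divided through by sqrt(alpha_{t-1}):
% with \bar{x}_t = x_t / \sqrt{\alpha_t} and \sigma_t = \sqrt{(1-\alpha_t)/\alpha_t},
\bar{x}_{t-1} = \bar{x}_t + (\sigma_{t-1} - \sigma_t)\,\epsilon_\theta(x_t, t)
% i.e. an Euler step of the ODE  d\bar{x} = \epsilon_\theta \, d\sigma,
% which is why taking larger steps (fewer sampling iterations) remains well-behaved.
```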
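And a minimal sketch of the latent interpolation mentioned under Applications, assuming the hypothetical `ddim_sample` helper from the earlier sketch; spherical interpolation (slerp) is used, as in the paper's interpolation experiments.

```python
import torch

def slerp(z1, z2, lam):
    """Spherical interpolation between two Gaussian latents, lam in [0, 1]."""
    theta = torch.arccos(torch.sum(z1 * z2) / (z1.norm() * z2.norm()))
    return (torch.sin((1 - lam) * theta) * z1
            + torch.sin(lam * theta) * z2) / torch.sin(theta)

# With eta = 0 the map x_T -> x_0 is deterministic, so interpolating the
# latents yields a smooth interpolation between the decoded images:
# x_a, x_b = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
# frames = [ddim_sample(slerp(x_a, x_b, lam), eps_theta, alpha_bar, timesteps, eta=0.0)
#           for lam in torch.linspace(0, 1, 8)]
```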

Notes and Reflections

  • Interesting Insights:

    • The most surprising insight was that a model's fundamental limitation (slow sampling) could be solved by reframing the underlying mathematical process rather than changing the model or training.
    • The discussion emphasized how DDIM reveals that the Markovian property is a constraint that can be relaxed, leading to a more flexible and efficient generative process.
  • Lessons Learned:

    • A deep theoretical understanding of a model can unlock significant practical improvements.
    • A single trained model can be versatile; its utility is not limited to the specific inference procedure it was originally designed for.
  • Future Directions:

    • The discussion did not delve deeply into future work, but the paper's approach suggests that exploring other non-Markovian processes or applying more advanced ODE solvers could further improve sampling efficiency and quality.