[25.08.07] Denoising Diffusion Implicit Models - 2

Paper Reading Study Notes

General Information

  • Paper Title: DENOISING DIFFUSION IMPLICIT MODELS
  • Authors: Jiaming Song, Chenlin Meng & Stefano Ermon
  • Published In: ICLR
  • Year: 2021
  • Link: https://arxiv.org/abs/2010.02502
  • Date of Discussion: August 7, 2025

Summary

  • Research Problem: Denoising Diffusion Probabilistic Models (DDPMs) produce high-quality images but are extremely slow to sample from, as they require simulating a Markov chain for thousands of small steps. This paper aims to significantly accelerate the sampling process.
  • Key Contributions: The paper introduces Denoising Diffusion Implicit Models (DDIMs), a more efficient class of generative models. DDIMs generalize the Markovian diffusion process of DDPMs to non-Markovian ones. This allows for a much faster sampling trajectory (e.g., 10-50 steps instead of 1000) without retraining the model. It also enables deterministic generation, which allows for meaningful latent space interpolation.
  • Methodology/Approach: The core idea is to define a new, non-Markovian forward process that still yields the same training objective as DDPM, so a pre-trained DDPM model can be reused for DDIM sampling. The generative process is then modified to take deterministic steps (by setting the noise scale σ to 0) along a short subsequence of the original timesteps, effectively "skipping" most of the DDPM sequence (a minimal sketch of this update follows the list).
  • Results: DDIMs achieve a significantly better trade-off between computation and sample quality. With as few as 20-100 steps, DDIMs can generate samples of comparable quality to a 1000-step DDPM, resulting in a 10-50x speedup. The deterministic nature of DDIMs also allows for consistent reconstruction from latent codes and semantic image interpolation.
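
To make the "larger, deterministic steps" concrete, below is a minimal sketch of one DDIM update in PyTorch, assuming a pretrained DDPM noise predictor `eps_model` and the cumulative products `alpha_bar` (ᾱ_t) from training; the function and variable names are ours, not the authors' code.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, t_prev, eps_model, alpha_bar, eta=0.0):
    """One DDIM update from timestep t to an earlier timestep t_prev.

    eps_model : pretrained DDPM noise predictor eps_theta(x_t, t)
    alpha_bar : 1-D tensor of cumulative products of alpha_s up to each step
    eta=0 gives the deterministic DDIM sampler; eta=1 gives a DDPM-like one.
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)

    # Predicted clean image x0_hat from the current noisy sample.
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()

    # Noise scale: sigma = 0 when eta = 0 (fully deterministic step).
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()

    # "Direction pointing to x_t" term plus optional fresh noise.
    dir_xt = (1 - a_prev - sigma**2).sqrt() * eps
    noise = sigma * torch.randn_like(x_t)
    return a_prev.sqrt() * x0_hat + dir_xt + noise
```

Sampling runs this step along a short, strictly decreasing subsequence of the original 1000 timesteps (e.g., 50 of them), which is where the 10-50x speedup comes from; with `eta=0` the same starting noise always decodes to the same image.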

Discussion Points

  • Strengths:

    • Efficiency: The massive speedup in sampling is the most compelling advantage. The group recognized this as the primary solution to DDPM's main bottleneck.
    • No Retraining: The ability to use an existing, pre-trained DDPM model is a huge practical benefit.
    • Deterministic Latent Space: The deterministic nature of DDIM (σ=0) allows for consistent generation from a given latent variable and enables semantic interpolation, which is not possible with the stochastic DDPM process.
  • Weaknesses:

    • Theoretical Complexity: The group found the theoretical justification, particularly the unified variational objective in Section 3.2 and Theorem 1, difficult to grasp intuitively. They felt the mathematical argument was a bit forced, "like piling one stretch of logic on top of another."
    • Potential for Mode Collapse: A question was raised whether taking very large steps could lead to issues like mode collapse, although the paper's results suggest it performs well in practice.
  • Key Questions:

    • What is the deep intuition behind why the non-Markovian process leads to the same training objective, allowing the model to be reused?
    • How exactly does conditioning on the initial state x_0 in the forward process justify taking larger, more direct steps in the reverse (generative) process?
    • In the experimental results (Table 1), what is the exact difference between the DDPM with η = 1 and the one denoted σ̂? The group was confused by the performance difference (see the formulas reconstructed after this list).
    • The connection to Score-Based Models and SDEs was noted as fascinating but not fully understood, pointing to a knowledge gap the group wanted to address.
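
For reference, here is our reconstruction of the objects behind these questions (we write ᾱ_t for the cumulative product that the DDIM paper simply calls α_t, so check the notation against the paper). The forward family is defined through a posterior conditioned on x_0; Theorem 1 states that its variational objective J_σ equals the DDPM objective L_γ up to a constant, which is why the pretrained ε_θ can be reused as-is; and η just rescales σ_t between the deterministic DDIM sampler (η = 0) and a DDPM-like one (η = 1). As we read it, σ̂ in Table 1 refers to the larger posterior variance (β_t rather than β̃_t) used in the original DDPM paper.

```math
q_\sigma(x_{t-1}\mid x_t, x_0)
  = \mathcal{N}\!\left(
    \sqrt{\bar\alpha_{t-1}}\,x_0
    + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^{2}}\,
      \frac{x_t-\sqrt{\bar\alpha_t}\,x_0}{\sqrt{1-\bar\alpha_t}},\;
    \sigma_t^{2}\mathbf{I}\right),
\qquad
\sigma_t(\eta)=\eta\,
  \sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\,
  \sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}
```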
  • Applications:

    • Fast, high-quality image generation for practical applications where latency is critical.
    • Creative tools that leverage semantic latent space interpolation for smooth transitions between images.
    • Image reconstruction and manipulation via encoding an image into a latent code and then decoding it (a rough sketch of this and the interpolation use case follows this list).
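
A rough sketch of how the last two applications would look in code, reusing the `ddim_step` sketch from the Summary section; `slerp` and `ddim_sample` are names we made up for illustration. The paper interpolates between latents x_T with spherical linear interpolation and decodes each point deterministically.

```python
import torch

def slerp(z1, z2, alpha):
    """Spherical linear interpolation between two latent codes."""
    cos = ((z1 * z2).sum() / (z1.norm() * z2.norm())).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.arccos(cos)
    return (torch.sin((1 - alpha) * theta) * z1
            + torch.sin(alpha * theta) * z2) / torch.sin(theta)

@torch.no_grad()
def ddim_sample(x_T, timesteps, eps_model, alpha_bar):
    """Deterministically decode a latent x_T along a decreasing timestep subsequence."""
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t, t_prev, eps_model, alpha_bar, eta=0.0)
    return x

# Semantic interpolation: decode several points between two latents.
# x_a, x_b would come from encoding two images (running the update in reverse)
# or simply from two noise samples.
# frames = [ddim_sample(slerp(x_a, x_b, a), timesteps, eps_model, alpha_bar)
#           for a in torch.linspace(0.0, 1.0, steps=8)]
```

Because η = 0 removes all sampling noise, the same x_T always maps to the same image, which is what makes reconstruction and interpolation consistent.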
  • Connections:

    • The discussion highlighted the fascinating connection between seemingly different models. DDIM acts as a bridge between DDPMs and Score-Based Models (via SDEs/ODEs).
    • The generative process of DDIM was seen as analogous to solving an ODE, where the model predicts the "flow" or direction towards the final image (the Euler-step form is written out below).
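
The Euler-step reading mentioned above, as we reconstruct it from Section 4.3 of the paper (again in our ᾱ notation, so details may differ from the original): rearranging the deterministic update gives

```math
\frac{x_{t-\Delta t}}{\sqrt{\bar\alpha_{t-\Delta t}}}
  = \frac{x_t}{\sqrt{\bar\alpha_t}}
  + \left(\sqrt{\frac{1-\bar\alpha_{t-\Delta t}}{\bar\alpha_{t-\Delta t}}}
        - \sqrt{\frac{1-\bar\alpha_t}{\bar\alpha_t}}\right)
    \epsilon_\theta(x_t, t)
```

which is exactly an Euler step of an ODE in the rescaled variable x/√ᾱ. In the continuous limit this connects to the probability-flow ODE view of score-based models, and taking fewer, larger sampling steps amounts to using a coarser discretization of that ODE.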

Notes and Reflections

  • Interesting Insights:

    • The group found it "magical" that a complex, intractable reverse process could be so effectively approximated and accelerated by a neural network. They saw DDIM as a clever "hack" that breaks the strict Markovian assumption of DDPMs by looking at the overall trajectory from noise to data.
    • The progression from DDPM to DDIM and its connection to score models was seen as a process of repeated generalization: first generalizing data to a noise distribution, and then generalizing the path back.
  • Lessons Learned:

    • The key takeaway was understanding the speed-quality trade-off. DDIM is not necessarily superior at 1000 steps, but it is vastly more practical because it achieves high quality with far fewer steps.
    • The discussion solidified the understanding that DDPM is stochastic, while DDIM can be made deterministic, which is the source of its unique properties like interpolation and consistency.
  • Future Directions:

    • The group concluded that a deeper study of Stochastic Differential Equations (SDEs) is necessary to fully understand the theoretical underpinnings of modern generative models.
    • Investigating the limits of the step-skipping process in DDIM could be an interesting direction. How many steps can be skipped before generation quality severely degrades?