[25.07.03] Score-Based Generative Modeling through Stochastic Differential Equations

Paper Reading Study Notes

General Information

  • Paper Title: Score-Based Generative Modeling through Stochastic Differential Equations
  • Authors: Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • Published In: ICLR
  • Year: 2021
  • Link: https://arxiv.org/abs/2011.13456
  • Date of Discussion: 2025.07.03

Summary

  • Research Problem: The paper aims to create a unified framework for generative models that work by progressively adding noise to data and then learning to reverse the process. It seeks to generalize previous methods like Score Matching with Langevin Dynamics (SMLD) and Denoising Diffusion Probabilistic Models (DDPM) under a single, more powerful mathematical structure.
  • Key Contributions:
    1. Unified Framework: It introduces a framework where the noising and denoising processes are modeled as solutions to a continuous-time Stochastic Differential Equation (SDE).
    2. Generalization: It shows that previous models like SMLD and DDPM can be seen as discretizations of two distinct SDEs proposed in the paper: the Variance Exploding (VE) SDE and the Variance Preserving (VP) SDE, respectively.
    3. New Capabilities: The SDE framework enables new sampling procedures (e.g., Predictor-Corrector methods), exact likelihood computation via an equivalent "probability flow" Ordinary Differential Equation (ODE), and flexible, controllable generation (such as inpainting and class-conditional generation) from a single unconditional model.
  • Methodology/Approach: The core idea is to define a "forward" SDE that gradually transforms the data distribution into a simple prior (noise) distribution; the generative process is the corresponding "reverse" SDE that transforms the prior back into the data distribution. Solving the reverse SDE requires the "score" (the gradient of the log probability density) of the perturbed data at every point in time, which is estimated with a time-dependent neural network trained via score matching. The key equations are reproduced just after this summary.
  • Results: The paper achieves state-of-the-art results on image generation benchmarks (e.g., CIFAR-10) and demonstrates, for the first time from a score-based model, high-fidelity generation of 1024×1024 images. It also showcases controllable generation for tasks like image inpainting and colorization.
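For reference, the core equations from the paper: the forward SDE, the reverse-time SDE (via Anderson, 1982) that the sampler simulates, the two special cases that recover SMLD and DDPM in continuous time, and the probability flow ODE used for exact likelihoods:

```latex
% Forward (noising) SDE:
d\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,dt + g(t)\,d\mathbf{w}

% Reverse-time (generative) SDE, which requires the score \nabla_x \log p_t(x):
d\mathbf{x} = \left[ \mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] dt + g(t)\,d\bar{\mathbf{w}}

% VE SDE (continuous-time SMLD) and VP SDE (continuous-time DDPM):
d\mathbf{x} = \sqrt{\frac{d[\sigma^2(t)]}{dt}}\,d\mathbf{w},
\qquad
d\mathbf{x} = -\frac{1}{2}\beta(t)\,\mathbf{x}\,dt + \sqrt{\beta(t)}\,d\mathbf{w}

% Probability flow ODE (shares the marginals p_t(x) with the SDE):
d\mathbf{x} = \left[ \mathbf{f}(\mathbf{x}, t) - \frac{1}{2} g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] dt
```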

Discussion Points

  • Strengths:
    • The unification of SMLD and DDPM under a single framework was seen as a major strength, making the approach both more powerful and more general (03:10, 05:47).
    • The concept of modeling diffusion with SDEs was found to be very intuitive, especially from a physics perspective (01:22).
    • The ability to perform controllable generation (e.g., inpainting from a masked image) using a single, unconditionally trained model was considered very impressive and novel (38:27, 44:55).
  • Weaknesses:
    • The primary challenge discussed was the paper's immense mathematical complexity. The participants found the heavy reliance on stochastic calculus and differential equations to be a significant barrier to a deep understanding (00:11, 21:07, 41:29).
    • Many of the mathematical derivations had to be taken on faith during the discussion due to their complexity ("그냥 넘어가자" - "Let's just move on") (32:28).
  • Key Questions:
    • How are the forward and reverse SDEs mathematically derived and proven to be reverses of each other? (17:25)
    • What is the core difference in what the model learns here versus in DDPM? The conclusion was that this model learns the "score" (a direction vector), while DDPM is often framed as learning the noise itself (42:31 - 43:57); the precise relation between the two parameterizations is written out after this section.
  • Applications: The most highlighted applications were in controllable generation, including class-conditional generation, image inpainting, and colorization, all demonstrated in the paper's figures (38:27).
  • Connections: This work directly builds upon and generalizes SMLD and DDPM (02:23). It provides a continuous-time perspective that encompasses both discrete-time predecessors. It was also contrasted with DDIM, noting that while both can offer deterministic sampling, this paper's framework provides a different mechanism for controllability (04:48).
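Two identities, written out to make the discussion points above concrete (generic notation, not copied verbatim from the paper; the first assumes a Gaussian perturbation kernel with $\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$):

```latex
% Score <-> noise: the denoising score matching target is a rescaled noise prediction,
\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t \mid \mathbf{x}_0)
  = -\frac{\mathbf{x}_t - \alpha_t \mathbf{x}_0}{\sigma_t^2}
  = -\frac{\boldsymbol{\epsilon}}{\sigma_t}
\quad \Longrightarrow \quad
\mathbf{s}_\theta(\mathbf{x}_t, t) \approx -\frac{\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sigma_t}

% Controllable generation: conditioning only adds a likelihood term to the score,
\nabla_{\mathbf{x}} \log p_t(\mathbf{x} \mid \mathbf{y})
  = \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) + \nabla_{\mathbf{x}} \log p_t(\mathbf{y} \mid \mathbf{x})
```

The second identity is why a single unconditionally trained score model can be reused for inpainting, colorization, and class-conditional generation: only the extra term $\nabla_{\mathbf{x}} \log p_t(\mathbf{y} \mid \mathbf{x})$ changes per task.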

Notes and Reflections

  • Interesting Insights:
    • The shift in perspective from simply "denoising" an image to "estimating a score function" that guides a sample from noise to data was a key insight. This score represents the direction in which to move the sample to increase its likelihood (09:17, 43:57); a minimal sampler sketch built on this idea appears at the end of these notes.
    • Realizing that SMLD and DDPM are not entirely different families but rather two specific discretizations of a continuous process was a major takeaway (05:47).
  • Lessons Learned:
    • Modern generative models, particularly in the diffusion family, are built on very advanced mathematical concepts. A solid foundation in these areas is necessary for a full grasp of the material (41:54).
    • To understand this paper fully, it would be beneficial to first study its prerequisites, especially the concepts of score matching and Langevin dynamics.
  • Future Directions: The group concluded that a next step should be to go back and study the foundational papers on score matching (SMLD) to build the necessary background knowledge to better tackle this and similar advanced papers (41:54).
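Finally, a minimal NumPy sketch of the Predictor-Corrector idea for the VE SDE. Here `score_model(x, sigma)` is a hypothetical placeholder for a trained time-dependent score network, and the schedule and step-size choices only loosely follow the paper's heuristics; this is a study aid, not the released implementation.

```python
import numpy as np

def pc_sampler(score_model, shape, sigma_min=0.01, sigma_max=50.0,
               num_steps=500, snr=0.16, rng=None):
    """Predictor-Corrector sampling for the VE SDE (minimal sketch).

    score_model(x, sigma) is assumed to return an estimate of the score
    grad_x log p_sigma(x) at noise level sigma (hypothetical interface).
    """
    rng = rng or np.random.default_rng()
    # Geometric noise schedule from sigma_max down to sigma_min (SMLD-style).
    sigmas = np.geomspace(sigma_max, sigma_min, num_steps)
    # VE prior: x_T ~ N(0, sigma_max^2 I).
    x = sigma_max * rng.normal(size=shape)

    for i, sigma in enumerate(sigmas):
        sigma_next = sigmas[i + 1] if i + 1 < num_steps else 0.0

        # Corrector: one Langevin MCMC step at the current noise level.
        score = score_model(x, sigma)
        z = rng.normal(size=shape)
        # Step size targeting a fixed signal-to-noise ratio (the paper's heuristic).
        eps = 2.0 * (snr * np.linalg.norm(z) / np.linalg.norm(score)) ** 2
        x = x + eps * score + np.sqrt(2.0 * eps) * z

        # Predictor: one reverse-diffusion step of the VE reverse SDE,
        # x <- x + (sigma_i^2 - sigma_{i+1}^2) * score + sqrt(sigma_i^2 - sigma_{i+1}^2) * z.
        score = score_model(x, sigma)
        z = rng.normal(size=shape)
        var = sigma**2 - sigma_next**2
        x = x + var * score + np.sqrt(var) * z

    return x
```

A real implementation would operate on batched tensors and typically ends with one final noise-free denoising step instead of injecting noise at the last iteration.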