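Diffusion models are a class of generative models used mainly for image generation and other generative tasks. The core idea is to gradually add noise to data (the forward process) and then remove it step by step (the backward process) in order to generate new data. The sections below explain the forward and backward processes in detail, with concrete numeric examples.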
Forward Process
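In the forward process, a diffusion model starts from a sample of the real data distribution and adds Gaussian noise step by step until the data is approximately distributed as a standard Gaussian. The steps are: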
- Initial data: suppose we have a data sample $x_0$ drawn from the real distribution $q(x_0)$.
- Adding noise step by step: at each step $t$, we add Gaussian noise to the data to produce a new sample $x_t$. This can be written as:
$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t$$
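where $\epsilon_t \sim \mathcal{N}(0, I)$ is standard Gaussian noise and $\alpha_t \in (0, 1)$ is a parameter that gradually decreases with $t$ and controls the noise strength: the smaller $\alpha_t$ is, the more noise is added at that step.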
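As a minimal sketch of the update rule above, assuming plain Python lists for vectors and the standard `random` module for the Gaussian draw (the function name `forward_step` and its `eps` argument are illustrative and do not come from any library):

```python
import math
import random


def forward_step(x_prev, alpha_t, eps=None):
    """One forward noising step:
    x_t = sqrt(alpha_t) * x_prev + sqrt(1 - alpha_t) * eps.

    x_prev  : list of floats, the sample x_{t-1}
    alpha_t : float in (0, 1]
    eps     : optional list of floats; if omitted, each coordinate
              is drawn independently from N(0, 1)
    """
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in x_prev]
    signal = math.sqrt(alpha_t)        # weight kept on the previous sample
    noise = math.sqrt(1.0 - alpha_t)   # weight given to the fresh noise
    return [signal * x + noise * e for x, e in zip(x_prev, eps)]


# With eps fixed to zero, the step only shrinks the sample by sqrt(alpha_t).
shrunk = forward_step([1.0, 1.0], 0.9, eps=[0.0, 0.0])

# Without eps, a fresh random draw is used, so each call gives a new noisy sample.
noisy = forward_step([1.0, 1.0], 0.9)
```

Passing `eps` explicitly is only a convenience for reproducing hand-worked numbers; in actual use the noise would be sampled.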
Example
Suppose we have a two-dimensional data point $x_0 = [1, 1]$. For simplicity we fix $\alpha_t = 0.9$ at every step and apply three noising steps in the forward process.
Intermediate values below are rounded by hand (for example $\sqrt{0.9} \approx 0.95$ and $\sqrt{0.1} \approx 0.316$), so the results are accurate only to about two decimal places.
- Step 1:
$$x_1 = \sqrt{0.9} \cdot [1, 1] + \sqrt{0.1} \cdot \epsilon_1$$
Suppose the draw $\epsilon_1 \sim \mathcal{N}(0, I)$ comes out as $\epsilon_1 = [0.2, -0.3]$. Then
$$x_1 = \sqrt{0.9} \cdot [1, 1] + \sqrt{0.1} \cdot [0.2, -0.3] \approx [0.95, 0.95] + [0.063, -0.095] = [1.013, 0.855]$$
- Step 2:
$$x_2 = \sqrt{0.9} \cdot [1.013, 0.855] + \sqrt{0.1} \cdot \epsilon_2$$
Suppose the draw $\epsilon_2 \sim \mathcal{N}(0, I)$ comes out as $\epsilon_2 = [-0.1, 0.1]$. Then
$$x_2 = \sqrt{0.9} \cdot [1.013, 0.855] + \sqrt{0.1} \cdot [-0.1, 0.1] \approx [0.962, 0.811] + [-0.032, 0.032] = [0.930, 0.843]$$
- Step 3:
$$x_3 = \sqrt{0.9} \cdot [0.930, 0.843] + \sqrt{0.1} \cdot \epsilon_3$$
Suppose the draw $\epsilon_3 \sim \mathcal{N}(0, I)$ comes out as $\epsilon_3 = [0.1, -0.2]$. Then
$$x_3 = \sqrt{0.9} \cdot [0.930, 0.843] + \sqrt{0.1} \cdot [0.1, -0.2] \approx [0.883, 0.801] + [0.032, -0.063] = [0.915, 0.738]$$
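After many iterations, the data point becomes dominated by noise and its distribution approaches a standard Gaussian. Unrolling the recursion makes this explicit: $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ with $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$ and $\epsilon \sim \mathcal{N}(0, I)$, so the contribution of $x_0$ shrinks toward zero as $t$ grows.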
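The three-step example above can be checked numerically. The sketch below reuses the same $x_0$, $\alpha_t = 0.9$, and noise vectors, but keeps full floating-point precision instead of rounding $\sqrt{0.9}$ by hand, so its results differ from the hand-worked values in the third decimal place (the final point comes out near $[0.912, 0.735]$). The variable names are illustrative.

```python
import math

ALPHA = 0.9                      # constant alpha_t used in the example
x = [1.0, 1.0]                   # x_0
noises = [[0.2, -0.3],           # epsilon_1
          [-0.1, 0.1],           # epsilon_2
          [0.1, -0.2]]           # epsilon_3

trajectory = [x]                 # trajectory[t] holds x_t
for eps in noises:
    x = [math.sqrt(ALPHA) * xi + math.sqrt(1.0 - ALPHA) * ei
         for xi, ei in zip(x, eps)]
    trajectory.append(x)

for t, x_t in enumerate(trajectory):
    # ALPHA ** (t / 2) is the weight still carried by x_0 after t steps.
    print(t, [round(v, 3) for v in x_t], "weight on x_0:", round(ALPHA ** (t / 2), 3))
```

With only three steps and $\alpha_t = 0.9$, most of the original signal is still present; the weight on $x_0$ decays geometrically, so many more steps (or smaller $\alpha_t$) are needed before the sample looks like pure noise.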
Backward Process
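The goal of the backward process is to start from a standard Gaussian distribution and denoise step by step until we arrive back at the original data distribution. It runs the forward process in reverse and is implemented by learning a denoising model $p_\theta(x_{t-1} \mid x_t)$.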
- Start from Gaussian noise: we begin with a sample $x_T \sim \mathcal{N}(0, I)$ drawn from a standard Gaussian.
- Denoising step by step: at each step $t$, we use the trained model $p_\theta$ to predict and remove the noise, producing a new sample $x_{t-1}$. This can be written as:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \sqrt{1 - \alpha_t}\, \epsilon_\theta(x_t, t) \right)$$
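where $\epsilon_\theta(x_t, t)$ is the noise predicted by the model. This is a simplified form: it exactly inverts a single forward step when the predicted noise equals the noise that was added. The sampling rule used in DDPM (Ho et al., 2020) instead scales the predicted noise by $\frac{1 - \alpha_t}{\sqrt{1 - \bar\alpha_t}}$, where $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, and adds fresh Gaussian noise at every step except the last.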
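A minimal sketch of this simplified reverse step, assuming plain Python lists for vectors. The names `reverse_step` and `zero_noise_model` are hypothetical; in particular `zero_noise_model` is only a placeholder for a trained network $\epsilon_\theta(x_t, t)$, which this article does not specify.

```python
import math


def reverse_step(x_t, alpha_t, predict_noise, t):
    """Simplified reverse step from the text:
    x_{t-1} = (x_t - sqrt(1 - alpha_t) * eps_hat) / sqrt(alpha_t),
    where eps_hat = predict_noise(x_t, t).
    """
    eps_hat = predict_noise(x_t, t)
    noise = math.sqrt(1.0 - alpha_t)
    signal = math.sqrt(alpha_t)
    return [(x - noise * e) / signal for x, e in zip(x_t, eps_hat)]


def zero_noise_model(x_t, t):
    # Hypothetical stand-in for a trained eps_theta: always predicts zero noise.
    return [0.0 for _ in x_t]


# With a zero prediction, the step just rescales x_t by 1 / sqrt(alpha_t).
x_prev = reverse_step([0.9, 0.9], 0.9, zero_noise_model, t=1)
```

Taking the predictor as a function argument keeps the step itself independent of how the noise estimate is produced, whether by a neural network or, as in a worked example, by a fixed lookup.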
Example
Suppose we start from the noised data point $x_3 = [0.915, 0.738]$ produced by the forward example and run the backward process with the denoising model.
- Step 3:
$$x_2 = \frac{1}{\sqrt{0.9}} \left( x_3 - \sqrt{0.1}\, \epsilon_\theta(x_3, 3) \right)$$
Suppose the model predicts the noise $\epsilon_\theta(x_3, 3) = [0.1, -0.2]$, which is exactly the $\epsilon_3$ added in the forward example. Then
$$x_2 = \frac{1}{\sqrt{0.9}} \left( [0.915, 0.738] - \sqrt{0.1} \cdot [0.1, -0.2] \right) \approx \frac{1}{\sqrt{0.9}} \left( [0.915, 0.738] - [0.032, -0.063] \right) = \frac{1}{\sqrt{0.9}} [0.883, 0.801] \approx [0.930, 0.843]$$
- Step 2:
$$x_1 = \frac{1}{\sqrt{0.9}} \left( x_2 - \sqrt{0.1}\, \epsilon_\theta(x_2, 2) \right)$$
Suppose the model predicts the noise $\epsilon_\theta(x_2, 2) = [-0.1, 0.1]$, which is exactly the $\epsilon_2$ added in the forward example. Then
$$x_1 = \frac{1}{\sqrt{0.9}} \left( [0.930, 0.843] - \sqrt{0.1} \cdot [-0.1, 0.1] \right) \approx \frac{1}{\sqrt{0.9}} \left( [0.930, 0.843] + [0.032, -0.032] \right) = \frac{1}{\sqrt{0.9}} [0.962, 0.811] \approx [1.013, 0.855]$$
- Step 1:
$$x_0 = \frac{1}{\sqrt{0.9}} \left( x_1 - \sqrt{0.1}\, \epsilon_\theta(x_1, 1) \right)$$
Suppose the model predicts the noise $\epsilon_\theta(x_1, 1) = [0.2, -0.3]$, which is exactly the $\epsilon_1$ added in the forward example. Then
$$x_0 = \frac{1}{\sqrt{0.9}} \left( [1.013, 0.855] - \sqrt{0.1} \cdot [0.2, -0.3] \right) \approx \frac{1}{\sqrt{0.9}} \left( [1.013, 0.855] - [0.063, -0.095] \right) = \frac{1}{\sqrt{0.9}} [0.95, 0.95] \approx [1, 1]$$
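Through the backward process we recover the original data point $x_0 = [1, 1]$. The recovery is exact here only because each predicted noise was assumed to equal the noise actually added in the forward example. A trained model predicts the noise only approximately, and in practice sampling starts from pure Gaussian noise $x_T$ and produces a new sample from the data distribution rather than a specific original point.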
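The round trip can be sketched in a few lines to show how much the result depends on the quality of the noise prediction. Both predictors below are hypothetical: `oracle` returns exactly the noise added in the forward example, and `imperfect` is off by 0.05 in every component, an arbitrary error chosen only for illustration.

```python
import math

ALPHA = 0.9
SIGNAL = math.sqrt(ALPHA)          # sqrt(alpha_t)
NOISE = math.sqrt(1.0 - ALPHA)     # sqrt(1 - alpha_t)
noises = {1: [0.2, -0.3], 2: [-0.1, 0.1], 3: [0.1, -0.2]}

# Forward process: x_0 -> x_3 with the fixed noise vectors from the example.
x = [1.0, 1.0]
for t in (1, 2, 3):
    x = [SIGNAL * xi + NOISE * ei for xi, ei in zip(x, noises[t])]
x3 = x


def oracle(x_t, t):
    # Hypothetical perfect predictor: returns the noise added at step t.
    return noises[t]


def imperfect(x_t, t):
    # Hypothetical imperfect predictor: every component is off by 0.05.
    return [e + 0.05 for e in noises[t]]


def denoise(x_t, model):
    # Apply the simplified reverse step for t = 3, 2, 1.
    for t in (3, 2, 1):
        eps_hat = model(x_t, t)
        x_t = [(xi - NOISE * ei) / SIGNAL for xi, ei in zip(x_t, eps_hat)]
    return x_t


recovered = denoise(x3, oracle)     # matches [1, 1] up to floating-point error
drifted = denoise(x3, imperfect)    # ends up roughly 0.05 away in each coordinate
print(recovered)
print(drifted)
```

The small per-step prediction error accumulates across the three reverse steps, which is why exact recovery should not be expected from a learned model.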
Summary
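The forward process of a diffusion model gradually adds noise until the data is approximately Gaussian, and the backward process uses a denoising model to reverse this and reconstruct data. Combining the two effectively is what lets diffusion models perform well on tasks such as image generation.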