Two loss terms: the consistency cost and the classification cost.
Consistency cost: $J(\theta)=\mathbb{E}_{x,\eta',\eta}\left[\|f(x,\theta',\eta')-f(x,\theta,\eta)\|^2\right]$, where $\theta$ and $\eta$ are the student's weights and noise, and $\theta'$ and $\eta'$ are the teacher's.
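As a rough illustration of the consistency cost, the sketch below estimates $J(\theta)$ on one batch with a single noise draw per branch. The toy model `f` (a softmax over a linear map of the noisy input), the additive Gaussian noise, and all array shapes are assumptions made for the example; they are not part of the original notes.

```python
import numpy as np

def f(x, theta, eta):
    """Hypothetical toy model: softmax of a linear map applied to the noisy input x + eta."""
    logits = (x + eta) @ theta
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def consistency_cost(x, theta, theta_prime, eta, eta_prime):
    """Batch estimate of E[ ||f(x, theta', eta') - f(x, theta, eta)||^2 ]."""
    diff = f(x, theta_prime, eta_prime) - f(x, theta, eta)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))                  # 8 samples, 3 features (assumed)
theta = rng.normal(size=(3, 2))              # student weights, 2 classes (assumed)
eta = 0.1 * rng.normal(size=x.shape)         # student-side noise
eta_prime = 0.1 * rng.normal(size=x.shape)   # teacher-side noise

# Teacher weights set equal to the student weights, so only the noise differs.
J = consistency_cost(x, theta, theta, eta, eta_prime)
```

When both branches share the same weights and the same noise, the cost is exactly zero; any gap between the two predictions makes it positive.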
How three semi-supervised models differ (all of them rely on noise perturbation):
Π-Model: $\theta'=\theta$
Temporal Ensembling: $f(x,\theta',\eta')$ is approximated by a weighted average of successive predictions.
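The weighted average of successive predictions can be sketched as an exponential moving average kept per training sample. The version below assumes the form with startup bias correction (dividing by $1-\alpha^t$); the function name `update_ensemble` and the array shapes are hypothetical.

```python
import numpy as np

def update_ensemble(Z, z, alpha, epoch):
    """Accumulate this epoch's predictions into the running average.

    Z:      running average of past predictions, shape (n_samples, n_classes)
    z:      predictions from the current epoch, same shape
    alpha:  momentum of the moving average, 0 <= alpha < 1
    epoch:  1-based epoch counter, used for bias correction
    Returns the new running average and the bias-corrected targets.
    """
    Z = alpha * Z + (1.0 - alpha) * z
    targets = Z / (1.0 - alpha ** epoch)  # undo the bias from starting at zero
    return Z, targets

Z = np.zeros((1, 2))
z = np.array([[0.2, 0.8]])
Z, targets_1 = update_ensemble(Z, z, alpha=0.6, epoch=1)
Z, targets_2 = update_ensemble(Z, z, alpha=0.6, epoch=2)
```

With the correction in place, feeding the same prediction every epoch returns that prediction as the target, even though the accumulator started at zero.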
Mean Teacher: $\theta'_t=\alpha\theta'_{t-1}+(1-\alpha)\theta_t$
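The Mean Teacher update is an exponential moving average over weights rather than over predictions. A minimal sketch, assuming the parameters are stored as a dictionary of NumPy arrays (the dictionary layout and the name `ema_update` are illustrative choices):

```python
import numpy as np

def ema_update(teacher, student, alpha):
    """Return theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t for every parameter."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }

teacher = {"w": np.array([1.0, 1.0])}   # theta'_{t-1}
student = {"w": np.array([0.0, 2.0])}   # theta_t, after the student's gradient step
teacher = ema_update(teacher, student, alpha=0.9)
```

In this scheme the teacher receives no gradient updates of its own; it only tracks the student, and a larger `alpha` gives it a longer memory.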
Interpolation Consistency Training (ICT) model: uses interpolation rather than noise perturbation.
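To make the contrast with noise perturbation concrete, the sketch below assumes the usual mixup-style formulation of ICT: the student's prediction at an interpolated input is pulled toward the same interpolation of the teacher's predictions at the two endpoints. The helper names `mix` and `ict_consistency`, and the use of a squared-error distance, are assumptions for illustration.

```python
import numpy as np

def mix(a, b, lam):
    """Linear interpolation: lam * a + (1 - lam) * b."""
    return lam * a + (1.0 - lam) * b

def ict_consistency(student_fn, teacher_fn, u1, u2, lam):
    """Squared distance between f_student(mix(u1, u2)) and mix(f_teacher(u1), f_teacher(u2))."""
    pred_on_mix = student_fn(mix(u1, u2, lam))
    mix_of_preds = mix(teacher_fn(u1), teacher_fn(u2), lam)
    return float(np.mean(np.sum((pred_on_mix - mix_of_preds) ** 2, axis=-1)))

rng = np.random.default_rng(0)
u1 = rng.normal(size=(4, 3))    # two batches of unlabeled inputs (assumed shapes)
u2 = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
lam = rng.beta(1.0, 1.0)        # mixing coefficient, here drawn from Beta(1, 1)

linear = lambda u: u @ W              # a linear model already satisfies the constraint
nonlinear = lambda u: np.tanh(u @ W)  # a nonlinear model generally does not

loss_linear = ict_consistency(linear, linear, u1, u2, lam)
loss_nonlinear = ict_consistency(nonlinear, nonlinear, u1, u2, lam)
```

A purely linear model incurs (numerically) zero loss, which shows what the regularizer is asking for: predictions that vary linearly between unlabeled points.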