Mean Teacher

Mean Teacher is a semi-supervised learning method proposed in 2017:

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

arXiv: https://arxiv.org/abs/1703.01780

It improves on Temporal Ensembling and the $\Pi$ model.

The main contribution is updating the teacher's parameters by weight averaging, rather than by simply replicating the student.

Algorithm overview

The algorithm involves two networks: a teacher network and a student network.
The teacher starts as a copy of the student, so the two share the same architecture, but they are updated differently, which makes them two independent networks.

The student model has weights $\theta$ and input noise $\eta$;
the teacher model has weights $\theta'$ and input noise $\eta'$.

A consistency cost $J$ is defined to measure the distance between the teacher's and the student's predictions:

$$J(\theta)=\mathbb{E}_{x,\eta',\eta}\left[\left\|f(x,\theta',\eta') - f(x,\theta,\eta)\right\|^2\right]$$
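The paper mainly uses mean-squared error as this distance, so the consistency cost is just the MSE between the two softmax outputs. Below is a minimal PyTorch sketch of my own (not the official code); `student_logits` and `teacher_logits` are assumed to be the pre-softmax outputs of the two models on the same input under different noise:

```python
import torch.nn.functional as F

def consistency_cost(student_logits, teacher_logits):
    """MSE between student and teacher softmax predictions.

    The teacher output is detached so gradients flow only into the student.
    """
    student_prob = F.softmax(student_logits, dim=1)
    teacher_prob = F.softmax(teacher_logits.detach(), dim=1)
    return F.mse_loss(student_prob, teacher_prob)
```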

The student is updated by gradient descent; the teacher is updated from the student's parameters.

From the paper: "After the weights of the student model have been updated with gradient descent, the teacher model weights are updated as an exponential moving average of the student weights."

1. Teacher parameter update

The innovation of Mean Teacher over Temporal Ensembling and the $\Pi$ model lies mainly here:

From the paper: "The difference between the Π model, Temporal Ensembling, and Mean teacher is how the teacher predictions are generated. Whereas the Π model uses θ′ = θ, and Temporal Ensembling approximates f(x, θ′, η′) with a weighted average of successive predictions, …"

The teacher's parameters are obtained via an exponential moving average (EMA).

At training step $t$:

$$\theta'_t = \alpha\theta'_{t-1} + (1-\alpha)\theta_t$$
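A minimal sketch of this EMA update in PyTorch (my own illustration, not the official TF code; `teacher` and `student` are assumed to be two copies of the same architecture):

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    """theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t, per parameter."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)
    # Note: BatchNorm running statistics are buffers, not parameters;
    # a full implementation would copy or average those as well.
```

The paper uses a smaller α early in training (e.g. 0.99 during the ramp-up phase) and a larger one later (e.g. 0.999), so the teacher forgets the inaccurate early student weights quickly.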

2. Student parameter update

The paper says the student's parameters are updated with SGD, but gives no explicit loss formulation or algorithm flow;
the process is only described in the figure caption, where it is said to be the same as in Temporal Ensembling and the $\Pi$ model.

So let's bring in Temporal Ensembling for reference.

[Figure: the Temporal Ensembling training procedure]

Training

Having gone through the $\Pi$ model, let's revisit Mean Teacher.
[Figure: the Mean Teacher architecture on a single labeled training example]

The figure above shows the case of a single labeled training example.
The student and teacher models both receive the same input $x$, with different noise $\eta$ and $\eta'$ added respectively.
The student's softmax output is then (1) compared with the one-hot label via cross-entropy (the classification cost) and (2) compared with the teacher's output via the consistency cost $J(\theta)$. Once the student's weights have been updated, the teacher's weights are updated from them by EMA. Both the student and the teacher can be used for prediction, but the teacher is likely to be better. Training on the unlabeled set is analogous, except that there is no classification cost.

Seen alongside Temporal Ensembling, this is easy to understand.
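Putting the pieces together, here is one training step as an illustrative PyTorch sketch. It is my own simplification, not the official code: it uses additive Gaussian input noise in place of real data augmentation, and a fixed consistency weight in place of the ramp-up schedule used in practice.

```python
import torch
import torch.nn.functional as F

def train_step(student, teacher, optimizer, x_labeled, y, x_unlabeled,
               noise_std=0.1, consistency_weight=1.0, alpha=0.99):
    """One Mean Teacher step: classification cost on labeled data,
    consistency cost on everything, then EMA update of the teacher."""
    x = torch.cat([x_labeled, x_unlabeled])

    # Same minibatch for both models, with independent noise.
    student_logits = student(x + noise_std * torch.randn_like(x))
    with torch.no_grad():
        teacher_logits = teacher(x + noise_std * torch.randn_like(x))

    # (1) Classification cost: labeled examples only.
    class_loss = F.cross_entropy(student_logits[:len(x_labeled)], y)

    # (2) Consistency cost J(theta): all examples.
    consistency = F.mse_loss(F.softmax(student_logits, dim=1),
                             F.softmax(teacher_logits, dim=1))

    loss = class_loss + consistency_weight * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher follows the student via EMA (same update as sketched above).
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)

    return loss.item()
```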

Several blog posts online summarize the method, but let's look at the official description from the repository:

Mean Teacher is a simple method for semi-supervised learning. It consists of the following steps:

  1. Take a supervised architecture and make a copy of it. Let’s call the original model the student and the new one the teacher.
  2. At each training step, use the same minibatch as inputs to both the student and the teacher but add random augmentation or noise to the inputs separately.
  3. Add an additional consistency cost between the student and teacher outputs (after softmax).
  4. Let the optimizer update the student weights normally.
  5. Let the teacher weights be an exponential moving average (EMA) of the student weights. That is, after each training step, update the teacher weights a little bit toward the student weights.

Our contribution is the last step. The Laine and Aila paper used shared parameters between the student and the teacher, or a temporal ensemble of teacher predictions. In comparison, Mean Teacher is more accurate and applicable to large datasets.

Code

The code uses a rather old TensorFlow version; it's best to run it with TF 1.2.1.

https://github.com/CuriousAI/mean-teacher/blob/master/tensorflow/mean_teacher/model.py

I only half understand the TF code...
The computation graph looks like this:

[Figure: TensorFlow computation graph]
// TODO: fill this in later

References

Tarvainen, A., Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. https://arxiv.org/abs/1703.01780

Deep Learning (74): Semi-supervised Mean Teachers. https://blog.csdn.net/hjimce/article/details/80551721

Laine, S., Aila, T. Temporal Ensembling for Semi-Supervised Learning. https://arxiv.org/pdf/1610.02242.pdf
