Continued from the previous post.
Slides 26-27
MAE's reconstruction target
- Our MAE reconstructs the input by predicting the pixel values for each masked patch.
- Each element in the decoder’s output is a vector of pixel values representing a patch.
- The last layer of the decoder is a linear projection whose number of output channels equals the number of pixel values in a patch.
- The decoder’s output is reshaped to form a reconstructed image.
- Our loss function computes the mean squared error (MSE) between the reconstructed and original images in the pixel space.
- We compute the loss only on masked patches, similar to BERT.
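The points above can be sketched in a few lines of NumPy. This is only an illustration under assumptions, not the lecture's or the paper's implementation: the helper names (`patchify`, `unpatchify`, `masked_mse`), the toy sizes, and the random `tokens` that stand in for decoder features are all made up for the example. It shows the linear projection to one pixel vector per patch, the reshape back to image form, and an MSE that counts only the masked patches.

```python
import numpy as np

def patchify(imgs, p):
    """(N, C, H, W) images -> (N, L, p*p*C) flattened patches, L = (H/p)*(W/p)."""
    n, c, h, w = imgs.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    gh, gw = h // p, w // p
    x = imgs.reshape(n, c, gh, p, gw, p)
    x = x.transpose(0, 2, 4, 3, 5, 1)            # (N, gh, gw, p, p, C)
    return x.reshape(n, gh * gw, p * p * c)

def unpatchify(patches, p, c, gh, gw):
    """Inverse of patchify: (N, L, p*p*C) -> (N, C, gh*p, gw*p)."""
    n = patches.shape[0]
    x = patches.reshape(n, gh, gw, p, p, c)
    x = x.transpose(0, 5, 1, 3, 2, 4)            # (N, C, gh, p, gw, p)
    return x.reshape(n, c, gh * p, gw * p)

def masked_mse(pred, target, mask):
    """Pixel-space MSE averaged over masked patches only (mask: 1 = masked)."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)   # (N, L)
    return (per_patch * mask).sum() / mask.sum()

rng = np.random.default_rng(0)
N, C, H, W, P, D = 2, 3, 8, 8, 4, 16             # toy sizes, assumed for illustration
imgs = rng.standard_normal((N, C, H, W))
target = patchify(imgs, P)                       # (2, 4, 48)

# Stand-in for the decoder's last layer: a linear projection D -> P*P*C.
tokens = rng.standard_normal((N, target.shape[1], D))  # hypothetical decoder features
W_out = rng.standard_normal((D, P * P * C)) * 0.02
b_out = np.zeros(P * P * C)
pred = tokens @ W_out + b_out                    # one pixel vector per patch

# Reshape the per-patch predictions into a reconstructed image.
recon = unpatchify(pred, P, C, H // P, W // P)   # (2, 3, 8, 8)

# Loss is computed only where mask == 1 (the masked patches).
mask = np.array([[1, 1, 1, 0], [0, 1, 1, 1]], dtype=float)
loss = masked_mse(pred, target, mask)
```

Because the per-patch error is multiplied by `mask` before averaging, any error on visible patches contributes nothing to the loss.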
Experimental results: effect of the masking ratio
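As the figure above shows, MAE's performance rises rapidly as the proportion of the input image that is masked increases, and it is best at a masking ratio of around 75%.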
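To make the masking ratio concrete, here is a minimal sketch of random patch masking. It assumes a 224x224 input split into 16x16 patches (196 patches in total), a common ViT setting that these notes do not state; `random_masking` is an illustrative helper name, not something from the lecture.

```python
import numpy as np

def random_masking(num_patches, mask_ratio, rng):
    """Pick visible patches uniformly at random.

    Returns (keep_idx, mask): sorted indices of visible patches and a
    0/1 vector over all patches where 1 marks a masked patch.
    """
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)          # random order of patch indices
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches)
    mask[keep_idx] = 0
    return keep_idx, mask

rng = np.random.default_rng(0)
num_patches = (224 // 16) ** 2                   # 196, under the assumed sizes
keep_idx, mask = random_masking(num_patches, 0.75, rng)
print(len(keep_idx), int(mask.sum()))            # 49 147
```

At a 75% ratio only 49 of the 196 patches stay visible, and the remaining 147 are the ones the reconstruction loss is computed on.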
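The above is taken from instructor Ye Zi's lecture notes and is intended for readers with some background or for fellow practitioners. To be continued in the next post...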