![3eebfe233cda887696d39c08037fbcbf.png](https://img-blog.csdnimg.cn/img_convert/3eebfe233cda887696d39c08037fbcbf.png)
Zhao zhijian: ViT Trilogy
Zhao zhijian: ViT Trilogy - 2 Vision-Transformer
Zhao zhijian: ViT Trilogy - 3 vit-pytorch
Model and code reference:
https://github.com/likelyzhao/vit-pytorch
Let's walk through the code in some detail:
```python
import torch
from torch import nn

MIN_NUM_PATCHES = 16  # lower bound: with too few patches, attention has little to attend over

class ViT(nn.Module):
    def __init__(self, *, image_size, patch_size, num_classes, depth, heads, mlp_dim,
                 channels = 3, dropout = 0., emb_dropout = 0.):
        super().__init__()
        # the image must tile exactly into patch_size x patch_size patches
        assert image_size % patch_size == 0, 'image dimensions must be divisible by the patch size'
        num_patches = (image_size // patch_size) ** 2
        # each patch is flattened into a vector of length channels * patch_size^2
        hidden_size = channels * patch_size ** 2
        assert num_patches > MIN_NUM_PATCHES, \
            f'your number of patches ({num_patches}) is way too small for attention to be effective'
```
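To make the patch arithmetic concrete, here is a small sketch. The 224/16/3 values are illustrative assumptions (the standard ViT-Base setup), not taken from the post:

```python
# Illustrative patch arithmetic for ViT (224, 16, 3 are assumed example values)
image_size, patch_size, channels = 224, 16, 3

num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196 patches
hidden_size = channels * patch_size ** 2       # 3 * 16 * 16 = 768 per flattened patch

print(num_patches, hidden_size)  # 196 768
```

So a 224x224 RGB image becomes a sequence of 196 tokens, each a 768-dimensional flattened patch, which comfortably exceeds the MIN_NUM_PATCHES threshold.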