self.global_model=ImageTextClassifier(
(img_proj): Sequential(
(0): Linear(in_features=1280, out_features=128, bias=True)
(1): ReLU()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=128, out_features=128, bias=True)
)
(text_rnn): GRU(512, 128, batch_first=True, dropout=0.1)
(fuse_att): FuseBaseSelfAttention(
(att_fc1): Linear(in_features=128, out_features=512, bias=True)
(att_pool): Tanh()
(att_fc2): Linear(in_features=512, out_features=6, bias=True)
)
(classifier): Sequential(
(0): Linear(in_features=768, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=64, out_features=8, bias=True)
)
)
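This model is a multimodal classifier: it combines image features with text features, fuses them through a self-attention mechanism, and then passes the fused representation to a classifier that outputs the final prediction.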
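As a quick sanity check on the printout, the layer sizes can be turned into parameter counts with plain arithmetic. The sketch below is not part of the original code. The helper names `linear_params` and `gru_params` are made up for illustration, and the GRU is assumed to be single-layer and unidirectional, since the printout shows neither `num_layers` nor `bidirectional`. Note also that the classifier's `in_features=768` equals 6 × 128, which is consistent with the fusion module producing one 128-dimensional vector per attention head and concatenating them. The printout does not show the forward pass, so that reading is an inference.

```python
# Layer sizes copied from the printed module above.
D_IMG, D_TXT, D_HID, N_HEADS, N_CLASSES = 1280, 512, 128, 6, 8


def linear_params(in_features, out_features, bias=True):
    """Trainable parameters in a fully connected layer (weights plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)


def gru_params(input_size, hidden_size):
    """Parameters of a single-layer, unidirectional GRU in PyTorch's layout:
    three gates, each with input weights, hidden weights and two bias vectors.
    (Assumption: one layer, one direction.)"""
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


params = {
    "img_proj": linear_params(D_IMG, D_HID) + linear_params(D_HID, D_HID),
    "text_rnn": gru_params(D_TXT, D_HID),
    "fuse_att": linear_params(D_HID, 512) + linear_params(512, N_HEADS),
    # 768 input features, read here as N_HEADS * D_HID (an inference, see above).
    "classifier": linear_params(N_HEADS * D_HID, 64) + linear_params(64, N_CLASSES),
}

for name, count in params.items():
    print(f"{name:<10} {count:>8,d}")
print(f"{'total':<10} {sum(params.values()):>8,d}")
```

Under these assumptions the GRU holds the largest share of the parameters, followed by `img_proj`, whose first layer alone maps 1280 inputs down to 128.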
1. img_proj: image feature dimensionality-reduction module
(0): Linear(in_features=1280, out_features=128, bias=True)
(1): ReLU()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=128, out_features=128, bia
