CV Paper Reading Collection

| Year / Venue | Model | Area | Description | Drawbacks |
| --- | --- | --- | --- | --- |
| 2021 ICML | CLIP (Contrastive Language-Image Pre-training) | contrastive learning, zero-shot learning, multimodal | Uses text as the supervision signal to train transferable visual models (contrastive-loss sketch below). | Zero-shot CLIP, although comparable to a supervised ResNet-50, is not yet SOTA; the authors estimate that reaching SOTA would require roughly 1000x more compute, which is impractical. Zero-shot CLIP also performs poorly on certain datasets, e.g. fine-grained classification and abstract tasks. CLIP is robust to natural distribution shift but still struggles with out-of-domain generalization: if the test distribution differs significantly from the training distribution, performance drops sharply. Finally, CLIP does not solve the data inefficiency of deep learning; training it requires an enormous amount of data. |
| 2021 ICLR | ViT (Vision Transformer) | | Applies the Transformer to vision: simple, efficient, scalable (patch-embedding sketch below). With enough pre-training data, ViT outperforms CNNs, overcoming the Transformer's lack of inductive bias and transferring well to downstream tasks. | |
| 2022 | DALL-E | | Generates images from text. | |
| 2021 ICCV | Swin Transformer | | Uses shifted windows and a hierarchical structure to reduce the Transformer's computational cost; effectively a CNN dressed up as a Transformer. | |
| 2021 | MAE (Masked Autoencoders) | self-supervised | The BERT of CV: scalable, very high-capacity models that generalize well (masking sketch below). | |
| | TransMed | | Transformers Advance Multi-modal Medical Image Classification. | |
| | I3D | | | |
| 2021 | Pathway | | | |
| 2021 ICML | ViLT | multimodal | A vision-and-text multimodal Transformer. | Performance is not high; inference is fast, but training is extremely slow. |
| 2021 NeurIPS | ALBEF | | Align before fuse; to deal with noisy web data, a momentum model generates pseudo-targets (momentum-update sketch below). | |
| 2021 | VLMo | | An architecture that combines a dual encoder and a fusion encoder; trained with stagewise pre-training. | |
| | CoCa | | | |
| | BeiTv3 | | | |
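For reference, CLIP's "text as supervision" boils down to a symmetric image-text contrastive loss. Below is a minimal sketch of that loss; the function name, temperature value, and feature shapes are assumptions for illustration, not the official implementation.

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss (illustrative only).
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """image_features, text_features: [batch, dim] embeddings from the two encoders."""
    # Normalize so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix: logits[i][j] = sim(image_i, text_j) / T
    logits = image_features @ text_features.t() / temperature

    # Matching image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image->text and text->image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```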
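ViT's "simple, efficient, scalable" recipe is mostly patchify-then-Transformer. The sketch below shows only the patch-embedding front end with a class token and learned position embeddings; the patch size, dimensions, and strided-conv patchification follow common practice rather than any particular official file.

```python
# Minimal sketch of ViT-style patch embedding (the Transformer encoder itself is omitted).
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided conv splits the image into non-overlapping patches and
        # linearly projects each one in a single op.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                              # x: [B, 3, 224, 224]
        x = self.proj(x).flatten(2).transpose(1, 2)    # [B, 196, 768]
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        return x                                       # fed to a standard Transformer encoder
```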
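MAE's claim to be "the BERT of CV" rests on masking a large fraction of patches and reconstructing them. The sketch below covers only the random-masking step (a 75% mask ratio is assumed); the ViT encoder and lightweight decoder are omitted.

```python
# Minimal sketch of MAE-style random patch masking (illustrative only).
import torch

def random_masking(patch_tokens, mask_ratio=0.75):
    """patch_tokens: [batch, num_patches, dim]. Returns the visible subset plus
    the indices needed to restore the original order for the decoder."""
    b, n, d = patch_tokens.shape
    num_keep = int(n * (1 - mask_ratio))

    # Shuffle patches per sample and keep the first `num_keep`.
    noise = torch.rand(b, n, device=patch_tokens.device)
    ids_shuffle = noise.argsort(dim=1)
    ids_restore = ids_shuffle.argsort(dim=1)
    ids_keep = ids_shuffle[:, :num_keep]

    visible = torch.gather(
        patch_tokens, 1, ids_keep.unsqueeze(-1).repeat(1, 1, d))

    # Binary mask (1 = masked) in the original patch order, used for the loss.
    mask = torch.ones(b, n, device=patch_tokens.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return visible, mask, ids_restore
```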
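ALBEF's momentum model is an exponential-moving-average copy of the network whose predictions serve as soft pseudo-targets for noisy web pairs. The sketch below illustrates that idea only; the momentum coefficient and mixing weight are assumptions, not the paper's exact values.

```python
# Minimal sketch of a momentum (EMA) model producing soft pseudo-targets.
import copy
import torch

def build_momentum_model(model):
    # The momentum model is a frozen deep copy, updated only by EMA.
    momentum_model = copy.deepcopy(model)
    for p in momentum_model.parameters():
        p.requires_grad_(False)
    return momentum_model

@torch.no_grad()
def momentum_update(model, momentum_model, m=0.995):
    # EMA update: momentum parameters slowly track the online model.
    for p, p_m in zip(model.parameters(), momentum_model.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1 - m)

def soft_targets(momentum_logits, one_hot_targets, alpha=0.4):
    # Pseudo-targets: a mixture of the momentum model's prediction and the
    # (possibly noisy) ground-truth labels, used in place of hard targets.
    return alpha * momentum_logits.softmax(dim=-1) + (1 - alpha) * one_hot_targets
```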