🐧 Large Model Series
💖 Multimodal large models 🔎 GroundingDINO paper summary
💖 End-to-end object detection 🔎 From DETR to GroundingDINO
💖 Multimodal large models 👉 CLIP paper summary
💖 Multimodal large models 👉 EVA-CLIP
💚 Generative models 👉 From VAE to Diffusion Model (Part 1)
💚 Generative models 👉 From VAE to Diffusion Model (Part 2)
💧 Weather large models
1. Abstract & Introduction
- EVA-CLIP: a series of models that significantly improve the efficiency and effectiveness of CLIP training.
- With the latest representation-learning, optimization, and augmentation techniques, EVA-CLIP outperforms previous CLIP models at the same parameter count while consuming far fewer training resources:
  - pre-trained EVA weights to initialize the CLIP training
  - the LAMB optimizer
  - randomly dropping input tokens
  - speedup trick: flash attention
- Results on the ImageNet-1K val set:
  - 82% zero-shot accuracy = EVA-02-CLIP-E/14 (5.0 billion parameters) + 9 billion seen samples
  - 80.4% zero-shot accuracy = EVA-02-CLIP-L/14 (430 million parameters) + 6 billion seen samples
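The LAMB optimizer listed above replaces Adam's single global step with a layer-wise trust ratio. As a rough illustration of that idea (my own sketch, not the paper's code; bias correction is omitted for brevity and all names are made up):

```python
import math

def lamb_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One simplified LAMB update for a single parameter vector.

    Adam-style adaptive element-wise update, then rescaled by the
    layer-wise trust ratio ||w|| / ||update||.
    """
    new_m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    new_v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    # Element-wise adaptive update plus decoupled weight decay.
    update = [mi / (math.sqrt(vi) + eps) + wd * wi
              for mi, vi, wi in zip(new_m, new_v, w)]
    w_norm = math.sqrt(sum(x * x for x in w))
    u_norm = math.sqrt(sum(x * x for x in update))
    # Layer-wise trust ratio: scale the step to the layer's weight norm.
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, new_m, new_v

w, m, v = lamb_step(w=[0.5, -0.3], g=[0.1, 0.2], m=[0.0, 0.0], v=[0.0, 0.0])
print(w)  # both weights nudged slightly opposite the gradient direction
```

The trust ratio is what makes very large batch sizes stable: layers whose raw update would be large relative to their weights automatically get a smaller effective learning rate.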
2. Approach
- Better initialization: pre-trained EVA weights are used to initialize the image encoder of EVA-CLIP.
- Optimizer: LAMB, an optimizer designed for large-batch training; its adaptive element-wise updates and layer-wise learning rates enhance training efficiency and accelerate convergence.
- FLIP: randomly mask 50% of the image tokens during training, cutting the time complexity roughly in half.
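The FLIP-style token dropping above can be sketched in a few lines (a minimal illustration with assumed names, not the paper's implementation; real code would index into a tensor of patch embeddings):

```python
import random

def mask_image_tokens(tokens, keep_ratio=0.5):
    """Randomly keep a fraction of image tokens (FLIP-style masking sketch).

    tokens: a sequence of per-patch token embeddings.
    Returns the kept tokens and their original positions.
    """
    n = len(tokens)
    n_keep = int(n * keep_ratio)
    keep_idx = sorted(random.sample(range(n), n_keep))  # random subset, order preserved
    return [tokens[i] for i in keep_idx], keep_idx

# A 196-token sequence (14x14 patches) shrinks to 98 tokens,
# roughly halving the cost of every attention layer.
tokens = list(range(196))
kept, idx = mask_image_tokens(tokens, keep_ratio=0.5)
print(len(kept))  # 98
```

Because attention cost grows with sequence length, dropping half the image tokens roughly halves the per-step training time while, per the paper, barely hurting the learned representation.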
3. Experiments
Settings
- Datasets: 2 billion samples = 1.6 billion from LAION-2B + 0.4 billion from COYO-700M
- The experimental results are clear at a glance.
🐤 Paper link: https://arxiv.org/pdf/2303.15389.pdf
4. Usage Tutorial
Install the library
pip install open_clip_torch
List the available pretrained models
>>> import open_clip
>>> open_clip.list_pretrained()
🐤 In my own tests, EVA-CLIP performs noticeably better than the original CLIP model!
🐤 You can also visit this link to see which models are supported:
https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv
Run
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
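The last three lines inside the `with` block implement CLIP's zero-shot scoring: L2-normalize both embeddings, take their cosine similarities, scale by 100.0 (the temperature), and softmax over the candidate texts. A dependency-free sketch of the same arithmetic, using made-up 3-d embeddings instead of real model outputs:

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy embeddings (3-d instead of the model's feature dim); values are illustrative only.
image = l2_normalize([0.9, 0.1, 0.2])
texts = [l2_normalize(t) for t in ([0.8, 0.2, 0.1], [0.1, 0.9, 0.3], [0.2, 0.1, 0.9])]

# Cosine similarity scaled by the temperature (100.0), then softmax over the texts.
logits = [100.0 * sum(i * t for i, t in zip(image, tv)) for tv in texts]
probs = softmax(logits)
print(probs)  # the first text dominates, mirroring the [[1., 0., 0.]] output above
```

The large temperature sharpens the softmax, which is why the real example prints probabilities so close to one-hot.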
This tutorial uses the open_clip library, which is very easy to work with. Official repository: https://github.com/mlfoundations/open_clip
If you have any questions, please leave a comment~ 🎶