Notes on Avoiding Pitfalls When Installing CLIP

Robot3366

Last modified: 2023-05-16 10:16:44

Tags: python pytorch deep learning

First published: 2023-05-14 11:01:23

Article link: https://blog.csdn.net/Robot3366/article/details/130666425

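I had been wanting to try CLIP's contrastive learning for two weeks. I read the paper first and set up the environment in bits and pieces, but it never worked. The repository at https://github.com/mlfoundations/open_clip has fairly detailed examples and a README, but my network connection was not cooperating. Installation guides found on Chinese sites solved part of the problem, and in the end, at the cost of one burnt meal, I got the environment working. I am writing the steps down for future use.

Basic environment setup

My computer already has many environments, so to avoid disturbing the ones that work I decided to create a new one. I downloaded the project package from the official site and opened it in PyCharm with Open Project. Under File -> Settings -> Project: <project name>, open Python Interpreter and create a new local venv. I standardize on Python 3.9 as the base interpreter for new environments. Right after creation the environment contains only 3 files. On top of this, install the basics: CUDA, PyTorch, and the environment shared with Jupyter Notebook.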
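For readers who prefer not to click through the PyCharm dialogs, the same kind of isolated environment can be sketched with Python's standard `venv` module. This is only an illustration, not the method used in this post: the helper name `create_env` and the directory name in the usage note are assumptions.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its interpreter path."""
    env_dir = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    # On Windows the interpreter lives in Scripts\, elsewhere in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Calling `create_env("clip-venv")` (a placeholder name) should produce a folder that PyCharm can then select as an existing interpreter.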

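PyTorch + CUDA installation

Because I set up environments often, I use the command below for PyTorch + CUDA. The packages are already cached on my machine, so installation is quick; a first-time install spends a long time downloading the wheels. My GPU is a 3080 Ti, which supports torch 1.13.1+cu117. Installing PyTorch is much more troublesome than installing TensorFlow: each version is tied to a specific CUDA version, and these builds cannot be found on the default pip index, so they have to be fetched through the extra index URL. Some older blog posts suggest dropping the trailing --extra-index-url option, which has caused many installation problems. The install only works properly with it.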

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
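After the install finishes, a quick sanity check can confirm which build was actually picked up. The sketch below is an assumption-laden helper, not part of PyTorch: `split_local_version` and `torch_report` are hypothetical names, and the only torch calls used are `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def split_local_version(version):
    """Split '1.13.1+cu117' into ('1.13.1', 'cu117'); the tag is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def torch_report():
    """Return a small dict describing the installed torch, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the script still runs without torch

    public, local = split_local_version(torch.__version__)
    return {
        "installed": True,
        "version": public,
        "build": local,  # e.g. "cu117" for a CUDA 11.7 wheel
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_report())
```

If `build` is not `cu117` or `cuda_available` is `False`, that would suggest the wheel did not come from the extra index, or that the driver does not match.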

Sharing the environment between Jupyter Notebook and PyCharm

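1. Change to the directory that contains PyCharm's venv and run Scripts\activate.bat

2. Install ipykernel: pip install ipykernel

3. Add the environment to Jupyter Notebook: python -m ipykernel install --user --name=test

4. To remove the kernel later: jupyter kernelspec uninstall test (the name must match the one passed to --name)

5. List the registered kernels: jupyter kernelspec list

That completes the basic environment setup.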
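The kernel steps above can also be scripted, which makes it harder to register a kernel under one name and try to remove it under another. The following is a minimal sketch that only builds the command lines; the helper names are hypothetical, and `--display-name` is an optional ipykernel flag not used in the steps above.

```python
import sys


def kernel_install_command(name, display_name=None, python=sys.executable):
    """Build the argv list that registers an interpreter as a Jupyter kernel."""
    cmd = [python, "-m", "ipykernel", "install", "--user", "--name", name]
    if display_name:
        cmd += ["--display-name", display_name]
    return cmd


def kernel_uninstall_command(name):
    """Build the argv list that removes the kernel registered under `name`."""
    return ["jupyter", "kernelspec", "uninstall", name]
```

Run inside the activated venv, something like `subprocess.run(kernel_install_command("test"), check=True)` should have the same effect as step 3.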

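CLIP environment installation

First download the source code from GitHub and place it in the PyCharm project directory. Then, in PyCharm's terminal, run the following command to install the dependencies.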

pip install ftfy regex tqdm
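To confirm that the dependencies landed in the venv that PyCharm and the notebook kernel actually use, a small import check can help. This is a sketch using only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the entries of `names` that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# ftfy, regex and tqdm are the packages installed by the command above.
print("Missing:", missing_packages(["ftfy", "regex", "tqdm"]))
```

An empty list means all three can be imported from the current interpreter.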

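Here is the first pitfall. CLIP cannot be installed with a plain pip install clip; as far as I can tell, the package of that name on PyPI is not OpenAI's CLIP. Install it from the GitHub repository instead: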

pip install git+https://github.com/openai/CLIP.git
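If an unrelated `clip` package was installed earlier by mistake, `import clip` can still succeed and the failure only shows up later. One way to catch this is a heuristic check for the functions that OpenAI's package exports (`load`, `tokenize`, `available_models`). The helper names below are hypothetical, and the check is a rough sketch rather than a guarantee.

```python
import importlib
import importlib.util


def looks_like_openai_clip(module):
    """Heuristic: the module exposes callable load, tokenize and available_models."""
    return all(
        callable(getattr(module, name, None))
        for name in ("load", "tokenize", "available_models")
    )


def check_installed_clip():
    """Return True or False for the installed `clip`, or None if none is installed."""
    if importlib.util.find_spec("clip") is None:
        return None
    return looks_like_openai_clip(importlib.import_module("clip"))
```

If `check_installed_clip()` returns `False`, uninstalling the wrong package with `pip uninstall clip` and reinstalling from GitHub is probably the simplest fix.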

Next, install the ipywidgets dependency:

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

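That completes the installation. To test it, run the code below, which downloads the required model file automatically. This is another pitfall: because of network conditions in China, the usual approach is to download the model file first and then run the program, and I wasted a lot of effort hunting for the model file. In fact, the automatic download works directly and is fast.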
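It may help to know where the downloaded weights end up. The sketch below assumes the default cache directory of OpenAI's `clip.load`, which is `~/.cache/clip` unless a `download_root` argument is passed; `cached_model_files` is a hypothetical helper that simply lists the `.pt` files found there.

```python
from pathlib import Path


def cached_model_files(root=None):
    """List the *.pt files under root (assumed default: ~/.cache/clip)."""
    root = Path(root) if root is not None else Path.home() / ".cache" / "clip"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.pt"))
```

After a successful first run, `cached_model_files()` should include a file for the model that was loaded, and later runs reuse it instead of downloading again.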

import torch
import clip
from PIL import Image

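# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Smaller alternative model:
# model, preprocess = clip.load("ViT-B/32", device=device)
model, preprocess = clip.load("ViT-L/14", device=device)

# Preprocess the image and add a batch dimension; tokenize the candidate captions.
image = preprocess(Image.open("dog.png")).unsqueeze(0).to(device)
text = clip.tokenize(["two dogs", "this is a dog", "two dogs on grass", "there are two dogs"]).to(device)
with torch.no_grad():
    # Embeddings for each modality (not needed for the probabilities below).
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Image-to-text similarity logits, converted to probabilities over the captions.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)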
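To make the printed probabilities less of a black box, here is a pure-Python sketch of what the last two lines compute conceptually: a scaled cosine similarity between the image embedding and each caption embedding, followed by a softmax over the captions. The toy vectors and the `logit_scale=100.0` default are assumptions for illustration; the real model uses its own learned scale and high-dimensional embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_probs(image_vec, text_vecs, logit_scale=100.0):
    """Probability of each caption given one image embedding."""
    logits = [logit_scale * cosine(image_vec, t) for t in text_vecs]
    return softmax(logits)
```

Because the softmax runs only over the captions supplied, the probabilities are relative: four near-synonymous captions like the ones above split the mass between them rather than each scoring high.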