Code for the CLIP model

The CLIP GitHub repository provides a few code snippets.

Here is the first snippet:

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    # The forward pass returns image->text and text->image logits
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.9927937  0.00421068 0.00299572]]

Reading this code, you can see that image_features and text_features play no part in the final prediction at all; the probabilities come entirely from the logits returned by model(image, text).
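The reason is that model(image, text) recomputes both sets of features internally and turns them into scaled cosine-similarity logits. The sketch below paraphrases the forward method from openai/CLIP's model.py, written from memory as a standalone function, so treat it as approximate and check the repository for the exact code:

def clip_forward(model, image, text):
    # Roughly what model(image, text) does internally (paraphrase of openai/CLIP's model.py)
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # L2-normalize so the dot product below is a cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Scale by the learned temperature and return both directions of logits
    logit_scale = model.logit_scale.exp()
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    return logits_per_image, logits_per_text

Calling clip_forward(model, image, text) with the variables from the first snippet should reproduce logits_per_image from above.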

The second snippet:

import os
import clip
import torch
from torchvision.datasets import CIFAR100

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device)

# Download the dataset
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

# Prepare the inputs
image, class_id = cifar100[3637]
image_input = preprocess(image).unsqueeze(0).to(device)
text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

# Calculate features
with torch.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)

# Pick the top 5 most similar labels for the image
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(5)

# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{cifar100.classes[index]:>16s}: {100 * value.item():.2f}%")

Here it is easy to see that the final prediction is obtained by multiplying image_features and text_features directly (after L2 normalization) and applying a softmax; the logits are never requested from model(image, text). In other words, all the model's forward pass really does is compute the two sets of features, normalize them, and multiply them together with a matrix multiplication.
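To make the tensor shapes in that matrix multiplication concrete, here is a minimal sketch with dummy tensors. The 512-dimensional embedding size is what ViT-B/32 produces; the tensors here are random, not real CLIP features:

import torch

image_features = torch.randn(1, 512)    # one image embedding (ViT-B/32 outputs 512-d features)
text_features = torch.randn(100, 512)   # one embedding per CIFAR-100 class prompt

# L2-normalize so the dot product becomes a cosine similarity
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(similarity.shape)  # torch.Size([1, 100]): one probability per class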

Finally, a quick verification:

import os
import clip
import torch
from torchvision.datasets import CIFAR100

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"use device : {device}")
model, preprocess = clip.load('ViT-B/32', device)
# Download the dataset; train=False means we load the test split
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

print(cifar100)

image, class_id = cifar100[3637]
image_input = preprocess(image).unsqueeze(0).to(device)
text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)
    logits_per_image, logits_per_text = model(image_input, text_inputs)
    print(logits_per_image.softmax(-1))
# L2-normalize the features so the dot product below is a cosine similarity
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
print(image_features.shape)
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(similarity)
print(similarity.shape)
values, indices = similarity[0].topk(5)

# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{cifar100.classes[index]:>16s}: {100 * value.item():.2f}%")

You can see that the two approaches indeed produce the same predictions.
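One detail worth noting: the hard-coded factor 100.0 in the second snippet matches the model's learned temperature. CLIP stores a learnable logit_scale (as a log value) and clamps it during training so that its exponential stays at most 100, and in the released ViT-B/32 weights it sits at roughly that ceiling, which is why the manual computation reproduces model(image_input, text_inputs). A quick check (my addition, not part of the original snippets):

# logit_scale is stored as a log value; its exp() is the multiplier used in forward()
print(model.logit_scale.exp())  # prints roughly 100.0 for the released ViT-B/32 weights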
