Latest recommended article published on 2024-01-18 23:25:36

SEUsmith

Column: Master's Thesis Study Notes. Tags: deep learning, artificial intelligence, machine learning

Article link: https://blog.csdn.net/yuanshi985/article/details/135536838

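Summary (automatically generated by CSDN): The author describes the multimodal data alignment problem encountered while researching data lake search, and shows how to compute text-image similarity in code using the CLIP model and SentenceTransformer while handling mismatched tensor dimensions and device placement in PyTorch.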

CLIP Study Notes No. 01 (Image Retrieval Case Study: Tensor Alignment)

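Recently my advisor suggested that I study datasets related to data lake search, moving from tables, a single kind of structured data, to multimodal data, so I began surveying multimodal models.
While reading papers, I started working through a hands-on demo today and immediately ran into a core pain point of the multimodal field: how to align data from different modalities into the same space.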

code

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel
from sentence_transformers import SentenceTransformer
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:32"
# Local copy of the CLIP checkpoint (not used below); a raw string keeps the backslashes from being read as escapes
roberta_path = r'D:\DL_code\starmie-main\starmie-main\models\clip-vit-base-patch32'
# Load the CLIP model and its processor
model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')

# Load the SentenceTransformer model that turns the text description into a vector
sentence_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Text query
text_query = "The two children glided happily on the skateboard."

# Encode the text description as a vector of shape (1, 384)
text_vector = sentence_model.encode([text_query], convert_to_tensor=True)

# Load a local image and preprocess it
image_path = "img/2090545563_a4e66ec76b.jpg"
image = Image.open(image_path)
image_input = processor(images=image, return_tensors="pt")

# Get the image representation vector, shape (1, 512)
image_features = model.get_image_features(**image_input)

# Resize the text vector to the image vector's length: (1, 384) -> (1, 1, 384) -> (1, 1, 512) -> (1, 512)
# NOTE: this only makes the shapes match. The two vectors come from different models,
# so the resulting score is not a meaningful semantic similarity.
text_vector_resized = torch.nn.functional.interpolate(text_vector.unsqueeze(0), size=image_features.shape[-1], mode='nearest').squeeze(0)

# Move both tensors to the same device before comparing them
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
text_vector_resized = text_vector_resized.to(device)
image_features = image_features.to(device)

# Cosine similarity between the text vector and the image vector
similarity_score = torch.nn.functional.cosine_similarity(text_vector_resized, image_features)

print(f"Similarity Score: {similarity_score.item()}")

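Problems and Solutions

The code kept raising an error saying that the dimensions of the two tensors did not match: paraphrase-MiniLM-L6-v2 produces 384-dimensional sentence vectors, while the image features of clip-vit-base-patch32 are 512-dimensional.
So I resized the text vector to the same dimension as the image vector, with the following code: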

text_vector_resized =  torch.nn.functional.interpolate(text_vector.unsqueeze(0), size=image_features.shape[-1], mode='nearest').squeeze(0)
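
To see what this call does to the values, here is a minimal pure-Python sketch of nearest-neighbour resampling. The helper name nearest_resample is hypothetical, and the index rule (floor of the scaled position) is the usual nearest-neighbour convention; it is meant as an illustration, not a bit-exact reimplementation of torch.nn.functional.interpolate.

```python
def nearest_resample(values, new_size):
    """Stretch or shrink a 1-D sequence by copying the nearest source element."""
    old_size = len(values)
    if old_size == 0 or new_size <= 0:
        raise ValueError("both sizes must be positive")
    # Output position i maps to source position floor(i * old_size / new_size).
    return [values[min(i * old_size // new_size, old_size - 1)]
            for i in range(new_size)]


short_vector = [0.1, 0.2, 0.3]
stretched = nearest_resample(short_vector, 6)
print(stretched)  # every source value is simply repeated
```

Going from 384 to 512 entries in this way only repeats some coordinates. No new information is added, and the stretched vector still belongs to the sentence model's embedding space rather than CLIP's, so matching the shapes does not by itself align the two modalities.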

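After that, another error said the tensors were not on the same device (SentenceTransformer picks the GPU automatically when one is available, while the CLIP model loaded above stays on the CPU), so I changed the code again: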

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
text_vector_resized = text_vector_resized.to(device)
image_features = image_features.to(device)

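This moves the data so that both embedding vectors are on the same device, either both on the GPU or both on the CPU, before the similarity is computed.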
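
Both errors can also be avoided at the source by embedding the text with CLIP's own text encoder, so that text and image vectors share one space, one dimension, and one device. The sketch below is a hedged illustration, not the method used in this post: clip_text_image_score and cosine_similarity are hypothetical helper names, and it assumes that get_text_features and get_image_features return plain tensors of shape (1, projection_dim), as they do in the transformers versions I am aware of.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(u) != len(v):
        raise ValueError(f"dimension mismatch: {len(u)} vs {len(v)}")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def clip_text_image_score(text, image_path,
                          model_name="openai/clip-vit-base-patch32"):
    """Hypothetical helper: embed text and image with the same CLIP checkpoint."""
    # Heavy dependencies are imported lazily so the pure-Python helper above
    # stays usable without them.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = CLIPModel.from_pretrained(model_name).to(device)
    processor = CLIPProcessor.from_pretrained(model_name)

    # The processor tokenizes the text and preprocesses the image together;
    # moving its output to `device` keeps inputs and model on the same device.
    inputs = processor(text=[text], images=Image.open(image_path),
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        text_features = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"])
        image_features = model.get_image_features(
            pixel_values=inputs["pixel_values"])
    # Assumption: both results are tensors of shape (1, projection_dim).
    return cosine_similarity(text_features[0].tolist(),
                             image_features[0].tolist())
```

Calling clip_text_image_score("The two children glided happily on the skateboard.", "img/2090545563_a4e66ec76b.jpg") would download the checkpoint on first use and return a single float. Because both vectors come from the same model, no interpolation step is needed.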

Fixing PyTorch tensor size mismatch errors

Resizing a tensor directly

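import torch
a = torch.randn(300, 100)
b = torch.randn(512, 100)
# reshape must keep the element count (300 * 100 = 30000), so (100, 300) is valid while (512, 300) is not
# Note: reshape reinterprets the element order; use a.T if a true transpose is intended
a = a.reshape(100, 300)
# Alternative: view gives the same shape without copying, but requires contiguous memory
a = a.view(100, 300)
# Matrix product: (512, 100) @ (100, 300) -> (512, 300)
c = torch.matmul(b, a)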
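
The two shape rules behind this kind of error can be checked without PyTorch. The sketch below uses hypothetical helper names (can_reshape, matmul_shape) and covers only the 2-D case: a reshape must preserve the total number of elements, and a matrix product needs the inner dimensions to agree. A 300 x 100 tensor holds 30000 elements, so it cannot become 512 x 300.

```python
from math import prod


def can_reshape(old_shape, new_shape):
    """reshape/view keep every element, so the element counts must be equal."""
    return prod(old_shape) == prod(new_shape)


def matmul_shape(left, right):
    """Result shape of a 2-D matrix product, or ValueError if the inner sizes differ."""
    (rows, inner_left), (inner_right, cols) = left, right
    if inner_left != inner_right:
        raise ValueError(f"inner dimensions differ: {inner_left} vs {inner_right}")
    return (rows, cols)


print(can_reshape((300, 100), (512, 300)))   # False: 30000 vs 153600 elements
print(can_reshape((300, 100), (100, 300)))   # True: both hold 30000 elements
print(matmul_shape((512, 100), (100, 300)))  # (512, 300)
```

Checking shapes this way before calling torch.matmul makes it clear which operand needs reshaping or transposing.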

Expanding tensor dimensions

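import torch
a = torch.randn(707)
b = torch.randn(512, 707)

# Add a trailing dimension: (707,) -> (707, 1)
a = a.unsqueeze(1)
# expand broadcasts size-1 dimensions without copying; the shape is already (707, 1), so this is a no-op here
a = a.expand(707, 1)
# Matrix product: (512, 707) @ (707, 1) -> (512, 1)
c = torch.matmul(b, a)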

In addition, slicing and indexing can be used to select part of a tensor's data for the computation.
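
As a minimal sketch of the slicing idea, the hypothetical helper below cuts two vectors to the length of the shorter one. It uses plain Python lists so it runs anywhere; the same slice syntax works on PyTorch tensors along a chosen dimension, for example x[:, :n].

```python
def truncate_to_common_length(u, v):
    """Slice both sequences to the length of the shorter one."""
    n = min(len(u), len(v))
    return u[:n], v[:n]


text_vec = [0.5, 0.1, 0.4]
image_vec = [0.2, 0.3, 0.9, 0.7, 0.6]
text_cut, image_cut = truncate_to_common_length(text_vec, image_vec)
print(len(text_cut), len(image_cut))  # 3 3
```

Truncation discards the trailing coordinates, so like interpolation it fixes the shape error only; it does not make embeddings from different models comparable.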
