DA-CLIP: Code Notes on Generating the Dataset with BLIP

lytoo0n

Last modified: 2024-03-27 17:52:48

Category: DA-CLIP. Tags: python, artificial intelligence, computer vision, deep learning

First published: 2024-03-21 21:58:37

Article link: https://blog.csdn.net/m0_60350022/article/details/136918987

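Background:

BLIP:

What DA-CLIP needs:

To train DA-CLIP on the mixed-degradation dataset, the authors use the bootstrapped vision-language framework BLIP to generate captions for all HQ images.

Because the captions are generated from HQ images, they are accurate and carry no degradation information. These clean captions, the LQ images, and the corresponding degradation types can then be combined directly to build image-text-degradation type pairs.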
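
As a minimal sketch of that pairing, the snippet below joins one clean caption with a degradation type, using the `caption: degradation` title format that appears in DA-CLIP's generate_captions.py (quoted later in this post). The helper name `build_record`, the sample caption, and the sample path are hypothetical and only serve the illustration.

```python
def build_record(caption, lq_path, degradation):
    """Pair an LQ image path with a clean caption plus its degradation type."""
    # The caption comes from the HQ image, so it describes content only;
    # the degradation type is appended as a separate, explicit label.
    return {"filepath": lq_path, "title": f"{caption}: {degradation}"}


record = build_record(
    caption="a dog running on the grass",                 # hypothetical BLIP caption
    lq_path="datasets/universal/train/hazy/LQ/0001.png",  # hypothetical path
    degradation="hazy",
)
print(record["title"])
```

The record stores the LQ path rather than the HQ path, so the text describes what the degraded image should look like once restored, while the label names what is wrong with it.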

BLIP open-source demo

The readme.md in BLIP's open-source GitHub repository links to a simple Colab test notebook; just open it. The Hugging Face Web demo does not work.

Link: https://colab.research.google.com/github/salesforce/BLIP/blob/main/demo.ipynb

Code

# install requirements
import sys
if 'google.colab' in sys.modules:
    print('Running in Colab.')
    !pip3 install transformers==4.15.0 timm==0.4.12 fairscale==0.4.4
    !git clone https://github.com/salesforce/BLIP
    %cd BLIP

Load the test image and preprocess it.

from PIL import Image
import requests
import torch
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def load_demo_image(image_size,device):
    img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg' 
    raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')   

    w,h = raw_image.size
    display(raw_image.resize((w//5,h//5)))
    
    transform = transforms.Compose([
        transforms.Resize((image_size,image_size),interpolation=InterpolationMode.BICUBIC),
        transforms.ToTensor(),
        transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
        ]) 
    image = transform(raw_image).unsqueeze(0).to(device)   
    return image

Load the model and generate the image caption.


from models.blip import blip_decoder

image_size = 384
image = load_demo_image(image_size=image_size, device=device)

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth'
    
model = blip_decoder(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

with torch.no_grad():
    # beam search
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5) 
    # nucleus sampling
    # caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5) 
    print('caption: '+caption[0])

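model.generate is the method the BLIP (Bootstrapping Language-Image Pre-training) model exposes for image captioning. It takes several parameters that control how the caption is generated. The parameters used in the code above are:

image: the input image to caption. It should be a preprocessed tensor that has already been moved to the target device (such as the GPU).
sample: a boolean that selects the decoding strategy. With sample=False, beam search is used: at each step the decoder keeps the num_beams highest-scoring partial captions instead of committing to the single most likely next word (that would be greedy decoding, which beam search reduces to when num_beams=1). With sample=True, a sampling method is used, here nucleus (top-p) sampling.
num_beams: the beam width for beam search, i.e. how many candidate captions are tracked in parallel. A larger beam explores more candidates and can find a higher-probability caption, but it increases the compute cost.
max_length: the maximum length of the generated sequence, counted in tokens rather than words. Generation stops once this length is reached, so a caption that would run longer is cut off.
min_length: the minimum length of the generated sequence. The end-of-sequence token is suppressed until this length is reached, so decoding continues until the requirement is met.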
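
To make the top_p parameter concrete, here is a toy sketch of the filtering step behind nucleus sampling: keep the smallest set of most likely tokens whose cumulative probability reaches top_p, renormalise, and sample from that set. It is a plain-Python illustration with made-up probabilities, not the code that BLIP or the transformers library actually runs; `top_p_filter` is a hypothetical helper.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    # Renormalise so the surviving probabilities sum to 1.
    return {token: p / total for token, p in kept}


# Made-up next-token distribution, for illustration only.
next_token_probs = {"dog": 0.5, "cat": 0.3, "bird": 0.15, "fish": 0.05}
nucleus = top_p_filter(next_token_probs, top_p=0.9)
print(sorted(nucleus))  # tokens that survive the filter

# Sample the next token from the renormalised nucleus.
token = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(token)
```

With top_p=0.9 the unlikely tail ("fish" here) is never sampled, which is why nucleus sampling gives varied captions without drifting into very improbable words.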

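Problems

Problem 1: installing the dependencies takes a while, because more than 1 GB of timm dependencies has to be downloaded.
Problem 2: a version error.

Changing the pin to transformers==4.16.0 fixes it.

The install then succeeds and Colab prompts for a session restart. After restarting, run everything again.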

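Result

I wanted to swap in my own test image. Because the demo reads the image from img_url, it can only load images from an external link, which is how I learned about image hosting services.

A quick search turned up 聚合图床 (superbed.cn), which offers free, unlimited image uploads.

Upload a test image.

Change the img_url path and run again.

Haha, that was fun.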

DA-CLIP ships a related .py script

Create dataset:

To generate clean captions with BLIP, we use the clip-interrogator tool. Install it with pip install clip-interrogator==0.6.0 and run:
python ../scripts/generate_captions.py
Then you will get daclip_train.csv and daclip_val.csv under the datasets/universal directory

pip install clip-interrogator==0.6.0

Run ../scripts/generate_captions.py

Locate that file.

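Code walkthrough

The excerpts below are parts of one script and need the following imports:

import os

import pandas as pd
from PIL import Image
from tqdm import tqdm
from clip_interrogator import Config, Interrogator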

is_image_file: this function checks whether a given file name ends with one of the extensions in the IMG_EXTENSIONS list, and so decides whether it is an image file.

# '.tif' is written with its leading dot so that names merely ending in "tif" do not match
IMG_EXTENSIONS = ['.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP', '.tif']
def is_image_file(filename):
    return any(filename.endswith(extension) for extension in IMG_EXTENSIONS)
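
The list above matches extensions case-sensitively, so a name such as photo.Jpg would be skipped. A possible alternative, sketched below, compares the lower-cased extension against a set. The names `is_image_file_ci` and `IMG_EXTENSIONS_LOWER` are hypothetical and not part of DA-CLIP.

```python
import os

# Lower-case extensions only; matching is done on the lower-cased suffix.
IMG_EXTENSIONS_LOWER = {'.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.tif'}


def is_image_file_ci(filename):
    """Return True if the file's extension, ignoring case, is a known image type."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in IMG_EXTENSIONS_LOWER


print(is_image_file_ci('photo.JpG'))  # mixed-case extension
print(is_image_file_ci('motif'))      # no extension at all
```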

_get_paths_from_images: this function takes a directory path, walks all of its subdirectories recursively, collects the path of every image file, and returns them in a list.

def _get_paths_from_images(path):
    '''get image path list from image folder'''
    assert os.path.isdir(path), '{:s} is not a valid directory'.format(path)
    images = []
    for dirpath, _, fnames in sorted(os.walk(path)):
        for fname in sorted(fnames):
            if is_image_file(fname):
                img_path = os.path.join(dirpath, fname)
                images.append(img_path)
    assert images, '{:s} has no valid image file'.format(path)
    return images

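get_paired_paths: this function reads the ground-truth (GT, high-quality) and low-quality (LQ) image pairs for every degradation type. It relies on the sorted file names to line the pairs up, so the GT and LQ folders must follow a consistent naming convention.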


def get_paired_paths(dataroot):
    """
    Read LQ (Low Quality) and GT image pairs.
    The pair is ensured by 'sorted' function, so please check the name convention.
    """
    GT_paths, LQ_paths, dagradations = [], [], []
    for deg_type in DEGRADATION_TYPES:
        paths1 = _get_paths_from_images(os.path.join(dataroot, deg_type, 'GT'))
        paths2 = _get_paths_from_images(os.path.join(dataroot, deg_type, 'LQ'))

        GT_paths.extend(paths1)  # GT list
        LQ_paths.extend(paths2)  # LQ list

        dagradations.extend([deg_type]*len(paths2))
    print(f'GT length: {len(GT_paths)}, LQ length: {len(LQ_paths)}')
    return GT_paths, LQ_paths, dagradations
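
Because the pairing depends only on sort order, a missing or misnamed file shifts every later pair without raising an error. The sketch below is one way to check the lists before captioning. It assumes that the GT and LQ files of a pair share the same file name stem, which is an assumption about the dataset layout and not something the script enforces; `check_pairs` is a hypothetical helper.

```python
import os


def check_pairs(gt_paths, lq_paths):
    """Return (index, gt, lq) for every position whose file name stems differ."""
    if len(gt_paths) != len(lq_paths):
        raise ValueError(f'GT has {len(gt_paths)} files but LQ has {len(lq_paths)}')
    mismatches = []
    for i, (gt, lq) in enumerate(zip(gt_paths, lq_paths)):
        # Compare names without directory and extension (assumed convention).
        gt_stem = os.path.splitext(os.path.basename(gt))[0]
        lq_stem = os.path.splitext(os.path.basename(lq))[0]
        if gt_stem != lq_stem:
            mismatches.append((i, gt, lq))
    return mismatches


# Made-up paths: the second pair is deliberately misaligned.
gt = ['hazy/GT/001.png', 'hazy/GT/002.png']
lq = ['hazy/LQ/001.png', 'hazy/LQ/003.png']
print(check_pairs(gt, lq))
```

If the dataset uses different names for GT and LQ files, the stem comparison would need to be replaced with whatever mapping that dataset defines.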

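generate_captions: this function generates captions for the image pairs of a given mode (the training set 'train' or the validation set 'val').

First, it calls get_paired_paths to get the lists of GT image paths and LQ image paths for the current mode, together with the matching list of degradation types.
It creates a dictionary future_df with two empty lists, "filepath" and "title", to hold the LQ image paths and the generated titles.
It wraps the loop in tqdm to show progress, which helps when processing a large number of images.
Using zip, it iterates over the GT paths, LQ paths, and degradation types together. For each triple:
- It opens the GT image with Image.open and converts it to RGB mode.
- It calls ci.generate_caption to generate a caption for the GT image.
- It joins the caption and the degradation type into the final title.
- It appends the LQ image path and the title to future_df.
Finally, it converts future_df to a pandas DataFrame with pd.DataFrame.from_dict.
It saves the DataFrame as a CSV file named after the mode, i.e. daclip_train.csv or daclip_val.csv, in the directory given by dataroot, using a tab (\t) as the field separator.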

DEGRADATION_TYPES = ['motion-blurry','hazy','jpeg-compressed','low-light','noisy','raindrop','rainy','shadowed','snowy','uncompleted']
def generate_captions(dataroot, ci, mode='train'):
    GT_paths, LQ_paths, dagradations = get_paired_paths(os.path.join(dataroot, mode))

    future_df = {"filepath":[], "title":[]}
    for gt_image_path, lq_image_path, dagradation in tqdm(zip(GT_paths, LQ_paths, dagradations)):
        image = Image.open(gt_image_path).convert('RGB')
        caption = ci.generate_caption(image)
        title = f'{caption}: {dagradation}'

        future_df["filepath"].append(lq_image_path)
        future_df["title"].append(title)

    pd.DataFrame.from_dict(future_df).to_csv(
        os.path.join(dataroot, f"daclip_{mode}.csv"), index=False, sep="\t"
    )
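
To show what the resulting file looks like to a consumer, the sketch below reads a tab-separated file with the same two columns and splits each title back into caption and degradation type. It uses an in-memory stand-in with a made-up row instead of a real daclip_val.csv; `read_daclip_rows` is a hypothetical helper, not part of DA-CLIP.

```python
import csv
import io


def read_daclip_rows(fileobj):
    """Yield (filepath, caption, degradation) from a tab-separated caption file."""
    reader = csv.DictReader(fileobj, delimiter='\t')
    for row in reader:
        # The title is '<caption>: <degradation>'; split on the last ': '
        # so that a colon inside the caption itself is left untouched.
        caption, degradation = row['title'].rsplit(': ', 1)
        yield row['filepath'], caption, degradation


# In-memory stand-in for daclip_val.csv; the row content is made up.
sample = io.StringIO(
    'filepath\ttitle\n'
    'datasets/universal/val/rainy/LQ/0001.png\ta street at night: rainy\n'
)
rows = list(read_daclip_rows(sample))
print(rows)
```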

Main entry point: sets the dataset root and builds the Interrogator with the ViT-L-14 CLIP model.

if __name__ == "__main__":
    dataroot = 'datasets/universal'

    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

    generate_captions(dataroot, ci, 'val')
    generate_captions(dataroot, ci, 'train')