[Data] Paper: "Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models"

宇宙计算机

Last modified 2024-08-13 15:57:47


Tags: language models, artificial intelligence, natural language processing

First published 2024-08-12 22:59:55

Article link: https://blog.csdn.net/weixin_44151034/article/details/141142715


Data accompanying the paper "Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models"


Where the paper's authors put the data:

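The object weights generated by the authors are located here:
Opening it shows a file directory like the following:

That is, a directory like this:

The unexpanded spatial_imgs folder contains only the bounding-box coverage information for the images. (I have not fully understood this part; if you have a more detailed understanding, please leave a comment.)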

Data walkthrough:

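Weight files unrelated to the data

The ViT-B_32_spatial_logits.pth file in the vg root directory.
The ViT-B_32_clip_obj_feature_dict.pth file in the clip_obj_feature folder.
The ViT-B_32_box_proposal_dict.pth and ViT-B_32_spatial_name.pth files in the proposal folder.

The files at these three locations are really just CLIP or ViT weights, so there is no need to load and study them; we only need to look at the data-specific files.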

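Files related to the data:

Four files are data-related: des_prompts.pth, des_weight.npy, obj_valid_tris.npy, and sub_valid_tris.npy.

The des_prompts.pth file defines, for each relation, the possible attributes of its sub (subject), obj (object), pos (position), and size.

Printing the dictionary's keys gives the following: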

import torch
# Raw string, so the backslashes in the Windows path are not read as escape sequences
state_dict = torch.load(r"C:\code\read_pth\recode\vg\des_prompts.pth")
print(state_dict.keys())

dict_keys(['carrying_product_human', 'carrying_product_animal', 'carrying_animal_animal', 'carrying_human_human', 'carrying_human_animal', 'carrying_animal_human', 'carrying_animal_product', 'carrying_human_product', 'carrying_product_product', 'covered in_product_animal', 'covered in_product_product', 'covered in_animal_product', 'covering_product_product', 'covering_human_product', 'covering_product_human', 'eating_product_animal', 'eating_animal_animal', 'eating_animal_product', 'eating_product_product', 'flying in_animal_animal', 'flying in_product_product', 'growing on_product_product', 'hanging from_product_product', 'hanging from_animal_product', 'hanging from_human_product', 'hanging from_product_human', 'holding_product_human', 'holding_animal_human', 'holding_human_human', 'holding_human_animal', 'holding_human_product', 'laying on_animal_human', 'laying on_animal_animal', 'laying on_human_animal', 'laying on_human_human', 'laying on_product_product', 'looking at_human_human', 'looking at_animal_human', 'looking at_human_animal', 'looking at_animal_animal', 'looking at_animal_product', 'mounted on_product_product', 'parked on_product_product', 'playing_animal_animal', 'playing_human_human', 'riding_animal_animal', 'riding_human_human', 'riding_product_product', 'says_human_human', 'says_product_product', 'sitting on_product_human', 'sitting on_animal_animal', 'sitting on_human_animal', 'sitting on_animal_human', 'sitting on_human_human', 'sitting on_product_product', 'covered in_human_product', 'covering_product_animal', 'eating_human_product', 'eating_human_animal', 'flying in_animal_product', 'growing on_product_animal', 'growing on_product_human', 'hanging from_product_animal', 'holding_animal_product', 'holding_animal_animal', 'holding_product_animal', 'holding_product_product', 'laying on_human_product', 'laying on_animal_product', 'looking at_human_product', 'looking at_product_animal', 'looking at_product_product', 'lying on_human_product', 'lying on_animal_product', 'lying on_human_animal', 'lying on_animal_animal', 'lying on_animal_human', 'lying on_human_human', 'lying on_product_product', 'mounted on_product_human', 'mounted on_product_animal', 'mounted on_human_animal', 'painted on_human_product', 'painted on_product_product', 'painted on_animal_product', 'playing_human_animal', 'playing_human_product', 'playing_animal_product', 'playing_product_product', 'riding_human_product', 'riding_animal_product', 'riding_human_animal', 'says_human_animal', 'says_human_product', 'sitting on_human_product', 'sitting on_animal_product', 'standing on_human_product', 'standing on_animal_product', 'standing on_animal_animal', 'standing on_human_animal', 'standing on_product_product', 'to_animal_product', 'to_human_human', 'to_product_animal', 'to_product_product', 'using_human_product', 'using_animal_product', 'using_human_human', 'using_product_product', 'walking in_human_product', 'walking in_animal_product', 'walking in_animal_animal', 'walking on_human_product', 'walking on_animal_product', 'walking on_animal_animal', 'walking on_product_human', 'watching_human_product', 'watching_human_animal', 'watching_human_human', 'watching_animal_product', 'watching_animal_human', 'standing on_human_human', 'to_human_product', 'to_product_human', 'walking in_human_human', 'walking on_product_product', 'watching_animal_animal', 'watching_product_animal'])
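
The keys printed above appear to follow a `relation_type_type` pattern: the relation may contain spaces (for example `covered in`), and the two trailing fields are each one of `human`, `animal`, or `product`. As a minimal sketch, the keys can be split from the right and grouped by relation. The helper name `split_prompt_key` is hypothetical, and reading the two trailing fields as subject type followed by object type is an assumption that the file itself does not confirm.

```python
from collections import defaultdict


def split_prompt_key(key):
    """Split a key such as 'covered in_product_animal' into three fields.

    Splitting from the right keeps relations that contain spaces intact.
    ASSUMPTION: the two trailing fields are (subject type, object type).
    """
    relation, sub_type, obj_type = key.rsplit("_", 2)
    return relation, sub_type, obj_type


# A few keys copied from the printed dict_keys output.
sample_keys = [
    "carrying_product_human",
    "covered in_product_animal",
    "lying on_animal_product",
    "says_human_human",
]

# Group the (subject type, object type) pairs under each relation.
by_relation = defaultdict(list)
for key in sample_keys:
    relation, sub_type, obj_type = split_prompt_key(key)
    by_relation[relation].append((sub_type, obj_type))

for relation, pairs in by_relation.items():
    print(relation, pairs)
```

With the real file, `sample_keys` would be replaced by `state_dict.keys()`.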

The des_weight.npy file defines the weights for each of the keys above; each key maps to an array of three values:

Code used:

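import numpy as np
# Raw string for the Windows path; allow_pickle=True is needed because the file stores a pickled dict
np_des_weight = np.load(r'C:\code\read_pth\recode\vg\des_weight.npy', allow_pickle=True)
print(np_des_weight)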

The output is as follows:

{'painted on_product_product': array([0.5, 0.4, 0.1]), 'to_human_product': array([0.4, 0.4, 0.2]), 'carrying_product_human': array([0.6, 0.2, 0.2]), 'carrying_product_animal': array([0.4, 0.4, 0.2]), 'carrying_animal_animal': array([0.5, 0.4, 0.1]), 'carrying_human_human': array([0.35, 0.35, 0.3 ]), 'carrying_human_animal': array([0.6, 0.3, 0.1]), 'carrying_animal_human': array([0.3, 0.4, 0.3]), 'carrying_animal_product': array([0.2, 0.6, 0.2]), 'carrying_human_product': array([0.4, 0.4, 0.2]), 'carrying_product_product': array([0.4, 0.4, 0.2]), 'covered in_product_animal': array([0.5, 0.4, 0.1]), 'covered in_product_product': array([0.4, 0.4, 0.2]), 'covered in_animal_product': array([0.5, 0.3, 0.2]), 'covering_product_product': array([0.4, 0.4, 0.2]), 'covering_human_product': array([0.2, 0.2, 0.6]), 'covering_product_human': array([0.5, 0.3, 0.2]), 'eating_product_animal': array([0.4, 0.4, 0.2]), 'eating_animal_animal': array([0.4, 0.5, 0.1]), 'eating_animal_product': array([0.4, 0.4, 0.2]), 'eating_product_product': array([0.5, 0.3, 0.2]), 'flying in_animal_animal': array([0.5, 0.5, 0. ]), 'flying in_product_product': array([0.5 , 0.35, 0.15]), 'growing on_product_product': array([0.15, 0.7 , 0.15]), 'hanging from_product_product': array([0.5, 0.3, 0.2]), 'hanging from_animal_product': array([0.4, 0.4, 0.2]), 'hanging from_human_product': array([0.3, 0.3, 0.4]), 'hanging from_product_human': array([0.5, 0.3, 0.2]), 'holding_product_human': array([0.4, 0.4, 0.2]), 'holding_animal_human': array([0.4, 0.4, 0.2]), 'holding_human_human': array([0.4, 0.4, 0.2]), 'holding_human_animal': array([0.6, 0.2, 0.2]), 'holding_human_product': array([0.4, 0.4, 0.2]), 'laying on_animal_human': array([0.2, 0.2, 0.6]), 'laying on_animal_animal': array([0.2, 0.3, 0.5]), 'laying on_human_animal': array([0.2, 0.3, 0.5]), 'laying on_human_human': array([0.1, 0.1, 0.8]), 'laying on_product_product': array([0.2, 0.2, 0.6]), 'looking at_human_human': array([0.3, 0.3, 0.4]), 'looking at_animal_human': array([0.3, 0.4, 0.3]), 'looking at_human_animal': array([0.5, 0.4, 0.1]), 'looking at_animal_animal': array([0.5, 0.5, 0. ]), 'looking at_animal_product': array([0.3, 0.5, 0.2]), 'mounted on_product_product': array([0.35, 0.35, 0.3 ]), 'parked on_product_product': array([0.5, 0.3, 0.2]), 'playing_animal_animal': array([0.4, 0.4, 0.2]), 'playing_human_human': array([0.4, 0.4, 0.2]), 'riding_animal_animal': array([0.4, 0.4, 0.2]), 'riding_human_human': array([0.6, 0.2, 0.2]), 'riding_product_product': array([0.3, 0.3, 0.4]), 'says_human_human': array([0.4, 0.4, 0.2]), 'says_product_product': array([0.5 , 0.35, 0.15]), 'sitting on_product_human': array([0.3, 0.3, 0.4]), 'sitting on_animal_animal': array([0.6, 0.2, 0.2]), 'sitting on_human_animal': array([0.4, 0.4, 0.2]), 'sitting on_animal_human': array([0.2, 0.5, 0.3]), 'sitting on_human_human': array([0.1, 0.1, 0.8]), 'sitting on_product_product': array([0.5, 0.4, 0.1]), 'covered in_human_product': array([0.5, 0.4, 0.1]), 'covering_product_animal': array([0.5, 0.4, 0.1]), 'eating_human_product': array([0.45, 0.45, 0.1 ]), 'eating_human_animal': array([0.3, 0.5, 0.2]), 'flying in_animal_product': array([0.7, 0.1, 0.2]), 'growing on_product_animal': array([0.6, 0.3, 0.1]), 'growing on_product_human': array([0.4, 0.4, 0.2]), 'hanging from_product_animal': array([0.5, 0.4, 0.1]), 'holding_animal_product': array([0.4, 0.4, 0.2]), 'holding_animal_animal': array([0.4, 0.4, 0.2]), 'holding_product_animal': array([0.5, 0.4, 0.1]), 'holding_product_product': array([0.4, 0.4, 0.2]), 'laying on_human_product': array([0.2, 0.2, 0.6]), 'laying on_animal_product': array([0.1, 0.1, 0.8]), 'looking at_human_product': array([0.6, 0.2, 0.2]), 'looking at_product_animal': array([0.375, 0.375, 0.25 ]), 'looking at_product_product': array([0.4, 0.4, 0.2]), 'lying on_human_product': array([0.3, 0.3, 0.4]), 'lying on_animal_product': array([0.2, 0.6, 0.2]), 'lying on_human_animal': array([0.6, 0.2, 0.2]), 'lying on_animal_animal': array([0.3, 0.4, 0.3]), 'lying on_animal_human': array([0.5, 0.3, 0.2]), 'lying on_human_human': array([0.2, 0.2, 0.6]), 'lying on_product_product': array([0.2, 0.6, 0.2]), 'mounted on_product_human': array([0.6, 0.3, 0.1]), 'mounted on_product_animal': array([0.4, 0.5, 0.1]), 'mounted on_human_animal': array([0.4, 0.4, 0.2]), 'painted on_human_product': array([0.35, 0.35, 0.3 ]), 'painted on_animal_product': array([0.35, 0.45, 0.2 ]), 'playing_human_animal': array([0.35, 0.35, 0.3 ]), 'playing_human_product': array([0.5, 0.3, 0.2]), 'playing_animal_product': array([0.4, 0.4, 0.2]), 'playing_product_product': array([0.4, 0.4, 0.2]), 'riding_human_product': array([0.5, 0.4, 0.1]), 'riding_animal_product': array([0.2, 0.4, 0.4]), 'riding_human_animal': array([0.3, 0.5, 0.2]), 'says_human_animal': array([0.4, 0.4, 0.2]), 'says_human_product': array([0.4, 0.4, 0.2]), 'sitting on_human_product': array([0.4, 0.4, 0.2]), 'sitting on_animal_product': array([0.45, 0.45, 0.1 ]), 'standing on_human_product': array([0.6, 0.3, 0.1]), 'standing on_animal_product': array([0.6, 0.3, 0.1]), 'standing on_animal_animal': array([0.6, 0.3, 0.1]), 'standing on_human_animal': array([0.4, 0.4, 0.2]), 'to_animal_product': array([0.3, 0.3, 0.4]), 'to_human_human': array([0.1, 0.1, 0.8]), 'to_product_animal': array([0.3, 0.3, 0.4]), 'to_product_product': array([0.4, 0.4, 0.2]), 'using_human_product': array([0.5, 0.3, 0.2]), 'using_animal_product': array([0.4, 0.4, 0.2]), 'using_human_human': array([0.5, 0.4, 0.1]), 'using_product_product': array([0.35, 0.35, 0.3 ]), 'walking in_human_product': array([0.4, 0.4, 0.2]), 'walking in_animal_product': array([0.2, 0.6, 0.2]), 'walking in_animal_animal': array([0.45, 0.45, 0.1 ]), 'walking on_human_product': array([0.4, 0.4, 0.2]), 'walking on_animal_product': array([0.4, 0.4, 0.2]), 'walking on_animal_animal': array([0.4, 0.4, 0.2]), 'walking on_product_human': array([0.6, 0.2, 0.2]), 'watching_human_product': array([0.45, 0.45, 0.1 ]), 'watching_human_animal': array([0.4, 0.4, 0.2]), 'watching_human_human': array([0.5, 0.5, 0. ]), 'watching_animal_product': array([0.4, 0.4, 0.2]), 'watching_animal_human': array([0.5, 0.4, 0.1]), 'standing on_human_human': array([0.5, 0.5, 0. ]), 'to_product_human': array([0.4, 0.4, 0.2]), 'walking in_human_human': array([0.2, 0.2, 0.6]), 'walking on_product_product': array([0.4, 0.4, 0.2]), 'watching_animal_animal': array([0.45, 0.45, 0.1 ]), 'watching_product_animal': array([0.4, 0.4, 0.2]), 'standing on_product_product': array([0.1, 0.1, 0.8])}
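
`np.load(..., allow_pickle=True)` on a pickled dictionary returns a 0-dimensional object array rather than the dictionary itself, so it cannot be indexed by key until it is unwrapped with `.item()`. The sketch below mimics that wrapper with a few entries copied from the output above and checks whether each weight triple sums to 1. The variable names are hypothetical, and the post does not say what the three components stand for, so the code treats them only as a three-element vector.

```python
import numpy as np

# A few entries copied from the printed output above.
sample_weights = {
    "painted on_product_product": np.array([0.5, 0.4, 0.1]),
    "carrying_human_human": np.array([0.35, 0.35, 0.3]),
    "flying in_animal_animal": np.array([0.5, 0.5, 0.0]),
    "looking at_product_animal": np.array([0.375, 0.375, 0.25]),
}

# Mimic what np.load returns for a pickled dict: a 0-d object array.
wrapped = np.empty((), dtype=object)
wrapped[()] = sample_weights

# .item() unwraps the 0-d array and gives back the plain dict.
weights = wrapped.item()

# Sum each triple and check whether it is (numerically) 1.
sums = {key: float(value.sum()) for key, value in weights.items()}
all_normalized = all(np.isclose(total, 1.0) for total in sums.values())

print(weights["carrying_human_human"])
print("all triples sum to 1:", all_normalized)
```

With the real file, `wrapped` would be the array returned by `np.load`, and the same `.item()` call applies.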

obj_valid_tris.npy stores, for each object, whether it is valid for a given relation. Its contents are as follows:

Using the following code:

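import numpy as np
# Raw string for the Windows path; allow_pickle=True is needed because the file stores a pickled dict
obj_valid_tris = np.load(r'C:\code\read_pth\recode\vg\obj_valid_tris.npy', allow_pickle=True)
print(obj_valid_tris)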

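The output is as follows:
(There is a lot of data, and pasting all of it would make the post too long, so only part of it is shown here. Try printing it yourself.)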

{'airplane_carrying': False, 'airplane_covered in': False, 'airplane_covering': True, 'airplane_eating': False, 'airplane_flying in': True, 'airplane_growing on': False, 'airplane_hanging from': True, 'airplane_holding': False, 'airplane_laying on': True, 'airplane_looking at': True, 'airplane_lying on': True, 'airplane_mounted on': True, 'airplane_parked on': False, 'airplane_playing': False, 'airplane_riding': True, 'airplane_says': True, 'airplane_sitting on': True, 'airplane_standing on': True, 'airplane_to': False, 'airplane_using': True, 'airplane_walking in': False, 'airplane_walking on': False, 'airplane_watching': True, 'animal_carrying': True, 'animal_covered in': True, 'animal_covering': True, 'animal_eating': True, 'animal_flying in': True, 'animal_growing on': True, 'animal_hanging from': True, 'animal_holding': True, 'animal_laying on': True, 'animal_looking at': True, 'animal_lying on': True, 'animal_mounted on': True, 'animal_painted on': True, 'animal_parked on': False, 'animal_playing': False, 'animal_says': True, 'animal_sitting on': True, 'animal_standing on': True, 'animal_to': True, 'animal_using': True, 'animal_walking on': False, 'animal_watching': True, 'arm_carrying': False, 'arm_covered in': True, 'arm_covering': True, 'arm_eating': False, 'arm_growing on': True, 'arm_hanging from': True, 'arm_holding': True, 'arm_laying on': True, 'arm_looking at': False, 'arm_mounted on': True, 'arm_painted on': True, 'arm_parked on': False, 'arm_playing': False, 'arm_says': False, 'arm_sitting on': True, 'arm_standing on': True, 'arm_to': False, 'arm_walking in': False, 'arm_walking on': False, 'arm_watching': False, 'bag_carrying': True, 'bag_covered in': True, 'bag_covering': False, 'bag_eating': False, 'bag_growing on': False, 'bag_hanging from': True, 'bag_holding': True, 'bag_laying on': True, 'bag_looking at': True, 'bag_mounted on': False, 'bag_painted on': False, 'bag_parked on': False, 'bag_playing': False,

sub_valid_tris.npy stores, for each subject, whether it is valid for a given relation. Its contents are as follows:

Using the following code:

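import numpy as np
# Raw string for the Windows path; allow_pickle=True is needed because the file stores a pickled dict
sub_valid_tris = np.load(r'C:\code\read_pth\recode\vg\sub_valid_tris.npy', allow_pickle=True)
print(sub_valid_tris)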

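The output is as follows:
(There is a lot of data, and pasting all of it would make the post too long, so only part of it is shown here. Try printing it yourself.)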

{'airplane_carrying': False, 'airplane_covered in': False, 'airplane_covering': True, 'airplane_eating': False, 'airplane_flying in': True, 'airplane_growing on': False, 'airplane_hanging from': False, 'airplane_holding': True, 'airplane_laying on': False, 'airplane_looking at': False, 'airplane_lying on': False, 'airplane_mounted on': True,
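
Both obj_valid_tris.npy and sub_valid_tris.npy print as flat dictionaries whose keys look like `category_relation` and whose values are booleans. As a rough sketch, the valid relations can be collected per category by splitting each key at its first underscore. The function name `valid_relations_by_category` is hypothetical, and the split assumes that category names contain no underscore, which holds for the keys shown in this post but is not verified for the whole file.

```python
from collections import defaultdict

# A few entries copied from the sub_valid_tris output above.
sample_valid = {
    "airplane_carrying": False,
    "airplane_covering": True,
    "airplane_eating": False,
    "airplane_flying in": True,
    "airplane_holding": True,
    "airplane_mounted on": True,
}


def valid_relations_by_category(valid_dict):
    """Return {category: [relations marked True]} from a flat validity dict.

    ASSUMPTION: keys have the form '<category>_<relation>' and the
    category itself contains no underscore.
    """
    grouped = defaultdict(list)
    for key, is_valid in valid_dict.items():
        category, relation = key.split("_", 1)
        if is_valid:
            grouped[category].append(relation)
    return dict(grouped)


grouped = valid_relations_by_category(sample_valid)
print(grouped)
```

With the real files, the loaded 0-d array would first be unwrapped with `.item()` and then passed to the function in place of `sample_valid`.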

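Data download

Original data download (you can get it directly from the authors' official GitHub repository), but I also provide a Baidu Netdisk copy:

It is available on the official GitHub repository.

Baidu Netdisk download link:
File shared via Baidu Netdisk: recode.zip
Link: https://pan.baidu.com/s/1XZxrew0pvDLuCENWQwO1QA
Extraction code: q3pr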

Download of the data I organized:

I used the following code to save the des_prompts.pth file to CSV, TXT, and JSON files.

import torch
import numpy as np
import csv
import json


# Function to save dictionary keys and values to a text file
def save_to_txt(state_dict, filename, delimiter=" "):
    """
    Saves dictionary keys and values (one line per entry) to a text file.

    Args:
        state_dict (dict): The dictionary containing keys and values.
        filename (str): The name of the text file to save to.
        delimiter (str): Separator written between each key and its value.
    """

    with open(filename, 'w') as f:
        for key, value in state_dict.items():
            # Handle potential non-string values by converting them to strings
            key_str = str(key)
            value_str = str(value)
            f.write(f"{key_str}{delimiter}{value_str}\n")

# Function to save dictionary keys and values to a CSV file (uses the standard csv module)
def save_to_csv_file(state_dict, csv_filename):
    """
    Saves the contents of a state_dict (keys and corresponding values)
    to a CSV file in a tabular format.

    Args:
        state_dict (dict): The state_dict to be saved.
        csv_filename (str): The name of the output CSV file.
    """

    # Save to CSV file
    with open(csv_filename, mode='w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # Write the header
        writer.writerow(['key', 'sub', 'obj', 'pos', 'size'])
        # Write the data rows (only the first element of each field is written)
        for key, value in state_dict.items():
            writer.writerow([key] + [value['sub'][0], value['obj'][0], value['pos'][0], value['size'][0]])


# Function to convert numpy arrays in the dictionary to lists
def convert_arrays_to_lists(d):
    for key, value in d.items():
        if isinstance(value, dict):
            convert_arrays_to_lists(value)
        elif isinstance(value, np.ndarray):
            d[key] = value.tolist()
    return d


# Function to save dictionary to a compact JSON file
def save_to_json_file(state_dict, json_filename):
    """
    Saves the contents of a state_dict to a JSON file in a compact format.

    Args:
        state_dict (dict): The state_dict to be saved.
        json_filename (str): The name of the output JSON file.
    """
    # Convert numpy arrays to lists
    state_dict = convert_arrays_to_lists(state_dict)

    # Save to JSON file in a compact format
    with open(json_filename, 'w') as json_file:
        json.dump(state_dict, json_file, separators=(',', ':'))

if __name__ == '__main__':

    # Load the state_dict
    state_dict = torch.load(r"C:\code\read_pth\recode\vg\des_prompts.pth")

    # Print the keys for reference (optional)
    print(state_dict.keys())  # This can be helpful for debugging or verification

    text_filename = "state_dict_output.txt"

    # Save the state_dict to a file
    save_to_txt(state_dict, text_filename)  # Replace with your desired filename

    print(f"Successfully saved data to text file: {text_filename}")


    # Staying efficient is still the most important thing in research.
    # Save the state_dict to a file
    csv_filename = "state_dict_output.csv"  # Change extension to .csv for clarity
    save_to_csv_file(state_dict, csv_filename)

    print(f"State dict keys and values saved to {csv_filename}")

    json_filename = "state_dict_output.json"
    # Save the state_dict to a JSON file
    save_to_json_file(state_dict, json_filename)
    print(f"State dict keys and values saved to {json_filename}")
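
To check that the exported JSON can be read back unchanged, a small round trip is enough. The sketch below uses only the standard library and a single hypothetical entry: the field names `sub`, `obj`, `pos`, and `size` match what `save_to_csv_file` reads, but the strings are placeholders, not real content from des_prompts.pth, and the file is written to a temporary directory rather than the working directory.

```python
import json
import os
import tempfile

# Hypothetical entry: the field names match the script above, but the
# values are placeholders, not real content from des_prompts.pth.
example = {
    "riding_human_animal": {
        "sub": ["<subject cue>"],
        "obj": ["<object cue>"],
        "pos": ["<position cue>"],
        "size": ["<size cue>"],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state_dict_output.json")

    # Write in the same compact format as save_to_json_file.
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(example, json_file, separators=(",", ":"))

    # Read the file back and compare with the original dictionary.
    with open(path, "r", encoding="utf-8") as json_file:
        restored = json.load(json_file)

round_trip_ok = restored == example
print("round trip ok:", round_trip_ok)
```

Unlike the CSV export, which writes only the first element of each field, the JSON file keeps every element of each list.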