How to Preprocess the Visual Genome Dataset

Visual Genome

Most of the code in this post comes from image generation from scene graphs; it reads best alongside the original source code.

Create download_vg.sh to download the dataset (optional)

Bash script:

VG_DIR=datasets/vg
mkdir -p $VG_DIR

wget https://visualgenome.org/static/data/dataset/objects.json.zip -O $VG_DIR/objects.json.zip
wget https://visualgenome.org/static/data/dataset/attributes.json.zip -O $VG_DIR/attributes.json.zip
wget https://visualgenome.org/static/data/dataset/relationships.json.zip -O $VG_DIR/relationships.json.zip
wget https://visualgenome.org/static/data/dataset/object_alias.txt -O $VG_DIR/object_alias.txt
wget https://visualgenome.org/static/data/dataset/relationship_alias.txt -O $VG_DIR/relationship_alias.txt
wget https://visualgenome.org/static/data/dataset/image_data.json.zip -O $VG_DIR/image_data.json.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O $VG_DIR/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O $VG_DIR/images2.zip

unzip $VG_DIR/objects.json.zip -d $VG_DIR
unzip $VG_DIR/attributes.json.zip -d $VG_DIR
unzip $VG_DIR/relationships.json.zip -d $VG_DIR
unzip $VG_DIR/image_data.json.zip -d $VG_DIR
unzip $VG_DIR/images.zip -d $VG_DIR/images
unzip $VG_DIR/images2.zip -d $VG_DIR/images

Run it:

bash download_vg.sh

image_data.json

The image_data.json file contains metadata for all 108,077 images in the VG dataset, including (but not limited to) each image's width, height, url, and id.

Example of reading it in Python:

import json

##The default path is 'image_data.json'; change it to match your setup
with open('image_data.json', 'r') as f:
	##images is a list of length 108,077
	images = json.load(f)

##Build a dict mapping each image_id to its image record
image_id_to_image = {i['image_id']: i for i in images}

##Print one example to see what an image record looks like
for each in images:
	print('Information of a single image: \n', each)
	break

Output:

Information of a single image: 
 {'width': 800, 'url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg', 'height': 600, 'image_id': 1, 'coco_id': None, 'flickr_id': None}
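Once image_id_to_image is built, any image's metadata can be looked up by its id. A minimal sketch, using only the single record printed above:

```python
# The record for image_id 1, copied from the output above
image_id_to_image = {1: {'width': 800, 'height': 600, 'image_id': 1,
                         'url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg',
                         'coco_id': None, 'flickr_id': None}}

image = image_id_to_image[1]
assert (image['width'], image['height']) == (800, 600)
```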

Create vg_splits.json to split the dataset into train, val, and test

Python code:

import json, random

##108,077 images in total: train (80%) 86,463, validation (10%) 10,807, test (10%) 10,807
nums, train_num, val_num = 108077, 86463, 10807

##Store all image ids in a set
id_store = set(range(1, nums + 1))

##Draw the training image ids from id_store
##(random.sample requires a sequence, not a set, as of Python 3.11)
train_ids = random.sample(sorted(id_store), train_num)

##Compute the remaining ids
id_remain = id_store.difference(train_ids)

##Draw the validation image ids from id_remain
val_ids = random.sample(sorted(id_remain), val_num)

##The remaining ids are used for test
id_remain = id_remain.difference(val_ids)
test_ids = list(id_remain)

##Collect the three id lists in a dict
split_dict = {"train":train_ids, "val":val_ids, "test":test_ids}

##Serialize the dict to a JSON string (json.dump(split_dict, f) would work just as well)
split_str = json.dumps(split_dict)

##Save as a json file; the default path is './data/vg_splits.json' (the data directory must already exist); change as needed
with open('data/vg_splits.json', 'w') as f:
	f.write(split_str)

Check that the json file was saved correctly:

splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
	splits = json.load(f)
for split_name, split_list in splits.items():
	print(split_name, type(split_list), len(split_list))

Output:

train <class 'list'> 86463
val <class 'list'> 10807
test <class 'list'> 10807
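A useful property of this split is that the three id lists are pairwise disjoint and together cover every id. A quick sanity check of the same logic on synthetic numbers (100 ids, 80/10/10, since the real split is random):

```python
import random

# Synthetic stand-in for the real split: 100 ids, 80/10/10
nums, train_num, val_num = 100, 80, 10
id_store = set(range(1, nums + 1))

# random.sample requires a sequence, so convert the set first
train_ids = random.sample(sorted(id_store), train_num)
id_remain = id_store.difference(train_ids)
val_ids = random.sample(sorted(id_remain), val_num)
test_ids = list(id_remain.difference(val_ids))

# Pairwise disjoint, and the union covers all ids
assert set(train_ids) & set(val_ids) == set()
assert set(train_ids) & set(test_ids) == set()
assert set(val_ids) & set(test_ids) == set()
assert set(train_ids) | set(val_ids) | set(test_ids) == id_store
```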

Remove small images from train, val, and test

Python code:

def remove_small_images(min_image_size, image_id_to_image, splits):
	new_splits = {}
	for split_name, image_ids in splits.items():
		new_image_ids = []
		num_skipped = 0
		for image_id in image_ids:
			image = image_id_to_image[image_id]
			height, width = image['height'], image['width']
			if min(height, width) < min_image_size:
				num_skipped += 1
				continue
			new_image_ids.append(image_id)
		new_splits[split_name] = new_image_ids
		print('Removed %d images from split "%s" for being too small' %
		(num_skipped, split_name))
	return new_splits
	
##Minimum image size; adjust as needed
min_image_size = 200

##Load the json file that stores the split image ids
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
	splits = json.load(f)

##Drop the ids of images that are too small
splits = remove_small_images(min_image_size, image_id_to_image, splits)

Output:

##Since the split is random, your numbers may differ
Removed 335 images from split "train" for being too small
Removed 46 images from split "val" for being too small
Removed 45 images from split "test" for being too small

Handle object and relationship aliases (object_alias.txt and relationship_alias.txt)

The same object category can appear under different names (for example singular and plural forms of the same noun), and the same holds for relationships between objects: in and inside of describe the same relationship.

Python code:

def load_aliases(alias_path):
	aliases = {}
	with open(alias_path, 'r') as f:
		for line in f:
			## strip() removes leading and trailing whitespace from the string
			line = [s.strip() for s in line.split(',')]
			for s in line:
				aliases[s] = line[0]
	return aliases
	
##The default paths are 'object_alias.txt' and 'relationship_alias.txt'; change as needed
obj_aliases = load_aliases('object_alias.txt')
rel_aliases = load_aliases('relationship_alias.txt')
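Each line of the alias files is a comma-separated list whose first entry is the canonical name. A sketch of the same mapping logic on a couple of hypothetical alias lines (the real files are much longer):

```python
def load_aliases_from_lines(lines):
    # Same logic as load_aliases above, reading from any iterable of lines
    aliases = {}
    for line in lines:
        line = [s.strip() for s in line.split(',')]
        for s in line:
            aliases[s] = line[0]
    return aliases

# Hypothetical alias lines in the same comma-separated format as the real files
demo = ["tree,trees", "in,inside of,within"]
aliases = load_aliases_from_lines(demo)
assert aliases['trees'] == 'tree'       # plural maps to singular
assert aliases['inside of'] == 'in'     # phrase maps to canonical form
assert aliases['tree'] == 'tree'        # canonical names map to themselves
```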

objects.json

Read objects.json and build the dictionary object_name_to_idx, a one-to-one mapping between object names and indices. Objects that occur too rarely can be filtered out beforehand, which tends to make the trained model perform better.

Python code:

from collections import Counter

def create_object_vocab(min_object_instances, image_ids, objects, aliases, vocab):
	image_ids = set(image_ids)

	print('Making object vocab from %d training images' % len(image_ids))

	##Use a Counter to count object occurrences, which makes filtering easy
	object_name_counter = Counter()
	for image in objects:
		if image['image_id'] not in image_ids:
			continue
		for obj in image['objects']:
			names = set()
			##Note that an object's names are stored in a list, although in the
			##examples I checked, each list held only a single name.
			for name in obj['names']:
				names.add(aliases.get(name, name))
			object_name_counter.update(names)

	object_names = ['__image__']
	for name, count in object_name_counter.most_common():
		##Keep a name only if the object occurs at least min_object_instances times
		if count >= min_object_instances:
			object_names.append(name)
	print('Found %d object categories with >= %d training instances' %
	(len(object_names), min_object_instances))
	
	##Interesting detail: name-to-index is stored in a dict, but index-to-name
	##in a list; my best guess is that the author wanted to avoid int dict keys.
	object_name_to_idx = {}
	object_idx_to_name = []
	for idx, name in enumerate(object_names):
		object_name_to_idx[name] = idx
		object_idx_to_name.append(name)

	vocab['object_name_to_idx'] = object_name_to_idx
	vocab['object_idx_to_name'] = object_idx_to_name
	
##The default path for objects.json is './objects.json'; change as needed.
with open('objects.json', 'r') as f:
	objects = json.load(f)
print('type and length of objects json', type(objects), len(objects))
##Print one example to inspect
print(objects[0])

min_object_instances = 2000
vocab = {}
##splits and obj_aliases were built in the sections above
train_ids = splits['train']
create_object_vocab(min_object_instances, train_ids, objects, obj_aliases, vocab)

Output:

type and length of objects json <class 'list'> 108077

##Surprisingly, a single image already lists this many objects
{'image_id': 1, 'objects': [{'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'merged_object_ids': [], 'names': ['trees'], 'w': 799, 'y': 0, 'x': 0}, {'synsets': ['sidewalk.n.01'], 'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'names': ['sidewalk'], 'w': 722, 'y': 308, 'x': 78}, {'synsets': ['building.n.01'], 'h': 538, 'object_id': 1058508, 'merged_object_ids': [], 'names': ['building'], 'w': 222, 'y': 0, 'x': 1}, {'synsets': ['street.n.01'], 'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'names': ['street'], 'w': 359, 'y': 283, 'x': 439}, {'synsets': ['wall.n.01'], 'h': 535, 'object_id': 1058543, 'merged_object_ids': [], 'names': ['wall'], 'w': 135, 'y': 1, 'x': 0}, {'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'merged_object_ids': [], 'names': ['tree'], 'w': 476, 'y': 0, 'x': 178}, {'synsets': ['shade.n.01'], 'h': 189, 'object_id': 5045, 'merged_object_ids': [], 'names': ['shade'], 'w': 274, 'y': 344, 'x': 116}, {'synsets': ['van.n.05'], 'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'names': ['van'], 'w': 241, 'y': 278, 'x': 533}, {'synsets': ['trunk.n.01'], 'h': 348, 'object_id': 5055, 'merged_object_ids': [], 'names': ['tree trunk'], 'w': 78, 'y': 213, 'x': 623}, {'synsets': ['clock.n.01'], 'h': 363, 'object_id': 1058498, 'merged_object_ids': [], 'names': ['clock'], 'w': 77, 'y': 63, 'x': 422}, {'synsets': ['window.n.01'], 'h': 147, 'object_id': 3798579, 'merged_object_ids': [], 'names': ['windows'], 'w': 198, 'y': 1, 'x': 602}, {'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'names': ['man'], 'w': 82, 'y': 264, 'x': 367}, {'synsets': ['man.n.01'], 'h': 259, 'object_id': 3798577, 'merged_object_ids': [], 'names': ['man'], 'w': 57, 'y': 254, 'x': 238}, {'synsets': [], 'h': 430, 'object_id': 1058548, 'merged_object_ids': [], 'names': ['lamp post'], 'w': 43, 'y': 63, 'x': 537}, {'synsets': ['sign.n.02'], 'h': 179, 'object_id': 1058507, 
'merged_object_ids': [], 'names': ['sign'], 'w': 78, 'y': 13, 'x': 123}, {'synsets': ['car.n.01'], 'h': 164, 'object_id': 1058515, 'merged_object_ids': [], 'names': ['car'], 'w': 80, 'y': 342, 'x': 719}, {'synsets': ['back.n.01'], 'h': 164, 'object_id': 5060, 'merged_object_ids': [], 'names': ['back'], 'w': 70, 'y': 345, 'x': 716}, {'synsets': ['jacket.n.01'], 'h': 98, 'object_id': 1058530, 'merged_object_ids': [], 'names': ['jacket'], 'w': 82, 'y': 296, 'x': 367}, {'synsets': ['car.n.01'], 'h': 95, 'object_id': 5049, 'merged_object_ids': [], 'names': ['car'], 'w': 78, 'y': 319, 'x': 478}, {'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'merged_object_ids': [], 'names': ['pants'], 'w': 48, 'y': 369, 'x': 388}, {'synsets': ['shirt.n.01'], 'h': 103, 'object_id': 1058511, 'merged_object_ids': [], 'names': ['shirt'], 'w': 54, 'y': 287, 'x': 241}, {'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'merged_object_ids': [], 'names': ['parking meter'], 'w': 26, 'y': 325, 'x': 577}, {'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'merged_object_ids': [], 'names': ['pants'], 'w': 44, 'y': 384, 'x': 245}, {'synsets': ['shirt.n.01'], 'h': 102, 'object_id': 1058547, 'merged_object_ids': [], 'names': ['shirt'], 'w': 82, 'y': 295, 'x': 368}, {'synsets': ['shoe.n.01'], 'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'names': ['shoes'], 'w': 48, 'y': 485, 'x': 388}, {'synsets': ['arm.n.01'], 'h': 41, 'object_id': 1058546, 'merged_object_ids': [], 'names': ['arm'], 'w': 30, 'y': 285, 'x': 370}, {'synsets': ['bicycle.n.01'], 'h': 36, 'object_id': 1058535, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 319, 'x': 337}, {'synsets': ['bicycle.n.01'], 'h': 41, 'object_id': 5051, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 311, 'x': 321}, {'synsets': ['headlight.n.01'], 'h': 9, 'object_id': 5050, 'merged_object_ids': [], 'names': ['headlight'], 'w': 18, 'y': 370, 'x': 517}, {'synsets': ['spectacles.n.01'], 
'h': 23, 'object_id': 1058518, 'merged_object_ids': [], 'names': ['glasses'], 'w': 43, 'y': 317, 'x': 448}, {'synsets': ['chin.n.01'], 'h': 8, 'object_id': 1058541, 'merged_object_ids': [], 'names': ['chin'], 'w': 9, 'y': 288, 'x': 401}], 'image_url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg'}

Making object vocab from 86128 training images
Found 179 object categories with >= 2000 training instances
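The core of create_object_vocab is just counting names and keeping those above a threshold. The same logic on synthetic counts (all names and counts here are made up for illustration):

```python
from collections import Counter

# Made-up occurrence counts standing in for the real dataset statistics
object_name_counter = Counter({'tree': 5, 'car': 3, 'unicycle': 1})
min_object_instances = 2

object_names = ['__image__']  # index 0 is reserved, as in the real vocab
for name, count in object_name_counter.most_common():
    if count >= min_object_instances:
        object_names.append(name)

object_name_to_idx = {name: idx for idx, name in enumerate(object_names)}
assert object_name_to_idx == {'__image__': 0, 'tree': 1, 'car': 2}
```

The rare 'unicycle' is dropped, and indices follow descending frequency order.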

Filter objects further

When reading image_data.json above, we built the mapping from image id to image. We have now also built object_name_to_idx (after filtering out rare objects by occurrence count). Next we filter objects further by size, and build a mapping from object id to object information (the object's name, idx, and box). Looking at the implementation, spare a thought for your CPU: this pass iterates over every object in the dataset.

Python code:

def filter_objects(min_object_size, objects, aliases, vocab, splits):
	all_image_ids = set()
	for image_ids in splits.values():
		all_image_ids |= set(image_ids)

	object_name_to_idx = vocab['object_name_to_idx']
	object_id_to_obj = {}

	num_too_small = 0
	for image in objects:
		image_id = image['image_id']
		if image_id not in all_image_ids:
			continue
		for obj in image['objects']:
			object_id = obj['object_id']
			final_name = None
			final_name_idx = None
			for name in obj['names']:
				name = aliases.get(name, name)
				if name in object_name_to_idx:
					final_name = name
					final_name_idx = object_name_to_idx[final_name]
					break
			w, h = obj['w'], obj['h']
			too_small = (w < min_object_size) or (h < min_object_size)
			if too_small:
				num_too_small += 1
			if final_name is not None and not too_small:
				object_id_to_obj[object_id] = {
					'name': final_name,
					'name_idx': final_name_idx,
					'box': [obj['x'], obj['y'], obj['w'], obj['h']],
				}
	print('Skipped %d objects with size < %d' % (num_too_small, min_object_size))
	return object_id_to_obj

min_object_size = 32
object_id_to_obj = filter_objects(min_object_size, objects, obj_aliases, vocab, splits)

Output:

Skipped 997213 objects with size < 32
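On a single made-up object record, the alias lookup, size filter, and resulting entry look like this (all ids, names, and coordinates below are fabricated for illustration):

```python
min_object_size = 32
obj = {'object_id': 42, 'names': ['trees'], 'x': 10, 'y': 20, 'w': 100, 'h': 80}
object_name_to_idx = {'__image__': 0, 'tree': 1}
aliases = {'trees': 'tree'}

object_id_to_obj = {}
final_name = None
for name in obj['names']:
    name = aliases.get(name, name)      # canonicalize via the alias table
    if name in object_name_to_idx:      # keep only vocabulary objects
        final_name = name
        break
too_small = obj['w'] < min_object_size or obj['h'] < min_object_size
if final_name is not None and not too_small:
    object_id_to_obj[obj['object_id']] = {
        'name': final_name,
        'name_idx': object_name_to_idx[final_name],
        'box': [obj['x'], obj['y'], obj['w'], obj['h']],
    }

assert object_id_to_obj[42] == {'name': 'tree', 'name_idx': 1,
                                'box': [10, 20, 100, 80]}
```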

attributes.json

Read attributes.json and build the dictionary attribute_name_to_idx, a one-to-one mapping between attribute names and indices. As before, rarely occurring attributes can be filtered out beforehand, which tends to make the trained model perform better. The process mirrors that of objects.json.

Python code:

def create_attribute_vocab(min_attribute_instances, image_ids, attributes, vocab):
	image_ids = set(image_ids)
	print('Making attribute vocab from %d training images' % len(image_ids))
	attribute_name_counter = Counter()
	for image in attributes:
		if image['image_id'] not in image_ids:
			continue
		for attribute in image['attributes']:
			names = set()
			##try ... except is needed because some objects in an image have no
			##'attributes' key; without it, a KeyError would be raised.
			try:
				for name in attribute['attributes']:
					names.add(name)
				attribute_name_counter.update(names)
			except KeyError:
				pass
	attribute_names = []
	for name, count in attribute_name_counter.most_common():
		if count >= min_attribute_instances:
			attribute_names.append(name)
	print('Found %d attribute categories with >= %d training instances' %
		(len(attribute_names), min_attribute_instances))

	attribute_name_to_idx = {}
	attribute_idx_to_name = []
	for idx, name in enumerate(attribute_names):
		attribute_name_to_idx[name] = idx
		attribute_idx_to_name.append(name)
	vocab['attribute_name_to_idx'] = attribute_name_to_idx
	vocab['attribute_idx_to_name'] = attribute_idx_to_name

##The default path for attributes.json is './attributes.json'; change as needed.
with open('attributes.json', 'r') as f:
	attributes = json.load(f)
print('type of attributes json', type(attributes), len(attributes))
##Print one example to inspect
print(attributes[0])

min_attribute_instances = 2000
create_attribute_vocab(min_attribute_instances, train_ids, attributes, vocab)

Output:

type of attributes json <class 'list'> 108077

##This example is very, very long...
{'image_id': 1, 'attributes': [{'synsets': ['clock.n.01'], 'h': 339, 'object_id': 1058498, 'names': ['clock'], 'w': 79, 'attributes': ['green', 'tall'], 'y': 91, 'x': 421}, {'synsets': ['street.n.01'], 'h': 262, 'object_id': 5046, 'names': ['street'], 'w': 714, 'attributes': ['sidewalk'], 'y': 328, 'x': 77}, {'synsets': ['shade.n.01'], 'h': 192, 'object_id': 5045, 'names': ['shade'], 'w': 274, 'y': 338, 'x': 119}, {'synsets': ['man.n.01'], 'h': 262, 'object_id': 1058529, 'names': ['man'], 'w': 60, 'y': 249, 'x': 238}, {'synsets': ['gym_shoe.n.01'], 'h': 26, 'object_id': 5048, 'names': ['sneakers'], 'w': 52, 'attributes': ['grey'], 'y': 489, 'x': 243}, {'synsets': ['headlight.n.01'], 'h': 15, 'object_id': 5050, 'names': ['headlight'], 'w': 23, 'attributes': ['off'], 'y': 366, 'x': 514}, {'synsets': ['car.n.01'], 'h': 98, 'object_id': 5049, 'names': ['car'], 'w': 74, 'y': 315, 'x': 479}, {'synsets': ['bicycle.n.01'], 'h': 34, 'object_id': 5051, 'names': ['bike'], 'w': 28, 'attributes': ['parked', 'far away'], 'y': 319, 'x': 318}, {'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058535, 'names': ['bike'], 'w': 29, 'attributes': ['parked', 'far away', 'chained'], 'y': 319, 'x': 334}, {'synsets': ['sign.n.02'], 'h': 182, 'object_id': 1058507, 'names': ['sign'], 'w': 88, 'attributes': ['black'], 'y': 13, 'x': 118}, {'synsets': ['building.n.01'], 'h': 536, 'object_id': 1058508, 'names': ['building'], 'w': 218, 'attributes': ['tall', 'brick', 'made of bricks'], 'y': 2, 'x': 1}, {'synsets': ['trunk.n.01'], 'h': 327, 'object_id': 5055, 'names': ['tree trunk'], 'w': 87, 'y': 234, 'x': 622}, {'synsets': ['sidewalk.n.01'], 'h': 266, 'object_id': 1058534, 'names': ['sidewalk'], 'w': 722, 'attributes': ['brick'], 'y': 331, 'x': 77}, {'synsets': ['shirt.n.01'], 'h': 101, 'object_id': 1058511, 'names': ['shirt'], 'w': 59, 'attributes': ['red', 'orange'], 'y': 289, 'x': 241}, {'synsets': ['street.n.01'], 'h': 233, 'object_id': 1058539, 'names': ['street'], 'w': 440, 
'attributes': ['clean'], 'y': 283, 'x': 358}, {'synsets': ['car.n.01'], 'h': 174, 'object_id': 1058515, 'names': ['car'], 'w': 91, 'attributes': ['white', 'parked'], 'y': 342, 'x': 708}, {'synsets': ['back.n.01'], 'h': 170, 'object_id': 5060, 'names': ['back'], 'w': 67, 'y': 339, 'x': 721}, {'synsets': ['spectacles.n.01'], 'h': 12, 'object_id': 1058518, 'names': ['glasses'], 'w': 20, 'y': 268, 'x': 271}, {'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'names': ['parking meter'], 'w': 32, 'attributes': ['orange'], 'y': 327, 'x': 574}, {'synsets': ['shoe.n.01'], 'h': 34, 'object_id': 1058525, 'names': ['shoes'], 'w': 46, 'attributes': ['brown'], 'y': 481, 'x': 391}, {'synsets': ['man.n.01'], 'h': 251, 'object_id': 1058532, 'names': ['man'], 'w': 75, 'y': 264, 'x': 372}, {'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'names': ['pants'], 'w': 38, 'attributes': ['black'], 'y': 384, 'x': 245}, {'synsets': ['jacket.n.01'], 'h': 97, 'object_id': 1058530, 'names': ['jacket'], 'w': 89, 'attributes': ['gray', 'grey'], 'y': 296, 'x': 356}, {'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'names': ['pants'], 'w': 54, 'attributes': ['gray', 'grey'], 'y': 369, 'x': 382}, {'synsets': [], 'h': 185, 'object_id': 1058536, 'names': ['work truck'], 'w': 265, 'attributes': ['white'], 'y': 271, 'x': 521}, {'synsets': ['sidewalk.n.01'], 'h': 189, 'object_id': 3798575, 'names': ['sidewalk'], 'w': 50, 'y': 318, 'x': 343}, {'synsets': ['chin.n.01'], 'h': 9, 'object_id': 1058541, 'names': ['chin'], 'w': 11, 'attributes': ['raised'], 'y': 288, 'x': 399}, {'synsets': ['guy.n.01'], 'h': 250, 'object_id': 1058540, 'names': ['guy'], 'w': 82, 'y': 264, 'x': 369}, {'synsets': ['van.n.05'], 'h': 134, 'object_id': 1058542, 'names': ['van'], 'w': 233, 'attributes': ['parked', 'white'], 'y': 298, 'x': 529}, {'synsets': ['wall.n.01'], 'h': 533, 'object_id': 1058543, 'names': ['wall'], 'w': 134, 'attributes': ['grey'], 'y': 1, 'x': 0}, {'synsets': 
['tree.n.01'], 'h': 360, 'object_id': 1058545, 'names': ['tree'], 'w': 176, 'y': 0, 'x': 249}, {'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058544, 'names': ['bikes'], 'w': 40, 'y': 319, 'x': 321}, {'synsets': ['arm.n.01'], 'h': 43, 'object_id': 1058546, 'names': ['arm'], 'w': 32, 'attributes': ['raised'], 'y': 283, 'x': 368}, {'synsets': ['shirt.n.01'], 'h': 66, 'object_id': 1058547, 'names': ['shirt'], 'w': 37, 'attributes': ['grey'], 'y': 306, 'x': 384}, {'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'names': ['man'], 'w': 97, 'y': 264, 'x': 362}, {'synsets': ['man.n.01'], 'h': 264, 'object_id': 3798577, 'names': ['man'], 'w': 72, 'y': 251, 'x': 230}, {'synsets': ['road.n.01'], 'h': 218, 'object_id': 3798578, 'names': ['road'], 'w': 340, 'y': 295, 'x': 435}, {'synsets': [], 'h': 430, 'object_id': 1058548, 'names': ['lamp post'], 'w': 41, 'y': 63, 'x': 537}, {'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'names': ['trees'], 'w': 606, 'attributes': ['sparse'], 'y': 0, 'x': 190}, {'synsets': ['window.n.01'], 'h': 148, 'object_id': 3798579, 'names': ['windows'], 'w': 173, 'y': 4, 'x': 602}]}

Making attribute vocab from 86128 training images
Found 80 attribute categories with >= 2000 training instances
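As an aside, the try ... except above can also be avoided by using dict.get with a default value. A sketch on two made-up records, one with and one without an 'attributes' key:

```python
# A made-up attribute record without an 'attributes' key, and one with it
no_attrs = {'object_id': 1, 'names': ['clock']}
with_attrs = {'object_id': 2, 'names': ['clock'], 'attributes': ['green', 'tall']}

# dict.get with a default list sidesteps the KeyError entirely
assert set(no_attrs.get('attributes', [])) == set()
assert set(with_attrs.get('attributes', [])) == {'green', 'tall'}
```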

relationships.json

Read relationships.json and build the dictionary pred_name_to_idx, a one-to-one mapping between relationship names and indices. As before, rarely occurring relationships can be filtered out beforehand, which tends to make the trained model perform better.

Python code:

from collections import defaultdict

def create_rel_vocab(min_relationship_instances, image_ids, relationships,
                     object_id_to_obj, rel_aliases, vocab):
	pred_counter = defaultdict(int)
	image_ids_set = set(image_ids)
	for image in relationships:
		image_id = image['image_id']
		if image_id not in image_ids_set:
			continue
		for rel in image['relationships']:
			sid = rel['subject']['object_id']
			oid = rel['object']['object_id']
			found_subject = sid in object_id_to_obj
			found_object = oid in object_id_to_obj
			if not found_subject or not found_object:
				continue
			pred = rel['predicate'].lower().strip()
			pred = rel_aliases.get(pred, pred)
			rel['predicate'] = pred
			pred_counter[pred] += 1

	pred_names = ['__in_image__']
	for pred, count in pred_counter.items():
		if count >= min_relationship_instances:
			pred_names.append(pred)
	print('Found %d relationship types with >= %d training instances'
		% (len(pred_names), min_relationship_instances))

	pred_name_to_idx = {}
	pred_idx_to_name = []
	for idx, name in enumerate(pred_names):
		pred_name_to_idx[name] = idx
		pred_idx_to_name.append(name)

	vocab['pred_name_to_idx'] = pred_name_to_idx
	vocab['pred_idx_to_name'] = pred_idx_to_name

##The default path for relationships.json is './relationships.json'; change as needed.
with open('relationships.json', 'r') as f:
	relationships = json.load(f)
print('type of relationships json', type(relationships), len(relationships))
##Print one example to inspect
print(relationships[0])
##Note: all arguments of the function below were produced in earlier steps.
min_relationship_instances = 500
create_rel_vocab(min_relationship_instances, train_ids, relationships,
	             object_id_to_obj, rel_aliases, vocab)

Output:

type of relationships json <class 'list'> 108077

##Another very, very long example...
{'relationships': [{'predicate': 'ON', 'object': {'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'synsets': ['sidewalk.n.01'], 'w': 722, 'y': 308, 'x': 78, 'names': ['sidewalk']}, 'relationship_id': 15927, 'synsets': ['along.r.01'], 'subject': {'name': 'shade', 'h': 192, 'synsets': ['shade.n.01'], 'object_id': 5045, 'w': 274, 'y': 338, 'x': 119}}, {'predicate': 'wears', 'object': {'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15928, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'has', 'object': {'name': 'headlight', 'h': 15, 'synsets': ['headlight.n.01'], 'object_id': 5050, 'w': 23, 'y': 366, 'x': 514}, 'relationship_id': 15929, 'synsets': ['have.v.01'], 'subject': {'name': 'car', 'h': 98, 'synsets': ['car.n.01'], 'object_id': 5049, 'w': 74, 'y': 315, 'x': 479}}, {'predicate': 'ON', 'object': {'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}, 'relationship_id': 15930, 'synsets': ['along.r.01'], 'subject': {'name': 'sign', 'h': 182, 'synsets': ['sign.n.02'], 'object_id': 1058507, 'w': 88, 'y': 13, 'x': 118}}, {'predicate': 'ON', 'object': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15931, 'synsets': ['along.r.01'], 'subject': {'name': 'tree trunk', 'h': 327, 'synsets': ['trunk.n.01'], 'object_id': 5055, 'w': 87, 'y': 234, 'x': 622}}, {'predicate': 'has', 'object': {'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15932, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'next to', 'object': {'name': 'street', 'h': 
233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15933, 'synsets': ['next.r.01'], 'subject': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}}, {'predicate': 'has', 'object': {'name': 'back', 'h': 170, 'synsets': ['back.n.01'], 'object_id': 5060, 'w': 67, 'y': 339, 'x': 721}, 'relationship_id': 15934, 'synsets': ['have.v.01'], 'subject': {'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {'predicate': 'has', 'object': {'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 15935, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'ON', 'object': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15936, 'synsets': ['along.r.01'], 'subject': {'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {'predicate': 'wears', 'object': {'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15937, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'has', 'object': {'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 15938, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {'predicate': 'has', 'object': {'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 
'relationship_id': 15939, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'wears', 'object': {'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 15940, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'has', 'object': {'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 15941, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {'predicate': 'has', 'object': {'name': 'pants', 'h': 128, 'synsets': ['trouser.n.01'], 'object_id': 1058531, 'w': 54, 'y': 369, 'x': 382}, 'relationship_id': 15942, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {'predicate': 'parked on', 'object': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15943, 'synsets': ['along.r.01'], 'subject': {'name': 'bike', 'h': 34, 'synsets': ['bicycle.n.01'], 'object_id': 5051, 'w': 28, 'y': 319, 'x': 318}}, {'predicate': 'parked on', 'object': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15944, 'synsets': ['along.r.01'], 'subject': {'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {'predicate': 'parked on', 'object': {'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15945, 'synsets': ['along.r.01'], 'subject': {'h': 176, 'object_id': 1058542, 'merged_object_ids': 
[1058536], 'synsets': ['van.n.05'], 'w': 241, 'y': 278, 'x': 533, 'names': ['van']}}, {'predicate': 'parked on', 'object': {'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15946, 'synsets': ['along.r.01'], 'subject': {'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {'predicate': 'ON', 'object': {'name': 'sidewalk', 'h': 189, 'synsets': ['sidewalk.n.01'], 'object_id': 3798575, 'w': 50, 'y': 318, 'x': 343}, 'relationship_id': 4265923, 'synsets': ['along.r.01'], 'subject': {'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {'predicate': 'behind', 'object': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}, 'relationship_id': 3186256, 'synsets': ['behind.r.01'], 'subject': {'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {'predicate': 'holding', 'object': {'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186257, 'synsets': ['have.v.01'], 'subject': {'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'synsets': ['man.n.01'], 'w': 82, 'y': 264, 'x': 367, 'names': ['man']}}, {'predicate': 'WEARING', 'object': {'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186258, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'holding', 'object': {'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186259, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 
372}}, {'predicate': 'near', 'object': {'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186260, 'synsets': ['about.r.07'], 'subject': {'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {'predicate': 'WEARING', 'object': {'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 3186261, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {'predicate': 'near', 'object': {'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186262, 'synsets': ['about.r.07'], 'subject': {'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {'predicate': 'ON', 'object': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186263, 'synsets': ['along.r.01'], 'subject': {'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}}, {'predicate': 'holding', 'object': {'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 4265924, 'synsets': ['have.v.01'], 'subject': {'name': 'man', 'h': 248, 'synsets': ['man.n.01'], 'object_id': 3798576, 'w': 97, 'y': 264, 'x': 362}}, {'predicate': 'WEARING', 'object': {'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 4265925, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 264, 'synsets': ['man.n.01'], 'object_id': 3798577, 'w': 72, 'y': 251, 'x': 230}}, {'predicate': 'along', 'object': {'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'synsets': ['street.n.01'], 'w': 
359, 'y': 283, 'x': 439, 'names': ['street']}, 'relationship_id': 4265926, 'synsets': ['along.r.01'], 'subject': {'name': 'lamp post', 'h': 430, 'synsets': [], 'object_id': 1058548, 'w': 41, 'y': 63, 'x': 537}}, {'predicate': 'IN', 'object': {'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186264, 'synsets': ['in.r.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'WEARING', 'object': {'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 3186265, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'on top of', 'object': {'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186266, 'synsets': ['along.r.01'], 'subject': {'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {'predicate': 'next to', 'object': {'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186267, 'synsets': ['next.r.01'], 'subject': {'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}}, {'predicate': 'WEARING', 'object': {'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 3186268, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {'predicate': 'behind', 'object': {'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186269, 'synsets': ['behind.r.01'], 
'subject': {'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {'predicate': 'by', 'object': {'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 3186270, 'synsets': ['by.r.01'], 'subject': {'name': 'trees', 'h': 557, 'synsets': ['tree.n.01'], 'object_id': 1058549, 'w': 606, 'y': 0, 'x': 190}}, {'predicate': 'WEARING', 'object': {'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 3186271, 'synsets': ['wear.v.01'], 'subject': {'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {'predicate': 'with', 'object': {'name': 'windows', 'h': 148, 'synsets': ['window.n.01'], 'object_id': 3798579, 'w': 173, 'y': 4, 'x': 602}, 'relationship_id': 4265927, 'synsets': [], 'subject': {'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}}], 'image_id': 1}

Found 46 relationship types with >= 500 training instances

Combining the information for all images

So far we have built a mapping from image id to image info (image_id_to_image), from object name to index (object_name_to_idx), from attribute name to index (attribute_name_to_idx), and from predicate name to index (pred_name_to_idx), and, at some cost to the CPU, a mapping from object id to the object's info ('name', 'name_idx', 'box') (object_id_to_obj). At every step along the way we applied filters to keep only the entries meeting our criteria. Now comes the exciting part (why do Kennen's voice lines come to mind?): we combine everything we have gathered to produce a standardized record for each image. Spare a thought for the CPU once more.
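Before running the full pipeline, here is a tiny synthetic sketch of how these lookup tables compose when resolving one relationship into integer form (all ids and indices below are made-up illustrations, not real vocabulary values):

```python
##Hypothetical miniature versions of the lookup tables built so far
object_name_to_idx = {'man': 3, 'tree': 2}
pred_name_to_idx = {'wearing': 5}
object_id_to_obj = {
	1058532: {'name': 'man', 'name_idx': 3, 'box': [372, 264, 75, 251]},
}

##One relationship record, in the same shape as relationships.json
rel = {'predicate': 'wearing', 'subject': {'object_id': 1058532}}

##Resolve the predicate name and the subject's object id into indices
pred_idx = pred_name_to_idx[rel['predicate']]
subj = object_id_to_obj[rel['subject']['object_id']]
print(pred_idx, subj['name_idx'])  # 5 3
```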

Python code:

import argparse, json, os
from collections import Counter, defaultdict
import numpy as np
parser = argparse.ArgumentParser()

##Per-image filtering thresholds; images outside these ranges are skipped
##(note: --max_attributes_per_image actually caps attributes per object)
parser.add_argument('--min_objects_per_image', default=3, type=int)
parser.add_argument('--max_objects_per_image', default=30, type=int)
parser.add_argument('--max_attributes_per_image', default=30, type=int)
parser.add_argument('--min_relationships_per_image', default=1, type=int)
parser.add_argument('--max_relationships_per_image', default=30, type=int)

def encode_graphs(args, splits, objects, relationships, vocab,
                  object_id_to_obj, attributes):

	image_id_to_objects = {}
	for image in objects:
		image_id = image['image_id']
		image_id_to_objects[image_id] = image['objects']
	image_id_to_relationships = {}
	for image in relationships:
		image_id = image['image_id']
		image_id_to_relationships[image_id] = image['relationships']
	image_id_to_attributes = {}
	for image in attributes:
		image_id = image['image_id']
		image_id_to_attributes[image_id] = image['attributes']

	numpy_arrays = {}
	for split, image_ids in splits.items():
		skip_stats = defaultdict(int)
		# We need to filter *again* based on number of objects and relationships
		final_image_ids = []
		object_ids = []
		object_names = []
		object_boxes = []
		objects_per_image = []
		relationship_ids = []
		relationship_subjects = []
		relationship_predicates = []
		relationship_objects = []
		relationships_per_image = []
		attribute_ids = []
		attributes_per_object = []
		object_attributes = []
		for image_id in image_ids:
			image_object_ids = []
			image_object_names = []
			image_object_boxes = []
			object_id_to_idx = {}
			for obj in image_id_to_objects[image_id]:
				object_id = obj['object_id']
				if object_id not in object_id_to_obj:
					continue
				obj = object_id_to_obj[object_id]
				object_id_to_idx[object_id] = len(image_object_ids)
				image_object_ids.append(object_id)
				image_object_names.append(obj['name_idx'])
				image_object_boxes.append(obj['box'])
			num_objects = len(image_object_ids)
			too_few = num_objects < args.min_objects_per_image
			too_many = num_objects > args.max_objects_per_image
			if too_few:
				skip_stats['too_few_objects'] += 1
				continue
			if too_many:
				skip_stats['too_many_objects'] += 1
				continue
			image_rel_ids = []
			image_rel_subs = []
			image_rel_preds = []
			image_rel_objs = []
			for rel in image_id_to_relationships[image_id]:
				relationship_id = rel['relationship_id']
				pred = rel['predicate']
				pred_idx = vocab['pred_name_to_idx'].get(pred, None)
				if pred_idx is None:
					continue
				sid = rel['subject']['object_id']
				sidx = object_id_to_idx.get(sid, None)
				oid = rel['object']['object_id']
				oidx = object_id_to_idx.get(oid, None)
				if sidx is None or oidx is None:
					continue
				image_rel_ids.append(relationship_id)
				image_rel_subs.append(sidx)
				image_rel_preds.append(pred_idx)
				image_rel_objs.append(oidx)
			num_relationships = len(image_rel_ids)
			too_few = num_relationships < args.min_relationships_per_image
			too_many = num_relationships > args.max_relationships_per_image
			if too_few:
				skip_stats['too_few_relationships'] += 1
				continue
			if too_many:
				skip_stats['too_many_relationships'] += 1
				continue

			obj_id_to_attributes = {}
			num_attributes = []
			for obj_attribute in image_id_to_attributes[image_id]:
				obj_id_to_attributes[obj_attribute['object_id']] = obj_attribute.get('attributes', None)
			for object_id in image_object_ids:
				attributes = obj_id_to_attributes.get(object_id, None)
				if attributes is None:
					object_attributes.append([-1] * args.max_attributes_per_image)
					num_attributes.append(0)
				else:
					attribute_ids = []
					for attribute in attributes:
						if attribute in vocab['attribute_name_to_idx']:
							attribute_ids.append(vocab['attribute_name_to_idx'][attribute])
						if len(attribute_ids) >= args.max_attributes_per_image:
							break
					num_attributes.append(len(attribute_ids))
					pad_len = args.max_attributes_per_image - len(attribute_ids)
					attribute_ids = attribute_ids + [-1] * pad_len
					object_attributes.append(attribute_ids)

			# Pad object info out to max_objects_per_image
			while len(image_object_ids) < args.max_objects_per_image:
				image_object_ids.append(-1)
				image_object_names.append(-1)
				image_object_boxes.append([-1, -1, -1, -1])
				num_attributes.append(-1)

			# Pad relationship info out to max_relationships_per_image
			while len(image_rel_ids) < args.max_relationships_per_image:
				image_rel_ids.append(-1)
				image_rel_subs.append(-1)
				image_rel_preds.append(-1)
				image_rel_objs.append(-1)

			final_image_ids.append(image_id)
			object_ids.append(image_object_ids)
			object_names.append(image_object_names)
			object_boxes.append(image_object_boxes)
			objects_per_image.append(num_objects)
			relationship_ids.append(image_rel_ids)
			relationship_subjects.append(image_rel_subs)
			relationship_predicates.append(image_rel_preds)
			relationship_objects.append(image_rel_objs)
			relationships_per_image.append(num_relationships)
			attributes_per_object.append(num_attributes)

		print('Skip stats for split "%s"' % split)
		for stat, count in skip_stats.items():
			print(stat, count)
		print()
		numpy_arrays[split] = {
			'image_ids': np.asarray(final_image_ids),
			'object_ids': np.asarray(object_ids),
			'object_names': np.asarray(object_names),
			'object_boxes': np.asarray(object_boxes),
			'objects_per_image': np.asarray(objects_per_image),
			'relationship_ids': np.asarray(relationship_ids),
			'relationship_subjects': np.asarray(relationship_subjects),
			'relationship_predicates': np.asarray(relationship_predicates),
			'relationship_objects': np.asarray(relationship_objects),
			'relationships_per_image': np.asarray(relationships_per_image),
			'attributes_per_object': np.asarray(attributes_per_object),
			'object_attributes': np.asarray(object_attributes),
		}
		for k, v in numpy_arrays[split].items():
			if v.dtype == np.int64:
				numpy_arrays[split][k] = v.astype(np.int32)
	return numpy_arrays

args = parser.parse_args()
numpy_arrays = encode_graphs(args, splits, objects, relationships, vocab,
                               object_id_to_obj, attributes)

##Inspect the combined arrays for the train split
for key, value in numpy_arrays['train'].items():
	##print the type and length of each value
	print(key, type(value), len(value))
	##print the first element of each value
	print(value[0])

Output:

Skip stats for split "train"
too_few_relationships 16402
too_few_objects 6794
too_many_objects 187
too_many_relationships 180

Skip stats for split "test"
too_few_relationships 4803
too_few_objects 837
too_many_objects 26

Skip stats for split "val"
too_few_objects 853
too_few_relationships 4815
too_many_objects 27
too_many_relationships 4

image_ids <class 'numpy.ndarray'> 62565
1

object_ids <class 'numpy.ndarray'> 62565
[1058549 1058534 1058508 1058539 1058543 1058545 1058498 3798579 3798576
 3798577 1058507 1058515    5060 1058530    5049 1058531 1058511 1058528
 1058547      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1]

object_names <class 'numpy.ndarray'> 62565
[  2  52   7  60   5   2  95   1   3   3   9  19 134  44  19  32   4  32
   4  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1]

object_boxes <class 'numpy.ndarray'> 62565
[[  0   0 799 557]
 [ 78 308 722 290]
 [  1   0 222 538]
 [439 283 359 258]
 [  0   1 135 535]
 [178   0 476 360]
 [422  63  77 363]
 [602   1 198 147]
 [367 264  82 248]
 [238 254  57 259]
 [123  13  78 179]
 [719 342  80 164]
 [716 345  70 164]
 [367 296  82  98]
 [478 319  78  95]
 [388 369  48 128]
 [241 287  54 103]
 [245 384  44 118]
 [368 295  82 102]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]]

objects_per_image <class 'numpy.ndarray'> 62565
19

relationship_ids <class 'numpy.ndarray'> 62565
[  15930   15933   15934   15946 3186267 3186270 4265927      -1      -1
      -1      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1]

relationship_subjects <class 'numpy.ndarray'> 62565
[10  1 11 11  5  0  2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationship_predicates <class 'numpy.ndarray'> 62565
[ 1  2  3  4  2  5  6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationship_objects <class 'numpy.ndarray'> 62565
[ 2  3 12  3  3  1  7 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationships_per_image <class 'numpy.ndarray'> 62565
7

attributes_per_object <class 'numpy.ndarray'> 62565
[ 0  1  2  0  1  0  2  0  0  0  1  2  0  2  0  2  2  1  1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

object_attributes <class 'numpy.ndarray'> 606319
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]
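Note that object_attributes has 606,319 rows while every other array has 62,565: it stores one padded row per *kept object*, not per image, so its length equals the sum of objects_per_image. A minimal synthetic sketch of how its rows line up with the per-image counts (the numbers below are illustrative, not taken from the real split):

```python
import numpy as np

##Hypothetical object counts for three kept images
objects_per_image = np.array([3, 5, 2], dtype=np.int32)

##object_attributes: one row per real object, padded with -1
object_attributes = np.full((objects_per_image.sum(), 4), -1, dtype=np.int32)

##Row offset of each image's first object = cumulative sum of earlier counts
offsets = np.concatenate([[0], np.cumsum(objects_per_image)[:-1]])
print(object_attributes.shape[0])  # 10
print(offsets)                     # [0 3 8]
```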

Writing the combined information to files

After this dizzying, at times bewildering round of information-combining, we can finally write the results to files for safekeeping.

Python code:

import h5py
def get_image_paths(image_id_to_image, image_ids):
	paths = []
	for image_id in image_ids:
		image = image_id_to_image[image_id]
		base, filename = os.path.split(image['url'])
		path = os.path.join(os.path.basename(base), filename)
		paths.append(path)
	return paths

output_h5_dir = './'
output_vocab_json = 'vocab.json'
print('Writing HDF5 output files')
for split_name, split_arrays in numpy_arrays.items():
	image_ids = list(split_arrays['image_ids'].astype(int))
	h5_path = os.path.join(output_h5_dir, '%s.h5' % split_name)
	print('Writing file "%s"' % h5_path)
	with h5py.File(h5_path, 'w') as h5_file:
		for name, ary in split_arrays.items():
			print('Creating dataset: ', name, ary.shape, ary.dtype)
			h5_file.create_dataset(name, data=ary)
		print('Writing image paths')
		image_paths = get_image_paths(image_id_to_image, image_ids)
		path_dtype = h5py.special_dtype(vlen=str)
		path_shape = (len(image_paths),)
		path_dset = h5_file.create_dataset('image_paths', path_shape,
                                         dtype=path_dtype)
		for i, p in enumerate(image_paths):
			path_dset[i] = p
	print()

print('Writing vocab to "%s"' % output_vocab_json)
with open(output_vocab_json, 'w') as f:
	json.dump(vocab, f)

Output:

Writing HDF5 output files
Writing file "./train.h5"
Creating dataset:  image_ids (62565,) int32
Creating dataset:  object_ids (62565, 30) int32
Creating dataset:  object_names (62565, 30) int32
Creating dataset:  object_boxes (62565, 30, 4) int32
Creating dataset:  objects_per_image (62565,) int32
Creating dataset:  relationship_ids (62565, 30) int32
Creating dataset:  relationship_subjects (62565, 30) int32
Creating dataset:  relationship_predicates (62565, 30) int32
Creating dataset:  relationship_objects (62565, 30) int32
Creating dataset:  relationships_per_image (62565,) int32
Creating dataset:  attributes_per_object (62565, 30) int32
Creating dataset:  object_attributes (606319, 30) int32
Writing image paths

Writing file "./test.h5"
Creating dataset:  image_ids (5096,) int32
Creating dataset:  object_ids (5096, 30) int32
Creating dataset:  object_names (5096, 30) int32
Creating dataset:  object_boxes (5096, 30, 4) int32
Creating dataset:  objects_per_image (5096,) int32
Creating dataset:  relationship_ids (5096, 30) int32
Creating dataset:  relationship_subjects (5096, 30) int32
Creating dataset:  relationship_predicates (5096, 30) int32
Creating dataset:  relationship_objects (5096, 30) int32
Creating dataset:  relationships_per_image (5096,) int32
Creating dataset:  attributes_per_object (5096, 30) int32
Creating dataset:  object_attributes (51626, 30) int32
Writing image paths

Writing file "./val.h5"
Creating dataset:  image_ids (5062,) int32
Creating dataset:  object_ids (5062, 30) int32
Creating dataset:  object_names (5062, 30) int32
Creating dataset:  object_boxes (5062, 30, 4) int32
Creating dataset:  objects_per_image (5062,) int32
Creating dataset:  relationship_ids (5062, 30) int32
Creating dataset:  relationship_subjects (5062, 30) int32
Creating dataset:  relationship_predicates (5062, 30) int32
Creating dataset:  relationship_objects (5062, 30) int32
Creating dataset:  relationships_per_image (5062,) int32
Creating dataset:  attributes_per_object (5062, 30) int32
Creating dataset:  object_attributes (51090, 30) int32
Writing image paths

Writing vocab to "vocab.json"
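To sanity-check the written files, they can be read back with h5py. Below is a minimal round-trip sketch, assuming h5py is installed: it writes a tiny temporary file with the same layout as our splits, then reads it back. In real use you would simply open 'train.h5' for reading.

```python
import os, tempfile
import h5py
import numpy as np

##Write a tiny file mimicking the layout of train.h5 (synthetic data)
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
	f.create_dataset('image_ids', data=np.array([1, 2], dtype=np.int32))
	f.create_dataset('object_boxes',
	                 data=np.full((2, 30, 4), -1, dtype=np.int32))
	path_dtype = h5py.special_dtype(vlen=str)
	path_dset = f.create_dataset('image_paths', (2,), dtype=path_dtype)
	path_dset[0] = 'VG_100K_2/1.jpg'
	path_dset[1] = 'VG_100K_2/2.jpg'

##Read it back
with h5py.File(path, 'r') as f:
	image_ids = f['image_ids'][:]     ##load a whole dataset into numpy
	boxes = f['object_boxes'][0]      ##boxes of the first image
	p = f['image_paths'][0]
	p = p.decode() if isinstance(p, bytes) else p  ##h5py 3.x yields bytes
	print(image_ids)     # [1 2]
	print(boxes.shape)   # (30, 4)
	print(p)             # VG_100K_2/1.jpg
```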