First, the task: given the nyu_depth_v2_labeled.mat data, read the contents of the .mat file in Python and draw a bounding box around every object in each of its images.
First, open the .mat file with h5py, which maps the MATLAB v7.3 (HDF5) data to NumPy-style arrays:

import h5py
h5_file = h5py.File("nyu_depth_v2_labeled.mat", "r")

The accompanying splits.mat is an older v5 MAT file, so it is loaded with scipy.io instead:

import scipy.io
splits = scipy.io.loadmat('splits.mat')
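As a side note, a minimal sketch of how the split file is typically used, assuming the official splits.mat with its trainNdxs/testNdxs fields (the variable names below are mine):

# the split stores 1-based MATLAB indices; shift to 0-based for Python
train_idx = splits['trainNdxs'].ravel() - 1
test_idx = splits['testNdxs'].ravel() - 1
print(len(train_idx), len(test_idx))  # e.g. 795 training and 654 test images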
# Iterate over all items in the .mat file and print each item's value
for name, value in h5_file.items():
    print("Name:", name)
    print("Value:", value)

variables = h5_file.items()
for name, data in variables:
    print("Name:", name)  # field name
    if isinstance(data, h5py.Dataset):
        # A Dataset holds the actual array; if the item is not a
        # Dataset (e.g. a Group), access its sub-items instead
        value = data[()]  # read the whole dataset as a NumPy array
        print("Value:", value)
The code above lists every field stored in the .mat file:
Here N = 1449 is the number of images, H = 480 and W = 640 are the image height and width, and C = 894 is the number of classes.
accelData: 4xN accelerometer values recorded when each frame was taken; each column contains the roll, yaw, pitch and tilt angle of the device.
depths: HxWxN = 480*640*1449 in-painted depth maps.
images: HxWx3xN = 480*640*3*1449 RGB images.
instances: HxWxN = 480*640*1449 instance maps.
labels: HxWxN = 480*640*1449 label maps; values range from 1..C, where C is the total number of classes. If a pixel's label value is 0, that pixel is 'unlabeled'.
names: Cx1 cell array of the English names of each class.
namesToIds: map from English label names to class IDs (with C key-value pairs). Read through h5py it shows up as a 1*6 array such as [3707764736, 2, 1, 1, 1, 1], since a MATLAB containers.Map does not translate into a plain HDF5 dataset.
rawDepthFilenames: Nx1 cell array of the filenames (in the Raw dataset) used for each of the depth images in the labeled dataset.
rawDepths: HxWxN = 480*640*1449 raw depth maps. These capture the depth images after they have been projected onto the RGB image plane but before the missing depth values have been filled in. Additionally, the depth non-linearity from the Kinect device has been removed, and the values of each depth image are in meters.
rawRgbFilenames: Nx1 cell array of the filenames (in the Raw dataset) used for each of the RGB images in the labeled dataset.
sceneTypes: Nx1 cell array of the scene type from which each image was taken.
scenes: Nx1 cell array of the name of the scene from which each image was taken.
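Cell arrays of strings such as names are stored as HDF5 object references; each reference has to be dereferenced and its character codes decoded. A minimal sketch (class_names is my variable name):

# names is a 1xC cell array; each element is a reference to a uint16 char array
class_names = [''.join(chr(c) for c in h5_file[ref][:].ravel())
               for ref in h5_file['names'][0]]
print(len(class_names))  # 894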
Using the per-pixel class labels stored in labels, we can draw a bounding box around each kind of object in every image:
labels = h5_file['labels']  # h5py reads MATLAB arrays transposed: shape (N, W, H) = (1449, 640, 480)
images = h5_file['images']  # shape (N, 3, W, H) = (1449, 3, 640, 480)
# sceneTypes is a cell array of strings: dereference each element and decode its char codes
scenes = [''.join(chr(c) for c in h5_file[obj_ref][:].ravel())
          for obj_ref in h5_file['sceneTypes'][0]]
print("processing images")
for i, image in enumerate(images):
    print("image", i + 1, "/", len(images))
    # transpose back to H x W (x 3) before drawing
    draw_box(i, scenes[i], image.T, labels[i, :, :].T)
This reads the images, labels and scenes data from the .mat file: the boxes are drawn on the images, labels provides the class of every pixel in each image, and scenes lets the boxed images be saved grouped by scene category.
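For completeness, a sketch of the imports and the per-scene output directory this walkthrough assumes (the folder layout and the make_scene_folder helper are my own, not part of the dataset):

import os
import cv2
import numpy as np
from imageio import imsave  # substitute for the deprecated scipy.misc.imsave

def make_scene_folder(scene, root="output"):
    # hypothetical helper: one output directory per scene type
    folder = os.path.join(root, scene)
    os.makedirs(folder, exist_ok=True)
    return folder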
Next comes the key code for drawing the bounding boxes.
First, collect all class labels that appear in each image and store them in a list:
def draw_box(i, scene, image, label):
    L = []
    shape = list(label.shape) + [3]
    # collect the class label of every non-background pixel (0 = background)
    for j in range(shape[0]):
        for k in range(shape[1]):
            if label[j, k] != 0:
                L.append(label[j, k])
    L1 = list(set(L))  # de-duplicate: the distinct classes in this image
This records the class of every pixel, skipping class 0, which is background. Since many pixels share the same class, the list is de-duplicated at the end, leaving only the distinct classes present in that image (each image contains a different set of classes).
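The double loop can be replaced by a single vectorized call; a sketch, assuming label is a NumPy array as in the code above:

# distinct non-background classes in one line
L1 = np.unique(label[label != 0]).tolist()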
Next, for each class in the list, scan the whole image to find the pixels belonging to that class:
    image = image.copy()
    for X in L1:
        # track the box of class X: x is the column index k, y is the row index j,
        # so minX/maxX start from the width and minY/maxY from the height
        minX = shape[1]
        minY = shape[0]
        maxX = maxY = 0
        for j in range(shape[0]):
            for k in range(shape[1]):
                if label[j, k] == X:
                    if k < minX: minX = k
                    if k > maxX: maxX = k
                    if j < minY: minY = j
                    if j > maxY: maxY = j
        cv2.rectangle(image, (minX, minY), (maxX, maxY), (0, 255, 0), 2)
    folder = make_scene_folder(scene)  # per-scene output dir (hypothetical helper above)
    imsave("%s/%05d_bounding_box.png" % (folder, i), image)
Because of a quirk in the OpenCV Python bindings, cv2.rectangle cannot draw directly on the array obtained from h5py (the transposed view is not a plain writable array), so a copy is made first.
Note that for the labels read in Python, shape[0] is the height H and shape[1] is the width W.
When computing the top-left and bottom-right corners of the bounding box, compare k against the x coordinates and j against the y coordinates: OpenCV points are (x, y) = (column, row), which is also why minX must be initialized from shape[1] and minY from shape[0].
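The per-class double loop can also be vectorized with NumPy; a sketch, keeping in mind that np.where returns (rows, cols) = (y, x):

for X in np.unique(label[label != 0]):
    ys, xs = np.where(label == X)  # row indices = y, column indices = x
    cv2.rectangle(image, (int(xs.min()), int(ys.min())),
                  (int(xs.max()), int(ys.max())), (0, 255, 0), 2)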