First, the task: given the nyu_depth_v2_labeled.mat data, read the contents of the .mat file in Python and draw a bounding box around every object in each of its images.
First, open the .mat file with h5py, which maps the MATLAB v7.3 (HDF5) data to NumPy-style arrays:

import h5py
h5_file = h5py.File("nyu_depth_v2_labeled.mat", "r")

The accompanying splits.mat is an older v5 MAT file, so it is loaded with scipy.io instead:

import scipy.io
splits = scipy.io.loadmat('splits.mat')
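As a side note, a minimal sketch of how the split file is typically used, assuming the official splits.mat with its trainNdxs/testNdxs fields (the variable names below are mine):

# the split stores 1-based MATLAB indices; shift to 0-based for Python
train_idx = splits['trainNdxs'].ravel() - 1
test_idx = splits['testNdxs'].ravel() - 1
print(len(train_idx), len(test_idx))  # e.g. 795 training and 654 test images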
# Iterate over all items in the .mat file and print each item's value
for name, value in h5_file.items():
    print("Name:", name)
    print("Value:", value)

variables = h5_file.items()
for name, data in variables:
    print("Name:", name)  # field name
    if isinstance(data, h5py.Dataset):
        # A Dataset holds the actual array; if the item is not a
        # Dataset (e.g. a Group), access its sub-items instead
        value = data[()]  # read the whole dataset as a NumPy array
        print("Value:", value)
The code above lists every field stored in the .mat file:
Here N = 1449 is the number of images, H = 480 and W = 640 are the image height and width, and C = 894 is the number of classes.
accelData: 4xN accelerometer values recorded when each frame was taken; each column contains the roll, yaw, pitch and tilt angle of the device.
depths: HxWxN = 480*640*1449 in-painted depth maps.
images: HxWx3xN = 480*640*3*1449 RGB images.
instances: HxWxN = 480*640*1449 instance maps.
labels: HxWxN = 480*640*1449 label maps; values range from 1..C, where C is the total number of classes. If a pixel's label value is 0, that pixel is 'unlabeled'.
names: Cx1 cell array of the English names of each class.
namesToIds: map from English label names to class IDs (with C key-value pairs). Read through h5py it shows up as a 1*6 array such as [3707764736, 2, 1, 1, 1, 1], since a MATLAB containers.Map does not translate into a plain HDF5 dataset.
rawDepthFilenames: Nx1 cell array of the filenames (in the Raw dataset) used for each of the depth images in the labeled dataset.
rawDepths: HxWxN = 480*640*1449 raw depth maps. These capture the depth images after they have been projected onto the RGB image plane but before the missing depth values have been filled in. Additionally, the depth non-linearity from the Kinect device has been removed, and the values of each depth image are in meters.
rawRgbFilenames: Nx1 cell array of the filenames (in the Raw dataset) used for each of the RGB images in the labeled dataset.
sceneTypes: Nx1 cell array of the scene type from which each image was taken.
scenes: Nx1 cell array of the name of the scene from which each image was taken.
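Cell arrays of strings such as names are stored as HDF5 object references; each reference has to be dereferenced and its character codes decoded. A minimal sketch (class_names is my variable name):

# names is a 1xC cell array; each element is a reference to a uint16 char array
class_names = [''.join(chr(c) for c in h5_file[ref][:].ravel())
               for ref in h5_file['names'][0]]
print(len(class_names))  # 894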
Using the per-pixel class labels stored in labels, we can draw a bounding box around each kind of object in every image:
labels = h5_file['labels']  # h5py reads MATLAB arrays transposed: shape (N, W, H) = (1449, 640, 480)
images = h5_file['images']  # shape (N, 3, W, H) = (1449, 3, 640, 480)
# sceneTypes is a cell array of strings: dereference each element and decode its char codes
scenes = [''.join(chr(c) for c in h5_file[obj_ref][:].ravel())
          for obj_ref in h5_file['sceneTypes'][0]]
print("processing images")
for i, image in enumerate(images):
    print("image", i + 1, "/", len(images))
    # transpose back to H x W (x 3) before drawing
    draw_box(i, scenes[i], image.T, labels[i, :, :].T)
This reads the images, labels and scenes data from the .mat file: the boxes are drawn on the images, labels provides the class of every pixel in each image, and scenes lets the boxed images be saved grouped by scene category.
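For completeness, a sketch of the imports and the per-scene output directory this walkthrough assumes (the folder layout and the make_scene_folder helper are my own, not part of the dataset):

import os
import cv2
import numpy as np
from imageio import imsave  # substitute for the deprecated scipy.misc.imsave

def make_scene_folder(scene, root="output"):
    # hypothetical helper: one output directory per scene type
    folder = os.path.join(root, scene)
    os.makedirs(folder, exist_ok=True)
    return folder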
Next comes the key code for drawing the bounding boxes.
First, collect all class labels that appear in each image and store them in a list:
def draw_box(i, scene, image, label):
    L = []
    shape = list(label.shape) + [3]
    # collect the class label of every non-background pixel (0 = background)
    for j in range(shape[0]):
        for k in range(shape[1]):
            if label[j, k] != 0:
                L.append(label[j, k])
    L1 = list(set(L))  # de-duplicate: the distinct classes in this image
This records the class of every pixel, skipping class 0, which is background. Since many pixels share the same class, the list is de-duplicated at the end, leaving only the distinct classes present in that image (each image contains a different set of classes).
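The double loop can be replaced by a single vectorized call; a sketch, assuming label is a NumPy array as in the code above:

# distinct non-background classes in one line
L1 = np.unique(label[label != 0]).tolist()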
Next, for each class in the list, scan the whole image to find the pixels belonging to that class:
    image = image.copy()
    for X in L1:
        # track the box of class X: x is the column index k, y is the row index j,
        # so minX/maxX start from the width and minY/maxY from the height
        minX = shape[1]
        minY = shape[0]
        maxX = maxY = 0
        for j in range(shape[0]):
            for k in range(shape[1]):
                if label[j, k] == X:
                    if k < minX: minX = k
                    if k > maxX: maxX = k
                    if j < minY: minY = j
                    if j > maxY: maxY = j
        cv2.rectangle(image, (minX, minY), (maxX, maxY), (0, 255, 0), 2)
    folder = make_scene_folder(scene)  # per-scene output dir (hypothetical helper above)
    imsave("%s/%05d_bounding_box.png" % (folder, i), image)
Because of a quirk in the OpenCV Python bindings, cv2.rectangle cannot draw directly on the array obtained from h5py (the transposed view is not a plain writable array), so a copy is made first.
Note that for the labels read in Python, shape[0] is the height H and shape[1] is the width W.
When computing the top-left and bottom-right corners of the bounding box, compare k against the x coordinates and j against the y coordinates: OpenCV points are (x, y) = (column, row), which is also why minX must be initialized from shape[1] and minY from shape[0].
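The per-class double loop can also be vectorized with NumPy; a sketch, keeping in mind that np.where returns (rows, cols) = (y, x):

for X in np.unique(label[label != 0]):
    ys, xs = np.where(label == X)  # row indices = y, column indices = x
    cv2.rectangle(image, (int(xs.min()), int(ys.min())),
                  (int(xs.max()), int(ys.max())), (0, 255, 0), 2)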