OpenMMLab AI Camp (Session 2), Related Notes 3: Converting RGB semantic segmentation annotations into grayscale masks

Contents

0. Method 1 (general-purpose)
1. Method 2 (less general)
2. Viewing semantic masks on macOS (with the default Preview app)

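0. Method 1 (general-purpose)

0.1 Converting RGB to a single-channel semantic mask

First work out which RGB value each class uses in the annotation image, so that you know exactly which color stands for which class.
(A convenient way is to open the image with cv2.imshow and move the mouse over it to read off the RGB/BGR value of each class.)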

import cv2
import matplotlib.pyplot as plt
import numpy as np

mask_path = "/Users/huangshan/Documents/MediWorks/AV_groundTruth/test/av/01_test.png"

def convertRGB2OneChannel(mask_path):
    """
    The mask fed to the network must hold class indices (background 0, foreground 1, ...),
    not the RGB picture that is meant for viewing.
    """
    mask = cv2.imread(mask_path)  # OpenCV loads the image in BGR order
    new_mask = np.zeros(mask.shape[:2], dtype=np.uint8)
    # the colors below are written in BGR order
    new_mask[(mask==[0,0,0]).all(2)]=0 # background: 0
    new_mask[(mask==[0,0,255]).all(2)]=1 # artery: 1
    new_mask[(mask==[255,0,0]).all(2)]=2 # vein: 2
    new_mask[(mask==[0,255,0]).all(2)]=3 # overlap: 3
    new_mask[(mask==[255,255,255]).all(2)]=4 # uncertain: 4
    return mask,new_mask


raw_mask,new_mask = convertRGB2OneChannel(mask_path)
plt.figure(figsize=(16,8))
plt.subplot(1,2,1)
plt.imshow(raw_mask[:,:,::-1])  # BGR -> RGB, since matplotlib expects RGB
plt.title("raw mask")
plt.subplot(1, 2, 2)
plt.imshow(new_mask)
plt.title("new mask")


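Here is how the code works:

mask = cv2.imread(mask_path)[:2,:3,::] 
# to keep the output short, crop a block of 2 rows (y) by 3 columns (x)
print((mask == [0, 0, 0]))
# the output is a 2*3*3 bool array; the 3 along axis=2 is the depth, i.e. the three color components
print((mask == [0, 0, 0]).all(2))
# all(2) requires every value along axis=2 to be True, so the output shape is 2*3

print((mask == [0, 0, 0]).all(0))
# all(0) requires every value along axis=0 to be True, so the output shape is 3*3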
> 
# mask == [0, 0, 0]
[[[ True  True  True]
  [ True  True  True]
  [ True  True  True]]

 [[ True  True  True]
  [ True  True  True]
  [ True  True  True]]]

# (mask == [0, 0, 0]).all(2)  
[[ True  True  True]
 [ True  True  True]]

# (mask == [0, 0, 0]).all(0)
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
"""
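In other words:
for a three-channel image, the three values along the depth axis are compared with [0,0,0];
after .all(2) the result is a bool matrix with the same height and width as the mask, True only where all three color components match.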
"""

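The per-color assignments in convertRGB2OneChannel can also be driven by a table. The sketch below is only an illustration on a toy array: the names BGR_TO_CLASS and bgr_to_index_mask and the sentinel value 255 for unmapped colors are assumptions, not part of the original code. Filling the output with a sentinel first makes pixels whose color is missing from the table easy to spot, instead of silently becoming background.

```python
import numpy as np

# Hypothetical BGR -> class-index table mirroring the mapping used above.
BGR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (0, 0, 255): 1,      # artery
    (255, 0, 0): 2,      # vein
    (0, 255, 0): 3,      # overlap
    (255, 255, 255): 4,  # uncertain
}

def bgr_to_index_mask(bgr, table=BGR_TO_CLASS, unknown=255):
    """Map every pixel of an HxWx3 BGR array to a class index.

    Pixels whose color is not in the table keep the `unknown` value.
    """
    out = np.full(bgr.shape[:2], unknown, dtype=np.uint8)
    for color, idx in table.items():
        # (H, W, 3) == (3,) broadcasts per pixel; all(axis=2) requires all three components to match
        match = (bgr == np.array(color, dtype=bgr.dtype)).all(axis=2)
        out[match] = idx
    return out

# Toy 2x3 image; the last pixel uses a color that is not in the table.
toy = np.array(
    [[[0, 0, 0], [0, 0, 255], [255, 0, 0]],
     [[0, 255, 0], [255, 255, 255], [1, 2, 3]]],
    dtype=np.uint8,
)
index_mask = bgr_to_index_mask(toy)
print(index_mask)
```

Checking `(index_mask == 255).any()` afterwards tells you whether the image contains a color you did not account for.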

0.2 Converting RGB to a multi-channel semantic mask

See this file directly: https://github.com/nikhilroxtomar/RGB-Mask-to-Single-Channel-Mask-for-Multiclass-Segmentation/blob/main/rgb_mask_to_single_channel_mask.py
The key part is the following function:

def process_mask(rgb_mask, colormap):
    output_mask = []

    for i, color in enumerate(colormap):
        cmap = np.all(np.equal(rgb_mask, color), axis=-1)
        output_mask.append(cmap)
    output_mask = np.stack(output_mask, axis=-1)
    return output_mask
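As a minimal usage sketch, the function can be tried on a toy array. The colormap and the 2x2 image below are made up for illustration, and the function body is repeated (in condensed form) only so that the block runs on its own. The result has one boolean channel per color; taking argmax over the last axis collapses it back to a single-channel index mask.

```python
import numpy as np

def process_mask(rgb_mask, colormap):
    # Same logic as the function quoted above: one boolean channel per color.
    output_mask = [np.all(np.equal(rgb_mask, color), axis=-1) for color in colormap]
    return np.stack(output_mask, axis=-1)

# Hypothetical colormap: class 0 = black, class 1 = red, class 2 = green (RGB order).
colormap = [[0, 0, 0], [255, 0, 0], [0, 255, 0]]

rgb_mask = np.array(
    [[[0, 0, 0], [255, 0, 0]],
     [[0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)

one_hot = process_mask(rgb_mask, colormap)                  # bool array, shape (H, W, num_classes)
index_mask = np.argmax(one_hot, axis=-1).astype(np.uint8)   # shape (H, W), class index per pixel
print(one_hot.shape)
print(index_mask)
```

Note that argmax returns 0 for a pixel that matches no color in the colormap, so unmatched pixels are indistinguishable from class 0 in index_mask.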

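0.3 Converting a single-channel semantic image to a multi-channel one

import torch
from torchvision.io import read_image

mask = read_image(mask_path)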
# instances are encoded as different colors
obj_ids = torch.unique(mask)
# first id is the background, so remove it
obj_ids = obj_ids[1:]
num_objs = len(obj_ids)

# split the color-encoded mask into a set
# of binary masks
masks = (mask == obj_ids[:, None, None]).to(dtype=torch.uint8)
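The broadcasting step in the last line may be easier to follow on a tiny example. The sketch below is a NumPy analogue of the snippet, not the tutorial's own code, and the 2x3 mask is made up: reshaping the ids to (N, 1, 1) and comparing against the (H, W) mask yields one binary (H, W) mask per id.

```python
import numpy as np

# Toy single-channel mask: 0 is background, 1 and 2 are two objects/classes.
mask = np.array([[0, 1, 1],
                 [2, 0, 2]], dtype=np.uint8)

obj_ids = np.unique(mask)   # sorted unique values: [0, 1, 2]
obj_ids = obj_ids[1:]       # drop the first id, which is the background

# (N, 1, 1) compared with (H, W) broadcasts to (N, H, W):
# channel k is 1 exactly where mask == obj_ids[k].
masks = (mask == obj_ids[:, None, None]).astype(np.uint8)
print(masks.shape)
print(masks)
```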

Reference:

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

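1. Method 2 (less general)

I found a semantic segmentation dataset whose annotation images are stored as RGB and have to be converted into masks.
Damaged flower segmentation (280MB): Accurate damaged flower shapes/segmentation

1.1 Inspecting the original image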

import matplotlib.pyplot as plt
import cv2
%matplotlib ipympl

mask = cv2.imread("datasets/05-damaged-01/mask/mask00000001.png")
plt.imshow(mask[:,:,::-1])

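Conclusion 1: the original annotation image is visibly a three-channel RGB image.

The dataset does not ship any label information, so you have to work out for yourself how many colors the image contains. np.unique can do this; see the Stack Overflow question "numpy: unique list of colors in the image".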

import numpy as np
np.unique(mask.reshape(-1, mask.shape[2]), axis=0,return_counts=True)
>(array([[  0,   0,   0],
        [  4,   0,  67],
        [ 12,   0, 243],
        [ 13,   0, 252],
        [ 13,   0, 255]], dtype=uint8),
 array([813467,    322,      1,    336, 107474]))

# or use Counter
from collections import Counter
Counter([tuple(colors) for i in mask for colors in i])

>Counter({(0, 0, 0): 813467,
         (4, 0, 67): 322,
         (13, 0, 252): 336,
         (13, 0, 255): 107474,
         (12, 0, 243): 1})

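Conclusion 2: the two most frequent classes are the black background and the red flower; the other three colors are edge pixels, as shown further down.

Convert the original image to grayscale, build a mapping dict, and look at the semantic mask of the grayscale image:

# COLOR_BGRA2GRAY also accepts this 3-channel image and gives the same values as COLOR_BGR2GRAY (compare 1.3)
gray_img = cv2.cvtColor(mask, cv2.COLOR_BGRA2GRAY)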
print(np.unique(gray_img,return_counts=True))
>((array([ 0, 20, 74, 77, 78], dtype=uint8),
  array([813467,    322,      1,    336, 107474])))

uniqueRs=np.unique(gray_img)
indexMapDict = {}
# to start with, each gray level is simply mapped to its index in sorted order
for i in range(len(uniqueRs)):
    indexMapDict[uniqueRs[i]]=i

# build an array of the same size
pngArray = np.zeros_like(gray_img)
height = gray_img.shape[0]
width = gray_img.shape[1]

for i in range(height):
    for j in range(width):
        grayLevel = gray_img[i,j]
        pngArray[i,j]=indexMapDict[grayLevel]
np.unique(pngArray,return_counts=True)
>(array([0, 1, 2, 3, 4], dtype=uint8),
 array([813467,    322,      1,    336, 107474])) # the counts match the ones above

# visualize the result
import matplotlib
# five gray levels, so build a colormap with five classes
flower_cmap = matplotlib.colors.ListedColormap(["black", "white","yellow","white","red"],N=5)

plt.figure()
plt.imshow(pngArray,cmap=flower_cmap)

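Conclusion 3: apart from red and black, the pixels shown in white and yellow are the three low-frequency colors seen earlier; they clearly lie on the edge of the flower, where it touches the background.

This is probably caused by the annotation software, or by some video-tracking-based annotation technique.
In practice, therefore, only two classes are kept: background is 0 and foreground is 1.

See the post "Organizing and annotating data samples for semantic segmentation, with palette code".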

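1.2 Code

Overall approach:

1. Convert the RGB image to grayscale, so that the three color channels collapse into a single value that is easier to work with
2. Use np.unique to find the distinct colors
3. Build the mapping dict: the most frequent color is the background (0), the second most frequent is the foreground (1), and the rest are flower edges, assigned 0 for now
4. Build the corresponding numpy array and save it

The code: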

import os
import cv2
import numpy as np

raw_mask_base = "datasets/05-damaged-01/mask"
save_mask_base = "datasets/05-damaged-01/maskLabel"
os.makedirs(save_mask_base,exist_ok=True)

maskList = os.listdir(raw_mask_base)

for imgName in maskList:
    imgPath = os.path.join(raw_mask_base,imgName)
    savePath = os.path.join(save_mask_base,os.path.splitext(imgName)[0]+'.png')

    # 1. convert to grayscale
    mask = cv2.imread(imgPath)
    gray_img = cv2.cvtColor(mask, cv2.COLOR_BGRA2GRAY)
    
    # 2. find the distinct gray levels
    uniqueRs,index,count=np.unique(gray_img,return_inverse=True,return_counts=True)
    
    # 3. build the dict
    mapDict = {}
    indexSort=np.argsort(count)  # ascending by frequency
    for i in indexSort[:-2]:
        mapDict[uniqueRs[i]]=0          # everything else (edges) -> background
    mapDict[uniqueRs[indexSort[-1]]]=0  # most frequent -> background
    mapDict[uniqueRs[indexSort[-2]]]=1  # second most frequent -> foreground
    
    # 4. map and save the image
    try:
        # uint8 keeps the saved values as integers in 0-255
        pngArray = np.array([mapDict[x] for x in uniqueRs],dtype=np.uint8)[index].reshape(gray_img.shape)
        cv2.imwrite(savePath, pngArray)
    except Exception as e:
        print(f'Processing failed: {e}, file: {imgName}')

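While processing, it turned out that not every image has exactly five colors, so mapDict handles only the two most frequent levels explicitly and treats every other level as background.
For applying a dict mapping with numpy, the approach above was the fastest I found; compared with the pure for-loop version below, it was more than 100x faster.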
```
 for i in range(height):
     for j in range(width):
         grayLevel = gray_img[i,j]
         pngArray[i,j]=mapDict[grayLevel] 
```
For details, see: Translate every element in numpy array according to key
When saving a single-channel grayscale image with imwrite, the values must be integers in the range 0-255; see: Saving GRAYSCALE .png image with cv2.imwrite() not working
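To see why the np.unique version is equivalent to the double loop, here is a minimal sketch on a toy array. The gray levels and the mapping are made up to resemble the ones above, and the 256-entry lookup table at the end is an additional option that is not part of the original code; it only works because the input is uint8.

```python
import numpy as np

# Toy grayscale image and a hypothetical gray-level -> class mapping.
gray = np.array([[0, 78, 78],
                 [20, 78, 0],
                 [0, 0, 77]], dtype=np.uint8)
map_dict = {0: 0, 20: 0, 77: 0, 78: 1}

# 1) Pure Python loops: one dict lookup per pixel.
slow = np.zeros_like(gray)
for i in range(gray.shape[0]):
    for j in range(gray.shape[1]):
        slow[i, j] = map_dict[int(gray[i, j])]

# 2) np.unique with return_inverse: one dict lookup per distinct value,
#    then NumPy indexing spreads the result back over all pixels.
values, inverse = np.unique(gray, return_inverse=True)
mapped_values = np.array([map_dict[int(v)] for v in values], dtype=np.uint8)
fast = mapped_values[inverse].reshape(gray.shape)

# 3) Lookup table indexed directly by the uint8 gray level (assumed alternative).
lut = np.zeros(256, dtype=np.uint8)
for level, cls in map_dict.items():
    lut[level] = cls
via_lut = lut[gray]

print(fast)
print(np.array_equal(slow, fast), np.array_equal(slow, via_lut))
```

All three produce the same array; the difference is only in how many Python-level dict lookups are needed.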

1.3 cv::IMREAD_GRAYSCALE and CV_BGR2GRAY give different results

1.3.1 Symptom

"""
The code above uses
"""
mask = cv2.imread(imgPath)
gray_img = cv2.cvtColor(mask, cv2.COLOR_BGRA2GRAY)

"""
rather than doing it in one step
"""
gray_img = cv2.imread(imgPath,cv2.IMREAD_GRAYSCALE)

Both approaches produce a grayscale image, but the results differ slightly, as the example below shows:

# 1
gray_image = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
np.unique(gray_image)
>array([ 0, 20, 74, 77, 78], dtype=uint8)

# 3
gray_image = cv2.imread("datasets/05-damaged-01/mask/mask00000001.png",cv2.IMREAD_GRAYSCALE)
np.unique(gray_image)
>array([ 0, 20, 74, 76, 77], dtype=uint8)

As you can see, the first gives 77 and 78, while the second gives 76 and 77.

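1.3.2 Cause

The OpenCV documentation for imread says, roughly:

When IMREAD_GRAYSCALE is used, the grayscale conversion is done by the codec available on the current platform, so the result may differ from that of cvtColor().
On Windows and macOS, the codecs bundled with OpenCV (libjpeg, libpng, libtiff and libjasper) are used by default, which is why OpenCV can always read these formats there.

In addition, according to "Opencv - Grayscale mode Vs gray color conversion":

cvtColor() is implemented by OpenCV itself and behaves the same on every platform (it is essentially an array operation).
Converting color to grayscale through imread(), on the other hand, depends on how imread() is implemented on the specific platform, i.e. on the codecs mentioned above (they deal with the storage format, and details such as floating-point conventions can differ).
So the real question is: why use an image-reading function to do a color conversion in the first place?

Thanks to the post "OpenCV: imread, cvtColor, and the mismatch between grayscale images from cv::IMREAD_GRAYSCALE and CV_BGR2GRAY"; what follows is reposted from it, so please go and upvote the original.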

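In OpenCV 3.0:

With cv::IMREAD_COLOR, a jpg is decoded by cv::JpegDecoder into an RGB image, and icvCvt_RGB2BGR_8u_C3R() then swaps the R and B channels to produce a BGR color image.
With cv::IMREAD_GRAYSCALE, cv::JpegDecoder decodes the image straight to grayscale; all color conversion and other pre- or post-processing details are handled by libjpeg, and the decompressed data is finally copied into the internal buffer of the given cv::Mat. So cv::IMREAD_GRAYSCALE never calls OpenCV's cv::cvtColor for the color conversion.

1.3.3 Recommended practice

If the source image is a color image, read it with imread and then convert it to grayscale with cvtColor
If the source image is already grayscale, pass cv2.IMREAD_GRAYSCALE to imread to read it as grayscale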

1.4 CV_BGR2GRAY and CV_RGB2GRAY give different results

# 1
gray_image = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
np.unique(gray_image)
>array([ 0, 20, 74, 77, 78], dtype=uint8)

# 2
gray_image = cv2.cvtColor(mask,cv2.COLOR_RGB2GRAY)
np.unique(gray_image)
>array([ 0,  9, 31, 33], dtype=uint8)

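cv2.COLOR_RGB2GRAY and cv2.COLOR_BGR2GRAY give different results on the same image.
According to "Why would cv2.COLOR_RGB2GRAY and cv2.COLOR_BGR2GRAY give different results?":

The RGB-to-gray conversion does not average the three channels; each channel has its own weight. Calling cv2.COLOR_RGB2GRAY and cv2.COLOR_BGR2GRAY on the same image therefore swaps the weights applied to the first and third channels, so the results differ.
For the RGB→GRAY conversion formula used by OpenCV, see the documentation: https://docs.opencv.org/4.x/de/d25/imgproc_color_conversions.html#color_convert_rgb_gray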
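The documented formula is Y = 0.299·R + 0.587·G + 0.114·B. As a rough check, the sketch below applies it in plain Python to the five colors found in 1.1, once reading each triple as BGR and once as RGB. The helper to_gray is hypothetical, and it rounds a floating-point result, whereas OpenCV may use integer arithmetic internally, so individual values could differ by one in edge cases; for these five colors it reproduces the gray levels listed above.

```python
# Channel weights from the documented formula Y = 0.299*R + 0.587*G + 0.114*B.
WEIGHTS = {"R": 0.299, "G": 0.587, "B": 0.114}

def to_gray(pixel, order):
    """Weighted sum of a 3-tuple whose channels are laid out as `order` ("BGR" or "RGB")."""
    return int(round(sum(WEIGHTS[ch] * value for ch, value in zip(order, pixel))))

# The five colors reported by np.unique in section 1.1 (as stored by cv2.imread, i.e. BGR).
colors = [(0, 0, 0), (4, 0, 67), (12, 0, 243), (13, 0, 252), (13, 0, 255)]

as_bgr = [to_gray(c, "BGR") for c in colors]   # what COLOR_BGR2GRAY assumes
as_rgb = [to_gray(c, "RGB") for c in colors]   # what COLOR_RGB2GRAY assumes

print(as_bgr)               # [0, 20, 74, 77, 78]
print(sorted(set(as_rgb)))  # [0, 9, 31, 33]
```

Read as RGB, the last two colors both map to 33, which is consistent with COLOR_RGB2GRAY reporting only four distinct gray levels.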