Array, Tensor, and Dimension Representations of Images

lalalagjl

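1. Array representation of images

The RGB color model:
Colors are produced by varying and combining three color channels (red, green, blue). Each channel takes an integer value in the following range:
R 0~255
G 0~255
B 0~255
White 255 255 255
Black 0 0 0
With 8 bits per channel, RGB can encode 256 × 256 × 256 (about 16.7 million) colors. This covers a large share of the colors human vision can perceive, though not all of them.
Common RGB color table: http://jsxzjh.bokee.com/3744988.html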

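The PIL library
PIL: Python Imaging Library
PIL is a third-party library with powerful image-processing capabilities. It is installed today through its maintained fork, Pillow, which keeps the PIL import name.

pip install pillow
from PIL import Image
# Image is the PIL class that represents an image

An image is a two-dimensional matrix of pixels, where each element is an RGB value (R, G, B).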

from PIL import Image
import numpy as np
im  = np.array(Image.open("D:/pycodes/beijing.jpg"))
print(im.shape, im.dtype)

As a NumPy array, the image is three-dimensional: the axes are height, width, and the RGB values of each pixel.
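To make the three axes concrete, here is a minimal sketch that builds a tiny image directly in NumPy instead of loading a file. The 2 × 3 size and the variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np

# A synthetic 2-row, 3-column RGB image; dtype uint8 matches the 0~255 range.
height, width = 2, 3
im = np.zeros((height, width, 3), dtype=np.uint8)  # all black

im[0, 0] = [255, 255, 255]  # top-left pixel: white
im[1, 2] = [255, 0, 0]      # bottom-right pixel: pure red

print(im.shape, im.dtype)   # axes are (height, width, channels)
red_channel = im[:, :, 0]   # one 2-D plane per channel
print(red_channel.shape)
```

Indexing with two numbers, as in im[1, 2], returns the three channel values of a single pixel.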

Transforming images
Read the image, obtain the pixel RGB values, modify them, and save the result as a new file.

from PIL import Image
import numpy as np
a  = np.array(Image.open("D:/pycodes/beijing.jpg"))
print(a.shape, a.dtype)

b = [255, 255, 255] - a  # complement (invert) every channel
im = Image.fromarray(b.astype('uint8'))
im.save("D:/pycodes/fcity2.jpg")
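The astype('uint8') call in the code above matters because subtracting from a Python list promotes the result to a wider integer type. A small sketch on a synthetic array (hypothetical values) shows this, along with the scalar form that stays in uint8:

```python
import numpy as np

# Synthetic 1x2 RGB image standing in for the loaded file.
a = np.array([[[0, 128, 255], [10, 20, 30]]], dtype=np.uint8)

b_list = [255, 255, 255] - a   # the list becomes a default-int array first
b_scalar = 255 - a             # a scalar that fits in uint8 keeps the dtype

print(b_list.dtype, b_scalar.dtype)
print(b_scalar[0, 0])          # inverted first pixel

# Both forms hold the same values; only the dtype differs.
same_values = np.array_equal(b_list, b_scalar)
```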

from PIL import Image
import numpy as np
a  = np.array(Image.open("D:/pycodes/beijing.jpg").convert('L'))  # convert to a grayscale image (2-D array)

b = 255 - a
im = Image.fromarray(b.astype('uint8'))
im.save("D:/pycodes/fcity3.jpg")
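Pillow's documentation gives the formula behind convert('L') as L = R * 299/1000 + G * 587/1000 + B * 114/1000. The following is a NumPy sketch of that formula on synthetic pixels; the function name to_gray is made up for illustration, and Pillow's own integer arithmetic may differ by one gray level from this rounded floating-point version.

```python
import numpy as np

def to_gray(rgb):
    """Weighted sum of the R, G, B channels of an (H, W, 3) uint8 array."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights      # shape (H, W)
    return np.rint(gray).astype(np.uint8)

# One row of four pixels: white, black, pure red, pure green.
rgb = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0], [0, 255, 0]]],
               dtype=np.uint8)
gray = to_gray(rgb)
inverted = 255 - gray                            # same inversion as above

print(gray.shape)   # the channel axis is gone
print(gray, inverted)
```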

2. NHWC and NCHW dimension orders in tf.nn.conv2d

tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)

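data_format defaults to "NHWC" and can be set manually to "NCHW".
This parameter specifies how the dimensions of the input Tensor and the output Tensor are ordered.

When data_format is "NHWC", the order is [batch, height, width, channels]; when it is "NCHW", the order is [batch, channels, height, width].

Here N is the number of images in the batch, H is the number of pixels in the vertical direction, W is the number of pixels in the horizontal direction, and C is the number of channels (for example, C = 1 for a grayscale image and C = 3 for an RGB color image).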
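Switching between the two orders is an axis permutation. A minimal NumPy sketch on a made-up batch (2 images, 4 × 5 pixels, 3 channels) shows the permutations in both directions:

```python
import numpy as np

# A batch of 2 RGB images, each 4 pixels high and 5 pixels wide, in NHWC order.
nhwc = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)

# Move the channel axis from last to second position: NHWC -> NCHW.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
# And back again: NCHW -> NHWC.
restored = np.transpose(nchw, (0, 2, 3, 1))

print(nhwc.shape, nchw.shape)

# The same value is addressed with the indices in a different order.
n, h, w, c = 1, 2, 3, 0
print(nhwc[n, h, w, c] == nchw[n, c, h, w])
```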

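TensorFlow uses NHWC as its default format. The usual explanation is that early development targeted CPUs, where NHWC tends to be slightly faster than NCHW because all channel values of one pixel sit next to each other in memory, which gives better locality and cache utilization. NCHW is the default format of Nvidia cuDNN, and it is often faster when running on a GPU, although there are exceptions.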
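One way to see the locality argument, assuming row-major (C-order) storage as NumPy uses by default, is to compare array strides for the two layouts. The shapes below are made up for illustration, and the sketch only shows memory layout; it says nothing about actual speed on a given device.

```python
import numpy as np

# Contiguous uint8 buffers, so one element is one byte.
nhwc = np.zeros((2, 4, 5, 3), dtype=np.uint8)   # N, H, W, C
nchw = np.zeros((2, 3, 4, 5), dtype=np.uint8)   # N, C, H, W

# strides = bytes to step along each axis
print(nhwc.strides)
print(nchw.strides)

# Distance in bytes between the R and G values of the same pixel:
channel_step_nhwc = nhwc.strides[3]   # adjacent in memory
channel_step_nchw = nchw.strides[1]   # a whole H*W plane apart
```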

3. Converting between images and tensors in TensorFlow

1. Read the local image file with the function read_file
2. Decode it into a tensor with the function decode_jpeg

# -*- coding: utf-8 -*-
import tensorflow as tf
import matplotlib.pylab as plt
# Read the image file (TensorFlow 1.x API)
image = tf.read_file("C:/kitty.jpg")
# Decode the image file into a Tensor
image_tensor = tf.image.decode_jpeg(image)
# Shape of the image tensor
shape = tf.shape(image_tensor)
session = tf.Session()
print("Image shape:")
print(session.run(shape))
# Convert the tensor to an ndarray
image_ndarray = image_tensor.eval(session=session)
# Display the image
plt.imshow(image_ndarray)
plt.show()

Image shape:
[393 500   3]
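The code above uses the TensorFlow 1.x session API. Under TensorFlow 2.x with eager execution, the same two steps can be sketched as follows. To stay self-contained, the sketch writes a small synthetic JPEG to a temporary file first; the file name and pixel values are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Create a 4x6 synthetic RGB image and save it as a JPEG (illustrative input).
pixels = np.full((4, 6, 3), 128, dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
tf.io.write_file(path, tf.io.encode_jpeg(pixels))

# Step 1: read the raw bytes of the file.
raw = tf.io.read_file(path)
# Step 2: decode the bytes into a uint8 tensor of shape [height, width, channels].
image_tensor = tf.io.decode_jpeg(raw, channels=3)

print(image_tensor.shape, image_tensor.dtype)

# In eager mode no session is needed to get an ndarray.
image_ndarray = image_tensor.numpy()
print(type(image_ndarray), image_ndarray.shape)
```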

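4. Converting between PIL.Image/numpy.ndarray and Tensor

The PyTorch team provides the torchvision.transforms package, which supports the following operations:

Conversion between PIL.Image/numpy.ndarray and Tensor:

 transforms.ToTensor()
 # Converts a PIL.Image or uint8 numpy.ndarray of shape (H x W x C) with pixel values in [0, 255]
 # into a torch.FloatTensor of shape (C x H x W) with values in [0.0, 1.0]. No batch dimension N is added.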
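The two effects of ToTensor (axis reordering and rescaling) can be emulated in NumPy. This is a sketch of the documented behavior on a hand-written array, not torchvision's implementation; to_chw_float is a name made up for this illustration.

```python
import numpy as np

def to_chw_float(image_hwc):
    """Emulate the documented effect of ToTensor on a uint8 (H, W, C) array."""
    chw = np.transpose(image_hwc, (2, 0, 1))        # (H, W, C) -> (C, H, W)
    return chw.astype(np.float32) / 255.0           # [0, 255] -> [0.0, 1.0]

image_hwc = np.array([[[0, 51, 255], [255, 102, 0]]], dtype=np.uint8)  # H=1, W=2
tensor_like = to_chw_float(image_hwc)

print(tensor_like.shape, tensor_like.dtype)
print(tensor_like.min(), tensor_like.max())
```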

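Normalization

transforms.Normalize(mean, std)

This transform operates on torch.*Tensor. Given per-channel means (R, G, B) and standard deviations (R, G, B), it normalizes each channel with channel = (channel - mean) / std. Because it works on tensors, it has to come after transforms.ToTensor().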
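The formula channel = (channel - mean) / std can be checked by hand with a NumPy sketch. The function normalize and the mean/std values below are illustrative assumptions chosen to give round numbers; they are not the ImageNet statistics.

```python
import numpy as np

def normalize(chw, mean, std):
    """Apply channel = (channel - mean) / std to a (C, H, W) float array."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (chw - mean) / std

# A 3-channel 2x2 "image" whose pixels are all 0.5 (already scaled to [0, 1]).
chw = np.full((3, 2, 2), 0.5, dtype=np.float32)
out = normalize(chw, mean=[0.5, 0.5, 0.5], std=[0.5, 0.25, 1.0])
shifted = normalize(chw + 0.25, mean=[0.5, 0.5, 0.5], std=[0.5, 0.25, 1.0])

print(out[:, 0, 0])       # every channel is centered at 0
print(shifted[:, 0, 0])   # 0.25 above the mean, divided by each std
```

Reshaping mean and std to (C, 1, 1) lets broadcasting apply one value per channel across all pixels.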

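Cropping, resizing, and similar operations on PIL.Image:

transforms.Resize(256),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip()

Older torchvision releases named the first two transforms Scale and RandomSizedCrop; those names were deprecated in favor of Resize and RandomResizedCrop.

transforms.Compose(transforms) chains several transforms together, where transforms is a list of transform objects.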

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# Resize / RandomResizedCrop replace the deprecated names Scale / RandomSizedCrop
all_transforms = transforms.Compose([
                    transforms.Resize(256),
                    transforms.RandomResizedCrop(224),
                    transforms.RandomHorizontalFlip(), # operates on the PIL.Image
                    transforms.ToTensor(),
                    normalize])
# The operations inside transforms.Compose are executed in order.
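Order matters in Compose, which is why ToTensor precedes Normalize. The idea can be sketched in plain Python; SimpleCompose is a hypothetical stand-in written for this illustration, not torchvision's implementation.

```python
class SimpleCompose:
    """Apply a list of callables one after another (illustrative stand-in)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# Two toy transforms on plain numbers to make the ordering visible.
add_one = lambda x: x + 1
double = lambda x: x * 2

print(SimpleCompose([add_one, double])(3))   # (3 + 1) * 2
print(SimpleCompose([double, add_one])(3))   # 3 * 2 + 1
```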

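Conversion between PIL.Image/numpy.ndarray and Tensor:
Converting PIL.Image/numpy.ndarray to Tensor is typically done when loading data for training, while converting Tensor back to PIL.Image/numpy.ndarray is typically done when inspecting model output during validation.

A PIL.Image with values in [0, 255] is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0];
a numpy.ndarray of shape [H, W, C] is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0].
transforms.ToPILImage does the reverse and converts a Tensor or numpy.ndarray to a PIL.Image. To convert a Tensor to a NumPy array, call .numpy() on it.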
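After .numpy(), the array is still in [C, H, W] order with values in [0, 1.0], so displaying it with tools that expect [H, W, C] uint8 data takes one more step. A NumPy sketch, using a hand-written array in place of a real tensor (the function name and the rounding choice are assumptions for illustration):

```python
import numpy as np

def chw_float_to_hwc_uint8(chw):
    """Turn a (C, H, W) float array in [0, 1] into an (H, W, C) uint8 array."""
    scaled = np.clip(chw, 0.0, 1.0) * 255.0
    hwc = np.transpose(scaled, (1, 2, 0))           # (C, H, W) -> (H, W, C)
    return np.rint(hwc).astype(np.uint8)

# Stand-in for the result of tensor.numpy(): 3 channels, 1 row, 2 columns.
chw = np.array([[[0.0, 1.0]], [[0.2, 0.4]], [[1.0, 0.0]]], dtype=np.float32)
hwc = chw_float_to_hwc_uint8(chw)

print(hwc.shape, hwc.dtype)
print(hwc[0, 0], hwc[0, 1])
```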

#-*-coding:utf-8-*-
import torch
from torchvision import transforms
from PIL import Image
import cv2

img_path = "./cat.59.jpg"  

transform1 = transforms.Compose([
	transforms.CenterCrop((224,224)), # center-crop the PIL image
	transforms.ToTensor(), 
	]
)

## Converting between a PIL image and a Tensor
img_PIL = Image.open(img_path).convert('RGB')
img_PIL.show() # original image
img_PIL_Tensor = transform1(img_PIL)
print(type(img_PIL))
print(type(img_PIL_Tensor))

# Convert the Tensor back to a PIL.Image and display it again
new_img_PIL = transforms.ToPILImage()(img_PIL_Tensor).convert('RGB')
new_img_PIL.show() # PIL image after processing

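## Converting between an image read by OpenCV and a Tensor
# Most PIL-oriented transforms do not operate on np.ndarray images, so only ToTensor is applied here
# Note: cv2.imread returns channels in BGR order, not RGB
img_cv = cv2.imread(img_path)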
transform2 = transforms.Compose([
	transforms.ToTensor(), 
	]
)

img_cv_Tensor = transform2(img_cv)
print(type(img_cv))
print(type(img_cv_Tensor))
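One detail the example above does not handle: cv2.imread returns channels in BGR order, while PIL and most pretrained PyTorch models assume RGB. A NumPy sketch of the reordering on a synthetic pixel follows (cv2.cvtColor with cv2.COLOR_BGR2RGB is the OpenCV equivalent). The copy avoids handing a negatively strided view to later steps, which some libraries do not accept.

```python
import numpy as np

# Stand-in for cv2.imread output: one pixel that is pure blue in BGR order.
img_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)   # (B, G, R)

# Reverse the last axis to get RGB; copy so the result is contiguous in memory.
img_rgb = np.ascontiguousarray(img_bgr[..., ::-1])

print(img_rgb[0, 0])                 # blue now sits in the last (B) position
print(img_rgb.flags['C_CONTIGUOUS'])
```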

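5. Image channel conversion: from np.ndarray [h, w, c] to Tensor [c, h, w]

In neural networks, images are represented in [c, h, w] format, or [n, c, h, w] for a batch. An image loaded as an np.ndarray is stored in [h, w, c] format by convention, so it has to be converted before it is fed to the network.

n: number of samples
c: number of image channels
w: image width
h: image height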
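The extra n axis of the [n, c, h, w] format comes from stacking individual [c, h, w] images. A short NumPy sketch with made-up sizes (3 images, 3 channels, 4 × 5 pixels):

```python
import numpy as np

# Three hypothetical images, already in [c, h, w] format (3 channels, 4x5 pixels).
images = [np.zeros((3, 4, 5), dtype=np.float32) for _ in range(3)]

batch = np.stack(images, axis=0)            # [n, c, h, w] with n = 3
single = np.expand_dims(images[0], axis=0)  # one image as a batch of size 1

print(batch.shape, single.shape)
```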

Representing an image as np.ndarray:
Open an image with PIL, convert it to an np.ndarray with np.array(), and print its shape to see how the image is stored in the array.

from PIL import Image
import numpy as np

img_path = './test.jpg'
img = Image.open(img_path)
img_arr = np.array(img)
print(img_arr.shape)

# The output is (500, 300, 3)
# PIL's img.size reports (width, height), so comparing it with this shape shows that the ndarray is stored in [h, w, c] format.

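Differences between the image formats of np.ndarray and Tensor
Both use a three-dimensional array to represent an image. They differ in where each piece of image information sits in the array:

np.ndarray [h, w, c] format: the outermost axis indexes the rows of the image, the second axis indexes the columns (the pixels within a row), and the innermost axis holds the channel values. The pixel is the basic unit, and its three channel values are stored together.

Tensor [c, h, w] format: the outermost axis indexes the three channels, the second axis indexes the rows within a channel, and the innermost axis indexes the columns within a row. The basic unit is one row of values from a single channel.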

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt


# Simulate an image with random numbers
image = np.random.randint(256, size=60)
image = image.reshape((5,4,3))
image_hwc = np.uint8(image)

# Display the image
image_show = Image.fromarray(image_hwc)
plt.imshow(image_show)
plt.show()

# Print the pixel values in [h, w, c] format
print(image_hwc)

# Print the pixel values in [c, h, w] format
image_chw = np.transpose(image_hwc, (2,0,1))
print(image_chw)

Converting from [h, w, c] to [c, h, w]

The conversion can be done with NumPy's transpose() function.

image_chw = np.transpose(image_hwc, (2,0,1))
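A common mistake is to use reshape here instead of transpose: the result has the right shape but the pixel values end up in the wrong places. The sketch below, on a small made-up array, checks the transpose, contrasts it with reshape, and shows the inverse permutation.

```python
import numpy as np

image_hwc = np.arange(5 * 4 * 3, dtype=np.uint8).reshape(5, 4, 3)

image_chw = np.transpose(image_hwc, (2, 0, 1))   # correct: axes are permuted
wrong = image_hwc.reshape(3, 5, 4)               # same shape, scrambled pixels

# Every value keeps its (row, column, channel) identity under transpose.
h, w, c = 2, 1, 0
print(image_chw[c, h, w] == image_hwc[h, w, c])
print(np.array_equal(image_chw, wrong))

# The inverse permutation restores the [h, w, c] layout.
restored = np.transpose(image_chw, (1, 2, 0))
print(np.array_equal(restored, image_hwc))
```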

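6. Getting the shape of a tensor

TensorFlow operations for getting a tensor's shape
tf.shape(tensor): an op that returns the shape as a tensor, evaluated at run time
tensor.shape: the static shape, a TensorShape known at graph-construction time
tensor.get_shape(): an alias for tensor.shape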

>>> import tensorflow as tf
>>> tf.Variable(initial_value=tf.truncated_normal([100,100]))
<tf.Variable 'Variable:0' shape=(100, 100) dtype=float32_ref>
>>> v = tf.Variable(initial_value=tf.truncated_normal([100,100]))

# The tf.shape() function
>>> tf.shape(v)
<tf.Tensor 'Shape:0' shape=(2,) dtype=int32>

# The shape attribute
>>> v.shape
TensorShape([Dimension(100), Dimension(100)])

# The get_shape() method
>>> v.get_shape()
TensorShape([Dimension(100), Dimension(100)])

# Examples of incorrect usage
# Calling the attribute as if it were a method
>>> v.shape()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'TensorShape' object is not callable

# Using the method as if it were an attribute
>>> v.get_shape
<bound method RefVariable.get_shape of <tf.Variable 'Variable_1:0' shape=(100, 100) dtype=float32_ref>>
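
The practical difference between tensor.shape and tf.shape(tensor) shows up when a dimension is not known in advance. The sketch below assumes TensorFlow 2.x (the REPL session above is TensorFlow 1.x); count_rows and static_shapes are names made up for this illustration.

```python
import tensorflow as tf

static_shapes = []  # records what tensor.shape looks like while tracing

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 100], dtype=tf.float32)])
def count_rows(x):
    static_shapes.append(x.shape.as_list())  # static shape: unknown dims are None
    return tf.shape(x)[0]                    # dynamic shape: resolved at run time

rows = int(count_rows(tf.zeros([7, 100])).numpy())
print(static_shapes[0])
print(rows)
```

The static shape is useful for checks while building a model, and tf.shape is the one to use when the code needs the actual size of a dimension that is only known once data arrives.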