Deep Learning Notes 2: Using Transforms

Author: 喜欢写代码的小唐

Tags: deep learning, notes, artificial intelligence

Article link: https://blog.csdn.net/m0_73245690/article/details/136994940

1. Structure of transforms

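There is no particularly complicated code or logic underneath transforms. Its functionality is implemented by a handful of utility classes written in transforms.py. An image is passed through one of these tools and comes out as the result we want.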

2. Using Tensor and its data type

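To start with, the simplest way to think about a tensor is as one of the types an image can take once it has been loaded. We commonly meet three such types: a PIL image, which is what Image.open() returns; a tensor, which is what you get by converting an image with ToTensor(); and a numpy.ndarray, which is what cv.imread() returns. Here we start with how Tensor is used and what kind of data it holds.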

1. Using tensor

tensor_trans=transforms.ToTensor()
tensor_img=tensor_trans(img)
print(tensor_img)

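Look at the three lines above. As mentioned earlier, transforms is a toolbox module that contains many utility classes, and the ToTensor class exists to convert an image into Tensor form. So the first line creates a ToTensor object. The second line is the puzzling one, so let's look at the source: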

class ToTensor:
    """Convert a PIL Image or ndarray to tensor and scale the values accordingly.

    This transform does not support torchscript.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    .. note::
        Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
        transforming target image masks. See the `references`_ for implementing the transforms for image masks.

    .. _references: https://github.com/pytorch/vision/tree/main/references/segmentation
    """

    def __init__(self) -> None:
        _log_api_usage_once(self)

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}()"

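The class has only three methods. The first is the initializer, and the third is the equivalent of Java's toString method: it returns a string describing the object. What makes the second line of our snippet work is the second method, __call__().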

1. __call__()

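To explain this method we need the concept of a callable object. Any object that a pair of parentheses () can be applied to is callable; the functions we define, built-in functions, and classes are all callable. To check whether a given object is callable, use callable(). Instances of an ordinary class, however, are not callable by default: we write ClassName(args) to create an instance, but we normally never put parentheses after the instance itself. Defining __call__ is what turns an instance into a callable object. Here is an example: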

class Person:
    def __call__(self, name):
        print("__call__"+"hello"+name)
    def hello(self, name):
        print("hello"+name)

person = Person()
person(name="zhangsan")
person.hello("list")

Here we define a Person class with two methods and then call each of them. Running it prints __call__hellozhangsan and then hellolist.

So when you call the instance directly and pass it an argument, Python invokes __call__() and runs its body. The same is true of the ToTensor class:

tensor_img=tensor_trans(img)

Passing an argument to the instance calls __call__(). It is equivalent to writing tensor_trans.__call__(img); making instances callable is exactly what this method is for. The body that runs is:

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)
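The callable() check mentioned in this section, and the equivalence between calling an instance and calling its __call__ method, can be sketched as follows. Greeter and Plain are hypothetical classes made up for illustration and are not part of torchvision.

```python
class Greeter:
    """Hypothetical class whose instances are callable."""

    def __call__(self, name):
        return "hello " + name


class Plain:
    """Hypothetical class without __call__; its instances are not callable."""


greeter = Greeter()

print(callable(Greeter))   # True: a class itself is callable
print(callable(greeter))   # True: the instance defines __call__
print(callable(Plain()))   # False: no __call__ on the instance

# Calling the instance is equivalent to calling __call__ explicitly.
print(greeter("zhangsan") == greeter.__call__("zhangsan"))  # True
```

This is the same mechanism that lets tensor_trans(img) work on a ToTensor instance.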

2. to_tensor()

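In the statement F.to_tensor(pic), the pic argument can be a PIL image or a numpy array. F may look like a library we have not seen before; it comes from this statement in the transforms module: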

from . import functional as F

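The statement from . import functional as F is a relative import in Python.

from .: import from the current package, that is, the package containing the module that runs this statement (here, torchvision.transforms).
import functional as F: import the module named functional and bind it to the name F so that it is shorter to reference in code.

So this line imports the functional module from the current package under the name F, and from then on F gives access to that module's functions and variables. The to_tensor() function is the one that converts img into the Tensor data type: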

def to_tensor(pic) -> Tensor:
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    This function does not support torchscript.

    See :class:`~torchvision.transforms.ToTensor` for more details.

    Args:
        pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

    Returns:
        Tensor: Converted image.
    """
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(to_tensor)
    if not (F_pil._is_pil_image(pic) or _is_numpy(pic)):
        raise TypeError(f"pic should be PIL Image or ndarray. Got {type(pic)}")

    if _is_numpy(pic) and not _is_numpy_image(pic):
        raise ValueError(f"pic should be 2/3 dimensional. Got {pic.ndim} dimensions.")

    default_float_dtype = torch.get_default_dtype()

    if isinstance(pic, np.ndarray):
        # handle numpy array
        if pic.ndim == 2:
            pic = pic[:, :, None]

        img = torch.from_numpy(pic.transpose((2, 0, 1))).contiguous()
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            return img.to(dtype=default_float_dtype).div(255)
        else:
            return img

    if accimage is not None and isinstance(pic, accimage.Image):
        nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
        pic.copyto(nppic)
        return torch.from_numpy(nppic).to(dtype=default_float_dtype)

    # handle PIL Image
    mode_to_nptype = {"I": np.int32, "I;16" if sys.byteorder == "little" else "I;16B": np.int16, "F": np.float32}
    img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))

    if pic.mode == "1":
        img = 255 * img
    img = img.view(pic.size[1], pic.size[0], F_pil.get_image_num_channels(pic))
    # put it from HWC to CHW format
    img = img.permute((2, 0, 1)).contiguous()
    if isinstance(img, torch.ByteTensor):
        return img.to(dtype=default_float_dtype).div(255)
    else:
        return img

As the docstring shows, the pic argument can be a PIL Image or a numpy.ndarray, and the function returns a Tensor. Putting it together, a complete use of the ToTensor class looks like this:

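from PIL import Image
from torchvision import transforms

img_path = "D:\\pythonData\\ten\\face_image1.jpg"
img = Image.open(img_path)            # load the file as a PIL image
tensor_trans = transforms.ToTensor()  # create the transform object
print(tensor_trans)                   # prints the transform's repr
tensor_img = tensor_trans(img)        # invokes ToTensor.__call__
print(tensor_img)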

We can then write the image data to TensorBoard with the writer.add_image() method of the SummaryWriter class covered earlier.

2. The tensor data type

In the code above we called print(tensor_img) to print the tensor's data.

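Judging from the printed output alone, a tensor looks like a three-dimensional array. In deep learning, tensors are usually understood alongside vectors and matrices as data of different dimensionality: a vector is a first-order tensor and a matrix is a second-order tensor. In other words, a tensor is a multi-dimensional array that generalizes vectors and matrices to higher dimensions, and it shares some of their mathematical properties. Tensors become even more important once we start working with images, because images usually come as n-dimensional arrays.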
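The progression from vector to matrix to image tensor can be made concrete with a few small tensors. This is only an illustrative sketch: the shapes are arbitrary, and the names are hypothetical.

```python
import torch

scalar = torch.tensor(3.0)              # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])  # 1 dimension (first-order tensor)
matrix = torch.zeros(2, 3)              # 2 dimensions (second-order tensor)
image = torch.zeros(3, 4, 6)            # 3 dimensions: C x H x W, as ToTensor returns
batch = torch.zeros(8, 3, 4, 6)         # 4 dimensions: N x C x H x W

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("image", image), ("batch", batch)]:
    # ndim is the number of dimensions; shape lists the size along each one.
    print(name, t.ndim, tuple(t.shape))
```

A single image converted by ToTensor sits at three dimensions; stacking several images into a batch adds a fourth.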