[Building a Deep Learning Framework by Hand: Dev Diary] The Basic Tensor Data Structure

Auzdora.

Tags: deep learning, data structures, pytorch, artificial intelligence, python

Article link: https://blog.csdn.net/MelroseLbt/article/details/126606696

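My major is Intelligent Science and Technology, which touches on both hardware and software. Deep learning has taken up a large share of my studies. I have learned some of the theory behind machine learning and neural networks, but relying only on other people's mature frameworks always felt like something was missing. So, purely as a learning exercise, I spent a month or two writing my first deep learning framework, MetaFlow, with Numpy and Python. It implemented Tensor, backpropagation, automatic-differentiation operators, Dataset, DataLoader and other modules, and was used much like Pytorch. For various study-related reasons it was shelved after only fully connected networks had been wrapped. MetaFlow also had another shortcoming: no GPU acceleration. CUDA programming is something I have been looking forward to learning as well, which gave me the idea of writing a simple deep learning framework with a CUDA/C back end and a Python front end. The goal this time is mainly to implement fully connected and convolution operations behind a Pytorch-like interface, with support for batch training, model saving and so on.
This blog series is my personal study log. It also shares some of the implementation approaches and doubles as reference documentation for the project.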

Contents

1. Defining the data structures
    - 1.1 CPU and GPU data structures
    - 1.2 Defining Tensor
2. Usage examples

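1. Defining the data structures

The tensor (Tensor) is the most important data structure in any deep learning framework. It needs to provide the following:

Support operations on high-dimensional matrices
Record gradient values, with a switch for whether a gradient is required
Act as a node in the computation graph that records its parent and child nodes
Record which operators were applied to produce the node
Create data on, and convert it between, the CPU (host) and the GPU (device)
Support the backpropagation algorithm

The points that need attention here are how data is converted between the CPU and the GPU and how its format is defined.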

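1.1 CPU and GPU data structures

I use two basic data types: the Numpy array (for CPU-side computation) and Quark (for GPU-side computation). Tensor acts more as a manager that records and allocates resources; the actual computation is carried out on these two underlying data structures. Quark is a struct defined in the back end. The project is called Neutron, so for thematic consistency the back-end data structure was named after the quark. Its implementation is as follows: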

extern "C"{
    typedef enum{
        CPU = 0,
        GPU = 1
    }Device;

    typedef struct{
        float* data;
        Device device;
        int* shape;
        int dim;
    }Quark;
}

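This code is defined in the array.h header. The struct holds a float pointer to the data; a Device enum that says whether the data is computed on the CPU or the GPU; shape, an int pointer to an array that records the tensor's shape; and dim, which records the number of dimensions. These are all important parameters in CUDA programs.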

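To make the C/C++ API callable from Python, I use Python's built-in ctypes library. On the Python side, loading the dynamic link library compiled from the CUDA/C++ code makes the back-end data structures and API available. Mixed-language programming and interface calls will be covered in a separate post (not yet written).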

The Quark struct is mapped into Python as follows:

import ctypes

class Quark(ctypes.Structure):
    """
        C++ back-end data structure. Contains the data pointer (the numpy dtype
        has to be float32, otherwise calculations will go wrong), the device,
        the data shape pointer and the dimension.
    """
    # field order and types must match the Quark struct in array.h
    _fields_ = [('data', ctypes.POINTER(ctypes.c_float)),
                ('device', ctypes.c_int),
                ('shape', ctypes.POINTER(ctypes.c_int)),
                ('dim', ctypes.c_int)]

Data computed on the CPU is simple: np.array() is all it takes, and shape, dim and so on are readily available.
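To make the mapping concrete, here is a minimal host-only sketch of filling a Quark from a numpy array and reading it back as an array. It repeats the struct definition so that it runs on its own. The constants CPU = 0 and GPU = 1 are assumed from the C enum, and no back-end library is involved, so this is only an illustration of the ctypes mechanics, not the project's actual code path.

```python
import ctypes
import numpy as np

CPU, GPU = 0, 1  # assumed to mirror the Device enum in array.h

class Quark(ctypes.Structure):
    _fields_ = [('data', ctypes.POINTER(ctypes.c_float)),
                ('device', ctypes.c_int),
                ('shape', ctypes.POINTER(ctypes.c_int)),
                ('dim', ctypes.c_int)]

src = np.arange(6, dtype=np.float32).reshape(2, 3)
shape_buf = (ctypes.c_int * src.ndim)(*src.shape)  # C int array holding the shape

q = Quark()
q.data = src.ctypes.data_as(ctypes.POINTER(ctypes.c_float))  # borrows src's buffer
q.device = CPU
q.shape = shape_buf  # ctypes accepts a compatible array for a pointer field
q.dim = src.ndim

# Rebuild a numpy view from the raw pointer and the recorded shape.
shape = tuple(q.shape[i] for i in range(q.dim))
view = np.ctypeslib.as_array(q.data, shape=shape)
print(view)
```

The view shares memory with src, so src (and shape_buf) must stay alive for as long as the Quark is in use.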

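1.2 Defining Tensor

As mentioned earlier, Tensor's role in the framework is not computation but resource scheduling and management. It therefore has to hold data from both sides (CPU and GPU) and switch between them seamlessly and without losing information. It also has to implement gradient recording, parent and child node recording, the backpropagation algorithm and so on. Getting data onto the GPU does of course require some CUDA code; this entry only records the implementation logic, and the CUDA code behind it will be explained in a separate entry (not yet written).

First define a class, Tensor, whose attributes follow from the requirements above: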

class Tensor:
    """
        Python front-end data structure.
        The most important attr is handle. Handle is a pointer to the real data
        structure. It manages the GPU data structure (Quark) and the CPU data
        structure (numpy).

        When you instantiate the Tensor, you need to give parameters as follows:
        1. data: numpy array, dtype is np.float32.
        2. device: on cpu or on gpu.
        3. require_grad: whether a gradient needs to be calculated.
    """

    def __init__(self, data, device=CPU, require_grad=False):
        self.children = []   # child nodes in the computation graph
        self.father = []     # parent nodes in the computation graph
        self.op = None       # operator that produced this node
        self.grad = None
        self.device = device
        self.require_grad = require_grad
        self.handle = self.configureHandle(self, data, device)

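The attributes are the parent and child node lists, the op operator, the gradient, the device (CPU/GPU), the gradient flag, and the data-structure handle. Which kind of handle gets created is decided by the device argument.

This involves the class function self.configureHandle(), implemented as follows: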

    # configure the handle attribute
    # (declared static but takes the instance explicitly, matching the call in __init__)
    @staticmethod
    def configureHandle(self, data, device):
        if isinstance(data, tuple):
            # a tuple is treated as a shape: fill it with uniform random values in [0, 1)
            data = np.random.random(data)
        if device == GPU:
            return self.getQuarkHandle(data.astype(np.float32))
        elif device == CPU:
            return self.getNumpyHandle(data.astype(np.float32))
        raise ValueError("device should be CPU or GPU")

    # get the Quark data structure handle
    @staticmethod
    def getQuarkHandle(numpy_data):
        assert isinstance(numpy_data, ndarray), "input data should be numpy array"
        # the raw pointer taken below is only meaningful for a C-contiguous buffer
        data = np.ascontiguousarray(numpy_data)
        arr = Quark()
        arr.data = data.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
        arr.device = GPU
        arr.shape = getShape(ctypes.c_int, data.shape)
        arr.dim = len(data.shape)

        # start to allocate and copy data to GPU
        size = CUDALib.getSize(arr.dim, arr.shape)
        dev_ptr = CUDALib.AllocateDeviceData(size)
        CUDALib.CopyDataFromTo(arr.data, dev_ptr, CPU, GPU, size)
        arr.data = dev_ptr  # from here on the Quark points at device memory
        return arr

    # get the numpy data structure handle
    @staticmethod
    def getNumpyHandle(numpy_data):
        assert isinstance(numpy_data, ndarray), "input data should be numpy array"
        return numpy_data

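configureHandle() first checks whether computation should happen on the CPU or the GPU and dispatches to getNumpyHandle() or getQuarkHandle() accordingly. The former is trivial: it returns the numpy array. getQuarkHandle() first extracts the needed information from the numpy data and instantiates a Quark, then uses that information to call the back-end CUDA memory allocation code and loads the data directly into GPU memory. Finally it returns the Quark instance.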
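The helpers getShape() and CUDALib.getSize() are not shown in this post. As a rough idea of what they might do, here is a sketch under two assumptions: that getShape() packs a Python shape tuple into a C array of the given type, and that getSize() returns the number of elements (it could equally return a byte count in the real back end). The names get_shape and get_size are hypothetical stand-ins, written in pure Python.

```python
import ctypes

def get_shape(c_type, shape):
    """Hypothetical stand-in for getShape(): pack a shape tuple into a C array."""
    return (c_type * len(shape))(*shape)

def get_size(dim, shape):
    """Hypothetical stand-in for getSize(): product of the first `dim` extents."""
    size = 1
    for i in range(dim):
        size *= shape[i]
    return size

c_shape = get_shape(ctypes.c_int, (1, 64, 64))
n_elements = get_size(len(c_shape), c_shape)
n_bytes = n_elements * ctypes.sizeof(ctypes.c_float)
print(n_elements, n_bytes)
```

A ctypes array built this way can be assigned directly to the shape field of Quark, since ctypes accepts a compatible array where a pointer field is declared.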

To make inspection and debugging easier, some accessors for the data are needed.

    @property
    def shape(self):  # get data shape
        if isinstance(self.handle, ndarray):
            return self.handle.shape
        return tuple(self.handle.shape[idx] for idx in range(self.handle.dim))

    @property
    def data(self):  # get data as a numpy array (the data must be in host memory)
        if isinstance(self.handle, ndarray):
            return self.handle
        # Quark handle: only valid after cpu() has copied the data back to the host
        return np.ctypeslib.as_array(self.handle.data, shape=self.shape)

    def __str__(self):
        return "Tensor({}, shape={}, dtype=Tensor.float32)".format(self.data, self.shape)

As in Pytorch, moving a Tensor's data only takes xx.cpu() or xx.gpu(). The code is as follows:

    # transfer the data from the gpu to the cpu
    def cpu(self):
        # the handle.device check stops a second call from treating a host
        # pointer as a device pointer
        if self.device == GPU and self.handle.device == GPU:
            size = CUDALib.getSize(self.handle.dim, self.handle.shape)
            host_ptr = CUDALib.AllocateHostData(size)
            CUDALib.CopyDataFromTo(self.handle.data, host_ptr, GPU, CPU, size)
            self.handle.data = host_ptr
            self.handle.device = CPU  # the Quark now points at host memory
        return self

    # transfer the data from the cpu to the gpu
    def gpu(self):
        if self.device == CPU and isinstance(self.handle, ndarray):
            self.handle = self.getQuarkHandle(self.handle)
            self.device = GPU
        return self

gpu() simply calls getQuarkHandle() to create a Quark on the GPU. cpu() uses the CUDA API to move the data from GPU memory to the CPU.
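Both directions follow the same pattern: allocate a buffer on the destination side, copy the elements across, then point the handle at the new buffer. The sketch below imitates that pattern on the host only. The functions allocate() and copy() are hypothetical stand-ins for the back-end allocation and copy calls, the "device" buffer is just a second block of host memory, and the size is assumed to be an element count.

```python
import ctypes
import numpy as np

FLOAT_P = ctypes.POINTER(ctypes.c_float)

def allocate(size):
    """Stand-in allocator: returns the owning buffer and a float pointer to it."""
    buf = (ctypes.c_float * size)()
    return buf, ctypes.cast(buf, FLOAT_P)

def copy(src, dst, size):
    """Stand-in for a host/device copy of `size` floats."""
    ctypes.memmove(dst, src, size * ctypes.sizeof(ctypes.c_float))

host = np.arange(12, dtype=np.float32).reshape(3, 4)
size = host.size

# "gpu()": allocate on the destination side, copy in, keep the new pointer.
dev_buf, dev_ptr = allocate(size)
copy(host.ctypes.data_as(FLOAT_P), dev_ptr, size)

# "cpu()": allocate a fresh host buffer and copy the data back.
back_buf, back_ptr = allocate(size)
copy(dev_ptr, back_ptr, size)

restored = np.ctypeslib.as_array(back_ptr, shape=host.shape)
print(restored)
```

Because each step copies, the restored array is independent of the original: later changes to host do not show up in restored.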

Backpropagation and the remaining methods will be explained in other posts (not yet written).

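2. Usage examples

To use Tensor, you can create a numpy array in advance; the data is expected to be of type np.float32. This is because Quark is defined with float data in the back end, and with any other data type the GPU would produce garbage results (if you forget, it does not matter: the code automatically converts the type to float32).
First, create an array of any dimensionality with numpy, then wrap it in a Tensor.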

x = np.ones((64, 64))
xt = Tensor(x, CPU, require_grad=False)
print(xt)

# console results
Tensor([[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]       
 [1. 1. 1. ... 1. 1. 1.]       
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(64, 64), dtype=Tensor.float32)
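On the float32 requirement: the short host-only sketch below shows why the conversion matters. It reads the bytes of a float64 array through a float pointer, which is what the back end would effectively do if the dtype were left unconverted, and compares that with the result after astype(np.float32). Nothing here touches the GPU; it only illustrates the reinterpretation problem.

```python
import ctypes
import numpy as np

FLOAT_P = ctypes.POINTER(ctypes.c_float)

x64 = np.ones(4)  # numpy defaults to float64

# Wrong: hand the float64 buffer to code that expects 32-bit floats.
wrong = np.ctypeslib.as_array(x64.ctypes.data_as(FLOAT_P), shape=(4,)).copy()

# Right: convert first, as configureHandle() does with astype(np.float32).
x32 = x64.astype(np.float32)
right = np.ctypeslib.as_array(x32.ctypes.data_as(FLOAT_P), shape=(4,)).copy()

print("reinterpreted float64 bytes:", wrong)
print("converted to float32 first: ", right)
```

The first array no longer contains ones, because each 8-byte double is read as two unrelated 4-byte floats.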

Calling cpu() and gpu() moves the Tensor between devices.

xt.cpu()
xt.gpu()

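Note that to print the data, you must first call cpu() to move it to the CPU.
Passing a tuple instead also works: Tensor automatically creates a random tensor of that shape. Since the code calls np.random.random(), the values are drawn uniformly from [0, 1), not from a normal distribution.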

xt = Tensor((1, 64, 64), CPU, require_grad=False)
print(xt)

# console results
Tensor([[[0.43749017 0.29031968 0.8365907  ... 0.61214393 0.44423762 0.03210686]
  [0.6642815  0.7885864  0.6017005  ... 0.28682867 0.49431917 0.64389694]
  [0.02547996 0.5165705  0.711713   ... 0.33360547 0.13552403 0.6047031 ]
  ...
  [0.5312942  0.13073258 0.39996797 ... 0.3393874  0.38398758 0.81480604]
  [0.08465459 0.855784   0.6820476  ... 0.10212806 0.11926474 0.6199378 ]
  [0.92551076 0.92917097 0.8674459  ... 0.34977752 0.55820996 0.50206757]]], shape=(1, 64, 64), dtype=Tensor.float32)
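The printed values all lie between 0 and 1, which is what np.random.random() produces. For comparison, this small sketch draws the same shape from both a uniform and a standard normal distribution. Swapping in np.random.standard_normal() is only a suggestion for anyone who wants normally distributed initial values; it is not what the Tensor code above does.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the comparison is repeatable
shape = (1, 64, 64)

# What Tensor((1, 64, 64), ...) generates: uniform samples from [0, 1),
# cast to float32 afterwards.
uniform = np.random.random(shape).astype(np.float32)

# A possible alternative: standard normal samples (mean 0, standard deviation 1).
normal = np.random.standard_normal(shape).astype(np.float32)

print("uniform min/max:", uniform.min(), uniform.max())
print("normal  min/max:", normal.min(), normal.max())
```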