Machine Learning | Replacing Fully Connected Neural Networks: Implementing a Sparse Network in PyTorch

Latest recommended article published on 2023-12-27 18:00:57

CHUNLIN GO

Categories: Machine Learning, Python Module. Tags: neural networks, pytorch, deep learning

Article link: https://blog.csdn.net/kuo_jun_lin/article/details/115552545

Keywords: machine learning / neural networks / sparsity

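Abstract:
Fully connected neural networks have been everywhere since their invention. They handle classification and regression tasks and effectively extend a machine's decision-making ability. However, a fully connected network has a very large number of parameters: the model needs disk space to store and a lot of memory to run. Sparse networks arose in response. Their key contribution is making a network lighter by removing some of its connections.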

Introduction

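A neural network here means a structured, layered network. Like any graph it is made of nodes and edges, but a node cannot connect to other nodes in its own layer, so diagrams of the architecture are usually drawn like this: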

Figure: a fully connected neural network

In a computer, this structure is implemented through matrix operations:

Figure: matrix operations in a neural network

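The dimensions of the matrices determine the number of nodes, while the edge weights are held in a separate matrix, the fully colored one in the figure. The "empty" boxes in the diagram may look puzzling at first. They are there because, in practice, several samples can be fed into the network at once. Following the colored direction in the figure, it is easy to see that 4 samples are fed in here. How many samples fit in one batch? That depends on how much memory the machine has. In theory, with enough memory, the entire dataset could be passed through the network in one go.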
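To make the batching idea concrete, here is a minimal sketch of one layer processing 4 samples with a single matrix product. The sizes (3 input nodes, 2 output nodes) are assumptions chosen for illustration, not values taken from the figure.

```python
import torch

torch.manual_seed(0)

batch = torch.randn(4, 3)    # 4 samples, 3 input nodes each (assumed sizes)
weight = torch.randn(2, 3)   # 2 output nodes, each connected to all 3 inputs
bias = torch.zeros(2)

# One matrix product pushes the whole batch through the layer.
output = batch @ weight.t() + bias
print(output.shape)  # torch.Size([4, 2])
```

Each row of output is the result for one sample, exactly as if the samples had been processed one at a time.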

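However, as a network's architecture grows more complex, its parameter count multiplies. Short of abandoning neural networks altogether, pruning is a good alternative: since all the connections together take up a lot of space, drop the ones that are not needed. Mathematically, this just means setting some entries of the weight matrix to 0. Because the idea is sound in theory, it can also be implemented in practice.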

The walkthrough and implementation below use the following modules:

import math
import numpy as np
import torch
import torch.nn as nn

math ships with Python, but torch and numpy are not built in and must be installed separately.

Zeroing Weights

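Here is a diagram to help readers picture a neural network after some of its connections have been pruned:

Figure: the network after pruning

As mentioned above, a connection is cut by zeroing the corresponding entry of the weight matrix. To make connections easier to control, and to make sure a cut connection can be restored later, we do not zero the entries of the weight matrix directly. Instead, a mask of exactly the same shape controls whether each connection is on or off:

Figure: a mask over the weights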
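The mask idea can be sketched in a few lines. The numbers below are made up for illustration; the point is only that the dense weights stay untouched while the mask decides which ones take effect.

```python
import torch

# Illustrative values, not taken from the article's figure.
weight = torch.tensor([[0.5, -1.0, 2.0],
                       [1.5, 0.3, -0.7]])
mask = torch.tensor([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0]])

# Element-wise product: a 0 in the mask cuts the connection.
effective = weight * mask

# The original value is still stored, so flipping the mask entry back
# to 1 restores the connection.
mask[0, 1] = 1.0
restored = weight * mask
```

Here effective[0, 1] is zero, while restored[0, 1] is back to the stored value of -1.0.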

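[ Difference from Dropout ]
Experienced readers may wonder how this differs from the familiar dropout technique. Don't both cut connections to simplify the network? Not quite. A pruned connection is cut for good: the mask is fixed, so even the backpropagated gradients cannot update the masked weight, and it is removed from training entirely. Dropout, by contrast, only switches units off temporarily during the forward pass while training. A new random subset is dropped at every step, so over the course of training the gradients still update every parameter, including those that were dropped on earlier steps.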
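The contrast can be checked with a small experiment. This is a sketch under assumed shapes and a hand-written mask; it relies on plain autograd and nn.Dropout rather than on the custom layer built later in this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(8, 4)
weight = torch.randn(3, 4, requires_grad=True)
mask = torch.tensor([[1., 0., 1., 0.],
                     [0., 1., 1., 0.],
                     [1., 1., 0., 0.]])

# Pruning: the same fixed mask is applied on every step, so the masked
# entries receive exactly zero gradient.
loss = (x @ (weight * mask).t()).pow(2).sum()
loss.backward()
masked_grads_are_zero = bool(torch.all(weight.grad[mask == 0] == 0))

# Dropout: a fresh random pattern is drawn on every call in training
# mode, and the layer does nothing in evaluation mode.
drop = nn.Dropout(p=0.5)
drop.train()
first = drop(torch.ones(1, 1000))
second = drop(torch.ones(1, 1000))
drop.eval()
at_eval = drop(torch.ones(1, 1000))
```

The two training-mode calls zero out different positions, while at_eval comes back unchanged.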

Implementing a Sparse Neural Network

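After an exhaustive search, the author found that popular frameworks such as PyTorch and TensorFlow only began offering built-in pruning functions around the end of 2019. They are an improvement, but still somewhat limited. To prune while keeping GPU acceleration, we will modify some PyTorch source code and build our own custom function.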

P.S. For more detail, see the official documentation.
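For comparison, the built-in route looks roughly like this. It is a minimal sketch using torch.nn.utils.prune.random_unstructured on a small nn.Linear; the layer sizes and the 50% amount are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(8, 4)  # assumed sizes: 8 inputs, 4 outputs

# Randomly prune 50% of the entries of layer.weight.
prune.random_unstructured(layer, name="weight", amount=0.5)

# After pruning, the dense values live in the parameter `weight_orig`,
# the 0/1 mask lives in the buffer `weight_mask`, and `weight` is
# recomputed as their product.
n_pruned = int((layer.weight_mask == 0).sum())
param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]
```

Note that this utility follows the same mask-based design described above: the original weights are kept and a separate mask decides which ones are active.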

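Since connections are switched on and off with 0s and 1s, we need a function that generates a mask of the required size. Below is an example that generates one at random; other generation schemes are easy to define yourself.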

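def gen_mask(row, col, percent=0.5, num_zeros=None):
    """Return a random (row, col) array of 0s and 1s; 0 marks a cut connection."""
    if num_zeros is None:
        # By default (percent=0.5), half of the entries are masked.
        num_zeros = int((row * col) * percent)

    mask = np.hstack([
        np.zeros(num_zeros),
        np.ones(row * col - num_zeros)])

    # Shuffle in place so the zeros land at random positions.
    np.random.shuffle(mask)
    return mask.reshape(row, col)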

The number of masked weights can be given either as an exact count, num_zeros, or as a fraction of the total, percent.
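Both ways of calling it can be tried out as follows. The function is repeated so the snippet runs on its own; the 4 x 5 shape and the seed are arbitrary choices for illustration.

```python
import numpy as np


def gen_mask(row, col, percent=0.5, num_zeros=None):
    # Same logic as the function above, repeated to keep this runnable.
    if num_zeros is None:
        num_zeros = int((row * col) * percent)
    mask = np.hstack([np.zeros(num_zeros), np.ones(row * col - num_zeros)])
    np.random.shuffle(mask)
    return mask.reshape(row, col)


np.random.seed(0)  # arbitrary seed, only for repeatability

by_ratio = gen_mask(4, 5, percent=0.25)  # 25% of 20 entries -> 5 zeros
by_count = gen_mask(4, 5, num_zeros=3)   # exactly 3 zeros

print(int((by_ratio == 0).sum()), int((by_count == 0).sum()))  # 5 3
```

When num_zeros is given it takes precedence, and percent is ignored.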

PyTorch's Fully Connected Function

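This function subclasses the automatic differentiation class torch.autograd.Function. The goal is for the mask to take part in both the forward and the backward pass, so the modified function takes an extra mask argument: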

class LinearFunction(torch.autograd.Function):
    """
    autograd function which masks its weights by 'mask'.
    """

    # Note that both forward and backward are @staticmethods.
    # bias and mask are optional arguments.
    @staticmethod
    def forward(ctx, input, weight, bias=None, mask=None):
        if mask is not None:
            # change weight to 0 where mask == 0
            weight = weight * mask

        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)

        ctx.save_for_backward(input, weight, bias, mask)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias, mask = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = grad_mask = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)

        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
            if mask is not None:
                # change grad_weight to 0 where mask == 0
                grad_weight = grad_weight * mask

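        if bias is not None and ctx.needs_input_grad[2]:
            # Summing over the batch dimension already gives bias's shape;
            # an extra squeeze(0) would drop that dimension when there is
            # only one output feature.
            grad_bias = grad_output.sum(0)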

        return grad_input, grad_weight, grad_bias, grad_mask

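This is only the first step in building a custom PyTorch function, though. The main part of the code comes next: a custom class that subclasses nn.Module, which gives it forward and the rest of PyTorch's basic machinery. After all this effort, it is time to give our method a good name!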

class CustomizedLinear(nn.Module):
    def __init__(self, input_features, output_features, bias=True, mask=None):
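        """
        Arguments
        ------------------
        input_features [int]:
            number of input features.
        output_features [int]:
            number of output features.
        bias [bool]:
            whether to add a learnable bias.
        mask [numpy.array]:
            the shape is (input_features, output_features).
            the elements are 0 or 1, which mark a connection as
            un-connected or connected.
        """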
        super(CustomizedLinear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute.
        self.weight = nn.Parameter(torch.empty(
            self.output_features, self.input_features))

        if bias:
            self.bias = nn.Parameter(
                torch.empty(self.output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Initialize the above parameters (weight & bias).
        self.init_params()

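        if mask is not None:
            # Transpose to (output_features, input_features) to match weight.
            mask = torch.tensor(mask, dtype=torch.float).t()
            # Stored as a parameter that does not require gradients, so it
            # moves with the module (e.g. to the GPU) but is never trained.
            self.mask = nn.Parameter(mask, requires_grad=False)
        else:
            self.register_parameter('mask', None)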

    def init_params(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(
            input, self.weight, self.bias, self.mask)

    def extra_repr(self):
        # (Optional)Set the extra information about this module. You can test
        # it by printing an object of this class.
        return 'input_features={}, output_features={}, bias={}, mask={}'.format(
            self.input_features, self.output_features,
            self.bias is not None, self.mask is not None)

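In the __init__ of CustomizedLinear, pay particular attention to the order in which parameters are registered. In the author's experience, registering mask before weight or bias is a recipe for bugs. Once the parameters are defined and registered, remember to initialize their values from some distribution. After all these changes we have a custom layer that accepts a mask and, during backward, avoids updating the weights that have been cut.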
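As a sanity check on the idea, here is a much shorter variant that reaches the same effect by a different route: instead of a hand-written backward, it multiplies the weight by the mask inside forward and lets autograd derive the masked gradient. The class name MaskedLinear, the sizes, and the use of register_buffer are assumptions of this sketch, not part of the code above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Hypothetical compact variant that relies on autograd for the gradient."""

    def __init__(self, in_features, out_features, mask):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # The mask arrives as (in_features, out_features), as in the post,
        # and is transposed to match the weight's layout.
        mask = torch.as_tensor(mask, dtype=torch.float).t().contiguous()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


torch.manual_seed(0)
mask = (torch.rand(6, 3) > 0.5).float()  # random 0/1 mask, assumed sizes
layer = MaskedLinear(6, 3, mask)
before = layer.linear.weight.detach().clone()

# One plain SGD step (no weight decay, no momentum).
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = layer(torch.randn(5, 6)).pow(2).mean()
loss.backward()
opt.step()

pruned = layer.mask == 0
pruned_unchanged = torch.equal(layer.linear.weight[pruned], before[pruned])
active_changed = not torch.equal(layer.linear.weight[~pruned], before[~pruned])
```

With plain SGD the cut weights keep their stored values while the active ones move. An optimizer setting such as weight decay would still change the stored values of cut weights, although the mask keeps them out of the forward pass either way.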

Using It in a Real Model

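Using the custom masked sparse layer in practice is straightforward. As long as the parameter order, the initialization, and the backward pass are set up correctly, CustomizedLinear can fully replace nn.Linear. Without further ado, here is the code: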

class Network(nn.Module):
    def __init__(self, in_size, out_size, ratio=(0, 0.5, 0)):
        # ratio[i] is the fraction of connections cut in the i-th layer.
        super(Network, self).__init__()
        # self.fc1 = nn.Linear(in_size, 32)
        self.fc1 = CustomizedLinear(
            in_size, 32, mask=gen_mask(in_size, 32, ratio[0]))
        self.bn1 = nn.BatchNorm1d(32)
        # self.fc2 = nn.Linear(32, 16)
        self.fc2 = CustomizedLinear(
            32, 16, mask=gen_mask(32, 16, ratio[1]))
        self.bn2 = nn.BatchNorm1d(16)
        # self.fc3 = nn.Linear(16, out_size)
        self.fc3 = CustomizedLinear(
            16, out_size, mask=gen_mask(16, out_size, ratio[2]))
        self.bn3 = nn.BatchNorm1d(out_size)
        self.relu = nn.ReLU()

        for m in self.modules():
            # CustomizedLinear does not subclass nn.Linear, so this branch
            # only applies if the nn.Linear layers above are restored; the
            # masked layers keep the values set by init_params().
            if isinstance(m, nn.Linear):
                nn.init.uniform_(m.weight, a=0, b=1)
            elif isinstance(m, (nn.BatchNorm1d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.fc3(x)
        x = self.bn3(x)
        x = self.relu(x)
        return x
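To get a feel for how sparse this network is with the default ratio, the connection counts can be worked out by hand. The helper below is a hypothetical illustration that mirrors the layer shapes above and the rounding used in gen_mask; in_size=10 and out_size=2 are assumed values.

```python
def active_connections(in_size, out_size, ratio=(0, 0.5, 0)):
    """Count total and remaining weights for the three layers of Network."""
    shapes = [(in_size, 32), (32, 16), (16, out_size)]
    total = 0
    active = 0
    for (rows, cols), r in zip(shapes, ratio):
        n = rows * cols
        zeros = int(n * r)  # same rounding as gen_mask
        total += n
        active += n - zeros
    return total, active


total, active = active_connections(10, 2)  # assumed in_size and out_size
print(total, active)  # 864 608
```

Only the middle layer is pruned by default, so 256 of the 864 weights are cut. Keep in mind that the mask here is stored as a dense tensor, so the layer itself does not shrink in memory; realizing the savings would need a sparse storage format on top of this.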