Keras Source Code Analysis: Layer

Introduction

This article analyzes the code in base_layer.py, the file containing the most important class in Keras: Layer. Every layer in Keras is a subclass of Layer. class Sequential(Model) inherits from the Model class in keras/engine/training.py; Model in turn inherits from the Container class in keras/engine/topology.py in the same directory; and Container inherits from the Layer class in that same file. Layer is therefore the foundation of Keras, carrying the entire framework.

Prerequisites

Before we get into the source code, some background is in order. Many people working in AI may not know Python all that deeply, and without this background the source can be hard to follow.

Decorators

Decorators modify a function's behavior and make code more concise. The Layer class uses @property plus one custom decorator. @property works like the getter/setter methods on a class field in Java, as the following source shows.

The getter: just add @property

@property
def built(self):
    return self._built

The setter: add @<property name>.setter

@built.setter
def built(self, value):
    self._built = value

The other decorator, @interfaces.legacy_add_weight_support, is custom; it mainly keeps Keras 2 code backward compatible with Keras 1 APIs, so I won't go into it here.

Magic methods

Python classes can define a magic method named __call__. An instance of a class that implements it can be invoked directly, using the instance name as if it were a function. This is why we can chain layers together like this:

inputs = Input(shape=(100,))  # note the trailing comma: shape must be a tuple
x = Dense(64)(inputs)
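
To make this concrete, here is a toy class (nothing to do with Keras) that implements __call__; its instances can then be invoked like functions, which is exactly the pattern behind Dense(64)(inputs):

class Greeter(object):
    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        # Runs when the instance itself is called, e.g. hello('keras').
        return self.greeting + ', ' + name

hello = Greeter('hello')   # construct the instance (like Dense(64))
print(hello('keras'))      # call it like a function -> 'hello, keras'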

Source Code Analysis

Now let's dig into the Layer source itself. I've excerpted what I consider the most important parts; for the remaining details, go read the source directly. We start with the constructor:

 def __init__(self, **kwargs):
     self.input_spec = None
     self.supports_masking = False
     self.stateful = False

     # These properties will be set upon call of self.build()
     self._trainable_weights = []
     self._non_trainable_weights = []
     self._losses = []
     self._updates = []
     self._per_input_losses = {}
     self._per_input_updates = {}
     self._built = False

     # These lists will be filled via successive calls
     # to self._add_inbound_node().
     self._inbound_nodes = []
     self._outbound_nodes = []

     # These properties should be set by the user via keyword arguments.
     # note that 'dtype', 'input_shape' and 'batch_input_shape'
     # are only applicable to input layers: do not pass these keywords
     # to non-input layers.
     allowed_kwargs = {'input_shape',
                       'batch_input_shape',
                       'batch_size',
                       'dtype',
                       'name',
                       'trainable',
                       'weights',
                       'input_dtype',  # legacy
                       }
     for kwarg in kwargs:
         if kwarg not in allowed_kwargs:
             raise TypeError('Keyword argument not understood:', kwarg)
     name = kwargs.get('name')
     if not name:
         prefix = self.__class__.__name__
         name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
     self.name = name

     self.trainable = kwargs.get('trainable', True)
     if 'input_shape' in kwargs or 'batch_input_shape' in kwargs:
         # In this case we will later create an input layer
         # to insert before the current layer
         if 'batch_input_shape' in kwargs:
             batch_input_shape = tuple(kwargs['batch_input_shape'])
         elif 'input_shape' in kwargs:
             batch_size = kwargs.get('batch_size')
             batch_input_shape = (
                 batch_size,) + tuple(kwargs['input_shape'])
         self.batch_input_shape = batch_input_shape

     # Set dtype.
     dtype = kwargs.get('dtype')
     if dtype is None:
         dtype = kwargs.get('input_dtype')
     if dtype is None:
         dtype = K.floatx()
     self.dtype = dtype

     self._initial_weights = kwargs.get('weights')

The constructor mainly initializes parameters and assigns a few member variables. The accepted keyword arguments are those in allowed_kwargs:

  • input_shape: the input shape, without the batch dimension
  • batch_input_shape: the input shape including the batch size
  • batch_size: the batch size
  • dtype: the data type
  • name: the layer's name
  • trainable: whether the layer's weights are trainable
  • weights: initial weights to load into the layer
  • input_dtype: the input data type (legacy)

Of these, dtype, input_shape and batch_input_shape are only meant for input layers; do not pass them to other layers.
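
As a quick sketch of how these keyword arguments are used in practice (the layer name here is made up; Dense forwards unrecognized kwargs down to Layer.__init__):

from keras.layers import Dense

# input_shape is only valid on the first layer of a model; internally
# Layer.__init__ turns it into batch_input_shape = (None, 100).
layer = Dense(64, input_shape=(100,), name='my_dense', trainable=True)
print(layer.batch_input_shape)  # (None, 100)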

Next, let's look at the add_weight method, which registers the current layer's weights.

def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = self.dtype
    weight = K.variable(initializer(shape, dtype=dtype),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight

K.variable() here boils down to a call to tf.Variable() (on the TensorFlow backend). The method then checks whether a regularizer was supplied; if so, add_loss records the resulting regularization loss on the layer. Finally, depending on the trainable flag, the weight is appended to either the trainable or the non-trainable weight list.
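
As a usage sketch, here is a minimal custom layer whose build() calls add_weight; the import path follows the official Keras 2 custom-layer guide, and MyDense is a made-up name:

from keras import backend as K
from keras.layers import Layer

class MyDense(Layer):
    """A minimal Dense-like layer, for illustration only."""
    def __init__(self, units, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # add_weight registers the variable in self._trainable_weights.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.units),
                                      initializer='glorot_uniform',
                                      trainable=True)
        super(MyDense, self).build(input_shape)  # sets self.built = True

    def call(self, inputs):
        return K.dot(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.units,)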

Now for the most important part: the magic method __call__. It is fairly long, so we'll go through it in several chunks.

def __call__(self, inputs, **kwargs):
    if isinstance(inputs, list):
        inputs = inputs[:]
    with K.name_scope(self.name):
        # Handle laying building (weight creating, input spec locking).
        if not self.built:
            # Raise exceptions in case the input is not compatible
            # with the input_spec specified in the layer constructor.
            self.assert_input_compatibility(inputs)

            # Collect input shapes to build layer.
            input_shapes = []
            for x_elem in to_list(inputs):
                if hasattr(x_elem, '_keras_shape'):
                    input_shapes.append(x_elem._keras_shape)
                elif hasattr(K, 'int_shape'):
                    input_shapes.append(K.int_shape(x_elem))
                else:
                    raise ValueError('You tried to call layer "' +
                                     self.name +
                                     '". This layer has no information'
                                     ' about its expected input shape, '
                                     'and thus cannot be built. '
                                     'You can build it manually via: '
                                     '`layer.build(batch_input_shape)`')
            self.build(unpack_singleton(input_shapes))
            self.built = True

            # Load weights that were specified at layer instantiation.
            if self._initial_weights is not None:
                self.set_weights(self._initial_weights)

First, a name scope is created from the name set in the constructor, which groups the layer's ops under its name in the backend graph. Then built is checked: if it is False, build() has not run yet, so the layer first verifies that the inputs are compatible, collects their shapes, and calls build(), which constructs the layer's weights and internal structure. Once build() finishes, built is set to True, and any initial weights passed at instantiation are loaded via set_weights(). When a layer is reused, built is already True, so build() does not run again.
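
A small sketch of that reuse behavior:

from keras.layers import Input, Dense

shared = Dense(64)
a = shared(Input(shape=(32,)))  # first call: build() runs, built becomes True
b = shared(Input(shape=(32,)))  # built is already True: build() is skipped,
                                # so both calls share the same weights
assert shared.built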

        # Raise exceptions in case the input is not compatible
        # with the input_spec set at build time.
        self.assert_input_compatibility(inputs)

        # Handle mask propagation.
        previous_mask = _collect_previous_mask(inputs)
        user_kwargs = kwargs.copy()
        if not is_all_none(previous_mask):
            # The previous layer generated a mask.
            if has_arg(self.call, 'mask'):
                if 'mask' not in kwargs:
                    # If mask is explicitly passed to __call__,
                    # we should override the default mask.
                    kwargs['mask'] = previous_mask
        # Handle automatic shape inference (only useful for Theano).
        input_shape = _collect_input_shape(inputs)

        # Actually call the layer,
        # collecting output(s), mask(s), and shape(s).
        output = self.call(inputs, **kwargs)
        output_mask = self.compute_mask(inputs, previous_mask)

        # If the layer returns tensors from its inputs, unmodified,
        # we copy them to avoid loss of tensor metadata.
        output_ls = to_list(output)
        inputs_ls = to_list(inputs)
        output_ls_copy = []
        for x in output_ls:
            if x in inputs_ls:
                x = K.identity(x)
            output_ls_copy.append(x)
        output = unpack_singleton(output_ls_copy)

This part feeds the inputs through the layer to produce the outputs. It first re-checks that the inputs are compatible, then collects any mask from the inputs (masks will be covered in a later article) and, if one exists, passes it along to call(). call() is where the layer's actual tensor transformation happens; afterwards compute_mask() updates the mask. Finally, the outputs are compared against the inputs: any output tensor that is literally one of the input tensors is copied via K.identity(), so that no tensor metadata is lost.
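
Put differently, subclasses only override call() (and optionally compute_mask()), while __call__ does the surrounding bookkeeping. A minimal, hypothetical layer:

from keras.layers import Layer

class Scale(Layer):
    """Hypothetical layer: multiplies its input by a constant factor."""
    def __init__(self, factor, **kwargs):
        super(Scale, self).__init__(**kwargs)
        self.factor = factor
        self.supports_masking = True  # let any incoming mask pass through

    def call(self, inputs, mask=None):
        # Only the tensor transformation lives here; __call__ handles
        # masks, shapes, and node bookkeeping around it.
        return inputs * self.factor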

        # Inferring the output shape is only relevant for Theano.
        if all([s is not None
                for s in to_list(input_shape)]):
            output_shape = self.compute_output_shape(input_shape)
        else:
            if isinstance(input_shape, list):
                output_shape = [None for _ in input_shape]
            else:
                output_shape = None

        if (not isinstance(output_mask, (list, tuple)) and
                len(output_ls) > 1):
            # Augment the mask to match the length of the output.
            output_mask = [output_mask] * len(output_ls)

        # Add an inbound node to the layer, so that it keeps track
        # of the call and of all new variables created during the call.
        # This also updates the layer history of the output tensor(s).
        # If the input tensor(s) had not previous Keras history,
        # this does nothing.
        self._add_inbound_node(input_tensors=inputs,
                               output_tensors=output,
                               input_masks=previous_mask,
                               output_masks=output_mask,
                               input_shapes=input_shape,
                               output_shapes=output_shape,
                               arguments=user_kwargs)

        # Apply activity regularizer if any:
        if (hasattr(self, 'activity_regularizer') and
                self.activity_regularizer is not None):
            with K.name_scope('activity_regularizer'):
                regularization_losses = [
                    self.activity_regularizer(x)
                    for x in to_list(output)]
            self.add_loss(regularization_losses,
                          inputs=to_list(inputs))
    return output

This last piece is housekeeping. First the output shape is derived from the input shape; this is where compute_output_shape() gets called. Then, if the mask is not a list or tuple while there are multiple outputs, it is repeated to match their number. Next comes _add_inbound_node(), which creates a Node (its role is explained below) connecting the layers, so that the current layer can access the outputs and masks of the layers before it. Finally, if an activity_regularizer is set, its value is added to the layer's losses. Keras has three kinds of regularization; briefly (see the sketch after this list):

  • kernel_regularizer: regularizes the layer's weights, keeping them from growing too large.
  • bias_regularizer: likewise, but constrains the layer's biases.
  • activity_regularizer: regularizes the layer's output.
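
Here is the promised sketch, configuring all three on one Dense layer via the standard keras.regularizers API:

from keras import regularizers
from keras.layers import Dense

layer = Dense(64,
              kernel_regularizer=regularizers.l2(0.01),    # on the weights
              bias_regularizer=regularizers.l2(0.01),      # on the biases
              activity_regularizer=regularizers.l1(0.01))  # on the output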

Besides Layer, this file contains two more classes, InputSpec and Node. Let's see what they are for.

class InputSpec(object):
    def __init__(self, dtype=None,
                 shape=None,
                 ndim=None,
                 max_ndim=None,
                 min_ndim=None,
                 axes=None):
        self.dtype = dtype
        self.shape = shape
        if shape is not None:
            self.ndim = len(shape)
        else:
            self.ndim = ndim
        self.max_ndim = max_ndim
        self.min_ndim = min_ndim
        self.axes = axes or {}

    def __repr__(self):
        spec = [('dtype=' + str(self.dtype)) if self.dtype else '',
                ('shape=' + str(self.shape)) if self.shape else '',
                ('ndim=' + str(self.ndim)) if self.ndim else '',
                ('max_ndim=' + str(self.max_ndim)) if self.max_ndim else '',
                ('min_ndim=' + str(self.min_ndim)) if self.min_ndim else '',
                ('axes=' + str(self.axes)) if self.axes else '']
        return 'InputSpec(%s)' % ', '.join(x for x in spec if x)

This class lets a layer declare the ndim, dtype, shape and other constraints it expects of its inputs. It is plain attribute assignment with nothing complicated.
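
For instance, a layer's build() typically pins down its input expectations this way (a sketch modeled on what Dense does; TwoDimOnly is a made-up name, and InputSpec is imported from the very file under discussion):

from keras.engine.base_layer import InputSpec
from keras.layers import Layer

class TwoDimOnly(Layer):
    """Hypothetical layer that accepts only 2-D input."""
    def build(self, input_shape):
        # From now on assert_input_compatibility() requires a 2-D input
        # whose last axis matches the size seen at build time.
        self.input_spec = InputSpec(ndim=2, axes={-1: input_shape[-1]})
        super(TwoDimOnly, self).build(input_shape)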

The Node class connects two layers, acting like a data channel that carries whatever the layers need to exchange. Why have Node at all? My understanding is decoupling: a Layer only cares about transforming its own data, not about how data is routed between layers. Recall that the Layer class has two attributes, self._inbound_nodes and self._outbound_nodes; every time a Node is instantiated, it appends itself to these lists on the layers involved.
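
You can observe this wiring directly; note that these underscore-prefixed attributes are private, so this is for exploration only:

from keras.layers import Input, Dense

inputs = Input(shape=(100,))
dense = Dense(64)
x = dense(inputs)

# The call above created exactly one Node: an inbound node of `dense`
# and an outbound node of the underlying input layer.
print(len(dense._inbound_nodes))                       # 1
print(dense._inbound_nodes[0].inbound_layers[0].name)  # e.g. 'input_1'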

class Node(object):
    def __init__(self, outbound_layer,
                 inbound_layers, node_indices, tensor_indices,
                 input_tensors, output_tensors,
                 input_masks, output_masks,
                 input_shapes, output_shapes,
                 arguments=None):
        # Layer instance (NOT a list).
        # this is the layer that takes a list of input tensors
        # and turns them into a list of output tensors.
        # the current node will be added to
        # the inbound_nodes of outbound_layer.
        self.outbound_layer = outbound_layer

        # The following 3 properties describe where
        # the input tensors come from: which layers,
        # and for each layer, which node and which
        # tensor output of each node.

        # List of layer instances.
        self.inbound_layers = inbound_layers
        # List of integers, 1:1 mapping with inbound_layers.
        self.node_indices = node_indices
        # List of integers, 1:1 mapping with inbound_layers.
        self.tensor_indices = tensor_indices

        # Following 2 properties:
        # tensor inputs and outputs of outbound_layer.

        # List of tensors. 1:1 mapping with inbound_layers.
        self.input_tensors = input_tensors
        # List of tensors, created by outbound_layer.call().
        self.output_tensors = output_tensors

        # Following 2 properties: input and output masks.
        # List of tensors, 1:1 mapping with input_tensor.
        self.input_masks = input_masks
        # List of tensors, created by outbound_layer.compute_mask().
        self.output_masks = output_masks

        # Following 2 properties: input and output shapes.

        # List of shape tuples, shapes of input_tensors.
        self.input_shapes = input_shapes
        # List of shape tuples, shapes of output_tensors.
        self.output_shapes = output_shapes

        # Optional keyword arguments to layer's `call`.
        self.arguments = arguments

        # Add nodes to all layers involved.
        for layer in inbound_layers:
            if layer is not None:
                layer._outbound_nodes.append(self)
        outbound_layer._inbound_nodes.append(self)

    def get_config(self):
        inbound_names = []
        for layer in self.inbound_layers:
            if layer:
                inbound_names.append(layer.name)
            else:
                inbound_names.append(None)
        if self.outbound_layer:
            outbound_layer = self.outbound_layer.name
        else:
            outbound_layer = None
        return {'outbound_layer': outbound_layer,
                'inbound_layers': inbound_names,
                'node_indices': self.node_indices,
                'tensor_indices': self.tensor_indices}

The Node class is likewise mostly assignments plus a config getter. The key piece is the _add_inbound_node method that Layer calls:

    def _add_inbound_node(self, input_tensors, output_tensors,
                          input_masks, output_masks,
                          input_shapes, output_shapes, arguments=None):
        input_tensors = to_list(input_tensors)
        output_tensors = to_list(output_tensors)
        input_masks = to_list(input_masks)
        output_masks = to_list(output_masks)
        input_shapes = to_list(input_shapes)
        output_shapes = to_list(output_shapes)

        # Collect input tensor(s) coordinates.
        inbound_layers = []
        node_indices = []
        tensor_indices = []
        for x in input_tensors:
            if hasattr(x, '_keras_history'):
                inbound_layer, node_index, tensor_index = x._keras_history
                inbound_layers.append(inbound_layer)
                node_indices.append(node_index)
                tensor_indices.append(tensor_index)
            else:
                inbound_layers.append(None)
                node_indices.append(None)
                tensor_indices.append(None)

        # Create node, add it to inbound nodes.
        Node(
            self,
            inbound_layers=inbound_layers,
            node_indices=node_indices,
            tensor_indices=tensor_indices,
            input_tensors=input_tensors,
            output_tensors=output_tensors,
            input_masks=input_masks,
            output_masks=output_masks,
            input_shapes=input_shapes,
            output_shapes=output_shapes,
            arguments=arguments
        )

        # Update tensor history, _keras_shape and _uses_learning_phase.
        for i in range(len(output_tensors)):
            output_tensors[i]._keras_shape = output_shapes[i]
            uses_lp = any(
                [getattr(x, '_uses_learning_phase', False)
                 for x in input_tensors])
            uses_lp = getattr(self, 'uses_learning_phase', False) or uses_lp
            output_tensors[i]._uses_learning_phase = getattr(
                output_tensors[i], '_uses_learning_phase', False) or uses_lp
            output_tensors[i]._keras_history = (self,
                                                len(self._inbound_nodes) - 1,
                                                i)

It first makes sure the input/output tensors, masks, and shapes are all lists, converting them where necessary. It then walks over input_tensors, reading each tensor's _keras_history to collect the parameters needed to construct the Node; the Node is created, linking the layers together; and finally the metadata on output_tensors is updated.
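
That final loop is what makes layer chaining work: each output tensor receives a _keras_history triple pointing back at this layer and node, which the next layer's _add_inbound_node() reads. A quick sketch:

from keras.layers import Input, Dense

inputs = Input(shape=(100,))
x = Dense(64)(inputs)

layer, node_index, tensor_index = x._keras_history
print(layer.name)    # the Dense layer that produced x
print(node_index)    # index into layer._inbound_nodes
print(tensor_index)  # which output of that node x is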

That covers all of Layer. In the next post we'll analyze Input, which we use all the time when defining models.
