keras\layers directory file walkthrough 6.1 (embeddings.py) - Keras study notes 6

Latest recommended article published on 2022-12-27 08:55:01

wyx100

Categories: python, artificial intelligence. Tags: deep learning, keras

Article link: https://blog.csdn.net/wyx100/article/details/81079266

keras directory file walkthrough 5.1 (embeddings.py) - Keras study notes 5

keras\layers\embeddings.py

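Builds the word-embedding layer, which converts input text (encoded as integer word indices) into a data format that can be processed further (for example, a matrix).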
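
To make that conversion concrete, here is a minimal sketch of the idea written in NumPy rather than Keras. The toy vocabulary, the sentence, and the randomly filled table are all made up for illustration; a real Embedding layer learns its table during training.

```python
import numpy as np

# Hypothetical toy vocabulary: word -> integer index (index 0 kept for padding).
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}

# Step 1: text becomes a matrix of integer indices, shape (batch, length).
sentence = ["the", "cat", "sat"]
indices = np.array([[vocab[word] for word in sentence]])

# Step 2: each index selects one row of a (vocab_size, output_dim) table.
rng = np.random.default_rng(0)
table = rng.uniform(-0.05, 0.05, size=(len(vocab), 2))
vectors = table[indices]  # shape (batch, length, output_dim)

print(indices.shape, "->", vectors.shape)
```

The only operation involved is a row lookup: position `[0, 1]` of the output is exactly row `vocab["cat"]` of the table.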

Keras package file directory

Keras examples file directory

Annotated code

"""Embedding layer.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from .. import backend as K
from .. import initializers
from .. import regularizers
from .. import constraints
from ..engine import Layer
from ..legacy import interfaces


class Embedding(Layer):
    """Turns positive integers (indexes) into dense vectors of fixed size.
    eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

    This layer can only be used as the first layer in a model.

    # Example

    ```python
      import numpy as np
      from keras.models import Sequential
      from keras.layers import Embedding

      model = Sequential()
      model.add(Embedding(1000, 64, input_length=10))
      # the model will take as input an integer matrix of size (batch, input_length).
      # the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
      # now model.output_shape == (None, 10, 64), where None is the batch dimension.

      input_array = np.random.randint(1000, size=(32, 10))

      model.compile('rmsprop', 'mse')
      output_array = model.predict(input_array)
      assert output_array.shape == (32, 10, 64)
    ```

    # Arguments
      input_dim: int > 0. Size of the vocabulary,
          i.e. maximum integer index + 1.
      output_dim: int >= 0. Dimension of the dense embedding.
      embeddings_initializer: Initializer for the `embeddings` matrix
          (see [initializers](../initializers.md)).
      embeddings_regularizer: Regularizer function applied to
          the `embeddings` matrix
          (see [regularizer](../regularizers.md)).
      embeddings_constraint: Constraint function applied to
          the `embeddings` matrix
          (see [constraints](../constraints.md)).
      mask_zero: Whether or not the input value 0 is a special "padding"
          value that should be masked out.
          This is useful when using [recurrent layers](recurrent.md)
          which may take variable length input.
          If this is `True` then all subsequent layers
          in the model need to support masking or an exception will be raised.
          If mask_zero is set to True, as a consequence, index 0 cannot be
          used in the vocabulary (input_dim should equal size of
          vocabulary + 1).
      input_length: Length of input sequences, when it is constant.
          This argument is required if you are going to connect
          `Flatten` then `Dense` layers upstream
          (without it, the shape of the dense outputs cannot be computed).

    # Input shape
        2D tensor with shape: `(batch_size, sequence_length)`.

    # Output shape
        3D tensor with shape: `(batch_size, sequence_length, output_dim)`.

    # References
        - [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](http://arxiv.org/abs/1512.05287)
    """

    @interfaces.legacy_embedding_support
    def __init__(self, input_dim, output_dim,
                 embeddings_initializer='uniform',
                 embeddings_regularizer=None,
                 activity_regularizer=None,
                 embeddings_constraint=None,
                 mask_zero=False,
                 input_length=None,
                 **kwargs):
        if 'input_shape' not in kwargs:
            if input_length:
                kwargs['input_shape'] = (input_length,)
            else:
                kwargs['input_shape'] = (None,)
        super(Embedding, self).__init__(**kwargs)

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embeddings_initializer = initializers.get(embeddings_initializer)
        self.embeddings_regularizer = regularizers.get(embeddings_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.embeddings_constraint = constraints.get(embeddings_constraint)
        self.mask_zero = mask_zero
        self.input_length = input_length

    def build(self, input_shape):
        self.embeddings = self.add_weight(
            shape=(self.input_dim, self.output_dim),
            initializer=self.embeddings_initializer,
            name='embeddings',
            regularizer=self.embeddings_regularizer,
            constraint=self.embeddings_constraint,
            dtype=self.dtype)
        self.built = True

    def compute_mask(self, inputs, mask=None):
        if not self.mask_zero:
            return None
        else:
            return K.not_equal(inputs, 0)

    def compute_output_shape(self, input_shape):
        if self.input_length is None:
            return input_shape + (self.output_dim,)
        else:
            # input_length can be tuple if input is 3D or higher
            if isinstance(self.input_length, (list, tuple)):
                in_lens = list(self.input_length)
            else:
                in_lens = [self.input_length]
            if len(in_lens) != len(input_shape) - 1:
                raise ValueError('"input_length" is %s, but received input has shape %s' %
                                 (str(self.input_length), str(input_shape)))
            else:
                for i, (s1, s2) in enumerate(zip(in_lens, input_shape[1:])):
                    if s1 is not None and s2 is not None and s1 != s2:
                        raise ValueError('"input_length" is %s, but received input has shape %s' %
                                         (str(self.input_length), str(input_shape)))
                    elif s1 is None:
                        in_lens[i] = s2
            return (input_shape[0],) + tuple(in_lens) + (self.output_dim,)

    def call(self, inputs):
        if K.dtype(inputs) != 'int32':
            inputs = K.cast(inputs, 'int32')
        out = K.gather(self.embeddings, inputs)
        return out

    def get_config(self):
        config = {'input_dim': self.input_dim,
                  'output_dim': self.output_dim,
                  'embeddings_initializer': initializers.serialize(self.embeddings_initializer),
                  'embeddings_regularizer': regularizers.serialize(self.embeddings_regularizer),
                  'activity_regularizer': regularizers.serialize(self.activity_regularizer),
                  'embeddings_constraint': constraints.serialize(self.embeddings_constraint),
                  'mask_zero': self.mask_zero,
                  'input_length': self.input_length}
        base_config = super(Embedding, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
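
The shape inference in `compute_output_shape` is easier to follow in isolation. The following is a standalone sketch in plain Python, not the Keras implementation itself: the function name `embedding_output_shape` is hypothetical, and the sketch explicitly raises `ValueError` when `input_length` disagrees with the input shape.

```python
def embedding_output_shape(input_shape, output_dim, input_length=None):
    """Standalone sketch of the shape logic in Embedding.compute_output_shape."""
    if input_length is None:
        # No declared length: just append the embedding width.
        return tuple(input_shape) + (output_dim,)

    # input_length may be a single int or a tuple for 3D-or-higher inputs.
    if isinstance(input_length, (list, tuple)):
        in_lens = list(input_length)
    else:
        in_lens = [input_length]

    message = '"input_length" is %s, but received input has shape %s' % (
        str(input_length), str(input_shape))
    if len(in_lens) != len(input_shape) - 1:
        raise ValueError(message)

    for i, (declared, actual) in enumerate(zip(in_lens, input_shape[1:])):
        if declared is not None and actual is not None and declared != actual:
            raise ValueError(message)
        if declared is None:
            # Fill an unknown declared length from the actual input shape.
            in_lens[i] = actual

    return (input_shape[0],) + tuple(in_lens) + (output_dim,)


print(embedding_output_shape((None, 10), 64))
print(embedding_output_shape((32, 10), 64, input_length=10))
```

In both calls the batch dimension passes through unchanged and `output_dim` is appended as the last axis, which matches the `(batch_size, sequence_length, output_dim)` output shape stated in the docstring.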

Running the code
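
As a rough way to see what `call` and `compute_mask` produce without installing Keras, the sketch below imitates both with NumPy. It is an approximation under stated assumptions: the table values are arbitrary random numbers, the variable names are chosen for illustration, and NumPy indexing stands in for `K.gather` and `K.not_equal`.

```python
import numpy as np

input_dim, output_dim = 5, 3  # vocabulary size (including padding index 0), embedding width
rng = np.random.default_rng(42)
embeddings = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))

# Two sequences padded with 0 to a common length of 4.
inputs = np.array([[1, 2, 0, 0],
                   [3, 4, 1, 0]], dtype="int32")

outputs = embeddings[inputs]  # stands in for K.gather(self.embeddings, inputs)
mask = inputs != 0            # stands in for K.not_equal(inputs, 0) with mask_zero=True

print(outputs.shape)
print(mask)
```

Note that the mask does not zero out the padded positions: they still receive row 0 of the table. The mask is a separate boolean tensor that downstream layers are expected to honor.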

Keras documentation

English: https://keras.io/

Chinese: http://keras-cn.readthedocs.io/en/latest/

Example downloads

https://github.com/keras-team/keras

https://github.com/keras-team/keras/tree/master/examples

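Full project download

For readers without download points, add QQ 452205574 to access the shared folder.

It includes code, datasets (images), trained models, library installation files, and more.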