Keras directory files explained 5.1 (embeddings.py) - Keras study notes, part 5
keras\layers\embeddings.py
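This file builds the word-vector embedding layer. The layer converts input text, given as integer word indices, into a data format that later layers can process further (e.g., a matrix of dense vectors).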
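At its core the layer is a table lookup. It owns a weight matrix of shape `(input_dim, output_dim)` and replaces each integer index with the corresponding row of that matrix. The following is a minimal NumPy sketch of the idea, not Keras code. The helper `embedding_lookup` is hypothetical, and the random matrix is an assumption that stands in for the trained `embeddings` weights.

```python
import numpy as np


def embedding_lookup(embeddings, indices):
    """Hypothetical helper: replace every integer index with its row.

    NumPy stand-in for what Embedding.call does with
    K.gather(self.embeddings, inputs).
    """
    indices = np.asarray(indices, dtype=np.int32)  # the layer also casts inputs to int32
    return np.take(embeddings, indices, axis=0)


vocab_size, embed_dim = 1000, 64                   # play the roles of input_dim, output_dim
rng = np.random.default_rng(0)
embeddings = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))  # stand-in weights

batch = rng.integers(0, vocab_size, size=(32, 10))  # (batch_size, sequence_length)
out = embedding_lookup(embeddings, batch)
print(out.shape)  # (32, 10, 64)
```

Here `np.take(..., axis=0)` plays the role of `K.gather`. The output gains one trailing axis of size `output_dim`, which is why a 2D `(batch_size, sequence_length)` input becomes a 3D `(batch_size, sequence_length, output_dim)` output.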
Code with comments
"""Embedding layer.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .. import backend as K
from .. import initializers
from .. import regularizers
from .. import constraints
from ..engine import Layer
from ..legacy import interfaces
class Embedding(Layer):
    """Turns positive integers (indexes) into dense vectors of fixed size.
    eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

    This layer can only be used as the first layer in a model.

    # Example

    ```python
      import numpy as np
      from keras.models import Sequential
      from keras.layers import Embedding

      model = Sequential()
      model.add(Embedding(1000, 64, input_length=10))
      # the model will take as input an integer matrix of size (batch, input_length).
      # the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
      # now model.output_shape == (None, 10, 64), where None is the batch dimension.

      input_array = np.random.randint(1000, size=(32, 10))

      model.compile('rmsprop', 'mse')
      output_array = model.predict(input_array)
      assert output_array.shape == (32, 10, 64)
    ```

    # Arguments
        input_dim: int > 0. Size of the vocabulary,
            i.e. maximum integer index + 1.
        output_dim: int >= 0. Dimension of the dense embedding.
        embeddings_initializer: Initializer for the `embeddings` matrix
            (see [initializers](../initializers.md)).
        embeddings_regularizer: Regularizer function applied to
            the `embeddings` matrix
            (see [regularizer](../regularizers.md)).
        embeddings_constraint: Constraint function applied to
            the `embeddings` matrix
            (see [constraints](../constraints.md)).
        mask_zero: Whether or not the input value 0 is a special "padding"
            value that should be masked out.
            This is useful when using [recurrent layers](recurrent.md)
            which may take variable length input.
            If this is `True` then all subsequent layers
            in the model need to support masking or an exception will be raised.
            If mask_zero is set to True, as a consequence, index 0 cannot be
            used in the vocabulary (input_dim should equal size of
            vocabulary + 1).
        input_length: Length of input sequences, when it is constant.
            This argument is required if you are going to connect
            `Flatten` then `Dense` layers upstream
            (without it, the shape of the dense outputs cannot be computed).

    # Input shape
        2D tensor with shape: `(batch_size, sequence_length)`.

    # Output shape
        3D tensor with shape: `(batch_size, sequence_length, output_dim)`.

    # References
        - [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](http://arxiv.org/abs/1512.05287)
    """

    @interfaces.legacy_embedding_support
    def __init__(self, input_dim, output_dim,
                 embeddings_initializer='uniform',
                 embeddings_regularizer=None,
                 activity_regularizer=None,
                 embeddings_constraint=None,
                 mask_zero=False,
                 input_length=None,
                 **kwargs):
        # Derive the layer's input shape from input_length when none is given;
        # (None,) means sequences of variable length.
        if 'input_shape' not in kwargs:
            if input_length:
                kwargs['input_shape'] = (input_length,)
            else:
                kwargs['input_shape'] = (None,)
        super(Embedding, self).__init__(**kwargs)

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embeddings_initializer = initializers.get(embeddings_initializer)
        self.embeddings_regularizer = regularizers.get(embeddings_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.embeddings_constraint = constraints.get(embeddings_constraint)
        self.mask_zero = mask_zero
        self.input_length = input_length

    def build(self, input_shape):
        # One trainable row of length output_dim per vocabulary index.
        self.embeddings = self.add_weight(
            shape=(self.input_dim, self.output_dim),
            initializer=self.embeddings_initializer,
            name='embeddings',
            regularizer=self.embeddings_regularizer,
            constraint=self.embeddings_constraint,
            dtype=self.dtype)
        self.built = True

    def compute_mask(self, inputs, mask=None):
        # With mask_zero, every position holding index 0 is treated as padding.
        if not self.mask_zero:
            return None
        else:
            return K.not_equal(inputs, 0)

    def compute_output_shape(self, input_shape):
        if self.input_length is None:
            return input_shape + (self.output_dim,)
        else:
            # input_length can be tuple if input is 3D or higher
            if isinstance(self.input_length, (list, tuple)):
                in_lens = list(self.input_length)
            else:
                in_lens = [self.input_length]
            # The exceptions must be raised; merely constructing a ValueError has no effect.
            if len(in_lens) != len(input_shape) - 1:
                raise ValueError('"input_length" is %s, but received input has shape %s' %
                                 (str(self.input_length), str(input_shape)))
            else:
                for i, (s1, s2) in enumerate(zip(in_lens, input_shape[1:])):
                    if s1 is not None and s2 is not None and s1 != s2:
                        raise ValueError('"input_length" is %s, but received input has shape %s' %
                                         (str(self.input_length), str(input_shape)))
                    elif s1 is None:
                        in_lens[i] = s2
            return (input_shape[0],) + tuple(in_lens) + (self.output_dim,)

    def call(self, inputs):
        # Indices must be integers; then fetch one embedding row per index.
        if K.dtype(inputs) != 'int32':
            inputs = K.cast(inputs, 'int32')
        out = K.gather(self.embeddings, inputs)
        return out

    def get_config(self):
        config = {'input_dim': self.input_dim,
                  'output_dim': self.output_dim,
                  'embeddings_initializer': initializers.serialize(self.embeddings_initializer),
                  'embeddings_regularizer': regularizers.serialize(self.embeddings_regularizer),
                  'activity_regularizer': regularizers.serialize(self.activity_regularizer),
                  'embeddings_constraint': constraints.serialize(self.embeddings_constraint),
                  'mask_zero': self.mask_zero,
                  'input_length': self.input_length}
        base_config = super(Embedding, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
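Setting `mask_zero=True` reserves index 0 for padding, because `compute_mask` returns `K.not_equal(inputs, 0)`. Real words therefore have to be numbered from 1, and `input_dim` becomes vocabulary size + 1. The NumPy sketch below walks through that convention on a toy five-word vocabulary. The vocabulary, the `encode` helper and the right-padding scheme are all hypothetical choices made for illustration, not part of Keras.

```python
import numpy as np

# Toy vocabulary (assumption); index 0 is reserved for padding.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i + 1 for i, w in enumerate(vocab)}
input_dim = len(vocab) + 1   # vocabulary + 1, as the mask_zero docs require
output_dim = 3


def encode(sentences, length):
    """Hypothetical helper: map words to indices and right-pad with 0."""
    rows = []
    for sentence in sentences:
        ids = [word_to_index[w] for w in sentence.split()][:length]
        rows.append(ids + [0] * (length - len(ids)))
    return np.array(rows, dtype=np.int32)


inputs = encode(["the cat sat", "the cat sat on the mat"], length=6)
embeddings = np.random.default_rng(0).normal(size=(input_dim, output_dim))

outputs = embeddings[inputs]       # row lookup, like K.gather: shape (2, 6, 3)
mask = np.not_equal(inputs, 0)     # what compute_mask returns when mask_zero=True

print(inputs.tolist())  # [[1, 2, 3, 0, 0, 0], [1, 2, 3, 4, 1, 5]]
print(mask.tolist())    # [[True, True, True, False, False, False], [True, True, True, True, True, True]]
print(outputs.shape)    # (2, 6, 3)
```

Padded positions still receive a vector (row 0 of the matrix). The mask, passed on to later layers such as recurrent layers, is what tells them to skip those time steps.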
Running the code
Detailed Keras documentation
Chinese: http://keras-cn.readthedocs.io/en/latest/
Example downloads
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples
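Full project download
Readers without download credits can add QQ 452205574 to access the shared folder.
It includes the code, datasets (images), trained models, library installation files, etc.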