I recently worked on some NLP tasks at work, again using Keras's Embedding layer for the word embeddings.
Here is a brief introduction to the Embedding layer in Keras.
Chinese documentation: https://keras.io/zh/layers/embeddings/
The parameters are as follows:
The key parameters are input_dim and output_dim; input_length is optional.
The initializer parameters will be summarized separately later.
The demo uses pretrained vectors (a word2vec model trained on a Baidu Baike corpus).
A reference demo for using Embedding:
import numpy as np

def create_embedding(word_index, num_words, word2vec_model):
    embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
    for word, i in word_index.items():
        try:
            embedding_vector = word2vec_model[word]
            embedding_matrix[i] = embedding_vector
        except KeyError:
            # word not in the pretrained vocabulary: keep its zero row
            continue
    return embedding_matrix
# word_index: the vocabulary (maps each word to an integer index)
# num_words: vocabulary size + 1
# word2vec_model: the pretrained word-vector model
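To make the helper above concrete, here is a minimal, self-contained sketch of how create_embedding fills the matrix. The dict toy_vectors stands in for the gensim word2vec model, and the vocabulary and dimension are made-up values:

```python
import numpy as np

EMBEDDING_DIM = 4
word_index = {"apple": 1, "banana": 2, "cherry": 3}  # index 0 reserved for padding
num_words = len(word_index) + 1                      # hence the "+1" above
toy_vectors = {
    "apple":  np.array([0.1, 0.2, 0.3, 0.4]),
    "banana": np.array([0.5, 0.6, 0.7, 0.8]),
    # "cherry" is missing on purpose: its row stays all zeros
}

embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    try:
        embedding_matrix[i] = toy_vectors[word]
    except KeyError:  # out-of-vocabulary word: leave the zero row
        continue

print(embedding_matrix[1])  # vector for "apple"
print(embedding_matrix[3])  # all zeros: "cherry" is not in the toy model
```

Words missing from the pretrained vocabulary simply keep their all-zero rows, which is why the try/except in create_embedding just continues.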
How to load the word-vector model:
import gensim

def pre_load_embedding_model(model_file):
    # model = gensim.models.Word2Vec.load(model_file)
    # model = gensim.models.Word2Vec.load(model_file, binary=True)
    model = gensim.models.KeyedVectors.load_word2vec_format(model_file)
    return model
Setting up the Embedding layer in the model (note the parameters, the Input layer's input, and the initializer):
from keras.layers import Embedding, Input
from keras.initializers import Constant

embedding_matrix = create_embedding(word_index, num_words, word2vec_model)
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
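With trainable=False, the frozen layer is essentially a table lookup: each integer id in the padded input sequence selects the matching row of the embedding matrix. A plain-NumPy sketch of that computation (the matrix values here are illustrative, not real word vectors):

```python
import numpy as np

MAX_SEQUENCE_LENGTH = 5
# 6 "words", dimension 2; row i holds the vector for word id i
embedding_matrix = np.arange(12, dtype=float).reshape(6, 2)
sequence = np.array([1, 3, 0, 0, 0])   # a padded sequence of word ids
embedded = embedding_matrix[sequence]  # row lookup, like the Embedding layer
print(embedded.shape)                  # (5, 2): (MAX_SEQUENCE_LENGTH, EMBEDDING_DIM)
```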
Setting the initial values of the Embedding layer
Two ways to set initial values for a Keras Embedding layer
Randomly initialized Embedding
from keras.models import Sequential
from keras.layers import Embedding
import numpy as np
model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch dimension.
input_array = np.random.randint(1000, size=(32, 10))
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array.shape)  # (32, 10, 64)
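The second way is to pass a pre-built matrix through the layer's weights argument instead of an initializer. This is a sketch assuming classic Keras / tf.keras 1.x–2.x, where Embedding still accepts weights and input_length; embedding_matrix here is a random stand-in for real pretrained vectors:

```python
from keras.models import Sequential
from keras.layers import Embedding
import numpy as np

embedding_matrix = np.random.rand(1000, 64)      # stand-in for a word2vec matrix
model = Sequential()
model.add(Embedding(1000, 64,
                    weights=[embedding_matrix],  # second way: preload the table
                    input_length=10,
                    trainable=False))            # freeze the pretrained vectors
model.compile('rmsprop', 'mse')
input_array = np.random.randint(1000, size=(32, 10))
output_array = model.predict(input_array)
print(output_array.shape)  # (32, 10, 64)
```

This is equivalent in effect to the embeddings_initializer=Constant(embedding_matrix) approach shown earlier; both preload the lookup table, and trainable=False keeps it fixed during training.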