Text embedding

Latest recommended article published on 2024-07-30 09:54:58

qq_33051477

Article link: https://blog.csdn.net/qq_33051477/article/details/79377933

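This article explores text embedding with word2vec and GloVe. A pretrained word-vector model is loaded, and each word is converted into a fixed-dimension vector. In the experiments, GloVe's accuracy was 8% higher than word2vec's. The approach draws on resources about integrating pretrained embeddings into Keras models.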

Summary generated by CSDN using intelligent technology.
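
The lookup step described in the summary can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: `pretrained` is a hypothetical toy dictionary standing in for a real pretrained model, `word_index` is a hypothetical word-to-index mapping of the kind a tokenizer produces, and the helper `build_embedding_matrix` is an illustrative name, not part of any library.

```python
import numpy as np

EMBEDDING_DIM = 4  # toy dimension for illustration; the GoogleNews vectors use 300

# Hypothetical stand-in for a pretrained model: word -> vector
pretrained = {
    "cat": np.array([0.1, 0.2, 0.3, 0.4], dtype="float32"),
    "dog": np.array([0.5, 0.6, 0.7, 0.8], dtype="float32"),
}

# Hypothetical tokenizer output: word -> integer index, starting at 1
word_index = {"cat": 1, "dog": 2, "zebra": 3}


def build_embedding_matrix(word_index, vectors, dim):
    """Return a (len(word_index) + 1, dim) matrix whose row i holds the vector of word i."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        if word in vectors:  # words without a pretrained vector keep a zero row
            matrix[i] = vectors[word]
    return matrix


embedding_matrix = build_embedding_matrix(word_index, pretrained, EMBEDDING_DIM)
```

In this sketch, "zebra" has no pretrained vector, so its row stays all zeros, as does the unused row 0.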

Using word2vec for text embedding

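VECTOR_DIR = 'GoogleNews-vectors-negative300.bin'  # path to the pretrained word-vector file

import numpy as np
from keras.utils import plot_model
from keras.layers import Embedding
import gensim
from gensim.models import Word2Vec
EMBEDDING_DIM = 300  # dimensionality of the word-vector space
w2v_model = gensim.models.KeyedVectors.load_word2vec_format(VECTOR_DIR, binary=True)
# word_index (word -> integer index) is defined elsewhere, presumably by a fitted tokenizer
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    # The original called unicode(word), which exists only on Python 2;
    # on Python 3 the str can be used directly.
    # Words missing from the model keep an all-zero row.
    if word in w2v_model:
        embedding_matrix[i] = np.asarray(w2v_model[word],
                                         dtype='float32')
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=500,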