tf.contrib.learn.preprocessing.VocabularyProcessor

chnhbhndchngn

Category: TensorFlow Study Notes. Tags: tensorflow, VocabularyProcessor, preprocessing

Article link: https://blog.csdn.net/a857553315/article/details/86442037

TensorFlow 1.x includes a class that directly converts English sentences into vectors of word indices:

tf.contrib.learn.preprocessing.VocabularyProcessor (max_document_length, 
                                                    min_frequency=0, 
                                                    vocabulary=None, 
                                                    tokenizer_fn=None)

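The parameters are:

max_document_length: the maximum document length. Texts longer than this are truncated; shorter texts are padded with 0.
min_frequency: the minimum word frequency. Words that occur fewer times than this are not added to the vocabulary.
vocabulary: a CategoricalVocabulary object.
tokenizer_fn: the tokenizer function.

The following example shows how it works: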

from tensorflow.contrib import learn  # requires TensorFlow 1.x; tf.contrib was removed in 2.0
import numpy as np

max_document_length = 10
x_text = ["Trump says he had great conversation with Putin", "These people make it up"]
# Build the vocabulary from x_text and map each sentence to a row of word indices
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
x = np.array(list(vocab_processor.fit_transform(x_text)))
print(x)

The output is:

[[ 1  2  3  4  5  6  7  8  0  0]
 [ 9 10 11 12 13  0  0  0  0  0]]

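In other words, this method simply collects every word into a list in order of first appearance, then replaces each word in a sentence with its index in that list (indices start at 1). When a sentence is shorter than the configured maximum length, the remaining positions are padded with 0.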
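The mapping described above can be sketched in plain Python, which is handy where tf.contrib is not available. This is a minimal sketch, not the library's implementation: the helper name fit_transform, the regex tokenizer, and the way min_frequency is applied are assumptions made for illustration, and the real class may number words differently once min_frequency filtering kicks in.

```python
import re
from collections import Counter


def fit_transform(texts, max_document_length, min_frequency=0):
    """Hypothetical stand-in for the mapping described above (not TensorFlow code)."""
    # Assumed tokenizer: runs of letters, digits, or apostrophes.
    tokenized = [re.findall(r"[A-Za-z0-9']+", text) for text in texts]
    counts = Counter(word for doc in tokenized for word in doc)

    vocab = {}  # word -> index; 0 is kept for padding and for words left out
    for doc in tokenized:
        for word in doc:
            if word not in vocab and counts[word] >= min_frequency:
                vocab[word] = len(vocab) + 1  # indices follow first appearance

    rows = []
    for doc in tokenized:
        ids = [vocab.get(word, 0) for word in doc[:max_document_length]]  # truncate
        ids += [0] * (max_document_length - len(ids))                     # pad with 0
        rows.append(ids)
    return rows, vocab


x_text = ["Trump says he had great conversation with Putin", "These people make it up"]
rows, vocab = fit_transform(x_text, max_document_length=10)
for row in rows:
    print(row)
```

With the two example sentences and max_document_length=10, this sketch produces the same two rows as the output shown above.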

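The indices produced this way are arbitrary IDs: they encode no relationship between words, so a model trained directly on them generally does not reach high accuracy. For reference only.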
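A small illustration of why the IDs carry no meaning: the number a word receives depends only on where it first appears in the input. The helper first_appearance_ids below is hypothetical and uses a plain whitespace split, so treat it as a sketch of the idea and not as the library's behavior.

```python
def first_appearance_ids(texts):
    """Hypothetical helper: assign 1, 2, 3, ... to words in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


a = first_appearance_ids(["great conversation", "people make it up"])
b = first_appearance_ids(["people make it up", "great conversation"])
print(a["great"], b["great"])  # prints: 1 5
```

Swapping the order of the two sentences changes the ID of "great" from 1 to 5, so the numeric distance between two IDs says nothing about how related the words are.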
