Converting Words to Numeric Indices with TensorFlow

shanghai_in_summer

Category: TensorFlow

Article link: https://blog.csdn.net/sunjianqiang12345/article/details/83377284

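TensorFlow's built-in class tf.contrib.learn.preprocessing.VocabularyProcessor(max_document_length, min_frequency=0, vocabulary=None, tokenizer_fn=None) returns an object that converts the words in a set of documents into numeric indices. Here max_document_length is the length of every document after conversion: longer documents are truncated and shorter ones are padded with 0. min_frequency (default 0) is the minimum number of times a word must occur in the documents to be kept in the vocabulary. Because the class lives in tf.contrib, it is available only in TensorFlow 1.x.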
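To make the two parameters concrete, here is a minimal pure-Python sketch of the idea, not the library's implementation. The helper names `build_vocab` and `to_ids` are hypothetical, whitespace tokenization is assumed, and the sketch assumes a word is kept when its count is at least `min_frequency`; the library's exact threshold may differ.

```python
from collections import Counter


def build_vocab(docs, min_frequency=0):
    """Map each sufficiently frequent word to a positive ID; 0 means unknown or padding."""
    counts = Counter(word for doc in docs for word in doc.split())
    vocab = {}
    for doc in docs:
        for word in doc.split():
            # Assumption: keep a word when its count is at least min_frequency.
            if counts[word] >= min_frequency and word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_ids(doc, vocab, max_document_length):
    """Convert one document to exactly max_document_length IDs (truncate or zero-pad)."""
    ids = [vocab.get(word, 0) for word in doc.split()[:max_document_length]]
    return ids + [0] * (max_document_length - len(ids))


docs = ["ok lar joking wif u oni",
        "u dun say so early hor u c already then say"]
vocab = build_vocab(docs, min_frequency=2)
rows = [to_ids(doc, vocab, max_document_length=8) for doc in docs]
print(vocab)
print(rows)
```

In this sketch only `u` and `say` occur at least twice, so every other word maps to 0; the first sentence is padded to eight entries and the second is cut off after eight words.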

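import numpy as np
from tensorflow.contrib import learn  # tf.contrib exists only in TensorFlow 1.x

# Sample sentences to convert into index sequences
texts = ['go until jurong point crazy available only in bugis n great world la e buffet cine there got amore wat',
 'ok lar joking wif u oni',
 'free entry in a wkly comp to win fa cup final tkts st may text fa to to receive entry questionstd txt ratetcs apply overs',
 'u dun say so early hor u c already then say',
 'nah i dont think he goes to usf he lives around here though']

# Every document becomes exactly 20 IDs (truncated or padded with 0).
# fit() is never called here, so the vocabulary is built while transforming
# and the min_frequency filter, which is applied during fit(), has no effect.
vocab_processor = learn.preprocessing.VocabularyProcessor(20, min_frequency=1)
transformed_texts = np.array(list(vocab_processor.transform(texts)))
print(transformed_texts)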

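## Output (the row count and the large IDs suggest it was produced from a larger corpus than the five sentences listed above):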
[[   1    2    3 ...   18   19   20]
 [  21   22   23 ...    0    0    0]
 [  27   28    8 ...   32   41   28]
 ...
 [7687  302    8 ...    0    0    0]
 [ 128 3066  205 ...  166   68   54]
 [3173   64 1156 ...    0    0    0]]
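The first three rows above are consistent with IDs being handed out in order of first appearance, with 0 used as padding. The following is a minimal pure-Python sketch of that reading. `assign_ids` is a hypothetical helper, whitespace tokenization is assumed, and it is not the library's code.

```python
def assign_ids(docs, max_document_length):
    """Give each new word the next free ID (starting at 1) and pad rows with 0."""
    vocab = {}
    rows = []
    for doc in docs:
        row = [0] * max_document_length
        for i, word in enumerate(doc.split()[:max_document_length]):
            # setdefault assigns the next ID only the first time a word is seen.
            row[i] = vocab.setdefault(word, len(vocab) + 1)
        rows.append(row)
    return rows, vocab


sample = [
    'go until jurong point crazy available only in bugis n great world la e buffet cine there got amore wat',
    'ok lar joking wif u oni',
    'free entry in a wkly comp to win fa cup final tkts st may text fa to to receive entry questionstd txt ratetcs apply overs',
]
rows, vocab = assign_ids(sample, 20)
for row in rows:
    print(row)
```

The first sentence has exactly 20 distinct words, so it fills its row with 1 through 20. The second sentence has only six words and is padded with zeros. The third reuses the ID of `in` from the first sentence and is cut off after 20 words.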