tf.keras.layers.TextVectorization example


夏华东's blog




Article link: https://blog.csdn.net/weixin_44493841/article/details/121850492


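A preprocessing layer that maps text features to sequences of integers.

It converts texts of different lengths into integer arrays of the same length: with output_mode='int' and no output_sequence_length, shorter sequences are padded with 0 up to the length of the longest sequence in the batch.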

import tensorflow as tf

text_layer = tf.keras.layers.TextVectorization(
    max_tokens=5000,  # maximum vocabulary size
    output_mode='int',  # output integer token indices
)  # create the TextVectorization layer
print(text_layer)

<keras.layers.preprocessing.text_vectorization.TextVectorization object at 0x000001E6C7EE61C0>

data = [
    "听 话",  # sentence 1
    "你 好 吗 ？",  # sentence 2
    "我 是 一 个 中 国 人"  # sentence 3
]  # sample data
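The sentences above put a space between every Chinese character. That matters because TextVectorization's defaults are standardize="lower_and_strip_punctuation" followed by split="whitespace", so an unsegmented Chinese sentence would end up as a single token. The pure-Python sketch below only approximates those two steps: the function name standardize_and_split is hypothetical, and treating the stripped characters as exactly Python's string.punctuation (ASCII punctuation) is an assumption.

```python
import string

# Assumption: the layer's default punctuation stripping is approximated here by
# removing the ASCII characters listed in string.punctuation.
_STRIP_ASCII_PUNCT = str.maketrans("", "", string.punctuation)


def standardize_and_split(text):
    """Hypothetical stand-in for standardize='lower_and_strip_punctuation' + split='whitespace'."""
    cleaned = text.lower().translate(_STRIP_ASCII_PUNCT)  # lowercase, drop ASCII punctuation
    return cleaned.split()  # split on runs of whitespace


print(standardize_and_split("你 好 吗 ？"))    # full-width '？' is not ASCII, so it survives
print(standardize_and_split("Hello, World!"))  # ASCII punctuation is removed
print(standardize_and_split("我是一个中国人"))  # no spaces: the whole sentence is one token
```

Under this approximation the full-width '？' is kept as its own token, which is consistent with it appearing in the vocabulary returned by get_vocabulary().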

text_layer.adapt(data)  # build the layer's vocabulary from data
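adapt() scans the data and builds the vocabulary. The sketch below is a rough, hypothetical re-implementation for output_mode='int' (build_vocabulary is not a Keras function). It assumes tokens are ranked by descending frequency and that index 0 ('') and index 1 ('[UNK]') are reserved, which is why max_tokens=5000 leaves room for at most 4998 real tokens. Equal counts are ordered by first appearance here (the documented behaviour of Counter.most_common); that tie-break is an assumption, so the order can differ from the layer's own vocabulary.

```python
from collections import Counter


def build_vocabulary(token_lists, max_tokens=None):
    """Hypothetical sketch of the vocabulary adapt() produces for output_mode='int'."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)  # count every token occurrence
    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = [token for token, _ in counts.most_common()]
    if max_tokens is not None:
        ranked = ranked[: max(max_tokens - 2, 0)]  # two slots go to '' and '[UNK]'
    return ["", "[UNK]"] + ranked


data = [["听", "话"], ["你", "好", "吗", "？"], ["我", "是", "一", "个", "中", "国", "人"]]
vocab = build_vocabulary(data)
print(len(vocab))  # 13 distinct tokens + 2 reserved entries = 15
print(build_vocabulary([["a", "b", "a"], ["b", "c", "a"]], max_tokens=4))
```

The size (15 entries) matches the vocabulary the layer returns for the same three sentences, even though the order of equal-frequency tokens may not.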

text_layer.get_vocabulary()  # get the full vocabulary (it also contains the extra entries '' and '[UNK]')

['', '[UNK]', '？', '话', '是', '我', '好', '国', '听', '吗', '你', '人', '中', '个', '一']
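In this vocabulary, index 0 ('') is the padding value and index 1 ('[UNK]') stands for any token that was not seen during adapt(). A minimal lookup sketch built from the vocabulary printed above; the lookup helper and the token_to_index dictionary are hypothetical names, not part of the Keras API.

```python
# Vocabulary copied from the get_vocabulary() output above.
vocab = ['', '[UNK]', '？', '话', '是', '我', '好', '国', '听', '吗', '你', '人', '中', '个', '一']
token_to_index = {token: i for i, token in enumerate(vocab)}
OOV_INDEX = token_to_index["[UNK]"]  # 1


def lookup(tokens):
    """Map tokens to indices; unknown tokens fall back to the '[UNK]' index."""
    return [token_to_index.get(token, OOV_INDEX) for token in tokens]


print(lookup(["你", "好"]))  # [10, 6]
print(lookup(["他", "好"]))  # '他' was never seen, so it maps to 1: [1, 6]
```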

text_layer(data)  # map each sentence in data to an array of vocabulary indices

<tf.Tensor: shape=(3, 7), dtype=int64, numpy=
array([[ 8,  3,  0,  0,  0,  0,  0],
       [10,  6,  9,  2,  0,  0,  0],
       [ 5,  4, 14, 13, 12,  7, 11]], dtype=int64)>
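The three index lists have lengths 2, 4 and 7, so the shorter rows are right-padded with 0 (the '' index) up to 7, the length of the longest sentence in the batch. The sketch below reproduces that padding in plain Python. pad_rows is a hypothetical helper, and its optional length argument only mirrors, as an assumption, what output_sequence_length does (pad or truncate every row to a fixed length, keeping the leading tokens).

```python
def pad_rows(rows, length=None, pad_value=0):
    """Right-pad (and, if needed, truncate) integer rows to a common length."""
    if length is None:
        length = max(len(row) for row in rows)  # default: longest row in the batch
    padded = []
    for row in rows:
        row = row[:length]  # keep the leading tokens when a row is too long
        padded.append(row + [pad_value] * (length - len(row)))
    return padded


rows = [[8, 3], [10, 6, 9, 2], [5, 4, 14, 13, 12, 7, 11]]
for r in pad_rows(rows):
    print(r)  # matches the rows of the tensor above
print(pad_rows(rows, length=4))  # fixed length: short rows padded, long rows truncated
```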
