Deep Learning with Python, Section 6.1: One-hot Encoding (Code)

布拉格沃兹基硕德

Article link: https://blog.csdn.net/baidu_30506559/article/details/121304224

One-hot encoding of words or characters

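One-hot encoding is the most common and most basic way to turn a token into a vector. You have already used it in the IMDB and Reuters examples in Chapter 3 (both operated on words). It associates each word with a unique integer index, then converts that integer index i into a binary vector of length N, where N is the vocabulary size. The vector has a 1 at position i and 0 everywhere else.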
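
To make the index-to-vector step concrete, here is a minimal sketch. It is an illustration only: the helper `one_hot` and the five-word `word_index` mapping are hypothetical names invented for this example, indices are assumed to start at 0, and plain Python lists are used instead of arrays for brevity.

```python
def one_hot(index, vocab_size):
    """Return a list of length vocab_size with a 1 at `index` and 0 elsewhere."""
    if not 0 <= index < vocab_size:
        raise ValueError("index must lie in the range [0, vocab_size)")
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


# Hypothetical vocabulary: each word gets a unique integer index (0-based here).
word_index = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# N is the vocabulary size; the vector for "cat" is 1 only at position 1.
cat_vector = one_hot(word_index["cat"], len(word_index))
print(cat_vector)
```

Each vector has exactly one nonzero entry, so its length grows with the vocabulary size while it always carries a single 1.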

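Of course, one-hot encoding can also be done at the character level. To make clear what one-hot encoding is and how to implement it, Listing 6-1 and Listing 6-2 give two toy examples: one at the word level, the other at the character level.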

1. Word-level one-hot encoding (toy example):

import numpy as np

# This is our initial data; one entry per "sample"
# (in this toy example, a "sample" is just a sentence, but
# it could be an entire document).
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

# First, build an index of all tokens in the data.
#