One-Hot Encoding and Its Application in CNNs


vivian_ll


Article link: https://blog.csdn.net/vivian_ll/article/details/75647545


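While learning about CNNs, I came across the tf.one_hot function in TensorFlow, which led me to the blog post reproduced further down.
In my case, however, the number of classes is very large. One-hot encoding would produce a very sparse label matrix, wasting compute and slowing training, so I did not end up using it.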
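The post does not say what was used in place of one-hot labels. One common option, sketched here as an assumption rather than the author's method, is to keep the labels as integer class indices: cross-entropy against a one-hot target reduces to the negative log-probability of the true class, so the sparse label matrix never has to be built. The function names and the example probabilities below are hypothetical.

```python
import math


def cross_entropy_one_hot(probs, one_hot):
    """Cross-entropy -sum(y_i * log(p_i)) against a one-hot target vector."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y != 0)


def cross_entropy_index(probs, label):
    """Same loss, computed directly from the integer class index."""
    return -math.log(probs[label])


# Hypothetical predicted class probabilities for one sample (3 classes).
probs = [0.1, 0.7, 0.2]
label = 1                    # integer class index
one_hot = [0.0, 1.0, 0.0]    # the same label as a one-hot vector

loss_dense = cross_entropy_one_hot(probs, one_hot)
loss_sparse = cross_entropy_index(probs, label)
print(loss_dense, loss_sparse)
```

Both values agree, so the one-hot vector adds storage and multiplications by zero without changing the loss. TensorFlow exposes the same idea through tf.nn.sparse_softmax_cross_entropy_with_logits, which takes integer labels directly.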
Representing CNN outputs in one-hot form:

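# Encode labels as one-hot vectors in TensorFlow.
# Note: this uses the TensorFlow 1.x graph/session API; in TensorFlow 2.x,
# tf.Session is only available as tf.compat.v1.Session.
import tensorflow as tf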

original_indices = tf.constant([1, 5, 3])
depth = tf.constant(10)
one_hot_encoded = tf.one_hot(indices=original_indices, depth=depth)

with tf.Session():
  print(one_hot_encoded.eval())

def decode_one_hot(batch_of_vectors):
  """Computes indices for the non-zero entries in batched one-hot vectors.

  Args:
    batch_of_vectors: A Tensor with length-N vectors, having shape [..., N].
  Returns:
    An integer Tensor with shape [...] indicating the index of the non-zero
    value in each vector.
  """
  nonzero_indices = tf.where(tf.not_equal(
      batch_of_vectors, tf.zeros_like(batch_of_vectors)))
  reshaped_nonzero_indices = tf.reshape(
      nonzero_indices[:, -1], tf.shape(batch_of_vectors)[:-1])
  return reshaped_nonzero_indices

with tf.Session():
  print(decode_one_hot(one_hot_encoded).eval())

'''
[[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]]
[1 5 3]
'''
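For a strict one-hot vector, the index of the non-zero entry is also the index of the largest entry, so decoding amounts to an argmax over the last axis. The following plain-Python sketch (no TensorFlow required) mirrors the round trip above on the same indices. The helper names are chosen here for illustration and are not TensorFlow APIs.

```python
def one_hot(indices, depth):
    """Return one length-`depth` list per index, with a single 1.0 at that index."""
    return [[1.0 if j == i else 0.0 for j in range(depth)] for i in indices]


def decode_one_hot(vectors):
    """Return, for each vector, the position of its largest entry (argmax)."""
    return [max(range(len(v)), key=v.__getitem__) for v in vectors]


encoded = one_hot([1, 5, 3], depth=10)
decoded = decode_one_hot(encoded)

for row in encoded:
    print(row)
print(decoded)
```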

About one-hot encoding (reposted from: http://blog.csdn.net/xidianliutingting/article/details/53261139)

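For example:
sex: ["male", "female"]
country: ["China", "USA", "Japan"]
After ordinary integer coding:
"male" and "female" are represented by 0 and 1;
"China", "USA", and "Japan" are represented by 0, 1, and 2.
Now take three samples:
["male", "USA"],
["male", "Japan"],
["female", "China"]
After coding they become:
[0, 1]
[0, 2]
[1, 0]
These values still cannot be fed directly into a classifier. Classifiers typically treat numeric inputs as continuous and ordered, but the integers above carry no inherent order; they were assigned arbitrarily.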
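To make the ordering problem concrete, the sketch below compares pairwise Euclidean distances between the three countries under integer coding and under one-hot coding. It is an illustration only; the variable and function names are hypothetical.

```python
import math
from itertools import combinations

countries = ["China", "USA", "Japan"]

# Integer coding as in the example: China=0, USA=1, Japan=2.
int_code = {name: [float(i)] for i, name in enumerate(countries)}

# One-hot coding: one dimension per category, a single 1.0 per vector.
one_hot = {
    name: [1.0 if j == i else 0.0 for j in range(len(countries))]
    for i, name in enumerate(countries)
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(codes):
    """Distance for every unordered pair of countries under a given coding."""
    return {(a, b): euclidean(codes[a], codes[b]) for a, b in combinations(countries, 2)}


int_dist = pairwise(int_code)
hot_dist = pairwise(one_hot)
print(int_dist)
print(hot_dist)
```

Under integer coding, China and Japan end up twice as far apart as China and USA, a relationship that exists only because of the arbitrary numbering. Under one-hot coding, every pair is equally distant.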

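One-hot encoding solves this problem. Also known as one-bit-effective encoding, it uses an N-bit state register to encode N states: each state has its own dedicated register bit, and at any given time exactly one bit is active.
sex has two values, so male can be encoded as 01 and female as 10.
country has three values: China is encoded as 001, USA as 010, and Japan as 100.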

These features are mutually exclusive, with only one active at a time, so the resulting data is sparse.
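The scheme can be sketched in a few lines of plain Python. The sketch assumes the category order given in the example and counts the active bit from the right, which reproduces the codes listed above (index 0 becomes ...001). The helper name one_hot_bits is hypothetical.

```python
# Category orderings taken from the example in the text.
CATEGORIES = {
    "sex": ["male", "female"],
    "country": ["China", "USA", "Japan"],
}


def one_hot_bits(value, categories):
    """Return an N-bit string with exactly one bit set for `value`.

    The active bit is counted from the right, so index 0 -> '...001'.
    """
    index = categories.index(value)
    return format(1 << index, "0{}b".format(len(categories)))


samples = [("male", "USA"), ("male", "Japan"), ("female", "China")]
encoded = [
    (one_hot_bits(sex, CATEGORIES["sex"]),
     one_hot_bits(country, CATEGORIES["country"]))
    for sex, country in samples
]

for row in encoded:
    print(row)
```

Each sample now occupies five bits, of which exactly two are set: one per original feature.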

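The main benefits are:

It solves the problem that classifiers cannot readily handle categorical attributes.
To some extent, it also expands the feature set.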

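pandas provides a concrete way to do this (scikit-learn offers OneHotEncoder for the same purpose):
import pandas as pd
# The original post showed `data` only as an image, which is not available;
# the three samples from the example above are used here as a stand-in.
data = pd.DataFrame({'sex': ['male', 'male', 'female'],
                     'country': ['USA', 'Japan', 'China']})
var_to_encode = ['sex', 'country']
data = pd.get_dummies(data, columns=var_to_encode)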
