1.视频推荐:
独热编码python实现:
独热编码缺点——(离散稀疏)维度灾难:
tf.one_hot()函数是将input转化为one-hot类型数据输出,相当于将多个数值联合放在一起作为多个相同类型的向量,可用于表示各自的概率分布,通常用于分类任务中作为最后的FC层的输出,有时翻译成“独热”编码。
tf.one_hot(
indices,
depth,
on_value=None,
off_value=None,
axis=None,
dtype=None,
name=None
)
Returns a one-hot tensor(返回一个one_hot张量).
The locations represented by indices in indices take value on_value, while all other locations take value off_value.
(由indices指定的位置将被on_value填充, 其他位置被off_value填充).
on_value and off_value must have matching data types. If dtype is also provided, they must be the same data type as specified by dtype.
(on_value和off_value必须具有相同的数据类型).
If on_value is not provided, it will default to the value 1 with type dtype.
If off_value is not provided, it will default to the value 0 with type dtype.
If the input indices is rank N, the output will have rank N+1. The new axis is created at dimension axis (default: the new axis is appended at the end).
(如果indices是N维张量,那么函数输出将是N+1维张量,默认在最后一维添加新的维度).
If indices is a scalar the output shape will be a vector of length depth.
(如果indices是一个标量, 函数输出将是一个长度为depth的向量)
If indices is a vector of length features, the output shape will be:
features x depth if axis == -1.
(如果indices是一个长度为features的向量,则默认输出一个features*depth形状的张量)
depth x features if axis == 0.
(如果indices是一个长度为features的向量,axis=0,则输出一个depth*features形状的张量)
If indices is a matrix (batch) with shape [batch, features], the output shape will be:
batch x features x depth if axis == -1
(如果indices是一个形状为[batch, features]的矩阵,axis=-1(默认),则输出一个batch * features * depth形状的张量)
batch x depth x features if axis == 1
(如果indices是一个形状为[batch, features]的矩阵,axis=1,则输出一个batch * depth * features形状的张量)
depth x batch x features if axis == 0
(如果indices是一个形状为[batch, features]的矩阵,axis=0,则输出一个depth * batch * features形状的张量)
indices表示输入的多个数值,通常是矩阵形式;depth表示输出的尺寸。
由于one-hot类型数据长度为depth位,其中只用一位数字表示原输入数据,这里的on_value就是这个数字,默认值为1,one-hot数据的其他位用off_value表示,默认值为0。
tf.one_hot()函数规定输入的元素indices从0开始,最大的元素值不能超过(depth - 1),因此能够表示depth个单位的输入。若输入的元素值超出范围,输出的编码均为 [0, 0 … 0, 0]。