tf.nn.embedding_lookup_sparse explained

Latest recommended article published on 2021-05-20 18:05:16

kangshuangzhu

Article link: https://blog.csdn.net/kangshuangzhu/article/details/108196186

Introduction

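tf.nn.embedding_lookup_sparse is a function for embedding table lookup. It skips the process of one-hot encoding a feature and then multiplying the one-hot vector by the embedding matrix. Instead, it uses the feature's value to look up the embedding result directly in the embedding matrix.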

Take gender as an example feature and assume it has two values: 0 for male and 1 for female. The embedding matrix is

[ [0.3,0.2,0.4,0.7,0.6], [0.1,0.1,0.8,0.5,0.4] ]

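The usual procedure is to one-hot encode the feature first, so that male becomes [1,0] and female becomes [0,1], and then compute a matrix product.

The embedding for male is

[1,0] × [ [0.3,0.2,0.4,0.7,0.6], [0.1,0.1,0.8,0.5,0.4] ] = [0.3,0.2,0.4,0.7,0.6]

The embedding for female is

[0,1] × [ [0.3,0.2,0.4,0.7,0.6], [0.1,0.1,0.8,0.5,0.4] ] = [0.1,0.1,0.8,0.5,0.4]

tf.nn.embedding_lookup_sparse spares us the manual matrix multiplication: given the male feature value 0, it simply pulls the first row [0.3,0.2,0.4,0.7,0.6] out of the embedding matrix.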
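
To see that the two routes really agree, here is a small NumPy sketch (NumPy instead of TensorFlow only to keep it lightweight; the variable names are mine). It builds the one-hot vectors for the gender feature, multiplies them by the embedding matrix, and compares the result with a plain row lookup.

```python
import numpy as np

# Embedding matrix from the text: row 0 = male, row 1 = female.
embedding = np.array([[0.3, 0.2, 0.4, 0.7, 0.6],
                      [0.1, 0.1, 0.8, 0.5, 0.4]])

feature_ids = np.array([0, 1])  # male, female

# Route 1: one-hot encode, then matrix-multiply.
one_hot = np.eye(embedding.shape[0])[feature_ids]  # [[1, 0], [0, 1]]
via_matmul = one_hot @ embedding

# Route 2: use the feature value directly as a row number.
via_lookup = embedding[feature_ids]

print(np.allclose(via_matmul, via_lookup))  # True
```

The lookup never materializes the one-hot vectors, which is the saving the function is after when the vocabulary is large.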

Usage

First, let's look at the official documentation:

tf.nn.embedding_lookup_sparse(
    params, sp_ids, sp_weights, combiner=None, max_norm=None, name=None
)

Args
`params`	A single tensor representing the complete embedding tensor, or a list of tensors all of same shape except for the first dimension, representing sharded embedding tensors following "div" partition strategy.
`sp_ids`	N x M `SparseTensor` of int64 ids where N is typically batch size and M is arbitrary.
`sp_weights`	either a `SparseTensor` of float / double weights, or `None` to indicate all weights should be taken to be 1. If specified, `sp_weights` must have exactly the same shape and indices as `sp_ids`.
`combiner`	A string specifying the reduction op. Currently "mean", "sqrtn" and "sum" are supported. "sum" computes the weighted sum of the embedding results for each row. "mean" is the weighted sum divided by the total weight. "sqrtn" is the weighted sum divided by the square root of the sum of the squares of the weights. Defaults to `mean`.
`max_norm`	If not `None`, each embedding is clipped if its l2-norm is larger than this value, before combining.
`name`	Optional name for the op.

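The documentation is vague in many places, so here is my own understanding:

tf.nn.embedding_lookup_sparse takes three required arguments:

params: the embedding matrix used for the lookup

sp_ids: a sparse tensor recording which rows to extract from params

A sparse tensor is a way of representing a sparse matrix. It has three components:

indices: a 2-D tensor recording the positions of the elements that hold a value. In this article a position without a value is written as nan.

If the sparse tensor is 1-D, indices looks like [ [0], [2], [6] ], meaning that elements 0, 2 and 6 are not nan

If the sparse tensor is 2-D, indices looks like [ [0,1], [0,4], [2,3] ], meaning that the elements at positions [0,1], [0,4] and [2,3] are not nan

If the sparse tensor is 3-D, indices looks like [ [0,1,1], [0,4,2], [2,3,2] ], meaning that the elements at positions [0,1,1], [0,4,2] and [2,3,2] are not nan

Note that indices must be in ascending (row-major) order, otherwise an error is raised. tf.sparse.reorder can sort them if needed.

values: the values of the non-nan elements, as a 1-D vector. Its length must equal the first dimension of indices. This is easy to understand: however many positions you declare as non-nan, you have to supply a value for every one of them.

dense_shape: the overall shape of the sparse tensor

sp_weights: the weight of each looked-up embedding vector. Its shape must match sp_ids. If None, all weights are 1.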
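
The three components are easier to picture when expanded into a dense matrix. The sketch below does that in NumPy, writing nan for every position that has no stored value, the same notation used in the examples that follow. The helper name to_dense_view is hypothetical and exists only for this illustration; it is not a TensorFlow function.

```python
import numpy as np

def to_dense_view(indices, values, dense_shape):
    """Expand (indices, values, dense_shape) into a dense array,
    with nan marking every position that holds no value."""
    dense = np.full(dense_shape, np.nan)
    for position, value in zip(indices, values):
        dense[tuple(position)] = value
    return dense

# Three stored ids in a 4 x 4 sparse matrix.
view = to_dense_view(indices=[[0, 1], [0, 2], [1, 2]],
                     values=[0, 2, 1],
                     dense_shape=[4, 4])
print(view)
```

Only three of the sixteen positions carry an id; everything else stays nan, which is exactly what the ids matrices drawn in the examples show.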

The following examples illustrate how the computation works:

Example 1:

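import tensorflow as tf

# Embedding table: 3 rows (ids 0..2), 10 dimensions each.
params = tf.constant([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                      [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                      [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])
# A single id (0) stored at position [0, 1].
ids = tf.SparseTensor(indices=[[0,1]],
                      values=[0],
                      dense_shape=[4,4])

# sp_weights=None means every weight is 1.
tf.nn.embedding_lookup_sparse(params, ids, None)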

result:
tf.Tensor([[0.1 0.4 0.5 7.  6.4 1.2 0.5 0.3 3.3 2. ]], shape=(1, 10), dtype=float32)

# The ids matrix below is shown only to aid understanding; it is not printed when the code runs
ids:
[ [nan, 0,   nan, nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan]
]

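ids contains a single non-nan value, 0, so params[0] (the first row) is extracted. Note that ids is not multiplied with params; the non-nan values in ids are used as row numbers to look up in params.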

Example 2:

params = tf.constant([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                      [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                      [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])
ids = tf.SparseTensor(indices=[[0,1],[0,2]],
                      values=[0,2],
                      dense_shape=[4,4])
tf.nn.embedding_lookup_sparse(params, ids, None)

result:
tf.Tensor([[0.25 0.65 0.8 5.65 4.9 0.70000005 0.4  0.25  1.9 1.05 ]], shape=(1, 10), dtype=float32)

ids:
[ [nan, 0,   2,   nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan]
]

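ids has two non-nan values, 0 and 2, both in the first row of ids. So params[0] and params[2] are extracted:

[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],

[0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]

Both non-nan values sit in the same row of ids. Each row of ids is one sample, and the results belonging to the same sample have to be combined. How they are combined is decided by combiner. It is not specified here, so the default, mean, is used: the two vectors are averaged element-wise,

giving [[0.25 0.65 0.8 5.65 4.9 0.70000005 0.4 0.25 1.9 1.05 ]].

combiner supports other reductions as well, but they are all very simple and can be understood directly from the documentation.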
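
For readers who prefer code to prose, here is a NumPy sketch of the three combiner formulas as the documentation quoted above words them. It follows my reading of that text and is not TensorFlow's implementation; the function name combine and its arguments are made up for this illustration.

```python
import numpy as np

def combine(vectors, weights, combiner="mean"):
    """Reduce the embedding vectors of one sample into a single vector."""
    vectors = np.asarray(vectors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weighted_sum = (weights[:, None] * vectors).sum(axis=0)
    if combiner == "sum":
        return weighted_sum
    if combiner == "mean":
        # weighted sum divided by the total weight
        return weighted_sum / weights.sum()
    if combiner == "sqrtn":
        # weighted sum divided by sqrt(sum of squared weights)
        return weighted_sum / np.sqrt((weights ** 2).sum())
    raise ValueError(f"unknown combiner: {combiner}")

# params[0] and params[2] from Example 2, all weights equal to 1.
rows = [[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
        [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]]
for mode in ("sum", "mean", "sqrtn"):
    print(mode, combine(rows, [1.0, 1.0], mode))
```

With unit weights, mean reproduces the result of Example 2, sum is twice that, and sqrtn is the sum divided by the square root of 2.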

Example 3:

params = tf.constant([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                      [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                      [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])
ids = tf.SparseTensor(indices=[[0,1],[0,2],[1,2]],
                      values=[0,2,1],
                      dense_shape=[4,4])

tf.nn.embedding_lookup_sparse(params, ids, None)

result:
tf.Tensor(
[[0.25   0.65   0.8    5.65   4.9    0.70000005  0.4    0.25   1.9    1.05 ]
 [0.3    0.4    0.9    0.8    0.5    0.3         0.7    0.5    0.8    3.2  ]], shape=(2, 10), dtype=float32)

ids:
[ [nan, 0,   2,   nan],
  [nan, nan, 1,   nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan]
]

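ids has three non-nan values: 0 and 2 in the first row, and 1 in the second row. The 0 and 2 in the first row are handled exactly as in Example 2. The second row holds a single 1, so params[1] (the second row) is extracted.

Because the non-nan values are spread over two rows, that is, two samples, the lookup result also has two rows: the lookup is in essence still an embedding, and two samples should yield two embedding results. This also shows why combining is necessary when a row of ids holds two values. Without combining, the first sample, looked up with 0 and 2, would yield

[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],

[0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]

while the second sample, looked up with 1, would yield

[0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],

so two samples of the same feature would end up with embeddings of different dimensions, and the rest of the network could not be built.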
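
The shape problem can be shown directly. In this NumPy sketch (my own illustration, with ids grouped by sample by hand), the uncombined lookups of Example 3 have different shapes per sample, while averaging within each sample gives one fixed-size vector per sample.

```python
import numpy as np

params = np.array([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                   [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                   [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])

# Example 3: ids 0 and 2 belong to sample 0, id 1 belongs to sample 1.
ids_per_sample = [[0, 2], [1]]

# Without combining, each sample gets as many vectors as it has ids.
uncombined = [params[ids] for ids in ids_per_sample]
print([block.shape for block in uncombined])  # [(2, 10), (1, 10)]

# Combining (here: mean) leaves exactly one vector per sample.
combined = np.stack([block.mean(axis=0) for block in uncombined])
print(combined.shape)  # (2, 10)
```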

Example 4:

params = tf.constant([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                      [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                      [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])
ids = tf.SparseTensor(indices=[[1,2],[2,0]],
                      values=[0,2],
                      dense_shape=[4,4])
tf.nn.embedding_lookup_sparse(params, ids, None)

result:
tf.Tensor(
[[0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]
 [0.1 0.4 0.5 7.  6.4 1.2 0.5 0.3 3.3 2. ]
 [0.4 0.9 1.1 4.3 3.4 0.2 0.3 0.2 0.5 0.1]], shape=(3, 10), dtype=float32)

ids:
[ [nan, nan, nan, nan],
  [nan, nan, 0,   nan],
  [2,   nan, nan, nan],
  [nan, nan, nan, nan]
]

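This example ought to be very simple:

The second row of ids holds the value 0 and the third row holds the value 2, so we just look them up in params. Yet the result has three rows, and the first one is all zeros. This is because tf.nn.embedding_lookup_sparse works out the effective number of rows of ids by itself: if a row has no non-nan value and none of the rows after it has one either (that is, the last n consecutive rows are all empty), those rows are treated as invalid and dropped. Put differently, the number of output rows is the largest row index that holds a value, plus one, and dense_shape does not decide it. So the 4 × 4 matrix below has one effective row and is processed as 1 × 4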

ids:
[ [nan, 0,   2,   nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan],
  [nan, nan, nan, nan]
]

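In the sparse tensor below, by contrast, the first row has no valid value but later rows do, so it is not dropped. tf.nn.embedding_lookup_sparse therefore treats this matrix as 3 × 4. Since the first row has no valid value, its embedding result is filled entirely with zeros.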

ids:
[ [nan, nan, nan, nan],
  [nan, nan, 0,   nan],
  [2,   nan, nan, nan],
  [nan, nan, nan, nan]
]

That is how we got the all-zero first row above.

So the following example is not hard to understand either:

params = tf.constant([[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
                      [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
                      [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]])
ids = tf.SparseTensor(indices=[[0,2],[2,0]],
                      values=[0,2],
                      dense_shape=[4,4])
tf.nn.embedding_lookup_sparse(params, ids, None)

result:
tf.Tensor(
[[0.1 0.4 0.5 7.  6.4 1.2 0.5 0.3 3.3 2. ]
 [0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]
 [0.4 0.9 1.1 4.3 3.4 0.2 0.3 0.2 0.5 0.1]], shape=(3, 10), dtype=float32)

ids:
[ [nan, nan, 0,   nan],
  [nan, nan, nan, nan],
  [2,   nan, nan, nan],
  [nan, nan, nan, nan]
]
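
To tie the examples together, here is a NumPy sketch that mimics the behaviour observed above for the default mean combiner with no weights. It is a reference model of what the examples show, not TensorFlow's actual implementation, and the name lookup_sparse_mean is hypothetical. The assumptions are the ones derived in this article: the output has (largest row index present) + 1 rows, and a row without ids becomes all zeros.

```python
import numpy as np

def lookup_sparse_mean(params, indices, values):
    """Mean-combine the looked-up rows of params, one output row per sample."""
    params = np.asarray(params, dtype=float)
    sample_rows = [position[0] for position in indices]
    num_rows = int(max(sample_rows)) + 1  # trailing empty rows are dropped
    out = np.zeros((num_rows, params.shape[1]))
    for row in range(num_rows):
        ids = [v for r, v in zip(sample_rows, values) if r == row]
        if ids:  # rows without any id stay all-zero
            out[row] = params[ids].mean(axis=0)
    return out

params = [[0.1, 0.4, 0.5, 7.0, 6.4, 1.2, 0.5, 0.3, 3.3, 2.0],
          [0.3, 0.4, 0.9, 0.8, 0.5, 0.3, 0.7, 0.5, 0.8, 3.2],
          [0.4, 0.9, 1.1, 4.3, 3.4, 0.2, 0.3, 0.2, 0.5, 0.1]]

example_1 = lookup_sparse_mean(params, [[0, 1]], [0])
example_4 = lookup_sparse_mean(params, [[1, 2], [2, 0]], [0, 2])
last_example = lookup_sparse_mean(params, [[0, 2], [2, 0]], [0, 2])
print(example_1.shape, example_4.shape, last_example.shape)
```

Note that dense_shape is not even an argument here: under the assumptions above, only the row indices that actually occur determine the shape of the result.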
