FiBiNet: Network Overview and a Brief Source Code Walkthrough

珍妮的选择

Category: Machine Learning. Tags: tensorflow, deepctr, fibinet, ctr, paper

Article link: https://blog.csdn.net/Eric_1993/article/details/108890910

Preface (off-topic, feel free to skip)

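2020-09-30: I know it is a bit unfair to publish a post before it is finished, but I have been really busy lately, and I had set myself the Flag of writing one more blog post in September~ Yet I only wrote one this month 😭😭😭 Tonight is September 30, and the moonlight is beautiful … (a guess, since tomorrow the Mid-Autumn Festival and National Day fall on the same day; I forgot to look up at the night sky on my walk home, how sad~). I got off work fairly early, but with so much on my mind I never start writing until the last minute. So for now here is just a bit of preface; I will definitely put my full effort into completing the Flag later! I have read a lot of papers recently, and the holiday is a good chance to summarize them.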

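2020-10-10: Back with the update… Sure enough, you really should not set Flags carelessly. I have seen myself clearly: over the holiday I still just wanted to play 🤣🤣🤣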

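A quick plug

You can search for “珍妮的算法之路” or “world4458” in WeChat to follow my official account. You can also check out my Zhihu column PoorMemory-机器学习, where future posts will be published as well.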

FiBiNet

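Paper information

Title: FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
Paper: https://arxiv.org/abs/1905.09433
Code: I could not find code released by the authors, but DeepCTR provides an implementation at https://github.com/shenweichen/DeepCTR/blob/master/deepctr/models/fibinet.py
Published: RecSys, 2019
Authors: Tongwen Huang, Zhiqi Zhang, Junlin Zhang
Affiliation: Sina Weibo

As an aside, the third author, Junlin Zhang, is probably @张俊林 on Zhihu. His article “推荐系统技术演进趋势：从召回到排序再到重排” (technology trends in recommender systems: from recall to ranking to re-ranking) was a real inspiration when I read it; there is a lot to learn from him.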

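Core ideas

The FiBiNet model proposed in the paper has two core modules:

SENET (Squeeze-Excitation network)
Bilinear Feature Interaction

SENET is borrowed from computer vision. It learns feature importance dynamically: more important features get larger weights, and less important features get smaller ones.
For feature interaction, classic methods mainly use the inner product or the Hadamard product to build cross features. The authors argue that these operations are too simple to model cross features effectively, so they propose Bilinear Feature Interaction. It combines the inner product and the Hadamard product by inserting a weight matrix between the two features being crossed, so that the way features combine is itself learned.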

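Core ideas in detail

The FiBiNet architecture is shown in the figure below:

[Figure: FiBiNet network architecture]
The upper part of the network is the Deep Part, which is mainly an MLP and needs little explanation. The lower part, the Shallow Part, is the core of FiBiNet and is responsible for processing the input features. Starting from the lower left of the figure, the high-dimensional sparse input features pass through the Embedding Layer and are mapped to low-dimensional dense vectors (embeddings). The embeddings also go through the SENET layer, which learns feature importance dynamically and produces the SENET-Like embeddings. The embeddings and the SENET-Like embeddings are then fed separately into the Bilinear-Interaction Layer for feature crossing. The resulting cross features are concatenated and fed into the MLP to predict CTR.
The core modules are the SENET Layer and the Bilinear-Interaction Layer, which are described in turn below.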

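First, some notation. The high-dimensional sparse features are mapped by the Embedding Layer to low-dimensional dense vectors, written as $E=\left[e_{1}, e_{2}, \cdots, e_{i}, \cdots, e_{f}\right]$ , where $f$ is the number of fields, $e_i\in\mathbb{R}^k$ is the embedding of the $i$-th field, and $k$ is the embedding size.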

SENET

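The structure of SENET is shown in the figure below:

Its full name is Squeeze-and-Excitation Network (SENET). It performs very well on computer vision tasks, and FiBiNet applies it to CTR prediction to learn feature importance dynamically: more important features get larger weights, and less important features get smaller ones.
In short, SENET takes $E=\left[e_{1}, e_{2}, \cdots, e_{i}, \cdots, e_{f}\right]$ as input and produces one weight per field embedding, $A=\left[a_{1}, a_{2}, \cdots, a_{i}, \cdots, a_{f}\right]$ , where each weight $a_i\in\mathbb{R}$ is a scalar. Multiplying the weights with the input embeddings gives the SENET-Like embeddings $V=\left[v_{1}, v_{2}, \cdots, v_{i}, \cdots, v_{f}\right]$ , where $v_i = a_i\cdot e_i\in\mathbb{R}^k$ .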

SENET consists of three steps, Squeeze, Excitation, and Re-weight:

Squeeze: compute summary statistics for each field embedding. Concretely, a pooling operation (sum or mean; the formula below is mean pooling) transforms the input $E=\left[e_{1}, e_{2}, \cdots, e_{i}, \cdots, e_{f}\right]$ into $Z=\left[z_{1}, z_{2}, \cdots, z_{i}, \cdots, z_{f}\right]$ , where the scalar $z_i$ carries the global information of embedding $e_i$ :

$z_{i}=F_{s q}\left(e_{i}\right)=\frac{1}{k} \sum_{t=1}^{k} e_{i}^{(t)}$

Excitation: use the statistics $Z$ from the Squeeze step to learn the weight $a_i$ of each embedding. The authors use two fully connected layers:
$A=F_{e x}(Z)=\sigma_{2}\left(W_{2} \sigma_{1}\left(W_{1} Z\right)\right)$
where $A\in\mathbb{R}^f$ , $\sigma_1$ and $\sigma_2$ are activation functions, $W_{1} \in \mathbb{R}^{f \times \frac{f}{r}}$ and $W_{2} \in \mathbb{R}^{\frac{f}{r} \times f}$ , and $r$ is the reduction ratio.
Re-weight: multiply the weights learned in the second step with the input embeddings field by field (field-wise multiplication) to obtain the final output:

$V=F_{R e W e i g h t}(A, E)=\left[ a_{1} \cdot e_{1}, \cdots, a_{f} \cdot e_{f}\right]=\left[ v_{1}, \cdots, v_{f}\right]$

As the description above shows, SENET essentially uses two fully connected layers to learn the feature weights dynamically.
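The three steps can be traced by hand on a tiny example. The following is a minimal pure-Python sketch, not the paper's or DeepCTR's code: the helper name `senet`, the toy values, the use of ReLU for both $\sigma_1$ and $\sigma_2$, the absence of bias terms, and the row-vector convention $Z W_1$ are all assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


def senet(E, W1, W2):
    """Toy SENET. E: f embeddings of size k; W1: f x (f/r); W2: (f/r) x f.

    Hypothetical helper for illustration only (ReLU activations, no bias).
    """
    f = len(E)
    hidden_size = len(W1[0])
    # Squeeze: mean-pool each field embedding into one scalar z_i.
    Z = [sum(e) / len(e) for e in E]
    # Excitation: two fully connected layers map Z (length f) back to length f.
    hidden = [relu(sum(Z[i] * W1[i][j] for i in range(f))) for j in range(hidden_size)]
    A = [relu(sum(hidden[j] * W2[j][i] for j in range(hidden_size))) for i in range(f)]
    # Re-weight: scale every embedding e_i by its scalar weight a_i.
    V = [[A[i] * x for x in E[i]] for i in range(f)]
    return Z, A, V


# f = 3 fields, k = 2, reduction ratio r = 3, so the hidden layer has size 1.
E = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
W1 = [[0.5], [0.25], [0.25]]
W2 = [[0.5, 1.0, -1.0]]
Z, A, V = senet(E, W1, W2)
print(Z)  # [2.0, 2.0, 2.0]
print(A)  # [1.0, 2.0, 0.0]
print(V)  # [[1.0, 3.0], [4.0, 4.0], [0.0, 0.0]]
```

In this toy run the third field receives weight 0 and is suppressed entirely, while the second field is amplified, which is the kind of field-level re-scaling SENET is meant to learn.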

Bilinear-Interaction Layer

The Bilinear-Interaction Layer computes second-order feature interactions. The computation is illustrated in the figure below:

The figure is quite intuitive. Panel ( $c$ ) shows the Bilinear-Interaction Layer: a weight matrix $W$ is inserted between the two vectors being crossed.

Bilinear-Interaction can compute the cross feature $p_{ij}$ in three ways:

Field-All Type: all cross features share a single weight matrix $W$ :
$p_{i j}=v_{i} \cdot W \odot v_{j}$
where $W\in\mathbb{R}^{k\times k}$
Field-Each Type: each field maintains its own weight matrix $W_i$ :
$p_{i j}=v_{i} \cdot W_i \odot v_{j}$
where $W_i\in\mathbb{R}^{k\times k}$ , and $W=\left[W_{1}, W_{2}, \cdots, W_{i}, \cdots, W_{f}\right]\in\mathbb{R}^{f\times k\times k}$
Field-Interaction Type: each pair of features $(v_i, v_j)$ maintains its own weight matrix $W_{ij}$ :
$p_{i j}=v_{i} \cdot W_{ij}\odot v_{j}$
where $W_{ij}\in\mathbb{R}^{k\times k}$ . Since the number of feature pairs is $n=\frac{f\times (f - 1)}{2}$ , there are also $n$ weight matrices.
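To make the three variants concrete, here is a small pure-Python sketch of $p_{ij}=v_i \cdot W \odot v_j$. The helper names (`vec_mat`, `hadamard`, `bilinear`) and the toy matrices are hypothetical, chosen only so the results can be checked by hand; the weight for the Field-Each type is indexed by the left-hand field $i$, as in the formula above.

```python
from itertools import combinations


def vec_mat(v, W):
    """Row vector v (length k) times matrix W (k x k)."""
    return [sum(v[a] * W[a][b] for a in range(len(v))) for b in range(len(W[0]))]


def hadamard(u, v):
    """Element-wise product of two vectors of equal length."""
    return [x * y for x, y in zip(u, v)]


def bilinear(V, weights, bilinear_type):
    """Return the cross features p_ij for all pairs i < j (illustrative helper)."""
    out = []
    for idx, (i, j) in enumerate(combinations(range(len(V)), 2)):
        if bilinear_type == "all":
            W = weights          # one matrix shared by every pair
        elif bilinear_type == "each":
            W = weights[i]       # one matrix per left-hand field
        elif bilinear_type == "interaction":
            W = weights[idx]     # one matrix per pair (i, j)
        else:
            raise ValueError("unknown bilinear_type: %s" % bilinear_type)
        out.append(hadamard(vec_mat(V[i], W), V[j]))
    return out


V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # f = 3 fields, k = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]            # swaps the two coordinates

p_identity = bilinear(V, identity, "all")
p_all = bilinear(V, swap, "all")
p_each = bilinear(V, [swap, identity], "each")
p_inter = bilinear(V, [identity, swap, identity], "interaction")

print(p_identity)  # [[3.0, 8.0], [5.0, 12.0], [15.0, 24.0]]
print(p_all)       # [[6.0, 4.0], [10.0, 6.0], [20.0, 18.0]]
print(p_each)      # [[6.0, 4.0], [10.0, 6.0], [15.0, 24.0]]
print(p_inter)     # [[3.0, 8.0], [10.0, 6.0], [15.0, 24.0]]
```

With $W$ set to the identity, the bilinear form reduces to the plain Hadamard product, which shows how the learned matrix generalizes the classic interaction.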

Combination Layer and MLP

The Bilinear-Interaction Layer processes the original embeddings $E$ and the SENET-Like embeddings $V$ separately, producing the cross features $\left[ p_{1}, \cdots, p_{i}, \cdots, p_{n}\right]$ and $\left[ q_{1}, \cdots, q_{i}, \cdots, q_{n}\right]$ respectively, where $p_i, q_i\in\mathbb{R}^k$ are vectors.

The Combination Layer concatenates $p$ and $q$ :

$c=F_{\text {concat}}(p, q)=\left[p_{1}, \cdots, p_{n}, q_{1}, \cdots, q_{n}\right]=\left[c_{1}, \cdots, c_{2 n}\right]$

$c$ is then fed into the MLP to obtain the CTR estimate.
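As a quick size check, $c$ holds $2n$ vectors of length $k$, so with $n=\frac{f(f-1)}{2}$ the MLP sees $f(f-1)k$ numbers. The sketch below assumes the vectors are simply flattened into one list before the MLP; `combine` is a hypothetical helper and the values are toy data.

```python
def combine(p, q):
    """Concatenate two lists of k-dim cross vectors and flatten them (illustrative)."""
    return [x for vec in p + q for x in vec]


f, k = 3, 2
# n = f * (f - 1) / 2 = 3 cross vectors from each branch.
p = [[6.0, 4.0], [10.0, 6.0], [20.0, 18.0]]
q = [[3.0, 8.0], [5.0, 12.0], [15.0, 24.0]]
c = combine(p, q)
print(len(c))  # 12, i.e. f * (f - 1) * k
```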

Source code walkthrough

I could not find the original authors' code, but DeepCTR implements FiBiNet, so the walkthrough below analyzes the DeepCTR implementation. The code is at: https://github.com/shenweichen/DeepCTR/blob/master/deepctr/models/fibinet.py

SENET

This layer is defined as SENETLayer in https://github.com/shenweichen/DeepCTR/blob/master/deepctr/layers/interaction.py :

SENETLayer expects inputs = [e1, e2, ..., ef], where each ei has shape [batch_size, 1, embedding_size]. This comes from how DeepCTR handles features and is not covered in detail here. Note that the first operation in its call function is inputs = concat_func(inputs, axis=1), which turns inputs into a tensor of shape [batch_size, field_size, embedding_size], matching our intuition.

class SENETLayer(Layer):
    """SENETLayer used in FiBiNET.
      Input shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Output shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Arguments
        - **reduction_ratio** : Positive integer, dimensionality of the
         attention network output space.
        - **seed** : A Python integer to use as random seed.
      References
        - [FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.09433.pdf)
    """

    def __init__(self, reduction_ratio=3, seed=1024, **kwargs):
        self.reduction_ratio = reduction_ratio

        self.seed = seed
        super(SENETLayer, self).__init__(**kwargs)

    def build(self, input_shape):

        if not isinstance(input_shape, list) or len(input_shape) < 2:
            raise ValueError('A `SENETLayer` layer should be called '
                             'on a list of at least 2 inputs')

        self.filed_size = len(input_shape)  ## F, the number of fields
        self.embedding_size = input_shape[0][-1]  ## K, the embedding size
        reduction_size = max(1, self.filed_size // self.reduction_ratio)  ## F/r, where r is the reduction ratio

        ## W1, shape (F, F/r)
        self.W_1 = self.add_weight(shape=(
            self.filed_size, reduction_size), 
            initializer=glorot_normal(seed=self.seed), name="W_1")
        ## W2, shape (F/r, F)
        self.W_2 = self.add_weight(shape=(
            reduction_size, self.filed_size), 
            initializer=glorot_normal(seed=self.seed), name="W_2")
		
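        ## tf.tensordot with axes=(-1, 0) contracts the last axis of x[0] with the first
        ## axis of x[1], i.e. a matrix product here (not an element-wise multiplication).
        ## See also the author's post: https://blog.csdn.net/Eric_1993/article/details/105670381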
        self.tensordot = tf.keras.layers.Lambda(
            lambda x: tf.tensordot(x[0], x[1], axes=(-1, 0)))

        # Be sure to call this somewhere!
        super(SENETLayer, self).build(input_shape)

    def call(self, inputs, training=None, **kwargs):
        ## inputs = [e1, e2, ..., ef]
        ## where each ei has shape [B, 1, K]
        if K.ndim(inputs[0]) != 3:
            raise ValueError(
                "Unexpected inputs dimensions %d, expect to be 3 dimensions" % (K.ndim(inputs[0])))
		
        ## after concat_func, inputs has shape [B, F, K]
        inputs = concat_func(inputs, axis=1)
        Z = reduce_mean(inputs, axis=-1, ) ## [B, F]

        A_1 = tf.nn.relu(self.tensordot([Z, self.W_1])) ## [B, F/r]
        A_2 = tf.nn.relu(self.tensordot([A_1, self.W_2])) ## [B, F]
        V = tf.multiply(inputs, tf.expand_dims(A_2, axis=2)) ## [B, F, K]
		
        ## This step follows from how DeepCTR handles features: V is split into
        ## [v1, v2, ..., vf], where each vi has shape [B, 1, K]
        return tf.split(V, self.filed_size, axis=1)
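
The shape flow of `call` can be traced outside TensorFlow. The following NumPy sketch mirrors the steps of the excerpt above (concat, mean, two `tensordot` + ReLU layers, re-weight, split). It assumes that `np.tensordot` follows the same contraction rule as `tf.tensordot`; `senet_call_numpy` and the random toy inputs are hypothetical and not part of DeepCTR.

```python
import numpy as np


def senet_call_numpy(inputs, W_1, W_2):
    """NumPy trace of the SENET steps; inputs is a list of F arrays of shape [B, 1, K]."""
    x = np.concatenate(inputs, axis=1)                            # [B, F, K]
    Z = x.mean(axis=-1)                                           # [B, F]
    A_1 = np.maximum(np.tensordot(Z, W_1, axes=(-1, 0)), 0.0)     # [B, F/r]
    A_2 = np.maximum(np.tensordot(A_1, W_2, axes=(-1, 0)), 0.0)   # [B, F]
    V = x * A_2[:, :, None]                                       # [B, F, K]
    return np.split(V, x.shape[1], axis=1)                        # F arrays of [B, 1, K]


B, F, K, r = 2, 6, 4, 3
rng = np.random.default_rng(0)
inputs = [rng.normal(size=(B, 1, K)) for _ in range(F)]
W_1 = rng.normal(size=(F, max(1, F // r)))
W_2 = rng.normal(size=(max(1, F // r), F))
outputs = senet_call_numpy(inputs, W_1, W_2)

# tensordot with axes=(-1, 0) on a 2-D and a 2-D array is an ordinary matrix product.
Z = np.concatenate(inputs, axis=1).mean(axis=-1)
tensordot_is_matmul = np.allclose(np.tensordot(Z, W_1, axes=(-1, 0)), Z @ W_1)
print(len(outputs), outputs[0].shape, tensordot_is_matmul)
```

The output is again a list of F tensors of shape [B, 1, K], so the SENET-Like embeddings can be passed to the next layer in the same format as the original embeddings.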

Bilinear-Interaction Layer

This layer is defined as BilinearInteraction in https://github.com/shenweichen/DeepCTR/blob/master/deepctr/layers/interaction.py :

BilinearInteraction takes inputs = [e1, e2, ..., ef], where each ei has shape [batch_size, 1, embedding_size]. This comes from how DeepCTR handles features and is not covered in detail here.

class BilinearInteraction(Layer):
    """BilinearInteraction Layer used in FiBiNET.
      Input shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Output shape
        - 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Arguments
        - **str** : String, types of bilinear functions used in this layer.
        - **seed** : A Python integer to use as random seed.
      References
        - [FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.09433.pdf)
    """

    def __init__(self, bilinear_type="interaction", seed=1024, **kwargs):
        self.bilinear_type = bilinear_type
        self.seed = seed

        super(BilinearInteraction, self).__init__(**kwargs)

    def build(self, input_shape):

        if not isinstance(input_shape, list) or len(input_shape) < 2:
            raise ValueError('A `BilinearInteraction` layer should be called '
                             'on a list of at least 2 inputs')
        embedding_size = int(input_shape[0][-1]) ## K

        if self.bilinear_type == "all":
            ## Field-All Type: a single shared W of shape K * K
            self.W = self.add_weight(shape=(embedding_size, embedding_size),
                                     initializer=glorot_normal(seed=self.seed), name="bilinear_weight")
        elif self.bilinear_type == "each":
            ## Field-Each Type: one K * K matrix per left-hand field; only F - 1 are
            ## created here because the last field never appears on the left of a pair
            self.W_list = [self.add_weight(shape=(embedding_size, embedding_size),
                                           initializer=glorot_normal(seed=self.seed),
                                           name="bilinear_weight" + str(i))
                           for i in range(len(input_shape) - 1)]
        elif self.bilinear_type == "interaction":
            ## Field-Interaction Type: F*(F - 1)/2 matrices of shape K * K, one per field pair
            self.W_list = [self.add_weight(shape=(embedding_size, embedding_size),
                                           initializer=glorot_normal(seed=self.seed),
                                           name="bilinear_weight" + str(i) + '_' + str(j))
                           for i, j in itertools.combinations(range(len(input_shape)), 2)]
        else:
            raise NotImplementedError

        super(BilinearInteraction, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, inputs, **kwargs):
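        ## inputs = [e1, e2, ..., ef],
        ## where each ei has shape [B, 1, K]
        if K.ndim(inputs[0]) != 3:
            raise ValueError(
                "Unexpected inputs dimensions %d, expect to be 3 dimensions" % (K.ndim(inputs[0])))

        n = len(inputs)  ## n here is the number of fields, i.e. F in the notation above
        ## The computation below follows the Bilinear-Interaction Layer figure:
        ## project the left vector with W, then take the Hadamard product with the right vector.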
        if self.bilinear_type == "all":
            vidots = [tf.tensordot(inputs[i], self.W, axes=(-1, 0)) for i in range(n)]
            p = [tf.multiply(vidots[i], inputs[j]) for i, j in itertools.combinations(range(n), 2)]
        elif self.bilinear_type == "each":
            vidots = [tf.tensordot(inputs[i], self.W_list[i], axes=(-1, 0)) for i in range(n - 1)]
            p = [tf.multiply(vidots[i], inputs[j]) for i, j in itertools.combinations(range(n), 2)]
        elif self.bilinear_type == "interaction":
            p = [tf.multiply(tf.tensordot(v[0], w, axes=(-1, 0)), v[1])
                 for v, w in zip(itertools.combinations(inputs, 2), self.W_list)]
        else:
            raise NotImplementedError
        return concat_func(p)
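
The three `bilinear_type` options differ mainly in how many K * K matrices `build` allocates. Below is a small sketch of that count, derived from the excerpt above; `bilinear_param_count` is a hypothetical helper and not part of DeepCTR. Note that for "each" the excerpt creates F - 1 matrices, one fewer than the F matrices in the paper's notation.

```python
def bilinear_param_count(num_fields, embedding_size, bilinear_type):
    """Number of trainable weights allocated for each bilinear_type (illustrative)."""
    per_matrix = embedding_size * embedding_size
    if bilinear_type == "all":
        num_matrices = 1                                      # one shared matrix
    elif bilinear_type == "each":
        num_matrices = num_fields - 1                         # one per left-hand field
    elif bilinear_type == "interaction":
        num_matrices = num_fields * (num_fields - 1) // 2     # one per field pair
    else:
        raise ValueError("unknown bilinear_type: %s" % bilinear_type)
    return num_matrices * per_matrix


counts = {t: bilinear_param_count(10, 8, t) for t in ("all", "each", "interaction")}
print(counts)  # {'all': 64, 'each': 576, 'interaction': 2880}
```

For F = 10 fields and K = 8, the pair-wise variant needs 45 times as many weights as the shared one, so the choice of type trades capacity against parameter count.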

Summary

Over the holiday I read two comics, 《迷域行者》 and 《一人之下》, and I was very happy~ 🤣🤣🤣
