Recommender Systems (10): The DeepFM Model (A Factorization-Machine based Neural Network)


Recommender systems series posts:

  1. Recommender Systems (1): An Overall Overview of Recommender Systems
  2. Recommender Systems (2): The GBDT+LR Model
  3. Recommender Systems (3): Factorization Machines (FM)
  4. Recommender Systems (4): Field-aware Factorization Machines (FFM)
  5. Recommender Systems (5): Wide & Deep
  6. Recommender Systems (6): Deep & Cross Network (DCN)
  7. Recommender Systems (7): The xDeepFM Model
  8. Recommender Systems (8): The FNN Model (FM + MLP = FNN)
  9. Recommender Systems (9): The PNN Model (Product-based Neural Networks)

DeepFM is a joint work by Harbin Institute of Technology and Huawei, published at IJCAI 2017. Like Wide & Deep, which inspired it, DeepFM is a two-part (hybrid) architecture; the difference is that the wide part uses an FM model instead of LR. I strongly recommend reading my earlier post first: Recommender Systems (5): Wide & Deep. Assuming you already know Wide & Deep well, let's look at DeepFM's improvements and advantages over it:

  1. The wide part uses FM instead of the LR in Wide & Deep. Since FM automatically learns second-order feature crosses (second order in practice, for time-complexity reasons), no manual feature engineering is needed. In Wide & Deep, the LR part still requires hand-crafted crosses, e.g. crossing [apps the user has installed] with [apps shown to the user]. Moreover, manual crossing brings back the problem discussed in the FM post: two features must co-occur in the training data, otherwise their cross weight cannot be learned.
  2. In DeepFM, the FM part and the DNN part share the same underlying embedding vectors and are trained jointly. This matches the common practice in today's recommendation/advertising systems, where multi-task models with several towers share a base embedding layer, and end-to-end training also yields more accurate embedding vectors.

If you are already familiar with Wide & Deep, then after the introduction above you basically know DeepFM's overall network structure. The rest of this post covers DeepFM from three aspects:

  1. Details of the DeepFM model structure
  2. DeepFM code implementation
  3. Summary

1. Details of the DeepFM model structure

Let's look at the DeepFM model structure (the figure is taken from Wang Zhe's book 《深度学习推荐系统》; the figure in the original paper is not clear, so I did not take it directly from the paper).
[Figure: DeepFM model architecture]
The overall structure is fairly simple. From bottom to top, the layers are:

  1. Raw input layer: one-hot encoded sparse inputs
  2. Embedding layer: the base shared by FM and DNN
  3. FM and DNN
  4. Output layer
1.1 FM

Let's focus on the FM part. First, recall the FM formula:
\hat{y}(x) = w_0 + \sum_{i=1}^n w_i x_i + \sum_{i=1}^n\sum_{j=i+1}^n <v_i, v_j> x_i x_j \tag{1}
The formula above consists of two parts:

1.1.1 First-order part

w_0 + \sum_{i=1}^n w_i x_i \tag{2}

This is simply an LR; nothing more to say.

1.1.2 Second-order part

\sum_{i=1}^n\sum_{j=i+1}^n <v_i, v_j> x_i x_j \tag{3}

As explained in the post Recommender Systems (3): Factorization Machines (FM), computing formula (3) directly (pairwise inner products) has time complexity O(n^2). The FM paper gives the following derivation, which reduces the complexity to O(kn):
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^n <v_i, v_j> x_i x_j
&= \frac{1}{2}\left[\sum_{i=1}^{n}\sum_{j=1}^n <v_i, v_j> x_i x_j - \sum_{i=1}^{n} <v_i, v_i> x_i x_i\right] \\
&= \frac{1}{2}\left(\sum_{i=1}^n\sum_{j=1}^n\sum_{f=1}^k v_{i,f} \cdot v_{j,f}\, x_i x_j - \sum_{i=1}^n\sum_{f=1}^k v_{i,f} \cdot v_{i,f}\, x_i x_i\right) \\
&= \frac{1}{2}\sum_{f=1}^k\left(\left(\sum_{i=1}^n v_{i,f} x_i\right)\left(\sum_{j=1}^n v_{j,f} x_j\right) - \sum_{i=1}^n v_{i,f}^2 x_i^2\right) \\
&= \frac{1}{2}\sum_{f=1}^k\left(\left(\sum_{i=1}^n v_{i,f} x_i\right)^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2\right)
\end{aligned} \tag{4}
So in practice, implementations almost always compute formula (4) rather than formula (3); a minimal sketch follows below.
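
As a quick illustration of the sum-square trick in formula (4), here is a minimal NumPy sketch (this is not the Paddle code discussed later; x is assumed to be a dense feature vector of length n and V the n-by-k latent-factor matrix, with made-up random inputs for the check):

import numpy as np

def fm_second_order(x, V):
    # Second-order FM term via the sum-square trick in formula (4).
    # x: feature vector, shape (n,); V: latent factors, shape (n, k).
    # Cost is O(n*k) instead of the O(k*n^2) pairwise version.
    xv = x[:, None] * V                       # shape (n, k): v_{i,f} * x_i
    sum_square = np.square(xv.sum(axis=0))    # (sum_i v_{i,f} x_i)^2, shape (k,)
    square_sum = np.square(xv).sum(axis=0)    # sum_i v_{i,f}^2 x_i^2, shape (k,)
    return 0.5 * (sum_square - square_sum).sum()

# Sanity check against the naive pairwise formula (3).
rng = np.random.default_rng(0)
x, V = rng.normal(size=5), rng.normal(size=(5, 3))
naive = sum(V[i] @ V[j] * x[i] * x[j] for i in range(5) for j in range(i + 1, 5))
assert np.isclose(fm_second_order(x, V), naive)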

1.2 DNN

Nothing special here: a plain multi-layer fully connected network.

1.3 Final output

The FM output and the DNN output are added together and fed into a sigmoid, as shown below:
\hat{y} = sigmoid(y_{FM} + y_{DNN})

2. DeepFM code implementation

This part is the focus of this post. I will walk through the official Paddle implementation directly; working through the code details helps build a much deeper understanding of the DeepFM model.

2.1 Dataset

The dataset used here is Criteo, a dataset for ad CTR prediction; see Criteo for an introduction. Feature-wise, it has 26 categorical (sparse) features and 13 continuous (dense) features.
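
Just to build intuition about the input format the code below expects, here is an illustrative sample (all values are randomly generated placeholders, not real Criteo data):

import numpy as np

# One Criteo-style sample after preprocessing: 13 normalized dense values plus
# 26 sparse feature ids hashed into a shared vocabulary of 1,000,001 ids
# (matching sparse_feature_number in the FM code below).
rng = np.random.default_rng(0)
dense_inputs = rng.random(13).astype("float32")     # 13 continuous features
sparse_inputs = rng.integers(1, 1000001, size=26)   # 26 categorical feature ids
label = 1                                           # clicked / not clicked
print(dense_inputs.shape, sparse_inputs[:5], label)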

2.2 FM implementation

I have added detailed comments to the code (mainly tensor-shape annotations), so it should largely speak for itself; a short usage example follows after the code block.

import math

import paddle
import paddle.nn as nn


class FM(nn.Layer):
    def __init__(self, sparse_feature_number, sparse_feature_dim,
                 dense_feature_dim, sparse_num_field):
        super(FM, self).__init__()
        self.sparse_feature_number = sparse_feature_number  # 1000001
        self.sparse_feature_dim = sparse_feature_dim   # 9
        self.dense_feature_dim = dense_feature_dim  # 13
        self.dense_emb_dim = self.sparse_feature_dim  # 9
        self.sparse_num_field = sparse_num_field   # 26
        self.init_value_ = 0.1
        use_sparse = True
        # sparse coding
        # Embedding(1000001, 1, padding_idx=0, sparse=True)
        self.embedding_one = paddle.nn.Embedding(
            sparse_feature_number,
            1,
            padding_idx=0,
            sparse=use_sparse,
            weight_attr=paddle.ParamAttr(
                initializer=paddle.nn.initializer.TruncatedNormal(
                    mean=0.0,
                    std=self.init_value_ /
                    math.sqrt(float(self.sparse_feature_dim)))))
        # Embedding(1000001, 9, padding_idx=0, sparse=True)
        self.embedding = paddle.nn.Embedding(
            self.sparse_feature_number,
            self.sparse_feature_dim,
            sparse=use_sparse,
            padding_idx=0,
            weight_attr=paddle.ParamAttr(
                initializer=paddle.nn.initializer.TruncatedNormal(
                    mean=0.0,
                    std=self.init_value_ /
                    math.sqrt(float(self.sparse_feature_dim)))))

        # dense coding
        """
        Tensor(shape=[13], dtype=float32, place=CPUPlace, stop_gradient=False,
        [-0.00486396,  0.02755001, -0.01340683,  0.05218775,  0.00938804,  0.01068084,  0.00679830,  
        0.04791596, -0.04357519,  0.06603041, -0.02062148, -0.02801327, -0.04119579]))
        """
        self.dense_w_one = paddle.create_parameter(
            shape=[self.dense_feature_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.TruncatedNormal(
                mean=0.0,
                std=self.init_value_ /
                math.sqrt(float(self.sparse_feature_dim))))

        # Tensor(shape=[1, 13, 9])
        self.dense_w = paddle.create_parameter(
            shape=[1, self.dense_feature_dim, self.dense_emb_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.TruncatedNormal(
                mean=0.0,
                std=self.init_value_ /
                math.sqrt(float(self.sparse_feature_dim))))
    
    
    def forward(self, sparse_inputs, dense_inputs):
        # -------------------- first order term  --------------------
        """
        sparse_inputs: list, length:26, list[tensor], each tensor shape: [2, 1]
        dense_inputs: Tensor(shape=[2, 13]), 2 --> train_batch_size
        """
        # Tensor(shape=[2, 26])
        sparse_inputs_concat = paddle.concat(sparse_inputs, axis=1)
        # Tensor(shape=[2, 26, 1])
        sparse_emb_one = self.embedding_one(sparse_inputs_concat)
        # dense_w_one: shape=[13], dense_inputs: shape=[2, 13]
        # dense_emb_one: shape=[2, 13]
        dense_emb_one = paddle.multiply(dense_inputs, self.dense_w_one)
        # shape=[2, 13, 1]
        dense_emb_one = paddle.unsqueeze(dense_emb_one, axis=2)
        # paddle.sum(sparse_emb_one, 1): shape=[2, 1]
        # paddle.sum(dense_emb_one, 1): shape=[2, 1]
        # y_first_order: shape=[2, 1]
        y_first_order = paddle.sum(sparse_emb_one, 1) + paddle.sum(
            dense_emb_one, 1)
        # -------------------- second order term  --------------------
        # Tensor(shape=[2, 26, 9])
        sparse_embeddings = self.embedding(sparse_inputs_concat)
        # Tensor(shape=[2, 13, 1])
        dense_inputs_re = paddle.unsqueeze(dense_inputs, axis=2)
        # dense_inputs_re: Tensor(shape=[2, 13, 1])
        # dense_w: Tensor(shape=[1, 13, 9])
        # dense_embeddings: Tensor(shape=[2, 13, 9])
        dense_embeddings = paddle.multiply(dense_inputs_re, self.dense_w)
        # Tensor(shape=[2, 39, 9])
        feat_embeddings = paddle.concat([sparse_embeddings, dense_embeddings],
                                        1)
        # sum_square part
        # Tensor(shape=[2, 9])
        # \sum_{i=1}^n(v_{i,f}x_i) ---> for each embedding element: e_i, sum all feature's e_i
        summed_features_emb = paddle.sum(feat_embeddings,
                                         1)  # None * embedding_size
        # Tensor(shape=[2, 9]) 2-->batch_size
        summed_features_emb_square = paddle.square(
            summed_features_emb)  # None * embedding_size
        # square_sum part
        # Tensor(shape=[2, 39, 9])
        squared_features_emb = paddle.square(
            feat_embeddings)  # None * num_field * embedding_size
        # Tensor(shape=[2, 9]) 2-->batch_size
        squared_sum_features_emb = paddle.sum(squared_features_emb,
                                              1)  # None * embedding_size
        # Tensor(shape=[2, 1])
        y_second_order = 0.5 * paddle.sum(
            summed_features_emb_square - squared_sum_features_emb,
            1,
            keepdim=True)  # None * 1

        return y_first_order, y_second_order, feat_embeddings
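
Purely as a smoke test (hypothetical, not part of the Paddle source), the FM layer above can be instantiated and run on random inputs using the same dimensions as in the shape comments (batch size 2, 26 sparse fields, 13 dense features):

# Hypothetical usage example for the FM layer above.
fm = FM(sparse_feature_number=1000001, sparse_feature_dim=9,
        dense_feature_dim=13, sparse_num_field=26)
batch_size = 2
sparse_inputs = [paddle.randint(1, 1000001, [batch_size, 1]) for _ in range(26)]
dense_inputs = paddle.rand([batch_size, 13])
y_first_order, y_second_order, feat_embeddings = fm(sparse_inputs, dense_inputs)
print(y_first_order.shape, y_second_order.shape, feat_embeddings.shape)
# expected: [2, 1] [2, 1] [2, 39, 9]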

2.3 DNN implementation

There really is not much to say here: it is a standard MLP. A quick shape check follows after the code block.

class DNN(paddle.nn.Layer):
    def __init__(self, sparse_feature_number, sparse_feature_dim,
                 dense_feature_dim, num_field, layer_sizes):
        super(DNN, self).__init__()
        self.sparse_feature_number = sparse_feature_number
        self.sparse_feature_dim = sparse_feature_dim
        self.dense_feature_dim = dense_feature_dim
        self.num_field = num_field
        self.layer_sizes = layer_sizes
        # [351, 512, 256, 128, 32, 1]
        sizes = [sparse_feature_dim * num_field] + self.layer_sizes + [1]
        acts = ["relu" for _ in range(len(self.layer_sizes))] + [None]
        self._mlp_layers = []
        for i in range(len(layer_sizes) + 1):
            linear = paddle.nn.Linear(
                in_features=sizes[i],
                out_features=sizes[i + 1],
                weight_attr=paddle.ParamAttr(
                    initializer=paddle.nn.initializer.Normal(
                        std=1.0 / math.sqrt(sizes[i]))))
            self.add_sublayer('linear_%d' % i, linear)
            self._mlp_layers.append(linear)
            if acts[i] == 'relu':
                act = paddle.nn.ReLU()
                self.add_sublayer('act_%d' % i, act)
                # append the activation so it is actually applied in forward()
                self._mlp_layers.append(act)

    def forward(self, feat_embeddings):
        """
        feat_embeddings: Tensor(shape=[2, 39, 9])
        """
        # Tensor(shape=[2, 351]) --> 351=39*9, 
        # 39 is the number of features(category feature+ continous feature), 9 is embedding size
        y_dnn = paddle.reshape(feat_embeddings,
                               [-1, self.num_field * self.sparse_feature_dim])
        for n_layer in self._mlp_layers:
            y_dnn = n_layer(y_dnn)
        return y_dnn
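
A quick (hypothetical) shape check for the DNN tower; layer_sizes matches the [351, 512, 256, 128, 32, 1] comment above (351 = 39 fields x 9 embedding dims):

dnn = DNN(sparse_feature_number=1000001, sparse_feature_dim=9,
          dense_feature_dim=13, num_field=39, layer_sizes=[512, 256, 128, 32])
feat_embeddings = paddle.rand([2, 39, 9])   # same shape as the FM output above
y_dnn = dnn(feat_embeddings)
print(y_dnn.shape)   # expected: [2, 1]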

2.4 Combining the FM and DNN parts

The forward of the combined DeepFM layer simply adds the first-order term, the second-order term, and the DNN output, then applies a sigmoid (a sketch of the full class follows after the snippet):

    def forward(self, sparse_inputs, dense_inputs):

        y_first_order, y_second_order, feat_embeddings = self.fm.forward(
            sparse_inputs, dense_inputs)
        # feat_embeddings: Tensor(shape=[2, 39, 9])
        # y_dnn: Tensor(shape=[2, 1])
        y_dnn = self.dnn.forward(feat_embeddings)
        print("y_dnn:", y_dnn)

        predict = F.sigmoid(y_first_order + y_second_order + y_dnn)

        return predict
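
The __init__ of the enclosing model class is not shown in the snippet above; here is a minimal sketch of how the FM and DNN parts are wired together (the class name and default config values are my assumptions, not the exact Paddle source):

import paddle
import paddle.nn.functional as F


class DeepFM(paddle.nn.Layer):
    # Sketch of the combined layer whose forward() is shown above.
    def __init__(self, sparse_feature_number=1000001, sparse_feature_dim=9,
                 dense_feature_dim=13, sparse_num_field=26, layer_sizes=None):
        super(DeepFM, self).__init__()
        layer_sizes = layer_sizes or [512, 256, 128, 32]
        num_field = sparse_num_field + dense_feature_dim   # 26 + 13 = 39
        self.fm = FM(sparse_feature_number, sparse_feature_dim,
                     dense_feature_dim, sparse_num_field)
        self.dnn = DNN(sparse_feature_number, sparse_feature_dim,
                       dense_feature_dim, num_field, layer_sizes)

    def forward(self, sparse_inputs, dense_inputs):
        y_first_order, y_second_order, feat_embeddings = self.fm(
            sparse_inputs, dense_inputs)
        y_dnn = self.dnn(feat_embeddings)
        # \hat{y} = sigmoid(y_FM + y_DNN), where y_FM = first-order + second-order
        return F.sigmoid(y_first_order + y_second_order + y_dnn)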

3. Summary

Overall, DeepFM is a solid model that is widely used in industry. As I said before: if your current production model is LR and you are migrating to deep learning, try Wide & Deep first to minimize migration cost; if you are coming from a tree model such as XGBoost and want to try a deep model, go straight to DeepFM.

Also, DeepFM and DCN were both published in 2017, so neither paper contains a direct experimental comparison with the other. The DCN V2 paper does include such a comparison: on the datasets used there, the two models perform roughly the same. See DCN V2 for details. As always, the real answer comes from running experiments in your own setting.




References
  1. Guo H, Tang R, Ye Y, et al. DeepFM: a factorization-machine based neural network for CTR prediction[J]. arXiv preprint arXiv:1703.04247, 2017.