[Code Implementation] Implementing MMoE with TensorFlow

This article mainly introduces how to implement MMoE in TensorFlow.

I. The MMoE Concept

First, a quick recap of the MMoE concept:

https://blog.csdn.net/u013250416/article/details/118642297
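
For reference, the core formulation from the MMoE paper (Ma et al., KDD 2018): each of the n shared experts computes f_{i}(x); each task k has its own gate g^{k}(x) = softmax(W_{gk} * x), a distribution over the n experts; and the task-k representation is f^{k}(x) = sum_{i=1}^{n}(g^{k}(x)_{i} * f_{i}(x)), which is then fed into a task-specific tower. This is the same notation used in the code comments below.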

II. Analysis of the Existing Keras Implementation

There is already a Keras implementation of MMoE on GitHub (https://github.com/drawbridge/keras-mmoe/blob/master/mmoe.py). Using this existing code, we can work out the dimensions of each component in MMoE.

from keras import backend as K
from keras.layers import Layer, InputSpec
import tensorflow as tf


class MMoE(Layer):
    """
    Multi-gate Mixture-of-Experts layer.

    Note: the repo's __init__ (omitted in this excerpt) stores self.units,
    self.num_experts, self.num_tasks, the bias flags, the activations, and the
    initializers/regularizers/constraints referenced in build() and call().
    """

    def build(self, input_shape):
        """
        Method for creating the layer weights.

        :param input_shape: Keras tensor (future input to layer)
                            or list/tuple of Keras tensors to reference
                            for weight shape computations
        """
        assert input_shape is not None and len(input_shape) >= 2

        input_dimension = input_shape[-1]

        # Initialize expert weights (number of input features * number of units per expert * number of experts)
        self.expert_kernels = self.add_weight(
            name='expert_kernel',
            shape=(input_dimension, self.units, self.num_experts),
            initializer=self.expert_kernel_initializer,
            regularizer=self.expert_kernel_regularizer,
            constraint=self.expert_kernel_constraint,
        )

        # Initialize expert bias (number of units per expert * number of experts)
        if self.use_expert_bias:
            self.expert_bias = self.add_weight(
                name='expert_bias',
                shape=(self.units, self.num_experts),
                initializer=self.expert_bias_initializer,
                regularizer=self.expert_bias_regularizer,
                constraint=self.expert_bias_constraint,
            )

        # Initialize gate weights (number of input features * number of experts * number of tasks)
        self.gate_kernels = [self.add_weight(
            name='gate_kernel_task_{}'.format(i),
            shape=(input_dimension, self.num_experts),
            initializer=self.gate_kernel_initializer,
            regularizer=self.gate_kernel_regularizer,
            constraint=self.gate_kernel_constraint
        ) for i in range(self.num_tasks)]

        # Initialize gate bias (number of experts * number of tasks)
        if self.use_gate_bias:
            self.gate_bias = [self.add_weight(
                name='gate_bias_task_{}'.format(i),
                shape=(self.num_experts,),
                initializer=self.gate_bias_initializer,
                regularizer=self.gate_bias_regularizer,
                constraint=self.gate_bias_constraint
            ) for i in range(self.num_tasks)]

        self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dimension})

        super(MMoE, self).build(input_shape)

    def call(self, inputs, **kwargs):
        """
        Method for the forward function of the layer.

        :param inputs: Input tensor
        :param kwargs: Additional keyword arguments for the base method
        :return: A tensor
        """
        gate_outputs = []
        final_outputs = []

        # f_{i}(x) = activation(W_{i} * x + b), where activation is ReLU according to the paper
        expert_outputs = tf.tensordot(a=inputs, b=self.expert_kernels, axes=1)
        # Add the bias term to the expert weights if necessary
        if self.use_expert_bias:
            expert_outputs = K.bias_add(x=expert_outputs, bias=self.expert_bias)
        expert_outputs = self.expert_activation(expert_outputs)

        # g^{k}(x) = activation(W_{gk} * x + b), where activation is softmax according to the paper
        for index, gate_kernel in enumerate(self.gate_kernels):
            gate_output = K.dot(x=inputs, y=gate_kernel)
            # Add the bias term to the gate weights if necessary
            if self.use_gate_bias:
                gate_output = K.bias_add(x=gate_output, bias=self.gate_bias[index])
            gate_output = self.gate_activation(gate_output)
            gate_outputs.append(gate_output)

        # f^{k}(x) = sum_{i=1}^{n}(g^{k}(x)_{i} * f_{i}(x))
        for gate_output in gate_outputs:
            expanded_gate_output = K.expand_dims(gate_output, axis=1)
            weighted_expert_output = expert_outputs * K.repeat_elements(expanded_gate_output, self.units, axis=1)
            final_outputs.append(K.sum(weighted_expert_output, axis=2))

        return final_outputs
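
Before walking through the shapes, here is a minimal usage sketch (not from the original post; the constructor signature MMoE(units, num_experts, num_tasks) follows the linked repo, and all layer sizes here are illustrative):

from keras.layers import Input, Dense
from keras.models import Model

input_layer = Input(shape=(100,))

# 8 shared experts with 4 hidden units each, shared by 2 tasks
mmoe_outputs = MMoE(units=4, num_experts=8, num_tasks=2)(input_layer)

# one output head per task
outputs = [Dense(1, activation='sigmoid')(t) for t in mmoe_outputs]
model = Model(inputs=[input_layer], outputs=outputs)

Now, the dimensions of each component: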

1. inputs shape: [batch_size, input_dimension]

2. Expert weights: [input_dimension, hidden_size, num_experts]

So the expert network output shape is: [batch_size, hidden_size, num_experts]

3. Gate weights: [input_dimension, num_experts, num_tasks]

So the combined gate output shape is: [batch_size, num_experts, num_tasks]

In MMoE, each task has its own gating network: for each task, a dedicated gate learns how to combine the expert networks.

Therefore, for each individual task, the corresponding gate output shape is: [batch_size, num_experts]

4. For each gate network:

Applying softmax to the gate output yields a weight for each expert network, with shape: [batch_size, num_experts]

Note that the expert network output has shape [batch_size, hidden_size, num_experts], while the expert weights have shape [batch_size, num_experts]. The expert weights therefore need to be repeated hidden_size times along dimension 1, giving [batch_size, hidden_size, num_experts].

Next, multiplying the expert outputs by these expert weights element-wise and summing over the expert dimension yields the combined expert output for that task: [batch_size, hidden_size]

So the overall output is num_tasks tensors, each of shape [batch_size, hidden_size].

5. Each [batch_size, hidden_size] tensor can then be passed through further task-specific layers to produce the desired outputs. The shapes derived above are verified in the sketch right after this list.
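
The following shape check is a minimal sketch (not from the original post; it assumes TensorFlow 2.x eager mode for brevity, and all sizes are illustrative) that verifies the dimensions derived above:

import tensorflow as tf

batch_size, input_dimension = 32, 16
hidden_size, num_experts, num_tasks = 8, 4, 2

x = tf.random.normal([batch_size, input_dimension])
expert_w = tf.random.normal([input_dimension, hidden_size, num_experts])
gate_w = tf.random.normal([input_dimension, num_experts, num_tasks])

# Step 2: expert outputs
expert_out = tf.nn.relu(tf.tensordot(x, expert_w, axes=1))
print(expert_out.shape)  # (32, 8, 4) = [batch_size, hidden_size, num_experts]

# Step 3: gate outputs for all tasks at once
gate_out = tf.tensordot(x, gate_w, axes=1)
print(gate_out.shape)    # (32, 4, 2) = [batch_size, num_experts, num_tasks]

# Step 4: combine the experts for task 0
g = tf.nn.softmax(gate_out[:, :, 0], axis=-1)          # [batch_size, num_experts]
g = tf.tile(g[:, tf.newaxis, :], [1, hidden_size, 1])  # [batch_size, hidden_size, num_experts]
task_out = tf.reduce_sum(expert_out * g, axis=2)
print(task_out.shape)    # (32, 8) = [batch_size, hidden_size]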

III. TensorFlow Implementation

Reference: crediks/MMoE (MMoE completed by Tensorflow): https://github.com/crediks/MMoE

Here, we can simply translate the walkthrough from Section II into TensorFlow code.

import tensorflow as tf


class MMoE(object):
    def __init__(self, hidden_size, num_experts, num_tasks):
        self.hidden_size = hidden_size
        self.num_experts = num_experts
        self.num_tasks = num_tasks

    def get_output(self, inputs):
        # Xavier/Glorot initialization. The original post referenced an
        # undefined `xavier_init`; tf.glorot_uniform_initializer() is the
        # built-in TF 1.x equivalent.
        xavier_init = tf.glorot_uniform_initializer()
        input_dimension = inputs.get_shape()[-1]

        expert_weight = tf.get_variable(name='expert_weight', initializer=xavier_init,
                                        shape=[input_dimension, self.hidden_size, self.num_experts])
        expert_bias = tf.get_variable(name='expert_bias', initializer=xavier_init,
                                      shape=[self.hidden_size, self.num_experts])

        # [batch_size, hidden_size, num_experts]
        expert_output = tf.tensordot(inputs, expert_weight, axes=1) + expert_bias
        expert_output = tf.nn.relu(expert_output, name='expert_output')

        gate_weight = tf.get_variable(name='gate_weight', initializer=xavier_init,
                                      shape=[input_dimension, self.num_experts, self.num_tasks])
        gate_bias = tf.get_variable(name='gate_bias', initializer=xavier_init,
                                    shape=[self.num_experts, self.num_tasks])

        # [batch_size, num_experts, num_tasks]
        gate_output = tf.tensordot(inputs, gate_weight, axes=1) + gate_bias

        # num_tasks tensors, each of shape [batch_size, num_experts, 1]
        gate_outputs = tf.split(gate_output, num_or_size_splits=self.num_tasks, axis=2)

        final_outputs = []
        for gate_output in gate_outputs:
            # [batch_size, 1, num_experts]
            gate_output = tf.transpose(gate_output, [0, 2, 1])
            gate_output = tf.nn.softmax(gate_output, name='gate_output_softmax')

            # repeat the per-expert weights hidden_size times along axis 1:
            # [batch_size, hidden_size, num_experts]
            gate_output = tf.tile(gate_output, [1, self.hidden_size, 1])

            weighted_expert_output = expert_output * gate_output

            # [batch_size, hidden_size]
            final_outputs.append(tf.reduce_sum(weighted_expert_output, axis=2))

        return final_outputs
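
A minimal smoke test for this class might look as follows (a sketch assuming TF 1.x graph mode; the input dimension and sizes are arbitrary):

import numpy as np
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 16])
mmoe = MMoE(hidden_size=8, num_experts=4, num_tasks=2)
task_outputs = mmoe.get_output(inputs)  # list of num_tasks tensors

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outs = sess.run(task_outputs, feed_dict={inputs: np.random.rand(32, 16)})
    for out in outs:
        print(out.shape)  # (32, 8), i.e. [batch_size, hidden_size] per task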



If you have any questions, feel free to leave a comment below and discuss!
