This article mainly introduces how to implement MMoE in TensorFlow.
1. The MMoE concept
First, a quick refresher on the MMoE concept: MMoE (Multi-gate Mixture-of-Experts) shares a group of expert networks across all tasks, and gives each task its own gating network that produces a softmax-weighted combination of the experts' outputs. For more detail, see:
https://blog.csdn.net/u013250416/article/details/118642297
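For reference, the core formulation from the MMoE paper (Ma et al., KDD 2018), where f_i are the n shared experts, g^k is task k's gate, and h^k is the task-specific tower:

y_k = h^k\bigl(f^k(x)\bigr), \quad
f^k(x) = \sum_{i=1}^{n} g^k(x)_i \, f_i(x), \quad
g^k(x) = \mathrm{softmax}(W_{gk}\, x)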
2. Analysis of the existing Keras implementation
There is already a Keras implementation of MMoE on GitHub (https://github.com/drawbridge/keras-mmoe/blob/master/mmoe.py). Using this existing code, we can work through the dimensions of each component in MMoE. Below are its build and call methods (excerpted; the constructor simply stores units, num_experts, num_tasks and the various initializers/regularizers referenced here):
class MMoE(Layer):
    """
    Multi-gate Mixture-of-Experts model.
    """

    def build(self, input_shape):
        """
        Method for creating the layer weights.

        :param input_shape: Keras tensor (future input to layer)
            or list/tuple of Keras tensors to reference
            for weight shape computations
        """
        assert input_shape is not None and len(input_shape) >= 2

        input_dimension = input_shape[-1]

        # Initialize expert weights (number of input features * number of units per expert * number of experts)
        self.expert_kernels = self.add_weight(
            name='expert_kernel',
            shape=(input_dimension, self.units, self.num_experts),
            initializer=self.expert_kernel_initializer,
            regularizer=self.expert_kernel_regularizer,
            constraint=self.expert_kernel_constraint,
        )

        # Initialize expert bias (number of units per expert * number of experts)
        if self.use_expert_bias:
            self.expert_bias = self.add_weight(
                name='expert_bias',
                shape=(self.units, self.num_experts),
                initializer=self.expert_bias_initializer,
                regularizer=self.expert_bias_regularizer,
                constraint=self.expert_bias_constraint,
            )

        # Initialize gate weights (number of input features * number of experts * number of tasks)
        self.gate_kernels = [self.add_weight(
            name='gate_kernel_task_{}'.format(i),
            shape=(input_dimension, self.num_experts),
            initializer=self.gate_kernel_initializer,
            regularizer=self.gate_kernel_regularizer,
            constraint=self.gate_kernel_constraint
        ) for i in range(self.num_tasks)]

        # Initialize gate bias (number of experts * number of tasks)
        if self.use_gate_bias:
            self.gate_bias = [self.add_weight(
                name='gate_bias_task_{}'.format(i),
                shape=(self.num_experts,),
                initializer=self.gate_bias_initializer,
                regularizer=self.gate_bias_regularizer,
                constraint=self.gate_bias_constraint
            ) for i in range(self.num_tasks)]

        self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dimension})

        super(MMoE, self).build(input_shape)

    def call(self, inputs, **kwargs):
        """
        Method for the forward function of the layer.

        :param inputs: Input tensor
        :param kwargs: Additional keyword arguments for the base method
        :return: A tensor
        """
        gate_outputs = []
        final_outputs = []

        # f_{i}(x) = activation(W_{i} * x + b), where activation is ReLU according to the paper
        expert_outputs = tf.tensordot(a=inputs, b=self.expert_kernels, axes=1)
        # Add the bias term to the expert weights if necessary
        if self.use_expert_bias:
            expert_outputs = K.bias_add(x=expert_outputs, bias=self.expert_bias)
        expert_outputs = self.expert_activation(expert_outputs)

        # g^{k}(x) = activation(W_{gk} * x + b), where activation is softmax according to the paper
        for index, gate_kernel in enumerate(self.gate_kernels):
            gate_output = K.dot(x=inputs, y=gate_kernel)
            # Add the bias term to the gate weights if necessary
            if self.use_gate_bias:
                gate_output = K.bias_add(x=gate_output, bias=self.gate_bias[index])
            gate_output = self.gate_activation(gate_output)
            gate_outputs.append(gate_output)

        # f^{k}(x) = sum_{i=1}^{n}(g^{k}(x)_{i} * f_{i}(x))
        for gate_output in gate_outputs:
            expanded_gate_output = K.expand_dims(gate_output, axis=1)
            weighted_expert_output = expert_outputs * K.repeat_elements(expanded_gate_output, self.units, axis=1)
            final_outputs.append(K.sum(weighted_expert_output, axis=2))

        return final_outputs
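For context, here is a minimal usage sketch of this layer, modeled on the census-income demo in the same repo; the constructor signature MMoE(units, num_experts, num_tasks) comes from that repo, while the feature count and tower sizes below are made-up illustrative values:

from keras.layers import Input, Dense
from keras.models import Model

input_dimension = 100  # hypothetical feature count, for illustration only
input_layer = Input(shape=(input_dimension,))

# Returns a list of num_tasks tensors, each of shape [batch_size, units]
mmoe_layers = MMoE(units=4, num_experts=8, num_tasks=2)(input_layer)

output_layers = []
for index, task_layer in enumerate(mmoe_layers):
    # Task-specific tower on top of each per-task MMoE output
    tower_layer = Dense(units=8, activation='relu')(task_layer)
    output_layers.append(
        Dense(units=1, activation='sigmoid', name='task_{}'.format(index))(tower_layer))

model = Model(inputs=[input_layer], outputs=output_layers)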
1. Input shape: [batch_size, input_dimension]
2. Expert weights: [input_dimension, hidden_size, num_experts]
So the expert network output has shape: [batch_size, hidden_size, num_experts]
3. Gate weights: [input_dimension, num_experts, num_tasks]
So the gate network output has shape: [batch_size, num_experts, num_tasks]
In MMoE, each task has its own gating network: for each task, a gate decides how to combine the different expert networks.
So, for a single task, the corresponding gate output has shape: [batch_size, num_experts]
4. For each gate network:
Applying softmax to the gate output gives the weight of each expert network, with shape: [batch_size, num_experts]
Note that the expert outputs have shape [batch_size, hidden_size, num_experts],
while the expert weights have shape [batch_size, num_experts]; the weights therefore need to be expanded and repeated hidden_size times along dimension 1, giving [batch_size, hidden_size, num_experts].
Next, multiplying the expert outputs by these expert weights element-wise and summing over the expert dimension yields the combined expert output for the task: [batch_size, hidden_size]
So the overall output is num_tasks tensors, each of shape [batch_size, hidden_size].
5. Each of these [batch_size, hidden_size] tensors can then be processed further (e.g., by a task-specific tower) to produce the outputs we want. The NumPy sketch below walks through all of these shapes.
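To make the shape bookkeeping above concrete, here is a small NumPy sketch (the batch and layer sizes are illustrative; note that NumPy broadcasting plays the role of the explicit repeat/tile step in the implementations):

import numpy as np

batch_size, input_dimension = 32, 100            # illustrative sizes
hidden_size, num_experts, num_tasks = 16, 8, 2

x = np.random.randn(batch_size, input_dimension)
expert_w = np.random.randn(input_dimension, hidden_size, num_experts)
gate_w = np.random.randn(input_dimension, num_experts, num_tasks)

# Step 2: expert outputs -> (32, 16, 8)
expert_out = np.tensordot(x, expert_w, axes=1)

# Step 3: all gate outputs -> (32, 8, 2); one task's gate -> (32, 8)
gate_out = np.tensordot(x, gate_w, axes=1)
gate_k = gate_out[:, :, 0]

# Step 4: softmax over the expert dimension, then broadcast against expert_out
gate_k = np.exp(gate_k) / np.exp(gate_k).sum(axis=1, keepdims=True)
weighted = expert_out * gate_k[:, None, :]       # (32, 16, 8)
task_out = weighted.sum(axis=2)                  # (32, 16) == [batch_size, hidden_size]

print(expert_out.shape, gate_out.shape, task_out.shape)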
3. TensorFlow implementation
A TensorFlow version is available at https://github.com/crediks/MMoE.
Here, we can simply translate the walkthrough from Section 2 into TensorFlow code.
import tensorflow as tf  # TF 1.x

class MMoE(object):
    def __init__(self, hidden_size, num_experts, num_tasks):
        self.hidden_size = hidden_size
        self.num_experts = num_experts
        self.num_tasks = num_tasks

    def get_output(self, inputs):
        # Xavier/Glorot initialization for all parameters (TF 1.x API)
        xavier_init = tf.glorot_uniform_initializer()
        input_dimension = int(inputs.get_shape()[-1])

        # Expert weights: [input_dimension, hidden_size, num_experts]
        expert_weight = tf.get_variable(name='expert_weight', initializer=xavier_init,
                                        shape=[input_dimension, self.hidden_size, self.num_experts])
        expert_bias = tf.get_variable(name='expert_bias', initializer=xavier_init,
                                      shape=[self.hidden_size, self.num_experts])
        # [batch_size, hidden_size, num_experts]
        expert_output = tf.tensordot(inputs, expert_weight, axes=1) + expert_bias
        expert_output = tf.nn.relu(expert_output, name='expert_output')

        # Gate weights: [input_dimension, num_experts, num_tasks]
        gate_weight = tf.get_variable(name='gate_weight', initializer=xavier_init,
                                      shape=[input_dimension, self.num_experts, self.num_tasks])
        gate_bias = tf.get_variable(name='gate_bias', initializer=xavier_init,
                                    shape=[self.num_experts, self.num_tasks])
        # [batch_size, num_experts, num_tasks]
        gate_output = tf.tensordot(inputs, gate_weight, axes=1) + gate_bias
        # num_tasks tensors of shape [batch_size, num_experts, 1]
        gate_outputs = tf.split(gate_output, num_or_size_splits=self.num_tasks, axis=2)

        final_outputs = []
        for gate_output in gate_outputs:
            # [batch_size, 1, num_experts]
            gate_output = tf.transpose(gate_output, [0, 2, 1])
            # Softmax over the expert dimension
            gate_output = tf.nn.softmax(gate_output, name='gate_output_softmax')
            # [batch_size, hidden_size, num_experts]
            gate_output = tf.tile(gate_output, [1, self.hidden_size, 1])
            weighted_expert_output = expert_output * gate_output
            # [batch_size, hidden_size]
            final_outputs.append(tf.reduce_sum(weighted_expert_output, axis=2))
        return final_outputs
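And a minimal sketch of driving this class in a TF 1.x graph; the placeholder dimension and the hyperparameters below are illustrative:

import numpy as np
import tensorflow as tf  # TF 1.x

inputs = tf.placeholder(tf.float32, shape=[None, 100], name='inputs')
mmoe = MMoE(hidden_size=16, num_experts=8, num_tasks=2)
# List of num_tasks tensors, each of shape [batch_size, hidden_size]
task_outputs = mmoe.get_output(inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outs = sess.run(task_outputs, feed_dict={inputs: np.random.randn(4, 100)})
    print([o.shape for o in outs])  # [(4, 16), (4, 16)]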
If you have any questions, feel free to leave a comment below and discuss them with me!