DSSM & Multi-view DSSM TensorFlow Implementation

License: CC 4.0 BY-SA

Tags: #DSSM #Multi-View DSSM #TensorFlow


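This post implements the Deep Structured Semantic Model (DSSM), which is trained on clickthrough data, and its multi-view extension. It covers data preprocessing, building the model structure, and the training procedure.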

This is a demo implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data and of its follow-up paper, A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems.

1. Data

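DSSM takes query-document pairs as input: a short query together with the documents displayed for it. Displayed documents are split into clicked and unclicked ones, which serve as positive and negative samples respectively. The click order is also assigned different values; see the paper for details.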

I am not permitted to release my query data, so please find a dataset of your own.
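The sample construction described above can be sketched in plain Python. This is a minimal sketch and not the author's preprocessing code. The following are all assumptions:

- the log format `impressions`;
- the helper name `build_training_samples`;
- uniform random sampling of unclicked documents.

It also ignores the click-order weighting mentioned above.

```python
import random


def build_training_samples(impressions, neg, seed=0):
    """Turn click logs into (query, positive_doc, [negative_docs]) samples.

    impressions: list of (query, [(doc, clicked), ...]) pairs, a hypothetical
    log format with one entry per query impression.
    neg: number of negatives per positive (NEG in the model code).
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    samples = []
    for query, shown in impressions:
        clicked = [doc for doc, was_clicked in shown if was_clicked]
        unclicked = [doc for doc, was_clicked in shown if not was_clicked]
        if not clicked or len(unclicked) < neg:
            continue  # not enough data to form one full sample for this query
        for pos in clicked:
            # each clicked doc becomes one positive, paired with NEG random unclicked docs
            samples.append((query, pos, rng.sample(unclicked, neg)))
    return samples


logs = [
    ("deep learning", [("doc_a", True), ("doc_b", False), ("doc_c", False)]),
    ("recsys", [("doc_d", False)]),  # no click: skipped
]
print(build_training_samples(logs, neg=2))
```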

2. Word hashing

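The original paper uses letter 3-grams (trigrams). For Chinese I use character unigrams instead, because an individual Chinese character already carries some meaning (some papers even decompose characters into strokes). Each gram is represented by a one-hot encoding, and a short sentence becomes a bag of grams. Because the number of distinct grams is far smaller than a word-level vocabulary, this greatly reduces the dimension of the sentence representation.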
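Both hashing schemes can be sketched in a few lines, assuming a prebuilt gram-to-index dictionary `vocab`. `vocab` is hypothetical: in practice it would be built from the training corpus, and its size plays the role of TRIGRAM_D. The letter trigrams follow the paper's `#word#` boundary convention, and the unigram variant is the Chinese scheme used in this post. The result is a sparse bag-of-grams vector, the kind of input the sparse placeholders below expect.

```python
from collections import Counter


def letter_trigrams(word):
    """Letter-trigram word hashing: '#good#' -> #go, goo, ood, od#."""
    padded = "#" + word + "#"  # '#' marks the word boundaries
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def char_unigrams(text):
    """Chinese character unigrams: every non-space character is one gram."""
    return [ch for ch in text if not ch.isspace()]


def bag_of_grams(grams, vocab):
    """Sparse bag-of-grams vector {gram_id: count}; grams missing from vocab are dropped."""
    counts = Counter(g for g in grams if g in vocab)
    return {vocab[g]: float(c) for g, c in counts.items()}


vocab = {"#go": 0, "goo": 1, "ood": 2, "od#": 3}  # hypothetical tiny gram vocabulary
print(letter_trigrams("good"))                       # ['#go', 'goo', 'ood', 'od#']
print(bag_of_grams(letter_trigrams("good"), vocab))  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
```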

3. Model structure

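Structure diagram (image not available):

Map each item (query or document) to a low-dimensional semantic vector.
Compute the cosine similarity between the query vector and each document vector.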
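Written out, the forward pass and objective implemented in the code below are as follows. This is a summary under the code's own choices rather than the paper's exact formulation.

- $x$ is the bag-of-grams input and $N = 3$ layers are used.
- $f$ is ReLU applied after batch normalization. The paper uses tanh and no batch normalization.
- $\mathbf{D}$ is the positive document plus the NEG negatives for query $Q$.
- $\gamma = 20$ in the code.

$$
\begin{aligned}
l_0 &= x,\\
l_i &= f\big(\mathrm{BN}(W_i l_{i-1} + b_i)\big),\quad i = 1,\dots,N,\\
y &= l_N,\\
R(Q, D) &= \cos(y_Q, y_D) = \frac{y_Q^\top y_D}{\lVert y_Q\rVert\,\lVert y_D\rVert},\\
P(D \mid Q) &= \frac{\exp\big(\gamma R(Q, D)\big)}{\sum_{D' \in \mathbf{D}} \exp\big(\gamma R(Q, D')\big)},\\
L &= -\sum_{(Q, D^+)} \log P(D^+ \mid Q).
\end{aligned}
$$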

3.1 Input

The graph is visualized in TensorBoard, so each part is wrapped in a name_scope:

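import tensorflow as tf  # TF 1.x graph API; under TF 2.x: import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()

# TRIGRAM_D: size of the word-hashing vocabulary (number of distinct grams), defined elsewhere
with tf.name_scope('input'):
    query_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='QueryBatch')
    doc_positive_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='DocPositiveBatch')
    doc_negative_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='DocNegativeBatch')
    # True while training, False at evaluation time (switches batch normalization)
    on_train = tf.placeholder(tf.bool, name='OnTrain')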
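A sparse placeholder is fed with an (indices, values, dense_shape) triple. The helper below is a hypothetical sketch of turning bag-of-grams dicts into that triple. It emits entries in row-major order, which is the canonical ordering TF sparse ops expect.

```python
def to_sparse_triple(rows, dim):
    """Convert bag-of-grams dicts into (indices, values, dense_shape).

    rows: one {column_index: value} dict per example; dim: TRIGRAM_D.
    """
    indices, values = [], []
    for r, row in enumerate(rows):
        for c in sorted(row):  # row-major order: by row, then by column
            indices.append([r, c])
            values.append(row[c])
    return indices, values, [len(rows), dim]


indices, values, shape = to_sparse_triple([{2: 1.0, 0: 2.0}, {1: 1.0}], dim=5)
print(indices)  # [[0, 0], [0, 2], [1, 1]]
print(values)   # [2.0, 1.0, 1.0]
print(shape)    # [2, 5]
```

In TF 1.x such a triple could be fed as `feed_dict={query_batch: tf.SparseTensorValue(indices, values, shape)}`. Given how the negatives are sliced later, the negative batch is assumed to hold NEG consecutive rows per query, in the same query order as the positive batch.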

3.2 Fully connected layers

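I use three fully connected layers. They differ only in their number of neurons, so one reusable function builds all of them. Each layer computes the following; the activation is applied separately, after batch normalization:
$l_n = W_n l_{n-1} + b_n$, with $l_0 = x$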

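import numpy as np
import tensorflow as tf

def add_layer(inputs, in_size, out_size, activation_function=None):
    # uniform initialization range sqrt(6 / (fan_in + fan_out)), as in the paper
    wlimit = np.sqrt(6.0 / (in_size + out_size))
    Weights = tf.Variable(tf.random_uniform([in_size, out_size], -wlimit, wlimit))
    biases = tf.Variable(tf.random_uniform([out_size], -wlimit, wlimit))
    if isinstance(inputs, tf.SparseTensor):
        # the first layer receives the sparse word-hashing input, which tf.matmul does not accept
        Wx_plus_b = tf.sparse_tensor_dense_matmul(inputs, Weights) + biases
    else:
        Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs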

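Following the paper, the weights and biases are initialized uniformly in $[-\sqrt{6/(n_{in}+n_{out})},\ \sqrt{6/(n_{in}+n_{out})}]$, where $n_{in}$ = in_size and $n_{out}$ = out_size:

wlimit = np.sqrt(6.0 / (in_size + out_size))
Weights = tf.Variable(tf.random_uniform([in_size, out_size], -wlimit, wlimit))
biases = tf.Variable(tf.random_uniform([out_size], -wlimit, wlimit))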
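To make the range concrete, here is a plain-Python sketch of the same uniform rule, commonly known as Glorot/Xavier uniform initialization. `init_uniform` is a hypothetical helper; in the graph, `tf.random_uniform` performs the draw. For a 300 to 300 layer the limit is sqrt(6/600) = 0.1.

```python
import math
import random


def glorot_limit(in_size, out_size):
    """Half-width of the uniform range: sqrt(6 / (in_size + out_size))."""
    return math.sqrt(6.0 / (in_size + out_size))


def init_uniform(in_size, out_size, seed=0):
    """Draw an in_size x out_size weight matrix from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = glorot_limit(in_size, out_size)
    return [[rng.uniform(-limit, limit) for _ in range(out_size)] for _ in range(in_size)]


print(glorot_limit(300, 300))  # approximately 0.1
W = init_uniform(300, 300)
print(max(abs(w) for row in W for w in row) <= glorot_limit(300, 300))  # True
```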

Batch Normalization

def batch_normalization(x, phase_train, out_size):
    """
    Batch normalization for the output of a fully connected layer.
    Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
    Args:
        x:           Tensor, 2D [batch_size, out_size] layer output
        phase_train: boolean tf.placeholder, True indicates the training phase
        out_size:    integer, number of units (columns) of x
    Return:
        normed:      batch-normalized tensor, same shape as x
    """
    with tf.variable_scope('bn'):
        beta = tf.Variable(tf.constant(0.0, shape=[out_size]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[out_size]),
                            name='gamma', trainable=True)
        # per-unit mean and variance over the batch dimension
        batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # training: update the moving averages, then use the batch statistics
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # evaluation: use the moving averages instead of the batch statistics
        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
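The two modes of the layer above can be traced in plain Python. In training mode it normalizes with the batch statistics and updates their exponential moving averages. At evaluation time (`on_train=False`) it uses the moving averages instead. This is a minimal sketch with hypothetical helper names. With decay=0.5, as above, the averages follow recent batches closely.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-3):
    """Training-mode batch norm over axis 0, like tf.nn.moments(x, [0]) + tf.nn.batch_normalization."""
    n, d = len(batch), len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(d)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(d)]  # population variance
    out = [[gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j] for j in range(d)]
           for row in batch]
    return out, mean, var


def ema_update(shadow, value, decay):
    """shadow <- decay * shadow + (1 - decay) * value, the moving-average update rule."""
    return [decay * s + (1.0 - decay) * v for s, v in zip(shadow, value)]


out, mean, var = batch_norm_train([[1.0], [3.0]], gamma=[1.0], beta=[0.0], eps=0.0)
print(out, mean, var)                # [[-1.0], [1.0]] [2.0] [1.0]
print(ema_update([0.0], mean, 0.5))  # [1.0]
```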

One layer (FC1 + BN1):

with tf.name_scope('FC1'):
    # The activation comes after BN, so activation_function is None here
    query_l1 = add_layer(query_batch, TRIGRAM_D, L1_N, activation_function=None)
    # Positive and negative docs must share the same doc-tower weights, so they go
    # through a single add_layer call (positives first, then negatives)
    doc_l1 = add_layer(tf.sparse_concat(0, [doc_positive_batch, doc_negative_batch]),
                       TRIGRAM_D, L1_N, activation_function=None)

with tf.name_scope('BN1'):
    query_l1 = batch_normalization(query_l1, on_train, L1_N)
    # positives and negatives are normalized with shared batch statistics
    doc_l1 = batch_normalization(doc_l1, on_train, L1_N)
    # split back: the first query_BS rows are the positives, the rest are the negatives
    doc_positive_l1 = tf.slice(doc_l1, [0, 0], [query_BS, -1])
    doc_negative_l1 = tf.slice(doc_l1, [query_BS, 0], [-1, -1])
    query_l1_out = tf.nn.relu(query_l1)
    doc_positive_l1_out = tf.nn.relu(doc_positive_l1)
    doc_negative_l1_out = tf.nn.relu(doc_negative_l1)
# ······ FC2/BN2 and FC3/BN3 repeat this pattern; the last layer outputs query_y, doc_positive_y, doc_negative_y
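Each tower maps its input to the output vector by repeating a fully connected layer followed by ReLU. Below is a plain-Python sketch of that forward pass. The helper names are hypothetical, batch normalization is omitted, and the weights are toy values. Because the last activation is ReLU, every output component is non-negative, so cosines between tower outputs lie in [0, 1]. An output can even be the all-zero vector, as in this example, which makes the cosine denominator zero.

```python
def dense(x, W, b):
    """x @ W + b for a single example (x: list, W: in_size x out_size nested lists)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, a) for a in v]


def tower(x, layers):
    """Stack of FC + ReLU layers; batch normalization omitted for brevity."""
    for W, b in layers:
        x = relu(dense(x, W, b))
    return x


layers = [([[1.0, -1.0], [0.0, 1.0]], [0.0, 0.0]),  # 2 -> 2
          ([[1.0], [-2.0]], [0.5])]                 # 2 -> 1
print(tower([1.0, 2.0], layers))  # [0.0]
```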

Merging the negative samples

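with tf.name_scope('Merge_Negative_Doc'):
    # Merge positives and negatives into one tensor; tf.tile with multiples [1, 1] just
    # copies doc_positive_y, other multiples could be used to replicate samples.
    doc_y = tf.tile(doc_positive_y, [1, 1])
    # Resulting order: all positives, then the 1st negative of every query, then the 2nd, ...
    for i in range(NEG):
        for j in range(query_BS):
            # slice(input_, begin, size): row j * NEG + i is the i-th negative of query j
            doc_y = tf.concat([doc_y, tf.slice(doc_negative_y, [j * NEG + i, 0], [1, -1])], 0)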
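The double loop fixes a specific row order. That order is what lets `tf.tile(query_y, [NEG + 1, 1])` in the next step line each document up with its query. The sketch below reproduces that order in plain Python, assuming (as the indexing `j * NEG + i` implies) that the NEG negatives of query j sit in consecutive rows. The function names are hypothetical.

The second function produces the same order by reshaping to [BS, NEG], transposing and flattening. This suggests the loop could be replaced in the graph by `tf.reshape(tf.transpose(tf.reshape(doc_negative_y, [query_BS, NEG, -1]), [1, 0, 2]), [query_BS * NEG, -1])`. That TF form is a suggestion and has not been tested here.

```python
def merged_doc_order(batch_size, neg):
    """Row order produced by the loop: all positives first, then negative i of every query."""
    order = [("pos", j) for j in range(batch_size)]
    for i in range(neg):
        for j in range(batch_size):
            order.append(("neg", j * neg + i))  # row j*NEG+i = i-th negative of query j
    return order


def negative_rows_via_transpose(batch_size, neg):
    """Same negative order via reshape [BS, NEG] -> transpose [NEG, BS] -> flatten."""
    grid = [[j * neg + i for i in range(neg)] for j in range(batch_size)]  # grid[j][i]
    return [grid[j][i] for i in range(neg) for j in range(batch_size)]


print(merged_doc_order(2, 2))
# [('pos', 0), ('pos', 1), ('neg', 0), ('neg', 2), ('neg', 1), ('neg', 3)]
```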

3.3 Cosine similarity

with tf.name_scope('Cosine_Similarity'):
    # Cosine similarity
    # query_norm = sqrt(sum(each x^2))
    query_norm = tf.tile(tf.sqrt(tf.reduce_sum(tf.square(query_y), 1, True)), [NEG + 1, 1])
    # doc_norm = sqrt(sum(each x^2))
    doc_norm = tf.sqrt(tf.reduce_sum(tf.square(doc_y), 1, True))

    prod = tf.reduce_sum(tf.multiply(tf.tile(query_y, [NEG + 1, 1]), doc_y), 1, True)
    norm_prod = tf.multiply(query_norm, doc_norm)

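    # cos_sim_raw = query * doc / (||query|| * ||doc||)
    # the small constant avoids 0/0 = NaN when a ReLU output vector is all zeros
    cos_sim_raw = tf.truediv(prod, norm_prod + 1e-8)
    # reshape to [query_BS, NEG + 1] (column 0 = positive doc), scaled by the smoothing factor gamma = 20
    cos_sim = tf.transpose(tf.reshape(tf.transpose(cos_sim_raw), [NEG + 1, query_BS])) * 20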
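The transpose/reshape/transpose above turns the [(NEG+1) * query_BS, 1] column of cosines into a [query_BS, NEG+1] matrix. Row j of that matrix holds query j's positive followed by its negatives. Below is a plain-Python sketch of the same computation, assuming doc_y is in the merged order from the previous step. The names are hypothetical. The `eps` term guards against zero-norm vectors, which a ReLU output layer can produce.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps keeps an all-zero vector from giving 0/0 = NaN."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def cos_sim_matrix(query_y, doc_y, neg, gamma=20.0):
    """[batch_size, NEG+1] matrix: column 0 = positive doc, columns 1..NEG = negatives.

    doc_y is in the merged order: block k (k = 0 positives, k = i+1 negative i),
    with row j of each block belonging to query j.
    """
    bs = len(query_y)
    return [[gamma * cosine(query_y[j], doc_y[k * bs + j]) for k in range(neg + 1)]
            for j in range(bs)]


query_y = [[1.0, 0.0], [0.0, 1.0]]
doc_y = [[1.0, 0.0], [0.0, 2.0],   # positives of query 0 and query 1
         [0.0, 1.0], [1.0, 0.0]]   # negatives of query 0 and query 1
print(cos_sim_matrix(query_y, doc_y, neg=1))  # approximately [[20.0, 0.0], [20.0, 0.0]]
```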

3.4 Loss function

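with tf.name_scope('Loss'):
    # Train loss
    # Turn the scaled cosines into a softmax probability matrix, one row per query
    prob = tf.nn.softmax(cos_sim)
    # Keep only the first column, i.e. the probability of the positive (clicked) doc
    hit_prob = tf.slice(prob, [0, 0], [-1, 1])
    loss = -tf.reduce_sum(tf.log(hit_prob))
    tf.summary.scalar('loss', loss)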
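The loss is the negative log of the softmax probability of column 0, summed over the batch. Below is a plain-Python sketch using the log-sum-exp form, which avoids taking the log of a probability that underflows to zero. The name `dssm_loss` is hypothetical. In TF the same quantity could be written as `-tf.reduce_sum(tf.nn.log_softmax(cos_sim)[:, 0])`.

```python
import math


def dssm_loss(cos_sim):
    """-sum_j log softmax(cos_sim[j])[0]; the positive doc sits in column 0 of every row."""
    total = 0.0
    for row in cos_sim:
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total -= row[0] - log_z  # log P(D+ | Q) = s_pos - log Z
    return total


print(dssm_loss([[0.0, 0.0]]))                  # log(2), about 0.6931
print(dssm_loss([[20.0, 0.0, 0.0, 0.0, 0.0]]))  # close to 0: the positive dominates
```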

3.5 Optimizer

with tf.name_scope('Training'):
    # Optimizer
    train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(loss)

3.6 Training

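import time

# Saver object for saving selected variables or the whole model
saver = tf.train.Saver()
# with tf.Session(config=config) as sess:   # config: an optional tf.ConfigProto
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train', sess.graph)
    start = time.time()
    for step in range(FLAGS.max_steps):
        # cycle through the training batches
        batch_id = step % FLAGS.epoch_steps
        # feed_dict(...) is a helper defined in the full code on GitHub; it builds the feed for this batch
        sess.run(train_step, feed_dict=feed_dict(True, True, batch_id % FLAGS.pack_size, 0.5))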
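The loop selects batches cyclically via `batch_id = step % FLAGS.epoch_steps`. The exact arguments of `feed_dict(...)` are defined in the full code and not explained here. Below is a sketch of the cyclic selection. `batch_ranges` and `batch_for_step` are hypothetical helpers. Dropping the incomplete last batch is an assumption, motivated by the fixed `query_BS` that the graph uses in `tf.slice`.

```python
def batch_ranges(num_examples, batch_size):
    """(start, end) of every full batch; an incomplete tail batch is dropped."""
    return [(s, s + batch_size) for s in range(0, num_examples - batch_size + 1, batch_size)]


def batch_for_step(step, ranges):
    """Cycle through the batches, like batch_id = step % FLAGS.epoch_steps."""
    return ranges[step % len(ranges)]


ranges = batch_ranges(10, 4)
print(ranges)                     # [(0, 4), (4, 8)]
print(batch_for_step(3, ranges))  # (4, 8)
```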

Full code on GitHub: https://github.com/InsaneLife/dssm

The Multi-view DSSM implementation works the same way; see multi_view_dssm on GitHub.

Original CSDN post: http://blog.csdn.net/shine19930820/article/details/79042567

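Note:
Because the earlier code used outdated APIs, the latest code is now at https://github.com/InsaneLife/dssm/blob/master/dssm_rnn.py. The data-processing code data_input.py and the data in data have been updated as well. Because the new version uses an RNN, its input is no longer bag-of-words.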

33 comments

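zzy_ucas 2022.10.26
Hello, with n-gram word hashing, is each word in the query split into n-grams and an embedding looked up for each n-gram? Each word then has several n-gram embeddings: are they summed, or handled some other way? Thanks.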

bihanren32 2019.10.28
Could you upload the multi_view_data_input Python file to GitHub?
- 百川AI replying to bihanren32 2019.10.29
  Sorry, it has been too long and that data and code are lost, so I later added a public Tianchi dataset and the corresponding input code.

刘大梦y 2019.09.17
Hi, the last layer of the network uses a ReLU activation and the cosine is computed after it, so the cosine values are all greater than 0, right?

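夜空中最亮的小星星 2019.07.26
Hi, a question: when training LSTM-DSSM I use 1 positive sample and 4 negative samples. How should the model be used at prediction time? Only through the positive-sample branch?
- zzw_1024 replying to weixin_39012047 2020.06.02
  At prediction time, only serve the hidden layers.
- weixin_39012047 replying to 夜空中最亮的小星星 2019.09.17
  Hi, did you solve the prediction problem?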

转身之后泪不停流 2019.06.12
Hi, when running DSSM the loss becomes NaN after three training steps. What causes this, and how can it be fixed?
- IT界的小小小学生 replying to 转身之后泪不停流 2019.06.27
  Hi, I have the same problem. Could you add me on QQ?
- 百川AI replying to 转身之后泪不停流 2019.06.12
  Usually the loss takes the log of 0, or a denominator is 0. Add a tiny constant, e.g. loss = -tf.reduce_sum(tf.log(hit_prob + 1e-8))

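百川AI 2019.05.05
Because the earlier code used outdated APIs, the latest code is now at https://github.com/InsaneLife/dssm/blob/master/dssm_rnn.py. The data-processing code data_input.py and the data in data have been updated as well. Because the new version uses an RNN, its input is no longer bag-of-words.
- 转身之后泪不停流 replying to 百川AI 2019.06.13
  Hi, sorry to bother you. About this problem: I added 1e-8 when computing the loss and clipped with tf.clip_by_value(cos_sim, 1e-8, 1.0), but I still get NaN. The loss here does not seem to involve a logarithm, so what could be the cause? Many thanks!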

weixin_42760246 2019.02.25
Is there a Keras version? Please send me one: Q775301251

homehehe2014 2019.02.18
Judging from the replies, the author's code still has quite a few problems...

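tt163789 2018.08.20
Hi, in the V3 code the matmul part is wrong, and the input to FC2 also has a problem; it only ran after I fixed these. Most importantly, the code only trains and has no prediction part. After loading the trained model, how should the docs be organized? During training we know in advance which doc is positive and which is negative, but at prediction time we do not know what kind of sample a doc is. So what should be fed to doc_positive_l1 = add_layer(doc_positive_batch, TRIGRAM_D, L1_N, activation_function=None) and doc_negative_l1 = add_layer(doc_negative_batch, TRIGRAM_D, L1_N, activation_function=None) at prediction time?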
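- tt163789 replying to muyang_ma 2018.09.20
  For prediction, restore the model with restore and run sess.run(cos_sim, feed_dict = feed_data()), changing the number of negatives from 4 to 1, the same as the positive. My training data is small and the model did not converge, so my predictions are somewhat off.
- muyang_ma replying to tt163789 2018.09.18
  Could you share the prediction part?
- 百川AI replying to tt163789 2018.09.06
  OK, corrections via git commits are welcome. Show me your code, haha.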

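ThanksCreek 2018.06.11
Hi, is the query represented in one-hot form? What is the logic for generating it? Is it done by checking whether each character of the query is in the dictionary, setting characters that are in the dictionary to 1 and the rest to 0, so that the query becomes a sparse vector like [0,0,0,1,0,1,0]?
- 百川AI replying to ThanksCreek 2018.08.16
  Take a look at bag of words.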