Notes on Relation Extraction (3) — Dependency-based Models

Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation

Current approaches

Existing approaches are feature-based or kernel-based. Feature-based methods rely on lexical, syntactic, and semantic features; kernel-based methods require a predefined similarity measure between two data samples, e.g., a kernel defined along the shortest dependency path (SDP) between the two entities.
Existing neural models are mostly single-layer; a deeper architecture is needed.

Model


  1. The SDP is split into two sub-paths (one from each entity up to their common ancestor), and each sub-path is processed by its own RNN.
  2. Each RNN uses four channels: word embeddings, POS embeddings, grammatical-relation embeddings, and WordNet embeddings.
  • Word representations: each word in the sentence is mapped to a real-valued vector by looking it up in a word embedding table. Trained without supervision on a large corpus, word embeddings are thought to capture words' syntactic and semantic information well (Mikolov et al., 2013b).
  • Part-of-speech tags: because word embeddings are obtained from a large generic corpus, the information they carry may not agree with a specific sentence, so each input word is paired with its POS tag (noun, verb, etc.). Only a coarse-grained POS category with 15 different tags is used.
  • Grammatical relations: the dependency relation between a governing word and its children makes a difference in meaning, and the same word pair can occur with different dependency relation types; for example, "beats --nsubj--> it" is distinct from "beats --dobj--> it". It is therefore necessary to capture such grammatical relations on the SDP.
  • WordNet hypernyms: hyponymy information is also useful for relation classification. To leverage WordNet hypernyms, the authors use the tagger of Ciaramita and Altun (2006), which assigns each word a hypernym from 41 predefined WordNet concepts, e.g., noun.food, verb.motion. Through its hypernym, each word gains a more abstract concept, which helps build a link between different but conceptually similar words.
  3. Four such RNN layers are stacked (cf. StackedRNNCells) to make the network deep.
  4. Each RNN layer is followed by a max-pooling layer that collects information over the path.
  5. All max-pooled vectors are concatenated and fed into a softmax classifier (see the sketch after this list).
  6. To compensate for the data hunger of deep models, data augmentation is applied by reversing each SDP together with its directed relation label.
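A minimal sketch of one DRNN sub-path, written in the same TensorFlow 1.x style as the BRCNN code later in this post. Fusing the four channels by per-token concatenation, the layer count, and the hidden size are illustrative assumptions rather than the paper's exact setup; the pooled vectors of the two sub-paths would then be concatenated and fed to a softmax classifier.

import tensorflow as tf  # TF 1.x API, matching the code below

def drnn_subpath(channel_inputs, hidden_size=100, num_layers=4, scope="subpath"):
    """Run a stack of RNN layers over one SDP sub-path.

    channel_inputs: list of [batch, steps, dim] tensors holding the word,
    POS, grammatical-relation and WordNet-hypernym embeddings of the tokens
    on the sub-path (concatenating them per token is an assumption here).
    """
    x = tf.concat(channel_inputs, axis=-1)
    pooled = []
    with tf.variable_scope(scope):
        for i in range(num_layers):
            cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
            with tf.variable_scope("layer_%d" % i):
                x, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
            # a max-pooling over time follows every RNN layer
            pooled.append(tf.reduce_max(x, axis=1))
    # pooled vectors from all layers (and, in the full model, from both
    # sub-paths) are concatenated and fed into softmax
    return tf.concat(pooled, axis=-1)

For the WordNet channel, the 41 concepts mentioned above correspond to WordNet supersenses; with NLTK one can approximate the lookup as wn.synsets("pasta")[0].lexname(), which returns "noun.food" (the paper itself uses the Ciaramita and Altun tagger).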

Bidirectional Recurrent Convolutional Neural Network for Relation Classification

Model

The model operates on the shortest dependency path (SDP) between the two entities.
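As a side note, here is a minimal sketch of extracting the SDP between two entity head words using spaCy and networkx; both tool choices and the model name en_core_web_sm are assumptions for illustration, not the papers' parsing setup. The dependency tree is treated as an undirected graph and the SDP is the shortest path between the two entities.

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model name

def shortest_dependency_path(sentence, e1, e2):
    """Return the tokens on the SDP between entity head words e1 and e2."""
    doc = nlp(sentence)
    graph = nx.Graph()
    # treat the dependency tree as an undirected graph over token indices
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    # locate the entity head tokens by surface form (a simplification)
    src = next(t.i for t in doc if t.text == e1)
    dst = next(t.i for t in doc if t.text == e2)
    return [doc[i].text for i in nx.shortest_path(graph, source=src, target=dst)]

# e.g. shortest_dependency_path("A burst has been caused by water pressure.",
#                               "burst", "pressure")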

Contributions

  1. Proposes RCNN: a two-channel LSTM captures global features along the SDP, while a CNN captures local features of adjacent nodes.
  2. Proposes BRCNN, which reads the SDP in both directions so that information is captured from both ends.

Model architecture


  1. The SDP is the input and consists of words and dependency relations. Words are initialized with pre-trained vectors and relations are initialized randomly. Words and relations go through separate embedding layers and separate LSTMs, forming a two-channel LSTM (the two channels do not interact).
  2. A convolution layer captures local features of each dependency unit (two adjacent words plus the relation between them) from the LSTM outputs, and max-pooling then collects global information (see the sketch after this list).
  3. Loss function:
    J=\sum_{i=1}^{2K+1}\overrightarrow{t_i}\log\overrightarrow{y_i} + \sum_{i=1}^{2K+1}\overleftarrow{t_i}\log\overleftarrow{y_i} + \sum_{i=1}^{K}t_i\log y_i + \lambda\cdot\|\theta\|^2
    The first two terms are the losses for the forward and backward directions, the third term is the loss on the concatenation of the forward and backward representations (the coarse-grained prediction), and the fourth term is the L2 regularizer.
    Prediction:
    y_{test}=\alpha\cdot\overrightarrow{y} + (1-\alpha)\cdot\overleftarrow{y}
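The conv_layer and pool_layer helpers referenced in the code below are not shown in this post; the following is a hedged sketch of the idea with assumed tensor shapes, not the post's actual implementation. Each dependency unit concatenates the LSTM states of two adjacent words with the state of the relation between them; a window-1 convolution over the units extracts local features, and max-pooling over all units collects the global feature vector.

import tensorflow as tf

def conv_max_pool(word_states, rel_states, filters=200):
    """Hypothetical dependency-unit convolution + max-pooling.

    word_states: [batch, n,   d_w] LSTM states of the n words on the SDP
    rel_states:  [batch, n-1, d_r] LSTM states of the n-1 relations
    """
    left = word_states[:, :-1, :]    # word_i
    right = word_states[:, 1:, :]    # word_{i+1}
    units = tf.concat([left, rel_states, right], axis=-1)  # [batch, n-1, d]
    # window-1 convolution over dependency units (a per-unit dense layer)
    local = tf.layers.conv1d(units, filters=filters, kernel_size=1,
                             activation=tf.tanh)
    # max-pooling over all units gives the global feature for this direction
    return tf.reduce_max(local, axis=1)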

Code

def model(input_, word_vec_matrix_pretrained, keep_prob, config):
    word_vec = tf.constant(value=word_vec_matrix_pretrained, name="word_vec", dtype=tf.float32)
    rel_vec = tf.Variable(tf.random_uniform([config.rel_size, config.rel_vec_size], -0.05, 0.05), name="rel_vec", dtype=tf.float32)
    #tf.add_to_collection(l2_collection_name, word_vec)
    tf.add_to_collection(l2_collection_name, rel_vec)

    # forward direction: two-channel LSTM + CNN
    with tf.name_scope("look_up_table_f"):
        inputs_words_f = tf.nn.embedding_lookup(word_vec, input_["sdp_words_index"])
        inputs_rels_f = tf.nn.embedding_lookup(rel_vec, input_["sdp_rels_index"])
        inputs_words_f = tf.nn.dropout(inputs_words_f, keep_prob)
        inputs_rels_f = tf.nn.dropout(inputs_rels_f, keep_prob)

    with tf.name_scope("lstm_f"):
        words_lstm_rst_f = lstm_layer(inputs_words_f, length2(input_["sdp_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_f")
        rels_lstm_rst_f = lstm_layer(inputs_rels_f, length2(input_["sdp_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_f")
        tf.summary.histogram("words_lstm_rst_f", words_lstm_rst_f)
        tf.summary.histogram("rels_lstm_rst_f", rels_lstm_rst_f)

    with tf.name_scope("conv_max_pool_f"):
        conv_output_f = conv_layer(words_lstm_rst_f, rels_lstm_rst_f, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_f")
        pool_output_f = pool_layer(conv_output_f, config)
        tf.summary.histogram("conv_output_f", conv_output_f)
        tf.summary.histogram("pool_output_f", pool_output_f)

    # backward direction: two-channel LSTM + CNN
    with tf.name_scope("look_up_table_b"):
        inputs_words_b = tf.nn.embedding_lookup(word_vec, input_["sdp_rev_words_index"])
        inputs_rels_b = tf.nn.embedding_lookup(rel_vec, input_["sdp_rev_rels_index"])
        inputs_words_b = tf.nn.dropout(inputs_words_b, keep_prob)
        inputs_rels_b = tf.nn.dropout(inputs_rels_b, keep_prob)

    with tf.name_scope("lstm_b"):
        words_lstm_rst_b = lstm_layer(inputs_words_b, length2(input_["sdp_rev_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_b")
        rels_lstm_rst_b = lstm_layer(inputs_rels_b, length2(input_["sdp_rev_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_b")
        tf.summary.histogram("words_lstm_rst_b", words_lstm_rst_b)
        tf.summary.histogram("rels_lstm_rst_b", rels_lstm_rst_b)

    with tf.name_scope("conv_max_pool_b"):
        conv_output_b = conv_layer(words_lstm_rst_b, rels_lstm_rst_b, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_b")
        pool_output_b = pool_layer(conv_output_b, config)
        tf.summary.histogram("conv_output_b", conv_output_b)
        tf.summary.histogram("pool_output_b", pool_output_b)

    with tf.name_scope("softmax"):
        # concatenate the forward and backward pooled outputs
        pool_concat = tf.concat([pool_output_f, pool_output_b], 1)
        logits_f, hypothesis_f = softmax_layer(pool_output_f, config.conv_out_size, 19, "softmax_f")
        logits_b, hypothesis_b = softmax_layer(pool_output_b, config.conv_out_size, 19, "softmax_b")
        logits_concat, hypothesis_concat = softmax_layer(pool_concat, 2*(config.conv_out_size), 10, "softmax_concat")

    # L2 regularization
    regularizers = 0
    vars = tf.get_collection(l2_collection_name)
    for var in vars:
        regularizers += tf.nn.l2_loss(var)

    # loss function: forward + backward (+ optional coarse-grained) cross-entropy
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits_f, labels=input_["label_fb"])
    loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_b, labels=input_["label_fb"])
    if config.has_corase_grained:
        loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_concat, labels=input_["label_concat"])
    loss_avg = tf.reduce_mean(loss) + config.l2 * regularizers

    # gradient clip
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss_avg, tvars), config.grad_clip)
    #train_op = tf.train.AdamOptimizer(config.lr)
    train_op = tf.train.AdadeltaOptimizer(config.lr)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    # prediction: combine the forward and backward softmax outputs
    prediction = get_prediction(hypothesis_f, hypothesis_b, config.alpha)
    correct_prediction = tf.equal(prediction, tf.argmax(input_["label_fb"], 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    loss_summary = tf.summary.scalar("loss", loss_avg)
    accuracy_summary = tf.summary.scalar("accuracy_summary", accuracy)

    grad_summaries = []
    for g, v in zip(grads, tvars):
        if g is not None:
            grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
            sparsity_summary = tf.summary.histogram("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
            grad_summaries.append(grad_hist_summary)
            grad_summaries.append(sparsity_summary)
    grad_summaries_merged = tf.summary.merge(grad_summaries)
    summary = tf.summary.merge_all()

    return loss_avg, accuracy, prediction, optimizer, summary
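The get_prediction helper called above is also not shown; a minimal sketch consistent with the prediction formula y_test = α·y_forward + (1−α)·y_backward could look like the following (whether the backward class indices need remapping depends on how label_fb was built, which is not visible here).

import tensorflow as tf

def get_prediction(hypothesis_f, hypothesis_b, alpha):
    # weighted combination of the forward and backward softmax outputs
    combined = alpha * hypothesis_f + (1.0 - alpha) * hypothesis_b
    return tf.argmax(combined, axis=1)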