Notes on Relation Extraction (3) — Dependency-based Models

Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation

Current approaches

Existing approaches are feature-based or kernel-based. Feature-based methods rely on lexical, syntactic, and semantic features; kernel-based methods require a predefined similarity measure between two data samples, e.g., a kernel defined along the shortest dependency path (SDP) between the two entities.
Existing neural models are mostly single-layer; a deeper architecture is needed.

Model


  1. The SDP is split into two sub-paths (one from each entity up to their common ancestor), and each sub-path is processed by its own RNN.
  2. Each RNN uses four channels: word embeddings, POS embeddings, grammatical-relation embeddings, and WordNet embeddings.
  • Word representations: each word in the sentence is mapped to a real-valued vector by looking it up in a word embedding table. Trained without supervision on a large corpus, word embeddings are thought to capture words' syntactic and semantic information well (Mikolov et al., 2013b).
  • Part-of-speech tags: because word embeddings are obtained from a large generic corpus, the information they carry may not agree with a specific sentence, so each input word is paired with its POS tag (noun, verb, etc.). Only a coarse-grained POS category with 15 different tags is used.
  • Grammatical relations: the dependency relation between a governing word and its children makes a difference in meaning, and the same word pair can occur with different dependency relation types; for example, "beats --nsubj--> it" is distinct from "beats --dobj--> it". It is therefore necessary to capture such grammatical relations on the SDP.
  • WordNet hypernyms: hyponymy information is also useful for relation classification. To leverage WordNet hypernyms, the authors use the tagger of Ciaramita and Altun (2006), which assigns each word a hypernym from 41 predefined WordNet concepts, e.g., noun.food, verb.motion. Through its hypernym, each word gains a more abstract concept, which helps build a link between different but conceptually similar words.
  3. Four such RNN layers are stacked (cf. StackedRNNCells) to make the network deep.
  4. Each RNN layer is followed by a max-pooling layer that collects information over the path.
  5. All max-pooled vectors are concatenated and fed into a softmax classifier (see the sketch after this list).
  6. To compensate for the data hunger of deep models, data augmentation is applied by reversing each SDP together with its directed relation label.
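A minimal sketch of one DRNN sub-path, written in the same TensorFlow 1.x style as the BRCNN code later in this post. Fusing the four channels by per-token concatenation, the layer count, and the hidden size are illustrative assumptions rather than the paper's exact setup; the pooled vectors of the two sub-paths would then be concatenated and fed to a softmax classifier.

import tensorflow as tf  # TF 1.x API, matching the code below

def drnn_subpath(channel_inputs, hidden_size=100, num_layers=4, scope="subpath"):
    """Run a stack of RNN layers over one SDP sub-path.

    channel_inputs: list of [batch, steps, dim] tensors holding the word,
    POS, grammatical-relation and WordNet-hypernym embeddings of the tokens
    on the sub-path (concatenating them per token is an assumption here).
    """
    x = tf.concat(channel_inputs, axis=-1)
    pooled = []
    with tf.variable_scope(scope):
        for i in range(num_layers):
            cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
            with tf.variable_scope("layer_%d" % i):
                x, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
            # a max-pooling over time follows every RNN layer
            pooled.append(tf.reduce_max(x, axis=1))
    # pooled vectors from all layers (and, in the full model, from both
    # sub-paths) are concatenated and fed into softmax
    return tf.concat(pooled, axis=-1)

For the WordNet channel, the 41 concepts mentioned above correspond to WordNet supersenses; with NLTK one can approximate the lookup as wn.synsets("pasta")[0].lexname(), which returns "noun.food" (the paper itself uses the Ciaramita and Altun tagger).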

Bidirectional Recurrent Convolutional Neural Network for Relation Classification

Model

The model operates on the shortest dependency path (SDP) between the two entities.
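As a side note, here is a minimal sketch of extracting the SDP between two entity head words using spaCy and networkx; both tool choices and the model name en_core_web_sm are assumptions for illustration, not the papers' parsing setup. The dependency tree is treated as an undirected graph and the SDP is the shortest path between the two entities.

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model name

def shortest_dependency_path(sentence, e1, e2):
    """Return the tokens on the SDP between entity head words e1 and e2."""
    doc = nlp(sentence)
    graph = nx.Graph()
    # treat the dependency tree as an undirected graph over token indices
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    # locate the entity head tokens by surface form (a simplification)
    src = next(t.i for t in doc if t.text == e1)
    dst = next(t.i for t in doc if t.text == e2)
    return [doc[i].text for i in nx.shortest_path(graph, source=src, target=dst)]

# e.g. shortest_dependency_path("A burst has been caused by water pressure.",
#                               "burst", "pressure")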

Contributions

  1. Proposes RCNN: a two-channel LSTM captures global features along the SDP, while a CNN captures local features of adjacent nodes.
  2. Proposes BRCNN, which reads the SDP in both directions so that information is captured from both ends.

Model architecture


  1. The SDP is the input and consists of words and dependency relations. Words are initialized with pre-trained vectors and relations are initialized randomly. Words and relations go through separate embedding layers and separate LSTMs, forming a two-channel LSTM (the two channels do not interact).
  2. A convolution layer captures local features of each dependency unit (two adjacent words plus the relation between them) from the LSTM outputs, and max-pooling then collects global information (see the sketch after this list).
  3. Loss function:
    J=\sum_{i=1}^{2K+1}\overrightarrow{t_i}\log\overrightarrow{y_i} + \sum_{i=1}^{2K+1}\overleftarrow{t_i}\log\overleftarrow{y_i} + \sum_{i=1}^{K}t_i\log y_i + \lambda\cdot\|\theta\|^2
    The first two terms are the losses for the forward and backward directions, the third term is the loss on the concatenation of the forward and backward representations (the coarse-grained prediction), and the fourth term is the L2 regularizer.
    Prediction:
    y_{test}=\alpha\cdot\overrightarrow{y} + (1-\alpha)\cdot\overleftarrow{y}
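The conv_layer and pool_layer helpers referenced in the code below are not shown in this post; the following is a hedged sketch of the idea with assumed tensor shapes, not the post's actual implementation. Each dependency unit concatenates the LSTM states of two adjacent words with the state of the relation between them; a window-1 convolution over the units extracts local features, and max-pooling over all units collects the global feature vector.

import tensorflow as tf

def conv_max_pool(word_states, rel_states, filters=200):
    """Hypothetical dependency-unit convolution + max-pooling.

    word_states: [batch, n,   d_w] LSTM states of the n words on the SDP
    rel_states:  [batch, n-1, d_r] LSTM states of the n-1 relations
    """
    left = word_states[:, :-1, :]    # word_i
    right = word_states[:, 1:, :]    # word_{i+1}
    units = tf.concat([left, rel_states, right], axis=-1)  # [batch, n-1, d]
    # window-1 convolution over dependency units (a per-unit dense layer)
    local = tf.layers.conv1d(units, filters=filters, kernel_size=1,
                             activation=tf.tanh)
    # max-pooling over all units gives the global feature for this direction
    return tf.reduce_max(local, axis=1)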

Code

def model(input_, word_vec_matrix_pretrained, keep_prob, config):
    word_vec = tf.constant(value=word_vec_matrix_pretrained, name="word_vec", dtype=tf.float32)
    rel_vec = tf.Variable(tf.random_uniform([config.rel_size, config.rel_vec_size], -0.05, 0.05), name="rel_vec", dtype=tf.float32)
    #tf.add_to_collection(l2_collection_name, word_vec)
    tf.add_to_collection(l2_collection_name, rel_vec)

    # forward direction: two-channel LSTM + CNN
    with tf.name_scope("look_up_table_f"):
        inputs_words_f = tf.nn.embedding_lookup(word_vec, input_["sdp_words_index"])
        inputs_rels_f = tf.nn.embedding_lookup(rel_vec, input_["sdp_rels_index"])
        inputs_words_f = tf.nn.dropout(inputs_words_f, keep_prob)
        inputs_rels_f = tf.nn.dropout(inputs_rels_f, keep_prob)

    with tf.name_scope("lstm_f"):
        words_lstm_rst_f = lstm_layer(inputs_words_f, length2(input_["sdp_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_f")
        rels_lstm_rst_f = lstm_layer(inputs_rels_f, length2(input_["sdp_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_f")
        tf.summary.histogram("words_lstm_rst_f", words_lstm_rst_f)
        tf.summary.histogram("rels_lstm_rst_f", rels_lstm_rst_f)

    with tf.name_scope("conv_max_pool_f"):
        conv_output_f = conv_layer(words_lstm_rst_f, rels_lstm_rst_f, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_f")
        pool_output_f = pool_layer(conv_output_f, config)
        tf.summary.histogram("conv_output_f", conv_output_f)
        tf.summary.histogram("pool_output_f", pool_output_f)

    # backward direction: two-channel LSTM + CNN
    with tf.name_scope("look_up_table_b"):
        inputs_words_b = tf.nn.embedding_lookup(word_vec, input_["sdp_rev_words_index"])
        inputs_rels_b = tf.nn.embedding_lookup(rel_vec, input_["sdp_rev_rels_index"])
        inputs_words_b = tf.nn.dropout(inputs_words_b, keep_prob)
        inputs_rels_b = tf.nn.dropout(inputs_rels_b, keep_prob)

    with tf.name_scope("lstm_b"):
        words_lstm_rst_b = lstm_layer(inputs_words_b, length2(input_["sdp_rev_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_b")
        rels_lstm_rst_b = lstm_layer(inputs_rels_b, length2(input_["sdp_rev_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_b")
        tf.summary.histogram("words_lstm_rst_b", words_lstm_rst_b)
        tf.summary.histogram("rels_lstm_rst_b", rels_lstm_rst_b)

    with tf.name_scope("conv_max_pool_b"):
        conv_output_b = conv_layer(words_lstm_rst_b, rels_lstm_rst_b, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_b")
        pool_output_b = pool_layer(conv_output_b, config)
        tf.summary.histogram("conv_output_b", conv_output_b)
        tf.summary.histogram("pool_output_b", pool_output_b)

    with tf.name_scope("softmax"):
        # concatenate the forward and backward pooled outputs
        pool_concat = tf.concat([pool_output_f, pool_output_b], 1)
        logits_f, hypothesis_f = softmax_layer(pool_output_f, config.conv_out_size, 19, "softmax_f")
        logits_b, hypothesis_b = softmax_layer(pool_output_b, config.conv_out_size, 19, "softmax_b")
        logits_concat, hypothesis_concat = softmax_layer(pool_concat, 2*(config.conv_out_size), 10, "softmax_concat")

    # L2 regularization
    regularizers = 0
    vars = tf.get_collection(l2_collection_name)
    for var in vars:
        regularizers += tf.nn.l2_loss(var)

    # loss function: forward + backward (+ optional coarse-grained) cross-entropy
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits_f, labels=input_["label_fb"])
    loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_b, labels=input_["label_fb"])
    if config.has_corase_grained:
        loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_concat, labels=input_["label_concat"])
    loss_avg = tf.reduce_mean(loss) + config.l2 * regularizers

    # gradient clip
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss_avg, tvars), config.grad_clip)
    #train_op = tf.train.AdamOptimizer(config.lr)
    train_op = tf.train.AdadeltaOptimizer(config.lr)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    # prediction: combine the forward and backward softmax outputs
    prediction = get_prediction(hypothesis_f, hypothesis_b, config.alpha)
    correct_prediction = tf.equal(prediction, tf.argmax(input_["label_fb"], 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    loss_summary = tf.summary.scalar("loss", loss_avg)
    accuracy_summary = tf.summary.scalar("accuracy_summary", accuracy)

    grad_summaries = []
    for g, v in zip(grads, tvars):
        if g is not None:
            grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
            sparsity_summary = tf.summary.histogram("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
            grad_summaries.append(grad_hist_summary)
            grad_summaries.append(sparsity_summary)
    grad_summaries_merged = tf.summary.merge(grad_summaries)
    summary = tf.summary.merge_all()

    return loss_avg, accuracy, prediction, optimizer, summary
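The get_prediction helper called above is also not shown; a minimal sketch consistent with the prediction formula y_test = α·y_forward + (1−α)·y_backward could look like the following (whether the backward class indices need remapping depends on how label_fb was built, which is not visible here).

import tensorflow as tf

def get_prediction(hypothesis_f, hypothesis_b, alpha):
    # weighted combination of the forward and backward softmax outputs
    combined = alpha * hypothesis_f + (1.0 - alpha) * hypothesis_b
    return tf.argmax(combined, axis=1)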