CS224n - Assignment 3 - 2. Recurrent Neural Networks for Named Entity Recognition

Given the input, each RNN cell computes a hidden state vector using the sigmoid function. We then use the hidden state to predict the output at each time step:

\begin{array}{l}e^{(t)}=x^{(t)}L\\h^{(t)}=\sigma(h^{(t-1)}W_h+e^{(t)}W_x+b_1)\\\widehat y^{(t)}=\mathrm{softmax}(h^{(t)}U+b_2)\end{array}
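
For intuition, here is a minimal NumPy sketch of a single time step under row-vector conventions; the sizes D, H, C, the random weights, and the sigmoid/softmax helpers are illustrative and not part of the assignment code:

import numpy as np

D, H, C = 5, 4, 3                      # embedding size, hidden size, number of tag classes (illustrative)
rng = np.random.RandomState(0)
W_x, W_h, b1 = rng.randn(D, H), rng.randn(H, H), np.zeros(H)
U, b2 = rng.randn(H, C), np.zeros(C)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

e_t = rng.randn(D)                     # stands in for e^{(t)} = x^{(t)} L
h_prev = np.zeros(H)                   # h^{(0)} = 0
h_t = sigmoid(h_prev @ W_h + e_t @ W_x + b1)
y_hat_t = softmax(h_t @ U + b2)        # predicted distribution over the C classes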

To train the model, we use a cross-entropy loss for every predicted token:

\begin{array}{l}J=CE(y^{(t)},\widehat y^{(t)})\\CE(y^{(t)},\widehat y^{(t)})=-\sum_iy_i^{(t)}\log(\widehat y_i^{(t)})\end{array}
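
For a quick worked example, if the true tag is the second class, so y^{(t)}=(0,1,0), and the model predicts \widehat y^{(t)}=(0.2,0.7,0.1), then

CE(y^{(t)},\widehat y^{(t)})=-\log(0.7)\approx 0.357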

(a) i. RNN model: W_x has D\times H parameters and W_h has H\times H parameters

Window-based model: W has (2\omega+1)D\times H parameters

ii. The time complexity of predicting the labels of a length-T sentence is O((D+H)HT) (a numeric sketch follows the list below):

  • e^{(t)}: O(D)
  • h^{(t)}: O(H^2+DH+H)
  • \widehat y^{(t)}: O(HC+C)
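
A small sketch that plugs example sizes into the formulas above; the numbers D=50, H=200, C=5, w=1, T=30 are made up for illustration:

D, H, C, w, T = 50, 200, 5, 1, 30      # illustrative sizes, not from the assignment

rnn_params = D * H + H * H             # W_x and W_h
window_params = (2 * w + 1) * D * H    # W in the window-based model
print(rnn_params, window_params)       # 50000 vs 30000: the RNN trades the 2wDH window columns for an HxH recurrence matrix

# dominant per-sentence cost, matching O((D + H) H T)
ops = T * (D * H + H * H)
print(ops)                             # 1500000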

(b) It is hard to optimize F1 directly:

  • F1 is not differentiable
  • It is computed from predictions over the entire corpus, which makes batching and parallelization very difficult

(c) Implement the RNN cell in q2_rnn_cell.py and run it:

import numpy as np
import tensorflow as tf

class RNNCell(tf.nn.rnn_cell.RNNCell):
    """A basic RNN cell: new_state = sigmoid(state W_h + inputs W_x + b)."""
    def __init__(self, input_size, state_size):
        self.input_size = input_size
        self._state_size = state_size

    @property
    def state_size(self):
        return self._state_size

    @property
    def output_size(self):
        return self._state_size

    def __call__(self, inputs, state, scope=None):
        """Updates the state using the previous @state and @inputs."""
        scope = scope or type(self).__name__

        with tf.variable_scope(scope):
            W_x = tf.get_variable('W_x', shape=(self.input_size, self._state_size), dtype=tf.float32,
                                  initializer=tf.contrib.layers.xavier_initializer())
            W_h = tf.get_variable('W_h', shape=(self._state_size, self._state_size), dtype=tf.float32,
                                  initializer=tf.contrib.layers.xavier_initializer())
            b = tf.get_variable('b', shape=(self._state_size,), dtype=tf.float32,
                                initializer=tf.contrib.layers.xavier_initializer())
            new_state = tf.nn.sigmoid(tf.matmul(state, W_h) + tf.matmul(inputs, W_x) + b)
        # For this cell, the output is simply the new hidden state.
        output = new_state
        return output, new_state

def test_rnn_cell():
    with tf.Graph().as_default():
        with tf.variable_scope("test_rnn_cell"):
            x_placeholder=tf.placeholder(tf.float32,shape=(None,3))
            h_placeholder=tf.placeholder(tf.float32,shape=(None,2))

            with tf.variable_scope("rnn"):
                tf.get_variable("W_x",initializer=np.array(np.eye(3,2),dtype=np.float32))
                tf.get_variable("W_h",initializer=np.array(np.eye(2,2),dtype=np.float32))
                tf.get_variable("b",initializer=np.array(np.ones(2),dtype=np.float32))

            tf.get_variable_scope().reuse_variables()
            cell=RNNCell(3,2)
            y_var,ht_var=cell(x_placeholder,h_placeholder,scope="rnn")

            init=tf.global_variables_initializer()
            with tf.Session() as session:
                session.run(init)
                x=np.array([
                    [0.4,0.5,0.6],
                    [0.3,-0.2,-0.1]],dtype=np.float32)
                h=np.array([
                    [0.2,0.5],
                    [-0.3,-0.3]],dtype=np.float32)
                y=np.array([
                    [0.832,0.881],
                    [0.731,0.622]],dtype=np.float32)
                ht=y

                y_,ht_=session.run([y_var,ht_var],feed_dict={x_placeholder:x,h_placeholder:h})
                print("y_ = "+str(y_))
                print("ht_ = "+str(ht_))

                assert np.allclose(y_,ht_), "output and state should be equal."
                assert np.allclose(ht,ht_,atol=1e-2),"new state vector does not seem to be correct."

(d) Implementing the RNN requires us to unroll the computation over the whole sentence. However, sentences can be of arbitrary length, so the RNN would be unrolled a different number of times for different sentences, which makes it impossible to batch the data. The most common way to solve this problem is to zero-pad the input: assuming the maximum sentence length of the input is M, for an input of length T:

  1. Append "0-vectors" to x and y to make them of length M
  2. Create a mask vector {(m^{(t)})}_{t=1}^M which is 1 for t\leq T and 0 for t>T
  3. Loss function: J=\sum_{t=1}^M m^{(t)} CE(y^{(t)},\widehat y^{(t)})

i. By masking the loss, we zero out the loss (and hence the gradients) contributed by these extra 0-labels.
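
A tiny NumPy illustration of the masking (the per-step losses are made-up numbers):

import numpy as np

ce = np.array([0.4, 0.7, 0.2, 1.3, 1.1])   # per-step CE losses; the last two steps are padding
mask = np.array([1, 1, 1, 0, 0])           # m^{(t)}: 1 for real tokens, 0 for padding
J = np.sum(mask * ce)                      # 1.3; padded steps contribute nothing to J or its gradients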

ii. Implement pad_sequences and run the check:

def pad_sequences(data, max_length):
    """Ensures each (sentence, labels) pair in @data has length @max_length by
    padding with zero vectors and a null label, and returns a boolean mask."""
    ret = []

    # Use this zero vector when padding sequences.
    zero_vector = [0] * Config.n_features
    zero_label = 4  # label index used for padded positions

    for sentence, labels in data:
        len_sequence = len(sentence)
        add_length = max_length - len_sequence
        if add_length > 0:
            # Pad short sentences and mark the padded positions as False.
            filled_sentence = sentence + ([zero_vector] * add_length)
            filled_labels = labels + ([zero_label] * add_length)
            mask = [True] * len_sequence
            mask.extend([False] * add_length)
        else:
            # Truncate sentences longer than max_length.
            mask = [True] * max_length
            filled_sentence = sentence[:max_length]
            filled_labels = labels[:max_length]

        ret.append((filled_sentence, filled_labels, mask))
    return ret
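
A quick hedged usage check; it assumes Config.n_features == 1 (each token is a single feature id) and that 4 is the null label index:

# Hypothetical example: one sentence of length 3, padded to max_length 4.
data = [([[1], [2], [3]], [0, 1, 4])]
print(pad_sequences(data, max_length=4))
# [([[1], [2], [3], [0]], [0, 1, 4, 4], [True, True, True, False])]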

(e) Implementing the RNN model

  • Implement the add_placeholders, add_embedding, and add_training_op functions
  • Implement the add_prediction_op operation, which unrolls the RNN loop self.max_length times; from the second step on, reuse variables within the variable scope so that the weights W_x and W_h are shared
  • Implement add_loss_op to handle the mask vector returned in the previous part
class RNNModel(NERModel):
    def add_placeholders(self):
        "TODO: Add these placeholders to self as the instance variables"
        self.input_placeholder=tf.placeholder(
            tf.int32,shape=(None,self.max_length,Config.n_features),name='input')
        self.labels_placeholder=tf.placeholder(
            tf.int32,shape=(None,self.max_length),name='labels')
        self.mask_placeholder=tf.placeholder(
            tf.bool,shape=(None,self.max_length),name='mask')
        self.dropout_placeholder=tf.placeholder(
            tf.float32,name='dropout')

    def create_feed_dict(self, inputs_batch, mask_batch, labels_batch=None, dropout=1):
        feed_dict={
            self.input_placeholder:inputs_batch,
            self.mask_placeholder:mask_batch,
            self.dropout_placeholder:dropout
        }
        if labels_batch is not None:
            feed_dict[self.labels_placeholder]=labels_batch
        return feed_dict

    def add_embedding(self):
        """Adds an embedding layer that maps from input tokens (integers) to vectors and
        then concatenates those vectors"""
        embeddings=tf.nn.embedding_lookup(
            tf.Variable(self.pretrained_embeddings),
            self.input_placeholder)
        embeddings=tf.reshape(
            embeddings,[-1,self.max_length,Config.n_features*Config.embed_size])
        return embeddings

    def add_prediction_op(self):
        """Adds the unrolled RNN:
        h_0=0
        for t in 1 to T:
            o_t,h_t=cell(x_t,h_{t-1})
            o_drop_t=Dropout(o_t,dropout_rate)
            y_t=o_drop_t U+b_2"""
        x=self.add_embedding()
        dropout_rate=self.dropout_placeholder

        preds=[]
        if self.config.cell=="rnn":
            cell=RNNCell(Config.n_features*Config.embed_size,Config.hidden_size)
        else:
            raise ValueError("Unsupported cell type"+self.config.cell)

        with tf.variable_scope('Layer1'):
            U=tf.get_variable('U',(Config.hidden_size,Config.n_classes),initializer=tf.contrib.layers.xavier_initializer())
            b2=tf.get_variable('b2',(Config.n_classes),initializer=tf.constant_initializer(0))

        input_shape=tf.shape(x)
        state=tf.zeros((input_shape[0],Config.hidden_size))
        with tf.variable_scope("RNN"):
            for time_step in range(self.max_length):
                if time_step>0:
                    tf.get_variable_scope().reuse_variables()
                o,state=cell(x[:,time_step,:],state,scope="RNN")
                o_drop=tf.nn.dropout(o,dropout_rate)
                output=tf.matmul(o_drop,U)+b2
                preds.append(output)

        preds=tf.stack(preds,axis=1)

        assert preds.get_shape().as_list()==[None,self.max_length,self.config.n_classes],\
            "predictions are not of the right shape. Expected {}, got {}"\
            .format([None,self.max_length,self.config.n_classes],preds.get_shape().as_list())
        return preds

    def add_loss_op(self, preds):
        """Adds Ops for the loss function to the computational graph."""
        masked_logits=tf.boolean_mask(preds,self.mask_placeholder)
        masked_labels=tf.boolean_mask(self.labels_placeholder,self.mask_placeholder)
        loss=tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(logits=masked_logits,
                                                           labels=masked_labels)
        )
        return loss

    def add_training_op(self, loss):
        train_op=tf.train.AdamOptimizer(Config.lr).minimize(loss)
        return train_op

    def preprocess_sequence_data(self, examples):
        def featurize_windows(data, start, end, window_size=1):
            """Use the input sequences in @data to construct new windowed data points."""
            ret=[]
            for sentence,labels in data:
                from util import window_iterator
                sentence_=[]
                for window in window_iterator(sentence,window_size,beg=start,end=end):
                    sentence_.append(sum(window,[]))
                ret.append((sentence_,labels))
            return ret
        examples=featurize_windows(examples,self.helper.START,self.helper.END)
        return pad_sequences(examples,self.max_length)

    def consolidate_predictions(self, examples_raw, examples, preds):
        "Batch the predictions into groups of sequence length"
        assert len(examples_raw)==len(examples)
        assert len(examples_raw)==len(preds)
        ret=[]
        for i,(sentence,labels) in enumerate(examples_raw):
            _,_,mask=examples[i]
            labels_=[l for l,m in zip(preds[i],mask) if m]
            assert len(labels_)==len(labels)
            ret.append([sentence,labels,labels_])
        return ret

    def predict_on_batch(self, sess, inputs_batch,mask_batch):
        feed=self.create_feed_dict(inputs_batch=inputs_batch,mask_batch=mask_batch)
        predictions=sess.run(tf.argmax(self.pred,axis=2),feed_dict=feed)
        return predictions

    def train_on_batch(self, sess, inputs_batch, labels_batch,mask_batch):
        feed=self.create_feed_dict(inputs_batch,labels_batch=labels_batch,mask_batch=mask_batch,
                                   dropout=Config.dropout)
        _,loss=sess.run([self.train_op,self.loss],feed_dict=feed)
        return loss

    def __init__(self,helper,config,pretrained_embedding,report=None):
        super(RNNModel,self).__init__(helper,config,report)
        self.max_length=min(Config.max_length,helper.max_length)
        Config.max_length=self.max_length
        self.pretrained_embeddings=pretrained_embedding

        self.input_placeholder=None
        self.labels_placeholder=None
        self.mask_placeholder=None
        self.dropout_placeholder=None

        self.build()

Test the implementation with python q2_rnn.py test2.

(f) Train the model with python q2_rnn.py train: it takes about two hours on a CPU and 10-20 minutes on a GPU.

The model and output will be saved in results/rnn/<timestamp>/, where <timestamp> is the date and time the program was run. The file results.txt contains the formatted output of the model's predictions on the dev set, and the file log contains the printed output, such as the confusion matrices and F1 scores computed during training.

Finally, you can interact with the model via: python q2_rnn.py shell -m results/rnn/<timestamp>/

(g) Limitations of the RNN model

  • The model cannot see the future when making a prediction. For example, in "New York State University", the first word "New" may be labeled "LOC" (as in New York). A biRNN addresses this (see the sketch after this list).
  • The model does not enforce that adjacent tokens share the same label (as the example above illustrates). Introducing pairwise agreement into the loss (i.e. using a CRF loss) would address this.
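
To illustrate the biRNN idea from the first point, here is a minimal NumPy sketch (sizes and random weights are illustrative only): run one RNN left-to-right and another right-to-left, then concatenate the two hidden states, so the representation at step t also sees the words after t.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H, T = 5, 4, 6                                # illustrative sizes
rng = np.random.RandomState(0)
Wx_f, Wh_f = rng.randn(D, H), rng.randn(H, H)    # forward-direction weights
Wx_b, Wh_b = rng.randn(D, H), rng.randn(H, H)    # backward-direction weights
x = rng.randn(T, D)                              # a sentence of T embedded tokens

h_f, h_b = np.zeros((T, H)), np.zeros((T, H))
prev = np.zeros(H)
for t in range(T):                               # left-to-right pass
    prev = sigmoid(prev @ Wh_f + x[t] @ Wx_f)
    h_f[t] = prev
prev = np.zeros(H)
for t in reversed(range(T)):                     # right-to-left pass
    prev = sigmoid(prev @ Wh_b + x[t] @ Wx_b)
    h_b[t] = prev

h = np.concatenate([h_f, h_b], axis=1)           # shape (T, 2H): each step sees both contexts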

 
