CS224d Problem Set 2 (Part 3): Implementing a Language Model with an RNNLM to Predict the Next Word

 

This post covers the third part of CS224d Problem Set 2.

It turns out coursework at universities abroad really is this demanding; by comparison, I'll just keep quietly grinding away at home.

The math behind RNN language modeling is as follows:

 

Given a sequence of words $x_1, x_2, \ldots, x_t$, the word $x_{t+1}$ that immediately follows is modeled as:

$$P(x_{t+1}=v_j \mid x_t,\ldots,x_1) = \hat{y}_j^{(t)}$$

where $v_j$ is a word in the vocabulary. We implement a recurrent neural network whose hidden-layer feedback models the "history" $x_1, x_2, \ldots, x_t$:

$$e^{(t)} = x^{(t)}L$$
$$h^{(t)} = \mathrm{sigmoid}\left(h^{(t-1)}H + e^{(t)}I + b_1\right)$$
$$\hat{y}^{(t)} = \mathrm{softmax}\left(h^{(t)}U + b_2\right)$$

$h^{(0)}=h_{0}\in\mathbb{R}^{D_h}$ is the initial hidden-layer vector.

$x^{(t)}L$ is the product of the one-hot row vector $x^{(t)}$ and the embedding matrix $L$; the one-hot row vector encodes the index of the word currently being processed.
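To make the one-hot product concrete, here is a minimal NumPy sketch (the matrix sizes are toy values, not from the assignment) showing that multiplying a one-hot row vector by $L$ is identical to selecting a row of $L$:

```python
import numpy as np

V, d = 5, 3                                      # toy vocabulary size and embedding dim
L = np.arange(V * d, dtype=float).reshape(V, d)  # toy embedding matrix

idx = 2                                          # index of the current word
x = np.zeros(V)                                  # build its one-hot row vector
x[idx] = 1.0

e = x @ L                                        # e^(t) = x^(t) L
assert np.allclose(e, L[idx])                    # identical to a plain row lookup
# e == [6., 7., 8.]
```

This is why, in the TensorFlow code below, `tf.nn.embedding_lookup` can replace the explicit matrix product.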

            

$L$ is the word embedding matrix

$I$ is the input word representation matrix

$H$ is the hidden transition matrix

$U$ is the output word representation matrix

$b_{1}$ and $b_{2}$ are the bias vectors

$d$ is the dimension of the word embeddings

$|V|$ is the size of the vocabulary

$D_{h}$ is the dimension of the hidden layer
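Putting the quantities above together, one forward pass of the model can be sketched in NumPy (the dimensions and random initialization are illustrative assumptions; shapes follow the definitions above):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, Dh = 8, 4, 5                   # toy |V|, embedding dim, hidden dim

L = rng.normal(size=(V, d))          # word embedding matrix
I = rng.normal(size=(d, Dh))         # input word representation matrix
H = rng.normal(size=(Dh, Dh))        # hidden transition matrix
U = rng.normal(size=(Dh, V))         # output word representation matrix
b1 = np.zeros(Dh)
b2 = np.zeros(V)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

h = np.zeros(Dh)                     # h^(0)
for word_idx in [3, 1, 6]:           # a toy "history" x_1, x_2, x_3
    e_t = L[word_idx]                # e^(t) = x^(t) L
    h = sigmoid(h @ H + e_t @ I + b1)
    y_hat = softmax(h @ U + b2)      # distribution over the vocabulary

assert y_hat.shape == (V,) and np.isclose(y_hat.sum(), 1.0)
```

Each iteration of the loop is one RNN time step; the final `y_hat` is the predicted distribution for the next word.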

The output vector $\hat{y}^{(t)}\in\mathbb{R}^{|V|}$ is a probability distribution over the entire vocabulary, and we optimize the (unregularized) cross-entropy loss:

$$J^{(t)}(\theta) = -\sum_{j=1}^{|V|} y_{j}^{(t)}\log \hat{y}_{j}^{(t)}$$

where $y^{(t)}$ is the one-hot vector corresponding to the target word $x_{t+1}$. We evaluate the language model's performance with perplexity, defined as:

$$PP^{(t)}\left(y^{(t)},\hat{y}^{(t)}\right) = \frac{1}{\sum_{j=1}^{|V|} y_{j}^{(t)} \cdot \hat{y}_{j}^{(t)}}$$
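Because the target is one-hot, per-step perplexity is just the reciprocal of the probability assigned to the true next word, and the corpus-level perplexity is the exponential of the mean cross-entropy (which is exactly what `run_epoch` below reports via `np.exp(np.mean(total_loss))`). A small sketch with made-up probabilities:

```python
import numpy as np

# model's predicted probabilities for the true next word at each step (toy values)
p_true = np.array([0.25, 0.1, 0.5])

step_pp = 1.0 / p_true                        # per-step perplexity
corpus_pp = np.exp(np.mean(-np.log(p_true)))  # exp of mean cross-entropy

# corpus perplexity is the geometric mean of the per-step perplexities
assert np.isclose(corpus_pp, step_pp.prod() ** (1.0 / len(step_pp)))
```

Lower perplexity means the model spreads less probability mass away from the words that actually occur.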

Gradients:

The gradients used when iteratively optimizing each variable in the model follow from backpropagating the loss through the equations above; for the output layer, writing $\delta^{(t)} = \hat{y}^{(t)} - y^{(t)}$,

$$\frac{\partial J^{(t)}}{\partial U} = h^{(t)\top}\delta^{(t)}, \qquad \frac{\partial J^{(t)}}{\partial b_{2}} = \delta^{(t)}$$

and the gradients for $H$, $I$, $L$, and $b_{1}$ are obtained by propagating $\delta^{(t)}$ back through the hidden layer.

Training proceeds as follows: initialize all of the trainable parameters listed above; then, for each training word, compute each parameter's derivative according to the formulas above; update the parameters with gradient descent; and plug the new parameters back into the model. If the loss falls below a preset value, stop iterating; otherwise continue.
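The loop just described can be sketched for the output layer alone. This is a minimal NumPy illustration, not the assignment's full training code: the hidden state is held fixed, only $U$ and $b_2$ are updated, and the learning rate and stopping threshold are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
Dh, V = 5, 8                              # toy hidden dim and vocab size
h = rng.normal(size=Dh)                   # a fixed hidden state, for illustration
target = 3                                # index of the true next word
U = rng.normal(size=(Dh, V)) * 0.1        # initialize the trainable parameters
b2 = np.zeros(V)
lr = 0.5                                  # made-up learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

loss = None
for _ in range(500):
    y_hat = softmax(h @ U + b2)
    loss = -np.log(y_hat[target])         # cross-entropy with a one-hot target
    delta = y_hat.copy()
    delta[target] -= 1.0                  # dJ/d(logits) = y_hat - y
    U -= lr * np.outer(h, delta)          # dJ/dU  = h^T (y_hat - y)
    b2 -= lr * delta                      # dJ/db2 = y_hat - y
    if loss < 0.01:                       # stop once loss is below a preset value
        break

assert loss < 0.01
```

In the real model, TensorFlow's `AdamOptimizer` computes these gradients for all parameters automatically.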

 


Below is a diagram of the RNNLM architecture.

The figure above shows the structure of a second-layer RNN node.

The figure above shows dropout applied to the RNN's variables to reduce overfitting error; it is the dropout structure of the first RNN layer.

The figure above shows the structure of the first RNN layer.

(Heads up: a wall of arcane code is about to hit.)

'''
Created on 2017年9月26日

@author: weizhen
'''
import getpass
import sys
import time
import numpy as np
from copy import deepcopy
from utils import calculate_perplexity, get_ptb_dataset, Vocab
from utils import ptb_iterator, sample
import tensorflow as tf
from model import LanguageModel
from tensorflow.contrib.legacy_seq2seq.python.ops.seq2seq import sequence_loss


class Config(object):
    """Holds model hyperparameters and data information."""
    batch_size = 64
    embed_size = 50
    hidden_size = 100
    num_steps = 10
    max_epochs = 16
    early_stopping = 2
    dropout = 0.9
    lr = 0.001


class RNNLM_Model(LanguageModel):
    def load_data(self, debug=False):
        """Load the vocabulary and the train/dev/test data."""
        self.vocab = Vocab()
        self.vocab.construct(get_ptb_dataset('train'))
        self.encoded_train = np.array([self.vocab.encode(word) for word in get_ptb_dataset('train')], dtype=np.int32)
        self.encoded_valid = np.array([self.vocab.encode(word) for word in get_ptb_dataset('valid')], dtype=np.int32)
        self.encoded_test = np.array([self.vocab.encode(word) for word in get_ptb_dataset('test')], dtype=np.int32)
        if debug:
            num_debug = 1024
            self.encoded_train = self.encoded_train[:num_debug]
            self.encoded_valid = self.encoded_valid[:num_debug]
            self.encoded_test = self.encoded_test[:num_debug]

    def add_placeholders(self):
        """Generate placeholder variables to represent the input tensors.

        These placeholders are used as inputs by the rest of the model
        and will be fed data during training.

            input_placeholder: Input placeholder, shape (None, num_steps), type tf.int32
            labels_placeholder: Labels placeholder, shape (None, num_steps), type tf.int32
            dropout_placeholder: Dropout value placeholder (scalar), type tf.float32
        """
        self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Input')
        self.labels_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Target')
        self.dropout_placeholder = tf.placeholder(tf.float32, name='Dropout')

    def add_embedding(self):
        """Add an embedding layer.

        Hint: this layer should use input_placeholder to index into the embedding.
        Hint: you may find tf.nn.embedding_lookup useful.
        Hint: you may find tf.split and tf.squeeze useful when constructing the input tensors.
        Hint: this is the shape of the variable you need to create:
                L: (len(self.vocab), embed_size)
        Returns:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)
        tf.split(value, num_or_size_splits, axis):
                axis is the dimension of the input tensor to split along
                (0 means split along the 0th dimension), and
                num_or_size_splits is the number of pieces to split into
                (2 means the input tensor is split into 2 sub-tensors,
                returned as a list).
        tf.squeeze(input, squeeze_dims=None, name=None):
                removes all dimensions of size 1 from a tensor.
                example: t is a tensor of shape [1,2,1,3,1,1]
                        shape(squeeze(t))==>[2,3]
                        t is a tensor of shape [1,2,1,3,1,1]
                        shape(squeeze(t,[2,4]))==>[1,2,3,1]
        tf.nn.embedding_lookup maps word indices to word vectors.
        """
        with tf.device('/cpu:0'):
            embedding = tf.get_variable('Embedding', [len(self.vocab), self.config.embed_size], trainable=True)
            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
            return inputs

    def add_projection(self, rnn_outputs):
        """Add a projection layer.

        The projection layer transforms the hidden representation into a
        distribution over the entire vocabulary.
        Hint: these are the shapes of the variables you need to create:
            U: (hidden_size, len(vocab))
            b_2: (len(vocab),)
        Args:
            rnn_outputs: a list of length num_steps, where each element is
                        a tensor of shape (batch_size, hidden_size)
        Returns:
            outputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, len(vocab))
        """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs
    
    def add_loss_op(self, output):
        """Add the loss to the objective.

        Hint: use sequence_loss (imported above from
              tensorflow.contrib.legacy_seq2seq) to implement sequence loss.
        Args:
            output: a tensor of shape (None, self.vocab)
        Returns:
            loss: a 0-d tensor (scalar)
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output], [tf.reshape(self.labels_placeholder, [-1])], all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss
        
        
    def add_training_op(self, loss):
        """Set up the training operation.

        Create an optimizer and apply gradient descent to all trainable variables.
        Hint: use tf.train.AdamOptimizer for this model.
              Calling optimizer.minimize() will return a train_op object.
        Args:
            loss: loss tensor, from the cross-entropy loss
        Returns:
            train_op: the training operation
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word
        # We cast o to float64 because otherwise numerical issues arise:
        # sum(output of softmax) = 1.00000298179 rather than 1
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs into rows of size len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """Create the RNN LM model.

        In the implementation below you need to implement the RNN LM equations.
        Hint: use a zero vector of shape (batch_size, hidden_size) as the
              initial state of the RNN.
        Hint: store the last RNN output as the instance variable
            self.final_state
        Hint: make sure dropout is applied to both the input and output variables.
        Hint: use the variable scope 'RNN' to define the RNN variables.
        Hint: use an explicit for-loop over the inputs.
                You can use scope.reuse_variables() to ensure that the
                weights are the same on every iteration.
                Make sure you do not call it on the first iteration,
                since no variables would have been initialized yet.
        Hint: these are the shapes of the variables you need to create:

            H: (hidden_size, hidden_size)
            I: (embed_size, hidden_size)
            b_1: (hidden_size,)
        Args:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)
        Returns:
            outputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, hidden_size)
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]

        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]

        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We need to feed in the initial state and extract the final state
            # to supply the RNN with the proper history
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp
                    }
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))

def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """Generate text from the model automatically.

    Hint: create a feed-dictionary and use sess.run() to execute the model.
          You will need to use model.initial_state as a key in feed_dict.
    Hint: fetch model.final_state and model.predictions[-1].
          model.final_state is set in add_model();
          model.predictions is set in __init__().
    Hint: store the fetched state and the predicted y_pred between steps.
    Args:
        session : tf.Session() object
        model : object of type RNNLM_Model
        config : a Config() object
        starting_text : initial text passed to the model
    Returns:
        output : list of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch size of one, length of len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output

def generate_sentence(session, model, config, *args, **kwargs):
    """Convenience wrapper to generate a sentence from the model."""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)

def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generation model
    with tf.variable_scope('RNNLM', reuse=None) as scope:
        model = RNNLM_Model(config)
        # This instructs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()

            train_pp = model.run_epoch(session,
                                       model.encoded_train,
                                       train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))

        saver.restore(session, './ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            starting_text = input('> ')  # prompt for the next seed (Python 3)


if __name__ == "__main__":
    test_RNNLM()

(It's not really that arcane; it's much simpler than calculus, and tens of thousands of times simpler than real analysis.)

Below is the training log:

1380 / 1452 : pp = 266.20892333984375 
1390 / 1452 : pp = 265.94439697265625 
1400 / 1452 : pp = 265.66845703125 
1410 / 1452 : pp = 265.5393981933594 
1420 / 1452 : pp = 265.32489013671875 
1430 / 1452 : pp = 265.2019348144531 
1440 / 1452 : pp = 265.13720703125 
1450 / 1452 : pp = 264.954833984375 

0 / 115 : pp = 296.9217224121094 
10 / 115 : pp = 282.02130126953125 
20 / 115 : pp = 279.76800537109375 
30 / 115 : pp = 276.4101257324219 
40 / 115 : pp = 276.2939147949219 
50 / 115 : pp = 270.73565673828125 
60 / 115 : pp = 269.88134765625 
70 / 115 : pp = 266.8675231933594 
80 / 115 : pp = 263.6731872558594 
90 / 115 : pp = 260.8569030761719 
100 / 115 : pp = 256.3356628417969 
110 / 115 : pp = 255.1026611328125 
Training perplexity: 264.9092102050781
Validation perplexity:254.84902954101562
Total time : 41.65332388877869
Epoch 3

0 / 1452 : pp = 327.0847473144531 
10 / 1452 : pp = 273.9620056152344 
20 / 1452 : pp = 270.22943115234375 
30 / 1452 : pp = 263.5213317871094 
40 / 1452 : pp = 264.0644836425781 
50 / 1452 : pp = 258.6029968261719 
60 / 1452 : pp = 257.04290771484375 
70 / 1452 : pp = 257.59161376953125 
80 / 1452 : pp = 256.7600402832031 
90 / 1452 : pp = 254.5120391845703 
100 / 1452 : pp = 252.44725036621094 
110 / 1452 : pp = 250.13954162597656 
120 / 1452 : pp = 249.91647338867188 
130 / 1452 : pp = 249.50460815429688 
140 / 1452 : pp = 247.67440795898438 
150 / 1452 : pp = 247.19090270996094 
160 / 1452 : pp = 247.8919219970703 
170 / 1452 : pp = 247.54322814941406 
180 / 1452 : pp = 246.17623901367188 
190 / 1452 : pp = 245.78330993652344 
200 / 1452 : pp = 246.80552673339844 
210 / 1452 : pp = 246.3059844970703 
220 / 1452 : pp = 246.19021606445312 
230 / 1452 : pp = 246.70140075683594 
240 / 1452 : pp = 246.3099822998047 
250 / 1452 : pp = 245.1745147705078 
260 / 1452 : pp = 244.17384338378906 
270 / 1452 : pp = 242.57363891601562 
280 / 1452 : pp = 242.8500213623047 
290 / 1452 : pp = 243.0492706298828 
300 / 1452 : pp = 243.1466522216797 
310 / 1452 : pp = 242.89044189453125 
320 / 1452 : pp = 243.08045959472656 
330 / 1452 : pp = 243.32235717773438 
340 / 1452 : pp = 242.34715270996094 
350 / 1452 : pp = 242.80972290039062 
360 / 1452 : pp = 242.5345458984375 
370 / 1452 : pp = 242.0083465576172 
380 / 1452 : pp = 241.22708129882812 
390 / 1452 : pp = 241.24398803710938 
400 / 1452 : pp = 240.63473510742188 
410 / 1452 : pp = 240.94094848632812 
420 / 1452 : pp = 241.19717407226562 
430 / 1452 : pp = 240.8896026611328 
440 / 1452 : pp = 240.7772979736328 
450 / 1452 : pp = 240.45913696289062 
460 / 1452 : pp = 240.06674194335938 
470 / 1452 : pp = 239.42198181152344 
480 / 1452 : pp = 238.39271545410156 
490 / 1452 : pp = 238.0517120361328 
500 / 1452 : pp = 237.31752014160156 
510 / 1452 : pp = 237.1197967529297 
520 / 1452 : pp = 236.64865112304688 
530 / 1452 : pp = 236.004638671875 
540 / 1452 : pp = 235.192626953125 
550 / 1452 : pp = 234.6700439453125 
560 / 1452 : pp = 234.1914825439453 
570 / 1452 : pp = 233.80899047851562 
580 / 1452 : pp = 233.3753662109375 
590 / 1452 : pp = 232.8699188232422 
600 / 1452 : pp = 232.2629852294922 
610 / 1452 : pp = 231.8668212890625 
620 / 1452 : pp = 231.478515625 
630 / 1452 : pp = 231.0444793701172 
640 / 1452 : pp = 231.2737579345703 
650 / 1452 : pp = 231.28114318847656 
660 / 1452 : pp = 231.4324951171875 
670 / 1452 : pp = 231.48513793945312 
680 / 1452 : pp = 231.45932006835938 
690 / 1452 : pp = 231.17738342285156 
700 / 1452 : pp = 231.00570678710938 
710 / 1452 : pp = 231.03810119628906 
720 / 1452 : pp = 230.96131896972656 
730 / 1452 : pp = 230.91110229492188 
740 / 1452 : pp = 231.13539123535156 
750 / 1452 : pp = 231.04393005371094 
760 / 1452 : pp = 231.03489685058594 
770 / 1452 : pp = 231.19744873046875 
780 / 1452 : pp = 231.26625061035156 
790 / 1452 : pp = 231.38714599609375 
800 / 1452 : pp = 231.24441528320312 
810 / 1452 : pp = 231.16824340820312 
820 / 1452 : pp = 231.11831665039062 
830 / 1452 : pp = 231.34886169433594 
840 / 1452 : pp = 231.221923828125 
850 / 1452 : pp = 231.2562255859375 
860 / 1452 : pp = 231.26492309570312 
870 / 1452 : pp = 231.1961212158203 
880 / 1452 : pp = 231.30506896972656 
890 / 1452 : pp = 231.24728393554688 
900 / 1452 : pp = 231.15744018554688 
910 / 1452 : pp = 231.20175170898438 
920 / 1452 : pp = 231.25534057617188 
930 / 1452 : pp = 231.09461975097656 
940 / 1452 : pp = 231.12612915039062 
950 / 1452 : pp = 231.0475616455078 
960 / 1452 : pp = 230.86056518554688 
970 / 1452 : pp = 230.80377197265625 
980 / 1452 : pp = 230.4598846435547 
990 / 1452 : pp = 230.24559020996094 
1000 / 1452 : pp = 229.91030883789062 
1010 / 1452 : pp = 229.9349822998047 
1020 / 1452 : pp = 230.01470947265625 
1030 / 1452 : pp = 229.8909149169922 
1040 / 1452 : pp = 229.9403533935547 
1050 / 1452 : pp = 229.84815979003906 
1060 / 1452 : pp = 229.60377502441406 
1070 / 1452 : pp = 229.74647521972656 
1080 / 1452 : pp = 229.80410766601562 
1090 / 1452 : pp = 229.78733825683594 
1100 / 1452 : pp = 229.64549255371094 
1110 / 1452 : pp = 229.26255798339844 
1120 / 1452 : pp = 229.00262451171875 
1130 / 1452 : pp = 228.6716766357422 
1140 / 1452 : pp = 228.55067443847656 
1150 / 1452 : pp = 228.61563110351562 
1160 / 1452 : pp = 228.50958251953125 
1170 / 1452 : pp = 228.3498992919922 
1180 / 1452 : pp = 228.29786682128906 
1190 / 1452 : pp = 228.33204650878906 
1200 / 1452 : pp = 228.27369689941406 
1210 / 1452 : pp = 228.11831665039062 
1220 / 1452 : pp = 228.21775817871094 
1230 / 1452 : pp = 228.3170166015625 
1240 / 1452 : pp = 228.22134399414062 
1250 / 1452 : pp = 228.3769073486328 
1260 / 1452 : pp = 228.37527465820312 
1270 / 1452 : pp = 228.33694458007812 
1280 / 1452 : pp = 228.27108764648438 
1290 / 1452 : pp = 228.1731414794922 
1300 / 1452 : pp = 228.12200927734375 
1310 / 1452 : pp = 228.10275268554688 
1320 / 1452 : pp = 227.9289093017578 
1330 / 1452 : pp = 227.77723693847656 
1340 / 1452 : pp = 227.79623413085938 
1350 / 1452 : pp = 227.7408447265625 
1360 / 1452 : pp = 227.72586059570312 
1370 / 1452 : pp = 227.49728393554688 
1380 / 1452 : pp = 227.37940979003906 
1390 / 1452 : pp = 227.20166015625 
1400 / 1452 : pp = 227.018310546875 
1410 / 1452 : pp = 226.95651245117188 
1420 / 1452 : pp = 226.8065643310547 
1430 / 1452 : pp = 226.7261199951172 
1440 / 1452 : pp = 226.7193145751953 
1450 / 1452 : pp = 226.61068725585938 

0 / 115 : pp = 269.342041015625 
10 / 115 : pp = 255.03016662597656 
20 / 115 : pp = 253.8992919921875 
30 / 115 : pp = 251.04025268554688 
40 / 115 : pp = 250.51756286621094 
50 / 115 : pp = 245.3595428466797 
60 / 115 : pp = 244.4713897705078 
70 / 115 : pp = 241.2674560546875 
80 / 115 : pp = 238.3473663330078 
90 / 115 : pp = 235.56423950195312 
100 / 115 : pp = 231.2281036376953 
110 / 115 : pp = 229.8423614501953 
Training perplexity: 226.5760040283203
Validation perplexity:229.59939575195312
Total time : 42.202677726745605
Epoch 4

0 / 1452 : pp = 282.2423095703125 
10 / 1452 : pp = 240.16258239746094 
20 / 1452 : pp = 236.12203979492188 
30 / 1452 : pp = 230.3953857421875 
40 / 1452 : pp = 231.8789825439453 
50 / 1452 : pp = 227.26612854003906 
60 / 1452 : pp = 226.22061157226562 
70 / 1452 : pp = 227.01885986328125 
80 / 1452 : pp = 226.2459716796875 
90 / 1452 : pp = 224.3211669921875 
100 / 1452 : pp = 222.65615844726562 
110 / 1452 : pp = 220.70326232910156 
120 / 1452 : pp = 220.42288208007812 
130 / 1452 : pp = 219.8100128173828 
140 / 1452 : pp = 218.04432678222656 
150 / 1452 : pp = 217.31639099121094 
160 / 1452 : pp = 217.86349487304688 
170 / 1452 : pp = 217.46597290039062 
180 / 1452 : pp = 216.3349151611328 
190 / 1452 : pp = 216.12240600585938 
200 / 1452 : pp = 216.97842407226562 
210 / 1452 : pp = 216.51014709472656 
220 / 1452 : pp = 216.46751403808594 
230 / 1452 : pp = 216.80126953125 
240 / 1452 : pp = 216.45965576171875 
250 / 1452 : pp = 215.5008544921875 
260 / 1452 : pp = 214.62210083007812 
270 / 1452 : pp = 213.29183959960938 
280 / 1452 : pp = 213.5621337890625 
290 / 1452 : pp = 213.80657958984375 
300 / 1452 : pp = 213.8963165283203 
310 / 1452 : pp = 213.60653686523438 
320 / 1452 : pp = 213.85877990722656 
330 / 1452 : pp = 214.07345581054688 
340 / 1452 : pp = 213.25421142578125 
350 / 1452 : pp = 213.68019104003906 
360 / 1452 : pp = 213.41717529296875 
370 / 1452 : pp = 213.04920959472656 
380 / 1452 : pp = 212.39019775390625 
390 / 1452 : pp = 212.4908905029297 
400 / 1452 : pp = 212.01914978027344 
410 / 1452 : pp = 212.36903381347656 
420 / 1452 : pp = 212.6802520751953 
430 / 1452 : pp = 212.42697143554688 
440 / 1452 : pp = 212.42990112304688 
450 / 1452 : pp = 212.14524841308594 
460 / 1452 : pp = 211.7836151123047 
470 / 1452 : pp = 211.17282104492188 
480 / 1452 : pp = 210.27903747558594 
490 / 1452 : pp = 209.95211791992188 
500 / 1452 : pp = 209.28302001953125 
510 / 1452 : pp = 209.1029815673828 
520 / 1452 : pp = 208.73855590820312 
530 / 1452 : pp = 208.19700622558594 
540 / 1452 : pp = 207.4554443359375 
550 / 1452 : pp = 207.0062255859375 
560 / 1452 : pp = 206.59739685058594 
570 / 1452 : pp = 206.27874755859375 
580 / 1452 : pp = 205.87144470214844 
590 / 1452 : pp = 205.43545532226562 
600 / 1452 : pp = 204.90940856933594 
610 / 1452 : pp = 204.5686798095703 
620 / 1452 : pp = 204.22862243652344 
630 / 1452 : pp = 203.8448028564453 
640 / 1452 : pp = 204.06576538085938 
650 / 1452 : pp = 204.0941925048828 
660 / 1452 : pp = 204.22103881835938 
670 / 1452 : pp = 204.289794921875 
680 / 1452 : pp = 204.3115234375 
690 / 1452 : pp = 204.10284423828125 
700 / 1452 : pp = 203.99757385253906 
710 / 1452 : pp = 204.04971313476562 
720 / 1452 : pp = 204.03152465820312 
730 / 1452 : pp = 203.99046325683594 
740 / 1452 : pp = 204.19786071777344 
750 / 1452 : pp = 204.1642608642578 
760 / 1452 : pp = 204.19435119628906 
770 / 1452 : pp = 204.37786865234375 
780 / 1452 : pp = 204.4965057373047 
790 / 1452 : pp = 204.6479034423828 
800 / 1452 : pp = 204.56117248535156 
810 / 1452 : pp = 204.52284240722656 
820 / 1452 : pp = 204.50978088378906 
830 / 1452 : pp = 204.7531280517578 
840 / 1452 : pp = 204.64468383789062 
850 / 1452 : pp = 204.71348571777344 
860 / 1452 : pp = 204.7399444580078 
870 / 1452 : pp = 204.69406127929688 
880 / 1452 : pp = 204.7965850830078 
890 / 1452 : pp = 204.7594757080078 
900 / 1452 : pp = 204.71446228027344 
910 / 1452 : pp = 204.7590789794922 
920 / 1452 : pp = 204.85772705078125 
930 / 1452 : pp = 204.7428741455078 
940 / 1452 : pp = 204.8068389892578 
950 / 1452 : pp = 204.75791931152344 
960 / 1452 : pp = 204.63815307617188 
970 / 1452 : pp = 204.60760498046875 
980 / 1452 : pp = 204.34347534179688 
990 / 1452 : pp = 204.151611328125 
1000 / 1452 : pp = 203.8665771484375 
1010 / 1452 : pp = 203.9164581298828 
1020 / 1452 : pp = 204.0184783935547 
1030 / 1452 : pp = 203.95166015625 
1040 / 1452 : pp = 204.03045654296875 
1050 / 1452 : pp = 203.95846557617188 
1060 / 1452 : pp = 203.77114868164062 
1070 / 1452 : pp = 203.93260192871094 
1080 / 1452 : pp = 204.00048828125 
1090 / 1452 : pp = 204.00233459472656 
1100 / 1452 : pp = 203.8960418701172 
1110 / 1452 : pp = 203.5987548828125 
1120 / 1452 : pp = 203.38392639160156 
1130 / 1452 : pp = 203.08872985839844 
1140 / 1452 : pp = 203.01272583007812 
1150 / 1452 : pp = 203.0865936279297 
1160 / 1452 : pp = 203.02308654785156 
1170 / 1452 : pp = 202.9125518798828 
1180 / 1452 : pp = 202.9097442626953 
1190 / 1452 : pp = 202.98252868652344 
1200 / 1452 : pp = 202.95387268066406 
1210 / 1452 : pp = 202.851318359375 
1220 / 1452 : pp = 202.97671508789062 
1230 / 1452 : pp = 203.1051025390625 
1240 / 1452 : pp = 203.0526123046875 
1250 / 1452 : pp = 203.21417236328125 
1260 / 1452 : pp = 203.23617553710938 
1270 / 1452 : pp = 203.22802734375 
1280 / 1452 : pp = 203.20846557617188 
1290 / 1452 : pp = 203.15362548828125 
1300 / 1452 : pp = 203.14315795898438 
1310 / 1452 : pp = 203.15264892578125 
1320 / 1452 : pp = 203.02801513671875 
1330 / 1452 : pp = 202.92977905273438 
1340 / 1452 : pp = 202.95484924316406 
1350 / 1452 : pp = 202.9335479736328 
1360 / 1452 : pp = 202.955322265625 
1370 / 1452 : pp = 202.7740478515625 
1380 / 1452 : pp = 202.68569946289062 
1390 / 1452 : pp = 202.55816650390625 
1400 / 1452 : pp = 202.41651916503906 
1410 / 1452 : pp = 202.38494873046875 
1420 / 1452 : pp = 202.27593994140625 
1430 / 1452 : pp = 202.21826171875 
1440 / 1452 : pp = 202.23272705078125 
1450 / 1452 : pp = 202.16099548339844 

0 / 115 : pp = 253.23211669921875 
10 / 115 : pp = 237.62506103515625 
20 / 115 : pp = 237.60557556152344 
30 / 115 : pp = 234.9273223876953 
40 / 115 : pp = 234.30519104003906 
50 / 115 : pp = 229.43960571289062 
60 / 115 : pp = 228.6050567626953 
70 / 115 : pp = 225.2646484375 
80 / 115 : pp = 222.55935668945312 
90 / 115 : pp = 219.83255004882812 
100 / 115 : pp = 215.5491485595703 
110 / 115 : pp = 214.07937622070312 
Training perplexity: 202.1349639892578
Validation perplexity:213.85256958007812
Total time : 42.10724234580994
Epoch 5

0 / 1452 : pp = 255.92384338378906 
10 / 1452 : pp = 219.5322265625 
20 / 1452 : pp = 214.36212158203125 
30 / 1452 : pp = 209.12620544433594 
40 / 1452 : pp = 210.04193115234375 
50 / 1452 : pp = 205.77398681640625 
60 / 1452 : pp = 204.8201141357422 
70 / 1452 : pp = 205.3955841064453 
80 / 1452 : pp = 204.8386688232422 
90 / 1452 : pp = 203.21194458007812 
100 / 1452 : pp = 201.87643432617188 
110 / 1452 : pp = 200.10122680664062 
120 / 1452 : pp = 199.82012939453125 
130 / 1452 : pp = 199.11192321777344 
140 / 1452 : pp = 197.51919555664062 
150 / 1452 : pp = 197.03567504882812 
160 / 1452 : pp = 197.4231414794922 
170 / 1452 : pp = 197.09571838378906 
180 / 1452 : pp = 196.17665100097656 
190 / 1452 : pp = 196.0064697265625 
200 / 1452 : pp = 196.7347869873047 
210 / 1452 : pp = 196.3063507080078 
220 / 1452 : pp = 196.21388244628906 
230 / 1452 : pp = 196.5252227783203 
240 / 1452 : pp = 196.203125 
250 / 1452 : pp = 195.3251953125 
260 / 1452 : pp = 194.53335571289062 
270 / 1452 : pp = 193.3546142578125 
280 / 1452 : pp = 193.59420776367188 
290 / 1452 : pp = 193.83297729492188 
300 / 1452 : pp = 193.98489379882812 
310 / 1452 : pp = 193.68414306640625 
320 / 1452 : pp = 193.89065551757812 
330 / 1452 : pp = 194.0518798828125 
340 / 1452 : pp = 193.32888793945312 
350 / 1452 : pp = 193.76219177246094 
360 / 1452 : pp = 193.56106567382812 
370 / 1452 : pp = 193.28179931640625 
380 / 1452 : pp = 192.7037811279297 
390 / 1452 : pp = 192.8145294189453 
400 / 1452 : pp = 192.43325805664062 
410 / 1452 : pp = 192.81527709960938 
420 / 1452 : pp = 193.13760375976562 
430 / 1452 : pp = 192.9148712158203 
440 / 1452 : pp = 192.92526245117188 
450 / 1452 : pp = 192.70083618164062 
460 / 1452 : pp = 192.36647033691406 
470 / 1452 : pp = 191.85394287109375 
480 / 1452 : pp = 191.07244873046875 
490 / 1452 : pp = 190.75401306152344 
500 / 1452 : pp = 190.1843719482422 
510 / 1452 : pp = 190.03334045410156 
520 / 1452 : pp = 189.72938537597656 
530 / 1452 : pp = 189.25889587402344 
540 / 1452 : pp = 188.59315490722656 
550 / 1452 : pp = 188.19313049316406 
560 / 1452 : pp = 187.80621337890625 
570 / 1452 : pp = 187.5229034423828 
580 / 1452 : pp = 187.1091766357422 
590 / 1452 : pp = 186.72592163085938 
600 / 1452 : pp = 186.2238006591797 
610 / 1452 : pp = 185.89695739746094 
620 / 1452 : pp = 185.60989379882812 
630 / 1452 : pp = 185.2689208984375 
640 / 1452 : pp = 185.47567749023438 
650 / 1452 : pp = 185.5127410888672 
660 / 1452 : pp = 185.64627075195312 
670 / 1452 : pp = 185.71311950683594 
680 / 1452 : pp = 185.72569274902344 
690 / 1452 : pp = 185.56459045410156 
700 / 1452 : pp = 185.48681640625 
710 / 1452 : pp = 185.5458221435547 
720 / 1452 : pp = 185.5598907470703 
730 / 1452 : pp = 185.5335235595703 
740 / 1452 : pp = 185.73995971679688 
750 / 1452 : pp = 185.744384765625 
760 / 1452 : pp = 185.81268310546875 
770 / 1452 : pp = 186.00088500976562 
780 / 1452 : pp = 186.14443969726562 
790 / 1452 : pp = 186.30764770507812 
800 / 1452 : pp = 186.2595977783203 
810 / 1452 : pp = 186.23028564453125 
820 / 1452 : pp = 186.23997497558594 
830 / 1452 : pp = 186.49057006835938 
840 / 1452 : pp = 186.43331909179688 
850 / 1452 : pp = 186.48887634277344 
860 / 1452 : pp = 186.51502990722656 
870 / 1452 : pp = 186.5167999267578 
880 / 1452 : pp = 186.62400817871094 
890 / 1452 : pp = 186.6103973388672 
900 / 1452 : pp = 186.58111572265625 
910 / 1452 : pp = 186.64126586914062 
920 / 1452 : pp = 186.7366180419922 
930 / 1452 : pp = 186.65719604492188 
940 / 1452 : pp = 186.71755981445312 
950 / 1452 : pp = 186.6977996826172 
960 / 1452 : pp = 186.62774658203125 
970 / 1452 : pp = 186.62115478515625 
980 / 1452 : pp = 186.3773193359375 
990 / 1452 : pp = 186.23109436035156 
1000 / 1452 : pp = 185.99227905273438 
1010 / 1452 : pp = 186.0488739013672 
1020 / 1452 : pp = 186.1744384765625 
1030 / 1452 : pp = 186.1162109375 
1040 / 1452 : pp = 186.18899536132812 
1050 / 1452 : pp = 186.1549072265625 
1060 / 1452 : pp = 186.01419067382812 
1070 / 1452 : pp = 186.17364501953125 
1080 / 1452 : pp = 186.27061462402344 
1090 / 1452 : pp = 186.28428649902344 
1100 / 1452 : pp = 186.2150115966797 
1110 / 1452 : pp = 185.95103454589844 
1120 / 1452 : pp = 185.77423095703125 
1130 / 1452 : pp = 185.5232696533203 
1140 / 1452 : pp = 185.4607391357422 
1150 / 1452 : pp = 185.56077575683594 
1160 / 1452 : pp = 185.53343200683594 
1170 / 1452 : pp = 185.46453857421875 
1180 / 1452 : pp = 185.4741668701172 
1190 / 1452 : pp = 185.5594482421875 
1200 / 1452 : pp = 185.53785705566406 
1210 / 1452 : pp = 185.4576416015625 
1220 / 1452 : pp = 185.5943145751953 
1230 / 1452 : pp = 185.7483673095703 
1240 / 1452 : pp = 185.70762634277344 
1250 / 1452 : pp = 185.8568115234375 
1260 / 1452 : pp = 185.90635681152344 
1270 / 1452 : pp = 185.8961639404297 
1280 / 1452 : pp = 185.89199829101562 
1290 / 1452 : pp = 185.85911560058594 
1300 / 1452 : pp = 185.86097717285156 
1310 / 1452 : pp = 185.88739013671875 
1320 / 1452 : pp = 185.79248046875 
1330 / 1452 : pp = 185.69700622558594 
1340 / 1452 : pp = 185.7310028076172 
1350 / 1452 : pp = 185.72613525390625 
1360 / 1452 : pp = 185.76829528808594 
1370 / 1452 : pp = 185.6322021484375 
1380 / 1452 : pp = 185.56378173828125 
1390 / 1452 : pp = 185.4654998779297 
1400 / 1452 : pp = 185.35110473632812 
1410 / 1452 : pp = 185.33917236328125 
1420 / 1452 : pp = 185.2509002685547 
1430 / 1452 : pp = 185.20436096191406 
1440 / 1452 : pp = 185.2254638671875 
1450 / 1452 : pp = 185.16542053222656 

0 / 115 : pp = 242.26800537109375 
10 / 115 : pp = 226.12258911132812 
20 / 115 : pp = 226.4702606201172 
30 / 115 : pp = 223.982666015625 
40 / 115 : pp = 223.376953125 
50 / 115 : pp = 218.65716552734375 
60 / 115 : pp = 217.95306396484375 
70 / 115 : pp = 214.5392303466797 
80 / 115 : pp = 212.07525634765625 
90 / 115 : pp = 209.40631103515625 
100 / 115 : pp = 205.1455078125 
110 / 115 : pp = 203.6289520263672 
Training perplexity: 185.14476013183594
Validation perplexity: 203.3822784423828
Total time: 42.47052240371704
Epoch 6

0 / 1452 : pp = 233.56707763671875 
10 / 1452 : pp = 202.6468505859375 
20 / 1452 : pp = 198.2734375 
30 / 1452 : pp = 193.47442626953125 
40 / 1452 : pp = 195.17147827148438 
50 / 1452 : pp = 191.5596923828125 
60 / 1452 : pp = 190.4825897216797 
70 / 1452 : pp = 191.07681274414062 
80 / 1452 : pp = 190.339599609375 
90 / 1452 : pp = 188.98277282714844 
100 / 1452 : pp = 187.74757385253906 
110 / 1452 : pp = 186.10104370117188 
120 / 1452 : pp = 185.7500457763672 
130 / 1452 : pp = 184.90707397460938 
140 / 1452 : pp = 183.340087890625 
150 / 1452 : pp = 182.70840454101562 
160 / 1452 : pp = 183.1043701171875 
170 / 1452 : pp = 182.69776916503906 
180 / 1452 : pp = 181.88400268554688 
190 / 1452 : pp = 181.8062286376953 
200 / 1452 : pp = 182.4969940185547 
210 / 1452 : pp = 182.10572814941406 
220 / 1452 : pp = 181.9981689453125 
230 / 1452 : pp = 182.3802490234375 
240 / 1452 : pp = 182.03636169433594 
250 / 1452 : pp = 181.23712158203125 
260 / 1452 : pp = 180.53726196289062 
270 / 1452 : pp = 179.53567504882812 
280 / 1452 : pp = 179.70208740234375 
290 / 1452 : pp = 179.977783203125 
300 / 1452 : pp = 180.16600036621094 
310 / 1452 : pp = 179.87294006347656 
320 / 1452 : pp = 180.11849975585938 
330 / 1452 : pp = 180.31838989257812 
340 / 1452 : pp = 179.56759643554688 
350 / 1452 : pp = 179.97134399414062 
360 / 1452 : pp = 179.80030822753906 
370 / 1452 : pp = 179.52085876464844 
380 / 1452 : pp = 178.98228454589844 
390 / 1452 : pp = 179.0868682861328 
400 / 1452 : pp = 178.74569702148438 
410 / 1452 : pp = 179.1776580810547 
420 / 1452 : pp = 179.5055389404297 
430 / 1452 : pp = 179.3883056640625 
440 / 1452 : pp = 179.42279052734375 
450 / 1452 : pp = 179.2106475830078 
460 / 1452 : pp = 178.85311889648438 
470 / 1452 : pp = 178.33840942382812 
480 / 1452 : pp = 177.60350036621094 
490 / 1452 : pp = 177.30335998535156 
500 / 1452 : pp = 176.72222900390625 
510 / 1452 : pp = 176.6067352294922 
520 / 1452 : pp = 176.33998107910156 
530 / 1452 : pp = 175.93162536621094 
540 / 1452 : pp = 175.30657958984375 
550 / 1452 : pp = 174.9462432861328 
560 / 1452 : pp = 174.5836639404297 
570 / 1452 : pp = 174.31431579589844 
580 / 1452 : pp = 173.92300415039062 
590 / 1452 : pp = 173.55856323242188 
600 / 1452 : pp = 173.08277893066406 
610 / 1452 : pp = 172.75930786132812 
620 / 1452 : pp = 172.53192138671875 
630 / 1452 : pp = 172.20652770996094 
640 / 1452 : pp = 172.37454223632812 
650 / 1452 : pp = 172.39845275878906 
660 / 1452 : pp = 172.52255249023438 
670 / 1452 : pp = 172.60935974121094 
680 / 1452 : pp = 172.6611328125 
690 / 1452 : pp = 172.53118896484375 
700 / 1452 : pp = 172.4709014892578 
710 / 1452 : pp = 172.5406494140625 
720 / 1452 : pp = 172.55447387695312 
730 / 1452 : pp = 172.5330047607422 
740 / 1452 : pp = 172.7061767578125 
750 / 1452 : pp = 172.71054077148438 
760 / 1452 : pp = 172.77743530273438 
770 / 1452 : pp = 172.95481872558594 
780 / 1452 : pp = 173.11265563964844 
790 / 1452 : pp = 173.2832794189453 
800 / 1452 : pp = 173.2537841796875 
810 / 1452 : pp = 173.22164916992188 
820 / 1452 : pp = 173.24148559570312 
830 / 1452 : pp = 173.48228454589844 
840 / 1452 : pp = 173.43753051757812 
850 / 1452 : pp = 173.505615234375 
860 / 1452 : pp = 173.5214080810547 
870 / 1452 : pp = 173.5009002685547 
880 / 1452 : pp = 173.6202392578125 
890 / 1452 : pp = 173.622802734375 
900 / 1452 : pp = 173.5987091064453 
910 / 1452 : pp = 173.68316650390625 
920 / 1452 : pp = 173.77330017089844 
930 / 1452 : pp = 173.72018432617188 
940 / 1452 : pp = 173.79351806640625 
950 / 1452 : pp = 173.7653350830078 
960 / 1452 : pp = 173.7102508544922 
970 / 1452 : pp = 173.69766235351562 
980 / 1452 : pp = 173.4836883544922 
990 / 1452 : pp = 173.3550262451172 
1000 / 1452 : pp = 173.14816284179688 
1010 / 1452 : pp = 173.20777893066406 
1020 / 1452 : pp = 173.3390655517578 
1030 / 1452 : pp = 173.2884063720703 
1040 / 1452 : pp = 173.38015747070312 
1050 / 1452 : pp = 173.35592651367188 
1060 / 1452 : pp = 173.2260284423828 
1070 / 1452 : pp = 173.39321899414062 
1080 / 1452 : pp = 173.4879913330078 
1090 / 1452 : pp = 173.5231475830078 
1100 / 1452 : pp = 173.47177124023438 
1110 / 1452 : pp = 173.24453735351562 
1120 / 1452 : pp = 173.09408569335938 
1130 / 1452 : pp = 172.86627197265625 
1140 / 1452 : pp = 172.8234100341797 
1150 / 1452 : pp = 172.92843627929688 
1160 / 1452 : pp = 172.90065002441406 
1170 / 1452 : pp = 172.8550567626953 
1180 / 1452 : pp = 172.8810272216797 
1190 / 1452 : pp = 172.97312927246094 
1200 / 1452 : pp = 172.9776611328125 
1210 / 1452 : pp = 172.89413452148438 
1220 / 1452 : pp = 173.0257568359375 
1230 / 1452 : pp = 173.1847381591797 
1240 / 1452 : pp = 173.1756591796875 
1250 / 1452 : pp = 173.32138061523438 
1260 / 1452 : pp = 173.37229919433594 
1270 / 1452 : pp = 173.36891174316406 
1280 / 1452 : pp = 173.36337280273438 
1290 / 1452 : pp = 173.3444366455078 
1300 / 1452 : pp = 173.36138916015625 
1310 / 1452 : pp = 173.4015655517578 
1320 / 1452 : pp = 173.31790161132812 
1330 / 1452 : pp = 173.24710083007812 
1340 / 1452 : pp = 173.27212524414062 
1350 / 1452 : pp = 173.27674865722656 
1360 / 1452 : pp = 173.32749938964844 
1370 / 1452 : pp = 173.20472717285156 
1380 / 1452 : pp = 173.14889526367188 
1390 / 1452 : pp = 173.0755157470703 
1400 / 1452 : pp = 172.9678497314453 
1410 / 1452 : pp = 172.9612579345703 
1420 / 1452 : pp = 172.8872833251953 
1430 / 1452 : pp = 172.84805297851562 
1440 / 1452 : pp = 172.87252807617188 
1450 / 1452 : pp = 172.82505798339844 

0 / 115 : pp = 236.35635375976562 
10 / 115 : pp = 219.06166076660156 
20 / 115 : pp = 219.7670440673828 
30 / 115 : pp = 217.33587646484375 
40 / 115 : pp = 216.6626739501953 
50 / 115 : pp = 212.04734802246094 
60 / 115 : pp = 211.42068481445312 
70 / 115 : pp = 207.9592742919922 
80 / 115 : pp = 205.6216583251953 
90 / 115 : pp = 202.93597412109375 
100 / 115 : pp = 198.62583923339844 
110 / 115 : pp = 196.97216796875 
Training perplexity: 172.80404663085938
Validation perplexity: 196.6871337890625
Total time: 41.52522921562195
Epoch 7

0 / 1452 : pp = 219.23231506347656 
10 / 1452 : pp = 192.07225036621094 
20 / 1452 : pp = 187.48464965820312 
30 / 1452 : pp = 182.9149932861328 
40 / 1452 : pp = 184.2945098876953 
50 / 1452 : pp = 180.78492736816406 
60 / 1452 : pp = 179.377197265625 
70 / 1452 : pp = 180.0273895263672 
80 / 1452 : pp = 179.2517547607422 
90 / 1452 : pp = 177.77540588378906 
100 / 1452 : pp = 176.6474151611328 
110 / 1452 : pp = 174.84066772460938 
120 / 1452 : pp = 174.46890258789062 
130 / 1452 : pp = 173.64573669433594 
140 / 1452 : pp = 172.17483520507812 
150 / 1452 : pp = 171.57041931152344 
160 / 1452 : pp = 171.92059326171875 
170 / 1452 : pp = 171.5497283935547 
180 / 1452 : pp = 170.77249145507812 
190 / 1452 : pp = 170.72103881835938 
200 / 1452 : pp = 171.336181640625 
210 / 1452 : pp = 170.98524475097656 
220 / 1452 : pp = 170.99771118164062 
230 / 1452 : pp = 171.39918518066406 
240 / 1452 : pp = 171.09925842285156 
250 / 1452 : pp = 170.39962768554688 
260 / 1452 : pp = 169.7328643798828 
270 / 1452 : pp = 168.72225952148438 
280 / 1452 : pp = 168.92552185058594 
290 / 1452 : pp = 169.20147705078125 
300 / 1452 : pp = 169.40338134765625 
310 / 1452 : pp = 169.12057495117188 
320 / 1452 : pp = 169.31236267089844 
330 / 1452 : pp = 169.49945068359375 
340 / 1452 : pp = 168.8396759033203 
350 / 1452 : pp = 169.25917053222656 
360 / 1452 : pp = 169.09388732910156 
370 / 1452 : pp = 168.84323120117188 
380 / 1452 : pp = 168.3832550048828 
390 / 1452 : pp = 168.48275756835938 
400 / 1452 : pp = 168.19972229003906 
410 / 1452 : pp = 168.5838623046875 
420 / 1452 : pp = 168.91119384765625 
430 / 1452 : pp = 168.80836486816406 
440 / 1452 : pp = 168.90264892578125 
450 / 1452 : pp = 168.68589782714844 
460 / 1452 : pp = 168.3704071044922 
470 / 1452 : pp = 167.90394592285156 
480 / 1452 : pp = 167.23373413085938 
490 / 1452 : pp = 166.9560546875 
500 / 1452 : pp = 166.43161010742188 
510 / 1452 : pp = 166.320068359375 
520 / 1452 : pp = 166.05902099609375 
530 / 1452 : pp = 165.71714782714844 
540 / 1452 : pp = 165.10398864746094 
550 / 1452 : pp = 164.80430603027344 
560 / 1452 : pp = 164.4687042236328 
570 / 1452 : pp = 164.2272491455078 
580 / 1452 : pp = 163.84312438964844 
590 / 1452 : pp = 163.46035766601562 
600 / 1452 : pp = 163.01559448242188 
610 / 1452 : pp = 162.74134826660156 
620 / 1452 : pp = 162.50267028808594 
630 / 1452 : pp = 162.2018280029297 
640 / 1452 : pp = 162.37130737304688 
650 / 1452 : pp = 162.3895721435547 
660 / 1452 : pp = 162.51351928710938 
670 / 1452 : pp = 162.57684326171875 
680 / 1452 : pp = 162.6346893310547 
690 / 1452 : pp = 162.5135955810547 
700 / 1452 : pp = 162.47052001953125 
710 / 1452 : pp = 162.539794921875 
720 / 1452 : pp = 162.55381774902344 
730 / 1452 : pp = 162.5297088623047 
740 / 1452 : pp = 162.71652221679688 
750 / 1452 : pp = 162.740966796875 
760 / 1452 : pp = 162.79754638671875 
770 / 1452 : pp = 162.9949951171875 
780 / 1452 : pp = 163.17868041992188 
790 / 1452 : pp = 163.33055114746094 
800 / 1452 : pp = 163.31591796875 
810 / 1452 : pp = 163.2859344482422 
820 / 1452 : pp = 163.2958984375 
830 / 1452 : pp = 163.528564453125 
840 / 1452 : pp = 163.47610473632812 
850 / 1452 : pp = 163.5260772705078 
860 / 1452 : pp = 163.55352783203125 
870 / 1452 : pp = 163.55718994140625 
880 / 1452 : pp = 163.67523193359375 
890 / 1452 : pp = 163.6920166015625 
900 / 1452 : pp = 163.67710876464844 
910 / 1452 : pp = 163.7476806640625 
920 / 1452 : pp = 163.84803771972656 
930 / 1452 : pp = 163.8114013671875 
940 / 1452 : pp = 163.86663818359375 
950 / 1452 : pp = 163.83531188964844 
960 / 1452 : pp = 163.79945373535156 
970 / 1452 : pp = 163.80320739746094 
980 / 1452 : pp = 163.5953369140625 
990 / 1452 : pp = 163.48382568359375 
1000 / 1452 : pp = 163.2642822265625 
1010 / 1452 : pp = 163.32113647460938 
1020 / 1452 : pp = 163.44204711914062 
1030 / 1452 : pp = 163.40206909179688 
1040 / 1452 : pp = 163.4915313720703 
1050 / 1452 : pp = 163.47096252441406 
1060 / 1452 : pp = 163.3601531982422 
1070 / 1452 : pp = 163.5138397216797 
1080 / 1452 : pp = 163.6189727783203 
1090 / 1452 : pp = 163.6471405029297 
1100 / 1452 : pp = 163.60406494140625 
1110 / 1452 : pp = 163.40736389160156 
1120 / 1452 : pp = 163.26841735839844 
1130 / 1452 : pp = 163.0680694580078 
1140 / 1452 : pp = 163.04591369628906 
1150 / 1452 : pp = 163.15478515625 
1160 / 1452 : pp = 163.1380615234375 
1170 / 1452 : pp = 163.09303283691406 
1180 / 1452 : pp = 163.14149475097656 
1190 / 1452 : pp = 163.2374267578125 
1200 / 1452 : pp = 163.2394561767578 
1210 / 1452 : pp = 163.17835998535156 
1220 / 1452 : pp = 163.32347106933594 
1230 / 1452 : pp = 163.4639434814453 
1240 / 1452 : pp = 163.4611358642578 
1250 / 1452 : pp = 163.60687255859375 
1260 / 1452 : pp = 163.67227172851562 
1270 / 1452 : pp = 163.67515563964844 
1280 / 1452 : pp = 163.6881103515625 
1290 / 1452 : pp = 163.66648864746094 
1300 / 1452 : pp = 163.69287109375 
1310 / 1452 : pp = 163.7276153564453 
1320 / 1452 : pp = 163.6551055908203 
1330 / 1452 : pp = 163.58901977539062 
1340 / 1452 : pp = 163.6205291748047 
1350 / 1452 : pp = 163.63824462890625 
1360 / 1452 : pp = 163.69334411621094 
1370 / 1452 : pp = 163.5885467529297 
1380 / 1452 : pp = 163.54049682617188 
1390 / 1452 : pp = 163.4760284423828 
1400 / 1452 : pp = 163.38897705078125 
1410 / 1452 : pp = 163.3974609375 
1420 / 1452 : pp = 163.35009765625 
1430 / 1452 : pp = 163.32191467285156 
1440 / 1452 : pp = 163.35220336914062 
1450 / 1452 : pp = 163.3201904296875 

0 / 115 : pp = 232.2108154296875 
10 / 115 : pp = 214.35496520996094 
20 / 115 : pp = 215.20510864257812 
30 / 115 : pp = 212.82754516601562 
40 / 115 : pp = 212.0598907470703 
50 / 115 : pp = 207.5095672607422 
60 / 115 : pp = 206.86976623535156 
70 / 115 : pp = 203.36016845703125 
80 / 115 : pp = 201.11538696289062 
90 / 115 : pp = 198.52120971679688 
100 / 115 : pp = 194.1772003173828 
110 / 115 : pp = 192.41224670410156 
Training perplexity: 163.29916381835938
Validation perplexity: 192.09552001953125
Total time: 41.78096055984497
Epoch 8

0 / 1452 : pp = 201.77548217773438 
10 / 1452 : pp = 180.4141082763672 
20 / 1452 : pp = 176.41432189941406 
30 / 1452 : pp = 172.7764434814453 
40 / 1452 : pp = 174.69166564941406 
50 / 1452 : pp = 171.2933807373047 
60 / 1452 : pp = 170.08010864257812 
70 / 1452 : pp = 170.6719512939453 
80 / 1452 : pp = 170.07589721679688 
90 / 1452 : pp = 168.7478485107422 
100 / 1452 : pp = 167.57081604003906 
110 / 1452 : pp = 166.06971740722656 
120 / 1452 : pp = 165.73374938964844 
130 / 1452 : pp = 164.80674743652344 
140 / 1452 : pp = 163.32821655273438 
150 / 1452 : pp = 162.6752471923828 
160 / 1452 : pp = 163.02049255371094 
170 / 1452 : pp = 162.64120483398438 
180 / 1452 : pp = 161.95529174804688 
190 / 1452 : pp = 161.91954040527344 
200 / 1452 : pp = 162.5446014404297 
210 / 1452 : pp = 162.2645721435547 
220 / 1452 : pp = 162.3128662109375 
230 / 1452 : pp = 162.65872192382812 
240 / 1452 : pp = 162.40948486328125 
250 / 1452 : pp = 161.75787353515625 
260 / 1452 : pp = 161.15213012695312 
270 / 1452 : pp = 160.22256469726562 
280 / 1452 : pp = 160.3651123046875 
290 / 1452 : pp = 160.63780212402344 
300 / 1452 : pp = 160.80026245117188 
310 / 1452 : pp = 160.54383850097656 
320 / 1452 : pp = 160.7539520263672 
330 / 1452 : pp = 160.94317626953125 
340 / 1452 : pp = 160.3373565673828 
350 / 1452 : pp = 160.71763610839844 
360 / 1452 : pp = 160.60960388183594 
370 / 1452 : pp = 160.37527465820312 
380 / 1452 : pp = 159.92990112304688 
390 / 1452 : pp = 160.0165557861328 
400 / 1452 : pp = 159.75697326660156 
410 / 1452 : pp = 160.15274047851562 
420 / 1452 : pp = 160.48390197753906 
430 / 1452 : pp = 160.4031982421875 
440 / 1452 : pp = 160.4693603515625 
450 / 1452 : pp = 160.28016662597656 
460 / 1452 : pp = 159.94004821777344 
470 / 1452 : pp = 159.48257446289062 
480 / 1452 : pp = 158.87998962402344 
490 / 1452 : pp = 158.59765625 
500 / 1452 : pp = 158.10865783691406 
510 / 1452 : pp = 157.96795654296875 
520 / 1452 : pp = 157.7591552734375 
530 / 1452 : pp = 157.42648315429688 
540 / 1452 : pp = 156.85348510742188 
550 / 1452 : pp = 156.5618438720703 
560 / 1452 : pp = 156.24905395507812 
570 / 1452 : pp = 155.9994354248047 
580 / 1452 : pp = 155.612060546875 
590 / 1452 : pp = 155.25830078125 
600 / 1452 : pp = 154.8464813232422 
610 / 1452 : pp = 154.5833282470703 
620 / 1452 : pp = 154.38040161132812 
630 / 1452 : pp = 154.0767364501953 
640 / 1452 : pp = 154.2534637451172 
650 / 1452 : pp = 154.25875854492188 
660 / 1452 : pp = 154.35874938964844 
670 / 1452 : pp = 154.4289093017578 
680 / 1452 : pp = 154.51412963867188 
690 / 1452 : pp = 154.41676330566406 
700 / 1452 : pp = 154.37892150878906 
710 / 1452 : pp = 154.4234619140625 
720 / 1452 : pp = 154.4586639404297 
730 / 1452 : pp = 154.4351806640625 
740 / 1452 : pp = 154.6002197265625 
750 / 1452 : pp = 154.65684509277344 
760 / 1452 : pp = 154.73318481445312 
770 / 1452 : pp = 154.92935180664062 
780 / 1452 : pp = 155.1021728515625 
790 / 1452 : pp = 155.24757385253906 
800 / 1452 : pp = 155.223876953125 
810 / 1452 : pp = 155.2095184326172 
820 / 1452 : pp = 155.24009704589844 
830 / 1452 : pp = 155.4519500732422 
840 / 1452 : pp = 155.3947296142578 
850 / 1452 : pp = 155.45306396484375 
860 / 1452 : pp = 155.4661102294922 
870 / 1452 : pp = 155.45765686035156 
880 / 1452 : pp = 155.58758544921875 
890 / 1452 : pp = 155.59373474121094 
900 / 1452 : pp = 155.59254455566406 
910 / 1452 : pp = 155.66854858398438 
920 / 1452 : pp = 155.75942993164062 
930 / 1452 : pp = 155.73350524902344 
940 / 1452 : pp = 155.80740356445312 
950 / 1452 : pp = 155.7733917236328 
960 / 1452 : pp = 155.73565673828125 
970 / 1452 : pp = 155.74404907226562 
980 / 1452 : pp = 155.55902099609375 
990 / 1452 : pp = 155.45675659179688 
1000 / 1452 : pp = 155.2649688720703 
1010 / 1452 : pp = 155.31332397460938 
1020 / 1452 : pp = 155.44979858398438 
1030 / 1452 : pp = 155.4137725830078 
1040 / 1452 : pp = 155.49012756347656 
1050 / 1452 : pp = 155.46054077148438 
1060 / 1452 : pp = 155.3616943359375 
1070 / 1452 : pp = 155.5286865234375 
1080 / 1452 : pp = 155.63743591308594 
1090 / 1452 : pp = 155.6842803955078 
1100 / 1452 : pp = 155.65599060058594 
1110 / 1452 : pp = 155.4827880859375 
1120 / 1452 : pp = 155.35450744628906 
1130 / 1452 : pp = 155.1777801513672 
1140 / 1452 : pp = 155.15994262695312 
1150 / 1452 : pp = 155.26193237304688 
1160 / 1452 : pp = 155.26214599609375 
1170 / 1452 : pp = 155.23231506347656 
1180 / 1452 : pp = 155.29266357421875 
1190 / 1452 : pp = 155.37680053710938 
1200 / 1452 : pp = 155.3736114501953 
1210 / 1452 : pp = 155.3380584716797 
1220 / 1452 : pp = 155.474853515625 
1230 / 1452 : pp = 155.62986755371094 
1240 / 1452 : pp = 155.62831115722656 
1250 / 1452 : pp = 155.77101135253906 
1260 / 1452 : pp = 155.83445739746094 
1270 / 1452 : pp = 155.845458984375 
1280 / 1452 : pp = 155.8556365966797 
1290 / 1452 : pp = 155.8556365966797 
1300 / 1452 : pp = 155.8843994140625 
1310 / 1452 : pp = 155.92417907714844 
1320 / 1452 : pp = 155.8560791015625 
1330 / 1452 : pp = 155.80636596679688 
1340 / 1452 : pp = 155.84344482421875 
1350 / 1452 : pp = 155.8706512451172 
1360 / 1452 : pp = 155.9273681640625 
1370 / 1452 : pp = 155.83140563964844 
1380 / 1452 : pp = 155.7911376953125 
1390 / 1452 : pp = 155.7401885986328 
1400 / 1452 : pp = 155.6622314453125 
1410 / 1452 : pp = 155.68531799316406 
1420 / 1452 : pp = 155.64041137695312 
1430 / 1452 : pp = 155.62216186523438 
1440 / 1452 : pp = 155.6437530517578 
1450 / 1452 : pp = 155.62757873535156 

0 / 115 : pp = 228.70111083984375 
10 / 115 : pp = 211.03330993652344 
20 / 115 : pp = 212.24957275390625 
30 / 115 : pp = 209.8839569091797 
40 / 115 : pp = 209.11045837402344 
50 / 115 : pp = 204.66351318359375 
60 / 115 : pp = 204.03366088867188 
70 / 115 : pp = 200.46681213378906 
80 / 115 : pp = 198.24404907226562 
90 / 115 : pp = 195.63223266601562 
100 / 115 : pp = 191.18345642089844 
110 / 115 : pp = 189.31134033203125 
Training perplexity: 155.61154174804688
Validation perplexity: 188.94537353515625
Total time: 42.13483738899231
Epoch 9

0 / 1452 : pp = 197.80628967285156 
10 / 1452 : pp = 172.6316680908203 
20 / 1452 : pp = 168.6739959716797 
30 / 1452 : pp = 164.4781036376953 
40 / 1452 : pp = 166.1627960205078 
50 / 1452 : pp = 163.05197143554688 
60 / 1452 : pp = 161.87924194335938 
70 / 1452 : pp = 162.5297088623047 
80 / 1452 : pp = 161.7450714111328 
90 / 1452 : pp = 160.6148223876953 
100 / 1452 : pp = 159.73289489746094 
110 / 1452 : pp = 158.4092254638672 
120 / 1452 : pp = 158.04653930664062 
130 / 1452 : pp = 157.13563537597656 
140 / 1452 : pp = 155.71798706054688 
150 / 1452 : pp = 155.19161987304688 
160 / 1452 : pp = 155.42718505859375 
170 / 1452 : pp = 155.0531463623047 
180 / 1452 : pp = 154.46897888183594 
190 / 1452 : pp = 154.4127197265625 
200 / 1452 : pp = 154.97154235839844 
210 / 1452 : pp = 154.70169067382812 
220 / 1452 : pp = 154.72816467285156 
230 / 1452 : pp = 155.03799438476562 
240 / 1452 : pp = 154.85601806640625 
250 / 1452 : pp = 154.28016662597656 
260 / 1452 : pp = 153.7699432373047 
270 / 1452 : pp = 152.90948486328125 
280 / 1452 : pp = 153.0459747314453 
290 / 1452 : pp = 153.298095703125 
300 / 1452 : pp = 153.45716857910156 
310 / 1452 : pp = 153.22195434570312 
320 / 1452 : pp = 153.41664123535156 
330 / 1452 : pp = 153.66542053222656 
340 / 1452 : pp = 153.06378173828125 
350 / 1452 : pp = 153.43923950195312 
360 / 1452 : pp = 153.31381225585938 
370 / 1452 : pp = 153.13473510742188 
380 / 1452 : pp = 152.75267028808594 
390 / 1452 : pp = 152.85504150390625 
400 / 1452 : pp = 152.62342834472656 
410 / 1452 : pp = 153.03152465820312 
420 / 1452 : pp = 153.39161682128906 
430 / 1452 : pp = 153.30364990234375 
440 / 1452 : pp = 153.37896728515625 
450 / 1452 : pp = 153.18988037109375 
460 / 1452 : pp = 152.88478088378906 
470 / 1452 : pp = 152.4380340576172 
480 / 1452 : pp = 151.86618041992188 
490 / 1452 : pp = 151.5962371826172 
500 / 1452 : pp = 151.11614990234375 
510 / 1452 : pp = 150.99830627441406 
520 / 1452 : pp = 150.8135986328125 
530 / 1452 : pp = 150.500732421875 
540 / 1452 : pp = 149.9623260498047 
550 / 1452 : pp = 149.68028259277344 
560 / 1452 : pp = 149.3885040283203 
570 / 1452 : pp = 149.140380859375 
580 / 1452 : pp = 148.76876831054688 
590 / 1452 : pp = 148.43368530273438 
600 / 1452 : pp = 148.02598571777344 
610 / 1452 : pp = 147.7869110107422 
620 / 1452 : pp = 147.59796142578125 
630 / 1452 : pp = 147.30068969726562 
640 / 1452 : pp = 147.45240783691406 
650 / 1452 : pp = 147.4651336669922 
660 / 1452 : pp = 147.5808563232422 
670 / 1452 : pp = 147.65582275390625 
680 / 1452 : pp = 147.7360382080078 
690 / 1452 : pp = 147.63075256347656 
700 / 1452 : pp = 147.6066131591797 
710 / 1452 : pp = 147.7024383544922 
720 / 1452 : pp = 147.7445526123047 
730 / 1452 : pp = 147.72279357910156 
740 / 1452 : pp = 147.87107849121094 
750 / 1452 : pp = 147.91436767578125 
760 / 1452 : pp = 147.9857635498047 
770 / 1452 : pp = 148.18206787109375 
780 / 1452 : pp = 148.3845672607422 
790 / 1452 : pp = 148.5517120361328 
800 / 1452 : pp = 148.54002380371094 
810 / 1452 : pp = 148.51119995117188 
820 / 1452 : pp = 148.5664520263672 
830 / 1452 : pp = 148.7821044921875 
840 / 1452 : pp = 148.72486877441406 
850 / 1452 : pp = 148.77452087402344 
860 / 1452 : pp = 148.80076599121094 
870 / 1452 : pp = 148.79701232910156 
880 / 1452 : pp = 148.9181671142578 
890 / 1452 : pp = 148.94537353515625 
900 / 1452 : pp = 148.9435272216797 
910 / 1452 : pp = 149.02102661132812 
920 / 1452 : pp = 149.1085968017578 
930 / 1452 : pp = 149.06893920898438 
940 / 1452 : pp = 149.1317138671875 
950 / 1452 : pp = 149.1232452392578 
960 / 1452 : pp = 149.10354614257812 
970 / 1452 : pp = 149.11656188964844 
980 / 1452 : pp = 148.94259643554688 
990 / 1452 : pp = 148.8236846923828 
1000 / 1452 : pp = 148.633056640625 
1010 / 1452 : pp = 148.6830291748047 
1020 / 1452 : pp = 148.8126220703125 
1030 / 1452 : pp = 148.78089904785156 
1040 / 1452 : pp = 148.8600311279297 
1050 / 1452 : pp = 148.8486785888672 
1060 / 1452 : pp = 148.7664337158203 
1070 / 1452 : pp = 148.9337921142578 
1080 / 1452 : pp = 149.04441833496094 
1090 / 1452 : pp = 149.07284545898438 
1100 / 1452 : pp = 149.03318786621094 
1110 / 1452 : pp = 148.86428833007812 
1120 / 1452 : pp = 148.7332305908203 
1130 / 1452 : pp = 148.5670166015625 
1140 / 1452 : pp = 148.54661560058594 
1150 / 1452 : pp = 148.64219665527344 
1160 / 1452 : pp = 148.6490020751953 
1170 / 1452 : pp = 148.62420654296875 
1180 / 1452 : pp = 148.67665100097656 
1190 / 1452 : pp = 148.7633056640625 
1200 / 1452 : pp = 148.7782745361328 
1210 / 1452 : pp = 148.72500610351562 
1220 / 1452 : pp = 148.87493896484375 
1230 / 1452 : pp = 149.039794921875 
1240 / 1452 : pp = 149.04000854492188 
1250 / 1452 : pp = 149.17054748535156 
1260 / 1452 : pp = 149.23863220214844 
1270 / 1452 : pp = 149.2436065673828 
1280 / 1452 : pp = 149.25086975097656 
1290 / 1452 : pp = 149.24147033691406 
1300 / 1452 : pp = 149.27413940429688 
1310 / 1452 : pp = 149.32077026367188 
1320 / 1452 : pp = 149.27301025390625 
1330 / 1452 : pp = 149.23080444335938 
1340 / 1452 : pp = 149.25791931152344 
1350 / 1452 : pp = 149.2841033935547 
1360 / 1452 : pp = 149.337158203125 
1370 / 1452 : pp = 149.2467498779297 
1380 / 1452 : pp = 149.21351623535156 
1390 / 1452 : pp = 149.15403747558594 
1400 / 1452 : pp = 149.0877685546875 
1410 / 1452 : pp = 149.110595703125 
1420 / 1452 : pp = 149.07241821289062 
1430 / 1452 : pp = 149.05166625976562 
1440 / 1452 : pp = 149.0776824951172 
1450 / 1452 : pp = 149.06771850585938 

0 / 115 : pp = 227.0559844970703 
10 / 115 : pp = 208.7002410888672 
20 / 115 : pp = 210.38775634765625 
30 / 115 : pp = 207.9513397216797 
40 / 115 : pp = 207.12994384765625 
50 / 115 : pp = 202.70811462402344 
60 / 115 : pp = 202.05787658691406 
70 / 115 : pp = 198.3761444091797 
80 / 115 : pp = 196.17637634277344 
90 / 115 : pp = 193.5880126953125 
100 / 115 : pp = 189.0758819580078 
110 / 115 : pp = 187.07528686523438 
Training perplexity: 149.0502471923828
Validation perplexity: 186.6911163330078
Total time: 47.274805545806885
Epoch 10

0 / 1452 : pp = 181.8408203125 
10 / 1452 : pp = 164.99664306640625 
20 / 1452 : pp = 161.8847198486328 
30 / 1452 : pp = 158.30064392089844 
40 / 1452 : pp = 160.13914489746094 
50 / 1452 : pp = 157.58743286132812 
60 / 1452 : pp = 156.11871337890625 
70 / 1452 : pp = 156.82948303222656 
80 / 1452 : pp = 156.2889862060547 
90 / 1452 : pp = 155.04833984375 
100 / 1452 : pp = 154.09327697753906 
110 / 1452 : pp = 152.5070343017578 
120 / 1452 : pp = 152.20750427246094 
130 / 1452 : pp = 151.3399200439453 
140 / 1452 : pp = 149.90740966796875 
150 / 1452 : pp = 149.345703125 
160 / 1452 : pp = 149.59814453125 
170 / 1452 : pp = 149.26539611816406 
180 / 1452 : pp = 148.624267578125 
190 / 1452 : pp = 148.58819580078125 
200 / 1452 : pp = 149.09552001953125 
210 / 1452 : pp = 148.8439178466797 
220 / 1452 : pp = 148.86605834960938 
230 / 1452 : pp = 149.1971435546875 
240 / 1452 : pp = 148.96533203125 
250 / 1452 : pp = 148.4253387451172 
260 / 1452 : pp = 147.9200897216797 
270 / 1452 : pp = 147.08816528320312 
280 / 1452 : pp = 147.24366760253906 
290 / 1452 : pp = 147.52182006835938 
300 / 1452 : pp = 147.72222900390625 
310 / 1452 : pp = 147.50486755371094 
320 / 1452 : pp = 147.73892211914062 
330 / 1452 : pp = 147.9404754638672 
340 / 1452 : pp = 147.37803649902344 
350 / 1452 : pp = 147.6969451904297 
360 / 1452 : pp = 147.5704345703125 
370 / 1452 : pp = 147.38674926757812 
380 / 1452 : pp = 147.03970336914062 
390 / 1452 : pp = 147.14231872558594 
400 / 1452 : pp = 146.91656494140625 
410 / 1452 : pp = 147.34059143066406 
420 / 1452 : pp = 147.68496704101562 
430 / 1452 : pp = 147.61195373535156 
440 / 1452 : pp = 147.68405151367188 
450 / 1452 : pp = 147.4711151123047 
460 / 1452 : pp = 147.1927032470703 
470 / 1452 : pp = 146.72970581054688 
480 / 1452 : pp = 146.17173767089844 
490 / 1452 : pp = 145.9028778076172 
500 / 1452 : pp = 145.42721557617188 
510 / 1452 : pp = 145.3111114501953 
520 / 1452 : pp = 145.11460876464844 
530 / 1452 : pp = 144.81488037109375 
540 / 1452 : pp = 144.263916015625 
550 / 1452 : pp = 143.997802734375 
560 / 1452 : pp = 143.71766662597656 
570 / 1452 : pp = 143.47451782226562 
580 / 1452 : pp = 143.08474731445312 
590 / 1452 : pp = 142.77920532226562 
600 / 1452 : pp = 142.39573669433594 
610 / 1452 : pp = 142.14906311035156 
620 / 1452 : pp = 141.9574432373047 
630 / 1452 : pp = 141.67369079589844 
640 / 1452 : pp = 141.81556701660156 
650 / 1452 : pp = 141.81759643554688 
660 / 1452 : pp = 141.9339599609375 
670 / 1452 : pp = 142.01248168945312 
680 / 1452 : pp = 142.08773803710938 
690 / 1452 : pp = 142.00328063964844 
700 / 1452 : pp = 141.98086547851562 
710 / 1452 : pp = 142.0632781982422 
720 / 1452 : pp = 142.10372924804688 
730 / 1452 : pp = 142.08055114746094 
740 / 1452 : pp = 142.23619079589844 
750 / 1452 : pp = 142.2660369873047 
760 / 1452 : pp = 142.34678649902344 
770 / 1452 : pp = 142.5257568359375 
780 / 1452 : pp = 142.70025634765625 
790 / 1452 : pp = 142.8614044189453 
800 / 1452 : pp = 142.84573364257812 
810 / 1452 : pp = 142.8250274658203 
820 / 1452 : pp = 142.8540496826172 
830 / 1452 : pp = 143.06053161621094 
840 / 1452 : pp = 143.0423126220703 
850 / 1452 : pp = 143.09634399414062 
860 / 1452 : pp = 143.10487365722656 
870 / 1452 : pp = 143.0884246826172 
880 / 1452 : pp = 143.19387817382812 
890 / 1452 : pp = 143.236083984375 
900 / 1452 : pp = 143.23390197753906 
910 / 1452 : pp = 143.29537963867188 
920 / 1452 : pp = 143.3722686767578 
930 / 1452 : pp = 143.33795166015625 
940 / 1452 : pp = 143.40618896484375 
950 / 1452 : pp = 143.3929901123047 
960 / 1452 : pp = 143.3693389892578 
970 / 1452 : pp = 143.39736938476562 
980 / 1452 : pp = 143.2371063232422 
990 / 1452 : pp = 143.13893127441406 
1000 / 1452 : pp = 142.9658660888672 
1010 / 1452 : pp = 143.01544189453125 
1020 / 1452 : pp = 143.152587890625 
1030 / 1452 : pp = 143.11334228515625 
1040 / 1452 : pp = 143.19020080566406 
1050 / 1452 : pp = 143.18234252929688 
1060 / 1452 : pp = 143.092041015625 
1070 / 1452 : pp = 143.24449157714844 
1080 / 1452 : pp = 143.34828186035156 
1090 / 1452 : pp = 143.38739013671875 
1100 / 1452 : pp = 143.37432861328125 
1110 / 1452 : pp = 143.20596313476562 
1120 / 1452 : pp = 143.07969665527344 
1130 / 1452 : pp = 142.92041015625 
1140 / 1452 : pp = 142.90902709960938 
1150 / 1452 : pp = 143.00732421875 
1160 / 1452 : pp = 143.01182556152344 
1170 / 1452 : pp = 142.9925994873047 
1180 / 1452 : pp = 143.06080627441406 
1190 / 1452 : pp = 143.14337158203125 
1200 / 1452 : pp = 143.16644287109375 
1210 / 1452 : pp = 143.1259002685547 
1220 / 1452 : pp = 143.2671661376953 
1230 / 1452 : pp = 143.4210968017578 
1240 / 1452 : pp = 143.4327850341797 
1250 / 1452 : pp = 143.5699920654297 
1260 / 1452 : pp = 143.63771057128906 
1270 / 1452 : pp = 143.65798950195312 
1280 / 1452 : pp = 143.68251037597656 
1290 / 1452 : pp = 143.68045043945312 
1300 / 1452 : pp = 143.72293090820312 
1310 / 1452 : pp = 143.77015686035156 
1320 / 1452 : pp = 143.71910095214844 
1330 / 1452 : pp = 143.68792724609375 
1340 / 1452 : pp = 143.7241668701172 
1350 / 1452 : pp = 143.7570037841797 
1360 / 1452 : pp = 143.81829833984375 
1370 / 1452 : pp = 143.7487030029297 
1380 / 1452 : pp = 143.7196502685547 
1390 / 1452 : pp = 143.67359924316406 
1400 / 1452 : pp = 143.60592651367188 
1410 / 1452 : pp = 143.62620544433594 
1420 / 1452 : pp = 143.5905303955078 
1430 / 1452 : pp = 143.55799865722656 
1440 / 1452 : pp = 143.5891571044922 
1450 / 1452 : pp = 143.5869598388672 

0 / 115 : pp = 226.9864959716797 
10 / 115 : pp = 207.8067169189453 
20 / 115 : pp = 209.68667602539062 
30 / 115 : pp = 207.1610565185547 
40 / 115 : pp = 206.3247833251953 
50 / 115 : pp = 201.77403259277344 
60 / 115 : pp = 201.07098388671875 
70 / 115 : pp = 197.33335876464844 
80 / 115 : pp = 195.12513732910156 
90 / 115 : pp = 192.5349578857422 
100 / 115 : pp = 187.90072631835938 
110 / 115 : pp = 185.81240844726562 
Training perplexity: 143.57354736328125
Validation perplexity: 185.40573120117188
Total time: 46.14846849441528
Epoch 11

0 / 1452 : pp = 181.93162536621094 
10 / 1452 : pp = 159.94607543945312 
20 / 1452 : pp = 156.83673095703125 
30 / 1452 : pp = 153.75843811035156 
40 / 1452 : pp = 155.18362426757812 
50 / 1452 : pp = 152.39529418945312 
60 / 1452 : pp = 151.18772888183594 
70 / 1452 : pp = 151.9004364013672 
80 / 1452 : pp = 151.30239868164062 
90 / 1452 : pp = 150.1591033935547 
100 / 1452 : pp = 149.18618774414062 
110 / 1452 : pp = 147.72653198242188 
120 / 1452 : pp = 147.4357452392578 
130 / 1452 : pp = 146.41372680664062 
140 / 1452 : pp = 145.0057373046875 
150 / 1452 : pp = 144.39447021484375 
160 / 1452 : pp = 144.5330047607422 
170 / 1452 : pp = 144.23593139648438 
180 / 1452 : pp = 143.63990783691406 
190 / 1452 : pp = 143.63812255859375 
200 / 1452 : pp = 144.1143798828125 
210 / 1452 : pp = 143.88278198242188 
220 / 1452 : pp = 143.92518615722656 
230 / 1452 : pp = 144.24032592773438 
240 / 1452 : pp = 143.94110107421875 
250 / 1452 : pp = 143.3688507080078 
260 / 1452 : pp = 142.8829345703125 
270 / 1452 : pp = 142.11952209472656 
280 / 1452 : pp = 142.19415283203125 
290 / 1452 : pp = 142.51889038085938 
300 / 1452 : pp = 142.70494079589844 
310 / 1452 : pp = 142.51426696777344 
320 / 1452 : pp = 142.70106506347656 
330 / 1452 : pp = 142.88014221191406 
340 / 1452 : pp = 142.3287353515625 
350 / 1452 : pp = 142.6169891357422 
360 / 1452 : pp = 142.51971435546875 
370 / 1452 : pp = 142.33566284179688 
380 / 1452 : pp = 142.04161071777344 
390 / 1452 : pp = 142.13551330566406 
400 / 1452 : pp = 141.9499969482422 
410 / 1452 : pp = 142.3361358642578 
420 / 1452 : pp = 142.64065551757812 
430 / 1452 : pp = 142.5511016845703 
440 / 1452 : pp = 142.6728973388672 
450 / 1452 : pp = 142.47030639648438 
460 / 1452 : pp = 142.1704864501953 
470 / 1452 : pp = 141.73390197753906 
480 / 1452 : pp = 141.23020935058594 
490 / 1452 : pp = 140.9759521484375 
500 / 1452 : pp = 140.51609802246094 
510 / 1452 : pp = 140.40545654296875 
520 / 1452 : pp = 140.1936492919922 
530 / 1452 : pp = 139.8929443359375 
540 / 1452 : pp = 139.3696746826172 
550 / 1452 : pp = 139.13217163085938 
560 / 1452 : pp = 138.85247802734375 
570 / 1452 : pp = 138.6092987060547 
580 / 1452 : pp = 138.2471160888672 
590 / 1452 : pp = 137.9485626220703 
600 / 1452 : pp = 137.57379150390625 
610 / 1452 : pp = 137.31576538085938 
620 / 1452 : pp = 137.14230346679688 
630 / 1452 : pp = 136.87405395507812 
640 / 1452 : pp = 137.02928161621094 
650 / 1452 : pp = 137.0481719970703 
660 / 1452 : pp = 137.1595001220703 
670 / 1452 : pp = 137.21124267578125 
680 / 1452 : pp = 137.2671356201172 
690 / 1452 : pp = 137.19410705566406 
700 / 1452 : pp = 137.1850128173828 
710 / 1452 : pp = 137.26058959960938 
720 / 1452 : pp = 137.30726623535156 
730 / 1452 : pp = 137.28048706054688 
740 / 1452 : pp = 137.4352569580078 
750 / 1452 : pp = 137.4680938720703 
760 / 1452 : pp = 137.5524139404297 
770 / 1452 : pp = 137.73829650878906 
780 / 1452 : pp = 137.90882873535156 
790 / 1452 : pp = 138.05865478515625 
800 / 1452 : pp = 138.0673370361328 
810 / 1452 : pp = 138.03909301757812 
820 / 1452 : pp = 138.084716796875 
830 / 1452 : pp = 138.27989196777344 
840 / 1452 : pp = 138.23545837402344 
850 / 1452 : pp = 138.30343627929688 
860 / 1452 : pp = 138.3339080810547 
870 / 1452 : pp = 138.32835388183594 
880 / 1452 : pp = 138.4450225830078 
890 / 1452 : pp = 138.47157287597656 
900 / 1452 : pp = 138.46304321289062 
910 / 1452 : pp = 138.55618286132812 
920 / 1452 : pp = 138.64512634277344 
930 / 1452 : pp = 138.6160430908203 
940 / 1452 : pp = 138.66932678222656 
950 / 1452 : pp = 138.6573028564453 
960 / 1452 : pp = 138.6463165283203 
970 / 1452 : pp = 138.67059326171875 
980 / 1452 : pp = 138.50999450683594 
990 / 1452 : pp = 138.42430114746094 
1000 / 1452 : pp = 138.25344848632812 
1010 / 1452 : pp = 138.3004608154297 
1020 / 1452 : pp = 138.4243621826172 
1030 / 1452 : pp = 138.40713500976562 
1040 / 1452 : pp = 138.47129821777344 
1050 / 1452 : pp = 138.45928955078125 
1060 / 1452 : pp = 138.3919677734375 
1070 / 1452 : pp = 138.5287628173828 
1080 / 1452 : pp = 138.62298583984375 
1090 / 1452 : pp = 138.6699981689453 
1100 / 1452 : pp = 138.64849853515625 
1110 / 1452 : pp = 138.49191284179688 
1120 / 1452 : pp = 138.37355041503906 
1130 / 1452 : pp = 138.2216796875 
1140 / 1452 : pp = 138.21534729003906 
1150 / 1452 : pp = 138.30963134765625 
1160 / 1452 : pp = 138.316162109375 
1170 / 1452 : pp = 138.3023681640625 
1180 / 1452 : pp = 138.36932373046875 
1190 / 1452 : pp = 138.45960998535156 
1200 / 1452 : pp = 138.4866180419922 
1210 / 1452 : pp = 138.45730590820312 
1220 / 1452 : pp = 138.60031127929688 
1230 / 1452 : pp = 138.75485229492188 
1240 / 1452 : pp = 138.7751007080078 
1250 / 1452 : pp = 138.91221618652344 
1260 / 1452 : pp = 138.9815216064453 
1270 / 1452 : pp = 138.9919891357422 
1280 / 1452 : pp = 139.0243377685547 
1290 / 1452 : pp = 139.02725219726562 
1300 / 1452 : pp = 139.0701446533203 
1310 / 1452 : pp = 139.1090850830078 
1320 / 1452 : pp = 139.06027221679688 
1330 / 1452 : pp = 139.0338134765625 
1340 / 1452 : pp = 139.06385803222656 
1350 / 1452 : pp = 139.09608459472656 
1360 / 1452 : pp = 139.1609649658203 
1370 / 1452 : pp = 139.0869903564453 
1380 / 1452 : pp = 139.0604705810547 
1390 / 1452 : pp = 139.01670837402344 
1400 / 1452 : pp = 138.94393920898438 
1410 / 1452 : pp = 138.97323608398438 
1420 / 1452 : pp = 138.9404296875 
1430 / 1452 : pp = 138.90943908691406 
1440 / 1452 : pp = 138.94268798828125 
1450 / 1452 : pp = 138.93991088867188 

0 / 115 : pp = 225.55990600585938 
10 / 115 : pp = 207.0504608154297 
20 / 115 : pp = 208.98306274414062 
30 / 115 : pp = 206.28396606445312 
40 / 115 : pp = 205.35386657714844 
50 / 115 : pp = 200.7255401611328 
60 / 115 : pp = 200.0526580810547 
70 / 115 : pp = 196.33087158203125 
80 / 115 : pp = 194.12110900878906 
90 / 115 : pp = 191.52816772460938 
100 / 115 : pp = 186.7974395751953 
110 / 115 : pp = 184.59829711914062 
Training perplexity: 138.9222869873047
Validation perplexity:184.18101501464844
Total time : 43.92928600311279
Epoch 12

0 / 1452 : pp = 173.0251007080078 
10 / 1452 : pp = 152.98446655273438 
20 / 1452 : pp = 150.43128967285156 
30 / 1452 : pp = 147.5819854736328 
40 / 1452 : pp = 149.4164276123047 
50 / 1452 : pp = 146.70816040039062 
60 / 1452 : pp = 145.557861328125 
70 / 1452 : pp = 146.50473022460938 
80 / 1452 : pp = 145.83200073242188 
90 / 1452 : pp = 144.84402465820312 
100 / 1452 : pp = 144.0390167236328 
110 / 1452 : pp = 142.66514587402344 
120 / 1452 : pp = 142.3549346923828 
130 / 1452 : pp = 141.4630126953125 
140 / 1452 : pp = 140.2266082763672 
150 / 1452 : pp = 139.67518615722656 
160 / 1452 : pp = 139.90414428710938 
170 / 1452 : pp = 139.5490264892578 
180 / 1452 : pp = 138.91969299316406 
190 / 1452 : pp = 138.89234924316406 
200 / 1452 : pp = 139.40908813476562 
210 / 1452 : pp = 139.19068908691406 
220 / 1452 : pp = 139.35513305664062 
230 / 1452 : pp = 139.5464324951172 
240 / 1452 : pp = 139.3047637939453 
250 / 1452 : pp = 138.7708740234375 
260 / 1452 : pp = 138.29188537597656 
270 / 1452 : pp = 137.4787139892578 
280 / 1452 : pp = 137.6367950439453 
290 / 1452 : pp = 137.98513793945312 
300 / 1452 : pp = 138.17819213867188 
310 / 1452 : pp = 137.943359375 
320 / 1452 : pp = 138.12060546875 
330 / 1452 : pp = 138.29037475585938 
340 / 1452 : pp = 137.77606201171875 
350 / 1452 : pp = 138.06378173828125 
360 / 1452 : pp = 137.99000549316406 
370 / 1452 : pp = 137.81922912597656 
380 / 1452 : pp = 137.52159118652344 
390 / 1452 : pp = 137.61782836914062 
400 / 1452 : pp = 137.4178924560547 
410 / 1452 : pp = 137.82632446289062 
420 / 1452 : pp = 138.17567443847656 
430 / 1452 : pp = 138.11863708496094 
440 / 1452 : pp = 138.215087890625 
450 / 1452 : pp = 137.9976348876953 
460 / 1452 : pp = 137.6929168701172 
470 / 1452 : pp = 137.25416564941406 
480 / 1452 : pp = 136.75140380859375 
490 / 1452 : pp = 136.51712036132812 
500 / 1452 : pp = 136.0896453857422 
510 / 1452 : pp = 135.97048950195312 
520 / 1452 : pp = 135.7760009765625 
530 / 1452 : pp = 135.50389099121094 
540 / 1452 : pp = 135.01437377929688 
550 / 1452 : pp = 134.7666015625 
560 / 1452 : pp = 134.48973083496094 
570 / 1452 : pp = 134.22853088378906 
580 / 1452 : pp = 133.88455200195312 
590 / 1452 : pp = 133.5808868408203 
600 / 1452 : pp = 133.22975158691406 
610 / 1452 : pp = 132.99591064453125 
620 / 1452 : pp = 132.79502868652344 
630 / 1452 : pp = 132.5094451904297 
640 / 1452 : pp = 132.62892150878906 
650 / 1452 : pp = 132.63499450683594 
660 / 1452 : pp = 132.7379913330078 
670 / 1452 : pp = 132.79046630859375 
680 / 1452 : pp = 132.85842895507812 
690 / 1452 : pp = 132.80364990234375 
700 / 1452 : pp = 132.80477905273438 
710 / 1452 : pp = 132.90170288085938 
720 / 1452 : pp = 132.92971801757812 
730 / 1452 : pp = 132.9019012451172 
740 / 1452 : pp = 133.04811096191406 
750 / 1452 : pp = 133.10877990722656 
760 / 1452 : pp = 133.19189453125 
770 / 1452 : pp = 133.3564910888672 
780 / 1452 : pp = 133.54000854492188 
790 / 1452 : pp = 133.69239807128906 
800 / 1452 : pp = 133.68495178222656 
810 / 1452 : pp = 133.67971801757812 
820 / 1452 : pp = 133.7035675048828 
830 / 1452 : pp = 133.89329528808594 
840 / 1452 : pp = 133.850341796875 
850 / 1452 : pp = 133.90390014648438 
860 / 1452 : pp = 133.9090118408203 
870 / 1452 : pp = 133.89974975585938 
880 / 1452 : pp = 134.0077667236328 
890 / 1452 : pp = 134.03485107421875 
900 / 1452 : pp = 134.0261688232422 
910 / 1452 : pp = 134.10255432128906 
920 / 1452 : pp = 134.17291259765625 
930 / 1452 : pp = 134.14796447753906 
940 / 1452 : pp = 134.20925903320312 
950 / 1452 : pp = 134.19281005859375 
960 / 1452 : pp = 134.17745971679688 
970 / 1452 : pp = 134.18653869628906 
980 / 1452 : pp = 134.03192138671875 
990 / 1452 : pp = 133.94349670410156 
1000 / 1452 : pp = 133.79685974121094 
1010 / 1452 : pp = 133.8438262939453 
1020 / 1452 : pp = 133.9608612060547 
1030 / 1452 : pp = 133.93934631347656 
1040 / 1452 : pp = 134.02833557128906 
1050 / 1452 : pp = 134.01734924316406 
1060 / 1452 : pp = 133.95346069335938 
1070 / 1452 : pp = 134.10205078125 
1080 / 1452 : pp = 134.2030487060547 
1090 / 1452 : pp = 134.23696899414062 
1100 / 1452 : pp = 134.2230224609375 
1110 / 1452 : pp = 134.0829315185547 
1120 / 1452 : pp = 133.980224609375 
1130 / 1452 : pp = 133.83815002441406 
1140 / 1452 : pp = 133.8366241455078 
1150 / 1452 : pp = 133.92108154296875 
1160 / 1452 : pp = 133.94375610351562 
1170 / 1452 : pp = 133.9360809326172 
1180 / 1452 : pp = 133.99684143066406 
1190 / 1452 : pp = 134.0944366455078 
1200 / 1452 : pp = 134.11676025390625 
1210 / 1452 : pp = 134.0911102294922 
1220 / 1452 : pp = 134.22763061523438 
1230 / 1452 : pp = 134.38043212890625 
1240 / 1452 : pp = 134.39817810058594 
1250 / 1452 : pp = 134.5367431640625 
1260 / 1452 : pp = 134.593017578125 
1270 / 1452 : pp = 134.61497497558594 
1280 / 1452 : pp = 134.6423797607422 
1290 / 1452 : pp = 134.64340209960938 
1300 / 1452 : pp = 134.68026733398438 
1310 / 1452 : pp = 134.73556518554688 
1320 / 1452 : pp = 134.69021606445312 
1330 / 1452 : pp = 134.66131591796875 
1340 / 1452 : pp = 134.69393920898438 
1350 / 1452 : pp = 134.7328643798828 
1360 / 1452 : pp = 134.79405212402344 
1370 / 1452 : pp = 134.71237182617188 
1380 / 1452 : pp = 134.6885528564453 
1390 / 1452 : pp = 134.65110778808594 
1400 / 1452 : pp = 134.59584045410156 
1410 / 1452 : pp = 134.6193389892578 
1420 / 1452 : pp = 134.58338928222656 
1430 / 1452 : pp = 134.559326171875 
1440 / 1452 : pp = 134.59507751464844 
1450 / 1452 : pp = 134.59365844726562 

0 / 115 : pp = 226.0741729736328 
10 / 115 : pp = 207.00494384765625 
20 / 115 : pp = 209.26976013183594 
30 / 115 : pp = 206.44662475585938 
40 / 115 : pp = 205.47268676757812 
50 / 115 : pp = 200.7876739501953 
60 / 115 : pp = 200.13414001464844 
70 / 115 : pp = 196.35549926757812 
80 / 115 : pp = 194.10777282714844 
90 / 115 : pp = 191.47467041015625 
100 / 115 : pp = 186.61351013183594 
110 / 115 : pp = 184.30374145507812 
Training perplexity: 134.57826232910156
Validation perplexity:183.8900146484375
Total time : 45.410256147384644
Epoch 13

0 / 1452 : pp = 169.39393615722656 
10 / 1452 : pp = 150.13232421875 
20 / 1452 : pp = 147.60450744628906 
30 / 1452 : pp = 144.64317321777344 
40 / 1452 : pp = 146.47427368164062 
50 / 1452 : pp = 143.929443359375 
60 / 1452 : pp = 142.8344268798828 
70 / 1452 : pp = 143.45248413085938 
80 / 1452 : pp = 142.5418701171875 
90 / 1452 : pp = 141.6178436279297 
100 / 1452 : pp = 140.70127868652344 
110 / 1452 : pp = 139.2852325439453 
120 / 1452 : pp = 138.8017120361328 
130 / 1452 : pp = 137.85629272460938 
140 / 1452 : pp = 136.51718139648438 
150 / 1452 : pp = 136.03619384765625 
160 / 1452 : pp = 136.154296875 
170 / 1452 : pp = 135.67037963867188 
180 / 1452 : pp = 135.0376739501953 
190 / 1452 : pp = 134.9230499267578 
200 / 1452 : pp = 135.4241180419922 
210 / 1452 : pp = 135.24581909179688 
220 / 1452 : pp = 135.37957763671875 
230 / 1452 : pp = 135.67652893066406 
240 / 1452 : pp = 135.4161834716797 
250 / 1452 : pp = 134.90895080566406 
260 / 1452 : pp = 134.46754455566406 
270 / 1452 : pp = 133.68577575683594 
280 / 1452 : pp = 133.86770629882812 
290 / 1452 : pp = 134.18475341796875 
300 / 1452 : pp = 134.39132690429688 
310 / 1452 : pp = 134.19985961914062 
320 / 1452 : pp = 134.37998962402344 
330 / 1452 : pp = 134.5557403564453 
340 / 1452 : pp = 134.00686645507812 
350 / 1452 : pp = 134.27749633789062 
360 / 1452 : pp = 134.20286560058594 
370 / 1452 : pp = 134.042724609375 
380 / 1452 : pp = 133.74398803710938 
390 / 1452 : pp = 133.83584594726562 
400 / 1452 : pp = 133.64382934570312 
410 / 1452 : pp = 134.02366638183594 
420 / 1452 : pp = 134.35415649414062 
430 / 1452 : pp = 134.310546875 
440 / 1452 : pp = 134.3634490966797 
450 / 1452 : pp = 134.15602111816406 
460 / 1452 : pp = 133.86578369140625 
470 / 1452 : pp = 133.43414306640625 
480 / 1452 : pp = 132.90310668945312 
490 / 1452 : pp = 132.646240234375 
500 / 1452 : pp = 132.1982421875 
510 / 1452 : pp = 132.04200744628906 
520 / 1452 : pp = 131.86940002441406 
530 / 1452 : pp = 131.59841918945312 
540 / 1452 : pp = 131.12356567382812 
550 / 1452 : pp = 130.887939453125 
560 / 1452 : pp = 130.6210174560547 
570 / 1452 : pp = 130.37826538085938 
580 / 1452 : pp = 130.0374755859375 
590 / 1452 : pp = 129.75979614257812 
600 / 1452 : pp = 129.38308715820312 
610 / 1452 : pp = 129.16685485839844 
620 / 1452 : pp = 129.0115509033203 
630 / 1452 : pp = 128.75152587890625 
640 / 1452 : pp = 128.87295532226562 
650 / 1452 : pp = 128.88734436035156 
660 / 1452 : pp = 128.98275756835938 
670 / 1452 : pp = 129.0487060546875 
680 / 1452 : pp = 129.11013793945312 
690 / 1452 : pp = 129.0646514892578 
700 / 1452 : pp = 129.06280517578125 
710 / 1452 : pp = 129.1343994140625 
720 / 1452 : pp = 129.18582153320312 
730 / 1452 : pp = 129.15138244628906 
740 / 1452 : pp = 129.29811096191406 
750 / 1452 : pp = 129.339599609375 
760 / 1452 : pp = 129.4257354736328 
770 / 1452 : pp = 129.61631774902344 
780 / 1452 : pp = 129.802734375 
790 / 1452 : pp = 129.96804809570312 
800 / 1452 : pp = 129.95187377929688 
810 / 1452 : pp = 129.92417907714844 
820 / 1452 : pp = 129.9774627685547 
830 / 1452 : pp = 130.1638946533203 
840 / 1452 : pp = 130.13095092773438 
850 / 1452 : pp = 130.16595458984375 
860 / 1452 : pp = 130.173828125 
870 / 1452 : pp = 130.170166015625 
880 / 1452 : pp = 130.27032470703125 
890 / 1452 : pp = 130.3022003173828 
900 / 1452 : pp = 130.3071746826172 
910 / 1452 : pp = 130.37939453125 
920 / 1452 : pp = 130.46229553222656 
930 / 1452 : pp = 130.43846130371094 
940 / 1452 : pp = 130.50889587402344 
950 / 1452 : pp = 130.50086975097656 
960 / 1452 : pp = 130.4833221435547 
970 / 1452 : pp = 130.50814819335938 
980 / 1452 : pp = 130.35577392578125 
990 / 1452 : pp = 130.26759338378906 
1000 / 1452 : pp = 130.1064453125 
1010 / 1452 : pp = 130.1472625732422 
1020 / 1452 : pp = 130.27169799804688 
1030 / 1452 : pp = 130.25100708007812 
1040 / 1452 : pp = 130.30816650390625 
1050 / 1452 : pp = 130.29803466796875 
1060 / 1452 : pp = 130.2242431640625 
1070 / 1452 : pp = 130.35906982421875 
1080 / 1452 : pp = 130.45103454589844 
1090 / 1452 : pp = 130.49838256835938 
1100 / 1452 : pp = 130.484130859375 
1110 / 1452 : pp = 130.35316467285156 
1120 / 1452 : pp = 130.24697875976562 
1130 / 1452 : pp = 130.10804748535156 
1140 / 1452 : pp = 130.1076202392578 
1150 / 1452 : pp = 130.195068359375 
1160 / 1452 : pp = 130.19674682617188 
1170 / 1452 : pp = 130.18321228027344 
1180 / 1452 : pp = 130.24623107910156 
1190 / 1452 : pp = 130.33905029296875 
1200 / 1452 : pp = 130.3650360107422 
1210 / 1452 : pp = 130.34588623046875 
1220 / 1452 : pp = 130.4850616455078 
1230 / 1452 : pp = 130.63160705566406 
1240 / 1452 : pp = 130.64674377441406 
1250 / 1452 : pp = 130.77078247070312 
1260 / 1452 : pp = 130.8397674560547 
1270 / 1452 : pp = 130.8511199951172 
1280 / 1452 : pp = 130.88967895507812 
1290 / 1452 : pp = 130.9040985107422 
1300 / 1452 : pp = 130.93511962890625 
1310 / 1452 : pp = 130.9759063720703 
1320 / 1452 : pp = 130.92800903320312 
1330 / 1452 : pp = 130.9105224609375 
1340 / 1452 : pp = 130.929443359375 
1350 / 1452 : pp = 130.96153259277344 
1360 / 1452 : pp = 131.02381896972656 
1370 / 1452 : pp = 130.9545440673828 
1380 / 1452 : pp = 130.9344940185547 
1390 / 1452 : pp = 130.9055938720703 
1400 / 1452 : pp = 130.85386657714844 
1410 / 1452 : pp = 130.8874969482422 
1420 / 1452 : pp = 130.85928344726562 
1430 / 1452 : pp = 130.83995056152344 
1440 / 1452 : pp = 130.86659240722656 
1450 / 1452 : pp = 130.86839294433594 

0 / 115 : pp = 227.78428649902344 
10 / 115 : pp = 207.609619140625 
20 / 115 : pp = 209.92459106445312 
30 / 115 : pp = 206.96240234375 
40 / 115 : pp = 205.9295654296875 
50 / 115 : pp = 201.0296630859375 
60 / 115 : pp = 200.38059997558594 
70 / 115 : pp = 196.55764770507812 
80 / 115 : pp = 194.31735229492188 
90 / 115 : pp = 191.66146850585938 
100 / 115 : pp = 186.70437622070312 
110 / 115 : pp = 184.3171844482422 
Training perplexity: 130.85043334960938
Validation perplexity:183.88186645507812
Total time : 45.345656394958496
Epoch 14

0 / 1452 : pp = 164.82191467285156 
10 / 1452 : pp = 146.39089965820312 
20 / 1452 : pp = 142.93240356445312 
30 / 1452 : pp = 140.3113555908203 
40 / 1452 : pp = 142.39939880371094 
50 / 1452 : pp = 139.70162963867188 
60 / 1452 : pp = 138.73023986816406 
70 / 1452 : pp = 139.2675018310547 
80 / 1452 : pp = 138.47824096679688 
90 / 1452 : pp = 137.40432739257812 
100 / 1452 : pp = 136.47793579101562 
110 / 1452 : pp = 135.2294464111328 
120 / 1452 : pp = 134.80728149414062 
130 / 1452 : pp = 133.89822387695312 
140 / 1452 : pp = 132.54141235351562 
150 / 1452 : pp = 132.10025024414062 
160 / 1452 : pp = 132.21829223632812 
170 / 1452 : pp = 131.8765106201172 
180 / 1452 : pp = 131.37515258789062 
190 / 1452 : pp = 131.31622314453125 
200 / 1452 : pp = 131.78297424316406 
210 / 1452 : pp = 131.5507354736328 
220 / 1452 : pp = 131.7002410888672 
230 / 1452 : pp = 131.9277801513672 
240 / 1452 : pp = 131.72166442871094 
250 / 1452 : pp = 131.225830078125 
260 / 1452 : pp = 130.7496337890625 
270 / 1452 : pp = 129.9896697998047 
280 / 1452 : pp = 130.10594177246094 
290 / 1452 : pp = 130.41644287109375 
300 / 1452 : pp = 130.5982208251953 
310 / 1452 : pp = 130.36329650878906 
320 / 1452 : pp = 130.5633544921875 
330 / 1452 : pp = 130.77252197265625 
340 / 1452 : pp = 130.273193359375 
350 / 1452 : pp = 130.47889709472656 
360 / 1452 : pp = 130.4348602294922 
370 / 1452 : pp = 130.28126525878906 
380 / 1452 : pp = 130.02786254882812 
390 / 1452 : pp = 130.1564483642578 
400 / 1452 : pp = 129.98440551757812 
410 / 1452 : pp = 130.37721252441406 
420 / 1452 : pp = 130.71859741210938 
430 / 1452 : pp = 130.65939331054688 
440 / 1452 : pp = 130.72987365722656 
450 / 1452 : pp = 130.56272888183594 
460 / 1452 : pp = 130.28195190429688 
470 / 1452 : pp = 129.90936279296875 
480 / 1452 : pp = 129.42857360839844 
490 / 1452 : pp = 129.18077087402344 
500 / 1452 : pp = 128.7588348388672 
510 / 1452 : pp = 128.6303253173828 
520 / 1452 : pp = 128.47616577148438 
530 / 1452 : pp = 128.21148681640625 
540 / 1452 : pp = 127.7218017578125 
550 / 1452 : pp = 127.50067138671875 
560 / 1452 : pp = 127.27574157714844 
570 / 1452 : pp = 127.05399322509766 
580 / 1452 : pp = 126.73983001708984 
590 / 1452 : pp = 126.43692779541016 
600 / 1452 : pp = 126.06050109863281 
610 / 1452 : pp = 125.82952880859375 
620 / 1452 : pp = 125.66295623779297 
630 / 1452 : pp = 125.39354705810547 
640 / 1452 : pp = 125.49463653564453 
650 / 1452 : pp = 125.48816680908203 
660 / 1452 : pp = 125.58712005615234 
670 / 1452 : pp = 125.65978240966797 
680 / 1452 : pp = 125.71456146240234 
690 / 1452 : pp = 125.66937255859375 
700 / 1452 : pp = 125.65900421142578 
710 / 1452 : pp = 125.7271499633789 
720 / 1452 : pp = 125.77758026123047 
730 / 1452 : pp = 125.74129486083984 
740 / 1452 : pp = 125.8759765625 
750 / 1452 : pp = 125.91793823242188 
760 / 1452 : pp = 125.99595642089844 
770 / 1452 : pp = 126.18113708496094 
780 / 1452 : pp = 126.35147094726562 
790 / 1452 : pp = 126.50797271728516 
800 / 1452 : pp = 126.49759674072266 
810 / 1452 : pp = 126.48113250732422 
820 / 1452 : pp = 126.52528381347656 
830 / 1452 : pp = 126.705810546875 
840 / 1452 : pp = 126.67517852783203 
850 / 1452 : pp = 126.74176025390625 
860 / 1452 : pp = 126.74151611328125 
870 / 1452 : pp = 126.73414611816406 
880 / 1452 : pp = 126.83026885986328 
890 / 1452 : pp = 126.88519287109375 
900 / 1452 : pp = 126.88053894042969 
910 / 1452 : pp = 126.97138214111328 
920 / 1452 : pp = 127.04660034179688 
930 / 1452 : pp = 127.03763580322266 
940 / 1452 : pp = 127.1126480102539 
950 / 1452 : pp = 127.09610748291016 
960 / 1452 : pp = 127.0873794555664 
970 / 1452 : pp = 127.10343933105469 
980 / 1452 : pp = 126.96441650390625 
990 / 1452 : pp = 126.88519287109375 
1000 / 1452 : pp = 126.7336654663086 
1010 / 1452 : pp = 126.77796936035156 
1020 / 1452 : pp = 126.89826202392578 
1030 / 1452 : pp = 126.88761138916016 
1040 / 1452 : pp = 126.95309448242188 
1050 / 1452 : pp = 126.96478271484375 
1060 / 1452 : pp = 126.89324188232422 
1070 / 1452 : pp = 127.03242492675781 
1080 / 1452 : pp = 127.13228607177734 
1090 / 1452 : pp = 127.173095703125 
1100 / 1452 : pp = 127.15975189208984 
1110 / 1452 : pp = 127.0392074584961 
1120 / 1452 : pp = 126.94032287597656 
1130 / 1452 : pp = 126.80693054199219 
1140 / 1452 : pp = 126.81315612792969 
1150 / 1452 : pp = 126.90467834472656 
1160 / 1452 : pp = 126.91236114501953 
1170 / 1452 : pp = 126.90897369384766 
1180 / 1452 : pp = 126.98052215576172 
1190 / 1452 : pp = 127.07483673095703 
1200 / 1452 : pp = 127.10216522216797 
1210 / 1452 : pp = 127.08258819580078 
1220 / 1452 : pp = 127.22943878173828 
1230 / 1452 : pp = 127.38563537597656 
1240 / 1452 : pp = 127.40538024902344 
1250 / 1452 : pp = 127.53369140625 
1260 / 1452 : pp = 127.59293365478516 
1270 / 1452 : pp = 127.61489868164062 
1280 / 1452 : pp = 127.6484375 
1290 / 1452 : pp = 127.65257263183594 
1300 / 1452 : pp = 127.69329833984375 
1310 / 1452 : pp = 127.74549102783203 
1320 / 1452 : pp = 127.7043228149414 
1330 / 1452 : pp = 127.6866683959961 
1340 / 1452 : pp = 127.70913696289062 
1350 / 1452 : pp = 127.73233795166016 
1360 / 1452 : pp = 127.7855224609375 
1370 / 1452 : pp = 127.71918487548828 
1380 / 1452 : pp = 127.69987487792969 
1390 / 1452 : pp = 127.6697998046875 
1400 / 1452 : pp = 127.61137390136719 
1410 / 1452 : pp = 127.6404037475586 
1420 / 1452 : pp = 127.61094665527344 
1430 / 1452 : pp = 127.58216857910156 
1440 / 1452 : pp = 127.61477661132812 
1450 / 1452 : pp = 127.61964416503906 

0 / 115 : pp = 228.21578979492188 
10 / 115 : pp = 208.11244201660156 
20 / 115 : pp = 210.688232421875 
30 / 115 : pp = 207.62408447265625 
40 / 115 : pp = 206.45184326171875 
50 / 115 : pp = 201.52760314941406 
60 / 115 : pp = 200.7784881591797 
70 / 115 : pp = 196.83067321777344 
80 / 115 : pp = 194.6357879638672 
90 / 115 : pp = 191.9783935546875 
100 / 115 : pp = 186.8787841796875 
110 / 115 : pp = 184.35252380371094 
Training perplexity: 127.60413360595703
Validation perplexity:183.8877410888672
Total time : 41.6636528968811
Epoch 15

0 / 1452 : pp = 156.81654357910156 
10 / 1452 : pp = 142.1070556640625 
20 / 1452 : pp = 139.55076599121094 
30 / 1452 : pp = 136.63551330566406 
40 / 1452 : pp = 138.5840606689453 
50 / 1452 : pp = 136.052734375 
60 / 1452 : pp = 134.93019104003906 
70 / 1452 : pp = 135.65206909179688 
80 / 1452 : pp = 135.2620086669922 
90 / 1452 : pp = 134.314697265625 
100 / 1452 : pp = 133.4916229248047 
110 / 1452 : pp = 132.26052856445312 
120 / 1452 : pp = 131.7714080810547 
130 / 1452 : pp = 130.77365112304688 
140 / 1452 : pp = 129.5411834716797 
150 / 1452 : pp = 129.0791778564453 
160 / 1452 : pp = 129.21920776367188 
170 / 1452 : pp = 128.7528839111328 
180 / 1452 : pp = 128.22279357910156 
190 / 1452 : pp = 128.18177795410156 
200 / 1452 : pp = 128.58758544921875 
210 / 1452 : pp = 128.3906707763672 
220 / 1452 : pp = 128.5266571044922 
230 / 1452 : pp = 128.80563354492188 
240 / 1452 : pp = 128.61886596679688 
250 / 1452 : pp = 128.13172912597656 
260 / 1452 : pp = 127.69220733642578 
270 / 1452 : pp = 126.96150970458984 
280 / 1452 : pp = 127.04702758789062 
290 / 1452 : pp = 127.33565521240234 
300 / 1452 : pp = 127.55929565429688 
310 / 1452 : pp = 127.38514709472656 
320 / 1452 : pp = 127.52171325683594 
330 / 1452 : pp = 127.68690490722656 
340 / 1452 : pp = 127.18340301513672 
350 / 1452 : pp = 127.4073257446289 
360 / 1452 : pp = 127.30432891845703 
370 / 1452 : pp = 127.17618560791016 
380 / 1452 : pp = 126.92579650878906 
390 / 1452 : pp = 127.02473449707031 
400 / 1452 : pp = 126.8515625 
410 / 1452 : pp = 127.211669921875 
420 / 1452 : pp = 127.51788330078125 
430 / 1452 : pp = 127.47386169433594 
440 / 1452 : pp = 127.57164001464844 
450 / 1452 : pp = 127.3601303100586 
460 / 1452 : pp = 127.09434509277344 
470 / 1452 : pp = 126.71922302246094 
480 / 1452 : pp = 126.24349212646484 
490 / 1452 : pp = 125.98778533935547 
500 / 1452 : pp = 125.59526824951172 
510 / 1452 : pp = 125.4450912475586 
520 / 1452 : pp = 125.29247283935547 
530 / 1452 : pp = 125.03536224365234 
540 / 1452 : pp = 124.5813980102539 
550 / 1452 : pp = 124.33724212646484 
560 / 1452 : pp = 124.08995819091797 
570 / 1452 : pp = 123.86637878417969 
580 / 1452 : pp = 123.53152465820312 
590 / 1452 : pp = 123.20321655273438 
600 / 1452 : pp = 122.85673522949219 
610 / 1452 : pp = 122.64250946044922 
620 / 1452 : pp = 122.4958724975586 
630 / 1452 : pp = 122.22386169433594 
640 / 1452 : pp = 122.31143188476562 
650 / 1452 : pp = 122.30093383789062 
660 / 1452 : pp = 122.39427947998047 
670 / 1452 : pp = 122.45440673828125 
680 / 1452 : pp = 122.51146697998047 
690 / 1452 : pp = 122.4854736328125 
700 / 1452 : pp = 122.48600006103516 
710 / 1452 : pp = 122.56084442138672 
720 / 1452 : pp = 122.59059143066406 
730 / 1452 : pp = 122.55529022216797 
740 / 1452 : pp = 122.69409942626953 
750 / 1452 : pp = 122.76456451416016 
760 / 1452 : pp = 122.84437561035156 
770 / 1452 : pp = 123.02527618408203 
780 / 1452 : pp = 123.20509338378906 
790 / 1452 : pp = 123.36305236816406 
800 / 1452 : pp = 123.36852264404297 
810 / 1452 : pp = 123.36799621582031 
820 / 1452 : pp = 123.39976501464844 
830 / 1452 : pp = 123.59362030029297 
840 / 1452 : pp = 123.56946563720703 
850 / 1452 : pp = 123.63800811767578 
860 / 1452 : pp = 123.63983917236328 
870 / 1452 : pp = 123.64148712158203 
880 / 1452 : pp = 123.7568588256836 
890 / 1452 : pp = 123.7885513305664 
900 / 1452 : pp = 123.79640197753906 
910 / 1452 : pp = 123.86153411865234 
920 / 1452 : pp = 123.92941284179688 
930 / 1452 : pp = 123.9125747680664 
940 / 1452 : pp = 123.95559692382812 
950 / 1452 : pp = 123.93928527832031 
960 / 1452 : pp = 123.94294738769531 
970 / 1452 : pp = 123.95547485351562 
980 / 1452 : pp = 123.8229751586914 
990 / 1452 : pp = 123.73727416992188 
1000 / 1452 : pp = 123.59091186523438 
1010 / 1452 : pp = 123.634765625 
1020 / 1452 : pp = 123.76506042480469 
1030 / 1452 : pp = 123.75485229492188 
1040 / 1452 : pp = 123.807861328125 
1050 / 1452 : pp = 123.79156494140625 
1060 / 1452 : pp = 123.73054504394531 
1070 / 1452 : pp = 123.8615951538086 
1080 / 1452 : pp = 123.96564483642578 
1090 / 1452 : pp = 124.02104187011719 
1100 / 1452 : pp = 124.012939453125 
1110 / 1452 : pp = 123.87582397460938 
1120 / 1452 : pp = 123.775390625 
1130 / 1452 : pp = 123.63182067871094 
1140 / 1452 : pp = 123.62391662597656 
1150 / 1452 : pp = 123.71013641357422 
1160 / 1452 : pp = 123.72423553466797 
1170 / 1452 : pp = 123.71726989746094 
1180 / 1452 : pp = 123.79032897949219 
1190 / 1452 : pp = 123.87883758544922 
1200 / 1452 : pp = 123.9125747680664 
1210 / 1452 : pp = 123.90140533447266 
1220 / 1452 : pp = 124.03245544433594 
1230 / 1452 : pp = 124.19799041748047 
1240 / 1452 : pp = 124.21469116210938 
1250 / 1452 : pp = 124.34103393554688 
1260 / 1452 : pp = 124.4041976928711 
1270 / 1452 : pp = 124.42852020263672 
1280 / 1452 : pp = 124.46656036376953 
1290 / 1452 : pp = 124.4811019897461 
1300 / 1452 : pp = 124.52384185791016 
1310 / 1452 : pp = 124.57533264160156 
1320 / 1452 : pp = 124.5398178100586 
1330 / 1452 : pp = 124.52598571777344 
1340 / 1452 : pp = 124.53311157226562 
1350 / 1452 : pp = 124.57759094238281 
1360 / 1452 : pp = 124.63385772705078 
1370 / 1452 : pp = 124.58133697509766 
1380 / 1452 : pp = 124.55769348144531 
1390 / 1452 : pp = 124.54011535644531 
1400 / 1452 : pp = 124.4884033203125 
1410 / 1452 : pp = 124.51226806640625 
1420 / 1452 : pp = 124.49683380126953 
1430 / 1452 : pp = 124.4754638671875 
1440 / 1452 : pp = 124.50164031982422 
1450 / 1452 : pp = 124.50894165039062 

0 / 115 : pp = 230.8488006591797 
10 / 115 : pp = 209.2509002685547 
20 / 115 : pp = 211.68577575683594 
30 / 115 : pp = 208.44056701660156 
40 / 115 : pp = 207.2039337158203 
50 / 115 : pp = 202.1859588623047 
60 / 115 : pp = 201.34739685058594 
70 / 115 : pp = 197.4251251220703 
80 / 115 : pp = 195.2623291015625 
90 / 115 : pp = 192.592529296875 
100 / 115 : pp = 187.39553833007812 
110 / 115 : pp = 184.791259765625 
Training perplexity: 124.4933853149414
Validation perplexity:184.32510375976562
Total time : 40.856229066848755
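Each epoch above closes with a training/validation perplexity pair and a timing line. The run executes all 16 epochs (Epoch 0 through Epoch 15) because validation perplexity keeps edging down until very late; with `Config.early_stopping = 2` from the listing earlier in the post, training would have stopped sooner had the best validation score gone stale for more than two epochs. A minimal sketch of that stopping rule (`stopping_epoch` is an illustrative helper, not part of the assignment code):

```python
def stopping_epoch(val_pps, max_epochs=16, early_stopping=2):
    """Index of the last epoch run: stop once the best validation
    perplexity is more than `early_stopping` epochs old."""
    best_pp = float('inf')
    best_epoch = 0
    for epoch in range(min(max_epochs, len(val_pps))):
        if val_pps[epoch] < best_pp:
            best_pp = val_pps[epoch]
            best_epoch = epoch
        if epoch - best_epoch > early_stopping:
            break  # validation stopped improving; end this run early
    return epoch
```

For example, a run whose validation perplexity bottoms out at epoch 1 and never recovers would stop after epoch 4.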

0 / 128 : pp = 184.6475067138672 
10 / 128 : pp = 176.8856964111328 
20 / 128 : pp = 164.3444366455078 
30 / 128 : pp = 167.85472106933594 
40 / 128 : pp = 169.25367736816406 
50 / 128 : pp = 168.86561584472656 
60 / 128 : pp = 168.11801147460938 
70 / 128 : pp = 165.4105224609375 
80 / 128 : pp = 162.91146850585938 
90 / 128 : pp = 161.29742431640625 
100 / 128 : pp = 162.45989990234375 
110 / 128 : pp = 162.6834716796875 
120 / 128 : pp = 164.3359832763672 
=-==-==-==-==-=
Test perplexity: 164.0149383544922 
=-==-==-==-==-=
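For reference, each `pp` value in the logs is a perplexity: the exponential of the average cross-entropy loss accumulated so far, so lower is better, and a test perplexity of about 164 means the model is, on average, as uncertain as a uniform choice over 164 words. A minimal sketch of that relationship (`perplexity_from_log_probs` is an illustrative name, not the `calculate_perplexity` helper imported in the code):

```python
import numpy as np

def perplexity_from_log_probs(log_probs):
    """Perplexity = exp(mean negative log-likelihood of the target words)."""
    return float(np.exp(-np.mean(log_probs)))

# A model that assigns probability 1/4 to every target word has perplexity 4.
```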

For more details, see the full code at the link below:

https://github.com/weizhenzhao/cs224d_nlp_problem_set2

 

            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
            return inputs

    def add_projection(self, rnn_outputs):
        """添加一个投影层
            投影层将隐藏层的表示变换到整个词向量上的分布式表示
            Hint:下面是你需要去创建的维度
                U(hidden_size,len(vocab))
                b_2:(len(vocab),)
            参数:
                rnn_outputs:一个训练次数的列表,每一个元素应该是一个张量
                            大小是(batch_size,embed_size)
            Returns:
                outputs:一个长度的列表,每一个元素是一个张量(batch_size,len(vocab))
        """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs
    
    def add_loss_op(self, output):
        """将损失添加到目标函数上面
            Hint:使用tensorflow.python.ops.seq2seq.sequence_loss 来实现序列损失
                              参数:
                                        输出:一个张量   大小是 (None,self.vocab)
                              返回:
                                        损失:一个0-d大小的张量
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output], [tf.reshape(self.labels_placeholder, [-1])], all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss
        
        
    def add_training_op(self, loss):
        """将目标损失添加到计算图上
            创建一个优化器并且应用梯度下降到所有的训练变量上面
            Hint:使用tf.train.AdamOptimizer 对于这个模型
                使用optimizer.minimize() 会返回一个train_op的对象
            参数:
                loss: 损失张量,来自于cross_entropy_loss 交叉熵损失
            返回:
                train_op:训练的目标
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word
        # We cast o to float64 because otherwise there are numerical issues:
        # sum(output of softmax) = 1.00000298179 instead of 1
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs into rows of size len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """创建RNN LM 模型
                      在下面的实现里面你需要去实现RNN LM 模型的等式
        Hint: 使用一个零向量 大小是 (batch_size,hidden_size) 作为初始的RNN的状态
        Hint: 将最后RNN输出 作为实例变量
            self.final_state
        Hint : 确保将dropout应用到 输入和输出的 变量上面
        Hint : 使用变量域 RNN 来定义 RNN变量
        Hint : 表现一个明显的 for-loop 在输入上面
                你可以使用scope.reuse_variable() 来确定权重
                在每一次迭代都是相同的
                确保不会在第一次循环的时候调用这个,因为没有变量会被初始化
        Hint : 下面变量的不同的维度 , 你需要去创建的

            H: (hidden_size,hidden_size)
            I: (embed_size,hidden_size)
            b_1:(hidden_size,)
        Args:
            inputs:一个记录num_steps的列表,里边的每一个元素应该是一个张量
                    大小是(batch_size,embed_size)的大小
        Returns:返回
            outputs:一个记录num_steps的列表,里面每一个元素应该是一个张量
                    大小是(batch_size,hidden_size)
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]

        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]

        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We need to feed in the initial state and fetch the final state
            # so that the RNN keeps the proper history across batches
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp
                    }
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))

def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """从模型自动生成文字
        Hint:创建一个feed-dictionary 并且使用sess.run()方法去执行这个模型
                你会需要使用model.initial_state 作为一个键传递给feed_dict
        Hint:得到model.final_state 和 model.predictions[-1].
             在add_model()方法中设置model.final_state  。
             model.predictions 是在 __init__方法中设置的
        Hint:在模型的训练中存储输出的参数值,和预测的y_pred的值
        参数:
        Args:
            session : tf.Session() object
            model : Object of type RNNLM Model
            config : A Config() object
            starting_text:Initial text passed to model
        Returns:
            output : List of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch size of one, length of len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output

def generate_sentence(session, model, config, *args, **kwargs):
    """方便从模型来生成句子"""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)

def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generative model
    with tf.variable_scope('RNNLM', reuse=None) as scope:
        model = RNNLM_Model(config)
        # This directs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()

            train_pp = model.run_epoch(session,
                                       model.encoded_train,
                                       train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))

        saver.restore(session, 'ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            starting_text = input('> ')


if __name__ == "__main__":
    test_RNNLM()

(It's not really that arcane; it's much simpler than advanced calculus, and hundreds of thousands of times simpler than mathematical analysis.)

Below is the training log:

1380 / 1452 : pp = 266.20892333984375 
1390 / 1452 : pp = 265.94439697265625 
1400 / 1452 : pp = 265.66845703125 
1410 / 1452 : pp = 265.5393981933594 
1420 / 1452 : pp = 265.32489013671875 
1430 / 1452 : pp = 265.2019348144531 
1440 / 1452 : pp = 265.13720703125 
1450 / 1452 : pp = 264.954833984375 

0 / 115 : pp = 296.9217224121094 
10 / 115 : pp = 282.02130126953125 
20 / 115 : pp = 279.76800537109375 
30 / 115 : pp = 276.4101257324219 
40 / 115 : pp = 276.2939147949219 
50 / 115 : pp = 270.73565673828125 
60 / 115 : pp = 269.88134765625 
70 / 115 : pp = 266.8675231933594 
80 / 115 : pp = 263.6731872558594 
90 / 115 : pp = 260.8569030761719 
100 / 115 : pp = 256.3356628417969 
110 / 115 : pp = 255.1026611328125 
Training perplexity: 264.9092102050781
Validation perplexity:254.84902954101562
Total time : 41.65332388877869
Epoch 3

0 / 1452 : pp = 327.0847473144531 
10 / 1452 : pp = 273.9620056152344 
20 / 1452 : pp = 270.22943115234375 
30 / 1452 : pp = 263.5213317871094 
40 / 1452 : pp = 264.0644836425781 
50 / 1452 : pp = 258.6029968261719 
60 / 1452 : pp = 257.04290771484375 
70 / 1452 : pp = 257.59161376953125 
80 / 1452 : pp = 256.7600402832031 
90 / 1452 : pp = 254.5120391845703 
100 / 1452 : pp = 252.44725036621094 
110 / 1452 : pp = 250.13954162597656 
120 / 1452 : pp = 249.91647338867188 
130 / 1452 : pp = 249.50460815429688 
140 / 1452 : pp = 247.67440795898438 
150 / 1452 : pp = 247.19090270996094 
160 / 1452 : pp = 247.8919219970703 
170 / 1452 : pp = 247.54322814941406 
180 / 1452 : pp = 246.17623901367188 
190 / 1452 : pp = 245.78330993652344 
200 / 1452 : pp = 246.80552673339844 
210 / 1452 : pp = 246.3059844970703 
220 / 1452 : pp = 246.19021606445312 
230 / 1452 : pp = 246.70140075683594 
240 / 1452 : pp = 246.3099822998047 
250 / 1452 : pp = 245.1745147705078 
260 / 1452 : pp = 244.17384338378906 
270 / 1452 : pp = 242.57363891601562 
280 / 1452 : pp = 242.8500213623047 
290 / 1452 : pp = 243.0492706298828 
300 / 1452 : pp = 243.1466522216797 
310 / 1452 : pp = 242.89044189453125 
320 / 1452 : pp = 243.08045959472656 
330 / 1452 : pp = 243.32235717773438 
340 / 1452 : pp = 242.34715270996094 
350 / 1452 : pp = 242.80972290039062 
360 / 1452 : pp = 242.5345458984375 
370 / 1452 : pp = 242.0083465576172 
380 / 1452 : pp = 241.22708129882812 
390 / 1452 : pp = 241.24398803710938 
400 / 1452 : pp = 240.63473510742188 
410 / 1452 : pp = 240.94094848632812 
420 / 1452 : pp = 241.19717407226562 
430 / 1452 : pp = 240.8896026611328 
440 / 1452 : pp = 240.7772979736328 
450 / 1452 : pp = 240.45913696289062 
460 / 1452 : pp = 240.06674194335938 
470 / 1452 : pp = 239.42198181152344 
480 / 1452 : pp = 238.39271545410156 
490 / 1452 : pp = 238.0517120361328 
500 / 1452 : pp = 237.31752014160156 
510 / 1452 : pp = 237.1197967529297 
520 / 1452 : pp = 236.64865112304688 
530 / 1452 : pp = 236.004638671875 
540 / 1452 : pp = 235.192626953125 
550 / 1452 : pp = 234.6700439453125 
560 / 1452 : pp = 234.1914825439453 
570 / 1452 : pp = 233.80899047851562 
580 / 1452 : pp = 233.3753662109375 
590 / 1452 : pp = 232.8699188232422 
600 / 1452 : pp = 232.2629852294922 
610 / 1452 : pp = 231.8668212890625 
620 / 1452 : pp = 231.478515625 
630 / 1452 : pp = 231.0444793701172 
640 / 1452 : pp = 231.2737579345703 
650 / 1452 : pp = 231.28114318847656 
660 / 1452 : pp = 231.4324951171875 
670 / 1452 : pp = 231.48513793945312 
680 / 1452 : pp = 231.45932006835938 
690 / 1452 : pp = 231.17738342285156 
700 / 1452 : pp = 231.00570678710938 
710 / 1452 : pp = 231.03810119628906 
720 / 1452 : pp = 230.96131896972656 
730 / 1452 : pp = 230.91110229492188 
740 / 1452 : pp = 231.13539123535156 
750 / 1452 : pp = 231.04393005371094 
760 / 1452 : pp = 231.03489685058594 
770 / 1452 : pp = 231.19744873046875 
780 / 1452 : pp = 231.26625061035156 
790 / 1452 : pp = 231.38714599609375 
800 / 1452 : pp = 231.24441528320312 
810 / 1452 : pp = 231.16824340820312 
820 / 1452 : pp = 231.11831665039062 
830 / 1452 : pp = 231.34886169433594 
840 / 1452 : pp = 231.221923828125 
850 / 1452 : pp = 231.2562255859375 
860 / 1452 : pp = 231.26492309570312 
870 / 1452 : pp = 231.1961212158203 
880 / 1452 : pp = 231.30506896972656 
890 / 1452 : pp = 231.24728393554688 
900 / 1452 : pp = 231.15744018554688 
910 / 1452 : pp = 231.20175170898438 
920 / 1452 : pp = 231.25534057617188 
930 / 1452 : pp = 231.09461975097656 
940 / 1452 : pp = 231.12612915039062 
950 / 1452 : pp = 231.0475616455078 
960 / 1452 : pp = 230.86056518554688 
970 / 1452 : pp = 230.80377197265625 
980 / 1452 : pp = 230.4598846435547 
990 / 1452 : pp = 230.24559020996094 
1000 / 1452 : pp = 229.91030883789062 
1010 / 1452 : pp = 229.9349822998047 
1020 / 1452 : pp = 230.01470947265625 
1030 / 1452 : pp = 229.8909149169922 
1040 / 1452 : pp = 229.9403533935547 
1050 / 1452 : pp = 229.84815979003906 
1060 / 1452 : pp = 229.60377502441406 
1070 / 1452 : pp = 229.74647521972656 
1080 / 1452 : pp = 229.80410766601562 
1090 / 1452 : pp = 229.78733825683594 
1100 / 1452 : pp = 229.64549255371094 
1110 / 1452 : pp = 229.26255798339844 
1120 / 1452 : pp = 229.00262451171875 
1130 / 1452 : pp = 228.6716766357422 
1140 / 1452 : pp = 228.55067443847656 
1150 / 1452 : pp = 228.61563110351562 
1160 / 1452 : pp = 228.50958251953125 
1170 / 1452 : pp = 228.3498992919922 
1180 / 1452 : pp = 228.29786682128906 
1190 / 1452 : pp = 228.33204650878906 
1200 / 1452 : pp = 228.27369689941406 
1210 / 1452 : pp = 228.11831665039062 
1220 / 1452 : pp = 228.21775817871094 
1230 / 1452 : pp = 228.3170166015625 
1240 / 1452 : pp = 228.22134399414062 
1250 / 1452 : pp = 228.3769073486328 
1260 / 1452 : pp = 228.37527465820312 
1270 / 1452 : pp = 228.33694458007812 
1280 / 1452 : pp = 228.27108764648438 
1290 / 1452 : pp = 228.1731414794922 
1300 / 1452 : pp = 228.12200927734375 
1310 / 1452 : pp = 228.10275268554688 
1320 / 1452 : pp = 227.9289093017578 
1330 / 1452 : pp = 227.77723693847656 
1340 / 1452 : pp = 227.79623413085938 
1350 / 1452 : pp = 227.7408447265625 
1360 / 1452 : pp = 227.72586059570312 
1370 / 1452 : pp = 227.49728393554688 
1380 / 1452 : pp = 227.37940979003906 
1390 / 1452 : pp = 227.20166015625 
1400 / 1452 : pp = 227.018310546875 
1410 / 1452 : pp = 226.95651245117188 
1420 / 1452 : pp = 226.8065643310547 
1430 / 1452 : pp = 226.7261199951172 
1440 / 1452 : pp = 226.7193145751953 
1450 / 1452 : pp = 226.61068725585938 

0 / 115 : pp = 269.342041015625 
10 / 115 : pp = 255.03016662597656 
20 / 115 : pp = 253.8992919921875 
30 / 115 : pp = 251.04025268554688 
40 / 115 : pp = 250.51756286621094 
50 / 115 : pp = 245.3595428466797 
60 / 115 : pp = 244.4713897705078 
70 / 115 : pp = 241.2674560546875 
80 / 115 : pp = 238.3473663330078 
90 / 115 : pp = 235.56423950195312 
100 / 115 : pp = 231.2281036376953 
110 / 115 : pp = 229.8423614501953 
Training perplexity: 226.5760040283203
Validation perplexity:229.59939575195312
Total time : 42.202677726745605
Epoch 4

0 / 1452 : pp = 282.2423095703125 
10 / 1452 : pp = 240.16258239746094 
20 / 1452 : pp = 236.12203979492188 
30 / 1452 : pp = 230.3953857421875 
40 / 1452 : pp = 231.8789825439453 
50 / 1452 : pp = 227.26612854003906 
60 / 1452 : pp = 226.22061157226562 
70 / 1452 : pp = 227.01885986328125 
80 / 1452 : pp = 226.2459716796875 
90 / 1452 : pp = 224.3211669921875 
100 / 1452 : pp = 222.65615844726562 
110 / 1452 : pp = 220.70326232910156 
120 / 1452 : pp = 220.42288208007812 
130 / 1452 : pp = 219.8100128173828 
140 / 1452 : pp = 218.04432678222656 
150 / 1452 : pp = 217.31639099121094 
160 / 1452 : pp = 217.86349487304688 
170 / 1452 : pp = 217.46597290039062 
180 / 1452 : pp = 216.3349151611328 
190 / 1452 : pp = 216.12240600585938 
200 / 1452 : pp = 216.97842407226562 
210 / 1452 : pp = 216.51014709472656 
220 / 1452 : pp = 216.46751403808594 
230 / 1452 : pp = 216.80126953125 
240 / 1452 : pp = 216.45965576171875 
250 / 1452 : pp = 215.5008544921875 
260 / 1452 : pp = 214.62210083007812 
270 / 1452 : pp = 213.29183959960938 
280 / 1452 : pp = 213.5621337890625 
290 / 1452 : pp = 213.80657958984375 
300 / 1452 : pp = 213.8963165283203 
310 / 1452 : pp = 213.60653686523438 
320 / 1452 : pp = 213.85877990722656 
330 / 1452 : pp = 214.07345581054688 
340 / 1452 : pp = 213.25421142578125 
350 / 1452 : pp = 213.68019104003906 
360 / 1452 : pp = 213.41717529296875 
370 / 1452 : pp = 213.04920959472656 
380 / 1452 : pp = 212.39019775390625 
390 / 1452 : pp = 212.4908905029297 
400 / 1452 : pp = 212.01914978027344 
410 / 1452 : pp = 212.36903381347656 
420 / 1452 : pp = 212.6802520751953 
430 / 1452 : pp = 212.42697143554688 
440 / 1452 : pp = 212.42990112304688 
450 / 1452 : pp = 212.14524841308594 
460 / 1452 : pp = 211.7836151123047 
470 / 1452 : pp = 211.17282104492188 
480 / 1452 : pp = 210.27903747558594 
490 / 1452 : pp = 209.95211791992188 
500 / 1452 : pp = 209.28302001953125 
510 / 1452 : pp = 209.1029815673828 
520 / 1452 : pp = 208.73855590820312 
530 / 1452 : pp = 208.19700622558594 
540 / 1452 : pp = 207.4554443359375 
550 / 1452 : pp = 207.0062255859375 
560 / 1452 : pp = 206.59739685058594 
570 / 1452 : pp = 206.27874755859375 
580 / 1452 : pp = 205.87144470214844 
590 / 1452 : pp = 205.43545532226562 
600 / 1452 : pp = 204.90940856933594 
610 / 1452 : pp = 204.5686798095703 
620 / 1452 : pp = 204.22862243652344 
630 / 1452 : pp = 203.8448028564453 
640 / 1452 : pp = 204.06576538085938 
650 / 1452 : pp = 204.0941925048828 
660 / 1452 : pp = 204.22103881835938 
670 / 1452 : pp = 204.289794921875 
680 / 1452 : pp = 204.3115234375 
690 / 1452 : pp = 204.10284423828125 
700 / 1452 : pp = 203.99757385253906 
710 / 1452 : pp = 204.04971313476562 
720 / 1452 : pp = 204.03152465820312 
730 / 1452 : pp = 203.99046325683594 
740 / 1452 : pp = 204.19786071777344 
750 / 1452 : pp = 204.1642608642578 
760 / 1452 : pp = 204.19435119628906 
770 / 1452 : pp = 204.37786865234375 
780 / 1452 : pp = 204.4965057373047 
790 / 1452 : pp = 204.6479034423828 
800 / 1452 : pp = 204.56117248535156 
810 / 1452 : pp = 204.52284240722656 
820 / 1452 : pp = 204.50978088378906 
830 / 1452 : pp = 204.7531280517578 
840 / 1452 : pp = 204.64468383789062 
850 / 1452 : pp = 204.71348571777344 
860 / 1452 : pp = 204.7399444580078 
870 / 1452 : pp = 204.69406127929688 
880 / 1452 : pp = 204.7965850830078 
890 / 1452 : pp = 204.7594757080078 
900 / 1452 : pp = 204.71446228027344 
910 / 1452 : pp = 204.7590789794922 
920 / 1452 : pp = 204.85772705078125 
930 / 1452 : pp = 204.7428741455078 
940 / 1452 : pp = 204.8068389892578 
950 / 1452 : pp = 204.75791931152344 
960 / 1452 : pp = 204.63815307617188 
970 / 1452 : pp = 204.60760498046875 
980 / 1452 : pp = 204.34347534179688 
990 / 1452 : pp = 204.151611328125 
1000 / 1452 : pp = 203.8665771484375 
1010 / 1452 : pp = 203.9164581298828 
1020 / 1452 : pp = 204.0184783935547 
1030 / 1452 : pp = 203.95166015625 
1040 / 1452 : pp = 204.03045654296875 
1050 / 1452 : pp = 203.95846557617188 
1060 / 1452 : pp = 203.77114868164062 
1070 / 1452 : pp = 203.93260192871094 
1080 / 1452 : pp = 204.00048828125 
1090 / 1452 : pp = 204.00233459472656 
1100 / 1452 : pp = 203.8960418701172 
1110 / 1452 : pp = 203.5987548828125 
1120 / 1452 : pp = 203.38392639160156 
1130 / 1452 : pp = 203.08872985839844 
1140 / 1452 : pp = 203.01272583007812 
1150 / 1452 : pp = 203.0865936279297 
1160 / 1452 : pp = 203.02308654785156 
1170 / 1452 : pp = 202.9125518798828 
1180 / 1452 : pp = 202.9097442626953 
1190 / 1452 : pp = 202.98252868652344 
1200 / 1452 : pp = 202.95387268066406 
1210 / 1452 : pp = 202.851318359375 
1220 / 1452 : pp = 202.97671508789062 
1230 / 1452 : pp = 203.1051025390625 
1240 / 1452 : pp = 203.0526123046875 
1250 / 1452 : pp = 203.21417236328125 
1260 / 1452 : pp = 203.23617553710938 
1270 / 1452 : pp = 203.22802734375 
1280 / 1452 : pp = 203.20846557617188 
1290 / 1452 : pp = 203.15362548828125 
1300 / 1452 : pp = 203.14315795898438 
1310 / 1452 : pp = 203.15264892578125 
1320 / 1452 : pp = 203.02801513671875 
1330 / 1452 : pp = 202.92977905273438 
1340 / 1452 : pp = 202.95484924316406 
1350 / 1452 : pp = 202.9335479736328 
1360 / 1452 : pp = 202.955322265625 
1370 / 1452 : pp = 202.7740478515625 
1380 / 1452 : pp = 202.68569946289062 
1390 / 1452 : pp = 202.55816650390625 
1400 / 1452 : pp = 202.41651916503906 
1410 / 1452 : pp = 202.38494873046875 
1420 / 1452 : pp = 202.27593994140625 
1430 / 1452 : pp = 202.21826171875 
1440 / 1452 : pp = 202.23272705078125 
1450 / 1452 : pp = 202.16099548339844 

0 / 115 : pp = 253.23211669921875 
10 / 115 : pp = 237.62506103515625 
20 / 115 : pp = 237.60557556152344 
30 / 115 : pp = 234.9273223876953 
40 / 115 : pp = 234.30519104003906 
50 / 115 : pp = 229.43960571289062 
60 / 115 : pp = 228.6050567626953 
70 / 115 : pp = 225.2646484375 
80 / 115 : pp = 222.55935668945312 
90 / 115 : pp = 219.83255004882812 
100 / 115 : pp = 215.5491485595703 
110 / 115 : pp = 214.07937622070312 
Training perplexity: 202.1349639892578
Validation perplexity:213.85256958007812
Total time : 42.10724234580994
Epoch 5

0 / 1452 : pp = 255.92384338378906 
10 / 1452 : pp = 219.5322265625 
20 / 1452 : pp = 214.36212158203125 
30 / 1452 : pp = 209.12620544433594 
40 / 1452 : pp = 210.04193115234375 
50 / 1452 : pp = 205.77398681640625 
60 / 1452 : pp = 204.8201141357422 
70 / 1452 : pp = 205.3955841064453 
80 / 1452 : pp = 204.8386688232422 
90 / 1452 : pp = 203.21194458007812 
100 / 1452 : pp = 201.87643432617188 
110 / 1452 : pp = 200.10122680664062 
120 / 1452 : pp = 199.82012939453125 
130 / 1452 : pp = 199.11192321777344 
140 / 1452 : pp = 197.51919555664062 
150 / 1452 : pp = 197.03567504882812 
160 / 1452 : pp = 197.4231414794922 
170 / 1452 : pp = 197.09571838378906 
180 / 1452 : pp = 196.17665100097656 
190 / 1452 : pp = 196.0064697265625 
200 / 1452 : pp = 196.7347869873047 
210 / 1452 : pp = 196.3063507080078 
220 / 1452 : pp = 196.21388244628906 
230 / 1452 : pp = 196.5252227783203 
240 / 1452 : pp = 196.203125 
250 / 1452 : pp = 195.3251953125 
260 / 1452 : pp = 194.53335571289062 
270 / 1452 : pp = 193.3546142578125 
280 / 1452 : pp = 193.59420776367188 
290 / 1452 : pp = 193.83297729492188 
300 / 1452 : pp = 193.98489379882812 
310 / 1452 : pp = 193.68414306640625 
320 / 1452 : pp = 193.89065551757812 
330 / 1452 : pp = 194.0518798828125 
340 / 1452 : pp = 193.32888793945312 
350 / 1452 : pp = 193.76219177246094 
360 / 1452 : pp = 193.56106567382812 
370 / 1452 : pp = 193.28179931640625 
380 / 1452 : pp = 192.7037811279297 
390 / 1452 : pp = 192.8145294189453 
400 / 1452 : pp = 192.43325805664062 
410 / 1452 : pp = 192.81527709960938 
420 / 1452 : pp = 193.13760375976562 
430 / 1452 : pp = 192.9148712158203 
440 / 1452 : pp = 192.92526245117188 
450 / 1452 : pp = 192.70083618164062 
460 / 1452 : pp = 192.36647033691406 
470 / 1452 : pp = 191.85394287109375 
480 / 1452 : pp = 191.07244873046875 
490 / 1452 : pp = 190.75401306152344 
500 / 1452 : pp = 190.1843719482422 
510 / 1452 : pp = 190.03334045410156 
520 / 1452 : pp = 189.72938537597656 
530 / 1452 : pp = 189.25889587402344 
540 / 1452 : pp = 188.59315490722656 
550 / 1452 : pp = 188.19313049316406 
560 / 1452 : pp = 187.80621337890625 
570 / 1452 : pp = 187.5229034423828 
580 / 1452 : pp = 187.1091766357422 
590 / 1452 : pp = 186.72592163085938 
600 / 1452 : pp = 186.2238006591797 
610 / 1452 : pp = 185.89695739746094 
620 / 1452 : pp = 185.60989379882812 
630 / 1452 : pp = 185.2689208984375 
640 / 1452 : pp = 185.47567749023438 
650 / 1452 : pp = 185.5127410888672 
660 / 1452 : pp = 185.64627075195312 
670 / 1452 : pp = 185.71311950683594 
680 / 1452 : pp = 185.72569274902344 
690 / 1452 : pp = 185.56459045410156 
700 / 1452 : pp = 185.48681640625 
710 / 1452 : pp = 185.5458221435547 
720 / 1452 : pp = 185.5598907470703 
730 / 1452 : pp = 185.5335235595703 
740 / 1452 : pp = 185.73995971679688 
750 / 1452 : pp = 185.744384765625 
760 / 1452 : pp = 185.81268310546875 
770 / 1452 : pp = 186.00088500976562 
780 / 1452 : pp = 186.14443969726562 
790 / 1452 : pp = 186.30764770507812 
800 / 1452 : pp = 186.2595977783203 
810 / 1452 : pp = 186.23028564453125 
820 / 1452 : pp = 186.23997497558594 
830 / 1452 : pp = 186.49057006835938 
840 / 1452 : pp = 186.43331909179688 
850 / 1452 : pp = 186.48887634277344 
860 / 1452 : pp = 186.51502990722656 
870 / 1452 : pp = 186.5167999267578 
880 / 1452 : pp = 186.62400817871094 
890 / 1452 : pp = 186.6103973388672 
900 / 1452 : pp = 186.58111572265625 
910 / 1452 : pp = 186.64126586914062 
920 / 1452 : pp = 186.7366180419922 
930 / 1452 : pp = 186.65719604492188 
940 / 1452 : pp = 186.71755981445312 
950 / 1452 : pp = 186.6977996826172 
960 / 1452 : pp = 186.62774658203125 
970 / 1452 : pp = 186.62115478515625 
980 / 1452 : pp = 186.3773193359375 
990 / 1452 : pp = 186.23109436035156 
1000 / 1452 : pp = 185.99227905273438 
1010 / 1452 : pp = 186.0488739013672 
1020 / 1452 : pp = 186.1744384765625 
1030 / 1452 : pp = 186.1162109375 
1040 / 1452 : pp = 186.18899536132812 
1050 / 1452 : pp = 186.1549072265625 
1060 / 1452 : pp = 186.01419067382812 
1070 / 1452 : pp = 186.17364501953125 
1080 / 1452 : pp = 186.27061462402344 
1090 / 1452 : pp = 186.28428649902344 
1100 / 1452 : pp = 186.2150115966797 
1110 / 1452 : pp = 185.95103454589844 
1120 / 1452 : pp = 185.77423095703125 
1130 / 1452 : pp = 185.5232696533203 
1140 / 1452 : pp = 185.4607391357422 
1150 / 1452 : pp = 185.56077575683594 
1160 / 1452 : pp = 185.53343200683594 
1170 / 1452 : pp = 185.46453857421875 
1180 / 1452 : pp = 185.4741668701172 
1190 / 1452 : pp = 185.5594482421875 
1200 / 1452 : pp = 185.53785705566406 
1210 / 1452 : pp = 185.4576416015625 
1220 / 1452 : pp = 185.5943145751953 
1230 / 1452 : pp = 185.7483673095703 
1240 / 1452 : pp = 185.70762634277344 
1250 / 1452 : pp = 185.8568115234375 
1260 / 1452 : pp = 185.90635681152344 
1270 / 1452 : pp = 185.8961639404297 
1280 / 1452 : pp = 185.89199829101562 
1290 / 1452 : pp = 185.85911560058594 
1300 / 1452 : pp = 185.86097717285156 
1310 / 1452 : pp = 185.88739013671875 
1320 / 1452 : pp = 185.79248046875 
1330 / 1452 : pp = 185.69700622558594 
1340 / 1452 : pp = 185.7310028076172 
1350 / 1452 : pp = 185.72613525390625 
1360 / 1452 : pp = 185.76829528808594 
1370 / 1452 : pp = 185.6322021484375 
1380 / 1452 : pp = 185.56378173828125 
1390 / 1452 : pp = 185.4654998779297 
1400 / 1452 : pp = 185.35110473632812 
1410 / 1452 : pp = 185.33917236328125 
1420 / 1452 : pp = 185.2509002685547 
1430 / 1452 : pp = 185.20436096191406 
1440 / 1452 : pp = 185.2254638671875 
1450 / 1452 : pp = 185.16542053222656 

0 / 115 : pp = 242.26800537109375 
10 / 115 : pp = 226.12258911132812 
20 / 115 : pp = 226.4702606201172 
30 / 115 : pp = 223.982666015625 
40 / 115 : pp = 223.376953125 
50 / 115 : pp = 218.65716552734375 
60 / 115 : pp = 217.95306396484375 
70 / 115 : pp = 214.5392303466797 
80 / 115 : pp = 212.07525634765625 
90 / 115 : pp = 209.40631103515625 
100 / 115 : pp = 205.1455078125 
110 / 115 : pp = 203.6289520263672 
Training perplexity: 185.14476013183594
Validation perplexity:203.3822784423828
Total time : 42.47052240371704
Epoch 6

0 / 1452 : pp = 233.56707763671875 
...
1450 / 1452 : pp = 172.82505798339844 

0 / 115 : pp = 236.35635375976562 
...
110 / 115 : pp = 196.97216796875 
Training perplexity: 172.80404663085938
Validation perplexity: 196.6871337890625
Total time : 41.52522921562195
Epoch 7

0 / 1452 : pp = 219.23231506347656 
...
1450 / 1452 : pp = 163.3201904296875 

0 / 115 : pp = 232.2108154296875 
...
110 / 115 : pp = 192.41224670410156 
Training perplexity: 163.29916381835938
Validation perplexity: 192.09552001953125
Total time : 41.78096055984497
Epoch 8

0 / 1452 : pp = 201.77548217773438 
...
1450 / 1452 : pp = 155.62757873535156 

0 / 115 : pp = 228.70111083984375 
...
110 / 115 : pp = 189.31134033203125 
Training perplexity: 155.61154174804688
Validation perplexity: 188.94537353515625
Total time : 42.13483738899231
Epoch 9

0 / 1452 : pp = 197.80628967285156 
...
1450 / 1452 : pp = 149.06771850585938 

0 / 115 : pp = 227.0559844970703 
...
110 / 115 : pp = 187.07528686523438 
Training perplexity: 149.0502471923828
Validation perplexity: 186.6911163330078
Total time : 47.274805545806885
Epoch 10

0 / 1452 : pp = 181.8408203125 
...
1450 / 1452 : pp = 143.5869598388672 

0 / 115 : pp = 226.9864959716797 
...
110 / 115 : pp = 185.81240844726562 
Training perplexity: 143.57354736328125
Validation perplexity: 185.40573120117188
Total time : 46.14846849441528
Epoch 11

0 / 1452 : pp = 181.93162536621094 
...
1360 / 1452 : pp = 139.1609649658203 
1370 / 1452 : pp = 139.0869903564453 
1380 / 1452 : pp = 139.0604705810547 
1390 / 1452 : pp = 139.01670837402344 
1400 / 1452 : pp = 138.94393920898438 
1410 / 1452 : pp = 138.97323608398438 
1420 / 1452 : pp = 138.9404296875 
1430 / 1452 : pp = 138.90943908691406 
1440 / 1452 : pp = 138.94268798828125 
1450 / 1452 : pp = 138.93991088867188 

0 / 115 : pp = 225.55990600585938 
10 / 115 : pp = 207.0504608154297 
20 / 115 : pp = 208.98306274414062 
30 / 115 : pp = 206.28396606445312 
40 / 115 : pp = 205.35386657714844 
50 / 115 : pp = 200.7255401611328 
60 / 115 : pp = 200.0526580810547 
70 / 115 : pp = 196.33087158203125 
80 / 115 : pp = 194.12110900878906 
90 / 115 : pp = 191.52816772460938 
100 / 115 : pp = 186.7974395751953 
110 / 115 : pp = 184.59829711914062 
Training perplexity: 138.9222869873047
Validation perplexity:184.18101501464844
Total time : 43.92928600311279
Epoch 12

0 / 1452 : pp = 173.0251007080078 
10 / 1452 : pp = 152.98446655273438 
20 / 1452 : pp = 150.43128967285156 
30 / 1452 : pp = 147.5819854736328 
40 / 1452 : pp = 149.4164276123047 
50 / 1452 : pp = 146.70816040039062 
60 / 1452 : pp = 145.557861328125 
70 / 1452 : pp = 146.50473022460938 
80 / 1452 : pp = 145.83200073242188 
90 / 1452 : pp = 144.84402465820312 
100 / 1452 : pp = 144.0390167236328 
110 / 1452 : pp = 142.66514587402344 
120 / 1452 : pp = 142.3549346923828 
130 / 1452 : pp = 141.4630126953125 
140 / 1452 : pp = 140.2266082763672 
150 / 1452 : pp = 139.67518615722656 
160 / 1452 : pp = 139.90414428710938 
170 / 1452 : pp = 139.5490264892578 
180 / 1452 : pp = 138.91969299316406 
190 / 1452 : pp = 138.89234924316406 
200 / 1452 : pp = 139.40908813476562 
210 / 1452 : pp = 139.19068908691406 
220 / 1452 : pp = 139.35513305664062 
230 / 1452 : pp = 139.5464324951172 
240 / 1452 : pp = 139.3047637939453 
250 / 1452 : pp = 138.7708740234375 
260 / 1452 : pp = 138.29188537597656 
270 / 1452 : pp = 137.4787139892578 
280 / 1452 : pp = 137.6367950439453 
290 / 1452 : pp = 137.98513793945312 
300 / 1452 : pp = 138.17819213867188 
310 / 1452 : pp = 137.943359375 
320 / 1452 : pp = 138.12060546875 
330 / 1452 : pp = 138.29037475585938 
340 / 1452 : pp = 137.77606201171875 
350 / 1452 : pp = 138.06378173828125 
360 / 1452 : pp = 137.99000549316406 
370 / 1452 : pp = 137.81922912597656 
380 / 1452 : pp = 137.52159118652344 
390 / 1452 : pp = 137.61782836914062 
400 / 1452 : pp = 137.4178924560547 
410 / 1452 : pp = 137.82632446289062 
420 / 1452 : pp = 138.17567443847656 
430 / 1452 : pp = 138.11863708496094 
440 / 1452 : pp = 138.215087890625 
450 / 1452 : pp = 137.9976348876953 
460 / 1452 : pp = 137.6929168701172 
470 / 1452 : pp = 137.25416564941406 
480 / 1452 : pp = 136.75140380859375 
490 / 1452 : pp = 136.51712036132812 
500 / 1452 : pp = 136.0896453857422 
510 / 1452 : pp = 135.97048950195312 
520 / 1452 : pp = 135.7760009765625 
530 / 1452 : pp = 135.50389099121094 
540 / 1452 : pp = 135.01437377929688 
550 / 1452 : pp = 134.7666015625 
560 / 1452 : pp = 134.48973083496094 
570 / 1452 : pp = 134.22853088378906 
580 / 1452 : pp = 133.88455200195312 
590 / 1452 : pp = 133.5808868408203 
600 / 1452 : pp = 133.22975158691406 
610 / 1452 : pp = 132.99591064453125 
620 / 1452 : pp = 132.79502868652344 
630 / 1452 : pp = 132.5094451904297 
640 / 1452 : pp = 132.62892150878906 
650 / 1452 : pp = 132.63499450683594 
660 / 1452 : pp = 132.7379913330078 
670 / 1452 : pp = 132.79046630859375 
680 / 1452 : pp = 132.85842895507812 
690 / 1452 : pp = 132.80364990234375 
700 / 1452 : pp = 132.80477905273438 
710 / 1452 : pp = 132.90170288085938 
720 / 1452 : pp = 132.92971801757812 
730 / 1452 : pp = 132.9019012451172 
740 / 1452 : pp = 133.04811096191406 
750 / 1452 : pp = 133.10877990722656 
760 / 1452 : pp = 133.19189453125 
770 / 1452 : pp = 133.3564910888672 
780 / 1452 : pp = 133.54000854492188 
790 / 1452 : pp = 133.69239807128906 
800 / 1452 : pp = 133.68495178222656 
810 / 1452 : pp = 133.67971801757812 
820 / 1452 : pp = 133.7035675048828 
830 / 1452 : pp = 133.89329528808594 
840 / 1452 : pp = 133.850341796875 
850 / 1452 : pp = 133.90390014648438 
860 / 1452 : pp = 133.9090118408203 
870 / 1452 : pp = 133.89974975585938 
880 / 1452 : pp = 134.0077667236328 
890 / 1452 : pp = 134.03485107421875 
900 / 1452 : pp = 134.0261688232422 
910 / 1452 : pp = 134.10255432128906 
920 / 1452 : pp = 134.17291259765625 
930 / 1452 : pp = 134.14796447753906 
940 / 1452 : pp = 134.20925903320312 
950 / 1452 : pp = 134.19281005859375 
960 / 1452 : pp = 134.17745971679688 
970 / 1452 : pp = 134.18653869628906 
980 / 1452 : pp = 134.03192138671875 
990 / 1452 : pp = 133.94349670410156 
1000 / 1452 : pp = 133.79685974121094 
1010 / 1452 : pp = 133.8438262939453 
1020 / 1452 : pp = 133.9608612060547 
1030 / 1452 : pp = 133.93934631347656 
1040 / 1452 : pp = 134.02833557128906 
1050 / 1452 : pp = 134.01734924316406 
1060 / 1452 : pp = 133.95346069335938 
1070 / 1452 : pp = 134.10205078125 
1080 / 1452 : pp = 134.2030487060547 
1090 / 1452 : pp = 134.23696899414062 
1100 / 1452 : pp = 134.2230224609375 
1110 / 1452 : pp = 134.0829315185547 
1120 / 1452 : pp = 133.980224609375 
1130 / 1452 : pp = 133.83815002441406 
1140 / 1452 : pp = 133.8366241455078 
1150 / 1452 : pp = 133.92108154296875 
1160 / 1452 : pp = 133.94375610351562 
1170 / 1452 : pp = 133.9360809326172 
1180 / 1452 : pp = 133.99684143066406 
1190 / 1452 : pp = 134.0944366455078 
1200 / 1452 : pp = 134.11676025390625 
1210 / 1452 : pp = 134.0911102294922 
1220 / 1452 : pp = 134.22763061523438 
1230 / 1452 : pp = 134.38043212890625 
1240 / 1452 : pp = 134.39817810058594 
1250 / 1452 : pp = 134.5367431640625 
1260 / 1452 : pp = 134.593017578125 
1270 / 1452 : pp = 134.61497497558594 
1280 / 1452 : pp = 134.6423797607422 
1290 / 1452 : pp = 134.64340209960938 
1300 / 1452 : pp = 134.68026733398438 
1310 / 1452 : pp = 134.73556518554688 
1320 / 1452 : pp = 134.69021606445312 
1330 / 1452 : pp = 134.66131591796875 
1340 / 1452 : pp = 134.69393920898438 
1350 / 1452 : pp = 134.7328643798828 
1360 / 1452 : pp = 134.79405212402344 
1370 / 1452 : pp = 134.71237182617188 
1380 / 1452 : pp = 134.6885528564453 
1390 / 1452 : pp = 134.65110778808594 
1400 / 1452 : pp = 134.59584045410156 
1410 / 1452 : pp = 134.6193389892578 
1420 / 1452 : pp = 134.58338928222656 
1430 / 1452 : pp = 134.559326171875 
1440 / 1452 : pp = 134.59507751464844 
1450 / 1452 : pp = 134.59365844726562 

0 / 115 : pp = 226.0741729736328 
10 / 115 : pp = 207.00494384765625 
20 / 115 : pp = 209.26976013183594 
30 / 115 : pp = 206.44662475585938 
40 / 115 : pp = 205.47268676757812 
50 / 115 : pp = 200.7876739501953 
60 / 115 : pp = 200.13414001464844 
70 / 115 : pp = 196.35549926757812 
80 / 115 : pp = 194.10777282714844 
90 / 115 : pp = 191.47467041015625 
100 / 115 : pp = 186.61351013183594 
110 / 115 : pp = 184.30374145507812 
Training perplexity: 134.57826232910156
Validation perplexity:183.8900146484375
Total time : 45.410256147384644
Epoch 13

0 / 1452 : pp = 169.39393615722656 
10 / 1452 : pp = 150.13232421875 
20 / 1452 : pp = 147.60450744628906 
30 / 1452 : pp = 144.64317321777344 
40 / 1452 : pp = 146.47427368164062 
50 / 1452 : pp = 143.929443359375 
60 / 1452 : pp = 142.8344268798828 
70 / 1452 : pp = 143.45248413085938 
80 / 1452 : pp = 142.5418701171875 
90 / 1452 : pp = 141.6178436279297 
100 / 1452 : pp = 140.70127868652344 
110 / 1452 : pp = 139.2852325439453 
120 / 1452 : pp = 138.8017120361328 
130 / 1452 : pp = 137.85629272460938 
140 / 1452 : pp = 136.51718139648438 
150 / 1452 : pp = 136.03619384765625 
160 / 1452 : pp = 136.154296875 
170 / 1452 : pp = 135.67037963867188 
180 / 1452 : pp = 135.0376739501953 
190 / 1452 : pp = 134.9230499267578 
200 / 1452 : pp = 135.4241180419922 
210 / 1452 : pp = 135.24581909179688 
220 / 1452 : pp = 135.37957763671875 
230 / 1452 : pp = 135.67652893066406 
240 / 1452 : pp = 135.4161834716797 
250 / 1452 : pp = 134.90895080566406 
260 / 1452 : pp = 134.46754455566406 
270 / 1452 : pp = 133.68577575683594 
280 / 1452 : pp = 133.86770629882812 
290 / 1452 : pp = 134.18475341796875 
300 / 1452 : pp = 134.39132690429688 
310 / 1452 : pp = 134.19985961914062 
320 / 1452 : pp = 134.37998962402344 
330 / 1452 : pp = 134.5557403564453 
340 / 1452 : pp = 134.00686645507812 
350 / 1452 : pp = 134.27749633789062 
360 / 1452 : pp = 134.20286560058594 
370 / 1452 : pp = 134.042724609375 
380 / 1452 : pp = 133.74398803710938 
390 / 1452 : pp = 133.83584594726562 
400 / 1452 : pp = 133.64382934570312 
410 / 1452 : pp = 134.02366638183594 
420 / 1452 : pp = 134.35415649414062 
430 / 1452 : pp = 134.310546875 
440 / 1452 : pp = 134.3634490966797 
450 / 1452 : pp = 134.15602111816406 
460 / 1452 : pp = 133.86578369140625 
470 / 1452 : pp = 133.43414306640625 
480 / 1452 : pp = 132.90310668945312 
490 / 1452 : pp = 132.646240234375 
500 / 1452 : pp = 132.1982421875 
510 / 1452 : pp = 132.04200744628906 
520 / 1452 : pp = 131.86940002441406 
530 / 1452 : pp = 131.59841918945312 
540 / 1452 : pp = 131.12356567382812 
550 / 1452 : pp = 130.887939453125 
560 / 1452 : pp = 130.6210174560547 
570 / 1452 : pp = 130.37826538085938 
580 / 1452 : pp = 130.0374755859375 
590 / 1452 : pp = 129.75979614257812 
600 / 1452 : pp = 129.38308715820312 
610 / 1452 : pp = 129.16685485839844 
620 / 1452 : pp = 129.0115509033203 
630 / 1452 : pp = 128.75152587890625 
640 / 1452 : pp = 128.87295532226562 
650 / 1452 : pp = 128.88734436035156 
660 / 1452 : pp = 128.98275756835938 
670 / 1452 : pp = 129.0487060546875 
680 / 1452 : pp = 129.11013793945312 
690 / 1452 : pp = 129.0646514892578 
700 / 1452 : pp = 129.06280517578125 
710 / 1452 : pp = 129.1343994140625 
720 / 1452 : pp = 129.18582153320312 
730 / 1452 : pp = 129.15138244628906 
740 / 1452 : pp = 129.29811096191406 
750 / 1452 : pp = 129.339599609375 
760 / 1452 : pp = 129.4257354736328 
770 / 1452 : pp = 129.61631774902344 
780 / 1452 : pp = 129.802734375 
790 / 1452 : pp = 129.96804809570312 
800 / 1452 : pp = 129.95187377929688 
810 / 1452 : pp = 129.92417907714844 
820 / 1452 : pp = 129.9774627685547 
830 / 1452 : pp = 130.1638946533203 
840 / 1452 : pp = 130.13095092773438 
850 / 1452 : pp = 130.16595458984375 
860 / 1452 : pp = 130.173828125 
870 / 1452 : pp = 130.170166015625 
880 / 1452 : pp = 130.27032470703125 
890 / 1452 : pp = 130.3022003173828 
900 / 1452 : pp = 130.3071746826172 
910 / 1452 : pp = 130.37939453125 
920 / 1452 : pp = 130.46229553222656 
930 / 1452 : pp = 130.43846130371094 
940 / 1452 : pp = 130.50889587402344 
950 / 1452 : pp = 130.50086975097656 
960 / 1452 : pp = 130.4833221435547 
970 / 1452 : pp = 130.50814819335938 
980 / 1452 : pp = 130.35577392578125 
990 / 1452 : pp = 130.26759338378906 
1000 / 1452 : pp = 130.1064453125 
1010 / 1452 : pp = 130.1472625732422 
1020 / 1452 : pp = 130.27169799804688 
1030 / 1452 : pp = 130.25100708007812 
1040 / 1452 : pp = 130.30816650390625 
1050 / 1452 : pp = 130.29803466796875 
1060 / 1452 : pp = 130.2242431640625 
1070 / 1452 : pp = 130.35906982421875 
1080 / 1452 : pp = 130.45103454589844 
1090 / 1452 : pp = 130.49838256835938 
1100 / 1452 : pp = 130.484130859375 
1110 / 1452 : pp = 130.35316467285156 
1120 / 1452 : pp = 130.24697875976562 
1130 / 1452 : pp = 130.10804748535156 
1140 / 1452 : pp = 130.1076202392578 
1150 / 1452 : pp = 130.195068359375 
1160 / 1452 : pp = 130.19674682617188 
1170 / 1452 : pp = 130.18321228027344 
1180 / 1452 : pp = 130.24623107910156 
1190 / 1452 : pp = 130.33905029296875 
1200 / 1452 : pp = 130.3650360107422 
1210 / 1452 : pp = 130.34588623046875 
1220 / 1452 : pp = 130.4850616455078 
1230 / 1452 : pp = 130.63160705566406 
1240 / 1452 : pp = 130.64674377441406 
1250 / 1452 : pp = 130.77078247070312 
1260 / 1452 : pp = 130.8397674560547 
1270 / 1452 : pp = 130.8511199951172 
1280 / 1452 : pp = 130.88967895507812 
1290 / 1452 : pp = 130.9040985107422 
1300 / 1452 : pp = 130.93511962890625 
1310 / 1452 : pp = 130.9759063720703 
1320 / 1452 : pp = 130.92800903320312 
1330 / 1452 : pp = 130.9105224609375 
1340 / 1452 : pp = 130.929443359375 
1350 / 1452 : pp = 130.96153259277344 
1360 / 1452 : pp = 131.02381896972656 
1370 / 1452 : pp = 130.9545440673828 
1380 / 1452 : pp = 130.9344940185547 
1390 / 1452 : pp = 130.9055938720703 
1400 / 1452 : pp = 130.85386657714844 
1410 / 1452 : pp = 130.8874969482422 
1420 / 1452 : pp = 130.85928344726562 
1430 / 1452 : pp = 130.83995056152344 
1440 / 1452 : pp = 130.86659240722656 
1450 / 1452 : pp = 130.86839294433594 

0 / 115 : pp = 227.78428649902344 
10 / 115 : pp = 207.609619140625 
20 / 115 : pp = 209.92459106445312 
30 / 115 : pp = 206.96240234375 
40 / 115 : pp = 205.9295654296875 
50 / 115 : pp = 201.0296630859375 
60 / 115 : pp = 200.38059997558594 
70 / 115 : pp = 196.55764770507812 
80 / 115 : pp = 194.31735229492188 
90 / 115 : pp = 191.66146850585938 
100 / 115 : pp = 186.70437622070312 
110 / 115 : pp = 184.3171844482422 
Training perplexity: 130.85043334960938
Validation perplexity:183.88186645507812
Total time : 45.345656394958496
Epoch 14

0 / 1452 : pp = 164.82191467285156 
10 / 1452 : pp = 146.39089965820312 
20 / 1452 : pp = 142.93240356445312 
30 / 1452 : pp = 140.3113555908203 
40 / 1452 : pp = 142.39939880371094 
50 / 1452 : pp = 139.70162963867188 
60 / 1452 : pp = 138.73023986816406 
70 / 1452 : pp = 139.2675018310547 
80 / 1452 : pp = 138.47824096679688 
90 / 1452 : pp = 137.40432739257812 
100 / 1452 : pp = 136.47793579101562 
110 / 1452 : pp = 135.2294464111328 
120 / 1452 : pp = 134.80728149414062 
130 / 1452 : pp = 133.89822387695312 
140 / 1452 : pp = 132.54141235351562 
150 / 1452 : pp = 132.10025024414062 
160 / 1452 : pp = 132.21829223632812 
170 / 1452 : pp = 131.8765106201172 
180 / 1452 : pp = 131.37515258789062 
190 / 1452 : pp = 131.31622314453125 
200 / 1452 : pp = 131.78297424316406 
210 / 1452 : pp = 131.5507354736328 
220 / 1452 : pp = 131.7002410888672 
230 / 1452 : pp = 131.9277801513672 
240 / 1452 : pp = 131.72166442871094 
250 / 1452 : pp = 131.225830078125 
260 / 1452 : pp = 130.7496337890625 
270 / 1452 : pp = 129.9896697998047 
280 / 1452 : pp = 130.10594177246094 
290 / 1452 : pp = 130.41644287109375 
300 / 1452 : pp = 130.5982208251953 
310 / 1452 : pp = 130.36329650878906 
320 / 1452 : pp = 130.5633544921875 
330 / 1452 : pp = 130.77252197265625 
340 / 1452 : pp = 130.273193359375 
350 / 1452 : pp = 130.47889709472656 
360 / 1452 : pp = 130.4348602294922 
370 / 1452 : pp = 130.28126525878906 
380 / 1452 : pp = 130.02786254882812 
390 / 1452 : pp = 130.1564483642578 
400 / 1452 : pp = 129.98440551757812 
410 / 1452 : pp = 130.37721252441406 
420 / 1452 : pp = 130.71859741210938 
430 / 1452 : pp = 130.65939331054688 
440 / 1452 : pp = 130.72987365722656 
450 / 1452 : pp = 130.56272888183594 
460 / 1452 : pp = 130.28195190429688 
470 / 1452 : pp = 129.90936279296875 
480 / 1452 : pp = 129.42857360839844 
490 / 1452 : pp = 129.18077087402344 
500 / 1452 : pp = 128.7588348388672 
510 / 1452 : pp = 128.6303253173828 
520 / 1452 : pp = 128.47616577148438 
530 / 1452 : pp = 128.21148681640625 
540 / 1452 : pp = 127.7218017578125 
550 / 1452 : pp = 127.50067138671875 
560 / 1452 : pp = 127.27574157714844 
570 / 1452 : pp = 127.05399322509766 
580 / 1452 : pp = 126.73983001708984 
590 / 1452 : pp = 126.43692779541016 
600 / 1452 : pp = 126.06050109863281 
610 / 1452 : pp = 125.82952880859375 
620 / 1452 : pp = 125.66295623779297 
630 / 1452 : pp = 125.39354705810547 
640 / 1452 : pp = 125.49463653564453 
650 / 1452 : pp = 125.48816680908203 
660 / 1452 : pp = 125.58712005615234 
670 / 1452 : pp = 125.65978240966797 
680 / 1452 : pp = 125.71456146240234 
690 / 1452 : pp = 125.66937255859375 
700 / 1452 : pp = 125.65900421142578 
710 / 1452 : pp = 125.7271499633789 
720 / 1452 : pp = 125.77758026123047 
730 / 1452 : pp = 125.74129486083984 
740 / 1452 : pp = 125.8759765625 
750 / 1452 : pp = 125.91793823242188 
760 / 1452 : pp = 125.99595642089844 
770 / 1452 : pp = 126.18113708496094 
780 / 1452 : pp = 126.35147094726562 
790 / 1452 : pp = 126.50797271728516 
800 / 1452 : pp = 126.49759674072266 
810 / 1452 : pp = 126.48113250732422 
820 / 1452 : pp = 126.52528381347656 
830 / 1452 : pp = 126.705810546875 
840 / 1452 : pp = 126.67517852783203 
850 / 1452 : pp = 126.74176025390625 
860 / 1452 : pp = 126.74151611328125 
870 / 1452 : pp = 126.73414611816406 
880 / 1452 : pp = 126.83026885986328 
890 / 1452 : pp = 126.88519287109375 
900 / 1452 : pp = 126.88053894042969 
910 / 1452 : pp = 126.97138214111328 
920 / 1452 : pp = 127.04660034179688 
930 / 1452 : pp = 127.03763580322266 
940 / 1452 : pp = 127.1126480102539 
950 / 1452 : pp = 127.09610748291016 
960 / 1452 : pp = 127.0873794555664 
970 / 1452 : pp = 127.10343933105469 
980 / 1452 : pp = 126.96441650390625 
990 / 1452 : pp = 126.88519287109375 
1000 / 1452 : pp = 126.7336654663086 
1010 / 1452 : pp = 126.77796936035156 
1020 / 1452 : pp = 126.89826202392578 
1030 / 1452 : pp = 126.88761138916016 
1040 / 1452 : pp = 126.95309448242188 
1050 / 1452 : pp = 126.96478271484375 
1060 / 1452 : pp = 126.89324188232422 
1070 / 1452 : pp = 127.03242492675781 
1080 / 1452 : pp = 127.13228607177734 
1090 / 1452 : pp = 127.173095703125 
1100 / 1452 : pp = 127.15975189208984 
1110 / 1452 : pp = 127.0392074584961 
1120 / 1452 : pp = 126.94032287597656 
1130 / 1452 : pp = 126.80693054199219 
1140 / 1452 : pp = 126.81315612792969 
1150 / 1452 : pp = 126.90467834472656 
1160 / 1452 : pp = 126.91236114501953 
1170 / 1452 : pp = 126.90897369384766 
1180 / 1452 : pp = 126.98052215576172 
1190 / 1452 : pp = 127.07483673095703 
1200 / 1452 : pp = 127.10216522216797 
1210 / 1452 : pp = 127.08258819580078 
1220 / 1452 : pp = 127.22943878173828 
1230 / 1452 : pp = 127.38563537597656 
1240 / 1452 : pp = 127.40538024902344 
1250 / 1452 : pp = 127.53369140625 
1260 / 1452 : pp = 127.59293365478516 
1270 / 1452 : pp = 127.61489868164062 
1280 / 1452 : pp = 127.6484375 
1290 / 1452 : pp = 127.65257263183594 
1300 / 1452 : pp = 127.69329833984375 
1310 / 1452 : pp = 127.74549102783203 
1320 / 1452 : pp = 127.7043228149414 
1330 / 1452 : pp = 127.6866683959961 
1340 / 1452 : pp = 127.70913696289062 
1350 / 1452 : pp = 127.73233795166016 
1360 / 1452 : pp = 127.7855224609375 
1370 / 1452 : pp = 127.71918487548828 
1380 / 1452 : pp = 127.69987487792969 
1390 / 1452 : pp = 127.6697998046875 
1400 / 1452 : pp = 127.61137390136719 
1410 / 1452 : pp = 127.6404037475586 
1420 / 1452 : pp = 127.61094665527344 
1430 / 1452 : pp = 127.58216857910156 
1440 / 1452 : pp = 127.61477661132812 
1450 / 1452 : pp = 127.61964416503906 

0 / 115 : pp = 228.21578979492188 
10 / 115 : pp = 208.11244201660156 
20 / 115 : pp = 210.688232421875 
30 / 115 : pp = 207.62408447265625 
40 / 115 : pp = 206.45184326171875 
50 / 115 : pp = 201.52760314941406 
60 / 115 : pp = 200.7784881591797 
70 / 115 : pp = 196.83067321777344 
80 / 115 : pp = 194.6357879638672 
90 / 115 : pp = 191.9783935546875 
100 / 115 : pp = 186.8787841796875 
110 / 115 : pp = 184.35252380371094 
Training perplexity: 127.60413360595703
Validation perplexity:183.8877410888672
Total time : 41.6636528968811
Epoch 15

0 / 1452 : pp = 156.81654357910156 
10 / 1452 : pp = 142.1070556640625 
20 / 1452 : pp = 139.55076599121094 
30 / 1452 : pp = 136.63551330566406 
40 / 1452 : pp = 138.5840606689453 
50 / 1452 : pp = 136.052734375 
60 / 1452 : pp = 134.93019104003906 
70 / 1452 : pp = 135.65206909179688 
80 / 1452 : pp = 135.2620086669922 
90 / 1452 : pp = 134.314697265625 
100 / 1452 : pp = 133.4916229248047 
110 / 1452 : pp = 132.26052856445312 
120 / 1452 : pp = 131.7714080810547 
130 / 1452 : pp = 130.77365112304688 
140 / 1452 : pp = 129.5411834716797 
150 / 1452 : pp = 129.0791778564453 
160 / 1452 : pp = 129.21920776367188 
170 / 1452 : pp = 128.7528839111328 
180 / 1452 : pp = 128.22279357910156 
190 / 1452 : pp = 128.18177795410156 
200 / 1452 : pp = 128.58758544921875 
210 / 1452 : pp = 128.3906707763672 
220 / 1452 : pp = 128.5266571044922 
230 / 1452 : pp = 128.80563354492188 
240 / 1452 : pp = 128.61886596679688 
250 / 1452 : pp = 128.13172912597656 
260 / 1452 : pp = 127.69220733642578 
270 / 1452 : pp = 126.96150970458984 
280 / 1452 : pp = 127.04702758789062 
290 / 1452 : pp = 127.33565521240234 
300 / 1452 : pp = 127.55929565429688 
310 / 1452 : pp = 127.38514709472656 
320 / 1452 : pp = 127.52171325683594 
330 / 1452 : pp = 127.68690490722656 
340 / 1452 : pp = 127.18340301513672 
350 / 1452 : pp = 127.4073257446289 
360 / 1452 : pp = 127.30432891845703 
370 / 1452 : pp = 127.17618560791016 
380 / 1452 : pp = 126.92579650878906 
390 / 1452 : pp = 127.02473449707031 
400 / 1452 : pp = 126.8515625 
410 / 1452 : pp = 127.211669921875 
420 / 1452 : pp = 127.51788330078125 
430 / 1452 : pp = 127.47386169433594 
440 / 1452 : pp = 127.57164001464844 
450 / 1452 : pp = 127.3601303100586 
460 / 1452 : pp = 127.09434509277344 
470 / 1452 : pp = 126.71922302246094 
480 / 1452 : pp = 126.24349212646484 
490 / 1452 : pp = 125.98778533935547 
500 / 1452 : pp = 125.59526824951172 
510 / 1452 : pp = 125.4450912475586 
520 / 1452 : pp = 125.29247283935547 
530 / 1452 : pp = 125.03536224365234 
540 / 1452 : pp = 124.5813980102539 
550 / 1452 : pp = 124.33724212646484 
560 / 1452 : pp = 124.08995819091797 
570 / 1452 : pp = 123.86637878417969 
580 / 1452 : pp = 123.53152465820312 
590 / 1452 : pp = 123.20321655273438 
600 / 1452 : pp = 122.85673522949219 
610 / 1452 : pp = 122.64250946044922 
620 / 1452 : pp = 122.4958724975586 
630 / 1452 : pp = 122.22386169433594 
640 / 1452 : pp = 122.31143188476562 
650 / 1452 : pp = 122.30093383789062 
660 / 1452 : pp = 122.39427947998047 
670 / 1452 : pp = 122.45440673828125 
680 / 1452 : pp = 122.51146697998047 
690 / 1452 : pp = 122.4854736328125 
700 / 1452 : pp = 122.48600006103516 
710 / 1452 : pp = 122.56084442138672 
720 / 1452 : pp = 122.59059143066406 
730 / 1452 : pp = 122.55529022216797 
740 / 1452 : pp = 122.69409942626953 
750 / 1452 : pp = 122.76456451416016 
760 / 1452 : pp = 122.84437561035156 
770 / 1452 : pp = 123.02527618408203 
780 / 1452 : pp = 123.20509338378906 
790 / 1452 : pp = 123.36305236816406 
800 / 1452 : pp = 123.36852264404297 
810 / 1452 : pp = 123.36799621582031 
820 / 1452 : pp = 123.39976501464844 
830 / 1452 : pp = 123.59362030029297 
840 / 1452 : pp = 123.56946563720703 
850 / 1452 : pp = 123.63800811767578 
860 / 1452 : pp = 123.63983917236328 
870 / 1452 : pp = 123.64148712158203 
880 / 1452 : pp = 123.7568588256836 
890 / 1452 : pp = 123.7885513305664 
900 / 1452 : pp = 123.79640197753906 
910 / 1452 : pp = 123.86153411865234 
920 / 1452 : pp = 123.92941284179688 
930 / 1452 : pp = 123.9125747680664 
940 / 1452 : pp = 123.95559692382812 
950 / 1452 : pp = 123.93928527832031 
960 / 1452 : pp = 123.94294738769531 
970 / 1452 : pp = 123.95547485351562 
980 / 1452 : pp = 123.8229751586914 
990 / 1452 : pp = 123.73727416992188 
1000 / 1452 : pp = 123.59091186523438 
1010 / 1452 : pp = 123.634765625 
1020 / 1452 : pp = 123.76506042480469 
1030 / 1452 : pp = 123.75485229492188 
1040 / 1452 : pp = 123.807861328125 
1050 / 1452 : pp = 123.79156494140625 
1060 / 1452 : pp = 123.73054504394531 
1070 / 1452 : pp = 123.8615951538086 
1080 / 1452 : pp = 123.96564483642578 
1090 / 1452 : pp = 124.02104187011719 
1100 / 1452 : pp = 124.012939453125 
1110 / 1452 : pp = 123.87582397460938 
1120 / 1452 : pp = 123.775390625 
1130 / 1452 : pp = 123.63182067871094 
1140 / 1452 : pp = 123.62391662597656 
1150 / 1452 : pp = 123.71013641357422 
1160 / 1452 : pp = 123.72423553466797 
1170 / 1452 : pp = 123.71726989746094 
1180 / 1452 : pp = 123.79032897949219 
1190 / 1452 : pp = 123.87883758544922 
1200 / 1452 : pp = 123.9125747680664 
1210 / 1452 : pp = 123.90140533447266 
1220 / 1452 : pp = 124.03245544433594 
1230 / 1452 : pp = 124.19799041748047 
1240 / 1452 : pp = 124.21469116210938 
1250 / 1452 : pp = 124.34103393554688 
1260 / 1452 : pp = 124.4041976928711 
1270 / 1452 : pp = 124.42852020263672 
1280 / 1452 : pp = 124.46656036376953 
1290 / 1452 : pp = 124.4811019897461 
1300 / 1452 : pp = 124.52384185791016 
1310 / 1452 : pp = 124.57533264160156 
1320 / 1452 : pp = 124.5398178100586 
1330 / 1452 : pp = 124.52598571777344 
1340 / 1452 : pp = 124.53311157226562 
1350 / 1452 : pp = 124.57759094238281 
1360 / 1452 : pp = 124.63385772705078 
1370 / 1452 : pp = 124.58133697509766 
1380 / 1452 : pp = 124.55769348144531 
1390 / 1452 : pp = 124.54011535644531 
1400 / 1452 : pp = 124.4884033203125 
1410 / 1452 : pp = 124.51226806640625 
1420 / 1452 : pp = 124.49683380126953 
1430 / 1452 : pp = 124.4754638671875 
1440 / 1452 : pp = 124.50164031982422 
1450 / 1452 : pp = 124.50894165039062 

0 / 115 : pp = 230.8488006591797 
10 / 115 : pp = 209.2509002685547 
20 / 115 : pp = 211.68577575683594 
30 / 115 : pp = 208.44056701660156 
40 / 115 : pp = 207.2039337158203 
50 / 115 : pp = 202.1859588623047 
60 / 115 : pp = 201.34739685058594 
70 / 115 : pp = 197.4251251220703 
80 / 115 : pp = 195.2623291015625 
90 / 115 : pp = 192.592529296875 
100 / 115 : pp = 187.39553833007812 
110 / 115 : pp = 184.791259765625 
Training perplexity: 124.4933853149414
Validation perplexity:184.32510375976562
Total time : 40.856229066848755

0 / 128 : pp = 184.6475067138672 
10 / 128 : pp = 176.8856964111328 
20 / 128 : pp = 164.3444366455078 
30 / 128 : pp = 167.85472106933594 
40 / 128 : pp = 169.25367736816406 
50 / 128 : pp = 168.86561584472656 
60 / 128 : pp = 168.11801147460938 
70 / 128 : pp = 165.4105224609375 
80 / 128 : pp = 162.91146850585938 
90 / 128 : pp = 161.29742431640625 
100 / 128 : pp = 162.45989990234375 
110 / 128 : pp = 162.6834716796875 
120 / 128 : pp = 164.3359832763672 
=-==-==-==-==-=
Test perplexity: 164.0149383544922 
=-==-==-==-==-=
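For reference, the perplexity (pp) values printed in the logs above follow the standard definition: the exponential of the mean per-word cross-entropy. A minimal sketch (the helper name `perplexity` is illustrative, not the `calculate_perplexity` helper from `utils`):

```python
import numpy as np

def perplexity(mean_cross_entropy):
    # Standard definition: perplexity = exp(mean per-word cross-entropy in nats),
    # so a lower average loss maps directly to a lower perplexity.
    return np.exp(mean_cross_entropy)

# A model that guesses uniformly over a 10000-word vocabulary has mean
# cross-entropy log(10000), i.e. perplexity 10000.
print(perplexity(np.log(10000.0)))
```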

For more details, please refer to the link below:

https://github.com/weizhenzhao/cs224d_nlp_problem_set2

 

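Before the full TensorFlow listing, the forward pass of this RNNLM can be sketched for a single example in plain NumPy. This is an illustrative sketch of the standard cs224d ps2 formulation (sigmoid hidden layer, softmax output); the matrix names L, I, H, U, b1, b2 follow the notation used in this post, and all weight values here are random placeholders, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

d, Dh, V = 50, 100, 10000                  # embed dim, hidden dim, vocab size
rng = np.random.default_rng(0)
L = rng.normal(scale=0.1, size=(V, d))     # word embedding matrix
I = rng.normal(scale=0.1, size=(d, Dh))    # input word representation matrix
H = rng.normal(scale=0.1, size=(Dh, Dh))   # hidden transition matrix
U = rng.normal(scale=0.1, size=(Dh, V))    # output word representation matrix
b1, b2 = np.zeros(Dh), np.zeros(V)

h = np.zeros(Dh)                           # h(0): initial hidden state
for word_index in [12, 7, 42]:             # x(1)..x(t) as vocabulary indices
    e_t = L[word_index]                    # one-hot row vector times L == row lookup
    h = sigmoid(h @ H + e_t @ I + b1)      # h(t): hidden state update
y_hat = softmax(h @ U + b2)                # distribution over the next word
print(y_hat.shape)
```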

'''
Created on September 26, 2017

@author: weizhen
'''
import getpass
import sys
import time
import numpy as np
from copy import deepcopy
from utils import calculate_perplexity, get_ptb_dataset, Vocab
from utils import ptb_iterator, sample
import tensorflow as tf
from model import LanguageModel
from tensorflow.contrib.legacy_seq2seq.python.ops.seq2seq import sequence_loss


class Config(object):
    """Stores model hyperparameters and data information."""
    batch_size = 64
    embed_size = 50
    hidden_size = 100
    num_steps = 10
    max_epochs = 16
    early_stopping = 2
    dropout = 0.9
    lr = 0.001


class RNNLM_Model(LanguageModel):
    def load_data(self, debug=False):
        """Loads the vocabulary and encodes the train/dev/test data."""
        self.vocab = Vocab()
        self.vocab.construct(get_ptb_dataset('train'))
        self.encoded_train = np.array([self.vocab.encode(word) for word in get_ptb_dataset('train')], dtype=np.int32)
        self.encoded_valid = np.array([self.vocab.encode(word) for word in get_ptb_dataset('valid')], dtype=np.int32)
        self.encoded_test = np.array([self.vocab.encode(word) for word in get_ptb_dataset('test')], dtype=np.int32)
        if debug:
            num_debug = 1024
            self.encoded_train = self.encoded_train[:num_debug]
            self.encoded_valid = self.encoded_valid[:num_debug]
            self.encoded_test = self.encoded_test[:num_debug]

    def add_placeholders(self):
        """Generates placeholder variables to represent the input tensors.
        These placeholders are used as inputs by the rest of the model
        and will be fed data during training.
            input_placeholder: Input placeholder, shape (None, num_steps), type tf.int32
            labels_placeholder: Labels placeholder, shape (None, num_steps), type tf.int32
            dropout_placeholder: Dropout value placeholder (scalar), type tf.float32
        """
        self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Input')
        self.labels_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Target')
        self.dropout_placeholder = tf.placeholder(tf.float32, name='Dropout')

    def add_embedding(self):
        """Adds an embedding layer.
        Hint: This layer should use input_placeholder to index into the embedding matrix.
        Hint: You might find tf.nn.embedding_lookup useful.
        Hint: You might find tf.split and tf.squeeze useful when constructing the input tensors.
        Hint: Below are the dimensions of the variable you need to create:
                L: (len(self.vocab), embed_size)
        Returns:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)
        tf.split(value, num_or_size_splits, axis)
                splits `value` into num_or_size_splits pieces along dimension
                `axis` (axis=1 here, i.e. the time dimension), returning a
                list of tensors.
        tf.squeeze(input, squeeze_dims=None, name=None)
                removes all dimensions of size 1 from the tensor.
                Example: t is a tensor of shape [1, 2, 1, 3, 1, 1]
                        shape(squeeze(t)) ==> [2, 3]
                        t is a tensor of shape [1, 2, 1, 3, 1, 1]
                        shape(squeeze(t, [2, 4])) ==> [1, 2, 3, 1]
        tf.nn.embedding_lookup maps word indices to word vectors.
        """
        with tf.device('/cpu:0'):
            embedding = tf.get_variable('Embedding', [len(self.vocab), self.config.embed_size], trainable=True)
            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
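            # Shape walkthrough (with the Config above): embedding_lookup returns a
            # (batch_size, num_steps, embed_size) = (64, 10, 50) tensor; tf.split
            # along axis 1 yields num_steps tensors of shape (64, 1, 50), and
            # tf.squeeze(x, [1]) drops the middle axis, so `inputs` is a list of
            # ten (64, 50) tensors, one per time step.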
            return inputs
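The lookup-and-split above can be sketched in plain NumPy (a minimal sketch with made-up toy sizes, not the real `Config` values):

```python
import numpy as np

# Toy sizes for illustration only (assumed, not the real Config values)
vocab_size, embed_size, batch_size, num_steps = 10, 4, 3, 5

rng = np.random.default_rng(0)
L = rng.normal(size=(vocab_size, embed_size))              # embedding matrix L
word_ids = rng.integers(0, vocab_size, size=(batch_size, num_steps))

# embedding_lookup: index rows of L -> (batch_size, num_steps, embed_size)
embedded = L[word_ids]

# tf.split + tf.squeeze: one (batch_size, embed_size) slice per time step
inputs = [embedded[:, t, :] for t in range(num_steps)]
```

Each element of `inputs` is exactly the batch of embedded words for one time step, which is the shape the RNN loop expects.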

    def add_projection(self, rnn_outputs):
        """Add a projection layer.
            The projection layer transforms the hidden representation into
            a distribution over the whole vocabulary.
            Hint: the dimensions you need to create are
                U: (hidden_size, len(vocab))
                b_2: (len(vocab),)
            Args:
                rnn_outputs: a list of length num_steps, where each element is
                            a tensor of shape (batch_size, hidden_size)
            Returns:
                outputs: a list of length num_steps, where each element is
                         a tensor of shape (batch_size, len(vocab))
            """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs
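The projection is a single affine map per time step. A minimal NumPy sketch (toy sizes assumed, not the real `Config` values):

```python
import numpy as np

# Toy sizes for illustration only (assumed)
batch_size, hidden_size, vocab_size = 3, 6, 10
rng = np.random.default_rng(1)

U = rng.normal(size=(hidden_size, vocab_size))    # projection matrix U
b_2 = np.zeros(vocab_size)                        # output bias b_2
h_t = rng.normal(size=(batch_size, hidden_size))  # one RNN output for one step

# unnormalized scores over the vocabulary; softmax is applied later
logits = h_t @ U + b_2
```

Note that softmax is deliberately not applied here: the raw logits are what `sequence_loss` consumes in `add_loss_op`.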
    
    def add_loss_op(self, output):
        """Add the loss to the objective function.
            Hint: use sequence_loss (from tensorflow.contrib.legacy_seq2seq)
                  to implement the sequence loss.
            Args:
                output: a tensor of shape (None, len(self.vocab))
            Returns:
                loss: a 0-d (scalar) tensor
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output], [tf.reshape(self.labels_placeholder, [-1])], all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss
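With all weights set to one, `sequence_loss` reduces to the mean cross-entropy over all batch_size * num_steps positions. A NumPy sketch of that reduction (toy sizes assumed):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
n, vocab_size = 12, 10                    # n = batch_size * num_steps after flattening
logits = rng.normal(size=(n, vocab_size))
targets = rng.integers(0, vocab_size, size=n)

probs = softmax(logits)
# with all weights equal to one, the sequence loss is the mean cross-entropy
loss = -np.log(probs[np.arange(n), targets]).mean()
```

This scalar loss is exactly what `np.exp(...)` is applied to in `run_epoch` to report perplexity.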
        
        
    def add_training_op(self, loss):
        """Add the training op to the computational graph.
            Create an optimizer and apply gradient descent to all trainable variables.
            Hint: use tf.train.AdamOptimizer for this model.
                  Calling optimizer.minimize() will return a train_op object.
            Args:
                loss: loss tensor, from the cross-entropy loss
            Returns:
                train_op: the training op
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word.
        # We cast o to float64 because otherwise we run into numerical issues:
        # sum(output of softmax) = 1.00000298179 rather than 1.
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs into rows of size len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """Create the RNN LM model.
                      In the implementation below you need to implement the RNN LM equations.
        Hint: use a zero vector of shape (batch_size, hidden_size) as the initial RNN state.
        Hint: store the last RNN output as the instance variable
            self.final_state
        Hint: make sure dropout is applied to both the inputs and the outputs.
        Hint: use a variable scope named RNN to define the RNN variables.
        Hint: perform an explicit for-loop over the inputs.
                You can use scope.reuse_variables() to ensure the weights
                are the same at every iteration.
                Make sure you don't call this on the first loop iteration,
                since no variables will have been initialized yet.
        Hint: here are the dimensions of the variables you need to create

            H: (hidden_size,hidden_size)
            I: (embed_size,hidden_size)
            b_1:(hidden_size,)
        Args:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)
        Returns:
            outputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, hidden_size)
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]

        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]

        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs
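The recurrence implemented in the loop above, state = sigmoid(state @ H + input @ I + b_1), can be sketched in NumPy without any TensorFlow machinery (toy sizes assumed, dropout omitted):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes for illustration only (assumed)
batch_size, embed_size, hidden_size, num_steps = 3, 4, 6, 5
rng = np.random.default_rng(3)

H = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden transform
I = rng.normal(scale=0.1, size=(embed_size, hidden_size))   # input transform
b_1 = np.zeros(hidden_size)                                 # hidden bias
inputs = [rng.normal(size=(batch_size, embed_size)) for _ in range(num_steps)]

state = np.zeros((batch_size, hidden_size))  # h^(0): zero initial state
rnn_outputs = []
for x_t in inputs:
    # h^(t) = sigmoid(h^(t-1) H + x^(t) I + b_1), same weights at every step
    state = sigmoid(state @ H + x_t @ I + b_1)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]
```

Because the same `H`, `I`, and `b_1` are used at every step, the TensorFlow version needs `scope.reuse_variables()` to get exactly this weight sharing.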

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We need to feed in the initial state and fetch the final state
            # so that the RNN carries the proper history across batches.
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp
                    }
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))
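`run_epoch` reports perplexity as the exponential of the mean cross-entropy loss, which is also how the progress lines in the training log below are computed. A tiny sketch with made-up loss values:

```python
import numpy as np

# Mean cross-entropy losses collected over the steps of an epoch (made-up numbers)
total_loss = [5.0, 4.6, 4.2, 4.0]

# perplexity = exp(mean cross-entropy); lower is better
perplexity = np.exp(np.mean(total_loss))
```

A uniform model over a vocabulary of size |V| has loss ln|V| and hence perplexity |V|, so perplexity can be read as "the model is as confused as if it were choosing uniformly among this many words".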

def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """Generate text from the model.
        Hint: create a feed-dictionary and use sess.run() to execute the model.
                You will need to use model.initial_state as a key in feed_dict.
        Hint: fetch model.final_state and model.predictions[-1].
             model.final_state is set in add_model();
             model.predictions is set in __init__.
        Hint: carry the fetched final state and the predicted y_pred across steps.
        Args:
            session : tf.Session() object
            model : Object of type RNNLM Model
            config : A Config() object
            starting_text:Initial text passed to model
        Returns:
            output : List of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch size of one, length of len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output
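The `sample(y_pred[0], temperature=temp)` call above draws the next word index from the predicted distribution. Its implementation lives in `utils` and is not shown in this post; a hypothetical stand-in (my own sketch, not the course's code) looks like this:

```python
import numpy as np

def sample_idx(probs, temperature=1.0, rng=None):
    """Draw a word index from a probability vector, sharpened by temperature.

    Hypothetical stand-in for the utils.sample helper used above;
    the real implementation is not shown in this post.
    """
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.log(probs) / temperature   # temperature < 1 sharpens the distribution
    p = np.exp(logits - logits.max())      # renormalize in a numerically stable way
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))
```

With temperature 1.0 this reproduces the model's own distribution; as the temperature approaches 0 the draw collapses onto the argmax, giving greedy decoding.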

def generate_sentence(session, model, config, *args, **kwargs):
    """Convenience wrapper to generate a single sentence from the model."""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)

def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generation model
    with tf.variable_scope('RNNLM', reuse=None) as scope:
        model = RNNLM_Model(config)
        # This instructs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()

            train_pp = model.run_epoch(session,
                                       model.encoded_train,
                                       train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))

        saver.restore(session, './ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            starting_text = input('> ')


if __name__ == "__main__":
    test_RNNLM()

(It isn't really such arcane stuff after all; it is far simpler than calculus, let alone mathematical analysis.)

Below is the training log:

1380 / 1452 : pp = 266.20892333984375 
1390 / 1452 : pp = 265.94439697265625 
1400 / 1452 : pp = 265.66845703125 
1410 / 1452 : pp = 265.5393981933594 
1420 / 1452 : pp = 265.32489013671875 
1430 / 1452 : pp = 265.2019348144531 
1440 / 1452 : pp = 265.13720703125 
1450 / 1452 : pp = 264.954833984375 

0 / 115 : pp = 296.9217224121094 
10 / 115 : pp = 282.02130126953125 
20 / 115 : pp = 279.76800537109375 
30 / 115 : pp = 276.4101257324219 
40 / 115 : pp = 276.2939147949219 
50 / 115 : pp = 270.73565673828125 
60 / 115 : pp = 269.88134765625 
70 / 115 : pp = 266.8675231933594 
80 / 115 : pp = 263.6731872558594 
90 / 115 : pp = 260.8569030761719 
100 / 115 : pp = 256.3356628417969 
110 / 115 : pp = 255.1026611328125 
Training perplexity: 264.9092102050781
Validation perplexity:254.84902954101562
Total time : 41.65332388877869
Epoch 3

0 / 1452 : pp = 327.0847473144531 
10 / 1452 : pp = 273.9620056152344 
20 / 1452 : pp = 270.22943115234375 
30 / 1452 : pp = 263.5213317871094 
40 / 1452 : pp = 264.0644836425781 
50 / 1452 : pp = 258.6029968261719 
60 / 1452 : pp = 257.04290771484375 
70 / 1452 : pp = 257.59161376953125 
80 / 1452 : pp = 256.7600402832031 
90 / 1452 : pp = 254.5120391845703 
100 / 1452 : pp = 252.44725036621094 
110 / 1452 : pp = 250.13954162597656 
120 / 1452 : pp = 249.91647338867188 
130 / 1452 : pp = 249.50460815429688 
140 / 1452 : pp = 247.67440795898438 
150 / 1452 : pp = 247.19090270996094 
160 / 1452 : pp = 247.8919219970703 
170 / 1452 : pp = 247.54322814941406 
180 / 1452 : pp = 246.17623901367188 
190 / 1452 : pp = 245.78330993652344 
200 / 1452 : pp = 246.80552673339844 
210 / 1452 : pp = 246.3059844970703 
220 / 1452 : pp = 246.19021606445312 
230 / 1452 : pp = 246.70140075683594 
240 / 1452 : pp = 246.3099822998047 
250 / 1452 : pp = 245.1745147705078 
260 / 1452 : pp = 244.17384338378906 
270 / 1452 : pp = 242.57363891601562 
280 / 1452 : pp = 242.8500213623047 
290 / 1452 : pp = 243.0492706298828 
300 / 1452 : pp = 243.1466522216797 
310 / 1452 : pp = 242.89044189453125 
320 / 1452 : pp = 243.08045959472656 
330 / 1452 : pp = 243.32235717773438 
340 / 1452 : pp = 242.34715270996094 
350 / 1452 : pp = 242.80972290039062 
360 / 1452 : pp = 242.5345458984375 
370 / 1452 : pp = 242.0083465576172 
380 / 1452 : pp = 241.22708129882812 
390 / 1452 : pp = 241.24398803710938 
400 / 1452 : pp = 240.63473510742188 
410 / 1452 : pp = 240.94094848632812 
420 / 1452 : pp = 241.19717407226562 
430 / 1452 : pp = 240.8896026611328 
440 / 1452 : pp = 240.7772979736328 
450 / 1452 : pp = 240.45913696289062 
460 / 1452 : pp = 240.06674194335938 
470 / 1452 : pp = 239.42198181152344 
480 / 1452 : pp = 238.39271545410156 
490 / 1452 : pp = 238.0517120361328 
500 / 1452 : pp = 237.31752014160156 
510 / 1452 : pp = 237.1197967529297 
520 / 1452 : pp = 236.64865112304688 
530 / 1452 : pp = 236.004638671875 
540 / 1452 : pp = 235.192626953125 
550 / 1452 : pp = 234.6700439453125 
560 / 1452 : pp = 234.1914825439453 
570 / 1452 : pp = 233.80899047851562 
580 / 1452 : pp = 233.3753662109375 
590 / 1452 : pp = 232.8699188232422 
600 / 1452 : pp = 232.2629852294922 
610 / 1452 : pp = 231.8668212890625 
620 / 1452 : pp = 231.478515625 
630 / 1452 : pp = 231.0444793701172 
640 / 1452 : pp = 231.2737579345703 
650 / 1452 : pp = 231.28114318847656 
660 / 1452 : pp = 231.4324951171875 
670 / 1452 : pp = 231.48513793945312 
680 / 1452 : pp = 231.45932006835938 
690 / 1452 : pp = 231.17738342285156 
700 / 1452 : pp = 231.00570678710938 
710 / 1452 : pp = 231.03810119628906 
720 / 1452 : pp = 230.96131896972656 
730 / 1452 : pp = 230.91110229492188 
740 / 1452 : pp = 231.13539123535156 
750 / 1452 : pp = 231.04393005371094 
760 / 1452 : pp = 231.03489685058594 
770 / 1452 : pp = 231.19744873046875 
780 / 1452 : pp = 231.26625061035156 
790 / 1452 : pp = 231.38714599609375 
800 / 1452 : pp = 231.24441528320312 
810 / 1452 : pp = 231.16824340820312 
820 / 1452 : pp = 231.11831665039062 
830 / 1452 : pp = 231.34886169433594 
840 / 1452 : pp = 231.221923828125 
850 / 1452 : pp = 231.2562255859375 
860 / 1452 : pp = 231.26492309570312 
870 / 1452 : pp = 231.1961212158203 
880 / 1452 : pp = 231.30506896972656 
890 / 1452 : pp = 231.24728393554688 
900 / 1452 : pp = 231.15744018554688 
910 / 1452 : pp = 231.20175170898438 
920 / 1452 : pp = 231.25534057617188 
930 / 1452 : pp = 231.09461975097656 
940 / 1452 : pp = 231.12612915039062 
950 / 1452 : pp = 231.0475616455078 
960 / 1452 : pp = 230.86056518554688 
970 / 1452 : pp = 230.80377197265625 
980 / 1452 : pp = 230.4598846435547 
990 / 1452 : pp = 230.24559020996094 
1000 / 1452 : pp = 229.91030883789062 
1010 / 1452 : pp = 229.9349822998047 
1020 / 1452 : pp = 230.01470947265625 
1030 / 1452 : pp = 229.8909149169922 
1040 / 1452 : pp = 229.9403533935547 
1050 / 1452 : pp = 229.84815979003906 
1060 / 1452 : pp = 229.60377502441406 
1070 / 1452 : pp = 229.74647521972656 
1080 / 1452 : pp = 229.80410766601562 
1090 / 1452 : pp = 229.78733825683594 
1100 / 1452 : pp = 229.64549255371094 
1110 / 1452 : pp = 229.26255798339844 
1120 / 1452 : pp = 229.00262451171875 
1130 / 1452 : pp = 228.6716766357422 
1140 / 1452 : pp = 228.55067443847656 
1150 / 1452 : pp = 228.61563110351562 
1160 / 1452 : pp = 228.50958251953125 
1170 / 1452 : pp = 228.3498992919922 
1180 / 1452 : pp = 228.29786682128906 
1190 / 1452 : pp = 228.33204650878906 
1200 / 1452 : pp = 228.27369689941406 
1210 / 1452 : pp = 228.11831665039062 
1220 / 1452 : pp = 228.21775817871094 
1230 / 1452 : pp = 228.3170166015625 
1240 / 1452 : pp = 228.22134399414062 
1250 / 1452 : pp = 228.3769073486328 
1260 / 1452 : pp = 228.37527465820312 
1270 / 1452 : pp = 228.33694458007812 
1280 / 1452 : pp = 228.27108764648438 
1290 / 1452 : pp = 228.1731414794922 
1300 / 1452 : pp = 228.12200927734375 
1310 / 1452 : pp = 228.10275268554688 
1320 / 1452 : pp = 227.9289093017578 
1330 / 1452 : pp = 227.77723693847656 
1340 / 1452 : pp = 227.79623413085938 
1350 / 1452 : pp = 227.7408447265625 
1360 / 1452 : pp = 227.72586059570312 
1370 / 1452 : pp = 227.49728393554688 
1380 / 1452 : pp = 227.37940979003906 
1390 / 1452 : pp = 227.20166015625 
1400 / 1452 : pp = 227.018310546875 
1410 / 1452 : pp = 226.95651245117188 
1420 / 1452 : pp = 226.8065643310547 
1430 / 1452 : pp = 226.7261199951172 
1440 / 1452 : pp = 226.7193145751953 
1450 / 1452 : pp = 226.61068725585938 

0 / 115 : pp = 269.342041015625 
10 / 115 : pp = 255.03016662597656 
20 / 115 : pp = 253.8992919921875 
30 / 115 : pp = 251.04025268554688 
40 / 115 : pp = 250.51756286621094 
50 / 115 : pp = 245.3595428466797 
60 / 115 : pp = 244.4713897705078 
70 / 115 : pp = 241.2674560546875 
80 / 115 : pp = 238.3473663330078 
90 / 115 : pp = 235.56423950195312 
100 / 115 : pp = 231.2281036376953 
110 / 115 : pp = 229.8423614501953 
Training perplexity: 226.5760040283203
Validation perplexity:229.59939575195312
Total time : 42.202677726745605
Epoch 4

0 / 1452 : pp = 282.2423095703125 
10 / 1452 : pp = 240.16258239746094 
20 / 1452 : pp = 236.12203979492188 
30 / 1452 : pp = 230.3953857421875 
40 / 1452 : pp = 231.8789825439453 
50 / 1452 : pp = 227.26612854003906 
60 / 1452 : pp = 226.22061157226562 
70 / 1452 : pp = 227.01885986328125 
80 / 1452 : pp = 226.2459716796875 
90 / 1452 : pp = 224.3211669921875 
100 / 1452 : pp = 222.65615844726562 
110 / 1452 : pp = 220.70326232910156 
120 / 1452 : pp = 220.42288208007812 
130 / 1452 : pp = 219.8100128173828 
140 / 1452 : pp = 218.04432678222656 
150 / 1452 : pp = 217.31639099121094 
160 / 1452 : pp = 217.86349487304688 
170 / 1452 : pp = 217.46597290039062 
180 / 1452 : pp = 216.3349151611328 
190 / 1452 : pp = 216.12240600585938 
200 / 1452 : pp = 216.97842407226562 
210 / 1452 : pp = 216.51014709472656 
220 / 1452 : pp = 216.46751403808594 
230 / 1452 : pp = 216.80126953125 
240 / 1452 : pp = 216.45965576171875 
250 / 1452 : pp = 215.5008544921875 
260 / 1452 : pp = 214.62210083007812 
270 / 1452 : pp = 213.29183959960938 
280 / 1452 : pp = 213.5621337890625 
290 / 1452 : pp = 213.80657958984375 
300 / 1452 : pp = 213.8963165283203 
310 / 1452 : pp = 213.60653686523438 
320 / 1452 : pp = 213.85877990722656 
330 / 1452 : pp = 214.07345581054688 
340 / 1452 : pp = 213.25421142578125 
350 / 1452 : pp = 213.68019104003906 
360 / 1452 : pp = 213.41717529296875 
370 / 1452 : pp = 213.04920959472656 
380 / 1452 : pp = 212.39019775390625 
390 / 1452 : pp = 212.4908905029297 
400 / 1452 : pp = 212.01914978027344 
410 / 1452 : pp = 212.36903381347656 
420 / 1452 : pp = 212.6802520751953 
430 / 1452 : pp = 212.42697143554688 
440 / 1452 : pp = 212.42990112304688 
450 / 1452 : pp = 212.14524841308594 
460 / 1452 : pp = 211.7836151123047 
470 / 1452 : pp = 211.17282104492188 
480 / 1452 : pp = 210.27903747558594 
490 / 1452 : pp = 209.95211791992188 
500 / 1452 : pp = 209.28302001953125 
510 / 1452 : pp = 209.1029815673828 
520 / 1452 : pp = 208.73855590820312 
530 / 1452 : pp = 208.19700622558594 
540 / 1452 : pp = 207.4554443359375 
550 / 1452 : pp = 207.0062255859375 
560 / 1452 : pp = 206.59739685058594 
570 / 1452 : pp = 206.27874755859375 
580 / 1452 : pp = 205.87144470214844 
590 / 1452 : pp = 205.43545532226562 
600 / 1452 : pp = 204.90940856933594 
610 / 1452 : pp = 204.5686798095703 
620 / 1452 : pp = 204.22862243652344 
630 / 1452 : pp = 203.8448028564453 
640 / 1452 : pp = 204.06576538085938 
650 / 1452 : pp = 204.0941925048828 
660 / 1452 : pp = 204.22103881835938 
670 / 1452 : pp = 204.289794921875 
680 / 1452 : pp = 204.3115234375 
690 / 1452 : pp = 204.10284423828125 
700 / 1452 : pp = 203.99757385253906 
710 / 1452 : pp = 204.04971313476562 
720 / 1452 : pp = 204.03152465820312 
730 / 1452 : pp = 203.99046325683594 
740 / 1452 : pp = 204.19786071777344 
750 / 1452 : pp = 204.1642608642578 
760 / 1452 : pp = 204.19435119628906 
770 / 1452 : pp = 204.37786865234375 
780 / 1452 : pp = 204.4965057373047 
790 / 1452 : pp = 204.6479034423828 
800 / 1452 : pp = 204.56117248535156 
810 / 1452 : pp = 204.52284240722656 
820 / 1452 : pp = 204.50978088378906 
830 / 1452 : pp = 204.7531280517578 
840 / 1452 : pp = 204.64468383789062 
850 / 1452 : pp = 204.71348571777344 
860 / 1452 : pp = 204.7399444580078 
870 / 1452 : pp = 204.69406127929688 
880 / 1452 : pp = 204.7965850830078 
890 / 1452 : pp = 204.7594757080078 
900 / 1452 : pp = 204.71446228027344 
910 / 1452 : pp = 204.7590789794922 
920 / 1452 : pp = 204.85772705078125 
930 / 1452 : pp = 204.7428741455078 
940 / 1452 : pp = 204.8068389892578 
950 / 1452 : pp = 204.75791931152344 
960 / 1452 : pp = 204.63815307617188 
970 / 1452 : pp = 204.60760498046875 
980 / 1452 : pp = 204.34347534179688 
990 / 1452 : pp = 204.151611328125 
1000 / 1452 : pp = 203.8665771484375 
1010 / 1452 : pp = 203.9164581298828 
1020 / 1452 : pp = 204.0184783935547 
1030 / 1452 : pp = 203.95166015625 
1040 / 1452 : pp = 204.03045654296875 
1050 / 1452 : pp = 203.95846557617188 
1060 / 1452 : pp = 203.77114868164062 
1070 / 1452 : pp = 203.93260192871094 
1080 / 1452 : pp = 204.00048828125 
1090 / 1452 : pp = 204.00233459472656 
1100 / 1452 : pp = 203.8960418701172 
1110 / 1452 : pp = 203.5987548828125 
1120 / 1452 : pp = 203.38392639160156 
1130 / 1452 : pp = 203.08872985839844 
1140 / 1452 : pp = 203.01272583007812 
1150 / 1452 : pp = 203.0865936279297 
1160 / 1452 : pp = 203.02308654785156 
1170 / 1452 : pp = 202.9125518798828 
1180 / 1452 : pp = 202.9097442626953 
1190 / 1452 : pp = 202.98252868652344 
1200 / 1452 : pp = 202.95387268066406 
1210 / 1452 : pp = 202.851318359375 
1220 / 1452 : pp = 202.97671508789062 
1230 / 1452 : pp = 203.1051025390625 
1240 / 1452 : pp = 203.0526123046875 
1250 / 1452 : pp = 203.21417236328125 
1260 / 1452 : pp = 203.23617553710938 
1270 / 1452 : pp = 203.22802734375 
1280 / 1452 : pp = 203.20846557617188 
1290 / 1452 : pp = 203.15362548828125 
1300 / 1452 : pp = 203.14315795898438 
1310 / 1452 : pp = 203.15264892578125 
1320 / 1452 : pp = 203.02801513671875 
1330 / 1452 : pp = 202.92977905273438 
1340 / 1452 : pp = 202.95484924316406 
1350 / 1452 : pp = 202.9335479736328 
1360 / 1452 : pp = 202.955322265625 
1370 / 1452 : pp = 202.7740478515625 
1380 / 1452 : pp = 202.68569946289062 
1390 / 1452 : pp = 202.55816650390625 
1400 / 1452 : pp = 202.41651916503906 
1410 / 1452 : pp = 202.38494873046875 
1420 / 1452 : pp = 202.27593994140625 
1430 / 1452 : pp = 202.21826171875 
1440 / 1452 : pp = 202.23272705078125 
1450 / 1452 : pp = 202.16099548339844 

0 / 115 : pp = 253.23211669921875 
10 / 115 : pp = 237.62506103515625 
20 / 115 : pp = 237.60557556152344 
30 / 115 : pp = 234.9273223876953 
40 / 115 : pp = 234.30519104003906 
50 / 115 : pp = 229.43960571289062 
60 / 115 : pp = 228.6050567626953 
70 / 115 : pp = 225.2646484375 
80 / 115 : pp = 222.55935668945312 
90 / 115 : pp = 219.83255004882812 
100 / 115 : pp = 215.5491485595703 
110 / 115 : pp = 214.07937622070312 
Training perplexity: 202.1349639892578
Validation perplexity:213.85256958007812
Total time : 42.10724234580994
Epoch 5

0 / 1452 : pp = 255.92384338378906 
10 / 1452 : pp = 219.5322265625 
20 / 1452 : pp = 214.36212158203125 
30 / 1452 : pp = 209.12620544433594 
40 / 1452 : pp = 210.04193115234375 
50 / 1452 : pp = 205.77398681640625 
60 / 1452 : pp = 204.8201141357422 
70 / 1452 : pp = 205.3955841064453 
80 / 1452 : pp = 204.8386688232422 
90 / 1452 : pp = 203.21194458007812 
100 / 1452 : pp = 201.87643432617188 
110 / 1452 : pp = 200.10122680664062 
120 / 1452 : pp = 199.82012939453125 
130 / 1452 : pp = 199.11192321777344 
140 / 1452 : pp = 197.51919555664062 
150 / 1452 : pp = 197.03567504882812 
160 / 1452 : pp = 197.4231414794922 
170 / 1452 : pp = 197.09571838378906 
180 / 1452 : pp = 196.17665100097656 
190 / 1452 : pp = 196.0064697265625 
200 / 1452 : pp = 196.7347869873047 
210 / 1452 : pp = 196.3063507080078 
220 / 1452 : pp = 196.21388244628906 
230 / 1452 : pp = 196.5252227783203 
240 / 1452 : pp = 196.203125 
250 / 1452 : pp = 195.3251953125 
260 / 1452 : pp = 194.53335571289062 
270 / 1452 : pp = 193.3546142578125 
280 / 1452 : pp = 193.59420776367188 
290 / 1452 : pp = 193.83297729492188 
300 / 1452 : pp = 193.98489379882812 
310 / 1452 : pp = 193.68414306640625 
320 / 1452 : pp = 193.89065551757812 
330 / 1452 : pp = 194.0518798828125 
340 / 1452 : pp = 193.32888793945312 
350 / 1452 : pp = 193.76219177246094 
360 / 1452 : pp = 193.56106567382812 
370 / 1452 : pp = 193.28179931640625 
380 / 1452 : pp = 192.7037811279297 
390 / 1452 : pp = 192.8145294189453 
400 / 1452 : pp = 192.43325805664062 
410 / 1452 : pp = 192.81527709960938 
420 / 1452 : pp = 193.13760375976562 
430 / 1452 : pp = 192.9148712158203 
440 / 1452 : pp = 192.92526245117188 
450 / 1452 : pp = 192.70083618164062 
460 / 1452 : pp = 192.36647033691406 
470 / 1452 : pp = 191.85394287109375 
480 / 1452 : pp = 191.07244873046875 
490 / 1452 : pp = 190.75401306152344 
500 / 1452 : pp = 190.1843719482422 
510 / 1452 : pp = 190.03334045410156 
520 / 1452 : pp = 189.72938537597656 
530 / 1452 : pp = 189.25889587402344 
540 / 1452 : pp = 188.59315490722656 
550 / 1452 : pp = 188.19313049316406 
560 / 1452 : pp = 187.80621337890625 
570 / 1452 : pp = 187.5229034423828 
580 / 1452 : pp = 187.1091766357422 
590 / 1452 : pp = 186.72592163085938 
600 / 1452 : pp = 186.2238006591797 
610 / 1452 : pp = 185.89695739746094 
620 / 1452 : pp = 185.60989379882812 
630 / 1452 : pp = 185.2689208984375 
640 / 1452 : pp = 185.47567749023438 
650 / 1452 : pp = 185.5127410888672 
660 / 1452 : pp = 185.64627075195312 
670 / 1452 : pp = 185.71311950683594 
680 / 1452 : pp = 185.72569274902344 
690 / 1452 : pp = 185.56459045410156 
700 / 1452 : pp = 185.48681640625 
710 / 1452 : pp = 185.5458221435547 
720 / 1452 : pp = 185.5598907470703 
730 / 1452 : pp = 185.5335235595703 
740 / 1452 : pp = 185.73995971679688 
750 / 1452 : pp = 185.744384765625 
760 / 1452 : pp = 185.81268310546875 
770 / 1452 : pp = 186.00088500976562 
780 / 1452 : pp = 186.14443969726562 
790 / 1452 : pp = 186.30764770507812 
800 / 1452 : pp = 186.2595977783203 
810 / 1452 : pp = 186.23028564453125 
820 / 1452 : pp = 186.23997497558594 
830 / 1452 : pp = 186.49057006835938 
840 / 1452 : pp = 186.43331909179688 
850 / 1452 : pp = 186.48887634277344 
860 / 1452 : pp = 186.51502990722656 
870 / 1452 : pp = 186.5167999267578 
880 / 1452 : pp = 186.62400817871094 
890 / 1452 : pp = 186.6103973388672 
900 / 1452 : pp = 186.58111572265625 
910 / 1452 : pp = 186.64126586914062 
920 / 1452 : pp = 186.7366180419922 
930 / 1452 : pp = 186.65719604492188 
940 / 1452 : pp = 186.71755981445312 
950 / 1452 : pp = 186.6977996826172 
960 / 1452 : pp = 186.62774658203125 
970 / 1452 : pp = 186.62115478515625 
980 / 1452 : pp = 186.3773193359375 
990 / 1452 : pp = 186.23109436035156 
1000 / 1452 : pp = 185.99227905273438 
1010 / 1452 : pp = 186.0488739013672 
1020 / 1452 : pp = 186.1744384765625 
1030 / 1452 : pp = 186.1162109375 
1040 / 1452 : pp = 186.18899536132812 
1050 / 1452 : pp = 186.1549072265625 
1060 / 1452 : pp = 186.01419067382812 
1070 / 1452 : pp = 186.17364501953125 
1080 / 1452 : pp = 186.27061462402344 
1090 / 1452 : pp = 186.28428649902344 
1100 / 1452 : pp = 186.2150115966797 
1110 / 1452 : pp = 185.95103454589844 
1120 / 1452 : pp = 185.77423095703125 
1130 / 1452 : pp = 185.5232696533203 
1140 / 1452 : pp = 185.4607391357422 
1150 / 1452 : pp = 185.56077575683594 
1160 / 1452 : pp = 185.53343200683594 
1170 / 1452 : pp = 185.46453857421875 
1180 / 1452 : pp = 185.4741668701172 
1190 / 1452 : pp = 185.5594482421875 
1200 / 1452 : pp = 185.53785705566406 
1210 / 1452 : pp = 185.4576416015625 
1220 / 1452 : pp = 185.5943145751953 
1230 / 1452 : pp = 185.7483673095703 
1240 / 1452 : pp = 185.70762634277344 
1250 / 1452 : pp = 185.8568115234375 
1260 / 1452 : pp = 185.90635681152344 
1270 / 1452 : pp = 185.8961639404297 
1280 / 1452 : pp = 185.89199829101562 
1290 / 1452 : pp = 185.85911560058594 
1300 / 1452 : pp = 185.86097717285156 
1310 / 1452 : pp = 185.88739013671875 
1320 / 1452 : pp = 185.79248046875 
1330 / 1452 : pp = 185.69700622558594 
1340 / 1452 : pp = 185.7310028076172 
1350 / 1452 : pp = 185.72613525390625 
1360 / 1452 : pp = 185.76829528808594 
1370 / 1452 : pp = 185.6322021484375 
1380 / 1452 : pp = 185.56378173828125 
1390 / 1452 : pp = 185.4654998779297 
1400 / 1452 : pp = 185.35110473632812 
1410 / 1452 : pp = 185.33917236328125 
1420 / 1452 : pp = 185.2509002685547 
1430 / 1452 : pp = 185.20436096191406 
1440 / 1452 : pp = 185.2254638671875 
1450 / 1452 : pp = 185.16542053222656 

0 / 115 : pp = 242.26800537109375 
10 / 115 : pp = 226.12258911132812 
20 / 115 : pp = 226.4702606201172 
30 / 115 : pp = 223.982666015625 
40 / 115 : pp = 223.376953125 
50 / 115 : pp = 218.65716552734375 
60 / 115 : pp = 217.95306396484375 
70 / 115 : pp = 214.5392303466797 
80 / 115 : pp = 212.07525634765625 
90 / 115 : pp = 209.40631103515625 
100 / 115 : pp = 205.1455078125 
110 / 115 : pp = 203.6289520263672 
Training perplexity: 185.14476013183594
Validation perplexity:203.3822784423828
Total time : 42.47052240371704
Epoch 6

0 / 1452 : pp = 233.56707763671875 
10 / 1452 : pp = 202.6468505859375 
20 / 1452 : pp = 198.2734375 
30 / 1452 : pp = 193.47442626953125 
40 / 1452 : pp = 195.17147827148438 
50 / 1452 : pp = 191.5596923828125 
60 / 1452 : pp = 190.4825897216797 
70 / 1452 : pp = 191.07681274414062 
80 / 1452 : pp = 190.339599609375 
90 / 1452 : pp = 188.98277282714844 
100 / 1452 : pp = 187.74757385253906 
110 / 1452 : pp = 186.10104370117188 
120 / 1452 : pp = 185.7500457763672 
130 / 1452 : pp = 184.90707397460938 
140 / 1452 : pp = 183.340087890625 
150 / 1452 : pp = 182.70840454101562 
160 / 1452 : pp = 183.1043701171875 
170 / 1452 : pp = 182.69776916503906 
180 / 1452 : pp = 181.88400268554688 
190 / 1452 : pp = 181.8062286376953 
200 / 1452 : pp = 182.4969940185547 
210 / 1452 : pp = 182.10572814941406 
220 / 1452 : pp = 181.9981689453125 
230 / 1452 : pp = 182.3802490234375 
240 / 1452 : pp = 182.03636169433594 
250 / 1452 : pp = 181.23712158203125 
260 / 1452 : pp = 180.53726196289062 
270 / 1452 : pp = 179.53567504882812 
280 / 1452 : pp = 179.70208740234375 
290 / 1452 : pp = 179.977783203125 
300 / 1452 : pp = 180.16600036621094 
310 / 1452 : pp = 179.87294006347656 
320 / 1452 : pp = 180.11849975585938 
330 / 1452 : pp = 180.31838989257812 
340 / 1452 : pp = 179.56759643554688 
350 / 1452 : pp = 179.97134399414062 
360 / 1452 : pp = 179.80030822753906 
370 / 1452 : pp = 179.52085876464844 
380 / 1452 : pp = 178.98228454589844 
390 / 1452 : pp = 179.0868682861328 
400 / 1452 : pp = 178.74569702148438 
410 / 1452 : pp = 179.1776580810547 
420 / 1452 : pp = 179.5055389404297 
430 / 1452 : pp = 179.3883056640625 
... (per-step perplexity output for steps 440-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 172.82505798339844 

0 / 115 : pp = 236.35635375976562 
... (validation output for steps 10-100 elided) ...
110 / 115 : pp = 196.97216796875 
Training perplexity: 172.80404663085938
Validation perplexity:196.6871337890625
Total time : 41.52522921562195
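The `pp` values printed above are running perplexities: the exponential of the mean per-token cross-entropy loss over the batches seen so far in the epoch, which is why they fall as training progresses. A minimal sketch of that relationship (the function name `running_perplexity` is my own; the actual code uses `calculate_perplexity` from `utils`):

```python
import numpy as np

def running_perplexity(step_losses):
    """Perplexity = exp of the mean per-token cross-entropy loss.

    `step_losses` holds the average cross-entropy of each minibatch
    processed so far in the current epoch.
    """
    return np.exp(np.mean(step_losses))

# A zero-loss model is perfectly certain, so its perplexity is 1:
assert abs(running_perplexity([0.0, 0.0]) - 1.0) < 1e-9

# As later minibatches come in with lower loss, the running value falls:
early = running_perplexity([5.2, 5.2, 5.2])
later = running_perplexity([5.2, 5.2, 5.2, 5.0, 5.0])
assert later < early
```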
Epoch 7

0 / 1452 : pp = 219.23231506347656 
... (per-step perplexity output for steps 10-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 163.3201904296875 

0 / 115 : pp = 232.2108154296875 
... (validation output for steps 10-100 elided) ...
110 / 115 : pp = 192.41224670410156 
Training perplexity: 163.29916381835938
Validation perplexity:192.09552001953125
Total time : 41.78096055984497
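The 1452 training steps and 115 validation steps per epoch follow directly from `Config`: each `ptb_iterator` step consumes `batch_size * num_steps = 64 * 10 = 640` tokens. Using the commonly cited PTB sizes (roughly 929k training and 73k validation tokens, including `<eos>`; these counts are an assumption, not taken from the post):

```python
# Steps per epoch ~= floor(tokens / (batch_size * num_steps)).
# Token counts below are the standard Mikolov PTB sizes (assumed).
batch_size, num_steps = 64, 10
train_tokens, valid_tokens = 929_589, 73_760

train_steps = train_tokens // (batch_size * num_steps)
valid_steps = valid_tokens // (batch_size * num_steps)
print(train_steps, valid_steps)  # 1452 115
```

This matches the `x / 1452` and `x / 115` counters in the log above.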
Epoch 8

0 / 1452 : pp = 201.77548217773438 
... (per-step perplexity output for steps 10-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 155.62757873535156 

0 / 115 : pp = 228.70111083984375 
... (validation output for steps 10-100 elided) ...
110 / 115 : pp = 189.31134033203125 
Training perplexity: 155.61154174804688
Validation perplexity:188.94537353515625
Total time : 42.13483738899231
Epoch 9

0 / 1452 : pp = 197.80628967285156 
... (per-step perplexity output for steps 10-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 149.06771850585938 

0 / 115 : pp = 227.0559844970703 
... (validation output for steps 10-100 elided) ...
110 / 115 : pp = 187.07528686523438 
Training perplexity: 149.0502471923828
Validation perplexity:186.6911163330078
Total time : 47.274805545806885
Epoch 10

0 / 1452 : pp = 181.8408203125 
... (per-step perplexity output for steps 10-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 143.5869598388672 

0 / 115 : pp = 226.9864959716797 
... (validation output for steps 10-100 elided) ...
110 / 115 : pp = 185.81240844726562 
Training perplexity: 143.57354736328125
Validation perplexity:185.40573120117188
Total time : 46.14846849441528
Epoch 11

0 / 1452 : pp = 181.93162536621094 
... (per-step perplexity output for steps 10-1440, logged every 10 steps, elided) ...
1450 / 1452 : pp = 138.93991088867188 

0 / 115 : pp = 225.55990600585938 
10 / 115 : pp = 207.0504608154297 
20 / 115 : pp = 208.98306274414062 
30 / 115 : pp = 206.28396606445312 
40 / 115 : pp = 205.35386657714844 
50 / 115 : pp = 200.7255401611328 
60 / 115 : pp = 200.0526580810547 
70 / 115 : pp = 196.33087158203125 
80 / 115 : pp = 194.12110900878906 
90 / 115 : pp = 191.52816772460938 
100 / 115 : pp = 186.7974395751953 
110 / 115 : pp = 184.59829711914062 
Training perplexity: 138.9222869873047
Validation perplexity:184.18101501464844
Total time : 43.92928600311279
Epoch 12

0 / 1452 : pp = 173.0251007080078 
10 / 1452 : pp = 152.98446655273438 
20 / 1452 : pp = 150.43128967285156 
30 / 1452 : pp = 147.5819854736328 
40 / 1452 : pp = 149.4164276123047 
50 / 1452 : pp = 146.70816040039062 
60 / 1452 : pp = 145.557861328125 
70 / 1452 : pp = 146.50473022460938 
80 / 1452 : pp = 145.83200073242188 
90 / 1452 : pp = 144.84402465820312 
100 / 1452 : pp = 144.0390167236328 
110 / 1452 : pp = 142.66514587402344 
120 / 1452 : pp = 142.3549346923828 
130 / 1452 : pp = 141.4630126953125 
140 / 1452 : pp = 140.2266082763672 
150 / 1452 : pp = 139.67518615722656 
160 / 1452 : pp = 139.90414428710938 
170 / 1452 : pp = 139.5490264892578 
180 / 1452 : pp = 138.91969299316406 
190 / 1452 : pp = 138.89234924316406 
200 / 1452 : pp = 139.40908813476562 
210 / 1452 : pp = 139.19068908691406 
220 / 1452 : pp = 139.35513305664062 
230 / 1452 : pp = 139.5464324951172 
240 / 1452 : pp = 139.3047637939453 
250 / 1452 : pp = 138.7708740234375 
260 / 1452 : pp = 138.29188537597656 
270 / 1452 : pp = 137.4787139892578 
280 / 1452 : pp = 137.6367950439453 
290 / 1452 : pp = 137.98513793945312 
300 / 1452 : pp = 138.17819213867188 
310 / 1452 : pp = 137.943359375 
320 / 1452 : pp = 138.12060546875 
330 / 1452 : pp = 138.29037475585938 
340 / 1452 : pp = 137.77606201171875 
350 / 1452 : pp = 138.06378173828125 
360 / 1452 : pp = 137.99000549316406 
370 / 1452 : pp = 137.81922912597656 
380 / 1452 : pp = 137.52159118652344 
390 / 1452 : pp = 137.61782836914062 
400 / 1452 : pp = 137.4178924560547 
410 / 1452 : pp = 137.82632446289062 
420 / 1452 : pp = 138.17567443847656 
430 / 1452 : pp = 138.11863708496094 
440 / 1452 : pp = 138.215087890625 
450 / 1452 : pp = 137.9976348876953 
460 / 1452 : pp = 137.6929168701172 
470 / 1452 : pp = 137.25416564941406 
480 / 1452 : pp = 136.75140380859375 
490 / 1452 : pp = 136.51712036132812 
500 / 1452 : pp = 136.0896453857422 
510 / 1452 : pp = 135.97048950195312 
520 / 1452 : pp = 135.7760009765625 
530 / 1452 : pp = 135.50389099121094 
540 / 1452 : pp = 135.01437377929688 
550 / 1452 : pp = 134.7666015625 
560 / 1452 : pp = 134.48973083496094 
570 / 1452 : pp = 134.22853088378906 
580 / 1452 : pp = 133.88455200195312 
590 / 1452 : pp = 133.5808868408203 
600 / 1452 : pp = 133.22975158691406 
610 / 1452 : pp = 132.99591064453125 
620 / 1452 : pp = 132.79502868652344 
630 / 1452 : pp = 132.5094451904297 
640 / 1452 : pp = 132.62892150878906 
650 / 1452 : pp = 132.63499450683594 
660 / 1452 : pp = 132.7379913330078 
670 / 1452 : pp = 132.79046630859375 
680 / 1452 : pp = 132.85842895507812 
690 / 1452 : pp = 132.80364990234375 
700 / 1452 : pp = 132.80477905273438 
710 / 1452 : pp = 132.90170288085938 
720 / 1452 : pp = 132.92971801757812 
730 / 1452 : pp = 132.9019012451172 
740 / 1452 : pp = 133.04811096191406 
750 / 1452 : pp = 133.10877990722656 
760 / 1452 : pp = 133.19189453125 
770 / 1452 : pp = 133.3564910888672 
780 / 1452 : pp = 133.54000854492188 
790 / 1452 : pp = 133.69239807128906 
800 / 1452 : pp = 133.68495178222656 
810 / 1452 : pp = 133.67971801757812 
820 / 1452 : pp = 133.7035675048828 
830 / 1452 : pp = 133.89329528808594 
840 / 1452 : pp = 133.850341796875 
850 / 1452 : pp = 133.90390014648438 
860 / 1452 : pp = 133.9090118408203 
870 / 1452 : pp = 133.89974975585938 
880 / 1452 : pp = 134.0077667236328 
890 / 1452 : pp = 134.03485107421875 
900 / 1452 : pp = 134.0261688232422 
910 / 1452 : pp = 134.10255432128906 
920 / 1452 : pp = 134.17291259765625 
930 / 1452 : pp = 134.14796447753906 
940 / 1452 : pp = 134.20925903320312 
950 / 1452 : pp = 134.19281005859375 
960 / 1452 : pp = 134.17745971679688 
970 / 1452 : pp = 134.18653869628906 
980 / 1452 : pp = 134.03192138671875 
990 / 1452 : pp = 133.94349670410156 
1000 / 1452 : pp = 133.79685974121094 
1010 / 1452 : pp = 133.8438262939453 
1020 / 1452 : pp = 133.9608612060547 
1030 / 1452 : pp = 133.93934631347656 
1040 / 1452 : pp = 134.02833557128906 
1050 / 1452 : pp = 134.01734924316406 
1060 / 1452 : pp = 133.95346069335938 
1070 / 1452 : pp = 134.10205078125 
1080 / 1452 : pp = 134.2030487060547 
1090 / 1452 : pp = 134.23696899414062 
1100 / 1452 : pp = 134.2230224609375 
1110 / 1452 : pp = 134.0829315185547 
1120 / 1452 : pp = 133.980224609375 
1130 / 1452 : pp = 133.83815002441406 
1140 / 1452 : pp = 133.8366241455078 
1150 / 1452 : pp = 133.92108154296875 
1160 / 1452 : pp = 133.94375610351562 
1170 / 1452 : pp = 133.9360809326172 
1180 / 1452 : pp = 133.99684143066406 
1190 / 1452 : pp = 134.0944366455078 
1200 / 1452 : pp = 134.11676025390625 
1210 / 1452 : pp = 134.0911102294922 
1220 / 1452 : pp = 134.22763061523438 
1230 / 1452 : pp = 134.38043212890625 
1240 / 1452 : pp = 134.39817810058594 
1250 / 1452 : pp = 134.5367431640625 
1260 / 1452 : pp = 134.593017578125 
1270 / 1452 : pp = 134.61497497558594 
1280 / 1452 : pp = 134.6423797607422 
1290 / 1452 : pp = 134.64340209960938 
1300 / 1452 : pp = 134.68026733398438 
1310 / 1452 : pp = 134.73556518554688 
1320 / 1452 : pp = 134.69021606445312 
1330 / 1452 : pp = 134.66131591796875 
1340 / 1452 : pp = 134.69393920898438 
1350 / 1452 : pp = 134.7328643798828 
1360 / 1452 : pp = 134.79405212402344 
1370 / 1452 : pp = 134.71237182617188 
1380 / 1452 : pp = 134.6885528564453 
1390 / 1452 : pp = 134.65110778808594 
1400 / 1452 : pp = 134.59584045410156 
1410 / 1452 : pp = 134.6193389892578 
1420 / 1452 : pp = 134.58338928222656 
1430 / 1452 : pp = 134.559326171875 
1440 / 1452 : pp = 134.59507751464844 
1450 / 1452 : pp = 134.59365844726562 

0 / 115 : pp = 226.0741729736328 
10 / 115 : pp = 207.00494384765625 
20 / 115 : pp = 209.26976013183594 
30 / 115 : pp = 206.44662475585938 
40 / 115 : pp = 205.47268676757812 
50 / 115 : pp = 200.7876739501953 
60 / 115 : pp = 200.13414001464844 
70 / 115 : pp = 196.35549926757812 
80 / 115 : pp = 194.10777282714844 
90 / 115 : pp = 191.47467041015625 
100 / 115 : pp = 186.61351013183594 
110 / 115 : pp = 184.30374145507812 
Training perplexity: 134.57826232910156
Validation perplexity:183.8900146484375
Total time : 45.410256147384644
Epoch 13

0 / 1452 : pp = 169.39393615722656 
10 / 1452 : pp = 150.13232421875 
20 / 1452 : pp = 147.60450744628906 
30 / 1452 : pp = 144.64317321777344 
40 / 1452 : pp = 146.47427368164062 
50 / 1452 : pp = 143.929443359375 
60 / 1452 : pp = 142.8344268798828 
70 / 1452 : pp = 143.45248413085938 
80 / 1452 : pp = 142.5418701171875 
90 / 1452 : pp = 141.6178436279297 
100 / 1452 : pp = 140.70127868652344 
110 / 1452 : pp = 139.2852325439453 
120 / 1452 : pp = 138.8017120361328 
130 / 1452 : pp = 137.85629272460938 
140 / 1452 : pp = 136.51718139648438 
150 / 1452 : pp = 136.03619384765625 
160 / 1452 : pp = 136.154296875 
170 / 1452 : pp = 135.67037963867188 
180 / 1452 : pp = 135.0376739501953 
190 / 1452 : pp = 134.9230499267578 
200 / 1452 : pp = 135.4241180419922 
210 / 1452 : pp = 135.24581909179688 
220 / 1452 : pp = 135.37957763671875 
230 / 1452 : pp = 135.67652893066406 
240 / 1452 : pp = 135.4161834716797 
250 / 1452 : pp = 134.90895080566406 
260 / 1452 : pp = 134.46754455566406 
270 / 1452 : pp = 133.68577575683594 
280 / 1452 : pp = 133.86770629882812 
290 / 1452 : pp = 134.18475341796875 
300 / 1452 : pp = 134.39132690429688 
310 / 1452 : pp = 134.19985961914062 
320 / 1452 : pp = 134.37998962402344 
330 / 1452 : pp = 134.5557403564453 
340 / 1452 : pp = 134.00686645507812 
350 / 1452 : pp = 134.27749633789062 
360 / 1452 : pp = 134.20286560058594 
370 / 1452 : pp = 134.042724609375 
380 / 1452 : pp = 133.74398803710938 
390 / 1452 : pp = 133.83584594726562 
400 / 1452 : pp = 133.64382934570312 
410 / 1452 : pp = 134.02366638183594 
420 / 1452 : pp = 134.35415649414062 
430 / 1452 : pp = 134.310546875 
440 / 1452 : pp = 134.3634490966797 
450 / 1452 : pp = 134.15602111816406 
460 / 1452 : pp = 133.86578369140625 
470 / 1452 : pp = 133.43414306640625 
480 / 1452 : pp = 132.90310668945312 
490 / 1452 : pp = 132.646240234375 
500 / 1452 : pp = 132.1982421875 
510 / 1452 : pp = 132.04200744628906 
520 / 1452 : pp = 131.86940002441406 
530 / 1452 : pp = 131.59841918945312 
540 / 1452 : pp = 131.12356567382812 
550 / 1452 : pp = 130.887939453125 
560 / 1452 : pp = 130.6210174560547 
570 / 1452 : pp = 130.37826538085938 
580 / 1452 : pp = 130.0374755859375 
590 / 1452 : pp = 129.75979614257812 
600 / 1452 : pp = 129.38308715820312 
610 / 1452 : pp = 129.16685485839844 
620 / 1452 : pp = 129.0115509033203 
630 / 1452 : pp = 128.75152587890625 
640 / 1452 : pp = 128.87295532226562 
650 / 1452 : pp = 128.88734436035156 
660 / 1452 : pp = 128.98275756835938 
670 / 1452 : pp = 129.0487060546875 
680 / 1452 : pp = 129.11013793945312 
690 / 1452 : pp = 129.0646514892578 
700 / 1452 : pp = 129.06280517578125 
710 / 1452 : pp = 129.1343994140625 
720 / 1452 : pp = 129.18582153320312 
730 / 1452 : pp = 129.15138244628906 
740 / 1452 : pp = 129.29811096191406 
750 / 1452 : pp = 129.339599609375 
760 / 1452 : pp = 129.4257354736328 
770 / 1452 : pp = 129.61631774902344 
780 / 1452 : pp = 129.802734375 
790 / 1452 : pp = 129.96804809570312 
800 / 1452 : pp = 129.95187377929688 
810 / 1452 : pp = 129.92417907714844 
820 / 1452 : pp = 129.9774627685547 
830 / 1452 : pp = 130.1638946533203 
840 / 1452 : pp = 130.13095092773438 
850 / 1452 : pp = 130.16595458984375 
860 / 1452 : pp = 130.173828125 
870 / 1452 : pp = 130.170166015625 
880 / 1452 : pp = 130.27032470703125 
890 / 1452 : pp = 130.3022003173828 
900 / 1452 : pp = 130.3071746826172 
910 / 1452 : pp = 130.37939453125 
920 / 1452 : pp = 130.46229553222656 
930 / 1452 : pp = 130.43846130371094 
940 / 1452 : pp = 130.50889587402344 
950 / 1452 : pp = 130.50086975097656 
960 / 1452 : pp = 130.4833221435547 
970 / 1452 : pp = 130.50814819335938 
980 / 1452 : pp = 130.35577392578125 
990 / 1452 : pp = 130.26759338378906 
1000 / 1452 : pp = 130.1064453125 
1010 / 1452 : pp = 130.1472625732422 
1020 / 1452 : pp = 130.27169799804688 
1030 / 1452 : pp = 130.25100708007812 
1040 / 1452 : pp = 130.30816650390625 
1050 / 1452 : pp = 130.29803466796875 
1060 / 1452 : pp = 130.2242431640625 
1070 / 1452 : pp = 130.35906982421875 
1080 / 1452 : pp = 130.45103454589844 
1090 / 1452 : pp = 130.49838256835938 
1100 / 1452 : pp = 130.484130859375 
1110 / 1452 : pp = 130.35316467285156 
1120 / 1452 : pp = 130.24697875976562 
1130 / 1452 : pp = 130.10804748535156 
1140 / 1452 : pp = 130.1076202392578 
1150 / 1452 : pp = 130.195068359375 
1160 / 1452 : pp = 130.19674682617188 
1170 / 1452 : pp = 130.18321228027344 
1180 / 1452 : pp = 130.24623107910156 
1190 / 1452 : pp = 130.33905029296875 
1200 / 1452 : pp = 130.3650360107422 
1210 / 1452 : pp = 130.34588623046875 
1220 / 1452 : pp = 130.4850616455078 
1230 / 1452 : pp = 130.63160705566406 
1240 / 1452 : pp = 130.64674377441406 
1250 / 1452 : pp = 130.77078247070312 
1260 / 1452 : pp = 130.8397674560547 
1270 / 1452 : pp = 130.8511199951172 
1280 / 1452 : pp = 130.88967895507812 
1290 / 1452 : pp = 130.9040985107422 
1300 / 1452 : pp = 130.93511962890625 
1310 / 1452 : pp = 130.9759063720703 
1320 / 1452 : pp = 130.92800903320312 
1330 / 1452 : pp = 130.9105224609375 
1340 / 1452 : pp = 130.929443359375 
1350 / 1452 : pp = 130.96153259277344 
1360 / 1452 : pp = 131.02381896972656 
1370 / 1452 : pp = 130.9545440673828 
1380 / 1452 : pp = 130.9344940185547 
1390 / 1452 : pp = 130.9055938720703 
1400 / 1452 : pp = 130.85386657714844 
1410 / 1452 : pp = 130.8874969482422 
1420 / 1452 : pp = 130.85928344726562 
1430 / 1452 : pp = 130.83995056152344 
1440 / 1452 : pp = 130.86659240722656 
1450 / 1452 : pp = 130.86839294433594 

0 / 115 : pp = 227.78428649902344 
10 / 115 : pp = 207.609619140625 
20 / 115 : pp = 209.92459106445312 
30 / 115 : pp = 206.96240234375 
40 / 115 : pp = 205.9295654296875 
50 / 115 : pp = 201.0296630859375 
60 / 115 : pp = 200.38059997558594 
70 / 115 : pp = 196.55764770507812 
80 / 115 : pp = 194.31735229492188 
90 / 115 : pp = 191.66146850585938 
100 / 115 : pp = 186.70437622070312 
110 / 115 : pp = 184.3171844482422 
Training perplexity: 130.85043334960938
Validation perplexity:183.88186645507812
Total time : 45.345656394958496
Epoch 14

0 / 1452 : pp = 164.82191467285156 
10 / 1452 : pp = 146.39089965820312 
20 / 1452 : pp = 142.93240356445312 
30 / 1452 : pp = 140.3113555908203 
40 / 1452 : pp = 142.39939880371094 
50 / 1452 : pp = 139.70162963867188 
60 / 1452 : pp = 138.73023986816406 
70 / 1452 : pp = 139.2675018310547 
80 / 1452 : pp = 138.47824096679688 
90 / 1452 : pp = 137.40432739257812 
100 / 1452 : pp = 136.47793579101562 
110 / 1452 : pp = 135.2294464111328 
120 / 1452 : pp = 134.80728149414062 
130 / 1452 : pp = 133.89822387695312 
140 / 1452 : pp = 132.54141235351562 
150 / 1452 : pp = 132.10025024414062 
160 / 1452 : pp = 132.21829223632812 
170 / 1452 : pp = 131.8765106201172 
180 / 1452 : pp = 131.37515258789062 
190 / 1452 : pp = 131.31622314453125 
200 / 1452 : pp = 131.78297424316406 
210 / 1452 : pp = 131.5507354736328 
220 / 1452 : pp = 131.7002410888672 
230 / 1452 : pp = 131.9277801513672 
240 / 1452 : pp = 131.72166442871094 
250 / 1452 : pp = 131.225830078125 
260 / 1452 : pp = 130.7496337890625 
270 / 1452 : pp = 129.9896697998047 
280 / 1452 : pp = 130.10594177246094 
290 / 1452 : pp = 130.41644287109375 
300 / 1452 : pp = 130.5982208251953 
310 / 1452 : pp = 130.36329650878906 
320 / 1452 : pp = 130.5633544921875 
330 / 1452 : pp = 130.77252197265625 
340 / 1452 : pp = 130.273193359375 
350 / 1452 : pp = 130.47889709472656 
360 / 1452 : pp = 130.4348602294922 
370 / 1452 : pp = 130.28126525878906 
380 / 1452 : pp = 130.02786254882812 
390 / 1452 : pp = 130.1564483642578 
400 / 1452 : pp = 129.98440551757812 
410 / 1452 : pp = 130.37721252441406 
420 / 1452 : pp = 130.71859741210938 
430 / 1452 : pp = 130.65939331054688 
440 / 1452 : pp = 130.72987365722656 
450 / 1452 : pp = 130.56272888183594 
460 / 1452 : pp = 130.28195190429688 
470 / 1452 : pp = 129.90936279296875 
480 / 1452 : pp = 129.42857360839844 
490 / 1452 : pp = 129.18077087402344 
500 / 1452 : pp = 128.7588348388672 
510 / 1452 : pp = 128.6303253173828 
520 / 1452 : pp = 128.47616577148438 
530 / 1452 : pp = 128.21148681640625 
540 / 1452 : pp = 127.7218017578125 
550 / 1452 : pp = 127.50067138671875 
560 / 1452 : pp = 127.27574157714844 
570 / 1452 : pp = 127.05399322509766 
580 / 1452 : pp = 126.73983001708984 
590 / 1452 : pp = 126.43692779541016 
600 / 1452 : pp = 126.06050109863281 
610 / 1452 : pp = 125.82952880859375 
620 / 1452 : pp = 125.66295623779297 
630 / 1452 : pp = 125.39354705810547 
640 / 1452 : pp = 125.49463653564453 
650 / 1452 : pp = 125.48816680908203 
660 / 1452 : pp = 125.58712005615234 
670 / 1452 : pp = 125.65978240966797 
680 / 1452 : pp = 125.71456146240234 
690 / 1452 : pp = 125.66937255859375 
700 / 1452 : pp = 125.65900421142578 
710 / 1452 : pp = 125.7271499633789 
720 / 1452 : pp = 125.77758026123047 
730 / 1452 : pp = 125.74129486083984 
740 / 1452 : pp = 125.8759765625 
750 / 1452 : pp = 125.91793823242188 
760 / 1452 : pp = 125.99595642089844 
770 / 1452 : pp = 126.18113708496094 
780 / 1452 : pp = 126.35147094726562 
790 / 1452 : pp = 126.50797271728516 
800 / 1452 : pp = 126.49759674072266 
810 / 1452 : pp = 126.48113250732422 
820 / 1452 : pp = 126.52528381347656 
830 / 1452 : pp = 126.705810546875 
840 / 1452 : pp = 126.67517852783203 
850 / 1452 : pp = 126.74176025390625 
860 / 1452 : pp = 126.74151611328125 
870 / 1452 : pp = 126.73414611816406 
880 / 1452 : pp = 126.83026885986328 
890 / 1452 : pp = 126.88519287109375 
900 / 1452 : pp = 126.88053894042969 
910 / 1452 : pp = 126.97138214111328 
920 / 1452 : pp = 127.04660034179688 
930 / 1452 : pp = 127.03763580322266 
940 / 1452 : pp = 127.1126480102539 
950 / 1452 : pp = 127.09610748291016 
960 / 1452 : pp = 127.0873794555664 
970 / 1452 : pp = 127.10343933105469 
980 / 1452 : pp = 126.96441650390625 
990 / 1452 : pp = 126.88519287109375 
1000 / 1452 : pp = 126.7336654663086 
1010 / 1452 : pp = 126.77796936035156 
1020 / 1452 : pp = 126.89826202392578 
1030 / 1452 : pp = 126.88761138916016 
1040 / 1452 : pp = 126.95309448242188 
1050 / 1452 : pp = 126.96478271484375 
1060 / 1452 : pp = 126.89324188232422 
1070 / 1452 : pp = 127.03242492675781 
1080 / 1452 : pp = 127.13228607177734 
1090 / 1452 : pp = 127.173095703125 
1100 / 1452 : pp = 127.15975189208984 
1110 / 1452 : pp = 127.0392074584961 
1120 / 1452 : pp = 126.94032287597656 
1130 / 1452 : pp = 126.80693054199219 
1140 / 1452 : pp = 126.81315612792969 
1150 / 1452 : pp = 126.90467834472656 
1160 / 1452 : pp = 126.91236114501953 
1170 / 1452 : pp = 126.90897369384766 
1180 / 1452 : pp = 126.98052215576172 
1190 / 1452 : pp = 127.07483673095703 
1200 / 1452 : pp = 127.10216522216797 
1210 / 1452 : pp = 127.08258819580078 
1220 / 1452 : pp = 127.22943878173828 
1230 / 1452 : pp = 127.38563537597656 
1240 / 1452 : pp = 127.40538024902344 
1250 / 1452 : pp = 127.53369140625 
1260 / 1452 : pp = 127.59293365478516 
1270 / 1452 : pp = 127.61489868164062 
1280 / 1452 : pp = 127.6484375 
1290 / 1452 : pp = 127.65257263183594 
1300 / 1452 : pp = 127.69329833984375 
1310 / 1452 : pp = 127.74549102783203 
1320 / 1452 : pp = 127.7043228149414 
1330 / 1452 : pp = 127.6866683959961 
1340 / 1452 : pp = 127.70913696289062 
1350 / 1452 : pp = 127.73233795166016 
1360 / 1452 : pp = 127.7855224609375 
1370 / 1452 : pp = 127.71918487548828 
1380 / 1452 : pp = 127.69987487792969 
1390 / 1452 : pp = 127.6697998046875 
1400 / 1452 : pp = 127.61137390136719 
1410 / 1452 : pp = 127.6404037475586 
1420 / 1452 : pp = 127.61094665527344 
1430 / 1452 : pp = 127.58216857910156 
1440 / 1452 : pp = 127.61477661132812 
1450 / 1452 : pp = 127.61964416503906 

0 / 115 : pp = 228.21578979492188 
10 / 115 : pp = 208.11244201660156 
20 / 115 : pp = 210.688232421875 
30 / 115 : pp = 207.62408447265625 
40 / 115 : pp = 206.45184326171875 
50 / 115 : pp = 201.52760314941406 
60 / 115 : pp = 200.7784881591797 
70 / 115 : pp = 196.83067321777344 
80 / 115 : pp = 194.6357879638672 
90 / 115 : pp = 191.9783935546875 
100 / 115 : pp = 186.8787841796875 
110 / 115 : pp = 184.35252380371094 
Training perplexity: 127.60413360595703
Validation perplexity:183.8877410888672
Total time : 41.6636528968811
Epoch 15

0 / 1452 : pp = 156.81654357910156 
10 / 1452 : pp = 142.1070556640625 
20 / 1452 : pp = 139.55076599121094 
30 / 1452 : pp = 136.63551330566406 
40 / 1452 : pp = 138.5840606689453 
50 / 1452 : pp = 136.052734375 
60 / 1452 : pp = 134.93019104003906 
70 / 1452 : pp = 135.65206909179688 
80 / 1452 : pp = 135.2620086669922 
90 / 1452 : pp = 134.314697265625 
100 / 1452 : pp = 133.4916229248047 
110 / 1452 : pp = 132.26052856445312 
120 / 1452 : pp = 131.7714080810547 
130 / 1452 : pp = 130.77365112304688 
140 / 1452 : pp = 129.5411834716797 
150 / 1452 : pp = 129.0791778564453 
160 / 1452 : pp = 129.21920776367188 
170 / 1452 : pp = 128.7528839111328 
180 / 1452 : pp = 128.22279357910156 
190 / 1452 : pp = 128.18177795410156 
200 / 1452 : pp = 128.58758544921875 
210 / 1452 : pp = 128.3906707763672 
220 / 1452 : pp = 128.5266571044922 
230 / 1452 : pp = 128.80563354492188 
240 / 1452 : pp = 128.61886596679688 
250 / 1452 : pp = 128.13172912597656 
260 / 1452 : pp = 127.69220733642578 
270 / 1452 : pp = 126.96150970458984 
280 / 1452 : pp = 127.04702758789062 
290 / 1452 : pp = 127.33565521240234 
300 / 1452 : pp = 127.55929565429688 
310 / 1452 : pp = 127.38514709472656 
320 / 1452 : pp = 127.52171325683594 
330 / 1452 : pp = 127.68690490722656 
340 / 1452 : pp = 127.18340301513672 
350 / 1452 : pp = 127.4073257446289 
360 / 1452 : pp = 127.30432891845703 
370 / 1452 : pp = 127.17618560791016 
380 / 1452 : pp = 126.92579650878906 
390 / 1452 : pp = 127.02473449707031 
400 / 1452 : pp = 126.8515625 
410 / 1452 : pp = 127.211669921875 
420 / 1452 : pp = 127.51788330078125 
430 / 1452 : pp = 127.47386169433594 
440 / 1452 : pp = 127.57164001464844 
450 / 1452 : pp = 127.3601303100586 
460 / 1452 : pp = 127.09434509277344 
470 / 1452 : pp = 126.71922302246094 
480 / 1452 : pp = 126.24349212646484 
490 / 1452 : pp = 125.98778533935547 
500 / 1452 : pp = 125.59526824951172 
510 / 1452 : pp = 125.4450912475586 
520 / 1452 : pp = 125.29247283935547 
530 / 1452 : pp = 125.03536224365234 
540 / 1452 : pp = 124.5813980102539 
550 / 1452 : pp = 124.33724212646484 
560 / 1452 : pp = 124.08995819091797 
570 / 1452 : pp = 123.86637878417969 
580 / 1452 : pp = 123.53152465820312 
590 / 1452 : pp = 123.20321655273438 
600 / 1452 : pp = 122.85673522949219 
610 / 1452 : pp = 122.64250946044922 
620 / 1452 : pp = 122.4958724975586 
630 / 1452 : pp = 122.22386169433594 
640 / 1452 : pp = 122.31143188476562 
650 / 1452 : pp = 122.30093383789062 
660 / 1452 : pp = 122.39427947998047 
670 / 1452 : pp = 122.45440673828125 
680 / 1452 : pp = 122.51146697998047 
690 / 1452 : pp = 122.4854736328125 
700 / 1452 : pp = 122.48600006103516 
710 / 1452 : pp = 122.56084442138672 
720 / 1452 : pp = 122.59059143066406 
730 / 1452 : pp = 122.55529022216797 
740 / 1452 : pp = 122.69409942626953 
750 / 1452 : pp = 122.76456451416016 
760 / 1452 : pp = 122.84437561035156 
770 / 1452 : pp = 123.02527618408203 
780 / 1452 : pp = 123.20509338378906 
790 / 1452 : pp = 123.36305236816406 
800 / 1452 : pp = 123.36852264404297 
810 / 1452 : pp = 123.36799621582031 
820 / 1452 : pp = 123.39976501464844 
830 / 1452 : pp = 123.59362030029297 
840 / 1452 : pp = 123.56946563720703 
850 / 1452 : pp = 123.63800811767578 
860 / 1452 : pp = 123.63983917236328 
870 / 1452 : pp = 123.64148712158203 
880 / 1452 : pp = 123.7568588256836 
890 / 1452 : pp = 123.7885513305664 
900 / 1452 : pp = 123.79640197753906 
910 / 1452 : pp = 123.86153411865234 
920 / 1452 : pp = 123.92941284179688 
930 / 1452 : pp = 123.9125747680664 
940 / 1452 : pp = 123.95559692382812 
950 / 1452 : pp = 123.93928527832031 
960 / 1452 : pp = 123.94294738769531 
970 / 1452 : pp = 123.95547485351562 
980 / 1452 : pp = 123.8229751586914 
990 / 1452 : pp = 123.73727416992188 
1000 / 1452 : pp = 123.59091186523438 
1010 / 1452 : pp = 123.634765625 
1020 / 1452 : pp = 123.76506042480469 
1030 / 1452 : pp = 123.75485229492188 
1040 / 1452 : pp = 123.807861328125 
1050 / 1452 : pp = 123.79156494140625 
1060 / 1452 : pp = 123.73054504394531 
1070 / 1452 : pp = 123.8615951538086 
1080 / 1452 : pp = 123.96564483642578 
1090 / 1452 : pp = 124.02104187011719 
1100 / 1452 : pp = 124.012939453125 
1110 / 1452 : pp = 123.87582397460938 
1120 / 1452 : pp = 123.775390625 
1130 / 1452 : pp = 123.63182067871094 
1140 / 1452 : pp = 123.62391662597656 
1150 / 1452 : pp = 123.71013641357422 
1160 / 1452 : pp = 123.72423553466797 
1170 / 1452 : pp = 123.71726989746094 
1180 / 1452 : pp = 123.79032897949219 
1190 / 1452 : pp = 123.87883758544922 
1200 / 1452 : pp = 123.9125747680664 
1210 / 1452 : pp = 123.90140533447266 
1220 / 1452 : pp = 124.03245544433594 
1230 / 1452 : pp = 124.19799041748047 
1240 / 1452 : pp = 124.21469116210938 
1250 / 1452 : pp = 124.34103393554688 
1260 / 1452 : pp = 124.4041976928711 
1270 / 1452 : pp = 124.42852020263672 
1280 / 1452 : pp = 124.46656036376953 
1290 / 1452 : pp = 124.4811019897461 
1300 / 1452 : pp = 124.52384185791016 
1310 / 1452 : pp = 124.57533264160156 
1320 / 1452 : pp = 124.5398178100586 
1330 / 1452 : pp = 124.52598571777344 
1340 / 1452 : pp = 124.53311157226562 
1350 / 1452 : pp = 124.57759094238281 
1360 / 1452 : pp = 124.63385772705078 
1370 / 1452 : pp = 124.58133697509766 
1380 / 1452 : pp = 124.55769348144531 
1390 / 1452 : pp = 124.54011535644531 
1400 / 1452 : pp = 124.4884033203125 
1410 / 1452 : pp = 124.51226806640625 
1420 / 1452 : pp = 124.49683380126953 
1430 / 1452 : pp = 124.4754638671875 
1440 / 1452 : pp = 124.50164031982422 
1450 / 1452 : pp = 124.50894165039062 

0 / 115 : pp = 230.8488006591797 
10 / 115 : pp = 209.2509002685547 
20 / 115 : pp = 211.68577575683594 
30 / 115 : pp = 208.44056701660156 
40 / 115 : pp = 207.2039337158203 
50 / 115 : pp = 202.1859588623047 
60 / 115 : pp = 201.34739685058594 
70 / 115 : pp = 197.4251251220703 
80 / 115 : pp = 195.2623291015625 
90 / 115 : pp = 192.592529296875 
100 / 115 : pp = 187.39553833007812 
110 / 115 : pp = 184.791259765625 
Training perplexity: 124.4933853149414
Validation perplexity:184.32510375976562
Total time : 40.856229066848755

0 / 128 : pp = 184.6475067138672 
10 / 128 : pp = 176.8856964111328 
20 / 128 : pp = 164.3444366455078 
30 / 128 : pp = 167.85472106933594 
40 / 128 : pp = 169.25367736816406 
50 / 128 : pp = 168.86561584472656 
60 / 128 : pp = 168.11801147460938 
70 / 128 : pp = 165.4105224609375 
80 / 128 : pp = 162.91146850585938 
90 / 128 : pp = 161.29742431640625 
100 / 128 : pp = 162.45989990234375 
110 / 128 : pp = 162.6834716796875 
120 / 128 : pp = 164.3359832763672 
=-==-==-==-==-=
Test perplexity: 164.0149383544922 
=-==-==-==-==-=

For more details, please refer to the link below:

https://github.com/weizhenzhao/cs224d_nlp_problem_set2

 

Today's post again covers cs224d, the third part of problem set 2.

It turns out coursework at universities abroad really is this demanding; by comparison, I'll just quietly keep laying bricks back home.

The math behind RNN language modeling goes as follows:

 

Given a sequence of consecutive words $x_1, x_2, \dots, x_t$, the model predicts the word $x_{t+1}$ that follows as:

$P(x_{t+1}=v_j \mid x_t,\dots,x_1)=\hat{y}^{(t)}_{j}$

where $v_j$ is a word in the vocabulary. We implement a recurrent neural network that models the "history" $x_1, x_2, \dots, x_t$ through feedback in the hidden layer:

$e^{(t)} = x^{(t)}L$

$h^{(t)} = \sigma\left(h^{(t-1)}H + e^{(t)}I + b_1\right)$

$\hat{y}^{(t)} = \mathrm{softmax}\left(h^{(t)}U + b_2\right)$

$h^{(0)}=h_{0}\in\mathbb{R}^{D_{h}}$ is the initial hidden-layer vector

$x^{(t)}L$ is the product of the one-hot row vector $x^{(t)}$ and the embedding matrix $L$;

this one-hot row vector is simply the index of the word currently being processed.

$L\in\mathbb{R}^{|V|\times d}$ is the word-embedding matrix

$I\in\mathbb{R}^{d\times D_{h}}$ is the input word-representation matrix

$H\in\mathbb{R}^{D_{h}\times D_{h}}$ is the hidden transition matrix

$U\in\mathbb{R}^{D_{h}\times|V|}$ is the output word-representation matrix

$b_{1}\in\mathbb{R}^{D_{h}}$ and $b_{2}\in\mathbb{R}^{|V|}$ are the bias terms

$d$ is the word-embedding dimension

$|V|$ is the size of the vocabulary

$D_{h}$ is the dimension of the hidden layer

The output vector $\hat{y}^{(t)}\in\mathbb{R}^{|V|}$ is a probability distribution over the whole vocabulary, and we minimize the (unregularized) cross-entropy loss:

$J^{(t)}(\theta) = -\sum_{j=1}^{|V|} y_{j}^{(t)}\log \hat{y}_{j}^{(t)}$

We evaluate the language model with perplexity, defined as:

$PP^{(t)}\left(y^{(t)},\hat{y}^{(t)}\right) = \dfrac{1}{\sum_{j=1}^{|V|} y_{j}^{(t)}\,\hat{y}_{j}^{(t)}}$

i.e. the reciprocal of the probability the model assigned to the correct next word; over a whole corpus it is reported as the exponential of the average cross-entropy.
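In practice, corpus-level perplexity is computed as the exponential of the average cross-entropy over all predicted words. A minimal numpy sketch (the helper name `perplexity` is ours, not the assignment's `calculate_perplexity`):

```python
import numpy as np

def perplexity(probs_of_targets):
    """exp of the mean negative log-probability the model
    assigned to each actual next word in the corpus."""
    nll = -np.log(np.asarray(probs_of_targets, dtype=np.float64))
    return float(np.exp(nll.mean()))

# A model that always gives the correct word probability 1/N
# has perplexity N, e.g. probability 0.01 everywhere gives pp = 100:
pp = perplexity([0.01] * 5)
assert abs(pp - 100.0) < 1e-9
```

This is why the logged `pp` values above hover near the vocabulary-size baseline early in training and fall as the model improves.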

Gradients:

Each optimization step needs the gradients of the loss with respect to every trainable variable ($U$, $H$, $I$, $b_1$, $b_2$, and the looked-up rows of $L$). For example, for the output matrix,

$\dfrac{\partial J^{(t)}}{\partial U} = h^{(t)\top}\left(\hat{y}^{(t)} - y^{(t)}\right)$

and the remaining gradients follow by backpropagation through time.
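For the output matrix $U$, the single-step gradient is $h^{(t)\top}(\hat{y}^{(t)} - y^{(t)})$, which can be verified with a throwaway finite-difference check (toy sizes and our own variable names; a sketch, not the assignment's code):

```python
import numpy as np

rng = np.random.default_rng(1)
Dh, V = 4, 6                            # toy hidden dim and vocab size
h = rng.normal(size=(1, Dh))            # hidden state h^{(t)} as a row vector
U = rng.normal(size=(Dh, V))
b2 = np.zeros((1, V))
y = np.zeros((1, V)); y[0, 2] = 1.0     # one-hot target word

def loss(U):
    """Cross-entropy of softmax(h U + b2) against the one-hot target."""
    z = h @ U + b2
    y_hat = np.exp(z - z.max()); y_hat = y_hat / y_hat.sum()
    return float(-(y * np.log(y_hat)).sum())

# analytic gradient: h^T (y_hat - y), same shape as U
z = h @ U + b2
y_hat = np.exp(z - z.max()); y_hat = y_hat / y_hat.sum()
grad = h.T @ (y_hat - y)

# finite-difference check on one entry of U
eps = 1e-6
U_plus = U.copy(); U_plus[0, 0] += eps
numeric = (loss(U_plus) - loss(U)) / eps
assert abs(numeric - grad[0, 0]) < 1e-4
```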

Training then proceeds as follows:

Initialize all of the trainable parameters listed above.

For each training word, compute the derivative of the loss with respect to every parameter using the formulas above.

Update the parameters with gradient descent.

Plug the updated parameters back into the model; if the loss drops below the preset threshold, stop iterating, otherwise continue.
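The RNNLM forward step ($e^{(t)}=x^{(t)}L$, $h^{(t)}=\sigma(h^{(t-1)}H+e^{(t)}I+b_1)$, $\hat{y}^{(t)}=\mathrm{softmax}(h^{(t)}U+b_2)$) fits in a few lines of numpy; a toy sketch with random weights, not the trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
V, d, Dh = 10, 4, 5                  # toy |V|, embedding dim, hidden dim
L = rng.normal(size=(V, d))          # word-embedding matrix
I = rng.normal(size=(d, Dh))         # input word-representation matrix
H = rng.normal(size=(Dh, Dh))        # hidden transition matrix
U = rng.normal(size=(Dh, V))         # output word-representation matrix
b1, b2 = np.zeros(Dh), np.zeros(V)

h = np.zeros(Dh)                     # h^{(0)}
for x_t in [3, 1, 7]:                # word indices x^{(1)}..x^{(3)}
    e_t = L[x_t]                     # x^{(t)} L: one-hot times L is a row lookup
    h = sigmoid(h @ H + e_t @ I + b1)      # h^{(t)}
    y_hat = softmax(h @ U + b2)            # distribution over the next word

assert y_hat.shape == (V,) and np.isclose(y_hat.sum(), 1.0)
```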

 


Below is a structure diagram of the RNNLM.

 

The figure above shows the structure of a second-layer RNN cell.

The figure above shows dropout applied to the RNN's variables to reduce overfitting error; this is the dropout structure of the first RNN layer.

The figure above shows the structure of the first RNN layer.

(Heads up: a big wall of code is coming.)
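Before the listing, one shape trick worth previewing: `tf.nn.embedding_lookup` followed by `tf.split`/`tf.squeeze` just turns a `(batch_size, num_steps)` matrix of word ids into a per-step list of `(batch_size, embed_size)` slices. The same thing in plain numpy (toy values):

```python
import numpy as np

batch_size, num_steps, embed_size = 2, 3, 4
L = np.arange(10 * embed_size, dtype=np.float32).reshape(10, embed_size)
ids = np.array([[1, 2, 3],
                [4, 5, 6]])                  # (batch_size, num_steps) word ids

looked_up = L[ids]                           # (batch_size, num_steps, embed_size)
# split along the time axis and drop the size-1 middle dimension:
inputs = [looked_up[:, t, :] for t in range(num_steps)]

assert len(inputs) == num_steps
assert inputs[0].shape == (batch_size, embed_size)
assert (inputs[0][1] == L[4]).all()          # step 0 of batch item 1 is word 4
```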

'''
Created on 2017-09-26

@author: weizhen
'''
import getpass
import sys
import time
import numpy as np
from copy import deepcopy
from utils import calculate_perplexity, get_ptb_dataset, Vocab
from utils import ptb_iterator, sample
import tensorflow as tf
from model import LanguageModel
from tensorflow.contrib.legacy_seq2seq.python.ops.seq2seq import sequence_loss


class Config(object):
    """储存超参数和数据信息"""
    batch_size = 64
    embed_size = 50
    hidden_size = 100
    num_steps = 10
    max_epochs = 16
    early_stopping = 2
    dropout = 0.9
    lr = 0.001


class RNNLM_Model(LanguageModel):
    def load_data(self, debug=False):
        """加载词向量并且训练   train/dev/test 数据"""
        self.vocab = Vocab()
        self.vocab.construct(get_ptb_dataset('train'))
        self.encoded_train = np.array([self.vocab.encode(word) for word in get_ptb_dataset('train')], dtype=np.int32)
        self.encoded_valid = np.array([self.vocab.encode(word) for word in get_ptb_dataset('valid')], dtype=np.int32)
        self.encoded_test = np.array([self.vocab.encode(word) for word in get_ptb_dataset('test')], dtype=np.int32)
        if debug:
            num_debug = 1024
            self.encoded_train = self.encoded_train[:num_debug]
            self.encoded_valid = self.encoded_valid[:num_debug]
            self.encoded_test = self.encoded_test[:num_debug]

    def add_placeholders(self):
        """生成placeholder 变量来表示输入的 tensors
            这些placeholder 被用来在模型的其他地方被填充
                            并且在训练的过程中会被填充
            input_placeholder:Input placeholder shape (None,num_steps),type  tf.int32
            labels_placeholder:label placeholder shape (None,num_steps) type tf.float32
            dropout_placeholder:dropput value placeholder (scalar), type tf.float32
        """
        self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Input')
        self.labels_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Target')
        self.dropout_placeholder = tf.placeholder(tf.float32, name='Dropout')

    def add_embedding(self):
        """添加词嵌入层
        Hint : 这一层应该用input_placeholder 来索引词嵌入
        Hint : 你或许能发现tf.nn.embedding_lookup 是有用的
        Hint : 你或许能发现tf.split , tf.squeeze 是有用的在构造tensor 的输入的时候
        Hint : 下面是你需要创建的变量的维度
                L:(len(self.vocab),embed_size)
        Returns:
            inputs:一个训练次数的列表,每一个元素应该是
                    一个张量 大小是 (batch_size,embed_size)
        tf.split(dimension,num_split,input)
                dimension表示输入张量的哪一个维度,
                                        如果是0就表示对第0维度进行切割,
                num_split就是切割的数量,
                                        如果是2就表示输入张量被切成2份,
                                        每一份是一个列表
        tf.squeeze(input,squeeze_dims=None,name=None)
                                        从tensor中删除所有大小是1的维度
                example: t is a tensor of shape [1,2,1,3,1,1]
                        shape(squeeze(t))==>[2,3]
                        t is a tensor of shape [1,2,1,3,1,1]
                        shape(squeeze(t,[2,4]))==>[1,2,3,1]
        tf.nn.embedding_lookup 将词的索引映射到词的向量
        """
        with tf.device('/cpu:0'):
            embedding = tf.get_variable('Embedding', [len(self.vocab), self.config.embed_size], trainable=True)
            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
            return inputs

    def add_projection(self, rnn_outputs):
        """添加一个投影层
            投影层将隐藏层的表示变换到整个词向量上的分布式表示
            Hint:下面是你需要去创建的维度
                U(hidden_size,len(vocab))
                b_2:(len(vocab),)
            参数:
                rnn_outputs:一个训练次数的列表,每一个元素应该是一个张量
                            大小是(batch_size,embed_size)
            Returns:
                outputs:一个长度的列表,每一个元素是一个张量(batch_size,len(vocab))
        """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs
    
    def add_loss_op(self, output):
        """将损失添加到目标函数上面
            Hint:使用tensorflow.python.ops.seq2seq.sequence_loss 来实现序列损失
                              参数:
                                        输出:一个张量   大小是 (None,self.vocab)
                              返回:
                                        损失:一个0-d大小的张量
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output], [tf.reshape(self.labels_placeholder, [-1])], all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss
        
        
    def add_training_op(self, loss):
        """将目标损失添加到计算图上
            创建一个优化器并且应用梯度下降到所有的训练变量上面
            Hint:使用tf.train.AdamOptimizer 对于这个模型
                使用optimizer.minimize() 会返回一个train_op的对象
            参数:
                loss: 损失张量,来自于cross_entropy_loss 交叉熵损失
            返回:
                train_op:训练的目标
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word
        # We cast o to float64 because otherwise numerical issues arise:
        # sum(output of softmax) can be 1.00000298179 rather than 1
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs into rows of size len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """Creates the RNN LM model.
            In the space below you must implement the equations for the RNN LM model.
        Hint: use a zero vector of shape (batch_size, hidden_size) as the
              initial state of the RNN.
        Hint: store the last RNN output as the instance variable
            self.final_state
        Hint: make sure to apply dropout to both the input and output variables.
        Hint: use the variable scope 'RNN' to define the RNN variables.
        Hint: perform an explicit for-loop over the inputs.
                You can use scope.reuse_variables() to ensure the weights
                are the same at every iteration.
                Make sure you don't call this on the first loop iteration,
                since no variables will have been initialized yet.
        Hint: here are the dimensions of the variables you will need to create:

            H:   (hidden_size, hidden_size)
            I:   (embed_size, hidden_size)
            b_1: (hidden_size,)
        Args:
            inputs: a list of length num_steps, each element a tensor
                    of shape (batch_size, embed_size)
        Returns:
            outputs: a list of length num_steps, each element a tensor
                     of shape (batch_size, hidden_size)
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]

        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]

        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs
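The loop above implements the recurrence $h^{(t)} = \sigma(h^{(t-1)}H + x^{(t)}I + b_1)$, sharing H, I and b_1 across all time steps. A NumPy sketch of the same recurrence with illustrative (not the actual Config) sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

batch, embed, hidden, steps = 2, 5, 4, 3
rng = np.random.default_rng(1)
H = rng.standard_normal((hidden, hidden)) * 0.1   # hidden-to-hidden matrix
I = rng.standard_normal((embed, hidden)) * 0.1    # input-to-hidden matrix
b_1 = np.zeros(hidden)
inputs = [rng.standard_normal((batch, embed)) for _ in range(steps)]

state = np.zeros((batch, hidden))   # initial state h^{(0)}
rnn_outputs = []
for x_t in inputs:                  # same H, I, b_1 at every step
    state = sigmoid(state @ H + x_t @ I + b_1)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]
print(final_state.shape)  # (2, 4)
```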

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We need to pass in the initial state and retrieve the final state,
            # feeding it back in so the RNN gets the proper history
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp
                    }
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))
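run_epoch reports perplexity as the exponential of the mean cross-entropy loss, which is exactly the `pp` value printed in the training log below. A tiny example (the loss values are made up for illustration):

```python
import numpy as np

losses = [5.6, 5.4, 5.3]               # per-batch cross-entropy losses (nats)
perplexity = np.exp(np.mean(losses))   # same formula run_epoch uses
print(round(float(perplexity), 2))
```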

def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """Generate text from the model.
        Hint: create a feed-dictionary and use sess.run() to execute the model.
              You will need to pass model.initial_state as a key in feed_dict.
        Hint: fetch model.final_state and model.predictions[-1].
              model.final_state is set in add_model();
              model.predictions is set in __init__.
        Hint: store the final state returned from each run, along with the
              predicted y_pred values, so the state can be fed back in.
        Args:
            session : tf.Session() object
            model : Object of type RNNLM Model
            config : A Config() object
            starting_text:Initial text passed to model
        Returns:
            output : List of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch size of one, length of len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output
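The sample() helper imported from utils is not shown here; presumably it draws the next word index from the predicted distribution after temperature scaling (temperature < 1 sharpens the distribution, > 1 flattens it). A sketch of what such a sampler might look like — an assumption, not the actual utils.sample:

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Re-weight a probability distribution by temperature and draw one index.

    (A sketch of what utils.sample is assumed to do.)
    """
    rng = rng or np.random.default_rng(0)
    logits = np.log(np.asarray(probs)) / temperature
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

probs = [0.1, 0.7, 0.2]
idx = sample_with_temperature(probs, temperature=0.5)
print(idx)
```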

def generate_sentence(session, model, config, *args, **kwargs):
    """Convenience function for generating a sentence from the model."""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)

def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generative model
    with tf.variable_scope('RNNLM',reuse=None) as scope:
        model = RNNLM_Model(config)
        # This instructs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()

            train_pp = model.run_epoch(session,
                                       model.encoded_train,
                                       train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))

        saver.restore(session, './ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            starting_text = input('> ')


if __name__ == "__main__":
    test_RNNLM()

(It's not really that impenetrable after all; it's much simpler than calculus, let alone mathematical analysis.)

Below is the training log:

1380 / 1452 : pp = 266.20892333984375 
1390 / 1452 : pp = 265.94439697265625 
1400 / 1452 : pp = 265.66845703125 
1410 / 1452 : pp = 265.5393981933594 
1420 / 1452 : pp = 265.32489013671875 
1430 / 1452 : pp = 265.2019348144531 
1440 / 1452 : pp = 265.13720703125 
1450 / 1452 : pp = 264.954833984375 

0 / 115 : pp = 296.9217224121094 
10 / 115 : pp = 282.02130126953125 
20 / 115 : pp = 279.76800537109375 
30 / 115 : pp = 276.4101257324219 
40 / 115 : pp = 276.2939147949219 
50 / 115 : pp = 270.73565673828125 
60 / 115 : pp = 269.88134765625 
70 / 115 : pp = 266.8675231933594 
80 / 115 : pp = 263.6731872558594 
90 / 115 : pp = 260.8569030761719 
100 / 115 : pp = 256.3356628417969 
110 / 115 : pp = 255.1026611328125 
Training perplexity: 264.9092102050781
Validation perplexity:254.84902954101562
Total time : 41.65332388877869
Epoch 3

0 / 1452 : pp = 327.0847473144531 
10 / 1452 : pp = 273.9620056152344 
20 / 1452 : pp = 270.22943115234375 
30 / 1452 : pp = 263.5213317871094 
40 / 1452 : pp = 264.0644836425781 
50 / 1452 : pp = 258.6029968261719 
60 / 1452 : pp = 257.04290771484375 
70 / 1452 : pp = 257.59161376953125 
80 / 1452 : pp = 256.7600402832031 
90 / 1452 : pp = 254.5120391845703 
100 / 1452 : pp = 252.44725036621094 
110 / 1452 : pp = 250.13954162597656 
120 / 1452 : pp = 249.91647338867188 
130 / 1452 : pp = 249.50460815429688 
140 / 1452 : pp = 247.67440795898438 
150 / 1452 : pp = 247.19090270996094 
160 / 1452 : pp = 247.8919219970703 
170 / 1452 : pp = 247.54322814941406 
180 / 1452 : pp = 246.17623901367188 
190 / 1452 : pp = 245.78330993652344 
200 / 1452 : pp = 246.80552673339844 
210 / 1452 : pp = 246.3059844970703 
220 / 1452 : pp = 246.19021606445312 
230 / 1452 : pp = 246.70140075683594 
240 / 1452 : pp = 246.3099822998047 
250 / 1452 : pp = 245.1745147705078 
260 / 1452 : pp = 244.17384338378906 
270 / 1452 : pp = 242.57363891601562 
280 / 1452 : pp = 242.8500213623047 
290 / 1452 : pp = 243.0492706298828 
300 / 1452 : pp = 243.1466522216797 
310 / 1452 : pp = 242.89044189453125 
320 / 1452 : pp = 243.08045959472656 
330 / 1452 : pp = 243.32235717773438 
340 / 1452 : pp = 242.34715270996094 
350 / 1452 : pp = 242.80972290039062 
360 / 1452 : pp = 242.5345458984375 
370 / 1452 : pp = 242.0083465576172 
380 / 1452 : pp = 241.22708129882812 
390 / 1452 : pp = 241.24398803710938 
400 / 1452 : pp = 240.63473510742188 
410 / 1452 : pp = 240.94094848632812 
420 / 1452 : pp = 241.19717407226562 
430 / 1452 : pp = 240.8896026611328 
440 / 1452 : pp = 240.7772979736328 
450 / 1452 : pp = 240.45913696289062 
460 / 1452 : pp = 240.06674194335938 
470 / 1452 : pp = 239.42198181152344 
480 / 1452 : pp = 238.39271545410156 
490 / 1452 : pp = 238.0517120361328 
500 / 1452 : pp = 237.31752014160156 
510 / 1452 : pp = 237.1197967529297 
520 / 1452 : pp = 236.64865112304688 
530 / 1452 : pp = 236.004638671875 
540 / 1452 : pp = 235.192626953125 
550 / 1452 : pp = 234.6700439453125 
560 / 1452 : pp = 234.1914825439453 
570 / 1452 : pp = 233.80899047851562 
580 / 1452 : pp = 233.3753662109375 
590 / 1452 : pp = 232.8699188232422 
600 / 1452 : pp = 232.2629852294922 
610 / 1452 : pp = 231.8668212890625 
620 / 1452 : pp = 231.478515625 
630 / 1452 : pp = 231.0444793701172 
640 / 1452 : pp = 231.2737579345703 
650 / 1452 : pp = 231.28114318847656 
660 / 1452 : pp = 231.4324951171875 
670 / 1452 : pp = 231.48513793945312 
680 / 1452 : pp = 231.45932006835938 
690 / 1452 : pp = 231.17738342285156 
700 / 1452 : pp = 231.00570678710938 
710 / 1452 : pp = 231.03810119628906 
720 / 1452 : pp = 230.96131896972656 
730 / 1452 : pp = 230.91110229492188 
740 / 1452 : pp = 231.13539123535156 
750 / 1452 : pp = 231.04393005371094 
760 / 1452 : pp = 231.03489685058594 
770 / 1452 : pp = 231.19744873046875 
780 / 1452 : pp = 231.26625061035156 
790 / 1452 : pp = 231.38714599609375 
800 / 1452 : pp = 231.24441528320312 
810 / 1452 : pp = 231.16824340820312 
820 / 1452 : pp = 231.11831665039062 
830 / 1452 : pp = 231.34886169433594 
840 / 1452 : pp = 231.221923828125 
850 / 1452 : pp = 231.2562255859375 
860 / 1452 : pp = 231.26492309570312 
870 / 1452 : pp = 231.1961212158203 
880 / 1452 : pp = 231.30506896972656 
890 / 1452 : pp = 231.24728393554688 
900 / 1452 : pp = 231.15744018554688 
910 / 1452 : pp = 231.20175170898438 
920 / 1452 : pp = 231.25534057617188 
930 / 1452 : pp = 231.09461975097656 
940 / 1452 : pp = 231.12612915039062 
950 / 1452 : pp = 231.0475616455078 
960 / 1452 : pp = 230.86056518554688 
970 / 1452 : pp = 230.80377197265625 
980 / 1452 : pp = 230.4598846435547 
990 / 1452 : pp = 230.24559020996094 
1000 / 1452 : pp = 229.91030883789062 
1010 / 1452 : pp = 229.9349822998047 
1020 / 1452 : pp = 230.01470947265625 
1030 / 1452 : pp = 229.8909149169922 
1040 / 1452 : pp = 229.9403533935547 
1050 / 1452 : pp = 229.84815979003906 
1060 / 1452 : pp = 229.60377502441406 
1070 / 1452 : pp = 229.74647521972656 
1080 / 1452 : pp = 229.80410766601562 
1090 / 1452 : pp = 229.78733825683594 
1100 / 1452 : pp = 229.64549255371094 
1110 / 1452 : pp = 229.26255798339844 
1120 / 1452 : pp = 229.00262451171875 
1130 / 1452 : pp = 228.6716766357422 
1140 / 1452 : pp = 228.55067443847656 
1150 / 1452 : pp = 228.61563110351562 
1160 / 1452 : pp = 228.50958251953125 
1170 / 1452 : pp = 228.3498992919922 
1180 / 1452 : pp = 228.29786682128906 
1190 / 1452 : pp = 228.33204650878906 
1200 / 1452 : pp = 228.27369689941406 
1210 / 1452 : pp = 228.11831665039062 
1220 / 1452 : pp = 228.21775817871094 
1230 / 1452 : pp = 228.3170166015625 
1240 / 1452 : pp = 228.22134399414062 
1250 / 1452 : pp = 228.3769073486328 
1260 / 1452 : pp = 228.37527465820312 
1270 / 1452 : pp = 228.33694458007812 
1280 / 1452 : pp = 228.27108764648438 
1290 / 1452 : pp = 228.1731414794922 
1300 / 1452 : pp = 228.12200927734375 
1310 / 1452 : pp = 228.10275268554688 
1320 / 1452 : pp = 227.9289093017578 
1330 / 1452 : pp = 227.77723693847656 
1340 / 1452 : pp = 227.79623413085938 
1350 / 1452 : pp = 227.7408447265625 
1360 / 1452 : pp = 227.72586059570312 
1370 / 1452 : pp = 227.49728393554688 
1380 / 1452 : pp = 227.37940979003906 
1390 / 1452 : pp = 227.20166015625 
1400 / 1452 : pp = 227.018310546875 
1410 / 1452 : pp = 226.95651245117188 
1420 / 1452 : pp = 226.8065643310547 
1430 / 1452 : pp = 226.7261199951172 
1440 / 1452 : pp = 226.7193145751953 
1450 / 1452 : pp = 226.61068725585938 

0 / 115 : pp = 269.342041015625 
10 / 115 : pp = 255.03016662597656 
20 / 115 : pp = 253.8992919921875 
30 / 115 : pp = 251.04025268554688 
40 / 115 : pp = 250.51756286621094 
50 / 115 : pp = 245.3595428466797 
60 / 115 : pp = 244.4713897705078 
70 / 115 : pp = 241.2674560546875 
80 / 115 : pp = 238.3473663330078 
90 / 115 : pp = 235.56423950195312 
100 / 115 : pp = 231.2281036376953 
110 / 115 : pp = 229.8423614501953 
Training perplexity: 226.5760040283203
Validation perplexity:229.59939575195312
Total time : 42.202677726745605
Epoch 4

0 / 1452 : pp = 282.2423095703125 
10 / 1452 : pp = 240.16258239746094 
20 / 1452 : pp = 236.12203979492188 
30 / 1452 : pp = 230.3953857421875 
40 / 1452 : pp = 231.8789825439453 
50 / 1452 : pp = 227.26612854003906 
60 / 1452 : pp = 226.22061157226562 
70 / 1452 : pp = 227.01885986328125 
80 / 1452 : pp = 226.2459716796875 
90 / 1452 : pp = 224.3211669921875 
100 / 1452 : pp = 222.65615844726562 
110 / 1452 : pp = 220.70326232910156 
120 / 1452 : pp = 220.42288208007812 
130 / 1452 : pp = 219.8100128173828 
140 / 1452 : pp = 218.04432678222656 
150 / 1452 : pp = 217.31639099121094 
160 / 1452 : pp = 217.86349487304688 
170 / 1452 : pp = 217.46597290039062 
180 / 1452 : pp = 216.3349151611328 
190 / 1452 : pp = 216.12240600585938 
200 / 1452 : pp = 216.97842407226562 
210 / 1452 : pp = 216.51014709472656 
220 / 1452 : pp = 216.46751403808594 
230 / 1452 : pp = 216.80126953125 
240 / 1452 : pp = 216.45965576171875 
250 / 1452 : pp = 215.5008544921875 
260 / 1452 : pp = 214.62210083007812 
270 / 1452 : pp = 213.29183959960938 
280 / 1452 : pp = 213.5621337890625 
290 / 1452 : pp = 213.80657958984375 
300 / 1452 : pp = 213.8963165283203 
310 / 1452 : pp = 213.60653686523438 
320 / 1452 : pp = 213.85877990722656 
330 / 1452 : pp = 214.07345581054688 
340 / 1452 : pp = 213.25421142578125 
350 / 1452 : pp = 213.68019104003906 
360 / 1452 : pp = 213.41717529296875 
370 / 1452 : pp = 213.04920959472656 
380 / 1452 : pp = 212.39019775390625 
390 / 1452 : pp = 212.4908905029297 
400 / 1452 : pp = 212.01914978027344 
410 / 1452 : pp = 212.36903381347656 
420 / 1452 : pp = 212.6802520751953 
430 / 1452 : pp = 212.42697143554688 
440 / 1452 : pp = 212.42990112304688 
450 / 1452 : pp = 212.14524841308594 
460 / 1452 : pp = 211.7836151123047 
470 / 1452 : pp = 211.17282104492188 
480 / 1452 : pp = 210.27903747558594 
490 / 1452 : pp = 209.95211791992188 
500 / 1452 : pp = 209.28302001953125 
510 / 1452 : pp = 209.1029815673828 
520 / 1452 : pp = 208.73855590820312 
530 / 1452 : pp = 208.19700622558594 
540 / 1452 : pp = 207.4554443359375 
550 / 1452 : pp = 207.0062255859375 
560 / 1452 : pp = 206.59739685058594 
570 / 1452 : pp = 206.27874755859375 
580 / 1452 : pp = 205.87144470214844 
590 / 1452 : pp = 205.43545532226562 
600 / 1452 : pp = 204.90940856933594 
610 / 1452 : pp = 204.5686798095703 
620 / 1452 : pp = 204.22862243652344 
630 / 1452 : pp = 203.8448028564453 
640 / 1452 : pp = 204.06576538085938 
650 / 1452 : pp = 204.0941925048828 
660 / 1452 : pp = 204.22103881835938 
670 / 1452 : pp = 204.289794921875 
680 / 1452 : pp = 204.3115234375 
690 / 1452 : pp = 204.10284423828125 
700 / 1452 : pp = 203.99757385253906 
710 / 1452 : pp = 204.04971313476562 
720 / 1452 : pp = 204.03152465820312 
730 / 1452 : pp = 203.99046325683594 
740 / 1452 : pp = 204.19786071777344 
750 / 1452 : pp = 204.1642608642578 
760 / 1452 : pp = 204.19435119628906 
770 / 1452 : pp = 204.37786865234375 
780 / 1452 : pp = 204.4965057373047 
790 / 1452 : pp = 204.6479034423828 
800 / 1452 : pp = 204.56117248535156 
810 / 1452 : pp = 204.52284240722656 
820 / 1452 : pp = 204.50978088378906 
830 / 1452 : pp = 204.7531280517578 
840 / 1452 : pp = 204.64468383789062 
850 / 1452 : pp = 204.71348571777344 
860 / 1452 : pp = 204.7399444580078 
870 / 1452 : pp = 204.69406127929688 
880 / 1452 : pp = 204.7965850830078 
890 / 1452 : pp = 204.7594757080078 
900 / 1452 : pp = 204.71446228027344 
910 / 1452 : pp = 204.7590789794922 
920 / 1452 : pp = 204.85772705078125 
930 / 1452 : pp = 204.7428741455078 
940 / 1452 : pp = 204.8068389892578 
950 / 1452 : pp = 204.75791931152344 
960 / 1452 : pp = 204.63815307617188 
970 / 1452 : pp = 204.60760498046875 
980 / 1452 : pp = 204.34347534179688 
990 / 1452 : pp = 204.151611328125 
1000 / 1452 : pp = 203.8665771484375 
1010 / 1452 : pp = 203.9164581298828 
1020 / 1452 : pp = 204.0184783935547 
1030 / 1452 : pp = 203.95166015625 
1040 / 1452 : pp = 204.03045654296875 
1050 / 1452 : pp = 203.95846557617188 
1060 / 1452 : pp = 203.77114868164062 
1070 / 1452 : pp = 203.93260192871094 
1080 / 1452 : pp = 204.00048828125 
1090 / 1452 : pp = 204.00233459472656 
1100 / 1452 : pp = 203.8960418701172 
1110 / 1452 : pp = 203.5987548828125 
1120 / 1452 : pp = 203.38392639160156 
1130 / 1452 : pp = 203.08872985839844 
1140 / 1452 : pp = 203.01272583007812 
1150 / 1452 : pp = 203.0865936279297 
1160 / 1452 : pp = 203.02308654785156 
1170 / 1452 : pp = 202.9125518798828 
1180 / 1452 : pp = 202.9097442626953 
1190 / 1452 : pp = 202.98252868652344 
1200 / 1452 : pp = 202.95387268066406 
1210 / 1452 : pp = 202.851318359375 
1220 / 1452 : pp = 202.97671508789062 
1230 / 1452 : pp = 203.1051025390625 
1240 / 1452 : pp = 203.0526123046875 
1250 / 1452 : pp = 203.21417236328125 
1260 / 1452 : pp = 203.23617553710938 
1270 / 1452 : pp = 203.22802734375 
1280 / 1452 : pp = 203.20846557617188 
1290 / 1452 : pp = 203.15362548828125 
1300 / 1452 : pp = 203.14315795898438 
1310 / 1452 : pp = 203.15264892578125 
1320 / 1452 : pp = 203.02801513671875 
1330 / 1452 : pp = 202.92977905273438 
1340 / 1452 : pp = 202.95484924316406 
1350 / 1452 : pp = 202.9335479736328 
1360 / 1452 : pp = 202.955322265625 
1370 / 1452 : pp = 202.7740478515625 
1380 / 1452 : pp = 202.68569946289062 
1390 / 1452 : pp = 202.55816650390625 
1400 / 1452 : pp = 202.41651916503906 
1410 / 1452 : pp = 202.38494873046875 
1420 / 1452 : pp = 202.27593994140625 
1430 / 1452 : pp = 202.21826171875 
1440 / 1452 : pp = 202.23272705078125 
1450 / 1452 : pp = 202.16099548339844 

0 / 115 : pp = 253.23211669921875 
10 / 115 : pp = 237.62506103515625 
20 / 115 : pp = 237.60557556152344 
30 / 115 : pp = 234.9273223876953 
40 / 115 : pp = 234.30519104003906 
50 / 115 : pp = 229.43960571289062 
60 / 115 : pp = 228.6050567626953 
70 / 115 : pp = 225.2646484375 
80 / 115 : pp = 222.55935668945312 
90 / 115 : pp = 219.83255004882812 
100 / 115 : pp = 215.5491485595703 
110 / 115 : pp = 214.07937622070312 
Training perplexity: 202.1349639892578
Validation perplexity:213.85256958007812
Total time : 42.10724234580994
Epoch 5

0 / 1452 : pp = 255.92384338378906 
10 / 1452 : pp = 219.5322265625 
20 / 1452 : pp = 214.36212158203125 
30 / 1452 : pp = 209.12620544433594 
40 / 1452 : pp = 210.04193115234375 
50 / 1452 : pp = 205.77398681640625 
60 / 1452 : pp = 204.8201141357422 
70 / 1452 : pp = 205.3955841064453 
80 / 1452 : pp = 204.8386688232422 
90 / 1452 : pp = 203.21194458007812 
100 / 1452 : pp = 201.87643432617188 
110 / 1452 : pp = 200.10122680664062 
120 / 1452 : pp = 199.82012939453125 
130 / 1452 : pp = 199.11192321777344 
140 / 1452 : pp = 197.51919555664062 
150 / 1452 : pp = 197.03567504882812 
160 / 1452 : pp = 197.4231414794922 
170 / 1452 : pp = 197.09571838378906 
180 / 1452 : pp = 196.17665100097656 
190 / 1452 : pp = 196.0064697265625 
200 / 1452 : pp = 196.7347869873047 
210 / 1452 : pp = 196.3063507080078 
220 / 1452 : pp = 196.21388244628906 
230 / 1452 : pp = 196.5252227783203 
240 / 1452 : pp = 196.203125 
250 / 1452 : pp = 195.3251953125 
260 / 1452 : pp = 194.53335571289062 
270 / 1452 : pp = 193.3546142578125 
280 / 1452 : pp = 193.59420776367188 
290 / 1452 : pp = 193.83297729492188 
300 / 1452 : pp = 193.98489379882812 
310 / 1452 : pp = 193.68414306640625 
320 / 1452 : pp = 193.89065551757812 
330 / 1452 : pp = 194.0518798828125 
340 / 1452 : pp = 193.32888793945312 
350 / 1452 : pp = 193.76219177246094 
360 / 1452 : pp = 193.56106567382812 
370 / 1452 : pp = 193.28179931640625 
380 / 1452 : pp = 192.7037811279297 
390 / 1452 : pp = 192.8145294189453 
400 / 1452 : pp = 192.43325805664062 
410 / 1452 : pp = 192.81527709960938 
420 / 1452 : pp = 193.13760375976562 
430 / 1452 : pp = 192.9148712158203 
440 / 1452 : pp = 192.92526245117188 
450 / 1452 : pp = 192.70083618164062 
460 / 1452 : pp = 192.36647033691406 
470 / 1452 : pp = 191.85394287109375 
480 / 1452 : pp = 191.07244873046875 
490 / 1452 : pp = 190.75401306152344 
500 / 1452 : pp = 190.1843719482422 
510 / 1452 : pp = 190.03334045410156 
520 / 1452 : pp = 189.72938537597656 
530 / 1452 : pp = 189.25889587402344 
540 / 1452 : pp = 188.59315490722656 
550 / 1452 : pp = 188.19313049316406 
560 / 1452 : pp = 187.80621337890625 
570 / 1452 : pp = 187.5229034423828 
580 / 1452 : pp = 187.1091766357422 
590 / 1452 : pp = 186.72592163085938 
600 / 1452 : pp = 186.2238006591797 
610 / 1452 : pp = 185.89695739746094 
620 / 1452 : pp = 185.60989379882812 
630 / 1452 : pp = 185.2689208984375 
640 / 1452 : pp = 185.47567749023438 
650 / 1452 : pp = 185.5127410888672 
660 / 1452 : pp = 185.64627075195312 
670 / 1452 : pp = 185.71311950683594 
680 / 1452 : pp = 185.72569274902344 
690 / 1452 : pp = 185.56459045410156 
700 / 1452 : pp = 185.48681640625 
710 / 1452 : pp = 185.5458221435547 
720 / 1452 : pp = 185.5598907470703 
730 / 1452 : pp = 185.5335235595703 
740 / 1452 : pp = 185.73995971679688 
750 / 1452 : pp = 185.744384765625 
760 / 1452 : pp = 185.81268310546875 
770 / 1452 : pp = 186.00088500976562 
780 / 1452 : pp = 186.14443969726562 
790 / 1452 : pp = 186.30764770507812 
800 / 1452 : pp = 186.2595977783203 
810 / 1452 : pp = 186.23028564453125 
820 / 1452 : pp = 186.23997497558594 
830 / 1452 : pp = 186.49057006835938 
840 / 1452 : pp = 186.43331909179688 
850 / 1452 : pp = 186.48887634277344 
860 / 1452 : pp = 186.51502990722656 
870 / 1452 : pp = 186.5167999267578 
880 / 1452 : pp = 186.62400817871094 
890 / 1452 : pp = 186.6103973388672 
900 / 1452 : pp = 186.58111572265625 
910 / 1452 : pp = 186.64126586914062 
920 / 1452 : pp = 186.7366180419922 
930 / 1452 : pp = 186.65719604492188 
940 / 1452 : pp = 186.71755981445312 
950 / 1452 : pp = 186.6977996826172 
960 / 1452 : pp = 186.62774658203125 
970 / 1452 : pp = 186.62115478515625 
980 / 1452 : pp = 186.3773193359375 
990 / 1452 : pp = 186.23109436035156 
1000 / 1452 : pp = 185.99227905273438 
1010 / 1452 : pp = 186.0488739013672 
1020 / 1452 : pp = 186.1744384765625 
1030 / 1452 : pp = 186.1162109375 
1040 / 1452 : pp = 186.18899536132812 
1050 / 1452 : pp = 186.1549072265625 
1060 / 1452 : pp = 186.01419067382812 
1070 / 1452 : pp = 186.17364501953125 
1080 / 1452 : pp = 186.27061462402344 
1090 / 1452 : pp = 186.28428649902344 
1100 / 1452 : pp = 186.2150115966797 
1110 / 1452 : pp = 185.95103454589844 
1120 / 1452 : pp = 185.77423095703125 
1130 / 1452 : pp = 185.5232696533203 
1140 / 1452 : pp = 185.4607391357422 
1150 / 1452 : pp = 185.56077575683594 
1160 / 1452 : pp = 185.53343200683594 
1170 / 1452 : pp = 185.46453857421875 
1180 / 1452 : pp = 185.4741668701172 
1190 / 1452 : pp = 185.5594482421875 
1200 / 1452 : pp = 185.53785705566406 
1210 / 1452 : pp = 185.4576416015625 
1220 / 1452 : pp = 185.5943145751953 
1230 / 1452 : pp = 185.7483673095703 
1240 / 1452 : pp = 185.70762634277344 
1250 / 1452 : pp = 185.8568115234375 
1260 / 1452 : pp = 185.90635681152344 
1270 / 1452 : pp = 185.8961639404297 
1280 / 1452 : pp = 185.89199829101562 
1290 / 1452 : pp = 185.85911560058594 
1300 / 1452 : pp = 185.86097717285156 
1310 / 1452 : pp = 185.88739013671875 
1320 / 1452 : pp = 185.79248046875 
1330 / 1452 : pp = 185.69700622558594 
1340 / 1452 : pp = 185.7310028076172 
1350 / 1452 : pp = 185.72613525390625 
1360 / 1452 : pp = 185.76829528808594 
1370 / 1452 : pp = 185.6322021484375 
1380 / 1452 : pp = 185.56378173828125 
1390 / 1452 : pp = 185.4654998779297 
1400 / 1452 : pp = 185.35110473632812 
1410 / 1452 : pp = 185.33917236328125 
1420 / 1452 : pp = 185.2509002685547 
1430 / 1452 : pp = 185.20436096191406 
1440 / 1452 : pp = 185.2254638671875 
1450 / 1452 : pp = 185.16542053222656 

0 / 115 : pp = 242.26800537109375 
10 / 115 : pp = 226.12258911132812 
20 / 115 : pp = 226.4702606201172 
30 / 115 : pp = 223.982666015625 
40 / 115 : pp = 223.376953125 
50 / 115 : pp = 218.65716552734375 
60 / 115 : pp = 217.95306396484375 
70 / 115 : pp = 214.5392303466797 
80 / 115 : pp = 212.07525634765625 
90 / 115 : pp = 209.40631103515625 
100 / 115 : pp = 205.1455078125 
110 / 115 : pp = 203.6289520263672 
Training perplexity: 185.14476013183594
Validation perplexity:203.3822784423828
Total time : 42.47052240371704
Epoch 6

0 / 1452 : pp = 233.56707763671875 
10 / 1452 : pp = 202.6468505859375 
20 / 1452 : pp = 198.2734375 
30 / 1452 : pp = 193.47442626953125 
40 / 1452 : pp = 195.17147827148438 
50 / 1452 : pp = 191.5596923828125 
60 / 1452 : pp = 190.4825897216797 
70 / 1452 : pp = 191.07681274414062 
80 / 1452 : pp = 190.339599609375 
90 / 1452 : pp = 188.98277282714844 
100 / 1452 : pp = 187.74757385253906 
110 / 1452 : pp = 186.10104370117188 
120 / 1452 : pp = 185.7500457763672 
130 / 1452 : pp = 184.90707397460938 
140 / 1452 : pp = 183.340087890625 
150 / 1452 : pp = 182.70840454101562 
160 / 1452 : pp = 183.1043701171875 
170 / 1452 : pp = 182.69776916503906 
180 / 1452 : pp = 181.88400268554688 
190 / 1452 : pp = 181.8062286376953 
200 / 1452 : pp = 182.4969940185547 
210 / 1452 : pp = 182.10572814941406 
220 / 1452 : pp = 181.9981689453125 
230 / 1452 : pp = 182.3802490234375 
240 / 1452 : pp = 182.03636169433594 
250 / 1452 : pp = 181.23712158203125 
260 / 1452 : pp = 180.53726196289062 
270 / 1452 : pp = 179.53567504882812 
280 / 1452 : pp = 179.70208740234375 
290 / 1452 : pp = 179.977783203125 
300 / 1452 : pp = 180.16600036621094 
310 / 1452 : pp = 179.87294006347656 
320 / 1452 : pp = 180.11849975585938 
330 / 1452 : pp = 180.31838989257812 
340 / 1452 : pp = 179.56759643554688 
350 / 1452 : pp = 179.97134399414062 
360 / 1452 : pp = 179.80030822753906 
370 / 1452 : pp = 179.52085876464844 
380 / 1452 : pp = 178.98228454589844 
390 / 1452 : pp = 179.0868682861328 
400 / 1452 : pp = 178.74569702148438 
410 / 1452 : pp = 179.1776580810547 
420 / 1452 : pp = 179.5055389404297 
430 / 1452 : pp = 179.3883056640625 
440 / 1452 : pp = 179.42279052734375 
450 / 1452 : pp = 179.2106475830078 
460 / 1452 : pp = 178.85311889648438 
470 / 1452 : pp = 178.33840942382812 
480 / 1452 : pp = 177.60350036621094 
490 / 1452 : pp = 177.30335998535156 
500 / 1452 : pp = 176.72222900390625 
510 / 1452 : pp = 176.6067352294922 
520 / 1452 : pp = 176.33998107910156 
530 / 1452 : pp = 175.93162536621094 
540 / 1452 : pp = 175.30657958984375 
550 / 1452 : pp = 174.9462432861328 
560 / 1452 : pp = 174.5836639404297 
570 / 1452 : pp = 174.31431579589844 
580 / 1452 : pp = 173.92300415039062 
590 / 1452 : pp = 173.55856323242188 
600 / 1452 : pp = 173.08277893066406 
610 / 1452 : pp = 172.75930786132812 
620 / 1452 : pp = 172.53192138671875 
630 / 1452 : pp = 172.20652770996094 
640 / 1452 : pp = 172.37454223632812 
650 / 1452 : pp = 172.39845275878906 
660 / 1452 : pp = 172.52255249023438 
670 / 1452 : pp = 172.60935974121094 
680 / 1452 : pp = 172.6611328125 
690 / 1452 : pp = 172.53118896484375 
700 / 1452 : pp = 172.4709014892578 
710 / 1452 : pp = 172.5406494140625 
720 / 1452 : pp = 172.55447387695312 
730 / 1452 : pp = 172.5330047607422 
740 / 1452 : pp = 172.7061767578125 
750 / 1452 : pp = 172.71054077148438 
760 / 1452 : pp = 172.77743530273438 
770 / 1452 : pp = 172.95481872558594 
780 / 1452 : pp = 173.11265563964844 
790 / 1452 : pp = 173.2832794189453 
800 / 1452 : pp = 173.2537841796875 
810 / 1452 : pp = 173.22164916992188 
820 / 1452 : pp = 173.24148559570312 
830 / 1452 : pp = 173.48228454589844 
840 / 1452 : pp = 173.43753051757812 
850 / 1452 : pp = 173.505615234375 
860 / 1452 : pp = 173.5214080810547 
870 / 1452 : pp = 173.5009002685547 
880 / 1452 : pp = 173.6202392578125 
890 / 1452 : pp = 173.622802734375 
900 / 1452 : pp = 173.5987091064453 
910 / 1452 : pp = 173.68316650390625 
920 / 1452 : pp = 173.77330017089844 
930 / 1452 : pp = 173.72018432617188 
940 / 1452 : pp = 173.79351806640625 
950 / 1452 : pp = 173.7653350830078 
960 / 1452 : pp = 173.7102508544922 
970 / 1452 : pp = 173.69766235351562 
980 / 1452 : pp = 173.4836883544922 
990 / 1452 : pp = 173.3550262451172 
1000 / 1452 : pp = 173.14816284179688 
1010 / 1452 : pp = 173.20777893066406 
1020 / 1452 : pp = 173.3390655517578 
1030 / 1452 : pp = 173.2884063720703 
1040 / 1452 : pp = 173.38015747070312 
1050 / 1452 : pp = 173.35592651367188 
1060 / 1452 : pp = 173.2260284423828 
1070 / 1452 : pp = 173.39321899414062 
1080 / 1452 : pp = 173.4879913330078 
1090 / 1452 : pp = 173.5231475830078 
1100 / 1452 : pp = 173.47177124023438 
1110 / 1452 : pp = 173.24453735351562 
1120 / 1452 : pp = 173.09408569335938 
1130 / 1452 : pp = 172.86627197265625 
1140 / 1452 : pp = 172.8234100341797 
1150 / 1452 : pp = 172.92843627929688 
1160 / 1452 : pp = 172.90065002441406 
1170 / 1452 : pp = 172.8550567626953 
1180 / 1452 : pp = 172.8810272216797 
1190 / 1452 : pp = 172.97312927246094 
1200 / 1452 : pp = 172.9776611328125 
1210 / 1452 : pp = 172.89413452148438 
1220 / 1452 : pp = 173.0257568359375 
1230 / 1452 : pp = 173.1847381591797 
1240 / 1452 : pp = 173.1756591796875 
1250 / 1452 : pp = 173.32138061523438 
1260 / 1452 : pp = 173.37229919433594 
1270 / 1452 : pp = 173.36891174316406 
1280 / 1452 : pp = 173.36337280273438 
1290 / 1452 : pp = 173.3444366455078 
1300 / 1452 : pp = 173.36138916015625 
1310 / 1452 : pp = 173.4015655517578 
1320 / 1452 : pp = 173.31790161132812 
1330 / 1452 : pp = 173.24710083007812 
1340 / 1452 : pp = 173.27212524414062 
1350 / 1452 : pp = 173.27674865722656 
1360 / 1452 : pp = 173.32749938964844 
1370 / 1452 : pp = 173.20472717285156 
1380 / 1452 : pp = 173.14889526367188 
1390 / 1452 : pp = 173.0755157470703 
1400 / 1452 : pp = 172.9678497314453 
1410 / 1452 : pp = 172.9612579345703 
1420 / 1452 : pp = 172.8872833251953 
1430 / 1452 : pp = 172.84805297851562 
1440 / 1452 : pp = 172.87252807617188 
1450 / 1452 : pp = 172.82505798339844 

0 / 115 : pp = 236.35635375976562 
10 / 115 : pp = 219.06166076660156 
20 / 115 : pp = 219.7670440673828 
30 / 115 : pp = 217.33587646484375 
40 / 115 : pp = 216.6626739501953 
50 / 115 : pp = 212.04734802246094 
60 / 115 : pp = 211.42068481445312 
70 / 115 : pp = 207.9592742919922 
80 / 115 : pp = 205.6216583251953 
90 / 115 : pp = 202.93597412109375 
100 / 115 : pp = 198.62583923339844 
110 / 115 : pp = 196.97216796875 
Training perplexity: 172.80404663085938
Validation perplexity: 196.6871337890625
Total time: 41.52522921562195
Epoch 7

0 / 1452 : pp = 219.23231506347656 
10 / 1452 : pp = 192.07225036621094 
20 / 1452 : pp = 187.48464965820312 
30 / 1452 : pp = 182.9149932861328 
40 / 1452 : pp = 184.2945098876953 
50 / 1452 : pp = 180.78492736816406 
60 / 1452 : pp = 179.377197265625 
70 / 1452 : pp = 180.0273895263672 
80 / 1452 : pp = 179.2517547607422 
90 / 1452 : pp = 177.77540588378906 
100 / 1452 : pp = 176.6474151611328 
110 / 1452 : pp = 174.84066772460938 
120 / 1452 : pp = 174.46890258789062 
130 / 1452 : pp = 173.64573669433594 
140 / 1452 : pp = 172.17483520507812 
150 / 1452 : pp = 171.57041931152344 
160 / 1452 : pp = 171.92059326171875 
170 / 1452 : pp = 171.5497283935547 
180 / 1452 : pp = 170.77249145507812 
190 / 1452 : pp = 170.72103881835938 
200 / 1452 : pp = 171.336181640625 
210 / 1452 : pp = 170.98524475097656 
220 / 1452 : pp = 170.99771118164062 
230 / 1452 : pp = 171.39918518066406 
240 / 1452 : pp = 171.09925842285156 
250 / 1452 : pp = 170.39962768554688 
260 / 1452 : pp = 169.7328643798828 
270 / 1452 : pp = 168.72225952148438 
280 / 1452 : pp = 168.92552185058594 
290 / 1452 : pp = 169.20147705078125 
300 / 1452 : pp = 169.40338134765625 
310 / 1452 : pp = 169.12057495117188 
320 / 1452 : pp = 169.31236267089844 
330 / 1452 : pp = 169.49945068359375 
340 / 1452 : pp = 168.8396759033203 
350 / 1452 : pp = 169.25917053222656 
360 / 1452 : pp = 169.09388732910156 
370 / 1452 : pp = 168.84323120117188 
380 / 1452 : pp = 168.3832550048828 
390 / 1452 : pp = 168.48275756835938 
400 / 1452 : pp = 168.19972229003906 
410 / 1452 : pp = 168.5838623046875 
420 / 1452 : pp = 168.91119384765625 
430 / 1452 : pp = 168.80836486816406 
440 / 1452 : pp = 168.90264892578125 
450 / 1452 : pp = 168.68589782714844 
460 / 1452 : pp = 168.3704071044922 
470 / 1452 : pp = 167.90394592285156 
480 / 1452 : pp = 167.23373413085938 
490 / 1452 : pp = 166.9560546875 
500 / 1452 : pp = 166.43161010742188 
510 / 1452 : pp = 166.320068359375 
520 / 1452 : pp = 166.05902099609375 
530 / 1452 : pp = 165.71714782714844 
540 / 1452 : pp = 165.10398864746094 
550 / 1452 : pp = 164.80430603027344 
560 / 1452 : pp = 164.4687042236328 
570 / 1452 : pp = 164.2272491455078 
580 / 1452 : pp = 163.84312438964844 
590 / 1452 : pp = 163.46035766601562 
600 / 1452 : pp = 163.01559448242188 
610 / 1452 : pp = 162.74134826660156 
620 / 1452 : pp = 162.50267028808594 
630 / 1452 : pp = 162.2018280029297 
640 / 1452 : pp = 162.37130737304688 
650 / 1452 : pp = 162.3895721435547 
660 / 1452 : pp = 162.51351928710938 
670 / 1452 : pp = 162.57684326171875 
680 / 1452 : pp = 162.6346893310547 
690 / 1452 : pp = 162.5135955810547 
700 / 1452 : pp = 162.47052001953125 
710 / 1452 : pp = 162.539794921875 
720 / 1452 : pp = 162.55381774902344 
730 / 1452 : pp = 162.5297088623047 
740 / 1452 : pp = 162.71652221679688 
750 / 1452 : pp = 162.740966796875 
760 / 1452 : pp = 162.79754638671875 
770 / 1452 : pp = 162.9949951171875 
780 / 1452 : pp = 163.17868041992188 
790 / 1452 : pp = 163.33055114746094 
800 / 1452 : pp = 163.31591796875 
810 / 1452 : pp = 163.2859344482422 
820 / 1452 : pp = 163.2958984375 
830 / 1452 : pp = 163.528564453125 
840 / 1452 : pp = 163.47610473632812 
850 / 1452 : pp = 163.5260772705078 
860 / 1452 : pp = 163.55352783203125 
870 / 1452 : pp = 163.55718994140625 
880 / 1452 : pp = 163.67523193359375 
890 / 1452 : pp = 163.6920166015625 
900 / 1452 : pp = 163.67710876464844 
910 / 1452 : pp = 163.7476806640625 
920 / 1452 : pp = 163.84803771972656 
930 / 1452 : pp = 163.8114013671875 
940 / 1452 : pp = 163.86663818359375 
950 / 1452 : pp = 163.83531188964844 
960 / 1452 : pp = 163.79945373535156 
970 / 1452 : pp = 163.80320739746094 
980 / 1452 : pp = 163.5953369140625 
990 / 1452 : pp = 163.48382568359375 
1000 / 1452 : pp = 163.2642822265625 
1010 / 1452 : pp = 163.32113647460938 
1020 / 1452 : pp = 163.44204711914062 
1030 / 1452 : pp = 163.40206909179688 
1040 / 1452 : pp = 163.4915313720703 
1050 / 1452 : pp = 163.47096252441406 
1060 / 1452 : pp = 163.3601531982422 
1070 / 1452 : pp = 163.5138397216797 
1080 / 1452 : pp = 163.6189727783203 
1090 / 1452 : pp = 163.6471405029297 
1100 / 1452 : pp = 163.60406494140625 
1110 / 1452 : pp = 163.40736389160156 
1120 / 1452 : pp = 163.26841735839844 
1130 / 1452 : pp = 163.0680694580078 
1140 / 1452 : pp = 163.04591369628906 
1150 / 1452 : pp = 163.15478515625 
1160 / 1452 : pp = 163.1380615234375 
1170 / 1452 : pp = 163.09303283691406 
1180 / 1452 : pp = 163.14149475097656 
1190 / 1452 : pp = 163.2374267578125 
1200 / 1452 : pp = 163.2394561767578 
1210 / 1452 : pp = 163.17835998535156 
1220 / 1452 : pp = 163.32347106933594 
1230 / 1452 : pp = 163.4639434814453 
1240 / 1452 : pp = 163.4611358642578 
1250 / 1452 : pp = 163.60687255859375 
1260 / 1452 : pp = 163.67227172851562 
1270 / 1452 : pp = 163.67515563964844 
1280 / 1452 : pp = 163.6881103515625 
1290 / 1452 : pp = 163.66648864746094 
1300 / 1452 : pp = 163.69287109375 
1310 / 1452 : pp = 163.7276153564453 
1320 / 1452 : pp = 163.6551055908203 
1330 / 1452 : pp = 163.58901977539062 
1340 / 1452 : pp = 163.6205291748047 
1350 / 1452 : pp = 163.63824462890625 
1360 / 1452 : pp = 163.69334411621094 
1370 / 1452 : pp = 163.5885467529297 
1380 / 1452 : pp = 163.54049682617188 
1390 / 1452 : pp = 163.4760284423828 
1400 / 1452 : pp = 163.38897705078125 
1410 / 1452 : pp = 163.3974609375 
1420 / 1452 : pp = 163.35009765625 
1430 / 1452 : pp = 163.32191467285156 
1440 / 1452 : pp = 163.35220336914062 
1450 / 1452 : pp = 163.3201904296875 

0 / 115 : pp = 232.2108154296875 
10 / 115 : pp = 214.35496520996094 
20 / 115 : pp = 215.20510864257812 
30 / 115 : pp = 212.82754516601562 
40 / 115 : pp = 212.0598907470703 
50 / 115 : pp = 207.5095672607422 
60 / 115 : pp = 206.86976623535156 
70 / 115 : pp = 203.36016845703125 
80 / 115 : pp = 201.11538696289062 
90 / 115 : pp = 198.52120971679688 
100 / 115 : pp = 194.1772003173828 
110 / 115 : pp = 192.41224670410156 
Training perplexity: 163.29916381835938
Validation perplexity: 192.09552001953125
Total time: 41.78096055984497
Epoch 8

0 / 1452 : pp = 201.77548217773438 
10 / 1452 : pp = 180.4141082763672 
20 / 1452 : pp = 176.41432189941406 
30 / 1452 : pp = 172.7764434814453 
40 / 1452 : pp = 174.69166564941406 
50 / 1452 : pp = 171.2933807373047 
60 / 1452 : pp = 170.08010864257812 
70 / 1452 : pp = 170.6719512939453 
80 / 1452 : pp = 170.07589721679688 
90 / 1452 : pp = 168.7478485107422 
100 / 1452 : pp = 167.57081604003906 
110 / 1452 : pp = 166.06971740722656 
120 / 1452 : pp = 165.73374938964844 
130 / 1452 : pp = 164.80674743652344 
140 / 1452 : pp = 163.32821655273438 
150 / 1452 : pp = 162.6752471923828 
160 / 1452 : pp = 163.02049255371094 
170 / 1452 : pp = 162.64120483398438 
180 / 1452 : pp = 161.95529174804688 
190 / 1452 : pp = 161.91954040527344 
200 / 1452 : pp = 162.5446014404297 
210 / 1452 : pp = 162.2645721435547 
220 / 1452 : pp = 162.3128662109375 
230 / 1452 : pp = 162.65872192382812 
240 / 1452 : pp = 162.40948486328125 
250 / 1452 : pp = 161.75787353515625 
260 / 1452 : pp = 161.15213012695312 
270 / 1452 : pp = 160.22256469726562 
280 / 1452 : pp = 160.3651123046875 
290 / 1452 : pp = 160.63780212402344 
300 / 1452 : pp = 160.80026245117188 
310 / 1452 : pp = 160.54383850097656 
320 / 1452 : pp = 160.7539520263672 
330 / 1452 : pp = 160.94317626953125 
340 / 1452 : pp = 160.3373565673828 
350 / 1452 : pp = 160.71763610839844 
360 / 1452 : pp = 160.60960388183594 
370 / 1452 : pp = 160.37527465820312 
380 / 1452 : pp = 159.92990112304688 
390 / 1452 : pp = 160.0165557861328 
400 / 1452 : pp = 159.75697326660156 
410 / 1452 : pp = 160.15274047851562 
420 / 1452 : pp = 160.48390197753906 
430 / 1452 : pp = 160.4031982421875 
440 / 1452 : pp = 160.4693603515625 
450 / 1452 : pp = 160.28016662597656 
460 / 1452 : pp = 159.94004821777344 
470 / 1452 : pp = 159.48257446289062 
480 / 1452 : pp = 158.87998962402344 
490 / 1452 : pp = 158.59765625 
500 / 1452 : pp = 158.10865783691406 
510 / 1452 : pp = 157.96795654296875 
520 / 1452 : pp = 157.7591552734375 
530 / 1452 : pp = 157.42648315429688 
540 / 1452 : pp = 156.85348510742188 
550 / 1452 : pp = 156.5618438720703 
560 / 1452 : pp = 156.24905395507812 
570 / 1452 : pp = 155.9994354248047 
580 / 1452 : pp = 155.612060546875 
590 / 1452 : pp = 155.25830078125 
600 / 1452 : pp = 154.8464813232422 
610 / 1452 : pp = 154.5833282470703 
620 / 1452 : pp = 154.38040161132812 
630 / 1452 : pp = 154.0767364501953 
640 / 1452 : pp = 154.2534637451172 
650 / 1452 : pp = 154.25875854492188 
660 / 1452 : pp = 154.35874938964844 
670 / 1452 : pp = 154.4289093017578 
680 / 1452 : pp = 154.51412963867188 
690 / 1452 : pp = 154.41676330566406 
700 / 1452 : pp = 154.37892150878906 
710 / 1452 : pp = 154.4234619140625 
720 / 1452 : pp = 154.4586639404297 
730 / 1452 : pp = 154.4351806640625 
740 / 1452 : pp = 154.6002197265625 
750 / 1452 : pp = 154.65684509277344 
760 / 1452 : pp = 154.73318481445312 
770 / 1452 : pp = 154.92935180664062 
780 / 1452 : pp = 155.1021728515625 
790 / 1452 : pp = 155.24757385253906 
800 / 1452 : pp = 155.223876953125 
810 / 1452 : pp = 155.2095184326172 
820 / 1452 : pp = 155.24009704589844 
830 / 1452 : pp = 155.4519500732422 
840 / 1452 : pp = 155.3947296142578 
850 / 1452 : pp = 155.45306396484375 
860 / 1452 : pp = 155.4661102294922 
870 / 1452 : pp = 155.45765686035156 
880 / 1452 : pp = 155.58758544921875 
890 / 1452 : pp = 155.59373474121094 
900 / 1452 : pp = 155.59254455566406 
910 / 1452 : pp = 155.66854858398438 
920 / 1452 : pp = 155.75942993164062 
930 / 1452 : pp = 155.73350524902344 
940 / 1452 : pp = 155.80740356445312 
950 / 1452 : pp = 155.7733917236328 
960 / 1452 : pp = 155.73565673828125 
970 / 1452 : pp = 155.74404907226562 
980 / 1452 : pp = 155.55902099609375 
990 / 1452 : pp = 155.45675659179688 
1000 / 1452 : pp = 155.2649688720703 
1010 / 1452 : pp = 155.31332397460938 
1020 / 1452 : pp = 155.44979858398438 
1030 / 1452 : pp = 155.4137725830078 
1040 / 1452 : pp = 155.49012756347656 
1050 / 1452 : pp = 155.46054077148438 
1060 / 1452 : pp = 155.3616943359375 
1070 / 1452 : pp = 155.5286865234375 
1080 / 1452 : pp = 155.63743591308594 
1090 / 1452 : pp = 155.6842803955078 
1100 / 1452 : pp = 155.65599060058594 
1110 / 1452 : pp = 155.4827880859375 
1120 / 1452 : pp = 155.35450744628906 
1130 / 1452 : pp = 155.1777801513672 
1140 / 1452 : pp = 155.15994262695312 
1150 / 1452 : pp = 155.26193237304688 
1160 / 1452 : pp = 155.26214599609375 
1170 / 1452 : pp = 155.23231506347656 
1180 / 1452 : pp = 155.29266357421875 
1190 / 1452 : pp = 155.37680053710938 
1200 / 1452 : pp = 155.3736114501953 
1210 / 1452 : pp = 155.3380584716797 
1220 / 1452 : pp = 155.474853515625 
1230 / 1452 : pp = 155.62986755371094 
1240 / 1452 : pp = 155.62831115722656 
1250 / 1452 : pp = 155.77101135253906 
1260 / 1452 : pp = 155.83445739746094 
1270 / 1452 : pp = 155.845458984375 
1280 / 1452 : pp = 155.8556365966797 
1290 / 1452 : pp = 155.8556365966797 
1300 / 1452 : pp = 155.8843994140625 
1310 / 1452 : pp = 155.92417907714844 
1320 / 1452 : pp = 155.8560791015625 
1330 / 1452 : pp = 155.80636596679688 
1340 / 1452 : pp = 155.84344482421875 
1350 / 1452 : pp = 155.8706512451172 
1360 / 1452 : pp = 155.9273681640625 
1370 / 1452 : pp = 155.83140563964844 
1380 / 1452 : pp = 155.7911376953125 
1390 / 1452 : pp = 155.7401885986328 
1400 / 1452 : pp = 155.6622314453125 
1410 / 1452 : pp = 155.68531799316406 
1420 / 1452 : pp = 155.64041137695312 
1430 / 1452 : pp = 155.62216186523438 
1440 / 1452 : pp = 155.6437530517578 
1450 / 1452 : pp = 155.62757873535156 

0 / 115 : pp = 228.70111083984375 
10 / 115 : pp = 211.03330993652344 
20 / 115 : pp = 212.24957275390625 
30 / 115 : pp = 209.8839569091797 
40 / 115 : pp = 209.11045837402344 
50 / 115 : pp = 204.66351318359375 
60 / 115 : pp = 204.03366088867188 
70 / 115 : pp = 200.46681213378906 
80 / 115 : pp = 198.24404907226562 
90 / 115 : pp = 195.63223266601562 
100 / 115 : pp = 191.18345642089844 
110 / 115 : pp = 189.31134033203125 
Training perplexity: 155.61154174804688
Validation perplexity: 188.94537353515625
Total time: 42.13483738899231
Epoch 9

0 / 1452 : pp = 197.80628967285156 
10 / 1452 : pp = 172.6316680908203 
20 / 1452 : pp = 168.6739959716797 
30 / 1452 : pp = 164.4781036376953 
40 / 1452 : pp = 166.1627960205078 
50 / 1452 : pp = 163.05197143554688 
60 / 1452 : pp = 161.87924194335938 
70 / 1452 : pp = 162.5297088623047 
80 / 1452 : pp = 161.7450714111328 
90 / 1452 : pp = 160.6148223876953 
100 / 1452 : pp = 159.73289489746094 
110 / 1452 : pp = 158.4092254638672 
120 / 1452 : pp = 158.04653930664062 
130 / 1452 : pp = 157.13563537597656 
140 / 1452 : pp = 155.71798706054688 
150 / 1452 : pp = 155.19161987304688 
160 / 1452 : pp = 155.42718505859375 
170 / 1452 : pp = 155.0531463623047 
180 / 1452 : pp = 154.46897888183594 
190 / 1452 : pp = 154.4127197265625 
200 / 1452 : pp = 154.97154235839844 
210 / 1452 : pp = 154.70169067382812 
220 / 1452 : pp = 154.72816467285156 
230 / 1452 : pp = 155.03799438476562 
240 / 1452 : pp = 154.85601806640625 
250 / 1452 : pp = 154.28016662597656 
260 / 1452 : pp = 153.7699432373047 
270 / 1452 : pp = 152.90948486328125 
280 / 1452 : pp = 153.0459747314453 
290 / 1452 : pp = 153.298095703125 
300 / 1452 : pp = 153.45716857910156 
310 / 1452 : pp = 153.22195434570312 
320 / 1452 : pp = 153.41664123535156 
330 / 1452 : pp = 153.66542053222656 
340 / 1452 : pp = 153.06378173828125 
350 / 1452 : pp = 153.43923950195312 
360 / 1452 : pp = 153.31381225585938 
370 / 1452 : pp = 153.13473510742188 
380 / 1452 : pp = 152.75267028808594 
390 / 1452 : pp = 152.85504150390625 
400 / 1452 : pp = 152.62342834472656 
410 / 1452 : pp = 153.03152465820312 
420 / 1452 : pp = 153.39161682128906 
430 / 1452 : pp = 153.30364990234375 
440 / 1452 : pp = 153.37896728515625 
450 / 1452 : pp = 153.18988037109375 
460 / 1452 : pp = 152.88478088378906 
470 / 1452 : pp = 152.4380340576172 
480 / 1452 : pp = 151.86618041992188 
490 / 1452 : pp = 151.5962371826172 
500 / 1452 : pp = 151.11614990234375 
510 / 1452 : pp = 150.99830627441406 
520 / 1452 : pp = 150.8135986328125 
530 / 1452 : pp = 150.500732421875 
540 / 1452 : pp = 149.9623260498047 
550 / 1452 : pp = 149.68028259277344 
560 / 1452 : pp = 149.3885040283203 
570 / 1452 : pp = 149.140380859375 
580 / 1452 : pp = 148.76876831054688 
590 / 1452 : pp = 148.43368530273438 
600 / 1452 : pp = 148.02598571777344 
610 / 1452 : pp = 147.7869110107422 
620 / 1452 : pp = 147.59796142578125 
630 / 1452 : pp = 147.30068969726562 
640 / 1452 : pp = 147.45240783691406 
650 / 1452 : pp = 147.4651336669922 
660 / 1452 : pp = 147.5808563232422 
670 / 1452 : pp = 147.65582275390625 
680 / 1452 : pp = 147.7360382080078 
690 / 1452 : pp = 147.63075256347656 
700 / 1452 : pp = 147.6066131591797 
710 / 1452 : pp = 147.7024383544922 
720 / 1452 : pp = 147.7445526123047 
730 / 1452 : pp = 147.72279357910156 
740 / 1452 : pp = 147.87107849121094 
750 / 1452 : pp = 147.91436767578125 
760 / 1452 : pp = 147.9857635498047 
770 / 1452 : pp = 148.18206787109375 
780 / 1452 : pp = 148.3845672607422 
790 / 1452 : pp = 148.5517120361328 
800 / 1452 : pp = 148.54002380371094 
810 / 1452 : pp = 148.51119995117188 
820 / 1452 : pp = 148.5664520263672 
830 / 1452 : pp = 148.7821044921875 
840 / 1452 : pp = 148.72486877441406 
850 / 1452 : pp = 148.77452087402344 
860 / 1452 : pp = 148.80076599121094 
870 / 1452 : pp = 148.79701232910156 
880 / 1452 : pp = 148.9181671142578 
890 / 1452 : pp = 148.94537353515625 
900 / 1452 : pp = 148.9435272216797 
910 / 1452 : pp = 149.02102661132812 
920 / 1452 : pp = 149.1085968017578 
930 / 1452 : pp = 149.06893920898438 
940 / 1452 : pp = 149.1317138671875 
950 / 1452 : pp = 149.1232452392578 
960 / 1452 : pp = 149.10354614257812 
970 / 1452 : pp = 149.11656188964844 
980 / 1452 : pp = 148.94259643554688 
990 / 1452 : pp = 148.8236846923828 
1000 / 1452 : pp = 148.633056640625 
1010 / 1452 : pp = 148.6830291748047 
1020 / 1452 : pp = 148.8126220703125 
1030 / 1452 : pp = 148.78089904785156 
1040 / 1452 : pp = 148.8600311279297 
1050 / 1452 : pp = 148.8486785888672 
1060 / 1452 : pp = 148.7664337158203 
1070 / 1452 : pp = 148.9337921142578 
1080 / 1452 : pp = 149.04441833496094 
1090 / 1452 : pp = 149.07284545898438 
1100 / 1452 : pp = 149.03318786621094 
1110 / 1452 : pp = 148.86428833007812 
1120 / 1452 : pp = 148.7332305908203 
1130 / 1452 : pp = 148.5670166015625 
1140 / 1452 : pp = 148.54661560058594 
1150 / 1452 : pp = 148.64219665527344 
1160 / 1452 : pp = 148.6490020751953 
1170 / 1452 : pp = 148.62420654296875 
1180 / 1452 : pp = 148.67665100097656 
1190 / 1452 : pp = 148.7633056640625 
1200 / 1452 : pp = 148.7782745361328 
1210 / 1452 : pp = 148.72500610351562 
1220 / 1452 : pp = 148.87493896484375 
1230 / 1452 : pp = 149.039794921875 
1240 / 1452 : pp = 149.04000854492188 
1250 / 1452 : pp = 149.17054748535156 
1260 / 1452 : pp = 149.23863220214844 
1270 / 1452 : pp = 149.2436065673828 
1280 / 1452 : pp = 149.25086975097656 
1290 / 1452 : pp = 149.24147033691406 
1300 / 1452 : pp = 149.27413940429688 
1310 / 1452 : pp = 149.32077026367188 
1320 / 1452 : pp = 149.27301025390625 
1330 / 1452 : pp = 149.23080444335938 
1340 / 1452 : pp = 149.25791931152344 
1350 / 1452 : pp = 149.2841033935547 
1360 / 1452 : pp = 149.337158203125 
1370 / 1452 : pp = 149.2467498779297 
1380 / 1452 : pp = 149.21351623535156 
1390 / 1452 : pp = 149.15403747558594 
1400 / 1452 : pp = 149.0877685546875 
1410 / 1452 : pp = 149.110595703125 
1420 / 1452 : pp = 149.07241821289062 
1430 / 1452 : pp = 149.05166625976562 
1440 / 1452 : pp = 149.0776824951172 
1450 / 1452 : pp = 149.06771850585938 

0 / 115 : pp = 227.0559844970703 
10 / 115 : pp = 208.7002410888672 
20 / 115 : pp = 210.38775634765625 
30 / 115 : pp = 207.9513397216797 
40 / 115 : pp = 207.12994384765625 
50 / 115 : pp = 202.70811462402344 
60 / 115 : pp = 202.05787658691406 
70 / 115 : pp = 198.3761444091797 
80 / 115 : pp = 196.17637634277344 
90 / 115 : pp = 193.5880126953125 
100 / 115 : pp = 189.0758819580078 
110 / 115 : pp = 187.07528686523438 
Training perplexity: 149.0502471923828
Validation perplexity: 186.6911163330078
Total time: 47.274805545806885
Epoch 10

0 / 1452 : pp = 181.8408203125 
10 / 1452 : pp = 164.99664306640625 
20 / 1452 : pp = 161.8847198486328 
30 / 1452 : pp = 158.30064392089844 
40 / 1452 : pp = 160.13914489746094 
50 / 1452 : pp = 157.58743286132812 
60 / 1452 : pp = 156.11871337890625 
70 / 1452 : pp = 156.82948303222656 
80 / 1452 : pp = 156.2889862060547 
90 / 1452 : pp = 155.04833984375 
100 / 1452 : pp = 154.09327697753906 
110 / 1452 : pp = 152.5070343017578 
120 / 1452 : pp = 152.20750427246094 
130 / 1452 : pp = 151.3399200439453 
140 / 1452 : pp = 149.90740966796875 
150 / 1452 : pp = 149.345703125 
160 / 1452 : pp = 149.59814453125 
170 / 1452 : pp = 149.26539611816406 
180 / 1452 : pp = 148.624267578125 
190 / 1452 : pp = 148.58819580078125 
200 / 1452 : pp = 149.09552001953125 
210 / 1452 : pp = 148.8439178466797 
220 / 1452 : pp = 148.86605834960938 
230 / 1452 : pp = 149.1971435546875 
240 / 1452 : pp = 148.96533203125 
250 / 1452 : pp = 148.4253387451172 
260 / 1452 : pp = 147.9200897216797 
270 / 1452 : pp = 147.08816528320312 
280 / 1452 : pp = 147.24366760253906 
290 / 1452 : pp = 147.52182006835938 
300 / 1452 : pp = 147.72222900390625 
310 / 1452 : pp = 147.50486755371094 
320 / 1452 : pp = 147.73892211914062 
330 / 1452 : pp = 147.9404754638672 
340 / 1452 : pp = 147.37803649902344 
350 / 1452 : pp = 147.6969451904297 
360 / 1452 : pp = 147.5704345703125 
370 / 1452 : pp = 147.38674926757812 
380 / 1452 : pp = 147.03970336914062 
390 / 1452 : pp = 147.14231872558594 
400 / 1452 : pp = 146.91656494140625 
410 / 1452 : pp = 147.34059143066406 
420 / 1452 : pp = 147.68496704101562 
430 / 1452 : pp = 147.61195373535156 
440 / 1452 : pp = 147.68405151367188 
450 / 1452 : pp = 147.4711151123047 
460 / 1452 : pp = 147.1927032470703 
470 / 1452 : pp = 146.72970581054688 
480 / 1452 : pp = 146.17173767089844 
490 / 1452 : pp = 145.9028778076172 
500 / 1452 : pp = 145.42721557617188 
510 / 1452 : pp = 145.3111114501953 
520 / 1452 : pp = 145.11460876464844 
530 / 1452 : pp = 144.81488037109375 
540 / 1452 : pp = 144.263916015625 
550 / 1452 : pp = 143.997802734375 
560 / 1452 : pp = 143.71766662597656 
570 / 1452 : pp = 143.47451782226562 
580 / 1452 : pp = 143.08474731445312 
590 / 1452 : pp = 142.77920532226562 
600 / 1452 : pp = 142.39573669433594 
610 / 1452 : pp = 142.14906311035156 
620 / 1452 : pp = 141.9574432373047 
630 / 1452 : pp = 141.67369079589844 
640 / 1452 : pp = 141.81556701660156 
650 / 1452 : pp = 141.81759643554688 
660 / 1452 : pp = 141.9339599609375 
670 / 1452 : pp = 142.01248168945312 
680 / 1452 : pp = 142.08773803710938 
690 / 1452 : pp = 142.00328063964844 
700 / 1452 : pp = 141.98086547851562 
710 / 1452 : pp = 142.0632781982422 
720 / 1452 : pp = 142.10372924804688 
730 / 1452 : pp = 142.08055114746094 
740 / 1452 : pp = 142.23619079589844 
750 / 1452 : pp = 142.2660369873047 
760 / 1452 : pp = 142.34678649902344 
770 / 1452 : pp = 142.5257568359375 
780 / 1452 : pp = 142.70025634765625 
790 / 1452 : pp = 142.8614044189453 
800 / 1452 : pp = 142.84573364257812 
810 / 1452 : pp = 142.8250274658203 
820 / 1452 : pp = 142.8540496826172 
830 / 1452 : pp = 143.06053161621094 
840 / 1452 : pp = 143.0423126220703 
850 / 1452 : pp = 143.09634399414062 
860 / 1452 : pp = 143.10487365722656 
870 / 1452 : pp = 143.0884246826172 
880 / 1452 : pp = 143.19387817382812 
890 / 1452 : pp = 143.236083984375 
900 / 1452 : pp = 143.23390197753906 
910 / 1452 : pp = 143.29537963867188 
920 / 1452 : pp = 143.3722686767578 
930 / 1452 : pp = 143.33795166015625 
940 / 1452 : pp = 143.40618896484375 
950 / 1452 : pp = 143.3929901123047 
960 / 1452 : pp = 143.3693389892578 
970 / 1452 : pp = 143.39736938476562 
980 / 1452 : pp = 143.2371063232422 
990 / 1452 : pp = 143.13893127441406 
1000 / 1452 : pp = 142.9658660888672 
1010 / 1452 : pp = 143.01544189453125 
1020 / 1452 : pp = 143.152587890625 
1030 / 1452 : pp = 143.11334228515625 
1040 / 1452 : pp = 143.19020080566406 
1050 / 1452 : pp = 143.18234252929688 
1060 / 1452 : pp = 143.092041015625 
1070 / 1452 : pp = 143.24449157714844 
1080 / 1452 : pp = 143.34828186035156 
1090 / 1452 : pp = 143.38739013671875 
1100 / 1452 : pp = 143.37432861328125 
1110 / 1452 : pp = 143.20596313476562 
1120 / 1452 : pp = 143.07969665527344 
1130 / 1452 : pp = 142.92041015625 
1140 / 1452 : pp = 142.90902709960938 
1150 / 1452 : pp = 143.00732421875 
1160 / 1452 : pp = 143.01182556152344 
1170 / 1452 : pp = 142.9925994873047 
1180 / 1452 : pp = 143.06080627441406 
1190 / 1452 : pp = 143.14337158203125 
1200 / 1452 : pp = 143.16644287109375 
1210 / 1452 : pp = 143.1259002685547 
1220 / 1452 : pp = 143.2671661376953 
1230 / 1452 : pp = 143.4210968017578 
1240 / 1452 : pp = 143.4327850341797 
1250 / 1452 : pp = 143.5699920654297 
1260 / 1452 : pp = 143.63771057128906 
1270 / 1452 : pp = 143.65798950195312 
1280 / 1452 : pp = 143.68251037597656 
1290 / 1452 : pp = 143.68045043945312 
1300 / 1452 : pp = 143.72293090820312 
1310 / 1452 : pp = 143.77015686035156 
1320 / 1452 : pp = 143.71910095214844 
1330 / 1452 : pp = 143.68792724609375 
1340 / 1452 : pp = 143.7241668701172 
1350 / 1452 : pp = 143.7570037841797 
1360 / 1452 : pp = 143.81829833984375 
1370 / 1452 : pp = 143.7487030029297 
1380 / 1452 : pp = 143.7196502685547 
1390 / 1452 : pp = 143.67359924316406 
1400 / 1452 : pp = 143.60592651367188 
1410 / 1452 : pp = 143.62620544433594 
1420 / 1452 : pp = 143.5905303955078 
1430 / 1452 : pp = 143.55799865722656 
1440 / 1452 : pp = 143.5891571044922 
1450 / 1452 : pp = 143.5869598388672 

0 / 115 : pp = 226.9864959716797 
10 / 115 : pp = 207.8067169189453 
20 / 115 : pp = 209.68667602539062 
30 / 115 : pp = 207.1610565185547 
40 / 115 : pp = 206.3247833251953 
50 / 115 : pp = 201.77403259277344 
60 / 115 : pp = 201.07098388671875 
70 / 115 : pp = 197.33335876464844 
80 / 115 : pp = 195.12513732910156 
90 / 115 : pp = 192.5349578857422 
100 / 115 : pp = 187.90072631835938 
110 / 115 : pp = 185.81240844726562 
Training perplexity: 143.57354736328125
Validation perplexity: 185.40573120117188
Total time: 46.14846849441528
Epoch 11

0 / 1452 : pp = 181.93162536621094 
10 / 1452 : pp = 159.94607543945312 
20 / 1452 : pp = 156.83673095703125 
30 / 1452 : pp = 153.75843811035156 
40 / 1452 : pp = 155.18362426757812 
50 / 1452 : pp = 152.39529418945312 
60 / 1452 : pp = 151.18772888183594 
70 / 1452 : pp = 151.9004364013672 
80 / 1452 : pp = 151.30239868164062 
90 / 1452 : pp = 150.1591033935547 
100 / 1452 : pp = 149.18618774414062 
110 / 1452 : pp = 147.72653198242188 
120 / 1452 : pp = 147.4357452392578 
130 / 1452 : pp = 146.41372680664062 
140 / 1452 : pp = 145.0057373046875 
150 / 1452 : pp = 144.39447021484375 
160 / 1452 : pp = 144.5330047607422 
170 / 1452 : pp = 144.23593139648438 
180 / 1452 : pp = 143.63990783691406 
190 / 1452 : pp = 143.63812255859375 
200 / 1452 : pp = 144.1143798828125 
210 / 1452 : pp = 143.88278198242188 
220 / 1452 : pp = 143.92518615722656 
230 / 1452 : pp = 144.24032592773438 
240 / 1452 : pp = 143.94110107421875 
250 / 1452 : pp = 143.3688507080078 
260 / 1452 : pp = 142.8829345703125 
270 / 1452 : pp = 142.11952209472656 
280 / 1452 : pp = 142.19415283203125 
290 / 1452 : pp = 142.51889038085938 
300 / 1452 : pp = 142.70494079589844 
310 / 1452 : pp = 142.51426696777344 
320 / 1452 : pp = 142.70106506347656 
330 / 1452 : pp = 142.88014221191406 
340 / 1452 : pp = 142.3287353515625 
350 / 1452 : pp = 142.6169891357422 
360 / 1452 : pp = 142.51971435546875 
370 / 1452 : pp = 142.33566284179688 
380 / 1452 : pp = 142.04161071777344 
390 / 1452 : pp = 142.13551330566406 
400 / 1452 : pp = 141.9499969482422 
410 / 1452 : pp = 142.3361358642578 
420 / 1452 : pp = 142.64065551757812 
430 / 1452 : pp = 142.5511016845703 
440 / 1452 : pp = 142.6728973388672 
450 / 1452 : pp = 142.47030639648438 
460 / 1452 : pp = 142.1704864501953 
470 / 1452 : pp = 141.73390197753906 
480 / 1452 : pp = 141.23020935058594 
490 / 1452 : pp = 140.9759521484375 
500 / 1452 : pp = 140.51609802246094 
510 / 1452 : pp = 140.40545654296875 
520 / 1452 : pp = 140.1936492919922 
530 / 1452 : pp = 139.8929443359375 
540 / 1452 : pp = 139.3696746826172 
550 / 1452 : pp = 139.13217163085938 
560 / 1452 : pp = 138.85247802734375 
570 / 1452 : pp = 138.6092987060547 
580 / 1452 : pp = 138.2471160888672 
590 / 1452 : pp = 137.9485626220703 
600 / 1452 : pp = 137.57379150390625 
610 / 1452 : pp = 137.31576538085938 
620 / 1452 : pp = 137.14230346679688 
630 / 1452 : pp = 136.87405395507812 
640 / 1452 : pp = 137.02928161621094 
650 / 1452 : pp = 137.0481719970703 
660 / 1452 : pp = 137.1595001220703 
670 / 1452 : pp = 137.21124267578125 
680 / 1452 : pp = 137.2671356201172 
690 / 1452 : pp = 137.19410705566406 
700 / 1452 : pp = 137.1850128173828 
710 / 1452 : pp = 137.26058959960938 
720 / 1452 : pp = 137.30726623535156 
730 / 1452 : pp = 137.28048706054688 
740 / 1452 : pp = 137.4352569580078 
750 / 1452 : pp = 137.4680938720703 
760 / 1452 : pp = 137.5524139404297 
770 / 1452 : pp = 137.73829650878906 
780 / 1452 : pp = 137.90882873535156 
790 / 1452 : pp = 138.05865478515625 
800 / 1452 : pp = 138.0673370361328 
810 / 1452 : pp = 138.03909301757812 
820 / 1452 : pp = 138.084716796875 
830 / 1452 : pp = 138.27989196777344 
840 / 1452 : pp = 138.23545837402344 
850 / 1452 : pp = 138.30343627929688 
860 / 1452 : pp = 138.3339080810547 
870 / 1452 : pp = 138.32835388183594 
880 / 1452 : pp = 138.4450225830078 
890 / 1452 : pp = 138.47157287597656 
900 / 1452 : pp = 138.46304321289062 
910 / 1452 : pp = 138.55618286132812 
920 / 1452 : pp = 138.64512634277344 
930 / 1452 : pp = 138.6160430908203 
940 / 1452 : pp = 138.66932678222656 
950 / 1452 : pp = 138.6573028564453 
960 / 1452 : pp = 138.6463165283203 
970 / 1452 : pp = 138.67059326171875 
980 / 1452 : pp = 138.50999450683594 
990 / 1452 : pp = 138.42430114746094 
1000 / 1452 : pp = 138.25344848632812 
1010 / 1452 : pp = 138.3004608154297 
1020 / 1452 : pp = 138.4243621826172 
1030 / 1452 : pp = 138.40713500976562 
1040 / 1452 : pp = 138.47129821777344 
1050 / 1452 : pp = 138.45928955078125 
1060 / 1452 : pp = 138.3919677734375 
1070 / 1452 : pp = 138.5287628173828 
1080 / 1452 : pp = 138.62298583984375 
1090 / 1452 : pp = 138.6699981689453 
1100 / 1452 : pp = 138.64849853515625 
1110 / 1452 : pp = 138.49191284179688 
1120 / 1452 : pp = 138.37355041503906 
1130 / 1452 : pp = 138.2216796875 
1140 / 1452 : pp = 138.21534729003906 
1150 / 1452 : pp = 138.30963134765625 
1160 / 1452 : pp = 138.316162109375 
1170 / 1452 : pp = 138.3023681640625 
1180 / 1452 : pp = 138.36932373046875 
1190 / 1452 : pp = 138.45960998535156 
1200 / 1452 : pp = 138.4866180419922 
1210 / 1452 : pp = 138.45730590820312 
1220 / 1452 : pp = 138.60031127929688 
1230 / 1452 : pp = 138.75485229492188 
1240 / 1452 : pp = 138.7751007080078 
1250 / 1452 : pp = 138.91221618652344 
1260 / 1452 : pp = 138.9815216064453 
1270 / 1452 : pp = 138.9919891357422 
1280 / 1452 : pp = 139.0243377685547 
1290 / 1452 : pp = 139.02725219726562 
1300 / 1452 : pp = 139.0701446533203 
1310 / 1452 : pp = 139.1090850830078 
1320 / 1452 : pp = 139.06027221679688 
1330 / 1452 : pp = 139.0338134765625 
1340 / 1452 : pp = 139.06385803222656 
1350 / 1452 : pp = 139.09608459472656 
1360 / 1452 : pp = 139.1609649658203 
1370 / 1452 : pp = 139.0869903564453 
1380 / 1452 : pp = 139.0604705810547 
1390 / 1452 : pp = 139.01670837402344 
1400 / 1452 : pp = 138.94393920898438 
1410 / 1452 : pp = 138.97323608398438 
1420 / 1452 : pp = 138.9404296875 
1430 / 1452 : pp = 138.90943908691406 
1440 / 1452 : pp = 138.94268798828125 
1450 / 1452 : pp = 138.93991088867188 

0 / 115 : pp = 225.55990600585938 
10 / 115 : pp = 207.0504608154297 
20 / 115 : pp = 208.98306274414062 
30 / 115 : pp = 206.28396606445312 
40 / 115 : pp = 205.35386657714844 
50 / 115 : pp = 200.7255401611328 
60 / 115 : pp = 200.0526580810547 
70 / 115 : pp = 196.33087158203125 
80 / 115 : pp = 194.12110900878906 
90 / 115 : pp = 191.52816772460938 
100 / 115 : pp = 186.7974395751953 
110 / 115 : pp = 184.59829711914062 
Training perplexity: 138.9222869873047
Validation perplexity: 184.18101501464844
Total time: 43.92928600311279
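The `pp` values in this log come from `calculate_perplexity` in `utils`. Perplexity is just the exponential of the mean per-word cross-entropy loss, which is why it falls steadily as training loss decreases. A minimal sketch of that relationship (assuming natural-log losses; the helper name `perplexity` here is illustrative, not the actual `utils` function):

```python
import numpy as np

def perplexity(cross_entropy_losses):
    """Perplexity = exp(mean per-word cross-entropy loss)."""
    return float(np.exp(np.mean(cross_entropy_losses)))

# A loss of 0 means perfect prediction: perplexity 1 (no uncertainty).
# A loss of log(V) corresponds to a uniform guess over V words: perplexity V.
print(perplexity([0.0]))          # 1.0
print(perplexity([np.log(150)]))  # ~150.0
```

This also explains why the training perplexity (e.g. 138.92) is consistently lower than the validation perplexity (184.18): the model fits the training distribution better than unseen text, and the gap between the two is a direct measure of overfitting.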
Epoch 12

0 / 1452 : pp = 173.0251007080078 
10 / 1452 : pp = 152.98446655273438 
20 / 1452 : pp = 150.43128967285156 
30 / 1452 : pp = 147.5819854736328 
40 / 1452 : pp = 149.4164276123047 
50 / 1452 : pp = 146.70816040039062 
60 / 1452 : pp = 145.557861328125 
70 / 1452 : pp = 146.50473022460938 
80 / 1452 : pp = 145.83200073242188 
90 / 1452 : pp = 144.84402465820312 
100 / 1452 : pp = 144.0390167236328 
110 / 1452 : pp = 142.66514587402344 
120 / 1452 : pp = 142.3549346923828 
130 / 1452 : pp = 141.4630126953125 
140 / 1452 : pp = 140.2266082763672 
150 / 1452 : pp = 139.67518615722656 
160 / 1452 : pp = 139.90414428710938 
170 / 1452 : pp = 139.5490264892578 
180 / 1452 : pp = 138.91969299316406 
190 / 1452 : pp = 138.89234924316406 
200 / 1452 : pp = 139.40908813476562 
210 / 1452 : pp = 139.19068908691406 
220 / 1452 : pp = 139.35513305664062 
230 / 1452 : pp = 139.5464324951172 
240 / 1452 : pp = 139.3047637939453 
250 / 1452 : pp = 138.7708740234375 
260 / 1452 : pp = 138.29188537597656 
270 / 1452 : pp = 137.4787139892578 
280 / 1452 : pp = 137.6367950439453 
290 / 1452 : pp = 137.98513793945312 
300 / 1452 : pp = 138.17819213867188 
310 / 1452 : pp = 137.943359375 
320 / 1452 : pp = 138.12060546875 
330 / 1452 : pp = 138.29037475585938 
340 / 1452 : pp = 137.77606201171875 
350 / 1452 : pp = 138.06378173828125 
360 / 1452 : pp = 137.99000549316406 
370 / 1452 : pp = 137.81922912597656 
380 / 1452 : pp = 137.52159118652344 
390 / 1452 : pp = 137.61782836914062 
400 / 1452 : pp = 137.4178924560547 
410 / 1452 : pp = 137.82632446289062 
420 / 1452 : pp = 138.17567443847656 
430 / 1452 : pp = 138.11863708496094 
440 / 1452 : pp = 138.215087890625 
450 / 1452 : pp = 137.9976348876953 
460 / 1452 : pp = 137.6929168701172 
470 / 1452 : pp = 137.25416564941406 
480 / 1452 : pp = 136.75140380859375 
490 / 1452 : pp = 136.51712036132812 
500 / 1452 : pp = 136.0896453857422 
510 / 1452 : pp = 135.97048950195312 
520 / 1452 : pp = 135.7760009765625 
530 / 1452 : pp = 135.50389099121094 
540 / 1452 : pp = 135.01437377929688 
550 / 1452 : pp = 134.7666015625 
560 / 1452 : pp = 134.48973083496094 
570 / 1452 : pp = 134.22853088378906 
580 / 1452 : pp = 133.88455200195312 
590 / 1452 : pp = 133.5808868408203 
600 / 1452 : pp = 133.22975158691406 
610 / 1452 : pp = 132.99591064453125 
620 / 1452 : pp = 132.79502868652344 
630 / 1452 : pp = 132.5094451904297 
640 / 1452 : pp = 132.62892150878906 
650 / 1452 : pp = 132.63499450683594 
660 / 1452 : pp = 132.7379913330078 
670 / 1452 : pp = 132.79046630859375 
680 / 1452 : pp = 132.85842895507812 
690 / 1452 : pp = 132.80364990234375 
700 / 1452 : pp = 132.80477905273438 
710 / 1452 : pp = 132.90170288085938 
720 / 1452 : pp = 132.92971801757812 
730 / 1452 : pp = 132.9019012451172 
740 / 1452 : pp = 133.04811096191406 
750 / 1452 : pp = 133.10877990722656 
760 / 1452 : pp = 133.19189453125 
770 / 1452 : pp = 133.3564910888672 
780 / 1452 : pp = 133.54000854492188 
790 / 1452 : pp = 133.69239807128906 
800 / 1452 : pp = 133.68495178222656 
810 / 1452 : pp = 133.67971801757812 
820 / 1452 : pp = 133.7035675048828 
830 / 1452 : pp = 133.89329528808594 
840 / 1452 : pp = 133.850341796875 
850 / 1452 : pp = 133.90390014648438 
860 / 1452 : pp = 133.9090118408203 
870 / 1452 : pp = 133.89974975585938 
880 / 1452 : pp = 134.0077667236328 
890 / 1452 : pp = 134.03485107421875 
900 / 1452 : pp = 134.0261688232422 
910 / 1452 : pp = 134.10255432128906 
920 / 1452 : pp = 134.17291259765625 
930 / 1452 : pp = 134.14796447753906 
940 / 1452 : pp = 134.20925903320312 
950 / 1452 : pp = 134.19281005859375 
960 / 1452 : pp = 134.17745971679688 
970 / 1452 : pp = 134.18653869628906 
980 / 1452 : pp = 134.03192138671875 
990 / 1452 : pp = 133.94349670410156 
1000 / 1452 : pp = 133.79685974121094 
1010 / 1452 : pp = 133.8438262939453 
1020 / 1452 : pp = 133.9608612060547 
1030 / 1452 : pp = 133.93934631347656 
1040 / 1452 : pp = 134.02833557128906 
1050 / 1452 : pp = 134.01734924316406 
1060 / 1452 : pp = 133.95346069335938 
1070 / 1452 : pp = 134.10205078125 
1080 / 1452 : pp = 134.2030487060547 
1090 / 1452 : pp = 134.23696899414062 
1100 / 1452 : pp = 134.2230224609375 
1110 / 1452 : pp = 134.0829315185547 
1120 / 1452 : pp = 133.980224609375 
1130 / 1452 : pp = 133.83815002441406 
1140 / 1452 : pp = 133.8366241455078 
1150 / 1452 : pp = 133.92108154296875 
1160 / 1452 : pp = 133.94375610351562 
1170 / 1452 : pp = 133.9360809326172 
1180 / 1452 : pp = 133.99684143066406 
1190 / 1452 : pp = 134.0944366455078 
1200 / 1452 : pp = 134.11676025390625 
1210 / 1452 : pp = 134.0911102294922 
1220 / 1452 : pp = 134.22763061523438 
1230 / 1452 : pp = 134.38043212890625 
1240 / 1452 : pp = 134.39817810058594 
1250 / 1452 : pp = 134.5367431640625 
1260 / 1452 : pp = 134.593017578125 
1270 / 1452 : pp = 134.61497497558594 
1280 / 1452 : pp = 134.6423797607422 
1290 / 1452 : pp = 134.64340209960938 
1300 / 1452 : pp = 134.68026733398438 
1310 / 1452 : pp = 134.73556518554688 
1320 / 1452 : pp = 134.69021606445312 
1330 / 1452 : pp = 134.66131591796875 
1340 / 1452 : pp = 134.69393920898438 
1350 / 1452 : pp = 134.7328643798828 
1360 / 1452 : pp = 134.79405212402344 
1370 / 1452 : pp = 134.71237182617188 
1380 / 1452 : pp = 134.6885528564453 
1390 / 1452 : pp = 134.65110778808594 
1400 / 1452 : pp = 134.59584045410156 
1410 / 1452 : pp = 134.6193389892578 
1420 / 1452 : pp = 134.58338928222656 
1430 / 1452 : pp = 134.559326171875 
1440 / 1452 : pp = 134.59507751464844 
1450 / 1452 : pp = 134.59365844726562 

0 / 115 : pp = 226.0741729736328 
10 / 115 : pp = 207.00494384765625 
20 / 115 : pp = 209.26976013183594 
30 / 115 : pp = 206.44662475585938 
40 / 115 : pp = 205.47268676757812 
50 / 115 : pp = 200.7876739501953 
60 / 115 : pp = 200.13414001464844 
70 / 115 : pp = 196.35549926757812 
80 / 115 : pp = 194.10777282714844 
90 / 115 : pp = 191.47467041015625 
100 / 115 : pp = 186.61351013183594 
110 / 115 : pp = 184.30374145507812 
Training perplexity: 134.57826232910156
Validation perplexity:183.8900146484375
Total time : 45.410256147384644
Epoch 13

0 / 1452 : pp = 169.39393615722656 
10 / 1452 : pp = 150.13232421875 
20 / 1452 : pp = 147.60450744628906 
30 / 1452 : pp = 144.64317321777344 
40 / 1452 : pp = 146.47427368164062 
50 / 1452 : pp = 143.929443359375 
60 / 1452 : pp = 142.8344268798828 
70 / 1452 : pp = 143.45248413085938 
80 / 1452 : pp = 142.5418701171875 
90 / 1452 : pp = 141.6178436279297 
100 / 1452 : pp = 140.70127868652344 
110 / 1452 : pp = 139.2852325439453 
120 / 1452 : pp = 138.8017120361328 
130 / 1452 : pp = 137.85629272460938 
140 / 1452 : pp = 136.51718139648438 
150 / 1452 : pp = 136.03619384765625 
160 / 1452 : pp = 136.154296875 
170 / 1452 : pp = 135.67037963867188 
180 / 1452 : pp = 135.0376739501953 
190 / 1452 : pp = 134.9230499267578 
200 / 1452 : pp = 135.4241180419922 
210 / 1452 : pp = 135.24581909179688 
220 / 1452 : pp = 135.37957763671875 
230 / 1452 : pp = 135.67652893066406 
240 / 1452 : pp = 135.4161834716797 
250 / 1452 : pp = 134.90895080566406 
260 / 1452 : pp = 134.46754455566406 
270 / 1452 : pp = 133.68577575683594 
280 / 1452 : pp = 133.86770629882812 
290 / 1452 : pp = 134.18475341796875 
300 / 1452 : pp = 134.39132690429688 
310 / 1452 : pp = 134.19985961914062 
320 / 1452 : pp = 134.37998962402344 
330 / 1452 : pp = 134.5557403564453 
340 / 1452 : pp = 134.00686645507812 
350 / 1452 : pp = 134.27749633789062 
360 / 1452 : pp = 134.20286560058594 
370 / 1452 : pp = 134.042724609375 
380 / 1452 : pp = 133.74398803710938 
390 / 1452 : pp = 133.83584594726562 
400 / 1452 : pp = 133.64382934570312 
410 / 1452 : pp = 134.02366638183594 
420 / 1452 : pp = 134.35415649414062 
430 / 1452 : pp = 134.310546875 
440 / 1452 : pp = 134.3634490966797 
450 / 1452 : pp = 134.15602111816406 
460 / 1452 : pp = 133.86578369140625 
470 / 1452 : pp = 133.43414306640625 
480 / 1452 : pp = 132.90310668945312 
490 / 1452 : pp = 132.646240234375 
500 / 1452 : pp = 132.1982421875 
510 / 1452 : pp = 132.04200744628906 
520 / 1452 : pp = 131.86940002441406 
530 / 1452 : pp = 131.59841918945312 
540 / 1452 : pp = 131.12356567382812 
550 / 1452 : pp = 130.887939453125 
560 / 1452 : pp = 130.6210174560547 
570 / 1452 : pp = 130.37826538085938 
580 / 1452 : pp = 130.0374755859375 
590 / 1452 : pp = 129.75979614257812 
600 / 1452 : pp = 129.38308715820312 
610 / 1452 : pp = 129.16685485839844 
620 / 1452 : pp = 129.0115509033203 
630 / 1452 : pp = 128.75152587890625 
640 / 1452 : pp = 128.87295532226562 
650 / 1452 : pp = 128.88734436035156 
660 / 1452 : pp = 128.98275756835938 
670 / 1452 : pp = 129.0487060546875 
680 / 1452 : pp = 129.11013793945312 
690 / 1452 : pp = 129.0646514892578 
700 / 1452 : pp = 129.06280517578125 
710 / 1452 : pp = 129.1343994140625 
720 / 1452 : pp = 129.18582153320312 
730 / 1452 : pp = 129.15138244628906 
740 / 1452 : pp = 129.29811096191406 
750 / 1452 : pp = 129.339599609375 
760 / 1452 : pp = 129.4257354736328 
770 / 1452 : pp = 129.61631774902344 
780 / 1452 : pp = 129.802734375 
790 / 1452 : pp = 129.96804809570312 
800 / 1452 : pp = 129.95187377929688 
810 / 1452 : pp = 129.92417907714844 
820 / 1452 : pp = 129.9774627685547 
830 / 1452 : pp = 130.1638946533203 
840 / 1452 : pp = 130.13095092773438 
850 / 1452 : pp = 130.16595458984375 
860 / 1452 : pp = 130.173828125 
870 / 1452 : pp = 130.170166015625 
880 / 1452 : pp = 130.27032470703125 
890 / 1452 : pp = 130.3022003173828 
900 / 1452 : pp = 130.3071746826172 
910 / 1452 : pp = 130.37939453125 
920 / 1452 : pp = 130.46229553222656 
930 / 1452 : pp = 130.43846130371094 
940 / 1452 : pp = 130.50889587402344 
950 / 1452 : pp = 130.50086975097656 
960 / 1452 : pp = 130.4833221435547 
970 / 1452 : pp = 130.50814819335938 
980 / 1452 : pp = 130.35577392578125 
990 / 1452 : pp = 130.26759338378906 
1000 / 1452 : pp = 130.1064453125 
1010 / 1452 : pp = 130.1472625732422 
1020 / 1452 : pp = 130.27169799804688 
1030 / 1452 : pp = 130.25100708007812 
1040 / 1452 : pp = 130.30816650390625 
1050 / 1452 : pp = 130.29803466796875 
1060 / 1452 : pp = 130.2242431640625 
1070 / 1452 : pp = 130.35906982421875 
1080 / 1452 : pp = 130.45103454589844 
1090 / 1452 : pp = 130.49838256835938 
1100 / 1452 : pp = 130.484130859375 
1110 / 1452 : pp = 130.35316467285156 
1120 / 1452 : pp = 130.24697875976562 
1130 / 1452 : pp = 130.10804748535156 
1140 / 1452 : pp = 130.1076202392578 
1150 / 1452 : pp = 130.195068359375 
1160 / 1452 : pp = 130.19674682617188 
1170 / 1452 : pp = 130.18321228027344 
1180 / 1452 : pp = 130.24623107910156 
1190 / 1452 : pp = 130.33905029296875 
1200 / 1452 : pp = 130.3650360107422 
1210 / 1452 : pp = 130.34588623046875 
1220 / 1452 : pp = 130.4850616455078 
1230 / 1452 : pp = 130.63160705566406 
1240 / 1452 : pp = 130.64674377441406 
1250 / 1452 : pp = 130.77078247070312 
1260 / 1452 : pp = 130.8397674560547 
1270 / 1452 : pp = 130.8511199951172 
1280 / 1452 : pp = 130.88967895507812 
1290 / 1452 : pp = 130.9040985107422 
1300 / 1452 : pp = 130.93511962890625 
1310 / 1452 : pp = 130.9759063720703 
1320 / 1452 : pp = 130.92800903320312 
1330 / 1452 : pp = 130.9105224609375 
1340 / 1452 : pp = 130.929443359375 
1350 / 1452 : pp = 130.96153259277344 
1360 / 1452 : pp = 131.02381896972656 
1370 / 1452 : pp = 130.9545440673828 
1380 / 1452 : pp = 130.9344940185547 
1390 / 1452 : pp = 130.9055938720703 
1400 / 1452 : pp = 130.85386657714844 
1410 / 1452 : pp = 130.8874969482422 
1420 / 1452 : pp = 130.85928344726562 
1430 / 1452 : pp = 130.83995056152344 
1440 / 1452 : pp = 130.86659240722656 
1450 / 1452 : pp = 130.86839294433594 

0 / 115 : pp = 227.78428649902344 
10 / 115 : pp = 207.609619140625 
20 / 115 : pp = 209.92459106445312 
30 / 115 : pp = 206.96240234375 
40 / 115 : pp = 205.9295654296875 
50 / 115 : pp = 201.0296630859375 
60 / 115 : pp = 200.38059997558594 
70 / 115 : pp = 196.55764770507812 
80 / 115 : pp = 194.31735229492188 
90 / 115 : pp = 191.66146850585938 
100 / 115 : pp = 186.70437622070312 
110 / 115 : pp = 184.3171844482422 
Training perplexity: 130.85043334960938
Validation perplexity:183.88186645507812
Total time : 45.345656394958496
Epoch 14

0 / 1452 : pp = 164.82191467285156 
10 / 1452 : pp = 146.39089965820312 
20 / 1452 : pp = 142.93240356445312 
30 / 1452 : pp = 140.3113555908203 
40 / 1452 : pp = 142.39939880371094 
50 / 1452 : pp = 139.70162963867188 
60 / 1452 : pp = 138.73023986816406 
70 / 1452 : pp = 139.2675018310547 
80 / 1452 : pp = 138.47824096679688 
90 / 1452 : pp = 137.40432739257812 
100 / 1452 : pp = 136.47793579101562 
110 / 1452 : pp = 135.2294464111328 
120 / 1452 : pp = 134.80728149414062 
130 / 1452 : pp = 133.89822387695312 
140 / 1452 : pp = 132.54141235351562 
150 / 1452 : pp = 132.10025024414062 
160 / 1452 : pp = 132.21829223632812 
170 / 1452 : pp = 131.8765106201172 
180 / 1452 : pp = 131.37515258789062 
190 / 1452 : pp = 131.31622314453125 
200 / 1452 : pp = 131.78297424316406 
210 / 1452 : pp = 131.5507354736328 
220 / 1452 : pp = 131.7002410888672 
230 / 1452 : pp = 131.9277801513672 
240 / 1452 : pp = 131.72166442871094 
250 / 1452 : pp = 131.225830078125 
260 / 1452 : pp = 130.7496337890625 
270 / 1452 : pp = 129.9896697998047 
280 / 1452 : pp = 130.10594177246094 
290 / 1452 : pp = 130.41644287109375 
300 / 1452 : pp = 130.5982208251953 
310 / 1452 : pp = 130.36329650878906 
320 / 1452 : pp = 130.5633544921875 
330 / 1452 : pp = 130.77252197265625 
340 / 1452 : pp = 130.273193359375 
350 / 1452 : pp = 130.47889709472656 
360 / 1452 : pp = 130.4348602294922 
370 / 1452 : pp = 130.28126525878906 
380 / 1452 : pp = 130.02786254882812 
390 / 1452 : pp = 130.1564483642578 
400 / 1452 : pp = 129.98440551757812 
410 / 1452 : pp = 130.37721252441406 
420 / 1452 : pp = 130.71859741210938 
430 / 1452 : pp = 130.65939331054688 
440 / 1452 : pp = 130.72987365722656 
450 / 1452 : pp = 130.56272888183594 
460 / 1452 : pp = 130.28195190429688 
470 / 1452 : pp = 129.90936279296875 
480 / 1452 : pp = 129.42857360839844 
490 / 1452 : pp = 129.18077087402344 
500 / 1452 : pp = 128.7588348388672 
510 / 1452 : pp = 128.6303253173828 
520 / 1452 : pp = 128.47616577148438 
530 / 1452 : pp = 128.21148681640625 
540 / 1452 : pp = 127.7218017578125 
550 / 1452 : pp = 127.50067138671875 
560 / 1452 : pp = 127.27574157714844 
570 / 1452 : pp = 127.05399322509766 
580 / 1452 : pp = 126.73983001708984 
590 / 1452 : pp = 126.43692779541016 
600 / 1452 : pp = 126.06050109863281 
610 / 1452 : pp = 125.82952880859375 
620 / 1452 : pp = 125.66295623779297 
630 / 1452 : pp = 125.39354705810547 
640 / 1452 : pp = 125.49463653564453 
650 / 1452 : pp = 125.48816680908203 
660 / 1452 : pp = 125.58712005615234 
670 / 1452 : pp = 125.65978240966797 
680 / 1452 : pp = 125.71456146240234 
690 / 1452 : pp = 125.66937255859375 
700 / 1452 : pp = 125.65900421142578 
710 / 1452 : pp = 125.7271499633789 
720 / 1452 : pp = 125.77758026123047 
730 / 1452 : pp = 125.74129486083984 
740 / 1452 : pp = 125.8759765625 
750 / 1452 : pp = 125.91793823242188 
760 / 1452 : pp = 125.99595642089844 
770 / 1452 : pp = 126.18113708496094 
780 / 1452 : pp = 126.35147094726562 
790 / 1452 : pp = 126.50797271728516 
800 / 1452 : pp = 126.49759674072266 
810 / 1452 : pp = 126.48113250732422 
820 / 1452 : pp = 126.52528381347656 
830 / 1452 : pp = 126.705810546875 
840 / 1452 : pp = 126.67517852783203 
850 / 1452 : pp = 126.74176025390625 
860 / 1452 : pp = 126.74151611328125 
870 / 1452 : pp = 126.73414611816406 
880 / 1452 : pp = 126.83026885986328 
890 / 1452 : pp = 126.88519287109375 
900 / 1452 : pp = 126.88053894042969 
910 / 1452 : pp = 126.97138214111328 
920 / 1452 : pp = 127.04660034179688 
930 / 1452 : pp = 127.03763580322266 
940 / 1452 : pp = 127.1126480102539 
950 / 1452 : pp = 127.09610748291016 
960 / 1452 : pp = 127.0873794555664 
970 / 1452 : pp = 127.10343933105469 
980 / 1452 : pp = 126.96441650390625 
990 / 1452 : pp = 126.88519287109375 
1000 / 1452 : pp = 126.7336654663086 
1010 / 1452 : pp = 126.77796936035156 
1020 / 1452 : pp = 126.89826202392578 
1030 / 1452 : pp = 126.88761138916016 
1040 / 1452 : pp = 126.95309448242188 
1050 / 1452 : pp = 126.96478271484375 
1060 / 1452 : pp = 126.89324188232422 
1070 / 1452 : pp = 127.03242492675781 
1080 / 1452 : pp = 127.13228607177734 
1090 / 1452 : pp = 127.173095703125 
1100 / 1452 : pp = 127.15975189208984 
1110 / 1452 : pp = 127.0392074584961 
1120 / 1452 : pp = 126.94032287597656 
1130 / 1452 : pp = 126.80693054199219 
1140 / 1452 : pp = 126.81315612792969 
1150 / 1452 : pp = 126.90467834472656 
1160 / 1452 : pp = 126.91236114501953 
1170 / 1452 : pp = 126.90897369384766 
1180 / 1452 : pp = 126.98052215576172 
1190 / 1452 : pp = 127.07483673095703 
1200 / 1452 : pp = 127.10216522216797 
1210 / 1452 : pp = 127.08258819580078 
1220 / 1452 : pp = 127.22943878173828 
1230 / 1452 : pp = 127.38563537597656 
1240 / 1452 : pp = 127.40538024902344 
1250 / 1452 : pp = 127.53369140625 
1260 / 1452 : pp = 127.59293365478516 
1270 / 1452 : pp = 127.61489868164062 
1280 / 1452 : pp = 127.6484375 
1290 / 1452 : pp = 127.65257263183594 
1300 / 1452 : pp = 127.69329833984375 
1310 / 1452 : pp = 127.74549102783203 
1320 / 1452 : pp = 127.7043228149414 
1330 / 1452 : pp = 127.6866683959961 
1340 / 1452 : pp = 127.70913696289062 
1350 / 1452 : pp = 127.73233795166016 
1360 / 1452 : pp = 127.7855224609375 
1370 / 1452 : pp = 127.71918487548828 
1380 / 1452 : pp = 127.69987487792969 
1390 / 1452 : pp = 127.6697998046875 
1400 / 1452 : pp = 127.61137390136719 
1410 / 1452 : pp = 127.6404037475586 
1420 / 1452 : pp = 127.61094665527344 
1430 / 1452 : pp = 127.58216857910156 
1440 / 1452 : pp = 127.61477661132812 
1450 / 1452 : pp = 127.61964416503906 

0 / 115 : pp = 228.21578979492188 
10 / 115 : pp = 208.11244201660156 
20 / 115 : pp = 210.688232421875 
30 / 115 : pp = 207.62408447265625 
40 / 115 : pp = 206.45184326171875 
50 / 115 : pp = 201.52760314941406 
60 / 115 : pp = 200.7784881591797 
70 / 115 : pp = 196.83067321777344 
80 / 115 : pp = 194.6357879638672 
90 / 115 : pp = 191.9783935546875 
100 / 115 : pp = 186.8787841796875 
110 / 115 : pp = 184.35252380371094 
Training perplexity: 127.60413360595703
Validation perplexity:183.8877410888672
Total time : 41.6636528968811
Epoch 15

0 / 1452 : pp = 156.81654357910156 
10 / 1452 : pp = 142.1070556640625 
20 / 1452 : pp = 139.55076599121094 
30 / 1452 : pp = 136.63551330566406 
40 / 1452 : pp = 138.5840606689453 
50 / 1452 : pp = 136.052734375 
60 / 1452 : pp = 134.93019104003906 
70 / 1452 : pp = 135.65206909179688 
80 / 1452 : pp = 135.2620086669922 
90 / 1452 : pp = 134.314697265625 
100 / 1452 : pp = 133.4916229248047 
110 / 1452 : pp = 132.26052856445312 
120 / 1452 : pp = 131.7714080810547 
130 / 1452 : pp = 130.77365112304688 
140 / 1452 : pp = 129.5411834716797 
150 / 1452 : pp = 129.0791778564453 
160 / 1452 : pp = 129.21920776367188 
170 / 1452 : pp = 128.7528839111328 
180 / 1452 : pp = 128.22279357910156 
190 / 1452 : pp = 128.18177795410156 
200 / 1452 : pp = 128.58758544921875 
210 / 1452 : pp = 128.3906707763672 
220 / 1452 : pp = 128.5266571044922 
230 / 1452 : pp = 128.80563354492188 
240 / 1452 : pp = 128.61886596679688 
250 / 1452 : pp = 128.13172912597656 
260 / 1452 : pp = 127.69220733642578 
270 / 1452 : pp = 126.96150970458984 
280 / 1452 : pp = 127.04702758789062 
290 / 1452 : pp = 127.33565521240234 
300 / 1452 : pp = 127.55929565429688 
310 / 1452 : pp = 127.38514709472656 
320 / 1452 : pp = 127.52171325683594 
330 / 1452 : pp = 127.68690490722656 
340 / 1452 : pp = 127.18340301513672 
350 / 1452 : pp = 127.4073257446289 
360 / 1452 : pp = 127.30432891845703 
370 / 1452 : pp = 127.17618560791016 
380 / 1452 : pp = 126.92579650878906 
390 / 1452 : pp = 127.02473449707031 
400 / 1452 : pp = 126.8515625 
410 / 1452 : pp = 127.211669921875 
420 / 1452 : pp = 127.51788330078125 
430 / 1452 : pp = 127.47386169433594 
440 / 1452 : pp = 127.57164001464844 
450 / 1452 : pp = 127.3601303100586 
460 / 1452 : pp = 127.09434509277344 
470 / 1452 : pp = 126.71922302246094 
480 / 1452 : pp = 126.24349212646484 
490 / 1452 : pp = 125.98778533935547 
500 / 1452 : pp = 125.59526824951172 
510 / 1452 : pp = 125.4450912475586 
520 / 1452 : pp = 125.29247283935547 
530 / 1452 : pp = 125.03536224365234 
540 / 1452 : pp = 124.5813980102539 
550 / 1452 : pp = 124.33724212646484 
560 / 1452 : pp = 124.08995819091797 
570 / 1452 : pp = 123.86637878417969 
580 / 1452 : pp = 123.53152465820312 
590 / 1452 : pp = 123.20321655273438 
600 / 1452 : pp = 122.85673522949219 
610 / 1452 : pp = 122.64250946044922 
620 / 1452 : pp = 122.4958724975586 
630 / 1452 : pp = 122.22386169433594 
640 / 1452 : pp = 122.31143188476562 
650 / 1452 : pp = 122.30093383789062 
660 / 1452 : pp = 122.39427947998047 
670 / 1452 : pp = 122.45440673828125 
680 / 1452 : pp = 122.51146697998047 
690 / 1452 : pp = 122.4854736328125 
700 / 1452 : pp = 122.48600006103516 
710 / 1452 : pp = 122.56084442138672 
720 / 1452 : pp = 122.59059143066406 
730 / 1452 : pp = 122.55529022216797 
740 / 1452 : pp = 122.69409942626953 
750 / 1452 : pp = 122.76456451416016 
760 / 1452 : pp = 122.84437561035156 
770 / 1452 : pp = 123.02527618408203 
780 / 1452 : pp = 123.20509338378906 
790 / 1452 : pp = 123.36305236816406 
800 / 1452 : pp = 123.36852264404297 
810 / 1452 : pp = 123.36799621582031 
820 / 1452 : pp = 123.39976501464844 
830 / 1452 : pp = 123.59362030029297 
840 / 1452 : pp = 123.56946563720703 
850 / 1452 : pp = 123.63800811767578 
860 / 1452 : pp = 123.63983917236328 
870 / 1452 : pp = 123.64148712158203 
880 / 1452 : pp = 123.7568588256836 
890 / 1452 : pp = 123.7885513305664 
900 / 1452 : pp = 123.79640197753906 
910 / 1452 : pp = 123.86153411865234 
920 / 1452 : pp = 123.92941284179688 
930 / 1452 : pp = 123.9125747680664 
940 / 1452 : pp = 123.95559692382812 
950 / 1452 : pp = 123.93928527832031 
960 / 1452 : pp = 123.94294738769531 
970 / 1452 : pp = 123.95547485351562 
980 / 1452 : pp = 123.8229751586914 
990 / 1452 : pp = 123.73727416992188 
1000 / 1452 : pp = 123.59091186523438 
1010 / 1452 : pp = 123.634765625 
1020 / 1452 : pp = 123.76506042480469 
1030 / 1452 : pp = 123.75485229492188 
1040 / 1452 : pp = 123.807861328125 
1050 / 1452 : pp = 123.79156494140625 
1060 / 1452 : pp = 123.73054504394531 
1070 / 1452 : pp = 123.8615951538086 
1080 / 1452 : pp = 123.96564483642578 
1090 / 1452 : pp = 124.02104187011719 
1100 / 1452 : pp = 124.012939453125 
1110 / 1452 : pp = 123.87582397460938 
1120 / 1452 : pp = 123.775390625 
1130 / 1452 : pp = 123.63182067871094 
1140 / 1452 : pp = 123.62391662597656 
1150 / 1452 : pp = 123.71013641357422 
1160 / 1452 : pp = 123.72423553466797 
1170 / 1452 : pp = 123.71726989746094 
1180 / 1452 : pp = 123.79032897949219 
1190 / 1452 : pp = 123.87883758544922 
1200 / 1452 : pp = 123.9125747680664 
1210 / 1452 : pp = 123.90140533447266 
1220 / 1452 : pp = 124.03245544433594 
1230 / 1452 : pp = 124.19799041748047 
1240 / 1452 : pp = 124.21469116210938 
1250 / 1452 : pp = 124.34103393554688 
1260 / 1452 : pp = 124.4041976928711 
1270 / 1452 : pp = 124.42852020263672 
1280 / 1452 : pp = 124.46656036376953 
1290 / 1452 : pp = 124.4811019897461 
1300 / 1452 : pp = 124.52384185791016 
1310 / 1452 : pp = 124.57533264160156 
1320 / 1452 : pp = 124.5398178100586 
1330 / 1452 : pp = 124.52598571777344 
1340 / 1452 : pp = 124.53311157226562 
1350 / 1452 : pp = 124.57759094238281 
1360 / 1452 : pp = 124.63385772705078 
1370 / 1452 : pp = 124.58133697509766 
1380 / 1452 : pp = 124.55769348144531 
1390 / 1452 : pp = 124.54011535644531 
1400 / 1452 : pp = 124.4884033203125 
1410 / 1452 : pp = 124.51226806640625 
1420 / 1452 : pp = 124.49683380126953 
1430 / 1452 : pp = 124.4754638671875 
1440 / 1452 : pp = 124.50164031982422 
1450 / 1452 : pp = 124.50894165039062 

0 / 115 : pp = 230.8488006591797 
10 / 115 : pp = 209.2509002685547 
20 / 115 : pp = 211.68577575683594 
30 / 115 : pp = 208.44056701660156 
40 / 115 : pp = 207.2039337158203 
50 / 115 : pp = 202.1859588623047 
60 / 115 : pp = 201.34739685058594 
70 / 115 : pp = 197.4251251220703 
80 / 115 : pp = 195.2623291015625 
90 / 115 : pp = 192.592529296875 
100 / 115 : pp = 187.39553833007812 
110 / 115 : pp = 184.791259765625 
Training perplexity: 124.4933853149414
Validation perplexity:184.32510375976562
Total time : 40.856229066848755

0 / 128 : pp = 184.6475067138672 
10 / 128 : pp = 176.8856964111328 
20 / 128 : pp = 164.3444366455078 
30 / 128 : pp = 167.85472106933594 
40 / 128 : pp = 169.25367736816406 
50 / 128 : pp = 168.86561584472656 
60 / 128 : pp = 168.11801147460938 
70 / 128 : pp = 165.4105224609375 
80 / 128 : pp = 162.91146850585938 
90 / 128 : pp = 161.29742431640625 
100 / 128 : pp = 162.45989990234375 
110 / 128 : pp = 162.6834716796875 
120 / 128 : pp = 164.3359832763672 
=-==-==-==-==-=
Test perplexity: 164.0149383544922 
=-==-==-==-==-=

For more details, see the repository linked below:

https://github.com/weizhenzhao/cs224d_nlp_problem_set2

 

This post again covers CS224d Problem Set 2, this time the third part of the assignment.

So this is how demanding coursework at universities abroad really is; by comparison, I'll just keep quietly grinding away over here.

The math behind RNN language modeling is described below:

 

Given a sequence of words $x_1, x_2, \dots, x_t$, the model predicts the next word $x_{t+1}$ via

$P(x_{t+1}=v_j \mid x_t,\dots,x_1)=\hat{y}^{(t)}_j$

where $v_j$ is a word in the vocabulary. We implement a recurrent neural network that uses feedback in the hidden layer to model the "history" $x_1, x_2, \dots, x_t$:

$e^{(t)} = x^{(t)}L$

$h^{(t)} = \sigma\left(h^{(t-1)}H + e^{(t)}I + b_1\right)$

$\hat{y}^{(t)} = \operatorname{softmax}\left(h^{(t)}U + b_2\right)$
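To make the forward equations concrete, here is a minimal NumPy sketch of one forward pass through this RNN language model. The tiny dimensions, random weights, and toy word sequence are hypothetical, not values from the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, Dh = 5, 4, 3                 # toy sizes: |V|, embedding dim, hidden dim
L = rng.normal(size=(V, d))        # word-embedding matrix
I = rng.normal(size=(d, Dh))       # input word-representation matrix
H = rng.normal(size=(Dh, Dh))      # hidden-transition matrix
U = rng.normal(size=(Dh, V))       # output word-representation matrix
b1, b2 = np.zeros(Dh), np.zeros(V)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(h_prev, word_idx):
    e_t = L[word_idx]                          # e(t) = x(t) L (row lookup)
    h_t = sigmoid(h_prev @ H + e_t @ I + b1)   # h(t) = sigmoid(h(t-1)H + e(t)I + b1)
    return h_t, softmax(h_t @ U + b2)          # y_hat(t) = softmax(h(t)U + b2)

h = np.zeros(Dh)                   # h(0) = 0
for w in [0, 3, 1]:                # a toy three-word "history"
    h, y_hat = step(h, w)
print(y_hat)                       # distribution over the next word
```

Each step consumes one word index and the previous hidden state, and the final `y_hat` is a valid probability distribution over the vocabulary.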

$h^{(0)}=h_{0}\in\mathbb{R}^{D_{h}}$ is the initial hidden-layer vector.

$x^{(t)}L$ is the product of the one-hot row vector $x^{(t)}$ with the embedding matrix $L$.

This one-hot row vector encodes the index of the word currently being processed.
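The claim that $x^{(t)}L$ amounts to row selection can be checked directly (the toy matrix below is made up); this is also why the code later uses `tf.nn.embedding_lookup` rather than a dense matrix product:

```python
import numpy as np

L = np.arange(12.0).reshape(4, 3)   # toy embedding matrix: |V| = 4, d = 3
idx = 2                             # index of the current word
x = np.zeros(4)
x[idx] = 1.0                        # one-hot row vector x(t)
assert np.allclose(x @ L, L[idx])   # x(t) L selects row idx of L
```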

            


$L\in\mathbb{R}^{|V|\times d}$ is the word-embedding matrix

$I\in\mathbb{R}^{d\times D_{h}}$ is the input word-representation matrix

$H\in\mathbb{R}^{D_{h}\times D_{h}}$ is the hidden-transition matrix

$U\in\mathbb{R}^{D_{h}\times|V|}$ is the output word-representation matrix

$b_{1}\in\mathbb{R}^{D_{h}}$ and $b_{2}\in\mathbb{R}^{|V|}$ are the bias vectors

$d$ is the dimension of the word embeddings

$|V|$ is the size of the vocabulary

$D_{h}$ is the dimension of the hidden layer

The output vector $\hat{y}^{(t)}\in\mathbb{R}^{|V|}$

is a probability distribution over the whole vocabulary, and we optimize the (unregularized) cross-entropy loss:

$J^{(t)}(\theta) = -\sum_{j=1}^{|V|} y^{(t)}_{j}\log\hat{y}^{(t)}_{j}$

where $y^{(t)}$ is the one-hot vector for the true next word $x_{t+1}$.

Perplexity is used to evaluate the language model's performance; it is defined as

$PP^{(t)}\left(y^{(t)},\hat{y}^{(t)}\right) = \dfrac{1}{\sum_{j=1}^{|V|} y^{(t)}_{j}\hat{y}^{(t)}_{j}}$

i.e. the reciprocal of the probability assigned to the correct next word. Averaged over a corpus, perplexity is the exponential of the mean cross-entropy loss.
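A quick sketch of the relationship between cross-entropy and perplexity (the probabilities below are made-up numbers); this mirrors the `np.exp(np.mean(total_loss))` expression used in `run_epoch` later in the code:

```python
import numpy as np

# Made-up predicted distributions over a 3-word vocabulary, one row per step.
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([0, 1])          # indices of the correct next words

ce = -np.log(y_hat[np.arange(len(targets)), targets])  # per-step cross-entropy
pp = np.exp(ce.mean())                                 # corpus-level perplexity

# Per-step perplexity is 1 / p(correct word):
assert np.isclose(np.exp(ce[0]), 1 / 0.7)
print(pp)
```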

Gradients:

The gradients used when optimizing each parameter are as follows (for a single time step $t$; in training they are summed over time by backpropagation through time). Writing $e^{(t)}=x^{(t)}L$, $\delta^{(t)}=\hat{y}^{(t)}-y^{(t)}$, and $\gamma^{(t)}=\left(\delta^{(t)}U^{\top}\right)\circ h^{(t)}\circ\left(1-h^{(t)}\right)$ (where $\circ$ is the element-wise product and the last two factors come from the sigmoid derivative):

$\dfrac{\partial J^{(t)}}{\partial U} = {h^{(t)}}^{\top}\delta^{(t)}$, $\dfrac{\partial J^{(t)}}{\partial b_{2}} = \delta^{(t)}$

$\dfrac{\partial J^{(t)}}{\partial H} = {h^{(t-1)}}^{\top}\gamma^{(t)}$, $\dfrac{\partial J^{(t)}}{\partial I} = {e^{(t)}}^{\top}\gamma^{(t)}$, $\dfrac{\partial J^{(t)}}{\partial b_{1}} = \gamma^{(t)}$, $\dfrac{\partial J^{(t)}}{\partial L_{x^{(t)}}} = \gamma^{(t)}I^{\top}$

and the error $\gamma^{(t)}H^{\top}$ is passed back to the previous time step.
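The output-layer gradient can be verified numerically. Below is a minimal sketch with hypothetical small dimensions, checking $\partial J/\partial U = {h}^{\top}(\hat{y}-y)$ by finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
Dh, V = 3, 4                        # toy hidden and vocabulary sizes
h = rng.normal(size=Dh)             # hidden state h(t)
U = rng.normal(size=(Dh, V))
b2 = np.zeros(V)
y = np.zeros(V)
y[1] = 1.0                          # one-hot target

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(U_):
    return -np.log(softmax(h @ U_ + b2) @ y)   # cross-entropy J(t)

delta = softmax(h @ U + b2) - y     # delta = y_hat - y
dU = np.outer(h, delta)             # analytic dJ/dU = h^T delta

# Finite-difference check on one entry of U
eps = 1e-6
U2 = U.copy()
U2[0, 1] += eps
numeric = (loss(U2) - loss(U)) / eps
assert abs(numeric - dU[0, 1]) < 1e-4
```

The same check extends to $H$, $I$, $b_1$, and $L$ once the sigmoid-derivative term is included.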

Initialize all of the trainable parameters listed above.

Then, for each training word, compute the derivative of the loss with respect to each parameter according to the formulas above.

Then update the parameters using gradient descent.

Substitute the updated parameters back into the model; if the loss falls below the preset threshold, stop iterating, otherwise continue.
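The loop just described (compute the gradient, update, stop once the loss is small enough) looks like this in miniature; the quadratic stand-in loss is purely illustrative:

```python
# Minimal gradient-descent loop on a stand-in loss f(w) = (w - 3)^2.
def f(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w, lr, threshold = 0.0, 0.1, 1e-6
for step in range(1000):
    w -= lr * grad(w)               # parameter update
    if f(w) < threshold:            # stop once the loss is small enough
        break
print(w, f(w))
```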

 


Below is a diagram of the RNNLM architecture:

 

The figure above shows the structure of a node in the second RNN layer.

The figure above shows dropout applied to the RNN's variables to reduce overfitting error (the dropout structure of the first RNN layer).

The figure above shows the structure of the first RNN layer.

(Heads up: a big wall of dense code is coming.)

'''
Created on 2017-09-26

@author: weizhen
'''
import getpass
import sys
import time
import numpy as np
from copy import deepcopy
from utils import calculate_perplexity, get_ptb_dataset, Vocab
from utils import ptb_iterator, sample
import tensorflow as tf
from model import LanguageModel
from tensorflow.contrib.legacy_seq2seq.python.ops.seq2seq import sequence_loss


class Config(object):
    """Stores hyperparameters and data information."""
    batch_size = 64
    embed_size = 50
    hidden_size = 100
    num_steps = 10
    max_epochs = 16
    early_stopping = 2
    dropout = 0.9
    lr = 0.001


class RNNLM_Model(LanguageModel):
    def load_data(self, debug=False):
        """Load the vocabulary and encode the train/dev/test data."""
        self.vocab = Vocab()
        self.vocab.construct(get_ptb_dataset('train'))
        self.encoded_train = np.array([self.vocab.encode(word) for word in get_ptb_dataset('train')], dtype=np.int32)
        self.encoded_valid = np.array([self.vocab.encode(word) for word in get_ptb_dataset('valid')], dtype=np.int32)
        self.encoded_test = np.array([self.vocab.encode(word) for word in get_ptb_dataset('test')], dtype=np.int32)
        if debug:
            num_debug = 1024
            self.encoded_train = self.encoded_train[:num_debug]
            self.encoded_valid = self.encoded_valid[:num_debug]
            self.encoded_test = self.encoded_test[:num_debug]

    def add_placeholders(self):
        """Create placeholder variables to represent the input tensors.

        These placeholders are used as inputs elsewhere in the model
        and will be fed with data during training.
            input_placeholder: input placeholder, shape (None, num_steps), type tf.int32
            labels_placeholder: labels placeholder, shape (None, num_steps), type tf.int32
            dropout_placeholder: dropout (keep probability) placeholder, scalar, type tf.float32
        """
        self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Input')
        self.labels_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Target')
        self.dropout_placeholder = tf.placeholder(tf.float32, name='Dropout')

    def add_embedding(self):
        """Add an embedding layer.

        Hint: this layer should use input_placeholder to index into the embeddings.
        Hint: you may find tf.nn.embedding_lookup useful.
        Hint: you may find tf.split and tf.squeeze useful when shaping the inputs.
        Hint: the variable you create has the shape
                L: (len(self.vocab), embed_size)
        Returns:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)

        tf.split(value, num_or_size_splits, axis) splits the tensor `value`
        along dimension `axis` into `num_or_size_splits` pieces and
        returns them as a list.
        tf.squeeze(input, squeeze_dims=None, name=None) removes all
        dimensions of size 1 from a tensor (or only those listed):
                example: t is a tensor of shape [1, 2, 1, 3, 1, 1]
                        shape(squeeze(t)) ==> [2, 3]
                        shape(squeeze(t, [2, 4])) ==> [1, 2, 3, 1]
        tf.nn.embedding_lookup maps word indices to word vectors.
        """
        with tf.device('/cpu:0'):
            embedding = tf.get_variable('Embedding', [len(self.vocab), self.config.embed_size], trainable=True)
            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
            return inputs

    def add_projection(self, rnn_outputs):
        """Add a projection layer.

        The projection layer transforms the hidden representation into a
        distribution over the whole vocabulary.
        Hint: the shapes you need to create are
            U: (hidden_size, len(vocab))
            b_2: (len(vocab),)
        Args:
            rnn_outputs: a list of length num_steps, where each element is
                         a tensor of shape (batch_size, hidden_size)
        Returns:
            outputs: a list of length num_steps, where each element is
                     a tensor of shape (batch_size, len(vocab))
        """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs
    
    def add_loss_op(self, output):
        """Add the loss to the objective.

        Hint: use sequence_loss (the legacy seq2seq op) to compute the
              sequence loss.
        Args:
            output: a tensor of shape (None, len(self.vocab))
        Returns:
            loss: a 0-d tensor
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output], [tf.reshape(self.labels_placeholder, [-1])], all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss
        
        
    def add_training_op(self, loss):
        """Add the training objective to the computational graph.

        Creates an optimizer and applies its updates to all trainable
        variables.
        Hint: use tf.train.AdamOptimizer for this model;
              optimizer.minimize() returns a train_op object.
        Args:
            loss: loss tensor, from the cross-entropy loss
        Returns:
            train_op: the op for training
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word.
        # Cast o to float64; otherwise there are numerical issues, e.g.
        # sum(output of softmax) = 1.00000298179 rather than 1.
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs so the last dimension is len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """Create the RNN LM model.

        The body below implements the RNN LM equations.
        Hint: use a zero vector of shape (batch_size, hidden_size) as the
              initial state of the RNN.
        Hint: store the last RNN output as the instance variable
            self.final_state
        Hint: make sure to apply dropout to both the input and output
              variables.
        Hint: use the variable scope 'RNN' to define the RNN variables.
        Hint: use an explicit for-loop over the inputs; you can call
              scope.reuse_variables() so the weights are identical on
              every iteration, but make sure not to call it on the first
              iteration, since no variables have been initialized yet.
        Hint: the shapes of the variables you need to create are

            H: (hidden_size, hidden_size)
            I: (embed_size, hidden_size)
            b_1: (hidden_size,)
        Args:
            inputs: a list of length num_steps, where each element is
                    a tensor of shape (batch_size, embed_size)
        Returns:
            outputs: a list of length num_steps, where each element is
                     a tensor of shape (batch_size, hidden_size)
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]

        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]

        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs
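The recurrence inside the for-loop above, $h^{(t)}=\sigma(h^{(t-1)}H + x^{(t)}I + b_{1})$, can be sketched in plain NumPy; the dimensions here are small illustrative values, not the ones from `Config`:

```python
import numpy as np

# Hypothetical, small dimensions for illustration only
batch_size, embed_size, hidden_size = 4, 5, 3
rng = np.random.default_rng(0)

H = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden, like RNN_H
I = rng.normal(size=(embed_size, hidden_size))   # input-to-hidden, like RNN_I
b = np.zeros(hidden_size)                        # bias, like RNN_b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(state, x):
    # state: (batch_size, hidden_size), x: (batch_size, embed_size)
    return sigmoid(state @ H + x @ I + b)

state = np.zeros((batch_size, hidden_size))      # zero initial state
inputs = [rng.normal(size=(batch_size, embed_size)) for _ in range(3)]
outputs = []
for x in inputs:
    state = rnn_step(state, x)                   # the same H, I, b are reused at every step
    outputs.append(state)

print(outputs[-1].shape)  # (4, 3)
```

Note that, exactly as in the TensorFlow version, one set of weights is shared across all time steps; only the state changes.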

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We feed in the initial state and fetch the final state so the
            # RNN carries the proper history across consecutive batches.
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp
                    }
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))
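`run_epoch` reports perplexity as the exponential of the mean cross-entropy loss, which is exactly the `np.exp(np.mean(total_loss))` expression above. A tiny sketch with made-up per-batch loss values:

```python
import numpy as np

# Hypothetical per-batch cross-entropy losses collected during one epoch
total_loss = [5.9, 5.7, 5.6, 5.55]

# Perplexity = exp(mean cross-entropy); lower is better
perplexity = np.exp(np.mean(total_loss))
```

This is why the progress lines in the log below print a running `pp` value: it is the perplexity of all batches seen so far in the epoch.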

def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """从模型自动生成文字
        Hint:创建一个feed-dictionary 并且使用sess.run()方法去执行这个模型
                你会需要使用model.initial_state 作为一个键传递给feed_dict
        Hint:得到model.final_state 和 model.predictions[-1].
             在add_model()方法中设置model.final_state  。
             model.predictions 是在 __init__方法中设置的
        Hint:在模型的训练中存储输出的参数值,和预测的y_pred的值
        参数:
        Args:
            session : tf.Session() object
            model : Object of type RNNLM Model
            config : A Config() object
            starting_text:Initial text passed to model
        Returns:
            output : List of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch size of one, length of len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output
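The `sample` helper imported from `utils` is not shown in this post; a common way to implement temperature sampling (an assumption about its behavior, not the assignment's actual code) is to rescale the distribution in log space before drawing:

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from probs after temperature rescaling.

    temperature < 1 sharpens the distribution (more greedy);
    temperature > 1 flattens it (more random).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    logits = np.log(np.asarray(probs) + 1e-12) / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    p = exp / exp.sum()                  # renormalize to a valid distribution
    return rng.choice(len(p), p=p)

probs = [0.1, 0.6, 0.3]
idx = sample_with_temperature(probs, temperature=0.5)
```

With `temp=1.0`, as in the call above, the distribution is used unchanged.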

def generate_sentence(session, model, config, *args, **kwargs):
    """方便从模型来生成句子"""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)

def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generation model
    with tf.variable_scope('RNNLM', reuse=None) as scope:
        model = RNNLM_Model(config)
        # This instructs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()

            train_pp = model.run_epoch(session,
                                       model.encoded_train,
                                       train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))

        saver.restore(session, './ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            starting_text = input('> ')


if __name__ == "__main__":
    test_RNNLM()

(It's not really that arcane after all; it's much simpler than calculus, and several hundred thousand times simpler than mathematical analysis.)

Below is the training log:

1380 / 1452 : pp = 266.20892333984375 
1390 / 1452 : pp = 265.94439697265625 
1400 / 1452 : pp = 265.66845703125 
1410 / 1452 : pp = 265.5393981933594 
1420 / 1452 : pp = 265.32489013671875 
1430 / 1452 : pp = 265.2019348144531 
1440 / 1452 : pp = 265.13720703125 
1450 / 1452 : pp = 264.954833984375 

0 / 115 : pp = 296.9217224121094 
10 / 115 : pp = 282.02130126953125 
20 / 115 : pp = 279.76800537109375 
30 / 115 : pp = 276.4101257324219 
40 / 115 : pp = 276.2939147949219 
50 / 115 : pp = 270.73565673828125 
60 / 115 : pp = 269.88134765625 
70 / 115 : pp = 266.8675231933594 
80 / 115 : pp = 263.6731872558594 
90 / 115 : pp = 260.8569030761719 
100 / 115 : pp = 256.3356628417969 
110 / 115 : pp = 255.1026611328125 
Training perplexity: 264.9092102050781
Validation perplexity:254.84902954101562
Total time : 41.65332388877869
Epoch 3

0 / 1452 : pp = 327.0847473144531 
10 / 1452 : pp = 273.9620056152344 
20 / 1452 : pp = 270.22943115234375 
30 / 1452 : pp = 263.5213317871094 
40 / 1452 : pp = 264.0644836425781 
50 / 1452 : pp = 258.6029968261719 
60 / 1452 : pp = 257.04290771484375 
70 / 1452 : pp = 257.59161376953125 
80 / 1452 : pp = 256.7600402832031 
90 / 1452 : pp = 254.5120391845703 
100 / 1452 : pp = 252.44725036621094 
110 / 1452 : pp = 250.13954162597656 
120 / 1452 : pp = 249.91647338867188 
130 / 1452 : pp = 249.50460815429688 
140 / 1452 : pp = 247.67440795898438 
150 / 1452 : pp = 247.19090270996094 
160 / 1452 : pp = 247.8919219970703 
170 / 1452 : pp = 247.54322814941406 
180 / 1452 : pp = 246.17623901367188 
190 / 1452 : pp = 245.78330993652344 
200 / 1452 : pp = 246.80552673339844 
210 / 1452 : pp = 246.3059844970703 
220 / 1452 : pp = 246.19021606445312 
230 / 1452 : pp = 246.70140075683594 
240 / 1452 : pp = 246.3099822998047 
250 / 1452 : pp = 245.1745147705078 
260 / 1452 : pp = 244.17384338378906 
270 / 1452 : pp = 242.57363891601562 
280 / 1452 : pp = 242.8500213623047 
290 / 1452 : pp = 243.0492706298828 
300 / 1452 : pp = 243.1466522216797 
310 / 1452 : pp = 242.89044189453125 
320 / 1452 : pp = 243.08045959472656 
330 / 1452 : pp = 243.32235717773438 
340 / 1452 : pp = 242.34715270996094 
350 / 1452 : pp = 242.80972290039062 
360 / 1452 : pp = 242.5345458984375 
370 / 1452 : pp = 242.0083465576172 
380 / 1452 : pp = 241.22708129882812 
390 / 1452 : pp = 241.24398803710938 
400 / 1452 : pp = 240.63473510742188 
410 / 1452 : pp = 240.94094848632812 
420 / 1452 : pp = 241.19717407226562 
430 / 1452 : pp = 240.8896026611328 
440 / 1452 : pp = 240.7772979736328 
450 / 1452 : pp = 240.45913696289062 
460 / 1452 : pp = 240.06674194335938 
470 / 1452 : pp = 239.42198181152344 
480 / 1452 : pp = 238.39271545410156 
490 / 1452 : pp = 238.0517120361328 
500 / 1452 : pp = 237.31752014160156 
510 / 1452 : pp = 237.1197967529297 
520 / 1452 : pp = 236.64865112304688 
530 / 1452 : pp = 236.004638671875 
540 / 1452 : pp = 235.192626953125 
550 / 1452 : pp = 234.6700439453125 
560 / 1452 : pp = 234.1914825439453 
570 / 1452 : pp = 233.80899047851562 
580 / 1452 : pp = 233.3753662109375 
590 / 1452 : pp = 232.8699188232422 
600 / 1452 : pp = 232.2629852294922 
610 / 1452 : pp = 231.8668212890625 
620 / 1452 : pp = 231.478515625 
630 / 1452 : pp = 231.0444793701172 
640 / 1452 : pp = 231.2737579345703 
650 / 1452 : pp = 231.28114318847656 
660 / 1452 : pp = 231.4324951171875 
670 / 1452 : pp = 231.48513793945312 
680 / 1452 : pp = 231.45932006835938 
690 / 1452 : pp = 231.17738342285156 
700 / 1452 : pp = 231.00570678710938 
710 / 1452 : pp = 231.03810119628906 
720 / 1452 : pp = 230.96131896972656 
730 / 1452 : pp = 230.91110229492188 
740 / 1452 : pp = 231.13539123535156 
750 / 1452 : pp = 231.04393005371094 
760 / 1452 : pp = 231.03489685058594 
770 / 1452 : pp = 231.19744873046875 
780 / 1452 : pp = 231.26625061035156 
790 / 1452 : pp = 231.38714599609375 
800 / 1452 : pp = 231.24441528320312 
810 / 1452 : pp = 231.16824340820312 
820 / 1452 : pp = 231.11831665039062 
830 / 1452 : pp = 231.34886169433594 
840 / 1452 : pp = 231.221923828125 
850 / 1452 : pp = 231.2562255859375 
860 / 1452 : pp = 231.26492309570312 
870 / 1452 : pp = 231.1961212158203 
880 / 1452 : pp = 231.30506896972656 
890 / 1452 : pp = 231.24728393554688 
900 / 1452 : pp = 231.15744018554688 
910 / 1452 : pp = 231.20175170898438 
920 / 1452 : pp = 231.25534057617188 
930 / 1452 : pp = 231.09461975097656 
940 / 1452 : pp = 231.12612915039062 
950 / 1452 : pp = 231.0475616455078 
960 / 1452 : pp = 230.86056518554688 
970 / 1452 : pp = 230.80377197265625 
980 / 1452 : pp = 230.4598846435547 
990 / 1452 : pp = 230.24559020996094 
1000 / 1452 : pp = 229.91030883789062 
1010 / 1452 : pp = 229.9349822998047 
1020 / 1452 : pp = 230.01470947265625 
1030 / 1452 : pp = 229.8909149169922 
1040 / 1452 : pp = 229.9403533935547 
1050 / 1452 : pp = 229.84815979003906 
1060 / 1452 : pp = 229.60377502441406 
1070 / 1452 : pp = 229.74647521972656 
1080 / 1452 : pp = 229.80410766601562 
1090 / 1452 : pp = 229.78733825683594 
1100 / 1452 : pp = 229.64549255371094 
1110 / 1452 : pp = 229.26255798339844 
1120 / 1452 : pp = 229.00262451171875 
1130 / 1452 : pp = 228.6716766357422 
1140 / 1452 : pp = 228.55067443847656 
1150 / 1452 : pp = 228.61563110351562 
1160 / 1452 : pp = 228.50958251953125 
1170 / 1452 : pp = 228.3498992919922 
1180 / 1452 : pp = 228.29786682128906 
1190 / 1452 : pp = 228.33204650878906 
1200 / 1452 : pp = 228.27369689941406 
1210 / 1452 : pp = 228.11831665039062 
1220 / 1452 : pp = 228.21775817871094 
1230 / 1452 : pp = 228.3170166015625 
1240 / 1452 : pp = 228.22134399414062 
1250 / 1452 : pp = 228.3769073486328 
1260 / 1452 : pp = 228.37527465820312 
1270 / 1452 : pp = 228.33694458007812 
1280 / 1452 : pp = 228.27108764648438 
1290 / 1452 : pp = 228.1731414794922 
1300 / 1452 : pp = 228.12200927734375 
1310 / 1452 : pp = 228.10275268554688 
1320 / 1452 : pp = 227.9289093017578 
1330 / 1452 : pp = 227.77723693847656 
1340 / 1452 : pp = 227.79623413085938 
1350 / 1452 : pp = 227.7408447265625 
1360 / 1452 : pp = 227.72586059570312 
1370 / 1452 : pp = 227.49728393554688 
1380 / 1452 : pp = 227.37940979003906 
1390 / 1452 : pp = 227.20166015625 
1400 / 1452 : pp = 227.018310546875 
1410 / 1452 : pp = 226.95651245117188 
1420 / 1452 : pp = 226.8065643310547 
1430 / 1452 : pp = 226.7261199951172 
1440 / 1452 : pp = 226.7193145751953 
1450 / 1452 : pp = 226.61068725585938 

0 / 115 : pp = 269.342041015625 
10 / 115 : pp = 255.03016662597656 
20 / 115 : pp = 253.8992919921875 
30 / 115 : pp = 251.04025268554688 
40 / 115 : pp = 250.51756286621094 
50 / 115 : pp = 245.3595428466797 
60 / 115 : pp = 244.4713897705078 
70 / 115 : pp = 241.2674560546875 
80 / 115 : pp = 238.3473663330078 
90 / 115 : pp = 235.56423950195312 
100 / 115 : pp = 231.2281036376953 
110 / 115 : pp = 229.8423614501953 
Training perplexity: 226.5760040283203
Validation perplexity:229.59939575195312
Total time : 42.202677726745605
Epoch 4

0 / 1452 : pp = 282.2423095703125 
10 / 1452 : pp = 240.16258239746094 
20 / 1452 : pp = 236.12203979492188 
30 / 1452 : pp = 230.3953857421875 
40 / 1452 : pp = 231.8789825439453 
50 / 1452 : pp = 227.26612854003906 
60 / 1452 : pp = 226.22061157226562 
70 / 1452 : pp = 227.01885986328125 
80 / 1452 : pp = 226.2459716796875 
90 / 1452 : pp = 224.3211669921875 
100 / 1452 : pp = 222.65615844726562 
110 / 1452 : pp = 220.70326232910156 
120 / 1452 : pp = 220.42288208007812 
130 / 1452 : pp = 219.8100128173828 
140 / 1452 : pp = 218.04432678222656 
150 / 1452 : pp = 217.31639099121094 
160 / 1452 : pp = 217.86349487304688 
170 / 1452 : pp = 217.46597290039062 
180 / 1452 : pp = 216.3349151611328 
190 / 1452 : pp = 216.12240600585938 
200 / 1452 : pp = 216.97842407226562 
210 / 1452 : pp = 216.51014709472656 
220 / 1452 : pp = 216.46751403808594 
230 / 1452 : pp = 216.80126953125 
240 / 1452 : pp = 216.45965576171875 
250 / 1452 : pp = 215.5008544921875 
260 / 1452 : pp = 214.62210083007812 
270 / 1452 : pp = 213.29183959960938 
280 / 1452 : pp = 213.5621337890625 
290 / 1452 : pp = 213.80657958984375 
300 / 1452 : pp = 213.8963165283203 
310 / 1452 : pp = 213.60653686523438 
320 / 1452 : pp = 213.85877990722656 
330 / 1452 : pp = 214.07345581054688 
340 / 1452 : pp = 213.25421142578125 
350 / 1452 : pp = 213.68019104003906 
360 / 1452 : pp = 213.41717529296875 
370 / 1452 : pp = 213.04920959472656 
380 / 1452 : pp = 212.39019775390625 
390 / 1452 : pp = 212.4908905029297 
400 / 1452 : pp = 212.01914978027344 
410 / 1452 : pp = 212.36903381347656 
420 / 1452 : pp = 212.6802520751953 
430 / 1452 : pp = 212.42697143554688 
440 / 1452 : pp = 212.42990112304688 
450 / 1452 : pp = 212.14524841308594 
460 / 1452 : pp = 211.7836151123047 
470 / 1452 : pp = 211.17282104492188 
480 / 1452 : pp = 210.27903747558594 
490 / 1452 : pp = 209.95211791992188 
500 / 1452 : pp = 209.28302001953125 
510 / 1452 : pp = 209.1029815673828 
520 / 1452 : pp = 208.73855590820312 
530 / 1452 : pp = 208.19700622558594 
540 / 1452 : pp = 207.4554443359375 
550 / 1452 : pp = 207.0062255859375 
560 / 1452 : pp = 206.59739685058594 
570 / 1452 : pp = 206.27874755859375 
580 / 1452 : pp = 205.87144470214844 
590 / 1452 : pp = 205.43545532226562 
600 / 1452 : pp = 204.90940856933594 
610 / 1452 : pp = 204.5686798095703 
620 / 1452 : pp = 204.22862243652344 
630 / 1452 : pp = 203.8448028564453 
640 / 1452 : pp = 204.06576538085938 
650 / 1452 : pp = 204.0941925048828 
660 / 1452 : pp = 204.22103881835938 
670 / 1452 : pp = 204.289794921875 
680 / 1452 : pp = 204.3115234375 
690 / 1452 : pp = 204.10284423828125 
700 / 1452 : pp = 203.99757385253906 
710 / 1452 : pp = 204.04971313476562 
720 / 1452 : pp = 204.03152465820312 
730 / 1452 : pp = 203.99046325683594 
740 / 1452 : pp = 204.19786071777344 
750 / 1452 : pp = 204.1642608642578 
760 / 1452 : pp = 204.19435119628906 
770 / 1452 : pp = 204.37786865234375 
780 / 1452 : pp = 204.4965057373047 
790 / 1452 : pp = 204.6479034423828 
800 / 1452 : pp = 204.56117248535156 
810 / 1452 : pp = 204.52284240722656 
820 / 1452 : pp = 204.50978088378906 
830 / 1452 : pp = 204.7531280517578 
840 / 1452 : pp = 204.64468383789062 
850 / 1452 : pp = 204.71348571777344 
860 / 1452 : pp = 204.7399444580078 
870 / 1452 : pp = 204.69406127929688 
880 / 1452 : pp = 204.7965850830078 
890 / 1452 : pp = 204.7594757080078 
900 / 1452 : pp = 204.71446228027344 
910 / 1452 : pp = 204.7590789794922 
920 / 1452 : pp = 204.85772705078125 
930 / 1452 : pp = 204.7428741455078 
940 / 1452 : pp = 204.8068389892578 
950 / 1452 : pp = 204.75791931152344 
960 / 1452 : pp = 204.63815307617188 
970 / 1452 : pp = 204.60760498046875 
980 / 1452 : pp = 204.34347534179688 
990 / 1452 : pp = 204.151611328125 
1000 / 1452 : pp = 203.8665771484375 
1010 / 1452 : pp = 203.9164581298828 
1020 / 1452 : pp = 204.0184783935547 
1030 / 1452 : pp = 203.95166015625 
1040 / 1452 : pp = 204.03045654296875 
1050 / 1452 : pp = 203.95846557617188 
1060 / 1452 : pp = 203.77114868164062 
1070 / 1452 : pp = 203.93260192871094 
1080 / 1452 : pp = 204.00048828125 
1090 / 1452 : pp = 204.00233459472656 
1100 / 1452 : pp = 203.8960418701172 
1110 / 1452 : pp = 203.5987548828125 
1120 / 1452 : pp = 203.38392639160156 
1130 / 1452 : pp = 203.08872985839844 
1140 / 1452 : pp = 203.01272583007812 
1150 / 1452 : pp = 203.0865936279297 
1160 / 1452 : pp = 203.02308654785156 
1170 / 1452 : pp = 202.9125518798828 
1180 / 1452 : pp = 202.9097442626953 
1190 / 1452 : pp = 202.98252868652344 
1200 / 1452 : pp = 202.95387268066406 
1210 / 1452 : pp = 202.851318359375 
1220 / 1452 : pp = 202.97671508789062 
1230 / 1452 : pp = 203.1051025390625 
1240 / 1452 : pp = 203.0526123046875 
1250 / 1452 : pp = 203.21417236328125 
1260 / 1452 : pp = 203.23617553710938 
1270 / 1452 : pp = 203.22802734375 
1280 / 1452 : pp = 203.20846557617188 
1290 / 1452 : pp = 203.15362548828125 
1300 / 1452 : pp = 203.14315795898438 
1310 / 1452 : pp = 203.15264892578125 
1320 / 1452 : pp = 203.02801513671875 
1330 / 1452 : pp = 202.92977905273438 
1340 / 1452 : pp = 202.95484924316406 
1350 / 1452 : pp = 202.9335479736328 
1360 / 1452 : pp = 202.955322265625 
1370 / 1452 : pp = 202.7740478515625 
1380 / 1452 : pp = 202.68569946289062 
1390 / 1452 : pp = 202.55816650390625 
1400 / 1452 : pp = 202.41651916503906 
1410 / 1452 : pp = 202.38494873046875 
1420 / 1452 : pp = 202.27593994140625 
1430 / 1452 : pp = 202.21826171875 
1440 / 1452 : pp = 202.23272705078125 
1450 / 1452 : pp = 202.16099548339844 

0 / 115 : pp = 253.23211669921875 
10 / 115 : pp = 237.62506103515625 
20 / 115 : pp = 237.60557556152344 
30 / 115 : pp = 234.9273223876953 
40 / 115 : pp = 234.30519104003906 
50 / 115 : pp = 229.43960571289062 
60 / 115 : pp = 228.6050567626953 
70 / 115 : pp = 225.2646484375 
80 / 115 : pp = 222.55935668945312 
90 / 115 : pp = 219.83255004882812 
100 / 115 : pp = 215.5491485595703 
110 / 115 : pp = 214.07937622070312 
Training perplexity: 202.1349639892578
Validation perplexity:213.85256958007812
Total time : 42.10724234580994
Epoch 5

0 / 1452 : pp = 255.92384338378906 
10 / 1452 : pp = 219.5322265625 
20 / 1452 : pp = 214.36212158203125 
30 / 1452 : pp = 209.12620544433594 
40 / 1452 : pp = 210.04193115234375 
50 / 1452 : pp = 205.77398681640625 
60 / 1452 : pp = 204.8201141357422 
70 / 1452 : pp = 205.3955841064453 
80 / 1452 : pp = 204.8386688232422 
90 / 1452 : pp = 203.21194458007812 
100 / 1452 : pp = 201.87643432617188 
110 / 1452 : pp = 200.10122680664062 
120 / 1452 : pp = 199.82012939453125 
130 / 1452 : pp = 199.11192321777344 
140 / 1452 : pp = 197.51919555664062 
150 / 1452 : pp = 197.03567504882812 
160 / 1452 : pp = 197.4231414794922 
170 / 1452 : pp = 197.09571838378906 
180 / 1452 : pp = 196.17665100097656 
190 / 1452 : pp = 196.0064697265625 
200 / 1452 : pp = 196.7347869873047 
210 / 1452 : pp = 196.3063507080078 
220 / 1452 : pp = 196.21388244628906 
230 / 1452 : pp = 196.5252227783203 
240 / 1452 : pp = 196.203125 
250 / 1452 : pp = 195.3251953125 
260 / 1452 : pp = 194.53335571289062 
270 / 1452 : pp = 193.3546142578125 
280 / 1452 : pp = 193.59420776367188 
290 / 1452 : pp = 193.83297729492188 
300 / 1452 : pp = 193.98489379882812 
310 / 1452 : pp = 193.68414306640625 
320 / 1452 : pp = 193.89065551757812 
330 / 1452 : pp = 194.0518798828125 
340 / 1452 : pp = 193.32888793945312 
350 / 1452 : pp = 193.76219177246094 
360 / 1452 : pp = 193.56106567382812 
370 / 1452 : pp = 193.28179931640625 
380 / 1452 : pp = 192.7037811279297 
390 / 1452 : pp = 192.8145294189453 
400 / 1452 : pp = 192.43325805664062 
410 / 1452 : pp = 192.81527709960938 
420 / 1452 : pp = 193.13760375976562 
430 / 1452 : pp = 192.9148712158203 
440 / 1452 : pp = 192.92526245117188 
450 / 1452 : pp = 192.70083618164062 
460 / 1452 : pp = 192.36647033691406 
470 / 1452 : pp = 191.85394287109375 
480 / 1452 : pp = 191.07244873046875 
490 / 1452 : pp = 190.75401306152344 
500 / 1452 : pp = 190.1843719482422 
510 / 1452 : pp = 190.03334045410156 
520 / 1452 : pp = 189.72938537597656 
530 / 1452 : pp = 189.25889587402344 
540 / 1452 : pp = 188.59315490722656 
550 / 1452 : pp = 188.19313049316406 
560 / 1452 : pp = 187.80621337890625 
570 / 1452 : pp = 187.5229034423828 
580 / 1452 : pp = 187.1091766357422 
590 / 1452 : pp = 186.72592163085938 
600 / 1452 : pp = 186.2238006591797 
610 / 1452 : pp = 185.89695739746094 
620 / 1452 : pp = 185.60989379882812 
630 / 1452 : pp = 185.2689208984375 
640 / 1452 : pp = 185.47567749023438 
650 / 1452 : pp = 185.5127410888672 
660 / 1452 : pp = 185.64627075195312 
670 / 1452 : pp = 185.71311950683594 
680 / 1452 : pp = 185.72569274902344 
690 / 1452 : pp = 185.56459045410156 
700 / 1452 : pp = 185.48681640625 
710 / 1452 : pp = 185.5458221435547 
720 / 1452 : pp = 185.5598907470703 
730 / 1452 : pp = 185.5335235595703 
740 / 1452 : pp = 185.73995971679688 
750 / 1452 : pp = 185.744384765625 
760 / 1452 : pp = 185.81268310546875 
770 / 1452 : pp = 186.00088500976562 
780 / 1452 : pp = 186.14443969726562 
790 / 1452 : pp = 186.30764770507812 
800 / 1452 : pp = 186.2595977783203 
810 / 1452 : pp = 186.23028564453125 
820 / 1452 : pp = 186.23997497558594 
830 / 1452 : pp = 186.49057006835938 
840 / 1452 : pp = 186.43331909179688 
850 / 1452 : pp = 186.48887634277344 
860 / 1452 : pp = 186.51502990722656 
870 / 1452 : pp = 186.5167999267578 
880 / 1452 : pp = 186.62400817871094 
890 / 1452 : pp = 186.6103973388672 
900 / 1452 : pp = 186.58111572265625 
910 / 1452 : pp = 186.64126586914062 
920 / 1452 : pp = 186.7366180419922 
930 / 1452 : pp = 186.65719604492188 
940 / 1452 : pp = 186.71755981445312 
950 / 1452 : pp = 186.6977996826172 
960 / 1452 : pp = 186.62774658203125 
970 / 1452 : pp = 186.62115478515625 
980 / 1452 : pp = 186.3773193359375 
990 / 1452 : pp = 186.23109436035156 
1000 / 1452 : pp = 185.99227905273438 
1010 / 1452 : pp = 186.0488739013672 
1020 / 1452 : pp = 186.1744384765625 
1030 / 1452 : pp = 186.1162109375 
1040 / 1452 : pp = 186.18899536132812 
1050 / 1452 : pp = 186.1549072265625 
1060 / 1452 : pp = 186.01419067382812 
1070 / 1452 : pp = 186.17364501953125 
1080 / 1452 : pp = 186.27061462402344 
1090 / 1452 : pp = 186.28428649902344 
1100 / 1452 : pp = 186.2150115966797 
1110 / 1452 : pp = 185.95103454589844 
1120 / 1452 : pp = 185.77423095703125 
1130 / 1452 : pp = 185.5232696533203 
1140 / 1452 : pp = 185.4607391357422 
1150 / 1452 : pp = 185.56077575683594 
1160 / 1452 : pp = 185.53343200683594 
1170 / 1452 : pp = 185.46453857421875 
1180 / 1452 : pp = 185.4741668701172 
1190 / 1452 : pp = 185.5594482421875 
1200 / 1452 : pp = 185.53785705566406 
1210 / 1452 : pp = 185.4576416015625 
1220 / 1452 : pp = 185.5943145751953 
1230 / 1452 : pp = 185.7483673095703 
1240 / 1452 : pp = 185.70762634277344 
1250 / 1452 : pp = 185.8568115234375 
1260 / 1452 : pp = 185.90635681152344 
1270 / 1452 : pp = 185.8961639404297 
1280 / 1452 : pp = 185.89199829101562 
1290 / 1452 : pp = 185.85911560058594 
1300 / 1452 : pp = 185.86097717285156 
1310 / 1452 : pp = 185.88739013671875 
1320 / 1452 : pp = 185.79248046875 
1330 / 1452 : pp = 185.69700622558594 
1340 / 1452 : pp = 185.7310028076172 
1350 / 1452 : pp = 185.72613525390625 
1360 / 1452 : pp = 185.76829528808594 
1370 / 1452 : pp = 185.6322021484375 
1380 / 1452 : pp = 185.56378173828125 
1390 / 1452 : pp = 185.4654998779297 
1400 / 1452 : pp = 185.35110473632812 
1410 / 1452 : pp = 185.33917236328125 
1420 / 1452 : pp = 185.2509002685547 
1430 / 1452 : pp = 185.20436096191406 
1440 / 1452 : pp = 185.2254638671875 
1450 / 1452 : pp = 185.16542053222656 

0 / 115 : pp = 242.26800537109375 
10 / 115 : pp = 226.12258911132812 
20 / 115 : pp = 226.4702606201172 
30 / 115 : pp = 223.982666015625 
40 / 115 : pp = 223.376953125 
50 / 115 : pp = 218.65716552734375 
60 / 115 : pp = 217.95306396484375 
70 / 115 : pp = 214.5392303466797 
80 / 115 : pp = 212.07525634765625 
90 / 115 : pp = 209.40631103515625 
100 / 115 : pp = 205.1455078125 
110 / 115 : pp = 203.6289520263672 
Training perplexity: 185.14476013183594
Validation perplexity:203.3822784423828
Total time : 42.47052240371704
Epoch 6

0 / 1452 : pp = 233.56707763671875 
10 / 1452 : pp = 202.6468505859375 
20 / 1452 : pp = 198.2734375 
30 / 1452 : pp = 193.47442626953125 
40 / 1452 : pp = 195.17147827148438 
50 / 1452 : pp = 191.5596923828125 
60 / 1452 : pp = 190.4825897216797 
70 / 1452 : pp = 191.07681274414062 
80 / 1452 : pp = 190.339599609375 
90 / 1452 : pp = 188.98277282714844 
100 / 1452 : pp = 187.74757385253906 
110 / 1452 : pp = 186.10104370117188 
120 / 1452 : pp = 185.7500457763672 
130 / 1452 : pp = 184.90707397460938 
140 / 1452 : pp = 183.340087890625 
150 / 1452 : pp = 182.70840454101562 
160 / 1452 : pp = 183.1043701171875 
170 / 1452 : pp = 182.69776916503906 
180 / 1452 : pp = 181.88400268554688 
190 / 1452 : pp = 181.8062286376953 
200 / 1452 : pp = 182.4969940185547 
210 / 1452 : pp = 182.10572814941406 
220 / 1452 : pp = 181.9981689453125 
230 / 1452 : pp = 182.3802490234375 
240 / 1452 : pp = 182.03636169433594 
250 / 1452 : pp = 181.23712158203125 
260 / 1452 : pp = 180.53726196289062 
270 / 1452 : pp = 179.53567504882812 
280 / 1452 : pp = 179.70208740234375 
290 / 1452 : pp = 179.977783203125 
300 / 1452 : pp = 180.16600036621094 
310 / 1452 : pp = 179.87294006347656 
320 / 1452 : pp = 180.11849975585938 
330 / 1452 : pp = 180.31838989257812 
340 / 1452 : pp = 179.56759643554688 
350 / 1452 : pp = 179.97134399414062 
360 / 1452 : pp = 179.80030822753906 
370 / 1452 : pp = 179.52085876464844 
380 / 1452 : pp = 178.98228454589844 
390 / 1452 : pp = 179.0868682861328 
400 / 1452 : pp = 178.74569702148438 
410 / 1452 : pp = 179.1776580810547 
420 / 1452 : pp = 179.5055389404297 
430 / 1452 : pp = 179.3883056640625 
440 / 1452 : pp = 179.42279052734375 
450 / 1452 : pp = 179.2106475830078 
460 / 1452 : pp = 178.85311889648438 
470 / 1452 : pp = 178.33840942382812 
480 / 1452 : pp = 177.60350036621094 
490 / 1452 : pp = 177.30335998535156 
500 / 1452 : pp = 176.72222900390625 
510 / 1452 : pp = 176.6067352294922 
520 / 1452 : pp = 176.33998107910156 
530 / 1452 : pp = 175.93162536621094 
540 / 1452 : pp = 175.30657958984375 
550 / 1452 : pp = 174.9462432861328 
560 / 1452 : pp = 174.5836639404297 
570 / 1452 : pp = 174.31431579589844 
580 / 1452 : pp = 173.92300415039062 
590 / 1452 : pp = 173.55856323242188 
600 / 1452 : pp = 173.08277893066406 
610 / 1452 : pp = 172.75930786132812 
620 / 1452 : pp = 172.53192138671875 
630 / 1452 : pp = 172.20652770996094 
640 / 1452 : pp = 172.37454223632812 
650 / 1452 : pp = 172.39845275878906 
660 / 1452 : pp = 172.52255249023438 
670 / 1452 : pp = 172.60935974121094 
680 / 1452 : pp = 172.6611328125 
690 / 1452 : pp = 172.53118896484375 
700 / 1452 : pp = 172.4709014892578 
710 / 1452 : pp = 172.5406494140625 
720 / 1452 : pp = 172.55447387695312 
730 / 1452 : pp = 172.5330047607422 
740 / 1452 : pp = 172.7061767578125 
750 / 1452 : pp = 172.71054077148438 
760 / 1452 : pp = 172.77743530273438 
770 / 1452 : pp = 172.95481872558594 
780 / 1452 : pp = 173.11265563964844 
790 / 1452 : pp = 173.2832794189453 
800 / 1452 : pp = 173.2537841796875 
810 / 1452 : pp = 173.22164916992188 
820 / 1452 : pp = 173.24148559570312 
830 / 1452 : pp = 173.48228454589844 
840 / 1452 : pp = 173.43753051757812 
850 / 1452 : pp = 173.505615234375 
860 / 1452 : pp = 173.5214080810547 
870 / 1452 : pp = 173.5009002685547 
880 / 1452 : pp = 173.6202392578125 
890 / 1452 : pp = 173.622802734375 
900 / 1452 : pp = 173.5987091064453 
910 / 1452 : pp = 173.68316650390625 
920 / 1452 : pp = 173.77330017089844 
930 / 1452 : pp = 173.72018432617188 
940 / 1452 : pp = 173.79351806640625 
950 / 1452 : pp = 173.7653350830078 
960 / 1452 : pp = 173.7102508544922 
970 / 1452 : pp = 173.69766235351562 
980 / 1452 : pp = 173.4836883544922 
990 / 1452 : pp = 173.3550262451172 
1000 / 1452 : pp = 173.14816284179688 
1010 / 1452 : pp = 173.20777893066406 
1020 / 1452 : pp = 173.3390655517578 
1030 / 1452 : pp = 173.2884063720703 
1040 / 1452 : pp = 173.38015747070312 
1050 / 1452 : pp = 173.35592651367188 
1060 / 1452 : pp = 173.2260284423828 
1070 / 1452 : pp = 173.39321899414062 
1080 / 1452 : pp = 173.4879913330078 
1090 / 1452 : pp = 173.5231475830078 
1100 / 1452 : pp = 173.47177124023438 
1110 / 1452 : pp = 173.24453735351562 
1120 / 1452 : pp = 173.09408569335938 
1130 / 1452 : pp = 172.86627197265625 
1140 / 1452 : pp = 172.8234100341797 
1150 / 1452 : pp = 172.92843627929688 
1160 / 1452 : pp = 172.90065002441406 
1170 / 1452 : pp = 172.8550567626953 
1180 / 1452 : pp = 172.8810272216797 
1190 / 1452 : pp = 172.97312927246094 
1200 / 1452 : pp = 172.9776611328125 
1210 / 1452 : pp = 172.89413452148438 
1220 / 1452 : pp = 173.0257568359375 
1230 / 1452 : pp = 173.1847381591797 
1240 / 1452 : pp = 173.1756591796875 
1250 / 1452 : pp = 173.32138061523438 
1260 / 1452 : pp = 173.37229919433594 
1270 / 1452 : pp = 173.36891174316406 
1280 / 1452 : pp = 173.36337280273438 
1290 / 1452 : pp = 173.3444366455078 
1300 / 1452 : pp = 173.36138916015625 
1310 / 1452 : pp = 173.4015655517578 
1320 / 1452 : pp = 173.31790161132812 
1330 / 1452 : pp = 173.24710083007812 
1340 / 1452 : pp = 173.27212524414062 
1350 / 1452 : pp = 173.27674865722656 
1360 / 1452 : pp = 173.32749938964844 
1370 / 1452 : pp = 173.20472717285156 
1380 / 1452 : pp = 173.14889526367188 
1390 / 1452 : pp = 173.0755157470703 
1400 / 1452 : pp = 172.9678497314453 
1410 / 1452 : pp = 172.9612579345703 
1420 / 1452 : pp = 172.8872833251953 
1430 / 1452 : pp = 172.84805297851562 
1440 / 1452 : pp = 172.87252807617188 
1450 / 1452 : pp = 172.82505798339844 

0 / 115 : pp = 236.35635375976562 
10 / 115 : pp = 219.06166076660156 
20 / 115 : pp = 219.7670440673828 
30 / 115 : pp = 217.33587646484375 
40 / 115 : pp = 216.6626739501953 
50 / 115 : pp = 212.04734802246094 
60 / 115 : pp = 211.42068481445312 
70 / 115 : pp = 207.9592742919922 
80 / 115 : pp = 205.6216583251953 
90 / 115 : pp = 202.93597412109375 
100 / 115 : pp = 198.62583923339844 
110 / 115 : pp = 196.97216796875 
Training perplexity: 172.80404663085938
Validation perplexity:196.6871337890625
Total time : 41.52522921562195
Epoch 7

0 / 1452 : pp = 219.23231506347656 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 163.3201904296875 

0 / 115 : pp = 232.2108154296875 
... (steps 10-100 omitted) ...
110 / 115 : pp = 192.41224670410156 
Training perplexity: 163.29916381835938
Validation perplexity:192.09552001953125
Total time : 41.78096055984497
Epoch 8

0 / 1452 : pp = 201.77548217773438 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 155.62757873535156 

0 / 115 : pp = 228.70111083984375 
... (steps 10-100 omitted) ...
110 / 115 : pp = 189.31134033203125 
Training perplexity: 155.61154174804688
Validation perplexity:188.94537353515625
Total time : 42.13483738899231
Epoch 9

0 / 1452 : pp = 197.80628967285156 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 149.06771850585938 

0 / 115 : pp = 227.0559844970703 
... (steps 10-100 omitted) ...
110 / 115 : pp = 187.07528686523438 
Training perplexity: 149.0502471923828
Validation perplexity:186.6911163330078
Total time : 47.274805545806885
Epoch 10

0 / 1452 : pp = 181.8408203125 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 143.5869598388672 

0 / 115 : pp = 226.9864959716797 
... (steps 10-100 omitted) ...
110 / 115 : pp = 185.81240844726562 
Training perplexity: 143.57354736328125
Validation perplexity:185.40573120117188
Total time : 46.14846849441528
Epoch 11

0 / 1452 : pp = 181.93162536621094 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 138.93991088867188 

0 / 115 : pp = 225.55990600585938 
... (steps 10-100 omitted) ...
110 / 115 : pp = 184.59829711914062 
Training perplexity: 138.9222869873047
Validation perplexity:184.18101501464844
Total time : 43.92928600311279
Epoch 12

0 / 1452 : pp = 173.0251007080078 
... (steps 10-1440 omitted) ...
1450 / 1452 : pp = 134.59365844726562 

0 / 115 : pp = 226.0741729736328 
10 / 115 : pp = 207.00494384765625 
20 / 115 : pp = 209.26976013183594 
30 / 115 : pp = 206.44662475585938 
40 / 115 : pp = 205.47268676757812 
50 / 115 : pp = 200.7876739501953 
60 / 115 : pp = 200.13414001464844 
70 / 115 : pp = 196.35549926757812 
80 / 115 : pp = 194.10777282714844 
90 / 115 : pp = 191.47467041015625 
100 / 115 : pp = 186.61351013183594 
110 / 115 : pp = 184.30374145507812 
Training perplexity: 134.57826232910156
Validation perplexity:183.8900146484375
Total time : 45.410256147384644
Epoch 13

0 / 1452 : pp = 169.39393615722656 
10 / 1452 : pp = 150.13232421875 
20 / 1452 : pp = 147.60450744628906 
30 / 1452 : pp = 144.64317321777344 
40 / 1452 : pp = 146.47427368164062 
50 / 1452 : pp = 143.929443359375 
60 / 1452 : pp = 142.8344268798828 
70 / 1452 : pp = 143.45248413085938 
80 / 1452 : pp = 142.5418701171875 
90 / 1452 : pp = 141.6178436279297 
100 / 1452 : pp = 140.70127868652344 
110 / 1452 : pp = 139.2852325439453 
120 / 1452 : pp = 138.8017120361328 
130 / 1452 : pp = 137.85629272460938 
140 / 1452 : pp = 136.51718139648438 
150 / 1452 : pp = 136.03619384765625 
160 / 1452 : pp = 136.154296875 
170 / 1452 : pp = 135.67037963867188 
180 / 1452 : pp = 135.0376739501953 
190 / 1452 : pp = 134.9230499267578 
200 / 1452 : pp = 135.4241180419922 
210 / 1452 : pp = 135.24581909179688 
220 / 1452 : pp = 135.37957763671875 
230 / 1452 : pp = 135.67652893066406 
240 / 1452 : pp = 135.4161834716797 
250 / 1452 : pp = 134.90895080566406 
260 / 1452 : pp = 134.46754455566406 
270 / 1452 : pp = 133.68577575683594 
280 / 1452 : pp = 133.86770629882812 
290 / 1452 : pp = 134.18475341796875 
300 / 1452 : pp = 134.39132690429688 
310 / 1452 : pp = 134.19985961914062 
320 / 1452 : pp = 134.37998962402344 
330 / 1452 : pp = 134.5557403564453 
340 / 1452 : pp = 134.00686645507812 
350 / 1452 : pp = 134.27749633789062 
360 / 1452 : pp = 134.20286560058594 
370 / 1452 : pp = 134.042724609375 
380 / 1452 : pp = 133.74398803710938 
390 / 1452 : pp = 133.83584594726562 
400 / 1452 : pp = 133.64382934570312 
410 / 1452 : pp = 134.02366638183594 
420 / 1452 : pp = 134.35415649414062 
430 / 1452 : pp = 134.310546875 
440 / 1452 : pp = 134.3634490966797 
450 / 1452 : pp = 134.15602111816406 
460 / 1452 : pp = 133.86578369140625 
470 / 1452 : pp = 133.43414306640625 
480 / 1452 : pp = 132.90310668945312 
490 / 1452 : pp = 132.646240234375 
500 / 1452 : pp = 132.1982421875 
510 / 1452 : pp = 132.04200744628906 
520 / 1452 : pp = 131.86940002441406 
530 / 1452 : pp = 131.59841918945312 
540 / 1452 : pp = 131.12356567382812 
550 / 1452 : pp = 130.887939453125 
560 / 1452 : pp = 130.6210174560547 
570 / 1452 : pp = 130.37826538085938 
580 / 1452 : pp = 130.0374755859375 
590 / 1452 : pp = 129.75979614257812 
600 / 1452 : pp = 129.38308715820312 
610 / 1452 : pp = 129.16685485839844 
620 / 1452 : pp = 129.0115509033203 
630 / 1452 : pp = 128.75152587890625 
640 / 1452 : pp = 128.87295532226562 
650 / 1452 : pp = 128.88734436035156 
660 / 1452 : pp = 128.98275756835938 
670 / 1452 : pp = 129.0487060546875 
680 / 1452 : pp = 129.11013793945312 
690 / 1452 : pp = 129.0646514892578 
700 / 1452 : pp = 129.06280517578125 
710 / 1452 : pp = 129.1343994140625 
720 / 1452 : pp = 129.18582153320312 
730 / 1452 : pp = 129.15138244628906 
740 / 1452 : pp = 129.29811096191406 
750 / 1452 : pp = 129.339599609375 
760 / 1452 : pp = 129.4257354736328 
770 / 1452 : pp = 129.61631774902344 
780 / 1452 : pp = 129.802734375 
790 / 1452 : pp = 129.96804809570312 
800 / 1452 : pp = 129.95187377929688 
810 / 1452 : pp = 129.92417907714844 
820 / 1452 : pp = 129.9774627685547 
830 / 1452 : pp = 130.1638946533203 
840 / 1452 : pp = 130.13095092773438 
850 / 1452 : pp = 130.16595458984375 
860 / 1452 : pp = 130.173828125 
870 / 1452 : pp = 130.170166015625 
880 / 1452 : pp = 130.27032470703125 
890 / 1452 : pp = 130.3022003173828 
900 / 1452 : pp = 130.3071746826172 
910 / 1452 : pp = 130.37939453125 
920 / 1452 : pp = 130.46229553222656 
930 / 1452 : pp = 130.43846130371094 
940 / 1452 : pp = 130.50889587402344 
950 / 1452 : pp = 130.50086975097656 
960 / 1452 : pp = 130.4833221435547 
970 / 1452 : pp = 130.50814819335938 
980 / 1452 : pp = 130.35577392578125 
990 / 1452 : pp = 130.26759338378906 
1000 / 1452 : pp = 130.1064453125 
1010 / 1452 : pp = 130.1472625732422 
1020 / 1452 : pp = 130.27169799804688 
1030 / 1452 : pp = 130.25100708007812 
1040 / 1452 : pp = 130.30816650390625 
1050 / 1452 : pp = 130.29803466796875 
1060 / 1452 : pp = 130.2242431640625 
1070 / 1452 : pp = 130.35906982421875 
1080 / 1452 : pp = 130.45103454589844 
1090 / 1452 : pp = 130.49838256835938 
1100 / 1452 : pp = 130.484130859375 
1110 / 1452 : pp = 130.35316467285156 
1120 / 1452 : pp = 130.24697875976562 
1130 / 1452 : pp = 130.10804748535156 
1140 / 1452 : pp = 130.1076202392578 
1150 / 1452 : pp = 130.195068359375 
1160 / 1452 : pp = 130.19674682617188 
1170 / 1452 : pp = 130.18321228027344 
1180 / 1452 : pp = 130.24623107910156 
1190 / 1452 : pp = 130.33905029296875 
1200 / 1452 : pp = 130.3650360107422 
1210 / 1452 : pp = 130.34588623046875 
1220 / 1452 : pp = 130.4850616455078 
1230 / 1452 : pp = 130.63160705566406 
1240 / 1452 : pp = 130.64674377441406 
1250 / 1452 : pp = 130.77078247070312 
1260 / 1452 : pp = 130.8397674560547 
1270 / 1452 : pp = 130.8511199951172 
1280 / 1452 : pp = 130.88967895507812 
1290 / 1452 : pp = 130.9040985107422 
1300 / 1452 : pp = 130.93511962890625 
1310 / 1452 : pp = 130.9759063720703 
1320 / 1452 : pp = 130.92800903320312 
1330 / 1452 : pp = 130.9105224609375 
1340 / 1452 : pp = 130.929443359375 
1350 / 1452 : pp = 130.96153259277344 
1360 / 1452 : pp = 131.02381896972656 
1370 / 1452 : pp = 130.9545440673828 
1380 / 1452 : pp = 130.9344940185547 
1390 / 1452 : pp = 130.9055938720703 
1400 / 1452 : pp = 130.85386657714844 
1410 / 1452 : pp = 130.8874969482422 
1420 / 1452 : pp = 130.85928344726562 
1430 / 1452 : pp = 130.83995056152344 
1440 / 1452 : pp = 130.86659240722656 
1450 / 1452 : pp = 130.86839294433594 

0 / 115 : pp = 227.78428649902344 
10 / 115 : pp = 207.609619140625 
20 / 115 : pp = 209.92459106445312 
30 / 115 : pp = 206.96240234375 
40 / 115 : pp = 205.9295654296875 
50 / 115 : pp = 201.0296630859375 
60 / 115 : pp = 200.38059997558594 
70 / 115 : pp = 196.55764770507812 
80 / 115 : pp = 194.31735229492188 
90 / 115 : pp = 191.66146850585938 
100 / 115 : pp = 186.70437622070312 
110 / 115 : pp = 184.3171844482422 
Training perplexity: 130.85043334960938
Validation perplexity:183.88186645507812
Total time : 45.345656394958496
Epoch 14

0 / 1452 : pp = 164.82191467285156 
10 / 1452 : pp = 146.39089965820312 
20 / 1452 : pp = 142.93240356445312 
30 / 1452 : pp = 140.3113555908203 
40 / 1452 : pp = 142.39939880371094 
50 / 1452 : pp = 139.70162963867188 
60 / 1452 : pp = 138.73023986816406 
70 / 1452 : pp = 139.2675018310547 
80 / 1452 : pp = 138.47824096679688 
90 / 1452 : pp = 137.40432739257812 
100 / 1452 : pp = 136.47793579101562 
110 / 1452 : pp = 135.2294464111328 
120 / 1452 : pp = 134.80728149414062 
130 / 1452 : pp = 133.89822387695312 
140 / 1452 : pp = 132.54141235351562 
150 / 1452 : pp = 132.10025024414062 
160 / 1452 : pp = 132.21829223632812 
170 / 1452 : pp = 131.8765106201172 
180 / 1452 : pp = 131.37515258789062 
190 / 1452 : pp = 131.31622314453125 
200 / 1452 : pp = 131.78297424316406 
210 / 1452 : pp = 131.5507354736328 
220 / 1452 : pp = 131.7002410888672 
230 / 1452 : pp = 131.9277801513672 
240 / 1452 : pp = 131.72166442871094 
250 / 1452 : pp = 131.225830078125 
260 / 1452 : pp = 130.7496337890625 
270 / 1452 : pp = 129.9896697998047 
280 / 1452 : pp = 130.10594177246094 
290 / 1452 : pp = 130.41644287109375 
300 / 1452 : pp = 130.5982208251953 
310 / 1452 : pp = 130.36329650878906 
320 / 1452 : pp = 130.5633544921875 
330 / 1452 : pp = 130.77252197265625 
340 / 1452 : pp = 130.273193359375 
350 / 1452 : pp = 130.47889709472656 
360 / 1452 : pp = 130.4348602294922 
370 / 1452 : pp = 130.28126525878906 
380 / 1452 : pp = 130.02786254882812 
390 / 1452 : pp = 130.1564483642578 
400 / 1452 : pp = 129.98440551757812 
410 / 1452 : pp = 130.37721252441406 
420 / 1452 : pp = 130.71859741210938 
430 / 1452 : pp = 130.65939331054688 
440 / 1452 : pp = 130.72987365722656 
450 / 1452 : pp = 130.56272888183594 
460 / 1452 : pp = 130.28195190429688 
470 / 1452 : pp = 129.90936279296875 
480 / 1452 : pp = 129.42857360839844 
490 / 1452 : pp = 129.18077087402344 
500 / 1452 : pp = 128.7588348388672 
510 / 1452 : pp = 128.6303253173828 
520 / 1452 : pp = 128.47616577148438 
530 / 1452 : pp = 128.21148681640625 
540 / 1452 : pp = 127.7218017578125 
550 / 1452 : pp = 127.50067138671875 
560 / 1452 : pp = 127.27574157714844 
570 / 1452 : pp = 127.05399322509766 
580 / 1452 : pp = 126.73983001708984 
590 / 1452 : pp = 126.43692779541016 
600 / 1452 : pp = 126.06050109863281 
610 / 1452 : pp = 125.82952880859375 
620 / 1452 : pp = 125.66295623779297 
630 / 1452 : pp = 125.39354705810547 
640 / 1452 : pp = 125.49463653564453 
650 / 1452 : pp = 125.48816680908203 
660 / 1452 : pp = 125.58712005615234 
670 / 1452 : pp = 125.65978240966797 
680 / 1452 : pp = 125.71456146240234 
690 / 1452 : pp = 125.66937255859375 
700 / 1452 : pp = 125.65900421142578 
710 / 1452 : pp = 125.7271499633789 
720 / 1452 : pp = 125.77758026123047 
730 / 1452 : pp = 125.74129486083984 
740 / 1452 : pp = 125.8759765625 
750 / 1452 : pp = 125.91793823242188 
760 / 1452 : pp = 125.99595642089844 
770 / 1452 : pp = 126.18113708496094 
780 / 1452 : pp = 126.35147094726562 
790 / 1452 : pp = 126.50797271728516 
800 / 1452 : pp = 126.49759674072266 
810 / 1452 : pp = 126.48113250732422 
820 / 1452 : pp = 126.52528381347656 
830 / 1452 : pp = 126.705810546875 
840 / 1452 : pp = 126.67517852783203 
850 / 1452 : pp = 126.74176025390625 
860 / 1452 : pp = 126.74151611328125 
870 / 1452 : pp = 126.73414611816406 
880 / 1452 : pp = 126.83026885986328 
890 / 1452 : pp = 126.88519287109375 
900 / 1452 : pp = 126.88053894042969 
910 / 1452 : pp = 126.97138214111328 
920 / 1452 : pp = 127.04660034179688 
930 / 1452 : pp = 127.03763580322266 
940 / 1452 : pp = 127.1126480102539 
950 / 1452 : pp = 127.09610748291016 
960 / 1452 : pp = 127.0873794555664 
970 / 1452 : pp = 127.10343933105469 
980 / 1452 : pp = 126.96441650390625 
990 / 1452 : pp = 126.88519287109375 
1000 / 1452 : pp = 126.7336654663086 
1010 / 1452 : pp = 126.77796936035156 
1020 / 1452 : pp = 126.89826202392578 
1030 / 1452 : pp = 126.88761138916016 
1040 / 1452 : pp = 126.95309448242188 
1050 / 1452 : pp = 126.96478271484375 
1060 / 1452 : pp = 126.89324188232422 
1070 / 1452 : pp = 127.03242492675781 
1080 / 1452 : pp = 127.13228607177734 
1090 / 1452 : pp = 127.173095703125 
1100 / 1452 : pp = 127.15975189208984 
1110 / 1452 : pp = 127.0392074584961 
1120 / 1452 : pp = 126.94032287597656 
1130 / 1452 : pp = 126.80693054199219 
1140 / 1452 : pp = 126.81315612792969 
1150 / 1452 : pp = 126.90467834472656 
1160 / 1452 : pp = 126.91236114501953 
1170 / 1452 : pp = 126.90897369384766 
1180 / 1452 : pp = 126.98052215576172 
1190 / 1452 : pp = 127.07483673095703 
1200 / 1452 : pp = 127.10216522216797 
1210 / 1452 : pp = 127.08258819580078 
1220 / 1452 : pp = 127.22943878173828 
1230 / 1452 : pp = 127.38563537597656 
1240 / 1452 : pp = 127.40538024902344 
1250 / 1452 : pp = 127.53369140625 
1260 / 1452 : pp = 127.59293365478516 
1270 / 1452 : pp = 127.61489868164062 
1280 / 1452 : pp = 127.6484375 
1290 / 1452 : pp = 127.65257263183594 
1300 / 1452 : pp = 127.69329833984375 
1310 / 1452 : pp = 127.74549102783203 
1320 / 1452 : pp = 127.7043228149414 
1330 / 1452 : pp = 127.6866683959961 
1340 / 1452 : pp = 127.70913696289062 
1350 / 1452 : pp = 127.73233795166016 
1360 / 1452 : pp = 127.7855224609375 
1370 / 1452 : pp = 127.71918487548828 
1380 / 1452 : pp = 127.69987487792969 
1390 / 1452 : pp = 127.6697998046875 
1400 / 1452 : pp = 127.61137390136719 
1410 / 1452 : pp = 127.6404037475586 
1420 / 1452 : pp = 127.61094665527344 
1430 / 1452 : pp = 127.58216857910156 
1440 / 1452 : pp = 127.61477661132812 
1450 / 1452 : pp = 127.61964416503906 

0 / 115 : pp = 228.21578979492188 
10 / 115 : pp = 208.11244201660156 
20 / 115 : pp = 210.688232421875 
30 / 115 : pp = 207.62408447265625 
40 / 115 : pp = 206.45184326171875 
50 / 115 : pp = 201.52760314941406 
60 / 115 : pp = 200.7784881591797 
70 / 115 : pp = 196.83067321777344 
80 / 115 : pp = 194.6357879638672 
90 / 115 : pp = 191.9783935546875 
100 / 115 : pp = 186.8787841796875 
110 / 115 : pp = 184.35252380371094 
Training perplexity: 127.60413360595703
Validation perplexity:183.8877410888672
Total time : 41.6636528968811
Epoch 15

0 / 1452 : pp = 156.81654357910156 
10 / 1452 : pp = 142.1070556640625 
20 / 1452 : pp = 139.55076599121094 
30 / 1452 : pp = 136.63551330566406 
40 / 1452 : pp = 138.5840606689453 
50 / 1452 : pp = 136.052734375 
60 / 1452 : pp = 134.93019104003906 
70 / 1452 : pp = 135.65206909179688 
80 / 1452 : pp = 135.2620086669922 
90 / 1452 : pp = 134.314697265625 
100 / 1452 : pp = 133.4916229248047 
110 / 1452 : pp = 132.26052856445312 
120 / 1452 : pp = 131.7714080810547 
130 / 1452 : pp = 130.77365112304688 
140 / 1452 : pp = 129.5411834716797 
150 / 1452 : pp = 129.0791778564453 
160 / 1452 : pp = 129.21920776367188 
170 / 1452 : pp = 128.7528839111328 
180 / 1452 : pp = 128.22279357910156 
190 / 1452 : pp = 128.18177795410156 
200 / 1452 : pp = 128.58758544921875 
210 / 1452 : pp = 128.3906707763672 
220 / 1452 : pp = 128.5266571044922 
230 / 1452 : pp = 128.80563354492188 
240 / 1452 : pp = 128.61886596679688 
250 / 1452 : pp = 128.13172912597656 
260 / 1452 : pp = 127.69220733642578 
270 / 1452 : pp = 126.96150970458984 
280 / 1452 : pp = 127.04702758789062 
290 / 1452 : pp = 127.33565521240234 
300 / 1452 : pp = 127.55929565429688 
310 / 1452 : pp = 127.38514709472656 
320 / 1452 : pp = 127.52171325683594 
330 / 1452 : pp = 127.68690490722656 
340 / 1452 : pp = 127.18340301513672 
350 / 1452 : pp = 127.4073257446289 
360 / 1452 : pp = 127.30432891845703 
370 / 1452 : pp = 127.17618560791016 
380 / 1452 : pp = 126.92579650878906 
390 / 1452 : pp = 127.02473449707031 
400 / 1452 : pp = 126.8515625 
410 / 1452 : pp = 127.211669921875 
420 / 1452 : pp = 127.51788330078125 
430 / 1452 : pp = 127.47386169433594 
440 / 1452 : pp = 127.57164001464844 
450 / 1452 : pp = 127.3601303100586 
460 / 1452 : pp = 127.09434509277344 
470 / 1452 : pp = 126.71922302246094 
480 / 1452 : pp = 126.24349212646484 
490 / 1452 : pp = 125.98778533935547 
500 / 1452 : pp = 125.59526824951172 
510 / 1452 : pp = 125.4450912475586 
520 / 1452 : pp = 125.29247283935547 
530 / 1452 : pp = 125.03536224365234 
540 / 1452 : pp = 124.5813980102539 
550 / 1452 : pp = 124.33724212646484 
560 / 1452 : pp = 124.08995819091797 
570 / 1452 : pp = 123.86637878417969 
580 / 1452 : pp = 123.53152465820312 
590 / 1452 : pp = 123.20321655273438 
600 / 1452 : pp = 122.85673522949219 
610 / 1452 : pp = 122.64250946044922 
620 / 1452 : pp = 122.4958724975586 
630 / 1452 : pp = 122.22386169433594 
640 / 1452 : pp = 122.31143188476562 
650 / 1452 : pp = 122.30093383789062 
660 / 1452 : pp = 122.39427947998047 
670 / 1452 : pp = 122.45440673828125 
680 / 1452 : pp = 122.51146697998047 
690 / 1452 : pp = 122.4854736328125 
700 / 1452 : pp = 122.48600006103516 
710 / 1452 : pp = 122.56084442138672 
720 / 1452 : pp = 122.59059143066406 
730 / 1452 : pp = 122.55529022216797 
740 / 1452 : pp = 122.69409942626953 
750 / 1452 : pp = 122.76456451416016 
760 / 1452 : pp = 122.84437561035156 
770 / 1452 : pp = 123.02527618408203 
780 / 1452 : pp = 123.20509338378906 
790 / 1452 : pp = 123.36305236816406 
800 / 1452 : pp = 123.36852264404297 
810 / 1452 : pp = 123.36799621582031 
820 / 1452 : pp = 123.39976501464844 
830 / 1452 : pp = 123.59362030029297 
840 / 1452 : pp = 123.56946563720703 
850 / 1452 : pp = 123.63800811767578 
860 / 1452 : pp = 123.63983917236328 
870 / 1452 : pp = 123.64148712158203 
880 / 1452 : pp = 123.7568588256836 
890 / 1452 : pp = 123.7885513305664 
900 / 1452 : pp = 123.79640197753906 
910 / 1452 : pp = 123.86153411865234 
920 / 1452 : pp = 123.92941284179688 
930 / 1452 : pp = 123.9125747680664 
940 / 1452 : pp = 123.95559692382812 
950 / 1452 : pp = 123.93928527832031 
960 / 1452 : pp = 123.94294738769531 
970 / 1452 : pp = 123.95547485351562 
980 / 1452 : pp = 123.8229751586914 
990 / 1452 : pp = 123.73727416992188 
1000 / 1452 : pp = 123.59091186523438 
1010 / 1452 : pp = 123.634765625 
1020 / 1452 : pp = 123.76506042480469 
1030 / 1452 : pp = 123.75485229492188 
1040 / 1452 : pp = 123.807861328125 
1050 / 1452 : pp = 123.79156494140625 
1060 / 1452 : pp = 123.73054504394531 
1070 / 1452 : pp = 123.8615951538086 
1080 / 1452 : pp = 123.96564483642578 
1090 / 1452 : pp = 124.02104187011719 
1100 / 1452 : pp = 124.012939453125 
1110 / 1452 : pp = 123.87582397460938 
1120 / 1452 : pp = 123.775390625 
1130 / 1452 : pp = 123.63182067871094 
1140 / 1452 : pp = 123.62391662597656 
1150 / 1452 : pp = 123.71013641357422 
1160 / 1452 : pp = 123.72423553466797 
1170 / 1452 : pp = 123.71726989746094 
1180 / 1452 : pp = 123.79032897949219 
1190 / 1452 : pp = 123.87883758544922 
1200 / 1452 : pp = 123.9125747680664 
1210 / 1452 : pp = 123.90140533447266 
1220 / 1452 : pp = 124.03245544433594 
1230 / 1452 : pp = 124.19799041748047 
1240 / 1452 : pp = 124.21469116210938 
1250 / 1452 : pp = 124.34103393554688 
1260 / 1452 : pp = 124.4041976928711 
1270 / 1452 : pp = 124.42852020263672 
1280 / 1452 : pp = 124.46656036376953 
1290 / 1452 : pp = 124.4811019897461 
1300 / 1452 : pp = 124.52384185791016 
1310 / 1452 : pp = 124.57533264160156 
1320 / 1452 : pp = 124.5398178100586 
1330 / 1452 : pp = 124.52598571777344 
1340 / 1452 : pp = 124.53311157226562 
1350 / 1452 : pp = 124.57759094238281 
1360 / 1452 : pp = 124.63385772705078 
1370 / 1452 : pp = 124.58133697509766 
1380 / 1452 : pp = 124.55769348144531 
1390 / 1452 : pp = 124.54011535644531 
1400 / 1452 : pp = 124.4884033203125 
1410 / 1452 : pp = 124.51226806640625 
1420 / 1452 : pp = 124.49683380126953 
1430 / 1452 : pp = 124.4754638671875 
1440 / 1452 : pp = 124.50164031982422 
1450 / 1452 : pp = 124.50894165039062 

0 / 115 : pp = 230.8488006591797 
10 / 115 : pp = 209.2509002685547 
20 / 115 : pp = 211.68577575683594 
30 / 115 : pp = 208.44056701660156 
40 / 115 : pp = 207.2039337158203 
50 / 115 : pp = 202.1859588623047 
60 / 115 : pp = 201.34739685058594 
70 / 115 : pp = 197.4251251220703 
80 / 115 : pp = 195.2623291015625 
90 / 115 : pp = 192.592529296875 
100 / 115 : pp = 187.39553833007812 
110 / 115 : pp = 184.791259765625 
Training perplexity: 124.4933853149414
Validation perplexity:184.32510375976562
Total time : 40.856229066848755
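The run ends after Epoch 15, consistent with `Config.max_epochs = 16`, and the validation perplexity has just ticked up (183.88 → 184.33). With `Config.early_stopping = 2`, the training loop typically tracks the best validation epoch and stops once it is `early_stopping` epochs stale. A minimal sketch of that check (an illustration of the pattern, not necessarily the author's exact loop; `should_stop` and `val_history` are hypothetical names):

```python
def should_stop(val_history, patience=2):
    """Return True when the best validation perplexity is more than
    `patience` epochs old (mirrors Config.early_stopping = 2)."""
    if not val_history:
        return False
    # Index of the epoch with the lowest validation perplexity so far.
    best_epoch = min(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience
```

Applied to the tail of this run, `[183.89, 183.88, 184.33]` would not yet trigger stopping (the best epoch is only one epoch old), so the loop here most likely halted on `max_epochs`.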

0 / 128 : pp = 184.6475067138672 
10 / 128 : pp = 176.8856964111328 
20 / 128 : pp = 164.3444366455078 
30 / 128 : pp = 167.85472106933594 
40 / 128 : pp = 169.25367736816406 
50 / 128 : pp = 168.86561584472656 
60 / 128 : pp = 168.11801147460938 
70 / 128 : pp = 165.4105224609375 
80 / 128 : pp = 162.91146850585938 
90 / 128 : pp = 161.29742431640625 
100 / 128 : pp = 162.45989990234375 
110 / 128 : pp = 162.6834716796875 
120 / 128 : pp = 164.3359832763672 
=-==-==-==-==-=
Test perplexity: 164.0149383544922 
=-==-==-==-==-=
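The `pp` values in these logs come from `utils.calculate_perplexity`, applying the definition given earlier: perplexity is the exponential of the mean per-word cross-entropy. A minimal sketch of that computation, assuming the helper receives the per-step log-probabilities of the target words (the actual `utils` implementation may differ in interface):

```python
import numpy as np

def calculate_perplexity(log_probs):
    """Perplexity = exp(mean negative log-probability of the targets).

    log_probs: iterable of natural-log probabilities assigned by the
    model to each ground-truth next word.
    """
    return float(np.exp(-np.mean(log_probs)))
```

For instance, a model that assigns probability 1/2 to every target word has perplexity exactly 2, and a uniform model over a vocabulary of size |V| has perplexity |V|, which is why the values above, starting near the low hundreds, indicate the model has learned far more than a uniform guess over the ~10k-word PTB vocabulary.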

For the complete code and further details, see the repository linked below:

https://github.com/weizhenzhao/cs224d_nlp_problem_set2
