Sentence Generation with an LSTM Model in PyTorch

Article link: https://blog.csdn.net/lhc1251313131313/article/details/136334438

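2/27: While learning PyTorch and building an RNN model, I wanted to try implementing it with an LSTM instead, so I swapped the model out in my existing code. I found that it ran faster on the CPU than on the GPU, and that the loss was no better than with the RNN, so I am recording it here. I am a beginner, so any advice is welcome.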

The formulas and code are below:
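For reference, the textbook LSTM cell is usually written as follows. This is the standard formulation, not a transcription of formulas from this post; $[h_{t-1}, x_t]$ denotes concatenation, $\odot$ element-wise multiplication, and the weight names $W_f, W_i, W_c, W_o$ are the conventional ones rather than names used in the code.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The code in this post differs from this in two ways: its forget gate is computed from $h_{t-1}$ only, and the category vector is concatenated with $x_t$ and $h_{t-1}$ for the other gates.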

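import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, ct_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        # Size of the cell state. forward() multiplies the cell state
        # element-wise with hidden-sized gates, so ct_size must equal hidden_size.
        self.ct_size = ct_size
        '''
        An LSTM has three gates: forget, input and output.
        Forget gate (forget_layer): the previous hidden state h_{t-1} goes through a
                    linear layer (w1, b1) and then a sigmoid. The result is the forget
                    output, a weight expressing how much of c_{t-1} to keep.
        Input gate:
            input_layer: h_{t-1} and x_t are concatenated and passed through a
                    linear layer (w2, b2), then a sigmoid.
            cell_layer: the same concatenation goes through a linear layer (w3, b3)
                    and tanh, giving values in [-1, 1].

        Output gate (output_layer): a linear layer followed by a sigmoid.
        Finally, c_t and h_t are returned.
        '''
        # n_categories (the number of categories) is not a constructor argument;
        # it must be defined as a module-level global before this class is instantiated.
        self.forget_layer = nn.Linear(hidden_size, hidden_size)
        self.input_layer  = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.output_layer = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.cell_layer   = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.o2o = nn.Linear(hidden_size + ct_size, output_size)

        self.dropout = nn.Dropout(0.1)  # zeroes each output element with probability 0.1 during training
        self.sigmoid = nn.Sigmoid()
        self.tanh    = nn.Tanh()
        self.softmax = nn.LogSoftmax(dim=1)  # log-probabilities, to be paired with nn.NLLLoss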


    def forward(self, category, inputs, hidden, ct_1):
        '''
        c_t = f_t * c_{t-1} + i_t * g_t   (cell state)
        h_t = o_t * tanh(c_t)             (hidden state)
        '''
        # Forget gate, computed from h_{t-1} only: [1, hidden_size]
        forget = self.sigmoid(self.forget_layer(hidden))
        # Scale the previous cell state by the forget gate: [1, hidden_size]
        ct = torch.mul(forget, ct_1)
        # Concatenate category, x_t and h_{t-1}: [1, n_categories + input_size + hidden_size]
        input_combined = torch.cat((category, inputs, hidden), 1)
        # Input gate: -> [1, hidden_size]
        input_gate = self.sigmoid(self.input_layer(input_combined))
        # Candidate cell values in [-1, 1]: -> [1, hidden_size]
        candidate = self.tanh(self.cell_layer(input_combined))
        # New cell state c_t: [1, hidden_size]
        ct = ct + torch.mul(input_gate, candidate)
        # Output gate, then the new hidden state h_t: [1, hidden_size]
        out_gate = self.sigmoid(self.output_layer(input_combined))
        hidden = torch.mul(out_gate, self.tanh(ct))

        # [1, hidden_size + ct_size] -> [1, output_size]
        output_combined = torch.cat((hidden, ct), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)

        # The original returned only (output, hidden), which dropped the updated
        # cell state. Return c_t as well so the caller can pass it to the next step.
        return output, hidden, ct

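    def initHidden(self):
        # Initial hidden state h_0: all zeros, shape [1, hidden_size]
        return torch.zeros(1, self.hidden_size)

    def inict_1(self):
        # Initial cell state c_0, drawn from a standard normal, shape [1, ct_size].
        # Zeros are the more common choice for c_0.
        return torch.randn(1, self.ct_size)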
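As a sanity check on the gate equations, here is a minimal sketch that performs one standard LSTM step by hand and compares it with PyTorch's built-in `nn.LSTMCell` over a short sequence. It is not the class above: the helper `lstm_step`, the tensor sizes and the random input are made up for illustration, and it uses the standard gates (forget gate computed from both the input and the hidden state, no category vector). The point is the loop: both the hidden state and the cell state are carried from one step to the next.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def lstm_step(x, h, c, cell):
    """One standard LSTM step, reusing the weights of an nn.LSTMCell.

    nn.LSTMCell stacks its gate weights row-wise in the order:
    input gate, forget gate, cell candidate, output gate.
    """
    gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)
    c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_new = torch.sigmoid(o) * torch.tanh(c_new)
    return h_new, c_new


# Illustrative sizes only; they are not taken from the post.
input_size, hidden_size, seq_len = 8, 16, 5
cell = nn.LSTMCell(input_size, hidden_size)
sequence = torch.randn(seq_len, 1, input_size)  # [time, batch=1, features]

h_manual = torch.zeros(1, hidden_size)
c_manual = torch.zeros(1, hidden_size)
h_ref, c_ref = h_manual.clone(), c_manual.clone()

with torch.no_grad():
    for x_t in sequence:
        # Carry BOTH h and c forward; dropping c would reset the memory each step.
        h_manual, c_manual = lstm_step(x_t, h_manual, c_manual, cell)
        h_ref, c_ref = cell(x_t, (h_ref, c_ref))

max_diff = (h_manual - h_ref).abs().max().item()
print(f"max |h_manual - h_ref| = {max_diff:.2e}")
```

If the two results agree to within floating-point error, the hand-written equations match the built-in cell. A custom cell like the one in this post can then be compared against `nn.LSTMCell` or `nn.LSTM` to see whether its differences explain the loss.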

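2/28: I found a likely cause. When building the dataset, I moved the inputs to the GPU several times, so data was being shuttled back and forth between devices. That is probably why the GPU was slower than the CPU.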
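One way to avoid this kind of overhead is to move the model and the data to the device once, before the loop, and to keep per-step results on the device until the end. The following is only a sketch: the tiny model, the random data and the sizes are hypothetical stand-ins, not the dataset or network from this post, and it falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in data: 100 samples with 59 features and integer class targets.
n_samples, n_features = 100, 59
inputs = torch.randn(n_samples, 1, n_features)
targets = torch.randint(0, n_features, (n_samples, 1))

# Move everything to the device ONCE, outside the loop.
inputs = inputs.to(device)
targets = targets.to(device)
model = nn.Sequential(nn.Linear(n_features, n_features), nn.LogSoftmax(dim=1)).to(device)
criterion = nn.NLLLoss()

# Accumulate the loss on the device; calling .item() in every iteration
# would force a device-to-host copy and synchronization each step.
total_loss = torch.zeros((), device=device)
for x, y in zip(inputs, targets):
    # No .to(device) or .cpu() calls inside the loop.
    loss = criterion(model(x), y)
    total_loss += loss.detach()

avg_loss = (total_loss / n_samples).item()  # single transfer back to the host
print(f"device={device}, average loss={avg_loss:.4f}")
```

For very small models processed one sample at a time, the GPU can still be slower than the CPU because each kernel launch has a fixed cost, so batching several sequences per step is usually worth trying as well.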