Siamese+LSTM+Distance(abs)

Siamese+LSTM


Model architecture

  • Siamese network
    First, understand what the "Siamese" part means: the word-embedding and encoding (LSTM) steps share the same parameters for both inputs. In the code below, the two sentences are encoded by the very same LSTM during the encoding stage, which is what makes this Siamese+LSTM.

  • A question
    One thing I am not sure about: some articles that judge sentence similarity with an LSTM compare two methods, Siamese+LSTM versus a plain LSTM, and I do not know exactly what the standalone LSTM refers to. My guess is that it encodes the two sentences with two separate LSTMs that have independent parameters (just a guess).

  • Encoding
    After encoding, i.e. at the LSTM output stage, the output has shape [len, batch, hidden] because I did not set batch_first on the LSTM. From the outputs I take the last time step, which for a single-layer, unidirectional LSTM is the same as the final hidden state final_ht (see the quick check after this list).

  • Distance
    Both sentence encodings have shape [batch, hidden]. Their element-wise absolute difference, torch.abs(left - right), is the input to the fully connected layers. Note that torch.abs takes no dim argument; a dim (normally dim=-1) only becomes relevant if you go on to reduce the difference into a scalar distance, e.g. with torch.sum.

  • The sigmoid() function
    As for why sigmoid is used at the end: I squash the output into [0, 1] with sigmoid and train with nn.BCELoss(). You could instead treat this as a two-class classification problem; I tried that and it did not work as well as the [0, 1] output, although neither result is great (a sketch of the two-class variant follows the model code below).
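
To make the encoding and distance bullets concrete, here is a quick standalone check with toy tensor sizes (illustrative only, not from the project): for a single-layer, unidirectional LSTM the last time step of the output sequence equals the final hidden state h_n, and torch.abs is purely element-wise, so dim=-1 only enters if you reduce the difference into a single distance value.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=300, hidden_size=128)      # not batch_first, as in the model below
x = torch.randn(20, 4, 300)                          # [len, batch, input_size]
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)                                 # torch.Size([20, 4, 128])
print(torch.allclose(outputs[-1], h_n[0]))           # True: outputs[-1] is final_ht

# torch.abs is element-wise and keeps the [batch, hidden] shape; dim only appears if you reduce.
left, right = torch.randn(4, 128), torch.randn(4, 128)
diff = torch.abs(left - right)                       # [4, 128], what the fully connected layers consume
l1_distance = torch.sum(diff, dim=-1)                # [4], only here does dim=-1 show up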

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import pickle

# Load the vocabulary and the pretrained embedding matrix
file1 = r"./data/vocab.pkl"
file2 = r"./data/vocabs_matrix.pkl"

vocabs = pickle.load(open(file1, 'rb'))
Embedding_matrix = pickle.load(open(file2, 'rb'))
Vocab_size = len(vocabs)

class LSTM1(nn.Module):
    def __init__(self):
        super(LSTM1, self).__init__()
        self.Vocab_size = Vocab_size
        self.batch_size = 500
        self.input_size = 300
        self.n_hidden1 = 128
        self.Embedding_dim = 300
        self.n_class = 2
        self.seq_len = 20
        self.dropout = nn.Dropout(0.2)
        self.Embedding_matrix = Embedding_matrix
        # Embedding layer initialized from the pretrained embedding matrix
        self.word_embeds = nn.Embedding(self.Vocab_size + 1, self.Embedding_dim)
        pretrained_weight = np.array(self.Embedding_matrix)
        self.word_embeds.weight.data.copy_(torch.from_numpy(pretrained_weight))
        # One LSTM shared by both sentences (the "Siamese" part)
        self.Lstm1 = nn.LSTM(self.Embedding_dim, hidden_size=self.n_hidden1, bidirectional=False)
        # self.fc = nn.Linear(self.n_hidden1*2, self.n_class, bias=False)  # two-class head (unused)
        self.fc1 = nn.Linear(self.n_hidden1, 32, bias=False)
        self.b1 = nn.Parameter(torch.rand([32]))
        self.fc2 = nn.Linear(32, 1, bias=False)
        self.b2 = nn.Parameter(torch.rand([1]))


    def forward(self, train_left, train_right):
        device = train_left.device
        # Shared embedding lookup for both sentences
        train_left = self.word_embeds(train_left)
        train_right = self.word_embeds(train_right)

        # [batch, len, dim] -> [len, batch, dim], since the LSTM is not batch_first
        train_left = train_left.transpose(0, 1)
        train_right = train_right.transpose(0, 1)

        # Initial states sized to the actual batch (avoids relying on a hard-coded batch size)
        batch_size = train_left.size(1)
        hidden_state1 = torch.rand(1, batch_size, self.n_hidden1).to(device)
        cell_state1 = torch.rand(1, batch_size, self.n_hidden1).to(device)

        # The same LSTM encodes both sentences
        outputs1_L, (final_state1_L, _) = self.Lstm1(train_left, (hidden_state1, cell_state1))
        outputs1_L = self.dropout(outputs1_L)

        outputs1_R, (final_state1_R, _) = self.Lstm1(train_right, (hidden_state1, cell_state1))
        outputs1_R = self.dropout(outputs1_R)

        # Last time step of each sequence, shape [batch, hidden] (equivalent to final_ht)
        outputs1 = outputs1_L[-1]
        outputs2 = outputs1_R[-1]
        # Element-wise absolute difference as the distance feature
        output = torch.abs(outputs1 - outputs2)
        output = self.fc1(output) + self.b1
        output = self.dropout(output)
        output = self.fc2(output) + self.b2
        # Squash to [0, 1] so the score can be trained with nn.BCELoss
        output = torch.sigmoid(output)
        return output
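
For reference, the two-class alternative mentioned in the sigmoid bullet would replace the sigmoid head with a 2-way output trained with nn.CrossEntropyLoss instead of nn.BCELoss. A minimal sketch under that assumption (the tensors and the fc layer here are stand-ins for illustration, not part of the LSTM1 class above):

import torch
import torch.nn as nn

n_hidden1, n_class = 128, 2
fc = nn.Linear(n_hidden1, n_class, bias=False)       # analogous to the commented-out self.fc above

outputs1 = torch.randn(4, n_hidden1)                 # left encoding, [batch, hidden]
outputs2 = torch.randn(4, n_hidden1)                 # right encoding
logits = fc(torch.abs(outputs1 - outputs2))          # [batch, 2]; no sigmoid here
labels = torch.tensor([0, 1, 1, 0])                  # integer class labels
loss = nn.CrossEntropyLoss()(logits, labels)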

Training module

def train(model, device, train_dataloader, optimizer, epoch):

    model.train()
    train_loss = 0
    num_correct = 0
    for batch_idx, (train_left, train_right, labels) in enumerate(train_dataloader):
        train_left = train_left.to(device)
        train_right = train_right.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        output = model(train_left, train_right)
        output = output.view_as(labels)
        # loss_fn is nn.BCELoss(), defined in the training script (see the sketch below)
        loss = loss_fn(output, labels)
        loss.backward()
        optimizer.step()
        train_loss += float(loss.item())
        # Round the [0, 1] score: >= 0.5 counts as "similar"
        true = labels.data.cpu()
        predict = torch.round(output).cpu()
        num_correct += torch.eq(predict, true).sum().float().item()
    train_acc = num_correct / len(train_dataloader.dataset)
    train_loss = train_loss / len(train_dataloader)
    print('Train epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f} \t Acc: {:.6f}'.format(
        epoch,
        batch_idx * len(train_left),
        len(train_dataloader.dataset),
        100. * batch_idx / len(train_dataloader),
        train_loss,
        train_acc))
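
loss_fn in the function above is nn.BCELoss(), as described earlier; it and the rest of the setup live in the training script. A rough sketch of that script follows; the train_dataset object, the choice of Adam, and drop_last are my assumptions, while the learning rate of 0.01, the batch size of 500, and the 10 epochs come from the code and the log below.

# Rough sketch of the training script; train_dataset and the optimizer choice are assumptions.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = LSTM1().to(device)
loss_fn = nn.BCELoss()                               # matches the sigmoid output in [0, 1]
optimizer = optim.Adam(model.parameters(), lr=0.01)  # optimizer type assumed; lr taken from the log

# train_dataset (assumed) yields (left_ids, right_ids, label) with float 0./1. labels;
# drop_last keeps every batch at the full 500 samples.
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=500,
                                               shuffle=True, drop_last=True)

for epoch in range(1, 11):                           # 10 epochs, as in the log
    train(model, device, train_dataloader, optimizer, epoch)
    # F1 / Accuracy / Precision / Recall on the validation split are computed here in the original run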

Results (the dataset is Quora Question Pairs)

D:\Anaconda3\envs\python36\python.exe D:/TextMatching/NewProject2/train.py
Learning rate:  0.01
Epochs:  10
Training on 363861 samples...
Train epoch: 1 [363000/363861 (100%)]	Loss: 0.450285 	 Acc: 0.787457
Make prediction for 40429 samples...
F1 improved at epoch: 1 ; best_F1:0.75543 ; best_Accuracy:0.79368 ; best_Precision:0.67190 ; best_Recall:0.86267

Training on 363861 samples...
Train epoch: 2 [363000/363861 (100%)]	Loss: 0.368117 	 Acc: 0.833332
Make prediction for 40429 samples...
F1:0.75543 Accuracy:0.79368 Precision:0.67190 Recall:0.86267 No improvement since epoch: 1

Training on 363861 samples...
Train epoch: 3 [363000/363861 (100%)]	Loss: 0.336102 	 Acc: 0.848426
Make prediction for 40429 samples...
F1 improved at epoch: 3 ; best_F1:0.76229 ; best_Accuracy:0.79602 ; best_Precision:0.66912 ; best_Recall:0.88559

Training on 363861 samples...
Train epoch: 4 [363000/363861 (100%)]	Loss: 0.320470 	 Acc: 0.855816
Make prediction for 40429 samples...
F1 improved at epoch: 4 ; best_F1:0.77047 ; best_Accuracy:0.81200 ; best_Precision:0.70121 ; best_Recall:0.85491

Training on 363861 samples...
Train epoch: 5 [363000/363861 (100%)]	Loss: 0.308956 	 Acc: 0.862206
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4

Training on 363861 samples...
Train epoch: 6 [363000/363861 (100%)]	Loss: 0.302345 	 Acc: 0.864965
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4

Training on 363861 samples...
Train epoch: 7 [363000/363861 (100%)]	Loss: 0.296583 	 Acc: 0.867658
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4

Training on 363861 samples...
Train epoch: 8 [363000/363861 (100%)]	Loss: 0.293175 	 Acc: 0.868862
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4

Training on 363861 samples...
Train epoch: 9 [363000/363861 (100%)]	Loss: 0.288697 	 Acc: 0.871360
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4

Training on 363861 samples...
Train epoch: 10 [363000/363861 (100%)]	Loss: 0.287799 	 Acc: 0.871404
Make prediction for 40429 samples...
F1:0.77047 Accuracy:0.81200 Precision:0.70121 Recall:0.85491 No improvement since epoch: 4


Process finished with exit code 0

Finally, if you are also working on text similarity, feel free to get in touch and exchange ideas; I am a beginner too.
