Getting Started with PyTorch, Part 4: Natural Language Classification

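This article walks through natural language classification with PyTorch, covering data preprocessing, model construction, and training. It first explains how to process the IMDb dataset with TorchText, build a vocabulary, and pad short sentences. It then implements and trains a Word Averaging model, an RNN model, and a CNN model, and evaluates their performance on a sentiment analysis task.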

I. Sentiment Analysis
1. Preparing the Data

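Field is one of the central concepts in TorchText: a Field determines how your data is processed. In our sentiment classification task, the data we work with consists of text strings and two sentiment labels, "pos" or "neg".

The arguments passed to a Field specify how the data is processed.

We use a TEXT field to define how movie reviews are processed, and a LABEL field to handle the two sentiment classes.

The TEXT field is created with tokenize='spacy', which means English sentences are tokenized with the spaCy tokenizer. If the tokenize argument is not given, the default is to split the string on whitespace.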
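
To see why the choice of tokenizer matters, the sketch below contrasts whitespace splitting with a tokenizer that separates punctuation from words. Note that simple_word_tokenize is a hypothetical regex-based stand-in written for this illustration; it is not spaCy's algorithm and will differ from spaCy's output on many inputs.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace, as in the default behaviour described above."""
    return text.split()


def simple_word_tokenize(text):
    """Hypothetical stand-in for a linguistic tokenizer (NOT spaCy).

    Emits each run of word characters as one token and each
    punctuation character as its own token.
    """
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Brilliant adaptation of the novel, truly."

print(whitespace_tokenize(sentence))
# ['Brilliant', 'adaptation', 'of', 'the', 'novel,', 'truly.']

print(simple_word_tokenize(sentence))
# ['Brilliant', 'adaptation', 'of', 'the', 'novel', ',', 'truly', '.']
```

With whitespace splitting, 'novel,' and 'novel' would become two different vocabulary entries, which is one reason a punctuation-aware tokenizer is usually preferred for this kind of task.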

Install spaCy:

pip install -U spacy
python -m spacy download en

LABEL is defined with LabelField, a special kind of Field intended for handling labels.

import torch
from torchtext import data

SEED = 1234

torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize='spacy')
LABEL = data.LabelField(dtype=torch.float)
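
To make the role of a label field more concrete, here is a minimal pure-Python sketch of the idea: each distinct label string is mapped to an index, and the indices are emitted as floats to mirror dtype=torch.float. The helpers build_label_vocab and numericalize are hypothetical names for this illustration, and the first-seen index order is an assumption; which index TorchText assigns to "pos" or "neg" depends on how its vocabulary is built.

```python
def build_label_vocab(labels):
    """Map each distinct label string to an integer index (first-seen order).

    Assumption for illustration only: TorchText may order labels differently.
    """
    vocab = {}
    for label in labels:
        if label not in vocab:
            vocab[label] = len(vocab)
    return vocab


def numericalize(labels, vocab):
    """Convert label strings to floats, mirroring dtype=torch.float."""
    return [float(vocab[label]) for label in labels]


raw_labels = ["neg", "pos", "pos", "neg"]
label_vocab = build_label_vocab(raw_labels)

print(label_vocab)                            # {'neg': 0, 'pos': 1}
print(numericalize(raw_labels, label_vocab))  # [0.0, 1.0, 1.0, 0.0]
```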

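Notes:

TorchText supports many common natural language processing datasets.
The code below automatically downloads the IMDb dataset and splits it into train and test sets, each a torchtext.datasets object. The data is processed by the Fields defined above. The IMDb dataset contains 50,000 movie reviews in total, each labeled as positive or negative.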

from torchtext import datasets
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
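
For intuition about what a split of labeled reviews looks like on disk, the following sketch reads (text, label) pairs from a folder that has one subfolder per label. The pos/neg folder layout and the load_split helper are assumptions made for this illustration, not TorchText's actual loading code, and the two tiny reviews are made up so the example runs without downloading anything.

```python
import tempfile
from pathlib import Path


def load_split(split_dir):
    """Read (text, label) pairs from split_dir/pos/*.txt and split_dir/neg/*.txt.

    Hypothetical helper: assumes one subfolder per label, one review per file.
    """
    examples = []
    for label in ("pos", "neg"):
        for path in sorted((Path(split_dir) / label).glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8"), label))
    return examples


# Build a tiny fake "train" split in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for label, text in [("pos", "A brilliant film."), ("neg", "A dull film.")]:
        folder = Path(root) / "train" / label
        folder.mkdir(parents=True)
        (folder / "0.txt").write_text(text, encoding="utf-8")
    demo_examples = load_split(Path(root) / "train")

print(len(demo_examples))  # 2
print(demo_examples[0])    # ('A brilliant film.', 'pos')
```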

Check how many examples each split contains:

print(f'Number of training examples: {len(train_data)}')
print(f'Number of testing examples: {len(test_data)}')

Output:

Number of training examples: 25000
Number of testing examples: 25000

Inspect one example:

print(vars(train_data.examples[0]))

Output:

{'text': ['Brilliant', 'adaptation', 'of', 'the', 'novel', 'that', 'made', 'famous', 'the', 'relatives', 'of', 'Chilean', 'President', 'Salvador', 'Allende', 'killed', '.', 'In', 'the', 'environment', 'of', 'a', 'large', 'estate', 'that', 'arises', 'from', 'the', 'ruins', ',', 'becoming', 'a', 'force', 'to', 'abuse', 'and', 'exploitation', 'of', 'outrage', ',', 'a', 'luxury', 'estate', 'for', 'the', 'benefit', 'of', 'the', 'upstart', 'Esteban', 'Trueba', 'and', 'his', 'undeserved', 'family', ',', 'the', 'brilliant', 'Danish', 'director', 'Bille', 'August', 'recreates', ',', 'in', 'micro', ',', 'which', 'at', 'the', 'time', 'would', 'be', 'the', 'process', 'leading', 'to', 'the', 'greatest', 'infamy', 'of', 'his', 'story', 'to', 'the', 'hardened', 'Chilean', 'nation', ',', 'and', 'whose', 'main', 'character', 'would', 'Augusto', 'Pinochet', '(', 'Stephen', 'similarities', 'with', 'it', 'are', 'inevitable', ':', 'recall', ',', 'as', 'an', 'example', ',', 'that', 'image', 'of', 'the', 'senator', 'with', 'dark', 'glasses', 'that', 'makes', 'him', 'the', 'wink', 'to', 'the', 'general', 'to', 'begin', 'making', 'the', 'palace).<br', '/><br', '/>Bille', 'August', 'attends', 'an', 'exceptional', 'cast', 'in', 'the', 'Jeremy', 'protruding', 'Irons', ',', 'whose', 'character', 'changes', 'from', 'arrogance', 'and', 'extreme', 'cruelty', ',', 'the', 'hard', 'lesson', 'that', 'life', 'always', 'brings', 'us', 'to', 'almost', 'force', 'us', 'to', 'change', '.', 'In', 'Esteban', 'fully', 'applies', 'the', 'law', 'of', 'resonance', ',', 'with', 'great', 'wisdom', ',', 'Solomon', 'describes', 'in', 'these', 'words:"The', 'things', 'that', 'freckles', 'are', 'the', 'same', 'punishment', 'that', 'will', 'serve', 'you', '.', '"', '<', 'br', '/><br', '/>Unforgettable', 'Glenn', 'Close', 'playing', 'splint', ',', 'the', 'tainted', 'sister', 'of', 'Stephen', ',', 'whose', 'sin', ',', 'driven', 'by', 'loneliness', ',', 'spiritual', 'and', 'platonic', 'love', 'was', 'the', 'wife', 'of', 'his', 'cruel', 'snowy', 'brother', '.', 'Meryl', 'Streep', 'also', 'brilliant', ',', 'a', 'woman', 'whose', 'name', 'came', 'to', 'him', 'like', 'a', 'glove', 'Clara', '.', 'With', 'telekinetic', 'powers', ',', 'cognitive', 'and', 'mediumistic', ',', 'this', 'hardened', 'woman', ',', 'loyal', 'to', 'his', 'blunt', ',', 'conservative', 'husband', ',', 'is', 'an', 'indicator', 'of', 'character', 'and', 'self', '-', 'control', 'that', 'we', 'wish', 'for', 'ourselves', 'and', 'for', 'all', 'human', 'beings', '.', '<', 'br', '/><br', '/>Every', 'character', 'is', 'a', 'portrait', 'of', 'virtuosity', '(', 'as', 'Blanca', 'worthy', 'rebel', 'leader', 'Pedro', 'Segundo', 'unhappy', '…', ')', 'or', 'a', 'portrait', 'of', 'humiliation', ',', 'like', 'Stephen', 'Jr.', ',', 'the', 'bastard', 'child', 'of', 'Senator', ',', 'who', 'serves', 'as', 'an', 'instrument', 'for', 'the', 'return', 'of', 'the', 'boomerang', '.', '<', 'br', '/><br', '/>The', 'film', 'moves', 'the', 'bowels', ',', 'we', 'recreated', 'some', 'facts', 'that', 'should', 'not', 'ever', 'be', 'repeated', ...