4. torch dataset classes, data loading, built-in datasets, and transforms for image data

1. Dataset classes

To define your own dataset class, subclass the torch.utils.data.Dataset base class and implement its __getitem__ and __len__ methods.
from torch.utils.data import Dataset
data_path = "C:/Users/luoweu/Desktop/pytorch学习/DataSet/smsspamcollection/SMSSpamCollection"
class Mydataset(Dataset):
    def __init__(self):
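        # Read every line up front; the with block closes the file afterwards
        with open(data_path,encoding = 'mac_roman') as f:
            self.lines = f.readlines()
    def __getitem__(self,index):
        # strip() with no argument returns a copy without leading and trailing whitespace; with a string argument it strips any of those characters from both ends instead
        line = self.lines[index].strip()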
        label = line[:4].strip()
        content=line[4:].strip()
        return label,content
    def __len__(self):
        return len(self.lines)
data = Mydataset()
print(data[5],len(data))
('spam', "FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv") 5574
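The slicing above relies on the label being at most four characters long. A minimal sketch of an alternative, assuming each line of the file has the form label, tab, message: split on the first tab instead. The names parse_line and SmsDataset and the two sample lines are made up for illustration, and the class is kept dependency-free here; in practice it would be declared as class SmsDataset(Dataset).

```python
def parse_line(line):
    """Split one 'label<TAB>message' line into a (label, content) pair."""
    label, _, content = line.strip().partition("\t")
    return label, content.strip()


class SmsDataset:
    """Map-style dataset sketch: only __getitem__ and __len__ are required."""

    def __init__(self, lines):
        # Skip blank lines so every index maps to a real sample
        self.samples = [parse_line(line) for line in lines if line.strip()]

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)


# Hypothetical sample lines, not taken from the real file
sample_lines = ["ham\tsee you at noon\n", "spam\tWIN a prize now\n", "\n"]
sms = SmsDataset(sample_lines)
print(sms[1], len(sms))
```

Parsing once in __init__ keeps __getitem__ cheap, which matters because the data loader calls it once per sample.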

2. The data loader class (batching, shuffling, and loading data with multiple worker processes)

from torch.utils.data import DataLoader
dataloader = DataLoader(dataset=data,batch_size=10,shuffle=True,num_workers=0)
for i,j in enumerate(dataloader):# a DataLoader cannot be indexed, so iterate over it
    if i==5:
        print(j)
        break
[('ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'spam', 'ham', 'ham', 'ham'), ("*deep sigh* ... I miss you :-( ... I am really surprised you haven't gone to the net cafe yet to get to me ... Don't you miss me?", "I'm glad. You are following your dreams.", 'Monthly password for wap. mobsi.com is 391784. Use your wap phone not PC.', 'I get out of class in bsn in like  <#>  minutes, you know where advising is?', 'WHORE YOU ARE UNBELIEVABLE.', 'Send this to ur friends and receive something about ur voice..... How is my speaking expression? 1.childish 2.naughty 3.Sentiment 4.rowdy 5.ful of attitude 6.romantic 7.shy 8.Attractive 9.funny  <#> .irritating  <#> .lovable. reply me..', 'Latest Nokia Mobile or iPOD MP3 Player +£400 proze GUARANTEED! Reply with: WIN to 83355 now! Norcorp Ltd.£1,50/Mtmsgrcvd18+', 'Ok anyway no need to change with what you said', 'Except theres a chick with huge boobs.', "I take it we didn't have the phone callon Friday. Can we assume we won't have it this year now?")]
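The printed batch is a list of two tuples, all labels first and all messages second, rather than ten (label, content) pairs. A simplified, pure-Python model of why that happens is sketched below. It is not DataLoader's actual implementation: the function iterate_batches and the sample data are hypothetical, worker processes are not modeled, and the real default collation also handles tensors and other types.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield batches from any object that supports len() and integer indexing."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Shuffle the indices, not the data itself
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        # "Collate": transpose a list of (label, content) pairs
        # into [all labels, all contents]
        yield list(zip(*batch))


# Hypothetical samples standing in for the SMS dataset
samples = [("ham", "a"), ("spam", "b"), ("ham", "c"), ("ham", "d"), ("spam", "e")]
batches = list(iterate_batches(samples, batch_size=2))
for batch in batches:
    print(batch)
```

With five samples and a batch size of 2, the last batch holds a single sample; the real DataLoader behaves the same way unless drop_last=True is passed.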

3. Built-in torch datasets (torchvision / torchtext)

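torchvision provides image-processing APIs and datasets, for example torchvision.datasets.MNIST (images of handwritten digits).
torchtext provides text-processing APIs and datasets, for example torchtext.datasets.IMDB (movie review text).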
import torchvision
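# root: directory where the data is stored; train: True for the training set, False for the test set; download: whether to download the data over the network; transform: a function applied to each image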
mnist_train = torchvision.datasets.MNIST(root="C:/Users/luoweu/Desktop/pytorch学习/DataSet/MINST",train = True,download=True,transform=None)
print(mnist_train)
print(mnist_train[0])
mnist_train[0][0].show()# a PIL.Image.Image object, which can be displayed with its show method
Dataset MNIST
    Number of datapoints: 60000
    Root location: C:/Users/luoweu/Desktop/pytorch学习/DataSet/MINST
    Split: Train
(<PIL.Image.Image image mode=L size=28x28 at 0x1F357DDEBE0>, 5)

4. torchvision.transforms (methods for processing image data)

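# ToTensor converts a PIL.Image or an (H,W,C) numpy array into a (C,H,W) tensor, moving the channel dimension to the front much like transpose or permute would
# For 8-bit input it also rescales the pixel values from [0, 255] to floats in [0.0, 1.0]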
img=torchvision.transforms.ToTensor()(mnist_train[0][0])
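# Normalize the image: subtract the mean, then divide by the standard deviation std; mean and std each take one value per channel
# (10,) and (1,) are one-element tuples, whereas (10) and (1) are plain integers
norm_img=torchvision.transforms.Normalize((10,),(1,))(img)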
norm_img
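# Compose combines several transform objects into one and applies them in order when called
# std must be non-zero because Normalize divides by it, so mean 10 and std 1 are used here
compose = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),torchvision.transforms.Normalize((10,),(1,))])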
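
To make the arithmetic behind these three transforms concrete, here is a pure-Python sketch that works on nested lists. It only illustrates the idea and is not torchvision's code: the functions to_chw_float, normalize and compose are hypothetical stand-ins for ToTensor, Normalize and Compose, and the mean and std of 0.5 are arbitrary example values.

```python
def to_chw_float(image_hwc):
    """Turn an H x W x C nested list of 0-255 ints into C x H x W floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def normalize(image_chw, mean, std):
    """Per channel c, compute (value - mean[c]) / std[c]."""
    if any(s == 0 for s in std):
        raise ValueError("std must not contain zeros")
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image_chw, mean, std)
    ]


def compose(*steps):
    """Return a function that applies the given steps in order."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# A tiny image with H=1, W=2 and a single channel
tiny = [[[0], [255]]]
pipeline = compose(to_chw_float, lambda img: normalize(img, mean=(0.5,), std=(0.5,)))
result = pipeline(tiny)
print(result)
```

The order of the steps matters: normalization here assumes values already scaled to [0, 1], which is why the conversion step comes first in the pipeline.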