PyTorch Basics Study Notes (2024.3.20) | Using Dataset

Latest recommended article published on 2024-10-01 12:11:59

Gakkii_

Tags: pytorch, learning, artificial intelligence

Article link: https://blog.csdn.net/m0_55333430/article/details/136882280

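Learning source

Tutorials by 小土堆 (Xiaotudui) on Bilibili

Some Jupyter Notebook tips

First, open Anaconda Prompt.
Activate the PyTorch environment, then launch Jupyter Notebook. To open a folder on the D drive, append D: to the command.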

conda activate pytorch_v2
jupyter notebook D:

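Copy the URL into a browser, pick the target folder, and click New in the upper-right corner. You will find that the environment we need is not listed.
The article below explains how to fix this.
Anaconda创建虚拟环境并使用Jupyter-notebook打开虚拟环境 (Creating a virtual environment with Anaconda and opening it in Jupyter Notebook)
Register the environment as a Notebook kernel (replace env_name with the name of your environment):

python -m ipykernel install --user --name env_name --display-name "Python (env_name)"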

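Two functions for self-study in Python

dir() opens a package or object and shows what is inside: it lists the names (submodules, classes, functions) defined there
help() shows the official documentation for a function or tool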
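
As a small illustration of the two functions, the sketch below explores the standard-library module os.path, chosen only so that it runs without PyTorch installed. The same calls should work on torch and its submodules.

```python
import os.path

# dir() returns a list of the names defined inside an object (here: a module).
names = dir(os.path)
print([n for n in names if not n.startswith("_")][:5])

# "join" is one of the tools found inside os.path.
print("join" in names)

# help() prints the documentation of one specific tool.
# Pass the function itself, without parentheses.
help(os.path.join)
```

Passing os.path.join() with parentheses would call the function first and ask for help on its result, which is usually not what is wanted.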

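Loading data in PyTorch

Dataset: provides a way to obtain the data and its labels. It answers two questions:

How to get each sample and its label
How many samples there are in total

DataLoader: serves the data to the network in different forms, for example packed into batches (batch_size)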
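
To make the two roles concrete, here is a minimal sketch. SquaresDataset is a hypothetical toy class that is not part of the tutorial: it uses numbers instead of images so that it needs no files on disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the number i, its label is i squared."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, idx):
        # How to get one sample and its label
        x = torch.tensor(float(idx))
        y = torch.tensor(float(idx * idx))
        return x, y

    def __len__(self):
        # How many samples there are in total
        return self.n


dataset = SquaresDataset(10)

# DataLoader groups the samples into batches of 4; the last batch holds the remainder.
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, y in loader]
print(batch_shapes)  # [(4,), (4,), (2,)]
```

The Dataset only knows how to return one sample at a time. Batching, shuffling and similar concerns are left to the DataLoader.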

How to use Dataset

First, use help to look up the description and usage of Dataset

from torch.utils.data import Dataset
help(Dataset)

class Dataset(typing.Generic)
| Dataset(*args, **kwds)
|
| An abstract class representing a :class:`Dataset`.
|
| All datasets that represent a map from keys to data samples should subclass
| it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
| data sample for a given key. Subclasses could also optionally overwrite
| :meth:`__len__`, which is expected to return the size of the dataset by many
| :class:`~torch.utils.data.Sampler` implementations and the default options
| of :class:`~torch.utils.data.DataLoader`.
|
| .. note::
| :class:`~torch.utils.data.DataLoader` by default constructs a index
| sampler that yields integral indices. To make it work with a map-style
| dataset with non-integral indices/keys, a custom sampler must be provided.

Code example:

The following code shows how to use Dataset to fetch data

from torch.utils.data import Dataset
from PIL import Image
import os

class Mydata(Dataset):

    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        self.path = os.path.join(self.root_dir, self.label_dir)
        # List the image file names in this folder; img_path[0] is the name of one image
        self.img_path = os.listdir(self.path)

    def __getitem__(self, idx):
        img_name = self.img_path[idx]
        img_item_path = os.path.join(self.path, img_name)
        # Open the image with PIL's Image.open
        img = Image.open(img_item_path)
        # The folder name doubles as the label
        label = self.label_dir
        # Return the image together with its label
        return img, label

    def __len__(self):
        # The length of the file-name list is the size of the dataset
        return len(self.img_path)


root_dir = "dataset/train"
ants_label_dir = "ants"
bees_label_dir = "bees"
ants_dataset = Mydata(root_dir, ants_label_dir)
bees_dataset = Mydata(root_dir, bees_label_dir)

train_dataset = ants_dataset + bees_dataset
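
The + on the last line works because Dataset defines __add__, which returns a ConcatDataset holding both parts. The sketch below checks this with ListDataset, a hypothetical stand-in that stores made-up file names in a list, so no image folder is required.

```python
from torch.utils.data import Dataset, ConcatDataset


class ListDataset(Dataset):
    """Hypothetical stand-in for Mydata that keeps file names in a plain list."""

    def __init__(self, items, label):
        self.items = items
        self.label = label

    def __getitem__(self, idx):
        return self.items[idx], self.label

    def __len__(self):
        return len(self.items)


ants = ListDataset(["a0.jpg", "a1.jpg"], "ants")
bees = ListDataset(["b0.jpg", "b1.jpg", "b2.jpg"], "bees")

combined = ants + bees
print(type(combined).__name__)  # ConcatDataset
print(len(combined))            # 5

# Indices continue across the boundary: 0-1 come from ants, 2-4 from bees.
print(combined[1])  # ('a1.jpg', 'ants')
print(combined[2])  # ('b0.jpg', 'bees')
```

Applied to the code above, train_dataset[len(ants_dataset)] would therefore be the first sample of bees_dataset.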

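The following code creates one txt file per image in the ants_label folder and writes the image's label into it. Here the images are assumed to live in dataset/train/ants_image. Keeping labels in separate files makes it possible to store more complex label values later.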

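import os

root_dir = "dataset/train"
target_dir = "ants_image"
img_name_list = os.listdir(os.path.join(root_dir, target_dir))
label = target_dir.split('_')[0]  # the label is the part before "_", i.e. "ants"
out_dir = "ants_label"
# open() creates missing files but not missing folders, so make sure the folder exists
os.makedirs(os.path.join(root_dir, out_dir), exist_ok=True)
for i in img_name_list:
    file_name = os.path.splitext(i)[0]  # drop the extension, e.g. ".jpg"
    # str.format fills the {} placeholder with the value passed in ()
    # open() in 'w' mode creates the file if it does not exist yet
    with open(os.path.join(root_dir, out_dir, "{}.txt".format(file_name)), 'w') as f:
        f.write(label)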
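
Once labels live in txt files, a dataset's __getitem__ would read the label from the file that matches the image name instead of using the folder name. A minimal sketch of that lookup follows. read_label and the file name 0013035 are hypothetical, and a temporary folder stands in for dataset/train so the code runs anywhere.

```python
import os
import tempfile


def read_label(root_dir, label_dir, img_name):
    """Return the label stored in the txt file that matches img_name."""
    stem = os.path.splitext(img_name)[0]
    label_path = os.path.join(root_dir, label_dir, stem + ".txt")
    with open(label_path, "r") as f:
        return f.read().strip()


# Build a throwaway folder layout that mirrors <root_dir>/ants_label/<name>.txt
with tempfile.TemporaryDirectory() as root_dir:
    os.makedirs(os.path.join(root_dir, "ants_label"))
    with open(os.path.join(root_dir, "ants_label", "0013035.txt"), "w") as f:
        f.write("ants")
    label = read_label(root_dir, "ants_label", "0013035.jpg")

print(label)  # ants
```

Because the lookup only depends on the shared file stem, the label file can later hold richer content without changing how images and labels are paired.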