Dataset Loading (Python, Keras, PyTorch)

zachysun

Last modified 2023-08-17 21:00:09


First published 2021-06-05 22:31:27

Article link: https://blog.csdn.net/m0_46581543/article/details/117607006


Processing Datasets from Scratch

Image data with corresponding csv data (using the IDRiD dataset as an example)

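import os

import cv2
import pandas as pd
from sklearn.utils import shuffle  # assumed source of shuffle; its signature matches the call below

def get_images(image_dir, labels_csv):
    train_data = []
    train_labels = []
    # Read the label csv once, not once per image
    labels = pd.read_csv(labels_csv)

    for image_file in os.listdir(image_dir):
        image = cv2.imread(os.path.join(image_dir, image_file))
        image = cv2.resize(image, (227, 227))
        train_data.append(image)
        # Select the grade of the row whose image name matches this file
        label = list(labels[labels['Image_name'] + '.jpg' == image_file]['Retinopathy grade'])
        train_labels.append(label)

    # Shuffle images and labels together so they stay aligned
    return shuffle(train_data, train_labels, random_state=7)

# Fill in the image directory and the path to the label csv
train_data, train_labels = get_images('', '')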
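
The per-image label lookup above can also be done with a plain dictionary built once from the csv file. The following is a minimal sketch that uses the standard library `csv` module (swapped in for pandas so that it runs without extra dependencies). The helper name `build_label_lookup` and the sample rows are made up for illustration; the column names follow the code above.

```python
import csv
import io

def build_label_lookup(csv_file, name_col='Image_name', grade_col='Retinopathy grade'):
    """Map '<image name>.jpg' to its integer grade, reading the csv once."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        lookup[row[name_col] + '.jpg'] = int(row[grade_col])
    return lookup

# Made-up sample rows standing in for the real label file
sample_csv = io.StringIO(
    "Image_name,Retinopathy grade\n"
    "IDRiD_001,3\n"
    "IDRiD_002,0\n"
)
label_lookup = build_label_lookup(sample_csv)
print(label_lookup['IDRiD_001.jpg'])
```

With a lookup like this, each file name returned by `os.listdir` resolves to a single integer label rather than a one-element list.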

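Image data with corresponding txt data

Change the txt file's extension to csv so that it can be handled the same way as the image-and-csv case above

txt data only

In practice, datasets that come only as txt files are uncommon. When one does come up, it can be handled as follows:

Simply change the txt file's extension to csv, then process it the same way as csv data. This assumes the values in the file are comma-separated.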
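
When the txt file uses a different delimiter, it can be parsed in place by passing the delimiter explicitly; `pd.read_csv` likewise accepts a `sep` argument and does not depend on the file extension. The sketch below uses the standard library `csv` module instead of pandas. The file name, the column names, and the tab delimiter are assumptions for illustration.

```python
import csv
import os
import tempfile

def read_delimited_txt(path, delimiter='\t'):
    """Read a delimited text file into a list of dicts keyed by the header row."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f, delimiter=delimiter))

# Write a small made-up tab-separated file to demonstrate
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, 'labels.txt')
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write("image\tlabel\nimg_001.jpg\t2\nimg_002.jpg\t0\n")
    rows = read_delimited_txt(txt_path)

print(rows)
```

All values come back as strings, so numeric labels still need an explicit `int(...)` conversion.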

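Image data only (using the Intel classification competition as an example)

First, how to read the labeled training set, assign labels, and use it

Only the required path needs to be supplied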

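import os

import cv2
import numpy as np
from sklearn.utils import shuffle  # assumed source of shuffle; its signature matches the calls below

def get_images(directory):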
    
    Images = []
    Labels = []  
    label = 0
    
    for labels in os.listdir(directory):            # you should give the dir of the train data
        if labels == '':                            # you can change the name of labels and the number of labels
            label = 2
        elif labels == '':
            label = 4
        elif labels == '':
            label = 0
        elif labels == '':
            label = 1
        elif labels == '':
            label = 5
        elif labels == '':
            label = 3
        
        for image_file in os.listdir(directory+r'/'+labels):     
            image = cv2.imread(directory+r'/'+labels+r'/'+image_file)   # read your image and change the size of your image
            image = cv2.resize(image,(150,150)) 
            Images.append(image)
            Labels.append(label)
    
    return shuffle(Images,Labels,random_state=817328462)

Images, Labels = get_images('')   

Images = np.array(Images)                                  
Labels = np.array(Labels)
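
The if/elif chain above can be written more compactly as a dictionary from folder name to label. The following is a minimal sketch, assuming one subfolder per class. The class names `cat` and `dog` and the helpers `build_class_index` and `list_labeled_files` are hypothetical, and labels are assigned in sorted folder order rather than in the custom order used above.

```python
import os
import tempfile

def build_class_index(directory):
    """Assign an integer label to each class subfolder, in sorted name order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}

def list_labeled_files(directory):
    """Return (file_path, label) pairs for every file under each class folder."""
    class_index = build_class_index(directory)
    pairs = []
    for name, label in class_index.items():
        class_dir = os.path.join(directory, name)
        for file_name in sorted(os.listdir(class_dir)):
            pairs.append((os.path.join(class_dir, file_name), label))
    return pairs

# Demonstrate on a temporary folder tree with made-up class names
with tempfile.TemporaryDirectory() as root:
    for class_name, file_name in [('dog', 'a.jpg'), ('cat', 'b.jpg'), ('cat', 'c.jpg')]:
        os.makedirs(os.path.join(root, class_name), exist_ok=True)
        open(os.path.join(root, class_name, file_name), 'w').close()
    class_index = build_class_index(root)
    labels = [label for _, label in list_labeled_files(root)]

print(class_index)
print(labels)
```

If a specific label order is required, the dictionary can be written out by hand instead of being derived from the folder names.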

Next, how to read the prediction set

def Get_images(directory): # function for image detection
    Images = []
    
    path = os.path.join(directory)
    
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img))
        img_array = cv2.resize(img_array, (150, 150))
        Images.append(img_array)
        
    return shuffle(Images, random_state=817328462)

pred_images = Get_images('')

pred_images = np.array(pred_images)
pred_images.shape
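
Shuffling the prediction images discards the link between each array and its file name. If predictions need to be traced back to files, one option is to shuffle the file names with a fixed seed and then load the images in that order. The following is a minimal sketch using the standard library `random` module (swapped in for the `shuffle` call above); the helper name `list_image_files` and the extension filter are assumptions.

```python
import os
import random
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def list_image_files(directory, seed=0):
    """Return image file names in a reproducible shuffled order."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    # A seeded Random instance gives the same order on every run
    random.Random(seed).shuffle(names)
    return names

# Demonstrate on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['1.jpg', '2.PNG', 'notes.txt', '3.jpeg']:
        open(os.path.join(tmp_dir, name), 'w').close()
    first = list_image_files(tmp_dir, seed=42)
    second = list_image_files(tmp_dir, seed=42)

print(first == second)
```

Each name can then be passed to `cv2.imread(os.path.join(directory, name))`. Since `cv2.imread` returns `None` for a file it cannot decode, checking for `None` before calling `cv2.resize` avoids a crash on stray files.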

csv data only

Reading csv data is straightforward: the call below is enough, and the result can be treated as a DataFrame for further processing

pd.read_csv('')
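
As an illustration of the follow-up processing, the sketch below splits a DataFrame into a feature matrix and a label vector. The csv content and the column names `feature_a`, `feature_b`, and `label` are made up; a real file path would replace the `io.StringIO` object.

```python
import io

import pandas as pd

# Made-up csv content standing in for a real file path
csv_text = "feature_a,feature_b,label\n1.0,2.0,0\n3.0,4.0,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Split the DataFrame into a feature matrix and a label vector
features = df.drop(columns=['label']).to_numpy()
labels = df['label'].to_numpy()

print(features.shape, labels.shape)
```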
