A very large, very detailed garbage classification dataset (classification only): 4 main categories, 345 subclasses, 147,913 images, all labeled, 12 GB in total

look me head

Published 2024-09-29 05:40:50

Article link: https://blog.csdn.net/2401_83580557/article/details/142625519

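This is a very large, very detailed garbage classification dataset that I collected and organized. It is a classification dataset: there are no detection boxes and no YOLO-format labels, so please stop asking. It has 4 main categories, 345 subclasses, and 147,913 images, all with class labels, 12 GB in total.

Kitchen waste: 76 subclasses, 35,058 images
Recyclables: 195 subclasses, 86,116 images
Other waste: 53 subclasses, 16,156 images
Hazardous waste: 18 subclasses, 10,583 images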

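Introduction to the dataset

Dataset overview

Total size: 12 GB
Number of images: 147,913
Number of classes:
- Total: 345 subclasses
- Kitchen waste: 76 subclasses
- Recyclables: 195 subclasses
- Other waste: 53 subclasses
- Hazardous waste: 18 subclasses
Task type: image classification
Labeling status: every image has a class label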

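Category breakdown

Kitchen waste (76 subclasses, 35,058 images): food scraps, fruit peels, vegetable leaves, and so on.
Recyclables (195 subclasses, 86,116 images): paper, plastic, metal, glass, and so on.
Other waste (53 subclasses, 16,156 images): disposable tableware, tissue paper, dust, and so on.
Hazardous waste (18 subclasses, 10,583 images): used batteries, used fluorescent tubes, expired medicine, and so on.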
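
As a quick sanity check on the figures above, the per-category counts can be summed. This sketch uses only the numbers stated in this post; the variable names are my own. The image counts add up to the stated 147,913, while the subclass counts add up to 342 rather than the stated 345, so three subclasses are not accounted for in the per-category breakdown.

```python
# (subclasses, images) per main category, copied from the figures in this post
counts = {
    "kitchen waste": (76, 35058),
    "recyclables": (195, 86116),
    "other waste": (53, 16156),
    "hazardous waste": (18, 10583),
}

total_subclasses = sum(sub for sub, _ in counts.values())
total_images = sum(img for _, img in counts.values())

print(total_subclasses, total_images)  # 342 147913

# Share of images per main category, useful for judging class imbalance
for name, (sub, img) in counts.items():
    print(f"{name}: {sub} subclasses, {img / total_images:.1%} of images")
```

Recyclables make up more than half of the images, so per-class accuracy is worth checking alongside overall accuracy.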

Dataset structure

Assume the dataset folder is organized as follows:

garbage_classification_dataset/
├── train/
│   ├── kitchen_waste/
│   │   ├── food_waste/
│   │   ├── fruit_peels/
│   │   ├── ...
│   ├── recyclable_waste/
│   │   ├── paper/
│   │   ├── plastic/
│   │   ├── ...
│   ├── other_waste/
│   │   ├── disposable_tableware/
│   │   ├── tissue_paper/
│   │   ├── ...
│   ├── hazardous_waste/
│   │   ├── batteries/
│   │   ├── fluorescent_tubes/
│   │   ├── ...
├── val/
│   ├── kitchen_waste/
│   ├── recyclable_waste/
│   ├── other_waste/
│   ├── hazardous_waste/
├── test/
│   ├── kitchen_waste/
│   ├── recyclable_waste/
│   ├── other_waste/
│   ├── hazardous_waste/
└── README.md

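The train/, val/, and test/ directories hold the training, validation, and test images respectively.
Each subdirectory corresponds to one specific class; for example, kitchen_waste/food_waste holds the food scraps subclass of kitchen waste.
README.md contains usage instructions and field descriptions for the dataset.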
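
Before training, it can help to check what is actually on disk. The sketch below walks one split directory of the two-level layout assumed above and counts the images in each subclass folder. The function name `count_images` and the list of file extensions are my own assumptions, not something the dataset prescribes.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the dataset actually contains
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(split_dir):
    """Return {"main_category/subclass": image_count} for one split directory."""
    counts = {}
    for category_dir in sorted(Path(split_dir).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files such as README.md
        for subclass_dir in sorted(category_dir.iterdir()):
            if not subclass_dir.is_dir():
                continue
            n = sum(
                1
                for f in subclass_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            counts[f"{category_dir.name}/{subclass_dir.name}"] = n
    return counts
```

Calling `count_images('/path/to/garbage_classification_dataset/train')` returns one entry per subclass folder, so `len(...)` of the result is the number of subclasses found. Keep in mind that Keras' `flow_from_directory` takes one class per immediate subdirectory of the directory it is given, so with this two-level layout it would see the 4 main categories rather than the subclasses.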

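Use cases

Garbage classification: automatically recognizing and sorting different types of garbage.
Environmental education: helping the public learn how to sort garbage correctly.
Smart trash bins: combining the classifier with IoT hardware for automated sorting and handling.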

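Keras training example

Below is an example of training a model with Keras. It uses a pretrained EfficientNetB0 as the backbone and adds fully connected layers on top for classification.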

import os

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.efficientnet import EfficientNetB0
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Dataset paths
data_dir = '/path/to/garbage_classification_dataset'
train_dir = os.path.join(data_dir, 'train')
val_dir = os.path.join(data_dir, 'val')

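# Training image generator with augmentation.
# No rescale here: Keras' EfficientNetB0 normalizes inputs inside the model
# and expects raw pixel values in the [0, 255] range.
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Validation image generator: no augmentation, so validation metrics
# are measured on unmodified images
val_datagen = ImageDataGenerator()

# Training set generator
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

# Validation set generator
val_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    shuffle=False
)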

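# Load EfficientNetB0 pretrained on ImageNet, without its classification head
base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Add a global average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)

# Add fully connected layers.
# flow_from_directory assigns one class per immediate subdirectory of train_dir,
# so the output size is read from the generator instead of being hard-coded to 345.
# With the two-level layout assumed above that is 4 (the main categories); to train
# on the subclasses, place one folder per subclass directly under train/.
x = Dense(1024, activation='relu')(x)
predictions = Dense(train_generator.num_classes, activation='softmax')(x)

# Build the final model
model = Model(inputs=base_model.input, outputs=predictions)

# Freeze the backbone layers so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False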

# Compile the model (the `lr` argument is deprecated; use `learning_rate`)
model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

# Print the model summary
model.summary()

# Set up callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
checkpoint = ModelCheckpoint('best_garbage_classification_model.h5', save_best_only=True, monitor='val_accuracy', mode='max')

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=20,
    validation_data=val_generator,
    validation_steps=len(val_generator),
    callbacks=[early_stopping, checkpoint]
)

# Save the model
model.save('garbage_classification_model.h5')

# Plot the training curves
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.title('Accuracy')

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Loss')

plt.show()

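Code notes

Data generators:
- ImageDataGenerator applies data augmentation, including rotation, shifting, shearing, zooming, and horizontal flipping.
- train_generator and val_generator are the generators for the training set and the validation set.
Model construction:
- A pretrained EfficientNetB0 serves as the backbone.
- A global average pooling layer and fully connected layers are added on top of it.
- The last layer uses a softmax activation and outputs a probability distribution over the classes (345 subclasses for the full dataset).
Model compilation:
- The optimizer is Adam with a learning rate of 0.0001.
- The loss is categorical cross-entropy and the evaluation metric is accuracy.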
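
To make the softmax output described above concrete, here is a minimal sketch of turning one probability vector into readable labels. It assumes a name-to-index dictionary shaped like the `class_indices` attribute that Keras' `flow_from_directory` generators expose. The function name `top_k_predictions` and the example values are made up for illustration.

```python
def top_k_predictions(probabilities, class_indices, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    # Invert the name -> index mapping so indices can be looked up by position
    index_to_name = {index: name for name, index in class_indices.items()}
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(index_to_name[i], float(p)) for i, p in ranked[:k]]


# Made-up values for illustration only
example_indices = {"hazardous_waste": 0, "kitchen_waste": 1, "other_waste": 2, "recyclable_waste": 3}
example_probs = [0.05, 0.70, 0.05, 0.20]

print(top_k_predictions(example_probs, example_indices, k=2))
# [('kitchen_waste', 0.7), ('recyclable_waste', 0.2)]
```

With a trained model, `probabilities` would be one row of the array returned by `model.predict(...)`, and `class_indices` would come from `train_generator.class_indices`.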