Smart construction site dataset, for segmentation or detection
"Primarily intended for segmenting or detecting production elements on construction sites"
The dataset contains ten common construction-object categories, grouped into three major classes:
Workers (two categories: with helmet and without helmet)
Machines (seven categories: PC transport truck, dump truck, concrete mixer truck, excavator, roller, bulldozer, and wheel loader)
Materials (one category: prefabricated components)
Scale: 50,000 JPG images with more than 83,000 annotated instances:
Each JPG image corresponds one-to-one with a JSON annotation file
labelme JSON annotations with polygon shapes, for semantic/instance segmentation
labelme JSON annotations with rectangle boxes, for object detection
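For reference, a single labelme annotation file looks roughly like the following. This is a minimal sketch: the field names follow the labelme format, but the file name and coordinates are made up for illustration.

```python
import json

# Minimal labelme-style annotation (hypothetical file name and coordinates).
# A polygon shape supports segmentation; a rectangle shape, stored as two
# opposite corner points, supports detection.
annotation = {
    "version": "5.0.1",
    "imagePath": "0001.jpg",
    "imageHeight": 1080,
    "imageWidth": 1920,
    "shapes": [
        {
            "label": "WorkerWithHelmet",
            "shape_type": "polygon",
            "points": [[100, 200], [150, 180], [170, 260], [110, 280]],
        },
        {
            "label": "DumpTruck",
            "shape_type": "rectangle",
            "points": [[400, 300], [900, 700]],  # two opposite corners
        },
    ],
}

text = json.dumps(annotation, indent=2)
```

Both shape types can appear in the same `shapes` list, which is why the conversion scripts later filter on `shape_type`.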
The smart construction site dataset contains a large number of construction site images annotated for segmentation and detection tasks. It covers 10 common construction-object categories grouped into three major classes: workers, machines, and materials. Each image is paired with a JSON label file containing polygon shapes (for semantic/instance segmentation) and rectangle boxes (for object detection). The steps below walk through processing this dataset and using it to train models.
1. Environment Setup
First, make sure the necessary libraries and tools are installed. You can install them with the following commands:
pip install torch torchvision
pip install numpy
pip install pandas
pip install matplotlib
pip install opencv-python
pip install pyyaml
pip install segmentation_models_pytorch
pip install albumentations
pip install labelme
pip install ultralytics  # provides the yolo CLI used for detection later
2. Dataset Preparation
Assume your dataset directory is structured as follows:
smart_construction_dataset/
├── images/
│ ├── train/
│ ├── val/
│ └── test/
├── labels/
│ ├── train/
│ ├── val/
│ └── test/
└── config.yaml
Each image file and its corresponding label file share the same base name, e.g. 0001.jpg and 0001.json.
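Before converting anything, it is worth verifying that every image actually has a matching annotation and vice versa. A small sketch, assuming the directory layout above (the function name is my own):

```python
import os

def find_unpaired(image_dir, label_dir):
    """Return image base names missing a .json label, and label base names
    missing a .jpg image."""
    images = {f[:-4] for f in os.listdir(image_dir) if f.endswith('.jpg')}
    labels = {f[:-5] for f in os.listdir(label_dir) if f.endswith('.json')}
    return sorted(images - labels), sorted(labels - images)
```

Run it on each split (e.g. `find_unpaired('smart_construction_dataset/images/train', 'smart_construction_dataset/labels/train')`) and fix any mismatches before conversion, since the scripts below assume perfect pairing.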
3. JSON-to-PNG Conversion (Semantic Segmentation)
Most semantic segmentation models expect labels as PNG mask images, so the JSON labels need to be converted to PNG. The following Python script performs the conversion:
import json
import os

import cv2
import numpy as np


def convert_json_to_mask(json_path, image_size, class_map):
    """Rasterize labelme polygons into a single-channel class-ID mask.

    image_size is (width, height), matching how it is computed below.
    """
    with open(json_path, 'r') as f:
        data = json.load(f)
    # np.zeros takes (rows, cols) = (height, width), so swap the tuple
    mask = np.zeros((image_size[1], image_size[0]), dtype=np.uint8)
    for shape in data['shapes']:
        points = np.array(shape['points'], dtype=np.int32)
        label = shape['label']
        if label in class_map:
            class_id = class_map[label]
            cv2.fillPoly(mask, [points], class_id)
    return mask
# Class-name-to-ID mapping (0 is reserved for background)
class_map = {
'WorkerWithHelmet': 1,
'WorkerWithoutHelmet': 2,
'PCTransportTruck': 3,
'DumpTruck': 4,
'ConcreteMixerTruck': 5,
'Excavator': 6,
'Roller': 7,
'Bulldozer': 8,
'WheelLoader': 9,
'PrefabricatedComponent': 10
}
# Convert the train, val, and test splits
for split in ['train', 'val', 'test']:
    image_dir = f'smart_construction_dataset/images/{split}'
    label_dir = f'smart_construction_dataset/labels/{split}'
    output_dir = f'smart_construction_dataset/masks/{split}'
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(label_dir):
        if not filename.endswith('.json'):
            continue
        json_path = os.path.join(label_dir, filename)
        image_path = os.path.join(image_dir, filename.replace('.json', '.jpg'))
        mask_path = os.path.join(output_dir, filename.replace('.json', '.png'))
        # Read the image to get its size as (width, height)
        image = cv2.imread(image_path)
        image_size = (image.shape[1], image.shape[0])
        # Generate and save the mask
        mask = convert_json_to_mask(json_path, image_size, class_map)
        cv2.imwrite(mask_path, mask)
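After conversion, a quick class-frequency count across the annotation files helps spot class imbalance before training. A stdlib-only sketch (the function name is my own; it reads the same labelme JSON files as the script above):

```python
import json
import os
from collections import Counter

def label_histogram(label_dir):
    """Count annotated instances per class label across all labelme JSON
    files in a directory."""
    counts = Counter()
    for fn in os.listdir(label_dir):
        if not fn.endswith('.json'):
            continue
        with open(os.path.join(label_dir, fn)) as f:
            for shape in json.load(f)['shapes']:
                counts[shape['label']] += 1
    return counts
```

With 83,000+ instances across ten classes, rare classes (e.g. a single material category) may warrant class-weighted loss or oversampling.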
4. Create a Configuration File
Create a config.yaml file with the following content:
train_images: smart_construction_dataset/images/train
train_masks: smart_construction_dataset/masks/train
val_images: smart_construction_dataset/images/val
val_masks: smart_construction_dataset/masks/val
test_images: smart_construction_dataset/images/test
test_masks: smart_construction_dataset/masks/test
nc: 11
names: ['Background', 'WorkerWithHelmet', 'WorkerWithoutHelmet', 'PCTransportTruck', 'DumpTruck', 'ConcreteMixerTruck', 'Excavator', 'Roller', 'Bulldozer', 'WheelLoader', 'PrefabricatedComponent']
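A quick way to catch configuration mistakes early is to load the file and check that nc matches the length of the name list. A sketch (the YAML is inlined as a string so the snippet runs standalone; in practice you would `open('config.yaml')`):

```python
import yaml

# Abbreviated copy of the config above, inlined for a standalone check
config_text = """
train_images: smart_construction_dataset/images/train
train_masks: smart_construction_dataset/masks/train
nc: 11
names: ['Background', 'WorkerWithHelmet', 'WorkerWithoutHelmet', 'PCTransportTruck', 'DumpTruck', 'ConcreteMixerTruck', 'Excavator', 'Roller', 'Bulldozer', 'WheelLoader', 'PrefabricatedComponent']
"""
config = yaml.safe_load(config_text)

# The class count must match the name list, or training/evaluation
# code that indexes names by class ID will silently misbehave
assert config['nc'] == len(config['names'])
```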
5. Data Loader
Create a custom dataset class to read images and masks.
import os

import albumentations as A
import cv2
import numpy as np
from albumentations.pytorch import ToTensorV2
from torch.utils.data import DataLoader, Dataset


class SmartConstructionDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.transform = transform
        # Only keep image files; ignore anything else in the directory
        self.images = [f for f in os.listdir(image_dir) if f.endswith('.jpg')]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.image_dir, self.images[idx])
        mask_path = os.path.join(self.mask_dir, self.images[idx].replace('.jpg', '.png'))
        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        if self.transform is not None:
            transformed = self.transform(image=image, mask=mask)
            image = transformed['image']
            mask = transformed['mask']
        return image, mask
# Data augmentation
transform = A.Compose([
    A.Resize(256, 256),  # adjust to your target resolution
    A.Rotate(limit=35, p=1.0),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Normalize(
        mean=[0.0, 0.0, 0.0],
        std=[1.0, 1.0, 1.0],
        max_pixel_value=255.0,
    ),
    ToTensorV2(),
])
# Data loaders
train_image_dir = 'smart_construction_dataset/images/train'
train_mask_dir = 'smart_construction_dataset/masks/train'
val_image_dir = 'smart_construction_dataset/images/val'
val_mask_dir = 'smart_construction_dataset/masks/val'

train_dataset = SmartConstructionDataset(train_image_dir, train_mask_dir, transform=transform)
val_dataset = SmartConstructionDataset(val_image_dir, val_mask_dir, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False, num_workers=4)
6. Model Selection and Training
We can train a pretrained model from the segmentation_models_pytorch library. Here we use the Unet architecture as an example.
import torch
import torch.nn as nn
import torch.optim as optim
from segmentation_models_pytorch import Unet
from torch.utils.tensorboard import SummaryWriter

# Hyperparameters
batch_size = 16
num_epochs = 50
learning_rate = 0.001
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Model: 10 object classes plus background
model = Unet(
    encoder_name='resnet34',
    encoder_weights='imagenet',
    in_channels=3,
    classes=11
).to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# TensorBoard
writer = SummaryWriter('runs/smart_construction_segmentation')
# Training loop
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (images, masks) in enumerate(train_loader):
        images = images.to(device)
        # CrossEntropyLoss expects integer (long) class indices
        masks = masks.to(device).long()
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, masks)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)
    writer.add_scalar('Training Loss', avg_train_loss, epoch)

    # Validation
    model.eval()
    with torch.no_grad():
        running_val_loss = 0.0
        for images, masks in val_loader:
            images = images.to(device)
            masks = masks.to(device).long()
            outputs = model(images)
            loss = criterion(outputs, masks)
            running_val_loss += loss.item()
        avg_val_loss = running_val_loss / len(val_loader)
        writer.add_scalar('Validation Loss', avg_val_loss, epoch)
    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}')

# Save the model
torch.save(model.state_dict(), 'unet_smart_construction.pth')
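Validation loss alone says little about segmentation quality; per-class intersection-over-union (IoU) is the standard metric. A minimal pure-Python sketch of the computation on flattened label sequences (in practice you would accumulate these counts over all validation pixels, e.g. from the argmax of the model output and the ground-truth mask):

```python
def per_class_iou(gt, pred, num_classes):
    """Per-class intersection-over-union from flattened integer label lists.

    Returns NaN for classes absent from both gt and pred.
    """
    ious = {}
    for c in range(num_classes):
        inter = sum(1 for g, p in zip(gt, pred) if g == c and p == c)
        union = sum(1 for g, p in zip(gt, pred) if g == c or p == c)
        ious[c] = inter / union if union else float('nan')
    return ious

# Tiny worked example: classes 1 and 2 each have IoU 0.5 here
ious = per_class_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```

Averaging the per-class values (ignoring NaNs) gives mIoU, which is more informative than pixel accuracy when classes are imbalanced, as they typically are on construction sites.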
7. Visualizing Predictions
Use the following code to visualize the model's predictions:
import matplotlib.pyplot as plt

# Load the trained weights
model.load_state_dict(torch.load('unet_smart_construction.pth'))
model.eval()

# Read an image
image_path = 'smart_construction_dataset/images/test/0001.jpg'
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Preprocess: same resize and 0-1 scaling as used in training
image = cv2.resize(image, (256, 256))
image = image.astype(np.float32) / 255.0
image = np.transpose(image, (2, 0, 1))
image = torch.tensor(image, dtype=torch.float32).unsqueeze(0).to(device)

# Predict
with torch.no_grad():
    output = model(image)
    pred = torch.argmax(output, dim=1).squeeze().cpu().numpy()

# Display the results
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB))
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(pred, cmap='viridis')
plt.title('Predicted Segmentation')
plt.axis('off')
plt.show()
8. Object Detection
For the object detection task, we can use YOLOv8. First, the rectangle annotations in the JSON files need to be converted to YOLO format.
import json
import os

import cv2


def convert_json_to_yolo(json_path, image_size, class_map, output_path):
    """Write labelme rectangle annotations as YOLO txt lines.

    image_size is (width, height), matching how it is computed below.
    """
    with open(json_path, 'r') as f:
        data = json.load(f)
    with open(output_path, 'w') as f:
        for shape in data['shapes']:
            if shape['shape_type'] == 'rectangle':
                label = shape['label']
                if label in class_map:
                    class_id = class_map[label]
                    (x1, y1), (x2, y2) = shape['points']
                    # labelme does not guarantee corner order, so sort explicitly
                    x_min, x_max = min(x1, x2), max(x1, x2)
                    y_min, y_max = min(y1, y2), max(y1, y2)
                    center_x = (x_min + x_max) / 2.0 / image_size[0]
                    center_y = (y_min + y_max) / 2.0 / image_size[1]
                    width = (x_max - x_min) / image_size[0]
                    height = (y_max - y_min) / image_size[1]
                    f.write(f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}\n")
# Class-name-to-ID mapping (YOLO class IDs start at 0; no background class)
class_map = {
'WorkerWithHelmet': 0,
'WorkerWithoutHelmet': 1,
'PCTransportTruck': 2,
'DumpTruck': 3,
'ConcreteMixerTruck': 4,
'Excavator': 5,
'Roller': 6,
'Bulldozer': 7,
'WheelLoader': 8,
'PrefabricatedComponent': 9
}
# Convert the train, val, and test splits
for split in ['train', 'val', 'test']:
    image_dir = f'smart_construction_dataset/images/{split}'
    label_dir = f'smart_construction_dataset/labels/{split}'
    output_dir = f'smart_construction_dataset/labels_yolo/{split}'
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(label_dir):
        if not filename.endswith('.json'):
            continue
        json_path = os.path.join(label_dir, filename)
        image_path = os.path.join(image_dir, filename.replace('.json', '.jpg'))
        yolo_label_path = os.path.join(output_dir, filename.replace('.json', '.txt'))
        # Read the image to get its size as (width, height)
        image = cv2.imread(image_path)
        image_size = (image.shape[1], image.shape[0])
        # Write the YOLO-format label file
        convert_json_to_yolo(json_path, image_size, class_map, yolo_label_path)
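The normalization performed by convert_json_to_yolo can be checked by hand. As a worked example, for a hypothetical 1920x1080 image and a box with corners (400, 300) and (900, 700), YOLO format stores the box center and size, each divided by the image dimension:

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Corner coordinates -> YOLO's normalized (center_x, center_y, w, h)."""
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

# Center is (650, 500) in pixels, box is 500x400 pixels,
# so normalized values are 650/1920, 500/1080, 500/1920, 400/1080
cx, cy, w, h = to_yolo(400, 300, 900, 700, 1920, 1080)
```

All four values must land in [0, 1]; values outside that range after conversion usually indicate a corner-order or image-size mix-up.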
9. Training YOLOv8
Create a YOLOv8 configuration file smart_construction.yaml:
train: smart_construction_dataset/images/train
val: smart_construction_dataset/images/val
test: smart_construction_dataset/images/test
nc: 10
names: ['WorkerWithHelmet', 'WorkerWithoutHelmet', 'PCTransportTruck', 'DumpTruck', 'ConcreteMixerTruck', 'Excavator', 'Roller', 'Bulldozer', 'WheelLoader', 'PrefabricatedComponent']
Note that Ultralytics locates label files by substituting "images" with "labels" in each image path, so the generated .txt files must end up under smart_construction_dataset/labels/{split} (move the labelme JSON files elsewhere, or point the conversion output there). Then train with YOLOv8:
yolo task=detect mode=train model=yolov8n.yaml data=smart_construction.yaml epochs=100 imgsz=640 batch=16
10. Summary
With the steps above, you can perform semantic segmentation on the smart construction site dataset with a Unet model and object detection with YOLOv8.
11. Additional Suggestions
- Multi-scale training: consider multi-scale training to improve model robustness.
- Transfer learning: if some categories in the dataset resemble classes in existing public datasets, consider pretraining on those datasets and then fine-tuning on yours.
- Model compression: if the final model must be deployed to resource-constrained devices, consider compression techniques such as pruning and quantization to reduce model size and compute cost.