Model pruning is an optimization technique that removes unnecessary parameters from a neural network, reducing model complexity and computational load and thereby improving efficiency.
The pruning pipeline has three stages: constrained training, pruning (prune), and fine-tuning (finetune).
This post records my full YOLOv8 pruning workflow, based mainly on the reference article "YOLOv8剪枝全过程".
I. Constrained training
1. Parameter settings
Set amp=False in ./ultralytics/cfg/default.yaml.
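For reference, the relevant entry in default.yaml then looks like this (excerpt only; all other keys are left unchanged):

```yaml
# ./ultralytics/cfg/default.yaml (excerpt)
amp: False  # disable automatic mixed precision during constrained training
```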
2. Sparsity training
Approach: add L1 regularization to the BN layers' scale factors (gamma).
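The intuition can be seen with a toy example (plain Python with hypothetical numbers, not the trainer code): adding l1_lambda * sign(w) to the gradient subtracts a constant-magnitude term at every step, so the scale factor settles closer to zero than where the data loss alone would put it. Channels whose gamma is driven near zero contribute little and can be pruned.

```python
def train_gamma(l1_lambda, steps=1000, lr=0.1):
    """Gradient descent on a toy quadratic loss 0.5 * (w - 0.5)**2,
    optionally adding the L1 subgradient as in the BN-layer patch."""
    w = 0.5
    for _ in range(steps):
        grad = w - 0.5  # data-loss gradient, pulls w toward 0.5
        grad += l1_lambda * (1 if w > 0 else -1 if w < 0 else 0)  # L1 term
        w -= lr * grad
    return w

print(train_gamma(0.0))   # without L1: stays at the data optimum 0.5
print(train_gamma(0.05))  # with L1: shrinks toward 0.45
```

The same shrinkage applied across thousands of BN channels is what makes a fraction of the gammas small enough to cut in the pruning step.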
Steps: add the following to ./ultralytics/engine/trainer.py (this assumes nn refers to torch.nn; add "from torch import nn" if trainer.py does not already import it):
# Backward
self.scaler.scale(self.loss).backward()

# ========== added ==========
# 1 constrained training: L1 subgradient on BN scale factors,
#   with a coefficient that decays over the course of training
l1_lambda = 1e-2 * (1 - 0.9 * epoch / self.epochs)
for k, m in self.model.named_modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.grad.data.add_(l1_lambda * torch.sign(m.weight.data))
        m.bias.grad.data.add_(1e-2 * torch.sign(m.bias.data))
# ========== added ==========

# Optimize - https://pytorch.org/docs/master/notes/amp_examples.html
if ni - last_opt_step >= self.accumulate:
    self.optimizer_step()
    last_opt_step = ni
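Note that the coefficient on the BN weights is annealed: it starts at 1e-2 and decays linearly toward 1e-3 as training ends, so the sparsification pressure eases once the network has converged. A quick check of the schedule (assuming epochs=300 as in the training command below):

```python
def l1_coeff(epoch, epochs):
    # same formula as in the trainer patch above
    return 1e-2 * (1 - 0.9 * epoch / epochs)

for epoch in (0, 150, 300):
    print(epoch, l1_coeff(epoch, 300))  # 1e-2 at the start, ~1e-3 at the end
```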
Then start training (/yolov8/train.py):
from ultralytics import YOLO
model = YOLO('yolov8n.yaml')
results = model.train(data='./data/data_nc5/data_nc5.yaml', batch=8, epochs=300, save=True)
II. Pruning (prune)
This step prunes the model ./runs/detect/train2/weights/last.pt obtained from constrained training. Create a new file prune.py under /yolov8/ with the following contents:
from ultralytics import YOLO
import torch
from ultralytics.nn.modules import Bottleneck, Conv, C2f, SPPF, Detect

# Load the model from constrained training
yolo = YOLO("./runs/detect/train2/weights/last.pt")
model = yolo.model

# Collect the absolute BN scale factors (gamma) and biases (beta)
ws = []
bs = []
for name, m in model.named_modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        w = m.weight.abs().detach()
        b = m.bias.abs().detach()
        ws.append(w)
        bs.append(b)
        # print(name, w.max().item(), w.min().item(), b.max().item(), b.min().item())

# keep ratio: channels whose gamma falls below this global threshold get pruned
factor = 0.8
ws = torch.cat(ws)
threshold = torch.sort(ws, descending=True)[0][int(len(ws) * factor)]
print(threshold)


def prune_conv(conv1: Conv, conv2: Conv):
    """Prune conv1's output channels and the matching input channels of conv2."""
    gamma = conv1.bn.weight.data.detach()
    beta = conv1.bn.bias.data.detach()
    keep_idxs = []
    local_threshold = threshold
    while len(keep_idxs) < 8:  # always keep at least 8 channels per layer
        keep_idxs = torch.where(gamma.abs() >= local_threshold)[0]
        local_threshold = local_threshold * 0.5
    n = len(keep_idxs)
    # n = max(int(len(idxs) * 0.8), p)
    # print(n / len(gamma) * 100)
    # scale = len(idxs) / n
    conv1.bn.weight.data = gamma[keep_idxs]
    conv1.bn.bias.data = beta[keep_idxs]
    conv1.bn.running_var.data = conv1.bn.running_var.data[keep_idxs]
    conv1.bn.running_mean.data = conv1.bn.running_mean.data[keep_idxs]
    conv1.bn.num_features = n
    conv1.conv.weight.data = conv1.conv.weight.data[keep_idxs]
    conv1.conv.out_channels = n
    if conv1.conv.bias is not None:
        conv1.conv.bias.data = conv1.conv.bias.data[keep_idxs]

    if not isinstance(conv2, list):
        conv2 = [conv2]
    for item in conv2:
        if item is not None:
            conv = item.conv if isinstance(item, Conv) else item
            conv.in_channels = n
            conv.weight.data = conv.weight.data[:, keep_idxs]


def prune(m1, m2):
    if isinstance(m1, C2f):  # C2f as a top conv
        m1 = m1.cv2
    if not isinstance(m2, list):  # m2 is just one module
        m2 = [m2]
    for i, item in enumerate(m2):
        if isinstance(item, (C2f, SPPF)):
            m2[i] = item.cv1
    prune_conv(m1, m2)


# 1) prune inside every Bottleneck
for name, m in model.named_modules():
    if isinstance(m, Bottleneck):
        prune_conv(m.cv1, m.cv2)

# 2) prune the sequential stages
seq = model.model
for i in range(3, 9):
    if i in [6, 4, 9]:
        continue
    prune(seq[i], seq[i + 1])

# 3) prune the layers feeding the Detect head
detect: Detect = seq[-1]
last_inputs = [seq[15], seq[18], seq[21]]
colasts = [seq[16], seq[19], None]
for last_input, colast, cv2, cv3 in zip(last_inputs, colasts, detect.cv2, detect.cv3):
    prune(last_input, [colast, cv2[0], cv3[0]])
    prune(cv2[0], cv2[1])
    prune(cv2[1], cv2[2])
    prune(cv3[0], cv3[1])
    prune(cv3[1], cv3[2])

for name, p in yolo.model.named_parameters():
    p.requires_grad = True

yolo.val()  # validate the pruned model (or yolo.val(workers=0))
yolo.export(format="onnx")  # export to ONNX
# yolo.train(data="./data/data_nc5/data_nc5.yaml", epochs=100)  # fine-tune right after pruning
torch.save(yolo.ckpt, "./runs/detect/train2/weights/prune.pt")
print("done")
Here factor=0.8 is the keep ratio: the smaller factor is, the more channels are pruned. Pruning too aggressively is generally not recommended.
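How the global threshold follows from factor can be sketched in plain Python (hypothetical gamma magnitudes; prune.py does the same with torch.sort over all BN weights at once):

```python
ws = [0.9, 0.05, 0.7, 0.02, 0.4, 0.6, 0.8, 0.1, 0.3, 0.55]  # |gamma| values
factor = 0.8                                                # keep ratio
threshold = sorted(ws, reverse=True)[int(len(ws) * factor)]
kept = [w for w in ws if w >= threshold]
print(threshold, len(kept), "of", len(ws))
```

Because the comparison is inclusive (>=), slightly more than a factor fraction survives (here 9 of 10), and prune_conv additionally halves the threshold per layer until at least 8 channels remain.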
Run prune.py to obtain the pruned model prune.pt, saved under ./runs/detect/train2/weights/. The same folder also contains last.onnx; the ONNX file is smaller than before pruning, and its structure (inspect it with any ONNX model viewer) differs slightly from the pre-pruning export.
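The size drop is easy to estimate (back-of-envelope, with a hypothetical layer shape): pruning a channel removes a full slice of the conv weight tensor, and when both the producing and consuming layers lose about 20% of their channels, a 3x3 conv between them loses roughly 36% of its weights.

```python
def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k  # weight count, ignoring bias

before = conv_params(256, 256)
after = conv_params(int(256 * 0.8), int(256 * 0.8))
print(before, after, f"{1 - after / before:.1%} fewer weights")
```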
III. Fine-tuning (finetune)
1. Code changes
First, comment out the L1 regularization block previously added to ./ultralytics/engine/trainer.py:
# Backward
self.scaler.scale(self.loss).backward()

# # ========== added ==========
# # 1 constrained training
# l1_lambda = 1e-2 * (1 - 0.9 * epoch / self.epochs)
# for k, m in self.model.named_modules():
#     if isinstance(m, nn.BatchNorm2d):
#         m.weight.grad.data.add_(l1_lambda * torch.sign(m.weight.data))
#         m.bias.grad.data.add_(1e-2 * torch.sign(m.bias.data))
# # ========== added ==========

# Optimize - https://pytorch.org/docs/master/notes/amp_examples.html
if ni - last_opt_step >= self.accumulate:
    self.optimizer_step()
    last_opt_step = ni
Then, in setup_model (around line 543 of the same file), add the line "self.model = weights":
def setup_model(self):
    """Load/create/download model for any task."""
    if isinstance(self.model, torch.nn.Module):  # if model is loaded beforehand. No setup needed
        return

    model, weights = self.model, None
    ckpt = None
    if str(model).endswith(".pt"):
        weights, ckpt = attempt_load_one_weight(model)
        cfg = weights.yaml
    else:
        cfg = model
    self.model = self.get_model(cfg=cfg, weights=weights, verbose=RANK == -1)  # calls Model(cfg, weights)
    # ========== added ==========
    # 2 finetune: keep the loaded (pruned) model instead of the one rebuilt
    # from cfg; this assumes training is launched from a .pt checkpoint,
    # so weights is not None
    self.model = weights
    # ========== added ==========
    return ckpt
2. Retraining
Using the pruned model prune.pt, launch training again (/yolov8/train.py):
from ultralytics import YOLO
model = YOLO('./runs/detect/train2/weights/prune.pt')
results = model.train(data='./data/data_nc5/data_nc5.yaml', batch=8, epochs=100, save=True)
Note that model is now "prune.pt" rather than the original "yolov8n.yaml".
The retrained model is saved under ./runs/detect/train3/weights/. From there you can proceed with inference and deployment as needed.