Study notes.
YOLOX: https://github.com/Megvii-BaseDetection/YOLOX
For a detailed walkthrough, see: 深入浅出Yolo系列之Yolox核心基础完整讲解 (on 技术圈),
or the author's WeChat official account and personal website.
Contents
YOLOX-Darknet53 network structure
backbone
The corresponding source file is yolo_fpn.py, class YOLOFPN.
The backbone consists of three kinds of modules: CBL, Resx, and SPP. Each is introduced below together with its code.
CBL module
Short for Conv + BN + LeakyReLU, implemented in the class BaseConv. With stride=1 it leaves the feature map's H and W unchanged, thanks to the padding below.
pad = (ksize - 1) // 2
self.conv = nn.Conv2d(
    in_channels,
    out_channels,
    kernel_size=ksize,
    stride=stride,
    padding=pad,
    groups=groups,
    bias=bias,
)
self.bn = nn.BatchNorm2d(out_channels)
self.act = get_activation(act, inplace=True)
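The snippet above can be turned into a small runnable sketch. This is a hedged reconstruction, not the full source: the real BaseConv picks its activation via get_activation, while here LeakyReLU is hard-coded for self-containment.

```python
import torch
import torch.nn as nn

class BaseConv(nn.Module):
    """CBL sketch: Conv2d -> BatchNorm2d -> LeakyReLU, with "same" padding."""
    def __init__(self, in_channels, out_channels, ksize, stride, groups=1, bias=False):
        super().__init__()
        pad = (ksize - 1) // 2  # keeps H, W unchanged when stride == 1
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=ksize,
                              stride=stride, padding=pad, groups=groups, bias=bias)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # assumption: lrelu variant

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 64, 64)
y = BaseConv(3, 32, ksize=3, stride=1)(x)  # H, W preserved
z = BaseConv(3, 32, ksize=3, stride=2)(x)  # H, W halved
```

With stride=1 the output stays 64x64; with stride=2 it drops to 32x32, which is how the backbone downsamples between stages.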
ResUnit: a single residual block
A typical hourglass-shaped (bottleneck) residual block; the channel count goes 2c -> c -> 2c:
- first a 1x1 convolution halves the channels;
- then a 3x3 BaseConv restores the channel count, giving an intermediate result;
- finally that result is added to the input.
Implemented in the class ResLayer; it changes none of C, H, or W of the feature map.
def __init__(self, in_channels: int):
    super().__init__()
    mid_channels = in_channels // 2
    self.layer1 = BaseConv(
        in_channels, mid_channels, ksize=1, stride=1, act="lrelu"
    )
    self.layer2 = BaseConv(
        mid_channels, in_channels, ksize=3, stride=1, act="lrelu"
    )

def forward(self, x):
    out = self.layer2(self.layer1(x))
    return x + out
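A quick shape check confirms the block is shape-preserving. The base_conv helper below is a compact stand-in for YOLOX's BaseConv (an assumption for self-containment, not the repo's code):

```python
import torch
import torch.nn as nn

def base_conv(c_in, c_out, ksize, stride=1):
    # Stand-in for YOLOX's BaseConv: Conv + BN + LeakyReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, ksize, stride, (ksize - 1) // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResLayer(nn.Module):
    # 1x1 conv halves channels, 3x3 conv restores them, then skip-add.
    def __init__(self, in_channels):
        super().__init__()
        mid = in_channels // 2
        self.layer1 = base_conv(in_channels, mid, ksize=1)
        self.layer2 = base_conv(mid, in_channels, ksize=3)

    def forward(self, x):
        return x + self.layer2(self.layer1(x))

x = torch.randn(2, 64, 32, 32)
out = ResLayer(64)(x)  # same C, H, W as the input
```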
Resx module
Composed of multiple residual blocks; the block counts per stage are 1, 2, 8, 8, 4. Any of these five stages can serve as an output layer; here the last three are used as outputs.
In the implementation, Darknet53 is split into five parts, as follows.
def forward(self, x):
    outputs = {}
    x = self.stem(x)
    outputs["stem"] = x
    x = self.dark2(x)
    outputs["dark2"] = x
    x = self.dark3(x)
    outputs["dark3"] = x
    x = self.dark4(x)
    outputs["dark4"] = x
    x = self.dark5(x)
    outputs["dark5"] = x
    # keep only the requested stages (the last three by default)
    return {k: v for k, v in outputs.items() if k in self.out_features}
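The filtering in the last line can be illustrated on its own. The out_features tuple below is an assumption matching the default described above (the last three stages); the string values stand in for feature-map tensors:

```python
# Stand-in outputs dict; values would be feature-map tensors in practice.
outputs = {"stem": "s", "dark2": "d2", "dark3": "d3", "dark4": "d4", "dark5": "d5"}
out_features = ("dark3", "dark4", "dark5")  # assumed default output names

# Dict comprehension keeps insertion order, so the stages stay in network order.
kept = {k: v for k, v in outputs.items() if k in out_features}
```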
(1) self.stem
That is, the first two parts in the backbone diagram: CBL and Res1.
self.stem = nn.Sequential(
    BaseConv(in_channels, stem_out_channels, ksize=3, stride=1, act="lrelu"),  # Darknet53's first CBL
    *self.make_group_layer(stem_out_channels, num_blocks=1, stride=2),  # CBL + Res1
)

def make_group_layer(self, in_channels: int, num_blocks: int, stride: int = 1):
    "starts with conv layer then has `num_blocks` `ResLayer`"
    return [
        BaseConv(in_channels, in_channels * 2, ksize=3, stride=stride, act="lrelu"),
        *[ResLayer(in_channels * 2) for _ in range(num_blocks)],
    ]
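Putting the two pieces together gives a runnable stem sketch. stem_out_channels = 32 and the 416x416 input size are assumptions (common Darknet53 defaults); base_conv and ResLayer are compact stand-ins for the repo's classes:

```python
import torch
import torch.nn as nn

def base_conv(c_in, c_out, ksize, stride=1):
    # Stand-in for BaseConv: Conv + BN + LeakyReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, ksize, stride, (ksize - 1) // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResLayer(nn.Module):
    # Bottleneck residual block: c -> c//2 -> c, plus skip connection.
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(base_conv(c, c // 2, 1), base_conv(c // 2, c, 3))
    def forward(self, x):
        return x + self.body(x)

def make_group_layer(in_channels, num_blocks, stride=1):
    # Downsampling conv that doubles channels, then num_blocks ResLayers.
    return [base_conv(in_channels, in_channels * 2, 3, stride),
            *[ResLayer(in_channels * 2) for _ in range(num_blocks)]]

stem = nn.Sequential(
    base_conv(3, 32, ksize=3, stride=1),           # first CBL
    *make_group_layer(32, num_blocks=1, stride=2)  # CBL + Res1
)
out = stem(torch.randn(1, 3, 416, 416))
```

The stem maps 3 channels to 64 and halves the spatial size once (416 -> 208), exactly the stride-2 conv inside the group layer.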
(2) self.dark2, self.dark3, self.dark4
These are all similar: a BaseConv first, followed by one or more ResLayer blocks.
They correspond to the backbone diagram above as follows:
self.dark2: Res2
self.dark3: Res8
self.dark4: Res8
in_channels = stem_out_channels * 2  # 64
self.dark2 = nn.Sequential(
    *self.make_group_layer(in_channels, num_blocks[0], stride=2)
)
in_channels *= 2  # 128
self.dark3 = nn.Sequential(
    *self.make_group_layer(in_channels, num_blocks[1], stride=2)
)
in_channels *= 2  # 256
self.dark4 = nn.Sequential(
    *self.make_group_layer(in_channels, num_blocks[2], stride=2)
)
in_channels *= 2  # 512
self.dark5 = nn.Sequential(
    *self.make_group_layer(in_channels, num_blocks[3], stride=2),
    *self.make_spp_block([in_channels, in_channels * 2], in_channels * 2),
)
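The in_channels *= 2 pattern above makes the channel and stride bookkeeping easy to trace. The sketch below assumes stem_out_channels = 32 (the Darknet53 default) and that each group layer halves H, W while doubling channels, as shown in the constructor:

```python
stem_out_channels = 32               # assumed default
in_channels = stem_out_channels * 2  # 64: channels entering dark2
stride = 2                           # the stem already downsampled once

stage_out = {}
for name in ("dark2", "dark3", "dark4", "dark5"):
    stride *= 2                                  # each group layer downsamples by 2
    stage_out[name] = (in_channels * 2, stride)  # and doubles the channels
    in_channels *= 2

# dark5's trailing SPP block halves the channels again: 1024 -> 512
c, s = stage_out["dark5"]
stage_out["dark5"] = (c // 2, s)
```

This yields (channels, overall stride) of (128, 4), (256, 8), (512, 16), and (512, 32) for dark2 through dark5, matching the FPN inputs at strides 8, 16, and 32.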
(3) self.dark5
self.dark5: Res4, followed by 2×CBL + SPP + 2×CBL.
def make_spp_block(self, filters_list, in_filters):
    m = nn.Sequential(
        *[
            # 2 x CBL
            BaseConv(in_filters, filters_list[0], 1, stride=1, act="lrelu"),
            BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),
            SPPBottleneck(
                in_channels=filters_list[1],
                out_channels=filters_list[0],
                activation="lrelu",
            ),
            # 2 x CBL
            BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),
            BaseConv(filters_list[1], filters_list[0], 1, stride=1, act="lrelu"),
        ]
    )
    return m
SPP is implemented here as max pooling at three kernel sizes (5, 9, 13); the core op is nn.MaxPool2d.
Note this differs from the original SPP-Net, which pools the feature map into a fixed number of spatial bins at several scales and concatenates them into a fixed-length vector; here the pools are stride-1 with "same" padding, so each branch keeps the feature map's H and W.
The SPP code is as follows.
def __init__(
    self, in_channels, out_channels, kernel_sizes=(5, 9, 13), activation="silu"
):
    super().__init__()
    hidden_channels = in_channels // 2
    self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=activation)
    self.m = nn.ModuleList(
        [
            nn.MaxPool2d(kernel_size=ks, stride=1, padding=ks // 2)
            for ks in kernel_sizes
        ]
    )
    conv2_channels = hidden_channels * (len(kernel_sizes) + 1)  # three pooling branches plus the main branch
    self.conv2 = BaseConv(conv2_channels, out_channels, 1, stride=1, act=activation)

def forward(self, x):
    x = self.conv1(x)  # first a 1x1 conv to reduce channels
    x = torch.cat([x] + [m(x) for m in self.m], dim=1)  # concat the three pooling branches with the main branch
    x = self.conv2(x)  # finally a 1x1 conv to raise channels to the target count
    return x
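The SPP bottleneck can be exercised end to end. The sketch below mirrors the code above under the same stand-in base_conv assumption; the 1024 -> 512 channel counts and 13x13 input match how dark5 uses it at stride 32:

```python
import torch
import torch.nn as nn

def base_conv(c_in, c_out, ksize, stride=1):
    # Stand-in for BaseConv: Conv + BN + LeakyReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, ksize, stride, (ksize - 1) // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class SPPBottleneck(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_sizes=(5, 9, 13)):
        super().__init__()
        hidden = in_channels // 2
        self.conv1 = base_conv(in_channels, hidden, 1)
        # stride-1 max pools with "same" padding: every branch keeps H, W
        self.m = nn.ModuleList(
            [nn.MaxPool2d(ks, stride=1, padding=ks // 2) for ks in kernel_sizes]
        )
        self.conv2 = base_conv(hidden * (len(kernel_sizes) + 1), out_channels, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.cat([x] + [m(x) for m in self.m], dim=1)  # hidden * 4 channels
        return self.conv2(x)

out = SPPBottleneck(1024, 512)(torch.randn(1, 1024, 13, 13))
```

After conv1 the tensor has 512 channels; concatenating the main branch with the three pooled branches gives 2048, which conv2 projects down to the requested 512, all at the original 13x13 resolution.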
A visualization of the SPP structure is shown below.