Darknet-53 | Some theory behind YOLOv3

Today we are talking about YOLOv3, a powerful object detector that extracts the features of many different kinds of objects. It contains a lot of complex parts, and each of them has plenty of details.
Let's begin with its core idea and its Darknet-53 backbone, and then move on to the feature pyramid network (FPN).

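The core idea of YOLOv3 is to turn object detection into a single regression problem solved by one CNN: the box coordinates, an objectness score and the class probabilities are all predicted in one forward pass, which is what makes the model so fast.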
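To make the "regression" idea concrete, here is a small sketch of how YOLOv3 turns the raw network outputs (t_x, t_y, t_w, t_h) of one anchor into a box, following the decoding equations of the YOLOv3 paper: a sigmoid keeps the predicted center inside its grid cell, and an exponential scales the anchor's width and height. The function names and the example values (cell (6, 6) of a 13×13 grid, stride 32, a 116×90 anchor) are illustrative assumptions, not taken from any particular implementation.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, cell, anchor, stride):
    """Hypothetical helper: map raw outputs (tx, ty, tw, th) to a box in pixels.

    cell   = (cx, cy): column and row index of the grid cell
    anchor = (pw, ph): anchor (prior) width and height in pixels
    stride = input size / grid size, e.g. 416 / 13 = 32
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) * stride  # center x: the sigmoid keeps it inside the cell
    by = (sigmoid(ty) + cy) * stride  # center y
    bw = pw * math.exp(tw)            # width: scales the anchor width
    bh = ph * math.exp(th)            # height: scales the anchor height
    return bx, by, bw, bh


# All-zero outputs put the box at the cell center with exactly the anchor size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 6), anchor=(116, 90), stride=32))
# -> (208.0, 208.0, 116.0, 90.0)
```

Because only offsets relative to a cell and an anchor are regressed, the network never has to predict absolute pixel coordinates directly.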
At first sight, the network may look intimidating because it has so many convolutional and residual layers. In fact, it just repeats the same residual block over and over. At the output, the picture is divided into a grid of cells: the cell that contains an object's center is responsible for detecting that object, and among that cell's anchor boxes, the one whose shape best fits the object is used to predict it.
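The "responsible cell" and "best box" rules can be sketched in a few lines. This is a simplified, hypothetical version of YOLOv3's target assignment: it assumes a 416×416 input and compares the ground-truth box with each anchor by width and height only (as if both were centered at the same point). The anchors used in the example are the three largest ones from the YOLOv3 COCO configuration; all function names are made up for illustration.

```python
def responsible_cell(center, img_size=416, grid=13):
    """Return (col, row) of the grid cell that contains the object's center."""
    stride = img_size / grid  # 416 / 13 = 32 pixels per cell
    x, y = center
    return int(x // stride), int(y // stride)


def wh_iou(a, b):
    """IoU of two boxes given as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape overlaps the ground-truth box the most."""
    return max(range(len(anchors)), key=lambda i: wh_iou(box_wh, anchors[i]))


anchors_13 = [(116, 90), (156, 198), (373, 326)]
print(responsible_cell((200, 100)))        # (6, 3)
print(best_anchor((120, 95), anchors_13))  # 0
```

Matching by shape alone keeps the rule simple: the cell is chosen by the center, and the anchor is chosen by the size.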
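The following Python (Keras) code builds one stage of this CNN: a downsampling convolution followed by `num_blocks` residual units. The whole backbone simply calls it several times.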

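    from keras.layers import Add, ZeroPadding2D
    # Assumed helper, defined elsewhere: DarknetConv2D_BN_Leaky = Conv2D ('valid' padding for stride 2) + BatchNormalization + LeakyReLU

    def resblock_body(x, num_filters, num_blocks):
        # Darknet pads only top/left, so the stride-2 3x3 conv exactly halves height and width
        x = ZeroPadding2D(((1, 0), (1, 0)))(x)
        x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
        for _ in range(num_blocks):
            y = DarknetConv2D_BN_Leaky(num_filters // 2, (1, 1))(x)  # 1x1 conv halves the channels
            y = DarknetConv2D_BN_Leaky(num_filters, (3, 3))(y)       # 3x3 conv restores them
            x = Add()([x, y])                                         # residual shortcut
        return x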
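
A quick way to see why the block pads only the top and left is to do the output-size arithmetic. This sketch assumes that `DarknetConv2D_BN_Leaky` uses 'valid' (no) padding when the stride is 2; `conv_out` and `resblock_out_shape` are hypothetical helper names, and only shapes are computed, no real tensors.

```python
def conv_out(size: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Output length of a 'valid' convolution after `pad` extra pixels of padding."""
    return (size + pad - kernel) // stride + 1


def resblock_out_shape(h: int, w: int, num_filters: int) -> tuple:
    """Shape (H, W, C) produced by one residual stage (assumes a 'valid' stride-2 conv)."""
    # ZeroPadding2D(((1, 0), (1, 0))) adds one row on top and one column on the left,
    # then the 3x3 stride-2 convolution halves the spatial size.
    h, w = conv_out(h, 3, 2, pad=1), conv_out(w, 3, 2, pad=1)
    # Each residual unit (1x1 conv -> 3x3 conv -> Add) keeps (h, w, num_filters),
    # so the number of units does not change the output shape.
    return h, w, num_filters


print(conv_out(416, 3, 2, pad=1))        # 208: exactly half
print(conv_out(416, 3, 2))               # 207: without the extra padding
print(resblock_out_shape(416, 416, 64))  # (208, 208, 64)
```

The 1×1 convolution in each unit halves the channels and the 3×3 convolution restores them; this is what lets `Add()` combine `x` and `y`, since both must have exactly the same shape.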

Another important part of YOLOv3 is its feature pyramid network (FPN) structure, which helps it detect objects of very different sizes. For instance, imagine a picture of a forest: it is easy to find one tree because the tree occupies a large part of the image, but finding one specific leaf, which covers only a few pixels, is much harder. The network has the same problem: as it gets deeper and the feature maps are repeatedly downsampled (by stride-2 convolutions in Darknet-53), large objects stay prominent while small ones fade away.
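To solve this problem, “upsampling” comes in! The pyramid levels have different spatial sizes. Upsampling enlarges a small, deep feature map (for example 13×13) by a factor of 2, so that it has the same size as the previous, shallower level (26×26). The two maps can then be concatenated along the channel axis, combining the strong semantic features of the deep layer with the fine spatial detail of the shallow one.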
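Here is a toy, pure-Python sketch of the two operations, assuming nearest-neighbour upsampling (the default interpolation of Keras's `UpSampling2D`) and channel-wise concatenation. Feature maps are nested lists indexed as `fmap[row][col]`, with each pixel a tuple of channel values; the helper names and the tiny example maps are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling: every pixel becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [px for px in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                       # repeat each row twice
    return out


def concat_channels(a, b):
    """Concatenate two H x W feature maps along the channel axis."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("spatial sizes must match before concatenation")
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


deep = [[(1.0,), (2.0,)],
        [(3.0,), (4.0,)]]                        # 2 x 2 x 1 (deep, coarse)
shallow = [[(0.0, 0.0)] * 4 for _ in range(4)]  # 4 x 4 x 2 (shallow, fine)

merged = concat_channels(upsample2x(deep), shallow)    # 4 x 4 x 3
print(len(merged), len(merged[0]), len(merged[0][0]))  # 4 4 3
print(merged[0][:2])  # [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

Concatenation only works once the spatial sizes agree, which is exactly why the deeper map has to be upsampled first.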
In addition, YOLOv3 makes its predictions from only three feature maps, which divide a 416×416 input into 13×13, 26×26 and 52×52 grids. Upsampling is the bridge that lets these three levels work together: the 13×13 map is upsampled and merged with the 26×26 map, and the result is upsampled again and merged with the 52×52 map. In this way, it is easy to build your own pyramid and reuse every level efficiently.
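For a 416×416 input, the three grids come from the backbone's downsampling strides of 32, 16 and 8. The sketch below also counts the predictions, assuming the YOLOv3 paper's COCO setup of 3 anchors per cell and 80 classes, so that each anchor predicts 4 box offsets, 1 objectness score and 80 class scores. The function name is hypothetical.

```python
def yolo_output_summary(img_size=416, strides=(32, 16, 8), anchors_per_cell=3, num_classes=80):
    """Grid sizes, total predicted boxes and channels per output map for YOLOv3-style heads."""
    grids = [img_size // s for s in strides]              # 13, 26, 52 for a 416 input
    boxes = sum(g * g for g in grids) * anchors_per_cell  # every cell predicts 3 boxes
    channels = anchors_per_cell * (4 + 1 + num_classes)   # (tx, ty, tw, th), objectness, classes
    return grids, boxes, channels


print(yolo_output_summary())  # ([13, 26, 52], 10647, 255)
```

Most of the boxes come from the finest 52×52 grid, which is the level meant for small objects such as the leaf in the forest example.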

Well, building the pyramid is not so hard, because it takes only a few lines of code. Let's look at the most important part: the Darknet-53 backbone, which produces the three feature maps.

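    # Assumes resblock_body (previous snippet) and the DarknetConv2D_BN_Leaky helper are in scope
    def darknet_body(x):
        # x: input image tensor, e.g. 416x416x3
        x = DarknetConv2D_BN_Leaky(32, (3, 3))(x)  # 416x416x32: 32 filters, spatial size unchanged
        x = resblock_body(x, 64, 1)    # 208x208x64, 1 residual unit
        x = resblock_body(x, 128, 2)   # 104x104x128, 2 residual units
        x = resblock_body(x, 256, 8)   # 52x52x256, 8 residual units
        feat1 = x                      # 52x52 scale, for small objects
        x = resblock_body(x, 512, 8)   # 26x26x512, 8 residual units
        feat2 = x                      # 26x26 scale, for medium objects
        x = resblock_body(x, 1024, 4)  # 13x13x1024, 4 residual units
        feat3 = x                      # 13x13 scale, for large objects
        return feat1, feat2, feat3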
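
The shapes in the backbone code, and the "53" in the name, can be checked with a little arithmetic. This sketch only traces shapes; it assumes every `resblock_body` call halves the height and width (top/left padding plus a stride-2 convolution) and contains one downsampling convolution plus two convolutions (1×1 and 3×3) per residual unit. The stage list mirrors the `resblock_body` calls above; `trace_darknet53` is a hypothetical name.

```python
# (num_filters, num_blocks) for each resblock_body call in the backbone above
STAGES = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]


def trace_darknet53(size=416):
    """Return the output shape of every stage and the number of conv layers used."""
    h = w = size
    shapes = []
    convs = 1                    # the first 3x3 conv with 32 filters
    for filters, blocks in STAGES:
        h, w = h // 2, w // 2    # stride-2 downsampling conv halves H and W
        convs += 1 + 2 * blocks  # 1 downsampling conv + (1x1, 3x3) per residual unit
        shapes.append((h, w, filters))
    return shapes, convs


shapes, convs = trace_darknet53()
print(shapes[2:])  # [(52, 52, 256), (26, 26, 512), (13, 13, 1024)]
print(convs)       # 52
```

The last three shapes are the three feature maps returned by the backbone. The code above therefore contains 52 convolutional layers; the usual explanation for the name is that the ImageNet classification version of Darknet-53 adds one final 1000-way classification layer (listed as "Connected" in the YOLOv3 paper's table), bringing the count to 53.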

Haha, these three feature maps are what let the detector find the important objects in a picture, whatever their size!

In conclusion, Darknet-53 is known for its simple, repetitive structure and its strong results in object detection. Because it is built from one block repeated over and over, it is easy for anyone to build their own CNN in less than 30 minutes.
That's all, thank you for reading~
