Darknet: Open Source Neural Networks in C - Tiny Darknet
Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation. You can find the source on GitHub or you can read more about what Darknet can do right here:
https://github.com/pjreddie/darknet
1. Tiny Darknet
Image classification made tiny.
I’ve heard a lot of people talking about SqueezeNet.
SqueezeNet is cool but it’s JUST optimizing for parameter count. When most high quality images are 10 MB or more, why do we care if our models are 5 MB or 50 MB? If you want a small model that’s actually FAST, why not check out the Darknet reference network? It’s only 28 MB but, more importantly, a forward pass takes only about 800 million floating point operations. The original AlexNet needs 2.3 billion. So the Darknet reference network does about 2.9 times less work per image, it’s small, and its top-1 accuracy is about 4 percentage points higher.
[SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size]
[Darknet Reference Model]
[ImageNet Classification with Deep Convolutional Neural Networks]
So what about SqueezeNet? Sure, the weights are only 4.8 MB, but a forward pass is still 2.2 billion operations. AlexNet was a great first pass at classification but we shouldn’t be stuck back in the days when networks this bad are also this slow!
stick [stɪk]: vt. to stab, poke, stick out, paste; vi. to persist, stick out, adhere; n. stick, cane, dull person; past tense stuck; past participle stuck
But anyway, people are super into SqueezeNet so if you really insist on small networks, use this:
1.1 Tiny Darknet
Model              Top-1 (%)  Top-5 (%)  Ops      Size
AlexNet            57.0       80.3       2.27 Bn  238 MB
Darknet Reference  61.1       83.0       0.81 Bn  28 MB
SqueezeNet         57.5       80.3       2.17 Bn  4.8 MB
Tiny Darknet       58.7       81.7       0.98 Bn  4.0 MB
The real winner here is clearly the Darknet reference model but if you insist on wanting a small model, use Tiny Darknet. Or train your own, it should be easy!
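The ratios behind that claim can be checked directly from the table. Here is a small Python sketch that uses only the numbers listed above; the dictionary layout and the `compare` helper are illustrative, not part of Darknet.

```python
# Figures copied from the comparison table:
# (top-1 %, top-5 %, operations in billions, size in MB).
MODELS = {
    "AlexNet": (57.0, 80.3, 2.27, 238.0),
    "Darknet Reference": (61.1, 83.0, 0.81, 28.0),
    "SqueezeNet": (57.5, 80.3, 2.17, 4.8),
    "Tiny Darknet": (58.7, 81.7, 0.98, 4.0),
}


def compare(name, baseline="AlexNet"):
    """Return (ops ratio, size ratio, top-1 gain in points) of `name` vs `baseline`."""
    top1, _, ops, size = MODELS[name]
    b_top1, _, b_ops, b_size = MODELS[baseline]
    return b_ops / ops, b_size / size, top1 - b_top1


for name in MODELS:
    ops_ratio, size_ratio, gain = compare(name)
    print(f"{name:18s} {ops_ratio:4.1f}x fewer ops, "
          f"{size_ratio:5.1f}x smaller, {gain:+.1f} top-1 points")
```

With the table's figures the reference model needs roughly 2.8x fewer operations than AlexNet (the 2.9x quoted earlier comes from the rounded 2.3 and 0.8 billion), while Tiny Darknet needs roughly 2.3x fewer and SqueezeNet barely any fewer. Operation counts are only a proxy for speed, so treat these as rough expectations rather than measured timings.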
Here’s how to use it in Darknet (and also how to install Darknet):
git clone https://github.com/pjreddie/darknet
cd darknet
make
wget https://pjreddie.com/media/files/tiny.weights
./darknet classify cfg/tiny.cfg tiny.weights data/dog.jpg
1.1.1 tiny.cfg
[net]
# Train
# batch=128
# subdivisions=1
# Test
batch=1
subdivisions=1
height=224
width=224
channels=3
momentum=0.9
decay=0.0005
max_crop=320
......
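A Darknet cfg file is INI-like: a section header in square brackets followed by key=value lines, with # starting a comment. If you want to inspect such a file from a script, a minimal Python sketch could look like the one below. Section names such as [convolutional] repeat in a full network file, and Python's `configparser` rejects duplicate section names by default, so this sketch keeps the sections in a list instead. The name `parse_cfg` and the returned structure are my own choices, not part of Darknet, and values are left as strings.

```python
def parse_cfg(text):
    """Parse Darknet-style cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (including commented-out options)
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


# The [net] fragment shown above: test-time settings are active,
# the training settings are commented out.
FRAGMENT = """
[net]
# Train
# batch=128
# subdivisions=1
# Test
batch=1
subdivisions=1
height=224
width=224
channels=3
momentum=0.9
decay=0.0005
max_crop=320
"""

name, net = parse_cfg(FRAGMENT)[0]
print(name, net["batch"], net["width"], net["height"])
```

Keeping a list of pairs preserves both the layer order and the repeated section names, which is what you need if you later walk the network layer by layer.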
1.1.2 Makefile
GPU=1
CUDNN=1
OPENCV=0
OPENMP=1
DEBUG=0
......
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$ make clean
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$ make
1.1.3 classify and classifier
./darknet classify ./cfg/tiny.cfg ./tiny.weights ./data/dog.jpg
./darknet classifier predict ./cfg/imagenet1k.data ./cfg/tiny.cfg ./tiny.weights ./data/dog.jpg
Hopefully you see something like this:
data/dog.jpg: Predicted in 0.160994 seconds.
malamute: 0.167168
Eskimo dog: 0.065828
dogsled: 0.063020
standard schnauzer: 0.051153
Siberian husky: 0.037506
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$ wget https://pjreddie.com/media/files/tiny.weights
--2019-01-03 10:30:45-- https://pjreddie.com/media/files/tiny.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.3.39
Connecting to pjreddie.com (pjreddie.com)|128.208.3.39|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4185968 (4.0M) [application/octet-stream]
Saving to: ‘tiny.weights’
tiny.weights 100%[===================>] 3.99M 137KB/s in 1m 41s
2019-01-03 10:32:32 (40.5 KB/s) - ‘tiny.weights’ saved [4185968/4185968]
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$ ./darknet classify ./cfg/tiny.cfg ./tiny.weights ./data/dog.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 224 x 224 x 3 -> 224 x 224 x 16 0.043 BFLOPs
1 max 2 x 2 / 2 224 x 224 x 16 -> 112 x 112 x 16
2 conv 32 3 x 3 / 1 112 x 112 x 16 -> 112 x 112 x 32 0.116 BFLOPs
3 max 2 x 2 / 2 112 x 112 x 32 -> 56 x 56 x 32
4 conv 16 1 x 1 / 1 56 x 56 x 32 -> 56 x 56 x 16 0.003 BFLOPs
5 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128 0.116 BFLOPs
6 conv 16 1 x 1 / 1 56 x 56 x 128 -> 56 x 56 x 16 0.013 BFLOPs
7 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128 0.116 BFLOPs
8 max 2 x 2 / 2 56 x 56 x 128 -> 28 x 28 x 128
9 conv 32 1 x 1 / 1 28 x 28 x 128 -> 28 x 28 x 32 0.006 BFLOPs
10 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256 0.116 BFLOPs
11 conv 32 1 x 1 / 1 28 x 28 x 256 -> 28 x 28 x 32 0.013 BFLOPs
12 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256 0.116 BFLOPs
13 max 2 x 2 / 2 28 x 28 x 256 -> 14 x 14 x 256
14 conv 64 1 x 1 / 1 14 x 14 x 256 -> 14 x 14 x 64 0.006 BFLOPs
15 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512 0.116 BFLOPs
16 conv 64 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 64 0.013 BFLOPs
17 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512 0.116 BFLOPs
18 conv 128 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 128 0.026 BFLOPs
19 conv 1000 1 x 1 / 1 14 x 14 x 128 -> 14 x 14 x1000 0.050 BFLOPs
20 avg 14 x 14 x1000 -> 1000
21 softmax 1000
Loading weights from ./tiny.weights...Done!
./data/dog.jpg: Predicted in 0.002268 seconds.
14.50%: malamute
6.08%: Newfoundland
5.59%: dogsled
4.56%: standard schnauzer
4.05%: Eskimo dog
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$ ./darknet classifier predict ./cfg/imagenet1k.data ./cfg/tiny.cfg ./tiny.weights ./data/dog.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 224 x 224 x 3 -> 224 x 224 x 16 0.043 BFLOPs
1 max 2 x 2 / 2 224 x 224 x 16 -> 112 x 112 x 16
2 conv 32 3 x 3 / 1 112 x 112 x 16 -> 112 x 112 x 32 0.116 BFLOPs
3 max 2 x 2 / 2 112 x 112 x 32 -> 56 x 56 x 32
4 conv 16 1 x 1 / 1 56 x 56 x 32 -> 56 x 56 x 16 0.003 BFLOPs
5 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128 0.116 BFLOPs
6 conv 16 1 x 1 / 1 56 x 56 x 128 -> 56 x 56 x 16 0.013 BFLOPs
7 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128 0.116 BFLOPs
8 max 2 x 2 / 2 56 x 56 x 128 -> 28 x 28 x 128
9 conv 32 1 x 1 / 1 28 x 28 x 128 -> 28 x 28 x 32 0.006 BFLOPs
10 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256 0.116 BFLOPs
11 conv 32 1 x 1 / 1 28 x 28 x 256 -> 28 x 28 x 32 0.013 BFLOPs
12 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256 0.116 BFLOPs
13 max 2 x 2 / 2 28 x 28 x 256 -> 14 x 14 x 256
14 conv 64 1 x 1 / 1 14 x 14 x 256 -> 14 x 14 x 64 0.006 BFLOPs
15 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512 0.116 BFLOPs
16 conv 64 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 64 0.013 BFLOPs
17 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512 0.116 BFLOPs
18 conv 128 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 128 0.026 BFLOPs
19 conv 1000 1 x 1 / 1 14 x 14 x 128 -> 14 x 14 x1000 0.050 BFLOPs
20 avg 14 x 14 x1000 -> 1000
21 softmax 1000
Loading weights from ./tiny.weights...Done!
./data/dog.jpg: Predicted in 0.002295 seconds.
14.50%: malamute
6.08%: Newfoundland
5.59%: dogsled
4.56%: standard schnauzer
4.05%: Eskimo dog
strong@foreverstrong:~/darknet_work/darknet_180906/darknet$
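The two runs above print predictions as "percent: label", while the sample output earlier in this section uses "label: probability". If you want to consume the results from a script, a minimal Python sketch that accepts both formats might look like this. `parse_predictions` is a hypothetical helper, not something Darknet ships, and the regular expressions are based only on the output shown here.

```python
import re

# "14.50%: malamute"   (format printed by the runs above)
PERCENT_FIRST = re.compile(r"^\s*(\d+(?:\.\d+)?)%:\s*(.+?)\s*$")
# "malamute: 0.167168" (format in the earlier sample output)
LABEL_FIRST = re.compile(r"^\s*(.+?):\s*(\d*\.\d+)\s*$")


def parse_predictions(output):
    """Return [(label, probability in 0..1), ...] from classifier output text."""
    results = []
    for line in output.splitlines():
        m = PERCENT_FIRST.match(line)
        if m:
            results.append((m.group(2), float(m.group(1)) / 100.0))
            continue
        m = LABEL_FIRST.match(line)
        if m:
            results.append((m.group(1), float(m.group(2))))
    return results


SAMPLE = """\
./data/dog.jpg: Predicted in 0.002268 seconds.
14.50%: malamute
 6.08%: Newfoundland
 5.59%: dogsled
 4.56%: standard schnauzer
 4.05%: Eskimo dog
"""

for label, prob in parse_predictions(SAMPLE):
    print(f"{label}: {prob:.4f}")
```

Lines that are not predictions (the layer listing, the "Predicted in" timing line, the shell prompt) match neither pattern and are ignored, so the whole captured output can be passed in as-is.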
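malamute ['mæləmjuːt]: n. Arctic sled dog, Eskimo dog
dogsled ['dɔɡslɛd]: n. a sled pulled by dogs
standard schnauzer: the standard-size schnauzer breed
Siberian husky: a sled dog breed from Siberia
Eskimo ['eskiməu]: n. Eskimo person, Eskimo language; adj. of the Eskimo people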
Here’s the config file: tiny.cfg
[darknet/cfg/tiny.cfg]
The model is just some 3x3 and 1x1 convolutional layers:
layer filters size input output
0 conv 16 3 x 3 / 1 224 x 224 x 3 -> 224 x 224 x 16
1 max 2 x 2 / 2 224 x 224 x 16 -> 112 x 112 x 16
2 conv 32 3 x 3 / 1 112 x 112 x 16 -> 112 x 112 x 32
3 max 2 x 2 / 2 112 x 112 x 32 -> 56 x 56 x 32
4 conv 16 1 x 1 / 1 56 x 56 x 32 -> 56 x 56 x 16
5 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128
6 conv 16 1 x 1 / 1 56 x 56 x 128 -> 56 x 56 x 16
7 conv 128 3 x 3 / 1 56 x 56 x 16 -> 56 x 56 x 128
8 max 2 x 2 / 2 56 x 56 x 128 -> 28 x 28 x 128
9 conv 32 1 x 1 / 1 28 x 28 x 128 -> 28 x 28 x 32
10 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256
11 conv 32 1 x 1 / 1 28 x 28 x 256 -> 28 x 28 x 32
12 conv 256 3 x 3 / 1 28 x 28 x 32 -> 28 x 28 x 256
13 max 2 x 2 / 2 28 x 28 x 256 -> 14 x 14 x 256
14 conv 64 1 x 1 / 1 14 x 14 x 256 -> 14 x 14 x 64
15 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512
16 conv 64 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 64
17 conv 512 3 x 3 / 1 14 x 14 x 64 -> 14 x 14 x 512
18 conv 128 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 128
19 conv 1000 1 x 1 / 1 14 x 14 x 128 -> 14 x 14 x1000
20 avg 14 x 14 x1000 -> 1000
21 softmax 1000
22 cost 1000
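The BFLOPs figure that Darknet prints next to each convolutional layer in the logs above can be reproduced from the layer shapes. The sketch below assumes the count is 2 x filters x kernel area x input channels x output height x output width, i.e. one multiply and one add per weight per output position. That formula is my assumption, but it matches the printed per-layer values. The layer tuples are transcribed from the table above, and the function names are illustrative.

```python
# (filters, kernel size, input channels, output height/width) for each conv layer
# in the table above; max-pool, average-pool, softmax and cost layers are skipped.
CONV_LAYERS = [
    (16, 3, 3, 224), (32, 3, 16, 112),
    (16, 1, 32, 56), (128, 3, 16, 56), (16, 1, 128, 56), (128, 3, 16, 56),
    (32, 1, 128, 28), (256, 3, 32, 28), (32, 1, 256, 28), (256, 3, 32, 28),
    (64, 1, 256, 14), (512, 3, 64, 14), (64, 1, 512, 14), (512, 3, 64, 14),
    (128, 1, 512, 14), (1000, 1, 128, 14),
]


def conv_flops(filters, k, channels, out_hw):
    """Operations for one image, counting each multiply-add as 2 operations."""
    return 2 * filters * k * k * channels * out_hw * out_hw


def conv_weights(filters, k, channels):
    """Number of kernel weights (biases and batch-norm values not included)."""
    return filters * k * k * channels


total_flops = sum(conv_flops(*layer) for layer in CONV_LAYERS)
total_weights = sum(conv_weights(f, k, c) for f, k, c, _ in CONV_LAYERS)

print(f"total: {total_flops / 1e9:.3f} BFLOPs")
print(f"conv weights: {total_weights} ({total_weights * 4 / 1e6:.2f} MB as float32)")
```

Under these assumptions the total comes to about 0.98 billion operations, in line with the Ops column for Tiny Darknet, and the convolution kernels alone account for roughly 4.1 MB of float32 data, close to the 4,185,968-byte tiny.weights download. Most of the work sits in the seven 3x3 layers that each cost 0.116 BFLOPs; the 1x1 layers in between are what keep the channel counts, and therefore the cost, down.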
Wordbook
you only look once, YOLO
Visual Object Classes, VOC
Pattern Analysis, Statistical Modelling and Computational Learning, PASCAL
mean Average Precision, mAP
floating point operations per second, FLOPS
frame rate or frame frequency, frames per second, FPS
hertz, Hz
billion, Bn
operations, Ops
configuration, cfg
ImageNet Large Scale Visual Recognition Challenge, ILSVRC
Microsoft Common Objects in Context, MS COCO