First, the paper: https://arxiv.org/abs/1708.02002
CE
We start from the ordinary binary cross entropy (CE):
$$CE(p,y)=\begin{cases} -\log(p) & \text{if } y=1 \\ -\log(1-p) & \text{otherwise} \end{cases}$$
For convenience, define $p_t$ (the more accurate the prediction, the larger $p_t$):
$$p_t=\begin{cases} p & \text{if } y=1 \\ 1-p & \text{otherwise} \end{cases}$$
With this, $CE(p,y)$ can be abbreviated as $CE(p_t)=-\log(p_t)$.
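Clearly, this formula runs into trouble when the samples are imbalanced. For example, with positive:negative = 1:3000, the vast majority of the loss comes from negative samples and the positive samples are drowned out.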
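To make the imbalance concrete, here is a small sketch in plain Python. The probabilities (0.1 for every sample) and the helper name `cross_entropy` are made up for illustration; only the 1:3000 ratio comes from the text.

```python
import math

def cross_entropy(p, y):
    """Binary cross entropy for one sample; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

# Hypothetical scenario: one badly classified positive (p = 0.1)
# and 3000 fairly well classified negatives (p = 0.1, so p_t = 0.9).
pos_loss = 1 * cross_entropy(0.1, 1)
neg_loss = 3000 * cross_entropy(0.1, 0)

# Fraction of the total loss that comes from negatives.
neg_share = neg_loss / (pos_loss + neg_loss)
print(f"positive loss: {pos_loss:.3f}, negative loss: {neg_loss:.3f}, negative share: {neg_share:.4f}")
```

Even though each negative is individually cheap, their sheer number makes them account for more than 99% of the total in this toy setup.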
α CE
One remedy is to put a weighting parameter $\alpha$ in front of the loss, which gives:
$$CE(p_t)=-\alpha_t\log(p_t)$$
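For positive samples $\alpha_t=\alpha$, and for negative samples $\alpha_t=1-\alpha$. Typically $\alpha$ is set to the complement of the positive-sample proportion in the training data (that is, the negative-sample proportion), or it is treated as a hyperparameter. This alleviates the sample imbalance to some extent.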
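A minimal sketch of the $\alpha$-weighted CE, assuming $\alpha$ is taken as the negative-sample proportion as described above. The sample counts and probabilities are hypothetical and `alpha_balanced_ce` is an illustrative name.

```python
import math

def alpha_balanced_ce(p, y, alpha):
    """Alpha-weighted binary cross entropy for one sample."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * math.log(p_t)

# Hypothetical class counts, matching the 1:3000 example.
n_pos, n_neg = 1, 3000
alpha = n_neg / (n_pos + n_neg)  # negative-sample proportion

# Same toy predictions as before: every sample predicted with p = 0.1.
pos_total = n_pos * alpha_balanced_ce(0.1, 1, alpha)
neg_total = n_neg * alpha_balanced_ce(0.1, 0, alpha)
print(f"alpha = {alpha:.5f}, positive total = {pos_total:.4f}, negative total = {neg_total:.4f}")
```

With this weighting the single positive is no longer drowned out by the 3000 negatives, but easy and hard samples within a class are still treated alike.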
FL
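Focal Loss takes the $CE(p_t)$ above as its baseline. The key problem Focal Loss addresses is the difference between hard and easy samples, whereas the $\alpha$-weighted $CE(p_t)$ only distinguishes positive from negative samples.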
The Focal Loss formula:
$$FL(p_t)=-(1-p_t)^\gamma\log(p_t)$$
The more accurate the prediction (an easy sample), the larger $p_t$ and the smaller $1-p_t$, so the sample contributes less to the total loss.
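The effect of the factor $(1-p_t)^\gamma$ can be checked numerically. The sketch below compares FL with plain CE for one easy and one hard sample; the values 0.9 and 0.1 are chosen only for illustration, and $\gamma=2$ is used as an example setting.

```python
import math

def ce(p_t):
    """Plain cross entropy expressed in terms of p_t."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    """Focal loss without the alpha term: -(1 - p_t)^gamma * log(p_t)."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy, hard = 0.9, 0.1  # illustrative p_t values

# Ratio FL / CE equals the modulating factor (1 - p_t)^gamma.
easy_ratio = focal_loss(easy) / ce(easy)  # about 0.01
hard_ratio = focal_loss(hard) / ce(hard)  # about 0.81
print(f"easy sample keeps {easy_ratio:.2%} of its CE loss, hard sample keeps {hard_ratio:.2%}")
```

The easy sample's loss is scaled down by roughly a factor of 100, while the hard sample keeps most of its loss.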
In practice a weighting hyperparameter $\alpha_t$ is added as well:
$$FL(p_t)=-\alpha_t(1-p_t)^\gamma\log(p_t)$$
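$\alpha_t$ plays the same role as above: it adjusts for sample imbalance. The paper's experiments found that $\alpha=0.25$, $\gamma=2$ works best. (Oddly, $\alpha=0.25$ means $\alpha_t=0.25$ for positives and $0.75$ for negatives, which lowers the weight of positive samples and raises that of negative samples; perhaps this is because FL already suppresses the negative samples too strongly?)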
Below is a Keras implementation of FL:
import keras
from keras_retinanet import backend  # keras-retinanet's backend wrapper (provides where, gather_nd)


def focal(alpha=0.25, gamma=2.0):
    """ Create a functor for computing the focal loss.
    Args
        alpha: Scale the focal weight with alpha.
        gamma: Take the power of the focal weight with gamma.
    Returns
        A functor that computes the focal loss using the alpha and gamma.
    """
    def _focal(y_true, y_pred):
        """ Compute the focal loss given the target tensor and the predicted tensor.
        As defined in https://arxiv.org/abs/1708.02002
        Args
            y_true: Tensor of target data from the generator with shape (B, N, num_classes + 1);
                    the last channel holds the anchor state.
            y_pred: Tensor of predicted data from the network with shape (B, N, num_classes).
        Returns
            The focal loss of y_pred w.r.t. y_true.
        """
        labels = y_true[:, :, :-1]       # the ground-truth labels
        anchor_state = y_true[:, :, -1]  # anchor state: -1 for ignore, 0 for background, 1 for object
        classification = y_pred          # the predictions, same shape as labels

        # filter out "ignore" anchors
        indices = backend.where(keras.backend.not_equal(anchor_state, -1))
        labels = backend.gather_nd(labels, indices)
        classification = backend.gather_nd(classification, indices)

        # compute the focal loss
        # a tensor with the same shape as labels, filled with alpha
        alpha_factor = keras.backend.ones_like(labels) * alpha
        # where(): take the second argument where the condition holds, otherwise the third
        alpha_factor = backend.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor)
        # 1 - p_t: (1 - p) for positives, p for negatives
        focal_weight = backend.where(keras.backend.equal(labels, 1), 1 - classification, classification)
        # the full focal weight: alpha_t * (1 - p_t)^gamma
        focal_weight = alpha_factor * focal_weight ** gamma

        cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification)

        # compute the normalizer: the number of positive anchors
        normalizer = backend.where(keras.backend.equal(anchor_state, 1))
        normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx())
        normalizer = keras.backend.maximum(1.0, normalizer)

        return keras.backend.sum(cls_loss) / normalizer

    return _focal
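To check the Keras code by hand, the same steps can be mirrored in NumPy. This is only a sketch for verification, not part of keras-retinanet: `focal_numpy` and the tiny input tensors are made up, and the clipping constant `1e-7` is an assumption used here to keep the logarithm finite.

```python
import numpy as np

def focal_numpy(y_true, y_pred, alpha=0.25, gamma=2.0):
    """NumPy re-statement of the focal loss steps, for small hand-checkable inputs."""
    labels = y_true[:, :, :-1]
    anchor_state = y_true[:, :, -1]

    # Drop anchors marked as "ignore" (-1).
    keep = anchor_state != -1
    labels = labels[keep]
    classification = np.clip(y_pred[keep], 1e-7, 1 - 1e-7)  # assumed epsilon

    # alpha_t and (1 - p_t)^gamma, chosen per element from the label.
    alpha_factor = np.where(labels == 1, alpha, 1 - alpha)
    focal_weight = np.where(labels == 1, 1 - classification, classification)
    focal_weight = alpha_factor * focal_weight ** gamma

    # Element-wise binary cross entropy.
    bce = -(labels * np.log(classification) + (1 - labels) * np.log(1 - classification))

    # Normalise by the number of positive anchors (at least 1).
    normalizer = max(1.0, float(np.sum(anchor_state == 1)))
    return float(np.sum(focal_weight * bce) / normalizer)

# One image, three anchors, one class: an object, a background anchor, an ignored anchor.
y_true = np.array([[[1.0, 1.0], [0.0, 0.0], [0.0, -1.0]]])
y_pred = np.array([[[0.9], [0.1], [0.5]]])
loss = focal_numpy(y_true, y_pred)
print(loss)
```

In this toy input both kept anchors have $p_t=0.9$, so they share the same $(1-p_t)^\gamma\log(p_t)$ term and differ only in $\alpha_t$: the background anchor is weighted 0.75 and the object anchor 0.25, which is the asymmetry remarked on above.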
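Network structure
Backbone: DenseNet121
Source: /home/tf/anaconda3/lib/python3.6/site-packages/keras_applications/densenet.py
DenseNet121 has blocks == [6, 12, 24, 16], meaning there are four dense blocks containing 6, 12, 24 and 16 conv blocks respectively, and each conv block contains two conv layers. That gives 2×(6+12+24+16)=116 layers in total.
Between every two dense blocks there is one convolution layer, so four blocks have three such convolutions; adding the initial convolution and the final fully connected layer gives 116+3+1+1=121.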
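The layer count above is simple enough to verify in a few lines. The variable names below are illustrative only.

```python
# Number of conv blocks in each of the four dense blocks of DenseNet121.
blocks = [6, 12, 24, 16]

conv_in_blocks = 2 * sum(blocks)   # two conv layers per conv block
transitions = len(blocks) - 1      # one convolution between adjacent dense blocks
stem_conv = 1                      # the initial convolution
classifier = 1                     # the final fully connected layer

total_layers = conv_in_blocks + transitions + stem_conv + classifier
print(conv_in_blocks, total_layers)
```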
Here the convolutional outputs of the last three of these blocks are selected for multi-feature fusion.
Fusion
Through upsampling, downsampling and fusion, five feature maps $p_3, p_4, p_5, p_6, p_7$ are obtained, and the classification and regression subnetworks are applied to each of the five feature maps.
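Note that every classification subnetwork output is finally reshaped to (-1, num_class) and every regression subnetwork output to (-1, 4). The -1 in both stands for the rows, and each row represents one anchor.
Finally, all classification and regression results are concatenated.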
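The reshape-then-concatenate step can be sketched with NumPy arrays standing in for the subnetwork outputs. The feature-map sizes, `num_classes = 3` and the omission of the batch dimension are assumptions made for illustration; 9 anchors per location corresponds to 3 ratios × 3 scales.

```python
import numpy as np

num_classes = 3   # hypothetical number of classes
num_anchors = 9   # 3 ratios x 3 scales per feature-map location

# Hypothetical spatial sizes of p3..p7 for a small input image.
feature_sizes = [(32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]

cls_outputs, reg_outputs = [], []
for h, w in feature_sizes:
    # Stand-ins for the raw subnetwork outputs on one feature map.
    cls_map = np.zeros((h, w, num_anchors * num_classes))
    reg_map = np.zeros((h, w, num_anchors * 4))
    # One row per anchor.
    cls_outputs.append(cls_map.reshape(-1, num_classes))
    reg_outputs.append(reg_map.reshape(-1, 4))

# Stack the rows from all five levels.
classification = np.concatenate(cls_outputs, axis=0)
regression = np.concatenate(reg_outputs, axis=0)
print(classification.shape, regression.shape)
```

Both results have the same number of rows, so row i of the classification output and row i of the regression output describe the same anchor.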
generate_anchors
/home/tf/keras-retinanet/keras_retinanet/utils/anchors.py
Taking the parameters scales = [1, 1.26, 1.59], ratios = [0.5, 1, 2] and base_size = 16 as an example, the generation works as follows:
import numpy as np


def generate_anchors(base_size=16, ratios=None, scales=None):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales w.r.t. a reference window.
    """
    # AnchorParameters is defined in the same module (anchors.py)
    if ratios is None:
        ratios = AnchorParameters.default.ratios
    if scales is None:
        scales = AnchorParameters.default.scales

    num_anchors = len(ratios) * len(scales)

    # initialize output anchors
    # start with an all-zero array of shape (9, 4)
    anchors = np.zeros((num_anchors, 4))

    # scale base_size
    # columns 2 and 3 are initialised from the scales; result of the assignment below:
    """
    >>> anchors
    array([[ 0.        ,  0.        , 16.        , 16.        ],
           [ 0.        ,  0.        , 20.15873718, 20.15873718],
           [ 0.        ,  0.        , 25.39841652, 25.39841652],
           [ 0.        ,  0.        , 16.        , 16.        ],
           [ 0.        ,  0.        , 20.15873718, 20.15873718],
           [ 0.        ,  0.        , 25.39841652, 25.39841652],
           [ 0.        ,  0.        , 16.        , 16.        ],
           [ 0.        ,  0.        , 20.15873718, 20.15873718],
           [ 0.        ,  0.        , 25.39841652, 25.39841652]])
    """
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # compute areas of anchors; result of the assignment below:
    """
    >>> areas
    array([256.        , 406.3746848 , 645.07956168, 256.        ,
           406.3746848 , 645.07956168, 256.        , 406.3746848 ,
           645.07956168])
    """
    areas = anchors[:, 2] * anchors[:, 3]

    # correct for ratios
    # after the width (column 2) update below:
    """>>> anchors
    array([[ 0.        ,  0.        , 22.627417  , 16.        ],
           [ 0.        ,  0.        , 28.50875952, 20.15873718],
           [ 0.        ,  0.        , 35.9187851 , 25.39841652],
           [ 0.        ,  0.        , 16.        , 16.        ],
           [ 0.        ,  0.        , 20.15873718, 20.15873718],
           [ 0.        ,  0.        , 25.39841652, 25.39841652],
           [ 0.        ,  0.        , 11.3137085 , 16.        ],
           [ 0.        ,  0.        , 14.25437976, 20.15873718],
           [ 0.        ,  0.        , 17.95939255, 25.39841652]])
    """
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    # after the height (column 3) update below:
    """>>> anchors
    array([[ 0.        ,  0.        , 22.627417  , 11.3137085 ],
           [ 0.        ,  0.        , 28.50875952, 14.25437976],
           [ 0.        ,  0.        , 35.9187851 , 17.95939255],
           [ 0.        ,  0.        , 16.        , 16.        ],
           [ 0.        ,  0.        , 20.15873718, 20.15873718],
           [ 0.        ,  0.        , 25.39841652, 25.39841652],
           [ 0.        ,  0.        , 11.3137085 , 22.627417  ],
           [ 0.        ,  0.        , 14.25437976, 28.50875952],
           [ 0.        ,  0.        , 17.95939255, 35.9187851 ]])
    """
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # transform from (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2)
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T

    return anchors
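As a quick check of the final (x1, y1, x2, y2) output, the sketch below repeats the same NumPy steps in a standalone function with explicit arguments and inspects the resulting boxes. `generate_anchors_sketch` is an illustrative name, and the scales are written as powers of 2 on the assumption that 1.26 and 1.59 above are rounded values of 2^(1/3) and 2^(2/3).

```python
import numpy as np

def generate_anchors_sketch(base_size, ratios, scales):
    """Standalone restatement of the anchor generation steps, with explicit arguments."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    anchors = np.zeros((len(ratios) * len(scales), 4))

    # Width and height start as base_size * scale.
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T
    areas = anchors[:, 2] * anchors[:, 3]

    # Keep the area fixed while setting height / width = ratio.
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2), centred on the origin.
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors

ratios = [0.5, 1, 2]
scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]  # assumed exact values
anchors = generate_anchors_sketch(16, ratios, scales)

widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
print(anchors.shape)
print(np.round(anchors[0], 4))
```

Every box is centred on the origin, its height-to-width ratio equals the requested ratio, and its area depends only on the scale.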