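This article introduces the loss functions used in object detection and their code implementations. The main task of object detection is to classify and localize targets, so the loss consists of the following parts:
Class/confidence loss (classification task): BCE, FL, QFL, VFL
Localization loss (regression task): IoU, GIoU, DIoU, CIoU, DFL (formulated as classification)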
Binary Cross-Entropy (BCE) is a loss function for binary classification. It measures the gap between the predicted class probability and the ground-truth label, and is computed as:
$$\mathrm{BCE}(y,p) = -y\log(p) - (1-y)\log(1-p)$$
where $y$ is the ground-truth class of the target, taking the value 0 or 1, and $p$ is the predicted probability, with $p \in [0,1]$. BCE Loss can further be written in the following form:
$$\mathrm{BCE}(p_t) = -\log(p_t)$$
$$p_t = \begin{cases} p, & y = 1 \\ 1-p, & \text{otherwise} \end{cases}$$
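For multi-class tasks, the labels can be one-hot encoded so that the problem decomposes into a combination of binary classification tasks, and BCE Loss is applied to each of them.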
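The two BCE forms and the one-hot decomposition can be checked on single numbers. The following is a minimal plain-Python sketch, not PyTorch code; the helper names `bce`, `bce_pt`, `one_hot` and `multiclass_bce` are made up for illustration, and averaging over classes is an assumption that mirrors a `mean` reduction.

```python
import math


def bce(y, p, eps=1e-12):
    """BCE for one sample: -y*log(p) - (1-y)*log(1-p)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)


def bce_pt(y, p):
    """Equivalent p_t form for a hard label y in {0, 1}: -log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def one_hot(label, num_classes):
    """One-hot encode an integer class index."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def multiclass_bce(label, probs):
    """Mean of the per-class BCE terms against the one-hot target (assumed reduction)."""
    target = one_hot(label, len(probs))
    return sum(bce(y, p) for y, p in zip(target, probs)) / len(probs)


print(bce(1, 0.9), bce_pt(1, 0.9))  # both forms agree
print(bce(1, 0.1))                  # a confident wrong prediction costs far more
print(multiclass_bce(2, [0.1, 0.2, 0.7]))
```

Each class is treated as its own yes/no question here, which is why the per-class probabilities do not need to sum to 1.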
BCE is available in PyTorch as follows:
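import torch
import torch.nn.functional as F

# Functional form:
#   F.binary_cross_entropy_with_logits: Sigmoid + BCE (expects raw logits)
#   F.binary_cross_entropy:             BCE only (expects probabilities in [0, 1])
logits = torch.randn(4, 3)                    # example predictions (raw logits)
target = torch.randint(0, 2, (4, 3)).float()  # example ground-truth labels, same shape
loss = F.binary_cross_entropy_with_logits(
    input=logits,       # predictions
    target=target,      # ground-truth labels
    weight=None,        # optional weight applied to each sample's loss
    size_average=None,  # deprecated
    reduce=None,        # deprecated
    reduction='mean',   # average ('mean') or sum ('sum') over all samples, or 'none'
    pos_weight=None,    # optional weight on positive samples (length equals the number of classes)
)

# Module form (calls the function above to compute the loss)
criterion = torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
loss = criterion(logits, target)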
Focal Loss (FL) was proposed in the paper Focal Loss for Dense Object Detection. Building on BCE Loss, it uses weighting factors to achieve two goals:
Address the imbalance between positive and negative samples: object detection produces a large amount of background (negative samples), so actual targets (positive samples) make up only a small share.
Down-weight easy samples: training then focuses more on hard samples.
$$\mathrm{FL}(y,p) = -\alpha(1-p)^{\gamma}\,y\log(p) - (1-\alpha)\,p^{\gamma}(1-y)\log(1-p)$$
Focal Loss can be implemented as follows:
import torch
import torch.nn as nn


class FocalLoss(nn.Module):
    '''
    Replaces the original BCEcls (classification loss) and BCEobj (confidence loss).
    Advantages:
    1. Addresses the imbalance between positive and negative samples (foreground and background) in one-stage detectors;
    2. Down-weights easy samples so that the loss focuses on hard samples.
    '''
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss = sigmoid + BCELoss
        self.gamma = gamma  # gamma reduces the contribution of easy samples to the loss
        self.alpha = alpha  # alpha balances the unequal numbers of positive and negative samples
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'  # the focal weights must be applied to every sample, so BCE is left unreduced

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)  # BCE(p_t) = -log(p_t)
        pred_prob = torch.sigmoid(pred)
        p_t = true * pred_prob + (1 - true) * (1 - pred_prob)  # p_t
        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)  # alpha_t
        modulating_factor = (1.0 - p_t) ** self.gamma  # (1 - p_t)^gamma
        loss = loss * alpha_factor * modulating_factor  # scale the loss by both factors
        # Apply the reduction requested on the wrapped loss (default: mean)
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss
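
To see what the modulating factor does, the Focal Loss formula can be evaluated on single numbers. This is a small scalar sketch in plain Python, assuming hard labels; `focal_loss` is a hypothetical helper, not part of any library.

```python
import math


def focal_loss(y, p, gamma=1.5, alpha=0.25):
    """Scalar focal loss for a hard label y in {0, 1} and a probability p in (0, 1)."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(1, 0.95)  # positive sample, already well classified
hard = focal_loss(1, 0.30)  # positive sample, poorly classified
bce_ratio = math.log(0.30) / math.log(0.95)

# The hard/easy ratio under focal loss is much larger than under plain BCE,
# which is the "focus on hard samples" effect.
print(hard / easy, bce_ratio)
```

With `gamma=0` and `alpha=0.5` the function reduces to half of the BCE value, which is a quick sanity check on the weights.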
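Quality Focal Loss (QFL) was proposed in the paper Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. Its main differences from Focal Loss are:
The ground-truth label becomes a continuous value:
  soft one-hot label (IoU label): the IoU value marks the target class, 0 marks the other classes
  one-hot label (category label): 1 marks the target class, 0 marks the other classes
The distance between the ground-truth label and the predicted value characterizes how hard a sample is to classify.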
The Quality Focal Loss formula is:
$$\mathrm{QFL}(y,p) = -|y-p|^{\gamma}\left(\alpha\,y\log(p) + (1-\alpha)(1-y)\log(1-p)\right)$$
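where $y$ is the soft one-hot label (IoU label) of the target and $p$ is the predicted value. The hyperparameters $\gamma$ and $\alpha$ have the same meaning as in Focal Loss. The closer $y$ and $p$ are, the easier the sample is to classify and the less it contributes to the loss.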
Quality Focal Loss can be implemented as follows:
import torch
import torch.nn as nn


class QualityFocalLoss(nn.Module):
    '''
    Changes relative to Focal Loss:
    1. The IoU between the target and the prediction is used as the ground-truth label (soft label)
    2. The weighting of hard and easy samples is computed differently
    '''
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss() = BCE + Sigmoid
        self.gamma = gamma  # gamma reduces the contribution of easy samples to the loss
        self.alpha = alpha  # alpha balances the unequal numbers of positive and negative samples
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)  # element-wise BCE against the soft label
        pred_prob = torch.sigmoid(pred)
        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)  # alpha_t
        modulating_factor = torch.abs(true - pred_prob) ** self.gamma  # |y - p|^gamma
        loss = loss * alpha_factor * modulating_factor  # scale the loss by both factors
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss
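
The effect of the $|y-p|^{\gamma}$ factor with a soft label is easy to see on scalars. The sketch below mirrors the arithmetic of the class above (an interpolated alpha weight multiplied by BCE) in plain Python; `quality_focal_loss` is a hypothetical name and `p` is assumed to lie strictly between 0 and 1.

```python
import math


def quality_focal_loss(y, p, gamma=1.5, alpha=0.25):
    """Scalar QFL sketch: soft label y in [0, 1], predicted probability p in (0, 1)."""
    bce = -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)  # BCE against the soft label
    alpha_t = y * alpha + (1.0 - y) * (1.0 - alpha)          # interpolated class weight
    modulating = abs(y - p) ** gamma                         # |y - p|^gamma
    return alpha_t * modulating * bce


iou_label = 0.8  # example soft label: IoU between the target and the predicted box
for p in (0.3, 0.7, 0.8):
    print(p, quality_focal_loss(iou_label, p))
```

The loss vanishes when the predicted score equals the IoU label, so the classifier is pushed towards predicting localization quality rather than a hard 1.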
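VariFocal Loss (VFL) was proposed in the paper VarifocalNet: An IoU-aware Dense Object Detector. Its main differences from Focal Loss are:
Focal Loss suppresses easy samples among both positives and negatives, which degrades the quality of the positive samples; VariFocal Loss applies this suppression to negative samples only.
It adopts the soft label and the hard/easy sample weighting of Quality Focal Loss.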
The VariFocal Loss formula is:
$$\mathrm{VFL}(y,p) = \begin{cases} -y\left(y\log(p) + (1-y)\log(1-p)\right), & y > 0 \\ -\alpha\,p^{\gamma}\log(1-p), & y = 0 \end{cases}$$
where $\gamma$ reduces the contribution of easy negative samples to the loss, and $\alpha$ prevents them from being suppressed too much.
VariFocal Loss can be implemented as follows:
import torch
import torch.nn as nn


class VariFocalLoss(nn.Module):
    '''
    Change relative to Focal Loss:
    hard/easy suppression is applied to negative samples only.
    '''
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.75):
        super().__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss() = BCE + sigmoid
        self.gamma = gamma  # gamma reduces the contribution of easy negative samples to the loss
        self.alpha = alpha  # alpha prevents negative samples from being suppressed too much
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)  # element-wise BCE against the soft label
        pred_prob = torch.sigmoid(pred)
        neg_weight = self.alpha * torch.abs(pred_prob - true) ** self.gamma  # negative-sample weight alpha * p^gamma
        # Positive samples (y > 0) use the label y itself as their weight
        focal_weight = torch.where(true > 0.0, true, neg_weight)
        loss = loss * focal_weight  # scale the loss by the weight
        # Apply the reduction requested on the wrapped loss (default: mean)
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss
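
The asymmetric treatment of positives and negatives can be illustrated with a scalar version of the piecewise formula. This is a plain-Python sketch under the assumption that `p` lies strictly between 0 and 1; `varifocal_loss` is a hypothetical helper name.

```python
import math


def varifocal_loss(y, p, gamma=1.5, alpha=0.75):
    """Scalar VFL sketch: y is the IoU-based soft label (0 for negatives)."""
    if y > 0:
        # Positive sample: BCE against the soft label, weighted by the label itself
        return -y * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    # Negative sample: focal-style down-weighting by alpha * p^gamma
    return -alpha * p ** gamma * math.log(1.0 - p)


print(varifocal_loss(0.0, 0.1))  # easy negative, strongly suppressed
print(varifocal_loss(0.0, 0.9))  # hard negative, large loss
print(varifocal_loss(0.9, 0.5))  # high-IoU positive, weighted by 0.9
print(varifocal_loss(0.3, 0.5))  # low-IoU positive, weighted by 0.3
```

At the same predicted score, a positive with a higher IoU label receives a larger weight, so high-quality positives dominate training.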
IoU (Intersection over Union) measures the gap between the predicted box and the ground-truth box. It is computed as:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \in [0,1]$$
where $|A \cap B|$ is the intersection area of boxes A and B, and $|A \cup B|$ is their union area.
The IoU Loss is then defined as:
$$L_{IoU} = 1 - \mathrm{IoU}$$
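IoU can be computed as follows:
import torch


def bbox_iou(box1, box2, xywh=True, eps=1e-7):
    # Convert xywh (center, width, height) boxes to xyxy (top-left, bottom-right)
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # boxes are already in xyxy form
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1
    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
    # Union area (eps avoids division by zero)
    union = w1 * h1 + w2 * h2 - inter + eps
    # IoU
    iou = inter / union
    return iou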
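
A worked example makes the arithmetic concrete. The following plain-Python sketch computes IoU for two boxes given as `(x1, y1, x2, y2)` tuples; `iou_xyxy` is a hypothetical scalar helper, separate from the tensor function above.

```python
def iou_xyxy(a, b, eps=1e-7):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # width of the overlap
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # height of the overlap
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter + eps
    return inter / union


# Two 2x2 boxes offset by (1, 1): intersection 1, union 4 + 4 - 1 = 7
iou = iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3))
print(iou, 1 - iou)  # IoU close to 1/7, and the corresponding IoU loss

# Disjoint boxes give IoU 0 regardless of how far apart they are
print(iou_xyxy((0, 0, 1, 1), (2, 0, 3, 1)), iou_xyxy((0, 0, 1, 1), (9, 0, 10, 1)))
```

The last line shows the known limitation of IoU as a loss: once boxes stop overlapping, the value no longer changes with their distance.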
GIoU was proposed in the paper Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. It is computed as:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|} \in [-1,1]$$
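where C is the smallest enclosing rectangle of A and B.
Compared with IoU, GIoU considers not only the overlapping region but also the non-overlapping region, so it reflects the degree of overlap between the two boxes better.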
The GIoU Loss is then defined as:
$$L_{GIoU} = 1 - \mathrm{GIoU}$$
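GIoU can be computed as follows:
import torch


def bbox_giou(box1, box2, xywh=True, eps=1e-7):
    # Convert xywh (center, width, height) boxes to xyxy (top-left, bottom-right)
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # boxes are already in xyxy form
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1
    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
    # Union area (eps avoids division by zero)
    union = w1 * h1 + w2 * h2 - inter + eps
    # IoU
    iou = inter / union
    cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # width of the smallest enclosing box
    ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # height of the smallest enclosing box
    c_area = cw * ch + eps  # area of the smallest enclosing box
    giou = iou - (c_area - union) / c_area
    return giou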
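
The benefit of GIoU shows up for boxes that do not overlap at all. The plain-Python sketch below returns both IoU and GIoU for `(x1, y1, x2, y2)` tuples; `giou_xyxy` is a hypothetical scalar helper written for illustration.

```python
def giou_xyxy(a, b, eps=1e-7):
    """Return (IoU, GIoU) for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter + eps
    iou = inter / union
    # Smallest enclosing box C
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1)) + eps
    return iou, iou - (c_area - union) / c_area


# Unit boxes one unit apart: C is 3x1, the union is 2, so GIoU is close to -1/3
print(giou_xyxy((0, 0, 1, 1), (2, 0, 3, 1)))
# Unit boxes eight units apart: C is 10x1, so GIoU is close to -0.8
print(giou_xyxy((0, 0, 1, 1), (9, 0, 10, 1)))
```

Both pairs have IoU 0, yet GIoU decreases as the boxes move apart, which gives the regression a usable signal in the non-overlapping case.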
DIoU was proposed in the paper Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. It is computed as:
$$\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2}$$
where $b$ and $b^{gt}$ are the center points of the predicted box and the ground-truth box, $\rho$ is the Euclidean distance between them, and $c$ is the diagonal length of the smallest enclosing rectangle.
Compared with IoU and GIoU, DIoU considers not only the overlap between the two boxes but also the distance between them.
The DIoU Loss is then defined as:
$$L_{DIoU} = 1 - \mathrm{DIoU}$$
DIoU can be computed as follows:
import torch


def bbox_diou(box1, box2, xywh=True, eps=1e-7):
    '''
    Compute DIoU of box1(1,4) to box2(n,4).
    '''
    # Convert xywh (center, width, height) boxes to xyxy (top-left, bottom-right)
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # boxes are already in xyxy form
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1
    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
    # Union area (eps avoids division by zero)
    union = w1 * h1 + w2 * h2 - inter + eps
    # IoU
    iou = inter / union
    cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # width of the smallest enclosing box
    ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # height of the smallest enclosing box
    c2 = cw ** 2 + ch ** 2 + eps  # squared diagonal of the smallest enclosing box
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # squared distance between the box centers
    diou = iou - rho2 / c2
    return diou
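
The distance term can be followed by hand on a small case. This plain-Python sketch computes DIoU for `(x1, y1, x2, y2)` tuples; `diou_xyxy` is a hypothetical scalar helper for illustration only.

```python
def diou_xyxy(a, b, eps=1e-7):
    """DIoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter + eps
    iou = inter / union
    cw = max(ax2, bx2) - min(ax1, bx1)  # enclosing box width
    ch = max(ay2, by2) - min(ay1, by1)  # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps        # squared diagonal of the enclosing box
    # Squared center distance; (x1 + x2) / 2 is a center coordinate, hence the / 4
    rho2 = ((bx1 + bx2 - ax1 - ax2) ** 2 + (by1 + by2 - ay1 - ay2) ** 2) / 4
    return iou - rho2 / c2


# Two 2x2 boxes offset by (1, 1): IoU = 1/7, rho^2 = 2, c^2 = 18
print(diou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))  # close to 1/7 - 1/9 = 2/63
```

Because the penalty depends on center distance directly, it shrinks the gap between centers even when the enclosing-box area changes little.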
CIoU was proposed by the DIoU authors in the same paper. On top of DIoU, CIoU also considers the aspect ratio of the boxes. It is computed as:
$$\mathrm{CIoU} = \mathrm{IoU} - \left(\frac{\rho^2(b,b^{gt})}{c^2} + \alpha v\right)$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
$$\alpha = \frac{v}{(1-\mathrm{IoU}) + v}$$
where $(w^{gt},h^{gt})$ and $(w,h)$ are the width and height of the ground-truth box and the predicted box, respectively.
The CIoU Loss is then defined as:
$$L_{CIoU} = 1 - \mathrm{CIoU}$$
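CIoU can be computed as follows:
import math

import torch


def bbox_ciou(box1, box2, xywh=True, eps=1e-7):
    '''
    Compute CIoU of box1(1,4) to box2(n,4).
    '''
    # Convert xywh (center, width, height) boxes to xyxy (top-left, bottom-right)
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # boxes are already in xyxy form
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)  # clamp heights so w / h below is defined
        w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)
    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
    # Union area (eps avoids division by zero)
    union = w1 * h1 + w2 * h2 - inter + eps
    # IoU
    iou = inter / union
    cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # width of the smallest enclosing box
    ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # height of the smallest enclosing box
    c2 = cw ** 2 + ch ** 2 + eps  # squared diagonal of the smallest enclosing box
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # squared distance between the box centers
    v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)  # aspect-ratio consistency term
    alpha = v / (v - iou + (1 + eps))  # trade-off weight: v / ((1 - IoU) + v)
    ciou = iou - (rho2 / c2 + v * alpha)
    return ciou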
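
The aspect-ratio term is the only part of CIoU that differs from DIoU, so it is worth evaluating on its own. The plain-Python sketch below computes $v$ and $\alpha$ from widths, heights and a given IoU; `aspect_ratio_penalty` is a hypothetical helper name, and the small constant in the denominator is an assumption added for numerical safety.

```python
import math


def aspect_ratio_penalty(w, h, w_gt, h_gt, iou, eps=1e-7):
    """Return (v, alpha) of the CIoU aspect-ratio term for one box pair."""
    v = (4.0 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1.0 - iou) + v + eps)
    return v, alpha


# Same aspect ratio (2:1 in both boxes): no penalty even though the sizes differ
print(aspect_ratio_penalty(4.0, 2.0, 2.0, 1.0, iou=0.5))

# Wide prediction (2:1) against a tall ground truth (1:2): a positive penalty
v, alpha = aspect_ratio_penalty(2.0, 1.0, 1.0, 2.0, iou=0.5)
print(v, alpha, alpha * v)  # alpha * v is what gets subtracted from DIoU
```

Since $\alpha$ grows with IoU, the aspect-ratio term matters most once the boxes already overlap well.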
Distribution Focal Loss (DFL) was proposed by the QFL authors in the same paper, Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. It targets the ambiguous boundaries that arise when a target is occluded: the box regression task $(left, top, right, bottom)$ is converted into a classification task, and predicting a single label value is replaced by predicting a sequence. Figure 1 (taken from the DFL paper) shows how the predicted information changes; the sum of the areas under the sequence gives the desired result. Suppose the prediction for the $left$ side of a target is the sequence $\{y_0, y_1, y_2, \ldots, y_{n-1}\}$ with $y_i \in [0, 1.0]$; it is then converted to:
$$left = \sum_{i=0}^{n-1} i\,y_i$$
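The conversion from a predicted sequence to a single distance can be sketched in a few lines of plain Python. The softmax step is an assumption here (the text only states that each $y_i$ lies in $[0, 1]$), and `softmax` and `expected_distance` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax, so the sequence sums to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits):
    """left = sum_i i * y_i, with y = softmax(logits) (assumed normalization)."""
    probs = softmax(logits)
    return sum(i * y for i, y in enumerate(probs))


print(expected_distance([0.0, 0.0, 0.0, 0.0]))      # flat distribution over 4 bins
print(expected_distance([-50.0, -50.0, 50.0, -50.0]))  # nearly all mass on bin 2
print(expected_distance([-50.0, -50.0, 10.0, 10.0]))   # mass split between bins 2 and 3
```

A prediction split between two neighboring bins yields a fractional distance, which is how a discrete sequence can still express continuous box coordinates.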
The Distribution Focal Loss formula is:
$$\mathrm{DFL}(y_i, y_{i+1}) = -(i+1-y)\log(y_i) - (y-i)\log(y_{i+1})$$
where $y$ is the ground-truth label, the variables satisfy $i \le y \le i+1$ with $i$ a non-negative integer, and $y_i, y_{i+1} \in [0, 1.0]$.
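Distribution Focal Loss can be implemented as follows:
import torch.nn as nn
import torch.nn.functional as F


class DistributionFocalLoss(nn.Module):
    '''
    Converts bounding-box regression into a classification task over a sequence.
    '''
    def __init__(self, reg_max):
        super().__init__()
        self.reg_max = reg_max  # number of points in the predicted sequence

    def forward(self, pred, true):
        '''
        pred: [num_points, reg_max] logits, with num_points = num_gt * 4 (one row per box side)
        true: [num_gt, 4], 4 -> (ltrb) (absolute coordinates of the input image);
              each value must lie in [0, reg_max - 1) so that both bin indices are valid
        '''
        tl = true.long()  # target left (i)
        tr = tl + 1       # target right (i + 1)
        wl = tr - true    # i + 1 - y
        wr = 1 - wl       # y - i
        # -(i+1-y) * log(y_i) - (y-i) * log(y_{i+1})
        dfl = (F.cross_entropy(pred, tl.view(-1), reduction='none').view(tl.shape) * wl +
               F.cross_entropy(pred, tr.view(-1), reduction='none').view(tl.shape) * wr).mean(-1, keepdim=True)
        return dfl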
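
The DFL formula can also be evaluated for a single box side to see which distributions it rewards. This is a plain-Python sketch working on probabilities rather than logits; `dfl` is a hypothetical helper, and the example distributions are made up for illustration.

```python
import math


def dfl(y, probs):
    """Scalar DFL for one box side: y is the target, probs the predicted sequence."""
    i = int(y)             # left neighbor bin (floor, since y >= 0)
    w_left = (i + 1) - y   # weight of bin i
    w_right = y - i        # weight of bin i + 1
    return -w_left * math.log(probs[i]) - w_right * math.log(probs[i + 1])


target = 2.3  # lies between bins 2 and 3
good = [0.01, 0.01, 0.69, 0.28, 0.01]  # mass concentrated around the target
bad = [0.69, 0.28, 0.01, 0.01, 0.01]   # mass concentrated far from the target
ideal = [0.0, 0.0, 0.7, 0.3, 0.0]      # mass split exactly by the two weights

print(dfl(target, good), dfl(target, bad), dfl(target, ideal))
```

The loss is smallest when the two neighboring bins carry probabilities equal to their weights, in which case the expected value of the sequence equals the target.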