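This series documents my self-study of the YOLO family of object detection algorithms, and their code implementation, during my master's studies. The implementation borrows from the ultralytics YOLO source code on GitHub, with parts of the source removed to fit my own research needs.
Building on the YOLOv5 implementation, this article implements YOLOv8. Compared with YOLOv5, the main differences in YOLOv8 are as follows: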
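Model structure: the CSP module in YOLOv5 is replaced with the C2f module, and the Detect module (coupled head + Anchor-based) is replaced with a new Detect module (decoupled head + Anchor-free + DFL)
Positive sample matching: uses the TaskAlignedAssigner assignment strategy
Loss computation:
- Classification loss: binary cross-entropy loss
- Box loss: Distribution Focal Loss (DFL) + CIoU Loss
- Objectness loss: YOLOv8 does not predict an objectness score, so this loss is no longer used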
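Articles in this series:
YOLOv8 Algorithm Implementation (1): Model Construction
YOLOv8 Algorithm Implementation (2): Positive Sample Matching (TaskAlignedAssigner) and Loss Computation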
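The YOLOv8 model structure is shown in Figure 1. It consists of the following modules:
CBS: convolution layer, batch normalization (BN), and SiLU activation
C2f: feature extraction module that fuses multiple gradient-flow branches
SPPF: fast spatial pyramid pooling layer
Detect: detection head (decoupled head + Anchor-free + Distribution)
import torch
import torch.nn as nn

# Conv is the CBS block listed above; its implementation follows the YOLOv5 article in this series

class Bottleneck(nn.Module):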
    '''
    Residual bottleneck block
    '''
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5, k=1):
        '''
        :param c1: input channels
        :param c2: output channels
        :param shortcut: if True, use a residual connection (applied only when c1 == c2)
        :param g: number of groups for the grouped convolution in cv2
        :param e: expansion ratio; hidden channels = int(c2 * e)
        :param k: kernel size of the first convolution
        '''
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k, 1)  # ch_in, ch_out, kernel_size, stride
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        out = self.cv2(self.cv1(x))
        return x + out if self.add else out

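class C2f(nn.Module):
    '''
    C2f block: the input is split in two, and the output of every Bottleneck is kept and concatenated
    '''
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # fuses the 2 split halves + n Bottleneck outputs
        # g is passed through so the groups argument is not silently ignored (no change for g=1)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, e=1.0, k=3) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).split((self.c, self.c), 1))  # split into two halves along channels
        y.extend(m(y[-1]) for m in self.m)  # each Bottleneck consumes the previous output
        return self.cv2(torch.cat(y, 1))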
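To make the channel bookkeeping in C2f concrete, the following minimal sketch traces channel counts only, without any tensors. The helper name `c2f_channels` is hypothetical and not part of the source; it simply mirrors the arithmetic in `__init__` and `forward` above.

```python
def c2f_channels(c1, c2, n=1, e=0.5):
    """Trace channel counts through C2f (hypothetical helper, channels only)."""
    c = int(c2 * e)               # hidden channels, as in self.c
    cv1_out = 2 * c               # cv1 output, later split into two halves of c channels
    branches = [c, c]             # the two halves produced by the split
    branches += [c] * n           # each Bottleneck(c, c) contributes one more c-channel output
    concat = sum(branches)        # channels entering cv2 after torch.cat
    assert concat == (2 + n) * c  # matches the cv2 input size in __init__
    return {"cv1_in": c1, "cv1_out": cv1_out, "concat": concat, "cv2_out": c2}

print(c2f_channels(c1=256, c2=256, n=6))
# {'cv1_in': 256, 'cv1_out': 256, 'concat': 1024, 'cv2_out': 256}
```

With n = 6 the concatenated tensor is four times wider than the output, which is why cv2 is a 1x1 convolution that compresses it back to c2 channels.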
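YOLOv8 predictions have the following characteristics:
Feature maps: several feature maps of different resolutions are used to predict objects of different sizes;
Pixel units: objects are predicted per pixel unit (grid cell) of each feature map, and each unit produces one prediction.
Suppose there are $nl$ feature maps and the $i$-th one has resolution $(grid\_x_i, grid\_y_i)$. The number of predictions $np$ obtained for one image is then:
$$np = \sum_{i=1}^{nl} \left( grid\_x_i \times grid\_y_i \right)$$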
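As a quick check of this count, the sketch below evaluates it for a 640×640 input, assuming the three detection strides 8, 16 and 32 implied by the P3/8, P4/16 and P5/32 labels in the configuration file, and assuming the image size is divisible by each stride. The function name is hypothetical.

```python
def num_predictions(img_h, img_w, strides=(8, 16, 32)):
    """np = sum over feature maps of grid_x_i * grid_y_i (hypothetical helper)."""
    total = 0
    for s in strides:
        grid_y, grid_x = img_h // s, img_w // s  # feature-map resolution at this stride
        total += grid_x * grid_y                 # one prediction per grid cell
    return total

print(num_predictions(640, 640))  # 80*80 + 40*40 + 20*20 = 8400
```

This is the same 20^2 + 40^2 + 80^2 that appears in the shape comments of the Detect module.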
The box predicted by the model is ultimately expressed as $(left, top, right, bottom)$, the distances from the center of the predicting unit to the four sides of the box.
The model outputs each box side as a sequence, as shown in Figure 2. Suppose the $left$ prediction for some object is the sequence $\{y_0, y_1, y_2, ..., y_{n-1}\}$ with $y_i \in [0, 1.0]$. It satisfies:
$$left = \sum_{i=0}^{n-1} i \, y_i$$
Here $n$ corresponds to reg_max in the Detect module (16).
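A minimal sketch of this weighted sum in plain Python. It assumes, as in the ultralytics DFL layer, that the raw network outputs for one side are turned into the weights $y_i$ with a softmax; the function name `dfl_expectation` is hypothetical.

```python
import math

def dfl_expectation(logits):
    """Softmax the n bin logits into y_i, then return sum_i i * y_i."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    y = [e / total for e in exps]              # y_i in [0, 1], summing to 1
    return sum(i * y_i for i, y_i in enumerate(y))

uniform = dfl_expectation([0.0] * 16)                         # equal weight on bins 0..15
peaked = dfl_expectation([0.0, 0.0, 0.0, 20.0] + [0.0] * 12)  # mass concentrated on bin 3
print(uniform, peaked)
```

The uniform case gives 7.5, the mean of the bin indices, and the peaked case gives a value very close to 3. The result is in grid units, so with 16 bins a predicted side can lie at most 15 cells away from the center of the predicting unit.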
class Detect(nn.Module):
    # YOLOv8 Detect head for detection models
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init

    def __init__(self, nc=80, ch=()):  # detection layer
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers (feature maps)
        self.reg_max = 16  # DFL channels (the weighted sum over the predicted sequence is computed with a convolution)
        self.no = nc + self.reg_max * 4  # output channels per prediction unit
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)  # hidden channels
        # box branch: 4 * reg_max channels per prediction unit
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch)
        # class branch: nc channels per prediction unit
        self.cv3 = nn.ModuleList(
            nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

    def forward(self, x):
        shape = x[0].shape  # BCHW
        for i in range(self.nl):
            # shape -> (bs, 4 * reg_max + num_cls, H, W)
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:
            return x
        elif self.shape != shape:
            # anchors: center coordinates of all prediction units
            # strides: downsampling factor of each prediction unit relative to the input image
            self.anchors, self.strides = (x.transpose(0, 1) for x in self.make_anchors(x, self.stride, 0.5))
            self.shape = shape
        # [bs, no, ny, nx] -> box: [bs, 4 * reg_max, 20^2 + 40^2 + 80^2], cls: [bs, num_cls, 20^2 + 40^2 + 80^2]
        box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
        # convert the predictions (l, t, r, b) on each feature map to (cx, cy, w, h) in input-image pixels (xywh=True)
        dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
        # shape [bs, 4 + num_cls, 20^2 + 40^2 + 80^2]; the first 4 channels are (cx, cy, w, h) in input-image pixels
        y = torch.cat((dbox, cls.sigmoid()), 1)
        return y, x

    def make_anchors(self, feats, strides, grid_cell_offset=0.5):
        """Generate anchors from features."""
        anchor_points, stride_tensor = [], []
        assert feats is not None
        dtype, device = feats[0].dtype, feats[0].device
        for i, stride in enumerate(strides):
            _, _, h, w = feats[i].shape  # bs, channel, h, w
            sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # grid-cell centers along x
            sy = torch.arange(end=h, device=device, dtype=dtype) + grid_cell_offset  # grid-cell centers along y
            sy, sx = torch.meshgrid(sy, sx, indexing='ij')  # explicit 'ij' keeps the old default (needs PyTorch >= 1.10)
            anchor_points.append(torch.stack((sx, sy), -1).view(-1, 2))  # all grid-cell centers
            stride_tensor.append(torch.full((h * w, 1), stride, dtype=dtype, device=device))
        return torch.cat(anchor_points), torch.cat(stride_tensor)
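The decoding step in `forward` can be traced by hand on a tiny grid. The sketch below re-implements the two ideas in plain Python rather than PyTorch: grid-cell centers in the order `make_anchors` produces them, and the (l, t, r, b) to (cx, cy, w, h) conversion that `dist2bbox` is called for with `xywh=True`. The function names `grid_centers` and `ltrb_to_xywh` are hypothetical, and the distances are assumed to be already-decoded DFL values in grid units.

```python
def grid_centers(h, w, offset=0.5):
    """Centers of all cells of an h x w feature map, row by row, as (x, y)."""
    return [(x + offset, y + offset) for y in range(h) for x in range(w)]

def ltrb_to_xywh(anchor, ltrb, stride):
    """Convert distances to the four sides into (cx, cy, w, h) in input-image pixels."""
    ax, ay = anchor
    l, t, r, b = ltrb
    x1, y1 = ax - l, ay - t                 # top-left corner in grid units
    x2, y2 = ax + r, ay + b                 # bottom-right corner in grid units
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # box center
    w, h = x2 - x1, y2 - y1                 # box size
    return tuple(v * stride for v in (cx, cy, w, h))  # scale back to the input image

print(grid_centers(h=2, w=3))
# [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (0.5, 1.5), (1.5, 1.5), (2.5, 1.5)]
print(ltrb_to_xywh(anchor=(10.5, 4.5), ltrb=(2.0, 1.0, 3.0, 2.0), stride=8))
# (88.0, 40.0, 40.0, 24.0)
```

Note that the box center differs from the anchor point whenever the left/right or top/bottom distances are unequal, which is how a single grid cell can predict a box that is not centered on it.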
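The remaining modules are implemented the same way as in YOLOv5; see the article YOLOv5 Algorithm Implementation (2): Model Construction for details.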
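Based on the model structure in Figure 1 and the parameters each module needs, we build the model configuration file. Each layer entry is described by four fields, [from, number, module, args]: the index of the input layer(s) (-1 is the previous layer), the number of repeats, the module name, and the module arguments.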
# Parameters
nc: 80 # number of classes
depth_multiple: 1.00 # model depth (scales the number of repeated modules)
width_multiple: 1.00 # model width (scales the number of module channels)
# YOLOv8.0l backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f, [512, True]]
- [-1, 1, Conv, [512, 3, 2]] # 7-P5/32
- [-1, 3, C2f, [512, True]]
- [-1, 1, SPPF, [512, 5]] # 9
# YOLOv8.0l head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 12
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 15 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [512]] # 21 (P5/32-large)
- [[15, 18, 21], 1, Detect_v8, [nc]] # Detect(P3, P4, P5)
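Because Concat and Detect refer to other layers by index, it helps to enumerate the layers and check the references. The sketch below hard-codes the module names from the configuration above (args omitted) as plain Python lists; it does not parse the YAML file.

```python
backbone = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF"]
head = ["nn.Upsample", "Concat", "C2f", "nn.Upsample", "Concat", "C2f",
        "Conv", "Concat", "C2f", "Conv", "Concat", "C2f", "Detect_v8"]

layers = backbone + head       # layer index = position in this list
detect_inputs = [15, 18, 21]   # the `from` field of the Detect layer

for i, name in enumerate(layers):
    mark = "  <- Detect input" if i in detect_inputs else ""
    print(f"{i:2d} {name}{mark}")

# every Detect input should be the C2f that closes one scale of the head
assert all(layers[i] == "C2f" for i in detect_inputs)
```

The same enumeration confirms the Concat references: layers 4 and 6 are the backbone C2f outputs at P3 and P4, layer 9 is SPPF, and layer 12 is the first C2f of the head.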
For how the model is built from this configuration, see the article YOLOv5 Algorithm Implementation (2): Model Construction.
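The two multipliers at the top of the configuration are applied while the model is built. The sketch below shows the usual ultralytics-style scaling under stated assumptions: repeats are scaled by depth_multiple and rounded (never below 1), and channels are scaled by width_multiple and rounded up to a multiple of 8. The helper names and the divisor of 8 are assumptions, and the build code in the YOLOv5 article may differ in detail.

```python
import math

def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor (assumed divisor: 8)."""
    return math.ceil(x / divisor) * divisor

def scale_layer(repeats, channels, depth_multiple, width_multiple):
    """Return (scaled repeats, scaled output channels) for one config entry."""
    # only layers that repeat more than once are affected by the depth multiplier
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    c = make_divisible(channels * width_multiple, 8)
    return n, c

print(scale_layer(6, 256, 1.00, 1.00))  # (6, 256): the configuration above is left unchanged
print(scale_layer(6, 256, 0.33, 0.25))  # (2, 64): hypothetical multipliers for a smaller model
```

With both multipliers at 1.00, as in the configuration above, the numbers in the file are used as written.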