Paper link: VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
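The main innovation and contributions of the paper "VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION" are as follows:
Innovation: a very deep network architecture. Instead of large convolution kernels, VGG stacks small 3x3 convolutions throughout the network, which makes it practical to push the depth to 16 to 19 weight layers.
Main contribution: performance gains and influence. The deeper networks clearly improved ImageNet accuracy (the team's ILSVRC 2014 submission took first place in localization and second place in classification), and the simple, uniform design made VGG a widely used feature extractor in later work.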
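Why stack 3x3 kernels instead of using one large kernel? A little arithmetic makes the trade-off concrete. The sketch below is plain Python with hypothetical helper names chosen for this illustration. It computes the receptive field of n stacked stride-1 3x3 convolutions and compares the weight count of the stack with a single kernel covering the same region, assuming C input and C output channels and ignoring biases.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions of size `kernel`."""
    # Each additional layer widens the field by (kernel - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def stack_weights(num_layers: int, channels: int, kernel: int = 3) -> int:
    """Weights (no biases) of a stack of kernel x kernel convs mapping C -> C channels."""
    return num_layers * kernel * kernel * channels * channels


def single_weights(kernel: int, channels: int) -> int:
    """Weights (no biases) of one kernel x kernel conv mapping C -> C channels."""
    return kernel * kernel * channels * channels


if __name__ == "__main__":
    C = 512
    for n in (2, 3):
        rf = stacked_receptive_field(n)
        print(f"{n} stacked 3x3 convs: receptive field {rf}x{rf}, "
              f"{stack_weights(n, C):,} weights vs {single_weights(rf, C):,} for one {rf}x{rf} conv")
```

Three stacked 3x3 layers see the same 7x7 region as a single 7x7 layer while using 27C² instead of 49C² weights, and they apply three ReLUs instead of one, which the paper argues makes the decision function more discriminative.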
Input layer: the network takes a 224 x 224 x 3 input image, i.e. 224 pixels wide and 224 pixels high, with 3 color channels (red, green and blue).
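First convolutional block (this walkthrough follows VGG-16, configuration "D" in the code): the input image first passes through two 3x3 convolutional layers with 64 filters each. Every convolution uses stride 1 and padding 1, so it keeps the spatial size unchanged, and is followed by a ReLU activation. The block ends with a 2x2 max-pooling layer with stride 2, which halves the size to 112 x 112 x 64.
Second and third convolutional blocks: the second block has two 3x3 convolutional layers with 128 filters, and the third block has three 3x3 convolutional layers with 256 filters. Every convolution is followed by a ReLU, and each block ends with the same kind of max-pooling layer, reducing the size to 56 x 56 x 128 and then 28 x 28 x 256.
Fourth and fifth convolutional blocks: each contains three 3x3 convolutional layers with 512 filters, each followed by a ReLU. The max-pooling layer at the end of each block reduces the size further to 14 x 14 x 512 and then 7 x 7 x 512.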
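The feature-map sizes can be checked with the standard output-size formula out = floor((in + 2p - k) / s) + 1. The plain-Python sketch below (hypothetical helper names, no PyTorch needed) applies it to VGG-16's layer list, configuration "D" in the code: 3x3 convolutions with stride 1 and padding 1 keep the size, and 2x2 max pooling with stride 2 halves it.

```python
# VGG-16 layer list: numbers are conv output channels, "M" is 2x2 max pooling.
CFG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(cfg, size: int = 224, channels: int = 3):
    """Return the (channels, height, width) after every max-pooling layer."""
    shapes = []
    for v in cfg:
        if v == "M":
            size = out_size(size, kernel=2, stride=2)  # 2x2 max pool, stride 2: halves H and W
            shapes.append((channels, size, size))
        else:
            size = out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv, pad 1: size unchanged
            channels = v
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes(CFG_D):
        print(shape)
```

The shapes are printed in PyTorch's (C, H, W) order: (64, 112, 112), (128, 56, 56), (256, 28, 28), (512, 14, 14) and (512, 7, 7).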
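Fully connected layers: the output of the last pooling layer (7 x 7 x 512) is flattened into a 25,088-dimensional vector, and the network then moves on to three fully connected layers. The first two have 4096 units each and use ReLU activations, and each of them is followed by a dropout layer (p = 0.5) to reduce overfitting.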
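The fully connected layers hold most of the network's parameters. A quick count in plain Python (weights plus biases; the helper name is hypothetical) for the flattened size 7 x 7 x 512 = 25,088 and the 4096-4096-1000 head:

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


FLATTEN = 512 * 7 * 7  # 25,088 features after flattening the last pooling output
FC_LAYERS = [(FLATTEN, 4096), (4096, 4096), (4096, 1000)]

if __name__ == "__main__":
    total = 0
    for fan_in, fan_out in FC_LAYERS:
        n = linear_params(fan_in, fan_out)
        total += n
        print(f"Linear({fan_in}, {fan_out}): {n:,} parameters")
    print(f"classifier total: {total:,}")
```

The first layer alone has 102,764,544 parameters, and the three layers together have 123,642,856, which is most of VGG-16's roughly 138 million.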
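Output layer: the last fully connected layer has 1000 units, one per class, followed by a softmax that turns the scores into a probability distribution over the classes. (The PyTorch model in the code returns raw logits; during training the softmax is applied implicitly by nn.CrossEntropyLoss, and at inference it can be applied explicitly with torch.softmax.)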
Target: the network produces a classification output; for the image provided, the corresponding class is "car".
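The softmax step and the final prediction can be illustrated in plain Python. The sketch below uses the numerically stable form of softmax (subtracting the largest logit before exponentiating, which does not change the result) and then picks the most probable class. The three labels and logits are made up for illustration; the real network outputs 1000 ImageNet scores.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Return the label with the highest probability, together with that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    labels = ["cat", "car", "dog"]  # hypothetical class names
    logits = [1.0, 3.5, 0.2]        # hypothetical network outputs
    print(predict(logits, labels))
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the raw logits, which is why the PyTorch model can return logits and leave the softmax to the loss function or to inference code.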
```python
from typing import cast

import torch
from torch import nn
from torch.hub import load_state_dict_from_url


class VGG(nn.Module):
    def __init__(
        self,
        features,
        num_classes=1000,
        init_weights=True,
        dropout=0.5,
    ):
        """
        :param features: feature-extraction layers, i.e. the stack of VGG blocks
        :param num_classes: number of output classes, default 1000
        :param init_weights: whether to initialize the parameters, default True
        :param dropout: dropout probability, default 0.5
        """
        super().__init__()
        self.features = features
        # The output size must be the tuple (7, 7). Writing (7 * 7) gives the int 49,
        # which produces a 49 x 49 map and breaks the 512 * 7 * 7 Linear layer below.
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    # Kaiming (He) initialization, suited to ReLU networks
                    nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
                    if m.bias is not None:
                        nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.Linear):
                    nn.init.normal_(m.weight, 0, 0.01)
                    nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)     # (N, 512, 7, 7) for a 224 x 224 input
        x = self.avgpool(x)      # force a 7 x 7 map whatever the input size
        x = torch.flatten(x, 1)  # (N, 512 * 7 * 7)
        x = self.classifier(x)   # (N, num_classes) raw logits, no softmax
        return x
```
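The avgpool layer in the VGG class resizes the feature map to a fixed 7 x 7, so the classifier can also accept feature maps from inputs larger than 224 x 224 (a 448 x 448 input, for example, yields a 14 x 14 map). The 1-D plain-Python sketch below shows how adaptive average pooling splits the input into bins. It assumes PyTorch's bin boundaries start = floor(i * n / out) and end = ceil((i + 1) * n / out); the 2-D layer applies the same rule independently along height and width. The function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def adaptive_avg_pool1d(values, output_size):
    """Average `values` into `output_size` bins (bins may overlap when sizes don't divide)."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size            # floor(i * n / out)
        end = ceil_div((i + 1) * n, output_size)  # ceil((i + 1) * n / out)
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


if __name__ == "__main__":
    print(adaptive_avg_pool1d(list(range(14)), 7))  # [0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5]
    print(adaptive_avg_pool1d(list(range(7)), 7))   # identity when sizes already match
```

With a 224 x 224 input the map is already 7 x 7, so the layer acts as an identity; with 14 positions per side it averages non-overlapping windows of width 2.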
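```python
def make_layers(cfg, batch_norm=False):
    """Build the convolutional feature extractor from a config list.

    Integers are the output channels of a 3x3 convolution (padding=1, so the spatial
    size is preserved); "M" inserts a 2x2 max-pooling layer with stride 2.
    """
    layers = []
    in_channels = 3  # RGB input
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            v = cast(int, v)
            conv2d = nn.Conv2d(in_channels=in_channels, out_channels=v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(num_features=v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
```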
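Because make_layers appends (conv, ReLU) or (conv, BatchNorm, ReLU) for every convolution, the position of each convolution inside the nn.Sequential depends on batch_norm. These positions become state-dict keys such as features.0.weight, so a checkpoint saved with batch norm does not line up with a model built without it, and vice versa. The plain-Python sketch below (hypothetical helper, no PyTorch required) replays make_layers' bookkeeping to list the indices of the convolution layers.

```python
# VGG-16 layer list ("M" = max pooling)
CFG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def conv_indices(cfg, batch_norm=False):
    """Indices of the Conv2d layers inside the Sequential built by make_layers."""
    indices = []
    position = 0
    for v in cfg:
        if v == "M":
            position += 1  # a single MaxPool2d
        else:
            indices.append(position)
            position += 3 if batch_norm else 2  # Conv2d (+ BatchNorm2d) + ReLU
    return indices


if __name__ == "__main__":
    print("plain:", conv_indices(CFG_D))
    print("bn:   ", conv_indices(CFG_D, batch_norm=True))
```

For VGG-16 the convolutions sit at 0, 2, 5, 7, ... without batch norm but at 0, 3, 7, 10, ... with it, which is why the "bn" and "normal" weight URLs are not interchangeable.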
```python
# Layer configurations from the paper: "A" = VGG-11, "B" = VGG-13, "D" = VGG-16, "E" = VGG-19.
# Numbers are conv output channels, "M" is max pooling.
cfgs = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
```
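The number in each model name counts its weight layers: the convolutions in the config plus the three fully connected layers. The plain-Python sketch below (hypothetical helper names) derives both the depth and the parameter count, weights plus biases and without batch normalization, from the same configuration lists, as a back-of-the-envelope check.

```python
CFGS = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def count_vgg(cfg, num_classes=1000):
    """Return (weight layers, parameters) for a VGG config without batch norm."""
    params, in_ch, convs = 0, 3, 0
    for v in cfg:
        if v == "M":
            continue  # pooling has no parameters
        params += 3 * 3 * in_ch * v + v  # 3x3 kernel weights + bias
        in_ch, convs = v, convs + 1
    for fan_in, fan_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]:
        params += fan_in * fan_out + fan_out  # fully connected weights + bias
    return convs + 3, params


if __name__ == "__main__":
    for name, key in [("VGG-11", "A"), ("VGG-13", "B"), ("VGG-16", "D"), ("VGG-19", "E")]:
        depth, params = count_vgg(CFGS[key])
        print(f"{name}: {depth} weight layers, {params:,} parameters")
```

The totals (about 133, 133, 138 and 144 million) agree with the rounded parameter counts the paper reports for configurations A, B, D and E; most of the growth from VGG-11 to VGG-19 comes from the extra 512-channel convolutions.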
```python
model_config = {
    "VGG-11": {
        "cfg": "A",
        "model_url": {
            "normal": "https://download.pytorch.org/models/vgg11-8a719046.pth",
            "bn": "https://download.pytorch.org/models/vgg11_bn-6002323d.pth",
        },
    },
    "VGG-13": {
        "cfg": "B",
        "model_url": {
            "normal": "https://download.pytorch.org/models/vgg13-19584684.pth",
            "bn": "https://download.pytorch.org/models/vgg13_bn-abd245e5.pth",
        },
    },
    "VGG-16": {
        "cfg": "D",
        "model_url": {
            "normal": "https://download.pytorch.org/models/vgg16-397923af.pth",
            "bn": "https://download.pytorch.org/models/vgg16_bn-6c64b313.pth",
        },
    },
    "VGG-19": {
        "cfg": "E",
        "model_url": {
            "normal": "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth",
            "bn": "https://download.pytorch.org/models/vgg19_bn-c79401a0.pth",
        },
    },
}
```
```python
def vgg(model_name, num_classes=1000, pretrained=True, batch_norm=True):
    """Build "VGG-11", "VGG-13", "VGG-16" or "VGG-19", optionally with ImageNet weights."""
    config = model_config[model_name]
    model = VGG(features=make_layers(cfg=cfgs[config["cfg"]], batch_norm=batch_norm))
    if pretrained:
        # BN and non-BN checkpoints have different layer layouts, so pick the matching one.
        url = config["model_url"]["bn" if batch_norm else "normal"]
        state_dict = load_state_dict_from_url(url=url, model_dir="./pretrained_model", progress=True)
        # The layer layout matches the checkpoint exactly, so strict loading works and
        # reports any mismatch that strict=False would silently ignore.
        model.load_state_dict(state_dict, strict=True)
    if num_classes != 1000:
        # Replace only the last layer so the pretrained features and hidden FC layers are reused.
        model.classifier[-1] = nn.Linear(in_features=4096, out_features=num_classes)
    return model
```
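The content above is meant to record my own learning process and to help with review. If you spot any mistakes, corrections are welcome. Thanks for reading.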