Smart Object Detection: Building a YOLOv4 Object Detection Platform in PyTorch with the MobileNet Family (v1, v2, v3)
Preface
Let's take a look at how to use the MobileNet family to build a YOLOv4 object detection platform.
Source Code Download
https://github.com/bubbliiiing/mobilenet-yolov4-pytorch (feel free to give it a star if you like it)
Network Replacement: Implementation Approach
1. Network Structure Analysis and Replacement Strategy
The overall YOLOv4 network can be divided into three parts: 1. the backbone feature-extraction network (Backbone), corresponding to CSPDarknet53 in the diagram; 2. the enhanced feature-extraction network, corresponding to SPP and PANet in the diagram; 3. the prediction head (YoloHead), which uses the extracted features to make predictions.
The first part, the backbone, performs preliminary feature extraction; from it we obtain three preliminary effective feature layers. The second part, the enhanced feature-extraction network, fuses these three preliminary effective feature layers to extract better features, yielding three improved effective feature layers. The third part, the prediction head, uses the improved effective feature layers to produce the prediction results.
Of the three parts, the first and second are the easier ones to modify. The third leaves little to change, since it is merely a combination of 3x3 and 1x1 convolutions.
The MobileNet family are classification networks whose backbone portion performs feature extraction. We can therefore use a MobileNet in place of YOLOv4's CSPDarknet53: take three feature layers from it with the same shapes as the three preliminary effective feature layers, feed them into the enhanced feature-extraction network, and the MobileNet is swapped into YOLOv4.
2. Introduction to the MobileNet Family
This post uses three backbone feature-extraction networks: MobileNetV1, MobileNetV2 and MobileNetV3.
a. MobileNetV1
MobileNet is a lightweight deep neural network that Google proposed for mobile phones and other embedded devices. Its core idea is the depthwise separable convolution block.
For a standard convolution: suppose a 3x3 convolutional layer has 16 input channels and 32 output channels. Each of the 32 3x3 kernels traverses the data of all 16 channels to produce the 32 output channels, for 16 x 32 x 3 x 3 = 4608 parameters.
With a depthwise separable convolution block, 16 3x3 kernels each traverse one of the 16 channels, producing 16 feature maps. Then, before any fusion step, 32 1x1 kernels traverse those 16 feature maps, for 16 x 3 x 3 + 16 x 32 x 1 x 1 = 656 parameters. Clearly, depthwise separable convolution sharply reduces the number of parameters in a model.
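As a quick check of this arithmetic, here is a minimal PyTorch sketch (bias terms omitted to match the counts above) that counts the parameters of both layouts:

import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

# standard 3x3 convolution: 16 input channels, 32 output channels
standard = nn.Conv2d(16, 32, 3, padding=1, bias=False)
# depthwise 3x3 (groups=16, one filter per channel) + pointwise 1x1
separable = nn.Sequential(
    nn.Conv2d(16, 16, 3, padding=1, groups=16, bias=False),
    nn.Conv2d(16, 32, 1, bias=False),
)
print(count_params(standard))   # 4608
print(count_params(separable))  # 656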
The figure below shows the structure of a depthwise separable convolution.
When building the model, the depthwise convolution can be implemented by setting the convolution's groups argument to the number of input channels (in_filters); a 1x1 convolution is then used to adjust the number of channels.
Intuitively, each 3x3 kernel is only one channel thick: it slides over the input tensor channel by channel, and each pass produces one output channel. Once the depthwise convolution is done, a 1x1 convolution adjusts the depth.
The table below shows the MobileNet architecture, where Conv dw denotes the depthwise convolution; each one is followed by a 1x1 convolution for channel processing.
The table above shows MobileNetV1-1.0. Since I could not find PyTorch weights for MobileNetV1-1.0 and only have weights for MobileNetV1-0.25, this post uses MobileNetV1-0.25.
MobileNetV1-0.25 is MobileNetV1-1.0 with every channel count compressed to 1/4. For YOLOv4, we need to take its last three effective feature layers, at three different shapes, for enhanced feature extraction.
In the code, we extract them as out1, out2, out3.
import torch
import torch.nn as nn

def conv_bn(inp, oup, stride = 1):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )

def conv_dw(inp, oup, stride = 1):
    return nn.Sequential(
        # depthwise 3x3 convolution: groups=inp, one filter per input channel
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.ReLU6(inplace=True),

        # pointwise 1x1 convolution to adjust the channel count
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True),
    )

class MobileNetV1(nn.Module):
    def __init__(self):
        super(MobileNetV1, self).__init__()
        self.stage1 = nn.Sequential(
            # 640,640,3 -> 320,320,32
            conv_bn(3, 32, 2),
            # 320,320,32 -> 320,320,64
            conv_dw(32, 64, 1),
            # 320,320,64 -> 160,160,128
            conv_dw(64, 128, 2),
            conv_dw(128, 128, 1),
            # 160,160,128 -> 80,80,256
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
        )
        # 80,80,256 -> 40,40,512
        self.stage2 = nn.Sequential(
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
        )
        # 40,40,512 -> 20,20,1024
        self.stage3 = nn.Sequential(
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
        )
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avg(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x

def mobilenet_v1(pretrained=False, progress=True):
    model = MobileNetV1()
    if pretrained:
        print("mobilenet_v1 has no pretrained model")
    return model

if __name__ == "__main__":
    from torchsummary import summary

    # use device to run the network on GPU if available, otherwise CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = mobilenet_v1().to(device)
    summary(model, input_size=(3, 416, 416))
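With the 416x416 input used by summary above, the three effective feature layers produced by stage1, stage2 and stage3 come out as 52,52,256, 26,26,512 and 13,13,1024 (the in-code shape comments trace the same strides for a 640x640 input).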
b. MobileNetV2
MobileNetV2 is an upgraded version of MobileNet. Its defining feature is the inverted residual block (Inverted resblock); the entire MobileNetV2 is built from these blocks.
An inverted residual block has two parts. The main branch (left) first uses a 1x1 convolution to expand the channel dimension, then a 3x3 depthwise separable convolution to extract features, and finally a 1x1 convolution to reduce the dimension again. The residual branch (right) connects the input directly to the output.
The overall network structure is shown below (the Inverted resblock performs exactly the operations described above):
For YOLOv4, we need to take its last three effective feature layers, at three different shapes, for enhanced feature extraction.
In the code, we extract them as out1, out2, out3.
from torch import nn
# in recent torchvision versions load_state_dict_from_url lives in torch.hub
# (older code imported it from torchvision.models.utils, which has been removed)
from torch.hub import load_state_dict_from_url

model_urls = {
    'mobilenet_v2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
}

def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # make sure that rounding down does not go down by more than 10%
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise convolution to expand the channels
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # 3x3 depthwise convolution
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # 1x1 pointwise-linear convolution to project back down
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)

        features = [ConvBNReLU(3, input_channel, stride=2)]
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        self.features = nn.Sequential(*features)

        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])
        x = self.classifier(x)
        return x

def mobilenet_v2(pretrained=False, progress=True):
    model = MobileNetV2()
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['mobilenet_v2'], model_dir="model_data",
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

if __name__ == "__main__":
    print(mobilenet_v2())
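For YOLOv4 we will later slice self.features into three chunks (features[:7], features[7:14] and features[14:18]); with a 416x416 input these yield effective feature layers of shape 52,52,32, 26,26,96 and 13,13,320.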
c. MobileNetV3
MobileNetV3 uses a special bneck structure.
The bneck structure is shown in the figure below:
It combines the following four features: a. MobileNetV2's inverted residual structure with linear bottleneck (the inverted residual with linear bottleneck).
That is, a 1x1 convolution first expands the dimension before the following operations, and a residual connection is included.
b. MobileNetV1's depthwise separable convolutions (depthwise separable convolutions).
After the input 1x1 convolution expands the dimension, a 3x3 depthwise separable convolution is applied.
c. A lightweight attention module.
This attention mechanism works by re-weighting each channel.
d. h-swish in place of the swish function. The blocks use the h-swish activation instead of swish, reducing computation and improving performance.
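Concretely, swish(x) = x * sigmoid(x), while h-swish(x) = x * ReLU6(x + 3) / 6. The piecewise-linear ReLU6 approximates the sigmoid while avoiding its exponential, which is cheaper on mobile hardware; this is exactly what the h_sigmoid and h_swish modules in the code below implement.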
The figure below shows the structure of the whole MobileNetV3:
How do we read this table? Go column by column. The first column, Input, gives the shape of each feature layer as it changes through MobileNetV3. The second column, Operator, gives the block each feature layer is about to pass through; you can see that feature extraction in MobileNetV3 goes through many bneck blocks. The third and fourth columns give, respectively, the expanded channel count inside the bneck's inverted residual and the channel count of the feature layer entering the bneck. The fifth column, SE, indicates whether the attention mechanism is used in that layer. The sixth column, NL, gives the activation type: HS for h-swish, RE for ReLU. The seventh column, s, gives the stride used by each block. For example, a row with a bneck operator, a 5x5 kernel, SE checked, NL = HS and s = 2 is an attention-equipped bneck with a 5x5 depthwise kernel and h-swish activation that halves the feature map's height and width.
For YOLOv4, we need to take its last three effective feature layers, at three different shapes, for enhanced feature extraction.
In the code, we extract them as out1, out2, out3.
import torch.nn as nn
import math
import torch

def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6

class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)

class SELayer(nn.Module):
    def __init__(self, channel, reduction=4):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
                nn.Linear(channel, _make_divisible(channel // reduction, 8)),
                nn.ReLU(inplace=True),
                nn.Linear(_make_divisible(channel // reduction, 8), channel),
                h_sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y

def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        h_swish()
    )

def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        h_swish()
    )

class InvertedResidual(nn.Module):
    def __init__(self, inp, hidden_dim, oup, kernel_size, stride, use_se, use_hs):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]

        self.identity = stride == 1 and inp == oup

        if inp == hidden_dim:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Identity(),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Identity(),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)

class MobileNetV3(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.):
        super(MobileNetV3, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = [
            # k,   t,   c, SE, HS, s
            [3,   1,  16, 0, 0, 1],
            [3,   4,  24, 0, 0, 2],
            [3,   3,  24, 0, 0, 1],
            [5,   3,  40, 1, 0, 2],
            [5,   3,  40, 1, 0, 1],
            [5,   3,  40, 1, 0, 1],
            [3,   6,  80, 0, 1, 2],
            [3, 2.5,  80, 0, 1, 1],
            [3, 2.3,  80, 0, 1, 1],
            [3, 2.3,  80, 0, 1, 1],
            [3,   6, 112, 1, 1, 1],
            [3,   6, 112, 1, 1, 1],
            [5,   6, 160, 1, 1, 2],
            [5,   6, 160, 1, 1, 1],
            [5,   6, 160, 1, 1, 1]
        ]

        input_channel = _make_divisible(16 * width_mult, 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]

        block = InvertedResidual
        for k, t, c, use_se, use_hs, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 8)
            exp_size = _make_divisible(input_channel * t, 8)
            layers.append(block(input_channel, exp_size, output_channel, k, s, use_se, use_hs))
            input_channel = output_channel
        self.features = nn.Sequential(*layers)

        self.conv = conv_1x1_bn(input_channel, exp_size)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        output_channel = _make_divisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280
        self.classifier = nn.Sequential(
            nn.Linear(exp_size, output_channel),
            h_swish(),
            nn.Dropout(0.2),
            nn.Linear(output_channel, num_classes),
        )

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

def mobilenet_v3(pretrained=False, **kwargs):
    model = MobileNetV3(**kwargs)
    if pretrained:
        state_dict = torch.load('./model_data/mobilenetv3-large-1cd25616.pth')
        model.load_state_dict(state_dict, strict=True)
    return model
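Again, for YOLOv4 we will later slice self.features into three chunks (features[:7], features[7:13] and features[13:16]); with a 416x416 input these yield effective feature layers of shape 52,52,40, 26,26,112 and 13,13,160.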
3. Plugging the Backbone Features into the YOLOv4 Network
For YOLOv4, we need to use the three effective feature layers obtained from the backbone to build the enhanced feature pyramid.
With the mobilenet_v1, mobilenet_v2 and mobilenet_v3 functions defined in the previous step, we can obtain the three effective feature layers of each MobileNet.
These three effective feature layers can replace the effective feature layers that YOLOv4's original CSPDarknet53 backbone provided.
To cut the parameter count further, we can replace the ordinary convolutions used in YOLOv4's enhanced feature-extraction network with depthwise separable convolutions.
The implementation is as follows:
import torch
import torch.nn as nn
from collections import OrderedDict

from nets.mobilenet_v1 import mobilenet_v1
from nets.mobilenet_v2 import mobilenet_v2
from nets.mobilenet_v3 import mobilenet_v3

class MobileNetV1(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV1, self).__init__()
        self.model = mobilenet_v1(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.stage1(x)
        out4 = self.model.stage2(out3)
        out5 = self.model.stage3(out4)
        return out3, out4, out5

class MobileNetV2(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV2, self).__init__()
        self.model = mobilenet_v2(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.features[:7](x)
        out4 = self.model.features[7:14](out3)
        out5 = self.model.features[14:18](out4)
        return out3, out4, out5

class MobileNetV3(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV3, self).__init__()
        self.model = mobilenet_v3(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.features[:7](x)
        out4 = self.model.features[7:13](out3)
        out5 = self.model.features[13:16](out4)
        return out3, out4, out5

def conv2d(filter_in, filter_out, kernel_size, groups=1, stride=1):
    pad = (kernel_size - 1) // 2 if kernel_size else 0
    return nn.Sequential(OrderedDict([
        ("conv", nn.Conv2d(filter_in, filter_out, kernel_size=kernel_size, stride=stride, padding=pad, groups=groups, bias=False)),
        ("bn", nn.BatchNorm2d(filter_out)),
        ("relu", nn.ReLU6(inplace=True)),
    ]))

def conv_dw(filter_in, filter_out, stride = 1):
    return nn.Sequential(
        nn.Conv2d(filter_in, filter_in, 3, stride, 1, groups=filter_in, bias=False),
        nn.BatchNorm2d(filter_in),
        nn.ReLU6(inplace=True),

        nn.Conv2d(filter_in, filter_out, 1, 1, 0, bias=False),
        nn.BatchNorm2d(filter_out),
        nn.ReLU6(inplace=True),
    )

#---------------------------------------------------#
#   SPP structure: max-pool with kernels of several
#   sizes, then stack the results channel-wise
#---------------------------------------------------#
class SpatialPyramidPooling(nn.Module):
    def __init__(self, pool_sizes=[5, 9, 13]):
        super(SpatialPyramidPooling, self).__init__()
        self.maxpools = nn.ModuleList([nn.MaxPool2d(pool_size, 1, pool_size//2) for pool_size in pool_sizes])

    def forward(self, x):
        features = [maxpool(x) for maxpool in self.maxpools[::-1]]
        features = torch.cat(features + [x], dim=1)
        return features

#---------------------------------------------------#
#   convolution + upsampling
#---------------------------------------------------#
class Upsample(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Upsample, self).__init__()
        self.upsample = nn.Sequential(
            conv2d(in_channels, out_channels, 1),
            nn.Upsample(scale_factor=2, mode='nearest')
        )

    def forward(self, x):
        x = self.upsample(x)
        return x

#---------------------------------------------------#
#   three-convolution block
#---------------------------------------------------#
def make_three_conv(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
    )
    return m

#---------------------------------------------------#
#   five-convolution block
#---------------------------------------------------#
def make_five_conv(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
    )
    return m

#---------------------------------------------------#
#   final yolov4 output heads
#---------------------------------------------------#
def yolo_head(filters_list, in_filters):
    m = nn.Sequential(
        conv_dw(in_filters, filters_list[0]),

        nn.Conv2d(filters_list[0], filters_list[1], 1),
    )
    return m

#---------------------------------------------------#
#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, num_anchors, num_classes, backbone="mobilenetv2", pretrained=False):
        super(YoloBody, self).__init__()
        #  backbone
        if backbone == "mobilenetv1":
            self.backbone = MobileNetV1(pretrained=pretrained)
            alpha = 1
            in_filters = [256, 512, 1024]
        elif backbone == "mobilenetv2":
            self.backbone = MobileNetV2(pretrained=pretrained)
            alpha = 1
            in_filters = [32, 96, 320]
        elif backbone == "mobilenetv3":
            self.backbone = MobileNetV3(pretrained=pretrained)
            alpha = 1
            in_filters = [40, 112, 160]
        else:
            raise ValueError('Unsupported backbone - `{}`, Use mobilenetv1, mobilenetv2, mobilenetv3.'.format(backbone))

        self.conv1           = make_three_conv([int(512*alpha), int(1024*alpha)], in_filters[2])
        self.SPP             = SpatialPyramidPooling()
        self.conv2           = make_three_conv([int(512*alpha), int(1024*alpha)], int(2048*alpha))

        self.upsample1       = Upsample(int(512*alpha), int(256*alpha))
        self.conv_for_P4     = conv2d(in_filters[1], int(256*alpha), 1)
        self.make_five_conv1 = make_five_conv([int(256*alpha), int(512*alpha)], int(512*alpha))

        self.upsample2       = Upsample(int(256*alpha), int(128*alpha))
        self.conv_for_P3     = conv2d(in_filters[0], int(128*alpha), 1)
        self.make_five_conv2 = make_five_conv([int(128*alpha), int(256*alpha)], int(256*alpha))

        # num_anchors*(5+num_classes) = 3*(4+1+20) = 75 for VOC
        # (4 box parameters + 1 objectness score + class scores)
        final_out_filter2    = num_anchors * (5 + num_classes)
        self.yolo_head3      = yolo_head([int(256*alpha), final_out_filter2], int(128*alpha))

        self.down_sample1    = conv_dw(int(128*alpha), int(256*alpha), stride=2)
        self.make_five_conv3 = make_five_conv([int(256*alpha), int(512*alpha)], int(512*alpha))

        final_out_filter1    = num_anchors * (5 + num_classes)
        self.yolo_head2      = yolo_head([int(512*alpha), final_out_filter1], int(256*alpha))

        self.down_sample2    = conv_dw(int(256*alpha), int(512*alpha), stride=2)
        self.make_five_conv4 = make_five_conv([int(512*alpha), int(1024*alpha)], int(1024*alpha))

        final_out_filter0    = num_anchors * (5 + num_classes)
        self.yolo_head1      = yolo_head([int(1024*alpha), final_out_filter0], int(512*alpha))

    def forward(self, x):
        #  backbone
        x2, x1, x0 = self.backbone(x)

        P5 = self.conv1(x0)
        P5 = self.SPP(P5)
        P5 = self.conv2(P5)

        P5_upsample = self.upsample1(P5)
        P4 = self.conv_for_P4(x1)
        P4 = torch.cat([P4, P5_upsample], dim=1)
        P4 = self.make_five_conv1(P4)

        P4_upsample = self.upsample2(P4)
        P3 = self.conv_for_P3(x2)
        P3 = torch.cat([P3, P4_upsample], dim=1)
        P3 = self.make_five_conv2(P3)

        P3_downsample = self.down_sample1(P3)
        P4 = torch.cat([P3_downsample, P4], dim=1)
        P4 = self.make_five_conv3(P4)

        P4_downsample = self.down_sample2(P4)
        P5 = torch.cat([P4_downsample, P5], dim=1)
        P5 = self.make_five_conv4(P5)

        out2 = self.yolo_head3(P3)
        out1 = self.yolo_head2(P4)
        out0 = self.yolo_head1(P5)

        return out0, out1, out2
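As a quick sanity check, here is a minimal sketch (assuming the classes above are importable as written; the test values of 3 anchors per scale and 20 VOC classes are hypothetical, giving 3 * (5 + 20) = 75 output channels per head) that pushes a dummy image through YoloBody and inspects the three output shapes:

import torch

# hypothetical test configuration: 3 anchors per scale, 20 VOC classes
model = YoloBody(num_anchors=3, num_classes=20, backbone="mobilenetv2")
dummy = torch.randn(1, 3, 416, 416)
out0, out1, out2 = model(dummy)
print(out0.shape)   # torch.Size([1, 75, 13, 13])
print(out1.shape)   # torch.Size([1, 75, 26, 26])
print(out2.shape)   # torch.Size([1, 75, 52, 52])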
Training Your Own YoloV4 Model
First download the repository from GitHub, unzip it, and open the folder with your editor. Be careful that the opened root directory is the directory where the files are stored; if it is not, the relative paths will be wrong and the code will not run.
I. Preparing the Dataset
This post trains in VOC format. You need to prepare your own dataset before training; if you don't have one, you can download the VOC12+07 dataset via the link on GitHub and experiment with that. Before training, put the annotation files into the Annotations folder under VOCdevkit/VOC2007.
Put the image files into the JPEGImages folder under VOCdevkit/VOC2007.
The dataset is now laid out correctly.
II. Processing the Dataset
After laying out the dataset, we need to process it to obtain the 2007_train.txt and 2007_val.txt used for training. This is done with voc_annotation.py in the repository root.
voc_annotation.py has several parameters to set: annotation_mode, classes_path, trainval_percent, train_percent and VOCdevkit_path. For a first training run you only need to modify classes_path.
'''
annotation_mode specifies what this script computes when run
annotation_mode = 0: the whole annotation pipeline, i.e. both the txt files in VOCdevkit/VOC2007/ImageSets and the 2007_train.txt / 2007_val.txt used for training
annotation_mode = 1: only the txt files in VOCdevkit/VOC2007/ImageSets
annotation_mode = 2: only the 2007_train.txt / 2007_val.txt used for training
'''
annotation_mode     = 0
'''
Must be modified: used to generate the object information in 2007_train.txt / 2007_val.txt
Just keep it consistent with the classes_path used for training and prediction
If the generated 2007_train.txt contains no object information,
the classes were not set correctly
Only takes effect when annotation_mode is 0 or 2
'''
classes_path        = 'model_data/voc_classes.txt'
'''
trainval_percent sets the ratio of (training set + validation set) to test set; by default (training + validation) : test = 9 : 1
train_percent sets the ratio of training set to validation set within (training set + validation set); by default training : validation = 9 : 1
Only takes effect when annotation_mode is 0 or 1
'''
trainval_percent    = 0.9
train_percent       = 0.9
'''
Points to the folder containing the VOC dataset
By default it points to the VOC dataset in the repository root
'''
VOCdevkit_path  = 'VOCdevkit'
classes_path points to the txt file listing the detection classes. Taking the VOC dataset as an example, the txt we use is:
When training on your own dataset, you can create your own cls_classes.txt and list the classes you want to detect in it, as in the sketch below.
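For example, for a hypothetical two-class dataset that only distinguishes cats and dogs, cls_classes.txt would simply contain one class name per line:

cat
dog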
III. Starting Network Training
voc_annotation.py has already generated 2007_train.txt and 2007_val.txt, so we can now start training. There are many training parameters; after downloading the repository, read through the comments carefully. The most important one is again classes_path in train.py.
classes_path points to the txt of detection classes, and it is the same txt as in voc_annotation.py! It must be modified when training on your own dataset!
After modifying classes_path, run train.py to start training; after several epochs, the weights will be written to the logs folder.
The backbone parameter selects the backbone feature-extraction network; you can choose between mobilenetv1, mobilenetv2 and mobilenetv3.
Before training, make sure the MobileNet version you chose matches the pretrained weights you load.
The other parameters work as follows:
#-------------------------------#
#   whether to use CUDA
#   set to False if you have no GPU
#-------------------------------#
Cuda = True
#--------------------------------------------------------#
#   you must modify classes_path before training so that
#   it matches your own dataset
#--------------------------------------------------------#
classes_path    = 'model_data/voc_classes.txt'
#---------------------------------------------------------------------#
#   anchors_path points to the txt file with the anchor boxes; usually left unchanged.
#   anchors_mask helps the code find the corresponding anchors; usually left unchanged.
#---------------------------------------------------------------------#
anchors_path    = 'model_data/yolo_anchors.txt'
anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
#------------------------------------------------------------------------------------------------------#
#   see the README for the weight files (Baidu Netdisk download). Pretrained weights are generic
#   across datasets, because the features they encode are generic.
#   Pretrained weights are necessary in 99% of cases; without them the weights are too random,
#   feature extraction is poor, and the training results suffer.
#   A dimension-mismatch warning when training on your own dataset is normal: the prediction
#   layers differ, so their dimensions naturally differ.
#   To resume from a checkpoint, set model_path to a weight file already trained into the logs folder.
#------------------------------------------------------------------------------------------------------#
model_path      = 'model_data/yolov4_mobilenet_v1_voc.pth'
#------------------------------------------------------#
#   input shape; must be a multiple of 32
#   (the backbone downsamples by a factor of 32, e.g. 416 -> 13)
#------------------------------------------------------#
input_shape     = [416, 416]
#-------------------------------#
#   the backbone feature-extraction network to use:
#   mobilenetv1
#   mobilenetv2
#   mobilenetv3
#   ghostnet
#-------------------------------#
backbone        = "mobilenetv1"
#----------------------------------#
#   whether to load pretrained weights for the backbone;
#   covers only the backbone and is independent of model_path
#----------------------------------#
pretrained      = False
#------------------------------------------------------#
#   YoloV4 tricks:
#   mosaic            mosaic data augmentation, True or False
#                     (mosaic proved unstable in practice, so it defaults to False)
#   Cosine_lr         cosine-annealing learning-rate schedule, True or False
#   label_smoothing   label smoothing, usually below 0.01, e.g. 0.01 or 0.005
#------------------------------------------------------#
mosaic              = False
Cosine_lr           = False
label_smoothing     = 0
IV. Predicting with the Training Results
Prediction uses two files: yolo.py and predict.py. First modify model_path and classes_path in yolo.py; these two parameters must be changed.
Again, the backbone parameter selects the backbone feature-extraction network; you can choose between mobilenetv1, mobilenetv2 and mobilenetv3.
model_path points to the trained weight file in the logs folder; classes_path points to the txt of detection classes.
Once these changes are made, you can run predict.py for detection. After it starts, enter an image path to run detection on that image.
