<p id="m2nkj"><option id="m2nkj"><big id="m2nkj"></big></option></p>
    <strong id="m2nkj"></strong>
    <ruby id="m2nkj"></ruby>

    <var id="m2nkj"></var>
    CeiT: A Faster-Training ViT with Multi-Layer Feature Extraction




    [GiantPandaCV Editor's Note]

    This work from SenseTime and Nanyang Technological University also uses convolutions to strengthen the model's ability to extract low-level features and to capture locality. Its core contribution is the LCA module, which captures multi-layer feature representations. Compared with DeiT, it trains faster.




    Introduction

    Previous Transformer architectures need large amounts of extra data or additional supervision (e.g., the distillation used in DeiT) to reach performance comparable to convolutional neural networks. To overcome this shortcoming, the authors combine ideas from CNNs to compensate for the Transformer's weaknesses and propose CeiT:

    (1) An Image-to-Tokens (I2T) module is designed to obtain the embeddings from low-level features.

    (2) The Feed-Forward module in the Transformer is replaced with a Locally-enhanced Feed-Forward (LeFF) module, which strengthens the correlation between neighboring tokens.

    (3) Layer-wise Class-token Attention (LCA) is used to capture multi-layer feature representations.

    With these modifications, the model improves in efficiency and generalization, and its convergence also improves (see the convergence comparison figure in the paper).

    Method

    1. Image-to-Tokens

    A convolution + pooling stem replaces the original ViT's direct split of the image into large 16×16 patches, so the tokens are extracted from a lower-level feature map instead.
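    As a quick sanity check on the token count, here is a minimal sketch assuming a 224×224 input, an overall ×4 downsampling from the conv + pool stem, and a patch size of 4 (values chosen to match the code later in this post, not stated explicitly here):

    # Assumed values: 224x224 input, stem downsamples by 4 overall, patch_size = 4
    image_size, patch_size = 224, 4
    feature_size = image_size // 4                   # 56x56 low-level feature map
    num_patches = (feature_size // patch_size) ** 2  # (56 // 4) ** 2 = 196
    print(num_patches)  # 196 -- the same sequence length ViT gets from 16x16 patches on a 224x224 image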

    2. LeFF

    The tokens are reshaped back into a feature map, a depthwise (separable) convolution adds local processing, and a Linear layer then maps the result back to tokens.

    3. LCA

    The first two changes are fairly conventional; the third is the more distinctive one: Layer-wise Class-token Attention, applied after all the Transformer layers (illustrated in the paper's LCA figure).

    The LCA module takes the class tokens produced by all Transformer blocks as input and applies a single MSA + FFN on top of them to obtain the final logits. The authors argue that this captures multi-level representations.

    Experiments

    The paper reports a comparison with SOTA models, ablations of I2T and LeFF, a study of LCA's effectiveness, and a convergence speed comparison (the corresponding tables and figures are omitted here).

    Code

    Module 1: I2T (Image-to-Tokens)

    # I2T: convolution + BatchNorm + max-pooling stem, then patch embedding on the low-level feature map
    self.conv = nn.Sequential(
        nn.Conv2d(in_channels, out_channels, conv_kernel, stride, 4),
        nn.BatchNorm2d(out_channels),
        nn.MaxPool2d(pool_kernel, stride)
    )

    feature_size = image_size // 4  # the conv and the max-pool each downsample by the stride (2), i.e. x4 overall

    assert feature_size % patch_size == 0, 'Image dimensions must be divisible by the patch size.'
    num_patches = (feature_size // patch_size) ** 2
    patch_dim = out_channels * patch_size ** 2
    self.to_patch_embedding = nn.Sequential(
        Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=patch_size, p2=patch_size),
        nn.Linear(patch_dim, dim),
    )
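    To make the fragment above runnable, here is a minimal standalone sketch; all hyperparameter values (a 7×7 stride-2 conv, a 3×3 stride-2 max-pool, out_channels=32, patch_size=4, dim=192) are illustrative assumptions rather than values taken from the excerpt:

    import torch
    import torch.nn as nn
    from einops.layers.torch import Rearrange

    class I2T(nn.Module):
        # Standalone version of the fragment above; hyperparameters are assumptions.
        def __init__(self, image_size=224, patch_size=4, in_channels=3, out_channels=32, dim=192):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 7, 2, 4),   # 224 -> 113
                nn.BatchNorm2d(out_channels),
                nn.MaxPool2d(3, 2)                               # 113 -> 56
            )
            patch_dim = out_channels * patch_size ** 2
            self.to_patch_embedding = nn.Sequential(
                Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=patch_size, p2=patch_size),
                nn.Linear(patch_dim, dim),
            )

        def forward(self, img):
            return self.to_patch_embedding(self.conv(img))   # (B, 196, dim) for a 224x224 input

    print(I2T()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 196, 192])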

    Module 2: LeFF

    class LeFF(nn.Module):

        def __init__(self, dim=192, scale=4, depth_kernel=3):
            super().__init__()

            scale_dim = dim * scale
            # Linear up-projection, then reshape the token sequence into a 14x14 feature map
            self.up_proj = nn.Sequential(
                nn.Linear(dim, scale_dim),
                Rearrange('b n c -> b c n'),
                nn.BatchNorm1d(scale_dim),
                nn.GELU(),
                Rearrange('b c (h w) -> b c h w', h=14, w=14)
            )

            # Depthwise convolution (groups = channels) injects locality between neighboring tokens
            self.depth_conv = nn.Sequential(
                nn.Conv2d(scale_dim, scale_dim, kernel_size=depth_kernel, padding=1, groups=scale_dim, bias=False),
                nn.BatchNorm2d(scale_dim),
                nn.GELU(),
                Rearrange('b c h w -> b (h w) c', h=14, w=14)
            )

            # Linear down-projection back to the original token dimension
            self.down_proj = nn.Sequential(
                nn.Linear(scale_dim, dim),
                Rearrange('b n c -> b c n'),
                nn.BatchNorm1d(dim),
                nn.GELU(),
                Rearrange('b c n -> b n c')
            )

        def forward(self, x):
            x = self.up_proj(x)
            x = self.depth_conv(x)
            x = self.down_proj(x)
            return x


    class TransformerLeFF(nn.Module):
        def __init__(self, dim, depth, heads, dim_head, scale=4, depth_kernel=3, dropout=0.):
            super().__init__()
            self.layers = nn.ModuleList([])
            for _ in range(depth):
                self.layers.append(nn.ModuleList([
                    Residual(PreNorm(dim, Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout))),
                    Residual(PreNorm(dim, LeFF(dim, scale, depth_kernel)))
                ]))

        def forward(self, x):
            c = list()  # collects the class token produced after every layer's attention (used later by LCA)
            for attn, leff in self.layers:
                x = attn(x)
                cls_tokens = x[:, 0]
                c.append(cls_tokens)
                x = leff(x[:, 1:])  # LeFF only processes the patch tokens, not the class token
                x = torch.cat((cls_tokens.unsqueeze(1), x), dim=1)
            return x, torch.stack(c).transpose(0, 1)  # (B, N+1, dim), (B, depth, dim)
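    A minimal usage sketch for TransformerLeFF (assuming the Residual, PreNorm and Attention helpers from the repository linked under References are in scope; the sizes are illustrative):

    # dim=192 and depth=12 are illustrative; LeFF's hard-coded 14x14 reshape requires 196 patch tokens
    transformer = TransformerLeFF(dim=192, depth=12, heads=3, dim_head=64)
    x = torch.randn(2, 197, 192)   # 196 patch tokens from I2T plus one prepended class token
    x, cls_tokens = transformer(x)
    print(x.shape)                 # torch.Size([2, 197, 192])
    print(cls_tokens.shape)        # torch.Size([2, 12, 192]) -- one class token per layer, fed to LCA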

    Module 3: LCA

    class LCAttention(nn.Module):
        def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
            super().__init__()
            inner_dim = dim_head * heads
            project_out = not (heads == 1 and dim_head == dim)

            self.heads = heads
            self.scale = dim_head ** -0.5

            self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

            self.to_out = nn.Sequential(
                nn.Linear(inner_dim, dim),
                nn.Dropout(dropout)
            ) if project_out else nn.Identity()

        def forward(self, x):
            b, n, _, h = *x.shape, self.heads
            qkv = self.to_qkv(x).chunk(3, dim=-1)
            q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), qkv)
            q = q[:, :, -1, :].unsqueeze(2)  # only the last (Lth) class token is used as the query

            dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale

            attn = dots.softmax(dim=-1)

            out = einsum('b h i j, b h j d -> b h i d', attn, v)
            out = rearrange(out, 'b h n d -> b n (h d)')
            out = self.to_out(out)
            return out


    class LCA(nn.Module):
        # The residual connection is omitted here; the paper does not explicitly mention one,
        # although the code would also work with it.
        def __init__(self, dim, heads, dim_head, mlp_dim, dropout=0.):
            super().__init__()
            self.layers = nn.ModuleList([])
            self.layers.append(nn.ModuleList([
                PreNorm(dim, LCAttention(dim, heads=heads, dim_head=dim_head, dropout=dropout)),
                PreNorm(dim, FeedForward(dim, mlp_dim, dropout=dropout))
            ]))

        def forward(self, x):
            for attn, ff in self.layers:
                x = attn(x) + x[:, -1].unsqueeze(1)  # add only the last class token as a shortcut
                x = x[:, -1].unsqueeze(1) + ff(x)
            return x
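    Continuing the sketch above, the stacked class tokens would be consumed by LCA roughly like this (PreNorm and FeedForward again come from the linked repository; mlp_dim and the 1000-class head are illustrative assumptions):

    lca = LCA(dim=192, heads=3, dim_head=64, mlp_dim=384)   # mlp_dim is an assumed value
    out = lca(cls_tokens)              # torch.Size([2, 1, 192]); only the last layer's class token acts as the query
    head = nn.Linear(192, 1000)        # hypothetical classification head
    logits = head(out.squeeze(1))      # torch.Size([2, 1000])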

    References

    https://arxiv.org/abs/2103.11816

    https://github.com/rishikksh20/CeiT-pytorch/blob/master/ceit.py





