Hot on the heels of Google's MLP-Mixer! Facebook AI proposes ResMLP!
Recently, everyone's timelines were flooded by Google AI's release of MLP-Mixer: An all-MLP Architecture for Vision (Google AI proposes MLP-Mixer: MLPs alone reach SOTA on ImageNet!). The paper shows that an architecture built from nothing but the simplest MLP components can reach SOTA-level accuracy on ImageNet. The very day after Google released its paper, Facebook AI published another one: ResMLP: Feedforward networks for image classification with data-efficient training. The ResMLP model it proposes is strikingly similar to Google's MLP-Mixer; both demonstrate that a simple MLP architecture can achieve fairly good results on image classification.

The ResMLP architecture is shown in the figure above. The input is again a sequence of patch embeddings, and the basic block consists of a linear layer and an MLP: the linear layer handles interaction across patches, while the MLP handles interaction across the channels of each patch (i.e. the FFN of the original Transformer):

Although the overall idea is the same as MLP-Mixer, ResMLP differs in the following ways:
(1) ResMLP does not use LayerNorm. Instead, it normalizes with an affine transformation (Aff). Unlike LayerNorm, no statistics need to be computed for normalization; it simply applies a linear transformation with two learned parameter vectors α and β:
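Concretely, matching the element-wise implementation below, the operation on a patch embedding x is just a per-channel rescale and shift, with no mean or variance involved:

Aff_{α,β}(x) = Diag(α) · x + β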

The implementation is quite simple:
import torch
import torch.nn as nn

# No norm layer
class Affine(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta
This normalization is very similar to the LayerScale used in Going deeper with image transformers, except that LayerScale has no bias term. In ResMLP, the Aff transform is used as the pre-normalization, while a LayerScale-style scaling is applied to the output of each residual branch:
# MLP on channels
class Mlp(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.fc2(x)
        return x

# ResMLP blocks: a linear between patches + a MLP to process them independently
class ResMLP_BLocks(nn.Module):
    def __init__(self, nb_patches, dim, layerscale_init):
        super().__init__()
        self.affine_1 = Affine(dim)
        self.affine_2 = Affine(dim)
        self.linear_patches = nn.Linear(nb_patches, nb_patches)  # Linear layer on patches
        self.mlp_channels = Mlp(dim)  # MLP on channels
        self.layerscale_1 = nn.Parameter(layerscale_init * torch.ones((dim)))  # LayerScale
        self.layerscale_2 = nn.Parameter(layerscale_init * torch.ones((dim)))  # parameters

    def forward(self, x):
        res_1 = self.linear_patches(self.affine_1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.layerscale_1 * res_1
        res_2 = self.mlp_channels(self.affine_2(x))
        x = x + self.layerscale_2 * res_2
        return x
Like ViT and MLP-Mixer, ResMLP extracts features by stacking identical blocks:
# ResMLP model: Stacking the full network
class ResMLP_models(nn.Module):
    def __init__(self, dim, depth, nb_patches, layerscale_init, num_classes):
        super().__init__()
        self.patch_projector = Patch_projector()  # patch embedding layer (not defined in the paper's snippet)
        self.blocks = nn.ModuleList([
            ResMLP_BLocks(nb_patches, dim, layerscale_init)
            for i in range(depth)])
        self.affine = Affine(dim)
        self.linear_classifier = nn.Linear(dim, num_classes)

    def forward(self, x):
        B, C, H, W = x.shape
        x = self.patch_projector(x)
        for blk in self.blocks:
            x = blk(x)
        x = self.affine(x)
        x = x.mean(dim=1).reshape(B, -1)  # average pooling
        return self.linear_classifier(x)
(2) Unlike MLP-Mixer, ResMLP does not use two MLPs: for the token-mixing part it uses only a single linear layer. The original intent of ResMLP was simply to replace self-attention with an MLP, and since the FFN that follows self-attention is itself an MLP, this would have made it essentially the same as Google's MLP-Mixer. However, the experiments showed that the larger the hidden dimension of the MLP replacing self-attention, the worse the results, so it was simplified to a simple linear layer of size N × N (see the sketch below for contrast);
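For contrast, a Mixer-style token-mixing sub-layer would look roughly like the following sketch (the class name and the expansion factor are illustrative assumptions, not taken from either paper); ResMLP collapses this whole sub-layer into a single nn.Linear(nb_patches, nb_patches):

import torch.nn as nn

# MLP-Mixer-style token mixing, acting across the patch dimension
# (illustrative only; name and expansion factor are assumptions)
class TokenMixingMLP(nn.Module):
    def __init__(self, nb_patches, expansion=0.5):
        super().__init__()
        hidden = int(nb_patches * expansion)
        self.fc1 = nn.Linear(nb_patches, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, nb_patches)

    def forward(self, x):  # x: (B, dim, nb_patches), i.e. already transposed
        return self.fc2(self.act(self.fc1(x)))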

(3) There is no pretraining on large datasets (ImageNet-21K, JFT-300M); the models are trained directly on ImageNet-1K, and good performance comes mainly from the training strategy (heavy data augmentation and optionally distillation), as sketched below.
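The recipe is in the spirit of DeiT-style training. A rough sketch of such a heavy-augmentation pipeline is shown below; the specific operations and magnitudes are illustrative assumptions, not the paper's exact hyperparameters:

from torchvision import transforms

# DeiT/ResMLP-style heavy augmentation (illustrative values, not the official recipe)
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),   # automated strong augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),                 # random-erasing regularization
])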

Compared with ViT and CNNs, ResMLP is still slightly worse, but knowledge distillation improves it further. This suggests that MLP models overfit rather easily, and that distillation probably acts as a form of regularization that boosts performance.
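As a reminder of what distillation amounts to, a generic soft-label knowledge-distillation loss is sketched below (the paper itself follows the DeiT distillation procedure with a convnet teacher; the temperature and weighting here are illustrative assumptions):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    # Cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened student and teacher distributions
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd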
Another point is that because ResMLP uses a linear layer for patch mixing, the learned weights can easily be visualized (as an N² × N² image). The visualization shows that the learned parameters look very much like convolutions, exhibiting locality (especially in the earlier layers):
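A minimal sketch of this kind of visualization, assuming a model built from the classes above with a 14×14 patch grid (so linear_patches.weight has shape 196×196), could look like this:

import matplotlib.pyplot as plt

def show_patch_filters(block, grid=14, rows=4, cols=4):
    # Each row of the patch-mixing weight matrix says how one patch aggregates
    # all the others; reshaping it to the patch grid reveals conv-like, local patterns.
    W = block.linear_patches.weight.detach().cpu()  # shape: (grid*grid, grid*grid)
    fig, axes = plt.subplots(rows, cols, figsize=(8, 8))
    step = W.shape[0] // (rows * cols)
    for i, ax in enumerate(axes.flat):
        ax.imshow(W[i * step].reshape(grid, grid))
        ax.axis('off')
    plt.show()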

The paper further analyzes the sparsity of the model and finds that not only the linear (patch-mixing) layers but also the MLP weights are quite sparse, which could be useful for model compression:
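A rough way to reproduce this kind of sparsity check is sketched below: it simply reports, for every 2-D weight matrix, the fraction of entries whose magnitude falls below a threshold (the 1e-2 threshold is an arbitrary illustrative choice, not the paper's metric):

import torch

@torch.no_grad()
def weight_sparsity(model, threshold=1e-2):
    stats = {}
    for name, param in model.named_parameters():
        if 'weight' in name and param.dim() == 2:
            # Fraction of near-zero entries in this weight matrix
            stats[name] = (param.abs() < threshold).float().mean().item()
    return stats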

Although, under comparable conditions, MLPs are still slightly behind ViT and CNNs, the MLP architecture is extremely simple; perhaps in the near future, better training strategies and designs will push MLP performance even higher.

