Reproducing a Finger Keypoint Detection Paper

1. Model Overview
This is a paper on finger keypoint detection from a team at South China University of Technology. Although the paper is old, my reproduction does not perform as well as the results shown in it, because many experimental details are never mentioned, which left me in an awkward position.
Read as a whole, the idea is very simple; it is a 2015 paper, after all. The paper builds two CNN models: the first CNN roughly localizes the key region, and the crop of that region is then fed into a second CNN for keypoint detection (fingertip and joint points). The two CNNs are not trained end to end but separately, which is one of the reasons the final inference quality is poor.

The dataset consists of images captured in a variety of scenes. The annotations include the normalized coordinates of the hand bounding box (top-left and bottom-right corners) as well as the normalized coordinates of the fingertip and joint points.

The structure of the first CNN is shown below. Frankly, some numbers in the figure are inaccurate: some kernel sizes cannot produce the feature map sizes shown. In general, a convolution layer's output size is obtained by taking the input size, subtracting the kernel size, adding twice the padding, dividing by the stride (rounding down), and finally adding one. Interested readers can verify the figure themselves.
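That rule can be checked layer by layer with a tiny helper. The 227x227 input below is my assumption (an AlexNet-style size that is consistent with the 6*6*256 flattened feature used by the model's first fully connected layer), not a number stated in the paper:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # floor((input - kernel + 2*padding) / stride) + 1
    return (size - kernel + 2 * padding) // stride + 1

# Tracing an assumed 227x227 input through the first few layers of CNN 1:
s = conv_out(227, kernel=3, stride=4)            # 3x3 conv, stride 4 -> 57
s = conv_out(s, kernel=1, stride=2)              # 1x1 max-pool, stride 2 -> 29
s = conv_out(s, kernel=5, stride=1, padding=1)   # 5x5 conv, pad 1 -> 27
print(s)  # 27
```

The same function applies to pooling layers, since max pooling follows the identical size formula.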

Model PyTorch code:
import torch
import torch.nn as nn

class cascadelevel1(nn.Module):
    def __init__(self, num_classes=4):
        super(cascadelevel1, self).__init__()
        # Convolution + pooling backbone
        self.conv_pool = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=48, kernel_size=3, stride=4),
            nn.BatchNorm2d(num_features=48),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=1, stride=2),
            nn.Conv2d(in_channels=48, out_channels=96, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(num_features=96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(in_channels=96, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=164, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=164),
            nn.ReLU(),
            nn.Conv2d(in_channels=164, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.fc1 = nn.Sequential(
            nn.Linear(6*6*256, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.fc2 = nn.Sequential(
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        # Output: 4 normalized bounding-box coordinates
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.conv_pool(x)
        x = x.view(x.size(0), -1)  # flatten to (N, 6*6*256)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
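As a sanity check on the figure's feature map sizes, the shape-only replica below traces an input through the same convolution and pooling stack (BatchNorm and ReLU are omitted because they do not change shape). The 227x227 input size is my assumption; it is a size that makes the flattened feature come out to exactly 6*6*256, matching fc1:

```python
import torch
import torch.nn as nn

# Shape-only replica of conv_pool (BatchNorm/ReLU omitted: they preserve shape).
conv_pool = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=3, stride=4),
    nn.MaxPool2d(kernel_size=1, stride=2),
    nn.Conv2d(48, 96, kernel_size=5, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 128, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(128, 164, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(164, 256, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
feat = conv_pool(torch.randn(1, 3, 227, 227))
print(feat.shape)  # torch.Size([1, 256, 6, 6]) -> 6*6*256, matching fc1's input
```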
For the second CNN, at the data-processing stage, the authors argue that the HSV color space works better than RGB: since the keypoints all lie on the hand, HSV clusters the salient features of skin color well. In addition, the paper applies the Laplacian operator to each of the three HSV channels separately to extract edge, shape, and contour features (formula shown below), and combines them with the original HSV channels to form a 6-channel input. The overall model is shown in the figure below.
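A minimal numpy sketch of that 6-channel construction, assuming the hand crop has already been converted to HSV (e.g. with OpenCV's cv2.cvtColor) and that the standard 4-neighbour 3x3 Laplacian kernel is what the paper's formula describes (the exact operator appears only in the paper's figure, so the kernel here is an assumption):

```python
import numpy as np

# Standard 4-neighbour 3x3 Laplacian kernel (assumed; see lead-in above).
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float32)

def laplacian_2d(channel):
    """Apply the 3x3 Laplacian with zero padding; output keeps the input size."""
    h, w = channel.shape
    padded = np.pad(channel.astype(np.float32), 1)
    out = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def make_six_channel(hsv):
    """hsv: (H, W, 3) array -> (6, H, W): original HSV plus per-channel edges."""
    edges = np.stack([laplacian_2d(hsv[..., c]) for c in range(3)], axis=0)
    orig = np.transpose(hsv, (2, 0, 1)).astype(np.float32)
    return np.concatenate([orig, edges], axis=0)
```

The result is already in channels-first layout, so it can be wrapped with torch.from_numpy and fed to the model with first_in_channel=6.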


Model PyTorch code:
import torch
import torch.nn as nn

class cascadelevel2(nn.Module):
    def __init__(self, num_classes=4, first_in_channel=3):
        super(cascadelevel2, self).__init__()
        assert first_in_channel in [3, 4, 6], "[ERROR] 'first_in_channel' should be 3, 4, 6"
        self.conv_pool_1 = nn.Sequential(
            nn.Conv2d(in_channels=first_in_channel, out_channels=32, kernel_size=5, stride=2),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=4, stride=1),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.conv_pool_2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=96, kernel_size=2, stride=1),
            nn.BatchNorm2d(num_features=96),
            nn.ReLU(),
            nn.Conv2d(in_channels=96, out_channels=96, kernel_size=2, stride=1),
            nn.BatchNorm2d(num_features=96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=1),
        )
        self.fc1 = nn.Sequential(
            nn.Linear(9*9*64, 160),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.fc2 = nn.Sequential(
            nn.Linear(6*6*96, 160),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.fc3 = nn.Linear(320, num_classes)

    def forward(self, x):
        x = self.conv_pool_1(x)       # (N, 64, 9, 9)
        x1 = x.view(x.size(0), -1)    # (N, 5184)
        x1 = self.fc1(x1)             # (N, 160)
        x2 = self.conv_pool_2(x)      # (N, 96, 6, 6)
        x2 = x2.view(x2.size(0), -1)  # (N, 3456)
        x2 = self.fc2(x2)             # (N, 160)
        x3 = torch.cat((x1, x2), 1)   # concatenate the two branches: (N, 320)
        out = self.fc3(x3)
        return out
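The shape comments in forward can be verified with a shape-only replica of the two stages (BatchNorm and ReLU omitted since they preserve shape). The 100x100 crop size is my assumption; it is a size consistent with the 9x9 and 6x6 feature maps that fc1 and fc2 expect:

```python
import torch
import torch.nn as nn

# Shape-only replicas of conv_pool_1 and conv_pool_2 (6-channel HSV+Laplacian input).
stage1 = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=5, stride=2),
    nn.Conv2d(32, 32, kernel_size=4, stride=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1),
    nn.Conv2d(64, 64, kernel_size=3, stride=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
stage2 = nn.Sequential(
    nn.Conv2d(64, 96, kernel_size=2, stride=1),
    nn.Conv2d(96, 96, kernel_size=2, stride=1),
    nn.MaxPool2d(kernel_size=2, stride=1),
)
f1 = stage1(torch.randn(1, 6, 100, 100))
f2 = stage2(f1)
print(f1.shape, f2.shape)  # (1, 64, 9, 9) and (1, 96, 6, 6), matching fc1/fc2
```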
2. Experiments
The paper provides very few details about the experiments. The loss function is the Euclidean loss, which is simply MSE.

For the first CNN, the coverage rate F0 is also used as the accuracy criterion; it is essentially the IoU metric.
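That coverage criterion can be sketched as plain IoU between the predicted and ground-truth boxes (coordinates as (x1, y1, x2, y2); the function name and layout are mine, not the paper's):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, in [0, 1]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```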

The choice of optimizer, the hyper-parameter settings, the parameter initialization strategy, and so on are never mentioned in the paper, which leaves me puzzled about how the reported validation IoU above 0.7 was obtained: after many runs, my best result only reaches about 0.49. The figure below (left) shows the training loss curves of the two CNNs. As you can see, they barely decrease. A likely cause is that training has hit a plateau and the learning rate or batch size needs to be reduced.
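Since the paper does not state the optimizer or hyper-parameters, the sketch below shows one plausible training step with MSE on normalized coordinates. Adam, lr=1e-3, and the stand-in linear model are all assumptions for illustration, not choices from the paper:

```python
import torch
import torch.nn as nn

# Stand-in regressor for illustration only; swap in cascadelevel1/cascadelevel2.
model = nn.Linear(10, 4)
criterion = nn.MSELoss()  # MSE, matching the paper's "Euclidean loss" up to a constant
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed, not from the paper

inputs = torch.randn(8, 10)   # dummy batch
targets = torch.rand(8, 4)    # normalized coordinates in [0, 1]

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(float(loss))
```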

Some results are shown below: red marks the ground truth and green the predictions. As you can see, many of them are still quite poor.


Code download:
Link: https://pan.baidu.com/s/1GtdUKLdwUoUEKubBr08-cg
Extraction code: du7j
