
A Walkthrough of the Official PyTorch Faster R-CNN Code


          2020-08-13 09:36

AI editor: 我是小將

Author: 白裳

          https://zhuanlan.zhihu.com/p/145842317

This article is published with the original author's authorization; do not repost without permission.


PyTorch now ships FasterRCNN and MaskRCNN implementations in its torchvision module. To help readers understand the model's details, this article walks through the FasterRCNN code and explains the main issues in two-stage detection for newcomers.

This article assumes you already have some understanding of how Faster R-CNN works. If not, please read the previous article first:

一文讀懂Faster RCNN ("Understanding Faster RCNN in One Article"), on zhuanlan.zhihu.com

The documentation for the torchvision FasterRCNN code is here:

https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn

With torchvision installed, the following commands show the version and the code location:

          import torchvision

          print(torchvision.__version__)
          # '0.6.0'
          print(torchvision.__path__)
          # ['/usr/local/lib/python3.7/site-packages/torchvision']

Code Structure


Figure 1


As the base class for detection models in torchvision, GeneralizedRCNN inherits from torch.nn.Module; FasterRCNN and MaskRCNN both inherit from GeneralizedRCNN in turn.

          GeneralizedRCNN

Let's start with the code of the base class GeneralizedRCNN:

class GeneralizedRCNN(nn.Module):
    def __init__(self, backbone, rpn, roi_heads, transform):
        super(GeneralizedRCNN, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        # used only on torchscript mode
        self._has_warned = False

    @torch.jit.unused
    def eager_outputs(self, losses, detections):
        # type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]
        if self.training:
            return losses

        return detections

    def forward(self, images, targets=None):
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])
        for img in images:
            val = img.shape[-2:]
            assert len(val) == 2
            original_image_sizes.append((val[0], val[1]))

        images, targets = self.transform(images, targets)
        features = self.backbone(images.tensors)
        if isinstance(features, torch.Tensor):
            features = OrderedDict([('0', features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if torch.jit.is_scripting():
            if not self._has_warned:
                warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
                self._has_warned = True
            return (losses, detections)
        else:
            return self.eager_outputs(losses, detections)

The GeneralizedRCNN code has four important interfaces:

          • transform

          • backbone

          • rpn

          • roi_heads

          transform

# GeneralizedRCNN.forward(...)
for img in images:
    val = img.shape[-2:]
    assert len(val) == 2
    original_image_sizes.append((val[0], val[1]))

images, targets = self.transform(images, targets)


Figure 2


transform does two main things:

1. Normalize the image: the uint8 input with values in [0, 255] becomes a float32 image standardized with image_mean and image_std

2. Resize the image to the size the network expects (scaling the ground-truth boxes in targets to match)

Note that since the resized image is what goes into the network, the detection boxes the network outputs also live on the resized image. In practice we want boxes on the original image, so original_image_sizes is recorded before the transform in order to map the boxes back.


Figure 3


Why resize at all? In pure theory Faster R-CNN can indeed handle images of arbitrary size. In practice, however, an overly large input (say 6000x4000) will simply exhaust memory. From an engineering standpoint, resizing is a safe compromise.
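
To make the rule concrete, here is a minimal sketch of the resize logic (my simplification, not the torchvision code itself: the real GeneralizedRCNNTransform.resize works on tensors and also rescales the boxes in targets):

# Simplified sketch of the resize rule: bring the short side to min_size,
# but never let the long side exceed max_size.
def compute_resize_scale(h, w, min_size=800, max_size=1333):
    scale = min_size / min(h, w)          # scale the short side to min_size
    if scale * max(h, w) > max_size:      # cap the long side at max_size
        scale = max_size / max(h, w)
    return scale

print(compute_resize_scale(4000, 6000))   # 0.2 -> the 6000x4000 image becomes 1200x800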

          backbone + rpn + roi_heads

Figure 4


Only after the image transform does the network pipeline proper begin. There are four steps:

(1) Feed the transformed image into the backbone module to extract feature maps:

          # GeneralizedRCNN.forward(...)
          features = self.backbone(images.tensors)

The backbone is typically a network such as VGG, ResNet, or MobileNet.
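
The backbone is a plain interface: anything exposing an out_channels attribute works. Following the example in the torchvision documentation, a MobileNetV2 backbone can be swapped in like this (a sketch for this torchvision version):

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# MobileNetV2 features as the backbone; FasterRCNN reads backbone.out_channels.
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

# A single feature map, so one tuple of sizes and one of ratios.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
# Pool only from the single feature map named '0'.
roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)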

(2) The rpn module then generates proposals and proposal_losses:

          # GeneralizedRCNN.forward(...)
          proposals, proposal_losses = self.rpn(images, features, targets)

(3) Next comes the roi_heads module (i.e. roi_pooling plus classification):

# GeneralizedRCNN.forward(...)
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)


(4) Finally, transform.postprocess performs NMS and maps the boxes back onto the original image via original_image_sizes:

          # GeneralizedRCNN.forward(...)
          detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
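
Putting the four steps together, a minimal inference run looks like the sketch below; in eval mode eager_outputs returns the detections rather than the losses:

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()  # eval mode: forward() returns detections, not losses

images = [torch.rand(3, 400, 600)]  # list of CHW float tensors
with torch.no_grad():
    detections = model(images)

# postprocess has already mapped the boxes back to the 400x600 input.
print(detections[0]['boxes'].shape)   # e.g. torch.Size([N, 4])
print(detections[0]['labels'][:5], detections[0]['scores'][:5])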

          FasterRCNN

FasterRCNN inherits from the base class GeneralizedRCNN.

class FasterRCNN(GeneralizedRCNN):

    def __init__(self, backbone, num_classes=None,
                 # transform parameters
                 min_size=800, max_size=1333,
                 image_mean=None, image_std=None,
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
                 rpn_nms_thresh=0.7,
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,
                 box_batch_size_per_image=512, box_positive_fraction=0.25,
                 bbox_reg_weights=None):

        out_channels = backbone.out_channels

        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorGenerator(
                anchor_sizes, aspect_ratios
            )
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],
                output_size=7,
                sampling_ratio=2)

        if box_head is None:
            resolution = box_roi_pool.output_size[0]
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,
                representation_size)

        if box_predictor is None:
            representation_size = 1024
            box_predictor = FastRCNNPredictor(
                representation_size,
                num_classes)

        roi_heads = RoIHeads(
            # Box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,
            box_batch_size_per_image, box_positive_fraction,
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)

        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)

FasterRCNN provides the transform, backbone, rpn, and roi_heads interfaces required by GeneralizedRCNN:

          # FasterRCNN.__init__(...)
          super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)

The transform interface is implemented by GeneralizedRCNNTransform. The variable names clearly show that it covers:

• resizing parameters: min_size + max_size

• normalization parameters: image_mean + image_std


# FasterRCNN.__init__(...)
if image_mean is None:
    image_mean = [0.485, 0.456, 0.406]
if image_std is None:
    image_std = [0.229, 0.224, 0.225]
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

The backbone uses the ResNet50 + FPN structure:

def fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, **kwargs):
    if pretrained:
        # no need to download the backbone if pretrained is set
        pretrained_backbone = False
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone)
    model = FasterRCNN(backbone, num_classes, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'], progress=progress)
        model.load_state_dict(state_dict)
    return model

ResNet: Deep Residual Learning for Image Recognition

FPN: Feature Pyramid Networks for Object Detection

Figure 5: FPN


Next, the focus: the implementation of the rpn interface. First, rpn_anchor_generator:

# FasterRCNN.__init__(...)
if rpn_anchor_generator is None:
    anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
    aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
    rpn_anchor_generator = AnchorGenerator(
        anchor_sizes, aspect_ratios
    )

Plain FasterRCNN only needs to feed a single feature_map into the rpn network to generate proposals. With FPN added, multiple feature_maps must be fed into the rpn one by one.
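
To see what the rpn actually receives, the FPN backbone can be run on its own; it returns an OrderedDict with one feature map per level (a sketch; the shapes shown assume an 800x800 input, and 'pool' is the extra max-pooled level used only by the rpn):

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone('resnet50', pretrained=False)
features = backbone(torch.rand(1, 3, 800, 800))
for name, feat in features.items():
    print(name, tuple(feat.shape))
# 0    (1, 256, 200, 200)  stride 4
# 1    (1, 256, 100, 100)  stride 8
# 2    (1, 256, 50, 50)    stride 16
# 3    (1, 256, 25, 25)    stride 32
# pool (1, 256, 13, 13)    extra level for the rpn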

Figure 6


Now let's look at the concrete AnchorGenerator implementation:

class AnchorGenerator(nn.Module):
    ......

    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"):
        # type: (List[int], List[float], int, Device) # noqa: F821
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1 / h_ratios

        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)

        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()

    def set_cell_anchors(self, dtype, device):
        # type: (int, Device) -> None # noqa: F821
        ......

        cell_anchors = [
            self.generate_anchors(
                sizes,
                aspect_ratios,
                dtype,
                device
            )
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = cell_anchors

First, there are 5 anchor_sizes and 3 aspect_ratios, yielding 15 base_anchors in total (with FPN, each feature level keeps the 3 anchors of its own size):

[ -23.,  -11.,   23.,   11.]
[ -16.,  -16.,   16.,   16.]  # w = h = 32,  ratio = 1
[ -11.,  -23.,   11.,   23.]
[ -45.,  -23.,   45.,   23.]
[ -32.,  -32.,   32.,   32.]  # w = h = 64,  ratio = 1
[ -23.,  -45.,   23.,   45.]
[ -91.,  -45.,   91.,   45.]
[ -64.,  -64.,   64.,   64.]  # w = h = 128, ratio = 1
[ -45.,  -91.,   45.,   91.]
[-181.,  -91.,  181.,   91.]
[-128., -128.,  128.,  128.]  # w = h = 256, ratio = 1
[ -91., -181.,   91.,  181.]
[-362., -181.,  362.,  181.]
[-256., -256.,  256.,  256.]  # w = h = 512, ratio = 1
[-181., -362.,  181.,  362.]
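
As a sanity check, the first three rows (size 32) can be reproduced with the same arithmetic as generate_anchors:

import torch

scales = torch.tensor([32.])
aspect_ratios = torch.tensor([0.5, 1.0, 2.0])
h_ratios = torch.sqrt(aspect_ratios)   # [0.71, 1.00, 1.41]
w_ratios = 1 / h_ratios                # [1.41, 1.00, 0.71]
ws = (w_ratios[:, None] * scales[None, :]).view(-1)  # widths:  [45.3, 32.0, 22.6]
hs = (h_ratios[:, None] * scales[None, :]).view(-1)  # heights: [22.6, 32.0, 45.3]
print((torch.stack([-ws, -hs, ws, hs], dim=1) / 2).round())
# tensor([[-23., -11.,  23.,  11.],
#         [-16., -16.,  16.,  16.],
#         [-11., -23.,  11.,  23.]])

Note that the ratio is applied as h/w through the square root, so each anchor keeps an area of roughly scale squared while only its shape changes.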

Note that every base_anchor is centered at (0, 0), as shown below:


Figure 7: base_anchors (only the base_anchors of sizes 32/64/128 are drawn)


Next, the AnchorGenerator.grid_anchors function:

# AnchorGenerator
def grid_anchors(self, grid_sizes, strides):
    # type: (List[List[int]], List[List[Tensor]])
    anchors = []
    cell_anchors = self.cell_anchors
    assert cell_anchors is not None

    for size, stride, base_anchors in zip(
        grid_sizes, strides, cell_anchors
    ):
        grid_height, grid_width = size
        stride_height, stride_width = stride
        device = base_anchors.device

        # For output anchor, compute [x_center, y_center, x_center, y_center]
        shifts_x = torch.arange(
            0, grid_width, dtype=torch.float32, device=device
        ) * stride_width
        shifts_y = torch.arange(
            0, grid_height, dtype=torch.float32, device=device
        ) * stride_height
        shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
        shift_x = shift_x.reshape(-1)
        shift_y = shift_y.reshape(-1)
        shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)

        # For every (base anchor, output anchor) pair,
        # offset each zero-centered base anchor by the center of the output anchor.
        anchors.append(
            (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)
        )

    return anchors

def forward(self, image_list, feature_maps):
    # type: (ImageList, List[Tensor])
    grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps])
    image_size = image_list.tensors.shape[-2:]
    dtype, device = feature_maps[0].dtype, feature_maps[0].device
    strides = [[torch.tensor(image_size[0] / g[0], dtype=torch.int64, device=device),
                torch.tensor(image_size[1] / g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
    self.set_cell_anchors(dtype, device)
    anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides)
    ......

As mentioned earlier, because of FPN the rpn receives multiple feature maps. For ease of exposition, everything below is described for a single feature map; the others are handled identically.

Suppose a feature map of size h x w. First, the downsampling factor stride of this feature map relative to the input image is computed:

stride_height = image_height / h,  stride_width = image_width / w

Then an h x w grid is generated, with each cell of side stride, as shown below:

          # AnchorGenerator.grid_anchors(...)
          shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
          shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height
          shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)

Figure 8


The base_anchors are then shifted from their (0, 0) center to each grid point, with a full set of base_anchors placed at every point. This puts a large number of anchors on the current feature_map.

Note in particular that stride reflects the network's receptive-field granularity: the network cannot possibly detect boxes denser than the feature_map. That is why anchors are placed only at the grid points (conversely, placing anchors between two grid points would correspond to half a point on the feature_map, which clearly makes no sense).

          # AnchorGenerator.grid_anchors(...)
          anchors.append((shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4))
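
A toy run of this broadcast makes the shapes concrete; here a hypothetical 2x2 grid with stride 16 and the 3 base anchors of size 32 yields 2*2*3 = 12 anchors:

import torch

base_anchors = torch.tensor([[-23., -11., 23., 11.],
                             [-16., -16., 16., 16.],
                             [-11., -23., 11., 23.]])
shifts_x = torch.arange(0, 2, dtype=torch.float32) * 16
shifts_y = torch.arange(0, 2, dtype=torch.float32) * 16
shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
shift_x, shift_y = shift_x.reshape(-1), shift_y.reshape(-1)
shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)
anchors = (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)
print(anchors.shape)   # torch.Size([12, 4])
print(anchors[3])      # tensor([-7., -11., 39., 11.]) -> first anchor moved to x=16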


Figure 9


With the anchors placed, the network must now be set up so that its output can judge whether each anchor contains an object, while also producing the four values (t_x, t_y, t_w, t_h) needed for bounding box regression.

class RPNHead(nn.Module):
    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

    def forward(self, x):
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg

Suppose the feature is of size C x H x W, with k anchors per location. The RPNHead code clearly shows:

1. A 3x3 convolution is applied first

2. A 1x1 convolution on the result outputs cls_logits of size k x H x W, one score per anchor for whether it contains an object;

3. In parallel, another 1x1 convolution outputs bbox_pred of size 4k x H x W, the four regression values (t_x, t_y, t_w, t_h) for each anchor.


          # RPNHead.__init__(...)
          self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
          self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)
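
A quick shape check confirms this (the numbers here, 256 channels, 3 anchors per location and a 50x50 feature, are a hypothetical example):

import torch
from torchvision.models.detection.rpn import RPNHead

head = RPNHead(in_channels=256, num_anchors=3)
logits, bbox_reg = head([torch.rand(1, 256, 50, 50)])  # forward takes a list of features
print(logits[0].shape)    # torch.Size([1, 3, 50, 50])   -> k scores per location
print(bbox_reg[0].shape)  # torch.Size([1, 12, 50, 50])  -> 4k regression values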


Figure 10


The above is the processing flow for a single feature_map. Each of the differently sized feature_maps output by the FPN network goes through the same procedure to compute its stride and grid and to place its anchors. Once done, we have dense anchors at every scale.

Next we enter the RegionProposalNetwork class:

# FasterRCNN.__init__(...)
rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

# rpn_anchor_generator generates the anchors
# rpn_head convolves the feature_map to get cls_logits + bbox_pred
rpn = RegionProposalNetwork(
    rpn_anchor_generator, rpn_head,
    rpn_fg_iou_thresh, rpn_bg_iou_thresh,
    rpn_batch_size_per_image, rpn_positive_fraction,
    rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

The role of the RegionProposalNetwork class is:

• test phase: find the anchors that contain objects, regress them into proposals, then apply NMS

• train phase: everything above, plus computing the rpn loss


class RegionProposalNetwork(torch.nn.Module):
    ......

    def forward(self, images, features, targets=None):
        features = list(features.values())
        objectness, pred_bbox_deltas = self.head(features)
        anchors = self.anchor_generator(images, features)

        num_images = len(anchors)
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses

Concretely, it first finds the anchors with objects and regresses them into proposals:

          # RegionProposalNetwork.forward(...)
          objectness, pred_bbox_deltas = self.head(features)
          anchors = self.anchor_generator(images, features)
          ......
          proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
          proposals = proposals.view(num_images, -1, 4)

The proposals are then sorted by objectness confidence in descending order (so the ones most likely to contain an object come first) and filtered with NMS, producing boxes (the proposal boxes after NMS):

          # RegionProposalNetwork.forward(...)
          boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

In the training phase, the ground-truth boxes are additionally matched to the anchors in order to compute the cls_logits loss loss_objectness and the bbox_pred loss loss_rpn_box_reg.
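
For reference, the decode step used by box_coder above follows the standard Faster R-CNN box parameterization. A sketch for a single anchor (the real BoxCoder additionally divides the deltas by bbox_reg_weights and clamps dw/dh before the exp):

import torch

def decode_one(anchor, deltas):
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    pcx, pcy = cx + dx * w, cy + dy * h              # shift the center
    pw, ph = w * torch.exp(dw), h * torch.exp(dh)    # rescale width/height
    return torch.stack([pcx - 0.5 * pw, pcy - 0.5 * ph,
                        pcx + 0.5 * pw, pcy + 0.5 * ph])

print(decode_one(torch.tensor([0., 0., 32., 32.]),
                 torch.tensor([0.25, 0., 0., 0.])))
# tensor([ 8.,  0., 40., 32.])  -> the 32x32 anchor shifted right by 8 px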

After RegionProposalNetwork we have the boxes; the next step is to extract the features inside each box via roi_pooling:

roi_heads = RoIHeads(
    # Box
    box_roi_pool, box_head, box_predictor,
    box_fg_iou_thresh, box_bg_iou_thresh,
    box_batch_size_per_image, box_positive_fraction,
    bbox_reg_weights,
    box_score_thresh, box_nms_thresh, box_detections_per_img)

One question here is how to determine which feature_map a box belongs to:

• In the original FasterRCNN, box features are extracted only from the backbone's last feature_map;

• With FPN, the backbone outputs multiple feature maps, so we must decide which feature a given box corresponds to.

As shown below:


Figure 11


class MultiScaleRoIAlign(nn.Module):
    ......

    def infer_scale(self, feature, original_size):
        # type: (Tensor, List[int])
        # assumption: the scale is of the form 2 ** (-k), with k integer
        size = feature.shape[-2:]
        possible_scales = torch.jit.annotate(List[float], [])
        for s1, s2 in zip(size, original_size):
            approx_scale = float(s1) / float(s2)
            scale = 2 ** float(torch.tensor(approx_scale).log2().round())
            possible_scales.append(scale)
        assert possible_scales[0] == possible_scales[1]
        return possible_scales[0]

    def setup_scales(self, features, image_shapes):
        # type: (List[Tensor], List[Tuple[int, int]])
        assert len(image_shapes) != 0
        max_x = 0
        max_y = 0
        for shape in image_shapes:
            max_x = max(shape[0], max_x)
            max_y = max(shape[1], max_y)
        original_input_shape = (max_x, max_y)

        scales = [self.infer_scale(feat, original_input_shape) for feat in features]
        # get the levels in the feature map by leveraging the fact that the network always
        # downsamples by a factor of 2 at each level.
        lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
        lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item()
        self.scales = scales
        self.map_levels = initLevelMapper(int(lvl_min), int(lvl_max))

First, each feature_map's downsampling scale relative to the network input image is computed. The infer_scale function uses the approximation scale = 2^round(log2(feature_size / image_size)).

This formula is effectively a simple mapping that snaps each feature_map-to-image size ratio to the nearest power of 2:


Figure 12


For FasterRCNN the actual values are scales = [1/4, 1/8, 1/16, 1/32], after which lvl_min=2 and lvl_max=5 are set:

          # MultiScaleRoIAlign.setup_scales(...)
          # get the levels in the feature map by leveraging the fact that the network always
          # downsamples by a factor of 2 at each level.
          lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
          lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item()
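
A worked example of these numbers, assuming an 800x1088 input: the stride-4 feature map is 200x272, so infer_scale snaps 200/800 = 0.25 to 2**-2, and the four scales give lvl_min=2, lvl_max=5:

import math

approx_scale = 200 / 800                       # feature size / image size
print(2 ** round(math.log2(approx_scale)))     # 0.25, i.e. 2**-2

scales = [1/4, 1/8, 1/16, 1/32]                # the four pooled FPN levels
print(-math.log2(scales[0]))                   # 2.0 -> lvl_min
print(-math.log2(scales[-1]))                  # 5.0 -> lvl_max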

Next, the formula from the FPN paper (Eqn. 1) computes the level k that a box is assigned to, where k0 = 4 and wh is the box area:

k = floor(k0 + log2(sqrt(wh) / 224))

class LevelMapper(object):
    def __init__(self, k_min, k_max, canonical_scale=224, canonical_level=4, eps=1e-6):
        self.k_min = k_min          # lvl_min=2
        self.k_max = k_max          # lvl_max=5
        self.s0 = canonical_scale   # 224
        self.lvl0 = canonical_level # 4
        self.eps = eps

    def __call__(self, boxlists):
        s = torch.sqrt(torch.cat([box_area(boxlist) for boxlist in boxlists]))

        # Eqn.(1) in FPN paper
        target_lvls = torch.floor(self.lvl0 + torch.log2(s / self.s0) + torch.tensor(self.eps, dtype=s.dtype))
        target_lvls = torch.clamp(target_lvls, min=self.k_min, max=self.k_max)
        return (target_lvls.to(torch.int64) - self.k_min).to(torch.int64)

Here torch.clamp(input, min, max) → Tensor truncates the levels so they stay in range: values below min become min, and values above max become max.
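
A worked example of Eqn. (1) plus the clamp, assuming lvl_min=2 and lvl_max=5 as above (the target_level helper is mine, condensing LevelMapper.__call__ to scalars):

import math

def target_level(box_area, k_min=2, k_max=5, s0=224, lvl0=4, eps=1e-6):
    s = math.sqrt(box_area)                          # side length of the box
    k = math.floor(lvl0 + math.log2(s / s0) + eps)   # Eqn.(1) in the FPN paper
    return min(max(k, k_min), k_max)                 # the clamp step

print(target_level(224 * 224))    # 4 -> canonical 224x224 box, stride-16 map
print(target_level(64 * 64))      # 2 -> small box, highest-resolution map
print(target_level(1024 * 1024))  # 5 -> huge box, clamped to the coarsest map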

As you can see, the LevelMapper class assigns boxes of different sizes to a specific feature_map, as shown below. After that, roi_pooling proceeds following the flow in Figure 11.


Figure 13


Once each box has been assigned its feature level and roi_pooling has produced the 7x7 feature, the next step converts that feature into the box's final class information (person, cat, dog, car, and so on) and a further round of box regression.

class TwoMLPHead(nn.Module):

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x


class FastRCNNPredictor(nn.Module):

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)

        return scores, bbox_deltas

TwoMLPHead first maps the 7x7 feature through two fully connected layers to a 1024-d vector; FastRCNNPredictor then converts each box's 1024-d feature into cls_score and bbox_pred:


Figure 14


Clearly, cls_score followed by softmax gives the class probabilities, which determine the box's class; once the class is fixed, the 4 values in bbox_pred at that class's position are exactly the offsets needed for the second bounding box regression.
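
A sketch of how a single detection is read off the two heads (simplified from what RoIHeads.postprocess_detections does; the 91 classes and random tensors are just placeholders):

import torch
import torch.nn.functional as F

num_classes = 91
cls_score = torch.randn(1, num_classes)        # one box, from FastRCNNPredictor
bbox_pred = torch.randn(1, num_classes * 4)

probs = F.softmax(cls_score, dim=-1)           # class probabilities
label = probs[0, 1:].argmax() + 1              # skip index 0 (background)
deltas = bbox_pred.view(-1, num_classes, 4)[0, label]  # that class's 4 offsets
print(label.item(), probs[0, label].item(), deltas)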

In short, the FasterRCNN network with FPN can be represented by the diagram below:


Figure 15


On Training

The FasterRCNN model has loss functions in two places:

• In the RegionProposalNetwork class, which must decide whether an anchor contains an object in order to generate proposals; a loss is computed here

• In the RoIHeads class, where the cls_score and bbox_pred produced by the fully connected layers after roi_pooling must be trained; a loss is computed here as well

First, the assign_targets_to_anchors function in the RegionProposalNetwork class.

def assign_targets_to_anchors(self, anchors, targets):
    # type: (List[Tensor], List[Dict[str, Tensor]])
    labels = []
    matched_gt_boxes = []
    for anchors_per_image, targets_per_image in zip(anchors, targets):
        gt_boxes = targets_per_image["boxes"]

        if gt_boxes.numel() == 0:
            # Background image (negative example)
            device = anchors_per_image.device
            matched_gt_boxes_per_image = torch.zeros(anchors_per_image.shape, dtype=torch.float32, device=device)
            labels_per_image = torch.zeros((anchors_per_image.shape[0],), dtype=torch.float32, device=device)
        else:
            match_quality_matrix = box_ops.box_iou(gt_boxes, anchors_per_image)
            matched_idxs = self.proposal_matcher(match_quality_matrix)
            # get the targets corresponding GT for each proposal
            # NB: need to clamp the indices because we can have a single
            # GT in the image, and matched_idxs can be -2, which goes
            # out of bounds
            matched_gt_boxes_per_image = gt_boxes[matched_idxs.clamp(min=0)]

            labels_per_image = matched_idxs >= 0
            labels_per_image = labels_per_image.to(dtype=torch.float32)
            # Background (negative examples)
            bg_indices = matched_idxs == self.proposal_matcher.BELOW_LOW_THRESHOLD
            labels_per_image[bg_indices] = torch.tensor(0.0)

            # discard indices that are between thresholds
            inds_to_discard = matched_idxs == self.proposal_matcher.BETWEEN_THRESHOLDS
            labels_per_image[inds_to_discard] = torch.tensor(-1.0)

        labels.append(labels_per_image)
        matched_gt_boxes.append(matched_gt_boxes_per_image)
    return labels, matched_gt_boxes

When the image contains no gt_boxes, every anchor is labeled background (i.e. label 0):

if gt_boxes.numel() == 0:
    # Background image (negative example)
    device = anchors_per_image.device
    matched_gt_boxes_per_image = torch.zeros(anchors_per_image.shape, dtype=torch.float32, device=device)
    labels_per_image = torch.zeros((anchors_per_image.shape[0],), dtype=torch.float32, device=device)

When the image does contain gt_boxes, the IOU between anchors and gt_boxes is computed:

(1) Anchors with IOU < 0.3 are background, labeled 0

          labels_per_image[bg_indices] = torch.tensor(0.0)

(2) Anchors with IOU > 0.7 are foreground, labeled 1

          labels_per_image = matched_idxs >= 0

(3) Anchors with 0.3 < IOU < 0.7 are ignored and do not participate in training

This is plain to see in the default arguments of FasterRCNN's __init__ function:

          rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
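
The labeling rule itself boils down to two comparisons. A sketch of the thresholding (the real logic lives in the Matcher behind proposal_matcher, which also keeps low-quality matches for the best anchor of each gt):

import torch

ious = torch.tensor([0.85, 0.50, 0.10])  # best IOU of each anchor against any gt
labels = torch.full_like(ious, -1.0)     # default -1: between thresholds, ignored
labels[ious >= 0.7] = 1.0                # foreground
labels[ious < 0.3] = 0.0                 # background
print(labels)                            # tensor([ 1., -1.,  0.])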

Next, the assign_targets_to_proposals function in the RoIHeads class.

def assign_targets_to_proposals(self, proposals, gt_boxes, gt_labels):
    # type: (List[Tensor], List[Tensor], List[Tensor])
    matched_idxs = []
    labels = []
    for proposals_in_image, gt_boxes_in_image, gt_labels_in_image in zip(proposals, gt_boxes, gt_labels):

        if gt_boxes_in_image.numel() == 0:
            # Background image
            device = proposals_in_image.device
            clamped_matched_idxs_in_image = torch.zeros(
                (proposals_in_image.shape[0],), dtype=torch.int64, device=device
            )
            labels_in_image = torch.zeros(
                (proposals_in_image.shape[0],), dtype=torch.int64, device=device
            )
        else:
            # set to self.box_similarity when https://github.com/pytorch/pytorch/issues/27495 lands
            match_quality_matrix = box_ops.box_iou(gt_boxes_in_image, proposals_in_image)
            matched_idxs_in_image = self.proposal_matcher(match_quality_matrix)

            clamped_matched_idxs_in_image = matched_idxs_in_image.clamp(min=0)

            labels_in_image = gt_labels_in_image[clamped_matched_idxs_in_image]
            labels_in_image = labels_in_image.to(dtype=torch.int64)

            # Label background (below the low threshold)
            bg_inds = matched_idxs_in_image == self.proposal_matcher.BELOW_LOW_THRESHOLD
            labels_in_image[bg_inds] = torch.tensor(0)

            # Label ignore proposals (between low and high thresholds)
            ignore_inds = matched_idxs_in_image == self.proposal_matcher.BETWEEN_THRESHOLDS
            labels_in_image[ignore_inds] = torch.tensor(-1)  # -1 is ignored by sampler

        matched_idxs.append(clamped_matched_idxs_in_image)
        labels.append(labels_in_image)
    return matched_idxs, labels

Unlike assign_targets_to_anchors, this function uses:

          box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,

(1) Proposals with IOU > 0.5 are foreground, labeled with the matched class_id

          labels_in_image = gt_labels_in_image[clamped_matched_idxs_in_image]

Here is the difference from before: RegionProposalNetwork only needs to judge whether an anchor contains an object, so the positive label is 1; RoIHeads must determine each proposal's specific class, so the positive label is the actual class_id.

(2) Proposals with IOU < 0.5 are background, labeled 0

          labels_in_image[bg_inds] = torch.tensor(0)

Closing Remarks

This article has briefly walked through the FasterRCNN implementation in torchvision, analyzing the points I consider important. Its purpose is to serve as a guide for readers who find the code hard to approach, and to encourage newcomers to read more implementations.

To truly understand the model (and not be stumped by an interviewer), you have to read the code!
