"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your gut and don't follow the herd" ; "Validate direction, not destination" ;

October 25, 2022

Segmentation - U2Net

Paper #1 - U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Key Features

  • Mixture of receptive fields of different sizes
  • Increases the depth of the whole architecture 
  • Common backbones (AlexNet, VGG, ResNet, ResNeXt, DenseNet, etc.) are all originally designed for image classification
  • CNN backbones extract features that are representative of semantic meaning
  • Segmentation needs local details and global contrast information, which are essential to saliency detection
  • Key thought - can we go deeper while maintaining high-resolution feature maps, at low memory and computation cost?
  • U2-Net is a two-level nested U-structure that is designed for SOD without using any pre-trained backbones from image classification
  • Input size of 320×320×3
  • Obtain a feature vector describing the saliency of each pixel
  • Saliency detection requires both local and global information
  • Stacking two differently configured U-Nets
  • Convolution + Residual Blocks
  • Multiple dilated convolutions
  • Dilated Convolution: a technique that expands the kernel by inserting gaps (holes) between its consecutive elements. In simpler terms, it is the same as convolution but with pixel skipping, so the kernel covers a larger area of the input.
  • The standard Keras Conv2D layer supports dilation; just set dilation_rate to a value greater than one. For example:
  • out = Conv2D(10, (3, 3), dilation_rate=2)(input_tensor)
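To make the pixel-skipping idea concrete without pulling in Keras, here is a minimal NumPy sketch of a single-channel dilated convolution (valid padding, no learned weights). The function name `dilated_conv2d` is my own; note how a 3×3 kernel with dilation 2 covers a 5×5 region of the input.

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D correlation with a dilated kernel (single channel).

    Dilation inserts (dilation - 1) gaps between kernel taps, so a k x k
    kernel covers an effective area of (k - 1) * dilation + 1 pixels.
    """
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1          # effective kernel height
    eff_w = (kw - 1) * dilation + 1          # effective kernel width
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # sample the input on a dilated grid, then correlate with the kernel
            patch = image[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3))
plain = dilated_conv2d(image, kernel, dilation=1)    # 4x4 output, 3x3 receptive field
dilated = dilated_conv2d(image, kernel, dilation=2)  # 2x2 output, 5x5 receptive field
```

The dilated version sees a wider context per output pixel at the same parameter count, which is exactly why U2-Net uses dilation to enlarge receptive fields cheaply.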

  • Novel ReSidual U-block (RSU)

  • "(U×n-Net)", where n is the number of repeated U-Net modules
  • Dilated Convolution + custom ResNet blocks
  • Repeated modules
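The nested "U inside a U" idea can be sketched with plain NumPy: each toy RSU level runs a convolution stand-in, recurses at half resolution, upsamples, and adds a residual connection. Everything here is a simplification of my own (single channel, mean-filter in place of a learned 3×3 conv + ReLU), not the paper's actual layers.

```python
import numpy as np

def conv3x3(x):
    # Stand-in for a learned 3x3 conv + ReLU: a simple 5-point smoothing filter
    p = np.pad(x, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

def down(x):
    # 2x2 average pooling
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    # Nearest-neighbour upsampling back to double resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def rsu(x, depth):
    """Toy ReSidual U-block: a small U-shape plus a residual connection."""
    fin = conv3x3(x)                  # input convolution (local features)
    if depth == 0:
        return fin + conv3x3(fin)     # bottom level: residual over one conv
    enc = conv3x3(down(fin))          # encode at half resolution
    dec = up(rsu(enc, depth - 1))     # recurse: the nested inner U
    return fin + conv3x3(dec)         # residual: local + multi-scale features

x = np.random.rand(32, 32)
y = rsu(x, depth=3)                   # output keeps the input resolution
```

The residual sum of the input convolution and the decoded multi-scale path is the mechanism that lets the block mix local detail with global context while preserving resolution.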

  • In the training process, each image is first resized to 320×320 and randomly flipped vertically and cropped to 288×288.
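The training-time preprocessing above (resize to 320×320, random vertical flip, random crop to 288×288) can be sketched dependency-free with NumPy. The `augment` helper is hypothetical; a real pipeline would use bilinear interpolation rather than the nearest-neighbour resize used here for simplicity.

```python
import numpy as np

def augment(image, rng, train_size=320, crop_size=288):
    """Resize to train_size, random vertical flip, random crop to crop_size."""
    h, w = image.shape[:2]
    rows = np.arange(train_size) * h // train_size
    cols = np.arange(train_size) * w // train_size
    resized = image[rows][:, cols]               # nearest-neighbour resize
    if rng.random() < 0.5:
        resized = resized[::-1]                  # random vertical flip
    top = rng.integers(0, train_size - crop_size + 1)
    left = rng.integers(0, train_size - crop_size + 1)
    return resized[top:top + crop_size, left:left + crop_size]

rng = np.random.default_rng(0)
img = np.random.rand(480, 640, 3)                # dummy HxWxC image
patch = augment(img, rng)                        # 288x288x3 training crop
```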

Paper #2 - Highly Accurate Dichotomous Image Segmentation

  • A new metric called Human Correction Efforts (HCE)
  • To obtain more representative features, FCN-based models [60], Encoder-Decoder [3,81], Coarse-to-Fine [96], Predict-Refine [78, 90], Vision Transformer [118] and so on are developed.
Demos & Codes

Keep Exploring!!!
