"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 10, 2022

Fashion Segmentation Paper Read

Paper - U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Key Notes

  • Captures more contextual information from different scales thanks to the mixture of receptive fields of different sizes in the RSU blocks
  • Code link https://github.com/xuebinqin/U-2-Net
  • Salient object detection (SOD): segmenting the most visually attractive objects in an image
  • Most existing SOD networks rely on deep features extracted by existing backbones, such as AlexNet [17], VGG [35], ResNet [12], ResNeXt [44], DenseNet [15]
  • In backbones such as ResNet and DenseNet, a convolution with stride 2 followed by max pooling with stride 2 reduces the feature maps to one fourth of the input height and width
  • Go deeper while maintaining high resolution feature maps
  • ReSidual U-block (RSU), which is able to extract intra-stage multi-scale features 
  • Multi-scale feature extraction - A 3 × 3 filter is good for extracting local features at each layer
  • Convolution + Feature Extraction + Downsample + Upsample

  • Multi-scale feature extraction methods aim to design new modules that extract both local and global information from the features obtained by backbone networks.
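
The "one fourth" figure in the notes above can be checked with the standard output-size formula for convolution and pooling layers. This is a minimal sketch; the kernel sizes and paddings are assumptions taken from the usual ResNet stem (7x7 convolution, 3x3 max pooling), and the helper names `out_size` and `stem_output` are made up for illustration.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output(size):
    """Apply an assumed ResNet-style stem to a square input of side `size`."""
    # 7x7 convolution, stride 2, padding 3 (assumed stem configuration)
    after_conv = out_size(size, kernel=7, stride=2, padding=3)
    # 3x3 max pooling, stride 2, padding 1 (assumed stem configuration)
    after_pool = out_size(after_conv, kernel=3, stride=2, padding=1)
    return after_conv, after_pool


if __name__ == "__main__":
    for side in (224, 320):
        conv_hw, pool_hw = stem_output(side)
        print(f"input {side} -> after conv {conv_hw} -> after pool {pool_hw}")
```

Each stride-2 layer halves the height and width, so two of them leave a quarter of each side. This early loss of resolution is what U2-Net tries to avoid when it goes deeper.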

The ReSidual U-block (RSU) mainly consists of three components:

  • an input convolution layer, which transforms the input feature map into an intermediate feature map
  • a U-Net like symmetric encoder-decoder structure which takes the intermediate feature map as input and learns to extract and encode the multi-scale contextual information
  • a residual connection which fuses local features and the multi-scale features
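
The three components above amount to computing F1(x) + U(F1(x)), where F1 is the input convolution and U is the U-shaped encoder-decoder. Below is a toy sketch of that data flow on a 1D list, not the paper's implementation. The fixed smoothing weights stand in for learned convolutions, averaging stands in for the concatenation a real U-Net uses, and all function names are hypothetical.

```python
def conv3(x, w=(0.25, 0.5, 0.25)):
    """3-tap 1D convolution with zero padding; fixed weights replace learned ones."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(w[k] * padded[i + k] for k in range(3)) for i in range(len(x))]


def downsample(x):
    """Max pooling with window 2 and stride 2 (a trailing odd sample is kept)."""
    return [max(x[i:i + 2]) for i in range(0, len(x), 2)]


def upsample(x, length):
    """Nearest-neighbour upsampling back to `length` samples."""
    return [x[min(i // 2, len(x) - 1)] for i in range(length)]


def u_block(f, depth):
    """U-Net-like encoder-decoder: encode, recurse at half resolution, decode."""
    enc = conv3(f)
    if depth == 0 or len(enc) < 2:
        return enc
    inner = u_block(downsample(enc), depth - 1)
    up = upsample(inner, len(enc))
    # Assumption: averaging replaces the skip concatenation of a real U-Net.
    merged = [(a + b) / 2 for a, b in zip(enc, up)]
    return conv3(merged)


def rsu(x, depth=3):
    f1 = conv3(x)                          # 1) input convolution -> F1(x)
    u = u_block(f1, depth)                 # 2) multi-scale features U(F1(x))
    return [a + b for a, b in zip(f1, u)]  # 3) residual fusion F1(x) + U(F1(x))


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0]
    print(rsu(signal))
```

The output has the same length as the input, which is the point of the block: the multi-scale context is gathered inside the stage while the stage's resolution is preserved.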

Dataset - Link

  • Labelled Images Samples
  • Ground Truth / Training Images
  • After 600k iterations (with a batch size of 12), the training loss converges and the whole training process takes about 120 hours

Sample data


Creating a labelled dataset like this is also key

Use cases

  • Remove background
  • Create portrait view

Paper #2 - BASNet: Boundary-Aware Salient Object Detection

Code - Link

Background removal tool - Link

Notes

  • Architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module
  • Hybrid loss - combines Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses.
  • Code Link
  • It assembles a UNet-like [57] deeply supervised [31, 67] Encoder-Decoder network with a novel residual refinement module
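
The hybrid loss in the notes above can be sketched as the sum of its three terms on a flattened saliency map. This is a simplified illustration rather than the authors' code: the SSIM term here uses global statistics over the whole map, whereas the paper computes it on local patches. The constants `c1` and `c2` and all function names are assumptions.

```python
import math


def bce_loss(pred, gt, eps=1e-7):
    """Mean binary cross entropy; predictions are clamped to avoid log(0)."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred, gt, eps=1e-7):
    """1 - soft IoU, with the union computed as p + g - p * g per pixel."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(p + g - p * g for p, g in zip(pred, gt))
    return 1.0 - (inter + eps) / (union + eps)


def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM from global statistics (simplification: no local patches)."""
    n = len(pred)
    mu_p, mu_g = sum(pred) / n, sum(gt) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_g = sum((g - mu_g) ** 2 for g in gt) / n
    cov = sum((p - mu_p) * (g - mu_g) for p, g in zip(pred, gt)) / n
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    )
    return 1.0 - ssim


def hybrid_loss(pred, gt):
    """Pixel-level (BCE) + patch-level (SSIM) + map-level (IoU) terms."""
    return bce_loss(pred, gt) + ssim_loss(pred, gt) + iou_loss(pred, gt)


if __name__ == "__main__":
    gt = [0.0, 1.0, 1.0, 0.0]
    print(hybrid_loss([0.2, 0.8, 0.7, 0.1], gt))
```

The three terms penalise errors at different granularities, so a prediction that is good pixel by pixel but blurry at the boundary is still pushed to improve.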

Paper #3 - BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Notes

  • Spatial Path with a small stride to preserve the spatial information and generate high-resolution features
  • Context Path with a fast downsampling strategy is employed to obtain a sufficient receptive field
  • Spatial Path (SP) and Context Path (CP)


Paper #4 - BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Notes

This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representation; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context

Keep Exploring!!!
