Paper - U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection
Key Notes
- Captures more contextual information from different scales, thanks to the mixture of receptive fields of different sizes in the proposed ReSidual U-blocks (RSU)
- Code link https://github.com/xuebinqin/U-2-Net
- Segmenting the most visually attractive objects in an image
- Most SOD networks rely on deep features extracted by existing backbones, such as AlexNet [17], VGG [35], ResNet [12], ResNeXt [44] and DenseNet [15]
- In these backbones, a convolution with a stride of two followed by a max-pooling with a stride of two is used to reduce the feature maps to one fourth of the input size (see the sketch below)
- Go deeper while maintaining high resolution feature maps
- ReSidual U-block (RSU), which is able to extract intra-stage multi-scale features
- Multi-scale feature extraction - small 3 × 3 filters are good at extracting local features at each layer, but their receptive field is too small to capture global context
- Convolution + Feature Extraction + Downsample + Upsample
- Multi-scale feature extraction methods target designing new modules that extract both local and global information from the features obtained by backbone networks
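The one-fourth reduction noted above comes from two stride-2 operations back to back. A tiny check, assuming a ResNet-style stem (7 × 7 convolution plus 3 × 3 max-pooling; the kernel sizes are illustrative, only the strides matter):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                              # dummy input image batch
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)  # stride-2 convolution
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)      # stride-2 max-pooling
print(pool(conv(x)).shape)  # torch.Size([1, 64, 56, 56]) -> 1/4 of the 224x224 input
```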
The ReSidual U-block (RSU) mainly consists of three components
- an input convolution layer, which transforms the input feature map x into an intermediate local feature map F1(x)
- a U-Net like symmetric encoder-decoder structure which takes the intermediate feature map as input and learns to extract and encode the multi-scale contextual information
- a residual connection, which fuses the local features and the multi-scale features by summation: F1(x) + U(F1(x)), where U denotes the U-Net-like encoder-decoder above
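To make the three components concrete, here is a minimal PyTorch sketch of a toy RSU. The real RSU-L blocks in the repo are deeper and use more dilated convolutions, so treat the layer counts and channel choices here as illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNReLU(nn.Module):
    """3x3 convolution + BatchNorm + ReLU; dilation widens the receptive field."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class ToyRSU(nn.Module):
    """Shallow RSU: input conv -> small U-shaped encoder-decoder -> residual sum."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv_in = ConvBNReLU(in_ch, out_ch)              # (1) input convolution layer
        self.enc1 = ConvBNReLU(out_ch, mid_ch)                # (2) U-Net-like encoder-decoder
        self.enc2 = ConvBNReLU(mid_ch, mid_ch)
        self.bottom = ConvBNReLU(mid_ch, mid_ch, dilation=2)  # dilated conv at the bottom
        self.dec2 = ConvBNReLU(mid_ch * 2, mid_ch)
        self.dec1 = ConvBNReLU(mid_ch * 2, out_ch)

    def forward(self, x):
        fx = self.conv_in(x)                                  # local feature map F1(x)
        e1 = self.enc1(fx)
        e2 = self.enc2(F.max_pool2d(e1, 2))                   # downsample
        b = self.bottom(F.max_pool2d(e2, 2))
        d2 = self.dec2(torch.cat([F.interpolate(b, size=e2.shape[2:], mode='bilinear',
                                                align_corners=False), e2], dim=1))  # upsample + skip
        d1 = self.dec1(torch.cat([F.interpolate(d2, size=e1.shape[2:], mode='bilinear',
                                                align_corners=False), e1], dim=1))
        return fx + d1                                        # (3) residual fusion F1(x) + U(F1(x))

out = ToyRSU(3, 16, 64)(torch.randn(1, 3, 128, 128))          # same spatial size as the input
```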
Dataset - Link
- Labelled image samples
- Ground Truth / Training Images
- After 600k iterations (with a batch size of 12), the training loss converges and the whole training process takes about 120 hours
Sample data
- Creating this (the image / ground-truth mask pairs) is also key
Use cases
- Remove background
- Create portrait view
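Below is a rough sketch of the background-removal use case with the pretrained model. It assumes the U2NET class from model/u2net.py and the u2net.pth weights in the linked repo, and it skips the repo's own preprocessing/normalization pipeline for brevity.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from model import U2NET  # assumes model/u2net.py from the linked repo is on the path

net = U2NET(3, 1)
net.load_state_dict(torch.load('u2net.pth', map_location='cpu'))  # pretrained weights
net.eval()

image = Image.open('input.jpg').convert('RGB')
x = transforms.Compose([transforms.Resize((320, 320)),
                        transforms.ToTensor()])(image).unsqueeze(0)

with torch.no_grad():
    pred, *_ = net(x)                 # first output is the fused saliency prediction

mask = pred[0, 0]
mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)     # normalize to [0, 1]
mask = Image.fromarray((mask.numpy() * 255).astype(np.uint8)).resize(image.size)

rgba = image.copy()
rgba.putalpha(mask)                   # use the saliency map as an alpha channel
rgba.save('output.png')
```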
Paper #2 - BASNet: Boundary-Aware Salient Object Detection
Code - Link
Background removal tool - Link
Notes
- Architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module
- Hybrid loss - Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses (see the sketch below)
- It assembles a UNet-like [57] deeply supervised [31, 67] Encoder-Decoder network with a novel residual refinement module
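A minimal sketch of the hybrid loss referenced above. The SSIM term here uses a uniform (average-pool) window rather than the Gaussian window used in the paper, and the prediction is assumed to already be a sigmoid saliency map in [0, 1].

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, target, win=11, C1=0.01 ** 2, C2=0.03 ** 2):
    # Single-scale SSIM with a uniform (average-pool) window instead of a Gaussian one.
    mu_p = F.avg_pool2d(pred, win, 1, win // 2)
    mu_t = F.avg_pool2d(target, win, 1, win // 2)
    var_p = F.avg_pool2d(pred * pred, win, 1, win // 2) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, win, 1, win // 2) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, win, 1, win // 2) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + C1) * (2 * cov + C2)) / \
           ((mu_p ** 2 + mu_t ** 2 + C1) * (var_p + var_t + C2))
    return 1 - ssim.mean()

def iou_loss(pred, target, eps=1e-7):
    # Soft IoU over each saliency map; pred and target are in [0, 1].
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    return (1 - (inter + eps) / (union + eps)).mean()

def hybrid_loss(pred, target):
    # BCE + SSIM + IoU, equally weighted.
    return F.binary_cross_entropy(pred, target) + ssim_loss(pred, target) + iou_loss(pred, target)
```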
Paper #3 - BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Notes
- Spatial Path with a small stride to preserve the spatial information and generate high-resolution features
- Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field
- Spatial Path (SP) and Context Path (CP)
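A toy two-path sketch to illustrate the idea. The layer counts, channel widths and the pooling-based reweighting in the Context Path are assumptions, not the paper's exact configuration (which uses a pretrained backbone, Attention Refinement Modules and a Feature Fusion Module).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, stride):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class SpatialPath(nn.Module):
    # Three stride-2 convolutions: keeps rich spatial detail at 1/8 resolution.
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(conv_bn_relu(3, 64, 2),
                                    conv_bn_relu(64, 128, 2),
                                    conv_bn_relu(128, 256, 2))
    def forward(self, x):
        return self.layers(x)

class ContextPath(nn.Module):
    # Fast downsampling to 1/32 plus global average pooling for a large receptive field.
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2),
                                    conv_bn_relu(64, 128, 2), conv_bn_relu(128, 256, 2),
                                    conv_bn_relu(256, 256, 2))
    def forward(self, x):
        feat = self.layers(x)
        context = torch.sigmoid(F.adaptive_avg_pool2d(feat, 1))  # global context vector
        return feat * context                                    # crude channel reweighting

x = torch.randn(1, 3, 512, 512)
s = SpatialPath()(x)                                             # [1, 256, 64, 64]  (1/8)
c = ContextPath()(x)                                             # [1, 256, 16, 16]  (1/32)
c = F.interpolate(c, size=s.shape[2:], mode='bilinear', align_corners=False)
fused = torch.cat([s, c], dim=1)     # features that the Feature Fusion Module would combine
```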
Paper #4 - BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Notes
- This architecture involves: (i) a Detail Branch, with wide channels and shallow layers to capture low-level details and generate high-resolution feature representations; (ii) a Semantic Branch, with narrow channels and deep layers to obtain high-level semantic context
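A small sketch contrasting the two branches. The stage widths and depths below are illustrative assumptions (the paper's Semantic Branch actually uses a stem block and gather-and-expansion layers, and the Guided Aggregation layer that fuses the branches is omitted here).

```python
import torch
import torch.nn as nn

def stage(in_ch, out_ch, depth):
    # One downsampling stage: the first conv halves the resolution, the rest keep it.
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
    for _ in range(depth - 1):
        layers += [nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# Detail Branch: wide channels, shallow (3 stages) -> high-resolution (1/8) features.
detail_branch = nn.Sequential(stage(3, 64, 2), stage(64, 64, 2), stage(64, 128, 2))

# Semantic Branch: narrow channels, deeper (5 stages) -> low-resolution (1/32) context.
semantic_branch = nn.Sequential(stage(3, 16, 1), stage(16, 32, 2), stage(32, 64, 2),
                                stage(64, 128, 3), stage(128, 128, 3))

x = torch.randn(1, 3, 512, 512)
print(detail_branch(x).shape)    # torch.Size([1, 128, 64, 64]) -> 1/8 resolution
print(semantic_branch(x).shape)  # torch.Size([1, 128, 16, 16]) -> 1/32 resolution
```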
Keep Exploring!!!