"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your gut and don't follow the herd" ; "Validate direction, not destination" ;

October 25, 2022

Segmentation - U2Net

Paper #1 - U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Key Features

  • Mixture of receptive fields of different sizes
  • Increases the depth of the whole architecture 
  • Common backbones (AlexNet, VGG, ResNet, ResNeXt, DenseNet, etc.) are all originally designed for image classification
  • CNN backbones extract features that are representative of semantic meaning
  • Segmentation needs local details and global contrast information, which are essential to saliency detection
  • Key thought - can we go deeper while maintaining high-resolution feature maps, at low memory and computation cost?
  • U2-Net is a two-level nested U-structure that is designed for SOD without using any pre-trained backbones from image classification
  • Input size of 320×320×3
  • Obtain a feature vector describing the saliency of each pixel
  • Saliency detection requires both local and global information
  • Stacking two differently configured U-Nets
  • Convolution + Residual Blocks
  • Multiple dilated convolutions
  • Dilated Convolution: a technique that expands the kernel by inserting gaps (holes) between its consecutive elements. In simpler terms, it is the same as convolution but with pixel skipping, so the kernel covers a larger area of the input.
  • The standard Keras Conv2D layer supports dilation; just set dilation_rate to a value greater than one. For example:
  • out = Conv2D(10, (3, 3), dilation_rate=2)(input_tensor)
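To make the pixel-skipping idea concrete without pulling in Keras, here is a minimal NumPy sketch of a single-channel dilated convolution (valid padding, no learned weights). The function name `dilated_conv2d` is my own; note how a 3×3 kernel with dilation 2 covers a 5×5 region of the input.

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D correlation with a dilated kernel (single channel).

    Dilation inserts (dilation - 1) gaps between kernel taps, so a k x k
    kernel covers an effective area of (k - 1) * dilation + 1 pixels.
    """
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1          # effective kernel height
    eff_w = (kw - 1) * dilation + 1          # effective kernel width
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # sample the input on a dilated grid, then correlate with the kernel
            patch = image[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3))
plain = dilated_conv2d(image, kernel, dilation=1)    # 4x4 output, 3x3 receptive field
dilated = dilated_conv2d(image, kernel, dilation=2)  # 2x2 output, 5x5 receptive field
```

The dilated version sees a wider context per output pixel at the same parameter count, which is exactly why U2-Net uses dilation to enlarge receptive fields cheaply.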

  • Novel ReSidual U-block (RSU)

  • "(U×n-Net)", where n is the number of repeated U-Net modules
  • Dilated Convolution + custom ResNet blocks
  • Repeated modules
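The nested "U inside a U" idea can be sketched with plain NumPy: each toy RSU level runs a convolution stand-in, recurses at half resolution, upsamples, and adds a residual connection. Everything here is a simplification of my own (single channel, mean-filter in place of a learned 3×3 conv + ReLU), not the paper's actual layers.

```python
import numpy as np

def conv3x3(x):
    # Stand-in for a learned 3x3 conv + ReLU: a simple 5-point smoothing filter
    p = np.pad(x, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

def down(x):
    # 2x2 average pooling
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    # Nearest-neighbour upsampling back to double resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def rsu(x, depth):
    """Toy ReSidual U-block: a small U-shape plus a residual connection."""
    fin = conv3x3(x)                  # input convolution (local features)
    if depth == 0:
        return fin + conv3x3(fin)     # bottom level: residual over one conv
    enc = conv3x3(down(fin))          # encode at half resolution
    dec = up(rsu(enc, depth - 1))     # recurse: the nested inner U
    return fin + conv3x3(dec)         # residual: local + multi-scale features

x = np.random.rand(32, 32)
y = rsu(x, depth=3)                   # output keeps the input resolution
```

The residual sum of the input convolution and the decoded multi-scale path is the mechanism that lets the block mix local detail with global context while preserving resolution.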

  • In the training process, each image is first resized to 320×320 and randomly flipped vertically and cropped to 288×288.
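The training-time preprocessing above (resize to 320×320, random vertical flip, random crop to 288×288) can be sketched dependency-free with NumPy. The `augment` helper is hypothetical; a real pipeline would use bilinear interpolation rather than the nearest-neighbour resize used here for simplicity.

```python
import numpy as np

def augment(image, rng, train_size=320, crop_size=288):
    """Resize to train_size, random vertical flip, random crop to crop_size."""
    h, w = image.shape[:2]
    rows = np.arange(train_size) * h // train_size
    cols = np.arange(train_size) * w // train_size
    resized = image[rows][:, cols]               # nearest-neighbour resize
    if rng.random() < 0.5:
        resized = resized[::-1]                  # random vertical flip
    top = rng.integers(0, train_size - crop_size + 1)
    left = rng.integers(0, train_size - crop_size + 1)
    return resized[top:top + crop_size, left:left + crop_size]

rng = np.random.default_rng(0)
img = np.random.rand(480, 640, 3)                # dummy HxWxC image
patch = augment(img, rng)                        # 288x288x3 training crop
```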

Paper #2 - Highly Accurate Dichotomous Image Segmentation

  • A new metric called Human Correction Efforts (HCE)
  • To obtain more representative features, FCN-based models [60], Encoder-Decoder [3,81], Coarse-to-Fine [96], Predict-Refine [78, 90], Vision Transformer [118] and so on are developed.
Demos & Codes

Keep Exploring!!!
