"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 05, 2020

Research Paper Reads - People Counting

Paper #1 - PaDNet: Pan-Density Crowd Counting

Key Notes
  • Density-Aware Network (DAN) contains multiple subnetworks pretrained on scenarios with different densities
  • Capturing pan-density information
  • Feature Enhancement Layer (FEL) effectively captures the global and local contextual features
  • Feature Fusion Network (FFN) embeds spatial context and fuses these density-specific features
Real World Challenges
  • Inconsistent densities due to camera perspective
Existing Methods
  • Sliding window detector
  • Regression-based approaches
  • Hand-crafted features
  • Detection-based methods are affected by severe occlusions
Switch-CNN
  • Switch-CNN trains a switch classifier to select the optimal regressor for each input patch
  • Each subnetwork of Switch-CNN is trained on a specific density subdataset and thus cannot utilize the whole dataset
  • High computation complexity in predicting the global and local contexts
Network Design
  • Feature Extraction Network (FEN) extracts low-level features of the image
  • DAN employs multiple subnetworks to recognize different density levels in crowds and to generate the feature map
Regression Based Methods
  • Mapping from low-level features extracted from local image patches to the count
Features
  • Extracted features include foreground features, edge features, textures, and gradient features such as local binary patterns (LBP) and histograms of oriented gradients (HOG)
  • Regression approaches include linear regression [24], piecewise linear regression [25], ridge regression [26], and Gaussian process regression
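
To make the regression idea concrete, here is a minimal sketch of ridge regression from a single hand-crafted feature to a head count. The feature (foreground area per patch), the toy numbers, and the function name `fit_ridge_1d` are assumptions for illustration only; they do not come from any of the papers.

```python
def fit_ridge_1d(xs, ys, lam=1.0):
    """Fit y ~ w*x + b, penalising w**2 by lam (the intercept b is not penalised)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centred sums give the closed-form solution for one feature.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b


# Hypothetical training data: foreground area of a patch -> number of heads.
areas = [100.0, 200.0, 300.0, 400.0]
counts = [3.0, 5.0, 7.0, 9.0]

w, b = fit_ridge_1d(areas, counts, lam=0.0)   # lam=0 reduces to ordinary least squares
predicted = w * 250.0 + b                      # estimated count for an unseen patch

w_shrunk, _ = fit_ridge_1d(areas, counts, lam=50000.0)  # larger lam shrinks the slope
```

With `lam=0.0` this is ordinary linear regression; increasing `lam` shrinks the slope, which is what distinguishes ridge regression from the plain linear variant.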
MCNN architecture
  • Five-branch contextual pyramid CNN
  • GAN-based method to generate high-quality density maps
Density-Aware Network (DAN)
  • Input: crowd image patch dataset S
  • Output: the parameters ΘPaDNet
  • Init: Divide the image patches S into N clusters S1, S2, ..., SN via the K-means clustering algorithm.
  • P is the number of people in an image patch, dij represents the distance between the ith subject and its jth nearest neighbor
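
The initialisation step can be sketched roughly as follows: score each patch by the mean distance from every head to its nearest neighbours, then cluster the scores with K-means. The exact density formula, the choice of `k`, the toy patches, and the helper names `patch_density` and `kmeans_1d` are assumptions for illustration; the notes only define P and dij.

```python
import math


def patch_density(points, k=2):
    """Average over the P heads of the mean distance to each head's k nearest neighbours."""
    p = len(points)
    if p < 2:
        raise ValueError("need at least two annotated heads in a patch")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        nearest = dists[:k]
        total += sum(nearest) / len(nearest)
    return total / p


def kmeans_1d(values, n_clusters, n_iter=50):
    """Plain Lloyd's algorithm on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic init: spread the centres across the sorted values.
    centres = [
        ordered[(2 * c + 1) * len(ordered) // (2 * n_clusters)]
        for c in range(n_clusters)
    ]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [
            min(range(n_clusters), key=lambda c: abs(v - centres[c]))
            for v in values
        ]
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


# Hypothetical head annotations (x, y) for four patches: two crowded, two sparse.
patches = [
    [(0, 0), (1, 0), (0, 1), (1, 1)],
    [(0, 0), (10, 0), (0, 10), (10, 10)],
    [(0, 0), (2, 0), (0, 2), (2, 2)],
    [(0, 0), (12, 0), (0, 12)],
]
scores = [patch_density(p, k=2) for p in patches]
labels = kmeans_1d(scores, n_clusters=2)
```

A small score means heads are close together (dense patch), so patches sharing a label would go to the same density-specific subnetwork for pretraining.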
Paper #2 - Point in, Box out: Beyond Counting Persons in Crowds
Crowd Counting Challenges
Heavy occlusions, perspective distortions, scale variations and varying density of people
Combine the detection result with regression result for crowd counting

Implementation
Point-level annotations on person heads
Novelty
  • Online pseudo ground truth updating scheme which initializes the pseudo ground truth bounding boxes from point-level annotations
  • Novel locally-constrained regression loss
  • Curriculum learning strategy
Key Lessons
  • Patch-based density estimation
Paper #3 - Revisiting Perspective Information for Efficient Crowd Counting
  • Estimate crowd counts via the detection of each individual pedestrian
  • Regression of density maps
  • Crowd counting is cast as estimating a continuous density function
Multi-scale architecture of convolutional neural networks (CNNs) to regress the density maps at different resolutions
Average distance from a given head at position j to its K nearest neighbors (K-NN)
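
A minimal sketch of how point annotations can be turned into a density map whose sum equals the crowd count: each head gets a normalised Gaussian whose width follows the average K-NN distance. The scaling factor `beta`, the minimum sigma, the grid size, and the function name `density_map` are assumptions for illustration, not values taken from the paper.

```python
import math


def density_map(heads, height, width, k=2, beta=0.3):
    """Sum of per-head Gaussians; sigma = beta * mean distance to the k nearest heads."""
    dmap = [[0.0] * width for _ in range(height)]
    for i, (x, y) in enumerate(heads):
        dists = sorted(
            math.hypot(x - xo, y - yo)
            for j, (xo, yo) in enumerate(heads) if j != i
        )
        nearest = dists[:k]
        sigma = beta * sum(nearest) / len(nearest) if nearest else 1.0
        sigma = max(sigma, 0.5)  # assumed floor so very close heads still get a usable kernel
        kernel = [
            [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        norm = sum(map(sum, kernel))  # normalise so each head contributes exactly 1
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / norm
    return dmap


# Hypothetical head positions (x, y) inside a 20x20 patch.
heads = [(3, 4), (5, 4), (15, 12)]
dmap = density_map(heads, height=20, width=20)
estimated_count = sum(map(sum, dmap))
```

Because every kernel is normalised over the grid, integrating the map recovers the number of annotated heads, which is the property a density-regression network is trained to reproduce.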

Datasets
  • ShanghaiTech
  • WorldExpo’10
  • UCF_CC_50
  • UCSD
Paper #4 - CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
  • Convolutional neural network
  • Front-end for 2D feature extraction and a dilated CNN for the back-end
  • Multi-column based architecture (MCNN) for crowd counting.
  • Dilated convolutional layers have been demonstrated in segmentation tasks with significant improvement of accuracy
  • Dilated convolution shows distinct advantages compared to the scheme of using convolution + pooling + deconvolution. 
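
A small 1-D sketch of what dilation does: the kernel keeps the same number of weights, but its taps are spread apart, so each output sees a wider stretch of the input without any pooling. The function name `dilated_conv1d` and the toy signal are assumptions for illustration; CSRNet itself uses 2-D dilated convolutions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D cross-correlation with `dilation` steps between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of a single output value
    return [
        sum(kernel[t] * signal[start + t * dilation] for t in range(len(kernel)))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)  # each output covers 3 inputs
wide = dilated_conv1d(signal, kernel, dilation=2)   # same 3 weights, covers 5 inputs
```

With dilation 2 the three-tap kernel spans five input positions, which is how a dilated back-end can enlarge the receptive field while keeping the feature map resolution.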

Deep Learning-Based Crowd Scene Analysis Survey
Detection based approaches
Datasets

Make it a High / Medium / Low crowd scenario

