"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 06, 2020

Learning Notes - Action Recognition - Part II

Paper #1 - Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

Key Notes
  • Extraction of local spatio-temporal features followed by temporal modeling
Spatio-temporal feature extraction
  • Sample consecutive frames
  • Optical flow for temporal modeling
  • Dense Trajectory (IDT), Motion History Image (MHI)
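As an illustration of the hand-crafted motion features listed above, here is a minimal sketch of a Motion History Image built from simple frame differencing. This is one common formulation, not necessarily the one used in the paper; the function name and the `tau` and `diff_threshold` parameters are assumptions chosen for the example.

```python
import numpy as np

def motion_history_image(frames, tau=10, diff_threshold=30):
    """Build an MHI from grayscale frames of shape (T, H, W).

    tau and diff_threshold are illustrative values, not taken from the paper.
    Returns a float array of shape (H, W) scaled to [0, 1].
    """
    # Use a signed type so that frame differences do not wrap around
    frames = np.asarray(frames, dtype=np.int16)
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold
        # Pixels that just moved are reset to tau; all others decay by one step
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau

# A single bright pixel moving one step to the right per frame
frames = np.zeros((3, 4, 4), dtype=np.uint8)
frames[0, 0, 0] = 255
frames[1, 0, 1] = 255
frames[2, 0, 2] = 255
mhi = motion_history_image(frames)
```

Recent motion appears brighter than older motion, so a single image summarizes both where and how recently something moved.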
Network Architecture
  • Bi-directional LSTM
  • Spatial-temporal CNN (STCNN) with Segmentation models
  • Temporal convolutional networks (TCN)
  • Temporal deformable residual networks (TDRN) 
Different Convolution Strategies
  • Standard convolution - Uses a fixed, box-shaped filter that samples the input on a regular grid
  • Dilated convolution - Dilating a filter expands its footprint by inserting zeros between the filter taps, which enlarges the receptive field without adding weights
  • Keras example: out = Conv2D(10, (3, 3), dilation_rate=2)(input_tensor)
  • Deformable convolution - Learns offsets for the filter's sampling positions, so the filter shape adapts to the input instead of staying on a fixed grid
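To make the difference between standard and dilated convolution concrete, the sketch below dilates a 3x3 kernel with rate 2 and applies both versions to a small image using plain NumPy. The helper names `dilate_kernel` and `conv2d_valid` are hypothetical and written only for this illustration; in practice a framework layer such as the Keras `Conv2D` call above would be used.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel taps."""
    k_h, k_w = kernel.shape
    out = np.zeros(((k_h - 1) * rate + 1, (k_w - 1) * rate + 1), dtype=kernel.dtype)
    out[::rate, ::rate] = kernel
    return out

def conv2d_valid(image, kernel):
    """Naive 'valid' cross-correlation (what deep learning frameworks call convolution)."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

image = np.arange(49, dtype=np.float64).reshape(7, 7)
kernel = np.ones((3, 3))

# Standard: each output sees a 3x3 patch
standard = conv2d_valid(image, kernel)
# Dilated (rate 2): same 9 weights, but each output sees a 5x5 patch
dilated = conv2d_valid(image, dilate_kernel(kernel, 2))
print(standard.shape, dilated.shape)
```

Both filters have nine non-zero weights; only the spacing of the sampled positions changes. A deformable convolution goes one step further by learning that spacing per position, which is not sketched here.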
Implementation
  • Videos were downsampled to 6 fps
  • Frames were resized to 224x224 and augmented using random cropping and mean removal
  • Each video snippet contained 16 frames after sampling
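The preprocessing steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the source frame rate of 30 fps, the crop size, the use of non-overlapping snippets, and all function names are hypothetical, and the spatial resize to 224x224 is assumed to have been done beforehand by an image library.

```python
import numpy as np

def downsample_indices(num_frames, src_fps, dst_fps=6):
    """Indices of the frames to keep when reducing src_fps to dst_fps."""
    step = src_fps / dst_fps
    return np.arange(0, num_frames, step).astype(int)

def make_snippets(frames, snippet_len=16):
    """Split (T, H, W, C) frames into non-overlapping snippets, dropping the remainder."""
    n = len(frames) // snippet_len
    return frames[: n * snippet_len].reshape(n, snippet_len, *frames.shape[1:])

def random_crop(snippet, crop_size, rng):
    """Apply the same random spatial crop to every frame of a snippet."""
    _, h, w, _ = snippet.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return snippet[:, top:top + crop_size, left:left + crop_size, :]

def remove_mean(snippet, mean):
    """Subtract a per-channel mean and return float32 frames."""
    return snippet.astype(np.float32) - np.asarray(mean, dtype=np.float32)

rng = np.random.default_rng(0)
# Small random stand-in for a video whose frames were already resized
video = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
kept = video[downsample_indices(len(video), src_fps=30)]
snippets = make_snippets(kept)
crop = random_crop(snippets[0], crop_size=28, rng=rng)
clip = remove_mean(crop, crop.mean(axis=(0, 1, 2)))
print(snippets.shape, clip.shape)
```

Using one crop offset for all 16 frames keeps the snippet spatially aligned, which matters when the network is expected to learn motion between consecutive frames.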
Key Notes
  • Generative Adversarial Network (GAN) to generate exact joint locations from noisy probability heat maps
  • Detection and classification are applied to continuous video sequences containing multiple activities
  • Generative adversarial network (GAN) to produce potential body joint locations in an unsupervised manner
Features
  • Optical flow (OF) and feature matching
  • Picking from shelf vs putting back
  • Joint location estimation results using GAN-based approach.
  • Actions - Reach, Retract, Hand in, Insp. Product, Insp. Shelf
  • A similar approach to keypoint detection on fashion datasets can be leveraged here too

Key Notes
  • Temporal Convolutional Networks (TCNs)
  • Two types of TCNs 
  • First, the Encoder-Decoder TCN (ED-TCN) uses only a hierarchy of temporal convolutions, pooling, and upsampling, yet can efficiently capture long-range temporal patterns
  • Second, the Dilated TCN uses dilated convolutions
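To see why stacked dilated convolutions capture long-range temporal patterns, the sketch below implements a dilated 1D convolution in NumPy and measures the receptive field of a small stack. It is a toy illustration, not the paper's model: the causal (left-only) padding, the kernel size of 2, the dilation schedule 1, 2, 4, 8, and the function names are all assumptions made for the example.

```python
import numpy as np

def dilated_conv1d(x, weights, dilation):
    """Causal dilated 1D convolution of a (T,) signal, zero-padded on the left."""
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for t in range(len(x)):
        # Tap i looks (k - 1 - i) * dilation steps into the past
        taps = padded[t + np.arange(k) * dilation]
        out[t] = np.dot(taps, weights)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps that influence one output step of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [1, 2, 4, 8]
signal = np.zeros(32)
signal[0] = 1.0  # unit impulse at t = 0

# Push the impulse through the stack and see how far it spreads
response = signal
for d in dilations:
    response = dilated_conv1d(response, np.ones(2), d)

spread = int(np.count_nonzero(response))
print(spread, receptive_field(2, dilations))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, whereas an undilated stack of the same depth and kernel size would cover only 5 time steps.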
Code - Temporal Convolutional Networks
More Reads
An introduction to ConvLSTM
Keras Convolutional LSTM network
Dense-Optical-Flow
Anomaly Detection in Videos using LSTM Convolutional Autoencoder
Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition
