Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Action Recognition

June 06, 2020

Learning Notes - Action Recognition - Part II

Paper #1 - Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

Key Notes

Extraction of local spatio-temporal features followed by temporal modeling

Spatio-temporal feature extraction

Sample consecutive frames
Optical flow for temporal modeling
Improved Dense Trajectories (IDT), Motion History Image (MHI)

Network Architecture

Bi-directional LSTM
Spatial-temporal CNN (STCNN) with Segmentation models
Temporal convolutional networks (TCN)
Temporal deformable residual networks (TDRN)

Different Convolution Strategies

Standard convolution - The filter samples the input on a fixed, rectangular grid whose shape does not change
Dilated convolution - Dilating the filter expands its extent by inserting zeros (gaps) between the kernel weights, so it covers a wider area with the same number of learned weights. In Keras:
out = Conv2D(10, (3, 3), dilation_rate=2)(input_tensor)
Deformable convolution - Deformable convolutions learn offsets to the filter's sampling positions, so the filter shape adapts to the input instead of staying fixed
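
To make the "filling the empty positions with zeros" idea concrete, here is a minimal pure-Python sketch in 1D. The helper names (`dilate_kernel`, `conv1d_valid`) and the toy signal are my own, for illustration only; they are not from the paper or from Keras.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel weights."""
    dilated = []
    for i, w in enumerate(kernel):
        dilated.append(w)
        if i < len(kernel) - 1:
            dilated.extend([0.0] * (rate - 1))
    return dilated


def conv1d_valid(signal, kernel):
    """Plain 'valid' 1D cross-correlation (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1.0, 1.0, 1.0]
dilated = dilate_kernel(kernel, rate=2)      # 3 weights now span 5 positions
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

standard_out = conv1d_valid(signal, kernel)  # each output sees 3 neighbours
dilated_out = conv1d_valid(signal, dilated)  # each output sees every 2nd sample
```

The dilated kernel still has three learned weights, but each output now looks at samples two steps apart, which is the wider coverage that `dilation_rate=2` gives in the Keras call.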

Implementation

Downsampled to 6fps
Frames were resized to 224x224 and augmented using random cropping and mean removal
Each video snippet contained 16 frames after sampling
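
The sampling steps above can be sketched roughly as follows. The 30 fps source rate, the non-overlapping snippets, and dropping the incomplete tail are my assumptions; the notes only state the 6 fps target and the 16-frame snippet length.

```python
def sample_frame_indices(num_frames, source_fps, target_fps):
    """Pick indices of source frames that approximate target_fps."""
    step = source_fps / target_fps
    indices = []
    position = 0.0
    while int(position) < num_frames:
        indices.append(int(position))
        position += step
    return indices


def split_into_snippets(indices, snippet_len=16):
    """Group sampled indices into full, non-overlapping snippets."""
    full = len(indices) - len(indices) % snippet_len
    return [indices[i:i + snippet_len] for i in range(0, full, snippet_len)]


# Assumed example: a 10-second clip recorded at 30 fps (300 frames).
indices = sample_frame_indices(num_frames=300, source_fps=30, target_fps=6)
snippets = split_into_snippets(indices)  # leftover frames at the end are dropped
```

Each snippet is a list of frame indices; resizing to 224x224, random cropping and mean removal would then be applied to the frames at those indices.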

Paper #2 - Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection

Key Notes

A Generative Adversarial Network (GAN) is used to generate exact joint locations from noisy probability heat maps
Detection and classification are applied to continuous video sequences containing multiple activities
The GAN produces potential body joint locations in an unsupervised manner

Features

Optical flow (OF) and feature matching
Picking from shelf vs putting back
Joint location estimation results using GAN-based approach.
Actions - Reach, Retract, Hand in, Insp. Product, Insp. Shelf
Fashion dataset keypoint detection - a similar approach can be leveraged here too

Paper #3 - Temporal Convolutional Networks for Action Segmentation and Detection

Key Notes

Temporal Convolutional Networks (TCNs)
Two types of TCNs
First, the Encoder-Decoder TCN (ED-TCN) uses only a hierarchy of temporal convolutions, pooling, and upsampling, but can efficiently capture long-range temporal patterns.
Second, the Dilated TCN uses dilated convolutions instead of pooling and upsampling.
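
A quick way to see why dilation helps with long-range temporal patterns is to compute the receptive field of a stack of temporal convolutions. This is a sketch under my own assumptions (stride 1, kernel size 3, four layers, dilation doubling per layer); the paper's actual layer configuration may differ.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


doubling = [2 ** i for i in range(4)]  # assumed schedule: 1, 2, 4, 8

plain = receptive_field(kernel_size=3, dilations=[1, 1, 1, 1])
dilated = receptive_field(kernel_size=3, dilations=doubling)
```

With the same number of layers and weights, the dilated stack covers 31 time steps instead of 9.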

Code - Temporal Convolutional Networks

More Reads
An introduction to ConvLSTM
Keras Convolutional LSTM network
Dense-Optical-Flow
Anomaly Detection in Videos using LSTM Convolutional Autoencoder
Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition

Happy Learning!!!

December 01, 2018

Day #157 - Video Analysis using Deep Learning - Research papers - Action Recognition

Paper #1 - Action Classification and Highlighting in Videos

A limitation of plain RNNs is their inability to backpropagate error through long temporal intervals (the vanishing gradient problem)
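
As a rough illustration of the vanishing gradient effect, assume (as a simplification of my own) that every time step scales the backpropagated error by the same constant factor below 1. The error reaching a distant step then shrinks exponentially:

```python
def gradient_scale(per_step_factor, num_steps):
    """Error magnitude left after backpropagating through num_steps steps."""
    return per_step_factor ** num_steps


short_range = gradient_scale(0.9, 10)   # roughly a third of the signal survives
long_range = gradient_scale(0.9, 100)   # almost nothing survives
```

In a real RNN the per-step factor depends on the weights and activations, so this is only an order-of-magnitude picture of why LSTM-style gating is used for long videos.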

Key Summary notes from paper

End-to-end encoder-decoder LSTM framework with a built-in attention mechanism; the LSTM decoder is equipped with an attention/alignment model
Encodes a video into a temporal sequence of visual representations and chooses an adaptively weighted subset of that sequence for prediction
Classify actions and highlight frames associated with the action
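
The "adaptively weighted subset" idea can be sketched as soft attention over per-frame features. In the paper the scores come from a learned alignment model; here they are hard-coded, and the function names and toy features are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw alignment scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, scores):
    """Return attention weights and the weighted sum of frame features."""
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


# Three frames with 2-dimensional features; the third frame scores highest.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 0.1, 3.0]

weights, context = attend(frame_features, scores)
highlighted = max(range(len(weights)), key=weights.__getitem__)
```

The context vector feeds the classifier, while the weights themselves indicate which frames to highlight for the action.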

Implementation Learnings

CNN Encoder - Set of frames passed to extract features, VGGNet used in this case
Action Model - Feedforward network plus LSTM Decoder

Real World Implementation Article - Video Analysis to Detect Suspicious Activity Based on Deep Learning

Key Summary

Use Transfer Learning to extract features
Pass the data to new RNN
Perform Classification on it

Key Lessons

Extract frames from video
Use Inception network to generate features
A set of 15 frames is used to compute each action prediction and its aggregate value
Pass the features of those 15 frames to an RNN (LSTM)
Perform Action Classification

Implementation Approach #2 - Five video classification methods implemented in Keras and TensorFlow

I liked the approach of combining a CNN with an RNN

Presentation #1 - Multi-Dimensional LSTM Networks for Video Prediction

Key Lessons

Standard LSTM, Bidirectional LSTM
Parallel Multi-Dimensional LSTM
Convolutional LSTM for video prediction
In a Convolutional LSTM, the inputs, cell outputs, hidden states, and gates are 3D tensors
20 Convolutional LSTM layers + 2 skip connections

Paper #2 - What is Convolutional LSTM?

Key Lessons

Extends the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions
The LSTM encoder-decoder framework proposed in [23] provides a general framework for sequence-to-sequence learning problems by training temporally concatenated LSTMs
In a ConvLSTM, the inputs, cell outputs, hidden states, and gates are 3D tensors whose last two dimensions are spatial dimensions (rows and columns)
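
For reference, the ConvLSTM update is usually written as below, where $*$ denotes convolution and $\circ$ the Hadamard (element-wise) product. The symbols ($X_t$ input, $H_t$ hidden state, $C_t$ cell output, $i_t$, $f_t$, $o_t$ gates) follow the common presentation of the ConvLSTM formulation; treat this as a summary, not a derivation.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing the matrix products of a fully connected LSTM with the convolutions $W_{x\cdot} * X_t$ (input-to-state) and $W_{h\cdot} * H_{t-1}$ (state-to-state) is what keeps the row and column structure of each frame intact.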

Paper #3 - Exploiting Objects with LSTMs for Video Categorization
Key Summary

A CNN takes a single frame or optical flow image as its input, and hence fails to consider temporal coherence in videos
To exploit long-term temporal dynamics, recent studies have adopted LSTMs
A first-level CNN extracts high-level objects, which are then used by an LSTM to capture temporal dynamics in videos

Paper #4 - Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network
Key Summary

YOLO + LSTM = ROLO
Input frame -> YOLO features -> Spatial constraint (detection) -> Temporal constraint (LSTM) -> Prediction

Paper #5 - Beyond Short Snippets: Deep Networks for Video Classification
Key Summary

An RNN that uses LSTM cells connected to the output of the underlying CNN
The LSTM cells operate on frame-level CNN activations
Captures the video's temporal evolution

Different feature pooling strategies

Conv Pooling
Late Pooling
Slow Pooling
Local Pooling
GoogLeNet Conv Pooling
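
Of the strategies listed, Conv Pooling is the simplest to picture: max-pool the final convolutional features across all frames of the video. A minimal sketch, with a made-up helper name and toy feature values:

```python
def temporal_max_pool(frame_features):
    """Element-wise max across frames: T lists of D values -> one list of D values."""
    return [max(column) for column in zip(*frame_features)]


# Three frames, each with a 3-dimensional feature vector (toy values).
frame_features = [
    [0.1, 0.9, 0.0],
    [0.7, 0.2, 0.0],
    [0.3, 0.4, 0.5],
]

video_descriptor = temporal_max_pool(frame_features)
```

The other variants mainly differ in where this pooling sits relative to the fully connected layers and whether it is applied in one step or hierarchically over smaller temporal windows.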

More Reads
Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects
Online Video Object Detection using Association LSTM

More References
https://github.com/harvitronix/five-video-classification-methods
https://github.com/harvitronix/continuous-online-video-classification-blog
https://github.com/tencia/video_predict
https://github.com/sagarvegad/Video-Classification-CNN-and-LSTM-
http://blog.qure.ai/notes/deep-learning-for-videos-action-recognition-review
https://github.com/Guanghan/ROLO
https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/

Action Recognition

Static Action Recognition
Video action recognition - Optical flow between frames
Stitch Multiple Frames and evaluate with CNN

Session - Link

Key Lessons

Video is a stack of frames
Sports-1M and UCF-101 datasets
Preprocess / Crop to a fixed size

Frame-based object detections
Late Fusion - Frames spaced widely apart (15 frames)
Overlapping patches

One on Centre of object
Low-resolution frame

Data Augmentation
Resize / Rotate

Happy Learning!!!
