Paper 1 - Action Classification and Highlighting in Videos
Paper #5 - Beyond Short Snippets: Deep Networks for Video Classification
A limitation of RNNs is their inability to backpropagate error through long-range temporal intervals (a problem known as the vanishing gradient effect)
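To make the vanishing gradient effect concrete, here is a minimal sketch using a scalar RNN, h_t = tanh(w * h_{t-1} + x_t). The toy model, the weight value and the inputs are assumptions chosen for illustration and are not taken from the paper. By the chain rule, the gradient of the last state with respect to the first is a product of one factor per time step, and when each factor is below 1 the product shrinks exponentially with sequence length.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


# Hypothetical settings: recurrent weight 0.5, constant input 0.1.
short_grad = gradient_through_time(w=0.5, inputs=[0.1] * 5)
long_grad = gradient_through_time(w=0.5, inputs=[0.1] * 50)
print(short_grad, long_grad)
```

With these settings every factor is at most 0.5, so the gradient over 50 steps is many orders of magnitude smaller than over 5 steps. This is the effect that LSTM gating is designed to reduce.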
Key Summary notes from paper
- End-to-end encoder-decoder LSTM framework with a built-in attention mechanism: the LSTM decoder is equipped with an attention/alignment model
- Encodes a video into a temporal sequence of visual representations and chooses an adaptively weighted subset of that sequence for prediction
- Classify actions and highlight frames associated with the action
- CNN Encoder - A set of frames is passed through a CNN (VGGNet in this case) to extract features
- Action Model - Feedforward network plus LSTM Decoder
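The "adaptively weighted subset" idea can be sketched as soft attention: each frame gets a relevance score, a softmax turns the scores into weights, and the prediction uses the weighted sum of frame features. In the paper the scores come from the alignment model inside the decoder. Here they are simply passed in as made-up numbers, and the function names and toy data are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, scores):
    """Return the attention-weighted sum of frame features and the weights."""
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return context, weights


# Hypothetical toy data: 4 frames with 3-dimensional features and made-up scores.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
scores = [0.1, 2.0, 0.1, 0.5]
context, weights = attend(features, scores)

# The frame with the largest weight is the one to "highlight".
highlight = max(range(len(weights)), key=weights.__getitem__)
print(highlight, weights)
```

The same weights serve both purposes listed above: they drive the classification through the context vector, and they indicate which frames are associated with the action.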
Key Summary
- Use Transfer Learning to extract features
- Pass the extracted features to a new RNN
- Perform classification on the RNN output
- Extract frames from video
- Use Inception network to generate features
- Frames are grouped into sets of 15 to compute the action and an aggregate value
- Pass the features of each set of 15 frames to an RNN (LSTM)
- Perform Action Classification
- I liked the approach of combining a CNN with an RNN
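One possible reading of the 15-frame step is sketched below: split the per-frame features into consecutive windows of 15, score each window, then average the scores across windows. The helper names and the `toy_classifier` stand-in for the LSTM are hypothetical, and the real pipeline would feed Inception features rather than scalars.

```python
def split_into_windows(items, size=15):
    """Split a sequence into consecutive, non-overlapping windows of `size`.

    Trailing items that do not fill a whole window are dropped.
    """
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def average_scores(per_window_scores):
    """Average class scores across windows, one value per class."""
    count = len(per_window_scores)
    return [sum(column) / count for column in zip(*per_window_scores)]


def predict_video(frame_features, classify_window, size=15):
    """Classify each window, average the scores, return (best class, scores)."""
    windows = split_into_windows(frame_features, size)
    scores = [classify_window(window) for window in windows]
    averaged = average_scores(scores)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best, averaged


def toy_classifier(window):
    """Hypothetical stand-in for the LSTM: two class scores from the window mean."""
    mean = sum(window) / len(window)
    return [1.0 - mean, mean]


# Hypothetical per-frame features: 40 scalar values.
features = [0.9] * 30 + [0.1] * 10
label, scores = predict_video(features, toy_classifier)
print(label, scores)
```

This sketch assumes the video has at least 15 frames. Shorter clips would need padding or a smaller window.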
Key Lessons
- Standard LSTM, Bidirectional LSTM
- Parallel Multi-Dimensional LSTM
- Convolutional LSTM for video prediction
- In a Convolutional LSTM the inputs, states and gates are 3D tensors
- 20 Convolutional LSTM layers + 2 skip connections
Key Lessons
- Extends the fully connected LSTM (FC-LSTM) with convolutional structures in both the input-to-state and state-to-state transitions
- The LSTM encoder-decoder framework proposed in [23] provides a general framework for sequence-to-sequence learning problems by training temporally concatenated LSTMs
- In a ConvLSTM the inputs, cell outputs, hidden states and gates are 3D tensors whose last two dimensions are spatial dimensions (rows and columns)
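For reference, the ConvLSTM update is commonly written as below, to the best of my understanding of the standard formulation. Here `*` denotes convolution and `\circ` the element-wise (Hadamard) product. X_t is the input, H_t the hidden state and C_t the cell state, all 3D tensors. The only change from the fully connected LSTM is that the matrix products in the input-to-state and state-to-state terms become convolutions.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```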
Key Summary
- A CNN takes a single frame or optical flow image as its input, so on its own it fails to consider temporal coherence in videos
- To exploit long-term temporal dynamics, recent studies have adopted LSTMs
- A first-level CNN extracts high-level features of objects, which an LSTM then uses to capture temporal dynamics in videos
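The CNN-then-LSTM idea can be sketched with a plain LSTM cell run over per-frame feature vectors. This is a minimal pure-Python illustration, not the paper's model: the weights are random, the hidden size is arbitrary, the frame features are made up, and names such as `encode_video` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h, c, params):
    """One LSTM step; every gate sees the concatenation [x, h]."""
    z = x + h  # list concatenation
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]    # input gate
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]    # forget gate
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(params["Wg"], z, params["bg"])]  # candidate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(input_dim, hidden_dim, seed=0):
    """Random small weights and zero biases for the four gates."""
    rng = random.Random(seed)
    params = {}
    for gate in "ifog":
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
            for _ in range(hidden_dim)
        ]
        params["b" + gate] = [0.0] * hidden_dim
    return params


def encode_video(frame_features, hidden_dim=4):
    """Run the LSTM over the frame features and return the final hidden state."""
    params = init_params(len(frame_features[0]), hidden_dim)
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frame_features:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical CNN output: 5 frames, 3 features per frame.
video_state = encode_video([[0.1, 0.2, 0.3]] * 5)
print(video_state)
```

The final hidden state summarizes the whole frame sequence and would normally be fed to a classification layer.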
Key Summary
- YOLO + LSTM = ROLO
- Pipeline: input frame -> YOLO features -> detection (spatial constraint) -> LSTM (temporal constraint) -> prediction
Key Summary
- An RNN with LSTM cells that are connected to the output of the underlying CNN
- The LSTM cells operate on frame-level CNN activations
- Captures the video's temporal evolution
- Conv Pooling
- Late Pooling
- Slow Pooling
- Local Pooling
- GoogLeNet Conv Pooling
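The pooling variants above differ mainly in where and over what time span the frame-level features are max-pooled. As a rough sketch, assuming made-up feature vectors and hypothetical function names, pooling over the whole clip and pooling over short local windows look like this:

```python
def temporal_max_pool(frame_features):
    """Element-wise max across frames: one vector for the whole clip."""
    return [max(column) for column in zip(*frame_features)]


def windowed_max_pool(frame_features, window, stride):
    """Max-pool over short temporal windows: one vector per window."""
    return [
        temporal_max_pool(frame_features[start:start + window])
        for start in range(0, len(frame_features) - window + 1, stride)
    ]


# Hypothetical frame-level activations: 4 frames, 2 features each.
frames = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5], [0.8, 0.1]]
clip_vector = temporal_max_pool(frames)          # [0.8, 0.9]
local_vectors = windowed_max_pool(frames, 2, 2)  # [[0.4, 0.9], [0.8, 0.5]]
print(clip_vector, local_vectors)
```

Pooling once over all frames is in the spirit of Conv Pooling, while pooling over small windows first is closer to the Slow and Local variants. The layers that follow the pooling step are omitted here.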
Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects
Online Video Object Detection using Association LSTM
More References
https://github.com/harvitronix/five-video-classification-methods
https://github.com/harvitronix/continuous-online-video-classification-blog
https://github.com/tencia/video_predict
https://github.com/sagarvegad/Video-Classification-CNN-and-LSTM-
http://blog.qure.ai/notes/deep-learning-for-videos-action-recognition-review
https://github.com/Guanghan/ROLO
https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/
Action Recognition
- Static Action Recognition
- Video action recognition - Optical flow between frames
- Stitch Multiple Frames and evaluate with CNN
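Stitching multiple frames can be read as stacking consecutive frames into one multi-channel input, so that a plain image CNN sees a short span of motion at once. The sketch below uses nested lists as grayscale frames. The function name, the window length and the toy clip are assumptions for illustration.

```python
def stack_frames(frames, k):
    """Group k consecutive grayscale frames into one k-channel sample.

    Uses a sliding window with stride 1, so a clip of n frames yields
    n - k + 1 samples.
    """
    return [frames[i:i + k] for i in range(len(frames) - k + 1)]


# Hypothetical toy clip: five 2x2 grayscale frames, frame t filled with value t.
clip = [[[t, t], [t, t]] for t in range(5)]
samples = stack_frames(clip, k=3)

# Each sample has shape (channels=3, height=2, width=2).
print(len(samples), len(samples[0]), len(samples[0][0]), len(samples[0][0][0]))
```

Optical flow between frames would replace the raw frames with motion fields, but the stacking step stays the same.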
Session - Link
Key Lessons