Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day #175

December 28, 2018

Day #175 - Videos and Unsupervised Learning

Dense Trajectory features

Detect key points to track
Tracklets obtained and features get accumulated
Feature points at different scales
Track using optical flow methods
Bunch of features extracted in local coordinate system of every track
15 frames, x,y positions
Extract features in local coordinate system between two frames
Differences in key point positions between frames reflect the optical flow
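A minimal sketch of turning one tracked point into a trajectory feature, assuming a track is stored as 16 (x, y) positions (15 frame-to-frame displacements). The function name `trajectory_descriptor` and the normalisation by total displacement length are assumptions for illustration, not something stated in the notes.

```python
import numpy as np

def trajectory_descriptor(points):
    """points: (L + 1, 2) array of (x, y) positions for one track.

    Returns the L frame-to-frame displacement vectors, flattened and
    normalised by the total path length (assumed normalisation).
    """
    points = np.asarray(points, dtype=float)
    displacements = np.diff(points, axis=0)            # (L, 2)
    total = np.linalg.norm(displacements, axis=1).sum()
    if total == 0:
        return np.zeros(displacements.size)            # static point
    return (displacements / total).ravel()

# Hypothetical track: a point moving one pixel to the right per frame.
track = np.stack([np.arange(16), np.zeros(16)], axis=1)
desc = trajectory_descriptor(track)
```

The descriptor has 30 values here (15 displacements x 2 coordinates) and does not depend on where in the frame the track started.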

Key Point Detection

Detect features
Run Optical flow algos
Displacement vector between every single frame
Check optical flow methods in Python
Histogram bins
SVM
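The "histogram bins" step can be sketched as below, assuming the optical flow is already available as an (H, W, 2) array of per-pixel (dx, dy). The function name `flow_histogram`, the choice of 8 bins and the magnitude weighting are assumptions for illustration.

```python
import numpy as np

def flow_histogram(flow, bins=8):
    """Orientation histogram of a flow field, weighted by flow magnitude."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)           # angles in [0, 2*pi)
    hist, _ = np.histogram(angle, bins=bins,
                           range=(0, 2 * np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Hypothetical flow field: every pixel moves one pixel to the right.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
hof = flow_histogram(flow)
```

Histograms like `hof`, computed per track or per video, would be the feature vectors handed to the SVM; the classifier itself is not shown here.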

Deep Network

Process frame in Alexnet
Encode 15 frames in CNN
Sharing weights spatially
Extend filters in small amounts in time
11 x 11 x T (Temporal Extent)
3 (R,G,B)
Sliding filters in time
Carving out activation volume
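To make the "11 x 11 x T" idea concrete, here is a naive sketch of one spatio-temporal filter sliding over a clip in space and time. The temporal extent T = 3, the 16 x 16 frame size and the name `conv3d_single_filter` are assumptions chosen to keep the example small; real networks use optimised library kernels.

```python
import numpy as np

def conv3d_single_filter(video, kernel, stride=1):
    """video: (T, H, W, C); kernel: (t, kh, kw, C).

    Valid cross-correlation with a single filter; returns the
    activation volume of shape (out_t, out_h, out_w).
    """
    T, H, W, C = video.shape
    t, kh, kw, kc = kernel.shape
    assert kc == C, "filter must span all input channels"
    out_t = (T - t) // stride + 1
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_t, out_h, out_w))
    for i in range(out_t):
        for j in range(out_h):
            for k in range(out_w):
                patch = video[i * stride:i * stride + t,
                              j * stride:j * stride + kh,
                              k * stride:k * stride + kw, :]
                out[i, j, k] = np.sum(patch * kernel)
    return out

video = np.ones((15, 16, 16, 3))     # 15 RGB frames
kernel = np.ones((3, 11, 11, 3))     # 11 x 11 x T filter with T = 3
activation = conv3d_single_filter(video, kernel)
```

Because the filter covers only 3 of the 15 frames, it slides along the time axis as well, so the output keeps a time dimension (13 steps here) instead of collapsing to a single map.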

Spatio-Temporal ConvNets

3D Conv in Space and Time
Slow Fusion 3D Conv Approach
Learned filters on first layers (Smaller filters - More layers)
Spatio-Temporal ConvNets
Datasets are not quite there
3D Conv, LSTM
Single frame networks are baseline

Spatio-Temporal ConvNets

c3D
3 x 3 x 3 conv, 2 x 2 x 2 pool
VGG in 3D
3D Conv is Painful
Two-stream approach: one ConvNet looks at the image
The other ConvNet looks at optical flow
Extract Optical Flow, Fuse in the end
Optical flow contains a lot of information
Need to check - Compute optical flow between two frames
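For the "compute optical flow between two frames" item, a minimal NumPy sketch of the Lucas-Kanade idea, simplified to estimate one global translation for the whole frame. This is an assumption-laden toy (single motion, small displacement, smooth image), not a dense flow method; for dense flow OpenCV offers `cv2.calcOpticalFlowFarneback`, which is not used here. The names `lucas_kanade_translation` and `blob` are hypothetical.

```python
import numpy as np

def lucas_kanade_translation(frame1, frame2):
    """Least-squares solve of Ix*u + Iy*v + It = 0 over all pixels."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    Iy, Ix = np.gradient(f1)          # derivatives along rows (y), then columns (x)
    It = f2 - f1                      # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)

ys, xs = np.mgrid[0:32, 0:32]

def blob(cx, cy, sigma=5.0):
    """Synthetic frame: a Gaussian blob centred at (cx, cy)."""
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame1 = blob(16.0, 16.0)
frame2 = blob(16.5, 16.0)             # blob moved half a pixel to the right
u, v = lucas_kanade_translation(frame1, frame2)
```

Here `u` comes out close to 0.5 and `v` close to 0. Applying the same least-squares solve inside small windows around each key point gives per-point flow vectors.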

Long time Spatial - Temporal ConvNets

Videos with temporal dependencies
Events longer than the network's temporal window
Attention model
Attention over different parts of the video
Process images at detail level, resize at global level
RNN
Video - Classes prediction at point in time
RNNs allow, in principle, unbounded temporal context
3D conv, lstm
CNN + LSTM
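A sketch of the CNN + recurrent idea: per-frame features go through a recurrent layer, and a class prediction is produced at every point in time. To keep it short this uses a plain tanh RNN instead of an LSTM, random vectors instead of real CNN features, and untrained random weights; all names and sizes are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rnn_over_frames(features, Wxh, Whh, Why):
    """features: (num_frames, feat_dim). Returns class probabilities per frame."""
    h = np.zeros(Whh.shape[0])
    probs = []
    for x in features:
        h = np.tanh(Wxh @ x + Whh @ h)      # hidden state carries past context
        probs.append(softmax(Why @ h))      # prediction at this time step
    return np.array(probs)

rng = np.random.default_rng(0)
num_frames, feat_dim, hidden_dim, num_classes = 15, 32, 16, 5
features = rng.normal(size=(num_frames, feat_dim))   # stand-in for CNN features
Wxh = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
Why = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
frame_probs = rnn_over_frames(features, Wxh, Whh, Why)
```

Each row of `frame_probs` depends on all earlier frames through `h`, which is what gives the model context beyond a fixed window.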

Video Classification Architectures

RNN + 3D Convnet
RNN before the ConvNet processes the image (Idea)
ConvNets between frames (Scales) - Speed up and Slow down
Background subtraction - only look at things of interest (Check code)
Weight sharing between ConvNet and RNN

Idea

Get Rid of RNN
Convnet
All neurons in the ConvNet are recurrent
GRU with a slightly different update formula
Replace the matrix multiplies with convolutions
Convolve over the input and the previous output, then apply the recurrent update
RNN Convnet (Check code)
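A rough sketch of the "RNN ConvNet" idea: a GRU step in which the matrix multiplies are swapped for 2D convolutions, so the hidden state is itself a feature map. It is single channel with odd-sized kernels and random, untrained weights; the helper names and the exact gate convention are assumptions, and real implementations differ in detail.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel cross-correlation with zero padding (odd kernel size)."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_gru_step(x, h, p):
    """One GRU update where every weight matrix is a convolution kernel."""
    z = sigmoid(conv2d_same(x, p["Wz"]) + conv2d_same(h, p["Uz"]))   # update gate
    r = sigmoid(conv2d_same(x, p["Wr"]) + conv2d_same(h, p["Ur"]))   # reset gate
    h_tilde = np.tanh(conv2d_same(x, p["Wh"]) + conv2d_same(r * h, p["Uh"]))
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.1, size=(3, 3))
          for name in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
frames = rng.normal(size=(4, 8, 8))      # 4 hypothetical 8 x 8 feature maps
h = np.zeros((8, 8))
for x in frames:
    h = conv_gru_step(x, h, params)
```

The hidden state `h` keeps the spatial layout of the input, so each location is recurrent in time while still sharing weights across space.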

Summary

Local motion 3D Conv
Global motion LSTM

Research papers on video + audio are not there yet
Supervised Learning - Dataset has data x and labels y. The goal is to learn a function that maps input x to output y
Examples - Classification, regression, object detection, semantic segmentation, image captioning

Unsupervised Learning

Just data, no labels
Learn some underlying structure of the data
Examples - Clustering, dimensionality reduction, feature learning, generative models

Autoencoders

Traditional - Feature Learning
Variational - Generate Samples

Input x -> Pass through encoder network -> Learned features z
Reconstruction - Reproduce data x from features z
Decoder - Takes the smaller features and blows them back up to the original data size
Encoder / Decoder sometimes share weights
PCA is optimal for L2 reconstruction when encoder and decoder are linear
Our intention is to learn features that are useful for other tasks
Generate Fake images like original images
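The encoder / decoder / shared-weights points can be illustrated with PCA acting as a linear autoencoder. A small sketch on synthetic data; the sizes, the name `pca_autoencoder` and the random-projection baseline are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated data
X = X - X.mean(axis=0)                                       # centre the data

def pca_autoencoder(X, k):
    """Encode to k features and decode with the transposed (shared) weights."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:k].T                 # (d, k) top-k principal directions
    Z = X @ W                    # encoder: features z
    X_hat = Z @ W.T              # decoder: reconstruction of x
    return Z, X_hat

Z, X_hat = pca_autoencoder(X, k=3)
pca_error = np.mean((X - X_hat) ** 2)

# Baseline: a random orthonormal 3-dimensional projection.
Q, _ = np.linalg.qr(rng.normal(size=(10, 3)))
random_error = np.mean((X - X @ Q @ Q.T) ** 2)
```

`pca_error` is never larger than `random_error`, since no rank-3 linear projection reconstructs centred data with lower L2 error than the top principal directions. A nonlinear autoencoder trades that guarantee for more expressive features.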

Variational AutoEncoder

A prior distribution is assumed over the latent variable z
Assume the distribution is Gaussian
Bayes' rule gives the posterior
Probability of the latent variable given the observed data
Unsupervised data to learn features
Maximum likelihood
Variational Inference
Insert Extra constant, break into two different terms
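The "insert extra constant, break into two terms" step can be written out as follows. The notation is an assumption: q_phi(z | x) is the encoder's approximate posterior, p_theta(x | z) the decoder, and p(z) the Gaussian prior. The extra factor inserted is q / q = 1.

```latex
\begin{aligned}
\log p_\theta(x)
  &= \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log p_\theta(x)\right] \\
  &= \mathbb{E}_{q_\phi}\left[\log \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}\right]
     \quad \text{(Bayes' rule)} \\
  &= \mathbb{E}_{q_\phi}\left[\log \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}
     \cdot \frac{q_\phi(z \mid x)}{q_\phi(z \mid x)}\right] \\
  &= \underbrace{\mathbb{E}_{q_\phi}\left[\log p_\theta(x \mid z)\right]
     - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{lower bound (tractable)}}
     + \underbrace{D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)}_{\ge 0 \text{ (intractable)}}
\end{aligned}
```

Since the last KL term is non-negative, maximising the first two terms maximises a lower bound on the log-likelihood, which is the variational inference objective.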

Adversarial Networks - Generate Samples

Generator - Mini batches of random noise
Discriminator - both original and fake images
Architecture bigger and powerful using multiscale processing
Generate at multiple scales
Low-Resolution -> Upsample -> Delta on top of it -> Upsample ..
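The coarse-to-fine chain (low resolution -> upsample -> add a delta -> upsample ...) can be sketched with plain arrays. Here random arrays stand in for the outputs of per-scale generators, and nearest-neighbour upsampling is an assumed choice; the function names are hypothetical.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2D image."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def coarse_to_fine(low_res, deltas):
    """Upsample repeatedly, adding a detail (delta) image at each scale."""
    img = low_res
    for delta in deltas:
        img = upsample2x(img) + delta
    return img

rng = np.random.default_rng(0)
low_res = rng.normal(size=(4, 4))                     # coarsest sample
deltas = [rng.normal(scale=0.1, size=(8, 8)),         # detail at 8 x 8
          rng.normal(scale=0.1, size=(16, 16))]       # detail at 16 x 16
image = coarse_to_fine(low_res, deltas)
```

Each scale only has to add detail on top of an already plausible coarse image, which is an easier job than generating the full-resolution image in one shot.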

Variational Autoencoder

Adversarial noise inputs can be changed as we generate
Interpolate between random points in latent space
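Interpolating between two random latent points is a one-liner. A minimal sketch, assuming an 8-dimensional Gaussian latent space and linear interpolation; the decoder or generator that would turn each point into an image is not shown.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps):
    """Linearly interpolate between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.array([(1 - a) * z_start + a * z_end for a in alphas])

rng = np.random.default_rng(0)
z_a = rng.normal(size=8)      # hypothetical latent sample
z_b = rng.normal(size=8)      # another latent sample
path = interpolate_latents(z_a, z_b, steps=5)
```

Feeding each row of `path` through the generator should give a smooth transition between the two generated images if the latent space is well behaved.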

GAN

Learns Nice useful representations
Variational Autoencoder
Add adversarial network to VAE
Discriminator Network added
Pixel Loss
Generate Samples like Alexnet

Python / OpenCV code to try

Program #1 - Generate optical flow between frames, compute SIFT between two frames, color the moved pixels
Program #2 - Upsample Images
Program #3 - Generate Optical flow data, Use it to feed to CNN to classify actions
Program #4 - GANs
Program #5 - CNN with 3x3, 1x7 different filters and training / test accuracy

Happy Mastering DL!!!
