"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 28, 2018

Day #175 - Videos and Unsupervised Learning

Dense Trajectory features
  • Detect key points to track
  • Tracklets obtained and features get accumulated
  • Feature points at different scales
  • Track using optical flow methods
  • Bunch of features extracted in local coordinate system of every track
  • 15 frames, x,y positions
  • Extract features in local coordinate system between two frames
  • Differences in the key points reflect the optical flow



Key Point Detection
  • Detect features
  • Run Optical flow algos
  • Displacement vector between every single frame
  • To check: optical flow methods in Python
  • Histogram bins
  • SVM
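The "displacement vectors into histogram bins" step can be sketched as follows. This is a minimal sketch, not the descriptor from the lecture: the toy track, the 8 bins and the magnitude weighting are assumptions chosen for illustration. The resulting vector is the kind of feature that would then be handed to a classifier such as an SVM.

```python
import numpy as np

def flow_orientation_histogram(displacements, n_bins=8):
    """Magnitude-weighted histogram of displacement directions (normalised to sum to 1)."""
    d = np.asarray(displacements, dtype=float)
    dx, dy = d[:, 0], d[:, 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)  # map directions to [0, 2*pi)
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 2 * np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Hypothetical tracked key point: (x, y) positions over 4 frames
track = np.array([[0, 0], [1, 0], [2, 0], [3, 2]], dtype=float)
displacements = np.diff(track, axis=0)  # displacement vector between consecutive frames
descriptor = flow_orientation_histogram(displacements)
print(descriptor)
```

In a real pipeline one histogram per track (or per cell around the track) would be concatenated before classification.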

Deep Network
  • Process frame in Alexnet
  • Encode 15 frames in CNN
  • Sharing weights spatially
  • Extend filters in small amounts in time
  • 11 x 11 x T (Temporal Extent)
  • 3 (R,G,B)
  • Sliding filters in time
  • Carving out activation volume
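To see how sliding an 11 x 11 x T filter in space and time carves out an activation volume, the output size can be worked out with the usual convolution arithmetic. The 227 x 227 input, spatial stride 4 and T = 3 below are assumptions (AlexNet-like values picked for illustration), not numbers from the notes.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Number of positions a filter can take along one axis."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def spatiotemporal_output_shape(height, width, frames, k=11, t=3, stride=4, t_stride=1):
    """Activation volume (height, width, time) produced by one k x k x t filter."""
    return (
        conv_output_size(height, k, stride),
        conv_output_size(width, k, stride),
        conv_output_size(frames, t, t_stride),
    )

def weights_per_filter(k, t, channels=3):
    """A filter spans k x k pixels, t frames and all colour channels (R, G, B)."""
    return k * k * t * channels

# Assumed setup: 15 RGB frames of 227 x 227, filter 11 x 11 x 3, spatial stride 4
shape = spatiotemporal_output_shape(227, 227, 15, k=11, t=3, stride=4)
n_weights = weights_per_filter(11, 3)
print(shape, n_weights)
```

With T = 1 this reduces to the single-frame case; a larger T adds a time axis to the activation volume.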


Spatio-Temporal ConvNets
  • 3D Conv in Space and Time
  • Slow Fusion 3D Conv Approach
  • Learned filters on first layers (Smaller filters - More layers)
  • Spatio-Temporal ConvNets
  • Datasets are not quite there
  • 3D Conv, LSTM
  • Single frame networks are baseline



Spatio-Temporal ConvNets
  • C3D
  • 3 x 3 x 3 conv, 2 x 2 x 2 pool
  • VGG in 3D
  • 3D Conv is Painful
  • Two-stream idea: one ConvNet looks at the image
  • The other looks at optical flow
  • Extract Optical Flow, Fuse in the end
  • Optical flow contains lot of information
  • Need to check - Compute optical flow between two frames
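For the "compute optical flow between two frames" item, here is a minimal NumPy sketch of a single global Lucas-Kanade estimate. It assumes the whole frame moves by one small translation and uses a synthetic pattern, so it only illustrates the brightness-constancy equation Ix*u + Iy*v + It = 0. Real video needs a per-pixel or per-window method, for example the ones in OpenCV.

```python
import numpy as np

def pattern(x, y):
    """Smooth synthetic image (an assumption, used only to build test frames)."""
    return np.sin(0.2 * x) + np.cos(0.15 * y) + np.sin(0.1 * (x + y))

def global_lucas_kanade(frame1, frame2, border=2):
    """Least-squares solution of Ix*u + Iy*v = -It over the whole frame."""
    iy, ix = np.gradient(0.5 * (frame1 + frame2))  # axis 0 is y (rows), axis 1 is x
    it = frame2 - frame1                           # temporal derivative
    inner = (slice(border, -border), slice(border, -border))  # skip one-sided edge gradients
    a = np.stack([ix[inner].ravel(), iy[inner].ravel()], axis=1)
    flow, *_ = np.linalg.lstsq(a, -it[inner].ravel(), rcond=None)
    return flow  # (u, v) in pixels per frame

ys, xs = np.mgrid[0:64, 0:64].astype(float)
true_u, true_v = 0.5, -0.3                   # assumed ground-truth shift
frame1 = pattern(xs, ys)
frame2 = pattern(xs - true_u, ys - true_v)   # same content moved by (u, v)
u, v = global_lucas_kanade(frame1, frame2)
print(u, v)
```

The estimate should land close to the assumed shift; the small error comes from the finite-difference gradients.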
Long-time Spatio-Temporal ConvNets
  • Videos with temporal dependencies
  • Events larger than timescale
  • Attention model
  • Attention over different parts of the input
  • Process images at detail level, resize at global level
  • RNN
  • Video - Classes prediction at point in time
  • RNNs allow, in principle, unbounded context
  • 3D conv, lstm
  • CNN + LSTM



Video Classification Architectures
  • RNN + 3D Convnet
  • RNN before the ConvNet processes the image (Idea)
  • ConvNets between frames (Scales) - Speed up and Slow down
  • Background subtraction: only look at things of interest (Check code)
  • Weight sharing between ConvNet and RNN
Idea
  • Get Rid of RNN
  • Convnet
  • All neurons in the ConvNet are recurrent
  • GRU slightly different update formula
  • Replace through the conv
  • Convolve over input, Output and then RNN
  • RNN Convnet (Check code)
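The GRU update mentioned above can be sketched in NumPy. This is the standard fully connected GRU step with assumed toy dimensions and random weights, not code from the lecture. In the convolutional variant described in the notes, each matrix product below would be replaced by a convolution over feature maps.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU update: gates decide how much of the old state h to keep."""
    z = sigmoid(params["Wz"] @ x + params["Uz"] @ h)              # update gate
    r = sigmoid(params["Wr"] @ x + params["Ur"] @ h)              # reset gate
    h_candidate = np.tanh(params["Wh"] @ x + params["Uh"] @ (r * h))
    return (1.0 - z) * h + z * h_candidate                        # blend old and new state

rng = np.random.default_rng(0)
input_dim, hidden_dim, n_frames = 4, 3, 15   # assumed sizes for illustration
params = {
    "Wz": rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
    "Uz": rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
    "Wr": rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
    "Ur": rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
    "Wh": rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
    "Uh": rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
}

frame_features = rng.normal(size=(n_frames, input_dim))  # stand-in for per-frame CNN features
h = np.zeros(hidden_dim)
for x in frame_features:
    h = gru_step(x, h, params)
print(h)
```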
Summary
  • Local motion 3D Conv
  • Global motion LSTM



Research papers combining video + audio are not there yet
Supervised Learning - Dataset has data x, label y. The goal in supervised learning is to learn a function that takes input x and outputs y
Example - Classification, Regression, Object detection, Semantic segmentation, image captioning

Unsupervised Learning
  • Just data, no labels
  • Learn Some structure on data
  • Examples - Clustering, dimensionality reduction, feature learning, generative models
Autoencoders
  • Traditional - Feature Learning
  • Variational - Generate Samples


  • Input x -> Pass thru Encode network -> Learnable feature Z
  • Reconstruction - Reproduce data x from features z
  • Decoder - Smaller features - Blows back to original data
  • Encoder / Decoder sometimes share weights
  • PCA Optimal for L2 Reconstruction
  • Our intention is to learn features that are useful for tasks
  • Generate Fake images like original images
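The encode / reconstruct loop and the "PCA optimal for L2 reconstruction" remark can be illustrated with a linear autoencoder built from an SVD. This is a minimal sketch on random data (the data, the 5 input dimensions and k = 2 are assumptions). The encoder and decoder share the same weight matrix, matching the weight-sharing bullet above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data with correlated features
x_centered = x - x.mean(axis=0)

# Principal directions of the data via SVD
u, s, vt = np.linalg.svd(x_centered, full_matrices=False)
k = 2
components = vt[:k]  # shared encoder / decoder weights, shape (k, 5)

def encode(data):
    """x -> z: project onto the top-k principal directions."""
    return data @ components.T

def decode(z):
    """z -> x_hat: blow the small code back up to the original size."""
    return z @ components

z = encode(x_centered)
x_hat = decode(z)

reconstruction_error = np.sum((x_centered - x_hat) ** 2)
discarded_energy = np.sum(s[k:] ** 2)  # squared singular values that were dropped
print(reconstruction_error, discarded_energy)
```

The two printed numbers agree: the L2 reconstruction error of the best rank-k linear code is exactly the energy in the discarded directions. A nonlinear autoencoder replaces `encode` and `decode` with learned networks.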
Variational AutoEncoder
  • Assume a prior distribution exists over the latent variables
  • Assume distribution is Gaussian
  • Bayes' rule gives the posterior
  • Probability given observed data
  • Unsupervised data to learn features
  • Maximum likelihood
  • Variational Inference
  • Insert Extra constant, break into two different terms
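The "two different terms" that variational inference leaves behind are usually a reconstruction term and a KL term between the encoder's Gaussian and the prior. A minimal sketch of both, under these assumptions: a standard normal prior, a diagonal Gaussian posterior parameterised by `mu` and `log_var`, and a squared-error reconstruction term (a Gaussian decoder with unit variance, up to a constant). The function names are illustrative, not from the lecture.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps so the sample stays a function of mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def negative_elbo(x, x_reconstructed, mu, log_var):
    """Term 1: reconstruction error. Term 2: KL pulling the posterior toward the prior."""
    reconstruction = 0.5 * np.sum((x - x_reconstructed) ** 2)
    return reconstruction + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu, log_var = np.array([1.0, -0.5]), np.array([0.0, 0.2])
z = sample_latent(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```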

Adversarial Networks - Generate Samples
  • Generator - Mini batches of random noise
  • Discriminator - both original and fake images
  • Architecture bigger and powerful using multiscale processing
  • Generate at multiple scales
  • Low-Resolution -> Upsample -> Delta on top of it -> Upsample ..
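The "low-resolution -> upsample -> delta on top" chain can be sketched with plain arrays. In the multiscale GAN setting each delta would come from a generator at that scale. Here, as an assumption for illustration, the delta is computed from a known image so the round trip can be checked.

```python
import numpy as np

def upsample_nearest(image, factor=2):
    """Repeat every pixel factor x factor times."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

def downsample_mean(image, factor=2):
    """Average each factor x factor block (image sides must be divisible by factor)."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
full = rng.random((8, 8))                 # stand-in for a full-resolution image
coarse = downsample_mean(full)            # low-resolution version
delta = full - upsample_nearest(coarse)   # detail missing after upsampling
rebuilt = upsample_nearest(coarse) + delta
print(np.allclose(rebuilt, full))
```

Repeating the same step at each level gives the coarse-to-fine generation described in the bullets.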
Variational Autoencoder
  • Adversarial noise inputs can be changed as we generate
  • Interpolate between random points in latent space
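Interpolating between two random latent points is a one-liner. This is a minimal sketch assuming a 16-dimensional latent space and straight-line interpolation. Each row of `path` would be passed to a trained decoder or generator (not shown here) to render the intermediate samples.

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps=5):
    """Evenly spaced points on the straight line from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end

rng = np.random.default_rng(0)
z_a = rng.standard_normal(16)  # two random points in an assumed 16-d latent space
z_b = rng.standard_normal(16)
path = interpolate_latents(z_a, z_b, n_steps=5)
print(path.shape)
```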
GAN
  • Learns Nice useful representations
  • Variational Autoencoder
  • Add Adversarial network to VAE
  • Discriminator Network added
  • Pixel Loss
  • Generate Samples like Alexnet


Python / OpenCV code to try
  • Program #1 - Generate optical flow between frames, Compute SIFT between two frames, Color the moved pixels
  • Program #2 - Upsample Images
  • Program #3 - Generate Optical flow data, Use it to feed to CNN to classify actions
  • Program #4 - GANs
  • Program #5 - CNN with 3x3, 1x7 different filters and training / test accuracy



Happy Mastering DL!!!
