"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 24, 2018

Day #154 - CNN - Class Notes

Key Summary
  • Roughly 540 million years of evolution trace the development of vision
  • Human vision has effectively been "trained" for 540 million years
  • A hierarchy of layers in our visual system is involved in processing
What Computers See
  • Images are numbers
  • Pixels are represented by a 2D array of numbers (grayscale)
  • RGB images add a color channel (3D array) - see the sketch below
  • Computer vision tasks:
  • Regression (output takes a continuous value)
  • Classification (single class label)
  • Classification boils down to detecting the presence of features in a particular image
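A minimal sketch (assuming NumPy) of how an image looks to a computer - a grayscale image as a 2D array and an RGB image as a 3D array; the sizes are just illustrative:

```python
import numpy as np

# Grayscale: a 2D array of intensity values (one number per pixel)
gray = np.array([[0,  50, 255],
                 [30, 80, 200],
                 [10, 90, 150]], dtype=np.uint8)

# RGB: a 3D array - height x width x 3 color channels
rgb = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(gray.shape)  # (3, 3)      -> 2D array of pixel values
print(rgb.shape)   # (32, 32, 3) -> 3D array with a channel axis
```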
Manual Feature Extraction
  • Domain Knowledge
  • Define features
  • Detect features and classify
Image Challenges
  • Occlusion
  • Viewpoint variation
  • Scale variation
  • Deformation
  • Background Clutter
  • Intra-class variation
  • Illumination Conditions
Neural Networks
  • Learn features directly from image data
  • Low Level (Edge / Dark Spots)
  • Mid Level (eyes, Ears, Nose)
  • High Level (Facial Structures)
Fully Connected Neural Network
  • Multiple Hidden Layers
  • Input: 2D image flattened into a vector of pixel values
  • All spatial information is lost in the flattening
  • Each neuron in a hidden layer connects to every neuron in the input layer
  • Instead, slide a patch (window) across the image; this respects spatial structure
  • Apply a set of weights to each patch to extract local features
  • Multiple filters, each with its own set of weights
  • This patch-wise operation is known as convolution (see the sketch below)
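A minimal sketch of the sliding-patch idea, assuming NumPy; the loop-based conv2d helper and the random 5x5 test image are illustrative, not an optimized implementation:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a patch (window) of weights across the image; at each location
    take the elementwise product with the patch and sum it up."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # local patch of the image
            out[i, j] = np.sum(patch * kernel)  # same weights reused everywhere
    return out

image = np.random.rand(5, 5)
kernel = np.random.rand(3, 3)       # one filter = one shared set of weights
print(conv2d(image, kernel).shape)  # (3, 3) feature map
```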
Feature Extraction and Convolution
  • Convolution preserves the spatial relationship between pixels
  • Elementwise multiplication between the patch and the filter, then sum the result
  • Different filters produce different effects, e.g. sharpening, edge detection
  • Use multiple filters to extract different features (example below)
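For example, two hand-crafted 3x3 kernels applied to the same image pull out different features (a sharpen-like filter vs. an edge-detection filter); this sketch assumes SciPy's convolve2d and a random stand-in image:

```python
import numpy as np
from scipy.signal import convolve2d

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])   # emphasizes the center pixel

edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]])      # responds to intensity changes

image = np.random.rand(28, 28)       # stand-in grayscale image

sharpened = convolve2d(image, sharpen, mode='valid')
edges     = convolve2d(image, edge, mode='valid')
print(sharpened.shape, edges.shape)  # (26, 26) (26, 26)
```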



CNNs for Classification
  • Convolution - apply filters with learned weights to generate feature maps
  • Non-linearity - often ReLU (image data is highly non-linear)
  • Pooling - downsampling for each feature map
  • Train the model end to end to learn the filter weights
  • Each neuron sees only a patch of the input
  • Apply a matrix of weights via elementwise multiplication
  • Depth of the output volume = number of filters
  • ReLU - pixel-by-pixel operation that replaces all negative values with zero (non-linear operation)
  • Pooling - reduces dimensionality while preserving spatial invariance (a downsampling operation)
  • Stack these layer operations to learn a hierarchy of features
  • Feature learning pipeline + classification head (see the sketch below)
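A minimal sketch of that pipeline in PyTorch (an assumption - the notes don't name a framework); the layer sizes and the 10-class output are illustrative:

```python
import torch
import torch.nn as nn

# Convolution -> ReLU non-linearity -> pooling, repeated, then a classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 filters -> depth 16
    nn.ReLU(),                                    # replace negatives with 0
    nn.MaxPool2d(2),                              # downsample 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classification head
)

x = torch.randn(1, 3, 32, 32)   # one RGB image, e.g. CIFAR-10 size
print(model(x).shape)           # torch.Size([1, 10]) -> class scores
```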
ImageNet CNN
  • 14 million Images
  • 21,841 categories
  • Deeper networks keep improving results, which raises the question of how deep we can go
Architecture for Applications
  • New architectures build on the feature-learning backbone
  • Semantic segmentation (fully convolutional network) - downsampling followed by upsampling (encoder-decoder), e.g. driving-scene segmentation
  • Object detection - propose regions and classify them; naive region proposals take a really long time to compute
  • Image captioning - generate semantic content; remove the fully connected layer and replace it with an RNN
  • CNN feature layer + RNN trained to predict words that describe the image (see the sketch below)
  • CAM (Class Activation Map)
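A hedged sketch of the captioning idea - a small stand-in CNN encoder whose fully connected classifier is replaced by an LSTM that predicts caption words; the CaptionNet class, vocabulary size, and dimensions are all illustrative assumptions, not the architecture from the lecture:

```python
import torch
import torch.nn as nn

class CaptionNet(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(              # stand-in CNN backbone
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),              # image feature -> embedding
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, caption_tokens):
        feat = self.encoder(image).unsqueeze(1)    # (B, 1, embed_dim)
        words = self.embed(caption_tokens)         # (B, T, embed_dim)
        seq = torch.cat([feat, words], dim=1)      # image feature first, then words
        hidden, _ = self.rnn(seq)
        return self.out(hidden)                    # word scores per time step

model = CaptionNet()
scores = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(scores.shape)   # torch.Size([2, 6, 1000])
```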

Happy Mastering DL!!!
