Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day #154 - CNN - Class Notes

November 24, 2018


Key Summary

The evolution of vision can be traced back about 540 million years
Human vision is the result of those 540 million years of training
Visual processing involves a hierarchy of layers

What Computers see

Images are numbers
Pixels are represented by a 2D array of numbers (grayscale)
RGB images are represented by a 3D array (height x width x 3 color channels)
Computer vision Tasks
Regression (Output takes a continuous value)
Classification (Single Class label)
Detect the presence of features in a particular image
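
The point that images are just arrays of numbers can be sketched in a few lines. This is a minimal illustration, not taken from the class: the pixel values are made up, the helper `shape` is a hypothetical name, and plain Python lists are used so the example has no dependencies (in practice a library such as NumPy would normally hold the array).

```python
# A tiny 3x3 grayscale image: one intensity (0-255) per pixel, stored as a 2D array.
gray = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

# An RGB image of the same size: every pixel holds three values (red, green, blue),
# so the whole image is a 3D array of shape (height, width, 3).
# The colour mapping here is arbitrary and only for illustration.
rgb = [[[value, 0, 255 - value] for value in row] for row in gray]


def shape(array):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


gray_shape = shape(gray)  # 2 dimensions: height, width
rgb_shape = shape(rgb)    # 3 dimensions: height, width, channels
print(gray_shape, rgb_shape)
```

A classifier or regressor never sees anything other than these numbers; the task only changes what is predicted from them (a class label or a continuous value).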

Manual Feature Extraction

Domain Knowledge
Define features
Detect features and classify

Image Challenges

Occlusion
Viewpoint variation
Scale variation
Deformation
Background Clutter
Intra Class variation
Illumination Conditions

Neural Networks

Learn directly from Image data
Low Level (Edges / Dark Spots)
Mid Level (Eyes, Ears, Nose)
High Level (Facial Structures)

Fully Connected Neural Network

Multiple Hidden Layers
Input: 2D image flattened into a vector of pixel values
All spatial information is lost
Each neuron in a hidden layer is connected to all neurons in the input layer
Alternative: slide a patch window across the image, which takes spatial structure into account
Apply a set of weights (a filter) to each patch to extract local features
Use multiple filters, each with its own set of weights
This patch-wise operation is known as convolution
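
The contrast between flattening and sliding a patch window can be sketched as follows. This is a minimal example under assumptions of my own: a 4x4 image with made-up values, a 2x2 window, and a stride of 1. The helper name `extract_patches` is hypothetical.

```python
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Fully connected input: flatten the 2D image into one long vector.
# Pixels 1 and 5 are vertical neighbours in the image, but end up 4 positions apart.
flat = [pixel for row in image for pixel in row]
gap_between_vertical_neighbours = flat.index(5) - flat.index(1)


def extract_patches(img, size):
    """Slide a size x size window over img (stride 1) and collect every patch."""
    height, width = len(img), len(img[0])
    patches = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            patches.append([row[left:left + size] for row in img[top:top + size]])
    return patches


patches = extract_patches(image, 2)

# One dense neuron needs a weight per input pixel; one 2x2 filter reuses 4 weights everywhere.
weights_per_dense_neuron = len(flat)
weights_per_filter = 2 * 2
print(len(patches), patches[0], weights_per_dense_neuron, weights_per_filter)
```

Each patch keeps neighbouring pixels together, which is what lets a small shared set of weights pick up local features.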

Feature Extraction and Convolution

Convolution preserves the spatial relationship between pixels
Elementwise multiplication between the patch and the filter, followed by summing the products
Different filters for different effects (Sharpening, Edge Detection)
Use multiple filters to extract different features
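
The multiply-and-sum step above can be written out directly. The sketch below assumes no padding and a stride of 1, and, like most deep learning libraries, it does not flip the filter (strictly speaking this is cross-correlation). The image and the two hand-written edge filters are illustrative values, not filters from the class; in a CNN the filter weights are learned.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1); multiply elementwise and sum."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Dark left half, bright right half: the only structure is a vertical edge.
image = [
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
]

vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

horizontal_edge = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]

vertical_map = convolve2d(image, vertical_edge)
horizontal_map = convolve2d(image, horizontal_edge)
print(vertical_map)    # responds where the window straddles the edge
print(horizontal_map)  # no horizontal edge in this image, so all zeros
```

Two filters applied to the same image give two different feature maps, which is why a convolutional layer uses several filters at once.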

CNNs for Classification

Convolution - Apply filter with learned weights to generate feature maps
Non-Linearity - Often ReLU (image data is highly non-linear)
Pooling - Downsampling for each feature map
Train model to learn weights
Each neuron sees only a patch of the input
Apply a matrix of weights to the patch by elementwise multiplication
Depth of the output = number of filters
ReLU - Pixel-by-pixel operation that replaces all negative values with zero (non-linear operation)
Pooling - Reduces dimensionality while preserving spatial invariance (downsampling operation)
Layer operations to learn hierarchy of features
Feature Learning Pipeline + Performing Classification
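
The ReLU and pooling steps listed above can be sketched on a single feature map. This example assumes max pooling with a non-overlapping 2x2 window (one common choice; the notes do not say which pooling variant was used), and the feature map values are made up.

```python
def relu(feature_map):
    """Replace every negative value with zero, pixel by pixel."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride equal to size)."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# A hypothetical 4x4 feature map, as produced by one filter.
feature_map = [
    [-3, 5, 2, -1],
    [4, -2, 0, 7],
    [1, 1, -6, -4],
    [-8, 3, 2, 9],
]

activated = relu(feature_map)  # negatives become 0
pooled = max_pool(activated)   # 4x4 shrinks to 2x2
print(activated)
print(pooled)
```

Stacking convolution, ReLU, and pooling several times gives the feature learning part of the pipeline; the classification part then works on the final, much smaller feature maps.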

ImageNet CNN

14 million Images
21,841 categories
Deeper networks vs. how deep we can go

Architecture for Applications

New architectures beyond feature learning
Semantic Segmentation (Fully Convolutional Network) - Downsampling and upsampling operations, driving scene segmentation, encoder-decoder
Object Detection - Generate region proposals and classify each one, which takes a really long time to compute
Image Captioning - Generate semantic content - Remove the fully connected layers and replace them with an RNN
CNN feature layers + RNN (trained to predict words that describe the image)
CAM (Class Activation Map)
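
As a rough illustration of the class activation map idea: assuming the network ends in global average pooling followed by a single linear layer (the setup in which CAM is usually described), the map for a class is the weighted sum of the last convolutional feature maps, using that class's weights. The feature maps, the weights, and the function name `class_activation_map` below are all hypothetical.

```python
# Hypothetical outputs of the last convolutional layer: 2 feature maps of size 2x2.
feature_maps = [
    [[1.0, 0.0],
     [0.0, 0.0]],
    [[0.0, 0.0],
     [0.0, 2.0]],
]

# Hypothetical weights connecting each (average-pooled) feature map to one class.
class_weights = [0.5, 1.5]


def class_activation_map(maps, weights):
    """Weighted sum of feature maps: cam[y][x] = sum_k weights[k] * maps[k][y][x]."""
    height, width = len(maps[0]), len(maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(maps, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


cam = class_activation_map(feature_maps, class_weights)
print(cam)  # larger values mark regions that contribute more to this class
```

In practice the map is upsampled to the input image size and overlaid as a heatmap.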

Happy Mastering DL!!!
