"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 19, 2018

Day #151 - Back to Basics - Geoff Hinton Papers

Paper 1 - Learning Representations by Back-Propagating Errors (1986)

Key Summary
  • The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output and the desired output
  • Hidden units learn to represent new distinguishing features that are not explicit in the input
  • The aim is to find a set of weights that ensures that, for each input vector, the output vector produced by the network is the same as the desired output vector (a minimal sketch follows this list)
  • The drawback of the learning procedure is that the error surface may contain local minima, so gradient descent is not guaranteed to find a global minimum
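
A minimal sketch of the weight-adjustment idea, assuming a single linear unit trained by gradient descent on a squared-error measure (the data, shapes and learning rate are illustrative, not from the paper):

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])    # one input vector (made-up values)
    d = np.array([1.0])               # desired output vector
    w = np.zeros((1, 3))              # connection weights, adjusted repeatedly
    lr = 0.1                          # learning rate (assumed)

    for step in range(100):
        y = w @ x                             # actual output
        E = 0.5 * np.sum((y - d) ** 2)        # measure of the difference
        dE_dw = np.outer(y - d, x)            # gradient of the error w.r.t. the weights
        w -= lr * dE_dw                       # adjust weights to reduce the error

    # Caveat from the paper: on a non-convex error surface this descent can
    # settle in a local minimum rather than the global minimum.
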
Paper 2 - Deep learning (2015)

Key Summary

Deep Learning
  • Machine Learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users' interests, and select relevant results of search
  • Multiple processing layers to learn representations of data with multiple levels of abstraction
  • Recurrent Networks for sequential data such as text and speech
  • Deep Learning methods are representation learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level
  • The layers are learned from data using a general-purpose learning procedure
  • The conventional option is to hand-design good feature extractors, which requires a considerable amount of engineering skill and domain expertise. The key advantage of deep learning is that good features are learned automatically using a general-purpose learning procedure
  • A deep learning architecture is a multi-layer stack of simple modules, most of which compute simple non-linear input-output mappings
  • The backpropagation procedure to compute the gradient of an objective function with respect to the weights of a multi-layer stack of modules is nothing more than a practical application of the chain rule of derivatives (a rough sketch follows this list)
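
A rough illustration of the chain-rule view, assuming a tiny two-module stack (a linear layer followed by a sigmoid) with made-up values; the backward pass propagates dE/dy through each module in turn:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.2, -0.4])
    W = np.array([[0.1, 0.3], [-0.2, 0.5]])
    t = np.array([1.0, 0.0])           # target output (illustrative)

    # Forward pass: each module turns its input into an activity y.
    z = W @ x                          # module 1: linear mapping
    y = sigmoid(z)                     # module 2: non-linearity
    E = 0.5 * np.sum((y - t) ** 2)     # objective function

    # Backward pass: apply the chain rule module by module.
    dE_dy = y - t                      # gradient at the output
    dE_dz = dE_dy * y * (1 - y)        # back through the sigmoid
    dE_dW = np.outer(dE_dz, x)         # gradient w.r.t. the stack's weights
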
Convolutional Neural Networks
  • Composed of Convolutional layers and pooling layers
  • Units in a convolutional layer are organized into feature maps
  • The filtering operation performed by a feature map is a discrete convolution
  • Pooling computes the maximum of local patches
  • Two or three stages of convolution, non-linearity and pooling are stacked, followed by more convolutional and fully connected layers (see the sketch after this list)
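
A small sketch of a single convolution-plus-pooling stage, assuming a 2D input, one 3x3 filter and 2x2 max pooling (all sizes and values are illustrative):

    import numpy as np

    def conv2d(image, kernel):
        # Discrete convolution (implemented as cross-correlation, as in most CNN code).
        h, w = kernel.shape
        out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
        return out

    def max_pool(fmap, size=2):
        # Pooling computes the maximum of local patches.
        out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
        return out

    image = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)
    feature_map = np.maximum(conv2d(image, kernel), 0)   # convolution + non-linearity (ReLU)
    pooled = max_pool(feature_map)                       # 3x3 pooled feature map
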
Recurrent Neural Networks
  • RNNs process an input sequence one element at a time, maintaining a state vector in their hidden units that implicitly contains the history of all past elements of the sequence
  • Good at predicting the next word in a sequence (a minimal sketch follows)
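
A minimal sketch of the recurrence, assuming a plain (vanilla) RNN with made-up dimensions and inputs:

    import numpy as np

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(4, 3))     # input-to-hidden weights (sizes are illustrative)
    W_hh = rng.normal(size=(4, 4))     # hidden-to-hidden (recurrent) weights
    h = np.zeros(4)                    # state vector: implicit history of the sequence so far

    sequence = [rng.normal(size=3) for _ in range(5)]   # e.g. a sequence of word vectors
    for x_t in sequence:
        # Process one element at a time, updating the hidden state.
        h = np.tanh(W_xh @ x_t + W_hh @ h)

    # h now summarises the whole sequence; a softmax layer on h could score the next word.
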
Paper #3 - Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Key Summary 
  • Randomly drop units from the neural network during training
  • Units dropped out can be hidden or visible
  • A dropped unit is temporarily removed from the network, along with all of its incoming and outgoing connections (a minimal sketch follows this list)
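
A minimal sketch of the training-time behaviour, assuming a drop probability of 0.5 for hidden units and the common "inverted" scaling so that no change is needed at test time (the layer size is illustrative):

    import numpy as np

    def dropout(activations, p_drop=0.5, training=True):
        # Randomly drop units during training: a dropped unit's output, and hence all of
        # its outgoing connections, is zeroed for this training case.
        if not training:
            return activations
        mask = np.random.rand(*activations.shape) >= p_drop
        return activations * mask / (1.0 - p_drop)

    hidden = np.random.rand(10)                          # activities of a hidden layer
    hidden = dropout(hidden, p_drop=0.5, training=True)  # roughly half the units are zeroed
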

Paper #4 - Speech Recognition with Deep Recurrent Neural Networks (2013)
Key Summary
  • Long Short-Term Memory (LSTM) - an RNN architecture
  • RNNs are deep in time, since their hidden state is a function of all previous hidden states
  • Makes use of previous context
  • Deep bidirectional LSTM RNNs for speech recognition
LSTM Components
  • Input gate
  • Forget gate
  • Output Gate
  • Cell Activation Vectors
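
A rough sketch of a single LSTM step using the four components listed above, assuming a simplified formulation without peephole connections (dimensions and weights are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_in, n_hid = 3, 4
    rng = np.random.default_rng(0)
    # One weight matrix per component, acting on [current input, previous hidden state].
    W_i, W_f, W_o, W_c = (rng.normal(size=(n_hid, n_in + n_hid)) for _ in range(4))

    def lstm_step(x_t, h_prev, c_prev):
        z = np.concatenate([x_t, h_prev])
        i = sigmoid(W_i @ z)                     # input gate: how much new information to write
        f = sigmoid(W_f @ z)                     # forget gate: how much of the old cell to keep
        o = sigmoid(W_o @ z)                     # output gate: how much of the cell to expose
        c = f * c_prev + i * np.tanh(W_c @ z)    # cell activation vector
        h = o * np.tanh(c)                       # new hidden state
        return h, c

    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x_t in [rng.normal(size=n_in) for _ in range(5)]:
        h, c = lstm_step(x_t, h, c)
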
Bidirectional RNN has
  • Forward Hidden Sequence
  • Backward Hidden Sequence
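
A sketch of the bidirectional idea, assuming a simple tanh recurrence for each direction with separate (made-up) weights; at every time step the two hidden sequences are combined so both past and future context are available:

    import numpy as np

    def rnn_pass(sequence, W_xh, W_hh):
        h, hiddens = np.zeros(W_hh.shape[0]), []
        for x_t in sequence:
            h = np.tanh(W_xh @ x_t + W_hh @ h)
            hiddens.append(h)
        return hiddens

    rng = np.random.default_rng(0)
    seq = [rng.normal(size=3) for _ in range(5)]
    Wf_xh, Wf_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))   # forward weights
    Wb_xh, Wb_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))   # backward weights

    forward = rnn_pass(seq, Wf_xh, Wf_hh)                  # forward hidden sequence
    backward = rnn_pass(seq[::-1], Wb_xh, Wb_hh)[::-1]     # backward hidden sequence, re-aligned
    outputs = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
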
CTC - Connectionist Temporal Classification
  • Uses a Softmax layer to define a separate output distribution at every step along the input sequence
  • CTC uses the forward-backward algorithm to sum over all the possible alignments and determine the normalised probability of the target sequence (a toy sketch follows this list)
  • RNNs trained with CTC are typically bidirectional
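
A toy sketch of the alignment-summing idea, using brute-force enumeration rather than the efficient forward-backward recursion, with made-up per-frame softmax outputs:

    import itertools
    import numpy as np

    labels = ["-", "a", "b"]           # "-" is the CTC blank symbol
    T = 3                              # number of input frames (illustrative)
    # Per-frame softmax output distribution over the labels (rows sum to 1, values made up).
    probs = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.5, 0.1, 0.4]])

    def collapse(path):
        # CTC mapping: merge repeated labels, then remove blanks.
        out, prev = [], None
        for s in path:
            if s != prev and s != "-":
                out.append(s)
            prev = s
        return "".join(out)

    target = "ab"
    # Sum the probability of every frame-level alignment that collapses to the target.
    p_target = 0.0
    for path in itertools.product(range(len(labels)), repeat=T):
        if collapse([labels[i] for i in path]) == target:
            p_target += np.prod([probs[t, i] for t, i in enumerate(path)])

    # CTC computes this same sum efficiently with the forward-backward algorithm.
    print(p_target)
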

The brain creates internal representations in order to learn without any explicit instructions
  • ANNs are networks of simplified model neurons
  • The behavior of an ANN depends on its weights and activation functions
  • The backpropagation algorithm is used to train the neural network
Backpropagation Challenges
  • Requires labeled training data
  • Forward Pass - the signal is the activity y
  • Backward Pass - the signal is the error derivative dE/dy
  • Learning alters the shape of the search space and provides a good evolutionary path
  • Learning organisms evolve much faster
Key Summary
  • The interaction between learning and evolution was proposed by Baldwin
  • Learning alters the search space in which evolution operates
  • Inspired by the theory of natural evolution
  • Motivated by Darwinian theory
Unimodal vs Multimodal
  • A landscape is unimodal if it has a single minimum
  • It is multimodal if it has several minima with equal function values
More Papers - Link

Happy Learning!!!!
