"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 26, 2021

Know-how questions? Logical perspectives? Non-linearity?

How do I understand what is activated in each layer of a neural network?

Let's take an image. For object detection, the basics are edges, corners, contours, and circles. In every layer certain key properties get highlighted, and summing up across the nodes and the number of layers, the activation spots get preserved. The output of each hidden layer depends on the activation function: (0,1) for Sigmoid, (-1,1) for Tanh, (0, x) for ReLU, and (0.1x, x) for Leaky ReLU (a small slope on the negative side). As the activations in each hidden layer contribute features, we do a forward propagation from layer N to layer N+10. Then we take partial derivatives of:

  • Differentiation of the error (expected output vs. actual output) with respect to the actual output
  • Differentiation of the actual output with respect to the activation function's input (the pre-activation)
  • Differentiation of the input to the layer N+10 node with respect to the weights passed to it
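Chained together, these three factors give the gradient of the error with respect to a weight. In symbols (E for the error, a for the actual output of a node, z for its pre-activation input, and w for a weight feeding it; the notation is mine, chosen to mirror the bullets above):

$$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$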

You run one forward-propagation loop, F1.

Then compute the partial-derivative errors and backpropagate from layer N+10, N+9, ... down to layer N. Since this runs from right to left, last layer to first, the derivatives already computed at each layer get reused by every layer to its left.
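A minimal sketch of one forward pass and one backward pass on a toy two-layer network (the layer sizes, the squared-error loss, the learning rate, and the variable names are illustrative assumptions, not something stated in this post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # toy input
y = np.array([[1.0]])                # expected output

W1 = rng.normal(size=(4, 3)) * 0.1   # hidden-layer weights
W2 = rng.normal(size=(1, 4)) * 0.1   # output-layer weights

# Forward propagation (the "F1" loop): left to right
z1 = W1 @ x                          # pre-activation of hidden layer
a1 = sigmoid(z1)                     # hidden activation
z2 = W2 @ a1                         # pre-activation of output
a2 = sigmoid(z2)                     # actual output

loss = 0.5 * np.sum((y - a2) ** 2)   # squared error vs. expected output

# Backpropagation: right to left, reusing earlier derivatives
dE_da2 = a2 - y                      # d(error)/d(actual output)
da2_dz2 = a2 * (1 - a2)              # d(output)/d(pre-activation), sigmoid'
delta2 = dE_da2 * da2_dz2
dE_dW2 = delta2 @ a1.T               # d(pre-activation)/d(weights) = a1

delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # reuse delta2 for the layer to the left
dE_dW1 = delta1 @ x.T

lr = 0.1                             # learning rate
W2 -= lr * dE_dW2
W1 -= lr * dE_dW1
```

Notice that delta2, computed at the rightmost layer, is reused when computing delta1 for the layer to its left, which is exactly the right-to-left reuse described above.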

Sigmoid cannot pass/preserve gradients in some cases, since its output is squashed into (0,1) and its derivative is small (the vanishing-gradient problem). ReLU came to the rescue. You can see in previous posts how ReLU manages to learn in just half of the epochs it took sigmoid.
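A rough way to see this (the depth and the sample value below are my own illustration, not from this post): sigmoid's derivative is at most 0.25, so multiplying one such factor per layer during backprop shrinks the gradient toward zero, while ReLU's derivative is 1 for positive inputs and preserves it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth = 20
z = 0.5                                        # a sample pre-activation value

# Backprop multiplies one activation derivative per layer.
sigmoid_grad = sigmoid(z) * (1 - sigmoid(z))   # <= 0.25 everywhere
relu_grad = 1.0 if z > 0 else 0.0              # 1 for positive inputs

print("sigmoid, 20 layers:", sigmoid_grad ** depth)   # ~1e-13, gradient vanishes
print("relu,    20 layers:", relu_grad ** depth)      # 1.0, gradient preserved
```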

Why not assign all the weights the same value?

This is nothing but scaling the inputs by a constant proportion; it introduces no non-linearity and is the same as passing the input through with some multiplier ratio. Every neuron would then compute the same output and receive the same gradient update, so they would never learn distinct features.
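A quick sketch of what goes wrong (the tiny network, tanh activation, and the single gradient step are my own illustration): with identical initial weights, every hidden unit produces the same activation and the same gradient, so the rows stay identical after the update.

```python
import numpy as np

x = np.array([[0.5], [0.2]])            # toy input
y = np.array([[1.0]])                   # expected output

W1 = np.full((3, 2), 0.4)               # all hidden weights identical
W2 = np.full((1, 3), 0.4)               # all output weights identical

# Forward pass: identical rows => identical hidden activations
a1 = np.tanh(W1 @ x)
print(a1.ravel())                       # three identical numbers

# One gradient step: every hidden unit gets the same update
a2 = W2 @ a1
delta2 = a2 - y
delta1 = (W2.T * delta2) * (1 - a1 ** 2)   # same value for every hidden unit
dW1 = delta1 @ x.T
W1 -= 0.1 * dW1
print(W1)                               # rows are still identical - symmetry never breaks
```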

Curse of Dimensionality?

Images, videos, and text are high-dimensional. These cannot be handled as linear problems with a single separating boundary. Their loss surfaces are non-convex, so they can have multiple optimal solutions. We use partial derivatives in neural networks to move toward an optimal value, and since there can be several, local minima versus the global minimum, we take care with the learning rate and the partial derivatives.
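A small illustration of multiple optima and the effect of the starting point and learning rate (the 1-D function and all settings below are made up for illustration): gradient descent on a non-convex curve can settle into a local minimum or the global one.

```python
def f(x):
    # Non-convex 1-D "loss": shallow local minimum near x ~ +1.13,
    # deeper global minimum near x ~ -1.30.
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1   # derivative of f

def descend(x, lr=0.05, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)         # step against the gradient
    return x

print(descend(-2.0))             # settles near x ~ -1.30, the global minimum
print(descend(+2.0))             # settles near x ~ +1.13, a shallower local minimum
print(descend(+2.0, lr=0.12))    # bigger learning rate overshoots that basin and lands near -1.30
```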

Keep Exploring!!!

