I had been looking for a post like this for some time. Backprop for me in one slide :)
Question #4 - The Bias (5 marks): We generally initialize the bias to random numbers larger than
0. Why? What happens if we initialize it to a value below zero? Does this
affect our ability to train?
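Answer -
We should not initialize it to zero or below. By the chain rule, the gradient
passing through a unit is multiplied by the derivative of its activation. For a
ReLU-style unit (an assumption here, since the question does not name the
activation), a pre-activation at or below zero has a zero derivative, so the
gradients through that unit end up as zeros only and it stops learning. By
assigning small random positive values, the units start out active, the
derivatives are available, and gradient descent can slowly find a local minimum
of the loss.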
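
To make the chain-rule argument concrete, here is a minimal sketch for a single unit. It assumes a ReLU activation and a squared-error loss, neither of which is stated in the question, and the names `relu`, `relu_grad`, and `bias_gradient` and the numbers are made up for illustration.

```python
def relu(z):
    """ReLU activation: passes positive inputs, clips the rest to zero."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU; taken as 0 at z == 0 by convention here."""
    return 1.0 if z > 0 else 0.0


def bias_gradient(x, w, b, target):
    """Gradient of 0.5 * (a - target)**2 with respect to the bias b,
    for one unit a = relu(w * x + b), expanded with the chain rule."""
    z = w * x + b            # pre-activation
    a = relu(z)              # activation
    dloss_da = a - target    # d(loss)/d(a)
    da_dz = relu_grad(z)     # d(a)/d(z), zero when the unit is inactive
    dz_db = 1.0              # d(z)/d(b)
    return dloss_da * da_dz * dz_db


# Illustrative values only.
x, w, target = 1.0, 0.5, 2.0

grad_positive_bias = bias_gradient(x, w, b=0.1, target=target)   # unit is active
grad_negative_bias = bias_gradient(x, w, b=-1.0, target=target)  # unit is inactive

print("positive bias gradient:", grad_positive_bias)
print("negative bias gradient:", grad_negative_bias)
```

With the positive bias the unit is active and the bias receives a non-zero gradient. With the negative bias the pre-activation is below zero, the ReLU derivative is zero, and the whole product vanishes, so gradient descent cannot update that unit for this input.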
Happy Mastering DL!!!