Same experiment, different activation functions, different observations
- Epochs to reach similar accuracy - Tanh and ReLU perform significantly better than sigmoid. For huge datasets, it makes sense to pick the activation functions that perform better and learn faster.
- Boundary shapes per activation function - Although every activation function eventually solves the problem, you can spot the circular vs. boxed decision boundary each one produces from its behavior.
- Training accuracy - Reaching comparable accuracy took 880 epochs with sigmoid vs. 90 epochs with Tanh vs. 78 epochs with ReLU. Every activation function will converge, but the number of epochs depends on choosing the right one, which saves compute and speeds up training (see the sketch after this list).
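A minimal sketch of the same idea, not the post's actual experiment: the dataset (a circular toy problem), the network size, and the hyperparameters below are all assumptions. It trains the same small network with sigmoid, tanh, and ReLU via scikit-learn's MLPClassifier and prints how many iterations each needs to converge, alongside test accuracy.

```python
# Sketch only: toy circular dataset + a small MLP, comparing how quickly
# sigmoid, tanh, and ReLU converge. Not the original experiment's setup.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two-feature dataset with a circular decision boundary (assumed problem)
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for activation in ["logistic", "tanh", "relu"]:  # 'logistic' == sigmoid
    clf = MLPClassifier(hidden_layer_sizes=(8, 8), activation=activation,
                        max_iter=2000, random_state=0)
    clf.fit(X_train, y_train)
    # n_iter_ gives the number of iterations the solver actually ran
    print(f"{activation:8s} iterations={clf.n_iter_:4d} "
          f"test accuracy={clf.score(X_test, y_test):.3f}")
```

Typically the sigmoid run needs noticeably more iterations than tanh or ReLU to reach a similar accuracy, which mirrors the epoch counts above.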
Keep Exploring!!!