Activation Function - ReLU

Course Summary - 4. Introduction to TensorFlow

Activation Function

Activation Function for Non-linearity
→ A linear model : output of the form y = w1 * x1 + w2 * x2 + w3 * x3
→ Substituting each group of weights for a new weight gives exactly the same linear model as before, despite adding a hidden layer of neurons.
- The first neuron of the hidden layer takes weighted inputs from all three input nodes (see the sketch below).
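A minimal NumPy sketch (hypothetical weights, not from the course) of why adding a hidden layer without an activation function collapses back into a single linear model:

```python
import numpy as np

# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 1 output, no activation.
x = np.array([1.0, 2.0, 3.0])      # x1, x2, x3
W1 = np.random.randn(4, 3)         # input -> hidden weights
W2 = np.random.randn(1, 4)         # hidden -> output weights

y_two_layers = W2 @ (W1 @ x)       # hidden layer with no (i.e. linear) activation
W_combined = W2 @ W1               # "substitute each group of weights for a new weight"
y_one_layer = W_combined @ x

print(np.allclose(y_two_layers, y_one_layer))  # True: exactly the same linear model
```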

Sigmoid/Tanh → Vanishing Gradient → ReLU → Dying ReLU → its variants

  • Nonlinear activation functions, with sigmoid and hyperbolic tangent (a scaled and shifted sigmoid) being some of the earliest.
    Both saturate, which leads to the vanishing gradient problem:
    with gradients near zero, the model's weights don't update and training halts (see the gradient sketch after this list).
  • ReLU: Rectified Linear Unit is one of our favorites because it’s simple and works well.
    • Networks with ReLU hidden activations often train about 10 times faster than networks with sigmoid hidden activations.
    • In the positive domain : it is linear → no saturation.
    • In the negative domain : the output is always zero → ReLU layers can end up dying.
    • When inputs fall in the negative domain, the activation's output is zero, which doesn't help the next layer push its inputs into the positive domain: this compounds and creates a lot of zero activations.
    • During backpropagation, weight updates multiply the error's derivative by the activation's gradient; with a gradient of zero the weight update is zero, so the weights don't change and training fails for that layer.
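A small sketch (plain NumPy, hypothetical input values) comparing the sigmoid gradient, which saturates toward zero, with the ReLU gradient, which is 1 in the positive domain and 0 in the negative domain:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # saturates: ~0 for large |x| -> vanishing gradient

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 in the positive domain, 0 in the negative domain

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.0f}")
```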
Many different ReLU variants slightly modify the ReLU to avoid the "dying ReLU" problem (see the sketch below).
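A minimal sketch (hypothetical input values) of ReLU variants available in tf.keras that keep a small, nonzero response in the negative domain, which helps avoid dying ReLU layers:

```python
import tensorflow as tf

x = tf.constant([[-2.0, -0.5, 0.0, 0.5, 2.0]])

print(tf.nn.relu(x))                   # plain ReLU: zero for all negative inputs
print(tf.nn.leaky_relu(x, alpha=0.2))  # Leaky ReLU: small negative slope instead of zero
print(tf.nn.elu(x))                    # ELU: smooth exponential curve for x < 0

# As a hidden-layer activation in a Keras model:
layer = tf.keras.layers.Dense(8, activation="elu")
```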

3 common failure modes for gradient descent

Problem                  | Insight                                                         | Solution
1. Gradients can vanish  | Each additional layer can successively reduce signal vs. noise  | Using ReLU instead of sigmoid/tanh can help
2. Gradients can explode | Learning rates are important here                               | Batch normalization (a useful knob) can help
3. ReLU layers can die   | Monitor the fraction of zero weights in TensorBoard             | Lower your learning rate
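A minimal Keras sketch (hypothetical model and learning rate, not from the course) showing two of the fixes from the table: a BatchNormalization layer as a knob against exploding gradients, and a lowered learning rate to reduce the chance of ReLU layers dying:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),   # useful knob when gradients explode
    tf.keras.layers.Dense(1),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lowered learning rate
    loss="mse",
)
```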