= Activation functions =

Activation functions are mathematical operations applied at each node (unit) of an artificial neural network: given the node's input, or weighted sum of inputs, the activation function determines its output. They introduce non-linearity into the network, which allows it to model complex relationships between inputs and outputs.

There are several commonly used activation functions, including the following (a NumPy sketch of all four appears after this list):


 * Sigmoid: maps any real-valued number to the range (0,1), and is often used for binary classification problems.


 * ReLU (Rectified Linear Unit): sets the output of a node to 0 if the input is negative, and to the input itself if the input is positive. ReLU is widely used due to its simplicity and computational efficiency.


 * Tanh (Hyperbolic Tangent): maps real-valued numbers to the range (-1, 1), and is often used in recurrent neural networks.


 * Softmax: maps a set of real-valued numbers to a probability distribution over several classes, and is often used for multiclass classification problems.
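
As a concrete reference, here is a minimal NumPy sketch of the four functions above (the function names and the use of NumPy are illustrative choices, not tied to any particular framework):

    import numpy as np

    def sigmoid(x):
        # Maps any real number into (0, 1): 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        # Zero for negative inputs, identity for positive inputs
        return np.maximum(0.0, x)

    def tanh(x):
        # Maps any real number into (-1, 1)
        return np.tanh(x)

    def softmax(x):
        # Converts a vector of scores into a probability distribution;
        # subtracting the max first improves numerical stability
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    scores = np.array([-1.0, 0.0, 2.0])
    print(sigmoid(scores))   # values in (0, 1)
    print(relu(scores))      # [0. 0. 2.]
    print(tanh(scores))      # values in (-1, 1)
    print(softmax(scores))   # non-negative, sums to 1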

Many other activation functions have been proposed, including ReLU variants such as leaky ReLU, ELU, and GELU, and sigmoid-like functions such as Swish/SiLU. The choice of activation function depends on the specific problem being solved and on the design of the network architecture.
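
For example, leaky ReLU keeps a small slope for negative inputs instead of clipping them to zero; a minimal sketch (the slope alpha = 0.01 is a common default, but it is a tunable hyperparameter):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Like ReLU, but negative inputs are scaled by alpha instead of being zeroed,
        # so the local gradient is never exactly zero
        return np.where(x > 0, x, alpha * x)

    print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.]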

= Why we need activation functions =

 * Non-linearity: without activation functions, a stack of layers is just a composition of linear maps, which is itself a single linear map, so the network can only model linear relationships no matter how deep it is. Activation functions introduce non-linearity into the output of each node, allowing the network to model more complex functions (see the first sketch after this list).


 * Vanishing gradients: in deep networks, the gradients used to update the weights can shrink toward zero as they are back-propagated through many layers, making training slow and unstable. Saturating functions such as sigmoid (and, to a lesser extent, tanh) make this worse because their derivatives are close to zero over much of their input range; ReLU and leaky ReLU alleviate the problem because their derivative is 1 for positive inputs, so gradients propagate more easily (see the second sketch after this list).


 * Normalization: some activation functions are designed to keep node outputs in a well-behaved range, which improves the stability and convergence of training; SELU, for example, is self-normalizing. (Batch normalization serves a similar purpose but is a separate layer type, not an activation function.)


 * Output constraints: some activation functions constrain the output of a node to a specific range, for example sigmoid to (0, 1) and softmax to a probability distribution, which is useful for binary and multiclass classification respectively.


 * Model interpretability: simple activations such as ReLU make the network a piecewise-linear function of its input, and bounded activations such as tanh keep outputs in a fixed range, both of which can make the learned representations easier to analyze.
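
To illustrate the non-linearity point, the sketch below stacks two linear layers with no activation in between and shows that the composition is itself a single linear map (the weights here are random placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
    x = rng.normal(size=3)

    # Two "layers" with no activation in between
    h = W1 @ x + b1
    y_two_layers = W2 @ h + b2

    # Exactly the same mapping expressed as one linear layer
    W, b = W2 @ W1, W2 @ b1 + b2
    y_one_layer = W @ x + b

    print(np.allclose(y_two_layers, y_one_layer))  # True: depth adds nothing without non-linearity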
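
To illustrate the vanishing-gradient point, the sketch below multiplies per-layer derivative factors across depth: the sigmoid derivative is at most 0.25, so the product shrinks rapidly, while the ReLU derivative is 1 for any positive input (the depth and the pre-activation value are arbitrary choices):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    depth = 20
    x = 0.5  # an arbitrary pre-activation value, assumed the same at every layer

    d_sigmoid = sigmoid(x) * (1.0 - sigmoid(x))  # local derivative, <= 0.25
    d_relu = 1.0 if x > 0 else 0.0               # local derivative of ReLU

    # Back-propagated gradients scale (roughly) with the product of these local factors
    print(d_sigmoid ** depth)  # ~2.6e-13: the gradient has effectively vanished
    print(d_relu ** depth)     # 1.0: the gradient magnitude is preserved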