Regularization is a technique used in machine learning to combat overfitting by adding a penalty term to the loss function. Overfitting occurs when a model learns the noise in the training data along with the underlying pattern; a regularized model therefore generalizes better to unseen data.

Here are some commonly used regularization techniques:

L1 Regularization, also known as Lasso Regularization, prevents overfitting by adding a penalty term to the loss function: the sum of the absolute values of the weights, multiplied by the regularization parameter lambda.

$$ \mathcal{L}_{L1}(\mathbf{w}) = \mathcal{L}(\mathbf{w}) + \lambda \sum_{i=1}^{n} |w_i| $$

This penalty tends to drive the weights of less important features to exactly zero, effectively eliminating those features from the model. The result is a sparse model that uses only a subset of the features, making it simpler and more interpretable.
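To make this concrete, here is a minimal Python sketch (assuming NumPy and scikit-learn; the helper name `l1_penalized_loss` and the toy data are illustrative, not from the original text). It computes the L1-penalized loss from the formula above and uses scikit-learn's `Lasso` to show how the penalty zeroes out weights on irrelevant features:

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_penalized_loss(w, X, y, lam):
    """MSE base loss plus the L1 penalty: L(w) + lambda * sum_i |w_i|."""
    mse = np.mean((X @ w - y) ** 2)       # base loss L(w)
    return mse + lam * np.sum(np.abs(w))  # L1 penalty term

# Toy data: only the first two of ten features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# scikit-learn's Lasso minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1,
# so alpha plays the role of lambda up to the constant scaling of the MSE term.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # weights on the eight irrelevant features land at (or near) zero
```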
L2 Regularization, also known as Ridge Regularization, likewise adds a penalty term to the loss function, but here the penalty is the sum of the squares of the weights, multiplied by the regularization parameter lambda.

$$ \mathcal{L}_{L2}(\mathbf{w}) = \mathcal{L}(\mathbf{w}) + \frac{\lambda}{2} \sum_{i=1}^{n} w_i^2 $$

This penalty tends to make the weights of less important features small but not exactly zero, so every feature stays in the model while the influence of the less important ones is reduced. Unlike L1 regularization, L2 regularization does not produce a sparse model.
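For comparison, here is the same sketch with the L2 penalty (again assuming NumPy and scikit-learn; `l2_penalized_loss` and the toy data are illustrative). Ridge shrinks the irrelevant weights toward zero without eliminating them:

```python
import numpy as np
from sklearn.linear_model import Ridge

def l2_penalized_loss(w, X, y, lam):
    """MSE base loss plus the L2 penalty: L(w) + (lambda / 2) * sum_i w_i^2."""
    mse = np.mean((X @ w - y) ** 2)          # base loss L(w)
    return mse + 0.5 * lam * np.sum(w ** 2)  # L2 penalty term

# Same toy data: only the first two of ten features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

model = Ridge(alpha=0.1).fit(X, y)  # alpha plays the role of lambda
print(model.coef_)  # irrelevant weights are small but not exactly zero
```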