L1 Regularization (Lasso)
Pros
- L1 regularization can drive coefficients exactly to zero, leading to sparse solutions that effectively perform feature selection (see the sketch at the end of this section).
- It is more robust to outliers than L2, since the penalty grows linearly rather than quadratically.
Cons
- L1 regularization can be unstable in high dimensions: small changes in the data can lead to large changes in which features are selected.
- Given a group of correlated features, it tends to arbitrarily select one and zero out the rest, rather than sharing weight across the group.
Use Cases
- L1 regularization is useful when we have a high-dimensional dataset and we suspect that only a few features are actually relevant.
- It is also useful when we want a model that is interpretable, as the sparsity induced by L1 regularization leads to a model that uses only a subset of the features.
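To make the sparsity concrete, here is a minimal sketch using scikit-learn's Lasso on synthetic data; the dataset shape and the alpha value are illustrative assumptions, not tuned choices.

```python
# A minimal sketch: Lasso on synthetic data where only a few of many
# features are informative (sizes and alpha are illustrative assumptions).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 samples, 50 features, but only 5 actually drive the target.
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha sets the strength of the L1 penalty
lasso.fit(X, y)

# Most coefficients are driven exactly to zero; the nonzero ones are the
# features the model effectively selected.
selected = np.flatnonzero(lasso.coef_)
print(f"{selected.size} of {X.shape[1]} features kept:", selected)
```

With a strong enough penalty, most coefficients land exactly at zero, and the surviving ones identify the informative features.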
L2 Regularization (Ridge)
Pros
- L2 regularization is stable in high dimensions: small changes in the data lead to only small changes in the coefficients.
- It tends to distribute weight roughly evenly across groups of correlated features instead of singling one out (see the sketch at the end of this section).
Cons
- L2 regularization does not perform feature selection - it will include all features in the model.
- It is more sensitive to outliers than L1, since squaring amplifies the influence of large values.
Use Cases
- L2 regularization is useful when we have a lot of features, each of which contributes a bit to the prediction.
- It is also useful when we do not need a sparse or interpretable model, but rather a model that is stable and performs well.
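The contrast in how the two penalties treat correlated features shows up clearly in a small sketch; the near-duplicate feature setup and the alpha values below are illustrative assumptions.

```python
# A minimal sketch contrasting Ridge and Lasso on two highly correlated
# features (data generation and alpha values are illustrative assumptions).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + 0.1 * rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge splits the weight roughly evenly between the twin features;
# Lasso tends to load one coefficient and zero out the other.
print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)
```

Ridge splits the weight roughly evenly between the two near-identical columns, while Lasso typically concentrates it on one and zeroes the other, which is exactly the instability noted in the L1 cons above.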