L1 Regularization (Lasso)
Pros
- L1 regularization can drive coefficients exactly to zero, leading to sparse solutions that effectively perform feature selection (see the sketch at the end of this section).
- It is more robust to outliers than L2, since the penalty grows linearly rather than quadratically.
Cons
- L1 regularization can be unstable in high dimensions: small changes in the data can lead to large changes in which features are selected.
- Given a group of correlated features, it tends to arbitrarily select one and zero out the rest, rather than sharing weight across the group.
Use Cases
- L1 regularization is useful when we have a high-dimensional dataset and we suspect that only a few features are actually relevant.
- It is also useful when we want a model that is interpretable, as the sparsity induced by L1 regularization leads to a model that uses only a subset of the features.
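To make the sparsity concrete, here is a minimal sketch using scikit-learn's Lasso on synthetic data; the dataset shape and the alpha value are illustrative assumptions, not tuned choices.

```python
# A minimal sketch: Lasso on synthetic data where only a few of many
# features are informative (sizes and alpha are illustrative assumptions).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 samples, 50 features, but only 5 actually drive the target.
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha sets the strength of the L1 penalty
lasso.fit(X, y)

# Most coefficients are driven exactly to zero; the nonzero ones are the
# features the model effectively selected.
selected = np.flatnonzero(lasso.coef_)
print(f"{selected.size} of {X.shape[1]} features kept:", selected)
```

With a strong enough penalty, most coefficients land exactly at zero, and the surviving ones identify the informative features.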
L2 Regularization (Ridge)
Pros
- L2 regularization is stable in high dimensions: small changes in the data lead to only small changes in the coefficients.
- It tends to distribute weight roughly evenly across groups of correlated features instead of singling one out (see the sketch at the end of this section).
Cons
- L2 regularization does not perform feature selection - it will include all features in the model.
- It is more sensitive to outliers than L1, since squaring amplifies the influence of large values.
Use Cases
- L2 regularization is useful when we have a lot of features, each of which contributes a bit to the prediction.
- It is also useful when we do not need a sparse or interpretable model, but rather a model that is stable and performs well.
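The contrast in how the two penalties treat correlated features shows up clearly in a small sketch; the near-duplicate feature setup and the alpha values below are illustrative assumptions.

```python
# A minimal sketch contrasting Ridge and Lasso on two highly correlated
# features (data generation and alpha values are illustrative assumptions).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + 0.1 * rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge splits the weight roughly evenly between the twin features;
# Lasso tends to load one coefficient and zero out the other.
print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)
```

Ridge splits the weight roughly evenly between the two near-identical columns, while Lasso typically concentrates it on one and zeroes the other, which is exactly the instability noted in the L1 cons above.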