Ordinal Encoding
Pros
- Simple to understand and implement.
- Does not increase the feature space.
Cons
- Imposes an arbitrary order on categories, which may not represent their true relationship.
- Not suitable for non-ordinal categories (i.e., categories that do not have a natural order).
Use Cases
- Suitable for ordinal features, i.e., categories that have a natural order (e.g., ratings, clothing sizes).
- Useful when simplicity and computational efficiency are more important than capturing complex relationships between categories (see the sketch below).
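As a minimal sketch, assuming scikit-learn is available, ordinal encoding of a single "size" feature could look like the following; the column values and category order are illustrative:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data: one ordinal feature with a natural order.
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# Pass the category order explicitly so the integers reflect the true ranking
# (small < medium < large) rather than an arbitrary or alphabetical order.
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
encoded = encoder.fit_transform(sizes)
print(encoded.ravel())  # [0. 2. 1. 0.]
```

Note that the feature space stays the same: each category is replaced by a single integer. Supplying the category order explicitly matters, because an automatically inferred order (typically sorted) may not match the true ranking.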
Similarity Encoding
Pros
- Captures more complex relationships between categories by considering their similarity.
- Can handle categories unseen during training, since a new value can still be compared to the known categories.
Cons
- More complex to understand and implement.
- Can significantly increase the feature space, leading to higher computational cost.
Use Cases
- Suitable for non-ordinal features, i.e., categories that do not have a natural order (e.g., city names, product types).
- Useful when it’s important to capture complex relationships between categories, even at the cost of increased computational complexity (see the sketch below).
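As a rough sketch of the idea, not tied to any particular library, one way to similarity-encode strings is to represent each value by its similarity to every category seen during training. The vocabulary, helper names, and similarity measure (Jaccard over character 3-grams) below are illustrative choices; the dirty_cat / skrub ecosystem offers ready-made encoders along these lines.

```python
import numpy as np

def char_ngrams(s, n=3):
    # Pad with spaces so short strings and word boundaries still yield n-grams.
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b, n=3):
    # Jaccard similarity between the n-gram sets of two strings (illustrative choice).
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if (ga | gb) else 0.0

def similarity_encode(values, vocabulary):
    # Each value becomes a vector of similarities to every training category,
    # so a value unseen during training still gets a meaningful representation.
    return np.array([[similarity(v, ref) for ref in vocabulary] for v in values])

# Hypothetical training categories and new, unseen values to encode.
vocabulary = ["New York", "Newark", "San Francisco"]
new_values = ["New York City", "San Fransisco"]  # note the misspelling in the second value

print(similarity_encode(new_values, vocabulary).round(2))
```

The output has one column per training category, which illustrates why the feature space, and with it the computational cost, can grow quickly as the number of distinct categories increases.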