Description

Orthogonal Sparse Bigram (OSB) is a text processing technique used in Natural Language Processing (NLP) and Information Retrieval (IR). It’s designed to capture the context of words in a document by considering not only individual words but also pairs of words that appear within a certain distance from each other.

How OSB Works

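OSB works by sliding a fixed-size window over the text. For each position of the window, it generates a set of bigrams that pair one anchor word in the window with each of the other words in the window. In common formulations the anchor sits at an edge of the window (the first word or the most recent word) rather than at its center, and each bigram records the distance between its two words, so the same pair of words at different distances produces different features. These bigrams are treated as separate features in the resulting feature vector.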
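The sliding-window step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation, and it rests on several assumptions that are not stated in the text above: tokens are split on whitespace, the anchor is the first word of each window, and the number of underscores between the two words encodes how far apart they are. The function name `osb_features` and the `window_size` parameter are hypothetical.

```python
from collections import Counter


def osb_features(text, window_size=4):
    """Return a list of OSB feature strings for `text`.

    Assumptions (illustrative only): whitespace tokenization, the anchor
    is the first word of each window, and the underscore count between
    the two words equals their distance in tokens.
    """
    if window_size < 2:
        raise ValueError("window_size must be at least 2")

    tokens = text.split()
    features = []
    for i, anchor in enumerate(tokens):
        # Pair the anchor with each later word that is still inside the window.
        for gap in range(1, window_size):
            j = i + gap
            if j >= len(tokens):
                break  # the window runs past the end of the text
            features.append(f"{anchor}{'_' * gap}{tokens[j]}")
    return features


features = osb_features("the quick brown fox", window_size=3)
print(features)
# ['the_quick', 'the__brown', 'quick_brown', 'quick__fox', 'brown_fox']

# Each distinct string becomes one dimension of the feature vector.
feature_counts = Counter(features)
```

Because the gap is part of the feature string, `the_quick` and `the__quick` would be counted as different features, which is how the distance between the two words is preserved.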

Benefits

Limitations

Features

Use Cases