In the traditional skip-gram model, the neural network weights are updated after every single (target word, context word) pair is processed. Batch skip-gram is a variation that improves training efficiency by updating the weights only after processing a whole batch of such pairs.
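
To make the training data concrete, the following minimal Python sketch generates (target word, context word) pairs from a tokenized sentence and groups them into batches. The helper names skipgram_pairs and batches, the window size of 2, and the batch size of 4 are illustrative choices for this sketch, not part of any particular library.

    from itertools import islice

    def skipgram_pairs(tokens, window=2):
        """Yield (target word, context word) pairs within a fixed window."""
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield target, tokens[j]

    def batches(pairs, batch_size=4):
        """Group the stream of pairs into fixed-size batches."""
        it = iter(pairs)
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                return
            yield batch

    tokens = "the quick brown fox jumps over the lazy dog".split()
    for batch in batches(skipgram_pairs(tokens), batch_size=4):
        print(batch)  # e.g. [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]

The traditional model would perform a weight update for every pair printed above; batch skip-gram performs one update per printed batch.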

Why Use Batch Skip-gram?

Updating the weights once per pair produces a long stream of tiny, sequential updates. Grouping pairs into a batch lets the underlying vector operations be combined into larger matrix operations that run efficiently on multi-core CPUs and GPUs, and averaging the gradient over a batch yields a smoother, less noisy update than any single pair can provide.

How it Works (Simplified)

  1. Batch Formation: Create a batch of (target word, context word) pairs.
  2. Calculate Gradients: For each pair in the batch, calculate the prediction error and the resulting gradient for weight updates.
  3. Aggregate Gradients: Instead of immediate updates, sum or average the gradients across the entire batch.
  4. Update Weights: Update the neural network weights once, using the aggregated gradient (see the sketch after this list).
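
As a concrete illustration of steps 2-4, here is a minimal NumPy sketch that uses a plain softmax output layer for readability; real implementations typically rely on negative sampling or hierarchical softmax instead. The function name batch_update and the toy vocabulary and embedding sizes are assumptions made for this example.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def batch_update(W_in, W_out, pairs, lr=0.05):
        """One batched update: accumulate gradients over all pairs, then apply once."""
        grad_in = np.zeros_like(W_in)
        grad_out = np.zeros_like(W_out)
        for t, c in pairs:                   # step 2: gradient for each pair
            h = W_in[t]                      # hidden layer = target-word embedding
            p = softmax(W_out @ h)           # predicted context distribution
            err = p.copy()
            err[c] -= 1.0                    # prediction error (p minus one-hot context)
            grad_in[t] += W_out.T @ err      # step 3: accumulate instead of updating
            grad_out += np.outer(err, h)
        n = len(pairs)
        W_in -= lr * grad_in / n             # step 4: a single update with the averaged gradient
        W_out -= lr * grad_out / n

    # Toy setup: vocabulary of 10 word indices, 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(10, 8))   # target-word (input) embeddings
    W_out = rng.normal(scale=0.1, size=(10, 8))  # context-word (output) embeddings
    batch_update(W_in, W_out, [(0, 1), (0, 2), (1, 0), (2, 3)])  # (target, context) index pairs

Because every pair in the batch shares the same weight matrices, the per-pair loop above can also be rewritten as a single matrix-matrix product over the whole batch, which is where most of the practical speed-up comes from.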

Considerations

The batch size is the main tuning knob. Larger batches make better use of hardware parallelism but mean fewer weight updates per pass over the corpus, and very large batches can average away useful gradient signal, so the value is usually chosen empirically.

Strengths

Batched updates map onto efficient matrix operations, so training is faster on GPUs and multi-core CPUs and scales more easily to large corpora. Averaging gradients over a batch also reduces the variance of each update, which tends to make convergence smoother.

Weaknesses

Each batch and its accumulated gradients must be held in memory at once, and because the weights are updated less frequently, the model may need more passes over the data to match the quality of pair-by-pair training. Batch size also becomes an additional hyperparameter to tune.