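SageMakerVariantInvocationsPerInstance is a predefined metric type in Application Auto Scaling, backed by the CloudWatch metric InvocationsPerInstance, that is used to auto-scale the instance count of Amazon SageMaker endpoint variants. It reports the average number of invocations per minute that each instance behind a variant receives. Let's break down what that means: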

Understanding the Metric

Using it for Auto-Scaling

SageMakerVariantInvocationsPerInstance is typically used in Target Tracking Scaling Policies where you want your endpoint's capacity to adapt dynamically to incoming inference requests. Here's how:

  1. Target Value: You decide on a target average number of invocations per instance you consider optimal. Say your application works best when each instance handles around 50 requests per minute.
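  2. Scaling policy: You create a target tracking scaling policy with that target value. Application Auto Scaling then adds instances when the average invocations per instance rise above the target and removes instances when the average falls below it.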
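
The two steps above can be sketched in Python. This is a minimal illustration, not a definitive setup: the helper names (build_policy_kwargs, estimate_instances), the endpoint and variant names, the cooldown values, and the instance limits are all assumptions made for the example. The dictionary keys follow the parameters of the Application Auto Scaling put_scaling_policy call, and the estimate is only a rough approximation of the capacity that target tracking converges toward.

```python
import math


def build_policy_kwargs(endpoint_name, variant_name, target_per_instance=50.0,
                        scale_in_cooldown=300, scale_out_cooldown=60):
    """Build keyword arguments for a target tracking policy (hypothetical helper).

    The cooldown defaults are illustrative assumptions, not recommendations.
    """
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target average invocations per instance per minute.
            "TargetValue": float(target_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


def estimate_instances(total_invocations_per_minute, target_per_instance,
                       min_instances=1, max_instances=10):
    """Roughly estimate the instance count needed to stay at or under the target.

    Simplification: real scaling also depends on cooldowns and alarm timing.
    """
    needed = math.ceil(total_invocations_per_minute / target_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical endpoint and variant names, using the 50 requests/minute target.
policy_kwargs = build_policy_kwargs("my-endpoint", "AllTraffic", 50)
print(policy_kwargs["ResourceId"])
print(estimate_instances(400, 50))
```

With boto3 installed and AWS credentials configured, a dictionary like this could be passed as boto3.client("application-autoscaling").put_scaling_policy(**policy_kwargs), after the variant has been registered as a scalable target with register_scalable_target using the same ResourceId and ScalableDimension.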

Why Use This Metric

Important Points