Parameters

VariantName: A unique identifier for the production variant within the endpoint. It distinguishes the models or model versions deployed behind the same endpoint.
- Use Case: Identifying and routing requests to specific model versions for A/B testing.
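Each variant has its own VariantName, so a caller can bypass the weight-based traffic split and send a request straight to one variant. This uses the TargetVariant parameter of InvokeEndpoint. The sketch below is a minimal example and assumes a boto3 `sagemaker-runtime` client and a model that accepts JSON. `invoke_variant`, the endpoint name, and the variant names are hypothetical.

```python
import json
from typing import Any


def invoke_variant(runtime_client: Any, endpoint_name: str,
                   variant_name: str, payload: dict) -> Any:
    """Send one request to a specific production variant (hypothetical helper).

    runtime_client is assumed to be a boto3 "sagemaker-runtime" client.
    TargetVariant overrides the weight-based routing for this single request.
    """
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetVariant=variant_name,          # must match a VariantName on the endpoint
        ContentType="application/json",      # assumes the model accepts JSON
        Body=json.dumps(payload),
    )


# Usage (needs AWS credentials; all names are hypothetical):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# resp = invoke_variant(runtime, "my-endpoint", "variant-b", {"x": [1, 2, 3]})
# print(resp["InvokedProductionVariant"], resp["Body"].read())
```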
ModelName: The name of the Amazon SageMaker model that you want to host. It must refer to a SageMaker model resource created beforehand with CreateModel. That resource points to the model artifacts and the inference container image.
- Use Case: Deploying different models or model versions to serve inference requests.
InitialInstanceCount: The number of instances launched for the model variant when the endpoint is created. It sets the capacity with which the variant starts serving inference requests.
- Use Case: Ensuring adequate capacity to meet anticipated demand while optimizing costs.
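One rough way to choose InitialInstanceCount is to take the expected peak load, add some headroom, and divide by what one instance can sustain. This is only a back-of-the-envelope sketch. The peak rate, the per-instance throughput (for example, measured with a load test), and the 30% default headroom are all assumed inputs, and `initial_instance_count` is a hypothetical helper.

```python
import math


def initial_instance_count(peak_rps: float, per_instance_rps: float,
                           headroom: float = 0.3, minimum: int = 1) -> int:
    """Estimate InitialInstanceCount from expected load (hypothetical helper).

    peak_rps: anticipated peak requests per second (assumed input).
    per_instance_rps: sustainable throughput of one instance (assumed input).
    headroom: extra fraction of capacity reserved for traffic spikes.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    needed = peak_rps * (1 + headroom) / per_instance_rps
    # Round up to whole instances and never go below the minimum.
    return max(minimum, math.ceil(needed))


# 120 req/s peak, 50 req/s per instance, 30% headroom: 156 / 50 = 3.12, so 4 instances.
# initial_instance_count(120, 50)  # → 4
```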
InstanceType: The ML compute instance type used to host the model (for example, ml.m5.xlarge or ml.g4dn.xlarge). It determines the CPU, GPU, and memory available to the model variant.
- Use Case: Balancing compute power against cost for inference.
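ModelName, InstanceType, and InitialInstanceCount are combined in each entry of the ProductionVariants list passed to CreateEndpointConfig. The sketch below is a minimal example. It assumes a boto3 `sagemaker` client and two models that already exist. The helper functions, model names, and variant names are hypothetical.

```python
from typing import Any, List


def production_variant(variant_name: str, model_name: str, instance_type: str,
                       instance_count: int = 1, weight: float = 1.0) -> dict:
    """Build one ProductionVariants entry (hypothetical helper)."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,              # an existing model created with CreateModel
        "InstanceType": instance_type,        # e.g. "ml.m5.xlarge"
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": weight,
    }


def create_ab_endpoint_config(sm_client: Any, config_name: str,
                              variants: List[dict]) -> Any:
    """Create an endpoint configuration; sm_client is assumed to be a boto3 "sagemaker" client."""
    names = [v["VariantName"] for v in variants]
    if len(names) != len(set(names)):
        raise ValueError("VariantName values must be unique within an endpoint")
    return sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=variants,
    )


# Usage (needs AWS credentials; names are hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# create_ab_endpoint_config(sm, "my-ab-config", [
#     production_variant("variant-a", "model-v1", "ml.m5.xlarge", 2, 0.9),
#     production_variant("variant-b", "model-v2", "ml.m5.xlarge", 1, 0.1),
# ])
```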
InitialVariantWeight: The relative weight that sets this variant's initial share of the endpoint's inference traffic. Each variant receives its weight divided by the sum of all variants' weights, so weights do not need to sum to 1. If unspecified, it defaults to 1.0.
- Use Case: Gradually introducing a new model variant by starting with a small percentage of the traffic.
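Weights are relative, so the share each variant receives is its weight divided by the sum of all weights. The hypothetical helper below performs that conversion. It is useful for sanity-checking a split such as 9:1 before deploying it. Note that those weights sum to 10, not 1.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Convert relative variant weights into fractions of total traffic (hypothetical helper).

    Each variant receives weight / sum(all weights) of the requests.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


# traffic_shares({"current": 9, "candidate": 1})
# → {'current': 0.9, 'candidate': 0.1}
```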
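AcceleratorType (optional): The Elastic Inference accelerator type (for example, ml.eia2.medium) to attach to each instance of the model variant. It targets deep learning models that benefit from accelerated inference but do not need a full GPU instance. AWS no longer onboards new customers to Elastic Inference.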
- Use Case: Enhancing model inference performance and reducing costs for compute-intensive models.
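DesiredWeight (used with the UpdateEndpointWeightsAndCapacities operation): The new relative weight for a variant on an endpoint that is already running. Like InitialVariantWeight, it is relative to the other variants' weights. It changes the traffic distribution without deploying a new endpoint configuration.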
- Use Case: Adjusting traffic distribution in response to real-time performance metrics or to complete a gradual rollout.
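A gradual rollout can be expressed as a series of UpdateEndpointWeightsAndCapacities calls, each moving more weight from the current variant to the new one. The sketch assumes a boto3 `sagemaker` client. The step percentages, the helper names, and the pause to check metrics between steps are illustrative assumptions, not a prescribed procedure.

```python
from typing import Any, List


def shift_traffic_steps(old_variant: str, new_variant: str,
                        new_percentages: List[int]) -> List[list]:
    """Build one DesiredWeightsAndCapacities payload per rollout step (hypothetical helper).

    Weights are relative, so percentages can be used directly as weights.
    """
    steps = []
    for pct in new_percentages:
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
        steps.append([
            {"VariantName": old_variant, "DesiredWeight": float(100 - pct)},
            {"VariantName": new_variant, "DesiredWeight": float(pct)},
        ])
    return steps


def apply_step(sm_client: Any, endpoint_name: str, step: list) -> Any:
    """Apply one step; sm_client is assumed to be a boto3 "sagemaker" client."""
    return sm_client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=step,
    )


# Usage (needs AWS credentials; names are hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# for step in shift_traffic_steps("variant-a", "variant-b", [10, 50, 100]):
#     apply_step(sm, "my-endpoint", step)
#     sm.get_waiter("endpoint_in_service").wait(EndpointName="my-endpoint")
#     # check the new variant's metrics here before moving on (assumed step)
```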
DesiredInstanceCount (used with the UpdateEndpointWeightsAndCapacities operation): A new number of instances for a model variant. It changes the variant's capacity while the endpoint keeps serving, so capacity can follow changes in demand.
- Use Case: Scaling the model variant in or out based on demand fluctuations to maintain performance while controlling costs.
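The same operation changes capacity when DesiredInstanceCount is set. The sketch below assumes a boto3 `sagemaker` client. As a guardrail, it clamps the requested count to hypothetical bounds; those bounds and the helper name are assumptions, not SageMaker limits. This is a manual adjustment. Scaling automatically on metrics is usually configured through Application Auto Scaling instead.

```python
from typing import Any


def scale_variant(sm_client: Any, endpoint_name: str, variant_name: str,
                  desired_count: int, min_count: int = 1,
                  max_count: int = 10) -> int:
    """Set DesiredInstanceCount for one variant, clamped to [min_count, max_count].

    sm_client is assumed to be a boto3 "sagemaker" client; the bounds are
    hypothetical guardrails. Returns the count that was actually requested.
    """
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    count = max(min_count, min(max_count, desired_count))
    sm_client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            # DesiredWeight is omitted, so only capacity changes.
            {"VariantName": variant_name, "DesiredInstanceCount": count},
        ],
    )
    return count


# Usage (needs AWS credentials; names are hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# scale_variant(sm, "my-endpoint", "variant-a", 4)
```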