SageMaker Inference Recommender Default: The Default mode of SageMaker Inference Recommender is designed to simplify the process of deploying machine learning models. It automates load testing and model tuning across SageMaker ML instances, and it recommends the instance type and configuration for a real-time or serverless inference endpoint that delivers the best performance at the lowest cost for your model and workload.
For example, if you have a credit card fraud detection model, the Default mode can help you find the optimal inference instance type and ML system configurations that can detect fraudulent credit card transactions in milliseconds.
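As a rough illustration of how little a Default job needs, the sketch below builds the request for the SageMaker `CreateInferenceRecommendationsJob` API with `JobType` set to `"Default"`. The job name, role ARN, and model package ARN are hypothetical placeholders, and the helper names `build_default_job_request` and `submit_job` are introduced here only for illustration. It assumes the model is already registered as a model package version.

```python
def build_default_job_request(job_name, role_arn, model_package_version_arn):
    """Build keyword arguments for a Default Inference Recommender job.

    Only the model reference is supplied; instance selection and load
    testing are left to the service.
    """
    return {
        "JobName": job_name,
        "JobType": "Default",
        "RoleArn": role_arn,
        "InputConfig": {"ModelPackageVersionArn": model_package_version_arn},
    }


def submit_job(request):
    """Send the request to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the payload can be built without it

    client = boto3.client("sagemaker")
    return client.create_inference_recommendations_job(**request)


# Hypothetical placeholder values, not real resources.
default_request = build_default_job_request(
    job_name="fraud-detection-default",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    model_package_version_arn=(
        "arn:aws:sagemaker:us-east-1:111122223333:model-package/fraud-detection/1"
    ),
)
```

Calling `submit_job(default_request)` would start the job in an account where those resources exist.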
SageMaker Inference Recommender Advanced: The Advanced mode of SageMaker Inference Recommender offers more flexibility and control. It allows you to run a set of load tests to recommend the right ML instance types for any ML use case. This mode is particularly useful when you have specific service level agreement (SLA) requirements with respect to latency, throughput, and cost metrics.
For instance, in the case of the credit card fraud detection model, the Advanced mode allows you to set up Inference Recommender jobs with a custom load pattern, so you can check that a configuration meets your inference SLA: sustaining a peak load of 30,000 transactions per minute while returning prediction results in less than 100 milliseconds.
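The SLA figures in this example could be expressed in an Advanced job request roughly as follows. This is a sketch under stated assumptions, not a recommended configuration: the candidate instance types, the traffic phase, the job duration, the resource limits, and the choice of the P95 percentile are all assumptions added for illustration (the text above gives only the 30,000 per minute and 100 millisecond targets), and the ARNs are hypothetical placeholders. The helper name `build_advanced_job_request` is also introduced here only for illustration.

```python
def build_advanced_job_request(job_name, role_arn, model_package_version_arn,
                               instance_types, max_invocations_per_minute,
                               latency_ms, percentile="P95"):
    """Build keyword arguments for an Advanced Inference Recommender job."""
    return {
        "JobName": job_name,
        "JobType": "Advanced",
        "RoleArn": role_arn,
        "InputConfig": {
            "ModelPackageVersionArn": model_package_version_arn,
            "JobDurationInSeconds": 7200,  # assumed upper bound for the whole job
            # Instance types to load test (caller's choice).
            "EndpointConfigurations": [{"InstanceType": t} for t in instance_types],
            # Custom load: one phase that ramps up simulated users (assumed values).
            "TrafficPattern": {
                "TrafficType": "PHASES",
                "Phases": [
                    {"InitialNumberOfUsers": 1, "SpawnRate": 1, "DurationInSeconds": 120},
                ],
            },
            "ResourceLimit": {"MaxNumberOfTests": 10, "MaxParallelOfTests": 3},
        },
        # SLA targets: a test stops once either threshold is reached.
        "StoppingConditions": {
            "MaxInvocations": max_invocations_per_minute,  # requests per minute
            "ModelLatencyThresholds": [
                {"Percentile": percentile, "ValueInMilliseconds": latency_ms}
            ],
        },
    }


# Hypothetical placeholder values, not real resources.
advanced_request = build_advanced_job_request(
    job_name="fraud-detection-advanced",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    model_package_version_arn=(
        "arn:aws:sagemaker:us-east-1:111122223333:model-package/fraud-detection/1"
    ),
    instance_types=["ml.c5.xlarge", "ml.m5.xlarge"],  # assumed candidates
    max_invocations_per_minute=30_000,
    latency_ms=100,
)

# 30,000 transactions per minute is 500 transactions per second.
peak_per_second = 30_000 / 60
```

The resulting dictionary can be passed to the boto3 SageMaker client as `create_inference_recommendations_job(**advanced_request)`.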
In summary, while both modes aim to optimize cost and performance, the Default mode is more automated and the Advanced mode offers more flexibility and control.