Autoscale the Deployment

You can enable autoscaling for your deployment by setting the --min-replicas and --max-replicas flags when you deploy your model.

$ mdz deploy --image modelzai/llm-blomdz-560m:23.06.13 --min-replicas 1 --max-replicas 3

How it works

The autoscaler scales your deployment based on the number of concurrent requests it receives. It adds replicas when the number of concurrent requests per replica rises above the target load, and removes replicas when it falls below the target load, always staying between the minimum and maximum number of replicas.

The requests will be load balanced between the replicas of your deployment.
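
As a rough illustration of this rule, the sketch below computes a replica count from the current number of concurrent requests. It is an assumption about how such an autoscaler could behave, not mdz's actual implementation: the function name desired_replicas and the ceiling-division formula are hypothetical, and the target load of 10 is taken from the defaults listed under "Configure the autoscaler".

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT mdz's real algorithm.
# Usage: desired_replicas CONCURRENT_REQUESTS TARGET_LOAD MIN_REPLICAS MAX_REPLICAS
desired_replicas() {
  local concurrent=$1 target=$2 min=$3 max=$4
  # Ceiling division: enough replicas so that each one serves
  # at most `target` concurrent requests.
  local want=$(( (concurrent + target - 1) / target ))
  # Clamp the result to the configured [min, max] range.
  if (( want < min )); then want=$min; fi
  if (( want > max )); then want=$max; fi
  echo "$want"
}

desired_replicas 25 10 1 3   # 25 concurrent requests, target load 10: prints 3
desired_replicas 4 10 1 3    # load below the target: prints 1 (the minimum)
```

With --min-replicas 0, the same calculation yields 0 replicas when there are no concurrent requests.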

Scale from zero

You can set the --min-replicas flag to 0 so that your deployment scales down to zero replicas when idle and scales up from zero on demand.

$ mdz deploy --image modelzai/llm-blomdz-560m:23.06.13 --min-replicas 0 --max-replicas 1

The deployment is scaled up when the first request arrives. It is scaled back down to 0 after it has received no requests for the idle duration (10 minutes by default).

💡 The first request may take a while to be processed because the deployment must be scaled up first. How long this takes depends on the size of your model and the image you are using.
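
Because of this cold-start delay, a client may want to retry the first request instead of failing immediately. The helper below is a minimal sketch of that idea; the function name wait_for_ready and the ENDPOINT variable are hypothetical and not part of mdz. The curl options shown (--silent, --fail, --max-time) are standard curl flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for_ready ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_ready() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # Run the probe command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (ENDPOINT is a placeholder for your deployment's URL):
# 60 attempts, 10 seconds apart, covers roughly the 10-minute startup window.
# wait_for_ready 60 10 curl --silent --fail --max-time 30 "$ENDPOINT"
```

The attempt count and delay are illustrative; choose values that match how long your image and model take to start.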

Configure the autoscaler

The autoscaler scales the deployment horizontally between a minimum and a maximum number of replicas. When the minimum is 0, it can also scale the deployment down to zero.

The autoscaler scales based on the number of concurrent requests. It supports the following configuration options:

  • min-replicas: Minimum number of replicas to scale to. Defaults to 1.
  • max-replicas: Maximum number of replicas to scale to. Defaults to 1.
  • target-load: Ideal number of concurrent requests per replica. Defaults to 10.
  • scale-to-zero-duration: Idle duration before scaling to zero. Defaults to 10 minutes.
  • startup-duration: Duration to wait for the deployment to start up. Defaults to 10 minutes.

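Currently, target-load, scale-to-zero-duration, and startup-duration are not configurable in mdz, for the sake of simplicity. Only min-replicas and max-replicas can be set, using the --min-replicas and --max-replicas flags.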