Deploy on Any Cloud (AWS, GCP, Azure, Lambda Labs, etc.)

You can deploy your models on any cloud. Start with a simple model on a single machine, then scale it out to a cluster of machines.

Create a VM and start the mdz server

After you have created a VM, install the mdz CLI from PyPI and start the mdz server. Pass the VM's public IP so that the mdz server is accessible from the outside world.

$ pip install mdz
$ mdz server start <public ip>
...
🎉 You could set the environment variable to get started!

export MDZ_URL=http://1.2.3.4.modelz.live
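
The printed URL appears to follow the pattern http://<public ip>.modelz.live. As a minimal sketch, assuming that pattern holds for your server, the value can be derived from the IP in a script. The helper name mdz_url_for_ip is hypothetical and not part of mdz, and 1.2.3.4 is the placeholder address from the output above. When in doubt, copy the export line that mdz server start prints instead.

```shell
# Hypothetical helper (not part of mdz): build the MDZ_URL value from a VM's
# public IPv4 address, assuming the http://<public ip>.modelz.live pattern
# shown in the server output above.
mdz_url_for_ip() {
  printf 'http://%s.modelz.live\n' "$1"
}

# Export the URL for the placeholder address 1.2.3.4 and show it.
export MDZ_URL="$(mdz_url_for_ip 1.2.3.4)"
echo "$MDZ_URL"
```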

Deploy your model

Deploy your model with the mdz deploy command. This example deploys a Stable Diffusion web UI.

$ mdz --debug deploy --image modelzai/gradio-stable-diffusion:23.03 --gpu 1 --port 7860 --name sdw
$ mdz list
 NAME  ENDPOINT                                                 STATUS  INVOCATIONS  REPLICAS 
 sdw   http://sdw-qh2n0y28ybqc36oc.146.235.213.84.modelz.live   Ready           174  1/1      
       http://146.235.213.84.modelz.live/inference/sdw.default                                

Once the deployment's status is Ready, open the endpoint URL to reach the web UI. In this example it is http://sdw-qh2n0y28ybqc36oc.146.235.213.84.modelz.live.
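
To pick the endpoint out of the listing in a script, a small filter over the mdz list output may be enough. The sketch below assumes the plain whitespace-separated layout shown above, with the deployment name in the first column and the endpoint in the second. The helper name endpoint_for is hypothetical and not part of mdz, and the sample text is copied from the listing so the snippet runs without a server.

```shell
# Hypothetical helper (not part of mdz): read `mdz list` output on stdin and
# print the ENDPOINT column of the row whose NAME column equals $1.
endpoint_for() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Sample output copied from the listing above, so the sketch runs without mdz.
sample_list=$(cat <<'EOF'
 NAME  ENDPOINT                                                 STATUS  INVOCATIONS  REPLICAS
 sdw   http://sdw-qh2n0y28ybqc36oc.146.235.213.84.modelz.live   Ready           174  1/1
       http://146.235.213.84.modelz.live/inference/sdw.default
EOF
)

# Prints http://sdw-qh2n0y28ybqc36oc.146.235.213.84.modelz.live
printf '%s\n' "$sample_list" | endpoint_for sdw
```

Against a live server, the same filter would be used as mdz list | endpoint_for sdw, and the result can be passed to a tool such as curl.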

Scale your deployment

To handle more traffic, add more servers. First, create a new VM on your cloud provider. Then start the mdz server on the new VM and join it to the existing cluster.

$ mdz server join <ip>
$ mdz server list
 NAME   PHASE  ALLOCATABLE      CAPACITY        
 node1  Ready  cpu: 16          cpu: 16         
               mem: 32784748Ki  mem: 32784748Ki 
 node2  Ready  cpu: 16          cpu: 16         
               mem: 32784748Ki  mem: 32784748Ki 

The <ip> argument is the internal IP address of the first VM, the one already running the mdz server. Once the new VM has joined the cluster, scale your deployment with the mdz scale command.

$ mdz scale sdw --replicas 2

Requests are load balanced across the replicas of your deployment. You now have a cluster of two machines serving your model!
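
When scripting the scale-out, it can help to confirm how many nodes report Ready before choosing a replica count. The sketch below counts Ready rows in mdz server list output, assuming the plain column layout shown earlier with PHASE as the second column. The helper name ready_nodes is hypothetical and not part of mdz, and the sample text is copied from the listing above.

```shell
# Hypothetical helper (not part of mdz): count rows whose PHASE column is
# "Ready" in `mdz server list` output read from stdin.
ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample output copied from the listing above, so the sketch runs without mdz.
sample_servers=$(cat <<'EOF'
 NAME   PHASE  ALLOCATABLE      CAPACITY
 node1  Ready  cpu: 16          cpu: 16
               mem: 32784748Ki  mem: 32784748Ki
 node2  Ready  cpu: 16          cpu: 16
               mem: 32784748Ki  mem: 32784748Ki
EOF
)

# Prints 2
printf '%s\n' "$sample_servers" | ready_nodes
```

Against a live cluster, mdz server list | ready_nodes gives the number to compare with the --replicas value.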