GPU Cluster

Author: wuzewu@baidu.com

This tutorial demonstrates how to set up a GPU cluster.

First, we run the following command to launch a GPU cluster on port 8002.

xparl start --port 8002 --gpu_cluster
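
Once the master has started, you can optionally verify that the cluster is up; the xparl CLI provides a status subcommand that reports the cluster's current resources:

xparl status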

Then we add the GPU resources of a computation server to the cluster. Specify which GPUs to add with the --gpu argument.

For example, the following command adds the first four GPUs (cards 0-3) to the cluster.

xparl connect --address ${CLUSTER_IP}:${CLUSTER_PORT} --gpu 0,1,2,3
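
For instance, if the master started in the previous step runs on the same machine, the address is localhost:8002 and the command becomes:

xparl connect --address localhost:8002 --gpu 0,1,2,3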

Once the xparl-based GPU cluster has been established, we can use the parl.remote_class decorator to run computations in parallel. The n_gpu argument specifies how many GPUs are allocated to each actor.

Here is an entry-level example to test the GPU cluster we have just set up.

import os
import parl

# Connect to the cluster; replace the IP address and port with the actual values.
parl.connect("localhost:8002")

# Decorating a local class sends it to a remote instance in the cluster.
# n_gpu=2 means that each Actor instance will be allocated two GPU cards.
@parl.remote_class(n_gpu=2)
class Actor:
    def get_device(self):
        # CUDA_VISIBLE_DEVICES lists the GPU cards allocated to this actor.
        return os.environ['CUDA_VISIBLE_DEVICES']

    def step(self, a, b):
        return a + b

actor = Actor()
# Executes remotely and prints the value of the CUDA_VISIBLE_DEVICES
# environment variable on the worker, e.g. "0,1".
print(actor.get_device())
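
Other methods of the decorated class are invoked the same way: the call runs on the remote worker and the result is returned to the caller. For example, with the step method defined above:

# Runs a + b on the remote worker and returns the result, i.e. 3.
print(actor.step(1, 2))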