Create Customized Algorithms

Goal of this tutorial:

  • Learn how to implement your own algorithms.

Overview

To build a new algorithm, inherit from class parl.Algorithm and implement three basic methods: __init__, predict, and learn.

Methods

  • __init__

    Since an algorithm updates the weights of its models, this method should define one or more models that inherit from parl.Model, such as self.model in this example. You can also set hyperparameters here, such as learning_rate, reward_decay, and action_dimension, which may be used in the following steps.

  • predict

    This function defines how to choose actions. For instance, you can use a policy model to predict actions.

  • learn

    Define the loss function in the learn method; it will be used to update the weights of self.model. A minimal skeleton combining these three methods is sketched after this list.
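
The skeleton below sketches how these three methods fit together in a custom subclass. It is only an illustrative sketch: the model, the learn arguments, and the REINFORCE-style loss are placeholder assumptions, not part of PARL's API.

    import parl
    import paddle

    class MyAlgorithm(parl.Algorithm):
        def __init__(self, model, lr=None):
            # keep the model (a parl.Model) and hyperparameters for later use
            self.model = model
            self.lr = lr
            self.optimizer = paddle.optimizer.Adam(
                learning_rate=lr, parameters=self.model.parameters())

        def predict(self, obs):
            # choose actions by forwarding observations through the model
            return self.model(obs)

        def learn(self, obs, action, reward):
            # define a loss that depends on self.model, then minimize it
            logits = self.model(obs)
            neg_log_prob = paddle.nn.functional.cross_entropy(
                logits, action, reduction='none')
            loss = paddle.mean(neg_log_prob * reward)
            self.optimizer.clear_grad()
            loss.backward()
            self.optimizer.step()
            return loss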

Example: DQN

This example shows how to implement the DQN algorithm by subclassing parl.Algorithm, following the steps above.

Within class DQN(Algorithm), we define the following methods:

  • __init__(self, model, gamma=None, lr=None)

    In this method we define self.model and self.target_model of DQN, both instances of parl.Model, and set the hyperparameters gamma and lr, which are used later in the learn method. (The code snippets below assume that copy, paddle, and parl are imported at module level.)

    def __init__(self, model, gamma=None, lr=None):
        """ DQN algorithm
    
        Args:
            model (parl.Model): forward neural network representing the Q function.
            gamma (float): discount factor for accumulated reward computation.
            lr (float): learning rate.
        """
        self.model = model
        self.target_model = copy.deepcopy(model)
    
        assert isinstance(gamma, float)
        assert isinstance(lr, float)
    
        self.gamma = gamma
        self.lr = lr
    
        self.mse_loss = paddle.nn.MSELoss(reduction='mean')
        self.optimizer = paddle.optimizer.Adam(
            learning_rate=lr, parameters=self.model.parameters())
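
    As a usage sketch, the algorithm could be instantiated with a small Q-network. The CartpoleModel class, its layer sizes, and the hyperparameter values below are illustrative assumptions, not part of the tutorial's code.

    import copy
    import paddle
    import parl

    class CartpoleModel(parl.Model):
        """ hypothetical Q-network mapping observations to per-action Q-values """
        def __init__(self, obs_dim, act_dim):
            super().__init__()
            self.fc1 = paddle.nn.Linear(obs_dim, 128)
            self.fc2 = paddle.nn.Linear(128, act_dim)

        def value(self, obs):
            h = paddle.nn.functional.relu(self.fc1(obs))
            return self.fc2(h)

    model = CartpoleModel(obs_dim=4, act_dim=2)
    alg = DQN(model, gamma=0.99, lr=0.001)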
    
  • predict(self, obs)

    Here we use the Q-network defined in self.model, which takes observations and predicts the action values directly.

    def predict(self, obs):
        """ use self.model (Q function) to predict the action values
        """
        return self.model.value(obs)
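
    Continuing the instantiation sketch above, a hypothetical call looks like this; the greedy action is recovered from the predicted Q-values with an argmax (the batch shape is an assumption):

    # obs: an assumed batch of observations, shape [batch_size, obs_dim]
    obs = paddle.randn([32, 4], dtype='float32')
    q_values = alg.predict(obs)                      # shape [32, act_dim]
    greedy_action = paddle.argmax(q_values, axis=-1)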
    
  • learn(self, obs, action, reward, next_obs, terminal)

    The learn method computes the loss of the value function from the predicted Q-value and the target Q-value; the agent uses this loss to update the weights of self.model.

    def learn(self, obs, action, reward, next_obs, terminal):
        """ update the Q function (self.model) with DQN algorithm
        """
        # Q: predicted Q-value of the action actually taken
        pred_values = self.model.value(obs)
        action_dim = pred_values.shape[-1]
        action = paddle.squeeze(action, axis=-1)
        action_onehot = paddle.nn.functional.one_hot(
            action, num_classes=action_dim)
        pred_value = paddle.multiply(pred_values, action_onehot)
        pred_value = paddle.sum(pred_value, axis=1, keepdim=True)
    
        # target Q: bootstrapped target from the target network (no gradient)
        with paddle.no_grad():
            max_v = self.target_model.value(next_obs).max(1, keepdim=True)
            target = reward + (1 - terminal) * self.gamma * max_v
    
        loss = self.mse_loss(pred_value, target)
    
        # optimize
        self.optimizer.clear_grad()
        loss.backward()
        self.optimizer.step()
    
        return loss
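
    As a usage sketch, learn can be called with a batch of transitions, reusing alg from the instantiation sketch above. The shapes and random tensors below are assumptions; in practice such a batch is sampled from a replay buffer.

    batch_size, obs_dim, act_dim = 32, 4, 2
    obs = paddle.randn([batch_size, obs_dim], dtype='float32')
    action = paddle.randint(0, act_dim, shape=[batch_size, 1])
    reward = paddle.randn([batch_size, 1], dtype='float32')
    next_obs = paddle.randn([batch_size, obs_dim], dtype='float32')
    terminal = paddle.zeros([batch_size, 1], dtype='float32')

    loss = alg.learn(obs, action, reward, next_obs, terminal)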
    
  • sync_target(self)

    Use this method to copy the weights of self.model into self.target_model, the target-network update step of the DQN algorithm.

    def sync_target(self):
        """ copy the parameters of self.model to self.target_model
        """
        self.model.sync_weights_to(self.target_model)
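
    In a training loop, sync_target is typically invoked every fixed number of learn steps. The interval and loop below are illustrative assumptions, again reusing alg from the sketch above:

    # assumed interval between target-network updates; tune per task
    UPDATE_TARGET_EVERY = 200

    for step in range(10000):
        # ... sample a batch and call alg.learn(...) here ...
        if step % UPDATE_TARGET_EVERY == 0:
            alg.sync_target()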