Create Customized Algorithms

Goal of this tutorial:

  • Learn how to implement your own algorithms.

Overview

To build a new algorithm, you need to inherit class parl.Algorithm and implement three basic functions: sample, predict and learn.

Methods

  • __init__

    As algorithms update weights of the models, this method needs to define some models inherited from parl.Model, like self.model in this example. You can also set some hyperparameters in this method, like learning_rate, reward_decay and action_dimension, which might be used in the following steps.

  • predict

    This function defines how to choose actions. For instance, you can use a policy model to predict actions.

  • sample

    Based on predict method, sample generates actions with noises. Use this method to do exploration if needed.

  • learn

    Define loss function in learn method, which will be used to update weights of self.model.

Example: DQN

This example shows how to implement DQN algorithm based on class parl.Algorithm according to the steps mentioned above.

Within class DQN(Algorithm), we define the following methods:

  • __init__(self, model, act_dim=None, gamma=None, lr=None)

    We define self.model and self.target_model of DQN in this method, which are instances of class parl.Model. And we also set hyperparameters act_dim, gamma and lr here. We will use these parameters in learn method.

    def __init__(self,
                 model,
                 act_dim=None,
                 gamma=None,
                 lr=None):
        """ DQN algorithm
    
        Args:
            model (parl.Model): model defining forward network of Q function
            hyperparas (dict): (deprecated) dict of hyper parameters.
            act_dim (int): dimension of the action space
            gamma (float): discounted factor for reward computation.
            lr (float): learning rate.
        """
        self.model = model
        self.target_model = copy.deepcopy(model)
    
        assert isinstance(act_dim, int)
        assert isinstance(gamma, float)
        assert isinstance(lr, float)
        self.act_dim = act_dim
        self.gamma = gamma
        self.lr = lr
    
  • predict(self, obs)

    We use the forward network defined in self.model here, which uses observations to predict action values directly.

    def predict(self, obs):
            """ use value model self.model to predict the action value
            """
            return self.model.value(obs)
    
  • learn(self, obs, action, reward, next_obs, terminal)

    learn method calculates the cost of value function according to the predict value and the target value. Agent will use the cost to update weights in self.model.

    def learn(self, obs, action, reward, next_obs, terminal):
        """ update value model self.model with DQN algorithm
        """
    
        pred_value = self.model.value(obs)
        next_pred_value = self.target_model.value(next_obs)
        best_v = layers.reduce_max(next_pred_value, dim=1)
        best_v.stop_gradient = True
        target = reward + (
            1.0 - layers.cast(terminal, dtype='float32')) * self.gamma * best_v
    
        action_onehot = layers.one_hot(action, self.act_dim)
        action_onehot = layers.cast(action_onehot, dtype='float32')
        pred_action_value = layers.reduce_sum(
            layers.elementwise_mul(action_onehot, pred_value), dim=1)
        cost = layers.square_error_cost(pred_action_value, target)
        cost = layers.reduce_mean(cost)
        optimizer = fluid.optimizer.Adam(self.lr, epsilon=1e-3)
        optimizer.minimize(cost)
        return cost
    
  • sync_target(self)

    Use this method to synchronize the weights in self.target_model with those in self.model. This is the step used in DQN algorithm.

    def sync_target(self, gpu_id=None):
        """ sync weights of self.model to self.target_model
        """
        self.model.sync_weights_to(self.target_model)