DDPG

class DDPG(model, gamma=None, tau=None, actor_lr=None, critic_lr=None)[source]

Bases: parl.core.paddle.algorithm.Algorithm

__init__(model, gamma=None, tau=None, actor_lr=None, critic_lr=None)[source]

DDPG (Deep Deterministic Policy Gradient) algorithm

Parameters:
  • model (parl.Model) – forward networks of the actor and the critic.
  • gamma (float) – discount factor used when accumulating rewards.
  • tau (float) – soft-update coefficient used when updating the weights of self.target_model from self.model.
  • actor_lr (float) – learning rate of the actor model.
  • critic_lr (float) – learning rate of the critic model.
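
A minimal construction sketch, not a definitive recipe: the ACModel class, its layer sizes, and the hyperparameter values below are illustrative assumptions, and the model is assumed to expose the policy(), value(), get_actor_params() and get_critic_params() methods that DDPG calls on the model it wraps.

    import paddle
    import paddle.nn as nn
    import parl
    from parl.algorithms import DDPG

    class ACModel(parl.Model):
        """Hypothetical actor-critic model bundling both networks."""

        def __init__(self, obs_dim, act_dim):
            super().__init__()
            self.actor = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.ReLU(),
                nn.Linear(64, act_dim), nn.Tanh())
            self.critic = nn.Sequential(
                nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                nn.Linear(64, 1))

        def policy(self, obs):
            # Deterministic action in [-1, 1].
            return self.actor(obs)

        def value(self, obs, act):
            # Q(s, a) from the concatenated observation and action.
            return self.critic(paddle.concat([obs, act], axis=1))

        def get_actor_params(self):
            return self.actor.parameters()

        def get_critic_params(self):
            return self.critic.parameters()

    model = ACModel(obs_dim=3, act_dim=1)
    algorithm = DDPG(model, gamma=0.99, tau=0.005,
                     actor_lr=1e-4, critic_lr=1e-3)
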
learn(obs, action, reward, next_obs, terminal)[source]

Define the loss function and create an optimizer to minimize the loss, given a batch of transitions (obs, action, reward, next_obs, terminal).
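
A hedged usage sketch: learn expects a batch of transitions as paddle tensors, which in practice are sampled from a replay buffer. The random batch and the tensor shapes below are illustrative assumptions, not part of the API.

    import numpy as np
    import paddle

    batch_size, obs_dim, act_dim = 32, 3, 1
    # Illustrative random batch; in practice this comes from a replay buffer.
    obs      = paddle.to_tensor(np.random.randn(batch_size, obs_dim), dtype='float32')
    action   = paddle.to_tensor(np.random.uniform(-1, 1, (batch_size, act_dim)), dtype='float32')
    reward   = paddle.to_tensor(np.random.randn(batch_size, 1), dtype='float32')
    next_obs = paddle.to_tensor(np.random.randn(batch_size, obs_dim), dtype='float32')
    terminal = paddle.to_tensor(np.zeros((batch_size, 1)), dtype='float32')

    # One gradient step for the critic and the actor.
    algorithm.learn(obs, action, reward, next_obs, terminal)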

predict(obs)[source]

Define the prediction process, e.g., use the policy model to predict actions.
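
A hedged sketch of action selection. The exploration noise is added by the caller rather than by predict, and its 0.1 scale is an assumption, not part of the API.

    import numpy as np
    import paddle

    obs = paddle.to_tensor(np.random.randn(1, 3), dtype='float32')  # one observation
    action = algorithm.predict(obs)  # deterministic action from the actor

    # DDPG typically explores by perturbing the deterministic action,
    # e.g. with Gaussian noise, then clipping back into the action range.
    noisy_action = paddle.clip(action + 0.1 * paddle.randn(action.shape), -1.0, 1.0)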

sync_target(decay=None)[source]

Update the target network with the weights of the training network.

Parameters:
  • decay (float) – the decay factor used when updating the target network with the training network. 0 means a hard copy (direct assignment of the training weights); None means a slow, soft update controlled by the hyperparameter tau.
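
A short sketch of both update modes. The soft-update rule in the comment (target <- tau * training + (1 - tau) * target) is the conventional reading of tau; the exact formula is defined by the implementation.

    # Hard update: directly copy the training weights into the target network.
    algorithm.sync_target(decay=0)

    # Soft update: decay=None blends slowly using the tau passed to __init__,
    # conventionally target <- tau * training + (1 - tau) * target.
    algorithm.sync_target()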