DDPG
class DDPG(model, gamma=None, tau=None, actor_lr=None, critic_lr=None)
    Bases: parl.core.paddle.algorithm.Algorithm
__init__(model, gamma=None, tau=None, actor_lr=None, critic_lr=None)
    DDPG algorithm.

    Parameters:
    - model (parl.Model) – forward networks of the actor and the critic.
    - gamma (float) – discount factor used for reward computation.
    - tau (float) – decay coefficient used when updating the weights of self.target_model with self.model.
    - actor_lr (float) – learning rate of the actor model.
    - critic_lr (float) – learning rate of the critic model.
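Conceptually, the constructor stores these hyperparameters and keeps a target copy of the model for the slow updates described under sync_target. A minimal pure-Python sketch of that bookkeeping (illustrative only, not PARL's actual implementation; a dict stands in for a parl.Model's parameters):

```python
import copy

class DDPGSketch:
    """Illustrative stand-in for the hyperparameter bookkeeping in DDPG.__init__."""

    def __init__(self, model, gamma=None, tau=None, actor_lr=None, critic_lr=None):
        # The hyperparameters must be provided as floats.
        assert isinstance(gamma, float)
        assert isinstance(tau, float)
        assert isinstance(actor_lr, float)
        assert isinstance(critic_lr, float)
        self.gamma = gamma
        self.tau = tau
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.model = model
        # The target network starts as an exact copy of the training network.
        self.target_model = copy.deepcopy(model)

# A plain dict stands in for a parl.Model's parameters in this sketch.
algo = DDPGSketch({"w": 0.5}, gamma=0.99, tau=0.005, actor_lr=1e-4, critic_lr=1e-3)
```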
learn(obs, action, reward, next_obs, terminal)
    Define the loss function and create an optimizer to minimize the loss.
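In standard DDPG, the critic loss regresses Q(s, a) toward the one-step temporal-difference target r + gamma * (1 - terminal) * Q_target(s', pi_target(s')), and the actor is updated to maximize the critic's value. A hedged sketch of the TD-target computation (the formula is standard DDPG; the helper names below are illustrative, not PARL's API):

```python
def td_target(reward, next_q, terminal, gamma=0.99):
    """One-step TD target for the critic:
    target = r + gamma * (1 - terminal) * Q_target(s', pi_target(s'))."""
    return [r + gamma * (1.0 - t) * q
            for r, q, t in zip(reward, next_q, terminal)]

def critic_mse_loss(pred_q, target_q):
    """Mean squared error between predicted and target Q-values."""
    return sum((p - t) ** 2 for p, t in zip(pred_q, target_q)) / len(pred_q)

# Terminal transitions (terminal == 1.0) bootstrap nothing from the next state.
targets = td_target(reward=[1.0, 0.0], next_q=[2.0, 5.0], terminal=[0.0, 1.0])
```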
sync_target(decay=None)
    Update the target network with the training network.

    Parameters:
    - decay (float) – decay factor used when updating the target network with the training network. A value of 0 performs a hard copy (direct assignment); None performs a slow update of the target network controlled by the hyperparameter tau.
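The slow update blends each target weight toward its training counterpart. A minimal sketch of that rule under the semantics above (decay=0 is a hard copy; decay=None falls back to 1 - tau; plain floats stand in for network parameters, so this is an illustration, not PARL's implementation):

```python
def sync_target(target_params, train_params, decay=None, tau=0.005):
    """Soft update: target = decay * target + (1 - decay) * train.
    decay=0 copies the training weights outright; decay=None uses 1 - tau."""
    if decay is None:
        decay = 1.0 - tau
    return [decay * tp + (1.0 - decay) * p
            for tp, p in zip(target_params, train_params)]

# Hard copy: decay=0 assigns the training weights to the target.
hard = sync_target([0.0], [1.0], decay=0.0)

# Slow update: the target moves only a fraction tau toward the training weights.
soft = sync_target([0.0], [1.0], tau=0.005)
```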