TD3
class TD3(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)
Bases: parl.core.paddle.algorithm.Algorithm
__init__(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)
Initializes the TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm.
Parameters:
- model (parl.Model) – model containing the forward networks of the actor and critic.
- gamma (float) – discount factor used in the reward computation.
- tau (float) – soft-update coefficient: each target update moves the weights of self.target_model a fraction tau toward those of self.model.
- actor_lr (float) – learning rate of the actor model.
- critic_lr (float) – learning rate of the critic model.
- policy_noise (float) – standard deviation of the Gaussian noise added to the target policy's action during the critic update.
- noise_clip (float) – bound used to clip the target policy noise to [-noise_clip, noise_clip].
- policy_freq (int) – frequency of delayed policy updates: the actor is updated once every policy_freq critic updates.
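The hyperparameters above control the main mechanisms of the standard TD3 update: target policy smoothing (`policy_noise`, `noise_clip`), the clipped double-Q critic target (`gamma`), soft updates of the target networks (`tau`), and delayed actor updates (`policy_freq`). The sketch below illustrates each one on plain Python numbers. It is a minimal illustration under stated assumptions, not PARL's implementation: all helper names are hypothetical, networks are replaced by scalars and lists, and actions are assumed to be normalized to [-1, 1].

```python
import random
from typing import List, Optional, Sequence

# Plain-Python sketch of the quantities TD3's hyperparameters control.
# All function names here are hypothetical illustrations, not PARL's API.


def clip(x: float, low: float, high: float) -> float:
    """Clamp x into [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(
    target_action: Sequence[float],
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,   # assumed normalized action range
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Target policy smoothing: add clipped Gaussian noise to the target actor's action."""
    if rng is None:
        rng = random.Random()
    smoothed = []
    for a in target_action:
        # Gaussian noise with std policy_noise, clipped to [-noise_clip, noise_clip]
        noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
        # Keep the perturbed action inside the assumed valid action range
        smoothed.append(clip(a + noise, action_low, action_high))
    return smoothed


def td3_target(reward: float, done: float, q1_next: float, q2_next: float,
               gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2') (clipped double-Q)."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def soft_update(target_params: Sequence[float], params: Sequence[float],
                tau: float = 0.005) -> List[float]:
    """Soft update: target <- tau * online + (1 - tau) * target, elementwise."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, params)]


def is_policy_step(critic_step: int, policy_freq: int = 2) -> bool:
    """Delayed updates: the actor is updated on every policy_freq-th critic step (1-based count)."""
    return critic_step % policy_freq == 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(smoothed_target_action([0.0, 0.9], rng=rng))
    print(td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0, gamma=0.9))
    print(soft_update([0.0, 10.0], [1.0, 0.0], tau=0.5))  # [0.5, 5.0]
    print([s for s in range(1, 7) if is_policy_step(s, policy_freq=2)])  # [2, 4, 6]
```

With `policy_freq=2`, the actor (and, in standard TD3, the target networks) is updated on critic steps 2, 4, 6, and so on, while the critic is trained every step. A small `tau` makes the target networks track the online networks slowly, which stabilizes the critic target.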