OAC
- class OAC(model, gamma=None, tau=None, alpha=None, beta=None, delta=None, actor_lr=None, critic_lr=None)
Bases:
Algorithm
- __init__(model, gamma=None, tau=None, alpha=None, beta=None, delta=None, actor_lr=None, critic_lr=None)
Optimistic Actor-Critic (OAC) algorithm.
- Parameters:
model (parl.Model) – forward network of actor and critic.
gamma (float) – discount factor for reward computation
tau (float) – decay coefficient when updating the weights of self.target_model with self.model
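alpha (float) – temperature parameter that determines the relative importance of the entropy term against the reward
beta (float) – determines the relative importance of sigma_Q (the uncertainty estimate of the Q-value) in the optimistic Q estimate used for exploration
delta (float) – determines how far the mean of the exploration policy may shift relative to the mean of the target policy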
actor_lr (float) – learning rate of the actor model
critic_lr (float) – learning rate of the critic model
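
The roles of tau and beta can be illustrated with a minimal plain-Python sketch. It is not PARL's implementation: the helper names soft_update and optimistic_q are hypothetical, the tau convention shown (target moves a fraction tau toward the online weights) is the common Polyak form and is assumed here, and the sigma_Q definition (half the absolute gap between two critic estimates) is likewise an assumption for illustration.

```python
def soft_update(target_weights, online_weights, tau):
    """Hypothetical helper: blend online weights into target weights.

    Assumed convention: new_target = (1 - tau) * target + tau * online.
    """
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, online_weights)]


def optimistic_q(q1, q2, beta):
    """Hypothetical helper: optimistic Q estimate from two critic outputs.

    mu_Q is the mean of the two estimates; sigma_Q is assumed to be half
    their absolute difference. beta scales how much sigma_Q contributes.
    """
    mu_q = (q1 + q2) / 2.0
    sigma_q = abs(q1 - q2) / 2.0
    return mu_q + beta * sigma_q


# With tau = 0.5 the target moves halfway toward the online weights.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)

# With beta = 0 the estimate reduces to the plain mean of the critics.
plain = optimistic_q(1.0, 3.0, beta=0.0)
optimistic = optimistic_q(1.0, 3.0, beta=2.0)
```

Under these assumptions, a small tau makes the target network track the online network slowly, and a larger beta makes exploration favor actions whose value the two critics disagree on.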