# Agent implementations

`class chainerrl.agents.A2C(model, optimizer, gamma, num_processes, gpu=None, update_steps=5, phi=..., pi_loss_coef=1.0, v_loss_coef=0.5, entropy_coeff=0.01, use_gae=False, tau=0.95, act_deterministically=False, average_actor_loss_decay=0.999, average_entropy_decay=0.999, average_value_decay=0.999, batch_states=...)`

A2C: Advantage Actor-Critic. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C).

Parameters:

- optimizer (chainer.Optimizer) – Optimizer used to train the model.
- num_processes (int) – The number of processes.
- gpu (int) – GPU device id if not None nor negative.
- update_steps (int) – The number of update steps.
- phi (callable) – Feature extractor function.
- pi_loss_coef (float) – Weight coefficient for the loss of the policy.
- v_loss_coef (float) – Weight coefficient for the loss of the value function.
- entropy_coeff (float) – Weight coefficient for the loss of the entropy.
- use_gae (bool) – Use generalized advantage estimation (GAE).
- act_deterministically (bool) – If set to True, choose the most probable actions.
- average_actor_loss_decay (float) – Decay rate of average actor loss.
- average_entropy_decay (float) – Decay rate of average entropy.
- average_value_decay (float) – Decay rate of average value.

`class chainerrl.agents.A3C(model, optimizer, t_max, gamma, beta=0.01, process_idx=0, phi=..., pi_loss_coef=1.0, v_loss_coef=0.5, keep_loss_scale_same=False, normalize_grad_by_t_max=False, use_average_reward=False, average_reward_tau=0.01, act_deterministically=False, average_entropy_decay=0.999, average_value_decay=0.999, batch_states=...)`

A3C: Asynchronous Advantage Actor-Critic.

Parameters:

- t_max (int) – The model is updated after every t_max local steps.
- beta (float) – Weight coefficient for the entropy regularization term.
- batch_states (callable) – Method which makes a batch of observations. Default is `batch_states`.
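Both agents minimize the same kind of combined objective, and the coefficients above (`pi_loss_coef`, `v_loss_coef`, `entropy_coeff` for A2C, `beta` for A3C) weight its terms; `tau` controls generalized advantage estimation when `use_gae=True`. A framework-free NumPy sketch of these two computations, not ChainerRL's actual implementation — the function names and exact term shapes here are illustrative:

```python
import numpy as np

def actor_critic_loss(log_probs, values, returns,
                      pi_loss_coef=1.0, v_loss_coef=0.5,
                      entropy_coeff=0.01, entropy=0.0):
    """Weighted policy loss plus weighted value loss, minus an entropy bonus."""
    advantages = returns - values
    # Policy-gradient surrogate; advantages are treated as constants.
    pi_loss = -np.mean(log_probs * advantages)
    # Squared-error regression of the value estimates toward the returns.
    v_loss = np.mean((values - returns) ** 2)
    return pi_loss_coef * pi_loss + v_loss_coef * v_loss - entropy_coeff * entropy

def gae(rewards, values, gamma=0.99, tau=0.95):
    """Generalized advantage estimation. `values` carries one extra
    bootstrap entry, so len(values) == len(rewards) + 1."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, accumulated with decay gamma * tau.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * tau * running
        advantages[t] = running
    return advantages
```

With `tau=1.0` GAE reduces to the plain discounted-return advantage; with `tau=0.0` it is the one-step TD error, so `tau` trades bias against variance.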