reagent.training.gradient_free package
Submodules
reagent.training.gradient_free.ars_util module
- class reagent.training.gradient_free.ars_util.ARSOptimizer(feature_dim, n_pert=10, rand_ars_params=False, alpha=1, noise=1, b_top=None)
Bases: object
ARSOptimizer maximizes an objective function using Augmented Random Search (ARS)
- sample_perturbed_params()
Return a list of (pos_param, neg_param) tuples, one pair per perturbation
- update_ars_params(rewards: torch.Tensor)
rewards must be interleaved as [reward_pert1_pos, reward_pert1_neg, reward_pert2_pos, reward_pert2_neg, …], matching the order in which the perturbations were sampled
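A minimal optimization loop built from the documented interface above. The objective `evaluate` and the `target` vector are hypothetical stand-ins, and the sketch assumes each `pos_param`/`neg_param` behaves like a 1-D `torch.Tensor` of length `feature_dim`:

```python
import torch

from reagent.training.gradient_free.ars_util import ARSOptimizer

# Hypothetical black-box objective: reward peaks when the parameter
# vector matches an unknown target (no gradients are ever taken).
target = torch.randn(8)

def evaluate(params: torch.Tensor) -> float:
    return -torch.norm(params - target).item()

opt = ARSOptimizer(feature_dim=8, n_pert=10, alpha=1, noise=1)

for _ in range(100):
    rewards = []
    for pos_param, neg_param in opt.sample_perturbed_params():
        # Keep the documented interleaved order:
        # [reward_pert1_pos, reward_pert1_neg, reward_pert2_pos, ...]
        rewards.append(evaluate(pos_param))
        rewards.append(evaluate(neg_param))
    opt.update_ars_params(torch.tensor(rewards))
```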
reagent.training.gradient_free.es_worker module
- class reagent.training.gradient_free.es_worker.EsWorker(individual_pool: reagent.training.gradient_free.evolution_pool.EvolutionPool, es_params: reagent.core.parameters.EvolutionParameters, process_group: torch._C._distributed_c10d.ProcessGroup, num_nodes: int)
Bases: object
- run_epoch() → float
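A single-node construction sketch. It assumes `torch.distributed` is initialized with the `gloo` backend and that `EvolutionParameters` is constructible with its defaults (check `reagent.core.parameters` for the actual fields); per the signature above, `run_epoch()` returns a float:

```python
import torch.distributed as dist

from reagent.core.parameters import EvolutionParameters
from reagent.training.gradient_free.es_worker import EsWorker
from reagent.training.gradient_free.evolution_pool import OneMaxEvolutionPool

# Single-process "distributed" setup; EsWorker still expects a process group.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

es_params = EvolutionParameters()  # assumed constructible with defaults
pool = OneMaxEvolutionPool(seed=0, es_params=es_params, tensor_sizes={"w": [16]})

worker = EsWorker(
    individual_pool=pool,
    es_params=es_params,
    process_group=dist.group.WORLD,  # the default group created above
    num_nodes=1,
)
reward = worker.run_epoch()  # float
dist.destroy_process_group()
```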
reagent.training.gradient_free.evolution_pool module
- class reagent.training.gradient_free.evolution_pool.EvolutionPool(seed: int, es_params: reagent.core.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])
Bases: object
Handles spawning new individuals from a parent, computing an estimated gradient, and applying that gradient to mutate the parent.
- apply_global_reward(rewards: torch.Tensor, next_iteration: int)
- compute_all_local_rewards()
- compute_local_reward(individual)
Given an individual as a list of tensors, return the reward of this policy
- populate_children(iteration: int)
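Subclasses are expected to supply compute_local_reward; spawning, gradient estimation, and mutation come from the base class. A sketch with a hypothetical reward that favors large parameter sums:

```python
from reagent.training.gradient_free.evolution_pool import EvolutionPool

class SumEvolutionPool(EvolutionPool):
    """Hypothetical pool: reward grows with the sum of all parameter entries."""

    def compute_local_reward(self, individual):
        # `individual` is a list of tensors, per the base-class docstring.
        return sum(t.sum().item() for t in individual)
```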
- class reagent.training.gradient_free.evolution_pool.OneMaxEvolutionPool(seed: int, es_params: reagent.core.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])
Bases: reagent.training.gradient_free.evolution_pool.EvolutionPool
A simple example of an evolution pool. The agent gets maximum reward as the tensor approaches [inf, -inf, inf, -inf, …]
- compute_local_reward(individual)
Given an individual as a list of tensors, return the reward of this policy
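A driver loop over the documented methods, using OneMaxEvolutionPool as the reward provider. The loop order (populate, score, apply) and the assumption that `compute_all_local_rewards` returns the `torch.Tensor` consumed by `apply_global_reward` are inferred from the signatures above, not guaranteed by this page:

```python
from reagent.core.parameters import EvolutionParameters
from reagent.training.gradient_free.evolution_pool import OneMaxEvolutionPool

es_params = EvolutionParameters()  # assumed constructible with defaults
pool = OneMaxEvolutionPool(seed=1, es_params=es_params, tensor_sizes={"x": [6]})

for iteration in range(50):
    pool.populate_children(iteration)
    rewards = pool.compute_all_local_rewards()  # assumed: torch.Tensor
    pool.apply_global_reward(rewards, iteration + 1)
    # Mean reward should rise as the parent tensor drifts toward the
    # alternating [inf, -inf, ...] pattern that maximizes the reward.
    print(f"iter {iteration}: mean reward {rewards.mean().item():.3f}")
```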