reagent.training.gradient_free package

Submodules

reagent.training.gradient_free.ars_util module

class reagent.training.gradient_free.ars_util.ARSOptimizer(feature_dim, n_pert=10, rand_ars_params=False, alpha=1, noise=1, b_top=None)

Bases: object

ARSOptimizer maximizes (rather than minimizes) an objective function using Augmented Random Search (ARS)

sample_perturbed_params()

Return a list of (pos_param, neg_param) tuples, one pair per perturbation

update_ars_params(rewards: torch.Tensor)

rewards should be ordered as [reward_pert1_pos, reward_pert1_neg, reward_pert2_pos, reward_pert2_neg, …], i.e. the positive- and negative-perturbation rewards interleaved in the order the perturbations were sampled
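
A minimal usage sketch (the toy quadratic objective and the hyperparameter values are illustrative assumptions, not part of ReAgent): sample the perturbation pairs, score each positive and negative parameter set on the objective, and pass the rewards back in the interleaved order described above.

    import torch
    from reagent.training.gradient_free.ars_util import ARSOptimizer

    feature_dim = 8
    target = torch.ones(feature_dim)  # toy target, purely illustrative

    def objective(params) -> float:
        # Higher is better: negative squared distance to the target.
        w = torch.as_tensor(params, dtype=torch.float32)
        return -float(torch.sum((w - target) ** 2))

    optimizer = ARSOptimizer(feature_dim, n_pert=10, alpha=0.1, noise=0.1)

    for _ in range(100):
        rewards = []
        # One (pos_param, neg_param) pair per perturbation.
        for pos_param, neg_param in optimizer.sample_perturbed_params():
            # Interleave as [pos_1, neg_1, pos_2, neg_2, ...] to match
            # the ordering update_ars_params expects.
            rewards.append(objective(pos_param))
            rewards.append(objective(neg_param))
        optimizer.update_ars_params(torch.tensor(rewards))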

reagent.training.gradient_free.es_worker module

class reagent.training.gradient_free.es_worker.EsWorker(individual_pool: reagent.training.gradient_free.evolution_pool.EvolutionPool, es_params: reagent.core.parameters.EvolutionParameters, process_group: torch._C._distributed_c10d.ProcessGroup, num_nodes: int)

Bases: object

run_epoch() → float
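
A single-process sketch of driving an evolution pool through EsWorker. The gloo/TCP rendezvous settings, the EvolutionParameters field names, and the epoch count are assumptions; OneMaxEvolutionPool is documented in the evolution_pool module below.

    import torch.distributed as dist
    from reagent.core.parameters import EvolutionParameters
    from reagent.training.gradient_free.es_worker import EsWorker
    from reagent.training.gradient_free.evolution_pool import OneMaxEvolutionPool

    # World of size 1 on the gloo backend; the rendezvous address is arbitrary.
    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )

    # EvolutionParameters field names are assumed here; see
    # reagent.core.parameters for the actual definition.
    es_params = EvolutionParameters(
        population_size=16, mutation_power=0.05, learning_rate=0.01
    )
    pool = OneMaxEvolutionPool(seed=0, es_params=es_params, tensor_sizes={"w": [10]})

    worker = EsWorker(
        individual_pool=pool,
        es_params=es_params,
        process_group=dist.group.WORLD,
        num_nodes=1,
    )

    for _ in range(50):
        reward = worker.run_epoch()  # aggregate reward for this epoch (float)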

reagent.training.gradient_free.evolution_pool module

class reagent.training.gradient_free.evolution_pool.EvolutionPool(seed: int, es_params: reagent.core.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])

Bases: object

Handles spawning new individuals from a parent, computing an estimated gradient, and applying that gradient to mutate the parent.

apply_global_reward(rewards: torch.Tensor, next_iteration: int)

compute_all_local_rewards()

compute_local_reward(individual)

Given an individual as a list of tensors, return the reward of this policy

populate_children(iteration: int)
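
A hedged sketch of how these methods fit together in a single process, assuming compute_all_local_rewards() returns one reward per child and that the EvolutionParameters field names are as shown: subclass EvolutionPool with a custom compute_local_reward, then alternate between populating children, scoring them, and applying the aggregated rewards. (In the distributed setting, EsWorker is the intended driver of this loop.)

    import torch
    from reagent.core.parameters import EvolutionParameters
    from reagent.training.gradient_free.evolution_pool import EvolutionPool

    class SumEvolutionPool(EvolutionPool):
        # Hypothetical subclass: reward an individual by the sum of its entries.
        def compute_local_reward(self, individual):
            return float(sum(torch.sum(tensor) for tensor in individual))

    es_params = EvolutionParameters(
        population_size=16, mutation_power=0.05, learning_rate=0.01
    )  # assumed field names, as above
    pool = SumEvolutionPool(seed=0, es_params=es_params, tensor_sizes={"w": [10]})

    for iteration in range(50):
        pool.populate_children(iteration)
        rewards = pool.compute_all_local_rewards()  # assumed: one reward per child
        pool.apply_global_reward(rewards, iteration + 1)
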
class reagent.training.gradient_free.evolution_pool.OneMaxEvolutionPool(seed: int, es_params: reagent.core.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])

Bases: reagent.training.gradient_free.evolution_pool.EvolutionPool

A simple example of an evolution pool. The agent gets maximum reward as the tensor entries approach [inf, -inf, inf, -inf, …]

compute_local_reward(individual)

Given an individual as a list of tensors, return the reward of this policy
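
To illustrate the reward described above, hand-built individuals can be scored directly with compute_local_reward; an individual whose entries alternate toward [inf, -inf, inf, -inf, …] should score higher than one that alternates the other way. The EvolutionParameters field names are again assumed.

    import torch
    from reagent.core.parameters import EvolutionParameters
    from reagent.training.gradient_free.evolution_pool import OneMaxEvolutionPool

    pool = OneMaxEvolutionPool(
        seed=0,
        es_params=EvolutionParameters(
            population_size=8, mutation_power=0.05, learning_rate=0.01
        ),  # assumed field names, as above
        tensor_sizes={"w": [6]},
    )

    good = [torch.tensor([10.0, -10.0, 10.0, -10.0, 10.0, -10.0])]
    bad = [torch.tensor([-10.0, 10.0, -10.0, 10.0, -10.0, 10.0])]
    print(pool.compute_local_reward(good))  # expected to be the larger reward
    print(pool.compute_local_reward(bad))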

Module contents