ml.rl.training.gradient_free package

Submodules

ml.rl.training.gradient_free.es_worker module

class ml.rl.training.gradient_free.es_worker.EsWorker(individual_pool: ml.rl.training.gradient_free.evolution_pool.EvolutionPool, es_params: ml.rl.parameters.EvolutionParameters, process_group: torch.distributed.ProcessGroup, num_nodes: int)

Bases: object

run_epoch() → float

ml.rl.training.gradient_free.evolution_pool module

class ml.rl.training.gradient_free.evolution_pool.EvolutionPool(seed: int, es_params: ml.rl.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])

Bases: object

Handles spawning new individuals from a parent, computing an estimated gradient, and applying that gradient to mutate the parent.

apply_global_reward(rewards: torch.Tensor, next_iteration: int)
compute_all_local_rewards()
compute_local_reward(individual)

Given an individual as a list of tensors, return the reward of this policy.

populate_children(iteration: int)
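Taken together, these methods form one iteration of an evolution-strategies loop: spawn noisy children, score them, and move the parent along a reward-weighted average of the noise directions. A minimal standalone sketch, using NumPy in place of the torch tensors EvolutionPool actually mutates; the constants, the toy reward, and the plain mean-subtracted gradient estimate are illustrative assumptions, not values or code taken from EvolutionParameters:

```python
import numpy as np

# Illustrative hyperparameters (the real pool reads these from
# EvolutionParameters).
POPULATION = 50       # children spawned per iteration
SIGMA = 0.1           # stddev of the Gaussian mutation noise
LEARNING_RATE = 0.02  # step size for the estimated gradient

def local_reward(params: np.ndarray) -> float:
    """Toy stand-in for compute_local_reward: peaks at [1, -1]."""
    return -float(np.sum((params - np.array([1.0, -1.0])) ** 2))

rng = np.random.default_rng(0)
parent = np.zeros(2)

for _ in range(200):
    # populate_children: perturb the parent with Gaussian noise
    noise = rng.standard_normal((POPULATION, parent.size))
    children = parent + SIGMA * noise
    # compute_all_local_rewards: score every child
    rewards = np.array([local_reward(child) for child in children])
    # apply_global_reward: estimate the gradient as a reward-weighted
    # average of the noise directions, then mutate the parent along it
    advantages = rewards - rewards.mean()
    gradient = noise.T @ advantages / (POPULATION * SIGMA)
    parent = parent + LEARNING_RATE * gradient

print(np.round(parent, 2))  # converges toward [1, -1]
```

In the distributed setting, EsWorker gathers the per-child rewards across the process group before the `apply_global_reward` step, so every node applies the same mutation to its copy of the parent.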
class ml.rl.training.gradient_free.evolution_pool.OneMaxEvolutionPool(seed: int, es_params: ml.rl.parameters.EvolutionParameters, tensor_sizes: Dict[str, List[int]])

Bases: ml.rl.training.gradient_free.evolution_pool.EvolutionPool

A simple example of an evolution pool. The agent gets maximum reward as the tensor approaches [inf, -inf, inf, -inf, …].

compute_local_reward(individual)

Given an individual as a list of tensors, return the reward of this policy.
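The documented target [inf, -inf, inf, -inf, …] suggests a reward proportional to the dot product of each tensor with an alternating sign pattern. A hypothetical reconstruction of that objective (the actual OneMaxEvolutionPool implementation may weight or combine the tensors differently):

```python
import numpy as np

def one_max_reward(individual: list) -> float:
    """Assumed OneMax-style reward: sum of each tensor's entries
    multiplied by an alternating +1/-1 pattern, so pushing entries
    toward [inf, -inf, inf, -inf, ...] grows the reward without bound."""
    total = 0.0
    for tensor in individual:
        flat = np.ravel(tensor)
        signs = np.where(np.arange(flat.size) % 2 == 0, 1.0, -1.0)
        total += float(flat @ signs)
    return total

print(one_max_reward([np.array([3.0, -2.0, 1.0])]))  # 3 + 2 + 1 = 6.0
```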

Module contents