reagent.gym.policies package

Submodules

reagent.gym.policies.policy module

class reagent.gym.policies.policy.Policy(scorer: Union[Callable[[Any, Optional[torch.Tensor]], Any], Callable[[Any], Any]], sampler: reagent.gym.types.Sampler)

Bases: object

act(obs: Any, possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Performs the composition described above. These are the actions put into the replay buffer, not necessarily the actions taken in the environment!
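The "composition" is scoring the observation and then sampling an action from the scores. A minimal sketch of that pattern, using plain Python stand-ins (SketchPolicy, greedy, and the list-based masking here are illustrative assumptions, not ReAgent's actual implementation, which operates on torch tensors):

```python
from typing import Any, Callable, List, Optional

class SketchPolicy:
    """Hypothetical re-creation of the scorer -> sampler composition."""

    def __init__(self, scorer: Callable[[Any], List[float]], sampler: Callable[[List[float]], int]):
        self.scorer = scorer
        self.sampler = sampler

    def act(self, obs: Any, possible_actions_mask: Optional[List[int]] = None) -> int:
        scores = self.scorer(obs)
        if possible_actions_mask is not None:
            # Mask out disallowed actions (illustrative masking scheme).
            scores = [s if m else float("-inf") for s, m in zip(scores, possible_actions_mask)]
        return self.sampler(scores)

# A greedy "sampler" that picks the highest-scoring action index.
greedy = lambda scores: max(range(len(scores)), key=scores.__getitem__)

policy = SketchPolicy(scorer=lambda obs: [0.1, 0.9, 0.3], sampler=greedy)
policy.act(obs=None)                                   # -> 1
policy.act(obs=None, possible_actions_mask=[1, 0, 1])  # -> 2
```

In ReAgent the sampler is typically stochastic (e.g. softmax or epsilon-greedy), which is why the sampled action stored in the replay buffer can differ from a purely greedy choice.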

reagent.gym.policies.predictor_policies module

class reagent.gym.policies.predictor_policies.ActorPredictorPolicy(predictor)

Bases: reagent.gym.policies.policy.Policy

act(obs: Union[reagent.core.types.ServingFeatureData, Tuple[torch.Tensor, torch.Tensor]], possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Input is either state_with_presence or ServingFeatureData (in the case of sparse features).

class reagent.gym.policies.predictor_policies.DiscreteDQNPredictorPolicy(wrapped_dqn_predictor, rl_parameters: Optional[reagent.core.parameters.RLParameters])

Bases: reagent.gym.policies.policy.Policy

act(obs: Union[reagent.core.types.ServingFeatureData, Tuple[torch.Tensor, torch.Tensor]], possible_actions_mask: Optional[torch.Tensor]) → reagent.core.types.ActorOutput

Input is either state_with_presence or ServingFeatureData (in the case of sparse features).

reagent.gym.policies.predictor_policies.create_predictor_policy_from_model(serving_module, **kwargs) → reagent.gym.policies.policy.Policy

Creates a Policy for gym environments from serving_module, the result of ModelManager.build_serving_module().
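The factory simply wraps a built serving module in a predictor policy whose act() delegates to it. A stripped-down sketch of that wrapping pattern (PredictorPolicy and create_policy_from_model here are illustrative stand-ins, not ReAgent's actual classes, and the real function also dispatches on the serving module's type):

```python
class PredictorPolicy:
    """Hypothetical policy that delegates action selection to a predictor."""

    def __init__(self, predictor):
        self.predictor = predictor

    def act(self, obs, possible_actions_mask=None):
        # Delegate scoring/sampling to the wrapped serving module.
        return self.predictor(obs)

def create_policy_from_model(serving_module, **kwargs):
    # The real factory inspects the module type (actor vs. discrete DQN)
    # and picks the matching policy class; this sketch always wraps.
    return PredictorPolicy(serving_module)

policy = create_policy_from_model(lambda obs: obs * 2)
policy.act(3)  # -> 6
```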

reagent.gym.policies.random_policies module

class reagent.gym.policies.random_policies.ContinuousRandomPolicy(low: torch.Tensor, high: torch.Tensor)

Bases: reagent.gym.policies.policy.Policy

act(obs: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Act randomly regardless of the observation.

classmethod create_for_env(env: gym.core.Env)

class reagent.gym.policies.random_policies.DiscreteRandomPolicy(num_actions: int)

Bases: reagent.gym.policies.policy.Policy

act(obs: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Act randomly regardless of the observation.

classmethod create_for_env(env: gym.core.Env)

class reagent.gym.policies.random_policies.MultiDiscreteRandomPolicy(num_action_vec: List[int])

Bases: reagent.gym.policies.policy.Policy

act(obs: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Performs the composition described above. These are the actions put into the replay buffer, not necessarily the actions taken in the environment!

classmethod create_for_env(env: gym.core.Env)

reagent.gym.policies.random_policies.make_random_policy_for_env(env: gym.core.Env) → reagent.gym.policies.policy.Policy
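A factory like this picks the random-policy class that matches the environment's action space (ContinuousRandomPolicy for Box-like spaces, DiscreteRandomPolicy for Discrete ones). A self-contained sketch of that dispatch, using plain-Python stand-ins for gym.spaces.Discrete and gym.spaces.Box (all names below are illustrative, not ReAgent's or gym's actual classes):

```python
import random

class Discrete:
    """Stand-in for gym.spaces.Discrete: n actions, indexed 0..n-1."""
    def __init__(self, n: int):
        self.n = n

class Box:
    """Stand-in for gym.spaces.Box: per-dimension low/high bounds."""
    def __init__(self, low, high):
        self.low, self.high = low, high

def make_random_policy(action_space):
    # Dispatch on the action-space type, mirroring make_random_policy_for_env.
    if isinstance(action_space, Discrete):
        return lambda obs: random.randrange(action_space.n)
    if isinstance(action_space, Box):
        return lambda obs: [
            random.uniform(lo, hi)
            for lo, hi in zip(action_space.low, action_space.high)
        ]
    raise NotImplementedError(f"Unsupported action space: {type(action_space).__name__}")

discrete_action = make_random_policy(Discrete(4))(None)       # int in [0, 4)
continuous_action = make_random_policy(Box([0.0], [1.0]))(None)  # list with one float in [0, 1]
```

Raising on unknown space types, rather than silently falling back, keeps unsupported environments from training against a meaningless exploration policy.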

Module contents

class reagent.gym.policies.Policy(scorer: Union[Callable[[Any, Optional[torch.Tensor]], Any], Callable[[Any], Any]], sampler: reagent.gym.types.Sampler)

Bases: object

act(obs: Any, possible_actions_mask: Optional[torch.Tensor] = None) → reagent.core.types.ActorOutput

Performs the composition described above. These are the actions put into the replay buffer, not necessarily the actions taken in the environment!