reagent.gym.agents package

Submodules

reagent.gym.agents.agent module

class reagent.gym.agents.agent.Agent(policy: reagent.gym.policies.policy.Policy, post_transition_callback: Optional[Callable[[reagent.gym.types.Transition], None]] = None, post_episode_callback: Optional[Callable[[reagent.gym.types.Trajectory, Dict], None]] = None, obs_preprocessor=<function _id>, action_extractor=<function _id>, device: Optional[torch.device] = None)

Bases: object

act(obs: Any, possible_actions_mask: Optional[numpy.ndarray] = None) → Tuple[Any, Optional[float]]

Act on a single observation.
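Example — a minimal sketch of calling act() directly. Here policy and obs are placeholders for an already-constructed reagent.gym.policies.policy.Policy and an environment observation; the second element of the returned tuple is an optional float (for example a log-probability) whose meaning depends on the policy:

    import numpy as np

    from reagent.gym.agents.agent import Agent

    # `policy` is assumed to be an existing Policy and `obs` an observation
    # obtained from the environment.
    agent = Agent(policy)

    # Optionally restrict which discrete actions may be chosen.
    mask = np.array([1, 1, 0, 1])  # hypothetical 4-action mask
    action, score = agent.act(obs, possible_actions_mask=mask)
    # `score` is Optional[float] and may be None for some policies.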

classmethod create_for_env(env: reagent.gym.envs.env_wrapper.EnvWrapper, policy: Optional[reagent.gym.policies.policy.Policy], *, device: Union[str, torch.device] = 'cpu', obs_preprocessor=None, action_extractor=None, **kwargs)

If policy is not given, we will try to create a random policy for the environment.
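Example — a short rollout with a randomly-acting agent. This is a sketch that assumes ReAgent's Gym wrapper from reagent.gym.envs and the classic Gym reset()/step() API returning (obs, reward, terminal, info):

    from reagent.gym.agents.agent import Agent
    from reagent.gym.envs import Gym

    env = Gym(env_name="CartPole-v0")
    # No policy given, so a random policy is created for the env's action space.
    agent = Agent.create_for_env(env, policy=None)

    obs = env.reset()
    terminal = False
    while not terminal:
        action, _ = agent.act(obs)
        obs, reward, terminal, info = env.step(action)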

classmethod create_for_env_with_serving_policy(env: reagent.gym.envs.env_wrapper.EnvWrapper, serving_policy: reagent.gym.policies.policy.Policy, *, obs_preprocessor=None, action_extractor=None, **kwargs)
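Usage mirrors create_for_env, except that a serving-variant policy must be supplied. A minimal sketch, assuming serving_policy is a Policy that already wraps an exported/serving model and env is an EnvWrapper:

    # Hypothetical setup: `serving_policy` wraps a serving (e.g. TorchScript) model.
    agent = Agent.create_for_env_with_serving_policy(env, serving_policy)
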
post_episode(trajectory: reagent.gym.types.Trajectory, info: Dict)

To be called after the episode ends.

post_step(transition: reagent.gym.types.Transition)

To be called after each step(action).
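These two hooks connect an Agent to training code: a rollout loop (such as ReAgent's gym runner) is expected to call post_step(transition) after every environment step and post_episode(trajectory, info) once the episode finishes, which in turn invoke the callbacks passed to the Agent constructor. A minimal sketch with hypothetical logging callbacks:

    from reagent.gym.agents.agent import Agent
    from reagent.gym.types import Trajectory, Transition

    def log_transition(transition: Transition) -> None:
        # Hypothetical per-step hook.
        print("step reward:", transition.reward)

    def log_episode(trajectory: Trajectory, info: dict) -> None:
        # Hypothetical end-of-episode hook.
        print("episode finished:", info)

    # `policy` is assumed to be an existing Policy instance.
    agent = Agent(
        policy,
        post_transition_callback=log_transition,
        post_episode_callback=log_episode,
    )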

reagent.gym.agents.post_step module

reagent.gym.agents.post_step.add_replay_buffer_post_step(replay_buffer: reagent.replay_memory.circular_replay_buffer.ReplayBuffer, env: gym.core.Env, replay_buffer_inserter=None)

Returns a post-step callback that simply adds each transition to replay_buffer.
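The returned callback can be passed to an Agent as its post_transition_callback so that every transition seen during rollout is written to the buffer. A minimal sketch, assuming replay_buffer is an already-constructed ReplayBuffer, env an EnvWrapper, and policy an existing Policy:

    from reagent.gym.agents.agent import Agent
    from reagent.gym.agents.post_step import add_replay_buffer_post_step

    # `replay_buffer` is assumed to be an existing
    # reagent.replay_memory.circular_replay_buffer.ReplayBuffer instance.
    post_step = add_replay_buffer_post_step(replay_buffer, env=env)
    agent = Agent(policy, post_transition_callback=post_step)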

Module contents