reagent.gym.tests package
Subpackages
Submodules
reagent.gym.tests.test_gym module
- class reagent.gym.tests.test_gym.TestGym(*args: Any, **kwargs: Any)
Bases:
reagent.test.base.horizon_test_base.HorizonTestBase
- test_online_episode_gym_cpu = None
- test_online_episode_gym_cpu_0_REINFORCE_Cartpole_online()
- test_online_episode_gym_cpu_1_PPO_Cartpole_online()
- test_replay_buffer_gym_cpu_1 = None
- test_replay_buffer_gym_cpu_1_0_Discrete_CRR_Cartpole()
- test_replay_buffer_gym_cpu_1_1_Discrete_DQN_Cartpole()
- test_replay_buffer_gym_cpu_1_2_Discrete_C51_Cartpole()
- test_replay_buffer_gym_cpu_1_3_Discrete_QR_Cartpole()
- test_replay_buffer_gym_cpu_1_4_Discrete_DQN_Open_Gridworld()
- test_replay_buffer_gym_cpu_1_5_SAC_Pendulum()
- test_replay_buffer_gym_cpu_1_6_Continuous_CRR_Pendulum()
- test_replay_buffer_gym_cpu_1_7_TD3_Pendulum()
- test_replay_buffer_gym_cpu_2 = None
- test_replay_buffer_gym_cpu_2_0_Parametric_DQN_Cartpole()
- test_replay_buffer_gym_cpu_2_1_Parametric_SARSA_Cartpole()
- test_replay_buffer_gym_cpu_2_2_SlateQ_RecSim()
- test_replay_buffer_gym_cpu_2_3_SlateQ_RecSim_with_Discount_Scaled_by_Time_Diff()
- test_replay_buffer_gym_cpu_2_4_SlateQ_RecSim_multi_selection()
- test_replay_buffer_gym_cpu_2_5_SlateQ_RecSim_multi_selection_average_by_current_slate_size()
- test_replay_buffer_gym_cpu_2_6_PossibleActionsMask_DQN()
- test_replay_buffer_gym_gpu_1 = None
- test_replay_buffer_gym_gpu_1_0_Discrete_CRR_Cartpole()
- test_replay_buffer_gym_gpu_1_1_Discrete_DQN_Cartpole()
- test_replay_buffer_gym_gpu_1_2_Discrete_C51_Cartpole()
- test_replay_buffer_gym_gpu_1_3_Discrete_QR_Cartpole()
- test_replay_buffer_gym_gpu_1_4_Discrete_DQN_Open_Gridworld()
- test_replay_buffer_gym_gpu_1_5_SAC_Pendulum()
- test_replay_buffer_gym_gpu_1_6_Continuous_CRR_Pendulum()
- test_replay_buffer_gym_gpu_1_7_TD3_Pendulum()
- test_replay_buffer_gym_gpu_2 = None
- test_replay_buffer_gym_gpu_2_0_Parametric_DQN_Cartpole()
- test_replay_buffer_gym_gpu_2_1_Parametric_SARSA_Cartpole()
- test_replay_buffer_gym_gpu_2_2_SlateQ_RecSim()
- test_replay_buffer_gym_gpu_2_3_SlateQ_RecSim_with_Discount_Scaled_by_Time_Diff()
- test_replay_buffer_gym_gpu_2_4_SlateQ_RecSim_multi_selection()
- test_replay_buffer_gym_gpu_2_5_SlateQ_RecSim_multi_selection_average_by_current_slate_size()
- test_replay_buffer_gym_gpu_2_6_PossibleActionsMask_DQN()
- reagent.gym.tests.test_gym.eval_policy(env: reagent.gym.envs.env_wrapper.EnvWrapper, serving_policy: reagent.gym.policies.policy.Policy, num_eval_episodes: int, serving: bool = True) numpy.ndarray
- reagent.gym.tests.test_gym.identity_collate(batch)
- reagent.gym.tests.test_gym.run_test_online_episode(env: reagent.gym.envs.Env__Union, model: reagent.model_managers.union.ModelManager__Union, num_train_episodes: int, passing_score_bar: float, num_eval_episodes: int, use_gpu: bool)
Run an online learning test. At the end of each episode, training is run on the episode’s trajectory.
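For orientation, a minimal sketch of the episodic flow described above, written against the classic Gym API; the policy object and its act()/update_on_trajectory() methods are hypothetical placeholders, not ReAgent’s actual interfaces:

```python
import gym

def run_online_episodes(env: gym.Env, policy, num_train_episodes: int):
    """Sketch: collect one trajectory per episode, then train on it."""
    for _ in range(num_train_episodes):
        trajectory = []
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)                      # hypothetical API
            next_obs, reward, done, _ = env.step(action)
            trajectory.append((obs, action, reward, next_obs, done))
            obs = next_obs
        policy.update_on_trajectory(trajectory)           # train at episode end
```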
- reagent.gym.tests.test_gym.run_test_replay_buffer(env: reagent.gym.envs.Env__Union, model: reagent.model_managers.union.ModelManager__Union, replay_memory_size: int, train_every_ts: int, train_after_ts: int, num_train_episodes: int, passing_score_bar: float, num_eval_episodes: int, use_gpu: bool, minibatch_size: Optional[int] = None)
Run an online learning test with a replay buffer. The replay buffer is pre-filled, then training starts. Each transition is added to the replay buffer immediately after it occurs.
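A rough sketch of that replay-buffer flow, again with hypothetical policy/trainer objects; the parameter names mirror the signature above, but the default values and method names are illustrative only:

```python
import random
from collections import deque

import gym

def run_with_replay_buffer(env: gym.Env, policy, trainer,
                           num_train_episodes: int,
                           replay_memory_size: int = 100_000,
                           train_after_ts: int = 1_000,
                           train_every_ts: int = 1,
                           minibatch_size: int = 256):
    """Sketch: add every transition to the buffer; train once it is pre-filled."""
    replay_buffer = deque(maxlen=replay_memory_size)
    total_ts = 0
    for _ in range(num_train_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)                          # hypothetical API
            next_obs, reward, done, _ = env.step(action)
            # Each transition goes into the buffer immediately.
            replay_buffer.append((obs, action, reward, next_obs, done))
            total_ts += 1
            # Training starts only after the buffer has been pre-filled.
            if total_ts >= train_after_ts and total_ts % train_every_ts == 0:
                batch = random.sample(replay_buffer,
                                      min(minibatch_size, len(replay_buffer)))
                trainer.train(batch)                          # hypothetical API
            obs = next_obs
```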
reagent.gym.tests.test_gym_datasets module
reagent.gym.tests.test_gym_offline module
- class reagent.gym.tests.test_gym_offline.TestGymOffline(*args: Any, **kwargs: Any)
Bases:
reagent.test.base.horizon_test_base.HorizonTestBase
- test_gym_offline_cpu = None
- test_gym_offline_cpu_0_CEM_Cartpole()
- test_gym_offline_cpu_1_CEM_Single_World_Model_Linear_Dynamics()
- test_gym_offline_cpu_2_CEM_Many_World_Models_Linear_Dynamics()
- test_gym_offline_gpu = None
- test_gym_offline_gpu_0_CEM_Cartpole()
- test_gym_offline_gpu_1_CEM_Single_World_Model_Linear_Dynamics()
- test_gym_offline_gpu_2_CEM_Many_World_Models_Linear_Dynamics()
- reagent.gym.tests.test_gym_offline.evaluate_cem(env, manager, trainer_module, num_eval_episodes: int)
- reagent.gym.tests.test_gym_offline.identity_collate(batch)
- reagent.gym.tests.test_gym_offline.run_test_offline(env_name: str, model: reagent.model_managers.union.ModelManager__Union, replay_memory_size: int, num_batches_per_epoch: int, num_train_epochs: int, passing_score_bar: float, num_eval_episodes: int, minibatch_size: int, use_gpu: bool)
reagent.gym.tests.test_gym_replay_buffer module
- class reagent.gym.tests.test_gym_replay_buffer.TestEnv(env)
Bases:
reagent.gym.envs.wrappers.simple_minigrid.SimpleObsWrapper
Wraps a Gym environment in TestEnv to save MiniGrid’s observations, actions, rewards, and terminals in a list, so that we can check whether the replay buffer is working correctly (a rough sketch of such a recording wrapper follows the method list below).
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- seed(*args, **kwargs)
Sets the seed for this env’s random number generator(s).
Note
Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.
- Returns
Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
- Return type
list<bigint>
- step(action)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
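As referenced in the TestEnv docstring above, here is a rough sketch of such a transition-recording wrapper; the real class derives from SimpleObsWrapper and its attribute names may differ, so treat this as an illustration of the idea only:

```python
import gym

class RecordingWrapper(gym.Wrapper):
    """Illustrative wrapper that logs (observation, action, reward, terminal)
    so a test can later compare the log against replay-buffer contents."""

    def __init__(self, env):
        super().__init__(env)
        self.transitions = []

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.transitions.append((obs, None, None, None))  # episode start
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.transitions.append((obs, action, reward, done))
        return obs, reward, done, info
```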
reagent.gym.tests.test_linear_dynamics module
- class reagent.gym.tests.test_linear_dynamics.TestLinearDynamicsEnvironment(methodName='runTest')
Bases:
unittest.case.TestCase
- run_n_episodes(env, num_episodes, policy)
- setUp()
Hook method for setting up the test fixture before exercising it.
- test_random_vs_lqr()
Test random actions vs. an LQR controller. The LQR controller should perform much better than random actions in the linear dynamics environment.
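A minimal sketch of how a discrete-time LQR controller can be derived for such a comparison, assuming a generic linear system x_{t+1} = A x_t + B u_t with quadratic cost x'Qx + u'Ru; the matrices below are made-up stand-ins, not the environment’s actual dynamics:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical linear system and cost matrices (illustrative values only).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = 0.01 * np.eye(1)

# Discrete-time LQR: solve the Riccati equation for P, derive the gain K,
# and act with u_t = -K x_t.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def lqr_action(x: np.ndarray) -> np.ndarray:
    return -K @ x

rng = np.random.default_rng(0)

def random_action() -> np.ndarray:
    return rng.uniform(-1.0, 1.0, size=(1,))
```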
reagent.gym.tests.test_pomdp module
reagent.gym.tests.test_world_model module
- class reagent.gym.tests.test_world_model.TestWorldModel(*args: Any, **kwargs: Any)
Bases:
reagent.test.base.horizon_test_base.HorizonTestBase
- test_mdnrnn()
Test MDNRNN feature importance and feature sensitivity.
- test_world_model()
Train DQN on a POMDP given features from the world model.
- static verify_result(result_dict: Dict[str, float], expected_top_features: List[str])
- reagent.gym.tests.test_world_model.calculate_feature_importance(env: gym.core.Env, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, use_gpu: bool, test_batch: reagent.core.types.MemoryNetworkInput)
- reagent.gym.tests.test_world_model.calculate_feature_sensitivity(env: reagent.gym.envs.env_wrapper.EnvWrapper, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, use_gpu: bool, test_batch: reagent.core.types.MemoryNetworkInput)
- reagent.gym.tests.test_world_model.create_embed_rl_dataset(env: reagent.gym.envs.env_wrapper.EnvWrapper, memory_network: reagent.models.world_model.MemoryNetwork, num_state_embed_transitions: int, batch_size: int, seq_len: int, hidden_dim: int, use_gpu: bool)
- reagent.gym.tests.test_world_model.train_mdnrnn(env: reagent.gym.envs.env_wrapper.EnvWrapper, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, trainer_preprocessor, num_train_transitions: int, seq_len: int, batch_size: int, num_train_epochs: int, test_replay_buffer=None)
- reagent.gym.tests.test_world_model.train_mdnrnn_and_compute_feature_stats(env_name: str, model: reagent.model_managers.union.ModelManager__Union, num_train_transitions: int, num_test_transitions: int, seq_len: int, batch_size: int, num_train_epochs: int, use_gpu: bool, saved_mdnrnn_path: Optional[str] = None)
Train MDNRNN Memory Network and compute feature importance/sensitivity.
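The feature-importance computation itself lives in calculate_feature_importance above; as a generic illustration of the underlying idea (not ReAgent’s exact method), permutation importance measures how much a trained model’s loss grows when a single input feature is shuffled across the batch:

```python
import torch

def permutation_feature_importance(model, states, targets, loss_fn):
    """Sketch: importance of feature f = loss increase after shuffling f."""
    with torch.no_grad():
        base_loss = loss_fn(model(states), targets)
        importances = []
        for f in range(states.shape[-1]):
            perturbed = states.clone()
            perm = torch.randperm(perturbed.shape[0])
            perturbed[:, f] = perturbed[perm, f]  # break feature-target link
            importances.append(
                (loss_fn(model(perturbed), targets) - base_loss).item()
            )
    return importances
```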
- reagent.gym.tests.test_world_model.train_mdnrnn_and_train_on_embedded_env(env_name: str, embedding_model: reagent.model_managers.union.ModelManager__Union, num_embedding_train_transitions: int, seq_len: int, batch_size: int, num_embedding_train_epochs: int, train_model: reagent.model_managers.union.ModelManager__Union, num_state_embed_transitions: int, num_agent_train_epochs: int, num_agent_eval_epochs: int, use_gpu: bool, passing_score_bar: float, saved_mdnrnn_path: Optional[str] = None)
Train an agent on states embedded by the MDNRNN.
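As a very rough illustration of training an agent on embedded states (module names and dimensions below are invented, and a GRU stands in for the MDNRNN), a recurrent world model summarizes the observation history into a hidden state that the downstream agent consumes as its input:

```python
import torch
import torch.nn as nn

class ObservationEmbedder(nn.Module):
    """Sketch: a recurrent encoder standing in for the trained memory network."""

    def __init__(self, obs_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, seq_len, obs_dim) -> (batch, hidden_dim)
        _, hidden = self.rnn(obs_history)
        return hidden[-1]

embedder = ObservationEmbedder(obs_dim=4, hidden_dim=32)
embedded_state = embedder(torch.randn(8, 10, 4))
# `embedded_state` would then be the input to a downstream agent such as DQN.
```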