reagent.gym.tests package

Submodules

reagent.gym.tests.test_gym module

class reagent.gym.tests.test_gym.TestGym(*args: Any, **kwargs: Any)

Bases: reagent.test.base.horizon_test_base.HorizonTestBase

test_online_episode_gym_cpu = None
test_online_episode_gym_cpu_0_REINFORCE_Cartpole_online()
test_online_episode_gym_cpu_1_PPO_Cartpole_online()
test_replay_buffer_gym_cpu_1 = None
test_replay_buffer_gym_cpu_1_0_Discrete_CRR_Cartpole()
test_replay_buffer_gym_cpu_1_1_Discrete_DQN_Cartpole()
test_replay_buffer_gym_cpu_1_2_Discrete_C51_Cartpole()
test_replay_buffer_gym_cpu_1_3_Discrete_QR_Cartpole()
test_replay_buffer_gym_cpu_1_4_Discrete_DQN_Open_Gridworld()
test_replay_buffer_gym_cpu_1_5_SAC_Pendulum()
test_replay_buffer_gym_cpu_1_6_Continuous_CRR_Pendulum()
test_replay_buffer_gym_cpu_1_7_TD3_Pendulum()
test_replay_buffer_gym_cpu_2 = None
test_replay_buffer_gym_cpu_2_0_Parametric_DQN_Cartpole()
test_replay_buffer_gym_cpu_2_1_Parametric_SARSA_Cartpole()
test_replay_buffer_gym_cpu_2_2_SlateQ_RecSim()
test_replay_buffer_gym_cpu_2_3_SlateQ_RecSim_with_Discount_Scaled_by_Time_Diff()
test_replay_buffer_gym_cpu_2_4_SlateQ_RecSim_multi_selection()
test_replay_buffer_gym_cpu_2_5_SlateQ_RecSim_multi_selection_average_by_current_slate_size()
test_replay_buffer_gym_cpu_2_6_PossibleActionsMask_DQN()
test_replay_buffer_gym_gpu_1 = None
test_replay_buffer_gym_gpu_1_0_Discrete_CRR_Cartpole()
test_replay_buffer_gym_gpu_1_1_Discrete_DQN_Cartpole()
test_replay_buffer_gym_gpu_1_2_Discrete_C51_Cartpole()
test_replay_buffer_gym_gpu_1_3_Discrete_QR_Cartpole()
test_replay_buffer_gym_gpu_1_4_Discrete_DQN_Open_Gridworld()
test_replay_buffer_gym_gpu_1_5_SAC_Pendulum()
test_replay_buffer_gym_gpu_1_6_Continuous_CRR_Pendulum()
test_replay_buffer_gym_gpu_1_7_TD3_Pendulum()
test_replay_buffer_gym_gpu_2 = None
test_replay_buffer_gym_gpu_2_0_Parametric_DQN_Cartpole()
test_replay_buffer_gym_gpu_2_1_Parametric_SARSA_Cartpole()
test_replay_buffer_gym_gpu_2_2_SlateQ_RecSim()
test_replay_buffer_gym_gpu_2_3_SlateQ_RecSim_with_Discount_Scaled_by_Time_Diff()
test_replay_buffer_gym_gpu_2_4_SlateQ_RecSim_multi_selection()
test_replay_buffer_gym_gpu_2_5_SlateQ_RecSim_multi_selection_average_by_current_slate_size()
test_replay_buffer_gym_gpu_2_6_PossibleActionsMask_DQN()
reagent.gym.tests.test_gym.eval_policy(env: reagent.gym.envs.env_wrapper.EnvWrapper, serving_policy: reagent.gym.policies.policy.Policy, num_eval_episodes: int, serving: bool = True) numpy.ndarray
reagent.gym.tests.test_gym.identity_collate(batch)
reagent.gym.tests.test_gym.run_test_online_episode(env: reagent.gym.envs.Env__Union, model: reagent.model_managers.union.ModelManager__Union, num_train_episodes: int, passing_score_bar: float, num_eval_episodes: int, use_gpu: bool)

Run an online learning test. At the end of each episode, training is run on the trajectory.
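The pattern this helper follows can be sketched with plain Gym and a stand-in training function. This is only an illustration: train_on_trajectory is hypothetical, and the real helper goes through Env__Union, ModelManager__Union and a trained Policy rather than random actions.

    import gym

    def train_on_trajectory(trajectory):
        # Hypothetical stand-in for the trainer update performed on one episode.
        pass

    env = gym.make("CartPole-v0")
    for _ in range(10):  # num_train_episodes
        obs, done, trajectory = env.reset(), False, []
        while not done:
            action = env.action_space.sample()  # the real test acts with the trained Policy
            next_obs, reward, done, _ = env.step(action)
            trajectory.append((obs, action, reward, next_obs, done))
            obs = next_obs
        train_on_trajectory(trajectory)  # training runs once per episode, on the full trajectory
    env.close()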

reagent.gym.tests.test_gym.run_test_replay_buffer(env: reagent.gym.envs.Env__Union, model: reagent.model_managers.union.ModelManager__Union, replay_memory_size: int, train_every_ts: int, train_after_ts: int, num_train_episodes: int, passing_score_bar: float, num_eval_episodes: int, use_gpu: bool, minibatch_size: Optional[int] = None)

Run an online learning test with a replay buffer. The replay buffer is pre-filled, then training starts. Each transition is added to the replay buffer immediately after it takes place.
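A minimal sketch of this loop, assuming a plain deque as a stand-in replay buffer and a hypothetical train_on_batch function. replay_memory_size, train_after_ts, train_every_ts and minibatch_size mirror the parameters above; the values chosen here are arbitrary.

    import random
    from collections import deque

    import gym

    def train_on_batch(batch):
        # Hypothetical stand-in for one trainer step on a sampled minibatch.
        pass

    env = gym.make("CartPole-v0")
    replay_buffer = deque(maxlen=100_000)              # replay_memory_size
    train_after_ts, train_every_ts, minibatch_size = 1_000, 1, 32

    obs, ts = env.reset(), 0
    while ts < 5_000:
        action = env.action_space.sample()             # the real test acts with the trained Policy
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.append((obs, action, reward, next_obs, done))  # added immediately
        ts += 1
        if ts >= train_after_ts and ts % train_every_ts == 0:
            train_on_batch(random.sample(replay_buffer, min(minibatch_size, len(replay_buffer))))
        obs = env.reset() if done else next_obs
    env.close()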

reagent.gym.tests.test_gym_datasets module

class reagent.gym.tests.test_gym_datasets.TestEpisodicDataset(methodName='runTest')

Bases: unittest.case.TestCase

setUp()

Hook method for setting up the test fixture before exercising it.

test_episodic_dataset()

reagent.gym.tests.test_gym_offline module

class reagent.gym.tests.test_gym_offline.TestGymOffline(*args: Any, **kwargs: Any)

Bases: reagent.test.base.horizon_test_base.HorizonTestBase

test_gym_offline_cpu = None
test_gym_offline_cpu_0_CEM_Cartpole()
test_gym_offline_cpu_1_CEM_Single_World_Model_Linear_Dynamics()
test_gym_offline_cpu_2_CEM_Many_World_Models_Linear_Dynamics()
test_gym_offline_gpu = None
test_gym_offline_gpu_0_CEM_Cartpole()
test_gym_offline_gpu_1_CEM_Single_World_Model_Linear_Dynamics()
test_gym_offline_gpu_2_CEM_Many_World_Models_Linear_Dynamics()
reagent.gym.tests.test_gym_offline.evaluate_cem(env, manager, trainer_module, num_eval_episodes: int)
reagent.gym.tests.test_gym_offline.identity_collate(batch)
reagent.gym.tests.test_gym_offline.run_test_offline(env_name: str, model: reagent.model_managers.union.ModelManager__Union, replay_memory_size: int, num_batches_per_epoch: int, num_train_epochs: int, passing_score_bar: float, num_eval_episodes: int, minibatch_size: int, use_gpu: bool)

reagent.gym.tests.test_gym_replay_buffer module

class reagent.gym.tests.test_gym_replay_buffer.TestEnv(env)

Bases: reagent.gym.envs.wrappers.simple_minigrid.SimpleObsWrapper

Wraps a Gym environment in TestEnv to record the MiniGrid's observations, actions, rewards and terminal flags in lists, so that we can check whether the replay buffer is working correctly.
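A minimal sketch of such a recording wrapper, written against plain gym.Wrapper rather than SimpleObsWrapper; the names and structure here are illustrative, not the actual TestEnv implementation.

    import gym

    class RecordingWrapper(gym.Wrapper):
        """Hypothetical wrapper that records every transition for later inspection."""

        def __init__(self, env):
            super().__init__(env)
            self.observations, self.actions, self.rewards, self.terminals = [], [], [], []

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            self.observations.append(obs)
            return obs

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            self.actions.append(action)
            self.observations.append(obs)
            self.rewards.append(reward)
            self.terminals.append(done)
            return obs, reward, done, info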

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)

seed(*args, **kwargs)

Sets the seed for this env’s random number generator(s).

Note

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns

Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

Return type

list<bigint>

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object): agent’s observation of the current environment

reward (float): amount of reward returned after the previous action

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple (observation, reward, done, info)

class reagent.gym.tests.test_gym_replay_buffer.TestGymReplayBuffer(*args: Any, **kwargs: Any)

Bases: reagent.test.base.horizon_test_base.HorizonTestBase

test_create_df_from_replay_buffer()

reagent.gym.tests.test_linear_dynamics module

class reagent.gym.tests.test_linear_dynamics.TestLinearDynamicsEnvironment(methodName='runTest')

Bases: unittest.case.TestCase

run_n_episodes(env, num_episodes, policy)
setUp()

Hook method for setting up the test fixture before exercising it.

test_random_vs_lqr()

Test random actions vs. an LQR controller. The LQR controller should perform much better than random actions in the linear dynamics environment.
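For reference, the optimal LQR feedback u_t = -K x_t for a discrete-time linear system x_{t+1} = A x_t + B u_t with stage cost x'Qx + u'Ru can be computed with SciPy. The system matrices below are made up for illustration; the test's actual dynamics are defined by the environment.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    # Hypothetical 2-state, 1-input system; the environment defines its own A, B, Q, R.
    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [0.1]])
    Q = np.eye(2)
    R = np.eye(1)

    # Solve the discrete algebraic Riccati equation, then form the feedback gain
    # K = (R + B'PB)^-1 B'PA, so that u_t = -K x_t.
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)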

reagent.gym.tests.test_pomdp module

class reagent.gym.tests.test_pomdp.TestPOMDPEnvironment(methodName='runTest')

Bases: unittest.case.TestCase

setUp()

Hook method for setting up the test fixture before exercising it.

test_pocman()
test_string_game()
test_string_game_v1()

reagent.gym.tests.test_world_model module

class reagent.gym.tests.test_world_model.TestWorldModel(*args: Any, **kwargs: Any)

Bases: reagent.test.base.horizon_test_base.HorizonTestBase

test_mdnrnn()

Test MDNRNN feature importance and feature sensitivity.

test_world_model()

Train DQN on POMDP given features from world model.

static verify_result(result_dict: Dict[str, float], expected_top_features: List[str])
reagent.gym.tests.test_world_model.calculate_feature_importance(env: gym.core.Env, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, use_gpu: bool, test_batch: reagent.core.types.MemoryNetworkInput)
reagent.gym.tests.test_world_model.calculate_feature_sensitivity(env: reagent.gym.envs.env_wrapper.EnvWrapper, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, use_gpu: bool, test_batch: reagent.core.types.MemoryNetworkInput)
reagent.gym.tests.test_world_model.create_embed_rl_dataset(env: reagent.gym.envs.env_wrapper.EnvWrapper, memory_network: reagent.models.world_model.MemoryNetwork, num_state_embed_transitions: int, batch_size: int, seq_len: int, hidden_dim: int, use_gpu: bool)
reagent.gym.tests.test_world_model.train_mdnrnn(env: reagent.gym.envs.env_wrapper.EnvWrapper, trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, trainer_preprocessor, num_train_transitions: int, seq_len: int, batch_size: int, num_train_epochs: int, test_replay_buffer=None)
reagent.gym.tests.test_world_model.train_mdnrnn_and_compute_feature_stats(env_name: str, model: reagent.model_managers.union.ModelManager__Union, num_train_transitions: int, num_test_transitions: int, seq_len: int, batch_size: int, num_train_epochs: int, use_gpu: bool, saved_mdnrnn_path: Optional[str] = None)

Train MDNRNN Memory Network and compute feature importance/sensitivity.
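The exact importance computation is not documented here. As a rough illustration, permutation importance (how much a model's loss grows when one feature column is shuffled) can be sketched as below, with loss_fn standing in for the MDNRNN evaluation loss; the actual calculate_feature_importance / calculate_feature_sensitivity helpers may use a different scheme.

    import numpy as np

    def permutation_importance(loss_fn, features):
        """Illustrative sketch: the importance of feature i is the increase in loss
        when column i is randomly permuted across the batch."""
        base_loss = loss_fn(features)
        importances = np.zeros(features.shape[1])
        for i in range(features.shape[1]):
            shuffled = features.copy()
            np.random.shuffle(shuffled[:, i])
            importances[i] = loss_fn(shuffled) - base_loss
        return importances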

reagent.gym.tests.test_world_model.train_mdnrnn_and_train_on_embedded_env(env_name: str, embedding_model: reagent.model_managers.union.ModelManager__Union, num_embedding_train_transitions: int, seq_len: int, batch_size: int, num_embedding_train_epochs: int, train_model: reagent.model_managers.union.ModelManager__Union, num_state_embed_transitions: int, num_agent_train_epochs: int, num_agent_eval_epochs: int, use_gpu: bool, passing_score_bar: float, saved_mdnrnn_path: Optional[str] = None)

Train an agent on states embedded by the MDNRNN.
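One common construction in the world-models line of work is to form the agent's state by concatenating the raw observation with the recurrent world model's hidden state. The PyTorch snippet below illustrates that idea only; the dimensions and the plain nn.LSTM are placeholders, and the real pipeline uses reagent.models.world_model.MemoryNetwork together with create_embed_rl_dataset.

    import torch
    import torch.nn as nn

    obs_dim, action_dim, hidden_dim = 4, 2, 8          # placeholder sizes
    world_model = nn.LSTM(input_size=obs_dim + action_dim, hidden_size=hidden_dim)

    obs = torch.zeros(1, 1, obs_dim)                   # (seq_len, batch, obs_dim)
    action = torch.zeros(1, 1, action_dim)
    _, (h, _) = world_model(torch.cat([obs, action], dim=-1))

    # The agent is then trained on this embedded state instead of the raw observation.
    embedded_state = torch.cat([obs[-1], h[-1]], dim=-1)   # shape: (1, obs_dim + hidden_dim)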

Module contents