reagent.gym.envs.functionality package


reagent.gym.envs.functionality.possible_actions_mask_tester module

Simple environment to test possible_actions_mask. The state only tells you which iteration it is; it says nothing about which action to take, so the only source of information is possible_actions_mask. The Q-value of each action should converge to the (discounted) value of the MDP.

The value of the MDP should be 10 * max_steps = 200
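The idea can be illustrated with a minimal sketch (hypothetical, not ReAgent's actual class): the observation encodes only the iteration index, while a per-step mask marks the one action that yields reward. An agent that follows the mask collects 10 per step over max_steps = 20 steps, i.e. a total undiscounted return of 200.

```python
import random


class MaskTesterSketch:
    """Hypothetical sketch of a possible_actions_mask tester, not the ReAgent class.

    The observation one-hot encodes the current iteration and carries no
    information about the correct action; possible_actions_mask is the only
    signal for which action is rewarded.
    """

    def __init__(self, num_actions=4, max_steps=20):
        self.num_actions = num_actions
        self.max_steps = max_steps

    def reset(self):
        self.step_count = 0
        self._sample_mask()
        return self._obs()

    def _sample_mask(self):
        # Exactly one action is allowed each step, chosen at random.
        self.correct_action = random.randrange(self.num_actions)
        self.possible_actions_mask = [
            1 if a == self.correct_action else 0 for a in range(self.num_actions)
        ]

    def _obs(self):
        # One-hot encoding of the iteration index only.
        obs = [0.0] * self.max_steps
        if self.step_count < self.max_steps:
            obs[self.step_count] = 1.0
        return obs

    def step(self, action):
        reward = 10.0 if action == self.correct_action else 0.0
        self.step_count += 1
        done = self.step_count >= self.max_steps
        if not done:
            self._sample_mask()
        return self._obs(), reward, done, {}


env = MaskTesterSketch()
obs = env.reset()
done, total = False, 0.0
while not done:
    # The mask is the only way to find the rewarded action.
    action = env.possible_actions_mask.index(1)
    obs, reward, done, _ = env.step(action)
    total += reward
# total == 10 * max_steps == 200.0
```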

class reagent.gym.envs.functionality.possible_actions_mask_tester.PossibleActionsMaskTester

Bases: gym.core.Env


reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.


Returns

the initial observation.

Return type

observation (object)


step(action)

Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).


Parameters

action (object) – an action provided by the agent


Returns

observation (object) – agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple (observation, reward, done, info)
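The reset()/step() contract above can be sketched with a tiny illustrative environment (hypothetical, not part of ReAgent): reset() returns the initial observation, and each step() returns the (observation, reward, done, info) tuple until done is True.

```python
class CountdownEnv:
    """Tiny illustrative env (hypothetical): counts down from 3, then terminates."""

    def reset(self):
        self.t = 3
        return self.t  # initial observation

    def step(self, action):
        self.t -= 1
        obs = self.t          # agent's observation of the current environment
        reward = 1.0          # reward returned after the previous action
        done = self.t == 0    # episode ends when the counter reaches zero
        info = {}             # auxiliary diagnostic information
        return obs, reward, done, info


env = CountdownEnv()
obs = env.reset()
done, total = False, 0.0
while not done:
    obs, reward, done, info = env.step(0)
    total += reward
# three steps taken, so total == 3.0
```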

Module contents