reagent.model_managers.model_based package

Submodules

reagent.model_managers.model_based.cross_entropy_method module

class reagent.model_managers.model_based.cross_entropy_method.CEMPolicy(cem_planner_network: reagent.models.cem_planner.CEMPlannerNetwork, discrete_action: bool)

Bases: reagent.gym.policies.policy.Policy

act(obs: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) reagent.core.types.ActorOutput

Performs the composition described above. These are the actions being put into the replay buffer, not necessarily the actions taken by the environment!
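
For orientation, a minimal sketch of querying an already-built CEMPolicy for a replay-buffer action. Only the act signature above is relied on; the assumption that the returned ActorOutput exposes the chosen action via an action field, and how the observation and mask are produced, are illustrative and not taken from this page:

    from typing import Optional

    import torch

    from reagent.core.types import ActorOutput, FeatureData
    from reagent.gym.policies.policy import Policy


    def sample_replay_action(
        policy: Policy,  # e.g. a CEMPolicy built elsewhere
        obs: FeatureData,
        possible_actions_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # act() returns an ActorOutput; per the docstring above, this is the action
        # written to the replay buffer, not necessarily the one the environment executes.
        actor_output: ActorOutput = policy.act(obs, possible_actions_mask=possible_actions_mask)
        # Assumption (not stated on this page): ActorOutput carries the selected
        # action in its `action` field.
        return actor_output.action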

class reagent.model_managers.model_based.cross_entropy_method.CrossEntropyMethod(reward_boost: Optional[Dict[str, float]] = None, trainer_param: reagent.core.parameters.CEMTrainerParameters = <factory>)

Bases: reagent.model_managers.world_model_base.WorldModelBase

build_trainer(normalization_data_map: Dict[str, reagent.core.parameters.NormalizationData], use_gpu: bool, reward_options: Optional[reagent.workflow.types.RewardOptions] = None) reagent.training.cem_trainer.CEMTrainer

Implement this to build the trainer, given the config

TODO: This function should return ReAgentLightningModule & the dictionary of modules created

create_policy(trainer_module: reagent.training.reagent_lightning_module.ReAgentLightningModule, serving: bool = False, normalization_data_map: Optional[Dict[str, reagent.core.parameters.NormalizationData]] = None) reagent.gym.policies.policy.Policy
trainer_param: reagent.core.parameters.CEMTrainerParameters
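
A minimal sketch of driving this manager, assuming a normalization_data_map has already been produced by an upstream feature-identification step (its construction is not shown, and trainer_param is left at its dataclass-factory default):

    from typing import Dict

    from reagent.core.parameters import NormalizationData
    from reagent.model_managers.model_based import CrossEntropyMethod


    def build_cem(normalization_data_map: Dict[str, NormalizationData], use_gpu: bool = False):
        # Default construction: trainer_param falls back to its dataclass factory.
        manager = CrossEntropyMethod()
        # Per the signature above, this returns a reagent.training.cem_trainer.CEMTrainer.
        trainer = manager.build_trainer(
            normalization_data_map=normalization_data_map,
            use_gpu=use_gpu,
        )
        # serving=False yields the training-time policy (presumably the CEMPolicy above).
        policy = manager.create_policy(
            trainer,
            serving=False,
            normalization_data_map=normalization_data_map,
        )
        return trainer, policy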

reagent.model_managers.model_based.seq2reward_model module

class reagent.model_managers.model_based.seq2reward_model.Seq2RewardModel(reward_boost: Optional[Dict[str, float]] = None, net_builder: reagent.net_builder.unions.ValueNetBuilder__Union = <factory>, compress_net_builder: reagent.net_builder.unions.ValueNetBuilder__Union = <factory>, trainer_param: reagent.core.parameters.Seq2RewardTrainerParameters = <factory>, preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None)

Bases: reagent.model_managers.world_model_base.WorldModelBase

build_trainer(normalization_data_map: Dict[str, reagent.core.parameters.NormalizationData], use_gpu: bool, reward_options: Optional[reagent.workflow.types.RewardOptions] = None) reagent.training.world_model.seq2reward_trainer.Seq2RewardTrainer

Implement this to build the trainer, given the config

TODO: This function should return ReAgentLightningModule & the dictionary of modules created

compress_net_builder: reagent.net_builder.unions.ValueNetBuilder__Union
get_reporter() reagent.reporting.seq2reward_reporter.Seq2RewardReporter
net_builder: reagent.net_builder.unions.ValueNetBuilder__Union
preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None
trainer_param: reagent.core.parameters.Seq2RewardTrainerParameters
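
Usage follows the same pattern; a minimal sketch, again assuming the normalization_data_map comes from an earlier preprocessing step and all builders stay at their factory defaults:

    from typing import Dict

    from reagent.core.parameters import NormalizationData
    from reagent.model_managers.model_based import Seq2RewardModel


    def build_seq2reward(normalization_data_map: Dict[str, NormalizationData], use_gpu: bool = False):
        # net_builder, compress_net_builder and trainer_param all have dataclass
        # factories, so a bare constructor uses the default network builders.
        manager = Seq2RewardModel()
        trainer = manager.build_trainer(
            normalization_data_map=normalization_data_map,
            use_gpu=use_gpu,
        )
        # Reporter that aggregates training metrics for this model type.
        reporter = manager.get_reporter()
        return trainer, reporter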

reagent.model_managers.model_based.synthetic_reward module

class reagent.model_managers.model_based.synthetic_reward.SyntheticReward(trainer_param: reagent.training.parameters.RewardNetworkTrainerParameters = <factory>, net_builder: reagent.net_builder.unions.SyntheticRewardNetBuilder__Union = <factory>, eval_parameters: reagent.core.parameters.EvaluationParameters = <factory>, state_preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None, action_preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None, state_float_features: Optional[List[Tuple[int, str]]] = None, parametric_action_float_features: Optional[List[Tuple[int, str]]] = None, discrete_action_names: Optional[List[str]] = None, max_seq_len: int = 5)

Bases: reagent.model_managers.model_manager.ModelManager

Train models to attribute single step rewards from sparse/delayed/aggregated rewards. Ideas from:

1. Synthetic Returns for Long-Term Credit Assignment: https://arxiv.org/pdf/2102.12425.pdf
2. RUDDER: Return Decomposition for Delayed Rewards: https://arxiv.org/pdf/1806.07857.pdf
3. Optimizing Agent Behavior over Long Time Scales by Transporting Value: https://arxiv.org/pdf/1810.06721.pdf
4. Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning: https://arxiv.org/pdf/1905.13420.pdf

property action_feature_config: reagent.core.types.ModelFeatureConfig
action_preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None
build_serving_module(trainer_module: reagent.training.reagent_lightning_module.ReAgentLightningModule, normalization_data_map: Dict[str, reagent.core.parameters.NormalizationData]) torch.nn.modules.module.Module

Returns a TorchScript predictor module

build_trainer(normalization_data_map: Dict[str, reagent.core.parameters.NormalizationData], use_gpu: bool, reward_options: Optional[reagent.workflow.types.RewardOptions] = None) reagent.training.reward_network_trainer.RewardNetTrainer

Implement this to build the trainer, given the config

TODO: This function should return ReAgentLightningModule & the dictionary of modules created

discrete_action_names: Optional[List[str]] = None
eval_parameters: reagent.core.parameters.EvaluationParameters
get_data_module(*, input_table_spec: Optional[reagent.workflow.types.TableSpec] = None, reward_options: Optional[reagent.workflow.types.RewardOptions] = None, reader_options: Optional[reagent.workflow.types.ReaderOptions] = None, setup_data: Optional[Dict[str, bytes]] = None, saved_setup_data: Optional[Dict[str, bytes]] = None, resource_options: Optional[reagent.workflow.types.ResourceOptions] = None) Optional[reagent.data.reagent_data_module.ReAgentDataModule]

Return the data module. If a non-None data module is returned, run_feature_identification & query_data will not be run.

get_reporter()
max_seq_len: int = 5
net_builder: reagent.net_builder.unions.SyntheticRewardNetBuilder__Union
parametric_action_float_features: Optional[List[Tuple[int, str]]] = None
property state_feature_config: reagent.core.types.ModelFeatureConfig
state_float_features: Optional[List[Tuple[int, str]]] = None
state_preprocessing_options: Optional[reagent.workflow.types.PreprocessingOptions] = None
trainer_param: reagent.training.parameters.RewardNetworkTrainerParameters
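
A sketch of configuring this manager for a discrete-action setup and producing both a trainer and a TorchScript serving module. The feature ids, action names, and the source of normalization_data_map are illustrative placeholders, and it is assumed that the RewardNetTrainer returned by build_trainer is the trainer module expected by build_serving_module:

    from typing import Dict

    from reagent.core.parameters import NormalizationData
    from reagent.model_managers.model_based import SyntheticReward


    def build_synthetic_reward(normalization_data_map: Dict[str, NormalizationData]):
        # Hypothetical feature ids and action names; substitute your own schema.
        manager = SyntheticReward(
            state_float_features=[(1, "feature_a"), (2, "feature_b")],
            discrete_action_names=["no_op", "action_1"],
            max_seq_len=5,  # default value per the signature above
        )
        trainer = manager.build_trainer(
            normalization_data_map=normalization_data_map,
            use_gpu=False,
        )
        # Wrap the trained reward network into a TorchScript predictor module.
        serving_module = manager.build_serving_module(trainer, normalization_data_map)
        return trainer, serving_module
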
class reagent.model_managers.model_based.synthetic_reward.SyntheticRewardDataModule(*args: Any, **kwargs: Any)

Bases: reagent.data.manual_data_module.ManualDataModule

build_batch_preprocessor()
query_data(input_table_spec: reagent.workflow.types.TableSpec, sample_range: Optional[Tuple[float, float]], reward_options: reagent.workflow.types.RewardOptions, data_fetcher: reagent.data.data_fetcher.DataFetcher) reagent.workflow.types.Dataset

Massage input table into the format expected by the trainer

run_feature_identification(input_table_spec: reagent.workflow.types.TableSpec) Dict[str, reagent.core.parameters.NormalizationData]

Derive preprocessing parameters from data.

property should_generate_eval_dataset: bool
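
Together, the methods above describe the manual data-preparation flow: derive normalization parameters first, then query the table into training (and optionally evaluation) datasets. A hedged sketch, where the table spec, reward options, data fetcher, and the 80/20 sample ranges are illustrative assumptions:

    from typing import Optional, Tuple

    from reagent.data.data_fetcher import DataFetcher
    from reagent.model_managers.model_based.synthetic_reward import SyntheticRewardDataModule
    from reagent.workflow.types import Dataset, RewardOptions, TableSpec


    def prepare_data(
        data_module: SyntheticRewardDataModule,
        input_table_spec: TableSpec,
        reward_options: RewardOptions,
        data_fetcher: DataFetcher,
    ) -> Tuple[dict, Dataset, Optional[Dataset]]:
        # Step 1: derive per-feature preprocessing (normalization) parameters from the table.
        normalization_data_map = data_module.run_feature_identification(input_table_spec)
        # Step 2: massage the input table into the Dataset format expected by the trainer.
        train_dataset = data_module.query_data(
            input_table_spec=input_table_spec,
            sample_range=(0.0, 0.8),  # hypothetical 80/20 train/eval split
            reward_options=reward_options,
            data_fetcher=data_fetcher,
        )
        eval_dataset = None
        if data_module.should_generate_eval_dataset:
            eval_dataset = data_module.query_data(
                input_table_spec=input_table_spec,
                sample_range=(0.8, 1.0),
                reward_options=reward_options,
                data_fetcher=data_fetcher,
            )
        return normalization_data_map, train_dataset, eval_dataset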

reagent.model_managers.model_based.world_model module

class reagent.model_managers.model_based.world_model.WorldModel(reward_boost: Optional[Dict[str, float]] = None, trainer_param: reagent.core.parameters.MDNRNNTrainerParameters = <factory>)

Bases: reagent.model_managers.world_model_base.WorldModelBase

build_trainer(normalization_data_map: Dict[str, reagent.core.parameters.NormalizationData], use_gpu: bool, reward_options: Optional[reagent.workflow.types.RewardOptions] = None) reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer

Implement this to build the trainer, given the config

TODO: This function should return ReAgentLightningModule & the dictionary of modules created

trainer_param: reagent.core.parameters.MDNRNNTrainerParameters
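
As with the other world-model managers, a minimal sketch assuming a prebuilt normalization_data_map:

    from typing import Dict

    from reagent.core.parameters import NormalizationData
    from reagent.model_managers.model_based import WorldModel


    def build_world_model(normalization_data_map: Dict[str, NormalizationData], use_gpu: bool = False):
        # trainer_param defaults to MDNRNNTrainerParameters via its dataclass factory.
        manager = WorldModel()
        # Per the signature above, this returns an MDNRNNTrainer (mixture-density-network RNN).
        return manager.build_trainer(
            normalization_data_map=normalization_data_map,
            use_gpu=use_gpu,
        )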

Module contents

The following classes are re-exported at the package level and are documented in full in the submodule entries above: reagent.model_managers.model_based.CrossEntropyMethod, reagent.model_managers.model_based.Seq2RewardModel, reagent.model_managers.model_based.SyntheticReward, and reagent.model_managers.model_based.WorldModel.