
ReAgent: Applied Reinforcement Learning Platform
Overview
ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent, please read the release post and the white paper.
The source code is available here: Source code.
The platform was once named “Horizon” but we have adopted the name “ReAgent” recently to emphasize its broader scope in decision making and reasoning.
Algorithms Supported
Classic Off-Policy algorithms:
Discrete-Action DQN
Parametric-Action DQN
Twin Delayed DDPG (TD3)
Soft Actor-Critic (SAC)
RL for recommender systems:
Counterfactual Evaluation:
Doubly Robust (for bandits)
Doubly Robust (for sequential decisions)
Multi-Arm and Contextual Bandits:
Others:
Installation
ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found here: Installation.
Tutorial
ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator. In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time. Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it, we rely on counterfactual policy evaluation (CPE), a set of techniques for estimating the performance of a policy from data logged under the actions of another policy.
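To make the idea of CPE concrete, here is a minimal sketch of two common estimators for the bandit (single-step) case: inverse propensity scoring and doubly robust. It is not ReAgent's implementation; the `LoggedSample` record layout, the function names, and the toy data are all assumptions made for illustration, and contexts and actions are plain integers to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedSample:
    """One logged bandit interaction (hypothetical record layout)."""
    context: int         # discrete context id, kept simple for the sketch
    action: int          # action chosen by the logging policy
    reward: float        # reward observed for that action
    logging_prob: float  # probability the logging policy assigned to `action`


def ips_estimate(
    samples: Sequence[LoggedSample],
    target_prob: Callable[[int, int], float],
) -> float:
    """Inverse propensity scoring: reweight each logged reward by
    pi_target(a | x) / pi_logging(a | x) and average."""
    total = 0.0
    for s in samples:
        weight = target_prob(s.context, s.action) / s.logging_prob
        total += weight * s.reward
    return total / len(samples)


def doubly_robust_estimate(
    samples: Sequence[LoggedSample],
    target_prob: Callable[[int, int], float],
    reward_model: Callable[[int, int], float],
    actions: Sequence[int],
) -> float:
    """Doubly robust: a model-based value estimate plus an
    importance-weighted correction on the model's residual."""
    total = 0.0
    for s in samples:
        # Expected reward of the target policy according to the reward model.
        direct = sum(
            target_prob(s.context, a) * reward_model(s.context, a)
            for a in actions
        )
        # Correct the model using the one action that was actually logged.
        weight = target_prob(s.context, s.action) / s.logging_prob
        correction = weight * (s.reward - reward_model(s.context, s.action))
        total += direct + correction
    return total / len(samples)


# Toy data (assumed): a uniform logging policy over two actions,
# where action 1 always pays 1.0 and action 0 always pays 0.0.
samples = [
    LoggedSample(context=0, action=0, reward=0.0, logging_prob=0.5),
    LoggedSample(context=0, action=1, reward=1.0, logging_prob=0.5),
    LoggedSample(context=0, action=1, reward=1.0, logging_prob=0.5),
    LoggedSample(context=0, action=0, reward=0.0, logging_prob=0.5),
]


def target_prob(context: int, action: int) -> float:
    """Target policy under evaluation: always choose action 1."""
    return 1.0 if action == 1 else 0.0


def reward_model(context: int, action: int) -> float:
    """Deliberately crude reward model: predicts 0.5 everywhere."""
    return 0.5


ips_value = ips_estimate(samples, target_prob)
dr_value = doubly_robust_estimate(samples, target_prob, reward_model, [0, 1])
print(ips_value, dr_value)
```

On this toy data both estimators recover the true value of the target policy even though the policy was never deployed and the reward model is poor. In practice the doubly robust estimate tends to have lower variance than plain importance weighting when the reward model is reasonable.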
We also have a set of tools to facilitate applying RL in real-world applications:
Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for applying batch RL
Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely
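As a rough illustration of what cloning the logging policy means, the sketch below estimates the logging policy's action probabilities from logged (state, action) pairs by counting. This is a tabular stand-in, not ReAgent's Behavior Cloning tool; the function name `clone_logging_policy`, the `smoothing` parameter, and the toy log are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def clone_logging_policy(
    logged_pairs: Iterable[Tuple[Hashable, int]],
    num_actions: int,
    smoothing: float = 1.0,
) -> Dict[Hashable, List[float]]:
    """Estimate pi(action | state) from logged (state, action) pairs.

    Uses action counts per state with additive (Laplace) smoothing so that
    actions never seen in a state still get a small non-zero probability.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, action in logged_pairs:
        counts[state][action] += 1

    policy: Dict[Hashable, List[float]] = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values()) + smoothing * num_actions
        policy[state] = [
            (action_counts[a] + smoothing) / total for a in range(num_actions)
        ]
    return policy


# Toy log (assumed): state "s0" mostly saw action 0, state "s1" saw action 1.
logged = [("s0", 0), ("s0", 0), ("s0", 1), ("s1", 1)]
cloned_policy = clone_logging_policy(logged, num_actions=2)
print(cloned_policy)
```

A learning policy initialized from such an estimate starts out behaving like the logging policy, which is the sense in which cloning bootstraps learning safely. With continuous or high-dimensional states, a supervised model would replace the counting step.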
Detailed instructions on how to use ReAgent can be found here: Usage.
License
Citing
To cite our work, use:
@article{gauci2018horizon,
title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
journal={arXiv preprint arXiv:1811.00260},
year={2018}
}
Table of Contents
Getting Started
Advanced Topics
Package Reference
- Core
- Submodules
- reagent.core.aggregators module
- reagent.core.base_dataclass module
- reagent.core.configuration module
- reagent.core.dataclasses module
- reagent.core.debug_on_error module
- reagent.core.fb_checker module
- reagent.core.multiprocess_utils module
- reagent.core.observers module
- reagent.core.oss_tensorboard_logger module
- reagent.core.parameters module
- reagent.core.parameters_seq2slate module
- reagent.core.registry_meta module
- reagent.core.report_utils module
- reagent.core.result_registries module
- reagent.core.result_types module
- reagent.core.running_stats module
- reagent.core.tagged_union module
- reagent.core.tensorboardX module
- reagent.core.torch_utils module
- reagent.core.tracker module
- reagent.core.types module
- reagent.core.utils module
- Module contents
- Data
- Gym
- Subpackages
- reagent.gym.agents package
- reagent.gym.datasets package
- reagent.gym.envs package
- Subpackages
- Submodules
- reagent.gym.envs.changing_arms module
- reagent.gym.envs.env_wrapper module
- reagent.gym.envs.gym module
- reagent.gym.envs.oracle_pvm module
- reagent.gym.envs.recsim module
- reagent.gym.envs.toy_vm module
- reagent.gym.envs.utils module
- Module contents
- reagent.gym.policies package
- reagent.gym.preprocessors package
- reagent.gym.runners package
- reagent.gym.tests package
- Subpackages
- Submodules
- reagent.gym.tests.test_gym module
- reagent.gym.tests.test_gym_datasets module
- reagent.gym.tests.test_gym_offline module
- reagent.gym.tests.test_gym_replay_buffer module
- reagent.gym.tests.test_linear_dynamics module
- reagent.gym.tests.test_pomdp module
- reagent.gym.tests.test_world_model module
- Module contents
- Submodules
- reagent.gym.normalizers module
- reagent.gym.types module
- reagent.gym.utils module
- Module contents
- Evaluation
- Subpackages
- Submodules
- reagent.evaluation.cpe module
- reagent.evaluation.doubly_robust_estimator module
- reagent.evaluation.evaluation_data_page module
- reagent.evaluation.evaluator module
- reagent.evaluation.ope_adapter module
- reagent.evaluation.sequential_doubly_robust_estimator module
- reagent.evaluation.weighted_sequential_doubly_robust_estimator module
- reagent.evaluation.world_model_evaluator module
- Module contents
- Lite
- MAB
- Model Managers
- Subpackages
- Submodules
- reagent.model_managers.actor_critic_base module
- reagent.model_managers.discrete_dqn_base module
- reagent.model_managers.model_manager module
- reagent.model_managers.parametric_dqn_base module
- reagent.model_managers.slate_q_base module
- reagent.model_managers.union module
- reagent.model_managers.world_model_base module
- Module contents
- Model Utils
- Net Builders
- Subpackages
- reagent.net_builder.categorical_dqn package
- reagent.net_builder.continuous_actor package
- reagent.net_builder.discrete_actor package
- reagent.net_builder.discrete_dqn package
- reagent.net_builder.parametric_dqn package
- reagent.net_builder.quantile_dqn package
- reagent.net_builder.slate_ranking package
- reagent.net_builder.slate_reward package
- reagent.net_builder.synthetic_reward package
- Submodules
- reagent.net_builder.synthetic_reward.ngram_synthetic_reward module
- reagent.net_builder.synthetic_reward.sequence_synthetic_reward module
- reagent.net_builder.synthetic_reward.single_step_synthetic_reward module
- reagent.net_builder.synthetic_reward.transformer_synthetic_reward module
- Module contents
- reagent.net_builder.value package
- Submodules
- reagent.net_builder.categorical_dqn_net_builder module
- reagent.net_builder.continuous_actor_net_builder module
- reagent.net_builder.discrete_actor_net_builder module
- reagent.net_builder.discrete_dqn_net_builder module
- reagent.net_builder.parametric_dqn_net_builder module
- reagent.net_builder.quantile_dqn_net_builder module
- reagent.net_builder.slate_ranking_net_builder module
- reagent.net_builder.slate_reward_net_builder module
- reagent.net_builder.synthetic_reward_net_builder module
- reagent.net_builder.unions module
- reagent.net_builder.value_net_builder module
- Module contents
- Optimizers
- Submodules
- reagent.optimizer.optimizer module
- reagent.optimizer.scheduler module
- reagent.optimizer.scheduler_union module
- reagent.optimizer.soft_update module
- reagent.optimizer.uninferrable_optimizers module
- reagent.optimizer.uninferrable_schedulers module
- reagent.optimizer.union module
- reagent.optimizer.utils module
- Module contents
- Models
- Submodules
- reagent.models.actor module
- reagent.models.base module
- reagent.models.bcq module
- reagent.models.categorical_dqn module
- reagent.models.cem_planner module
- reagent.models.containers module
- reagent.models.convolutional_network module
- reagent.models.critic module
- reagent.models.dqn module
- reagent.models.dueling_q_network module
- reagent.models.embedding_bag_concat module
- reagent.models.fully_connected_network module
- reagent.models.linear_regression module
- reagent.models.mdn_rnn module
- reagent.models.mlp_scorer module
- reagent.models.model_feature_config_provider module
- reagent.models.no_soft_update_embedding module
- reagent.models.seq2reward_model module
- reagent.models.seq2slate module
- reagent.models.seq2slate_reward module
- reagent.models.synthetic_reward module
- reagent.models.world_model module
- Module contents
- Prediction
- Preprocessing
- Submodules
- reagent.preprocessing.batch_preprocessor module
- reagent.preprocessing.identify_types module
- reagent.preprocessing.normalization module
- reagent.preprocessing.postprocessor module
- reagent.preprocessing.preprocessor module
- reagent.preprocessing.sparse_preprocessor module
- reagent.preprocessing.sparse_to_dense module
- reagent.preprocessing.transforms module
- reagent.preprocessing.types module
- Module contents
- Training
- Subpackages
- Submodules
- reagent.training.c51_trainer module
- reagent.training.cem_trainer module
- reagent.training.discrete_crr_trainer module
- reagent.training.dqn_trainer module
- reagent.training.dqn_trainer_base module
- reagent.training.imitator_training module
- reagent.training.multi_stage_trainer module
- reagent.training.parameters module
- reagent.training.parametric_dqn_trainer module
- reagent.training.ppo_trainer module
- reagent.training.qrdqn_trainer module
- reagent.training.reagent_lightning_module module
- reagent.training.reinforce_trainer module
- reagent.training.reward_network_trainer module
- reagent.training.rl_trainer_pytorch module
- reagent.training.sac_trainer module
- reagent.training.slate_q_trainer module
- reagent.training.td3_trainer module
- reagent.training.utils module
- Module contents
- Workflow
- All Modules
- reagent package
- Subpackages
- reagent.core package
- Submodules
- reagent.core.aggregators module
- reagent.core.base_dataclass module
- reagent.core.configuration module
- reagent.core.dataclasses module
- reagent.core.debug_on_error module
- reagent.core.fb_checker module
- reagent.core.multiprocess_utils module
- reagent.core.observers module
- reagent.core.oss_tensorboard_logger module
- reagent.core.parameters module
- reagent.core.parameters_seq2slate module
- reagent.core.registry_meta module
- reagent.core.report_utils module
- reagent.core.result_registries module
- reagent.core.result_types module
- reagent.core.running_stats module
- reagent.core.tagged_union module
- reagent.core.tensorboardX module
- reagent.core.torch_utils module
- reagent.core.tracker module
- reagent.core.types module
- reagent.core.utils module
- Module contents
- reagent.data package
- reagent.evaluation package
- Subpackages
- Submodules
- reagent.evaluation.cpe module
- reagent.evaluation.doubly_robust_estimator module
- reagent.evaluation.evaluation_data_page module
- reagent.evaluation.evaluator module
- reagent.evaluation.ope_adapter module
- reagent.evaluation.sequential_doubly_robust_estimator module
- reagent.evaluation.weighted_sequential_doubly_robust_estimator module
- reagent.evaluation.world_model_evaluator module
- Module contents
- reagent.gym package
- Subpackages
- reagent.gym.agents package
- reagent.gym.datasets package
- reagent.gym.envs package
- Subpackages
- Submodules
- reagent.gym.envs.changing_arms module
- reagent.gym.envs.env_wrapper module
- reagent.gym.envs.gym module
- reagent.gym.envs.oracle_pvm module
- reagent.gym.envs.recsim module
- reagent.gym.envs.toy_vm module
- reagent.gym.envs.utils module
- Module contents
- reagent.gym.policies package
- reagent.gym.preprocessors package
- reagent.gym.runners package
- reagent.gym.tests package
- Subpackages
- Submodules
- reagent.gym.tests.test_gym module
- reagent.gym.tests.test_gym_datasets module
- reagent.gym.tests.test_gym_offline module
- reagent.gym.tests.test_gym_replay_buffer module
- reagent.gym.tests.test_linear_dynamics module
- reagent.gym.tests.test_pomdp module
- reagent.gym.tests.test_world_model module
- Module contents
- Submodules
- reagent.gym.normalizers module
- reagent.gym.types module
- reagent.gym.utils module
- Module contents
- reagent.lite package
- reagent.mab package
- reagent.model_managers package
- Subpackages
- Submodules
- reagent.model_managers.actor_critic_base module
- reagent.model_managers.discrete_dqn_base module
- reagent.model_managers.model_manager module
- reagent.model_managers.parametric_dqn_base module
- reagent.model_managers.slate_q_base module
- reagent.model_managers.union module
- reagent.model_managers.world_model_base module
- Module contents
- reagent.model_utils package
- reagent.models package
- Submodules
- reagent.models.actor module
- reagent.models.base module
- reagent.models.bcq module
- reagent.models.categorical_dqn module
- reagent.models.cem_planner module
- reagent.models.containers module
- reagent.models.convolutional_network module
- reagent.models.critic module
- reagent.models.dqn module
- reagent.models.dueling_q_network module
- reagent.models.embedding_bag_concat module
- reagent.models.fully_connected_network module
- reagent.models.linear_regression module
- reagent.models.mdn_rnn module
- reagent.models.mlp_scorer module
- reagent.models.model_feature_config_provider module
- reagent.models.no_soft_update_embedding module
- reagent.models.seq2reward_model module
- reagent.models.seq2slate module
- reagent.models.seq2slate_reward module
- reagent.models.synthetic_reward module
- reagent.models.world_model module
- Module contents
- reagent.net_builder package
- Subpackages
- reagent.net_builder.categorical_dqn package
- reagent.net_builder.continuous_actor package
- reagent.net_builder.discrete_actor package
- reagent.net_builder.discrete_dqn package
- reagent.net_builder.parametric_dqn package
- reagent.net_builder.quantile_dqn package
- reagent.net_builder.slate_ranking package
- reagent.net_builder.slate_reward package
- reagent.net_builder.synthetic_reward package
- Submodules
- reagent.net_builder.synthetic_reward.ngram_synthetic_reward module
- reagent.net_builder.synthetic_reward.sequence_synthetic_reward module
- reagent.net_builder.synthetic_reward.single_step_synthetic_reward module
- reagent.net_builder.synthetic_reward.transformer_synthetic_reward module
- Module contents
- reagent.net_builder.value package
- Submodules
- reagent.net_builder.categorical_dqn_net_builder module
- reagent.net_builder.continuous_actor_net_builder module
- reagent.net_builder.discrete_actor_net_builder module
- reagent.net_builder.discrete_dqn_net_builder module
- reagent.net_builder.parametric_dqn_net_builder module
- reagent.net_builder.quantile_dqn_net_builder module
- reagent.net_builder.slate_ranking_net_builder module
- reagent.net_builder.slate_reward_net_builder module
- reagent.net_builder.synthetic_reward_net_builder module
- reagent.net_builder.unions module
- reagent.net_builder.value_net_builder module
- Module contents
- reagent.ope package
- Subpackages
- reagent.ope.datasets package
- reagent.ope.estimators package
- reagent.ope.test package
- reagent.ope.trainers package
- Submodules
- reagent.ope.utils module
- Module contents
- reagent.optimizer package
- Submodules
- reagent.optimizer.optimizer module
- reagent.optimizer.scheduler module
- reagent.optimizer.scheduler_union module
- reagent.optimizer.soft_update module
- reagent.optimizer.uninferrable_optimizers module
- reagent.optimizer.uninferrable_schedulers module
- reagent.optimizer.union module
- reagent.optimizer.utils module
- Module contents
- reagent.prediction package
- reagent.preprocessing package
- Submodules
- reagent.preprocessing.batch_preprocessor module
- reagent.preprocessing.identify_types module
- reagent.preprocessing.normalization module
- reagent.preprocessing.postprocessor module
- reagent.preprocessing.preprocessor module
- reagent.preprocessing.sparse_preprocessor module
- reagent.preprocessing.sparse_to_dense module
- reagent.preprocessing.transforms module
- reagent.preprocessing.types module
- Module contents
- reagent.publishers package
- reagent.replay_memory package
- reagent.reporting package
- Submodules
- reagent.reporting.actor_critic_reporter module
- reagent.reporting.compound_reporter module
- reagent.reporting.discrete_crr_reporter module
- reagent.reporting.discrete_dqn_reporter module
- reagent.reporting.parametric_dqn_reporter module
- reagent.reporting.reporter_base module
- reagent.reporting.reward_network_reporter module
- reagent.reporting.seq2reward_reporter module
- reagent.reporting.slate_q_reporter module
- reagent.reporting.td3_reporter module
- reagent.reporting.world_model_reporter module
- Module contents
- reagent.samplers package
- reagent.scripts package
- reagent.training package
- Subpackages
- Submodules
- reagent.training.c51_trainer module
- reagent.training.cem_trainer module
- reagent.training.discrete_crr_trainer module
- reagent.training.dqn_trainer module
- reagent.training.dqn_trainer_base module
- reagent.training.imitator_training module
- reagent.training.multi_stage_trainer module
- reagent.training.parameters module
- reagent.training.parametric_dqn_trainer module
- reagent.training.ppo_trainer module
- reagent.training.qrdqn_trainer module
- reagent.training.reagent_lightning_module module
- reagent.training.reinforce_trainer module
- reagent.training.reward_network_trainer module
- reagent.training.rl_trainer_pytorch module
- reagent.training.sac_trainer module
- reagent.training.slate_q_trainer module
- reagent.training.td3_trainer module
- reagent.training.utils module
- Module contents
- reagent.validators package
- reagent.workflow package
- Module contents