reagent.models package

Submodules

reagent.models.actor module

class reagent.models.actor.DirichletFullyConnectedActor(state_dim, action_dim, sizes, activations, use_batch_norm=False)

Bases: reagent.models.base.ModelBase

EPSILON = 1e-06
forward(state)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
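
To make the note above concrete, here is a toy stand-in written in plain Python. It is not torch.nn.Module: the class name TinyModule and the helper register_hook are hypothetical and only mimic the idea that calling the instance runs registered hooks while calling forward() directly skips them.

```python
class TinyModule:
    """Toy stand-in for a module; NOT the real torch.nn.Module."""

    def __init__(self):
        self._hooks = []

    def register_hook(self, hook):
        # Hypothetical helper: remember a callable to run after forward().
        self._hooks.append(hook)

    def forward(self, x):
        # The "recipe" for the computation lives here.
        return 2 * x

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        out = self.forward(x)
        for hook in self._hooks:
            hook(self, x, out)
        return out


calls = []
module = TinyModule()
module.register_hook(lambda mod, inp, out: calls.append((inp, out)))

direct = module.forward(3)  # same result, but the hook is skipped
via_call = module(3)        # same result, and the hook runs
```

Both calls return the same value; only the second one leaves an entry in calls.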

get_log_prob(state, action)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
class reagent.models.actor.FullyConnectedActor(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], use_batch_norm: bool = False, action_activation: str = 'tanh', exploration_variance: Optional[float] = None)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData) reagent.core.types.ActorOutput

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
class reagent.models.actor.GaussianFullyConnectedActor(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], scale: float = 0.05, use_batch_norm: bool = False, use_layer_norm: bool = False, use_l2_normalization: bool = False)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(state: reagent.core.types.FeatureData, squashed_action: torch.Tensor)

Action is expected to be squashed with tanh
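
As a rough illustration of why the squashing matters, the sketch below computes the log-density of a tanh-squashed diagonal Gaussian in plain Python, using the standard change-of-variables correction. That get_log_prob follows exactly this formula is an assumption; the function name and the mean and std arguments are hypothetical.

```python
import math


def squashed_gaussian_log_prob(squashed_action, mean, std):
    """Log-density of a = tanh(u) with u ~ N(mean, std^2), independent per dimension."""
    log_prob = 0.0
    for a, mu, sigma in zip(squashed_action, mean, std):
        u = math.atanh(a)  # undo the squashing; requires -1 < a < 1
        # Gaussian log-density of the pre-squash value u.
        log_prob += (
            -0.5 * ((u - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2 * math.pi)
        )
        # Change of variables: subtract log|d tanh(u)/du| = log(1 - a^2).
        log_prob -= math.log(1.0 - a * a)
    return log_prob


lp_center = squashed_gaussian_log_prob([0.0], mean=[0.0], std=[1.0])
lp_two_dims = squashed_gaussian_log_prob([0.0, 0.5], mean=[0.0, 0.0], std=[1.0, 1.0])
```

At a = 0 the correction term vanishes, so the value equals the plain Gaussian log-density.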

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
class reagent.models.actor.StochasticActor(scorer, sampler)

Bases: reagent.models.base.ModelBase

forward(state)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) Model with EmbeddingBag module is not compatible with vanilla DistributedDataParallel 2) Exporting logic needs structured data. DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool

reagent.models.base module

class reagent.models.base.ModelBase

Bases: torch.nn.modules.module.Module

A base class to support exporting through ONNX

cpu_model()

Override this in DistributedDataParallel models

feature_config() Optional[reagent.core.types.ModelFeatureConfig]

If the model needs additional preprocessing, e.g., using sequence features, return the config here.

get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) Model with EmbeddingBag module is not compatible with vanilla DistributedDataParallel 2) Exporting logic needs structured data. DistributedDataParallel doesn’t work with structured data.

get_target_network()

Return a copy of this network to be used as target network

Subclass should override this if the target network should share parameters with the network to be trained.

input_prototype() Any

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool

reagent.models.bcq module

class reagent.models.bcq.BatchConstrainedDQN(state_dim, q_network, imitator_network, bcq_drop_threshold)

Bases: reagent.models.base.ModelBase

forward(state)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
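
The role of bcq_drop_threshold can be sketched in plain Python as follows. This follows the general discrete batch-constrained Q-learning idea of masking actions that the imitator considers unlikely; that the forward pass here matches it exactly is an assumption, and the function name and the penalty value are hypothetical.

```python
import math


def bcq_constrained_q_values(q_values, imitator_logits, drop_threshold, penalty=-1e10):
    """Penalize Q-values of actions the imitator considers unlikely."""
    # Softmax over the imitator's logits (shifted by the max for stability).
    top = max(imitator_logits)
    exps = [math.exp(logit - top) for logit in imitator_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(probs)
    # An action is dropped when its probability, relative to the most
    # likely action, falls below the threshold.
    return [
        q + penalty if p / best < drop_threshold else q
        for q, p in zip(q_values, probs)
    ]


constrained = bcq_constrained_q_values(
    q_values=[1.0, 5.0, 2.0],
    imitator_logits=[2.0, -3.0, 1.5],
    drop_threshold=0.1,
)
best_action = max(range(len(constrained)), key=constrained.__getitem__)
```

The second action has the largest raw Q-value but is rarely chosen by the imitator, so it is masked and the greedy choice moves to the third action.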

reagent.models.categorical_dqn module

class reagent.models.categorical_dqn.CategoricalDQN(distributional_network: reagent.models.base.ModelBase, *, qmin: float, qmax: float, num_atoms: int)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

log_dist(state: reagent.core.types.FeatureData) torch.Tensor
training: bool
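
A minimal sketch of how qmin, qmax and num_atoms are typically used in a categorical (distributional) Q-network: the atoms form an evenly spaced support and the Q-value is the expectation of that support under the predicted distribution. Whether CategoricalDQN computes its output exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def expected_q_from_logits(atom_logits, qmin, qmax, num_atoms):
    """Expected value of an evenly spaced support under softmax(atom_logits)."""
    # Evenly spaced atoms z_0 .. z_{N-1} between qmin and qmax (needs num_atoms >= 2).
    step = (qmax - qmin) / (num_atoms - 1)
    support = [qmin + i * step for i in range(num_atoms)]
    # Softmax over the per-atom logits (shifted by the max for stability).
    top = max(atom_logits)
    exps = [math.exp(logit - top) for logit in atom_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * z for p, z in zip(probs, support))


q_uniform = expected_q_from_logits([0.0, 0.0, 0.0], qmin=-1.0, qmax=1.0, num_atoms=3)
q_peaked = expected_q_from_logits([0.0, 0.0, 50.0], qmin=-1.0, qmax=1.0, num_atoms=3)
```

A uniform distribution over a symmetric support gives a Q-value of zero, while a distribution concentrated on the last atom gives a value close to qmax.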

reagent.models.cem_planner module

A network which implements a cross entropy method-based planner

The planner plans the best next action based on simulation data generated by an ensemble of world models.

The idea is inspired by: https://arxiv.org/abs/1805.12114
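
The cross entropy method itself can be sketched in a few lines of plain Python. This is a one-dimensional toy under stated assumptions: the world-model ensemble is replaced by a hypothetical score function, and the way alpha (smoothing) and epsilon (variance threshold for early stopping) are used here is an assumption about their meaning, not a description of CEMPlannerNetwork.

```python
import random


def cem_maximize(score_fn, num_iterations=20, population_size=50, num_elites=10,
                 alpha=0.25, epsilon=0.001, init_mean=0.0, init_var=4.0, seed=0):
    """Search for a scalar that maximizes score_fn with the cross entropy method."""
    rng = random.Random(seed)
    mean, var = init_mean, init_var
    for _ in range(num_iterations):
        if var < epsilon:  # stop once the search distribution has collapsed
            break
        # Sample a population of candidate solutions from the current Gaussian.
        candidates = [rng.gauss(mean, var ** 0.5) for _ in range(population_size)]
        # Keep the best-scoring candidates as elites.
        elites = sorted(candidates, key=score_fn, reverse=True)[:num_elites]
        elite_mean = sum(elites) / num_elites
        elite_var = sum((e - elite_mean) ** 2 for e in elites) / num_elites
        # Smoothed update: keep a fraction alpha of the old statistics.
        mean = alpha * mean + (1 - alpha) * elite_mean
        var = alpha * var + (1 - alpha) * elite_var
    return mean


best = cem_maximize(lambda x: -(x - 3.0) ** 2)
```

In the planner, each candidate would be a sequence of actions of shape (plan_horizon_length, action_dim) and the score would be the accumulated reward predicted by the world models.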

class reagent.models.cem_planner.CEMPlannerNetwork(mem_net_list: List[reagent.models.world_model.MemoryNetwork], cem_num_iterations: int, cem_population_size: int, ensemble_population_size: int, num_elites: int, plan_horizon_length: int, state_dim: int, action_dim: int, discrete_action: bool, terminal_effective: bool, gamma: float, alpha: float = 0.25, epsilon: float = 0.001, action_upper_bounds: Optional[numpy.ndarray] = None, action_lower_bounds: Optional[numpy.ndarray] = None)

Bases: torch.nn.modules.module.Module

acc_rewards_of_all_solutions(state: reagent.core.types.FeatureData, solutions: torch.Tensor) float

Calculate accumulated rewards of solutions.

Parameters
  • state – the input which contains the starting state

  • solutions – its shape is (cem_pop_size, plan_horizon_length, action_dim)

Returns

a vector of size cem_pop_size, which is the reward of each solution

acc_rewards_of_one_solution(init_state: torch.Tensor, solution: torch.Tensor, solution_idx: int)

ensemble_pop_size trajectories will be sampled to evaluate a CEM solution. Each trajectory is generated by one world model

Parameters
  • init_state – its shape is (state_dim, )

  • solution – its shape is (plan_horizon_length, action_dim)

  • solution_idx – the index of the solution

Returns

reward of each of the ensemble_pop_size trajectories

constrained_variance(mean, var)
continuous_planning(state: reagent.core.types.FeatureData) torch.Tensor
discrete_planning(state: reagent.core.types.FeatureData) Tuple[int, numpy.ndarray]
forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

sample_reward_next_state_terminal(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData, mem_net: reagent.models.world_model.MemoryNetwork)

Sample one-step dynamics based on the provided world model

training: bool

reagent.models.containers module

class reagent.models.containers.Sequential(*args: torch.nn.modules.module.Module)
class reagent.models.containers.Sequential(arg: collections.OrderedDict[str, torch.nn.modules.module.Module])

Bases: torch.nn.modules.container.Sequential, reagent.models.base.ModelBase

Use this instead of torch.nn.Sequential to automate model tracing

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

reagent.models.convolutional_network module

class reagent.models.convolutional_network.ConvolutionalNetwork(cnn_parameters, layers, activations, use_layer_norm)

Bases: torch.nn.modules.module.Module

conv_forward(input)
forward(input) torch.FloatTensor

Forward pass for generic convnet DNNs. Assumes activation names are valid PyTorch activation names.

Parameters
  • input – image tensor

training: bool

reagent.models.critic module

class reagent.models.critic.FullyConnectedCritic(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], use_batch_norm: bool = False, use_layer_norm: bool = False, output_dim: int = 1)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool

reagent.models.dqn module

class reagent.models.dqn.FullyConnectedDQN(state_dim, action_dim, sizes, activations, *, output_activation: str = 'linear', num_atoms: Optional[int] = None, use_batch_norm: bool = False, dropout_ratio: float = 0.0, normalized_output: bool = False, use_layer_norm: bool = False)

Bases: reagent.models.fully_connected_network.FloatFeatureFullyConnected

forward(state: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

reagent.models.dueling_q_network module

class reagent.models.dueling_q_network.DuelingQNetwork(*, shared_network: reagent.models.base.ModelBase, advantage_network: reagent.models.base.ModelBase, value_network: reagent.models.base.ModelBase)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

classmethod make_fully_connected(state_dim: int, action_dim: int, layers: List[int], activations: List[str], num_atoms: Optional[int] = None, use_batch_norm: bool = False)
training: bool
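
For orientation, the standard dueling aggregation combines a state value with per-action advantages after centering the advantages. The sketch below shows that aggregation in plain Python; that DuelingQNetwork uses the mean-centered variant is an assumption, and the function name is hypothetical.

```python
def dueling_q_values(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


q = dueling_q_values(value=1.0, advantages=[0.5, -0.5, 0.0])
```

Centering the advantages keeps the decomposition identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.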
class reagent.models.dueling_q_network.ParametricDuelingQNetwork(*, shared_network: reagent.models.base.ModelBase, advantage_network: reagent.models.base.ModelBase, value_network: reagent.models.base.ModelBase)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

classmethod make_fully_connected(state_dim: int, action_dim: int, layers: List[int], activations: List[str], use_batch_norm: bool = False)
training: bool

reagent.models.embedding_bag_concat module

class reagent.models.embedding_bag_concat.EmbeddingBagConcat(state_dim: int, model_feature_config: reagent.core.types.ModelFeatureConfig, embedding_dim: int)

Bases: reagent.models.base.ModelBase

Concatenating embedding with float features before passing the input to DQN

forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

property output_dim: int
training: bool

reagent.models.fully_connected_network module

class reagent.models.fully_connected_network.FloatFeatureFullyConnected(state_dim, output_dim, sizes, activations, *, output_activation: str = 'linear', num_atoms: Optional[int] = None, use_batch_norm: bool = False, dropout_ratio: float = 0.0, normalized_output: bool = False, use_layer_norm: bool = False)

Bases: reagent.models.base.ModelBase

A fully connected network that takes FloatFeatures input and supports distributional prediction.

forward(state: reagent.core.types.FeatureData) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
class reagent.models.fully_connected_network.FullyConnectedNetwork(layers, activations, *, use_batch_norm: bool = False, min_std: float = 0.0, dropout_ratio: float = 0.0, use_layer_norm: bool = False, normalize_output: bool = False, orthogonal_init: bool = False)

Bases: reagent.models.base.ModelBase

forward(input: torch.Tensor) torch.Tensor

Forward pass for generic feed-forward DNNs. Assumes activation names are valid PyTorch activation names.

Parameters
  • input – tensor

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
class reagent.models.fully_connected_network.SlateBatchNorm1d(*args, **kwargs)

Bases: torch.nn.modules.module.Module

Same as nn.BatchNorm1d if the input has shape (batch_size, feat_dim). But if the input has shape (batch_size, num_candidates, item_feats), like in LearnedVM, we transpose it, since nn.BatchNorm1d computes Batch Normalization over the 1st dimension, while we want to compute it over item_feats.

NOTE: this is different from nn.BatchNorm2d which is for CNNs, and expects 4D inputs

forward(x: torch.Tensor)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
reagent.models.fully_connected_network.gaussian_fill_w_gain(tensor, gain, dim_in, min_std=0.0) None

Gaussian initialization with gain.

reagent.models.linear_regression module

class reagent.models.linear_regression.LinearRegressionUCB(input_dim: int, *, l2_reg_lambda: float = 1.0, predict_ucb: float = False, ucb_alpha: float = 1.0)

Bases: reagent.models.base.ModelBase

A linear regression model for LinUCB. Note that instead of being trained by a PyTorch optimizer, we explicitly update attributes A and b (according to the LinUCB formulas implemented in reagent.training.cb.linucb_trainer.LinUCBTrainer).

Since computing the regression coefficients involves matrix inversion (an expensive op), we save time by only computing the coefficients when necessary (when doing inference).

Parameters
  • input_dim – Dimension of input data

  • l2_reg_lambda – The weight on L2 regularization

  • predict_ucb – If True, the model outputs an Upper Confidence Bound (UCB). If False, the model outputs the point estimate

  • ucb_alpha – The coefficient on the standard deviation in UCB formula. Only used if predict_ucb=True.
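
The parameters above map onto the textbook LinUCB quantities. The plain-Python sketch below is a hypothetical stand-in (TinyLinUCB, update, predict and solve are illustrative names, not ReAgent APIs) that assumes the usual updates A += x xᵀ and b += r x, and recomputes the coefficients only when a prediction is requested.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


class TinyLinUCB:
    """Hypothetical stand-in; NOT reagent's LinearRegressionUCB."""

    def __init__(self, input_dim, l2_reg_lambda=1.0, ucb_alpha=1.0):
        # A starts as lambda * I, which acts as the L2 regularizer.
        self.A = [[l2_reg_lambda if i == j else 0.0 for j in range(input_dim)]
                  for i in range(input_dim)]
        self.b = [0.0] * input_dim
        self.ucb_alpha = ucb_alpha
        self._coefs = None  # computed lazily, only when predicting

    def update(self, x, reward):
        for i in range(len(x)):
            self.b[i] += reward * x[i]
            for j in range(len(x)):
                self.A[i][j] += x[i] * x[j]
        self._coefs = None  # cached coefficients are now stale

    def predict(self, x, predict_ucb=False):
        if self._coefs is None:
            self._coefs = solve(self.A, self.b)
        mean = sum(c * v for c, v in zip(self._coefs, x))
        if not predict_ucb:
            return mean
        variance = sum(v * s for v, s in zip(x, solve(self.A, x)))  # x^T A^-1 x
        return mean + self.ucb_alpha * math.sqrt(variance)


model = TinyLinUCB(input_dim=1)
model.update([1.0], reward=2.0)
point_estimate = model.predict([1.0])
upper_bound = model.predict([1.0], predict_ucb=True)
```

Solving the linear system on demand avoids forming the inverse after every update, which is the same motivation the class description gives for deferring the coefficient computation to inference time.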

forward(inp: torch.Tensor, ucb_alpha: Optional[float] = None) torch.Tensor

Forward can return the mean or a UCB. If returning a UCB, the CI width is stddev * ucb_alpha. If ucb_alpha is not passed in, the fixed alpha from init is used.

input_prototype() torch.Tensor

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool
reagent.models.linear_regression.batch_quadratic_form(x: torch.Tensor, A: torch.Tensor) torch.Tensor

Compute the quadratic form x^T * A * x for a batched input x. Inspired by https://stackoverflow.com/questions/18541851/calculate-vt-a-v-for-a-matrix-of-vectors-v

This is a vectorized implementation of out[i] = x[i].t() @ A @ x[i]

  • x shape: (B, N)

  • A shape: (N, N)

  • output shape: (B)
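
A non-vectorized reference in plain Python can help when checking the shapes. The function name quadratic_form_per_row is hypothetical and uses nested lists in place of tensors.

```python
def quadratic_form_per_row(x, A):
    """out[i] = x[i]^T A x[i] for x of shape (B, N) and A of shape (N, N)."""
    out = []
    for row in x:
        # A @ x[i], one entry per row of A.
        Ax = [sum(a_ij * v for a_ij, v in zip(a_row, row)) for a_row in A]
        # x[i] . (A @ x[i])
        out.append(sum(v * w for v, w in zip(row, Ax)))
    return out


x = [[1.0, 2.0], [0.0, 1.0]]
A = [[2.0, 0.0], [0.0, 3.0]]
result = quadratic_form_per_row(x, A)  # one scalar per row of x
```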

reagent.models.mdn_rnn module

class reagent.models.mdn_rnn.MDNRNN(state_dim, action_dim, num_hiddens, num_hidden_layers, num_gaussians)

Bases: torch.nn.modules.module.Module

Mixture Density Network - Recurrent Neural Network

forward(actions: torch.Tensor, states: torch.Tensor, hidden=None)

Forward pass of MDN-RNN

Parameters
  • actions – (SEQ_LEN, BATCH_SIZE, ACTION_DIM) torch tensor

  • states – (SEQ_LEN, BATCH_SIZE, STATE_DIM) torch tensor

Returns

parameters of the GMM prediction for the next state, Gaussian prediction of the reward and logit prediction of non-terminality, and the RNN’s outputs.

  • mus: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS, STATE_DIM) torch tensor

  • sigmas: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS, STATE_DIM) torch tensor

  • logpi: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS) torch tensor

  • reward: (SEQ_LEN, BATCH_SIZE) torch tensor

  • not_terminal: (SEQ_LEN, BATCH_SIZE) torch tensor

  • last_step_hidden_and_cell: TUPLE((NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE), (NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)) torch tensor

  • all_steps_hidden: (SEQ_LEN, BATCH_SIZE, HIDDEN_SIZE) torch tensor

get_initial_hidden_state(batch_size=1)
training: bool
class reagent.models.mdn_rnn.MDNRNNMemoryPool(max_replay_memory_size)

Bases: object

deque_sample(indices)
insert_into_memory(state, action, next_state, reward, not_terminal)
property memory_size
sample_memories(batch_size, use_gpu=False) reagent.core.types.MemoryNetworkInput
Parameters
  • batch_size – number of samples to return

  • use_gpu – whether to put samples on gpu

State’s shape is SEQ_LEN x BATCH_SIZE x STATE_DIM, for example. By default, MDN-RNN consumes data with SEQ_LEN as the first dimension.

class reagent.models.mdn_rnn.MDNRNNMemorySample(state, action, next_state, reward, not_terminal)

Bases: NamedTuple

action: numpy.ndarray

Alias for field number 1

next_state: numpy.ndarray

Alias for field number 2

not_terminal: float

Alias for field number 4

reward: float

Alias for field number 3

state: numpy.ndarray

Alias for field number 0
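
The "Alias for field number" entries reflect that every field of a NamedTuple is reachable both by name and by position, in declaration order. A small stand-in (the class name MemorySample is hypothetical, and plain lists replace numpy arrays) shows the correspondence:

```python
from typing import List, NamedTuple


class MemorySample(NamedTuple):
    """Hypothetical stand-in with the same field order as MDNRNNMemorySample."""

    state: List[float]
    action: List[float]
    next_state: List[float]
    reward: float
    not_terminal: float


sample = MemorySample(
    state=[0.1, 0.2],
    action=[1.0],
    next_state=[0.3, 0.4],
    reward=1.5,
    not_terminal=1.0,
)

# Positional and named access refer to the same values.
reward_by_index = sample[3]
reward_by_name = sample.reward
```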

reagent.models.mdn_rnn.gmm_loss(batch, mus, sigmas, logpi, reduce=True)

Computes the gmm loss.

Compute minus the log probability of batch under the GMM model described by mus, sigmas, pi. In the shapes below, bs1, bs2, … are the sizes of the batch dimensions (several batch dimensions are useful when you have both a batch axis and a time step axis), gs is the number of mixtures and fs is the number of features.

Parameters
  • batch – (bs1, bs2, *, fs) torch tensor

  • mus – (bs1, bs2, *, gs, fs) torch tensor

  • sigmas – (bs1, bs2, *, gs, fs) torch tensor

  • logpi – (bs1, bs2, *, gs) torch tensor

  • reduce – if not reduce, the mean in the following formula is omitted

Returns

loss(batch) = - mean_{i1=0..bs1, i2=0..bs2, …} log( sum_{k=1..gs} pi[i1, i2, …, k] * N( batch[i1, i2, …, :] | mus[i1, i2, …, k, :], sigmas[i1, i2, …, k, :] ) )

NOTE: The loss is not reduced along the feature dimension (i.e. it should scale linearly with fs).

Adapted from: https://github.com/ctallec/world-models
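
For a single sample, the quantity inside the mean can be written out in plain Python. This is a sketch under stated assumptions: the function name gmm_neg_log_likelihood is hypothetical, each component is a diagonal Gaussian, the mixture weights are given as log-probabilities, and the mean over batch dimensions is left out.

```python
import math


def gmm_neg_log_likelihood(x, mus, sigmas, logpi):
    """-log sum_k pi_k * N(x | mus[k], diag(sigmas[k]^2)) for one sample x of length fs."""
    log_terms = []
    for mu_k, sigma_k, logpi_k in zip(mus, sigmas, logpi):
        # Log-density of a diagonal Gaussian: sum over the fs features.
        log_normal = sum(
            -0.5 * ((v - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
            for v, m, s in zip(x, mu_k, sigma_k)
        )
        log_terms.append(logpi_k + log_normal)
    # Log-sum-exp over the gs mixture components, shifted by the max for stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


single = gmm_neg_log_likelihood([0.0], mus=[[0.0]], sigmas=[[1.0]], logpi=[0.0])
mixture = gmm_neg_log_likelihood(
    [0.0],
    mus=[[0.0], [0.0]],
    sigmas=[[1.0], [1.0]],
    logpi=[math.log(0.5), math.log(0.5)],
)
```

Splitting one component into two identical halves leaves the loss unchanged, which is a quick sanity check on the log-sum-exp step.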

reagent.models.mdn_rnn.transpose(*args)

reagent.models.mlp_scorer module

class reagent.models.mlp_scorer.MLPScorer(mlp: torch.nn.modules.module.Module, has_user_feat: bool = False)

Bases: reagent.models.base.ModelBase

Log-space in and out

forward(obs: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool

reagent.models.model_feature_config_provider module

class reagent.models.model_feature_config_provider.ModelFeatureConfigProvider

Bases: object

REGISTRY = {'raw': <class 'reagent.models.model_feature_config_provider.RawModelFeatureConfigProvider'>}
REGISTRY_FROZEN = True
REGISTRY_NAME = 'ModelFeatureConfigProvider'
abstract get_model_feature_config() reagent.core.types.ModelFeatureConfig
class reagent.models.model_feature_config_provider.RawModelFeatureConfigProvider(float_feature_infos: List[reagent.core.types.FloatFeatureInfo] = <factory>, id_mapping_config: Dict[str, reagent.core.types.IdMappingUnion] = <factory>, id_list_feature_configs: List[reagent.core.types.IdListFeatureConfig] = <factory>, id_score_list_feature_configs: List[reagent.core.types.IdScoreListFeatureConfig] = <factory>)

Bases: reagent.models.model_feature_config_provider.ModelFeatureConfigProvider, reagent.core.types.ModelFeatureConfig

get_model_feature_config() reagent.core.types.ModelFeatureConfig

reagent.models.no_soft_update_embedding module

class reagent.models.no_soft_update_embedding.NoSoftUpdateEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None, device=None, dtype=None)

Bases: torch.nn.modules.sparse.Embedding

Use this instead of vanilla Embedding module to avoid soft-updating the embedding table in the target network.

embedding_dim: int
max_norm: Optional[float]
norm_type: float
num_embeddings: int
padding_idx: Optional[int]
scale_grad_by_freq: bool
sparse: bool
weight: torch.Tensor

reagent.models.seq2reward_model module

class reagent.models.seq2reward_model.Seq2RewardNetwork(state_dim, action_dim, num_hiddens, num_hidden_layers)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData, valid_reward_len: Optional[torch.Tensor] = None)

Forward pass of Seq2Reward

Takes in the current state and uses it as the initial hidden state. The input sequence consists of actions only. Outputs the predicted reward after each time step.

Parameters
  • actions – (SEQ_LEN, BATCH_SIZE, ACTION_DIM) torch tensor

  • states – (SEQ_LEN, BATCH_SIZE, STATE_DIM) torch tensor

  • valid_reward_len – (BATCH_SIZE,) torch tensor

Returns

predicted accumulated rewards at the last step for the given sequence

  • acc_reward: (BATCH_SIZE, 1) torch tensor

get_initial_hidden_state(state, batch_size=1)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what is expected by forward().

training: bool

reagent.models.seq2slate module

class reagent.models.seq2slate.BaselineNet(state_dim, dim_feedforward, num_stacked_layers)

Bases: torch.nn.modules.module.Module

forward(input: reagent.core.types.PreprocessedRankingInput)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.Decoder(layer, num_layers)

Bases: torch.nn.modules.module.Module

Generic num_layers layer decoder with masking.

forward(x, memory, tgt_src_mask, tgt_tgt_mask)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.DecoderLastLayerPytorch(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=<function relu>, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None)

Bases: torch.nn.modules.transformer.TransformerDecoderLayer

The last layer of Decoder. Modified from PyTorch official code: instead of attention embedding, return attention weights which can be directly used to sample items

forward(tgt, memory, tgt_mask, memory_mask)

Pass the inputs (and mask) through the decoder layer.

Parameters
  • tgt – the sequence to the decoder layer (required).

  • memory – the sequence from the last layer of the encoder (required).

  • tgt_mask – the mask for the tgt sequence (optional).

  • memory_mask – the mask for the memory sequence (optional).

  • tgt_key_padding_mask – the mask for the tgt keys per batch (optional).

  • memory_key_padding_mask – the mask for the memory keys per batch (optional).

Shape:

see the docs in Transformer class.

training: bool
class reagent.models.seq2slate.DecoderLayer(size, self_attn, src_attn, feed_forward)

Bases: torch.nn.modules.module.Module

Decoder is made of self-attn, src-attn, and feed forward

forward(x, m, tgt_src_mask, tgt_tgt_mask)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.DecoderPyTorch(dim_model, num_heads, dim_feedforward, num_layers)

Bases: torch.nn.modules.module.Module

Transformer-based decoder based on PyTorch official implementation

forward(tgt_embed, memory, tgt_src_mask, tgt_tgt_mask)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.Embedder(dim_in, dim_out)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.Encoder(layer, num_layers)

Bases: torch.nn.modules.module.Module

Core encoder is a stack of num_layers layers

forward(x, mask)

Pass the input (and mask) through each layer in turn.

training: bool
class reagent.models.seq2slate.EncoderLayer(dim_model, self_attn, feed_forward)

Bases: torch.nn.modules.module.Module

Encoder is made up of self-attn and feed forward

forward(src_embed, src_mask)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.EncoderPyTorch(dim_model, num_heads, dim_feedforward, num_layers)

Bases: torch.nn.modules.module.Module

Transformer-based encoder based on PyTorch official implementation

forward(src)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.Generator

Bases: torch.nn.modules.module.Module

Candidate generation

forward(probs: torch.Tensor, greedy: bool)

Decode one-step

Parameters
  • probs – probability distributions of decoder. Shape: batch_size, tgt_seq_len, candidate_size

  • greedy – whether to greedily pick or sample the next symbol
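
The difference between the two modes of the greedy flag can be sketched in plain Python. The function name decode_one_step is hypothetical, and for brevity it works on a (batch_size, candidate_size) list of distributions for a single decoding step rather than the full (batch_size, tgt_seq_len, candidate_size) tensor.

```python
import random


def decode_one_step(probs, greedy, rng=None):
    """Pick one candidate index per row of probs (each row sums to 1)."""
    if greedy:
        # Greedy: take the most probable candidate in every row.
        return [max(range(len(row)), key=row.__getitem__) for row in probs]
    if rng is None:
        rng = random.Random()
    # Sampling: draw an index in proportion to its probability.
    return [rng.choices(range(len(row)), weights=row, k=1)[0] for row in probs]


probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
greedy_pick = decode_one_step(probs, greedy=True)
sampled_pick = decode_one_step(probs, greedy=False, rng=random.Random(0))
```

Greedy decoding is deterministic, while sampling can return any candidate with non-zero probability.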

training: bool
class reagent.models.seq2slate.MultiHeadedAttention(num_heads, dim_model)

Bases: torch.nn.modules.module.Module

forward(query, key, value, mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.PositionalEncoding(dim_model)

Bases: torch.nn.modules.module.Module

A special, non-learnable positional encoding for handling variable (possibly longer) lengths of inputs. We simply add an ordinal number as an additional dimension for the input embeddings, and then project them back to the original number of dimensions

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.PositionwiseFeedForward(dim_model, dim_feedforward)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate.Seq2SlateNet(state_dim: int, candidate_dim: int, num_stacked_layers: int, dim_model: int, max_src_seq_len: int, max_tgt_seq_len: int, output_arch: reagent.model_utils.seq2slate_utils.Seq2SlateOutputArch, temperature: float)

Bases: reagent.models.base.ModelBase

candidate_dim: int
dim_model: int
forward(input: reagent.core.types.PreprocessedRankingInput, mode: reagent.model_utils.seq2slate_utils.Seq2SlateMode, tgt_seq_len: Optional[int] = None, greedy: Optional[bool] = None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: (1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; (2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

max_src_seq_len: int
max_tgt_seq_len: int
num_stacked_layers: int
output_arch: reagent.model_utils.seq2slate_utils.Seq2SlateOutputArch
state_dim: int
temperature: float
class reagent.models.seq2slate.Seq2SlateTransformerModel(state_dim: int, candidate_dim: int, num_stacked_layers: int, num_heads: int, dim_model: int, dim_feedforward: int, max_src_seq_len: int, max_tgt_seq_len: int, output_arch: reagent.model_utils.seq2slate_utils.Seq2SlateOutputArch, temperature: float = 1.0, state_embed_dim: Optional[int] = None)

Bases: torch.nn.modules.module.Module

A Seq2Slate network with Transformer. The network is essentially an encoder-decoder structure. The encoder takes as input a sequence of candidate feature vectors and a state feature vector, and the decoder outputs an ordered list of candidate indices. The output order is learned through the REINFORCE algorithm to optimize a sequence-wise reward.

One application example is to rank candidate feeds to a specific user such that the final list of feeds as a whole optimizes the user’s engagement.

Seq2Slate paper: https://arxiv.org/abs/1810.02019

Transformer paper: https://arxiv.org/abs/1706.03762

The model architecture can also adapt to some variations:

  (1) The decoder can be autoregressive.

  (2) The decoder can take encoder scores and perform iterative softmax (aka frechet sort).

  (3) No decoder, and the output order is solely based on encoder scores.
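Variation (2) can be illustrated with a small pure-Python sketch. It assumes that iterative softmax means repeatedly applying a softmax over the candidates not yet placed and picking one at each step; iterative_softmax_rank is a hypothetical function written for this example, not reagent's implementation.

```python
import math
import random
from typing import List, Tuple


def iterative_softmax_rank(scores: List[float], greedy: bool,
                           rng: random.Random) -> Tuple[List[int], float]:
    """Rank candidates by repeatedly applying softmax over those not yet placed.

    Returns the ranked indices and the log probability of that ordering.
    """
    remaining = list(range(len(scores)))
    ranking: List[int] = []
    log_prob = 0.0
    while remaining:
        exps = [math.exp(scores[i]) for i in remaining]
        total = sum(exps)
        probs = [e / total for e in exps]
        if greedy:
            k = max(range(len(probs)), key=lambda j: probs[j])
        else:
            k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        log_prob += math.log(probs[k])       # accumulate per-symbol log probability
        ranking.append(remaining.pop(k))     # placed candidates cannot be picked again
    return ranking, log_prob


ranking, log_prob = iterative_softmax_rank([0.2, 1.5, -0.3], greedy=True,
                                           rng=random.Random(0))
```

With greedy=True the result is simply a sort by descending score; with greedy=False each ordering is drawn with probability equal to the product of the per-step softmax probabilities.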

decode(memory, state, tgt_in_idx, tgt_in_seq)
encode(state, src_seq)
encoder_output_to_scores(state: torch.Tensor, src_seq: torch.Tensor, tgt_out_idx: torch.Tensor) reagent.models.seq2slate.Seq2SlateTransformerOutput
forward(mode: str, state: torch.Tensor, src_seq: torch.Tensor, tgt_in_idx: Optional[torch.Tensor] = None, tgt_out_idx: Optional[torch.Tensor] = None, tgt_in_seq: Optional[torch.Tensor] = None, tgt_seq_len: Optional[int] = None, greedy: Optional[bool] = None) reagent.models.seq2slate.Seq2SlateTransformerOutput
Parameters
  • input – model input

  • mode – a string indicating which mode to perform.

    “rank”: return ranked actions and their generative probabilities.

    “per_seq_log_probs”: return generative log probabilities of given tgt sequences (used for REINFORCE training).

    “per_symbol_log_probs”: return generative log probabilities of each symbol in given tgt sequences (used in TEACHER FORCING training).

  • tgt_seq_len – the length of output sequence to be decoded. Only used in rank mode

  • greedy – whether to sample based on softmax distribution or greedily when decoding. Only used in rank mode

training: bool
class reagent.models.seq2slate.Seq2SlateTransformerNet(state_dim: int, candidate_dim: int, num_stacked_layers: int, dim_model: int, max_src_seq_len: int, max_tgt_seq_len: int, output_arch: reagent.model_utils.seq2slate_utils.Seq2SlateOutputArch, temperature: float, num_heads: int, dim_feedforward: int, state_embed_dim: Optional[int] = None)

Bases: reagent.models.seq2slate.Seq2SlateNet

dim_feedforward: int
num_heads: int
state_embed_dim: Optional[int] = None
class reagent.models.seq2slate.Seq2SlateTransformerOutput(ranked_per_symbol_probs, ranked_per_seq_probs, ranked_tgt_out_idx, per_symbol_log_probs, per_seq_log_probs, encoder_scores)

Bases: NamedTuple

encoder_scores: Optional[torch.Tensor]

Alias for field number 5

per_seq_log_probs: Optional[torch.Tensor]

Alias for field number 4

per_symbol_log_probs: Optional[torch.Tensor]

Alias for field number 3

ranked_per_seq_probs: Optional[torch.Tensor]

Alias for field number 1

ranked_per_symbol_probs: Optional[torch.Tensor]

Alias for field number 0

ranked_tgt_out_idx: Optional[torch.Tensor]

Alias for field number 2
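Because the output is a NamedTuple, each field is reachable both by name and by the field number given above. The sketch below uses a hypothetical stand-in class, OutputSketch, with plain lists instead of tensors, only to show how the aliases line up.

```python
from typing import List, NamedTuple, Optional


class OutputSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order (lists, not tensors)."""
    ranked_per_symbol_probs: Optional[List[float]]
    ranked_per_seq_probs: Optional[List[float]]
    ranked_tgt_out_idx: Optional[List[int]]
    per_symbol_log_probs: Optional[List[float]]
    per_seq_log_probs: Optional[List[float]]
    encoder_scores: Optional[List[float]]


# Fields that a given mode does not produce can be left as None.
out = OutputSketch(None, None, [2, 0, 1], None, [-1.2], None)

by_name = out.ranked_tgt_out_idx
by_index = out[2]  # same object: "Alias for field number 2"
```

Access by name is usually the more readable choice, since positional access depends on the field order staying unchanged.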

class reagent.models.seq2slate.SublayerConnection(dim_model)

Bases: torch.nn.modules.module.Module

A residual connection followed by a layer norm.
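A minimal pure-Python sketch of that description, assuming the residual sum is computed first and layer normalization is applied to the result. Both helper functions are written for this example; the real module operates on torch tensors and may include details not shown here, such as dropout or learnable scale and shift.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer_connection(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Add the sublayer output to its input, then normalize the sum."""
    residual = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(residual)


y = sublayer_connection([1.0, 2.0, 3.0], lambda v: [2.0 * a for a in v])
```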

forward(x, sublayer)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

reagent.models.seq2slate_reward module

class reagent.models.seq2slate_reward.Seq2SlateGRURewardNet(state_dim: int, candidate_dim: int, num_stacked_layers: int, dim_model: int, max_src_seq_len: int, max_tgt_seq_len: int)

Bases: reagent.models.seq2slate_reward.Seq2SlateRewardNetBase

embed(state, tgt_in_seq)
forward(input: reagent.core.types.PreprocessedRankingInput)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate_reward.Seq2SlateRewardNetBase(state_dim: int, candidate_dim: int, dim_model: int, num_stacked_layers: int, max_src_seq_len: int, max_tgt_seq_len: int)

Bases: reagent.models.base.ModelBase

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.seq2slate_reward.Seq2SlateRewardNetEnsemble(models: List[reagent.models.base.ModelBase])

Bases: reagent.models.base.ModelBase

forward(state: torch.Tensor, src_seq: torch.Tensor, tgt_out_seq: torch.Tensor, src_src_mask: torch.Tensor, tgt_out_idx: torch.Tensor) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.seq2slate_reward.Seq2SlateRewardNetJITWrapper(model: reagent.models.seq2slate_reward.Seq2SlateRewardNetBase)

Bases: reagent.models.base.ModelBase

forward(state: torch.Tensor, src_seq: torch.Tensor, tgt_out_seq: torch.Tensor, src_src_mask: torch.Tensor, tgt_out_idx: torch.Tensor) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype(use_gpu=False)

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.seq2slate_reward.Seq2SlateTransformerRewardNet(state_dim: int, candidate_dim: int, num_stacked_layers: int, num_heads: int, dim_model: int, dim_feedforward: int, max_src_seq_len: int, max_tgt_seq_len: int)

Bases: reagent.models.seq2slate_reward.Seq2SlateRewardNetBase

decode(memory, state, tgt_src_mask, tgt_in_seq, tgt_tgt_mask, tgt_seq_len)
encode(state, src_seq, src_mask)
forward(input: reagent.core.types.PreprocessedRankingInput)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

reagent.models.synthetic_reward module

class reagent.models.synthetic_reward.Concat

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.NGramConvolutionalNetwork(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], last_layer_activation: str, context_size: int, conv_net_params: reagent.core.parameters.ConvNetParameters, use_layer_norm: bool = False)

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor) torch.Tensor

Forward pass of the NGram conv net.

Input shape: seq_len, batch_size, feature_dim

training: bool
class reagent.models.synthetic_reward.NGramFullyConnectedNetwork(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], last_layer_activation: str, context_size: int, use_layer_norm: bool = False)

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor) torch.Tensor

Forward pass of the NGram fully connected net.

Input shape: seq_len, batch_size, feature_dim

training: bool
class reagent.models.synthetic_reward.PETransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.0, activation='relu', layer_norm_eps=1e-05, max_len=100, use_ff=True, pos_weight=0.5, batch_first=False, device=None, dtype=None)

Bases: torch.nn.modules.module.Module

PETransformerEncoderLayer is made up of Positional Encoding (PE), residual connections, self-attn and feedforward network. Major differences between this implementation and the official PyTorch torch.nn.TransformerEncoderLayer are:

  1. Augment input data with positional encoding: \hat{x} = x + PE(x).

  2. Two parallel residual blocks are applied to the raw input data (x) and the encoded input data (\hat{x}), respectively, i.e. z = Residual(x), \hat{z} = Residual(\hat{x}).

  3. Treat z as the Value input, and \hat{z} as the Query and Key input to feed a self-attention block.

Main Args:

  • d_model – the number of expected features in the input (required).

  • nhead – the number of heads in the multiheadattention models (required).

  • dim_feedforward – the dimension of the feedforward network model (default=2048).

  • activation – the activation function of intermediate layer, relu or gelu (default=relu).

  • layer_norm_eps – the eps value in layer normalization components (default=1e-5).

  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False.

  • max_len – argument passed to the Positional Encoding module, see more details in the PositionalEncoding class.

forward(src, src_mask=None, src_key_padding_mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.PositionalEncoding(feature_dim=128, dropout=0.0, max_len=100)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.ResidualBlock(d_model=64, dim_feedforward=128)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.SequenceSyntheticRewardNet(state_dim: int, action_dim: int, lstm_hidden_size: int, lstm_num_layers: int, lstm_bidirectional: bool, last_layer_activation: str)

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.SequentialMultiArguments(*args: torch.nn.modules.module.Module)
class reagent.models.synthetic_reward.SequentialMultiArguments(arg: collections.OrderedDict[str, torch.nn.modules.module.Module])

Bases: torch.nn.modules.container.Sequential

A Sequential container whose forward function can take more than one argument.

forward(*inputs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class reagent.models.synthetic_reward.SingleStepSyntheticRewardNet(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], last_layer_activation: str, use_batch_norm: bool = False, use_layer_norm: bool = False)

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.SyntheticRewardNet(net: torch.nn.modules.module.Module)

Bases: reagent.models.base.ModelBase

This base class provides basic operations to consume inputs and call a synthetic reward net

A synthetic reward net (self.net) assumes the input contains only torch.Tensors. Expected input shape:

state: seq_len, batch_size, state_dim

action: seq_len, batch_size, action_dim

Expected output shape:

reward: batch_size, seq_len
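The shape convention above (sequence-first inputs, batch-first output) can be sketched with nested lists standing in for tensors. toy_reward_net is a hypothetical function invented for this example; its reward rule, the sum of each step's state and action features, is arbitrary and only serves to make the shapes visible.

```python
from typing import List

# Nested lists stand in for tensors, indexed as [seq][batch][feature].
Tensor3D = List[List[List[float]]]


def toy_reward_net(state: Tensor3D, action: Tensor3D) -> List[List[float]]:
    """Toy stand-in: the reward at each step is the sum of that step's features.

    Inputs are indexed [seq][batch][feature]; the output is indexed [batch][seq].
    """
    seq_len, batch_size = len(state), len(state[0])
    return [
        [sum(state[t][b]) + sum(action[t][b]) for t in range(seq_len)]
        for b in range(batch_size)
    ]


seq_len, batch_size, state_dim, action_dim = 3, 2, 4, 1
state = [[[1.0] * state_dim for _ in range(batch_size)] for _ in range(seq_len)]
action = [[[0.5] * action_dim for _ in range(batch_size)] for _ in range(seq_len)]
reward = toy_reward_net(state, action)
```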

export_mlp()

Export a PyTorch nn to feed to the predictor wrapper.

forward(training_batch: reagent.core.types.MemoryNetworkInput)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.synthetic_reward.TransformerSyntheticRewardNet(state_dim: int, action_dim: int, d_model: int, nhead: int = 2, num_encoder_layers: int = 2, dim_feedforward: int = 128, dropout: float = 0.0, activation: str = 'relu', last_layer_activation: str = 'leaky_relu', layer_norm_eps: float = 1e-05, max_len: int = 10)

Bases: torch.nn.modules.module.Module

forward(state: torch.Tensor, action: torch.Tensor)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
reagent.models.synthetic_reward.ngram(input: torch.Tensor, context_size: int, ngram_padding: torch.Tensor)
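The signature alone does not document the semantics, so the following is only a guess at what an n-gram expansion of this kind might look like: each step is replaced by the concatenated features of a context_size window around it, and out-of-range positions are filled with a padding vector. ngram_sketch and its centring behaviour are assumptions for this example and may not match reagent's ngram.

```python
from itertools import chain
from typing import List

Vector = List[float]


def ngram_sketch(seq: List[Vector], context_size: int, padding: Vector) -> List[Vector]:
    """Hypothetical n-gram expansion over a single sequence (no batch dimension)."""
    half = context_size // 2
    # Pad both ends so every step has a full window.
    padded = [padding] * half + seq + [padding] * half
    return [
        list(chain.from_iterable(padded[t:t + context_size]))
        for t in range(len(seq))
    ]


windows = ngram_sketch([[1.0], [2.0], [3.0]], context_size=3, padding=[0.0])
```

Under these assumptions the feature dimension grows by a factor of context_size while the sequence length is preserved.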

reagent.models.world_model module

class reagent.models.world_model.MemoryNetwork(state_dim, action_dim, num_hiddens, num_hidden_layers, num_gaussians)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool

Module contents

class reagent.models.BatchConstrainedDQN(state_dim, q_network, imitator_network, bcq_drop_threshold)

Bases: reagent.models.base.ModelBase

forward(state)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.CategoricalDQN(distributional_network: reagent.models.base.ModelBase, *, qmin: float, qmax: float, num_atoms: int)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

log_dist(state: reagent.core.types.FeatureData) torch.Tensor
training: bool
class reagent.models.DirichletFullyConnectedActor(state_dim, action_dim, sizes, activations, use_batch_norm=False)

Bases: reagent.models.base.ModelBase

EPSILON = 1e-06
forward(state)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(state, action)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.DuelingQNetwork(*, shared_network: reagent.models.base.ModelBase, advantage_network: reagent.models.base.ModelBase, value_network: reagent.models.base.ModelBase)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

classmethod make_fully_connected(state_dim: int, action_dim: int, layers: List[int], activations: List[str], num_atoms: Optional[int] = None, use_batch_norm: bool = False)
training: bool
class reagent.models.EmbeddingBagConcat(state_dim: int, model_feature_config: reagent.core.types.ModelFeatureConfig, embedding_dim: int)

Bases: reagent.models.base.ModelBase

Concatenates embeddings with float features before passing the input to the DQN.

feat2table: Dict[str, str]
forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

property output_dim: int
training: bool
class reagent.models.FullyConnectedActor(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], use_batch_norm: bool = False, action_activation: str = 'tanh', exploration_variance: Optional[float] = None)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData) reagent.core.types.ActorOutput

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.FullyConnectedCritic(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], use_batch_norm: bool = False, use_layer_norm: bool = False, output_dim: int = 1)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.FullyConnectedDQN(state_dim, action_dim, sizes, activations, *, output_activation: str = 'linear', num_atoms: Optional[int] = None, use_batch_norm: bool = False, dropout_ratio: float = 0.0, normalized_output: bool = False, use_layer_norm: bool = False)

Bases: reagent.models.fully_connected_network.FloatFeatureFullyConnected

forward(state: reagent.core.types.FeatureData, possible_actions_mask: Optional[torch.Tensor] = None) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class reagent.models.FullyConnectedNetwork(layers, activations, *, use_batch_norm: bool = False, min_std: float = 0.0, dropout_ratio: float = 0.0, use_layer_norm: bool = False, normalize_output: bool = False, orthogonal_init: bool = False)

Bases: reagent.models.base.ModelBase

forward(input: torch.Tensor) torch.Tensor

Forward pass for generic feed-forward DNNs. Assumes activation names are valid PyTorch activation names.

Parameters
  • input – the input tensor

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.GaussianFullyConnectedActor(state_dim: int, action_dim: int, sizes: List[int], activations: List[str], scale: float = 0.05, use_batch_norm: bool = False, use_layer_norm: bool = False, use_l2_normalization: bool = False)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(state: reagent.core.types.FeatureData, squashed_action: torch.Tensor)

Action is expected to be squashed with tanh
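For orientation, the log probability of a tanh-squashed Gaussian action is commonly computed with a change-of-variables correction. The sketch below shows that standard formula in plain Python; it is an assumption that reagent follows the same recipe, and squashed_gaussian_log_prob, loc, scale and eps are names chosen for this example.

```python
import math
from typing import List


def squashed_gaussian_log_prob(squashed_action: List[float], loc: List[float],
                               scale: List[float], eps: float = 1e-6) -> float:
    """Log density of a = tanh(u), with u ~ Normal(loc, scale), summed over dimensions."""
    log_prob = 0.0
    for a, mu, sigma in zip(squashed_action, loc, scale):
        u = math.atanh(a)  # undo the tanh squashing
        # Log density of Normal(mu, sigma) evaluated at u.
        log_prob += (-0.5 * ((u - mu) / sigma) ** 2
                     - math.log(sigma)
                     - 0.5 * math.log(2.0 * math.pi))
        # Change-of-variables correction: d tanh(u) / du = 1 - a^2.
        log_prob -= math.log(1.0 - a * a + eps)
    return log_prob


lp = squashed_gaussian_log_prob([0.0], loc=[0.0], scale=[1.0])
```

The eps term guards against taking the log of zero when an action component sits very close to -1 or 1.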

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.MLPScorer(mlp: torch.nn.modules.module.Module, has_user_feat: bool = False)

Bases: reagent.models.base.ModelBase

Log-space in and out

forward(obs: reagent.core.types.FeatureData)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.ModelBase

Bases: torch.nn.modules.module.Module

A base class to support exporting through ONNX

cpu_model()

Override this in DistributedDataParallel models

feature_config() Optional[reagent.core.types.ModelFeatureConfig]

If the model needs additional preprocessing, e.g., using sequence features, return the config here.

get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: (1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; (2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

get_target_network()

Return a copy of this network to be used as target network

Subclass should override this if the target network should share parameters with the network to be trained.
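The two behaviours described above, an independent copy by default and parameter sharing on override, can be sketched without torch. TinyNet and SharedNet are hypothetical classes written for this example, and the use of copy.deepcopy for the default case is an assumption about how such a copy could be made.

```python
import copy
from typing import List


class TinyNet:
    """Hypothetical stand-in for a model holding parameters."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = list(weights)

    def get_target_network(self) -> "TinyNet":
        # Default behaviour sketched here: an independent deep copy.
        return copy.deepcopy(self)


class SharedNet(TinyNet):
    def get_target_network(self) -> "TinyNet":
        # Override: the target shares parameters with the trained network.
        return self


online = TinyNet([1.0, 2.0])
target = online.get_target_network()
online.weights[0] = 5.0  # a training update does not leak into the copy

shared = SharedNet([1.0])
shared_target = shared.get_target_network()
```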

input_prototype() Any

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.ParametricDuelingQNetwork(*, shared_network: reagent.models.base.ModelBase, advantage_network: reagent.models.base.ModelBase, value_network: reagent.models.base.ModelBase)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

classmethod make_fully_connected(state_dim: int, action_dim: int, layers: List[int], activations: List[str], use_batch_norm: bool = False)
training: bool
class reagent.models.Seq2RewardNetwork(state_dim, action_dim, num_hiddens, num_hidden_layers)

Bases: reagent.models.base.ModelBase

forward(state: reagent.core.types.FeatureData, action: reagent.core.types.FeatureData, valid_reward_len: Optional[torch.Tensor] = None)

Forward pass of Seq2Reward

Takes in the current state and uses it as the initial hidden state. The input sequence consists of pure actions only. Outputs the predicted reward after each time step.

Parameters
  • actions – (SEQ_LEN, BATCH_SIZE, ACTION_DIM) torch tensor

  • states – (SEQ_LEN, BATCH_SIZE, STATE_DIM) torch tensor

  • valid_reward_len – (BATCH_SIZE,) torch tensor

Returns

Predicted accumulated rewards at the last step for the given sequence. acc_reward: (BATCH_SIZE, 1) torch tensor
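One way to read the role of valid_reward_len is sketched below, under the assumption that it gives, for each batch item, how many leading steps count toward the accumulated reward. accumulate_rewards is a hypothetical helper using nested lists in place of tensors; the actual network may gather or mask its outputs differently.

```python
from typing import List


def accumulate_rewards(step_rewards: List[List[float]],
                       valid_reward_len: List[int]) -> List[List[float]]:
    """step_rewards is indexed [seq][batch]; the result has shape batch_size x 1."""
    batch_size = len(valid_reward_len)
    return [
        # Sum only the first valid_reward_len[b] steps for batch item b.
        [sum(step_rewards[t][b] for t in range(valid_reward_len[b]))]
        for b in range(batch_size)
    ]


step_rewards = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # seq_len = 3, batch_size = 2
acc_reward = accumulate_rewards(step_rewards, valid_reward_len=[3, 1])
```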

get_initial_hidden_state(state, batch_size=1)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

training: bool
class reagent.models.Sequential(*args: torch.nn.modules.module.Module)
class reagent.models.Sequential(arg: collections.OrderedDict[str, torch.nn.modules.module.Module])

Bases: torch.nn.modules.container.Sequential, reagent.models.base.ModelBase

Use this instead of torch.nn.Sequential to automate model tracing.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.