ml.rl.models package

Submodules

ml.rl.models.actor module

class ml.rl.models.actor.DirichletFullyConnectedActor(state_dim, action_dim, sizes, activations, use_batch_norm=False)

Bases: ml.rl.models.base.ModelBase

EPSILON = 1e-06
forward(input)
get_log_prob(state, action)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.actor.FullyConnectedActor(state_dim, action_dim, sizes, activations, use_batch_norm=False, action_activation='tanh')

Bases: ml.rl.models.base.ModelBase

forward(input)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.actor.GaussianFullyConnectedActor(state_dim, action_dim, sizes, activations, scale=0.05, use_batch_norm=False, use_layer_norm=False)

Bases: ml.rl.models.base.ModelBase

forward(input)
get_log_prob(state, squashed_action)

Action is expected to be squashed with tanh

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.base module

class ml.rl.models.base.ModelBase(*args, **kwargs)

Bases: torch.nn.Module

A base class to support exporting through ONNX

cpu_model()

Override this in DistributedDataParallel models

feature_config() → Optional[ml.rl.types.ModelFeatureConfig]

If the model needs additional preprocessing, e.g., when using sequence features, return the config here.

get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

get_target_network()

Return a copy of this network to be used as target network

Subclass should override this if the target network should share parameters with the network to be trained.

input_prototype() → Any

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.
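The contract between input_prototype() and forward() can be illustrated with a minimal torch-free sketch (ToyModel and trace are hypothetical names, not part of the library): a tracer runs forward() on whatever input_prototype() returns, so the two must agree on the input's structure.

```python
class ToyModel:
    """Minimal sketch of the ModelBase contract (hypothetical, torch-free)."""

    def input_prototype(self):
        # A dummy input with the same structure forward() expects;
        # an ONNX tracer would feed this through forward() to record the graph.
        return [0.0, 0.0, 0.0]

    def forward(self, state):
        # A trivial "network": sum of the input features.
        return sum(state)


def trace(model):
    # What a tracer conceptually does: run forward() on the prototype.
    return model.forward(model.input_prototype())
```

If input_prototype() returned a structure forward() cannot consume, tracing would fail immediately, which is the point of keeping them paired on the base class.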

ml.rl.models.bcq module

class ml.rl.models.bcq.BatchConstrainedDQN(state_dim, q_network, imitator_network, bcq_drop_threshold)

Bases: ml.rl.models.base.ModelBase

forward(input)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.categorical_dqn module

class ml.rl.models.categorical_dqn.CategoricalDQN(state_dim, action_dim, num_atoms, qmin, qmax, sizes, activations, use_batch_norm=False, dropout_ratio=0.0, use_gpu=False)

Bases: ml.rl.models.base.ModelBase

forward(input: ml.rl.types.PreprocessedState)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

log_dist(input: ml.rl.types.PreprocessedState)
serving_model()

ml.rl.models.cem_planner module

A network which implements a cross entropy method-based planner

The planner plans the best next action based on simulation data generated by an ensemble of world models.

The idea is inspired by: https://arxiv.org/abs/1805.12114
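The core cross-entropy-method loop can be sketched in pure Python on a 1-D toy objective (cem_argmax is a hypothetical illustration, not the library's implementation): sample a population from a Gaussian, keep the highest-reward elites, and refit the Gaussian to them.

```python
import random


def cem_argmax(objective, mean=0.0, std=5.0, iterations=20,
               population_size=100, num_elites=10):
    """Cross-entropy method sketch (1-D): sample candidates from a Gaussian,
    keep the elites, and refit the Gaussian's mean and std to them."""
    for _ in range(iterations):
        population = [random.gauss(mean, std) for _ in range(population_size)]
        # Elites = candidates with the highest objective (reward) value.
        elites = sorted(population, key=objective, reverse=True)[:num_elites]
        mean = sum(elites) / len(elites)
        var = sum((e - mean) ** 2 for e in elites) / len(elites)
        std = max(var ** 0.5, 1e-6)  # avoid collapsing to zero variance
    return mean


# Maximize -(x - 3)^2; the sampling mean should converge near x = 3.
best = cem_argmax(lambda x: -(x - 3.0) ** 2)
```

The planner applies the same idea over action sequences of shape (plan_horizon_length, action_dim), scoring each candidate by accumulated reward under the world-model ensemble rather than by a closed-form objective.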

class ml.rl.models.cem_planner.CEMPlanner(cem_planner_network: ml.rl.models.cem_planner.CEMPlannerNetwork, plan_horizon_length: int, state_dim: int, action_dim: int, discrete_action: bool)

Bases: ml.rl.models.base.ModelBase

forward(input: ml.rl.types.PreprocessedState)
get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.cem_planner.CEMPlannerNetwork(mem_net_list: List[ml.rl.models.world_model.MemoryNetwork], cem_num_iterations: int, cem_population_size: int, ensemble_population_size: int, num_elites: int, plan_horizon_length: int, state_dim: int, action_dim: int, discrete_action: bool, terminal_effective: bool, gamma: float, alpha: float = 0.25, epsilon: float = 0.001, action_upper_bounds: Optional[numpy.ndarray] = None, action_lower_bounds: Optional[numpy.ndarray] = None)

Bases: torch.nn.Module

acc_rewards_of_all_solutions(input: ml.rl.types.PreprocessedState, solutions: torch.Tensor) → float

Calculate accumulated rewards of solutions.

Parameters
  • input – the input which contains the starting state

  • solutions – its shape is (cem_pop_size, plan_horizon_length, action_dim)

Returns

a vector of size cem_pop_size containing the reward of each solution

acc_rewards_of_one_solution(init_state: torch.Tensor, solution: torch.Tensor, solution_idx: int)

ensemble_pop_size trajectories are sampled to evaluate a CEM solution. Each trajectory is generated by one world model.

Parameters
  • init_state – its shape is (state_dim, )

  • solution – its shape is (plan_horizon_length, action_dim)

  • solution_idx – the index of the solution

Returns

The reward of each of the ensemble_pop_size trajectories

constrained_variance(mean, var)
continuous_planning(input: ml.rl.types.PreprocessedState) → numpy.ndarray
discrete_planning(input: ml.rl.types.PreprocessedState) → Tuple[int, numpy.ndarray]
forward(input: ml.rl.types.PreprocessedState)
sample_reward_next_state_terminal(world_model_input: ml.rl.types.PreprocessedStateAction, mem_net: ml.rl.models.world_model.MemoryNetwork)

Sample one-step dynamics based on the provided world model

ml.rl.models.convolutional_network module

class ml.rl.models.convolutional_network.ConvolutionalNetwork(cnn_parameters, layers, activations)

Bases: torch.nn.Module

conv_forward(input)
forward(input) → torch.FloatTensor

Forward pass for generic convnet DNNs. Assumes activation names are valid PyTorch activation names.

Parameters
  • input – image tensor

ml.rl.models.dqn module

class ml.rl.models.dqn.FullyConnectedDQN(state_dim, action_dim, sizes, activations, use_batch_norm=False, dropout_ratio=0.0)

Bases: ml.rl.models.base.ModelBase

forward(input: ml.rl.types.PreprocessedState)
get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.dueling_q_network module

class ml.rl.models.dueling_q_network.DuelingQNetwork(layers, activations, use_batch_norm=False, action_dim=0)

Bases: ml.rl.models.base.ModelBase

forward(input) → Union[NamedTuple, torch.FloatTensor]
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.dueling_quantile_dqn module

class ml.rl.models.dueling_quantile_dqn.DuelingQuantileDQN(layers, activations, num_atoms=50, use_batch_norm=False)

Bases: ml.rl.models.base.ModelBase

dist(input: ml.rl.types.PreprocessedState)
forward(input: ml.rl.types.PreprocessedState)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.example_sequence_model module

class ml.rl.models.example_sequence_model.ExampleSequenceModel(state_dim)

Bases: ml.rl.models.base.ModelBase

feature_config()

If the model needs additional preprocessing, e.g., when using sequence features, return the config here.

forward(state)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.example_sequence_model.ExampleSequenceModelOutput(value: torch.Tensor)

Bases: object

class ml.rl.models.example_sequence_model.SequenceFeatures(watched_videos: ml.rl.models.example_sequence_model.WatchedVideoSequence)

Bases: ml.rl.types.SequenceFeatures

The whole class hierarchy can be created dynamically from config. Another diff will show this.

class ml.rl.models.example_sequence_model.VideoIDFeatures(page_id: torch.Tensor)

Bases: ml.rl.types.IdFeatureBase

classmethod get_feature_config() → Dict[str, ml.rl.types.IdFeatureConfig]

Returns mapping from feature name, which must be a field in this dataclass, to feature config.

class ml.rl.models.example_sequence_model.WatchedVideoSequence(id_features: ml.rl.models.example_sequence_model.VideoIDFeatures, float_features: Optional[ml.rl.types.ValuePresence])

Bases: ml.rl.types.SequenceFeatureBase

classmethod get_float_feature_infos() → List[ml.rl.types.FloatFeatureInfo]

Override this if the sequence has float features associated with it. Float features should be stored as an ID-score-list, where the ID part corresponds to the primary entity ID of the sequence. E.g., if this is a sequence of previously watched videos, then the key should be the video ID.

classmethod get_max_length() → int

Subclass should return the maximum length of this sequence. If the raw data is longer, the feature extractor will truncate the front. If the raw data is shorter, the feature extractor will fill the front with zeros.
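The truncate-from-the-front / zero-fill-at-the-front behavior described above can be sketched in plain Python (fit_to_length is a hypothetical helper, not part of the library):

```python
def fit_to_length(seq, max_length, pad_value=0):
    """Keep the last max_length items of seq; if seq is shorter,
    pad at the front, mirroring the feature-extractor behavior."""
    padding = [pad_value] * max(0, max_length - len(seq))
    return (padding + list(seq))[-max_length:]
```

For example, a 3-item sequence with max length 5 gains two leading zeros, while a 6-item sequence with max length 4 keeps only its last 4 items.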

ml.rl.models.fully_connected_network module

class ml.rl.models.fully_connected_network.FullyConnectedNetwork(layers, activations, use_batch_norm=False, min_std=0.0, dropout_ratio=0.0, use_layer_norm=False)

Bases: torch.nn.Module

forward(input: torch.FloatTensor) → torch.FloatTensor

Forward pass for generic feed-forward DNNs. Assumes activation names are valid PyTorch activation names.

Parameters
  • input – input tensor

ml.rl.models.fully_connected_network.gaussian_fill_w_gain(tensor, activation, dim_in, min_std=0.0) → None

Gaussian initialization with gain.

ml.rl.models.mdn_rnn module

class ml.rl.models.mdn_rnn.MDNRNN(state_dim, action_dim, num_hiddens, num_hidden_layers, num_gaussians)

Bases: ml.rl.models.mdn_rnn._MDNRNNBase

Mixture Density Network - Recurrent Neural Network

forward(actions, states, hidden=None)

Forward pass of MDN-RNN

Parameters
  • actions – (SEQ_LEN, BATCH_SIZE, ACTION_DIM) torch tensor

  • states – (SEQ_LEN, BATCH_SIZE, STATE_DIM) torch tensor

Returns

Parameters of the GMM prediction for the next state, a Gaussian prediction of the reward, a logit prediction of non-terminality, and the RNN’s outputs:

  • mus: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS, STATE_DIM) torch tensor

  • sigmas: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS, STATE_DIM) torch tensor

  • logpi: (SEQ_LEN, BATCH_SIZE, NUM_GAUSSIANS) torch tensor

  • reward: (SEQ_LEN, BATCH_SIZE) torch tensor

  • not_terminal: (SEQ_LEN, BATCH_SIZE) torch tensor

  • last_step_hidden_and_cell: tuple of two (NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE) torch tensors

  • all_steps_hidden: (SEQ_LEN, BATCH_SIZE, HIDDEN_SIZE) torch tensor

get_initial_hidden_state(batch_size=1)
class ml.rl.models.mdn_rnn.MDNRNNMemoryPool(max_replay_memory_size)

Bases: object

deque_sample(indices)
insert_into_memory(state, action, next_state, reward, not_terminal)
property memory_size
sample_memories(batch_size, use_gpu=False, batch_first=False) → ml.rl.types.PreprocessedTrainingBatch
Parameters
  • batch_size – number of samples to return

  • use_gpu – whether to put samples on gpu

  • batch_first – If True, the first dimension of data is batch_size. If False (default), the first dimension is SEQ_LEN. Therefore, state’s shape is SEQ_LEN x BATCH_SIZE x STATE_DIM, for example. By default, MDN-RNN consumes data with SEQ_LEN as the first dimension.

class ml.rl.models.mdn_rnn.MDNRNNMemorySample(state, action, next_state, reward, not_terminal)

Bases: tuple

property action

Alias for field number 1

property next_state

Alias for field number 2

property not_terminal

Alias for field number 4

property reward

Alias for field number 3

property state

Alias for field number 0

ml.rl.models.mdn_rnn.gmm_loss(batch, mus, sigmas, logpi, reduce=True)

Computes the gmm loss.

Compute minus the log probability of batch under the GMM model described by mus, sigmas, and logpi. Precisely, with bs1, bs2, … the sizes of the batch dimensions (several batch dimensions are useful when you have both a batch axis and a time-step axis), gs the number of mixtures, and fs the number of features.

Parameters
  • batch – (bs1, bs2, *, fs) torch tensor

  • mus – (bs1, bs2, *, gs, fs) torch tensor

  • sigmas – (bs1, bs2, *, gs, fs) torch tensor

  • logpi – (bs1, bs2, *, gs) torch tensor

  • reduce – if False, the mean in the following formula is omitted

Returns

loss(batch) = - mean_{i1=0..bs1, i2=0..bs2, …} log(
    sum_{k=1..gs} pi[i1, i2, …, k]
        * N(batch[i1, i2, …, :] | mus[i1, i2, …, k, :], sigmas[i1, i2, …, k, :]))

NOTE: The loss is not reduced along the feature dimension (i.e. it should scale linearly with fs).

Adapted from: https://github.com/ctallec/world-models
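The formula above can be mirrored for a single sample in pure Python (gmm_nll is a hypothetical illustration of the math, not the library's tensor implementation), using log-sum-exp over the mixture components for numerical stability:

```python
import math


def gmm_nll(x, mus, sigmas, logpi):
    """Negative log-likelihood of one feature vector x under a diagonal GMM.

    x: list of fs floats; mus, sigmas: gs lists of fs floats; logpi: gs floats.
    """
    log_components = []
    for mu, sigma, lp in zip(mus, sigmas, logpi):
        # log N(x | mu, diag(sigma^2)), summed over the feature dimension
        log_n = sum(
            -0.5 * math.log(2 * math.pi) - math.log(s)
            - (xi - m) ** 2 / (2 * s ** 2)
            for xi, m, s in zip(x, mu, sigma)
        )
        log_components.append(lp + log_n)
    # log-sum-exp over mixture components, then negate
    max_lc = max(log_components)
    return -(max_lc + math.log(sum(math.exp(lc - max_lc)
                                   for lc in log_components)))
```

With a single unit-variance component centered on x, this reduces to 0.5 * log(2π) per feature, a handy sanity check against the tensorized gmm_loss.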

ml.rl.models.mdn_rnn.transpose(*args)

ml.rl.models.no_soft_update_embedding module

class ml.rl.models.no_soft_update_embedding.NoSoftUpdateEmbedding(*args, **kwargs)

Bases: torch.nn.Embedding

Use this instead of vanilla Embedding module to avoid soft-updating the embedding table in the target network.
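For context, a soft (Polyak) target update blends target parameters toward the online ones; this class exists so the embedding table is excluded from that blend. A torch-free sketch of the idea (soft_update and frozen_keys are hypothetical; here frozen parameters are simply left untouched, whereas the library may handle them differently):

```python
def soft_update(online, target, tau, frozen_keys=()):
    """Polyak averaging: target <- tau * online + (1 - tau) * target,
    skipping parameters (e.g. an embedding table) named in frozen_keys."""
    return {
        key: value if key in frozen_keys
        else tau * online[key] + (1 - tau) * value
        for key, value in target.items()
    }
```

With tau = 0.1, an ordinary weight moves 10% of the way toward its online value each update, while a frozen embedding table stays put.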

ml.rl.models.parametric_dqn module

class ml.rl.models.parametric_dqn.FullyConnectedParametricDQN(state_dim, action_dim, sizes, activations, use_batch_norm=False, use_layer_norm=False, output_dim=1)

Bases: ml.rl.models.base.ModelBase

forward(input)
get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.parametric_dqn.ParametricDQNWithPreprocessing(q_network, state_preprocessor, action_preprocessor=None)

Bases: ml.rl.models.base.ModelBase

forward(input)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.quantile_dqn module

class ml.rl.models.quantile_dqn.QuantileDQN(state_dim, action_dim, sizes, activations, num_atoms=50, use_batch_norm=False, dropout_ratio=0.0)

Bases: ml.rl.models.base.ModelBase

dist(input: ml.rl.types.PreprocessedState)
forward(input: ml.rl.types.PreprocessedState)
input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

ml.rl.models.seq2slate module

class ml.rl.models.seq2slate.BaselineNet(state_dim, dim_feedforward, num_stacked_layers)

Bases: torch.nn.Module

forward(input: ml.rl.types.PreprocessedRankingInput)
class ml.rl.models.seq2slate.Decoder(layer, num_layers)

Bases: torch.nn.Module

Generic num_layers layer decoder with masking.

forward(x, memory, tgt_src_mask, tgt_tgt_mask)
class ml.rl.models.seq2slate.DecoderLayer(size, self_attn, src_attn, feed_forward)

Bases: torch.nn.Module

Decoder is made of self-attn, src-attn, and feed forward

forward(x, m, tgt_src_mask, tgt_tgt_mask)
class ml.rl.models.seq2slate.Embedder(dim_in, dim_out)

Bases: torch.nn.Module

forward(x)
class ml.rl.models.seq2slate.Encoder(layer, num_layers)

Bases: torch.nn.Module

Core encoder is a stack of num_layers layers

forward(x, mask)

Pass the input (and mask) through each layer in turn.

class ml.rl.models.seq2slate.EncoderLayer(dim_model, self_attn, feed_forward)

Bases: torch.nn.Module

Encoder is made up of self-attn and feed forward

forward(src_embed, src_mask)
class ml.rl.models.seq2slate.Generator(dim_model, candidate_size)

Bases: torch.nn.Module

Define standard linear + softmax generation step.

forward(mode, decoder_output=None, tgt_in_idx=None, greedy=None)
class ml.rl.models.seq2slate.MultiHeadedAttention(num_heads, dim_model)

Bases: torch.nn.Module

forward(query, key, value, mask=None)
class ml.rl.models.seq2slate.PositionalEncoding(dim_model, max_len=5000)

Bases: torch.nn.Module

forward(x, seq_len)
class ml.rl.models.seq2slate.PositionwiseFeedForward(dim_model, dim_feedforward)

Bases: torch.nn.Module

forward(x)
class ml.rl.models.seq2slate.Seq2SlateTransformerModel(state_dim: int, candidate_dim: int, num_stacked_layers: int, num_heads: int, dim_model: int, dim_feedforward: int, max_src_seq_len: int, max_tgt_seq_len: int)

Bases: torch.nn.Module

A Seq2Slate network with a Transformer. The network is essentially an encoder-decoder structure. The encoder takes a sequence of candidate feature vectors and a state feature vector as input, and the decoder outputs an ordered list of candidate indices. The output order is learned through the REINFORCE algorithm to optimize a sequence-wise reward that is also specific to the provided state feature.

One application example is to rank candidate feeds to a specific user such that the final list of feeds as a whole optimizes the user’s engagement.

Seq2Slate paper: https://arxiv.org/abs/1810.02019 Transformer paper: https://arxiv.org/abs/1706.03762

decode(memory, state, tgt_src_mask, tgt_seq, tgt_tgt_mask, tgt_seq_len)
encode(state, src_seq, src_mask)
forward(input: ml.rl.types.PreprocessedRankingInput, mode: str, tgt_seq_len: Optional[int] = None, greedy: Optional[bool] = None)
Parameters
  • input – model input

  • mode – a string indicating which mode to perform. “rank”: return ranked actions and their generative probabilities. “log_probs”: return generative log probabilities of given tgt sequences (used for REINFORCE training)

  • tgt_seq_len – the length of output sequence to be decoded. Only used in rank mode

  • greedy – whether to sample based on softmax distribution or greedily when decoding. Only used in rank mode

class ml.rl.models.seq2slate.Seq2SlateTransformerNet(state_dim: int, candidate_dim: int, num_stacked_layers: int, num_heads: int, dim_model: int, dim_feedforward: int, max_src_seq_len: int, max_tgt_seq_len: int)

Bases: ml.rl.models.base.ModelBase

forward(input: ml.rl.types.PreprocessedRankingInput, mode: str, tgt_seq_len: Optional[int] = None, greedy: Optional[bool] = None)
get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

class ml.rl.models.seq2slate.SublayerConnection(dim_model)

Bases: torch.nn.Module

A residual connection followed by a layer norm.

forward(x, sublayer)
ml.rl.models.seq2slate.attention(query, key, value, mask, d_k)

Scaled Dot Product Attention

ml.rl.models.seq2slate.clones(module, N)

Produce N identical layers.

Parameters
  • module – nn.Module class

  • N – number of copies

ml.rl.models.seq2slate.subsequent_and_padding_mask(tgt_in_idx)

Create a mask to hide padding and future items

ml.rl.models.seq2slate.subsequent_mask(size)

Mask out subsequent positions. Mainly used in the decoding process, in which an item should not attend to subsequent items.
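The mask described above is lower-triangular: position i may attend to positions j ≤ i only. A plain-Python sketch (toy_subsequent_mask is a hypothetical boolean-list version of the tensor-based subsequent_mask):

```python
def toy_subsequent_mask(size):
    """Allow each position to attend only to itself and earlier positions:
    row i is True at columns j <= i (a lower-triangular mask)."""
    return [[j <= i for j in range(size)] for i in range(size)]
```

For size 3, row 0 can see only position 0, row 1 can see positions 0-1, and row 2 can see all three, which prevents the decoder from peeking at items it has not yet emitted.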

ml.rl.models.world_model module

class ml.rl.models.world_model.MemoryNetwork(state_dim, action_dim, num_hiddens, num_hidden_layers, num_gaussians)

Bases: ml.rl.models.base.ModelBase

forward(input)
get_distributed_data_parallel_model()

Return DistributedDataParallel version of this model

This needs to be implemented explicitly because: 1) a model with an EmbeddingBag module is not compatible with vanilla DistributedDataParallel; 2) the exporting logic needs structured data, and DistributedDataParallel doesn’t work with structured data.

input_prototype()

This function provides the input for ONNX graph tracing.

The return value should be what forward() expects.

Module contents