reagent package

Subpackages

Submodules

reagent.base_dataclass module

We should revisit this at some point. Config classes shouldn’t subclass from this.

class reagent.base_dataclass.BaseDataClass

Bases: object

reagent.debug_on_error module

reagent.debug_on_error.start()

reagent.json_serialize module

reagent.json_serialize.from_json(j_obj: Any, to_type: Type) → Any
reagent.json_serialize.json_to_object(j: str, to_type: Type) → Any
reagent.json_serialize.object_to_json(o: Any) → str
reagent.json_serialize.prepare_for_json(o: Any) → Any
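The helpers above implement a dataclass-to-JSON round trip. A minimal stdlib sketch of that pattern (illustrative only; the `Point` class and `*_sketch` names are hypothetical, not the reagent implementation):

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Type

@dataclass
class Point:  # hypothetical config class, for illustration only
    x: float
    y: float

def object_to_json_sketch(o: Any) -> str:
    # prepare_for_json step: lower the dataclass to plain dicts/lists,
    # then encode as a JSON string
    return json.dumps(asdict(o))

def json_to_object_sketch(j: str, to_type: Type) -> Any:
    # from_json step: rebuild the target type from the decoded dict
    return to_type(**json.loads(j))

p = Point(1.0, 2.0)
assert json_to_object_sketch(object_to_json_sketch(p), Point) == p
```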

reagent.parameters module

reagent.parameters_seq2slate module

class reagent.parameters_seq2slate.LearningMethod

Bases: enum.Enum

An enumeration.

DIFFERENTIABLE_REWARD = 'differentiable_reward'
PAIRWISE_ATTENTION = 'pairwise_attention'
REINFORCEMENT_LEARNING = 'reinforcement_learning'
SIMULATION = 'simulation'
TEACHER_FORCING = 'teacher_forcing'
property expect_slate_wise_reward
class reagent.parameters_seq2slate.RewardClamp(clamp_min: Union[float, NoneType] = None, clamp_max: Union[float, NoneType] = None)

Bases: object

clamp_max = None
clamp_min = None

reagent.tensorboardX module

Context library to allow dropping tensorboardX anywhere in the codebase. If there is no SummaryWriter in the context, function calls are no-ops.

Usage:

    writer = SummaryWriter()

    with summary_writer_context(writer):
        some_func()

    def some_func():
        SummaryWriterContext.add_scalar("foo", tensor)

class reagent.tensorboardX.SummaryWriterContext

Bases: object

classmethod add_custom_scalars(writer)

Call this once you have finished setting up custom scalars

classmethod add_custom_scalars_multilinechart(tags, category=None, title=None)
classmethod add_histogram(key, val, *args, **kwargs)
classmethod increase_global_step()
classmethod pop()
classmethod push(writer)
class reagent.tensorboardX.SummaryWriterContextMeta

Bases: type

reagent.tensorboardX.summary_writer_context(writer)

reagent.torch_utils module

reagent.torch_utils.dict_to_tensor(batch: Dict[str, numpy.ndarray], device: str = 'cpu')
reagent.torch_utils.export_module_to_buffer(module) → _io.BytesIO
reagent.torch_utils.gather(data, index_2d)

Gather data along the second dim. Assumes data’s shape is (batch_size, dim1, dim2, …) and index_2d’s shape is (batch_size, dim1), with output[i][j] = data[i][index_2d[i][j]].

Unlike torch.gather, this function does not require data, output, and index_2d to have the same shape.
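A pure-Python sketch of the indexing rule above, with nested lists standing in for tensors (the real function operates on torch.Tensor):

```python
def gather_rows(data, index_2d):
    """For each batch element i, pick rows of data[i] in the order
    given by index_2d[i]: output[i][j] = data[i][index_2d[i][j]]."""
    return [[data[i][j] for j in index_2d[i]] for i in range(len(data))]

# Two batch elements, each with three rows of dim-2 features.
data = [[[0, 0], [1, 1], [2, 2]],
        [[3, 3], [4, 4], [5, 5]]]
index_2d = [[2, 0], [1, 1]]  # note: output has fewer rows than data

print(gather_rows(data, index_2d))
# → [[[2, 2], [0, 0]], [[4, 4], [4, 4]]]
```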

reagent.torch_utils.masked_softmax(x, mask, temperature)

Compute softmax values for each set of scores in x.
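A pure-Python sketch of one common masked-softmax formulation over a single row of scores (assumed semantics; the exact handling in reagent may differ, and the real function operates on torch tensors):

```python
import math

def masked_softmax_sketch(x, mask, temperature):
    """Softmax over x / temperature, assigning zero probability
    to positions where mask == 0."""
    exps = [math.exp(xi / temperature) if m else 0.0
            for xi, m in zip(x, mask)]
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_softmax_sketch([1.0, 2.0, 3.0], [1, 1, 0], temperature=1.0)
# The masked third entry gets probability 0; the rest sum to 1.
```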

reagent.torch_utils.rescale_torch_tensor(tensor: torch.Tensor, new_min: torch.Tensor, new_max: torch.Tensor, prev_min: torch.Tensor, prev_max: torch.Tensor)

Rescale column values in an N x M torch tensor to a new range. Each column m of the input tensor is rescaled from the range [prev_min[m], prev_max[m]] to [new_min[m], new_max[m]].
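The per-column transform described above is a standard linear rescaling; a pure-Python sketch for a single column (the real function applies this columnwise to a torch tensor):

```python
def rescale_column(values, prev_min, prev_max, new_min, new_max):
    """Linearly map each value from [prev_min, prev_max] to
    [new_min, new_max]."""
    scale = (new_max - new_min) / (prev_max - prev_min)
    return [(v - prev_min) * scale + new_min for v in values]

print(rescale_column([0.0, 5.0, 10.0], 0.0, 10.0, -1.0, 1.0))
# → [-1.0, 0.0, 1.0]
```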

reagent.torch_utils.softmax(x, temperature)

Compute softmax values for each set of scores in x.

reagent.torch_utils.stack(mems)

Stack a list of tensors. torch.stack could be used here, but it is much slower than torch.cat + view; an issue was submitted for investigation: https://github.com/pytorch/pytorch/issues/22462

FIXME: remove this function once the issue above is resolved

reagent.types module

class reagent.types.ActorOutput(action: torch.Tensor, log_prob: Union[torch.Tensor, NoneType] = None, action_mean: Union[torch.Tensor, NoneType] = None)

Bases: reagent.types.TensorDataClass

action_mean = None
log_prob = None
class reagent.types.BaseInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor)

Bases: reagent.types.TensorDataClass

Base class for all inputs, both raw and preprocessed

batch_size()
classmethod from_dict(batch)
class reagent.types.DiscreteDqnInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Union[torch.Tensor, NoneType], not_terminal: torch.Tensor, action: torch.Tensor, next_action: torch.Tensor, possible_actions_mask: torch.Tensor, possible_next_actions_mask: torch.Tensor, extras: reagent.types.ExtraData)

Bases: reagent.types.BaseInput

classmethod from_dict(batch)
class reagent.types.DocList(float_features: torch.Tensor, mask: torch.Tensor, value: torch.Tensor)

Bases: reagent.types.TensorDataClass

as_feature_data()
select_slate(action: torch.Tensor)
class reagent.types.DqnPolicyActionSet(greedy: int, softmax: Union[int, NoneType] = None, greedy_act_name: Union[str, NoneType] = None, softmax_act_name: Union[str, NoneType] = None, softmax_act_prob: Union[float, NoneType] = None)

Bases: reagent.types.TensorDataClass

greedy_act_name = None
softmax = None
softmax_act_name = None
softmax_act_prob = None
class reagent.types.ExtraData(mdp_id: Union[torch.Tensor, NoneType] = None, sequence_number: Union[torch.Tensor, NoneType] = None, action_probability: Union[torch.Tensor, NoneType] = None, max_num_actions: Union[int, NoneType] = None, metrics: Union[torch.Tensor, NoneType] = None)

Bases: reagent.types.TensorDataClass

action_probability = None
classmethod from_dict(d)
max_num_actions = None
mdp_id = None
metrics = None
sequence_number = None
class reagent.types.FeatureData(float_features: torch.Tensor, id_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor]] = <factory>, id_score_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]] = <factory>, stacked_float_features: Union[torch.Tensor, NoneType] = None, candidate_docs: Union[reagent.types.DocList, NoneType] = None, time_since_first: Union[torch.Tensor, NoneType] = None)

Bases: reagent.types.TensorDataClass

candidate_docs = None
get_tiled_batch(num_tiles: int)
property has_float_features_only
stacked_float_features = None
time_since_first = None
class reagent.types.FloatFeatureInfo(name: str, feature_id: int)

Bases: reagent.base_dataclass.BaseDataClass

class reagent.types.IdListFeatureConfig(name: str, feature_id: int, id_mapping_name: str)

Bases: reagent.base_dataclass.BaseDataClass

class reagent.types.IdMapping(ids: List[int])

Bases: object

property id2index

Used in preprocessing. The ids list represents a mapping from index to id value; this property gives the reverse: a mapping from raw feature id to embedding-table index.

property ids
property table_size
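A sketch of the inversion id2index performs, using a plain dict (assumed semantics based on the docstring; the real property belongs to IdMapping and may cache its result):

```python
def id2index_sketch(ids):
    """Invert an index -> id list into an id -> index dict, so raw
    feature ids can be mapped to embedding-table rows."""
    return {id_: idx for idx, id_ in enumerate(ids)}

mapping = id2index_sketch([100, 42, 7])
# mapping[42] == 1: feature id 42 lives at embedding row 1
```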
class reagent.types.IdScoreListFeatureConfig(name: str, feature_id: int, id_mapping_name: str)

Bases: reagent.base_dataclass.BaseDataClass

class reagent.types.MemoryNetworkInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Union[torch.Tensor, NoneType], not_terminal: torch.Tensor, action: torch.Tensor)

Bases: reagent.types.BaseInput

batch_size()
class reagent.types.MemoryNetworkOutput(mus: torch.Tensor, sigmas: torch.Tensor, logpi: torch.Tensor, reward: torch.Tensor, not_terminal: torch.Tensor, last_step_lstm_hidden: torch.Tensor, last_step_lstm_cell: torch.Tensor, all_steps_lstm_hidden: torch.Tensor)

Bases: reagent.types.TensorDataClass

class reagent.types.ModelFeatureConfig(float_feature_infos: List[reagent.types.FloatFeatureInfo] = <factory>, id_mapping_config: Dict[str, reagent.types.IdMapping] = <factory>, id_list_feature_configs: List[reagent.types.IdListFeatureConfig] = <factory>, id_score_list_feature_configs: List[reagent.types.IdScoreListFeatureConfig] = <factory>)

Bases: reagent.base_dataclass.BaseDataClass

property id2config
property id2name
property name2config
property name2id
property only_dense
class reagent.types.NoDuplicatedWarningLogger(logger)

Bases: object

warning(msg)
class reagent.types.ParametricDqnInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Union[torch.Tensor, NoneType], not_terminal: torch.Tensor, action: reagent.types.FeatureData, next_action: reagent.types.FeatureData, possible_actions: reagent.types.FeatureData, possible_actions_mask: torch.Tensor, possible_next_actions: reagent.types.FeatureData, possible_next_actions_mask: torch.Tensor, extras: Union[reagent.types.ExtraData, NoneType] = None)

Bases: reagent.types.BaseInput

extras = None
classmethod from_dict(batch)
class reagent.types.PlanningPolicyOutput(next_best_continuous_action: Union[torch.Tensor, NoneType] = None, next_best_discrete_action_one_hot: Union[torch.Tensor, NoneType] = None, next_best_discrete_action_idx: Union[int, NoneType] = None)

Bases: reagent.types.TensorDataClass

next_best_continuous_action = None
next_best_discrete_action_idx = None
next_best_discrete_action_one_hot = None
class reagent.types.PolicyGradientInput(state: reagent.types.FeatureData, action: torch.Tensor, reward: torch.Tensor, log_prob: torch.Tensor)

Bases: reagent.base_dataclass.BaseDataClass

classmethod input_prototype()
class reagent.types.PolicyNetworkInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Union[torch.Tensor, NoneType], not_terminal: torch.Tensor, action: reagent.types.FeatureData, next_action: reagent.types.FeatureData, extras: Union[reagent.types.ExtraData, NoneType] = None)

Bases: reagent.types.BaseInput

batch_size() → int
extras = None
classmethod from_dict(batch)
class reagent.types.PreprocessedRankingInput(state: reagent.types.FeatureData, src_seq: reagent.types.FeatureData, src_src_mask: torch.Tensor, tgt_in_seq: Union[reagent.types.FeatureData, NoneType] = None, tgt_out_seq: Union[reagent.types.FeatureData, NoneType] = None, tgt_tgt_mask: Union[torch.Tensor, NoneType] = None, slate_reward: Union[torch.Tensor, NoneType] = None, position_reward: Union[torch.Tensor, NoneType] = None, src_in_idx: Union[torch.Tensor, NoneType] = None, tgt_in_idx: Union[torch.Tensor, NoneType] = None, tgt_out_idx: Union[torch.Tensor, NoneType] = None, tgt_out_probs: Union[torch.Tensor, NoneType] = None, optim_tgt_in_idx: Union[torch.Tensor, NoneType] = None, optim_tgt_out_idx: Union[torch.Tensor, NoneType] = None, optim_tgt_in_seq: Union[reagent.types.FeatureData, NoneType] = None, optim_tgt_out_seq: Union[reagent.types.FeatureData, NoneType] = None)

Bases: reagent.types.TensorDataClass

batch_size() → int
classmethod from_tensors(state: torch.Tensor, src_seq: torch.Tensor, src_src_mask: torch.Tensor, tgt_in_seq: Optional[torch.Tensor] = None, tgt_out_seq: Optional[torch.Tensor] = None, tgt_tgt_mask: Optional[torch.Tensor] = None, slate_reward: Optional[torch.Tensor] = None, position_reward: Optional[torch.Tensor] = None, src_in_idx: Optional[torch.Tensor] = None, tgt_in_idx: Optional[torch.Tensor] = None, tgt_out_idx: Optional[torch.Tensor] = None, tgt_out_probs: Optional[torch.Tensor] = None, optim_tgt_in_idx: Optional[torch.Tensor] = None, optim_tgt_out_idx: Optional[torch.Tensor] = None, optim_tgt_in_seq: Optional[torch.Tensor] = None, optim_tgt_out_seq: Optional[torch.Tensor] = None, **kwargs)
optim_tgt_in_idx = None
optim_tgt_in_seq = None
optim_tgt_out_idx = None
optim_tgt_out_seq = None
position_reward = None
slate_reward = None
src_in_idx = None
tgt_in_idx = None
tgt_in_seq = None
tgt_out_idx = None
tgt_out_probs = None
tgt_out_seq = None
tgt_tgt_mask = None
class reagent.types.PreprocessedTrainingBatch(training_input: reagent.types.PreprocessedRankingInput, extras: reagent.types.ExtraData = <factory>)

Bases: reagent.types.TensorDataClass

batch_size()
class reagent.types.RankingOutput(ranked_tgt_out_idx: Union[torch.Tensor, NoneType] = None, ranked_tgt_out_probs: Union[torch.Tensor, NoneType] = None, log_probs: Union[torch.Tensor, NoneType] = None, encoder_scores: Union[torch.Tensor, NoneType] = None)

Bases: reagent.types.TensorDataClass

encoder_scores = None
log_probs = None
ranked_tgt_out_idx = None
ranked_tgt_out_probs = None
class reagent.types.RewardNetworkOutput(predicted_reward: torch.Tensor)

Bases: reagent.types.TensorDataClass

class reagent.types.Seq2RewardOutput(acc_reward: torch.Tensor)

Bases: reagent.types.TensorDataClass

class reagent.types.ServingFeatureData(float_features_with_presence, id_list_features, id_score_list_features)

Bases: tuple

property float_features_with_presence

Alias for field number 0

property id_list_features

Alias for field number 1

property id_score_list_features

Alias for field number 2

class reagent.types.SlateQInput(state: reagent.types.FeatureData, next_state: reagent.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: torch.Tensor, next_action: torch.Tensor, reward_mask: torch.Tensor, extras: Optional[reagent.types.ExtraData] = None)

Bases: reagent.types.BaseInput

The shapes of reward, reward_mask, & next_item_mask are (batch_size, slate_size).

reward_mask indicates whether the reward could be observed, e.g., whether the item entered the viewport.

extras = None
classmethod from_dict(d)
class reagent.types.TensorDataClass

Bases: reagent.base_dataclass.BaseDataClass

cuda(*args, **kwargs)
class reagent.types.TensorFeatureData(*args, **kwargs)

Bases: torch.nn.Module

Primarily for use in nn.Sequential

forward(input: torch.Tensor) → reagent.types.FeatureData
class reagent.types.ValuePresence(value: torch.Tensor, presence: Union[torch.Tensor, NoneType])

Bases: reagent.types.TensorDataClass

reagent.types.isinstance_namedtuple(x)

Module contents