reagent.core package

Submodules

reagent.core.aggregators module

class reagent.core.aggregators.ActionCountAggregator(key: str, actions: List[str])

Bases: reagent.core.aggregators.TensorAggregator

Counting the frequency of each action. Actions are indexed from 0 to len(actions) - 1. The input is assumed to contain action indices.

aggregate(values)
get_cumulative_distributions() Dict[str, float]

Returns the cumulative distributions in each aggregating step

get_distributions() Dict[str, List[float]]

Returns the action distributions in each aggregating step
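
A minimal usage sketch (not from the library docs), assuming aggregate() accepts a 1-D tensor of action indices as described above:

    import torch
    from reagent.core.aggregators import ActionCountAggregator

    agg = ActionCountAggregator("action", ["left", "right"])
    agg.aggregate(torch.tensor([0, 0, 1]))  # first aggregating step: two "left", one "right"
    agg.aggregate(torch.tensor([1, 1, 1]))  # second step: three "right"
    print(agg.get_distributions())             # per-step action distributions
    print(agg.get_cumulative_distributions())  # distribution over all steps so far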

class reagent.core.aggregators.EpochListAggregator(key: str)

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)
flush()
class reagent.core.aggregators.FunctionsByActionAggregator(key: str, actions: List[str], fns: Dict[str, Callable])

Bases: reagent.core.aggregators.TensorAggregator

Aggregating the input by action, using the given functions. The input is assumed to be an N x D tensor, where each column is an action, and each row is an example. This takes a dictionary of functions so that the values only need to be concatenated once.

Example

    agg = FunctionsByActionAggregator(
        "model_values", ["A", "B"], {"mean": torch.mean, "std": torch.std}
    )

    input = torch.tensor([
        [0.9626, 0.7142],
        [0.7216, 0.5426],
        [0.4225, 0.9485],
    ])
    agg(input)
    input2 = torch.tensor([
        [0.0103, 0.0306],
        [0.9846, 0.8373],
        [0.4614, 0.0174],
    ])
    agg(input2)
    print(agg.values)

    {
        "mean": {
            "A": [0.7022, 0.4854],
            "B": [0.7351, 0.2951],
        },
        "std": {
            "A": [0.2706, 0.4876],
            "B": [0.2038, 0.4696],
        },
    }

aggregate(values)
class reagent.core.aggregators.ListAggregator(key: str)

Bases: reagent.core.tracker.Aggregator

aggregate(values)
class reagent.core.aggregators.MeanAggregator(key: str)

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)
class reagent.core.aggregators.RecentValuesAggregator(key: str, size: int = 1000000)

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)
class reagent.core.aggregators.TensorAggregator(key: str)

Bases: reagent.core.tracker.Aggregator

class reagent.core.aggregators.TensorBoardActionCountAggregator(key: str, title: str, actions: List[str])

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)
class reagent.core.aggregators.TensorBoardActionHistogramAndMeanAggregator(key: str, category: str, title: str, actions: List[str], log_key_prefix: Optional[str] = None)

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)
class reagent.core.aggregators.TensorBoardHistogramAndMeanAggregator(key: str, log_key: str)

Bases: reagent.core.aggregators.TensorAggregator

aggregate(values)

reagent.core.base_dataclass module

We should revisit this at some point. Config classes shouldn’t subclass from this.

class reagent.core.base_dataclass.BaseDataClass

Bases: object

reagent.core.configuration module

reagent.core.configuration.make_config_class(func, allowlist: Optional[List[str]] = None, blocklist: Optional[List[str]] = None, blocklist_types: List[Type] = [torch.nn.Module])

Creates a decorator that builds a dataclass with the arguments of func as fields. Only annotated arguments are converted to fields. If a default value is mutable, you must use dataclasses.field(default_factory=...) as the default. In that case, func has to be wrapped with @resolve_defaults below.

allowlist & blocklist are mutually exclusive.

reagent.core.configuration.param_hash(p)

Use this to make parameters hashable. This is required because __hash__() is not inherited when subclass redefines __eq__(). We only need this when the parameter dataclass has a list or dict field.

reagent.core.configuration.resolve_defaults(func)

Use this decorator to resolve default field values in the constructor.
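
A hedged sketch of how these utilities are meant to compose, based only on the descriptions above; the Trainer and TrainerConfig names are hypothetical:

    from dataclasses import field
    from typing import List

    from reagent.core.configuration import make_config_class, resolve_defaults

    class Trainer:
        # resolve_defaults lets the mutable default below be declared with
        # dataclasses.field(default_factory=...) and resolved at call time.
        @resolve_defaults
        def __init__(
            self,
            learning_rate: float = 1e-3,
            hidden_sizes: List[int] = field(default_factory=lambda: [64, 64]),
        ) -> None:
            self.learning_rate = learning_rate
            self.hidden_sizes = hidden_sizes

    # make_config_class(Trainer.__init__) returns a decorator that turns
    # TrainerConfig into a dataclass whose fields mirror the annotated
    # arguments of Trainer.__init__.
    @make_config_class(Trainer.__init__)
    class TrainerConfig:
        pass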

reagent.core.dataclasses module

reagent.core.dataclasses.dataclass(_cls: Optional[Any] = None, *, config=None, **kwargs)

reagent.core.debug_on_error module

reagent.core.debug_on_error.start()

reagent.core.fb_checker module

reagent.core.fb_checker.is_fb_environment()

reagent.core.multiprocess_utils module

reagent.core.multiprocess_utils.deserialize_and_run(serialized_fn: bytes, serialized_args: List[bytes], serialized_kwargs: Dict[str, bytes], *args, **kwargs) bytes
reagent.core.multiprocess_utils.unwrap_function_outputs(outputs: List[bytes])
reagent.core.multiprocess_utils.wrap_function_arguments(fn, *args, **kwargs)

reagent.core.observers module

class reagent.core.observers.CompositeObserver(observers: Iterable[reagent.core.tracker.Observer])

Bases: reagent.core.tracker.Observer

A composite observer that dispatches values to its child observers

update(key: str, value)
class reagent.core.observers.EpochEndObserver(callback, key: str = 'epoch_end')

Bases: reagent.core.tracker.Observer

Calls the callback function with the epoch number when the epoch ends

update(key: str, value)
class reagent.core.observers.IntervalAggregatingObserver(interval: Optional[int], aggregator: reagent.core.tracker.Aggregator, observe_epoch_end: bool = True)

Bases: reagent.core.tracker.Observer

flush()
update(key: str, value)
class reagent.core.observers.TensorBoardScalarObserver(key: str, logging_key: Optional[str])

Bases: reagent.core.tracker.Observer

update(key: str, value)
class reagent.core.observers.ValueListObserver(observing_key: str)

Bases: reagent.core.tracker.Observer

Simple observer that collects values into a list

reset()
update(key: str, value)
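
A hypothetical wiring sketch using the constructors documented above; the "td_loss" key and the reported value are illustrative only:

    import torch

    from reagent.core.aggregators import MeanAggregator
    from reagent.core.observers import (
        CompositeObserver,
        IntervalAggregatingObserver,
        ValueListObserver,
    )

    # Aggregate a mean of "td_loss" every 100 updates, and also keep the raw
    # values in a list; CompositeObserver dispatches to both children.
    observer = CompositeObserver(
        [
            IntervalAggregatingObserver(interval=100, aggregator=MeanAggregator("td_loss")),
            ValueListObserver("td_loss"),
        ]
    )
    observer.update("td_loss", torch.tensor([0.42]))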

reagent.core.oss_tensorboard_logger module

reagent.core.parameters module

class reagent.core.parameters.BaselineParameters(dim_feedforward: int, num_stacked_layers: int, warmup_num_batches: int = 0)

Bases: reagent.core.base_dataclass.BaseDataClass

dim_feedforward: int
num_stacked_layers: int
warmup_num_batches: int = 0
class reagent.core.parameters.CEMTrainerParameters(plan_horizon_length: int = 0, num_world_models: int = 0, cem_population_size: int = 0, cem_num_iterations: int = 0, ensemble_population_size: int = 0, num_elites: int = 0, mdnrnn: reagent.core.parameters.MDNRNNTrainerParameters = MDNRNNTrainerParameters(hidden_size=64, num_hidden_layers=2, learning_rate=0.001, num_gaussians=5, reward_loss_weight=1.0, next_state_loss_weight=1.0, not_terminal_loss_weight=1.0, fit_only_one_next_step=False, action_dim=2, action_names=None, multi_steps=1), rl: reagent.core.parameters.RLParameters = RLParameters(gamma=0.9, epsilon=0.1, target_update_rate=0.001, maxq_learning=True, reward_boost=None, temperature=0.01, softmax_policy=False, use_seq_num_diff_as_time_diff=False, q_network_loss='mse', set_missing_value_to_zero=False, tensorboard_logging_freq=0, predictor_atol_check=0.0, predictor_rtol_check=5e-05, time_diff_unit_length=1.0, multi_steps=None, ratio_different_predictions_tolerance=0.0), alpha: float = 0.25, epsilon: float = 0.001)

Bases: reagent.core.base_dataclass.BaseDataClass

alpha: float = 0.25
cem_num_iterations: int = 0
cem_population_size: int = 0
ensemble_population_size: int = 0
epsilon: float = 0.001
mdnrnn: reagent.core.parameters.MDNRNNTrainerParameters = MDNRNNTrainerParameters(hidden_size=64, num_hidden_layers=2, learning_rate=0.001, num_gaussians=5, reward_loss_weight=1.0, next_state_loss_weight=1.0, not_terminal_loss_weight=1.0, fit_only_one_next_step=False, action_dim=2, action_names=None, multi_steps=1)
num_elites: int = 0
num_world_models: int = 0
plan_horizon_length: int = 0
rl: reagent.core.parameters.RLParameters = RLParameters(gamma=0.9, epsilon=0.1, target_update_rate=0.001, maxq_learning=True, reward_boost=None, temperature=0.01, softmax_policy=False, use_seq_num_diff_as_time_diff=False, q_network_loss='mse', set_missing_value_to_zero=False, tensorboard_logging_freq=0, predictor_atol_check=0.0, predictor_rtol_check=5e-05, time_diff_unit_length=1.0, multi_steps=None, ratio_different_predictions_tolerance=0.0)
class reagent.core.parameters.ConvNetParameters(conv_dims: List[int], conv_height_kernels: List[int], pool_types: List[str], pool_kernel_sizes: List[int], conv_width_kernels: Optional[List[int]] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

conv_dims: List[int]
conv_height_kernels: List[int]
conv_width_kernels: Optional[List[int]] = None
pool_kernel_sizes: List[int]
pool_types: List[str]
class reagent.core.parameters.EvaluationParameters(calc_cpe_in_training: bool = True)

Bases: reagent.core.base_dataclass.BaseDataClass

calc_cpe_in_training: bool = True
class reagent.core.parameters.EvolutionParameters(population_size: int = 1000, mutation_power: float = 0.1, learning_rate: float = 0.01)

Bases: reagent.core.base_dataclass.BaseDataClass

learning_rate: float = 0.01
mutation_power: float = 0.1
population_size: int = 1000
class reagent.core.parameters.GRUParameters(dim_model: int, num_stacked_layers: int)

Bases: reagent.core.base_dataclass.BaseDataClass

dim_model: int
num_stacked_layers: int
class reagent.core.parameters.MDNRNNTrainerParameters(hidden_size: int = 64, num_hidden_layers: int = 2, learning_rate: float = 0.001, num_gaussians: int = 5, reward_loss_weight: float = 1.0, next_state_loss_weight: float = 1.0, not_terminal_loss_weight: float = 1.0, fit_only_one_next_step: bool = False, action_dim: int = 2, action_names: Optional[List[str]] = None, multi_steps: int = 1)

Bases: reagent.core.base_dataclass.BaseDataClass

action_dim: int = 2
action_names: Optional[List[str]] = None
fit_only_one_next_step: bool = False
hidden_size: int = 64
learning_rate: float = 0.001
multi_steps: int = 1
next_state_loss_weight: float = 1.0
not_terminal_loss_weight: float = 1.0
num_gaussians: int = 5
num_hidden_layers: int = 2
reward_loss_weight: float = 1.0
class reagent.core.parameters.NormalizationData(dense_normalization_parameters: Dict[int, reagent.core.parameters.NormalizationParameters])

Bases: reagent.core.base_dataclass.BaseDataClass

dense_normalization_parameters: Dict[int, reagent.core.parameters.NormalizationParameters]
class reagent.core.parameters.NormalizationKey

Bases: object

Keys for dictionaries of NormalizationData

ACTION = 'action'
CANDIDATE = 'candidate'
ITEM = 'item'
STATE = 'state'
class reagent.core.parameters.NormalizationParameters(feature_type: str, boxcox_lambda: Optional[float] = None, boxcox_shift: Optional[float] = None, mean: Optional[float] = None, stddev: Optional[float] = None, possible_values: Optional[List[int]] = None, quantiles: Optional[List[float]] = None, min_value: Optional[float] = None, max_value: Optional[float] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

boxcox_lambda: Optional[float] = None
boxcox_shift: Optional[float] = None
feature_type: str
max_value: Optional[float] = None
mean: Optional[float] = None
min_value: Optional[float] = None
possible_values: Optional[List[int]] = None
quantiles: Optional[List[float]] = None
stddev: Optional[float] = None
class reagent.core.parameters.ProblemDomain(value)

Bases: enum.Enum

An enumeration.

CONTINUOUS_ACTION = 'continuous_action'
DISCRETE_ACTION = 'discrete_action'
MDN_RNN = 'mdn_rnn'
PARAMETRIC_ACTION = 'parametric_action'
SEQ_TO_REWARD = 'seq2reward'
class reagent.core.parameters.RLParameters(gamma: float = 0.9, epsilon: float = 0.1, target_update_rate: float = 0.001, maxq_learning: bool = True, reward_boost: Optional[Dict[str, float]] = None, temperature: float = 0.01, softmax_policy: bool = False, use_seq_num_diff_as_time_diff: bool = False, q_network_loss: str = 'mse', set_missing_value_to_zero: bool = False, tensorboard_logging_freq: int = 0, predictor_atol_check: float = 0.0, predictor_rtol_check: float = 5e-05, time_diff_unit_length: float = 1.0, multi_steps: Optional[int] = None, ratio_different_predictions_tolerance: float = 0)

Bases: reagent.core.base_dataclass.BaseDataClass

epsilon: float = 0.1
gamma: float = 0.9
maxq_learning: bool = True
multi_steps: Optional[int] = None
predictor_atol_check: float = 0.0
predictor_rtol_check: float = 5e-05
q_network_loss: str = 'mse'
ratio_different_predictions_tolerance: float = 0
reward_boost: Optional[Dict[str, float]] = None
set_missing_value_to_zero: bool = False
softmax_policy: bool = False
target_update_rate: float = 0.001
temperature: float = 0.01
tensorboard_logging_freq: int = 0
time_diff_unit_length: float = 1.0
use_seq_num_diff_as_time_diff: bool = False
class reagent.core.parameters.RankingParameters(max_src_seq_len: int = 0, max_tgt_seq_len: int = 0, greedy_serving: bool = False)

Bases: reagent.core.base_dataclass.BaseDataClass

greedy_serving: bool = False
max_src_seq_len: int = 0
max_tgt_seq_len: int = 0
class reagent.core.parameters.Seq2RewardTrainerParameters(learning_rate: float = 0.001, multi_steps: int = 1, action_names: List[str] = <factory>, compress_model_learning_rate: float = 0.001, gamma: float = 1.0, view_q_value: bool = False, step_predict_net_size: int = 64, reward_boost: Optional[Dict[str, float]] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

action_names: List[str]
compress_model_learning_rate: float = 0.001
gamma: float = 1.0
learning_rate: float = 0.001
multi_steps: int = 1
reward_boost: Optional[Dict[str, float]] = None
step_predict_net_size: int = 64
view_q_value: bool = False
class reagent.core.parameters.Seq2SlateParameters(on_policy: bool = True, learning_method: reagent.core.parameters_seq2slate.LearningMethod = <LearningMethod.REINFORCEMENT_LEARNING: 'reinforcement_learning'>, ips_clamp: Optional[reagent.core.parameters_seq2slate.IPSClamp] = None, simulation: Optional[reagent.core.parameters_seq2slate.SimulationParameters] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

ips_clamp: Optional[reagent.core.parameters_seq2slate.IPSClamp] = None
learning_method: reagent.core.parameters_seq2slate.LearningMethod = 'reinforcement_learning'
on_policy: bool = True
simulation: Optional[reagent.core.parameters_seq2slate.SimulationParameters] = None
class reagent.core.parameters.SlateOptMethod(value)

Bases: enum.Enum

An enumeration.

EXACT = 'exact'
GREEDY = 'greedy'
TOP_K = 'top_k'
class reagent.core.parameters.SlateOptParameters(method: reagent.core.parameters.SlateOptMethod = <SlateOptMethod.TOP_K: 'top_k'>)

Bases: reagent.core.base_dataclass.BaseDataClass

method: reagent.core.parameters.SlateOptMethod = 'top_k'
class reagent.core.parameters.StateFeatureParameters(state_feature_names_override: List[str] = <factory>, state_feature_hashes_override: List[int] = <factory>)

Bases: reagent.core.base_dataclass.BaseDataClass

state_feature_hashes_override: List[int]
state_feature_names_override: List[str]
class reagent.core.parameters.TransformerParameters(num_heads: int = 1, dim_model: int = 64, dim_feedforward: int = 32, num_stacked_layers: int = 2, state_embed_dim: Optional[int] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

dim_feedforward: int = 32
dim_model: int = 64
num_heads: int = 1
num_stacked_layers: int = 2
state_embed_dim: Optional[int] = None

reagent.core.parameters_seq2slate module

class reagent.core.parameters_seq2slate.IPSClamp(clamp_method: reagent.core.parameters_seq2slate.IPSClampMethod, clamp_max: float)

Bases: reagent.core.base_dataclass.BaseDataClass

clamp_max: float
clamp_method: reagent.core.parameters_seq2slate.IPSClampMethod
class reagent.core.parameters_seq2slate.IPSClampMethod(value)

Bases: enum.Enum

An enumeration.

AGGRESSIVE = 'aggressive'
UNIVERSAL = 'universal'
class reagent.core.parameters_seq2slate.LearningMethod(value)

Bases: enum.Enum

An enumeration.

PAIRWISE_ATTENTION = 'pairwise_attention'
REINFORCEMENT_LEARNING = 'reinforcement_learning'
SIMULATION = 'simulation'
TEACHER_FORCING = 'teacher_forcing'
property expect_slate_wise_reward
class reagent.core.parameters_seq2slate.RewardClamp(clamp_min: Optional[float] = None, clamp_max: Optional[float] = None)

Bases: object

clamp_max: Optional[float] = None
clamp_min: Optional[float] = None
class reagent.core.parameters_seq2slate.SimulationParameters(reward_name_weight: Dict[str, float], reward_name_power: Dict[str, float], reward_name_path: Dict[str, str], reward_clamp: Optional[reagent.core.parameters_seq2slate.RewardClamp] = None, distance_penalty: Optional[float] = None)

Bases: reagent.core.base_dataclass.BaseDataClass

distance_penalty: Optional[float] = None
reward_clamp: Optional[reagent.core.parameters_seq2slate.RewardClamp] = None
reward_name_path: Dict[str, str]
reward_name_power: Dict[str, float]
reward_name_weight: Dict[str, float]

reagent.core.registry_meta module

class reagent.core.registry_meta.RegistryMeta(name, bases, namespace, **kwargs)

Bases: abc.ABCMeta

fill_union()
reagent.core.registry_meta.wrap_oss_with_dataclass(union)

reagent.core.report_utils module

reagent.core.report_utils.calculate_recent_window_average(arr, window_size, num_entries)
reagent.core.report_utils.get_mean_of_recent_values(values: Dict[str, List[float]], min_window_size=10) Dict[str, float]

reagent.core.result_registries module

class reagent.core.result_registries.PublishingResult(success: bool)

Bases: object

REGISTRY = {'no_publishing_results': <class 'reagent.core.result_types.NoPublishingResults'>}
REGISTRY_FROZEN = False
REGISTRY_NAME = 'PublishingResult'
success: bool
class reagent.core.result_registries.TrainingReport

Bases: object

REGISTRY = {}
REGISTRY_FROZEN = False
REGISTRY_NAME = 'TrainingReport'
class reagent.core.result_registries.ValidationResult(should_publish: bool)

Bases: object

REGISTRY = {'no_validation_results': <class 'reagent.core.result_types.NoValidationResults'>}
REGISTRY_FROZEN = False
REGISTRY_NAME = 'ValidationResult'
should_publish: bool

reagent.core.result_types module

class reagent.core.result_types.NoPublishingResults(success: bool)

Bases: reagent.core.result_registries.PublishingResult

success: bool
class reagent.core.result_types.NoValidationResults(should_publish: bool)

Bases: reagent.core.result_registries.ValidationResult

should_publish: bool

reagent.core.running_stats module

class reagent.core.running_stats.RunningStats(lst=None, capacity: int = 1000)

Bases: object

Running statistics for elements in a stream.

Can take single values or iterables.

1. Implements Welford's algorithm for computing a running mean and standard deviation
2. Uses a min-heap to find the top-k values, where k < capacity (kwarg)

mean     - returns the mean
std      - returns the std
meanfull - returns the mean and the std of the mean
topk(k)  - returns the kth highest value for k < capacity
consume(lst)
property mean
property meanfull
property std
update(x)
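
Since the docstring names Welford's algorithm, here is a standalone sketch of that update rule for reference; it illustrates the algorithm, not the reagent implementation:

    class WelfordSketch:
        """Running mean/std via Welford's algorithm (illustrative only)."""

        def __init__(self) -> None:
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0  # sum of squared deviations from the running mean

        def update(self, x: float) -> None:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        @property
        def std(self) -> float:
            return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0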

reagent.core.tagged_union module

class reagent.core.tagged_union.TaggedUnion

Bases: object

Subclasses are assumed to be pydantic dataclasses. All fields must be Optional, with None as the default value. Changing the selected field/value after construction is not supported.

property value
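
A hypothetical subclass sketch following the constraints described above; FooOption and BarOption are placeholder configs, and it is assumed here that .value returns the single field that was set:

    from typing import Optional

    from reagent.core.dataclasses import dataclass
    from reagent.core.tagged_union import TaggedUnion

    @dataclass
    class FooOption:
        x: int = 0

    @dataclass
    class BarOption:
        y: int = 0

    @dataclass
    class MyUnion(TaggedUnion):
        # All fields are Optional with None defaults, as required.
        foo: Optional[FooOption] = None
        bar: Optional[BarOption] = None

    u = MyUnion(foo=FooOption(x=1))
    assert u.value is u.foo  # assumption: .value is the selected field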

reagent.core.tensorboardX module

Context library to allow dropping tensorboardX anywhere in the codebase. If there is no SummaryWriter in the context, function calls will be no-ops.

Usage:

    writer = SummaryWriter()

    with summary_writer_context(writer):
        some_func()

    def some_func():
        SummaryWriterContext.add_scalar("foo", tensor)

class reagent.core.tensorboardX.SummaryWriterContext

Bases: object

classmethod add_custom_scalars(writer)

Call this once you have finished setting up custom scalars

classmethod add_custom_scalars_multilinechart(tags, category=None, title=None)
classmethod add_histogram(key, val, *args, **kwargs)
classmethod increase_global_step()
classmethod pop()
classmethod push(writer)
class reagent.core.tensorboardX.SummaryWriterContextMeta

Bases: type

reagent.core.tensorboardX.summary_writer_context(writer)

reagent.core.torch_utils module

reagent.core.torch_utils.dict_to_tensor(batch: Dict[str, numpy.ndarray], device: str = 'cpu')
reagent.core.torch_utils.export_module_to_buffer(module) _io.BytesIO
reagent.core.torch_utils.gather(data, index_2d)

Gather data along the second dim. Assumes data is 3-D with shape (batch_size, dim1, dim2) and index_2d has shape (batch_size, dim1). Then output[i][j] = data[i][index_2d[i][j]].

Unlike torch.gather, this function does not require data, output, and index_2d to have the same shape.
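
An illustration of the documented indexing semantics (this is not the library implementation):

    import torch

    data = torch.arange(24, dtype=torch.float32).view(2, 3, 4)  # (batch_size, dim1, dim2)
    index_2d = torch.tensor([[2, 0, 1], [1, 1, 0]])             # (batch_size, dim1)
    # output[i][j] = data[i][index_2d[i][j]]
    output = torch.stack([data[i][index_2d[i]] for i in range(data.shape[0])])
    print(output.shape)  # torch.Size([2, 3, 4])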

reagent.core.torch_utils.get_device(model)
reagent.core.torch_utils.masked_softmax(x, mask, temperature)

Compute softmax values for each set of scores in x.
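
A common masked-softmax pattern, sketched under the assumption that mask holds 1 for valid entries and 0 for masked ones; the reagent implementation may handle edge cases differently:

    import torch

    def masked_softmax_sketch(x, mask, temperature):
        scores = x / temperature
        scores = scores.masked_fill(mask == 0, float("-inf"))
        return torch.softmax(scores, dim=-1)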

reagent.core.torch_utils.rescale_torch_tensor(tensor: torch.Tensor, new_min: torch.Tensor, new_max: torch.Tensor, prev_min: torch.Tensor, prev_max: torch.Tensor)

Rescale column values in an N x M torch tensor to a new range. Each column m in the input tensor is rescaled from the range [prev_min[m], prev_max[m]] to [new_min[m], new_max[m]].
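
The per-column rescaling described above presumably amounts to standard min-max scaling, sketched below (not necessarily the exact library implementation):

    import torch

    def rescale_sketch(tensor, new_min, new_max, prev_min, prev_max):
        prev_range = prev_max - prev_min
        new_range = new_max - new_min
        return (tensor - prev_min) / prev_range * new_range + new_min

    # Rescale column 0 from [0, 10] to [0, 1] and column 1 from [0, 100] to [-1, 1].
    t = torch.tensor([[5.0, 50.0], [10.0, 0.0]])
    print(rescale_sketch(t, torch.tensor([0.0, -1.0]), torch.tensor([1.0, 1.0]),
                         torch.tensor([0.0, 0.0]), torch.tensor([10.0, 100.0])))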

reagent.core.torch_utils.softmax(x, temperature)

Compute softmax values for each set of scores in x.

reagent.core.torch_utils.stack(mems)

Stack a list of tensors. Could use torch.stack here, but torch.stack is much slower than torch.cat + view. Submitted an issue for investigation: https://github.com/pytorch/pytorch/issues/22462

FIXME: Remove this function after the issue above is resolved.
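
A sketch of the torch.cat + view trick the docstring refers to, assuming every tensor in mems has the same shape:

    import torch

    def stack_sketch(mems):
        return torch.cat(mems, dim=0).view(len(mems), *mems[0].shape)

    mems = [torch.randn(3, 4) for _ in range(5)]
    assert torch.equal(stack_sketch(mems), torch.stack(mems))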

reagent.core.tracker module

class reagent.core.tracker.Aggregator(key: str)

Bases: object

aggregate(values)
flush()
class reagent.core.tracker.ObservableMixin

Bases: object

add_observer(observer: reagent.core.tracker.Observer)
add_observers(observers: List[reagent.core.tracker.Observer])
notify_observers(**kwargs)
class reagent.core.tracker.Observer(observing_keys: List[str])

Bases: object

Base class for observers

get_observing_keys() List[str]
update(key: str, value)
reagent.core.tracker.observable(cls=None, **kwargs)

Decorator to mark a class as producing observable values. The names of the observable values are the names of the keyword arguments; the values of the keyword arguments are the types of those values. The type is currently not used for anything.
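
A hypothetical sketch of marking a class observable, based on the description above; it assumes the decorated class gains notify_observers() for reporting values:

    from reagent.core.tracker import observable

    @observable(td_loss=float)  # "td_loss" becomes an observable value name
    class MyTrainer:
        def train_step(self) -> None:
            # assumption: the decorator provides notify_observers on instances
            self.notify_observers(td_loss=0.5)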

reagent.core.types module

class reagent.core.types.ActorOutput(action: torch.Tensor, log_prob: Optional[torch.Tensor] = None, squashed_mean: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

action: torch.Tensor
log_prob: Optional[torch.Tensor] = None
squashed_mean: Optional[torch.Tensor] = None
class reagent.core.types.BanditRewardModelInput(state: reagent.core.types.FeatureData, action: torch.Tensor, reward: torch.Tensor, action_prob: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

action: torch.Tensor
action_prob: Optional[torch.Tensor] = None
classmethod from_dict(batch: Dict[str, torch.Tensor])
reward: torch.Tensor
state: reagent.core.types.FeatureData
class reagent.core.types.BaseInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor)

Bases: reagent.core.types.TensorDataClass

Base class for all inputs, both raw and preprocessed

as_dict_shallow()
batch_size()
static from_dict(batch)
next_state: reagent.core.types.FeatureData
not_terminal: torch.Tensor
reward: torch.Tensor
state: reagent.core.types.FeatureData
step: Optional[torch.Tensor]
time_diff: torch.Tensor
class reagent.core.types.CBInput(context_action_features: torch.Tensor, action: Final[Optional[torch.Tensor]] = None, reward: Final[Optional[torch.Tensor]] = None, log_prob: Final[Optional[torch.Tensor]] = None, weight: Final[Optional[torch.Tensor]] = None)

Bases: reagent.core.types.TensorDataClass

action: Final[Optional[torch.Tensor]] = None
context_action_features: torch.Tensor
classmethod from_dict(d: Dict[str, torch.Tensor]) reagent.core.types.CBInput
classmethod input_prototype(context_dim: int = 2, batch_size: int = 10, action_features_dim: int = 3, num_actions: int = 4) reagent.core.types.CBInput
log_prob: Final[Optional[torch.Tensor]] = None
reward: Final[Optional[torch.Tensor]] = None
weight: Final[Optional[torch.Tensor]] = None
class reagent.core.types.DiscreteDqnInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: torch.Tensor, next_action: torch.Tensor, possible_actions_mask: torch.Tensor, possible_next_actions_mask: torch.Tensor, extras: reagent.core.types.ExtraData)

Bases: reagent.core.types.BaseInput

See input_prototype for DQN expected input shapes

action: torch.Tensor
extras: reagent.core.types.ExtraData
classmethod from_dict(batch)
classmethod input_prototype(action_dim=2, batch_size=10, state_dim=3)
next_action: torch.Tensor
possible_actions_mask: torch.Tensor
possible_next_actions_mask: torch.Tensor
class reagent.core.types.DocList(float_features: torch.Tensor, mask: torch.Tensor = None, value: torch.Tensor = None)

Bases: reagent.core.types.TensorDataClass

as_feature_data()
float_features: torch.Tensor
mask: torch.Tensor = None
select_slate(action: torch.Tensor)
value: torch.Tensor = None
class reagent.core.types.DqnPolicyActionSet(greedy: int, softmax: Optional[int] = None, greedy_act_name: Optional[str] = None, softmax_act_name: Optional[str] = None, softmax_act_prob: Optional[float] = None)

Bases: reagent.core.types.TensorDataClass

greedy: int
greedy_act_name: Optional[str] = None
softmax: Optional[int] = None
softmax_act_name: Optional[str] = None
softmax_act_prob: Optional[float] = None
class reagent.core.types.ExplicitMapping(ids: List[int] = <factory>)

Bases: object

property id2index: Dict[int, int]
ids: List[int]
property table_size
class reagent.core.types.ExtraData(mdp_id: Optional[torch.Tensor] = None, sequence_number: Optional[torch.Tensor] = None, action_probability: Optional[torch.Tensor] = None, max_num_actions: Optional[int] = None, metrics: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

action_probability: Optional[torch.Tensor] = None
classmethod from_dict(d)
max_num_actions: Optional[int] = None
mdp_id: Optional[torch.Tensor] = None
metrics: Optional[torch.Tensor] = None
sequence_number: Optional[torch.Tensor] = None
class reagent.core.types.FeatureData(float_features: torch.Tensor, id_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor]] = <factory>, id_score_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]] = <factory>, stacked_float_features: Optional[torch.Tensor] = None, candidate_docs: Optional[reagent.core.types.DocList] = None, time_since_first: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

candidate_docs: Optional[reagent.core.types.DocList] = None
concat_user_doc()
float_features: torch.Tensor
get_ranking_state(has_user_feat: bool)
get_tiled_batch(num_tiles: int)
property has_float_features_only: bool
id_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor]]
id_score_list_features: Dict[str, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
stacked_float_features: Optional[torch.Tensor] = None
time_since_first: Optional[torch.Tensor] = None
class reagent.core.types.FloatFeatureInfo(name: str, feature_id: int)

Bases: reagent.core.base_dataclass.BaseDataClass

feature_id: int
name: str
class reagent.core.types.FrechetSortConfig(shape: float, equiv_len: int, topk: Optional[int] = None, log_scores: bool = True)

Bases: object

equiv_len: int
log_scores: bool = True
shape: float
topk: Optional[int] = None
class reagent.core.types.IdListFeatureConfig(name: str, feature_id: int, id_mapping_name: str)

Bases: reagent.core.base_dataclass.BaseDataClass

feature_id: int
id_mapping_name: str
name: str
class reagent.core.types.IdMappingUnion(explicit_mapping: Optional[reagent.core.types.ExplicitMapping] = None, modulo: Optional[reagent.core.types.ModuloMapping] = None)

Bases: reagent.core.tagged_union.TaggedUnion

explicit_mapping: Optional[reagent.core.types.ExplicitMapping] = None
modulo: Optional[reagent.core.types.ModuloMapping] = None
class reagent.core.types.IdScoreListFeatureConfig(name: str, feature_id: int, id_mapping_name: str)

Bases: reagent.core.base_dataclass.BaseDataClass

feature_id: int
id_mapping_name: str
name: str
class reagent.core.types.MemoryNetworkInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: torch.Tensor, valid_step: Optional[torch.Tensor] = None, extras: reagent.core.types.ExtraData = <factory>)

Bases: reagent.core.types.BaseInput

action: torch.Tensor
extras: reagent.core.types.ExtraData
classmethod from_dict(d)
valid_step: Optional[torch.Tensor] = None
class reagent.core.types.MemoryNetworkOutput(mus: torch.Tensor, sigmas: torch.Tensor, logpi: torch.Tensor, reward: torch.Tensor, not_terminal: torch.Tensor, last_step_lstm_hidden: torch.Tensor, last_step_lstm_cell: torch.Tensor, all_steps_lstm_hidden: torch.Tensor)

Bases: reagent.core.types.TensorDataClass

all_steps_lstm_hidden: torch.Tensor
last_step_lstm_cell: torch.Tensor
last_step_lstm_hidden: torch.Tensor
logpi: torch.Tensor
mus: torch.Tensor
not_terminal: torch.Tensor
reward: torch.Tensor
sigmas: torch.Tensor
class reagent.core.types.ModelFeatureConfig(float_feature_infos: List[reagent.core.types.FloatFeatureInfo] = <factory>, id_mapping_config: Dict[str, reagent.core.types.IdMappingUnion] = <factory>, id_list_feature_configs: List[reagent.core.types.IdListFeatureConfig] = <factory>, id_score_list_feature_configs: List[reagent.core.types.IdScoreListFeatureConfig] = <factory>)

Bases: reagent.core.base_dataclass.BaseDataClass

float_feature_infos: List[reagent.core.types.FloatFeatureInfo]
property id2config
property id2name
id_list_feature_configs: List[reagent.core.types.IdListFeatureConfig]
id_mapping_config: Dict[str, reagent.core.types.IdMappingUnion]
id_score_list_feature_configs: List[reagent.core.types.IdScoreListFeatureConfig]
property name2config
property name2id
property only_dense
class reagent.core.types.ModuloMapping(table_size: int)

Bases: object

Map IDs to [0, table_size) via modulo table_size

table_size: int
class reagent.core.types.NoDuplicatedWarningLogger(logger)

Bases: object

warning(msg)
class reagent.core.types.ParametricDqnInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: reagent.core.types.FeatureData, next_action: reagent.core.types.FeatureData, possible_actions: reagent.core.types.FeatureData, possible_actions_mask: torch.Tensor, possible_next_actions: reagent.core.types.FeatureData, possible_next_actions_mask: torch.Tensor, extras: Optional[reagent.core.types.ExtraData] = None)

Bases: reagent.core.types.BaseInput

action: reagent.core.types.FeatureData
extras: Optional[reagent.core.types.ExtraData] = None
classmethod from_dict(batch)
next_action: reagent.core.types.FeatureData
possible_actions: reagent.core.types.FeatureData
possible_actions_mask: torch.Tensor
possible_next_actions: reagent.core.types.FeatureData
possible_next_actions_mask: torch.Tensor
class reagent.core.types.PlanningPolicyOutput(next_best_continuous_action: Optional[torch.Tensor] = None, next_best_discrete_action_one_hot: Optional[torch.Tensor] = None, next_best_discrete_action_idx: Optional[int] = None)

Bases: reagent.core.types.TensorDataClass

next_best_continuous_action: Optional[torch.Tensor] = None
next_best_discrete_action_idx: Optional[int] = None
next_best_discrete_action_one_hot: Optional[torch.Tensor] = None
class reagent.core.types.PolicyGradientInput(state: reagent.core.types.FeatureData, action: torch.Tensor, reward: torch.Tensor, log_prob: torch.Tensor, possible_actions_mask: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

See input_prototype for expected input dimensions

action: torch.Tensor
classmethod from_dict(d: Dict[str, torch.Tensor])
classmethod input_prototype(action_dim=2, batch_size=10, state_dim=3)
log_prob: torch.Tensor
possible_actions_mask: Optional[torch.Tensor] = None
reward: torch.Tensor
state: reagent.core.types.FeatureData
class reagent.core.types.PolicyNetworkInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: reagent.core.types.FeatureData, next_action: reagent.core.types.FeatureData, extras: Optional[reagent.core.types.ExtraData] = None)

Bases: reagent.core.types.BaseInput

action: reagent.core.types.FeatureData
extras: Optional[reagent.core.types.ExtraData] = None
classmethod from_dict(batch)
next_action: reagent.core.types.FeatureData
class reagent.core.types.PreprocessedRankingInput(state: reagent.core.types.FeatureData, src_seq: reagent.core.types.FeatureData, src_src_mask: Optional[torch.Tensor] = None, tgt_in_seq: Optional[reagent.core.types.FeatureData] = None, tgt_out_seq: Optional[reagent.core.types.FeatureData] = None, tgt_tgt_mask: Optional[torch.Tensor] = None, slate_reward: Optional[torch.Tensor] = None, position_reward: Optional[torch.Tensor] = None, src_in_idx: Optional[torch.Tensor] = None, tgt_in_idx: Optional[torch.Tensor] = None, tgt_out_idx: Optional[torch.Tensor] = None, tgt_out_probs: Optional[torch.Tensor] = None, optim_tgt_in_idx: Optional[torch.Tensor] = None, optim_tgt_out_idx: Optional[torch.Tensor] = None, optim_tgt_in_seq: Optional[reagent.core.types.FeatureData] = None, optim_tgt_out_seq: Optional[reagent.core.types.FeatureData] = None, extras: Optional[reagent.core.types.ExtraData] = <factory>)

Bases: reagent.core.types.TensorDataClass

batch_size() int
extras: Optional[reagent.core.types.ExtraData]
classmethod from_input(state: torch.Tensor, candidates: torch.Tensor, device: torch.device, action: Optional[torch.Tensor] = None, optimal_action: Optional[torch.Tensor] = None, logged_propensities: Optional[torch.Tensor] = None, slate_reward: Optional[torch.Tensor] = None, position_reward: Optional[torch.Tensor] = None, extras: Optional[reagent.core.types.ExtraData] = None)

Build derived fields (indices & masks) from raw input

classmethod from_tensors(state: torch.Tensor, src_seq: torch.Tensor, src_src_mask: Optional[torch.Tensor] = None, tgt_in_seq: Optional[torch.Tensor] = None, tgt_out_seq: Optional[torch.Tensor] = None, tgt_tgt_mask: Optional[torch.Tensor] = None, slate_reward: Optional[torch.Tensor] = None, position_reward: Optional[torch.Tensor] = None, src_in_idx: Optional[torch.Tensor] = None, tgt_in_idx: Optional[torch.Tensor] = None, tgt_out_idx: Optional[torch.Tensor] = None, tgt_out_probs: Optional[torch.Tensor] = None, optim_tgt_in_idx: Optional[torch.Tensor] = None, optim_tgt_out_idx: Optional[torch.Tensor] = None, optim_tgt_in_seq: Optional[torch.Tensor] = None, optim_tgt_out_seq: Optional[torch.Tensor] = None, extras: Optional[reagent.core.types.ExtraData] = None, **kwargs)
optim_tgt_in_idx: Optional[torch.Tensor] = None
optim_tgt_in_seq: Optional[reagent.core.types.FeatureData] = None
optim_tgt_out_idx: Optional[torch.Tensor] = None
optim_tgt_out_seq: Optional[reagent.core.types.FeatureData] = None
position_reward: Optional[torch.Tensor] = None
slate_reward: Optional[torch.Tensor] = None
src_in_idx: Optional[torch.Tensor] = None
src_seq: reagent.core.types.FeatureData
src_src_mask: Optional[torch.Tensor] = None
state: reagent.core.types.FeatureData
tgt_in_idx: Optional[torch.Tensor] = None
tgt_in_seq: Optional[reagent.core.types.FeatureData] = None
tgt_out_idx: Optional[torch.Tensor] = None
tgt_out_probs: Optional[torch.Tensor] = None
tgt_out_seq: Optional[reagent.core.types.FeatureData] = None
tgt_tgt_mask: Optional[torch.Tensor] = None
class reagent.core.types.PreprocessedTrainingBatch(training_input: reagent.core.types.PreprocessedRankingInput, extras: reagent.core.types.ExtraData = <factory>)

Bases: reagent.core.types.TensorDataClass

batch_size()
extras: reagent.core.types.ExtraData
training_input: reagent.core.types.PreprocessedRankingInput
class reagent.core.types.RankingOutput(ranked_tgt_out_idx: Optional[torch.Tensor] = None, ranked_per_symbol_probs: Optional[torch.Tensor] = None, ranked_per_seq_probs: Optional[torch.Tensor] = None, log_probs: Optional[torch.Tensor] = None, encoder_scores: Optional[torch.Tensor] = None)

Bases: reagent.core.types.TensorDataClass

encoder_scores: Optional[torch.Tensor] = None
log_probs: Optional[torch.Tensor] = None
ranked_per_seq_probs: Optional[torch.Tensor] = None
ranked_per_symbol_probs: Optional[torch.Tensor] = None
ranked_tgt_out_idx: Optional[torch.Tensor] = None
class reagent.core.types.RewardNetworkOutput(predicted_reward: torch.Tensor)

Bases: reagent.core.types.TensorDataClass

predicted_reward: torch.Tensor
class reagent.core.types.Seq2RewardOutput(acc_reward: torch.Tensor)

Bases: reagent.core.types.TensorDataClass

acc_reward: torch.Tensor
class reagent.core.types.ServingFeatureData(float_features_with_presence, id_list_features, id_score_list_features)

Bases: NamedTuple

float_features_with_presence: Tuple[torch.Tensor, torch.Tensor]

Alias for field number 0

id_list_features: Dict[int, Tuple[torch.Tensor, torch.Tensor]]

Alias for field number 1

id_score_list_features: Dict[int, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]

Alias for field number 2

class reagent.core.types.SlateQInput(state: reagent.core.types.FeatureData, next_state: reagent.core.types.FeatureData, reward: torch.Tensor, time_diff: torch.Tensor, step: Optional[torch.Tensor], not_terminal: torch.Tensor, action: torch.Tensor, next_action: torch.Tensor, reward_mask: torch.Tensor, extras: Optional[reagent.core.types.ExtraData] = None)

Bases: reagent.core.types.BaseInput

The shapes of reward, reward_mask, and next_item_mask are (batch_size, slate_size).

reward_mask indicates whether the reward could be observed, e.g., whether the item entered the viewport or not.

action: torch.Tensor
extras: Optional[reagent.core.types.ExtraData] = None
classmethod from_dict(d)
next_action: torch.Tensor
reward_mask: torch.Tensor
class reagent.core.types.SlateScoreBatch(mdp_id: torch.Tensor, sequence_number: torch.Tensor, scores: torch.Tensor, training_input: reagent.core.types.PolicyGradientInput)

Bases: object

mdp_id: torch.Tensor
scores: torch.Tensor
sequence_number: torch.Tensor
training_input: reagent.core.types.PolicyGradientInput
class reagent.core.types.TensorDataClass

Bases: reagent.core.base_dataclass.BaseDataClass

cpu()
cuda(*args, **kwargs)
class reagent.core.types.TensorFeatureData(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

Primarily for use in nn.Sequential

forward(input: torch.Tensor) reagent.core.types.FeatureData
class reagent.core.types.ValuePresence(value: torch.Tensor, presence: Optional[torch.Tensor])

Bases: reagent.core.types.TensorDataClass

presence: Optional[torch.Tensor]
value: torch.Tensor
reagent.core.types.isinstance_namedtuple(x)

reagent.core.utils module

reagent.core.utils.get_rank() int

Returns the torch.distributed rank of the process. 0 represents the main process and is the default if torch.distributed isn't set up.

class reagent.core.utils.lazy_property(fget)

Bases: object

More or less copy-pasted from http://stackoverflow.com/a/6849299. Meant to be used for lazy evaluation of an object attribute. The property should represent non-mutable data, as it replaces itself.
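
A hypothetical usage sketch: the wrapped method runs once on first access, and the computed value then replaces the property on the instance:

    from reagent.core.utils import lazy_property

    class Dataset:
        @lazy_property
        def size(self) -> int:
            print("computing once")
            return 42

    d = Dataset()
    d.size  # prints "computing once", then caches 42 on the instance
    d.size  # served from the cached attribute; no recomputation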

Module contents