reagent.ope.estimators package

Submodules

reagent.ope.estimators.contextual_bandits_estimators module

class reagent.ope.estimators.contextual_bandits_estimators.ActionRewards(values: Union[Mapping[reagent.ope.estimators.types.KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

class reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput(action_space: reagent.ope.estimators.types.ActionSpace, samples: Sequence[reagent.ope.estimators.contextual_bandits_estimators.LogSample], has_model_outputs: bool)

Bases: object

action_space: reagent.ope.estimators.types.ActionSpace
has_model_outputs: bool
samples: Sequence[reagent.ope.estimators.contextual_bandits_estimators.LogSample]
class reagent.ope.estimators.contextual_bandits_estimators.BanditsModel

Bases: abc.ABC

class reagent.ope.estimators.contextual_bandits_estimators.DMEstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, device=None)

Bases: reagent.ope.estimators.estimator.Estimator

Estimating using the Direct Method (DM), assuming a reward model is trained

TRAINING_VALIDATION_SPLIT = 0.8

evaluate(input: reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput, **kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
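
A minimal usage sketch, assuming action_space and samples are built as documented under BanditsEstimatorInput and LogSample, and my_trainer is some concrete Trainer implementation (presumably optional when every LogSample carries precomputed model_outputs):

    from reagent.ope.estimators.contextual_bandits_estimators import (
        BanditsEstimatorInput,
        DMEstimator,
    )

    # DM presumably fits a reward model (using the 0.8 training/validation
    # split above) and scores the target policy with its predicted rewards.
    estimator = DMEstimator(trainer=my_trainer)
    result = estimator.evaluate(
        BanditsEstimatorInput(action_space, samples, has_model_outputs=False)
    )
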
class reagent.ope.estimators.contextual_bandits_estimators.DoublyRobustEstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, device=None)

Bases: reagent.ope.estimators.contextual_bandits_estimators.DMEstimator

Doubly Robust (DR) estimator.

References:
  https://arxiv.org/abs/1103.4601 (deterministic reward model)
  https://arxiv.org/abs/1612.01205 (distributed reward model)
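
The DR principle on a single sample: take the reward model's estimate for the target policy's action, then correct it by the importance-weighted error the model makes on the logged action. A toy numeric illustration (not the library API; variable names are for exposition only):

    # rho: importance weight tgt_prob / log_prob for the logged action
    # dm_reward: model's reward estimate for the target policy's action
    # model_log_reward: model's reward estimate for the logged action
    rho, log_reward = 2.0, 1.0
    dm_reward, model_log_reward = 0.7, 0.6
    dr_term = dm_reward + rho * (log_reward - model_log_reward)  # 1.5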

class reagent.ope.estimators.contextual_bandits_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = False, device=None)

Bases: reagent.ope.estimators.estimator.Estimator

Inverse Propensity Scoring (IPS) estimator

evaluate(input: reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput, **kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
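
The underlying computation on a single sample: the logged reward is reweighted by the ratio of target to logging propensities. A toy illustration (not the library API; the weighted=True variant presumably normalizes by the sum of weights rather than the sample count):

    log_prob, tgt_prob, log_reward = 0.25, 0.50, 1.0
    rho = tgt_prob / log_prob    # importance weight, clamped when a
                                 # weight_clamper is supplied
    ips_term = rho * log_reward  # per-sample term; the estimate averages these
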
class reagent.ope.estimators.contextual_bandits_estimators.LogSample(context: object, log_action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]], log_reward: float, log_action_probabilities: reagent.ope.estimators.types.ActionDistribution, tgt_action_probabilities: reagent.ope.estimators.types.ActionDistribution, tgt_action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]], model_outputs: Optional[reagent.ope.estimators.contextual_bandits_estimators.ModelOutputs] = None, ground_truth_reward: float = nan, item_feature: Optional[torch.Tensor] = None)

Bases: object

context: object
ground_truth_reward: float = nan
item_feature: Optional[torch.Tensor] = None
log_action: reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]
log_action_probabilities: reagent.ope.estimators.types.ActionDistribution
log_reward: float
model_outputs: Optional[reagent.ope.estimators.contextual_bandits_estimators.ModelOutputs] = None
tgt_action: reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]
tgt_action_probabilities: reagent.ope.estimators.types.ActionDistribution
class reagent.ope.estimators.contextual_bandits_estimators.ModelOutputs(tgt_reward_from_log_action: float, tgt_rewards: Sequence[float])

Bases: object

tgt_reward_from_log_action: float
tgt_rewards: Sequence[float]
class reagent.ope.estimators.contextual_bandits_estimators.SwitchDREstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, rmax: Optional[float] = None, device=None)

Bases: reagent.ope.estimators.contextual_bandits_estimators.SwitchEstimator

class reagent.ope.estimators.contextual_bandits_estimators.SwitchEstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, rmax: Optional[float] = None, device=None)

Bases: reagent.ope.estimators.contextual_bandits_estimators.DMEstimator

CANDIDATES = 21
EPSILON = 1e-06
EXP_BASE = 1.5
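
A sketch of the SWITCH idea these constants plausibly drive (per https://arxiv.org/abs/1612.01205): each sample uses the IPS term while its importance weight stays below a threshold tau, and falls back to the direct-method reward above it; CANDIDATES thresholds, spaced by powers of EXP_BASE, are presumably swept to pick the best bias/variance trade-off. The helper below is illustrative, not the library API:

    def switch_term(rho: float, log_reward: float, dm_reward: float,
                    tau: float) -> float:
        # IPS below the threshold, direct method above it
        return rho * log_reward if rho <= tau else dm_reward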

reagent.ope.estimators.estimator module

class reagent.ope.estimators.estimator.Estimator(device=None)

Bases: abc.ABC

Estimator interface

abstract evaluate(input, **kwargs) Optional[Union[reagent.ope.estimators.estimator.EstimatorResult, reagent.ope.estimators.estimator.EstimatorResults]]
class reagent.ope.estimators.estimator.EstimatorResult(log_reward: float, estimated_reward: float, ground_truth_reward: Optional[float] = 0.0, estimated_weight: float = 1.0, estimated_reward_normalized: Optional[float] = None, estimated_reward_std_error: Optional[float] = None, estimated_reward_normalized_std_error: Optional[float] = None)

Bases: object

estimated_reward: float
estimated_reward_normalized: Optional[float] = None
estimated_reward_normalized_std_error: Optional[float] = None
estimated_reward_std_error: Optional[float] = None
estimated_weight: float = 1.0
ground_truth_reward: Optional[float] = 0.0
log_reward: float
class reagent.ope.estimators.estimator.EstimatorResults(results: List[reagent.ope.estimators.estimator.EstimatorResult] = <factory>)

Bases: object

Estimator results

append(result: reagent.ope.estimators.estimator.EstimatorResult)

Append a data point

Parameters

result – result from an experimental run

device = None
report()
results: List[reagent.ope.estimators.estimator.EstimatorResult]
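
A minimal accumulation sketch, using only the members documented above:

    from reagent.ope.estimators.estimator import (
        EstimatorResult,
        EstimatorResults,
    )

    results = EstimatorResults()
    results.append(EstimatorResult(log_reward=0.42, estimated_reward=0.45))
    results.report()  # assumed to log summary statistics of the runs
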
class reagent.ope.estimators.estimator.EstimatorSampleResult(log_reward: float, target_reward: float, ground_truth_reward: float, weight: float)

Bases: object

ground_truth_reward: float
log_reward: float
target_reward: float
weight: float
class reagent.ope.estimators.estimator.Evaluator(experiments: Iterable[Tuple[Iterable[reagent.ope.estimators.estimator.Estimator], object]], max_num_workers: int)

Bases: object

Multiprocessing evaluator

evaluate() Mapping[str, reagent.ope.estimators.estimator.EstimatorResults]
static report_results(results: Mapping[str, reagent.ope.estimators.estimator.EstimatorResults])
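
A usage sketch; ips_est and dm_est stand for Estimator instances and est_input for their evaluate() input (the plain object in each experiment tuple is assumed to be that input):

    from reagent.ope.estimators.estimator import Evaluator

    experiments = [([ips_est, dm_est], est_input)]
    evaluator = Evaluator(experiments, max_num_workers=4)
    all_results = evaluator.evaluate()  # Mapping[str, EstimatorResults]
    Evaluator.report_results(all_results)
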
class reagent.ope.estimators.estimator.ResultDiffs(diffs: torch.Tensor)

Bases: object

Statistics for differences, e.g., estimates vs ground truth

property bias: torch.Tensor
property rmse: torch.Tensor
property variance: torch.Tensor
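
Plausible definitions of the three properties, given a tensor of per-run differences (a sketch; the class may differ in detail):

    import torch

    diffs = torch.tensor([0.10, -0.20, 0.05])  # estimate - ground truth
    bias = diffs.mean()                        # plausible .bias
    variance = diffs.var()                     # plausible .variance
    rmse = (diffs ** 2).mean().sqrt()          # plausible .rmse
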
reagent.ope.estimators.estimator.run_evaluation(file_name: str) Optional[Mapping[str, Iterable[reagent.ope.estimators.estimator.EstimatorResults]]]

reagent.ope.estimators.sequential_estimators module

class reagent.ope.estimators.sequential_estimators.DMEstimator(device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLEstimator

Direct Method estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.DoublyRobustEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.sequential_estimators.IPSEstimator

Doubly Robust estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.EpsilonGreedyRLPolicy(policy: reagent.ope.estimators.sequential_estimators.RLPolicy, epsilon: float = 0.0)

Bases: reagent.ope.estimators.sequential_estimators.RLPolicy

A wrapper policy that skews the wrapped policy's action distribution by epsilon. The total number of actions must be given, and the wrapped policy should calculate probabilities for all actions.

action_dist(state) reagent.ope.estimators.types.ActionDistribution
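
A sketch of the skewing this wrapper presumably applies: mix the wrapped policy's distribution with the uniform distribution over all actions (the helper is illustrative, not the library API):

    import torch

    def skew(probs: torch.Tensor, epsilon: float) -> torch.Tensor:
        # (1 - epsilon) of the mass follows the wrapped policy;
        # epsilon is spread uniformly over all actions
        return probs * (1.0 - epsilon) + epsilon / probs.numel()
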
class reagent.ope.estimators.sequential_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLEstimator

IPS estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.MAGICEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, device=None)

Bases: reagent.ope.estimators.sequential_estimators.IPSEstimator

Algorithm from https://arxiv.org/abs/1604.00923, appendix G.3

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.Model

Bases: abc.ABC

Model interface

abstract next_state_reward_dist(state: reagent.ope.estimators.sequential_estimators.State, action: reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]) Mapping[reagent.ope.estimators.sequential_estimators.State, reagent.ope.estimators.sequential_estimators.RewardProbability]
class reagent.ope.estimators.sequential_estimators.NeuralDualDICE(state_dim: int, action_dim: int, deterministic_env: bool, average_next_v: bool = False, polynomial_degree: float = 1.5, value_lr: float = 0.01, zeta_lr: float = 0.01, hidden_dim: int = 64, hidden_layers: int = 2, training_samples: int = 100000, batch_size: int = 2048, device: Optional[Any] = None, loss_callback_fn: Optional[Callable[[float, float, reagent.ope.estimators.sequential_estimators.RLEstimator], None]] = None, reporting_frequency: int = 1000, v: Optional[Any] = None, zeta: Optional[Any] = None, f: Optional[Any] = None, fconjugate: Optional[Any] = None, zeta_net: Optional[Any] = None, v_net: Optional[Any] = None)

Bases: reagent.ope.estimators.sequential_estimators.RLEstimator

Parameters
  • state_dim – The dimensionality of the state vectors

  • action_dim – The number of discrete actions

  • deterministic_env – Whether or not the environment is deterministic. Can help with stability of training.

  • average_next_v – Whether or not to average the next nu value over all possible actions. Can help with stability of training.

  • polynomial_degree – The degree of the convex function f(x) = 1/p * |x|^p

  • value_lr – The learning rate for nu

  • zeta_lr – The learning rate for zeta

  • hidden_dim – The dimensionality of the hidden layers for zeta and v

  • hidden_layers – The number of hidden layers for zeta and v

  • activation – The activation function for zeta and v

  • training_samples – The number of batches to train zeta and v for

  • batch_size – The number of samples in each batch

  • loss_callback_fn – A function that will be called every reporting_frequency batches, giving the average zeta loss, average nu loss, and self

  • reporting_frequency – The number of batches between outputting the state of the training

action_dim: int
activation

alias of torch.nn.modules.activation.Tanh

average_next_v: bool = False
batch_size: int = 2048
deterministic_env: bool
device: Any = None
evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) reagent.ope.estimators.estimator.EstimatorResults
f: Any = None
fconjugate: Any = None
hidden_dim: int = 64
hidden_layers: int = 2
loss_callback_fn: Optional[Callable[[float, float, reagent.ope.estimators.sequential_estimators.RLEstimator], None]] = None
polynomial_degree: float = 1.5
reporting_frequency: int = 1000
reset()
state_dim: int
training_samples: int = 100000
v: Any = None
v_net: Any = None
value_lr: float = 0.01
zeta: Any = None
zeta_lr: float = 0.01
zeta_net: Any = None
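
A construction sketch using only the parameters documented above; rl_input stands for an RLEstimatorInput (defined below) built from logged trajectories:

    from reagent.ope.estimators.sequential_estimators import NeuralDualDICE

    estimator = NeuralDualDICE(
        state_dim=4,          # e.g. a 4-dimensional observation vector
        action_dim=2,
        deterministic_env=True,
        average_next_v=True,  # per the docs, can help training stability
        batch_size=2048,
        training_samples=100000,
    )
    results = estimator.evaluate(rl_input)
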
class reagent.ope.estimators.sequential_estimators.RLEstimator(device=None)

Bases: reagent.ope.estimators.estimator.Estimator

class reagent.ope.estimators.sequential_estimators.RLEstimatorInput(gamma: float, log: Sequence[Sequence[reagent.ope.estimators.sequential_estimators.Transition]], target_policy: reagent.ope.estimators.sequential_estimators.RLPolicy, value_function: Optional[reagent.ope.estimators.sequential_estimators.ValueFunction] = None, ground_truth: Optional[reagent.ope.estimators.sequential_estimators.ValueFunction] = None, horizon: int = -1, discrete_states: bool = True)

Bases: object

discrete_states: bool = True
gamma: float
ground_truth: Optional[reagent.ope.estimators.sequential_estimators.ValueFunction] = None
horizon: int = -1
log: Sequence[Sequence[reagent.ope.estimators.sequential_estimators.Transition]]
target_policy: reagent.ope.estimators.sequential_estimators.RLPolicy
value_function: Optional[reagent.ope.estimators.sequential_estimators.ValueFunction] = None
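
A construction sketch for a log with one single-step episode; my_policy stands for a concrete RLPolicy and my_action for a value wrapped in the TypeWrapper-based action type (both assumed to exist elsewhere):

    from reagent.ope.estimators.sequential_estimators import (
        RLEstimatorInput,
        State,
        Transition,
    )

    s0 = State(0.0)
    s1 = State(1.0, is_terminal=True)
    episode = [
        Transition(last_state=s0, action=my_action, action_prob=0.5,
                   state=s1, reward=1.0)
    ]
    rl_input = RLEstimatorInput(gamma=0.99, log=[episode],
                                target_policy=my_policy)
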
class reagent.ope.estimators.sequential_estimators.RLPolicy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: abc.ABC

Policy interface

abstract action_dist(state: reagent.ope.estimators.sequential_estimators.State) reagent.ope.estimators.types.ActionDistribution
property action_space
class reagent.ope.estimators.sequential_estimators.RandomRLPolicy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLPolicy

A random policy which returns an action according to a uniform distribution

action_dist(state: reagent.ope.estimators.sequential_estimators.State) reagent.ope.estimators.types.ActionDistribution
class reagent.ope.estimators.sequential_estimators.RewardProbability(reward: float = 0.0, prob: float = 0.0)

Bases: object

prob: float = 0.0
reward: float = 0.0
class reagent.ope.estimators.sequential_estimators.State(value: ~ValueType, is_terminal: bool = False)

Bases: reagent.ope.estimators.types.TypeWrapper[Union[float, Tuple[float], Tuple[int], numpy.ndarray, torch.Tensor]]

is_terminal: bool = False
class reagent.ope.estimators.sequential_estimators.StateReward(state: Optional[reagent.ope.estimators.sequential_estimators.State] = None, reward: float = 0.0)

Bases: object

reward: float = 0.0
state: Optional[reagent.ope.estimators.sequential_estimators.State] = None
class reagent.ope.estimators.sequential_estimators.Transition(last_state: Optional[reagent.ope.estimators.sequential_estimators.State] = None, action: Optional[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]] = None, action_prob: float = 0.0, state: Optional[reagent.ope.estimators.sequential_estimators.State] = None, reward: float = 0.0, status: reagent.ope.estimators.sequential_estimators.Transition.Status = <Status.NORMAL: 1>)

Bases: object

class Status(value)

Bases: enum.Enum

An enumeration.

NOOP = 0
NORMAL = 1
TERMINATED = 2
action: Optional[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]] = None
action_prob: float = 0.0
last_state: Optional[reagent.ope.estimators.sequential_estimators.State] = None
reward: float = 0.0
state: Optional[reagent.ope.estimators.sequential_estimators.State] = None
status: reagent.ope.estimators.sequential_estimators.Transition.Status = 1
class reagent.ope.estimators.sequential_estimators.ValueFunction

Bases: abc.ABC

Value function to calculate state and state-action values

abstract reset()
abstract state_action_value(state: reagent.ope.estimators.sequential_estimators.State, action: reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]) float
abstract state_value(state: reagent.ope.estimators.sequential_estimators.State) float
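
A trivial concrete subclass implementing the three abstract members, usable as a zero baseline (a sketch, not part of the library):

    from reagent.ope.estimators.sequential_estimators import (
        State,
        ValueFunction,
    )

    class ZeroValueFunction(ValueFunction):
        """Every state and state-action value is 0."""

        def state_action_value(self, state: State, action) -> float:
            return 0.0

        def state_value(self, state: State) -> float:
            return 0.0

        def reset(self) -> None:
            pass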

reagent.ope.estimators.slate_estimators module

class reagent.ope.estimators.slate_estimators.DCGSlateMetric(device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateMetric

slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.DMEstimator(trainer: reagent.ope.estimators.types.Trainer, training_sample_ratio: float, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Direct Method estimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.DoublyRobustEstimator(trainer: reagent.ope.estimators.types.Trainer, training_sample_ratio: float, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = False, device=None)

Bases: reagent.ope.estimators.slate_estimators.DMEstimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.ERRSlateMetric(max_reward: float, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateMetric

slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.FrechetDistribution(shape: float, deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

Frechet distribution

distribution(rewards: torch.Tensor) torch.Tensor
property name: str
class reagent.ope.estimators.slate_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.LogSample(context: reagent.ope.estimators.slate_estimators.SlateContext, metric: reagent.ope.estimators.slate_estimators.SlateMetric, log_slate: reagent.ope.estimators.slate_estimators.Slate, log_reward: float, _log_slate_probability: float = nan, _log_slot_item_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities] = None, _log_item_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateItemProbabilities] = None, _tgt_slate_probability: float = nan, _tgt_slot_item_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities] = None, _tgt_item_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateItemProbabilities] = None, ground_truth_reward: float = nan, slot_weights: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, slot_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, item_features: Optional[reagent.ope.estimators.slate_estimators.SlateItemFeatures] = None)

Bases: object

context: reagent.ope.estimators.slate_estimators.SlateContext
ground_truth_reward: float = nan
item_features: Optional[reagent.ope.estimators.slate_estimators.SlateItemFeatures] = None
property items: reagent.ope.estimators.slate_estimators.SlateItems
log_reward: float
log_slate: reagent.ope.estimators.slate_estimators.Slate
log_slate_probability(slate: Optional[reagent.ope.estimators.slate_estimators.Slate] = None) float
log_slot_item_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations]
metric: reagent.ope.estimators.slate_estimators.SlateMetric
slot_probabilities: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None
slot_weights: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None
tgt_slate_probability() float
tgt_slate_space(slots: reagent.ope.estimators.slate_estimators.SlateSlots) Iterable[Tuple[Sequence[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]], float]]
tgt_slot_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations]
validate()
class reagent.ope.estimators.slate_estimators.NDCGSlateMetric(item_rewards: reagent.ope.estimators.slate_estimators.SlateItemValues, device=None)

Bases: reagent.ope.estimators.slate_estimators.DCGSlateMetric

slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.PBMEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Estimator from reference 1: Position-Based Click Model

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.PassThruDistribution(deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

No-op distribution, probability determined by reward

distribution(rewards: torch.Tensor) torch.Tensor
property name: str
class reagent.ope.estimators.slate_estimators.PseudoInverseEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Estimator from reference 2

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.RankingDistribution(alpha: float = -1.0, deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

Ranking distribution according to https://arxiv.org/abs/1605.04812

distribution(rewards: torch.Tensor) torch.Tensor
property name: str
class reagent.ope.estimators.slate_estimators.RewardDistribution(deterministic: bool = False)

Bases: abc.ABC

Return a customized probability distribution according to rewards

abstract distribution(rewards: torch.Tensor) torch.Tensor
abstract property name: str
class reagent.ope.estimators.slate_estimators.Slate(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.types.ValueType], MutableSequence[reagent.ope.estimators.types.ValueType]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

Represents a slate: a map from slots to items/docs

property items: Sequence[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]
one_hots(items: reagent.ope.estimators.slate_estimators.SlateItems, device=None) torch.Tensor
slot_features(item_features: reagent.ope.estimators.slate_estimators.SlateItemFeatures) reagent.ope.estimators.slate_estimators.SlateSlotFeatures

Map items in the slate to their features

Parameters

item_features – map from all items to their features

Returns

Features of the items in the slate

slot_values(item_values: reagent.ope.estimators.slate_estimators.SlateItemValues) reagent.ope.estimators.slate_estimators.SlateSlotValues

Map items in the slate to given values

Parameters

item_values – map from all items to some values

Returns

List of values in the slate

class reagent.ope.estimators.slate_estimators.SlateContext(query: reagent.ope.estimators.types.TypeWrapper[typing.Union[typing.Tuple[int], typing.Tuple[float], numpy.ndarray, torch.Tensor, typing.Tuple[int, int]]], slots: reagent.ope.estimators.slate_estimators.SlateSlots, params: object = None)

Bases: object

params: object = None
query: reagent.ope.estimators.types.TypeWrapper[Union[Tuple[int], Tuple[float], numpy.ndarray, torch.Tensor, Tuple[int, int]]]
slots: reagent.ope.estimators.slate_estimators.SlateSlots
class reagent.ope.estimators.slate_estimators.SlateEstimator(device=None)

Bases: reagent.ope.estimators.estimator.Estimator

class reagent.ope.estimators.slate_estimators.SlateEstimatorInput(samples: Sequence[reagent.ope.estimators.slate_estimators.LogSample])

Bases: object

samples: Sequence[reagent.ope.estimators.slate_estimators.LogSample]
validate()
class reagent.ope.estimators.slate_estimators.SlateItemFeatures(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], torch.Tensor], Sequence[torch.Tensor], torch.Tensor, numpy.ndarray])

Bases: reagent.ope.estimators.types.Objects[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], torch.Tensor]

property items: reagent.ope.estimators.slate_estimators.SlateItems
class reagent.ope.estimators.slate_estimators.SlateItemProbabilities(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor], greedy: bool = False)

Bases: reagent.ope.estimators.slate_estimators.SlateItemValues

Probabilities of each item being selected into the slate

property is_deterministic: bool
sample_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.Slate
slate_probability(slate: reagent.ope.estimators.slate_estimators.Slate) float

Calculate the probability of a slate under this distribution

Parameters

slate – slate to evaluate

Returns

probability

slate_space(slots: reagent.ope.estimators.slate_estimators.SlateSlots, max_size: int = -1) Iterable[Tuple[Sequence[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]], float]]

Return all possible slates and their probabilities

The algorithm is similar to _calculate_expectations(), but caches fewer values, saving both space and computation.

Parameters

  • slots – slots to be filled

  • max_size – max number of samples to return; <= 0 returns all samples

slot_item_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations
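
A usage sketch: four candidate items with given selection probabilities, sampled into a two-slot slate:

    from reagent.ope.estimators.slate_estimators import (
        SlateItemProbabilities,
        SlateSlots,
    )

    probs = SlateItemProbabilities([0.4, 0.3, 0.2, 0.1])
    slots = SlateSlots(2)               # two slots, integer-indexed
    slate = probs.sample_slate(slots)   # draw a slate from this distribution
    p = probs.slate_probability(slate)  # probability of that exact slate
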
class reagent.ope.estimators.slate_estimators.SlateItemValues(values: Union[Mapping[reagent.ope.estimators.types.KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

property items: reagent.ope.estimators.slate_estimators.SlateItems
class reagent.ope.estimators.slate_estimators.SlateItems(items: Union[Sequence[reagent.ope.estimators.types.ValueType], int])

Bases: reagent.ope.estimators.types.Items[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

class reagent.ope.estimators.slate_estimators.SlateMetric(device=None)

Bases: object

Metric calculator for a slate: weights (dot) rewards

The base class is just the sum of all item rewards

calculate_reward(slots: reagent.ope.estimators.slate_estimators.SlateSlots, rewards: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, slot_values: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, slot_weights: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None) float
slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.SlateModel

Bases: abc.ABC

Model providing item relevance/reward and slot examination (click) distribution

abstract item_rewards(context: reagent.ope.estimators.slate_estimators.SlateContext) reagent.ope.estimators.slate_estimators.SlateItemValues

Returns each item’s relevance under the context

Parameters

context – slate context

Returns

Item relevances

slot_probabilities(context: reagent.ope.estimators.slate_estimators.SlateContext) reagent.ope.estimators.slate_estimators.SlateSlotValues

Returns each slot/position’s examination probability, independent of the shown item; used by the PBM estimator

Parameters

context – slate context

Returns

Map from slots to their probabilities

class reagent.ope.estimators.slate_estimators.SlatePolicy(device=None)

Bases: abc.ABC

Policy interface

class reagent.ope.estimators.slate_estimators.SlateSlotFeatures(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.types.ValueType], MutableSequence[reagent.ope.estimators.types.ValueType]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects[torch.Tensor]

property features: torch.Tensor
class reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotItemValues

property expectations: Sequence[reagent.ope.estimators.slate_estimators.SlateItemValues]
expected_rewards(item_rewards: reagent.ope.estimators.slate_estimators.SlateItemValues, device=None) reagent.ope.estimators.slate_estimators.SlateSlotValues

Calculate the expected relevance of each slot, given each item’s relevance, under this distribution

Parameters

  • item_rewards – map from all items to their relevances

  • device – optional torch device

Returns

Map of slots to their expected relevance

class reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]], greedy: bool = False)

Bases: reagent.ope.estimators.slate_estimators.SlateSlotItemValues

sample_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots) reagent.ope.estimators.slate_estimators.Slate
slate_probability(slate: reagent.ope.estimators.slate_estimators.Slate) float

Calculate the probability of a slate under this distribution

Parameters

slate – slate to evaluate

Returns

probability

slot_item_expectations(samples: int = 20000) reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations
class reagent.ope.estimators.slate_estimators.SlateSlotItemValues(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects[reagent.ope.estimators.slate_estimators.SlateItemValues]

values_tensor(device=None) torch.Tensor
class reagent.ope.estimators.slate_estimators.SlateSlotObjects(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.types.ValueType], MutableSequence[reagent.ope.estimators.types.ValueType]])

Bases: reagent.ope.estimators.types.Objects[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.types.ValueType]

fill(values: Sequence[reagent.ope.estimators.types.ValueType]) Union[Mapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.types.ValueType], Sequence[reagent.ope.estimators.types.ValueType]]
property objects: Sequence[reagent.ope.estimators.types.ValueType]
property slots: reagent.ope.estimators.slate_estimators.SlateSlots
class reagent.ope.estimators.slate_estimators.SlateSlotValues(values: Union[Mapping[reagent.ope.estimators.types.KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

Map from a slot to a value

class reagent.ope.estimators.slate_estimators.SlateSlots(items: Union[Sequence[reagent.ope.estimators.types.ValueType], int])

Bases: reagent.ope.estimators.types.Items[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

List of slots

fill(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor]) reagent.ope.estimators.slate_estimators.SlateSlotValues

Map slots to given values

Parameters

values – given values

Returns

Map from slots to given values

reagent.ope.estimators.slate_estimators.is_to_calculate_expectation(slate_size: int, item_size: int) bool

Switch between calculating and sampling expectations, balancing execution time and accuracy

Returns

True to calculate, False to sample

reagent.ope.estimators.slate_estimators.make_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots, items: Sequence[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]) reagent.ope.estimators.slate_estimators.Slate

Assign items to slots to make a slate
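
A usage sketch, rebuilding a slate explicitly from item objects (here taken from a sampled slate's items property):

    from reagent.ope.estimators.slate_estimators import (
        SlateItemProbabilities,
        SlateSlots,
        make_slate,
    )

    probs = SlateItemProbabilities([0.4, 0.3, 0.2, 0.1])
    slots = SlateSlots(2)
    sampled = probs.sample_slate(slots)
    slate = make_slate(slots, sampled.items)  # same slots, same items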

reagent.ope.estimators.slate_estimators.make_slot_item_distributions(slots: reagent.ope.estimators.slate_estimators.SlateSlots, dists: Sequence[reagent.ope.estimators.slate_estimators.SlateItemValues]) reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations

reagent.ope.estimators.types module

class reagent.ope.estimators.types.ActionDistribution(values: Union[Mapping[reagent.ope.estimators.types.KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

class reagent.ope.estimators.types.ActionSpace(items: Union[Sequence[reagent.ope.estimators.types.ValueType], int])

Bases: reagent.ope.estimators.types.Items[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]

distribution(dist: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor]) reagent.ope.estimators.types.ActionDistribution
property space: Sequence[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]
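
A usage sketch, using only the members documented here:

    from reagent.ope.estimators.types import ActionSpace

    space = ActionSpace(4)  # four integer-indexed actions
    dist = space.distribution([0.1, 0.2, 0.3, 0.4])
    first = space.space[0]  # a wrapped action object
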
class reagent.ope.estimators.types.Items(items: Union[Sequence[reagent.ope.estimators.types.ValueType], int])

Bases: Generic[reagent.ope.estimators.types.ValueType], abc.ABC

List of items

fill(values: Union[Mapping[reagent.ope.estimators.types.ValueType, float], Sequence[float], numpy.ndarray, torch.Tensor]) Union[Sequence[float], Mapping[reagent.ope.estimators.types.ValueType, float]]
index_of(item: reagent.ope.estimators.types.ValueType) int
property is_sequence
class reagent.ope.estimators.types.Objects(values: Union[Mapping[reagent.ope.estimators.types.KeyType, reagent.ope.estimators.types.ValueType], Sequence[reagent.ope.estimators.types.ValueType]])

Bases: Generic[reagent.ope.estimators.types.KeyType, reagent.ope.estimators.types.ValueType], abc.ABC

Generic class for a map from item to its value. It supports [] indexing and the iterator protocol

items

list of items

values

list of their values

index_of(key: reagent.ope.estimators.types.KeyType) int
property is_sequence
property keys: Sequence[reagent.ope.estimators.types.KeyType]
property values
class reagent.ope.estimators.types.Policy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: abc.ABC

Policy interface

property action_space
class reagent.ope.estimators.types.PredictResults(predictions: Optional[torch.Tensor], scores: torch.Tensor, probabilities: Optional[torch.Tensor] = None)

Bases: object

predictions: Optional[torch.Tensor]
probabilities: Optional[torch.Tensor] = None
scores: torch.Tensor
class reagent.ope.estimators.types.Trainer

Bases: abc.ABC

property is_trained: bool
load_model(file: str)
abstract property name: str
abstract predict(x: torch.Tensor, device=None) reagent.ope.estimators.types.PredictResults
reset()
save_model(file: str)
abstract score(x: torch.Tensor, y: torch.Tensor, weight: Optional[torch.Tensor] = None) float
abstract train(data: reagent.ope.estimators.types.TrainingData, iterations: int = 1, num_samples: int = 0)
class reagent.ope.estimators.types.TrainingData(train_x: torch.Tensor, train_y: torch.Tensor, train_weight: Optional[torch.Tensor], validation_x: torch.Tensor, validation_y: torch.Tensor, validation_weight: Optional[torch.Tensor])

Bases: object

train_weight: Optional[torch.Tensor]
train_x: torch.Tensor
train_y: torch.Tensor
validation_weight: Optional[torch.Tensor]
validation_x: torch.Tensor
validation_y: torch.Tensor
class reagent.ope.estimators.types.TypeWrapper(value: ~ValueType)

Bases: Generic[reagent.ope.estimators.types.ValueType]

value: reagent.ope.estimators.types.ValueType
class reagent.ope.estimators.types.Values(values: Union[Mapping[reagent.ope.estimators.types.KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Objects[reagent.ope.estimators.types.KeyType, float]

Generic class for a map from item to its value. It supports [] indexing and the iterator protocol

items

list of items

values

list of their values

greedy(size=1) Sequence[reagent.ope.estimators.types.KeyType]
probability(key: reagent.ope.estimators.types.ValueType) float
replace(values: Union[Mapping[reagent.ope.estimators.types.ValueType, float], Sequence[float], torch.Tensor, numpy.ndarray]) reagent.ope.estimators.types.Values

Replace current values with new values and return the new copy. The current Values object is not changed

Parameters

values – new values

Returns

Values object with new values

sample(size=1) Sequence[reagent.ope.estimators.types.KeyType]
sort(descending: bool = True) Tuple[Sequence[reagent.ope.estimators.types.KeyType], torch.Tensor]

Sort based on values

Parameters

descending – sorting order

Returns

Tuple of sorted indices and values
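
A usage sketch with ActionDistribution, a concrete Values subclass defined above:

    from reagent.ope.estimators.types import ActionDistribution

    dist = ActionDistribution([0.1, 0.6, 0.3])
    keys, vals = dist.sort()  # descending by default
    best = dist.greedy(1)     # key(s) of the largest value(s)
    draw = dist.sample(1)     # key(s) sampled in proportion to value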

reagent.ope.estimators.types.is_array(obj)

Module contents