reagent.ope.estimators package

Submodules

reagent.ope.estimators.contextual_bandits_estimators module

class reagent.ope.estimators.contextual_bandits_estimators.ActionRewards(values: Union[Mapping[KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values

class reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput(action_space: reagent.ope.estimators.types.ActionSpace, samples: Sequence[reagent.ope.estimators.contextual_bandits_estimators.LogSample], has_model_outputs: bool)

Bases: object

class reagent.ope.estimators.contextual_bandits_estimators.BanditsModel

Bases: abc.ABC

class reagent.ope.estimators.contextual_bandits_estimators.DMEstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, device=None)

Bases: reagent.ope.estimators.estimator.Estimator

Estimates using the Direct Method (DM), assuming a reward model has been trained

evaluate(input: reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput, **kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
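A minimal, library-independent sketch of the DM computation: for each logged context, the reward model's predictions are averaged under the target policy's action distribution. The tensor names below are illustrative, not part of this API.

    import torch

    def dm_estimate(tgt_probs: torch.Tensor, predicted_rewards: torch.Tensor) -> float:
        # tgt_probs:         [n_samples, n_actions] target-policy probabilities per logged context
        # predicted_rewards: [n_samples, n_actions] reward-model predictions per context/action
        # Expected model reward of the target policy, averaged over logged contexts.
        return float((tgt_probs * predicted_rewards).sum(dim=1).mean())

    # 3 logged contexts, 2 actions
    print(dm_estimate(
        torch.tensor([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]),
        torch.tensor([[1.0, 0.0], [0.1, 0.9], [0.4, 0.6]]),
    ))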
class reagent.ope.estimators.contextual_bandits_estimators.DoublyRobustEstimator(trainer: Optional[reagent.ope.estimators.types.Trainer] = None, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, device=None)

Bases: reagent.ope.estimators.contextual_bandits_estimators.DMEstimator

Doubly Robust (DR) estimator.

References:
https://arxiv.org/abs/1103.4601 (deterministic reward model)
https://arxiv.org/abs/1612.01205 (distributed reward model)

evaluate(input: reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput, **kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
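A hedged sketch of the per-sample DR computation described in the papers above: the DM term plus an importance-weighted correction on the logged action, with the propensity ratio clamped. Variable names and the clamp value are illustrative, not part of this API.

    import torch

    def dr_estimate(
        log_rewards: torch.Tensor,        # [n] observed rewards of the logged actions
        log_props: torch.Tensor,          # [n] logging-policy probability of the logged action
        tgt_props: torch.Tensor,          # [n] target-policy probability of the logged action
        dm_values: torch.Tensor,          # [n] DM estimate per context (model x target policy)
        model_log_rewards: torch.Tensor,  # [n] model prediction for the logged action
        weight_clamp: float = 100.0,      # illustrative stand-in for the weight clamper
    ) -> float:
        weights = (tgt_props / log_props).clamp(max=weight_clamp)
        # DM baseline plus importance-weighted residual of the observed reward.
        return float((dm_values + weights * (log_rewards - model_log_rewards)).mean())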
class reagent.ope.estimators.contextual_bandits_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = False, device=None)

Bases: reagent.ope.estimators.estimator.Estimator

Inverse Propensity Scoring (IPS) estimator

evaluate(input: reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput, **kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
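A sketch of the IPS computation: each logged reward is reweighted by the target-to-logging propensity ratio; with weighted=True the estimate is presumably self-normalized by the sum of weights rather than the sample count. Names and the clamp value are illustrative.

    import torch

    def ips_estimate(
        log_rewards: torch.Tensor,  # [n] observed rewards of the logged actions
        log_props: torch.Tensor,    # [n] logging-policy probability of the logged action
        tgt_props: torch.Tensor,    # [n] target-policy probability of the logged action
        weighted: bool = False,     # True: self-normalized (weighted) IPS
        weight_clamp: float = 100.0,
    ) -> float:
        weights = (tgt_props / log_props).clamp(max=weight_clamp)
        denom = weights.sum() if weighted else torch.tensor(float(log_rewards.numel()))
        return float((weights * log_rewards).sum() / denom)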
class reagent.ope.estimators.contextual_bandits_estimators.LogSample(context: object, log_action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]], log_reward: float, log_action_probabilities: reagent.ope.estimators.types.ActionDistribution, tgt_action_probabilities: reagent.ope.estimators.types.ActionDistribution, tgt_action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]], model_outputs: Union[reagent.ope.estimators.contextual_bandits_estimators.ModelOutputs, NoneType] = None, ground_truth_reward: float = nan, item_feature: Union[torch.Tensor, NoneType] = None)

Bases: object

ground_truth_reward = nan
item_feature = None
model_outputs = None
class reagent.ope.estimators.contextual_bandits_estimators.ModelOutputs(tgt_reward_from_log_action: float, tgt_rewards: float)

Bases: object

reagent.ope.estimators.estimator module

class reagent.ope.estimators.estimator.Estimator(device=None)

Bases: abc.ABC

Estimator interface

abstract evaluate(input, **kwargs) → Union[reagent.ope.estimators.estimator.EstimatorResult, reagent.ope.estimators.estimator.EstimatorResults, None]
class reagent.ope.estimators.estimator.EstimatorResult(log_reward: float, estimated_reward: float, ground_truth_reward: Union[float, NoneType] = 0.0, estimated_weight: float = 1.0, estimated_reward_normalized: Union[float, NoneType] = None, estimated_reward_std_error: Union[float, NoneType] = None, estimated_reward_normalized_std_error: Union[float, NoneType] = None)

Bases: object

estimated_reward_normalized = None
estimated_reward_normalized_std_error = None
estimated_reward_std_error = None
estimated_weight = 1.0
ground_truth_reward = 0.0
class reagent.ope.estimators.estimator.EstimatorResults(results: List[reagent.ope.estimators.estimator.EstimatorResult] = <factory>)

Bases: object

Estimator results

append(result: reagent.ope.estimators.estimator.EstimatorResult)

Append a data point

Parameters

result – result from an experimental run

device = None
report()
class reagent.ope.estimators.estimator.EstimatorSampleResult(log_reward: float, target_reward: float, ground_truth_reward: float, weight: float)

Bases: object

class reagent.ope.estimators.estimator.Evaluator(experiments: Iterable[Tuple[Iterable[reagent.ope.estimators.estimator.Estimator], object]], max_num_workers: int)

Bases: object

Multiprocessing evaluator

evaluate() → Mapping[str, reagent.ope.estimators.estimator.EstimatorResults]
static report_results(results: Mapping[str, reagent.ope.estimators.estimator.EstimatorResults])
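A hedged usage sketch for the multiprocessing evaluator: each experiment presumably pairs a group of estimators with the input object passed to their evaluate() calls (inferred from the constructor signature). build_input() is a hypothetical helper returning, e.g., a BanditsEstimatorInput.

    from reagent.ope.estimators.contextual_bandits_estimators import (
        DMEstimator,
        DoublyRobustEstimator,
        IPSEstimator,
    )
    from reagent.ope.estimators.estimator import Evaluator

    # build_input(seed) is hypothetical: it should return the estimator input
    # (e.g. a BanditsEstimatorInput) for one experimental run.
    experiments = [
        ([DMEstimator(), IPSEstimator(), DoublyRobustEstimator()], build_input(seed))
        for seed in range(10)
    ]
    evaluator = Evaluator(experiments, max_num_workers=4)
    results = evaluator.evaluate()        # Mapping[str, EstimatorResults]
    Evaluator.report_results(results)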
class reagent.ope.estimators.estimator.ResultDiffs(diffs: torch.Tensor)

Bases: object

Statistics for differences, e.g., estimates vs ground truth

property bias
property rmse
property variance
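Given the diffs tensor (estimate minus ground truth per run), the three statistics are presumably the usual ones; a short sketch:

    import torch

    diffs = torch.tensor([0.1, -0.2, 0.05, 0.0])  # estimate - ground truth per run

    bias = diffs.mean()                 # average signed error
    rmse = diffs.pow(2).mean().sqrt()   # root mean squared error
    variance = diffs.var()              # spread of the errors
    print(bias.item(), rmse.item(), variance.item())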
reagent.ope.estimators.estimator.run_evaluation(file_name: str) → Optional[Mapping[str, Iterable[reagent.ope.estimators.estimator.EstimatorResults]]]

reagent.ope.estimators.sequential_estimators module

class reagent.ope.estimators.sequential_estimators.DMEstimator(device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLEstimator

Direct Method estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) → reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.DoublyRobustEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.sequential_estimators.IPSEstimator

Doubly Robust estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) → reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.EpsilonGreedyRLPolicy(policy: reagent.ope.estimators.sequential_estimators.RLPolicy, epsilon: float = 0.0)

Bases: reagent.ope.estimators.sequential_estimators.RLPolicy

A wrapper policy that skews the wrapped policy's action distribution by epsilon.

The total number of actions must be given, and the wrapped policy should calculate probabilities for all actions.

action_dist(state) → reagent.ope.estimators.types.ActionDistribution
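The wrapper presumably applies the standard epsilon mixing with a uniform distribution over all actions, which is why every action needs a probability; a small illustrative sketch (not the exact implementation):

    import torch

    def epsilon_skew(probs: torch.Tensor, epsilon: float) -> torch.Tensor:
        """Mix a policy's action distribution with a uniform distribution."""
        num_actions = probs.shape[-1]
        return (1.0 - epsilon) * probs + epsilon / num_actions

    print(epsilon_skew(torch.tensor([0.9, 0.1, 0.0]), epsilon=0.1))
    # tensor([0.8433, 0.1233, 0.0333])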
class reagent.ope.estimators.sequential_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLEstimator

IPS estimator

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) → reagent.ope.estimators.estimator.EstimatorResults
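For sequential data, IPS weights each step's reward by the cumulative importance ratio of the trajectory so far, discounted by gamma. A library-independent sketch under those assumptions, with each step represented as (reward, logging probability, target probability); the clamp is an illustrative stand-in for the weight clamper.

    from typing import List, Tuple

    def sequential_ips(
        trajectories: List[List[Tuple[float, float, float]]],  # (reward, log_prob, tgt_prob) per step
        gamma: float = 1.0,
        weight_clamp: float = 100.0,
    ) -> float:
        total = 0.0
        for traj in trajectories:
            value, cum_weight, discount = 0.0, 1.0, 1.0
            for reward, log_prob, tgt_prob in traj:
                cum_weight = min(cum_weight * tgt_prob / log_prob, weight_clamp)
                value += discount * cum_weight * reward
                discount *= gamma
            total += value
        return total / len(trajectories)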
class reagent.ope.estimators.sequential_estimators.MAGICEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, device=None)

Bases: reagent.ope.estimators.sequential_estimators.IPSEstimator

Algorithm from https://arxiv.org/abs/1604.00923, appendix G.3

evaluate(input: reagent.ope.estimators.sequential_estimators.RLEstimatorInput, **kwargs) → reagent.ope.estimators.estimator.EstimatorResults
class reagent.ope.estimators.sequential_estimators.Model

Bases: abc.ABC

Model interface

abstract next_state_reward_dist(state: reagent.ope.estimators.sequential_estimators.State, action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]) → Mapping[reagent.ope.estimators.sequential_estimators.State, reagent.ope.estimators.sequential_estimators.RewardProbability]
class reagent.ope.estimators.sequential_estimators.RLEstimator(device=None)

Bases: reagent.ope.estimators.estimator.Estimator

class reagent.ope.estimators.sequential_estimators.RLEstimatorInput(gamma: float, log: Mapping[reagent.ope.estimators.sequential_estimators.State, Sequence[Sequence[reagent.ope.estimators.sequential_estimators.Transition]]], target_policy: reagent.ope.estimators.sequential_estimators.RLPolicy, value_function: Union[reagent.ope.estimators.sequential_estimators.ValueFunction, NoneType] = None, ground_truth: Union[reagent.ope.estimators.sequential_estimators.ValueFunction, NoneType] = None, horizon: int = -1)

Bases: object

ground_truth = None
horizon = -1
value_function = None
class reagent.ope.estimators.sequential_estimators.RLPolicy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: abc.ABC

Policy interface

abstract action_dist(state: reagent.ope.estimators.sequential_estimators.State) → reagent.ope.estimators.types.ActionDistribution
property action_space
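A concrete RLPolicy only needs to map a state to an ActionDistribution over its action space; a minimal sketch under that assumption (FixedRLPolicy is illustrative, not part of the library):

    import torch
    from reagent.ope.estimators.sequential_estimators import RLPolicy, State
    from reagent.ope.estimators.types import ActionDistribution, ActionSpace

    class FixedRLPolicy(RLPolicy):
        """Illustrative policy that returns the same action distribution for every state."""

        def __init__(self, action_space: ActionSpace, probs, device=None):
            super().__init__(action_space, device)
            self._dist = action_space.distribution(probs)

        def action_dist(self, state: State) -> ActionDistribution:
            return self._dist

    policy = FixedRLPolicy(ActionSpace(3), torch.tensor([0.2, 0.3, 0.5]))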
class reagent.ope.estimators.sequential_estimators.RandomRLPolicy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLPolicy

A random policy which returns an action according to a uniform distribution

action_dist(state: reagent.ope.estimators.sequential_estimators.State) → reagent.ope.estimators.types.ActionDistribution
class reagent.ope.estimators.sequential_estimators.RewardProbability(reward: float = 0.0, prob: float = 0.0)

Bases: object

prob = 0.0
reward = 0.0
class reagent.ope.estimators.sequential_estimators.State(*args, **kwds)

Bases: reagent.ope.estimators.types.TypeWrapper

is_terminal = False
class reagent.ope.estimators.sequential_estimators.StateReward(state: Union[reagent.ope.estimators.sequential_estimators.State, NoneType] = None, reward: float = 0.0)

Bases: object

reward = 0.0
state = None
class reagent.ope.estimators.sequential_estimators.Transition(last_state: Union[reagent.ope.estimators.sequential_estimators.State, NoneType] = None, action: Union[reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], NoneType] = None, action_prob: float = 0.0, state: Union[reagent.ope.estimators.sequential_estimators.State, NoneType] = None, reward: float = 0.0, status: reagent.ope.estimators.sequential_estimators.Transition.Status = <Status.NORMAL: 1>)

Bases: object

class Status

Bases: enum.Enum

An enumeration.

NOOP = 0
NORMAL = 1
TERMINATED = 2
action = None
action_prob = 0.0
last_state = None
reward = 0.0
state = None
status = 1
class reagent.ope.estimators.sequential_estimators.ValueFunction

Bases: abc.ABC

Value function to calculate state and state-action values

abstract reset()
abstract state_action_value(state: reagent.ope.estimators.sequential_estimators.State, action: reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]) → float
abstract state_value(state: reagent.ope.estimators.sequential_estimators.State) → float

reagent.ope.estimators.slate_estimators module

class reagent.ope.estimators.slate_estimators.DCGSlateMetric(device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateMetric

slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) → reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.SlateSlotValues
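DCG weights each slot by its rank discount and transforms raw rewards into gains; a sketch of the standard formulation these two methods presumably implement (slot i gets weight 1/log2(i+2), reward r becomes 2**r - 1):

    import math
    import torch

    def dcg(rewards: torch.Tensor) -> float:
        """Discounted cumulative gain over slate rewards, top slot first."""
        gains = torch.pow(2.0, rewards) - 1.0                       # slot_values analogue
        weights = torch.tensor(
            [1.0 / math.log2(i + 2) for i in range(len(rewards))]   # slot_weights analogue
        )
        return float((gains * weights).sum())

    print(dcg(torch.tensor([3.0, 2.0, 0.0, 1.0])))  # ~ 7.0 + 1.89 + 0.0 + 0.43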
class reagent.ope.estimators.slate_estimators.DMEstimator(trainer: reagent.ope.estimators.types.Trainer, training_sample_ratio: float, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Direct Method estimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.DoublyRobustEstimator(trainer: reagent.ope.estimators.types.Trainer, training_sample_ratio: float, weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = False, device=None)

Bases: reagent.ope.estimators.slate_estimators.DMEstimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.ERRSlateMetric(max_reward: float, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateMetric

slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) → reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.FrechetDistribution(shape: float, deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

Frechet distribution

distribution(rewards: torch.Tensor) → torch.Tensor
property name
class reagent.ope.estimators.slate_estimators.IPSEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.LogSample(context: reagent.ope.estimators.slate_estimators.SlateContext, metric: reagent.ope.estimators.slate_estimators.SlateMetric, log_slate: reagent.ope.estimators.slate_estimators.Slate, log_reward: float, _log_slate_probability: float = nan, _log_slot_item_probabilities: Union[reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities, NoneType] = None, _log_item_probabilities: Union[reagent.ope.estimators.slate_estimators.SlateItemProbabilities, NoneType] = None, _tgt_slate_probability: float = nan, _tgt_slot_item_probabilities: Union[reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities, NoneType] = None, _tgt_item_probabilities: Union[reagent.ope.estimators.slate_estimators.SlateItemProbabilities, NoneType] = None, ground_truth_reward: float = nan, slot_weights: Union[reagent.ope.estimators.slate_estimators.SlateSlotValues, NoneType] = None, slot_probabilities: Union[reagent.ope.estimators.slate_estimators.SlateSlotValues, NoneType] = None, item_features: Union[reagent.ope.estimators.slate_estimators.SlateItemFeatures, NoneType] = None)

Bases: object

ground_truth_reward = nan
item_features = None
property items
log_slate_probability(slate: Optional[reagent.ope.estimators.slate_estimators.Slate] = None) → float
log_slot_item_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations]
slot_probabilities = None
slot_weights = None
tgt_slate_probability() → float
tgt_slate_space(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → Iterable[Tuple[Sequence[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]], float]]
tgt_slot_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → Optional[reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations]
validate()
class reagent.ope.estimators.slate_estimators.NDCGSlateMetric(item_rewards: reagent.ope.estimators.slate_estimators.SlateItemValues, device=None)

Bases: reagent.ope.estimators.slate_estimators.DCGSlateMetric

slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.SlateSlotValues
class reagent.ope.estimators.slate_estimators.PBMEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Estimator from reference 1: Position-Based Click Model

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.PassThruDistribution(deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

No-op distribution, probability determined by reward

distribution(rewards: torch.Tensor) → torch.Tensor
property name
class reagent.ope.estimators.slate_estimators.PseudoInverseEstimator(weight_clamper: Optional[reagent.ope.utils.Clamper] = None, weighted: bool = True, device=None)

Bases: reagent.ope.estimators.slate_estimators.SlateEstimator

Estimator from reference 2

evaluate(input: reagent.ope.estimators.slate_estimators.SlateEstimatorInput, *kwargs) → Optional[reagent.ope.estimators.estimator.EstimatorResult]
class reagent.ope.estimators.slate_estimators.RankingDistribution(alpha: float = -1.0, deterministic: bool = False)

Bases: reagent.ope.estimators.slate_estimators.RewardDistribution

Ranking distribution according to https://arxiv.org/abs/1605.04812

distribution(rewards: torch.Tensor) → torch.Tensor
property name
class reagent.ope.estimators.slate_estimators.RewardDistribution(deterministic: bool = False)

Bases: abc.ABC

Returns a customized probability distribution derived from the rewards

abstract distribution(rewards: torch.Tensor) → torch.Tensor
abstract property name
class reagent.ope.estimators.slate_estimators.Slate(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], ValueType], MutableSequence[ValueType]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects

Represents a slate: a map from slots to items/docs

property items
one_hots(items: reagent.ope.estimators.slate_estimators.SlateItems, device=None) → torch.Tensor
slot_features(item_features: reagent.ope.estimators.slate_estimators.SlateItemFeatures) → reagent.ope.estimators.slate_estimators.SlateSlotFeatures

Map items in the slate to their features

Parameters

item_features – map from all items to their features

Returns

Features of the items in the slate

slot_values(item_values: reagent.ope.estimators.slate_estimators.SlateItemValues) → reagent.ope.estimators.slate_estimators.SlateSlotValues

Map items in the slate to given values

Parameters

item_values – map from all items to some values

Returns

List of values in the slate

class reagent.ope.estimators.slate_estimators.SlateContext(query: reagent.ope.estimators.types.TypeWrapper[typing.Union[typing.Tuple[int], typing.Tuple[float], numpy.ndarray, torch.Tensor, typing.Tuple[int, int]]], slots: reagent.ope.estimators.slate_estimators.SlateSlots, params: object = None)

Bases: object

params = None
class reagent.ope.estimators.slate_estimators.SlateEstimator(device=None)

Bases: reagent.ope.estimators.estimator.Estimator

class reagent.ope.estimators.slate_estimators.SlateEstimatorInput(samples: Sequence[reagent.ope.estimators.slate_estimators.LogSample])

Bases: object

validate()
class reagent.ope.estimators.slate_estimators.SlateItemFeatures(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], torch.Tensor], Sequence[torch.Tensor], torch.Tensor, numpy.ndarray])

Bases: reagent.ope.estimators.types.Objects

property items
class reagent.ope.estimators.slate_estimators.SlateItemProbabilities(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor], greedy: bool = False)

Bases: reagent.ope.estimators.slate_estimators.SlateItemValues

Probabilities of each item being selected into the slate

property is_deterministic
sample_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.Slate
slate_probability(slate: reagent.ope.estimators.slate_estimators.Slate) → float

Calculate the probability of a slate under this distribution

Parameters

slate – slate to evaluate

Returns

probability

slate_space(slots: reagent.ope.estimators.slate_estimators.SlateSlots, max_size: int = -1) → Iterable[Tuple[Sequence[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]], float]]

Return all possible slates and their probabilities

The algorithm is similar to _calculate_expectations(), but has fewer values to cache, saving both space and computation.

Parameters

slots – slots to be filled

max_size – max number of samples to be returned; <= 0 returns all samples

slot_item_expectations(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations
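One standard way to score an ordered slate drawn without replacement from item-selection probabilities is to multiply the renormalized probability of each pick; the library's slate_probability may differ in detail, so treat this as an illustrative sketch.

    import torch

    def ordered_slate_probability(item_probs: torch.Tensor, slate_indices: list) -> float:
        """Probability of drawing the given items, in order, without replacement.

        item_probs:    selection probability of each candidate item
        slate_indices: indices of the chosen items, top slot first
        """
        remaining = item_probs.clone()
        prob = 1.0
        for idx in slate_indices:
            prob *= float(remaining[idx] / remaining.sum())
            remaining[idx] = 0.0   # item cannot be picked again
        return prob

    print(ordered_slate_probability(torch.tensor([0.5, 0.3, 0.2]), [1, 0]))
    # 0.3 * (0.5 / 0.7) ~ 0.214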
class reagent.ope.estimators.slate_estimators.SlateItemValues(values: Union[Mapping[KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values

property items
class reagent.ope.estimators.slate_estimators.SlateItems(items: Union[Sequence[ValueType], int])

Bases: reagent.ope.estimators.types.Items

class reagent.ope.estimators.slate_estimators.SlateMetric(device=None)

Bases: object

Metric calculator for a slate: weights (dot) rewards

The base class is simply the sum of all item rewards

calculate_reward(slots: reagent.ope.estimators.slate_estimators.SlateSlots, rewards: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, slot_values: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None, slot_weights: Optional[reagent.ope.estimators.slate_estimators.SlateSlotValues] = None) → float
slot_values(rewards: reagent.ope.estimators.slate_estimators.SlateSlotValues) → reagent.ope.estimators.slate_estimators.SlateSlotValues
slot_weights(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.SlateSlotValues
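The metric is a dot product of slot weights and slot values, and the base class uses unit weights, so it reduces to a plain sum of the item rewards; a short library-independent sketch:

    from typing import Optional

    import torch

    def slate_metric(slot_rewards: torch.Tensor,
                     slot_weights: Optional[torch.Tensor] = None) -> float:
        """Weighted sum of per-slot rewards; unit weights give the base-class plain sum."""
        if slot_weights is None:
            slot_weights = torch.ones_like(slot_rewards)
        return float(torch.dot(slot_weights, slot_rewards))

    rewards = torch.tensor([1.0, 0.0, 0.5])
    print(slate_metric(rewards))                                   # plain sum: 1.5
    print(slate_metric(rewards, torch.tensor([1.0, 0.63, 0.5])))   # rank-discounted variant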
class reagent.ope.estimators.slate_estimators.SlateModel

Bases: abc.ABC

Model providing item relevance/reward and slot examination (click) distribution

abstract item_rewards(context: reagent.ope.estimators.slate_estimators.SlateContext) → reagent.ope.estimators.slate_estimators.SlateItemValues

Returns each item’s relevance under the context

Parameters

context – slate context

Returns

Item relevances

slot_probabilities(context: reagent.ope.estimators.slate_estimators.SlateContext) → reagent.ope.estimators.slate_estimators.SlateSlotValues

Returns each slot/position’s examination probability, independent of the item shown; used in the PBM estimator

Parameters

context – slate context

Returns

Slot probabilities

class reagent.ope.estimators.slate_estimators.SlatePolicy(device=None)

Bases: abc.ABC

Policy interface

class reagent.ope.estimators.slate_estimators.SlateSlotFeatures(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], ValueType], MutableSequence[ValueType]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects

property features
class reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotItemValues

property expectations
expected_rewards(item_rewards: reagent.ope.estimators.slate_estimators.SlateItemValues, device=None) → reagent.ope.estimators.slate_estimators.SlateSlotValues

Calculate the expected relevance of each slot, given each item’s relevance, under this distribution

Parameters

item_rewards – map from items to their relevances/rewards

device – optional torch device for the result

Returns

Map of slots to their expected relevance

class reagent.ope.estimators.slate_estimators.SlateSlotItemProbabilities(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]], greedy: bool = False)

Bases: reagent.ope.estimators.slate_estimators.SlateSlotItemValues

sample_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots) → reagent.ope.estimators.slate_estimators.Slate
slate_probability(slate: reagent.ope.estimators.slate_estimators.Slate) → float

Calculate the probability of a slate under this distribution

Parameters

slate – slate to evaluate

Returns

probability

slot_item_expectations(samples: int = 20000) → reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations
class reagent.ope.estimators.slate_estimators.SlateSlotItemValues(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], reagent.ope.estimators.slate_estimators.SlateItemValues], MutableSequence[reagent.ope.estimators.slate_estimators.SlateItemValues]])

Bases: reagent.ope.estimators.slate_estimators.SlateSlotObjects

values_tensor(device=None) → torch.Tensor
class reagent.ope.estimators.slate_estimators.SlateSlotObjects(values: Union[MutableMapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], ValueType], MutableSequence[ValueType]])

Bases: reagent.ope.estimators.types.Objects

fill(values: Sequence[ValueType]) → Union[Mapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], ValueType], Sequence[ValueType]]
property objects
property slots
class reagent.ope.estimators.slate_estimators.SlateSlotValues(values: Union[Mapping[KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values

Map from a slot to a value

class reagent.ope.estimators.slate_estimators.SlateSlots(items: Union[Sequence[ValueType], int])

Bases: reagent.ope.estimators.types.Items

List of slots

fill(values: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor]) → reagent.ope.estimators.slate_estimators.SlateSlotValues

Map slots to given values

Parameters

values – given values

Returns

Map from slots to given values

reagent.ope.estimators.slate_estimators.is_to_calculate_expectation(slate_size: int, item_size: int) → bool

Switch between calculating and sampling expectations, balancing execution time against accuracy

Returns

True to calculate, False to sample

reagent.ope.estimators.slate_estimators.make_slate(slots: reagent.ope.estimators.slate_estimators.SlateSlots, items: Sequence[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]]) → reagent.ope.estimators.slate_estimators.Slate

Assign items to slots to make a slate

reagent.ope.estimators.slate_estimators.make_slot_item_distributions(slots: reagent.ope.estimators.slate_estimators.SlateSlots, dists: Sequence[reagent.ope.estimators.slate_estimators.SlateItemValues]) → reagent.ope.estimators.slate_estimators.SlateSlotItemExpectations

reagent.ope.estimators.types module

class reagent.ope.estimators.types.ActionDistribution(values: Union[Mapping[KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Values

class reagent.ope.estimators.types.ActionSpace(items: Union[Sequence[ValueType], int])

Bases: reagent.ope.estimators.types.Items

distribution(dist: Union[Mapping[reagent.ope.estimators.types.TypeWrapper[typing.Union[int, typing.Tuple[int], float, typing.Tuple[float], numpy.ndarray, torch.Tensor]][Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]], float], Sequence[float], numpy.ndarray, torch.Tensor]) → reagent.ope.estimators.types.ActionDistribution
property space
class reagent.ope.estimators.types.Items(items: Union[Sequence[ValueType], int])

Bases: typing.Generic, abc.ABC

List of items

fill(values: Union[Mapping[ValueType, float], Sequence[float], numpy.ndarray, torch.Tensor]) → Union[Sequence[float], Mapping[ValueType, float]]
index_of(item: ValueType) → int
property is_sequence
class reagent.ope.estimators.types.Objects(values: Union[Mapping[KeyType, ValueType], Sequence[ValueType]])

Bases: typing.Generic, abc.ABC

Generic class for a map from items to their values. Supports [] indexing and the iterator protocol

items

list of items

values

list of their values

index_of(key: KeyType) → int
property is_sequence
property keys
property values
class reagent.ope.estimators.types.Policy(action_space: reagent.ope.estimators.types.ActionSpace, device=None)

Bases: abc.ABC

Policy interface

property action_space
class reagent.ope.estimators.types.PredictResults(predictions: Union[torch.Tensor, NoneType], scores: torch.Tensor, probabilities: Union[torch.Tensor, NoneType] = None)

Bases: object

probabilities = None
class reagent.ope.estimators.types.Trainer

Bases: abc.ABC

property is_trained
load_model(file: str)
abstract property name
abstract predict(x: torch.Tensor, device=None) → reagent.ope.estimators.types.PredictResults
reset()
save_model(file: str)
abstract score(x: torch.Tensor, y: torch.Tensor, weight: Optional[torch.Tensor] = None) → float
abstract train(data: reagent.ope.estimators.types.TrainingData, iterations: int = 1, num_samples: int = 0)
class reagent.ope.estimators.types.TrainingData(train_x: torch.Tensor, train_y: torch.Tensor, train_weight: Union[torch.Tensor, NoneType], validation_x: torch.Tensor, validation_y: torch.Tensor, validation_weight: Union[torch.Tensor, NoneType])

Bases: object

class reagent.ope.estimators.types.TypeWrapper(*args, **kwds)

Bases: typing.Generic

class reagent.ope.estimators.types.Values(values: Union[Mapping[KeyType, float], Sequence[float], numpy.ndarray, torch.Tensor])

Bases: reagent.ope.estimators.types.Objects

Generic class for a map from items to their values. Supports [] indexing and the iterator protocol

items

list of items

values

list of their values

greedy(size=1) → Union[Sequence[KeyType], KeyType]
probability(key: ValueType) → float
replace(values: Union[Mapping[ValueType, float], Sequence[float], torch.Tensor, numpy.ndarray]) → reagent.ope.estimators.types.Values

Replace the current values with new values and return a new copy; the current Values object is not changed

Parameters

values – new values

Returns

Values object with the new values

sample(size=1) → Union[Sequence[KeyType], KeyType]
sort(descending: bool = True) → Tuple[Sequence[KeyType], torch.Tensor]

Sort based on values

Parameters

descending – sorting order

Returns

Tuple of sorted indices and values
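Values is easiest to exercise through a concrete subclass such as ActionDistribution, here built via ActionSpace.distribution; the behaviour below is inferred from the signatures above and is a sketch, not authoritative usage.

    from reagent.ope.estimators.types import ActionSpace

    space = ActionSpace(3)                        # 3 integer actions
    dist = space.distribution([0.1, 0.6, 0.3])    # ActionDistribution, a Values subclass

    top = dist.greedy()                    # key(s) with the largest value
    drawn = dist.sample(size=2)            # keys drawn in proportion to their values
    keys, vals = dist.sort()               # keys and values, descending by value
    other = dist.replace([0.2, 0.2, 0.6])  # new Values copy; dist itself is unchanged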

reagent.ope.estimators.types.is_array(obj)

Module contents