reagent.evaluation package

Submodules

reagent.evaluation.cpe module

class reagent.evaluation.cpe.CpeDetails

Bases: object

log()
log_to_tensorboard() → None
class reagent.evaluation.cpe.CpeEstimate(raw, normalized, raw_std_error, normalized_std_error)

Bases: NamedTuple

normalized: float

Alias for field number 1

normalized_std_error: float

Alias for field number 3

raw: float

Alias for field number 0

raw_std_error: float

Alias for field number 2

class reagent.evaluation.cpe.CpeEstimateSet(direct_method, inverse_propensity, doubly_robust, sequential_doubly_robust, weighted_doubly_robust, magic, switch, switch_dr)

Bases: NamedTuple

check_estimates_exist()
direct_method: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 0

doubly_robust: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 2

fill_empty_with_zero()
inverse_propensity: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 1

log()
log_to_tensorboard(metric_name: str) → None
magic: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 5

sequential_doubly_robust: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 3

switch: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 6

switch_dr: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 7

weighted_doubly_robust: Optional[reagent.evaluation.cpe.CpeEstimate]

Alias for field number 4
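
Both CpeEstimate and CpeEstimateSet are plain NamedTuples and can be constructed directly. A minimal sketch (the numbers are made up, and estimators that were not run are left as None):

  from reagent.evaluation.cpe import CpeEstimate, CpeEstimateSet

  dm = CpeEstimate(raw=0.91, normalized=1.02, raw_std_error=0.03, normalized_std_error=0.03)
  estimates = CpeEstimateSet(
      direct_method=dm,
      inverse_propensity=None,
      doubly_robust=None,
      sequential_doubly_robust=None,
      weighted_doubly_robust=None,
      magic=None,
      switch=None,
      switch_dr=None,
  )
  assert estimates.direct_method is estimates[0]  # NamedTuple fields are also positional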

reagent.evaluation.cpe.bootstrapped_std_error_of_mean(data, sample_percent=0.25, num_samples=1000)

Compute bootstrapped standard error of mean of input data.

Parameters
  • data – Input data (1D torch tensor or numpy array).

  • sample_percent – Fraction of the data to draw for each bootstrap sample.

  • num_samples – Number of times to sample.
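
A minimal usage sketch, assuming a 1D torch tensor of per-example values (the result is treated here as a scalar standard error):

  import torch
  from reagent.evaluation.cpe import bootstrapped_std_error_of_mean

  rewards = torch.randn(10_000)  # hypothetical per-example rewards
  std_err = bootstrapped_std_error_of_mean(rewards, sample_percent=0.25, num_samples=1000)
  print(f"bootstrapped std error of mean: {std_err}")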

reagent.evaluation.doubly_robust_estimator module

class reagent.evaluation.doubly_robust_estimator.DoublyRobustEstimator

Bases: object

For details, visit https://arxiv.org/pdf/1612.01205.pdf

estimate(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage, hp: Optional[reagent.evaluation.doubly_robust_estimator.DoublyRobustHP] = None) → Tuple[reagent.evaluation.cpe.CpeEstimate, reagent.evaluation.cpe.CpeEstimate, reagent.evaluation.cpe.CpeEstimate]
class reagent.evaluation.doubly_robust_estimator.DoublyRobustHP(frac_train, frac_valid, bootstrap_num_samples, bootstrap_sample_percent, xgb_params, bope_mode, bope_num_samples)

Bases: NamedTuple

bootstrap_num_samples: int

Alias for field number 2

bootstrap_sample_percent: float

Alias for field number 3

bope_mode: Optional[str]

Alias for field number 5

bope_num_samples: Optional[int]

Alias for field number 6

frac_train: float

Alias for field number 0

frac_valid: float

Alias for field number 1

xgb_params: Optional[Dict[str, Union[float, int, str]]]

Alias for field number 4
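
A hedged sketch of how DoublyRobustHP feeds into DoublyRobustEstimator.estimate. Here edp stands for an already-built EvaluationDataPage, the hyperparameter values are illustrative only, and the ordering of the three returned estimates (direct method, inverse propensity, doubly robust) is inferred from the class names rather than stated on this page:

  from reagent.evaluation.doubly_robust_estimator import DoublyRobustEstimator, DoublyRobustHP

  hp = DoublyRobustHP(
      frac_train=0.5,                 # train/valid split fractions (assumed semantics)
      frac_valid=0.25,
      bootstrap_num_samples=1000,
      bootstrap_sample_percent=0.25,
      xgb_params={"max_depth": 3},    # optional XGBoost parameters
      bope_mode=None,
      bope_num_samples=None,
  )
  dm_estimate, ips_estimate, dr_estimate = DoublyRobustEstimator().estimate(edp, hp)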

class reagent.evaluation.doubly_robust_estimator.EstimationData(contexts_actions_train: Optional[torch.Tensor], policy_indicators_train: Optional[torch.Tensor], weights_train: Optional[torch.Tensor], contexts_actions_valid: Optional[torch.Tensor], policy_indicators_valid: Optional[torch.Tensor], weights_valid: Optional[torch.Tensor], contexts_actions_eval: Optional[torch.Tensor], contexts_train: Optional[torch.Tensor], actions_logged_train: Optional[torch.Tensor], contexts_valid: Optional[torch.Tensor], actions_logged_valid: Optional[torch.Tensor], contexts_eval: Optional[torch.Tensor], actions_logged_eval: Optional[torch.Tensor], model_propensities_eval: torch.Tensor, model_rewards_eval: torch.Tensor, action_mask_eval: torch.Tensor, logged_rewards_eval: torch.Tensor, model_rewards_for_logged_action_eval: torch.Tensor, logged_propensities_eval: torch.Tensor)

Bases: object

action_mask_eval: torch.Tensor
actions_logged_eval: Optional[torch.Tensor]
actions_logged_train: Optional[torch.Tensor]
actions_logged_valid: Optional[torch.Tensor]
contexts_actions_eval: Optional[torch.Tensor]
contexts_actions_train: Optional[torch.Tensor]
contexts_actions_valid: Optional[torch.Tensor]
contexts_eval: Optional[torch.Tensor]
contexts_train: Optional[torch.Tensor]
contexts_valid: Optional[torch.Tensor]
logged_propensities_eval: torch.Tensor
logged_rewards_eval: torch.Tensor
model_propensities_eval: torch.Tensor
model_rewards_eval: torch.Tensor
model_rewards_for_logged_action_eval: torch.Tensor
policy_indicators_train: Optional[torch.Tensor]
policy_indicators_valid: Optional[torch.Tensor]
weights_train: Optional[torch.Tensor]
weights_valid: Optional[torch.Tensor]
class reagent.evaluation.doubly_robust_estimator.ImportanceSamplingData(importance_weight, logged_rewards, model_rewards, model_rewards_for_logged_action, model_propensities)

Bases: NamedTuple

importance_weight: torch.Tensor

Alias for field number 0

logged_rewards: torch.Tensor

Alias for field number 1

model_propensities: torch.Tensor

Alias for field number 4

model_rewards: torch.Tensor

Alias for field number 2

model_rewards_for_logged_action: torch.Tensor

Alias for field number 3

class reagent.evaluation.doubly_robust_estimator.TrainValidEvalData(contexts_dict, model_propensities_dict, actions_logged_dict, action_mask_dict, logged_rewards_dict, model_rewards_dict, model_rewards_for_logged_action_dict, logged_propensities_dict, num_examples_dict)

Bases: NamedTuple

action_mask_dict: Dict[str, torch.Tensor]

Alias for field number 3

actions_logged_dict: Dict[str, torch.Tensor]

Alias for field number 2

contexts_dict: Dict[str, torch.Tensor]

Alias for field number 0

logged_propensities_dict: Dict[str, torch.Tensor]

Alias for field number 7

logged_rewards_dict: Dict[str, torch.Tensor]

Alias for field number 4

model_propensities_dict: Dict[str, torch.Tensor]

Alias for field number 1

model_rewards_dict: Dict[str, torch.Tensor]

Alias for field number 5

model_rewards_for_logged_action_dict: Dict[str, torch.Tensor]

Alias for field number 6

num_examples_dict: Dict[str, int]

Alias for field number 8

reagent.evaluation.evaluation_data_page module

class reagent.evaluation.evaluation_data_page.EvaluationDataPage(mdp_id: 'Optional[torch.Tensor]', sequence_number: 'Optional[torch.Tensor]', logged_propensities: 'torch.Tensor', logged_rewards: 'torch.Tensor', action_mask: 'torch.Tensor', model_propensities: 'torch.Tensor', model_rewards: 'torch.Tensor', model_rewards_for_logged_action: 'torch.Tensor', model_values: 'Optional[torch.Tensor]' = None, possible_actions_mask: 'Optional[torch.Tensor]' = None, optimal_q_values: 'Optional[torch.Tensor]' = None, eval_action_idxs: 'Optional[torch.Tensor]' = None, logged_values: 'Optional[torch.Tensor]' = None, logged_metrics: 'Optional[torch.Tensor]' = None, logged_metrics_values: 'Optional[torch.Tensor]' = None, model_metrics: 'Optional[torch.Tensor]' = None, model_metrics_for_logged_action: 'Optional[torch.Tensor]' = None, model_metrics_values: 'Optional[torch.Tensor]' = None, model_metrics_values_for_logged_action: 'Optional[torch.Tensor]' = None, possible_actions_state_concat: 'Optional[torch.Tensor]' = None, contexts: 'Optional[torch.Tensor]' = None)

Bases: reagent.core.types.TensorDataClass

action_mask: torch.Tensor
append(edp)
compute_values(gamma: float)
static compute_values_for_mdps(rewards: torch.Tensor, mdp_ids: torch.Tensor, sequence_numbers: torch.Tensor, gamma: float) → torch.Tensor
contexts: Optional[torch.Tensor] = None
classmethod create_from_tensors_dqn(trainer: DQNTrainer, mdp_ids: torch.Tensor, sequence_numbers: torch.Tensor, states: rlt.FeatureData, actions: rlt.FeatureData, propensities: torch.Tensor, rewards: torch.Tensor, possible_actions_mask: torch.Tensor, metrics: Optional[torch.Tensor] = None)
classmethod create_from_tensors_parametric_dqn(trainer: ParametricDQNTrainer, mdp_ids: torch.Tensor, sequence_numbers: torch.Tensor, states: rlt.FeatureData, actions: rlt.FeatureData, propensities: torch.Tensor, rewards: torch.Tensor, possible_actions_mask: torch.Tensor, possible_actions: rlt.FeatureData, max_num_actions: int, metrics: Optional[torch.Tensor] = None)
classmethod create_from_tensors_seq2slate(seq2slate_net: reagent.models.seq2slate.Seq2SlateTransformerNet, reward_network: torch.nn.modules.module.Module, training_input: reagent.core.types.PreprocessedRankingInput, eval_greedy: bool, mdp_ids: Optional[torch.Tensor] = None, sequence_numbers: Optional[torch.Tensor] = None)
Parameters
  • eval_greedy – If True, evaluate the greedy policy, which always picks the most probable output sequence. If False, evaluate the stochastic ranking policy.

classmethod create_from_training_batch(tdb: rlt.PreprocessedRankingInput, trainer: ReAgentLightningModule, reward_network: Optional[nn.Module] = None)
eval_action_idxs: Optional[torch.Tensor] = None
logged_metrics: Optional[torch.Tensor] = None
logged_metrics_values: Optional[torch.Tensor] = None
logged_propensities: torch.Tensor
logged_rewards: torch.Tensor
logged_values: Optional[torch.Tensor] = None
mdp_id: Optional[torch.Tensor]
model_metrics: Optional[torch.Tensor] = None
model_metrics_for_logged_action: Optional[torch.Tensor] = None
model_metrics_values: Optional[torch.Tensor] = None
model_metrics_values_for_logged_action: Optional[torch.Tensor] = None
model_propensities: torch.Tensor
model_rewards: torch.Tensor
model_rewards_for_logged_action: torch.Tensor
model_values: Optional[torch.Tensor] = None
optimal_q_values: Optional[torch.Tensor] = None
possible_actions_mask: Optional[torch.Tensor] = None
possible_actions_state_concat: Optional[torch.Tensor] = None
sequence_number: Optional[torch.Tensor]
set_metric_as_reward(i: int, num_actions: int)
sort()
validate()
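
A hedged sketch of a typical discrete-action flow: build a page from trainer outputs, then post-process it. trainer and batch are placeholders, the batch attribute names are assumptions, and sort() / compute_values() are treated as returning new pages rather than mutating in place:

  from reagent.evaluation.evaluation_data_page import EvaluationDataPage

  edp = EvaluationDataPage.create_from_tensors_dqn(
      trainer=trainer,                       # a DQNTrainer (placeholder)
      mdp_ids=batch.mdp_id,
      sequence_numbers=batch.sequence_number,
      states=batch.state,
      actions=batch.action,
      propensities=batch.action_probability,
      rewards=batch.reward,
      possible_actions_mask=batch.possible_actions_mask,
  )
  edp = edp.sort()                           # assumed: sorted copy keyed on mdp_id / sequence_number
  edp = edp.compute_values(gamma=0.99)       # assumed: copy with discounted logged values filled in
  edp.validate()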

reagent.evaluation.evaluator module

class reagent.evaluation.evaluator.Evaluator(action_names, gamma, model, metrics_to_score=None)

Bases: object

NUM_J_STEPS_FOR_MAGIC_ESTIMATOR = 25
add_observer(observer: reagent.core.tracker.Observer)
add_observers(observers: List[reagent.core.tracker.Observer])
evaluate_post_training(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage) → reagent.evaluation.cpe.CpeDetails
get_target_distribution_error(actions, target_distribution, actual_distribution)

Calculate the MSE between the actual and target action distributions.

static huberLoss(label, output)
notify_observers(**kwargs)
score_cpe(metric_name, edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage)
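
A hedged sketch of post-training counterfactual policy evaluation; the action names and gamma are illustrative, model=None is only a placeholder for the trained model, and edp is an already-built EvaluationDataPage:

  from reagent.evaluation.evaluator import Evaluator

  evaluator = Evaluator(action_names=["action_0", "action_1"], gamma=0.99, model=None)
  cpe_details = evaluator.evaluate_post_training(edp)  # returns a CpeDetails object
  cpe_details.log()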
reagent.evaluation.evaluator.get_metrics_to_score(metric_reward_values: Optional[Dict[str, float]]) → List[str]
reagent.evaluation.evaluator.get_tensor(x, dtype=None)
Input:
  • x: list or a sequence

  • dtype: target data type of the elements in tensor [optional]

    It will be inferred automatically if not provided.

Output:

Returns a Tensor given a list or a sequence. If the input is None, it returns None. If the input is already a tensor, it returns that tensor. If dtype is provided, the output Tensor will have that type.
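
Illustrative calls matching the behavior described above:

  import torch
  from reagent.evaluation.evaluator import get_tensor

  t = get_tensor([1.0, 2.0, 3.0], dtype=torch.float32)  # list -> tensor with the requested dtype
  same = get_tensor(t)                                  # an existing tensor is returned unchanged
  nothing = get_tensor(None)                            # None is passed through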

reagent.evaluation.ope_adapter module

class reagent.evaluation.ope_adapter.OPEstimatorAdapter(ope_estimator: reagent.ope.estimators.estimator.Estimator, device=None)

Bases: object

static edp_to_contextual_bandit_log(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage, device=None) → reagent.ope.estimators.contextual_bandits_estimators.BanditsEstimatorInput
estimate(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage, **kwargs) → reagent.evaluation.cpe.CpeEstimate
static estimator_result_to_cpe_estimate(result: reagent.ope.estimators.estimator.EstimatorResult) → reagent.evaluation.cpe.CpeEstimate
class reagent.evaluation.ope_adapter.OPEvaluator(action_names, gamma, model, metrics_to_score=None, device=None)

Bases: reagent.evaluation.evaluator.Evaluator

score_cpe(metric_name, edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage)
class reagent.evaluation.ope_adapter.SequentialOPEstimatorAdapter(seq_ope_estimator: reagent.ope.estimators.sequential_estimators.RLEstimator, gamma: float, device=None)

Bases: object

class EDPSeqPolicy(num_actions: int, model_propensities: torch.Tensor, device=None)

Bases: reagent.ope.estimators.sequential_estimators.RLPolicy

action_dist(state: reagent.ope.estimators.sequential_estimators.State) → reagent.ope.estimators.types.ActionDistribution
class EDPValueFunc(model_values: torch.Tensor, target_propensities: torch.Tensor)

Bases: reagent.ope.estimators.sequential_estimators.ValueFunction

reset()
state_action_value(state: reagent.ope.estimators.sequential_estimators.State, action: reagent.ope.estimators.types.TypeWrapper[Union[int, Tuple[int], float, Tuple[float], numpy.ndarray, torch.Tensor]]) → float
state_value(state: reagent.ope.estimators.sequential_estimators.State) → float
static edp_to_rl_input(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage, gamma, device=None) → reagent.ope.estimators.sequential_estimators.RLEstimatorInput
estimate(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage) → reagent.evaluation.cpe.CpeEstimate
static estimator_results_to_cpe_estimate(estimator_results: reagent.ope.estimators.estimator.EstimatorResults) → reagent.evaluation.cpe.CpeEstimate
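
A hedged sketch of adapting an off-policy estimator from reagent.ope for use with an EvaluationDataPage. SomeOpeEstimator is a placeholder for a concrete subclass of reagent.ope.estimators.estimator.Estimator, which is not documented on this page:

  from reagent.evaluation.ope_adapter import OPEstimatorAdapter

  adapter = OPEstimatorAdapter(SomeOpeEstimator())  # placeholder estimator instance
  cpe_estimate = adapter.estimate(edp)              # a reagent.evaluation.cpe.CpeEstimate
  print(cpe_estimate.raw, cpe_estimate.normalized)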

reagent.evaluation.sequential_doubly_robust_estimator module

class reagent.evaluation.sequential_doubly_robust_estimator.SequentialDoublyRobustEstimator(gamma)

Bases: object

estimate(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage) → reagent.evaluation.cpe.CpeEstimate
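
A minimal sketch, assuming edp is an already-built EvaluationDataPage with sequential (MDP) structure:

  from reagent.evaluation.sequential_doubly_robust_estimator import SequentialDoublyRobustEstimator

  seq_dr = SequentialDoublyRobustEstimator(gamma=0.99)
  estimate = seq_dr.estimate(edp)  # a CpeEstimate with raw/normalized values and std errors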

reagent.evaluation.weighted_sequential_doubly_robust_estimator module

class reagent.evaluation.weighted_sequential_doubly_robust_estimator.WeightedSequentialDoublyRobustEstimator(gamma)

Bases: object

BOOTSTRAP_SAMPLE_PCT = 0.5
CONFIDENCE_INTERVAL = 0.9
NUM_BOOTSTRAP_SAMPLES = 50
NUM_SUBSETS_FOR_CB_ESTIMATES = 25
static calculate_step_return(rewards, discounts, importance_weights, importance_weights_one_earlier, estimated_state_values, estimated_q_values, j_step)
compute_weighted_doubly_robust_point_estimate(j_steps, num_j_steps, j_step_returns, infinite_step_returns, j_step_return_trajectories)
static confidence_bounds(x, confidence)
estimate(edp: reagent.evaluation.evaluation_data_page.EvaluationDataPage, num_j_steps, whether_self_normalize_importance_weights) → reagent.evaluation.cpe.CpeEstimate
static normalize_importance_weights(importance_weights, whether_self_normalize_importance_weights)
static transform_to_equal_length_trajectories(mdp_ids, actions, rewards, logged_propensities, target_propensities, estimated_q_values)

Take in samples (actions, rewards, propensities, etc.) and output lists of equal-length trajectories (episodes), split according to terminals. Because the raw trajectories have varying lengths, shorter ones are padded at the end with zeros (or ones, as appropriate for the quantity).
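
A hedged sketch of the weighted (MAGIC-style) estimator; edp is a placeholder, and the j-step count reuses Evaluator.NUM_J_STEPS_FOR_MAGIC_ESTIMATOR documented above:

  from reagent.evaluation.evaluator import Evaluator
  from reagent.evaluation.weighted_sequential_doubly_robust_estimator import (
      WeightedSequentialDoublyRobustEstimator,
  )

  wsdr = WeightedSequentialDoublyRobustEstimator(gamma=0.99)
  estimate = wsdr.estimate(
      edp,
      num_j_steps=Evaluator.NUM_J_STEPS_FOR_MAGIC_ESTIMATOR,
      whether_self_normalize_importance_weights=True,
  )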

reagent.evaluation.weighted_sequential_doubly_robust_estimator.mse_loss(x, error)

reagent.evaluation.world_model_evaluator module

class reagent.evaluation.world_model_evaluator.FeatureImportanceEvaluator(trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, discrete_action: bool, state_feature_num: int, action_feature_num: int, sorted_action_feature_start_indices: List[int], sorted_state_feature_start_indices: List[int])

Bases: object

Evaluate feature importance weights on data pages

compute_median_feature_value(features)
evaluate(batch: reagent.core.types.MemoryNetworkInput)

Calculate feature importance by setting each state/action feature to its mean value and observing the resulting increase in loss.
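
A hedged sketch; mdnrnn_trainer and batch are placeholders, and the feature counts and start-index lists are illustrative values only:

  from reagent.evaluation.world_model_evaluator import FeatureImportanceEvaluator

  fi_evaluator = FeatureImportanceEvaluator(
      trainer=mdnrnn_trainer,                     # an MDNRNNTrainer (placeholder)
      discrete_action=True,
      state_feature_num=4,
      action_feature_num=2,
      sorted_action_feature_start_indices=[0, 1],
      sorted_state_feature_start_indices=[0, 1, 2, 3],
  )
  importance = fi_evaluator.evaluate(batch)       # batch: reagent.core.types.MemoryNetworkInput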

class reagent.evaluation.world_model_evaluator.FeatureSensitivityEvaluator(trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, state_feature_num: int, sorted_state_feature_start_indices: List[int])

Bases: object

Evaluate state feature sensitivity caused by varying actions

evaluate(batch: reagent.core.types.MemoryNetworkInput)

Calculate state feature sensitivity due to actions by randomly permuting actions and observing how much the prediction of each next-state feature deviates.

class reagent.evaluation.world_model_evaluator.LossEvaluator(trainer: reagent.training.world_model.mdnrnn_trainer.MDNRNNTrainer, state_dim: int)

Bases: object

Evaluate losses on data pages

evaluate(tdp: reagent.core.types.MemoryNetworkInput) → Dict[str, float]
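
A hedged sketch; mdnrnn_trainer, the state dimension, and batch are placeholders:

  from reagent.evaluation.world_model_evaluator import LossEvaluator

  loss_evaluator = LossEvaluator(trainer=mdnrnn_trainer, state_dim=4)
  losses = loss_evaluator.evaluate(batch)  # Dict[str, float] of world-model loss components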

Module contents