ml.rl.evaluation package

Submodules

ml.rl.evaluation.cpe module

class ml.rl.evaluation.cpe.CpeDetails

Bases: object

log()
log_to_tensorboard() → None
class ml.rl.evaluation.cpe.CpeEstimate(raw, normalized, raw_std_error, normalized_std_error)

Bases: tuple

property normalized

Alias for field number 1

property normalized_std_error

Alias for field number 3

property raw

Alias for field number 0

property raw_std_error

Alias for field number 2
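
CpeEstimate is a namedtuple, so instances can be read by field name or by position. A minimal usage sketch; the numeric values below are illustrative only, not the output of any estimator:

    from ml.rl.evaluation.cpe import CpeEstimate

    # Illustrative values only.
    est = CpeEstimate(raw=1.23, normalized=1.10,
                      raw_std_error=0.05, normalized_std_error=0.04)
    print(est.raw, est.normalized)   # access by field name
    print(est[0], est[1])            # or by position, since it is a tuple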

class ml.rl.evaluation.cpe.CpeEstimateSet(direct_method, inverse_propensity, doubly_robust, sequential_doubly_robust, weighted_doubly_robust, magic)

Bases: tuple

check_estimates_exist()
property direct_method

Alias for field number 0

property doubly_robust

Alias for field number 2

fill_empty_with_zero()
property inverse_propensity

Alias for field number 1

log()
log_to_tensorboard(metric_name: str) → None
property magic

Alias for field number 5

property sequential_doubly_robust

Alias for field number 3

property weighted_doubly_robust

Alias for field number 4

ml.rl.evaluation.cpe.bootstrapped_std_error_of_mean(data, sample_percent=0.25, num_samples=1000)

Compute bootstrapped standard error of mean of input data.

Parameters
  • data – Input data (1D torch tensor or numpy array).

  • sample_percent – Fraction of the data to sample when computing each bootstrap statistic.

  • num_samples – Number of times to sample.
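
A minimal usage sketch; the data here is random and purely illustrative:

    import torch
    from ml.rl.evaluation.cpe import bootstrapped_std_error_of_mean

    rewards = torch.rand(10000)  # illustrative 1-D data
    # Resample 25% of the data 1000 times (the documented defaults) and take
    # the spread of the resampled means as the standard error estimate.
    std_err = bootstrapped_std_error_of_mean(rewards, sample_percent=0.25, num_samples=1000)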

ml.rl.evaluation.doubly_robust_estimator module

class ml.rl.evaluation.doubly_robust_estimator.DoublyRobustEstimator

Bases: object

For details, visit https://arxiv.org/pdf/1612.01205.pdf

estimate(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage, hp: Optional[dict] = None) → Tuple[ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate]
class ml.rl.evaluation.doubly_robust_estimator.DoublyRobustEstimatorBOPE

Bases: ml.rl.evaluation.doubly_robust_estimator.DoublyRobustEstimator

This class implements a doubly-robust Balanced Off-Policy Evaluation (BOP-E) method. For details about BOP-E, see https://arxiv.org/abs/1906.03694. For an analysis of BOP-E performance, see https://fburl.com/bope_eval_nb.

Note that when using BOP-E, the data is split into training, validation, and evaluation parts; only the evaluation part is used directly for policy evaluation, while the training and validation parts are used for model training.

Supported modes (all doubly robust):

  1. bope_weights. Use BOP-E (ignoring logged propensities) to estimate the importance weights. Propensities of the target policy are used as observation weights when training the BOP-E classifier.

  2. bope_weighted_targets. Use BOP-E (ignoring logged propensities) to estimate the importance weights. Propensities of the target policy are used as soft targets to train a BOP-E regressor. With this method BOP-E trains a regressor instead of a classifier.

  3. bope_sampling. Use BOP-E (ignoring logged propensities) to estimate the importance weights. Propensities of the target policy are used to sample the actions for the classifier training data.
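
A minimal usage sketch. The "mode" key of hp and the unpacking order of the three returned CpeEstimate values are assumptions made for illustration; check the implementation for the exact hyperparameter names and return order:

    from ml.rl.evaluation.doubly_robust_estimator import DoublyRobustEstimatorBOPE

    estimator = DoublyRobustEstimatorBOPE()
    # hp={"mode": ...} is a hypothetical hyperparameter key; edp is an
    # EvaluationDataPage built elsewhere. The unpacking order below assumes the
    # CpeEstimateSet field order (direct method, inverse propensity, doubly robust).
    direct_method, inverse_propensity, doubly_robust = estimator.estimate(
        edp, hp={"mode": "bope_weights"}
    )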

estimate(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage, hp: Optional[dict] = None) → Tuple[ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate]
class ml.rl.evaluation.doubly_robust_estimator.DoublyRobustEstimatorEstProp

Bases: ml.rl.evaluation.doubly_robust_estimator.DoublyRobustEstimator

estimate(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage, hp: Optional[dict] = None) → Tuple[ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate, ml.rl.evaluation.cpe.CpeEstimate]
class ml.rl.evaluation.doubly_robust_estimator.EstimationData(contexts_actions_train, policy_indicators_train, weights_train, contexts_actions_valid, policy_indicators_valid, weights_valid, contexts_actions_eval, contexts_train, actions_logged_train, contexts_valid, actions_logged_valid, contexts_eval, actions_logged_eval, model_propensities_eval, model_rewards_eval, action_mask_eval, logged_rewards_eval, model_rewards_for_logged_action_eval, logged_propensities_eval)

Bases: tuple

property action_mask_eval

Alias for field number 15

property actions_logged_eval

Alias for field number 12

property actions_logged_train

Alias for field number 8

property actions_logged_valid

Alias for field number 10

property contexts_actions_eval

Alias for field number 6

property contexts_actions_train

Alias for field number 0

property contexts_actions_valid

Alias for field number 3

property contexts_eval

Alias for field number 11

property contexts_train

Alias for field number 7

property contexts_valid

Alias for field number 9

property logged_propensities_eval

Alias for field number 18

property logged_rewards_eval

Alias for field number 16

property model_propensities_eval

Alias for field number 13

property model_rewards_eval

Alias for field number 14

property model_rewards_for_logged_action_eval

Alias for field number 17

property policy_indicators_train

Alias for field number 1

property policy_indicators_valid

Alias for field number 4

property weights_train

Alias for field number 2

property weights_valid

Alias for field number 5

class ml.rl.evaluation.doubly_robust_estimator.ImportanceSamplingData(importance_weight, logged_rewards, model_rewards, model_rewards_for_logged_action, model_propensities)

Bases: tuple

property importance_weight

Alias for field number 0

property logged_rewards

Alias for field number 1

property model_propensities

Alias for field number 4

property model_rewards

Alias for field number 2

property model_rewards_for_logged_action

Alias for field number 3

class ml.rl.evaluation.doubly_robust_estimator.TrainValidEvalData(contexts_dict, model_propensities_dict, actions_logged_dict, action_mask_dict, logged_rewards_dict, model_rewards_dict, model_rewards_for_logged_action_dict, logged_propensities_dict, num_examples_dict)

Bases: tuple

property action_mask_dict

Alias for field number 3

property actions_logged_dict

Alias for field number 2

property contexts_dict

Alias for field number 0

property logged_propensities_dict

Alias for field number 7

property logged_rewards_dict

Alias for field number 4

property model_propensities_dict

Alias for field number 1

property model_rewards_dict

Alias for field number 5

property model_rewards_for_logged_action_dict

Alias for field number 6

property num_examples_dict

Alias for field number 8

ml.rl.evaluation.evaluation_data_page module

class ml.rl.evaluation.evaluation_data_page.EvaluationDataPage(mdp_id, sequence_number, logged_propensities, logged_rewards, action_mask, model_propensities, model_rewards, model_rewards_for_logged_action, model_values, model_values_for_logged_action, possible_actions_mask, optimal_q_values, eval_action_idxs, logged_values, logged_metrics, logged_metrics_values, model_metrics, model_metrics_for_logged_action, model_metrics_values, model_metrics_values_for_logged_action, possible_actions_state_concat, contexts)

Bases: tuple

property action_mask

Alias for field number 4

append(edp)
compute_values(gamma: float)
static compute_values_for_mdps(rewards: torch.Tensor, mdp_ids: numpy.ndarray, sequence_numbers: torch.Tensor, gamma: float) → torch.Tensor
property contexts

Alias for field number 21

classmethod create_from_tensors_dqn(trainer: ml.rl.training.dqn_trainer.DQNTrainer, mdp_ids: numpy.ndarray, sequence_numbers: torch.Tensor, states: ml.rl.types.PreprocessedFeatureVector, actions: ml.rl.types.PreprocessedFeatureVector, propensities: torch.Tensor, rewards: torch.Tensor, possible_actions_mask: torch.Tensor, metrics: Optional[torch.Tensor] = None)
classmethod create_from_tensors_parametric_dqn(trainer: ml.rl.training.parametric_dqn_trainer.ParametricDQNTrainer, mdp_ids: numpy.ndarray, sequence_numbers: torch.Tensor, states: ml.rl.types.PreprocessedFeatureVector, actions: ml.rl.types.PreprocessedFeatureVector, propensities: torch.Tensor, rewards: torch.Tensor, possible_actions_mask: torch.Tensor, possible_actions: ml.rl.types.PreprocessedFeatureVector, max_num_actions: int, metrics: Optional[torch.Tensor] = None)
classmethod create_from_training_batch(tdb: ml.rl.types.PreprocessedTrainingBatch, trainer: ml.rl.training.dqn_trainer.DQNTrainer)
property eval_action_idxs

Alias for field number 12

property logged_metrics

Alias for field number 14

property logged_metrics_values

Alias for field number 15

property logged_propensities

Alias for field number 2

property logged_rewards

Alias for field number 3

property logged_values

Alias for field number 13

property mdp_id

Alias for field number 0

property model_metrics

Alias for field number 16

property model_metrics_for_logged_action

Alias for field number 17

property model_metrics_values

Alias for field number 18

property model_metrics_values_for_logged_action

Alias for field number 19

property model_propensities

Alias for field number 5

property model_rewards

Alias for field number 6

property model_rewards_for_logged_action

Alias for field number 7

property model_values

Alias for field number 8

property model_values_for_logged_action

Alias for field number 9

property optimal_q_values

Alias for field number 11

property possible_actions_mask

Alias for field number 10

property possible_actions_state_concat

Alias for field number 20

property sequence_number

Alias for field number 1

set_metric_as_reward(i: int, num_actions: int)
sort()
validate()

ml.rl.evaluation.evaluator module

class ml.rl.evaluation.evaluator.Evaluator(action_names, gamma, model, metrics_to_score=None)

Bases: object

NUM_J_STEPS_FOR_MAGIC_ESTIMATOR = 25
evaluate_post_training(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage) → ml.rl.evaluation.cpe.CpeDetails
get_target_distribution_error(actions, target_distribution, actual_distribution)

Calculate the MSE between the actual and target action distributions.

static huberLoss(label, output)
score_cpe(metric_name, edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage)
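
A minimal sketch of the post-training flow, assuming an EvaluationDataPage (edp below) has already been built, e.g. via EvaluationDataPage.create_from_training_batch; passing None for the model argument is an assumption made purely for illustration:

    from ml.rl.evaluation.evaluator import Evaluator

    # action_names and gamma follow the constructor signature above; model=None
    # is an assumption for illustration and may not be accepted in practice.
    evaluator = Evaluator(action_names=["action_a", "action_b"], gamma=0.99, model=None)
    cpe_details = evaluator.evaluate_post_training(edp)  # edp built elsewhere
    cpe_details.log()
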
ml.rl.evaluation.evaluator.get_metrics_to_score(metric_reward_values: Optional[Dict[str, float]]) → List[str]
ml.rl.evaluation.evaluator.get_tensor(x, dtype=None)
Input:
  • x: a list or sequence

  • dtype: target data type of the elements in the tensor [optional]

    It is inferred automatically if not provided.

Output:

A tensor built from the given list or sequence. If the input is None, it returns None. If the input is already a tensor, it returns that tensor unchanged. If dtype is provided, the output tensor will have that type.
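
A brief usage sketch of the behavior described above; passing a torch dtype object for dtype is assumed to be an accepted form:

    import torch
    from ml.rl.evaluation.evaluator import get_tensor

    t1 = get_tensor([1.0, 2.0, 3.0])                 # list -> tensor, dtype inferred
    t2 = get_tensor([1, 2, 3], dtype=torch.float32)  # explicit target dtype (assumption)
    t3 = get_tensor(None)                            # None passes through as None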

ml.rl.evaluation.ranking_evaluator module

class ml.rl.evaluation.ranking_evaluator.RankingEvaluator(trainer: ml.rl.training.ranking.seq2slate_trainer.Seq2SlateTrainer)

Bases: object

Evaluate ranking models

evaluate(eval_tdp: ml.rl.types.PreprocessedTrainingBatch)
evaluate_post_training()

ml.rl.evaluation.sequential_doubly_robust_estimator module

class ml.rl.evaluation.sequential_doubly_robust_estimator.SequentialDoublyRobustEstimator(gamma)

Bases: object

estimate(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage) → ml.rl.evaluation.cpe.CpeEstimate
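
A minimal usage sketch, assuming an EvaluationDataPage (edp below) has already been constructed elsewhere:

    from ml.rl.evaluation.sequential_doubly_robust_estimator import (
        SequentialDoublyRobustEstimator,
    )

    estimator = SequentialDoublyRobustEstimator(gamma=0.99)  # discount factor
    cpe = estimator.estimate(edp)  # returns a CpeEstimate
    print(cpe.raw, cpe.raw_std_error)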

ml.rl.evaluation.weighted_sequential_doubly_robust_estimator module

class ml.rl.evaluation.weighted_sequential_doubly_robust_estimator.WeightedSequentialDoublyRobustEstimator(gamma)

Bases: object

BOOTSTRAP_SAMPLE_PCT = 0.5
CONFIDENCE_INTERVAL = 0.9
NUM_BOOTSTRAP_SAMPLES = 50
NUM_SUBSETS_FOR_CB_ESTIMATES = 25
static calculate_step_return(rewards, discounts, importance_weights, importance_weights_one_earlier, estimated_state_values, estimated_q_values, j_step)
compute_weighted_doubly_robust_point_estimate(j_steps, num_j_steps, j_step_returns, infinite_step_returns, j_step_return_trajectories)
static confidence_bounds(x, confidence)
estimate(edp: ml.rl.evaluation.evaluation_data_page.EvaluationDataPage, num_j_steps, whether_self_normalize_importance_weights) → ml.rl.evaluation.cpe.CpeEstimate
static normalize_importance_weights(importance_weights, whether_self_normalize_importance_weights)
static transform_to_equal_length_trajectories(mdp_ids, actions, rewards, logged_propensities, target_propensities, estimated_q_values)

Take in samples (actions, rewards, propensities, etc.) and output lists of equal-length trajectories (episodes), split according to terminals. As the raw trajectories have varying lengths, the shorter ones are padded at the end with zeros (or ones, depending on the quantity).
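
The sketch below only illustrates the zero-padding idea described above on a list of variable-length reward tensors; it is not the library's implementation, and the helper name pad_trajectories is hypothetical:

    import torch

    def pad_trajectories(trajectories, pad_value=0.0):
        """Pad a list of 1-D reward tensors to a common length with pad_value."""
        max_len = max(len(t) for t in trajectories)
        padded = torch.full((len(trajectories), max_len), pad_value)
        for i, t in enumerate(trajectories):
            padded[i, : len(t)] = t
        return padded

    # Three episodes of different lengths, zero-padded at the end.
    equal_length = pad_trajectories(
        [torch.tensor([1.0, 0.5]), torch.tensor([2.0]), torch.tensor([0.0, 1.0, 3.0])]
    )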

ml.rl.evaluation.weighted_sequential_doubly_robust_estimator.mse_loss(x, error)

ml.rl.evaluation.world_model_evaluator module

class ml.rl.evaluation.world_model_evaluator.FeatureImportanceEvaluator(trainer: ml.rl.training.world_model.mdnrnn_trainer.MDNRNNTrainer, discrete_action: bool, state_feature_num: int, action_feature_num: int, sorted_action_feature_start_indices: List[int], sorted_state_feature_start_indices: List[int])

Bases: object

Evaluate feature importance weights on data pages

compute_median_feature_value(features)
evaluate(tdp: ml.rl.types.PreprocessedTrainingBatch)

Calculate feature importance: set each state/action feature to its mean value and observe the resulting increase in loss.

class ml.rl.evaluation.world_model_evaluator.FeatureSensitivityEvaluator(trainer: ml.rl.training.world_model.mdnrnn_trainer.MDNRNNTrainer, state_feature_num: int, sorted_state_feature_start_indices: List[int])

Bases: object

Evaluate state feature sensitivity caused by varying actions

evaluate(tdp: ml.rl.types.PreprocessedTrainingBatch)

Calculate state feature sensitivity due to actions: randomly permute the actions and observe how much the prediction of each next-state feature deviates.

class ml.rl.evaluation.world_model_evaluator.LossEvaluator(trainer: ml.rl.training.world_model.mdnrnn_trainer.MDNRNNTrainer, state_dim: int)

Bases: object

Evaluate losses on data pages

evaluate(tdp: ml.rl.types.PreprocessedTrainingBatch) → Dict

Module contents