autoqild.detectors.autogluon_leakage_detector

A leakage detection class leveraging AutoGluon for hyperparameter optimization and model evaluation.

Classes

AutoGluonLeakageDetector(padding_name, ...)

AutoGluonLeakageDetector leverages the AutoGluon framework for detecting information leakage in machine learning models.

class autoqild.detectors.autogluon_leakage_detector.AutoGluonLeakageDetector(padding_name, learner_params, fit_params, hash_value, cv_iterations, n_hypothesis, base_directory, validation_loss, random_state=None, **kwargs)

Bases: InformationLeakageDetector

AutoGluonLeakageDetector leverages the AutoGluon framework for detecting information leakage in machine learning models. This class extends the InformationLeakageDetector base class and uses AutoGluon for hyperparameter optimization and model training. It evaluates potential information leakage using various metrics across different cross-validation splits.

Parameters:

padding_name (str) – The name of the padding method used in experiments to potentially obscure or prevent leakage.
learner_params (dict) – Parameters related to the AutoGluon classifier used in the leakage detection process.
fit_params (dict) – Parameters passed to the fit method of the AutoGluon models during training.
hash_value (str) – A unique hash value used to identify and manage result files for a specific experiment.
cv_iterations (int) – The number of cross-validation iterations to perform during model evaluation.
n_hypothesis (int) – The number of hypotheses or models to be tested for leakage.
base_directory (str) – The base directory where result files, logs, and backups are stored.
validation_loss (str) – The evaluation metric used to assess model performance during hyperparameter optimization.
random_state (int or None, optional) – Controls the randomness for reproducibility, ensuring consistent results across different runs.
**kwargs (dict, optional) – Additional keyword arguments passed to the InformationLeakageDetector base class.

base_detector

The base AutoGluon classifier used for model training.

Type: AutoGluonClassifier

learner

The AutoGluon classifier instance used for the current experiment.

Type: AutoGluonClassifier instance

logger

Logger instance used for recording the steps and processes of the leakage detection.

Type: logging.Logger

detect(detection_method='LogLossMI')

Executes the detection process to identify potential information leakage using the specified method.

Parameters:

detection_method (str) – The method to use for detecting information leakage. Options include:
- paired-t-test
- paired-t-test-random
- fishers-exact-mean
- fishers-exact-median
- mid_point_mi
- log_loss_mi
- log_loss_mi_isotonic_regression
- log_loss_mi_platt_scaling
- log_loss_mi_beta_calibration
- log_loss_mi_temperature_scaling
- log_loss_mi_histogram_binning
- p_c_softmax_mi
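To make the log_loss_mi option more concrete, here is a minimal sketch that estimates mutual information as the label entropy H(Y) minus the classifier's cross-entropy (log-loss), both in bits, clipped at zero. It assumes this is the form of the estimator behind log_loss_mi; the helper names entropy_bits, log_loss_bits, and log_loss_mi are hypothetical, and the calibrated variants (isotonic regression, Platt scaling, and so on) would presumably recalibrate the predicted probabilities before this step.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Empirical Shannon entropy H(Y) of the label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def log_loss_bits(y_true, p_pred, eps=1e-15):
    """Mean cross-entropy of the predicted probabilities, in bits.

    p_pred[i][k] is the predicted probability of class k for sample i,
    so labels are assumed to be encoded as 0..n_classes-1.
    """
    total = 0.0
    for y, probs in zip(y_true, p_pred):
        # Clip to avoid log2(0) for overconfident wrong predictions
        total -= math.log2(min(max(probs[y], eps), 1.0))
    return total / len(y_true)


def log_loss_mi(y_true, p_pred):
    """Hypothetical helper (assumed form of the estimator): MI ~ H(Y) - log-loss.

    The result is clipped at zero because mutual information is non-negative.
    """
    return max(entropy_bits(y_true) - log_loss_bits(y_true, p_pred), 0.0)


y = [0, 1, 0, 1]
print(log_loss_mi(y, [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]))  # about 0.848 bits
print(log_loss_mi(y, [[0.5, 0.5]] * 4))  # 0.0: an uninformative classifier shows no leakage
```

Because any classifier's expected cross-entropy is at least H(Y|X), this difference behaves as a lower bound on I(X;Y): better-calibrated and more accurate models push the estimate upward.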

Returns:

detection_decision (bool) – Indicates whether any models showed significant leakage.
hypothesis_rejected (int) – The number of models flagged for leakage.

Notes

The method applies a Holm-Bonferroni correction to control the family-wise error rate when testing multiple models.
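The Holm-Bonferroni step can be illustrated with a minimal sketch. It assumes each of the n_hypothesis models contributes one p-value (for example, from a paired t-test or Fisher's exact test) and that a significance level alpha is used. The function name holm_bonferroni and the default alpha of 0.05 are illustrative and not part of autoqild. The returned pair mirrors the documented (detection_decision, hypothesis_rejected) outputs.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure over one p-value per model.

    Returns (detection_decision, hypothesis_rejected), mirroring detect().
    """
    m = len(p_values)
    hypothesis_rejected = 0
    for rank, p in enumerate(sorted(p_values)):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1)
        if p <= alpha / (m - rank):
            hypothesis_rejected += 1
        else:
            # Stop at the first non-rejection; all larger p-values are retained
            break
    return hypothesis_rejected > 0, hypothesis_rejected


# Hypothetical p-values for four models
print(holm_bonferroni([0.01, 0.04, 0.03, 0.5]))  # (True, 1)
```

The smallest p-value faces the strictest threshold, alpha / m, and the thresholds relax step by step. This makes the procedure uniformly more powerful than a plain Bonferroni correction while still controlling the family-wise error rate.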

evaluate_scores(X_test, X_train, y_test, y_train, y_pred, p_pred, model, n_model)

Evaluates and stores model performance metrics for the detection process.

This method computes evaluation metrics such as log-loss, accuracy, and the confusion matrix for the model's predictions. The results are stored and logged for further analysis.

Parameters:

X_test (array-like of shape (n_samples, n_features)) – The input feature matrix for the test set.
X_train (array-like of shape (n_samples, n_features)) – The input feature matrix for the training set.
y_test (array-like of shape (n_samples,)) – The true target labels for the test set.
y_train (array-like of shape (n_samples,)) – The true target labels for the training set.
y_pred (array-like of shape (n_samples,)) – The predicted labels for the test set.
p_pred (array-like of shape (n_samples, n_classes)) – The predicted class probabilities for the test set.
model (object) – The trained model that is being evaluated.
n_model (int) – The index of the model within the list of models being evaluated.
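The metric computation described above can be sketched in plain Python. It assumes integer labels 0..n_classes-1 and row-wise class probabilities in p_pred. The function evaluate_scores_sketch is a hypothetical stand-in that covers only the metrics. It leaves out the storage and logging that also use X_test, X_train, model, and n_model.

```python
import math


def evaluate_scores_sketch(y_test, y_pred, p_pred, n_classes, eps=1e-15):
    """Hypothetical stand-in for the metric part of evaluate_scores.

    Labels are assumed to be integers 0..n_classes-1, and p_pred[i][k] is the
    predicted probability of class k for test sample i.
    """
    n = len(y_test)
    accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / n
    # Mean negative log-likelihood of the true class (natural log), clipped away from 0
    log_loss = -sum(
        math.log(min(max(probs[t], eps), 1.0)) for t, probs in zip(y_test, p_pred)
    ) / n
    # confusion[i][j] counts test samples with true label i predicted as j
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_test, y_pred):
        confusion[t][p] += 1
    return {"accuracy": accuracy, "log_loss": log_loss, "confusion_matrix": confusion}


scores = evaluate_scores_sketch(
    y_test=[0, 1, 1, 0],
    y_pred=[0, 1, 0, 0],
    p_pred=[[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]],
    n_classes=2,
)
print(scores["accuracy"], scores["confusion_matrix"])  # 0.75 [[2, 0], [1, 1]]
```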

fit(X, y, **kwargs)

Fits the models using cross-validation and evaluates them for information leakage.

This method performs cross-validation, training the AutoGluon models across different data splits. The models are then evaluated for potential leakage using metrics such as accuracy and log-loss.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input feature matrix used for model training.
y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to each row in X.
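The cross-validation loop in fit can be pictured with a minimal sketch. It assumes a shuffled K-fold scheme with K = cv_iterations and random_state seeding the shuffle. The class itself may use a stratified or different splitter, and kfold_indices is a hypothetical helper. In fit, each (train_idx, test_idx) pair would presumably index X and y to train an AutoGluon model and pass the held-out part to evaluate_scores.

```python
import random


def kfold_indices(n_samples, cv_iterations, random_state=None):
    """Yield (train_idx, test_idx) for a shuffled K-fold split, K = cv_iterations.

    Hypothetical helper; the real class may use a stratified splitter instead.
    """
    rng = random.Random(random_state)  # seeded for reproducible splits
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [
        n_samples // cv_iterations + (1 if i < n_samples % cv_iterations else 0)
        for i in range(cv_iterations)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = sorted(indices[start:start + size])
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx
        start += size


# Test folds have 4, 3 and 3 samples; together they cover all 10 indices
for k, (train_idx, test_idx) in enumerate(kfold_indices(10, 3, random_state=42)):
    print(k, len(train_idx), len(test_idx))
```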

hyperparameter_optimization(X, y)

Performs hyperparameter optimization using AutoGluon to find the best models for leakage detection.

This method runs a Bayesian optimization process to identify the best models according to the specified evaluation metric. The optimized models are then stored for subsequent evaluation.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input feature matrix used for training during hyperparameter optimization.
y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to each row in X.
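One plausible way the n_hypothesis best models could be picked after optimization is sketched below. It assumes the optimizer yields a leaderboard of (model_name, validation_score) pairs. AutoGluon's TabularPredictor does offer a leaderboard() method. The plain-list input, the helper name select_top_models, and the top-n selection rule are assumptions made for illustration.

```python
def select_top_models(leaderboard, n_hypothesis, higher_is_better=True):
    """Return the names of the n_hypothesis best models from (name, score) pairs.

    Hypothetical helper; set higher_is_better=False when the score is a loss.
    """
    ranked = sorted(leaderboard, key=lambda entry: entry[1], reverse=higher_is_better)
    return [name for name, _ in ranked[:n_hypothesis]]


# Hypothetical validation scores (higher is better)
leaderboard = [("LightGBM", 0.81), ("CatBoost", 0.84), ("RandomForest", 0.78)]
print(select_top_models(leaderboard, n_hypothesis=2))  # ['CatBoost', 'LightGBM']
```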

Returns:

The size of the training dataset after the reduction (if applicable).

Return type:

int