autoqild.detectors.mlp_leakage_detector

Uses a multi-layer perceptron (MLP) to detect information leakage.

Classes

MLPLeakageDetector(padding_name, ...[, ...])

MLPLeakageDetector class for detecting information leakage using a multi-layer perceptron (MLP) model.

class autoqild.detectors.mlp_leakage_detector.MLPLeakageDetector(padding_name, learner_params, fit_params, hash_value, cv_iterations, n_hypothesis, base_directory, search_space, hp_iters, n_inner_folds, validation_loss, random_state=None, **kwargs)

Bases: SklearnLeakageDetector

MLPLeakageDetector class for detecting information leakage using a multi-layer perceptron (MLP) model.

This class extends SklearnLeakageDetector to analyze information leakage with a multi-layer perceptron (MLP) as the base model, for experiments in which deep learning models are used for leakage detection. It combines hyperparameter optimization with cross-validation to improve detection accuracy.

Parameters:
  • padding_name (str) – The name of the padding method used in the experiments to obscure or detect leakage.

  • learner_params (dict) – Parameters related to the MLP model used in the detection process.

  • fit_params (dict) – Parameters passed to the fit method during model training.

  • hash_value (str) – A unique hash value used to identify and manage result files for a specific experiment.

  • cv_iterations (int) – The number of cross-validation iterations to perform during model evaluation.

  • n_hypothesis (int) – The number of hypotheses or models to be tested for leakage.

  • base_directory (str) – The base directory where result files, logs, and backups are stored.

  • search_space (dict) – The hyperparameter search space for Bayesian optimization.

  • hp_iters (int) – The number of iterations for hyperparameter optimization.

  • n_inner_folds (int) – The number of folds for inner cross-validation during hyperparameter optimization.

  • validation_loss (str) – The loss function used to evaluate the performance of models during cross-validation.

  • random_state (int or RandomState instance, optional) – Controls the randomness for reproducibility, ensuring consistent results across different runs.

  • **kwargs (dict, optional) – Additional keyword arguments passed to the parent class.

detect(detection_method='log_loss_mi')

Executes the detection process to identify potential information leakage using the specified method.

Parameters:
  • detection_method (str) – The method to use for detecting information leakage. Defaults to 'log_loss_mi'. Options include:

      - paired-t-test
      - paired-t-test-random
      - fishers-exact-mean
      - fishers-exact-median
      - mid_point_mi
      - log_loss_mi
      - log_loss_mi_isotonic_regression
      - log_loss_mi_platt_scaling
      - log_loss_mi_beta_calibration
      - log_loss_mi_temperature_scaling
      - log_loss_mi_histogram_binning
      - p_c_softmax_mi
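The `log_loss_mi` option estimates mutual information from a classifier's log-loss. A minimal sketch of that idea follows, assuming the estimate is the empirical label entropy H(Y) minus the mean cross-entropy on test data, both in bits, clipped at zero. The helper name `log_loss_mi_estimate` is hypothetical and not part of autoqild, and the library's exact formula may differ.

```python
import math
from collections import Counter


def log_loss_mi_estimate(y_true, p_pred, eps=1e-15):
    """Hypothetical helper: estimate I(X;Y) in bits as H(Y) minus the log-loss."""
    n = len(y_true)
    # Empirical entropy of the labels, H(Y), in bits.
    counts = Counter(y_true)
    h_y = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Mean cross-entropy of the true-class probabilities, in bits;
    # it serves as an approximation of the conditional entropy H(Y|X).
    log_loss = -sum(math.log2(max(p[y], eps)) for y, p in zip(y_true, p_pred)) / n
    # A model that is worse than the label prior would give a negative value; clip at zero.
    return max(h_y - log_loss, 0.0)
```

For balanced binary labels, perfect predictions yield 1 bit, while constant probabilities of 0.5 yield 0 bits.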

Returns:

  • detection_decision (bool) – Indicates whether any models showed significant leakage.

  • hypothesis_rejected (int) – The number of models flagged for leakage.

Notes

The method implements a Holm-Bonferroni correction to control the family-wise error rate for multiple models.
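The Holm-Bonferroni step-down procedure can be sketched as follows. The helper `holm_bonferroni` and the default `alpha=0.05` are assumptions for illustration. Its return pair mirrors the documented `detection_decision` and `hypothesis_rejected` values, and the p-values would come from whichever `detection_method` is chosen.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Hypothetical helper: return (any_rejected, n_rejected) under Holm's step-down test."""
    m = len(p_values)
    n_rejected = 0
    # Compare the k-th smallest p-value (0-based) against alpha / (m - k).
    for k, p in enumerate(sorted(p_values)):
        if p <= alpha / (m - k):
            n_rejected += 1
        else:
            # Step-down: stop at the first non-rejection; all larger p-values are retained.
            break
    return n_rejected > 0, n_rejected
```

With p-values [0.01, 0.04, 0.03, 0.20], only 0.01 clears its threshold of 0.05/4, so the result is (True, 1).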

evaluate_scores(X_test, X_train, y_test, y_train, y_pred, p_pred, model, n_model)

Evaluate and store model performance metrics for the detection process.

This method computes evaluation metrics such as log-loss, accuracy, and the confusion matrix for the model's predictions. It also supports probability calibration using techniques such as isotonic regression and Platt scaling. The results are stored and logged for further analysis.
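The `log_loss_mi_histogram_binning` option suggests that histogram binning is among the supported calibration techniques. Below is a minimal sketch for the binary case, assuming equal-width bins over the positive-class probability and a bin-midpoint fallback for empty bins. Both helper names are hypothetical, and this page does not say which data split autoqild fits its calibrators on.

```python
def fit_histogram_binning(p_pos, y, n_bins=10):
    """Hypothetical helper: map each equal-width probability bin to its positive rate."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for p, label in zip(p_pos, y):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        counts[b] += 1
        positives[b] += label
    # Empty bins fall back to the bin midpoint (an assumption of this sketch).
    return [positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def apply_histogram_binning(bin_values, p_pos):
    """Hypothetical helper: replace each probability with its bin's calibrated value."""
    n_bins = len(bin_values)
    return [bin_values[min(int(p * n_bins), n_bins - 1)] for p in p_pos]
```

Unlike isotonic regression, histogram binning fixes the bin edges in advance, so it is simple but coarse when few samples fall into each bin.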

Parameters:
  • X_test (array-like of shape (n_samples, n_features)) – The feature matrix for the test set.

  • X_train (array-like of shape (n_samples, n_features)) – The feature matrix for the training set.

  • y_test (array-like of shape (n_samples,)) – The true target labels for the test data.

  • y_train (array-like of shape (n_samples,)) – The true target labels for the training data.

  • y_pred (array-like of shape (n_samples,)) – The predicted target labels for the test set.

  • p_pred (array-like of shape (n_samples, n_classes)) – The predicted class probabilities for the test data.

  • model (object) – The trained model being evaluated.

  • n_model (int) – The index of the model in the list of evaluated models.
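The core metrics named above can be sketched in plain Python. This is an illustration under stated assumptions, not autoqild's implementation: the helper `basic_scores` is hypothetical, labels are assumed to be integers 0..n_classes-1, and log-loss uses the natural logarithm, as scikit-learn's `log_loss` does.

```python
import math


def basic_scores(y_true, y_pred, p_pred, n_classes, eps=1e-15):
    """Hypothetical helper: accuracy, log-loss (nats) and confusion matrix."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Mean negative log-likelihood of the true class; eps guards against log(0).
    log_loss = -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, p_pred)) / n
    # confusion[i][j] counts samples with true label i predicted as label j.
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    return {"accuracy": accuracy, "log_loss": log_loss, "confusion_matrix": confusion}
```

For y_true = [0, 1, 1, 0] and y_pred = [0, 1, 0, 0], accuracy is 0.75 and the confusion matrix is [[2, 0], [1, 1]].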

fit(X, y)

Fits the model using cross-validation and performs hyperparameter optimization.

This method first checks whether the model has already been fitted. If it has not, it runs hyperparameter optimization and then cross-validation for the specified number of hypotheses (n_hypothesis). Models are trained on a stratified split of the dataset, and the results are evaluated with predefined metrics.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data used for training the models.

  • y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to X.
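A stratified split keeps each class's share of samples roughly equal in the training and test sets. The sketch below shows the idea with a hypothetical helper `stratified_split`; scikit-learn's `StratifiedShuffleSplit` provides the same behaviour, though this page does not state which splitter autoqild uses.

```python
import random
from collections import defaultdict


def stratified_split(y, test_size=0.3, seed=None):
    """Hypothetical helper: return (train_idx, test_idx) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then take the same fraction from every class.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With ten samples of each of two classes and test_size=0.3, the test set receives exactly three samples from each class.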

Notes

During fitting, the performance of a random classifier and a majority voting classifier is also computed as a baseline for comparison.
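The two baselines can be sketched as below. This assumes the "majority voting classifier" always predicts the most frequent training label and the "random classifier" guesses uniformly among the training classes. Both interpretations and the helper names are assumptions, since this page does not define them further.

```python
import random
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Hypothetical helper: accuracy of always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(label == majority for label in y_test) / len(y_test)


def random_baseline_accuracy(y_train, y_test, seed=None):
    """Hypothetical helper: accuracy of uniform random guesses over the training classes."""
    rng = random.Random(seed)
    classes = sorted(set(y_train))
    guesses = [rng.choice(classes) for _ in y_test]
    return sum(g == t for g, t in zip(guesses, y_test)) / len(y_test)
```

A model whose accuracy does not clearly exceed both baselines gives little evidence that the features carry information about the labels.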

hyperparameter_optimization(X, y)

Performs Bayesian hyperparameter optimization to identify the best model parameters.

This method uses a Bayesian search strategy to explore a predefined hyperparameter search space and selects the optimal configuration based on the specified validation loss. Within the search, each candidate is scored by inner cross-validation with n_inner_folds folds, so the selected hyperparameters are judged on held-out data rather than on the data they were fitted to.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data to be used for training during hyperparameter optimization.

  • y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to X.

Returns:

The size of the training dataset after reduction (if applicable).

Return type:

int

Raises:

Exception – If an error occurs during the Bayesian search fitting process.
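The selection loop can be sketched without dependencies. Note the swapped-in technique: random search stands in for Bayesian optimization, which would instead propose each new candidate from the losses observed so far. The names `k_fold_indices`, `select_config`, `random_search` and the toy `threshold_loss` are hypothetical; `hp_iters` and `n_inner_folds` only mirror the constructor parameters. Folds are contiguous and unshuffled for simplicity. Unlike the documented method, which returns the training-set size, this sketch returns the chosen configuration and its mean validation loss.

```python
import random
import statistics


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds (no shuffling, a simplification)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def select_config(candidates, X, y, fit_and_loss, n_inner_folds=3):
    """Return the candidate with the lowest mean inner-fold validation loss."""
    best, best_loss = None, float("inf")
    for config in candidates:
        losses = [
            fit_and_loss(config,
                         [X[i] for i in tr], [y[i] for i in tr],
                         [X[i] for i in va], [y[i] for i in va])
            for tr, va in k_fold_indices(len(y), n_inner_folds)
        ]
        mean_loss = statistics.mean(losses)
        if mean_loss < best_loss:
            best, best_loss = config, mean_loss
    return best, best_loss


def random_search(sample_config, hp_iters, X, y, fit_and_loss, n_inner_folds=3, seed=None):
    """Random search stand-in: draw hp_iters configurations, then pick the best by inner CV."""
    rng = random.Random(seed)
    candidates = [sample_config(rng) for _ in range(hp_iters)]
    return select_config(candidates, X, y, fit_and_loss, n_inner_folds)


def threshold_loss(config, X_train, y_train, X_val, y_val):
    """Toy model: predict 1 when x exceeds the threshold (training data unused); 0-1 loss."""
    preds = [int(x > config["threshold"]) for x in X_val]
    return sum(p != t for p, t in zip(preds, y_val)) / len(y_val)
```

On one-dimensional data whose labels switch between 0.4 and 0.6, any sampled threshold in that gap reaches a mean validation loss of 0.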