autoqild.detectors.random_forest_leakage_detector
A leakage detector that utilizes RandomForest models for robust and interpretable detection.
Classes
RandomForestLeakageDetector: Class for detecting information leakage using a Random Forest model.
- class autoqild.detectors.random_forest_leakage_detector.RandomForestLeakageDetector(padding_name, learner_params, fit_params, hash_value, cv_iterations, n_hypothesis, base_directory, search_space, hp_iters, n_inner_folds, validation_loss, random_state=None, **kwargs)
Bases: SklearnLeakageDetector
RandomForestLeakageDetector class for detecting information leakage using a Random Forest model.
This class extends SklearnLeakageDetector to detect information leakage using a Random Forest classifier as the base model. The Random Forest model is well-suited for leakage detection due to its ability to handle complex feature interactions and its inherent randomness. This class also supports hyperparameter optimization and cross-validation.
- Parameters:
padding_name (str) – The name of the padding method used in the experiments to obscure or detect leakage.
learner_params (dict) – Parameters related to the Random Forest model used in the detection process.
fit_params (dict) – Parameters passed to the fit method during model training.
hash_value (str) – A unique hash value used to identify and manage result files for a specific experiment.
cv_iterations (int) – The number of cross-validation iterations to perform during model evaluation.
n_hypothesis (int) – The number of hypotheses or models to be tested for leakage.
base_directory (str) – The base directory where result files, logs, and backups are stored.
search_space (dict) – The hyperparameter search space for Bayesian optimization.
hp_iters (int) – The number of iterations for hyperparameter optimization.
n_inner_folds (int) – The number of folds for inner cross-validation during hyperparameter optimization.
validation_loss (str) – The loss function used to evaluate the performance of models during cross-validation.
random_state (int or RandomState instance, optional) – Controls the randomness for reproducibility, ensuring consistent results across different runs.
**kwargs (dict, optional) – Additional keyword arguments passed to the parent class.
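As a rough illustration of how these arguments fit together, the sketch below collects them in a dictionary. Only the argument names come from the signature above; every value (the padding name, the hash, the search-space format, the validation loss name) is a hypothetical placeholder, and the commented-out calls assume autoqild is installed.

```python
# Hypothetical values throughout; only the keys mirror the documented signature.
detector_kwargs = {
    "padding_name": "example_padding",            # placeholder name
    "learner_params": {"n_estimators": 100},      # assumed Random Forest setting
    "fit_params": {},
    "hash_value": "abc123",                       # placeholder experiment hash
    "cv_iterations": 10,
    "n_hypothesis": 5,
    "base_directory": "./results",                # placeholder path
    "search_space": {"n_estimators": (50, 300)},  # assumed format, not confirmed
    "hp_iters": 20,
    "n_inner_folds": 3,
    "validation_loss": "accuracy",                # placeholder; valid names not listed here
    "random_state": 42,
}

# With autoqild installed, usage would presumably look like this:
# from autoqild.detectors.random_forest_leakage_detector import RandomForestLeakageDetector
# detector = RandomForestLeakageDetector(**detector_kwargs)
# detector.fit(X, y)
# detection_decision, hypothesis_rejected = detector.detect(detection_method="log_loss_mi")
```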
- detect(detection_method='log_loss_mi')
Executes the detection process to identify potential information leakage using the specified method.
- Parameters:
detection_method (str) – The method to use for detecting information leakage. Options include:
- paired-t-test
- paired-t-test-random
- fishers-exact-mean
- fishers-exact-median
- mid_point_mi
- log_loss_mi
- log_loss_mi_isotonic_regression
- log_loss_mi_platt_scaling
- log_loss_mi_beta_calibration
- log_loss_mi_temperature_scaling
- log_loss_mi_histogram_binning
- p_c_softmax_mi
- Returns:
detection_decision (bool) – Indicates whether any models showed significant leakage.
hypothesis_rejected (int) – The number of models flagged for leakage.
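To give a feel for what a log-loss-based mutual information (MI) score can look like, the sketch below uses the general bound I(X; Y) >= H(Y) - log-loss, clipped at zero. This page does not confirm that the `log_loss_mi` option computes exactly this quantity, and the helper names are hypothetical.

```python
import math
from collections import Counter


def marginal_entropy(y):
    """Empirical entropy H(Y) of the labels, in nats."""
    n = len(y)
    return -sum((c / n) * math.log(c / n) for c in Counter(y).values())


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def log_loss_mi_estimate(y_true, p_pred):
    """Lower-bound MI estimate: H(Y) minus the model's log-loss, floored at 0."""
    return max(0.0, marginal_entropy(y_true) - log_loss(y_true, p_pred))


y = [0, 1, 0, 1]
p_informative = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
p_uninformative = [[0.5, 0.5]] * 4

mi_informative = log_loss_mi_estimate(y, p_informative)
mi_uninformative = log_loss_mi_estimate(y, p_uninformative)
```

A model that predicts no better than the class prior yields an estimate near zero, while confident correct predictions push the estimate towards H(Y).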
Notes
The method implements a Holm-Bonferroni correction to control the family-wise error rate for multiple models.
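The Holm-Bonferroni step-down procedure mentioned in the note can be sketched as follows. This is a generic illustration, not the library's code: the function name, the `alpha` default, and the example p-values are assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, 0-based) is tested at alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return rejected


p_values = [0.001, 0.04, 0.03, 0.2]  # hypothetical per-model p-values
rejected = holm_bonferroni(p_values)
hypothesis_rejected = sum(rejected)
detection_decision = hypothesis_rejected > 0  # assumed mapping from count to decision
```

Here only the first p-value clears its threshold (0.05 / 4), so one hypothesis is rejected and the illustrative decision is positive.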
- evaluate_scores(X_test, X_train, y_test, y_train, y_pred, p_pred, model, n_model)
Evaluate and store model performance metrics for the detection process.
This method computes various evaluation metrics, such as log-loss, accuracy, and the confusion matrix, for the model's predictions. It also supports probability calibration using techniques like isotonic regression and Platt scaling. The results are stored and logged for further analysis.
- Parameters:
X_test (array-like of shape (n_samples, n_features)) – The feature matrix for the test set.
X_train (array-like of shape (n_samples, n_features)) – The feature matrix for the training set.
y_test (array-like of shape (n_samples,)) – The true target labels for the test data.
y_train (array-like of shape (n_samples,)) – The true target labels for the training data.
y_pred (array-like of shape (n_samples,)) – The predicted target labels for the test set.
p_pred (array-like of shape (n_samples, n_classes)) – The predicted class probabilities for the test data.
model (object) – The trained model being evaluated.
n_model (int) – The index of the model in the list of evaluated models.
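The kind of metrics this method is described as computing can be illustrated with a small standard-library sketch. The helper functions and the `scores` dictionary layout are hypothetical and are not taken from the library; calibration is left out.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        total -= math.log(min(max(probs[label], eps), 1 - eps))
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


y_test = [0, 1, 1, 0]
p_pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
# Predicted label = index of the largest class probability.
y_pred = [max(range(len(p)), key=p.__getitem__) for p in p_pred]

n_model = 0  # index of the evaluated model
scores = {
    n_model: {
        "accuracy": accuracy(y_test, y_pred),
        "log_loss": log_loss(y_test, p_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, n_classes=2),
    }
}
```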
- fit(X, y)
Fits the model using cross-validation and performs hyperparameter optimization.
This method first checks if the model has already been fitted. If not, it runs the hyperparameter optimization process followed by cross-validation on the specified number of hypotheses. The model is trained using a stratified split of the dataset, and results are evaluated using predefined metrics.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data used for training the models.
y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to X.
Notes
During fitting, random classifier and majority voting classifier performance is also calculated for comparison.
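The two comparison baselines named in the note can be approximated as below. This is a minimal sketch: the function names, the uniform sampling used for the random classifier, and the toy labels are assumptions, and the library may define its baselines differently.

```python
import random
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Always predict the most frequent training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return sum(t == majority_label for t in y_test) / len(y_test)


def random_baseline_accuracy(y_train, y_test, seed=0):
    """Predict a label drawn uniformly from the classes seen in training."""
    rng = random.Random(seed)
    labels = sorted(set(y_train))
    guesses = [rng.choice(labels) for _ in y_test]
    return sum(t == g for t, g in zip(y_test, guesses)) / len(y_test)


y_train = [0, 0, 0, 1, 1]
y_test = [0, 0, 1, 0]

majority_acc = majority_baseline_accuracy(y_train, y_test)
random_acc = random_baseline_accuracy(y_train, y_test)
```

A model whose accuracy is not clearly above both baselines gives little evidence that the features carry information about the labels.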
- hyperparameter_optimization(X, y)
Performs Bayesian hyperparameter optimization to identify the best model parameters.
This method uses a Bayesian search strategy to explore a predefined hyperparameter search space and selects the optimal configuration based on the specified validation loss. The method performs cross-validation within the search to ensure that the selected hyperparameters generalize well.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be used for training during hyperparameter optimization.
y (array-like of shape (n_samples,)) – The target values (class labels) corresponding to X.
- Returns:
The size of the training dataset after reduction (if applicable).
- Return type:
int
- Raises:
Exception – If an error occurs during the Bayesian search fitting process.
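The overall shape of a search with inner cross-validation can be sketched with the standard library. Note that this example substitutes plain random search for the Bayesian strategy described above, and the toy threshold "model", the `fit_and_score` callback, and the search-space format are all hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs using interleaved folds."""
    for fold in range(n_folds):
        val_idx = list(range(fold, n_samples, n_folds))
        val_set = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in val_set]
        yield train_idx, val_idx


def random_search(X, y, search_space, hp_iters, n_inner_folds, fit_and_score, seed=0):
    """Sample hp_iters configurations; keep the lowest mean validation loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(hp_iters):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        fold_losses = []
        for train_idx, val_idx in k_fold_indices(len(X), n_inner_folds):
            fold_losses.append(
                fit_and_score(
                    [X[i] for i in train_idx], [y[i] for i in train_idx],
                    [X[i] for i in val_idx], [y[i] for i in val_idx],
                    params,
                )
            )
        mean_loss = sum(fold_losses) / len(fold_losses)
        if mean_loss < best_loss:
            best_params, best_loss = params, mean_loss
    return best_params, best_loss


def threshold_error(X_train, y_train, X_val, y_val, params):
    """Toy stand-in for a model: predict 1 when x exceeds the threshold."""
    preds = [int(x > params["threshold"]) for x in X_val]
    return sum(p != t for p, t in zip(preds, y_val)) / len(y_val)


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
search_space = {"threshold": [0.25, 0.5, 0.75]}

best_params, best_loss = random_search(
    X, y, search_space, hp_iters=20, n_inner_folds=2, fit_and_score=threshold_error
)
```

A Bayesian optimizer would replace the uniform sampling step with proposals informed by earlier losses; the inner cross-validation loop stays the same.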