autoqild.detectors.tabpfn_leakage_detector

Uses the TabPFN model to detect information leakage, particularly in small tabular datasets.

Classes

TabPFNLeakageDetector(padding_name, ...[, ...])

TabPFNLeakageDetector class for detecting information leakage using the TabPFN model.

class autoqild.detectors.tabpfn_leakage_detector.TabPFNLeakageDetector(padding_name, learner_params, fit_params, hash_value, cv_iterations, n_hypothesis, base_directory, search_space, hp_iters, n_inner_folds, validation_loss, random_state=None, **kwargs)

Bases: SklearnLeakageDetector

TabPFNLeakageDetector class for detecting information leakage using the TabPFN model.

This class extends SklearnLeakageDetector to detect information leakage using the TabPFN model, which is particularly effective on small tabular datasets. The class combines hyperparameter optimization, dataset reduction, and cross-validation, which makes it a good fit when a lightweight, efficient model is needed.
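The description mentions dataset reduction but does not say which strategy is used. The original TabPFN release was designed for datasets of up to about 1,000 training samples, so class-stratified subsampling is one plausible approach. The sketch below assumes that approach. `reduce_dataset` and its `max_samples` parameter are hypothetical names and are not part of autoqild.

```python
import random
from collections import defaultdict


def reduce_dataset(y, max_samples, seed=None):
    """Return sorted row indices of a class-stratified subsample of y.

    Hypothetical helper for illustration; not part of the autoqild API.
    """
    n = len(y)
    if n <= max_samples:
        return list(range(n))
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    selected = []
    for idx in by_class.values():
        # Keep each class's share of max_samples (rounded down), minimum one row.
        k = max(1, max_samples * len(idx) // n)
        selected.extend(rng.sample(idx, k))
    # Classes bumped up to one row can push the total past max_samples; trim at random.
    if len(selected) > max_samples:
        selected = rng.sample(selected, max_samples)
    return sorted(selected)
```

For example, `reduce_dataset(y, 100, seed=0)` on 900 rows of class 0 and 100 rows of class 1 keeps 90 and 10 rows respectively. The class imbalance is preserved, which matters because the leakage tests compare against class priors.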

Parameters:

padding_name (str) – The name of the padding method used in the experiments to obscure or detect leakage.
learner_params (dict) – Parameters related to the TabPFN model used in the detection process.
fit_params (dict) – Parameters passed to the fit method during model training.
hash_value (str) – A unique hash value used to identify and manage result files for a specific experiment.
cv_iterations (int) – The number of cross-validation iterations to perform during model evaluation.
n_hypothesis (int) – The number of hypotheses or models to be tested for leakage.
base_directory (str) – The base directory where result files, logs, and backups are stored.
search_space (dict) – The hyperparameter search space for Bayesian optimization.
hp_iters (int) – The number of iterations for hyperparameter optimization.
n_inner_folds (int) – The number of folds for inner cross-validation during hyperparameter optimization.
validation_loss (str) – The loss function used to evaluate the performance of models during cross-validation.
random_state (int or RandomState instance, optional) – Controls the randomness for reproducibility, ensuring consistent results across different runs.
**kwargs (dict, optional) – Additional keyword arguments passed to the parent class.
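Together, cv_iterations and n_inner_folds describe a nested cross-validation scheme. The outer splits estimate generalization performance. The inner folds are carved only from each outer training set and are used to score the hp_iters hyperparameter candidates. The sketch below assumes cv_iterations is the number of outer K-fold splits, although it could instead mean repeated random splits. `nested_splits` is a hypothetical helper and is not part of autoqild.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle range(n) and split it into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_splits(n, cv_iterations, n_inner_folds, seed=None):
    """Yield (outer_train, outer_test, inner_splits) for a nested CV scheme.

    Assumption: cv_iterations is treated as the number of outer folds.
    Inner (train, validation) pairs are drawn only from outer_train, so the
    outer test fold never influences hyperparameter selection.
    """
    rng = random.Random(seed)
    for test_fold in kfold_indices(n, cv_iterations, rng):
        test_set = set(test_fold)
        outer_train = [i for i in range(n) if i not in test_set]
        inner = []
        for val_pos in kfold_indices(len(outer_train), n_inner_folds, rng):
            val = [outer_train[j] for j in val_pos]
            val_set = set(val)
            inner.append(([i for i in outer_train if i not in val_set], val))
        yield outer_train, sorted(test_fold), inner
```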

detect(detection_method='log_loss_mi')

Executes the detection process to identify potential information leakage using the specified method.

Parameters:

detection_method (str) – The method to use for detecting information leakage. Options include:

- paired-t-test
- paired-t-test-random
- fishers-exact-mean
- fishers-exact-median
- mid_point_mi
- log_loss_mi
- log_loss_mi_isotonic_regression
- log_loss_mi_platt_scaling
- log_loss_mi_beta_calibration
- log_loss_mi_temperature_scaling
- log_loss_mi_histogram_binning
- p_c_softmax_mi
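The log_loss_mi options suggest an estimator that derives mutual information from a classifier's log-loss. A common construction starts from I(X; Y) = H(Y) - H(Y | X). The expected log-loss of any classifier is at least H(Y | X), so subtracting the measured log-loss from the label entropy H(Y) gives an estimate that tends to err on the low side. The sketch below assumes that construction and base-2 logarithms. The calibrated variants (isotonic regression, Platt scaling, and so on) would presumably recalibrate the predicted probabilities before this step. `estimate_mi_from_log_loss` is a hypothetical standalone function, not the autoqild implementation.

```python
import math
from collections import Counter


def estimate_mi_from_log_loss(y_true, p_pred):
    """Estimate I(X; Y) in bits as H(Y) minus the mean log-loss.

    y_true: list of integer class labels 0..K-1.
    p_pred: list of predicted probability vectors, one per sample.
    """
    n = len(y_true)
    # Empirical label entropy H(Y), in bits.
    counts = Counter(y_true)
    h_y = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Mean cross-entropy (log-loss) in bits; clip probabilities to avoid log(0).
    eps = 1e-15
    ce = -sum(math.log2(max(p[y], eps)) for y, p in zip(y_true, p_pred)) / n
    # A classifier worse than the prior yields a negative difference; MI is non-negative.
    return max(0.0, h_y - ce)
```

A perfect classifier on balanced binary labels yields 1 bit. A classifier that always predicts [0.5, 0.5] yields 0 bits.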

Returns:

detection_decision (bool) – Indicates whether any models showed significant leakage.
hypothesis_rejected (int) – The number of models flagged for leakage.

Notes

The method applies a Holm-Bonferroni correction to control the family-wise error rate across the multiple models tested.
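Holm-Bonferroni is a step-down procedure. Sort the m p-values in ascending order and reject the k-th smallest (1-indexed) while p_(k) <= alpha / (m - k + 1), stopping at the first failure. The sketch below assumes one p-value per model and a significance level of 0.05; the level autoqild actually uses is not stated here. `holm_bonferroni` is a hypothetical helper. Its return pair mirrors the detection_decision and hypothesis_rejected outputs described in the Returns section.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (any_rejected, n_rejected) under Holm's step-down procedure.

    alpha=0.05 is an illustrative default, not autoqild's documented value.
    """
    m = len(p_values)
    n_rejected = 0
    # Compare the rank-th smallest p-value (0-indexed) against alpha / (m - rank).
    for rank, p in enumerate(sorted(p_values)):
        if p <= alpha / (m - rank):
            n_rejected += 1
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return n_rejected > 0, n_rejected
```

With p-values [0.01, 0.04, 0.03, 0.005], the two smallest pass their thresholds (0.0125 and about 0.0167). The third, 0.03, fails against 0.025, so two models are flagged.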

evaluate_scores(X_test, X_train, y_test, y_train, y_pred, p_pred, model, n_model)

fit(X, y)

hyperparameter_optimization(X, y)