autoqild.mi_estimators.mine_estimator

Mutual Information Neural Estimator (MINE) that uses multiple deep learning architectures to estimate MI for classification tasks.

Classes

MineMIEstimator(n_classes, n_features[, ...])

MineMIEstimator class implementing the Mutual Information Neural Estimator (MINE) approach to estimate the mutual information using an ensemble of deep neural networks.

class autoqild.mi_estimators.mine_estimator.MineMIEstimator(n_classes, n_features, loss_function='donsker_varadhan_softplus', optimizer_str='adam', learning_rate=0.0001, reg_strength=0, encode_classes=True, random_state=42, **kwargs)

Bases: MIEstimatorBase

MineMIEstimator class implementing the Mutual Information Neural Estimator (MINE) approach to estimate the mutual information using an ensemble of deep neural networks.

This class trains multiple neural networks with varying architectures to estimate the mutual information (MI) between input features and class labels. Averaging the estimates across the ensemble reduces the variance of any single network's estimate, which gives a more stable result. This is most useful for high-dimensional data in which the relationship between features and labels is complex.

Parameters:

n_classes (int) – Number of classes in the classification data.
n_features (int) – Number of features, i.e. the dimensionality of each input sample.
loss_function ({donsker_varadhan, donsker_varadhan_softplus, fdivergence}, default=`donsker_varadhan_softplus`) –
The divergence metric to use for the MINE loss. Options include:
- donsker_varadhan: Donsker-Varadhan representation of KL divergence.
- donsker_varadhan_softplus: Softplus version of the Donsker-Varadhan representation.
- fdivergence: f-divergence representation of mutual information.
optimizer_str ({RMSprop, sgd, adam, AdamW, Adagrad, Adamax, Adadelta}, default=`adam`) –
Optimizer type to use for training the neural network. Must be one of:
- RMSprop: Root Mean Square Propagation, an adaptive learning rate method.
- sgd: Stochastic Gradient Descent, a simple and widely-used optimizer.
- adam: Adaptive Moment Estimation, combining momentum and RMSProp for better convergence.
- AdamW: Adam with weight decay, an improved variant of Adam with better regularization.
- Adagrad: Adaptive Gradient Algorithm, adjusting the learning rate based on feature frequency.
- Adamax: Variant of Adam based on infinity norm, more robust with sparse gradients.
- Adadelta: An extension of Adagrad that seeks to reduce its aggressive learning rate decay.
learning_rate (float, optional, default=1e-4) – Learning rate for the optimizer.
reg_strength (float, optional, default=0) – Regularization strength.
encode_classes (bool, optional, default=True) – Indicates if the target variable should be one-hot encoded.
random_state (int, optional, default=42) – Random state for reproducibility.
**kwargs (dict, optional) – Additional keyword arguments accepted by the constructor.
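
The loss_function options name variational lower bounds on MI. As a rough illustration, the sketch below evaluates the textbook Donsker-Varadhan bound and the f-divergence (NWJ) bound from critic outputs. The function names and the plain-Python inputs are hypothetical and not part of autoqild; the library's own loss code, including the softplus variant, may differ.

```python
import math


def log_mean_exp(values):
    """Numerically stable log(mean(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def donsker_varadhan_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound: E_joint[T] - log E_marginal[exp(T)]."""
    return sum(t_joint) / len(t_joint) - log_mean_exp(t_marginal)


def f_divergence_bound(t_joint, t_marginal):
    """f-divergence (NWJ) lower bound: E_joint[T] - E_marginal[exp(T - 1)]."""
    mean_joint = sum(t_joint) / len(t_joint)
    mean_exp = sum(math.exp(t - 1.0) for t in t_marginal) / len(t_marginal)
    return mean_joint - mean_exp


# Hypothetical critic outputs on paired (joint) and shuffled (marginal) samples.
t_joint = [1.2, 0.8, 1.0]
t_marginal = [-0.5, 0.1, -0.2]
print(donsker_varadhan_bound(t_joint, t_marginal))
print(f_divergence_bound(t_joint, t_marginal))
```

For the same critic outputs the Donsker-Varadhan value is never below the f-divergence value, because log(z) <= z / e for every positive z.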

optimizer_cls

Optimizer class selected based on the optimizer_str parameter.

Type: object
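
One way such a selection could work is a lookup from the documented optimizer_str values to class names in torch.optim. This is a minimal sketch, not the library's code: OPTIMIZER_NAMES, resolve_optimizer_name and get_optimizer_cls are hypothetical names, and the case-insensitive matching is an assumption.

```python
# Hypothetical lookup table: documented optimizer_str values -> torch.optim class names.
OPTIMIZER_NAMES = {
    "rmsprop": "RMSprop",
    "sgd": "SGD",
    "adam": "Adam",
    "adamw": "AdamW",
    "adagrad": "Adagrad",
    "adamax": "Adamax",
    "adadelta": "Adadelta",
}


def resolve_optimizer_name(optimizer_str):
    """Return the torch.optim class name for a documented optimizer string."""
    key = optimizer_str.lower()  # case-insensitive matching is an assumption
    if key not in OPTIMIZER_NAMES:
        raise ValueError(f"Unknown optimizer_str: {optimizer_str!r}")
    return OPTIMIZER_NAMES[key]


def get_optimizer_cls(optimizer_str):
    """Look up the optimizer class; torch is imported lazily here."""
    import torch  # requires PyTorch to be installed

    return getattr(torch.optim, resolve_optimizer_name(optimizer_str))


print(resolve_optimizer_name("adam"))  # Adam
```

Keeping the name resolution separate from the torch import lets the string validation run even where PyTorch is not installed.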

device

Device on which the model runs (cuda or cpu).

Type: torch.device

models

List to store the trained models for each configuration.

Type: list

n_models

Number of models trained.

Type: int

label_binarizer

LabelBinarizer instance for encoding class labels.

Type: LabelBinarizer

final_loss

The final average loss over all trained models.

Type: float

mi_validation_final

The final average mutual information validation score.

Type: float

Notes

The MineMIEstimator trains multiple models with varying configurations (e.g., different hidden layers and units). This ensemble approach allows the estimator to aggregate results from multiple models to produce a more robust estimate of mutual information. The method is particularly effective in cases where the relationships between features and labels are complex or non-linear, as the aggregation process helps to smooth out inconsistencies across individual model predictions.
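
The aggregation described above can be pictured as a plain average over per-model estimates. The sketch below is illustrative only: aggregate_mi_estimates is a hypothetical helper, the numbers are made up, and reporting the standard deviation is an addition for illustration rather than something the class is documented to do.

```python
import statistics


def aggregate_mi_estimates(estimates):
    """Average per-model MI estimates and report their spread."""
    mean_mi = statistics.fmean(estimates)
    # The sample standard deviation needs at least two estimates.
    spread = statistics.stdev(estimates) if len(estimates) > 1 else 0.0
    return mean_mi, spread


# Hypothetical per-model MI estimates from four architectures.
per_model_mi = [0.42, 0.47, 0.39, 0.44]
mean_mi, spread = aggregate_mi_estimates(per_model_mi)
print(f"MI estimate: {mean_mi:.3f} +/- {spread:.3f}")
```

A large spread relative to the mean suggests that the individual networks disagree and that the averaged value should be read with caution.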

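Example

>>> from sklearn.datasets import make_classification
>>> from autoqild.mi_estimators.mine_estimator import MineMIEstimator
>>> X, y = make_classification(n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=42)
>>> X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
>>> estimator = MineMIEstimator(n_classes=3, n_features=10)
>>> estimator.fit(X_train, y_train)
>>> mi_estimate = estimator.estimate_mi(X_test, y_test)
>>> print(mi_estimate)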

decision_function(X, verbose=0)

Predict confidence scores for samples.

This method aggregates the confidence scores across all models in the ensemble.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

final_scores – Predicted confidence scores.

Return type:

array-like of shape (n_samples, n_classes)
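
The documentation does not state how the scores are aggregated. A minimal sketch, assuming an element-wise mean over the per-model score matrices (average_scores is a hypothetical helper, and the scores are made up):

```python
def average_scores(score_matrices):
    """Element-wise mean of per-model score matrices (lists of rows)."""
    n_models = len(score_matrices)
    n_samples = len(score_matrices[0])
    n_classes = len(score_matrices[0][0])
    return [
        [
            sum(model[i][k] for model in score_matrices) / n_models
            for k in range(n_classes)
        ]
        for i in range(n_samples)
    ]


# Hypothetical scores from two models for two samples and three classes.
scores_a = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
scores_b = [[4.0, 2.0, 1.0], [2.0, 1.0, 1.0]]
final_scores = average_scores([scores_a, scores_b])
print(final_scores)  # [[3.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```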

estimate_mi(X, y, verbose=0, MON_ITER=1000, **kwargs)

Estimate mutual information by averaging the estimates from multiple trained MINE models with different architectures.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
verbose (int, optional, default=0) – Verbosity level.
MON_ITER (int, optional, default=1000) – Number of iterations for estimating MI.
**kwargs (dict, optional) – Additional keyword arguments.

Returns:

mi_estimated – Estimated mutual information.

Return type:

float

fit(X, y, epochs=100000, verbose=0, **kwd)

Fit the ensemble of MINE neural networks with different architectures and estimate mutual information.

The ensemble method trains multiple neural networks with varying configurations (e.g., number of hidden layers and units) and aggregates their mutual information estimates. This aggregation produces a more stable and robust estimate by reducing the variance associated with individual models.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
epochs (int, optional, default=100000) – Number of training epochs.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.

Returns:

self – Fitted estimator.

Return type:

MineMIEstimator

predict(X, verbose=0)

Predict class labels for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class labels.

Return type:

array-like of shape (n_samples,)

predict_proba(X, verbose=0)

Predict class probabilities for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

p_pred – Predicted class probabilities.

Return type:

array-like of shape (n_samples, n_classes)
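
To show how confidence scores, probabilities and labels can relate, the sketch below applies a softmax to each row of scores and takes the arg-max as the label. Whether autoqild uses a softmax here is an assumption; softmax and predict_from_scores are hypothetical helpers, not library functions.

```python
import math


def softmax(row):
    """Convert one row of scores into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_scores(scores):
    """Return (probabilities, labels) for a list of score rows."""
    probabilities = [softmax(row) for row in scores]
    labels = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return probabilities, labels


# Hypothetical aggregated scores for two samples and three classes.
scores = [[3.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
probabilities, labels = predict_from_scores(scores)
print(labels)  # [0, 2]
```

Because the softmax is monotonic, the arg-max of the probabilities matches the arg-max of the raw scores.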

pytorch_tensor_dataset(X, y, i=2)

Create PyTorch tensor datasets for the input features and target labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
i (int, optional, default=2) – Seed increment for reproducibility.

Returns:

tensor_xy (torch.Tensor) – Tensor containing the original data and labels.
tensor_xy_tilde (torch.Tensor) – Tensor containing the permuted data and labels.
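
MINE-style training contrasts paired samples with samples whose pairing has been broken. The sketch below builds both sets with plain Python lists instead of tensors. It assumes that the labels are one-hot encoded and shuffled relative to the features, and that the seed is derived as random_state + i; none of this is confirmed by the documentation, and make_joint_and_permuted is a hypothetical helper.

```python
import random


def make_joint_and_permuted(X, y, n_classes, seed):
    """Build joint rows [x, one_hot(y)] and rows with shuffled labels."""
    rng = random.Random(seed)
    y_tilde = list(y)
    rng.shuffle(y_tilde)  # breaks the pairing between features and labels

    def one_hot(label):
        return [1.0 if k == label else 0.0 for k in range(n_classes)]

    xy = [list(x) + one_hot(label) for x, label in zip(X, y)]
    xy_tilde = [list(x) + one_hot(label) for x, label in zip(X, y_tilde)]
    return xy, xy_tilde


# Hypothetical toy data: four samples, two features, three classes.
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 1, 2, 1]
xy, xy_tilde = make_joint_and_permuted(X, y, n_classes=3, seed=42 + 2)
print(xy[0])  # [0.1, 0.2, 1.0, 0.0, 0.0]
```

The shuffled rows keep every feature vector and every label, so both marginal distributions are unchanged and only the dependence between features and labels is removed.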

score(X, y, sample_weight=None, verbose=0)

Compute the score of the ensemble MINE model.

The score is based on the mutual information estimated by aggregating results from multiple trained models.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.

Returns:

score – The score of the model based on the final estimated mutual information.

Return type:

float