autoqild.mi_estimators.mine_estimator_mse

Modified MINE estimator that minimizes mean squared error (MSE) to provide more robust MI estimates.

Classes

MineMIEstimatorMSE(n_classes, n_features[, ...])

MineMIEstimatorMSE class implements a Mutual Information Neural Estimator (MINE) using Mean Squared Error (MSE) as the primary objective function.

class autoqild.mi_estimators.mine_estimator_mse.MineMIEstimatorMSE(n_classes, n_features, n_hidden=2, n_units=100, loss_function='donsker_varadhan_softplus', optimizer_str='adam', learning_rate=0.0001, reg_strength=1e-10, encode_classes=True, random_state=42, **kwargs)

Bases: MIEstimatorBase

MineMIEstimatorMSE implements a Mutual Information Neural Estimator (MINE) that uses mean squared error (MSE) as its primary objective function.

The class is tailored to hyperparameter optimization: candidate neural network architectures are tuned with the goal of minimizing MSE, so that the best architecture for estimating mutual information can be selected.

Parameters:

n_classes (int) – Number of classes in the classification data samples.
n_features (int) – Number of features or dimensionality of the inputs of the classification data samples.
n_hidden (int, optional, default=2) – Number of hidden layers in the neural network.
n_units (int, optional, default=100) – Number of units per hidden layer.
loss_function ({donsker_varadhan, donsker_varadhan_softplus, fdivergence}, default=`donsker_varadhan_softplus`) –
The divergence metric to use for the MINE loss. Options include:
- donsker_varadhan: Donsker-Varadhan representation of KL divergence.
- donsker_varadhan_softplus: Softplus version of the Donsker-Varadhan representation.
- fdivergence: f-divergence representation of mutual information.
optimizer_str ({RMSprop, sgd, adam, AdamW, Adagrad, Adamax, Adadelta}, default=`adam`) –
Optimizer type to use for training the neural network. Must be one of:
- RMSprop: Root Mean Square Propagation, an adaptive learning rate method.
- sgd: Stochastic Gradient Descent, a simple and widely-used optimizer.
- adam: Adaptive Moment Estimation, combining momentum and RMSprop for better convergence.
- AdamW: Adam with weight decay, an improved variant of Adam with better regularization.
- Adagrad: Adaptive Gradient Algorithm, adjusting the learning rate based on feature frequency.
- Adamax: Variant of Adam based on the infinity norm, more robust with sparse gradients.
- Adadelta: An extension of Adagrad that seeks to reduce its aggressive learning rate decay.
learning_rate (float, optional, default=1e-4) – Learning rate for the optimizer.
reg_strength (float, optional, default=1e-10) – Regularization strength.
encode_classes (bool, optional, default=True) – Indicates if the target variable should be one-hot encoded.
random_state (int, optional, default=42) – Random state for reproducibility.
**kwargs (dict, optional) – Additional keyword arguments passed to the MineMIEstimatorMSE constructor.
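The loss_function options listed above name variational lower bounds on mutual information. As a rough illustration only, the sketch below evaluates the standard Donsker-Varadhan bound and the f-divergence bound from critic scores. The helper names `dv_bound` and `fdiv_bound` and the score values are hypothetical and not part of autoqild; the exact expressions used by the library, including the softplus variant, may differ.

```python
import math


def dv_bound(t_joint, t_marginal):
    """Donsker-Varadhan bound: E_P[T] - log E_Q[exp(T)]."""
    mean_joint = sum(t_joint) / len(t_joint)
    mean_exp = sum(math.exp(t) for t in t_marginal) / len(t_marginal)
    return mean_joint - math.log(mean_exp)


def fdiv_bound(t_joint, t_marginal):
    """f-divergence bound: E_P[T] - E_Q[exp(T - 1)]."""
    mean_joint = sum(t_joint) / len(t_joint)
    mean_exp = sum(math.exp(t - 1.0) for t in t_marginal) / len(t_marginal)
    return mean_joint - mean_exp


# Hypothetical critic outputs on joint samples (P) and permuted samples (Q).
t_joint = [1.2, 0.8, 1.5, 1.0]
t_marginal = [-0.3, 0.1, -0.5, 0.2]

dv = dv_bound(t_joint, t_marginal)
fd = fdiv_bound(t_joint, t_marginal)
print(dv, fd)
```

For the same critic scores, the Donsker-Varadhan value is never below the f-divergence value, because log z ≤ z/e for every z > 0.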

optimizer_cls

Optimizer class selected based on the optimizer_str parameter.

Type: object
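One plausible way to turn optimizer_str into an optimizer class is a lookup into `torch.optim`. The sketch below is an assumption about how such a mapping could look, not the library's actual code; `OPTIMIZER_NAMES` and `resolve_optimizer_cls` are hypothetical names, and the case-insensitive matching is a choice made for this example.

```python
# Hypothetical lookup table: keys are the documented optimizer_str values,
# lower-cased; values are the corresponding class names in torch.optim.
OPTIMIZER_NAMES = {
    "rmsprop": "RMSprop",
    "sgd": "SGD",
    "adam": "Adam",
    "adamw": "AdamW",
    "adagrad": "Adagrad",
    "adamax": "Adamax",
    "adadelta": "Adadelta",
}


def resolve_optimizer_cls(optimizer_str):
    """Return the torch.optim class for a documented optimizer name."""
    key = optimizer_str.lower()
    if key not in OPTIMIZER_NAMES:
        raise ValueError(f"Unknown optimizer: {optimizer_str!r}")
    # Imported lazily so the table can be inspected without PyTorch installed.
    import torch

    return getattr(torch.optim, OPTIMIZER_NAMES[key])
```

With PyTorch installed, `resolve_optimizer_cls("adam")` returns `torch.optim.Adam`, which can then be instantiated with the model parameters and the learning rate.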

device

Device on which the model runs (cuda or cpu).

Type: torch.device

stat_net

Neural network model for estimating mutual information.

Type: StatNet

final_loss

The final loss after training the model.

Type: float

mi_val

The final estimated mutual information value.

Type: float

Notes

This class is particularly suited for scenarios involving hyperparameter tuning where the goal is to identify the optimal architecture that minimizes MSE during mutual information estimation.
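The tuning workflow described here can be sketched as a plain grid search over constructor arguments, using only the fit and score methods documented on this page. The function `grid_search` and the `higher_is_better` flag are hypothetical; this page does not state whether a larger or smaller score is preferable, so the direction is left as an explicit argument.

```python
from itertools import product


def grid_search(make_estimator, grid, X_train, y_train, X_val, y_val,
                higher_is_better=True):
    """Try every parameter combination and keep the best-scoring one."""
    names = sorted(grid)
    best_params, best_score = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        estimator = make_estimator(**params)
        estimator.fit(X_train, y_train)
        score = estimator.score(X_val, y_val)
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_params, best_score = params, score
    return best_params, best_score
```

A call might pass `make_estimator=lambda **p: MineMIEstimatorMSE(n_classes=3, n_features=10, **p)` together with a grid such as `{"n_hidden": [1, 2, 3], "n_units": [50, 100]}`.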

Example

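>>> import numpy as np
>>> from autoqild.mi_estimators.mine_estimator_mse import MineMIEstimatorMSE
>>> rng = np.random.default_rng(0)
>>> X, y = rng.normal(size=(400, 10)), rng.integers(0, 3, size=400)  # placeholder data
>>> estimator = MineMIEstimatorMSE(n_classes=3, n_features=10)
>>> estimator.fit(X[:300], y[:300])
>>> score = estimator.score(X[300:], y[300:])
>>> print(score)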

decision_function(X, verbose=0)

estimate_mi(X, y, verbose=0, MON_ITER=100, **kwargs)

Estimate mutual information using the MINE model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
verbose (int, optional, default=0) – Verbosity level.
MON_ITER (int, optional, default=100) – Number of iterations for estimating MI.
**kwargs (dict, optional) – Additional keyword arguments.

Returns:

mi_estimated – Estimated mutual information.

Return type:

float

fit(X, y, epochs=100, batch_size=128, verbose=0, **kwd)

Fit the MINE model and estimate mutual information.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
epochs (int, optional, default=100) – Number of training epochs.
batch_size (int, optional, default=128) – Batch size used for training.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.

Returns:

self – Fitted estimator.

Return type:

MineMIEstimatorMSE

predict(X, verbose=0)

Predict class labels for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class labels.

Return type:

array-like of shape (n_samples,)

predict_proba(X, verbose=0)

Predict class probabilities for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

p_pred – Predicted class probabilities.

Return type:

array-like of shape (n_samples, n_classes)
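This page does not say how predict and predict_proba are related. A common convention, assumed here purely for illustration, is that per-class scores are normalized with a softmax to give probabilities and the predicted label is the index of the largest probability. The helpers `softmax` and `labels_from_proba` and the score values are hypothetical.

```python
import math


def softmax(scores):
    """Normalize one row of per-class scores into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def labels_from_proba(p_pred):
    """Pick the most probable class index for every row."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_pred]


# Hypothetical per-class scores for two samples and three classes.
scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
p_pred = [softmax(row) for row in scores]
y_pred = labels_from_proba(p_pred)
print(y_pred)
```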

pytorch_tensor_dataset(X, y, batch_size=64, i=2)

Create PyTorch tensor datasets for the input features and target labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
batch_size (int, optional, default=64) – Size of the batches used for training.
i (int, optional, default=2) – Seed increment for reproducibility.

Returns:

tensor_xy (torch.Tensor) – Tensor containing the original data and labels.
tensor_xy_tilde (torch.Tensor) – Tensor containing the permuted data and labels.
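The two return values can be pictured with a small pure-Python sketch. It assumes that each feature row is concatenated with a one-hot encoded label, and that the permuted set is obtained by shuffling the labels across samples, as is usual for MINE-style estimators. The function `joint_and_permuted` is hypothetical, works on lists rather than tensors, and may not match the library's batching or seeding.

```python
import random


def one_hot(label, n_classes):
    """Encode an integer label as a list of 0.0/1.0 values."""
    return [1.0 if k == label else 0.0 for k in range(n_classes)]


def joint_and_permuted(X, y, n_classes, seed=0):
    """Build (x, y) rows and rows where y has been shuffled across samples."""
    rng = random.Random(seed)
    y_perm = list(y)
    rng.shuffle(y_perm)  # breaks the pairing between features and labels
    xy = [list(x) + one_hot(lbl, n_classes) for x, lbl in zip(X, y)]
    xy_tilde = [list(x) + one_hot(lbl, n_classes) for x, lbl in zip(X, y_perm)]
    return xy, xy_tilde


# Hypothetical toy data: four samples, two features, three classes.
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 1, 2, 1]
xy, xy_tilde = joint_and_permuted(X, y, n_classes=3)
```

Rows of `xy` follow the joint distribution of features and labels, while rows of `xy_tilde` approximate the product of the marginals, which is the contrast a MINE critic is trained on.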

score(X, y, sample_weight=None, verbose=0)

Compute the score of the MINE model using the mean squared error between the original and permuted samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.

Returns:

score – Score of the model, computed from the mean squared error between the original and permuted samples.

Return type:

float