autoqild.mi_estimators.mine_estimator_mse
Modified MINE estimator that minimizes mean squared error (MSE) to provide more robust MI estimates.
Classes
MineMIEstimatorMSE class implements a Mutual Information Neural Estimator (MINE) using Mean Squared Error (MSE) as the primary objective function.
- class autoqild.mi_estimators.mine_estimator_mse.MineMIEstimatorMSE(n_classes, n_features, n_hidden=2, n_units=100, loss_function='donsker_varadhan_softplus', optimizer_str='adam', learning_rate=0.0001, reg_strength=1e-10, encode_classes=True, random_state=42, **kwargs)
Bases:
MIEstimatorBase
The MineMIEstimatorMSE class implements a Mutual Information Neural Estimator (MINE) using Mean Squared Error (MSE) as the primary objective function. The class optimizes the neural network architecture through hyperparameter tuning, with the goal of minimizing MSE during estimation.
This class leverages MINE techniques and is specifically tailored for hyperparameter optimization, enabling the selection of the best neural network architecture for estimating mutual information.
- Parameters:
n_classes (int) – Number of classes in the classification data samples.
n_features (int) – Number of features or dimensionality of the inputs of the classification data samples.
n_hidden (int, optional, default=2) – Number of hidden layers in the neural network.
n_units (int, optional, default=100) – Number of units per hidden layer.
loss_function ({donsker_varadhan, donsker_varadhan_softplus, fdivergence}, default=`donsker_varadhan_softplus`) – The divergence metric to use for the MINE loss. Options include:
donsker_varadhan: Donsker-Varadhan representation of KL divergence.
donsker_varadhan_softplus: Softplus version of the Donsker-Varadhan representation.
fdivergence: f-divergence representation of mutual information.
optimizer_str ({RMSprop, sgd, adam, AdamW, Adagrad, Adamax, Adadelta}, default=`adam`) – Optimizer type to use for training the neural network. Must be one of:
RMSprop: Root Mean Square Propagation, an adaptive learning rate method.
sgd: Stochastic Gradient Descent, a simple and widely-used optimizer.
adam: Adaptive Moment Estimation, combining momentum and RMSProp for better convergence.
AdamW: Adam with weight decay, an improved variant of Adam with better regularization.
Adagrad: Adaptive Gradient Algorithm, adjusting the learning rate based on feature frequency.
Adamax: Variant of Adam based on infinity norm, more robust with sparse gradients.
Adadelta: An extension of Adagrad that seeks to reduce its aggressive learning rate decay.
learning_rate (float, optional, default=1e-4) – Learning rate for the optimizer.
reg_strength (float, optional, default=1e-10) – Regularization strength.
encode_classes (bool, optional, default=True) – Indicates if the target variable should be one-hot encoded.
random_state (int, optional, default=42) – Random state for reproducibility.
**kwargs (dict, optional) – Additional keyword arguments passed to the MineMIEstimatorMSE constructor.
- optimizer_cls
Optimizer class selected based on the optimizer_str parameter.
- Type:
object
- device
Device on which the model runs (cuda or cpu).
- Type:
torch.device
- final_loss
The final loss after training the model.
- Type:
float
- mi_val
The final estimated mutual information value.
- Type:
float
Notes
This class is particularly suited for scenarios involving hyperparameter tuning where the goal is to identify the optimal architecture that minimizes MSE during mutual information estimation.
Example
>>> estimator = MineMIEstimatorMSE(n_classes=3, n_features=10)
>>> estimator.fit(X_train, y_train)
>>> score = estimator.score(X_test, y_test)
>>> print(score)
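The encode_classes parameter controls whether the target is one-hot encoded. As a minimal sketch of what such an encoding produces, the helper one_hot below is hypothetical and not part of autoqild, and how the estimator combines the encoded labels with the features is not shown here:

```python
def one_hot(labels, n_classes):
    """Return one row per label, with a single 1.0 in the column of that class."""
    encoded = []
    for label in labels:
        if not 0 <= label < n_classes:
            raise ValueError(f"label {label} outside range [0, {n_classes})")
        row = [0.0] * n_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


# Three samples with integer class labels, matching n_classes=3 above.
y = [0, 2, 1]
y_encoded = one_hot(y, n_classes=3)
```

Each encoded row has length n_classes, so labels must be integers in the range 0 to n_classes - 1 for this sketch to apply.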
- estimate_mi(X, y, verbose=0, MON_ITER=100, **kwargs)
Estimate mutual information using the MINE model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
verbose (int, optional, default=0) – Verbosity level.
MON_ITER (int, optional, default=100) – Number of iterations for estimating MI.
**kwargs (dict, optional) – Additional keyword arguments.
- Returns:
mi_estimated – Estimated mutual information.
- Return type:
float
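For orientation, the donsker_varadhan and fdivergence options of loss_function correspond to two standard lower bounds on mutual information. The sketch below evaluates both from precomputed network scores. The function names and the score lists are illustrative assumptions, not autoqild code. The softplus variant is omitted because its exact form is not given here.

```python
import math


def donsker_varadhan_bound(t_joint, t_marginal):
    """E_joint[T] - log E_marginal[exp(T)], a lower bound on MI in nats."""
    mean_joint = sum(t_joint) / len(t_joint)
    # Log-mean-exp computed stably by subtracting the maximum score.
    t_max = max(t_marginal)
    log_mean_exp = t_max + math.log(
        sum(math.exp(t - t_max) for t in t_marginal) / len(t_marginal)
    )
    return mean_joint - log_mean_exp


def f_divergence_bound(t_joint, t_marginal):
    """E_joint[T] - E_marginal[exp(T - 1)], the f-divergence lower bound on MI."""
    mean_joint = sum(t_joint) / len(t_joint)
    mean_exp = sum(math.exp(t - 1.0) for t in t_marginal) / len(t_marginal)
    return mean_joint - mean_exp


# Hypothetical scores T(x, y) on joint samples and on permuted (marginal) samples.
t_joint = [1.2, 0.8, 1.0]
t_marginal = [-0.5, 0.1, -0.2]
dv = donsker_varadhan_bound(t_joint, t_marginal)
fd = f_divergence_bound(t_joint, t_marginal)
```

For the same scores, the Donsker-Varadhan value is never smaller than the f-divergence value, because log(z) <= z / e for every z > 0.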
- fit(X, y, epochs=100, batch_size=128, verbose=0, **kwd)
Fit the MINE model and estimate mutual information.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
epochs (int, optional, default=100) – Number of training epochs.
batch_size (int, optional, default=128) – Batch size used for training.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.
- Returns:
self – Fitted estimator.
- Return type:
MineMIEstimatorMSE
- predict(X, verbose=0)
Predict class labels for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class labels.
- Return type:
array-like of shape (n_samples,)
- predict_proba(X, verbose=0)
Predict class probabilities for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
p_pred – Predicted class probabilities.
- Return type:
array-like of shape (n_samples, n_classes)
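A probability matrix of shape (n_samples, n_classes) is commonly reduced to class labels by taking the most probable class per row. The sketch below shows that convention. Whether predict is implemented this way in this class is an assumption, and labels_from_probabilities is a hypothetical helper.

```python
def labels_from_probabilities(p_pred):
    """Return, for each row, the index of the largest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_pred]


# Hypothetical probabilities for two samples and three classes.
p_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
y_pred = labels_from_probabilities(p_pred)
```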
- pytorch_tensor_dataset(X, y, batch_size=64, i=2)
Create PyTorch tensor datasets for the input features and target labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
batch_size (int, optional, default=64) – Size of the batches used for training.
i (int, optional, default=2) – Seed increment for reproducibility.
- Returns:
tensor_xy (torch.Tensor) – Tensor containing the original data and labels.
tensor_xy_tilde (torch.Tensor) – Tensor containing the permuted data and labels.
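The two return values pair the features with their true labels and with permuted labels. A minimal sketch of that idea using plain lists is given below. The real method returns torch tensors and handles batching. The helper joint_and_permuted, its seed argument, and the choice to append raw rather than encoded labels are assumptions for illustration.

```python
import random


def joint_and_permuted(X, y, seed=0):
    """Pair each row with its own label, then again with a shuffled label."""
    rng = random.Random(seed)
    y_tilde = list(y)
    rng.shuffle(y_tilde)  # breaks the dependence between rows and labels
    xy = [list(row) + [label] for row, label in zip(X, y)]
    xy_tilde = [list(row) + [label] for row, label in zip(X, y_tilde)]
    return xy, xy_tilde


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 1, 0, 1]
xy, xy_tilde = joint_and_permuted(X, y, seed=2)
```

Rows of xy act as samples from the joint distribution, while rows of xy_tilde approximate samples from the product of the marginals, which is what a MINE-style loss contrasts.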
- score(X, y, sample_weight=None, verbose=0)
Compute the score of the MINE model using the mean squared error between the original and permuted samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
score – Model score, computed as the mean squared error between the original and permuted samples.
- Return type:
float