autoqild.mi_estimators.pc_softmax_estimator

MI estimator that uses the probability-corrected softmax (PC-softmax) to estimate the mutual information between input features and class labels in classification tasks.

Classes

PCSoftmaxMIEstimator(n_classes, n_features)

PCSoftmaxMIEstimator estimates Mutual Information (MI) using a neural network trained with a modified softmax function.

class autoqild.mi_estimators.pc_softmax_estimator.PCSoftmaxMIEstimator(n_classes, n_features, n_hidden=10, n_units=100, loss_function=NLLLoss(), optimizer_str='adam', learning_rate=0.001, reg_strength=0.001, is_pc_softmax=False, random_state=42)

Bases: MIEstimatorBase

PCSoftmaxMIEstimator estimates Mutual Information (MI) using a neural network trained with a modified softmax function.

This class trains a neural network classifier and uses its outputs to estimate the MI between the input features and the class labels. When is_pc_softmax=True, the network uses a custom softmax function that accounts for the class proportions (label priors), which can help with imbalanced data.

Parameters:

n_classes (int) – Number of classes in the classification task.
n_features (int) – Number of features or dimensionality of the input data.
n_hidden (int, optional, default=10) – Number of hidden layers in the neural network.
n_units (int, optional, default=100) – Number of units in each hidden layer.
loss_function (torch.nn.Module, optional, default=torch.nn.NLLLoss()) – Loss function to be used during training.
optimizer_str ({'RMSprop', 'sgd', 'adam', 'AdamW', 'Adagrad', 'Adamax', 'Adadelta'}, optional, default='adam') –
Optimizer used to train the neural network. Must be one of:
- 'RMSprop': Root Mean Square Propagation, an adaptive learning-rate method.
- 'sgd': Stochastic Gradient Descent, a simple and widely used optimizer.
- 'adam': Adaptive Moment Estimation, which combines momentum with RMSprop-style adaptive learning rates.
- 'AdamW': Adam with decoupled weight decay, a variant of Adam with better regularization.
- 'Adagrad': Adaptive Gradient Algorithm, which scales each parameter's learning rate by its accumulated squared gradients.
- 'Adamax': a variant of Adam based on the infinity norm.
- 'Adadelta': an extension of Adagrad that counteracts its aggressively decaying learning rate.
learning_rate (float, optional, default=0.001) – Learning rate for the optimizer.
reg_strength (float, optional, default=0.001) – Regularization strength for the optimizer.
is_pc_softmax (bool, optional, default=False) – If True, use the custom softmax function that accounts for label proportions.
random_state (int, optional, default=42) – Seed for random number generation to ensure reproducibility.

logger

Logger for logging messages and errors.

Type: logging.Logger

optimizer

Optimizer used for training the neural network.

Type: torch.optim.Optimizer

class_net

Instance of the neural network used for classification.

Type: ClassNet

dataset_properties

Proportions of each class in the dataset.

Type: list
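A minimal sketch of how class proportions like dataset_properties (and the priors \(p_j\) used by PC-softmax) can be computed from labels. The function name class_proportions is hypothetical, and it assumes labels are integers from 0 to n_classes - 1; it is not the library's own code.

```python
from collections import Counter


def class_proportions(y, n_classes):
    """Hypothetical helper: fraction of samples in each class 0..n_classes-1."""
    counts = Counter(y)
    total = len(y)
    # Classes that never occur get proportion 0.0 instead of being skipped.
    return [counts.get(k, 0) / total for k in range(n_classes)]


y = [0, 0, 0, 1, 1, 2, 0, 1]
print(class_proportions(y, 3))  # [0.5, 0.375, 0.125]
```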

final_loss

Final loss value after training.

Type: float

mi_val

Estimated mutual information after training.

Type: float

device

Device used for computation (CPU or GPU).

Type: torch.device

decision_function(X, verbose=0)

Compute the decision function in the form of class probabilities for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

scores – Decision function values.

Return type:

array-like of shape (n_samples, n_classes)

estimate_mi(X, y, verbose=1, **kwargs)

Estimate the mutual information between X and y using the trained neural network, whose output applies either the softmax or the PC-softmax function (selected by is_pc_softmax). The target quantity is:

\[I(X;Y) = H(Y) - H(Y|X)\]
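The decomposition can be checked on a small discrete example. The sketch below computes \(H(Y) - H(Y|X)\) exactly from a hypothetical joint probability table joint[x][y] in plain Python. It uses natural logarithms, so the result is in nats; the page does not state which base the estimator reports. This only illustrates the quantity being estimated, not how the neural estimator computes it.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X) for a joint table joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_y = entropy(p_y)
    # H(Y|X) = sum_x p(x) * H(Y | X = x)
    h_y_given_x = sum(
        px * entropy([pxy / px for pxy in row])
        for px, row in zip(p_x, joint)
        if px > 0
    )
    return h_y - h_y_given_x


# Y is a copy of a fair binary X, so I(X;Y) = H(Y) = ln 2
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # approximately 0.6931
# X and Y independent, so I(X;Y) = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```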

Softmax Function:

\[S(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}\]

where:

\(z_k\) is the logit or raw score for class \(k\).

\(K\) is the total number of classes.
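A minimal plain-Python sketch of the softmax formula. Subtracting \(\max_j z_j\) from every logit before exponentiating avoids overflow and leaves \(S(z_k)\) unchanged, because the common factor cancels between numerator and denominator. This is an illustration only; the estimator itself runs in PyTorch.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(zk - m) for zk in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```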

PC-Softmax Function:

\[S_{pc}(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j} \cdot p_j}\]

where:

\(z_k\) is the logit or raw score for class \(k\).

\(p_j = \frac{\text{counts}_j}{\text{total samples}}\) is the prior probability of class \(j\).
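A plain-Python sketch of the PC-softmax formula, plus one common way to turn it into an MI estimate: averaging \(\log S_{pc}(z_{y_i})\) over samples, which estimates \(I(X;Y)\) when the network is trained so that \(S_{pc}(z_y)\) approximates \(p(y|x)/p(y)\). That this is how estimate_mi uses the outputs is an assumption; the page does not spell it out. The helper names pc_softmax and mi_from_logits are hypothetical.

```python
import math


def pc_softmax(z, priors):
    """S_pc(z_k) = exp(z_k) / sum_j exp(z_j) * p_j, stabilized by a max shift."""
    m = max(z)
    denom = sum(math.exp(zj - m) * pj for zj, pj in zip(z, priors))
    return [math.exp(zk - m) / denom for zk in z]


def mi_from_logits(logits, labels, priors):
    """Hypothetical helper: mean log PC-softmax output at the true label (nats)."""
    total = sum(math.log(pc_softmax(z, priors)[y]) for z, y in zip(logits, labels))
    return total / len(labels)


# Uninformative logits: every S_pc equals 1 / sum(p_j) = 1, so the estimate is 0
print(mi_from_logits([[0.0, 0.0, 0.0]] * 4, [0, 1, 2, 0], [0.5, 0.25, 0.25]))  # 0.0
# Confident, correct predictions on balanced binary labels approach H(Y) = ln 2
print(mi_from_logits([[10.0, -10.0], [-10.0, 10.0]], [0, 1], [0.5, 0.5]))
```

The second call returns a value just below ln 2, which matches \(H(Y)\) for balanced binary labels: a perfect classifier removes all uncertainty, so \(H(Y|X) = 0\).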

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.
y (array-like of shape (n_samples,)) – Target labels.
verbose (int, optional, default=1) – Verbosity level.
**kwargs (dict, optional) – Additional keyword arguments.

Returns:

mi_estimated – The estimated mutual information.

Return type:

float

fit(X, y, epochs=50, verbose=0, **kwd)

Fit the neural network to the data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,)) – Target labels.
epochs (int, optional, default=50) – Number of training epochs.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.

Returns:

self – Fitted estimator.

Return type:

PCSoftmaxMIEstimator

predict(X, verbose=0)

Predict class labels for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class labels.

Return type:

array-like of shape (n_samples,)
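Assuming predict returns, for each sample, the class with the highest probability in the output of predict_proba (the page does not state this), the relationship between the two methods can be sketched as follows; predict_from_proba is a hypothetical helper.

```python
def predict_from_proba(p_pred):
    """Index of the largest probability in each row (first index wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_pred]


p_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(predict_from_proba(p_pred))  # [0, 2]
```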

predict_proba(X, verbose=0)

Predict class probabilities for the input samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.

Returns:

p_pred – Predicted class probabilities.

Return type:

array-like of shape (n_samples, n_classes)

score(X, y, sample_weight=None, verbose=0)

Compute the score of the neural network on the given samples, defined as the negative of its loss.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – True labels for X.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.

Returns:

score – Negative loss of the model on the validation data (X, y); higher is better.

Return type:

float
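A sketch of a "negative loss" score, assuming the default NLLLoss is applied to the log-probability of the true class with mean reduction. Using sample_weight as weights in a weighted mean is a hypothetical choice for illustration; the page does not say how sample_weight enters the score, and negative_nll_score is not a library function.

```python
import math


def negative_nll_score(p_pred, y, sample_weight=None):
    """Negative (weighted) mean negative log-likelihood of the true labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    total_w = sum(sample_weight)
    # Mean NLL: average of -log p(true class) over samples
    nll = -sum(w * math.log(row[label]) for row, label, w in zip(p_pred, y, sample_weight)) / total_w
    return -nll  # higher is better; 0.0 for a perfect, fully confident model


p_pred = [[0.5, 0.5], [0.25, 0.75]]
print(negative_nll_score(p_pred, [0, 1]))  # mean of ln 0.5 and ln 0.75, about -0.4904
```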