autoqild.mi_estimators.pc_softmax_estimator
MI estimator that uses probability-corrected softmax functions to assess the information content in classification scenarios.
Classes
PCSoftmaxMIEstimator estimates Mutual Information (MI) using a neural network trained with a modified softmax function.
- class autoqild.mi_estimators.pc_softmax_estimator.PCSoftmaxMIEstimator(n_classes, n_features, n_hidden=10, n_units=100, loss_function=NLLLoss(), optimizer_str='adam', learning_rate=0.001, reg_strength=0.001, is_pc_softmax=False, random_state=42)
Bases: MIEstimatorBase
PCSoftmaxMIEstimator estimates Mutual Information (MI) using a neural network trained with a modified softmax function.
This class uses a neural network to estimate the MI between input features and class labels. When is_pc_softmax is True, the network is trained with a probability-corrected (PC) softmax function that accounts for label proportions, which can help with imbalanced data.
- Parameters:
n_classes (int) – Number of classes in the classification task.
n_features (int) – Number of features or dimensionality of the input data.
n_hidden (int, optional, default=10) – Number of hidden layers in the neural network.
n_units (int, optional, default=100) – Number of units in each hidden layer.
loss_function (torch.nn.Module, optional, default=torch.nn.NLLLoss()) – Loss function to be used during training.
optimizer_str ({RMSprop, sgd, adam, AdamW, Adagrad, Adamax, Adadelta}, default=`adam`) –
Optimizer type to use for training the neural network. Must be one of:
RMSprop: Root Mean Square Propagation, an adaptive learning rate method.
sgd: Stochastic Gradient Descent, a simple and widely-used optimizer.
adam: Adaptive Moment Estimation, combining momentum and RMSProp for better convergence.
AdamW: Adam with weight decay, an improved variant of Adam with better regularization.
Adagrad: Adaptive Gradient Algorithm, adjusting the learning rate based on feature frequency.
Adamax: Variant of Adam based on infinity norm, more robust with sparse gradients.
Adadelta: An extension of Adagrad that seeks to reduce its aggressive learning rate decay.
learning_rate (float, optional, default=0.001) – Learning rate for the optimizer.
reg_strength (float, optional, default=0.001) – Regularization strength for the optimizer.
is_pc_softmax (bool, optional, default=False) – If True, use the custom softmax function that accounts for label proportions.
random_state (int, optional, default=42) – Seed for random number generation to ensure reproducibility.
- logger
Logger for logging messages and errors.
- Type:
logging.Logger
- optimizer
Optimizer used for training the neural network.
- Type:
torch.optim.Optimizer
- dataset_properties
Proportions of each class in the dataset.
- Type:
list
- final_loss
Final loss value after training.
- Type:
float
- mi_val
Estimated mutual information after training.
- Type:
float
- device
Device used for computation (CPU or GPU).
- Type:
torch.device
- decision_function(X, verbose=0)
Compute the decision function, in the form of class probabilities, for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
scores – Decision function values.
- Return type:
array-like of shape (n_samples, n_classes)
- estimate_mi(X, y, verbose=1, **kwargs)
Estimate mutual information with the trained neural network, using the softmax or the PC-softmax function.
\[I(X;Y) = H(Y) - H(Y|X)\]
Softmax Function:
\[S(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}\]
where:
\( z_k \) is the logit or raw score for class \( k \).
\( K \) is the total number of classes.
PC-Softmax Function:
\[S_{pc}(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j} \cdot p_j}\]
where:
\( z_k \) is the logit or raw score for class \( k \).
\( p_j = \frac{\text{counts}_j}{\text{total samples}} \) is the prior probability of class \( j \).
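The two functions differ only in the normaliser, which a small NumPy sketch can make concrete. This is an illustration of the formulas as written, not the library's internal code; the helper names `softmax` and `pc_softmax` and the sample labels and logits are hypothetical.

```python
import numpy as np


def softmax(z):
    """Standard softmax over the last axis of a logit array."""
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pc_softmax(z, priors):
    """PC-softmax: each exp(z_j) in the normaliser is weighted by the prior p_j."""
    z = z - z.max(axis=-1, keepdims=True)  # the shift cancels in the ratio
    e = np.exp(z)
    return e / (e * priors).sum(axis=-1, keepdims=True)


# Hypothetical labels and logits, for illustration only.
y = np.array([0, 0, 0, 1, 1, 2])
priors = np.bincount(y, minlength=3) / y.size  # p_j = counts_j / total samples
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])

s = softmax(logits)               # each row sums to 1
s_pc = pc_softmax(logits, priors)  # each row satisfies sum_k p_k * S_pc(z_k) = 1
```

Note that PC-softmax outputs are not probabilities: they sum to one only after weighting by the priors, and with uniform priors \( p_j = 1/K \) they reduce to \( K \) times the standard softmax.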
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data.
y (array-like of shape (n_samples,)) – Target labels.
verbose (int, optional, default=1) – Verbosity level.
**kwargs (dict, optional) – Additional keyword arguments.
- Returns:
mi_estimated – The estimated mutual information.
- Return type:
float
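As a rough illustration of the identity \( I(X;Y) = H(Y) - H(Y|X) \), the sketch below computes a generic plug-in estimate from predicted class probabilities. It assumes that \( H(Y) \) is taken from the empirical label proportions and that \( H(Y|X) \) is approximated by the average negative log-probability of the true label; whether `estimate_mi` does exactly this is not stated here, and the helper names `entropy` and `plug_in_mi` are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def plug_in_mi(proba, y):
    """Estimate I(X;Y) = H(Y) - H(Y|X) from predicted class probabilities.

    H(Y) comes from the empirical label proportions; H(Y|X) is approximated
    by the mean negative log-probability assigned to the true label.
    """
    n_classes = proba.shape[1]
    priors = np.bincount(y, minlength=n_classes) / y.size
    eps = 1e-12  # guards against log(0)
    h_y_given_x = -np.mean(np.log(proba[np.arange(y.size), y] + eps))
    return entropy(priors) - h_y_given_x


# Hypothetical predictions for a balanced two-class problem.
y = np.array([0, 1, 0, 1])
informative = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
uninformative = np.full((4, 2), 0.5)

mi_high = plug_in_mi(informative, y)   # approaches H(Y) = log 2
mi_low = plug_in_mi(uninformative, y)  # approaches 0
```

The two extremes bracket the estimate: predictions that identify the label perfectly recover \( H(Y) \), while predictions equal to the class priors carry no information about \( Y \).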
- fit(X, y, epochs=50, verbose=0, **kwd)
Fit the neural network to the data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,)) – Target labels.
epochs (int, optional, default=50) – Number of training epochs.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.
- Returns:
self – Fitted estimator.
- Return type:
- predict(X, verbose=0)
Predict class labels for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class labels.
- Return type:
array-like of shape (n_samples,)
- predict_proba(X, verbose=0)
Predict class probabilities for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
p_pred – Predicted class probabilities.
- Return type:
array-like of shape (n_samples, n_classes)
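The documented shapes of `predict_proba` and `predict` suggest the usual relationship between the two outputs. The sketch below assumes that a predicted label is the index of the largest class probability, which is the common convention but is not confirmed here; the array values are made up.

```python
import numpy as np

# Hypothetical predict_proba output: 3 samples, 3 classes.
p_pred = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# Assumed rule: the predicted label is the index of the largest probability.
y_pred = np.argmax(p_pred, axis=1)  # shape (n_samples,)
```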
- score(X, y, sample_weight=None, verbose=0)
Compute the score of the neural network.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – True labels for X.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
score – Negative loss of the model on the validation data.
- Return type:
float
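Because the score is a negated loss, higher values indicate a better fit and the best possible value is 0. A minimal NumPy sketch, assuming the default NLLLoss with mean reduction applied to log-probabilities; the helper name `negative_nll_score` and the sample values are hypothetical, and the library's exact computation may differ.

```python
import numpy as np


def negative_nll_score(log_proba, y):
    """Return minus the mean negative log-likelihood of the true labels."""
    nll = -np.mean(log_proba[np.arange(y.size), y])
    return -nll


# Hypothetical log-probabilities for three samples and two classes.
y = np.array([0, 1, 1])
confident = np.log(np.array([[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]))
hesitant = np.log(np.array([[0.6, 0.4], [0.5, 0.5], [0.5, 0.5]]))

score_confident = negative_nll_score(confident, y)
score_hesitant = negative_nll_score(hesitant, y)
```

Under this assumption the confident model scores higher (closer to 0) than the hesitant one.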