autoqild.utilities.metrics

This Python module provides functions for calculating various metrics related to mutual information and classification performance, including binary cross-entropy, upper and lower bounds of mutual information, AUC score, and more.

Functions

auc_score(y_true, p_pred)

Computes the AUC score for the given true labels and predicted probabilities.

bin_ce(p_e)

Computes the binary cross-entropy for a given probability p_e.

false_negative_rate(y_true, y_pred)

Computes the false negative rate (FNR).

false_positive_rate(y_true, y_pred)

Computes the false positive rate (FPR).

fanos_adjusted_lower_bound(y_true, y_pred)

Computes the adjusted Fano's lower bound for mutual information.

fanos_lower_bound(y_true, y_pred)

Computes Fano's lower bound for mutual information.

get_entropy_y(y_true)

Computes the entropy of the true labels.

helmann_raviv_function(n_classes, pe)

Computes the Hellman-Raviv function for a given error probability pe.

helmann_raviv_upper_bound(y_true, y_pred)

Computes the Hellman-Raviv upper bound for mutual information based on classification performance.

log_loss_estimation(y_true, y_pred)

Estimates mutual information from the log-loss of the predicted probabilities and the entropy of the true labels.

mid_point_mi(y_true, y_pred)

Computes the midpoint mutual information estimate by averaging the upper and lower bounds.

pc_softmax_estimation(y_true, p_pred)

Estimates the mutual information from the predicted probabilities using the softmax and PC-Softmax functions.

remove_nan_values(y_pred[, y_true])

Removes rows containing NaN values from the predicted probabilities and true labels.

santhi_vardi_upper_bound(y_true, y_pred)

Computes the Santhi-Vardi upper bound for mutual information.

autoqild.utilities.metrics.auc_score(y_true, p_pred)

Computes the AUC score for the given true labels and predicted probabilities.

Parameters:
  • y_true (ndarray) – True class labels.

  • p_pred (ndarray) – Predicted probabilities.

Returns:

auc_roc – AUC score.

Return type:

float

Notes

  • For multi-class scenarios, the AUC is computed using a one-vs-rest approach.

  • If the AUC computation raises an error, the predicted probabilities are normalized and the score is recomputed as a fallback.
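
A minimal NumPy sketch of the one-vs-rest AUC described above. It is not the library's implementation: the helper names `binary_auc` and `ovr_auc` are hypothetical, the AUC is computed through the Mann-Whitney statistic (ties count one half), per-class scores are macro-averaged, and the normalization fallback is assumed to rescale each row of `p_pred` to sum to one. Labels are assumed to be integers 0..K-1 matching the columns of `p_pred`.

```python
import numpy as np


def binary_auc(is_positive, scores):
    """AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[is_positive], scores[~is_positive]
    # Compare every positive score against every negative score.
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


def ovr_auc(y_true, p_pred):
    """Hypothetical one-vs-rest AUC with an assumed row-normalization fallback."""
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred, dtype=float)
    row_sums = p_pred.sum(axis=1, keepdims=True)
    if not np.allclose(row_sums, 1.0):
        # Assumed fallback: rescale each row so the class scores sum to one.
        p_pred = p_pred / row_sums
    if p_pred.shape[1] == 2:
        # Binary case: score each sample by the probability of class 1.
        return binary_auc(y_true == 1, p_pred[:, 1])
    # Multi-class case: one AUC per class (class k vs. all others), macro-averaged.
    per_class = [binary_auc(y_true == k, p_pred[:, k]) for k in range(p_pred.shape[1])]
    return float(np.mean(per_class))
```

The pairwise comparison is quadratic in the number of samples, which keeps the sketch short but is only suitable for small inputs.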

autoqild.utilities.metrics.bin_ce(p_e)

Computes the binary cross-entropy for a given probability p_e.

Parameters:

p_e (float) – Probability value for which binary cross-entropy is computed.

Returns:

binary_cross_entropy – The binary cross-entropy value.

Return type:

float

Notes

  • This function handles the edge cases p_e = 0 and p_e = 1 by adding or subtracting a small epsilon value, which prevents taking the logarithm of zero.
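
For a single probability, this quantity is the binary entropy h(p_e) = -p_e log p_e - (1 - p_e) log(1 - p_e). The sketch below assumes base-2 logarithms (bits) and a hypothetical clipping constant of 1e-10; the library's base and epsilon may differ.

```python
import numpy as np

EPS = 1e-10  # hypothetical clipping constant


def binary_entropy(p_e: float) -> float:
    """Binary entropy h(p_e) in bits, clipped away from 0 and 1 so log2 stays finite."""
    p = float(np.clip(p_e, EPS, 1.0 - EPS))
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))
```

With this sketch, binary_entropy(0.5) returns 1.0 (the maximum), and binary_entropy(0.0) returns a value very close to 0 instead of NaN.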

autoqild.utilities.metrics.false_negative_rate(y_true, y_pred)

Computes the false negative rate (FNR).

Parameters:
  • y_true (ndarray) – True binary labels.

  • y_pred (ndarray) – Predicted binary labels.

Returns:

fnr – False negative rate.

Return type:

float

Notes

  • FNR is calculated as the ratio of false negatives to the sum of false negatives and true positives.

autoqild.utilities.metrics.false_positive_rate(y_true, y_pred)

Computes the false positive rate (FPR).

Parameters:
  • y_true (ndarray) – True binary labels.

  • y_pred (ndarray) – Predicted binary labels.

Returns:

fpr – False positive rate.

Return type:

float

Notes

  • FPR is calculated as the ratio of false positives to the sum of false positives and true negatives.
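
Both rates follow from the binary confusion-matrix counts. A minimal sketch, assuming labels in {0, 1} with 1 as the positive class; the function names are hypothetical, and the sketch does not guard against a class being absent (which would divide by zero).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels, treating 1 as positive."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))
    fn = int(np.sum(t & ~p))
    fp = int(np.sum(~t & p))
    tn = int(np.sum(~t & ~p))
    return tp, fn, fp, tn


def fnr(y_true, y_pred):
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return fn / (fn + tp)  # missed positives / all actual positives


def fpr(y_true, y_pred):
    _, _, fp, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)  # false alarms / all actual negatives
```

For example, with y_true = [1, 1, 1, 0, 0] and y_pred = [1, 0, 0, 1, 0], the counts are tp = 1, fn = 2, fp = 1, tn = 1, so FNR = 2/3 and FPR = 1/2.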

autoqild.utilities.metrics.fanos_adjusted_lower_bound(y_true, y_pred)

Computes the adjusted Fano's lower bound for mutual information.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted class labels.

Returns:

fanos_adjusted_lb – Adjusted Fano's lower bound.

Return type:

float

Notes

  • This adjusted bound incorporates the binary cross-entropy of the error probability and provides a refined lower-bound estimate compared to the standard Fano's bound.

autoqild.utilities.metrics.fanos_lower_bound(y_true, y_pred)

Computes Fano's lower bound for mutual information.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted class labels.

Returns:

fanos_lb – Fano's lower bound.

Return type:

float

Notes

  • Fano's bound gives a lower estimate of mutual information by considering the classification error and the complexity of the classification task (in terms of the number of classes).
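
Fano's inequality, H(Y|X) <= h(P_e) + P_e log(K - 1), yields a lower bound on I(X;Y) = H(Y) - H(Y|X). The sketch below shows two common forms in bits. It assumes (this page does not confirm it) that log2 K stands in for H(Y), as for uniformly distributed labels, that the standard bound replaces h(P_e) by its maximum 1 and log2(K - 1) by log2 K, and that the adjusted bound keeps the binary entropy h(P_e). Function names are hypothetical; at least two classes are required.

```python
import numpy as np


def binary_entropy(p):
    """Binary entropy in bits, clipped to keep log2 finite."""
    p = float(np.clip(p, 1e-10, 1 - 1e-10))
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))


def error_rate_and_classes(y_true, y_pred):
    """Return (P_e, K): the misclassification rate and the number of classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred)), len(np.unique(y_true))


def fano_lb(y_true, y_pred):
    """Weakened Fano bound: log2 K - 1 - P_e log2 K."""
    pe, k = error_rate_and_classes(y_true, y_pred)
    return float(np.log2(k) - 1.0 - pe * np.log2(k))


def fano_adjusted_lb(y_true, y_pred):
    """Fano bound keeping h(P_e): log2 K - h(P_e) - P_e log2(K - 1)."""
    pe, k = error_rate_and_classes(y_true, y_pred)
    return float(np.log2(k) - binary_entropy(pe) - pe * np.log2(k - 1))
```

With y_true = [0, 1, 0, 1] and y_pred = [1, 1, 0, 1] (P_e = 0.25, K = 2), fano_lb returns -0.25 while fano_adjusted_lb returns about 0.189, so under these assumptions the adjusted form is the tighter of the two.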

autoqild.utilities.metrics.helmann_raviv_function(n_classes, pe)

Computes the Hellman-Raviv function for a given error probability pe.

The Hellman-Raviv function is used to estimate the upper bound of mutual information based on classification error rates.

Parameters:
  • n_classes (int) – The number of classes in the classification task.

  • pe (ndarray) – The error probability values for each sample.

Returns:

hrf_values – The computed Hellman-Raviv function values.

Return type:

ndarray

Notes

  • The function partitions the error probabilities into ranges based on the number of classes and computes the upper bound using a series of logarithmic transformations.
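
One standard piecewise bound of this kind is the piecewise-linear curve that connects the points (1 - 1/k, log2 k) for k = 1, ..., K. On its first segment it reduces to the classic Hellman-Raviv relation H(Y|X) >= 2 P_e. The sketch below assumes this form and base-2 logarithms. It illustrates the idea and is not the library's code; `hellman_raviv_sketch` is a hypothetical name. Error probabilities above 1 - 1/K fall outside every segment and are returned as NaN.

```python
import numpy as np


def hellman_raviv_sketch(n_classes, pe):
    """Piecewise-linear lower bound on H(Y|X) in bits, as a function of P_e."""
    pe = np.atleast_1d(np.asarray(pe, dtype=float))
    values = np.full(pe.shape, np.nan)
    for k in range(1, n_classes):
        lo, hi = (k - 1) / k, k / (k + 1)  # segment endpoints on the P_e axis
        # Slope chosen so the segment runs from (lo, log2 k) to (hi, log2(k + 1)).
        slope = k * (k + 1) * np.log2((k + 1) / k)
        in_segment = (pe >= lo) & (pe <= hi)
        values[in_segment] = np.log2(k) + slope * (pe[in_segment] - lo)
    return values
```

For example, hellman_raviv_sketch(2, [0.0, 0.25, 0.5]) gives [0.0, 0.5, 1.0], and hellman_raviv_sketch(3, 2/3) gives log2 3.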

autoqild.utilities.metrics.helmann_raviv_upper_bound(y_true, y_pred)

Computes the Hellman-Raviv upper bound for mutual information based on classification performance.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted class labels.

Returns:

hr_u – The Hellman-Raviv upper bound for mutual information.

Return type:

float

Notes

  • The Hellman-Raviv bound is calculated as the difference between the logarithm of the number of classes and the computed Hellman-Raviv function for the error rate.
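
In symbols, with φ_K the Hellman-Raviv function for K classes and P_e the error rate, the bound reads as follows. The numeric example is a sketch that assumes base-2 logarithms and φ_K(P_e) = 2P_e for P_e ≤ 1/2 (the classic Hellman-Raviv relation); the library's exact function may differ.

\[I(X;Y) \le \log_2 K - \phi_K(P_e)\]

\[K = 4,\ P_e = 0.25: \quad I(X;Y) \le \log_2 4 - 2 \cdot 0.25 = 2 - 0.5 = 1.5 \text{ bits}\]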

autoqild.utilities.metrics.log_loss_estimation(y_true, y_pred)

Estimates mutual information from the log-loss of the predicted probabilities and the entropy of the true labels.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted probabilities.

Returns:

estimated_mi – Estimated mutual information.

Return type:

float

Notes

  • The estimation is based on calculating the entropy H(Y) of the true labels and the average log-loss of the predictions.

  • NaN values in the input are removed before performing the estimation.
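
The average log-loss serves as an estimate of H(Y|X), so subtracting it from H(Y) estimates I(X;Y) = H(Y) - H(Y|X). A minimal sketch, assuming base-2 logarithms, integer labels 0..K-1 that index the columns of the probability matrix, and a hypothetical clipping constant; the function names are hypothetical.

```python
import numpy as np


def label_entropy(y_true):
    """Empirical entropy H(Y) of the labels, in bits."""
    _, counts = np.unique(y_true, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def log_loss_mi(y_true, p_pred, eps=1e-15):
    """Estimate I(X;Y) as H(Y) minus the average log-loss (both in bits)."""
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred, dtype=float)
    # Drop rows whose predicted probabilities contain NaN values.
    keep = ~np.isnan(p_pred).any(axis=1)
    y_true, p_pred = y_true[keep], p_pred[keep]
    # Probability assigned to the true class, clipped so log2 stays finite.
    p_true = np.clip(p_pred[np.arange(len(y_true)), y_true], eps, 1.0)
    avg_log_loss = float(-np.mean(np.log2(p_true)))
    return label_entropy(y_true) - avg_log_loss
```

Perfectly confident, correct predictions on balanced binary labels give 1 bit; uniform predictions give 0.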

autoqild.utilities.metrics.mid_point_mi(y_true, y_pred)

Computes the midpoint mutual information estimate by averaging the upper and lower bounds.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted class labels.

Returns:

mid_point – Midpoint mutual information estimate.

Return type:

float

Notes

  • This estimate is computed as the average of the Hellman-Raviv upper bound and Fano's lower bound.

  • The estimate is constrained to be non-negative by taking the maximum with zero.
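
In symbols, writing I_HR for the Hellman-Raviv upper bound and I_Fano for Fano's lower bound, the two notes above combine into:

\[\hat{I}_{\text{mid}} = \max\left(0, \frac{I_{\text{HR}} + I_{\text{Fano}}}{2}\right)\]

For example, hypothetical bounds of 1.5 bits and -0.25 bits give a midpoint of max(0, 0.625) = 0.625 bits.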

autoqild.utilities.metrics.pc_softmax_estimation(y_true, p_pred)

Estimates the mutual information from the predicted probabilities using the softmax and PC-Softmax functions.

The mutual information I(X; Y) is estimated using the formula:

\[I(X;Y) = H(Y) - H(Y|X)\]

where H(Y) is the entropy of the true labels and H(Y|X) is the conditional entropy estimated from the predicted probabilities.

Softmax Function:

\[S(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}\]

PC-Softmax Function:

\[S_{pc}(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j} \cdot p_j}\]

Parameters:
  • y_true (ndarray) – True class labels.

  • p_pred (ndarray) – Predicted probabilities.

Returns:

estimated_mi – Estimated mutual information.

Return type:

float

Notes

  • The PC-Softmax estimation adjusts the softmax probabilities using class priors, which can improve the robustness of the MI estimate.

  • If the input contains NaN values, they are removed before performing the estimation.
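
The sketch below applies the softmax and PC-Softmax formulas above directly to logits z. It assumes that the priors p_j are the empirical label frequencies and that the MI estimate is the sample average of log2 S_pc(z_y) at each true label y. The library function takes predicted probabilities instead, so this illustrates the formulas rather than its implementation. Function names are hypothetical, and labels are assumed to be integers 0..K-1.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax S(z_k) = exp(z_k) / sum_j exp(z_j)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


def pc_softmax(z, priors):
    """Row-wise PC-Softmax S_pc(z_k) = exp(z_k) / sum_j (exp(z_j) * p_j)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=1, keepdims=True))  # the shift cancels in the ratio
    return e / (e @ np.asarray(priors, dtype=float))[:, None]


def pc_softmax_mi(y_true, logits):
    """Hypothetical estimator: mean of log2 S_pc(z_y) at each sample's true label."""
    y_true = np.asarray(y_true)
    logits = np.asarray(logits, dtype=float)
    priors = np.bincount(y_true, minlength=logits.shape[1]) / len(y_true)
    s_pc = pc_softmax(logits, priors)
    return float(np.mean(np.log2(s_pc[np.arange(len(y_true)), y_true])))
```

With uniform priors, S_pc equals K times the ordinary softmax, and uninformative (constant) logits give an estimate of 0 bits.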

autoqild.utilities.metrics.santhi_vardi_upper_bound(y_true, y_pred)

Computes the Santhi-Vardi upper bound for mutual information.

Parameters:
  • y_true (ndarray) – True class labels.

  • y_pred (ndarray) – Predicted class labels.

Returns:

sv_u – The Santhi-Vardi upper bound.

Return type:

float

Notes

  • The Santhi-Vardi bound is based on the classification error rate and gives an upper estimate of the mutual information, adjusted logarithmically based on the number of classes.
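
A minimal sketch, assuming the bound takes the form I(X;Y) <= log2 K + log2(1 - P_e). This follows from H(Y|X) >= -log2(1 - P_e) when P_e is the Bayes error, and the sketch uses the classifier's error rate as a stand-in. The exact form is an assumption rather than something stated on this page, and the function name is hypothetical. An error rate of 1 makes the logarithm diverge to minus infinity.

```python
import numpy as np


def santhi_vardi_sketch(y_true, y_pred):
    """Assumed form of the bound in bits: log2 K + log2(1 - P_e)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_classes = len(np.unique(y_true))
    pe = float(np.mean(y_true != y_pred))  # classification error rate
    return float(np.log2(n_classes) + np.log2(1.0 - pe))
```

For balanced binary labels, a perfect classifier gives 1 bit and an error rate of 0.5 gives 0 bits.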