autoqild.mi_estimators.gmm_mi_estimator
Gaussian Mixture Model-based MI estimator for evaluating mutual information using probabilistic clustering.
Classes
GMMMIEstimator – Class for estimating Mutual Information (MI) using Gaussian Mixture Models (GMMs) and performing classification using Logistic Regression.
- class autoqild.mi_estimators.gmm_mi_estimator.GMMMIEstimator(n_classes, n_features, y_cat=False, covariance_type='full', reg_covar=1e-06, val_size=0.3, n_reduced=20, reduction_technique='select_from_model_rf', random_state=42, **kwargs)
Bases:
MIEstimatorBase
GMMMIEstimator class for estimating Mutual Information (MI) using Gaussian Mixture Models (GMMs) and performing classification using Logistic Regression.
This class leverages GMMs to estimate mutual information and uses feature reduction techniques to create a robust classification model. It evaluates different GMMs based on goodness-of-fit measures such as AIC, BIC, and log-likelihood.
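The goodness-of-fit comparison described above can be sketched with scikit-learn's GaussianMixture. This is a minimal illustration, not the class's actual selection logic: the helper name select_gmm, the candidate component counts, and the choice of BIC as the deciding criterion are all assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm(X, num_comps=(1, 2, 3, 4), covariance_type="full",
               reg_covar=1e-6, random_state=42):
    """Fit one GMM per component count and keep the one with the lowest BIC.

    Hypothetical helper: GMMMIEstimator may rank models differently.
    """
    results = []
    for k in num_comps:
        gmm = GaussianMixture(
            n_components=k,
            covariance_type=covariance_type,
            reg_covar=reg_covar,
            random_state=random_state,
        ).fit(X)
        results.append({
            "n_components": k,
            "aic": gmm.aic(X),               # lower is better
            "bic": gmm.bic(X),               # lower is better
            "log_likelihood": gmm.score(X),  # mean per-sample log-likelihood
            "model": gmm,
        })
    best = min(results, key=lambda r: r["bic"])
    return best, results


# Synthetic data: two well-separated clusters in two dimensions.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(-5.0, 1.0, size=(200, 2)),
    rng.normal(5.0, 1.0, size=(200, 2)),
])
best, results = select_gmm(X_demo)
```

AIC and BIC penalize the number of free parameters, so they can prefer a smaller model even when a larger one reaches a higher log-likelihood.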
- Parameters:
n_classes (int) – Number of classes in the classification data samples.
n_features (int) – Number of features or dimensionality of the inputs of the classification data samples.
y_cat (bool, optional, default=False) – Indicates if the target variable should be considered categorical or real-valued.
covariance_type ({full, tied, diag, spherical}, default=`full`) –
String describing the type of covariance parameters to use. Must be one of:
full: each component has its own general covariance matrix.
tied: all components share the same general covariance matrix.
diag: each component has its own diagonal covariance matrix.
spherical: each component has its own single variance.
reg_covar (float, default=1e-6) – Non-negative regularization added to the diagonal of the covariance matrices. Ensures that the covariance matrices are all positive definite.
val_size (float, optional, default=0.30) – Validation set size as a proportion of the dataset to estimate GMMs.
n_reduced (int, optional, default=20) – Number of features to keep after reduction, applied when n_features > 100.
reduction_technique (str, optional, default=`select_from_model_rf`) –
Technique to use for feature reduction, provided by scikit-learn. Must be one of:
recursive_feature_elimination_et: Uses ExtraTreesClassifier to recursively remove features and build a model.
recursive_feature_elimination_rf: Uses RandomForestClassifier to recursively remove features and build a model.
select_from_model_et: Meta-transformer for selecting features based on importance weights using ExtraTreesClassifier.
select_from_model_rf: Meta-transformer for selecting features based on importance weights using RandomForestClassifier.
pca: Principal Component Analysis for dimensionality reduction.
lda: Linear Discriminant Analysis for separating classes.
tsne: t-Distributed Stochastic Neighbor Embedding for visualization purposes.
nmf: Non-Negative Matrix Factorization for dimensionality reduction.
random_state (int or object, optional, default=42) – Random state for reproducibility.
**kwargs (dict, optional) – Additional keyword arguments.
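The four covariance_type options differ in how many covariance parameters each mixture component carries. The sketch below fits scikit-learn's GaussianMixture directly (it is assumed, not confirmed by this page, that GMMMIEstimator passes covariance_type and reg_covar through to it) and records the shape of the fitted covariances_ array for each option.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 3))  # 300 samples, 3 features

shapes = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gmm = GaussianMixture(
        n_components=2,
        covariance_type=cov_type,
        reg_covar=1e-6,
        random_state=42,
    ).fit(X_demo)
    shapes[cov_type] = gmm.covariances_.shape

# full:      (n_components, n_features, n_features)
# tied:      (n_features, n_features)
# diag:      (n_components, n_features)
# spherical: (n_components,)
```

Fewer covariance parameters mean a less flexible but more stable fit, which matters when the validation split leaves few samples per component.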
- Attributes:
y_cat (bool) – Indicates if the target variable should be considered categorical or real-valued.
num_comps (list) – List of component counts for GMM evaluation.
reg_covar (float) – Regularization parameter for the GMM covariance matrices.
n_models (int) – Number of GMM models to fit and evaluate.
covariance_type (str) – The covariance type for the GMM.
val_size (float) – Validation set size as a proportion of the dataset.
n_reduced (int) – Number of reduced features for dimensionality reduction.
reduction_technique (str) – Technique used for feature reduction.
selection_model (object or None) – The fitted feature selection model, or None if not yet fitted.
__is_fitted__ (bool) – Indicates whether the model is fitted.
cls_model (LogisticRegression) – The classification model used after feature reduction.
best_model (object or None) – The best fitted GMM model based on likelihood, or None if no model is selected.
best_gmm_model (object or None) – The best fitted GMM used for mutual information estimation.
best_likelihood (float or None) – The highest log-likelihood score achieved during model evaluation.
best_bic (float or None) – The best Bayesian Information Criterion (BIC) score.
best_aic (float or None) – The best Akaike Information Criterion (AIC) score.
best_mi (float or None) – The best estimated mutual information.
best_seed (int or None) – The random seed used to achieve the best model.
round (int or None) – The optimal round for feature selection.
logger (logging.Logger) – Logger instance for logging information.
- Private Methods:
- __get_goodnessof_fit__(gmm, X, y)
Calculate goodness-of-fit measures for a Gaussian Mixture Model (GMM) used for MI estimation.
- __transform__(X, y=None)
Reduce the feature matrix from n_features features to n_reduced features using the specified reduction technique.
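As an illustration of the default select_from_model_rf technique, the sketch below reduces a feature matrix to n_reduced columns with scikit-learn's SelectFromModel wrapped around a RandomForestClassifier. The helper name reduce_features and the forest settings are assumptions for the example; the private method may configure the selector differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


def reduce_features(X, y, n_reduced=20, random_state=42):
    """Keep the n_reduced features with the highest forest importances.

    Hypothetical helper, not part of autoqild.
    """
    forest = RandomForestClassifier(n_estimators=50, random_state=random_state)
    # threshold=-np.inf disables the importance cutoff, so max_features
    # alone decides how many features survive.
    selector = SelectFromModel(forest, max_features=n_reduced, threshold=-np.inf)
    selector.fit(X, y)
    return selector, selector.transform(X)


# Synthetic data with more than 100 features; only the first two carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 120))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

selector, X_reduced = reduce_features(X_demo, y_demo, n_reduced=20)
```

The fitted selector can be reused on new data with selector.transform, which is the role the selection_model attribute appears to play.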
- create_classification_model(X, y, **kwd)
Create the logistic regression classification model on the reduced feature space of n_reduced features.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
**kwd (dict, optional) – Additional keyword arguments.
- decision_function(X, verbose=0)
Predict confidence scores for samples. The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
decision – Decision function values.
- Return type:
array-like of shape (n_samples,)
- estimate_mi(X, y, verbose=0, **kwd)
Estimate mutual information using the best fitted GMM model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.
- Returns:
mi_estimated – Estimated mutual information.
- Return type:
float
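One way to obtain a mutual information estimate from GMMs, for a categorical target, is a plug-in estimate built from class-conditional densities. The sketch below is not the library's implementation: it assumes one GMM per class for p(x | y), empirical class frequencies for p(y), and averages log p(x | y) − log p(x) over the samples, giving a value in nats. The function name gmm_mi_sketch is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_mi_sketch(X, y, n_components=2, covariance_type="full",
                  reg_covar=1e-6, random_state=42):
    """Plug-in estimate of I(X; Y) in nats from class-conditional GMMs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    log_priors = np.log(counts / counts.sum())

    # Column j holds log p(x_i | y = classes[j]) for every sample i.
    log_cond = np.empty((X.shape[0], classes.size))
    for j, c in enumerate(classes):
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type=covariance_type,
            reg_covar=reg_covar,
            random_state=random_state,
        ).fit(X[y == c])
        log_cond[:, j] = gmm.score_samples(X)

    # log p(x_i) = log sum_j p(y_j) * p(x_i | y_j)
    log_marginal = np.logaddexp.reduce(log_cond + log_priors, axis=1)

    # log p(x_i | y_i): pick each sample's own class column.
    own = np.searchsorted(classes, y)
    log_own = log_cond[np.arange(X.shape[0]), own]
    return float(np.mean(log_own - log_marginal))


rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 250)
X_sep = rng.normal(size=(500, 2)) + 8.0 * y_demo[:, None]  # X determines y
X_ind = rng.normal(size=(500, 2))                          # X independent of y

mi_sep = gmm_mi_sketch(X_sep, y_demo)
mi_ind = gmm_mi_sketch(X_ind, y_demo)
```

With two balanced classes the estimate is bounded above by ln 2, which the well-separated data approaches, while the independent data stays near zero apart from a small in-sample bias.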
- fit(X, y, verbose=0, **kwd)
Fit the GMM model and estimate mutual information.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
verbose (int, optional, default=0) – Verbosity level.
**kwd (dict, optional) – Additional keyword arguments.
- Returns:
self – Fitted estimator.
- Return type:
GMMMIEstimator
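A typical call sequence, using only the constructor and method signatures listed on this page, might look like the sketch below. The synthetic data and the wrapper function run_gmm_mi_demo are assumptions for illustration; the import is guarded so the snippet still runs where autoqild is not installed.

```python
import numpy as np


def run_gmm_mi_demo(n_samples=300, n_features=5, seed=42):
    """Fit GMMMIEstimator on synthetic two-class data and return the MI estimate.

    Returns None when autoqild is not installed.
    """
    try:
        from autoqild.mi_estimators.gmm_mi_estimator import GMMMIEstimator
    except ImportError:
        return None

    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    # Shift class 1 along every feature so that X carries information about y.
    X = rng.normal(size=(n_samples, n_features)) + 2.0 * y[:, None]

    estimator = GMMMIEstimator(n_classes=2, n_features=n_features,
                               random_state=seed)
    estimator.fit(X, y)  # fit is documented to return the fitted estimator
    return float(estimator.estimate_mi(X, y))


mi_estimate = run_gmm_mi_demo()
```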
- predict(X, verbose=0)
Predict class labels for the input samples, reduced to n_reduced features, using the fitted logistic regression classification model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class labels.
- Return type:
array-like of shape (n_samples,)
- predict_proba(X, verbose=0)
Predict class probabilities for the input samples, reduced to n_reduced features, using the fitted logistic regression classification model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class probabilities.
- Return type:
array-like of shape (n_samples, n_classes)
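Since cls_model is documented as a LogisticRegression, the difference between predict and predict_proba can be illustrated on scikit-learn's classifier directly. This sketch skips the feature reduction step and uses synthetic three-class data, both assumptions made for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Three Gaussian blobs, 50 samples each, in four dimensions.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=center, scale=1.0, size=(50, 4))
    for center in (-3.0, 0.0, 3.0)
])
y_demo = np.repeat([0, 1, 2], 50)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

labels = clf.predict(X_demo)       # one label per sample
proba = clf.predict_proba(X_demo)  # one probability per sample and class
```

Each row of proba sums to 1, and its columns follow the order of clf.classes_.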
- score(X, y, sample_weight=None, verbose=0)
Compute the likelihood score of the GMM model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
score – The score of the model based on likelihood.
- Return type:
float