autoqild.mi_estimators.tab_pfn_estimator
MI estimator integrating the TabPFN model, optimized for small tabular datasets with efficient MI estimation.
Classes
TabPFNMIEstimator: integrates the TabPFN framework into the Mutual Information (MI) estimation process for classification tasks.
- class autoqild.mi_estimators.tab_pfn_estimator.TabPFNMIEstimator(n_features, n_classes, n_ensembles=100, n_reduced=20, reduction_technique='select_from_model_rf', base_path='./', random_state=None, **kwargs)
Bases: ClassificationMIEstimator
TabPFNMIEstimator integrates the TabPFN framework into the Mutual Information (MI) estimation process for classification tasks.
This class extends ClassificationMIEstimator by using TabPFN as the base estimator. TabPFN is an AutoML tool for small tabular datasets that produces fast predictions with pre-trained transformer models. The integration also supports feature reduction techniques (see reduction_technique and n_reduced), which makes it suitable for MI estimation when both accuracy and efficiency matter.
- Parameters:
n_features (int) – The number of features in the input data.
n_classes (int) – The number of classes in the classification task.
n_ensembles (int, optional, default=100) – Number of ensemble models used in TabPFN to enhance prediction stability.
n_reduced (int, optional, default=20) – Number of features to reduce to if reduction_technique is applied.
reduction_technique (str, optional, default=`select_from_model_rf`) –
Technique to use for feature reduction, provided by scikit-learn. Must be one of:
recursive_feature_elimination_et: Uses ExtraTreesClassifier to recursively remove features and build a model.
recursive_feature_elimination_rf: Uses RandomForestClassifier to recursively remove features and build a model.
select_from_model_et: Meta-transformer for selecting features based on importance weights using ExtraTreesClassifier.
select_from_model_rf: Meta-transformer for selecting features based on importance weights using RandomForestClassifier.
pca: Principal Component Analysis for dimensionality reduction.
lda: Linear Discriminant Analysis for separating classes.
tsne: t-Distributed Stochastic Neighbor Embedding for visualization purposes.
nmf: Non-Negative Matrix Factorization for dimensionality reduction.
base_path (str, optional, default=`./`) – Directory to save model files.
random_state (int or None, optional, default=None) – Seed for random number generation to ensure reproducibility.
**kwargs (dict, optional) – Additional keyword arguments passed to the AutoTabPFNClassifier constructor.
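The constructor arguments listed above can be collected and checked before any model is built. The sketch below is illustrative only: the helper `make_estimator_kwargs` and the constant `REDUCTION_TECHNIQUES` are hypothetical names, not part of autoqild, and the checks are assumptions about sensible input rather than the library's own validation. The technique names and defaults are copied from the parameter list.

```python
# Reduction technique names as listed in the parameter documentation.
REDUCTION_TECHNIQUES = (
    "recursive_feature_elimination_et",
    "recursive_feature_elimination_rf",
    "select_from_model_et",
    "select_from_model_rf",
    "pca",
    "lda",
    "tsne",
    "nmf",
)


def make_estimator_kwargs(n_features, n_classes, n_ensembles=100, n_reduced=20,
                          reduction_technique="select_from_model_rf",
                          base_path="./", random_state=None):
    """Hypothetical helper: validate arguments and return constructor kwargs."""
    if reduction_technique not in REDUCTION_TECHNIQUES:
        raise ValueError(f"unknown reduction_technique: {reduction_technique!r}")
    if n_features < 1 or n_classes < 2:
        raise ValueError("need at least one feature and two classes")
    return {
        "n_features": n_features,
        "n_classes": n_classes,
        "n_ensembles": n_ensembles,
        "n_reduced": n_reduced,
        "reduction_technique": reduction_technique,
        "base_path": base_path,
        "random_state": random_state,
    }


kwargs = make_estimator_kwargs(n_features=50, n_classes=2, random_state=42)

# With autoqild and its TabPFN dependency installed, the dictionary would be
# passed to the class documented here:
# from autoqild.mi_estimators.tab_pfn_estimator import TabPFNMIEstimator
# estimator = TabPFNMIEstimator(**kwargs)
```

Validating the technique name up front gives an early, readable error instead of a failure deep inside model fitting.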
- base_estimator
The base AutoML estimator used for classification.
- Type:
- learner_params
Parameters used to configure the base learner.
- Type:
dict
- base_learner
Instance of the TabPFN classifier used for learning.
- Type:
- decision_function(X, verbose=0)
Predict confidence scores for samples using the TabPFN model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
scores – Predicted confidence scores.
- Return type:
array-like of shape (n_samples, n_classes)
- estimate_mi(X, y, method='LogLossMI', **kwargs)
Estimate Mutual Information using the specified method with the TabPFN model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data.
y (array-like of shape (n_samples,)) – Target labels.
method (str, optional, default=`LogLossMI`) –
The method to use for mutual information estimation. Options include:
MidPointMI: Estimate MI using the Mid-point method.
LogLossMI: Estimate MI using the Log-Loss method.
LogLossMIIsotonicRegression: Estimate MI using the Log-Loss method with Isotonic Regression.
LogLossMIPlattScaling: Estimate MI using the Log-Loss method with Platt Scaling.
LogLossMIBetaCalibration: Estimate MI using the Log-Loss method with Beta Calibration.
LogLossMITemperatureScaling: Estimate MI using the Log-Loss method with Temperature Scaling.
LogLossMIHistogramBinning: Estimate MI using the Log-Loss method with Histogram Binning.
PCSoftmaxMI: Estimate MI using Softmax probabilities.
**kwargs (dict, optional) – Additional keyword arguments passed to the estimation methods.
- Returns:
mutual_information – Mean of the MI values estimated across the cross-validation splits.
- Return type:
float
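To make the idea of a log-loss based estimate averaged over splits concrete, here is a generic sketch. It is not autoqild's implementation: it assumes the textbook relation I(X; Y) = H(Y) - H(Y | X), with the conditional entropy approximated by the mean log-loss of the predicted probabilities, and it assumes labels are integer class indices. All function names are hypothetical.

```python
import math
from collections import Counter


def label_entropy(y):
    """Empirical Shannon entropy H(Y) of the labels, in bits."""
    n = len(y)
    return -sum((c / n) * math.log2(c / n) for c in Counter(y).values())


def mean_log_loss(y, proba, eps=1e-15):
    """Mean negative log2-probability assigned to the true class.

    Assumes each label is an integer index into its probability row.
    """
    total = 0.0
    for label, row in zip(y, proba):
        p = min(max(row[label], eps), 1.0)  # clip to avoid log(0)
        total -= math.log2(p)
    return total / len(y)


def log_loss_mi(y, proba):
    """Generic sketch: MI is approximated by H(Y) minus log-loss, floored at 0."""
    return max(label_entropy(y) - mean_log_loss(y, proba), 0.0)


def mean_mi_over_splits(splits):
    """Average the per-split estimates, mirroring the documented return value."""
    estimates = [log_loss_mi(y, proba) for y, proba in splits]
    return sum(estimates) / len(estimates)


# Two toy validation splits with balanced binary labels, so H(Y) = 1 bit.
informative = ([0, 1, 0, 1], [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]])
uninformative = ([0, 1, 0, 1], [[0.5, 0.5]] * 4)

mi_mean = mean_mi_over_splits([informative, uninformative])
```

Confident, correct predictions push the estimate towards H(Y), while predictions no better than the class prior give an estimate of zero. The calibrated variants listed above would adjust the probabilities before this step.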
- fit(X, y, **kwd)
Fit the TabPFN classification model to the data.
This method trains the TabPFN model on the provided dataset, using the hyperparameters and the reduction technique specified at initialization.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,)) – Target labels.
**kwd (dict, optional) – Additional keyword arguments passed to the fit method of the base learner.
- Returns:
self – Fitted estimator.
- Return type:
- predict(X, verbose=0)
Predict class labels for samples in X using the TabPFN model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class labels.
- Return type:
array-like of shape (n_samples,)
- predict_proba(X, verbose=0)
Predict class probabilities for samples in X using the TabPFN model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
p_pred – Predicted class probabilities.
- Return type:
array-like of shape (n_samples, n_classes)
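The shapes returned by predict and predict_proba are related in the usual way for classifiers: one row of n_classes probabilities per sample, and one label per sample. The sketch below assumes that the predicted label is the index of the largest probability in each row. The source does not state this, and `labels_from_probabilities` is a hypothetical helper, not an autoqild function.

```python
def labels_from_probabilities(p_pred):
    """Return the index of the most probable class for each row.

    Assumption: class labels are the column indices 0..n_classes-1.
    Ties resolve to the lowest index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in p_pred]


# Shape (n_samples=2, n_classes=3): each row sums to 1.
p_pred = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]

# Shape (n_samples,): one label per input row.
y_pred = labels_from_probabilities(p_pred)
```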
- score(X, y, sample_weight=None, verbose=0)
Return the accuracy score of the TabPFN model on the given test data and labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,)) – True labels for X.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
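As a reference for what mean accuracy with optional sample weights means, here is a minimal sketch. It assumes the common convention that each sample contributes its weight to the numerator when predicted correctly and to the denominator always. The function `weighted_accuracy` is hypothetical and stands in for whatever the estimator calls internally.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of (weighted) samples whose prediction matches the true label."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)  # unweighted: every sample counts once
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

plain = weighted_accuracy(y_true, y_pred)
# Doubling the weight of the misclassified third sample lowers the score.
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```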