autoqild.automl.tabpfn_classifier

AutoTabPFNClassifier is an AutoML model wrapper designed to work with TabPFN (Tabular Prior-data Fitted Network) for classification tasks.

Classes

AutoTabPFNClassifier(n_features, n_classes)

class autoqild.automl.tabpfn_classifier.AutoTabPFNClassifier(n_features, n_classes, n_ensembles=100, n_reduced=20, reduction_technique='select_from_model_rf', base_path=None, random_state=None, **kwargs)

Bases: AutomlClassifier

This class provides a high-level interface to automatically build, train, and evaluate a TabPFN model on tabular data. When the number of features exceeds 50, the feature matrix is first reduced to n_reduced features using the technique selected by reduction_technique. Computation runs on a CUDA GPU when one is available and on the CPU otherwise.
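The CPU/GPU choice is typically a single availability check. The sketch below assumes the wrapper relies on PyTorch's `torch.cuda.is_available()`; the helper name `select_device` is hypothetical, and the import is guarded so the snippet also runs where PyTorch is not installed.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper: assumes the wrapper checks torch.cuda.is_available().
    """
    try:
        import torch  # optional dependency; may be absent on CPU-only setups
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(select_device())  # "cpu" on a machine without a usable CUDA GPU
```

The returned string matches the values documented for the device attribute.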

Parameters:
  • n_features (int) – The number of features in the input data.

  • n_classes (int) – The number of classes in the classification task.

  • n_ensembles (int, default=100) – The number of ensemble configurations used by the TabPFN model.

  • n_reduced (int, default=20) – The number of features to reduce to if n_features exceeds 50.

  • reduction_technique (str, optional, default='select_from_model_rf') –

    Technique to use for feature reduction, provided by scikit-learn. Must be one of:

    • recursive_feature_elimination_et: Uses ExtraTreesClassifier to recursively remove features and build a model.

    • recursive_feature_elimination_rf: Uses RandomForestClassifier to recursively remove features and build a model.

    • select_from_model_et: Meta-transformer for selecting features based on importance weights using ExtraTreesClassifier.

    • select_from_model_rf: Meta-transformer for selecting features based on importance weights using RandomForestClassifier.

    • pca: Principal Component Analysis for dimensionality reduction.

    • lda: Linear Discriminant Analysis for separating classes.

    • tsne: t-Distributed Stochastic Neighbor Embedding for visualization purposes.

    • nmf: Non-Negative Matrix Factorization for dimensionality reduction.

  • base_path (str or None, default=None) – The path where the trained model and other outputs are saved. If None, no model is saved.

  • random_state (int or None, default=None) – Seed for random number generation to ensure reproducibility.

  • **kwargs (dict) – Additional keyword arguments.
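The interplay of n_features, n_reduced, and reduction_technique can be summarized as a small rule: reduction applies only above 50 features, and the technique must be one of the listed names. The helper below is a hypothetical sketch of that rule; in particular, raising ValueError for an unknown technique is an assumption, since the expected behavior for invalid names is not documented here.

```python
# Accepted values of reduction_technique, as listed in the parameter documentation.
REDUCTION_TECHNIQUES = frozenset({
    "recursive_feature_elimination_et",
    "recursive_feature_elimination_rf",
    "select_from_model_et",
    "select_from_model_rf",
    "pca",
    "lda",
    "tsne",
    "nmf",
})

# Feature count above which reduction is applied (documented as 50).
REDUCTION_THRESHOLD = 50


def effective_n_features(n_features: int, n_reduced: int = 20,
                         reduction_technique: str = "select_from_model_rf") -> int:
    """Number of features TabPFN would receive (hypothetical helper)."""
    if reduction_technique not in REDUCTION_TECHNIQUES:
        # Assumption: an unknown name is treated as an error.
        raise ValueError(f"Unknown reduction_technique: {reduction_technique!r}")
    if n_features > REDUCTION_THRESHOLD:
        return n_reduced
    return n_features


print(effective_n_features(30))   # 30: not above the threshold, no reduction
print(effective_n_features(120))  # 20: reduced to the default n_reduced
```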

Attributes:
  • n_features (int) – The number of features in the input data.

  • n_classes (int) – The number of classes in the classification task.

  • n_ensembles (int) – The number of ensemble configurations used by the TabPFN model.

  • n_reduced (int) – The number of features to reduce to if n_features exceeds 50.

  • reduction_technique (str) – The technique used for feature reduction.

  • base_path (str or None) – The path where the trained model and other outputs are saved.

  • random_state (int or None) – Seed for random number generation to ensure reproducibility.

  • device (str) – The device used for computation: cuda if a GPU is available, otherwise cpu.

  • selection_model (object or None) – The model used for dimensionality reduction; initialized during the first call to __transform__.

  • logger (logging.Logger) – Logger object used for logging messages and errors.

  • model (TabPFNClassifier or None) – The TabPFN model object; initialized when fit is called.

  • __is_fitted__ (bool) – Flag indicating whether the dimensionality reduction model has been fitted.

Private Methods
---------------
__clear_memory__

Clear memory to release resources held by PyTorch (torch).
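A common way to release such resources is to run Python's garbage collector and then ask PyTorch to return cached GPU memory. The sketch below shows that pattern under the assumption that __clear_memory__ does something similar; its actual body is not shown in this documentation, and the name `clear_memory` is hypothetical.

```python
import gc


def clear_memory() -> None:
    """Collect Python garbage and, if PyTorch and a GPU are present, empty the CUDA cache.

    Hypothetical sketch of what a helper such as __clear_memory__ might do.
    """
    gc.collect()  # reclaim unreachable Python objects, e.g. dropped tensors
    try:
        import torch
    except ImportError:
        return  # nothing else to release without PyTorch
    if torch.cuda.is_available():
        # Return unused cached blocks from PyTorch's CUDA allocator to the driver.
        torch.cuda.empty_cache()


clear_memory()
```

Note that `torch.cuda.empty_cache()` only frees memory that no live tensor references, so dropping references (e.g. to a previous model) before calling it matters.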

__transform__(X, y=None)

Reduce the feature matrix from n_features to n_reduced features using the configured reduction_technique. The reduction model (selection_model) is fitted during the first call.
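For the select_from_model_* techniques, the reduction amounts to ranking features by a tree ensemble's importance scores and keeping the top n_reduced columns. The stdlib sketch below shows only that selection step, with importance scores supplied by hand; a comparable scikit-learn setup would be SelectFromModel with max_features=n_reduced and threshold=-np.inf, though the wrapper's exact configuration is not documented here. Function names are hypothetical.

```python
from typing import List, Sequence


def top_k_indices(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest importance scores, returned in original column order."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])


def reduce_features(X: Sequence[Sequence[float]], importances: Sequence[float],
                    n_reduced: int) -> List[List[float]]:
    """Keep only the n_reduced most important columns of X (a list of rows)."""
    keep = top_k_indices(importances, n_reduced)
    return [[row[j] for j in keep] for row in X]


X = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
importances = [0.1, 0.4, 0.05, 0.45]  # e.g. a fitted forest's feature_importances_
print(reduce_features(X, importances, n_reduced=2))  # [[2.0, 4.0], [6.0, 8.0]]
```

The projection-based options (pca, lda, nmf, tsne) instead produce new components rather than a subset of the original columns.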

Public Methods
--------------
decision_function(X, verbose=0)

Compute the decision function, in the form of class probabilities, for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

decision – Decision function values.

Return type:

array-like of shape (n_samples, n_classes)

fit(X, y, **kwd)

Fit the TabPFN model to the training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target vector.

  • **kwd (dict, optional) – Additional keyword arguments.

predict(X, verbose=0)

Predict class labels for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class labels.

Return type:

array-like of shape (n_samples,)
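Assuming predict follows the usual convention for probabilistic classifiers and returns the most probable class from predict_proba (this documentation does not state it), the mapping from probabilities to labels is a per-row argmax. The helper `predict_from_proba` is hypothetical and returns class indices 0 to n_classes - 1 rather than the original label values.

```python
from typing import List, Sequence


def predict_from_proba(proba: Sequence[Sequence[float]]) -> List[int]:
    """Map each row of class probabilities to the index of its most likely class."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


proba = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6]]
print(predict_from_proba(proba))  # [0, 2]
```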

predict_proba(X, batch_size=128, verbose=0)

Predict class probabilities for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • batch_size (int, optional, default=128) – Number of samples passed to the fitted model in a single prediction call.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class probabilities.

Return type:

array-like of shape (n_samples, n_classes)
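Batching of this kind is typically used to bound peak memory use during inference. A minimal sketch, assuming predictions are computed per batch and concatenated in input order; `predict_in_batches` and `toy_model` are hypothetical stand-ins, not part of the library.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def predict_in_batches(predict_fn: Callable[[List[Row]], List[List[float]]],
                       X: Sequence[Row], batch_size: int = 128) -> List[List[float]]:
    """Call predict_fn on consecutive slices of at most batch_size rows and concatenate."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    out: List[List[float]] = []
    for start in range(0, len(X), batch_size):
        batch = list(X[start:start + batch_size])
        out.extend(predict_fn(batch))
    return out


# Toy stand-in for a fitted model: records the size of each batch it receives.
seen = []


def toy_model(batch):
    seen.append(len(batch))
    return [[0.5, 0.5] for _ in batch]


result = predict_in_batches(toy_model, [[0.0]] * 300, batch_size=128)
print(len(result), seen)  # 300 [128, 128, 44]
```

The output has one probability row per input sample regardless of batch_size; only the number of model calls changes.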

score(X, y, sample_weight=None, verbose=0)

Compute the balanced accuracy score for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – True labels.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

acc – Balanced accuracy score.

Return type:

float
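Balanced accuracy is the mean of per-class recall, so each class counts equally regardless of how often it occurs. The stdlib sketch below follows the definition used by scikit-learn's balanced_accuracy_score (without its adjusted option), including optional sample weights; whether score delegates to scikit-learn internally is not stated here, and `balanced_accuracy` is a hypothetical name.

```python
from collections import defaultdict
from typing import Hashable, Optional, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                      sample_weight: Optional[Sequence[float]] = None) -> float:
    """Mean per-class recall, weighting each sample by sample_weight (default 1)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = defaultdict(float)    # weighted count of true samples per class
    correct = defaultdict(float)  # weighted count of correct predictions per class
    for t, p, w in zip(y_true, y_pred, sample_weight):
        total[t] += w
        if t == p:
            correct[t] += w
    recalls = [correct[c] / total[c] for c in total if total[c] > 0]
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # (3/4 + 1/2) / 2 = 0.625
```

Plain accuracy on the same labels would be 4/6, so the imbalanced minority class pulls the balanced score down.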