autoqild.automl.tabpfn_classifier
AutoTabPFNClassifier is an AutoML model wrapper designed to work with TabPFN (Tabular Prior-data Fitted Network) for classification tasks.
Classes
AutoTabPFNClassifier | AutoTabPFNClassifier is an AutoML model wrapper designed to work with TabPFN (Tabular Prior-data Fitted Network) for classification tasks.
- class autoqild.automl.tabpfn_classifier.AutoTabPFNClassifier(n_features, n_classes, n_ensembles=100, n_reduced=20, reduction_technique='select_from_model_rf', base_path=None, random_state=None, **kwargs)
Bases: AutomlClassifier
AutoTabPFNClassifier is an AutoML model wrapper designed to work with TabPFN (Tabular Prior-data Fitted Network) for classification tasks.
This class provides a high-level interface to automatically build, train, and evaluate a TabPFN model on tabular data. It supports various configurations and allows for dimensionality reduction if the number of features exceeds a specified threshold. The class is equipped to handle different feature reduction techniques and can operate on both CPU and GPU, depending on the available resources.
- Parameters:
n_features (int) – The number of features in the input data.
n_classes (int) – The number of classes in the classification task.
n_ensembles (int, default=100) – The number of ensemble configurations used by the TabPFN model.
n_reduced (int, default=20) – The number of features to reduce to if n_features exceeds 50.
reduction_technique (str, optional, default=`select_from_model_rf`) –
Technique to use for feature reduction, provided by scikit-learn. Must be one of:
recursive_feature_elimination_et: Uses ExtraTreesClassifier to recursively remove features and build a model.
recursive_feature_elimination_rf: Uses RandomForestClassifier to recursively remove features and build a model.
select_from_model_et: Meta-transformer for selecting features based on importance weights using ExtraTreesClassifier.
select_from_model_rf: Meta-transformer for selecting features based on importance weights using RandomForestClassifier.
pca: Principal Component Analysis for dimensionality reduction.
lda: Linear Discriminant Analysis for separating classes.
tsne: t-Distributed Stochastic Neighbor Embedding for visualization purposes.
nmf: Non-Negative Matrix Factorization for dimensionality reduction.
base_path (str or None, default=None) – The path where the trained model and other outputs are saved. If None, no model is saved.
random_state (int or None, default=None) – Seed for random number generation to ensure reproducibility.
**kwargs (dict) – Additional keyword arguments.
- n_features
The number of features in the input data.
- Type:
int
- n_classes
The number of classes in the classification task.
- Type:
int
- n_ensembles
The number of ensemble configurations used by the TabPFN model.
- Type:
int
- n_reduced
The number of features to reduce to if n_features exceeds 50.
- Type:
int
- reduction_technique
The technique used for feature reduction.
- Type:
str
- base_path
The path where the trained model and other outputs are saved.
- Type:
str or None
- random_state
Seed for random number generation to ensure reproducibility.
- Type:
int or None
- device
The device used for computation, either cpu or cuda depending on the availability of a GPU.
- Type:
str
- selection_model
The model used for dimensionality reduction. Initialized during the first call to __transform__.
- Type:
object or None
- logger
Logger object used for logging messages and errors.
- Type:
logging.Logger
- model
The TabPFN model object, initialized after fitting.
- Type:
TabPFNClassifier or None
- __is_fitted__
Flag indicating whether the dimensionality reduction model is fitted.
- Type:
bool
Private Methods
- __transform__(X, y=None)
Reduce the feature matrix from n_features to n_reduced features using the specified reduction technique.
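As a rough illustration of what the default `select_from_model_rf` technique could look like, the sketch below uses scikit-learn's `SelectFromModel` with a `RandomForestClassifier` to keep exactly `n_reduced` columns. The helper name `reduce_features`, the synthetic data, and the combination of `threshold=-np.inf` with `max_features` are assumptions made for illustration; the wrapper's actual internals may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


def reduce_features(X, y, n_reduced=20, random_state=0):
    """Hypothetical helper: keep the n_reduced most important columns of X."""
    estimator = RandomForestClassifier(n_estimators=50, random_state=random_state)
    # threshold=-inf disables the importance cutoff, so max_features alone
    # decides how many columns survive.
    selector = SelectFromModel(estimator, threshold=-np.inf, max_features=n_reduced)
    selector.fit(X, y)
    return selector, selector.transform(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))            # 60 features, above the threshold of 50
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels depend on the first two columns
selector, X_reduced = reduce_features(X, y, n_reduced=20)
print(X_reduced.shape)
```

The fitted selector is returned alongside the reduced matrix so that the same column subset can be applied to test data later with `selector.transform`.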
- decision_function(X, verbose=0)
Compute the decision function in the form of class probabilities for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
decision – Decision function values.
- Return type:
array-like of shape (n_samples,)
- fit(X, y, **kwd)
Fit the TabPFN model to the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
**kwd (dict, optional) – Additional keyword arguments.
- predict(X, verbose=0)
Predict class labels for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class labels.
- Return type:
array-like of shape (n_samples,)
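Class labels of shape (n_samples,) are commonly derived from class probabilities of shape (n_samples, n_classes) by taking the most probable class per row. The sketch below shows that step in isolation; the helper name `labels_from_proba` is hypothetical, and whether predict is implemented this way is an assumption.

```python
import numpy as np


def labels_from_proba(proba):
    """Hypothetical helper: pick the most probable class index for each row."""
    proba = np.asarray(proba)
    return proba.argmax(axis=1)


# Three samples, three classes; each row sums to 1.
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
y_pred = labels_from_proba(proba)
print(y_pred)
```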
- predict_proba(X, batch_size=128, verbose=0)
Predict class probabilities for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
batch_size (int, optional, default=128) – Number of samples for which predictions are obtained at one time using the learned model.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
y_pred – Predicted class probabilities.
- Return type:
array-like of shape (n_samples, n_classes)
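The effect of batch_size can be pictured as slicing X into consecutive chunks, predicting on each chunk, and stacking the results. The following is a minimal sketch of that pattern, not the wrapper's own code: `predict_proba_in_batches` and the stand-in model `uniform_proba` are hypothetical names introduced here.

```python
import numpy as np


def predict_proba_in_batches(predict_fn, X, batch_size=128):
    """Hypothetical helper: call predict_fn on consecutive slices of X."""
    X = np.asarray(X)
    chunks = [predict_fn(X[start:start + batch_size])
              for start in range(0, len(X), batch_size)]
    # Stack the per-batch results back into one (n_samples, n_classes) array.
    return np.vstack(chunks)


def uniform_proba(X_batch, n_classes=3):
    # Stand-in for a fitted model: assigns equal probability to every class.
    return np.full((len(X_batch), n_classes), 1.0 / n_classes)


X = np.zeros((300, 10))   # 300 samples are split into batches of 128, 128 and 44
proba = predict_proba_in_batches(uniform_proba, X, batch_size=128)
print(proba.shape)
```

Smaller batches lower peak memory use at the cost of more calls to the model, which matters most when predictions run on a GPU.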
- score(X, y, sample_weight=None, verbose=0)
Compute the balanced accuracy score for the input samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – True labels.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
verbose (int, optional, default=0) – Verbosity level.
- Returns:
acc – Balanced accuracy score.
- Return type:
float
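Balanced accuracy is the unweighted mean of per-class recall, so a minority class counts as much as a majority class. The sketch below computes it by hand for a small example; the function name `balanced_accuracy` is introduced here for illustration and the sketch ignores sample_weight. For unweighted inputs it should agree with scikit-learn's `balanced_accuracy_score`.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, ignoring class frequencies."""
    hits = defaultdict(int)     # correct predictions per true class
    totals = defaultdict(int)   # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
# Recall is 3/4 for class 0 and 1/2 for class 1.
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Plain accuracy on the same data would be 4/6, which is higher because it is dominated by the larger class.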