autoqild.automl.autogluon_classifier

AutoGluonClassifier is a wrapper for building, training, and evaluating an AutoML model using AutoGluon.

Classes

AutoGluonClassifier(n_features, n_classes[, ...])

class autoqild.automl.autogluon_classifier.AutoGluonClassifier(n_features, n_classes, time_limit=1800, output_folder=None, eval_metric='accuracy', use_hyperparameters=True, delete_tmp_folder_after_terminate=True, auto_stack=True, remove_boosting_models=True, verbosity=6, random_state=None, **kwargs)[source]

Bases: AutomlClassifier

AutoGluonClassifier is a wrapper for building, training, and evaluating an AutoML model using AutoGluon.

This class wraps AutoGluon for automatic machine learning (AutoML) on classification problems. It handles hyperparameter tuning, model stacking, and model evaluation, so an AutoGluon model can be trained and queried through the methods documented below with little setup.

Parameters:
  • n_features (int) – Number of features or dimensionality of the input data.

  • n_classes (int) – Number of classes in the classification problem.

  • time_limit (int, optional) – Time limit for training the model, in seconds. Default is 1800.

  • output_folder (str, optional) – Path to the directory where the trained model and related files will be saved. Default is None.

  • eval_metric (str, optional) – Evaluation metric used to assess the performance of the model. Default is 'accuracy'.

  • use_hyperparameters (bool, optional) – Flag indicating whether to use predefined hyperparameters for model training. Default is True.

  • delete_tmp_folder_after_terminate (bool, optional) – Flag indicating whether to delete the temporary folder after model training is complete. Default is True.

  • auto_stack (bool, optional) – Flag indicating whether to use automatic stacking of models in AutoGluon. Default is True.

  • remove_boosting_models (bool, optional) – Flag indicating whether to exclude boosting models (such as GBM, CAT, and XGB) from the hyperparameters. Default is True.

  • verbosity (int, optional) – Level of verbosity for logging and output. Default is 6.

  • random_state (int or None, optional) – Seed for random number generation to ensure reproducibility. Default is None.
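
The parameters above can be collected into a single keyword dictionary before the classifier is built. The following is a minimal sketch, assuming the `autoqild` package and AutoGluon are installed. The keyword names and defaults are taken from the signature above; the `n_features`, `n_classes`, and `random_state` values and the `make_classifier` helper are hypothetical.

```python
# Keyword arguments mirroring the documented signature and defaults.
# n_features, n_classes, and random_state are illustrative values for a
# hypothetical dataset, not defaults of the class.
config = {
    "n_features": 20,
    "n_classes": 3,
    "time_limit": 1800,  # seconds
    "output_folder": None,
    "eval_metric": "accuracy",
    "use_hyperparameters": True,
    "delete_tmp_folder_after_terminate": True,
    "auto_stack": True,
    "remove_boosting_models": True,
    "verbosity": 6,
    "random_state": 42,
}


def make_classifier(params):
    """Hypothetical helper that builds the classifier from a keyword dict.

    The import is done lazily so this snippet can be loaded even when
    autoqild is not installed.
    """
    from autoqild.automl.autogluon_classifier import AutoGluonClassifier

    return AutoGluonClassifier(**params)
```

Calling `make_classifier(config)` would return an unfitted instance. Keeping the settings in a dictionary makes it easy to log or reuse them across runs.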

logger

Logger object used for logging messages and errors.

Type:

logging.Logger

random_state

Random state instance for reproducibility.

Type:

np.random.RandomState

output_folder

Path to the directory where the trained model and related files will be saved.

Type:

str

delete_tmp_folder_after_terminate

Flag indicating whether to delete the temporary folder after model training is complete.

Type:

bool

hyperparameter_tune_kwargs

Dictionary containing options for hyperparameter tuning, including the scheduler and searcher.

Type:

dict

eval_metric

Evaluation metric used to assess the performance of the model.

Type:

str

use_hyperparameters

Flag indicating whether to use predefined hyperparameters for model training.

Type:

bool

verbosity

Level of verbosity for logging and output.

Type:

int

hyperparameters

Dictionary of hyperparameters used for model training. If use_hyperparameters is False, this is None.

Type:

dict or None

exclude_model_types

List of model types to exclude from the training process.

Type:

list

auto_stack

Flag indicating whether to use automatic stacking of models in AutoGluon.

Type:

bool

n_features

Number of features or dimensionality of the input data.

Type:

int

n_classes

Number of classes in the classification problem.

Type:

int

sample_weight

Method for determining sample weights during training. Default is auto_weight.

Type:

str

time_limit

Time limit for training the model, in seconds.

Type:

int

model

The AutoGluon model object, initialized after fitting.

Type:

autogluon.tabular.TabularPredictor or None

class_label

Name of the target label column.

Type:

str

columns

List of column names for the input DataFrame, including feature names and the class label.

Type:

list

leaderboard

DataFrame containing information about the models trained during the fitting process.

Type:

pandas.DataFrame or None

Private Methods

_is_fitted_

Property to check if the model is already fitted.

Type:

bool

convert_to_dataframe(X, y=None)[source]

Convert the input data to a DataFrame.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,), optional) – Target vector.

Returns:

df_data – DataFrame containing the input data.

Return type:

pandas.DataFrame
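
The conversion described above can be pictured as pairing each feature row with its optional target value under a list of column names. This is a sketch of the idea only, not the wrapper's implementation: the column names (`feature_0`, ..., `class`) and the `to_columns_and_rows` helper are assumptions made for illustration.

```python
def to_columns_and_rows(X, y=None, class_label="class"):
    """Pair each feature row with its optional target value.

    The column names ("feature_0", ..., and "class") are assumptions made
    for illustration; the wrapper's actual names are not documented here.
    """
    n_features = len(X[0])
    columns = [f"feature_{i}" for i in range(n_features)]
    rows = [list(row) for row in X]
    if y is not None:
        if len(y) != len(X):
            raise ValueError("X and y must have the same number of samples")
        columns.append(class_label)
        rows = [row + [label] for row, label in zip(rows, y)]
    return columns, rows


X = [[0.1, 1.2], [0.4, 0.9], [0.7, 0.3]]
y = [0, 1, 1]
columns, rows = to_columns_and_rows(X, y)
# With pandas installed, the table itself would be:
#   pandas.DataFrame(rows, columns=columns)
```

When `y` is omitted, only the feature columns are produced, which matches the prediction-time case where no labels are available.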

decision_function(X, verbose=0)[source]

Compute the decision function in the form of class probabilities for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

decision – Decision function values.

Return type:

array-like of shape (n_samples,)

fit(X, y, **kwd)[source]

Fit the AutoGluon model to the training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target vector.

  • **kwd (dict, optional) – Additional keyword arguments.
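
A typical workflow fits the classifier and then queries it on held-out data. The sketch below relies only on the method names documented on this page (`fit`, `predict`, `predict_proba`, `score`); the `train_and_evaluate` helper is hypothetical and accepts any object exposing those methods.

```python
def train_and_evaluate(clf, X_train, y_train, X_test, y_test):
    """Fit a classifier and collect its predictions and score on held-out data.

    `clf` is any object exposing the documented fit / predict /
    predict_proba / score methods, for example an AutoGluonClassifier.
    """
    clf.fit(X_train, y_train)
    return {
        "labels": clf.predict(X_test),
        "probabilities": clf.predict_proba(X_test),
        "score": clf.score(X_test, y_test),
    }
```

Since training is governed by `time_limit`, a call with the default settings may run for a substantial part of the 1800-second budget.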

get_k_rank_model(k)[source]

Get the k-th ranked model from the leaderboard.

Parameters:

k (int) – Rank of the model to retrieve.

Returns:

model – The k-th ranked model.

Return type:

autogluon.tabular.TabularPredictor

get_model(model_name)[source]

Get a model by its name from the leaderboard.

Parameters:

model_name (str) – Name of the model to retrieve.

Returns:

model – The specified model.

Return type:

autogluon.tabular.TabularPredictor
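
The rank-based and name-based lookups can be illustrated on a small stand-in for the leaderboard. AutoGluon's leaderboard is a pandas DataFrame that includes `model` and `score_val` columns; the rows, scores, and helper functions below are hypothetical, and zero-based ranking is an assumption because this page does not state whether `k` counts from zero or one.

```python
# Hypothetical leaderboard rows standing in for the pandas DataFrame.
leaderboard = [
    {"model": "RandomForestGini", "score_val": 0.91},
    {"model": "WeightedEnsemble_L2", "score_val": 0.94},
    {"model": "NeuralNetTorch", "score_val": 0.89},
]


def model_name_at_rank(rows, k):
    """Return the name of the k-th best row by validation score (k=0 is best).

    Zero-based ranking is an assumption; the wrapper may count from one.
    """
    ranked = sorted(rows, key=lambda r: r["score_val"], reverse=True)
    return ranked[k]["model"]


def row_by_name(rows, model_name):
    """Return the leaderboard row for a model name, or raise KeyError."""
    for row in rows:
        if row["model"] == model_name:
            return row
    raise KeyError(model_name)


best = model_name_at_rank(leaderboard, 0)
forest = row_by_name(leaderboard, "RandomForestGini")
```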

predict(X, verbose=0)[source]

Predict class labels for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class labels.

Return type:

array-like of shape (n_samples,)

predict_proba(X, verbose=0)[source]

Predict class probabilities for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

y_pred – Predicted class probabilities.

Return type:

array-like of shape (n_samples, n_classes)
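
The probability output has one row per sample and one column per class. A common way to turn such a matrix into labels is to take the most probable class in each row. The sketch below assumes classes are encoded as 0, ..., n_classes - 1; whether `predict` agrees with this argmax in every case is an assumption, and the probability values are made up.

```python
def argmax_labels(probabilities):
    """Pick, for each sample, the index of the most probable class."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Hypothetical output for 2 samples and 3 classes.
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
labels = argmax_labels(proba)
```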

score(X, y, sample_weight=None, verbose=0)[source]

Compute the balanced accuracy score for the input samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – True labels.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

  • verbose (int, optional, default=0) – Verbosity level.

Returns:

score – Balanced accuracy score.

Return type:

float
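
Balanced accuracy is the mean of the per-class recall, so each class contributes equally regardless of how many samples it has. The sketch below is a plain-Python, unweighted version for illustration. It ignores `sample_weight`, and how the wrapper computes the score internally is not stated on this page.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Class 0 recall is 3/4 and class 1 recall is 1/2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
result = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy would be 4/6, while the balanced score is (0.75 + 0.5) / 2 = 0.625, which shows how the minority class pulls the score down when it is predicted poorly.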