autoqild.dataset_readers.open_ml_timming_dr

Reader for OpenML datasets focusing on timing features for data leakage analysis.

Classes

OpenMLTimingDatasetReader(dataset_id, imbalance)

Reader for OpenML datasets that are specifically designed for timing-based attacks.

class autoqild.dataset_readers.open_ml_timming_dr.OpenMLTimingDatasetReader(dataset_id, imbalance, create_datasets=True, random_state=None, **kwargs)

Bases: object

Reader for OpenML datasets that are specifically designed for timing-based attacks.

This class is designed to process datasets that involve side-channel attacks based on timing, such as the Bleichenbacher timing attack. It reads, cleans, and processes the dataset, and provides methods to create datasets with class imbalance to simulate attack scenarios.

Parameters:

dataset_id (int) – The ID of the OpenML dataset.
imbalance (float) – The ratio of the number of minority class samples to the number of majority class samples. Must be between 0 and 1.
create_datasets (bool, default=True) – If True, creates leakage datasets during initialization.
random_state (int or RandomState instance, optional) – Random state for reproducibility.
**kwargs (dict) – Additional keyword arguments.
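
To make the imbalance parameter concrete, the sketch below computes the minority-class size implied by a given ratio. The helper name `minority_count`, the exact bounds check, and the round-down rule are assumptions for illustration; the source only states that the ratio must lie between 0 and 1.

```python
def minority_count(n_majority: int, imbalance: float) -> int:
    """Number of minority samples implied by a minority/majority ratio."""
    # Assumption: the bounds are treated as 0 < imbalance <= 1 in this sketch.
    if not 0.0 < imbalance <= 1.0:
        raise ValueError("imbalance must lie in (0, 1]")
    # Assumption: fractional sample counts are rounded down.
    return int(n_majority * imbalance)


print(minority_count(1000, 0.25))  # 250
print(minority_count(8, 1.0))      # 8 (balanced classes)
```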

logger

Logger instance for logging information.

Type: logging.Logger

dataset_id

The ID of the OpenML dataset.

Type: int

imbalance

The ratio of the number of minority class samples to the number of majority class samples.

Type: float

random_state

Random state for reproducibility.

Type: RandomState instance

correct_class

The correct class label, representing correctly formatted messages.

Type: str

vulnerable_classes

List of class labels representing vulnerable (incorrectly formatted) messages.

Type: list of str

n_features

Number of features in the dataset.

Type: int

fold_id

The fold ID as specified in the dataset description.

Type: int

delay

The delay associated with the timing attack in microseconds.

Type: int

dataset_dictionary

A dictionary where keys are vulnerable class labels and values are tuples of (X, y) for the respective classes.

Type: dict

Private Methods

__read_dataset__: Reads the dataset from OpenML and extracts relevant information. This method fetches the dataset using the OpenML API, extracts the raw data, and processes the dataset description to retrieve vulnerable class labels, number of features, and server information.

__create_leakage_datasets__: Creates one dataset per vulnerable class, each containing only the samples that belong to the correct class and to that one vulnerable class.
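
The pairing of the correct class with one vulnerable class at a time can be sketched with a boolean mask. This is a minimal illustration, not the library's implementation: the function name `split_into_leakage_datasets` is hypothetical, labels are assumed to be integers already, and whether the real method relabels each pair to a binary target is not stated in the source.

```python
import numpy as np


def split_into_leakage_datasets(X, y, correct_class, vulnerable_classes):
    """Build one (X, y) pair per vulnerable class."""
    datasets = {}
    for label in vulnerable_classes:
        # Keep only rows of the correct class or of this one vulnerable class.
        mask = (y == correct_class) | (y == label)
        datasets[label] = (X[mask], y[mask])
    return datasets


# Toy data: label 0 is the correct class, labels 1 and 2 are vulnerable.
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 1, 1, 2, 2])
pairs = split_into_leakage_datasets(X, y, correct_class=0, vulnerable_classes=[1, 2])
print(sorted(pairs))  # [1, 2]
```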

__clean_up_dataset__: Cleans and preprocesses the dataset. This method encodes categorical columns, formats class labels, fills missing values, and converts class label strings to integer values.
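
The clean-up steps described above could look roughly like the following pandas sketch. Everything specific here is an assumption: the function name `clean_up`, the column names, the median fill strategy, and the rule that the correct class maps to 0 are chosen for illustration and are not taken from the library.

```python
import pandas as pd


def clean_up(df, target, correct_class):
    """Return a numeric copy of df with integer class labels."""
    df = df.copy()
    feature_cols = [c for c in df.columns if c != target]
    for col in feature_cols:
        if not pd.api.types.is_numeric_dtype(df[col]):
            # Encode categorical strings as integer codes (missing becomes -1).
            df[col] = df[col].astype("category").cat.codes
    # Assumption: missing numeric values are filled with the column median.
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    # Format labels (strip whitespace), then map them to integers.
    # Assumption: the correct class becomes 0, vulnerable classes 1, 2, ...
    labels = df[target].str.strip()
    others = sorted(set(labels) - {correct_class})
    mapping = {correct_class: 0}
    mapping.update({name: i + 1 for i, name in enumerate(others)})
    df[target] = labels.map(mapping).astype(int)
    return df, mapping


# Hypothetical raw data with a categorical column and a missing timing value.
raw = pd.DataFrame({
    "cipher": ["AES", "RC4", "AES", "RC4"],
    "response_time": [10.0, None, 14.0, 12.0],
    "label": [" ok", "bad_padding", "ok", "bad_version"],
})
clean, mapping = clean_up(raw, target="label", correct_class="ok")
print(mapping)
```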

get_data(class_label=1)

Retrieves data for a specific class label.

Parameters:

class_label (int, default=1) – The class label for which to retrieve the data.

Returns:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.
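
As a rough picture of what a per-label lookup involves, the sketch below retrieves an (X, y) pair from a stand-in for dataset_dictionary. The dictionary contents and the use of integer keys are assumptions; the source does not say whether the real method performs extra steps, such as applying the imbalance sampling, before returning.

```python
import numpy as np

# Hypothetical stand-in for dataset_dictionary: vulnerable label -> (X, y).
dataset_dictionary = {
    1: (np.zeros((4, 3)), np.array([0, 0, 1, 1])),
    2: (np.ones((6, 3)), np.array([0, 0, 0, 2, 2, 2])),
}


def get_data(class_label=1):
    """Return the (X, y) pair stored for one vulnerable class label."""
    if class_label not in dataset_dictionary:
        raise KeyError(f"unknown class label: {class_label!r}")
    return dataset_dictionary[class_label]


X, y = get_data(class_label=2)
print(X.shape, y.shape)  # (6, 3) (6,)
```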

get_sampled_imbalanced_data(X, y)

Creates an imbalanced dataset by sampling from the data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Feature matrix.
y (array-like of shape (n_samples,)) – Target vector.

Returns:

X (array-like of shape (n_samples, n_features)) – Feature matrix after applying sampling to create imbalance.
y (array-like of shape (n_samples,)) – Target vector after applying sampling to create imbalance.
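
One way to produce such an imbalanced sample is to keep every majority-class row and draw a random subset of the minority class. The sketch below does this with NumPy; the function name `sample_imbalanced`, the `majority_label` argument, and the choice to downsample only the minority class are assumptions, and the seed is taken as a plain integer rather than a RandomState instance.

```python
import numpy as np


def sample_imbalanced(X, y, imbalance, majority_label=0, random_state=None):
    """Subsample the minority class so n_minority / n_majority is about imbalance."""
    rng = np.random.default_rng(random_state)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    # Target minority size implied by the ratio, capped at what is available.
    n_minority = min(len(minority_idx), int(len(majority_idx) * imbalance))
    keep_minority = rng.choice(minority_idx, size=n_minority, replace=False)
    # Restore the original row order after combining both index sets.
    keep = np.sort(np.concatenate([majority_idx, keep_minority]))
    return X[keep], y[keep]


# Toy data: 10 majority rows (label 0) and 10 minority rows (label 1).
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)
X_s, y_s = sample_imbalanced(X, y, imbalance=0.5, random_state=0)
print(X_s.shape, int((y_s == 1).sum()))  # (15, 2) 5
```

Passing the same integer seed reproduces the same subset, which mirrors the role of random_state described for the class.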