Configuring AutoML (Model Selection and Hyperparameter Optimization)


This notebook is part of the CaTabRa GitHub repository.

This short example demonstrates how to configure model selection and hyperparameter optimization when training prediction models in CaTabRa’s main data analysis workflow (in particular, the function analyze()).

Familiarity with CaTabRa’s main data analysis workflow is assumed. A step-by-step introduction can be found in CaTabRa Workflow.

Inspect Default Configuration

Let’s start by having a look at CaTabRa’s default configuration:

[1]:
from catabra.core import config
config.DEFAULT_CONFIG
[1]:
{'automl': 'auto-sklearn',
 'ensemble_size': 10,
 'ensemble_nbest': 10,
 'memory_limit': 3072,
 'time_limit': 1,
 'jobs': 1,
 'copy_analysis_data': False,
 'copy_evaluation_data': False,
 'static_plots': True,
 'interactive_plots': False,
 'bootstrapping_repetitions': 0,
 'explainer': 'shap',
 'binary_classification_metrics': ['roc_auc', 'accuracy', 'balanced_accuracy'],
 'multiclass_classification_metrics': ['accuracy', 'balanced_accuracy'],
 'multilabel_classification_metrics': ['f1_macro'],
 'regression_metrics': ['r2', 'mean_absolute_error', 'mean_squared_error'],
 'ood_class': 'autoencoder',
 'ood_source': 'internal',
 'ood_kwargs': {},
 'auto-sklearn_include': None,
 'auto-sklearn_exclude': None,
 'auto-sklearn_resampling_strategy': None,
 'auto-sklearn_resampling_strategy_arguments': None}

A detailed explanation of the individual config parameters can be found in Configuration. The parameters that control model selection and hyperparameter optimization in general appear at the top of the list:

  • "automl": Selected AutoML backend. By default, CaTabRa uses auto-sklearn.

  • "ensemble_size": Size of the final ensemble, i.e., the number of individual models to include. Combining models into an ensemble typically improves overall performance and is therefore enabled by default. It can be disabled by setting this parameter to 1.

  • "ensemble_nbest": Number of individual models to consider for ensemble building.

  • "memory_limit": Memory limit for individual prediction models, in MB.

  • "time_limit": Time limit for overall model training, in minutes; negative means no time limit.

  • "jobs": Number of parallel jobs to use; negative means all available processors.
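
As a rough illustration of how these general parameters fit together, the sketch below overrides a few of them on top of a copy of the defaults printed above. The names general_defaults, overrides and effective are illustrative only and not part of CaTabRa, and the plain dict merge is an assumption made for the example, not a description of CaTabRa's internals.

```python
# Values copied from the default configuration printed above.
general_defaults = {
    'automl': 'auto-sklearn',
    'ensemble_size': 10,
    'ensemble_nbest': 10,
    'memory_limit': 3072,   # MB per individual prediction model
    'time_limit': 1,        # minutes; negative means no time limit
    'jobs': 1,              # negative means all available processors
}

# Hypothetical overrides: no ensembling, 10-minute budget, all processors.
overrides = {'ensemble_size': 1, 'time_limit': 10, 'jobs': -1}

# Plain dict merge, for illustration only.
effective = {**general_defaults, **overrides}
print(effective)
```

Parameters that are not overridden, such as "memory_limit", keep their default values.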

In addition, there are parameters specifically controlling the behavior of the auto-sklearn backend; they are described in detail here. Each of them is prefixed with "auto-sklearn_":

  • "auto-sklearn_include": Components that are included in hyperparameter optimization, for each step of the modeling pipeline. Useful for restricting the search space to a clearly defined subset, e.g., one involving only a single model class.

  • "auto-sklearn_exclude": Components that are excluded from hyperparameter optimization, for each step of the modeling pipeline. If both "auto-sklearn_include" and "auto-sklearn_exclude" are given, precisely those components appearing in the former and not appearing in the latter are included.

  • "auto-sklearn_resampling_strategy": The resampling strategy to use for internal validation, i.e., for estimating how well a model generalizes to unseen data. Most frequently used values are strings like "holdout" and "cv", but in principle any subclass of sklearn.model_selection.BaseCrossValidator can be provided.

  • "auto-sklearn_resampling_strategy_arguments": Additional arguments for the resampling strategy, like the number of folds in k-fold cross validation ("cv").
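
The interplay of "auto-sklearn_include" and "auto-sklearn_exclude" described above amounts to a set difference per pipeline step. The following sketch spells that rule out in plain Python. The helper effective_components is hypothetical and not part of CaTabRa or auto-sklearn, and the component names other than random_forest are assumed to be valid auto-sklearn classifier names.

```python
def effective_components(include, exclude):
    """Components named in `include` but not in `exclude`, per pipeline step.

    Hypothetical helper; only steps that appear in `include` are considered.
    """
    result = {}
    for step, names in include.items():
        excluded = set(exclude.get(step, []))
        result[step] = [name for name in names if name not in excluded]
    return result


include = {'classifier': ['random_forest', 'extra_trees', 'gradient_boosting']}
exclude = {'classifier': ['extra_trees']}
print(effective_components(include, exclude))
```

Here, only random_forest and gradient_boosting would remain in the search space for the classifier step.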

Change Configuration

Changing the AutoML configuration is easy: simply update the config dict when calling catabra.analysis.analyze(), as demonstrated below. We focus on a binary classification problem here, but everything applies equally to other prediction tasks.

[2]:
# load dataset
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(as_frame=True, return_X_y=True)
[3]:
# add target labels to DataFrame
X['diagnosis'] = y
[4]:
# split into train and test sets by adding a column with corresponding values
# the name of the column is arbitrary; CaTabRa tries to "guess" which samples belong to which set based on the column name and values
X['train'] = X.index <= 0.8 * len(X)
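
To see what this rule does to the data, the sketch below applies the same comparison to a plain integer index and counts the resulting set sizes. It assumes the default integer index 0, 1, ..., n - 1 and 569 rows, the size of scikit-learn's breast cancer dataset; the variable names are illustrative.

```python
n_samples = 569  # assumed number of rows in the breast cancer dataset

# Same rule as above, applied to the default integer index 0, 1, ..., n_samples - 1.
is_train = [i <= 0.8 * n_samples for i in range(n_samples)]

n_train = sum(is_train)
n_test = n_samples - n_train
print(n_train, n_test)
```

Under these assumptions, roughly 80% of the samples are marked as training samples and the remaining ones form the test set.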

The keyword argument config of analyze() updates the default config dict (catabra.core.config.DEFAULT_CONFIG). The value passed to config can be either a dict or the path to a JSON file containing such a dict. The latter is especially useful on the command line.
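
A minimal sketch of the JSON variant, using only Python's standard library: the config dict is written to a file whose path could then be passed as config. The file name automl_config.json and the temporary directory are arbitrary choices made for this illustration.

```python
import json
import os
import tempfile

config = {
    'ensemble_size': 1,
    'auto-sklearn_include': {'classifier': ['random_forest']},
    'auto-sklearn_resampling_strategy': 'cv',
    'auto-sklearn_resampling_strategy_arguments': {'folds': 5},
}

# Write the dict to a JSON file; the file name is an arbitrary choice.
config_path = os.path.join(tempfile.mkdtemp(), 'automl_config.json')
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

# Reading the file back yields the same dict.
with open(config_path) as f:
    loaded = json.load(f)
print(loaded == config)

# The path could then be passed instead of the dict, e.g.:
# analyze(X, classify='diagnosis', out='automl_example', config=config_path)
```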

NOTE The time limit ("time_limit") and number of parallel jobs ("jobs") can also be passed to analyze() directly, as keyword arguments time and jobs, respectively. If they are specified in both ways, the keyword arguments take precedence.
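
The precedence rule in the note can be pictured as follows. The function resolve_limits is a hypothetical helper written for this illustration; it mimics the documented behavior and is not CaTabRa's actual implementation. The fallback values 1 and 1 are taken from the default configuration shown earlier.

```python
def resolve_limits(config, time=None, jobs=None):
    """Hypothetical helper: keyword arguments win over config entries."""
    time_limit = config.get('time_limit', 1) if time is None else time
    n_jobs = config.get('jobs', 1) if jobs is None else jobs
    return time_limit, n_jobs


# "time" is given in both ways, so the keyword argument takes precedence;
# "jobs" is only given in the config, so the config value is used.
print(resolve_limits({'time_limit': 5, 'jobs': 4}, time=3))
```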

We now analyze the data and train a classifier. Deviating from CaTabRa’s default settings, we set the time budget for AutoML to 3 minutes, use 2 parallel jobs, disable ensembling, restrict the model class to random forests, and employ 5-fold cross validation for internal validation.

[5]:
from catabra.analysis import analyze

analyze(
    X,
    classify='diagnosis',     # name of column containing classification target
    split='train',            # name of column containing information about the train-test split (optional)
    time=3,                   # time budget for hyperparameter tuning, in minutes (optional)
    jobs=2,                   # number of parallel jobs to use for model training (optional)
    out='automl_example',
    config={
        'ensemble_size': 1,
        'auto-sklearn_include': {
            'classifier': ['random_forest']
        },
        'auto-sklearn_resampling_strategy': 'cv',
        'auto-sklearn_resampling_strategy_arguments': {
            'folds': 5
        },
    }
)
[CaTabRa] ### Analysis started at 2023-02-08 08:48:57.530544
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Using AutoML-backend auto-sklearn for binary_classification
[CaTabRa] Successfully loaded the following auto-sklearn add-on module(s): xgb
/home/amaletzk/miniconda3/envs/catabra/lib/python3.9/site-packages/autosklearn/metalearning/metalearning/meta_base.py:68: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.metafeatures = self.metafeatures.append(metafeatures)
/home/amaletzk/miniconda3/envs/catabra/lib/python3.9/site-packages/autosklearn/metalearning/metalearning/meta_base.py:72: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.algorithm_runs[metric].append(runs)
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 178 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 477 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 33 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 332 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 567 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 63 not found
[WARNING] [2023-02-08 08:49:08,760:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 407 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 161 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 256 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 184 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 601 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 489 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 328 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 22 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 9 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 75 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 222 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 91 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 702 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 400 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 211 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 149 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 187 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 360 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 467 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 475 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 6 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 630 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 692 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 589 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 505 not found
[WARNING] [2023-02-08 08:49:08,761:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 108 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 79 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 327 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 660 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 532 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 383 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 138 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 454 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 1 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 147 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 240 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 347 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 73 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 518 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 66 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 694 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 576 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 657 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 623 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 605 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 206 not found
[WARNING] [2023-02-08 08:49:08,762:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 668 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 285 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 700 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 267 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 28 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 651 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 674 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 563 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 338 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 44 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 573 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 260 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 371 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 216 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 430 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 670 not found
[WARNING] [2023-02-08 08:49:08,763:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 388 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 131 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 688 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 37 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 678 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 270 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 114 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 420 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 524 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 294 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 155 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 172 not found
[WARNING] [2023-02-08 08:49:08,764:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 634 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 395 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 166 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 202 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 549 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 596 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 320 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 643 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 503 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 84 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 273 not found
[WARNING] [2023-02-08 08:49:08,765:Client-AutoMLSMBO(42)::0bc579b4-a785-11ed-8065-00155dea5cfb] Configuration 150 not found
[CaTabRa] New model #1 trained:
    val_roc_auc: 0.990578
    val_accuracy: 0.945175
    val_balanced_accuracy: 0.941965
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:15
[CaTabRa] New model #2 trained:
    val_roc_auc: 0.981656
    val_accuracy: 0.940789
    val_balanced_accuracy: 0.938213
    train_roc_auc: 0.999876
    type: random_forest
    total_elapsed_time: 00:16
[CaTabRa] New model #3 trained:
    val_roc_auc: 0.983148
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.941788
    train_roc_auc: 0.998510
    type: random_forest
    total_elapsed_time: 00:22
[CaTabRa] New model #4 trained:
    val_roc_auc: 0.991122
    val_accuracy: 0.953947
    val_balanced_accuracy: 0.952317
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:26
[CaTabRa] New model #5 trained:
    val_roc_auc: 0.976644
    val_accuracy: 0.940789
    val_balanced_accuracy: 0.934482
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:27
[CaTabRa] New model #6 trained:
    val_roc_auc: 0.989751
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.940732
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:33
[CaTabRa] New model #7 trained:
    val_roc_auc: 0.990061
    val_accuracy: 0.953947
    val_balanced_accuracy: 0.950309
    train_roc_auc: 0.999975
    type: random_forest
    total_elapsed_time: 00:33
[CaTabRa] New model #8 trained:
    val_roc_auc: 0.978487
    val_accuracy: 0.929825
    val_balanced_accuracy: 0.927711
    train_roc_auc: 0.991623
    type: random_forest
    total_elapsed_time: 00:37
[CaTabRa] New model #9 trained:
    val_roc_auc: 0.987150
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.940583
    train_roc_auc: 0.997875
    type: random_forest
    total_elapsed_time: 00:42
[CaTabRa] New model #10 trained:
    val_roc_auc: 0.981738
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.939876
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:48
[CaTabRa] New model #11 trained:
    val_roc_auc: 0.980430
    val_accuracy: 0.914474
    val_balanced_accuracy: 0.910418
    train_roc_auc: 0.988126
    type: random_forest
    total_elapsed_time: 00:53
[CaTabRa] New model #12 trained:
    val_roc_auc: 0.990262
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.946409
    train_roc_auc: 0.998611
    type: random_forest
    total_elapsed_time: 00:57
[CaTabRa] New model #13 trained:
    val_roc_auc: 0.976805
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.938978
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:05
[CaTabRa] New model #14 trained:
    val_roc_auc: 0.992828
    val_accuracy: 0.958333
    val_balanced_accuracy: 0.956967
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:15
[CaTabRa] New model #15 trained:
    val_roc_auc: 0.984001
    val_accuracy: 0.938596
    val_balanced_accuracy: 0.936546
    train_roc_auc: 0.999552
    type: random_forest
    total_elapsed_time: 01:16
[CaTabRa] New model #16 trained:
    val_roc_auc: 0.990654
    val_accuracy: 0.958333
    val_balanced_accuracy: 0.956131
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:21
[CaTabRa] New model #17 trained:
    val_roc_auc: 0.989962
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.940732
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:22
[CaTabRa] New model #18 trained:
    val_roc_auc: 0.989322
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.941183
    train_roc_auc: 0.999957
    type: random_forest
    total_elapsed_time: 01:27
[CaTabRa] New model #19 trained:
    val_roc_auc: 0.984741
    val_accuracy: 0.934211
    val_balanced_accuracy: 0.930620
    train_roc_auc: 0.996336
    type: random_forest
    total_elapsed_time: 01:29
[CaTabRa] New model #20 trained:
    val_roc_auc: 0.992432
    val_accuracy: 0.958333
    val_balanced_accuracy: 0.953759
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:33
[CaTabRa] New model #21 trained:
    val_roc_auc: 0.925760
    val_accuracy: 0.859649
    val_balanced_accuracy: 0.857406
    train_roc_auc: 0.998794
    type: random_forest
    total_elapsed_time: 01:36
[CaTabRa] New model #22 trained:
    val_roc_auc: 0.965040
    val_accuracy: 0.872807
    val_balanced_accuracy: 0.852311
    train_roc_auc: 0.996168
    type: random_forest
    total_elapsed_time: 01:37
[CaTabRa] New model #23 trained:
    val_roc_auc: 0.987359
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.944210
    train_roc_auc: 0.996610
    type: random_forest
    total_elapsed_time: 01:42
[CaTabRa] New model #24 trained:
    val_roc_auc: 0.989271
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.949179
    train_roc_auc: 0.999066
    type: random_forest
    total_elapsed_time: 01:48
[CaTabRa] New model #25 trained:
    val_roc_auc: 0.991799
    val_accuracy: 0.960526
    val_balanced_accuracy: 0.957412
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 01:53
[CaTabRa] New model #26 trained:
    val_roc_auc: 0.987889
    val_accuracy: 0.942982
    val_balanced_accuracy: 0.939873
    train_roc_auc: 0.999981
    type: random_forest
    total_elapsed_time: 01:57
[CaTabRa] New model #27 trained:
    val_roc_auc: 0.987498
    val_accuracy: 0.947368
    val_balanced_accuracy: 0.942036
    train_roc_auc: 0.997146
    type: random_forest
    total_elapsed_time: 02:05
[CaTabRa] New model #28 trained:
    val_roc_auc: 0.993676
    val_accuracy: 0.958333
    val_balanced_accuracy: 0.957877
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 02:13
[CaTabRa] New model #29 trained:
    val_roc_auc: 0.994138
    val_accuracy: 0.960526
    val_balanced_accuracy: 0.959642
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 02:18
[CaTabRa] New model #30 trained:
    val_roc_auc: 0.990193
    val_accuracy: 0.951754
    val_balanced_accuracy: 0.950711
    train_roc_auc: 0.998380
    type: random_forest
    total_elapsed_time: 02:22
[CaTabRa] New model #31 trained:
    val_roc_auc: 0.970738
    val_accuracy: 0.907895
    val_balanced_accuracy: 0.912576
    train_roc_auc: 0.999298
    type: random_forest
    total_elapsed_time: 02:23
[CaTabRa] New model #32 trained:
    val_roc_auc: 0.924071
    val_accuracy: 0.875000
    val_balanced_accuracy: 0.867082
    train_roc_auc: 0.981786
    type: random_forest
    total_elapsed_time: 02:28
[CaTabRa] New model #33 trained:
    val_roc_auc: 0.973351
    val_accuracy: 0.918860
    val_balanced_accuracy: 0.910585
    train_roc_auc: 0.990283
    type: random_forest
    total_elapsed_time: 02:34
[CaTabRa] New model #34 trained:
    val_roc_auc: 0.983643
    val_accuracy: 0.934211
    val_balanced_accuracy: 0.932543
    train_roc_auc: 0.994613
    type: random_forest
    total_elapsed_time: 02:39
[CaTabRa] New model #35 trained:
    val_roc_auc: 0.987915
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.949956
    train_roc_auc: 0.999957
    type: random_forest
    total_elapsed_time: 02:51
[CaTabRa] New model #36 trained:
    val_roc_auc: 0.993741
    val_accuracy: 0.962719
    val_balanced_accuracy: 0.960448
    train_roc_auc: 0.999994
    type: random_forest
    total_elapsed_time: 02:55
[CaTabRa] Final training statistics:
    n_models_trained: 36
    ensemble_val_roc_auc: 0.9948028673835125
[CaTabRa] Creating shap explainer
[CaTabRa] Initialized out-of-distribution detector of type Autoencoder
[CaTabRa] Fitting out-of-distribution detector...
Iteration 1, loss = 0.05343448
Iteration 2, loss = 0.03335893
Iteration 3, loss = 0.02118143
Iteration 4, loss = 0.01593868
Iteration 5, loss = 0.01312783
Iteration 6, loss = 0.01251363
Iteration 7, loss = 0.01232612
Iteration 8, loss = 0.01206451
Iteration 9, loss = 0.01188612
Iteration 10, loss = 0.01168201
Iteration 11, loss = 0.01155228
Iteration 12, loss = 0.01141600
Iteration 13, loss = 0.01138186
Iteration 14, loss = 0.01132852
Iteration 15, loss = 0.01125564
Iteration 16, loss = 0.01090519
Iteration 17, loss = 0.01009503
Iteration 18, loss = 0.00847805
Iteration 19, loss = 0.00790262
Iteration 20, loss = 0.00760164
Iteration 21, loss = 0.00693598
Iteration 22, loss = 0.00670540
Iteration 23, loss = 0.00616622
Iteration 24, loss = 0.01021525
Iteration 25, loss = 0.00701635
Iteration 26, loss = 0.00710143
Iteration 27, loss = 0.00652942
Iteration 28, loss = 0.00654574
Iteration 29, loss = 0.00662634
Iteration 30, loss = 0.00638759
Iteration 31, loss = 0.00610047
Iteration 32, loss = 0.00592693
Iteration 33, loss = 0.00584019
Iteration 34, loss = 0.00580886
Iteration 35, loss = 0.00565513
Iteration 36, loss = 0.00553230
Iteration 37, loss = 0.00552978
Iteration 38, loss = 0.00542836
Iteration 39, loss = 0.00535562
Iteration 40, loss = 0.00531400
Iteration 41, loss = 0.00528915
Iteration 42, loss = 0.00525538
Iteration 43, loss = 0.00526631
Iteration 44, loss = 0.00523167
Iteration 45, loss = 0.00521995
Iteration 46, loss = 0.00520238
Iteration 47, loss = 0.00520256
Iteration 48, loss = 0.00518122
Iteration 49, loss = 0.00516981
Iteration 50, loss = 0.00518312
Iteration 51, loss = 0.00517175
Iteration 52, loss = 0.00515201
Iteration 53, loss = 0.00514509
Iteration 54, loss = 0.00513804
Iteration 55, loss = 0.00514761
Iteration 56, loss = 0.00523686
Iteration 57, loss = 0.00522094
Iteration 58, loss = 0.00517241
Iteration 59, loss = 0.00517111
Iteration 60, loss = 0.00515946
Iteration 61, loss = 0.00515752
Iteration 62, loss = 0.00516861
Iteration 63, loss = 0.00520753
Iteration 64, loss = 0.00517113
Iteration 65, loss = 0.00515843
Iteration 66, loss = 0.00515102
Iteration 67, loss = 0.00515684
Iteration 68, loss = 0.00520826
Iteration 69, loss = 0.00523546
Iteration 70, loss = 0.00526605
Iteration 71, loss = 0.00525350
Iteration 72, loss = 0.00524780
Iteration 73, loss = 0.00521648
Iteration 74, loss = 0.00522342
Iteration 75, loss = 0.00520578
Iteration 76, loss = 0.00513097
Iteration 77, loss = 0.00509752
Iteration 78, loss = 0.00511022
Iteration 79, loss = 0.00511949
Iteration 80, loss = 0.00514209
Iteration 81, loss = 0.00513007
Iteration 82, loss = 0.00511384
Iteration 83, loss = 0.00516374
Iteration 84, loss = 0.00511610
Iteration 85, loss = 0.00514120
Iteration 86, loss = 0.00511401
Iteration 87, loss = 0.00512188
Iteration 88, loss = 0.00511960
Iteration 89, loss = 0.00516426
Training loss did not improve more than tol=0.000100 for 50 consecutive epochs. Stopping.
[CaTabRa] Out-of-distribution detector fitted.
[CaTabRa] ### Analysis finished at 2023-02-08 08:52:07.889648
[CaTabRa] ### Elapsed time: 0 days 00:03:10.359104
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/automl_example
[CaTabRa] ### Evaluation started at 2023-02-08 08:52:07.940771
[CaTabRa] Predicting out-of-distribution samples.
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Evaluation results for train:
    roc_auc: 1.0
    accuracy @ 0.5: 1.0
    balanced_accuracy @ 0.5: 1.0
[CaTabRa] Evaluation results for not_train:
    roc_auc: 0.997789566755084
    accuracy @ 0.5: 0.9646017699115044
    balanced_accuracy @ 0.5: 0.963527851458886
[CaTabRa] ### Evaluation finished at 2023-02-08 08:52:14.246114
[CaTabRa] ### Elapsed time: 0 days 00:00:06.305343
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/automl_example/eval

Grouped Splitting

CaTabRa natively supports grouped splitting/resampling. This means that every sample is assigned to a group, and when the data are split/resampled for internal validation, all samples belonging to the same group are guaranteed to end up together, either in the training set or in the validation set. Refer to the scikit-learn user guide for details.

To activate grouped splitting, all one needs to do is add a column with the corresponding grouping information to the data table and inform CaTabRa about it. There is no need to adjust the resampling strategy; this is taken care of automatically if the resampling strategy is given as a string, like "holdout" or "cv".
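
The guarantee behind grouped splitting can be illustrated with a small, self-contained sketch. The function grouped_folds is a hypothetical helper that assigns whole groups to folds in round-robin fashion; it is neither CaTabRa's nor scikit-learn's algorithm, and the random group labels only mimic the example data used in this notebook.

```python
import random


def grouped_folds(groups, n_folds):
    """Assign every sample to a fold such that no group is spread over several folds."""
    unique_groups = sorted(set(groups))
    # Round-robin assignment of groups to folds; this simple scheme does not balance fold sizes.
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique_groups)}
    return [fold_of_group[g] for g in groups]


rng = random.Random(0)
groups = [rng.randrange(50) for _ in range(569)]  # random group labels in 0..49
folds = grouped_folds(groups, n_folds=5)

# Check: each group occurs in exactly one fold.
folds_per_group = {}
for g, f in zip(groups, folds):
    folds_per_group.setdefault(g, set()).add(f)
print(all(len(fs) == 1 for fs in folds_per_group.values()))
```

Because folds are built from whole groups, a validation fold never contains samples from a group that is also present in the corresponding training folds.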

[8]:
import numpy as np
X['group'] = np.random.randint(50, size=len(X))
[9]:
analyze(
    X,
    classify='diagnosis',
    group='group',              # name of the column to use for grouping
    split='train',
    time=1,
    jobs=1,
    out='automl_grouping_example',
    config={
        'ensemble_size': 1,
        'auto-sklearn_include': {
            'classifier': ['random_forest']
        },
        'auto-sklearn_resampling_strategy': 'cv',
        'auto-sklearn_resampling_strategy_arguments': {
            'folds': 5
        },
    }
)
[CaTabRa] ### Analysis started at 2023-02-08 09:18:22.258738
[CaTabRa warning] 43 groups in "not_train" overlap with training set
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Using AutoML-backend auto-sklearn for binary_classification
[CaTabRa] Successfully loaded the following auto-sklearn add-on module(s): xgb
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 178 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 477 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 33 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 332 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 567 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 63 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 407 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 161 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 256 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 184 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 601 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 489 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 328 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 22 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 9 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 75 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 222 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 91 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 702 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 400 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 211 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 149 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 187 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 360 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 467 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 475 not found
[WARNING] [2023-02-08 09:18:23,631:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 6 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 630 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 692 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 589 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 79 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 327 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 505 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 108 not found
The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 660 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 532 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 383 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 138 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 454 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 1 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 147 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 240 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 347 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 73 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 518 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 66 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 694 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 576 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 657 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 623 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 605 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 206 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 668 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 285 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 700 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 267 not found
[WARNING] [2023-02-08 09:18:23,632:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 28 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 651 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 674 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 563 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 338 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 44 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 573 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 260 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 371 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 216 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 430 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 670 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 388 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 131 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 688 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 37 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 678 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 270 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 114 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 420 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 524 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 294 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 155 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 172 not found
[WARNING] [2023-02-08 09:18:23,633:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 634 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 395 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 166 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 202 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 549 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 596 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 320 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 643 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 503 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 84 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 273 not found
[WARNING] [2023-02-08 09:18:23,634:Client-AutoMLSMBO(1)::272b8846-a789-11ed-8065-00155dea5cfb] Configuration 150 not found
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989207
    n_constituent_models: 1
    total_elapsed_time: 00:06
[CaTabRa] New model #1 trained:
    val_roc_auc: 0.990527
    val_accuracy: 0.956140
    val_balanced_accuracy: 0.954456
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:06
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989207
    n_constituent_models: 1
    total_elapsed_time: 00:11
[CaTabRa] New model #2 trained:
    val_roc_auc: 0.984413
    val_accuracy: 0.951754
    val_balanced_accuracy: 0.951015
    train_roc_auc: 0.999938
    type: random_forest
    total_elapsed_time: 00:11
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989207
    n_constituent_models: 1
    total_elapsed_time: 00:24
[CaTabRa] New model #3 trained:
    val_roc_auc: 0.983311
    val_accuracy: 0.945175
    val_balanced_accuracy: 0.945531
    train_roc_auc: 0.998575
    type: random_forest
    total_elapsed_time: 00:24
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989207
    n_constituent_models: 1
    total_elapsed_time: 00:29
[CaTabRa] New model #4 trained:
    val_roc_auc: 0.985170
    val_accuracy: 0.934211
    val_balanced_accuracy: 0.930524
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:29
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989207
    n_constituent_models: 1
    total_elapsed_time: 00:35
[CaTabRa] New model #5 trained:
    val_roc_auc: 0.987843
    val_accuracy: 0.958333
    val_balanced_accuracy: 0.956501
    train_roc_auc: 0.999981
    type: random_forest
    total_elapsed_time: 00:34
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989765
    n_constituent_models: 1
    total_elapsed_time: 00:40
[CaTabRa] New model #6 trained:
    val_roc_auc: 0.991570
    val_accuracy: 0.960526
    val_balanced_accuracy: 0.959362
    train_roc_auc: 1.000000
    type: random_forest
    total_elapsed_time: 00:40
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.989765
    n_constituent_models: 1
    total_elapsed_time: 00:44
[CaTabRa] New model #7 trained:
    val_roc_auc: 0.981361
    val_accuracy: 0.925439
    val_balanced_accuracy: 0.925783
    train_roc_auc: 0.991716
    type: random_forest
    total_elapsed_time: 00:44
[CaTabRa] Final training statistics:
    n_models_trained: 7
    ensemble_val_roc_auc: 0.9897650338510553
[CaTabRa] Creating shap explainer
[CaTabRa] Initialized out-of-distribution detector of type Autoencoder
[CaTabRa] Fitting out-of-distribution detector...
Iteration 1, loss = 0.07226690
Iteration 2, loss = 0.03080995
Iteration 3, loss = 0.01865199
Iteration 4, loss = 0.01413740
Iteration 5, loss = 0.01299999
Iteration 6, loss = 0.01277603
Iteration 7, loss = 0.01231875
Iteration 8, loss = 0.01203157
Iteration 9, loss = 0.01181357
Iteration 10, loss = 0.01160450
Iteration 11, loss = 0.01142577
Iteration 12, loss = 0.01129933
Iteration 13, loss = 0.01121611
Iteration 14, loss = 0.01108283
Iteration 15, loss = 0.01075400
Iteration 16, loss = 0.01028424
Iteration 17, loss = 0.00990511
Iteration 18, loss = 0.00933364
Iteration 19, loss = 0.00876849
Iteration 20, loss = 0.00834088
Iteration 21, loss = 0.00793186
Iteration 22, loss = 0.00753125
Iteration 23, loss = 0.00710350
Iteration 24, loss = 0.00715539
Iteration 25, loss = 0.00680348
Iteration 26, loss = 0.00656715
Iteration 27, loss = 0.00640634
Iteration 28, loss = 0.00604455
Iteration 29, loss = 0.00590793
Iteration 30, loss = 0.00573342
Iteration 31, loss = 0.00578921
Iteration 32, loss = 0.00584300
Iteration 33, loss = 0.00621820
Iteration 34, loss = 0.00593511
Iteration 35, loss = 0.00576550
Iteration 36, loss = 0.00568527
Iteration 37, loss = 0.00563653
Iteration 38, loss = 0.00552953
Iteration 39, loss = 0.00554856
Iteration 40, loss = 0.00552326
Iteration 41, loss = 0.00542966
Iteration 42, loss = 0.00538754
Iteration 43, loss = 0.00534680
Iteration 44, loss = 0.00530251
Iteration 45, loss = 0.00528413
Iteration 46, loss = 0.00527366
Iteration 47, loss = 0.00526739
Iteration 48, loss = 0.00525395
Iteration 49, loss = 0.00524411
Iteration 50, loss = 0.00524560
Iteration 51, loss = 0.00524680
Iteration 52, loss = 0.00522649
Iteration 53, loss = 0.00523373
Iteration 54, loss = 0.00522288
Iteration 55, loss = 0.00522254
Iteration 56, loss = 0.00522273
Iteration 57, loss = 0.00522365
Iteration 58, loss = 0.00521429
Iteration 59, loss = 0.00521640
Iteration 60, loss = 0.00520363
Iteration 61, loss = 0.00520393
Iteration 62, loss = 0.00521063
Iteration 63, loss = 0.00521127
Iteration 64, loss = 0.00520559
Iteration 65, loss = 0.00519667
Iteration 66, loss = 0.00520940
Iteration 67, loss = 0.00519600
Iteration 68, loss = 0.00518769
Iteration 69, loss = 0.00518342
Iteration 70, loss = 0.00519698
Iteration 71, loss = 0.00519225
Iteration 72, loss = 0.00520492
Iteration 73, loss = 0.00519966
Iteration 74, loss = 0.00518665
Iteration 75, loss = 0.00518489
Iteration 76, loss = 0.00518413
Iteration 77, loss = 0.00524593
Iteration 78, loss = 0.00518972
Iteration 79, loss = 0.00519518
Iteration 80, loss = 0.00516761
Iteration 81, loss = 0.00517987
Iteration 82, loss = 0.00515879
Iteration 83, loss = 0.00515755
Iteration 84, loss = 0.00516150
Iteration 85, loss = 0.00517317
Iteration 86, loss = 0.00517116
Iteration 87, loss = 0.00518954
Iteration 88, loss = 0.00518423
Iteration 89, loss = 0.00516772
Training loss did not improve more than tol=0.000100 for 50 consecutive epochs. Stopping.
[CaTabRa] Out-of-distribution detector fitted.
[CaTabRa] ### Analysis finished at 2023-02-08 09:19:20.887679
[CaTabRa] ### Elapsed time: 0 days 00:00:58.628941
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/automl_grouping_example
[CaTabRa] ### Evaluation started at 2023-02-08 09:19:20.890291
[CaTabRa] Predicting out-of-distribution samples.
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Evaluation results for train:
    roc_auc: 1.0
    accuracy @ 0.5: 0.9956140350877193
    balanced_accuracy @ 0.5: 0.9946236559139785
[CaTabRa] Evaluation results for not_train:
    roc_auc: 1.0
    accuracy @ 0.5: 0.9734513274336283
    balanced_accuracy @ 0.5: 0.9827586206896552
[CaTabRa] ### Evaluation finished at 2023-02-08 09:19:25.823857
[CaTabRa] ### Elapsed time: 0 days 00:00:04.933566
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/automl_grouping_example/eval

In the output above, note the warning that 43 groups in "not_train" overlap with the training set. This warning appears because we assigned samples to groups at random, ignoring the existing train-test split. In a real use case, the train-test split should respect the given grouping, so that no group contributes samples to both sets.
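
One possible way to obtain a train-test split that respects a grouping is to sample whole groups rather than individual rows. The sketch below is a minimal illustration on toy data. It assumes that the split column is a boolean column named `'train'`, as the "train" / "not_train" labels in the output above suggest; the helper `add_grouped_split()` and its parameters are hypothetical and not part of CaTabRa.

```python
import numpy as np
import pandas as pd


def add_grouped_split(df, group_col, test_fraction=0.2, seed=0):
    """Return a copy of `df` with a boolean 'train' column.

    Whole groups are assigned to either the training or the test set,
    so no group has samples on both sides of the split.
    """
    rng = np.random.default_rng(seed)
    unique_groups = df[group_col].unique()
    n_test = max(1, int(round(test_fraction * len(unique_groups))))
    held_out = rng.choice(unique_groups, size=n_test, replace=False)
    out = df.copy()
    out['train'] = ~out[group_col].isin(held_out)
    return out


# Toy table (illustrative only): 12 samples, 6 groups of 2 samples each.
toy = pd.DataFrame({
    'feature': np.arange(12.0),
    'group': np.repeat(np.arange(6), 2),
})
toy = add_grouped_split(toy, 'group', test_fraction=0.33)

train_groups = set(toy.loc[toy['train'], 'group'])
test_groups = set(toy.loc[~toy['train'], 'group'])
assert train_groups.isdisjoint(test_groups)
```

A table prepared this way could then be passed to `analyze()` with `split='train'` and `group='group'`, as in the cell above, and the overlap warning should no longer appear.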

NOTE If no grouping is specified when calling analyze(), samples are implicitly grouped by the row index of the data table. Hence, if you do not want to group samples, ensure a unique row index.
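
Whether the row index is unique can be checked quickly with pandas before calling `analyze()`. The following sketch uses a small toy table with a deliberately duplicated index; the table and its column name are made up for illustration.

```python
import pandas as pd

# Toy table (illustrative only) whose row index contains the label 0 twice.
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=[0, 0, 1])
has_unique_index_before = df.index.is_unique

# Replace the index by a fresh 0..n-1 range so that every row is its own "group".
df = df.reset_index(drop=True)
has_unique_index_after = df.index.is_unique

print(has_unique_index_before, has_unique_index_after)
```

Resetting the index discards the old labels; if they carry meaning (for example, a subject ID), it is probably better to keep them in a separate column and pass that column via `group` instead.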