Performance Metrics

This notebook is part of the CaTabRa GitHub repository.

This short example demonstrates how to change the hyperparameter optimization objective and the metrics reported during training. We focus on binary classification here, but everything applies equally to multiclass and multilabel classification, and to regression.

Familiarity with CaTabRa’s main data analysis workflow is assumed. A step-by-step introduction can be found in CaTabRa Workflow.

Inspect Default Metrics

For each of the prediction tasks supported by CaTabRa, a default metric is optimized during hyperparameter tuning. In the case of binary classification this is ROC-AUC, the area under the Receiver Operating Characteristic curve, as can be seen when inspecting catabra.core.config.DEFAULT_CONFIG:

[2]:

from catabra.core import config
config.DEFAULT_CONFIG

[2]:

{'automl': 'auto-sklearn',
 'ensemble_size': 10,
 'ensemble_nbest': 10,
 'memory_limit': 3072,
 'time_limit': 1,
 'jobs': 1,
 'copy_analysis_data': False,
 'copy_evaluation_data': False,
 'static_plots': True,
 'interactive_plots': False,
 'bootstrapping_repetitions': 0,
 'explainer': 'shap',
 'binary_classification_metrics': ['roc_auc', 'accuracy', 'balanced_accuracy'],
 'multiclass_classification_metrics': ['accuracy', 'balanced_accuracy'],
 'multilabel_classification_metrics': ['f1_macro'],
 'regression_metrics': ['r2', 'mean_absolute_error', 'mean_squared_error'],
 'ood_class': 'autoencoder',
 'ood_source': 'internal',
 'ood_kwargs': {},
 'auto-sklearn_include': None,
 'auto-sklearn_exclude': None,
 'auto-sklearn_resampling_strategy': None,
 'auto-sklearn_resampling_strategy_arguments': None}

The binary classification metrics are listed under "binary_classification_metrics". The first entry in the list is the hyperparameter optimization objective; the remaining entries are additional metrics reported during model training. Likewise, "multiclass_classification_metrics", "multilabel_classification_metrics" and "regression_metrics" contain the same information for the other prediction tasks.
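
The "first entry is the objective" convention can be made explicit with a small helper. This is only an illustrative sketch: `split_metrics` is a hypothetical name and not part of CaTabRa, and the dict is a hand-copied excerpt of the default config printed above.

```python
# Excerpt of the default config printed above (copied by hand for illustration).
default_config_excerpt = {
    'binary_classification_metrics': ['roc_auc', 'accuracy', 'balanced_accuracy'],
    'multiclass_classification_metrics': ['accuracy', 'balanced_accuracy'],
    'multilabel_classification_metrics': ['f1_macro'],
    'regression_metrics': ['r2', 'mean_absolute_error', 'mean_squared_error'],
}


def split_metrics(config, task):
    """Hypothetical helper: return (objective, additional metrics) for a prediction task."""
    metrics = config[task + '_metrics']
    return metrics[0], metrics[1:]


for task in ('binary_classification', 'multiclass_classification',
             'multilabel_classification', 'regression'):
    objective, additional = split_metrics(default_config_excerpt, task)
    print(f'{task}: objective={objective}, additional={additional}')
```

With the defaults, multilabel classification has only an objective ("f1_macro") and no additional metrics.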

NOTE For more information about the possible config parameters and their meaning, please refer to Configuration.

Change Metrics

Changing the optimization objective and/or list of metrics reported during model training is easy: simply update the config dict when calling catabra.analysis.analyze(), as demonstrated below.

[3]:

# load dataset
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(as_frame=True, return_X_y=True)

[4]:

# add target labels to DataFrame
X['diagnosis'] = y

[5]:

# split into train and test set by adding a column with corresponding values
# the name of the column is arbitrary; CaTabRa tries to "guess" which samples belong to which set based on the column name and values
X['train'] = X.index <= 0.8 * len(X)
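
To see what this split amounts to, the same comparison can be reproduced without pandas. The sketch assumes the default integer index 0, 1, ..., n-1 and the 569 samples of the breast cancer dataset; the variable names are illustrative only.

```python
n_samples = 569                      # number of samples in the breast cancer dataset
threshold = 0.8 * n_samples          # approximately 455.2

# Same comparison as `X.index <= 0.8 * len(X)`, applied to the index 0, ..., 568.
is_train = [i <= threshold for i in range(n_samples)]

n_train = sum(is_train)              # rows marked True (training set)
n_test = n_samples - n_train         # rows marked False (test set)
print(n_train, n_test)               # prints "456 113"
```

Because the comparison is made on the row index, the first rows form the training set and the split is not shuffled; whether that is appropriate depends on how the rows are ordered.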

The keyword argument config of analyze() allows updating the default config dict. In this example, we use it to specify different binary classification metrics. The value passed to config can be either a dict or the path to a JSON file containing such a dict. The latter is especially useful on the command line.
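
As a sketch of the JSON-file option, the same settings can be written to disk with Python's standard library. The file name `metrics_config.json` and the use of a temporary directory are arbitrary choices made for this illustration.

```python
import json
import os
import tempfile

custom_config = {
    'binary_classification_metrics': ['f1', 'sensitivity', 'specificity']
}

# Write the dict to a JSON file (a temporary directory keeps the example tidy).
config_path = os.path.join(tempfile.mkdtemp(), 'metrics_config.json')
with open(config_path, 'w') as f:
    json.dump(custom_config, f, indent=4)

# Reading the file back yields the same dict.
with open(config_path) as f:
    loaded_config = json.load(f)
print(loaded_config == custom_config)

# The path could then be passed instead of the dict, e.g. analyze(..., config=config_path).
```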

[6]:

from catabra.analysis import analyze

analyze(
    X,
    classify='diagnosis',     # name of column containing classification target
    split='train',            # name of column containing information about the train-test split (optional)
    time=1,                   # time budget for hyperparameter tuning, in minutes (optional)
    out='performance_metrics',
    config={
        'binary_classification_metrics': ['f1', 'sensitivity', 'specificity']
    }
)

[CaTabRa] ### Analysis started at 2023-02-07 12:50:54.424329
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Using AutoML-backend auto-sklearn for binary_classification
[CaTabRa] Successfully loaded the following auto-sklearn add-on module(s): xgb

/home/amaletzk/miniconda3/envs/catabra/lib/python3.9/site-packages/autosklearn/metalearning/metalearning/meta_base.py:68: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.metafeatures = self.metafeatures.append(metafeatures)
/home/amaletzk/miniconda3/envs/catabra/lib/python3.9/site-packages/autosklearn/metalearning/metalearning/meta_base.py:72: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.algorithm_runs[metric].append(runs)

[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.937143
    n_constituent_models: 1
    total_elapsed_time: 00:04
[CaTabRa] New model #1 trained:
    val_f1: 0.937143
    val_sensitivity: 0.921348
    val_specificity: 0.935484
    train_f1: 1.000000
    type: random_forest
    total_elapsed_time: 00:04
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 2
    total_elapsed_time: 00:05
[CaTabRa] New model #2 trained:
    val_f1: 0.961326
    val_sensitivity: 0.977528
    val_specificity: 0.919355
    train_f1: 0.983607
    type: mlp
    total_elapsed_time: 00:05
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.961326
    n_constituent_models: 2
    total_elapsed_time: 00:07
[CaTabRa] New model #3 trained:
    val_f1: 0.937143
    val_sensitivity: 0.921348
    val_specificity: 0.935484
    train_f1: 0.989011
    type: random_forest
    total_elapsed_time: 00:07
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.961326
    n_constituent_models: 3
    total_elapsed_time: 00:08
[CaTabRa] New model #4 trained:
    val_f1: 0.931034
    val_sensitivity: 0.910112
    val_specificity: 0.935484
    train_f1: 0.989011
    type: random_forest
    total_elapsed_time: 00:08
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 3
    total_elapsed_time: 00:10
[CaTabRa] New model #5 trained:
    val_f1: 0.935673
    val_sensitivity: 0.898876
    val_specificity: 0.967742
    train_f1: 0.991690
    type: extra_trees
    total_elapsed_time: 00:09
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 3
    total_elapsed_time: 00:11
[CaTabRa] New model #6 trained:
    val_f1: 0.943182
    val_sensitivity: 0.932584
    val_specificity: 0.935484
    train_f1: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:10
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 3
    total_elapsed_time: 00:12
[CaTabRa] New model #7 trained:
    val_f1: 0.948571
    val_sensitivity: 0.932584
    val_specificity: 0.951613
    train_f1: 0.983516
    type: extra_trees
    total_elapsed_time: 00:12
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 3
    total_elapsed_time: 00:13
[CaTabRa] New model #8 trained:
    val_f1: 0.955056
    val_sensitivity: 0.955056
    val_specificity: 0.935484
    train_f1: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:13
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 5
    total_elapsed_time: 00:14
[CaTabRa] New model #9 trained:
    val_f1: 0.960894
    val_sensitivity: 0.966292
    val_specificity: 0.935484
    train_f1: 0.978142
    type: mlp
    total_elapsed_time: 00:14
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.966667
    n_constituent_models: 5
    total_elapsed_time: 00:16
[CaTabRa] New model #10 trained:
    val_f1: 0.931818
    val_sensitivity: 0.921348
    val_specificity: 0.919355
    train_f1: 1.000000
    type: random_forest
    total_elapsed_time: 00:16
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.977778
    n_constituent_models: 5
    total_elapsed_time: 00:17
[CaTabRa] New model #11 trained:
    val_f1: 0.966667
    val_sensitivity: 0.977528
    val_specificity: 0.935484
    train_f1: 1.000000
    type: mlp
    total_elapsed_time: 00:17
[CaTabRa] New model #12 trained:
    val_f1: 0.927374
    val_sensitivity: 0.932584
    val_specificity: 0.887097
    train_f1: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:18
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.983240
    n_constituent_models: 6
    total_elapsed_time: 00:21
[CaTabRa] New model #13 trained:
    val_f1: 0.937143
    val_sensitivity: 0.921348
    val_specificity: 0.935484
    train_f1: 0.997230
    type: extra_trees
    total_elapsed_time: 00:21
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.983240
    n_constituent_models: 6
    total_elapsed_time: 00:22
[CaTabRa] New model #14 trained:
    val_f1: 0.956044
    val_sensitivity: 0.977528
    val_specificity: 0.903226
    train_f1: 0.991736
    type: lda
    total_elapsed_time: 00:21
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.977778
    n_constituent_models: 8
    total_elapsed_time: 00:23
[CaTabRa] New model #15 trained:
    val_f1: 0.961326
    val_sensitivity: 0.977528
    val_specificity: 0.919355
    train_f1: 0.991781
    type: mlp
    total_elapsed_time: 00:22
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.977778
    n_constituent_models: 8
    total_elapsed_time: 00:23
[CaTabRa] New model #16 trained:
    val_f1: 0.945055
    val_sensitivity: 0.966292
    val_specificity: 0.887097
    train_f1: 0.978022
    type: sgd
    total_elapsed_time: 00:23
[CaTabRa] New model #17 trained:
    val_f1: 0.934066
    val_sensitivity: 0.955056
    val_specificity: 0.870968
    train_f1: 1.000000
    type: adaboost
    total_elapsed_time: 00:25
[CaTabRa] New model #18 trained:
    val_f1: 0.926554
    val_sensitivity: 0.921348
    val_specificity: 0.903226
    train_f1: 1.000000
    type: adaboost
    total_elapsed_time: 00:26
[CaTabRa] New model #19 trained:
    val_f1: 0.937143
    val_sensitivity: 0.921348
    val_specificity: 0.935484
    train_f1: 0.997245
    type: random_forest
    total_elapsed_time: 00:27
[CaTabRa] New model #20 trained:
    val_f1: 0.898876
    val_sensitivity: 0.898876
    val_specificity: 0.854839
    train_f1: 1.000000
    type: k_nearest_neighbors
    total_elapsed_time: 00:28
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.977778
    n_constituent_models: 7
    total_elapsed_time: 00:29
[CaTabRa] New model #21 trained:
    val_f1: 0.966667
    val_sensitivity: 0.977528
    val_specificity: 0.935484
    train_f1: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:29
[CaTabRa] New model #22 trained:
    val_f1: 0.925714
    val_sensitivity: 0.910112
    val_specificity: 0.919355
    train_f1: 0.997245
    type: random_forest
    total_elapsed_time: 00:30
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.977778
    n_constituent_models: 5
    total_elapsed_time: 00:33
[CaTabRa] New model #23 trained:
    val_f1: 0.960894
    val_sensitivity: 0.966292
    val_specificity: 0.935484
    train_f1: 0.994444
    type: mlp
    total_elapsed_time: 00:33
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.983240
    n_constituent_models: 6
    total_elapsed_time: 00:34
[CaTabRa] New model #24 trained:
    val_f1: 0.949721
    val_sensitivity: 0.955056
    val_specificity: 0.919355
    train_f1: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:34
[CaTabRa] New model #25 trained:
    val_f1: 0.741667
    val_sensitivity: 1.000000
    val_specificity: 0.000000
    train_f1: 0.744856
    type: bernoulli_nb
    total_elapsed_time: 00:37
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.983425
    n_constituent_models: 7
    total_elapsed_time: 00:40
[CaTabRa] New model #26 trained:
    val_f1: 0.971751
    val_sensitivity: 0.966292
    val_specificity: 0.967742
    train_f1: 1.000000
    type: mlp
    total_elapsed_time: 00:40
[CaTabRa] New model #27 trained:
    val_f1: 0.913295
    val_sensitivity: 0.887640
    val_specificity: 0.919355
    train_f1: 0.963989
    type: adaboost
    total_elapsed_time: 00:41
[CaTabRa] New ensemble fitted:
    ensemble_val_f1: 0.988889
    n_constituent_models: 5
    total_elapsed_time: 00:45
[CaTabRa] New model #28 trained:
    val_f1: 0.950276
    val_sensitivity: 0.966292
    val_specificity: 0.903226
    train_f1: 0.975342
    type: passive_aggressive
    total_elapsed_time: 00:44
[CaTabRa] New model #29 trained:
    val_f1: 0.106383
    val_sensitivity: 0.056180
    val_specificity: 1.000000
    train_f1: 0.021858
    type: bernoulli_nb
    total_elapsed_time: 00:45
[CaTabRa] New model #30 trained:
    val_f1: 0.930481
    val_sensitivity: 0.977528
    val_specificity: 0.822581
    train_f1: 0.949333
    type: lda
    total_elapsed_time: 00:48
[CaTabRa] New model #31 trained:
    val_f1: 0.908108
    val_sensitivity: 0.943820
    val_specificity: 0.806452
    train_f1: 0.924675
    type: mlp
    total_elapsed_time: 00:49
[CaTabRa] New model #32 trained:
    val_f1: 0.741667
    val_sensitivity: 1.000000
    val_specificity: 0.000000
    train_f1: 0.744856
    type: mlp
    total_elapsed_time: 00:50
[CaTabRa] Final training statistics:
    n_models_trained: 32
    ensemble_val_f1: 0.9888888888888888
[CaTabRa] Creating shap explainer
[CaTabRa] Initialized out-of-distribution detector of type Autoencoder
[CaTabRa] Fitting out-of-distribution detector...
Iteration 1, loss = 0.06674697
Iteration 2, loss = 0.03886039
Iteration 3, loss = 0.02630481
Iteration 4, loss = 0.01931956
Iteration 5, loss = 0.01464805
Iteration 6, loss = 0.01285085
Iteration 7, loss = 0.01249570
Iteration 8, loss = 0.01238648
Iteration 9, loss = 0.01221173
Iteration 10, loss = 0.01181073
Iteration 11, loss = 0.01156576
Iteration 12, loss = 0.01151248
Iteration 13, loss = 0.01146056
Iteration 14, loss = 0.01140084
Iteration 15, loss = 0.01138180
Iteration 16, loss = 0.01134451
Iteration 17, loss = 0.01131035
Iteration 18, loss = 0.01130526
Iteration 19, loss = 0.01126944
Iteration 20, loss = 0.01126597
Iteration 21, loss = 0.01125684
Iteration 22, loss = 0.01123151
Iteration 23, loss = 0.01136320
Iteration 24, loss = 0.01122500
Iteration 25, loss = 0.01140600
Iteration 26, loss = 0.01130277
Iteration 27, loss = 0.01126492
Iteration 28, loss = 0.01132336
Iteration 29, loss = 0.01131247
Iteration 30, loss = 0.01122559
Iteration 31, loss = 0.01131992
Iteration 32, loss = 0.01125866
Iteration 33, loss = 0.01126349
Iteration 34, loss = 0.01127804
Iteration 35, loss = 0.01125354
Iteration 36, loss = 0.01124170
Iteration 37, loss = 0.01121357
Iteration 38, loss = 0.01128867
Iteration 39, loss = 0.01122808
Iteration 40, loss = 0.01121827
Iteration 41, loss = 0.01123449
Iteration 42, loss = 0.01121298
Iteration 43, loss = 0.01122342
Iteration 44, loss = 0.01122471
Iteration 45, loss = 0.01120527
Iteration 46, loss = 0.01133284
Iteration 47, loss = 0.01130642
Iteration 48, loss = 0.01128560
Iteration 49, loss = 0.01135006
Iteration 50, loss = 0.01132227
Iteration 51, loss = 0.01128201
Iteration 52, loss = 0.01125684
Iteration 53, loss = 0.01125339
Iteration 54, loss = 0.01121753
Iteration 55, loss = 0.01135285
Iteration 56, loss = 0.01131737
Iteration 57, loss = 0.01125638
Iteration 58, loss = 0.01130594
Iteration 59, loss = 0.01125258
Iteration 60, loss = 0.01121187
Iteration 61, loss = 0.01127508
Iteration 62, loss = 0.01121970
Training loss did not improve more than tol=0.000100 for 50 consecutive epochs. Stopping.
[CaTabRa] Out-of-distribution detector fitted.
[CaTabRa] ### Analysis finished at 2023-02-07 12:51:58.779368
[CaTabRa] ### Elapsed time: 0 days 00:01:04.355039
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/performance_metrics
[CaTabRa] ### Evaluation started at 2023-02-07 12:51:58.826301
[CaTabRa] Predicting out-of-distribution samples.
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Evaluation results for train:
    f1 @ 0.5: 0.994475138121547
    sensitivity @ 0.5: 1.0
    specificity @ 0.5: 0.9838709677419355
[CaTabRa] Evaluation results for not_train:
    f1 @ 0.5: 0.9942196531791907
    sensitivity @ 0.5: 0.9885057471264368
    specificity @ 0.5: 1.0
[CaTabRa] ### Evaluation finished at 2023-02-07 12:52:04.051203
[CaTabRa] ### Elapsed time: 0 days 00:00:05.224902
[CaTabRa] ### Output saved in /mnt/c/Users/amaletzk/Documents/CaTabRa/catabra/examples/performance_metrics/eval

Note that the F1-score, sensitivity and specificity are now reported during model training. The F1-score is the hyperparameter optimization objective.
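
For reference, all three metrics follow from the confusion-matrix counts of the thresholded predictions. The sketch below uses the standard definitions; the function name is illustrative, and the counts are not taken from the log but were picked by hand because they reproduce the validation values reported for model #1.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard F1-score, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return f1, sensitivity, specificity


# Hand-picked counts (an assumption, see above): 89 positive and 62 negative samples.
f1, sensitivity, specificity = binary_metrics(tp=82, fp=4, tn=58, fn=7)
print(f'f1: {f1:.6f}, sensitivity: {sensitivity:.6f}, specificity: {specificity:.6f}')
# prints "f1: 0.937143, sensitivity: 0.921348, specificity: 0.935484"
```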

NOTE Regardless of the metrics specified in the config dict, evaluating a model with catabra.evaluation.evaluate() always reports all suitable built-in performance metrics in metrics.xlsx.

Available Metrics

Check out Metrics for an overview of all built-in metrics available in CaTabRa.