Plotting


This notebook is part of the CaTabRa GitHub repository.

This short example demonstrates CaTabRa’s built-in plotting capabilities: creating plots directly in Python, and creating interactive plots with plotly.

For a more thorough account of plotting in CaTabRa, please refer to Plots.

Familiarity with CaTabRa’s main data analysis workflow is assumed. A step-by-step introduction can be found in CaTabRa Workflow.

Create Plots in Python

When analyzing data and evaluating or explaining prediction models, CaTabRa automatically plots some of the results and saves the resulting figures as PNG, PDF or other files. For more fine-grained control over plotting, there is also a Python API.

Let’s start by analyzing some data.

[2]:
# load dataset
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(as_frame=True, return_X_y=True)
[3]:
# add target labels to DataFrame
X['diagnosis'] = y
[4]:
# split into training and test set by adding a column with corresponding values
# the name of the column is arbitrary; CaTabRa tries to "guess" which samples belong to which set based on the column name and values
X['train'] = X.index <= 0.8 * len(X)
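Because the split is based on the row index, it is easy to check how many samples end up in each set. The sketch below evaluates the same condition in plain Python; it assumes the 569 samples of scikit-learn's breast cancer dataset and the default integer index 0, 1, ..., 568 that load_breast_cancer(as_frame=True) produces.

```python
# Reproduce the boolean split column X['train'] = X.index <= 0.8 * len(X)
# without pandas, assuming a default integer index 0, 1, ..., n - 1.
n_samples = 569  # size of scikit-learn's breast cancer dataset

is_train = [index <= 0.8 * n_samples for index in range(n_samples)]

n_train = sum(is_train)        # rows where the column is True
n_test = n_samples - n_train   # rows where the column is False

print(n_train, n_test)  # → 456 113
```

Since 0.8 * 569 = 455.2 and the comparison is `<=`, indices 0 to 455 go to the training set, i.e. slightly more than 80% (456/569 ≈ 80.1%).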
[5]:
from catabra.analysis import analyze

analyze(
    X,
    classify='diagnosis',     # name of column containing classification target
    split='train',            # name of column containing information about the train-test split (optional)
    time=1,                   # time budget for hyperparameter tuning, in minutes (optional)
    out='plotting_example',
)
[CaTabRa] ### Analysis started at 2023-04-13 12:21:00.744873
[CaTabRa] Saving descriptive statistics completed
/mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/catabra/util/statistics.py:213: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  return dict_stat, dict_non_num_stat, (df.corr() if df.shape[1] <= corr_threshold else None)
[CaTabRa] Using AutoML-backend auto-sklearn for binary_classification
[CaTabRa] Successfully loaded the following auto-sklearn add-on module(s): xgb
[CaTabRa] Using auto-sklearn 2.0.
/home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages/autosklearn/experimental/selector.py:24: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for col, series in prediction.iteritems():
/home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages/smac/intensification/parallel_scheduling.py:153: UserWarning: SuccessiveHalving is executed with 1 workers only. Consider to use pynisher to use all available workers.
  warnings.warn(
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:05
[CaTabRa] New model #1 trained:
    val_roc_auc: 0.989845
    val_accuracy: 0.947368
    val_balanced_accuracy: 0.946356
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:05
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:08
[CaTabRa] New model #2 trained:
    val_roc_auc: 0.945430
    val_accuracy: 0.921053
    val_balanced_accuracy: 0.924134
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:08
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:10
[CaTabRa] New model #3 trained:
    val_roc_auc: 0.971416
    val_accuracy: 0.921053
    val_balanced_accuracy: 0.919952
    train_roc_auc: 0.993877
    type: gradient_boosting
    total_elapsed_time: 00:10
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.987834
    n_constituent_models: 3
    total_elapsed_time: 00:13
[CaTabRa] New model #4 trained:
    val_roc_auc: 0.968250
    val_accuracy: 0.929825
    val_balanced_accuracy: 0.926523
    train_roc_auc: 0.995034
    type: gradient_boosting
    total_elapsed_time: 00:13
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:16
[CaTabRa] New model #5 trained:
    val_roc_auc: 0.997073
    val_accuracy: 0.971491
    val_balanced_accuracy: 0.970072
    train_roc_auc: 0.999985
    type: mlp
    total_elapsed_time: 00:16
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:18
[CaTabRa] New model #6 trained:
    val_roc_auc: 0.955048
    val_accuracy: 0.914474
    val_balanced_accuracy: 0.915233
    train_roc_auc: 0.986036
    type: gradient_boosting
    total_elapsed_time: 00:18
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:21
[CaTabRa] New model #7 trained:
    val_roc_auc: 0.990054
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.946535
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:21
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:23
[CaTabRa] New model #8 trained:
    val_roc_auc: 0.995579
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.954898
    train_roc_auc: 0.996864
    type: mlp
    total_elapsed_time: 00:23
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:26
[CaTabRa] New model #9 trained:
    val_roc_auc: 0.990352
    val_accuracy: 0.969298
    val_balanced_accuracy: 0.967384
    train_roc_auc: 0.999701
    type: mlp
    total_elapsed_time: 00:25
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:28
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:30
[CaTabRa] New model #10 trained:
    val_roc_auc: 0.988949
    val_accuracy: 0.936404
    val_balanced_accuracy: 0.937933
    train_roc_auc: 0.999955
    type: mlp
    total_elapsed_time: 00:30
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:33
[CaTabRa] New model #11 trained:
    val_roc_auc: 0.992861
    val_accuracy: 0.964912
    val_balanced_accuracy: 0.962007
    train_roc_auc: 1.000000
    type: extra_trees
    total_elapsed_time: 00:32
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:35
[CaTabRa] New model #12 trained:
    val_roc_auc: 0.991756
    val_accuracy: 0.953947
    val_balanced_accuracy: 0.951912
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:35
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:37
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:40
[CaTabRa] New model #13 trained:
    val_roc_auc: 0.995639
    val_accuracy: 0.964912
    val_balanced_accuracy: 0.961171
    train_roc_auc: 0.999044
    type: mlp
    total_elapsed_time: 00:40
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:43
[CaTabRa] New model #14 trained:
    val_roc_auc: 0.993429
    val_accuracy: 0.967105
    val_balanced_accuracy: 0.964695
    train_roc_auc: 0.999836
    type: extra_trees
    total_elapsed_time: 00:42
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:47
[CaTabRa] New model #15 trained:
    val_roc_auc: 0.958393
    val_accuracy: 0.888158
    val_balanced_accuracy: 0.889665
    train_roc_auc: 0.973305
    type: gradient_boosting
    total_elapsed_time: 00:49
[CaTabRa] New model #16 trained:
    val_roc_auc: 0.989934
    val_accuracy: 0.947368
    val_balanced_accuracy: 0.944683
    train_roc_auc: 0.997961
    type: gradient_boosting
    total_elapsed_time: 00:51
/home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages/sklearn/preprocessing/_data.py:3237: RuntimeWarning: divide by zero encountered in log
  loglike = -n_samples / 2 * np.log(x_trans.var())
/home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:614: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (32) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:614: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (32) reached and the optimization hasn't converged yet.
  warnings.warn(
[CaTabRa] Final training statistics:
    n_models_trained: 16
    ensemble_val_roc_auc: 0.9971724412584628
[CaTabRa] Creating shap explainer
[CaTabRa] Initialized out-of-distribution detector of type BinsDetector
[CaTabRa] Fitting out-of-distribution detector...
[CaTabRa] Out-of-distribution detector fitted.
[CaTabRa] ### Analysis finished at 2023-04-13 12:21:59.989624
[CaTabRa] ### Elapsed time: 0 days 00:00:59.244751
[CaTabRa] ### Output saved in /mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/examples/plotting_example
[CaTabRa] ### Evaluation started at 2023-04-13 12:22:00.008499
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Predicting out-of-distribution samples.
The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Predicting out-of-distribution samples.
[CaTabRa] Evaluation results for train:
    roc_auc: 0.9990442054958184
    accuracy @ 0.5: 0.9824561403508771
    balanced_accuracy @ 0.5: 0.9818399044205496
The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
[CaTabRa] Evaluation results for not_train:
    roc_auc: 0.9995579133510168
    accuracy @ 0.5: 0.9734513274336283
    balanced_accuracy @ 0.5: 0.9827586206896552
[CaTabRa] ### Evaluation finished at 2023-04-13 12:22:02.811294
[CaTabRa] ### Elapsed time: 0 days 00:00:02.802795
[CaTabRa] ### Output saved in /mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/examples/plotting_example/eval
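The evaluation log reports accuracy and balanced accuracy at a decision threshold of 0.5 ("@ 0.5"). As a reminder of what these numbers measure, here is a minimal stdlib sketch on made-up toy data; it is not CaTabRa's implementation, and treating a score equal to the threshold as positive is an assumption.

```python
def predict_at(y_score, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels (score >= threshold -> 1; assumed convention)."""
    return [1 if s >= threshold else 0 for s in y_score]


def accuracy_at(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    y_pred = predict_at(y_score, threshold)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy_at(y_true, y_score, threshold=0.5):
    """Mean of the per-class recalls (for binary tasks: mean of sensitivity and specificity)."""
    y_pred = predict_at(y_score, threshold)
    recalls = []
    for label in sorted(set(y_true)):
        preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls.append(sum(p == label for p in preds_for_label) / len(preds_for_label))
    return sum(recalls) / len(recalls)


# Toy data (made up for illustration): 3 negatives, 2 positives.
y_true = [0, 0, 0, 1, 1]
y_score = [0.1, 0.6, 0.2, 0.9, 0.4]
print(accuracy_at(y_true, y_score))           # → 0.6
print(balanced_accuracy_at(y_true, y_score))  # ≈ 0.583 (mean of 2/3 and 1/2)
```

Balanced accuracy weights both classes equally, so it differs from plain accuracy whenever the classes are imbalanced, as they are in the breast cancer dataset.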

Recall from CaTabRa Workflow that, when a train-test split is specified, the final classifier is automatically evaluated after training. The resulting figures are saved in eval/train/static_plots/ and eval/not_train/static_plots/. But we can also create (and modify) them directly in Python.

[6]:
from catabra.util.io import CaTabRaLoader

loader = CaTabRaLoader('plotting_example')

Create performance plots for the test set:

[7]:
from catabra.evaluation import plot_results

figures = plot_results(
    loader.path / 'eval/not_train/predictions.xlsx',    # table with predictions for all samples
    loader.path / 'eval/not_train/metrics.xlsx',        # table with performance metrics
    loader.get_encoder()                                # data encoder
)

The result is a dict mapping keys to matplotlib.figure.Figure instances (https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure). The keys correspond exactly to the names of the figure files in eval/not_train/static_plots/.

[8]:
figures.keys()
[8]:
dict_keys(['roc_curve', 'pr_curve', 'threshold', 'confusion_matrix', 'calibration'])
[9]:
figures['threshold']
[9]:
[Figure: threshold plot, test set]
[10]:
figures['roc_curve']
[10]:
[Figure: ROC curve, test set]
[11]:
figures['confusion_matrix']
[11]:
[Figure: confusion matrix, test set]
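Since the values of the dict are ordinary Matplotlib figures, they can be adjusted and saved with Matplotlib's own API. The helper below is a hypothetical sketch (not part of CaTabRa) that writes each figure to <out_dir>/<key><suffix>, mirroring the key-to-filename correspondence described above; it relies only on the figure's savefig() method.

```python
from pathlib import Path


def save_figures(figures, out_dir, suffix=".png", **savefig_kwargs):
    """Save every figure of a {key: figure} dict as <out_dir>/<key><suffix>.

    Works with any object providing a Matplotlib-style savefig(path, **kwargs),
    e.g. the matplotlib.figure.Figure instances returned by plot_results().
    Extra keyword arguments (dpi, bbox_inches, ...) are passed on to savefig().
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for key, fig in figures.items():
        path = out_dir / f"{key}{suffix}"
        fig.savefig(path, **savefig_kwargs)
        saved.append(path)
    return saved
```

For example, after changing a title with figures['roc_curve'].axes[0].set_title('ROC curve (test set)'), calling save_figures(figures, 'my_plots', suffix='.pdf', bbox_inches='tight') writes roc_curve.pdf and the other figures to my_plots/.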

There are similar ways to plot training history and feature importance. Check out Plots for details.

Create Interactive Plots with plotly

So far, all plots were created with the default Matplotlib backend. CaTabRa can be instructed to produce interactive plots using the plotly backend with only a few lines of code.

Since plotly is not installed by default, we have to install it manually (using either pip or conda):

[12]:
!pip install plotly==5.7.0
Requirement already satisfied: plotly==5.7.0 in /home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages (5.7.0)
Requirement already satisfied: six in /home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages (from plotly==5.7.0) (1.16.0)
Requirement already satisfied: tenacity>=6.2.0 in /home/skaltenl/anaconda3/envs/test2/lib/python3.9/site-packages (from plotly==5.7.0) (8.2.2)
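To check from Python whether plotly is available (and which version) before requesting interactive plots, the standard library is enough. A minimal sketch; the helper name installed_version is hypothetical.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of *package*, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata registered under this name.
        return None


version = installed_version("plotly")
if version is None:
    print("plotly is not installed; install it with pip or conda.")
else:
    print(f"plotly {version} is installed")
```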

To create interactive plots automatically at all stages of CaTabRa’s data analysis workflow, we can update the config passed to the initial call to analyze(). The config can be updated by passing either a dict or the path to a JSON file containing such a dict; the latter is especially useful on the command line.
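For the JSON route, the config entries can be written to a file with Python's json module. A minimal sketch: the file name plotting_config.json is arbitrary, and the two keys are the ones used in the analyze() call of cell [13] (static_plots, interactive_plots).

```python
import json
from pathlib import Path

# Same settings as the config dict passed to analyze() in cell [13].
config = {
    "static_plots": True,       # keep creating static (Matplotlib) plots
    "interactive_plots": True,  # additionally create interactive (plotly) plots
}

config_path = Path("plotting_config.json")  # arbitrary file name
config_path.write_text(json.dumps(config, indent=2))

# As described above, the path can then be passed instead of the dict, e.g.
#   analyze(X, classify='diagnosis', split='train', config=str(config_path), ...)
print(config_path.read_text())
```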

NOTE For more information about the possible config parameters and their meaning, please refer to Configuration.

[13]:
analyze(
    X,
    classify='diagnosis',     # name of column containing classification target
    split='train',            # name of column containing information about the train-test split (optional)
    time=1,                   # time budget for hyperparameter tuning, in minutes (optional)
    out='plotting_example_interactive',
    config={
        'static_plots': True,         # whether to create static plots; True by default
        'interactive_plots': True     # whether to create interactive plots; False by default
    }
)
[CaTabRa] ### Analysis started at 2023-04-13 12:22:05.167578
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Using AutoML-backend auto-sklearn for binary_classification
[CaTabRa] Successfully loaded the following auto-sklearn add-on module(s): xgb
[CaTabRa] Using auto-sklearn 2.0.
The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
iteritems is deprecated and will be removed in a future version. Use .items instead.
SuccessiveHalving is executed with 1 workers only. Consider to use pynisher to use all available workers.
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:03
[CaTabRa] New model #1 trained:
    val_roc_auc: 0.989845
    val_accuracy: 0.947368
    val_balanced_accuracy: 0.946356
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:03
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:05
[CaTabRa] New model #2 trained:
    val_roc_auc: 0.945430
    val_accuracy: 0.921053
    val_balanced_accuracy: 0.924134
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:05
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.986260
    n_constituent_models: 1
    total_elapsed_time: 00:08
[CaTabRa] New model #3 trained:
    val_roc_auc: 0.971416
    val_accuracy: 0.921053
    val_balanced_accuracy: 0.919952
    train_roc_auc: 0.993877
    type: gradient_boosting
    total_elapsed_time: 00:08
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.987834
    n_constituent_models: 3
    total_elapsed_time: 00:10
[CaTabRa] New model #4 trained:
    val_roc_auc: 0.968250
    val_accuracy: 0.929825
    val_balanced_accuracy: 0.926523
    train_roc_auc: 0.995034
    type: gradient_boosting
    total_elapsed_time: 00:10
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:14
[CaTabRa] New model #5 trained:
    val_roc_auc: 0.997073
    val_accuracy: 0.971491
    val_balanced_accuracy: 0.970072
    train_roc_auc: 0.999985
    type: mlp
    total_elapsed_time: 00:13
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:16
[CaTabRa] New model #6 trained:
    val_roc_auc: 0.955048
    val_accuracy: 0.914474
    val_balanced_accuracy: 0.915233
    train_roc_auc: 0.986036
    type: gradient_boosting
    total_elapsed_time: 00:15
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.996953
    n_constituent_models: 1
    total_elapsed_time: 00:18
[CaTabRa] New model #7 trained:
    val_roc_auc: 0.990054
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.946535
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:18
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:20
[CaTabRa] New model #8 trained:
    val_roc_auc: 0.995579
    val_accuracy: 0.949561
    val_balanced_accuracy: 0.954898
    train_roc_auc: 0.996864
    type: mlp
    total_elapsed_time: 00:20
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:23
[CaTabRa] New model #9 trained:
    val_roc_auc: 0.990352
    val_accuracy: 0.969298
    val_balanced_accuracy: 0.967384
    train_roc_auc: 0.999701
    type: mlp
    total_elapsed_time: 00:23
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:25
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:28
[CaTabRa] New model #10 trained:
    val_roc_auc: 0.988949
    val_accuracy: 0.936404
    val_balanced_accuracy: 0.937933
    train_roc_auc: 0.999955
    type: mlp
    total_elapsed_time: 00:27
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:30
[CaTabRa] New model #11 trained:
    val_roc_auc: 0.992861
    val_accuracy: 0.964912
    val_balanced_accuracy: 0.962007
    train_roc_auc: 1.000000
    type: extra_trees
    total_elapsed_time: 00:30
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:32
[CaTabRa] New model #12 trained:
    val_roc_auc: 0.991756
    val_accuracy: 0.953947
    val_balanced_accuracy: 0.951912
    train_roc_auc: 1.000000
    type: gradient_boosting
    total_elapsed_time: 00:32
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:35
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:38
[CaTabRa] New model #13 trained:
    val_roc_auc: 0.995639
    val_accuracy: 0.964912
    val_balanced_accuracy: 0.961171
    train_roc_auc: 0.999044
    type: mlp
    total_elapsed_time: 00:37
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:40
[CaTabRa] New model #14 trained:
    val_roc_auc: 0.993429
    val_accuracy: 0.967105
    val_balanced_accuracy: 0.964695
    train_roc_auc: 0.999836
    type: extra_trees
    total_elapsed_time: 00:40
[CaTabRa] New ensemble fitted:
    ensemble_val_roc_auc: 0.997172
    n_constituent_models: 2
    total_elapsed_time: 00:44
[CaTabRa] New model #15 trained:
    val_roc_auc: 0.958393
    val_accuracy: 0.888158
    val_balanced_accuracy: 0.889665
    train_roc_auc: 0.973305
    type: gradient_boosting
    total_elapsed_time: 00:46
[CaTabRa] New model #16 trained:
    val_roc_auc: 0.989934
    val_accuracy: 0.947368
    val_balanced_accuracy: 0.944683
    train_roc_auc: 0.997961
    type: gradient_boosting
    total_elapsed_time: 00:48
divide by zero encountered in log
Stochastic Optimizer: Maximum iterations (32) reached and the optimization hasn't converged yet.
Stochastic Optimizer: Maximum iterations (32) reached and the optimization hasn't converged yet.
[CaTabRa] Final training statistics:
    n_models_trained: 16
    ensemble_val_roc_auc: 0.9971724412584628
[CaTabRa] Creating shap explainer
[CaTabRa] Initialized out-of-distribution detector of type BinsDetector
[CaTabRa] Fitting out-of-distribution detector...
[CaTabRa] Out-of-distribution detector fitted.
[CaTabRa] ### Analysis finished at 2023-04-13 12:23:02.650250
[CaTabRa] ### Elapsed time: 0 days 00:00:57.482672
[CaTabRa] ### Output saved in /mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/examples/plotting_example_interactive
[CaTabRa] ### Evaluation started at 2023-04-13 12:23:02.652969
[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Predicting out-of-distribution samples.
/mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/catabra/util/statistics.py:213: FutureWarning:

The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.

[CaTabRa] Saving descriptive statistics completed
[CaTabRa] Predicting out-of-distribution samples.
[CaTabRa] Evaluation results for train:
    roc_auc: 0.9990442054958184
    accuracy @ 0.5: 0.9824561403508771
    balanced_accuracy @ 0.5: 0.9818399044205496
/mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/catabra/util/statistics.py:213: FutureWarning:

The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.

[CaTabRa] Evaluation results for not_train:
    roc_auc: 0.9995579133510168
    accuracy @ 0.5: 0.9734513274336283
    balanced_accuracy @ 0.5: 0.9827586206896552
[CaTabRa] ### Evaluation finished at 2023-04-13 12:23:05.557959
[CaTabRa] ### Elapsed time: 0 days 00:00:02.904990
[CaTabRa] ### Output saved in /mnt/c/Users/skaltenl/Documents/catabra_2023/develop/catabra/examples/plotting_example_interactive/eval

In eval/train/ and eval/not_train/ of the newly created directory plotting_example_interactive, you will now find not only the folder static_plots (as before) but also interactive_plots. The latter contains HTML files with plotly-generated interactive plots that can be viewed in any modern browser.
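The HTML files can also be located and opened from Python. A minimal stdlib sketch, assuming the layout <out>/eval/<split>/interactive_plots/*.html described above; find_interactive_plots and open_plot are hypothetical helpers, and webbrowser.open() uses whatever browser the system has configured.

```python
import webbrowser
from pathlib import Path


def find_interactive_plots(out_dir, split):
    """Return the sorted HTML files in <out_dir>/eval/<split>/interactive_plots/.

    Assumes the output layout described in the text; returns [] if the folder is missing.
    """
    plot_dir = Path(out_dir) / "eval" / split / "interactive_plots"
    return sorted(plot_dir.glob("*.html")) if plot_dir.is_dir() else []


def open_plot(html_file):
    """Open one HTML plot in the default web browser; returns webbrowser.open()'s result."""
    return webbrowser.open(Path(html_file).resolve().as_uri())


for split in ("train", "not_train"):
    for html_file in find_interactive_plots("plotting_example_interactive", split):
        print(split, html_file.name)

# e.g. open_plot(find_interactive_plots("plotting_example_interactive", "not_train")[0])
```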

It is also possible to create interactive plots directly in Python. Continuing the example above, all we have to do is pass interactive=True to plot_results():

[14]:
figures = plot_results(
    loader.path / 'eval/not_train/predictions.xlsx',    # table with predictions for all samples
    loader.path / 'eval/not_train/metrics.xlsx',        # table with performance metrics
    loader.get_encoder(),                               # data encoder
    interactive=True
)

The output is again a dict mapping keys to figures, but this time the figures are plotly figure instances:

[15]:
figures['threshold']
[16]:
figures['roc_curve']
[17]:
figures['confusion_matrix']
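The interactive figures of cells [15] to [17] render inline in Jupyter but show no output in a static rendering of this notebook. To keep them as standalone files, plotly figures (plotly.graph_objects.Figure) provide write_html(). The helper below is a hypothetical sketch, not a CaTabRa function.

```python
from pathlib import Path


def write_html_figures(figures, out_dir, include_plotlyjs=True):
    """Write each plotly figure of a {key: figure} dict to <out_dir>/<key>.html.

    include_plotlyjs=True (plotly's default) embeds plotly.js so the file works offline;
    include_plotlyjs="cdn" gives much smaller files that load plotly.js from a CDN.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for key, fig in figures.items():
        path = out_dir / f"{key}.html"
        fig.write_html(str(path), include_plotlyjs=include_plotlyjs)
        paths.append(path)
    return paths
```

Layout tweaks can be applied beforehand with plotly's own API, e.g. figures['roc_curve'].update_layout(title='ROC curve (test set)'), followed by write_html_figures(figures, 'interactive_test_plots', include_plotlyjs='cdn').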

NOTE Every static Matplotlib figure that can be created in CaTabRa’s main data analysis workflow has an interactive plotly analogue, and vice versa. This includes training history plots, model performance plots and feature importance plots.