Explanation

class EnsembleExplainer(ensemble: FittedEnsemble = None, config: dict | None = None, feature_names: list | None = None, target_names: list | None = None, x: DataFrame | None = None, y: DataFrame | None = None, params=None)[source]

Bases: object

Class for explaining a given ensemble, or constituents of it.

Parameters:
  • ensemble (FittedEnsemble) – The ensemble to explain, an instance of FittedEnsemble.

  • config (dict, optional) – Config dictionary.

  • feature_names (list, optional) – List of feature names. None defaults to range(n_features), where n_features is determined from x.

  • target_names (list, optional) – List of target names. In case of regression this is the list of target variables, in case of binary classification this is the singleton list with the sole target variable, and in case of multiclass and multilabel classification this is the list of classes. None defaults to range(n_targets), where n_targets is determined from y.

  • x (DataFrame, optional) – Training data, which is required by some explanation methods (e.g., SHAP).

  • y (DataFrame, optional) – Labels of x.

  • params (optional) – Params obtained from a previous instantiation of an ensemble explainer of this type on ensemble. If given, none of feature_names, target_names, x and y may be provided.

Examples

>>> # Paradigm for explaining a pipeline `model` of a FittedEnsemble:
>>>
>>> # Setup:
>>> preprocessing_explainer = TransformationExplainer.make(transformation=model.preprocessing)
>>> x_train = preprocessing_explainer.fit_forward(x_train, y_train)
>>>
>>> # Local explanations for `x_test`:
>>> x_test_pp = preprocessing_explainer.forward(x_test)
>>> explanation = func(x_test_pp)
>>> explanation = preprocessing_explainer.backward(explanation)
>>>
>>> # Global explanations:
>>> explanation = func_global()
>>> explanation = preprocessing_explainer.backward_global(explanation)
>>>
>>> # Paradigm for explaining data `(x, y)` after applying some preprocessing steps `preprocessing`:
>>>
>>> preprocessing_explainer = TransformationExplainer.make(transformation=preprocessing)
>>> x_pp = preprocessing_explainer.fit_forward(x, y)
>>> explanation = func(x_pp, y)
>>> explanation = preprocessing_explainer.backward(explanation)       # or `backward_global(explanation)`

static register(name: str, factory: Callable[[...], EnsembleExplainer])[source]

Register a new ensemble explainer factory.

Parameters:
  • name (str) – The name of the ensemble explainer.

  • factory (Callable) – The factory, a function mapping argument-dicts to instances of class EnsembleExplainer (or subclasses thereof).
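
The mechanism behind register() is a plain name-to-factory registry. The following self-contained sketch illustrates the pattern only; all names in it are made up and it is not the actual implementation:

```python
from typing import Any, Callable, Dict

# Stand-in registry mapping explainer names to factory callables.
_registry: Dict[str, Callable[..., Any]] = {}

def register(name: str, factory: Callable[..., Any]) -> None:
    """Associate an explainer name with a factory callable."""
    _registry[name] = factory

def make(name: str, **kwargs) -> Any:
    """Instantiate a registered explainer by name, forwarding keyword args."""
    return _registry[name](**kwargs)

class DummyExplainer:
    """Hypothetical explainer used only to demonstrate the registry."""
    def __init__(self, config=None):
        self.config = config

register("dummy", lambda **kw: DummyExplainer(**kw))
explainer = make("dummy", config={"seed": 0})
```

In the real API, the factory receives argument-dicts and returns EnsembleExplainer instances; the sketch only mirrors the lookup-and-call shape.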

property behavior: dict[source]

Description of the behavior of methods explain() and explain_global(), especially w.r.t. parameters x and y.

Returns:

Dictionary with keys

  • "supports_local": True if the backend supports local explanations, i.e., method explain() can be called. If False, calling explain() raises an exception.

  • "requires_y": True if y must be passed to explain() and explain_global().

  • "global_accepts_x": True if x can be passed to method explain_global().

  • "global_requires_x": True if x must be passed to method explain_global(). If False but "global_accepts_x" is True, the global behavior differs depending on whether x is provided. "global_requires_x" can only be True if "global_accepts_x" is True as well.

  • "global_is_mean_of_local": True if global explanations are the mean of the individual local explanations, if x is provided. If True, it might be better to call method explain() instead of explain_global(), since the computational effort is identical. Can only be True if "supports_local" is True as well.

Return type:

dict
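
Client code can inspect such a behavior dict before deciding which method to call. A hedged standalone sketch of one possible dispatch helper (the dict keys are those documented above; the helper itself is hypothetical, not part of the API):

```python
def choose_explanation_call(behavior: dict, have_x: bool) -> str:
    """Return the name of the explanation method a caller should invoke.

    Raises ValueError if global explanations are requested without the
    required samples.
    """
    if behavior.get("global_is_mean_of_local") and have_x and behavior.get("supports_local"):
        # Local explanations cost the same and are strictly more informative.
        return "explain"
    if behavior.get("global_requires_x") and not have_x:
        raise ValueError("this backend requires x for global explanations")
    return "explain_global"

# Example: a SHAP-like backend whose global scores are means of local ones.
shap_like = {
    "supports_local": True,
    "requires_y": False,
    "global_accepts_x": True,
    "global_requires_x": True,
    "global_is_mean_of_local": True,
}
```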

property params_: dict[source]

Get all params necessary for instantiating this EnsembleExplainer via parameter params.

explain(x: DataFrame, y: DataFrame | None = None, jobs: int = 1, batch_size: int | None = None, model_id=None, mapping: Dict[str, List[str]] | None = None, show_progress: bool = False) dict[source]

Explain the ensemble, or some of its constituent models (pipelines), on a set of samples.

Parameters:
  • x (DataFrame) – The samples, a DataFrame with the same feature columns as the ensemble was trained on.

  • y (DataFrame, optional) – The labels. If given, a DataFrame with the same number of rows and row index as x and the same target columns as the ensemble was trained on. Check property behavior to see whether this argument is required (depends on the backend).

  • jobs (int, default=1) – The number of jobs to use.

  • batch_size (int, optional) – The batch size to use.

  • model_id (optional) – The ID(s) of the model(s) to explain, or None to explain all models in the ensemble.

  • mapping (dict, optional) – Mapping specifying which features to combine: target column names are mapped to lists of source column names in x.

  • show_progress (bool, default=False) – Whether to display a progress bar.

Returns:

Dictionary with 1-2 levels of nesting. The keys in the outer dict are model-IDs (possibly including “__ensemble__”), and the keys in the inner dicts (if any) are arbitrary and usually depend on the prediction task and the explanation backend. Ultimately, the values are DataFrames with the same row index as x and columns corresponding to feature_names, containing feature importance scores. Note that the result consists entirely of floating point values, even if x has categorical or other columns.

Return type:

dict
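
Because the result may have one or two levels of nesting, downstream code often flattens it before processing. A small sketch with mock data standing in for real explanations (the model-IDs and inner keys here are made up):

```python
import pandas as pd

# Mock result mimicking the documented structure: model-ID -> (optional
# inner key ->) DataFrame of per-sample feature importance scores.
result = {
    "model_1": pd.DataFrame({"age": [0.1, 0.3], "bmi": [-0.2, 0.0]}),
    "__ensemble__": {
        "class_0": pd.DataFrame({"age": [0.2, 0.1], "bmi": [0.0, -0.1]}),
    },
}

def flatten(explanations: dict) -> dict:
    """Flatten 1-2 levels of nesting into {(model_id, inner_key): DataFrame}."""
    flat = {}
    for model_id, value in explanations.items():
        if isinstance(value, dict):
            for key, df in value.items():
                flat[(model_id, key)] = df
        else:
            flat[(model_id, None)] = value
    return flat

flat = flatten(result)
```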

explain_global(x: DataFrame | None = None, y: DataFrame | None = None, sample_weight: ndarray | None = None, jobs: int = 1, batch_size: int | None = None, model_id=None, mapping: Dict[str, List[str]] | None = None, show_progress: bool = False) dict[source]

Explain the ensemble, or some of its constituent models (pipelines), globally.

Parameters:
  • x (DataFrame, optional) – The samples, a DataFrame with the same columns as the ensemble was trained on. Check property behavior to see whether this argument is accepted or required (depends on the backend).

  • y (DataFrame, optional) – The labels. If given, a DataFrame with the same number of rows and row index as x and the same target columns as the ensemble was trained on. Check property behavior to see whether this argument is required (depends on the backend).

  • sample_weight (ndarray, optional) – Sample weight. Ignored if x is None.

  • jobs (int, default=1) – The number of jobs to use.

  • batch_size (int, optional) – The batch size to use.

  • model_id (optional) – The ID(s) of the model(s) to explain, or None to explain all models in the ensemble.

  • mapping (dict, optional) – Mapping specifying which features to combine: target column names are mapped to lists of source column names in x.

  • show_progress (bool, default=False) – Whether to display a progress bar.

Returns:

Dictionary whose keys are model-IDs (possibly including “__ensemble__”), and whose values are Series or DataFrames with feature importance scores. In either case, the row index equals feature_names, and the columns of DataFrames can be arbitrary and usually depend on the prediction task and the explanation backend.

Return type:

dict

aggregate_features(features: DataFrame, mapping: Dict[str, List[str]]) DataFrame[source]

Combine features to obtain aggregated values corresponding to the aggregated local explanations returned by method aggregate_explanations().

Parameters:
  • features (DataFrame) – DataFrame to aggregate, from which the corresponding local explanations were calculated.

  • mapping (dict) – Mapping specifying which features to combine: target column names are mapped to lists of source column names in features.

Returns:

DataFrame with aggregated features.

Return type:

DataFrame
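
The effect of such a mapping can be illustrated with plain pandas. The sketch below simply sums the source columns per target name; the actual aggregation function depends on the explainer, so summation is an assumption made here for illustration only, and all column names are invented:

```python
import pandas as pd

def aggregate_by_mapping(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Combine source columns into aggregated target columns (here: by sum)."""
    out = {target: df[sources].sum(axis=1) for target, sources in mapping.items()}
    return pd.DataFrame(out, index=df.index)

features = pd.DataFrame({
    "bp_t0": [120.0, 130.0],   # blood pressure at time 0 (hypothetical)
    "bp_t1": [125.0, 128.0],   # blood pressure at time 1 (hypothetical)
    "age": [54.0, 61.0],
})
# One joint column for the same variable observed at two points in time.
mapping = {"bp": ["bp_t0", "bp_t1"], "age": ["age"]}
aggregated = aggregate_by_mapping(features, mapping)
```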

get_versions() dict[source]

Get the versions of all key packages and libraries this explanation backend depends upon.

Returns:

Dictionary whose keys are package names and whose values are version strings.

Return type:

dict

explain(*table: str | Path | DataFrame, folder: str | Path | None = None, model_id=None, explainer: str | None = None, split: str | None = None, sample_weight: str | None = None, out: str | Path | None = None, glob: bool | None = None, jobs: int | None = None, batch_size: int | None = None, aggregation_mapping: Dict[str, List[str]] | None = None, from_invocation: str | Path | dict | None = None)[source]

Explain an existing CaTabRa object (prediction model) in terms of feature importance.

Parameters:
  • *table (str | Path | DataFrame) – The table(s) to explain the CaTabRa object on. If multiple are given, their columns are merged into a single table. Must have the same format as the table(s) initially passed to function analyze(), possibly without target column(s).

  • folder (str | Path) – The folder containing the CaTabRa object to explain.

  • model_id (optional) – ID(s) of the prediction model(s) to explain. If None or “__ensemble__”, all models in the ensemble are explained, if possible. Note that due to technical restrictions not all models might be explainable.

  • explainer (str, optional) – Name of the explainer to use. Defaults to the first explainer specified in config param “explainer”. Note that only explainers that were fitted to training data during “analyze” can be used, as well as explainers that do not need to be fit to training data (e.g., “permutation”).

  • split (str, optional) – Column used for splitting the data into disjoint subsets. If specified and not “”, each subset is explained individually. In contrast to function analyze(), the name/values of the column do not need to carry any semantic information about training and test sets.

  • sample_weight (str, optional) – Column with sample weights. If specified and not “”, must have numeric data type. Sample weights are used both for training and evaluating prediction models.

  • out (str | Path, optional) – Directory where to save all generated artifacts. Defaults to a directory located in folder, with a name following a fixed naming pattern. If out already exists, the user is prompted to specify whether it should be replaced; otherwise, it is automatically created.

  • glob (bool, optional) – Whether to explain the CaTabRa object globally. If True, table might not have to be specified (depends on explanation backend).

  • jobs (int, optional) – The number of jobs to use. Overrides the “jobs” config param.

  • batch_size (int, optional) – The batch size used for explaining the prediction model(s).

  • aggregation_mapping (str | dict, optional) – Mapping from target column names to lists of source column names in table, whose explanations will be aggregated by the explainer’s aggregation function. Can be either a dict or a JSON file containing such a dict. Useful for generating joint explanations of certain features, e.g., corresponding to the same variable observed at different times.

  • from_invocation (str | Path | dict, optional) – Dict or path to an invocation.json file. All arguments of this function not explicitly specified are taken from this dict; this also includes the table on which to explain the CaTabRa object.
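
Since aggregation_mapping can be a dict or a path to a JSON file, a typical workflow writes the mapping once and passes the file to subsequent invocations. A sketch of that round trip (the file name and all column names are hypothetical):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping: one joint explanation per variable observed at
# several points in time.
mapping = {
    "heart_rate": ["heart_rate_h0", "heart_rate_h1", "heart_rate_h2"],
    "creatinine": ["creatinine_d0", "creatinine_d1"],
}

path = Path(tempfile.mkdtemp()) / "aggregation_mapping.json"
path.write_text(json.dumps(mapping, indent=2))

# Reload, as passing `aggregation_mapping=path` would conceptually do.
loaded = json.loads(path.read_text())
```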

class CaTabRaExplanation(invocation: str | Path | dict | None = None)[source]

Bases: CaTabRaBase

explain_split(explainer: EnsembleExplainer, x: DataFrame | None = None, y: DataFrame | None = None, sample_weight: ndarray | None = None, directory=None, glob: bool = False, model_id=None, batch_size: int | None = None, jobs: int = 1, aggregation_mapping: Dict | None = None, static_plots: bool = True, interactive_plots: bool = False, verbose: bool = False) dict | None[source]

Explain a single data split.

Parameters:
  • explainer (EnsembleExplainer) – Explainer object.

  • x (DataFrame, optional) – Encoded data to apply the explainer to; required if glob is False, optional otherwise. Only features, no labels.

  • y (DataFrame, optional) – Encoded labels corresponding to x, optional. Only labels, no features.

  • sample_weight (ndarray, optional) – Sample weights, optional. Ignored if x is None or glob is False.

  • directory (str | Path, optional) – Directory where to save the explanations. If None, results are returned in a dict.

  • glob (bool, default=False) – Whether to create global explanations.

  • model_id (optional) – ID(s) of the model(s) to explain.

  • batch_size (int, optional) – Batch size.

  • jobs (int, default=1) – Number of jobs.

  • aggregation_mapping (dict, optional) – Mapping from target column name to list of source columns. The source columns’ explanations will be aggregated by the explainer’s aggregation function.

  • static_plots (bool, default=True) – Whether to create static plots.

  • interactive_plots (bool, default=False) – Whether to create interactive plots.

  • verbose (bool, default=False) – Whether to print intermediate results and progress bars.

Returns:

None if directory is given, else dict with evaluation results.

Return type:

dict | None

plot_beeswarms(explanations: dict | str | Path | DataFrame, features: DataFrame | None = None, interactive: bool = False, title: str | None = None, max_features: int | None = None, add_sum_of_remaining: bool = True) dict | DataFrame[source]

Create beeswarm plots of local explanations.

Parameters:
  • explanations (dict | str | Path | DataFrame) – Local explanations to plot, a dict as returned by EnsembleExplainer.explain(), i.e., 1-2 levels of nesting, values are DataFrames with samples on row index and features on column index.

  • features (DataFrame, optional) – Encoded feature values corresponding to feature importance scores.

  • interactive (bool, default=False) – Whether to create interactive or static plots.

  • title (str, optional) – The title of the plots.

  • max_features (int, optional) – Maximum number of features to plot, or None to determine this number automatically.

  • add_sum_of_remaining (bool, default=True) – Whether to add the sum of remaining features, if not all features can be plotted.

Returns:

Dict with plots or single plot.

Return type:

dict | DataFrame

plot_bars(explanations: dict | str | Path | DataFrame, interactive: bool = False, title: str | None = None, max_features: int = 10, add_sum_of_remaining: bool = True) dict | DataFrame[source]

Create bar plots of global explanations.

Parameters:
  • explanations (dict | str | Path | DataFrame) – Global explanations to plot, a dict as returned by EnsembleExplainer.explain_global(), i.e., values are Series or DataFrames with features on row index and arbitrary column index.

  • interactive (bool, default=False) – Whether to create interactive or static plots.

  • title (str, optional) – The title of the plots.

  • max_features (int, default=10) – Maximum number of features to plot.

  • add_sum_of_remaining (bool, default=True) – Whether to add the sum of remaining features, if not all features can be plotted.

Returns:

Dict with plots or single plot.

Return type:

dict | DataFrame

average_local_explanations(explanations: DataFrame | dict, sample_weight: ndarray | None = None, **kwargs) ndarray | DataFrame | dict[source]

Average local explanations to get a global overview of feature importance.

Parameters:
  • explanations (DataFrame | dict) – Local explanations to average, DataFrame of shape (*dim, n_samples, n_features) or a (nested) dict thereof with at most two levels of nesting.

  • sample_weight (ndarray, optional) – Sample weights, optional.

Returns:

Averaged explanations, with the same format as what would be returned by method EnsembleExplainer.explain_global(). That is, either a single DataFrame, or a dict whose values are DataFrames.

Return type:

ndarray | DataFrame | dict
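
For a single DataFrame of local explanations, the averaging amounts to a (weighted) column mean over samples. A standalone numpy/pandas sketch of that computation (the helper name and data are invented; sum-normalized weights are an assumption):

```python
import numpy as np
import pandas as pd

def weighted_mean_explanation(local: pd.DataFrame, sample_weight=None) -> pd.Series:
    """Average per-sample importance scores into one global score per feature."""
    if sample_weight is None:
        return local.mean(axis=0)
    w = np.asarray(sample_weight, dtype=float)
    # np.average normalizes by the sum of the weights.
    return pd.Series(np.average(local.to_numpy(), axis=0, weights=w),
                     index=local.columns)

local = pd.DataFrame({"age": [0.1, 0.3], "bmi": [-0.2, 0.0]})
glob = weighted_mean_explanation(local, sample_weight=[1.0, 3.0])
```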


Sklearn Explainers

class PCAExplainer(transformer: PCA, params=None)[source]

Bases: _LinearTransformationExplainer

class FastICAExplainer(transformer: FastICA, params=None)[source]

Bases: _LinearTransformationExplainer

class TruncatedSVDExplainer(transformer: TruncatedSVD, params=None)[source]

Bases: _LinearTransformationExplainer

class RBFSamplerExplainer(transformer: RBFSampler, params=None)[source]

Bases: _LinearTransformationExplainer