Out-of-Distribution

Base

class OODDetector(subset: float = 1, random_state=None, verbose=False)[source]

Bases: BaseEstimator, ClassifierMixin, ABC

Base class for out-of-distribution detection

Parameters:
  • subset (float, default=1) – Fraction of samples to use for training: [0,1].

  • random_state (int, optional) – State for functions involving randomness. If not given, it is initialized randomly.

  • verbose (bool, default=False) – Whether to print log messages.

classmethod create(name: str, source: str = 'internal', kwargs=None) OODDetector[source]

Factory method for creating OODDetector subclasses or PyOD classes from strings.

Parameters:

  • name (str) – If source is 'internal', name of the OODDetector module in snake_case; if source is 'pyod', name of the PyOD detector module in snake_case; if source is 'external', full path to the OODDetector class (e.g. module1.module2.CustomOOD).

  • source (str, default='internal') – Whether to use an internal class (from CaTabRa), a class from PyOD, or a custom class. One of ['internal', 'pyod', 'external'].

  • kwargs (optional) – Keyword arguments for the detector class.
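The 'external' branch of this dispatch can be sketched as follows; `resolve_detector` is a hypothetical stand-in for the internal resolution logic, not the library's actual implementation:

```python
import importlib

def resolve_detector(name: str, source: str = "internal"):
    """Illustrative resolver mirroring the documented dispatch rules.

    Only the 'external' branch is spelled out; 'internal' and 'pyod'
    would translate snake_case module names into detector classes.
    """
    if source == "external":
        # 'name' is a full dotted path, e.g. "module1.module2.CustomOOD"
        module_path, _, class_name = name.rpartition(".")
        return getattr(importlib.import_module(module_path), class_name)
    elif source in ("internal", "pyod"):
        raise NotImplementedError("resolved from snake_case module names")
    raise ValueError(f"unknown source: {source}")
```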

predict_proba(X: DataFrame)[source]

Get o.o.d. probabilities of the given samples. Note that despite its name, this function does not necessarily return probabilities between 0 and 1, but in any case larger values correspond to an increased likelihood of being o.o.d.

Parameters:
  • X (DataFrame) – The data to analyze.

Returns:

O.o.d. probabilities. Shape depends on the subtype (FeaturewiseOODDetector, SamplewiseOODDetector or OverallOODDetector).

class SamplewiseOODDetector(subset: float = 1, random_state=None, verbose=False)[source]

Bases: OODDetector, ABC

OOD detector that works on a per sample basis. Predictions are of the shape (n_samples,).

class FeaturewiseOODDetector(subset: float = 1, random_state=None, verbose=False)[source]

Bases: OODDetector, ABC

OOD detector that works on a per-column basis. Predictions are of the shape (n_selected_cols,), where n_selected_cols is the number of columns returned after applying _transform to the data.

class OverallOODDetector(subset: float = 1, random_state=None, verbose=False)[source]

Bases: OODDetector, ABC

OOD detector that works on the full data set. Predictions are a single int, or a single float in the predict_proba case.

Internal

class Autoencoder(subset=1, target_dim_factor=0.25, reduction_factor=0.9, thresh=0.5, random_state: int | None = None, verbose=False, mlp_kwargs=None)[source]

Bases: SamplewiseOODDetector

Autoencoder for out-of-distribution detection. Uses a neural network to encode data into a lower-dimensional space and reconstruct the original data from it. The reconstruction error determines the likelihood of a sample being out-of-distribution.

Parameters:
  • target_dim_factor (float, default=0.25) – Fraction of features in the smallest dimension.

  • reduction_factor (float, default=0.9) – Factor by which each layer reduces the dimensionality.

  • thresh (float, default=0.5) – Threshold value for deciding when a sample is out-of-distribution.
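How the two size parameters might interact can be illustrated with a small sketch; `encoder_layer_sizes` and the exact shrinking rule are assumptions for illustration, not the library's actual architecture:

```python
def encoder_layer_sizes(n_features: int,
                        target_dim_factor: float = 0.25,
                        reduction_factor: float = 0.9) -> list:
    """Hypothetical derivation of encoder hidden-layer widths: shrink
    each layer by `reduction_factor` until the width reaches
    `target_dim_factor * n_features` (the bottleneck dimension).
    """
    target = max(1, int(n_features * target_dim_factor))
    sizes = []
    dim = n_features
    while int(dim * reduction_factor) > target:
        dim = int(dim * reduction_factor)
        sizes.append(dim)
    sizes.append(target)  # bottleneck layer
    return sizes
```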

class SeparableMLP(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)[source]

Bases: MLPRegressor

class BinsDetector(subset=1, bins: None | DataFrame | int = None, random_state: int | None = None, verbose=True)[source]

Bases: SamplewiseOODDetector

Simple OOD detector that distributes the training set into equally sized bins. A sample is considered OOD if it falls within a bin with no corresponding training samples.

Parameters:

bins (int | DataFrame, optional) – Number of bins for each column. If an int, every column uses the same number of bins. Defaults to 2 * std for each column.
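The idea for a single column can be sketched as follows; `bin_ood_mask` is an illustrative simplification using a fixed number of equal-width bins rather than the class's default bin rule:

```python
import numpy as np

def bin_ood_mask(train: np.ndarray, test: np.ndarray,
                 bins: int = 10) -> np.ndarray:
    """Flag a test sample as OOD if it falls into a bin containing no
    training samples, or outside the training range entirely."""
    edges = np.linspace(train.min(), train.max(), bins + 1)
    occupied = np.histogram(train, bins=edges)[0] > 0
    out_of_range = (test < edges[0]) | (test > edges[-1])
    # map each in-range test value to its bin index
    idx = np.clip(np.digitize(test, edges) - 1, 0, bins - 1)
    return out_of_range | ~occupied[idx]
```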

class KSTest(subset=1, p_val=0.05, random_state: int | None = None, verbose=True)[source]

Bases: FeaturewiseOODDetector

Two sample Kolmogorov-Smirnov test [1]. Hypothesis test for the following question: “How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?”
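The underlying two-sample statistic is the maximum gap between the two empirical CDFs. A minimal numpy sketch (the detector additionally derives a p-value and compares it against `p_val`, which is omitted here):

```python
import numpy as np

def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```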

References

soft_brownian_offset(X, d_min=0.5, d_off=0.1, n_samples=1, show_progress=False, softness=False, random_state=None)[source]

Generates OOD samples from the input X using soft Brownian offset (SBO) and returns n_samples samples, constrained by the remaining parameters. Based on [1].

Parameters:
  • X (ndarray) – In-distribution (ID) data to form OOD samples around. First dimension contains samples.

  • d_min (float, default=0.5) – (Likely) Minimum distance to ID data.

  • d_off (float, default=0.1) – Offset distance used in each iteration.

  • n_samples (int, default=1) – Number of samples to return.

  • show_progress (bool, default=False) – Whether to show a tqdm progress bar.

  • softness (float, default=False) – Describes softness of minimum distance. Parameter between 0 (hard) and 1 (soft).

  • random_state (int) – RNG state used for reproducibility.

Returns:

Out of distribution samples of shape (n_samples, X.shape[1])

Return type:

ndarray
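One SBO draw with a hard minimum distance (softness=0) can be sketched as a random walk that stops once the point is at least `d_min` away from all in-distribution data; `sbo_sample` is an illustrative simplification, not the library's implementation:

```python
import numpy as np

def sbo_sample(X: np.ndarray, d_min: float = 0.5, d_off: float = 0.1,
               rng=None) -> np.ndarray:
    """Start at a random in-distribution point and take random steps of
    length `d_off` until the point is >= `d_min` from every row of X."""
    rng = np.random.default_rng(rng)
    y = X[rng.integers(len(X))].astype(float).copy()
    while np.linalg.norm(X - y, axis=1).min() < d_min:
        step = rng.normal(size=X.shape[1])
        y += d_off * step / np.linalg.norm(step)
    return y
```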

References

gaussian_hyperspheric_offset(n_samples, mu=4, std=0.7, n_dim=3, random_state=None)[source]

Generates OOD samples using Gaussian hyperspheric offset (GHO) and returns n_samples samples, constrained by the remaining parameters. Inspired by [1].

Parameters:
  • n_samples (int) – Number of samples to return.

  • mu (float, default=4) – Mean of distribution.

  • std (float, default=0.7) – Standard deviation of distribution.

  • n_dim (int, default=3) – Number of dimensions.

  • random_state (int, optional) – RNG state used for reproducibility.

Returns:

Out of distribution samples of shape (n_samples, n_dim)

Return type:

ndarray
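A plausible reading of GHO is to draw uniformly random directions on the unit hypersphere and scale each by a radius from N(mu, std), so the samples concentrate on a shell of radius roughly mu. This is an illustrative sketch, not the exact implementation:

```python
import numpy as np

def gho(n_samples: int, mu: float = 4, std: float = 0.7, n_dim: int = 3,
        rng=None) -> np.ndarray:
    """Sample points on a Gaussian-thickness hyperspheric shell."""
    rng = np.random.default_rng(rng)
    # uniform directions: normalize isotropic Gaussian vectors
    directions = rng.normal(size=(n_samples, n_dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # radii drawn from N(mu, std)
    radii = rng.normal(mu, std, size=(n_samples, 1))
    return directions * radii
```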

References

class SoftBrownianOffset(subset: float = 1, classifier=<class 'sklearn.ensemble._forest.RandomForestClassifier'>, dim_reduction=<class 'sklearn.decomposition._pca.PCA'>, dist_min: float = 0.2, dist_off: float = 0.01, softness: float = 0, samples: float = 1, random_state: int | None = None, verbose: bool = True, **kwargs)[source]

Bases: SamplewiseOODDetector

Out-of-distribution detector using soft Brownian offset. Transforms samples into a lower-dimensional space and generates synthetic OOD samples in this subspace. A classifier is trained to detect the OOD samples.

Parameters:
  • classifier (default=RandomForestClassifier) – Classifier for training to differentiate in- (ID) and out-of-distribution (OOD) samples.

  • dim_reduction (default=PCA) – Dimensionality reduction algorithm to use.

  • dist_min (float, default=0.2) – (Likely) Minimum distance to ID data.

  • dist_off (float, default=0.01) – Offset distance used in each iteration.

  • softness (float, default=0) – Describes softness of the minimum distance. Parameter between 0 (hard) and 1 (soft).

  • samples (float, default=1) – Number of samples to generate, as a proportion of the number of original samples.
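The training phase described above can be sketched end to end; `fit_sbo_detector` is a hypothetical helper, and far-away Gaussian noise stands in for real soft-Brownian-offset samples to keep the sketch short:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

def fit_sbo_detector(X: np.ndarray, n_components: int = 2, rng: int = 0):
    """Project data to a lower-dimensional space, generate synthetic OOD
    points there, and train a classifier to separate ID from OOD."""
    gen = np.random.default_rng(rng)
    reducer = PCA(n_components=n_components).fit(X)
    X_low = reducer.transform(X)
    # hypothetical stand-in for soft_brownian_offset(X_low, ...)
    ood = gen.normal(loc=10.0, scale=1.0, size=X_low.shape)
    clf = RandomForestClassifier(random_state=rng).fit(
        np.vstack([X_low, ood]),
        np.concatenate([np.zeros(len(X_low)), np.ones(len(ood))]))
    return reducer, clf
```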

PyOD

class PyODDetector(name: str, subset: float = 1, transformer=<function make_standard_transformer>, verbose=False, **kwargs)[source]

Bases: SamplewiseOODDetector

Class that wraps a PyOD outlier-detection class as a CaTabRa OODDetector. Requires pyod to be installed.

Parameters:
  • name (str) – Name of the module the detector class is in. Given in snake_case format.

  • subset (float) – Proportion of samples to use for training: [0,1].

  • transformer – Transformer to apply to data before fitting the detector. Must implement fit(X) and transform(X).

  • verbose (bool, default=False) – Whether to log the detection steps.

  • **kwargs – Keyword arguments for the specific pyod detector.
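The transformer contract (fit(X) and transform(X)) can be satisfied by something as simple as the following column standardizer; `StandardTransformer` is an illustrative class, not the actual make_standard_transformer:

```python
import numpy as np

class StandardTransformer:
    """Minimal transformer meeting the documented contract: fit(X) and
    transform(X). Standardizes each column to zero mean and unit std."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_
```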

property pyod_detector[source]