If source is ‘internal’, the name of the OODDetector module in snake_case; if source is ‘pyod’, the name of the pyod
detector module in snake_case; if source is ‘external’, the full path to the OODDetector
(e.g. module1.module2.CustomOOD).
source: str, default=’internal’
Whether to use an internal class (from CaTaBra), a class from pyod, or a custom class. One of [‘internal’,
‘pyod’, ‘external’].
Get o.o.d. probabilities of the given samples. Note that despite its name, this function does not necessarily
return probabilities between 0 and 1, but in any case larger values correspond to an increased likelihood of
being o.o.d.
Parameters:
X (DataFrame) – The data to analyze.
Returns:
O.o.d. probabilities. The shape depends on the subtype (FeaturewiseOODDetector, SamplewiseOODDetector or
OverallOODDetector).
OOD detector that works on a per-column basis.
Predictions have shape (n_selected_cols,), where n_selected_cols is the number of columns returned after
applying _transform to the data.
Autoencoder for out-of-distribution detection.
Uses a neural network to encode data into a lower dimensional space and reconstruct the original data from it.
Reconstruction error determines the likelihood of a sample being out-of-distribution.
Parameters:
target_dim_factor (float, default=0.25) – Size of the smallest layer as a fraction of the number of features.
reduction_factor (float, default=0.9) – How much each layer reduces the dimensionality.
threshold (float, default=0.5) – Threshold value to decide when a sample is out-of-distribution.
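The following is a minimal sketch of how these parameters could interact; it is not taken from the CaTaBra source. The helper names encoder_layer_sizes and reconstruction_error are hypothetical, and the rule "shrink each layer by reduction_factor until the target size is reached" is an assumption made for illustration only.

```python
def encoder_layer_sizes(n_features, target_dim_factor=0.25, reduction_factor=0.9):
    """Hypothetical helper: encoder layer widths, widest first (assumed rule)."""
    if not 0 < reduction_factor < 1:
        raise ValueError("reduction_factor must lie strictly between 0 and 1")
    if not 0 < target_dim_factor <= 1:
        raise ValueError("target_dim_factor must lie in (0, 1]")
    # Assumption: the smallest layer holds this fraction of the input features.
    bottleneck = max(1, round(n_features * target_dim_factor))
    sizes = [n_features]
    while True:
        next_size = int(sizes[-1] * reduction_factor)
        if next_size <= bottleneck:
            break
        sizes.append(next_size)
    if bottleneck < sizes[-1]:
        sizes.append(bottleneck)
    return sizes


def reconstruction_error(sample, reconstruction):
    """Mean squared error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(sample, reconstruction)) / len(sample)


# Usage: layer widths for 20 input features, and an (assumed) threshold check.
sizes = encoder_layer_sizes(20)
error = reconstruction_error([0.2, 0.4, 0.6], [0.1, 0.4, 0.9])
is_ood = error > 0.5  # assumption: errors above the threshold are flagged as OOD
```

A larger reconstruction error means the network has not seen similar data during training, which is why it can serve as an o.o.d. score.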
Simple OOD detector that distributes the training set into equally sized bins.
A sample is considered OOD if it falls within a bin with no corresponding training samples.
Parameters:
bins (int | DataFrame, optional) – Number of bins for each column. If an int, every column uses the same number of bins. Defaults to 2 * std for each
column.
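The idea can be illustrated for a single numeric column with a short sketch. It assumes equal-width bins between the training minimum and maximum and treats values outside that range as OOD; the function names are hypothetical and do not reflect the actual CaTaBra implementation.

```python
def fit_column(train_values, n_bins):
    """Record which equal-width bins contain at least one training value."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / n_bins if hi > lo else 1.0
    occupied = {_bin_index(v, lo, width, n_bins) for v in train_values}
    return {"lo": lo, "hi": hi, "width": width, "n_bins": n_bins, "occupied": occupied}


def _bin_index(value, lo, width, n_bins):
    # The maximum training value is assigned to the last bin.
    return min(int((value - lo) / width), n_bins - 1)


def is_ood(value, fitted):
    """A value is OOD if it lies outside the training range or in an empty bin."""
    if value < fitted["lo"] or value > fitted["hi"]:
        return True
    index = _bin_index(value, fitted["lo"], fitted["width"], fitted["n_bins"])
    return index not in fitted["occupied"]


# Usage: training values leave the middle of the range empty.
fitted = fit_column([0.0, 0.1, 0.2, 0.9, 1.0], n_bins=5)
flags = [is_ood(v, fitted) for v in (0.05, 0.5, 0.95, 1.5)]
```

In this example the value 0.5 falls into a bin without training samples and 1.5 lies outside the training range, so both are flagged, while 0.05 and 0.95 are not.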
Two-sample Kolmogorov-Smirnov test [1].
Hypothesis test for the following question:
“How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown)
probability distribution?”
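The test statistic behind this question is the largest vertical gap between the empirical distribution functions of the two samples. The sketch below computes only that statistic in plain Python; the function name ks_statistic is illustrative, and no claim is made about how the detector itself computes it. In practice, scipy.stats.ks_2samp returns both the statistic and a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_values, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    # Both empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every point of the pooled sample.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Usage: identical samples give 0, fully separated samples give 1.
same = ks_statistic([1, 2, 3], [1, 2, 3])
separated = ks_statistic([1, 2, 3], [10, 11, 12])
```

A statistic close to 0 is consistent with both samples coming from the same distribution; a statistic close to 1 indicates they barely overlap.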
Out-of-distribution detector using Soft Brownian Offset.
Transforms samples into a lower dimensional space and generates synthetic OOD samples in this subspace.
A classifier is trained to detect the OOD samples.
Parameters:
classifier (default=RandomForestClassifier) – Classifier trained to distinguish in-distribution (ID) from out-of-distribution (OOD) samples.
dim_reduction (default=PCA) – Dimensionality reduction algorithm to use.
dist_min (float, default=0.2) – (Likely) minimum distance to ID data.
dist_off (float, default=0.01) – Offset distance used in each iteration.
softness (float, default=0) – Softness of the minimum distance. Value between 0 (hard) and 1 (soft).
samples (float, default=1) – Number of samples to return, as a proportion of the number of original samples.
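The sample-generation step can be sketched as follows. This is a simplified illustration of the hard case (softness=0) only: each synthetic point starts at a randomly chosen ID point and is moved by random offsets of length dist_off until it is at least dist_min away from all ID points. The function name, the max_steps safeguard and the seed argument are assumptions, and the dimensionality reduction and classifier training are left out.

```python
import math
import random


def soft_brownian_offset_sketch(id_points, dist_min=0.2, dist_off=0.01,
                                samples=1.0, max_steps=100_000, seed=0):
    """Generate synthetic OOD points by random offsets (hard variant only)."""
    rng = random.Random(seed)
    n_samples = round(samples * len(id_points))
    dim = len(id_points[0])
    generated = []
    for _ in range(n_samples):
        point = list(rng.choice(id_points))
        for _ in range(max_steps):  # safeguard against very long walks
            nearest = min(math.dist(point, p) for p in id_points)
            if nearest >= dist_min:
                break
            # Random direction, rescaled to length dist_off.
            step = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(s * s for s in step)) or 1.0
            point = [x + dist_off * s / norm for x, s in zip(point, step)]
        generated.append(point)
    return generated


# Usage: four ID points in 2D, one synthetic OOD point per ID point.
id_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
ood_points = soft_brownian_offset_sketch(id_points, dist_min=0.2, dist_off=0.05)
```

The generated points, labelled as OOD, together with the ID points would then form the training set for the classifier.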