Base Module

Base classes for calibration methods.

This module defines the abstract base class that all calibration methods in the multicalibration library inherit from.

class mcgrad.base.BaseCalibrator

Bases: ABC

Abstract base class for calibration methods.

A calibrator adjusts predicted probabilities so that they are well-calibrated, meaning the predicted probabilities accurately reflect true outcome frequencies. For example, among all predictions of 0.7, approximately 70% should be positive.
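The notion of calibration described above can be checked directly on a small sample. The following is a minimal sketch using only the standard library; the helper name observed_rate_by_prediction and the toy data are illustrative assumptions and are not part of mcgrad.

```python
from collections import defaultdict


def observed_rate_by_prediction(predictions, labels):
    """Return {predicted probability: observed fraction of positive labels}.

    Hypothetical helper for illustration only; not part of mcgrad.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for p, y in zip(predictions, labels):
        totals[p] += 1
        positives[p] += y
    return {p: positives[p] / totals[p] for p in totals}


# Toy data: ten predictions of 0.7 (seven positive labels) and
# ten predictions of 0.2 (two positive labels).
predictions = [0.7] * 10 + [0.2] * 10
labels = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

rates = observed_rate_by_prediction(predictions, labels)
# For a well-calibrated predictor, each observed rate is close to the
# predicted probability it is keyed by.
```

Here the observed positive rate equals the predicted probability in each group, so this toy predictor is perfectly calibrated on the sample. Real predictions are continuous, so in practice they are usually grouped into bins before comparing rates.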

Calibrators follow a fit/predict pattern similar to scikit-learn estimators:

  1. Call fit() with training data containing predictions and ground truth labels

  2. Call predict() to obtain calibrated predictions for new data

Alternatively, use fit_transform() to fit and transform in a single call.

Subclasses must implement:

  • fit(): Learn the calibration mapping from training data

  • predict(): Apply the learned calibration to new predictions

Some calibrators (multicalibrators) also accept categorical and numerical feature columns to achieve calibration across different subpopulations (segments) of the data.
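The fit/predict contract described above might be implemented along the following lines. This is a sketch under stated assumptions, not the library's implementation: CalibratorInterface is a local stand-in that mirrors the documented signatures so the example runs without mcgrad installed (real code would subclass mcgrad.base.BaseCalibrator), and MeanShiftCalibrator, its shift_ attribute, and the column names are hypothetical.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class CalibratorInterface(ABC):
    """Local stand-in mirroring the documented fit/predict signatures."""

    @abstractmethod
    def fit(self, df_train, prediction_column_name, label_column_name,
            weight_column_name=None, categorical_feature_column_names=None,
            numerical_feature_column_names=None, **kwargs):
        ...

    @abstractmethod
    def predict(self, df, prediction_column_name,
                categorical_feature_column_names=None,
                numerical_feature_column_names=None, **kwargs):
        ...


class MeanShiftCalibrator(CalibratorInterface):
    """Hypothetical toy calibrator: adds a constant so that the weighted mean
    prediction matches the weighted mean label on the training data."""

    def fit(self, df_train, prediction_column_name, label_column_name,
            weight_column_name=None, categorical_feature_column_names=None,
            numerical_feature_column_names=None, **kwargs):
        preds = df_train[prediction_column_name].to_numpy(dtype=float)
        labels = df_train[label_column_name].to_numpy(dtype=float)
        if weight_column_name is None:
            weights = np.ones(len(df_train))
        else:
            weights = df_train[weight_column_name].to_numpy(dtype=float)
        # Feature columns are ignored: this toy method calibrates but does
        # not multicalibrate.
        self.shift_ = (np.average(labels, weights=weights)
                       - np.average(preds, weights=weights))
        return self  # return the fitted instance, as documented for fit()

    def predict(self, df, prediction_column_name,
                categorical_feature_column_names=None,
                numerical_feature_column_names=None, **kwargs):
        preds = df[prediction_column_name].to_numpy(dtype=float)
        # Keep the shifted values inside the valid probability range.
        return np.clip(preds + self.shift_, 0.0, 1.0)


df_train = pd.DataFrame({"score": [0.9, 0.8, 0.7, 0.6], "label": [1, 1, 0, 0]})
df_new = pd.DataFrame({"score": [0.5, 0.95]})

calibrator = MeanShiftCalibrator().fit(df_train, "score", "label")
calibrated = calibrator.predict(df_new, "score")  # approximately [0.25, 0.70]
```

The training scores average 0.75 while the labels average 0.5, so the learned shift is about -0.25 and is applied unchanged to new data.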

abstractmethod fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)

Fit the calibration method on the provided training data.

Parameters:
  • df_train (DataFrame) – The dataframe containing the training data.

  • prediction_column_name (str) – Name of the column in df_train that contains the predictions.

  • label_column_name (str) – Name of the column in df_train that contains the ground truth labels.

  • weight_column_name (str | None) – Name of the column in df_train that contains the instance weights.

  • categorical_feature_column_names (list[str] | None) – List of column names in df_train that contain the categorical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • numerical_feature_column_names (list[str] | None) – List of column names in df_train that contain the numerical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • kwargs (Any) – Additional keyword arguments.

Return type:

Self

Returns:

The fitted calibrator instance.

abstractmethod predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)

Apply the fitted calibration to a DataFrame.

fit() must have been called on this calibrator object beforehand.

Parameters:
  • df (DataFrame) – The dataframe containing the data to calibrate.

  • prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.

  • categorical_feature_column_names (list[str] | None) – List of column names in df that contain the categorical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • numerical_feature_column_names (list[str] | None) – List of column names in df that contain the numerical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • kwargs (Any) – Additional keyword arguments.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Returns:

Array of calibrated predictions.

fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)

Fit the calibrator and apply the calibration to all rows of df.

Parameters:
  • df (DataFrame) – The dataframe containing the data to calibrate.

  • prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.

  • label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.

  • weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.

  • categorical_feature_column_names (list[str] | None) – List of column names in df that contain the categorical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • numerical_feature_column_names (list[str] | None) – List of column names in df that contain the numerical dimensions of the segment space. This argument is ignored by methods that only calibrate and do not multicalibrate (e.g., isotonic regression and Platt scaling).

  • is_train_set_col_name (str | None) – Name of the column in df that flags whether each row belongs to the training set (1/True) or the test set (0/False). If is_train_set_col_name is not provided, all rows are treated as part of the training set.

  • kwargs (Any) – Additional keyword arguments.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Returns:

Array of calibrated predictions.
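One plausible reading of the is_train_set_col_name behaviour is sketched below: fit on the flagged rows only, then predict for every row. This is an assumption about how fit_transform() composes fit() and predict(), not the library's source. The function fit_transform_sketch, the ConstantRateCalibrator class, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


class ConstantRateCalibrator:
    """Hypothetical toy calibrator: predicts the training label mean for
    every row. It stands in for any object with fit() and predict()."""

    def fit(self, df_train, prediction_column_name, label_column_name, **kwargs):
        self.rate_ = float(df_train[label_column_name].mean())
        return self

    def predict(self, df, prediction_column_name, **kwargs):
        return np.full(len(df), self.rate_)


def fit_transform_sketch(calibrator, df, prediction_column_name,
                         label_column_name, is_train_set_col_name=None,
                         **kwargs):
    """Fit on the training rows, then return predictions for all rows."""
    if is_train_set_col_name is None:
        # No flag column: every row is treated as training data.
        df_train = df
    else:
        # Keep only rows flagged 1/True as training data.
        df_train = df[df[is_train_set_col_name].astype(bool)]
    calibrator.fit(df_train, prediction_column_name, label_column_name, **kwargs)
    return calibrator.predict(df, prediction_column_name, **kwargs)


df = pd.DataFrame({
    "score": [0.9, 0.8, 0.3, 0.2],
    "label": [1, 0, 1, 1],
    "is_train": [1, 1, 0, 0],
})

calibrated = fit_transform_sketch(
    ConstantRateCalibrator(), df, "score", "label",
    is_train_set_col_name="is_train",
)
```

With the flag column, only the first two rows (label mean 0.5) are used for fitting, yet the returned array has one entry per row of df. Without the flag, all four rows (label mean 0.75) are used for fitting.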