Methods Module
Calibration methods for machine learning models.
This module provides implementations of various calibration techniques including multicalibration methods (MCGrad), traditional approaches (Platt scaling, isotonic regression), and segment-aware calibrators.
All calibrators follow a scikit-learn-style fit/predict interface defined by BaseCalibrator.
- class mcgrad.methods.MCGrad(encode_categorical_variables=True, monotone_t=None, num_rounds=None, lightgbm_params=None, early_stopping=None, patience=None, early_stopping_use_crossvalidation=None, n_folds=None, early_stopping_score_func=None, early_stopping_minimize_score=None, early_stopping_timeout=28800, save_training_performance=False, monitored_metrics_during_training=None, allow_missing_segment_feature_values=True, random_state=42)[source]
Bases: _BaseMCGrad
MCGrad (Multicalibration Gradient Boosting) as described in [1].
References:
- [1] Tax, N., Perini, L., Linder, F., Haimovich, D., Karamshuk, D., Okati, N., Vojnovic, M., & Apostolopoulos, P. A. (2026). MCGrad: Multicalibration at Web Scale. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026). https://doi.org/10.1145/3770854.3783954
arXiv preprint: https://arxiv.org/abs/2509.19884
- UNSHRINK_LOGIT_EPSILON = 10
- DEFAULT_HYPERPARAMS: dict[str, Any] = {'early_stopping': True, 'lightgbm_params': {'lambda_l2': 0.009131373863997217, 'learning_rate': 0.028729759162731475, 'max_depth': 5, 'min_child_samples': 160, 'min_gain_to_split': 0.15007305226251808, 'n_estimators': 94, 'num_leaves': 5}, 'monotone_t': False, 'n_folds': 5, 'patience': 0}
- DEFAULT_ALLOW_MISSING_SEGMENT_FEATURE_VALUES = True
- ESS_THRESHOLD_FOR_CROSS_VALIDATION = 2500000
- MAX_NUM_ROUNDS_EARLY_STOPPING = 100
- MCE_STAT_SIGN_THRESHOLD = 2.49767216
- MCE_STRONG_EVIDENCE_THRESHOLD = 4.70812972
- NUM_ROUNDS_DEFAULT_NO_EARLY_STOPPING = 5
- VALID_SIZE = 0.4
- __init__(encode_categorical_variables=True, monotone_t=None, num_rounds=None, lightgbm_params=None, early_stopping=None, patience=None, early_stopping_use_crossvalidation=None, n_folds=None, early_stopping_score_func=None, early_stopping_minimize_score=None, early_stopping_timeout=28800, save_training_performance=False, monitored_metrics_during_training=None, allow_missing_segment_feature_values=True, random_state=42)
- Parameters:
encode_categorical_variables (bool) – whether to encode categorical variables using a modified label encoding (when True), or to assume that categorical variables have already been converted into the right format before calling MCGrad (when False).
monotone_t (bool | None) – whether to use a monotonicity constraint on the logit feature (i.e., t). A value of True means that the decision tree is blocked from creating splits where a lower value of t results in a higher predicted probability.
num_rounds (int | None) – number of boosting rounds used in MCGrad. When early stopping is used, num_rounds specifies the maximum number of rounds. If set to None, default values are used.
lightgbm_params (dict[str, Any] | None) – the training parameters of the LightGBM model. See https://lightgbm.readthedocs.io/en/stable/Parameters.html. If None, a set of default parameters is used.
early_stopping (bool | None) – whether to use early stopping. When early stopping is used, num_rounds specifies the maximum number of rounds that are fit, and the effective number of rounds is determined based on validation performance.
patience (int | None) – the maximum number of consecutive rounds without improvement in early_stopping_score_func.
early_stopping_use_crossvalidation (bool | None) – whether to use cross-validation (k-fold) for early stopping (otherwise a holdout set is used). If set to None, the evaluation method is determined automatically.
early_stopping_score_func (_ScoreFunctionInterface | None) – the metric used to select the optimal number of rounds when early stopping is used. If None, a subclass-specific default is used (log_loss for MCGrad, MSE for RegressionMCGrad). Use wrap_sklearn_metric_func() to wrap an sklearn metric, or wrap_multicalibration_error_metric() for multicalibration error.
early_stopping_minimize_score (bool | None) – whether the score function used for early stopping should be minimized (True) or maximized (False). Defaults to None, which automatically determines the direction based on the default metric. Must be explicitly set when providing a custom early_stopping_score_func.
early_stopping_timeout (int | None) – number of seconds after which early stopping is forced to stop and the number of rounds is determined. If set to None, early stopping will not time out. Ignored when early stopping is disabled.
n_folds (int | None) – number of folds for k-fold cross-validation (used only when early_stopping_use_crossvalidation is True, or when that argument is None and k-fold is chosen automatically).
save_training_performance (bool) – whether to save the training performance values for each round, in addition to the performance on the held-out validation set. Only relevant when early stopping is used. If set to False, only the performance on the held-out validation set is saved.
monitored_metrics_during_training (list[_ScoreFunctionInterface] | None) – a list of metrics to monitor during training, in addition to the metric used for early stopping (early_stopping_score_func). Only relevant when early stopping is used.
allow_missing_segment_feature_values (bool) – whether to allow missing values in the segment feature data. If set to True, missing values are used for training and prediction. If set to False, training with missing values will raise an Exception and prediction with missing values will return None.
random_state (int | Generator | None) – controls randomness for reproducibility. Can be an integer seed, a numpy Generator, or None for non-deterministic behavior.
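The interplay between num_rounds, early_stopping, and patience can be illustrated with a small, library-independent sketch. It assumes that the best-scoring round is kept and that the search stops once the number of consecutive non-improving rounds exceeds patience; the function name select_num_rounds and the convention that scores[0] is the score before any boosting round are hypothetical and are not part of the MCGrad API.

```python
def select_num_rounds(scores, patience=0, minimize=True):
    """Pick the number of rounds to keep from per-round validation scores.

    scores[k] is the validation score after k rounds (scores[0] is the
    score of the uncalibrated input). This indexing is an assumption made
    for the sketch, not MCGrad's internal representation.
    """
    best_round, best_score = 0, scores[0]
    rounds_without_improvement = 0
    for k in range(1, len(scores)):
        if minimize:
            improved = scores[k] < best_score
        else:
            improved = scores[k] > best_score
        if improved:
            best_round, best_score = k, scores[k]
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            # Stop once more than `patience` consecutive rounds failed to improve.
            if rounds_without_improvement > patience:
                break
    return best_round


# With patience=0 the search stops at the first non-improving round;
# with patience=1 it tolerates one bad round and finds the later optimum.
log_losses = [0.70, 0.65, 0.66, 0.60]
rounds_strict = select_num_rounds(log_losses, patience=0)
rounds_patient = select_num_rounds(log_losses, patience=1)
```

Under these assumptions, a larger patience trades extra training rounds for a lower risk of stopping at a local optimum of the validation score.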
- classmethod deserialize(model_str)
Deserializes an MCGrad model from a JSON string.
Reconstructs a fitted MCGrad model from a previously serialized representation. The behavior depends on the schema_version field:
schema_version == 2 or schema_version == 1: full configuration round-trip for the fields listed in _SCHEMA_V1_INIT_KWARGS. self.num_rounds is restored to the configured upper bound; use num_rounds_trained to get the actual booster count.
no schema_version field (legacy): boosters and encoder are restored; all other configuration falls back to defaults and a warning is logged.
unknown schema_version: raises ValueError.
- Parameters:
model_str (str) – JSON string containing the serialized model
- Return type:
Self
- Returns:
A fitted MCGrad instance with all state restored
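The version-dependent behavior described for deserialize can be sketched as a small dispatch on the schema_version field. This is an illustration only, not the library's implementation: the helper load_init_kwargs and the "init_kwargs" payload key are hypothetical, and only the three documented cases (versions 1 and 2, missing field, unknown version) are modeled.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Versions 1 and 2 are documented as structurally identical.
SUPPORTED_SCHEMA_VERSIONS = {1, 2}


def load_init_kwargs(model_str):
    """Return the persisted configuration from a serialized payload.

    The "init_kwargs" key is a made-up name for this sketch; the real
    payload layout is not specified in this reference.
    """
    payload = json.loads(model_str)
    if "schema_version" not in payload:
        # Legacy payload: configuration falls back to defaults, with a warning.
        logger.warning("No schema_version found; using default configuration.")
        return {}
    version = payload["schema_version"]
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"Unsupported schema_version: {version!r}")
    return dict(payload.get("init_kwargs", {}))


restored = load_init_kwargs('{"schema_version": 2, "init_kwargs": {"patience": 0}}')
```

Raising on unknown versions, rather than guessing, keeps a newer payload from being silently misread by older code.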
- feature_importance()
Returns the feature importance of the first MCGrad round.
Importance is defined as the total gain from splits on a feature from the first round of MCGrad.
- Return type:
DataFrame
- Returns:
A dataframe with columns ‘feature’ and ‘importance’, sorted by importance in descending order
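As a rough illustration of "total gain from splits on a feature", the sketch below sums per-split gains by feature and sorts the result in descending order. The input format (a list of (feature, gain) pairs) and the function name total_gain_importance are assumptions made for this example; the actual method reads gains from the first-round booster.

```python
from collections import defaultdict


def total_gain_importance(splits):
    """Aggregate split gains per feature, sorted by importance (descending).

    `splits` is a hypothetical list of (feature_name, gain) pairs, one per
    tree split.
    """
    gain_by_feature = defaultdict(float)
    for feature, gain in splits:
        gain_by_feature[feature] += gain
    return sorted(gain_by_feature.items(), key=lambda item: item[1], reverse=True)


example_splits = [("country", 2.0), ("logit", 5.0), ("country", 1.5), ("age", 0.5)]
importance = total_gain_importance(example_splits)
```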
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, df_val=None, **kwargs)
Fit the MCGrad calibration model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the uncalibrated predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – List of column names in df_train that contain the categorical segmentation features
numerical_feature_column_names (list[str] | None) – List of column names in df_train that contain the numerical segmentation features
df_val (DataFrame | None) – Optional validation dataframe for early stopping. When provided with early stopping enabled, this validation set is used instead of a holdout from the training data. early_stopping_use_crossvalidation has to be set to False for this to work.
- Return type:
Self
- Returns:
The fitted calibrator instance
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
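The role of is_train_set_col_name (fit on the flagged rows, then calibrate every row) can be sketched without the library. Everything here is a toy stand-in: MeanShiftCalibrator is a made-up calibrator, rows are plain dictionaries rather than a DataFrame, and fit_transform_sketch is a hypothetical helper that only mirrors the documented semantics.

```python
class MeanShiftCalibrator:
    """Toy calibrator: shifts predictions by the mean residual seen in fit."""

    def fit(self, rows, prediction_key, label_key):
        residuals = [row[label_key] - row[prediction_key] for row in rows]
        self.shift = sum(residuals) / len(residuals)
        return self

    def predict(self, rows, prediction_key):
        # Clamp to [0, 1] so outputs remain valid probabilities.
        return [min(1.0, max(0.0, row[prediction_key] + self.shift)) for row in rows]


def fit_transform_sketch(calibrator, rows, prediction_key, label_key, is_train_key=None):
    """Fit on training rows only, then return calibrated values for all rows."""
    if is_train_key is None:
        train_rows = rows  # no flag column: every row is a training row
    else:
        train_rows = [row for row in rows if row[is_train_key]]
    calibrator.fit(train_rows, prediction_key, label_key)
    return calibrator.predict(rows, prediction_key)


rows = [
    {"p": 0.2, "y": 0, "is_train": True},
    {"p": 0.6, "y": 1, "is_train": True},
    {"p": 0.5, "y": 1, "is_train": False},  # test row: calibrated, not fitted on
]
calibrated = fit_transform_sketch(MeanShiftCalibrator(), rows, "p", "y", "is_train")
```

The output has one value per input row, including the test row that did not take part in fitting.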
- property num_rounds_trained: int
Number of boosting rounds actually trained on this instance.
This is distinct from num_rounds, which is the configured upper bound supplied at construction time. With early stopping, the trained count can be strictly less than the configured upper bound. Returns 0 on an unfitted instance (equivalent to len(self.mr)).
- property performance_metrics: dict[str, list[float]]
Returns the performance metrics collected during early stopping procedure.
Metrics are tracked for each round of MCGrad during the early stopping phase. The dictionary contains metric names as keys and lists of values (one per round) as values. Metrics include the early stopping metric and any additional monitored metrics specified during initialization.
- Returns:
Dictionary mapping metric names to lists of values per round
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, return_all_rounds=False, **kwargs)
Apply the MCGrad calibration model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical segmentation features
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical segmentation features
return_all_rounds (bool) – If True, returns predictions for all MCGrad rounds as a 2D array of shape (num_rounds, num_samples). If False, returns only the final round predictions as a 1D array
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions. Shape depends on return_all_rounds parameter
- serialize()
Serializes the fitted MCGrad model to a JSON string.
The serialized model includes all boosters, unshrink factors, encoder state, and the full JSON-serializable configuration, allowing the model to be saved and restored later.
The output carries a schema_version field:
Version 2: identical structure to version 1; the bump signals that downstream consumers should enforce version checks.
Version 1: persists the simple scalar and dict-valued __init__ kwargs (see _SCHEMA_V1_INIT_KWARGS).
Fields backed by callables or RNG objects (custom early_stopping_score_func, early_stopping_minimize_score, monitored_metrics_during_training, random_state) are not persisted; a deserialized model uses subclass defaults for those.
- Return type:
str
- Returns:
JSON string containing the serialized model
- class mcgrad.methods.RegressionMCGrad(encode_categorical_variables=True, monotone_t=None, num_rounds=None, lightgbm_params=None, early_stopping=None, patience=None, early_stopping_use_crossvalidation=None, n_folds=None, early_stopping_score_func=None, early_stopping_minimize_score=None, early_stopping_timeout=28800, save_training_performance=False, monitored_metrics_during_training=None, allow_missing_segment_feature_values=True, random_state=42)[source]
Bases: _BaseMCGrad
Regression variant of MCGrad for continuous label calibration.
Note that automatic determination of train/test split vs. cross validation is currently not supported for Regression.
- DEFAULT_HYPERPARAMS: dict[str, Any] = {'early_stopping': True, 'lightgbm_params': {'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_gain_to_split': 0, 'n_estimators': 100, 'num_leaves': 31}, 'monotone_t': False, 'n_folds': 5, 'patience': 0}
- DEFAULT_ALLOW_MISSING_SEGMENT_FEATURE_VALUES = True
- ESS_THRESHOLD_FOR_CROSS_VALIDATION = 2500000
- MAX_NUM_ROUNDS_EARLY_STOPPING = 100
- MCE_STAT_SIGN_THRESHOLD = 2.49767216
- MCE_STRONG_EVIDENCE_THRESHOLD = 4.70812972
- NUM_ROUNDS_DEFAULT_NO_EARLY_STOPPING = 5
- VALID_SIZE = 0.4
- __init__(encode_categorical_variables=True, monotone_t=None, num_rounds=None, lightgbm_params=None, early_stopping=None, patience=None, early_stopping_use_crossvalidation=None, n_folds=None, early_stopping_score_func=None, early_stopping_minimize_score=None, early_stopping_timeout=28800, save_training_performance=False, monitored_metrics_during_training=None, allow_missing_segment_feature_values=True, random_state=42)
- Parameters:
encode_categorical_variables (bool) – whether to encode categorical variables using a modified label encoding (when True), or to assume that categorical variables have already been converted into the right format before calling MCGrad (when False).
monotone_t (bool | None) – whether to use a monotonicity constraint on the logit feature (i.e., t). A value of True means that the decision tree is blocked from creating splits where a lower value of t results in a higher predicted probability.
num_rounds (int | None) – number of boosting rounds used in MCGrad. When early stopping is used, num_rounds specifies the maximum number of rounds. If set to None, default values are used.
lightgbm_params (dict[str, Any] | None) – the training parameters of the LightGBM model. See https://lightgbm.readthedocs.io/en/stable/Parameters.html. If None, a set of default parameters is used.
early_stopping (bool | None) – whether to use early stopping. When early stopping is used, num_rounds specifies the maximum number of rounds that are fit, and the effective number of rounds is determined based on validation performance.
patience (int | None) – the maximum number of consecutive rounds without improvement in early_stopping_score_func.
early_stopping_use_crossvalidation (bool | None) – whether to use cross-validation (k-fold) for early stopping (otherwise a holdout set is used). If set to None, the evaluation method is determined automatically.
early_stopping_score_func (_ScoreFunctionInterface | None) – the metric used to select the optimal number of rounds when early stopping is used. If None, a subclass-specific default is used (log_loss for MCGrad, MSE for RegressionMCGrad). Use wrap_sklearn_metric_func() to wrap an sklearn metric, or wrap_multicalibration_error_metric() for multicalibration error.
early_stopping_minimize_score (bool | None) – whether the score function used for early stopping should be minimized (True) or maximized (False). Defaults to None, which automatically determines the direction based on the default metric. Must be explicitly set when providing a custom early_stopping_score_func.
early_stopping_timeout (int | None) – number of seconds after which early stopping is forced to stop and the number of rounds is determined. If set to None, early stopping will not time out. Ignored when early stopping is disabled.
n_folds (int | None) – number of folds for k-fold cross-validation (used only when early_stopping_use_crossvalidation is True, or when that argument is None and k-fold is chosen automatically).
save_training_performance (bool) – whether to save the training performance values for each round, in addition to the performance on the held-out validation set. Only relevant when early stopping is used. If set to False, only the performance on the held-out validation set is saved.
monitored_metrics_during_training (list[_ScoreFunctionInterface] | None) – a list of metrics to monitor during training, in addition to the metric used for early stopping (early_stopping_score_func). Only relevant when early stopping is used.
allow_missing_segment_feature_values (bool) – whether to allow missing values in the segment feature data. If set to True, missing values are used for training and prediction. If set to False, training with missing values will raise an Exception and prediction with missing values will return None.
random_state (int | Generator | None) – controls randomness for reproducibility. Can be an integer seed, a numpy Generator, or None for non-deterministic behavior.
- classmethod deserialize(model_str)
Deserializes an MCGrad model from a JSON string.
Reconstructs a fitted MCGrad model from a previously serialized representation. The behavior depends on the schema_version field:
schema_version == 2 or schema_version == 1: full configuration round-trip for the fields listed in _SCHEMA_V1_INIT_KWARGS. self.num_rounds is restored to the configured upper bound; use num_rounds_trained to get the actual booster count.
no schema_version field (legacy): boosters and encoder are restored; all other configuration falls back to defaults and a warning is logged.
unknown schema_version: raises ValueError.
- Parameters:
model_str (str) – JSON string containing the serialized model
- Return type:
Self
- Returns:
A fitted MCGrad instance with all state restored
- feature_importance()
Returns the feature importance of the first MCGrad round.
Importance is defined as the total gain from splits on a feature from the first round of MCGrad.
- Return type:
DataFrame
- Returns:
A dataframe with columns ‘feature’ and ‘importance’, sorted by importance in descending order
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, df_val=None, **kwargs)
Fit the MCGrad calibration model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the uncalibrated predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – List of column names in df_train that contain the categorical segmentation features
numerical_feature_column_names (list[str] | None) – List of column names in df_train that contain the numerical segmentation features
df_val (DataFrame | None) – Optional validation dataframe for early stopping. When provided with early stopping enabled, this validation set is used instead of a holdout from the training data. early_stopping_use_crossvalidation has to be set to False for this to work.
- Return type:
Self
- Returns:
The fitted calibrator instance
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- property num_rounds_trained: int
Number of boosting rounds actually trained on this instance.
This is distinct from num_rounds, which is the configured upper bound supplied at construction time. With early stopping, the trained count can be strictly less than the configured upper bound. Returns 0 on an unfitted instance (equivalent to len(self.mr)).
- property performance_metrics: dict[str, list[float]]
Returns the performance metrics collected during early stopping procedure.
Metrics are tracked for each round of MCGrad during the early stopping phase. The dictionary contains metric names as keys and lists of values (one per round) as values. Metrics include the early stopping metric and any additional monitored metrics specified during initialization.
- Returns:
Dictionary mapping metric names to lists of values per round
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, return_all_rounds=False, **kwargs)
Apply the MCGrad calibration model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical segmentation features
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical segmentation features
return_all_rounds (bool) – If True, returns predictions for all MCGrad rounds as a 2D array of shape (num_rounds, num_samples). If False, returns only the final round predictions as a 1D array
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions. Shape depends on return_all_rounds parameter
- serialize()
Serializes the fitted MCGrad model to a JSON string.
The serialized model includes all boosters, unshrink factors, encoder state, and the full JSON-serializable configuration, allowing the model to be saved and restored later.
The output carries a schema_version field:
Version 2: identical structure to version 1; the bump signals that downstream consumers should enforce version checks.
Version 1: persists the simple scalar and dict-valued __init__ kwargs (see _SCHEMA_V1_INIT_KWARGS).
Fields backed by callables or RNG objects (custom early_stopping_score_func, early_stopping_minimize_score, monitored_metrics_during_training, random_state) are not persisted; a deserialized model uses subclass defaults for those.
- Return type:
str
- Returns:
JSON string containing the serialized model
- early_stopping_score_func: _ScoreFunctionInterface
- early_stopping_estimation_method: _EstimationMethod
- class mcgrad.methods.PlattScaling[source]
Bases: BaseCalibrator
Platt scaling calibration method.
Platt scaling fits a logistic regression model to transform uncalibrated predictions into calibrated probabilities. Given an uncalibrated prediction \(\hat{p}\), it first converts to log-odds (logit): \(t = \log(\hat{p} / (1 - \hat{p}))\), then fits the model:
\[P(y=1 \mid t) = \sigma(a \cdot t + b)\]
where \(\sigma\) is the sigmoid function and \(a, b\) are learned parameters. This is equivalent to fitting a logistic regression with a single feature (the logit of the original prediction).
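The formula above can be made concrete with a dependency-free sketch that fits \(a\) and \(b\) by plain gradient descent on the log loss. This is not how the class is implemented (it exposes a scikit-learn LogisticRegression via log_reg); the function names, learning rate, iteration count, and clipping constant are all assumptions chosen for illustration.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of p, clipped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(preds, labels, learning_rate=0.1, n_iter=5000):
    """Fit P(y=1 | t) = sigmoid(a * t + b) by gradient descent on log loss."""
    ts = [logit(p) for p in preds]
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(ts)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for t, y in zip(ts, labels):
            error = sigmoid(a * t + b) - y  # derivative of log loss w.r.t. a*t+b
            grad_a += error * t
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def apply_platt(preds, a, b):
    return [sigmoid(a * logit(p) + b) for p in preds]


raw_preds = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(raw_preds, labels)
platt_calibrated = apply_platt(raw_preds, a, b)
```

Because \(a\) and \(b\) define a monotone map whenever \(a > 0\), Platt scaling preserves the ranking of the original predictions and only rescales their confidence.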
References:
Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), 61-74.
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. International Conference on Machine Learning (ICML). pp. 625-632.
- log_reg: LogisticRegression | None
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the Platt scaling model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – Ignored for Platt scaling (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for Platt scaling (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the Platt scaling model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – Ignored for Platt scaling (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for Platt scaling (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- class mcgrad.methods.IsotonicRegression[source]
Bases: BaseCalibrator
Isotonic regression calibration method.
Isotonic regression fits a non-decreasing step function that minimizes the mean squared error between calibrated predictions and true labels, subject to a monotonicity constraint. Given uncalibrated predictions \(\hat{p}_i\) and labels \(y_i\), it finds:
\[\min_{f} \sum_{i} (y_i - f(\hat{p}_i))^2 \quad \text{subject to} \quad f(\hat{p}_i) \leq f(\hat{p}_j) \text{ whenever } \hat{p}_i \leq \hat{p}_j\]
The result is a piecewise-constant function that maps predictions to calibrated probabilities. For input values outside of the training domain, predictions are clipped to the value corresponding to the nearest training interval endpoint.
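The constrained least-squares problem above is classically solved by the pool-adjacent-violators algorithm (PAVA). The sketch below is an unweighted, dependency-free version that assumes distinct prediction values and evaluates the fit as a step function with clipping at the training endpoints; the function names are hypothetical, and the class itself may interpolate between points differently.

```python
import bisect


def pava(sorted_labels):
    """Pool adjacent violators: best non-decreasing fit to the given sequence."""
    blocks = []  # each block is [mean, size]
    for y in sorted_labels:
        blocks.append([float(y), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_hi, size_hi = blocks.pop()
            mean_lo, size_lo = blocks.pop()
            size = size_lo + size_hi
            blocks.append([(mean_lo * size_lo + mean_hi * size_hi) / size, size])
    fitted = []
    for mean, size in blocks:
        fitted.extend([mean] * size)
    return fitted


def fit_isotonic(preds, labels):
    """Return sorted predictions and their fitted non-decreasing values."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    xs = [preds[i] for i in order]
    ys = pava([labels[i] for i in order])
    return xs, ys


def predict_isotonic(xs, ys, p):
    """Evaluate the step function at p, clipping outside the training domain."""
    if p <= xs[0]:
        return ys[0]
    if p >= xs[-1]:
        return ys[-1]
    return ys[bisect.bisect_right(xs, p) - 1]


xs, ys = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
below_domain = predict_isotonic(xs, ys, 0.05)
above_domain = predict_isotonic(xs, ys, 0.95)
```

In this example the middle two labels violate monotonicity, so PAVA pools them into a single block with value 0.5.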
References:
Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers. International Conference on Machine Learning (ICML). pp. 609-616.
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. International Conference on Machine Learning (ICML). pp. 625-632.
- __init__()[source]
Initializes an IsotonicRegression calibrator.
Creates an isotonic regression model that enforces monotonicity constraints. For input values outside of the training domain, predictions are set to the value corresponding to the nearest training interval endpoint.
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the isotonic regression calibration model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – Ignored for isotonic regression (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for isotonic regression (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the isotonic regression calibration model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – Ignored for isotonic regression (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for isotonic regression (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
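The role of is_train_set_col_name can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes that fit_transform fits on the rows flagged as training rows and returns predictions for every row. The `MeanShift` calibrator, the list-of-dicts data layout, and the free function `fit_transform` are hypothetical stand-ins.

```python
class MeanShift:
    """Hypothetical stand-in calibrator: shifts predictions by the mean residual."""

    def fit(self, rows, prediction_key, label_key):
        residuals = [r[label_key] - r[prediction_key] for r in rows]
        self.offset = sum(residuals) / len(residuals)
        return self

    def predict(self, rows, prediction_key):
        return [r[prediction_key] + self.offset for r in rows]


def fit_transform(calibrator, rows, prediction_key, label_key, is_train_key=None):
    """Fit on the training rows only, then calibrate all rows (assumed behaviour)."""
    if is_train_key is None:
        train_rows = rows  # no flag column: every row counts as training data
    else:
        train_rows = [r for r in rows if r[is_train_key]]
    calibrator.fit(train_rows, prediction_key, label_key)
    return calibrator.predict(rows, prediction_key)


rows = [
    {"prediction": 0.2, "label": 0, "is_train": True},
    {"prediction": 0.4, "label": 1, "is_train": True},
    {"prediction": 0.6, "label": 1, "is_train": False},
]
calibrated = fit_transform(MeanShift(), rows, "prediction", "label", "is_train")
```

The returned list has one entry per input row, including the row flagged as test data, which influenced nothing during fitting.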
- class mcgrad.methods.MultiplicativeAdjustment(clip_to_zero_one=True)[source]
Bases: BaseCalibrator
Calibrates predictions by applying a multiplicative correction factor.
This method computes a scalar multiplier \(m\) that aligns the sum of predictions with the sum of labels. Given predictions \(\hat{p}_i\), labels \(y_i\), and optional weights \(w_i\), the multiplier is computed as:
\[m = \frac{\sum_i w_i y_i}{\sum_i w_i \hat{p}_i}\]
The calibrated predictions are then \(m \cdot \hat{p}_i\). This is useful when predictions are directionally correct but systematically over- or under-estimated.
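A worked instance of the multiplier formula, written as a plain-Python sketch rather than the library's implementation. The function name `multiplicative_adjustment` is hypothetical; the `clip_to_zero_one` flag mirrors the constructor argument of the same name.

```python
def multiplicative_adjustment(preds, labels, weights=None, clip_to_zero_one=True):
    """Compute m = sum(w * y) / sum(w * p) and return (m, calibrated predictions)."""
    if weights is None:
        weights = [1.0] * len(preds)
    numerator = sum(w * y for w, y in zip(weights, labels))
    denominator = sum(w * p for w, p in zip(weights, preds))
    m = numerator / denominator
    calibrated = [m * p for p in preds]
    if clip_to_zero_one:
        # Scaling can push values above 1, so clip back into [0, 1].
        calibrated = [min(1.0, max(0.0, c)) for c in calibrated]
    return m, calibrated


# Predictions sum to 1.0 while labels sum to 2, so m is approximately 2.
m, calibrated = multiplicative_adjustment([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1])

# Here 2 * 0.7 exceeds 1, so the second calibrated value is clipped to 1.0.
m_clip, clipped = multiplicative_adjustment([0.3, 0.7], [1, 1])
```

Without clipping, the weighted sum of calibrated predictions equals the weighted sum of labels by construction; clipping can break that equality when the multiplier pushes values above 1.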
- __init__(clip_to_zero_one=True)[source]
- Parameters:
clip_to_zero_one (bool) – If True, clips calibrated predictions to the [0, 1] range.
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the multiplicative adjustment calibration model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – Ignored for multiplicative adjustment (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for multiplicative adjustment (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the multiplicative adjustment calibration model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – Ignored for multiplicative adjustment (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for multiplicative adjustment (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- class mcgrad.methods.AdditiveAdjustment(clip_to_zero_one=True)[source]
Bases: BaseCalibrator
Calibrates predictions by adding a constant correction term.
This method computes a scalar offset \(c\) that aligns the weighted average of predictions with the weighted average of labels. Given predictions \(\hat{p}_i\), labels \(y_i\), and optional weights \(w_i\), the offset is computed as:
\[c = \frac{\sum_i w_i (y_i - \hat{p}_i)}{\sum_i w_i}\]
The calibrated predictions are then \(\hat{p}_i + c\). This is useful when predictions have an approximately constant bias that needs correction.
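The offset formula can be checked numerically with a short plain-Python sketch. It is an illustration, not the library's implementation; the function name `additive_adjustment` is hypothetical, and the `clip_to_zero_one` flag mirrors the constructor argument of the same name.

```python
def additive_adjustment(preds, labels, weights=None, clip_to_zero_one=True):
    """Compute c = sum(w * (y - p)) / sum(w) and return (c, calibrated predictions)."""
    if weights is None:
        weights = [1.0] * len(preds)
    total_weight = sum(weights)
    c = sum(w * (y - p) for w, y, p in zip(weights, labels, preds)) / total_weight
    calibrated = [p + c for p in preds]
    if clip_to_zero_one:
        # A shift can move values outside [0, 1], so clip them back.
        calibrated = [min(1.0, max(0.0, v)) for v in calibrated]
    return c, calibrated


preds = [0.1, 0.2, 0.3, 0.4]
labels = [0, 1, 1, 1]
c, calibrated = additive_adjustment(preds, labels)

# With no clipping triggered, the mean calibrated prediction matches the label mean.
mean_calibrated = sum(calibrated) / len(calibrated)
mean_label = sum(labels) / len(labels)
```

In this example the average prediction is 0.25 and the average label is 0.75, so the offset is about 0.5 and every prediction moves up by that amount.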
- __init__(clip_to_zero_one=True)[source]
- Parameters:
clip_to_zero_one (bool) – If True, clips calibrated predictions to the [0, 1] range.
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the additive adjustment calibration model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – Ignored for additive adjustment (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for additive adjustment (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the additive adjustment calibration model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – Ignored for additive adjustment (no multicalibration)
numerical_feature_column_names (list[str] | None) – Ignored for additive adjustment (no multicalibration)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- class mcgrad.methods.IdentityCalibrator[source]
Bases: BaseCalibrator
A pass-through calibrator that returns predictions unchanged. Useful as a baseline or fallback option.
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the identity calibrator. This is a no-op: nothing is learned, and predict returns the predictions unchanged.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data (ignored)
prediction_column_name (str) – Name of the column in df_train that contains the predictions (ignored)
label_column_name (str) – Name of the column in df_train that contains the ground truth labels (ignored)
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights (ignored)
categorical_feature_column_names (list[str] | None) – Ignored
kwargs (Any) – Additional keyword arguments (ignored)
- Return type:
Self
- Returns:
The calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the identity calibrator (returns uncalibrated predictions).
- Parameters:
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of uncalibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- class mcgrad.methods.PlattScalingWithFeatures[source]
Bases: BaseCalibrator
A variant of Platt scaling that incorporates additional features alongside the log-odds.
This calibrator fits a logistic regression model using the log-odds of the original prediction plus additional features derived from categorical and numerical columns. Given an uncalibrated prediction \(\hat{p}\) and feature vector \(\mathbf{x}\), it fits the model:
\[P(y=1 | \hat{p}, \mathbf{x}) = \sigma(a \cdot t + \mathbf{w}^T \mathbf{x} + b)\]
where \(t = \log(\hat{p} / (1 - \hat{p}))\) is the logit transformation, \(\sigma\) is the sigmoid function, \(a\) is the coefficient for the logit, \(\mathbf{w}\) are the coefficients for the features, and \(b\) is the intercept.
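To make the model equation concrete, the following sketch evaluates it for given coefficients. It only illustrates the prediction step, not how the coefficients are fitted. The function names and the coefficient values are hypothetical, and the `eps` clipping inside `logit` is an assumption added here to avoid taking the logarithm of zero.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of p; p is clipped away from 0 and 1 (assumed safeguard)."""
    p = min(1.0 - eps, max(eps, p))
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def platt_with_features(p_hat, x, a, w, b):
    """Evaluate sigma(a * t + w^T x + b) with t = logit(p_hat)."""
    t = logit(p_hat)
    z = a * t + sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)


# With a = 1, zero feature coefficients, and b = 0 the prediction is unchanged
# up to floating-point error.
unchanged = platt_with_features(0.3, [1.0, 0.0], a=1.0, w=[0.0, 0.0], b=0.0)

# One-hot features let the same base prediction move in different directions
# depending on which category is active (illustrative coefficients).
w = [0.5, -0.5]
raised = platt_with_features(0.3, [1.0, 0.0], a=1.0, w=w, b=0.0)
lowered = platt_with_features(0.3, [0.0, 1.0], a=1.0, w=w, b=0.0)
```

The last two calls show why a single model can still calibrate differently per feature value: the feature term shifts the log-odds before the sigmoid is applied.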
Categorical features are one-hot encoded and numerical features are discretized into 3 quantile bins before fitting. This allows the calibration to vary across different feature values while still learning a single unified model (unlike SegmentwiseCalibrator, which fits completely separate models per segment).
- log_reg: LogisticRegression | None
- ohe: OneHotEncoder | None
- kbd: KBinsDiscretizer | None
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit the Platt scaling with features model on the provided training data.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – List of column names in df_train that contain the categorical segmentation features (these will be one-hot encoded)
numerical_feature_column_names (list[str] | None) – List of column names in df_train that contain the numerical segmentation features (these will be discretized into bins)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply the Platt scaling with features model to a DataFrame.
This requires the fit method to have been previously called on this calibrator object.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical segmentation features (must match the features used during training)
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical segmentation features (must match the features used during training)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- class mcgrad.methods.SegmentwiseCalibrator(calibrator_class, calibrator_kwargs=None)[source]
Bases: Generic[TCalibrator], BaseCalibrator
A meta-calibrator that partitions data into segments based on categorical features and applies a separate calibration method to each segment. This enables more precise calibration when different segments require different calibration adjustments.
Example:
    calibrator = SegmentwiseCalibrator(calibrator_class=PlattScaling)
    calibrator.fit(
        df_train,
        prediction_column_name="prediction",
        label_column_name="label",
        categorical_feature_column_names=["country"],
    )
    calibrated_predictions = calibrator.predict(
        df_test,
        prediction_column_name="prediction",
        categorical_feature_column_names=["country"],
    )
This is equivalent to fitting a separate PlattScaling model for each unique country value in the dataset. At prediction time, each sample is calibrated using the calibration model that was fit on its corresponding country segment. For unseen segments during prediction, the uncalibrated predictions are returned.
- classmethod __class_getitem__(params)
Parameterizes a generic class.
At least, parameterizing a generic class is the main thing this method does. For example, for some generic class Foo, this is called when we do Foo[int] - there, with cls=Foo and params=int.
However, note that this method is also called when defining generic classes in the first place with class Foo(Generic[T]): ….
- fit_transform(df, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, is_train_set_col_name=None, **kwargs)
Fit the model and apply calibration transformation to all data.
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate.
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions.
label_column_name (str) – Name of the column in dataframe df that contains the ground truth labels.
weight_column_name (str | None) – Name of the column in dataframe df that contains the instance weights.
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical dimensions that are part of the segment space. This argument is ignored by methods that merely calibrate and do not multicalibrate (e.g., Isotonic regression and Platt scaling).
is_train_set_col_name (str | None) – Name of the column in the dataframe that contains a boolean indicating whether the row is part of the training set (1/True) or test set (0/False). If no is_train_set_col_name is provided, then all rows are considered part of the training set.
kwargs (Any) – Additional keyword arguments.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions.
- calibrator_per_segment: dict[str, BaseCalibrator]
- fit(df_train, prediction_column_name, label_column_name, weight_column_name=None, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Fit segment-specific calibration models on the provided training data.
Data is partitioned into segments based on categorical features, and a separate calibrator is fit for each segment.
- Parameters:
df_train (DataFrame) – The dataframe containing the training data
prediction_column_name (str) – Name of the column in df_train that contains the predictions
label_column_name (str) – Name of the column in df_train that contains the ground truth labels
weight_column_name (str | None) – Name of the column in df_train that contains the instance weights
categorical_feature_column_names (list[str] | None) – List of column names in df_train that contain the categorical segmentation features (passed to individual calibrators)
numerical_feature_column_names (list[str] | None) – List of column names in df_train that contain the numerical segmentation features (passed to individual calibrators)
kwargs (Any) – Additional keyword arguments
- Return type:
Self
- Returns:
The fitted calibrator instance
- predict(df, prediction_column_name, categorical_feature_column_names=None, numerical_feature_column_names=None, **kwargs)[source]
Apply segment-specific calibration models to a DataFrame.
This requires the fit method to have been previously called on this calibrator object. For any unseen segments, the identity calibrator is used (returns uncalibrated predictions).
- Parameters:
df (DataFrame) – The dataframe containing the data to calibrate
prediction_column_name (str) – Name of the column in dataframe df that contains the predictions
categorical_feature_column_names (list[str] | None) – List of column names in the df that contain the categorical segmentation features (must match the features used during training)
numerical_feature_column_names (list[str] | None) – List of column names in the df that contain the numerical segmentation features (must match the features used during training)
kwargs (Any) – Additional keyword arguments
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Returns:
Array of calibrated predictions
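The per-segment dispatch of SegmentwiseCalibrator, including the identity fallback for unseen segments, can be sketched in plain Python. This is a simplified illustration under assumptions, not the library's code: the `MeanShift` calibrator, the helper functions, the list-of-dicts data layout, and the tuple used as segment key are all hypothetical.

```python
from collections import defaultdict


class MeanShift:
    """Hypothetical per-segment calibrator: adds the segment's mean residual."""

    def fit(self, preds, labels):
        self.offset = sum(y - p for p, y in zip(preds, labels)) / len(preds)
        return self

    def predict(self, preds):
        return [p + self.offset for p in preds]


def segment_key(row, segment_columns):
    # One segment per unique combination of categorical values.
    return tuple(row[c] for c in segment_columns)


def fit_per_segment(rows, segment_columns, calibrator_class=MeanShift):
    """Group rows by segment and fit one fresh calibrator per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[segment_key(row, segment_columns)].append(row)
    return {
        key: calibrator_class().fit(
            [r["prediction"] for r in group], [r["label"] for r in group]
        )
        for key, group in groups.items()
    }


def predict_per_segment(rows, segment_columns, calibrators):
    """Use the segment's calibrator, or return the raw prediction if unseen."""
    out = []
    for row in rows:
        calibrator = calibrators.get(segment_key(row, segment_columns))
        p = row["prediction"]
        out.append(p if calibrator is None else calibrator.predict([p])[0])
    return out


train = [
    {"country": "US", "prediction": 0.2, "label": 0},
    {"country": "US", "prediction": 0.4, "label": 1},
    {"country": "NL", "prediction": 0.5, "label": 0},
    {"country": "NL", "prediction": 0.7, "label": 1},
]
test = [
    {"country": "US", "prediction": 0.3},
    {"country": "NL", "prediction": 0.3},
    {"country": "BR", "prediction": 0.3},  # unseen segment: left uncalibrated
]
calibrators = fit_per_segment(train, ["country"])
calibrated = predict_per_segment(test, ["country"], calibrators)
```

The same raw prediction of 0.3 is shifted up for one country, shifted down for the other, and passed through unchanged for the country that never appeared in training.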