ssbc.calibration

Calibration-related APIs.

This package provides conformal prediction, bootstrap uncertainty analysis, and cross-conformal validation utilities.

ssbc.calibration.alpha_scan(labels, probs, fixed_threshold=None)

Scan through all possible alpha thresholds and report prediction set statistics.

For each unique threshold value derived from the calibration data’s non-conformity scores, this function computes the number of abstentions, singletons, and doublets for both classes using Mondrian conformal prediction.

Optionally, a fixed threshold can be evaluated separately and returned as a dict.
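The sketch below illustrates what each scan row counts. It builds Mondrian prediction sets for one fixed pair of per-class thresholds and tallies the set sizes. It assumes that label y is included when 1 - P(y|x) ≤ qhat_y, which matches the nonconformity score used elsewhere on this page. `count_set_types` is a hypothetical helper, not part of ssbc, and it only produces the marginal columns.

```python
import numpy as np


def count_set_types(labels, probs, qhat_0, qhat_1):
    """Tally prediction-set sizes for one fixed pair of per-class thresholds.

    Sketch only: assumes label y is included when 1 - P(y|x) <= qhat_y.
    """
    labels = np.asarray(labels)
    scores = 1.0 - np.asarray(probs, dtype=float)  # nonconformity score per candidate label
    in_set = scores <= np.array([qhat_0, qhat_1])  # (n, 2) set-membership matrix
    size = in_set.sum(axis=1)
    singleton = size == 1
    covered = in_set[np.arange(len(labels)), labels]  # is the true label in the set?
    n_singletons = int(singleton.sum())
    n_correct = int((singleton & covered).sum())
    return {
        "n_abstentions": int((size == 0).sum()),
        "n_singletons": n_singletons,
        "n_doublets": int((size == 2).sum()),
        "n_singletons_correct": n_correct,
        "singleton_coverage": n_correct / n_singletons if n_singletons else float("nan"),
    }


labels = np.array([0, 1, 0, 1])
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
stats = count_set_types(labels, probs, qhat_0=0.75, qhat_1=0.15)
# -> 1 abstention, 3 singletons (2 correct), 0 doublets
```

To get the per-class columns such as n_singletons_0, call the same helper on the rows where labels == 0 (or 1).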

Parameters:

labels (np.ndarray, shape (n,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Classification probabilities [P(class=0), P(class=1)]
fixed_threshold (float, optional) – Fixed non-conformity score threshold for special case analysis. If None (default), no fixed threshold is evaluated.

Returns:

If fixed_threshold is None: a DataFrame with the scan results.
If fixed_threshold is provided: a tuple of (DataFrame with the scan results, dict with the fixed-threshold results).

DataFrame columns:

- alpha: miscoverage rate (alpha)
- qhat_0: threshold for class 0
- qhat_1: threshold for class 1
- n_abstentions: number of empty prediction sets
- n_singletons: number of singleton prediction sets
- n_doublets: number of doublet prediction sets
- n_singletons_correct: number of correct singletons (marginal)
- singleton_coverage: fraction of singletons that are correct (marginal)
- n_singletons_0: singletons when true label is 0
- n_singletons_correct_0: correct singletons when true label is 0
- singleton_coverage_0: coverage for class 0 singletons
- n_singletons_1: singletons when true label is 1
- n_singletons_correct_1: correct singletons when true label is 1
- singleton_coverage_1: coverage for class 1 singletons

The fixed-threshold dict (when provided) has the same keys as the DataFrame columns.

Return type:

pd.DataFrame or tuple[pd.DataFrame, dict]

Examples

>>> import numpy as np
>>> from ssbc.calibration import alpha_scan
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> df = alpha_scan(labels, probs)
>>> print(df.head())

ssbc.calibration.compute_pac_operational_metrics(y_cal, probs_cal, alpha, delta, ci_level=0.95, class_label=1)

Compute PAC-controlled confidence intervals for operational metrics.

Extends SSBC to provide rigorous bounds on operational metrics (singleton rates, escalation rates) without accepting risk by fiat. Uses a two-step approach:

1. SSBC for coverage: compute α_adj such that Pr(coverage ≥ 1-α) ≥ 1-δ.
2. PAC bounds on operational rates: for each possible α' in a discrete grid, run LOO-CV to estimate the operational metrics, weight each estimate by its Beta-distribution probability, and aggregate to obtain PAC-controlled bounds.
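Step 1 can be sketched with SciPy using the standard exchangeability result. When the threshold is the k-th smallest of n continuous calibration scores, the coverage on a new point follows Beta(k, n+1-k). The function name `ssbc_alpha_adj` is hypothetical. This is a minimal illustration of the search, not the package's implementation.

```python
from scipy.stats import beta


def ssbc_alpha_adj(n, alpha, delta):
    """Largest grid value alpha' = (n + 1 - k) / (n + 1) whose threshold
    (the k-th smallest of n calibration scores) satisfies
    Pr(coverage >= 1 - alpha) >= 1 - delta; None if no rank qualifies.

    Assumes exchangeable, continuous scores, so coverage ~ Beta(k, n + 1 - k).
    """
    for k in range(1, n + 1):  # smallest k first gives the largest alpha'
        p_ok = beta.sf(1 - alpha, k, n + 1 - k)  # Pr(coverage >= 1 - alpha)
        if p_ok >= 1 - delta:
            return (n + 1 - k) / (n + 1)
    return None


print(ssbc_alpha_adj(100, alpha=0.1, delta=0.1))  # somewhat below 0.1
print(ssbc_alpha_adj(5, alpha=0.1, delta=0.1))    # None: n is too small
```

With n = 5, alpha = 0.1 and delta = 0.1, even the largest rank only gives Pr(coverage ≥ 0.9) = 1 - 0.9^5 ≈ 0.41, so no α' qualifies. This is one way the small-n edge case surfaces.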

Parameters:

y_cal (np.ndarray, shape (n,)) – Binary labels (0 or 1) for calibration set
probs_cal (np.ndarray, shape (n,) or (n, 2)) – Predicted probabilities. If 1D, interpreted as P(class=1). If 2D, uses column corresponding to class_label.
alpha (float) – Target miscoverage rate (must be in (0, 1))
delta (float) – PAC risk tolerance (must be in (0, 1))
ci_level (float, default=0.95) – Confidence level for operational metric CIs (e.g., 0.95 for 95% CI)
class_label (int, default=1) – Which class to calibrate for (0 or 1). Uses class_label column if probs_cal is 2D.

Returns:

Dictionary with keys:

- ‘alpha_adj’: Adjusted miscoverage from SSBC
- ‘singleton_rate_ci’: [lower, upper] PAC-controlled bounds
- ‘doublet_rate_ci’: [lower, upper]
- ‘abstention_rate_ci’: [lower, upper]
- ‘expected_singleton_rate’: Probability-weighted mean singleton rate
- ‘expected_doublet_rate’: Probability-weighted mean doublet rate
- ‘expected_abstention_rate’: Probability-weighted mean abstention rate
- ‘alpha_grid’: Discrete grid of possible alphas
- ‘singleton_fractions’: Singleton rate for each alpha in grid
- ‘doublet_fractions’: Doublet rate for each alpha in grid
- ‘abstention_fractions’: Abstention rate for each alpha in grid
- ‘beta_weights’: Probability weights from Beta distribution
- ‘n_calibration’: Number of calibration points

Return type:

dict

Examples

>>> import numpy as np
>>> from ssbc.calibration import compute_pac_operational_metrics
>>> y_cal = np.array([0, 1, 0, 1, 1])
>>> probs_cal = np.array([0.2, 0.8, 0.3, 0.9, 0.7])
>>> result = compute_pac_operational_metrics(
...     y_cal, probs_cal, alpha=0.1, delta=0.1
... )
>>> print(f"Singleton rate: [{result['singleton_rate_ci'][0]:.3f}, "
...       f"{result['singleton_rate_ci'][1]:.3f}]")

Notes

Mathematical Framework:

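Coverage decomposes as:

coverage = p_s·(1 − α_singleton) + p_d·1 + p_a·0

where p_s, p_d and p_a are the fractions of singletons, doublets and abstentions, and α_singleton is the error rate among singletons. In the binary setting, a doublet always contains the true label and an abstention never does.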
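The decomposition can be checked numerically. The sketch below takes a boolean prediction-set matrix for a binary task; this input format is an assumption for illustration, not an ssbc type. It computes coverage directly and again through p_s, p_d, p_a and the singleton error rate α_singleton. The helper name is hypothetical.

```python
import numpy as np


def coverage_two_ways(in_set, labels):
    """Return (direct coverage, coverage rebuilt from the decomposition).

    in_set: (n, 2) boolean prediction-set matrix for a binary task.
    """
    in_set = np.asarray(in_set, dtype=bool)
    labels = np.asarray(labels)
    size = in_set.sum(axis=1)
    covered = in_set[np.arange(len(labels)), labels]  # true label inside the set?
    p_s, p_d, p_a = (np.mean(size == s) for s in (1, 2, 0))
    single = size == 1
    alpha_singleton = 1.0 - covered[single].mean() if single.any() else 0.0
    decomposed = p_s * (1 - alpha_singleton) + p_d * 1 + p_a * 0
    return covered.mean(), decomposed


# Two correct singletons, one wrong singleton, one doublet, one abstention
in_set = [[1, 0], [1, 0], [1, 1], [0, 0], [0, 1]]
labels = [0, 1, 1, 0, 1]
direct, decomposed = coverage_two_ways(in_set, labels)
print(direct, decomposed)  # both approximately 0.6
```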

For each α' in the discrete grid {k/(n+1)}, k = 1, …, n:

1. Run LOO-CV to determine the prediction set for each point.
2. Calculate the operational rates p_s(α'), p_d(α'), p_a(α').
3. Compute Clopper-Pearson CIs for each rate.
4. Weight by the Beta(k, n+1-k) probability.

Aggregate across α' using these probability weights to obtain PAC-controlled bounds.
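Step 3 and the final weighting can be sketched with SciPy's Beta quantile function and the textbook Clopper-Pearson formula. This page does not say exactly how Beta(k, n+1-k) is turned into per-grid-point weights. For that reason, `weighted_rate` simply takes the weights as given and normalizes them. Both helper names are hypothetical.

```python
import numpy as np
from scipy.stats import beta


def clopper_pearson(x, n, ci_level=0.95):
    """Exact two-sided Clopper-Pearson interval for x successes out of n."""
    a = 1.0 - ci_level
    lower = 0.0 if x == 0 else beta.ppf(a / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - a / 2, x + 1, n - x)
    return float(lower), float(upper)


def weighted_rate(rates, weights):
    """Probability-weighted mean of per-grid-point rates; weights are normalized here."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(rates, dtype=float)))


print(clopper_pearson(7, 10))                 # 95% CI for a 7/10 singleton rate
print(weighted_rate([0.2, 0.4], [1.0, 3.0]))  # about 0.35
```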

Edge cases:

- Small n: the discretization is coarse, so bounds may be conservative.
- Extreme α or δ: bounds may be very wide.
- Class imbalance: the analysis focuses on class_label, so ensure that class has sufficient samples.

ssbc.calibration.mondrian_conformal_calibrate(class_data, alpha_target, delta, mode='beta', m=None)

Perform Mondrian (per-class) conformal calibration with SSBC correction.

For each class, compute:

1. Nonconformity scores: s(x, y) = 1 - P(y|x)
2. SSBC-corrected alpha for the PAC guarantee
3. Conformal quantile threshold
4. Singleton error-rate bounds via the PAC guarantee

Then evaluate prediction set sizes on the calibration data, both per class and marginally.
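Steps 1 and 3 for a single α can be sketched as follows, using the usual split-conformal rank k = ⌈(n_y + 1)(1 − α)⌉ within each class. The sketch omits the SSBC correction (pass an already-corrected α to mimic it) and the singleton error bounds. `per_class_qhat` is a hypothetical helper, not the ssbc implementation.

```python
import numpy as np


def per_class_qhat(labels, probs, alpha):
    """Mondrian split-conformal thresholds [qhat_0, qhat_1] without SSBC.

    For class y the scores are 1 - P(y|x) over points whose true label is y;
    qhat_y is the k-th smallest with k = ceil((n_y + 1) * (1 - alpha)).
    """
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    qhat = np.empty(2)
    for y in (0, 1):
        scores = np.sort(1.0 - probs[labels == y, y])
        k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
        # When k exceeds n_y, only an infinite threshold (always include y) is valid
        qhat[y] = np.inf if k > len(scores) else scores[k - 1]
    return qhat


labels = np.array([0, 1, 0, 1])
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
print(per_class_qhat(labels, probs, alpha=0.5))  # approximately [0.2, 0.3]
print(per_class_qhat(labels, probs, alpha=0.1))  # [inf, inf]: two points per class is too few
```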

Parameters:

class_data (dict) – Output from split_by_class()
alpha_target (float or dict) – Target miscoverage rate for each class. If float: the same for both classes. If dict: {0: α0, 1: α1} for per-class control.
delta (float or dict) – PAC risk tolerance for each class. If float: the same for both classes. If dict: {0: δ0, 1: δ1} for per-class control.
mode (str, default="beta") – “beta” (infinite test set) or “beta-binomial” (finite test window of size m)
m (int, optional) – Test window size for beta-binomial mode

Returns:

calibration_result (dict) – Dictionary with keys 0 and 1, each containing calibration info
prediction_stats (dict) – Dictionary with keys:
- 0, 1: per-class statistics (conditioned on the true label)
- ‘marginal’: overall statistics (ignoring true labels)

Return type:

tuple[dict[int, dict[str, Any]], dict[Any, Any]]

Examples

>>> import numpy as np
>>> from ssbc.calibration import mondrian_conformal_calibrate, split_by_class
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> class_data = split_by_class(labels, probs)
>>> cal_result, pred_stats = mondrian_conformal_calibrate(
...     class_data, alpha_target=0.1, delta=0.1
... )

ssbc.calibration.split_by_class(labels, probs)

Split calibration data by true class for Mondrian conformal prediction.

Parameters:

labels (np.ndarray, shape (n,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Classification probabilities [P(class=0), P(class=1)]

Returns:

Dictionary with keys 0 and 1, each containing:

- ‘labels’: labels for this class (all the same value)
- ‘probs’: probabilities for samples in this class
- ‘indices’: original indices (for tracking)
- ‘n’: number of samples in this class

Return type:

dict

Examples

>>> import numpy as np
>>> from ssbc.calibration import split_by_class
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> class_data = split_by_class(labels, probs)
>>> print(class_data[0]['n'])  # Number of class 0 samples
2
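The returned layout can be mimicked in a few lines of NumPy. This may help when inspecting or hand-building inputs for mondrian_conformal_calibrate(). The sketch uses a hypothetical name and is not the ssbc source.

```python
import numpy as np


def split_by_class_sketch(labels, probs):
    """Hypothetical stand-in that mirrors the documented return layout."""
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    out = {}
    for y in (0, 1):
        idx = np.flatnonzero(labels == y)  # original positions of class-y samples
        out[y] = {"labels": labels[idx], "probs": probs[idx], "indices": idx, "n": len(idx)}
    return out


parts = split_by_class_sketch([0, 1, 0, 1], [[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
print(parts[1]["indices"], parts[1]["n"])  # [1 3] 2
```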

ssbc.calibration.cross_conformal_validation(labels, probs, alpha_target=0.1, delta=0.1, n_folds=5, stratify=True, seed=None)

K-fold cross-conformal validation for Mondrian conformal prediction.

Estimates the variability of operational rates (abstentions, singletons, doublets) due to finite calibration sample effects by splitting data into K folds.

For each fold:

1. Train: compute SSBC-corrected thresholds on the other K-1 folds.
2. Test: evaluate operational rates on the held-out fold.
3. Record: store the rates for this fold.

Aggregate rates across folds to quantify finite-sample variability.
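The fold bookkeeping can be sketched as follows. Class members are shuffled and dealt round-robin, so each fold keeps roughly the original class balance. Per-fold rates are then summarized in the layout listed under Returns. The per-fold calibrate-and-evaluate step is left to a caller-supplied `rate_fn`; that callable and all three helper names are hypothetical. Using the sample standard deviation (ddof=1) is also an assumption, and the package may use a different convention.

```python
import numpy as np


def stratified_fold_ids(labels, n_folds, seed=None):
    """Assign each point a fold id, dealing each class round-robin after shuffling."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold = np.empty(len(labels), dtype=int)
    for y in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == y))
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold


def cross_fold_rates(labels, probs, rate_fn, n_folds=5, seed=None):
    """Call rate_fn(train_labels, train_probs, test_labels, test_probs) once per fold."""
    labels, probs = np.asarray(labels), np.asarray(probs)
    fold = stratified_fold_ids(labels, n_folds, seed)
    return [
        rate_fn(labels[fold != f], probs[fold != f], labels[fold == f], probs[fold == f])
        for f in range(n_folds)
    ]


def summarize(samples):
    """Statistics dict in the layout listed under Returns."""
    s = np.asarray(samples, dtype=float)
    qs = {"q05": 0.05, "q25": 0.25, "q50": 0.50, "q75": 0.75, "q95": 0.95}
    return {
        "samples": s,
        "mean": float(s.mean()),
        "std": float(s.std(ddof=1)),  # sample std: an assumption
        "quantiles": {name: float(np.quantile(s, q)) for name, q in qs.items()},
    }


# Toy check: with 10 points per class and 5 folds, every test fold is half class 1
labels = np.array([0] * 10 + [1] * 10)
probs = np.full((20, 2), 0.5)
rates = cross_fold_rates(labels, probs, lambda yt, pt, ye, pe: float(np.mean(ye == 1)), n_folds=5, seed=0)
print(summarize(rates)["mean"])  # 0.5
```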

Parameters:

labels (np.ndarray, shape (n,)) – Calibration labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Calibration probabilities [P(class=0), P(class=1)]
alpha_target (float, default=0.10) – Target miscoverage rate
delta (float, default=0.10) – PAC risk for SSBC correction
n_folds (int, default=5) – Number of folds (K)
stratify (bool, default=True) – Stratify folds by class labels
seed (int, optional) – Random seed for reproducibility

Returns:

Cross-conformal results with keys:

- ‘fold_rates’: list of rate dicts, one per fold
- ‘marginal’: statistics for marginal rates
- ‘class_0’: statistics for class 0 rates
- ‘class_1’: statistics for class 1 rates

Each statistics dict contains:

- ‘samples’: array of rates across folds
- ‘mean’: mean rate
- ‘std’: standard deviation
- ‘quantiles’: dict with q05, q25, q50, q75, q95
- ‘ci_95’: 95% Clopper-Pearson CI (if applicable)

Return type:

dict

Examples

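>>> from ssbc import BinaryClassifierSimulator, cross_conformal_validation
>>> sim = BinaryClassifierSimulator(p_class1=0.2, beta_params_class0=(1,7), beta_params_class1=(5,2))
>>> labels, probs = sim.generate(100)
>>> results = cross_conformal_validation(labels, probs, n_folds=10)
>>> m = results['marginal']['singleton']
>>> print(f"Singleton rate: {m['mean']:.3f} ± {m['std']:.3f}")
>>> print(f"95% range: [{m['quantiles']['q05']:.3f}, {m['quantiles']['q95']:.3f}]")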

Notes

How this differs from related methods:

- LOO-CV: leave-one-out; aggregates counts rather than fold-level rates.
- Bootstrap: resamples with replacement and tests on fresh data.
- Cross-conformal: K-fold split; estimates the rate distribution from the calibration data alone.

This method directly estimates the variability of rates due to finite calibration samples, without requiring a data simulator.

ssbc.calibration.print_cross_conformal_results(results)

Pretty print cross-conformal validation results.

Parameters:

results (dict) – Results from cross_conformal_validation()

Return type:

None

ssbc.calibration.bootstrap_calibration_uncertainty(labels, probs, simulator, alpha_target=0.1, delta=0.1, test_size=1000, n_bootstrap=1000, n_jobs=-1, seed=None)

Bootstrap analysis of calibration uncertainty.

For each bootstrap iteration:

1. Resample the calibration data with replacement.
2. Calibrate (compute SSBC thresholds).
3. Evaluate on a fresh, independent test set.
4. Record the operational rates.

This models: “If I recalibrate on similar datasets, how do rates vary?”
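A stripped-down version of this loop is sketched below for the marginal singleton rate only. It replaces SSBC with plain per-class split-conformal thresholds. It also uses a hypothetical `toy_sampler` in place of the DataGenerator simulator. The sampler's Beta parameters loosely echo the BinaryClassifierSimulator example further down, but its interpretation of them is an assumption and it is not that class.

```python
import numpy as np


def plain_qhat(labels, probs, alpha):
    """Per-class split-conformal thresholds; SSBC correction omitted for brevity."""
    qhat = np.empty(2)
    for y in (0, 1):
        s = np.sort(1.0 - probs[labels == y, y])
        k = int(np.ceil((len(s) + 1) * (1 - alpha)))
        qhat[y] = np.inf if k > len(s) else s[k - 1]  # too few points -> always include y
    return qhat


def toy_sampler(m, rng):
    """Hypothetical simulator: P(y=1) = 0.2, P(class 1 | x) ~ Beta(5,2) or Beta(1,7)."""
    y = (rng.random(m) < 0.2).astype(int)
    p1 = np.where(y == 1, rng.beta(5, 2, m), rng.beta(1, 7, m))
    return y, np.column_stack([1 - p1, p1])


def bootstrap_singleton_rate(labels, probs, sample_test, alpha=0.1,
                             n_bootstrap=200, test_size=1000, seed=None):
    rng = np.random.default_rng(seed)
    labels, probs = np.asarray(labels), np.asarray(probs, dtype=float)
    n = len(labels)
    rates = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)                   # 1. resample with replacement
        qhat = plain_qhat(labels[idx], probs[idx], alpha)  # 2. recalibrate
        _, p_test = sample_test(test_size, rng)            # 3. fresh independent test set
        sizes = ((1.0 - p_test) <= qhat).sum(axis=1)
        rates[b] = np.mean(sizes == 1)                     # 4. record singleton rate
    return rates


y_cal, p_cal = toy_sampler(100, np.random.default_rng(0))
rates = bootstrap_singleton_rate(y_cal, p_cal, toy_sampler, n_bootstrap=100, seed=1)
print(rates.mean(), rates.std())
```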

Parameters:

labels (np.ndarray) – Calibration labels
probs (np.ndarray) – Calibration probabilities
simulator (DataGenerator) – Simulator to generate independent test sets
alpha_target (float, default=0.10) – Target miscoverage
delta (float, default=0.10) – PAC risk
test_size (int, default=1000) – Size of test sets for evaluation
n_bootstrap (int, default=1000) – Number of bootstrap iterations
n_jobs (int, default=-1) – Parallel jobs (-1 for all cores)
seed (int, optional) – Random seed

Returns:

Bootstrap distributions with keys:

- ‘marginal’: dict with ‘singleton’, ‘doublet’, ‘abstention’, ‘singleton_error’
- ‘class_0’: dict with the same metrics
- ‘class_1’: dict with the same metrics

Each metric contains:

- ‘samples’: array of rates across bootstrap trials
- ‘mean’: mean rate
- ‘std’: standard deviation
- ‘quantiles’: dict with q05, q25, q50, q75, q95

Return type:

dict

Examples

>>> from ssbc import BinaryClassifierSimulator, bootstrap_calibration_uncertainty
>>> sim = BinaryClassifierSimulator(p_class1=0.2, beta_params_class0=(1,7), beta_params_class1=(5,2))
>>> labels, probs = sim.generate(100)
>>> results = bootstrap_calibration_uncertainty(labels, probs, sim, n_bootstrap=100)
>>> print(results['marginal']['singleton']['mean'])

ssbc.calibration.plot_bootstrap_distributions(bootstrap_results, figsize=(16, 12), save_path=None)

Plot bootstrap distributions.

Parameters:

bootstrap_results (dict) – Results from bootstrap_calibration_uncertainty()
figsize (tuple, default=(16, 12)) – Figure size
save_path (str, optional) – Path to save figure. If None, displays interactively.

Raises:

ImportError – If matplotlib is not installed

Return type:

None

Examples

>>> from ssbc import bootstrap_calibration_uncertainty, plot_bootstrap_distributions
>>> results = bootstrap_calibration_uncertainty(...)
>>> plot_bootstrap_distributions(results, save_path='bootstrap_results.png')

Modules

`bootstrap`	Bootstrap analysis of calibration uncertainty for operational rates.
`conformal`	Mondrian conformal prediction with SSBC correction.
`cross_conformal`	Cross-conformal validation for estimating rate variability.