ssbc.calibration.bootstrap

Bootstrap analysis of calibration uncertainty for operational rates.

This models the question: “If I recalibrate many times on similar datasets, how do the operational rates vary?” This differs from LOO-CV, which models: “Given ONE fixed calibration, how do the rates vary across test sets?”

Functions

bootstrap_calibration_uncertainty(labels, ...)

Bootstrap analysis of calibration uncertainty.

plot_bootstrap_distributions(bootstrap_results)

Plot bootstrap distributions.

Classes

DataGenerator(*args, **kwargs)

Protocol for data generators (e.g., BinaryClassifierSimulator).

class ssbc.calibration.bootstrap.DataGenerator(*args, **kwargs)

Protocol for data generators (e.g., BinaryClassifierSimulator).

generate(n_samples)

Generate samples.

Returns:

(labels, probabilities)

Return type:

tuple

Parameters:

n_samples (int)

__init__(*args, **kwargs)
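Because DataGenerator is a Protocol, any object with a compatible generate(n_samples) method should satisfy it structurally, without subclassing. The sketch below restates the protocol with typing.Protocol and pairs it with a hypothetical ToyBetaGenerator. The Beta-mixture model and the (n_samples, 2) layout of the returned probabilities are assumptions made for illustration, not details confirmed by this page.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class DataGenerator(Protocol):
    """Structural type: anything with a matching generate() method qualifies."""

    def generate(self, n_samples: int) -> tuple[np.ndarray, np.ndarray]:
        ...


class ToyBetaGenerator:
    """Hypothetical generator (not part of ssbc): Beta-distributed class-1 scores."""

    def __init__(self, p_class1: float, beta0: tuple[float, float],
                 beta1: tuple[float, float], seed: int | None = None) -> None:
        self.p_class1 = p_class1
        self.beta0 = beta0  # Beta parameters of the class-1 score when label == 0
        self.beta1 = beta1  # Beta parameters of the class-1 score when label == 1
        self.rng = np.random.default_rng(seed)

    def generate(self, n_samples: int) -> tuple[np.ndarray, np.ndarray]:
        labels = (self.rng.random(n_samples) < self.p_class1).astype(int)
        p1 = np.where(
            labels == 1,
            self.rng.beta(*self.beta1, size=n_samples),
            self.rng.beta(*self.beta0, size=n_samples),
        )
        # Assumed layout: per-class probabilities as an (n_samples, 2) array.
        probs = np.column_stack([1.0 - p1, p1])
        return labels, probs


gen = ToyBetaGenerator(p_class1=0.2, beta0=(1, 7), beta1=(5, 2), seed=0)
labels, probs = gen.generate(50)
```

With runtime_checkable, isinstance(gen, DataGenerator) only checks that a generate attribute exists; it does not validate argument or return types.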

ssbc.calibration.bootstrap.bootstrap_calibration_uncertainty(labels, probs, simulator, alpha_target=0.1, delta=0.1, test_size=1000, n_bootstrap=1000, n_jobs=-1, seed=None)

Bootstrap analysis of calibration uncertainty.

For each bootstrap iteration:

  1. Resample the calibration data with replacement.
  2. Calibrate (compute SSBC thresholds).
  3. Evaluate on a fresh, independent test set.
  4. Record the operational rates.

This models: “If I recalibrate on similar datasets, how do rates vary?”
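The four-step loop can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not ssbc's implementation: it swaps the SSBC threshold for a plain split-conformal quantile (so delta plays no role), uses a single marginal threshold, assumes probabilities arrive as an (n, 2) array, and defines singleton_error as the error rate among singleton predictions. The names conformal_threshold, set_rates, bootstrap_rates and toy_generate are hypothetical.

```python
from __future__ import annotations

from typing import Callable

import numpy as np


def conformal_threshold(labels: np.ndarray, probs: np.ndarray, alpha: float) -> float:
    """Plain split-conformal threshold: a stand-in for SSBC that ignores delta."""
    n = len(labels)
    scores = 1.0 - probs[np.arange(n), labels]  # nonconformity of the true class
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few points: every class enters the set
    return float(np.sort(scores)[k - 1])


def set_rates(labels: np.ndarray, probs: np.ndarray, q: float) -> dict[str, float]:
    """Operational rates of prediction sets {k : 1 - p_k <= q}."""
    in_set = (1.0 - probs) <= q  # (n, 2) boolean membership matrix
    sizes = in_set.sum(axis=1)
    single = sizes == 1
    pred = in_set.argmax(axis=1)  # the member class when the set is a singleton
    # Assumed definition: error rate among singleton predictions only.
    err = float(np.mean(pred[single] != labels[single])) if single.any() else np.nan
    return {
        "singleton": float(np.mean(single)),
        "doublet": float(np.mean(sizes == 2)),
        "abstention": float(np.mean(sizes == 0)),
        "singleton_error": err,
    }


def bootstrap_rates(labels, probs, generate: Callable, alpha=0.10,
                    test_size=1000, n_bootstrap=200, seed=None) -> dict[str, np.ndarray]:
    """Resample, recalibrate, evaluate on fresh data, and record rates."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    trials = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # 1. resample with replacement
        q = conformal_threshold(labels[idx], probs[idx], alpha)  # 2. recalibrate
        test_labels, test_probs = generate(test_size)  # 3. fresh independent test set
        trials.append(set_rates(test_labels, test_probs, q))  # 4. record rates
    return {metric: np.array([t[metric] for t in trials]) for metric in trials[0]}


# Hypothetical simulator standing in for a DataGenerator's generate() method.
data_rng = np.random.default_rng(0)


def toy_generate(n: int) -> tuple[np.ndarray, np.ndarray]:
    labels = (data_rng.random(n) < 0.2).astype(int)
    p1 = np.where(labels == 1, data_rng.beta(5, 2, n), data_rng.beta(1, 7, n))
    return labels, np.column_stack([1.0 - p1, p1])


cal_labels, cal_probs = toy_generate(100)
rates = bootstrap_rates(cal_labels, cal_probs, toy_generate, n_bootstrap=50, seed=1)
print(rates["singleton"].mean(), rates["singleton"].std())
```

Drawing a new test set in every iteration keeps test-set noise in the spread, but the dominant source of variation here is the resampled calibration set, which is the question this function is designed to answer.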

Parameters:
  • labels (np.ndarray) – Calibration labels

  • probs (np.ndarray) – Calibration probabilities

  • simulator (DataGenerator) – Simulator to generate independent test sets

  • alpha_target (float, default=0.10) – Target miscoverage

  • delta (float, default=0.10) – PAC risk

  • test_size (int, default=1000) – Size of test sets for evaluation

  • n_bootstrap (int, default=1000) – Number of bootstrap iterations

  • n_jobs (int, default=-1) – Number of parallel jobs (-1 uses all cores)

  • seed (int, optional) – Random seed

Returns:

Bootstrap distributions with keys:

  • ‘marginal’ – dict with ‘singleton’, ‘doublet’, ‘abstention’, ‘singleton_error’

  • ‘class_0’ – dict with the same metrics

  • ‘class_1’ – dict with the same metrics

Each metric contains:

  • ‘samples’ – array of rates across bootstrap trials

  • ‘mean’ – mean rate

  • ‘std’ – standard deviation

  • ‘quantiles’ – dict with q05, q25, q50, q75, q95

Return type:

dict
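Each metric entry summarizes one array of bootstrap rates. A minimal sketch of how such a summary could be built with NumPy follows. The helper name summarize is hypothetical, and the use of the population standard deviation (ddof=0) and NumPy's default linear percentile interpolation are assumptions, since this page does not specify them.

```python
import numpy as np

QUANTILE_LEVELS = {"q05": 5, "q25": 25, "q50": 50, "q75": 75, "q95": 95}


def summarize(samples) -> dict:
    """Build the documented per-metric summary from one array of bootstrap rates."""
    samples = np.asarray(samples, dtype=float)
    return {
        "samples": samples,
        "mean": float(np.mean(samples)),
        "std": float(np.std(samples)),  # ddof=0 is an assumption
        "quantiles": {
            name: float(np.percentile(samples, level))
            for name, level in QUANTILE_LEVELS.items()
        },
    }


# Illustrative rates only; not output from ssbc.
summary = summarize([0.70, 0.72, 0.74, 0.76, 0.78])
low, high = summary["quantiles"]["q05"], summary["quantiles"]["q95"]  # 90% band
```

With a real result, the same 90% band for class-1 abstentions would be read as results['class_1']['abstention']['quantiles']['q05'] and ['q95'], following the documented return structure.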

Examples

>>> from ssbc import BinaryClassifierSimulator, bootstrap_calibration_uncertainty
>>> sim = BinaryClassifierSimulator(p_class1=0.2, beta_params_class0=(1,7), beta_params_class1=(5,2))
>>> labels, probs = sim.generate(100)
>>> results = bootstrap_calibration_uncertainty(labels, probs, sim, n_bootstrap=100)
>>> print(results['marginal']['singleton']['mean'])

ssbc.calibration.bootstrap.plot_bootstrap_distributions(bootstrap_results, figsize=(16, 12), save_path=None)

Plot bootstrap distributions.

Parameters:
  • bootstrap_results (dict) – Results from bootstrap_calibration_uncertainty()

  • figsize (tuple, default=(16, 12)) – Figure size

  • save_path (str, optional) – Path to save figure. If None, displays interactively.

Raises:

ImportError – If matplotlib is not installed

Return type:

None

Examples

>>> from ssbc import bootstrap_calibration_uncertainty, plot_bootstrap_distributions
>>> results = bootstrap_calibration_uncertainty(...)
>>> plot_bootstrap_distributions(results, save_path='bootstrap_results.png')