ssbc.bounds.statistical

Statistical utility functions for SSBC.

Functions

clopper_pearson_intervals(labels[, confidence])

Compute Clopper-Pearson (exact binomial) confidence intervals for class prevalences.

clopper_pearson_lower(k, n[, confidence])

Compute lower Clopper-Pearson (one-sided) confidence bound.

clopper_pearson_upper(k, n[, confidence])

Compute upper Clopper-Pearson (one-sided) confidence bound.

cp_interval(count, total[, confidence])

Compute Clopper-Pearson exact confidence interval.

ensure_ci(d, count, total[, confidence])

Extract or compute rate and confidence interval from a dictionary.

prediction_bounds(k_cal, n_cal, n_test[, ...])

Compute prediction bounds accounting for both calibration and test set sampling uncertainty.

prediction_bounds_beta_binomial(k_cal, ...)

Beta-Binomial predictive interval for future empirical rate.

prediction_bounds_lower(k_cal, n_cal, n_test)

Compute lower prediction bound accounting for both calibration and test set sampling uncertainty.

prediction_bounds_upper(k_cal, n_cal, n_test)

Compute upper prediction bound accounting for both calibration and test set sampling uncertainty.

ssbc.bounds.statistical.clopper_pearson_lower(k, n, confidence=0.95)[source]

Compute lower Clopper-Pearson (one-sided) confidence bound.

Parameters:
  • k (int) – Number of successes

  • n (int) – Total number of trials

  • confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% confidence)

Returns:

Lower confidence bound for the true proportion

Return type:

float

Examples

>>> lower = clopper_pearson_lower(k=5, n=10, confidence=0.95)
>>> print(f"Lower bound: {lower:.3f}")

Notes

Uses Beta distribution quantiles for exact binomial confidence bounds. For a PAC-style guarantee with failure probability delta, pass confidence = 1 - delta.
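The Beta-quantile bound described in these notes can also be found by solving the binomial tail equation P(X >= k | p) = 1 - confidence for p. The block below is a minimal standard-library sketch of that idea, not SSBC's implementation: `binom_tail_at_least` and `cp_lower_sketch` are hypothetical names, and the sketch assumes the whole error budget 1 - confidence is placed in the lower tail.

```python
from math import comb


def binom_tail_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cp_lower_sketch(k: int, n: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson lower bound via bisection (illustrative only)."""
    if k == 0:
        return 0.0  # with no successes, the lower bound is 0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(X >= k) increases with p; the bound is the p where it equals alpha
        if binom_tail_at_least(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lower = cp_lower_sketch(k=5, n=10, confidence=0.95)
print(f"Lower bound: {lower:.3f}")
```

When k = n the tail equation reduces to p**n = alpha, so the sketch can be checked against the closed form alpha ** (1 / n).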

ssbc.bounds.statistical.clopper_pearson_upper(k, n, confidence=0.95)[source]

Compute upper Clopper-Pearson (one-sided) confidence bound.

Parameters:
  • k (int) – Number of successes

  • n (int) – Total number of trials

  • confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% confidence)

Returns:

Upper confidence bound for the true proportion

Return type:

float

Examples

>>> upper = clopper_pearson_upper(k=5, n=10, confidence=0.95)
>>> print(f"Upper bound: {upper:.3f}")

Notes

Uses Beta distribution quantiles for exact binomial confidence bounds. For a PAC-style guarantee with failure probability delta, pass confidence = 1 - delta.
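To make the PAC-style reading concrete, the sketch below solves P(X <= k | p) = delta for p, so that the returned value upper-bounds the true proportion with probability at least 1 - delta. It is a standard-library illustration under that one-sided assumption, not SSBC's code; `binom_cdf` and `cp_upper_sketch` are hypothetical names.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def cp_upper_sketch(k: int, n: int, delta: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper bound at confidence 1 - delta."""
    if k == n:
        return 1.0  # with all successes, the upper bound is 1
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(X <= k) decreases with p; the bound is the p where it equals delta
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


upper = cp_upper_sketch(k=5, n=10, delta=0.05)
print(f"Upper bound: {upper:.3f}")
```

For k = 0 the equation becomes (1 - p)**n = delta, which gives the closed form 1 - delta ** (1 / n) as a sanity check.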

ssbc.bounds.statistical.clopper_pearson_intervals(labels, confidence=0.95)[source]

Compute Clopper-Pearson (exact binomial) confidence intervals for class prevalences.

Parameters:
  • labels (np.ndarray) – Binary labels (0 or 1)

  • confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% CI)

Returns:

Dictionary with keys 0 and 1, each containing:

  • 'count': number of samples in this class

  • 'proportion': observed proportion

  • 'lower': lower bound of CI

  • 'upper': upper bound of CI

Return type:

dict

Examples

>>> labels = np.array([0, 0, 1, 1, 0])
>>> intervals = clopper_pearson_intervals(labels, confidence=0.95)
>>> print(intervals[0]['proportion'])
0.6

Notes

The Clopper-Pearson interval is an exact binomial confidence interval based on Beta distribution quantiles. It provides conservative coverage guarantees.
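The per-class result structure can be illustrated with a small standard-library sketch. This is not SSBC's implementation: `binom_cdf`, `solve_increasing`, and `cp_two_sided_sketch` are hypothetical helpers, labels are a plain list rather than an np.ndarray, and the sketch assumes the usual two-sided convention of (1 - confidence) / 2 in each tail.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def solve_increasing(f, target):
    """Bisection for f(p) = target on [0, 1], where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cp_two_sided_sketch(k, n, confidence=0.95):
    """Two-sided Clopper-Pearson interval with alpha/2 in each tail."""
    alpha = 1.0 - confidence
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


labels = [0, 0, 1, 1, 0]
n = len(labels)
intervals = {}
for cls in (0, 1):
    count = sum(1 for y in labels if y == cls)
    lower, upper = cp_two_sided_sketch(count, n)
    intervals[cls] = {"count": count, "proportion": count / n,
                      "lower": lower, "upper": upper}

print(intervals[0]["proportion"])
```

With only five labels the intervals are very wide, which is the conservative behaviour the notes describe.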

ssbc.bounds.statistical.cp_interval(count, total, confidence=0.95)[source]

Compute Clopper-Pearson exact confidence interval.

Helper function for computing a single CI from count and total.

Parameters:
  • count (int) – Number of successes

  • total (int) – Total number of trials

  • confidence (float, default=0.95) – Confidence level

Returns:

Dictionary with keys:

  • 'count': original count

  • 'proportion': count/total

  • 'lower': lower CI bound

  • 'upper': upper CI bound

Return type:

dict

ssbc.bounds.statistical.prediction_bounds(k_cal, n_cal, n_test, confidence=0.95, method='simple')[source]

Compute prediction bounds accounting for both calibration and test set sampling uncertainty.

This function provides two methods for computing prediction bounds:

  1. "simple": Uses standard error formula (faster, good for large samples)

  2. "beta_binomial": Uses Beta-Binomial distribution (more accurate for small samples)

Parameters:
  • k_cal (int) – Number of successes in calibration data for a single well-defined Bernoulli event. Must be the count of a binary indicator (e.g., Z_i = 1{event}) across all n_cal trials.

  • n_cal (int) – Total number of trials in calibration data for the same Bernoulli event. This is the fixed denominator (total sample size or conditional subpopulation size).

  • n_test (int) – Expected number of future trials for the same Bernoulli event. For joint rates, this is the planned test size (fixed). For conditional rates, this is an estimated future conditional subpopulation size.

  • confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% prediction bounds)

  • method (str, default="simple") – Method to use: “simple” or “beta_binomial”

Returns:

(lower_bound, upper_bound) for operational rates on future test sets

Return type:

tuple[float, float]

Examples

>>> # Simple method (default)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95)
>>> print(f"Simple bounds: [{lower:.3f}, {upper:.3f}]")
>>> # Beta-Binomial method (more accurate for small samples)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95, method="beta_binomial")
>>> print(f"Beta-Binomial bounds: [{lower:.3f}, {upper:.3f}]")

Notes

The prediction bounds account for both:

  1. Calibration uncertainty: uncertainty in the true rate p from calibration data

  2. Test set sampling uncertainty: variability when sampling n_test points from the true distribution

Simple method (default):

  • Mathematical formula: SE = sqrt(p̂(1-p̂) * (1/n_cal + 1/n_test))

  • Good for large sample sizes

  • Faster computation

Beta-Binomial method:

  • Uses Beta-Binomial distribution for exact finite-sample modeling

  • More accurate for small sample sizes

  • Slower computation

  • Uses uniform prior Beta(1,1) and equal-tailed intervals by default

  • For advanced use (Jeffreys prior or HPD intervals), call prediction_bounds_beta_binomial() directly

For large n_test, bounds approach calibration-only bounds. For small n_test, bounds are wider due to additional test set sampling uncertainty.

This is the recommended function for computing operational rate bounds when applying fixed thresholds to future test sets.
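The standard error formula given for the simple method can be worked through by hand. The sketch below is illustrative and not SSBC's implementation: `simple_bounds_sketch` is a hypothetical name, and both the two-sided normal quantile and the clipping to [0, 1] are assumptions, since the notes state only the SE formula.

```python
from math import sqrt
from statistics import NormalDist


def simple_bounds_sketch(k_cal, n_cal, n_test, confidence=0.95):
    """Normal-approximation prediction bounds from the documented SE formula."""
    p_hat = k_cal / n_cal
    # SE combines calibration (1/n_cal) and test-set (1/n_test) sampling variance
    se = sqrt(p_hat * (1 - p_hat) * (1 / n_cal + 1 / n_test))
    # Assumption: two-sided normal quantile, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Assumption: clip to the valid range for a rate
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


small = simple_bounds_sketch(k_cal=50, n_cal=100, n_test=10)
large = simple_bounds_sketch(k_cal=50, n_cal=100, n_test=1000)
huge = simple_bounds_sketch(k_cal=50, n_cal=100, n_test=10**9)

print(f"n_test=10:    [{small[0]:.3f}, {small[1]:.3f}]")
print(f"n_test=1000:  [{large[0]:.3f}, {large[1]:.3f}]")
print(f"n_test=10**9: [{huge[0]:.3f}, {huge[1]:.3f}]")
```

As n_test grows, the 1/n_test term vanishes and the width is governed by the calibration term alone, matching the large-n_test behaviour described in the notes.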

ssbc.bounds.statistical.prediction_bounds_beta_binomial(k_cal, n_cal, n_test, confidence=0.95, use_jeffreys=False, tail='equal_tailed')[source]

Beta-Binomial predictive interval for future empirical rate.

This function computes exact prediction bounds using the Beta-Binomial distribution, which properly accounts for both calibration uncertainty and test set sampling variability.

Parameters:
  • k_cal (int) – Number of successes in calibration data

  • n_cal (int) – Total number of samples in calibration data

  • n_test (int) – Expected size of future test sets

  • confidence (float, default=0.95) – Desired predictive mass (confidence level)

  • use_jeffreys (bool, default=False) – If False, use uniform prior Beta(1,1), giving posterior Beta(k_cal+1, n_cal-k_cal+1). If True, use Jeffreys prior Beta(1/2,1/2), giving posterior Beta(k_cal+0.5, n_cal-k_cal+0.5).

  • tail (str, default="equal_tailed") – Interval type:

      - "equal_tailed": Invert predictive CDF (α/2 each side)

      - "hpd": Shortest high posterior density predictive interval

Returns:

(lower_rate, upper_rate) for operational rates on future test sets

Return type:

tuple[float, float]

Notes

This method models:

  1. True rate p ~ Beta(alpha, beta) (posterior from calibration data)

  2. Future successes k_test | p ~ Binomial(n_test, p)

  3. Marginal predictive distribution: k_test ~ BetaBinomial(n_test, alpha, beta)

  4. Return bounds on the rate k_test / n_test

This provides exact finite-sample prediction bounds that account for both sources of uncertainty without normal approximations.
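The four modeling steps can be sketched with the standard library alone. This is an illustration rather than SSBC's implementation: `beta_binomial_pmf` and `predictive_interval_sketch` are hypothetical names, only the equal-tailed case is covered, and the quantile convention (smallest k whose cumulative mass reaches α/2 and 1 - α/2) is an assumption.

```python
from math import exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma, for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when p ~ Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def predictive_interval_sketch(k_cal, n_cal, n_test, confidence=0.95,
                               use_jeffreys=False):
    """Equal-tailed Beta-Binomial predictive interval on k_test / n_test."""
    prior = 0.5 if use_jeffreys else 1.0  # Jeffreys Beta(1/2,1/2) or uniform Beta(1,1)
    a = k_cal + prior                      # step 1: posterior alpha
    b = n_cal - k_cal + prior              # step 1: posterior beta
    alpha = 1.0 - confidence
    cdf = 0.0
    lower_k, upper_k = 0, n_test
    found_lower = False
    for k in range(n_test + 1):            # steps 2-3: accumulate predictive CDF
        cdf += beta_binomial_pmf(k, n_test, a, b)
        if not found_lower and cdf >= alpha / 2:
            lower_k, found_lower = k, True
        if cdf >= 1 - alpha / 2:
            upper_k = k
            break
    return lower_k / n_test, upper_k / n_test  # step 4: bounds on the rate


lower, upper = predictive_interval_sketch(k_cal=50, n_cal=100, n_test=1000)
print(f"Beta-Binomial bounds: [{lower:.3f}, {upper:.3f}]")
```

A quick check on the pmf: with a = b = 1 the predictive distribution is uniform over 0..n, so every count has probability 1 / (n + 1).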