ssbc.bounds
Unified bounds computation module for SSBC.
This module consolidates all statistical bounds computation functions to reduce code duplication and provide a consistent API.
- ssbc.bounds.clopper_pearson_intervals(labels, confidence=0.95)
Compute Clopper-Pearson (exact binomial) confidence intervals for class prevalences.
- Parameters:
labels (np.ndarray) – Binary labels (0 or 1)
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% CI)
- Returns:
Dictionary with keys 0 and 1, each containing:
  - 'count': number of samples in this class
  - 'proportion': observed proportion
  - 'lower': lower bound of CI
  - 'upper': upper bound of CI
- Return type:
Examples
>>> import numpy as np
>>> from ssbc.bounds import clopper_pearson_intervals
>>> labels = np.array([0, 0, 1, 1, 0])
>>> intervals = clopper_pearson_intervals(labels, confidence=0.95)
>>> print(intervals[0]['proportion'])
0.6
Notes
The Clopper-Pearson interval is an exact binomial confidence interval based on Beta distribution quantiles. It is conservative: its actual coverage is never below the nominal confidence level, at the cost of intervals that can be wider than necessary.
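As an illustration of the interval described above, here is a minimal sketch that computes a two-sided Clopper-Pearson interval for one class from its count. It is not the ssbc implementation: instead of Beta quantiles it inverts the exact binomial tail probabilities by bisection, which gives the same bounds using only the standard library. The names binom_tail_ge, solve_increasing, and clopper_pearson_sketch are hypothetical.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def solve_increasing(f, target: float, tol: float = 1e-12) -> float:
    """Bisection on [0, 1] for a function f that is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_sketch(k: int, n: int, confidence: float = 0.95):
    """Two-sided interval for a proportion given k successes in n trials."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= k | p) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: binom_tail_ge(k, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k | p) equals alpha / 2,
    # i.e. P(X >= k + 1 | p) equals 1 - alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2
    )
    return lower, upper
```

For the labels in the example above, class 0 has 3 of 5 samples, and clopper_pearson_sketch(3, 5) gives roughly (0.147, 0.947).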
- ssbc.bounds.clopper_pearson_lower(k, n, confidence=0.95)
Compute lower Clopper-Pearson (one-sided) confidence bound.
- Parameters:
- Returns:
Lower confidence bound for the true proportion
- Return type:
Examples
>>> from ssbc.bounds import clopper_pearson_lower
>>> lower = clopper_pearson_lower(k=5, n=10, confidence=0.95)
>>> print(f"Lower bound: {lower:.3f}")
Notes
Uses Beta distribution quantiles for exact binomial confidence bounds. For PAC-style guarantees stated in terms of a failure probability delta, set confidence = 1 - delta.
- ssbc.bounds.clopper_pearson_upper(k, n, confidence=0.95)
Compute upper Clopper-Pearson (one-sided) confidence bound.
- Parameters:
- Returns:
Upper confidence bound for the true proportion
- Return type:
Examples
>>> from ssbc.bounds import clopper_pearson_upper
>>> upper = clopper_pearson_upper(k=5, n=10, confidence=0.95)
>>> print(f"Upper bound: {upper:.3f}")
Notes
Uses Beta distribution quantiles for exact binomial confidence bounds. For PAC-style guarantees stated in terms of a failure probability delta, set confidence = 1 - delta.
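To make the one-sided bounds and the delta = 1 - confidence convention concrete, the following sketch computes a lower and an upper one-sided Clopper-Pearson bound. It assumes that each one-sided bound spends the full delta in a single tail (rather than delta/2 as in a two-sided interval), and it inverts the binomial tail by bisection instead of calling a Beta quantile function. All function names are hypothetical and this is not the ssbc code.

```python
from math import comb


def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisect_increasing(f, target, tol=1e-12):
    """Find p in [0, 1] with f(p) = target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return (lo + hi) / 2


def one_sided_lower(k, n, delta):
    """Bound that the true proportion exceeds with probability >= 1 - delta."""
    if k == 0:
        return 0.0
    return bisect_increasing(lambda p: tail_ge(k, n, p), delta)


def one_sided_upper(k, n, delta):
    """Bound that the true proportion stays below with probability >= 1 - delta."""
    if k == n:
        return 1.0
    return bisect_increasing(lambda p: tail_ge(k + 1, n, p), 1 - delta)


delta = 1 - 0.95  # PAC-style failure probability for confidence = 0.95
lower = one_sided_lower(5, 10, delta)
upper = one_sided_upper(5, 10, delta)
print(f"Lower bound: {lower:.3f}, upper bound: {upper:.3f}")
```

Because all of delta goes into one tail, each one-sided bound is tighter than the matching endpoint of a two-sided interval at the same confidence level.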
- ssbc.bounds.cp_interval(count, total, confidence=0.95)
Compute Clopper-Pearson exact confidence interval.
Helper function for computing a single CI from count and total.
- ssbc.bounds.ensure_ci(d, count, total, confidence=0.95)
Extract or compute rate and confidence interval from a dictionary.
If the dictionary already contains rate/CI information, use it. Otherwise, compute Clopper-Pearson CI from count/total.
This function recomputes the interval at the requested confidence level if the provided dictionary is missing bounds or if the provided bounds are degenerate (NaN values).
- Parameters:
- Returns:
Rate and confidence interval bounds
- Return type:
tuple of (rate, lower, upper)
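The extract-or-compute behaviour described above can be sketched roughly as follows. This is an illustration only: the dictionary keys "rate", "lower", and "upper" are hypothetical (the parameter documentation does not list the actual keys), and the interval computation is passed in as a callable named compute_ci rather than tied to any particular ssbc function.

```python
import math


def ensure_ci_sketch(d, count, total, compute_ci):
    """Return (rate, lower, upper), reusing values in `d` when they are usable.

    Assumptions: the keys "rate", "lower", "upper" are hypothetical, and
    `compute_ci` is any callable (count, total) -> (lower, upper).
    """
    values = [d.get(key) for key in ("rate", "lower", "upper")]
    # Usable means: every value is present, numeric, and not NaN.
    usable = all(
        isinstance(v, (int, float)) and not math.isnan(v) for v in values
    )
    if usable:
        return tuple(float(v) for v in values)
    # Fall back to recomputing from the raw counts (assumes total > 0).
    rate = count / total
    lower, upper = compute_ci(count, total)
    return rate, lower, upper
```

Passing the interval function as an argument keeps the sketch independent of how the confidence level is handled in the library.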
- ssbc.bounds.prediction_bounds(k_cal, n_cal, n_test, confidence=0.95, method='simple')
Compute prediction bounds accounting for both calibration and test set sampling uncertainty.
This function provides two methods for computing prediction bounds:
1. “simple”: Uses the standard error formula (faster, good for large samples)
2. “beta_binomial”: Uses the Beta-Binomial distribution (more accurate for small samples)
- Parameters:
k_cal (int) – Number of successes in calibration data for a single well-defined Bernoulli event. Must be the count of a binary indicator (e.g., Z_i = 1{event}) across all n_cal trials.
n_cal (int) – Total number of trials in calibration data for the same Bernoulli event. This is the fixed denominator (total sample size or conditional subpopulation size).
n_test (int) – Expected number of future trials for the same Bernoulli event. For joint rates, this is the planned test size (fixed). For conditional rates, this is an estimated future conditional subpopulation size.
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% prediction bounds)
method (str, default="simple") – Method to use: “simple” or “beta_binomial”
- Returns:
(lower_bound, upper_bound) for operational rates on future test sets
- Return type:
Examples
>>> from ssbc.bounds import prediction_bounds
>>> # Simple method (default)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95)
>>> print(f"Simple bounds: [{lower:.3f}, {upper:.3f}]")
>>> # Beta-Binomial method (more accurate for small samples)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95, method="beta_binomial")
>>> print(f"Beta-Binomial bounds: [{lower:.3f}, {upper:.3f}]")
Notes
The prediction bounds account for both:
1. Calibration uncertainty: uncertainty in the true rate p from calibration data
2. Test set sampling uncertainty: variability when sampling n_test points from the true distribution
Simple method (default):
  - Mathematical formula: SE = sqrt(p̂(1-p̂) * (1/n_cal + 1/n_test))
  - Good for large sample sizes
  - Faster computation
Beta-Binomial method:
  - Uses Beta-Binomial distribution for exact finite-sample modeling
  - More accurate for small sample sizes
  - Slower computation
  - Uses uniform prior Beta(1,1) and equal-tailed intervals by default
  - For advanced use (Jeffreys prior or HPD intervals), call prediction_bounds_beta_binomial() directly
For large n_test, bounds approach calibration-only bounds. For small n_test, bounds are wider due to additional test set sampling uncertainty.
This is the recommended function for computing operational rate bounds when applying fixed thresholds to future test sets.
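The standard error formula for the simple method can be turned into bounds along the following lines. This sketch makes several assumptions that the notes do not confirm: p̂ is taken as k_cal / n_cal, the bounds are p̂ ± z·SE with z the two-sided standard normal quantile, and the result is clipped to [0, 1]. The function name simple_prediction_bounds is hypothetical, and the library may differ in these details.

```python
from math import sqrt
from statistics import NormalDist


def simple_prediction_bounds(k_cal, n_cal, n_test, confidence=0.95):
    """Normal-approximation prediction bounds for a future empirical rate."""
    p_hat = k_cal / n_cal  # assumed point estimate of the rate
    # The 1/n_cal term carries calibration uncertainty,
    # the 1/n_test term carries test set sampling uncertainty.
    se = sqrt(p_hat * (1 - p_hat) * (1 / n_cal + 1 / n_test))
    # Two-sided normal quantile (assumed), e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


lower, upper = simple_prediction_bounds(k_cal=50, n_cal=100, n_test=1000)
print(f"Sketch bounds: [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions the example gives approximately [0.397, 0.603]. Shrinking n_test enlarges the 1/n_test term and widens the bounds, while for very large n_test only the 1/n_cal term remains.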
- ssbc.bounds.prediction_bounds_beta_binomial(k_cal, n_cal, n_test, confidence=0.95, use_jeffreys=False, tail='equal_tailed')
Beta-Binomial predictive interval for future empirical rate.
This function computes exact prediction bounds using the Beta-Binomial distribution, which properly accounts for both calibration uncertainty and test set sampling variability.
- Parameters:
k_cal (int) – Number of successes in calibration data
n_cal (int) – Total number of samples in calibration data
n_test (int) – Expected size of future test sets
confidence (float, default=0.95) – Desired predictive mass (confidence level)
use_jeffreys (bool, default=False) – If False, use uniform prior Beta(1,1), giving posterior Beta(k_cal+1, n_cal-k_cal+1). If True, use Jeffreys prior Beta(1/2,1/2), giving posterior Beta(k_cal+0.5, n_cal-k_cal+0.5).
tail (str, default="equal_tailed") – Interval type:
  - “equal_tailed”: Invert predictive CDF (α/2 each side)
  - “hpd”: Shortest high posterior density predictive interval
- Returns:
(lower_rate, upper_rate) for operational rates on future test sets
- Return type:
Notes
This method models:
1. True rate p ~ Beta(alpha, beta) (posterior from calibration data)
2. Future successes k_test | p ~ Binomial(n_test, p)
3. Marginal predictive distribution: k_test ~ BetaBinomial(n_test, alpha, beta)
4. Return bounds on the rate k_test / n_test
This provides exact finite-sample prediction bounds that account for both sources of uncertainty without normal approximations.
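The four modeling steps above can be sketched with the standard library alone. The code below builds the Beta-Binomial predictive distribution from log-gamma terms and inverts its CDF for an equal-tailed interval; it covers only the "equal_tailed" option, not HPD. The function names are hypothetical, and the exact quantile convention (smallest k whose cumulative probability reaches each tail level) is an assumption rather than the documented behaviour of the library.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for K ~ BetaBinomial(n, a, b)."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + log_beta(k + a, n - k + b) - log_beta(a, b))


def predictive_interval_sketch(k_cal, n_cal, n_test, confidence=0.95,
                               use_jeffreys=False):
    """Equal-tailed predictive interval for the future rate k_test / n_test."""
    # Step 1: posterior Beta(a, b) from a uniform or Jeffreys prior.
    prior = 0.5 if use_jeffreys else 1.0
    a, b = k_cal + prior, n_cal - k_cal + prior
    alpha = 1 - confidence
    cdf = 0.0
    lower_k, upper_k = None, n_test
    # Steps 2-3: accumulate the marginal predictive CDF over k_test.
    for k in range(n_test + 1):
        cdf += beta_binomial_pmf(k, n_test, a, b)
        if lower_k is None and cdf >= alpha / 2:
            lower_k = k
        if cdf >= 1 - alpha / 2:
            upper_k = k
            break
    # Step 4: convert counts to rates.
    return lower_k / n_test, upper_k / n_test
```

Working in log space keeps the binomial coefficients from overflowing for large n_test; the loop is linear in n_test, which is consistent with this method being slower than the simple formula.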
Modules
Statistical utility functions for SSBC.