SSBC Documentation
Small-Sample Beta Correction provides PAC guarantees for conformal prediction with small calibration sets.
Contents:
- Installation
- Usage Guide
- Theory and Deployment Guide
- SMALL SAMPLE BETA CORRECTION (SSBC) AND OPERATIONAL PROPERTIES
- Core Statistical Framework
- The Coverage Guarantee
- Induced Operational Properties
- Predicting Deployment Behavior with Operational Rate Estimates
- Mondrian Operational Estimates
- Beyond Coverage: Conditional Error Rates
- Why This Matters for Deployment
- Complete Deployment Workflow
- The Transformation: Theory to Deployment
API Reference
Core Algorithm
Core API facade.
Unified bounds computation module for SSBC.
Calibration & Conformal Prediction
Calibration-related APIs.
Metrics & Operational Bounds
Operational metrics and uncertainty APIs.
Reporting & Visualization
Reporting and visualization APIs.
Uncertainty Analysis & Validation
Validation API facade.
Utilities & Tools
Utility functions for conformal prediction.
Simulation utilities for testing conformal prediction.
Hyperparameter sweep and optimization for Mondrian conformal prediction.
Complete Package
Top-level package for SSBC (Small-Sample Beta Correction).
- class ssbc.SSBCResult(alpha_target, alpha_corrected, u_star, n, satisfied_mass, mode, details)[source]
Bases: object
Result of SSBC correction.
- Parameters:
- alpha_target
Target miscoverage rate
- Type:
float
- alpha_corrected
Corrected miscoverage rate (u_star / (n+1))
- Type:
float
- u_star
Optimal u value found by the algorithm
- Type:
int
- n
Calibration set size
- Type:
int
- satisfied_mass
Probability that coverage >= target
- Type:
float
- mode
“beta” for infinite test window, “beta-binomial” for finite
- Type:
Literal['beta', 'beta-binomial']
- alpha_target: float
- alpha_corrected: float
- u_star: int
- n: int
- satisfied_mass: float
- mode: Literal['beta', 'beta-binomial']
- ssbc.ssbc_correct(alpha_target, n, delta, *, mode='beta', m=None, bracket_width=None)[source]
Small-Sample Beta Correction (SSBC), corrected acceptance rule.
Find the largest α’ = u/(n+1) ≤ α_target such that: P(Coverage(α’) ≥ 1 - α_target) ≥ 1 - δ
where Coverage(α’) ~ Beta(n+1-u, u) for infinite test window.
Trivial regime: if α_target < 1/(n+1), return α_corrected=0.
- Parameters:
alpha_target (float) – Target miscoverage rate (must be in (0,1))
n (int) – Calibration set size (must be >= 1)
delta (float) – PAC risk tolerance (must be in (0,1)). This is the probability that the coverage guarantee fails. For example, delta=0.10 means we want 90% PAC confidence (1-delta) that coverage ≥ target.
mode ({"beta", "beta-binomial"}, default="beta") – “beta” for an infinite test window; “beta-binomial” for a finite test window of size m (defaults to m=n)
m (int, optional) – Test window size for beta-binomial mode (defaults to n)
bracket_width (int, optional) – Search radius around initial guess (default: adaptive based on n)
- Returns:
Dataclass containing correction results and diagnostic details
- Return type:
SSBCResult
- Raises:
ValueError – If parameters are out of valid ranges
Examples
>>> from ssbc import ssbc_correct
>>> result = ssbc_correct(alpha_target=0.10, n=50, delta=0.10)
>>> print(f"Corrected alpha: {result.alpha_corrected:.4f}")
Notes
The algorithm uses a bracketed search with an initial guess based on normal approximation to the Beta distribution. If the initial bracket fails to find a solution, it performs adaptive outward expansion (downward then upward) with O(n) worst-case complexity.
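The acceptance rule stated above can be illustrated with a brute-force scan. This is a minimal sketch, not the library's bracketed search: it covers only the “beta” mode and relies on the identity that, for integer u, P(Beta(n+1-u, u) ≥ 1 - α_target) equals P(Binomial(n, α_target) ≥ u), so only the standard library is needed. The names `binom_tail_ge` and `ssbc_bruteforce` are hypothetical and not part of the ssbc package.

```python
from math import comb


def binom_tail_ge(n: int, p: float, u: int) -> float:
    """P(X >= u) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(u, n + 1))


def ssbc_bruteforce(alpha_target: float, n: int, delta: float) -> tuple[int, float]:
    """Scan u downward and return (u_star, alpha_corrected). Hypothetical helper."""
    # Largest u with u/(n+1) <= alpha_target (small tolerance for float rounding).
    u_max = int(alpha_target * (n + 1) + 1e-12)
    for u in range(u_max, 0, -1):
        # Coverage(u/(n+1)) ~ Beta(n+1-u, u); its mass above 1 - alpha_target
        # equals the binomial tail P(Bin(n, alpha_target) >= u).
        if binom_tail_ge(n, alpha_target, u) >= 1 - delta:
            return u, u / (n + 1)
    # No admissible u >= 1 (includes the trivial regime alpha_target < 1/(n+1)).
    return 0, 0.0


u_star, alpha_corrected = ssbc_bruteforce(alpha_target=0.10, n=50, delta=0.10)
print(u_star, round(alpha_corrected, 4))
```

The linear scan is O(n) per call and is meant only to make the rule concrete; the documented implementation narrows the search with a normal-approximation bracket first.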
- ssbc.alpha_scan(labels, probs, fixed_threshold=None)[source]
Scan through all possible alpha thresholds and report prediction set statistics.
For each unique threshold value derived from the calibration data’s non-conformity scores, this function computes the number of abstentions, singletons, and doublets for both classes using Mondrian conformal prediction.
Optionally, a fixed threshold can be evaluated separately and returned as a dict.
- Parameters:
labels (np.ndarray, shape (n,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Classification probabilities [P(class=0), P(class=1)]
fixed_threshold (float, optional) – Fixed non-conformity score threshold for special case analysis. If None (default), no fixed threshold is evaluated.
- Returns:
- If fixed_threshold is None:
DataFrame with scan results
- If fixed_threshold is provided:
Tuple of (DataFrame with scan results, dict with fixed threshold results)
DataFrame columns:
- alpha: miscoverage rate (alpha)
- qhat_0: threshold for class 0
- qhat_1: threshold for class 1
- n_abstentions: number of empty prediction sets
- n_singletons: number of singleton prediction sets
- n_doublets: number of doublet prediction sets
- n_singletons_correct: number of correct singletons (marginal)
- singleton_coverage: fraction of singletons that are correct (marginal)
- n_singletons_0: singletons when true label is 0
- n_singletons_correct_0: correct singletons when true label is 0
- singleton_coverage_0: coverage for class 0 singletons
- n_singletons_1: singletons when true label is 1
- n_singletons_correct_1: correct singletons when true label is 1
- singleton_coverage_1: coverage for class 1 singletons
The fixed threshold dict (when provided) has the same keys as the DataFrame columns.
- Return type:
pd.DataFrame or tuple[pd.DataFrame, dict]
Examples
>>> import numpy as np
>>> from ssbc import alpha_scan
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> df = alpha_scan(labels, probs)
>>> print(df.head())
- ssbc.compute_pac_operational_metrics(y_cal, probs_cal, alpha, delta, ci_level=0.95, class_label=1)[source]
Compute PAC-controlled confidence intervals for operational metrics.
Extends SSBC to provide rigorous bounds on operational metrics (singleton rates, escalation rates) without accepting risk by fiat. Uses a two-step approach:
1. SSBC for coverage: Compute α_adj that achieves Pr(coverage ≥ 1-α) ≥ 1-δ
2. PAC bounds on operational rates: For each possible α’ in the discrete grid, run LOO-CV to estimate operational metrics, weight by Beta distribution probability, and aggregate to get PAC-controlled bounds.
- Parameters:
y_cal (np.ndarray, shape (n,)) – Binary labels (0 or 1) for calibration set
probs_cal (np.ndarray, shape (n,) or (n, 2)) – Predicted probabilities. If 1D, interpreted as P(class=1). If 2D, uses column corresponding to class_label.
alpha (float) – Target miscoverage rate (must be in (0, 1))
delta (float) – PAC risk tolerance (must be in (0, 1))
ci_level (float, default=0.95) – Confidence level for operational metric CIs (e.g., 0.95 for 95% CI)
class_label (int, default=1) – Which class to calibrate for (0 or 1). Uses class_label column if probs_cal is 2D.
- Returns:
Dictionary with keys:
- 'alpha_adj': Adjusted miscoverage from SSBC
- 'singleton_rate_ci': [lower, upper] PAC-controlled bounds
- 'doublet_rate_ci': [lower, upper]
- 'abstention_rate_ci': [lower, upper]
- 'expected_singleton_rate': Probability-weighted mean singleton rate
- 'expected_doublet_rate': Probability-weighted mean doublet rate
- 'expected_abstention_rate': Probability-weighted mean abstention rate
- 'alpha_grid': Discrete grid of possible alphas
- 'singleton_fractions': Singleton rate for each alpha in grid
- 'doublet_fractions': Doublet rate for each alpha in grid
- 'abstention_fractions': Abstention rate for each alpha in grid
- 'beta_weights': Probability weights from Beta distribution
- 'n_calibration': Number of calibration points
- Return type:
dict
Examples
>>> import numpy as np
>>> from ssbc import compute_pac_operational_metrics
>>> y_cal = np.array([0, 1, 0, 1, 1])
>>> probs_cal = np.array([0.2, 0.8, 0.3, 0.9, 0.7])
>>> result = compute_pac_operational_metrics(
...     y_cal, probs_cal, alpha=0.1, delta=0.1
... )
>>> print(f"Singleton rate: [{result['singleton_rate_ci'][0]:.3f}, "
...       f"{result['singleton_rate_ci'][1]:.3f}]")
Notes
Mathematical Framework:
Coverage decomposes as:
coverage = p_s(1 - α_singleton) + p_d·1 + p_a·0
where p_s, p_d, p_a are the fractions of singletons, doublets, and abstentions, and α_singleton is the error rate among singletons.
For each α’ in the discrete grid {k/(n+1)}, k=1,…,n:
1. Run LOO-CV to determine prediction sets for each point
2. Calculate operational rates: p_s(α’), p_d(α’), p_a(α’)
3. Compute Clopper-Pearson CIs for each rate
4. Weight by Beta(k, n+1-k) probability
Aggregate across α’ with probability weighting to get PAC-controlled bounds.
Edge Cases:
- Small n: Discretization is coarse, bounds may be conservative
- Extreme α or δ: May result in very wide bounds
- Class imbalance: Focus on class_label and ensure it has sufficient samples
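The coverage decomposition in these notes can be checked numerically on a handful of binary prediction sets. The sketch below is illustrative only; `coverage_decomposition` is a hypothetical helper, not an ssbc function, and it assumes binary labels so that a doublet always covers and an empty set never does.

```python
def coverage_decomposition(pred_sets, true_labels):
    """Return (decomposed, direct) coverage for binary prediction sets."""
    n = len(pred_sets)
    n_s = sum(len(s) == 1 for s in pred_sets)  # singletons
    n_d = sum(len(s) == 2 for s in pred_sets)  # doublets
    n_a = sum(len(s) == 0 for s in pred_sets)  # abstentions (empty sets)
    n_s_err = sum(len(s) == 1 and y not in s for s, y in zip(pred_sets, true_labels))

    p_s, p_d, p_a = n_s / n, n_d / n, n_a / n
    alpha_singleton = n_s_err / n_s if n_s else 0.0

    # coverage = p_s(1 - alpha_singleton) + p_d*1 + p_a*0
    decomposed = p_s * (1 - alpha_singleton) + p_d * 1 + p_a * 0
    # Direct count of sets that contain the true label.
    direct = sum(y in s for s, y in zip(pred_sets, true_labels)) / n
    return decomposed, direct


pred_sets = [{0}, {0, 1}, set(), {1}]
true_labels = [0, 0, 1, 0]
decomposed, direct = coverage_decomposition(pred_sets, true_labels)
print(decomposed, direct)
```

Both numbers agree because every covered point is either a correct singleton or a doublet.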
- ssbc.mondrian_conformal_calibrate(class_data, alpha_target, delta, mode='beta', m=None)[source]
Perform Mondrian (per-class) conformal calibration with SSBC correction.
For each class, compute:
1. Nonconformity scores: s(x, y) = 1 - P(y|x)
2. SSBC-corrected alpha for PAC guarantee
3. Conformal quantile threshold
4. Singleton error rate bounds via PAC guarantee
Then evaluate prediction set sizes on calibration data PER CLASS and MARGINALLY.
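Steps 1 and 3 of the per-class procedure can be sketched with the standard split-conformal quantile rule. This is an assumption about the usual construction, not a statement of how ssbc computes its threshold internally; `nonconformity_scores` and `conformal_threshold` are hypothetical names.

```python
import math


def nonconformity_scores(probs_true_class):
    """s(x, y) = 1 - P(y|x), using the probability assigned to the true class."""
    return [1.0 - p for p in probs_true_class]


def conformal_threshold(scores, alpha):
    """k-th smallest score with k = ceil((n+1)(1-alpha)); inf if k exceeds n."""
    n = len(scores)
    # Small tolerance guards against float rounding when (n+1)(1-alpha) is an integer.
    k = max(1, math.ceil((n + 1) * (1 - alpha) - 1e-9))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


# Probabilities the model gave to the true class for one class's calibration points.
scores = nonconformity_scores([0.9, 0.8, 0.7, 0.6])
q_loose = conformal_threshold(scores, alpha=0.2)
q_strict = conformal_threshold(scores, alpha=0.1)
print(q_loose, q_strict)
```

In a Mondrian setup this would be run once per class, with the SSBC-corrected alpha in place of the raw target.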
- Parameters:
class_data (dict) – Output from split_by_class()
alpha_target (float or dict) – Target miscoverage rate for each class. If float: same for both classes. If dict: {0: α0, 1: α1} for per-class control.
delta (float or dict) – PAC risk tolerance for each class. If float: same for both classes. If dict: {0: δ0, 1: δ1} for per-class control.
mode (str, default="beta") – “beta” (infinite test window) or “beta-binomial” (finite test window)
m (int, optional) – Test window size for beta-binomial mode
- Returns:
calibration_result (dict) – Dictionary with keys 0 and 1, each containing calibration info
prediction_stats (dict) – Dictionary with keys:
- 0, 1: per-class statistics (conditioned on true label)
- 'marginal': overall statistics (ignoring true labels)
- Return type:
tuple[dict, dict]
Examples
>>> import numpy as np
>>> from ssbc import mondrian_conformal_calibrate, split_by_class
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> class_data = split_by_class(labels, probs)
>>> cal_result, pred_stats = mondrian_conformal_calibrate(
...     class_data, alpha_target=0.1, delta=0.1
... )
- ssbc.split_by_class(labels, probs)[source]
Split calibration data by true class for Mondrian conformal prediction.
- Parameters:
labels (np.ndarray, shape (n,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Classification probabilities [P(class=0), P(class=1)]
- Returns:
Dictionary with keys 0 and 1, each containing:
- 'labels': labels for this class (all same value)
- 'probs': probabilities for samples in this class
- 'indices': original indices (for tracking)
- 'n': number of samples in this class
- Return type:
dict
Examples
>>> import numpy as np
>>> from ssbc import split_by_class
>>> labels = np.array([0, 1, 0, 1])
>>> probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8]])
>>> class_data = split_by_class(labels, probs)
>>> print(class_data[0]['n'])  # Number of class 0 samples
2
- ssbc.clopper_pearson_intervals(labels, confidence=0.95)[source]
Compute Clopper-Pearson (exact binomial) confidence intervals for class prevalences.
- Parameters:
labels (np.ndarray) – Binary labels (0 or 1)
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% CI)
- Returns:
Dictionary with keys 0 and 1, each containing:
- 'count': number of samples in this class
- 'proportion': observed proportion
- 'lower': lower bound of CI
- 'upper': upper bound of CI
- Return type:
dict
Examples
>>> import numpy as np
>>> from ssbc import clopper_pearson_intervals
>>> labels = np.array([0, 0, 1, 1, 0])
>>> intervals = clopper_pearson_intervals(labels, confidence=0.95)
>>> print(intervals[0]['proportion'])
0.6
Notes
The Clopper-Pearson interval is an exact binomial confidence interval based on Beta distribution quantiles. It provides conservative coverage guarantees.
- ssbc.clopper_pearson_lower(k, n, confidence=0.95)[source]
Compute lower Clopper-Pearson (one-sided) confidence bound.
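- Parameters:
k (int) – Number of successes
n (int) – Number of trials
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for a 95% bound)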
- Returns:
Lower confidence bound for the true proportion
- Return type:
float
Examples
>>> from ssbc import clopper_pearson_lower
>>> lower = clopper_pearson_lower(k=5, n=10, confidence=0.95)
>>> print(f"Lower bound: {lower:.3f}")
Notes
Uses Beta distribution quantiles for exact binomial confidence bounds. For PAC-style guarantees, you may want to use delta = 1 - confidence.
- ssbc.clopper_pearson_upper(k, n, confidence=0.95)[source]
Compute upper Clopper-Pearson (one-sided) confidence bound.
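- Parameters:
k (int) – Number of successes
n (int) – Number of trials
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for a 95% bound)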
- Returns:
Upper confidence bound for the true proportion
- Return type:
float
Examples
>>> from ssbc import clopper_pearson_upper
>>> upper = clopper_pearson_upper(k=5, n=10, confidence=0.95)
>>> print(f"Upper bound: {upper:.3f}")
Notes
Uses Beta distribution quantiles for exact binomial confidence bounds. For PAC-style guarantees, you may want to use delta = 1 - confidence.
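The one-sided Clopper-Pearson bounds can also be obtained without Beta quantiles by inverting the binomial tail directly. The sketch below does this with bisection and the standard library only; it is an equivalent formulation for illustration, not the library's implementation, and `cp_lower` and `cp_upper` are hypothetical names.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def cp_lower(k: int, n: int, confidence: float = 0.95) -> float:
    """Lower bound: the p at which P(X >= k | p) equals 1 - confidence."""
    if k == 0:
        return 0.0
    alpha, lo, hi = 1 - confidence, 0.0, 1.0
    for _ in range(100):  # P(X >= k | p) is increasing in p
        mid = (lo + hi) / 2
        if 1 - binom_cdf(k - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cp_upper(k: int, n: int, confidence: float = 0.95) -> float:
    """Upper bound: the p at which P(X <= k | p) equals 1 - confidence."""
    if k == n:
        return 1.0
    alpha, lo, hi = 1 - confidence, 0.0, 1.0
    for _ in range(100):  # P(X <= k | p) is decreasing in p
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{cp_lower(5, 10):.3f}, {cp_upper(5, 10):.3f}")
```

Two closed-form checks are available: with k = n the lower bound equals (1 - confidence)^(1/n), and with k = 0 the upper bound equals 1 - (1 - confidence)^(1/n).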
- ssbc.prediction_bounds(k_cal, n_cal, n_test, confidence=0.95, method='simple')[source]
Compute prediction bounds accounting for both calibration and test set sampling uncertainty.
This function provides two methods for computing prediction bounds:
1. “simple”: Uses standard error formula (faster, good for large samples)
2. “beta_binomial”: Uses Beta-Binomial distribution (more accurate for small samples)
- Parameters:
k_cal (int) – Number of successes in calibration data for a single well-defined Bernoulli event. Must be the count of a binary indicator (e.g., Z_i = 1{event}) across all n_cal trials.
n_cal (int) – Total number of trials in calibration data for the same Bernoulli event. This is the fixed denominator (total sample size or conditional subpopulation size).
n_test (int) – Expected number of future trials for the same Bernoulli event. For joint rates, this is the planned test size (fixed). For conditional rates, this is an estimated future conditional subpopulation size.
confidence (float, default=0.95) – Confidence level (e.g., 0.95 for 95% prediction bounds)
method (str, default="simple") – Method to use: “simple” or “beta_binomial”
- Returns:
(lower_bound, upper_bound) for operational rates on future test sets
- Return type:
tuple[float, float]
Examples
>>> from ssbc import prediction_bounds
>>> # Simple method (default)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95)
>>> print(f"Simple bounds: [{lower:.3f}, {upper:.3f}]")
>>> # Beta-Binomial method (more accurate for small samples)
>>> lower, upper = prediction_bounds(k_cal=50, n_cal=100, n_test=1000, confidence=0.95, method="beta_binomial")
>>> print(f"Beta-Binomial bounds: [{lower:.3f}, {upper:.3f}]")
Notes
The prediction bounds account for both:
1. Calibration uncertainty: uncertainty in the true rate p from calibration data
2. Test set sampling uncertainty: variability when sampling n_test points from the true distribution
Simple method (default):
- Mathematical formula: SE = sqrt(p̂(1-p̂) * (1/n_cal + 1/n_test))
- Good for large sample sizes
- Faster computation
Beta-Binomial method:
- Uses Beta-Binomial distribution for exact finite-sample modeling
- More accurate for small sample sizes
- Slower computation
- Uses uniform prior Beta(1,1) and equal-tailed intervals by default
- For advanced use (Jeffreys prior or HPD intervals), call prediction_bounds_beta_binomial() directly
For large n_test, bounds approach calibration-only bounds. For small n_test, bounds are wider due to additional test set sampling uncertainty.
This is the recommended function for computing operational rate bounds when applying fixed thresholds to future test sets.
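The standard-error formula quoted for the simple method can be evaluated by hand. The sketch below assumes a two-sided normal quantile and clipping to [0, 1]; both are assumptions for illustration, and the library's exact handling may differ. `simple_prediction_bounds` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def simple_prediction_bounds(k_cal, n_cal, n_test, confidence=0.95):
    """p_hat +/- z * SE with SE = sqrt(p_hat(1-p_hat) * (1/n_cal + 1/n_test))."""
    p_hat = k_cal / n_cal
    se = sqrt(p_hat * (1 - p_hat) * (1 / n_cal + 1 / n_test))
    # Two-sided normal quantile (assumption; not confirmed by the documentation).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


lower, upper = simple_prediction_bounds(k_cal=50, n_cal=100, n_test=1000)
print(f"[{lower:.3f}, {upper:.3f}]")

# A small test window adds sampling noise, so the bounds widen.
lo_small, hi_small = simple_prediction_bounds(k_cal=50, n_cal=100, n_test=50)
lo_large, hi_large = simple_prediction_bounds(k_cal=50, n_cal=100, n_test=10**6)
```

The 1/n_test term is what separates a prediction bound from a calibration-only confidence interval: as n_test grows it vanishes and only the 1/n_cal term remains.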
- ssbc.compute_robust_prediction_bounds(loo_predictions, n_test, alpha=0.05, method='auto', inflation_factor=None, verbose=True)[source]
Main function: Compute robust prediction bounds for small-sample LOO-CV.
This is the primary entry point. It intelligently selects methods based on sample size and provides comprehensive diagnostics.
Parameters:
loo_predictions (np.ndarray, shape (n_cal,)) – Binary LOO predictions (1=singleton/success, 0=not/failure)
n_test (int) – Expected size of future test sets
alpha (float, default=0.05) – Significance level (e.g., 0.05 for 95% confidence)
method (str, default='auto') – Method to use:
- 'auto': Automatically select best method (recommended)
- 'analytical': Method 1: Analytical with LOO correction
- 'exact': Method 2: Exact binomial with effective n
- 'hoeffding': Method 3: Distribution-free bound
- 'all': Compute all three and report
inflation_factor (float, optional) – Manual override for LOO variance inflation factor. If None, automatically estimated. Typical values: 1.0 (no inflation), 2.0 (standard LOO), 1.5-2.5 (empirical range)
verbose (bool, default=True) – If True, print diagnostic information about method selection and inflation factors.
Returns:
L_prime (float) – Lower prediction bound
U_prime (float) – Upper prediction bound
report (dict) – Comprehensive diagnostics and method comparison
Usage Examples:
# Basic usage (auto-selects best method)
L, U, report = compute_robust_prediction_bounds(loo_preds, n_test=50)
# Force conservative method
L, U, report = compute_robust_prediction_bounds(
    loo_preds, n_test=50, method='exact'
)
# Compare all methods
L, U, report = compute_robust_prediction_bounds(
    loo_preds, n_test=50, method='all'
)
print(report['comparison_table'])
- ssbc.format_prediction_bounds_report(rate_name, loo_predictions, n_test, alpha=0.05, include_all_methods=True)[source]
Generate a formatted text report of prediction bounds.
This produces human-readable output suitable for inclusion in rigorous analysis reports.
Parameters:
rate_name (str) – Name of the rate (e.g., 'Singleton Rate', 'Doublet Rate')
loo_predictions (np.ndarray) – Binary LOO predictions
n_test (int) – Test set size
alpha (float, default=0.05) – Significance level
include_all_methods (bool, default=True) – If True, compare all three methods in report
Returns:
report (str) – Formatted text report
- ssbc.cp_interval(count, total, confidence=0.95)[source]
Compute Clopper-Pearson exact confidence interval.
Helper function for computing a single CI from count and total.
- ssbc.compute_operational_rate(prediction_sets, true_labels, rate_type)[source]
Compute operational rate indicators for prediction sets.
For each prediction set, compute a binary indicator showing whether a specific operational event occurred (singleton, doublet, abstention, error in singleton, or correct in singleton).
- Parameters:
prediction_sets (list[set | list]) – Prediction sets for each sample. Each set contains predicted labels.
true_labels (np.ndarray) – True labels for each sample
rate_type ({"singleton", "doublet", "abstention", "error_in_singleton", "correct_in_singleton"}) – Type of operational rate to compute:
- “singleton”: prediction set contains exactly one label
- “doublet”: prediction set contains exactly two labels
- “abstention”: prediction set is empty
- “error_in_singleton”: singleton prediction that does not contain the true label
- “correct_in_singleton”: singleton prediction that contains the true label
- Returns:
Binary indicators (0 or 1) for whether the event holds for each sample
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from ssbc import compute_operational_rate
>>> pred_sets = [{0}, {0, 1}, set(), {1}]
>>> true_labels = np.array([0, 0, 1, 0])
>>> indicators = compute_operational_rate(pred_sets, true_labels, "singleton")
>>> print(indicators)  # [1, 0, 0, 1]
>>> indicators = compute_operational_rate(pred_sets, true_labels, "correct_in_singleton")
>>> print(indicators)  # [1, 0, 0, 0] - first and last are singletons, only the first is correct
Notes
This function is useful for computing operational statistics on conformal prediction sets, such as singleton rates, escalation rates, and error rates.
- ssbc.evaluate_test_dataset(test_labels, test_probs, threshold_0, threshold_1)[source]
Evaluate a test dataset and compute empirical operational rates.
This function takes a test dataset with true labels and probability predictions, applies Mondrian conformal prediction thresholds, and returns comprehensive empirical rates for both marginal and per-class statistics.
- Parameters:
test_labels (np.ndarray) – True labels for test samples (0 or 1)
test_probs (np.ndarray) – Probability predictions for test samples, shape (n_samples, 2), where test_probs[i, 0] = P(class=0) and test_probs[i, 1] = P(class=1)
threshold_0 (float) – Conformal prediction threshold for class 0
threshold_1 (float) – Conformal prediction threshold for class 1
- Returns:
Dictionary containing empirical rates with structure:
- 'marginal': Marginal rates across all samples
- 'class_0': Rates for class 0 samples only
- 'class_1': Rates for class 1 samples only
Each containing:
- 'singleton_rate': Fraction of samples with singleton predictions
- 'doublet_rate': Fraction of samples with doublet predictions
- 'abstention_rate': Fraction of samples with abstention (empty set)
- 'singleton_error_rate': Fraction of singleton predictions that are incorrect
- 'n_samples': Number of samples in this group
- 'n_singletons': Number of singleton predictions
- 'n_doublets': Number of doublet predictions
- 'n_abstentions': Number of abstentions
- Return type:
dict
Examples
>>> import numpy as np
>>> from ssbc import evaluate_test_dataset
>>>
>>> # Generate test data
>>> test_labels = np.array([0, 0, 1, 1, 0])
>>> test_probs = np.array([
...     [0.8, 0.2],  # High confidence class 0
...     [0.6, 0.4],  # Medium confidence class 0
...     [0.3, 0.7],  # High confidence class 1
...     [0.4, 0.6],  # Medium confidence class 1
...     [0.5, 0.5],  # Uncertain
... ])
>>>
>>> # Evaluate with thresholds
>>> results = evaluate_test_dataset(test_labels, test_probs, 0.3, 0.3)
>>> print(f"Marginal singleton rate: {results['marginal']['singleton_rate']:.3f}")
>>> print(f"Class 0 singleton rate: {results['class_0']['singleton_rate']:.3f}")
Notes
This function is useful for:
- Evaluating conformal prediction performance on test data
- Comparing empirical rates to theoretical bounds
- Computing operational statistics for reporting
- Validating that thresholds work as expected
The function builds prediction sets using the Mondrian approach:
- For each sample, include class 0 if score_0 <= threshold_0
- For each sample, include class 1 if score_1 <= threshold_1
where score_k = 1 - P(class=k).
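The set-building rule in these notes is short enough to write out directly. The following sketch applies it to plain Python lists; `build_prediction_sets` is a hypothetical helper used only for illustration and is not the library's function.

```python
def build_prediction_sets(test_probs, threshold_0, threshold_1):
    """Include class k when score_k = 1 - P(class=k) is at most threshold_k."""
    sets = []
    for p0, p1 in test_probs:
        s = set()
        if 1 - p0 <= threshold_0:
            s.add(0)
        if 1 - p1 <= threshold_1:
            s.add(1)
        sets.append(s)
    return sets


probs = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8)]

tight = build_prediction_sets(probs, 0.3, 0.3)  # uncertain point becomes an abstention
loose = build_prediction_sets(probs, 0.6, 0.6)  # uncertain point becomes a doublet
print(tight)
print(loose)

singleton_rate = sum(len(s) == 1 for s in tight) / len(tight)
```

Raising a threshold admits more labels per sample, which moves uncertain points from abstention toward doublets.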
- class ssbc.BinaryClassifierSimulator(p_class1, beta_params_class0, beta_params_class1, seed=None)[source]
Bases: object
Simulate binary classification data with probabilities from Beta distributions.
This simulator generates realistic classification scenarios where the predicted probabilities for each class follow Beta distributions. Useful for testing and benchmarking conformal prediction methods.
- Parameters:
p_class1 (float) – Probability of drawing class 1 (class imbalance parameter). Must be in [0, 1].
beta_params_class0 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when true label is 0. Typically use parameters that give low probabilities (e.g., (2, 8)).
beta_params_class1 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when true label is 1. Typically use parameters that give high probabilities (e.g., (8, 2)).
seed (int, optional) – Random seed for reproducibility
- p_class1
Probability of class 1
- p_class0
Probability of class 0 (= 1 - p_class1)
- a0, b0
Beta parameters for class 0
- a1, b1
Beta parameters for class 1
- rng
Random number generator
Examples
>>> from ssbc import BinaryClassifierSimulator
>>> # Simulate imbalanced data: 10% positive class
>>> # Class 0: Beta(2, 8) → mean p(class=1) = 0.2 (low scores, correct)
>>> # Class 1: Beta(8, 2) → mean p(class=1) = 0.8 (high scores, correct)
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.10,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=100)
>>> print(labels.shape)
(100,)
>>> print(probs.shape)
(100, 2)
Notes
The Beta distribution parameters (a, b) control the shape:
- Mean = a / (a + b)
- For a classifier that works well:
  - Class 0 should have low p(class=1): use (a, b) with a < b
  - Class 1 should have high p(class=1): use (a, b) with a > b
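The generative model described here can be sketched with the standard library's `random.betavariate`. This is an illustration of the idea under the stated parameter meanings, not the class's actual implementation; `simulate_binary` is a hypothetical function.

```python
import random


def simulate_binary(n_samples, p_class1, beta_params_class0, beta_params_class1, seed=None):
    """Draw labels, then draw p(class=1) from the Beta distribution of the true class."""
    rng = random.Random(seed)
    labels, probs = [], []
    for _ in range(n_samples):
        y = 1 if rng.random() < p_class1 else 0
        a, b = beta_params_class1 if y == 1 else beta_params_class0
        p1 = rng.betavariate(a, b)  # p(class=1) given the true label
        labels.append(y)
        probs.append((1.0 - p1, p1))  # each row sums to 1
    return labels, probs


labels, probs = simulate_binary(2000, 0.10, (2, 8), (8, 2), seed=0)

# Empirical mean of p(class=1) among true class 0; Beta(2, 8) has mean 2/(2+8) = 0.2.
mean_p1_class0 = sum(p[1] for y, p in zip(labels, probs) if y == 0) / labels.count(0)
print(round(mean_p1_class0, 2))
```

With a < b for class 0 and a > b for class 1, the two score distributions separate, which is what makes singleton predictions common at moderate thresholds.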
- __init__(p_class1, beta_params_class0, beta_params_class1, seed=None)[source]
Initialize the binary classifier simulator.
- generate(n_samples)[source]
Generate n_samples of (label, p(class=0), p(class=1)).
- Parameters:
n_samples (int) – Number of samples to generate
- Returns:
labels (np.ndarray, shape (n_samples,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n_samples, 2)) – Classification probabilities [p(class=0), p(class=1)]. Each row sums to 1.0.
- Return type:
tuple[np.ndarray, np.ndarray]
Examples
>>> import numpy as np
>>> from ssbc import BinaryClassifierSimulator
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.5,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=5)
>>> print(f"Generated {len(labels)} samples")
Generated 5 samples
>>> print(f"Class balance: {np.bincount(labels)}")
Class balance: [2 3]
- ssbc.report_prediction_stats(prediction_stats, calibration_result, operational_bounds_per_class=None, marginal_operational_bounds=None, verbose=True)[source]
Report rigorous statistics for Mondrian conformal prediction with valid CIs.
Only displays statistics with valid confidence intervals:
- Per-class statistics from calibration data (valid within class)
- Per-class operational bounds from cross-validation (rigorous PAC bounds)
- Marginal operational bounds from cross-validated Mondrian (rigorous PAC bounds)
Does NOT display marginal statistics from calibration data (invalid CIs for Mondrian).
- Parameters:
prediction_stats (dict) – Output from mondrian_conformal_calibrate (second return value)
calibration_result (dict) – Output from mondrian_conformal_calibrate (first return value)
operational_bounds_per_class (dict[int, OperationalRateBoundsResult], optional) – Per-class operational bounds (from generate_rigorous_pac_report)
marginal_operational_bounds (OperationalRateBoundsResult, optional) – Marginal operational bounds (from generate_rigorous_pac_report)
verbose (bool, default=True) – If True, print detailed statistics to stdout
- Returns:
Structured summary with valid CIs:
- Keys 0, 1 for per-class statistics
- Key 'marginal_bounds' if marginal_operational_bounds is provided
- Return type:
dict
Examples
>>> # Get operational bounds from rigorous PAC report
>>> from ssbc import generate_rigorous_pac_report, report_prediction_stats
>>> report = generate_rigorous_pac_report(labels, probs, alpha_target=0.10, delta=0.10)
>>> cal_result = report['calibration_result']
>>> pred_stats = report['prediction_stats']
>>> op_bounds = report['pac_bounds_class_0']  # Per-class bounds
>>> marginal = report['pac_bounds_marginal']  # Marginal bounds
>>> summary = report_prediction_stats(pred_stats, cal_result, op_bounds, marginal)
- ssbc.plot_parallel_coordinates_plotly(df, columns=None, color='err_all', color_continuous_scale=None, title='Mondrian sweep – interactive parallel coordinates', height=600, base_opacity=0.9, unselected_opacity=0.06)[source]
Create interactive parallel coordinates plot for hyperparameter sweep results.
- Parameters:
df (pd.DataFrame) – DataFrame with hyperparameter sweep results
columns (list of str, optional) – Columns to display in parallel coordinates. Default: ['a0', 'd0', 'a1', 'd1', 'cov', 'sing_rate', 'err_all', 'err_pred0', 'err_pred1', 'err_y0', 'err_y1', 'esc_rate']
color (str, default='err_all') – Column to use for coloring lines
color_continuous_scale (plotly colorscale, optional) – Color scale for the lines
title (str, default="Mondrian sweep – interactive parallel coordinates") – Plot title
height (int, default=600) – Plot height in pixels
base_opacity (float, default=0.9) – Opacity of selected lines
unselected_opacity (float, default=0.06) – Opacity of unselected lines (creates contrast)
- Returns:
Interactive plotly figure
- Return type:
plotly.graph_objects.Figure
Examples
>>> import pandas as pd
>>> df = sweep_hyperparams_and_collect(...)
>>> fig = plot_parallel_coordinates_plotly(df, color='err_all')
>>> fig.show()  # In notebook
>>> # Or save: fig.write_html("sweep_results.html")
- ssbc.bootstrap_calibration_uncertainty(labels, probs, simulator, alpha_target=0.1, delta=0.1, test_size=1000, n_bootstrap=1000, n_jobs=-1, seed=None)[source]
Bootstrap analysis of calibration uncertainty.
For each bootstrap iteration:
1. Resample calibration data with replacement
2. Calibrate (compute SSBC thresholds)
3. Evaluate on fresh independent test set
4. Record operational rates
This models: “If I recalibrate on similar datasets, how do rates vary?”
- Parameters:
labels (np.ndarray) – Calibration labels
probs (np.ndarray) – Calibration probabilities
simulator (DataGenerator) – Simulator to generate independent test sets
alpha_target (float, default=0.10) – Target miscoverage
delta (float, default=0.10) – PAC risk
test_size (int, default=1000) – Size of test sets for evaluation
n_bootstrap (int, default=1000) – Number of bootstrap iterations
n_jobs (int, default=-1) – Parallel jobs (-1 for all cores)
seed (int, optional) – Random seed
- Returns:
Bootstrap distributions with keys:
- 'marginal': dict with 'singleton', 'doublet', 'abstention', 'singleton_error'
- 'class_0': dict with same metrics
- 'class_1': dict with same metrics
Each metric contains:
- 'samples': array of rates across bootstrap trials
- 'mean': mean rate
- 'std': standard deviation
- 'quantiles': dict with q05, q25, q50, q75, q95
- Return type:
dict
Examples
>>> from ssbc import BinaryClassifierSimulator, bootstrap_calibration_uncertainty
>>> sim = BinaryClassifierSimulator(p_class1=0.2, beta_params_class0=(1,7), beta_params_class1=(5,2))
>>> labels, probs = sim.generate(100)
>>> results = bootstrap_calibration_uncertainty(labels, probs, sim, n_bootstrap=100)
>>> print(results['marginal']['singleton']['mean'])
- ssbc.plot_bootstrap_distributions(bootstrap_results, figsize=(16, 12), save_path=None)[source]
Plot bootstrap distributions.
- Parameters:
- Raises:
ImportError – If matplotlib is not installed
- Return type:
None
Examples
>>> from ssbc import bootstrap_calibration_uncertainty, plot_bootstrap_distributions
>>> results = bootstrap_calibration_uncertainty(...)
>>> plot_bootstrap_distributions(results, save_path='bootstrap_results.png')
- ssbc.cross_conformal_validation(labels, probs, alpha_target=0.1, delta=0.1, n_folds=5, stratify=True, seed=None)[source]
K-fold cross-conformal validation for Mondrian conformal prediction.
Estimates the variability of operational rates (abstentions, singletons, doublets) due to finite calibration sample effects by splitting data into K folds.
For each fold:
1. Train: Compute SSBC-corrected thresholds on K-1 folds
2. Test: Evaluate operational rates on held-out fold
3. Record: Store rates for this fold
Aggregate rates across folds to quantify finite-sample variability.
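The fold structure described above can be sketched as a stratified split followed by a train/held-out loop. The splitting scheme below (shuffle within each class, deal round-robin) is one common way to stratify and is an assumption, not necessarily what the library does; `stratified_folds` is a hypothetical helper and the calibration step is left as a comment.

```python
import random


def stratified_folds(labels, n_folds=5, seed=None):
    """Return n_folds index lists with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)  # deal indices round-robin
    return folds


labels = [0] * 12 + [1] * 8
folds = stratified_folds(labels, n_folds=4, seed=0)

for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(labels)) if i not in held]
    # 1. Train: compute thresholds on `train` (K-1 folds).
    # 2. Test: evaluate operational rates on `held_out`.
    # 3. Record: store the rates for this fold.
    print(len(train), len(held_out))
```

Stratifying keeps each held-out fold's class mix close to the full data, so per-class rates are defined in every fold.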
- Parameters:
labels (np.ndarray, shape (n,)) – Calibration labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Calibration probabilities [P(class=0), P(class=1)]
alpha_target (float, default=0.10) – Target miscoverage rate
delta (float, default=0.10) – PAC risk for SSBC correction
n_folds (int, default=5) – Number of folds (K)
stratify (bool, default=True) – Stratify folds by class labels
seed (int, optional) – Random seed for reproducibility
- Returns:
Cross-conformal results with keys:
- 'fold_rates': List of rate dicts for each fold
- 'marginal': Statistics for marginal rates
- 'class_0': Statistics for class 0 rates
- 'class_1': Statistics for class 1 rates
Each statistics dict contains:
- 'samples': Array of rates across folds
- 'mean': Mean rate
- 'std': Standard deviation
- 'quantiles': Dict with q05, q25, q50, q75, q95
- 'ci_95': 95% Clopper-Pearson CI (if applicable)
- Return type:
dict
Examples
>>> from ssbc import cross_conformal_validation
>>> results = cross_conformal_validation(labels, probs, n_folds=10)
>>> m = results['marginal']['singleton']
>>> print(f"Singleton rate: {m['mean']:.3f} ± {m['std']:.3f}")
>>> print(f"95% range: [{m['quantiles']['q05']:.3f}, {m['quantiles']['q95']:.3f}]")
Notes
Different from other methods:
- LOO-CV: Leave-one-out, aggregates counts (not fold-level rates)
- Bootstrap: Resamples with replacement, tests on fresh data
- Cross-conformal: K-fold split, estimates rate distribution from calibration
This method directly estimates the variability of rates due to finite calibration samples, without requiring a data simulator.
- ssbc.print_cross_conformal_results(results)[source]
Pretty print cross-conformal validation results.
- Parameters:
results (dict) – Results from cross_conformal_validation()
- Return type:
None
- ssbc.validate_pac_bounds(report, simulator, test_size, n_trials=1000, seed=None, verbose=True, n_jobs=-1)[source]
Empirically validate prediction interval operational bounds.
Takes a PAC report from generate_rigorous_pac_report() and validates that the theoretical bounds actually hold in practice by:
1. Extracting the FIXED thresholds from calibration
2. Running n_trials simulations with fresh test sets
3. Measuring empirical coverage of all reported bounds (analytical, exact, hoeffding)
When the report includes method comparison (prediction_method=”all”), validates all three methods separately. Otherwise, validates only the selected method.
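The fixed-threshold simulation loop can be sketched as follows. This is a toy illustration under stated assumptions, not the library's code: the nonconformity score is taken to be 1 - P(label), the thresholds `thr0` and `thr1` are arbitrary stand-ins for calibrated values, and test probabilities are drawn uniformly instead of coming from a simulator.

```python
import random


def prediction_set(p1, thr0, thr1):
    """Labels whose score 1 - P(label) does not exceed the class threshold."""
    labels = []
    score0 = p1          # 1 - P(class=0)
    score1 = 1.0 - p1    # 1 - P(class=1)
    if score0 <= thr0:
        labels.append(0)
    if score1 <= thr1:
        labels.append(1)
    return labels


def singleton_rate(p1_values, thr0, thr1):
    """Fraction of test points whose prediction set has exactly one label."""
    sets = [prediction_set(p, thr0, thr1) for p in p1_values]
    return sum(len(s) == 1 for s in sets) / len(sets)


rng = random.Random(0)
thr0, thr1 = 0.6, 0.7    # thresholds stay FIXED across all trials
trial_rates = []
for _ in range(200):     # n_trials
    test_p1 = [rng.random() for _ in range(500)]  # fresh test set each trial
    trial_rates.append(singleton_rate(test_p1, thr0, thr1))

mean_rate = sum(trial_rates) / len(trial_rates)
```

The list `trial_rates` plays the role of the ‘rates’ array in the returned dict: its spread across trials is what the reported bounds are checked against.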
- Parameters:
report (dict) – Output from generate_rigorous_pac_report()
simulator (DataGenerator) – Simulator to generate independent test data (e.g., BinaryClassifierSimulator)
test_size (int) – Size of each test set
n_trials (int, default=1000) – Number of independent trials
seed (int, optional) – Random seed for reproducibility
verbose (bool, default=True) – Print validation progress
n_jobs (int, default=-1) – Number of parallel jobs for trial execution. -1 = use all cores (default), 1 = single-threaded, N = use N cores.
- Returns:
Validation results with:
- ‘marginal’: Marginal operational rates and coverage
- ‘class_0’: Class 0 operational rates and coverage
- ‘class_1’: Class 1 operational rates and coverage
Each containing:
’singleton’, ‘doublet’, ‘abstention’, ‘singleton_error’ dicts with:
’rates’: Array of rates across trials
’mean’: Mean rate
’quantiles’: Quantiles (5%, 25%, 50%, 75%, 95%)
’bounds’: Selected/default bounds from report
’expected’: Expected rate from report
’empirical_coverage’: Fraction of trials within selected bounds
’method_validations’: Dict of method-specific validations (when available):
- ‘analytical’: {bounds, empirical_coverage}
- ‘exact’: {bounds, empirical_coverage}
- ‘hoeffding’: {bounds, empirical_coverage}
- Return type:
dict
Examples
>>> from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report, validate_pac_bounds
>>> sim = BinaryClassifierSimulator(p_class1=0.2, seed=42)
>>> labels, probs = sim.generate(100)
>>> report = generate_rigorous_pac_report(labels, probs, delta=0.10)
>>> validation = validate_pac_bounds(report, sim, test_size=1000, n_trials=1000)
>>> print(f"Singleton coverage: {validation['marginal']['singleton']['empirical_coverage']:.1%}")
Notes
This function is useful for:
- Verifying theoretical PAC guarantees empirically
- Understanding the tightness of bounds
- Debugging issues with bounds calculation
- Generating validation plots for papers/reports
The empirical coverage should be ≥ PAC level (1 - δ) for rigorous bounds.
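The coverage check itself is a simple counting exercise. The sketch below uses made-up trial rates and bounds, and `empirical_coverage` is a hypothetical helper written for illustration; it is not part of the SSBC API.

```python
def empirical_coverage(rates, lower, upper):
    """Fraction of trial rates that fall inside [lower, upper]."""
    inside = sum(lower <= r <= upper for r in rates)
    return inside / len(rates)


# Made-up singleton rates from 10 validation trials
rates = [0.70, 0.74, 0.78, 0.81, 0.69, 0.76, 0.73, 0.77, 0.75, 0.72]
delta = 0.10
pac_level = 1 - delta

coverage = empirical_coverage(rates, lower=0.70, upper=0.80)
meets_pac_level = coverage >= pac_level
print(f"coverage={coverage:.1%}, target={pac_level:.1%}, ok={meets_pac_level}")
```

Here two of the ten rates fall outside the bounds, so the check fails at δ = 0.10. In practice n_trials should be large enough that sampling noise in the coverage estimate is small compared with δ.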
- ssbc.print_validation_results(validation)[source]
Pretty print validation results.
- Parameters:
validation (dict) – Output from validate_pac_bounds()
- Return type:
None
Examples
>>> from ssbc import validate_pac_bounds, print_validation_results
>>> validation = validate_pac_bounds(report, sim, test_size=1000, n_trials=1000)
>>> print_validation_results(validation)
- ssbc.plot_validation_bounds(validation, metric='singleton', show_detail=True, main_figsize=(18, 5), detail_figsize=(18, 12), bins=50, method_colors=None, return_figs=False)[source]
Plot empirical distributions with prediction interval bounds for all methods.
Creates visualization comparing empirical rates against bounds from analytical, exact, and hoeffding methods when available.
- Parameters:
validation (dict) – Output from validate_pac_bounds() containing validation results
metric (str, default="singleton") – Which metric to plot. Options: “singleton”, “doublet”, “abstention”, “singleton_error”
show_detail (bool, default=True) – If True, also create detailed 3x3 grid showing each method separately
main_figsize (tuple[int, int], default=(18, 5)) – Figure size for main comparison plot (width, height in inches)
detail_figsize (tuple[int, int], default=(18, 12)) – Figure size for detailed method comparison grid (width, height in inches)
bins (int, default=50) – Number of bins for histograms
method_colors (dict or None, default=None) – Custom colors and linestyles for methods. Dict mapping method names to (color, linestyle) tuples. If None, uses default colors:
- “analytical”: (“#2E86AB”, “solid”) # Blue
- “exact”: (“#A23B72”, “dashed”) # Purple
- “hoeffding”: (“#F18F01”, “dashdot”) # Orange
return_figs (bool, default=False) – If True, returns matplotlib Figure objects for further customization. Returns (fig_main, fig_detail) or (fig_main, None) if show_detail=False. If False, calls plt.show() and returns None.
- Returns:
- If return_figs=True:
(fig_main, fig_detail) if show_detail=True
(fig_main, None) if show_detail=False
If return_figs=False: None (displays plots directly)
- Return type:
tuple or None
Examples
>>> from ssbc import validate_pac_bounds, plot_validation_bounds
>>> validation = validate_pac_bounds(report, sim, test_size=1000, n_trials=1000)
>>> plot_validation_bounds(validation, metric="singleton")
>>> # Or get figure objects for customization
>>> fig_main, fig_detail = plot_validation_bounds(
...     validation, metric="singleton", return_figs=True
... )
>>> fig_main.savefig("validation_main.png")
Notes
The main plot shows all three methods overlaid on the same histogram for easy comparison. The detailed plot shows each method separately in a 3x3 grid. Both plots include:
- Empirical distribution histogram
- Method-specific bounds (when method comparison available)
- Expected value from LOO-CV
- Empirical mean from validation trials
- Coverage percentages for each method
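A custom method_colors mapping can be built by starting from the documented defaults and overriding individual entries. The documentation does not say whether a partial dict is merged with the defaults, so this sketch assumes nothing and always produces a complete mapping; `DEFAULT_METHOD_COLORS` and `overrides` are names invented for the example.

```python
# Defaults copied from the method_colors parameter description
DEFAULT_METHOD_COLORS = {
    "analytical": ("#2E86AB", "solid"),
    "exact": ("#A23B72", "dashed"),
    "hoeffding": ("#F18F01", "dashdot"),
}

# Override only the entry to change; values are (color, linestyle) tuples
overrides = {"exact": ("#000000", "dotted")}

# Later keys win, so overrides replace the matching defaults
method_colors = {**DEFAULT_METHOD_COLORS, **overrides}
print(method_colors)
```

The resulting dict would then be passed as `plot_validation_bounds(validation, method_colors=method_colors)`.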
- ssbc.validate_prediction_interval_calibration(simulator, n_calibration, BigN, alpha_target=0.1, delta=0.1, test_size=1000, n_trials=1000, ci_level=0.95, use_loo_correction=True, prediction_method='all', loo_inflation_factor=None, seed=None, n_jobs=-1, verbose=False)[source]
Validate that prediction interval confidence level holds across calibration datasets.
This meta-validation checks if the nominal confidence level (e.g., 95%) actually holds when repeating the entire calibration+validation process many times with different calibration datasets.
For each of BigN calibration datasets:
1. Generate random calibration data
2. Compute prediction interval bounds
3. Validate bounds with many test sets
4. Record empirical coverage
Then aggregates results to see if ~95% of calibrations achieve ≥95% coverage.
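The final aggregation step can be sketched as below. The coverage values are made up, and `summarize_coverages` is a hypothetical helper that mirrors a few of the documented result keys; it is not the library's implementation.

```python
import statistics


def summarize_coverages(coverages, ci_level=0.95):
    """Summarize per-calibration empirical coverages against a target level."""
    above = sum(c >= ci_level for c in coverages)
    return {
        "mean": statistics.fmean(coverages),
        "median": statistics.median(coverages),
        "fraction_above_target": above / len(coverages),
    }


# Made-up empirical coverages, one per calibration dataset (BigN = 8)
coverages = [0.97, 0.93, 0.96, 0.99, 0.95, 0.91, 0.98, 0.96]
stats = summarize_coverages(coverages, ci_level=0.95)
print(f"Fraction achieving >= 95%: {stats['fraction_above_target']:.1%}")
```

Because fraction_above_target is itself estimated from BigN calibrations, a small BigN gives a coarse estimate (here it can only change in steps of 1/8).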
- Parameters:
simulator (DataGenerator) – Simulator for generating calibration and test data (e.g., BinaryClassifierSimulator)
n_calibration (int) – Size of each calibration dataset
BigN (int) – Number of different calibration datasets to test
alpha_target (float or dict[int, float], default=0.10) – Target miscoverage rate per class
delta (float or dict[int, float], default=0.10) – PAC risk tolerance for threshold calibration
test_size (int, default=1000) – Size of each test set in validation
n_trials (int, default=1000) – Number of test sets per calibration dataset (for validation)
ci_level (float, default=0.95) – Nominal confidence level for prediction intervals (target to validate)
use_loo_correction (bool, default=True) – Use LOO-corrected bounds
prediction_method (str, default="all") – Method for bounds computation (“all” to compare all methods)
loo_inflation_factor (float, optional) – Manual override for LOO inflation factor
seed (int, optional) – Random seed for reproducibility
n_jobs (int, default=-1) – Number of parallel jobs (-1 = all cores)
verbose (bool, default=False) – If True, print progress for each calibration dataset
- Returns:
Meta-validation results with keys:
- ‘n_calibrations’: BigN
- ‘n_calibration’: Calibration dataset size
- ‘n_trials_per_calibration’: n_trials
- ‘ci_level’: Target confidence level
- ‘marginal’: Dict with coverage statistics per method
- ‘class_0’: Dict with coverage statistics per method
- ‘class_1’: Dict with coverage statistics per method
Each scope contains:
- ‘singleton’, ‘doublet’, ‘abstention’, ‘singleton_error’: Dicts with:
’selected’: Coverage stats for selected bounds
’analytical’: Coverage stats if available
’exact’: Coverage stats if available
’hoeffding’: Coverage stats if available
Each method has:
- ‘coverages’: Array of empirical coverages across BigN calibrations
- ‘mean’: Mean coverage
- ‘median’: Median coverage
- ‘quantiles’: {q05, q25, q50, q75, q95}
- ‘fraction_above_target’: Fraction achieving ≥ci_level
- ‘fraction_above_95pct’: Fraction achieving ≥95% (for comparison)
- Return type:
dict
Examples
>>> from ssbc import BinaryClassifierSimulator, validate_prediction_interval_calibration
>>> sim = BinaryClassifierSimulator(p_class1=0.2, seed=42)
>>> results = validate_prediction_interval_calibration(
...     simulator=sim,
...     n_calibration=100,
...     BigN=50,
...     n_trials=500,
...     verbose=False
... )
>>> print(f"Fraction achieving ≥95%: {results['marginal']['singleton']['selected']['fraction_above_target']:.1%}")
- ssbc.print_calibration_validation_results(results)[source]
Pretty print meta-validation results.
- Parameters:
results (dict) – Output from validate_prediction_interval_calibration()
- Return type:
None
Examples
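>>> from ssbc import validate_prediction_interval_calibration, print_calibration_validation_results
>>> results = validate_prediction_interval_calibration(...)
>>> print_calibration_validation_results(results)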
- ssbc.get_calibration_bounds_dataframe(results, scope=None, metric=None)[source]
Extract calibration bounds and observed quantiles as DataFrame.
Converts the raw calibration data from validate_prediction_interval_calibration() into a pandas DataFrame format for easy plotting and analysis.
- Parameters:
results (dict) – Output from validate_prediction_interval_calibration()
scope (str, optional) – Filter to specific scope: “marginal”, “class_0”, or “class_1”. If None, includes all scopes.
metric (str, optional) – Filter to specific metric: “singleton”, “doublet”, “abstention”, “singleton_error”. If None, includes all metrics.
- Returns:
Pandas DataFrame with columns:
- calibration_idx: Index of calibration dataset (0 to BigN-1)
- scope: marginal, class_0, or class_1
- metric: singleton, doublet, abstention, singleton_error
- observed_q05: 5th percentile of test set rates
- observed_q95: 95th percentile of test set rates
- selected_lower: Lower bound from selected method
- selected_upper: Upper bound from selected method
- analytical_lower: Lower bound from analytical method (NaN if not available)
- analytical_upper: Upper bound from analytical method (NaN if not available)
- exact_lower: Lower bound from exact method (NaN if not available)
- exact_upper: Upper bound from exact method (NaN if not available)
- hoeffding_lower: Lower bound from hoeffding method (NaN if not available)
- hoeffding_upper: Upper bound from hoeffding method (NaN if not available)
- Return type:
DataFrame
Examples
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from ssbc import get_calibration_bounds_dataframe, validate_prediction_interval_calibration
>>> results = validate_prediction_interval_calibration(...)
>>> df = get_calibration_bounds_dataframe(results)
>>> # Filter to singleton marginal
>>> df_single = df[(df['scope'] == 'marginal') & (df['metric'] == 'singleton')]
>>> # Plot lower bounds
>>> plt.scatter(df_single['analytical_lower'], df_single['observed_q05'])
- ssbc.plot_calibration_excess(df, scope=None, metric=None, methods=None, figsize=(14, 6), bins=30, return_fig=False)[source]
Plot excess (difference between observed and predicted quantiles).
Creates histograms showing:
- Lower excess: observed_q05 - predicted_lower (positive = predicted lower bound lies below the observed 5th percentile)
- Upper excess: predicted_upper - observed_q95 (positive = predicted upper bound lies above the observed 95th percentile)
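The two excess quantities follow directly from the formulas above. In this sketch the numbers are made up and `calibration_excess` is a hypothetical helper, not an SSBC function. Both values are positive exactly when the predicted interval contains the observed 5%-95% range.

```python
def calibration_excess(observed_q05, observed_q95, predicted_lower, predicted_upper):
    """Return (lower_excess, upper_excess) as defined for the excess plots."""
    lower_excess = observed_q05 - predicted_lower
    upper_excess = predicted_upper - observed_q95
    return lower_excess, upper_excess


# Made-up values for one calibration dataset
lower_excess, upper_excess = calibration_excess(
    observed_q05=0.62, observed_q95=0.78,
    predicted_lower=0.58, predicted_upper=0.83,
)
interval_contains_observed = lower_excess >= 0 and upper_excess >= 0
print(f"lower={lower_excess:+.3f}, upper={upper_excess:+.3f}")
```

A negative value on either side means the observed quantile fell outside the predicted bound for that calibration dataset.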
- Parameters:
df (DataFrame) – Output from get_calibration_bounds_dataframe()
scope (str, optional) – Filter to specific scope: “marginal”, “class_0”, or “class_1”. If None, uses all scopes (creates separate subplots).
metric (str, optional) – Filter to specific metric: “singleton”, “doublet”, “abstention”, “singleton_error”. If None, uses all metrics (creates separate subplots).
methods (list[str], optional) – Methods to plot: [“analytical”, “exact”, “hoeffding”]. If None, plots all available methods.
figsize (tuple[int, int], default=(14, 6)) – Figure size (width, height in inches)
bins (int, default=30) – Number of histogram bins
return_fig (bool, default=False) – If True, returns matplotlib Figure object. If False, calls plt.show()
- Returns:
If return_fig=True, returns Figure object. Otherwise None.
- Return type:
Figure or None
Examples
>>> from ssbc import (
...     get_calibration_bounds_dataframe,
...     plot_calibration_excess,
...     validate_prediction_interval_calibration,
... )
>>> results = validate_prediction_interval_calibration(...)
>>> df = get_calibration_bounds_dataframe(results)
>>> # Plot for singleton marginal
>>> df_single = df[(df['scope'] == 'marginal') & (df['metric'] == 'singleton')]
>>> plot_calibration_excess(df_single, scope='marginal', metric='singleton')
- ssbc.generate_rigorous_pac_report(labels, probs, alpha_target=0.1, delta=0.1, test_size=None, ci_level=0.95, use_union_bound=False, n_jobs=-1, verbose=True, prediction_method='exact', use_loo_correction=True, loo_inflation_factor=None)[source]
Generate complete rigorous PAC report with coverage volatility.
This is the UNIFIED function that gives you everything properly:
- SSBC-corrected thresholds
- Coverage guarantees
- PAC-controlled operational bounds (marginal + per-class)
- Singleton error rates with PAC guarantees
- All bounds account for coverage volatility via BetaBinomial
- Parameters:
labels (np.ndarray, shape (n,)) – True labels (0 or 1)
probs (np.ndarray, shape (n, 2)) – Predicted probabilities [P(class=0), P(class=1)]
alpha_target (float or dict[int, float], default=0.10) – Target miscoverage per class
delta (float or dict[int, float], default=0.10) – PAC risk tolerance. Used for both:
- Coverage guarantee (via SSBC)
- Operational bounds (pac_level = 1 - delta)
test_size (int, optional) – Expected test set size. If None, uses calibration size
ci_level (float, default=0.95) – Confidence level for prediction bounds
prediction_method (str, default="exact") – Method for LOO uncertainty quantification (when use_loo_correction=True):
- “auto”: Automatically select best method
- “analytical”: Method 1 (recommended for n>=40)
- “exact”: Method 2 (recommended for n=20-40)
- “hoeffding”: Method 3 (ultra-conservative)
- “all”: Compare all methods
When use_loo_correction=False, this parameter is ignored.
use_loo_correction (bool, default=True) –
If True, uses LOO-CV uncertainty correction for small samples (n=20-40). This accounts for all four sources of uncertainty:
1. LOO-CV correlation structure (variance inflation ≈2×)
2. Threshold calibration uncertainty
3. Parameter estimation uncertainty
4. Test sampling uncertainty
Recommended for small calibration sets where standard bounds may be too narrow.
LOO-CV Correlation Issue: The critical challenge with LOO-CV is that the N LOO predictions are not independent. The training sets for different folds overlap substantially: for folds i and j, the training sets D_{-i} and D_{-j} share N−2 of their N−1 examples and differ only in the two held-out points. Because each fold’s threshold is computed from nearly identical data, the resulting predictions exhibit strong positive correlation. This correlation structure is handled through specialized LOO-corrected methods that account for the dependency between folds when computing diagnostic bounds.
loo_inflation_factor (float, optional) –
Manual override for LOO variance inflation factor. If None (default), automatically estimated from the data using empirical variance.
Empirical Correction Factor Estimation: The inflation factor is estimated by comparing the empirical variance of LOO predictions to the theoretical IID variance. Specifically, inflation = (Var_empirical / Var_IID) × (n / (n-1)), where Var_empirical is the sample variance of the binary LOO predictions (with Bessel’s correction), Var_IID = p̂(1-p̂) is the expected variance under independence, and the n/(n-1) factor accounts for the finite-sample bias correction. For large n, this approaches the theoretical value of 2.0, but for small samples (n=20-40), the actual inflation can vary. The estimated factor is clipped to [1.0, 6.0] to prevent extreme values from outliers or numerical instability.
Typical values:
- 1.0: No inflation (assumes independent samples - usually wrong for LOO)
- 2.0: Standard LOO inflation (theoretical value for n→∞)
- 1.5-2.5: Empirical range for small samples
- >2.5: High correlation scenarios
- Up to 6.0: Extended range for very high correlation scenarios
Note: This parameter can be used as a phenomenological control knob to correct for issues not modeled properly in the statistical framework. For example, if validation suggests the default estimation is too optimistic or too conservative, manually adjusting this factor can help achieve desired coverage behavior. Use with caution and validate empirically.
use_union_bound (bool, default=False) – Apply Bonferroni for simultaneous guarantees
n_jobs (int, default=-1) – Number of parallel jobs for LOO-CV computation. -1 = use all cores (default), 1 = single-threaded, N = use N cores.
verbose (bool, default=True) – Print comprehensive report
- Returns:
Complete report with keys:
- ‘ssbc_class_0’: SSBCResult for class 0
- ‘ssbc_class_1’: SSBCResult for class 1
- ‘pac_bounds_marginal’: PAC operational bounds (marginal)
- ‘pac_bounds_class_0’: PAC operational bounds (class 0)
- ‘pac_bounds_class_1’: PAC operational bounds (class 1)
- ‘calibration_result’: From mondrian_conformal_calibrate
- ‘prediction_stats’: From mondrian_conformal_calibrate
- Return type:
dict
Examples
>>> from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report
>>>
>>> sim = BinaryClassifierSimulator(p_class1=0.5, seed=42)
>>> labels, probs = sim.generate(n_samples=1000)
>>>
>>> report = generate_rigorous_pac_report(
...     labels, probs,
...     alpha_target=0.10,
...     delta=0.10,
...     verbose=True
... )
Notes
This replaces the old workflow (removed in v1.1.0):
OLD (removed - these functions no longer exist):
```python
# These functions were removed in v1.1.0:
# op_bounds = compute_mondrian_operational_bounds(...)  # Removed
# marginal_bounds = compute_marginal_operational_bounds(...)  # Removed
# report_prediction_stats(...)  # Removed
```

NEW (rigorous):

```python
report = generate_rigorous_pac_report(labels, probs, alpha_target, delta)
# Done! All bounds account for coverage volatility.
```
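The overlap between LOO training sets mentioned under use_loo_correction can be made concrete with plain Python sets. This is only an illustration of the combinatorics; `loo_training_set` is a hypothetical helper and the indices stand in for calibration examples.

```python
n = 6
data = set(range(n))  # indices of the calibration examples


def loo_training_set(i):
    """Training set for fold i: every example except the held-out one."""
    return data - {i}


d_i = loo_training_set(0)
d_j = loo_training_set(1)

shared = d_i & d_j      # examples used by both folds
differing = d_i ^ d_j   # examples used by exactly one fold

print(f"each fold trains on {len(d_i)} examples")
print(f"shared: {len(shared)}, differing: {sorted(differing)}")
```

For any pair of folds the two training sets share n−2 examples, and the only examples not shared are the two held-out points, which is why the per-fold thresholds are nearly identical and the resulting LOO predictions are correlated.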
- ssbc.sweep_hyperparams_and_collect(class_data, alpha_0, delta_0, alpha_1, delta_1, mode='beta', extra_metrics=None, quiet=True)[source]
Sweep (a0,d0,a1,d1), run mondrian_conformal_calibrate + report_prediction_stats, and return a tidy DataFrame with hyperparams + selected metrics.
This function performs a grid search over hyperparameter combinations and evaluates the resulting conformal prediction performance.
- Parameters:
class_data (dict) – Output from split_by_class()
alpha_0 (array-like) – Grid of alpha values for class 0
delta_0 (array-like) – Grid of delta values for class 0
alpha_1 (array-like) – Grid of alpha values for class 1
delta_1 (array-like) – Grid of delta values for class 1
mode (str, default="beta") – “beta” or “beta-binomial” mode for SSBC
extra_metrics (dict of {name: function}, optional) – Additional metrics to compute. Each function takes the summary dict and returns a scalar value.
quiet (bool, default=True) – If True, suppress progress output
- Returns:
Tidy dataframe with one row per hyperparameter combination. Columns include:
- a0, d0, a1, d1: hyperparameters
- cov: overall coverage rate
- sing_rate: singleton prediction rate
- err_all: overall singleton error rate
- err_pred0, err_pred1: errors by predicted class
- err_y0, err_y1: errors by true class
- esc_rate: escalation rate (doublets + abstentions)
- n_total, sing_count, m_abst, m_doublets: counts
- Any additional metrics from extra_metrics
- Return type:
pd.DataFrame
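The relationship between the count columns and the rate columns can be sketched as follows. The counts are made up, `operational_rates` is a hypothetical helper, and dividing by n_total is an assumption about how the rates are normalized; only the definition of escalation as doublets plus abstentions comes from the column description.

```python
def operational_rates(n_total, sing_count, m_abst, m_doublets):
    """Derive singleton and escalation rates from prediction-set counts."""
    return {
        "sing_rate": sing_count / n_total,
        # Escalation = cases without a single confident label
        "esc_rate": (m_abst + m_doublets) / n_total,
    }


# Made-up counts for one hyperparameter combination
rates = operational_rates(n_total=1000, sing_count=820, m_abst=30, m_doublets=150)
print(rates)
```

In the binary setting every prediction set is empty, a singleton, or a doublet, so under this normalization the two rates sum to one.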
Examples
>>> import numpy as np
>>> from ssbc import BinaryClassifierSimulator, split_by_class, sweep_hyperparams_and_collect
>>>
>>> # Generate data
>>> sim = BinaryClassifierSimulator(0.1, (2, 8), (8, 2), seed=42)
>>> labels, probs = sim.generate(1000)
>>> class_data = split_by_class(labels, probs)
>>>
>>> # Define grid
>>> alpha_grid = np.arange(0.05, 0.20, 0.05)
>>> delta_grid = np.arange(0.05, 0.20, 0.05)
>>>
>>> # Run sweep
>>> df = sweep_hyperparams_and_collect(
...     class_data,
...     alpha_0=alpha_grid, delta_0=delta_grid,
...     alpha_1=alpha_grid, delta_1=delta_grid,
... )
>>>
>>> # Analyze results
>>> print(df[['a0', 'a1', 'cov', 'sing_rate', 'err_all']].head())
Notes
The function performs a complete grid search, so the total number of evaluations is len(alpha_0) × len(delta_0) × len(alpha_1) × len(delta_1). For large grids, this can be computationally expensive.
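The cost of the full grid can be checked before running a sweep. This sketch enumerates the combinations with itertools.product; the grids are written as explicit lists so their lengths are unambiguous (NumPy's documentation advises numpy.linspace over numpy.arange for non-integer steps, since the length of a float-stepped arange is sensitive to rounding).

```python
from itertools import product

# Explicit three-point grids for each hyperparameter
alpha_0 = [0.05, 0.10, 0.15]
delta_0 = [0.05, 0.10, 0.15]
alpha_1 = [0.05, 0.10, 0.15]
delta_1 = [0.05, 0.10, 0.15]

# One tuple (a0, d0, a1, d1) per evaluation in the sweep
grid = list(product(alpha_0, delta_0, alpha_1, delta_1))
n_evaluations = len(alpha_0) * len(delta_0) * len(alpha_1) * len(delta_1)

print(f"{n_evaluations} evaluations, first: {grid[0]}")
```

Adding a single point to each of the four grids raises the count from 81 to 256, so coarse grids are a sensible starting point.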
- ssbc.sweep_and_plot_parallel_plotly(class_data, delta_0, delta_1, alpha_0, alpha_1, mode='beta', extra_metrics=None, color='err_all', color_continuous_scale=None, title=None, height=600)[source]
Convenience wrapper: run sweep + show plotly parallel coordinates figure.
This function combines hyperparameter sweep and visualization in one call.
- Parameters:
class_data (dict) – Output from split_by_class()
delta_0 (array-like) – Grid of delta values for class 0
delta_1 (array-like) – Grid of delta values for class 1
alpha_0 (array-like) – Grid of alpha values for class 0
alpha_1 (array-like) – Grid of alpha values for class 1
mode (str, default="beta") – “beta” or “beta-binomial” mode for SSBC
extra_metrics (dict of {name: function}, optional) – Additional metrics to compute
color (str, default='err_all') – Column to use for coloring the parallel coordinates
color_continuous_scale (plotly colorscale, optional) – Color scale for the plot
title (str, optional) – Plot title (defaults to auto-generated title)
height (int, default=600) – Plot height in pixels
- Returns:
df (pd.DataFrame) – Results dataframe
fig (plotly.graph_objects.Figure) – Interactive parallel coordinates plot
Examples
>>> import numpy as np
>>> from ssbc import BinaryClassifierSimulator, split_by_class, sweep_and_plot_parallel_plotly
>>>
>>> # Generate data
>>> sim = BinaryClassifierSimulator(0.1, (2, 8), (8, 2), seed=42)
>>> labels, probs = sim.generate(1000)
>>> class_data = split_by_class(labels, probs)
>>>
>>> # Run sweep and plot
>>> df, fig = sweep_and_plot_parallel_plotly(
...     class_data,
...     delta_0=np.arange(0.05, 0.20, 0.05),
...     delta_1=np.arange(0.05, 0.20, 0.05),
...     alpha_0=np.arange(0.05, 0.20, 0.05),
...     alpha_1=np.arange(0.05, 0.20, 0.05),
...     color='err_all'
... )
>>> fig.show()  # Display in notebook
>>> # Or save: fig.write_html("sweep_results.html")
Notes
The parallel coordinates plot allows interactive exploration of the hyperparameter space. You can brush (select) ranges on any axis to filter configurations and see their impact on other metrics.
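Brushing an axis amounts to a range filter on the corresponding column, so the same selection can be reproduced programmatically. The sketch below uses plain dicts with made-up values in place of the returned DataFrame; `brush` is a hypothetical helper, and only the column names come from the sweep output.

```python
# Made-up sweep rows; real rows come from the returned DataFrame
rows = [
    {"a0": 0.05, "a1": 0.05, "sing_rate": 0.62, "err_all": 0.03},
    {"a0": 0.05, "a1": 0.10, "sing_rate": 0.71, "err_all": 0.04},
    {"a0": 0.10, "a1": 0.05, "sing_rate": 0.75, "err_all": 0.06},
    {"a0": 0.10, "a1": 0.10, "sing_rate": 0.83, "err_all": 0.05},
]


def brush(rows, column, low, high):
    """Keep rows whose value in `column` lies in [low, high]."""
    return [r for r in rows if low <= r[column] <= high]


# Brush two axes: low singleton error AND high singleton rate
selected = brush(brush(rows, "err_all", 0.0, 0.05), "sing_rate", 0.70, 1.0)
for r in selected:
    print(r)
```

On the DataFrame itself the equivalent filter is `df[df['err_all'].between(0.0, 0.05) & df['sing_rate'].between(0.70, 1.0)]`.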