Usage Guide

Overview

SSBC (Small-Sample Beta Correction) provides tools for:

  • PAC coverage guarantees for conformal prediction with finite samples

  • Mondrian conformal prediction for class-conditional guarantees

  • PAC operational bounds for deployment rate estimates (LOO-CV + Clopper-Pearson)

  • Uncertainty quantification via bootstrap and cross-conformal validation

  • Statistical utilities for exact binomial confidence intervals

Installation

pip install ssbc

Package Organization

SSBC is organized into focused packages that group related functionality:

  • ssbc.core_pkg: Core SSBC algorithm (ssbc_correct, SSBCResult)

  • ssbc.bounds: Statistical bounds computation (Clopper-Pearson, prediction bounds)

  • ssbc.calibration: Conformal prediction and calibration (Mondrian CP, bootstrap, cross-conformal)

  • ssbc.metrics: Operational metrics and uncertainty quantification (LOO-CV, operational bounds)

  • ssbc.reporting: Reporting and visualization utilities

  • ssbc.validation_pkg: Validation and empirical testing utilities

For most use cases, the recommended approach is to import from the top-level ssbc package:

from ssbc import (
    ssbc_correct,
    mondrian_conformal_calibrate,
    generate_rigorous_pac_report,
    # ... other functions
)

This provides a stable API regardless of internal organization. For specialized use cases, you may also import directly from specific packages:

from ssbc.calibration import split_by_class
from ssbc.metrics import compute_pac_operational_bounds_marginal
from ssbc.bounds import clopper_pearson_intervals

Quick Start

LOO-CV Uncertainty Methods

The rigorous report supports multiple methods for small-sample uncertainty quantification:

from ssbc import generate_rigorous_pac_report

report = generate_rigorous_pac_report(
    labels=labels,
    probs=probs,
    alpha_target=0.10,
    delta=0.10,
    test_size=1000,
    prediction_method="all",  # Compare analytical, exact, and Hoeffding methods
    use_loo_correction=True,
    loo_inflation_factor=2.0,  # Override default LOO variance inflation
)

# Access method comparison
marginal = report['pac_bounds_marginal']
if 'loo_diagnostics' in marginal:
    singleton_diag = marginal['loo_diagnostics'].get('singleton', {})
    if 'comparison' in singleton_diag:
        comparison = singleton_diag['comparison']
        print("Method comparison:")
        for method, lower, upper, width in zip(
            comparison['method'],
            comparison['lower'],
            comparison['upper'],
            comparison['width']
        ):
            print(f"  {method}: [{lower:.3f}, {upper:.3f}] (width: {width:.3f})")

Core SSBC Algorithm

Basic Correction

from ssbc import ssbc_correct

# Correct miscoverage rate for finite-sample PAC guarantee
result = ssbc_correct(
    alpha_target=0.10,  # Target 10% miscoverage
    n=100,              # Calibration set size
    delta=0.05,         # 95% PAC guarantee
    mode="beta"         # Infinite test window
)

print(f"Corrected alpha: {result.alpha_corrected:.4f}")
print(f"Use u* = {result.u_star} as threshold index")

Parameters

  • alpha_target: Target miscoverage rate (e.g., 0.10 for 90% coverage)

  • n: Calibration set size

  • delta: PAC risk tolerance (probability of violating guarantee)

  • mode: "beta" (infinite test window) or "beta-binomial" (finite test window)
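To make the roles of alpha_target, n, and delta concrete, the sketch below searches for the smallest calibration rank whose coverage reaches 1 - alpha_target with probability at least 1 - delta. It is an illustration of the idea, not the library's implementation. It assumes exchangeable data and continuous scores, so that coverage at the k-th smallest of n calibration scores follows a Beta(k, n + 1 - k) distribution. The helper names binom_cdf and pac_rank are hypothetical, and the values returned by ssbc_correct may be defined differently.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))

def pac_rank(alpha_target: float, n: int, delta: float):
    """Smallest rank k with P(coverage >= 1 - alpha_target) >= 1 - delta.

    Coverage at the k-th smallest of n calibration scores is Beta(k, n + 1 - k),
    and P(Beta(k, n + 1 - k) >= x) = P(Binomial(n, x) <= k - 1).
    Returns None when no rank k <= n satisfies the requirement.
    """
    x = 1.0 - alpha_target
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, x) >= 1.0 - delta:
            return k
    return None

k = pac_rank(alpha_target=0.10, n=100, delta=0.05)
alpha_corrected = 1.0 - k / (100 + 1)  # expected miscoverage at rank k
print(f"rank k = {k}, implied corrected alpha = {alpha_corrected:.4f}")
```

With very small calibration sets no rank may qualify, in which case this sketch returns None; for example, pac_rank(0.10, 5, 0.05) finds no valid rank.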

Mondrian Conformal Prediction

Basic Workflow

from ssbc import split_by_class, mondrian_conformal_calibrate

# Split data by class for Mondrian CP
class_data = split_by_class(labels, probs)

# Calibrate with SSBC correction
cal_result, pred_stats = mondrian_conformal_calibrate(
    class_data=class_data,
    alpha_target=0.10,  # Target 90% coverage per class
    delta=0.10,         # 90% PAC guarantee
    mode="beta"
)

# View thresholds
for label in [0, 1]:
    print(f"Class {label}:")
    print(f"  Threshold: {cal_result[label]['threshold']:.4f}")
    print(f"  Corrected α: {cal_result[label]['alpha_corrected']:.4f}")

Alpha Scan Analysis

Analyze how prediction set statistics vary across all possible alpha thresholds:

from ssbc import alpha_scan

# Scan all possible alpha thresholds
df = alpha_scan(labels, probs)

print(f"Scanned {len(df)} alpha values")
print(df.head())

# Find the alpha that maximizes the number of singletons
max_singleton_idx = df['n_singletons'].idxmax()
optimal = df.loc[max_singleton_idx]
print(f"\nMaximum singleton rate at alpha={optimal['alpha']:.4f}:")
print(f"  Singletons: {optimal['n_singletons']}")
print(f"  Singleton coverage: {optimal['singleton_coverage']:.4f}")

DataFrame columns:

  • alpha: miscoverage rate

  • qhat_0, qhat_1: per-class thresholds

  • n_abstentions, n_singletons, n_doublets: prediction set counts

  • singleton_coverage: fraction of singletons that are correct

  • singleton_coverage_0, singleton_coverage_1: per-class singleton coverage

Uncertainty Quantification

Standalone Diagnostic Tools

Bootstrap and cross-conformal validation are available as standalone diagnostic tools. They are not integrated into the main PAC bounds computation.

Bootstrap Calibration Uncertainty

Analyze how operational rates vary if you recalibrate on similar datasets:

from ssbc import bootstrap_calibration_uncertainty, plot_bootstrap_distributions

# sim is a data simulator object created beforehand (not shown here)
results = bootstrap_calibration_uncertainty(
    labels=labels,
    probs=probs,
    simulator=sim,
    n_bootstrap=1000,
    test_size=1000
)

# Visualize distributions
plot_bootstrap_distributions(results, save_path='bootstrap_results.png')

Cross-Conformal Validation

K-fold cross-validation for finite-sample diagnostics:

from ssbc import cross_conformal_validation

results = cross_conformal_validation(
    labels=labels,
    probs=probs,
    n_folds=10,
    alpha_target=0.10,
    delta=0.10
)

print(f"Singleton rate: {results['marginal']['singleton']['mean']:.3f} ± {results['marginal']['singleton']['std']:.3f}")

Validation with Class-Conditional Error Metrics

The validation framework includes class-conditional singleton error metrics:

from ssbc import validate_pac_bounds, validate_prediction_interval_calibration

# Single validation (report comes from generate_rigorous_pac_report; sim is a data simulator, not shown here)
validation = validate_pac_bounds(report, simulator=sim, test_size=1000, n_trials=10000)

# Access class-conditional metrics (marginal scope only)
marginal = validation['marginal']
print(f"Error rate (class 0, normalized): {marginal['singleton_error_class0']['mean']:.3f}")
print(f"Error rate (class 1, normalized): {marginal['singleton_error_class1']['mean']:.3f}")
print(f"P(error | singleton & class=0): {marginal['singleton_error_cond_class0']['mean']:.3f}")
print(f"P(error | singleton & class=1): {marginal['singleton_error_cond_class1']['mean']:.3f}")

# Meta-validation across many calibrations
results = validate_prediction_interval_calibration(
    simulator=sim,
    n_calibration=100,
    BigN=500,
    n_trials=1000,
    prediction_method="all",
)

Empirical Validation

Verify theoretical PAC guarantees empirically:

from ssbc import generate_rigorous_pac_report, validate_pac_bounds, print_validation_results

# Generate report
report = generate_rigorous_pac_report(labels, probs, delta=0.10)

# Validate with many test trials
validation = validate_pac_bounds(
    report=report,
    simulator=sim,
    test_size=1000,
    n_trials=10000,
)

# Print validation results
print_validation_results(validation)

# Check coverage
coverage = validation['marginal']['singleton']['empirical_coverage']
pac_level = report['parameters']['pac_level_marginal']
if coverage >= pac_level:
    print(f"✅ Validation passed: {coverage:.1%} >= {pac_level:.1%}")

Statistical Utilities

Clopper-Pearson Confidence Intervals

from ssbc import clopper_pearson_lower, clopper_pearson_upper, cp_interval

# One-sided bounds
lower = clopper_pearson_lower(k=45, n=100, confidence=0.95)
upper = clopper_pearson_upper(k=45, n=100, confidence=0.95)

# Two-sided interval
interval = cp_interval(count=45, total=100, confidence=0.95)
print(f"Rate: {interval['proportion']:.3f}")
print(f"95% CI: [{interval['lower']:.3f}, {interval['upper']:.3f}]")
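For readers who want to see what a Clopper-Pearson interval computes, here is a dependency-free sketch that inverts the binomial tail probabilities by bisection. It is an illustration under the usual two-sided convention (half of the error budget in each tail) and does not reproduce the library's functions. exact_interval and its helpers are hypothetical names.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))

def _bisect(f, target, lo=0.0, hi=1.0, iters=60):
    """Solve f(p) = target for a function f that is decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, confidence=0.95):
    """Two-sided Clopper-Pearson interval for k successes in n trials."""
    tail = (1 - confidence) / 2
    # Lower bound: p with P(X >= k) = tail, i.e. P(X <= k - 1) = 1 - tail.
    lower = 0.0 if k == 0 else _bisect(lambda p: binom_cdf(k - 1, n, p), 1 - tail)
    # Upper bound: p with P(X <= k) = tail.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), tail)
    return lower, upper

lower, upper = exact_interval(45, 100)
print(f"95% CI for 45/100: [{lower:.4f}, {upper:.4f}]")
```

The interval is called exact because it is built from the binomial distribution itself, not from a normal approximation, which is what makes it suitable for small n.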

Operational Rate Computation

from ssbc import compute_operational_rate
import numpy as np

# Example prediction sets
pred_sets = [{0}, {0, 1}, set(), {1}, {0}]
true_labels = np.array([0, 0, 1, 1, 0])

# Compute indicators for different rates
singleton_indicators = compute_operational_rate(
    pred_sets, true_labels, "singleton"
)
error_indicators = compute_operational_rate(
    pred_sets, true_labels, "error_in_singleton"
)

print(f"Singleton rate: {np.mean(singleton_indicators):.2%}")
print(f"Error rate: {np.mean(error_indicators):.2%}")

Supported rate types:

  • "singleton": Single predicted label

  • "doublet": Two predicted labels

  • "abstention": Empty prediction set

  • "error_in_singleton": Singleton with incorrect prediction

  • "correct_in_singleton": Singleton with correct prediction
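The rate types above can be read as per-sample 0/1 indicators. The following sketch spells out one plausible definition of each in plain Python, reusing the example data from this section. It is an assumption about the semantics, not the library's source, and the library may normalize error rates differently. rate_indicators is a hypothetical helper.

```python
def rate_indicators(pred_sets, true_labels, rate_type):
    """Return a 0/1 indicator per sample for the requested rate type."""
    out = []
    for s, y in zip(pred_sets, true_labels):
        if rate_type == "singleton":
            hit = len(s) == 1
        elif rate_type == "doublet":
            hit = len(s) == 2
        elif rate_type == "abstention":
            hit = len(s) == 0
        elif rate_type == "error_in_singleton":
            hit = len(s) == 1 and y not in s
        elif rate_type == "correct_in_singleton":
            hit = len(s) == 1 and y in s
        else:
            raise ValueError(f"unknown rate type: {rate_type}")
        out.append(int(hit))
    return out

pred_sets = [{0}, {0, 1}, set(), {1}, {0}]
true_labels = [0, 0, 1, 1, 0]

for name in ("singleton", "doublet", "abstention", "error_in_singleton", "correct_in_singleton"):
    ind = rate_indicators(pred_sets, true_labels, name)
    print(f"{name}: {ind} (rate {sum(ind) / len(ind):.2f})")
```

In this example every singleton contains the true label, so the error indicators are all zero and the correct-in-singleton indicators match the singleton indicators.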

Hyperparameter Tuning

Sweep over α and δ values to find optimal configurations:

from ssbc import split_by_class, sweep_and_plot_parallel_plotly
import numpy as np

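# Define grid (np.linspace gives a fixed number of points; np.arange with a
# float step can include or drop the endpoint depending on rounding)
alpha_grid = np.linspace(0.05, 0.15, 3)  # 0.05, 0.10, 0.15
delta_grid = np.linspace(0.05, 0.15, 3)  # 0.05, 0.10, 0.15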

# Split data by class
class_data = split_by_class(labels, probs)

# Run sweep and visualize
df, fig = sweep_and_plot_parallel_plotly(
    class_data=class_data,
    alpha_0=alpha_grid, delta_0=delta_grid,
    alpha_1=alpha_grid, delta_1=delta_grid,
    color='err_all'  # Color by error rate
)

# Save interactive plot
fig.write_html("sweep_results.html")

# Analyze results
print(df[['a0', 'd0', 'cov', 'sing_rate', 'err_all']].head())

The interactive plot allows you to:

  • Brush (select) ranges on any axis to filter configurations

  • Explore trade-offs between coverage, automation, and error rates

  • Identify Pareto-optimal hyperparameter settings
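Pareto-optimal settings can also be picked out programmatically once the sweep results are in hand. The sketch below works on a small list of made-up result rows and keeps the rows that no other row beats on both singleton rate (higher is better) and error rate (lower is better). The column names sing_rate and err_all follow the example above; the pareto_front helper is hypothetical and the numbers are for illustration only.

```python
def pareto_front(rows, maximize="sing_rate", minimize="err_all"):
    """Return the rows not dominated by any other row.

    Row b dominates row a when b is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for a in rows:
        dominated = any(
            b[maximize] >= a[maximize]
            and b[minimize] <= a[minimize]
            and (b[maximize] > a[maximize] or b[minimize] < a[minimize])
            for b in rows
        )
        if not dominated:
            front.append(a)
    return front

# Made-up sweep results, for illustration only.
rows = [
    {"a0": 0.05, "d0": 0.05, "sing_rate": 0.60, "err_all": 0.03},
    {"a0": 0.10, "d0": 0.05, "sing_rate": 0.70, "err_all": 0.03},
    {"a0": 0.10, "d0": 0.10, "sing_rate": 0.75, "err_all": 0.06},
    {"a0": 0.15, "d0": 0.10, "sing_rate": 0.80, "err_all": 0.05},
]
for row in pareto_front(rows):
    print(row)
```

With a real sweep, the same function could be applied to df.to_dict("records"), assuming df is the pandas DataFrame returned by the sweep.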

Understanding Report Components

PAC Report Structure

report = generate_rigorous_pac_report(labels, probs)

# SSBC results for each class
ssbc_0 = report['ssbc_class_0']  # SSBCResult
ssbc_1 = report['ssbc_class_1']  # SSBCResult

# PAC operational bounds
marginal_bounds = report['pac_bounds_marginal']  # Marginal statistics
class_0_bounds = report['pac_bounds_class_0']    # Class 0 conditional
class_1_bounds = report['pac_bounds_class_1']    # Class 1 conditional

# Calibration results
cal_result = report['calibration_result']  # Thresholds per class
pred_stats = report['prediction_stats']    # Prediction statistics

# LOO diagnostics and method comparison (if prediction_method="all")
loo_diagnostics = marginal_bounds.get('loo_diagnostics', {})

# Parameters used
params = report['parameters']

PAC Bounds Dictionary

Each PAC bounds dictionary contains:

bounds = report['pac_bounds_marginal']

# Bounds (as lists [lower, upper])
bounds['singleton_rate_bounds']       # [lower, upper]
bounds['doublet_rate_bounds']         # [lower, upper]
bounds['abstention_rate_bounds']      # [lower, upper]
bounds['singleton_error_rate_bounds'] # [lower, upper]

# Class-conditional error metrics (marginal bounds only)
bounds.get('singleton_error_rate_class0_bounds', None)      # [lower, upper]
bounds.get('singleton_error_rate_class1_bounds', None)      # [lower, upper]
bounds.get('singleton_error_rate_cond_class0_bounds', None) # [lower, upper]
bounds.get('singleton_error_rate_cond_class1_bounds', None) # [lower, upper]

# Expected values (from LOO-CV)
bounds['expected_singleton_rate']
bounds['expected_doublet_rate']
bounds['expected_abstention_rate']
bounds['expected_singleton_error_rate']
bounds.get('expected_singleton_error_rate_class0', None)
bounds.get('expected_singleton_error_rate_class1', None)
bounds.get('expected_singleton_error_rate_cond_class0', None)
bounds.get('expected_singleton_error_rate_cond_class1', None)

# Metadata
bounds['n_grid_points']  # Number of grid points evaluated
bounds['pac_level']      # PAC confidence level
bounds['ci_level']       # Clopper-Pearson CI level

Key Concepts

PAC Coverage (from SSBC)

Guarantee: With probability ≥ 1-δ over calibration sets, the conformal predictor achieves coverage ≥ 1-α_target on future data.

Properties:

  • Valid for ANY sample size n

  • Distribution-free

  • Frequentist (no priors)

PAC Operational Bounds (LOO-CV + Clopper-Pearson)

Estimates: Rigorous bounds on deployment rates accounting for estimation uncertainty.

Procedure:

  1. For each calibration point i, compute threshold using all OTHER points (LOO-CV)

  2. Evaluate point i with that threshold (unbiased evaluation)

  3. Aggregate counts across all n evaluations

  4. Apply Clopper-Pearson confidence intervals to bound the true rate
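The four steps above can be sketched as follows for the singleton rate of a binary classifier. This is a simplified illustration, not the library's implementation: it uses a single marginal threshold instead of per-class (Mondrian) thresholds, applies no SSBC correction, and draws synthetic data from an arbitrary toy model. All function names are hypothetical. Step 4 would pass the resulting count and n to a Clopper-Pearson routine.

```python
import math
import random

def conformal_threshold(scores, alpha):
    """ceil((m + 1) * (1 - alpha))-th smallest score, or inf if that rank exceeds m."""
    m = len(scores)
    k = max(1, math.ceil((m + 1) * (1 - alpha) - 1e-12))
    return sorted(scores)[k - 1] if k <= m else math.inf

def prediction_set(p1, qhat):
    """Labels whose nonconformity score 1 - p(label) is at most qhat."""
    return {label for label, p in ((0, 1.0 - p1), (1, p1)) if 1.0 - p <= qhat}

def loo_singleton_count(labels, p1s, alpha):
    """Count singletons when each point is evaluated with a threshold fit on the others."""
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = [1.0 - (p if y == 1 else 1.0 - p) for y, p in zip(labels, p1s)]
    count = 0
    for i, p1 in enumerate(p1s):
        others = scores[:i] + scores[i + 1:]          # step 1: leave point i out
        qhat = conformal_threshold(others, alpha)
        if len(prediction_set(p1, qhat)) == 1:        # step 2: evaluate point i
            count += 1                                # step 3: aggregate counts
    return count

# Toy data: class-1 probabilities lean high for label 1 and low for label 0.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(60)]
p1s = [min(max(rng.gauss(0.75 if y == 1 else 0.25, 0.2), 0.01), 0.99) for y in labels]

k = loo_singleton_count(labels, p1s, alpha=0.10)
print(f"LOO singleton count: {k} of {len(labels)} (rate {k / len(labels):.3f})")
```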

Properties:

  • Unbiased estimates (LOO ensures no data leakage)

  • Exact binomial CIs (Clopper-Pearson)

  • Accounts for estimation uncertainty from finite calibration

  • Valid for any future test set from same distribution

PAC Bounds (LOO-CV + Prediction Bounds)

PAC bounds computed from LOO-CV with prediction bounds:

  • Question: “Given THIS calibration, what rates on future test sets?”

  • Accounts for: Estimation uncertainty and test set sampling variability

  • Methods: Analytical (recommended n≥40), Exact (n=20-40), Hoeffding (ultra-conservative)

  • Use for: Deployment guarantees, SLA contracts

  • Validates: Operational rates (singleton, doublet, abstention, error rates) and class-conditional error metrics
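To give a feel for why the Hoeffding option is described as ultra-conservative, the sketch below computes the textbook two-sided Hoeffding interval for an observed rate. This is only the classical single-sample bound, shown under that assumption. The library's prediction bounds also account for test-set sampling variability and are likely to differ. hoeffding_interval is a hypothetical helper.

```python
import math

def hoeffding_interval(k, n, delta):
    """Two-sided Hoeffding interval: p_hat +/- sqrt(ln(2 / delta) / (2 n)), clipped to [0, 1]."""
    p_hat = k / n
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Observed rate of 45% at several calibration sizes, with delta = 0.10.
for n in (20, 40, 100, 1000):
    lo, hi = hoeffding_interval(k=round(0.45 * n), n=n, delta=0.10)
    print(f"n={n:5d}: [{lo:.3f}, {hi:.3f}] (width {hi - lo:.3f})")
```

The half-width depends only on n and delta, not on the observed rate, which is why the bound is wide compared with intervals that use the binomial shape.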

Marginal vs Per-Class

Marginal bounds (ignore true labels):

  • “What will a user see?”

  • Deployment view

  • Overall automation rate

Per-class bounds (conditioned on true label):

  • “How does performance differ by ground truth?”

  • Class-specific rates

  • Identifies minority class challenges
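The distinction can be made concrete with a few lines of plain Python. The sketch below computes the singleton rate once over all samples (marginal) and once within each true-label group (per-class), using the small example data from the Operational Rate Computation section. It is an illustration with a hypothetical helper name, not the library's code.

```python
def singleton_rate(pred_sets):
    """Fraction of prediction sets that contain exactly one label."""
    return sum(len(s) == 1 for s in pred_sets) / len(pred_sets)

pred_sets = [{0}, {0, 1}, set(), {1}, {0}]
true_labels = [0, 0, 1, 1, 0]

# Marginal: every sample counts, regardless of its true label.
marginal = singleton_rate(pred_sets)

# Per-class: restrict to the samples whose true label is c.
per_class = {
    c: singleton_rate([s for s, y in zip(pred_sets, true_labels) if y == c])
    for c in sorted(set(true_labels))
}

print(f"marginal: {marginal:.3f}")
print(f"per-class: {per_class}")
```

Here the marginal rate is 3 of 5, while class 0 sees 2 of 3 singletons and class 1 sees 1 of 2, so a single overall number can hide a weaker class.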

Examples

Complete examples are available in the examples/ directory:

1. Core SSBC Algorithm

python examples/ssbc_core_example.py

Demonstrates the SSBC algorithm for different calibration set sizes.

2. Mondrian Conformal Prediction

python examples/mondrian_conformal_example.py

Complete workflow: simulation → calibration → per-class reporting.

3. Complete Workflow with PAC Operational Bounds

python examples/complete_workflow_example.py

Shows the complete PAC bounds workflow using generate_rigorous_pac_report().

4. SLA/Deployment Contracts

python examples/sla_example.py

Full deployment pipeline with contract-ready operational guarantees.

5. Alpha Scan Analysis

python examples/alpha_scan_example.py

Scan across all possible alpha thresholds to find optimal operating points.

6. PAC Bounds Validation

python examples/pac_validation_example.py

Empirically validate that theoretical PAC guarantees hold in practice.

7. Bootstrap Demo (Standalone Diagnostic)

python examples/bootstrap_calibration_demo.py

Standalone bootstrap analysis with detailed visualization. This is a diagnostic tool, not integrated into PAC bounds.

8. Cross-Conformal Validation (Standalone Diagnostic)

python examples/cross_conformal_example.py

K-fold cross-validation for finite-sample diagnostics. This is a diagnostic tool, not integrated into PAC bounds.

References

Key Statistical Properties

  • Distribution-Free: No P(X,Y) assumptions

  • Model-Agnostic: Works with any classifier

  • Frequentist: Valid frequentist guarantees

  • Non-Bayesian: No priors required

  • Finite-Sample: Exact guarantees for small n (not asymptotic)

  • Exchangeability Only: Minimal assumption

Further Reading

  • See theory.md for detailed theoretical background

  • See installation.md for setup instructions

  • See examples/ directory for complete working examples