# Usage Guide

## Overview

SSBC (Small-Sample Beta Correction) provides tools for:

- **PAC coverage guarantees** for conformal prediction with finite samples
- **Mondrian conformal prediction** for class-conditional guarantees
- **PAC operational bounds** for deployment rate estimates (LOO-CV + Clopper-Pearson)
- **Uncertainty quantification** via bootstrap and cross-conformal validation
- **Statistical utilities** for exact binomial confidence intervals

## Installation

```bash
pip install ssbc
```

## Package Organization

SSBC is organized into focused packages that group related functionality:

- **`ssbc.core_pkg`**: Core SSBC algorithm (`ssbc_correct`, `SSBCResult`)
- **`ssbc.bounds`**: Statistical bounds computation (Clopper-Pearson, prediction bounds)
- **`ssbc.calibration`**: Conformal prediction and calibration (Mondrian CP, bootstrap, cross-conformal)
- **`ssbc.metrics`**: Operational metrics and uncertainty quantification (LOO-CV, operational bounds)
- **`ssbc.reporting`**: Reporting and visualization utilities
- **`ssbc.validation_pkg`**: Validation and empirical testing utilities

For most use cases, the recommended approach is to import from the top-level `ssbc` package:

```python
from ssbc import (
    ssbc_correct,
    mondrian_conformal_calibrate,
    generate_rigorous_pac_report,
    # ... other functions
)
```

This provides a stable API regardless of internal organization.

For specialized use cases, you may also import directly from specific packages:

```python
from ssbc.calibration import split_by_class
from ssbc.metrics import compute_pac_operational_bounds_marginal
from ssbc.bounds import clopper_pearson_intervals
```

## Quick Start

### Unified Workflow (Recommended)

The complete rigorous workflow is available through a single function:

```python
from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report

# Generate or load calibration data
sim = BinaryClassifierSimulator(
    p_class1=0.2,
    beta_params_class0=(1, 7),
    beta_params_class1=(5, 2),
    seed=42
)
labels, probs = sim.generate(n_samples=100)

# Generate comprehensive PAC report
report = generate_rigorous_pac_report(
    labels=labels,
    probs=probs,
    alpha_target=0.10,  # Target 90% coverage
    delta=0.10,  # 90% PAC confidence
    test_size=1000,  # Expected deployment size
    use_union_bound=True,  # Simultaneous guarantees
    verbose=True,
)

# Access PAC bounds
marginal_bounds = report['pac_bounds_marginal']
class_0_bounds = report['pac_bounds_class_0']
class_1_bounds = report['pac_bounds_class_1']

print(f"Singleton rate: {marginal_bounds['singleton_rate_bounds']}")
print(f"Expected: {marginal_bounds['expected_singleton_rate']:.3f}")
```

### LOO-CV Uncertainty Methods

The rigorous report supports multiple methods for small-sample uncertainty quantification:

```python
report = generate_rigorous_pac_report(
    labels=labels,
    probs=probs,
    alpha_target=0.10,
    delta=0.10,
    test_size=1000,
    prediction_method="all",  # Compare analytical, exact, and Hoeffding methods
    use_loo_correction=True,
    loo_inflation_factor=2.0,  # Override default LOO variance inflation
)

# Access method comparison
marginal = report['pac_bounds_marginal']
if 'loo_diagnostics' in marginal:
    singleton_diag = marginal['loo_diagnostics'].get('singleton', {})
    if 'comparison' in singleton_diag:
        comparison = singleton_diag['comparison']
        print("Method comparison:")
        for method, lower, upper, width in zip(
            comparison['method'],
            comparison['lower'],
            comparison['upper'],
            comparison['width']
        ):
            print(f"  {method}: [{lower:.3f}, {upper:.3f}] (width: {width:.3f})")
```

## Core SSBC Algorithm

### Basic Correction

```python
from ssbc import ssbc_correct

# Correct miscoverage rate for finite-sample PAC guarantee
result = ssbc_correct(
    alpha_target=0.10,  # Target 10% miscoverage
    n=100,  # Calibration set size
    delta=0.05,  # 95% PAC guarantee
    mode="beta"  # Infinite test window
)

print(f"Corrected alpha: {result.alpha_corrected:.4f}")
print(f"Use u* = {result.u_star} as threshold index")
```

### Parameters

- `alpha_target`: Target miscoverage rate (e.g., 0.10 for 90% coverage)
- `n`: Calibration set size
- `delta`: PAC risk tolerance (probability of violating the guarantee)
- `mode`: `"beta"` (infinite test window) or `"beta-binomial"` (finite test window)

## Mondrian Conformal Prediction

### Basic Workflow

```python
from ssbc import split_by_class, mondrian_conformal_calibrate

# Split data by class for Mondrian CP
class_data = split_by_class(labels, probs)

# Calibrate with SSBC correction
cal_result, pred_stats = mondrian_conformal_calibrate(
    class_data=class_data,
    alpha_target=0.10,  # Target 90% coverage per class
    delta=0.10,  # 90% PAC guarantee
    mode="beta"
)

# View thresholds
for label in [0, 1]:
    print(f"Class {label}:")
    print(f"  Threshold: {cal_result[label]['threshold']:.4f}")
    print(f"  Corrected α: {cal_result[label]['alpha_corrected']:.4f}")
```

## Alpha Scan Analysis

Analyze how prediction set statistics vary across all possible alpha thresholds:

```python
from ssbc import alpha_scan

# Scan all possible alpha thresholds
df = alpha_scan(labels, probs)

print(f"Scanned {len(df)} alpha values")
print(df.head())

# Find optimal operating point
max_singleton_idx = df['n_singletons'].idxmax()
optimal = df.loc[max_singleton_idx]
print(f"\nMaximum singleton rate at alpha={optimal['alpha']:.4f}:")
print(f"  Singletons: {optimal['n_singletons']}")
print(f"  Singleton coverage: {optimal['singleton_coverage']:.4f}")
```

**DataFrame columns:**

- `alpha`: miscoverage rate
- `qhat_0`, `qhat_1`: per-class thresholds
- `n_abstentions`, `n_singletons`, `n_doublets`: prediction set counts
- `singleton_coverage`: fraction of singletons that are correct
- `singleton_coverage_0`, `singleton_coverage_1`: per-class singleton coverage

## Uncertainty Quantification

### Standalone Diagnostic Tools

Bootstrap and cross-conformal validation are available as standalone diagnostic tools. They are not integrated into the main PAC bounds computation.

### Bootstrap Calibration Uncertainty

Analyze how operational rates vary if you recalibrate on similar datasets:

```python
from ssbc import bootstrap_calibration_uncertainty, plot_bootstrap_distributions

results = bootstrap_calibration_uncertainty(
    labels=labels,
    probs=probs,
    simulator=sim,
    n_bootstrap=1000,
    test_size=1000
)

# Visualize distributions
plot_bootstrap_distributions(results, save_path='bootstrap_results.png')
```

### Cross-Conformal Validation

K-fold cross-validation for finite-sample diagnostics:

```python
from ssbc import cross_conformal_validation

results = cross_conformal_validation(
    labels=labels,
    probs=probs,
    n_folds=10,
    alpha_target=0.10,
    delta=0.10
)

print(f"Singleton rate: {results['marginal']['singleton']['mean']:.3f} ± {results['marginal']['singleton']['std']:.3f}")
```

### Validation with Class-Conditional Error Metrics

The validation framework includes class-conditional singleton error metrics:

```python
from ssbc import validate_pac_bounds, validate_prediction_interval_calibration

# Single validation
validation = validate_pac_bounds(report, simulator=sim, test_size=1000, n_trials=10000)

# Access class-conditional metrics (marginal scope only)
marginal = validation['marginal']
print(f"Error rate (class 0, normalized): {marginal['singleton_error_class0']['mean']:.3f}")
print(f"Error rate (class 1, normalized): {marginal['singleton_error_class1']['mean']:.3f}")
print(f"P(error | singleton & class=0): {marginal['singleton_error_cond_class0']['mean']:.3f}")
print(f"P(error | singleton & class=1): {marginal['singleton_error_cond_class1']['mean']:.3f}")

# Meta-validation across many calibrations
results = validate_prediction_interval_calibration(
    simulator=sim,
    n_calibration=100,
    BigN=500,
    n_trials=1000,
    prediction_method="all",
)
```

### Empirical Validation

Verify theoretical PAC guarantees empirically:

```python
from ssbc import (
    generate_rigorous_pac_report,
    print_validation_results,
    validate_pac_bounds,
)

# Generate report
report = generate_rigorous_pac_report(labels, probs, delta=0.10)

# Validate with many test trials
validation = validate_pac_bounds(
    report=report,
    simulator=sim,
    test_size=1000,
    n_trials=10000,
)

# Print validation results
print_validation_results(validation)

# Check coverage
coverage = validation['marginal']['singleton']['empirical_coverage']
pac_level = report['parameters']['pac_level_marginal']
if coverage >= pac_level:
    print(f"✅ Validation passed: {coverage:.1%} >= {pac_level:.1%}")
```

## Statistical Utilities

### Clopper-Pearson Confidence Intervals

```python
from ssbc import clopper_pearson_lower, clopper_pearson_upper, cp_interval

# One-sided bounds
lower = clopper_pearson_lower(k=45, n=100, confidence=0.95)
upper = clopper_pearson_upper(k=45, n=100, confidence=0.95)

# Two-sided interval
interval = cp_interval(count=45, total=100, confidence=0.95)
print(f"Rate: {interval['proportion']:.3f}")
print(f"95% CI: [{interval['lower']:.3f}, {interval['upper']:.3f}]")
```

### Operational Rate Computation

```python
from ssbc import compute_operational_rate
import numpy as np

# Example prediction sets
pred_sets = [{0}, {0, 1}, set(), {1}, {0}]
true_labels = np.array([0, 0, 1, 1, 0])

# Compute indicators for different rates
singleton_indicators = compute_operational_rate(
    pred_sets, true_labels, "singleton"
)
error_indicators = compute_operational_rate(
    pred_sets, true_labels, "error_in_singleton"
)

print(f"Singleton rate: {np.mean(singleton_indicators):.2%}")
print(f"Error rate: {np.mean(error_indicators):.2%}")
```

**Supported rate types:**

- `"singleton"`: Single predicted label
- `"doublet"`: Two predicted labels
- `"abstention"`: Empty prediction set
- `"error_in_singleton"`: Singleton with incorrect prediction
- `"correct_in_singleton"`: Singleton with correct prediction

## Hyperparameter Tuning

Sweep over α and δ values to find optimal configurations:

```python
from ssbc import split_by_class, sweep_and_plot_parallel_plotly
import numpy as np

# Define grid
alpha_grid = np.arange(0.05, 0.20, 0.05)
delta_grid = np.arange(0.05, 0.20, 0.05)

# Split data by class
class_data = split_by_class(labels, probs)

# Run sweep and visualize
df, fig = sweep_and_plot_parallel_plotly(
    class_data=class_data,
    alpha_0=alpha_grid,
    delta_0=delta_grid,
    alpha_1=alpha_grid,
    delta_1=delta_grid,
    color='err_all'  # Color by error rate
)

# Save interactive plot
fig.write_html("sweep_results.html")

# Analyze results
print(df[['a0', 'd0', 'cov', 'sing_rate', 'err_all']].head())
```

The interactive plot allows you to:

- Brush (select) ranges on any axis to filter configurations
- Explore trade-offs between coverage, automation, and error rates
- Identify Pareto-optimal hyperparameter settings

## Understanding Report Components

### PAC Report Structure

```python
report = generate_rigorous_pac_report(labels, probs)

# SSBC results for each class
ssbc_0 = report['ssbc_class_0']  # SSBCResult
ssbc_1 = report['ssbc_class_1']  # SSBCResult

# PAC operational bounds
marginal_bounds = report['pac_bounds_marginal']  # Marginal statistics
class_0_bounds = report['pac_bounds_class_0']  # Class 0 conditional
class_1_bounds = report['pac_bounds_class_1']  # Class 1 conditional

# Calibration results
cal_result = report['calibration_result']  # Thresholds per class
pred_stats = report['prediction_stats']  # Prediction statistics

# LOO diagnostics and method comparison (if prediction_method="all")
loo_diagnostics = marginal_bounds.get('loo_diagnostics', {})

# Parameters used
params = report['parameters']
```

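The `*_rate_bounds` entries of a PAC bounds dictionary lend themselves to a compact summary. The sketch below is illustrative only: `summarize_rate_bounds` is a hypothetical helper (not part of `ssbc`), the stand-in dictionary uses invented numbers in place of a real `report['pac_bounds_marginal']`, and it assumes each bounds entry is a `[lower, upper]` pair.

```python
def summarize_rate_bounds(bounds):
    """Return {rate_name: (lower, upper, width)} for every '*_rate_bounds' key.

    Hypothetical helper, not part of ssbc. Assumes each matching entry is a
    [lower, upper] pair; entries that are None are skipped.
    """
    suffix = "_rate_bounds"
    summary = {}
    for key, value in bounds.items():
        if not key.endswith(suffix) or value is None:
            continue
        lower, upper = value
        summary[key[: -len(suffix)]] = (lower, upper, upper - lower)
    return summary


# Stand-in for report['pac_bounds_marginal']; the numbers are invented.
example_bounds = {
    "singleton_rate_bounds": [0.50, 0.75],
    "doublet_rate_bounds": [0.125, 0.375],
    "abstention_rate_bounds": [0.0, 0.125],
    "expected_singleton_rate": 0.625,  # ignored: not a bounds entry
}

summary = summarize_rate_bounds(example_bounds)
for name, (lower, upper, width) in summary.items():
    print(f"{name:>10}: [{lower:.3f}, {upper:.3f}] (width: {width:.3f})")
```

With a real report, passing `report['pac_bounds_marginal']` (or a per-class bounds dictionary) in place of `example_bounds` should work as long as the key naming matches the assumption above.
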
### PAC Bounds Dictionary

Each PAC bounds dictionary contains:

```python
bounds = report['pac_bounds_marginal']

# Bounds (as lists [lower, upper])
bounds['singleton_rate_bounds']  # [lower, upper]
bounds['doublet_rate_bounds']  # [lower, upper]
bounds['abstention_rate_bounds']  # [lower, upper]
bounds['singleton_error_rate_bounds']  # [lower, upper]

# Class-conditional error metrics (marginal bounds only)
bounds.get('singleton_error_rate_class0_bounds', None)  # [lower, upper]
bounds.get('singleton_error_rate_class1_bounds', None)  # [lower, upper]
bounds.get('singleton_error_rate_cond_class0_bounds', None)  # [lower, upper]
bounds.get('singleton_error_rate_cond_class1_bounds', None)  # [lower, upper]

# Expected values (from LOO-CV)
bounds['expected_singleton_rate']
bounds['expected_doublet_rate']
bounds['expected_abstention_rate']
bounds['expected_singleton_error_rate']
bounds.get('expected_singleton_error_rate_class0', None)
bounds.get('expected_singleton_error_rate_class1', None)
bounds.get('expected_singleton_error_rate_cond_class0', None)
bounds.get('expected_singleton_error_rate_cond_class1', None)

# Metadata
bounds['n_grid_points']  # Number of grid points evaluated
bounds['pac_level']  # PAC confidence level
bounds['ci_level']  # Clopper-Pearson CI level
```

## Key Concepts

### PAC Coverage (from SSBC)

**Guarantee:** With probability ≥ 1-δ over calibration sets, the conformal predictor achieves coverage ≥ 1-α_target on future data.

**Properties:**

- Valid for ANY sample size n
- Distribution-free
- Frequentist (no priors)

### PAC Operational Bounds (LOO-CV + Clopper-Pearson)

**Estimates:** Rigorous bounds on deployment rates accounting for estimation uncertainty.

**Procedure:**

1. For each calibration point i, compute threshold using all OTHER points (LOO-CV)
2. Evaluate point i with that threshold (unbiased evaluation)
3. Aggregate counts across all n evaluations
4. Apply Clopper-Pearson confidence intervals to bound the true rate

**Properties:**

- Unbiased estimates (LOO ensures no data leakage)
- Exact binomial CIs (Clopper-Pearson)
- Accounts for estimation uncertainty from finite calibration
- Valid for any future test set from same distribution

### PAC Bounds (LOO-CV + Prediction Bounds)

**PAC Bounds (LOO-CV + Clopper-Pearson):**

- Question: "Given THIS calibration, what rates on future test sets?"
- Accounts for: Estimation uncertainty and test set sampling variability
- Methods: Analytical (recommended n≥40), Exact (n=20-40), Hoeffding (ultra-conservative)
- Use for: Deployment guarantees, SLA contracts
- Validates: Operational rates (singleton, doublet, abstention, error rates) and class-conditional error metrics

### Marginal vs Per-Class

**Marginal bounds** (ignore true labels):

- "What will a user see?"
- Deployment view
- Overall automation rate

**Per-class bounds** (conditioned on true label):

- "How does performance differ by ground truth?"
- Class-specific rates
- Identifies minority class challenges

## Examples

Complete examples are available in the `examples/` directory:

### 1. Core SSBC Algorithm

```bash
python examples/ssbc_core_example.py
```

Demonstrates the SSBC algorithm for different calibration set sizes.

### 2. Mondrian Conformal Prediction

```bash
python examples/mondrian_conformal_example.py
```

Complete workflow: simulation → calibration → per-class reporting.

### 3. Complete Workflow with PAC Operational Bounds

```bash
python examples/complete_workflow_example.py
```

Shows the complete PAC bounds workflow using `generate_rigorous_pac_report()`.

### 4. SLA/Deployment Contracts

```bash
python examples/sla_example.py
```

Full deployment pipeline with contract-ready operational guarantees.

### 5. Alpha Scan Analysis

```bash
python examples/alpha_scan_example.py
```

Scan across all possible alpha thresholds to find optimal operating points.

### 6. PAC Bounds Validation

```bash
python examples/pac_validation_example.py
```

Empirically validate that theoretical PAC guarantees hold in practice.

### 7. Bootstrap Demo (Standalone Diagnostic)

```bash
python examples/bootstrap_calibration_demo.py
```

Standalone bootstrap analysis with detailed visualization. This is a diagnostic tool, not integrated into PAC bounds.

### 8. Cross-Conformal Validation (Standalone Diagnostic)

```bash
python examples/cross_conformal_example.py
```

K-fold cross-validation for finite-sample diagnostics. This is a diagnostic tool, not integrated into PAC bounds.

## References

### Key Statistical Properties

- **Distribution-Free**: No P(X,Y) assumptions
- **Model-Agnostic**: Works with any classifier
- **Frequentist**: Valid frequentist guarantees
- **Non-Bayesian**: No priors required
- **Finite-Sample**: Exact guarantees for small n (not asymptotic)
- **Exchangeability Only**: Minimal assumption

### Further Reading

- See [theory.md](theory.md) for detailed theoretical background
- See [installation.md](installation.md) for setup instructions
- See `examples/` directory for complete working examples
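
### Supplement: What "Exact Binomial CI" Means

As a supplement to the Key Concepts section: the exact (Clopper-Pearson) interval applied in step 4 of the LOO-CV procedure can be obtained by inverting the binomial CDF. The following is a minimal, standard-library sketch of that textbook construction, not the `ssbc` implementation. The function names are hypothetical, and splitting `1 - confidence` into two equal tails is an assumption about the convention.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for modest n only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iters=60):
    """Bisection on [0, 1] for a function g that increases from g(0) < 0 to g(1) > 0."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_sketch(k, n, confidence=0.95):
    """Two-sided exact interval for a binomial proportion (hypothetical helper).

    lower solves P(X >= k | p) = tail, upper solves P(X <= k | p) = tail,
    with tail = (1 - confidence) / 2.
    """
    tail = (1.0 - confidence) / 2
    if k == 0:
        lower = 0.0
    else:
        lower = _root_of_increasing(lambda p: (1 - binom_cdf(k - 1, n, p)) - tail)
    if k == n:
        upper = 1.0
    else:
        upper = _root_of_increasing(lambda p: tail - binom_cdf(k, n, p))
    return lower, upper


# k successes (e.g. LOO singleton count) out of n calibration points
lower, upper = clopper_pearson_sketch(k=45, n=100, confidence=0.95)
print(f"95% CI for 45/100: [{lower:.3f}, {upper:.3f}]")
```

In practice the library's own `clopper_pearson_lower`, `clopper_pearson_upper`, and `cp_interval` should be preferred; the sketch is only meant to show why the interval is "exact" (it inverts the binomial distribution directly rather than relying on a normal approximation).
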