ssbc.hyperparameter
Hyperparameter sweep and optimization for Mondrian conformal prediction.
Functions
| sweep_and_plot_parallel_plotly(class_data, ...) | Convenience wrapper: run sweep + show plotly parallel coordinates figure. |
| sweep_hyperparams_and_collect(class_data, ...) | Sweep (a0,d0,a1,d1), run mondrian_conformal_calibrate + report_prediction_stats, and return a tidy DataFrame with hyperparams + selected metrics. |
- ssbc.hyperparameter.sweep_hyperparams_and_collect(class_data, alpha_0, delta_0, alpha_1, delta_1, mode='beta', extra_metrics=None, quiet=True)
Sweep (a0,d0,a1,d1), run mondrian_conformal_calibrate + report_prediction_stats, and return a tidy DataFrame with hyperparams + selected metrics.
This function performs a grid search over hyperparameter combinations and evaluates the resulting conformal prediction performance.
- Parameters:
class_data (dict) – Output from split_by_class()
alpha_0 (array-like) – Grid of alpha values for class 0
delta_0 (array-like) – Grid of delta values for class 0
alpha_1 (array-like) – Grid of alpha values for class 1
delta_1 (array-like) – Grid of delta values for class 1
mode (str, default="beta") – “beta” or “beta-binomial” mode for SSBC
extra_metrics (dict of {name: function}, optional) – Additional metrics to compute. Each function takes the summary dict and returns a scalar value.
quiet (bool, default=True) – If True, suppress progress output
- Returns:
Tidy DataFrame with one row per hyperparameter combination. Columns include:
- a0, d0, a1, d1: hyperparameters
- cov: overall coverage rate
- sing_rate: singleton prediction rate
- err_all: overall singleton error rate
- err_pred0, err_pred1: errors by predicted class
- err_y0, err_y1: errors by true class
- esc_rate: escalation rate (doublets + abstentions)
- n_total, sing_count, m_abst, m_doublets: counts
- Any additional metrics from extra_metrics
- Return type:
pd.DataFrame
Examples
>>> import numpy as np
>>> from ssbc import BinaryClassifierSimulator, split_by_class
>>> from ssbc.hyperparameter import sweep_hyperparams_and_collect
>>>
>>> # Generate data
>>> sim = BinaryClassifierSimulator(0.1, (2, 8), (8, 2), seed=42)
>>> labels, probs = sim.generate(1000)
>>> class_data = split_by_class(labels, probs)
>>>
>>> # Define grid
>>> alpha_grid = np.arange(0.05, 0.20, 0.05)
>>> delta_grid = np.arange(0.05, 0.20, 0.05)
>>>
>>> # Run sweep
>>> df = sweep_hyperparams_and_collect(
...     class_data,
...     alpha_0=alpha_grid, delta_0=delta_grid,
...     alpha_1=alpha_grid, delta_1=delta_grid,
... )
>>>
>>> # Analyze results
>>> print(df[['a0', 'a1', 'cov', 'sing_rate', 'err_all']].head())
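The extra_metrics parameter maps a column name to a function of the summary dict. The layout of that dict is not documented on this page, so the sketch below is only an illustration: the keys 'n_total' and 'm_doublets' are assumptions borrowed from the output column names, the function name doublet_fraction is hypothetical, and the stand-in summary contains made-up counts.

```python
import math


def doublet_fraction(summary):
    """Hypothetical metric: fraction of samples that received a doublet.

    Assumes the summary dict exposes 'm_doublets' and 'n_total' keys
    (an assumption, not confirmed by the documentation). Returns NaN
    when either key is missing or the total is zero.
    """
    n_total = summary.get("n_total", 0)
    m_doublets = summary.get("m_doublets")
    if not n_total or m_doublets is None:
        return math.nan
    return m_doublets / n_total


# This dict is what would be passed as extra_metrics=...
extra_metrics = {"doublet_frac": doublet_fraction}

# Stand-in summary with made-up counts, only to exercise the function.
fake_summary = {"n_total": 200, "m_doublets": 30}
print(extra_metrics["doublet_frac"](fake_summary))  # 0.15
```

Returning NaN rather than raising keeps one malformed summary from aborting a long sweep; whether that trade-off is appropriate depends on how the results are consumed.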
Notes
The function performs a complete grid search, so the total number of evaluations is len(alpha_0) × len(delta_0) × len(alpha_1) × len(delta_1). For large grids, this can be computationally expensive.
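Because the cost is multiplicative, it can help to count the combinations before launching a sweep. The following sketch uses only the standard library; the helper name count_grid_evaluations and the grid values are illustrative assumptions, not part of ssbc.

```python
from itertools import product
from math import prod


def count_grid_evaluations(*grids):
    """Return the number of combinations in a full Cartesian grid."""
    return prod(len(g) for g in grids)


# Hypothetical grids with three values each (illustrative only).
alpha_0 = [0.05, 0.10, 0.15]
delta_0 = [0.05, 0.10, 0.15]
alpha_1 = [0.05, 0.10, 0.15]
delta_1 = [0.05, 0.10, 0.15]

n_evals = count_grid_evaluations(alpha_0, delta_0, alpha_1, delta_1)
print(n_evals)  # 3 * 3 * 3 * 3 = 81

# Enumerating the combinations explicitly gives the same count.
combos = list(product(alpha_0, delta_0, alpha_1, delta_1))
assert len(combos) == n_evals
```

Adding one more value to each of the four grids raises the count from 81 to 256, so coarse grids are a sensible starting point.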
- ssbc.hyperparameter.sweep_and_plot_parallel_plotly(class_data, delta_0, delta_1, alpha_0, alpha_1, mode='beta', extra_metrics=None, color='err_all', color_continuous_scale=None, title=None, height=600)
Convenience wrapper: run sweep + show plotly parallel coordinates figure.
This function combines hyperparameter sweep and visualization in one call.
- Parameters:
class_data (dict) – Output from split_by_class()
delta_0 (array-like) – Grid of delta values for class 0
delta_1 (array-like) – Grid of delta values for class 1
alpha_0 (array-like) – Grid of alpha values for class 0
alpha_1 (array-like) – Grid of alpha values for class 1
mode (str, default="beta") – “beta” or “beta-binomial” mode for SSBC
extra_metrics (dict of {name: function}, optional) – Additional metrics to compute
color (str, default='err_all') – Column to use for coloring the parallel coordinates
color_continuous_scale (plotly colorscale, optional) – Color scale for the plot
title (str, optional) – Plot title (defaults to auto-generated title)
height (int, default=600) – Plot height in pixels
- Returns:
df (pd.DataFrame) – Results dataframe
fig (plotly.graph_objects.Figure) – Interactive parallel coordinates plot
Examples
>>> import numpy as np
>>> from ssbc import BinaryClassifierSimulator, split_by_class
>>> from ssbc.hyperparameter import sweep_and_plot_parallel_plotly
>>>
>>> # Generate data
>>> sim = BinaryClassifierSimulator(0.1, (2, 8), (8, 2), seed=42)
>>> labels, probs = sim.generate(1000)
>>> class_data = split_by_class(labels, probs)
>>>
>>> # Run sweep and plot
>>> df, fig = sweep_and_plot_parallel_plotly(
...     class_data,
...     delta_0=np.arange(0.05, 0.20, 0.05),
...     delta_1=np.arange(0.05, 0.20, 0.05),
...     alpha_0=np.arange(0.05, 0.20, 0.05),
...     alpha_1=np.arange(0.05, 0.20, 0.05),
...     color='err_all'
... )
>>> fig.show()  # Display in notebook
>>> # Or save: fig.write_html("sweep_results.html")
Notes
The parallel coordinates plot allows interactive exploration of the hyperparameter space. You can brush (select) ranges on any axis to filter configurations and see their impact on other metrics.
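Brushing has a programmatic counterpart: filtering the results DataFrame on column ranges. The sketch below assumes the returned df carries the columns listed for sweep_hyperparams_and_collect (a0, a1, cov, sing_rate, err_all); the rows are made-up stand-ins, not real sweep output, and the thresholds are arbitrary.

```python
import pandas as pd

# Made-up rows for illustration only; a real df comes from the sweep.
df = pd.DataFrame(
    {
        "a0": [0.05, 0.05, 0.10, 0.10],
        "a1": [0.05, 0.10, 0.05, 0.10],
        "cov": [0.97, 0.95, 0.94, 0.91],
        "sing_rate": [0.60, 0.72, 0.75, 0.88],
        "err_all": [0.02, 0.04, 0.05, 0.09],
    }
)

# Equivalent of brushing two axes: keep configurations with high coverage
# and a low singleton error rate, then rank them by singleton rate.
selected = (
    df[df["cov"].between(0.93, 1.0) & df["err_all"].between(0.0, 0.05)]
    .sort_values("sing_rate", ascending=False)
    .reset_index(drop=True)
)
print(selected)
```

Series.between is inclusive on both ends by default, which matches the way a brushed axis range includes its endpoints.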