ssbc.simulation
Simulation utilities for testing conformal prediction.
Classes
BinaryClassifierSimulator – Simulate binary classification data with probabilities from Beta distributions.
- class ssbc.simulation.BinaryClassifierSimulator(p_class1, beta_params_class0, beta_params_class1, seed=None)
Simulate binary classification data with probabilities from Beta distributions.
This simulator generates realistic classification scenarios in which the predicted probability p(class=1) follows a Beta distribution whose parameters depend on the true label. It is useful for testing and benchmarking conformal prediction methods.
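The generative process described above can be sketched in NumPy. This is a minimal sketch, not the ssbc source: it assumes each label is drawn independently with P(label = 1) = p_class1, that p(class=1) is then drawn from the Beta distribution selected by that label, and that p(class=0) = 1 - p(class=1). The function name `simulate_binary_scores` is hypothetical.

```python
import numpy as np


def simulate_binary_scores(n_samples, p_class1, beta_params_class0,
                           beta_params_class1, seed=None):
    """Hypothetical stand-in for BinaryClassifierSimulator.generate.

    Assumes labels are i.i.d. Bernoulli(p_class1) and that p(class=1)
    is drawn from the Beta distribution chosen by the true label.
    """
    rng = np.random.default_rng(seed)
    # Stage 1: draw true labels; label 1 occurs with probability p_class1.
    labels = (rng.random(n_samples) < p_class1).astype(int)

    # Stage 2: pick (a, b) per row from the label, then draw p(class=1).
    a = np.where(labels == 1, beta_params_class1[0], beta_params_class0[0])
    b = np.where(labels == 1, beta_params_class1[1], beta_params_class0[1])
    p1 = rng.beta(a, b)

    # Columns are [p(class=0), p(class=1)], so every row sums to 1.
    probs = np.column_stack([1.0 - p1, p1])
    return labels, probs


labels, probs = simulate_binary_scores(100, 0.10, (2, 8), (8, 2), seed=42)
print(labels.shape, probs.shape)  # (100,) (100, 2)
```

Because the order in which random draws are consumed is assumed here, this sketch will not reproduce the exact arrays that BinaryClassifierSimulator returns for the same seed.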
- Parameters:
p_class1 (float) – Probability of drawing class 1 (class imbalance parameter). Must be in [0, 1].
beta_params_class0 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when the true label is 0. Typically use parameters that give low probabilities (e.g., (2, 8)).
beta_params_class1 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when the true label is 1. Typically use parameters that give high probabilities (e.g., (8, 2)).
seed (int, optional) – Random seed for reproducibility
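The constraints in the parameter list can be checked up front. The helper below is hypothetical: the documentation does not say how, or whether, ssbc validates its inputs. The rule that Beta parameters must be strictly positive comes from the definition of the Beta distribution, not from the ssbc docs.

```python
import math


def validate_simulator_params(p_class1, beta_params_class0, beta_params_class1):
    """Hypothetical input checks mirroring the documented constraints.

    Raises ValueError on invalid input; returns None otherwise.
    """
    # p_class1 is a probability, so it must lie in [0, 1] (NaN fails too).
    if not (0.0 <= p_class1 <= 1.0):
        raise ValueError(f"p_class1 must be in [0, 1], got {p_class1}")

    for name, params in (("beta_params_class0", beta_params_class0),
                         ("beta_params_class1", beta_params_class1)):
        # Each Beta parameterization is an (a, b) pair.
        if len(params) != 2:
            raise ValueError(f"{name} must be a tuple (a, b), got {params!r}")
        a, b = params
        # The Beta distribution is only defined for finite a > 0 and b > 0.
        if not (a > 0 and b > 0 and math.isfinite(a) and math.isfinite(b)):
            raise ValueError(f"{name} needs finite a > 0 and b > 0, got {params!r}")
```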
- a0, b0
Beta parameters for class 0
- a1, b1
Beta parameters for class 1
- rng
Random number generator
Examples
>>> from ssbc.simulation import BinaryClassifierSimulator
>>> # Simulate imbalanced data: 10% positive class
>>> # Class 0: Beta(2, 8) → mean p(class=1) = 0.2 (low scores, correct)
>>> # Class 1: Beta(8, 2) → mean p(class=1) = 0.8 (high scores, correct)
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.10,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=100)
>>> print(labels.shape)
(100,)
>>> print(probs.shape)
(100, 2)
Notes
The Beta distribution parameters (a, b) control the shape:
- Mean = a / (a + b)
- For a classifier that works well:
  - Class 0 should have low p(class=1): use (a, b) with a < b (mean below 0.5)
  - Class 1 should have high p(class=1): use (a, b) with a > b (mean above 0.5)
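To see how (a, b) shapes the scores, the closed-form mean a / (a + b) and variance ab / ((a + b)^2 (a + b + 1)) of a Beta(a, b) distribution can be computed, and the mean compared against Monte Carlo draws. This is a standalone NumPy sketch using the (2, 8) and (8, 2) settings from the Examples; `beta_mean_var` is a hypothetical helper name.

```python
import numpy as np


def beta_mean_var(a, b):
    """Closed-form mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var


rng = np.random.default_rng(0)
for label, (a, b) in {"class 0": (2, 8), "class 1": (8, 2)}.items():
    mean, var = beta_mean_var(a, b)
    draws = rng.beta(a, b, size=100_000)  # Monte Carlo check of the closed-form mean
    print(f"{label}: Beta({a}, {b}) mean={mean:.3f} var={var:.4f} "
          f"empirical mean={draws.mean():.3f}")
```

Both settings share the same variance (16 / 1100, about 0.0145) because swapping a and b mirrors the distribution around 0.5.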
- __init__(p_class1, beta_params_class0, beta_params_class1, seed=None)
Initialize the binary classifier simulator.
- generate(n_samples)
Generate n_samples samples, each consisting of a true label and its probabilities [p(class=0), p(class=1)].
- Parameters:
n_samples (int) – Number of samples to generate
- Returns:
labels (np.ndarray, shape (n_samples,)) – True binary labels (0 or 1)
probs (np.ndarray, shape (n_samples, 2)) – Classification probabilities [p(class=0), p(class=1)]. Each row sums to 1.0.
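The Returns contract (label shape and values, probability shape, rows summing to 1) translates directly into test assertions. `check_generate_output` is a hypothetical test helper, not part of ssbc; it uses plain `assert`, so it is meant for test code (assertions are skipped under `python -O`).

```python
import numpy as np


def check_generate_output(labels, probs, n_samples, atol=1e-9):
    """Hypothetical helper asserting the documented return contract."""
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    # labels: shape (n_samples,), values in {0, 1}.
    assert labels.shape == (n_samples,), labels.shape
    assert np.isin(labels, (0, 1)).all(), "labels must be 0 or 1"
    # probs: shape (n_samples, 2), columns [p(class=0), p(class=1)].
    assert probs.shape == (n_samples, 2), probs.shape
    assert ((probs >= 0) & (probs <= 1)).all(), "probabilities must lie in [0, 1]"
    # Each row must sum to 1.0, up to floating-point tolerance.
    assert np.allclose(probs.sum(axis=1), 1.0, atol=atol), "rows must sum to 1"
    return True
```

With the real class, `check_generate_output(*sim.generate(n_samples=100), n_samples=100)` would exercise the same contract.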
Examples
>>> import numpy as np
>>> from ssbc.simulation import BinaryClassifierSimulator
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.5,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=5)
>>> print(f"Generated {len(labels)} samples")
Generated 5 samples
>>> print(f"Class balance: {np.bincount(labels)}")
Class balance: [2 3]
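Since the module exists to test conformal prediction, a typical next step is to feed simulated (labels, probs) into a conformal procedure. The sketch below uses standard split conformal classification with the nonconformity score 1 - p(true class); this is a swapped-in textbook technique, not necessarily what ssbc implements. To stay runnable without ssbc, it simulates data with NumPy under the same assumed process (Bernoulli labels with 10% positives, Beta(2, 8) and Beta(8, 2) scores). The function names are hypothetical.

```python
import numpy as np


def split_conformal_threshold(labels, probs, alpha):
    """Split conformal quantile for the score s = 1 - p(true class)."""
    n = len(labels)
    scores = 1.0 - probs[np.arange(n), labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return np.sort(scores)[k - 1]


def prediction_sets(probs, qhat):
    """Include every class whose score 1 - p(class) is at most qhat."""
    return (1.0 - probs) <= qhat  # boolean mask, shape (n, 2)


rng = np.random.default_rng(42)
n = 2000
labels = (rng.random(n) < 0.10).astype(int)
# Draw both Beta samples for every row, then keep the one matching the label.
p1 = np.where(labels == 1, rng.beta(8, 2, n), rng.beta(2, 8, n))
probs = np.column_stack([1.0 - p1, p1])

# Calibrate on the first half, measure coverage on the second half.
cal, test = slice(0, n // 2), slice(n // 2, n)
qhat = split_conformal_threshold(labels[cal], probs[cal], alpha=0.1)
sets = prediction_sets(probs[test], qhat)
coverage = sets[np.arange(n - n // 2), labels[test]].mean()
print(f"qhat={qhat:.3f}, empirical coverage={coverage:.3f}")
```

The rank ceil((n + 1)(1 - alpha)) is the usual finite-sample correction that gives marginal coverage of at least 1 - alpha when calibration and test points are exchangeable, so the printed coverage should land near 0.9 for alpha = 0.1.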