ssbc.simulation

Simulation utilities for testing conformal prediction.

Classes

BinaryClassifierSimulator(p_class1, ...[, seed])

Simulate binary classification data with probabilities from Beta distributions.

class ssbc.simulation.BinaryClassifierSimulator(p_class1, beta_params_class0, beta_params_class1, seed=None)[source]

Simulate binary classification data with probabilities from Beta distributions.

This simulator generates classification scenarios in which the predicted probability p(class=1) follows a Beta distribution whose parameters depend on the true label. It is useful for testing and benchmarking conformal prediction methods.

Parameters:
  • p_class1 (float) – Probability of drawing class 1 (class imbalance parameter). Must be in [0, 1].

  • beta_params_class0 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when the true label is 0. Typically use parameters that give low probabilities (e.g., (2, 8)).

  • beta_params_class1 (tuple of (a, b)) – Beta distribution parameters for p(class=1) when the true label is 1. Typically use parameters that give high probabilities (e.g., (8, 2)).

  • seed (int, optional) – Random seed for reproducibility

p_class1

Probability of class 1

Type:

float

p_class0

Probability of class 0 (= 1 - p_class1)

Type:

float

a0, b0

Beta parameters for class 0

Type:

float

a1, b1

Beta parameters for class 1

Type:

float

rng

Random number generator

Type:

numpy.random.Generator
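
As a minimal sketch of what the seed buys you, the snippet below uses NumPy directly rather than the simulator. It assumes the simulator builds its generator with numpy.random.default_rng(seed), which the numpy.random.Generator type suggests but this page does not confirm. Two generators created from the same seed produce identical Beta draws, while a different seed produces a different stream.

```python
import numpy as np

# Two independent generators built from the same seed (42 is arbitrary).
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Identical seeds yield identical Beta(2, 8) draws.
draws_a = rng_a.beta(2, 8, size=5)
draws_b = rng_b.beta(2, 8, size=5)
same_seed_matches = np.array_equal(draws_a, draws_b)

# A generator built from a different seed yields a different stream.
rng_c = np.random.default_rng(7)
draws_c = rng_c.beta(2, 8, size=5)
other_seed_differs = not np.array_equal(draws_a, draws_c)

print(same_seed_matches, other_seed_differs)
```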

Examples

>>> from ssbc.simulation import BinaryClassifierSimulator
>>> # Simulate imbalanced data: 10% positive class
>>> # Class 0: Beta(2, 8) → mean p(class=1) = 0.2 (low scores, correct)
>>> # Class 1: Beta(8, 2) → mean p(class=1) = 0.8 (high scores, correct)
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.10,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=100)
>>> print(labels.shape)
(100,)
>>> print(probs.shape)
(100, 2)

Notes

The Beta distribution parameters (a, b) control the shape of the distribution; its mean is a / (a + b). For a classifier that works well:

  • Class 0 should have low p(class=1): use (a, b) with a < b

  • Class 1 should have high p(class=1): use (a, b) with a > b
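
To make the rule of thumb above concrete, here is a small sketch that evaluates the standard closed-form mean and standard deviation of a Beta(a, b) distribution for the two example parameter pairs. The helper names beta_mean and beta_std are hypothetical and are not part of ssbc.

```python
import math


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)


def beta_std(a, b):
    """Standard deviation of Beta(a, b): sqrt(ab / ((a + b)^2 (a + b + 1)))."""
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return math.sqrt(variance)


# Class 0 scores: a < b pushes p(class=1) toward 0.
mean0, std0 = beta_mean(2, 8), beta_std(2, 8)
# Class 1 scores: a > b pushes p(class=1) toward 1.
mean1, std1 = beta_mean(8, 2), beta_std(8, 2)

print(f"class 0: mean={mean0:.2f}, std={std0:.2f}")
print(f"class 1: mean={mean1:.2f}, std={std1:.2f}")
```

Swapping a and b mirrors the distribution around 0.5, so both classes have the same spread. Scaling both parameters up while keeping their ratio (for example (20, 80) instead of (2, 8)) keeps the mean but narrows the distribution, which simulates a more confident classifier.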

__init__(p_class1, beta_params_class0, beta_params_class1, seed=None)[source]

Initialize the binary classifier simulator.

generate(n_samples)[source]

Generate n_samples samples, each consisting of a true label and the probability pair (p(class=0), p(class=1)).

Parameters:

n_samples (int) – Number of samples to generate

Returns:

  • labels (np.ndarray, shape (n_samples,)) – True binary labels (0 or 1)

  • probs (np.ndarray, shape (n_samples, 2)) – Classification probabilities [p(class=0), p(class=1)]. Each row sums to 1.0.

Return type:

tuple[ndarray, ndarray]
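
The documented behavior can be approximated with a few lines of NumPy. The function below is a stand-in written for illustration, not the ssbc implementation: the name simulate_binary_scores and the order of the random draws are assumptions, so its output will not match the simulator's for the same seed. It draws labels with P(label=1) = p_class1, draws p(class=1) from the Beta distribution matching each label, and sets p(class=0) = 1 - p(class=1).

```python
import numpy as np


def simulate_binary_scores(n_samples, p_class1, beta_params_class0,
                           beta_params_class1, seed=None):
    """Hypothetical stand-in for the documented generate() behavior."""
    rng = np.random.default_rng(seed)
    a0, b0 = beta_params_class0
    a1, b1 = beta_params_class1

    # True labels: 1 with probability p_class1, otherwise 0.
    labels = (rng.random(n_samples) < p_class1).astype(int)

    # Draw p(class=1) under both label conditions, then pick per sample.
    p1_given_0 = rng.beta(a0, b0, size=n_samples)
    p1_given_1 = rng.beta(a1, b1, size=n_samples)
    p1 = np.where(labels == 1, p1_given_1, p1_given_0)

    # Columns are [p(class=0), p(class=1)], so every row sums to 1.
    probs = np.column_stack([1.0 - p1, p1])
    return labels, probs


labels, probs = simulate_binary_scores(
    n_samples=1000,
    p_class1=0.10,
    beta_params_class0=(2, 8),
    beta_params_class1=(8, 2),
    seed=0,
)
print(labels.shape, probs.shape)
print(probs[labels == 0, 1].mean(), probs[labels == 1, 1].mean())
```

With these parameters the average p(class=1) should be close to 0.2 for label 0 and close to 0.8 for label 1, matching the Beta means.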

Examples

>>> import numpy as np
>>> sim = BinaryClassifierSimulator(
...     p_class1=0.5,
...     beta_params_class0=(2, 8),
...     beta_params_class1=(8, 2),
...     seed=42
... )
>>> labels, probs = sim.generate(n_samples=5)
>>> print(f"Generated {len(labels)} samples")
Generated 5 samples
>>> print(f"Class balance: {np.bincount(labels)}")
Class balance: [2 3]

__repr__()[source]

String representation of the simulator.

Return type:

str