Simulator Module

Synthetic data generation for testing, validation, and ML training.

This module provides tools for generating realistic simulated spectroscopy data from models, with support for different detector types and noise models. Use for:

  • Testing fitting algorithms with known ground truth

  • Exploring parameter sensitivity and identifiability

  • Optimizing experimental design (SNR requirements)

  • Generating training data for machine learning

  • Validating analysis pipelines

Key Features

  • Two detector types: analog and photon counting

  • Multiple noise models: Poisson, Gaussian, or none

  • 1D and 2D spectrum simulation

  • Batch generation for statistical analysis

  • Parameter sweeping (grid/random/uniform) for ML training

Detector Types

Analog Detectors (CCD, photodiodes, lock-in amplifiers):

  • Continuous signal output

  • Additive noise (Gaussian or Poisson)

  • Noise level controlled by noise_level parameter

Photon Counting (APD, photomultiplier, event mode):

  • Discrete photon events

  • Shot noise inherent (Poisson statistics)

  • Count rate determines signal-to-noise ratio

Workflow

  1. Testing and Validation

    • Create model with trspecfit.mcp.Model

    • Initialize Simulator with model and noise parameters

    • Generate data with simulate_1d() or simulate_2d(), or generate multiple realizations with simulate_n()

    • Save data and ground truth with save_data()

    • Fit simulated data to validate fitting pipeline

  2. Machine Learning Training Data Generation

    • Create model with trspecfit.mcp.Model

    • Define parameter space using trspecfit.utils.sweep.ParameterSweep

    • Initialize Simulator with model and noise parameters

    • Generate multiple realizations (n) for each parameter combination (data, ground truth, and relevant metadata get saved automatically)
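The parameter-space step in this workflow can be pictured with a short standard-library sketch. It does not use trspecfit.utils.sweep.ParameterSweep: the helpers grid_configs and random_configs are hypothetical, and the parameter names are borrowed from the simulate_parameter_sweep example purely for illustration.

```python
import itertools
import random


def grid_configs(axes):
    """Yield every combination of the listed values (full grid)."""
    names = list(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def random_configs(ranges, n_samples, seed=None):
    """Yield n_samples configurations drawn uniformly from (low, high) ranges."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


# 3 x 2 grid: six configurations in total
grid = list(grid_configs({"GLP_01_A": [5, 15, 30], "GLP_01_x0": [5, 10]}))

# Four random draws; the seed makes the sequence reproducible
draws = list(random_configs({"GLP_01_A": (5, 30), "GLP_01_x0": (5, 15)}, n_samples=4, seed=42))
```

Both helpers are generators, so configurations are produced one at a time rather than held in memory all at once.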

Examples

See examples/simulator/ directory for complete workflows.

class trspecfit.simulator.Simulator(model: Model, detection: str = 'analog', noise_level: float = 0.05, noise_type: str = 'poisson', counts_per_delay: int | None = None, count_rate: float | None = None, integration_time: float | None = None, seed: int | None = None)[source]

Bases: object

Simulate 2D time- and energy-resolved spectroscopy data with noise.

This class generates synthetic data based on a model, adding realistic noise to simulate experimental measurements. Supports both analog detectors (with additive noise) and photon counting detectors (with shot noise).

Parameters:
  • model (Model) – Model instance from trspecfit.mcp with defined components and parameters. Must have energy and time axes set before simulation.

  • detection ({'analog', 'photon_counting'}, default='analog') –

    Detection technique to simulate:

    • ‘analog’: Continuous signal with additive noise

    • ‘photon_counting’: Discrete photon events with Poisson statistics

  • noise_level (float, default=0.05) – Noise amplitude for analog detectors (0.0-1.0 for relative noise). Larger values = more noise. Ignored for photon_counting.

  • noise_type ({'poisson', 'gaussian', 'none'}, default='poisson') –

    Type of noise for analog detectors:

    • ‘poisson’: Shot noise (realistic for low light)

    • ‘gaussian’: White noise (simpler, faster)

    • ‘none’: No noise (testing and debugging)

    Ignored for photon_counting (always Poisson).

  • counts_per_delay (int, optional) – Total photon count per time delay (photon_counting only). Directly sets signal-to-noise ratio. Mutually exclusive with count_rate + integration_time.

  • count_rate (float, optional) – Photon count rate in Hz (photon_counting only). Combined with integration_time to compute counts_per_delay.

  • integration_time (float, optional) – Integration time per delay point in seconds (photon_counting only). Combined with count_rate to compute counts_per_delay.

  • seed (int, optional) – Random seed for reproducibility. If None, uses random initialization.

model

Model instance used for simulation

Type:

Model

detection

Detection type (‘analog’ or ‘photon_counting’)

Type:

str

seed

Random seed value

Type:

int or None

noise_level

Analog detector noise level

Type:

float

noise_type

Analog detector noise type

Type:

str

counts_per_delay

Photon counting detector count budget

Type:

int

count_rate

Photon counting detector rate

Type:

float or None

integration_time

Photon counting integration time per delay

Type:

float or None

data_clean

Most recently generated clean (noiseless) data

Type:

ndarray or None

data_noisy

Most recently generated noisy data

Type:

ndarray or None

noise

Most recently generated noise component (noisy - clean)

Type:

ndarray or None

Examples

See examples/simulator/ directory for complete workflows.

Notes

Analog vs. Photon Counting:

Analog detectors (CCD, photodiode, lock-in):

  • Pros: High dynamic range, simple operation

  • Cons: Read noise, dark current

  • Simulation: Continuous signal + additive noise

Photon counting (APD, PMT, event mode):

  • Pros: No read noise, single-photon sensitivity

  • Cons: Dead time, count rate limits, pulse pileup

  • Simulation: Discrete events following Poisson statistics

Noise Level Selection:

For analog detectors, noise_level is relative to signal:

  • 0.01 (1%): Very clean, ideal conditions

  • 0.05 (5%): Typical good data

  • 0.10 (10%): Moderate noise, still fittable

  • 0.20 (20%): Challenging, may need averaging

For photon counting, SNR is set by counts_per_delay (amplitude SNR ≈ √counts for shot noise):

  • 100 counts: SNR ~ 10 (marginal)

  • 1000 counts: SNR ~ 32 (good)

  • 10000 counts: SNR ~ 100 (excellent)
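The photon-counting figures follow from Poisson statistics: N counts carry a standard deviation of √N, so the amplitude ratio is N / √N = √N. A minimal check, using hypothetical helper names that are not part of trspecfit:

```python
import math


def shot_noise_snr(counts):
    """Amplitude SNR of a Poisson-limited measurement: N / sqrt(N) = sqrt(N)."""
    return math.sqrt(counts)


def amplitude_to_db(snr_amplitude):
    """Convert an amplitude ratio to decibels (20 * log10)."""
    return 20.0 * math.log10(snr_amplitude)


for counts in (100, 1000, 10000):
    snr = shot_noise_snr(counts)
    print(f"{counts:>6d} counts: SNR ~ {snr:.0f} ({amplitude_to_db(snr):.0f} dB)")
```

Because the ratio grows as √N, a tenfold gain in SNR costs a hundredfold increase in counts.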

Photon Counting Parameter Resolution:

The simulator resolves photon counting parameters as:

  1. If counts_per_delay specified directly → use it

  2. Else if count_rate and integration_time specified → compute counts_per_delay

  3. Else → estimate from model scale (prints warning)

The third case assumes model amplitudes represent realistic count rates, which may not be true. Always specify counts_per_delay or (count_rate, integration_time) explicitly for accurate photon counting simulation.
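The resolution order can be sketched as a small function. This is an illustration only: resolve_counts_per_delay and its model_scale_estimate argument are hypothetical, and how the library actually estimates counts from the model scale is not described here, so the fallback simply takes a caller-supplied number.

```python
import warnings


def resolve_counts_per_delay(counts_per_delay=None, count_rate=None,
                             integration_time=None, model_scale_estimate=None):
    """Hypothetical helper mirroring the three-step resolution order."""
    # 1. An explicit count budget wins
    if counts_per_delay is not None:
        return int(counts_per_delay)
    # 2. Otherwise counts = rate [Hz] * integration time [s]
    if count_rate is not None and integration_time is not None:
        return int(round(count_rate * integration_time))
    # 3. Fallback: warn and use whatever estimate the caller provides
    warnings.warn("counts_per_delay estimated from model scale; specify it explicitly")
    if model_scale_estimate is None:
        return None
    return int(round(model_scale_estimate))


explicit = resolve_counts_per_delay(counts_per_delay=1000)
from_rate = resolve_counts_per_delay(count_rate=1e4, integration_time=0.1)
```

Both calls resolve to 1000 counts per delay; only the third branch emits a warning.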

Memory Usage:

Large 2D datasets can use significant memory:

  • Single array: ~4 MB per 1000×500 spectrum (float64, 8 bytes per value)

  • simulate_n(n=100): ~400 MB per returned list (noisy and noise) at the same size, ~800 MB for both

  • Consider smaller grids or batch processing for large n
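Array sizes are easy to check by hand: a float64 value takes 8 bytes, so one array costs n_time × n_energy × 8 bytes. The helper below is a hypothetical convenience, not a trspecfit function, and it reports decimal megabytes (10⁶ bytes).

```python
def array_megabytes(n_time, n_energy, n_arrays=1, bytes_per_value=8):
    """Memory in MB (10**6 bytes) for n_arrays arrays of shape n_time x n_energy."""
    return n_time * n_energy * bytes_per_value * n_arrays / 1e6


single = array_megabytes(1000, 500)               # one 1000 x 500 float64 array
batch = array_megabytes(1000, 500, n_arrays=100)  # a list of 100 such arrays
```

Here single evaluates to 4.0 and batch to 400.0.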

See also

trspecfit.mcp.Model

Model class for simulation

simulate_1d

Generate 1D spectrum

simulate_2d

Generate 2D spectrum

simulate_n

Generate multiple realizations

save_data

Save simulated data to HDF5

add_noise(clean_data: ndarray, dim: int = 2) tuple[ndarray, ndarray][source]

Add noise to clean data based on detection technique.

Parameters:
  • clean_data (ndarray) – Clean data array (1D or 2D).

  • dim (int) – Dimension (1 for 1D, 2 for 2D).

Returns:

Tuple of (noisy_data, noise).

Return type:

tuple of (ndarray, ndarray)

generate_clean_data(dim: int = 2, t_ind: int = 0) ndarray[source]

Generate clean data from model (no noise).

Parameters:
  • dim (int) – Dimension (1 for 1D, 2 for 2D).

  • t_ind (int) – Time index for 1D simulations (ignored for 2D).

Returns:

Clean data array (1D or 2D depending on dim).

Return type:

ndarray

get_snr(scale: str = 'linear') float[source]

Calculate Signal-to-Noise Ratio (SNR).

Computes SNR from the most recently simulated data using power-based definition: SNR = signal_power / noise_power.

Parameters:

scale ({'linear', 'dB'}, default='linear') – Output scale:

  • ‘linear’: SNR as ratio (e.g., 25.0)

  • ‘dB’: SNR in decibels (e.g., 13.98 dB)

Returns:

SNR value in requested scale. Returns np.inf if noise_power is exactly zero.

Return type:

float

Raises:

ValueError – If no simulated data available (must call simulate_1d/2d/n first), or if scale is not ‘linear’ or ‘dB’.

Examples

>>> # Calculate SNR after simulation
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_2d()
>>>
>>> snr_linear = sim.get_snr(scale='linear')
>>> print(f"SNR: {snr_linear:.1f}")
SNR: 25.3
>>>
>>> snr_db = sim.get_snr(scale='dB')
>>> print(f"SNR: {snr_db:.1f} dB")
SNR: 14.0 dB
>>> # Compare SNR across noise levels
>>> for noise_level in [0.01, 0.05, 0.10, 0.20]:
...     sim.set_noise_level(noise_level)
...     sim.simulate_2d()
...     snr = sim.get_snr()
...     print(f"Noise {noise_level:.2f}: SNR = {snr:.1f}")
Noise 0.01: SNR = 625.0
Noise 0.05: SNR = 25.0
Noise 0.10: SNR = 6.2
Noise 0.20: SNR = 1.6
>>> # Plot SNR vs photon count
>>> counts = [100, 500, 1000, 5000, 10000]
>>> snrs = []
>>> for count in counts:
...     sim = Simulator(model, detection='photon_counting',
...                     counts_per_delay=count)
...     sim.simulate_2d()
...     snrs.append(sim.get_snr())
>>> plt.loglog(counts, snrs, 'o-')
>>> plt.xlabel('Counts per delay')
>>> plt.ylabel('SNR')

Notes

SNR Definition:

Uses power-based (energy) definition:

SNR_linear = mean(signal²) / mean(noise²)

SNR_dB = 10 × log₁₀(SNR_linear)

The amplitude-based definition uses the RMS ratio instead, which is the square root of SNR_linear; its decibel form, 20 × log₁₀(amplitude ratio), gives the same dB value. Power-based is standard in signal processing and communications.
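The definition can be reproduced in a few lines of plain Python. This is a sketch of the formula only, not the get_snr implementation; snr_power and snr_db are hypothetical names, and a 2D array would need to be flattened first.

```python
import math


def snr_power(signal, noise):
    """Power-based SNR: mean(signal**2) / mean(noise**2)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return math.inf  # no noise at all
    return signal_power / noise_power


def snr_db(snr_linear):
    """Convert a power ratio to decibels (10 * log10)."""
    return 10.0 * math.log10(snr_linear)


clean = [0.0, 5.0, 10.0, 5.0, 0.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5]

# mean(signal**2) = 30 and mean(noise**2) = 0.25, so the ratio is 120
ratio = snr_power(clean, noise)
ratio_db = snr_db(ratio)
```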

Interpretation:

Linear scale:

  • SNR = 1: Signal and noise have equal power (marginal)

  • SNR = 10: Signal 10× stronger than noise (good)

  • SNR = 100: Signal 100× stronger than noise (excellent)

dB scale:

  • 0 dB: Equal signal and noise

  • 10 dB: 10× signal power (good)

  • 20 dB: 100× signal power (excellent)

  • Each 10 dB = 10× power ratio

Typical Values:

For spectroscopy data:

  • SNR < 5 (< 7 dB): Difficult to fit reliably

  • SNR 5-20 (7-13 dB): Good quality, typical experimental data

  • SNR 20-100 (13-20 dB): High quality

  • SNR > 100 (> 20 dB): Exceptional, near ideal

Limitations:

This is a global SNR averaged over the entire spectrum. Local SNR may vary significantly, especially for:

  • Weak features vs. strong peaks

  • Time-dependent signals (varying amplitude)

  • Non-uniform noise (detector artifacts)

For accurate local SNR, compute on regions of interest separately.
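A region-of-interest SNR can be computed by applying the same power ratio to a slice of the data. The sketch below assumes 1D lists and an index window; local_snr is a hypothetical helper, not a trspecfit method.

```python
def local_snr(signal, noise, start, stop):
    """Power-based SNR restricted to the index window [start, stop)."""
    sig = signal[start:stop]
    noi = noise[start:stop]
    signal_power = sum(s * s for s in sig) / len(sig)
    noise_power = sum(n * n for n in noi) / len(noi)
    return float("inf") if noise_power == 0 else signal_power / noise_power


# A strong peak next to a weak feature, with the same noise amplitude everywhere
clean = [0.0, 1.0, 10.0, 1.0, 0.0, 0.5, 1.0, 0.5]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]

peak_snr = local_snr(clean, noise, 1, 4)     # window around the strong peak
feature_snr = local_snr(clean, noise, 5, 8)  # window around the weak feature
```

With identical noise, the strong peak reaches a local SNR of 136 while the weak feature only reaches 2, which a single global number would hide.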

See also

simulate_1d

Must call before get_snr

simulate_2d

Must call before get_snr

plot_comparison

Shows SNR in title

plot_comparison(t_ind: int = 0, dim: int = 1, snr_scale: str = 'linear', *, save_img: int = 0, config: PlotConfig | None = None, **plot_kwargs) None[source]

Plot comparison of clean vs noisy data.

Creates visualization showing clean model data, noisy simulated data, and noise component side-by-side. Essential for visually assessing simulation quality and noise characteristics.

Parameters:
  • t_ind (int, default=0) – Time index for 1D plots (ignored for dim=2)

  • dim ({1, 2}, default=1) –

    Dimensionality:

    • 1: Create 1D plot with clean, noisy, and noise curves

    • 2: Create three-panel 2D plot (clean, noisy, noise)

  • snr_scale ({'linear', 'dB'}, default='linear') –

    Scale for SNR display in title:

    • ‘linear’: Show as ratio (e.g., “SNR: 25.0 linear”)

    • ‘dB’: Show in decibels (e.g., “SNR: 14.0 dB”)

  • save_img (int, default=0) – 0: display, 1: save+display, -1: save only, -2: close (no display/save)

  • config (PlotConfig, optional) – Override the model’s inherited plot configuration for this call. If None, uses the model’s own plot_config.

  • **plot_kwargs (dict) – Per-call overrides for any PlotConfig field (e.g. z_colormap, ticksize). Applied on top of config.

Examples

>>> # 1D comparison
>>> sim = Simulator(model, noise_level=0.05)
>>> sim.simulate_1d(t_ind=0)
>>> sim.plot_comparison(dim=1)
>>> # 2D comparison with dB scale
>>> sim = Simulator(model, noise_level=0.05)
>>> sim.simulate_2d()
>>> sim.plot_comparison(dim=2, snr_scale='dB')
>>> # Compare different noise levels visually
>>> fig, axes = plt.subplots(3, 1, figsize=(10, 12))
>>> for i, noise_level in enumerate([0.01, 0.05, 0.10]):
...     sim.set_noise_level(noise_level)
...     sim.simulate_1d()
...     # ... manual plotting on axes[i] ...
>>> # Check photon counting vs analog
>>> sim_analog = Simulator(model, detection='analog', noise_level=0.05)
>>> sim_photon = Simulator(model, detection='photon_counting',
...                         counts_per_delay=1000)
>>> sim_analog.simulate_2d()
>>> sim_photon.simulate_2d()
>>> # ... compare visually ...

Notes

1D Plot Layout:

Single plot with three traces:

  • Clean: Black line (ground truth)

  • Noisy: Red scatter points (simulated data)

  • Noise: Gray line (noise component)

Scatter points for noisy data help visualize noise granularity.

2D Plot Layout:

Three side-by-side panels:

  • Left: Clean model data

  • Center: Noisy simulated data (with SNR in title)

  • Right: Noise component (difference)

All use same colormap from model.plot_config for consistency.

Visual Assessment:

Good simulation should show:

  • Noisy data follows clean data trend

  • Noise is randomly distributed (no patterns)

  • SNR appropriate for intended use case

  • Peak features still distinguishable in noisy data

If noise dominates signal (SNR << 1), features may be completely obscured; increase signal or reduce noise.

Configuration:

Plot uses model.plot_config for:

  • Axis labels (energy/time labels)

  • Axis direction (e.g., reversed energy)

  • Colormap (for 2D plots)

  • DPI settings

This ensures consistency with other trspecfit plots.

See also

simulate_1d

Generate 1D data to plot

simulate_2d

Generate 2D data to plot

get_snr

SNR calculation shown in title

save_data(*, filepath: str | None = None, save_format: str = 'hdf5', n_data: list[ndarray] | None = None, overwrite: bool = True, show_output: int = 1) None[source]

Save simulated data to file with metadata.

Exports simulated data in HDF5 format with complete metadata including model parameters, noise settings, and experimental axes. Essential for sharing simulated datasets and ensuring reproducibility.

Parameters:
  • filepath (str or Path, optional) – Path where to save data. If None, uses the default ‘./simulated_data/simulated_data.h5’. If the provided path doesn’t include a ‘simulated_data’ directory, the file is automatically placed there.

  • save_format (str, default='hdf5') – File format. Currently only ‘hdf5’ supported. Future: could add .mat, .npz, etc.

  • n_data (list of ndarray, optional) – Multiple noisy datasets from simulate_n() to save. If None, saves single dataset from simulate_1d() or simulate_2d().

  • overwrite (bool, default=True) – If True, overwrite existing files. If False, raise FileExistsError if file exists.

  • show_output (int, default=1) –

    Output mode:

    • 0: Silent / programmatic / API mode – no prints

    • 1: Interactive / notebook / UI mode – show timing and save confirmation

Raises:
  • ValueError – If no simulated data available (must call simulate first)

  • FileExistsError – If file exists and overwrite=False

Examples

>>> # Save single simulation
>>> sim = Simulator(model, noise_level=0.05, seed=42)
>>> clean, noisy, noise = sim.simulate_2d()
>>> sim.save_data(filepath='simulation_001.h5')  # filepath is keyword-only
Data saved to: ./simulated_data/simulation_001.h5
>>> # Save multiple realizations
>>> clean, noisy_list, noise_list = sim.simulate_n(n=50, dim=2)
>>> sim.save_data(
...     filepath='batch_simulation.h5',
...     n_data=noisy_list
... )
Data saved to: ./simulated_data/batch_simulation.h5
>>> # Prevent accidental overwrites
>>> sim.save_data(filepath='important_data.h5', overwrite=False)
FileExistsError: File already exists: ./simulated_data/important_data.h5
Set overwrite=True to overwrite, or provide a different filepath.
>>> # Load saved data later
>>> import h5py
>>> with h5py.File('simulated_data/simulation_001.h5', 'r') as f:
...     energy = f['energy'][:]
...     time = f['time'][:]
...     clean = f['clean_data'][:]
...     noisy = f['simulated_data/000000'][:]
...
...     # Read metadata
...     noise_level = f['metadata'].attrs['noise_level']
...     model_params = f['metadata'].attrs['model_parameters']

Notes

HDF5 File Structure:

/
├── energy              (dataset: 1D array)
├── time                (dataset: 1D array, empty for 1D simulations)
├── clean_data          (dataset: 1D or 2D array)
├── simulated_data/     (group)
│   ├── 000000          (dataset: first noisy realization)
│   ├── 000001          (dataset: second noisy realization)
│   └── ...
└── metadata/           (group with [optional] attributes)
    ├── detection       ('analog' or 'photon_counting')
    ├── noise_level     (analog noise level)
    ├── noise_type      (analog noise type)
    ├── counts_per_delay (photon counting counts)
    ├── count_rate      ([optional] photon counting rate)
    ├── integration_time ([optional] photon counting integration time)
    ├── seed            ([optional] random seed, if set)
    ├── dimension       (1 or 2)
    ├── n_datasets      (number of noisy datasets)
    ├── model_parameters (JSON string of all parameters)
    └── model_name      (model name)

Why HDF5?

HDF5 format chosen because:

  • Efficient for large multidimensional arrays

  • Self-describing (metadata embedded)

  • Widely supported (Python, MATLAB, Igor, etc.)

  • Allows partial loading (don’t need entire file in memory)

  • Standard in scientific computing

Model Parameters:

All model parameters saved as JSON string in metadata for complete reproducibility. Includes:

  • Parameter values

  • vary flags (which parameters were free)

  • Bounds (min/max)

  • Expressions (parameter constraints)

This allows exact recreation of the model used for simulation.
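Storing parameters as a JSON string keeps them readable from any language with an HDF5 binding. The round trip looks roughly like this; the field names (value, vary, min, max, expr) are an assumed layout for illustration, and the exact keys the library writes are not listed here.

```python
import json

# Hypothetical parameter record; the field names are illustrative only
model_parameters = {
    "GLP_01_A": {"value": 12.5, "vary": True, "min": 5.0, "max": 30.0, "expr": None},
    "GLP_01_x0": {"value": 8.3, "vary": False, "min": 5.0, "max": 15.0, "expr": None},
}

encoded = json.dumps(model_parameters)  # a single string, suitable for an attribute
decoded = json.loads(encoded)           # what a reader recovers after loading
```

After loading a file, the same json.loads call would apply to the string read from the model_parameters attribute.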

File Organization:

Default directory structure:

project_directory/
└── simulated_data/
    ├── simulation_001.h5
    ├── simulation_002.h5
    └── batch_001.h5

Keeps simulated data organized and separate from experimental data.
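The placement rule can be pictured with pathlib. This is only a sketch of the behavior described for the filepath parameter: resolve_save_path is a hypothetical helper, not part of trspecfit, and how the library treats nested or absolute paths is not covered here.

```python
from pathlib import Path


def resolve_save_path(filepath=None, folder="simulated_data"):
    """Hypothetical helper: place the file under `folder` unless it is already there."""
    if filepath is None:
        return Path(folder) / f"{folder}.h5"  # default file name
    path = Path(filepath)
    if folder in path.parts:
        return path                           # already inside the folder
    return Path(folder) / path                # move it into the folder


default_path = resolve_save_path()
bare_name = resolve_save_path("simulation_001.h5")
already_placed = resolve_save_path("simulated_data/batch_001.h5")
```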

Multiple Datasets:

When n_data provided (from simulate_n), all realizations saved in simulated_data group with sequential names:

  • 000000, 000001, …, 000099 for 100 datasets

  • Zero-padded for proper sorting

Clean data saved once (same for all realizations).
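Zero-padding matters because HDF5 keys are strings, so sorting them compares characters rather than numbers. A short illustration, with dataset_names as a hypothetical helper:

```python
def dataset_names(n_datasets, width=6):
    """Zero-padded sequential names: '000000', '000001', ..."""
    return [f"{i:0{width}d}" for i in range(n_datasets)]


names = dataset_names(100)
unpadded = [str(i) for i in range(12)]

# Padded names sort in numeric order; unpadded ones put '10' before '2'
padded_in_order = sorted(names) == names
unpadded_in_order = sorted(unpadded) == unpadded
```

This is why the loading snippet can iterate over sorted keys and still get realizations in generation order.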

Loading Data:

Standard h5py usage:

import h5py

with h5py.File('simulated_data/data.h5', 'r') as f:
    # Load axes
    energy = f['energy'][:]
    time = f['time'][:]

    # Load clean data
    clean = f['clean_data'][:]

    # Load all noisy datasets
    noisy_datasets = []
    for key in sorted(f['simulated_data'].keys()):
        noisy_datasets.append(f['simulated_data'][key][:])

    # Load metadata
    detection = f['metadata'].attrs['detection']
    n_datasets = f['metadata'].attrs['n_datasets']

See also

simulate_n

Generate multiple datasets to save

simulate_1d

Generate 1D data

simulate_2d

Generate 2D data

h5py

Python HDF5 library

set_count_rate(count_rate: float, integration_time: float | None = None) None[source]

Update count rate (photon counting only).

Parameters:
  • count_rate (float) – Photon rate in Hz.

  • integration_time (float | None) – Integration time per delay in seconds. If None, uses existing value.

set_counts_per_delay(counts_per_delay: int) None[source]

Update counts per delay (photon counting only).

Parameters:

counts_per_delay (int) – Total photon counts collected per delay step.

set_noise_level(noise_level: float) None[source]

Update noise level (analog detectors only).

Parameters:

noise_level (float) – Noise amplitude relative to signal (0.0-1.0). Larger values = more noise.

set_noise_type(noise_type: str) None[source]

Update noise type (analog detectors only).

Parameters:

noise_type (str) – Noise type: 'poisson', 'gaussian', or 'none'.

set_seed(seed: int | None) None[source]

Update random seed.

Parameters:

seed (int or None) – Random seed for reproducibility. None for non-deterministic.

simulate_1d(t_ind: int = 0) tuple[ndarray, ndarray, ndarray][source]

Simulate 1D spectrum (energy-resolved) at a specific time point.

Generates a single energy-resolved spectrum from the model at the specified time index, adds appropriate noise for the detector type, and stores results for later access.

Parameters:

t_ind (int, default=0) – Time index for which to generate spectrum. For models without time-dependence, use default 0.

Returns:

  • clean_data (ndarray) – Noiseless spectrum from model (shape: [n_energy])

  • noisy_data (ndarray) – Spectrum with added noise (shape: [n_energy])

  • noise (ndarray) – Noise component (noisy - clean, shape: [n_energy])

Examples

>>> # Simulate baseline spectrum
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_1d(t_ind=0)
>>>
>>> # Plot comparison
>>> plt.plot(model.energy, clean, 'k-', label='Clean')
>>> plt.plot(model.energy, noisy, 'r.', label='Noisy', ms=2)
>>> plt.legend()
>>> # Calculate SNR
>>> snr = sim.get_snr()
>>> print(f"Signal-to-noise ratio: {snr:.1f}")
>>> # Simulate different time points
>>> for t_i in [0, 50, 100]:
...     clean, noisy, noise = sim.simulate_1d(t_ind=t_i)
...     plt.plot(model.energy, noisy, label=f't={model.time[t_i]:.1f}')

Notes

Results are stored in simulator attributes for later access:

  • self.data_clean: Last clean spectrum

  • self.data_noisy: Last noisy spectrum

  • self.noise: Last noise realization

These can be accessed without re-simulation:

>>> sim.simulate_1d()
>>> snr = sim.get_snr()  # Uses stored data
>>> sim.plot_comparison(dim=1)  # Uses stored data

See also

simulate_2d

Simulate full 2D spectrum

simulate_n

Generate multiple 1D realizations

plot_comparison

Visualize results

simulate_2d() tuple[ndarray, ndarray, ndarray][source]

Simulate 2D spectrum (time- and energy-resolved).

Generates a complete 2D time- and energy-resolved spectrum from the model, adds appropriate noise for each time point, and stores results.

Returns:

  • clean_data (ndarray) – Noiseless 2D spectrum from model (shape: [n_time, n_energy])

  • noisy_data (ndarray) – 2D spectrum with added noise (shape: [n_time, n_energy])

  • noise (ndarray) – Noise component (noisy - clean, shape: [n_time, n_energy])

Examples

>>> # Basic 2D simulation
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_2d()
>>>
>>> # Visualize with built-in plotter
>>> sim.plot_comparison(dim=2)
>>> # Manual visualization
>>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
>>> ax1.pcolormesh(model.energy, model.time, clean)
>>> ax1.set_title('Clean Model')
>>> ax2.pcolormesh(model.energy, model.time, noisy)
>>> ax2.set_title(f'Noisy (SNR={sim.get_snr():.1f})')
>>> # Test fitting on simulated data
>>> clean, noisy, noise = sim.simulate_2d()
>>> # ... set up fitting ...
>>> file.data = noisy  # Use noisy data for fit
>>> file.fit_2d(model_name='test', stages=2)
>>> # Compare fitted vs. true parameters
>>> # Vary noise level to study impact
>>> for noise_level in [0.01, 0.05, 0.10]:
...     sim.set_noise_level(noise_level)
...     clean, noisy, noise = sim.simulate_2d()
...     snr = sim.get_snr()
...     print(f"Noise level {noise_level:.2f}: SNR = {snr:.1f}")

Notes

Noise Application:

For analog detectors, noise is added independently at each pixel. For photon counting, photons are distributed across all pixels according to the signal probability distribution, then rescaled to the scale of the input for direct comparison.
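The two noise paths can be sketched with the standard library. Everything here is illustrative rather than the library's implementation: the function names are hypothetical, a flat list stands in for the pixel grid, and the analog branch assumes sigma = noise_level × max(|signal|), which may differ from the scaling trspecfit uses.

```python
import random
from collections import Counter


def add_gaussian_noise(clean, noise_level, rng):
    """Analog sketch: independent Gaussian noise at every pixel.

    Assumption: sigma = noise_level * max(|clean|).
    """
    sigma = noise_level * max(abs(v) for v in clean)
    return [v + rng.gauss(0.0, sigma) for v in clean]


def photon_count(clean, total_counts, rng):
    """Photon-counting sketch: drop total_counts photons onto the pixels with
    probability proportional to the signal, then rescale to the input scale."""
    weights = [max(v, 0.0) for v in clean]  # probabilities cannot be negative
    hits = Counter(rng.choices(range(len(clean)), weights=weights, k=total_counts))
    scale = sum(weights) / total_counts     # signal units per photon
    return [hits[i] * scale for i in range(len(clean))]


rng = random.Random(0)
clean = [1.0, 4.0, 10.0, 4.0, 1.0]
noisy_analog = add_gaussian_noise(clean, noise_level=0.05, rng=rng)
noisy_photon = photon_count(clean, total_counts=1000, rng=rng)
```

In the photon-counting branch the total signal is conserved by construction, and the relative fluctuation at each pixel shrinks as total_counts grows.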

Performance:

Simulation time scales with:

  • Model evaluation time (dominates for complex models)

  • Array size (n_time × n_energy)

  • Noise generation method

Typical times:

  • Simple model, 200×500 array: ~0.1-1 second

  • Complex model with time-dependence: ~1-10 seconds

  • Photon counting slightly slower than analog

Memory:

Three arrays stored (clean, noisy, noise), each:

  • Size: n_time × n_energy × 8 bytes (float64)

  • Example: 200×500 = ~0.8 MB per array, ~2.4 MB total

See also

simulate_1d

Simulate single spectrum

simulate_n

Generate multiple 2D realizations

plot_comparison

Visualize results

save_data

Save to HDF5 file

simulate_n(n: int, *, dim: int = 2, t_ind: int = 0, show_progress: bool = True) tuple[ndarray, list[ndarray], list[ndarray]][source]

Generate n simulated datasets with independent noise realizations.

Generates the clean data ONCE from the model, then adds n independent noise realizations. Use for statistical analysis of fitting algorithms and uncertainty quantification or machine learning model training.

Parameters:
  • n (int) – Number of datasets to generate (must be >= 1)

  • dim ({1, 2}, default=2) –

    Dimensionality:

    • 1: Generate 1D spectra

    • 2: Generate 2D spectra

  • t_ind (int, default=0) – Time index for 1D simulations (ignored for dim=2)

  • show_progress (bool, default=True) – Print progress updates during generation

Returns:

  • clean_data (ndarray) – Single clean dataset (1D or 2D depending on dim). Same for all n realizations (generated once).

  • noisy_data_list (list of ndarray) – List of n noisy datasets, each with independent noise. Each element has same shape as clean_data.

  • noise_list (list of ndarray) – List of n noise realizations (noisy - clean for each dataset). Each element has same shape as clean_data.

Examples

>>> # Generate 20 independent noisy datasets
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy_list, noise_list = sim.simulate_n(n=20, dim=2)
>>>
>>> # Record the ground truth before fitting updates the parameter values
>>> true_value = model.lmfit_pars['amplitude'].value
>>>
>>> # Fit each dataset and analyze parameter distribution
>>> fitted_params = []
>>> for noisy_data in noisy_list:
...     file.data = noisy_data
...     file.fit_2d('test', stages=2)
...     fitted_params.append(model.lmfit_pars['amplitude'].value)
>>>
>>> # Check parameter recovery
>>> mean_fitted = np.mean(fitted_params)
>>> std_fitted = np.std(fitted_params)
>>> print(f"True: {true_value:.2f}")
>>> print(f"Mean fitted: {mean_fitted:.2f} ± {std_fitted:.2f}")
>>> # Analyze noise statistics
>>> noise_mean = np.mean(noise_list, axis=0)
>>> noise_std = np.std(noise_list, axis=0)
>>>
>>> # Should be close to zero (unbiased)
>>> print(f"Noise mean: {np.mean(noise_mean):.2e}")
>>> # Should match noise_level * signal scale
>>> print(f"Noise std: {np.mean(noise_std):.2e}")
>>> # Save multiple realizations for later use
>>> clean, noisy_list, noise_list = sim.simulate_n(n=100, dim=2)
>>> sim.save_data(
...     filepath='simulations/batch_001.h5',
...     n_data=noisy_list
... )
>>> # Test convergence of fitted parameters with n
>>> for n_datasets in [5, 10, 20, 50]:
...     clean, noisy_list, _ = sim.simulate_n(n=n_datasets, dim=2)
...     # ... fit each and compute parameter statistics ...
...     print(f"n={n_datasets}: parameter std = {param_std:.3f}")

Notes

Efficiency:

Generating clean data once and adding n noise realizations is much faster than generating n complete simulations:

  • This method: 1 model evaluation + n noise additions

  • n separate simulate_2d calls: n model evaluations + n noise additions

For complex models where evaluation is slow, this can save minutes to hours of computation time.
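The saving comes from counting model evaluations. The toy below uses a stand-in model and Gaussian noise from the standard library; slow_model and simulate_many are hypothetical names and do not reflect how simulate_n is written.

```python
import random

calls = {"model": 0}


def slow_model():
    """Stand-in for an expensive model evaluation; counts how often it runs."""
    calls["model"] += 1
    return [1.0, 4.0, 10.0, 4.0, 1.0]


def simulate_many(n, sigma=0.1, seed=0):
    """Evaluate the model once, then add n independent noise realizations."""
    rng = random.Random(seed)
    clean = slow_model()  # 1 model evaluation
    noisy = [[v + rng.gauss(0.0, sigma) for v in clean] for _ in range(n)]  # n noise additions
    return clean, noisy


clean, noisy_list = simulate_many(n=20)
```

After the call, calls["model"] is 1 even though 20 datasets were produced; calling a single-dataset routine 20 times would have evaluated the model 20 times.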

Statistical Analysis:

This function enables:

  • Monte Carlo analysis of fitting uncertainty

  • Algorithm validation (can the fit recover the true parameters?)

  • Bias detection (systematic fitting errors)

  • Confidence interval validation (coverage probability)

  • Experimental design optimization (required SNR)

Memory Considerations:

All n datasets stored in memory as lists:

  • Memory usage per list: n × (n_time × n_energy × 8 bytes)

  • Example: 100 datasets of 200×500 = ~80 MB per list

For very large n or large arrays, consider:

  • Processing in batches

  • Saving to disk incrementally

  • Using a generator pattern instead of a list
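The generator pattern mentioned here can be sketched as follows. It is a standalone illustration with Gaussian noise from the standard library, not a feature of simulate_n; noisy_realizations is a hypothetical name.

```python
import random


def noisy_realizations(clean, n, sigma=0.1, seed=0):
    """Yield one noisy realization at a time instead of building a list of n."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [v + rng.gauss(0.0, sigma) for v in clean]


clean = [1.0, 4.0, 10.0, 4.0, 1.0]
n = 200

# Accumulate a running sum so that only one realization is alive at any time
running_sum = [0.0] * len(clean)
for noisy in noisy_realizations(clean, n):
    running_sum = [acc + v for acc, v in zip(running_sum, noisy)]
mean_noisy = [acc / n for acc in running_sum]
```

Peak memory stays at roughly one dataset plus the accumulator, regardless of n.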

Progress Display:

When show_progress=True, prints:

  • “Generating clean data from model… Done”

  • “Adding noise to dataset i/n” (updates in place)

  • “Generated n noisy datasets successfully”

Set show_progress=False for batch processing or when redirecting output.

See also

simulate_1d

Single 1D simulation

simulate_2d

Single 2D simulation

save_data

Save multiple datasets to HDF5

simulate_parameter_sweep(parameter_sweep: ParameterSweep, n_realizations: int, *, dim: int = 2, filepath: str = 'ml_training_data.h5', show_progress: bool = True) None[source]

Generate ML training dataset by sweeping parameters.

Processes configurations one at a time, immediately saving to disk. Memory usage remains constant regardless of parameter space size.

Parameters:
  • parameter_sweep (ParameterSweep) – Generator yielding parameter configurations

  • n_realizations (int) – Number of noisy realizations per parameter configuration

  • dim ({1, 2}, default=2) –

    Dimensionality of simulated data:

    • 1: Generate 1D spectra

    • 2: Generate 2D spectra

  • filepath (str, default='ml_training_data.h5') – HDF5 file path for output

  • show_progress (bool, default=True) – Print progress updates during generation

Examples

>>> # Set up parameter space
>>> sweep = ParameterSweep(strategy='random', seed=42)
>>> sweep.add_uniform('GLP_01_A', 5, 30, n_samples=100)
>>> sweep.add_uniform('GLP_01_x0', 5, 15, n_samples=100)
>>>
>>> # Generate dataset
>>> sim = Simulator(model, noise_level=0.05, seed=42)
>>> sim.simulate_parameter_sweep(
...     parameter_sweep=sweep,
...     n_realizations=20,
...     filepath='training_data.h5'
... )
Processing config 1/100: {'GLP_01_A': 12.5, 'GLP_01_x0': 8.3}
  Saved config 1 with 20 realizations
...
Parameter sweep complete!
Generated 100 configs × 20 realizations
Data saved to: ./simulated_data/training_data.h5

Notes

Memory Efficiency: Only one configuration is in memory at a time. Each is immediately written to disk before processing the next. Total memory usage is independent of parameter space size.

Resumability: If interrupted, completed configurations are already saved to disk. Automatic resume is not currently supported; rerunning overwrites the file.

File Structure: See _initialize_sweep_hdf5 for complete HDF5 structure description.

See also

ParameterSweep

Define parameter space to sweep

simulate_n

Generate multiple noisy realizations

_initialize_sweep_hdf5

HDF5 file structure

_append_config_to_hdf5

Incremental saving logic