Simulator Module
Synthetic data generation for testing, validation, and ML training.
This module provides tools for generating realistic simulated spectroscopy data from models, with support for different detector types and noise models. Use it for:
- Testing fitting algorithms with known ground truth
- Exploring parameter sensitivity and identifiability
- Optimizing experimental design (SNR requirements)
- Generating training data for machine learning
- Validating analysis pipelines
Key Features
Two detector types: analog and photon counting
Multiple noise models: Poisson, Gaussian, or none
1D and 2D spectrum simulation
Batch generation for statistical analysis
Parameter sweeping (grid/random/uniform) for ML training
Detector Types
Analog detectors (CCD, photodiodes, lock-in amplifiers):
- Continuous signal output
- Additive noise (Gaussian or Poisson)
- Noise level controlled by the noise_level parameter
Photon counting detectors (APD, photomultiplier, event mode):
- Discrete photon events
- Inherent shot noise (Poisson statistics)
- Total counts per delay (count rate × integration time) determine the signal-to-noise ratio
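To make the analog case concrete, here is a minimal sketch of additive Gaussian noise. It assumes noise_level is a fraction of the peak absolute signal; add_analog_gaussian_noise is a hypothetical helper, and trspecfit's own scaling (and its 'poisson' analog mode) may differ.

```python
import numpy as np


def add_analog_gaussian_noise(clean, noise_level, rng):
    """Hypothetical helper: additive Gaussian noise scaled to the peak signal.

    Assumes noise_level is a fraction of max(|clean|); trspecfit's own
    scaling rule may differ.
    """
    clean = np.asarray(clean, dtype=float)
    sigma = noise_level * np.max(np.abs(clean))  # assumed reference scale
    noise = rng.normal(0.0, sigma, size=clean.shape)
    return clean + noise, noise


rng = np.random.default_rng(0)
energy = np.linspace(-5, 5, 501)
clean = 10.0 * np.exp(-energy**2 / 2)  # toy peak with maximum 10 at energy 0
noisy, noise = add_analog_gaussian_noise(clean, 0.05, rng)
print(noisy.shape, round(float(noise.std()), 2))  # std close to 0.05 * 10 = 0.5
```

Because the noise is drawn independently per pixel, its standard deviation is the same everywhere, which is the "additive" behavior listed above.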
Workflow
Testing and Validation
Create model with trspecfit.mcp.Model
Initialize Simulator with model and noise parameters
Generate data with simulate_1d() or simulate_2d(), or generate multiple realizations with simulate_n()
Save data and ground truth with save_data()
Fit simulated data to validate fitting pipeline
Machine Learning Training Data Generation
Create model with trspecfit.mcp.Model
Define parameter space using trspecfit.utils.sweep.ParameterSweep
Initialize Simulator with model and noise parameters
Generate n realizations for each parameter combination with simulate_parameter_sweep(); data, ground truth, and relevant metadata are saved automatically
Examples
See examples/simulator/ directory for complete workflows.
- class trspecfit.simulator.Simulator(model: Model, detection: str = 'analog', noise_level: float = 0.05, noise_type: str = 'poisson', counts_per_delay: int | None = None, count_rate: float | None = None, integration_time: float | None = None, seed: int | None = None)
Bases: object
Simulate 2D time- and energy-resolved spectroscopy data with noise.
This class generates synthetic data based on a model, adding realistic noise to simulate experimental measurements. Supports both analog detectors (with additive noise) and photon counting detectors (with shot noise).
- Parameters:
model (Model) – Model instance from trspecfit.mcp with defined components and parameters. Must have energy and time axes set before simulation.
detection ({'analog', 'photon_counting'}, default='analog') – Detection technique to simulate: - ‘analog’: Continuous signal with additive noise - ‘photon_counting’: Discrete photon events with Poisson statistics
noise_level (float, default=0.05) – Noise amplitude for analog detectors (0.0-1.0 for relative noise). Larger values = more noise. Ignored for photon_counting.
noise_type ({'poisson', 'gaussian', 'none'}, default='poisson') – Type of noise for analog detectors: - ‘poisson’: Shot noise (realistic for low light) - ‘gaussian’: White noise (simpler, faster) - ‘none’: No noise (testing and debugging) Ignored for photon_counting (always Poisson).
counts_per_delay (int, optional) – Total photon count per time delay (photon_counting only). Directly sets signal-to-noise ratio. Mutually exclusive with count_rate + integration_time.
count_rate (float, optional) – Photon count rate in Hz (photon_counting only). Combined with integration_time to compute counts_per_delay.
integration_time (float, optional) – Integration time per delay point in seconds (photon_counting only). Combined with count_rate to compute counts_per_delay.
seed (int, optional) – Random seed for reproducibility. If None, uses random initialization.
- data_clean
Most recently generated clean (noiseless) data
- Type:
ndarray or None
- data_noisy
Most recently generated noisy data
- Type:
ndarray or None
- noise
Most recently generated noise component (noisy - clean)
- Type:
ndarray or None
Examples
See examples/simulator/ directory for complete workflows.
Notes
Analog vs. Photon Counting:
Analog detectors (CCD, photodiode, lock-in):
- Pros: high dynamic range, simple operation
- Cons: read noise, dark current
- Simulation: continuous signal + additive noise
Photon counting (APD, PMT, event mode):
- Pros: no read noise, single-photon sensitivity
- Cons: dead time, count rate limits, pulse pileup
- Simulation: discrete events following Poisson statistics
Noise Level Selection:
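For analog detectors, noise_level is relative to the signal:
- 0.01 (1%): very clean, ideal conditions
- 0.05 (5%): typical good data
- 0.10 (10%): moderate noise, still fittable
- 0.20 (20%): challenging, may need averaging
For photon counting, the SNR is set by counts_per_delay. Poisson statistics give an amplitude SNR of about √N for N total counts:
- 100 counts: SNR ~ 10 (marginal)
- 1000 counts: SNR ~ 32 (good)
- 10000 counts: SNR ~ 100 (excellent)
These are amplitude ratios; get_snr() reports a power ratio, so its values are not directly comparable.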
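The photon counting figures follow the Poisson √N rule: a count with mean N has standard deviation √N, so the amplitude SNR is N/√N = √N. A short NumPy check (not part of trspecfit):

```python
import numpy as np

# Poisson statistics: a count with mean N has standard deviation sqrt(N),
# so the amplitude SNR is N / sqrt(N) = sqrt(N).
for n in (100, 1000, 10000):
    print(f"{n:>6} counts: SNR ~ {np.sqrt(n):.1f}")

# Monte Carlo check with many independent Poisson totals of mean 1000.
rng = np.random.default_rng(1)
n_mean = 1000
draws = rng.poisson(n_mean, size=100_000)
empirical_snr = draws.mean() / draws.std()
print(f"empirical SNR for N={n_mean}: {empirical_snr:.1f}")  # close to 31.6
```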
Photon Counting Parameter Resolution:
The simulator resolves photon counting parameters as follows:
1. If counts_per_delay is specified directly, use it.
2. Else, if count_rate and integration_time are specified, compute counts_per_delay from them.
3. Else, estimate from the model scale (prints a warning).
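The resolution order can be sketched as a small function. resolve_counts_per_delay is a hypothetical helper, not trspecfit's implementation: it applies the stated precedence but does not enforce the mutual exclusivity mentioned for counts_per_delay, and the model-scale fallback in step 3 is a placeholder because the actual estimate is not documented here.

```python
import warnings


def resolve_counts_per_delay(counts_per_delay=None, count_rate=None,
                             integration_time=None, model_scale=None):
    """Hypothetical sketch of the documented resolution order (not trspecfit code)."""
    # 1. An explicitly given counts_per_delay takes precedence.
    if counts_per_delay is not None:
        return int(counts_per_delay)
    # 2. Otherwise derive it from count_rate (Hz) x integration_time (s).
    if count_rate is not None and integration_time is not None:
        return int(round(count_rate * integration_time))
    # 3. Fall back to a model-derived scale and warn. `model_scale` is a
    #    placeholder for whatever estimate the simulator actually makes.
    if model_scale is None:
        raise ValueError("cannot determine counts_per_delay")
    warnings.warn("counts_per_delay estimated from model scale; pass "
                  "counts_per_delay or (count_rate, integration_time) explicitly")
    return int(round(model_scale))


print(resolve_counts_per_delay(count_rate=5e4, integration_time=0.02))  # 1000
```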
The third case assumes model amplitudes represent realistic count rates, which may not be true. Always specify counts_per_delay or (count_rate, integration_time) explicitly for accurate photon counting simulation.
Memory Usage:
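Large 2D datasets can use significant memory (float64 uses 8 bytes per value):
- One 1000×500 array: ~4 MB; simulate_2d() keeps three such arrays (clean, noisy, noise), ~12 MB
- simulate_n(n=100) at the same size: ~800 MB (100 noisy plus 100 noise arrays)
- Consider smaller grids or batch processing for large n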
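These memory figures are plain arithmetic: number of values × 8 bytes for float64. A small helper (hypothetical, NumPy only) reproduces them, assuming simulate_2d() keeps three arrays and simulate_n() keeps one clean array plus n noisy and n noise arrays, as its return values suggest:

```python
import numpy as np


def array_megabytes(n_time, n_energy, dtype=np.float64):
    """Size of one n_time x n_energy array in MB (1 MB = 10**6 bytes)."""
    return n_time * n_energy * np.dtype(dtype).itemsize / 1e6


per_array = array_megabytes(1000, 500)  # 1000 * 500 * 8 bytes = 4.0 MB
single_run = 3 * per_array              # clean + noisy + noise = 12.0 MB
n = 100
batch = per_array * (1 + 2 * n)         # 1 clean + n noisy + n noise = 804.0 MB
print(per_array, single_run, batch)
```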
See also
trspecfit.mcp.Model – Model class for simulation
simulate_1d – Generate 1D spectrum
simulate_2d – Generate 2D spectrum
simulate_n – Generate multiple realizations
save_data – Save simulated data to HDF5
- add_noise(clean_data: ndarray, dim: int = 2) -> tuple[ndarray, ndarray]
Add noise to clean data based on detection technique.
- generate_clean_data(dim: int = 2, t_ind: int = 0) -> ndarray
Generate clean data from model (no noise).
- get_snr(scale: str = 'linear') -> float
Calculate Signal-to-Noise Ratio (SNR).
Computes SNR from the most recently simulated data using power-based definition: SNR = signal_power / noise_power.
- Parameters:
scale ({'linear', 'dB'}, default='linear') – Output scale: - ‘linear’: SNR as ratio (e.g., 25.0) - ‘dB’: SNR in decibels (e.g., 13.98 dB)
- Returns:
SNR value in requested scale. Returns np.inf if noise_power is exactly zero.
- Return type:
float
- Raises:
ValueError – If no simulated data available (must call simulate_1d/2d/n first), or if scale is not ‘linear’ or ‘dB’.
Examples
>>> # Calculate SNR after simulation
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_2d()
>>>
>>> snr_linear = sim.get_snr(scale='linear')
>>> print(f"SNR: {snr_linear:.1f}")
SNR: 25.3
>>>
>>> snr_db = sim.get_snr(scale='dB')
>>> print(f"SNR: {snr_db:.1f} dB")
SNR: 14.0 dB
>>> # Compare SNR across noise levels
>>> for noise_level in [0.01, 0.05, 0.10, 0.20]:
...     sim.set_noise_level(noise_level)
...     _ = sim.simulate_2d()
...     snr = sim.get_snr()
...     print(f"Noise {noise_level:.2f}: SNR = {snr:.1f}")
Noise 0.01: SNR = 625.0
Noise 0.05: SNR = 25.0
Noise 0.10: SNR = 6.2
Noise 0.20: SNR = 1.6
>>> # Plot SNR vs photon count
>>> import matplotlib.pyplot as plt
>>> counts = [100, 500, 1000, 5000, 10000]
>>> snrs = []
>>> for count in counts:
...     sim = Simulator(model, detection='photon_counting',
...                     counts_per_delay=count)
...     _ = sim.simulate_2d()
...     snrs.append(sim.get_snr())
>>> plt.loglog(counts, snrs, 'o-')
>>> plt.xlabel('Counts per delay')
>>> plt.ylabel('SNR')
Notes
SNR Definition:
Uses power-based (energy) definition:
SNR_linear = mean(signal²) / mean(noise²)
SNR_dB = 10 × log₁₀(SNR_linear)
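The linear power ratio is the square of the amplitude (RMS) ratio, so a power SNR of 100 corresponds to an amplitude SNR of 10. In dB the two conventions agree, because 10 log₁₀(P_signal / P_noise) = 20 log₁₀(A_signal / A_noise). The power-based definition is standard in signal processing and communications.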
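A minimal NumPy sketch of the power-based definition, assuming "signal" means the clean data and "noise" means noisy − clean (as defined for the noise attribute). snr_power is an illustrative stand-in for get_snr, not its implementation; the last lines check that the dB value matches 20 log₁₀ of the RMS amplitude ratio.

```python
import numpy as np


def snr_power(clean, noise, scale="linear"):
    """Power-based SNR: mean(clean**2) / mean(noise**2), optionally in dB."""
    signal_power = np.mean(np.square(clean))
    noise_power = np.mean(np.square(noise))
    if noise_power == 0:
        return np.inf
    ratio = signal_power / noise_power
    if scale == "linear":
        return ratio
    if scale == "dB":
        return 10.0 * np.log10(ratio)
    raise ValueError("scale must be 'linear' or 'dB'")


clean = np.full(1000, 5.0)
noise = np.resize([0.5, -0.5], 1000)      # RMS noise amplitude 0.5
print(snr_power(clean, noise))            # power ratio (5 / 0.5)**2 = 100
print(snr_power(clean, noise, "dB"))      # 20 dB

# Amplitude (RMS) ratio is 10; 20 * log10(10) gives the same 20 dB.
amp_ratio = np.sqrt(np.mean(clean**2)) / np.sqrt(np.mean(noise**2))
print(20 * np.log10(amp_ratio))
```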
Interpretation:
Linear scale:
- SNR = 1: signal and noise have equal power (marginal)
- SNR = 10: signal power 10× noise power (good)
- SNR = 100: signal power 100× noise power (excellent)
dB scale:
- 0 dB: equal signal and noise power
- 10 dB: 10× signal power (good)
- 20 dB: 100× signal power (excellent)
- Each 10 dB = 10× power ratio
Typical Values:
For spectroscopy data:
- SNR < 5 (< 7 dB): difficult to fit reliably
- SNR 5-20 (7-13 dB): good quality, typical experimental data
- SNR 20-100 (13-20 dB): high quality
- SNR > 100 (> 20 dB): exceptional, near ideal
Limitations:
This is a global SNR averaged over the entire spectrum. Local SNR may vary significantly, especially for:
- Weak features vs. strong peaks
- Time-dependent signals (varying amplitude)
- Non-uniform noise (detector artifacts)
For accurate local SNR, compute on regions of interest separately.
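Computing SNR on a region of interest amounts to applying the same power ratio to a masked subset of pixels. A minimal NumPy sketch on a toy spectrum with one strong and one weak peak; roi_snr is a hypothetical helper and the data are synthetic:

```python
import numpy as np


def roi_snr(clean, noise, mask):
    """Power-based SNR restricted to the pixels selected by a boolean mask."""
    return np.mean(clean[mask] ** 2) / np.mean(noise[mask] ** 2)


rng = np.random.default_rng(2)
energy = np.linspace(0, 20, 2001)
# Toy spectrum: a strong peak at 5 and a weak peak at 15 (arbitrary units)
clean = 10 * np.exp(-(energy - 5) ** 2) + 0.5 * np.exp(-(energy - 15) ** 2)
noise = rng.normal(0.0, 0.2, size=energy.shape)  # same noise level everywhere

snr_global = np.mean(clean**2) / np.mean(noise**2)
snr_strong = roi_snr(clean, noise, np.abs(energy - 5) < 1)
snr_weak = roi_snr(clean, noise, np.abs(energy - 15) < 1)
print(f"global {snr_global:.0f}, strong peak {snr_strong:.0f}, weak peak {snr_weak:.1f}")
```

With identical noise everywhere, the weak peak's local SNR falls well below the global value, which is exactly the case the global number hides.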
See also
simulate_1d – Must be called before get_snr
simulate_2d – Must be called before get_snr
plot_comparison – Shows SNR in title
- plot_comparison(t_ind: int = 0, dim: int = 1, snr_scale: str = 'linear', *, save_img: int = 0, config: PlotConfig | None = None, **plot_kwargs) -> None
Plot comparison of clean vs noisy data.
Creates visualization showing clean model data, noisy simulated data, and noise component side-by-side. Essential for visually assessing simulation quality and noise characteristics.
- Parameters:
t_ind (int, default=0) – Time index for 1D plots (ignored for dim=2)
dim ({1, 2}, default=1) – Dimensionality: - 1: Create 1D plot with clean, noisy, and noise curves - 2: Create three-panel 2D plot (clean, noisy, noise)
snr_scale ({'linear', 'dB'}, default='linear') – Scale for SNR display in title: - ‘linear’: Show as ratio (e.g., “SNR: 25.0 linear”) - ‘dB’: Show in decibels (e.g., “SNR: 14.0 dB”)
save_img (int, default=0) – 0: display, 1: save+display, -1: save only, -2: close (no display/save)
config (PlotConfig, optional) – Override the model’s inherited plot configuration for this call. If None, uses the model’s own plot_config.
**plot_kwargs (dict) – Per-call overrides for any PlotConfig field (e.g. z_colormap, ticksize). Applied on top of config.
Examples
>>> # 1D comparison
>>> sim = Simulator(model, noise_level=0.05)
>>> _ = sim.simulate_1d(t_ind=0)
>>> sim.plot_comparison(dim=1)
>>> # 2D comparison with dB scale
>>> sim = Simulator(model, noise_level=0.05)
>>> _ = sim.simulate_2d()
>>> sim.plot_comparison(dim=2, snr_scale='dB')
>>> # Compare different noise levels visually
>>> import matplotlib.pyplot as plt
>>> fig, axes = plt.subplots(3, 1, figsize=(10, 12))
>>> for i, noise_level in enumerate([0.01, 0.05, 0.10]):
...     sim.set_noise_level(noise_level)
...     _ = sim.simulate_1d()
...     # ... manual plotting on axes[i] ...
>>> # Check photon counting vs analog
>>> sim_analog = Simulator(model, detection='analog', noise_level=0.05)
>>> sim_photon = Simulator(model, detection='photon_counting',
...                        counts_per_delay=1000)
>>> _ = sim_analog.simulate_2d()
>>> _ = sim_photon.simulate_2d()
>>> # ... compare visually ...
Notes
1D Plot Layout:
Single plot with three traces:
- Clean: black line (ground truth)
- Noisy: red scatter points (simulated data)
- Noise: gray line (noise component)
Scatter points for noisy data help visualize noise granularity.
2D Plot Layout:
Three side-by-side panels:
- Left: clean model data
- Center: noisy simulated data (with SNR in title)
- Right: noise component (difference)
All panels use the same colormap from model.plot_config for consistency.
Visual Assessment:
A good simulation should show:
- Noisy data follows the clean data trend
- Noise is randomly distributed (no patterns)
- SNR appropriate for the intended use case
- Peak features still distinguishable in noisy data
If noise dominates the signal (SNR << 1), features may be completely obscured; increase the signal or reduce the noise.
Configuration:
Unless overridden by config or plot_kwargs, the plot uses model.plot_config for:
- Axis labels (energy/time labels)
- Axis direction (e.g., reversed energy)
- Colormap (for 2D plots)
- DPI settings
This ensures consistency with other trspecfit plots.
See also
simulate_1d – Generate 1D data to plot
simulate_2d – Generate 2D data to plot
get_snr – SNR calculation shown in title
- save_data(*, filepath: str | None = None, save_format: str = 'hdf5', n_data: list[ndarray] | None = None, overwrite: bool = True, show_output: int = 1) -> None
Save simulated data to file with metadata.
Exports simulated data in HDF5 format with complete metadata including model parameters, noise settings, and experimental axes. Essential for sharing simulated datasets and ensuring reproducibility.
- Parameters:
filepath (str or Path, optional) – Path where the data is saved. If None, uses the default './simulated_data/simulated_data.h5'. If the provided path does not include a 'simulated_data' directory, the file is placed there automatically.
save_format (str, default='hdf5') – File format. Currently only 'hdf5' is supported; other formats (.mat, .npz, etc.) may be added in the future.
n_data (list of ndarray, optional) – Multiple noisy datasets from simulate_n() to save. If None, saves single dataset from simulate_1d() or simulate_2d().
overwrite (bool, default=True) – If True, overwrite existing files. If False, raise FileExistsError if file exists.
show_output (int, default=1) –
Output mode:
0: Silent / programmatic / API mode – no prints
1: Interactive / notebook / UI mode – show timing and save confirmation
- Raises:
ValueError – If no simulated data available (must call simulate first)
FileExistsError – If file exists and overwrite=False
Examples
>>> # Save single simulation (filepath is keyword-only)
>>> sim = Simulator(model, noise_level=0.05, seed=42)
>>> clean, noisy, noise = sim.simulate_2d()
>>> sim.save_data(filepath='simulation_001.h5')
Data saved to: ./simulated_data/simulation_001.h5
>>> # Save multiple realizations
>>> clean, noisy_list, noise_list = sim.simulate_n(n=50, dim=2)
>>> sim.save_data(
...     filepath='batch_simulation.h5',
...     n_data=noisy_list
... )
Data saved to: ./simulated_data/batch_simulation.h5
>>> # Prevent accidental overwrites
>>> sim.save_data(filepath='important_data.h5', overwrite=False)
Traceback (most recent call last):
    ...
FileExistsError: File already exists: ./simulated_data/important_data.h5
Set overwrite=True to overwrite, or provide a different filepath.
>>> # Load saved data later
>>> import h5py
>>> with h5py.File('simulated_data/simulation_001.h5', 'r') as f:
...     energy = f['energy'][:]
...     time = f['time'][:]
...     clean = f['clean_data'][:]
...     noisy = f['simulated_data/000000'][:]
...     # Read metadata
...     noise_level = f['metadata'].attrs['noise_level']
...     model_params = f['metadata'].attrs['model_parameters']  # JSON string
Notes
HDF5 File Structure:
/
├── energy              (dataset: 1D array)
├── time                (dataset: 1D array, empty for 1D simulations)
├── clean_data          (dataset: 1D or 2D array)
├── simulated_data/     (group)
│   ├── 000000          (dataset: first noisy realization)
│   ├── 000001          (dataset: second noisy realization)
│   └── ...
└── metadata/           (group with attributes; [optional] ones may be absent)
    ├── detection        ('analog' or 'photon_counting')
    ├── noise_level      (analog noise level)
    ├── noise_type       (analog noise type)
    ├── counts_per_delay (photon counting counts)
    ├── count_rate       ([optional] photon counting rate)
    ├── integration_time ([optional] photon counting integration time)
    ├── seed             ([optional] random seed, if set)
    ├── dimension        (1 or 2)
    ├── n_datasets       (number of noisy datasets)
    ├── model_parameters (JSON string of all parameters)
    └── model_name       (model name)
Why HDF5?
HDF5 format chosen because it is:
- Efficient for large multidimensional arrays
- Self-describing (metadata embedded)
- Widely supported (Python, MATLAB, Igor, etc.)
- Partially loadable (the entire file need not be in memory)
- Standard in scientific computing
Model Parameters:
All model parameters saved as JSON string in metadata for complete reproducibility. Includes:
Parameter values
vary flags (which parameters were free)
Bounds (min/max)
Expressions (parameter constraints)
This allows exact recreation of the model used for simulation.
File Organization:
Default directory structure:
project_directory/
└── simulated_data/
    ├── simulation_001.h5
    ├── simulation_002.h5
    └── batch_001.h5
This keeps simulated data organized and separate from experimental data.
Multiple Datasets:
When n_data provided (from simulate_n), all realizations saved in simulated_data group with sequential names:
000000, 000001, …, 000099 for 100 datasets
Zero-padded for proper sorting
Clean data saved once (same for all realizations).
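Zero padding matters because HDF5 group keys are strings: six-digit names (matching 000000 above) sort lexicographically in numeric order, so sorted(f['simulated_data'].keys()) returns realizations in sequence. A quick plain-Python check:

```python
# Six-digit zero-padded names, as used for the realization datasets.
names = [f"{i:06d}" for i in range(100)]
print(names[0], names[-1])   # 000000 000099
print(sorted(names) == names)

# Without padding, string sorting breaks numeric order.
unpadded = [str(i) for i in range(12)]
print(sorted(unpadded)[:4])  # ['0', '1', '10', '11']
```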
Loading Data:
Standard h5py usage:
import h5py

with h5py.File('simulated_data/data.h5', 'r') as f:
    # Load axes
    energy = f['energy'][:]
    time = f['time'][:]
    # Load clean data
    clean = f['clean_data'][:]
    # Load all noisy datasets
    noisy_datasets = []
    for key in sorted(f['simulated_data'].keys()):
        noisy_datasets.append(f['simulated_data'][key][:])
    # Load metadata
    detection = f['metadata'].attrs['detection']
    n_datasets = f['metadata'].attrs['n_datasets']
See also
simulate_n – Generate multiple datasets to save
simulate_1d – Generate 1D data
simulate_2d – Generate 2D data
h5py – Python HDF5 library
- set_count_rate(count_rate: float, integration_time: float | None = None) -> None
Update count rate (photon counting only).
- set_counts_per_delay(counts_per_delay: int) -> None
Update counts per delay (photon counting only).
- Parameters:
counts_per_delay (int) – Total photon counts collected per delay step.
- set_noise_level(noise_level: float) -> None
Update noise level (analog detectors only).
- Parameters:
noise_level (float) – Noise amplitude for analog detectors, relative to the signal (see the noise_level class parameter).
- set_noise_type(noise_type: str) -> None
Update noise type (analog detectors only).
- Parameters:
noise_type (str) – Noise distribution: 'poisson', 'gaussian', or 'none' (see the noise_type class parameter).
- set_seed(seed: int | None) -> None
Update random seed.
- Parameters:
seed (int or None) – Random seed for reproducibility. None for non-deterministic.
- simulate_1d(t_ind: int = 0) -> tuple[ndarray, ndarray, ndarray]
Simulate 1D spectrum (energy-resolved) at a specific time point.
Generates a single energy-resolved spectrum from the model at the specified time index, adds appropriate noise for the detector type, and stores results for later access.
- Parameters:
t_ind (int, default=0) – Time index for which to generate spectrum. For models without time-dependence, use default 0.
- Returns:
clean_data (ndarray) – Noiseless spectrum from model (shape: [n_energy])
noisy_data (ndarray) – Spectrum with added noise (shape: [n_energy])
noise (ndarray) – Noise component (noisy - clean, shape: [n_energy])
Examples
>>> # Simulate baseline spectrum
>>> import matplotlib.pyplot as plt
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_1d(t_ind=0)
>>>
>>> # Plot comparison
>>> plt.plot(model.energy, clean, 'k-', label='Clean')
>>> plt.plot(model.energy, noisy, 'r.', label='Noisy', ms=2)
>>> plt.legend()
>>> # Calculate SNR
>>> snr = sim.get_snr()
>>> print(f"Signal-to-noise ratio: {snr:.1f}")
>>> # Simulate different time points
>>> for t_i in [0, 50, 100]:
...     clean, noisy, noise = sim.simulate_1d(t_ind=t_i)
...     plt.plot(model.energy, noisy, label=f't={model.time[t_i]:.1f}')
Notes
Results are stored in simulator attributes for later access:
- self.data_clean: last clean spectrum
- self.data_noisy: last noisy spectrum
- self.noise: last noise realization
These can be accessed without re-simulating:
>>> _ = sim.simulate_1d()
>>> snr = sim.get_snr()  # Uses stored data
>>> sim.plot_comparison(dim=1)  # Uses stored data
See also
simulate_2d – Simulate full 2D spectrum
simulate_n – Generate multiple 1D realizations
plot_comparison – Visualize results
- simulate_2d() -> tuple[ndarray, ndarray, ndarray]
Simulate 2D spectrum (time- and energy-resolved).
Generates a complete 2D time- and energy-resolved spectrum from the model, adds appropriate noise for each time point, and stores results.
- Returns:
clean_data (ndarray) – Noiseless 2D spectrum from model (shape: [n_time, n_energy])
noisy_data (ndarray) – 2D spectrum with added noise (shape: [n_time, n_energy])
noise (ndarray) – Noise component (noisy - clean, shape: [n_time, n_energy])
Examples
>>> # Basic 2D simulation
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy, noise = sim.simulate_2d()
>>>
>>> # Visualize with built-in plotter
>>> sim.plot_comparison(dim=2)
>>> # Manual visualization
>>> import matplotlib.pyplot as plt
>>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
>>> ax1.pcolormesh(model.energy, model.time, clean)
>>> ax1.set_title('Clean Model')
>>> ax2.pcolormesh(model.energy, model.time, noisy)
>>> ax2.set_title(f'Noisy (SNR={sim.get_snr():.1f})')
>>> # Test fitting on simulated data
>>> clean, noisy, noise = sim.simulate_2d()
>>> # ... set up fitting ...
>>> file.data = noisy  # Use noisy data for fit
>>> file.fit_2d(model_name='test', stages=2)
>>> # Compare fitted vs. true parameters
>>> # Vary noise level to study impact
>>> for noise_level in [0.01, 0.05, 0.10]:
...     sim.set_noise_level(noise_level)
...     clean, noisy, noise = sim.simulate_2d()
...     snr = sim.get_snr()
...     print(f"Noise level {noise_level:.2f}: SNR = {snr:.1f}")
Notes
Noise Application:
For analog detectors, noise is added independently at each pixel. For photon counting, photons are distributed across all pixels according to the signal's probability distribution, then rescaled to the input scale for direct comparison.
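One way to distribute photons according to the signal and rescale is a multinomial draw per delay. The sketch below assumes that approach (the simulator might instead use independent Poisson draws per pixel); photon_counting_noise is a hypothetical helper, not trspecfit's code, and it requires a non-negative signal with a positive sum in every delay row.

```python
import numpy as np


def photon_counting_noise(clean, counts_per_delay, rng):
    """Hypothetical sketch of per-delay photon counting noise.

    Each row (one delay) of a non-negative `clean` array is treated as a
    probability distribution over energy pixels; counts_per_delay photons are
    drawn from it (multinomial), then rescaled back to the input units.
    Assumes every row has a positive sum. Always returns 2D arrays.
    """
    clean = np.atleast_2d(np.asarray(clean, dtype=float))
    noisy = np.empty_like(clean)
    for i, row in enumerate(clean):
        total = row.sum()
        counts = rng.multinomial(counts_per_delay, row / total)  # discrete photon events
        noisy[i] = counts / counts_per_delay * total              # back to the input scale
    return noisy, noisy - clean


rng = np.random.default_rng(3)
energy = np.linspace(-5, 5, 200)
clean = np.vstack([a * np.exp(-energy**2) + 0.1 for a in (1.0, 2.0, 4.0)])  # 3 delays
noisy, noise = photon_counting_noise(clean, counts_per_delay=1000, rng=rng)
print(noisy.shape, np.allclose(noisy.sum(axis=1), clean.sum(axis=1)))
```

Because each row is rescaled by its own total, the integrated intensity per delay is preserved exactly while the per-pixel values fluctuate; more counts per delay means smaller relative fluctuations.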
Performance:
Simulation time scales with:
- Model evaluation time (dominates for complex models)
- Array size (n_time × n_energy)
- Noise generation method
Typical times:
- Simple model, 200×500 array: ~0.1-1 second
- Complex model with time-dependence: ~1-10 seconds
- Photon counting is slightly slower than analog
Memory:
Three arrays are stored (clean, noisy, noise), each:
- Size: n_time × n_energy × 8 bytes (float64)
- Example: 200×500 = ~0.8 MB per array, ~2.4 MB total
See also
simulate_1d – Simulate single spectrum
simulate_n – Generate multiple 2D realizations
plot_comparison – Visualize results
save_data – Save to HDF5 file
- simulate_n(n: int, *, dim: int = 2, t_ind: int = 0, show_progress: bool = True) -> tuple[ndarray, list[ndarray], list[ndarray]]
Generate n simulated datasets with independent noise realizations.
Generates the clean data once from the model, then adds n independent noise realizations. Use it for statistical analysis of fitting algorithms, uncertainty quantification, or generating machine learning training data.
- Parameters:
n (int) – Number of datasets to generate (must be >= 1)
dim ({1, 2}, default=2) – Dimensionality: - 1: Generate 1D spectra - 2: Generate 2D spectra
t_ind (int, default=0) – Time index for 1D simulations (ignored for dim=2)
show_progress (bool, default=True) – Print progress updates during generation
- Returns:
clean_data (ndarray) – Single clean dataset (1D or 2D depending on dim). Same for all n realizations (generated once).
noisy_data_list (list of ndarray) – List of n noisy datasets, each with independent noise. Each element has same shape as clean_data.
noise_list (list of ndarray) – List of n noise realizations (noisy - clean for each dataset). Each element has same shape as clean_data.
Examples
>>> # Generate 20 independent noisy datasets
>>> import numpy as np
>>> sim = Simulator(model, noise_level=0.05)
>>> clean, noisy_list, noise_list = sim.simulate_n(n=20, dim=2)
>>>
>>> # Record the true value BEFORE fitting, since the loop below
>>> # reads fitted values back from model.lmfit_pars
>>> true_value = model.lmfit_pars['amplitude'].value
>>>
>>> # Fit each dataset and analyze parameter distribution
>>> fitted_params = []
>>> for noisy_data in noisy_list:
...     file.data = noisy_data
...     file.fit_2d('test', stages=2)
...     fitted_params.append(model.lmfit_pars['amplitude'].value)
>>>
>>> # Check parameter recovery
>>> mean_fitted = np.mean(fitted_params)
>>> std_fitted = np.std(fitted_params)
>>> print(f"True: {true_value:.2f}")
>>> print(f"Mean fitted: {mean_fitted:.2f} ± {std_fitted:.2f}")
>>> # Analyze noise statistics
>>> noise_mean = np.mean(noise_list, axis=0)
>>> noise_std = np.std(noise_list, axis=0)
>>>
>>> # Should be close to zero (unbiased)
>>> print(f"Noise mean: {np.mean(noise_mean):.2e}")
>>> # Should match noise_level * signal scale
>>> print(f"Noise std: {np.mean(noise_std):.2e}")
>>> # Save multiple realizations for later use
>>> clean, noisy_list, noise_list = sim.simulate_n(n=100, dim=2)
>>> sim.save_data(
...     filepath='simulations/batch_001.h5',
...     n_data=noisy_list
... )
>>> # Test convergence of fitted parameters with n
>>> for n_datasets in [5, 10, 20, 50]:
...     clean, noisy_list, _ = sim.simulate_n(n=n_datasets, dim=2)
...     # ... fit each and compute parameter statistics ...
...     print(f"n={n_datasets}: parameter std = {param_std:.3f}")
Notes
Efficiency:
Generating clean data once and adding n noise realizations is much faster than generating n complete simulations:
This method: 1 model evaluation + n noise additions
n separate simulate_2d calls: n model evaluations + n noise additions
For complex models where evaluation is slow, this can save minutes to hours of computation time.
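The one-evaluation, n-noise-draws pattern can be sketched generically. simulate_n_sketch, evaluate_model, and add_noise are hypothetical stand-ins (a toy Gaussian peak and fixed Gaussian noise), used only to show that the model runs once regardless of n.

```python
import numpy as np


def simulate_n_sketch(evaluate_model, add_noise, n, rng):
    """Hypothetical sketch: one model evaluation, then n independent noise draws."""
    clean = evaluate_model()                   # expensive step, done once
    noisy_list, noise_list = [], []
    for _ in range(n):
        noisy, noise = add_noise(clean, rng)   # cheap step, repeated n times
        noisy_list.append(noisy)
        noise_list.append(noise)
    return clean, noisy_list, noise_list


calls = {"model": 0}


def evaluate_model():
    """Stand-in for an expensive model evaluation; counts how often it runs."""
    calls["model"] += 1
    energy = np.linspace(-5, 5, 300)
    return np.exp(-energy**2)


def add_noise(clean, rng):
    noise = rng.normal(0.0, 0.05, size=clean.shape)
    return clean + noise, noise


clean, noisy_list, noise_list = simulate_n_sketch(
    evaluate_model, add_noise, n=20, rng=np.random.default_rng(4))
print(calls["model"], len(noisy_list))  # 1 20
```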
Statistical Analysis:
This function enables:
- Monte Carlo analysis of fitting uncertainty
- Algorithm validation (can the true parameters be recovered?)
- Bias detection (systematic fitting errors)
- Confidence interval validation (coverage probability)
- Experimental design optimization (required SNR)
Memory Considerations:
All n datasets are stored in memory as lists:
- Memory usage: about 2 × n × (n_time × n_energy × 8 bytes), since both noisy_data_list and noise_list are kept
- Example: 100 datasets of 200×500: ~80 MB of noisy data plus ~80 MB of noise, ~160 MB total
For very large n or large arrays, consider:
- Processing in batches
- Saving to disk incrementally
- Using a generator pattern instead of a list
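A generator-based alternative can be sketched with NumPy alone. iter_noisy_batches is hypothetical, uses a fixed absolute noise_sigma rather than trspecfit's noise settings, and yields realizations in batches so the caller can fit or write each batch instead of holding all n in memory.

```python
import numpy as np


def iter_noisy_batches(clean, n, batch_size, noise_sigma, seed=None):
    """Hypothetical generator: yield (start_index, batch), batch shape (b, *clean.shape)."""
    rng = np.random.default_rng(seed)
    for start in range(0, n, batch_size):
        b = min(batch_size, n - start)
        noise = rng.normal(0.0, noise_sigma, size=(b, *clean.shape))
        yield start, clean + noise  # clean broadcasts over the leading batch axis


clean = np.ones((200, 500))
sizes = []
for start, batch in iter_noisy_batches(clean, n=25, batch_size=10, noise_sigma=0.05, seed=0):
    # Fit or write `batch` to disk here instead of keeping it.
    sizes.append(batch.shape[0])
print(sizes, sum(sizes))  # [10, 10, 5] 25
```

Peak memory is then set by batch_size rather than n, at the cost of not having all realizations available at once.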
Progress Display:
When show_progress=True, prints:
- “Generating clean data from model… Done”
- “Adding noise to dataset i/n” (updates in place)
- “Generated n noisy datasets successfully”
Set show_progress=False for batch processing or when redirecting output.
See also
simulate_1d – Single 1D simulation
simulate_2d – Single 2D simulation
save_data – Save multiple datasets to HDF5
- simulate_parameter_sweep(parameter_sweep: ParameterSweep, n_realizations: int, *, dim: int = 2, filepath: str = 'ml_training_data.h5', show_progress: bool = True) -> None
Generate ML training dataset by sweeping parameters.
Processes configurations one at a time, immediately saving to disk. Memory usage remains constant regardless of parameter space size.
- Parameters:
parameter_sweep (ParameterSweep) – Generator yielding parameter configurations
n_realizations (int) – Number of noisy realizations per parameter configuration
dim ({1, 2}, default=2) – Dimensionality of simulated data: - 1: Generate 1D spectra - 2: Generate 2D spectra
filepath (str, default='ml_training_data.h5') – HDF5 file path for output
show_progress (bool, default=True) – Print progress updates during generation
Examples
>>> # Set up parameter space
>>> sweep = ParameterSweep(strategy='random', seed=42)
>>> sweep.add_uniform('GLP_01_A', 5, 30, n_samples=100)
>>> sweep.add_uniform('GLP_01_x0', 5, 15, n_samples=100)
>>>
>>> # Generate dataset
>>> sim = Simulator(model, noise_level=0.05, seed=42)
>>> sim.simulate_parameter_sweep(
...     parameter_sweep=sweep,
...     n_realizations=20,
...     filepath='training_data.h5'
... )
Processing config 1/100: {'GLP_01_A': 12.5, 'GLP_01_x0': 8.3}
Saved config 1 with 20 realizations
...
Parameter sweep complete!
Generated 100 configs × 20 realizations
Data saved to: ./simulated_data/training_data.h5
Notes
Memory Efficiency: Only one configuration is in memory at a time. Each is immediately written to disk before processing the next. Total memory usage is independent of parameter space size.
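The streaming pattern can be sketched as below. This is not trspecfit's code: treating the grid strategy as a Cartesian product and the random strategy as independent uniform draws per configuration are assumptions (the example above, with 100 samples per parameter under strategy='random', yields 100 configurations, consistent with the latter), and write is a hypothetical stand-in for the HDF5 append step.

```python
import itertools

import numpy as np


def grid_configs(ranges):
    """Assumed 'grid' strategy: Cartesian product of per-parameter value lists."""
    names = list(ranges)
    for values in itertools.product(*(ranges[k] for k in names)):
        yield dict(zip(names, values))


def random_configs(bounds, n_samples, seed=None):
    """Assumed 'random' strategy: n_samples draws, each parameter uniform in its bounds."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        yield {k: float(rng.uniform(lo, hi)) for k, (lo, hi) in bounds.items()}


def stream_sweep(configs, simulate, write, n_realizations):
    """Process one configuration at a time; only the current one is held in memory."""
    n_done = 0
    for idx, cfg in enumerate(configs):
        realizations = [simulate(cfg) for _ in range(n_realizations)]
        write(idx, cfg, realizations)  # e.g. append to an HDF5 file, then discard
        n_done += 1
    return n_done


# Toy usage: a fake "simulate" and a "write" that only logs sizes.
rng = np.random.default_rng(5)
written = []


def simulate(cfg):
    return cfg["GLP_01_A"] + rng.normal(size=10)


def write(idx, cfg, data):
    written.append((idx, len(data)))


bounds = {"GLP_01_A": (5, 30), "GLP_01_x0": (5, 15)}
n = stream_sweep(random_configs(bounds, 100, seed=42), simulate, write, n_realizations=20)
print(n, written[0])  # 100 (0, 20)
```

Because configurations come from a generator and each batch of realizations is handed off immediately, memory stays bounded by one configuration's worth of data.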
Resumability: If interrupted, completed configurations are already saved to disk. Automatic resume is not currently supported; rerunning overwrites the file.
File Structure: See _initialize_sweep_hdf5 for complete HDF5 structure description.
See also
ParameterSweep – Define parameter space to sweep
simulate_n – Generate multiple noisy realizations
_initialize_sweep_hdf5 – HDF5 file structure
_append_config_to_hdf5 – Incremental saving logic