Repo Architecture

Orientation guide for anyone (human or LLM) modifying the trspecfit codebase. Start here before writing new code — the goal is to avoid reinventing wheels and to put new code in the right layer.

For what model combinations are actually supported, see supported_models.md. For how the compiled 2D evaluator was designed, see lowered_evaluator.md.

Two-layer design

trspecfit has two distinct layers, and the distinction matters for every change you make:

  1. Authoring / user-facing layer (mcp.py, trspecfit.py, simulator.py, YAML parsing in utils/parsing.py). Optimized for readability, clear error messages, and interactive exploration. Performance is secondary — these modules run once per model, not once per optimizer iteration. Users poke at objects here; tracebacks land here.

  2. Compiled hot-path layer (graph_ir.py, eval_1d.py, eval_2d.py, and the numeric bodies in functions/). Optimized for performance. These run inside the residual loop, possibly millions of times per fit. Array-oriented, no Python dicts/strings in the inner loops, no object attribute lookups where a packed array will do. See lowered_evaluator.md for the motivation.

spectra.py is the bridge — it hands the current parameter vector from fitlib to the compiled evaluator.

New features are usually added first in the authoring layer (mcp), then lowered into the compiled layer once the semantics are stable.

Top-level modules (src/trspecfit/)

trspecfit.py — user entry point

Defines Project and File. This is the API most users see in notebooks. Project holds project-wide config (plot defaults, file I/O formats, fit settings). File wraps a single dataset (1D or 2D), its axes, and any number of named Models for comparison. Methods: load_model, add_time_dependence, add_par_profile, set_fit_limits, fit_baseline, fit_slice_by_slice, fit_2d. Always use these public methods in tests and examples — they carry out validation and axis propagation that direct Model construction skips. Authoring-layer; keep readable.

mcp.py — Model / Component / Par system

Hierarchical model construction: ParComponentModel. Plus two Model subclasses: Dynamics (time-dependent behavior, multi-cycle, convolution kernels) and Profile (auxiliary-axis parameter variation). Handles parameter naming ({model}_{component}_{param}), expressions via asteval, and a slow reference evaluator used by the simulator and for cross-checking. Keep this file human-readable with no regard for performance — hot-path evaluation lives in the compiled layer. User tracebacks and interactive debugging go through here.

graph_ir.py — compiled intermediate representation

Lowers an mcp Model into a typed DAG (NodeKind, EdgeKind, domain classification) and then into a packed ScheduledPlan1D / ScheduledPlan2D: flat numpy arrays of instructions, parameter indices, and RPN expression programs. No Python objects, strings, or dicts in the execution data — the whole plan is array-oriented so the evaluator can run in a tight loop. Also contains compile-time gates (can_lower_2d, etc.) that decide whether a given model is representable in the v1 fast path.

eval_1d.py, eval_2d.py — hot-path evaluators

Pure functions of the form evaluate_Nd(plan, theta) -> spectrum. Consume the ScheduledPlan produced by graph_ir and the current parameter vector. eval_2d broadcasts peak functions with (n_time, 1) parameters against (1, n_energy) energy. Dynamics, convolution, and profile dispatch tables live here (DYNAMICS_DISPATCH, CONV_KERNEL_DISPATCH). Performance-critical — prefer array operations, avoid Python-level branching on model structure (the plan already captured it).

spectra.py — evaluator bridge

Thin module that the fitting engine calls on every residual evaluation. fit_model_gir (the default) dispatches to the compiled evaluator (evaluate_1d / evaluate_2d) when a ScheduledPlan is present, and falls back to fit_model_mcp — the mcp reference evaluator — when the model is not lowerable or when 1D component-wise spectra are requested for plotting. Users can swap in a custom spectrum function via Project.spec_fun_str.

fitlib.py — lmfit wrappers, CI, MCMC, plotting

The fitting machinery: residual function, fit_wrapper (global + local solvers), confidence intervals via lmfit.conf_interval, MCMC via lmfit.emcee, and the 1D/2D fit-result plotting (plt_fit_res_1d, plt_fit_res_2d). Internal module — method docstrings stay minimal, module-level doc carries the weight.

simulator.py — synthetic data generation

User-facing Simulator class. Generates 1D/2D spectra from a Model with noise (Poisson, Gaussian, none) and detector type (analog, photon counting). Supports simulate_n (replicates), ParameterSweep integration for ML training-data generation, and HDF5 export. Use here for testing, fit-pipeline validation, identifiability studies, and training-data synthesis.

fit_results.py — completed-fit inspection / comparison

User-facing FitResults class — the immutable view over a list of SavedFitSlot. Two construction paths: FitResults.load(path) for loaded archives and the Project.results property for in-session work. A FitResults is frozen at construction (the underlying slot list is copied), so r1 = p.results; <run another fit>; r2 = p.results gives two distinct snapshots — r1 does not see the new slot. Query API: find / get / files / models / iteration. Comparison: compare_models (returns a metrics DataFrame; refuses to compare slots whose observed_sha256 differs on the same (file, fit_type)) and plot_residuals (smoke-test-grade panels, no energy/time labels — slots don’t carry parent-file axes). The save/export side lives in utils/fit_io.py; this module is read-only on top of those slots.

Fit results: save / export / load architecture

The fit-output persistence layer is slot-driven, not model-walking. Once a fit completes, the result is captured eagerly into a SavedFitSlot (one per (file, model, fit_type, selection)); everything downstream — save, export, in-session comparison, archive load — reads slots, never live Model.result.

fit_baseline / fit_spectrum /                ┌──────────────────────────┐
fit_slice_by_slice / fit_2d  ────► result ───►  _slot_from_<fit_type>   │
                                              │   (eager extraction in  │
                                              │    utils/fit_io.py)     │
                                              └────────────┬────────────┘
                                                           │
                                                           ▼
                                            Project._fit_history (append-only log)
                                                           │
                                       ┌───────────────────┼──────────────────────┐
                                       ▼                   ▼                      ▼
                       Project.results (wrapper)  Project.save_fits        Project.export_fits
                                                  (filter + snapshot       (filter + CSV/PNG
                                                   collapse → HDF5)         tree)

HDF5 archive ────► reader ────► FitResults (FitResults.load / Project.load_fits)
                                Independent of _fit_history; never merged in.

Two different I/O directions, two different surfaces:

  • Save / load (round-trippable): Project.save_fits(path) → HDF5 archive; FitResults.load(path) (or the equivalent Project.load_fits(path) convenience) deserializes back. Schema in fit_archive_schema.md. Append-mode by default; slot-scoped overwrite.

  • Export (one-way): Project.export_fits(path, format="csv") → directory of human-readable CSVs and PNGs. No load counterpart — round-tripping fits is HDF5’s job.

File.save_fit / File.export_fit / File.compare_models are one-line delegates to the corresponding Project.* / FitResults.* methods. There is no File.load_fit: load is path-scoped, not file-scoped.

The legacy File.save_sbs_fit / File.save_2d_fit are deprecated aliases that emit DeprecationWarning and forward to the new File.export_fit. The legacy on-disk layout is preserved internally by _save_sbs_fit_legacy / _save_2d_fit_legacy, which are called from inside fit_slice_by_slice / fit_2d / Project.fit_2d on every fit unless the auto-export side effect is disabled via Project.auto_export = False (default True). Both are scheduled for removal before v1.0.0; new code should use Project.export_fits / File.export_fit.

config/ — runtime configuration

config/functions.py

Introspects functions/{energy,time,profile}.py to discover which function names are available. Provides all_functions, background_functions, convolution_functions, energy_functions, numbering_exceptions, get_function_parameters. The YAML parser and mcp use this to decide which components can be numbered (GLP_01, GLP_02, …) and which are singletons (backgrounds, convolutions). If you add a new background function, register it here.

config/plot.py

PlotConfig dataclass. The single source of truth for plot appearance (axis labels/limits/direction, colormaps, DPI, etc.). Inheritance chain: Project defaults → File overrides → Model inherits → per-call overrides. Use PlotConfig whenever you add a plotting function — do not invent new keyword arguments for styling.

functions/ — the function registry

Three flat modules of numeric functions. Function names and parameter names deliberately use CamelCase / PascalCase (not snake_case) because _ is the component-ID delimiter ({model}_{component}_{param}). See CLAUDE.md at the repo root for the full naming rule. These functions are called directly from the compiled evaluators — they should be fast, numpy-only, and free of Python-level allocation where possible.

functions/energy.py

Peak and background shapes used as spectral components. Examples: GLP, Gauss, Voigt, DoniachSunjic, Offset, LinBack, Shirley. Peak functions have signature func(x, par1, par2, ...); background functions have signature func(x, par, spectrum=None) so they can depend on the current peak sum (e.g. Shirley). Add new peak or background shapes here.

functions/time.py

Dynamics and convolution kernels. Dynamics functions (e.g. expFun, sinFun, linFun, erfFun, sqrtFun) share signature func(t, par1, ..., t0, y0) with the invariant f(t < t0) = 0. Convolution kernels are named funcCONV (e.g. gaussCONV) with a companion funcCONV_kernel_width(...) returning the kernel-width multiplier. Add new time-domain behavior or IRF kernels here.

functions/profile.py

Auxiliary-axis profile functions, required to start with a p prefix (e.g. pExpDecay, pLinear, pGauss). Signature func(x, par1, ...) where x is the auxiliary axis (depth, position, fluence, …). Attached to a parameter via File.add_par_profile; evaluation samples the profiled parameter across the aux axis and averages. Add new profile shapes here.

utils/ — reusable helpers (check here before inventing)

utils/arrays.py

Array/numeric helpers. Notably my_conv (used by the 2D evaluator for IRF convolution), format_float_scientific (fixed-width scientific notation), oom (order-of-magnitude), running averages, sign-change detection, angular normalization. Use my_conv rather than rolling your own scipy.signal.convolve wrapper.

utils/hdf5.py

Typed HDF5 helpers. require_group, require_dataset, json_loads_attr. All HDF5 I/O in the repo should go through these rather than raw h5py calls — they normalize attribute types across numpy/bytes/str.

utils/fit_io.py

Fit-results persistence. Owns the SavedProject / SavedFile / SavedFitSlot dataclasses (the on-disk data model), the four per-fit-type slot extractors (_slot_from_baseline, _slot_from_spectrum, _slot_from_sbs, _slot_from_2d — all called once at fit completion with copied snapshot args, never live Model references), the identity helpers (compute_file_fingerprint, compute_history_key, compute_archive_slot_key, build_selection_json, compute_observed_sha256), the snapshot-collapse helper (collapse_history_to_snapshot), and the HDF5 reader/writer (read_archive, write_archive) plus the CSV/PNG exporter (write_csv_export). The SavedFitSlot is the single source of truth for completed-fit state — neither Model nor File carries observed/fit/metrics. New persistence work lands here, not in fitlib or trspecfit.py. See docs/design/fit_archive_schema.md for the on-disk schema.

utils/lmfit.py

lmfit-parameter plumbing. Parameter construction, extraction, conversion to pandas DataFrames, MCMC config helpers, and the VARY_LEVELS / _vary_to_bool / vary_to_level machinery for the static/file/project vary hierarchy. Any new lmfit interop belongs here rather than in trspecfit.py or fitlib.py.

utils/parsing.py

YAML model parsing. ModelValidationError, the custom _ComponentNumberingConstructor that auto-numbers duplicate YAML keys (GLPGLP_01, GLP_02), and the validation that enforces which function names and parameters are legal. Extend here — not in mcp — when adding new YAML syntax.

utils/plot.py

Generic matplotlib helpers used by the library and user notebooks: 1D/2D data plotting, image loading (load_plot, load_plot_grid) for embedding saved figures in reports, axis formatting utilities. All plotting functions take a PlotConfig. Specialized plotting (e.g. fit residuals) lives in fitlib.py, not here.

utils/sweep.py

ParameterSweep (grid / random / uniform / normal sampling) and SweepDataset (inspection of generated HDF5 datasets). Used by Simulator for ML training-data generation. Grows as new sampling strategies are needed.

Typical execution flow

For a 2D fit via File.fit_2d:

  1. File.fit_2d gathers the target Model, axes, fit limits.

  2. Model is lowered to a ScheduledPlan2D via graph_ir.schedule_2d once, up front.

  3. fitlib.fit_wrapper runs lmfit; each residual call goes residual_funspectra.fit_model_gireval_2d.evaluate_2d(plan, theta).

  4. evaluate_2d produces the model spectrum using only the plan arrays and the parameter vector — no mcp objects touched in the hot path.

  5. After the fit: confidence intervals / MCMC / plotting run in fitlib. The completed result is then captured eagerly into a SavedFitSlot via utils/fit_io.py and appended to Project._fit_history; that slot is what Project.results, Project.save_fits, and Project.export_fits operate on. Live Model.result is never re-read by these paths — see “Fit results: save / export / load architecture” above.

Models outside the current compiled support set (see supported_models.md) fall back to the mcp reference evaluator. New features are generally prototyped on that slow path first.

Where to put new code — quick guide

  • New energy / time / profile function → implement it in functions/{energy,time,profile}.py; for the full checklist (tests, registration, and GIR follow-up when needed), use ../ai/add-function.md.

  • New YAML keyword / syntaxutils/parsing.py + validation.

  • New user-facing method on a fileFile in trspecfit.py.

  • New model composition rule → mcp first; update supported_models.md; lower into graph_ir once stable.

  • New plot style / axis logicutils/plot.py, driven by PlotConfig.

  • New fit-result post-processing (CI, MCMC, in-fit plots)fitlib.py.

  • New fit-archive field, exporter format, or comparison metricutils/fit_io.py (data model + writer/reader + CSV exporter) and fit_results.py (query / compare_models). Slot extraction stays in utils/fit_io.py; the four _append_<fit_type>_slot call sites in trspecfit.py should not be replicated elsewhere.

  • New simulator feature / sampling strategysimulator.py / utils/sweep.py.

  • New HDF5 I/O → go through utils/hdf5.py helpers.

  • Performance optimization of an existing feature → lower into graph_ir / eval_*. Do not optimize mcp.