# Repo Architecture

Orientation guide for anyone (human or LLM) modifying the trspecfit codebase.
Start here before writing new code — the goal is to avoid reinventing wheels
and to put new code in the right layer.

For what model combinations are actually supported, see
[supported_models.md](supported_models.md).
For how the compiled 2D evaluator was designed, see
[lowered_evaluator.md](lowered_evaluator.md).

## Two-layer design

trspecfit has two distinct layers, and **the distinction matters for every
change you make**:

1. **Authoring / user-facing layer** (`mcp.py`, `trspecfit.py`, `simulator.py`,
   YAML parsing in `utils/parsing.py`). Optimized for **readability, clear
   error messages, and interactive exploration**. Performance is secondary —
   these modules run once per model, not once per optimizer iteration.
   Users poke at objects here; tracebacks land here.

2. **Compiled hot-path layer** (`graph_ir.py`, `eval_1d.py`, `eval_2d.py`,
   and the numeric bodies in `functions/`). Optimized for **performance**.
   These run inside the residual loop, possibly millions of times per fit.
   Array-oriented, no Python dicts/strings in the inner loops, no object
   attribute lookups where a packed array will do. See
   [lowered_evaluator.md](lowered_evaluator.md) for the motivation.

`spectra.py` is the bridge — it hands the current parameter vector from
`fitlib` to the compiled evaluator.

New features are usually added first in the authoring layer (mcp), then
lowered into the compiled layer once the semantics are stable.

## Top-level modules (`src/trspecfit/`)

### `trspecfit.py` — user entry point

Defines `Project` and `File`. This is the API most users see in notebooks.
`Project` holds project-wide config (plot defaults, file I/O formats, fit
settings). `File` wraps a single dataset (1D or 2D), its axes, and any
number of named `Model`s for comparison. Methods: `load_model`,
`add_time_dependence`, `add_par_profile`, `set_fit_limits`, `fit_baseline`,
`fit_slice_by_slice`, `fit_2d`. **Always use these public methods in tests
and examples** — they carry out validation and axis propagation that
direct `Model` construction skips. Authoring-layer; keep readable.

### `mcp.py` — Model / Component / Par system

Hierarchical model construction: `Par` → `Component` → `Model`. Plus two
`Model` subclasses: `Dynamics` (time-dependent behavior, multi-cycle,
convolution kernels) and `Profile` (auxiliary-axis parameter variation).
Handles parameter naming (`{model}_{component}_{param}`), expressions via
`asteval`, and a slow reference evaluator used by the simulator and for
cross-checking. **Keep this file human-readable with no regard for
performance** — hot-path evaluation lives in the compiled layer. User
tracebacks and interactive debugging go through here.

### `graph_ir.py` — compiled intermediate representation

Lowers an mcp `Model` into a typed DAG (`NodeKind`, `EdgeKind`, domain
classification) and then into a packed `ScheduledPlan1D` / `ScheduledPlan2D`:
flat numpy arrays of instructions, parameter indices, and RPN expression
programs. No Python objects, strings, or dicts in the execution data — the
whole plan is array-oriented so the evaluator can run in a tight loop.
Also contains compile-time gates (`can_lower_2d`, etc.) that decide
whether a given model is representable in the v1 fast path.

### `eval_1d.py`, `eval_2d.py` — hot-path evaluators

Pure functions of the form `evaluate_Nd(plan, theta) -> spectrum`. Consume
the `ScheduledPlan` produced by `graph_ir` and the current parameter
vector. `eval_2d` broadcasts peak functions with `(n_time, 1)` parameters
against `(1, n_energy)` energy. Dynamics, convolution, and profile
dispatch tables live here (`DYNAMICS_DISPATCH`, `CONV_KERNEL_DISPATCH`).
**Performance-critical — prefer array operations, avoid Python-level
branching on model structure (the plan already captured it).**

### `spectra.py` — evaluator bridge

Thin module that the fitting engine calls on every residual evaluation.
`fit_model_gir` (the default) dispatches to the compiled evaluator
(`evaluate_1d` / `evaluate_2d`) when a `ScheduledPlan` is present, and
falls back to `fit_model_mcp` — the mcp reference evaluator — when the
model is not lowerable or when 1D component-wise spectra are requested
for plotting. Users can swap in a custom spectrum function via
`Project.spec_fun_str`.

### `fitlib.py` — lmfit wrappers, CI, MCMC, plotting

The fitting machinery: residual function, `fit_wrapper` (global + local
solvers), confidence intervals via `lmfit.conf_interval`, MCMC via
`lmfit.emcee`, and the 1D/2D fit-result plotting (`plt_fit_res_1d`,
`plt_fit_res_2d`). Internal module — method docstrings stay minimal,
module-level doc carries the weight.

### `simulator.py` — synthetic data generation

User-facing `Simulator` class. Generates 1D/2D spectra from a `Model`
with noise (Poisson, Gaussian, none) and detector type (analog, photon
counting). Supports `simulate_n` (replicates), `ParameterSweep`
integration for ML training-data generation, and HDF5 export. Use here
for testing, fit-pipeline validation, identifiability studies, and
training-data synthesis.

### `fit_results.py` — completed-fit inspection / comparison

User-facing `FitResults` class — the immutable view over a list of
`SavedFitSlot`. Two construction paths: `FitResults.load(path)` for
loaded archives and the `Project.results` property for in-session work.
A `FitResults` is frozen at construction (the underlying slot list is
copied), so `r1 = p.results; <run another fit>; r2 = p.results` gives
two distinct snapshots — `r1` does not see the new slot. Query API:
`find` / `get` / `files` / `models` / iteration. Comparison:
`compare_models` (returns a metrics DataFrame; refuses to compare
slots whose `observed_sha256` differs on the same `(file, fit_type)`)
and `plot_residuals` (smoke-test-grade panels, no energy/time labels —
slots don't carry parent-file axes). The save/export side lives in
`utils/fit_io.py`; this module is read-only on top of those slots.

## Fit results: save / export / load architecture

The fit-output persistence layer is **slot-driven**, not model-walking.
Once a fit completes, the result is captured eagerly into a
`SavedFitSlot` (one per `(file, model, fit_type, selection)`); everything
downstream — save, export, in-session comparison, archive load — reads
slots, never live `Model.result`.

```
fit_baseline / fit_spectrum /                ┌──────────────────────────┐
fit_slice_by_slice / fit_2d  ────► result ───►  _slot_from_<fit_type>   │
                                              │   (eager extraction in  │
                                              │    utils/fit_io.py)     │
                                              └────────────┬────────────┘
                                                           │
                                                           ▼
                                            Project._fit_history (append-only log)
                                                           │
                                       ┌───────────────────┼──────────────────────┐
                                       ▼                   ▼                      ▼
                       Project.results (wrapper)  Project.save_fits        Project.export_fits
                                                  (filter + snapshot       (filter + CSV/PNG
                                                   collapse → HDF5)         tree)

HDF5 archive ────► reader ────► FitResults (FitResults.load / Project.load_fits)
                                Independent of _fit_history; never merged in.
```

**Two different I/O directions, two different surfaces:**

- **Save / load** (round-trippable): `Project.save_fits(path)` →
  HDF5 archive; `FitResults.load(path)` (or the equivalent
  `Project.load_fits(path)` convenience) deserializes back. Schema in
  [fit_archive_schema.md](fit_archive_schema.md). Append-mode by default;
  slot-scoped overwrite.
- **Export** (one-way): `Project.export_fits(path, format="csv")` →
  directory of human-readable CSVs and PNGs. No `load` counterpart —
  round-tripping fits is HDF5's job.

`File.save_fit` / `File.export_fit` / `File.compare_models` are
one-line delegates to the corresponding `Project.*` / `FitResults.*`
methods. There is no `File.load_fit`: load is path-scoped, not file-scoped.

The legacy `File.save_sbs_fit` / `File.save_2d_fit` are deprecated
aliases that emit `DeprecationWarning` and forward to the new
`File.export_fit`. The legacy on-disk layout is preserved internally
by `_save_sbs_fit_legacy` / `_save_2d_fit_legacy`, which are called
from inside `fit_slice_by_slice` / `fit_2d` / `Project.fit_2d` on every
fit unless the auto-export side effect is disabled via
`Project.auto_export = False` (default `True`). Both are scheduled for
removal before v1.0.0; new code should use `Project.export_fits` /
`File.export_fit`.

## `config/` — runtime configuration

### `config/functions.py`

Introspects `functions/{energy,time,profile}.py` to discover which function
names are available. Provides `all_functions`, `background_functions`,
`convolution_functions`, `energy_functions`, `numbering_exceptions`,
`get_function_parameters`. The YAML parser and mcp use this to decide
which components can be numbered (`GLP_01`, `GLP_02`, ...) and which are
singletons (backgrounds, convolutions). **If you add a new background
function, register it here.**

### `config/plot.py`

`PlotConfig` dataclass. The single source of truth for plot appearance
(axis labels/limits/direction, colormaps, DPI, etc.). Inheritance chain:
Project defaults → File overrides → Model inherits → per-call overrides.
Use `PlotConfig` whenever you add a plotting function — do not invent new
keyword arguments for styling.

## `functions/` — the function registry

Three flat modules of numeric functions. **Function names and parameter
names deliberately use CamelCase / PascalCase** (not snake_case) because
`_` is the component-ID delimiter (`{model}_{component}_{param}`). See
`CLAUDE.md` at the repo root for the full naming rule. These
functions are called directly from the compiled evaluators — they should
be fast, numpy-only, and free of Python-level allocation where possible.

### `functions/energy.py`

Peak and background shapes used as spectral components. Examples: `GLP`,
`Gauss`, `Voigt`, `DoniachSunjic`, `Offset`, `LinBack`, `Shirley`. Peak
functions have signature `func(x, par1, par2, ...)`; background functions
have signature `func(x, par, spectrum=None)` so they can depend on the
current peak sum (e.g. Shirley). Add new peak or background shapes here.

### `functions/time.py`

Dynamics and convolution kernels. Dynamics functions (e.g. `expFun`,
`sinFun`, `linFun`, `erfFun`, `sqrtFun`) share signature
`func(t, par1, ..., t0, y0)` with the invariant `f(t < t0) = 0`.
Convolution kernels are named `funcCONV` (e.g. `gaussCONV`) with a
companion `funcCONV_kernel_width(...)` returning the kernel-width multiplier.
Add new time-domain behavior or IRF kernels here.

### `functions/profile.py`

Auxiliary-axis profile functions, required to start with a `p` prefix
(e.g. `pExpDecay`, `pLinear`, `pGauss`). Signature `func(x, par1, ...)`
where `x` is the auxiliary axis (depth, position, fluence, ...). Attached
to a parameter via `File.add_par_profile`; evaluation samples the
profiled parameter across the aux axis and averages. Add new profile
shapes here.

## `utils/` — reusable helpers (check here before inventing)

### `utils/arrays.py`

Array/numeric helpers. Notably `my_conv` (used by the 2D evaluator for
IRF convolution), `format_float_scientific` (fixed-width scientific
notation), `oom` (order-of-magnitude), running averages, sign-change
detection, angular normalization. Use `my_conv` rather than rolling your
own `scipy.signal.convolve` wrapper.

### `utils/hdf5.py`

Typed HDF5 helpers. `require_group`, `require_dataset`, `json_loads_attr`.
All HDF5 I/O in the repo should go through these rather than raw
`h5py` calls — they normalize attribute types across numpy/bytes/str.

### `utils/fit_io.py`

Fit-results persistence. Owns the `SavedProject` / `SavedFile` /
`SavedFitSlot` dataclasses (the on-disk data model), the four
per-fit-type slot extractors (`_slot_from_baseline`,
`_slot_from_spectrum`, `_slot_from_sbs`, `_slot_from_2d` — all called
once at fit completion with copied snapshot args, never live `Model`
references), the identity helpers (`compute_file_fingerprint`,
`compute_history_key`, `compute_archive_slot_key`,
`build_selection_json`, `compute_observed_sha256`), the
snapshot-collapse helper (`collapse_history_to_snapshot`), and the
HDF5 reader/writer (`read_archive`, `write_archive`) plus the CSV/PNG
exporter (`write_csv_export`). The `SavedFitSlot` is the **single
source of truth for completed-fit state** — neither `Model` nor `File`
carries observed/fit/metrics. New persistence work lands here, not in
`fitlib` or `trspecfit.py`. See `docs/design/fit_archive_schema.md`
for the on-disk schema.

### `utils/lmfit.py`

lmfit-parameter plumbing. Parameter construction, extraction, conversion
to pandas DataFrames, MCMC config helpers, and the `VARY_LEVELS` /
`_vary_to_bool` / `vary_to_level` machinery for the
`static`/`file`/`project` vary hierarchy. Any new lmfit interop belongs
here rather than in `trspecfit.py` or `fitlib.py`.

### `utils/parsing.py`

YAML model parsing. `ModelValidationError`, the custom
`_ComponentNumberingConstructor` that auto-numbers duplicate YAML keys
(`GLP` → `GLP_01`, `GLP_02`), and the validation that enforces which
function names and parameters are legal. Extend here — not in mcp — when
adding new YAML syntax.

### `utils/plot.py`

Generic matplotlib helpers used by the library and user notebooks:
1D/2D data plotting, image loading (`load_plot`, `load_plot_grid`) for
embedding saved figures in reports, axis formatting utilities. All
plotting functions take a `PlotConfig`. Specialized plotting (e.g. fit
residuals) lives in `fitlib.py`, not here.

### `utils/sweep.py`

`ParameterSweep` (grid / random / uniform / normal sampling) and
`SweepDataset` (inspection of generated HDF5 datasets). Used by
`Simulator` for ML training-data generation. Grows as new sampling
strategies are needed.

## Typical execution flow

For a 2D fit via `File.fit_2d`:

1. `File.fit_2d` gathers the target `Model`, axes, fit limits.
2. `Model` is lowered to a `ScheduledPlan2D` via `graph_ir.schedule_2d`
   once, up front.
3. `fitlib.fit_wrapper` runs lmfit; each residual call goes
   `residual_fun` → `spectra.fit_model_gir` → `eval_2d.evaluate_2d(plan, theta)`.
4. `evaluate_2d` produces the model spectrum using only the plan arrays
   and the parameter vector — no mcp objects touched in the hot path.
5. After the fit: confidence intervals / MCMC / plotting run in
   `fitlib`. The completed result is then captured eagerly into a
   `SavedFitSlot` via `utils/fit_io.py` and appended to
   `Project._fit_history`; that slot is what `Project.results`,
   `Project.save_fits`, and `Project.export_fits` operate on. Live
   `Model.result` is never re-read by these paths — see
   "Fit results: save / export / load architecture" above.

Models outside the current compiled support set (see
[supported_models.md](supported_models.md)) fall back to the mcp reference
evaluator. New features are generally prototyped on that slow path first.

## Where to put new code — quick guide

- **New energy / time / profile function** → implement it in `functions/{energy,time,profile}.py`; for the full checklist (tests, registration, and GIR follow-up when needed), use [../ai/add-function.md](../ai/add-function.md).
- **New YAML keyword / syntax** → `utils/parsing.py` + validation.
- **New user-facing method on a file** → `File` in `trspecfit.py`.
- **New model composition rule** → mcp first; update `supported_models.md`; lower into `graph_ir` once stable.
- **New plot style / axis logic** → `utils/plot.py`, driven by `PlotConfig`.
- **New fit-result post-processing (CI, MCMC, in-fit plots)** → `fitlib.py`.
- **New fit-archive field, exporter format, or comparison metric** → `utils/fit_io.py` (data model + writer/reader + CSV exporter) and `fit_results.py` (query / `compare_models`). Slot extraction stays in `utils/fit_io.py`; the four `_append_<fit_type>_slot` call sites in `trspecfit.py` should not be replicated elsewhere.
- **New simulator feature / sampling strategy** → `simulator.py` / `utils/sweep.py`.
- **New HDF5 I/O** → go through `utils/hdf5.py` helpers.
- **Performance optimization of an existing feature** → lower into `graph_ir` / `eval_*`. Do **not** optimize mcp.