# Execution Plan: Lowered Evaluator > Implementation history (Phases 1-6): [archive/lowered_evaluator_implementation.md](archive/lowered_evaluator_implementation.md). ## Motivation A 2D fit (400 energy x 440 time, 4 free params, 556 optimizer iterations) takes ~10 seconds. Profiling shows: | Category | Self-time | % of fit | Cause | |---|---|---|---| | Math (GLP + LinBack) | 3.2 s | 32% | The actual numeric work | | Parameter overhead | 2.8 s | 28% | `par_extract`, `Par.value`, `valuesdict` -- called 2M times | | Python dispatch | 1.6 s | 16% | `Component.value`, `combine`, `create_value_1d`, OOP calls | | Other | 2.6 s | 24% | lmfit internals, array ops, etc. | **Root cause:** The code acts as an *interpreter* inside the optimizer loop. `create_value_2d` (mcp.py:866) loops over time steps; at each step `Component.value` (mcp.py:1518) loops over parameters; each `Par.value` (mcp.py:1979) dispatches through 5 branches and calls `par_extract` which calls `valuesdict()`. All of this repeats 440 x 556 = 245k times. The irony: for time-dependent parameters, the full time trace *already exists* in `Dynamics.value_1d`. But `Par.value(t_ind)` consumes it one scalar at a time via `base + self.t_model.value_1d[t_ind]`. **Goal:** Stop interpreting the model on every residual call. Lower the model once into a flat, array-oriented execution plan, then evaluate it as a pure function. ## Architecture overview ``` User API (unchanged) Future model-builder UI | | v v Model / Component / Par <-- OOP tree (parsing, validation, user interaction) | | | build_graph(model) | (UI will target GraphIR directly) v v GraphIR <-- NEW: semantic IR (DAG), axis-agnostic | works for 1D and 2D models | can_lower_2d(graph) <-- gate: can the 2D backend compile this graph? | schedule_2d(graph) <-- NEW: compile DAG to flat 2D execution schedule v ScheduledPlan2D <-- NEW: packed arrays, (n_params, n_time) traces | | evaluate_2d(plan, theta) <-- NEW: pure function, array-oriented v spectrum (n_time, n_energy) ``` Three-layer design: 1. **OOP tree** -- the user-facing `Model`/`Component`/`Par` objects. Handles parsing, validation, user interaction. Unchanged. 2. **GraphIR** -- a directed acyclic graph of typed nodes with explicit data-dependency edges. This is the *semantic* representation of the model. Both the OOP tree and a future drag-and-drop model-builder UI can target it. **The graph is axis-agnostic** -- it works for 1D energy models, 1D time models, and 2D time+energy models. A user builds a 1D energy model first (valid graph, no time axis), then adds dynamics (graph gains time-dependent nodes and a time axis), making it compilable by the 2D backend. Standalone ``TIME_1D`` graphs are still valuable even without a lowered backend because they allow a future model-builder UI to author, validate, and save reusable dynamics-only YAML models. 3. **ScheduledPlan2D** -- a flat, packed-array execution schedule compiled from the graph by the 2D backend. No Python objects, no strings, no dicts in the hot path. This is what the evaluator actually runs. Backend-specific: a future `ScheduledPlan1D` / `can_lower_1d` / `evaluate_1d` can target the same GraphIR with a simpler storage model (no `(n_params, n_time)` trace matrix needed for 1D) for `ENERGY_1D` models (`fit_baseline`, `fit_spectrum`). `TIME_1D` standalone dynamics graphs remain graph-valid but are out of lowered backend scope for now. Key principle: **the OOP tree is for humans; the GraphIR is for semantics; the ScheduledPlan2D is for the optimizer.** --- ## Spec ### 1. GraphIR -- the semantic intermediate representation The GraphIR is a DAG (directed acyclic graph) of typed nodes connected by explicit dependency edges. It captures the full semantics of the model without prescribing an evaluation strategy. #### 1.1 Node types ```python class NodeKind(IntEnum): """Node types in the model graph.""" # --- Parameter nodes (leaves) --- STATIC_PARAM = 0 # fixed value, never changes during fit OPT_PARAM = 1 # optimizer-visible parameter (lmfit varies it) # --- Computed parameter nodes --- DYNAMICS_TRACE = 2 # time-dependent trace: evaluates a dynamics # function over the full time axis, producing # an (n_time,) array PARAM_PLUS_TRACE = 3 # base_param + dynamics_trace -> (n_time,) resolved value EXPRESSION = 4 # arithmetic expression referencing other params # --- Component evaluation nodes --- COMPONENT_EVAL = 5 # evaluates a component function over its domain axis: # energy functions -> (n_energy,) or (n_time, n_energy) # time functions -> (n_time,) # profile functions -> (n_aux,) # the domain is determined by the component's package # (fcts_energy, fcts_time, fcts_profile), not by the # graph's DomainKind # --- Reduction / combination nodes --- SUM = 6 # element-wise sum of multiple inputs SPECTRUM_FED_OP = 7 # component that consumes accumulated spectrum # as input (Shirley only in v1) # --- Convolution and profile nodes --- # These are part of the IR spec and build_graph emits them when the # model uses these features. They cannot be compiled by the v1 2D # backend (can_lower_2d returns False), but they are fully representable # in the graph -- a future backend or the model-builder UI can work # with them. CONVOLUTION = 100 # convolves accumulated signal with a kernel # component (e.g. gaussCONV). Edges: ADDEND from # the accumulated signal, PARAM_INPUT from kernel # parameters. Output replaces the accumulated signal. PROFILE_SAMPLE = 101 # resolves one profiled parameter at one # aux_axis point: # base + profile.value_1d[aux_ind] # One PROFILE_SAMPLE node per profiled parameter # per aux_axis point. PROFILE_AVERAGE = 102 # uniform average over per-sample COMPONENT_EVAL # outputs for one component. Edges: ADDEND from # each per-sample COMPONENT_EVAL node. SUBCYCLE_MASK = 103 # element-wise multiply by time_n_sub mask array. # Applied to a DYNAMICS_TRACE to zero out inactive # subcycle regions. SUBCYCLE_REMAP = 104 # remaps a DYNAMICS_TRACE to use time_norm instead # of the raw time axis (resets to 0 each subcycle). # Precedes the dynamics function evaluation. ``` #### 1.2 Edge semantics Edges are typed and carry dependency information: ```python class EdgeKind(IntEnum): """Edge types in the model graph.""" PARAM_INPUT = 0 # parameter flows into a component or expression TRACE_INPUT = 1 # dynamics trace flows into PARAM_PLUS_TRACE BASE_INPUT = 2 # base param flows into PARAM_PLUS_TRACE ADDEND = 3 # component output flows into SUM SPECTRUM_INPUT = 4 # accumulated spectrum flows into SPECTRUM_FED_OP EXPR_REF = 5 # parameter reference within an expression ``` #### 1.3 Graph structure ```python @dataclass class GraphNode: """One node in the model graph.""" id: int # unique node ID kind: NodeKind name: str # human-readable name (e.g. "GLP_01_A", "GLP_01") source_order: int # stable ordering key for deterministic scheduling. # When built from YAML: component definition order. # When built from UI: insertion order in the canvas. # schedule_2d uses this as tie-breaker when # topological sort has multiple valid orderings. # Payload (interpretation depends on kind): value: float | None # for STATIC_PARAM, OPT_PARAM: initial value function_name: str | None # for COMPONENT_EVAL, SPECTRUM_FED_OP, DYNAMICS_TRACE, # CONVOLUTION: the function registry name # (e.g. "GLP", "Shirley", "expFun", "gaussCONV"). # This is the graph-level function identity -- it # works across all domains (energy, time, profile). # Backend-specific compilers map this to their own # op enums (e.g. schedule_2d maps "GLP" -> OpKind.GLP). package: str | None # which function module: "energy", "time", "profile". # Together with function_name, uniquely identifies # the callable. Needed because function names could # theoretically collide across packages. expr_string: str | None # for EXPRESSION: the expression source vary: bool # for OPT_PARAM: whether optimizer can change it bounds: tuple[float, float] | None # for OPT_PARAM: (min, max) bounds arrays: dict[str, np.ndarray] # node-local array data. Examples: # SUBCYCLE_MASK: {"time_n_sub": array} # SUBCYCLE_REMAP: {"time_norm": array} # PROFILE_SAMPLE: {"aux_axis": array} # CONVOLUTION: {"kernel_time": array} # Empty dict for nodes that need no array payload. # This is graph-level data, not backend-specific -- # the scheduler copies what it needs into the plan. @dataclass class GraphEdge: """One edge in the model graph.""" source: int # source node ID target: int # target node ID kind: EdgeKind position: int | None # for PARAM_INPUT: which positional arg (0, 1, 2, ...) class DomainKind(IntEnum): """Model domain classification. Determined by which axes the model operates on: - ENERGY_1D: model has energy axis only. This is the starting state for all energy models (fit_baseline, fit_spectrum). COMPONENT_EVAL nodes evaluate energy functions over (n_energy,). - TIME_1D: model has time axis only. Used for standalone Dynamics models. COMPONENT_EVAL nodes evaluate time functions over (n_time,). This domain is valid in the GraphIR for semantic completeness, graph tooling, and future model-builder/UI workflows, but has no compiled backend in the current scope. - ENERGY_TIME_2D: model has both axes. Created when dynamics are added to an ENERGY_1D model. COMPONENT_EVAL nodes evaluate energy functions over (n_time, n_energy) via broadcasting. """ ENERGY_1D = 0 # energy-resolved only (e.g. fit_baseline, fit_spectrum) TIME_1D = 1 # time-resolved only (e.g. standalone dynamics fit) ENERGY_TIME_2D = 2 # energy + time resolved (the 2D fit case) @dataclass class GraphIR: """Directed acyclic graph representing a model. Axis-agnostic: works for 1D and 2D models. A 1D energy model has time=None; adding dynamics populates time and promotes domain to ENERGY_TIME_2D. Backend-specific compilers (schedule_2d, etc.) check the domain before compiling. """ nodes: list[GraphNode] edges: list[GraphEdge] domain: DomainKind # what axes this model operates on energy: np.ndarray | None # (n_energy,) or None time: np.ndarray | None # (n_time,) or None # Reverse map for quick lookup node_by_name: dict[str, int] # name -> node ID ``` #### 1.4 Example: `simple_energy` model as a graph The YAML model: ```yaml simple_energy: Offset: y0: [2, True, 0, 5] Shirley: pShirley: [4.0E-4, False] GLP: A: [20, True, 5, 25] x0: [84.5, True, 82, 88] F: [1.0, True, 0.75, 2.5] m: [0.3, True, 0, 1] GLP: A: [17, True, 5, 25] x0: [88.1, True] F: [1.0, True, 0.75, 2.5] m: [0.3, True, 0, 1] ``` Becomes: ``` Nodes: 0: OPT_PARAM "Offset_y0" value=2.0 1: STATIC_PARAM "Shirley_pShirley" value=4e-4 2: OPT_PARAM "GLP_01_A" value=20.0 3: OPT_PARAM "GLP_01_x0" value=84.5 4: OPT_PARAM "GLP_01_F" value=1.0 5: OPT_PARAM "GLP_01_m" value=0.3 6: OPT_PARAM "GLP_02_A" value=17.0 7: OPT_PARAM "GLP_02_x0" value=88.1 8: OPT_PARAM "GLP_02_F" value=1.0 9: OPT_PARAM "GLP_02_m" value=0.3 10: COMPONENT_EVAL "GLP_01" function_name="GLP" package="energy" 11: COMPONENT_EVAL "GLP_02" function_name="GLP" package="energy" 12: SUM "peak_sum" 13: COMPONENT_EVAL "Offset" function_name="Offset" package="energy" 14: SPECTRUM_FED_OP "Shirley" function_name="Shirley" package="energy" 15: SUM "total" Edges: 2 -> 10 PARAM_INPUT(pos=0) # A -> GLP_01 3 -> 10 PARAM_INPUT(pos=1) # x0 -> GLP_01 4 -> 10 PARAM_INPUT(pos=2) # F -> GLP_01 5 -> 10 PARAM_INPUT(pos=3) # m -> GLP_01 6 -> 11 PARAM_INPUT(pos=0) # A -> GLP_02 7 -> 11 PARAM_INPUT(pos=1) # x0 -> GLP_02 8 -> 11 PARAM_INPUT(pos=2) # F -> GLP_02 9 -> 11 PARAM_INPUT(pos=3) # m -> GLP_02 10 -> 12 ADDEND # GLP_01 -> peak_sum 11 -> 12 ADDEND # GLP_02 -> peak_sum 0 -> 13 PARAM_INPUT(pos=0) # y0 -> Offset 12 -> 14 SPECTRUM_INPUT # peak_sum -> Shirley (Shirley consumes spectrum) 1 -> 14 PARAM_INPUT(pos=0) # pShirley -> Shirley 12 -> 15 ADDEND # peak_sum -> total 13 -> 15 ADDEND # Offset -> total 14 -> 15 ADDEND # Shirley -> total ``` Notice: Offset has no SPECTRUM_INPUT edge because it does not consume the accumulated spectrum. LinBack also does not -- it receives `spectrum` in the current Python signature for interface uniformity, but ignores it. Only Shirley has a true data dependency on the peak sum. #### 1.5 Time-dependent model as a graph When `GLP_01_A` has a dynamics model (`expFun` with params `A`, `tau`, `t0`, `y0`): ``` Additional nodes: 20: OPT_PARAM "GLP_01_A_expFun_01_A" value=5.0 21: OPT_PARAM "GLP_01_A_expFun_01_tau" value=100.0 22: OPT_PARAM "GLP_01_A_expFun_01_t0" value=0.0 23: STATIC_PARAM "GLP_01_A_expFun_01_y0" value=0.0 24: DYNAMICS_TRACE "GLP_01_A_dynamics" function_name="expFun" package="time" 25: PARAM_PLUS_TRACE "GLP_01_A_resolved" Additional edges: 20 -> 24 PARAM_INPUT(pos=0) # A -> dynamics 21 -> 24 PARAM_INPUT(pos=1) # tau -> dynamics 22 -> 24 PARAM_INPUT(pos=2) # t0 -> dynamics 23 -> 24 PARAM_INPUT(pos=3) # y0 -> dynamics 2 -> 25 BASE_INPUT # GLP_01_A base value 24 -> 25 TRACE_INPUT # dynamics trace 25 -> 10 PARAM_INPUT(pos=0) # resolved A -> GLP_01 (replaces edge 2->10) ``` #### 1.6 Expression model as a graph When `GLP_02_A = "3/4*GLP_01_A"`: ``` Node 6 becomes: 6: EXPRESSION "GLP_02_A" expr_string="3/4*GLP_01_A" Additional edge: 2 -> 6 EXPR_REF # GLP_01_A referenced in expression ``` The expression node's output feeds into GLP_02 as before via `6 -> 11 PARAM_INPUT(pos=0)`. If `GLP_01_A` is time-dependent (has a PARAM_PLUS_TRACE), then node 6 references the resolved value (node 25), and the expression output is also `(n_time,)`. If `GLP_01_A` is profiled, the interpreter semantics are different: the expression must be re-evaluated at each aux-axis point before the component is evaluated and averaged. In the graph, this is represented by per-sample `EXPRESSION` nodes (for example `GLP_02_A_profile_expr_0`, `GLP_02_A_profile_expr_1`, ...) whose `EXPR_REF` edges point to the matching per-sample `PROFILE_SAMPLE` nodes for `GLP_01_A`. Those per-sample expression nodes feed per-sample `COMPONENT_EVAL` nodes, and the component-level `PROFILE_AVERAGE` averages the resulting traces. #### 1.7 Why a graph, not a two-bucket sort The previous version of this doc sorted components into "independent" and "spectrum-dependent" buckets. This is wrong for two reasons: 1. **It conflates dependency class with evaluation order.** Offset and LinBack are currently classified as `SPECTRUM_DEP` because they have `spectrum` in their Python signature, but neither actually reads it. Only Shirley does. The graph makes this explicit: only Shirley has a `SPECTRUM_INPUT` edge. 2. **Reordering risks changing semantics.** The current interpreter combines components in LIFO order (mcp.py:806). Background components are defined first in YAML, peaks last, so the reverse iteration evaluates peaks before backgrounds. This ordering is validated: `background_last` in the test YAML is expected to fail. The graph preserves the original model semantics via edges, and the scheduler derives a safe topological order. ### 2. `can_lower_2d(graph) -> bool` Note the input: `can_lower_2d` operates on the GraphIR, not the OOP model. This means "can the 2D NumPy backend compile this graph," not "can this model be represented as a graph at all." Every model can be a graph; not every graph can be compiled to the 2D fast evaluator yet. A 1D-only graph (``ENERGY_1D`` or ``TIME_1D``) returns False here. A future `can_lower_1d` would target ``ENERGY_1D`` models; ``TIME_1D`` standalone dynamics graphs remain graph-valid but backend-out-of-scope for now. **Compilable in v1:** - All `COMPONENT_EVAL` nodes have a `function_name` that `schedule_2d` can map to a supported `OpKind` with verified broadcast semantics. v1 list: `Gauss`, `GaussAsym`, `Lorentz`, `Voigt`, `GLS`, `GLP`, `DS`, `Offset`, `LinBack`. Offset and LinBack are `COMPONENT_EVAL`, not `SPECTRUM_FED_OP` -- they do not consume the accumulated spectrum (see section 4). - `SPECTRUM_FED_OP` nodes: `Shirley` only (the sole function that truly reads the accumulated peak sum) - All `EXPRESSION` nodes contain only arithmetic operations (add, sub, mul, div, neg, pow, literal constants, parameter references) - `CONVOLUTION`, `PROFILE_*`, and `SUBCYCLE_*` nodes are all now compilable (Phases 6.2-6.4). Convolution is gated per-node by ``_is_lowerable_convolution_2d`` (time-domain kernel, resolved-trace topology); subcycle nodes are compiled away in ``schedule_2d`` into per-substep ``dyn_sub_time_axes`` / ``dyn_sub_masks`` schedule arrays. - `graph.domain == ENERGY_TIME_2D` (has both energy and time axes) **Falls back to interpreter when `can_lower_2d` returns False:** - 1D-only models (domain != ENERGY_TIME_2D; future `can_lower_1d` would target `ENERGY_1D`, not `TIME_1D`, in the current roadmap) - Non-arithmetic expressions - Convolution shapes outside the resolved-trace contract encoded in ``_is_lowerable_convolution_2d`` (e.g. spectrum-level ``comp_type="conv"``, non-time kernels) Phases 6.2-6.4 folded `CONVOLUTION`, `PROFILE_*`, and `SUBCYCLE_*` into the 2D backend. A resolved-trace (``subcycle=0``) IRF coexists with subcycle-aware dynamics in the same model; only ``subcycle>0`` substeps carry ``time_norm`` / ``time_n_sub`` schedule arrays. ### 3. ScheduledPlan2D -- the compiled 2D execution schedule The scheduler takes a GraphIR and produces a flat, packed-array execution plan. No Python objects in the hot path. #### 3.1 Storage model: uniform `(n_params, n_time)` trace matrix The previous version of this doc stored some params as scalars and some as `(n_time,)` arrays, which forces `dtype=object` or runtime branching. Instead, we use a uniform storage model: **All parameters are stored as `(n_time,)` traces in a dense matrix.** - Static params: broadcast to `(n_time,)` by repeating the scalar value. This wastes ~8 bytes * n_time per static param (e.g. 3.5 KB for 440 time points) but eliminates all scalar-vs-array branching. - Optimizer params: same -- broadcast to `(n_time,)`. - Time-dependent params: naturally `(n_time,)` already. - Expression results: `(n_time,)` regardless of whether inputs are time-varying (if all inputs are constant, the output is constant across time, but stored as a repeated vector anyway). The scratch matrix has shape `(n_params, n_time)` and dtype `float64`. Component evaluation gathers rows from this matrix and reshapes them to `(n_time, 1)` for broadcasting against `(1, n_energy)`. Memory cost: for a model with 20 parameters and 440 time points, the scratch matrix is 20 * 440 * 8 = 69 KB. Negligible. #### 3.2 Data structures ```python class OpKind(IntEnum): """2D backend component function op codes. This is the *backend-specific* lowered enum, not the graph-level function identity. schedule_2d maps GraphNode.function_name to OpKind during compilation (e.g. "GLP" -> GLP, "Shirley" -> SHIRLEY). A future 1D backend would have its own enum with time-domain ops. """ GAUSS = 0 GAUSS_ASYM = 1 LORENTZ = 2 GLS = 3 GLP = 4 DS = 5 OFFSET = 10 LINBACK = 11 SHIRLEY = 12 @dataclass(frozen=True) class ScheduledPlan2D: """Compiled 2D execution schedule. No Python objects in the hot path (except ``expr_programs``). """ energy: np.ndarray # (n_energy,) time: np.ndarray # (n_time,) n_params: int # total parameter count (all types) n_time: int # len(time) # --- Parameter mapping --- # Initial trace matrix: param_traces[i, :] is the base trace for param i. # For static/opt params, all n_time values are identical. # For time-dep params, the trace is base + dynamics_trace. param_traces_init: np.ndarray # (n_params, n_time) initial values # Which rows are optimizer-visible (overwritten from theta each call): opt_indices: np.ndarray # (n_opt,) int -- row indices into trace matrix opt_param_names: list[str] # (n_opt,) canonical optimizer parameter names # defines the order theta must follow # --- Dynamics subgraphs (grouped by target PARAM_PLUS_TRACE) --- # Each dynamics group corresponds to one time-dependent parameter. # Multiple dynamics components (e.g. bi-exponential terms) feeding # the same PARAM_PLUS_TRACE are grouped together. Substeps within a # group are indexed via the CSR-style dyn_group_indptr. n_dyn_groups: int dyn_group_target_row: np.ndarray # (n_dyn_groups,) int dyn_group_base_row: np.ndarray # (n_dyn_groups,) int dyn_group_indptr: np.ndarray # (n_dyn_groups + 1,) int -- CSR into substep arrays dyn_sub_func_id: np.ndarray # (n_substeps,) int dyn_sub_param_rows: np.ndarray # (n_substeps, max_dyn_params) int, -1 padded dyn_sub_n_params: np.ndarray # (n_substeps,) int # --- Expression evaluation --- n_expressions: int expr_target_rows: np.ndarray # (n_expressions,) int -- which row to write expr_programs: list["ExprProgram"] # compiled RPN programs # --- Interleaved parameter resolution schedule --- # Dynamics groups and expressions may depend on each other # (for example, a dynamics parameter expressed in terms of another # dynamics parameter). These arrays encode the correct topological # execution order: # kind=0 -> dynamics group step, index into dyn_group_* arrays # kind=1 -> expression step, index into expr_* arrays / expr_programs resolution_kinds: np.ndarray # (n_dyn_groups + n_expressions,) int8 resolution_indices: np.ndarray # (n_dyn_groups + n_expressions,) int # --- Scheduled component ops --- # Components in topologically-sorted execution order (derived from graph edges). # NOT a naive "peaks then backgrounds" reorder -- preserves model semantics. n_ops: int op_schedule: np.ndarray # (n_ops,) int -- execution order indices op_kinds: np.ndarray # (n_ops,) OpKind int codes # For each op: which rows in trace matrix are its parameter inputs. # Stored as CSR (compressed sparse row) for variable-length param lists: op_param_indptr: np.ndarray # (n_ops + 1,) int -- CSR row pointers op_param_indices: np.ndarray # (total_op_params,) int -- row indices # Which ops need the accumulated spectrum as input: op_needs_spectrum: np.ndarray # (n_ops,) bool # Accumulation targets: which ops contribute to the "peak sum" that # spectrum-fed ops consume. Derived from graph SUM/SPECTRUM_INPUT edges. op_is_pre_spectrum: np.ndarray # (n_ops,) bool -- contributes to peak_sum ``` #### 3.3 Expression programs ```python class ExprNodeKind(IntEnum): """RPN instruction types.""" CONST = 0 # push literal float PARAM_REF = 1 # push trace matrix row (by index) ADD = 2 # pop 2, push sum SUB = 3 # pop 2, push difference MUL = 4 # pop 2, push product DIV = 5 # pop 2, push quotient NEG = 6 # pop 1, push negation POW = 7 # pop 2, push power @dataclass(frozen=True) class ExprProgram: """Compiled expression: flat int array encoding an RPN program.""" # Encoding: pairs of (node_kind, operand). # CONST: operand is float bits (np.float64.view(np.int64)) # PARAM_REF: operand is row index into trace matrix # Operators: operand is 0 (unused) instructions: np.ndarray # (2 * n_instructions,) int64 ``` All values flowing through the RPN evaluator are `(n_time,)` arrays (rows from the trace matrix). Constants are broadcast to `(n_time,)`. This means expression evaluation is uniform -- no scalar/array branching. **What's out of scope (v1):** Function calls (`np.exp`, `np.log`), conditionals, string operations. These would extend `ExprNodeKind`. If encountered during graph construction, those nodes stay in the graph but `can_lower_2d()` returns False. ### 4. `build_graph(model) -> GraphIR` Walks the OOP tree once and emits the graph. #### Algorithm ``` build_graph(model): 1. Create parameter nodes For each component in model.components: For each Par in component.pars: - If expression: create EXPRESSION node - If par.t_vary: create OPT_PARAM (base), DYNAMICS_TRACE, PARAM_PLUS_TRACE nodes, plus OPT/STATIC nodes for dynamics params - If par.vary == False: create STATIC_PARAM node - Else: create OPT_PARAM node 2. Create expression edges For each EXPRESSION node: - Parse expr_string to extract referenced parameter names - Add EXPR_REF edge from each referenced param node to this node - If any referenced param has a PARAM_PLUS_TRACE, reference that instead (resolved time-dependent value) 3. Create component nodes For each component in model.components: - Create COMPONENT_EVAL or SPECTRUM_FED_OP node based on: * Shirley: SPECTRUM_FED_OP (it actually reads spectrum) * Offset, LinBack: COMPONENT_EVAL (they don't read spectrum despite Python signature; LinBack uses x/params only, Offset is pure constant) * Others: COMPONENT_EVAL - Add PARAM_INPUT edges from resolved param nodes to component, with position matching function signature order 4. Create convolution, profile, and subcycle nodes (when present) For each component with comp_type == "conv": - Create CONVOLUTION node - Add ADDEND edge from accumulated signal (the SUM being built) - Add PARAM_INPUT edges from kernel parameters For each component with a profiled parameter (`p_vary`) or an expression parameter that references a profiled parameter (`expr_refs_profile_dep`): - For each direct `p_vary` parameter: create PROFILE_SAMPLE nodes for each aux_axis point - Add PARAM_INPUT edges into each PROFILE_SAMPLE (base + profile-model params for that aux point) - For each `expr_refs_profile_dep` parameter: create per-sample EXPRESSION nodes, with EXPR_REF edges to the matching per-sample PROFILE_SAMPLE nodes of the referenced profiled parameter - Create a per-sample COMPONENT_EVAL node for each aux_axis point, using PROFILE_SAMPLE / per-sample EXPRESSION inputs where needed - Create one component-level PROFILE_AVERAGE node with ADDEND edges from the per-sample COMPONENT_EVAL nodes - Replace the original component in the combination graph with this PROFILE_AVERAGE node For each dynamics model with subcycle != 0: - Create SUBCYCLE_REMAP node before the DYNAMICS_TRACE - Create SUBCYCLE_MASK node after the DYNAMICS_TRACE - Store time_norm and time_n_sub in node.arrays (e.g. SUBCYCLE_REMAP.arrays["time_norm"], SUBCYCLE_MASK.arrays["time_n_sub"]) These nodes are fully represented in the graph. can_lower_2d() will return False if any are present, but the graph is still valid and can be interpreted or compiled by a future backend. 5. Create combination nodes - Analyze the model's combine order (LIFO from mcp.py:806) - Create SUM node for peak accumulation - For SPECTRUM_FED_OP nodes (Shirley): add SPECTRUM_INPUT edge from the peak SUM node - Create final SUM node that adds everything together - Edges preserve the original combine semantics 6. Assign source_order to all nodes - Monotonically increasing, following the order nodes were created during the walk. For YAML-built models this reflects component definition order. For UI-built graphs, the UI sets source_order at node creation time. 7. Return GraphIR ``` #### Offset and LinBack reclassification The current interpreter treats all background functions uniformly (`comp_type == "back"` -> pass `spectrum=value`). But examining the actual functions: - `Offset(x, y0, spectrum)`: returns `np.full_like(spectrum, y0)`. Uses `spectrum` only for shape. With uniform `(n_time, n_energy)` arrays, this is just a broadcast scalar -- no spectrum dependency. - `LinBack(x, m, b, xStart, xStop, spectrum)`: returns a piecewise function of `x`, `m`, `b`, `xStart`, `xStop`. The `spectrum` argument is never read. Pure function of its own params + energy axis. - `Shirley(x, pShirley, spectrum)`: returns `pShirley * cumsum(spectrum[::-1])[::-1]`. Truly reads `spectrum`. In the graph, only Shirley gets a `SPECTRUM_INPUT` edge. Offset and LinBack become regular `COMPONENT_EVAL` nodes (type `INDEPENDENT` in the old terminology). This is semantically correct and allows them to be evaluated in the first pass alongside peaks. ### 5. `schedule_2d(graph) -> ScheduledPlan2D` Compiles the graph into a flat execution schedule. The v1 backend is a **specialized lowering target**, not a generic DAG executor. The GraphIR is general-purpose (and a future UI will build arbitrary graphs), but the current scheduler and evaluator collapse the graph to a fixed-shape pipeline: parameter traces -> component ops -> peak_sum -> optional spectrum-fed ops -> final sum. This is intentional -- it keeps the evaluator simple and fast. As the compiler supports more node types, the pipeline shape may grow, but it should always be a concrete schedule, never a runtime graph walker. #### Algorithm ``` schedule_2d(graph): 1. Topological sort of all nodes Tie-breaker: when two nodes have no dependency ordering between them, sort by node.source_order (lower first). This makes the schedule deterministic regardless of whether the graph came from YAML parsing or a UI drag-and-drop canvas. 2. Assign trace matrix rows - One row per parameter (STATIC_PARAM, OPT_PARAM, resolved PARAM_PLUS_TRACE, EXPRESSION output) - Contiguous: opt params first (so theta maps to a slice), then static, then computed 3. Compile grouped dynamics subgraphs - For each DYNAMICS_TRACE node: * Record function ID (expFun -> 0, sinFun -> 1, etc.) * Record param row indices (from PARAM_INPUT edges) * Record the PARAM_PLUS_TRACE target row and its base row - Group all DYNAMICS_TRACE nodes feeding the same PARAM_PLUS_TRACE into one dynamics group - Pack groups as CSR-style arrays: dyn_group_target_row / dyn_group_base_row / dyn_group_indptr, plus flat dyn_sub_* arrays for the grouped substeps 4. Compile expressions - For each EXPRESSION node (in topological order): * Parse expr_string into RPN using Python ast module * Reject unsupported AST nodes (function calls, etc.) * Resolve param references to trace matrix row indices (using EXPR_REF edges as the source of truth) * Encode as flat int64 array 5. Build interleaved parameter-resolution schedule - Walk topo order and emit: * expression steps when EXPRESSION nodes appear * one dynamics-group step at the last DYNAMICS_TRACE node in each group, so all prerequisite expressions are already resolved before that group is evaluated - Store as resolution_kinds / resolution_indices 6. Schedule component ops - Topological order from graph edges determines execution order - For each COMPONENT_EVAL / SPECTRUM_FED_OP node: * Record OpKind * Record param row indices (from PARAM_INPUT edges, CSR-encoded) * Record whether it needs spectrum input (has SPECTRUM_INPUT edge) - Derive op_is_pre_spectrum: which ops contribute to the accumulated peak sum (inputs to the SUM node that feeds SPECTRUM_FED_OP nodes) 7. Initialize trace matrix - For static params: fill row with repeated scalar - For opt params: fill row with repeated initial value - Replay the same interleaved resolution schedule used by the runtime evaluator: * dynamics group -> base + sum(substep traces) * expression -> evaluate RPN program against current trace matrix 8. Pack into ScheduledPlan2D, return ``` #### Critical invariants - `schedule_2d()` is called **once** before the optimizer starts. The plan is immutable during optimization (except the trace matrix scratch space, which is a working copy). - The plan does not hold references to any `Model`, `Component`, or `Par` objects. It is pure data. - Optimizer parameter ordering is defined by `opt_param_names` stored in the plan (see below). This is the canonical order -- `extract_theta` must return values in this order, and result writeback must use it. At construction time, `opt_param_names` is derived from `model.parameter_names` / `model.lmfit_pars` (only the vary=True subset), so the contract is explicit rather than implicit. - The execution order preserves model semantics: the scheduler derives it from graph edges with stable tie-breaking on original definition order, not from a coarse bucket sort. ### 6. `evaluate_2d(plan, theta) -> ndarray` Pure function. Takes the scheduled plan and optimizer parameter vector, returns `(n_time, n_energy)` spectrum. #### Algorithm ``` evaluate_2d(plan, theta): 1. PARAMETER RESOLUTION (once per call) a. Copy plan.param_traces_init -> traces (n_params, n_time) scratch b. Broadcast optimizer params into trace matrix: traces[plan.opt_indices, :] = theta[:, np.newaxis] c. Execute the interleaved resolution schedule: For each step in (resolution_kinds, resolution_indices): - If kind == dynamics group: * target = dyn_group_target_row[idx] * start from traces[target, :] = traces[base_row, :] * for each grouped substep: - Gather params from traces[dyn_sub_param_rows[s], 0] - Call dynamics function(plan.time, *p) -> (n_time,) trace - Accumulate into the target row - If kind == expression: * Execute RPN against traces matrix * All operands are (n_time,) rows; arithmetic broadcasts * Write result row: traces[expr_target_rows[idx], :] = result 2. COMPONENT EVALUATION (in scheduled order) result = zeros(n_time, n_energy) peak_sum = zeros(n_time, n_energy) For each op in plan.op_schedule: - Gather params: rows from traces matrix via CSR indices -> reshape each from (n_time,) to (n_time, 1) for broadcasting - If NOT op_needs_spectrum[op]: component = eval_op(plan.energy, params, plan.op_kinds[op]) result += component if op_is_pre_spectrum[op]: peak_sum += component - If op_needs_spectrum[op]: component = eval_op_with_spectrum( plan.energy, params, peak_sum, plan.op_kinds[op]) result += component 3. Return result # (n_time, n_energy) ``` #### Component evaluation dispatch Each `OpKind` maps directly to the shared kernels in `src/trspecfit/functions/energy.py`, which are the single source of truth for component math. Peak functions broadcast naturally with `(n_time, 1)` params and `(1, n_energy)` energy; `Offset`, `LinBack`, and `Shirley` are written to support both 1D and broadcasted 2D inputs: ```python _OP_DISPATCH = { OpKind.GAUSS: (fcts_energy.Gauss, 3, False), OpKind.GLP: (fcts_energy.GLP, 4, False), OpKind.OFFSET: (fcts_energy.Offset, 1, False), OpKind.LINBACK: (fcts_energy.LinBack, 4, False), OpKind.SHIRLEY: (fcts_energy.Shirley, 1, True), } ``` Shirley should use axis-agnostic last-axis operations (`axis=-1`) so the same implementation works for both 1D and batched inputs. #### Dynamics function dispatch Dynamics functions are called once per residual call on the full time axis. The evaluator maps `dyn_sub_func_id` to the existing functions from `functions/time.py`: ```python DYNAMICS_DISPATCH = { 0: fcts_time.expFun, # (t, A, tau, t0, y0) -> (n_time,) 1: fcts_time.sinFun, # (t, A, f, phi, t0, y0) -> (n_time,) 2: fcts_time.linFun, # (t, m, t0, y0) -> (n_time,) ... } ``` These are the same functions, called once with full arrays. No per-step loop. ### 7. Integration with lmfit #### Current call chain (2D fits) ``` lmfit.minimize(residual_fun, params, ...) -> residual_fun(params, x, data, ..., args=(model, 2)) -> par_extract(params) -> list of values -> fit_model_mcp(x, par_values, True, model, 2) -> model.update_value(par_values) # write theta into lmfit.Parameters -> model.create_value_2d() # THE HOT PATH (interpreter loop) -> for each time step: create_value_1d(t_ind=ti) -> for each component: Component.value(t_ind) -> for each par: Par.value(t_ind) <-- 2M calls -> return model.value_2d -> residual = data - fit -> return residual.flatten() ``` #### New call chain (implemented) ``` # Default project setting: project.spec_fun_str = "fit_model_gir" # Before a lowerable fit (1D or 2D): graph = build_graph(model) plan = schedule_2d(graph) # or schedule_1d(graph) for ENERGY_1D theta_indices = precompute_indices(model.parameter_names, plan.opt_param_names) # During fitting: lmfit.minimize(residual_fun, params, ..., args=(plan, theta_indices, model, dim)) -> residual_fun(params, x, data, ..., fit_fun_str="fit_model_gir") -> par_extract(params) -> full par_values list -> fit_model_gir(x, par_values, True, plan, theta_indices, model, dim) -> theta = par_values[theta_indices] -> spectrum = evaluate_2d(plan, theta) # or evaluate_1d for 1D -> residual = data - spectrum -> return residual.flatten() ``` **Compiled-path args contract:** the compiled path always passes ``(plan, theta_indices, model, dim)`` as *args*, regardless of whether the plan is ``ScheduledPlan1D`` or ``ScheduledPlan2D``. The *model* and *dim* are carried so that ``fit_model_gir`` can fall back to the interpreter when needed (e.g. ``plot_sum=False`` for 1D component extraction). The non-lowerable fallback passes ``(model, dim)`` only. Implemented behavior: - `fit_model_gir` is a dispatcher. When `args[0]` is a compiled plan it destructures `(plan, theta_indices, model, dim)` and uses the fast evaluator. For 1D plans with `plot_sum=False` it falls back to `fit_model_mcp` for component extraction. When `args[0]` is not a plan it forwards to `fit_model_mcp`. - `fit_model_mcp` remains the explicit interpreter mode. - `fit_model_compare` is the validation mode. For lowerable fits (1D or 2D) it runs GIR and interpreter, compares with `assert_allclose`, and returns the GIR result. - `File.fit_2d`, `File.fit_baseline`, and `File.fit_spectrum` build a graph / plan when `spec_fun_str` is `fit_model_gir` or `fit_model_compare`. - After fitting, all three methods write `result[1].params` back into `model.lmfit_pars` via `par_extract` + `update_value`, because `fit_wrapper` optimizes a deepcopy and the GIR path does not mutate model state on every residual call. ### 8. Scope boundaries -- what compiles in v1 vs later | Feature | v1 compiler | Graph representable | Notes | |---|---|---|---| | Additive peaks (Gauss, GLP, GLS, DS, Voigt, etc.) | Yes | Yes | Core use case | | Offset, LinBack | Yes | Yes | Reclassified as COMPONENT_EVAL (no spectrum dep) | | Shirley | Yes | Yes | SPECTRUM_FED_OP using last-axis cumulative sum | | Arithmetic expressions | Yes | Yes | Compiled to RPN | | Time-dependent params (Dynamics) | Yes | Yes | Dynamics subgraph compiled | | Voigt | Yes | Yes | Broadcast-safe on the current evaluator path | | Convolution components | Yes (Phase 6.3) | Yes (CONVOLUTION node) | Resolved-trace time-domain IRF; gated by ``_is_lowerable_convolution_2d`` | | Profile-varying params | Yes (Phase 6.2) | Yes (PROFILE_* nodes) | 1D and 2D backends | | Subcycle dynamics | Yes (Phase 6.4) | Yes (SUBCYCLE_* nodes) | Compiled into per-substep ``dyn_sub_time_axes`` / ``dyn_sub_masks`` | | Non-arithmetic expressions | No | Partial | Would extend ExprNodeKind | | Project-level fits | No | Deferred | Multi-graph coordination | `can_lower_2d` and `can_lower_1d` are the gatekeepers for the 2D and 1D backends respectively. The graph is broader than any single compiler -- the UI can build graphs that no backend can compile yet. `can_lower_1d` / `schedule_1d` / `evaluate_1d` target `ENERGY_1D` models on the same GraphIR (implemented in Phase 6.1). ### 9. Validation strategy The compiled evaluator must produce results that agree with the interpreter to within reasonable floating-point tolerance. Because the compiled path sums components in a different order (graph-scheduled vs the interpreter's LIFO combine loop), and broadcasts 2D operations instead of accumulating per-time-step, floating-point summation order may differ. This means bitwise identity is not guaranteed -- `allclose` with a practical tolerance is the contract. ```python def validate_plan(model, plan): """Compare interpreter vs compiled evaluator output.""" # Evaluate via interpreter model.create_value_2d() interp_result = model.value_2d.copy() # Evaluate via plan theta = extract_theta(model.lmfit_pars) plan_result = evaluate_2d(plan, theta) # Compare -- rtol=1e-10 accounts for summation order differences. # For typical spectroscopy data (values 0-100), this means agreement # to ~10 significant digits. assert np.allclose(interp_result, plan_result, atol=1e-10, rtol=1e-10) ``` This validation runs: - In the test suite, for every test model that `can_lower_2d()` accepts - In explicit compare mode during fitting via `spec_fun_str="fit_model_compare"` - On fallback cases, compare mode delegates to the interpreter so behavior matches the non-GIR path ### 10. Future: backends and Jacobians Once `evaluate_2d(plan, theta) -> spectrum` exists as a pure function: **Numba:** `@njit` on the component eval dispatch loop. Most of the plan (int index arrays, CSR param maps, dense trace matrix) is Numba-compatible. The expression programs (`list[ExprProgram]`) would need flattening to a CSR-style encoding first -- this is noted in the plan as a v1 simplification that can be tightened later. **JAX:** Replace `np` with `jnp` in the evaluator. The plan's array structure maps directly to JAX arrays. Key wins: - `jax.jit` compiles the full evaluator (including parameter resolution) - `jax.jacfwd` / `jax.jacrev` gives analytic Jacobians for free - GPU acceleration for large grids **Analytic Jacobians:** With a differentiable evaluator, lmfit's Levenberg-Marquardt can use exact Jacobians instead of finite differences. This replaces `2 * n_free_params + 1` evaluator calls per iteration with 1 evaluator call + 1 Jacobian call. For 4 free params, that's 9 -> 2 calls, ~4.5x fewer evaluations per iteration. **Variable projection (VARPRO):** Linear parameters (amplitudes `A`, offset `y0`, slope `m`) can be solved in closed form given the nonlinear parameters. Reduces optimizer dimensionality. The graph makes identifying linear params straightforward: any param that appears as a linear factor in its component's function. --- ## Implementation tracking This document is the long-lived design/spec for the lowered evaluator. - The completed implementation history (all phases of the GIR / lowered evaluator rollout) is archived in [archive/lowered_evaluator_implementation.md](archive/lowered_evaluator_implementation.md). - The closed backend-decision benchmark write-up lives in [archive/numba_vs_jax.md](archive/numba_vs_jax.md).