Execution Plan: Lowered Evaluator
Implementation history (Phases 1-6): archive/lowered_evaluator_implementation.md.
Motivation
A 2D fit (400 energy x 440 time, 4 free params, 556 optimizer iterations) takes ~10 seconds. Profiling shows:
Category |
Self-time |
% of fit |
Cause |
|---|---|---|---|
Math (GLP + LinBack) |
3.2 s |
32% |
The actual numeric work |
Parameter overhead |
2.8 s |
28% |
|
Python dispatch |
1.6 s |
16% |
|
Other |
2.6 s |
24% |
lmfit internals, array ops, etc. |
Root cause: The code acts as an interpreter inside the optimizer loop.
create_value_2d (mcp.py:866) loops over time steps; at each step
Component.value (mcp.py:1518) loops over parameters; each Par.value
(mcp.py:1979) dispatches through 5 branches and calls par_extract which
calls valuesdict(). All of this repeats 440 x 556 = 245k times.
The irony: for time-dependent parameters, the full time trace already
exists in Dynamics.value_1d. But Par.value(t_ind) consumes it one
scalar at a time via base + self.t_model.value_1d[t_ind].
Goal: Stop interpreting the model on every residual call. Lower the model once into a flat, array-oriented execution plan, then evaluate it as a pure function.
Architecture overview
User API (unchanged) Future model-builder UI
| |
v v
Model / Component / Par <-- OOP tree (parsing, validation, user interaction)
| |
| build_graph(model) | (UI will target GraphIR directly)
v v
GraphIR <-- NEW: semantic IR (DAG), axis-agnostic
| works for 1D and 2D models
| can_lower_2d(graph) <-- gate: can the 2D backend compile this graph?
| schedule_2d(graph) <-- NEW: compile DAG to flat 2D execution schedule
v
ScheduledPlan2D <-- NEW: packed arrays, (n_params, n_time) traces
|
| evaluate_2d(plan, theta) <-- NEW: pure function, array-oriented
v
spectrum (n_time, n_energy)
Three-layer design:
OOP tree – the user-facing
Model/Component/Parobjects. Handles parsing, validation, user interaction. Unchanged.GraphIR – a directed acyclic graph of typed nodes with explicit data-dependency edges. This is the semantic representation of the model. Both the OOP tree and a future drag-and-drop model-builder UI can target it. The graph is axis-agnostic – it works for 1D energy models, 1D time models, and 2D time+energy models. A user builds a 1D energy model first (valid graph, no time axis), then adds dynamics (graph gains time-dependent nodes and a time axis), making it compilable by the 2D backend. Standalone
TIME_1Dgraphs are still valuable even without a lowered backend because they allow a future model-builder UI to author, validate, and save reusable dynamics-only YAML models.ScheduledPlan2D – a flat, packed-array execution schedule compiled from the graph by the 2D backend. No Python objects, no strings, no dicts in the hot path. This is what the evaluator actually runs. Backend-specific: a future
ScheduledPlan1D/can_lower_1d/evaluate_1dcan target the same GraphIR with a simpler storage model (no(n_params, n_time)trace matrix needed for 1D) forENERGY_1Dmodels (fit_baseline,fit_spectrum).TIME_1Dstandalone dynamics graphs remain graph-valid but are out of lowered backend scope for now.
Key principle: the OOP tree is for humans; the GraphIR is for semantics; the ScheduledPlan2D is for the optimizer.
Spec
1. GraphIR – the semantic intermediate representation
The GraphIR is a DAG (directed acyclic graph) of typed nodes connected by explicit dependency edges. It captures the full semantics of the model without prescribing an evaluation strategy.
1.1 Node types
class NodeKind(IntEnum):
"""Node types in the model graph."""
# --- Parameter nodes (leaves) ---
STATIC_PARAM = 0 # fixed value, never changes during fit
OPT_PARAM = 1 # optimizer-visible parameter (lmfit varies it)
# --- Computed parameter nodes ---
DYNAMICS_TRACE = 2 # time-dependent trace: evaluates a dynamics
# function over the full time axis, producing
# an (n_time,) array
PARAM_PLUS_TRACE = 3 # base_param + dynamics_trace -> (n_time,) resolved value
EXPRESSION = 4 # arithmetic expression referencing other params
# --- Component evaluation nodes ---
COMPONENT_EVAL = 5 # evaluates a component function over its domain axis:
# energy functions -> (n_energy,) or (n_time, n_energy)
# time functions -> (n_time,)
# profile functions -> (n_aux,)
# the domain is determined by the component's package
# (fcts_energy, fcts_time, fcts_profile), not by the
# graph's DomainKind
# --- Reduction / combination nodes ---
SUM = 6 # element-wise sum of multiple inputs
SPECTRUM_FED_OP = 7 # component that consumes accumulated spectrum
# as input (Shirley only in v1)
# --- Convolution and profile nodes ---
# These are part of the IR spec and build_graph emits them when the
# model uses these features. They cannot be compiled by the v1 2D
# backend (can_lower_2d returns False), but they are fully representable
# in the graph -- a future backend or the model-builder UI can work
# with them.
CONVOLUTION = 100 # convolves accumulated signal with a kernel
# component (e.g. gaussCONV). Edges: ADDEND from
# the accumulated signal, PARAM_INPUT from kernel
# parameters. Output replaces the accumulated signal.
PROFILE_SAMPLE = 101 # resolves one profiled parameter at one
# aux_axis point:
# base + profile.value_1d[aux_ind]
# One PROFILE_SAMPLE node per profiled parameter
# per aux_axis point.
PROFILE_AVERAGE = 102 # uniform average over per-sample COMPONENT_EVAL
# outputs for one component. Edges: ADDEND from
# each per-sample COMPONENT_EVAL node.
SUBCYCLE_MASK = 103 # element-wise multiply by time_n_sub mask array.
# Applied to a DYNAMICS_TRACE to zero out inactive
# subcycle regions.
SUBCYCLE_REMAP = 104 # remaps a DYNAMICS_TRACE to use time_norm instead
# of the raw time axis (resets to 0 each subcycle).
# Precedes the dynamics function evaluation.
1.2 Edge semantics
Edges are typed and carry dependency information:
class EdgeKind(IntEnum):
"""Edge types in the model graph."""
PARAM_INPUT = 0 # parameter flows into a component or expression
TRACE_INPUT = 1 # dynamics trace flows into PARAM_PLUS_TRACE
BASE_INPUT = 2 # base param flows into PARAM_PLUS_TRACE
ADDEND = 3 # component output flows into SUM
SPECTRUM_INPUT = 4 # accumulated spectrum flows into SPECTRUM_FED_OP
EXPR_REF = 5 # parameter reference within an expression
1.3 Graph structure
@dataclass
class GraphNode:
"""One node in the model graph."""
id: int # unique node ID
kind: NodeKind
name: str # human-readable name (e.g. "GLP_01_A", "GLP_01")
source_order: int # stable ordering key for deterministic scheduling.
# When built from YAML: component definition order.
# When built from UI: insertion order in the canvas.
# schedule_2d uses this as tie-breaker when
# topological sort has multiple valid orderings.
# Payload (interpretation depends on kind):
value: float | None # for STATIC_PARAM, OPT_PARAM: initial value
function_name: str | None # for COMPONENT_EVAL, SPECTRUM_FED_OP, DYNAMICS_TRACE,
# CONVOLUTION: the function registry name
# (e.g. "GLP", "Shirley", "expFun", "gaussCONV").
# This is the graph-level function identity -- it
# works across all domains (energy, time, profile).
# Backend-specific compilers map this to their own
# op enums (e.g. schedule_2d maps "GLP" -> OpKind.GLP).
package: str | None # which function module: "energy", "time", "profile".
# Together with function_name, uniquely identifies
# the callable. Needed because function names could
# theoretically collide across packages.
expr_string: str | None # for EXPRESSION: the expression source
vary: bool # for OPT_PARAM: whether optimizer can change it
bounds: tuple[float, float] | None # for OPT_PARAM: (min, max) bounds
arrays: dict[str, np.ndarray] # node-local array data. Examples:
# SUBCYCLE_MASK: {"time_n_sub": array}
# SUBCYCLE_REMAP: {"time_norm": array}
# PROFILE_SAMPLE: {"aux_axis": array}
# CONVOLUTION: {"kernel_time": array}
# Empty dict for nodes that need no array payload.
# This is graph-level data, not backend-specific --
# the scheduler copies what it needs into the plan.
@dataclass
class GraphEdge:
"""One edge in the model graph."""
source: int # source node ID
target: int # target node ID
kind: EdgeKind
position: int | None # for PARAM_INPUT: which positional arg (0, 1, 2, ...)
class DomainKind(IntEnum):
"""Model domain classification.
Determined by which axes the model operates on:
- ENERGY_1D: model has energy axis only. This is the starting state
for all energy models (fit_baseline, fit_spectrum). COMPONENT_EVAL
nodes evaluate energy functions over (n_energy,).
- TIME_1D: model has time axis only. Used for standalone Dynamics
models. COMPONENT_EVAL nodes evaluate time functions over
(n_time,). This domain is valid in the GraphIR for semantic
completeness, graph tooling, and future model-builder/UI workflows,
but has no compiled backend in the current scope.
- ENERGY_TIME_2D: model has both axes. Created when dynamics are
added to an ENERGY_1D model. COMPONENT_EVAL nodes evaluate energy
functions over (n_time, n_energy) via broadcasting.
"""
ENERGY_1D = 0 # energy-resolved only (e.g. fit_baseline, fit_spectrum)
TIME_1D = 1 # time-resolved only (e.g. standalone dynamics fit)
ENERGY_TIME_2D = 2 # energy + time resolved (the 2D fit case)
@dataclass
class GraphIR:
"""Directed acyclic graph representing a model.
Axis-agnostic: works for 1D and 2D models. A 1D energy model has
time=None; adding dynamics populates time and promotes domain to
ENERGY_TIME_2D. Backend-specific compilers (schedule_2d, etc.)
check the domain before compiling.
"""
nodes: list[GraphNode]
edges: list[GraphEdge]
domain: DomainKind # what axes this model operates on
energy: np.ndarray | None # (n_energy,) or None
time: np.ndarray | None # (n_time,) or None
# Reverse map for quick lookup
node_by_name: dict[str, int] # name -> node ID
1.4 Example: simple_energy model as a graph
The YAML model:
simple_energy:
Offset:
y0: [2, True, 0, 5]
Shirley:
pShirley: [4.0E-4, False]
GLP:
A: [20, True, 5, 25]
x0: [84.5, True, 82, 88]
F: [1.0, True, 0.75, 2.5]
m: [0.3, True, 0, 1]
GLP:
A: [17, True, 5, 25]
x0: [88.1, True]
F: [1.0, True, 0.75, 2.5]
m: [0.3, True, 0, 1]
Becomes:
Nodes:
0: OPT_PARAM "Offset_y0" value=2.0
1: STATIC_PARAM "Shirley_pShirley" value=4e-4
2: OPT_PARAM "GLP_01_A" value=20.0
3: OPT_PARAM "GLP_01_x0" value=84.5
4: OPT_PARAM "GLP_01_F" value=1.0
5: OPT_PARAM "GLP_01_m" value=0.3
6: OPT_PARAM "GLP_02_A" value=17.0
7: OPT_PARAM "GLP_02_x0" value=88.1
8: OPT_PARAM "GLP_02_F" value=1.0
9: OPT_PARAM "GLP_02_m" value=0.3
10: COMPONENT_EVAL "GLP_01" function_name="GLP" package="energy"
11: COMPONENT_EVAL "GLP_02" function_name="GLP" package="energy"
12: SUM "peak_sum"
13: COMPONENT_EVAL "Offset" function_name="Offset" package="energy"
14: SPECTRUM_FED_OP "Shirley" function_name="Shirley" package="energy"
15: SUM "total"
Edges:
2 -> 10 PARAM_INPUT(pos=0) # A -> GLP_01
3 -> 10 PARAM_INPUT(pos=1) # x0 -> GLP_01
4 -> 10 PARAM_INPUT(pos=2) # F -> GLP_01
5 -> 10 PARAM_INPUT(pos=3) # m -> GLP_01
6 -> 11 PARAM_INPUT(pos=0) # A -> GLP_02
7 -> 11 PARAM_INPUT(pos=1) # x0 -> GLP_02
8 -> 11 PARAM_INPUT(pos=2) # F -> GLP_02
9 -> 11 PARAM_INPUT(pos=3) # m -> GLP_02
10 -> 12 ADDEND # GLP_01 -> peak_sum
11 -> 12 ADDEND # GLP_02 -> peak_sum
0 -> 13 PARAM_INPUT(pos=0) # y0 -> Offset
12 -> 14 SPECTRUM_INPUT # peak_sum -> Shirley (Shirley consumes spectrum)
1 -> 14 PARAM_INPUT(pos=0) # pShirley -> Shirley
12 -> 15 ADDEND # peak_sum -> total
13 -> 15 ADDEND # Offset -> total
14 -> 15 ADDEND # Shirley -> total
Notice: Offset has no SPECTRUM_INPUT edge because it does not consume
the accumulated spectrum. LinBack also does not – it receives spectrum
in the current Python signature for interface uniformity, but ignores it.
Only Shirley has a true data dependency on the peak sum.
1.5 Time-dependent model as a graph
When GLP_01_A has a dynamics model (expFun with params A, tau,
t0, y0):
Additional nodes:
20: OPT_PARAM "GLP_01_A_expFun_01_A" value=5.0
21: OPT_PARAM "GLP_01_A_expFun_01_tau" value=100.0
22: OPT_PARAM "GLP_01_A_expFun_01_t0" value=0.0
23: STATIC_PARAM "GLP_01_A_expFun_01_y0" value=0.0
24: DYNAMICS_TRACE "GLP_01_A_dynamics" function_name="expFun" package="time"
25: PARAM_PLUS_TRACE "GLP_01_A_resolved"
Additional edges:
20 -> 24 PARAM_INPUT(pos=0) # A -> dynamics
21 -> 24 PARAM_INPUT(pos=1) # tau -> dynamics
22 -> 24 PARAM_INPUT(pos=2) # t0 -> dynamics
23 -> 24 PARAM_INPUT(pos=3) # y0 -> dynamics
2 -> 25 BASE_INPUT # GLP_01_A base value
24 -> 25 TRACE_INPUT # dynamics trace
25 -> 10 PARAM_INPUT(pos=0) # resolved A -> GLP_01 (replaces edge 2->10)
1.6 Expression model as a graph
When GLP_02_A = "3/4*GLP_01_A":
Node 6 becomes:
6: EXPRESSION "GLP_02_A" expr_string="3/4*GLP_01_A"
Additional edge:
2 -> 6 EXPR_REF # GLP_01_A referenced in expression
The expression node’s output feeds into GLP_02 as before via
6 -> 11 PARAM_INPUT(pos=0).
If GLP_01_A is time-dependent (has a PARAM_PLUS_TRACE), then node 6
references the resolved value (node 25), and the expression output is
also (n_time,).
If GLP_01_A is profiled, the interpreter semantics are different:
the expression must be re-evaluated at each aux-axis point before the
component is evaluated and averaged. In the graph, this is represented
by per-sample EXPRESSION nodes (for example
GLP_02_A_profile_expr_0, GLP_02_A_profile_expr_1, …) whose
EXPR_REF edges point to the matching per-sample PROFILE_SAMPLE
nodes for GLP_01_A. Those per-sample expression nodes feed per-sample
COMPONENT_EVAL nodes, and the component-level PROFILE_AVERAGE
averages the resulting traces.
1.7 Why a graph, not a two-bucket sort
The previous version of this doc sorted components into “independent” and “spectrum-dependent” buckets. This is wrong for two reasons:
It conflates dependency class with evaluation order. Offset and LinBack are currently classified as
SPECTRUM_DEPbecause they havespectrumin their Python signature, but neither actually reads it. Only Shirley does. The graph makes this explicit: only Shirley has aSPECTRUM_INPUTedge.Reordering risks changing semantics. The current interpreter combines components in LIFO order (mcp.py:806). Background components are defined first in YAML, peaks last, so the reverse iteration evaluates peaks before backgrounds. This ordering is validated:
background_lastin the test YAML is expected to fail. The graph preserves the original model semantics via edges, and the scheduler derives a safe topological order.
2. can_lower_2d(graph) -> bool
Note the input: can_lower_2d operates on the GraphIR, not the OOP model.
This means “can the 2D NumPy backend compile this graph,” not “can this
model be represented as a graph at all.” Every model can be a graph; not
every graph can be compiled to the 2D fast evaluator yet. A 1D-only graph
(ENERGY_1D or TIME_1D) returns False here. A future
can_lower_1d would target ENERGY_1D models; TIME_1D standalone
dynamics graphs remain graph-valid but backend-out-of-scope for now.
Compilable in v1:
All
COMPONENT_EVALnodes have afunction_namethatschedule_2dcan map to a supportedOpKindwith verified broadcast semantics. v1 list:Gauss,GaussAsym,Lorentz,Voigt,GLS,GLP,DS,Offset,LinBack. Offset and LinBack areCOMPONENT_EVAL, notSPECTRUM_FED_OP– they do not consume the accumulated spectrum (see section 4).SPECTRUM_FED_OPnodes:Shirleyonly (the sole function that truly reads the accumulated peak sum)All
EXPRESSIONnodes contain only arithmetic operations (add, sub, mul, div, neg, pow, literal constants, parameter references)CONVOLUTION,PROFILE_*, andSUBCYCLE_*nodes are all now compilable (Phases 6.2-6.4). Convolution is gated per-node by_is_lowerable_convolution_2d(time-domain kernel, resolved-trace topology); subcycle nodes are compiled away inschedule_2dinto per-substepdyn_sub_time_axes/dyn_sub_masksschedule arrays.graph.domain == ENERGY_TIME_2D(has both energy and time axes)
Falls back to interpreter when can_lower_2d returns False:
1D-only models (domain != ENERGY_TIME_2D; future
can_lower_1dwould targetENERGY_1D, notTIME_1D, in the current roadmap)Non-arithmetic expressions
Convolution shapes outside the resolved-trace contract encoded in
_is_lowerable_convolution_2d(e.g. spectrum-levelcomp_type="conv", non-time kernels)
Phases 6.2-6.4 folded CONVOLUTION, PROFILE_*, and SUBCYCLE_* into
the 2D backend. A resolved-trace (subcycle=0) IRF coexists with
subcycle-aware dynamics in the same model; only subcycle>0 substeps
carry time_norm / time_n_sub schedule arrays.
3. ScheduledPlan2D – the compiled 2D execution schedule
The scheduler takes a GraphIR and produces a flat, packed-array execution plan. No Python objects in the hot path.
3.1 Storage model: uniform (n_params, n_time) trace matrix
The previous version of this doc stored some params as scalars and some as
(n_time,) arrays, which forces dtype=object or runtime branching.
Instead, we use a uniform storage model:
All parameters are stored as (n_time,) traces in a dense matrix.
Static params: broadcast to
(n_time,)by repeating the scalar value. This wastes ~8 bytes * n_time per static param (e.g. 3.5 KB for 440 time points) but eliminates all scalar-vs-array branching.Optimizer params: same – broadcast to
(n_time,).Time-dependent params: naturally
(n_time,)already.Expression results:
(n_time,)regardless of whether inputs are time-varying (if all inputs are constant, the output is constant across time, but stored as a repeated vector anyway).
The scratch matrix has shape (n_params, n_time) and dtype float64.
Component evaluation gathers rows from this matrix and reshapes them to
(n_time, 1) for broadcasting against (1, n_energy).
Memory cost: for a model with 20 parameters and 440 time points, the scratch matrix is 20 * 440 * 8 = 69 KB. Negligible.
3.2 Data structures
class OpKind(IntEnum):
"""2D backend component function op codes.
This is the *backend-specific* lowered enum, not the graph-level
function identity. schedule_2d maps GraphNode.function_name to
OpKind during compilation (e.g. "GLP" -> GLP, "Shirley" -> SHIRLEY).
A future 1D backend would have its own enum with time-domain ops.
"""
GAUSS = 0
GAUSS_ASYM = 1
LORENTZ = 2
GLS = 3
GLP = 4
DS = 5
OFFSET = 10
LINBACK = 11
SHIRLEY = 12
@dataclass(frozen=True)
class ScheduledPlan2D:
"""Compiled 2D execution schedule.
No Python objects in the hot path (except ``expr_programs``).
"""
energy: np.ndarray # (n_energy,)
time: np.ndarray # (n_time,)
n_params: int # total parameter count (all types)
n_time: int # len(time)
# --- Parameter mapping ---
# Initial trace matrix: param_traces[i, :] is the base trace for param i.
# For static/opt params, all n_time values are identical.
# For time-dep params, the trace is base + dynamics_trace.
param_traces_init: np.ndarray # (n_params, n_time) initial values
# Which rows are optimizer-visible (overwritten from theta each call):
opt_indices: np.ndarray # (n_opt,) int -- row indices into trace matrix
opt_param_names: list[str] # (n_opt,) canonical optimizer parameter names
# defines the order theta must follow
# --- Dynamics subgraphs (grouped by target PARAM_PLUS_TRACE) ---
# Each dynamics group corresponds to one time-dependent parameter.
# Multiple dynamics components (e.g. bi-exponential terms) feeding
# the same PARAM_PLUS_TRACE are grouped together. Substeps within a
# group are indexed via the CSR-style dyn_group_indptr.
n_dyn_groups: int
dyn_group_target_row: np.ndarray # (n_dyn_groups,) int
dyn_group_base_row: np.ndarray # (n_dyn_groups,) int
dyn_group_indptr: np.ndarray # (n_dyn_groups + 1,) int -- CSR into substep arrays
dyn_sub_func_id: np.ndarray # (n_substeps,) int
dyn_sub_param_rows: np.ndarray # (n_substeps, max_dyn_params) int, -1 padded
dyn_sub_n_params: np.ndarray # (n_substeps,) int
# --- Expression evaluation ---
n_expressions: int
expr_target_rows: np.ndarray # (n_expressions,) int -- which row to write
expr_programs: list["ExprProgram"] # compiled RPN programs
# --- Interleaved parameter resolution schedule ---
# Dynamics groups and expressions may depend on each other
# (for example, a dynamics parameter expressed in terms of another
# dynamics parameter). These arrays encode the correct topological
# execution order:
# kind=0 -> dynamics group step, index into dyn_group_* arrays
# kind=1 -> expression step, index into expr_* arrays / expr_programs
resolution_kinds: np.ndarray # (n_dyn_groups + n_expressions,) int8
resolution_indices: np.ndarray # (n_dyn_groups + n_expressions,) int
# --- Scheduled component ops ---
# Components in topologically-sorted execution order (derived from graph edges).
# NOT a naive "peaks then backgrounds" reorder -- preserves model semantics.
n_ops: int
op_schedule: np.ndarray # (n_ops,) int -- execution order indices
op_kinds: np.ndarray # (n_ops,) OpKind int codes
# For each op: which rows in trace matrix are its parameter inputs.
# Stored as CSR (compressed sparse row) for variable-length param lists:
op_param_indptr: np.ndarray # (n_ops + 1,) int -- CSR row pointers
op_param_indices: np.ndarray # (total_op_params,) int -- row indices
# Which ops need the accumulated spectrum as input:
op_needs_spectrum: np.ndarray # (n_ops,) bool
# Accumulation targets: which ops contribute to the "peak sum" that
# spectrum-fed ops consume. Derived from graph SUM/SPECTRUM_INPUT edges.
op_is_pre_spectrum: np.ndarray # (n_ops,) bool -- contributes to peak_sum
3.3 Expression programs
class ExprNodeKind(IntEnum):
"""RPN instruction types."""
CONST = 0 # push literal float
PARAM_REF = 1 # push trace matrix row (by index)
ADD = 2 # pop 2, push sum
SUB = 3 # pop 2, push difference
MUL = 4 # pop 2, push product
DIV = 5 # pop 2, push quotient
NEG = 6 # pop 1, push negation
POW = 7 # pop 2, push power
@dataclass(frozen=True)
class ExprProgram:
"""Compiled expression: flat int array encoding an RPN program."""
# Encoding: pairs of (node_kind, operand).
# CONST: operand is float bits (np.float64.view(np.int64))
# PARAM_REF: operand is row index into trace matrix
# Operators: operand is 0 (unused)
instructions: np.ndarray # (2 * n_instructions,) int64
All values flowing through the RPN evaluator are (n_time,) arrays
(rows from the trace matrix). Constants are broadcast to (n_time,).
This means expression evaluation is uniform – no scalar/array branching.
What’s out of scope (v1): Function calls (np.exp, np.log),
conditionals, string operations. These would extend ExprNodeKind.
If encountered during graph construction, those nodes stay in the graph
but can_lower_2d() returns False.
4. build_graph(model) -> GraphIR
Walks the OOP tree once and emits the graph.
Algorithm
build_graph(model):
1. Create parameter nodes
For each component in model.components:
For each Par in component.pars:
- If expression: create EXPRESSION node
- If par.t_vary: create OPT_PARAM (base), DYNAMICS_TRACE,
PARAM_PLUS_TRACE nodes, plus OPT/STATIC nodes for dynamics params
- If par.vary == False: create STATIC_PARAM node
- Else: create OPT_PARAM node
2. Create expression edges
For each EXPRESSION node:
- Parse expr_string to extract referenced parameter names
- Add EXPR_REF edge from each referenced param node to this node
- If any referenced param has a PARAM_PLUS_TRACE, reference that
instead (resolved time-dependent value)
3. Create component nodes
For each component in model.components:
- Create COMPONENT_EVAL or SPECTRUM_FED_OP node based on:
* Shirley: SPECTRUM_FED_OP (it actually reads spectrum)
* Offset, LinBack: COMPONENT_EVAL (they don't read spectrum
despite Python signature; LinBack uses x/params only,
Offset is pure constant)
* Others: COMPONENT_EVAL
- Add PARAM_INPUT edges from resolved param nodes to component,
with position matching function signature order
4. Create convolution, profile, and subcycle nodes (when present)
For each component with comp_type == "conv":
- Create CONVOLUTION node
- Add ADDEND edge from accumulated signal (the SUM being built)
- Add PARAM_INPUT edges from kernel parameters
For each component with a profiled parameter (`p_vary`) or an
expression parameter that references a profiled parameter
(`expr_refs_profile_dep`):
- For each direct `p_vary` parameter:
create PROFILE_SAMPLE nodes for each aux_axis point
- Add PARAM_INPUT edges into each PROFILE_SAMPLE
(base + profile-model params for that aux point)
- For each `expr_refs_profile_dep` parameter:
create per-sample EXPRESSION nodes, with EXPR_REF edges to
the matching per-sample PROFILE_SAMPLE nodes of the
referenced profiled parameter
- Create a per-sample COMPONENT_EVAL node for each aux_axis
point, using PROFILE_SAMPLE / per-sample EXPRESSION inputs
where needed
- Create one component-level PROFILE_AVERAGE node with ADDEND
edges from the per-sample COMPONENT_EVAL nodes
- Replace the original component in the combination graph with
this PROFILE_AVERAGE node
For each dynamics model with subcycle != 0:
- Create SUBCYCLE_REMAP node before the DYNAMICS_TRACE
- Create SUBCYCLE_MASK node after the DYNAMICS_TRACE
- Store time_norm and time_n_sub in node.arrays
(e.g. SUBCYCLE_REMAP.arrays["time_norm"],
SUBCYCLE_MASK.arrays["time_n_sub"])
These nodes are fully represented in the graph. can_lower_2d()
will return False if any are present, but the graph is still
valid and can be interpreted or compiled by a future backend.
5. Create combination nodes
- Analyze the model's combine order (LIFO from mcp.py:806)
- Create SUM node for peak accumulation
- For SPECTRUM_FED_OP nodes (Shirley): add SPECTRUM_INPUT edge
from the peak SUM node
- Create final SUM node that adds everything together
- Edges preserve the original combine semantics
6. Assign source_order to all nodes
- Monotonically increasing, following the order nodes were created
during the walk. For YAML-built models this reflects component
definition order. For UI-built graphs, the UI sets source_order
at node creation time.
7. Return GraphIR
Offset and LinBack reclassification
The current interpreter treats all background functions uniformly
(comp_type == "back" -> pass spectrum=value). But examining the
actual functions:
Offset(x, y0, spectrum): returnsnp.full_like(spectrum, y0). Usesspectrumonly for shape. With uniform(n_time, n_energy)arrays, this is just a broadcast scalar – no spectrum dependency.LinBack(x, m, b, xStart, xStop, spectrum): returns a piecewise function ofx,m,b,xStart,xStop. Thespectrumargument is never read. Pure function of its own params + energy axis.Shirley(x, pShirley, spectrum): returnspShirley * cumsum(spectrum[::-1])[::-1]. Truly readsspectrum.
In the graph, only Shirley gets a SPECTRUM_INPUT edge. Offset and
LinBack become regular COMPONENT_EVAL nodes (type INDEPENDENT in
the old terminology). This is semantically correct and allows them to
be evaluated in the first pass alongside peaks.
5. schedule_2d(graph) -> ScheduledPlan2D
Compiles the graph into a flat execution schedule. The v1 backend is a specialized lowering target, not a generic DAG executor. The GraphIR is general-purpose (and a future UI will build arbitrary graphs), but the current scheduler and evaluator collapse the graph to a fixed-shape pipeline: parameter traces -> component ops -> peak_sum -> optional spectrum-fed ops -> final sum. This is intentional – it keeps the evaluator simple and fast. As the compiler supports more node types, the pipeline shape may grow, but it should always be a concrete schedule, never a runtime graph walker.
Algorithm
schedule_2d(graph):
1. Topological sort of all nodes
Tie-breaker: when two nodes have no dependency ordering between
them, sort by node.source_order (lower first). This makes the
schedule deterministic regardless of whether the graph came from
YAML parsing or a UI drag-and-drop canvas.
2. Assign trace matrix rows
- One row per parameter (STATIC_PARAM, OPT_PARAM, resolved
PARAM_PLUS_TRACE, EXPRESSION output)
- Contiguous: opt params first (so theta maps to a slice),
then static, then computed
3. Compile grouped dynamics subgraphs
- For each DYNAMICS_TRACE node:
* Record function ID (expFun -> 0, sinFun -> 1, etc.)
* Record param row indices (from PARAM_INPUT edges)
* Record the PARAM_PLUS_TRACE target row and its base row
- Group all DYNAMICS_TRACE nodes feeding the same
PARAM_PLUS_TRACE into one dynamics group
- Pack groups as CSR-style arrays:
dyn_group_target_row / dyn_group_base_row / dyn_group_indptr,
plus flat dyn_sub_* arrays for the grouped substeps
4. Compile expressions
- For each EXPRESSION node (in topological order):
* Parse expr_string into RPN using Python ast module
* Reject unsupported AST nodes (function calls, etc.)
* Resolve param references to trace matrix row indices
(using EXPR_REF edges as the source of truth)
* Encode as flat int64 array
5. Build interleaved parameter-resolution schedule
- Walk topo order and emit:
* expression steps when EXPRESSION nodes appear
* one dynamics-group step at the last DYNAMICS_TRACE node in
each group, so all prerequisite expressions are already
resolved before that group is evaluated
- Store as resolution_kinds / resolution_indices
6. Schedule component ops
- Topological order from graph edges determines execution order
- For each COMPONENT_EVAL / SPECTRUM_FED_OP node:
* Record OpKind
* Record param row indices (from PARAM_INPUT edges, CSR-encoded)
* Record whether it needs spectrum input (has SPECTRUM_INPUT edge)
- Derive op_is_pre_spectrum: which ops contribute to the
accumulated peak sum (inputs to the SUM node that feeds
SPECTRUM_FED_OP nodes)
7. Initialize trace matrix
- For static params: fill row with repeated scalar
- For opt params: fill row with repeated initial value
- Replay the same interleaved resolution schedule used by the
runtime evaluator:
* dynamics group -> base + sum(substep traces)
* expression -> evaluate RPN program against current trace matrix
8. Pack into ScheduledPlan2D, return
Critical invariants
schedule_2d()is called once before the optimizer starts. The plan is immutable during optimization (except the trace matrix scratch space, which is a working copy).The plan does not hold references to any
Model,Component, orParobjects. It is pure data.Optimizer parameter ordering is defined by
opt_param_namesstored in the plan (see below). This is the canonical order –extract_thetamust return values in this order, and result writeback must use it. At construction time,opt_param_namesis derived frommodel.parameter_names/model.lmfit_pars(only the vary=True subset), so the contract is explicit rather than implicit.The execution order preserves model semantics: the scheduler derives it from graph edges with stable tie-breaking on original definition order, not from a coarse bucket sort.
6. evaluate_2d(plan, theta) -> ndarray
Pure function. Takes the scheduled plan and optimizer parameter vector,
returns (n_time, n_energy) spectrum.
Algorithm
evaluate_2d(plan, theta):
1. PARAMETER RESOLUTION (once per call)
a. Copy plan.param_traces_init -> traces (n_params, n_time) scratch
b. Broadcast optimizer params into trace matrix:
traces[plan.opt_indices, :] = theta[:, np.newaxis]
c. Execute the interleaved resolution schedule:
For each step in (resolution_kinds, resolution_indices):
- If kind == dynamics group:
* target = dyn_group_target_row[idx]
* start from traces[target, :] = traces[base_row, :]
* for each grouped substep:
- Gather params from traces[dyn_sub_param_rows[s], 0]
- Call dynamics function(plan.time, *p) -> (n_time,) trace
- Accumulate into the target row
- If kind == expression:
* Execute RPN against traces matrix
* All operands are (n_time,) rows; arithmetic broadcasts
* Write result row: traces[expr_target_rows[idx], :] = result
2. COMPONENT EVALUATION (in scheduled order)
result = zeros(n_time, n_energy)
peak_sum = zeros(n_time, n_energy)
For each op in plan.op_schedule:
- Gather params: rows from traces matrix via CSR indices
-> reshape each from (n_time,) to (n_time, 1) for broadcasting
- If NOT op_needs_spectrum[op]:
component = eval_op(plan.energy, params, plan.op_kinds[op])
result += component
if op_is_pre_spectrum[op]:
peak_sum += component
- If op_needs_spectrum[op]:
component = eval_op_with_spectrum(
plan.energy, params, peak_sum, plan.op_kinds[op])
result += component
3. Return result # (n_time, n_energy)
Component evaluation dispatch
Each OpKind maps directly to the shared kernels in
src/trspecfit/functions/energy.py, which are the single source of
truth for component math. Peak functions broadcast naturally with
(n_time, 1) params and (1, n_energy) energy; Offset, LinBack,
and Shirley are written to support both 1D and broadcasted 2D inputs:
_OP_DISPATCH = {
OpKind.GAUSS: (fcts_energy.Gauss, 3, False),
OpKind.GLP: (fcts_energy.GLP, 4, False),
OpKind.OFFSET: (fcts_energy.Offset, 1, False),
OpKind.LINBACK: (fcts_energy.LinBack, 4, False),
OpKind.SHIRLEY: (fcts_energy.Shirley, 1, True),
}
Shirley should use axis-agnostic last-axis operations (axis=-1) so the
same implementation works for both 1D and batched inputs.
Dynamics function dispatch
Dynamics functions are called once per residual call on the full time
axis. The evaluator maps dyn_sub_func_id to the existing functions
from functions/time.py:
DYNAMICS_DISPATCH = {
0: fcts_time.expFun, # (t, A, tau, t0, y0) -> (n_time,)
1: fcts_time.sinFun, # (t, A, f, phi, t0, y0) -> (n_time,)
2: fcts_time.linFun, # (t, m, t0, y0) -> (n_time,)
...
}
These are the same functions, called once with full arrays. No per-step loop.
7. Integration with lmfit
Current call chain (2D fits)
lmfit.minimize(residual_fun, params, ...)
-> residual_fun(params, x, data, ..., args=(model, 2))
-> par_extract(params) -> list of values
-> fit_model_mcp(x, par_values, True, model, 2)
-> model.update_value(par_values) # write theta into lmfit.Parameters
-> model.create_value_2d() # THE HOT PATH (interpreter loop)
-> for each time step:
create_value_1d(t_ind=ti)
-> for each component:
Component.value(t_ind)
-> for each par: Par.value(t_ind) <-- 2M calls
-> return model.value_2d
-> residual = data - fit
-> return residual.flatten()
New call chain (implemented)
# Default project setting:
project.spec_fun_str = "fit_model_gir"
# Before a lowerable fit (1D or 2D):
graph = build_graph(model)
plan = schedule_2d(graph) # or schedule_1d(graph) for ENERGY_1D
theta_indices = precompute_indices(model.parameter_names, plan.opt_param_names)
# During fitting:
lmfit.minimize(residual_fun, params, ..., args=(plan, theta_indices, model, dim))
-> residual_fun(params, x, data, ..., fit_fun_str="fit_model_gir")
-> par_extract(params) -> full par_values list
-> fit_model_gir(x, par_values, True, plan, theta_indices, model, dim)
-> theta = par_values[theta_indices]
-> spectrum = evaluate_2d(plan, theta) # or evaluate_1d for 1D
-> residual = data - spectrum
-> return residual.flatten()
Compiled-path args contract: the compiled path always passes
(plan, theta_indices, model, dim) as args, regardless of
whether the plan is ScheduledPlan1D or ScheduledPlan2D.
The model and dim are carried so that fit_model_gir can
fall back to the interpreter when needed (e.g. plot_sum=False
for 1D component extraction). The non-lowerable fallback passes
(model, dim) only.
Implemented behavior:
fit_model_giris a dispatcher. Whenargs[0]is a compiled plan it destructures(plan, theta_indices, model, dim)and uses the fast evaluator. For 1D plans withplot_sum=Falseit falls back tofit_model_mcpfor component extraction. Whenargs[0]is not a plan it forwards tofit_model_mcp.fit_model_mcpremains the explicit interpreter mode.fit_model_compareis the validation mode. For lowerable fits (1D or 2D) it runs GIR and interpreter, compares withassert_allclose, and returns the GIR result.File.fit_2d,File.fit_baseline, andFile.fit_spectrumbuild a graph / plan whenspec_fun_strisfit_model_girorfit_model_compare.After fitting, all three methods write
result[1].paramsback intomodel.lmfit_parsviapar_extract+update_value, becausefit_wrapperoptimizes a deepcopy and the GIR path does not mutate model state on every residual call.
8. Scope boundaries – what compiles in v1 vs later
Feature |
v1 compiler |
Graph representable |
Notes |
|---|---|---|---|
Additive peaks (Gauss, GLP, GLS, DS, Voigt, etc.) |
Yes |
Yes |
Core use case |
Offset, LinBack |
Yes |
Yes |
Reclassified as COMPONENT_EVAL (no spectrum dep) |
Shirley |
Yes |
Yes |
SPECTRUM_FED_OP using last-axis cumulative sum |
Arithmetic expressions |
Yes |
Yes |
Compiled to RPN |
Time-dependent params (Dynamics) |
Yes |
Yes |
Dynamics subgraph compiled |
Voigt |
Yes |
Yes |
Broadcast-safe on the current evaluator path |
Convolution components |
Yes (Phase 6.3) |
Yes (CONVOLUTION node) |
Resolved-trace time-domain IRF; gated by |
Profile-varying params |
Yes (Phase 6.2) |
Yes (PROFILE_* nodes) |
1D and 2D backends |
Subcycle dynamics |
Yes (Phase 6.4) |
Yes (SUBCYCLE_* nodes) |
Compiled into per-substep |
Non-arithmetic expressions |
No |
Partial |
Would extend ExprNodeKind |
Project-level fits |
No |
Deferred |
Multi-graph coordination |
can_lower_2d and can_lower_1d are the gatekeepers for the 2D and 1D
backends respectively. The graph is broader than any single compiler –
the UI can build graphs that no backend can compile yet.
can_lower_1d / schedule_1d / evaluate_1d target ENERGY_1D
models on the same GraphIR (implemented in Phase 6.1).
9. Validation strategy
The compiled evaluator must produce results that agree with the
interpreter to within reasonable floating-point tolerance. Because the
compiled path sums components in a different order (graph-scheduled vs
the interpreter’s LIFO combine loop), and broadcasts 2D operations
instead of accumulating per-time-step, floating-point summation order
may differ. This means bitwise identity is not guaranteed – allclose
with a practical tolerance is the contract.
def validate_plan(model, plan):
"""Compare interpreter vs compiled evaluator output."""
# Evaluate via interpreter
model.create_value_2d()
interp_result = model.value_2d.copy()
# Evaluate via plan
theta = extract_theta(model.lmfit_pars)
plan_result = evaluate_2d(plan, theta)
# Compare -- rtol=1e-10 accounts for summation order differences.
# For typical spectroscopy data (values 0-100), this means agreement
# to ~10 significant digits.
assert np.allclose(interp_result, plan_result, atol=1e-10, rtol=1e-10)
This validation runs:
In the test suite, for every test model that
can_lower_2d()acceptsIn explicit compare mode during fitting via
spec_fun_str="fit_model_compare"On fallback cases, compare mode delegates to the interpreter so behavior matches the non-GIR path
10. Future: backends and Jacobians
Once evaluate_2d(plan, theta) -> spectrum exists as a pure function:
Numba: @njit on the component eval dispatch loop. Most of the
plan (int index arrays, CSR param maps, dense trace matrix) is
Numba-compatible. The expression programs (list[ExprProgram]) would
need flattening to a CSR-style encoding first – this is noted in the
plan as a v1 simplification that can be tightened later.
JAX: Replace np with jnp in the evaluator. The plan’s array
structure maps directly to JAX arrays. Key wins:
jax.jitcompiles the full evaluator (including parameter resolution)jax.jacfwd/jax.jacrevgives analytic Jacobians for freeGPU acceleration for large grids
Analytic Jacobians: With a differentiable evaluator, lmfit’s
Levenberg-Marquardt can use exact Jacobians instead of finite differences.
This replaces 2 * n_free_params + 1 evaluator calls per iteration with
1 evaluator call + 1 Jacobian call. For 4 free params, that’s 9 -> 2
calls, ~4.5x fewer evaluations per iteration.
Variable projection (VARPRO): Linear parameters (amplitudes A,
offset y0, slope m) can be solved in closed form given the nonlinear
parameters. Reduces optimizer dimensionality. The graph makes identifying
linear params straightforward: any param that appears as a linear factor
in its component’s function.
Implementation tracking
This document is the long-lived design/spec for the lowered evaluator.
The completed implementation history (all phases of the GIR / lowered evaluator rollout) is archived in archive/lowered_evaluator_implementation.md.
The closed backend-decision benchmark write-up lives in archive/numba_vs_jax.md.