# Benchmark GIR vs Interpreter

Shared source of truth for benchmarking the compiled GIR evaluator against the interpreter (MCP) path.

Run `benchmark_gir.py` to compare the compiled and interpreter evaluation paths on an example fitting workflow.

## Available examples

```bash
ls -d examples/fitting_workflows/0[0-9]_*/ 2>/dev/null | \
  grep -v _fits | \
  while read -r d; do printf ' %s\n' "$(basename "$d")"; done
```

Lowerability is checked per-node by `can_lower_2d()`. There is no blanket exclusion for convolution or subcycle dynamics: both lower when their structural contracts are satisfied (resolved-trace time-domain convolution, subcycle substeps compiled into schedule arrays).

The examples exercise different GIR paths:

| # | example                     | GIR path exercised |
|---|-----------------------------|--------------------|
| 1 | `01_basic_fitting`          | convolution (`MonoExpPosIRF` -> `*CONV` kernel) |
| 2 | `02_dependent_parameters`   | plain dynamics, no conv/subcycle/profile (default) |
| 3 | `03_multi_cycle`            | subcycle dynamics |
| 4 | `04_par_profiles`           | profile models |
| 5 | `05_project_level_fitting`  | not currently supported by the benchmark harness |

Example 02 is the default because it is the cleanest baseline comparison (pure dynamics, no side paths).

## Task

Parse the arguments:

- First positional integer -> `--example N` (default: `2`)
- `--fit` -> include full-fit benchmark
- `-n N` -> fit repetitions (default: `3`)

Run:

```bash
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example --calls 200 [--fit] [-n ]
```

Report the results to the user. Highlight the speedup ratio, the `Max |diff|` correctness check, and note which GIR path the example exercises (convolution / subcycle / profile / plain).

## Fit-count and planning-cost modes

Two additional modes report operational characteristics of the fit rather than a head-to-head speedup:

- `--nfev`: run the standard baseline + `fit_2d` pipeline and report the total number of residual evaluations per stage. Useful when checking whether a change inflates the fit work (not just the per-call cost).
- `--plan-time`: measure `build_graph` + `schedule_2d` cost against the total `fit_2d` wall time. Useful for confirming that planning overhead stays negligible relative to the fit itself.

Both modes accept `--example 0` to run across all examples and print a summary table at the end.

```bash
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example --nfev
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example --plan-time
```

## Profiling (GIR path only)

For flamegraphs of the GIR hot path, use `--profile` to run a GIR-only loop (no interpreter path, no correctness check, no prints inside the loop) and attach `py-spy` to the subprocess.

Prerequisite (one-time):

```bash
.venv/bin/pip install -e ".[profiling]"
```

Invocation:

```bash
.venv/bin/py-spy record --rate 500 -o docs/design/benchmarks/gir_profile.svg -- \
  .venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example --profile
```

`py-spy` needs permission to attach to the child process. On Linux this requires either `sudo` or `sysctl kernel.yama.ptrace_scope=0`.
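
The argument mapping described under "Task" can be sketched in Python as follows. This is a minimal illustration, not the harness's own parser: the helper name `build_command` is hypothetical, and passing `-n` explicitly even at its default value is an assumption made here for clarity.

```python
import argparse

# Paths taken from the "Run" command in the Task section.
PYTHON = ".venv/bin/python"
SCRIPT = ".claude/skills/benchmark/benchmark_gir.py"


def build_command(skill_args: list[str]) -> list[str]:
    """Map skill arguments onto a benchmark_gir.py command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    # First positional integer -> --example N (default: 2)
    parser.add_argument("example", nargs="?", type=int, default=2)
    # --fit -> include the full-fit benchmark
    parser.add_argument("--fit", action="store_true")
    # -n N -> fit repetitions (default: 3)
    parser.add_argument("-n", dest="reps", type=int, default=3)
    ns = parser.parse_args(skill_args)

    cmd = [PYTHON, SCRIPT, "--example", str(ns.example), "--calls", "200"]
    if ns.fit:
        cmd.append("--fit")
    # Assumption: always forward -n, even when it equals the default.
    cmd += ["-n", str(ns.reps)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(["4", "--fit", "-n", "5"])))
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues.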
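
For readers interpreting the reported numbers, the two headline quantities from the "Task" section (speedup ratio and `Max |diff|`) can be understood through the sketch below. The function names and the exact definitions are assumptions for illustration; the harness may compute them differently (for example, from per-call medians rather than totals).

```python
from typing import Sequence


def speedup(t_interpreter: float, t_gir: float) -> float:
    """Interpreter time divided by GIR time; a value above 1 means GIR is faster."""
    if t_gir <= 0:
        raise ValueError("t_gir must be positive")
    return t_interpreter / t_gir


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

A `Max |diff|` near floating-point round-off suggests the two paths agree; a large value points to a correctness problem regardless of the speedup.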