Benchmark GIR vs Interpreter
Shared source of truth for benchmarking the compiled GIR evaluator against the interpreter (MCP) path.
Run benchmark_gir.py to compare the compiled and interpreter evaluation paths
on an example fitting workflow.
Available examples
ls -d examples/fitting_workflows/0[0-9]_*/ 2>/dev/null | \
grep -v _fits | \
while read -r d; do printf ' %s\n' "$(basename "$d")"; done
Lowerability is checked per-node by can_lower_2d(); there is no blanket
exclusion for convolution or subcycle dynamics — both lower when their
structural contracts are satisfied (resolved-trace time-domain convolution,
subcycle substeps compiled into schedule arrays). The examples exercise
different GIR paths:
| # | example | GIR path exercised |
|---|---|---|
| 1 | | convolution ( |
| 2 | | plain dynamics, no conv/subcycle/profile (default) |
| 3 | | subcycle dynamics |
| 4 | | profile models |
| 5 | | not currently supported by the benchmark harness |
Example 02 is the default because it is the cleanest baseline comparison (pure dynamics, no side paths).
Task
Parse the arguments:
- First positional integer -> --example N (default: 2)
- --fit -> include full-fit benchmark
- -n N -> fit repetitions (default: 3)
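The argument mapping above can be sketched as a small helper that turns the user's tokens into the command line for the script. This is a minimal illustration, not the script's own parser: `build_command` is a hypothetical name, and passing `-n` only alongside `--fit` is an assumption made here because `-n` counts fit repetitions.

```python
import argparse


def build_command(tokens):
    """Map user tokens (e.g. ["3", "--fit"]) to a benchmark_gir.py command."""
    parser = argparse.ArgumentParser(add_help=False)
    # First positional integer selects the example; 2 is the documented default.
    parser.add_argument("example", nargs="?", type=int, default=2)
    parser.add_argument("--fit", action="store_true")
    parser.add_argument("-n", dest="repeats", type=int, default=3)
    args = parser.parse_args(tokens)

    cmd = [
        ".venv/bin/python",
        ".claude/skills/benchmark/benchmark_gir.py",
        "--example", str(args.example),
        "--calls", "200",
    ]
    # Assumption: -n is only meaningful together with --fit.
    if args.fit:
        cmd += ["--fit", "-n", str(args.repeats)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(["3", "--fit", "-n", "5"])))
```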
Run:
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example <N> --calls 200 [--fit] [-n <N>]
Report the results to the user. Highlight the speedup ratio, the
Max |diff| correctness check, and note which GIR path the example exercises
(convolution / subcycle / profile / plain).
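For reference, the two headline numbers can be understood with the following sketch. It assumes the speedup is the interpreter's per-call time divided by the GIR per-call time, and that Max |diff| is the largest element-wise absolute difference between the two outputs; the function names are hypothetical and the script may compute these differently.

```python
import time


def time_calls(fn, calls):
    """Return (mean seconds per call, last result) over `calls` invocations."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    result = None
    start = time.perf_counter()
    for _ in range(calls):
        result = fn()
    return (time.perf_counter() - start) / calls, result


def summarize(interp_s, gir_s, interp_out, gir_out):
    """Return (speedup ratio, max absolute difference) for two evaluation paths."""
    if len(interp_out) != len(gir_out):
        raise ValueError("outputs must have the same length")
    speedup = interp_s / gir_s
    max_diff = max(abs(a - b) for a, b in zip(interp_out, gir_out))
    return speedup, max_diff
```

A speedup above 1 means the compiled path is faster; a Max |diff| near floating-point noise indicates the two paths agree.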
Fit-count and planning-cost modes
Two additional modes report operational characteristics of the fit rather than a head-to-head speedup:
- --nfev: run the standard baseline + fit_2d pipeline and report the total number of residual evaluations per stage. Useful when checking whether a change inflates the fit work (not just the per-call cost).
- --plan-time: measure build_graph + schedule_2d cost against the total fit_2d wall time. Useful for confirming that planning overhead stays negligible relative to the fit itself.
Both modes accept --example 0 to run across all examples and print a summary
table at the end.
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example <N> --nfev
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example <N> --plan-time
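What these two modes measure can be illustrated with a short sketch. It is not taken from the script: `CountingResidual` and `planning_fraction` are hypothetical names, and the sketch assumes a residual evaluation is one call to a residual function and that planning cost is reported as a fraction of fit wall time.

```python
class CountingResidual:
    """Wrap a residual function and count how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.nfev = 0

    def __call__(self, params):
        self.nfev += 1
        return self.fn(params)


def planning_fraction(plan_seconds, fit_seconds):
    """Share of the total fit wall time spent on planning."""
    if fit_seconds <= 0:
        raise ValueError("fit_seconds must be positive")
    return plan_seconds / fit_seconds


if __name__ == "__main__":
    residual = CountingResidual(lambda p: [x - 1.0 for x in p])
    for guess in ([0.0], [0.5], [1.0]):
        residual(guess)
    print("nfev:", residual.nfev)
    print("planning share:", planning_fraction(0.02, 4.0))
```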
Profiling (GIR path only)
For flamegraphs of the GIR hot path, use --profile to run a GIR-only loop
(no interpreter path, no correctness check, no prints inside the loop) and
attach py-spy to the subprocess.
Prerequisite (one-time):
.venv/bin/pip install -e ".[profiling]"
Invocation:
.venv/bin/py-spy record --rate 500 -o docs/design/benchmarks/gir_profile.svg -- \
.venv/bin/python .claude/skills/benchmark/benchmark_gir.py --example <N> --profile
py-spy needs permission to attach to the child process. On Linux this
requires either sudo or sysctl kernel.yama.ptrace_scope=0.
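Before profiling on Linux, the current setting can be checked with a small sketch like the one below. It reads the standard Yama procfs entry; `read_ptrace_scope` is a hypothetical helper name, and it returns None when the file is absent (for example on non-Linux systems or kernels without Yama).

```python
from pathlib import Path


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama ptrace_scope value as an int, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("ptrace_scope not available on this system")
    elif scope == 0:
        print("ptrace_scope=0: no Yama restriction on attaching")
    else:
        print(f"ptrace_scope={scope}: attaching may need elevated permissions")
```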