Code Review Checklist

Shared source of truth for AI-assisted code review in this repo. The Claude skill wrapper and the Codex wrapper should both point here instead of carrying their own copies of the checklist.

Scope

This checklist accepts one optional argument:

  • diff (default) — review only files changed relative to main. Run git diff main --name-only -- '*.py' to get the file list, then apply every check below to those files only. Read the full diff with git diff main for context on what changed.

  • full — review all Python source under src/ and tests/.

  • file / directory / glob — restrict the review to that scope.

Naming exclusion: function registry names (GLP, pExpDecay, LinBack, gaussCONV, etc.) and their parameters (A, x0, SD, xStart, xStop) use camelCase/PascalCase because _ is the component ID delimiter. Exclude src/trspecfit/functions/ from naming checks, and ignore variables/parameters that match registry parameter names elsewhere.

For each item report one of:

  • PASS — no issues found

  • INFO — minor observations, no action needed

  • WARN — issues found, list them with file and line number

  • FAIL — serious issues found, list them with file and line number

Work through the checklist in order. Use parallel agent/tool calls where items are independent.

1. Bugs and correctness

Read the code in scope looking for:

  • Logic errors: wrong condition, off-by-one, inverted boolean, wrong variable

  • Incorrect indexing or slicing of arrays/axes

  • Wrong variable reuse (e.g., mutating an input that is used later)

  • Boundary mishandling (empty arrays, single-element edge cases)

  • Silent type coercion that changes meaning (int vs float, list vs array)

  • Regressions: if reviewing a diff, check whether changed code breaks existing callers, contracts, or assumptions documented in docstrings

2. Performance

Look for:

  • Unnecessary copies of large arrays (np.array(x) when x is already an ndarray, .copy() without reason)

  • Repeated expensive computations inside loops that could be hoisted

  • O(n^2) or worse patterns where O(n) or O(n log n) alternatives exist (e.g., nested list scans, repeated in checks on lists instead of sets)

  • Allocation inside tight loops (creating arrays/lists per iteration when pre-allocation or vectorization would work)

  • Unvectorized element-wise loops over numpy arrays

Report as WARN only for patterns with measurable impact in typical use (not micro-optimizations).

3. Bare / broad exception handling

Find except Exception, except BaseException, and bare except:. Flag any that silently swallow errors (no re-raise, no logging). Acceptable: top-level CLI entry points, describe/repr methods guarding display.

4. God classes / long methods

Flag any single method or function longer than 200 lines (excluding docstring and blank lines). Flag any class with more than 30 public methods. Report as INFO with a note on whether the size is justified.

5. Dead code

Search for:

  • Commented-out code blocks (3+ consecutive commented lines that look like code)

  • Unreachable code after return/raise/break/continue

6. Typing / modern Python syntax

Search for:

  • Optional[X] or Union[X, Y] where X | Y should be used instead (project style, Python 3.12+)

7. Docstring coverage

Check all public functions, methods, and classes in src/ for:

  • Missing docstrings entirely

  • Docstrings that are not NumPy-style (missing Parameters/Returns sections)

User-facing API (trspecfit.py, functions/) should have extensive docstrings with Parameters, Returns, and Notes. Internal modules can keep method docstrings minimal but must have module and class-level docstrings. Report count and list the worst offenders.

8. Abstractions and duplication

Read src/ modules looking for:

  • Poor or missing abstractions — repeated patterns that should be extracted into a shared helper, base class method, or utility

  • Copy-pasted logic across files or classes (similar blocks of 5+ lines)

  • Overly concrete code that handles special cases inline instead of through polymorphism or configuration

Suggest specific refactorings where the improvement is clear. Ignore test files. Note: repeated blocks that extract different subsets of fields from a config object are not duplication — a generic helper would just move the per-site variation elsewhere without reducing complexity.

9. Numpy anti-patterns

Search for:

  • np.concatenate where np.where would suffice (piecewise assignment)

  • np.shape(x) instead of x.shape

  • np.ones(np.shape(...)) instead of np.ones_like(...) / np.full_like(...)

  • np.arange(0, n, 1) instead of np.arange(n)

  • Manual loops over array elements that could be vectorized

10. Fragile array comparisons

Flag == or != on floating-point arrays/values from computation. Should use np.isclose, np.allclose, or tolerance-based comparison. Ignore comparisons against integer values or input validation checks.

11. Ignored warnings

Search for warnings.filterwarnings("ignore") or warnings.simplefilter("ignore"). These can mask real issues. Flag unless scoped to a specific known warning.

12. Plotting mixed with logic

Flag functions that both compute results AND produce plots without a show_plot/debug flag to disable the plotting side. Plotting should be separable from computation. Also check functions that do have a display flag (e.g. show_output, save_img) but contain plot blocks that bypass it — plots should save-and-close (not display) when the flag is off.

13. Poor separation of concerns

Check for:

  • Circular imports (A imports B imports A)

  • Business logic in UI/plotting code

  • I/O operations (file read/write) mixed into computation functions

14. Missing __repr__ / __str__

Check main domain classes for a __repr__ method. Classes that appear in debugging output or collections should be representable. Report as INFO.

15. Global mutable state

Search for:

  • global keyword usage

  • Module-level mutable containers (dict, list, set) that are modified at runtime (not just defined as constants)

  • Module-level variables reassigned after import time

16. Security

Search for:

  • eval() or exec() usage

  • pickle.load() / pickle.loads() on potentially untrusted data

  • yaml.load() without Loader=SafeLoader (or not using yaml.safe_load())

  • Hardcoded file paths or credentials

  • subprocess calls with shell=True

  • os.system() calls

17. Missing edge-case tests

Scan test files for coverage of:

  • Empty/None inputs to public API functions

  • Boundary values (zero-length arrays, single-element arrays)

  • Invalid YAML/config inputs

  • NaN/Inf in numeric inputs

Report as INFO with suggestions for what to add.

18. GIR / MCP parity

Trigger when the diff touches any of src/trspecfit/graph_ir.py, src/trspecfit/eval_1d.py, src/trspecfit/eval_2d.py, src/trspecfit/spectra.py, or src/trspecfit/functions/*.py.

Verify:

  • fit_model_compare parity coverage still exists for the modified path (see tests/test_gir_integration.py and integration harnesses).

  • New functions either lower to GIR or route cleanly to MCP-only fallback via can_lower_1d() / can_lower_2d() returning False.

  • Evaluator semantics changes are reflected in both paths, or the MCP-only behavior is preserved with an explicit test.

Report as FAIL if parity is broken without a test; WARN if parity tests exist but do not cover the new code path.

19. Two-layer design compliance

Per CLAUDE.md “Architecture guardrails” the repo has a deliberate split:

  • Authoring layer (mcp.py, trspecfit.py, simulator.py, utils/parsing.py): optimize for readability, validation, and clear errors.

  • Compiled hot-path (graph_ir.py, eval_1d.py, eval_2d.py, numeric bodies in functions/): array-oriented; no Python-object attribute access or model-structure branching in inner loops.

Flag:

  • Python-object access, isinstance checks, or model-structure branching inside inner loops of eval_1d.py / eval_2d.py.

  • Validation-heavy helpers or error-formatting code landing in graph_ir.py or eval_*.py when it belongs in the authoring layer.

  • Performance micro-optimizations landing in mcp.py / trspecfit.py when the goal of that layer is readability.

Report as WARN with file/line and a suggestion for which layer the code belongs in.

20. can_lower_* hygiene

Trigger when the diff introduces a new NodeKind or modifies lowerability-gating logic in graph_ir.py.

Verify:

  • The new kind is correctly included in or excluded from _NON_LOWERABLE_NODE_KINDS_BASE and _NON_LOWERABLE_2D_NODE_KINDS.

  • can_lower_1d() and can_lower_2d() handle the new kind — either it is lowered, or it is explicitly routed to the MCP fallback.

  • The MCP fallback path has at least one test exercising a model containing the new kind with lowering disabled (or provably rejected by can_lower_*).

  • Per-node gates like _is_lowerable_convolution_2d / _is_lowerable_subcycle_2d are updated when the structural contract they enforce changes.

Report as FAIL if a new node kind has no coverage in either direction.

Summary

Print a summary table:

#

Check

Status

Issues

1

Bugs & correctness

2

Performance

3

Broad exceptions

4

God classes / long methods

5

Dead code

6

Typing / modern Python syntax

7

Docstring coverage

8

Abstractions & duplication

9

Numpy anti-patterns

10

Fragile array comparisons

11

Ignored warnings

12

Plotting mixed with logic

13

Separation of concerns

14

Missing repr

15

Global mutable state

16

Security

17

Edge-case tests

18

GIR / MCP parity

19

Two-layer design compliance

20

can_lower_* hygiene

Then list any FAIL or WARN items with actionable next steps.