Code Review Checklist
Shared source of truth for AI-assisted code review in this repo. The Claude skill wrapper and the Codex wrapper should both point here instead of carrying their own copies of the checklist.
Scope
This checklist accepts one optional argument:
diff(default) — review only files changed relative tomain. Rungit diff main --name-only -- '*.py'to get the file list, then apply every check below to those files only. Read the full diff withgit diff mainfor context on what changed.full— review all Python source undersrc/andtests/.file / directory / glob — restrict the review to that scope.
Naming exclusion: function registry names (GLP, pExpDecay, LinBack,
gaussCONV, etc.) and their parameters (A, x0, SD, xStart, xStop)
use camelCase/PascalCase because _ is the component ID delimiter. Exclude
src/trspecfit/functions/ from naming checks, and ignore variables/parameters
that match registry parameter names elsewhere.
For each item report one of:
PASS — no issues found
INFO — minor observations, no action needed
WARN — issues found, list them with file and line number
FAIL — serious issues found, list them with file and line number
Work through the checklist in order. Use parallel agent/tool calls where items are independent.
1. Bugs and correctness
Read the code in scope looking for:
Logic errors: wrong condition, off-by-one, inverted boolean, wrong variable
Incorrect indexing or slicing of arrays/axes
Wrong variable reuse (e.g., mutating an input that is used later)
Boundary mishandling (empty arrays, single-element edge cases)
Silent type coercion that changes meaning (int vs float, list vs array)
Regressions: if reviewing a diff, check whether changed code breaks existing callers, contracts, or assumptions documented in docstrings
2. Performance
Look for:
Unnecessary copies of large arrays (
np.array(x)whenxis already an ndarray,.copy()without reason)Repeated expensive computations inside loops that could be hoisted
O(n^2) or worse patterns where O(n) or O(n log n) alternatives exist (e.g., nested list scans, repeated
inchecks on lists instead of sets)Allocation inside tight loops (creating arrays/lists per iteration when pre-allocation or vectorization would work)
Unvectorized element-wise loops over numpy arrays
Report as WARN only for patterns with measurable impact in typical use (not micro-optimizations).
3. Bare / broad exception handling
Find except Exception, except BaseException, and bare except:.
Flag any that silently swallow errors (no re-raise, no logging).
Acceptable: top-level CLI entry points, describe/repr methods guarding display.
4. God classes / long methods
Flag any single method or function longer than 200 lines (excluding docstring and blank lines). Flag any class with more than 30 public methods. Report as INFO with a note on whether the size is justified.
5. Dead code
Search for:
Commented-out code blocks (3+ consecutive commented lines that look like code)
Unreachable code after
return/raise/break/continue
6. Typing / modern Python syntax
Search for:
Optional[X]orUnion[X, Y]whereX | Yshould be used instead (project style, Python 3.12+)
7. Docstring coverage
Check all public functions, methods, and classes in src/ for:
Missing docstrings entirely
Docstrings that are not NumPy-style (missing Parameters/Returns sections)
User-facing API (trspecfit.py, functions/) should have extensive docstrings
with Parameters, Returns, and Notes. Internal modules can keep method docstrings
minimal but must have module and class-level docstrings.
Report count and list the worst offenders.
8. Abstractions and duplication
Read src/ modules looking for:
Poor or missing abstractions — repeated patterns that should be extracted into a shared helper, base class method, or utility
Copy-pasted logic across files or classes (similar blocks of 5+ lines)
Overly concrete code that handles special cases inline instead of through polymorphism or configuration
Suggest specific refactorings where the improvement is clear. Ignore test files. Note: repeated blocks that extract different subsets of fields from a config object are not duplication — a generic helper would just move the per-site variation elsewhere without reducing complexity.
9. Numpy anti-patterns
Search for:
np.concatenatewherenp.wherewould suffice (piecewise assignment)np.shape(x)instead ofx.shapenp.ones(np.shape(...))instead ofnp.ones_like(...)/np.full_like(...)np.arange(0, n, 1)instead ofnp.arange(n)Manual loops over array elements that could be vectorized
10. Fragile array comparisons
Flag == or != on floating-point arrays/values from computation.
Should use np.isclose, np.allclose, or tolerance-based comparison.
Ignore comparisons against integer values or input validation checks.
11. Ignored warnings
Search for warnings.filterwarnings("ignore") or
warnings.simplefilter("ignore").
These can mask real issues. Flag unless scoped to a specific known warning.
12. Plotting mixed with logic
Flag functions that both compute results AND produce plots without a
show_plot/debug flag to disable the plotting side. Plotting should be
separable from computation. Also check functions that do have a display
flag (e.g. show_output, save_img) but contain plot blocks that bypass
it — plots should save-and-close (not display) when the flag is off.
13. Poor separation of concerns
Check for:
Circular imports (A imports B imports A)
Business logic in UI/plotting code
I/O operations (file read/write) mixed into computation functions
14. Missing __repr__ / __str__
Check main domain classes for a __repr__ method. Classes that appear in
debugging output or collections should be representable. Report as INFO.
15. Global mutable state
Search for:
globalkeyword usageModule-level mutable containers (
dict,list,set) that are modified at runtime (not just defined as constants)Module-level variables reassigned after import time
16. Security
Search for:
eval()orexec()usagepickle.load()/pickle.loads()on potentially untrusted datayaml.load()withoutLoader=SafeLoader(or not usingyaml.safe_load())Hardcoded file paths or credentials
subprocesscalls withshell=Trueos.system()calls
17. Missing edge-case tests
Scan test files for coverage of:
Empty/None inputs to public API functions
Boundary values (zero-length arrays, single-element arrays)
Invalid YAML/config inputs
NaN/Inf in numeric inputs
Report as INFO with suggestions for what to add.
18. GIR / MCP parity
Trigger when the diff touches any of src/trspecfit/graph_ir.py,
src/trspecfit/eval_1d.py, src/trspecfit/eval_2d.py,
src/trspecfit/spectra.py, or src/trspecfit/functions/*.py.
Verify:
fit_model_compareparity coverage still exists for the modified path (seetests/test_gir_integration.pyand integration harnesses).New functions either lower to GIR or route cleanly to MCP-only fallback via
can_lower_1d()/can_lower_2d()returningFalse.Evaluator semantics changes are reflected in both paths, or the MCP-only behavior is preserved with an explicit test.
Report as FAIL if parity is broken without a test; WARN if parity tests exist but do not cover the new code path.
19. Two-layer design compliance
Per CLAUDE.md “Architecture guardrails” the repo has a deliberate split:
Authoring layer (
mcp.py,trspecfit.py,simulator.py,utils/parsing.py): optimize for readability, validation, and clear errors.Compiled hot-path (
graph_ir.py,eval_1d.py,eval_2d.py, numeric bodies infunctions/): array-oriented; no Python-object attribute access or model-structure branching in inner loops.
Flag:
Python-object access,
isinstancechecks, or model-structure branching inside inner loops ofeval_1d.py/eval_2d.py.Validation-heavy helpers or error-formatting code landing in
graph_ir.pyoreval_*.pywhen it belongs in the authoring layer.Performance micro-optimizations landing in
mcp.py/trspecfit.pywhen the goal of that layer is readability.
Report as WARN with file/line and a suggestion for which layer the code belongs in.
20. can_lower_* hygiene
Trigger when the diff introduces a new NodeKind or modifies
lowerability-gating logic in graph_ir.py.
Verify:
The new kind is correctly included in or excluded from
_NON_LOWERABLE_NODE_KINDS_BASEand_NON_LOWERABLE_2D_NODE_KINDS.can_lower_1d()andcan_lower_2d()handle the new kind — either it is lowered, or it is explicitly routed to the MCP fallback.The MCP fallback path has at least one test exercising a model containing the new kind with lowering disabled (or provably rejected by
can_lower_*).Per-node gates like
_is_lowerable_convolution_2d/_is_lowerable_subcycle_2dare updated when the structural contract they enforce changes.
Report as FAIL if a new node kind has no coverage in either direction.
Summary
Print a summary table:
# |
Check |
Status |
Issues |
|---|---|---|---|
1 |
Bugs & correctness |
… |
… |
2 |
Performance |
… |
… |
3 |
Broad exceptions |
… |
… |
4 |
God classes / long methods |
… |
… |
5 |
Dead code |
… |
… |
6 |
Typing / modern Python syntax |
… |
… |
7 |
Docstring coverage |
… |
… |
8 |
Abstractions & duplication |
… |
… |
9 |
Numpy anti-patterns |
… |
… |
10 |
Fragile array comparisons |
… |
… |
11 |
Ignored warnings |
… |
… |
12 |
Plotting mixed with logic |
… |
… |
13 |
Separation of concerns |
… |
… |
14 |
Missing repr |
… |
… |
15 |
Global mutable state |
… |
… |
16 |
Security |
… |
… |
17 |
Edge-case tests |
… |
… |
18 |
GIR / MCP parity |
… |
… |
19 |
Two-layer design compliance |
… |
… |
20 |
|
… |
… |
Then list any FAIL or WARN items with actionable next steps.