# Examples Upgrade Plan

Design note for a future branch that reorganizes the example notebooks around the way users actually approach the package.

This is deliberately separate from the fit-results save/load branch: the current branch should finish the archive feature with minimal examples/docs coverage, then merge. The broader examples upgrade is a teaching and UX project with enough file movement and narrative work to deserve its own branch.

## Decisions (locked)

- Track-based navigation replaces the linear "walk forward" path. The quickstart still recommends `01_basic_fitting` as the first notebook, but does not imply that every user should walk every example in order.
- Top-level directories for this examples-upgrade pass: `fitting_workflows/` (existing name kept) and `synthetic_data/` (renamed from `data_generation/`). No `data_preparation/` track in this pass.
- Inside `fitting_workflows/`, layout is flat with three numeric blocks: **01–04** = fitting skills on a single file; **10–11** = post-fit work (comparison, persistence, export); **20+** = multi-file workflows. `fitting_workflows/README.md` documents the legend.
- `10_model_comparison` is strictly about model comparison. The persistence / inspection / export side (save/load h5, browse loaded archives, ship single slots, the two-channels framing) moves to a sibling notebook `11_save_load_export`. Each notebook has one job, and notebook 11 gets a discoverable filesystem location that can be linked from `save_fit` / `export_fit` docstrings, the README, and the CHANGELOG.
- Notebook 11 uses notebook 10's full pipeline as its preamble via the IPython `%run` magic (`%run ../10_model_comparison/example.ipynb`), preceded by a one-line markdown pointer ("see `10_model_comparison` for the fitting/comparison details; this notebook focuses on what to do with the results").
  Output suppression rides on the existing `project.yaml` knobs already used by `10_model_comparison` (`show_output: 0`, `auto_export: false`); `%%capture` is the fallback, not the default mechanism. Expected preamble runtime ~30–40 s (baseline fits <1 s each, SbS ~10 s each, 2D fits a few seconds each), which is acceptable for a notebook a reader opens deliberately. Single source of truth: notebook 10 owns the fit pipeline; notebook 11 inherits any future updates automatically and ends up with rich state (baseline + SbS + 2D slots, σ snapshot, conf_ci) available for the persistence demos.
- Casual user's mental model = `File`. `file.save_fit()` saves a snapshot of completed fits for this file, keeping the latest slot per model / fit type / selection (snapshot semantics inherited from `Project.save_fits`). It is the casual user's "save all current fits for this file" API.
- Model comparison (`FitResults.load` + `compare_models`) is its own notebook (`10_model_comparison`), **not** part of `01_basic_fitting`, because comparison requires fitting two models, which doubles the cognitive load of the basic notebook.
- Export terminology: **export** = one-way CSV/PNG for humans + tools like Origin; **save/load** = HDF5 round-trip via the `FitResults` archive.
- `Project.auto_export` (default `True`, configurable via `project.yaml` or post-init mutation) gates the fit-completion CSV/PNG side effects. The basic notebook documents the default behavior and shows the opt-out.

## Motivation

The fit-results work introduces a clearer split between:

- `File` as the natural surface for fitting and exporting one dataset.
- `Project` as configuration, workspace, multi-file coordination, and archive ownership.
- `FitResults` as the inspection/comparison object for completed fits.

The examples should reinforce that split. Users who are thinking "fit file 1, export the fit, load it into Origin" should not feel that they need to understand the full `Project` model.
Power users should still have a clear path to multi-file fitting, project-level shared fits, archives, and model comparison.

## Current State

The examples are currently organized as:

```text
examples/
  data_generation/
    simulator/
    ml_training/
  fitting_workflows/
    01_basic_fitting/
    02_dependent_parameters/
    03_multi_cycle/
    04_par_profiles/
    05_project_level_fitting/
```

This is already close in one respect: simulation and ML training data are separate from fitting. The weakness is that all fitting notebooks live in one linear sequence, even though they represent different user mindsets:

- Single-file fitting skills (`01` through `04`).
- Post-fit comparison (currently absent).
- Multi-file / project-level fitting (`05`, with no bridge between the single-file basics and full project-level shared fits).

The current quickstart also tells users to start at `01` and work forward. That is useful for a tutorial path, but less useful once examples cover multiple tracks.

## User Workflows

### Single-file / Origin-style user

This user has one processed dataset and wants to fit it, export tables/plots, and keep working in external tools.

Primary API:

```python
file.fit_baseline(...)
file.fit_2d(...)
file.export_fit()    # CSV + PNG, Origin-friendly
file.save_fit()      # HDF5 archive snapshot for this file
file.get_fit_results(fit_type="2d")
```

The `Project` should appear as setup/context, not as the main conceptual object. For this user, `Project` is mostly where config, paths, plotting defaults, and model files live.

### Multi-file individual fitting user

This user has several files but wants to fit each file separately. They want a shared loop, consistent settings, per-file exports, and a summary view.

Primary API:

```python
project = trspecfit.Project(...)
files = [...]

for file in files:
    file.fit_baseline(...)
    file.fit_2d(...)

project.export_fits()    # one coherent tree across files
project.save_fits()      # one portable HDF5 for the batch
project.results.compare_models(file=files[0], ...)
```

This is the bridge workflow: `Project` is useful as a collection and session workspace, but each fit remains file-scoped.

### Project-level / shared-fit user

This user intentionally wants shared parameters across multiple files and is ready for `Project` to be an active fitting object.

Primary API:

```python
project.fit_2d(...)
project.save_fits(...)
project.results.compare_models(...)
```

This workflow is more complex and should come after multi-file individual fitting, not immediately after single-file basics.

### Synthetic-data / ML user

This user is working with forward simulation, validation, or training data. They are not preparing experimental data and not primarily fitting an existing file. The current simulator and ML training notebooks belong together.

## Proposed Directory Layout

```text
examples/
  fitting_workflows/                # existing name kept
    01_basic_fitting/               # block 0x: fitting skills (single-file)
    02_dependent_parameters/
    03_multi_cycle_dynamics/        # renamed from 03_multi_cycle
    04_parameter_profiles/          # renamed from 04_par_profiles
    10_model_comparison/            # block 1x: post-fit work
    11_save_load_export/            # block 1x: post-fit work (NEW)
    20_fit_each_separately/         # block 2x: multi-file (NEW, bridge)
    21_project_level_shared_fit/    # was 05_project_level_fitting
  synthetic_data/                   # renamed from data_generation
    01_simulator/
    02_ml_training_data/
```

Numbering convention documented in `fitting_workflows/README.md`:

- **0x**: fitting skills on a single file.
- **1x**: post-fit work (comparison, persistence, export).
- **2x**: multi-file workflows.

The flat layout with three numeric blocks keeps alphabetical sort intact while making the category structure visible without an extra directory level. Numbering restarts at the next block boundary as new notebooks are added within a category.

## Notebook Content Targets

### `fitting_workflows/01_basic_fitting`

Core "one file" story:

- Load processed `data`, `energy`, and `time`.
- Fit baseline, optional slice-by-slice, and 2D model.
- Show `file.get_fit_results(...)`.
- Show `file.export_fit()` as the Origin-friendly CSV/PNG workflow.
- Show `file.save_fit()` as the archive-snapshot persistence (keeps the latest slot per model / fit type / selection for this file).
- **Callout**: `fit_*` methods auto-write CSVs/PNGs to `project.path_results` on completion by default. The notebook shows both the default and the `project.auto_export = False` opt-out (also settable via `project.yaml`).
- **Out of scope here**: `FitResults.load` and `compare_models`; those move to `10_model_comparison` so this notebook stays focused on the casual user's single-file path.

### `fitting_workflows/10_model_comparison`

Post-fit comparison story (NEW). Strictly model selection: persistence, inspection, and export move to the sibling notebook `11_save_load_export` so each notebook has one job.

- Two models, one file. Compress the fitting cells: readers have seen the fit API in 01–04, so this notebook glosses over fitting and focuses on comparison.
- Three comparison stories, each isolating one structural choice: baseline (line shape), SbS (parsimony), 2D (instrument response).
- In-session comparison via `project.results.compare_models(...)` and the sugar delegate `file.compare_models(...)`.
- `compare_models` aggregation modes: default `median`, `sum`, and `long` for per-slice rows. Close §6 with a "two practical questions" payoff: "which model fits spectrum #4 best?" (long form + slice filter) and "which model fits best across the board?" (sum-aggregated, sorted by AIC). This motivates `long` form for the batch-of-spectra use case (SbS as N independent 1D fits, not necessarily time).
- `plot_residuals` at both ends: 1D obs+fit+residual for baseline, shared-scale residual heatmaps for 2D (where the IRF residual band is the decisive visual).
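
The three aggregation modes and the "two practical questions" payoff can be illustrated without the package. The following is a minimal standard-library sketch, not the `compare_models` implementation: the function names (`summarize`, `best_for_slice`, `best_overall`), the row layout, and the AIC numbers are all hypothetical and chosen only to show how `median`, `sum`, and `long` can lead to different answers.

```python
from statistics import median

# Hypothetical per-slice AIC values for two models (illustrative numbers only,
# not output from trspecfit). Lower AIC is better.
PER_SLICE_AIC = {
    "model_a": [10.0, 12.0, 11.0, 30.0],
    "model_b": [11.0, 13.0, 12.0, 14.0],
}


def summarize(per_slice, mode="median"):
    """Collapse per-slice scores according to ``mode``.

    "median" and "sum" return {model: score}; "long" returns one
    (model, slice_index, score) row per slice so rows can be filtered later.
    """
    if mode == "median":
        return {model: median(vals) for model, vals in per_slice.items()}
    if mode == "sum":
        return {model: sum(vals) for model, vals in per_slice.items()}
    if mode == "long":
        return [
            (model, i, val)
            for model, vals in per_slice.items()
            for i, val in enumerate(vals)
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def best_for_slice(per_slice, slice_index):
    """Question 1: which model fits one slice best? (long form + slice filter)"""
    rows = [r for r in summarize(per_slice, "long") if r[1] == slice_index]
    return min(rows, key=lambda r: r[2])[0]


def best_overall(per_slice):
    """Question 2: which model fits best across the board? (sum, sorted by AIC)"""
    totals = summarize(per_slice, "sum")
    return sorted(totals, key=totals.get)[0]
```

With these made-up numbers the median favors `model_a` while the sum favors `model_b`, because one badly fit slice dominates the total. That is the kind of disagreement the per-slice `long` view is meant to expose.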

Persistence content (`save_fit` / `FitResults.load` / `compare_models` on loaded archives / slot anatomy / filtered single-slot ship / two channels / overwrite semantics / σ-snapshot recalibration) does **not** live here. See `11_save_load_export`.

### `fitting_workflows/11_save_load_export`

Save / load / export story (NEW). The canonical reference for the `FitResults` archive API, used after a reader has seen fitting and comparison.

**Preamble pattern** (first two cells):

1. Markdown pointer to `10_model_comparison` ("see that notebook for the fits' details; this one focuses on what to do with the results").
2. A single code cell that runs notebook 10's content with suppressed output:

   ```python
   %run ../10_model_comparison/example.ipynb
   ```

IPython `%run` executes the target notebook in the current kernel, so all of notebook 10's variables (`file`, `project`, fitted models) are in scope below. Output suppression rides on the existing `project.yaml` knobs (`show_output: 0`, `auto_export: false`). `%%capture` is the fallback if any output leaks past the YAML knobs. Runtime ~30–40 s.

After the preamble, the actual content:

- `file.save_fit("comparison.fit.h5")` and `FitResults.load(path)`: the canonical round-trip with no live `Project` on the reload side.
- `loaded.compare_models(...)` showing the same comparison API works identically against an on-disk archive (sanity check, not the primary point).
- Filtered single-slot save (`save_fit(path, model=..., fit_type=...)`) and the parallel `export_fit` with the same filters. "Ship the winners" as the natural next step once a reader has a verdict.
- The two channels framed by audience: HDF5 (structured, lossless, σ-snapshot included, round-trips back into trspecfit; for *future you* and other trspecfit users) vs CSV/PNG tree (one-way; for Origin, MATLAB, paper plots, non-trspecfit colleagues).
- `FitResults` query API: `files()`, `models()`, `find()`, `get()`, and slot anatomy via `dataclasses.fields(slot)` rendering shapes for arrays/frames and keys for dicts (every constituent part is discoverable without opening the `.h5`).
- Overwrite / slot-collision semantics on `save_fit` (append-by-default, `FileExistsError` on slot collision unless `overwrite=True`).
- σ-snapshot semantics: calibrated columns survive load without re-`set_sigma()`; what-if recalibration via `chi2_red_raw`.

The preamble pattern is preferred over an inline stripped-down setup because (a) single source of truth (notebook 10 owns the fit pipeline, notebook 11 inherits future updates automatically) and (b) rich state: all slot types (baseline + SbS + 2D, with conf_ci on baseline) are available for the persistence demos, not just a minimum quorum. The ~30–40 s runtime cost is acceptable for a notebook a reader opens deliberately.

### `fitting_workflows/20_fit_each_separately`

Bridge story (NEW):

- One model definition and one set of fit limits applied across N files (avoids the duplicated setup code a bare `for file in files: ...` loop would require without `Project`).
- `project.export_fits()` produces a single coherent directory tree (`//__/...`), which is easier to diff or zip than N separate per-file dumps.
- `project.results.compare_models(file=...)` works across the full batch, including replicates of the same physical sample.
- `project.save_fits(path)` packages the whole batch into one portable HDF5.
- Concrete contrast: mention what is lost when running the loop without `Project` (the four points above); this makes the value prop explicit rather than implicit.

This notebook makes the distinction clear: multi-file workspace does not necessarily mean shared/project-level fitting.

### `fitting_workflows/21_project_level_shared_fit`

Power-user story:

- Load multiple related datasets.
- Define shared and per-file parameters.
- Run `project.fit_2d(...)`.
- Save/archive results with `project.save_fits(...)`.
- Compare/inspect via `project.results` or loaded `FitResults`.

This is where `Project` becomes the main object. The notebook should note that the joint multi-file residual is currently in MVP state: it is not yet lowered to GIR (see `TODO.md`), which is the source of the slowness, not a permanent characterization.

### `synthetic_data`

Forward-model story:

- `01_simulator`: generate known-truth spectra for validation and demos.
- `02_ml_training_data`: sweep parameter space and save training datasets.

These examples can keep using `Project`/`File` internally because the simulator needs a model, but the section is described as synthetic data generation, not as a fitting tutorial.

## Docs Navigation

The examples documentation moves from a single linear path to a "choose your track" entry point:

- New user with one processed file: start at `fitting_workflows/01_basic_fitting`.
- Comparing two fits on one file: start at `fitting_workflows/10_model_comparison`.
- Saving, loading, or exporting fit results (HDF5 archive or CSV/PNG tree): start at `fitting_workflows/11_save_load_export`.
- Many files, separate fits: start at `fitting_workflows/20_fit_each_separately`.
- Shared/global fit: start at `fitting_workflows/21_project_level_shared_fit`.
- Simulation or ML training data: start at `synthetic_data`.

The quickstart can still recommend the basic fitting notebook as the first notebook, but it should not imply that every user should walk every example in numerical order.

## Save/Export/Load Presentation

The examples should be careful about language:

- Use **export** for one-way CSV/PNG output intended for humans and tools like Origin: `file.export_fit()` / `project.export_fits()`.
- Use **save/load** for round-trippable HDF5 fit-result archives: `file.save_fit()`, `project.save_fits()`, `FitResults.load(...)`, `project.load_fits(...)`.
- Keep individual export visibly supported.
  The deprecated methods are the old method names and legacy implementations, not the single-file export workflow.
- Present `FitResults` as the result browser/comparison object, not as something casual single-file users must understand before exporting.

**Auto-export side effect.** `fit_*` methods write CSVs/PNGs to `project.path_results` automatically on completion by default. Explicit `file.export_fit()` / `project.export_fits()` calls are the re-runnable, slot-filtered version of that same content. `project.auto_export = False` (also settable via `auto_export: false` in `project.yaml`) makes the explicit path the only one that writes, which is useful for parameter sweeps, ML training-data generation, and the long-term real-time fitting goal. Notebooks should describe both the default and the opt-out, so users are not surprised by files appearing on disk before they "exported."

## Migration Plan

1. Finish the current fit-results save/load branch with minimal notebook/docs coverage:
   - Extend `01_basic_fitting/example.ipynb` with a final section demonstrating `file.save_fit()` + `file.export_fit()` (no `compare_models` / `load`; those belong in `10_model_comparison`, written in the follow-up branch).
   - Make sure `file.export_fit()` is presented as the Origin-style path.
   - Run tests and merge.
2. Start a new branch for the examples upgrade.
3. Move current notebooks into the new structure:
   - `fitting_workflows/01_basic_fitting` → unchanged
   - `fitting_workflows/02_dependent_parameters` → unchanged
   - `fitting_workflows/03_multi_cycle` → `fitting_workflows/03_multi_cycle_dynamics`
   - `fitting_workflows/04_par_profiles` → `fitting_workflows/04_parameter_profiles`
   - `fitting_workflows/05_project_level_fitting` → `fitting_workflows/21_project_level_shared_fit`
   - `data_generation/simulator` → `synthetic_data/01_simulator`
   - `data_generation/ml_training` → `synthetic_data/02_ml_training_data`
4. Split and add notebooks:
   - `fitting_workflows/10_model_comparison/`: already exists post fit-saving merge. Trim to comparison-only: lift §8 (Save → Load → Compare Across Sessions), §9 (Browse the Loaded Archive), the "Ship just the winning fits" subsection, and the persistence bullets/tips into `11_save_load_export`. Update its intro table-of-contents (drop bullets about save/load/export) and the Tips block accordingly.
   - `fitting_workflows/11_save_load_export/`: NEW. Two-cell preamble (markdown pointer + `%run ../10_model_comparison/example.ipynb`), then the content lifted from the pre-split notebook 10. See the content target above for the full scope.
   - `fitting_workflows/20_fit_each_separately/`: NEW.
5. Add `fitting_workflows/README.md` documenting the 0x / 1x / 2x numeric-block legend.
6. Update `examples/README.md`, `docs/examples/index.rst`, and `docs/quickstart.md` to use the track-based navigation. Grep for hardcoded old paths first.
7. Run notebook smoke checks or at least path/import checks after the moves.

## Non-goals For The Save/Load Branch

- Do not reorganize the full `examples/` tree in the save/load branch.
- Do not rewrite every existing notebook to the new teaching architecture before merging the archive work.

The save/load branch should only make the new feature discoverable enough that users are not stranded. The full teaching architecture belongs in the follow-up examples branch.

**Data-preparation workflows.** Dark subtraction, detector calibration, and pixel-to-energy mapping are upstream preprocessing, out of scope for this examples-upgrade pass. Dark subtraction and detector calibration may be too instrument-specific for this repo's core example tree. Energy-axis calibration by fitting reference spectra is closer to `trspecfit`'s value proposition, so it can be revisited later if we have a compact, shareable Au 4f / valence-band style dataset and a workflow that teaches calibration without turning into an instrument-control tutorial.

## Open Questions

- Should moved notebooks preserve old numeric prefixes exactly, or use the new block scheme? **Resolved**: use the new 0x / 1x / 2x block scheme; document the legend in `fitting_workflows/README.md`.
- Should the examples upgrade include a compatibility note for old paths, or is this acceptable as a clean pre-1.0 examples reorganization? **Resolve via grep** of `docs/`, `README.md`, `examples/README.md`, and any reference in the docstrings before the rename branch starts; the answer follows the number of hits.
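
The "resolve via grep" step could be scripted rather than run by hand. Below is a minimal sketch, assuming the old paths are exactly the ones renamed in step 3 of the migration plan; the helper names (`count_old_path_hits`, `scan`) and the file-suffix filter are illustrative choices, not existing project tooling.

```python
import re
from pathlib import Path

# Old example paths taken from step 3 of the migration plan.
OLD_PATHS = [
    "fitting_workflows/03_multi_cycle",
    "fitting_workflows/04_par_profiles",
    "fitting_workflows/05_project_level_fitting",
    "data_generation/simulator",
    "data_generation/ml_training",
]

# The negative lookahead keeps "03_multi_cycle" from matching the new
# "03_multi_cycle_dynamics" name.
PATTERN = re.compile(
    "(" + "|".join(re.escape(p) for p in OLD_PATHS) + r")(?![A-Za-z0-9_])"
)


def count_old_path_hits(text):
    """Return {old_path: hit_count} for one blob of text."""
    counts = {}
    for match in PATTERN.finditer(text):
        counts[match.group(1)] = counts.get(match.group(1), 0) + 1
    return counts


def scan(roots, suffixes=(".md", ".rst", ".py", ".ipynb")):
    """Scan files or directories; return {file_path: {old_path: hit_count}}."""
    report = {}
    for root in map(Path, roots):
        if not root.exists():
            continue  # skip roots that are absent in this checkout
        candidates = [root] if root.is_file() else sorted(root.rglob("*"))
        for path in candidates:
            if path.is_file() and path.suffix in suffixes:
                text = path.read_text(encoding="utf-8", errors="ignore")
                hits = count_old_path_hits(text)
                if hits:
                    report[str(path)] = hits
    return report
```

Running `scan(["docs", "README.md", "examples/README.md"])` from the repository root, plus the package source directory for docstring references, gives a per-file hit count. An empty report would suggest a clean reorganization is acceptable, while many hits would argue for a compatibility note.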