Examples Upgrade Plan

Design note for a future branch that reorganizes the example notebooks around the way users actually approach the package. This is deliberately separate from the fit-results save/load branch: the current branch should finish the archive feature with minimal examples/docs coverage, then merge. The broader examples upgrade is a teaching and UX project with enough file movement and narrative work to deserve its own branch.

Decisions (locked)

  • Track-based navigation replaces the linear “walk forward” path. The quickstart still recommends 01_basic_fitting as the first notebook, but does not imply that every user should walk every example in order.

  • Top-level directories for this examples-upgrade pass: fitting_workflows/ (existing name kept) and synthetic_data/ (renamed from data_generation/). No data_preparation/ track in this pass.

  • Inside fitting_workflows/, layout is flat with three numeric blocks: 01–04 = fitting skills on a single file; 10–11 = post-fit work (comparison, persistence, export); 20+ = multi-file workflows. fitting_workflows/README.md documents the legend.

  • 10_model_comparison is strictly about model comparison. The persistence / inspection / export side (save/load h5, browse loaded archives, ship single slots, the two-channels framing) moves to a sibling notebook 11_save_load_export. Each notebook has one job, and notebook 11 gets a discoverable filesystem location that can be linked from save_fit / export_fit docstrings, the README, and the CHANGELOG.

  • Notebook 11 uses notebook 10’s full pipeline as its preamble via the IPython %run magic (%run ../10_model_comparison/example.ipynb), preceded by a one-line markdown pointer (“see 10_model_comparison for the fitting/comparison details; this notebook focuses on what to do with the results”). Output suppression rides on the existing project.yaml knobs already used by 10_model_comparison (show_output: 0, auto_export: false); %%capture is the fallback, not the default mechanism. Expected preamble runtime ~30–40 s (baseline fits <1 s each, SbS ~10 s each, 2D fits a few seconds each) — acceptable for a notebook a reader opens deliberately. Single source of truth: notebook 10 owns the fit pipeline; notebook 11 inherits any future updates automatically and ends up with rich state (baseline + SbS + 2D slots, σ snapshot, conf_ci) available for the persistence demos.

  • Casual user’s mental model = File. file.save_fit() saves a snapshot of completed fits for this file, keeping the latest slot per model / fit type / selection (snapshot semantics inherited from Project.save_fits). It is the casual user’s “save all current fits for this file” API.

  • Model comparison (compare_models) is its own notebook (10_model_comparison), not part of 01_basic_fitting: comparison requires fitting two models, which doubles the cognitive load of the basic notebook. FitResults.load lives with the rest of the persistence content in 11_save_load_export.

  • Export terminology: export = one-way CSV/PNG for humans + tools like Origin; save/load = HDF5 round-trip via the FitResults archive.

  • Project.auto_export (default True, configurable via project.yaml or post-init mutation) gates the fit-completion CSV/PNG side effects. The basic notebook documents the default behavior and shows the opt-out.

Motivation

The fit-results work introduces a clearer split between:

  • File as the natural surface for fitting and exporting one dataset.

  • Project as configuration, workspace, multi-file coordination, and archive ownership.

  • FitResults as the inspection/comparison object for completed fits.

The examples should reinforce that split. Users who are thinking “fit file 1, export the fit, load it into Origin” should not feel that they need to understand the full Project model. Power users should still have a clear path to multi-file fitting, project-level shared fits, archives, and model comparison.

Current State

The examples are currently organized as:

examples/
  data_generation/
    simulator/
    ml_training/
  fitting_workflows/
    01_basic_fitting/
    02_dependent_parameters/
    03_multi_cycle/
    04_par_profiles/
    05_project_level_fitting/

This is already close in one respect: simulation and ML training data are separate from fitting. The weakness is that all fitting notebooks live in one linear sequence, even though they represent different user mindsets:

  • Single-file fitting skills (01 through 04).

  • Post-fit comparison (currently absent).

  • Multi-file / project-level fitting (05, with no bridge between the single-file basics and full project-level shared fits).

The current quickstart also tells users to start at 01 and work forward. That is useful for a tutorial path, but less useful once examples cover multiple tracks.

User Workflows

Single-file / Origin-style user

This user has one processed dataset and wants to fit it, export tables/plots, and keep working in external tools.

Primary API:

file.fit_baseline(...)
file.fit_2d(...)
file.export_fit()                       # CSV + PNG, Origin-friendly
file.save_fit()                         # HDF5 archive snapshot for this file
file.get_fit_results(fit_type="2d")

The Project should appear as setup/context, not as the main conceptual object. For this user, Project is mostly where config, paths, plotting defaults, and model files live.

Multi-file individual fitting user

This user has several files but wants to fit each file separately. They want a shared loop, consistent settings, per-file exports, and a summary view.

Primary API:

project = trspecfit.Project(...)
files = [...]

for file in files:
    file.fit_baseline(...)
    file.fit_2d(...)

project.export_fits()                              # one coherent tree across files
project.save_fits()                                # one portable HDF5 for the batch
project.results.compare_models(file=files[0], ...)

This is the bridge workflow: Project is useful as a collection and session workspace, but each fit remains file-scoped.

Project-level / shared-fit user

This user intentionally wants shared parameters across multiple files and is ready for Project to be an active fitting object.

Primary API:

project.fit_2d(...)
project.save_fits(...)
project.results.compare_models(...)

This workflow is more complex and should come after multi-file individual fitting, not immediately after single-file basics.

Synthetic-data / ML user

This user is working with forward simulation, validation, or training data. They are not preparing experimental data and not primarily fitting an existing file. The current simulator and ML training notebooks belong together.

Proposed Directory Layout

examples/
  fitting_workflows/                              # existing name kept
    01_basic_fitting/                             # block 0x: fitting skills (single-file)
    02_dependent_parameters/
    03_multi_cycle_dynamics/                      # renamed from 03_multi_cycle
    04_parameter_profiles/                        # renamed from 04_par_profiles
    10_model_comparison/                          # block 1x: post-fit work
    11_save_load_export/                          # block 1x: post-fit work (NEW)
    20_fit_each_separately/                       # block 2x: multi-file (NEW, bridge)
    21_project_level_shared_fit/                  # was 05_project_level_fitting
  synthetic_data/                                 # renamed from data_generation
    01_simulator/
    02_ml_training_data/

Numbering convention documented in fitting_workflows/README.md:

  • 0x — fitting skills on a single file.

  • 1x — post-fit work (comparison, persistence, export).

  • 2x — multi-file workflows.

The flat layout with three numeric blocks keeps alphabetical sort intact while making the category structure visible without an extra directory level. A new notebook takes the next free number inside its category's block; a new category starts at the next block boundary (0x, 1x, 2x).
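The legend can also be checked mechanically, for example in a smoke test that walks fitting_workflows/. The following is a minimal sketch, not part of trspecfit: the helper name block_of and the strict "raise on unknown prefix" behavior are assumptions made for illustration.

```python
import re

# Block legend copied from the numbering convention above.
BLOCKS = {
    0: "fitting skills on a single file",
    1: "post-fit work (comparison, persistence, export)",
    2: "multi-file workflows",
}


def block_of(notebook_dir: str) -> str:
    """Return the block description for a name like '11_save_load_export'.

    Hypothetical helper: the tens digit of the two-digit prefix selects the block.
    """
    match = re.match(r"^(\d{2})_", notebook_dir)
    if match is None:
        raise ValueError(f"no two-digit prefix in {notebook_dir!r}")
    tens = int(match.group(1)) // 10
    if tens not in BLOCKS:
        raise ValueError(f"prefix {match.group(1)} is outside the documented blocks")
    return BLOCKS[tens]
```

Raising on an undocumented prefix is a deliberate choice here: a notebook numbered 30 or higher would signal that the README legend needs a new entry.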

Notebook Content Targets

fitting_workflows/01_basic_fitting

Core “one file” story:

  • Load processed data, energy, and time.

  • Fit baseline, optional slice-by-slice, and 2D model.

  • Show file.get_fit_results(...).

  • Show file.export_fit() as the Origin-friendly CSV/PNG workflow.

  • Show file.save_fit() as the archive-snapshot persistence (keeps the latest slot per model / fit type / selection for this file).

  • Callout: fit_* methods auto-write CSVs/PNGs to project.path_results on completion by default. The notebook shows both the default and the project.auto_export = False opt-out (also settable via project.yaml).

  • Out of scope here: compare_models and FitResults.load. These move to 10_model_comparison and 11_save_load_export respectively, so this notebook stays focused on the casual user’s single-file path.

fitting_workflows/10_model_comparison

Post-fit comparison story (NEW). Strictly model selection — persistence, inspection, and export move to the sibling notebook 11_save_load_export so each notebook has one job.

  • Two models, one file. Compress the fitting cells — readers have seen the fit API in 01–04, so this notebook glosses over fitting and focuses on comparison.

  • Three comparison stories, each isolating one structural choice: baseline (line shape), SbS (parsimony), 2D (instrument response).

  • In-session comparison via project.results.compare_models(...) and the sugar delegate file.compare_models(...).

  • compare_models aggregation modes: default median, sum, and long for per-slice rows. Close §6 with a “two practical questions” payoff — “which model fits spectrum #4 best?” (long form + slice filter) and “which model fits best across the board?” (sum-aggregated, sorted by AIC). This motivates long form for the batch-of-spectra use case (SbS as N independent 1D fits, not necessarily time).

  • plot_residuals at both ends: 1D obs+fit+residual for baseline, shared-scale residual heatmaps for 2D (where the IRF residual band is the decisive visual).

Persistence content (save_fit / FitResults.load / compare_models on loaded archives / slot anatomy / filtered single-slot ship / two channels / overwrite semantics / σ-snapshot recalibration) does not live here. See 11_save_load_export.
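To make the three aggregation modes concrete, here is a toy sketch in plain Python. It does not call trspecfit and does not reproduce the compare_models signature; the function names, the model names, and the AIC numbers are all made up for illustration. It only shows how median, sum, and long form can answer different questions from the same per-slice scores.

```python
from statistics import median

# Illustrative per-slice AIC values (made-up numbers; lower is better).
aic_per_slice = {
    "model_a": [10.0, 12.0, 11.0, 30.0],
    "model_b": [11.0, 11.5, 12.0, 14.0],
}


def aggregate(per_slice, mode="median"):
    """Collapse per-slice scores per model, or return long-form rows."""
    if mode == "median":
        return {m: median(v) for m, v in per_slice.items()}
    if mode == "sum":
        return {m: sum(v) for m, v in per_slice.items()}
    if mode == "long":
        # One (model, slice_index, score) row per slice.
        return [(m, i, s) for m, v in per_slice.items() for i, s in enumerate(v)]
    raise ValueError(f"unknown mode: {mode!r}")


def best_for_slice(per_slice, index):
    """Question 1: which model fits one given slice best? (long form + filter)"""
    rows = [r for r in aggregate(per_slice, "long") if r[1] == index]
    return min(rows, key=lambda r: r[2])[0]


def best_overall(per_slice):
    """Question 2: which model fits best across the board? (sum, sorted)"""
    totals = aggregate(per_slice, "sum")
    return sorted(totals, key=totals.get)[0]
```

With these numbers the median slightly favors model_a (11.5 vs 11.75) while the sum favors model_b (48.5 vs 63.0), because one badly fit slice dominates the sum. That disagreement is the kind of case the "two practical questions" payoff is meant to surface.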

fitting_workflows/11_save_load_export

Save / load / export story (NEW). The canonical reference for the FitResults archive API, used after a reader has seen fitting and comparison.

Preamble pattern (first two cells):

  1. Markdown pointer to 10_model_comparison (“see that notebook for the fits’ details; this one focuses on what to do with the results”).

  2. A single code cell that runs notebook 10’s content with suppressed output:

    %run ../10_model_comparison/example.ipynb
    

    IPython %run executes the target notebook in the current kernel, so all of notebook 10’s variables (file, project, fitted models) are in scope below. Output suppression rides on the existing project.yaml knobs (show_output: 0, auto_export: false). %%capture is the fallback if any output leaks past the YAML knobs. Runtime ~30–40 s.

After the preamble, the actual content:

  • file.save_fit("comparison.fit.h5") and FitResults.load(path) — the canonical round-trip with no live Project on the reload side.

  • loaded.compare_models(...) showing the same comparison API works identically against an on-disk archive (sanity check, not the primary point).

  • Filtered single-slot save (save_fit(path, model=..., fit_type=...)) and the parallel export_fit with the same filters. “Ship the winners” as the natural next step once a reader has a verdict.

  • The two channels framed by audience: HDF5 (structured, lossless, σ-snapshot included, round-trips back into trspecfit — for future you and other trspecfit users) vs CSV/PNG tree (one-way — for Origin, MATLAB, paper plots, non-trspecfit colleagues).

  • FitResults query API: files(), models(), find(), get(), and slot anatomy via dataclasses.fields(slot) rendering shapes for arrays/frames and keys for dicts (every constituent part is discoverable without opening the .h5).

  • Overwrite / slot-collision semantics on save_fit (append-by-default, FileExistsError on slot collision unless overwrite=True).

  • σ-snapshot semantics — calibrated columns survive load without re-set_sigma(); what-if recalibration via chi2_red_raw.

The preamble pattern is preferred over an inline stripped-down setup because (a) single source of truth — notebook 10 owns the fit pipeline, notebook 11 inherits future updates automatically — and (b) rich state: all slot types (baseline + SbS + 2D, with conf_ci on baseline) are available for the persistence demos, not just a minimum quorum. The ~30–40 s runtime cost is acceptable for a notebook a reader opens deliberately.
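The overwrite / slot-collision bullet above is easier to teach with a mental model. The sketch below is a toy in-memory stand-in, not the trspecfit implementation: the class SlotStore, its method names, and the exact slot key (file, model, fit type, selection) are assumptions chosen to mirror the semantics described in this note (append by default, FileExistsError on collision unless overwrite=True).

```python
class SlotStore:
    """Toy stand-in for an archive of fit slots (hypothetical, for illustration)."""

    def __init__(self):
        self._slots = {}

    def save(self, file_name, model, fit_type, payload, selection=None, overwrite=False):
        """Add a slot; refuse to replace an existing one unless overwrite=True."""
        key = (file_name, model, fit_type, selection)
        if key in self._slots and not overwrite:
            raise FileExistsError(f"slot already exists: {key}")
        self._slots[key] = payload

    def get(self, file_name, model, fit_type, selection=None):
        """Return the payload stored for one slot."""
        return self._slots[(file_name, model, fit_type, selection)]

    def __len__(self):
        return len(self._slots)
```

Saving a second fit type for the same file appends a new slot; saving the same key again raises unless the caller opts in to overwriting.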

fitting_workflows/20_fit_each_separately

Bridge story (NEW):

  • One model definition and one set of fit limits applied across N files (avoids the duplicated setup code a bare for file in files: ... loop would require without Project).

  • project.export_fits() produces a single coherent directory tree (<root>/<file_name>/<model>__<fit_type>/...) — easier to diff or zip than N separate per-file dumps.

  • project.results.compare_models(file=...) works across the full batch, including replicates of the same physical sample.

  • project.save_fits(path) packages the whole batch into one portable HDF5.

  • Concrete contrast: mention what is lost when running the loop without Project (the four points above) — makes the value prop explicit rather than implicit.

This notebook makes the distinction clear: multi-file workspace does not necessarily mean shared/project-level fitting.
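The export tree layout named above, <root>/<file_name>/<model>__<fit_type>/..., can be sketched as a small path builder. This is an illustration of the layout only, assuming the directory pattern stated in this note; export_dir is a hypothetical helper, and the file and model names in the usage line are placeholders.

```python
from pathlib import PurePosixPath


def export_dir(root, file_name, model, fit_type):
    """Build <root>/<file_name>/<model>__<fit_type> (layout as described above)."""
    return PurePosixPath(root) / file_name / f"{model}__{fit_type}"


def export_dirs(root, fits):
    """Map an iterable of (file_name, model, fit_type) to sorted directory paths."""
    return sorted(str(export_dir(root, *fit)) for fit in fits)
```

For example, export_dir("results", "scan_01", "two_peak", "2d") yields results/scan_01/two_peak__2d, so all fits for one file sit under one directory and the whole batch can be zipped or diffed as one tree.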

fitting_workflows/21_project_level_shared_fit

Power-user story:

  • Load multiple related datasets.

  • Define shared and per-file parameters.

  • Run project.fit_2d(...).

  • Save/archive results with project.save_fits(...).

  • Compare/inspect via project.results or loaded FitResults.

This is where Project becomes the main object. The notebook should note that the joint multi-file residual is currently in MVP state: it is not yet lowered to GIR (see TODO.md), which is why these fits are slow today. The slowness is a current limitation, not a permanent characterization.

synthetic_data

Forward-model story:

  • 01_simulator: generate known-truth spectra for validation and demos.

  • 02_ml_training_data: sweep parameter space and save training datasets.

These examples can keep using Project/File internally because the simulator needs a model, but the section is described as synthetic data generation, not as a fitting tutorial.

Docs Navigation

The examples documentation moves from a single linear path to a “choose your track” entry point:

  • New user with one processed file: start at fitting_workflows/01_basic_fitting.

  • Comparing two fits on one file: start at fitting_workflows/10_model_comparison.

  • Saving, loading, or exporting fit results (HDF5 archive or CSV/PNG tree): start at fitting_workflows/11_save_load_export.

  • Many files, separate fits: start at fitting_workflows/20_fit_each_separately.

  • Shared/global fit: start at fitting_workflows/21_project_level_shared_fit.

  • Simulation or ML training data: start at synthetic_data.

The quickstart can still recommend the basic fitting notebook as the first notebook, but it should not imply that every user should walk every example in numerical order.

Save/Export/Load Presentation

The examples should be careful about language:

  • Use export for one-way CSV/PNG output intended for humans and tools like Origin: file.export_fit() / project.export_fits().

  • Use save/load for round-trippable HDF5 fit-result archives: file.save_fit(), project.save_fits(), FitResults.load(...), project.load_fits(...).

  • Keep single-file export visibly supported. What is deprecated is the old method names and their legacy implementations, not the single-file export workflow itself.

  • Present FitResults as the result browser/comparison object, not as something casual single-file users must understand before exporting.

Auto-export side effect. fit_* methods write CSVs/PNGs to project.path_results automatically on completion by default. Explicit file.export_fit() / project.export_fits() calls are the re-runnable, slot-filtered version of that same content. project.auto_export = False (also settable via auto_export: false in project.yaml) makes the explicit path the only one that writes — useful for parameter sweeps, ML training-data generation, and the long-term real-time fitting goal. Notebooks should describe both the default and the opt-out, so users are not surprised by files appearing on disk before they “exported.”

Migration Plan

  1. Finish the current fit-results save/load branch with minimal notebook/docs coverage:

    • Extend 01_basic_fitting/example.ipynb with a final section demonstrating file.save_fit() + file.export_fit() (no compare_models / load — those belong in 10_model_comparison, written in the follow-up branch).

    • Make sure file.export_fit() is presented as the Origin-style path.

    • Run tests and merge.

  2. Start a new branch for the examples upgrade.

  3. Move current notebooks into the new structure:

    • fitting_workflows/01_basic_fitting → unchanged

    • fitting_workflows/02_dependent_parameters → unchanged

    • fitting_workflows/03_multi_cycle → fitting_workflows/03_multi_cycle_dynamics

    • fitting_workflows/04_par_profiles → fitting_workflows/04_parameter_profiles

    • fitting_workflows/05_project_level_fitting → fitting_workflows/21_project_level_shared_fit

    • data_generation/simulator → synthetic_data/01_simulator

    • data_generation/ml_training → synthetic_data/02_ml_training_data

  4. Split and add notebooks:

    • fitting_workflows/10_model_comparison/ — already exists post fit-saving merge. Trim to comparison-only: lift §8 (Save → Load → Compare Across Sessions), §9 (Browse the Loaded Archive), the “Ship just the winning fits” subsection, and the persistence bullets/tips into 11_save_load_export. Update its intro table-of-contents (drop bullets about save/load/export) and the Tips block accordingly.

    • fitting_workflows/11_save_load_export/ — NEW. Two-cell preamble (markdown pointer + %run ../10_model_comparison/example.ipynb), then the content lifted from the pre-split notebook 10. See the content target above for the full scope.

    • fitting_workflows/20_fit_each_separately/ — NEW.

  5. Add fitting_workflows/README.md documenting the 0x / 1x / 2x numeric-block legend.

  6. Update examples/README.md, docs/examples/index.rst, and docs/quickstart.md to use the track-based navigation. Grep for hardcoded old paths first.

  7. Run notebook smoke checks or at least path/import checks after the moves.
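The moves in step 3 can be scripted so they are reviewable before anything touches disk. The following is a minimal sketch under stated assumptions: the rename map is copied from the list above, apply_renames is a hypothetical helper, and it uses shutil.move for simplicity. In the actual branch, git mv would be the better tool because it keeps history attached to the moved notebooks.

```python
import shutil
from pathlib import Path

# Old path -> new path, relative to examples/ (copied from step 3 above).
RENAMES = {
    "fitting_workflows/03_multi_cycle": "fitting_workflows/03_multi_cycle_dynamics",
    "fitting_workflows/04_par_profiles": "fitting_workflows/04_parameter_profiles",
    "fitting_workflows/05_project_level_fitting": "fitting_workflows/21_project_level_shared_fit",
    "data_generation/simulator": "synthetic_data/01_simulator",
    "data_generation/ml_training": "synthetic_data/02_ml_training_data",
}


def apply_renames(examples_root, dry_run=True):
    """Return the (old, new) pairs that apply; move them unless dry_run is True."""
    root = Path(examples_root)
    moved = []
    for old, new in RENAMES.items():
        src, dst = root / old, root / new
        if not src.exists():
            continue  # already moved, or not present in this checkout
        if dst.exists():
            # shutil.move would nest src inside an existing dst directory.
            raise FileExistsError(f"target already exists: {dst}")
        moved.append((old, new))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moved
```

Running with dry_run=True first prints nothing and changes nothing; it just returns the planned moves. After a real run, the emptied data_generation/ directory is left in place and would need to be removed separately.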

Non-goals For The Save/Load Branch

  • Do not reorganize the full examples/ tree in the save/load branch.

  • Do not rewrite every existing notebook to the new teaching architecture before merging the archive work.

The save/load branch should only make the new feature discoverable enough that users are not stranded. The full teaching architecture belongs in the follow-up examples branch.

Data-preparation workflows. Dark subtraction, detector calibration, and pixel-to-energy mapping are upstream preprocessing — out of scope for this examples-upgrade pass. Dark subtraction and detector calibration may be too instrument-specific for this repo’s core example tree. Energy-axis calibration by fitting reference spectra is closer to trspecfit’s value proposition, so it can be revisited later if we have a compact, shareable Au 4f / valence-band style dataset and a workflow that teaches calibration without turning into an instrument-control tutorial.

Open Questions

  • Should moved notebooks preserve old numeric prefixes exactly, or use the new block scheme? Resolved: use the new 0x / 1x / 2x block scheme; document the legend in fitting_workflows/README.md.

  • Should the examples upgrade include a compatibility note for old paths, or is this acceptable as a clean pre-1.0 examples reorganization? Resolve via grep of docs/, README.md, examples/README.md, and any reference in the docstrings before the rename branch starts — the answer follows the number of hits.
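The grep that settles the second question can be a short script rather than an ad hoc shell command. The sketch below is an assumption-laden illustration: find_old_paths is a hypothetical helper, the set of file suffixes to scan is a guess, and the pattern is built from the old directory names listed in the migration plan. The negative lookahead keeps 03_multi_cycle from matching the new name 03_multi_cycle_dynamics.

```python
import re
from pathlib import Path

# Old directory names from the migration plan.
OLD_PATH_PATTERN = re.compile(
    r"data_generation"
    r"|03_multi_cycle(?!_dynamics)"  # do not flag the new name
    r"|04_par_profiles"
    r"|05_project_level_fitting"
)

# Assumed set of text files worth scanning.
TEXT_SUFFIXES = {".md", ".rst", ".py", ".ipynb", ".yaml"}


def find_old_paths(root):
    """Return (relative_path, line_number, line) for each hardcoded old path."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_PATH_PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

The length of the returned list is the "number of hits" the question refers to: a handful of hits in docs can simply be fixed in the rename branch, while many hits in docstrings or external-facing text would argue for a compatibility note.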