From a Single Scenario to a Design

library(ssdsims)

A single ssd_scenario is one regular grid — a rectangular cross-join of the axes. A real study is rarely rectangular: you want a coarse grid everywhere and a dense refinement in one region (more nrow values, but only for one dataset, say). Forcing that into one scenario computes a cross-product you never wanted; running it as several separate pipelines recomputes the cells they share and gives you no single summary.

A design is the answer: a named set of scenarios run as one pipeline, unioned into one possibly-ragged task set. Overlapping cells are computed once, and the result is a single combined summary with a scenario column. This vignette shows how to grow a one-off run into a design — a one-line change — and then refine it.

Start with a single scenario

A typical one-off run defines a scenario and turns it into a targets pipeline with ssd_scenario_targets():

data <- ssd_scenario_data(
  boron = ssddata::ccme_boron,
  cadmium = ssddata::ccme_cadmium
)
# shard on (dataset, sim) so each (dataset, sim) is its own cell
pb <- list(
  sample = c("dataset", "sim"),
  fit = c("dataset", "sim"),
  hc = c("dataset", "sim")
)
coarse <- ssd_define_scenario(
  data,
  nsim = 5L,
  seed = 42L,
  nrow = c(5L, 10L, 20L),
  dists = ssd_distset(lnorm = "lnorm"),
  partition_by = pb
)
coarse
#> <ssdsims_scenario>
#>   seed:     42
#>   nsim:     5
#>   datasets: boron, cadmium
#>   nrow:     5, 10, 20
#>   replace:  TRUE
#>   nrow_max: 1000 (setting)
#>   fit grid:
#>     rescale: FALSE
#>     computable: FALSE
#>     at_boundary_ok: TRUE
#>     min_pmix: ssd_min_pmix
#>     range_shape1: {0.05, 20}
#>     range_shape2: {0.05, 20}
#>     dists: lnorm (setting)
#>   hc grid:
#>     est_method: multi (setting)
#>     proportion: 0.05 (setting)
#>     ci: FALSE (setting)
#>     nboot: 1000
#>     ci_method: weighted_samples
#>     parametric: TRUE
#>     samples: FALSE (setting)
#>   distsets:
#>     lnorm: lnorm
#>   partition_by:
#>     sample: dataset, sim
#>     fit: dataset, sim
#>     hc: dataset, sim
#>   bundle:
#>     sample: replace
#>     fit: replace, nrow, rescale, computable, at_boundary_ok, min_pmix, range_shape1, range_shape2
#>     hc: replace, nrow, rescale, computable, at_boundary_ok, min_pmix, range_shape1, range_shape2, nboot, ci_method, parametric, distset

This sweeps both datasets shallowly: a regular {boron, cadmium} x sim 1:5 x nrow {5, 10, 20} grid.
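To get a feel for the size of that grid, the cross-join can be enumerated in base R. This is only a counting sketch: the column names mirror the scenario's axes, and nothing here calls ssdsims or reflects how the package stores its tasks internally.

```r
# Enumerate the regular grid as a plain cross-join (illustration only).
coarse_grid <- expand.grid(
  dataset = c("boron", "cadmium"),
  sim = 1:5,
  nrow = c(5L, 10L, 20L),
  stringsAsFactors = FALSE
)
nrow(coarse_grid)
#> [1] 30

# Distinct (dataset, sim) pairs, the unit that partition_by shards on above.
nrow(unique(coarse_grid[c("dataset", "sim")]))
#> [1] 10
```

The 30 rows are the individual (dataset, sim, nrow) combinations; the 10 pairs are the (dataset, sim) cells that the partition_by setting above shards on.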

# _targets.R
library(targets)
library(tarchetypes)
library(ssdsims)

ssd_scenario_targets(coarse, root = "results")

Migrate: wrap it in a design

To grow this into a study, wrap the scenario with ssd_design() and switch the factory to ssd_design_targets(). That is the entire change:

design <- ssd_design(coarse)
design
#> <ssdsims_design>
#>   scenarios: 1
#>     coarse (seed 42)

# _targets.R
design <- ssd_design(coarse)
ssd_design_targets(design, root = "results")

A design of one is valid and uniformly shaped, and it is the recommended starting point for a study that may grow. The per-task results are byte-identical to the standalone run: combining changes only addressing (target names and the results tree), never a task’s reproducible (seed, primer) pair.

Cache-preserving upgrade

The upgrade reuses the standalone run’s shards and recomputes none of them. Both factories root shards under the same seed-/layout-keyed tree (results/seed=42/layout=.../..., from scenario_results_dir()) and weave the seed into the same target names, so a design of one mints byte-identical targets and paths to the standalone run. Re-running into the same store is a full cache hit for every shard; only the per-member and combined summary targets are new.

Refine: zoom into one region

Now the irregular part. Suppose boron deserves a closer look — more replicates and a finer nrow sweep — but cadmium does not. Define a second scenario covering just that region (only boron, sim 1:15, a finer nrow) and add it to the design:

dense <- ssd_define_scenario(
  ssd_scenario_data(boron = ssddata::ccme_boron),
  nsim = 15L,
  seed = 42L,
  nrow = c(8L, 12L, 16L),
  dists = ssd_distset(lnorm = "lnorm"),
  partition_by = pb
)
study <- ssd_design(coarse = coarse, dense = dense)
study
#> <ssdsims_design>
#>   scenarios: 2
#>     coarse (seed 42)
#>     dense (seed 42)

# _targets.R
ssd_design_targets(study, root = "results")

The union is genuinely ragged — neither a rectangle nor a strict nesting. Over the (dataset, sim) cells:

  sim:    1   2   3   4   5   6  ...  15
  boron   ■   ■   ■   ■   ■   □  ...  □      ■ shared (coarse + dense)
  cadmium ◆   ◆   ◆   ◆   ◆                  ◆ coarse-only
                          └── □ dense-only (boron, sim 6:15) ──┘
  • boron sims 1–5 are shared: built once and read by both members; their fit/hc tasks merge both nrow sweeps ({5,10,20} ∪ {8,12,16}).
  • cadmium stays coarse-only; the refinement never touches it.
  • boron sims 6–15 are the dense-only zoom.

A single scenario could not express this without computing the full {boron, cadmium} x sim 1:15 x {5,8,10,12,16,20} cross-product — most of which you never wanted. Re-running tar_make() builds only the genuinely new cells; the shared boron 1–5 stay cached.
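The saving can be checked by counting. The sketch below uses plain data frames as stand-ins for the two members' tasks; the names coarse_tasks, dense_tasks, and key() are illustrative and are not ssdsims objects.

```r
# (dataset, sim, nrow) combinations for each member, as plain data frames.
coarse_tasks <- expand.grid(
  dataset = c("boron", "cadmium"), sim = 1:5, nrow = c(5L, 10L, 20L),
  stringsAsFactors = FALSE
)
dense_tasks <- expand.grid(
  dataset = "boron", sim = 1:15, nrow = c(8L, 12L, 16L),
  stringsAsFactors = FALSE
)

# The ragged union of the two members.
design_tasks <- unique(rbind(coarse_tasks, dense_tasks))
nrow(design_tasks)
#> [1] 75

# The single rectangular scenario needed to cover the same region.
rectangle <- expand.grid(
  dataset = c("boron", "cadmium"), sim = 1:15,
  nrow = c(5L, 8L, 10L, 12L, 16L, 20L),
  stringsAsFactors = FALSE
)
nrow(rectangle)
#> [1] 180

# (dataset, sim) cells present in both members.
key <- function(x) unique(paste(x$dataset, x$sim))
intersect(key(coarse_tasks), key(dense_tasks))
#> [1] "boron 1" "boron 2" "boron 3" "boron 4" "boron 5"
```

Under these assumptions the design covers 75 combinations where the rectangle would need 180, and the five shared cells are exactly boron sims 1–5.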

The combined results/summary.parquet carries a scenario column ("coarse" or "dense"), and each member’s rows are filtered to exactly its own cells — ready to plot the broad sweep and the zoom together.

Notes

  • Consistency contract. ssd_design() requires that a name means the same thing across members: the same dataset name must bind identical data, the same min_pmix name the same function, the same distset name the same members, and partition_by must match. This is what makes sharing a cell across members sound. Inconsistent bindings abort at construction.
  • Varying the seed. Members may use different seeds (e.g. repeating the whole exploration under several master seeds); they land under separate seed= trees and share nothing. Members sharing a seed share their coincident cells (common random numbers).
  • Keep nrow_max uniform. nrow_max is a sample draw-size guard, not a comparison axis; differing or changing it across members is undefined behaviour for shard sharing.
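
The consistency contract amounts to an identity check on shared names. The following is a minimal base-R sketch of that idea, not the check ssd_design() actually performs: conflicting_names() is a hypothetical helper, and the data frames hold toy values rather than the ssddata datasets.

```r
# Hypothetical helper, not part of ssdsims: return the names that appear in
# both lists but are bound to non-identical objects.
conflicting_names <- function(a, b) {
  shared <- intersect(names(a), names(b))
  same <- vapply(shared, function(nm) identical(a[[nm]], b[[nm]]), logical(1))
  shared[!same]
}

# Toy bindings standing in for two members' datasets.
boron <- data.frame(Conc = c(1.0, 1.8, 2.6))
coarse_data <- list(boron = boron, cadmium = data.frame(Conc = c(0.2, 0.5)))
dense_ok <- list(boron = boron)
dense_bad <- list(boron = boron[-1, , drop = FALSE])  # same name, fewer rows

conflicting_names(coarse_data, dense_ok)
#> character(0)
conflicting_names(coarse_data, dense_bad)
#> [1] "boron"
```

A non-empty result corresponds to the case the contract rejects: one name bound to different data in two members, which would make a shared cell ambiguous.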

See “Running a Sharded Pipeline” for the shard/results layout a design builds on.