library(ssdsims)
A single ssd_scenario is one regular grid: a rectangular cross-join of the axes. A real study is rarely rectangular: you want a coarse grid everywhere and a dense refinement in one region (more nrow values, but only for one dataset, say). Forcing that into one scenario computes a cross-product you never wanted; running it as several separate pipelines recomputes the cells they share and gives you no single summary.
A design is the answer: a named set of scenarios run as one pipeline, unioned into one possibly-ragged task set. Overlapping cells are computed once, and the result is a single combined summary with a scenario column. This vignette shows how to grow a one-off run into a design — a one-line change — and then refine it.
A typical one-off run defines a scenario and turns it into a targets pipeline with ssd_scenario_targets():
data <- ssd_scenario_data(
boron = ssddata::ccme_boron,
cadmium = ssddata::ccme_cadmium
)
# shard on (dataset, sim) so each (dataset, sim) is its own cell
pb <- list(
sample = c("dataset", "sim"),
fit = c("dataset", "sim"),
hc = c("dataset", "sim")
)
coarse <- ssd_define_scenario(
data,
nsim = 5L,
seed = 42L,
nrow = c(5L, 10L, 20L),
dists = ssd_distset(lnorm = "lnorm"),
partition_by = pb
)
coarse
#> <ssdsims_scenario>
#> seed: 42
#> nsim: 5
#> datasets: boron, cadmium
#> nrow: 5, 10, 20
#> replace: TRUE
#> nrow_max: 1000 (setting)
#> fit grid:
#> rescale: FALSE
#> computable: FALSE
#> at_boundary_ok: TRUE
#> min_pmix: ssd_min_pmix
#> range_shape1: {0.05, 20}
#> range_shape2: {0.05, 20}
#> dists: lnorm (setting)
#> hc grid:
#> est_method: multi (setting)
#> proportion: 0.05 (setting)
#> ci: FALSE (setting)
#> nboot: 1000
#> ci_method: weighted_samples
#> parametric: TRUE
#> samples: FALSE (setting)
#> distsets:
#> lnorm: lnorm
#> partition_by:
#> sample: dataset, sim
#> fit: dataset, sim
#> hc: dataset, sim
#> bundle:
#> sample: replace
#> fit: replace, nrow, rescale, computable, at_boundary_ok, min_pmix, range_shape1, range_shape2
#> hc: replace, nrow, rescale, computable, at_boundary_ok, min_pmix, range_shape1, range_shape2, nboot, ci_method, parametric, distset
This sweeps both datasets shallowly: a regular {boron, cadmium} x sim 1:5 x nrow {5, 10, 20} grid.
# _targets.R
library(targets)
library(tarchetypes)
library(ssdsims)
ssd_scenario_targets(coarse, root = "results")
To grow this into a study, wrap the scenario with ssd_design() and switch the factory to ssd_design_targets(). That is the entire change:
design <- ssd_design(coarse)
design
#> <ssdsims_design>
#> scenarios: 1
#> coarse (seed 42)
# _targets.R
design <- ssd_design(coarse)
ssd_design_targets(design, root = "results")
A design of one is valid and uniformly shaped, which makes it the recommended starting point for a study that may grow. The per-task results are byte-identical to the standalone run: combining changes only addressing (target names and the results tree), never a task’s reproducible (seed, primer) pair.
Cache-preserving upgrade
The upgrade reuses the standalone run’s shards; it recomputes nothing. Both factories root shards under the same seed-/layout-keyed tree (results/seed=42/layout=.../..., from scenario_results_dir()) and weave the seed into the same target names, so a design of one mints byte-identical targets and paths to the standalone run. Re-running into the same store is a full cache hit; only the per-member and combined summary targets are new.
Now the irregular part. Suppose boron deserves a closer look — more replicates and a finer nrow sweep — but cadmium does not. Define a second scenario covering just that region (only boron, sim 1:15, a finer nrow) and add it to the design:
dense <- ssd_define_scenario(
ssd_scenario_data(boron = ssddata::ccme_boron),
nsim = 15L,
seed = 42L,
nrow = c(8L, 12L, 16L),
dists = ssd_distset(lnorm = "lnorm"),
partition_by = pb
)
study <- ssd_design(coarse = coarse, dense = dense)
study
#> <ssdsims_design>
#> scenarios: 2
#> coarse (seed 42)
#> dense (seed 42)
# _targets.R
ssd_design_targets(study, root = "results")
The union is genuinely ragged: neither a rectangle nor a strict nesting. Over the (dataset, sim) cells:
sim:      1  2  3  4  5  6 ... 15
boron     ■  ■  ■  ■  ■  □ ... □     ■ shared (coarse + dense)
cadmium   ◆  ◆  ◆  ◆  ◆              ◆ coarse-only
                         └── □ dense-only (boron, sim 6:15) ──┘
boron sims 1–5 are shared: built once and read by both members; their fit/hc tasks merge both nrow sweeps ({5,10,20} ∪ {8,12,16}).
cadmium stays coarse-only; the refinement never touches it.
boron sims 6–15 are the dense-only zoom.
A single scenario could not express this without computing the full {boron, cadmium} x sim 1:15 x {5,8,10,12,16,20} cross-product, most of which you never wanted. Re-running tar_make() builds only the genuinely new cells; the shared boron 1–5 stay cached.
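The cell counts behind this comparison can be checked with a few lines of base R. The sketch below does not call ssdsims; it only rebuilds the two grids from the axis values used above with expand.grid() and compares their union with the single rectangular grid that would cover the same axis values. The object names (coarse_grid, dense_grid, and so on) are illustrative.

```r
# Base-R sketch only: rebuilds the axis grids from the values in this vignette.
# It does not use ssdsims, and the object names are illustrative.
coarse_grid <- expand.grid(
  dataset = c("boron", "cadmium"),
  sim = 1:5,
  nrow = c(5L, 10L, 20L),
  stringsAsFactors = FALSE
)
dense_grid <- expand.grid(
  dataset = "boron",
  sim = 1:15,
  nrow = c(8L, 12L, 16L),
  stringsAsFactors = FALSE
)

# Union of the two grids at the (dataset, sim, nrow) level.
union_grid <- unique(rbind(coarse_grid, dense_grid))

# The single rectangular grid that covers the same axis values.
full_grid <- expand.grid(
  dataset = c("boron", "cadmium"),
  sim = 1:15,
  nrow = c(5L, 8L, 10L, 12L, 16L, 20L),
  stringsAsFactors = FALSE
)

# (dataset, sim) cells: the sharding unit chosen via partition_by above.
cells <- function(grid) unique(grid[c("dataset", "sim")])
shared_cells <- merge(cells(coarse_grid), cells(dense_grid))

n_union <- nrow(union_grid)     # coarse rows plus dense rows; nrow sets are disjoint
n_full <- nrow(full_grid)       # 2 datasets x 15 sims x 6 nrow values
n_shared <- nrow(shared_cells)  # boron, sim 1:5
```

With these values the union holds 75 (dataset, sim, nrow) combinations against 180 for the rectangular grid, and the two members overlap in 5 (dataset, sim) cells.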
The combined results/summary.parquet carries a scenario column ("coarse" or "dense"), and each member’s rows are filtered to exactly its own cells — ready to plot the broad sweep and the zoom together.
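As a rough picture of what "filtered to exactly its own cells" means, the base-R sketch below tags each member's grid with a scenario column and stacks the two. It is a mock-up of the shape only: no results file is read, and the columns other than scenario (dataset, sim, nrow) are assumptions made for illustration.

```r
# Mock-up only: illustrates the shape of a scenario-tagged summary.
# Columns other than `scenario` are assumed for illustration.
member_grids <- list(
  coarse = expand.grid(dataset = c("boron", "cadmium"), sim = 1:5,
                       nrow = c(5L, 10L, 20L), stringsAsFactors = FALSE),
  dense = expand.grid(dataset = "boron", sim = 1:15,
                      nrow = c(8L, 12L, 16L), stringsAsFactors = FALSE)
)

# Tag each member's rows with its name, then stack them.
tagged <- Map(function(grid, name) cbind(scenario = name, grid),
              member_grids, names(member_grids))
mock_summary <- do.call(rbind, tagged)
rownames(mock_summary) <- NULL

# Rows per member, and the nrow values each member sees for a shared cell.
rows_per_member <- table(mock_summary$scenario)
shared_cell <- subset(mock_summary, dataset == "boron" & sim == 1L)
nrow_by_member <- split(shared_cell$nrow, shared_cell$scenario)
```

In this mock-up the shared cell (boron, sim 1) appears under both members, but each member keeps only its own nrow sweep: 5, 10, 20 for coarse and 8, 12, 16 for dense.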
ssd_design() requires that a name means the same thing across members: the same dataset name must bind identical data, the same min_pmix name the same function, the same distset name the same members, and partition_by must match. This is what makes sharing a cell across members sound. Inconsistent bindings abort at construction.
Members may use different seeds (e.g. repeating the whole exploration under several master seeds); they land under separate seed= trees and share nothing. Members sharing a seed share their coincident cells (common random numbers).
Keep nrow_max uniform. nrow_max is a sample draw-size guard, not a comparison axis; differing or changing it across members is undefined behaviour for shard sharing.
See “Running a Sharded Pipeline” for the shard/results layout a design builds on.
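The rule that a name must mean the same thing across members can be pictured with a small base-R check. This is not how ssd_design() is implemented; check_consistent_bindings() is a hypothetical helper written for this sketch, and the data frames are toy values rather than the ssddata datasets.

```r
# Hypothetical helper, not part of ssdsims: checks that a name shared by
# several named lists is bound to identical values in all of them.
check_consistent_bindings <- function(...) {
  members <- list(...)
  all_names <- unique(unlist(lapply(members, names)))
  for (nm in all_names) {
    # Collect this name's binding from every member that defines it.
    bound <- Filter(Negate(is.null), lapply(members, `[[`, nm))
    same <- vapply(bound, identical, logical(1), bound[[1]])
    if (!all(same)) {
      stop("name '", nm, "' is bound to different values across members")
    }
  }
  invisible(TRUE)
}

# Toy data: both members bind "boron" to the same data frame.
boron <- data.frame(Conc = c(1, 2, 3))
ok <- check_consistent_bindings(
  list(boron = boron, cadmium = data.frame(Conc = c(4, 5))),
  list(boron = boron)
)

# Rebinding "boron" to different data is rejected.
clash <- tryCatch(
  check_consistent_bindings(list(boron = boron),
                            list(boron = data.frame(Conc = c(9, 9, 9)))),
  error = function(e) conditionMessage(e)
)
```

The sketch compares bindings with identical(), so it accepts a shared name only when the values match exactly.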