Package 'ssdsims'

Title: Simulation Analyses for Species Sensitivity Distributions
Description: Runs reproducible simulation studies for species sensitivity distribution (SSD) models built on the 'ssdtools' package. Expands a declarative scenario into per-step task tables, draws data, fits distributions, and estimates hazard concentrations, with a 'targets'-based, Hive-partitioned shard pipeline for running studies in parallel or on a cluster.
Authors: Joe Thorley [aut, cre] (ORCID: <https://orcid.org/0000-0002-7683-4592>), Rebecca Fisher [aut]
Maintainer: Joe Thorley <[email protected]>
License: Apache License (== 2.0) | file LICENSE
Version: 0.0.0.9012
Built: 2026-06-05 02:51:05 UTC
Source: https://github.com/poissonconsulting/ssdsims


Local dqrng pcg64 Backend

Description

Activates the dqrng pcg64 RNG backend for the duration of the calling frame, then resets it when .local_envir exits. While active, base R's runif(), rnorm(), rbinom(), rexp(), rgamma(), rpois(), sample.int(), and sample() (and therefore dplyr::slice_sample() and ssdtools::ssd_r*()) draw from dqrng's pcg64, seeded via dqrng::dqset.seed(). pcg64 is forced explicitly because it accepts the length-2 stream argument the per-task primer design relies on; dqrng's own default (Xoroshiro128++) does not.

Usage

local_dqrng_backend(.local_envir = parent.frame())

Arguments

.local_envir

⁠[environment]⁠
The environment to use for scoping.

Details

Registering the backend is a process-global side effect that also advances base R's .Random.seed. local_dqrng_backend() follows the withr convention (compare withr::local_seed()): it pairs activation with deferred reset so the backend is always restored, including on error.

The helper is reentrant. dqrng::register_methods() / dqrng::restore_methods() keep a single global save-slot, so a nested reset would tear the backend down for the still-open outer scope. To avoid this, a local_dqrng_backend() call made while the backend is already active is a no-op: it does not re-activate the backend and schedules no further reset. Only the outermost call activates the backend on entry and resets it on exit, so the RNG stream is identical whether or not a nested call occurs.
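The reentrancy rule can be sketched in base R with a toy flag standing in for the real backend. This is an illustration only: toy_backend and with_toy_backend() are hypothetical names, and the sketch uses on.exit() in a with_-style wrapper, whereas the real helper defers its reset onto .local_envir and returns TRUE/FALSE rather than the value of code.

```r
# Toy stand-in for the process-global backend (hypothetical, not dqrng).
toy_backend <- new.env()
toy_backend$active <- FALSE
toy_backend$log <- character()

with_toy_backend <- function(code) {
  if (toy_backend$active) {
    # Nested call: do not re-activate and schedule no reset.
    toy_backend$log <- c(toy_backend$log, "nested no-op")
    return(code)
  }
  toy_backend$active <- TRUE
  toy_backend$log <- c(toy_backend$log, "activate")
  # The reset runs when this (outermost) call exits, including on error.
  on.exit({
    toy_backend$active <- FALSE
    toy_backend$log <- c(toy_backend$log, "reset")
  }, add = TRUE)
  code # lazily evaluated here, while the backend is active
}

result <- with_toy_backend({
  inner <- with_toy_backend(toy_backend$active)
  c(inner = inner, outer = toy_backend$active)
})
toy_backend$log
```

The log records a single activate/reset pair around the nested no-op, which is the property that keeps the outer scope's backend alive after the inner call returns.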

Value

Invisibly returns TRUE if this call activated the backend (the outermost scope) or FALSE if the backend was already active and the call was a no-op.

See Also

withr::local_seed(), dqrng::dqset.seed().

Examples

local_dqrng_backend()
dqrng::dqset.seed(42, stream = c(1L, 2L))
runif(3)

Local/With dqrng State

Description

local_dqrng_state() installs a per-task ⁠(seed, primer)⁠ starting point as the running dqrng RNG state via dqrng::dqset.seed(), restoring the previous state when .local_envir exits. with_dqrng_state() evaluates code with that state installed, then restores the previous state. The primer argument is the per-task primer (the value handed to dqrng's stream argument, per TARGETS-DESIGN.md §2 and the GLOSSARY); the ⁠_state⁠ suffix marks that the wrapper installs that primer as the running RNG state.

Usage

local_dqrng_state(seed, primer, .local_envir = parent.frame())

with_dqrng_state(seed, primer, code)

Arguments

seed

⁠[whole number]⁠
A scalar seed passed to dqrng::dqset.seed().

primer

⁠[integer(2)]⁠
A length-2 integer primer passed as the stream argument of dqrng::dqset.seed(). NA_integer_ is permitted (the reserved INT_MIN encoding of TARGETS-DESIGN.md §2).

.local_envir

⁠[environment]⁠
The environment to use for scoping.

code

[any]
Code to execute in the temporary environment

Details

These are the dqrng-path analogues of local_lecuyer_cmrg_state() / with_lecuyer_cmrg_state(). Like those helpers they snapshot the RNG state on entry (via dqrng::dqrng_get_state()) and withr::defer() a restore (via dqrng::dqrng_set_state()), so a call leaves the surrounding RNG stream undisturbed, including on error.

Both require an active dqrng backend: they abort unless a local_dqrng_backend() scope is open. This fails fast rather than silently seeding base R's Mersenne-Twister.

Value

local_dqrng_state() invisibly returns primer; with_dqrng_state() returns the value of code.

See Also

withr::local_seed(), local_dqrng_backend(), local_lecuyer_cmrg_state().

Examples

local_dqrng_backend()
local_dqrng_state(42, c(1L, 2L))
runif(3)

with_dqrng_state(42, c(1L, 2L), runif(3))

Local/With L'Ecuyer-CMRG Seed

Description

local_lecuyer_cmrg_seed() seeds the L'Ecuyer-CMRG RNG with a scalar integer via base::set.seed(), restoring the previous state when .local_envir exits. with_lecuyer_cmrg_seed() evaluates code with that seed in effect, then restores the previous state. For a .Random.seed-style state vector (e.g. from get_lecuyer_cmrg_stream_state() or parallel::nextRNGStream()) use local_lecuyer_cmrg_state() / with_lecuyer_cmrg_state().

Usage

local_lecuyer_cmrg_seed(seed, .local_envir = parent.frame())

with_lecuyer_cmrg_seed(seed, code)

Arguments

seed

⁠[integer(1)]⁠
The random seed to use to evaluate the code.

.local_envir

⁠[environment]⁠
The environment to use for scoping.

code

[any]
Code to execute in the temporary environment

Value

with_lecuyer_cmrg_seed() returns the value of code.

See Also

withr::local_seed(), local_lecuyer_cmrg_state(), parallel::nextRNGStream().

Examples

local_lecuyer_cmrg_seed(42)
runif(3)

with_lecuyer_cmrg_seed(42, {
  runif(3)
})
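The seed-then-restore behaviour can be approximated in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: with_toy_seed() is a hypothetical name, and it relies on the fact that the saved .Random.seed vector also encodes the RNG kind, so restoring the vector restores the kind.

```r
with_toy_seed <- function(seed, code) {
  # Make sure a state exists so there is always something to restore.
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    set.seed(NULL)
  }
  old_state <- get(".Random.seed", envir = globalenv())
  # Restore the previous state (and with it the previous RNG kind) on exit.
  on.exit(assign(".Random.seed", old_state, envir = globalenv()), add = TRUE)
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  code # lazily evaluated here, under the temporary seed
}

a <- with_toy_seed(42, runif(3))
b <- with_toy_seed(42, runif(3))
identical(a, b)
```

Both calls start from the same seeded state, so they return the same draws, while the surrounding RNG stream is left where it was.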

Local/With L'Ecuyer-CMRG State

Description

local_lecuyer_cmrg_state() sets the L'Ecuyer-CMRG RNG state to a .Random.seed-style integer vector (length 7) by assigning to .Random.seed directly, restoring the previous state when .local_envir exits. with_lecuyer_cmrg_state() evaluates code with that state in effect, then restores the previous state. A state is the full internal RNG state (as returned by parallel::nextRNGStream() or get_lecuyer_cmrg_stream_state()); contrast with base::set.seed() which takes a scalar seed (see local_lecuyer_cmrg_seed() / with_lecuyer_cmrg_seed()).

Usage

local_lecuyer_cmrg_state(state, .local_envir = parent.frame())

with_lecuyer_cmrg_state(state, code)

Arguments

state

⁠[integer(7)]⁠
A L'Ecuyer-CMRG .Random.seed vector.

.local_envir

⁠[environment]⁠
The environment to use for scoping.

code

[any]
Code to execute in the temporary environment

Value

local_lecuyer_cmrg_state() invisibly returns state; with_lecuyer_cmrg_state() returns the value of code.

See Also

parallel::nextRNGStream(), local_lecuyer_cmrg_seed().

Examples

state <- with_lecuyer_cmrg_seed(42, parallel::nextRNGStream(.Random.seed))
local_lecuyer_cmrg_state(state)
runif(3)

with_lecuyer_cmrg_state(state, runif(3))
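To illustrate the difference between a scalar seed and a full state vector, the sketch below derives two stream states with parallel::nextRNGStream() and installs each by assigning to .Random.seed. The helpers with_toy_rng() and with_toy_state() are hypothetical base R stand-ins, not the package's functions.

```r
# Run `code` after `setup()` has changed the RNG, then restore the old state.
with_toy_rng <- function(setup, code) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    set.seed(NULL)
  }
  old_state <- get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", old_state, envir = globalenv()), add = TRUE)
  setup()
  code
}

# Install a full state vector (not a scalar seed) for the duration of `code`.
with_toy_state <- function(state, code) {
  with_toy_rng(
    function() assign(".Random.seed", state, envir = globalenv()),
    code
  )
}

# A root L'Ecuyer-CMRG state from a scalar seed, then two derived streams.
root <- with_toy_rng(
  function() set.seed(42, kind = "L'Ecuyer-CMRG"),
  get(".Random.seed", envir = globalenv())
)
stream1 <- parallel::nextRNGStream(root)
stream2 <- parallel::nextRNGStream(stream1)

a <- with_toy_state(stream1, runif(3))
b <- with_toy_state(stream2, runif(3))
a_again <- with_toy_state(stream1, runif(3))
```

Re-installing the same state reproduces the same draws (a and a_again), while the two streams give different draws.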

Isolate a Materialised Dataset from a Scenario by Name

Description

Returns the validated, materialised dataset tibble stored on scenario under name. The dataset was validated (a numeric Conc column) and materialised at construction by ssd_define_scenario(), so this accessor performs no registration, persistence, or re-validation - it just isolates the value a shard body fits. Aborts with an informative error when name is not one of the scenario's datasets.

Usage

scenario_dataset(scenario, name)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

name

A scalar string naming one of the scenario's datasets.

Details

Names - not values - drive task hashing (TARGETS-DESIGN.md section 1.1): the task path carries the dataset name and this accessor resolves it back to the tibble at run time, so the tibble never enters a task identity.
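The name-based identity can be pictured with a toy scenario. Everything in this sketch is hypothetical (toy_scenario, toy_task_path(), toy_dataset(), and the path format); it only shows that a path built from names is unaffected by changes to the data values, and that the name is resolved back to the data at run time.

```r
toy_scenario <- list(data = list(boron = data.frame(Conc = c(1, 2, 3))))

# The task path is built from names only, never from the data values.
toy_task_path <- function(dataset, sim) {
  file.path(paste0("dataset=", dataset), paste0("sim=", sim))
}

# Resolve a name back to its data, failing informatively on unknown names.
toy_dataset <- function(scenario, name) {
  if (!name %in% names(scenario$data)) {
    stop("`name` must be one of: ", toString(names(scenario$data)), call. = FALSE)
  }
  scenario$data[[name]]
}

path_before <- toy_task_path("boron", 1L)
toy_scenario$data$boron$Conc <- toy_scenario$data$boron$Conc * 10
path_after <- toy_task_path("boron", 1L)
identical(path_before, path_after)
```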

Value

The materialised dataset tibble stored under name.

See Also

scenario_min_pmix() for the min_pmix counterpart.

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 1L, seed = 42L)
scenario_dataset(scenario, "ccme_boron")

Isolate a Materialised min_pmix Function from a Scenario by Name

Description

Returns the single-argument min_pmix function materialised on scenario under name. ssd_define_scenario() resolves each min_pmix reference to a function once, at construction (so a cluster worker needs no shared interactive environment), and stores it keyed by name; this accessor isolates it. Aborts with an informative error when name is not one of the scenario's min_pmix names.

Usage

scenario_min_pmix(scenario, name)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

name

A scalar string naming one of the scenario's min_pmix entries.

Details

Names - not function values - drive task hashing (TARGETS-DESIGN.md section 1.1): the fit-task path carries the min_pmix name, and this accessor resolves it back to the function at run time, so the function value never enters a task identity (no byte-stability concern from byte-compilation or captured environments).

Value

The single-argument min_pmix function stored under name.

See Also

scenario_dataset() for the dataset counterpart.

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 1L, seed = 42L)
scenario_min_pmix(scenario, "ssd_min_pmix")

Layout-keyed Results Root for a Scenario

Description

Returns <root>/layout=<hash>, where the hash is derived from the scenario's partition_by. A step's Hive shard path depth and axes are a function of partition_by/bundle, so writing two different layouts into one root would leave shards of different granularity side by side - and the depth-agnostic glob the readers use (<step>/**/part.parquet) would then union stale and current shards, double-counting tasks.

Keying the results root on the layout isolates each partition_by into its own subtree: re-running a scenario with a changed partition_by/bundle writes to a fresh root (never mixing granularities), while re-running the same layout reuses the root (idempotent and cache-friendly - the same shard paths are simply rewritten).

Usage

scenario_results_dir(scenario, root = "results")

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

root

The results root directory (default "results").

Details

The targets pipeline writes under this root (see the shipped ⁠_targets.R⁠ template). The single-core ssd_run_scenario_shards() takes the complementary approach: it owns and clears a fixed dir on each run.

Value

The layout-keyed path ⁠file.path(root, paste0("layout=", <hash>))⁠.
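A rough sketch of how a layout-keyed root could be formed is given below. The hash function here is an assumption made for illustration (an MD5 of the deparsed list, truncated to 8 characters); the package's actual hash is not documented in this section, and toy_layout_hash() and toy_results_dir() are hypothetical names.

```r
# Hypothetical hash: MD5 of the deparsed layout, truncated to 8 characters.
toy_layout_hash <- function(partition_by) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(deparse(partition_by), tmp)
  substr(unname(tools::md5sum(tmp)), 1L, 8L)
}

toy_results_dir <- function(partition_by, root = "results") {
  file.path(root, paste0("layout=", toy_layout_hash(partition_by)))
}

coarse <- list(sample = c("dataset", "sim"), fit = c("dataset", "sim"))
fine <- list(sample = c("dataset", "sim"), fit = c("dataset", "sim", "nrow"))

toy_results_dir(coarse)
toy_results_dir(fine)
```

The same layout always maps to the same subtree, while a changed layout maps to a different one, so shards of different granularity never share a root.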

See Also

ssd_run_scenario_shards(), ssd_summarize().

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 1L, seed = 42L)
scenario_results_dir(scenario)

Assemble and Validate Datasets for a Simulation Scenario

Description

Collects one or more datasets into a validated, named collection - the single entry point through which ssd_define_scenario() takes dataset input. Each dataset must carry a numeric Conc column (the species sensitivity distribution convention); additional columns are preserved.

Usage

ssd_data(...)

Arguments

...

One or more data frames, optionally named. Each is validated for a numeric Conc column.

Details

Names are taken from the argument names where supplied, otherwise derived from the argument expression by symbol capture (e.g. ssddata::ccme_boron becomes "ccme_boron"). A literal with no derivable name (e.g. a bare data.frame(...) call) must be given an explicit name.
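Name derivation by symbol capture can be sketched with substitute(). The functions below are hypothetical stand-ins (toy_ssd_data() and derive_name() are not part of the package, and Conc validation is omitted); they only illustrate the naming rule, using data frames from the datasets package.

```r
# Derive a name from an unevaluated argument expression, or NA if none exists.
derive_name <- function(expr) {
  if (is.symbol(expr)) {
    return(as.character(expr))
  }
  # pkg::name or pkg:::name: take the part after the operator.
  is_ns_call <- is.call(expr) &&
    (identical(expr[[1L]], quote(`::`)) || identical(expr[[1L]], quote(`:::`)))
  if (is_ns_call) {
    return(as.character(expr[[3L]]))
  }
  NA_character_
}

toy_ssd_data <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]
  values <- list(...)
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  for (i in seq_along(exprs)) {
    # Explicit argument names win; otherwise fall back to the expression.
    if (!nzchar(nms[i])) nms[i] <- derive_name(exprs[[i]])
    if (is.na(nms[i])) {
      stop("argument ", i, " needs an explicit name", call. = FALSE)
    }
  }
  names(values) <- nms
  values
}

names(toy_ssd_data(datasets::cars, fast = datasets::cars))
```

A bare data.frame(...) call has no derivable name, so toy_ssd_data(data.frame(Conc = 1)) stops with an error, whereas toy_ssd_data(x = data.frame(Conc = 1)) is accepted.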

ssd_data() is intended to grow: the planned scenario-input-types change (see TARGETS-DESIGN.md section 12) will let each input also be one of the data generators ssd_run_scenario() accepts today - a fitdists or tmbfit object, a generator function, or a function-name string - with the data materialised by the dataset registry. For now each input must be a data frame.

Value

An ssdsims_data object: a named list of validated tibbles.

Examples

ssd_data(ssddata::ccme_boron)
ssd_data(boron = ssddata::ccme_boron, cadmium = ssddata::ccme_cadmium)

Define a Simulation Scenario

Description

Constructs a purely declarative ssdsims_scenario object: the root of the targets-based pipeline (see TARGETS-DESIGN.md section 1). The object stores only declarative fields - a scalar seed, the replicate count nsim, the sample sizes nrow, the dataset names, and the fit and hc argument grids. It performs no random-number generation and no task expansion, and has no dependency on targets.

Usage

ssd_define_scenario(
  data,
  nsim,
  seed,
  ...,
  name = NULL,
  nrow = 6L,
  replace = FALSE,
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  samples = FALSE,
  partition_by = NULL,
  bundle = NULL,
  upload = NULL
)

Arguments

data

An ssd_data() collection (preferred), or - for convenience - a single data frame or a (named or unnamed) list of data frames. Bare inputs are validated via the same Conc contract as ssd_data().

nsim

A count of the number of data sets to generate.

seed

A scalar whole number; the scenario's RNG root. Required - changing it fully re-roots the scenario's random-number draws.

...

Unused; must be empty.

name

An optional dataset name for the single-data-frame form, overriding the derived name. Must not be combined with a named list or an ssd_data() collection.

nrow

A positive whole number of the minimum number of non-missing rows.

replace

A logical vector specifying whether to sample with replacement.

dists

A character vector of the distribution names.

rescale

Either a flag specifying whether to leave the values unchanged (FALSE) or to rescale concentration values by dividing by the geometric mean of the minimum and maximum positive finite values (TRUE), or a string specifying whether to leave the values unchanged ("no"), to rescale by that geometric mean ("geomean"), or to logistically transform ("odds").

computable

A flag specifying whether to only return fits with numerically computable standard errors.

at_boundary_ok

A flag specifying whether a model with one or more parameters at the boundary should be considered to have converged (default = TRUE).

min_pmix

The min_pmix function(s), referenced by name. Supply either a character vector of names, or a function (or list of functions) with a single argument that inputs the number of rows of data and returns a proportion between 0 and 0.5 - in which case the name is derived from the argument expression (e.g. ssdtools::ssd_min_pmix gives "ssd_min_pmix"), mirroring dataset name derivation. The name is what the task path hashes; the resolved single-argument function is additionally materialised on the scenario (keyed by name) for execution and isolated via scenario_min_pmix(). A name-string is resolved to a function at construction (from ssdtools or the caller's environment), failing fast if it cannot be resolved to a single-argument function.

range_shape1

A list of numeric vectors of length two of the lower and upper bounds for the shape1 parameter.

range_shape2

A list of numeric vectors of length two of the lower and upper bounds for the shape2 parameter.

proportion

A numeric vector of proportion values to estimate hazard concentrations for.

ci

A flag specifying whether to estimate confidence intervals (by bootstrapping).

nboot

A count of the number of bootstrap samples to use to estimate the confidence limits. A value of 10,000 is recommended for official guidelines.

est_method

A string specifying whether to estimate directly from the model-averaged cumulative distribution function (est_method = 'multi'), to take the arithmetic mean of the estimates from the individual cumulative distribution functions weighted by the AICc derived weights (est_method = 'arithmetic'), or to use the geometric mean instead (est_method = 'geometric').

ci_method

A string specifying which method to use for estimating the standard error and confidence limits from the bootstrap samples. The default and recommended value is still ci_method = "weighted_samples", which takes bootstrap samples from each distribution proportional to its AICc-based weights and calculates the confidence limits (and SE) from this single set. ci_method = "multi_fixed" and ci_method = "multi_free" generate the bootstrap samples using the model-averaged cumulative distribution function but differ in whether the model weights are fixed at the values for the original dataset or re-estimated for each bootstrap sample dataset. The value ci_method = "MACL" (was ci_method = "weighted_arithmetic"), which is only included for historical reasons, takes the weighted arithmetic mean of the confidence limits, while ci_method = "GMACL", which takes the weighted geometric mean of the confidence limits, was added for completeness but is also not recommended. Finally, ci_method = "arithmetic_samples" and ci_method = "geometric_samples" take the weighted arithmetic or geometric mean of the values for each bootstrap iteration across all the distributions and then calculate the confidence limits (and SE) from the single set of samples.

parametric

A flag specifying whether to perform parametric bootstrapping as opposed to non-parametrically resampling the original data with replacement.

samples

A logical scalar (default FALSE): retain the bootstrap draws in the hc result's samples list-column (passed to ssdtools::ssd_hc()). This is output retention only - it does not change the estimates or the per-task RNG, so it is not a grid or task axis (a single TRUE is a superset of FALSE). Changing it re-runs the hc step (the discarded draws must be re-bootstrapped) but yields byte-identical estimates; retained samples can be large (nboot draws per dist per task), so it is off by default.

partition_by

An optional, possibly-partial named list keyed by step (sample/fit/hc) of character vectors naming the Hive path axes for that step (one shard per path cell; the inner complement rides as Parquet columns). Each entry must be unique, non-missing, and a subset of that step's axis vocabulary:

  1. sample = dataset, sim, replace.

  2. fit adds nrow, rescale, computable, at_boundary_ok, min_pmix, range_shape1, range_shape2.

  3. hc adds ci, nboot, est_method, ci_method, parametric.

"nrow" is rejected only for sample (the shared draw carries no nrow axis; the fit step truncates it inline), and is a valid path axis for fit/hc.

Steps partition independently - there is no cross-step constraint; a step may be finer or coarser than its neighbour on a shared axis (the m:n parent-shard relationship is resolved at the read layer).

Steps left unnamed take their documented defaults (sample = c("dataset", "sim", "replace"), fit = c("dataset", "sim", "nrow", "rescale"), hc = c("dataset", "sim"); these supersede TARGETS-DESIGN.md section 5's pre-fold table).

The split is orthogonal to the per-task RNG primer, so changing it shifts file paths only, never results.

bundle

An optional, possibly-partial named list keyed by step, the per-step complement of partition_by: it names the inner axes to keep together within a shard, and the stored path axes become setdiff(task_axes(step), bundle[[step]]). partition_by and bundle are complementary per-step entry points - at most one may name a given step (a step in both is an error), but they may be mixed across steps and either may be partial. Use partition_by when you want few path axes, bundle when you want fine sharding and only a few inner axes. Both normalise into the single stored partition_by path list.
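How partition_by and bundle might normalise into one stored path list can be sketched as follows. This is an illustration under stated assumptions: toy_axes, toy_defaults, and toy_normalise() are hypothetical names, the axis vocabularies and defaults are copied from the partition_by description (assuming each step's vocabulary is cumulative), and the package's real validation is not reproduced.

```r
# Axis vocabularies per step; each later step adds to the previous one.
toy_axes <- list(sample = c("dataset", "sim", "replace"))
toy_axes$fit <- c(
  toy_axes$sample, "nrow", "rescale", "computable", "at_boundary_ok",
  "min_pmix", "range_shape1", "range_shape2"
)
toy_axes$hc <- c(
  toy_axes$fit, "ci", "nboot", "est_method", "ci_method", "parametric"
)

toy_defaults <- list(
  sample = c("dataset", "sim", "replace"),
  fit = c("dataset", "sim", "nrow", "rescale"),
  hc = c("dataset", "sim")
)

toy_normalise <- function(partition_by = NULL, bundle = NULL) {
  both <- intersect(names(partition_by), names(bundle))
  if (length(both) > 0L) {
    stop("step(s) named in both `partition_by` and `bundle`: ",
         toString(both), call. = FALSE)
  }
  out <- toy_defaults
  # partition_by names the path axes directly.
  for (step in names(partition_by)) out[[step]] <- partition_by[[step]]
  # bundle names the inner axes; the path axes are the complement.
  for (step in names(bundle)) {
    out[[step]] <- setdiff(toy_axes[[step]], bundle[[step]])
  }
  out
}

layout <- toy_normalise(
  partition_by = list(hc = "dataset"),
  bundle = list(fit = c("computable", "at_boundary_ok", "min_pmix",
                        "range_shape1", "range_shape2"))
)
layout$fit
```

Here the hc step is specified by its few path axes, the fit step by the few axes to keep inside a shard, and the sample step falls back to its default.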

upload

An optional upload specification (a list), or NULL for no upload.

Details

Input data is forwarded through ssd_data() for validation (a numeric Conc column is required) and retained on the scenario (as ⁠$data⁠) so a local run (ssd_run_scenario_baseline()) can sample it directly. The dataset names (⁠$datasets⁠) are what the targets/cluster path hashes; the validated tibbles ride on the scenario and are isolated by name via scenario_dataset(), so the hash need not carry the data frames.

Value

An S3 object of class ssdsims_scenario.

Dataset input

The preferred form is an ssd_data() collection, which owns validation and naming: ssd_define_scenario(ssd_data(boron = ccme_boron, cadmium = ccme_cadmium), ...). For convenience, bare data frame input is also accepted in four forms (routed through the same Conc validation):

  1. A single data frame, name derived from the argument expression: ssd_define_scenario(ssddata::ccme_boron, ...) gives "ccme_boron".

  2. A single data frame with an explicit ⁠name=⁠: ssd_define_scenario(ssddata::ccme_boron, name = "boron", ...).

  3. A named list, names taken from the list: ssd_define_scenario(list(boron = ccme_boron, cadmium = ccme_cadmium), ...).

  4. An unnamed list, names derived per element: ssd_define_scenario(list(ccme_boron, ccme_cadmium), ...).

Supplying both a named list and ⁠name=⁠ is an error.

ci = FALSE

When ci = FALSE is the only confidence-interval value, the bootstrap-only knobs nboot, ci_method, and parametric are meaningless. Passing any of them in that case is an error; set ci = c(FALSE, TRUE) to enable bootstrap, or omit the knobs.
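The rule can be pictured with a small checker. This is a hypothetical sketch (toy_check_ci() is not part of the package) that assumes "passing" a knob is detected with missing(); the real constructor may detect it differently.

```r
toy_check_ci <- function(ci = FALSE, nboot, ci_method, parametric) {
  # Record which bootstrap-only knobs the caller actually supplied.
  supplied <- c(
    nboot = !missing(nboot),
    ci_method = !missing(ci_method),
    parametric = !missing(parametric)
  )
  if (!any(ci) && any(supplied)) {
    stop("`ci = FALSE` only, so these have no effect: ",
         toString(names(supplied)[supplied]), call. = FALSE)
  }
  invisible(TRUE)
}

toy_check_ci()                                   # fine: no knobs supplied
toy_check_ci(ci = c(FALSE, TRUE), nboot = 100)   # fine: bootstrap enabled
```

Calling toy_check_ci(nboot = 100) with the default ci = FALSE stops with an error, mirroring the behaviour described above.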

Examples

ssd_define_scenario(ssddata::ccme_boron, nsim = 100L, nrow = c(5L, 10L), seed = 42L)

Fit SSD Distributions to Simulated Data

Description

Fits distributions to each data frame in the list column of x by calling ssdtools::ssd_fit_dists().

Usage

ssd_fit_dists_sims(
  x,
  dists = ssdtools::ssd_dists_bcanz(),
  ...,
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = range_shape1,
  seed = NULL,
  silent = TRUE,
  .progress = FALSE
)

Arguments

x

A data frame with sim and stream integer columns and a list column of the data frames to fit distributions to.

dists

A character vector of the distribution names.

...

Additional arguments passed to ssdtools::ssd_fit_dists().

rescale

Either a flag specifying whether to leave the values unchanged (FALSE) or to rescale concentration values by dividing by the geometric mean of the minimum and maximum positive finite values (TRUE), or a string specifying whether to leave the values unchanged ("no"), to rescale by that geometric mean ("geomean"), or to logistically transform ("odds").

computable

A flag specifying whether to only return fits with numerically computable standard errors.

at_boundary_ok

A flag specifying whether a model with one or more parameters at the boundary should be considered to have converged (default = TRUE).

min_pmix

A list of one or more functions with a single argument that inputs the number of rows of data and returns a proportion between 0 and 0.5.

range_shape1

A list of numeric vectors of length two of the lower and upper bounds for the shape1 parameter.

range_shape2

A list of numeric vectors of length two of the lower and upper bounds for the shape2 parameter.

seed

An integer of the starting seed or NULL.

silent

A flag indicating whether fits should fail silently.

.progress

Whether to show a purrr progress bar.

Value

The x tibble with a list column fits of fitdists objects.


Estimate hazard concentrations for multiple simulations using bootstrapping

Description

Estimates hazard concentrations for each fitdists object in the list column of x by calling ssdtools::ssd_hc(), optionally with bootstrapped confidence intervals (ci = TRUE).

Usage

ssd_hc_sims(
  x,
  proportion = 0.05,
  ...,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  save_to = NULL,
  .progress = FALSE
)

Arguments

x

A data frame with sim and stream integer columns and a list column of fitdists objects.

proportion

A numeric vector of proportion values to estimate hazard concentrations for.

...

Additional arguments passed to ssdtools::ssd_hc().

ci

A flag specifying whether to estimate confidence intervals (by bootstrapping).

nboot

A count of the number of bootstrap samples to use to estimate the confidence limits. A value of 10,000 is recommended for official guidelines.

est_method

A string specifying whether to estimate directly from the model-averaged cumulative distribution function (est_method = 'multi'), to take the arithmetic mean of the estimates from the individual cumulative distribution functions weighted by the AICc derived weights (est_method = 'arithmetic'), or to use the geometric mean instead (est_method = 'geometric').

ci_method

A string specifying which method to use for estimating the standard error and confidence limits from the bootstrap samples. The default and recommended value is still ci_method = "weighted_samples", which takes bootstrap samples from each distribution proportional to its AICc-based weights and calculates the confidence limits (and SE) from this single set. ci_method = "multi_fixed" and ci_method = "multi_free" generate the bootstrap samples using the model-averaged cumulative distribution function but differ in whether the model weights are fixed at the values for the original dataset or re-estimated for each bootstrap sample dataset. The value ci_method = "MACL" (was ci_method = "weighted_arithmetic"), which is only included for historical reasons, takes the weighted arithmetic mean of the confidence limits, while ci_method = "GMACL", which takes the weighted geometric mean of the confidence limits, was added for completeness but is also not recommended. Finally, ci_method = "arithmetic_samples" and ci_method = "geometric_samples" take the weighted arithmetic or geometric mean of the values for each bootstrap iteration across all the distributions and then calculate the confidence limits (and SE) from the single set of samples.

parametric

A flag specifying whether to perform parametric bootstrapping as opposed to non-parametrically resampling the original data with replacement.

seed

An integer of the starting seed or NULL.

save_to

NULL or a string specifying a directory in which to save the bootstrap datasets and parameter estimates (when successfully converged).

.progress

Whether to show a purrr progress bar.

Value

The x tibble with a list column hc of data frames produced by applying ssd_hc() to fits.


Run a fit Shard

Description

Runs the fit tasks bundled into one shard: reads the distinct set of parent sample shards the shard's tasks reference (each once - they may span several sample shards), isolates each task's draw by sample_id (restoring row order), truncates it inline (head(sample, nrow), RNG-free, section 5), and fits with the per-task ⁠(seed, primer)⁠ through fit_data_task_primer() (resolving min_pmix off the scenario via scenario_min_pmix()). The fitted fitdists object is serialised into a fit_blob string column keyed by fit_id, and one Parquet is written at the shard's partition path.

Usage

ssd_run_fit_step(tasks, scenario, sample_dir, out_dir)

Arguments

tasks

A tibble of the shard's fit task rows (from ssd_scenario_fit_shards()), each carrying its fit-grid values, fit_id, the parent sample path-axis values, seed, and primer.

scenario

The ssdsims_scenario (a referenced global in ⁠_targets.R⁠).

sample_dir

The sample results root the parent shards were written to.

out_dir

The fit results root (e.g. "results/fit").

Value

The shard's Parquet path.

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 1L,
  nrow = 6L,
  seed = 42L,
  dists = "lnorm"
)
dir <- tempfile()
ssd_run_sample_step(
  ssd_scenario_sample_shards(scenario)$tasks[[1L]],
  scenario,
  file.path(dir, "sample")
)
ssd_run_fit_step(
  ssd_scenario_fit_shards(scenario)$tasks[[1L]],
  scenario,
  file.path(dir, "sample"),
  file.path(dir, "fit")
)
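The isolate, re-order, and truncate sequence from the Description can be sketched on a toy stacked shard. The data frame and isolate_draw() below are hypothetical; only the column names sample_id and .row and the head(sample, nrow) truncation come from the text above.

```r
# Toy stacked sample shard: two tasks' draws, rows deliberately shuffled.
shard <- data.frame(
  sample_id = c("b", "a", "a", "b", "a", "b"),
  .row = c(2L, 3L, 1L, 1L, 2L, 3L),
  Conc = c(20, 3, 1, 10, 2, 30)
)

isolate_draw <- function(shard, id, nrow) {
  # Isolate the task's rows, restore the draw order, then truncate.
  draw <- shard[shard$sample_id == id, , drop = FALSE]
  draw <- draw[order(draw$.row), , drop = FALSE]
  head(draw, nrow)
}

draw_a2 <- isolate_draw(shard, "a", 2L)
draw_a3 <- isolate_draw(shard, "a", 3L)
draw_a2$Conc
```

Because truncation is a plain head() on the ordered draw, the smaller sample is always a prefix of the larger one and no random numbers are consumed.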

Run an hc Shard

Description

Runs the hc tasks bundled into one shard: reads the distinct set of parent fit shards the shard's tasks reference (each once - an hc shard typically spans several fit shards), isolates each task's fit by fit_id, deserialises the fitdists object, and estimates the hazard concentration with the per-task ⁠(seed, primer)⁠ through hc_data_task_primer(). Each task's hc tibble (one or more rows - the proportion fan-out and the ci = FALSE collapse, section 1.2) is tagged with its hc_id and parent fit_id, stacked, and written as one Parquet at the shard's partition path.

Usage

ssd_run_hc_step(tasks, scenario, fit_dir, out_dir)

Arguments

tasks

A tibble of the shard's hc task rows (from ssd_scenario_hc_shards()), each carrying its hc-grid values, hc_id, the parent fit path-axis values and fit_id, seed, and primer.

scenario

The ssdsims_scenario (a referenced global in ⁠_targets.R⁠).

fit_dir

The fit results root the parent shards were written to.

out_dir

The hc results root (e.g. "results/hc").

Value

The shard's Parquet path.

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 1L,
  nrow = 6L,
  seed = 42L,
  dists = "lnorm"
)
dir <- tempfile()
ssd_run_sample_step(
  ssd_scenario_sample_shards(scenario)$tasks[[1L]],
  scenario,
  file.path(dir, "sample")
)
ssd_run_fit_step(
  ssd_scenario_fit_shards(scenario)$tasks[[1L]],
  scenario,
  file.path(dir, "sample"),
  file.path(dir, "fit")
)
ssd_run_hc_step(
  ssd_scenario_hc_shards(scenario)$tasks[[1L]],
  scenario,
  file.path(dir, "fit"),
  file.path(dir, "hc")
)
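Storing a fitted object in a string column and reading it back by fit_id can be sketched as a serialise/deserialise round trip. The hex encoding used here is an assumption chosen so the sketch needs only base R; the actual encoding of fit_blob is not specified in this section, to_blob() and from_blob() are hypothetical helpers, and plain lists stand in for fitdists objects.

```r
# Encode any R object as a single hex string (assumed encoding).
to_blob <- function(object) {
  paste(as.character(serialize(object, NULL)), collapse = "")
}

# Decode the hex string back into the original object.
from_blob <- function(blob) {
  starts <- seq(1L, nchar(blob), by = 2L)
  bytes <- strtoi(substring(blob, starts, starts + 1L), base = 16L)
  unserialize(as.raw(bytes))
}

# Plain lists stand in for fitted objects in this toy fit shard.
toy_fits <- list(
  list(dist = "lnorm", meanlog = 1),
  list(dist = "lnorm", meanlog = 2)
)
fit_shard <- data.frame(fit_id = c("f1", "f2"), stringsAsFactors = FALSE)
fit_shard$fit_blob <- vapply(toy_fits, to_blob, character(1))

# An hc task isolates its parent fit by fit_id and deserialises it.
fit <- from_blob(fit_shard$fit_blob[fit_shard$fit_id == "f2"])
```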

Run a sample Shard

Description

Runs the sample tasks bundled into one shard: under one local_dqrng_backend() scope, reads each task's dataset off the scenario via scenario_dataset(), draws n_max rows with the per-task ⁠(seed, primer)⁠ through sample_data_task_primer(), and writes one Parquet at the shard's Hive partition path. Each task's draw is tagged with its sample_id and a .row order index so a downstream fit shard can isolate and re-order it.

Usage

ssd_run_sample_step(tasks, scenario, out_dir)

Arguments

tasks

A tibble of the shard's task rows (the tasks list-column of a row of ssd_scenario_sample_shards()), each carrying its axis values, sample_id, seed, and primer.

scenario

The ssdsims_scenario (a referenced global in ⁠_targets.R⁠).

out_dir

The sample results root (e.g. "results/sample").

Value

The shard's Parquet path (the format = "file" contract).

See Also

ssd_run_fit_step(), ssd_run_hc_step(), ssd_scenario_sample_shards().

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 1L, seed = 42L)
shards <- ssd_scenario_sample_shards(scenario)
dir <- tempfile()
ssd_run_sample_step(shards$tasks[[1L]], scenario, file.path(dir, "sample"))

Run Scenario

Description

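Runs a simulation scenario from x: generates nsim data sets, fits the distributions, and estimates hazard concentrations.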

Usage

ssd_run_scenario(x, ...)

## S3 method for class 'data.frame'
ssd_run_scenario(
  x,
  ...,
  nrow = 6L,
  replace = FALSE,
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'fitdists'
ssd_run_scenario(
  x,
  ...,
  nrow = 6L,
  dist_sim = "top",
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'tmbfit'
ssd_run_scenario(
  x,
  ...,
  nrow = 6L,
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'character'
ssd_run_scenario(
  x,
  ...,
  nrow = 6L,
  args = list(),
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'function'
ssd_run_scenario(
  x,
  ...,
  nrow = 6L,
  args = list(),
  dists = ssdtools::ssd_dists_bcanz(),
  rescale = FALSE,
  computable = FALSE,
  at_boundary_ok = TRUE,
  min_pmix = list(ssdtools::ssd_min_pmix),
  range_shape1 = list(c(0.05, 20)),
  range_shape2 = list(c(0.05, 20)),
  proportion = 0.05,
  ci = FALSE,
  nboot = 1000,
  est_method = "multi",
  ci_method = "weighted_samples",
  parametric = TRUE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

Arguments

x

The object to use for the scenario.

...

Unused.

nrow

A positive whole number specifying the minimum number of non-missing rows.

replace

A logical vector specifying whether to sample with replacement.

dists

A character vector of the distribution names.

rescale

A flag specifying whether to leave the values unchanged (FALSE) or to rescale concentration values by dividing by the geometric mean of the minimum and maximum positive finite values (TRUE), or a string specifying whether to leave the values unchanged ("no"), to rescale them by dividing by that geometric mean ("geomean"), or to logistically transform them ("odds").

computable

A flag specifying whether to only return fits with numerically computable standard errors.

at_boundary_ok

A flag specifying whether a model with one or more parameters at the boundary should be considered to have converged (default = TRUE).

min_pmix

A number between 0 and 0.5 specifying the minimum proportion in mixture models.

range_shape1

A numeric vector of length two of the lower and upper bounds for the shape1 parameter.

range_shape2

A numeric vector of length two of the lower and upper bounds for the shape2 parameter.

proportion

A numeric vector of proportion values to estimate hazard concentrations for.

ci

A flag specifying whether to estimate confidence intervals (by bootstrapping).

nboot

A count of the number of bootstrap samples to use to estimate the confidence limits. A value of 10,000 is recommended for official guidelines.

est_method

A string specifying whether to estimate directly from the model-averaged cumulative distribution function (est_method = "multi"), to take the arithmetic mean of the estimates from the individual cumulative distribution functions weighted by the AICc-derived weights (est_method = "arithmetic"), or to use the geometric mean instead (est_method = "geometric").

ci_method

A string specifying which method to use for estimating the standard error and confidence limits from the bootstrap samples. The default and recommended value is still ci_method = "weighted_samples", which takes bootstrap samples from each distribution proportional to its AICc-based weights and calculates the confidence limits (and SE) from this single set. ci_method = "multi_fixed" and ci_method = "multi_free" generate the bootstrap samples using the model-averaged cumulative distribution function but differ in whether the model weights are fixed at the values for the original dataset or re-estimated for each bootstrap sample dataset. The value ci_method = "MACL" (was ci_method = "weighted_arithmetic"), which is only included for historical reasons, takes the weighted arithmetic mean of the confidence limits, while ci_method = "GMACL", which takes the weighted geometric mean of the confidence limits, was added for completeness but is also not recommended. Finally, ci_method = "arithmetic_samples" and ci_method = "geometric_samples" take the weighted arithmetic or geometric mean of the values for each bootstrap iteration across all the distributions and then calculate the confidence limits (and SE) from the single set of samples.

parametric

A flag specifying whether to perform parametric bootstrapping as opposed to non-parametrically resampling the original data with replacement.

seed

An integer of the starting seed or NULL.

nsim

A count of the number of data sets to generate.

stream

A count of the stream number.

start_sim

A count of the number of the simulation to start from.

.progress

Whether to show a purrr progress bar.

dist_sim

A character vector specifying the distributions in the fitdists object, or "all" for all the distributions and/or "top" to use the distribution with most weight and/or "multi" to treat the distributions as a single distribution.

args

A named list of the argument values.

Value

A tibble of nested data sets.

Methods (by class)

  • ssd_run_scenario(data.frame): Run scenario using data.frame to sample data

  • ssd_run_scenario(fitdists): Run scenario using fitdists object to generate data

  • ssd_run_scenario(tmbfit): Run scenario using tmbfit object to generate data

  • ssd_run_scenario(character): Run scenario using name of function to generate sequence of random numbers

  • ssd_run_scenario(`function`): Run scenario using function to generate sequence of random numbers

Examples

ssd_run_scenario(ssddata::ccme_boron, nsim = 2)

fit <- ssdtools::ssd_fit_dists(ssddata::ccme_boron)
ssd_run_scenario(fit, dist_sim = c("lnorm", "top"), nsim = 3)

fit <- ssdtools::ssd_fit_dists(ssddata::ccme_boron)
ssd_run_scenario(fit[[1]], nsim = 3)

ssd_run_scenario("rlnorm", nsim = 3)

ssd_run_scenario(ssdtools::ssd_rlnorm, nsim = 3)
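
The difference between est_method = "arithmetic" and est_method = "geometric" described under Arguments can be sketched in base R. The per-distribution estimates and AICc weights below are made-up numbers for illustration only; they are not output from ssdtools, and est_method = "multi" (which works on the model-averaged cumulative distribution function) is not shown.

```r
# Hypothetical per-distribution hazard concentration estimates and
# hypothetical AICc weights (weights sum to 1).
est <- c(lnorm = 1.2, gamma = 0.9, llogis = 1.5)
wt <- c(lnorm = 0.5, gamma = 0.3, llogis = 0.2)

# Weighted arithmetic mean of the estimates.
arith <- sum(wt * est)

# Weighted geometric mean: a weighted arithmetic mean on the log scale.
geom <- exp(sum(wt * log(est)))

c(arithmetic = arith, geometric = geom)
```

For positive estimates the weighted geometric mean never exceeds the weighted arithmetic mean, so the two methods generally give slightly different hazard concentrations.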

Run a Scenario with the Baseline Loop Runner

Description

Executes the three task tables in dependency order - sample, fit, then hc - by looping over each table with purrr::pmap() and looking up each task's parent result by the parent's ⁠<step>_id⁠ foreign key. The fit step truncates its parent sample inline (head(sample, nrow)) before fitting. The runner does no task expansion of its own (it consumes ssd_scenario_tasks()); it just threads outputs forward and returns the collected per-step results.

Usage

ssd_run_scenario_baseline(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Details

This is the no-frills baseline: it runs in-process, with no targets dependency, no shard grouping or partition_by, and no Parquet I/O.

It is reproducible without an external seed. The runner opens one local_dqrng_backend() scope and seeds each sample/fit/hc task exactly once through its ⁠*_data_task_primer()⁠ wrapper, with seed = scenario$seed and a per-task primer derived from the task's canonical identity (task_primer() over the task_axes(step) columns). Because each task's ⁠(seed, primer)⁠ pair fully determines its RNG starting point, two runs of a scenario with a fixed seed yield identical results, and a task's result is independent of the order in which tasks run. These same ⁠*_data_task_primer()⁠ wrappers are the per-task entry point a future targets shard body and the replay helper (TARGETS-DESIGN.md §7) reuse.

The scenario retains the data frames it was built from, so the runner reads them directly - no separate data argument. min_pmix names are resolved to their materialised functions off the scenario via scenario_min_pmix() (resolved once, at construction), not by a runtime ssdtools/global-env search.
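
The order-independence property above can be illustrated with a minimal base R sketch. It is not the package's mechanism: set.seed(seed + task_id) is a simplified stand-in for seeding dqrng with a (seed, primer) pair, and run_task() is a hypothetical helper.

```r
# Hypothetical task body: reseed from the task's own identity immediately
# before drawing, so the draw does not depend on what ran earlier.
run_task <- function(seed, task_id) {
  set.seed(seed + task_id)  # stand-in for the (seed, primer) seeding
  runif(3)
}

ids <- 1:4

# Run the tasks in forward order, then in reverse order (re-sorted after).
forward <- lapply(ids, run_task, seed = 42L)
reverse <- rev(lapply(rev(ids), run_task, seed = 42L))

identical(forward, reverse)
```

Because every task reseeds itself, the two execution orders give the same per-task draws.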

Value

A named list with sample, fit, and hc elements: each is the corresponding task table augmented with a list column of per-task results (sample draws, fit objects, and hc tibbles).

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 1L,
  nrow = 6L,
  seed = 42L,
  dists = "lnorm"
)
out <- ssd_run_scenario_baseline(scenario)
out$hc
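
The parent lookup and inline truncation described for this runner can be sketched with toy tables in base R. This is an illustration under assumptions, not the package's code: lapply() over row indices stands in for purrr::pmap(), the tables and IDs are invented, and head() stands in for the actual fit.

```r
# Toy sample step: two tasks, each holding one shared 8-row draw.
sample_tasks <- data.frame(sample_id = c("s1", "s2"))
sample_results <- list(
  s1 = data.frame(Conc = 1:8),
  s2 = data.frame(Conc = 11:18)
)

# Toy fit tasks: each row names its parent sample (foreign key) and an nrow.
fit_tasks <- data.frame(
  fit_id = c("f1", "f2", "f3"),
  sample_id = c("s1", "s1", "s2"),
  nrow = c(5L, 8L, 6L)
)
stopifnot(all(fit_tasks$sample_id %in% sample_tasks$sample_id))

# Loop over the fit tasks, look up each parent by key, truncate inline.
fit_results <- lapply(seq_len(nrow(fit_tasks)), function(i) {
  task <- fit_tasks[i, ]
  parent <- sample_results[[task$sample_id]]
  head(parent, task$nrow)  # a real runner would fit distributions here
})
names(fit_results) <- fit_tasks$fit_id
```

Tasks f1 and f2 share the same parent draw and differ only in how many rows they keep.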

Run a Scenario over Hive-partitioned Parquet Shards (single core)

Description

Executes a scenario's three task steps in dependency order - sample, then fit, then hc - materialising each step's results as one Parquet per partition_by path cell under a Hive-partitioned tree ⁠<dir>/<step>/<axis=value>/.../part.parquet⁠, and linking steps by reading the parent step's shards back via duckplyr (predicate pushdown), rather than threading results in memory. This is the single-core, targets-free sibling of ssd_run_scenario_baseline() and the first consumer of partition-by's path/inner split (scenario_dataset()'s sibling scenario_partition_axes()).

Usage

ssd_run_scenario_shards(scenario, dir = tempfile("ssdsims-shards-"))

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

dir

A results root to write the Hive-partitioned shards under; created if absent. Defaults to a per-run session temp dir (the shards are left on disk for inspection and reuse). The runner owns the sample/fit/hc subtrees under dir and clears them on each run, so replaying a scenario with a changed partition_by/bundle never leaves stale-granularity shards beside the new ones. (The targets pipeline instead isolates each layout under its own scenario_results_dir() root.)

Details

It reuses the per-task seed-and-run wrappers, so for a fixed scenario$seed it is reproducible and order-independent, and its per-task results are byte-identical to ssd_run_scenario_baseline() - partition_by is a free re-layout that moves only file paths, never results. The m:n parent-shard dependency (a child shard reading several parent shards, or a parent shard feeding several children, per the section 5 coarsening defaults) is resolved at read time: each task opens the parent shard at its ⁠<parent>_id⁠ identity projected onto the parent's path axes and filters to the rows it needs.

No targets, crew, manifest, or cloud upload - this is the plain-R storage loop only, de-risking hive-partitioning/task-tables.

Value

An ssdsims_shard_run object: a list with dir and the written sample, fit, and hc shard Parquet paths (one per shard).

See Also

ssd_run_scenario_baseline() (the in-memory reference oracle), ssd_scenario_sample_shards(), ssd_run_sample_step().

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 1L,
  nrow = 6L,
  seed = 42L,
  dists = "lnorm"
)
run <- ssd_run_scenario_shards(scenario)
run$hc
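
The Hive-partitioned layout <dir>/<step>/<axis=value>/.../part.parquet can be sketched as a path builder. hive_path() is a hypothetical helper written for this illustration, and the way values are rendered into the path (here via unlist()) is an assumption; the package may encode them differently.

```r
# Hypothetical helper: build one shard path from a step and its path axes.
hive_path <- function(dir, step, axes) {
  # One "axis=value" directory level per path axis, in the order given.
  cells <- unname(paste0(names(axes), "=", unlist(axes)))
  do.call(file.path, c(list(dir, step), as.list(cells), list("part.parquet")))
}

hive_path("results", "fit", list(dataset = "ccme_boron", rescale = FALSE))
# "results/fit/dataset=ccme_boron/rescale=FALSE/part.parquet"
```

Changing which axes appear in the path changes only where a task's rows are stored, which is consistent with partition_by being a re-layout that leaves results untouched.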

Group fit Tasks into Shards

Description

As ssd_scenario_sample_shards() for the fit step: groups ssd_scenario_fit_tasks() by partition_by$fit. Each task row in tasks carries its parent sample path-axis values and sample_id, so the runner opens the matching sample shard by partition path.

Usage

ssd_scenario_fit_shards(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Value

A tibble with one row per fit shard (path-axis columns + a tasks list-column).

See Also

ssd_run_fit_step().

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 2L,
  seed = 42L,
  rescale = c(FALSE, TRUE)
)
ssd_scenario_fit_shards(scenario)

Derive the fit Task Table from a Scenario

Description

Crosses each sample-task identity (dataset, sim, replace) with the scenario's nrow values and each row of the scenario's fit argument grid (rescale, computable, at_boundary_ok, min_pmix name, range_shape1, range_shape2). nrow is a genuine fit cross-join axis: the fit step truncates its parent sample inline (head(sample, nrow), RNG-free) before fitting, so the shared draw is sub-truncated without a separate data step (TARGETS-DESIGN.md section 5). Parent-identity columns are preserved verbatim so the table can be grouped directly downstream. min_pmix is referenced by name, not by function value (TARGETS-DESIGN.md section 1.1).

Usage

ssd_scenario_fit_tasks(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Details

Each row carries a fit_id primary key and a sample_id foreign key referencing its parent sample task.

Value

An ssdsims_tasks object recording the "fit" step, with one row per ⁠(dataset, sim, replace, nrow)⁠ identity crossed with the fit grid.

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 3L,
  seed = 42L,
  rescale = c(FALSE, TRUE)
)
ssd_scenario_fit_tasks(scenario)
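
The cross-join described above can be sketched in base R to see how the row count arises. The tables are toy stand-ins (only rescale is shown from the fit grid, and the nrow values are invented), and merge(by = NULL) is used here simply as a Cartesian product; the package's own derivation may differ.

```r
# Toy sample-task identities: one dataset, three replicates, no replacement.
sample_ids <- data.frame(dataset = "ccme_boron", sim = 1:3, replace = FALSE)

# Hypothetical nrow values and a one-column fit grid.
nrow_values <- data.frame(nrow = c(6L, 10L))
fit_grid <- data.frame(rescale = c(FALSE, TRUE))

# merge() with by = NULL returns the Cartesian product of its inputs.
fit_tasks <- merge(merge(sample_ids, nrow_values, by = NULL), fit_grid, by = NULL)

nrow(fit_tasks)  # 3 sample identities * 2 nrow values * 2 grid rows = 12
```

The parent identity columns (dataset, sim, replace) are carried through unchanged, which is what allows a fit task to be matched back to its sample.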

Group hc Tasks into Shards

Description

As ssd_scenario_sample_shards() for the hc step: groups ssd_scenario_hc_tasks() by partition_by$hc. Each task row in tasks carries its parent fit path-axis values and fit_id, so the runner opens the matching fit shard by partition path.

Usage

ssd_scenario_hc_shards(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Value

A tibble with one row per hc shard (path-axis columns + a tasks list-column).

See Also

ssd_run_hc_step().

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 2L,
  seed = 42L,
  ci = c(FALSE, TRUE)
)
ssd_scenario_hc_shards(scenario)

Derive the hc Task Table from a Scenario

Description

Crosses each fit-task identity with each row of the scenario's hc argument grid (nboot, est_method, ci_method, parametric). The expansion honours the construction-time ci = FALSE collapse (TARGETS-DESIGN.md section 1.2): rows where ci = FALSE are not multiplied across the bootstrap-only knobs (nboot, ci_method, parametric), which are stored as NA, while ci = TRUE rows fan out across the full grid.

Usage

ssd_scenario_hc_tasks(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Details

Each row carries an hc_id primary key and a fit_id foreign key referencing its parent fit task.

Value

An ssdsims_tasks object recording the "hc" step, with one row per fit-task identity crossed with the (collapsed) hc grid.

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 2L,
  seed = 42L,
  ci = c(FALSE, TRUE),
  nboot = c(10L, 100L)
)
ssd_scenario_hc_tasks(scenario)
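
The ci = FALSE collapse can be sketched as follows. This is a base R illustration of the rule as described, not the package's implementation; est_method is omitted for brevity, and the knob values are chosen to mirror the example above.

```r
# ci = TRUE rows fan out across the bootstrap-only knobs.
ci_true <- expand.grid(
  ci = TRUE,
  nboot = c(10L, 100L),
  ci_method = "weighted_samples",
  parametric = TRUE,
  stringsAsFactors = FALSE
)

# ci = FALSE collapses to a single row with the bootstrap knobs stored as NA.
ci_false <- data.frame(
  ci = FALSE,
  nboot = NA_integer_,
  ci_method = NA_character_,
  parametric = NA
)

hc_grid <- rbind(ci_false, ci_true)
hc_grid
```

Without the collapse, ci = FALSE would be crossed with both nboot values and produce two rows that compute the same thing.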

Group sample Tasks into Shards

Description

Groups the scenario's sample task table (ssd_scenario_sample_tasks()) by its partition_by$sample path axes into one row per shard: each shard row carries the path-axis columns (the tar_map target-name suffix and Hive path) and a tasks list-column of that shard's task rows, each decorated with seed = scenario$seed and its per-task primer (task_primer() over task_axes("sample")). The decoration is RNG-free; the bare task table keeps its no-⁠(seed, primer)⁠ contract.

Usage

ssd_scenario_sample_shards(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Value

A tibble with one row per sample shard: the path-axis columns and a tasks list-column. Suitable as the values argument of tarchetypes::tar_map().

See Also

ssd_run_sample_step(), ssd_scenario_fit_shards().

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 2L, seed = 42L)
ssd_scenario_sample_shards(scenario)
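
The grouping of a task table into one row per shard with a tasks list-column can be sketched in base R. The toy table, and the choice of dataset as the only path axis, are assumptions made for illustration; the seed and primer decoration is not shown.

```r
# Toy sample task table: two datasets, two replicates each.
tasks <- data.frame(
  dataset = c("a", "a", "b", "b"),
  sim = c(1L, 2L, 1L, 2L),
  replace = FALSE
)

# Group the task rows by the (assumed) path axis.
groups <- split(tasks, tasks$dataset)

# One row per shard: the path-axis column plus a list-column of task rows.
shards <- data.frame(dataset = names(groups))
shards$tasks <- I(unname(groups))

nrow(shards)             # number of shards
nrow(shards$tasks[[1]])  # tasks in the first shard
```

Each element of the tasks column is itself a small data frame holding the rows that one shard is responsible for.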

Derive the sample Task Table from a Scenario

Description

Expands an ssdsims_scenario into the sample task table: one row per cell of the cross-join of the scenario's dataset names, replicate index (1:nsim), and replace values. Each row is the single random draw of n_max = max(nrow) rows that every nrow value sub-truncates (TARGETS-DESIGN.md section 5), so nrow is not a sample axis - the draw is shared. n_max is carried as an ordinary integer column. The derivation performs no random-number generation and adds no seed/primer/stream columns (those arrive in later roadmap steps; see TARGETS-DESIGN.md section 2).

Usage

ssd_scenario_sample_tasks(scenario)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

Details

Each row carries a path-style sample_id primary key.

Value

An ssdsims_tasks object (a classed tibble recording the "sample" step) with one row per ⁠(dataset, sim, replace)⁠ cell, a sample_id key, and a carried n_max column.

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 3L, seed = 42L)
ssd_scenario_sample_tasks(scenario)
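
The shared-draw idea (one draw of n_max rows that every nrow value sub-truncates) can be sketched in base R. set.seed() and rlnorm() are stand-ins for the package's seeding and sampling, and the nrow values are invented.

```r
set.seed(42)  # base R stand-in for the per-task seeding

nrow_values <- c(5L, 6L, 10L)
n_max <- max(nrow_values)

# One random draw of n_max rows, made once.
draw <- data.frame(Conc = rlnorm(n_max))

# Every nrow value takes a prefix of the same draw; no further RNG is used.
truncated <- lapply(nrow_values, function(n) head(draw, n))

# The 5-row data set is exactly the first 5 rows of the 10-row one.
identical(truncated[[1]]$Conc, truncated[[3]]$Conc[1:5])
```

This is why nrow does not need to be a sample axis: a smaller nrow is a prefix of the larger draw rather than an independent sample.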

Build the Targets Pipeline for a Scenario

Description

A target factory: returns the list of targets objects that runs a scenario as a static-branching Hive-sharded pipeline (TARGETS-DESIGN.md section 6), so a whole _targets.R reduces to building a scenario and calling this function (see Details).

Usage

ssd_scenario_targets(scenario, root = scenario_results_dir(scenario))

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

root

The results root the shards and summary are written under; defaults to the per-layout scenario_results_dir().

Details

A complete _targets.R can be as short as:

library(targets)
library(tarchetypes)
library(ssdsims)
scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 2L, seed = 42L)
ssd_scenario_targets(scenario)

For each step it tarchetypes::tar_map()s one named, format = "file", error = "null" target per partition_by path cell (the names are the step's path axes), wires sample -> fit -> hc -> summary ordering with tar_combine() barriers (a step body reads its parents from disk by partition path, so there is no automatic edge), and writes every shard and the summary under the per-layout scenario_results_dir() root (so a changed partition_by/bundle never mixes shard granularities). scenario is referenced as a global, so editing it invalidates the dependent shards.

To parallelise the shards, set a controller (e.g. a mirai-backed crew::crew_controller_local()) with targets::tar_option_set() in ⁠_targets.R⁠ before calling this - the target set is unchanged.

Value

A list of targets target objects, for ⁠_targets.R⁠ to return.

See Also

scenario_results_dir(), ssd_run_scenario_shards() (the single-core, targets-free equivalent).

Examples

## Not run: 
# _targets.R
library(targets)
library(tarchetypes)
library(ssdsims)
scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 2L, seed = 42L)
ssd_scenario_targets(scenario)

## End(Not run)
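
As a sketch of the parallel setup mentioned in Details, a _targets.R could register a local crew controller before building the target list. The worker count is an arbitrary illustrative value, and the file is meant to be run through targets::tar_make() rather than sourced directly.

```r
# _targets.R (sketch; workers = 2 is an arbitrary illustrative value)
library(targets)
library(tarchetypes)
library(ssdsims)

# Register a local crew controller before the targets are defined.
tar_option_set(controller = crew::crew_controller_local(workers = 2))

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 2L, seed = 42L)
ssd_scenario_targets(scenario)
```

Only the controller line differs from the single-process file, so the target set is the same with or without it.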

Expand a Scenario into all Three Task Tables

Description

The canonical expansion entry point (TARGETS-DESIGN.md section 1/section 2): derives the sample, fit, and hc task tables from a scenario in one call and bundles them into an ssdsims_task_set. The per-step derivations (ssd_scenario_sample_tasks(), ssd_scenario_fit_tasks(), ssd_scenario_hc_tasks()) remain available for callers that need a single table.

Usage

ssd_scenario_tasks(scenario, step = NULL)

Arguments

scenario

An ssdsims_scenario from ssd_define_scenario().

step

Optional single step name ("sample", "fit", or "hc"). When supplied, returns just that step's ssdsims_tasks table (the same as the matching ⁠ssd_scenario_*_tasks()⁠); when NULL (default) returns the full ssdsims_task_set.

Value

An ssdsims_task_set object (a list with sample, fit, and hc elements, each an ssdsims_tasks table), or - when step is supplied - the single ssdsims_tasks table for that step.

Examples

scenario <- ssd_define_scenario(ssddata::ccme_boron, nsim = 3L, seed = 42L)
tasks <- ssd_scenario_tasks(scenario)
tasks
tasks$hc
ssd_scenario_tasks(scenario, "hc")

Generate Data for Simulations

Description

A family of functions to generate a tibble of nested data sets.

Usage

ssd_sim_data(x, ...)

## S3 method for class 'data.frame'
ssd_sim_data(
  x,
  ...,
  nrow = 6L,
  replace = FALSE,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'fitdists'
ssd_sim_data(
  x,
  ...,
  nrow = 6L,
  dist_sim = "top",
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'tmbfit'
ssd_sim_data(
  x,
  ...,
  nrow = 6L,
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'character'
ssd_sim_data(
  x,
  ...,
  nrow = 6L,
  args = list(),
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

## S3 method for class 'function'
ssd_sim_data(
  x,
  ...,
  nrow = 6L,
  args = list(),
  seed = NULL,
  nsim = 100L,
  stream = getOption("ssdsims.stream", 1L),
  start_sim = 1L,
  .progress = FALSE
)

Arguments

x

The object to use for generating the data.

...

Unused.

nrow

A numeric vector of the number of rows in the generated data, each of which must be between 5 and 1,000.

replace

A logical vector specifying whether to sample with replacement.

seed

An integer of the starting seed or NULL.

nsim

A count of the number of data sets to generate.

stream

A count of the stream number.

start_sim

A count of the number of the simulation to start from.

.progress

Whether to show a purrr progress bar.

dist_sim

A character vector specifying the distributions in the fitdists object, or "all" for all the distributions and/or "top" to use the distribution with most weight and/or "multi" to treat the distributions as a single distribution.

args

A named list of the argument values.

Value

A tibble of nested data sets.

Methods (by class)

  • ssd_sim_data(data.frame): Generate data by sampling from data.frame

  • ssd_sim_data(fitdists): Generate data from fitdists object

  • ssd_sim_data(tmbfit): Generate data from tmbfit object

  • ssd_sim_data(character): Generate data using name of function

  • ssd_sim_data(`function`): Generate data using function to generate sequence of random numbers

Examples

ssd_sim_data(ssddata::ccme_boron, nrow = 5, nsim = 3)

fit <- ssdtools::ssd_fit_dists(ssddata::ccme_boron)
ssd_sim_data(fit, nrow = 5, nsim = 3)

fit <- ssdtools::ssd_fit_dists(ssddata::ccme_boron)
ssd_sim_data(fit[[1]], nrow = 5, nsim = 3)

ssd_sim_data("rnorm", nrow = 5, nsim = 3)

ssd_sim_data(ssdtools::ssd_rlnorm, nrow = 5, nsim = 3)

Summarise a Run's hc Estimates Across Shards

Description

Fans in the run's results without pulling shard target values back into R or recomputing anything: reads every hc shard Parquet under dir_hc (a Hive glob) with duckplyr - the analysis-ready per-task hazard-concentration estimates - unions them, and writes path. Because it reads the result directory (not the shard targets), it sees whatever shards landed, so it unions the survivors of a partially-failed run (error = "null", section 6.2). dir_sample and dir_fit are accepted for signature symmetry with the three result layers; the sample draws and serialised fit objects are not summary material, so the combined summary is the hc layer.

Usage

ssd_summarize(dir_sample, dir_fit, dir_hc, path)

Arguments

dir_sample

The sample results root.

dir_fit

The fit results root.

dir_hc

The hc results root.

path

The output Parquet path for the combined summary.

Details

In a targets pipeline a directory read carries no dependency edge, so order summary after the shards by referencing an upstream barrier in its command (see the shipped ⁠_targets.R⁠ template's tar_combine() barriers). Reading the directory - rather than the shard target values - is what lets it union whatever shards landed (the survivors of a partially-failed run, section 6.2).

Value

The summary Parquet path (the format = "file" contract).

Examples

scenario <- ssd_define_scenario(
  ssddata::ccme_boron,
  nsim = 1L,
  nrow = 6L,
  seed = 42L,
  dists = "lnorm"
)
# Materialise the shards single-core, then fan in the hc layer.
run <- ssd_run_scenario_shards(scenario)
ssd_summarize(
  file.path(run$dir, "sample"),
  file.path(run$dir, "fit"),
  file.path(run$dir, "hc"),
  file.path(run$dir, "summary.parquet")
)

Derive a Per-task Primer from its Parameters

Description

Derives the per-task primer – a length-2 integer vector – from rlang::hash(params), suitable for the stream argument of dqrng::dqset.seed(). Together with the scenario seed, the primer fully specifies a task's RNG starting point: dqrng::dqset.seed(seed, stream = task_primer(params)). It pairs with local_dqrng_state(), which installs the ⁠(seed, primer)⁠ pair under an active local_dqrng_backend() scope.

Usage

task_primer(params)

Arguments

params

A plain named list of task parameters, or a single-row data frame (one task-table row).

Details

The primer packs 64 bits of the rlang::hash() digest (xxhash128) as c(hi32, lo32). Each 32-bit half is encoded as a signed int32, with the reserved bit pattern 0x80000000 (INT_MIN, which R cannot represent as a non-NA integer) mapped to NA_integer_; dqrng accepts NA_integer_ in stream and treats it as INT_MIN, so the encoding recovers the full 64 bits of stream entropy.

params may be a plain named list or a single-row data frame (one row of a ⁠{sample,fit,hc}_tasks⁠ table). A data-frame row is normalised to a canonical plain list – the inverse of tibble::tibble_row() – by dropping all attributes, unwrapping length-1 list-style columns to their element, and leaving df-style (nested data-frame) columns as data frames, before hashing. The primer is therefore identical whether derived from the row or from the equivalent plain list. Note that rlang::hash() is order-sensitive, so the plain list must use the same name order as the task-table columns to reproduce the row's primer (assembling params in a canonical column order is part of the task-tables caller contract below).

task_primer() normalises structure, not meaning: it hashes whatever params it is given. The canonical, name-keyed representation is a caller contract assembled where params is built (task-tables, over the task-lists tables). Per the three-step model the RNG-consuming steps each take a primer over their task identity:

  • sample – keyed ⁠(dataset, sim, replace)⁠ only. nrow is deliberately absent: every nrow shares one n_max-row draw that the fit step truncates inline (head(sample, nrow), RNG-free, no separate primer), so excluding nrow is load-bearing for the sub-truncation property (TARGETS-DESIGN.md §5).

  • fit – the parent sample identity plus nrow and the fit-grid row (rescale, computable, at_boundary_ok, min_pmix name, range_shape1, range_shape2). nrow IS part of the fit primer: a fit on a different truncation is a genuinely different computation.

  • hc – the parent fit identity plus the hc-grid row (ci, nboot, est_method, ci_method, parametric).

Function-valued parameters (e.g. min_pmix) MUST be referenced by name, not by function value, so a recompile or JIT does not move a task's primer.

Value

An integer vector of length 2 – the per-task primer – to pass as the stream argument of dqrng::dqset.seed() (via local_dqrng_state()).

See Also

local_dqrng_state(), local_dqrng_backend().

Examples

task_primer(list(dataset = "boron", sim = 1L, replace = FALSE))
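
The signed int32 encoding described in Details can be sketched in base R. to_int32() is a hypothetical helper for illustration; it shows only how one 32-bit half, given as 8 hex digits, maps to an R integer. Which 64 bits of the digest are used, and in what byte order, is not reproduced here.

```r
# Hypothetical helper: convert 8 hex digits to a signed 32-bit R integer.
to_int32 <- function(hex8) {
  digits <- strtoi(strsplit(hex8, "")[[1]], 16L)
  u <- sum(digits * 16^(7:0))      # unsigned value, held as a double
  if (u >= 2^31) u <- u - 2^32     # two's-complement wrap to signed
  # INT_MIN has no non-NA integer representation in R, so map it to NA.
  if (u == -2^31) NA_integer_ else as.integer(u)
}

to_int32("0000002a")  # 42
to_int32("ffffffff")  # -1
to_int32("80000000")  # NA
```

A primer in this scheme would be two such halves, c(hi32, lo32), with NA_integer_ standing in for the one bit pattern R cannot hold.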