---
title: "Get Started with ssdsims"
vignette: >
  %\VignetteIndexEntry{Get Started with ssdsims}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
---

```{r}
#| label: setup
#| include: false
# The on-ramp exercises the live API, so it needs the optional fitting
# dependencies. Skip evaluation gracefully if they are unavailable (e.g. on a
# minimal CI runner) rather than failing the build.
evaluate <- requireNamespace("ssddata", quietly = TRUE) &&
  requireNamespace("ssdtools", quietly = TRUE)
knitr::opts_chunk$set(eval = evaluate)
```

```{r}
#| label: library
library(ssdsims)
```

## What ssdsims is for

A species sensitivity distribution (SSD) study asks how an estimate --- a
hazard concentration, say --- behaves as you vary the things you control: the
sample size, the number of simulations, the distributions you fit, the
bootstrap settings. Answering that means running the same fit-and-estimate
pipeline across a grid of settings, reproducibly, often thousands of times.
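To make that concrete, here is roughly what one small pass over such a grid looks like when written by hand with ssdtools and base R. This is an illustrative sketch, not how ssdsims is implemented. The grid, the sampling scheme (rows drawn without replacement) and the helper `fit_and_estimate()` are assumptions made for this example. Only `ssdtools::ssd_fit_dists()`, `ssdtools::ssd_hc()` and `ssddata::ccme_boron` are real.

```{r}
#| label: sketch-manual-grid
# Illustrative only: a hand-rolled version of the loop ssdsims automates.
# The grid, the sampling scheme and fit_and_estimate() are hypothetical.
grid <- expand.grid(sim = 1:2, nrow = c(6L, 10L))

# Hypothetical helper: draw `n` rows without replacement, fit ssdtools'
# default distributions, and estimate the model-averaged HC5.
fit_and_estimate <- function(data, n) {
  sampled <- data[sample.int(nrow(data), n), ]
  fits <- ssdtools::ssd_fit_dists(sampled)
  ssdtools::ssd_hc(fits, proportion = 0.05)
}

# One global seed: each cell's draw depends on the order the loop runs in.
set.seed(42)
hc_list <- lapply(grid$nrow, function(n) {
  fit_and_estimate(ssddata::ccme_boron, n)
})
results <- cbind(grid, do.call(rbind, hc_list))
results
```

Even at this size the bookkeeping is manual, and because a single global seed drives every draw, reordering or parallelising the loop changes the results. Those are the two problems a declarative scenario with per-task seeding is meant to remove.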

ssdsims runs that study from a **declarative scenario**. You describe the study
once --- a seed, the simulation count, the sample sizes, the dataset *names*,
and the fit/hc argument grids --- and ssdsims expands it into per-step task
tables, draws the data, fits the distributions (via
[ssdtools](https://bcgov.github.io/ssdtools/)), and estimates the hazard
concentrations. The scenario itself draws no random numbers and writes nothing,
so it serialises to a compact manifest and the work it expands to is a pure
function of the scenario.

## Two tracks

The documentation splits into two tracks. Most users only need the first.

```
  BUILD AND RUN ─────────────────────────────────────────────────────────
    vignette("defining-a-scenario")   define a scenario, expand task tables
            │
            ▼
    vignette("sharded-pipeline")      run it: in-memory → Parquet shards
            │                          → a targets pipeline (parallel)
            ├──► vignette("scenario-to-design")   combine scenarios (ragged)
            ├──► vignette("cluster-pipeline")     run on a SLURM cluster
            └──► vignette("cloud-upload")         ship shards to object storage

  PREDICT AND MEASURE COST ──────────────────────────────────────────────
    vignette("cost-estimation")       predict compute cost before a run
            │
            ▼
    vignette("cost-analysis")         measure observed cost after a run
```

Task counts multiply across every grid in a scenario, so a modest-looking
scenario can fan out into a multi-day run with no warning. The cost track lets
you size a run before launching it and measure afterwards where the time
actually went. The two tracks meet at the scenario: both read the same object.
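The fan-out is multiplicative, which is why it is easy to underestimate. The count below rests on a simplifying assumption of ours, not necessarily how ssdsims structures its tasks: every simulation is fitted at every sample size, and every fit is bootstrapped `nboot` times once per `ci_method`. `count_refits()` is a hypothetical helper.

```{r}
#| label: sketch-fanout
# Hypothetical count: assumes one fit per (sim, nrow) and one bootstrap of
# `nboot` refits per (fit, ci_method).
count_refits <- function(nsim, nrow, ci_method, nboot) {
  fits <- nsim * length(nrow)
  fits * length(ci_method) * nboot
}

# Settings matching the on-ramp scenario later in this vignette.
count_refits(nsim = 2L, nrow = c(6L, 10L),
             ci_method = c("multi_fixed", "weighted_samples"), nboot = 10L)

# A modest-looking production-sized scenario.
count_refits(nsim = 1000L, nrow = c(6L, 10L, 20L, 50L),
             ci_method = c("multi_fixed", "weighted_samples"), nboot = 1000L)
```

Raising each setting by a factor of ten or so takes the count from 80 refits to 8 million. That jump is what the cost-estimation vignette is for.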

## A recommended reading order

1. **`vignette("defining-a-scenario")`** --- the scenario object end to end:
   assembling data, declaring the scenario, expanding the task tables, and the
   in-memory baseline runner. Start here.
2. **`vignette("sharded-pipeline")`** --- the same scenario as Hive-partitioned
   Parquet shards and as a `targets` pipeline, and the central idea that two
   drivers share one core and produce byte-identical results.
3. Then, as your study grows, branch into
   **`vignette("scenario-to-design")`** (combine scenarios),
   **`vignette("cluster-pipeline")`** (SLURM), and
   **`vignette("cloud-upload")`** (object storage).
4. To size and audit compute, read **`vignette("cost-estimation")`** before a
   run and **`vignette("cost-analysis")`** after one.

## A 30-second on-ramp

Assemble the data with `ssd_scenario_data()`, declare the study with
`ssd_define_scenario()`, and run the in-memory baseline with
`ssd_run_scenario_baseline()`:

```{r}
#| label: on-ramp
scenario <- ssd_define_scenario(
  ssd_scenario_data(ssddata::ccme_boron),
  nsim = 2L,
  seed = 42L,
  nrow = c(6L, 10L),
  ci = TRUE,
  nboot = 10L,
  ci_method = c("multi_fixed", "weighted_samples")
)
scenario
```

```{r}
#| label: run
out <- ssd_run_scenario_baseline(scenario)
out$hc
```

Because each task installs its own `(seed, primer)` under a scoped dqrng
backend, the result is reproducible from the scenario's `seed` alone --- and
identical whichever runner you choose. `vignette("defining-a-scenario")` picks
up exactly here.
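The effect of per-task seeding can be sketched directly with dqrng. This is a simplified illustration, not ssdsims' implementation. `run_task()` is a hypothetical helper, and it uses the task index as the dqrng stream in place of ssdsims' `(seed, primer)` pair. Unlike ssdsims, it does not scope the seed: `dqrng::dqset.seed()` leaves dqrng's global state changed. Because each task reseeds from `(seed, task)` before drawing, running the tasks in a different order gives each task the same draws.

```{r}
#| label: sketch-streams
#| eval: !expr requireNamespace("dqrng", quietly = TRUE)
# Hypothetical task body: reseed dqrng from (seed, task), then draw.
# Note: dqset.seed() changes dqrng's global state; ssdsims scopes this.
run_task <- function(seed, task, n = 5L) {
  dqrng::dqset.seed(seed, stream = task)
  dqrng::dqrnorm(n)
}

tasks <- 1:4
# Run the tasks forwards, then backwards, and compare task by task.
forward <- lapply(tasks, function(i) run_task(42L, i))
backward <- rev(lapply(rev(tasks), function(i) run_task(42L, i)))
identical(forward, backward)
```

Ordering no longer matters, so the same tasks can be split across processes, shards or cluster jobs without changing any task's random draws.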
