---
title: "Getting Started with bboutools"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with bboutools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: bibliography.bib
---

<!-- # Copyright 2022 Environment and Climate Change Canada -->
<!-- # Copyright 2023 Province of Alberta -->
<!-- # Copyright 2024 Province of Alberta -->
<!-- # -->
<!-- # Licensed under the Apache License, Version 2.0 (the "License"); -->
<!-- # you may not use this file except in compliance with the License. -->
<!-- # You may obtain a copy of the License at -->
<!-- # -->
<!-- # http://www.apache.org/licenses/LICENSE-2.0 -->
<!-- # -->
<!-- # Unless required by applicable law or agreed to in writing, software -->
<!-- # distributed under the License is distributed on an "AS IS" BASIS, -->
<!-- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -->
<!-- # See the License for the specific language governing permissions and -->
<!-- # limitations under the License. -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message=FALSE, warning=FALSE}
library(bboutools)
library(bboudata)
```

`bboutools` is an R package to estimate Boreal Caribou recruitment, survival and population growth. 
Functions are provided to fit Bayesian or Maximum Likelihood (ML) models and generate and plot predictions.  

Under the hood, the [Nimble](https://r-nimble.org) R package is used to fit heirarchical Bayesian and Maximum Likelihood models. 
Model templates in Nimble use BUGS-like syntax.  

Several anonymized data sets are provided in a separate R package, `bboudata`. 

In order to use `bboutools` you need to install R (see below) or use the Shiny [app](https://github.com/poissonconsulting/bboushiny).

## Philosophy

`bboutools` is intended to be used in conjunction with [tidyverse](https://www.tidyverse.org) packages such as `readr`, `dplyr` to manipulate data and `ggplot2` [@ggplot2] to plot data.
As such, it endeavors to fulfill the tidyverse [manifesto](https://tidyverse.tidyverse.org/articles/manifesto.html).

## Installation

In order to install R [@r] the appropriate binary for the users operating system should be downloaded from [CRAN](https://cran.r-project.org) and then installed.

Once R is installed, the `bboutools` package can be installed from GitHub by executing the following code at the R console
```{r, eval=FALSE}
install.packages("remotes")
remotes::install_github("poissonconsulting/bboutools")
```

The `bboutools` package can then be loaded into the current session using
```{r, message = FALSE}
library(bboutools)
```

## Getting Help

To get additional information on a particular function just type `?` followed by the name of the function at the R console.
For example `?bb_fit_recruitment` brings up the R documentation for the `bboutools` recruitment model fit function.

For more information on using R the reader is referred to [R for Data Science](https://r4ds.had.co.nz) [@wickham_r_2016].

If you discover a bug in `bboutools` please file an issue with a [reprex](https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html) (reproducible example) at <https://github.com/poissonconsulting/bboutools/issues>.

## Providing Data

Once the `bboutools` package has been loaded, the next task is to provide some data.
An easy way to do this is to create a comma separated file (`.csv`) with spreadsheet software. 
Recruitment and survival data should be formatted as in the following anonymized datasets and can be checked to confirm it is in the correct format by using the `bboudata` functions:

```{r}
# Recruitment data
bboudata::bbd_chk_data_recruitment(bboudata::bbourecruit_a)
bboudata::bbourecruit_a
```

```{r}
# Survival data
bboudata::bbd_chk_data_survival(bboudata::bbousurv_a)
bboudata::bbousurv_a
```

All columns should be included and column names should not be changed. 

The `.csv` file can then be read into R using the following
```{r, eval=FALSE}
data <- read_csv(file = "path/to/file.csv")
```

## Recruitment

The annual recruitment in boreal caribou population is typically estimated from annual calf:cow ratios.

`bboutools` fits a Binomial recruitment model to the annual counts of calves, cows, yearlings, unknown adults and potentially, bulls.  

It is up to the user to ensure that the data are from surveys that were conducted at the same time of year, when calf survival is expected to be similar to adult survival.

### Fit a Bayesian model

The function `bb_fit_recruitment()` fits a Bayesian recruitment model.  

The start month of the biological year (i.e., 'caribou year') can be set with the `year_start` argument.
By default, the start month is April. 
Data are aggregated by biological year (not calendar year) prior to model fitting. 
 
The adult female proportion can either be fixed or estimated from counts of cows and bulls (i.e., `Cows ~ Binomial(adult_female_proportion, Cows + Bulls)`). 

If the user provides a value to `adult_female_proportion`, it is fixed.
The default value is 0.65, which accounts for higher mortality of males [@smith_2004].
If `adult_female_proportion = NULL`, the adult female proportion is estimated from the data (i.e., `Cows ~ Binomial(adult_female_proportion, Cows + Bulls)`).
By default, a biologically informative prior of `Beta(65,35)` is used. 
This corresponds to an expected value of 65%.  

The yearling and calf female proportion can be set with `sex_ratio`. 
The default value is 0.5.  

The model can be fit with random effect of year, fixed effect of year and/or continuous effect of year (i.e., year trend). 
The `min_random_year` argument dictates the minimum number of years in the dataset required to fit a random year effect; otherwise a fixed year effect is fit. 
It is not recommended to fit a random year effect with fewer than 5 years [@kery_bayesian_2011]. 
A continuous fixed effect of year can be fit with `year_trend = TRUE`. 

The user can set `quiet = FALSE` argument to see messages and sampling progress. 

```{r,eval=FALSE}
recruitment <- bb_fit_recruitment(bboudata::bbourecruit_a, year_start = 4, year_trend = TRUE, quiet = TRUE)
```

```{r,echo=FALSE}
recruitment <- bboutools:::fit_recruitment_trend
```

### Convergence
Model convergence can be checked with the `glance()` function. 
```{r}
glance(recruitment)
```

Model convergence provides an indication of whether the parameter estimates are reliable. 
Convergence is considered successful by default if `rhat` < 1.05. 
The `rhat` threshold can be adjusted by the user.
`rhat` evaluates whether the chains agree on the same values. 
As the total variance of all the chains shrinks to the average variance within chains, r-hat approaches 1.  
Output of `glance()` also includes `ess` (Effective Sample Size), which represents the length of a chain (i.e., number of iterations) if each sample was independent of the one before it. 
` 
By default, the `bboutools` Bayesian method saves 1,000 MCMC samples from each of three chains (after discarding the first halves). 
The number of samples saved can be adjusted with the `niters` argument. 
With `niters` set, the user can simply increment the thinning rate as required to achieve convergence (i.e., by decreasing `rhat`).


### Summary
Various generic functions in `bboutools` can be used to summarize or interrogate the output of model fitting functions.

- `coef()` and `tidy()` provide a tidy table of the coefficient estimates. 
- `estimates()` provides a list of the coefficient estimates. 
- `augment()` provides the data used. 
- `model_code()` provides the model code in BUGS-like syntax. 
- `plot()` provides traceplots for individual parameters. 

The user can exclude individual random effect estimates from coefficient output.
```{r}
tidy(recruitment, include_random_effects = FALSE)
```

Keep in mind that any reference to 'Year' or 'Annual' in these summary outputs represent the caribou year, which can be set by the user within the fitting functions.  

### Priors
In general, weakly informative priors are used by default [@gelman_prior_2017; @mcelreath_statistical_2016]. 
The default prior distribution parameter values can be accessed from `bb_priors_recruitment()` and `bb_priors_survival()`. 
See the priors vignette for more information.

```{r}
bb_priors_recruitment()
```
The default prior distribution for adult_female_proportion is `Beta(65, 35)` and the default prior distribution for the intercept (`b0`) is `Normal(-1.5, 1)`, which is on the log scale. 
The user can change the priors by providing a named vector to the `priors` argument in the model fitting functions. The names must match one of the names in `bb_priors_recruitment()`.  

For example, less informative priors for `adult_female_proportion` (e.g., `Beta(1, 1)`) and `b0` (e.g., `Normal(0, 5)`) can be supplied as follows. 
```{r, eval=FALSE}
recruitment <- bb_fit_recruitment(bboudata::bbourecruit_a, priors = c(adult_female_proportion_alpha = 1, adult_female_proportion_beta = 1, b0_mu = 0, b0_sd = 5))
```

If the user is interested in fitting models without any prior information, see `bb_fit_recruitment_ml()` and `bb_fit_survival_ml()`, which use a Maximum Likelihood approach (see more details below). 

## Survival
The annual survival in boreal caribou population is typically estimated from the monthly fates of collared adult females. 
`bboutools` fits a Binomial monthly survival model to the number of collared females and mortalities.
The user can choose whether to include individuals with uncertain fates with the certain mortalities.  

The function `bb_fit_survival()` fits a Bayesian survival model. 

The survival model is always fit with a random intercept for each month. 
Otherwise, the `year_start`, `year_trend`, and `min_random_year` arguments have the same behaviour as `bb_fit_recruitment()` above. 

If `include_uncertain_mortalities = TRUE`, the total mortalities is the sum of the certain mortalities and uncertain mortalities ('MortalitiesCertain' and 'MortalitiesUncertain' columns); otherwise, only certain mortalities are used to fit the model. 

```{r,eval=FALSE}
survival <- bb_fit_survival(bboudata::bbousurv_a, year_start = 4, quiet = TRUE)
```

```{r,echo=FALSE}
survival <- bboutools:::fit_survival
```

```{r}
tidy(survival, include_random_effects = FALSE)
```

## Predictions
A user can generate and plot predictions of recruitment, survival and population growth.

Recruitment is the adjusted recruitment using  methods from [@decesare_estimating_2012]. 
See the 'analytical methods' article for details. 

Predictions of calf-cow ratio can also be made using `bb_predict_calf_cow_ratio()`. 

The sex ratio can be adjusted with `sex_ratio`.

#### Recruitment by year
```{r}
predict_recruitment <- bb_predict_recruitment(recruitment, year = TRUE)
bb_plot_year_recruitment(predict_recruitment)
```

#### Recruitment for a 'typical' year
```{r}
predict_recruitment_1 <- bb_predict_recruitment(recruitment, year = FALSE)
predict_recruitment_1
```

#### Recruitment trend
```{r}
predict_recruitment_trend <- bb_predict_recruitment_trend(recruitment)
bb_plot_year_trend_recruitment(predict_recruitment_trend)
```

#### Survival by year for a 'typical' month
```{r}
predict_survival <- bb_predict_survival(survival, year = TRUE, month = FALSE)
bb_plot_year_survival(predict_survival)
```

#### Survival by month for a 'typical' year

The estimates show annual survival, i.e., if that month lasted the duration of the year. 

```{r}
predict_survival_month <- bb_predict_survival(survival, year = FALSE, month = TRUE)
bb_plot_month_survival(predict_survival_month)
```

## Population Growth

A user can predict population growth (lambda) with `bb_predict_growth()`. 
The survival and recruitment models fit in the previous steps are used as input.
It is important to ensure that survival and recruitment outputs share the same biological year (i.e., 'caribou year'), which is set with the `year_start` argument in `bb_fir_survival()` and `bb_fit_recruitment()`.
Details on how lambda is calculated can be found in the analytical methods vignette. 

```{r}
predict_lambda <- bb_predict_growth(survival = survival, recruitment = recruitment)

bb_plot_year_growth(predict_lambda) +
  ggplot2::scale_y_continuous(labels = scales::percent)
```

Population change (%) is calculated with uncertainty as the cumulative product of population growth. 

```{r}
predict_change <- bb_predict_population_change(survival = survival, recruitment = recruitment)
bb_plot_year_population_change(predict_change)
```

## Maximum Likelihood

Maximum Likelihood (ML) models can be fit using the `bb_fit_recruitment_ml()` and `bb_fit_survival_ml()` functions. 
These functions take a few seconds to execute because Nimble must compile the model into C++ code.
See the [Nimble documentation](https://r-nimble.org/html_manual/cha-AD.html#how-to-use-laplace-approximation) for more information and comparison to TMB.
Similar to Bayesian model fits, generic functions (e.g., `tidy()`, `glance()` and `augment()`) work on ML fit objects (class 'bboufit_ml').

```{r,eval=FALSE}
recruitment_ml <- bb_fit_recruitment_ml(bboudata::bbourecruit_a, year_start = 4, year_trend = TRUE, quiet = TRUE)
```

```{r,echo=FALSE}
recruitment_ml <- bboutools:::fit_recruitment_ml_trend
```

```{r}
glance(recruitment_ml)
```

```{r,eval=FALSE}
survival_ml <- bb_fit_survival_ml(bboudata::bbousurv_a, year_start = 4, quiet = TRUE)
```

```{r,echo=FALSE}
survival_ml <- bboutools:::fit_survival_ml
```

The ML estimates are comparable to estimates derived from the equivalent Bayesian models above. In general, ML models can be interpreted as Bayesian models with uninformative (e.g., uniform) priors [@mcelreath_statistical_2016].

```{r}
tidy(recruitment_ml, include_random_effects = FALSE)
```

```{r}
tidy(survival_ml, include_random_effects = FALSE)
```

There is functionality in `bboutools` to generate predictions (i.e., derived parameters) from ML models. 
However, there is no functionality to get confidence intervals on predictions. 
This is a more straightforward task with Bayesian models. 

```{r}
bb_predict_survival(survival_ml)
```

```{r}
growth <- bb_predict_growth(survival_ml, recruitment_ml)
bb_plot_year_growth(growth)
```

Note that ML models can struggle to converge when there are sparse data, especially with a fixed year effect. 
If these issues arise, the user can try estimating year as a random effect, continuous effect (`year_trend`), or fitting a Bayesian model. 

Another possible source of convergence issues is initial values. 
By default, `bboutools` sets initial values based on the default priors used for parameters in the Bayesian models. 
The user can replace initial values for parameters using `inits`.

```{r,eval=FALSE}
inits_ml <- bb_fit_recruitment_ml(bboudata::bbourecruit_a, inits = c(b0 = 1, sAnnual = 0.3))
```

## Understanding `bboufit` objects 

The `bb_fit_survival()` and `bb_fit_recruitment()` functions use a Bayesian approach and return objects that inherit from class `bboufit`.

Objects of class `bboufit` have four elements: 

1. `model` - the compiled Nimble model as created by `nimble::nimbleModel()`.  
1. `model_code` - the model code in text format.  
1. `samples` - the MCMC samples generated from`nimble::runMCMC()` converted to an object of class `mcmcr`.  
1. `data` - the survival or recruitment data provided.

These are the raw materials for any further exploration or analysis. 
For example, view trace and density plots with `plot(fit$samples)`.

See [mcmcr](https://github.com/poissonconsulting/mcmcr) and [mcmcderive](https://github.com/poissonconsulting/mcmcderive) for working with `mcmcr` objects, or convert samples to an object of class `mcmc.list`, e,g, with `coda::as.mcmc.list` for working with the [coda](https://github.com/cran/coda) R package. 

The `bb_fit_survival_ml()` and `bb_fit_recruitment_ml()` functions use a Maximum Likelihood approach and return objects that inherit from class `bboufit_ml`.

Objects of class `bboufit_ml` have five elements: 

1. `model` - the Nimble model as created by `nimble::nimbleModel()`.   
1. `model_code` - the model code in text format.  
1. `mle` - the Maximum Likelihood output as created by model$findMLE().
1. `summary` - the summary of the Maximum Likelihood output as created by `model$summary(mle)`.  
1. `data` - the survival or recruitment data provided.

See [nimble](https://github.com/nimble-dev/nimble) for how to work with nimble model objects and Maximum Likelihood output. 

## References