| Title: | R Tools for EVR Fish Projects |
|---|---|
| Description: | Provides functions to linearly interpolate missing values and to calculate growing season degree days and growing degree days from daily water temperature data. It also exports fish codes for British Columbia and Alberta from the `fishbc` package. |
| Authors: | Joe Thorley [aut, cre] (ORCID: <https://orcid.org/0000-0002-7683-4592>), Ayla Pearson [aut] (ORCID: <https://orcid.org/0000-0001-7388-1222>), Sarah Lyons [aut] (ORCID: <https://orcid.org/0000-0002-6745-6796>), Bronwen Lewis [ctb], Jill Brooks [ctb], Andrew Harwood [ctb], Sebastian Dalgarno [ctb], Elk Valley Resources [fnd, cph] |
| Maintainer: | Joe Thorley <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.7.0 |
| Built: | 2026-06-01 05:04:08 UTC |
| Source: | https://github.com/poissonconsulting/evrfish |
Aggregates subdaily water temperature data to mean daily water temperature. If the water temperature data for a day span less than min_coverage, a missing value is returned instead.
aggregate_water_temp_data(data, ..., min_coverage = 0.875, date_time = "date_time", value = "water_temperature")
| Argument | Description |
|---|---|
| data | A data frame. |
| ... | These dots are for future extensions and must be empty. |
| min_coverage | A numeric value of the minimum coverage as a proportion. |
| date_time | A string indicating the column name of the POSIXct vector. |
| value | A string indicating the column name of the value vector. |
The min_coverage is converted into the minimum number of non-missing values in each day, based on how many values at the shortest time interval within the time series are required to achieve the minimum coverage. For example, if the date times are every 15 minutes then 24 / 0.25 = 96 values are required for 100% coverage and 96 × 0.875 = 84 values are required for the default 87.5% minimum coverage.
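The arithmetic above can be reproduced in a few lines of base R. This is a minimal sketch, not the package's implementation: `min_values_per_day()` is a hypothetical helper, and rounding up with `ceiling()` is an assumption about how fractional counts are handled.

```r
# Hypothetical helper (not exported by the package): minimum number of
# non-missing values per day for a sampling interval given in hours.
min_values_per_day <- function(interval_hours, min_coverage = 0.875) {
  values_per_day <- 24 / interval_hours    # values needed for 100% coverage
  ceiling(values_per_day * min_coverage)   # rounding up is an assumption
}

min_values_per_day(0.25)  # 15-minute data: 96 * 0.875 = 84
min_values_per_day(8)     # 8-hourly data: 3 * 0.875 = 2.625, rounded up to 3
```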
A data frame with a date column and an updated water temperature column.
data <- data.frame(
  date_time = as.POSIXct(c(
    "2021-05-07 00:00:00", "2021-05-07 08:00:00", "2021-05-07 16:00:00"
  )),
  water_temperature = c(5, 5, 7)
)
aggregate_water_temp_data(data)
Each time series value is classified as reasonable, questionable or erroneous in the status_id column, or as NA if the value is missing.
classify_time_series_data(data, ..., date_time = "date_time", value = "value", questionable_min = 0, questionable_max = 30, erroneous_min = -0.5, erroneous_max = 40, questionable_rate = 2, erroneous_rate = 5, questionable_buffer = 1, erroneous_buffer = 1, gap_range = 5)
| Argument | Description |
|---|---|
| data | A data frame. |
| ... | These dots are for future extensions and must be empty. |
| date_time | A string indicating the column name of the POSIXct vector. |
| value | A string indicating the column name of the value vector. |
| questionable_min | A numeric value indicating the lower bound of the questionable range of temperature values. |
| questionable_max | A numeric value indicating the upper bound of the questionable range of temperature values. |
| erroneous_min | A numeric value indicating the lower bound of the erroneous range of temperature values. |
| erroneous_max | A numeric value indicating the upper bound of the erroneous range of temperature values. |
| questionable_rate | A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered questionable. |
| erroneous_rate | A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered erroneous. |
| questionable_buffer | A numeric value indicating the buffer in hours for questionable values. |
| erroneous_buffer | A numeric value indicating the buffer in hours for erroneous values. |
| gap_range | A numeric value indicating the number of hours between two non-reasonable values that will be coded as questionable or erroneous. |
The function only works on a single time series.
The function errors if there are missing or duplicated date times.
The data are processed as follows:
1. The time series values are classified based on their values (questionable_min, questionable_max, erroneous_min, erroneous_max).
2. The rate of change to each value is then calculated and the values are classified based on the absolute rate of change (questionable_rate, erroneous_rate).
3. Values adjacent to any questionable/erroneous value are then coded as questionable/erroneous.
4. Next, any value within the time buffer of a questionable/erroneous value is classified as questionable/erroneous (questionable_buffer, erroneous_buffer).
5. In addition, ignoring the buffer, reasonable values between two questionable/erroneous values are coded as questionable if the duration of the gap in hours is within gap_range.
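The first two processing steps (classification by value range and by absolute rate of change) can be illustrated in base R. This is only a sketch under stated assumptions: `classify_sketch()` is a hypothetical function, the use of strict inequalities at the thresholds is assumed, and the adjacency, buffer and gap steps are deliberately left out, so its output will not generally match that of classify_time_series_data().

```r
# Illustrative sketch only; not the package's code. Covers the range and
# rate-of-change steps and omits the adjacency, buffer and gap steps.
classify_sketch <- function(date_time, value,
                            questionable_min = 0, questionable_max = 30,
                            erroneous_min = -0.5, erroneous_max = 40,
                            questionable_rate = 2, erroneous_rate = 5) {
  status <- rep("reasonable", length(value))

  # Range step: erroneous overrides questionable (strict bounds assumed).
  status[which(value < questionable_min | value > questionable_max)] <- "questionable"
  status[which(value < erroneous_min | value > erroneous_max)] <- "erroneous"

  # Rate step: absolute change per hour relative to the previous value.
  hours <- as.numeric(difftime(date_time[-1], date_time[-length(date_time)],
                               units = "hours"))
  rate <- c(0, abs(diff(value)) / hours)
  status[which(rate > questionable_rate & status == "reasonable")] <- "questionable"
  status[which(rate > erroneous_rate)] <- "erroneous"

  status[is.na(value)] <- NA  # missing values get a missing status
  status
}

date_time <- as.POSIXct(
  sprintf("2021-05-07 %02d:00:00", 8:13), tz = "UTC"
)
water_temperature <- c(4.124, 4.078, 4.102, 4.189, 4.243, 6.578)
status <- classify_sketch(date_time, water_temperature)
status  # the last value changes by 2.335 per hour, above questionable_rate
```

In this sketch only the final value is flagged. The full function would additionally extend the flag to adjacent and buffered values.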
The original data frame, sorted by date time, with an added status_id column.
data <- data.frame(
  date_time = as.POSIXct(c(
    "2021-05-07 08:00:00", "2021-05-07 09:00:00", "2021-05-07 10:00:00",
    "2021-05-07 11:00:00", "2021-05-07 12:00:00", "2021-05-07 13:00:00"
  )),
  water_temperature = c(4.124, 4.078, 4.102, 4.189, 4.243, 6.578)
)
classify_time_series_data(data, value = "water_temperature")
A wrapper on classify_time_series_data() with the arguments set for
water temperature data.
classify_water_temp_data(data, questionable_min = 0, questionable_max = 30, erroneous_min = -0.5, erroneous_max = 40, questionable_rate = 2, erroneous_rate = 5, questionable_buffer = 1, erroneous_buffer = 1, gap_range = 5, date_time = "temperature_date_time", value = "water_temperature")
| Argument | Description |
|---|---|
| data | A data frame. |
| questionable_min | A numeric value indicating the lower bound of the questionable range of temperature values. |
| questionable_max | A numeric value indicating the upper bound of the questionable range of temperature values. |
| erroneous_min | A numeric value indicating the lower bound of the erroneous range of temperature values. |
| erroneous_max | A numeric value indicating the upper bound of the erroneous range of temperature values. |
| questionable_rate | A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered questionable. |
| erroneous_rate | A numeric value indicating the rate of change (temperature per hour) of temperature values that is considered erroneous. |
| questionable_buffer | A numeric value indicating the buffer in hours for questionable values. |
| erroneous_buffer | A numeric value indicating the buffer in hours for erroneous values. |
| gap_range | A numeric value indicating the number of hours between two non-reasonable values that will be coded as questionable or erroneous. |
| date_time | A string indicating the column name of the POSIXct vector. |
| value | A string indicating the column name of the value vector. |
A data frame.
data <- data.frame(
  temperature_date_time = as.POSIXct(c(
    "2021-05-07 08:00:00", "2021-05-07 09:00:00", "2021-05-07 10:00:00",
    "2021-05-07 11:00:00", "2021-05-07 12:00:00", "2021-05-07 13:00:00"
  )),
  water_temperature = c(4.124, 4.078, 4.102, 4.189, 4.243, 6.578)
)
classified_data <- classify_water_temp_data(data)
A wrapper on gsdd::date_atus() to calculate the date on which a
specified number of Accumulated Thermal Units (ATUs) is exceeded.
date_atus(x, atus = 600, start_date = as.Date("1972-03-01"))
| Argument | Description |
|---|---|
| x | A data frame with two columns |
| atus | A positive number of the accumulated thermal units to exceed. |
| start_date | A Date scalar of the first date within each year to consider (the year is ignored). |
A tibble with four columns year, start_date, end_date and atus.
date_atus(gsdd::temperature_data)
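To make the idea of exceeding a number of ATUs concrete, the following base R sketch accumulates daily temperatures from a start date and finds the first date on which the running total exceeds 600. It assumes that ATUs are the cumulative sum of mean daily temperatures; the data, the column names `date` and `temperature`, and the strict `>` comparison are illustrative assumptions rather than details taken from gsdd::date_atus().

```r
# Hypothetical daily series: a constant 10 degrees from 1 March 1972.
temps <- data.frame(
  date = seq(as.Date("1972-03-01"), by = "day", length.out = 100),
  temperature = rep(10, 100)
)

# Accumulated thermal units, assumed here to be the running sum of the
# mean daily temperatures from the start date.
atus <- cumsum(temps$temperature)

# First date on which the accumulated total exceeds 600.
end_date <- temps$date[which(atus > 600)[1]]
end_date  # day 61 of the series, i.e. 1972-04-30
```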
Curated data on the codes, classification and conservation status of freshwater fishes in British Columbia.
freshwaterfish
An object of class tbl_df (inherits from tbl, data.frame) with 161 rows and 17 columns.
A wrapper on gsdd::gdd() to get the Growing Degree Days up to a date for
the longest growing season.
gdd(x, end_date = as.Date("1972-09-30"), min_length = 60, msgs = TRUE)
| Argument | Description |
|---|---|
| x | A data frame with two columns |
| end_date | A Date scalar of the last date within each year to consider (the year is ignored). |
| min_length | A whole number of the minimum number of values to consider. |
| msgs | A flag specifying whether to provide messages. |
gsdd::gdd(), gsdd() and gss().
gdd(gsdd::temperature_data)
A wrapper on gsdd::gsdd() to get the Growing Season Degree Days for
the longest growing season.
gsdd(x, min_length = 120, msgs = TRUE)
| Argument | Description |
|---|---|
| x | A data frame with two columns |
| min_length | A whole number of the minimum number of values to consider. |
| msgs | A flag specifying whether to provide messages. |
gsdd::gsdd(), gsdd() and gss().
gsdd(gsdd::temperature_data)
Soft-deprecated in favour of gsdd::gsdd_vctr().
gsdd_cf(x, ignore_truncation = FALSE, min_length = 120, msgs = TRUE)
| Argument | Description |
|---|---|
| x | A numeric vector of the mean daily water temperature values for the period of interest in degrees Celsius. |
| ignore_truncation | A flag specifying whether to ignore truncation of the mean daily water temperature vector or a string of "start", "end", "none" (equivalent to FALSE) or "both" (equivalent to TRUE) specifying which type of truncation to ignore. |
| min_length | A whole number of the minimum number of values to consider. |
| msgs | A flag specifying whether to provide messages. |
A non-negative real number of the GSDD.
gsdd_cf(c(rep(1, 10), rep(10, 20), rep(1, 200)))
gsdd_cf(gsdd::temperature_data$temperature)
A wrapper on gsdd::gss() that, by default,
gets all Growing Seasons ignoring truncation.
For more information see gsdd::gss().
gss(x, min_length = 120, ignore_truncation = "end", pick = "all", msgs = TRUE)
| Argument | Description |
|---|---|
| x | A data frame with two columns |
| min_length | A whole number of the minimum number of values to consider. |
| ignore_truncation | A flag specifying whether to ignore truncation of the mean daily water temperature vector or a string of "start", "end", "none" (equivalent to FALSE) or "both" (equivalent to TRUE) specifying which type of truncation to ignore. |
| pick | A string specifying whether to pick the "longest", "shortest", "first" or "last" 'season' or the season with the "biggest" or "smallest" GSDD. By default the returned value is the GSDD value for the "longest" 'season'. |
| msgs | A flag specifying whether to provide messages. |
gsdd::gss(), gsdd() and gss().
gss(gsdd::temperature_data)
A wrapper on gsdd::gss_plot() that, by default,
plots all Growing Seasons ignoring truncation.
For more information see gsdd::gss_plot().
gss_plot(x, min_length = 60, ignore_truncation = TRUE, pick = "all", latex = FALSE, nrow = NULL, ncol = NULL, msgs = TRUE)
| Argument | Description |
|---|---|
| x | A data frame with two columns |
| min_length | A whole number of the minimum number of values to consider. |
| ignore_truncation | A flag specifying whether to ignore truncation of the mean daily water temperature vector or a string of "start", "end", "none" (equivalent to FALSE) or "both" (equivalent to TRUE) specifying which type of truncation to ignore. |
| pick | A string specifying whether to pick the "longest", "shortest", "first" or "last" 'season' or the season with the "biggest" or "smallest" GSDD. By default the returned value is the GSDD value for the "longest" 'season'. |
| latex | A flag specifying whether to use LaTeX to include the degree symbol in the y-axis label. |
| nrow | A count of the number of rows to facet by. |
| ncol | A count of the number of columns to facet by. |
| msgs | A flag specifying whether to provide messages. |
gsdd::gss_plot() and gss().
gss_plot(gsdd::temperature_data)
Useful for filling in short runs of missing values in a time series.
interpolate_numeric_vector(x, span = 3, tails = FALSE)
| Argument | Description |
|---|---|
| x | A double or integer vector with missing values to fill in using linear interpolation. |
| span | A whole number of the maximum span of missing values to interpolate. If a gap exceeds the span, none of its values are interpolated. |
| tails | A flag specifying whether to fill in missing values at the start and end by setting them to be the same value as the closest adjacent non-missing value. |
interpolate_numeric_vector() is essentially a wrapper on stats::approx().
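As a rough illustration of how stats::approx() can fill short gaps, the base R sketch below interpolates over the vector's index and then restores NA for any gap longer than span. `interpolate_sketch()` is a hypothetical function written for this example, not the package's implementation: it always returns a double vector, does not handle `tails = TRUE`, and needs at least two non-missing values.

```r
# Hypothetical sketch of span-limited linear interpolation with approx().
interpolate_sketch <- function(x, span = 3) {
  idx <- seq_along(x)
  # approx() drops NA pairs and interpolates linearly between the rest;
  # with the default rule = 1, leading and trailing NAs stay NA.
  filled <- stats::approx(x = idx, y = x, xout = idx)$y

  # Find runs of missing values and undo the fill where a run exceeds span.
  runs <- rle(is.na(x))
  too_long <- rep(runs$values & runs$lengths > span, runs$lengths)
  filled[too_long] <- NA
  filled
}

interpolate_sketch(c(1, NA, 4))                        # 1.0 2.5 4.0
interpolate_sketch(c(1, NA, NA, NA, NA, 3))            # gap of 4 > span 3: stays NA
interpolate_sketch(c(1, NA, NA, NA, NA, 3), span = 4)  # gap is filled
```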
A double or integer vector.
interpolate_numeric_vector(c(1, NA, 4))
interpolate_numeric_vector(c(1L, NA, 4L))
interpolate_numeric_vector(c(1, NA, NA, NA, NA, 3))
interpolate_numeric_vector(c(1, NA, NA, NA, NA, 3), span = 4)
interpolate_numeric_vector(c(NA, NA, 10, 1, NA))
interpolate_numeric_vector(c(NA, NA, 10, 1, NA), tails = TRUE)