The wqbc
R package
calculates water quality limits for British Columbia.
Previously it also calculated the CCME Water Quality Index but that functionality has been moved to the wqindex package.
The data used in this demonstration are from the Fraser River basin (data available here under the Candian Open Government License.
To load the wqbc
package and the fraser
data run
The fraser
data is organized so that each row
corresponds to one observation.
library(tibble) # for prettier printing of tibbles
print(fraser)
#> # A tibble: 242,631 × 8
#> SiteID Date Variable Value Units Site Lat Long
#> <chr> <date> <chr> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 BC08KA0007 2000-01-12 ALKALINITY TOTAL CACO3 59.1 MG/L FRASE… 53.0 -119.
#> 2 BC08KA0007 2000-01-12 ALUMINUM TOTAL 0.061 MG/L FRASE… 53.0 -119.
#> 3 BC08KA0007 2000-01-12 BARIUM EXTRACTABLE 14.5 UG/L FRASE… 53.0 -119.
#> 4 BC08KA0007 2000-01-12 BARIUM TOTAL 0.0141 MG/L FRASE… 53.0 -119.
#> 5 BC08KA0007 2000-01-12 BERYLLIUM EXTRACTABLE 0 UG/L FRASE… 53.0 -119.
#> 6 BC08KA0007 2000-01-12 BERYLLIUM TOTAL 0 UG/L FRASE… 53.0 -119.
#> 7 BC08KA0007 2000-01-12 BORON EXTRACTABLE 0.77 UG/L FRASE… 53.0 -119.
#> 8 BC08KA0007 2000-01-12 CADMIUM EXTRACTABLE 0.007 UG/L FRASE… 53.0 -119.
#> 9 BC08KA0007 2000-01-12 CADMIUM TOTAL 0 MG/L FRASE… 53.0 -119.
#> 10 BC08KA0007 2000-01-12 CALCIUM EXTRACTABLE 19.7 MG/L FRASE… 53.0 -119.
#> # ℹ 242,621 more rows
As the fraser
data is a large dataset (242631 rows)
we’ll just use the data from 2012.
data2012 <- dplyr::filter(fraser, lubridate::year(Date) == 2012)
data2012
#> # A tibble: 17,881 × 8
#> SiteID Date Variable Value Units Site Lat Long
#> <chr> <date> <chr> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 BC08KA0007 2012-01-24 ALKALINITY TOTAL CACO3 64.1 MG/L FRASER… 53.0 -119.
#> 2 BC08KA0007 2012-01-24 ALUMINUM TOTAL 42 UG/L FRASER… 53.0 -119.
#> 3 BC08KA0007 2012-01-24 ANTIMONY TOTAL 0.009 UG/L FRASER… 53.0 -119.
#> 4 BC08KA0007 2012-01-24 ARSENIC TOTAL 0.12 UG/L FRASER… 53.0 -119.
#> 5 BC08KA0007 2012-01-24 BARIUM TOTAL 15 UG/L FRASER… 53.0 -119.
#> 6 BC08KA0007 2012-01-24 BERYLLIUM TOTAL 0 UG/L FRASER… 53.0 -119.
#> 7 BC08KA0007 2012-01-24 BISMUTH TOTAL 0 UG/L FRASER… 53.0 -119.
#> 8 BC08KA0007 2012-01-24 BORON TOTAL 1.6 UG/L FRASER… 53.0 -119.
#> 9 BC08KA0007 2012-01-24 CADMIUM TOTAL 0.003 UG/L FRASER… 53.0 -119.
#> 10 BC08KA0007 2012-01-24 CALCIUM DISSOLVED 20 MG/L FRASER… 53.0 -119.
#> # ℹ 17,871 more rows
This leaves us with a dataset with 17881 rows. Before the data can be
assigned the water quality limits provided by wqbc
they
first have to be standardized and then cleaned.
The standardize_wqdata()
function converts any
non-standard variable names, checks (and if possible) converts the units
and removes any missing and negative values.
data2012 <- standardize_wqdata(data2012)
#> Standardizing water quality data...
#> Deleted 43 rows with negative values in Value.
#> Substituted 'ALKALINITY TOTAL CACO3' with 'Alkalinity Total', 'ALUMINUM DISSOLVED' with 'Aluminum Dissolved', 'ALUMINUM TOTAL' with 'Aluminum Total', 'AMMONIA DISSOLVED' with 'Ammonia Dissolved', 'ARSENIC DISSOLVED' with 'Arsenic Dissolved', 'ARSENIC TOTAL' with 'Arsenic Total', 'BARIUM DISSOLVED' with 'Barium Dissolved', 'BARIUM TOTAL' with 'Barium Total', 'BERYLLIUM TOTAL' with 'Beryllium Total', 'BORON DISSOLVED' with 'Boron Dissolved', 'BORON TOTAL' with 'Boron Total', 'CADMIUM DISSOLVED' with 'Cadmium Dissolved', 'CADMIUM TOTAL' with 'Cadmium Total', 'CALCIUM DISSOLVED' with 'Calcium Dissolved', 'CARBON DISSOLVED ORGANIC' with 'Dissolved Organic Carbon', 'CARBON TOTAL ORGANIC' with 'Organic Carbon Total', 'CHLORIDE DISSOLVED' with 'Chloride Dissolved', 'CHROMIUM DISSOLVED' with 'Chromium Dissolved', 'CHROMIUM TOTAL' with 'Chromium Total', 'COBALT DISSOLVED' with 'Cobalt Dissolved', 'COBALT TOTAL' with 'Cobalt Total', 'COLOUR TRUE' with 'Colour True', 'COPPER DISSOLVED' with 'Copper Dissolved', 'COPPER TOTAL' with 'Copper Total', 'FIELD PH' with 'pH', 'FLUORIDE DISSOLVED' with 'Fluoride Dissolved', 'HARDNESS TOTAL (CALCD.) CACO3' with 'Hardness Total', 'IRON DISSOLVED' with 'Iron Dissolved', 'IRON TOTAL' with 'Iron Total', 'LEAD DISSOLVED' with 'Lead Dissolved', 'LEAD TOTAL' with 'Lead Total', 'LITHIUM TOTAL' with 'Lithium Total', 'MANGANESE DISSOLVED' with 'Manganese Dissolved', 'MANGANESE TOTAL' with 'Manganese Total', 'MOLYBDENUM DISSOLVED' with 'Molybdenum Dissolved', 'MOLYBDENUM TOTAL' with 'Molybdenum Total', 'NICKEL DISSOLVED' with 'Nickel Dissolved', 'NICKEL TOTAL' with 'Nickel Total', 'NITROGEN DISSOLVED KJELDAHL' with 'Nitrogen Dissolved', 'NITROGEN DISSOLVED NITRATE' with 'Nitrogen Dissolved', 'NITROGEN DISSOLVED NO3 & NO2' with 'Nitrogen Dissolved', 'NITROGEN TOTAL' with 'Nitrogen Total', 'NITROGEN TOTAL DISSOLVED' with 'Nitrogen Dissolved', 'NITROGEN TOTAL KJELDAHL' with 'Nitrogen Total', 'OXYGEN DISSOLVED' with 'Oxygen Dissolved', 'PH' with 'pH', 'PHOSPHORUS TOTAL' with 'Phosphorus Total', 'PHOSPHORUS TOTAL DISSOLVED' with 'Phosphorus Dissolved', 'SALINITY' with 'Salinity', 'SELENIUM DISSOLVED' with 'Selenium Dissolved', 'SELENIUM TOTAL' with 'Selenium Total', 'SILVER DISSOLVED' with 'Silver Dissolved', 'SILVER TOTAL' with 'Silver Total', 'STRONTIUM TOTAL' with 'Strontium Total', 'SULPHATE DISSOLVED' with 'Sulphate Dissolved', 'TEMPERATURE AIR' with 'Temperature', 'TEMPERATURE WATER' with 'Temperature', 'THALLIUM DISSOLVED' with 'Thallium Dissolved', 'THALLIUM TOTAL' with 'Thallium Total', 'TURBIDITY' with 'Turbidity', 'URANIUM DISSOLVED' with 'Uranium Dissolved', 'URANIUM TOTAL' with 'Uranium Total', 'VANADIUM TOTAL' with 'Vanadium Total', 'ZINC DISSOLVED' with 'Zinc Dissolved' and 'ZINC TOTAL' with 'Zinc Total'.
#> Failed to substitute 'ALKALINITY GRAN CACO3', 'ANTIMONY DISSOLVED', 'ANTIMONY TOTAL', 'BERYLLIUM DISSOLVED', 'BISMUTH DISSOLVED', 'BISMUTH TOTAL', 'CARBON DISSOLVED INORGANIC', 'CERIUM DISSOLVED', 'CERIUM TOTAL', 'CESIUM DISSOLVED', 'CESIUM TOTAL', 'GALLIUM TOTAL', 'LANTHANUM DISSOLVED', 'LANTHANUM TOTAL', 'LITHIUM DISSOLVED', 'MAGNESIUM DISSOLVED', 'NIOBIUM DISSOLVED', 'NIOBIUM TOTAL', 'NITROGEN NITRITE', 'PLATINUM DISSOLVED', 'PLATINUM TOTAL', 'POTASSIUM DISSOLVED', 'RESIDUE FILTERABLE', 'RESIDUE NONFILTRABLE', 'RUBIDIUM DISSOLVED', 'RUBIDIUM TOTAL', 'SODIUM DISSOLVED', 'SPECIFIC CONDUCTANCE', 'SPECIFIC CONDUCTANCE - FIELD', 'STRONTIUM DISSOLVED', 'TIN DISSOLVED', 'TIN TOTAL', 'TUNGSTEN DISSOLVED', 'TUNGSTEN TOTAL', 'VANADIUM DISSOLVED' and 'YTTRIUM TOTAL'.
#> Deleted 5582 rows with missing values in Variable.
#> Substituted 'DEG C' with 'degC', 'MG/L' with 'mg/L', 'PH UNITS' with 'pH' and 'UG/L' with 'ug/L'.
#> Failed to substitute 'COLOR UNIT', 'PPT' and 'RATIO'.
#> Deleted 285 rows with missing values in Units.
#> Standardized water quality data.
As a result of the standardization, the 2012 Fraser dataset has been
reduced to 11971 observations that can in principle be assigned water
quality limits by the wqbc
package. However, first it is
necessary to deal with multiple observations by cleaning the data.
as_tibble(data2012)
#> # A tibble: 11,971 × 8
#> SiteID Date Variable Value Units Site Lat Long
#> <chr> <date> <chr> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 BC08KA0007 2012-01-24 Alkalinity Total 64.1 mg/L FRASER RIVER … 53.0 -119.
#> 2 BC08KA0007 2012-04-11 Alkalinity Total 65.1 mg/L FRASER RIVER … 53.0 -119.
#> 3 BC08KA0007 2012-04-18 Alkalinity Total 63.5 mg/L FRASER RIVER … 53.0 -119.
#> 4 BC08KA0007 2012-05-01 Alkalinity Total 59.5 mg/L FRASER RIVER … 53.0 -119.
#> 5 BC08KA0007 2012-05-16 Alkalinity Total 59.5 mg/L FRASER RIVER … 53.0 -119.
#> 6 BC08KA0007 2012-05-30 Alkalinity Total 58.8 mg/L FRASER RIVER … 53.0 -119.
#> 7 BC08KA0007 2012-06-11 Alkalinity Total 56.5 mg/L FRASER RIVER … 53.0 -119.
#> 8 BC08KA0007 2012-07-10 Alkalinity Total 43.2 mg/L FRASER RIVER … 53.0 -119.
#> 9 BC08KA0007 2012-07-16 Alkalinity Total 40.8 mg/L FRASER RIVER … 53.0 -119.
#> 10 BC08KA0007 2012-07-25 Alkalinity Total 40.9 mg/L FRASER RIVER … 53.0 -119.
#> # ℹ 11,961 more rows
After standardization, it is necessary to ensure that there are only
single values for each date for a given variable and this is done using
the function clean_wqdata()
.
data2012 <- clean_wqdata(data2012, by = "SiteID")
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.
The end result is a data frame with 10654 rows each of which
represents the average value for a single variable on a particular date
at an individual SiteID
.
as_tibble(data2012)
#> # A tibble: 10,654 × 7
#> SiteID Date Variable Value Units Outlier DetectionLimit
#> <chr> <date> <chr> <dbl> <chr> <lgl> <dbl>
#> 1 BC08KA0007 2012-01-24 Alkalinity Total 64.1 mg/L FALSE NA
#> 2 BC08KA0007 2012-04-11 Alkalinity Total 65.1 mg/L FALSE NA
#> 3 BC08KA0007 2012-04-18 Alkalinity Total 63.5 mg/L FALSE NA
#> 4 BC08KA0007 2012-05-01 Alkalinity Total 59.5 mg/L FALSE NA
#> 5 BC08KA0007 2012-05-16 Alkalinity Total 59.5 mg/L FALSE NA
#> 6 BC08KA0007 2012-05-30 Alkalinity Total 58.8 mg/L FALSE NA
#> 7 BC08KA0007 2012-06-11 Alkalinity Total 56.5 mg/L FALSE NA
#> 8 BC08KA0007 2012-07-10 Alkalinity Total 43.2 mg/L FALSE NA
#> 9 BC08KA0007 2012-07-16 Alkalinity Total 40.8 mg/L FALSE NA
#> 10 BC08KA0007 2012-07-25 Alkalinity Total 40.9 mg/L FALSE NA
#> # ℹ 10,644 more rows
Once the data have been standardized and cleansed the final task is
to determine the water quality limits for each observation using the
calc_limits()
function.
data2012 <- calc_limits(data2012, by = "SiteID", term = "short")
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.
#> Standardizing water quality data...
#> Standardized water quality data.
#> Calculating short-term water quality limits...
#> Dropped 147 values for Alkalinity Total without limits
#> Dropped 278 values for Aluminum Total without limits
#> Dropped 142 values for Ammonia Dissolved without limits
#> Dropped 34 values for Arsenic Dissolved without limits
#> Dropped 34 values for Barium Dissolved without limits
#> Dropped 279 values for Barium Total without limits
#> Dropped 279 values for Beryllium Total without limits
#> Dropped 34 values for Boron Dissolved without limits
#> Dropped 279 values for Boron Total without limits
#> Dropped 1 values for Cadmium Dissolved without limits
#> Dropped 279 values for Cadmium Total without limits
#> Dropped 284 values for Calcium Dissolved without limits
#> Dropped 34 values for Chromium Dissolved without limits
#> Dropped 278 values for Chromium Total without limits
#> Dropped 34 values for Cobalt Dissolved without limits
#> Dropped 1 values for Copper Dissolved without limits
#> Dropped 278 values for Copper Total without limits
#> Dropped 283 values for Dissolved Organic Carbon without limits
#> Dropped 162 values for Fluoride Dissolved without limits
#> Dropped 284 values for Hardness Total without limits
#> Dropped 33 values for Lead Dissolved without limits
#> Dropped 279 values for Lithium Total without limits
#> Dropped 34 values for Manganese Dissolved without limits
#> Dropped 34 values for Molybdenum Dissolved without limits
#> Dropped 34 values for Nickel Dissolved without limits
#> Dropped 278 values for Nickel Total without limits
#> Dropped 284 values for Nitrogen Dissolved without limits
#> Dropped 284 values for Nitrogen Total without limits
#> Dropped 23 values for Organic Carbon Total without limits
#> Dropped 192 values for Oxygen Dissolved without limits
#> Dropped 281 values for Phosphorus Dissolved without limits
#> Dropped 264 values for Phosphorus Total without limits
#> Dropped 34 values for Selenium Dissolved without limits
#> Dropped 279 values for Selenium Total without limits
#> Dropped 33 values for Silver Dissolved without limits
#> Dropped 279 values for Strontium Total without limits
#> Dropped 199 values for Sulphate Dissolved without limits
#> Dropped 279 values for Temperature without limits
#> Dropped 34 values for Thallium Dissolved without limits
#> Dropped 279 values for Thallium Total without limits
#> Dropped 284 values for Turbidity without limits
#> Dropped 34 values for Uranium Dissolved without limits
#> Dropped 279 values for Uranium Total without limits
#> Dropped 278 values for Vanadium Total without limits
#> Dropped 33 values for Zinc Dissolved without limits
#> Dropped 284 values for pH without limits
#> Calculated short-term water quality limits.
The final result is a data frame with 2602 rows each of which has an upper water quality limit.
as_tibble(data2012)
#> # A tibble: 2,602 × 8
#> SiteID Date Variable Value UpperLimit Units Outlier DetectionLimit
#> <chr> <date> <chr> <dbl> <dbl> <chr> <lgl> <dbl>
#> 1 BC08KA00… 2012-01-24 Arsenic… 0.12 5 ug/L NA NA
#> 2 BC08KA00… 2012-01-24 Chlorid… 0 600 mg/L NA NA
#> 3 BC08KA00… 2012-01-24 Cobalt … 0.2 110 ug/L NA NA
#> 4 BC08KA00… 2012-01-24 Iron To… 0.0303 1 mg/L NA NA
#> 5 BC08KA00… 2012-01-24 Lead To… 0.01 61.2 ug/L NA NA
#> 6 BC08KA00… 2012-01-24 Mangane… 0.0112 1.42 mg/L NA NA
#> 7 BC08KA00… 2012-01-24 Molybde… 0.00007 2 mg/L NA NA
#> 8 BC08KA00… 2012-01-24 Silver … 0 0.1 ug/L NA NA
#> 9 BC08KA00… 2012-01-24 Zinc To… 0.3 33 ug/L NA NA
#> 10 BC08KA00… 2012-04-11 Arsenic… 0.1 5 ug/L NA NA
#> # ℹ 2,592 more rows
The resultant data can be used to calculate water quality indices
using the calc_wqi()
function of the wqindex
package.