Getting Started with wqbc

The wqbc R package calculates water quality limits for British Columbia.

Previously it also calculated the CCME Water Quality Index but that functionality has been moved to the wqindex package.

Fraser Data

The data used in this demonstration are from the Fraser River basin (data available here under the Candian Open Government License.

To load the wqbc package and the fraser data run

library(wqbc)
data(fraser)

The fraser data is organized so that each row corresponds to one observation.

library(tibble) # for prettier printing of tibbles
print(fraser)
#> # A tibble: 242,631 × 8
#>    SiteID     Date       Variable                 Value Units Site     Lat  Long
#>    <chr>      <date>     <chr>                    <dbl> <chr> <chr>  <dbl> <dbl>
#>  1 BC08KA0007 2000-01-12 ALKALINITY TOTAL CACO3 59.1    MG/L  FRASE…  53.0 -119.
#>  2 BC08KA0007 2000-01-12 ALUMINUM TOTAL          0.061  MG/L  FRASE…  53.0 -119.
#>  3 BC08KA0007 2000-01-12 BARIUM EXTRACTABLE     14.5    UG/L  FRASE…  53.0 -119.
#>  4 BC08KA0007 2000-01-12 BARIUM TOTAL            0.0141 MG/L  FRASE…  53.0 -119.
#>  5 BC08KA0007 2000-01-12 BERYLLIUM EXTRACTABLE   0      UG/L  FRASE…  53.0 -119.
#>  6 BC08KA0007 2000-01-12 BERYLLIUM TOTAL         0      UG/L  FRASE…  53.0 -119.
#>  7 BC08KA0007 2000-01-12 BORON EXTRACTABLE       0.77   UG/L  FRASE…  53.0 -119.
#>  8 BC08KA0007 2000-01-12 CADMIUM EXTRACTABLE     0.007  UG/L  FRASE…  53.0 -119.
#>  9 BC08KA0007 2000-01-12 CADMIUM TOTAL           0      MG/L  FRASE…  53.0 -119.
#> 10 BC08KA0007 2000-01-12 CALCIUM EXTRACTABLE    19.7    MG/L  FRASE…  53.0 -119.
#> # ℹ 242,621 more rows

As the fraser data is a large dataset (242631 rows) we’ll just use the data from 2012.

data2012 <- dplyr::filter(fraser, lubridate::year(Date) == 2012)
data2012
#> # A tibble: 17,881 × 8
#>    SiteID     Date       Variable                Value Units Site      Lat  Long
#>    <chr>      <date>     <chr>                   <dbl> <chr> <chr>   <dbl> <dbl>
#>  1 BC08KA0007 2012-01-24 ALKALINITY TOTAL CACO3 64.1   MG/L  FRASER…  53.0 -119.
#>  2 BC08KA0007 2012-01-24 ALUMINUM TOTAL         42     UG/L  FRASER…  53.0 -119.
#>  3 BC08KA0007 2012-01-24 ANTIMONY TOTAL          0.009 UG/L  FRASER…  53.0 -119.
#>  4 BC08KA0007 2012-01-24 ARSENIC TOTAL           0.12  UG/L  FRASER…  53.0 -119.
#>  5 BC08KA0007 2012-01-24 BARIUM TOTAL           15     UG/L  FRASER…  53.0 -119.
#>  6 BC08KA0007 2012-01-24 BERYLLIUM TOTAL         0     UG/L  FRASER…  53.0 -119.
#>  7 BC08KA0007 2012-01-24 BISMUTH TOTAL           0     UG/L  FRASER…  53.0 -119.
#>  8 BC08KA0007 2012-01-24 BORON TOTAL             1.6   UG/L  FRASER…  53.0 -119.
#>  9 BC08KA0007 2012-01-24 CADMIUM TOTAL           0.003 UG/L  FRASER…  53.0 -119.
#> 10 BC08KA0007 2012-01-24 CALCIUM DISSOLVED      20     MG/L  FRASER…  53.0 -119.
#> # ℹ 17,871 more rows

This leaves us with a dataset with 17881 rows. Before the data can be assigned the water quality limits provided by wqbc they first have to be standardized and then cleaned.

Standardizing Data

The standardize_wqdata() function converts any non-standard variable names, checks (and if possible) converts the units and removes any missing and negative values.

data2012 <- standardize_wqdata(data2012)
#> Standardizing water quality data...
#> Deleted 43 rows with negative values in Value.
#> Substituted 'ALKALINITY TOTAL CACO3' with 'Alkalinity Total', 'ALUMINUM DISSOLVED' with 'Aluminum Dissolved', 'ALUMINUM TOTAL' with 'Aluminum Total', 'AMMONIA DISSOLVED' with 'Ammonia Dissolved', 'ARSENIC DISSOLVED' with 'Arsenic Dissolved', 'ARSENIC TOTAL' with 'Arsenic Total', 'BARIUM DISSOLVED' with 'Barium Dissolved', 'BARIUM TOTAL' with 'Barium Total', 'BERYLLIUM TOTAL' with 'Beryllium Total', 'BORON DISSOLVED' with 'Boron Dissolved', 'BORON TOTAL' with 'Boron Total', 'CADMIUM DISSOLVED' with 'Cadmium Dissolved', 'CADMIUM TOTAL' with 'Cadmium Total', 'CALCIUM DISSOLVED' with 'Calcium Dissolved', 'CARBON DISSOLVED ORGANIC' with 'Dissolved Organic Carbon', 'CARBON TOTAL ORGANIC' with 'Organic Carbon Total', 'CHLORIDE DISSOLVED' with 'Chloride Dissolved', 'CHROMIUM DISSOLVED' with 'Chromium Dissolved', 'CHROMIUM TOTAL' with 'Chromium Total', 'COBALT DISSOLVED' with 'Cobalt Dissolved', 'COBALT TOTAL' with 'Cobalt Total', 'COLOUR TRUE' with 'Colour True', 'COPPER DISSOLVED' with 'Copper Dissolved', 'COPPER TOTAL' with 'Copper Total', 'FIELD PH' with 'pH', 'FLUORIDE DISSOLVED' with 'Fluoride Dissolved', 'HARDNESS TOTAL (CALCD.) CACO3' with 'Hardness Total', 'IRON DISSOLVED' with 'Iron Dissolved', 'IRON TOTAL' with 'Iron Total', 'LEAD DISSOLVED' with 'Lead Dissolved', 'LEAD TOTAL' with 'Lead Total', 'LITHIUM TOTAL' with 'Lithium Total', 'MANGANESE DISSOLVED' with 'Manganese Dissolved', 'MANGANESE TOTAL' with 'Manganese Total', 'MOLYBDENUM DISSOLVED' with 'Molybdenum Dissolved', 'MOLYBDENUM TOTAL' with 'Molybdenum Total', 'NICKEL DISSOLVED' with 'Nickel Dissolved', 'NICKEL TOTAL' with 'Nickel Total', 'NITROGEN DISSOLVED KJELDAHL' with 'Nitrogen Dissolved', 'NITROGEN DISSOLVED NITRATE' with 'Nitrogen Dissolved', 'NITROGEN DISSOLVED NO3 & NO2' with 'Nitrogen Dissolved', 'NITROGEN TOTAL' with 'Nitrogen Total', 'NITROGEN TOTAL DISSOLVED' with 'Nitrogen Dissolved', 'NITROGEN TOTAL KJELDAHL' with 'Nitrogen Total', 'OXYGEN DISSOLVED' with 'Oxygen Dissolved', 'PH' with 'pH', 'PHOSPHORUS TOTAL' with 'Phosphorus Total', 'PHOSPHORUS TOTAL DISSOLVED' with 'Phosphorus Dissolved', 'SALINITY' with 'Salinity', 'SELENIUM DISSOLVED' with 'Selenium Dissolved', 'SELENIUM TOTAL' with 'Selenium Total', 'SILVER DISSOLVED' with 'Silver Dissolved', 'SILVER TOTAL' with 'Silver Total', 'STRONTIUM TOTAL' with 'Strontium Total', 'SULPHATE DISSOLVED' with 'Sulphate Dissolved', 'TEMPERATURE AIR' with 'Temperature', 'TEMPERATURE WATER' with 'Temperature', 'THALLIUM DISSOLVED' with 'Thallium Dissolved', 'THALLIUM TOTAL' with 'Thallium Total', 'TURBIDITY' with 'Turbidity', 'URANIUM DISSOLVED' with 'Uranium Dissolved', 'URANIUM TOTAL' with 'Uranium Total', 'VANADIUM TOTAL' with 'Vanadium Total', 'ZINC DISSOLVED' with 'Zinc Dissolved' and 'ZINC TOTAL' with 'Zinc Total'.
#> Failed to substitute 'ALKALINITY GRAN CACO3', 'ANTIMONY DISSOLVED', 'ANTIMONY TOTAL', 'BERYLLIUM DISSOLVED', 'BISMUTH DISSOLVED', 'BISMUTH TOTAL', 'CARBON DISSOLVED INORGANIC', 'CERIUM DISSOLVED', 'CERIUM TOTAL', 'CESIUM DISSOLVED', 'CESIUM TOTAL', 'GALLIUM TOTAL', 'LANTHANUM DISSOLVED', 'LANTHANUM TOTAL', 'LITHIUM DISSOLVED', 'MAGNESIUM DISSOLVED', 'NIOBIUM DISSOLVED', 'NIOBIUM TOTAL', 'NITROGEN NITRITE', 'PLATINUM DISSOLVED', 'PLATINUM TOTAL', 'POTASSIUM DISSOLVED', 'RESIDUE FILTERABLE', 'RESIDUE NONFILTRABLE', 'RUBIDIUM DISSOLVED', 'RUBIDIUM TOTAL', 'SODIUM DISSOLVED', 'SPECIFIC CONDUCTANCE', 'SPECIFIC CONDUCTANCE - FIELD', 'STRONTIUM DISSOLVED', 'TIN DISSOLVED', 'TIN TOTAL', 'TUNGSTEN DISSOLVED', 'TUNGSTEN TOTAL', 'VANADIUM DISSOLVED' and 'YTTRIUM TOTAL'.
#> Deleted 5582 rows with missing values in Variable.
#> Substituted 'DEG C' with 'degC', 'MG/L' with 'mg/L', 'PH UNITS' with 'pH' and 'UG/L' with 'ug/L'.
#> Failed to substitute 'COLOR UNIT', 'PPT' and 'RATIO'.
#> Deleted 285 rows with missing values in Units.
#> Standardized water quality data.

As a result of the standardization, the 2012 Fraser dataset has been reduced to 11971 observations that can in principle be assigned water quality limits by the wqbc package. However, first it is necessary to deal with multiple observations by cleaning the data.

as_tibble(data2012)
#> # A tibble: 11,971 × 8
#>    SiteID     Date       Variable         Value Units Site             Lat  Long
#>    <chr>      <date>     <chr>            <dbl> <chr> <chr>          <dbl> <dbl>
#>  1 BC08KA0007 2012-01-24 Alkalinity Total  64.1 mg/L  FRASER RIVER …  53.0 -119.
#>  2 BC08KA0007 2012-04-11 Alkalinity Total  65.1 mg/L  FRASER RIVER …  53.0 -119.
#>  3 BC08KA0007 2012-04-18 Alkalinity Total  63.5 mg/L  FRASER RIVER …  53.0 -119.
#>  4 BC08KA0007 2012-05-01 Alkalinity Total  59.5 mg/L  FRASER RIVER …  53.0 -119.
#>  5 BC08KA0007 2012-05-16 Alkalinity Total  59.5 mg/L  FRASER RIVER …  53.0 -119.
#>  6 BC08KA0007 2012-05-30 Alkalinity Total  58.8 mg/L  FRASER RIVER …  53.0 -119.
#>  7 BC08KA0007 2012-06-11 Alkalinity Total  56.5 mg/L  FRASER RIVER …  53.0 -119.
#>  8 BC08KA0007 2012-07-10 Alkalinity Total  43.2 mg/L  FRASER RIVER …  53.0 -119.
#>  9 BC08KA0007 2012-07-16 Alkalinity Total  40.8 mg/L  FRASER RIVER …  53.0 -119.
#> 10 BC08KA0007 2012-07-25 Alkalinity Total  40.9 mg/L  FRASER RIVER …  53.0 -119.
#> # ℹ 11,961 more rows

Cleaning Data

After standardization, it is necessary to ensure that there are only single values for each date for a given variable and this is done using the function clean_wqdata().

data2012 <- clean_wqdata(data2012, by = "SiteID")
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.

The end result is a data frame with 10654 rows each of which represents the average value for a single variable on a particular date at an individual SiteID.

as_tibble(data2012)
#> # A tibble: 10,654 × 7
#>    SiteID     Date       Variable         Value Units Outlier DetectionLimit
#>    <chr>      <date>     <chr>            <dbl> <chr> <lgl>            <dbl>
#>  1 BC08KA0007 2012-01-24 Alkalinity Total  64.1 mg/L  FALSE               NA
#>  2 BC08KA0007 2012-04-11 Alkalinity Total  65.1 mg/L  FALSE               NA
#>  3 BC08KA0007 2012-04-18 Alkalinity Total  63.5 mg/L  FALSE               NA
#>  4 BC08KA0007 2012-05-01 Alkalinity Total  59.5 mg/L  FALSE               NA
#>  5 BC08KA0007 2012-05-16 Alkalinity Total  59.5 mg/L  FALSE               NA
#>  6 BC08KA0007 2012-05-30 Alkalinity Total  58.8 mg/L  FALSE               NA
#>  7 BC08KA0007 2012-06-11 Alkalinity Total  56.5 mg/L  FALSE               NA
#>  8 BC08KA0007 2012-07-10 Alkalinity Total  43.2 mg/L  FALSE               NA
#>  9 BC08KA0007 2012-07-16 Alkalinity Total  40.8 mg/L  FALSE               NA
#> 10 BC08KA0007 2012-07-25 Alkalinity Total  40.9 mg/L  FALSE               NA
#> # ℹ 10,644 more rows

Calculating Limits

Once the data have been standardized and cleansed the final task is to determine the water quality limits for each observation using the calc_limits() function.

data2012 <- calc_limits(data2012, by = "SiteID", term = "short")
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.
#> Standardizing water quality data...
#> Standardized water quality data.
#> Calculating short-term water quality limits...
#> Dropped 147 values for Alkalinity Total without limits
#> Dropped 278 values for Aluminum Total without limits
#> Dropped 142 values for Ammonia Dissolved without limits
#> Dropped 34 values for Arsenic Dissolved without limits
#> Dropped 34 values for Barium Dissolved without limits
#> Dropped 279 values for Barium Total without limits
#> Dropped 279 values for Beryllium Total without limits
#> Dropped 34 values for Boron Dissolved without limits
#> Dropped 279 values for Boron Total without limits
#> Dropped 1 values for Cadmium Dissolved without limits
#> Dropped 279 values for Cadmium Total without limits
#> Dropped 284 values for Calcium Dissolved without limits
#> Dropped 34 values for Chromium Dissolved without limits
#> Dropped 278 values for Chromium Total without limits
#> Dropped 34 values for Cobalt Dissolved without limits
#> Dropped 1 values for Copper Dissolved without limits
#> Dropped 278 values for Copper Total without limits
#> Dropped 283 values for Dissolved Organic Carbon without limits
#> Dropped 162 values for Fluoride Dissolved without limits
#> Dropped 284 values for Hardness Total without limits
#> Dropped 33 values for Lead Dissolved without limits
#> Dropped 279 values for Lithium Total without limits
#> Dropped 34 values for Manganese Dissolved without limits
#> Dropped 34 values for Molybdenum Dissolved without limits
#> Dropped 34 values for Nickel Dissolved without limits
#> Dropped 278 values for Nickel Total without limits
#> Dropped 284 values for Nitrogen Dissolved without limits
#> Dropped 284 values for Nitrogen Total without limits
#> Dropped 23 values for Organic Carbon Total without limits
#> Dropped 192 values for Oxygen Dissolved without limits
#> Dropped 281 values for Phosphorus Dissolved without limits
#> Dropped 264 values for Phosphorus Total without limits
#> Dropped 34 values for Selenium Dissolved without limits
#> Dropped 279 values for Selenium Total without limits
#> Dropped 33 values for Silver Dissolved without limits
#> Dropped 279 values for Strontium Total without limits
#> Dropped 199 values for Sulphate Dissolved without limits
#> Dropped 279 values for Temperature without limits
#> Dropped 34 values for Thallium Dissolved without limits
#> Dropped 279 values for Thallium Total without limits
#> Dropped 284 values for Turbidity without limits
#> Dropped 34 values for Uranium Dissolved without limits
#> Dropped 279 values for Uranium Total without limits
#> Dropped 278 values for Vanadium Total without limits
#> Dropped 33 values for Zinc Dissolved without limits
#> Dropped 284 values for pH without limits
#> Calculated short-term water quality limits.

The final result is a data frame with 2602 rows each of which has an upper water quality limit.

as_tibble(data2012)
#> # A tibble: 2,602 × 8
#>    SiteID    Date       Variable   Value UpperLimit Units Outlier DetectionLimit
#>    <chr>     <date>     <chr>      <dbl>      <dbl> <chr> <lgl>            <dbl>
#>  1 BC08KA00… 2012-01-24 Arsenic… 0.12          5    ug/L  NA                  NA
#>  2 BC08KA00… 2012-01-24 Chlorid… 0           600    mg/L  NA                  NA
#>  3 BC08KA00… 2012-01-24 Cobalt … 0.2         110    ug/L  NA                  NA
#>  4 BC08KA00… 2012-01-24 Iron To… 0.0303        1    mg/L  NA                  NA
#>  5 BC08KA00… 2012-01-24 Lead To… 0.01         61.2  ug/L  NA                  NA
#>  6 BC08KA00… 2012-01-24 Mangane… 0.0112        1.42 mg/L  NA                  NA
#>  7 BC08KA00… 2012-01-24 Molybde… 0.00007       2    mg/L  NA                  NA
#>  8 BC08KA00… 2012-01-24 Silver … 0             0.1  ug/L  NA                  NA
#>  9 BC08KA00… 2012-01-24 Zinc To… 0.3          33    ug/L  NA                  NA
#> 10 BC08KA00… 2012-04-11 Arsenic… 0.1           5    ug/L  NA                  NA
#> # ℹ 2,592 more rows

Calculating Water Quality Index

The resultant data can be used to calculate water quality indices using the calc_wqi() function of the wqindex package.