Title: | Additional 'tidyverse' Functions |
---|---|
Description: | Provides functions such as str_crush(), add_missing_column(), coalesce_data() and drop_na_all() that complement 'tidyverse' functionality or functions that provide alternative behaviors such as if_else2() and str_detect2(). |
Authors: | Joe Thorley [aut], Ayla Pearson [cre], Poisson Consulting [cph, fnd] |
Maintainer: | Ayla Pearson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9002 |
Built: | 2024-11-01 16:19:39 UTC |
Source: | https://github.com/poissonconsulting/tidyplus |
This is a convenient way to add one or more columns (if not already present) to an existing data frame. It is useful for ensuring that all required columns are present in a data frame.
add_missing_column( .data, ..., .before = NULL, .after = NULL, .name_repair = c("check_unique", "unique", "universal", "minimal") )
.data |
Data frame to append to. |
... |
< |
.before, .after |
One-based column index or column name where to add the new columns, default: after last column. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
It is a wrapper on tibble::add_column() that doesn't error if the column is already present.
The original data frame with missing columns added if not already present.
data <- tibble::tibble(x = 1:3, y = 3:1)
tibble::add_column(data, z = -1:1, w = 0)
add_missing_column(data, z = -1:1, .before = "y")

# add_column errors if already present
try(tibble::add_column(data, x = 4:6))

# add_missing_column silently ignores
add_missing_column(data, x = 4:6)
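The "add only if absent" behaviour described above can be illustrated without the package. The following is a minimal base R sketch, not the package's implementation; the helper name `add_if_missing` is hypothetical and the sketch ignores `.before`, `.after` and `.name_repair`.

```r
# Hypothetical helper (not part of tidyplus): add each named column
# only when a column of that name is not already present.
add_if_missing <- function(data, ...) {
  new_cols <- list(...)
  # keep only the columns whose names are absent from `data`
  new_cols <- new_cols[setdiff(names(new_cols), names(data))]
  for (nm in names(new_cols)) {
    data[[nm]] <- new_cols[[nm]]
  }
  data
}

data <- data.frame(x = 1:3, y = 3:1)
# `x` already exists, so it is left untouched; `z` is appended
out <- add_if_missing(data, x = 4:6, z = -1:1)
print(out)
```

In this sketch the existing `x` column keeps its original values, which mirrors the "silently ignores" comment in the examples.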
Coalesce values in multiple columns by finding the first non-missing value at each position. Coalesced columns are removed.
coalesce_data(x, coalesce = list(), quiet = FALSE)
x |
A data frame. |
coalesce |
A uniquely named list of character vectors where the names are the new column names and the values are the names of the columns to coalesce. If a single value is provided for a column it is treated as a regular expression. |
quiet |
A flag specifying whether to provide messages. |
Coalescence is performed in the order specified in the coalesce argument such that a column produced by coalescence can be further coalesced.
The original data frame with one or more columns coalesced into a new column.
data <- data.frame(x = c(1, NA, NA), y = c(NA, 3, NA), z = c(7, 8, 9), a = c(4, 5, 6))
coalesce_data(data, list(b = c("x", "y")), quiet = TRUE)
coalesce_data(data, list(z = c("y", "x"), d = c("z", "a")))
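To make "first non-missing value at each position" and the sequential nature of coalescence concrete, here is a small base R sketch on plain vectors. It is only an illustration under the assumption that each step behaves like an element-wise coalesce; `coalesce_cols` is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: element-wise first non-missing value across vectors
coalesce_cols <- function(...) {
  Reduce(
    function(first, second) ifelse(is.na(first), second, first),
    list(...)
  )
}

x <- c(1, NA, NA)
y <- c(NA, 3, NA)
a <- c(4, 5, 6)

# one step: take x where present, otherwise y
b <- coalesce_cols(x, y)

# two steps in order: the result of the first step feeds the second
z <- coalesce_cols(y, x)
d <- coalesce_cols(z, a)

print(b)
print(d)
```

The second pair of calls shows why the order of entries in `coalesce` matters: `d` can only fill the remaining gap from `a` once `z` has been produced.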
Collapse comments coercing each element to a string (character scalar) and then collapsing into a single string using the '. ' separator.
collapse_comments(...)
... |
objects to be collapsed into a string. |
A string of the collapsed comments.
collapse_comments("Saw fish", character(0), "Nice. .", NA_character_)
data <- data.frame(
  visit = c(1, 1, 2, 2),
  fish = 1:4,
  comment = c("Sunny day. ", "Skinny fish", "Lost boot", NA)
)
## Not run:
data |>
  dplyr::group_by(visit) |>
  dplyr::summarise(comment = collapse_comments(comment)) |>
  dplyr::ungroup()
## End(Not run)
This is a convenient way to drop uninformative rows from a data frame.
drop_na_all(data, ...)
data |
A data frame. |
... |
< |
The original data frame with rows for which all values are missing dropped.
tidyr::drop_na() and drop_uninformative_columns()
data <- tibble::tibble(
  a = c(NA, NA, NA),
  b = c(1, 1, NA),
  c = c(2, NA, NA)
)
drop_na_all(data)
drop_na_all(data, a, c)
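The rule "drop a row only when all of its values are missing" can be sketched in base R as follows. This is an illustration rather than the package's implementation, and it assumes that selecting columns restricts which columns are checked; `all_missing` is a hypothetical helper.

```r
data <- data.frame(a = c(NA, NA, NA), b = c(1, 1, NA), c = c(2, NA, NA))

# Hypothetical helper: TRUE for rows in which every checked column is missing
all_missing <- function(df) rowSums(!is.na(df)) == 0

# check all columns: only the third row is entirely missing
kept_all <- data[!all_missing(data), , drop = FALSE]

# check only columns a and c: rows two and three are missing in both
kept_ac <- data[!all_missing(data[c("a", "c")]), , drop = FALSE]

print(kept_all)
print(kept_ac)
```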
This is a convenient way to drop columns in which every entry is the same value (missing or not). If na_distinct = FALSE, it also drops columns that contain only one distinct value apart from missing values.
drop_uninformative_columns(data, na_distinct = TRUE)
data |
A data frame. |
na_distinct |
A flag specifying whether to treat missing values as distinct from other values. |
The original data frame with only informative columns.
data <- tibble::tibble(
  a = c(1, 1, 1),
  x = c(NA, NA, NA),
  b = c(1, 1, NA),
  z = c(1, 2, 2),
  e = c(1, 2, NA)
)
drop_uninformative_columns(data)
drop_uninformative_columns(data, na_distinct = FALSE)
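The effect of `na_distinct` can be illustrated with a per-column test in base R. This is a sketch of the rule as described above, not the package's code; `is_informative` is a hypothetical helper.

```r
# Hypothetical helper: a column is informative if it has more than one
# distinct value; when na_distinct is FALSE, missing values are ignored first.
is_informative <- function(x, na_distinct = TRUE) {
  if (!na_distinct) x <- x[!is.na(x)]
  length(unique(x)) > 1
}

data <- data.frame(
  a = c(1, 1, 1),
  x = c(NA, NA, NA),
  b = c(1, 1, NA),
  z = c(1, 2, 2),
  e = c(1, 2, NA)
)

keep_default <- vapply(data, is_informative, logical(1))
keep_ignore_na <- vapply(data, is_informative, logical(1), na_distinct = FALSE)

print(names(data)[keep_default])
print(names(data)[keep_ignore_na])
```

Column `b` is the one that changes status: it counts as informative only while the missing value is treated as a distinct value.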
Keeps only non-unique rows within a data frame.
duplicates(.data, ..., .keep_all = TRUE)
.data |
A data.frame. |
... |
Optional variables to use when determining non-uniqueness. If omitted, will use all variables in the data frame. |
.keep_all |
A flag specifying whether to keep all variables in .data. |
The original data frame with only non-unique rows.
data <- tibble::tibble(x = c(1, 2, 1, 1), y = c(1, 1, 1, 5))
duplicates(data)
duplicates(data, x)
duplicates(data, y)
duplicates(data, x, y)
duplicates(data, y, .keep_all = FALSE)
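"Non-unique" here means every row that has at least one twin, including the first occurrence. A minimal base R sketch of that idea, offered as an illustration and not as the package's implementation (`is_repeated` is a hypothetical helper):

```r
data <- data.frame(x = c(1, 2, 1, 1), y = c(1, 1, 1, 5))

# Hypothetical helper: flag a row if it repeats an earlier OR a later row,
# so that first occurrences are kept alongside their duplicates
is_repeated <- function(df) duplicated(df) | duplicated(df, fromLast = TRUE)

# non-unique on all columns
dup_rows <- data[is_repeated(data), , drop = FALSE]

# non-unique on x only, keeping all columns
dup_x <- data[is_repeated(data["x"]), , drop = FALSE]

print(dup_rows)
print(dup_x)
```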
Vectorised if-else that returns the first possibility where the condition is true and otherwise returns the second possibility (even if the condition is a missing value). When searching character vectors, an alternative solution is to use str_detect2().
if_else2(condition, true, false)
condition |
A logical vector |
true, false |
Vectors to use for Both
|
Where condition is TRUE, the matching value from true; where it's FALSE or NA, the matching value from false.
ifelse() and dplyr::if_else().
# consider the following data frame
data <- tibble::tibble(
  x = c(TRUE, FALSE, NA),
  y = c("x is false", NA, "hello")
)

# with a single vector if_else2() behaves the same as the default call to if_else().
dplyr::mutate(data,
  y1 = dplyr::if_else(y != "x is false", "x is true", y),
  y2 = if_else2(y != "x is false", "x is true", y)
)

# however in the case of a second vector the use of
# if_else2() does not introduce missing values
dplyr::mutate(data,
  x1 = dplyr::if_else(stringr::str_detect(y, "x is false"), FALSE, x),
  x2 = if_else2(stringr::str_detect(y, "x is false"), FALSE, x)
)

# in the case of regular expression matching an alternative is to use
# str_detect2()
dplyr::mutate(data,
  x3 = dplyr::if_else(str_detect2(y, "x is false"), FALSE, x)
)
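The difference between a missing condition yielding NA and a missing condition falling through to `false` can be seen with base R alone. The sketch below is an approximation of the documented behaviour built on ifelse(), not the package's implementation; `if_else_na_false` is a hypothetical name.

```r
# Hypothetical helper: a missing condition is treated like FALSE,
# so the value from `false` is returned at that position
if_else_na_false <- function(condition, true, false) {
  ifelse(!is.na(condition) & condition, true, false)
}

x <- c(TRUE, FALSE, NA)
y <- c("x is false", NA, "hello")
condition <- y == "x is false" # TRUE, NA, FALSE

# base ifelse() propagates the missing condition into the result
base_result <- ifelse(condition, FALSE, x)

# the sketch keeps the existing value of x where the condition is missing
sketch_result <- if_else_na_false(condition, FALSE, x)

print(base_result)
print(sketch_result)
```

At the second position the condition is NA: ifelse() returns NA there, while the sketch returns the existing value of `x`.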
Extracts the only distinct value from an atomic vector or throws an informative error if there are no values or multiple distinct values.
only(x, na_rm = FALSE)
x |
An atomic vector. |
na_rm |
A flag indicating whether to exclude missing values. |
only() is useful when summarizing a vector by group while checking the assumption that it is constant within the group.
The only distinct value from a vector; otherwise an error is thrown.
only(c(1, 1))
only(c(NA, NA))
only(c(1, 1, NA), na_rm = TRUE)
try(only(character(0)))
try(only(c(1, NA)))
try(only(c(1, 2)))
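The check-then-extract logic described above can be sketched in a few lines of base R. This is an illustrative approximation: `only_value` is a hypothetical helper and its error message is invented for the sketch, not taken from the package.

```r
# Hypothetical helper: return the single distinct value or fail
only_value <- function(x, na_rm = FALSE) {
  if (na_rm) x <- x[!is.na(x)]
  values <- unique(x)
  if (length(values) != 1L) {
    # message wording is an assumption made for this sketch
    stop("`x` must have exactly one distinct value.")
  }
  values
}

constant <- only_value(c(1, 1))
all_na <- only_value(c(NA, NA))
ignoring_na <- only_value(c(1, 1, NA), na_rm = TRUE)

# zero values, or more than one distinct value, are errors
empty_fails <- inherits(try(only_value(character(0)), silent = TRUE), "try-error")
mixed_fails <- inherits(try(only_value(c(1, NA)), silent = TRUE), "try-error")
```

Used inside a grouped summary, a function like this turns a silent assumption ("this column is constant within each group") into an explicit failure when the assumption does not hold.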
Unlike tidyr::replace_na(), replace_na_if() is only defined for vectors.
replace_na_if(x, condition, true)
x |
Vector with missing values to modify. |
condition |
A logical vector |
true |
The replacement values where condition is |
replace_na_if() is a wrapper on if_else2(is.na(x) & condition, true, x).
A modified version of x that replaces any missing values where condition is TRUE with true.
tidyr::replace_na() and if_else2()
data <- tibble::tibble(
  x = c(TRUE, FALSE, NA),
  y = c("x is false", NA, "x is false")
)

dplyr::mutate(data,
  x1 = tidyr::replace_na(x, FALSE),
  x3 = if_else2(is.na(x) & y == "x is false", FALSE, x),
  x4 = replace_na_if(x, y == "x is false", FALSE)
)
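The wrapper expression if_else2(is.na(x) & condition, true, x) can be mimicked in base R to show which positions are touched. This is a sketch under the assumption that a missing condition means "do not replace"; `replace_na_where` is a hypothetical name, not a package function.

```r
# Hypothetical helper: replace only where x is missing AND the
# condition is TRUE (a missing condition leaves x unchanged)
replace_na_where <- function(x, condition, true) {
  replace_here <- is.na(x) & !is.na(condition) & condition
  ifelse(replace_here, true, x)
}

x <- c(TRUE, FALSE, NA)
y <- c("x is false", NA, "x is false")

result <- replace_na_where(x, y == "x is false", FALSE)
print(result)
```

Only the third element changes: it is the only position where `x` is missing and the condition holds.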
str_crush(), which removes all whitespace from a string, is the logical extension to stringr::str_trim() and stringr::str_squish().
str_crush(string)
string |
Input vector. Either a character vector, or something coercible to one. |
str_crush() is considered too specialized to be part of stringr.
A character vector the same length as string.
stringr::str_trim() and stringr::str_squish()
str_crush(" String with trailing, middle, and leading white space\t")
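For comparison, the progression from trimming to removing all whitespace can be reproduced in base R. This sketch assumes that "whitespace" corresponds to the regular expression class `\s`; it illustrates the idea and is not the package's implementation.

```r
x <- " String with trailing, middle, and leading white space\t"

# trimming removes whitespace at the ends only
trimmed <- trimws(x)

# "crushing" removes every whitespace character, including interior ones
crushed <- gsub("\\s+", "", x)

print(trimmed)
print(crushed)
```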
Vectorised over string and pattern.
Equivalent to grepl(pattern, x) in that it returns FALSE for NAs (unlike stringr::str_detect()). This behavior is useful when searching comments, many of which are NA to indicate that no comment is present.
str_detect2(string, pattern, negate = FALSE)
string |
Input vector. Either a character vector, or something coercible to one. |
pattern |
Pattern to look for. The default interpretation is a regular expression, as described in
Match a fixed string (i.e. by comparing only bytes), using
Match character, word, line and sentence boundaries with
|
negate |
If |
A logical vector the same length as string/pattern.
grepl() and stringr::str_detect()
x <- c("b", NA, "ab")
pattern <- "^a"
grepl(pattern, x)
stringr::str_detect(x, pattern)
str_detect2(x, pattern)
Replaces multiple strings in a vector.
str_replace_vec(string, replace)
string |
Input vector. Either a character vector, or something coercible to one. |
replace |
A character vector where the names are the patterns to look for and the values are the replacement values. |
str_replace_vec() is a vectorized form of stringr::str_replace(). This is different from passing a named vector to stringr::str_replace_all(), which performs multiple replacements but to all pattern matches in a string.
A character vector the same length as string/pattern/replacement.
stringr::str_replace() and stringr::str_replace_all()
fruits <- c("two apples", "nine pears")
str_replace_vec(fruits, c("two" = "three", "nine" = "ten"))
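The distinction between replacing the first match of each pattern and replacing all matches can be shown with base R's sub() and gsub(). The sketch assumes that patterns are applied one after another in the order given; `replace_first_each` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: for each named pattern in turn, replace only the
# first match in every element of `string`
replace_first_each <- function(string, replace) {
  for (pattern in names(replace)) {
    string <- sub(pattern, replace[[pattern]], string)
  }
  string
}

fruits <- c("two apples", "nine pears")
replaced <- replace_first_each(fruits, c("two" = "three", "nine" = "ten"))

# with a repeated match, first-match and all-match replacement differ
first_only <- replace_first_each("two two apples", c("two" = "three"))
all_matches <- gsub("two", "three", "two two apples")

print(replaced)
print(first_only)
print(all_matches)
```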
Converts strings to snake case.
str_to_snake_case(x)
x |
Input string or multiple strings to be converted to snake case. |
String or strings converted to snake_case.
str_to_snake_case("string of words")
str_to_snake_case("StringOfWords")
str_to_snake_case("s!t$ring of %char^&act*ers")
str_to_snake_case(c("multiples of strings", "strings in multiple", "many strings"))
Wrapper on dplyr::summarise() that sets the default for the .groups argument to "keep". This means that all the groups set in dplyr::group_by() are retained, rather than the last grouping level being dropped.
summarise2(.data, ..., .by = NULL, .groups = "keep")
summarize2(.data, ..., .by = NULL, .groups = "keep")
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< The value can be:
Returning values with size 0 or >1 was
deprecated as of 1.1.0. Please use |
.by |
< |
.groups |
Grouping structure of the result.
When
In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to |
An object usually of the same type as .data.
The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups= argument; the output may be another grouped_df, a tibble or a rowwise data frame.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.
Count: n(), n_distinct()
The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.
This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
dplyr::summarise() and dplyr::summarize()
df <- data.frame(
  group = c("A", "A", "B", "B"),
  id = c(1, 1, 2, 2),
  value = c(10, 4, 20, 6)
)

# summarise2 doesn't produce message about groups
df |>
  dplyr::group_by(group, id) |>
  summarise2(mean = mean(value))

# summarise doesn't retain all the groups set in `group_by`
df |>
  dplyr::group_by(group, id) |>
  dplyr::summarise(mean = mean(value))
Convenience function for combining character columns.
unite_str(data, col, ..., sep = ". ", remove = TRUE)
data |
A data frame. |
col |
The name of the new column, as a string or symbol. This argument is passed by expression and supports
quasiquotation (you can unquote strings
and symbols). The name is captured from the expression with
|
... |
< |
sep |
Separator to use between values. |
remove |
If |
Blank values of "" are converted into missing values.
The original data frame with one or more columns combined as character vectors separated by a period.
tidyr::unite() and collapse_comments()
data <- tibble::tibble(x = c("good", "Saw fish.", "", NA), y = c("2021", NA, NA, NA))

# unite has poor handling of character vectors
tidyr::unite(data, "new", x, y, remove = FALSE)

unite_str(data, "new", x, y, remove = FALSE)