Package 'newdata' reference manual

Title:	Generate New Data Frames for Prediction
Description:	Generates new data frames for predictive purposes. By default, all specified variables vary across their range while all other variables are held constant at the default reference value. Types, classes, factor levels and time zones are always preserved. The user can specify the length of each sequence, require that only observed values and combinations are used and add new variables.
Authors:	Joe Thorley [aut, cre], Kirill Müller [aut], Ayla Pearson [aut], Nadine Hussein [ctb], Maëlle Salmon [ctb], Poisson Consulting [fnd, cph]
Maintainer:	Joe Thorley <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9023
Built:	2025-01-14 21:20:05 UTC
Source:	https://github.com/poissonconsulting/newdata

Generate New Data

Description

Generates a new data frame (in the form of a tibble) with each variable held constant or varying as a unique ordered sequence.

Usage

new_data(
  data,
  seq = character(0),
  ref = list(),
  obs_only = list(character(0)),
  length_out = 30
)

Arguments

`data`	The data frame to generate the new data from.
`seq`	A character vector of the variables in `data` to generate sequences for.
`ref`	A named list of reference values for variables that are not in `seq`. Deprecated; use `xnew_value()` instead.
`obs_only`	A list of character vectors indicating the sets of variables for which only observed combinations are allowed. If TRUE, `obs_only` is set to `seq`. Deprecated; use `xobs_only()` instead.
`length_out`	A count indicating the maximum length of sequences for all types of variables except logical, character, factor and ordered factors.

Details

Although superseded by xnew_data(), new_data() is maintained for backwards compatibility with existing code.

The call new_data(data, c("a", "b"), length_out = 30) is effectively a wrapper for xnew_data(data, a, b, .length_out = 30) that allows the column names to be passed as a character vector.
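To make the correspondence concrete, the base R sketch below builds, without evaluating, the xnew_data() call that matches a character vector of column names. It illustrates the idea only and is not how new_data() is implemented; because the call is never evaluated, it runs without the newdata package.

```r
# Column names supplied as strings, as new_data() accepts them.
cols <- c("a", "b")

# Convert each string to a symbol (a bare column name).
col_syms <- lapply(cols, as.name)

# Assemble, but do not evaluate, the equivalent xnew_data() call.
xnew_call <- as.call(c(
  list(as.name("xnew_data"), as.name("data")),
  col_syms,
  list(.length_out = 30)
))

print(xnew_call)
```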

Value

A tibble of the new data.

Examples

new_data(old_data, "int")
new_data(old_data, "dbl")
new_data(old_data, c("int", "dbl"))

Generate New Sequence

Description

Generate a new sequence of values. A sequence of values is used to predict the effect of a variable.

Usage

new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'logical'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'integer'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'double'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'character'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'factor'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'ordered'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'Date'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'POSIXct'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

## S3 method for class 'hms'
new_seq(x, length_out = NULL, ..., obs_only = NULL)

Arguments

`x`	The object to generate the sequence from.
`length_out`	The maximum length of the sequence.
`...`	These dots are for future extensions and must be empty.
`obs_only`	A flag specifying whether to only use observed values.

Details

By default the sequence of values for objects of class numeric is 30 evenly spaced values across the range of the data. Missing values are always removed unless NA is the only value or the object is zero length. The length of the sequence can be varied using the length_out argument: a length_out of 1 gives the reference value, and length_out can even be 0. For integer objects the sequence is made up of unique integers. For character objects it is the observed values sorted by how common they are and then by value. For factors it is the factor levels in order, with the trailing levels dropped first. For ordered factors the intermediate levels are dropped first. For Date vectors it is the unique dates; the same applies to hms vectors. For POSIXct vectors the time zone is preserved. For logical objects the longest possible sequence is c(TRUE, FALSE).
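The numeric, character, factor and ordered factor rules above can be approximated in base R as follows. This is a simplified sketch, not the newdata implementation: the helper names seq_double(), seq_character(), seq_factor() and seq_ordered() are hypothetical, and the choice of which intermediate levels an ordered factor keeps (evenly spaced positions that include the first and last level) is an assumption.

```r
# Illustrative base R sketch of the documented defaults of new_seq().
# Not the newdata implementation; edge cases and tie-breaking are assumptions.

# Doubles: evenly spaced values across the range of the data.
seq_double <- function(x, length_out = 30) {
  if (length(x) == 0 || length_out == 0) return(x[0])
  if (all(is.na(x))) return(x[1])       # NA is kept only if it is the only value
  x <- x[!is.na(x)]                     # otherwise missing values are removed
  if (length_out == 1) return(mean(x))  # the reference value (the mean)
  seq(min(x), max(x), length.out = length_out)
}

# Characters: observed values, most common first, ties broken by value.
seq_character <- function(x, length_out = Inf) {
  x <- x[!is.na(x)]
  vals <- unique(x)
  freq <- tabulate(match(x, vals), nbins = length(vals))
  vals <- vals[order(-freq, vals)]
  vals[seq_len(min(length_out, length(vals)))]
}

# Factors: the levels in order, dropping trailing levels first.
seq_factor <- function(x, length_out = Inf) {
  lev <- levels(x)
  keep <- lev[seq_len(min(length_out, length(lev)))]
  factor(keep, levels = lev)
}

# Ordered factors: dropping intermediate levels first (assumed here to mean
# evenly spaced level positions that always keep the first and last level).
seq_ordered <- function(x, length_out = Inf) {
  lev <- levels(x)
  n <- min(length_out, length(lev))
  idx <- unique(round(seq(1, length(lev), length.out = n)))
  factor(lev[idx], levels = lev, ordered = TRUE)
}
```

For example, seq_character(c("a", "c", "c", "b", "b"), length_out = 2) returns c("b", "c"), and seq_ordered() keeps the first and last of three levels when length_out = 2.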

Value

A vector of the same class as the object.

Methods (by class)

new_seq(logical): Generate new sequence of values for logical objects
new_seq(integer): Generate new sequence of values for integer objects
new_seq(double): Generate new sequence of values for double objects
new_seq(character): Generate new sequence of values for character objects
new_seq(factor): Generate new sequence of values for factors
new_seq(ordered): Generate new sequence of values for ordered factors
new_seq(Date): Generate new sequence of values for Date vectors
new_seq(POSIXct): Generate new sequence of values for POSIXct vectors
new_seq(hms): Generate new sequence of values for hms vectors

Examples

# by default the sequence of values for objects of class numeric
# is 30 evenly spaced values across the range of the data
new_seq(c(1, 4))
# missing values are always removed
new_seq(c(1, 4, NA))
# unless it's the only value
new_seq(NA_real_)
# or the object is zero length
new_seq(numeric())
# the length of the sequence can be varied using the length_out argument
new_seq(c(1, 4), length_out = 3)
new_seq(c(1, 4), length_out = 2)
# which gives the reference value when 1
new_seq(c(1, 4), length_out = 1)
# and can even be 0
new_seq(c(1, 4), length_out = 0)
# for integer objects the sequence is the unique integers
new_seq(c(1L, 4L))
new_seq(c(1L, 100L))
# for character objects it's the actual values sorted by
# how common they are followed by their actual value
new_seq(c("a", "c", "c", "b", "b"))
new_seq(c("a", "c", "c", "b", "b"), length_out = 2)
# for factors it's the factor levels in order
new_seq(factor(c("a", "b", "c", "c"), levels = c("b", "a", "g")))
# with the trailing levels dropped first
new_seq(factor(c("a", "b", "c", "c"), levels = c("b", "a", "g")),
  length_out = 2
)
# for ordered factors the intermediate levels are dropped first
new_seq(ordered(c("a", "b", "c", "c"), levels = c("b", "a", "g")),
  length_out = 2
)
# for Date vectors it's the unique dates
new_seq(as.Date(c("2000-01-01", "2000-01-04")))
# same for hms vectors
new_seq(hms::as_hms(c("00:00:01", "00:00:04")))
# for POSIXct vectors the time zone is preserved
new_seq(as.POSIXct(c("2000-01-01 00:00:01", "2000-01-01 00:00:04"),
  tz = "PST8PDT"
))
# for logical objects the longest possible sequence is `c(TRUE, FALSE)`
new_seq(c(TRUE, TRUE, FALSE), length_out = 3)

Generate New Reference Value

Description

Generate a new reference value for a vector.

Usage

new_value(x, ..., obs_only = NULL)

Arguments

`x`	The object to generate the reference value from.
`...`	These dots are for future extensions and must be empty.
`obs_only`	A flag specifying whether to only use observed values.

Details

By default the reference value for double vectors is the mean, unless obs_only = TRUE, in which case it is the median of the unique values. For integer vectors it is the floored mean, unless obs_only = TRUE, in which case it is also the median of the unique values. For character vectors it is the minimum of the most common values, while for factors it is the first level. Ordered factors, Dates, times (hms), POSIXct and logical vectors are treated like integers. The factor levels and time zone are preserved.
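A base R sketch of the default (obs_only = FALSE) reference values for doubles, integers, characters and factors follows. The helper names are hypothetical and this is not the package code; the obs_only = TRUE behaviour is not covered.

```r
# Illustrative base R sketch of the default reference values of new_value().
# Not the newdata implementation.

# Doubles: the mean.
ref_double <- function(x) mean(x, na.rm = TRUE)

# Integers: the floored mean, kept as an integer.
ref_integer <- function(x) as.integer(floor(mean(x, na.rm = TRUE)))

# Characters: the minimum of the most common values.
ref_character <- function(x) {
  x <- x[!is.na(x)]
  vals <- unique(x)
  freq <- tabulate(match(x, vals), nbins = length(vals))
  min(vals[freq == max(freq)])
}

# Factors: the first level, with all levels preserved.
ref_factor <- function(x) factor(levels(x)[1], levels = levels(x))

ref_double(c(1, 4))                         # 2.5
ref_integer(c(1L, 4L))                      # 2L
ref_character(c("a", "b", "c", "c", "b"))   # "b"
ref_factor(factor(c("a", "b", "c", "c"), levels = c("b", "a", "g")))  # b, levels b a g
```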

Value

A scalar of the same class as the object.

Examples

# the reference value for objects of class numeric is the mean
new_value(c(1, 4))
# unless obs_only = TRUE, in which case it's the median of the unique values
new_value(c(1, 4), obs_only = TRUE)

# for integer objects it's the floored mean
new_value(c(1L, 4L))

# for character objects it's the minimum of the most common values
new_value(c("a", "b", "c", "c", "b"))

# for factors it's the first level and the factor levels are preserved
new_value(factor(c("a", "b", "c", "c"), levels = c("b", "a", "g")))

# other classes are treated like integers
new_value(ordered(c("a", "b", "c", "c"), levels = c("b", "a", "g")))
new_value(as.Date(c("2000-01-01", "2000-01-04")))
new_value(hms::as_hms(c("00:00:01", "00:00:04")))
new_value(as.POSIXct(c("2000-01-01 00:00:01", "2000-01-01 00:00:04"),
  tz = "PST8PDT"
))
new_value(c(TRUE, FALSE, TRUE))

Example Data

Description

An example tibble of 'old' data with one column for each supported vector class.

Usage

old_data

Format

An object of class tbl_df (inherits from tbl, data.frame) with 3 rows and 9 columns.

Details

lgl: A logical vector.
int: An integer vector.
dbl: A double vector.
chr: A character vector.
fct: A factor.
ord: An ordered factor.
dte: A Date vector.
dtt: A POSIXct vector.
hms: A hms vector.

Examples

old_data

Cast New Values for `xnew_data()`

Description

Casts a sequence of values to the same class as the original vector.

Usage

xcast(..., .data = xnew_data_env$data)

Arguments

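`...`	Name-value pairs, where each name is a column in `.data` and each value is cast to the class of that column.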
`.data`	Normally defined by `xnew_data()`, users must pass a data frame or tibble if using this function directly.

Details

xcast() is a wrapper function on vctrs::vec_cast() for use in xnew_data() to avoid having to repeat the column name.
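For factor columns, the effect of the cast can be sketched in base R by rebuilding the value as a factor with the column's levels and rejecting values that are not existing levels, in the spirit of a strict vctrs::vec_cast(). The helper cast_to_factor() is hypothetical, handles factors only and does not call vctrs; it is not the xcast() implementation.

```r
# Illustrative sketch: cast new values to the class of an existing factor
# column, keeping its levels (and orderedness).
cast_to_factor <- function(value, template) {
  out <- factor(value, levels = levels(template), ordered = is.ordered(template))
  # Refuse values that are not existing levels rather than silently making NAs.
  if (any(is.na(out) & !is.na(value))) {
    stop("`value` contains values that are not levels of `template`.")
  }
  out
}

data <- data.frame(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  annual = factor(c(1, 3, 5, 8), levels = c(1, 3, 5, 8))
)

cast_to_factor("before", data$period)
cast_to_factor(c("1", "3"), data$annual)
```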

Examples

data <- tibble::tibble(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  annual = factor(c(1, 3, 5, 8), levels = c(1, 3, 5, 8))
)

xnew_data(data, xcast(period = "before"))
xnew_data(data, xcast(period = "before", annual = c("1", "3")))

Generate New Data by Expansion

Description

Generates a new data frame (in the form of a tibble) by expanding the specified variables.

Usage

xnew_data(.data, ..., .length_out = NULL)

Arguments

`.data`	The data frame to generate the new data from.
`...`	A list of variables to generate sequences for.
`.length_out`	NULL or a count specifying the maximum length of all sequences.

Details

By default, all specified variables vary across their range while all other variables are held constant at their reference value. Types, classes, factor levels and time zones are always preserved. The user can specify the length of each sequence, require that only observed values and combinations are used and add new variables.
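The expansion rule can be sketched in base R with expand.grid(). The helpers ref_value(), seq_values() and expand_sketch() are hypothetical simplifications of new_value(), new_seq() and xnew_data(): they only handle factors, integers and doubles, and rounding an evenly spaced sequence for integers is an assumption. The data mirror the example below, using a base data.frame instead of a tibble.

```r
# Illustrative base R sketch of the expansion performed by xnew_data():
# named variables vary across their range, all others are held at a
# reference value, and every combination is produced. Not the package code.

ref_value <- function(x) {
  if (is.factor(x)) {
    factor(levels(x)[1], levels = levels(x), ordered = is.ordered(x))
  } else if (is.integer(x)) {
    as.integer(floor(mean(x, na.rm = TRUE)))
  } else {
    mean(x, na.rm = TRUE)
  }
}

seq_values <- function(x, length_out = 30) {
  if (is.factor(x)) {
    factor(levels(x), levels = levels(x), ordered = is.ordered(x))
  } else if (is.integer(x)) {
    # Assumption: round an evenly spaced sequence and keep the unique integers.
    unique(as.integer(round(seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE),
      length.out = length_out
    ))))
  } else {
    seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = length_out)
  }
}

expand_sketch <- function(data, vary = character(0)) {
  cols <- lapply(names(data), function(nm) {
    if (nm %in% vary) seq_values(data[[nm]]) else ref_value(data[[nm]])
  })
  names(cols) <- names(data)
  expand.grid(cols, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
}

data <- data.frame(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  count = c(0L, 1L, 5L, 4L),
  annual = factor(c(2, 3, 5, 8), levels = c(1, 2, 3, 5, 8))
)

expand_sketch(data)
expand_sketch(data, "annual")
expand_sketch(data, c("period", "annual"))
```

Here expand_sketch(data, "annual") gives five rows, one per level of annual, with period held at "before" and count at its floored mean of 2.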

Examples

data <- tibble::tibble(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  count = c(0L, 1L, 5L, 4L),
  annual = factor(c(2, 3, 5, 8), levels = c(1, 2, 3, 5, 8))
)

# By default all variables are held constant at their reference value.
xnew_data(data)

# Specifying a variable causes it to vary across its range.
xnew_data(data, annual)

# The user can specify the length of a sequence.
xnew_data(data, xnew_seq(annual, length_out = 3))

# And only allow observed values.
xnew_data(data, xnew_seq(annual, length_out = 3, obs_only = TRUE))

# With multiple variables all combinations are produced
xnew_data(data, period, xnew_seq(annual, length_out = 3, obs_only = TRUE))

# To only preserve observed combinations use
xnew_data(data, xobs_only(period, annual))

# And to cast the values use
xnew_data(data, xcast(annual = "3"))

Generate New Sequence for `xnew_data()`

Description

Generate a new sequence of values for a vector.

Usage

xnew_seq(x, ...)

Arguments

`x`	The object to generate the sequence from.
`...`	Additional arguments passed to `new_seq()`.

Details

xnew_seq() is a wrapper function on new_seq() for use in xnew_data() to avoid having to repeat the column name.

Examples


data <- tibble::tibble(
  a = c(1L, 3L, 4L),
  b = c(4, 4.5, 6),
  d = c("a", "b", "c")
)

xnew_data(data, a, b = new_seq(b, length_out = 3), xnew_seq(d, length_out = 2))

Generate New Reference Value for `xnew_data()`

Description

Generate a new reference value for a vector.

Usage

xnew_value(x, ...)

Arguments

`x`	The object to generate the reference value from.
`...`	Additional arguments passed to `new_value()`.

Details

xnew_value() is a wrapper function on new_value() for use in xnew_data() to avoid having to repeat the column name.

Examples

data <- tibble::tibble(
  a = c(1L, 3L, 4L),
  b = c(4, 4.5, 6),
  d = c("a", "b", "c")
)

xnew_data(data, a, b = new_value(b), xnew_value(d))

Generate Observed Combinations Only

Description

Generate only the combinations of the specified variables that are observed in the data.

Usage

xobs_only(..., .length_out = NULL, .data = xnew_data_env$data)

Arguments

`...`	One or more variables to generate combinations for.
`.length_out`	A count to override the default length of sequences.
`.data`	Normally defined by `xnew_data()`, users must pass a data frame or tibble if using this function directly.
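The difference between all combinations and observed combinations can be sketched in base R: expand.grid() crosses every level, while unique() keeps only the rows that occur in the data. This illustrates the idea only and is not how xobs_only() is implemented.

```r
# Illustrative base R sketch: all combinations versus observed combinations.
data <- data.frame(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  annual = factor(c(1, 3, 5, 8), levels = c(1, 3, 5, 8))
)

# Every level of period crossed with every level of annual.
all_combos <- expand.grid(
  period = factor(levels(data$period), levels = levels(data$period)),
  annual = factor(levels(data$annual), levels = levels(data$annual)),
  KEEP.OUT.ATTRS = FALSE
)

# Only the combinations that actually occur in the data.
obs_combos <- unique(data[c("period", "annual")])

nrow(all_combos) # 8
nrow(obs_combos) # 4
```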

Examples

data <- tibble::tibble(
  period = factor(c("before", "before", "after", "after"),
    levels = c("before", "after")
  ),
  annual = factor(c(1, 3, 5, 8), levels = c(1, 3, 5, 8))
)
xnew_data(data, period, annual)
xnew_data(data, xobs_only(period, annual))
xnew_data(data, xobs_only(period, xnew_seq(annual, length_out = 3)))
