Title: | Batch Process Files |
---|---|
Description: | Processes multiple files with a user-supplied function. The key design principle is that only files that were last modified before the directory was configured are processed. A hidden file stores the configuration time, the function and related settings, while successfully processed files are automatically touched to update their modification date. As a result, batch processing can be stopped and restarted, and any files created (or modified or deleted) during processing are ignored. |
Authors: | Joe Thorley [cre, aut] , Audrey Beliveau [ctb], Ayla Pearson [ctb] , Poisson Consulting [cph, fnd] |
Maintainer: | Joe Thorley <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.2.9000 |
Built: | 2024-11-01 16:19:32 UTC |
Source: | https://github.com/poissonconsulting/batchr |
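The design principle in the description can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper files_remaining() and the scratch directory are hypothetical, and the modification times are set by hand so that the result does not depend on file system timing.

```r
# Base-R sketch of the selection rule; not batchr's actual implementation.
path <- file.path(tempdir(), "mtime-sketch")  # hypothetical scratch directory
dir.create(path, showWarnings = FALSE)

old_file <- file.path(path, "old.csv")
new_file <- file.path(path, "new.csv")
writeLines("x", old_file)
writeLines("x", new_file)

# Pretend the directory was configured at `config_time`:
# one file was last modified before it, the other after it.
config_time <- Sys.time()
Sys.setFileTime(old_file, config_time - 60)
Sys.setFileTime(new_file, config_time + 60)

# Hypothetical helper: files last modified before the configuration time.
files_remaining <- function(path, config_time) {
  files <- list.files(path, full.names = TRUE)
  basename(files[file.mtime(files) < config_time])
}

before <- files_remaining(path, config_time)

# "Process" the old file, then touch it so it is no longer eligible.
# A real touch would use the current time; a fixed offset is used here
# to stay clear of coarse time stamp resolution.
Sys.setFileTime(old_file, config_time + 1)
after <- files_remaining(path, config_time)

unlink(path, recursive = TRUE)
```

Here before holds only "old.csv" and after is empty, which is why an interrupted run can simply be restarted: files that were already touched are skipped.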
Deletes the configuration file created by batch_config() and the log file created by batch_run().
batch_cleanup( path, force = FALSE, remaining = FALSE, failed = NA, recursive = FALSE, silent = FALSE )
path |
A string of the path to the directory with the files for processing. |
force |
A flag specifying whether to delete configuration and log files even if there are files remaining to be processed. |
remaining |
A flag specifying whether to delete
any files that are remaining to be processed
(only applied when |
failed |
A logical scalar specifying how to treat files that previously failed to process. If FALSE, failed files are excluded; if NA (the default for batch_cleanup()), they are included; and if TRUE, only failed files are included. |
recursive |
A flag specifying whether to recurse into subdirectories
when cleaning up. This is unrelated to the |
silent |
A flag specifying whether to suppress warnings (and messages). |
The batch_completed() function can be used to test if batch processing is complete.
A named logical vector indicating which directories were successfully cleaned up.
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_run(path, ask = FALSE)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Tests if there are any remaining files to process as listed by batch_files_remaining().
batch_completed(path, failed = FALSE)
path |
A string of the path to the directory with the files for processing. |
failed |
A logical scalar specifying how to treat files that previously failed to process. If FALSE (the default), failed files are excluded; if NA, they are included; and if TRUE, only failed files are included. |
By default, files that previously failed to process are excluded.
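The three values of failed can be read as a filter over the candidate files. The sketch below is only an illustration of the documented behaviour: select_files() and its previously_failed input are hypothetical and are not part of batchr.

```r
# Hypothetical helper illustrating the three values of `failed`;
# a sketch of the documented behaviour, not batchr's internal code.
select_files <- function(previously_failed, failed = FALSE) {
  # `previously_failed` is a named logical vector: file name -> failed before?
  if (is.na(failed)) {
    keep <- rep(TRUE, length(previously_failed))  # NA: failed files are included
  } else if (failed) {
    keep <- previously_failed                     # TRUE: only failed files
  } else {
    keep <- !previously_failed                    # FALSE: failed files are excluded
  }
  names(previously_failed)[keep]
}

previously_failed <- c(a.csv = TRUE, b.csv = FALSE, c.csv = FALSE)
default_set <- select_files(previously_failed, failed = FALSE)  # "b.csv" "c.csv"
all_set     <- select_files(previously_failed, failed = NA)     # "a.csv" "b.csv" "c.csv"
failed_set  <- select_files(previously_failed, failed = TRUE)   # "a.csv"
```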
A flag specifying whether batch processing is complete.
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_completed(path)
batch_run(path, ask = FALSE)
batch_completed(path)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Configures a directory for batch file processing by batch_run().
batch_config(fun, path, regexp = ".*", recurse = FALSE, ...)
fun |
A function to process each of the files.
|
path |
A string of the path to the directory with the files for processing. |
regexp |
A string of a regular expression. Only non-hidden file names which match the regular expression will be batch processed. |
recurse |
A flag specifying whether to recurse into path's subdirectories. |
... |
Additional arguments passed to |
batch_config() creates a hidden configuration file in path named '.batchr.rds'. The contents of the file can be read using batch_config_read() or updated using batch_reconfig_fun().
Configuration is only possible if the directory does not already contain a configuration file. If recurse = TRUE then the subdirectories must also not contain configuration files.
The regexp must match at least one non-hidden file in the directory or, if recurse = TRUE, in the directory or its subdirectories.
Hidden files are excluded to prevent accidental modification of system files.
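The effect of regexp and recurse on file selection can be previewed with base R before configuring a directory. This is a sketch under the assumption that list.files() with all.files = FALSE approximates the "non-hidden files only" rule; batchr may select files differently, and the scratch directory and file names are made up for the illustration.

```r
# Base-R sketch of matching non-hidden files; batchr may do this differently.
path <- file.path(tempdir(), "regexp-sketch")  # hypothetical scratch directory
dir.create(file.path(path, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(path, c("file1.csv", "notes.txt", ".hidden.csv")))
file.create(file.path(path, "sub", "file2.csv"))

# all.files = FALSE skips names starting with a dot.
top_only <- list.files(path, pattern = "[.]csv$", all.files = FALSE)
with_subdirs <- list.files(path, pattern = "[.]csv$", all.files = FALSE,
                           recursive = TRUE)

unlink(path, recursive = TRUE)
```

In this sketch top_only contains just "file1.csv", while with_subdirs also picks up "sub/file2.csv"; the hidden ".hidden.csv" is skipped in both cases.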
An invisible character vector of the paths to the files that will be processed when batch_run() is called.
batch_process() and batch_run()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_run(path, ask = FALSE)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Reads the values in the configuration file created by batch_config().
batch_config_read(path)
path |
A string of the path to the directory with the files for processing. |
A named list of the configuration values.
batch_process() and batch_log_read()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_config_read(path)
batch_cleanup(path, force = TRUE, remaining = TRUE)
unlink(file.path(path, "file1.csv"))
Gets the current status (SUCCESS, FAILURE, REMAING) of each eligible file in path.
batch_file_status(path)
path |
A string of the path to the directory with the files for processing. |
A tibble with four columns:
A character vector indicating SUCCESS, FAILURE or REMAING
A hms vector of the file processing time
A character vector of the file name
A character vector of the error message (or NA if no error)
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_file_status(path)
batch_run(path, ask = FALSE)
batch_file_status(path)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Gets the names of the files that are remaining to be processed by batch_run().
batch_files_remaining(path, failed = FALSE)
path |
A string of the path to the directory with the files for processing. |
failed |
A logical scalar specifying how to treat files that previously failed to process. If FALSE (the default), failed files are excluded; if NA, they are included; and if TRUE, only failed files are included. |
batch_completed() can be used to test if there are any files remaining.
A character vector of the names of the remaining files.
batch_process() and batch_run()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_files_remaining(path)
batch_run(path, ask = FALSE)
batch_files_remaining(path)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Tests whether the directory contains a configuration file created by batch_config().
batch_is_clean(path, recurse = FALSE)
path |
A string of the path to the directory with the files for processing. |
recurse |
A flag specifying whether to recurse into path's subdirectories. |
A flag specifying whether the directory is clean.
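As a rough mental model, the check can be sketched in base R. The sketch assumes that "clean" means no '.batchr.rds' configuration file is present; is_clean_sketch() is a hypothetical stand-in, and the package's own test may consider more than this.

```r
# Hypothetical stand-in for the check; assumes "clean" means that no
# '.batchr.rds' configuration file is present.
is_clean_sketch <- function(path, recurse = FALSE) {
  found <- list.files(path, pattern = "^[.]batchr[.]rds$",
                      all.files = TRUE, recursive = recurse)
  length(found) == 0
}

path <- file.path(tempdir(), "clean-sketch")  # hypothetical scratch directory
dir.create(file.path(path, "sub"), recursive = TRUE, showWarnings = FALSE)
empty_dir <- is_clean_sketch(path)

# Simulate a configuration file in a subdirectory.
saveRDS(list(), file.path(path, "sub", ".batchr.rds"))
top_level <- is_clean_sketch(path)
with_subdirs <- is_clean_sketch(path, recurse = TRUE)

unlink(path, recursive = TRUE)
```

In the sketch only the recursive check notices the simulated configuration file in the subdirectory, which mirrors the role of the recurse argument.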
path <- tempdir()
batch_is_clean(path)
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_is_clean(path)
batch_cleanup(path, force = TRUE, remaining = TRUE)
batch_is_clean(path)
unlink(file.path(path, "file1.csv"))
Reads the values in the log file created by batch_run().
batch_log_read(path)
path |
A string of the path to the directory with the files for processing. |
A tibble with four columns:
A character vector indicating SUCCESS or FAILURE
A hms vector of the file processing time
A character vector of the file name
A character vector of the error message (or NA if no error)
batch_process() and batch_config_read()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_log_read(path)
batch_run(path, ask = FALSE)
batch_log_read(path)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Performs batch processing of files in a directory using the batch_config(), batch_run() and batch_cleanup() functions.
For more control the user should call these three functions instead.
batch_process( fun, path, regexp = ".*", recurse = FALSE, progress = FALSE, force = TRUE, report = TRUE, seeds = NULL, options = furrr::furrr_options(), ask = getOption("batchr.ask", TRUE), ... )
fun |
A function to process each of the files.
|
path |
A string of the path to the directory with the files for processing. |
regexp |
A string of a regular expression. Only non-hidden file names which match the regular expression will be batch processed. |
recurse |
A flag specifying whether to recurse into path's subdirectories. |
progress |
A flag specifying whether to print a progress bar. |
force |
A flag specifying whether to delete configuration and log files even if there are files remaining to be processed. |
report |
A flag specifying whether to output a report of the status of individual files to the console. |
seeds |
A named list of the L'Ecuyer-CMRG seed to use for each
file. If |
options |
The future specific options to use with the workers.
seed must be |
ask |
A flag specifying whether to ask before starting to process the files. |
... |
Additional arguments passed to |
An invisible flag indicating whether all the files were successfully processed.
batch_config(), batch_run() and batch_cleanup()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_process(function(x) TRUE, path, regexp = "[.]csv$", ask = FALSE)
unlink(file.path(path, "file1.csv"))
Updates the regular expression and/or recurse arguments that were provided when a directory was configured (using batch_config()).
batch_reconfig_fileset(path, regexp = NULL, recurse = NULL)
path |
A string of the path to the directory with the files for processing. |
regexp |
A string of a regular expression. Only non-hidden file names which match the regular expression will be batch processed. |
recurse |
A flag specifying whether to recurse into path's subdirectories. |
batch_reconfig_fileset() is useful for including or excluding particular files.
It should be noted that batch_reconfig_fileset() does not alter the configuration time.
In order to process previously failed files, batch_run() should be called with failed = NA or failed = TRUE.
An invisible character vector of the paths to the files remaining to be processed.
batch_process() and batch_config()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_config_read(path)
batch_reconfig_fileset(path, regexp = "file\\d+[.]csv$")
batch_config_read(path)
batch_cleanup(path, force = TRUE, remaining = TRUE)
unlink(file.path(path, "file1.csv"))
Updates the function and function arguments that were provided when a directory was configured (using batch_config()).
batch_reconfig_fun(path, fun, ...)
path |
A string of the path to the directory with the files for processing. |
fun |
A function to process each of the files.
|
... |
Additional arguments passed to |
batch_reconfig_fun() is useful if a new version of the function is required to successfully process some of the files.
It should be noted that batch_reconfig_fun() does not alter the configuration time.
In order to process previously failed files, batch_run() should be called with failed = NA or failed = TRUE.
An invisible character vector of the paths to the files remaining to be processed.
batch_process() and batch_config()
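A possible workflow for batch_reconfig_fun() is sketched below, using only functions documented on this page. It assumes that a function which throws an error causes the file to be logged as a FAILURE; the scratch directory name is made up for the illustration, and no output is shown because it has not been verified against the package.

```r
library(batchr)

path <- file.path(tempdir(), "reconfig-sketch")  # hypothetical scratch directory
dir.create(path, showWarnings = FALSE)
write.csv(mtcars, file.path(path, "file1.csv"))

# The first version of the function always throws, so the attempt is
# expected to be logged as a FAILURE (assumption, see the text above).
batch_config(function(x) stop("not implemented"), path, regexp = "[.]csv$")
batch_run(path, ask = FALSE)

# Swap in a working function; the configuration time is left unchanged.
batch_reconfig_fun(path, function(x) TRUE)

# Reprocess only the files that previously failed.
batch_run(path, failed = TRUE, ask = FALSE)
done <- batch_completed(path)

batch_cleanup(path)
unlink(path, recursive = TRUE)
```

Calling batch_run() with failed = NA instead would process the previously failed files together with any files that have not yet been attempted.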
Outputs a report of the status of individual files to the console.
batch_report(path)
path |
A string of the path to the directory with the files for processing. |
An invisible NULL. The function is called for its side-effect of outputting a report of the status of individual files to the console.
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_report(path)
batch_run(path, ask = FALSE)
batch_report(path)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Starts (or restarts if previously stopped) processing the remaining files specified by batch_config().
batch_run( path, failed = FALSE, progress = FALSE, files = NULL, seeds = NULL, options = furrr::furrr_options(), ask = getOption("batchr.ask", TRUE) )
path |
A string of the path to the directory with the files for processing. |
failed |
A logical scalar specifying how to treat files that previously failed to process. If FALSE (the default), failed files are excluded; if NA, they are included; and if TRUE, only failed files are included. |
progress |
A flag specifying whether to print a progress bar. |
files |
A character vector of the remaining files to process.
If |
seeds |
A named list of the L'Ecuyer-CMRG seed to use for each
file. If |
options |
The future specific options to use with the workers.
seed must be |
ask |
A flag specifying whether to ask before starting to process the files. |
batch_run() logs all file processing attempts together with the type (SUCCESS or FAILURE), the system time in UTC, the file name and any error messages. The hidden log file can be read using batch_log_read().
batch_files_remaining() provides a vector of the files that are remaining to be processed.
When processing is complete the hidden configuration file and hidden log file can be deleted using batch_cleanup().
If a remaining file is removed or modified by a separate process, batch_run() throws an error.
An invisible named logical vector indicating for each file whether it was successfully processed.
batch_process(), batch_config() and batch_cleanup()
path <- tempdir()
write.csv(mtcars, file.path(path, "file1.csv"))
batch_config(function(x) TRUE, path, regexp = "[.]csv$")
batch_run(path, ask = FALSE)
batch_cleanup(path)
unlink(file.path(path, "file1.csv"))
Generates a named list of L'Ecuyer-CMRG seeds.
batch_seeds(files = batch_files_remaining())
files |
A character vector of the names of the files. |
A named list of the L'Ecuyer-CMRG seed for each file name.
batch_seeds(c("a", "b"))
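For readers unfamiliar with L'Ecuyer-CMRG seeds, the sketch below builds a named list of independent RNG streams with parallel::nextRNGStream() from base R. It only illustrates what such a list can look like: seeds_sketch() and its seed argument are hypothetical, and batch_seeds() may generate its seeds differently.

```r
# Sketch of per-file L'Ecuyer-CMRG seeds using parallel::nextRNGStream();
# `seeds_sketch` is a hypothetical helper, not batchr's batch_seeds().
seeds_sketch <- function(files, seed = 42L) {
  # Switch generators temporarily; note that this resets the global RNG state.
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  on.exit(RNGkind(old_kind[1]), add = TRUE)
  set.seed(seed)
  stream <- get(".Random.seed", envir = globalenv())

  seeds <- vector("list", length(files))
  for (i in seq_along(files)) {
    seeds[[i]] <- stream
    # Jump ahead to the next independent stream.
    stream <- parallel::nextRNGStream(stream)
  }
  names(seeds) <- files
  seeds
}

seeds <- seeds_sketch(c("a", "b"))
str(seeds)
```

Each element is an integer vector of length 7, the form .Random.seed takes under the L'Ecuyer-CMRG generator, so every file gets its own reproducible stream regardless of the order in which workers pick files up.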