--- title: "Get Started with batchr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Get Started with batchr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Consider a temporary directory with four txt files ```{r} path <- file.path(tempdir(), "demo") unlink(path) dir.create(path) writeLines("the contents of file.txt", file.path(path, "file.txt")) writeLines("the contents of file2.txt", file.path(path, "file2.txt")) writeLines("the contents of file3.txt", file.path(path, "file3.txt")) writeLines("the contents of file4.txt", file.path(path, "file4.txt")) ``` and a function that will successfully process the first two files (in this case rewriting their contents) but fail for the last two by returning FALSE and throwing an error, respectively. ```{r} fun <- function(file) { if (grepl("file3[.]txt$", file)) { return(FALSE) } if (grepl("file4[.]txt$", file)) stop("Uh, Houston, we've had a problem.", call. = FALSE) txt <- readLines(file) txt <- gsub("contents", "modified contents", txt) writeLines(txt, file) } ``` ## Configure Now let the user configure the directory to only process files names that include a digit before the file extension. ```{r} library(batchr) print(batch_config(fun, path = path, regexp = "file\\d[.]txt$")) ``` The contents of the hidden figuration file are as follows ```{r} batch_config_read(path) ``` The `time` value specifies the system time (in UTC) that the project was configured. It is very important because only files that were last modified *before* the configuration time are considered to be unprocessed (when a file is successfully processed its modification time is automatically set to the current system time). ## Run With the directory configured the next task is to start processing the files ```{r} print(batch_run(path, ask = FALSE)) ``` From the output we can see that 'file2.csv' was processed successfully but processing of the other two files that matched regexp failed. The output is also recorded in a hidden log file that can be read using `batch_log_read()`. ```{r} batch_log_read(path) ``` or summarised using ```{r} batch_report(path) ``` The contents of 'file2.csv' are now as follows ```{r} readLines(file.path(path, "file2.txt")) ``` ## Reconfigure At this point let us update the function and regular expression so that all files are included and successfully processed. ```{r} batch_reconfig_fileset(path, regexp = "[.]txt$") fun <- function(file) { txt <- readLines(file) txt <- gsub("contents", "modified contents", txt) writeLines(txt, file) } batch_reconfig_fun(path, fun) ``` ## Rerun Now when we call `batch_run()` the newly included 'file.txt' is successfully processed. ```{r} batch_run(path, ask = FALSE) ``` ```{r} batch_report(path) ``` In order to reattempt processing of the 'file3.txt' and 'file4.txt' we need to set `failed = TRUE`. ```{r} batch_run(path, failed = TRUE, ask = FALSE) ``` ```{r} batch_report(path) ``` ## Clean Up With all the files successfully processed the only remaining task is to delete the hidden configuration and log files. ```{r} list.files(path, all.files = TRUE) print(batch_cleanup(path)) list.files(path, all.files = TRUE) ```