Apply a Function to a Data Frame Split by Factors via Futures
future_by(
data,
INDICES,
FUN,
...,
simplify = TRUE,
future.envir = parent.frame()
)
data: An R object, normally a data frame, possibly a matrix.
INDICES: A factor or a list of factors, each of length nrow(data).
FUN: A function to be applied to (usually data-frame) subsets of data.
simplify: Logical; see base::tapply().
future.envir: An environment passed as-is as argument envir to future::future().
...: Additional arguments passed to future_lapply() and then to FUN().
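As an illustration of INDICES given as a list of factors, the sketch below groups warpbreaks by both wool and tension. It assumes the future.apply package is installed and runs under whatever plan() is active (sequential by default). With two factors and one scalar result per group, the result is expected to be a 2-by-3 array of class "by", with one cell per wool/tension combination.

```r
library(future.apply)  # future_by(); assumed to be installed
library(datasets)      # warpbreaks

## INDICES as a named list of two factors, each of length nrow(warpbreaks)
y <- future_by(warpbreaks,
               list(wool = warpbreaks$wool, tension = warpbreaks$tension),
               function(x) mean(x$breaks))

print(y)
```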
An object of class "by", giving the results for each subset.
This is always a list if simplify is false, otherwise a list
or array (see base::tapply()). See also base::by() for details.
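The effect of simplify can be seen with base::by(), on whose behavior future_by() is modeled. This minimal sketch uses only base R; it assumes, per the description above, that future_by() behaves the same for the same simplify value, but only by() is exercised here.

```r
library(datasets)  # warpbreaks

## simplify = TRUE (default): scalar results are combined into an atomic array
s <- by(warpbreaks, warpbreaks[, "tension"], function(x) mean(x$breaks))

## simplify = FALSE: the result is always a list, one element per group
l <- by(warpbreaks, warpbreaks[, "tension"], function(x) mean(x$breaks),
        simplify = FALSE)

str(unclass(s))
str(unclass(l))
```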
Internally, data is grouped by INDICES into a list of data subsets,
which is then processed by future_lapply().
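The grouping step can be sketched in base R as a split followed by an apply. The helper by_sketch() is hypothetical and for illustration only: lapply() stands in for future_lapply() so the sketch runs without futures, it returns a plain named list rather than a "by" object, and it skips the simplification step. The actual implementation may also split row indices rather than the data itself.

```r
library(datasets)  # warpbreaks
library(stats)     # lm()

## Hypothetical helper (not part of future.apply): split the rows of `data`
## by INDICES into a list of data-frame subsets, then apply FUN to each.
## lapply() stands in for future_lapply() so that this runs in base R.
by_sketch <- function(data, INDICES, FUN, ...) {
  subsets <- split(data, INDICES, drop = FALSE)
  lapply(subsets, FUN, ...)
}

fits <- by_sketch(warpbreaks, warpbreaks[, "tension"],
                  function(x) lm(breaks ~ wool, data = x))
names(fits)
```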
When the groups differ substantially in size, their processing times
may also differ substantially. To correct for such load imbalances,
adjust the amount of chunking via the future_lapply() arguments
future.scheduling and future.chunk.size, which can be passed to
future_by() via ....
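For example, to process each group in its own future, one can pass future.chunk.size = 1 through future_by(). This sketch assumes the future.apply and future packages are installed and uses two background workers; future.scheduling = Inf is documented for future_lapply() as another way to get one future per element.

```r
library(future.apply)  # future_by(); assumed to be installed
library(future)        # plan()
library(datasets)      # warpbreaks
library(stats)         # lm()

plan(multisession, workers = 2)

## future.chunk.size = 1: each group is processed in its own future,
## so groups are distributed to workers one at a time instead of in
## pre-formed chunks
y <- future_by(warpbreaks, warpbreaks[, "tension"],
               function(x) lm(breaks ~ wool, data = x),
               future.chunk.size = 1)

plan(sequential)  # shuts down the background workers
```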
future_by() is modeled as closely as possible on the behavior of
base::by(). Both functions have "default" S3 methods that call
data <- as.data.frame(data) internally. This call may in turn dispatch
to an S3 method for as.data.frame() that may or may not coerce strings
to factors, depending on whether it has a stringsAsFactors argument and
what that argument's default is.
For example, the S3 method of as.data.frame() for lists changed its
(effective) default from stringsAsFactors = TRUE to
stringsAsFactors = FALSE in R 4.0.0.
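One way to make the result independent of the R version is to coerce the input explicitly, with stringsAsFactors set, before calling future_by() or by(). The minimal base-R sketch below shows the difference for a list input; the list lst is made up for illustration.

```r
## A list with a character element, as as.data.frame() might receive it
lst <- list(id = 1:2, label = c("a", "b"))

## Since R 4.0.0, strings stay character by default ...
df_default <- as.data.frame(lst)

## ... unless factors are requested explicitly
df_factors <- as.data.frame(lst, stringsAsFactors = TRUE)

str(df_default)
str(df_factors)
```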
## ---------------------------------------------------------
## by()
## ---------------------------------------------------------
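library(future.apply) ## future_by()
library(future) ## plan()
library(datasets) ## warpbreaks
library(stats) ## lm()

## Reference result using base R
y0 <- by(warpbreaks, warpbreaks[,"tension"],
         function(x) lm(breaks ~ wool, data = x))

## Same computation, evaluated in parallel background R sessions
plan(multisession)
y1 <- future_by(warpbreaks, warpbreaks[,"tension"],
                function(x) lm(breaks ~ wool, data = x))

## Same computation, evaluated sequentially via futures
plan(sequential)
y2 <- future_by(warpbreaks, warpbreaks[,"tension"],
                function(x) lm(breaks ~ wool, data = x))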