Apply a Function to a Data Frame Split by Factors via Futures

future_by(
  data,
  INDICES,
  FUN,
  ...,
  simplify = TRUE,
  future.envir = parent.frame()
)



An R object, normally a data frame, possibly a matrix.


A factor or a list of factors, each of length nrow(data).


A function to be applied to (usually data-frame) subsets of data.


logical: see base::tapply.


An environment passed as argument envir to future::future() as-is.


Additional arguments passed to future_lapply() and then to FUN().


An object of class "by", giving the results for each subset. This is always a list if simplify is false, otherwise a list or array (see base::tapply). See also base::by() for details.
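The effect of simplify on the return value can be seen with base::by(), whose return structure future_by() is documented to follow. This is a minimal sketch; the helper mean_breaks is an illustrative name, not part of either package.

```r
library(datasets)  # warpbreaks

# Illustrative per-group summary (hypothetical helper): mean number of breaks
mean_breaks <- function(x) mean(x$breaks)

# simplify = TRUE (default): scalar results are simplified to an array
y_array <- by(warpbreaks, warpbreaks[, "tension"], mean_breaks)

# simplify = FALSE: results are always kept as a list
y_list <- by(warpbreaks, warpbreaks[, "tension"], mean_breaks,
             simplify = FALSE)

class(y_array)   # "by"
typeof(y_array)  # "double"
typeof(y_list)   # "list"
```

In both cases the object has class "by"; only the underlying storage differs.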


Internally, data is grouped by INDICES into a list of data subset elements which is then processed by future_lapply(). When the groups differ significantly in size, the processing time may differ significantly between the groups. To correct for processing-time imbalances, adjust the amount of chunking via arguments future.scheduling and future.chunk.size.
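As a rough illustration of adjusting the chunking, the sketch below passes future.chunk.size and future.scheduling through the ... argument, on the assumption (based on the argument description above) that they are forwarded to future_lapply(). The helper mean_breaks is an illustrative name, and the default plan is used, so nothing here runs in parallel unless plan() is set beforehand.

```r
library(datasets)      # warpbreaks
library(future.apply)  # future_by()

# Illustrative per-group summary (hypothetical helper)
mean_breaks <- function(x) mean(x$breaks)

# Default load balancing
y_default <- future_by(warpbreaks, warpbreaks[, "tension"], mean_breaks)

# One group per future: the finest-grained chunking, which may help
# when a few groups are much larger than the rest
y_chunked <- future_by(warpbreaks, warpbreaks[, "tension"], mean_breaks,
                       future.chunk.size = 1L)

# Alternatively, request on average two chunks per worker
y_sched <- future_by(warpbreaks, warpbreaks[, "tension"], mean_breaks,
                     future.scheduling = 2.0)
```

The chunking arguments should only change how the groups are distributed across futures, not the results themselves.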

Note on 'stringsAsFactors'

future_by() is modeled as closely as possible on the behavior of base::by(). Both functions have "default" S3 methods that call data <- as.data.frame(data) internally. This call may in turn dispatch to an S3 method for as.data.frame() that coerces strings to factors or not, depending on whether it has a stringsAsFactors argument and what its default is. For example, the S3 method of as.data.frame() for lists changed its (effective) default from stringsAsFactors = TRUE to stringsAsFactors = FALSE in R 4.0.0.
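The coercion difference can be checked directly in base R. This small sketch uses a made-up list x; the class of the default result depends on the R version in use.

```r
# Made-up input list with one character column
x <- list(g = c("a", "b", "a"), v = 1:3)

# Default coercion: strings stay character in R >= 4.0.0,
# but become factors in earlier versions of R
df_default <- as.data.frame(x)
class(df_default$g)

# Being explicit gives the same result in every R version
df_factor <- as.data.frame(x, stringsAsFactors = TRUE)
class(df_factor$g)  # "factor"
```

Converting the input to a data frame explicitly before calling by() or future_by() is one way to avoid depending on the default.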


## ---------------------------------------------------------
## by()
## ---------------------------------------------------------
library(datasets)      ## warpbreaks
library(stats)         ## lm()
library(future.apply)  ## future_by(); also attaches 'future' for plan()

## Reference result from base R
y0 <- by(warpbreaks, warpbreaks[,"tension"],
         function(x) lm(breaks ~ wool, data = x))

## Same call, resolved in parallel background R sessions
plan(multisession)
y1 <- future_by(warpbreaks, warpbreaks[,"tension"],
                function(x) lm(breaks ~ wool, data = x))

## Same call, resolved sequentially in the current R session
plan(sequential)
y2 <- future_by(warpbreaks, warpbreaks[,"tension"],
                function(x) lm(breaks ~ wool, data = x))