Create a bulk set of tasks based on applying a function over a vector or data.frame. This is the bulk equivalent of task_create_call, in the same way that task_create_bulk_expr is a bulk version of task_create_expr.
Usage
task_create_bulk_call(
  fn,
  data,
  args = NULL,
  environment = "default",
  bundle_name = NULL,
  driver = NULL,
  resources = NULL,
  envvars = NULL,
  parallel = NULL,
  root = NULL
)
Arguments
- fn
The function to call
- data
The data to apply the function over. This can be a vector or list, in which case we act like lapply and apply fn to each element in turn. Alternatively, this can be a data.frame, in which case each row is taken as a set of arguments to fn. Note that if data is a data.frame then all arguments to fn are named.
- args
Additional arguments to fn, shared across all calls. These must be named. If you are using a data.frame for data, you'd probably be better off adding additional columns that don't vary across rows, but the end result is the same.
- environment
Name of the hipercow environment to evaluate the task within.
- bundle_name
Name to pass to hipercow_bundle_create when making a bundle. If NULL we use a random name. We always overwrite, so if bundle_name already refers to a bundle it will be replaced.
- driver
Name of the driver to use to submit the task. The default (NULL) depends on your configured drivers; if you have no drivers configured, no submission happens (or indeed is possible). If you have exactly one driver configured we'll submit your task with it. If you have more than one driver configured, then we will error, though in future versions we may fall back on a default driver if you have one configured. If you pass FALSE here, submission is prevented even if you have a driver configured.
- resources
A list generated by hipercow_resources giving the cluster resource requirements to run your task.
- envvars
Environment variables as generated by hipercow_envvars, which you might use to control your task. These will be combined with the default environment variables (see vignette("details"); this can be overridden by the option hipercow.default_envvars), and any driver-specific environment variables (see vignette("windows")). Variables provided here have the highest precedence. You can unset an environment variable by setting it to NA.
- parallel
Parallel configuration as generated by hipercow_parallel, which defines which method, if any, will be used to initialise your task for parallel execution.
- root
A hipercow root, or path to it. If NULL we search up your directory tree.
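The way data and args combine into one call per task can be illustrated with a minimal base R sketch. The helper bulk_call_local below is hypothetical and is not part of hipercow; it runs everything locally and only mirrors the behaviour described for the data and args arguments above (lapply-like for vectors and lists, row-wise named arguments for a data.frame).

```r
# Hypothetical helper (not hipercow's implementation): a local, base R
# imitation of how `data` and `args` are described as forming each call.
bulk_call_local <- function(fn, data, args = NULL) {
  if (is.data.frame(data)) {
    # One call per row; column names become the argument names.
    lapply(seq_len(nrow(data)), function(i) {
      row <- as.list(data[i, , drop = FALSE])
      do.call(fn, c(row, args))
    })
  } else {
    # One call per element; the element is the first (unnamed) argument.
    lapply(data, function(el) do.call(fn, c(list(el), args)))
  }
}

# Vector input with a shared named argument, like lapply(x, log, base = 3)
res_vec <- bulk_call_local(log, c(1, 3, 9), list(base = 3))

# data.frame input: each row supplies `a` and `b`; `scale` is shared
df <- data.frame(a = c(1, 2), b = c(10, 20))
res_df <- bulk_call_local(function(a, b, scale) (a + b) * scale, df,
                          list(scale = 2))
```

In this sketch, adding a constant scale column to df instead of passing it through args would produce the same calls, which matches the note in the args description that the end result is the same.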
Value
A hipercow_bundle object, which groups together tasks, and for which you can use a set of grouped functions to get status (hipercow_bundle_status), results (hipercow_bundle_result) etc.
Examples
cleanup <- hipercow_example_helper()
#> ℹ This example uses a special helper
# The simplest way to use this function is like lapply:
x <- runif(5)
bundle <- task_create_bulk_call(sqrt, x)
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'quasiconservative_greatargus' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, sqrt)
#> [[1]]
#> [1] 0.1042711
#>
#> [[2]]
#> [1] 0.2512714
#>
#> [[3]]
#> [1] 0.5136553
#>
#> [[4]]
#> [1] 0.7706079
#>
#> [[5]]
#> [1] 0.5454906
#>
# You can pass additional arguments in via 'args':
x <- runif(5)
bundle <- task_create_bulk_call(log, x, list(base = 3))
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'copacetic_graywolf' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, log, base = 3)
#> [[1]]
#> [1] -0.2349812
#>
#> [[2]]
#> [1] -0.8295257
#>
#> [[3]]
#> [1] -2.574896
#>
#> [[4]]
#> [1] -0.2615587
#>
#> [[5]]
#> [1] -0.5807769
#>
# Passing in a data.frame acts like Map (though with all arguments named)
x <- data.frame(a = runif(5), b = rpois(5, 10))
bundle <- task_create_bulk_call(function(a, b) sum(rnorm(b)) / a, x)
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'known_xoloitzcuintli' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # Map(f, x$a, x$b)
#> [[1]]
#> [1] -7.021743
#>
#> [[2]]
#> [1] 5.74547
#>
#> [[3]]
#> [1] 9.834079
#>
#> [[4]]
#> [1] 0.6170752
#>
#> [[5]]
#> [1] 27.38846
#>
cleanup()
#> ℹ Cleaning up example