Create a bulk set of tasks based on applying a function over a vector or data.frame. This is the bulk equivalent of task_create_call, in the same way that task_create_bulk_expr is a bulk version of task_create_expr.
Usage
task_create_bulk_call(
fn,
data,
args = NULL,
environment = "default",
bundle_name = NULL,
driver = NULL,
resources = NULL,
envvars = NULL,
parallel = NULL,
root = NULL
)
Arguments
- fn
The function to call
- data
The data to apply the function over. This can be a vector or list, in which case we act like lapply and apply fn to each element in turn. Alternatively, this can be a data.frame, in which case each row is taken as a set of arguments to fn. Note that if data is a data.frame then all arguments to fn are named.
- args
Additional arguments to fn, shared across all calls. These must be named. If you are using a data.frame for data, you'd probably be better off adding additional columns that don't vary across rows, but the end result is the same.
- environment
Name of the hipercow environment to evaluate the task within.
- bundle_name
Name to pass to hipercow_bundle_create when making a bundle. If NULL we use a random name. We always overwrite, so if bundle_name already refers to a bundle it will be replaced.
- driver
Name of the driver to use to submit the task. The default (NULL) depends on your configured drivers. If you have no drivers configured, no submission happens (or indeed is possible). If you have exactly one driver configured, we'll submit your task with it. If you have more than one driver configured, then we will error, though in future versions we may fall back on a default driver if you have one configured. If you pass FALSE here, submission is prevented even if you have a driver configured.
- resources
A list generated by hipercow_resources giving the cluster resource requirements to run your task.
- envvars
Environment variables as generated by hipercow_envvars, which you might use to control your task. These will be combined with the default environment variables (see vignette("details"); this can be overridden by the option hipercow.default_envvars), and any driver-specific environment variables (see vignette("windows")). Variables provided here have the highest precedence. You can unset an environment variable by setting it to NA.
- parallel
Parallel configuration as generated by hipercow_parallel, which defines which method, if any, will be used to initialise your task for parallel execution.
- root
A hipercow root, or path to it. If NULL we search up your directory tree.
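The dispatch rules for data and args described above can be illustrated with a small base-R sketch. This is not hipercow's implementation: local_bulk_call is a hypothetical helper introduced here only for illustration. It runs every call in the current session instead of creating tasks, and it returns a plain list where the real function returns a hipercow_bundle.

```r
# Hypothetical local stand-in for the dispatch rules described above.
# `local_bulk_call` is NOT part of hipercow; nothing is submitted anywhere.
local_bulk_call <- function(fn, data, args = NULL) {
  if (is.data.frame(data)) {
    # One call per row: every column becomes a named argument to fn
    calls <- lapply(seq_len(nrow(data)), function(i) {
      as.list(data[i, , drop = FALSE])
    })
  } else {
    # One call per element: the element is passed as the first,
    # unnamed argument, as lapply would do
    calls <- lapply(data, list)
  }
  # Shared (named) arguments are appended to every call
  lapply(calls, function(call_args) do.call(fn, c(call_args, args)))
}

# Vector input behaves like lapply(x, sqrt)
local_bulk_call(sqrt, c(1, 4, 9))

# Shared named arguments behave like lapply(x, log, base = 3)
local_bulk_call(log, c(3, 9), list(base = 3))

# data.frame input: each row supplies the named arguments a and b
d <- data.frame(a = c(1, 2), b = c(10, 20))
local_bulk_call(function(a, b) a + b, d)

# Passing a value through args gives the same result as a constant column
via_args <- local_bulk_call(function(a, b) a + b,
                            data.frame(a = c(1, 2)), list(b = 10))
via_column <- local_bulk_call(function(a, b) a + b,
                              data.frame(a = c(1, 2), b = 10))
identical(via_args, via_column)
```

The last comparison mirrors the note under args: a shared argument and a column that does not vary across rows lead to the same set of calls.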
Value
A hipercow_bundle object, which groups together tasks,
and for which you can use a set of grouped functions to get
status (hipercow_bundle_status), results
(hipercow_bundle_result) etc.
Examples
cleanup <- hipercow_example_helper()
#> ℹ This example uses a special helper
# The simplest way to use this function is like lapply:
x <- runif(5)
bundle <- task_create_bulk_call(sqrt, x)
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'unemployed_grunion' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, sqrt)
#> [[1]]
#> [1] 0.8178452
#>
#> [[2]]
#> [1] 0.7124958
#>
#> [[3]]
#> [1] 0.8126249
#>
#> [[4]]
#> [1] 0.7153959
#>
#> [[5]]
#> [1] 0.9140856
#>
# You can pass additional arguments in via 'args':
x <- runif(5)
bundle <- task_create_bulk_call(log, x, list(base = 3))
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'pacifistic_ermine' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, log, base = 3)
#> [[1]]
#> [1] -0.1078657
#>
#> [[2]]
#> [1] -0.003331266
#>
#> [[3]]
#> [1] -0.6305812
#>
#> [[4]]
#> [1] -0.9325626
#>
#> [[5]]
#> [1] -0.2321151
#>
# Passing in a data.frame acts like Map (though with all arguments named)
x <- data.frame(a = runif(5), b = rpois(5, 10))
bundle <- task_create_bulk_call(function(a, b) sum(rnorm(b)) / a, x)
#> ✔ Submitted 5 tasks using 'example'
#> ✔ Created bundle 'evilminded_ballpython' with 5 tasks
hipercow_bundle_wait(bundle)
#> Error in hipercow_bundle_wait(bundle): Bundle 'evilminded_ballpython' did not complete in time
hipercow_bundle_result(bundle) # Map(f, x$a, x$b)
#> Error in hipercow_bundle_result(bundle): Can't fetch results for bundle 'evilminded_ballpython' due to error
#> fetching result for 'e24bcb5240dc401fee2a607f60ba6b0f'
#> Caused by error in `task_result()`:
#> ! Result for task 'e24bcb5240dc401fee2a607f60ba6b0f' not available,
#> status is 'submitted'
cleanup()
#> ℹ Cleaning up example
