Create a bulk set of tasks based on applying a function over a vector or data.frame. This is the bulk equivalent of task_create_call, in the same way that task_create_bulk_expr is a bulk version of task_create_expr.

Usage

task_create_bulk_call(
  fn,
  data,
  args = NULL,
  environment = "default",
  bundle_name = NULL,
  driver = NULL,
  resources = NULL,
  envvars = NULL,
  parallel = NULL,
  root = NULL
)

Arguments

fn

The function to call

data

The data to apply the function over. This can be a vector or list, in which case we act like lapply and apply fn to each element in turn. Alternatively, this can be a data.frame, in which case each row is taken as a set of arguments to fn. Note that if data is a data.frame then all arguments to fn are named.
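
The two shapes of data described above can be pictured with base R alone. The sketch below is only an analogy for how the arguments of each call are assembled, not hipercow's implementation: it runs locally, creates no tasks, and the helper name call_over is hypothetical.

```r
# Hypothetical helper: a local, base-R analogue of the dispatch on `data`.
call_over <- function(fn, data) {
  if (is.data.frame(data)) {
    # Each row becomes one call, with column names used as argument names.
    lapply(seq_len(nrow(data)), function(i) {
      do.call(fn, as.list(data[i, , drop = FALSE]))
    })
  } else {
    # Vectors and lists: one call per element, as with lapply().
    lapply(data, fn)
  }
}

by_element <- call_over(sqrt, c(1, 4, 9))
by_row <- call_over(function(a, b) a + b, data.frame(a = 1:2, b = c(10, 20)))
```

Because the data.frame case passes every argument by name, the column names must match the argument names of fn.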

args

Additional arguments to fn, shared across all calls. These must be named. If you are using a data.frame for data, you'd probably be better off adding additional columns that don't vary across rows, but the end result is the same.
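
As a rough local illustration of what "shared across all calls" means, the base-R sketch below combines each element with the same named list before calling the function. It mirrors the semantics only; it is not how hipercow stores or evaluates tasks.

```r
# Shared arguments, named as `args` requires.
args <- list(base = 3)
x <- c(1, 3, 9)

# Each element of `x` is combined with the shared arguments for one call.
with_args <- lapply(x, function(el) do.call(log, c(list(el), args)))

# The conventional spelling of the same computation.
direct <- lapply(x, log, base = 3)
```

Both forms give the same values; the first makes the role of the named list explicit.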

environment

Name of the hipercow environment to evaluate the task within.

bundle_name

Name to pass to hipercow_bundle_create when making a bundle. If NULL we use a random name. We always overwrite, so if bundle_name already refers to a bundle it will be replaced.

driver

Name of the driver to use to submit the task. The default (NULL) depends on your configured drivers: if you have no drivers configured, no submission happens (or indeed is possible). If you have exactly one driver configured, we'll submit your task with it. If you have more than one driver configured, we will error, though future versions may fall back on a default driver if you have one configured. If you pass FALSE here, submission is prevented even if you have a driver configured.

resources

A list generated by hipercow_resources giving the cluster resource requirements to run your task.

envvars

Environment variables as generated by hipercow_envvars, which you might use to control your task. These will be combined with the default environment variables (see vignette("details"); this can be overridden by the option hipercow.default_envvars) and with any driver-specific environment variables (see vignette("windows")). Variables provided here have the highest precedence. You can unset an environment variable by setting it to NA.
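
The precedence rule can be sketched in base R as a layered merge. This is an illustration under stated assumptions: the helper merge_envvars and all variable names are hypothetical, and the relative order of the default and driver-specific layers is assumed here (the text above only states that variables you provide win). It does not use or reproduce hipercow_envvars.

```r
# Hypothetical helper: later layers override earlier ones by name.
merge_envvars <- function(...) {
  merged <- character()
  for (layer in list(...)) {
    merged[names(layer)] <- layer
  }
  # A value of NA marks a variable as unset, so drop it.
  merged[!is.na(merged)]
}

# All names and values below are made up for illustration.
defaults <- c(EXAMPLE_THREADS = "1", EXAMPLE_MODE = "safe")
driver <- c(EXAMPLE_TMP = "/scratch")
mine <- c(EXAMPLE_THREADS = "4", EXAMPLE_MODE = NA)

# Assumed order: defaults, then driver-specific, then user-supplied.
effective <- merge_envvars(defaults, driver, mine)
```

Here the user-supplied EXAMPLE_THREADS replaces the default, and EXAMPLE_MODE is removed because it was set to NA.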

parallel

Parallel configuration as generated by hipercow_parallel, which defines which method, if any, will be used to initialise your task for parallel execution.

root

A hipercow root, or path to it. If NULL we search up your directory tree.

Value

A hipercow_bundle object, which groups together tasks, and for which you can use a set of grouped functions to get status (hipercow_bundle_status), results (hipercow_bundle_result) etc.

Examples

cleanup <- hipercow_example_helper()
#>  This example uses a special helper

# The simplest way to use this function is like lapply:
x <- runif(5)
bundle <- task_create_bulk_call(sqrt, x)
#>  Submitted 5 tasks using 'example'
#>  Created bundle 'quasiconservative_greatargus' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, sqrt)
#> [[1]]
#> [1] 0.1042711
#> 
#> [[2]]
#> [1] 0.2512714
#> 
#> [[3]]
#> [1] 0.5136553
#> 
#> [[4]]
#> [1] 0.7706079
#> 
#> [[5]]
#> [1] 0.5454906
#> 

# You can pass additional arguments in via 'args':
x <- runif(5)
bundle <- task_create_bulk_call(log, x, list(base = 3))
#>  Submitted 5 tasks using 'example'
#>  Created bundle 'copacetic_graywolf' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # lapply(x, log, base = 3)
#> [[1]]
#> [1] -0.2349812
#> 
#> [[2]]
#> [1] -0.8295257
#> 
#> [[3]]
#> [1] -2.574896
#> 
#> [[4]]
#> [1] -0.2615587
#> 
#> [[5]]
#> [1] -0.5807769
#> 

# Passing in a data.frame acts like Map (though with all arguments named)
x <- data.frame(a = runif(5), b = rpois(5, 10))
bundle <- task_create_bulk_call(function(a, b) sum(rnorm(b)) / a, x)
#>  Submitted 5 tasks using 'example'
#>  Created bundle 'known_xoloitzcuintli' with 5 tasks
hipercow_bundle_wait(bundle)
#> [1] TRUE
hipercow_bundle_result(bundle) # like Map(f, a = x$a, b = x$b), where f is the function above
#> [[1]]
#> [1] -7.021743
#> 
#> [[2]]
#> [1] 5.74547
#> 
#> [[3]]
#> [1] 9.834079
#> 
#> [[4]]
#> [1] 0.6170752
#> 
#> [[5]]
#> [1] 27.38846
#> 

cleanup()
#>  Cleaning up example