Start a packet build (outpack_packet_start), end one (outpack_packet_cancel, outpack_packet_end) and interact with one (outpack_packet_use_dependency, outpack_packet_run, outpack_packet_add_custom)

outpack_packet_start(
  path,
  name,
  parameters = NULL,
  id = NULL,
  logging_console = NULL,
  logging_threshold = NULL,
  root = NULL
)

outpack_packet_cancel(packet)

outpack_packet_end(packet, insert = TRUE)

outpack_packet_run(packet, script, envir = .GlobalEnv)

outpack_packet_use_dependency(packet, query, files, search_options = NULL)

outpack_packet_add_custom(packet, application, data, schema = NULL)

Arguments

path

Path to the build / output directory.

name

The name of the packet.

parameters

Optionally, a named list of parameters. The names must be unique, and the values must all be non-NA scalar atomics (logical, integer, numeric, character).

id

Optionally, an outpack id via outpack_id. If not given, a new id will be generated.

logging_console

Optional logical, indicating whether to override the root's default for logging to the console. A value of NULL uses the root value, TRUE enables console output even when the root suppresses it, and FALSE disables it even when the root enables it.

logging_threshold

Optional log threshold, indicating whether to override the root's default threshold for console logging. A value of NULL uses the root value; otherwise, use one of info, debug or trace (in increasing order of verbosity).

root

The outpack root. If not given, it is searched for starting from the current directory.

packet

A packet object, as returned by outpack_packet_start.

insert

Logical, indicating whether to insert the packet into the store. The default (TRUE) is generally what you want. The use case for insert = FALSE is writing out all metadata after a failure, when you would not want a final insertion into the outpack archive. With insert = FALSE, the JSON metadata is written out as outpack.json within the packet working directory. Note that this skips much of the validation (for example, checking that all files exist and that files marked immutable have not been changed).

script

Path to the script within the packet directory (a relative path). outpack_packet_run can safely be called multiple times within a single packet run (or not at all), as needed.

envir

Environment in which to run the script.

query

An outpack_query object, or something (e.g., a string) that can be trivially converted into one.

files

A named character vector of files; each name is the path within the current packet, and each value is the corresponding path within the upstream packet.

search_options

Optional search options for restricting the search (see outpack_search for details).

application

The name of the application (used to organise the data and query it later; see Details).

data

Additional metadata to add to the packet. This must be a string containing already-serialised JSON data.

schema

Optional but recommended: a schema to validate data against. Validation only happens if the option outpack.schema_validate is TRUE, as for the main schema validation. The schema is passed to jsonvalidate::json_schema, so it can be either a string containing the schema or a path to the schema file.
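The rules for parameters can be checked before calling outpack_packet_start. The following is a minimal base-R sketch; validate_parameters is a hypothetical helper, not part of outpack, and outpack's own validation may differ in detail (for example, in how it reports problems).

```r
# Hypothetical helper (not part of outpack) mirroring the documented rules:
# a named list with unique, non-empty names whose values are all non-NA
# scalar atomics (logical, integer, numeric or character).
validate_parameters <- function(parameters) {
  if (is.null(parameters)) {
    return(TRUE)  # parameters are optional
  }
  nms <- names(parameters)
  if (!is.list(parameters) || is.null(nms) || any(nms == "") ||
      anyDuplicated(nms) > 0) {
    return(FALSE)
  }
  # is.numeric() is also TRUE for integers
  is_scalar_ok <- function(x) {
    (is.logical(x) || is.numeric(x) || is.character(x)) &&
      length(x) == 1 && !is.na(x)
  }
  all(vapply(parameters, is_scalar_ok, logical(1)))
}

validate_parameters(list(n = 10L, method = "fast"))  # TRUE
validate_parameters(list(n = 1, n = 2))              # FALSE: duplicate names
validate_parameters(list(n = NA))                    # FALSE: NA value
validate_parameters(list(n = 1:3))                   # FALSE: not a scalar
```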

Value

Invisibly, a copy of the packet data; this can be passed as the packet argument.
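Because the returned packet can be passed as the packet argument, a typical build starts a packet, runs one or more scripts and then ends it, recording metadata without insertion if a script fails. The sketch below uses only the signatures shown above and assumes that the outpack package is installed, that path is a directory containing script.R, and that the root can be found from the current directory (root is not given). build_packet is a hypothetical wrapper, not part of outpack.

```r
# Hypothetical wrapper (not part of outpack) around the documented lifecycle.
# Assumes outpack is installed and `path` contains `script`.
build_packet <- function(path, name, script = "script.R",
                         parameters = NULL) {
  p <- outpack::outpack_packet_start(path, name, parameters = parameters)
  tryCatch(
    outpack::outpack_packet_run(p, script),
    error = function(e) {
      # Failure: write out metadata for inspection, but skip insertion.
      # outpack_packet_cancel(p) is the alternative if nothing should be kept.
      outpack::outpack_packet_end(p, insert = FALSE)
      stop(e)
    })
  # Success: validate and insert the packet into the store.
  outpack::outpack_packet_end(p)
}
```

Only the script run is wrapped in tryCatch, so a failure inside the final outpack_packet_end is not followed by a second call to end the same packet.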

Running scripts

R does not make it easy to "run" a script while cleanly collecting its output and warnings. You may be familiar with this from running scripts through tools like knitr, where behaviour often differs between running code within knitr and running it directly in R. If you see any behaviour that differs markedly from what you expect, please let us know.

One known difference concerns warnings: what R does with a warning depends on a number of options, both global options and arguments to warning itself. We do not currently try very hard to reproduce the behaviour you would see when running a script directly with source and watching your terminal, partly because we hope that in practice your code will produce very few warnings.
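For readers unfamiliar with how R code can capture warnings, the sketch below uses base R's withCallingHandlers to collect each warning into a list while letting evaluation continue, roughly analogous to the warnings field described below. collect_warnings is a hypothetical helper; it is an illustration of the general technique, not outpack's implementation.

```r
# Hypothetical helper (not outpack's implementation): evaluate `expr`,
# collecting each warning and muffling it so evaluation continues.
collect_warnings <- function(expr) {
  collected <- list()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      collected[[length(collected) + 1]] <<- w
      invokeRestart("muffleWarning")
    })
  list(value = value, warnings = collected)
}

res <- collect_warnings({
  warning("first problem")
  as.integer("not a number")  # warns: NAs introduced by coercion
  42
})
length(res$warnings)  # 2
res$value             # 42
```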

If the script fails, outpack_packet_run throws an error, so any code that calls outpack_packet_run must handle the error explicitly. The error has class outpack_packet_run_error, so it can easily be distinguished from other R errors. In addition to a message field, it has extra data fields describing the failure:

  • error: the original error object, as thrown and caught by outpack

  • traceback: the backtrace for the above error, currently just as a character vector, though this may change in future versions

  • output: a character vector of interleaved stdout and stderr as the script ran

  • warnings: a list of warnings raised by the script

The other way a script can fail is by leaving one of the global resource stacks unbalanced: either connections (rare) or graphics devices (easy to do, for example by opening a device and never closing it). In this case we still throw a classed error, but its error field will be NULL, and the message explains what was not balanced.
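Code calling outpack_packet_run can catch this error by its class using tryCatch. To keep the sketch self-contained, make_run_error is a hypothetical constructor that imitates the documented fields and the message text is an invented example; in practice the condition comes from outpack_packet_run itself, and the exact field contents may differ.

```r
# Hypothetical stand-in for the condition raised by outpack_packet_run,
# carrying the documented fields. Here `error` is NULL, as in the
# unbalanced-resource case.
make_run_error <- function(message, error = NULL, output = character(),
                           warnings = list()) {
  structure(
    list(message = message, call = NULL, error = error,
         traceback = character(), output = output, warnings = warnings),
    class = c("outpack_packet_run_error", "error", "condition"))
}

result <- tryCatch(
  stop(make_run_error("Script left a graphics device open (example)",
                      output = "> png('fig.png')")),
  outpack_packet_run_error = function(e) {
    list(message = conditionMessage(e),
         output = e$output,
         # A NULL `error` field means a resource stack was left unbalanced
         unbalanced = is.null(e$error))
  })
result$unbalanced  # TRUE
```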

Dependency resolution

The search_options argument controls where outpack searches for packets matching the given query, and whether anything may be moved over the network (or from one outpack archive to another). By default, everything is resolved locally only; that is, we can only depend on packets that are unpacked within our current archive. If you pass a search_options argument that contains allow_remote = TRUE (see outpack_search_options), then packets that are known anywhere are candidates for use as dependencies, and if needed we will pull the resolved files from a remote location. Even when a packet is not present locally, fetching may not be necessary: if the same content exists anywhere else in an unpacked packet, we reuse it without refetching.

If pull_metadata = TRUE is also set in search_options, we refresh location metadata before pulling, and the location option controls which locations are pulled from.
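A minimal sketch of declaring a dependency, assuming the outpack package is installed and that packet is an active packet from outpack_packet_start. The files vector follows the documented mapping (names are paths in the current packet, values are paths in the upstream packet). The query string is an illustrative assumption about outpack's query syntax, not taken from this page, and use_upstream is a hypothetical wrapper.

```r
# Names: destination paths in the current packet.
# Values: source paths within the upstream packet.
files <- c("inputs/data.rds" = "results.rds",
           "inputs/summary.rds" = "summary.rds")

# Hypothetical wrapper; assumes outpack is installed and `packet` is active.
# The default query is an illustrative assumption.
use_upstream <- function(packet, files,
                         query = 'latest(name == "upstream")',
                         allow_remote = FALSE) {
  opts <- outpack::outpack_search_options(allow_remote = allow_remote)
  outpack::outpack_packet_use_dependency(packet, query, files,
                                         search_options = opts)
}

files[["inputs/data.rds"]]  # "results.rds"
```

Leaving allow_remote = FALSE keeps resolution local, matching the default behaviour described above.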

Custom metadata

The outpack_packet_add_custom function adds arbitrary additional metadata into a packet. It is primarily designed for use with applications that build on outpack to provide additional information beyond the minimal set provided by outpack.

For example, orderly tracks "artefacts", which collect groups of file outputs into logical bundles. To support this, it needs to register additional data for each artefact, consisting of:

  • the description of the artefact (a short phrase)

  • the format of the artefact (a string describing the data type)

  • the contents of the artefact (an array of filenames)

JSON for this might look like:

{
  "artefacts": [
    {
      "description": "Data for onward use",
      "format": "data",
      "contents": ["results.rds", "summary.rds"]
    },
    {
      "description": "Diagnostic figures",
      "format": "staticgraph",
      "contents": ["fits.png", "inputs.png"]
    }
  ]
}

Here, we describe two artefacts, together collecting four files.

We need to store these in outpack's final metadata, in a way that allows easy querying later on while scoping the data to your application. To allow for this, we group all data your application adds under an application key (e.g., orderly), beneath which you can store whatever data you want.

NOTE1: A limitation here is that the filenames above cannot be checked against outpack's list of files, because outpack does not know that contents refers to filenames.
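Since outpack cannot check these filenames, an application may want to check them itself before adding its metadata. A minimal base-R sketch follows; missing_artefact_files is a hypothetical helper, and packet_files stands in for however your application lists the files in the packet.

```r
# The artefacts from the JSON example above, as an R list.
artefacts <- list(
  list(description = "Data for onward use", format = "data",
       contents = c("results.rds", "summary.rds")),
  list(description = "Diagnostic figures", format = "staticgraph",
       contents = c("fits.png", "inputs.png")))

# Hypothetical application-side check: which artefact files are absent?
missing_artefact_files <- function(artefacts, packet_files) {
  contents <- unlist(lapply(artefacts, function(a) a$contents))
  setdiff(contents, packet_files)
}

missing_artefact_files(artefacts,
                       c("results.rds", "summary.rds", "fits.png"))
# "inputs.png"
```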

NOTE2: To allow for predictable serialisation to JSON, you must serialise your data yourself before passing it to outpack_packet_add_custom.
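A minimal sketch of adding orderly-style artefact data, assuming the outpack package is installed and that packet is an active packet. The JSON is written by hand to keep the example dependency-free. If you generate it with a serialiser such as jsonlite::toJSON with auto_unbox = TRUE, note that a contents field holding a single filename would become a string rather than an array; serialising yourself lets you control this. add_orderly_artefacts is a hypothetical wrapper, not part of outpack or orderly.

```r
# Pre-serialised JSON for the "orderly" application key (written by hand).
artefacts_json <- paste0(
  '{"artefacts": [',
  '{"description": "Data for onward use", "format": "data", ',
  '"contents": ["results.rds", "summary.rds"]}',
  ']}')

# Hypothetical wrapper; assumes outpack is installed and `packet` is active.
# `schema` may be a schema string or a path, as accepted by
# jsonvalidate::json_schema.
add_orderly_artefacts <- function(packet, data = artefacts_json,
                                  schema = NULL) {
  outpack::outpack_packet_add_custom(packet, "orderly", data,
                                     schema = schema)
}
```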