Start, interact with, and end a packet build — outpack_packet

Start a packet build (outpack_packet_start), end one (outpack_packet_cancel, outpack_packet_end) and interact with one (outpack_packet_use_dependency, outpack_packet_run, outpack_packet_add_custom).

outpack_packet_start(
  path,
  name,
  parameters = NULL,
  id = NULL,
  logging_console = NULL,
  logging_threshold = NULL,
  root = NULL
)

outpack_packet_cancel(packet)

outpack_packet_end(packet, insert = TRUE)

outpack_packet_run(packet, script, envir = .GlobalEnv)

outpack_packet_use_dependency(packet, query, files, search_options = NULL)

outpack_packet_add_custom(packet, application, data, schema = NULL)

Arguments

path: Path to the build / output directory.
name: The name of the packet.
parameters: Optionally, a named list of parameters. The names must be unique, and the values must all be non-NA scalar atomics (logical, integer, numeric, character).
id: Optionally, an outpack id via outpack_id. If not given, a new id will be generated.
logging_console: Optional logical, indicating if we should override the root's default in logging to the console. A value of NULL uses the root value, TRUE enables console output even when this is suppressed by the root, and FALSE disables it even when this is enabled by the root.
logging_threshold: Optional log threshold, indicating if we should override the root's default log threshold. A value of NULL uses the root value; otherwise use info, debug or trace (in increasing order of verbosity).
root: The outpack root. Will be searched for from the current directory if not given.
packet: A packet object
insert: Logical, indicating if we should insert the packet into the store. This is the default and generally what you want. The use case for insert = FALSE is writing out all metadata after a failure, where you would not want to do a final insertion into the outpack archive. When insert = FALSE, we write out the json metadata that would have been written as outpack.json within the packet working directory. Note that this skips a lot of validation (for example, validating that all files exist and that files marked immutable have not been changed).
script: Path to the script within the packet directory (a relative path). This function can be safely called multiple times within a single packet run (or zero times!) as needed.
envir: Environment in which to run the script
query: An outpack_query object, or something (e.g., a string) that can be trivially converted into one.
files: A named character vector of files; the name corresponds to the name within the current packet, while the value corresponds to the name within the upstream packet.
search_options: Optional search options for restricting the search (see outpack_search for details).
application: The name of the application (used to organise the data and query it later, see Details).
data: Additional metadata to add to the packet. This must be a string representing already-serialised json data.
schema: Optional but recommended: a schema to validate data against. Validation will only happen if the option outpack.schema_validate is TRUE, as for the main schema validation. Will be passed to jsonvalidate::json_schema, so can be a string containing the schema or a path to the schema.
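
The constraints on the parameters argument can be checked up front, before starting a packet. The following is a minimal sketch in base R. check_parameters is a hypothetical helper, not part of outpack, and outpack's own validation may differ in detail.

```r
# Hypothetical helper (not part of outpack): check that `parameters` is
# empty/NULL or a uniquely named list of non-NA scalar atomics.
check_parameters <- function(parameters) {
  if (length(parameters) == 0) {
    return(invisible(TRUE))
  }
  nms <- names(parameters)
  if (!is.list(parameters) || is.null(nms) ||
      any(!nzchar(nms)) || anyDuplicated(nms) > 0) {
    stop("'parameters' must be a list with unique, non-empty names")
  }
  # Each value must be a length-1 logical, numeric (integer or double)
  # or character vector that is not NA
  ok <- vapply(parameters, function(x) {
    (is.logical(x) || is.numeric(x) || is.character(x)) &&
      length(x) == 1L && !is.na(x)
  }, logical(1))
  if (!all(ok)) {
    stop("Not a non-NA scalar atomic: ", paste(nms[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

check_parameters(list(n = 10L, method = "fast", verbose = TRUE))
```

Running a check like this first gives an error before any packet state has been created.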

Value

Invisibly, a copy of the packet data; this can be passed as the packet argument.
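
The functions above are intended to be used together over the life of one packet. The sketch below shows one way this might look, using only the signatures listed in Usage. build_packet is a hypothetical wrapper, and it assumes that an outpack root already exists and that the value returned by each call can be passed on as the packet argument, as described above.

```r
# Hypothetical wrapper (not part of outpack): start a packet, run one
# script in it, and end it. On failure, write out the metadata without
# inserting the packet into the archive, then re-throw the error.
build_packet <- function(path, name, script, parameters = NULL, root = NULL) {
  packet <- outpack::outpack_packet_start(path, name,
                                          parameters = parameters,
                                          root = root)
  tryCatch({
    # Run in a fresh environment rather than the default .GlobalEnv
    packet <- outpack::outpack_packet_run(packet, script,
                                          envir = new.env(parent = .GlobalEnv))
    outpack::outpack_packet_end(packet)
  }, error = function(e) {
    # insert = FALSE: keep the metadata for inspection, skip insertion
    outpack::outpack_packet_end(packet, insert = FALSE)
    stop(e)
  })
}

# Usage (paths and names are illustrative only):
# build_packet("path/to/build", "analysis", "script.R",
#              parameters = list(n = 10L))
```

If the packet should be abandoned rather than recorded after a failure, outpack_packet_cancel(packet) could be called in the error handler instead.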

Running scripts

R does not make it easy to "run" a script while collecting output and warnings in a nice way. You may be familiar with this from running scripts through tools like knitr, where differences in behaviour between running within knitr and running directly in R are not uncommon. If you see any behaviour which feels very different to what you expect, please let us know.

One area of known difference is that of warnings; what R does with warnings depends on a number of options, both global options and arguments to warning itself. We do not currently try very hard to reproduce the warning behaviour you might see when running directly with source and observing your terminal, partly because we hope that in practice your code will produce very few warnings.

On failure in the script, outpack_packet_run will throw, forcing any function that calls outpack_packet_run to handle the error explicitly. The error that is generated will have class outpack_packet_run_error, allowing it to be easily distinguished from other R errors. In addition to a message field, it will have data fields containing information about the error:

error: the original error object, as thrown and caught by outpack
traceback: the backtrace for the above error, currently just as a character vector, though this may change in future versions
output: a character vector of interleaved stdout and stderr as the script ran
warnings: a list of warnings raised by the script

The other reason why the script may fail is that it fails to balance one of the global resource stacks - either connections (rare) or graphics devices (easy to do). In this case, we still throw a (classed) error, but the error field in the final error will be NULL, with an informative message explaining what was not balanced.
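
Because the error is classed, it can be caught separately from other errors with tryCatch. The sketch below is base R only: summarise_run_error is a hypothetical helper, and fake is a hand-built stand-in condition that carries the class and fields listed above so that the example runs without outpack. A real error from outpack_packet_run would be handled the same way.

```r
# Hypothetical helper: reduce a run error to a short summary.
summarise_run_error <- function(e) {
  list(
    message = conditionMessage(e),
    # `error` is NULL when the failure was an unbalanced connection or
    # graphics device stack rather than an error in the script itself
    script_error = if (is.null(e$error)) NA_character_ else conditionMessage(e$error),
    n_warnings = length(e$warnings),
    last_output = utils::tail(e$output, 5)
  )
}

# Stand-in for the condition thrown by outpack_packet_run (illustrative
# values only; not produced by outpack)
fake <- structure(
  list(message = "Script failed",
       call = NULL,
       error = simpleError("object 'x' not found"),
       traceback = character(),
       output = c("> x + 1", "Error: object 'x' not found"),
       warnings = list()),
  class = c("outpack_packet_run_error", "error", "condition"))

# In real use, stop(fake) would be a call to outpack_packet_run()
result <- tryCatch(
  stop(fake),
  outpack_packet_run_error = function(e) summarise_run_error(e))
```

Handling by class name means other errors, for example from a bad argument, still propagate as usual.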

Dependency resolution

The search_options argument controls where outpack searches for packets with the given query and whether anything might be moved over the network (or from one outpack archive to another). By default everything is resolved locally only; that is, we can only depend on packets that are unpacked within our current archive. If you pass a search_options argument that contains allow_remote = TRUE (see outpack_search_options), then packets that are known anywhere are candidates for use as dependencies, and if needed we will pull the resolved files from a remote location. Note that even if the packet is not locally present this might not be needed: if you have the same content anywhere else in an unpacked packet, we will reuse that content without refetching.

If pull_metadata = TRUE, then we will refresh location metadata before pulling, and the location argument controls which locations are pulled from.
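
As a sketch of how these options might be combined, the hypothetical wrapper below allows remote packets and refreshes metadata before resolving the query. It assumes that outpack_search_options accepts allow_remote and pull_metadata as named arguments, as the text above suggests; check its help page for the actual signature. The file names are illustrative only.

```r
# Name = filename within the current packet,
# value = filename within the upstream packet
files <- c("incoming.rds" = "results.rds")

# Hypothetical wrapper (not part of outpack): depend on a packet that
# may only be known remotely.
use_upstream <- function(packet, query, files) {
  # Assumption: these are named arguments of outpack_search_options
  options <- outpack::outpack_search_options(allow_remote = TRUE,
                                             pull_metadata = TRUE)
  outpack::outpack_packet_use_dependency(packet, query, files,
                                         search_options = options)
}

# Usage, with `packet` from outpack_packet_start and `query` an
# outpack_query object or a string convertible to one:
# use_upstream(packet, query, files)
```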

Custom metadata

The outpack_packet_add_custom function adds arbitrary additional metadata into a packet. It is primarily designed for use with applications that build on outpack to provide additional information beyond the minimal set provided by outpack.

For example, orderly tracks "artefacts" which collect groups of file outputs into logical bundles. To support this it needs to register additional data for each artefact with:

the description of the artefact (a short phrase)
the format of the artefact (a string describing the data type)
the contents of the artefact (an array of filenames)

JSON for this might look like:

{
  "artefacts": [
    {
      "description": "Data for onward use",
      "format": "data",
      "contents": ["results.rds", "summary.rds"]
    },
    {
      "description": "Diagnostic figures",
      "format": "staticgraph",
      "contents": ["fits.png", "inputs.png"]
    }
  ]
}

Here, we describe two artefacts, together collecting four files.

We need to store these in outpack's final metadata, and we want to do this in a way that allows easy querying later on while scoping the data to your application. To allow for this we group all data your application adds under an application key (e.g., orderly). You can then store whatever data you want beneath this key.

NOTE1: A limitation here is that the filenames above cannot be checked against the outpack list of files because outpack does not know that contents here refers to filenames.

NOTE2: To allow for predictable serialisation to JSON, you must serialise your own data before passing through to outpack_packet_add_custom.
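
As a sketch of the serialisation step, the artefact data shown above could be built as an R list and converted with jsonlite before being passed on. The use of jsonlite is an assumption here (any serialiser that produces a JSON string would do), and add_artefacts is a hypothetical wrapper, not part of outpack or orderly.

```r
# Build the example metadata as nested lists. I() keeps `contents` as a
# JSON array even if it holds a single filename.
artefacts <- list(artefacts = list(
  list(description = "Data for onward use",
       format = "data",
       contents = I(c("results.rds", "summary.rds"))),
  list(description = "Diagnostic figures",
       format = "staticgraph",
       contents = I(c("fits.png", "inputs.png")))))

# Serialise ourselves; auto_unbox = TRUE writes length-1 vectors such as
# `description` as JSON scalars rather than arrays
data <- as.character(jsonlite::toJSON(artefacts, auto_unbox = TRUE))

# Hypothetical wrapper: store the data under the "orderly" application key
add_artefacts <- function(packet, data, schema = NULL) {
  outpack::outpack_packet_add_custom(packet, "orderly", data, schema = schema)
}
```

Controlling auto_unbox and I() explicitly is what makes the output predictable: the same R structure always yields the same JSON shape, regardless of how many files each artefact contains.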