Start a packet build (outpack_packet_start), end one (outpack_packet_cancel, outpack_packet_end) and interact with one (outpack_packet_use_dependency, outpack_packet_run).
outpack_packet_start(
path,
name,
parameters = NULL,
id = NULL,
logging_console = NULL,
logging_threshold = NULL,
root = NULL
)
outpack_packet_cancel(packet)
outpack_packet_end(packet, insert = TRUE)
outpack_packet_run(packet, script, envir = .GlobalEnv)
outpack_packet_use_dependency(packet, query, files, search_options = NULL)
outpack_packet_add_custom(packet, application, data, schema = NULL)
Path to the build / output directory.
The name of the packet
Optionally, a named list of parameters. The names must be unique, and the values must all be non-NA scalar atomics (logical, integer, numeric, character)
Optionally, an outpack id via outpack_id. If not given, a new id will be generated.
Optional logical, indicating if we should override the root's default in logging to the console. A value of NULL uses the root value, TRUE enables console output even when this is suppressed by the root, and FALSE disables it even when this is enabled by the root.
Optional log threshold, indicating if we should override the root's default in logging to the console. A value of NULL uses the root value; otherwise use info, debug or trace (in increasing order of verbosity).
The outpack root. Will be searched for from the current directory if not given.
A packet object
Logical, indicating if we should insert the packet into the store. This is the default and generally what you want. The use-case we have for insert = FALSE is where you want to write out all metadata after a failure, and in this case you would not want to do a final insertion into the outpack archive. When insert = FALSE, we write out the JSON metadata that would have been written as outpack.json within the packet working directory. Note that this skips a lot of validation (for example, validating that all files exist and that files marked immutable have not been changed).
Path to the script within the packet directory (a relative path). This function can be safely called multiple times within a single packet run (or zero times!) as needed.
Environment in which to run the script
An outpack_query object, or something (e.g., a string) that can be trivially converted into one.
A named character vector of files; the name corresponds to the name within the current packet, while the value corresponds to the name within the upstream packet
Optional search options for restricting the search (see outpack_search for details)
The name of the application (used to organise the data and query it later, see Details)
Additional metadata to add to the packet. This must be a string representing already-serialised json data.
Optionally, but recommended, a schema to validate data against. Validation will only happen if the option outpack.schema_validate is TRUE, as for the main schema validation. Will be passed to jsonvalidate::json_schema, so can be a string containing the schema or a path to the schema.
Invisibly, a copy of the packet data; this can be passed as the packet argument.
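The start, run and end/cancel lifecycle described at the top of this page can be sketched as a small wrapper. This is an illustration only: build_packet is a hypothetical helper that is not part of outpack, it assumes the package is installed and that a root can be found from the current directory, and it uses only the signatures listed under usage above.

```r
# Sketch only: `build_packet` is a hypothetical wrapper, not part of outpack.
# Defining it is harmless; calling it requires outpack and an outpack root.
build_packet <- function(path, name, script, parameters = NULL) {
  packet <- outpack_packet_start(path, name, parameters = parameters)
  tryCatch({
    # Run the script in a fresh environment rather than .GlobalEnv
    outpack_packet_run(packet, script, envir = new.env(parent = globalenv()))
    # Finalise the packet and insert it into the store
    outpack_packet_end(packet, insert = TRUE)
  }, error = function(e) {
    # Abandon the build, then re-signal the original error to the caller
    outpack_packet_cancel(packet)
    stop(e)
  })
}
```

If you would rather keep the metadata of a failed build, the error handler could call outpack_packet_end(packet, insert = FALSE) instead of outpack_packet_cancel(packet), which is the use-case described for the insert argument.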
R does not make it extremely easy to "run" a script while collecting output and warnings in a nice way; this is something you may be familiar with when running scripts through things like knitr where differences in behaviour between running from within knitr and R are not uncommon. If you see any behaviour which feels very different to what you expect please let us know.
One area of known difference is that of warnings; what R does with warnings depends on a number of options, both global ones and those passed to warning itself. We do not currently try very hard to get the same behaviour with warnings as you might see running directly with source and observing your terminal, partly because we hope that in practice your code will produce very few warnings.
On failure in the script, outpack_packet_run will throw, forcing any function that calls outpack_packet_run to explicitly cope with the error. The error that is generated will have class outpack_packet_run_error, allowing this error to be easily distinguished from other R errors. In addition to a message field, it will have further data fields containing information about the error:
error: the original error object, as thrown and caught by outpack
traceback: the backtrace for the above error, currently just as a character vector, though this may change in future versions
output: a character vector of interleaved stdout and stderr as the script ran
warnings: a list of warnings raised by the script
The other reason why the script may fail is that it fails to balance one of the global resource stacks: either connections (rare) or graphics devices (easy to do). In this case, we still throw a (classed) error, but the error field in the final error will be NULL, with an informative message explaining what was not balanced.
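A calling function might distinguish these two failure modes along the following lines. This is a minimal sketch: fake_packet_run and handle_run are hypothetical names, and fake_packet_run only imitates the documented error shape (class and fields) so that the handler can be tried without an outpack root.

```r
# Hypothetical stand-in for outpack_packet_run(): it signals a condition
# with the documented class and fields. It is not part of outpack.
fake_packet_run <- function(original_error = NULL) {
  err <- structure(
    list(message = "Script failed",
         call = NULL,
         error = original_error,   # NULL mimics an unbalanced resource stack
         traceback = character(0),
         output = c("starting", "about to fail"),
         warnings = list()),
    class = c("outpack_packet_run_error", "error", "condition"))
  stop(err)
}

# Catch only the classed error; any other R error propagates as usual
handle_run <- function(expr) {
  tryCatch(
    expr,
    outpack_packet_run_error = function(e) {
      kind <- if (is.null(e$error)) {
        "unbalanced resource stack"
      } else {
        "script error"
      }
      list(kind = kind, message = conditionMessage(e), output = e$output)
    })
}

res_script <- handle_run(fake_packet_run(simpleError("boom")))
res_stack <- handle_run(fake_packet_run(NULL))
```

In real use the expression passed to handle_run would be a call to outpack_packet_run(packet, script) rather than the stand-in.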
The search_options argument controls where outpack searches for packets with the given query and whether anything might be moved over the network (or from one outpack archive to another). By default everything is resolved locally only; that is, we can only depend on packets that are unpacked within our current archive. If you pass a search_options argument that contains allow_remote = TRUE (see outpack_search_options), then packets that are known anywhere are candidates for use as dependencies, and if needed we will pull the resolved files from a remote location. Note that even if the packet is not locally present this might not be needed: if you have the same content anywhere else in an unpacked packet we will reuse that content without refetching.
If pull_metadata = TRUE, then we will refresh location metadata before pulling, and the location argument controls which locations are pulled from.
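As a rough illustration of how the files and search_options arguments fit together, the sketch below builds a files vector and shows, in comments, what a call might look like. The filenames, the query string and the describe_files helper are made up for this example; the commented call assumes a packet from outpack_packet_start and that outpack_search_options accepts allow_remote as described above.

```r
# Names are destinations in the current packet; values are paths in the
# upstream packet (both filenames are invented for illustration).
files <- c("incoming/data.csv" = "results.csv")

# Hypothetical helper that spells out how each entry of `files` is read
describe_files <- function(files) {
  sprintf("copy upstream '%s' to '%s'", unname(files), names(files))
}
describe_files(files)

# With a packet from outpack_packet_start(), the call might look like this
# (the query string is an assumed example, not taken from this page):
# outpack_packet_use_dependency(
#   packet, 'latest(name == "data")', files,
#   search_options = outpack_search_options(allow_remote = TRUE))
```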
The outpack_packet_add_custom function adds arbitrary additional metadata into a packet. It is primarily designed for use with applications that build on outpack to provide additional information beyond the minimal set provided by outpack.
For example, orderly tracks "artefacts" which collect groups of file outputs into logical bundles. To support this it needs to register additional data for each artefact with:
the description of the artefact (a short phrase)
the format of the artefact (a string describing the data type)
the contents of the artefact (an array of filenames)
JSON for this might look like:
{
  "artefacts": [
    {
      "description": "Data for onward use",
      "format": "data",
      "contents": ["results.rds", "summary.rds"]
    },
    {
      "description": "Diagnostic figures",
      "format": "staticgraph",
      "contents": ["fits.png", "inputs.png"]
    }
  ]
}
Here, we describe two artefacts, together collecting four files. We need to store these in outpack's final metadata, and we want to do this in a way that allows easy querying later on while scoping the data to your application. To allow for this we group all data your application adds under an application key (e.g., orderly). You can then store whatever data you want beneath this key.
NOTE1: A limitation here is that the filenames above cannot be checked against the outpack list of files because outpack does not know that contents here refers to filenames.
NOTE2: To allow for predictable serialisation to JSON, you must serialise your own data before passing through to outpack_packet_add_custom.
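One way to produce the already-serialised string is with the jsonlite package; this is an assumption for the sake of illustration, and any serialiser that yields a single JSON string should do. The sketch below rebuilds the artefacts example and leaves the final outpack call commented out, since it needs a packet from outpack_packet_start.

```r
library(jsonlite)

# The same two artefacts as in the JSON example above. I() keeps
# `contents` as a JSON array even if it ever holds a single filename.
artefacts <- list(
  artefacts = list(
    list(description = "Data for onward use",
         format = "data",
         contents = I(c("results.rds", "summary.rds"))),
    list(description = "Diagnostic figures",
         format = "staticgraph",
         contents = I(c("fits.png", "inputs.png")))))

# auto_unbox = TRUE writes length-1 vectors as scalars ("data", not ["data"])
data <- as.character(toJSON(artefacts, auto_unbox = TRUE))

# Assuming `packet` came from outpack_packet_start():
# outpack_packet_add_custom(packet, "orderly", data)
```

Controlling the serialisation yourself is what lets you decide details such as scalar versus array encoding, which is the point of NOTE2.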