Creating plugins • orderly2

You may not need to read this: the intended readers are authors of orderly2 plugins, not users of such plugins.

In order to make orderly2 more extensible without bloating the core, we have designed a simple plugin interface. Our first use case for this is shifting all of orderly1’s database functionality out of the main package, but other uses are possible!

This vignette is intended to primarily serve as a design document, and will be of interest to the small number of people who might want to write a new plugin, or to edit an existing one.

The basic idea

A plugin is provided by a package, possibly it will be the only thing that a package provides. The plugin name must (currently) be the same as the package name. The only functions that the package needs to call are orderly2::orderly_plugin and orderly2::orderly_plugin_register which create and register the plugin, respectively.

To make a plugin available for an orderly project, two new bits of configuration may be present in orderly_config.yml - one declares the plugin will be used, the other configures the plugin.

To use a plugin for an individual report, functions from the plugin should be used, which configure and use the plugin.

Finally, we can save information back into the final orderly2 metadata about what the plugin did.

With the yaml-less design of orderly2 (see vignette("migrating") if you are familiar with orderly1), the line between a plugin and just package code is fairly blurred, but reasons for writing a plugin are typically that you want to make something easier in reports, and you want that action reflected in the orderly metadata.

An example

As an example, we’ll implement a stripped down version of the database plugin that inspired this work (see `orderly.db for a fuller implementation). To make this work we need functions:

…that process additional fields in orderly_config.yml that describe where to find the database
…that can be called from an orderly file that access the database
…that can add metadata to the final orderly metadata about what was done

We’ll start with the report side of things, describing what we want to happen, then work on the implementation.

Here is the directory structure of our minimal project

## .
## ├── orderly_config.yml
## └── src
##     └── example
##         └── example.R

The orderly_config.yml file contains the information shared by all possible uses of the plugin - in the case the connection information for the database:

minimum_orderly_version: 1.99.0
plugins:
  example.db:
    path: /tmp/RtmppttE8n/file1bf1549eed9e

Our plugin is called example.db and is listed within the plugins section, along with its configuration; in this case indicating the path where the SQLite file can be loaded from.

The example.R file contains information about use of the database for this specific report; in this case, making the results of the query SELECT * from mtcars WHERE cyl == 4 against the database available as some R object dat

dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
orderly2::orderly_artefact("Summary of data", "data.rds")

saveRDS(summary(dat), "data.rds")

Normally, we imagine some calculation here but this is kept minimal for the purpose of demonstration.

To implement this we need to:

create a package
write a function to handle the configuration in orderly_config.yml
write a function query() used in example.R to do the query itself

Create a tiny package

There are lots of package skeleton tools out there, and if you do not have a favourite, usethis::create_package() will probably do a reasonable job. The only thing your package needs to do is to contain Imports: orderly2 in its DESCRIPTION field.

A simple package may have a structure like

## .
## ├── DESCRIPTION
## ├── NAMESPACE
## └── R
##     └── plugin.R

Here, our DESCRIPTION file contains:

Package: example.db
Version: 0.0.1
License: CC0
Title: Orderly Database Example Plugin
Description: Simple example of an orderly plugin.
Authors@R: person('Orderly Authors', role = c('aut', 'cre'),
    email = 'email@example.com')
Imports: orderly2

and the NAMESPACE and R/plugin.R files are shown below.

Handle the configuration

The only required function that a plugin needs to provide is one to process the data from orderly_config.yml. This is probably primarily concerned with validation so can be fairly simple at first, later we’ll expand this to report errors nicely:

db_config <- function(data, filename) {
  data
}

The arguments here are

data: the deserialised section of the orderly_config.yml specific to this plugin
filename: the full path to orderly_config.yml

The return value here should be the data argument with any auxiliary data added after validation.

Evaluate the query

Finally, for our minimal example, we need the function that actually does the query; in our example above this is example.db::query:

query <- function(sql) {
  ctx <- orderly2::orderly_plugin_context("example.db")
  dbname <- ctx$config$path
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, sql)
}

The arguments here are whatever you want the user to provide – nothing here is special to orderly2. The important function here to call is orderly2::orderly_plugin_context which returns information that you can use to make the plugin work. This is explained in ?orderly2::orderly_plugin_context, but in this example we use just one element, config, the configuration for this plugin (i.e., the return value from our function db_config); see orderly2::orderly_plugin_context for other context that can be accessed here.

The last bit of package code is to register the plugin, we do this by calling orderly2::orderly_plugin_register within .onLoad() which is a special R function called when a package is loaded. This means that whenever your packages is loaded (regardless of whether it is attached) it will register the plugin.

.onLoad <- function(...) {
  orderly2::orderly_plugin_register(
    name = "example.db",
    config = db_config)
}

(It is important that the name argument here matches your package name, as orderly2 will trigger loading the package based on this name in the configuration; we may support multiple plugins within one package later.)

Note that our query function here does not appear within this registration, just the function to read and process the configuration.

Our final (minimal) package code is:

db_config <- function(data, filename) {
  data
}

query <- function(sql) {
  ctx <- orderly2::orderly_plugin_context("example.db")
  dbname <- ctx$config$path
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, sql)
}

.onLoad <- function(...) {
  orderly2::orderly_plugin_register(
    name = "example.db",
    config = db_config)
}

and the NAMESPACE file contains

export(query)

Trying it out

In order to test your package, it needs to be loaded. You can do this by either installing the package or by using pkgload::load_all() (you may find doing so with pkgload::load_all(export_all = FALSE) gives the most reliable experience.

pkgload::load_all()

## ℹ Loading example.db

Now, we can run the report:

orderly2::orderly_run("example", root = path_root)
## ℹ Starting packet 'example' `20240913-130527-b8b867a0` at 2024-09-13 13:05:27.725355
## > dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
## > orderly2::orderly_artefact("Summary of data", "data.rds")
## Warning: Please use a named argument for the description in 'orderly_artefact()'
## In future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## > saveRDS(summary(dat), "data.rds")
## ✔ Finished running example.R
## ! 1 warning found:
## • Please use a named argument for the description in 'orderly_artefact()' In
## future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## ℹ Finished 20240913-130527-b8b867a0 at 2024-09-13 13:05:27.910492 (0.1851375 secs)
## [1] "20240913-130527-b8b867a0"

Making the plugin more robust

The plugin above is fairly fragile because it does not do any validation on the input data from orderly_config.yml or orderly.yml. This is fairly annoying to do as yaml is incredibly flexible and reporting back information to the user about what might have gone wrong is hard.

In our case, we expect a single key-value pair in orderly_config.yml with the key being path and the value being the path to a SQLite database. We can easily expand our configuration function to report better back to the user when they misconfigure the plugin:

db_config <- function(data, filename) {
  if (!is.list(data) || is.null(names(data)) || length(data) == 0) {
    stop("Expected a named list for orderly_config.yml:example.db")
  }
  if (length(data$path) != 1 || !is.character(data$path)) {
    stop("Expected a string for orderly_config.yml:example.db:path")
  }
  if (!file.exists(data$path)) {
    stop(sprintf(
      "The database '%s' does not exist (orderly_config:example.db:path)",
      data$path))
  }
  data
}

This should do an acceptable job of preventing poor input while suggesting to the user where they might look within the configuration to fix it. Note that we return the configuration data here, and you can augment (or otherwise change) this data as you need.

Saving metadata about what the plugin did

Nothing about what the plugin does is saved into the report metadata unless you save it. Partly this is because the orderly.yml, which is saved into the final directory, serves as some sort of record. However, you probably want to know something about the data that you returned here. For example we might want to save

the query string so that later we can query it without having to read and process the orderly.yml file
some statistics about the size of the data (e.g., the number of rows returned, or the columns)
perhaps some summary of the content such as a hash so that we can see if the content has changed between different versions of a report

To save metadata, use the function orderly2::orderly_plugin_add_metadata; this takes as arguments your plugin name, any string you like to structure the saved metadata (here we’ll use query) and whatever data you want to save:

query <- function(sql) {
  ctx <- orderly2::orderly_plugin_context("example.db")
  dbname <- ctx$config$path
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
  on.exit(DBI::dbDisconnect(con))
  d <- DBI::dbGetQuery(con, sql)
  info <- list(sql = sql, rows = nrow(d), cols = names(d))
  orderly2::orderly_plugin_add_metadata("example.db", "query", info)
  d
}

This function is otherwise the same as the minimal version above.

We also need to provide a serialisation function to ensure that the metadata is saved as expected. Because we saved our metadata under the key query, we will get a list back with an element query and then an unnamed list with as many elements as there were query calls in a given report.

db_serialise <- function(data) {
  for (i in seq_along(data$query)) {
    # Always save cols as a vector, even if length 1:
    data$query[[i]]$cols <- I(data$query[[i]]$cols)
  }
  jsonlite::toJSON(data$query, auto_unbox = TRUE)
}

Here, we ensure that everything except cols that is length 1 (which will be everything) gets turned into a scalar (so 1 not [1]) and then serialise with jsonlite::toJSON with auto_unbox as TRUE.

Taking this a step further, we can also specify a schema that this metadata will conform to

{
    "$schema": "http://json-schema.org/draft-07/schema#",

    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sql": {
                "type": "string"
            },
            "rows": {
                "type": "number"
            },
            "cols": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        },
        "required": ["sql", "rows", "cols"],
        "additionalProperties": false
    }
}

We save this file as inst/schema.json within the package (any path within inst is fine).

Finally, we can also add a deserialiation hook to convert the loaded metadata into a nice data.frame:

Now, when we register the plugin, we provide the path to this schema, along with the serialisation and deserialisation functions:

.onLoad <- function(...) {
  orderly2::orderly_plugin_register(
    name = "example.db",
    config = db_config,
    serialise = db_serialise,
    deserialise = db_deserialise,
    schema = "schema.json")
}

Now, when the orderly metadata is saved (just before running the script part of a report) we will validate output that was passed into orderly2::orderly_plugin_add_metadata against the schema, if jsonvalidate is installed (currently this requires our development version) and if the R option outpack.schema_validate is set to TRUE (e.g., by running options(outpack.schema_validate = TRUE)).

Our final package has structure:

## .
## ├── archive
## │   └── example
## │       └── 20240913-130527-b8b867a0
## │           ├── data.rds
## │           └── example.R
## ├── draft
## │   └── example
## ├── orderly_config.yml
## └── src
##     └── example
##         └── example.R

The DESCRIPTION file and NAMESPACE are unchanged from above, and the schema is shown just above.

The plugin.R file contains the code collected from above:

db_config <- function(data, filename) {
  if (!is.list(data) || is.null(names(data)) || length(data) == 0) {
    stop("Expected a named list for orderly_config.yml:example.db")
  }
  if (length(data$path) != 1 || !is.character(data$path)) {
    stop("Expected a string for orderly_config.yml:example.db:path")
  }
  if (!file.exists(data$path)) {
    stop(sprintf(
      "The database '%s' does not exist (orderly_config:example.db:path)",
      data$path))
  }
  data
}

query <- function(sql) {
  ctx <- orderly2::orderly_plugin_context("example.db")
  dbname <- ctx$config$path
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
  on.exit(DBI::dbDisconnect(con))
  d <- DBI::dbGetQuery(con, sql)
  info <- list(sql = sql, rows = nrow(d), cols = names(d))
  orderly2::orderly_plugin_add_metadata("example.db", "query", info)
  d
}

.onLoad <- function(...) {
  orderly2::orderly_plugin_register(
    name = "example.db",
    config = db_config,
    serialise = db_serialise,
    deserialise = db_deserialise,
    schema = "schema.json")
}

(this code could be in any .R file in the package, or across several).

id <- orderly2::orderly_run("example", root = path_root)
## ℹ Starting packet 'example' `20240913-130528-cb4add44` at 2024-09-13 13:05:28.7966
## > dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
## > orderly2::orderly_artefact("Summary of data", "data.rds")
## Warning: Please use a named argument for the description in 'orderly_artefact()'
## In future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## > saveRDS(summary(dat), "data.rds")
## ✔ Finished running example.R
## ! 1 warning found:
## • Please use a named argument for the description in 'orderly_artefact()' In
## future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## ℹ Finished 20240913-130528-cb4add44 at 2024-09-13 13:05:28.849757 (0.05315685 secs)
meta <- orderly2::orderly_metadata(id, root = path_root)
meta$custom$example.db
##                                   sql rows         cols
## 1 SELECT * FROM mtcars WHERE cyl == 4   11 mpg, cyl....

Potential uses

Our need for this functionality are similar to this example - pulling out the database functionality from the original version of orderly into something that is more independent, as it turns out to be useful only in a fraction of orderly use-cases. We can imagine other potential uses though, such as:

Non-DBI-based database data extraction, or customised routines for pulling data from a database
Download files from some shared location just before use (e.g., SharePoint, OneDrive, AWS). The orderly_config.yml would contain account connection details and orderly.yml would contain mapping between the remote data/files and local files. Rather than writing to the environment as we do above, use the path argument to copy files into the correct place.
Pull data from some web API just before running

These all follow the same basic pattern of requiring some configuration in order to be able to connect to the resource service, some specification of what resources are to be fetched, and some action to actually fetch the resource and put it into place.