Often the most difficult part of configuring your cluster jobs is sorting out all the packages that you need and making sure that they are present on the cluster. There are several levels of difficulty here and this document will walk through them in turn.

Everything is on CRAN

This is the most straightforward situation: all your packages are on CRAN. You typically don’t need to do anything special; just create your context with a list of packages and create the queue:

root <- "pkgs"
ctx <- context::context_save(root, packages = c("dplyr", "ggplot2"))
#> [ init:id   ]  a11d9597170a8cc72cc5b57c3ac3d7a0
#> [ init:db   ]  rds
#> [ init:path ]  pkgs
#> [ save:id   ]  2ab2a78d26df3a50b9beb45b52eba466
#> [ save:name ]  seaisland_mammal
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 2ab2a78d26df3a50b9beb45b52eba466
#> [ context   ]  2ab2a78d26df3a50b9beb45b52eba466
#> [ library   ]  dplyr, ggplot2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>   ,:\      /:.
#>  //  \_()_/  \\
#> ||   |    |   ||  CONAN THE LIBRARIAN
#> ||   |    |   ||  Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||   |____|   ||  Bootstrap: T:\conan\bootstrap\4.0
#>  \\  / || \  //   Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>   `:/  ||  \;'    Policy:    lazy
#>        ||         Repos:
#>        ||           * https://mrc-ide.github.io/didehpc-pkgs
#>        XX           * https://cloud.r-project.org
#>        XX         Packages:
#>        XX           * context
#>        XX           * dplyr
#>        OO           * ggplot2
#>        `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 36 pkgs (27.89 MB) and 1 pkg with unknown size
#> v Got ids 1.0.1 (windows) (123.89 kB)
#> v Got askpass 1.1 (windows) (243.58 kB)
#> v Got digest 0.6.27 (windows) (268.65 kB)
#> v Got R6 2.5.0 (windows) (84.09 kB)
#> v Got context 0.3.0 (source) (37.72 kB)
#> v Got sys 3.4 (windows) (59.83 kB)
#> v Got uuid 0.1-4 (windows) (33.77 kB)
#> v Got storr 1.2.5 (windows) (401.33 kB)
#> v Got crayon 1.4.1 (windows) (141.87 kB)
#> v Got ellipsis 0.3.2 (windows) (49.19 kB)
#> v Got generics 0.1.0 (windows) (70.74 kB)
#> v Got openssl 1.4.4 (windows) (4.10 MB)
#> v Got cli 3.0.1 (windows) (758.73 kB)
#> v Got glue 1.4.2 (windows) (155.50 kB)
#> v Got magrittr 2.0.1 (windows) (234.90 kB)
#> v Got lifecycle 1.0.0 (windows) (111.22 kB)
#> v Got pkgconfig 2.0.3 (windows) (22.31 kB)
#> v Got rlang 0.4.11 (windows) (1.21 MB)
#> v Got purrr 0.3.4 (windows) (430.04 kB)
#> v Got tidyselect 1.1.1 (windows) (204.19 kB)
#> v Got RColorBrewer 1.1-2 (windows) (55.55 kB)
#> v Got pillar 1.6.2 (windows) (1.07 MB)
#> v Got tibble 3.1.3 (windows) (835.59 kB)
#> v Got utf8 1.2.2 (windows) (209.88 kB)
#> v Got dplyr 1.0.7 (windows) (1.35 MB)
#> v Got vctrs 0.3.8 (windows) (1.25 MB)
#> v Got gtable 0.3.0 (windows) (434.23 kB)
#> v Got labeling 0.4.2 (windows) (62.73 kB)
#> v Got munsell 0.5.0 (windows) (245.14 kB)
#> v Got scales 1.1.1 (windows) (558.34 kB)
#> v Got farver 2.1.0 (windows) (1.75 MB)
#> v Got isoband 0.2.5 (windows) (2.73 MB)
#> v Got fansi 0.5.0 (windows) (248.45 kB)
#> v Got colorspace 2.0-2 (windows) (2.65 MB)
#> v Got withr 2.4.2 (windows) (212.63 kB)
#> v Got viridisLite 0.4.0 (windows) (1.30 MB)
#> v Got ggplot2 3.3.5 (windows) (4.13 MB)
#> v Installed R6 2.5.0  (735ms)
#> v Installed crayon 1.4.1  (797ms)
#> v Installed ids 1.0.1  (922ms)
#> v Installed askpass 1.1  (1.3s)
#> v Installed sys 3.4  (1.2s)
#> v Installed digest 0.6.27  (1.5s)
#> v Installed storr 1.2.5  (1.7s)
#> v Installed uuid 0.1-4  (1.4s)
#> v Installed openssl 1.4.4  (2s)
#> i Building context 0.3.0
#> v Installed cli 3.0.1  (6.5s)
#> v Installed dplyr 1.0.7  (7s)
#> v Installed ellipsis 0.3.2  (7.1s)
#> v Installed fansi 0.5.0  (7.1s)
#> v Installed generics 0.1.0  (7.2s)
#> v Installed glue 1.4.2  (7.2s)
#> v Installed lifecycle 1.0.0  (7.2s)
#> v Installed pkgconfig 2.0.3  (1.2s)
#> v Installed purrr 0.3.4  (1.4s)
#> v Installed magrittr 2.0.1  (2s)
#> v Built context 0.3.0 (4.5s)
#> v Installed pillar 1.6.2  (2.3s)
#> v Installed rlang 0.4.11  (2s)
#> v Installed tibble 3.1.3  (2.1s)
#> v Installed tidyselect 1.1.1  (2.2s)
#> v Installed utf8 1.2.2  (1.5s)
#> v Installed RColorBrewer 1.1-2  (1.3s)
#> v Installed vctrs 0.3.8  (1.8s)
#> v Installed context 0.3.0  (1.8s)
#> v Installed farver 2.1.0  (907ms)
#> v Installed gtable 0.3.0  (876ms)
#> v Installed labeling 0.4.2  (797ms)
#> v Installed ggplot2 3.3.5  (1.6s)
#> v Installed munsell 0.5.0  (1.1s)
#> v Installed isoband 0.2.5  (1.6s)
#> v Installed viridisLite 0.4.0  (1.2s)
#> v Installed scales 1.1.1  (1.4s)
#> v Installed withr 2.4.2  (1.3s)
#> v Installed colorspace 2.0-2  (3s)
#> v Summary:   37 new   5 kept  in 1m 38.6s
#> Done!

When the queue started up, it checked which packages were available (none were) and then installed everything needed to run your jobs. That includes the two packages listed above, all their dependencies, and context, which didehpc uses to send the jobs back and forth.

All these packages are installed into a special directory within the context root:

dir(file.path(root, "lib/windows", as.character(getRversion()[1, 1:2])))
#>  [1] "R6"           "RColorBrewer" "_cache"       "askpass"      "cli"
#>  [6] "colorspace"   "context"      "crayon"       "digest"       "dplyr"
#> [11] "ellipsis"     "fansi"        "farver"       "generics"     "ggplot2"
#> [16] "glue"         "gtable"       "ids"          "isoband"      "labeling"
#> [21] "lifecycle"    "magrittr"     "munsell"      "openssl"      "pillar"
#> [26] "pkgconfig"    "purrr"        "rlang"        "scales"       "storr"
#> [31] "sys"          "tibble"       "tidyselect"   "utf8"         "uuid"
#> [36] "vctrs"        "viridisLite"  "withr"

Everything in this library will be available to your R jobs when they run.
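
As a quick sanity check, you can compare the packages you asked for against the contents of that library directory. The following is a minimal sketch using only base R. The names wanted and not_installed are introduced here for illustration, and it assumes that your local R version matches the major.minor version used on the cluster, as the directory listing above does.

```r
root <- "pkgs"
wanted <- c("dplyr", "ggplot2")

# Library path laid out as in the listing above; assumes the local R
# major.minor version is the same as the one used on the cluster
lib <- file.path(root, "lib/windows", as.character(getRversion()[1, 1:2]))

# Requested packages that are not (yet) present in the library;
# an empty result means everything asked for has been installed
not_installed <- setdiff(wanted, dir(lib))
not_installed
```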

Everything is available in a CRAN-like repo

We keep many often-used packages in a semi-stable repository (see the mrc-ide drat, the ncov drat and the more experimental R-universe system that is being developed to support this sort of workflow in future).

To tell didehpc to look in one of these repositories when installing, create a conan::conan_sources object, listing the additional repositories as the repos argument, and pass this object as the package_sources argument to context_save. Here, we add the mrc-ide drat repository and install the dde package; this will use the development version, which is often ahead of the CRAN version.

src <- conan::conan_sources(NULL, repos = "https://mrc-ide.github.io/drat/")
ctx <- context::context_save(root, packages = "dde", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  04dc80e37df65719c6b2ebfd79acfba8
#> [ save:name ]  electrometrical_weasel

Create the library as before, and dde will be installed:

obj <- didehpc::queue_didehpc(ctx)
#> Loading context 04dc80e37df65719c6b2ebfd79acfba8
#> [ context   ]  04dc80e37df65719c6b2ebfd79acfba8
#> [ library   ]  dde
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>   ,:\      /:.
#>  //  \_()_/  \\
#> ||   |    |   ||  CONAN THE LIBRARIAN
#> ||   |    |   ||  Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||   |____|   ||  Bootstrap: T:\conan\bootstrap\4.0
#>  \\  / || \  //   Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>   `:/  ||  \;'    Policy:    lazy
#>        ||         Repos:
#>        ||           * https://mrc-ide.github.io/drat/
#>        XX           * https://cloud.r-project.org
#>        XX           * https://mrc-ide.github.io/didehpc-pkgs
#>        XX         Packages:
#>        XX           * dde
#>        OO
#>        `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 1 pkg (446.12 kB), 1 cached
#> v Got dde 1.0.3 (source) (180.79 kB)
#> v Got ring 1.0.3 (windows) (446.12 kB)
#> v Installed ring 1.0.3  (563ms)
#> i Building dde 1.0.3
#> v Built dde 1.0.3 (26.6s)
#> v Installed dde 1.0.3  (391ms)
#> v Summary:   2 new   1 kept  in 27.6s
#> Done!

If you want to add your packages to one of these repositories, please talk to Rich. You will need to increase your version number at each change (typically each merge into main/master) for the installation to notice that you have made changes.
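
To see why the version bump matters, it can help to look at how R itself orders version strings. This is only an illustration in base R with made-up version numbers; it assumes, as the paragraph above implies, that the installer decides whether anything has changed by comparing versions.

```r
# Hypothetical version strings, before and after a change
old <- "1.0.3"
new <- "1.0.4"

# compareVersion() returns -1 when its first argument is the older version
utils::compareVersion(old, new)

# package_version objects can also be compared directly
package_version(new) > package_version(old)
```

If the version string is left unchanged, a comparison like this sees no difference, which is consistent with the installation not noticing your changes.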

Install packages directly from GitHub (or similar)

We use pkgdepends as the engine for installing packages from exotic locations. This is a problem that is slightly more complicated than it seems because the resolution of dependencies is not always unambiguous, particularly with networks of dependent packages.

The basic idea is this. Suppose we want to install the rfiglet package, which is not on CRAN. We use the “Remotes”-style reference richfitz/rfiglet as an entry to conan_sources so that didehpc knows where to install rfiglet from:

src <- conan::conan_sources("richfitz/rfiglet")
ctx <- context::context_save(root, packages = "rfiglet", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ save:name ]  enharmonic_nautilus

Note that we still list rfiglet within the packages section of context::context_save as that is what is used to load the package.

If you want to be even more explicit you can use github::richfitz/rfiglet as the reference, and you can add references such as richfitz/rfiglet@d713c1b8 to point at a particular commit, branch or tag.

obj <- didehpc::queue_didehpc(ctx)
#> Loading context e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ context   ]  e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ library   ]  rfiglet
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>   ,:\      /:.
#>  //  \_()_/  \\
#> ||   |    |   ||  CONAN THE LIBRARIAN
#> ||   |    |   ||  Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||   |____|   ||  Bootstrap: T:\conan\bootstrap\4.0
#>  \\  / || \  //   Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>   `:/  ||  \;'    Policy:    lazy
#>        ||         Repos:
#>        ||           * https://cloud.r-project.org
#>        XX           * https://mrc-ide.github.io/didehpc-pkgs
#>        XX         Packages:
#>        XX           * rfiglet
#>        XX           * richfitz/rfiglet
#>        OO
#>        `'
#> ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
#> i No downloads are needed, 1 pkg is cached
#> v Got rfiglet 0.2.0 (source) (144.05 kB)
#> i Packaging rfiglet 0.2.0
#> v Packaged rfiglet 0.2.0 (3.4s)
#> i Building rfiglet 0.2.0
#> v Built rfiglet 0.2.0 (2.7s)
#> v Installed rfiglet 0.2.0 (github::richfitz/rfiglet@d713c1b) (532ms)
#> v Summary:   1 new  in 3.2s
#> Done!
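
The more explicit reference forms mentioned above would be passed to conan_sources in the same way as the short form. This is a sketch rather than output from a real run; it reuses the root variable and the references quoted in the text.

```r
# Explicit form: name the remote type as well as the repository
src <- conan::conan_sources("github::richfitz/rfiglet")

# Pinned form: fix the install to a particular commit, branch or tag
src <- conan::conan_sources("richfitz/rfiglet@d713c1b8")

# The package is still listed in 'packages' so that it is loaded for jobs
ctx <- context::context_save(root, packages = "rfiglet", package_sources = src)
```

Pinning to a commit makes it easier to know exactly which version of the code your jobs ran against.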

Install private packages

To install a private package, first make a local copy of the package somewhere on your system. Then you need to build a source copy of this package (this will have a file extension of tar.gz).

For example, if the path ~/Documents/src/defer contains a copy of the sources that you want to install, you could write:

path <- pkgbuild::build("~/Documents/src/defer", ".")
#> ✔  checking for file ‘/home/rich/Documents/src/defer/DESCRIPTION’
#> ─  preparing ‘defer’:
#> ✔  checking DESCRIPTION meta-information
#> ─  checking for LF line-endings in source and make files and shell scripts
#> ─  checking for empty or unneeded directories
#> ─  building ‘defer_0.1.0.tar.gz’

The second argument (.) is the directory that the built package will be created in. This must be in your working directory. You might find that using something like pkgs as a destination helps keep things tidy. You may also want to use the vignettes = FALSE argument to speed this process up if your package includes slow-to-run vignettes, as they will be of no use on the cluster.

file.info(path)
#>                      size isdir mode               mtime               ctime
#> ./defer_0.1.0.tar.gz 3813 FALSE  755 2021-08-17 14:52:34 2021-08-17 14:52:34
#>                                    atime  uid  gid uname grname
#> ./defer_0.1.0.tar.gz 2021-08-17 14:52:34 1000 1000  rich   rich

Then construct your package sources passing in the relative path to your package. We can use the path variable here, or you could write ./defer_0.1.0.tar.gz directly, or something like local::defer_0.1.0.tar.gz. If you have multiple packages you can pass a vector in.

src <- conan::conan_sources(path)
ctx <- context::context_save(root, packages = "defer", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  b1ee3dfcbfc8e8c455707746f13564cd
#> [ save:name ]  nonpoisonous_vulpesvulpes

When you construct the queue, this package will be installed for you:

obj <- didehpc::queue_didehpc(ctx)
#> Loading context b1ee3dfcbfc8e8c455707746f13564cd
#> [ context   ]  b1ee3dfcbfc8e8c455707746f13564cd
#> [ library   ]  defer
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>   ,:\      /:.
#>  //  \_()_/  \\
#> ||   |    |   ||  CONAN THE LIBRARIAN
#> ||   |    |   ||  Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||   |____|   ||  Bootstrap: T:\conan\bootstrap\4.0
#>  \\  / || \  //   Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>   `:/  ||  \;'    Policy:    lazy
#>        ||         Repos:
#>        ||           * https://cloud.r-project.org
#>        XX           * https://mrc-ide.github.io/didehpc-pkgs
#>        XX         Packages:
#>        XX           * defer
#>        XX           * local::./defer_0.1.0.tar.gz
#>        OO
#>        `'
#> i No downloads are needed, 1 pkg is cached
#> v Got defer 0.1.0 (source) (3.81 kB)
#> i Building defer 0.1.0
#> v Built defer 0.1.0 (1.8s)
#> v Installed defer 0.1.0 (local) (313ms)
#> v Summary:   1 new  in 2.1s
#> Done!
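
Putting the pieces of this section together, a sketch for two private packages might look like the following. The second package (otherpkg) and its path are hypothetical, and the pkgs destination directory simply follows the suggestion above; vignettes = FALSE skips vignette building as described earlier.

```r
# Destination for the built tarballs, inside the working directory
dest <- "pkgs"
dir.create(dest, showWarnings = FALSE)

# pkgbuild::build() returns the path to each built .tar.gz file
paths <- c(
  pkgbuild::build("~/Documents/src/defer", dest, vignettes = FALSE),
  # Hypothetical second private package
  pkgbuild::build("~/Documents/src/otherpkg", dest, vignettes = FALSE))

# Pass all of the paths as a vector, and list each package to be loaded
src <- conan::conan_sources(paths)
ctx <- context::context_save(root, packages = c("defer", "otherpkg"),
                             package_sources = src)
```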

Troubleshooting package installation

Local copies

You must have local copies of all packages installed (i.e., on the machine that is submitting the jobs). This is because we use some information about the packages to work out what can be run on the cluster. If you see a message like this when creating the queue object:

Loading context d1b3973bef7762b8d4d4ff5cbe090b2c
[ context   ]  d1b3973bef7762b8d4d4ff5cbe090b2c
[ library   ]  rfiglet
Error in library(p, character.only = TRUE) :
  there is no package called ‘rfiglet’

it means that you do not have the package installed locally and you should install it before continuing.
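
You can check for this before creating the queue. The following base R sketch reports which of the packages you intend to use are not installed locally; the pkgs vector is illustrative and should match the packages argument of your context.

```r
# Packages you plan to list in context::context_save()
pkgs <- c("rfiglet")

# requireNamespace() returns FALSE, rather than raising an error,
# when a package is not installed locally
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Anything listed here needs installing locally before you continue
pkgs[!installed]
```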

File locking

You cannot upgrade packages while you have cluster jobs running. The reason for this is file locking; any cluster job running has a copy of the package loaded and will prevent deletion. Unfortunately the installation will delete quite a lot of the package before it realises that it is locked, which causes all sorts of problems.

Typically if you hit this you will see a “permission denied” error concerning a dll. Once this has happened you should be prepared for any queued jobs to fail.

To avoid this, use a new context root when upgrading packages.
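
A minimal sketch of that approach, assuming the same packages as in the first example; the directory name pkgs-upgrade is hypothetical. Because the new root has its own library, running jobs that hold files open in the old library are not affected.

```r
# A fresh root gets a fresh library; the name is only an example
root_new <- "pkgs-upgrade"

ctx_new <- context::context_save(root_new, packages = c("dplyr", "ggplot2"))

# Packages are installed from scratch into the library under root_new
obj_new <- didehpc::queue_didehpc(ctx_new)
```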

More control over the process

The package installation may seem a bit magic but you can tame it a little.

When constructing your queue object, you can control how provisioning will occur with the provision argument. The default is to check to see if any packages listed in your context’s packages argument are missing and only then do installation.

If you pass provision = "fake" it will leave your library alone no matter what. Alternatively pass provision = "upgrade" to try and upgrade packages, or provision = "later" to skip this step for now. You can’t submit jobs while your package installation looks incomplete.
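
As a sketch, the non-default options described above would be passed when creating the queue like this, reusing the ctx object from the earlier examples. Only the values named in the text are shown.

```r
# Skip installation for now; jobs cannot be submitted while the
# library looks incomplete
obj <- didehpc::queue_didehpc(ctx, provision = "later")

# Try to upgrade packages in the library; avoid this while cluster jobs
# are running (see "File locking" above)
obj <- didehpc::queue_didehpc(ctx, provision = "upgrade")
```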

If you want to add additional things into the library without running the full provisioning (which might upgrade all sorts of things), you can use the install_packages() method on the queue object. This ignores the contents of your conan_sources; instead, you pass in pkgdepends-style references directly (see the pkgdepends documentation for the myriad options here). Examples of usage include:

Install the latest version of a CRAN package

obj$install_packages("data.table")

Install a GitHub package

obj$install_packages("richfitz/stegasaur")

Install some local package from a .tar.gz file

obj$install_packages("local::mypkg_0.1.2.tar.gz")

You can possibly use this interface (along with provision = "fake") to manipulate your package installation fairly flexibly.
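
For example, a fully manual workflow might be sketched as follows. This only combines the pieces described above and assumes the ctx object from the earlier examples; the pinned rfiglet reference is reused from the GitHub section.

```r
# Leave the library alone when the queue is created
obj <- didehpc::queue_didehpc(ctx, provision = "fake")

# Then add exactly the packages you want, one reference at a time
obj$install_packages("data.table")
obj$install_packages("richfitz/rfiglet@d713c1b8")
```

Keeping each reference in its own call makes it easier to see which installation failed if something goes wrong.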

Installation failure / the wrong versions have been selected

It is possible to end up in a situation where pkgdepends can’t resolve your dependencies, or where in resolving dependencies an unwanted version of a package was installed. Please let Rich know with enough detail for him to reproduce the example himself: