Often the most difficult part of configuring your cluster jobs is sorting out all the packages that you need and making sure that they are present on the cluster. There are several levels of difficulty here and this document will walk through them in turn.
This is the most straightforward situation: all your packages are on CRAN. Typically you don’t need to do anything special; just create your context with a list of packages and create the queue:
root <- "pkgs"
ctx <- context::context_save(root, packages = c("dplyr", "ggplot2"))
#> [ init:id ] a11d9597170a8cc72cc5b57c3ac3d7a0
#> [ init:db ] rds
#> [ init:path ] pkgs
#> [ save:id ] 2ab2a78d26df3a50b9beb45b52eba466
#> [ save:name ] seaisland_mammal
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 2ab2a78d26df3a50b9beb45b52eba466
#> [ context ] 2ab2a78d26df3a50b9beb45b52eba466
#> [ library ] dplyr, ggplot2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> [ namespace ]
#> [ source ]
#> Running installation script on cluster
#> ,:\ /:.
#> // \_()_/ \\
#> || | | || CONAN THE LIBRARIAN
#> || | | || Library: Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> || |____| || Bootstrap: T:\conan\bootstrap\4.0
#> \\ / || \ // Cache: Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#> `:/ || \;' Policy: lazy
#> || Repos:
#> || * https://mrc-ide.github.io/didehpc-pkgs
#> XX * https://cloud.r-project.org
#> XX Packages:
#> XX * context
#> XX * dplyr
#> OO * ggplot2
#> `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 36 pkgs (27.89 MB) and 1 pkg with unknown size
#> v Got ids 1.0.1 (windows) (123.89 kB)
#> v Got askpass 1.1 (windows) (243.58 kB)
#> v Got digest 0.6.27 (windows) (268.65 kB)
#> v Got R6 2.5.0 (windows) (84.09 kB)
#> v Got context 0.3.0 (source) (37.72 kB)
#> v Got sys 3.4 (windows) (59.83 kB)
#> v Got uuid 0.1-4 (windows) (33.77 kB)
#> v Got storr 1.2.5 (windows) (401.33 kB)
#> v Got crayon 1.4.1 (windows) (141.87 kB)
#> v Got ellipsis 0.3.2 (windows) (49.19 kB)
#> v Got generics 0.1.0 (windows) (70.74 kB)
#> v Got openssl 1.4.4 (windows) (4.10 MB)
#> v Got cli 3.0.1 (windows) (758.73 kB)
#> v Got glue 1.4.2 (windows) (155.50 kB)
#> v Got magrittr 2.0.1 (windows) (234.90 kB)
#> v Got lifecycle 1.0.0 (windows) (111.22 kB)
#> v Got pkgconfig 2.0.3 (windows) (22.31 kB)
#> v Got rlang 0.4.11 (windows) (1.21 MB)
#> v Got purrr 0.3.4 (windows) (430.04 kB)
#> v Got tidyselect 1.1.1 (windows) (204.19 kB)
#> v Got RColorBrewer 1.1-2 (windows) (55.55 kB)
#> v Got pillar 1.6.2 (windows) (1.07 MB)
#> v Got tibble 3.1.3 (windows) (835.59 kB)
#> v Got utf8 1.2.2 (windows) (209.88 kB)
#> v Got dplyr 1.0.7 (windows) (1.35 MB)
#> v Got vctrs 0.3.8 (windows) (1.25 MB)
#> v Got gtable 0.3.0 (windows) (434.23 kB)
#> v Got labeling 0.4.2 (windows) (62.73 kB)
#> v Got munsell 0.5.0 (windows) (245.14 kB)
#> v Got scales 1.1.1 (windows) (558.34 kB)
#> v Got farver 2.1.0 (windows) (1.75 MB)
#> v Got isoband 0.2.5 (windows) (2.73 MB)
#> v Got fansi 0.5.0 (windows) (248.45 kB)
#> v Got colorspace 2.0-2 (windows) (2.65 MB)
#> v Got withr 2.4.2 (windows) (212.63 kB)
#> v Got viridisLite 0.4.0 (windows) (1.30 MB)
#> v Got ggplot2 3.3.5 (windows) (4.13 MB)
#> v Installed R6 2.5.0 (735ms)
#> v Installed crayon 1.4.1 (797ms)
#> v Installed ids 1.0.1 (922ms)
#> v Installed askpass 1.1 (1.3s)
#> v Installed sys 3.4 (1.2s)
#> v Installed digest 0.6.27 (1.5s)
#> v Installed storr 1.2.5 (1.7s)
#> v Installed uuid 0.1-4 (1.4s)
#> v Installed openssl 1.4.4 (2s)
#> i Building context 0.3.0
#> v Installed cli 3.0.1 (6.5s)
#> v Installed dplyr 1.0.7 (7s)
#> v Installed ellipsis 0.3.2 (7.1s)
#> v Installed fansi 0.5.0 (7.1s)
#> v Installed generics 0.1.0 (7.2s)
#> v Installed glue 1.4.2 (7.2s)
#> v Installed lifecycle 1.0.0 (7.2s)
#> v Installed pkgconfig 2.0.3 (1.2s)
#> v Installed purrr 0.3.4 (1.4s)
#> v Installed magrittr 2.0.1 (2s)
#> v Built context 0.3.0 (4.5s)
#> v Installed pillar 1.6.2 (2.3s)
#> v Installed rlang 0.4.11 (2s)
#> v Installed tibble 3.1.3 (2.1s)
#> v Installed tidyselect 1.1.1 (2.2s)
#> v Installed utf8 1.2.2 (1.5s)
#> v Installed RColorBrewer 1.1-2 (1.3s)
#> v Installed vctrs 0.3.8 (1.8s)
#> v Installed context 0.3.0 (1.8s)
#> v Installed farver 2.1.0 (907ms)
#> v Installed gtable 0.3.0 (876ms)
#> v Installed labeling 0.4.2 (797ms)
#> v Installed ggplot2 3.3.5 (1.6s)
#> v Installed munsell 0.5.0 (1.1s)
#> v Installed isoband 0.2.5 (1.6s)
#> v Installed viridisLite 0.4.0 (1.2s)
#> v Installed scales 1.1.1 (1.4s)
#> v Installed withr 2.4.2 (1.3s)
#> v Installed colorspace 2.0-2 (3s)
#> v Summary: 37 new 5 kept in 1m 38.6s
#> Done!
What happened above is that when the queue started up it looked to see what packages were available (none were) and then installed everything needed to run your jobs. That includes the two packages listed above, but also all their dependencies and context, which didehpc uses to send the jobs back and forth.
All these packages are installed into a special directory within the context root:
dir(file.path(root, "lib/windows", as.character(getRversion()[1, 1:2])))
#> [1] "R6" "RColorBrewer" "_cache" "askpass" "cli"
#> [6] "colorspace" "context" "crayon" "digest" "dplyr"
#> [11] "ellipsis" "fansi" "farver" "generics" "ggplot2"
#> [16] "glue" "gtable" "ids" "isoband" "labeling"
#> [21] "lifecycle" "magrittr" "munsell" "openssl" "pillar"
#> [26] "pkgconfig" "purrr" "rlang" "scales" "storr"
#> [31] "sys" "tibble" "tidyselect" "utf8" "uuid"
#> [36] "vctrs" "viridisLite" "withr"
Everything in this library will be available to your R jobs when they run.
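As a rough illustration of the "what is available?" check, the sketch below compares a list of requested packages against the directory names in a library. It uses only base R; `missing_from_library` is a hypothetical helper, and treating "a directory with that name exists" as "installed" is a simplifying assumption rather than a description of what didehpc does internally.

```r
# Hypothetical helper: which of `packages` have no directory in `lib`?
missing_from_library <- function(packages, lib) {
  setdiff(packages, dir(lib))
}

# A throwaway directory stands in for the context library here
lib <- file.path(tempfile(), "lib", "windows", "4.0")
dir.create(file.path(lib, "dplyr"), recursive = TRUE)

# Only dplyr is "installed", so ggplot2 is reported as missing
missing_from_library(c("dplyr", "ggplot2"), lib)
```

Against a real context root you would point `lib` at the library path shown above instead of a temporary directory.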
We keep many often-used packages in a semi-stable repository (see the mrc-ide drat, the ncov drat and the more experimental R-universe system that is being developed to support this sort of workflow in future).
To tell didehpc to look in one of these repositories when installing, create a conan::conan_sources object, listing the additional repositories as the repos argument, and pass this object in as the package_sources argument to context_save. Here, we add the mrc-ide drat repository and install the dde package; this will use the development version, which is often ahead of the CRAN version.
src <- conan::conan_sources(NULL, repos = "https://mrc-ide.github.io/drat/")
ctx <- context::context_save(root, packages = "dde", package_sources = src)
#> [ open:db ] rds
#> [ save:id ] 04dc80e37df65719c6b2ebfd79acfba8
#> [ save:name ] electrometrical_weasel
Create the library as before, and dde will be installed:
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 04dc80e37df65719c6b2ebfd79acfba8
#> [ context ] 04dc80e37df65719c6b2ebfd79acfba8
#> [ library ] dde
#> [ namespace ]
#> [ source ]
#> Running installation script on cluster
#> ,:\ /:.
#> // \_()_/ \\
#> || | | || CONAN THE LIBRARIAN
#> || | | || Library: Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> || |____| || Bootstrap: T:\conan\bootstrap\4.0
#> \\ / || \ // Cache: Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#> `:/ || \;' Policy: lazy
#> || Repos:
#> || * https://mrc-ide.github.io/drat/
#> XX * https://cloud.r-project.org
#> XX * https://mrc-ide.github.io/didehpc-pkgs
#> XX Packages:
#> XX * dde
#> OO
#> `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 1 pkg (446.12 kB), 1 cached
#> v Got dde 1.0.3 (source) (180.79 kB)
#> v Got ring 1.0.3 (windows) (446.12 kB)
#> v Installed ring 1.0.3 (563ms)
#> i Building dde 1.0.3
#> v Built dde 1.0.3 (26.6s)
#> v Installed dde 1.0.3 (391ms)
#> v Summary: 2 new 1 kept in 27.6s
#> Done!
If you want to add your packages to one of these repositories, please talk to Rich. You will need to increase your version number at each change (typically each merge into main/master) for the installation to notice that you have made changes.
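To see why the version number matters, here is how R itself compares package versions. This is base R only and is meant as an illustration; that the installer decides on upgrades using a comparison of this kind is an assumption, and the version numbers are made up around the dde example above.

```r
# Versions are compared component by component, not as strings
installed      <- package_version("1.0.3")
in_repo_same   <- package_version("1.0.3")
in_repo_bumped <- package_version("1.0.4")

# Same version number: nothing looks new, even if the code changed
in_repo_same > installed

# Bumped version number: the repository copy is seen as newer
in_repo_bumped > installed
```

The first comparison is `FALSE` and the second is `TRUE`, which is why a merge without a version bump can go unnoticed.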
We use pkgdepends as the engine for installing packages from exotic locations. This problem is slightly more complicated than it seems because the resolution of dependencies is not always unambiguous, particularly with networks of dependent packages.
The basic idea is this. Suppose we want to install the rfiglet package, which is not on CRAN. We use the “Remotes”-style reference richfitz/rfiglet as an entry to conan_sources so that didehpc knows where to install rfiglet from:
src <- conan::conan_sources("richfitz/rfiglet")
ctx <- context::context_save(root, packages = "rfiglet", package_sources = src)
#> [ open:db ] rds
#> [ save:id ] e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ save:name ] enharmonic_nautilus
Note that we still list rfiglet within the packages section of context::context_save, as that is what is used to load the package.
If you want to be even more explicit you can use github::richfitz/rfiglet as the reference, and you can add references such as richfitz/rfiglet@d713c1b8 to point at a particular commit, branch or tag.
obj <- didehpc::queue_didehpc(ctx)
#> Loading context e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ context ] e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ library ] rfiglet
#> [ namespace ]
#> [ source ]
#> Running installation script on cluster
#> ,:\ /:.
#> // \_()_/ \\
#> || | | || CONAN THE LIBRARIAN
#> || | | || Library: Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> || |____| || Bootstrap: T:\conan\bootstrap\4.0
#> \\ / || \ // Cache: Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#> `:/ || \;' Policy: lazy
#> || Repos:
#> || * https://cloud.r-project.org
#> XX * https://mrc-ide.github.io/didehpc-pkgs
#> XX Packages:
#> XX * rfiglet
#> XX * richfitz/rfiglet
#> OO
#> `'
#> ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
#> i No downloads are needed, 1 pkg is cached
#> v Got rfiglet 0.2.0 (source) (144.05 kB)
#> i Packaging rfiglet 0.2.0
#> v Packaged rfiglet 0.2.0 (3.4s)
#> i Building rfiglet 0.2.0
#> v Built rfiglet 0.2.0 (2.7s)
#> v Installed rfiglet 0.2.0 (github::richfitz/rfiglet@d713c1b) (532ms)
#> v Summary: 1 new in 3.2s
#> Done!
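The reference forms mentioned above (plain, pinned, and explicitly prefixed) are just strings, so they can be assembled programmatically. The sketch below uses only base R; `github_ref` is a hypothetical helper written for this illustration and is not part of conan or pkgdepends.

```r
# Hypothetical helper: assemble a GitHub package reference string
github_ref <- function(user, repo, ref = NULL, explicit = FALSE) {
  out <- paste0(user, "/", repo)
  if (!is.null(ref)) {
    # Pin to a commit, branch or tag
    out <- paste0(out, "@", ref)
  }
  if (explicit) {
    # Spell out the reference type
    out <- paste0("github::", out)
  }
  out
}

github_ref("richfitz", "rfiglet")
github_ref("richfitz", "rfiglet", ref = "d713c1b8")
github_ref("richfitz", "rfiglet", ref = "d713c1b8", explicit = TRUE)
```

Any of these strings can be passed to conan::conan_sources() in the same way as the literal richfitz/rfiglet used earlier.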
To install a private package, first make a local copy of the package somewhere on your system. Then you need to build a source copy of this package (this will have a file extension of tar.gz).
For example, if the path ~/Documents/src/defer contains a copy of the sources that you want to install, you could write:
path <- pkgbuild::build("~/Documents/src/defer", ".")
#> checking for file ‘/home/rich/Documents/src/defer/DESCRIPTION’ ...
#> ✔ checking for file ‘/home/rich/Documents/src/defer/DESCRIPTION’
#> ─ preparing ‘defer’:
#> checking DESCRIPTION meta-information ...
#> ✔ checking DESCRIPTION meta-information
#> ─ checking for LF line-endings in source and make files and shell scripts
#> ─ checking for empty or unneeded directories
#> ─ building ‘defer_0.1.0.tar.gz’
#>
#>
The second argument (.) is the directory that the built package will be created in. This must be in your working directory. You might find that using something like pkgs as a destination helps keep things tidy. (You may want to use the vignettes = FALSE argument to speed this process up if your package includes slow-to-run vignettes, as they will be of no use on the cluster.)
file.info(path)
#> size isdir mode mtime ctime
#> ./defer_0.1.0.tar.gz 3813 FALSE 755 2021-08-17 14:52:34 2021-08-17 14:52:34
#> atime uid gid uname grname
#> ./defer_0.1.0.tar.gz 2021-08-17 14:52:34 1000 1000 rich rich
Then construct your package sources, passing in the relative path to your package. We can use the path variable here, or you could write ./defer_0.1.0.tar.gz directly, or something like local::defer_0.1.0.tar.gz. If you have multiple packages you can pass a vector in.
src <- conan::conan_sources(path)
ctx <- context::context_save(root, packages = "defer", package_sources = src)
#> [ open:db ] rds
#> [ save:id ] b1ee3dfcbfc8e8c455707746f13564cd
#> [ save:name ] nonpoisonous_vulpesvulpes
When you create the queue object, this package will be installed for you:
obj <- didehpc::queue_didehpc(ctx)
#> Loading context b1ee3dfcbfc8e8c455707746f13564cd
#> [ context ] b1ee3dfcbfc8e8c455707746f13564cd
#> [ library ] defer
#> [ namespace ]
#> [ source ]
#> Running installation script on cluster
#> ,:\ /:.
#> // \_()_/ \\
#> || | | || CONAN THE LIBRARIAN
#> || | | || Library: Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> || |____| || Bootstrap: T:\conan\bootstrap\4.0
#> \\ / || \ // Cache: Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#> `:/ || \;' Policy: lazy
#> || Repos:
#> || * https://cloud.r-project.org
#> XX * https://mrc-ide.github.io/didehpc-pkgs
#> XX Packages:
#> XX * defer
#> XX * local::./defer_0.1.0.tar.gz
#> OO
#> `'
#> i No downloads are needed, 1 pkg is cached
#> v Got defer 0.1.0 (source) (3.81 kB)
#> i Building defer 0.1.0
#> v Built defer 0.1.0 (1.8s)
#> v Installed defer 0.1.0 (local) (313ms)
#> v Summary: 1 new in 2.1s
#> Done!
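If you have several built packages, the vector of references mentioned above can be put together from the file names. This is a minimal base R sketch: the pkgs directory and the file names are assumptions for illustration, and in real use `files` would come from something like `list.files("pkgs")`.

```r
# Stand-in for list.files("pkgs"); the names here are illustrative only
files <- c("defer_0.1.0.tar.gz", "mypkg_0.1.2.tar.gz", "notes.txt")

# Keep only the built source packages
tarballs <- files[grepl("\\.tar\\.gz$", files)]

# Build relative local:: references, one per package
refs <- paste0("local::", file.path("pkgs", tarballs))
refs
```

The resulting character vector could then be passed to conan::conan_sources() in place of the single path used above.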
You must have local copies of all packages installed (i.e., on the machine that is submitting the jobs). This is because we use some information about the packages to work out what can be run on the cluster. If you see a message like this when creating the queue object:
Loading context d1b3973bef7762b8d4d4ff5cbe090b2c
[ context ] d1b3973bef7762b8d4d4ff5cbe090b2c
[ library ] rfiglet
Error in library(p, character.only = TRUE) :
there is no package called ‘rfiglet’
it means that you do not have the package installed locally and you should install it before continuing.
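You can check for this before creating the queue. The sketch below uses only base R; `missing_locally` is a hypothetical helper, and it relies on `system.file()` returning an empty string for a package that is not installed.

```r
# Hypothetical helper: which of `packages` are not installed on this machine?
missing_locally <- function(packages) {
  present <- vapply(
    packages,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  packages[!present]
}

# stats ships with R, so an empty character vector is returned here
missing_locally("stats")
```

Calling it with the same vector that you pass as `packages` to context::context_save() tells you what to install locally first.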
You cannot upgrade packages while you have cluster jobs running. The reason for this is file locking; any cluster job running has a copy of the package loaded and will prevent deletion. Unfortunately the installation will delete quite a lot of the package before it realises that it is locked, which causes all sorts of problems.
Typically if you hit this you will see a “permission denied” error concerning a dll. Once this has happened you should be prepared for any queued jobs to fail.
To avoid this, use a new context root when upgrading packages.
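One simple way to get a new context root is to stamp its name with the current time. This is a base R sketch only; `fresh_root` is a hypothetical helper and the pkgs prefix is just the root name used in the examples here.

```r
# Hypothetical helper: build a context root name that is unique per second
fresh_root <- function(prefix = "pkgs", now = Sys.time()) {
  paste0(prefix, "-", format(now, "%Y%m%d-%H%M%S"))
}

root_new <- fresh_root()
root_new
# root_new would then take the place of `root` in context::context_save(...)
```

Jobs already running against the old root keep their own library, so the upgrade does not touch files they have locked.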
The package installation may seem a bit magic but you can tame it a little.
When constructing your queue object, you can control how provisioning will occur with the provision argument. The default is to check whether any packages listed in your context’s packages argument are missing and only then run the installation.
If you pass provision = "fake" it will leave your library alone no matter what. Alternatively pass provision = "upgrade" to try to upgrade packages, or provision = "later" to skip this step for now. You can’t submit jobs while your package installation looks incomplete.
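A minimal sketch of passing this argument is shown below. `queue_with_provision` is a hypothetical wrapper, not part of didehpc; it accepts only the three non-default values named above and assumes that `provision` is passed by name to didehpc::queue_didehpc(), as the text describes.

```r
# Hypothetical wrapper: validate the provisioning mode, then create the queue
queue_with_provision <- function(ctx, provision = c("later", "fake", "upgrade")) {
  # match.arg() rejects anything other than the three values listed
  provision <- match.arg(provision)
  didehpc::queue_didehpc(ctx, provision = provision)
}

# Usage, given a context `ctx` and access to the cluster:
# obj <- queue_with_provision(ctx, "upgrade")
```

Validating the value up front means a typo fails immediately rather than after the queue has started loading the context.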
If you want to add additional things into the library without running the full provisioning (which might upgrade all sorts of things) you can use the install_packages() method on the object. This ignores the contents of your conan_sources and you pass directly in the pkgdepends-style references; see the pkgdepends documentation for the myriad options here.
Examples of usage include:
Install the latest version of a CRAN package
obj$install_packages("data.table")
Install a GitHub package
obj$install_packages("richfitz/stegasaur")
Install some local package from a .tar.gz file
obj$install_packages("local::mypkg_0.1.2.tar.gz")
You can possibly use this interface (along with provision = "fake") to manipulate your package installation fairly flexibly.
It is possible to end up in a situation where pkgdepends can’t resolve your dependencies, or where in resolving dependencies an unwanted version of a package was installed. Please let Rich know with enough detail for him to reproduce the example himself: the didehpc::queue_didehpc(...) call, covering things like context::context_save() and conan::conan_sources(), and any .tar.gz files that you are using.