Using cyphr
We assume that you have followed the “data
encryption” vignette in cyphr
and that you are running
from the same directory that it is set up for. (You may or may not be
using orderly with this, the procedure is the same either way).
We’ll quickly set up the same basic structure in this vignette.
First, we configure hipercow at this directory:
library(hipercow)
hipercow_init()
#> ✔ Initialised hipercow at '.' (/home/rfitzjoh/net/home/cluster/hipercow-vignette/hv-20250912-187f90285932a2)
#> ℹ Next, call 'hipercow_configure()'
hipercow_configure(driver = "dide-windows")
#> ✔ Configured hipercow to use 'dide-windows'
We assume that you have ssh keys that work for your
machine from your previous use of cyphr
and the
vignette already has these working. These will have a password, but for
the purposes of this vignette you will not see the password prompt
here.
cyphr::data_admin_init(".")
#> Generating data key
#> Authorising ourselves
#> Adding key 78:59:4c:6c:61:a1:9e:d8:c8:02:22:49:33:c9:8e:db:b3:d1:c3:65:9b:1d:76:5b:50:e5:36:2c:da:52:1e:f8
#> user: rfitzjoh
#> host: wpia-dide300
#> date: 2025-09-12 13:41:53.50033
#> Verifying
We can locally work with this key just fine to encrypt some small
secret data set, x
x <- runif(10)
What we want to do is to write a cluster job that can use this secret data.
Firstly, we need to install cyphr
into the library that
will run on the cluster. You could also do this by adding
cyphr
to your pkgdepends.txt
:
hipercow_provision(method = "pkgdepends", refs = "cyphr")
#> ℹ Looking for active tasks before installation
#> ✔ No tasks running
#> /`-'\ _______ ___ ___ ____
#> \,T./ / __/ _ \/ _ \/ _ `/ _ \
#> | \__/\___/_//_/\_,_/_//_/
#> | ---- THE LIBRARIAN ----
#>
#> Bootstrapping from: I:/bootstrap-windows/4.4.2
#> Installing into library: hipercow/lib/windows/4.4.2
#> Using method pkgdepends
#> Running in path: V:/cluster/hipercow-vignette/hv-20250912-187f90285932a2
#> Library paths:
#> - V:/cluster/hipercow-vignette/hv-20250912-187f90285932a2/hipercow/lib/windows/4.4.2
#> - C:/Program Files/R/R-4.4.2/library
#> id: 20250912124156
#> Logs from pkgdepends follow:
#>
#> -------------------------------------------------------------------------------
#>
#>
#> ── repos
#> • https://cloud.r-project.org
#>
#> ── refs
#> • cyphr
#> ✔ Updated metadata database: 7.83 MB in 6 files.
#>
#> ℹ Updating metadata database
#> ✔ Updating metadata database ... done
#>
#> + askpass 1.2.1 [dl]
#> + cyphr 1.1.7 [dl]
#> + getPass 0.2-4 [dl]
#> + openssl 2.3.3 [dl]
#> + rstudioapi 0.17.1 [dl]
#> + sodium 1.4.0 [dl]
#> + sys 3.4.3 [dl]
#> ℹ Getting 7 pkgs with unknown sizes
#> ✔ Got askpass 1.2.1 (x86_64-w64-mingw32) (74.80 kB)
#> ✔ Got rstudioapi 0.17.1 (i386+x86_64-w64-mingw32) (346.85 kB)
#> ✔ Got sys 3.4.3 (x86_64-w64-mingw32) (47.98 kB)
#> ✔ Got openssl 2.3.3 (x86_64-w64-mingw32) (3.47 MB)
#> ✔ Got sodium 1.4.0 (x86_64-w64-mingw32) (660.91 kB)
#> ✔ Got cyphr 1.1.7 (i386+x86_64-w64-mingw32) (400.78 kB)
#> ✔ Got getPass 0.2-4 (x86_64-w64-mingw32) (493.58 kB)
#> ✔ Installed askpass 1.2.1 (582ms)
#> ✔ Installed cyphr 1.1.7 (647ms)
#> ✔ Installed getPass 0.2-4 (755ms)
#> ✔ Installed sys 3.4.3 (707ms)
#> ✔ Installed sodium 1.4.0 (894ms)
#> ✔ Installed rstudioapi 0.17.1 (1.1s)
#> ✔ Installed openssl 2.3.3 (1.3s)
#> ✔ Summary: 7 new in 5.9s
#>
#> -------------------------------------------------------------------------------
#> Writing library description to 'hipercow/lib/windows/4.4.2/.conan/20250912124156'
#> Done!
#> ✔ Installation script finished successfully in 20.39 secs
We might then naively try and run a task that reads the encrypted data:
t <- task_create_expr(
cyphr::decrypt(readRDS("data.rds"), cyphr::data_key()))
#> ✔ Submitted task '20bee48990a9104d3cd497a02b80cdf2' using 'dide-windows'
This will fail:
task_result(t)
#> <simpleError: Did not find default ssh public key at '~/.ssh/id_rsa.pub'
#> You can create a key here by running
#> cyphr::ssh_keygen("~/.ssh/id_rsa.pub")>
The reason why it fails is because the cluster cannot see the key
that we are using to work with cyphr; the ~/.ssh/id_rsa.pub
referred to here would be on the cluster node that ran the job and not
the key that exists on your machine.
What we can do is to create another key that will be found in your network home directory:
dide_generate_keypair()
#> ✔ Created new private key at '/home/rfitzjoh/net/home/.hipercow/key'
#> ✔ Created new public key at '/home/rfitzjoh/net/home/.hipercow/key.pub'
#> ✔ Saved public key into your keychain
We need to authorise this key with cyphr, which we can do by running a short cluster job:
t <- task_create_expr(cyphr::data_request_access())
#> ✔ Submitted task 'd4163d6ff3e2dafafd98a26116bd8352' using 'dide-windows'
Once that has run, we will see the request for access, which is actually us but from the cluster:
cyphr::data_admin_list_requests()
#> 1 key:
#> 51:40:32:58:11:4a:05:df:c3:7a:27:2b:25:17:6d:53:66:41:f0:f0:48:d6:2b:d6:23:b1:39:7a:f9:5f:58:66
#> user: rfitzjoh
#> host: WPIA-054
#> date: 2025-09-12 13:42:21.019979
We can authorise ourselves:
cyphr::data_admin_authorise(yes = TRUE)
#> There is 1 request for access
#> Adding key 51:40:32:58:11:4a:05:df:c3:7a:27:2b:25:17:6d:53:66:41:f0:f0:48:d6:2b:d6:23:b1:39:7a:f9:5f:58:66
#> user: rfitzjoh
#> host: WPIA-054
#> date: 2025-09-12 13:42:21.019979
#> Added 1 key
#> If you are using git, you will need to commit and push:
#>
#> git add .cyphr
#> git commit -m "Authorised rfitzjoh"
#> git push
And now we can run our jobs that require encrypted data
t <- task_create_expr(
cyphr::decrypt(readRDS("data.rds"), cyphr::data_key()))
#> ✔ Submitted task 'da8170adcb81ff1b69af55ecc146b74e' using 'dide-windows'
This time we can read the data!
task_result(t)
#> [1] 0.05838293 0.41034069 0.22518724 0.11884682 0.28322052 0.82149523
#> [7] 0.41899393 0.16678808 0.79077363 0.81735793