odin and dust

MRC Centre for Global Infectious Disease Analysis

odin

  • A “domain specific language”
  • Originally designed for ordinary differential equations
  • Some very basic discrete time/stochastic support

An example

deriv(S) <- -beta * S * I / N
deriv(I) <- beta * S * I / N - sigma * I
deriv(R) <- sigma * I

initial(S) <- N - I0
initial(I) <- I0
initial(R) <- 0

N <- user(1e6)
I0 <- user(1)
beta <- user(4)
sigma <- user(2)

\[\begin{gather*}
\frac{dS}{dt} = -\beta S \frac{I}{N} \\
\frac{dI}{dt} = \beta S \frac{I}{N} - \sigma I \\
\frac{dR}{dt} = \sigma I
\end{gather*}\]

Things to note:

  • definitions can appear out of order
  • every variable has a paired initial() and deriv() equation

Compiling the model

sir <- odin::odin({
  deriv(S) <- -beta * S * I / N
  deriv(I) <- beta * S * I / N - sigma * I
  deriv(R) <- sigma * I
  initial(S) <- N - I0
  initial(I) <- I0
  initial(R) <- 0
  N <- user(1e6)
  I0 <- user(1)
  beta <- user(4)
  sigma <- user(2)
})

Running the model

mod <- sir$new()
t <- seq(0, 10, length.out = 501)
y <- mod$run(t)
plot(I ~ t, as.data.frame(y), type = "l")
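
To run with non-default parameters, a minimal sketch (assuming user parameters can be passed as named arguments to $new(), as in odin's examples):

mod <- sir$new(beta = 2, I0 = 10)  # override the user() defaults
y <- as.data.frame(mod$run(seq(0, 10, length.out = 501)))
matplot(y$t, y[c("S", "I", "R")], type = "l", lty = 1,
        xlab = "Time", ylab = "Number of individuals")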

Some comments

  • Requires the odin package, along with pkgbuild and pkgload
  • Requires a working C compiler
  • Sort of works for discrete time and stochastic models
  • You’re on your own once you have the model - what you do with it is up to you

Stochastic models

update(S) <- S - n_SI
update(I) <- I + n_SI - n_IR
update(R) <- R + n_IR

n_SI <- rbinom(S, 1 - exp(-beta * I / N))
n_IR <- rbinom(I, 1 - exp(-sigma))

initial(S) <- N - I0
initial(I) <- I0
initial(R) <- 0

N <- user(1e6)
I0 <- user(1)
beta <- user(4)
sigma <- user(2)
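
The binomial probabilities are the chance that an exponential hazard fires within one (unit) time step:

\[ P(S \rightarrow I \text{ in one step}) = 1 - e^{-\beta I / N}, \qquad P(I \rightarrow R \text{ in one step}) = 1 - e^{-\sigma} \]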

…compared with ODE models

Discrete-time stochastic model:

update(S) <- S - n_SI
update(I) <- I + n_SI - n_IR
update(R) <- R + n_IR

n_SI <- rbinom(S, 1 - exp(-beta * I / N))
n_IR <- rbinom(I, 1 - exp(-sigma))

initial(S) <- N - I0
initial(I) <- I0
initial(R) <- 0

N <- user(1e6)
I0 <- user(1)
beta <- user(4)
sigma <- user(2)

ODE model:

deriv(S) <- -beta * S * I / N
deriv(I) <- beta * S * I / N - sigma * I
deriv(R) <- sigma * I

initial(S) <- N - I0
initial(I) <- I0
initial(R) <- 0

N <- user(1e6)
I0 <- user(1)
beta <- user(4)
sigma <- user(2)

Compiling with odin

gen <- odin::odin("models/sir_stochastic.R")

Compiling with odin.dust

gen <- odin.dust::odin_dust("models/sir_stochastic.R")

Running with odin.dust

mod <- gen$new(list(), time = 0, n_particles = 1)
mod$run(10)
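
The dust object runs all particles together; a minimal sketch (assuming dust's interface as above, where $run() returns a matrix of state by particle):

mod <- gen$new(list(), time = 0, n_particles = 128)
y <- mod$run(10)
dim(y)      # n_state x n_particles
mod$info()  # metadata describing the model state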

odin vs odin.dust:

  • no use of output()
  • no use of interpolate() (we might restore this later)
  • no use of delay()
  • not all stochastic distributions supported; just tell us if one you need is missing
  • the interface for working with the models is totally different

Details: https://mrc-ide.github.io/odin.dust/articles/porting.html

Current supported distributions for random draws

  • uniform: runif(min, max)
  • normal: rnorm(mean, sd)
  • hypergeometric: rhyper(m, n, k)
  • poisson: rpois(lambda)
  • binomial: rbinom(n, p)
  • gamma: rgamma(shape, scale)
  • negative binomial: rnbinom(size, prob)
  • exponential: rexp(rate)
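
As a sketch of how one of these might be used, imported infections could be added to the stochastic SIR above with a Poisson draw (import_rate and n_import are hypothetical names, not part of the model shown earlier):

import_rate <- user(0.1) # expected imports per time step
n_import <- rpois(import_rate)
update(I) <- I + n_SI - n_IR + n_import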

Epi: Models with arrays

Without age structure:

update(S) <- S - n_SI
update(I) <- I + n_SI - n_IR
update(R) <- R + n_IR

With age structure (arrays):

update(S[]) <- S[i] - n_SI[i]
update(I[]) <- I[i] + n_SI[i] - n_IR[i]
update(R[]) <- R[i] + n_IR[i]

How it works

The odin code

update(S[]) <- S[i] - n_SI[i]

becomes (approximately)

for (int i = 0; i < S_length; ++i) {
  update_S[i] = S[i] - n_SI[i];
}

Syntax

  • Don’t use index variables (i, j, …) on the left-hand side
  • Boundary conditions can be written over multiple lines (see the sketch below)
  • Out-of-bounds access can crash the program
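
For example, a sketch of a boundary condition written over two lines (n_births is a hypothetical extra term; the i on the right-hand side runs over the range given on the left):

update(S[1]) <- S[1] + n_births - n_SI[1]
update(S[2:N_age]) <- S[i] - n_SI[i]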

Relevant changes

m[, ] <- user() # age-structured contact matrix
s_ij[, ] <- m[i, j] * I[j]
lambda[] <- beta * sum(s_ij[i, ]) # force of infection on age group i
p_SI[] <- 1 - exp(-lambda[i] * dt) # per-step infection probability
update(S[]) <- S[i] - n_SI[i]

N_age <- user()
dim(S) <- N_age
dim(m) <- c(N_age, N_age)

You must declare the dimensions of all arrays!

Running in parallel

  • Reduce walltime
  • …but probably increase CPU time
  • Requires that you run multiple “particles” at once

System requirement: OpenMP

dust::dust_openmp_support()
$num_procs
[1] 20

$max_threads
[1] 20

$thread_limit
[1] 2147483647

$openmp_version
[1] 201511

$has_openmp
[1] TRUE

$mc.cores
[1] NA

$OMP_THREAD_LIMIT
[1] NA

$OMP_NUM_THREADS
[1] NA

$MC_CORES
[1] 10

$limit_r
[1] 10

$limit_openmp
[1] 20

$limit
[1] 10

  • Linux, Windows: works out of the box - including on the cluster
  • macOS: possible but annoying

Running in parallel does not change results

sir <- dust::dust_example("sir")
sir$new(list(), 0, n_particles = 128, n_threads = 16, seed = 1)$run(10)
sir$new(list(), 0, n_particles = 128, n_threads = 1, seed = 1)$run(10)
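
Because dust gives each particle its own random number stream, the result depends on the seed and the number of particles, not on the number of threads. A small sketch comparing the two runs above:

identical(
  sir$new(list(), 0, n_particles = 128, n_threads = 16, seed = 1)$run(10),
  sir$new(list(), 0, n_particles = 128, n_threads = 1, seed = 1)$run(10)
)
# expected to be TRUE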

Parallelisation strategy

Always parallelise at the coarsest level first

  • Same analysis independently on 10 regions - send each to cluster separately
  • MCMC chains within analysis - run each on a separate process
  • Within each chain, parallelise at the particle level

Don’t use mclapply or parLapply, etc.

Epi: Adding more dimensions

  • Automatic indexing with i, j, k
    • then l, i5, i6, i7, i8 for dimensions 4 – 8
  • Index comes from the loop implied by the lhs

Adding vaccination to the model

  • One approach to modelling vaccination, here applied to susceptibles only
  • Nested binomial draw: vaccinate the S who were not infected this step
  • Assume you cannot move vaccine class and get infected in the same step

Relevant changes

rel_susceptibility[, ] <- user()
p_vacc[, ] <- 1 - exp(-eta[i, j] * dt) # probability of vaccination this step
n_S_vacc[, ] <- rbinom(S[i, j] - n_SI[i, j], p_vacc[i, j]) # vaccinate those not infected
p_SI[, ] <- 1 - exp(-rel_susceptibility[i, j] * lambda[i] * dt) # S to I
new_S[, ] <- S[i, j] - n_SI[i, j] - n_S_vacc[i, j] +
  (if (j == 1) n_S_vacc[i, N_vacc_classes] else n_S_vacc[i, j - 1])
update(S[, ]) <- new_S[i, j]
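
Remember that every array needs a dim() declaration; a sketch of what that might look like here (assuming N_vacc_classes is a user() parameter alongside N_age, and that eta is the vaccination-rate array used above):

N_vacc_classes <- user()
dim(rel_susceptibility) <- c(N_age, N_vacc_classes)
dim(eta) <- c(N_age, N_vacc_classes)
dim(p_vacc) <- c(N_age, N_vacc_classes)
dim(n_S_vacc) <- c(N_age, N_vacc_classes)
dim(new_S) <- c(N_age, N_vacc_classes)
dim(S) <- c(N_age, N_vacc_classes)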

Packaging your models

  • Easier to distribute
  • Bundle together model and support code
  • Much faster startup time

Create a package

  1. Basic skeleton using usethis::create_package("mymodel")
  2. Add DSL code to inst/odin
  3. Edit DESCRIPTION (see the sketch after this list):
    • Add cpp11 and dust to section LinkingTo
    • Add dust to Imports
    • Add SystemRequirements: C++11
  4. Add #' @useDynLib mymodel, .registration = TRUE somewhere (e.g., R/zzz.R)
  5. Run odin.dust::odin_dust_package() to generate files
  6. Run devtools::document() to update NAMESPACE
  7. Run pkgload::load_all() to compile and load
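
For reference, a sketch of what steps 3 and 4 might look like (mymodel is the example package name from step 1):

DESCRIPTION (relevant fields only):

Imports: dust
LinkingTo: cpp11, dust
SystemRequirements: C++11

R/zzz.R:

#' @useDynLib mymodel, .registration = TRUE
NULL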

Update a package

  1. Edit DSL code in inst/odin
  2. odin.dust::odin_dust_package() to generate files
  3. Run pkgload::load_all() to compile and load

Next steps

  • Add wrapper functions to generate parameters, process output etc
  • Write unit tests to keep things working
  • Set up GitHub Actions to run tests automatically
  • Create a nice website with pkgdown

Details: https://r-pkgs.org/

Epi: Time varying parameters

Use step to index into a vector of per-step values

beta_step[] <- user()
dim(beta_step) <- user()
beta <- if (step >= length(beta_step))
  beta_step[length(beta_step)] else beta_step[step + 1]
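
step is zero-based, so beta_step[1] applies at step 0, and once the schedule runs out the last value is used forever. A sketch of building such a schedule from R (hypothetical values, assuming 4 steps per day):

steps_per_day <- 4
beta_step <- c(rep(0.3, 10 * steps_per_day), 0.1)  # beta = 0.3 for days 0-9, then 0.1
pars <- list(beta_step = beta_step)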

Massively parallel: GPUs

  • CPU: do anything, but only a few at once
  • GPU: do a massive number of things, but all the same

Requirements

  • An NVIDIA GPU - at least a 20xx series (~2018 or later)
  • The nvcc toolchain (this is annoying to install)

Workflow

  • Recompile the model code, changing the real type to float
  • Initialise the model, specifying which GPU to use
  • Benchmark with NVIDIA’s tools (Nsight Compute etc.)

gpu <- dust::dust_cuda_options(fast_math = TRUE)
gen <- odin.dust::odin_dust_(path, real_type = "float", gpu = gpu)
mod <- gen$new(list(), 0, 65536, gpu_config = 0L)
mod$run(100) # runs on GPU!

Expect to run tens of thousands of particles or more, and have a strategy for working with this much data!

Epi: Erlang distributions

k_I <- user(integer = TRUE)
p_I_progress <- 1 - exp(-gamma * dt) # I to R
n_I_progress[, , ] <- rbinom(I[i, j, k], p_I_progress)
new_I[, , ] <- I[i, j, k] - n_I_progress[i, j, k] +
  (if (k == 1) n_SI[i, j] else n_I_progress[i, j, k - 1])

update(I[, , ]) <- new_I[i, j, k]
update(R[, ]) <- R[i, j] + n_I_progress[i, j, k_I]
dim(I) <- c(N_age, N_vacc_classes, k_I)
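
Chaining k_I stages with the same exit rate makes the total infectious period the sum of k_I exponential waiting times, i.e. Erlang distributed (in the underlying continuous-time model):

\[ T_I = \sum_{m=1}^{k_I} E_m, \quad E_m \sim \mathrm{Exponential}(\gamma) \implies T_I \sim \mathrm{Erlang}(k_I, \gamma), \quad \mathbb{E}[T_I] = k_I / \gamma \]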

Epi: daily incidence calculation

steps_per_day <- user(integer = TRUE)
dt <- 1 / steps_per_day

# Cumulative infections
update(infections_tot) <- infections_tot + sum(n_SI)
initial(infections_tot) <- 0

# Daily infections incidence
update(infections_inc) <- if (step %% steps_per_day == 0)
  sum(n_SI) else infections_inc + sum(n_SI)
initial(infections_inc) <- 0

Advanced topics

  • debugging with gdb and valgrind
  • profiling and optimising gpu use
  • multiple parameter sets at once
  • deterministic models from stochastic ones
  • mixed ODE/stochastic models

Resources