mcstate

MRC Centre for Global Infectious Disease Analysis

State space models

What is it?

  • A state space model (SSM) is a mathematical framework for modelling a dynamical system.
  • It is built around two processes:
    • state equations that describe the evolution of some latent variables (also referred to as “hidden” states) over time
    • observation equations that relate the observations to the latent variables.

Can you be more precise?

  • \(x_{t}\), \(1 \leq t \leq T\): the hidden states of the system
  • \(y_{t}\), \(1 \leq t \leq T\): the observations
  • \(f_{\theta}\): the state transition function
  • \(g_{\theta}\): the observation function
  • \(t\) is often time
  • \(\theta\) defines the model
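Putting these symbols together, the generic state space model can be written as (a standard textbook formulation, not specific to mcstate):

```latex
x_t \sim f_{\theta}(x_t \mid x_{t-1}), \qquad y_t \sim g_{\theta}(y_t \mid x_t), \qquad 1 \leq t \leq T
```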

Two common problems

  • Two common needs
    • “Filtering” i.e. estimate the hidden states \(x_{t}\) from the observations \(y_t\)
    • “Inference” i.e. estimate the \(\theta\)’s compatible with the observations \(y_{t}\)

(Bootstrap) Sequential Monte Carlo

AKA, the particle filter

  • Assuming a given \(\theta\), at each time step \(t\), the bootstrap SMC algorithm:

    1. generates \(X_{t+1}^N\) by using \(f_{\theta}(X_{t+1}^N|X_{t}^N)\) (the \(N\) particles)
    2. calculates weights for the newly generated states based on \(g_{\theta}(Y_{t+1}|X_{t+1})\)
    3. resamples the states to keep only the good ones
  • Allows efficient exploration of the state space by progressively integrating the data points

  • Produces a MC approximation of \(p(Y_{1:T}|\theta)\) the marginal likelihood
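The three steps above can be sketched in a few lines of base R. This is a toy bootstrap filter for a random-walk state observed with Gaussian noise (an illustrative model, not one of this tutorial's models):

```r
## Toy bootstrap particle filter: random-walk state, Gaussian observations
bootstrap_filter <- function(y, n_particles, sd_state = 1, sd_obs = 1) {
  x <- rnorm(n_particles)                                 # initial particles
  log_lik <- 0
  for (t in seq_along(y)) {
    x <- rnorm(n_particles, mean = x, sd = sd_state)      # 1. propagate via f_theta
    w <- dnorm(y[t], mean = x, sd = sd_obs)               # 2. weight via g_theta
    log_lik <- log_lik + log(mean(w))                     # MC estimate of p(y_t | y_{1:t-1})
    x <- sample(x, n_particles, replace = TRUE, prob = w) # 3. resample
  }
  log_lik                                                 # estimate of log p(y_{1:T} | theta)
}

set.seed(1)
y_obs <- cumsum(rnorm(20))  # simulate some observations
bootstrap_filter(y_obs, 1000)
```

Each run returns a slightly different value: the log-likelihood estimate is itself a Monte Carlo quantity, which is exactly what the sections below explore with mcstate's filter.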

The filter in action

Particle MCMC

What is Particle MCMC?

  • PMCMC is an algorithm which performs “filtering” and “inference”
  • A Markov Chain Monte Carlo (MCMC) method for estimating target distributions
  • MCMC explores the parameter space by making random jumps from one value to the next
  • The probability of moving from one point to another is determined by the proposal distribution and the likelihood ratio
  • Compared with “traditional” MCMC, in PMCMC the likelihood is approximated using a “particle filter”
  • The filter generates a set of “particles”, i.e. trajectories compatible with the observations
  • It uses these trajectories to compute a (marginal) likelihood that can be used by the PMCMC

Core algorithm

  1. Initialisation Start with a value \(\theta_{0}\) from the parameter space
  2. Initial SMC Use sequential Monte Carlo to do the “filtering” and sample potential trajectories \(\{X_{t}\}_{1..N}\). Calculate the (marginal) likelihood from this using an MC estimator
  3. Proposal Propose a new parameter value \(\theta ^*\)
  4. SMC Calculate marginal likelihood of proposal
  5. Metropolis-Hastings Accept with probability \(\min(1, \alpha)\) with \(\alpha = \frac{\hat{p}(Y_{1:T}|\theta^*)\,p(\theta^*)}{\hat{p}(Y_{1:T}|\theta_{t})\,p(\theta_{t})} \cdot \frac{q(\theta_{t}|\theta^*)}{q(\theta^*|\theta_{t})}\)
  6. Loop Redo (3) until the number of steps is reached
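The loop above can be sketched in base R. Here `filter_loglik` is a hypothetical stand-in for a particle filter run (any function returning a noisy log-likelihood estimate works), and the proposal is a symmetric Gaussian random walk, so the \(q\) terms cancel:

```r
## Sketch of the PMCMC loop with a symmetric random-walk proposal
pmcmc_sketch <- function(filter_loglik, log_prior, theta0, step_sd, n_steps) {
  theta <- theta0                                        # 1. initialisation
  ll <- filter_loglik(theta)                             # 2. initial SMC
  out <- matrix(NA_real_, n_steps, length(theta0))
  for (i in seq_len(n_steps)) {
    theta_star <- rnorm(length(theta), theta, step_sd)   # 3. proposal
    ll_star <- filter_loglik(theta_star)                 # 4. SMC at theta*
    log_alpha <- ll_star + log_prior(theta_star) -
      (ll + log_prior(theta))                            # 5. Metropolis-Hastings
    if (log(runif(1)) < log_alpha) {
      theta <- theta_star
      ll <- ll_star
    }
    out[i, ] <- theta                                    # 6. loop
  }
  out
}

## Toy example: noisy log-likelihood peaked at theta = 0.25, flat prior
set.seed(1)
chain <- pmcmc_sketch(
  filter_loglik = function(th) dnorm(0.25, th, 0.05, log = TRUE) + rnorm(1, 0, 0.1),
  log_prior = function(th) 0,
  theta0 = 0.5, step_sd = 0.05, n_steps = 200)
mean(chain)
```

Note that a rejected proposal repeats the previous value in the chain; this is where the autocorrelation discussed later comes from.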

mcstate

  • mcstate is an R package, based on the SSM paradigm, which aims to provide inference and filtering tools for dust models
  • Implements several main algorithms for this: Particle MCMC (PMCMC), SMC^2, iterated filtering
  • Inference tooling for the Centre’s UK COVID model
  • Subsequently used for other diseases

Design philosophy

  • Less well refined than odin/dust tbh
    • We may change and improve much of this, especially MCMC parameters
  • More complex structures are built up from simpler objects
    • Filter {data, model, n_particles, compare}
    • PMCMC {parameters, filter, control}
  • Provides you with low-level tools, and little handholding
  • Pretty fast though

Particle filtering

Our requirements

  • A time series
  • A generating model
  • An index into model state
  • A compare function

(for PMCMC you also need parameters to infer, …later)

The data

incidence <- read.csv("data/incidence.csv")
head(incidence)
  cases day
1     3   1
2     2   2
3     2   3
4     2   4
5     1   5
6     3   6
plot(cases ~ day, incidence, pch = 19, las = 1)

Data preparation

data <- mcstate::particle_filter_data(
  incidence, time = "day", rate = 4, initial_time = 0)
head(data)
  day_start day_end time_start time_end cases
1         0       1          0        4     3
2         1       2          4        8     2
3         2       3          8       12     2
4         3       4         12       16     2
5         4       5         16       20     1
6         5       6         20       24     3

The model

N <- S + I + R
p_SI <- 1 - exp(-(beta) * I / N)
p_IR <- 1 - exp(-(gamma))
n_IR <- rbinom(I, p_IR * dt)
n_SI <- rbinom(S, p_SI * dt)

update(time) <- (step + 1) * dt
update(S) <- S - n_SI
update(I) <- I + n_SI - n_IR
update(R) <- R + n_IR
update(cases_cumul) <- cases_cumul + n_SI
update(cases_inc) <- if (step %% freq == 0) n_SI else cases_inc + n_SI

initial(time) <- 0
initial(S) <- 1000
initial(R) <- 0
initial(I) <- I0
initial(cases_cumul) <- 0
initial(cases_inc) <- 0

beta <- user(0.2)
gamma <- user(0.1)
I0 <- user(10)

freq <- user(4)
dt <- 1.0 / freq

Compiling the model

sir <- odin.dust::odin_dust("models/sir.R")
* installing *source* package ‘sira5d4bd1a’ ...
** using staged installation
** libs
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG  -I'/home/rfitzjoh/lib/R/library/cpp11/include' -g -Wall -Wextra -pedantic -Wmaybe-uninitialized -Wno-unused-parameter -Wno-cast-function-type -Wno-missing-field-initializers -O2  -I/home/rfitzjoh/lib/R/library/dust/include -DHAVE_INLINE -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-a3XuZ5/r-base-4.2.2.20221110=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2  -c cpp11.cpp -o cpp11.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG  -I'/home/rfitzjoh/lib/R/library/cpp11/include' -g -Wall -Wextra -pedantic -Wmaybe-uninitialized -Wno-unused-parameter -Wno-cast-function-type -Wno-missing-field-initializers -O2  -I/home/rfitzjoh/lib/R/library/dust/include -DHAVE_INLINE -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-a3XuZ5/r-base-4.2.2.20221110=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2  -c dust.cpp -o dust.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o sira5d4bd1a.so cpp11.o dust.o -fopenmp -L/usr/lib/R/lib -lR
installing to /tmp/Rtmp0uMqpV/devtools_install_2537dc153befd2/00LOCK-file2537dc5e692547/00new/sira5d4bd1a/libs
** checking absolute paths in shared objects and dynamic libraries
* DONE (sira5d4bd1a)

The model over time

pars <- list(beta = 0.25, gamma = 0.1)
mod <- sir$new(pars, 0, 20)
y <- mod$simulate(c(0, data$time_end))
i <- mod$info()$index[["time"]]
j <- mod$info()$index[["cases_inc"]]
matplot(y[i, 1, ], t(y[j, , ]), type = "l", col = "#00000055", lty = 1, las = 1,
        xlab = "Day", ylab = "Cases")
points(cases ~ day, incidence, col = "red", pch = 19)

The index function

  • You rarely care about all the state variables
  • You might want different state variables for your compare and for plotting
index <- function(info) {
  list(run = c(incidence = info$index$cases_inc),
       state = c(t = info$index$time,
                 I = info$index$I,
                 cases = info$index$cases_inc))
}
index(mod$info())
$run
incidence 
        6 

$state
    t     I cases 
    1     4     6 

The compare function

compare <- function(state, observed, pars = NULL) {
  ## Modelled incidence per particle, via the "run" entry of the index
  modelled <- state["incidence", , drop = TRUE]
  ## Tiny Exp(1e6) jitter keeps lambda > 0, avoiding -Inf log-weights
  lambda <- modelled + rexp(length(modelled), 1e6)
  dpois(observed$cases, lambda, log = TRUE)
}

This is the important bit, and something that is tricky to write well.
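One reason it is tricky: with a Poisson likelihood, a particle whose modelled incidence is exactly 0 has zero probability of producing any observed cases, so its log-weight is \(-\infty\). The tiny Exp(1e6) jitter in compare() keeps lambda strictly positive:

```r
## A particle with modelled incidence 0 against an observation of 3 cases
dpois(3, lambda = 0, log = TRUE)                 # -Inf: a weight of exactly zero
dpois(3, lambda = 0 + rexp(1, 1e6), log = TRUE)  # finite (though very negative)
```

Without the jitter, such particles break the weight normalisation; with it, they are merely very unlikely to survive resampling.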

Files, from this repo

Or browse https://github.com/mrc-ide/odin-dust-tutorial

A particle filter

filter <- mcstate::particle_filter$new(data, model = sir, n_particles = 100,
                                       compare = compare, index = index)
pars <- list(beta = 0.25, gamma = 0.1)
filter$run(pars)
[1] -254.2894

Particle filter marginal likelihoods are stochastic!

replicate(10, filter$run(pars))
 [1] -253.7651 -259.2582 -252.1231 -256.0436 -255.1916 -256.1607 -255.2386
 [8] -257.6765 -254.6032 -254.3471

Likelihood variance changes with particle number

filter <- mcstate::particle_filter$new(data, model = sir, n_particles = 10,
                                       compare = compare, index = index)
sort(replicate(10, filter$run(pars)))
 [1] -267.1850 -263.6292 -262.8457 -261.7034 -259.9648 -259.5186 -258.7828
 [8] -258.0611 -254.4378 -251.7591
filter <- mcstate::particle_filter$new(data, model = sir, n_particles = 1000,
                                       compare = compare, index = index)
sort(replicate(10, filter$run(pars)))
 [1] -255.4523 -255.2030 -254.2859 -254.2029 -254.1420 -254.1420 -253.9215
 [8] -253.8245 -253.6417 -253.6241
  • Monte Carlo estimates typically see variance decrease with sample size; this is no different.
  • You want a small variance, but that costs a lot of CPU time
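The same effect in miniature, with a toy estimator standing in for the filter: each estimate is the log of a mean over n likelihood draws, and its spread shrinks as n grows (illustrative only, nothing mcstate-specific):

```r
## Monte Carlo spread of a log-of-mean estimator at increasing sample sizes
set.seed(1)
log_lik_hat <- function(n) log(mean(dnorm(rnorm(n))))  # noisy log-likelihood estimate
sds <- sapply(c(10, 100, 1000), function(n) sd(replicate(200, log_lik_hat(n))))
sds  # spread shrinks as n grows
```

This is the same trade-off as choosing n_particles: tighter estimates cost proportionally more computation.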

Likelihood mean changes with parameter values

filter$run(list(beta = 0.2, gamma = 0.1))
[1] -244.3733
filter$run(list(beta = 0.1, gamma = 0.05))
[1] -269.7547

Particle filter history

First, run the filter while saving history (off by default)

filter <- mcstate::particle_filter$new(data, model = sir, n_particles = 100,
                                       compare = compare, index = index)
filter$run(save_history = TRUE)
[1] -246.4271

Particle filter history is a tree

times <- data$time_end
h <- filter$history()
matplot(h["t", 1, ], t(h["cases", , ]), type = "l", col = "#00000011", 
        xlab = "Day", ylab = "Cases", las = 1)
points(cases ~ day, incidence, pch = 19, col = "red")

Particle filter history for unobserved states

matplot(h["t", 1, ], t(h["I", , ]), type = "l", col = "#00000011", 
        xlab = "Day", ylab = "Number of infecteds (I)", las = 1)

PMCMC

  • Particle MCMC - like MCMC but with a particle filter
  • Slower, and harder to tune
  • Easy to generate impossibly large amounts of data
  • Inherits all the issues of MCMC that you know and love

Algorithm

  1. Initialisation Start with a value \(\theta_{0}\) from the parameter space
  2. Initial SMC Use sequential Monte Carlo to do the “filtering” and sample potential trajectories \(\{X_{t}\}_{1..N}\). Calculate the (marginal) likelihood from this using an MC estimator
  3. Proposal Propose a new parameter value \(\theta ^*\)
  4. SMC Calculate marginal likelihood of proposal
  5. Metropolis-Hastings Accept with probability \(\min(1, \alpha)\) with \(\alpha = \frac{\hat{p}(Y_{1:T}|\theta^*)\,p(\theta^*)}{\hat{p}(Y_{1:T}|\theta_{t})\,p(\theta_{t})} \cdot \frac{q(\theta_{t}|\theta^*)}{q(\theta^*|\theta_{t})}\)
  6. Loop Redo (3) until the number of steps is reached

Defining your parameters

  • Different to particle filter/model parameters
    • filter/model parameters are everything your model needs to run; may include data!
    • PMCMC parameters (often called \(\theta\)) are an unstructured numeric vector
    • the PMCMC parameters are statistical parameters; your model parameters are functional parameters
  • Requirements:
    • priors for MCMC parameters
    • a proposal variance-covariance matrix (VCV) for a multivariate normal
    • transformation from MCMC to model parameters

Priors

priors <- list(
  mcstate::pmcmc_parameter("beta", 0.2, min = 0),
  mcstate::pmcmc_parameter("gamma", 0.1, min = 0, prior = function(p)
    dgamma(p, shape = 1, scale = 0.2, log = TRUE)))

(this will improve in future, feedback very welcome)

Proposal

  • Variance covariance matrix for a multivariate normal distribution
  • Symmetric (except for reflections at any provided boundaries)
vcv <- diag(0.01, 2)
vcv
     [,1] [,2]
[1,] 0.01 0.00
[2,] 0.00 0.01
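A proposal step is then a draw from a multivariate normal centred on the current value; a minimal base-R sketch using chol() (mcstate does the equivalent internally):

```r
## Draw one multivariate-normal proposal step from the current theta
vcv <- diag(0.01, 2)
theta <- c(beta = 0.2, gamma = 0.1)
set.seed(1)
## z %*% chol(vcv) has covariance vcv, since t(R) %*% R = vcv for R = chol(vcv)
theta_star <- theta + drop(rnorm(2) %*% chol(vcv))
theta_star
```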

Transformation

Convert “MCMC parameters” into “model parameters”

transform <- function(theta) {
  as.list(theta)
}

You will want closures in complex models:

make_transform <- function(contact_matrix, vaccine_schedule) {
  function(theta) {
    list(contact_matrix = contact_matrix,
         vaccine_schedule = vaccine_schedule,
         beta = theta[["beta"]],
         gamma = theta[["gamma"]])
  }
}
transform <- make_transform(contact_matrix, vaccine_schedule)

Final parameter object

mcmc_pars <- mcstate::pmcmc_parameters$new(priors, vcv, transform)

Running PMCMC

control <- mcstate::pmcmc_control(
    n_steps = 500,
    progress = TRUE)
samples <- mcstate::pmcmc(mcmc_pars, filter, control = control)
samples
<mcstate_pmcmc> (500 samples)
  pars: 500 x 2 matrix of parameters
    beta, gamma
  probabilities: 500 x 3 matrix of log-probabilities
    log_prior, log_likelihood, log_posterior
  state: 6 x 500 matrix of final states
  trajectories: (not included)
  restart: (not included)

Our PMCMC samples

plot(samples$probabilities[, "log_posterior"], type = "s",
     xlab = "Sample", ylab = "Log posterior")

oh.

Assessing fit

  • Just look at how bad it is
  • In PMCMC, all your ideas from MCMC will be useful but can be misleading; i.e. adaptation is hard
  • Gelman-Rubin convergence diagnostic
    • Run multiple chains
    • Check that within-chain variance is similar to between-chain variance
    • Necessary but not sufficient to prove convergence
  • A lot of problems in MCMC come from autocorrelation
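The diagnostic itself is simple enough to sketch in base R on synthetic chains (in practice you would run coda::gelman.diag on your actual PMCMC chains):

```r
## Gelman-Rubin sketch: compare within- and between-chain variance
set.seed(1)
chains <- replicate(4, rnorm(500, 0.25, 0.02))  # 4 synthetic well-mixed chains
n <- nrow(chains)
W <- mean(apply(chains, 2, var))                # within-chain variance
B <- n * var(colMeans(chains))                  # between-chain variance
rhat <- sqrt(((n - 1) / n * W + B / n) / W)     # potential scale reduction factor
rhat  # close to 1 for these well-mixed chains
```

Values much above 1 indicate the chains have not yet agreed on the target distribution.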

Autocorrelation

  • A notion from time series, which translates to (P)MCMC in terms of the steps of the chains
  • Autocorrelation refers to the correlation between the values of a time series at different points in time. In MCMC, this means correlation between successive samples.
  • In the context of MCMC, “autocorrelation” can most of the time be read as “bad mixing”
  • A signature of random-walk MCMC
  • Likely to bias estimates (wrong mean) and to underestimate the variance compared with the true posterior distribution
  • Linked with the notion of Effective Sample Size (ESS); roughly speaking, the ESS gives the equivalent number of i.i.d. samples

Autocorrelation in practice FAQ

  • Why is Autocorrelation a Problem? For optimal performance, we want the samples to be independent and identically distributed (i.i.d.) samples from the target distribution.
  • How to Detect Autocorrelation? We can calculate the autocorrelation function (ACF), which measures the correlation between the samples and their lagged values.
  • How to Reduce Autocorrelation? There are a number of strategies, including: running a longer chain, adapting the proposal distribution, and thinning or subsampling. By reducing autocorrelation, we obtain better estimates of the target distribution and improve the accuracy of our Bayesian inference.
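Both the ACF and a rough ESS are easy to compute in base R. Here an AR(1) series stands in for a sticky chain (coda::effectiveSize offers a more careful ESS estimate in practice):

```r
## An AR(1) series as a stand-in for a sticky, autocorrelated chain
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 2000))
acf(x, plot = FALSE)$acf[2]  # lag-1 autocorrelation, close to 0.9

## Rough ESS: n / (1 + 2 * sum of autocorrelations)
rho <- acf(x, plot = FALSE, lag.max = 100)$acf[-1]
ess <- 2000 / (1 + 2 * sum(rho))
ess  # far fewer "effective" samples than the nominal 2000
```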

Thinning the chain

  • Either before or after fit
  • Faster and less memory to thin before
  • More flexible to thin later
  • No real difference if history not saved

This is useful because most of your chain is not interesting due to the autocorrelation.
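A base-R sketch of thinning after the fit; mcstate also provides mcstate::pmcmc_thin() for doing this on the whole samples object (check ?pmcmc_thin for the arguments in your version):

```r
## Keep every 10th sample after discarding a 100-step burnin
n_steps <- 500
keep <- seq(101, n_steps, by = 10)
length(keep)  # 40 retained samples out of 500
## Applied to the fit above: pars_thinned <- samples$pars[keep, ]
```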

Running in parallel

Arguments to mcstate::pmcmc_control

  • n_chains: number of separate chains to run
  • n_threads_total: total number of threads to use
  • n_workers: number of separate threads to split your chains over
  • use_parallel_seed: helps with reproducibility

You can also run different chains on different cluster nodes - but talk to us about this.

Let’s try again

vcv <- matrix(c(0.00057, 0.00052, 0.00052, 0.00057), 2, 2)
mcmc_pars <- mcstate::pmcmc_parameters$new(priors, vcv, transform)
control <- mcstate::pmcmc_control(
    n_steps = 500,
    n_chains = 4,
    n_threads_total = 12,
    n_workers = 4,
    save_state = TRUE,
    save_trajectories = TRUE,
    progress = TRUE)
samples <- mcstate::pmcmc(mcmc_pars, filter, control = control)
plot(samples$probabilities[, "log_posterior"], type = "s")

Saving history

  • Save your trajectories at every collected sample
  • Save the final state at every sample
  • Save full model state at specific points.

Next steps

  • forward time predictions
  • posterior predictive checks
  • closures and binding data into functions
  • min log likelihood (and filter early exit)
  • rerun filter in mcmc

Advanced topics

  • compiled compare functions
  • multi-parameter models
  • multi-stage models
  • restarting models
  • deterministic (expectation) models as starting points
  • adaptive fitting (deterministic models only)
  • use on a GPU
  • use with ODE/SDE models
  • other inference methods - if2, smc2

Resources

A nice PMCMC introduction written for epidemiologists: Endo, A., van Leeuwen, E. & Baguelin, M. Introduction to particle Markov-chain Monte Carlo for disease dynamics modellers. Epidemics 29, 100363 (2019). https://www.sciencedirect.com/science/article/pii/S1755436519300301

A tutorial about SMC: Doucet, A. & Johansen, A. M. A tutorial on particle filtering and smoothing: fifteen years later. Oxford Handbook of Nonlinear Filtering, 656–705 (2011).

The reference paper on PMCMC: Andrieu, C., Doucet, A. & Holenstein, R. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B (Statistical Methodology) 72, 269–342 (2010).

A software-oriented paper introducing odin, dust and mcstate: FitzJohn, R. G. et al. Reproducible parallel inference and simulation of stochastic state space models using odin, dust, and mcstate. Wellcome Open Res. 5, 288 (2021).