Stochastic models can be used in statistical inference via methods such as particle filtering but in practice doing so requires that the models can be run over and over again very quickly. While R has excellent support for sampling from distributions, it is not necessarily well suited for this sort of problem because R is single threaded, so we are forced to evaluate realisations of the stochastic process in series (one after another) rather than in parallel.

The dust package provides tools to help write stochastic models that can be evaluated in parallel. It does not directly provide statistical methods; see the mcstate package for that. Instead, it focuses on providing:

  • a fast random number generator that can be evaluated in parallel (see vignette("rng") for details)
  • a way of wrapping a user-provided model, itself written as a C++ class
  • a lightweight interface that drives this model from R
  • a set of useful primitives for developing sequential Monte Carlo methods.

A simple example - random walk

Consider a unbiased random walk; at each time step we move our position with a draw from a normal distribution with mean 0 and some standard deviation. We consider a single individual moving but will eventually simulate a family of these individuals, each independent. All simulations have a sense of time - a unitless measure of time “step” will be used but it’s up to you how it is interpreted (time is a non-negative integer, implemented using size_t).

Model code

To implement this model in dust we write out a C++ file that looks like:

class walk {
public:
  using real_type = double;
  using data_type = dust::no_data;
  using internal_type = dust::no_internal;
  using rng_state_type = dust::random::generator<real_type>;

  struct shared_type {
    real_type sd;
  };

  walk(const dust::pars_type<walk>& pars) : shared(pars.shared) {
  }

  size_t size() const {
    return 1;
  }

  std::vector<real_type> initial(size_t time) {
    std::vector<real_type> ret = {0};
    return ret;
  }

  void update(size_t time, const real_type * state, rng_state_type& rng_state,
              real_type * state_next) {
    state_next[0] = state[0] +
      dust::random::normal<real_type>(rng_state, 0, shared->sd);
  }

private:
  dust::shared_ptr<walk> shared;
};

namespace dust {

template <>
dust::pars_type<walk> dust_pars<walk>(cpp11::list pars) {
  walk::real_type sd = cpp11::as_cpp<walk::real_type>(pars["sd"]);
  return dust::pars_type<walk>(walk::shared_type{sd});
}

}

There are two main parts here; the class (walk) and an interface function (dust::dust_pars).

The class is what does most of the work. It is important that this does not use anything in the R API as it may be called in parallel. Therefore use only C++ standard library types. It has to provide every public type or method as the example above. First, the five types:

  • the type real_type has to exist and indicate what sort of numeric type you use for “real numbers”. Typically this will be double, but if you need a different type (typically float) you can use it here.
  • the type data_type describes data that the model may be compared with as it runs. Here we use dust::no_data as the model has no such data but see vignette("data") for the interface here.
  • the type shared_type is whatever read-only data your model needs that is shared across particles. Typically these reflect parameters of the model (see dust::dust_pars below), and while these will typically be shared across all particles that is is not always the case (see vignette("multi"))
  • the type internal_type is whatever internal data or space your model needs to run. This is read-write and per-particle (in contrast with shared_type above)
  • the type rng_state_type is one of the valid RNG types. Typically using rng_state_type = dust::random::generator<real_type>; will select a reasonable choice, but you could also force a specific generator such as dust::random::xoshiro256starstar here (see vignette("rng"))

The constructor must take only one argument, being const dust::pars_type<model>&. You can do what you want in the constructor - here we just copy shared (containing parameters) into the object with the pars argument (we use the private member shared here but this can be anything you want, though the type is important). The dust::pars_type<> template is a wrapper that contains two fields shared (as dust::shared_ptr<walk>) and internal (as internal_type).

The method size() must return the “size” of the system - how many state variables it has. Here we’re considering a single individual moving randomly so the size is always 1. Many models will have a size that depends on their parameters, though the size cannot be changed after construction.

There are other forms of the dust::pars_type constructor which would allow specifying the internal data; typically this is used to create scratch space (allocating vectors as needed) rather than compute values because the scratch space and model state can become separated from each other (by direct setting of state, or by shuffling model state between particles).

Constructing a model

The function dust::dust will “compile” a model for us to use:

path_walk <- system.file("examples/walk.cpp", package = "dust")
walk <- dust::dust(path_walk)

However, this is also bundled into the package and can be loaded with:

walk <- dust::dust_example("walk")

The object itself is an “R6ClassGenerator” object:

walk
#> <dust> object generator
#>   Public:
#>     initialize: function (pars, time, n_particles, n_threads = 1L, seed = NULL, 
#>     name: function () 
#>     param: function () 
#>     run: function (time_end) 
#>     simulate: function (time_end) 
#>     set_index: function (index) 
#>     index: function () 
#>     n_threads: function () 
#>     n_state: function () 
#>     n_particles: function () 
#>     n_particles_each: function () 
#>     shape: function () 
#>     update_state: function (pars = NULL, state = NULL, time = NULL, set_initial_state = NULL, 
#>     state: function (index = NULL) 
#>     time: function () 
#>     reorder: function (index) 
#>     resample: function (weights) 
#>     info: function () 
#>     pars: function () 
#>     rng_state: function (first_only = FALSE, last_only = FALSE) 
#>     set_rng_state: function (rng_state) 
#>     has_openmp: function () 
#>     has_gpu_support: function (fake_gpu = FALSE) 
#>     has_compare: function () 
#>     real_size: function () 
#>     rng_algorithm: function () 
#>     uses_gpu: function (fake_gpu = FALSE) 
#>     n_pars: function () 
#>     set_n_threads: function (n_threads) 
#>     set_data: function (data, shared = FALSE) 
#>     compare_data: function () 
#>     filter: function (time_end = NULL, save_trajectories = FALSE, time_snapshot = NULL, 
#>     gpu_info: function () 
#>   Private:
#>     pars_: NULL
#>     pars_multi_: NULL
#>     index_: NULL
#>     info_: NULL
#>     n_threads_: NULL
#>     n_particles_: NULL
#>     n_particles_each_: NULL
#>     shape_: NULL
#>     ptr_: NULL
#>     gpu_config_: NULL
#>     methods_: NULL
#>     param_: NULL
#>     reload_: NULL
#>   Parent env: <environment: namespace:dust>
#>   Locked objects: TRUE
#>   Locked class: FALSE
#>   Portable: TRUE

Create an instance of the model using walk$new. There are three required arguments:

  • pars: passed to dust_pars to initialise the model
  • time: the initial time (0 seems like a sensible choice here given our model has no internal sense of time)
  • n_particles: the number of particles to create
model <- walk$new(list(sd = 1), 0, 20)
model
#> <dust>
#>   Public:
#>     compare_data: function () 
#>     filter: function (time_end = NULL, save_trajectories = FALSE, time_snapshot = NULL, 
#>     gpu_info: function () 
#>     has_compare: function () 
#>     has_gpu_support: function (fake_gpu = FALSE) 
#>     has_openmp: function () 
#>     index: function () 
#>     info: function () 
#>     initialize: function (pars, time, n_particles, n_threads = 1L, seed = NULL, 
#>     n_pars: function () 
#>     n_particles: function () 
#>     n_particles_each: function () 
#>     n_state: function () 
#>     n_threads: function () 
#>     name: function () 
#>     param: function () 
#>     pars: function () 
#>     real_size: function () 
#>     reorder: function (index) 
#>     resample: function (weights) 
#>     rng_algorithm: function () 
#>     rng_state: function (first_only = FALSE, last_only = FALSE) 
#>     run: function (time_end) 
#>     set_data: function (data, shared = FALSE) 
#>     set_index: function (index) 
#>     set_n_threads: function (n_threads) 
#>     set_rng_state: function (rng_state) 
#>     shape: function () 
#>     simulate: function (time_end) 
#>     state: function (index = NULL) 
#>     time: function () 
#>     update_state: function (pars = NULL, state = NULL, time = NULL, set_initial_state = NULL, 
#>     uses_gpu: function (fake_gpu = FALSE) 
#>   Private:
#>     gpu_config_: NULL
#>     index_: NULL
#>     info_: NULL
#>     methods_: list
#>     n_particles_: 20
#>     n_particles_each_: 20
#>     n_threads_: 1
#>     param_: NULL
#>     pars_: list
#>     pars_multi_: FALSE
#>     ptr_: externalptr
#>     reload_: NULL
#>     shape_: 20

This returns an R6 object that can be used to simulate from or interact with the model. For example, our initial model state is

model$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]     0     0     0     0     0     0

Here there is one row per model state variable (there is only one here) and one column per particle (there are 20)

and we can run the model for 100 time steps, which returns the state at the end of the walk (and not at any intermediate times):

model$run(100)
#>             [,1]      [,2]      [,3]     [,4]      [,5]     [,6]      [,7]
#> [1,] -0.02780983 -14.92799 -8.001147 15.16199 0.8541161 10.92156 -4.478731
#>           [,8]      [,9]     [,10]    [,11]     [,12]     [,13]     [,14]
#> [1,] -6.416156 -9.883696 -10.16829 5.046543 0.5644725 -2.079644 -3.014334
#>          [,15]    [,16]     [,17]     [,18]     [,19]    [,20]
#> [1,] -3.685183 15.55656 -5.247826 -1.621558 -3.036199 -4.73985

We can also directly retrieve the state from our object

model$state()
#>             [,1]      [,2]      [,3]     [,4]      [,5]     [,6]      [,7]
#> [1,] -0.02780983 -14.92799 -8.001147 15.16199 0.8541161 10.92156 -4.478731
#>           [,8]      [,9]     [,10]    [,11]     [,12]     [,13]     [,14]
#> [1,] -6.416156 -9.883696 -10.16829 5.046543 0.5644725 -2.079644 -3.014334
#>          [,15]    [,16]     [,17]     [,18]     [,19]    [,20]
#> [1,] -3.685183 15.55656 -5.247826 -1.621558 -3.036199 -4.73985

At this point our particles have been run for 100 time steps with standard deviation 1 at each step so they will be distributed following Normal(0, 10). This is easier to see if we simulate a lot of particles, here 20,000:

model <- walk$new(list(sd = 1), 0, 20000)
invisible(model$run(100))
hist(model$state(), freq = FALSE, las = 1, col = "steelblue2", main = "",
     ylim = c(0., 0.04), xlab = "State")
curve(dnorm(x, 0, 10), col = "orange", add = TRUE, lwd = 2)

Running a model in parallel

The approach above still runs everything in serial, one particle after another. We can configure this system to run in parallel by providing the extra argument n_threads to the constructor.

Provided that your system can compile with openmp the following code will execute in parallel using 2 threads

model <- walk$new(list(sd = 1), 0, 20, n_threads = 2)
model$run(100)
#>          [,1]    [,2]     [,3]    [,4]      [,5]      [,6]      [,7]     [,8]
#> [1,] 3.463116 7.16848 7.102007 21.2275 -5.478347 -6.050355 -9.613679 13.69989
#>           [,9]    [,10]    [,11]     [,12]    [,13]    [,14]     [,15]
#> [1,] 0.9684393 11.65252 1.089576 -6.658974 8.571651 13.96338 -21.20301
#>          [,16]    [,17]    [,18]    [,19]     [,20]
#> [1,] -14.08026 5.045285 1.638252 13.32859 -1.827573

Running this same code in series will give the same results:

model <- walk$new(list(sd = 1), 0, 20, n_threads = 1)
model$run(100)
#>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]      [,7]      [,8]
#> [1,] -1.451801 2.431436 10.41627 6.861941 11.86564 20.03156 -2.369254 -4.014697
#>          [,9]     [,10]    [,11]     [,12]    [,13]     [,14]    [,15]    [,16]
#> [1,] -3.81413 -4.011602 12.91168 -10.17938 20.60901 -19.91835 9.628508 4.867727
#>         [,17]    [,18]    [,19]     [,20]
#> [1,] 12.11464 7.311943 11.07588 -11.89787

We use as many random number generators as there are particles, so if you run fewer particles or more, increase the threads or decrease, the results will be the same (see vignette("design") for more on this).

You should be careful when selecting the number of threads. dust will never use more than one thread at a time without it being requested, but avoid using parallel::detectCores() to work out how many threads you have available as it will often return an overestimate. This is particularly the case in a shared-use system such as a cluster or CRAN’s servers. We provide a helper function dust::dust_openmp_threads which can be used to try and find a safe number of threads available to you while respecting various environment variables which seek to control this (MC_CORES, OMP_THREAD_LIMIT, etc).

For example,

dust::dust_openmp_threads(100, "fix")
#> Requested number of threads '100' exceeds a limit of '1'
#>  See dust::dust_openmp_threads() for details
#> [1] 1

A more interesting example

Consider now an SIR model (Susceptible - Infected - Recovered). This sort of model is common in epidemiology, and is often extended to add additional compartments (e.g., SEIR which adds an Exposed compartment) or by structuring each compartment based on properties such as age. Here, we show a simple example with just 3 compartments:

class sir {
public:
  using real_type = double;
  using internal_type = dust::no_internal;
  using rng_state_type = dust::random::generator<real_type>;
  
  struct data_type {
    real_type incidence;
  };

  struct shared_type {
    real_type S0;
    real_type I0;
    real_type R0;
    real_type beta;
    real_type gamma;
    real_type dt;
    size_t freq;
    // Observation parameters
    real_type exp_noise;
  };

  sir(const dust::pars_type<sir>& pars) : shared(pars.shared) {
  }

  size_t size() const {
    return 5;
  }

  std::vector<real_type> initial(size_t time) {
    std::vector<real_type> ret = {shared->S0, shared->I0, shared->R0, 0, 0};
    return ret;
  }

  void update(size_t time, const real_type * state, rng_state_type& rng_state,
              real_type * state_next) {
    real_type S = state[0];
    real_type I = state[1];
    real_type R = state[2];
    real_type cumulative_incidence = state[3];

    real_type N = S + I + R;

    real_type p_SI = 1 - std::exp(-(shared->beta) * I / N);
    real_type p_IR = 1 - std::exp(-(shared->gamma));
    real_type n_IR = dust::random::binomial<real_type>(rng_state, I,
                                                       p_IR * shared->dt);
    real_type n_SI = dust::random::binomial<real_type>(rng_state, S,
                                                       p_SI * shared->dt);

    state_next[0] = S - n_SI;
    state_next[1] = I + n_SI - n_IR;
    state_next[2] = R + n_IR;
    state_next[3] = cumulative_incidence + n_SI;
    // Little trick here to compute daily incidence by accumulating
    // incidence from the first day.
    state_next[4] = (time % shared->freq == 0) ? n_SI : state[4] + n_SI;
  }

  real_type compare_data(const real_type * state, const data_type& data,
                         rng_state_type& rng_state) {
    const real_type incidence_modelled = state[4];
    const real_type incidence_observed = data.incidence;
    const real_type lambda = incidence_modelled +
      dust::random::exponential(rng_state, shared->exp_noise);
    return dust::density::poisson(incidence_observed, lambda, true);
  }

private:
  dust::shared_ptr<sir> shared;
};

// Helper function for accepting values with defaults
inline double with_default(double default_value, cpp11::sexp value) {
  return value == R_NilValue ? default_value : cpp11::as_cpp<double>(value);
}

namespace dust {

template <>
dust::pars_type<sir> dust_pars<sir>(cpp11::list pars) {
  using real_type = sir::real_type;
  // Initial state values
  // [[dust::param(I0, required = FALSE)]]
  real_type I0 = with_default(10, pars["I0"]);
  real_type S0 = 1000.0;
  real_type R0 = 0.0;

  // Rates, which can be set based on the provided pars
  // [[dust::param(beta, required = FALSE)]]
  real_type beta = with_default(0.2, pars["beta"]);
  // [[dust::param(gamma, required = FALSE)]]
  real_type gamma = with_default(0.1, pars["gamma"]);

  // Time scaling
  size_t freq = 4;
  real_type dt = 1.0 / static_cast<real_type>(freq);

  // Compare function
  // [[dust::param(exp_noise, required = FALSE)]]
  real_type exp_noise = with_default(1e6, pars["exp_noise"]);

  sir::shared_type shared{S0, I0, R0, beta, gamma, dt, freq, exp_noise};
  return dust::pars_type<sir>(shared);
}

template <>
cpp11::sexp dust_info<sir>(const dust::pars_type<sir>& pars) {
  using namespace cpp11::literals;
  // Information about state order
  cpp11::writable::strings vars({"S", "I", "R", "cases_cumul", "cases_inc"});
  // Information about parameter values
  cpp11::list p = cpp11::writable::list({"beta"_nm = pars.shared->beta,
                                         "gamma"_nm = pars.shared->gamma});
  return cpp11::writable::list({"vars"_nm = vars, "pars"_nm = p});
}

// The way that this is going to work is we will process a list
// *outside* of the C that will take (say) a df and convert it
// row-wise into a list with elements `time` and `data`, we will pass
// that in here. Then this function will be called once per data
// element to create the struct that will be used for future work.
template <>
sir::data_type dust_data<sir>(cpp11::list data) {
  return sir::data_type{cpp11::as_cpp<sir::real_type>(data["incidence"])};
}

}

There a few changes here compared to the version that was shown before for the walk model, but the core parts are only slightly different:

  • the initial method is slightly more complicated as we initialise a 3-element vector
  • our shared_type object has significantly more elements
  • the dust_pars function is much more complicated
  • there is a compare_data method and a nontrivial data_type definition, but ignore these for now (see vignette("data"))

The other difference is a new template specialisation of dust_info - this is optional but can be used to report back to R information about the model based on the input parameters. Here it returns information about variable order and core parameters.

We also have added C++ pseudo-attributes with [[dust::param]] to describe the parameters that this function takes. This is optional, but allows use of a coef method with the class (see below).

As before, we’ll use the version bundled with the package

sir <- dust::dust_example("sir")

To get information about the parameters, use coef() on the generator

coef(sir)
#> $I0
#> $I0$required
#> [1] FALSE
#> 
#> 
#> $beta
#> $beta$required
#> [1] FALSE
#> 
#> 
#> $gamma
#> $gamma$required
#> [1] FALSE
#> 
#> 
#> $exp_noise
#> $exp_noise$required
#> [1] FALSE

The model is initialised the same way as before:

model <- sir$new(list(), 0, 20)

We can use the $info() method to retrieve information about our model:

model$info()
#> $vars
#> [1] "S"           "I"           "R"           "cases_cumul" "cases_inc"  
#> 
#> $pars
#> $pars$beta
#> [1] 0.2
#> 
#> $pars$gamma
#> [1] 0.1

and get its state as before:

model$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 1000 1000 1000 1000 1000 1000 1000 1000 1000  1000  1000  1000  1000  1000
#> [2,]   10   10   10   10   10   10   10   10   10    10    10    10    10    10
#> [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [4,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [5,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]  1000  1000  1000  1000  1000  1000
#> [2,]    10    10    10    10    10    10
#> [3,]     0     0     0     0     0     0
#> [4,]     0     0     0     0     0     0
#> [5,]     0     0     0     0     0     0

Because we have 5 states per particle, this is a 5 x 20 matrix.

Suppose that as we run the model we mostly want information on the “I” compartment. So we’ll run the model for a bit and retrieve the number of infected individuals, then continue, etc. To do this we use the $set_index() method to indicate which state elements should be returned after using $run():

model$set_index(2L)

(you can pass any integer vector here, of any length, provided all the indices lie between 1 and the length of your state vector)

Now, when using run, the number of infected individuals is returned

model$run(10)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]   16   14   11   16   13   12   10   15   15    14    11    11    24    13
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]    11    10    14    11    20     7
model$run(20)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]   24   22   10   18   18   20   15   20   23    21    11    14    31    14
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]    17    13    16    11    20     9

This is useful when you have many compartments or variables that you are not that interested in during running the model. For a particle filter you might be fitting to the sum of all infected individuals over a number of compartments, so rather than returning hundreds of values back (times hundreds of particles) you can return back much less data and keep things nice and fast.

If the index vector is named, then those names will be used as row names on the returned object:

model$set_index(c(I = 2L))
model$run(30)
#>   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> I   30   34   10   19   23   18   22   24   24    28     7    25    36    19
#>   [,15] [,16] [,17] [,18] [,19] [,20]
#> I    23    12    19    17    26    10

We can always use $state() to get the whole state vector:

model$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]  971  966  991  978  972  982  981  974  972   969   991   981   961   983
#> [2,]   30   34   10   19   23   18   22   24   24    28     7    25    36    19
#> [3,]    9   10    9   13   15   10    7   12   14    13    12     4    13     8
#> [4,]   29   34    9   22   28   18   19   26   28    31     9    19    39    17
#> [5,]    2    6    0    2    4    1    0    4    2     2     0     1     5     2
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]   979   986   978   986   966   994
#> [2,]    23    12    19    17    26    10
#> [3,]     8    12    13     7    18     6
#> [4,]    21    14    22    14    34     6
#> [5,]     3     1     2     1     3     0

or select only variables of interest:

model$state(c(S = 1L, R = 3L))
#>   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> S  971  966  991  978  972  982  981  974  972   969   991   981   961   983
#> R    9   10    9   13   15   10    7   12   14    13    12     4    13     8
#>   [,15] [,16] [,17] [,18] [,19] [,20]
#> S   979   986   978   986   966   994
#> R     8    12    13     7    18     6

Again, this copies names from the index if they are present.

In order to run the simulation beginning-to-end, we use the $simulate method on a dust object, which runs over a set of time steps and records the state at each.

model <- sir$new(list(), 0, 200)
model$set_index(2L)
times <- seq(0, 600, by = 5)
state <- model$simulate(times)

The output here is a 1 x 200 x 121 matrix (n state x n particles x n times)

dim(state)
#> [1]   1 200 121

we need to drop the first dimension and transpose the others for ease of plotting:

traces <- t(drop(state))

Plotting this over time (with 4 time steps per day - see the sir code above)

day <- times / 4
matplot(day, traces, type = "l", lty = 1, col = "#00000022",
        xlab = "Day", ylab = "Number infected (I)")
lines(day, rowMeans(traces), col = "red", lwd = 2)

Other methods

There are a few other methods on the dust objects that may be useful.

Reordering particles

This method exists to support particle filtering, and allows resampling or reordering of particles.

model <- walk$new(list(sd = 1), 0, 20)
model$run(1)
#>            [,1]      [,2]      [,3]       [,4]      [,5]       [,6]      [,7]
#> [1,] 0.04100055 -1.655861 0.4727237 -0.6650445 -1.206297 -0.3958508 -1.117329
#>           [,8]       [,9]     [,10]     [,11]     [,12]    [,13]     [,14]
#> [1,] 0.5347917 0.07778276 0.7395859 -1.178763 0.5530748 1.675934 -1.095233
#>           [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
#> [1,] -0.4476247 0.7529722 0.2641409 0.4776227 0.2053753 -1.205948

Suppose that we wanted to reorder these particles so that they were in decreasing order:

index <- order(model$state())
index
#>  [1]  2  5 20 11  7 14  4 15  6  1  9 19 17  3 18  8 12 10 16 13

We then pass this index to the reorder method:

model$reorder(index)
model$state()
#>           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]       [,7]
#> [1,] -1.655861 -1.206297 -1.205948 -1.178763 -1.117329 -1.095233 -0.6650445
#>            [,8]       [,9]      [,10]      [,11]     [,12]     [,13]     [,14]
#> [1,] -0.4476247 -0.3958508 0.04100055 0.07778276 0.2053753 0.2641409 0.4727237
#>          [,15]     [,16]     [,17]     [,18]     [,19]    [,20]
#> [1,] 0.4776227 0.5347917 0.5530748 0.7395859 0.7529722 1.675934

We can then continue our random walk. There is no need to sample every particle and particles can appear multiple times in the sample, but the total number must be conserved. Suppose that we want to to sample particles based on how close they are to 0:

p <- dnorm(model$state())
index <- sample(length(p), replace = TRUE , prob = p)
index
#>  [1]  4  3 15 13 16  2 14  9  7 19  8 11 14 11  5  8  9  2  4 10

We can then apply this sampling:

model$reorder(index)
model$state()
#>           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
#> [1,] -1.178763 -1.205948 0.4776227 0.2641409 0.5347917 -1.206297 0.4727237
#>            [,8]       [,9]     [,10]      [,11]      [,12]     [,13]      [,14]
#> [1,] -0.3958508 -0.6650445 0.7529722 -0.4476247 0.07778276 0.4727237 0.07778276
#>          [,15]      [,16]      [,17]     [,18]     [,19]      [,20]
#> [1,] -1.117329 -0.4476247 -0.3958508 -1.206297 -1.178763 0.04100055

This is not terribly useful on its own but is a key part of a particle filter.

When this reordering happens, only the model state is copied around; the internal data and random number state are left behind.

Set particle state

A particle state is determined by three mutable things; pars, state and time; these can all be updated for a model after it has been created. We have found setting one or more of these at a time important;

  • Resetting the model with a new set of parameters (pars), initial conditions (state) and times (time)
  • Changing pars at some point in the simulation to introduce some new aspect of the model
  • Changing state to manually move around some individuals within a model
  • Setting time along with state when initialising the model from a previously saved state

The update_state method allows setting any or all of these components.

By default every particle starts from the initial condition specified by your model classes initial() method. However, you can specify a state directly using the $update_state() method. Here, we initialise our SIR model with only 1 infected individual rather than 10:

model <- sir$new(list(), 0, 20)
model$update_state(state = c(1000, 1, 0, 0, 0))
model$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 1000 1000 1000 1000 1000 1000 1000 1000 1000  1000  1000  1000  1000  1000
#> [2,]    1    1    1    1    1    1    1    1    1     1     1     1     1     1
#> [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [4,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [5,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]  1000  1000  1000  1000  1000  1000
#> [2,]     1     1     1     1     1     1
#> [3,]     0     0     0     0     0     0
#> [4,]     0     0     0     0     0     0
#> [5,]     0     0     0     0     0     0

Now, when we run the model, far more of the epidemics fail to take off as the infected individual goes disappears before infecting anyone.

times <- seq(0, 600, by = 5)
state <- model$simulate(times)
day <- times / 4
matplot(day, t(state[2, , ]), type = "l", lty = 1, col = "#00000022",
        xlab = "Day", ylab = "Number infected (I)")

You can optionally set the initial time along with the state. This is useful if your model depends on time (e.g., you use the time step in a calculation by transforming it into some more meaningful measure of time).

You can also set the initial state to a range of different values. Suppose we set the initial number of infections to be Poisson distributed with a mean of 10, we might write:

I0 <- rpois(20, 10)
state0 <- rbind(1010 - I0, I0, 0, 0, 0, deparse.level = 0)
model$update_state(state = state0, time = 0L)
model$time()
#> [1] 0
model$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 1001 1001 1003 1000  998  994 1004 1004  998  1001   997  1001   995  1003
#> [2,]    9    9    7   10   12   16    6    6   12     9    13     9    15     7
#> [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [4,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#> [5,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,]   999  1001   998  1003  1002   998
#> [2,]    11     9    12     7     8    12
#> [3,]     0     0     0     0     0     0
#> [4,]     0     0     0     0     0     0
#> [5,]     0     0     0     0     0     0

Reset the model

One particularly common case is to “reset” the model in order to run it again. you should not simply recreate the model from its constructor as that will re-seed a new set of random state – it would be preferable continue with the generator state in your old model.

Suppose we run our model with sd of 1 for 100 time steps

model <- walk$new(list(sd = 1), 0, 10, seed = 1L)
y1 <- model$run(100)

we then use reset to set new parameters into the model and set the time back to zero and can run again

model$update_state(pars = list(sd = 2), time = 0)
y2 <- model$run(100)

The state created in y2 will have started from our new starting point and time zero, but have used the same random number state through both simulations, which is generally what we want.

Use within a package

You should not use dust::dust() within a package, because that would cause the model to compile each time you use it, rather than when the package builds. It may also cause issues when trying to use the model in parallel (e.g., with the parallel package). Instead you should use dust::dust_package() which will generate appropriate code for you.

To use dust in a package, put your dust models in inst/dust and run dust::dust_package() on the package’s root directory.

A skeleton package might contain:

#> .
#> ├── DESCRIPTION
#> ├── NAMESPACE
#> └── inst
#>     └── dust
#>         └── walk.cpp

This is the normal R package skeleton, though missing R and src directories (for now). The DESCRIPTION file contains

Package: example
Title: Example Dust in a Package
Version: 0.0.1
LinkingTo: cpp11, dust
Authors@R: c(person('A', 'Person', role = c('aut', 'cre')
                     email = 'person@example.com'))
SystemRequirements: C++11
License: CC0

The important things here are:

  • the package name (Package). We’re using example, and names with a dot may not work as expected
  • including cpp11 and dust in LinkingTo, which allows package compilation to find their respective header files
  • a useDynLib call to your package in the NAMESPACE file

Our NAMESPACE file contains:

useDynLib('example', .registration = TRUE)

The files in inst/dust are the same files as seen above, with walk.cpp containing

class walk {
public:
  using real_type = double;
  using data_type = dust::no_data;
  using internal_type = dust::no_internal;
  using rng_state_type = dust::random::generator<real_type>;

  struct shared_type {
    real_type sd;
  };

  walk(const dust::pars_type<walk>& pars) : shared(pars.shared) {
  }

  size_t size() const {
    return 1;
  }

  std::vector<real_type> initial(size_t time) {
    std::vector<real_type> ret = {0};
    return ret;
  }

  void update(size_t time, const real_type * state, rng_state_type& rng_state,
              real_type * state_next) {
    state_next[0] = state[0] +
      dust::random::normal<real_type>(rng_state, 0, shared->sd);
  }

private:
  dust::shared_ptr<walk> shared;
};

namespace dust {

template <>
dust::pars_type<walk> dust_pars<walk>(cpp11::list pars) {
  walk::real_type sd = cpp11::as_cpp<walk::real_type>(pars["sd"]);
  return dust::pars_type<walk>(walk::shared_type{sd});
}

}

There can be as many of these files as you want within the directory inst/dust.

To prepare the package, run dust::dust_package():

dust::dust_package(path)
#>  18 functions decorated with [[cpp11::register]]
#>  generated file cpp11.R
#>  generated file cpp11.cpp

The directory structure now has more files:

#> .
#> ├── DESCRIPTION
#> ├── NAMESPACE
#> ├── R
#> │   ├── cpp11.R
#> │   └── dust.R
#> ├── inst
#> │   └── dust
#> │       └── walk.cpp
#> └── src
#>     ├── Makevars
#>     ├── cpp11.cpp
#>     └── walk.cpp

The file src/walk.cpp are generated by dust and should not be edited. They include your model, but also a bit of helper code:

// Generated by dust (version 0.12.3) - do not edit
#include <cpp11.hpp>

[[cpp11::register]]
cpp11::sexp dust_walk_capabilities();

[[cpp11::register]]
cpp11::sexp dust_walk_gpu_info();
[[cpp11::register]]
SEXP dust_cpu_walk_alloc(cpp11::list r_pars, bool pars_multi, size_t time,
                         cpp11::sexp r_n_particles, size_t n_threads,
                         cpp11::sexp r_seed, bool deterministic,
                         cpp11::sexp gpu_config);

[[cpp11::register]]
SEXP dust_cpu_walk_run(SEXP ptr, size_t time_end);

[[cpp11::register]]
SEXP dust_cpu_walk_simulate(SEXP ptr, cpp11::sexp time_end);

[[cpp11::register]]
SEXP dust_cpu_walk_set_index(SEXP ptr, cpp11::sexp r_index);

[[cpp11::register]]
SEXP dust_cpu_walk_update_state(SEXP ptr, SEXP r_pars, SEXP r_state,
                                           SEXP r_time, SEXP r_set_initial_state,
                                           SEXP index, SEXP reset_step_size);

[[cpp11::register]]
SEXP dust_cpu_walk_state(SEXP ptr, SEXP r_index);

[[cpp11::register]]
size_t dust_cpu_walk_time(SEXP ptr);

[[cpp11::register]]
void dust_cpu_walk_reorder(SEXP ptr, cpp11::sexp r_index);

[[cpp11::register]]
SEXP dust_cpu_walk_resample(SEXP ptr, cpp11::doubles r_weights);

[[cpp11::register]]
SEXP dust_cpu_walk_rng_state(SEXP ptr, bool first_only, bool last_only);

[[cpp11::register]]
SEXP dust_cpu_walk_set_rng_state(SEXP ptr, cpp11::raws rng_state);

[[cpp11::register]]
SEXP dust_cpu_walk_set_data(SEXP ptr, cpp11::list data, bool shared);

[[cpp11::register]]
SEXP dust_cpu_walk_compare_data(SEXP ptr);

[[cpp11::register]]
SEXP dust_cpu_walk_filter(SEXP ptr, SEXP time_end,
                                     bool save_trajectories,
                                     cpp11::sexp time_snapshot,
                                     cpp11::sexp min_log_likelihood);

[[cpp11::register]]
void dust_cpu_walk_set_n_threads(SEXP ptr, int n_threads);

[[cpp11::register]]
int dust_cpu_walk_n_state(SEXP ptr);
#include <dust/r/dust.hpp>

class walk {
public:
  using real_type = double;
  using data_type = dust::no_data;
  using internal_type = dust::no_internal;
  using rng_state_type = dust::random::generator<real_type>;

  struct shared_type {
    real_type sd;
  };

  walk(const dust::pars_type<walk>& pars) : shared(pars.shared) {
  }

  size_t size() const {
    return 1;
  }

  std::vector<real_type> initial(size_t time) {
    std::vector<real_type> ret = {0};
    return ret;
  }

  void update(size_t time, const real_type * state, rng_state_type& rng_state,
              real_type * state_next) {
    state_next[0] = state[0] +
      dust::random::normal<real_type>(rng_state, 0, shared->sd);
  }

private:
  dust::shared_ptr<walk> shared;
};

namespace dust {

template <>
dust::pars_type<walk> dust_pars<walk>(cpp11::list pars) {
  walk::real_type sd = cpp11::as_cpp<walk::real_type>(pars["sd"]);
  return dust::pars_type<walk>(walk::shared_type{sd});
}

}

cpp11::sexp dust_walk_capabilities() {
  return dust::r::dust_capabilities<walk>();
}

cpp11::sexp dust_walk_gpu_info() {
  return dust::gpu::r::gpu_info();
}
using model_cpu = dust::dust_cpu<walk>;

SEXP dust_cpu_walk_alloc(cpp11::list r_pars, bool pars_multi, size_t time,
                             cpp11::sexp r_n_particles, size_t n_threads,
                             cpp11::sexp r_seed, bool deterministic,
                             cpp11::sexp gpu_config) {
  return dust::r::dust_cpu_alloc<walk>(r_pars, pars_multi, time, r_n_particles,
                                        n_threads, r_seed, deterministic,
                                        gpu_config);
}

SEXP dust_cpu_walk_run(SEXP ptr, size_t time_end) {
  return dust::r::dust_run<model_cpu>(ptr, time_end);
}

SEXP dust_cpu_walk_simulate(SEXP ptr, cpp11::sexp time_end) {
  return dust::r::dust_simulate<model_cpu>(ptr, time_end);
}

SEXP dust_cpu_walk_set_index(SEXP ptr, cpp11::sexp r_index) {
  dust::r::dust_set_index<model_cpu>(ptr, r_index);
  return R_NilValue;
}

SEXP dust_cpu_walk_update_state(SEXP ptr, SEXP r_pars, SEXP r_state,
                                           SEXP r_time, SEXP r_set_initial_state, SEXP index, SEXP reset_step_size) {
  return dust::r::dust_update_state<model_cpu>(ptr, r_pars, r_state, r_time,
                                                      r_set_initial_state, index, reset_step_size);
}

SEXP dust_cpu_walk_state(SEXP ptr, SEXP r_index) {
  return dust::r::dust_state<model_cpu>(ptr, r_index);
}

size_t dust_cpu_walk_time(SEXP ptr) {
  return dust::r::dust_time<model_cpu>(ptr);
}

void dust_cpu_walk_reorder(SEXP ptr, cpp11::sexp r_index) {
  return dust::r::dust_reorder<model_cpu>(ptr, r_index);
}

SEXP dust_cpu_walk_resample(SEXP ptr, cpp11::doubles r_weights) {
  return dust::r::dust_resample<model_cpu>(ptr, r_weights);
}

SEXP dust_cpu_walk_rng_state(SEXP ptr, bool first_only, bool last_only) {
  return dust::r::dust_rng_state<model_cpu>(ptr, first_only, last_only);
}

SEXP dust_cpu_walk_set_rng_state(SEXP ptr, cpp11::raws rng_state) {
  dust::r::dust_set_rng_state<model_cpu>(ptr, rng_state);
  return R_NilValue;
}

SEXP dust_cpu_walk_set_data(SEXP ptr, cpp11::list data,
                                       bool shared) {
  dust::r::dust_set_data<model_cpu>(ptr, data, shared);
  return R_NilValue;
}

SEXP dust_cpu_walk_compare_data(SEXP ptr) {
  return dust::r::dust_compare_data<model_cpu>(ptr);
}

SEXP dust_cpu_walk_filter(SEXP ptr, SEXP time_end,
                                     bool save_trajectories,
                                     cpp11::sexp time_snapshot,
                                     cpp11::sexp min_log_likelihood) {
  return dust::r::dust_filter<model_cpu>(ptr, time_end,
                                                save_trajectories,
                                                time_snapshot,
                                                min_log_likelihood);
}

void dust_cpu_walk_set_n_threads(SEXP ptr, int n_threads) {
  return dust::r::dust_set_n_threads<model_cpu>(ptr, n_threads);
}

int dust_cpu_walk_n_state(SEXP ptr) {
  return dust::r::dust_n_state<model_cpu>(ptr);
}

The file R/dust.R contains the R interface generated by dust with the constructor objects (all models’ constructors will be collected into this file, which also should not be edited).

## Generated by dust (version 0.12.3) - do not edit
walk <- R6::R6Class(
  "dust",
  cloneable = FALSE,

  private = list(
    pars_ = NULL,
    pars_multi_ = NULL,
    index_ = NULL,
    info_ = NULL,
    n_threads_ = NULL,
    n_particles_ = NULL,
    n_particles_each_ = NULL,
    shape_ = NULL,
    ptr_ = NULL,
    gpu_config_ = NULL,
    methods_ = NULL,
    param_ = NULL,
    reload_ = NULL
  ),

  public = list(
    initialize = function(pars, time, n_particles, n_threads = 1L,
                          seed = NULL, pars_multi = FALSE,
                          deterministic = FALSE,
                          gpu_config = NULL) {
      if (is.null(gpu_config)) {
        private$methods_ <- list(
           alloc = dust_cpu_walk_alloc,
           run = dust_cpu_walk_run,
           simulate = dust_cpu_walk_simulate,
           set_index = dust_cpu_walk_set_index,
           n_state = dust_cpu_walk_n_state,
           update_state = dust_cpu_walk_update_state,
           state = dust_cpu_walk_state,
           time = dust_cpu_walk_time,
           reorder = dust_cpu_walk_reorder,
           resample = dust_cpu_walk_resample,
           rng_state = dust_cpu_walk_rng_state,
           set_rng_state = dust_cpu_walk_set_rng_state,
           set_n_threads = dust_cpu_walk_set_n_threads,
           set_data = dust_cpu_walk_set_data,
           compare_data = dust_cpu_walk_compare_data,
           filter = dust_cpu_walk_filter)
      } else {
        private$methods_ <- list(alloc = function(...) {
          stop("GPU support not enabled for this object")
        })
      }
      res <- private$methods_$alloc(pars, pars_multi, time, n_particles,
                        n_threads, seed, deterministic, gpu_config)
      private$pars_ <- pars
      private$pars_multi_ <- pars_multi
      private$n_threads_ <- n_threads
      private$ptr_ <- res[[1L]]
      private$info_ <- res[[2L]]
      private$shape_ <- res[[3L]]
      private$gpu_config_ <- res[[4L]]
      private$n_particles_ <- prod(private$shape_)
      if (pars_multi) {
        private$n_particles_each_ <- private$n_particles_ / length(pars)
      } else {
        private$n_particles_each_ <- private$n_particles_
      }
    },

    name = function() {
      "walk"
    },

    param = function() {
      private$param_
    },

    run = function(time_end) {
      m <- private$methods_$run(private$ptr_, time_end)
      rownames(m) <- names(private$index_)
      m
    },

    simulate = function(time_end) {
      m <- private$methods_$simulate(private$ptr_, time_end)
      rownames(m) <- names(private$index_)
      m
    },

    set_index = function(index) {
      private$methods_$set_index(private$ptr_, index)
      private$index_ <- index
      invisible()
    },

    index = function() {
      private$index_
    },

    n_threads = function() {
      private$n_threads_
    },

    n_state = function() {
      private$methods_$n_state(private$ptr_)
    },

    n_particles = function() {
      private$n_particles_
    },

    n_particles_each = function() {
      private$n_particles_each_
    },

    shape = function() {
      private$shape_
    },

    update_state = function(pars = NULL, state = NULL, time = NULL,
                            set_initial_state = NULL, index = NULL,
                            reset_step_size = NULL) {
      info <- private$methods_$update_state(private$ptr_, pars, state, time,
                                          set_initial_state, index,
                                          reset_step_size)
      if (!is.null(pars)) {
        private$info_ <- info
        private$pars_ <- pars
      }
      invisible()
    },

    state = function(index = NULL) {
      m <- private$methods_$state(private$ptr_, index)
      rownames(m) <- names(index)
      m
    },

    time = function() {
      private$methods_$time(private$ptr_)
    },

    reorder = function(index) {
      storage.mode(index) <- "integer"
      private$methods_$reorder(private$ptr_, index)
      invisible()
    },

    resample = function(weights) {
      invisible(private$methods_$resample(private$ptr_, weights))
    },

    info = function() {
      private$info_
    },

    pars = function() {
      private$pars_
    },

    rng_state = function(first_only = FALSE, last_only = FALSE) {
      private$methods_$rng_state(private$ptr_, first_only, last_only)
    },

    set_rng_state = function(rng_state) {
      private$methods_$set_rng_state(private$ptr_, rng_state)
      invisible()
    },

    has_openmp = function() {
      dust_walk_capabilities()[["openmp"]]
    },

    has_gpu_support = function(fake_gpu = FALSE) {
      if (fake_gpu) {
        FALSE
      } else {
        dust_walk_capabilities()[["gpu"]]
      }
    },

    has_compare = function() {
      dust_walk_capabilities()[["compare"]]
    },

    real_size = function() {
      dust_walk_capabilities()[["real_size"]]
    },

    rng_algorithm = function() {
      dust_walk_capabilities()[["rng_algorithm"]]
    },

    uses_gpu = function(fake_gpu = FALSE) {
      real_gpu <- private$gpu_config_$real_gpu
      !is.null(real_gpu) && (fake_gpu || real_gpu)
    },

    n_pars = function() {
      if (private$pars_multi_) length(private$pars_) else 0L
    },

    set_n_threads = function(n_threads) {
      prev <- private$n_threads_
      private$methods_$set_n_threads(private$ptr_, n_threads)
      private$n_threads_ <- n_threads
      invisible(prev)
    },

    set_data = function(data, shared = FALSE) {
      private$methods_$set_data(private$ptr_, data, shared)
    },

    compare_data = function() {
      private$methods_$compare_data(private$ptr_)
    },

    filter = function(time_end = NULL, save_trajectories = FALSE,
                      time_snapshot = NULL, min_log_likelihood = NULL) {
      private$methods_$filter(private$ptr_, time_end, save_trajectories,
                              time_snapshot, min_log_likelihood)
    },

    gpu_info = function() {
      ret <- dust_walk_gpu_info()
      parent <- parent.env(environment())
      if (ret$has_cuda && exists("private", parent, inherits = FALSE)) {
        ret$config <- private$gpu_config_
      }
      ret
    }
  ))
class(walk) <- c("dust_generator", class(walk))

Finally, R/cpp11.R and src/cpp11.cpp are files created by cpp11 that should not be edited.

Your package can include as much R code as you want, and can be developed like any other R package. But any time you change the code in inst/dust you should rerun dust::dust_package().