Fundamentally, using a computer to create a realisation from
stochastic Monte Carlo models is extremely simple. Consider a random
walk in one dimension - we might write that in base R functions by
creating a function that takes a current state `state`

and a
list of parameters:

```
update_walk <- function(state, pars) {
rnorm(1, state, pars$sd)
}
```

and then iterating it for 20 time steps with:

```
y <- 0
pars <- list(sd = 2)
for (i in 1:20) {
y <- update_walk(y, pars)
}
```

At the end of this process, the variable `y`

contains a
new value, corresponding to 20 time steps with our stochastic update
function.

So why does `dust`

apparently require thousands of lines
of code to do this?

It’s very rare that one might want to run a single stochastic simulation; normally we want to run a group together. There are several ways that we might want to do that:

- For a single set of parameters and a starting state run a set of simulations, as they will differ due to the stochasticity in the model
- In addition to the above, perhaps run with different starting points, representing uncertainty in initial conditions
- In addition to the above, run for many parameter sets at once possibly with one particle per parameter, possibly with many per parameter
- In addition to the above, the parameters themselves are grouped into blocks

There book-keeping for this can get tedious and error prone if done
by hand. In `dust`

, we try and restrict concern about this to
a few points, and for the simulation itself – the interaction that we
expect to take the longest in any interesting model – we just run a big
loop over time and all particles no matter what type of structure they
might represent from the above.

See `vignette("multi")`

for details of interacting with
different ways that you might want to structure your simulations.

Once we’re running multiple simulations at once, even a simple simulation might start taking a long time and because they are independent we might look to parallelism to try and speed up the simulations.

However, one cannot just draw multiple numbers from a single random number generator at once. That is, given a generator like those built into R, there is no parallel equivalent to

`runif(10)`

that would draw the 10 numbers in parallel rather than in series.
When drawing a random number there is a “side
effect” of updating the random number state. That is because the
random number stream is *also* a Markov chain!

As such it makes sense (to us at least) to store the state of each
stream’s random number generator separately, so if we have
`n`

particles within a `dust`

object we have
`n`

separate streams, and we might think of the model state
as being the state that is declared by the user as a vector of floating
point numbers alongside the random number state. During each model step,
the model state is updated and so is the random number state.

This might seem wasteful, and if we used the popular Mersenne Twister it would be to some degree as each particle would require 2560 bytes of additional state. In contrast the newer xoshiro generators that we use require only 32 or 16 bytes of state; the same as 4 double- or single-precision floating point numbers respectively. So for any nontrivial simulation it’s not a very large overhead.

Setting the seed for these runs is not trivial, particularly as the number of simultaneous particles increase. If you’ve used random numbers with the future package you may have seen it raise a warning if you do not configure it to use a “L’Ecuyer-CMRG” which adapts R’s native random number seeds to be safe in parallel.

The reason for this is that if different streams start from seeds that are set via poor heuristics (e.g., system time and thread id) they might be exactly the same. If they were set randomly, then they might collide (see John Cook’s description of the birthday paradox here) and if they are picked sequentially there’s no guarantee that these streams might not be correlated.

Ideally we want a similar set of properties to R’s
`set.seed`

method; the user provides an arbitrary integer and
we seed *all* the random number streams using this in a way that
is reproducible and also statistically robust. We also want the streams
to be reproducible even when the number of particles changes, for
particle indices that are shared. The random number generators we use
(the xoshiro family, a.k.a. Blackmann-Vigna generators) support these
properties and are described more fully in
`vignette("rng")`

.

To initialise our system with a potentially very large number of particles we take two steps:

- First, we seed the first stream using the
`splitmix64`

RNG, following the xoshiro docs. This expands a single 64-bit integer into the 256-bits of RNG state, while ensuring that the resulting full random number seed does not contain all zeros. - Then, for each subsequent chain we take a “jump” in the sequence.
This is a special move implemented by the RNG that is equivalent to a
very large number of draws from the generator (e.g., about 2^128 for the
default generator used for double-precision models) ensuring that each
particles state occupies a non-overlapping section of the underling
random number stream (see
`vignette("rng")`

for details).

With this setup we are free to parallelise the system as each
realisation is completely independent of each other; the problem has
become “embarrassingly
parallel”. In practice we do this using OpenMP where available as this is
well supported from R and gracefully falls back on serial operation
where not available. See `dust::dust_openmp_support`

for
information on your system’s OpenMP configuration as seen by R:

```
dust::dust_openmp_support()
#> $num_procs
#> [1] 2
#>
#> $max_threads
#> [1] 2
#>
#> $thread_limit
#> [1] 2147483647
#>
#> $openmp_version
#> [1] 201511
#>
#> $has_openmp
#> [1] TRUE
#>
#> $mc.cores
#> [1] NA
#>
#> $OMP_THREAD_LIMIT
#> [1] NA
#>
#> $OMP_NUM_THREADS
#> [1] NA
#>
#> $MC_CORES
#> [1] NA
#>
#> $limit_r
#> [1] 1
#>
#> $limit_openmp
#> [1] 2
#>
#> $limit
#> [1] 1
```

As the number of threads changes, the results will not change; the
same calculations will be carried out and the same random numbers drawn.
The number of threads used can even be changed for a model while it is
running if the computational resources available change during a model
run, using the `$set_n_threads()`

method.

Sometimes we might parallelise beyond one computer (e.g., when using
a cluster), in which case we cannot use OpenMP. We call this case
“distributed parallelism” and cope by having each process take a “long
jump” (an even larger jump in the random number space), then within the
process proceed as above. This is the approach taken in our `mcstate`

package for organising running MCMC chains in parallel, each of which
works with a dust model.

The properties of the random number generator are discussed further
in `vignette("rng")`

.

A general rule-of-thumb is to avoid unneeded memory allocations in tight loops; with this sort of stochastic iteration everything is a tight loop! However, we’ve reduced the problem scope to just providing an update method, and as long as that does not issue memory allocations then the whole thing runs in fixed space without having to worry.

For nontrivial systems, we often want to record a subset of states - potentially a very small fraction of the total states computed. For example, in our sircovid model we track several thousand states (representing populations in various stages of disease transmission, in different ages, with different vaccination status etc), but most of the time we only need to report on a few tens of these in order to fit to data or to examine key outputs.

Reducing the number of state variables returned at different points in the process has several advantages:

- Saving space: if you run a model with 2000 states, 1000 replicates and record their trajectories over 100 timesteps, that represents 100 million floating point numbers, or 1.6 GB of memory or disk if using double-precision numbers. These are not unrealistic numbers, but would make even a simple sensitivity analysis all-but impossible to work with.
- Saving time: copying data around is surprisingly slow.

To enable this, you can restrict the state returned by most methods; some by default and others when you call them.

- The
`$run()`

and`$simulate()`

methods move the system forwards in time and returns the state at that point; it uses an index set into the object with`$set_index()`

. The intention here is that these would be repeatedly called and so we validate the index once and use it over and over. - The
`$state()`

method returns the model state and accepts an argument`index`

as the state to return

In both cases, if `index`

was named then the returned
state carries these names as its rownames.

The ordering of the state is important; we always have dimensions that will contain:

- the model states within a single particle
- the particles within a time-step (may be several dimensions; see
`vignette("multi")`

) - the time dimension if using
`simulate`

This is to minimise repeatedly moving around data during writing, and
to help with concatenation. Multiple particles data is stored
consecutively and read and written in order. Each time step is written
at once. And you can append states from different times easily. The
base-R `aperm()`

function will be useful for reshaping this
output to a different dimension order if you require one, but it can be
very slow.

In order to pull all of this off, we allocate all our memory up front, in C++ and pass back to R a “pointer” to this memory, which will live for as long as your model object. This means that even if your model requires GBs of memory to run, it is never copied back and forth into R (where it would be subject to R’s copy-on-write semantics but instead accessed only when needed, and written to in place following C++ reference semantics.

We try and provide verbs that are useful, given that the model presents a largely opaque pointer to model state. These are driven by our needs for running a particle filter.

Normally we have in the object several things:

- the random number state: this is effectively a matrix of integers
(either
`uint32_t`

or`uint64_t`

) - the model state: this is effectively a matrix of floating point
numbers, (typically either
`float`

or`double`

) - the model parameters: this is specific to the model in question (and these are likely shared across multiple particles). This is presented to all particles within a group as an immutable pointer, which allows safe simultaneous access by multiple threads
- the model
*internal state*: this is also specific to the model and dust has no control over this. Because each particle has its own state they can safely write to it when running parallel (compare with parameters). Typically this space is*allocated*at model construction. For models that come from odin.dust this is used as “scratch” space that we write to during an update but which does not conceptually persist between steps. Importantly, this space is assumed to be unimportant to specifying model state (i.e., we can shuffle or reset thee model state while leaving the “internal” data behind).

The internal state is the the hardest to understand in this set. Suppose that we had a model that each time step we wanted to do something like take the median value found in a set of random number draws. We might want to write the update function like

```
struct internal_type {
std::vector<real_type> samples;
};
// ...
void update(size_t time, const real_type * state, rng_state_type& rng_state,
real_type * state_next) {
for (size_t i = 0; i < shared->n; ++i) {
.samples[i] = dust::random::uniform(rng_state, 0, 1)
internal}
[0] = median(samples);
state_next}
```

with median defined as something like

```
template <typename T, typename U>
(U& v) {
T medianconst size_t m = v.size() / 2;
std::nth_element(u.begin(), u.begin() + m, u.end());
return v[m];
}
```

this takes advantage of some internal space of the correct size in internal memory. This might be configured with

```
::pars_type<model> dust_pars<model>(cpp11::list pars) {
dustusing real_type = typename model::real_type;
auto shared = std::make_shared<model::shared_type>();
->n = 10;
shared::internal_type internal{std::vector<model::real_type>(shared->n)};
modelreturn dust::pars_type<model>(shared, internal);
}
```

There is one additional subtlety about internal state: we assume that
the state entirely specifies a model in a Markov process, and so we
don’t guarantee that models with mutable internal state will not be
discarded between each iteration. Above, `samples`

is
configured in the `dust_pars`

method (so allocated there),
and is used in `update`

, but it should not be read from
within the `update`

method before it is written to, because
it might contain some other particle’s scratch space.

The reason why this is important is because if we reorder particles
what we really do is reorder the *state vector* and not this
internal state. This prevents implementing things like models with
“delays” in the current design. We may relax constraint this if it is
needed.

Given this, the sorts of verbs that we need include:

- Running the model up to a time point (
`$run`

) - runs the model’s`update`

method as many times as required to reach the new time point, returning the model state at this time point. This is useful where you might want to change the system at this time point, then continue. - Running the model and collecting history (
`$simulate`

) - as for`$run`

but also collects partial state at a number of times along the way. This always has one more dimension than`$run`

(being time) and the two functions coexist so that dimensionality is easy to program against. - Setting model state (
`$update_state`

) - leaves RNG state and parameters untouched but replaces model state for all particles. This is useful for model initialisation and for performing arbitrary model state and/or parameter changes.

In addition, we have more specific methods oriented towards particle filtering:

- Reordering the particles (
`$reorder`

) - shuffles particle state among particles within a parameter set. This is useful for implementing a resampling algorithms and updates only the state (as for`$update_state`

, leaving RNG state and internal state untouched) - Resampling particles according to some weight vector
(
`$resample`

) which implements a bootstrap sampling algorithm on top of`$reorder`

- Run a bootstrap particle filter (
`$filter`

) which is implemented using the above methods, in the case where the model provides a compare function. This is likely to be a bit low level for direct use, and is better approached via the interface in mcstate

The most esoteric design of dust is to make it convenient to use as a
target for other programs. We use the package primarily as a target for
models written in `odin`

via odin.dust. This allows
the user to write models at a very high level, describing the updates
between steps. The random walk example at the beginning of this document
might be implemented as

```
sd <- user() # user-provided standard deviation
initial(y) <- 0 # starting point of the simulation
update(y) <- runif(y, sd) # take random step each time step
```

which will compile a dust model:

```
// [[dust::class(odin)]]
// [[dust::param(sd, has_default = FALSE, default_value = NULL, rank = 0, min = -Inf, max = Inf, integer = FALSE)]]
class odin {
public:
using real_type = typename model::real_type;
using rng_state_type = dust::random::generator<real_type> rng_state_type;
using data_type = dust::no_data;
struct shared_type {
real_type initial_y;
real_type sd;
};
struct internal_type {
};
(const dust::pars_type<odin>& pars) :
odin(pars.shared), internal(pars.internal) {
shared}
size_t size() {
return 1;
}
std::vector<real_type> initial(size_t time, rng_state_type& rng_state) {
std::vector<real_type> state(1);
[0] = shared->initial_y;
statereturn state;
}
void update(size_t time, const real_type * state, rng_state_type& rng_state, real_type * state_next) {
const real_type y = state[0];
[0] = dust::random::uniform<real_type>(rng_state, y, shared->sd);
state_next}
private:
std::shared_ptr<const shared_type> shared;
internal_type internal;
};
};
// ...[some utility code excluded]
::pars_type<odin> dust_pars<odin>(cpp11::list user) {
dustusing real_type = typename odin::real_type;
auto shared = std::make_shared<odin::shared_type>();
::internal_type internal;
odin->initial_y = 0;
shared->sd = NA_REAL;
shared->sd = user_get_scalar<real_type>(user, "sd", shared->sd, NA_REAL, NA_REAL);
sharedreturn dust::pars_type<odin>(shared, internal);
}
```

We have designed these two systems to play well together so the user can write models at a very high level and generate code that then works well within this framework and efficiently run in parallel. In sircovid this is used in a model with hundreds of logical compartments each of which may be structured, but the interface at the R level remains the same as for the toy models used in the documentation here.