This vignette covers how to write dust2
systems from
scratch. Users of this package via odin2
do not
need to read or understand this vignette, though of course you are
welcome to read it. We will start with an almost comically trivial
system and try to expand it over several sections.
First steps, a random walk
Consider a unbiased random walk; at each time step we move our position with a draw from a normal distribution with mean 0 and some standard deviation.
To implement this model in dust we write out a C++ file that looks like:
#include <dust2/common.hpp>
// [[dust2::class(walk)]]
// [[dust2::time_type(discrete)]]
// [[dust2::parameter(sd)]]
class walk {
public:
walk() = delete;
using real_type = double;
struct shared_state {
real_type sd;
};
struct internal_state {};
using rng_state_type = monty::random::generator<real_type>;
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"x", {}}};
}
static shared_state build_shared(cpp11::list pars) {
const auto sd = dust2::r::read_real(pars, "sd");
return shared_state{sd};
}
static void update_shared(cpp11::list pars, shared_state& shared) {
shared.sd = dust2::r::read_real(pars, "sd", shared.sd);
}
static void initial(real_type time,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = 0;
}
static void update(real_type time,
real_type dt,
const real_type * state,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = monty::random::normal(rng_state, state[0], shared.sd * dt);
}
};
All the code sits inside a class, though in this class every method
is static
(we have deleted the constructor by writing
walk() = delete;
to emphasise this). All dust class
definitions have the same components, but the order is not terribly
important:
- a series of annotations (the comments starting with
// [[dust2::
) - some type declarations and
struct
definitions - a series of
static
methods. For these, the types of the arguments must match the interface described here
We consider each of these pieces in turn here, and consider the other values that might have been used in place.
Support code
We include a header from dust2
; the path for the
compiler to find this will be arranged by R using its
LinkingTo
system. This include must be present.
Annotations
The first annotation tells dust2
the name of the class
that contains the system definition. This may seem obvious but computers
are actually not very clever and it’s best to be blunt with them.
The second annotation tells dust2
that this is a
discrete-time (as opposed to continuous time) system:
If we had wanted a continuous time system of ODEs we would have instead written:
and then later some of the rest of the code would be different; we will discuss this later.
The remaining annotations tell dust2
about the
parameters. We use these to help communicate with the users of the
system the inputs that generator accepts. These are really
documentation rather than controlling how parameters are
processed in the build_shared
and
update_shared
methods described below, and it is up to you
to keep them in sync. If you prefer, these could go directly above the
code that handles each parameter. This information is surfaced to R via
the print
and coef
methods of the generator
and system objects.
The other annotations that we did not use here are
-
// [[dust2::name(<value>)]]
- specify a different name in R for your system to the class name -
// [[dust2::default_dt(<value>)]]
- specify a default value fordt
-
// [[dust2::has_compare()]]
- required to support comparison to data -
// [[dust2::has_adjoint()]]
- required to support likelihood gradients
We will describe these later.
Type definitions
The first type we need is one for the real type (we might change this, depending on how rewriting the currently-absent GPU interface goes). But for now at least you must include this line and it’s best if it is exactly as is written here.
If you have come from a C background, this using
statement is a lot like writing typedef double real_type;
except that the arguments go the other way around and it contains an
equals sign.
Second, we have the shared state:
Values in shared_state
are quantities, often parameters,
that do not vary after a system is initialised (or after running
dust_system_update_pars()
) and which are shared across all
particles in a group. They are read-only while the system is running
(indeed through all the non-parameter methods below). You can include
derived quantities here too. In this model we have one element,
sd
, which will represent the standard deviation of the
random walk over a time unit.
Third, we have an empty internal state:
Internal state is particle-specific state that can be used as scratch space that a particle can read and write to as it evaluates but which is not tied to a specific particle.
Finally, we have the random number state:
As with real_type
above, this was designed for
flexibility in dust version 1 that we may not need here, so it may
change in future. However, we do need a convenient name for this type,
so it is best left as is.
The methods
First up, we describe how state is packed. Here we have a single
compartment x
, which is a scalar. This is described in the
packing as:
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"x", {}}};
}
The {}
here indicates x
is a scalar. If it
had been a vector the length would be within the braces (e.g.,
{3}
or {shared.len}
) and if it had been a
matrix there would be two elements. We’ll describe this more fully
later.
Next, we have build_shared
, which takes an R list
(cpp11::list
) and converts it into our
shared_state
type. The implementation here can be whatever
you fancy.
static shared_state build_shared(cpp11::list pars) {
const auto sd = dust2::r::read_real(pars, "sd");
return shared_state{sd};
}
Here, we have used a dust convenience function
dust2::r::read_real
to convert the R element
sd
from pars
into a real and then construct
the struct
from that. We might have written this manually
as something like:
static shared_state build_shared(cpp11::list pars) {
real_type sd = cpp11::as_cpp<real_type>(pars["sd"]);
return shared_state{sd};
}
but we find that the cpp11
conversion functions produce
error messages that are hard to action; if our list pars
does not have an element sd
, the name
sd
does not occur within the error message so the user does
not really know what has gone wrong.
The first sort of interesting method is initial
, which
computes initial conditions for our system. Though in this case it’s
trivial so it’s not terribly interesting:
static void initial(real_type time,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = 0;
}
This function takes as argument:
-
time
, which is the time at which the system is being initialised. Our initial conditions don’t depend on time so we will ignore this, but some systems will find it useful. -
shared
, a read only (const
) reference to our shared data (in this system astruct
containingsd
). We will ignore this as well. -
internal
, a mutable reference to our internal state, which is an emptystruct
. We ignore this as well. -
rng_state
, the state of the random number generator for this particle; we would use this to generate random numbers if we did so, which we don’t, so you will be unsurprised to read that we ignore this one too. -
state_next
, the system state that we are initialising. This one we will write to, and it will contain on exit the initial conditions.
Note that state_next
is a raw pointer, which is a bit
gross if you are used to modern C++, but here we are writing into the
middle of a big state vector that includes all our particles. It is your
responsibility not to write past the end of this (here we can write
exactly one number but in complex systems the state for each particle
could be quite long). If we used C++20 we might have used ranges support to do
this more nicely.
The actual implementation is trivial; we zero the initial condition!
And finally we have the update
method, which will
usually be the bulk of a system implementation:
static void update(real_type time,
real_type dt,
const real_type * state,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = monty::random::normal(rng_state, state[0], shared.sd * dt);
}
As with initial
there are quite a few arguments:
-
time
, as forinitial
the time at the start of the step -
dt
, the size of the time step (so we will move totime + dt
) -
state
, a read-only (const
) pointer to the state at the start of the step -
shared
,internal
andrng_state
, as forinitial
-
state_next
, a mutable pointer to the state at the end of the step.
The general pattern is to read variables from state
, do
some calculations and write to state_next
, which is what we
do here. We use the monty
random library to sample from a
normal distribution, referencing the previous state
state[0]
, using the standard deviation of the process from
shared_state
(shared.sd
) scaled by
dt
and using the random number state
rng_state
, writing this back into
state_next
.
Multiple variables, the SIR revisited
Slightly more interestingly, consider the SIR model used elsewhere in
the docs (dust_example("sir")
and elsewhere). Here, we’ll
show a minimal implementation of this, which shows off a few more
features:
#include <dust2/common.hpp>
// [[dust2::class(sir)]]
// [[dust2::time_type(discrete)]]
// [[dust2::has_compare()]]
// [[dust2::parameter(I0, constant = FALSE)]]
// [[dust2::parameter(N, constant = TRUE)]]
// [[dust2::parameter(beta, constant = FALSE)]]
// [[dust2::parameter(gamma, constant = FALSE)]]
// [[dust2::parameter(exp_noise, constant = TRUE)]]
class sir {
public:
sir() = delete;
using real_type = double;
struct shared_state {
real_type N;
real_type I0;
real_type beta;
real_type gamma;
real_type exp_noise;
};
struct internal_state {};
struct data_type {
real_type incidence;
};
using rng_state_type = monty::random::generator<real_type>;
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"S", {}}, {"I", {}}, {"R", {}}, {"cases_inc", {}}};
}
static shared_state build_shared(cpp11::list pars) {
const real_type I0 = dust2::r::read_real(pars, "I0", 10);
const real_type N = dust2::r::read_real(pars, "N", 1000);
const real_type beta = dust2::r::read_real(pars, "beta", 0.2);
const real_type gamma = dust2::r::read_real(pars, "gamma", 0.1);
const real_type exp_noise = dust2::r::read_real(pars, "exp_noise", 1e6);
return shared_state{N, I0, beta, gamma, exp_noise};
}
static void update_shared(cpp11::list pars, shared_state& shared) {
shared.I0 = dust2::r::read_real(pars, "I0", shared.I0);
shared.beta = dust2::r::read_real(pars, "beta", shared.beta);
shared.gamma = dust2::r::read_real(pars, "gamma", shared.gamma);
}
static data_type build_data(cpp11::list r_data, const shared_state& shared) {
auto data = static_cast<cpp11::list>(r_data);
auto incidence = dust2::r::read_real(data, "incidence", NA_REAL);
return data_type{incidence};
}
static void initial(real_type time,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = shared.N - shared.I0;
state_next[1] = shared.I0;
state_next[2] = 0;
state_next[3] = 0;
}
static void update(real_type time,
real_type dt,
const real_type * state,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
const auto S = state[0];
const auto I = state[1];
const auto R = state[2];
const auto cases_inc = state[3];
const auto p_SI = 1 - monty::math::exp(-shared.beta * I / shared.N * dt);
const auto p_IR = 1 - monty::math::exp(-shared.gamma * dt);
const auto n_SI = monty::random::binomial<real_type>(rng_state, S, p_SI);
const auto n_IR = monty::random::binomial<real_type>(rng_state, I, p_IR);
state_next[0] = S - n_SI;
state_next[1] = I + n_SI - n_IR;
state_next[2] = R + n_IR;
state_next[3] = cases_inc + n_SI;
}
static auto zero_every(const shared_state& shared) {
return dust2::zero_every_type<real_type>{{1, {3}}};
}
static real_type compare_data(const real_type time,
const real_type * state,
const data_type& data,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state) {
const auto incidence_observed = data.incidence;
if (std::isnan(data.incidence)) {
return 0;
}
const auto noise =
monty::random::exponential_rate(rng_state, shared.exp_noise);
const auto incidence_modelled = state[3];
const auto lambda = incidence_modelled + noise;
return monty::density::poisson(incidence_observed, lambda, true);
}
};
This shows basically the same pattern as the random walk model above, and here we highlight the additions and differences.
Our packing_state
method looks more interesting and
potentially more useful now:
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"S", {}}, {"I", {}}, {"R", {}}, {"cases_inc", {}}};
}
Here, we have four variables; the S
, I
and
R
compartments and cases_inc
which holds the
daily incidence (the number of new cases per day). As with the
random walk model, these are still scalars (the second element of each
pair is still {}
) but there are four elements now and the
order is determined by the order presented here.
The build_shared
and update_shared
have not
really changed in nature, but note that we’ve chosen only to support
updating three of the parameters within update_shared
(I0
, gamma
and beta
). The
parameters N
and exp_noise
can only be set at
initialisation.
We’ve added a new type data_type
and a
build_data
method which creates it. The
data_type
struct
can hold arbitrary data, and
is used to represent an observation of one or more data streams at a
single point in time. When running within a particle filter (see
dust_filter_create()
) we will have data.frame
representing a time series of these observations.
static data_type build_data(cpp11::list r_data, const shared_state& shared) {
auto data = static_cast<cpp11::list>(r_data);
auto incidence = dust2::r::read_real(data, "incidence", NA_REAL);
return data_type{incidence};
}
Our initial condition function now actually sets an initial condition!
static void initial(real_type time,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = shared.N - shared.I0;
state_next[1] = shared.I0;
state_next[2] = 0;
state_next[3] = 0;
}
The first element (S
; see packing_state
) is
set to N - I0
and the second element (I
) is
set to I0
with the remaining two states zeroed.
The update
method is similar in nature to the random
walk model, just a bit longer:
static void update(real_type time,
real_type dt,
const real_type * state,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
const auto S = state[0];
const auto I = state[1];
const auto R = state[2];
const auto cases_inc = state[3];
const auto p_SI = 1 - monty::math::exp(-shared.beta * I / shared.N * dt);
const auto p_IR = 1 - monty::math::exp(-shared.gamma * dt);
const auto n_SI = monty::random::binomial<real_type>(rng_state, S, p_SI);
const auto n_IR = monty::random::binomial<real_type>(rng_state, I, p_IR);
state_next[0] = S - n_SI;
state_next[1] = I + n_SI - n_IR;
state_next[2] = R + n_IR;
state_next[3] = cases_inc + n_SI;
}
The update for cases_inc
deserves some extra
attention:
You might think that this will simply compute cumulative
cases, and you would be right if it were not for the
zero_every
method:
static auto zero_every(const shared_state& shared) {
return dust2::zero_every_type<real_type>{{1, {3}}};
}
This declares that every 1
time unit (the first element
of the inner {}
) we will zero the data at element(s)
{3}
. This latter argument might be a vector, so saying
{7, {3, 4, 5}}
would read as “every 7 time units, zero
elements 3 to 5”. The outer {}
allows us to have multiple
different zeroed variables, so we could write
dust2::zero_every_type<real_type>{{1, {3}, {7, {4}}}
if we added another variable for weekly incidence.
Finally, we have compare_data
:
static real_type compare_data(const real_type time,
const real_type * state,
const data_type& data,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state) {
const auto incidence_observed = data.incidence;
if (std::isnan(data.incidence)) {
return 0;
}
const auto noise =
monty::random::exponential_rate(rng_state, shared.exp_noise);
const auto incidence_modelled = state[3];
const auto lambda = incidence_modelled + noise;
return monty::density::poisson(incidence_observed, lambda, true);
}
This has the same long sort of call signature as initial
and update
but with read-only
state
and read-only data. Here, we return
a single real number, being the log-likelihood at this point. In this
example we add some small random noise to the modelled data (to avoid
NaN
s where our modelled cases are zero) and compute the log
density from a Poisson distribution for observing the cases in
data
.
Continuous time (ODE) models
The interface looks slightly different for ODE models, because we are
not updating the system from one time to the next but computing the
rate of change at a given time and an ODE
solver is updating the values of the system. So rather than an
update()
method, we will write an rhs()
method. The implementation here follows as that of the sir model above,
and very little has changed:
#include <dust2/common.hpp>
// [[dust2::class(sir)]]
// [[dust2::time_type(discrete)]]
// [[dust2::has_compare()]]
// [[dust2::parameter(I0, constant = FALSE)]]
// [[dust2::parameter(N, constant = TRUE)]]
// [[dust2::parameter(beta, constant = FALSE)]]
// [[dust2::parameter(gamma, constant = FALSE)]]
// [[dust2::parameter(exp_noise, constant = TRUE)]]
class sir {
public:
sir() = delete;
using real_type = double;
struct shared_state {
real_type N;
real_type I0;
real_type beta;
real_type gamma;
real_type exp_noise;
};
struct internal_state {};
struct data_type {
real_type incidence;
};
using rng_state_type = monty::random::generator<real_type>;
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"S", {}}, {"I", {}}, {"R", {}}, {"cases_inc", {}}};
}
static shared_state build_shared(cpp11::list pars) {
const real_type I0 = dust2::r::read_real(pars, "I0", 10);
const real_type N = dust2::r::read_real(pars, "N", 1000);
const real_type beta = dust2::r::read_real(pars, "beta", 0.2);
const real_type gamma = dust2::r::read_real(pars, "gamma", 0.1);
const real_type exp_noise = dust2::r::read_real(pars, "exp_noise", 1e6);
return shared_state{N, I0, beta, gamma, exp_noise};
}
static void update_shared(cpp11::list pars, shared_state& shared) {
shared.I0 = dust2::r::read_real(pars, "I0", shared.I0);
shared.beta = dust2::r::read_real(pars, "beta", shared.beta);
shared.gamma = dust2::r::read_real(pars, "gamma", shared.gamma);
}
static data_type build_data(cpp11::list r_data, const shared_state& shared) {
auto data = static_cast<cpp11::list>(r_data);
auto incidence = dust2::r::read_real(data, "incidence", NA_REAL);
return data_type{incidence};
}
static void initial(real_type time,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
state_next[0] = shared.N - shared.I0;
state_next[1] = shared.I0;
state_next[2] = 0;
state_next[3] = 0;
}
static void update(real_type time,
real_type dt,
const real_type * state,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state,
real_type * state_next) {
const auto S = state[0];
const auto I = state[1];
const auto R = state[2];
const auto cases_inc = state[3];
const auto p_SI = 1 - monty::math::exp(-shared.beta * I / shared.N * dt);
const auto p_IR = 1 - monty::math::exp(-shared.gamma * dt);
const auto n_SI = monty::random::binomial<real_type>(rng_state, S, p_SI);
const auto n_IR = monty::random::binomial<real_type>(rng_state, I, p_IR);
state_next[0] = S - n_SI;
state_next[1] = I + n_SI - n_IR;
state_next[2] = R + n_IR;
state_next[3] = cases_inc + n_SI;
}
static auto zero_every(const shared_state& shared) {
return dust2::zero_every_type<real_type>{{1, {3}}};
}
static real_type compare_data(const real_type time,
const real_type * state,
const data_type& data,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state) {
const auto incidence_observed = data.incidence;
if (std::isnan(data.incidence)) {
return 0;
}
const auto noise =
monty::random::exponential_rate(rng_state, shared.exp_noise);
const auto incidence_modelled = state[3];
const auto lambda = incidence_modelled + noise;
return monty::density::poisson(incidence_observed, lambda, true);
}
};
We have made changes to the annotations, replaced update
with rhs
and updated compare_data
, otherwise
everything remains the same (except for a minor change dropping the
exp_noise
parameter which was not needed here).
The annotations have replaced
with
The rhs
method follows the sir
model’s
update one closely, but returns rates into state_deriv
rather than updating state_next
. Note that this function
does not accept an rng_state
argument.
The compare_data
method has changed slightly as we no
longer add random noise to the modelled data:
static real_type compare_data(const real_type time,
const real_type * state,
const data_type& data,
const shared_state& shared,
internal_state& internal,
rng_state_type& rng_state) {
const auto incidence_observed = data.incidence;
if (std::isnan(data.incidence)) {
return 0;
}
const auto noise =
monty::random::exponential_rate(rng_state, shared.exp_noise);
const auto incidence_modelled = state[3];
const auto lambda = incidence_modelled + noise;
return monty::density::poisson(incidence_observed, lambda, true);
}
Continuous time models with additional output
Sometimes it will be convenient to have models where you have
variables that are not updated as ODEs, but with some variables being a
compound expression of your variables. You can always do this in post
processing, but this can save pulling a lot of data back into R or allow
reusing values that you have computed in your shared
object.
To do this, list the variables as normal in
packing_state
but make sure that they are the final
variables. Suppose that in the sirode
model above, we
wanted to track N
, the total population size (ignore that
this is a parameter and assume perhaps that we have an open system or
that we are aggregating over age groups). We could then write:
static dust2::packing packing_state(const shared_state& shared) {
return dust2::packing{{"S", {}}, {"I", {}}, {"R", {}}, {"cases_inc", {}}, {"N", {}}};
}
to add the additional “variable” into our state. We then write
to indicate that the last 1 entry in packing_state
is
not an ODE variable. Finally we could define
static void output(real_type time,
real_type * state,
const shared_state& shared,
internal_state& internal) {
state[4] = state[0] + state[1] + state[2];
}
which would set the 5th element of state
to the sum over
the first three (N = S + I + R
). Note that this takes
state
as a read-write variable and not a constant
variable.