Create a dust model from a C++ input file

Create a dust model from a C++ input file. This function will compile the dust support around your model and return an object that can be used to work with the model (see the Details below, and dust_generator).

dust(
  filename,
  quiet = FALSE,
  workdir = NULL,
  gpu = FALSE,
  real_type = NULL,
  linking_to = NULL,
  cpp_std = NULL,
  compiler_options = NULL,
  optimisation_level = NULL,
  skip_cache = FALSE
)

Arguments

filename: The path to a single C++ file
quiet: Logical, indicating if compilation messages from pkgbuild should be displayed. Error messages will be displayed on compilation failure regardless of the value used.
workdir: Optional working directory to use. If NULL uses a temporary directory. By using a different directory of your choosing you can see the generated code.
gpu: Logical, indicating if we should generate GPU code. This requires a considerable amount of additional software installed (CUDA toolkit and drivers) as well as a CUDA-compatible GPU. If TRUE, then we call dust_cuda_options with no arguments. Alternatively, call that function and pass the value here (e.g, gpu = dust::dust_cuda_options(debug = TRUE)). Note that due to the use of the __syncwarp() primitive this may require a GPU with compute version 70 or higher.
real_type: Optionally, a string indicating a substitute type to swap in for your model's real_type declaration. If given, then we replace the string using real_type = (double|float) with the given type. This is primarily intended to be used as gpu = TRUE, real_type = "float" in order to create model for the GPU that will use 32 bit floats (rather than 64 bit doubles, which are much slower). For CPU models decreasing precision of your real type will typically just decrease precision for no additional performance.
linking_to: Optionally, a character vector of additional packages to add to the DESCRIPTION's LinkingTo field. Use this when your model pulls in C++ code that is packaged within another package's header-only library.
cpp_std: The C++ standard to use, if you need to set one explicitly. See the section "Using C++ code" in "Writing R extensions" for the details of this, and how it interacts with the R version currently being used. For R 4.0.0 and above, C++11 will be used; as dust depends on at least this version of R you will never need to specify a version this low. Sensible options are C++14, C++17, etc, depending on the features you need and what your compiler supports.
compiler_options: A character vector of additional options to pass through to the C++ compiler. These will be passed through without any shell quoting or validation, so check the generated commands and outputs carefully in case of error. Note that R will apply these before anything in your personal Makevars.
optimisation_level: A shorthand way of specifying common compiler options that control optimisation level. By default (NULL) no options are generated from this, and the optimisation level will depend on your user Makevars file. Valid options are none which disables optimisation (-O0), which will be faster to compile but much slower, standard which enables standard level of optimisation (-O2), useful if your Makevars/pkgload configuration is disabling optimisation, or max (-O3 and --fast-math) which enables some slower-to-compile and potentially unsafe optimisations. These options are applied after compiler_options and may override options provided there. Note that as for compiler_options, R will apply these before anything in your personal Makevars
skip_cache: Logical, indicating if the cache of previously compiled models should be skipped. If TRUE then your model will not be looked for in the cache, nor will it be added to the cache after compilation.

Value

A dust_generator object based on your source files

Input requirements

Your input dust model must satisfy a few requirements.

Define some class that implements your model (below model is assumed to be the class name)
That class must define a type internal_type (so model::internal_type) that contains its internal data that the model may change during execution (i.e., that is not shared between particles). If no such data is needed, you can do using internal_type = dust::no_internal; to indicate this.
We also need a type shared_type that contains constant internal data is shared between particles (e.g., dimensions, arrays that are read but not written). If no such data is needed, you can do using share_type = dust::no_shared; to indicate this.
That class must also include a type alias that describes the model's floating point type, real_type. Most models can include using real_type = double; in their public section.
The class must also include a type alias that describes the model's data type. If your model does not support data, then write using data_type = dust::no_data;, which disables the compare_data and set_data methods. Otherwise see vignette("data") for more information.
The class must have a constructor that accepts const dust::pars_type<model>& pars for your type model. This will have elements shared and internal which you can assign into your model if needed.
The model must have a method size() returning size_t which returns the size of the system. This size may depend on values in your initialisation object but is constant within a model run.
The model must have a method initial (which may not be const), taking a time step number (size_t) and returning a std::vector<real_type> of initial state for the model.
The model must have a method update taking arguments:
- size_t time: the time step number
- const double * state: the state at the beginning of the time step
- dust::rng_state_type<real_type>& rng_state: the dust random number generator state - this must be a reference, as it will be modified as random numbers are drawn
- double *state_next: the end state of the model (to be written to by your function)

Your update function is the core here and should update the state of the system - you are expected to update all values of state on return.

It is very important that none of the functions in the class use the R API in any way as these functions will be called in parallel.

You must also provide a data/parameter-wrangling function for producing an object of type dust::pars_type<model> from an R list. We use cpp11 for this. Your function will look like:

namespace dust {
template <>
dust::pars_type<model> dust_pars<model>(cpp11::list pars) {
  // ...
  return dust::pars_type<model>(shared, internal);
}
}

With the body interacting with pars to create an object of type model::shared_type and model::internal_type before returning the dust::pars_type object. This function will be called in serial and may use anything in the cpp11 API. All elements of the returned object must be standard C/C++ (e.g., STL) types and not cpp11/R types. If your model uses only shared or internal, you may use the single-argument constructor overload to dust::pars_type which is equivalent to using dust::no_shared or dust::no_internal for the missing argument.

Your model may provided a template specialisation dust::dust_info<model>() returning a cpp11::sexp for returning arbitrary information back to the R session:

namespace dust {
template <>
cpp11::sexp dust_info<model>(const dust::pars_type<sir>& pars) {
  return cpp11::wrap(...);
}
}

What you do with this is up to you. If not present then the info() method on the created object will return NULL. Potential use cases for this are to return information about variable ordering, or any processing done while accepting the pars object used to create the pars fed into the particles.

Configuring your model

You can optionally use C++ pseudo-attributes to configure the generated code. Currently we support two attributes:

[[dust::class(classname)]] will tell dust the name of your target C++ class (in this example classname). You will need to use this if your file uses more than a single class, as otherwise will try to detect this using extremely simple heuristics.
[[dust::name(modelname)]] will tell dust the name to use for the class in R code. For technical reasons this must be alphanumeric characters only (sorry, no underscore) and must not start with a number. If not included then the C++ type name will be used (either specified with [[dust::class()]] or detected).

Error handling

Your model should only throw exceptions as a last resort. One such last resort exists already if rbinom is given invalid inputs to prevent an infinite loop. If an error is thrown, all particles will complete their current run, and then the error will be rethrown - this is required by our parallel processing design. Once this happens though the state of the system is "inconsistent" as it contains particles that have run for different lengths of time. You can extract the state of the system at the point of failure (which may help with debugging) but you will be unable to continue running the object until either you reset it (with $update_state()). An error will be thrown otherwise.

Things are worse on a GPU; if an error is thrown by the RNG code (happens in rbinom when given impossible inputs such as negative sizes, probabilities less than zero or greater than 1) then we currently use CUDA's __trap() function which will require a process restart to be able to use anything that uses the GPU again, covering all methods in the class. However, this is preferable to the infinite loop that would otherwise be caused.

Examples


# dust includes a couple of very simple examples
filename <- system.file("examples/walk.cpp", package = "dust")

# This model implements a random walk with a parameter coming from
# R representing the standard deviation of the walk
writeLines(readLines(filename))
#> class walk {
#> public:
#>   using real_type = double;
#>   using data_type = dust::no_data;
#>   using internal_type = dust::no_internal;
#>   using rng_state_type = dust::random::generator<real_type>;
#> 
#>   struct shared_type {
#>     real_type sd;
#>     bool random_initial;
#>   };
#> 
#>   walk(const dust::pars_type<walk>& pars) : shared(pars.shared) {
#>   }
#> 
#>   size_t size() const {
#>     return 1;
#>   }
#> 
#>   std::vector<real_type> initial(size_t time, rng_state_type& rng_state) {
#>     std::vector<real_type> ret = {0};
#>     if (shared->random_initial) {
#>       ret[0] = dust::random::normal<real_type>(rng_state, 0, shared->sd);
#>     }
#>     return ret;
#>   }
#> 
#>   void update(size_t time, const real_type * state, rng_state_type& rng_state,
#>               real_type * state_next) {
#>     state_next[0] = state[0] +
#>       dust::random::normal<real_type>(rng_state, 0, shared->sd);
#>   }
#> 
#> private:
#>   dust::shared_ptr<walk> shared;
#> };
#> 
#> namespace dust {
#> 
#> template <>
#> dust::pars_type<walk> dust_pars<walk>(cpp11::list pars) {
#>   walk::real_type sd = cpp11::as_cpp<walk::real_type>(pars["sd"]);
#>   const bool random_initial = pars["random_initial"] == R_NilValue ? false :
#>     cpp11::as_cpp<bool>(pars["random_initial"]);
#>   return dust::pars_type<walk>(walk::shared_type{sd, random_initial});
#> }
#> 
#> }

# The model can be compiled and loaded with dust::dust(filename)
# but it's faster in this example to use the prebuilt version in
# the package
model <- dust::dust_example("walk")

# Print the object and you can see the methods that it provides
model
#> <dust> object generator
#>   Public:
#>     initialize: function (pars, time, n_particles, n_threads = 1L, seed = NULL, 
#>     name: function () 
#>     param: function () 
#>     run: function (time_end) 
#>     simulate: function (time_end) 
#>     run_adjoint: function () 
#>     set_index: function (index) 
#>     index: function () 
#>     ode_control: function () 
#>     ode_statistics: function () 
#>     n_threads: function () 
#>     n_state: function () 
#>     n_particles: function () 
#>     n_particles_each: function () 
#>     shape: function () 
#>     update_state: function (pars = NULL, state = NULL, time = NULL, set_initial_state = NULL, 
#>     state: function (index = NULL) 
#>     time: function () 
#>     set_stochastic_schedule: function (time) 
#>     reorder: function (index) 
#>     resample: function (weights) 
#>     info: function () 
#>     pars: function () 
#>     rng_state: function (first_only = FALSE, last_only = FALSE) 
#>     set_rng_state: function (rng_state) 
#>     has_openmp: function () 
#>     has_gpu_support: function (fake_gpu = FALSE) 
#>     has_compare: function () 
#>     real_size: function () 
#>     time_type: function () 
#>     rng_algorithm: function () 
#>     uses_gpu: function (fake_gpu = FALSE) 
#>     n_pars: function () 
#>     set_n_threads: function (n_threads) 
#>     set_data: function (data, shared = FALSE) 
#>     compare_data: function () 
#>     filter: function (time_end = NULL, save_trajectories = FALSE, time_snapshot = NULL, 
#>     gpu_info: function () 
#>   Private:
#>     pars_: NULL
#>     pars_multi_: NULL
#>     index_: NULL
#>     info_: NULL
#>     n_threads_: NULL
#>     n_particles_: NULL
#>     n_particles_each_: NULL
#>     shape_: NULL
#>     ptr_: NULL
#>     gpu_config_: NULL
#>     ode_control_: NULL
#>     methods_: NULL
#>     param_: NULL
#>     reload_: NULL
#>   Parent env: <environment: namespace:dust>
#>   Locked objects: TRUE
#>   Locked class: FALSE
#>   Portable: TRUE

# Create a model with standard deviation of 1, initial time step zero
# and 30 particles
obj <- model$new(list(sd = 1), 0, 30)
obj
#> <dust>
#>   Public:
#>     compare_data: function () 
#>     filter: function (time_end = NULL, save_trajectories = FALSE, time_snapshot = NULL, 
#>     gpu_info: function () 
#>     has_compare: function () 
#>     has_gpu_support: function (fake_gpu = FALSE) 
#>     has_openmp: function () 
#>     index: function () 
#>     info: function () 
#>     initialize: function (pars, time, n_particles, n_threads = 1L, seed = NULL, 
#>     n_pars: function () 
#>     n_particles: function () 
#>     n_particles_each: function () 
#>     n_state: function () 
#>     n_threads: function () 
#>     name: function () 
#>     ode_control: function () 
#>     ode_statistics: function () 
#>     param: function () 
#>     pars: function () 
#>     real_size: function () 
#>     reorder: function (index) 
#>     resample: function (weights) 
#>     rng_algorithm: function () 
#>     rng_state: function (first_only = FALSE, last_only = FALSE) 
#>     run: function (time_end) 
#>     run_adjoint: function () 
#>     set_data: function (data, shared = FALSE) 
#>     set_index: function (index) 
#>     set_n_threads: function (n_threads) 
#>     set_rng_state: function (rng_state) 
#>     set_stochastic_schedule: function (time) 
#>     shape: function () 
#>     simulate: function (time_end) 
#>     state: function (index = NULL) 
#>     time: function () 
#>     time_type: function () 
#>     update_state: function (pars = NULL, state = NULL, time = NULL, set_initial_state = NULL, 
#>     uses_gpu: function (fake_gpu = FALSE) 
#>   Private:
#>     gpu_config_: NULL
#>     index_: NULL
#>     info_: NULL
#>     methods_: list
#>     n_particles_: 30
#>     n_particles_each_: 30
#>     n_threads_: 1
#>     ode_control_: NULL
#>     param_: NULL
#>     pars_: list
#>     pars_multi_: FALSE
#>     ptr_: externalptr
#>     reload_: NULL
#>     shape_: 30

# Curent state is all zero
obj$state()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
#>      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
#> [1,]     0     0     0     0     0     0     0     0     0     0     0     0
#>      [,27] [,28] [,29] [,30]
#> [1,]     0     0     0     0

# Current time is also zero
obj$time()
#> [1] 0

# Run the model up to time step 100
obj$run(100)
#>          [,1]     [,2]    [,3]       [,4]     [,5]     [,6]      [,7]      [,8]
#> [1,] 1.562305 7.288367 5.84155 0.08509074 -7.07042 10.60426 -2.057119 -9.614804
#>           [,9]   [,10]    [,11]    [,12]     [,13]     [,14]    [,15]    [,16]
#> [1,] -8.294164 9.07455 19.80548 1.293768 -13.24191 -1.231157 8.591563 5.342956
#>          [,17]     [,18]    [,19]   [,20]     [,21]     [,22]     [,23]
#> [1,] -9.857717 -8.946853 18.32985 7.39139 -1.159903 -15.64287 -15.16489
#>         [,24]     [,25]    [,26]    [,27]     [,28]      [,29]      [,30]
#> [1,] 3.454845 -2.186743 7.231447 14.71851 -6.099696 -0.4988606 -0.6123435

# Reorder/resample the particles:
obj$reorder(sample(30, replace = TRUE))

# See the state again
obj$state()
#>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]       [,7]     [,8]
#> [1,] -15.16489 1.293768 -7.07042 10.60426 8.591563 8.591563 0.08509074 10.60426
#>           [,9]    [,10]    [,11]    [,12]    [,13]     [,14]    [,15]
#> [1,] -8.294164 -7.07042 -7.07042 3.454845 8.591563 -13.24191 7.288367
#>           [,16]     [,17]    [,18]      [,19]    [,20]    [,21]     [,22]
#> [1,] -0.6123435 -15.16489 19.80548 -0.6123435 19.80548 -7.07042 -1.159903
#>         [,23]      [,24]    [,25]    [,26]      [,27]    [,28]    [,29]   [,30]
#> [1,] 1.293768 -0.4988606 7.288367 10.60426 0.08509074 5.342956 7.288367 5.84155