Developing leapfrog • frogger

This vignette details how to make changes and add extensions to leapfrog. There are some organisation and structural things about leapfrog which were added to enable different researchers to make extensions to the model. In a way that we can turn these on or off at run time to run different variants of the model.

The overall aim of this is to allow researchers to write these model extensions with as little overhead as possible. If you find something annoying or difficult, let me know and we can probably try and simplify it. Or at least document it better.

Structure

Model variants

Leapfrog has a set of ModelVariants which can be run. See inst/include/model_variants.hpp for details of these. Each model variant is a collection of boolean switches or enums. These are used to turn on or off different parts of the code when it is run, so that leapfrog can have different extensions and we can compose these together in any way we want. These model variants are evaluated at compile time. When you compile the code, there will be an instance in the compiled binary for each variant your code will call. This will cause the binary to be larger but we chose to do it this way because we wanted the speed from the compile-time plymorphism. We still need to benchmark this to see how much of a gain it actually gives.

All model functions are templated on the model variant, and any conditional behaviour based upon the model variant should be written as an if constexpr.

Model functions

All model functions should have the same signature e.g.

template<typename ModelVariant, typename real_type>
void my_model_function(int time_step,
                       const Parameters<ModelVariant, real_type> &pars,
                       const State<ModelVariant, real_type> &state_curr,
                       State<ModelVariant, real_type> &state_next,
                       IntermediateData<ModelVariant, real_type> &intermediate) {

To explain what is going on here

Templated on ModelVariant - this is used for any conditional behaviour based upon the variant being run see Model variants for details.
Templated on real_type - this is used by TMB during fitting. Under a normal fit this will be a double.
time_step - the current time step of the model as an index, e.g. if running for 1970:2030, this will start at 1 and loop to 61. Any input data you read based on time step index should have 1970 at the first index. Index 1 in R and 0 in C++.
pars - the parameters for this model, these are read-only values
state_curr - the state at the current point in time i.e. at t = time_step - 1. This is read-only.
state_next - the state at the next time point, i.e. at t = time_step. This is what we are currently populating from the previous time step and the parameters.
intermediate - a palce for storing any data used within a single time step. Use this as a place to store intermediate values for use later in your code. This is reset to all 0s at the end of every time step.

Note that after each time step the code will do the following

Optionally save out the state see State saving section.
Replace state_curr with state_next.
Set new state_next to all 0s.
Set intermediate to all 0s.

State saving

Leapfrog runs a top-level loop over the time step. At the end of each time step, the state is optionally saved out and eventually returned. We do it this way because it decouples the reporting of the model and each time step iteration. When you run the model with run_model you can specify which years you want to output data for. By default it will output for all time steps, but if say you are only interested in the last time step. You can return this by running e.g.

run_model(data, parameters, 1970:2030, 10, 2030)

This time output is managed by the state saver see inst/include/state_saver.hpp

Adding new input, output or intermediate data

Leapfrog uses code generation to write the code for wiring up the input and output data. This is to reduce the number of locations you need to make changes when you add new input data or return new data from the model. In short it amounts to:

Add new input data - modify inst/cpp_generation/model_input.csv and run scripts/generate
Add new model output - modify inst/cpp_generation/model_output.csv and run scripts/generate

Input data

Input data is generated from csv file in inst/include/model_input.hpp, the columns in this file are

r_name - The name of the data in R, from either data or parameters struct passed to run_model
cpp_name - The name of the data in C++, the label you will use to refer to the data within the leapfrog model. Use a name which adheres to the naming convention rules in the README.
type - The type of data, real_type or int. real_type is a templated type. Under normal usage it will be a double but is used by TMB for model fitting.
input_when - An expression which evaluates to a boolean to control when this input data is used. For example ModelVariant::run_child_model to include only in the child model. If blank, this is always input.
value - If you want to initialise this data as a constant value array, enter the value here. Note that if using this column then no value should be in the r_name column.
convert_base - Set to TRUE for any array of indexes, these will be converted from 1-based numbering (for R) to 0-based numbering (for C++).
dims - The number of dimensions for this input (max 4)
dim1 - The size of the first dimension
dim2 - The size of the second dimension, can by blank
dim3 - The size of the third dimension, can be blank
dim4 - The size of the fourth dimension, can be blank

Note that the dims can be any of the state space sizes or proj_years for the number of years in the projection. See the README for valid state space sizes.

After making changes to the model_input.csv, run the generate script scripts/generate. This update file inst/include/model_input.hpp. Note that this file should never be changed manually as the generate script will completely rewrite it.

After regenerating the code, rebuild the project and you are ready to use the new input data in C++.

Model outputs

Model outputs are generated in a similar way to inputs. To modify an existing or add new output update the file [inst/cpp_generation/model_output.csv](https://github.com/mrc-ide/frogger/blob/main/inst/cpp_generation/model_. The columns in this file are:

r_name - The name of the output in the list output from run_model
cpp_name - The name of the output as it is referred to in C++
r_type - The R type mapping function to use to convert this to an R-type number e.g. REAL to create a real typed number
output_when - Conditional expression for when to output this from the model. For example ModelVariant::run_child_model to return only from the child model. If blank, this is always returned.
dims - The number of dimensions of this output (max 5)
dim1 - The size of the first dimension
dim2 - The size of the second dimension
dim3 - The size of the third dimension
dim4 - The size of the fourth dimension
dim5 - The size of the fifth dimension

Note that the dims can be any of the state space sizes or proj_years for the number of years in the projection. See the README for valid state space sizes.

Intermediate data

Intermediate data is used as a place to store The intermediate data is not handled by code generation as it only requires changes in one place. You can make modifications by adding new data into the struct in inst/include/types.hpp, making sure to set its default value in the reset function on the struct.

Adding a new model variant

To add a new variant, requires work over a few areas. There are two types of model variants at the moment: 1. A flag which turns a section of code on or off, this will come with additional model inputs and outputs 1. A flag which changes the dimensions of some model input or outputs

The required changes will be different depending on if your new variant is one of the first types, which brings additional input/output or the second type which does not bring additional data.

To add a new variant. Firstly, in the code generation:

Add any new input or output data in inst/include/model_input.csv or inst/include/model_output.csv
If you have new input data, add a specialism for the new variant into the state_types template in inst/cpp_generation/state_types.hpp.in
Run the generate script

In the C++ code:

Add your new variant in inst/include/model_variants.hpp
Add a struct for the new state space in inst/include/state_space.hpp
If your model has new input data, add parameters for this variant in inst/include/parameters.hpp. Parameter types will be generated, you need to add those types to the appropriate template specialisation
If your model has new output data, add a new specialisation in inst/include/state_saver.hpp

You now need to update the R wrapper to be able to call this new variant:

In R/run_model.R add a new flag to the run_model interface for using your new variant
Update the conditionals to build the model_variant string based upon your new flag
Add mapping from the string to the model variant struct in Rcpp wrapper code in src/frogger.cpp. Note use a constexpr for the state space to ensure it is evaluated at compile time.