This vignette details how to make changes and add extensions to leapfrog. There are some organisation and structural things about leapfrog which were added to enable different researchers to make extensions to the model. In a way that we can turn these on or off at run time to run different variants of the model.
The overall aim of this is to allow researchers to write these model extensions with as little overhead as possible. If you find something annoying or difficult, let me know and we can probably try and simplify it. Or at least document it better.
Leapfrog has a set of ModelVariant
s which can be run.
See inst/include/model_variants.hpp
for details of these. Each model variant is a collection of boolean
switches or enums. These are used to turn on or off different parts of
the code when it is run, so that leapfrog can have different extensions
and we can compose these together in any way we want. These model
variants are evaluated at compile time. When you compile the code, there
will be an instance in the compiled binary for each variant your code
will call. This will cause the binary to be larger but we chose to do it
this way because we wanted the speed from the compile-time plymorphism.
We still need to benchmark this to see how much of a gain it actually
gives.
All model functions are templated on the model variant, and any
conditional behaviour based upon the model variant should be written as
an if constexpr
.
All model functions should have the same signature e.g.
template<typename ModelVariant, typename real_type>
void my_model_function(int time_step,
const Parameters<ModelVariant, real_type> &pars,
const State<ModelVariant, real_type> &state_curr,
State<ModelVariant, real_type> &state_next,
IntermediateData<ModelVariant, real_type> &intermediate) {
To explain what is going on here
ModelVariant
- this is used for any
conditional behaviour based upon the variant being run see Model variants for details.real_type
- this is used by TMB during
fitting. Under a normal fit this will be a double
.time_step
- the current time step of the model as an
index, e.g. if running for 1970:2030, this will start at 1
and loop to 61
. Any input data you read based on time step
index should have 1970
at the first index. Index
1
in R and 0
in C++.pars
- the parameters for this model, these are
read-only valuesstate_curr
- the state at the current point in time
i.e. at t = time_step - 1
. This is read-only.state_next
- the state at the next time point, i.e. at
t = time_step
. This is what we are currently populating
from the previous time step and the parameters.intermediate
- a palce for storing any data used within
a single time step. Use this as a place to store intermediate values for
use later in your code. This is reset to all 0s at the end of every time
step.Note that after each time step the code will do the following
state_curr
with state_next
.state_next
to all 0s.intermediate
to all 0s.Leapfrog runs a top-level loop over the time step. At the end of each
time step, the state is optionally saved out and eventually returned. We
do it this way because it decouples the reporting of the model and each
time step iteration. When you run the model with run_model
you can specify which years you want to output data for. By default it
will output for all time steps, but if say you are only interested in
the last time step. You can return this by running e.g.
run_model(data, parameters, 1970:2030, 10, 2030)
This time output is managed by the state saver see inst/include/state_saver.hpp
Leapfrog uses code generation to write the code for wiring up the input and output data. This is to reduce the number of locations you need to make changes when you add new input data or return new data from the model. In short it amounts to:
inst/cpp_generation/model_input.csv
and run scripts/generate
inst/cpp_generation/model_output.csv
and run scripts/generate
Input data is generated from csv file in inst/include/model_input.hpp
,
the columns in this file are
r_name
- The name of the data in R, from either
data
or parameters
struct passed to
run_model
cpp_name
- The name of the data in C++, the label you
will use to refer to the data within the leapfrog model. Use a name
which adheres to the naming convention rules in the README.type
- The type of data, real_type
or
int
. real_type
is a templated type. Under
normal usage it will be a double
but is used by TMB for
model fitting.input_when
- An expression which evaluates to a boolean
to control when this input data is used. For example
ModelVariant::run_child_model
to include only in the child
model. If blank, this is always input.value
- If you want to initialise this data as a
constant value array, enter the value here. Note that if using this
column then no value should be in the r_name
column.convert_base
- Set to TRUE
for any array
of indexes, these will be converted from 1-based numbering (for R) to
0-based numbering (for C++).dims
- The number of dimensions for this input (max
4)dim1
- The size of the first dimensiondim2
- The size of the second dimension, can by
blankdim3
- The size of the third dimension, can be
blankdim4
- The size of the fourth dimension, can be
blankNote that the dims can be any of the state space sizes or
proj_years
for the number of years in the projection. See
the README
for valid state space sizes.
After making changes to the model_input.csv
,
run the generate script scripts/generate
.
This update file inst/include/model_input.hpp
.
Note that this file should never be changed manually as the generate
script will completely rewrite it.
After regenerating the code, rebuild the project and you are ready to use the new input data in C++.
Model outputs are generated in a similar way to inputs. To modify an
existing or add new output update the file
[inst/cpp_generation/model_output.csv
](https://github.com/mrc-ide/frogger/blob/main/inst/cpp_generation/model_.
The columns in this file are:
r_name
- The name of the output in the list output from
run_model
cpp_name
- The name of the output as it is referred to
in C++r_type
- The R
type mapping function to
use to convert this to an R-type number e.g. REAL
to create
a real typed numberoutput_when
- Conditional expression for when to output
this from the model. For example
ModelVariant::run_child_model
to return only from the child
model. If blank, this is always returned.dims
- The number of dimensions of this output (max
5)dim1
- The size of the first dimensiondim2
- The size of the second dimensiondim3
- The size of the third dimensiondim4
- The size of the fourth dimensiondim5
- The size of the fifth dimensionNote that the dims can be any of the state space sizes or
proj_years
for the number of years in the projection. See
the README
for valid state space sizes.
Intermediate data is used as a place to store The intermediate data
is not handled by code generation as it only requires changes in one
place. You can make modifications by adding new data into the struct in
inst/include/types.hpp
,
making sure to set its default value in the reset
function
on the struct.
To add a new variant, requires work over a few areas. There are two types of model variants at the moment: 1. A flag which turns a section of code on or off, this will come with additional model inputs and outputs 1. A flag which changes the dimensions of some model input or outputs
The required changes will be different depending on if your new variant is one of the first types, which brings additional input/output or the second type which does not bring additional data.
To add a new variant. Firstly, in the code generation:
inst/include/model_input.csv
or inst/include/model_output.csv
inst/cpp_generation/state_types.hpp.in
In the C++ code:
inst/include/model_variants.hpp
inst/include/state_space.hpp
inst/include/parameters.hpp
.
Parameter types will be generated, you need to add those types to the
appropriate template specialisationinst/include/state_saver.hpp
You now need to update the R wrapper to be able to call this new variant:
R/run_model.R
add a new flag to the run_model
interface for using your
new variantmodel_variant
string based upon your new flagsrc/frogger.cpp
.
Note use a constexpr
for the state space to ensure it is
evaluated at compile time.