Create a particle_filter
object for running
and interacting with a particle filter. A higher-level
interface will be implemented later.
model
The dust model generator being simulated (cannot be re-bound)
n_particles
Number of particles used (read only)
has_multiple_parameters
Logical, indicating if the
particle filter requires multiple parameter sets in a list
as inputs, and if it it will produce a vector of likelihoods
the same length (read only). The parameter sets may or may
not use the same data (see has_multiple_data
).
has_multiple_data
Logical, indicating if the particle
filter simultaneously calculates the likelihood for multiple
parameter sets (read only). If TRUE
, has_multiple_parameters
will always be TRUE
.
n_parameters
The number of parameter sets used by this
particle filter (read only). The returned vector of likelihood
will be this length, and if has_multiple_parameters
is FALSE
this will be 1.
n_data
The number of data sets used by this particle filter
(read only). This will either be 1 or the same value as
n_parameters
.
new()
Create the particle filter
particle_filter$new(
data,
model,
n_particles,
compare,
index = NULL,
initial = NULL,
constant_log_likelihood = NULL,
n_threads = 1L,
seed = NULL,
n_parameters = NULL,
gpu_config = NULL,
stochastic_schedule = NULL,
ode_control = NULL
)
data
The data set to be used for the particle filter,
created by particle_filter_data()
. This is essentially
a data.frame()
with at least columns time_start
and time_end
, along with any additional data used in the
compare
function, and additional information about how your
dust time steps relate to a more interpretable measure of model
time.
model
A stochastic model to use. Must be a
dust_generator
object.
n_particles
The number of particles to simulate
compare
A comparison function. Must take arguments
state
, observed
and pars
as arguments (though the arguments
may have different names). state
is the simulated model state
(a matrix with as many rows as there are state variables and as
many columns as there are particles, data
is a list
of observed data corresponding to the current
time's row in the data
object provided here in the
constructor. pars
is any additional parameters passed
through to the comparison function (via the pars
argument to $run
). Alternatively, compare
can be NULL
if your model provides a built-in compile compare function
(if model$public_methods$has_compare()
is TRUE
), which may
be faster.
index
An index function. This is used to compute the
"interesting" indexes of your model. It must be a function of
one argument, which will be the result of calling the
$info()
method on your model. It should return a list
with elements run
(indices to return at the end of each
run, passed through to your compare function) and state
(indices to return if saving state). These indices can overlap
but do not have to. This argument is optional but using it will
likely speed up your simulation if you have more than a few
states as it will reduce the amount of memory copied back and
forth.
initial
A function to generate initial conditions. If
given, then this function must accept 3 arguments: info
(the result of calling $info()
as for index
),
n_particles
(the number of particles that the particle
filter is using) and pars
(parameters passed in in the
$run
method via the pars
argument). It
must return a list, which can have the elements state
(initial model state, passed to the particle filter - either a
vector or a matrix, and overriding the initial conditions
provided by your model) and time
(the initial time step,
overriding the first time step of your data - this must occur
within your first epoch in your data
provided to the
constructor, i.e., not less than the first element of
time_start
and not more than time_end
). Your function
can also return a vector or matrix of state
and not alter
the starting time step, which is equivalent to returning
list(state = state, time = NULL)
.
constant_log_likelihood
An optional function, taking the
model parameters, that computes the constant part of the
log-likelihood value (if any). You can use this where your
likelihood depends both on the time series (via data
) but also
on some non-temporal data. You should bind any non-parameter
dependencies into this closure. This is applied at the
beginning of the filter run, so represents the initial
condition of the marginal log likelihood value propagated by
the filter.
n_threads
Number of threads to use when running the simulation. Defaults to 1, and should not be set higher than the number of cores available to the machine.
seed
Seed for the random number generator on initial
creation. Can be NULL
(to initialise using R's random number
generator), a positive integer, or a raw vector - see dust::dust
and dust::dust_rng
for more details. Note that the random number
stream is unrelated from R's random number generator, except for
initialisation with seed = NULL
.
n_parameters
Number of parameter sets required. This, along
with data
, controls the interpretation of how the particle
filter, and importantly will add an additional dimension to
most outputs (scalars become vectors, vectors become matrices etc).
gpu_config
GPU configuration, typically an integer
indicating the device to use, where the model has GPU support.
An error is thrown if the device id given is larger than those
reported to be available (note that CUDA numbers devices from 0,
so that '0' is the first device, so on). See the method $gpu_info()
for available device ids; this can be called before object creation
as model$public_methods$gpu_info()
.
For additional control, provide a list with elements device_id
and run_block_size
. Further options (and validation) of this
list will be added in a future version!
stochastic_schedule
Vector of times to perform stochastic updates, for continuous time models.
ode_control
Tuning control for the ODE stepper, for continuous time (ODE) models
run()
Run the particle filter
particle_filter$run(
pars = list(),
save_history = FALSE,
save_restart = NULL,
min_log_likelihood = NULL
)
pars
A list representing parameters. This will be passed as
the pars
argument to your model, to your compare
function, and (if using) to your initial
function. It must
be an R list (not vector or NULL
) because that is what a
dust model currently requires on initialisation or $reset
- we
may relax this later. You may want to put your observation and
initial parameters under their own keys (e.g.,
pars$initial$whatever
), but this is up to you. Extra keys
are silently ignored by dust models.
save_history
Logical, indicating if the history of all
particles should be saved. If saving history, then it can be
queried later with the $history
method on the object.
save_restart
An integer vector of time points to save
restart infomation for. These are in terms of your underlying time
variable (the time
column in particle_filter_data()
) not in
terms of time steps. The state will be saved after the particle
filtering operation (i.e., at the end of the step).
min_log_likelihood
Optionally, a numeric value representing the
smallest likelihood we are interested in. If given and the particle
filter drops below this number, then we terminate early and return
-Inf
. In this case, history and final state cannot be returned
from the filter. This is primarily intended for use with
pmcmc where we can avoid computing likelihoods that
will certainly be rejected. Only suitable for use where
log-likelihood increments (with the compare
function) are always
negative. This is the case if you use a normalised discrete
distribution, but not necessarily otherwise. If using a
multi-parameter filter this can be a single number (in which case
the exit is when the sum of log-likelihoods drops below this
threshold) or a vector of numbers the same length as pars
(in
which case exit occurs when all numbers drop below this threshold).
run_begin()
Begin a particle filter run. This is part of the
"advanced" interface for the particle filter; typically you will
want to use $run()
which provides a user-facing wrapper around
this function. Once created with $run_begin()
, you should take
as many steps as needed with $step()
.
particle_filter$run_begin(
pars = list(),
save_history = FALSE,
save_restart = NULL,
min_log_likelihood = NULL
)
pars
A list representing parameters. See $run()
for details.
save_history
Logical, indicating if the history of all
particles should be saved. See $run()
for details.
save_restart
Times to save restart state at. See $run()
for
details.
min_log_likelihood
Optionally, a numeric value representing the
smallest likelihood we are interested in. See $run()
for details.
state()
Extract the current model state, optionally filtering. If the model has not yet been run, then this method will throw an error. Returns a matrix with the number of rows being the number of model states, and the number of columns being the number of particles.
history()
Extract the particle trajectories. Requires that
the model was run with save_history = TRUE
, which does
incur a performance cost. This method will throw an error if
the model has not run, or was run without save_history = TRUE
. Returns a 3d array with dimensions corresponding to (1)
model state, filtered by index$run
if provided, (2)
particle (following index_particle
if provided), (3)
time point. If using a multi-parameter filter then returns a 4d array
with dimensions corresponding to (1) model state, (2) particle, (3)
parameter, (4) time point.
ode_statistics()
Fetch statistics about steps taken during the integration, by
calling through to the $statistics()
method of the underlying
model. This is only available for continuous time (ODE) models,
and will error if used with discrete time models.
restart_state()
Return the full particle filter state at points back in time
that were saved with the save_restart
argument to
$run()
. If available, this will return a 3d array, with
dimensions representing (1) particle state, (2) particle index,
(3) time point. If multiple parameters are used then returns a 4d array,
with dimensions representing (1) particle state, (2) particle index,
(3) parameter set, (4) time point. This could be quite large, especially
if you are using the index
argument to create the particle filter
and return a subset of all state generally. It is also
different to the saved trajectories returned by $history()
because earlier saved state is not filtered by later filtering
(in the history we return the tree of history representing the
histories of the final particles, here we are returning all
particles at the requested point, regardless if they appear in
the set of particles that make it to the end of the
simulation).
inputs()
Return a list of inputs used to configure the particle
filter. These correspond directly to the argument names for the
particle filter constructor and are the same as the input
argument with the exception of seed
, which is the state of
the rng if it has been used (this can be used as a seed to
restart the model).
set_n_threads()
Set the number of threads used by the particle filter (and dust model) after creation. This can be used to allocate additional (or subtract excess) computing power from a particle filter. Returns (invisibly) the previous value.
n_threads
The new number of threads to use. You may want to
wrap this argument in dust::dust_openmp_threads()
in order to
verify that you can actually use the number of threads
requested (based on environment variables and OpenMP support).
# A basic SIR model included in the dust package
gen <- dust::dust_example("sir")
# Some data that we will fit to, using 1 particle:
sir <- gen$new(pars = list(), time = 0, n_particles = 1)
dt <- 1 / 4
day <- seq(1, 100)
incidence <- rep(NA, length(day))
true_history <- array(NA_real_, c(5, 1, 101))
true_history[, 1, 1] <- sir$state()
for (i in day) {
state_start <- sir$state()
sir$run(i / dt)
state_end <- sir$state()
true_history[, 1, i + 1] <- state_end
# Reduction in S
incidence[i] <- state_start[1, 1] - state_end[1, 1]
}
# Convert this into our required format:
data_raw <- data.frame(day = day, incidence = incidence)
data <- particle_filter_data(data_raw, "day", 4, 0)
# A comparison function
compare <- function(state, observed, pars = NULL) {
if (is.null(pars$exp_noise)) {
exp_noise <- 1e6
} else {
exp_noise <- pars$exp_noise
}
incidence_modelled <- state[1,]
incidence_observed <- observed$incidence
lambda <- incidence_modelled +
rexp(length(incidence_modelled), exp_noise)
dpois(incidence_observed, lambda, log = TRUE)
}
# Construct the particle_filter object with 100 particles
p <- particle_filter$new(data, gen, 100, compare)
p$run(save_history = TRUE)
#> [1] -17249.82
# Our simulated trajectories, with the "real" data superimposed
history <- p$history()
matplot(data_raw$day, t(history[1, , -1]), type = "l",
xlab = "Time", ylab = "State",
col = "#ff000022", lty = 1, ylim = range(history))
matlines(data_raw$day, t(history[2, , -1]), col = "#ffff0022", lty = 1)
matlines(data_raw$day, t(history[3, , -1]), col = "#0000ff22", lty = 1)
matpoints(data_raw$day, t(true_history[1:3, , -1]), pch = 19,
col = c("red", "yellow", "blue"))