Prepare data for use with the particle_filter. This function
is required to use the particle filter, as it helps arrange data and
makes explicit the off-by-one errors that can occur. It takes
as input the data you want to compare against a model, including some
measure of "time". This time must be converted into model time
steps (see Details).
particle_filter_data(data, time, rate, initial_time = NULL, population = NULL)
data: A data.frame() of data
time: The name of a column within data that represents
your measure of time. This column must be integer-like. To
avoid confusion, it cannot be called step, time, or model_time.
rate: The number of model "time steps" that occur between
consecutive time points (that is, per unit of the time column).
This must be integer-like for discrete time models and must be
NULL for continuous time models.
initial_time: An initial time to start the model from. This
should always be provided, and must be provided for continuous
time models. For discrete time models, this is expressed in
model time. It must be a non-negative integer and at most the
first value of the time column minus 1
(i.e., data[[time]][1] - 1). For historical reasons, if not given
we take the first value of the time column minus one, but with
a warning; this behaviour will be removed in a future version
of mcstate.
population: Optionally, the name of a column within data that
represents different populations. Must be a factor.
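The constraints on the arguments above can be sketched as a set of checks in base R. This is only an illustration for discrete time models: check_filter_inputs is a hypothetical helper, and its error messages are assumptions, not mcstate's own validation code.

```r
# Hypothetical checks mirroring the argument descriptions; the function
# name and messages are illustrative, not part of mcstate.
check_filter_inputs <- function(data, time, initial_time, population = NULL) {
  stopifnot(is.data.frame(data), time %in% names(data))
  # Names that the time column is not allowed to use
  if (time %in% c("step", "time", "model_time")) {
    stop("The time column cannot be called 'step', 'time' or 'model_time'")
  }
  times <- data[[time]]
  # "Integer-like": numeric values with no fractional part
  if (!is.numeric(times) || any(times != round(times))) {
    stop("The time column must be integer-like")
  }
  # Non-negative, and no later than one unit before the first observation
  if (initial_time < 0 || initial_time > times[1] - 1) {
    stop("initial_time must be non-negative and at most the first time minus 1")
  }
  if (!is.null(population) && !is.factor(data[[population]])) {
    stop("The population column must be a factor")
  }
  invisible(TRUE)
}

d <- data.frame(day = 5:20, y = runif(16))
ok <- check_filter_inputs(d, "day", initial_time = 4)
```

Here initial_time = 4 passes because the first value of day is 5; initial_time = 5 would be rejected.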
If population is NULL, a data.frame with new columns
time_start and time_end (required by particle_filter),
alongside all previous data except for the time variable, which
is replaced by new <time>_start and <time>_end columns. If
population is not NULL, a named list of data.frames as
described above, where each element represents one population, in the
order specified in the data.
We require that the time variable increments in unit steps; this
may be relaxed in future to even steps, or possibly irregular
steps, but for now this assumption is required. We assume that
the data in the first row is recorded at the end of a period of
1 time unit. So if the first row has t = 10, data = 100,
we assume that the model steps from t = 9 to t = 10
and that at the end of that period the data has value 100.
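A minimal base R sketch of how the unit-step assumption turns each observation into a start/end window. The helper sketch_time_windows and its column names are hypothetical; this is not mcstate's implementation, only an illustration of the bookkeeping described above.

```r
# Hypothetical helper illustrating the unit-step assumption; a sketch,
# not mcstate's implementation.
sketch_time_windows <- function(time, rate, initial_time) {
  # The observed times must increase in steps of exactly 1
  stopifnot(all(diff(time) == 1))
  # Each observation closes a period; the first period opens at
  # initial_time, every later period opens at the previous observation
  start <- c(initial_time, time[-length(time)])
  data.frame(model_start = start,
             model_end = time,
             step_start = start * rate,  # same windows in model time steps
             step_end = time * rate)
}

w <- sketch_time_windows(5:20, rate = 4, initial_time = 4)
w0 <- sketch_time_windows(5:20, rate = 4, initial_time = 0)
head(w, 3)
```

With initial_time = 4 every window is one time unit long; with initial_time = 0 only the first window is longer, running from 0 to 5.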
For continuous time models, time is simple to think about; time is
continuous (and real-valued) and really any time is
acceptable. For discrete time models there are two related
measures of time we need to consider: (1) the dust "time step",
a non-negative integer value that increases in unit steps, and (2)
the "model time", which is related to the dust time step through
the rate parameter here as <dust time> = <model time> * <rate>.
For a concrete example, consider a model where we want
to think in terms of days, but in which we take 10 steps per
day. Time step 0 and model time 0 are the same, but day 1 occurs
at step 10, day 15 at step 150 and so on.
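The arithmetic in this example can be written out in a couple of lines of base R. The helper name model_time_to_step is hypothetical and used only for illustration; it is not an mcstate function.

```r
# Hypothetical helper, not part of mcstate: convert model time (here,
# days) into dust time steps for a discrete time model.
model_time_to_step <- function(model_time, rate) {
  # rate must be a positive, integer-like number of steps per time unit
  stopifnot(rate == round(rate), rate > 0)
  model_time * rate
}

rate <- 10
days <- c(0, 1, 15)
steps <- model_time_to_step(days, rate)
print(data.frame(day = days, step = steps))
```

Days 0, 1 and 15 map to steps 0, 10 and 150, matching the example in the text.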
d <- data.frame(day = 5:20, y = runif(16))
mcstate::particle_filter_data(d, "day", rate = 4, initial_time = 4)
#> day_start day_end time_start time_end y
#> 1 4 5 16 20 0.09420495
#> 2 5 6 20 24 0.23446213
#> 3 6 7 24 28 0.74925343
#> 4 7 8 28 32 0.22425154
#> 5 8 9 32 36 0.38322155
#> 6 9 10 36 40 0.85614510
#> 7 10 11 40 44 0.46839267
#> 8 11 12 44 48 0.41608926
#> 9 12 13 48 52 0.75918109
#> 10 13 14 52 56 0.79509755
#> 11 14 15 56 60 0.83541341
#> 12 15 16 60 64 0.30038474
#> 13 16 17 64 68 0.59615798
#> 14 17 18 68 72 0.69096314
#> 15 18 19 72 76 0.97078650
#> 16 19 20 76 80 0.47409719
# If the initial day is earlier than the day before the first data
# point, then the first epoch of simulation will be longer (see the
# first row)
mcstate::particle_filter_data(d, "day", rate = 4, initial_time = 0)
#> day_start day_end time_start time_end y
#> 1 0 5 0 20 0.09420495
#> 2 5 6 20 24 0.23446213
#> 3 6 7 24 28 0.74925343
#> 4 7 8 28 32 0.22425154
#> 5 8 9 32 36 0.38322155
#> 6 9 10 36 40 0.85614510
#> 7 10 11 40 44 0.46839267
#> 8 11 12 44 48 0.41608926
#> 9 12 13 48 52 0.75918109
#> 10 13 14 52 56 0.79509755
#> 11 14 15 56 60 0.83541341
#> 12 15 16 60 64 0.30038474
#> 13 16 17 64 68 0.59615798
#> 14 17 18 68 72 0.69096314
#> 15 18 19 72 76 0.97078650
#> 16 19 20 76 80 0.47409719
# If including populations:
d <- data.frame(day = 5:20, y = runif(16),
                population = factor(rep(letters[1:2], each = 16)))
mcstate::particle_filter_data(d, "day", 4, 0, "population")
#> day_start day_end time_start time_end y population
#> 1 0 5 0 20 0.89688187 a
#> 2 5 6 20 24 0.85579629 a
#> 3 6 7 24 28 0.93773368 a
#> 4 7 8 28 32 0.25396658 a
#> 5 8 9 32 36 0.64091969 a
#> 6 9 10 36 40 0.43633866 a
#> 7 10 11 40 44 0.84205755 a
#> 8 11 12 44 48 0.93930455 a
#> 9 12 13 48 52 0.05103372 a
#> 10 13 14 52 56 0.84367407 a
#> 11 14 15 56 60 0.94840946 a
#> 12 15 16 60 64 0.03745628 a
#> 13 16 17 64 68 0.21018439 a
#> 14 17 18 68 72 0.31921243 a
#> 15 18 19 72 76 0.42698888 a
#> 16 19 20 76 80 0.79741094 a
#> 17 0 5 0 20 0.89688187 b
#> 18 5 6 20 24 0.85579629 b
#> 19 6 7 24 28 0.93773368 b
#> 20 7 8 28 32 0.25396658 b
#> 21 8 9 32 36 0.64091969 b
#> 22 9 10 36 40 0.43633866 b
#> 23 10 11 40 44 0.84205755 b
#> 24 11 12 44 48 0.93930455 b
#> 25 12 13 48 52 0.05103372 b
#> 26 13 14 52 56 0.84367407 b
#> 27 14 15 56 60 0.94840946 b
#> 28 15 16 60 64 0.03745628 b
#> 29 16 17 64 68 0.21018439 b
#> 30 17 18 68 72 0.31921243 b
#> 31 18 19 72 76 0.42698888 b
#> 32 19 20 76 80 0.79741094 b