Prepare data for use with particle filter — particle_filter_data

Prepare data for use with the particle_filter. This function is required to use the particle filter, as it helps arrange the data and makes explicit the off-by-one errors that can occur. It takes as input the data you want to compare against a model, including some measure of "time". This time must be converted into model time steps (see Details).

particle_filter_data(data, time, rate, initial_time = NULL, population = NULL)

Arguments

data: A data.frame() of data
time: The name of a column within data that represents your measure of time. This column must be integer-like. To avoid confusion, this cannot be called step, time, or model_time.
rate: The number of model "time steps" that occur between each pair of consecutive time points (one unit of the time column). This must be integer-like for discrete time models and must be NULL for continuous time models.
initial_time: An initial time to start the model from. This should always be provided, and must be provided for continuous time models. For discrete time models, this is expressed in model time. It must be a non-negative integer and must be at most equal to the first value of the time column, minus 1 (i.e., data[[time]][[1]] - 1). For historical reasons, if not given we take the first value of the time column minus one, but with a warning; this behaviour will be removed in a future version of mcstate.
population: Optionally, the name of a column within data that represents different populations. Must be a factor.
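
The constraint on initial_time for discrete time models can be checked before calling the function. The following is a minimal base-R sketch of that rule as described above; check_initial_time is a hypothetical helper written for illustration, not a function exported by mcstate, and it is not the package's own validation code.

```r
# Hypothetical pre-check (not part of mcstate): TRUE when initial_time is a
# non-negative whole number no greater than the first time value minus 1.
check_initial_time <- function(initial_time, time) {
  is_whole <- abs(initial_time - round(initial_time)) < 1e-8
  is_whole && initial_time >= 0 && initial_time <= time[[1]] - 1
}

day <- 5:20
check_initial_time(4, day)   # latest allowed start for data beginning at day 5
check_initial_time(5, day)   # too late: equal to the first time value
```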

Value

If population is NULL, a data.frame with new columns time_start and time_end (required by particle_filter), alongside all previous data except for the time variable, which is replaced by new <time>_start and <time>_end columns. If population is not NULL, then a named list of data.frames as described above, where each element represents one population, in the order specified in the data.
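
To make the meaning of the start and end columns concrete, the sketch below rebuilds them by hand in base R for a discrete time model. It is an illustration of the arithmetic only, under the assumption of a time column called day, rate = 4 and initial_time = 0; it is not how mcstate computes them internally.

```r
# Sketch only: rebuild the interval columns by hand for a column named "day".
day <- 5:20
rate <- 4
initial_time <- 0

day_end <- day                                   # data are recorded at the end of each interval
day_start <- c(initial_time, day[-length(day)])  # each interval starts where the previous one ended
time_start <- day_start * rate                   # the same intervals, in model time steps
time_end <- day_end * rate

head(data.frame(day_start, day_end, time_start, time_end), 3)
```

Because initial_time is earlier than the first day minus one, the first interval covers several days while every later interval covers exactly one.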

Details

We require that the time variable increments in unit steps; this may be relaxed in future to even steps, or possibly irregular steps, but for now this assumption is required. We assume that the data in the first column is recorded at the end of a period of 1 time unit. So if you have in the first column t = 10, data = 100, we assume that the model steps from t = 9 to t = 10 and that at the end of that period the data has value 100.
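
The unit-step requirement is easy to verify up front. A minimal sketch in base R follows; has_unit_steps is a hypothetical helper name used here for illustration and is not part of mcstate.

```r
# Hypothetical helper (not part of mcstate): TRUE when the time values are
# integer-like and increase in steps of exactly 1.
has_unit_steps <- function(time) {
  integer_like <- all(abs(time - round(time)) < 1e-8)
  integer_like && all(diff(time) == 1)
}

has_unit_steps(5:20)           # consecutive days
has_unit_steps(c(5, 6, 8, 9))  # day 7 is missing
```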

For continuous time models, time is simple to think about; time is continuous (and real-valued) and really any time is acceptable. For discrete time models there are two related measures of time we need to consider: (1) the dust "time step", a non-negative integer value that increases in unit steps, and (2) the "model time", which is related to the dust time step through the rate parameter here as <dust time> = <model time> * <rate>. For a concrete example, consider a model where we want to think in terms of days, but in which we take 10 steps per day. Time step 0 and model time 0 are the same, but day 1 occurs at step 10, day 15 at step 150 and so on.
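
The day-to-step example above amounts to a single multiplication. As a small sketch in base R, with variable names chosen for illustration only:

```r
# Sketch only: convert model time (days) to dust time steps.
rate <- 10           # dust time steps per day, as in the example above
day <- c(0, 1, 15)   # model time, in days
step <- day * rate   # dust time step reached at each day

data.frame(day = day, step = step)
```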

Examples

d <- data.frame(day = 5:20, y = runif(16))
mcstate::particle_filter_data(d, "day", rate = 4, initial_time = 4)
#>    day_start day_end time_start time_end          y
#> 1          4       5         16       20 0.09420495
#> 2          5       6         20       24 0.23446213
#> 3          6       7         24       28 0.74925343
#> 4          7       8         28       32 0.22425154
#> 5          8       9         32       36 0.38322155
#> 6          9      10         36       40 0.85614510
#> 7         10      11         40       44 0.46839267
#> 8         11      12         44       48 0.41608926
#> 9         12      13         48       52 0.75918109
#> 10        13      14         52       56 0.79509755
#> 11        14      15         56       60 0.83541341
#> 12        15      16         60       64 0.30038474
#> 13        16      17         64       68 0.59615798
#> 14        17      18         68       72 0.69096314
#> 15        18      19         72       76 0.97078650
#> 16        19      20         76       80 0.47409719

# If providing an earlier initial day, then the first epoch of simulation
# will be longer (see the first row)
mcstate::particle_filter_data(d, "day", rate = 4, initial_time = 0)
#>    day_start day_end time_start time_end          y
#> 1          0       5          0       20 0.09420495
#> 2          5       6         20       24 0.23446213
#> 3          6       7         24       28 0.74925343
#> 4          7       8         28       32 0.22425154
#> 5          8       9         32       36 0.38322155
#> 6          9      10         36       40 0.85614510
#> 7         10      11         40       44 0.46839267
#> 8         11      12         44       48 0.41608926
#> 9         12      13         48       52 0.75918109
#> 10        13      14         52       56 0.79509755
#> 11        14      15         56       60 0.83541341
#> 12        15      16         60       64 0.30038474
#> 13        16      17         64       68 0.59615798
#> 14        17      18         68       72 0.69096314
#> 15        18      19         72       76 0.97078650
#> 16        19      20         76       80 0.47409719

# If including populations:
d <- data.frame(day = 5:20, y = runif(16),
                population = factor(rep(letters[1:2], each = 16)))
mcstate::particle_filter_data(d, "day", 4, 0, "population")
#>    day_start day_end time_start time_end          y population
#> 1          0       5          0       20 0.89688187          a
#> 2          5       6         20       24 0.85579629          a
#> 3          6       7         24       28 0.93773368          a
#> 4          7       8         28       32 0.25396658          a
#> 5          8       9         32       36 0.64091969          a
#> 6          9      10         36       40 0.43633866          a
#> 7         10      11         40       44 0.84205755          a
#> 8         11      12         44       48 0.93930455          a
#> 9         12      13         48       52 0.05103372          a
#> 10        13      14         52       56 0.84367407          a
#> 11        14      15         56       60 0.94840946          a
#> 12        15      16         60       64 0.03745628          a
#> 13        16      17         64       68 0.21018439          a
#> 14        17      18         68       72 0.31921243          a
#> 15        18      19         72       76 0.42698888          a
#> 16        19      20         76       80 0.79741094          a
#> 17         0       5          0       20 0.89688187          b
#> 18         5       6         20       24 0.85579629          b
#> 19         6       7         24       28 0.93773368          b
#> 20         7       8         28       32 0.25396658          b
#> 21         8       9         32       36 0.64091969          b
#> 22         9      10         36       40 0.43633866          b
#> 23        10      11         40       44 0.84205755          b
#> 24        11      12         44       48 0.93930455          b
#> 25        12      13         48       52 0.05103372          b
#> 26        13      14         52       56 0.84367407          b
#> 27        14      15         56       60 0.94840946          b
#> 28        15      16         60       64 0.03745628          b
#> 29        16      17         64       68 0.21018439          b
#> 30        17      18         68       72 0.31921243          b
#> 31        18      19         72       76 0.42698888          b
#> 32        19      20         76       80 0.79741094          b