Performance • individual

Introduction

Individual is designed for running big individual-based models. But if you find your model taking too long or consuming all of your memory, here are some things you can try to speed up your simulation.

Bitset

The bitset data structure is used to record the presence or absence of an element in a finite set. individual::Bitset implements this data structure and is able to preform set operations extremely quickly using bitwise operations. Taking advantage of these operations can lead to very fast processes, when you need to find some particular subset of individuals.

Let’s take a look at the recovery process in vignette("Tutorial"). A crucial operation here is to get all infectious individuals who are not already scheduled for recovery. The object I is a bitset containing all those individuals currently infectious, and already_scheduled is another bitset containing those individuals scheduled for a recovery event. Using already_scheduled$not() returns a new bitset of those individuals not in the set of already scheduled persons. This is passed to the I$and(), which modifies I in-place so that the result is the intersection of currently infectious persons and persons who have not yet been scheduled for a recovery event, which is precisely the set of people we want.

recovery_process <- function(t){
  I <- health$get_index_of("I")
  already_scheduled <- recovery_event$get_scheduled()
  I$and(already_scheduled$not())
  rec_times <- rgeom(n = I$size(),prob = pexp(q = gamma * dt)) + 1
  recovery_event$schedule(target = I,delay = rec_times)
}

Bitsets can also be efficiently sampled using Bitset$sample(). This is used in the infection process of vignette("Tutorial"). Once the per-capita force of infection (probability of moving from S to I during this time step) is calculated, the bitset S is sampled with that probability which modifies it in-place. The number of elements remaining after being sampled is binomially distributed. The argument rate can also be specified as a vector of probabilities, one for each element in the bitset.

infection_process <- function(t){
  I <- health$get_size_of("I")
  foi <- beta * I/N
  S <- health$get_index_of("S")
  S$sample(rate = pexp(q = foi * dt))
  health$queue_update(value = "I",index = S)
}

When creating a new Bitset, a user must specify the maximum size of the bitset. This is the maximum number of positive integers which the bitset can store. For example, if calling Bitset$new(size = 100), the resulting object is able to store the presence or absence of integers between 1 and 100 (inclusive). Attempting to insert or remove elements outside of this range will result in an error.

Bitsets offer methods to preform unions (Bitset$or()), intersections (Bitset$and()), symmetric set difference (also known as exclusive or, Bitset$xor()), and set difference (Bitset$set_difference()) with other bitsets. These methods modify the bitset in-place. The method Bitset$not() gives the complement of a bitset, and returns a new individual::Bitset object, leaving the original bitset intact. Because these set operations use bitwise operations directly rather than more expensive relational operators, computations with bitsets are extremely fast. Taking advantage of bitset operations can help make processes in “individual” much faster.

This can be seen when implementing a common pattern in epidemiological models: sampling success or failure for a bitset of individuals, and then generating two bitsets to hold individuals sampled one way or the other. A first method might use individual::filter_bitset.

n <- 1e4
bset <- Bitset$new(n)$insert(1:n)
probs <- runif(n)

keep <- probs >= 0.5

stay <- filter_bitset(bitset = bset, other = keep)
leave <- filter_bitset(bitset = bset, other = !keep)

This pattern is almost always slower than using the sample method with a set difference:

stay <- bset$copy()
stay$sample(rate = probs)

leave <- bset$copy()$set_difference(stay)

In both instances the original bitset object bset is not modified. The latter pattern can be made even faster if the original may be modified by directly taking the set difference with it. For models with large population sizes, the speed differences can be substantial.

Because a bitset stores integers in some finite set, it can be returned as an integer vector by using Bitset$to_vector(). However, this is a slow and expensive operation, as data must be copied into a new vector which is returned to R. If your model’s dynamics require the frequent returning of integer vectors, an individual::IntegerVariable object will be more appropriate. However, for most discrete variables, and especially those which mirror compartments in mathematical models, bitset operations and individual::CategoricalVariable (which uses bitsets internally) should be preferred.

Prefabs

Every time your processes ask for a variable, there is an overhead associated with moving simulation data into R, potentially incurring expensive copying of data.

Because many epidemiological models have similar state transitions, we’ve included several “prefab” processes and event listeners implemented in C++ which provide significant speed improvements and can be used out of the box in models. The functions return pointers which can be passed to the process list of individual::simulate_loop or event listeners just like closures in R. The processes available are:

individual::bernoulli_process: moves individuals from one categorical variable state to another at a constant probability
individual::multi_probability_bernoulli_process: moves individuals from one categorical variable state to another at a individual level probability specified by a individual::DoubleVariable object
individual::fixed_probability_multinomial_process: moves individuals from one categorical variable state to a set of possible destination values with constant probability to leave and multinomially distributed choice of destination state.
individual::multi_probability_multinomial_process: moves individuals from one categorical variable state to a set of possible destination values with individual level probability to leave specified by a individual::DoubleVariable object and multinomially distributed choice of destination state.
individual::infection_age_process: Simulates infection for age-structured models, where individuals come into contact at a rate given by a mixing (contact) matrix.

Prefabs for event listeners and renderers:

individual::update_category_listener: event listener for individual::TargetedEvent objects which updates the categorical variable state when it fires.
individual::reschedule_listener: event listener for individual::TargetedEvent objects which schedules some new followup event when it fires.
individual::categorical_count_renderer_process: used for individual::Render objects that counts the size of each state in a categorical variable.

C++ Prefabs

Unfortunately, we don’t have a prefab for every situation. Please feel free to write one of your own!

These are the basic steps to add C++ processes to your R package:

Run usethis::use_rcpp to set your package up for C++ development.
Add individual to the LinkingTo section of your package DESCRIPTION.
If you package is named mypackage, create a header file containing #include<individual.h> in any of these locations:
```
src/mypackage_types.h
src/mypackage_types.hpp
inst/include/mypackage_types.h
inst/include/mypackage_types.hpp
```
Then this header file will be automatically included in RcppExports.cpp. For more information, see section “2.5 Types in Generated Code” in the Rcpp Attributes vignette.
Create a file src/Makecars containing the line CXX_STD = CXX14. Because individual uses C++14 features, when compiling your package against it you must let the compiler know it should use the C++14 standard, otherwise it will not be able to compile.
Write your process!

Processes in C++ are of type process_t, defined in inst/include/common_types.h. Types for listeners for individual::Event and individual::TargetedEvent are listener_t and targeted_listener_t, defined in inst/include/Event.h. Below is how the C++ implementation of multi_probability_bernoulli_process is coded.

Note that the return type is a Rcpp::XPtr (see here) to a process_t, which is implemented as a std::function (see here) object, a C++ class that can hold any callable type. The Rcpp::XPtr is initialized with a pointer to a process_t object, which itself holds a C++ lambda function, basically a function closure.

The lambda function captures the input arguments by value, and takes a single argument when called t, giving the current time step (just like process functions in R). Sampling those individuals to change state is implemented with the C++ API for these objects.

#include <individual.h>
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::XPtr<process_t> multi_probability_bernoulli_process_cpp(
    Rcpp::XPtr<CategoricalVariable> variable,
    const std::string from,
    const std::string to,
    const Rcpp::XPtr<DoubleVariable> rate_variable
){

    // make pointer to lambda function and return XPtr to R
    return Rcpp::XPtr<process_t>(
        new process_t([variable,rate_variable,from,to](size_t t){      

            // sample leavers with their unique prob
            individual_index_t leaving_individuals(variable->get_index_of(std::vector<std::string>{from}));
            std::vector<double> rate_vector = rate_variable->get_values(leaving_individuals);
            bitset_sample_multi_internal(leaving_individuals, rate_vector.begin(), rate_vector.end());

            variable->queue_update(to, leaving_individuals);

        }),
        true
    ); 
};

The exported function can be used normally in R when creating the list of processes:

processes <- list(
  multi_probability_bernoulli_process_cpp(state, "I", "R", prob),
  other_process_1,
  other_process_2
)

That’s everything you need to scale your models up to millions of individuals!