R/data.R
historical_data.Rd
A data.frame of sites that were used to estimate the intra-cluster correlation (ICC) from previously published data. These sites passed strict inclusion criteria to ensure they are maximally informative (see details).
data(historical_data)
A data.frame of 30 rows and 11 columns. Each row is a distinct site that passed the filtering steps in the ICC analysis of historical data. Columns give geographic properties, sampling times, the number of samples tested and the number positive for pfhrp2 deletions, and the citation from which the data originate.
The raw dataset of historical pfhrp2/3 studies was downloaded from the WHO malaria threats map on 27 Nov 2023. This spreadsheet can be found in this package in the R_ignore/data folder (see the GitHub repository) under the name "MTM_PFHRP23_GENE_DELETIONS_20231127_edited.xlsx". The term "_edited" was added to the name because two extra columns were added to the original data: "discard" and "discard_reason". These columns flag rows that should be discarded from the original data due to data entry mistakes. The following steps were then taken. All scripts to perform these steps can be found in the same R_ignore folder:
Dropped rows that were flagged for discard because of problems in the original data.
Filtered to Africa, Asia or South America.
Filtered to symptomatic patients.
Filtered to convenience surveys or cross-sectional prospective surveys only.
Combined counts (tested and positive) of studies conducted in the exact same location (based on lat/lon), in the same year, and from the same source publication. Each such combination is considered a single site.
Filtered to sites with 10 or more samples. This avoids very small sample sizes, which carry little information and whose estimates would therefore be driven largely by our prior assumptions.
All sites were mapped to ADMIN1 level by comparing the lat/lon coordinates against a shapefile from GADM version 4.1.0, first administrative unit.
Results were combined with studies that contain additional information not reflected in the WHO malaria threats map data. For example, some studies have site-level resolution despite not being apparent in the original data download. These additional studies can be found in the R_ignore/data folder under the name "additional_data.csv".
Filtered to ADMIN1 regions that contain at least 3 distinct sites within the same year and from the same source publication.
This final filtered dataset is what is available here.
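As an illustration only, three of the steps above (combining counts into sites, the 10-sample minimum, and the 3-sites-per-ADMIN1 rule) can be sketched in base R. This is not the code from the R_ignore scripts. The toy data and all column names (admin1, lat, lon, year, citation, n_tested, n_pos) are hypothetical and are not claimed to match the columns of historical_data, and for brevity the ADMIN1 label is assumed to be already attached to each row rather than derived from a shapefile.

```r
# Toy input with hypothetical column names; not the real WHO data.
toy <- data.frame(
  admin1   = c("A", "A", "A", "A", "B", "B"),
  lat      = c(1, 1, 2, 3, 5, 6),
  lon      = c(1, 1, 2, 3, 5, 6),
  year     = c(2018, 2018, 2018, 2018, 2019, 2019),
  citation = c("ref1", "ref1", "ref1", "ref1", "ref2", "ref2"),
  n_tested = c(8, 7, 20, 12, 30, 5),
  n_pos    = c(1, 0, 3, 2, 4, 0)
)

# Combine tested/positive counts for rows that share the exact same
# location, year and source publication; each group becomes one site.
sites <- aggregate(
  cbind(n_tested, n_pos) ~ admin1 + lat + lon + year + citation,
  data = toy, FUN = sum
)

# Keep only sites with 10 or more samples.
sites <- sites[sites$n_tested >= 10, ]

# Count sites per ADMIN1 region, year and source publication, then keep
# only the groups that contain at least 3 distinct sites.
n_sites <- ave(sites$n_tested, sites$admin1, sites$year, sites$citation,
               FUN = length)
final <- sites[n_sites >= 3, ]

print(final)
```

In this toy example the two rows at the same coordinates are merged into one site, the 5-sample site is dropped, and region "B" is then excluded because it retains only one site.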
data(historical_data)