STAVE data object (R6 class)
The main class that stores the data and is responsible for all data input, output, and processing functions. Most of the functionality of the STAVE package is accessed through this class in the form of member functions.
Details
The raw data are stored as private variables within this object, meaning they
cannot (or should not) be edited directly. Rather, tables can be extracted
using get_counts() and similarly for other tables. The three tables
are:
studies: Information on where the data came from, for example a url and author names. Each study is indexed with a unique study_id.
surveys: Information on the surveys represented within a study. A survey is defined here as a discrete instance of data collection, which includes information on geography (latitude and longitude) and collection time. Surveys are given survey_ids and are linked to a particular study through the study_id.
counts: The actual genetic information, which is linked to a particular study and survey through the study_id and survey_id. Genetic variants are encoded in character strings that must follow a specified format. The number of times each variant was observed (variant_num) and the total sample size (total_num) are stored in separate columns.
This combination of linked tables allows efficient and flexible encoding of variants, while avoiding unnecessary duplication of information.
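As a rough illustration of how the three tables link together, the sketch below builds toy data.frames in base R and joins them. It is not the package's internal representation. All values are made up, only a subset of columns is shown, and it assumes that the study_key and survey_key columns listed under append_data() refer to study_id and survey_id respectively.

```r
# Illustrative values only; these are not real studies.
studies <- data.frame(
  study_id = c("s1", "s2"),
  url = c("https://example.org/a", "https://example.org/b")
)
surveys <- data.frame(
  study_key = c("s1", "s1", "s2"),
  survey_id = c("site_a_2015", "site_b_2015", "site_c_2018"),
  latitude = c(-1.5, -2.0, 6.1),
  longitude = c(30.2, 31.0, 1.2)
)
counts <- data.frame(
  study_key = c("s1", "s1", "s2"),
  survey_key = c("site_a_2015", "site_b_2015", "site_c_2018"),
  variant_string = "crt:72:C",
  variant_num = c(10, 4, 0),
  total_num = c(50, 40, 25)
)

# Survey-level and study-level details are stored once and joined on demand
# (assumed key mapping: study_key -> study_id, survey_key -> survey_id).
counts_surveys <- merge(
  counts, surveys,
  by.x = c("study_key", "survey_key"),
  by.y = c("study_key", "survey_id")
)
full <- merge(counts_surveys, studies, by.x = "study_key", by.y = "study_id")
```

Each row of full carries the coordinates and URL for its count, yet the URL for study s1 is stored only once in studies even though two surveys refer to it.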
Methods
Method get_version()
Extract the version number of the STAVE object. This is important because the member functions of a STAVE object are bound to the object itself, and will not change when you update the version of the package in your environment. To move a STAVE object to a new package version, first extract the data and then load them into a new STAVE object created with the most recent version.
Method append_data()
Append new data
Arguments
studies_dataframe: a data.frame containing information at the study level. This data.frame must have the following columns: study_id, study_name, study_type, authors, publication_year, url.
surveys_dataframe: a data.frame containing information at the survey level. This data.frame must have the following columns: study_key, survey_id, country_name, site_name, latitude, longitude, spatial_notes, collection_start, collection_end, collection_day, time_notes.
counts_dataframe: a data.frame of genetic information. Must contain the following columns: study_key, survey_key, variant_string, variant_num, total_num.
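Because each input must carry a fixed set of columns, it can help to check for missing ones before appending. The sketch below uses a hypothetical helper, missing_columns(), which is not part of STAVE, and made-up values. It only checks column names, not the formatting rules the package may enforce on the contents.

```r
# Hypothetical helper, not part of STAVE: list any required columns
# that are absent from a data.frame.
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Required columns for counts_dataframe, as listed in the arguments above.
required_counts <- c("study_key", "survey_key", "variant_string",
                     "variant_num", "total_num")

# Illustrative values only.
counts_dataframe <- data.frame(
  study_key = "s1",
  survey_key = "site_a_2015",
  variant_string = "crt:72:C",
  variant_num = 10,
  total_num = 50
)

# All columns present: returns an empty character vector.
missing_columns(counts_dataframe, required_counts)

# Dropping the fifth column (total_num) is reported by name.
missing_columns(counts_dataframe[, -5], required_counts)
```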
Method get_prevalence()
Calculate prevalence
Usage
STAVE_object$get_prevalence(
target_variant,
keep_ambiguous = FALSE,
prev_from_min = TRUE,
return_full = TRUE
)
Arguments
target_variant: the variant on which to calculate prevalence, for example crt:72:C. There can be no heterozygous calls within this string.
keep_ambiguous: there may be variants in the data for which the target_variant could be in the sample, but this cannot be proven conclusively. For example, the sequence A_A_A may be a match to the sequence A/C_A/C_A, or it may not. These are unphased genotypes so we cannot be sure. If keep_ambiguous = TRUE then both a min and a max numerator are reported that either include all ambiguous calls as matches (max) or exclude them as mismatches (min). If FALSE (the default) then ambiguous calls are skipped over, which may downwardly bias prevalence calculation.
prev_from_min: the output object includes a point estimate of the prevalence along with exact binomial confidence intervals. In the case of ambiguous calls, these must be calculated from one of numerator_min or numerator_max. This argument sets which one of these values is used in the calculation. Defaults to TRUE, which risks underestimating prevalence.
return_full: if TRUE (the default) returns the entire loaded dataset, with prevalence equal to NA if there is no denominator. If FALSE only returns entries for which there is a non-zero denominator.
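To make the roles of numerator_min, numerator_max and prev_from_min concrete, here is a minimal base R sketch. It is not the STAVE implementation. prevalence_sketch() is a hypothetical helper, prevalence is expressed as a proportion, and the exact interval comes from stats::binom.test(), which returns a Clopper-Pearson interval. The package may compute or scale these values differently.

```r
# Hypothetical helper, not the STAVE implementation.
prevalence_sketch <- function(numerator_min, numerator_max, denominator,
                              prev_from_min = TRUE) {
  # Choose which numerator drives the point estimate and interval.
  numerator <- if (prev_from_min) numerator_min else numerator_max

  # With no denominator there is nothing to estimate.
  if (denominator == 0) {
    return(c(prevalence = NA_real_, lower = NA_real_, upper = NA_real_))
  }

  # Exact (Clopper-Pearson) binomial confidence interval, 95% by default.
  ci <- binom.test(numerator, denominator)$conf.int
  c(prevalence = numerator / denominator, lower = ci[1], upper = ci[2])
}

# 10 unambiguous matches plus 4 ambiguous calls among 50 samples.
prevalence_sketch(10, 14, 50)                        # ambiguous calls as mismatches
prevalence_sketch(10, 14, 50, prev_from_min = FALSE) # ambiguous calls as matches
prevalence_sketch(0, 0, 0)                           # no denominator: all NA
```

In this example the two settings give point estimates of 0.2 and 0.28, so the gap between them shows how much the ambiguous calls could shift the result.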
Method drop_study()
Drop one or more studies from the data, identified by their study_ids. Each study is removed from all internally stored tables, including its corresponding surveys and counts data.