STAVE data object (R6 class)

The main class of the STAVE package. It stores the data and is responsible for all data input, output, and processing. Most of the package's functionality is accessed through member functions of this class.

Details

The raw data are stored as private variables within this object, meaning they cannot (or should not) be edited directly. Instead, each table can be extracted with its own accessor: get_studies(), get_surveys(), or get_counts(). The three tables are:

studies: Information on where the data came from, for example a url and author names. Each study is indexed with a unique study_id.
surveys: Information on the surveys represented within a study. A survey is defined here as a discrete instance of data collection, which includes information on geography (latitude and longitude) and collection time. Surveys are given survey_ids and are linked to a particular study through the study_id.
counts: The actual genetic information, which is linked to a particular study and survey through the study_id and survey_id. Genetic variants are encoded in character strings that must follow a specified format. The number of times each variant was observed (variant_num) and the total sample size (total_num) are stored in separate columns.

This combination of linked tables allows efficient and flexible encoding of variants, while avoiding unnecessary duplication of information.
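The linkage between the three tables can be illustrated outside the package with plain data.frames. The sketch below borrows the key column names listed under append_data() (study_key, survey_key). All values are invented for illustration, and the merge() calls are ordinary base R rather than anything STAVE is known to do internally.

```r
# Toy versions of the three linked tables (all values are made up).
studies <- data.frame(
  study_id = c("study_A", "study_B"),
  study_name = c("Example study A", "Example study B"),
  stringsAsFactors = FALSE
)

surveys <- data.frame(
  study_key = c("study_A", "study_A", "study_B"),
  survey_id = c("survey_A1", "survey_A2", "survey_B1"),
  country_name = c("Country X", "Country X", "Country Y"),
  stringsAsFactors = FALSE
)

counts <- data.frame(
  study_key = c("study_A", "study_A", "study_B"),
  survey_key = c("survey_A1", "survey_A2", "survey_B1"),
  variant_string = c("crt:72:C", "crt:72:C", "crt:72:C"),
  variant_num = c(12, 3, 7),
  total_num = c(50, 20, 40),
  stringsAsFactors = FALSE
)

# Attach survey-level information to each count row via the two keys.
counts_surveys <- merge(
  counts, surveys,
  by.x = c("study_key", "survey_key"),
  by.y = c("study_key", "survey_id")
)

# Attach study-level information via the study key.
full_table <- merge(
  counts_surveys, studies,
  by.x = "study_key", by.y = "study_id"
)
```

Study-level and survey-level details are written once and only joined on demand, which is the duplication saving described above.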

Methods

Method `print()`

Custom print method to control console output

Usage

STAVE_object$print()

Method `get_studies()`

Extract the studies data.frame stored within the object.

Usage

STAVE_object$get_studies()

Method `get_surveys()`

Extract the surveys data.frame stored within the object.

Usage

STAVE_object$get_surveys()

Method `get_counts()`

Extract the counts data.frame stored within the object.

Usage

STAVE_object$get_counts()

Method `get_version()`

Extract the version number of the STAVE object. This matters because the member functions of a STAVE object are bound to the object itself and are not updated when you update the version of the package in your environment. To move a STAVE object to a new package version, first extract the data and then load it into a new STAVE object created with the most recent version.

Usage

STAVE_object$get_version()
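The update procedure described for get_version() can be sketched as a small helper. This is only an illustration: rebuild_object() is a hypothetical function, not part of STAVE, and it assumes that the tables returned by the get_*() methods are accepted unchanged by append_data(). The generator argument is assumed to be the R6 class generator, whose $new() method creates an empty object.

```r
# Hypothetical helper (not part of STAVE): copy the three tables out of an
# existing object and load them into a freshly created one.
# `generator` is assumed to be the R6 class generator, e.g. STAVE_object.
rebuild_object <- function(old_object, generator) {
  # Create an empty object using the currently installed class definition.
  new_object <- generator$new()

  # Load the extracted tables into the new object.
  new_object$append_data(
    studies_dataframe = old_object$get_studies(),
    surveys_dataframe = old_object$get_surveys(),
    counts_dataframe = old_object$get_counts()
  )
  new_object
}
```

With the package loaded, a call might look like rebuild_object(old_object, STAVE_object). Calling get_version() on the result is a simple way to check that the rebuilt object reports the version you expect.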

Method `append_data()`

Append new studies, surveys, and counts data to the object.

Usage

STAVE_object$append_data(
  studies_dataframe,
  surveys_dataframe,
  counts_dataframe
)

Arguments

studies_dataframe: a data.frame containing information at the study level. This data.frame must have the following columns: study_id, study_name, study_type, authors, publication_year, url.
surveys_dataframe: a data.frame containing information at the survey level. This data.frame must have the following columns: study_key, survey_id, country_name, site_name, latitude, longitude, spatial_notes, collection_start, collection_end, collection_day, time_notes.
counts_dataframe: a data.frame of genetic information. Must contain the following columns: study_key, survey_key, variant_string, variant_num, total_num.
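Because each input must contain a fixed set of columns, it can help to check a draft table before passing it to append_data(). The sketch below is a hypothetical pre-flight check written in base R. The helper missing_columns() is not part of STAVE, and it only tests for column names, not for the formatting rules that the package may also enforce.

```r
# Required columns, copied from the argument descriptions above.
required_columns <- list(
  studies = c("study_id", "study_name", "study_type", "authors",
              "publication_year", "url"),
  surveys = c("study_key", "survey_id", "country_name", "site_name",
              "latitude", "longitude", "spatial_notes", "collection_start",
              "collection_end", "collection_day", "time_notes"),
  counts = c("study_key", "survey_key", "variant_string", "variant_num",
             "total_num")
)

# Hypothetical helper (not part of STAVE): return the required columns
# that are missing from a data.frame.
missing_columns <- function(df, table_name) {
  setdiff(required_columns[[table_name]], names(df))
}

# Example: a draft counts table that lacks total_num (values are made up).
counts_draft <- data.frame(
  study_key = "study_A",
  survey_key = "survey_A1",
  variant_string = "crt:72:C",
  variant_num = 12
)
missing_columns(counts_draft, "counts")  # returns "total_num"
```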

Method `get_prevalence()`

Calculate the prevalence of a target variant.

Usage

STAVE_object$get_prevalence(
  target_variant,
  keep_ambiguous = FALSE,
  prev_from_min = TRUE,
  return_full = TRUE
)

Arguments

target_variant: the variant on which to calculate prevalence, for example crt:72:C. This string cannot contain heterozygous calls.
keep_ambiguous: some variants in the data may contain the target_variant, but this cannot be established conclusively. For example, the sequence A_A_A may or may not match the sequence A/C_A/C_A; because these genotypes are unphased, we cannot be sure. If TRUE, both a minimum and a maximum numerator are reported: the minimum excludes all ambiguous calls as mismatches and the maximum includes them all as matches. If FALSE (the default), ambiguous calls are skipped over, which may bias the prevalence estimate downward.
prev_from_min: the output includes a point estimate of prevalence along with exact binomial confidence intervals. When there are ambiguous calls, these must be calculated from either numerator_min or numerator_max, and this argument sets which one is used. Defaults to TRUE (use numerator_min), which risks underestimating prevalence.
return_full: if TRUE (the default), returns the entire loaded dataset, with prevalence equal to NA where there is no denominator. If FALSE, returns only entries for which there is a non-zero denominator.
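To make the roles of numerator_min, numerator_max, and prev_from_min concrete, the sketch below computes a point estimate and an exact binomial interval in base R. The counts are invented, exact_prevalence() is a hypothetical helper, and binom.test() (which returns a Clopper-Pearson interval) is used as a stand-in. The calculation inside STAVE may differ in detail.

```r
# Hypothetical counts for one survey (values are made up).
numerator_min <- 12  # ambiguous calls treated as mismatches
numerator_max <- 15  # ambiguous calls treated as matches
denominator <- 50

# Hypothetical helper: point estimate plus exact (Clopper-Pearson)
# 95% confidence interval, using binom.test() from the stats package.
exact_prevalence <- function(numerator, denominator) {
  ci <- binom.test(numerator, denominator)$conf.int
  c(prevalence = numerator / denominator, lower = ci[1], upper = ci[2])
}

# Mimic the prev_from_min switch: choose which numerator feeds the estimate.
prev_from_min <- TRUE
numerator <- if (prev_from_min) numerator_min else numerator_max
result <- exact_prevalence(numerator, denominator)
```

With prev_from_min set to FALSE the same helper would use numerator_max, giving a higher point estimate from the same denominator.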

Method `get_variants()`

Return a vector of all variants present in the data object.

Usage

STAVE_object$get_variants(report_haplo = FALSE)

Arguments

report_haplo: (Boolean) if TRUE then list all haplotypes. Otherwise, list in locus-by-locus format. Defaults to FALSE.

Method `drop_study()`

Drop one or more study_ids from the data. The matching entries are removed from all internally stored tables, including the corresponding surveys and counts data.

Usage

STAVE_object$drop_study(drop_study_id)

Arguments

drop_study_id: a vector of study_ids to drop from all data objects.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

STAVE_object$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
