
The main class that stores the data and is responsible for all data input, output, and processing functions. Most of the STAVE package's functionality is exposed through this class as member functions.

Details

The raw data are stored as private variables within this object, meaning they cannot (or should not) be edited directly. Instead, each table can be extracted with its own accessor: get_studies(), get_surveys(), or get_counts(). The three tables are:

  1. studies: Information on where the data came from, for example a url and author names. Each study is indexed with a unique study_id.

  2. surveys: Information on the surveys represented within a study. A survey is defined here as a discrete instance of data collection, which includes information on geography (latitude and longitude) and collection time. Surveys are given survey_ids and are linked to a particular study through the study_id.

  3. counts: The actual genetic information, which is linked to a particular survey through the survey_id. Genetic variants are encoded in character strings that must follow a specified format. The number of times each variant was observed (variant_num) and the total sample size (total_num) are stored in separate columns.

This combination of linked tables allows efficient and flexible encoding of variants, while avoiding unnecessary duplication of information.
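As a rough illustration of how the three tables link together, the base R sketch below builds toy versions using the key columns named in the append_data() arguments (study_id, study_key, survey_id, survey_key) and joins them with merge(). All values are invented for illustration, and only a subset of the required columns is shown.

```r
# Toy tables; all values are made up for illustration
studies <- data.frame(
  study_id = c("study_A", "study_B"),
  url = c("https://example.org/a", "https://example.org/b")
)

surveys <- data.frame(
  study_key = c("study_A", "study_A", "study_B"),
  survey_id = c("survey_1", "survey_2", "survey_3"),
  latitude = c(-1.5, 0.3, 12.1),
  longitude = c(30.2, 32.6, -1.4)
)

counts <- data.frame(
  survey_key = c("survey_1", "survey_2", "survey_3"),
  variant_string = c("crt:72:C", "crt:72:C", "crt:72:C"),
  variant_num = c(12, 5, 0),
  total_num = c(50, 40, 25)
)

# Link counts -> surveys -> studies through the key columns
counts_surveys <- merge(counts, surveys, by.x = "survey_key", by.y = "survey_id")
linked <- merge(counts_surveys, studies, by.x = "study_key", by.y = "study_id")
```

Study-level details such as the url are stored once per study and only attached to individual counts when the tables are joined, which is how the linked layout avoids duplication.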

Methods


Method print()

Custom print method to control console output

Usage

STAVE_object$print()


Method get_studies()

Extract the studies data.frame stored within the object

Usage

STAVE_object$get_studies()


Method get_surveys()

Extract the surveys data.frame stored within the object

Usage

STAVE_object$get_surveys()


Method get_counts()

Extract the counts data.frame stored within the object

Usage

STAVE_object$get_counts()


Method get_version()

Extract the version number of the STAVE object. This matters because the member functions of a STAVE object are bound to the object itself and are not updated when you update the package in your environment. To move a STAVE object to a new package version, first extract the data and then load it into a new STAVE object created with the most recent version.

Usage

STAVE_object$get_version()


Method append_data()

Append new data

Usage

STAVE_object$append_data(
  studies_dataframe,
  surveys_dataframe,
  counts_dataframe
)

Arguments

studies_dataframe

a data.frame containing information at the study level. This data.frame must have the following columns: study_id, study_name, study_type, authors, publication_year, url

surveys_dataframe

a data.frame containing information at the survey level. This data.frame must have the following columns: study_key, survey_id, country_name, site_name, latitude, longitude, spatial_notes, collection_start, collection_end, time_notes. The study_key element must correspond to a study_id in the studies_dataframe.

counts_dataframe

a data.frame of genetic information. Must contain the following columns: survey_key, variant_string, variant_num, total_num. The survey_key element must correspond to a valid survey_id in the surveys_dataframe.
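Because the key columns must line up across the three inputs, it can help to check them before calling append_data(). The base R sketch below is not part of the STAVE package: the helper name check_keys() is hypothetical, the toy values are invented, and it only verifies the two key relationships described above.

```r
# Hypothetical helper (not part of STAVE): checks that the keys line up
check_keys <- function(studies_dataframe, surveys_dataframe, counts_dataframe) {
  # Every study_key must match a study_id
  surveys_ok <- all(surveys_dataframe$study_key %in% studies_dataframe$study_id)
  # Every survey_key must match a survey_id
  counts_ok <- all(counts_dataframe$survey_key %in% surveys_dataframe$survey_id)
  surveys_ok && counts_ok
}

# Toy inputs showing only the key columns
studies_df <- data.frame(study_id = "study_A")
surveys_df <- data.frame(study_key = "study_A", survey_id = "survey_1")
good_counts <- data.frame(survey_key = "survey_1")
bad_counts <- data.frame(survey_key = "survey_9")  # no matching survey_id

good_result <- check_keys(studies_df, surveys_df, good_counts)
bad_result <- check_keys(studies_df, surveys_df, bad_counts)
```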


Method get_prevalence()

Calculate prevalence

Usage

STAVE_object$get_prevalence(
  target_variant,
  keep_ambiguous = FALSE,
  prev_from_min = TRUE
)

Arguments

target_variant

the name of the variant on which we want to calculate prevalence, for example crt:72:C. Note that there can be no heterozygous calls within this name.

keep_ambiguous

there may be variants in the data for which the target_variant could be in the sample, but this cannot be proven conclusively. For example, the sequence A_A_A may or may not match the sequence A/C_A/C_A; because these are unphased genotypes, we cannot be sure. If keep_ambiguous = TRUE, both a min and a max numerator are reported, which exclude all ambiguous calls (min) or include all ambiguous calls (max). If FALSE (the default), only the min is reported.

prev_from_min

the output object includes a point estimate of the prevalence along with exact binomial confidence intervals. These must be calculated from one of numerator_min or numerator_max in the case of ambiguous calls. This argument sets which one of these numerators is used in the calculation.
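To make the role of the two numerators concrete, the base R sketch below computes a point estimate and an exact (Clopper-Pearson) binomial interval from each one using binom.test(). The counts are invented and the helper name exact_prevalence() is hypothetical. This illustrates the general calculation, expressed here as proportions, and is not the package's internal code.

```r
# Invented counts for a single survey
numerator_min <- 12   # unambiguous matches only
numerator_max <- 15   # unambiguous plus ambiguous matches
denominator <- 50

# Hypothetical helper: point estimate and exact binomial 95% interval
exact_prevalence <- function(numerator, denominator) {
  ci <- binom.test(numerator, denominator)$conf.int
  c(prevalence = numerator / denominator, lower = ci[1], upper = ci[2])
}

# Analogous to prev_from_min = TRUE and prev_from_min = FALSE respectively
prev_min <- exact_prevalence(numerator_min, denominator)
prev_max <- exact_prevalence(numerator_max, denominator)
```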



Method get_variants()

Return a vector of all variants present in the data object.

Usage

STAVE_object$get_variants(report_haplo = FALSE)

Arguments

report_haplo

(Boolean) if TRUE then list all haplotypes. Otherwise, list in locus-by-locus format. Defaults to FALSE.


Method drop_study()

Drop one or more study_ids from the data. This will drop from all internally stored data objects, including the corresponding surveys and counts data.

Usage

STAVE_object$drop_study(drop_study_id)

Arguments

drop_study_id

a vector of study_ids to drop from all data objects.
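The cascade described above can be pictured with toy tables in base R. This is only a sketch of the idea under the key relationships documented for append_data(). The values are invented and the code does not reproduce the package's internals.

```r
# Toy tables; values invented for illustration
studies <- data.frame(study_id = c("study_A", "study_B"))
surveys <- data.frame(
  study_key = c("study_A", "study_B", "study_B"),
  survey_id = c("survey_1", "survey_2", "survey_3")
)
counts <- data.frame(
  survey_key = c("survey_1", "survey_2", "survey_3"),
  variant_num = c(12, 5, 0)
)

drop_study_id <- "study_B"

# Surveys that belong to the dropped studies
dropped_surveys <- surveys$survey_id[surveys$study_key %in% drop_study_id]

# Remove the matching rows from each table in turn
studies <- studies[!studies$study_id %in% drop_study_id, , drop = FALSE]
surveys <- surveys[!surveys$study_key %in% drop_study_id, , drop = FALSE]
counts <- counts[!counts$survey_key %in% dropped_surveys, , drop = FALSE]
```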


Method clone()

The objects of this class are cloneable with this method.

Usage

STAVE_object$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.