Reading in Data

This tutorial covers the following topics:

Creating a new STAVE object
Loading data into this object
Viewing the loaded data
Dropping studies and surveys by ID

STAVE works via a single class (an R6 object) that acts as the main data container. This class allows users to efficiently import, store, and manipulate data via specialized member functions.

A new object can be created and data read in like this:

# create new object
s <- STAVE_object$new()

# append data using a member function
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)
#> data correctly appended

All three data frames must follow the formats required by STAVE, which are described in the How it works sections. If your data do not conform to this structure, the append will be rejected.
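Before attempting an append, it can help to confirm that a data frame has the expected column names. The snippet below is a minimal sketch in base R, not part of STAVE: the helper missing_columns() is hypothetical, and the column names are copied from the counts table printed later in this tutorial. Which columns are mandatory, and what STAVE checks beyond names, is defined in the How it works sections rather than here.

```r
# Hypothetical pre-check, not part of STAVE: list the expected columns
# that are absent from a data frame.
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# column names copied from the counts table shown in this tutorial
counts_cols <- c("study_id", "survey_id", "variant_string",
                 "variant_num", "total_num", "notes")

# a hand-built table with one misnamed column and one absent column
my_counts <- data.frame(
  study_id = "Dama_2017",
  survey_id = "Dama_2017_Bamako_2014",
  variant = "crt:76:T",   # misnamed: the tutorial table uses variant_string
  variant_num = 130,
  total_num = 170,
  stringsAsFactors = FALSE
)

missing_columns(my_counts, counts_cols)
#> [1] "variant_string" "notes"
```

A check like this only catches naming problems; the append itself remains the authoritative test of whether the data are accepted.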

The default print method tells us how many studies and surveys are loaded:

s
#> Studies: 2
#> Surveys: 6

Using a custom class offers several key advantages. Once loaded, all data remain consolidated within a single object, avoiding fragmentation. The class structure also ensures the data are encapsulated, meaning they cannot be directly edited by the user. This built-in protection minimizes the risk of accidental data corruption.
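To illustrate what encapsulation means in an R6 class, here is a toy container written with the R6 package. It is a sketch of the general pattern only, not the STAVE implementation: the class name ToyContainer, its private field, and its single check are all invented for this example.

```r
library(R6)

# Hypothetical toy class illustrating encapsulation; not STAVE's code
ToyContainer <- R6Class("ToyContainer",
  public = list(
    # data enter only through this method, so checks can be enforced here
    append_data = function(df) {
      stopifnot(is.data.frame(df), "study_id" %in% names(df))
      private$studies <- rbind(private$studies, df)
      invisible(self)
    },
    # returns the stored table; the caller receives a copy
    get_studies = function() {
      private$studies
    }
  ),
  private = list(
    studies = NULL
  )
)

toy <- ToyContainer$new()
toy$append_data(data.frame(study_id = "Dama_2017", stringsAsFactors = FALSE))

# editing the returned table does not change what is stored in the object
tmp <- toy$get_studies()
tmp$study_id <- "edited"
toy$get_studies()$study_id
#> [1] "Dama_2017"

# the private field cannot be reached from outside the object
is.null(toy$studies)
#> [1] TRUE
```

Because the only way in is through append_data(), every change passes the same checks, which is the design benefit described above.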

We can view the loaded tables using get functions:

s$get_studies()

study_id	study_label	description	access_level	contributors	reference	reference_year	PMID
Dama_2017	Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali	NA	public	Dama et al.	https://pubmed.ncbi.nlm.nih.gov/28148267/	2017	28148267
Asua_2019	Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda	NA	public	Asua et al.	https://pubmed.ncbi.nlm.nih.gov/30559133/	2019	30559133

s$get_surveys()

study_id	survey_id	country_name	site_name	latitude	longitude	location_method	location_notes	collection_start	collection_end	collection_day	time_method	time_notes
Dama_2017	Dama_2017_Bamako_2014	Mali	Koulikoro	12.612900	-8.13560	WWARN lat and long	NA	2014-01-01	2014-12-31	2014-07-02	automated midpoint	NA
Asua_2019	Asua_2019_Agago_2017	Uganda	Agago	2.984722	33.33055	WWARN lat and long	NA	2017-01-01	2017-12-31	2017-07-02	automated midpoint	NA
Asua_2019	Asua_2019_Arua_2017	Uganda	Arua	3.030000	30.91000	WWARN lat and long	NA	2017-01-01	2017-12-31	2017-07-02	automated midpoint	NA
Asua_2019	Asua_2019_Kole_2017	Uganda	Kole	2.428611	32.80111	WWARN lat and long	NA	2017-01-01	2017-12-31	2017-07-02	automated midpoint	NA
Asua_2019	Asua_2019_Lamwo_2017	Uganda	Lamwo	3.533333	32.80000	WWARN lat and long	NA	2017-01-01	2017-12-31	2017-07-02	automated midpoint	NA
Asua_2019	Asua_2019_Mubende_2017	Uganda	Mubende	0.557500	31.39500	WWARN lat and long	NA	2017-01-01	2017-12-31	2017-07-02	automated midpoint	NA

s$get_counts()

study_id	survey_id	variant_string	variant_num	total_num	notes
Dama_2017	Dama_2017_Bamako_2014	crt:76:T	130	170	NA
Dama_2017	Dama_2017_Bamako_2014	mdr1:86:Y	46	158	NA
Asua_2019	Asua_2019_Agago_2017	k13:469:Y	42	42	NA
Asua_2019	Asua_2019_Agago_2017	k13:675:V	42	42	NA
Asua_2019	Asua_2019_Arua_2017	k13:675:V	43	43	NA
Asua_2019	Asua_2019_Kole_2017	k13:469:Y	47	47	NA
Asua_2019	Asua_2019_Kole_2017	k13:675:V	47	47	NA
Asua_2019	Asua_2019_Lamwo_2017	k13:469:Y	43	43	NA
Asua_2019	Asua_2019_Lamwo_2017	k13:675:V	43	43	NA
Asua_2019	Asua_2019_Mubende_2017	k13:469:F	45	45	NA

However, we cannot directly modify the data in these tables. Instead, we have to modify the data structure using member functions.

Dropping studies and surveys

Imagine we are not interested in the study by Dama et al. (2017), and want to drop it from our analysis. We can do so via the study_id:

s$drop_study("Dama_2017")
#> drop 1 study, 1 survey

Similarly, we may want to drop a specific survey - for example the Asua_2019_Agago_2017 survey:

s$drop_survey("Asua_2019_Agago_2017")
#> drop 0 studies, 1 survey

Looking at the STAVE object, we can see how we have reduced the size of the data:

s
#> Studies: 1
#> Surveys: 4
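The bookkeeping behind these numbers can be mimicked on plain data frames. The sketch below uses base R and only the ID columns from the example data; it imitates the effect of the two drops and is not how STAVE implements them internally.

```r
# Toy tables holding only the ID columns from the example data.
# This mimics the row counts only; it is not STAVE's internal code.
studies <- data.frame(study_id = c("Dama_2017", "Asua_2019"),
                      stringsAsFactors = FALSE)
surveys <- data.frame(
  study_id = c("Dama_2017", rep("Asua_2019", 5)),
  survey_id = c("Dama_2017_Bamako_2014", "Asua_2019_Agago_2017",
                "Asua_2019_Arua_2017", "Asua_2019_Kole_2017",
                "Asua_2019_Lamwo_2017", "Asua_2019_Mubende_2017"),
  stringsAsFactors = FALSE
)

# dropping a study removes its row and every survey that belongs to it
studies <- studies[studies$study_id != "Dama_2017", , drop = FALSE]
surveys <- surveys[surveys$study_id != "Dama_2017", , drop = FALSE]

# dropping a survey removes that survey only
surveys <- surveys[surveys$survey_id != "Asua_2019_Agago_2017", , drop = FALSE]

nrow(studies)
#> [1] 1
nrow(surveys)
#> [1] 4
```

Starting from 2 studies and 6 surveys, the study drop removes 1 study and 1 survey, and the survey drop removes 1 more survey, which matches the printed summary.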

We are free to append this information back in at any time. When we do, it will go through the same rigorous checks as any other append.