Reading in Data
This tutorial covers the following topics:
- Creating a new STAVE object
- Loading data into this object
- Viewing the loaded data
- Dropping studies and surveys by ID
Reading in data
STAVE works via a single class (an R6 class) that acts as the main data container. This class allows users to import, store, and manipulate data efficiently through specialized member functions.
A new object can be created and data read in like this:
```r
library(STAVE)

# create a new, empty STAVE object
s <- STAVE_object$new()

# append data using a member function
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)
#> data correctly appended
```

All three data frames must follow the specific format required by STAVE. See the "How it works" sections if you are unsure about this format. If your data do not conform to this structure, the append will be rejected.
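Before you prepare your own input, it can help to look at how the bundled `example_input` is laid out. The following minimal sketch uses only base R (`names()`, `lapply()`, `str()`). It assumes, as the call above suggests, that `example_input` is a list holding the three tables under `$studies`, `$surveys` and `$counts`. It only inspects the data and does not check anything against STAVE's own format rules.

```r
library(STAVE)

# assumption: example_input is a list with elements studies, surveys, counts
names(example_input)

# column names of each of the three input tables
lapply(example_input, names)

# compact overview of column types and first values in the counts table
str(example_input$counts)
```

You can run the same calls on your own data frames and compare the column names side by side before calling `append_data()`.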
The default print method tells us how many studies and surveys are loaded:

```r
s
#> Studies: 2
#> Surveys: 6
```

Using a custom class offers several advantages. Once loaded, all data remain consolidated within a single object rather than being scattered across separate variables. The class also encapsulates the data, meaning they cannot be edited directly by the user. This built-in protection minimizes the risk of accidental data corruption.
We can view the loaded tables using the get functions:

```r
s$get_studies()
```

| study_id | study_label | description | access_level | contributors | reference | reference_year | PMID |
|---|---|---|---|---|---|---|---|
| Dama_2017 | Reduced ex vivo susceptibility of Plasmodium falciparum after oral artemether-lumefantrine treatment in Mali | NA | public | Dama et al. | https://pubmed.ncbi.nlm.nih.gov/28148267/ | 2017 | 28148267 |
| Asua_2019 | Changing Molecular Markers of Antimalarial Drug Sensitivity across Uganda | NA | public | Asua et al. | https://pubmed.ncbi.nlm.nih.gov/30559133/ | 2019 | 30559133 |

```r
s$get_surveys()
```

| study_id | survey_id | country_name | site_name | latitude | longitude | location_method | location_notes | collection_start | collection_end | collection_day | time_method | time_notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dama_2017 | Dama_2017_Bamako_2014 | Mali | Koulikoro | 12.612900 | -8.13560 | WWARN lat and long | NA | 2014-01-01 | 2014-12-31 | 2014-07-02 | automated midpoint | NA |
| Asua_2019 | Asua_2019_Agago_2017 | Uganda | Agago | 2.984722 | 33.33055 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA |
| Asua_2019 | Asua_2019_Arua_2017 | Uganda | Arua | 3.030000 | 30.91000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA |
| Asua_2019 | Asua_2019_Kole_2017 | Uganda | Kole | 2.428611 | 32.80111 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA |
| Asua_2019 | Asua_2019_Lamwo_2017 | Uganda | Lamwo | 3.533333 | 32.80000 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA |
| Asua_2019 | Asua_2019_Mubende_2017 | Uganda | Mubende | 0.557500 | 31.39500 | WWARN lat and long | NA | 2017-01-01 | 2017-12-31 | 2017-07-02 | automated midpoint | NA |

```r
s$get_counts()
```

| study_id | survey_id | variant_string | variant_num | total_num | notes |
|---|---|---|---|---|---|
| Dama_2017 | Dama_2017_Bamako_2014 | crt:76:T | 130 | 170 | NA |
| Dama_2017 | Dama_2017_Bamako_2014 | mdr1:86:Y | 46 | 158 | NA |
| Asua_2019 | Asua_2019_Agago_2017 | k13:469:Y | 42 | 42 | NA |
| Asua_2019 | Asua_2019_Agago_2017 | k13:675:V | 42 | 42 | NA |
| Asua_2019 | Asua_2019_Arua_2017 | k13:675:V | 43 | 43 | NA |
| Asua_2019 | Asua_2019_Kole_2017 | k13:469:Y | 47 | 47 | NA |
| Asua_2019 | Asua_2019_Kole_2017 | k13:675:V | 47 | 47 | NA |
| Asua_2019 | Asua_2019_Lamwo_2017 | k13:469:Y | 43 | 43 | NA |
| Asua_2019 | Asua_2019_Lamwo_2017 | k13:675:V | 43 | 43 | NA |
| Asua_2019 | Asua_2019_Mubende_2017 | k13:469:F | 45 | 45 | NA |

However, we cannot modify the data in these tables directly. Instead, any changes must be made through the object's member functions.
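To see the encapsulation in action, the sketch below edits the data frame returned by `s$get_studies()` and then asks the object for its studies again. It assumes the get functions return an ordinary R data frame, which is copied when modified. Under that assumption the edit changes only the local copy (`studies_copy` is just an illustrative name), and the table held inside the object is untouched.

```r
library(STAVE)

s <- STAVE_object$new()
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)

# get_studies() hands back a data frame; editing it changes only this copy
studies_copy <- s$get_studies()
studies_copy$study_label[1] <- "edited label"

# the table stored inside the object keeps its original label
s$get_studies()$study_label[1]
```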
Dropping studies and surveys
Imagine we are not interested in the study by Dama et al. (2017) and want to drop it from our analysis. We can do so using its study_id:

```r
s$drop_study("Dama_2017")
#> drop 1 study, 1 survey
```

Similarly, we may want to drop a single survey, for example the Asua_2019_Agago_2017 survey:
```r
s$drop_survey("Asua_2019_Agago_2017")
#> drop 0 studies, 1 survey
```

Looking at the STAVE object, we can see how much the data have been reduced:
```r
s
#> Studies: 1
#> Surveys: 4
```

We are free to append the dropped data back in at any point, and they will then go through the usual rigorous checks.
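Here is a minimal sketch of re-appending. The code rebuilds the object, drops Dama_2017 as above, and then passes only that study's rows back to `append_data()`. It assumes each input table in `example_input` has a `study_id` column, as the retrieved tables above do. The variable `keep` is a hypothetical name used only for illustration.

```r
library(STAVE)

s <- STAVE_object$new()
s$append_data(studies_dataframe = example_input$studies,
              surveys_dataframe = example_input$surveys,
              counts_dataframe = example_input$counts)
s$drop_study("Dama_2017")

# hypothetical helper variable: the study we want back
keep <- "Dama_2017"

# pass only this study's rows from each input table; they are checked as usual
s$append_data(
  studies_dataframe = example_input$studies[example_input$studies$study_id == keep, ],
  surveys_dataframe = example_input$surveys[example_input$surveys$study_id == keep, ],
  counts_dataframe  = example_input$counts[example_input$counts$study_id == keep, ]
)

# the study ID should be listed again
keep %in% s$get_studies()$study_id
```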