library(hierfstat)
library(vcfR)
Analysis tutorial
First we will load the libraries we will need:
We’re going to be using WGS data from Vietnam which was generated for the Pf3k Project. First load in the metadata containing the sample names and information on where the samples were collected.
<-read.csv("../../data/wgs/pf3k/Vietnam/pf3k.metadata.Vietnam.csv") meta
Then we can load in the genetic data from VCF format.
<-read.vcfR("../../data/wgs/pf3k/Vietnam/SNP_INDEL_Pf3D7_ALL_v3.combined.filtered.vqslod6.biallelic_snp.Vietnam.vcf.gz") vcf
Next we extract the genotypes and convert them into a dosage format.
<-extract.gt(vcf)
vcf_gts
<-function(x){
recode_gt<-gsub("[|]","/",x)
x<-gsub("0/0",0,x)
x<-gsub("0/1",1,x)
x<-gsub("1/1",2,x)
xas.numeric(x)
}
<-apply(vcf_gts,MARGIN = 2,function(x){recode_gt(x)}) raw_gts
Now we can remove loci which are monomorphic.
<-function(x){
calculate_afsum(x,na.rm = T)/length(x)*2
}<- apply(raw_gts,MARGIN = 1,calculate_af)
af <-as.data.frame(t(raw_gts[which(af>0 & af<1),])) transposed_gts
We now have a matrix where the samples are rows and the columns are loci. Hierfstat requires you to also provide population assignments for each sample by adding it to the data as the first column.
<-cbind(meta$site,transposed_gts) dat
Now we are ready to calculate some statistics!
<-basic.stats(dat)
results$overall results
Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest
0.0674 0.0635 0.0639 0.0003 0.0640 0.0005 0.0052 0.0078 -0.0601 0.0005
The results$overall
table contains basic statistics averaged over loci. The statistics presented are defined in eq.7.38– 7.43 pp.164–5 of Nei (1987).
Summary
We loaded in data from VCF format, converted this to dosage and finally calculated basic statistics.