9 `varikondo` and `superFreq`: Standardise and combine variants over time

We are now at the stage where variants are called and we have loads of VCF files/ exel files. The scripts in the previous sections helps with calling variants using 4 callers (VarDict, Varscan, MuTect2, km and freebayes).

At this stage, depending on how one needs to use the variants, there are several options discussed below. To make it easier to import variant calls into R and store them in data frame I developed the R package varikondo which hopefully will help you to find the spark joy in your variants!

Below are some applications but for a more in depth description of its functionalities see the package website https://annaquaglieri16.github.io/varikondo/articles/vignette.html.

9.1 `parse_vcf_output`: read in your single sample VCF file in R

If you simply want to read a VCF file in R you can use parse_vcf_output which currently support tumour-only variants from GATK MuTect2, VarDict, VarScan and Freebayes.

library(varikondo)

annot_vcf_mutect <- system.file("extdata", "chr20_mutect.vcf.gz", package = "varikondo")
annot_vcf_mutect

## [1] "/Users/quaglieri.a/Library/R/3.5/library/varikondo/extdata/chr20_mutect.vcf.gz"

parsed_vcf_mutect <- varikondo::parse_vcf_output(vcf_path = annot_vcf_mutect, 
                                                 caller = "mutect",
                                                 sample_name = "Sample1",
                                                 vep = TRUE)
knitr::kable(parsed_vcf_mutect[1:10,],caption = "Parsed MuTect2 output, from VCF to data frame.")

Table 9.1: Parsed MuTect2 output, from VCF to data frame.
Location	caller	chrom	pos	end	ref	alt	qual	filter	genotype	tot_depth	VAF	alt_depth	alt_forw	alt_rev	start	width	alt_base_quality	SampleName	Allele	Consequence	IMPACT	SYMBOL	Gene	Feature_type	Feature	BIOTYPE	EXON	INTRON	cDNA_position	Existing_variation	DISTANCE	STRAND	FLAGS	VARIANT_CLASS	SYMBOL_SOURCE	HGNC_ID	CANONICAL	CCDS	ENSP	SWISSPROT	TREMBL	UNIPARC	GENE_PHENO	AF	AFR_AF	AMR_AF	EAS_AF	EUR_AF	SAS_AF	MAX_AF	MAX_AF_POPS	IMPACT_rank
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000306058	protein_coding		1/11		rs2208131		1		SNV	HGNC	18318			ENSP00000305119		A6NIZ6	UPI00015DF8B6	1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000375687	protein_coding		1/12		rs2208131		1		SNV	HGNC	18318	YES	CCDS13201.1	ENSP00000364839	Q8IXJ9		UPI000036702C	1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000375689	protein_coding		1/4		rs2208131		1		SNV	HGNC	18318			ENSP00000364841		Q5JWS8	UPI00004A2D4D	1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000497249	protein_coding		1/4		rs2208131		1	cds_start_NF	SNV	HGNC	18318			ENSP00000451216			UPI00021CEF59	1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000542461	protein_coding		1/5		rs2208131		1		SNV	HGNC	18318			ENSP00000438654		Q6P1M8	UPI0000232A47	1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30948281	mutect2	chr20	30948281	30948281	A	G	11.000	PASS	0/1	3	1	3	0	3	30948281	1	66	Sample1	G	non_coding_transcript_exon_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000555343	processed_transcript	1/7		462	rs2208131		1		SNV	HGNC	18318							1	0.4093	0.1672	0.4337	0.744	0.3499	0.4356	0.744	EAS	1
chr20_30953162	mutect2	chr20	30953162	30953162	T	C	15.125	PASS	0/1	4	1	4	4	0	30953162	1	121	Sample1	C	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000306058	protein_coding		1/11		rs6141294		1		SNV	HGNC	18318			ENSP00000305119		A6NIZ6	UPI00015DF8B6	1	0.4087	0.1672	0.4337	0.744	0.3499	0.4325	0.744	EAS	1
chr20_30953162	mutect2	chr20	30953162	30953162	T	C	15.125	PASS	0/1	4	1	4	4	0	30953162	1	121	Sample1	C	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000375687	protein_coding		1/12		rs6141294		1		SNV	HGNC	18318	YES	CCDS13201.1	ENSP00000364839	Q8IXJ9		UPI000036702C	1	0.4087	0.1672	0.4337	0.744	0.3499	0.4325	0.744	EAS	1
chr20_30953162	mutect2	chr20	30953162	30953162	T	C	15.125	PASS	0/1	4	1	4	4	0	30953162	1	121	Sample1	C	intron_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000375689	protein_coding		1/4		rs6141294		1		SNV	HGNC	18318			ENSP00000364841		Q5JWS8	UPI00004A2D4D	1	0.4087	0.1672	0.4337	0.744	0.3499	0.4325	0.744	EAS	1
chr20_30953162	mutect2	chr20	30953162	30953162	T	C	15.125	PASS	0/1	4	1	4	4	0	30953162	1	121	Sample1	C	upstream_gene_variant	MODIFIER	ASXL1	ENSG00000171456	Transcript	ENST00000470145	processed_transcript				rs6141294	1025	1		SNV	HGNC	18318							1	0.4087	0.1672	0.4337	0.744	0.3499	0.4325	0.744	EAS	1

9.2 `superFreq` and `varikodno::import_goi_superfreq` to combine SNVs/short insertions over time

If you have data collected for the same patient over time you can use software like superFreq to track changing clones over the course of treatment for example. Figure 1.1 shows the input necessary for superFreq and full documentation/manuscript/examples are available on GitHub https://github.com/ChristofferFlensburg/superFreq. The chunk below is an example taken from one of the runs that I did for RNA samples, setting the mode = RNA option. There main input for superFreq are: VCF files (in my case normally from VarScan); bamfiles (including Panel Of Normal (PON) samples); and metadata information (see example in Table ??).

superFreq will create two main output folder specified when running the function: plots = './superFreq/plots' and rdir = './superFreq/R'.

You can then use varikondo::import_goi_superfreq to variants found on genes of interest into R!

import_sf <- varikondo::import_goi_superfreq(superFreq_R_path = superFreq_R_path,
                                 superFreq_meta_path = superFreq_meta_path,
                                 studyGenes,
                                 patientID = "D1",
                                 ref_genome = "hg38",
                                 VAFcut = 0.15)

9.3 `import_any`: combine variants over time for any caller

In the pipeline (Figure 1.1) that I applied to the RNA-Seq cohorts that I was involved with, I cann INDELs with VarDict and km for every single sample independently and then filter and curate the results. superFreq only supports point mutations and short INDELs which is why I won’t use for more complex INDELs reported in VCF files by VarDict. My strategy to combine and filter variants within the same patient is to filter them simultaneusly and to retain only those variants which passes specific threshold at one time points. The rational being here that if a variant is too low to be independently detected at one time point it might still be kept if it is present at a good level at a different time.

To accomplish this, I use the varikondo::import_any function which takes in input a data frame, as created by import_vcf_output and filter variants by patient. Here is an example on how to jointly use import_vcf_output and import_any to obtain a final filtered and combine version of VarDict calls for one patients with samples over time.

9 varikondo and superFreq: Standardise and combine variants over time

9.1 parse_vcf_output: read in your single sample VCF file in R

9.2 superFreq and varikodno::import_goi_superfreq to combine SNVs/short insertions over time

9.3 import_any: combine variants over time for any caller

9 `varikondo` and `superFreq`: Standardise and combine variants over time

9.1 `parse_vcf_output`: read in your single sample VCF file in R

9.2 `superFreq` and `varikodno::import_goi_superfreq` to combine SNVs/short insertions over time

9.3 `import_any`: combine variants over time for any caller