1 Setup

This is an example workflow from SRR files to Variant calling using modular functions written in R and bash.

git clone git@github.com:annaquaglieri16/RNA-seq-variant-calling.git

cd ./RNA-seq-variant-calling

All the functions used for the variant calling and downsampling pipeline are inside the ./functions folder.

If you want to download sample FASTQ files or learn how to download FASTQ files from GEO go to Section 2.
If you already have the FASTQ files and YOU WANT TO randomly downsample your samples to a fix number of reads go to Section 3.
If you already have the FASTQ files and you don’t need to perform quality control or downsampl your files go to Section 6.
If you already have the BAM files and you want to call variants go to Section 7.

1.1 Overview

Figure 1.1 below offers an overview of the pipeline that I applied to several of the cancer RNA-Seq samples that I worked with. However, the current book mentions other callers not displayed in the figure.

Figure 1.1: Overview of the variant calling pipeline that I used used for several cancer RNA-Seq data.

The sofwtare mentioned in Figure 1.1 are mentioned throughout the book and cited below:

GATK (McKenna et al. 2010)
VarScan (Koboldt et al. 2012)
superFreq (Flensburg et al. 2018)
VarDict (Lai et al. 2016)
km (Eric Olivier Audemard, Patrick Gendron, Vincent-Philippe Lavallée, Josée Hébert, Guy Sauvageau, Sébastien Lemieux 2018)
VEP (McLaren et al. 2016)
varikondo

The pre-processing steps in Figure 1.1 are also summarised in Figure 1.2 and discussed in the sections below. The majority of the pre-processing steps are taken from the GATK best practices for RNA-Seq variant calling.

Figure 1.2: Bamfile pre-processing.

1.2 Disclaimer

The following workflow was built in a modular way and it is not wrapped up into a pipeline manager. I aknowledge the limitations and non-user-friendliness of some steps. However, it offers a comprehensive view of several tools and steps used for variant calling in RNA-Seq as well as general tools used in any bioinformatics pipeline.

From FASTQ files to Variant Calling for RNA-Seq

From FASTQ files to Variant Calling for RNA-Seq

1 Setup

1.1 Overview

1.2 Disclaimer