5 FastQC and adapter trimming
fastqc (Andrews 2010) can be used for QC of the FASTQ files.
fastqc ../data/SRR1608610_1.fastq.gz --outdir ./data/
5.1 Parallelise your FastQC
This is just one example of how to run fastqc on several FASTQ files using parallel (Tange, n.d.).
find ./data -name "*.fastq.gz" > ./data/fastq_files.txt
cat ./data/fastq_files.txt | parallel -j 2 "fastqc {} --outdir ./data"
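After a parallel run it is worth checking that every FASTQ file actually produced a report. The sketch below is just one way to do it: it assumes FastQC's default output naming (sample_1.fastq.gz gives sample_1_fastqc.html) and that the reports sit in the directory you passed to --outdir. The function name check_fastqc_reports is my own and not part of FastQC.

```shell
# Sketch: print every FASTQ file that has no matching FastQC HTML report.
# Assumes FastQC's default naming: <name>.fastq.gz -> <name>_fastqc.html
# check_fastqc_reports is a hypothetical helper, not a FastQC command.
check_fastqc_reports() {
    fastq_dir="$1"    # directory searched (recursively) for *.fastq.gz
    report_dir="$2"   # directory that was given to fastqc --outdir
    find "$fastq_dir" -name "*.fastq.gz" | sort | while IFS= read -r fq; do
        # Strip the directory and the .fastq.gz suffix to get the report prefix
        sample=$(basename "$fq" .fastq.gz)
        if [ ! -f "$report_dir/${sample}_fastqc.html" ]; then
            echo "$fq"
        fi
    done
}

# Usage with the directories from the commands above
if [ -d ./data ]; then
    check_fastqc_reports ./data ./data
fi
```

An empty output means every file has a report; any path that is printed can be re-run through fastqc on its own.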
5.2 Summarise reports with MultiQC
I strongly suggest having a look at MultiQC (Ewels et al. 2016), which allows you to combine the results of multiple samples into one compact document. You can check the programs whose output is supported by MultiQC (loads!!).
multiqc ./data/ --interactive -n "FASTQC_summary" -o ./data/
The FastQC reports offer a variety of measures, which you can use to decide whether to discard some samples or to trim adapters if necessary. Trimmomatic and Trim Galore! can be used for trimming.
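As a rough illustration of the trimming step, here is a minimal sketch with Trim Galore! on one paired-end sample. The read 1 file is the one used earlier in this section; the name of its mate (ending in _2.fastq.gz) and the output directory ./data/trimmed are assumptions of mine, so adjust them to your data. The command is only run if trim_galore and both input files are found.

```shell
# Sketch: adapter trimming of one paired-end sample with Trim Galore!.
# The read 2 file name and the output directory are assumptions.
r1="../data/SRR1608610_1.fastq.gz"
r2="${r1%_1.fastq.gz}_2.fastq.gz"   # assumed mate file: ..._2.fastq.gz
out_dir="./data/trimmed"            # hypothetical output directory

if command -v trim_galore >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
    mkdir -p "$out_dir"
    # --paired trims both mates together and keeps them in sync;
    # --fastqc runs FastQC again on the trimmed reads
    trim_galore --paired --fastqc --output_dir "$out_dir" "$r1" "$r2"
else
    echo "trim_galore or input files not found; nothing was run" >&2
fi
```

Running FastQC again on the trimmed reads (here via --fastqc) makes it easy to compare the adapter content before and after trimming, for example in a new MultiQC report.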
5.3 Adapter troubles: STAR vs Subread
I suggest looking at one of my previous analyses of adapters with STAR and Subread, since adapters can cause serious trouble with STAR's default settings! For this reason I strongly suggest looking at the fragment size distribution across samples once you have aligned your FASTQ files. Unusual behaviour can help you spot problems with the adapter or alignment steps, as I highlighted in the post. I normally use Picard's CollectMultipleMetrics to extract fragment sizes from paired-end (PE) BAM files. See Section 6.4 for more details.