Sequence Read Quality Lecture
Overview
Teaching: 20 min
Exercises: 5 min
Questions
How does sequencing work?
Where do the errors come from?
Objectives
Recap sequence data quality
Understanding read data
Sequencing quality
Next up is a presentation on sequence read quality and sequencing methods. The slides are available at https://klif.uu.nl/klif/mgen/MicrobialGenomics_sequencingQuality_Linda.pdf
If you are following this course on your own, you can watch a prerecorded lecture by Professor Bas Dutilh, from Theoretical Biology and Bioinformatics at UU and Jena University in Germany: https://www.youtube.com/watch?v=sdxVDy0lSAE
The lecture will take approximately 20 minutes. After that there is time for questions.
The quality of Nanopore sequencing data can be checked with NanoStat. It generates a quick summary of the number of bases, the number of reads, the length of the reads and the quality of the reads. In general we expect about 30x more bases than the size of the genome and a mean read length of >3 kb. The mean read quality can range between 11 and 18, depending on the sequencing kit, flow cell and basecalling model used.
$ conda activate genomics # check your environment. Most software is in genomics, but some tools have their own environment.
$ conda env list # check which environments you have. If the command line says it can't find a program, activate the correct environment.
$ cd ~/reads
$ NanoStat --fastq barcode02.fastq
$ NanoStat --fastq barcode03.fastq
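To put the quality range mentioned above in perspective: assuming the values NanoStat reports are on the usual Phred scale, a score Q corresponds to an error probability of 10^(-Q/10). The following is a minimal sketch under that assumption; the helper name `phred_to_error` is hypothetical and not part of NanoStat.

```shell
# Hypothetical helper: convert a Phred-scaled quality score to an
# error probability, using P = 10^(-Q/10).
phred_to_error() {
    q="$1"   # quality score, e.g. 11 or 18
    awk -v q="$q" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

# Usage: phred_to_error 11   and   phred_to_error 18
```

Under this assumption, a quality of 11 corresponds to roughly 8% errors per base and a quality of 18 to roughly 1.6%.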
For each sample, record the number of bases, the estimated sequencing depth (number of bases divided by the expected size of the genome) and the mean read length in the Google Docs file.
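The depth estimate is simply the total number of bases divided by the expected genome size. If you want to double-check the arithmetic on the command line, here is a minimal sketch. It assumes a standard FASTQ file with four lines per read; the helper name `estimate_depth` and the genome size in the usage comment are illustrative assumptions, so substitute the expected size for your own organism.

```shell
# Hypothetical helper: estimated depth = total bases / expected genome size.
# Assumes a standard FASTQ file with exactly four lines per read.
estimate_depth() {
    fastq="$1"         # path to a FASTQ file
    genome_size="$2"   # expected genome size in bases (you supply this)
    # Line 2 of every 4-line record holds the sequence; sum the lengths.
    awk -v g="$genome_size" \
        'NR % 4 == 2 { bases += length($0) } END { printf "%.1f\n", bases / g }' \
        "$fastq"
}

# Usage (5500000 is an assumed genome size, not a course value):
#   estimate_depth barcode02.fastq 5500000
```

The total number of bases counted this way should match the figure NanoStat reports for the same file.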
Sometimes the Nanopore output is very large. Filtlong can be used to reduce the amount of read data to a more manageable level. The commands below keep a target of 550,000,000 bases, which reduces the expected coverage to about 100x for a genome of roughly 5.5 Mb. This is not necessary for the example data in this course.
$ cd ~/reads
$ filtlong -t 550000000 barcode02.fastq >filtered.barcode02.fastq
$ filtlong -t 550000000 barcode03.fastq >filtered.barcode03.fastq
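The value passed to `-t` is the number of bases to keep, so for a different genome or a different desired coverage it can be computed as genome size times coverage. A minimal sketch, in which the helper name `target_bases` and the 5.5 Mb genome size are assumptions for illustration:

```shell
# Hypothetical helper: number of bases to keep for a desired coverage.
target_bases() {
    genome_size="$1"   # expected genome size in bases
    coverage="$2"      # desired coverage, e.g. 100 for 100x
    echo $(( genome_size * coverage ))
}

# Usage (assumed 5.5 Mb genome at 100x):
#   filtlong -t "$(target_bases 5500000 100)" barcode02.fastq > filtered.barcode02.fastq
```

With the assumed values this gives 550000000, the same target used in the commands above.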
Key Points
Determining sequence quality of reads