Sequence Read Quality Lecture

Overview

Teaching: 20 min
Exercises: 5 min
Questions
  • How does sequencing work?

  • Where do the errors come from?

Objectives
  • Recap sequence data quality

  • Understand read data

Sequencing quality

Next up is a presentation on sequence read quality and sequencing methods. The slides are available at https://klif.uu.nl/klif/mgen/MicrobialGenomics_sequencingQuality_Linda.pdf .

If you are following this course on your own, you can watch a prerecorded lecture by Professor Bas Dutilh, from Theoretical Biology and Bioinformatics at UU and Jena University in Germany: https://www.youtube.com/watch?v=sdxVDy0lSAE

The lecture will take approximately 20 minutes. After that there is time for asking questions.

The quality of Nanopore sequencing data can be checked with NanoStat. It generates a quick summary of the number of bases, the number of reads, the length of the reads and the quality of the reads. In general we expect about 30x more bases than the size of the genome and a mean read length of more than 3 kb. The mean read quality can range between 11 and 18, depending on the sequencing kit, flow cell and basecalling model used.

$ conda activate genomics # activate your environment. Most software is in genomics, but some tools have their own environment.
$ conda env list # check which environments you have. If the command line says it can't find a program, activate the correct environment.
$ cd ~/reads
$ NanoStat --fastq barcode02.fastq
$ NanoStat --fastq barcode03.fastq
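To get a feeling for what a quality of 11 or 18 means, the score can be converted to an expected error rate. The sketch below assumes the quality NanoStat reports is a Phred-scaled score Q, for which the error probability is 10^(-Q/10). The function name phred_to_error is made up for this illustration and is not part of NanoStat.

```shell
# Convert a Phred quality score Q into an error probability: P = 10^(-Q/10).
# phred_to_error is a hypothetical helper, not a NanoStat command.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_error 11   # lower end of the range mentioned above
phred_to_error 18   # upper end of the range mentioned above
```

Under this assumption a quality of 11 corresponds to roughly 8 errors per 100 bases, and a quality of 18 to roughly 1.6 errors per 100 bases.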

Record the number of bases, the estimated sequencing depth (number of bases divided by the expected size of the genome) and the mean read length in the Google Docs file.
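The depth estimate is a simple division, which can be done on the command line. This is a minimal sketch: the function name estimate_depth and both numbers in the example call are placeholders, not values from the course data. Replace them with the number of bases reported by NanoStat and the expected genome size of your organism.

```shell
# Estimated sequencing depth = total number of bases / expected genome size.
# estimate_depth is a hypothetical helper; the numbers below are placeholders.
estimate_depth() {
    total_bases=$1
    genome_size=$2
    awk -v b="$total_bases" -v g="$genome_size" 'BEGIN { printf "%.1f\n", b / g }'
}

# Example: 165 Mb of reads for an assumed genome of 5.5 Mb
estimate_depth 165000000 5500000   # prints 30.0
```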

Sometimes a Nanopore run yields much more data than is needed. Filtlong can be used to reduce the amount of read data to more manageable levels. The commands below keep at most 550 Mb of reads per sample, which reduces the expected coverage to about 100x. This is not necessary for the example data in this course.

$ cd ~/reads
$ filtlong -t 550000000 barcode02.fastq >filtered.barcode02.fastq
$ filtlong -t 550000000 barcode03.fastq >filtered.barcode03.fastq
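The value passed to -t is the number of bases to keep, so it can be worked out as the desired coverage multiplied by the expected genome size. The sketch below shows that arithmetic. The function name target_bases is made up for this example, and the genome size of 5.5 Mb is an assumption chosen because it gives the 550000000 used above at 100x coverage.

```shell
# Target number of bases = desired coverage x expected genome size.
# target_bases is a hypothetical helper; 5500000 is an assumed genome size.
target_bases() {
    coverage=$1
    genome_size=$2
    echo $(( coverage * genome_size ))
}

target_bases 100 5500000   # prints 550000000
```

For a different organism, fill in its expected genome size and pass the result to filtlong -t.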

Key Points

  • Determining sequence quality of reads