FastQC

FastQC is a widely used tool for assessing the quality of raw and processed sequencing data. It reports metrics such as per-base quality scores, GC content, and adapter contamination.

fastqc $args --threads $task.cpus $reads

Options:

  • <file{R1,R2}.fastq>: Input FASTQ files (gzip-compressed files, e.g., file1.fastq.gz, are also supported).

  • -o <output_directory>: Specify the directory where reports will be saved. The directory must already exist, because FastQC does not create it. If omitted, each report is written to the same directory as its input file.

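  • -t <number_of_threads>: Specify the number of files that can be processed simultaneously. Each thread handles one file, so additional threads only help when several input files are supplied.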
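
The options above can be combined in a small wrapper. This is a minimal sketch, not the pipeline's own code: run_fastqc is a hypothetical helper name, and the file names in the usage comment are assumed. It creates the output directory first, since FastQC expects it to exist.

```shell
# Sketch: run FastQC with an explicit output directory and thread count.
# run_fastqc is a hypothetical helper name, not part of FastQC.
run_fastqc() {
    outdir="$1"
    threads="$2"
    shift 2
    # FastQC does not create the output directory, so create it first.
    mkdir -p "$outdir" || return 1
    # Remaining arguments are the input FASTQ files (plain or gzip-compressed).
    fastqc -o "$outdir" -t "$threads" "$@"
}

# Example call (assumed file names):
# run_fastqc qc_reports 2 sample_R1.fastq.gz sample_R2.fastq.gz
```

With two paired-end files, -t 2 lets both files be processed at the same time.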

Interpreting FastQC Results

For each input file, FastQC generates:

  1. HTML Report: Visual summary of the quality metrics.

  2. ZIP File: Contains the report together with the plot images and the underlying data as text files (summary.txt and fastqc_data.txt).
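
The ZIP file is convenient for scripted checks. As a rough sketch, the summary.txt inside the archive lists one module per line in three tab-separated columns (status, module name, file name), so modules that did not pass can be listed with awk. flagged_modules is a hypothetical helper name, and the archive name in the usage comment is assumed.

```shell
# Sketch: list the modules that FastQC marked WARN or FAIL.
# flagged_modules is a hypothetical helper name, not part of FastQC.
# Input: a summary.txt file with tab-separated columns
#        status (PASS/WARN/FAIL), module name, file name.
flagged_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 ": " $2 }' "$1"
}

# Example (assumed archive name); unzip -p writes the file to stdout:
# unzip -p sample_R1_fastqc.zip sample_R1_fastqc/summary.txt > summary.txt
# flagged_modules summary.txt
```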

Key metrics in the HTML report:

  • Per Base Sequence Quality:

    • Boxplots showing quality scores across all positions in reads.

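    • The plot background is green for high-quality scores (Q28 and above), orange for reasonable quality (Q20 to Q28), and red for poor quality (below Q20).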

  • Per Sequence Quality Scores:

    • Distribution of mean quality scores per read, which shows whether a subset of reads has uniformly low quality.

  • Per Sequence GC Content:

    • Distribution of GC content across all reads, compared with a modeled normal distribution.

  • Adapter Content:

    • Shows the cumulative percentage of reads in which known adapter sequences have been seen at each position.

  • Overrepresented Sequences:

    • Identifies frequently occurring sequences (e.g., adapters or contaminants).
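
The same metrics can be read from fastqc_data.txt in the ZIP file. The following is a minimal sketch, assuming the usual layout in which each module starts with a ">>Module name" line, ends with ">>END_MODULE", and the Per base sequence quality table has the base position in column 1 and the mean quality in column 2. low_quality_positions is a hypothetical helper name, and the default threshold of 28 simply mirrors the green band of the plot.

```shell
# Sketch: print read positions whose mean quality is below a threshold.
# low_quality_positions is a hypothetical helper name, not part of FastQC.
# $1: path to fastqc_data.txt, $2: minimum mean quality (default 28).
low_quality_positions() {
    awk -F '\t' -v min="${2:-28}" '
        # Enter the module of interest and skip its header line.
        /^>>Per base sequence quality/ { in_module = 1; next }
        # Any module end marker leaves the module.
        /^>>END_MODULE/ { in_module = 0 }
        # Inside the module, skip "#" column headers and compare the mean.
        in_module && !/^#/ && $2 + 0 < min { print $1 "\t" $2 }
    ' "$1"
}

# Example (assumed path):
# low_quality_positions sample_R1_fastqc/fastqc_data.txt 28
```

Positions are printed as they appear in the file, so grouped ranges such as 3-4 are reported as a single row.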
