Overview
TwistMethNext integrates various tools and custom scripts to provide a comprehensive analysis workflow for Twist NGS Methylation data.
Generate Reference Genome
Generates reference genome index files for Bismark.
Raw Data QC
Performs quality control checks on raw sequence data.
Adapter Trimming
Trims adapters and low-quality bases from the reads.
Align Reads
Aligns bisulfite-treated reads to a reference genome.
Duplicate Removal
Removes PCR duplicates from the aligned reads.
Sorting and Indexing
Sorts and indexes the aligned and deduplicated BAM files.
Extract Methylation Calls
Extracts methylation calls from the aligned reads.
Summary Report
Generates a summary report of the Bismark alignment and methylation extraction.
Alignment QC
Generates quality control metrics for the aligned reads.
QC Reporting
Aggregates quality control reports from various steps into a single report.
Differential Methylation
Performs differential methylation analysis using the EdgeR and MethylKit packages.
Post Processing
Generates summary statistics and visualizations of the differential methylation results.
GO Analysis
Generates a GOChord diagram from the gene ontology analysis results.
Feature Details
READ_PROCESSING: Checks the quality of raw sequencing data and trims low-quality bases and adapters to improve downstream analysis.
BISMARK_ANALYSIS: Aligns bisulfite-converted reads to a reference genome, identifies and removes PCR duplicates, sorts and indexes the aligned reads, performs quality control on alignments, and extracts methylation information from the aligned reads.
QC_REPORTING: Compiles quality control metrics from various steps into a comprehensive report for easy interpretation.
DIFFERENTIAL_METHYLATION:
EDGER_ANALYSIS:
Takes coverage files, a design file, and comparison information as input.
Performs differential methylation analysis using the EdgeR Bioconductor package.
Outputs CSV files with differential methylation results for each group comparison.
METHYLKIT_ANALYSIS:
Takes coverage files, a design file, and comparison information as input.
Performs differential methylation analysis using the MethylKit R package.
Outputs CSV files with differential methylation results for each group comparison.
POST_PROCESSING:
Reads the EdgeR results and generates:
Summary statistics (total DMRs, hyper/hypomethylated regions, significant DMRs)
Volcano plot (visualizing fold change vs. significance)
MA plot (visualizing intensity vs. fold change)
Functional Analysis:
Reads the EdgeR/MethylKit results and generates:
A list of the genes corresponding to the top n results, which is used for the gene ontology analysis.
A CSV file with the GO classification results (Biological Processes only).
A chord diagram of the top 10 results from the GO analysis.
Read processing
The read processing subworkflow includes:
FastQC: quality checks of the samples
Trim Galore: adapter and quality trimming
FastQC
FASTQC is a widely used tool for assessing the quality of raw and processed sequencing data. It provides a comprehensive quality check, including metrics like per-base quality scores, GC content, and adapter contamination.
General Options
* `<file{R1,R2}.fastq>`: Input FASTQ files (gzip-compressed files such as `file1.fastq.gz` are also supported).
* `-o <output_directory>`: Directory where the reports are saved. The directory must already exist; if the option is omitted, each report is written to the directory of its input file.
* `-t <number_of_threads>`: Number of threads; FastQC processes up to this many files in parallel.
??? note "FASTQC Results"
    1. HTML report: visual summary of the quality metrics.
    2. ZIP file: contains the raw data used to generate the report.
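As a rough illustration of the options above, the following Python sketch assembles a FastQC command as an argument list (suitable for `subprocess.run`) without executing it. The helper `fastqc_command` and the file names are hypothetical; the pipeline builds its own commands internally.

```python
import shlex
from typing import List, Optional


def fastqc_command(fastq_files: List[str], outdir: Optional[str] = None,
                   threads: int = 1) -> List[str]:
    """Assemble (but do not run) a FastQC command as an argument list."""
    cmd = ["fastqc"]
    if outdir is not None:
        # The directory must already exist; without -o, FastQC writes next to each input file.
        cmd += ["-o", outdir]
    if threads > 1:
        # FastQC processes up to this many files at the same time.
        cmd += ["-t", str(threads)]
    cmd += fastq_files
    return cmd


cmd = fastqc_command(["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
                     outdir="fastqc_out", threads=2)
print(shlex.join(cmd))
# fastqc -o fastqc_out -t 2 sample_R1.fastq.gz sample_R2.fastq.gz
```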
Trim Galore
Trim Galore is a versatile tool for trimming sequencing reads and removing adapter sequences. It’s particularly useful for preparing raw sequencing data for downstream applications like alignment or differential expression/methylation analysis. Trim Galore combines the functionalities of Cutadapt and FastQC for quality control and trimming.
General Options
* `-q <quality>`: Trim low-quality bases (below this Phred score) from the 3' end of reads. Default is `20`.
* `--length <min_length>`: Discard reads shorter than the specified length after trimming.
* `--adapter <sequence>`: Specify a custom adapter sequence. By default, Trim Galore auto-detects the adapter.
* `--gzip`: Compress the output files in `.gz` format.
* `--fastqc`: Run FastQC on the trimmed files once trimming is complete.
* `--cores <number>`: Use multiple cores for faster processing.
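A minimal sketch of how these options combine into a single Trim Galore call, again building the argument list without running it. The helper name and file names are hypothetical; `--paired`, which is not listed above, is Trim Galore's switch for trimming both mates of a paired-end sample together.

```python
import shlex
from typing import List, Optional


def trim_galore_command(read1: str, read2: Optional[str] = None, quality: int = 20,
                        min_length: Optional[int] = None, adapter: Optional[str] = None,
                        gzip: bool = True, fastqc: bool = True, cores: int = 1) -> List[str]:
    """Assemble (but do not run) a Trim Galore command as an argument list."""
    cmd = ["trim_galore", "-q", str(quality)]  # Phred quality cutoff (tool default: 20)
    if min_length is not None:
        cmd += ["--length", str(min_length)]
    if adapter is not None:
        cmd += ["--adapter", adapter]  # otherwise the adapter is auto-detected
    if gzip:
        cmd.append("--gzip")
    if fastqc:
        cmd.append("--fastqc")  # FastQC on the trimmed output
    if cores > 1:
        cmd += ["--cores", str(cores)]
    if read2 is not None:
        cmd += ["--paired", read1, read2]  # both mates trimmed together
    else:
        cmd.append(read1)
    return cmd


print(shlex.join(trim_galore_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", cores=4)))
```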
Bismark Analysis
Reference Genome Preparation
Bismark requires a bisulfite-converted index of the reference genome.
In this pipeline, the user can provide `genome.fasta` and the pipeline will index it. Optionally, the user can provide the index files directly, and the pipeline will use them without indexing the genome again.
General Options
* `--verbose`: Prints detailed output during the indexing process.
* `--parallel <threads>`: Uses multiple threads to speed up genome preparation.
* `--bowtie2`: Specifies that Bowtie2 will be used for alignment (the default option in most versions).
* `--path_to_bowtie <path>`: Specifies the path to the Bowtie installation if it is not in your `PATH`.
??? note "Reference Genome Preparation"
    After successful completion, Bismark generates a bisulfite-converted genome in two orientations (C->T and G->A) along with the Bowtie/Bowtie2 indices.
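To show what the two converted genomes contain, here is a minimal Python sketch of in-silico bisulfite conversion on a short sequence. Bismark performs this conversion on the whole FASTA and then builds an index for each converted copy; the helper `bisulfite_convert` is hypothetical and only illustrates the idea.

```python
from typing import Dict


def bisulfite_convert(seq: str) -> Dict[str, str]:
    """Return fully C->T and G->A converted copies of a genomic sequence."""
    seq = seq.upper()
    return {
        # Every C becomes T, so reads can align regardless of methylation state.
        "C->T": seq.replace("C", "T"),
        # The complementary conversion (C->T on the reverse strand) appears as G->A here.
        "G->A": seq.replace("G", "A"),
    }


print(bisulfite_convert("ACGTTCGA"))
# {'C->T': 'ATGTTTGA', 'G->A': 'ACATTCAA'}
```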
Bismark Alignment
This step aligns bisulfite-treated sequencing reads to a reference genome.
General Options
* `--genome`: Path to the reference genome directory prepared with `bismark_genome_preparation`.
* `-1` and `-2`: Specify the paired-end read files. For single-end reads, give the FASTQ file(s) without `-1`/`-2`.
* `-o`: Output directory for the alignment files.
??? note "Bismark Alignment"
    * Produces `.bam` alignment files.
    * Produces a `.report.txt` file and an `unmapped_reads.fq.gz` file.
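A minimal sketch of a paired-end Bismark call built from the options above, without executing it. The helper and the paths are hypothetical; the read names assume Trim Galore's paired-end output naming (`_val_1`/`_val_2`).

```python
import shlex
from typing import List, Optional


def bismark_paired_command(genome_dir: str, read1: str, read2: str,
                           outdir: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) a paired-end Bismark alignment command."""
    cmd = ["bismark", "--genome", genome_dir]
    if outdir is not None:
        cmd += ["-o", outdir]  # output directory for BAM and report files
    cmd += ["-1", read1, "-2", read2]  # mate 1 and mate 2 FASTQ files
    return cmd


cmd = bismark_paired_command("reference/", "sample_R1_val_1.fq.gz",
                             "sample_R2_val_2.fq.gz", outdir="bismark_out")
print(shlex.join(cmd))
```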
Bismark Deduplication
This step removes duplicate reads to avoid overestimating methylation levels.
General options:
* `args`: Any additional arguments are passed through to the `deduplicate_bismark` command.
* `--paired`: Use this for paired-end data; omit it for single-end reads.
* `--bam`: Writes the output in BAM format. The input is the BAM file generated by the Bismark alignment step.
??? note "Bismark Deduplicate Removal"
    * Generates a deduplicated `.bam` file.
    * Produces a `deduplicated_report.txt` file.
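The idea behind position-based duplicate removal can be sketched with a simplified single-end model: alignments that share chromosome, start position, and strand are treated as PCR copies and only one is kept. This is an illustration under that assumption, not the code of `deduplicate_bismark`, which works directly on BAM files and, for paired-end data, also takes the mate's position into account. The `Alignment` type is hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    chrom: str
    start: int
    strand: str  # "+" or "-"


def remove_duplicates(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep only the first alignment seen at each (chromosome, start, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand)
        if key not in seen:  # later reads with the same key are treated as duplicates
            seen.add(key)
            kept.append(aln)
    return kept


reads = [
    Alignment("r1", "chr1", 1000, "+"),
    Alignment("r2", "chr1", 1000, "+"),  # duplicate of r1
    Alignment("r3", "chr1", 1000, "-"),  # same position, other strand: kept
]
print([a.read_id for a in remove_duplicates(reads)])  # ['r1', 'r3']
```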
Bismark Methylation Extractor
Extract methylation data from deduplicated BAM files.
General options:
* `--bedGraph`: Generates a bedGraph file.
* `--gzip`: Compresses the output files.
??? note "Bismark Methylation Extractor"
    * Generates `.bismark.cov.gz` files and methylation call data in CpG, CHG, and CHH contexts.
    * Generates a `bedGraph.gz` file.
    * Also generates a `splitting_report.txt` file.
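Bismark stores per-base methylation calls for each alignment in the `XM` tag, using `z`/`Z` for CpG, `x`/`X` for CHG, `h`/`H` for CHH, and `u`/`U` for unknown context, with uppercase meaning methylated and `.` meaning no cytosine call. The extractor splits these calls by context; the hypothetical helper below only tallies one read's `XM` string to show how the letters map.

```python
from collections import Counter
from typing import Dict, Tuple

# XM-tag letters: lowercase = unmethylated, uppercase = methylated.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def count_calls(xm_tag: str) -> Dict[Tuple[str, str], int]:
    """Count methylation calls per (context, state) in one read's XM string."""
    counts: Counter = Counter()
    for ch in xm_tag:
        context = CONTEXTS.get(ch.lower())
        if context is None:
            continue  # '.' marks a position that is not a cytosine call
        state = "methylated" if ch.isupper() else "unmethylated"
        counts[(context, state)] += 1
    return dict(counts)


print(count_calls("..Z..z..h..X.."))
```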
Bismark Report
Generate a summary report of alignment and methylation statistics.
??? note "Bismark Report"
    Produces an HTML file summarizing:

    * Alignment efficiency.
    * Duplicate rates.
    * Methylation levels (CpG, CHG, CHH contexts).
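The report's headline figures are simple percentages. The sketch below uses hypothetical counts for one sample and the conventional definitions: mapping efficiency as uniquely aligned reads over total reads, duplicate rate as removed duplicates over aligned reads, and methylation level as methylated calls over all calls in a context.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole (0.0 if whole is 0)."""
    return 100.0 * part / whole if whole else 0.0


# Hypothetical counts for one sample.
total_reads = 1_000_000
unique_alignments = 750_000
duplicates_removed = 150_000
cpg_methylated, cpg_unmethylated = 720, 280

print(f"Alignment efficiency: {percent(unique_alignments, total_reads):.1f}%")   # 75.0%
print(f"Duplicate rate: {percent(duplicates_removed, unique_alignments):.1f}%")  # 20.0%
print(f"CpG methylation: {percent(cpg_methylated, cpg_methylated + cpg_unmethylated):.1f}%")  # 72.0%
```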
Alignment Quality Mapping
Alignment quality is assessed with `qualimap bamqc`.
General Options:
* `-bam <input.bam>`: Path to the aligned BAM file (e.g., the deduplicated BAM file).
* `-outdir <output_directory>`: Directory for the output reports.
* `-outformat <PDF|HTML>`: Output format of the report (HTML by default).
??? note "Alignment Quality Check"
    The output includes a report in the format selected with `-outformat`.
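A minimal sketch that assembles a `qualimap bamqc` call from these options without running it. The helper and file names are hypothetical; the set of accepted formats (`HTML`, `PDF`, or both as `PDF:HTML`) follows Qualimap's documented `-outformat` values.

```python
import shlex
from typing import List

VALID_FORMATS = {"HTML", "PDF", "PDF:HTML"}


def qualimap_bamqc_command(bam: str, outdir: str, outformat: str = "HTML") -> List[str]:
    """Assemble (but do not run) a `qualimap bamqc` command."""
    fmt = outformat.upper()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported report format: {outformat}")
    return ["qualimap", "bamqc", "-bam", bam, "-outdir", outdir, "-outformat", fmt]


print(shlex.join(qualimap_bamqc_command("sample.deduplicated.bam", "qualimap_out", "pdf")))
# qualimap bamqc -bam sample.deduplicated.bam -outdir qualimap_out -outformat PDF
```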
QC Reporting
MultiQC is used for QC reporting, combining the results from FastQC, Trim Galore, Bismark alignment, Bismark deduplication, the Bismark summary report, and Qualimap into a single report.
Output:
Generates an interactive HTML report (`multiqc_report.html`) and a data file (`multiqc_data.json`). The output includes summary statistics, plots, and tool-specific metrics.
Differential Methylation Analysis
Differential methylation between the input samples can be calculated with either of two methods:
EdgeR
edgeR is a Bioconductor package primarily used for RNA-seq differential expression analysis, but it can also perform differential methylation analysis on bisulfite sequencing data. This requires pre-processed methylation data, such as counts of methylated (`M`) and unmethylated (`U`) reads at each cytosine position or region of interest.
General options
* `--coverage_files`: Coverage files produced by `bismark_methylation_extractor`.
* `--design`: Taken from `Sample_sheet.csv`.
* `--compare`: Taken from `Sample_sheet.csv`.
??? note "EdgeR Results"
    * Generates `EdgeR_group_<compare_str>.csv`.
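Each Bismark coverage line holds chromosome, start, end, methylation percentage, methylated count, and unmethylated count. edgeR's bisulfite workflow (for example via `readBismark2DGE`) arranges these as a methylated/unmethylated column pair per sample. The Python sketch below mirrors that layout on two tiny hypothetical samples; the helper names are illustrative and not part of the pipeline.

```python
import csv
import io
from typing import Dict, List, Tuple

Site = Tuple[str, int]


def read_coverage(text: str) -> Dict[Site, Tuple[int, int]]:
    """Parse Bismark coverage lines into {(chrom, start): (methylated, unmethylated)}.

    Columns: chrom, start, end, % methylation, count methylated, count unmethylated.
    """
    counts: Dict[Site, Tuple[int, int]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        counts[(row[0], int(row[1]))] = (int(row[4]), int(row[5]))
    return counts


def count_matrix(samples: Dict[str, str]) -> Tuple[List[str], Dict[Site, List[int]]]:
    """Build one row per site with a (Me, Un) column pair for every sample."""
    parsed = {name: read_coverage(text) for name, text in samples.items()}
    sites = sorted(set().union(*(p.keys() for p in parsed.values())))
    header = [f"{name}_{kind}" for name in samples for kind in ("Me", "Un")]
    rows: Dict[Site, List[int]] = {}
    for site in sites:
        row: List[int] = []
        for name in samples:
            me, un = parsed[name].get(site, (0, 0))  # site not covered in this sample
            row += [me, un]
        rows[site] = row
    return header, rows


samples = {
    "s1": "chr1\t100\t100\t75\t3\t1\nchr1\t200\t200\t0\t0\t4\n",
    "s2": "chr1\t100\t100\t50\t2\t2\n",
}
header, rows = count_matrix(samples)
print(header)               # ['s1_Me', 's1_Un', 's2_Me', 's2_Un']
print(rows[("chr1", 100)])  # [3, 1, 2, 2]
print(rows[("chr1", 200)])  # [0, 4, 0, 0]
```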
MethylKit
MethylKit is an R package designed for analyzing bisulfite sequencing data, particularly for differential methylation analysis. It supports genome-wide methylation data and is ideal for CpG, CHH, and CHG methylation studies.
General options
* `--coverage_files`: Coverage files produced by `bismark_methylation_extractor`.
* `--design`: Taken from `Sample_sheet.csv`.
* `--compare`: Taken from `Sample_sheet.csv`.
??? note "MethylKit Results"
    * Generates `Methylkit_group_<compare_str>.csv`.
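MethylKit reports the size of an effect as a methylation difference in percentage points (`meth.diff`). As a simplified illustration only, the sketch below computes such a difference at one CpG by pooling reads within each group; it does not reproduce MethylKit's statistical test, and the counts are hypothetical.

```python
from typing import List, Tuple

Counts = Tuple[int, int]  # (methylated reads, unmethylated reads) at one CpG


def pooled_percent(replicates: List[Counts]) -> float:
    """Percent methylation after pooling reads across replicates."""
    methylated = sum(m for m, _ in replicates)
    total = sum(m + u for m, u in replicates)
    if total == 0:
        raise ValueError("no coverage at this site")
    return 100.0 * methylated / total


def meth_difference(treatment: List[Counts], control: List[Counts]) -> float:
    """Methylation difference in percentage points (treatment minus control)."""
    return pooled_percent(treatment) - pooled_percent(control)


print(meth_difference([(8, 2), (7, 3)], [(2, 8), (3, 7)]))  # 50.0
```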
Post-processing
Generates A) a volcano plot, B) an MA plot, and C) summary statistics from the differential methylation results.
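A minimal sketch of the numbers behind these outputs, assuming result rows with edgeR-style `logFC`, `logCPM`, `PValue`, and `FDR` columns. The column names, the FDR cutoff of 0.05, and counting hyper/hypomethylated regions among the significant ones are assumptions, not the pipeline's confirmed settings.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[str, float]


def summarize(results: List[Row], fdr_cutoff: float = 0.05) -> Dict[str, int]:
    """Count total, significant, hyper- and hypomethylated regions."""
    significant = [r for r in results if r["FDR"] < fdr_cutoff]
    return {
        "total": len(results),
        "significant": len(significant),
        "hyper": sum(1 for r in significant if r["logFC"] > 0),  # positive logFC counted as hyper
        "hypo": sum(1 for r in significant if r["logFC"] < 0),
    }


def volcano_points(results: List[Row]) -> List[Tuple[float, float]]:
    """Volcano plot coordinates: x = log fold change, y = -log10(p-value)."""
    return [(r["logFC"], -math.log10(r["PValue"])) for r in results]


def ma_points(results: List[Row]) -> List[Tuple[float, float]]:
    """MA plot coordinates: x = average log abundance (logCPM), y = log fold change."""
    return [(r["logCPM"], r["logFC"]) for r in results]


# Hypothetical results for three regions.
results = [
    {"logFC": 2.0, "logCPM": 5.0, "PValue": 0.001, "FDR": 0.01},
    {"logFC": -1.5, "logCPM": 4.0, "PValue": 0.01, "FDR": 0.04},
    {"logFC": 0.3, "logCPM": 6.0, "PValue": 0.5, "FDR": 0.7},
]
print(summarize(results))  # {'total': 3, 'significant': 2, 'hyper': 1, 'hypo': 1}
```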
Gene Ontology Analysis
The pipeline also has a module that performs Gene Ontology analysis on the genes corresponding to the top n differential methylation results (EdgeR/MethylKit) using the clusterProfiler package.
The results include a full table of all Biological Processes and a chord diagram of the top 10 functions identified in the analysis.
??? note "GOchord Diagram"
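Selecting the "top n corresponding genes" can be sketched as follows, assuming each result row carries a `gene` annotation and an `FDR` value, and that ranking is by ascending FDR among rows below 0.05; all of these, and the helper name, are assumptions for illustration. The resulting list would then be handed to clusterProfiler (for example `enrichGO` with `ont = "BP"`).

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def top_n_genes(results: List[Row], n: int, fdr_cutoff: float = 0.05) -> List[str]:
    """Return up to n unique gene names, ranked by ascending FDR among significant rows."""
    if n <= 0:
        return []
    significant = [r for r in results if r.get("gene") and r["FDR"] < fdr_cutoff]
    genes: List[str] = []
    for row in sorted(significant, key=lambda r: r["FDR"]):
        gene = str(row["gene"])
        if gene not in genes:  # several regions can map to the same gene
            genes.append(gene)
            if len(genes) == n:
                break
    return genes


# Hypothetical annotated results.
results = [
    {"gene": "GENE_A", "FDR": 0.001},
    {"gene": "GENE_B", "FDR": 0.01},
    {"gene": "GENE_A", "FDR": 0.02},
    {"gene": "GENE_C", "FDR": 0.03},
    {"gene": "GENE_D", "FDR": 0.2},
]
print(top_n_genes(results, 2))  # ['GENE_A', 'GENE_B']
```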