Accurate detection of oncogenic gene fusions is critical for cancer classification, prognosis, and targeted therapy. This article provides a systematic comparison of RNA sequencing (RNA-seq) and DNA sequencing (DNA-seq) methodologies for identifying these crucial biomarkers. We explore the foundational principles defining each technology's strengths, delve into their specific applications across cancer types and drug discovery, address common challenges and optimization strategies, and present rigorous validation data and performance benchmarks. For researchers, scientists, and drug development professionals, this review synthesizes evidence demonstrating that RNA-seq and DNA-seq are highly complementary. Integrating both approaches maximizes detection sensitivity for clinically actionable fusions, thereby optimizing patient stratification for precision oncology.
Gene fusions, hybrid genes formed from the juxtaposition of two previously independent genes, are well-established as potent driver mutations in cancer pathogenesis [1] [2]. These molecular events arise from chromosomal rearrangements such as translocations, inversions, and deletions, leading to the production of fusion proteins with oncogenic properties, such as constitutively active tyrosine kinases or aberrant transcription factors [2] [3]. Their significance is underscored by their role as defining features of certain cancer subtypes and as prime targets for therapeutic intervention, making their accurate detection a critical focus in oncological research and precision medicine [2] [3].
The formation of a gene fusion typically originates from a DNA-level rearrangement. Key mechanisms include translocation, where segments from two different chromosomes break and swap places; deletion, which removes an intervening DNA segment to bring two genes together; and inversion, where a chromosome segment is reversed end-to-end [2] [3]. The classic example is the BCR-ABL1 fusion, resulting from a reciprocal translocation between chromosomes 9 and 22 that forms the Philadelphia chromosome, a hallmark of chronic myeloid leukemia (CML) [2] [3]. This fusion produces a constitutively active tyrosine kinase that drives uncontrolled cell proliferation [3].
Oncogenic fusion proteins can function through several mechanisms. Many, like EML4-ALK in non-small cell lung cancer (NSCLC), lead to constitutive activation of tyrosine kinases, perpetually stimulating growth and survival pathways such as MAPK and PI3K-AKT [2] [3]. Others, such as TMPRSS2-ERG in prostate cancer, place a transcription factor under the control of a strong promoter, leading to its deregulated overexpression and disrupting normal gene expression programs [2]. A third mechanism, exemplified by surface-bound NRG1 fusions, can drive aberrant paracrine signaling by activating receptors on neighboring cells [2]. The diagram below illustrates these key mechanisms through which gene fusions drive oncogenesis.
The accurate identification of gene fusions is foundational for both research and clinical decision-making. Next-generation sequencing (NGS) offers two primary approaches: DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq), each with distinct advantages and limitations [4].
DNA-seq (including whole-genome and targeted sequencing) aims to detect the underlying genomic rearrangement that creates the fusion gene. However, this can be challenging because breakpoints often fall within long, repetitive intronic regions, making them difficult to amplify, sequence, and map accurately [4]. While DNA-seq can confirm that a structural variant is present, it cannot show whether the rearrangement is transcribed into a functional, expressed fusion transcript [5].
RNA-seq directly sequences the transcriptome, capturing the expressed RNA molecules. This makes it uniquely powerful for identifying the functional, expressed fusion transcripts present in the cell [4] [6]. Because introns are spliced out of mature mRNA, RNA-seq reads can cross the exon-exon fusion junction directly, which often makes RNA-seq a more direct and more sensitive method for detecting the relevant chimeric RNA, provided the fusion gene is actively expressed [4]. The fundamental differences between these two approaches are summarized in the table below.
Table 1: Core Differences Between DNA-seq and RNA-seq for Fusion Gene Detection
| Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq) |
|---|---|---|
| Target Molecule | Genomic DNA | RNA (reverse-transcribed to cDNA) |
| Primary Purpose | Identify structural rearrangements & breakpoints | Identify expressed fusion transcripts |
| Key Challenge | Breakpoints in long, repetitive introns; cannot confirm expression [4] | Cannot detect fusions with low/no expression [4] |
| Information Gained | Presence of genetic alteration | Functional, transcribed mRNA product |
| Ideal Use Case | Comprehensive discovery of structural variants | Identifying expressed, potentially actionable oncogenic drivers |
The following diagram outlines the generic workflow for detecting gene fusions from RNA-seq data, highlighting the key steps from sample preparation to final validation.
The bioinformatic detection of fusions from RNA-seq data relies on specialized algorithms that identify chimeric reads, that is, reads or read pairs whose sequence maps partly to one gene and partly to another. Multiple tools have been developed, each with different strengths. Arriba is a fast, accurate algorithm designed for clinical applications that maintains high sensitivity even when a fusion is supported by only a few reads [7]. STAR-Fusion is another widely used and accurate tool [7]. With the advent of long-read sequencing (e.g., PacBio, Oxford Nanopore), new tools such as GFvoter have emerged that leverage longer reads to span complex fusion junctions with high precision [8].
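To make the idea of chimeric-read evidence concrete, the Python sketch below aggregates reads that have already been assigned to two different genes and keeps gene pairs with enough support. The `ChimericRead` record, the "split"/"spanning" evidence labels, and the `min_split`/`min_total` thresholds are illustrative assumptions loosely modeled on the kinds of evidence fusion callers report; this is not the filtering logic of Arriba, STAR-Fusion, or any other specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass

VALID_KINDS = ("split", "spanning")

@dataclass(frozen=True)
class ChimericRead:
    """Hypothetical, simplified evidence record for one read or read pair."""
    read_id: str
    gene_5p: str  # gene contributing the 5' portion of the chimeric sequence
    gene_3p: str  # gene contributing the 3' portion
    kind: str     # "split": read crosses the junction; "spanning": mates map to different genes

def aggregate_candidates(reads, min_split=1, min_total=3):
    """Count distinct supporting reads per ordered gene pair and keep well-supported pairs."""
    support = defaultdict(lambda: {kind: set() for kind in VALID_KINDS})
    for r in reads:
        if r.kind not in VALID_KINDS:
            raise ValueError(f"unknown evidence kind: {r.kind!r}")
        if r.gene_5p == r.gene_3p:
            continue  # both segments in the same gene: not fusion evidence
        support[(r.gene_5p, r.gene_3p)][r.kind].add(r.read_id)  # sets drop duplicate read IDs
    candidates = {}
    for pair, evidence in support.items():
        n_split, n_span = len(evidence["split"]), len(evidence["spanning"])
        if n_split >= min_split and n_split + n_span >= min_total:
            candidates[pair] = {"split": n_split, "spanning": n_span}
    return candidates

reads = [
    ChimericRead("r1", "EML4", "ALK", "split"),
    ChimericRead("r2", "EML4", "ALK", "split"),
    ChimericRead("r3", "EML4", "ALK", "spanning"),
    ChimericRead("r4", "BCR", "ABL1", "spanning"),  # no split read: filtered out
    ChimericRead("r5", "ACTB", "ACTB", "split"),    # same gene: ignored
]
print(aggregate_candidates(reads))  # {('EML4', 'ALK'): {'split': 2, 'spanning': 1}}
```

Requiring at least one split read is one reasonable heuristic, because spanning pairs alone do not pinpoint the junction; the threshold values here are placeholders, not recommendations.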
Table 2: Performance Benchmark of Fusion Detection Tools on Real and Simulated RNA-seq Datasets
| Tool | Average Precision | Average Recall (Sensitivity) | Key Performance Insight |
|---|---|---|---|
| GFvoter (Long-read) | 58.6% | Comparable or superior to other tools | Achieved the highest average F1 score (0.569), indicating best precision-recall balance [8]. |
| LongGF (Long-read) | 39.5% | Varies by dataset | Lower precision compared to GFvoter [8]. |
| JAFFAL (Long-read) | 30.8% | Varies by dataset | Lower precision compared to GFvoter [8]. |
| Arriba (Short-read) | High (specific data not shown) | High | Rediscovered 88/150 simulated fusions at low expression; superior sensitivity on multiple benchmarks [7]. |
| FusionCatcher (Short-read) | High (specific data not shown) | High | Identified all synthetic spike-in fusions in benchmark [7]. |
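The metrics in Table 2 follow standard definitions: precision is the fraction of reported fusions that are true, recall (sensitivity) is the fraction of true fusions that are reported, and F1 is their harmonic mean. A minimal Python sketch, using hypothetical call sets, shows how they are computed from a tool's calls and a truth set:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP); recall = TP/(TP+FN); F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def score_calls(called: set, truth: set) -> tuple[float, float, float]:
    """Compare a tool's fusion calls with a truth set of (5' gene, 3' gene) pairs."""
    tp = len(called & truth)  # reported and real
    fp = len(called - truth)  # reported but not real
    fn = len(truth - called)  # real but missed
    return precision_recall_f1(tp, fp, fn)

# Hypothetical truth set and calls, for illustration only
truth = {("EML4", "ALK"), ("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("KIF5B", "RET")}
called = {("EML4", "ALK"), ("BCR", "ABL1"), ("GAPDH", "ACTB")}
precision, recall, f1 = score_calls(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# precision=0.667 recall=0.500 F1=0.571
```

An F1 averaged across datasets, as reported for GFvoter, is generally not equal to the F1 computed from averaged precision and recall, so the columns of Table 2 should not be combined arithmetically.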
The performance of RNA-seq in a clinical setting is robust. A 2021 study of 806 acute myeloid leukemia (AML) samples found that RNA-seq detected 90% of fusion events reported with high evidence by conventional diagnostics (karyotyping, FISH, RT-PCR) [9]. Similarly, a 2024 study in acute leukemia reported an 83.3% sensitivity for RNA-seq compared with conventional methods, while also identifying novel fusions missed by standard approaches [6].
Recent technological advances are further refining fusion detection. Single-cell RNA-seq (scRNA-seq) allows researchers to detect fusions at the single-cell level, revealing tumor heterogeneity and identifying rare subclones harboring driver fusions. The tool scFusion was developed specifically for this purpose, controlling for the high technical noise in scRNA-seq data to identify fusions with high sensitivity and a low false discovery rate [10]. Meanwhile, long-read transcriptome sequencing (e.g., PacBio) produces reads thousands of bases long, so that a single read can cover an entire fusion transcript, junction included, without assembly; this simplifies detection and reduces false positives [8].
A typical experiment to identify gene fusions via RNA-seq involves a multi-step process. First, total RNA is extracted from tumor samples or cell lines, and its quality and integrity are checked (commonly targeting an RNA integrity number, RIN, above 8). The RNA is then used to prepare a sequencing library, which is typically sequenced on an Illumina platform to generate high-throughput short reads (e.g., 2 × 150 bp) [7] [6].
For bioinformatics analysis, raw sequencing reads are first assessed for quality using tools such as FastQC. High-quality reads are then aligned to a reference genome (e.g., GRCh38) with a splice-aware aligner such as STAR [7] [10]. The aligned data are subsequently analyzed by one or more fusion detection algorithms (e.g., Arriba, STAR-Fusion). Running two tools and taking the union (favoring sensitivity) or the intersection (favoring precision) of their predictions is a common practice to improve robustness [7]. The final list of high-confidence fusion calls must then be inspected manually in a genome browser (e.g., IGV) and confirmed with an independent method such as RT-PCR or FISH [6] [9].
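Combining callers can be expressed as a k-of-n vote over gene pairs: requiring one tool gives the union, and requiring every tool gives the intersection. The sketch below assumes each tool's output has already been parsed into (5' gene, 3' gene) tuples; parsing the native Arriba or STAR-Fusion output formats is not shown, and the tool names and calls are hypothetical.

```python
from collections import defaultdict

def consensus_calls(calls_by_tool, min_tools):
    """Keep gene pairs reported by at least `min_tools` callers.

    calls_by_tool: {tool_name: set of (five_prime_gene, three_prime_gene)}
    min_tools=1 gives the union; min_tools=len(calls_by_tool) gives the intersection.
    Returns {gene_pair: sorted list of supporting tools}.
    """
    if not 1 <= min_tools <= len(calls_by_tool):
        raise ValueError("min_tools must be between 1 and the number of tools")
    supporters = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for pair in calls:
            supporters[pair].add(tool)
    return {pair: sorted(tools) for pair, tools in supporters.items() if len(tools) >= min_tools}

# Hypothetical calls from two tools
calls = {
    "arriba": {("EML4", "ALK"), ("KIF5B", "RET")},
    "star_fusion": {("EML4", "ALK"), ("CD74", "ROS1")},
}
union = consensus_calls(calls, min_tools=1)         # three candidate fusions
intersection = consensus_calls(calls, min_tools=2)  # only EML4-ALK
print(intersection)  # {('EML4', 'ALK'): ['arriba', 'star_fusion']}
```

Breakpoint coordinates are ignored here; a stricter comparison would also require the tools to agree on breakpoint position within some tolerance, since the same gene pair can be joined at different exons.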
Table 3: Essential Research Reagent Solutions for Fusion Detection Studies
| Reagent / Tool Category | Example Products | Critical Function in Experiment |
|---|---|---|
| RNA Extraction & QC | TRIzol, Qiagen RNeasy Kits, Agilent Bioanalyzer | Isolate high-quality, intact RNA for accurate transcriptome representation. |
| Library Prep Kits | Illumina Stranded mRNA Prep | Convert RNA into a sequence-ready library, often with barcoding for multiplexing. |
| Alignment Software | STAR, HISAT2, Minimap2 (for long-reads) | Map sequencing reads to the reference genome, crucially identifying splice and fusion junctions. |
| Fusion Callers | Arriba, STAR-Fusion, GFvoter, FusionCatcher | Apply specialized algorithms to aligned reads to identify and filter candidate gene fusions. |
| Validation Reagents | FISH probes, PCR primers, TaqMan assays | Provide independent, orthogonal confirmation of high-priority fusion events. |
Gene fusions are critical drivers of oncogenesis, functioning through diverse mechanisms such as constitutive kinase activation and transcriptional deregulation. While DNA-seq can identify the genomic rearrangements behind fusions, RNA-seq offers the most direct route to the expressed, functional fusion transcripts that are most relevant for cancer biology and targeted therapy. The ongoing development of more accurate bioinformatics tools such as Arriba and GFvoter, coupled with single-cell and long-read sequencing, is steadily enhancing detection capabilities. Integrating RNA-seq into clinical workflows, alongside DNA-based testing, provides a comprehensive approach to uncovering these key molecular alterations, ultimately advancing precision oncology and improving patient outcomes.
Structural variants (SVs) represent a category of genomic alterations involving segments of DNA larger than 50 base pairs, including deletions, duplications, inversions, translocations, and insertions. These variants play significant roles in human disease, particularly in cancer, where they can drive tumorigenesis through mechanisms such as oncogene activation, tumor suppressor inactivation, and the creation of novel fusion genes. DNA sequencing (DNA-seq) provides the fundamental technology for directly interrogating the genomic blueprint to identify these structural alterations at their source. Unlike RNA sequencing (RNA-seq), which examines the transcriptomic consequences of genetic changes, DNA-seq reveals the underlying architectural variations in the genome itself, offering complementary insights for comprehensive genomic profiling in both research and clinical diagnostics.
The ability to accurately detect SVs has profound implications for understanding cancer biology and advancing personalized medicine. Numerous SVs are now recognized as clinically actionable biomarkers, with fusion genes involving drivers such as ALK, RET, ROS1, and NTRK serving as prime examples for which targeted therapies have been developed. However, the detection of these variants presents substantial technical challenges, leading to the development of diverse DNA-seq approaches with varying capabilities and limitations for comprehensive structural variant interrogation.
DNA sequencing approaches for structural variant detection can be broadly categorized into three main methodologies, each with distinct strengths and limitations for SV identification:
Whole Genome Sequencing (WGS) sequences the entire DNA genome, enabling the detection of virtually any type of mutation throughout both coding and non-coding regions. This approach can identify single nucleotide variants (SNVs), insertions and deletions (indels), structural variants, and copy number variations (CNVs) across the complete genome. WGS is particularly valuable for discovering novel structural variants in regions outside traditional exonic targets and for analyzing samples without established reference genomes [4].
Whole Exome Sequencing (WES) focuses specifically on sequencing the protein-coding regions (exons) of the genome, which make up only about 1-2% of the human genome. This targeted approach efficiently identifies SNVs and indels within exonic regions while omitting regulatory elements such as promoters and enhancers. While WES is more cost-effective and generates less data than WGS, its limited genomic coverage reduces its effectiveness for detecting structural variants that involve non-coding or intergenic regions [4].
Targeted Sequencing concentrates on a predetermined subset of genomic regions, such as specific genes known to be involved in disease pathways. This approach offers the most cost-effective and focused analysis, with enhanced sensitivity for detecting low-frequency variants—a particular advantage in heterogeneous samples like tumors. However, its targeted nature means it can only identify structural variants within the preselected genomic regions and may miss novel or unexpected rearrangements [4].
The standard workflow for detecting structural variants via DNA-seq involves multiple critical steps from sample preparation through bioinformatic analysis. The following diagram illustrates this comprehensive process:
The process begins with DNA extraction from patient samples (e.g., blood, saliva, tissue biopsies), leveraging DNA's relative stability compared to RNA. Following extraction, DNA undergoes fragmentation and library preparation, which may include mechanical shearing, adaptor ligation, and PCR amplification depending on the specific protocol. The prepared libraries are then sequenced using platforms such as Illumina, Ion Torrent, PacBio, or Oxford Nanopore, each offering different trade-offs in read length, accuracy, and throughput [4] [11].
The resulting sequencing reads are aligned to a reference genome using tools such as BWA or Bowtie, which map the short DNA fragments to their corresponding genomic positions. SV calling algorithms then analyze the aligned reads for patterns indicative of structural variants: discordant read pairs (mates that map to different chromosomes, at an unexpected distance, or in an unexpected orientation), split reads (a single read aligning partly to two distant loci), and read depth anomalies (local gains or losses of coverage). Commonly used tools include dedicated SV and CNV callers such as Lumpy and CNVnator, together with general-purpose toolkits such as GATK and Samtools for alignment processing and variant calling [4]. Finally, detected variants undergo annotation and filtering to determine their potential functional consequences using tools like ANNOVAR or VEP, and to prioritize likely pathogenic events based on population frequency, predicted impact on coding sequences, and overlap with known regulatory elements.
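The read-pair signals described above can be illustrated with a simplified classifier. This sketch assumes an Illumina-style forward-reverse paired-end library, a known insert-size mean and standard deviation, and an already-aligned pair represented by a hypothetical `ReadPair` record; production callers such as Lumpy combine this evidence with split reads and statistical models rather than applying a single cutoff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadPair:
    """Hypothetical, simplified record of one aligned read pair."""
    chrom1: str
    pos1: int     # leftmost alignment position of mate 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Label the SV signal suggested by a pair, assuming a forward-reverse (FR) library."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # candidate translocation
    # Put the leftmost mate first; in a normal FR pair it aligns to "+"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "same_strand"  # candidate inversion
    if left_strand == "-":
        return "everted"  # reverse-forward orientation: candidate tandem duplication
    # Approximate insert size from leftmost positions only (ignores read length)
    distance = right_pos - left_pos
    if distance > mean_insert + k * sd_insert:
        return "long_insert"  # candidate deletion
    if distance < mean_insert - k * sd_insert:
        return "short_insert"  # candidate insertion
    return "concordant"

print(classify_pair(ReadPair("chr2", 1000, "+", "chr2", 11000, "-"), 400, 50))  # long_insert
```

Individual discordant pairs are weak evidence on their own; SV callers typically cluster many pairs with consistent positions and orientations before reporting a variant.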
Different DNA sequencing platforms offer distinct performance characteristics that significantly impact their effectiveness for structural variant detection. The following table summarizes key metrics across major sequencing platforms:
Table 1: Performance Comparison of DNA Sequencing Platforms for SV Detection
| Platform | Read Length | Accuracy | Key Strengths for SV Detection | Primary Limitations |
|---|---|---|---|---|
| Illumina HiSeq/NovaSeq | Short (150-250 bp) | High (>99.9%) | Most consistent genome coverage; robust indel detection [12] | Limited in repetitive regions; short reads hamper complex SV resolution |
| PacBio HiFi | 10-25 kb | >99.9% (HiFi consensus) | Excellent for complex regions; high mapping accuracy; top SV detection performance [13] | Higher cost per genome; moderate throughput |
| Oxford Nanopore | Up to >1 Mb | ~98-99.5% (Q20+ chemistry) | Ultra-long reads resolve large SVs; portability; real-time analysis [13] | Historically lower accuracy (improving with recent chemistry) |
| Ion Torrent | Medium (~200-600 bp) | Moderate (~99%) | Fast turnaround; lower capital cost [12] | Higher error rates in homopolymers; moderate read lengths |
| BGISEQ-500/MGISEQ-2000 | Short | Low error rates | Competitive cost structure | Limited independent validation in clinical settings |
The Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study comprehensively benchmarked these platforms, revealing that each exhibits particular strengths depending on the variant type and genomic context being interrogated [12]. Among short-read instruments, Illumina's HiSeq 4000 and HiSeq X Ten systems provided the most consistent, highest genome coverage, while the NovaSeq 6000 using 2 × 250-bp read chemistry proved most robust for capturing known insertion/deletion events. For long-read platforms, PacBio circular consensus sequencing (CCS) demonstrated the highest reference-based mapping rate and lowest non-mapping rate, with both PacBio CCS and Oxford Nanopore technologies showing superior sequence mapping in repeat-rich areas and across homopolymers [12].
In clinical settings, the performance of sequencing-based fusion detection varies considerably with the assay design and the variant type being targeted. The following table compares detection capabilities across DNA-seq approaches and the targeted DNA/RNA assays commonly used alongside them:
Table 2: Sequencing Method Performance for Oncogenic Fusion Detection in Clinical Samples
| Method | Detection Rate for Known Fusions | Advantages | Limitations |
|---|---|---|---|
| Amplicon-based DNA/RNA-seq | Theoretically detects 82.6% of known fusions [14] | Streamlined workflow; cost-effective for targeted detection | Misses rare/novel fusions; limited by primer design |
| Hybridization-capture-based RNA-seq (reflex testing) | Recovered actionable fusions missed by amplicon-based testing; performed in ~10% of cases [14] | Improved rare/novel fusion detection; maximizes therapy eligibility | Requires secondary testing; increased cost and time |
| Short-read WGS | Variable depending on coverage and bioinformatics | Comprehensive genome-wide coverage | Misses complex rearrangements in repetitive regions |
| Long-read WGS | Highest for complex SVs | Resolves repetitive regions; phased variant calling | Higher cost; emerging clinical validation |
A study of 1,211 non-small cell lung carcinoma specimens highlights these performance differences: approximately 10% of cases required reflex hybridization-capture-based RNA sequencing after a negative initial amplicon-based DNA/RNA sequencing result [14]. In these reflex-tested cases, clinically actionable fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET were identified that the initial amplicon-based assay had not detected. An analysis of the American Association for Cancer Research (AACR) Project GENIE database (v15.1), encompassing 20,900 NSCLC cases, indicated that while amplicon-based assays could theoretically detect 82.6% of known fusions, a substantial minority require alternative approaches for identification [14].
The detection of fusion genes represents a critical application of structural variant analysis in cancer genomics, with both DNA-seq and RNA-seq offering complementary approaches. The fundamental differences between these methodologies are illustrated in the following diagram:
DNA-seq identifies fusion genes by detecting the underlying genomic rearrangements that bring two separate genes into proximity, such as chromosomal translocations, inversions, or deletions. This approach provides direct evidence of the structural variant at the DNA level but faces challenges when breakpoints occur within long intronic regions or repetitive sequences, which are difficult to resolve with short-read technologies [4]. Additionally, DNA-seq cannot determine whether a genomic rearrangement produces a functionally expressed fusion transcript.
In contrast, RNA-seq detects the chimeric transcripts resulting from expressed fusion genes, providing direct evidence of functional consequences at the transcript level. This approach naturally focuses on clinically relevant expressed fusions and avoids the challenges of intronic breakpoint mapping. However, RNA-seq may miss genomic rearrangements that do not produce stable transcripts or those expressed at low levels, and it can be confounded by transcriptional noise or trans-splicing events [4].
Comparative studies in clinical cohorts demonstrate the complementary value of DNA and RNA sequencing approaches. In an analysis of 806 acute myeloid leukemia samples, routine diagnostic methods (primarily karyotyping and FISH) identified 138 true fusions, with RNA-seq detecting 89.9% of these benchmark fusions [9]. Notably, the samples in which RNA-seq failed to detect fusion genes generally had lower and more inhomogeneous sequence coverage, particularly for genes including CBFB and KMT2A [9].
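Sensitivity estimates from benchmark sets of this size carry sampling uncertainty. As a worked example, 89.9% of 138 corresponds to 124 detected fusions (124/138 ≈ 0.899); this count is back-calculated from the reported percentage rather than taken from the study. The sketch below computes a 95% Wilson score interval for that proportion:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 124 of 138 is back-calculated from the reported 89.9% (an assumption, not study data)
detected, benchmark = 124, 138
low, high = wilson_interval(detected, benchmark)
print(f"sensitivity {detected / benchmark:.1%}, 95% CI {low:.1%} to {high:.1%}")
# sensitivity 89.9%, 95% CI 83.7% to 93.9%
```

An interval about 10 percentage points wide suggests that differences of a few points between sensitivity estimates from studies of similar size should be interpreted with caution.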
Long-read sequencing technologies have emerged as particularly powerful tools for fusion detection, as they can span complex rearrangement structures and provide complete transcript information. PacBio's HiFi sequencing enables full-length RNA isoform sequencing (Iso-Seq), which resolves complex fusions with precise breakpoints and complete sequence readouts of associated fusion transcripts [15]. Similarly, Oxford Nanopore technologies generate ultra-long reads capable of encompassing entire fusion transcripts in single sequencing reads [13]. Recent tools such as GFvoter, designed specifically for long-read transcriptome data, have demonstrated superior performance in fusion detection, achieving the highest F1 scores across multiple experimental datasets compared to alternative methods [8].
Table 3: Essential Research Reagent Solutions for DNA-seq SV Detection
| Reagent/Resource | Function | Application Notes |
|---|---|---|
| High-molecular-weight DNA extraction kits | Preserve long DNA fragments for optimal SV detection | Critical for long-read sequencing; maintain DNA integrity |
| Library preparation kits | Prepare DNA for sequencing through fragmentation and adaptor ligation | Vary by platform; mechanical shearing common for WGS [11] |
| Hybridization capture baits | Enrich specific genomic regions for targeted sequencing | Enable focused SV detection in genes of interest |
| BLESS/DSBCapture/BLISS reagents | Map DNA double-strand breaks (DSBs) experimentally | Identify DSB-prone regions linked to SV formation [16] |
| Chromatin immunoprecipitation (ChIP) reagents | Profile protein-DNA interactions and histone modifications | Understand SV formation in chromatin context [16] |
| ATAC-seq reagents | Assess chromatin accessibility genome-wide | Correlate open chromatin with SV susceptibility [17] |
| BWA/Bowtie alignment tools | Map sequencing reads to reference genomes | Foundation for SV detection pipelines [4] |
| GATK/Samtools variant callers | Identify genetic variants from aligned reads | Detect SNVs/indels; prerequisite for some SV callers [4] |
| CNVnator/Lumpy/SVIM | Specifically detect structural variants | Specialized for different SV types and size ranges [13] [4] |
| ANNOVAR/VEP | Annotate functional consequences of variants | Prioritize potentially pathogenic SVs [4] |
DNA sequencing provides an essential foundation for interrogating the genomic blueprint of structural variants, offering direct detection of chromosomal rearrangements at their origin. While multiple DNA-seq approaches exist—from targeted panels to whole-genome sequencing—each presents distinct advantages and limitations for comprehensive SV detection. The emergence of long-read sequencing technologies has significantly improved the resolution of complex structural variants, particularly in repetitive regions that challenge short-read platforms.
In clinical practice, amplicon-based DNA/RNA assays can theoretically detect approximately 82.6% of known oncogenic fusions, and reflex hybridization-capture RNA-seq, performed in about 10% of cases after a negative amplicon result, recovers actionable fusions that would otherwise be missed [14]. This demonstrates the complementary nature of genomic and transcriptomic approaches for comprehensive fusion detection. As sequencing technologies continue to advance, with both PacBio HiFi and Oxford Nanopore platforms achieving steadily higher accuracy and longer read lengths, the integration of DNA and RNA sequencing approaches will likely become standard practice in clinical diagnostics, ultimately expanding patient eligibility for targeted therapies and clinical trials through improved detection of rare and novel structural variants.
Gene fusions, hybrid molecules formed by the joining of two previously separate genes, represent a critical class of genomic alterations with profound implications in cancer research and therapeutic development. These chimeric entities typically arise from chromosomal rearrangements such as translocations, inversions, or deletions, and can function as powerful oncogenic drivers by activating proto-oncogenes or inactivating tumor suppressors. The detection of fusion transcripts has become indispensable for disease classification, risk stratification, and therapeutic decision-making, particularly with the growing availability of targeted therapies against fusion-driven cancers.
The transcriptome represents the complete set of RNA transcripts produced by the genome at any given time, providing a dynamic view of genetic activity. Within this landscape, RNA sequencing (RNA-seq) has emerged as a powerful methodology for capturing expressed fusion transcripts, offering distinct advantages over DNA-based approaches. While DNA sequencing reveals the structural blueprint of genetic alterations, RNA-seq directly interrogates whether these changes are expressed, helping to separate fusions with oncogenic potential from silent rearrangements that are unlikely to contribute to tumorigenesis. This fundamental distinction positions RNA-seq as an essential tool for comprehensive fusion transcript characterization in both research and clinical settings.
The choice between RNA-seq and DNA-seq for fusion detection hinges on their complementary strengths and limitations. DNA-based approaches, including whole-genome sequencing (WGS), can identify structural variants across the entire genome but face challenges in determining the functional consequences of these alterations. The breakpoints of fusion genes often occur within long intronic regions containing repetitive sequences, making them difficult to resolve and accurately identify using DNA-seq [4]. Furthermore, DNA-seq cannot distinguish between expressed, potentially oncogenic fusions and silent rearrangements that may not contribute to disease pathogenesis.
In contrast, RNA-seq directly sequences the transcriptome, capturing evidence of fusion transcripts that are actively expressed. This approach naturally enriches for exonic sequences and provides direct evidence of chimeric transcripts, bypassing the challenges posed by intronic regions. Additionally, RNA-seq can reveal the exact breakpoints at the transcript level and identify different fusion isoforms that may arise from the same genomic rearrangement [4]. The table below summarizes the key distinctions between these approaches for fusion detection:
Table: Comparison of DNA-seq and RNA-seq for Fusion Gene Detection
| Feature | DNA-seq | RNA-seq |
|---|---|---|
| Target | Genomic DNA structure | Expressed RNA transcripts |
| Breakpoint Resolution | Challenging in repetitive intronic regions | Focused on exonic regions; precise transcript breakpoints |
| Functional Insight | Identifies structural variants without expression context | Directly detects expressed, potentially functional fusions |
| Fusion Isoforms | Limited ability to resolve different transcript isoforms | Can identify multiple fusion isoforms from same rearrangement |
| Coverage Requirements | Requires deep coverage across introns and exons | Naturally enriches for exonic sequences |
| Therapeutic Relevance | May detect silent rearrangements without functional impact | Prioritizes expressed fusions with potential clinical actionability |
Despite these advantages, RNA-seq has limitations, including its dependence on adequate RNA quality and quantity, and the challenge of detecting fusions involving genes with low expression levels. The most comprehensive approach often involves combining both DNA and RNA-level analyses to obtain a complete picture of genomic rearrangements and their functional consequences.
The standard RNA-seq workflow begins with RNA extraction from patient samples, which can include fresh frozen tissue, formalin-fixed paraffin-embedded (FFPE) specimens, or cell lines. Due to RNA's inherent instability compared to DNA, careful preservation and extraction methods are critical to maintain RNA integrity. The extracted RNA undergoes reverse transcription to complementary DNA (cDNA), followed by library preparation and next-generation sequencing. Specific variations in library preparation methodology define the major RNA-seq approaches for fusion detection.
The following diagram illustrates the core workflow and decision points in RNA-seq for fusion transcript detection:
Targeted RNA-seq methods focus sequencing power on specific genes of interest, offering enhanced sensitivity for detecting low-abundance fusion transcripts. Amplicon-based approaches use gene-specific primers to amplify targeted regions, making them particularly effective when potential fusion partners are known in advance. Studies have reported that amplicon-based assays can achieve a sensitivity of 93.3% and a specificity of 100% for fusion detection [18] [19]. Many of these assays incorporate unique molecular identifiers (UMIs), short random tags attached to each original molecule before amplification, so that PCR duplicates can be collapsed, amplification bias reduced, and detection accuracy improved.
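UMI-based deduplication counts molecules rather than reads, so that many PCR copies of one cDNA fragment are not mistaken for independent evidence. The sketch below collapses junction-supporting reads by UMI, merging a UMI into a more abundant one when they differ by at most one base, to absorb sequencing errors. The greedy Hamming-distance clustering, the `(umi, fusion_id)` input format, and the example UMIs are assumptions for illustration; vendor pipelines and dedicated UMI tools use their own, often more elaborate, methods.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; UMIs of different length are treated as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))

def count_unique_molecules(reads, max_mismatch=1):
    """Collapse (umi, fusion_id) read records into unique-molecule counts per fusion."""
    umi_counts = defaultdict(lambda: defaultdict(int))
    for umi, fusion_id in reads:
        umi_counts[fusion_id][umi] += 1
    molecules = {}
    for fusion_id, counts in umi_counts.items():
        representatives = []
        # Visit abundant UMIs first so rarer, error-containing variants fold into them
        for umi in sorted(counts, key=lambda u: (-counts[u], u)):
            if all(hamming(umi, rep) > max_mismatch for rep in representatives):
                representatives.append(umi)
        molecules[fusion_id] = len(representatives)
    return molecules

# Hypothetical reads: 8 EML4-ALK reads but only 2 distinct molecules
reads = ([("ACGTAC", "EML4-ALK")] * 5 + [("ACGTAA", "EML4-ALK")]
         + [("TTGCAG", "EML4-ALK")] * 2 + [("GGCATT", "KIF5B-RET")])
print(count_unique_molecules(reads))  # {'EML4-ALK': 2, 'KIF5B-RET': 1}
```

Setting `max_mismatch=0` counts exact UMIs only, which would report three EML4-ALK molecules here because the single-error variant ACGTAA is no longer merged.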
Hybridization capture-based methods use complementary probes to enrich for target genes before sequencing. This approach offers greater flexibility for detecting novel fusion partners than amplicon-based methods. A recent study of non-small cell lung cancer specimens found that reflex hybridization capture-based RNA-seq, performed in approximately 10% of cases after a negative amplicon-based result, identified actionable oncogenic fusions that the initial testing had missed [14]. These fusions involved clinically relevant genes including ALK, BRAF, NRG1, NTRK3, ROS1, and RET.
Whole transcriptome sequencing provides an unbiased approach to fusion discovery by sequencing all expressed genes without prior selection. This method enables detection of novel fusion events without predetermined expectations about fusion partners but typically requires higher sequencing depth and more extensive bioinformatic analysis. Recent advances in long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore now enable full-length isoform sequencing, resolving complete fusion transcript structures in single reads [20]. These technologies are particularly valuable for resolving complex fusion isoforms and identifying fusions in single-cell transcriptomes.
Comparative studies provide critical insights into the performance characteristics of different RNA-seq approaches. In a comprehensive analysis of 806 acute myeloid leukemia samples, RNA-seq detected 90% of fusion events that were reported by routine diagnostic methods with high evidence, demonstrating strong concordance with established techniques [9]. The performance varied based on sequencing coverage, with samples exhibiting lower and inhomogeneous coverage showing reduced sensitivity, particularly for fusions involving CBFB and KMT2A.
A large-scale study comparing targeted RNA-seq with optical genome mapping (OGM) in 467 acute leukemia cases revealed an overall concordance rate of 88.1% for fusion detection [21]. Detection differed by rearrangement type: OGM uniquely detected 15.8% of clinically relevant rearrangements, while RNA-seq exclusively identified 9.4%. This highlights the complementary nature of different technologies, with RNA-seq better detecting expressed chimeric fusions and OGM more effectively identifying cryptic, enhancer-driven events that may not generate fusion transcripts.
Table: Performance Comparison of RNA-seq Fusion Detection Approaches
| Platform/Method | Sensitivity | Specificity | Key Strengths | Study Details |
|---|---|---|---|---|
| Targeted Amplicon (QIAseq) | 93.3% | 100% | Optimal for low-input samples; UMIs reduce false positives | 74 positive, 36 negative controls [19] |
| Hybridization Capture | ~90% (for reflex testing) | ~100% | Detects novel fusions; complements amplicon-based methods | Identified fusions in 10% of NSCLC cases missed by amplicon [14] |
| Whole Transcriptome | 89.9% | Varies by tools | Unbiased discovery; detects novel fusions | 806 AML samples; coverage-dependent [9] |
| Long-read Sequencing | Superior for complex isoforms | High with proper tools | Resolves full-length fusion structures | CTAT-LR-Fusion tool benchmarking [20] |
The accurate identification of fusion transcripts from RNA-seq data requires sophisticated computational approaches. Current methods primarily fall into two categories: read-mapping approaches that align sequences to reference genomes or transcriptomes to identify discordant reads, and de novo assembly-based approaches that reconstruct transcripts before identifying chimeric sequences. Benchmarking studies have evaluated numerous fusion detection tools, with STAR-Fusion, Arriba, and STAR-SEQR consistently demonstrating high accuracy and fast performance for fusion detection on cancer transcriptomes [22].
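The sketch below makes the read-mapping category concrete. It tallies the two kinds of evidence that read-mapping callers commonly report for a candidate gene pair:

- junction (split) reads that cross the fusion breakpoint;
- spanning fragments whose two mates align to different genes.

The `Fragment` record and its gene assignments are hypothetical simplifications. Real callers such as STAR-Fusion and Arriba work from aligner output. They also infer orientation, breakpoint coordinates, and artifact filters, none of which are modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical, simplified paired-end fragment after alignment."""
    mate1_gene: str
    mate2_gene: str
    # Set when one mate is split-aligned across two genes: (5' gene, 3' gene).
    split_genes: Optional[Tuple[str, str]] = None


def count_fusion_support(fragments):
    """Tally junction (split) reads and spanning (discordant) fragments per gene pair."""
    junction = Counter()
    spanning = Counter()
    for frag in fragments:
        if frag.split_genes is not None:
            # A read crossing the chimeric junction is the strongest evidence.
            junction[frag.split_genes] += 1
        elif frag.mate1_gene != frag.mate2_gene:
            # Mates in different genes bracket the junction without crossing it.
            spanning[(frag.mate1_gene, frag.mate2_gene)] += 1
    return junction, spanning


frags = [
    Fragment("EML4", "EML4", split_genes=("EML4", "ALK")),
    Fragment("EML4", "ALK"),
    Fragment("EML4", "ALK"),
    Fragment("GAPDH", "GAPDH"),  # ordinary fragment, ignored
]
junction, spanning = count_fusion_support(frags)
```

In this toy tally, a split fragment counts only as junction evidence. Fragments whose mates both map to the same gene are ignored.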
Performance varies significantly among tools, with mapping-based approaches generally outperforming assembly-based methods in terms of sensitivity. In simulated data benchmarks, Arriba, Pizzly, STAR-SEQR, and STAR-Fusion emerged as top performers, while methods requiring de novo transcriptome assembly exhibited high precision but suffered from comparably low sensitivity [22]. Fusion detection sensitivity is notably affected by fusion expression levels, with most tools performing better for moderately and highly expressed fusions.
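Benchmarks of this kind score each tool's predictions against a truth set. The minimal sketch below makes two assumptions. First, fusions are identified by gene-pair strings in `GENE5--GENE3` form. Second, the truth set is annotated with hypothetical expression bins. The code computes sensitivity, precision, and F1, then stratifies sensitivity by expression level.

```python
def benchmark(predicted: set, truth: set) -> dict:
    """Score predicted fusions against a truth set of gene-pair identifiers."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


def sensitivity_by_bin(predicted: set, truth_bins: dict) -> dict:
    """Sensitivity within each expression bin; truth_bins maps fusion -> bin label."""
    totals, hits = {}, {}
    for fusion, label in truth_bins.items():
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (fusion in predicted)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical truth set and predictions (not results from [22]).
truth_bins = {"EML4--ALK": "high", "CD74--ROS1": "high",
              "KIF5B--RET": "low", "ETV6--NTRK3": "low"}
predicted = {"EML4--ALK", "CD74--ROS1", "GENE1--GENE2"}
scores = benchmark(predicted, set(truth_bins))
by_bin = sensitivity_by_bin(predicted, truth_bins)
# scores: sensitivity 0.5, precision 2/3, f1 4/7; by_bin: {"high": 1.0, "low": 0.0}
```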
The high rate of false positives represents a significant challenge in fusion transcript detection, necessitating robust validation strategies. Integration with whole-genome sequencing (WGS) data provides orthogonal confirmation of fusion events at the DNA level. Recently developed pipelines for validating fusion transcripts in matched WGS data have demonstrated superior sensitivity and speed compared to established structural variant callers like BreakDancer and Manta [23]. These approaches use focused searches based on RNA-seq fusion predictions to identify supporting evidence in WGS data, significantly reducing computational requirements while maintaining high sensitivity.
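The "focused search" idea can be sketched as counting WGS discordant read pairs with one mate in a window around each partner's predicted breakpoint. The windows would be derived from the RNA-seq fusion call, for example by padding the introns flanking the fused exons. This is a hypothetical illustration, not the published pipeline from [23]. In practice the mates would be retrieved with indexed region queries on a BAM file, for example pysam's `AlignmentFile.fetch`, which is what keeps such searches fast. Here they are given as a hypothetical list of `(chrom1, pos1, chrom2, pos2)` tuples.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """Half-open genomic window around a predicted breakpoint."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        return chrom == self.chrom and self.start <= pos < self.end


def supporting_pairs(pairs, win_a: Window, win_b: Window) -> int:
    """Count discordant pairs with one mate in each window, in either order."""
    count = 0
    for chrom1, pos1, chrom2, pos2 in pairs:
        forward = win_a.contains(chrom1, pos1) and win_b.contains(chrom2, pos2)
        reverse = win_a.contains(chrom2, pos2) and win_b.contains(chrom1, pos1)
        if forward or reverse:
            count += 1
    return count


# Hypothetical windows and WGS mate positions (illustrative coordinates).
win_a = Window("chr2", 1_000, 2_000)
win_b = Window("chr2", 50_000, 51_000)
pairs = [
    ("chr2", 1_500, "chr2", 50_500),   # supports the fusion
    ("chr2", 50_200, "chr2", 1_200),   # supports it, mates listed in reverse
    ("chr2", 1_500, "chr2", 1_800),    # both mates in window A only
    ("chr5", 10, "chr2", 50_500),      # one mate outside both windows
]
n_support = supporting_pairs(pairs, win_a, win_b)  # 2
```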
Successful detection of fusion transcripts requires careful selection of laboratory reagents and computational resources. The following table outlines essential components of a robust fusion detection workflow:
Table: Essential Research Reagents and Materials for Fusion Transcript Detection
| Category | Specific Products/Tools | Function and Application Notes |
|---|---|---|
| RNA Extraction | miRNeasy Kit (Qiagen), miRNeasy FFPE Kit (Qiagen) | Maintain RNA integrity; specialized protocols for FFPE samples |
| Library Prep | QIAseq RNAscan Custom Panel, Illumina TruSeq Stranded Total RNA | Target-specific vs. whole transcriptome approaches |
| rRNA Depletion | Ribo-Zero (Illumina) | Remove ribosomal RNA to enrich for mRNA and other non-ribosomal transcripts |
| Target Enrichment | OSU-SpARKFuse custom probes, xGen Lockdown Probes | Hybridization capture for targeted sequencing |
| Sequencing | Illumina MiSeq, NextSeq; PacBio Sequel; Oxford Nanopore | Platform selection based on read length and accuracy needs |
| Bioinformatics | STAR-Fusion, Arriba, CTAT-LR-Fusion, SeekFusion | Fusion detection algorithms with varying performance characteristics |
| Validation | OncoScan FFPE Assay Kit, RT-PCR, Orthogonal WGS | Confirm fusion events identified by RNA-seq |
RNA-seq has established itself as an indispensable technology for capturing expressed fusion transcripts, providing critical functional insights that complement DNA-level structural information. The optimal approach to fusion detection depends on specific research objectives, sample characteristics, and available resources. Targeted methods offer high sensitivity for known fusions in challenging samples like FFPE, while whole transcriptome and long-read approaches enable novel fusion discovery and isoform resolution. As sequencing technologies continue to evolve and computational methods improve, RNA-seq is likely to remain central to advancing our understanding of fusion transcripts in cancer biology and therapeutic development.
In the field of cancer genomics, the accurate detection of fusion genes is crucial for diagnosis, prognosis, and guiding targeted therapies. Two primary sequencing approaches—DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq)—offer distinct technological pathways for this detection, each with fundamental differences in what they measure: genomic breakpoints versus transcript expression. DNA-seq identifies structural rearrangements at the DNA level, including the precise breakpoints in the genome where different genes have joined. In contrast, RNA-seq detects the RNA transcripts that are actually expressed from such rearrangements, revealing the functional fusion products [4] [24]. This guide provides an objective comparison of their performance, supported by experimental data and detailed methodologies.
The following table summarizes the fundamental differences between DNA-seq and RNA-seq in the context of fusion gene detection.
| Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq) |
|---|---|---|
| Detection Principle | Identifies structural rearrangements and breakpoints in the genome itself [4]. | Identifies chimeric transcripts that are expressed and spliced [4] [24]. |
| Molecular Target | Genomic DNA (including introns and exons) [4]. | Complementary DNA (cDNA) derived from processed mRNA (exons only) [4] [24]. |
| Key Advantage | Can detect rearrangements regardless of whether they are expressed as RNA [4]. | Directly confirms expression; avoids sequencing long introns by focusing on spliced exon junctions [4] [24]. |
| Key Challenge | Breakpoints often lie in long, repetitive intronic regions, making them difficult to cover and sequence [4]. | Requires high-quality RNA and sufficient expression of the fusion transcript for detection [4]. |
The diagram below illustrates the core logical relationship between what each technology detects and its corresponding output.
Empirical studies and large-scale clinical validations consistently demonstrate the performance characteristics of each method. The following table quantifies their relative strengths and limitations.
| Study & Context | Key Finding on DNA-seq | Key Finding on RNA-seq | Experimental Detail |
|---|---|---|---|
| Tempus (Real-World, n ≈ 80,000) [25] | Detected only 4.8% of actionable fusions exclusively. | Detected 29.1% of actionable fusions exclusively, roughly six times the share detected exclusively by DNA-seq. | Assay: Tempus xT (DNA-seq panel for 21 fusions + whole-exome RNA-seq). |
| Targeted RNA-seq (Clinical Cohort) [26] | N/A (Compared to FISH/RT-PCR). | Increased overall diagnostic rate from 63% (conventional methods) to 76%. | Assay: Custom targeted RNA-seq panels for hematological and solid tumors. |
| FFPE Tumor Validation [27] | A DNA panel missed a MET fusion (false negative). | The RNA-seq assay identified the MET fusion and 26 extra fusions; 77% were validated. | Sample: Formalin-Fixed, Paraffin-Embedded (FFPE) tumor samples. |
| Acute Myeloid Leukemia Study [9] | N/A (compared with routine diagnostics: karyotyping, FISH, and PCR identified 107/138 fusions with high evidence). | Detected 115/138 fusions with high evidence, showing strong concordance and complementary value. | Sample: 806 patient samples; Tools: Arriba and FusionCatcher. |
The DNA-based approach focuses on identifying the genomic locus where a chromosomal rearrangement has occurred.
Limitations: The fundamental challenge is that breakpoints for gene fusions often occur within long intronic regions. These regions are difficult to cover with sufficient sequencing depth, and their repetitive nature complicates accurate alignment and variant calling [4].
The RNA-based approach skips the DNA breakpoint and instead focuses on the expressed, spliced mRNA product of the fusion gene.
Successful detection and validation of fusion genes require a combination of laboratory and computational resources. The following table lists essential solutions and their functions.
| Research Reagent / Tool | Function / Application |
|---|---|
| Arriba & FusionCatcher | Widely used, state-of-the-art fusion detection software tools that are often used in conjunction for high-confidence calling [9] [26]. |
| STAR-Fusion | Another accurate and widely used fusion detection algorithm, based on the STAR aligner [26]. |
| Targeted RNA-seq Panels | Biotinylated oligonucleotide probes designed to enrich for hundreds of known fusion-related genes, dramatically increasing sensitivity for low-expression fusions and enabling work with degraded samples [26]. |
| FFPE-RNA Extraction Kits | Specialized reagents for extracting usable RNA from Formalin-Fixed, Paraffin-Embedded (FFPE) tissue blocks, the most common form of clinical archiving [27]. |
| Spike-in Control RNAs | Synthetic RNA controls (e.g., ERCC, fusion sequins) spiked into samples to quantitatively evaluate the sensitivity, accuracy, and limit of detection of the entire RNA-seq workflow [26]. |
| Long-read Aligners (Minimap2) | Essential software for aligning data from long-read sequencing technologies (PacBio, Nanopore), which is crucial for tools like GFvoter [8]. |
The choice between DNA-seq and RNA-seq for fusion gene detection is not a matter of one being universally superior, but rather of understanding their complementary strengths. DNA-seq is unparalleled in identifying the genomic architecture and breakpoints of structural rearrangements. However, for confirming the expression of a functionally consequential fusion transcript with high sensitivity and clinical actionability, RNA-seq has demonstrated a clear and significant advantage. The most robust clinical and research practice is to utilize these technologies in tandem, where DNA-seq provides the structural context and RNA-seq delivers functional validation of the expressed fusion, ensuring the most comprehensive and accurate detection for precision oncology.
The diagnosis of gene fusions, critical drivers in cancer, has historically relied on traditional molecular techniques such as fluorescence in situ hybridization (FISH) and reverse transcription polymerase chain reaction (RT-PCR). Though highly sensitive, these methods typically test for a single fusion gene per assay, often resulting in a lengthy, iterative, and costly diagnostic path. Furthermore, they cannot identify novel fusion gene partners or resolve complex structural rearrangements. False-negative results from untested fusions are a leading cause of misdiagnosis in haematological cancers [26]. The advent of next-generation sequencing (NGS) has fundamentally transformed this landscape by enabling genome-wide surveillance of fusion genes with nucleotide-level resolution. Among NGS approaches, a key distinction exists between DNA-sequencing (DNA-seq) and RNA-sequencing (RNA-seq) methods, each with unique strengths and limitations for fusion detection. This guide objectively compares the performance of these platforms, framing the discussion within the broader thesis of RNA-seq versus DNA-seq for fusion detection research.
DNA-seq and RNA-seq assays employ distinct laboratory methods and bioinformatic pipelines to identify gene fusions. DNA-based NGS (including whole-genome, whole-exome, or targeted panels) detects rearrangements at the genomic DNA level by identifying sequencing reads that span breakpoints between different genes or chromosomal regions. In contrast, RNA-based NGS detects the expressed transcript resulting from a gene fusion, effectively capturing the chimeric RNA molecule. Common RNA-seq enrichment methods include anchored multiplex PCR (AMP), amplicon-based multiplex PCR, and hybrid capture-based enrichment [30] [26].
Table 1: Core Methodological Differences Between DNA-seq and RNA-seq for Fusion Detection
| Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq) |
|---|---|---|
| Target Molecule | Genomic DNA | Messenger RNA (transcriptome) |
| Detection Principle | Identifies structural rearrangements and breakpoints in the DNA sequence | Identifies chimeric fusion transcripts |
| Key Enrichment Methods | Hybrid-capture, Amplicon | Anchored Multiplex PCR, Hybrid-capture, Amplicon |
| Ability to Detect Novel Partners | Limited to targeted genomic regions; can be challenging | High, especially with anchored multiplex or hybrid-capture methods |
| Confirmation of Expression | No; identifies potential but not necessarily expressed fusions | Yes; directly confirms the fusion is transcribed |
| Influence of Gene Expression | Independent of expression level | Dependent on transcript abundance |
A critical advancement in the diagnostic workup is the use of reflex testing protocols. Studies in non-small-cell lung carcinoma (NSCLC) have demonstrated that an algorithm using an initial amplicon-based DNA/RNA test, followed by reflex hybridization-capture–based RNA-seq for negative cases, significantly improves the detection of rare and novel oncogenic fusions, thereby maximizing patient eligibility for targeted therapies [14].
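The two-tier logic of this reflex algorithm can be sketched as a simple decision function. This is a minimal illustration, assuming that reflex RNA-seq is triggered when the initial amplicon-based test finds neither a fusion nor another actionable driver. The triggering criteria used in [14] may be more detailed.

```python
def next_step(fusion_found: bool, other_driver_found: bool) -> str:
    """Decide the next action after an initial amplicon-based DNA/RNA panel.

    Assumed rule: reflex only when the first-tier test is negative for both
    fusions and other actionable drivers.
    """
    if fusion_found:
        return "report fusion"
    if other_driver_found:
        return "report driver; no reflex"
    return "reflex to hybridization-capture RNA-seq"


# Hypothetical first-tier results: (fusion_found, other_driver_found).
cases = {"S1": (True, False), "S2": (False, True), "S3": (False, False)}
plan = {sid: next_step(*flags) for sid, flags in cases.items()}
```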
Head-to-head comparative studies reveal that RNA-seq and DNA-seq platforms are largely complementary, with each method uniquely detecting a subset of clinically significant rearrangements.
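A large-scale study of 467 acute leukemia cases directly compared a 108-gene targeted RNA-seq panel with Optical Genome Mapping (OGM), a DNA-level structural variant mapping technique. The results demonstrated an overall concordance rate of 88.1% for clinically relevant events [21]. However, each method contributed unique findings. OGM alone detected 15.8% of clinically relevant rearrangements, most notably cryptic, enhancer-driven events that may not generate fusion transcripts. RNA-seq alone identified 9.4%, reflecting its strength in detecting expressed chimeric fusions [21].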
In solid tumors, RNA-seq has proven highly effective, particularly for biomarker-driven therapies. For example, in the detection of NTRK fusions—an FDA-approved target—RNA-seq is one of the most sensitive methods [30]. A comparative study of three RNA-seq chemistries found that while amplicon-based multiplex PCR had the lowest limit of detection, both hybrid-capture and anchored multiplex PCR methods were superior for detecting NTRK fusions with uncommon or novel partners [30].
Table 2: Performance Comparison of RNA-seq Assay Types for Fusion Detection
| Performance Metric | Amplicon-Based Multiplex PCR | Anchored Multiplex PCR (AMP) | Hybrid-Capture-Based |
|---|---|---|---|
| Analytical Sensitivity | Highest (Lowest Limit of Detection) | High | High |
| Ability to Detect Novel/Uncommon Partners | Limited | High | High |
| Example Clinical Utility | Detecting known, common fusions | Discovery; complex rearrangements | Comprehensive profiling; reflex testing |
The analytical performance of RNA-seq has been rigorously validated for clinical use. One study developed an RNA-seq assay for formalin-fixed, paraffin-embedded (FFPE) tumors, demonstrating it could identify all spiked-in NTRK fusions from reference material and achieved a detection limit down to 10% tumor content in dilution experiments [27]. The assay showed 83.3% sensitivity against a DNA panel and successfully identified additional fusions not covered by the DNA assay [27].
To ensure reproducibility, below are the core experimental protocols from key studies cited in this guide.
Targeted RNA-seq for Fusion Detection in Leukemia (from [21])
Validation of an RNA-seq Assay for FFPE Tumors (from [27])
The following diagram illustrates the integrated DNA/RNA-seq reflex testing workflow used to maximize fusion detection in non-small-cell lung cancer, as described in the research [14].
Successful implementation of NGS-based fusion detection requires a suite of specialized reagents, kits, and computational tools.
Table 3: Key Research Reagent Solutions for NGS-Based Fusion Detection
| Item | Function/Description | Example Kits/Tools (from Cited Studies) |
|---|---|---|
| RNA Extraction Kits | Isolate high-quality RNA from cell, tissue, or FFPE samples. | Qiagen RNeasy Kit [31], TRIzol-based methods [31] |
| Library Prep Kits | Prepare sequencing libraries from extracted RNA. | Illumina TruSeq Stranded mRNA, NEBNext Ultra II RNA, Lexogen QuantSeq-Pool, Alithea MERCURIUS BRB-seq [31] |
| Target Enrichment | Enrich for target genes/transcripts prior to sequencing. | Archer AMP Kit (Anchored Multiplex PCR) [21], Hybrid-capture probes [26] |
| Sequencing Platforms | Instruments to perform high-throughput sequencing. | Illumina NovaSeq 6000, Illumina MiSeq [31] [32] |
| Bioinformatics Tools | Align sequences, detect fusions, and interpret variants. | STAR aligner [31], Arriba [7], STAR-Fusion, FusionCatcher [7] [26], Archer Analysis [21] |
| RNA Quality Control | Assess RNA integrity and quantity prior to library prep. | Agilent Bioanalyzer RNA-6000-Nano chip [31] |
The cost of RNA-sequencing varies significantly based on the library preparation method and sequencing depth. A detailed breakdown shows that library preparation is often the most expensive step [31]. When using a high-throughput NovaSeq S4 flow cell at full capacity, total costs per sample (excluding labor) can range from approximately $37 (using a highly multiplexed kit like BRB-seq at 5M reads) to $114 (using Illumina's TruSeq kit at ≥25M reads) [31]. Core facility pricing from Northwestern University provides a commercial benchmark, listing mRNA-seq complete services (library prep, sequencing, and standard bioinformatics) at $380 per sample for institutional users [32]. These figures highlight that while NGS has become more accessible, budgeting must carefully consider the trade-offs between cost, sequencing depth, and the comprehensiveness of the assay.
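The dependence of per-sample cost on depth follows from simple arithmetic. Library preparation is paid per sample, while sequencing cost is shared among however many samples fit on a flow cell at the target read depth. The sketch below assumes a fixed flow-cell price and read yield. The figures in the example are placeholders, not vendor or core-facility prices. The model also ignores labor, QC, and limits on the number of available sample barcodes.

```python
def cost_per_sample(library_prep: float, flow_cell_cost: float,
                    flow_cell_reads: int, reads_per_sample: int) -> float:
    """Library prep plus this sample's share of one fully loaded flow cell."""
    samples_per_flow_cell = flow_cell_reads // reads_per_sample
    if samples_per_flow_cell == 0:
        raise ValueError("target depth exceeds flow-cell yield")
    return library_prep + flow_cell_cost / samples_per_flow_cell


# Placeholder inputs: $20 prep, $20,000 flow cell yielding 10 billion reads.
shallow = cost_per_sample(20.0, 20_000.0, 10_000_000_000, 5_000_000)   # 2,000 samples
deep = cost_per_sample(20.0, 20_000.0, 10_000_000_000, 25_000_000)     # 400 samples
```

With these placeholder inputs, sequencing five times deeper raises the sequencing share from $10 to $50 per sample. The library-prep term stays fixed, so the total rises from $30 to $70.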
The evolution from FISH and PCR to NGS platforms has irrevocably changed the diagnostic landscape for gene fusions. The evidence clearly demonstrates that DNA-seq and RNA-seq are not competing but complementary technologies. DNA-level methods like OGM are superior for detecting structural rearrangements that may not result in fusion transcripts, such as enhancer-hijacking events. Conversely, RNA-seq directly confirms the expression of a chimeric fusion, is more sensitive for fusions arising from intrachromosomal deletions, and excels at identifying novel fusion partners, making it indispensable for comprehensive biomarker testing. The most effective modern diagnostic algorithms therefore leverage the strengths of both approaches, often through reflex testing protocols. As sequencing costs continue to decline and bioinformatic tools like Arriba [7] improve in speed and accuracy, integrated multi-modal NGS testing is likely to become the standard of care, helping ensure that patients receive the most precise diagnosis and access to targeted therapies.
The detection of gene rearrangements, such as those producing oncogenic fusions, represents a critical component of precision oncology and genetic disease diagnosis. However, the presence of large intronic regions—stretches of non-coding DNA that can span thousands of bases—poses a formidable challenge for conventional DNA sequencing (DNA-seq) technologies. These intronic regions often contain breakpoints where structural rearrangements occur, yet their length and repetitive nature can obscure detection using standard approaches. While DNA-seq provides essential information about genomic architecture, its limitations in resolving breakpoints within extensive intronic sequences have driven the development of complementary technologies, most notably RNA sequencing (RNA-seq).
The fundamental challenge lies in the technological constraints of the most widely used DNA-seq platforms. Short-read sequencing, while excellent for identifying single nucleotide variants and small insertions/deletions, struggles to cover large intronic regions where breakpoints may reside. This limitation becomes clinically significant when rearrangements in these regions produce functionally important fusion genes or disrupt normal gene function. Consequently, understanding when DNA-seq suffices and when it requires augmentation from other methods is essential for researchers and clinicians designing diagnostic approaches.
Conventional DNA-seq approaches face several inherent limitations when targeting rearrangements with breakpoints in large intronic regions. The primary issue stems from library preparation methods and read length constraints. Most targeted DNA-seq panels use hybrid capture or amplicon-based approaches designed to cover exonic regions and occasionally their immediate flanking sequences. This design inevitably creates gaps in coverage across large introns, resulting in an inability to detect breakpoints occurring in these under-covered regions [14].
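The coverage gap can be quantified as the fraction of an intron that falls inside capture baits. The sketch below is a hypothetical illustration with made-up bait positions. It merges half-open bait intervals clipped to the intron. It ignores the off-target and flanking reads that real hybrid-capture libraries also yield, so it somewhat understates actual coverage.

```python
def covered_fraction(region, baits):
    """Fraction of a half-open region (start, end) covered by half-open bait intervals."""
    start, end = region
    # Keep only baits overlapping the region, clipped to its bounds, sorted by start.
    clipped = sorted((max(s, start), min(e, end)) for s, e in baits if s < end and e > start)
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)


# Hypothetical 20 kb intron with baits only near its flanking exon boundaries.
intron = (0, 20_000)
baits = [(0, 120), (100, 220), (19_900, 20_000), (30_000, 30_120)]
fraction = covered_fraction(intron, baits)  # 320 / 20,000 bases = 0.016
```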
The detection challenge arises because the rearrangement junction can fall anywhere within an intron that may span tens of kilobases or more. With short-read sequencing (typically 75-300 bp), only two kinds of reads carry evidence of the event: reads that cross the junction (split reads) and read pairs whose mates land on either side of it (discordant pairs). The breakpoint is therefore detectable only if the surrounding intronic sequence is actually captured and sequenced at adequate depth. Even when such reads are obtained, the inference of precise breakpoint locations remains challenging when they fall within repetitive or low-complexity sequences common in intronic regions [13].
Additionally, the bioinformatic pipelines used to identify structural variants from DNA-seq data often rely on discordant read pairs and split reads as signals of rearrangement events. For breakpoints in large introns, especially those with repetitive elements, these signatures can be difficult to distinguish from mapping artifacts or technical noise. The problem is particularly pronounced for complex rearrangements involving multiple breakpoints, where the linear distance between genomic features further complicates accurate reconstruction [33].
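The read-pair signatures described above can be illustrated with a simple classifier for a paired-end library sequenced in the standard forward-reverse orientation. The insert-size threshold is a hypothetical placeholder. In practice it is derived from the library's observed insert-size distribution. Callers also require mapping-quality filters and multiple supporting pairs, because repeats generate look-alike signals.

```python
def classify_pair(chrom1: str, pos1: int, strand1: str,
                  chrom2: str, pos2: int, strand2: str,
                  max_insert: int = 1_000) -> str:
    """Label a mapped read pair by the structural-variant signature it suggests."""
    if chrom1 != chrom2:
        return "interchromosomal"      # translocation signature
    if strand1 == strand2:
        return "same-strand"           # inversion signature
    (left_pos, left_strand), (right_pos, _) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == "-":
        return "everted"               # tandem-duplication signature
    if right_pos - left_pos > max_insert:
        return "long-insert"           # deletion signature
    return "concordant"


labels = [
    classify_pair("chr2", 100, "+", "chr5", 500, "-"),
    classify_pair("chr2", 100, "+", "chr2", 400, "-"),
    classify_pair("chr2", 100, "+", "chr2", 50_000, "-"),
    classify_pair("chr2", 100, "+", "chr2", 400, "+"),
    classify_pair("chr2", 400, "+", "chr2", 100, "-"),
]
```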
The table below summarizes the core methodological differences between DNA-seq and RNA-seq approaches for rearrangement detection:
Table 1: Core Methodological Differences Between DNA-seq and RNA-seq for Rearrangement Detection
| Feature | DNA-Seq Approach | RNA-Seq Approach |
|---|---|---|
| Target Material | Genomic DNA | Processed messenger RNA |
| Breakpoint Detection | Direct detection of genomic breakpoints | Detection of expressed fusion transcripts |
| Intronic Region Impact | Limited by intron size and repetitive elements | Introns removed during RNA processing |
| Functional Relevance | Identifies structural variants regardless of functional impact | Confirms expression of fusion products |
| Coverage Requirements | Requires continuous coverage across potential breakpoint regions | Requires coverage of exon boundaries |
| Novel Partner Discovery | Limited to designed target regions | Can identify novel partners via untargeted methods |
DNA-seq identifies structural variants at the genomic level by directly sequencing DNA and looking for abnormalities in sequence arrangement. In contrast, RNA-seq detects the transcriptional consequences of these rearrangements—specifically, the fusion transcripts that result from chromosomal rearrangements [34]. This fundamental difference explains their complementary strengths: DNA-seq can potentially identify all structural variants regardless of their functional consequences, while RNA-seq confirms which variants are actually expressed and likely functionally relevant.
For intronic breakpoints specifically, RNA-seq possesses a distinct advantage because the natural process of RNA splicing removes introns during maturation from pre-mRNA to mRNA. Consequently, RNA-seq only needs to sequence across exon-exon junctions, completely bypassing the challenge of large intronic regions that plague DNA-seq approaches [35]. This enables RNA-seq to detect fusion events regardless of the genomic distance or complexity between partner genes, provided the fusion is expressed at detectable levels.
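The consequence of splicing can be shown with a toy gene model. The sketch below makes two assumptions: both partner genes lie on the plus strand, and canonical splicing joins the last retained exon of the 5' partner to the first retained exon of the 3' partner. Under those assumptions it lists the exons present in the fusion transcript. Exon coordinates are hypothetical. Moving either breakpoint anywhere within the same intron yields the same exon chain, which is why RNA-seq only needs reads across a single exon-exon junction.

```python
def retained_exons(exons, breakpoint, side):
    """Exons kept on one side of an intronic breakpoint (plus strand, half-open)."""
    if any(start <= breakpoint < end for start, end in exons):
        raise ValueError("exonic breakpoint; this sketch handles intronic breakpoints only")
    if side == "5p":
        return [ex for ex in exons if ex[1] <= breakpoint]   # exons upstream of the break
    if side == "3p":
        return [ex for ex in exons if ex[0] > breakpoint]    # exons downstream of the break
    raise ValueError("side must be '5p' or '3p'")


def fusion_exon_chain(exons_5p, bp_5p, exons_3p, bp_3p):
    """Ordered exons of the spliced fusion transcript, tagged by partner."""
    return ([("5p", ex) for ex in retained_exons(exons_5p, bp_5p, "5p")]
            + [("3p", ex) for ex in retained_exons(exons_3p, bp_3p, "3p")])


# Hypothetical gene models in local coordinates.
gene_a = [(0, 100), (5_000, 5_150), (40_000, 40_200)]
gene_b = [(0, 80), (30_000, 30_120), (31_000, 31_300)]

# Two different breakpoint positions within the same pair of introns.
chain_1 = fusion_exon_chain(gene_a, 10_000, gene_b, 2_000)
chain_2 = fusion_exon_chain(gene_a, 35_000, gene_b, 29_000)
```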
Recent clinical studies directly comparing DNA-seq and RNA-seq performance have quantified the detection gap for rearrangements with challenging genomic architectures. In non-small cell lung cancer (NSCLC), a study of 1,211 specimens found that approximately 10% of cases required reflex testing with hybridization-capture-based RNA-seq after initial amplicon-based DNA/RNA sequencing yielded negative results despite clinical suspicion. Among these reflex cases, oncogenic fusions involving genes including ALK, BRAF, NRG1, NTRK3, ROS1, and RET were identified—none of which were detected by the initial amplicon-based assay [14].
A focused investigation of RET fusions in early-stage NSCLC provided further insight into method-specific sensitivities. In this study, DNA-seq successfully identified putative RET+ cases, but the subsequent RNA-seq analysis demonstrated enhanced detection capabilities. Targeted RNA-seq specifically uncovered five additional RET+ cases that were missed by whole-transcriptome sequencing, highlighting both the value of RNA-based detection and the performance differences between RNA-seq approaches [36] [37]. The concordance rates between methods were notably high but imperfect: 92.3% between DNA-seq and RNA-seq, and 82.5% between DNA-seq and FISH, underscoring that each method captures a slightly different subset of rearrangements [36].
The performance gap varies significantly across cancer types and specific genes. In acute leukemia, a comprehensive comparison of targeted RNA-seq and optical genome mapping (OGM) in 467 cases revealed an overall concordance of 88.1% for fusion detection. However, the detection rates were highly variable, with RNA-seq uniquely identifying 9.4% of clinically relevant rearrangements, while OGM exclusively detected 15.8% [21]. This suggests that the optimal testing approach may need to be tailored to specific clinical contexts and target genes.
Table 2: Clinical Detection Rates of Oncogenic Fusions Across Methodologies
| Study Context | DNA-Seq Detection Rate | RNA-Seq Detection Rate | Key Findings |
|---|---|---|---|
| NSCLC (n=1,211) [14] | ~90% of fusions (estimated from database review) | Identified 100% of fusions in reflex cohort | 10% of cases required RNA-seq reflex testing; RNA-seq found actionable fusions missed by DNA-seq |
| RET+ Early-Stage NSCLC (n=40) [36] [37] | 92.3% concordance with RNA-seq | 100% detection in confirmed RET+ cases | Targeted RNA-seq identified 5 additional cases missed by whole-transcriptome sequencing |
| Acute Leukemia (n=467) [21] | Not separately reported | 74.7% overall concordance with OGM | RNA-seq better for fusions from intrachromosomal deletions; OGM superior for enhancer-hijacking events |
| Solid Tumors (n=60) [34] | 93.4% concordance with reference methods | 86.9% concordance with reference methods | Integrated DNA/RNA testing achieved 100% sensitivity and specificity |
Beyond oncology, the limitations of DNA-seq for detecting intronic variants have significant implications for genetic disease diagnosis. A compelling case report described a patient with clinical Cowden syndrome who had negative targeted DNA sequencing results. Through concurrent RNA testing, researchers identified a deep intronic PTEN pathogenic variant that disrupted normal splicing [38]. This variant would have remained undetected by standard DNA-seq approaches, which typically only capture exons and short flanking intronic sequences. The discovery enabled accurate risk assessment and clinical management for the patient and their family members.
The integration of DNA and RNA sequencing can also resolve complex structural variants that evade characterization by single-method approaches. In one study investigating copy number gains, researchers utilized long-read sequencing on both DNA and cDNA to precisely map breakpoints at single-base resolution. This integrated approach revealed intricate rearrangement structures and their functional consequences on transcription, providing insights that would have been impossible with DNA-seq alone [33].
Emerging technologies like long-read sequencing offer potential solutions to some limitations of short-read DNA-seq. Pacific Biosciences HiFi and Oxford Nanopore Technologies can generate reads spanning kilobases to megabases, potentially capturing large intronic regions and complex rearrangements in a single read [13]. However, these technologies currently face challenges related to cost, throughput, and analytical validation for routine clinical use, suggesting they will complement rather than immediately replace established DNA-seq and RNA-seq approaches.
Standardized protocols for DNA-seq-based rearrangement detection typically begin with sample preparation from formalin-fixed paraffin-embedded (FFPE) tissue or fresh frozen specimens. For targeted DNA-seq approaches, hybrid capture or amplicon-based methods are used to enrich for genomic regions of interest. In one representative study investigating RET fusions in NSCLC, researchers employed a 425-gene panel with the following workflow: genomic DNA extraction using the QIAamp DNA FFPE Tissue kit, quality assessment via Nanodrop and Qubit fluorometry, library preparation with the KAPA Hyper Prep kit, and sequencing on Illumina HiSeq4000 platforms [36].
The bioinformatic analysis typically involves alignment to a reference genome (e.g., hg19/GRCh37) using tools like the Burrows-Wheeler Aligner (BWA), followed by variant calling with specialized structural variant detection algorithms. In the RET fusion study, researchers used Delly for somatic gene fusion detection after standard processing with GATK for base quality recalibration and local realignment [36]. For comprehensive variant interpretation, detected rearrangements are often manually verified using visualization tools such as the Integrative Genomics Viewer (IGV).
The limit of detection for DNA-seq assays varies based on sequencing depth and variant allele frequency. Validation studies of integrated DNA/RNA assays have demonstrated stable fusion detection at DNA mutational abundances as low as 5%, though performance depends on the specific fusion characteristics [34]. Intra-assay and inter-assay reproducibility validation is essential, with studies typically demonstrating complete concordance across replicates when quality metrics are maintained.
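The dependence on depth and allele fraction can be approximated with a binomial model. Suppose each read at the breakpoint independently supports the rearrangement with probability equal to the variant allele fraction. The chance of seeing at least the minimum number of supporting reads then follows directly. This is a simplification that ignores mapping bias against breakpoint-spanning reads and duplicate reads. The depth, fraction, and read threshold in the example are illustrative. `scipy.stats.binom.sf(min_reads - 1, depth, vaf)` computes the same quantity with better numerical behavior at very high depth.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_reads: int) -> float:
    """P(at least min_reads supporting reads) under Binomial(depth, vaf)."""
    p_below = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
                  for i in range(min_reads))
    return 1.0 - p_below


# Illustrative: 5% allele fraction, at least 5 supporting reads required.
p_200 = detection_probability(200, 0.05, 5)
p_1000 = detection_probability(1000, 0.05, 5)
```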
RNA-seq protocols for fusion detection address different technical challenges, particularly RNA quality preservation from clinical specimens. A typical workflow begins with RNA extraction using specialized kits such as the RNeasy FFPE kit, followed by quality and quantity measurement with Qubit RNA HS assays. For targeted RNA-seq, custom-designed probes enrich for specific transcripts or gene regions of interest, improving detection sensitivity for low-abundance fusion transcripts [36].
Two primary enrichment strategies dominate clinical RNA-seq for fusions: anchored multiplex PCR (AMP) and hybrid-capture-based approaches. The AMP method uses unidirectional gene-specific primers to capture known and novel fusion partners, making it particularly valuable for detecting rearrangements with previously uncharacterized partners. In contrast, hybrid-capture approaches use biotinylated probes to pull down target transcripts, offering broader coverage of potential fusion events [21] [35].
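Anchored multiplex PCR reads begin in a known exon, where the gene-specific primer binds, and extend into whatever sequence follows. This is how an unknown partner can be recovered. The sketch below uses hypothetical sequences. It strips the known anchor and assigns the remainder to a partner by k-mer voting against a small reference. It is not the Archer Analysis algorithm; real pipelines align the downstream sequence to the full transcriptome.

```python
from collections import Counter


def build_kmer_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def identify_partner(read: str, anchor: str, index: dict, k: int):
    """Return the reference with most k-mer hits downstream of the anchor, or None."""
    if not read.startswith(anchor):
        return None                      # read is not primed from this anchor exon
    downstream = read[len(anchor):]
    votes = Counter()
    for i in range(len(downstream) - k + 1):
        for name in index.get(downstream[i:i + k], ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical sequences: the anchor stands for primer plus known exon end.
anchor = "ACGTTGCAAGGCT"
references = {"PARTNER_X": "TTGACCGATCGGATACCA", "PARTNER_Y": "GGCATTACGGTACCGTAA"}
index = build_kmer_index(references, k=6)
read = anchor + "TTGACCGATCGG"
partner = identify_partner(read, anchor, index, k=6)   # "PARTNER_X"
```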
The analytical sensitivity of RNA-seq fusion detection depends on *transcript abundance* rather than genomic characteristics. Studies have demonstrated reliable fusion detection with fusion transcripts present at as few as 250-400 copies per 100 ng of total RNA [34]. For the bioinformatic identification of fusions, tools like FusionCatcher and Archer Analysis Software align sequencing reads to reference genomes and apply filters to distinguish true fusion transcripts from artifacts. The high sensitivity of RNA hybrid-capture sequencing is evidenced by its ability to identify numerous oncogenic and likely oncogenic NTRK fusions across diverse tumor types in real-world clinical settings [35].
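Filtering steps of the kind these tools apply can be sketched as a few rules on each candidate. The thresholds, the blacklist, and the `Candidate` record below are hypothetical. The crude read-through rule flags nearby genes on the same chromosome and strand. It would also discard some genuine fusions created by small deletions, which is why production tools weigh such cases with more nuanced evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one fusion candidate from an RNA-seq caller."""
    gene5: str
    gene3: str
    chrom5: str
    pos5: int
    strand5: str
    chrom3: str
    pos3: int
    strand3: str
    junction_reads: int
    spanning_frags: int


def passes_filters(c: Candidate, blacklist: set,
                   min_support: int = 3, readthrough_window: int = 200_000) -> bool:
    """Apply simple, hypothetical artifact filters to a candidate."""
    if c.junction_reads < 1:                       # require breakpoint-spanning evidence
        return False
    if c.junction_reads + c.spanning_frags < min_support:
        return False
    if frozenset((c.gene5, c.gene3)) in blacklist:  # e.g. recurrent artifacts, paralog pairs
        return False
    # Crude read-through check: nearby genes on the same chromosome and strand.
    is_readthrough = (c.chrom5 == c.chrom3 and c.strand5 == c.strand3
                      and abs(c.pos3 - c.pos5) <= readthrough_window)
    return not is_readthrough


blacklist = {frozenset(("GENE_P", "GENE_Q"))}
good = Candidate("GENE_A", "GENE_B", "chr1", 1_000_000, "+", "chr7", 5_000_000, "-", 6, 4)
neighbour = Candidate("GENE_C", "GENE_D", "chr3", 100_000, "+", "chr3", 150_000, "+", 10, 10)
no_junction = Candidate("GENE_E", "GENE_F", "chr4", 10, "+", "chr9", 20, "+", 0, 12)
listed = Candidate("GENE_P", "GENE_Q", "chr5", 10, "+", "chr6", 20, "-", 8, 8)
kept = [c.gene5 for c in (good, neighbour, no_junction, listed) if passes_filters(c, blacklist)]
```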
Diagram 1: Molecular Biology of Fusion Detection. This diagram illustrates the central dogma of molecular biology and how fusion genes with intronic breakpoints give rise to oncogenic proteins. RNA-seq bypasses the challenge of large introns by detecting the spliced, expressed fusion transcript directly.
The experimental approaches discussed require specialized reagents and computational tools to successfully detect rearrangements with large intronic breakpoints. The following table catalogues key solutions used in the cited studies:
Table 3: Essential Research Reagents and Tools for Rearrangement Detection Studies
| Category | Specific Product/Platform | Application Note |
|---|---|---|
| DNA Extraction | QIAamp DNA FFPE Tissue Kit (Qiagen) | Optimized for challenging clinical samples |
| RNA Extraction | RNeasy FFPE Kit (Qiagen) | Maintains RNA integrity from archived specimens |
| Target Enrichment | Anchored Multiplex PCR (Archer) | Captures novel fusion partners |
| Target Enrichment | Hybrid-Capture Probes (Illumina) | Broad coverage of fusion events |
| Sequencing Platform | Illumina HiSeq 4000 | Workhorse for clinical NGS |
| Long-Read Platform | PacBio HiFi Sequencing | Resolves complex structural variants |
| Long-Read Platform | Oxford Nanopore PromethION | Ultra-long reads for spanning introns |
| Variant Caller | Delly | Specialized for structural variants |
| Fusion Detection | FusionCatcher | Identifies fusion transcripts from RNA-seq |
| Visualization | Integrative Genomics Viewer (IGV) | Manual verification of rearrangements |
Each solution addresses specific technical challenges in detecting rearrangements with intronic breakpoints. For example, specialized extraction kits maintain nucleic acid integrity despite degradation in FFPE samples, while targeted enrichment approaches ensure sufficient coverage of relevant genomic regions or transcripts. The choice between detection platforms involves trade-offs between read length, accuracy, throughput, and cost, with each technology offering distinct advantages for particular applications.
Diagram 2: Experimental Workflow for Rearrangement Detection. This diagram outlines parallel DNA-seq and RNA-seq pathways for comprehensive rearrangement detection, culminating in integrated analysis that compensates for the limitations of each individual method.
The detection of rearrangements with large intronic breakpoints remains a challenging frontier in genomic analysis. While DNA-seq provides critical information about genomic architecture, its limitations in spanning large intronic regions necessitate complementary approaches. RNA-seq offers a powerful solution by detecting the expressed consequences of these rearrangements, effectively bypassing the challenges posed by intronic sequences. The most effective diagnostic and research strategies increasingly employ integrated approaches that combine the strengths of both technologies.
Evidence from multiple clinical studies demonstrates that reflexive testing algorithms—where RNA-seq follows negative DNA-seq results in clinically suspicious cases—significantly improve detection rates for actionable rearrangements. As sequencing technologies evolve, long-read approaches may eventually overcome current limitations, but for now, the strategic combination of DNA and RNA sequencing represents the most comprehensive approach for detecting rearrangements with large intronic breakpoints. For researchers and clinicians, this integrated paradigm maximizes sensitivity while providing orthogonal validation of biologically significant fusion events.
The accurate identification of expressed chimeric transcripts, commonly known as fusion genes, has become a cornerstone of modern cancer diagnostics and therapeutic decision-making. These hybrid genes, formed through chromosomal rearrangements such as translocations, inversions, or deletions, act as powerful oncogenic drivers in numerous cancer types, accounting for approximately 20% of human cancer morbidity [26]. The detection of these fusions is particularly crucial as they represent actionable therapeutic targets, with inhibitors such as crizotinib (targeting EML4-ALK) showing remarkable clinical efficacy in treating fusion-positive cancers [26]. While traditional methods like fluorescence in situ hybridization (FISH) and reverse-transcription polymerase chain reaction (RT-PCR) have been diagnostic mainstays, they are inherently limited to assessing predefined targets, potentially missing novel or rare fusion events [9] [26].
The emergence of next-generation sequencing (NGS) technologies, particularly RNA sequencing (RNA-seq), has revolutionized fusion detection by enabling transcriptome-wide surveillance with nucleotide-level resolution. However, a significant methodological question remains: how does RNA-seq compare to DNA sequencing (DNA-seq) for reliable fusion identification in clinical and research settings? This guide provides a comprehensive, data-driven comparison of these approaches, evaluating their performance characteristics, practical applications, and implementation requirements to inform researchers and clinicians in selecting optimal strategies for fusion gene detection.
Direct comparative studies reveal distinct performance advantages and limitations of RNA-seq and DNA-seq approaches for fusion detection. The table below summarizes key performance metrics based on recent clinical and technical evaluations.
Table 1: Performance comparison of RNA-seq and DNA-seq for fusion detection
| Performance Metric | RNA-seq | DNA-seq | Evidence |
|---|---|---|---|
| Detection Rate | 76% (targeted RNA-seq) | Used as reference standard | [26] |
| Sensitivity for Canonical Fusions | 79.5-92.3% | 92.3% concordance with RNA-seq | [36] |
| Sensitivity for Novel Partners | High (partner-agnostic) | Limited to designed targets | [36] [26] |
| Ability to Confirm Expression | Direct evidence | Indirect inference | [39] [26] |
| Concordance with FISH | 84.6% | 82.5% | [36] |
| Major Limitation | RNA quality/expression level | Large introns/regulatory elements | [36] |
The data demonstrate that targeted RNA-seq significantly improves the overall diagnostic rate compared with conventional approaches (76% vs. 63%) [26]. In head-to-head comparisons for RET fusion detection in NSCLC, RNA-seq and DNA-seq showed high concordance (92.3%). However, targeted RNA-seq identified additional positive cases that both whole-transcriptome sequencing and DNA-seq missed [36]. This gain in sensitivity is attributed to RNA-seq's direct capture of expressed fusion transcripts, which circumvents the difficulty DNA-seq faces when breakpoints fall within large intronic regions [36].
RNA-seq particularly excels in identifying noncanonical fusion partners, which are increasingly recognized as clinically relevant. One study of 120 NSCLC cases reflexed to hybridization-capture-based RNA sequencing identified actionable fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET that were not detected by amplicon-based DNA/RNA testing [14]. This partner-agnostic capability makes RNA-seq invaluable for comprehensive fusion profiling, especially in cancers with diverse fusion partners.
The targeted RNA-seq approach employs probe-based enrichment to overcome sensitivity limitations of whole-transcriptome sequencing for fusion detection [26]:
RNA Extraction and Quality Control: Extract total RNA from tumor samples (fresh frozen or FFPE). Assess RNA integrity using Bioanalyzer, requiring RIN score ≥7 for library construction [40].
Library Preparation: Convert RNA to double-stranded cDNA and add sequencing adapters. The use of ribosomal RNA depletion rather than poly-A selection is recommended as it preserves non-coding and degraded transcripts often present in FFPE samples [40].
Target Enrichment: Hybridize libraries with biotinylated oligonucleotide probes targeting exons of genes frequently involved in fusions. One validated panel design targets 188 genes for hematological malignancies and 241 genes for solid tumors, with overlapping coverage of 43 core fusion genes [26]. Perform double-capture to increase on-target rates to >90% [26].
Sequencing: Sequence enriched libraries on Illumina platforms (HiSeq, NovaSeq, NextSeq, or MiSeq). Recommended depth is 20-30 million paired-end reads per sample (2×100 bp or 2×150 bp) to adequately capture fusion junctions [41] [42].
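The thresholds in these steps can be collected into a pre-analytical gate. The sketch below encodes the figures quoted above: RIN ≥7, at least 20 million read pairs, and >90% on-target reads. The data structure and function names are hypothetical, and laboratories working with FFPE material may validate different cutoffs.

```python
from dataclasses import dataclass


@dataclass
class LibraryQC:
    """Per-sample metrics gathered before fusion calling (field names are illustrative)."""
    sample_id: str
    rin: float                 # RNA integrity number from the Bioanalyzer
    read_pairs: int            # total paired-end fragments sequenced
    on_target_fraction: float  # fraction of reads falling in capture targets (0-1)


def qc_failures(
    qc: LibraryQC,
    min_rin: float = 7.0,
    min_read_pairs: int = 20_000_000,
    min_on_target: float = 0.90,
) -> list[str]:
    """Return human-readable reasons a library fails the thresholds (empty list = pass)."""
    failures = []
    if qc.rin < min_rin:
        failures.append(f"RIN {qc.rin} below {min_rin}")
    if qc.read_pairs < min_read_pairs:
        failures.append(f"{qc.read_pairs:,} read pairs below {min_read_pairs:,}")
    # The text calls for on-target rates above (not equal to) 90%
    if qc.on_target_fraction <= min_on_target:
        failures.append(f"on-target {qc.on_target_fraction:.0%} not above {min_on_target:.0%}")
    return failures


samples = [
    LibraryQC("S1", rin=8.1, read_pairs=25_000_000, on_target_fraction=0.93),
    LibraryQC("S2", rin=5.4, read_pairs=12_000_000, on_target_fraction=0.81),
]
for s in samples:
    problems = qc_failures(s)
    print(s.sample_id, "PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Returning a list of reasons, rather than a single boolean, makes it easier to record why a specimen was rejected or sent for re-extraction.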
The computational identification of fusions requires specialized pipelines to handle high false-positive rates common in RNA-seq data:
Quality Control and Preprocessing: Assess raw read quality with FastQC. Trim adapters and low-quality bases using Trimmomatic [42].
Alignment and Quantification: Map reads to the reference genome (e.g., GRCh38) using STAR aligner, which accurately handles splice junctions [41] [42]. Generate count matrices with FeatureCounts [42].
Fusion Calling: Execute multiple fusion detection algorithms (minimum of two recommended) such as STAR-Fusion and FusionCatcher to increase confidence [26] [43]. These tools identify chimeric reads spanning fusion junctions.
False Positive Filtering: Implement stringent filtering to remove artifacts:
Validation: Confirm high-confidence fusions in matched whole-genome sequencing data using discordant read pairs and soft-clipped alignments to identify supporting genomic breakpoints [39].
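The "minimum of two callers" recommendation and read-support filtering can be combined into a simple consensus rule. This minimal sketch assumes that each tool's output has already been parsed into a mapping from a `5'GENE--3'GENE` name to its supporting read count. The tool keys, the three-read cutoff, and the input values are illustrative; they are not defaults of STAR-Fusion or FusionCatcher.

```python
from collections import defaultdict


def consensus_fusions(calls_by_tool, min_tools=2, min_reads=3):
    """Keep fusions reported by >= min_tools callers, each with >= min_reads support.

    calls_by_tool: {tool_name: {"GENE5--GENE3": supporting_reads}}
    Returns {fusion: {tool_name: supporting_reads}} for fusions that pass.
    """
    support = defaultdict(dict)
    for tool, calls in calls_by_tool.items():
        for fusion, reads in calls.items():
            support[fusion][tool] = reads

    passing = {}
    for fusion, per_tool in support.items():
        # Stricter variant: every supporting caller must meet the read cutoff
        if len(per_tool) >= min_tools and min(per_tool.values()) >= min_reads:
            passing[fusion] = per_tool
    return passing


calls = {
    "star_fusion": {"EML4--ALK": 12, "KIF5B--RET": 4, "GENEX--GENEY": 1},
    "fusioncatcher": {"EML4--ALK": 9, "GENEX--GENEY": 2},
}
print(consensus_fusions(calls))
# prints {'EML4--ALK': {'star_fusion': 12, 'fusioncatcher': 9}}
```

In this example the KIF5B--RET call, which only one tool reports, is dropped. This shows the sensitivity cost of a consensus rule, and the thresholds should be tuned against known positive samples.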
DNA-based fusion detection employs different principles and analytical approaches:
DNA Extraction and Library Prep: Extract genomic DNA from tumor and normal tissue. Use targeted capture or amplicon-based panels (e.g., 425-gene panel) focusing on intronic regions of genes known to harbor fusions [36].
Sequencing and Structural Variant Calling: Sequence to high coverage (typically >200x). Process reads through alignment (BWA-MEM), indel realignment (GATK), and structural variant calling using tools like Delly to identify genomic rearrangements [36].
Integration with RNA Evidence: Overlap DNA breakpoints with RNA-seq fusion calls to confirm transcriptional activity [39]. This integrated approach provides the highest confidence in fusion validation.
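Structural variant callers such as Delly draw on two kinds of read evidence at a rearrangement:

- discordant read pairs, whose mates map to different chromosomes or unexpectedly far apart;
- soft-clipped reads that cross the breakpoint.

The sketch below flags both kinds of evidence from plain SAM text fields. It is an illustration only. The fixed 1,000 bp insert-size cutoff and the 20 bp minimum clip are assumptions, whereas production callers estimate the insert-size distribution from the library and apply many more filters.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# SAM FLAG bits used below
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def soft_clip_lengths(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths, ignoring flanking hard clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:  # e.g. CIGAR "*"
        return 0, 0
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def rearrangement_evidence(sam_line: str, max_insert: int = 1000, min_clip: int = 20) -> set[str]:
    """Label one SAM alignment as 'discordant' and/or 'soft_clipped' (illustrative cutoffs)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, cigar = int(fields[1]), fields[2], fields[5]
    rnext, tlen = fields[6], int(fields[8])
    labels: set[str] = set()

    if flag & UNMAPPED:
        return labels
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        mate_chrom = rname if rnext == "=" else rnext
        if mate_chrom != rname or abs(tlen) > max_insert:
            labels.add("discordant")
    if max(soft_clip_lengths(cigar)) >= min_clip:
        labels.add("soft_clipped")
    return labels


reads = [
    "r1\t65\tchr2\t10000\t60\t100M\tchr5\t20000\t0\t*\t*",   # mate on another chromosome
    "r2\t99\tchr2\t10500\t60\t70M30S\t=\t10700\t300\t*\t*",  # 30 bp soft clip
    "r3\t99\tchr1\t5000\t60\t100M\t=\t5200\t300\t*\t*",      # ordinary proper pair
]
for line in reads:
    print(line.split("\t")[0], sorted(rearrangement_evidence(line)))
```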
The following diagram illustrates the comprehensive integrated approach for fusion gene detection, combining both RNA-seq and DNA-seq methodologies:
Figure 1: Integrated RNA-seq and DNA-seq fusion detection workflow
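The integration step in this workflow, overlapping DNA breakpoints with RNA-seq fusion calls, reduces to an interval check. An RNA-detected fusion has DNA-level support when a structural variant joins the two partner genes. The sketch below assumes that breakpoints and gene spans are on the same reference build. The gene names, coordinates, and 1 kb padding are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    chrom: str
    start: int
    end: int


Breakend = tuple[str, int]  # (chromosome, position)


def in_span(breakend: Breakend, span: Span, pad: int) -> bool:
    """True if the breakend lies within the gene span, extended by `pad` bases."""
    chrom, pos = breakend
    return chrom == span.chrom and span.start - pad <= pos <= span.end + pad


def dna_supports_fusion(gene5: str, gene3: str,
                        sv_calls: list[tuple[Breakend, Breakend]],
                        gene_spans: dict[str, Span], pad: int = 1000) -> bool:
    """True if any SV joins the two partner genes (either breakend order)."""
    a, b = gene_spans[gene5], gene_spans[gene3]
    for end1, end2 in sv_calls:
        if ((in_span(end1, a, pad) and in_span(end2, b, pad))
                or (in_span(end1, b, pad) and in_span(end2, a, pad))):
            return True
    return False


# Hypothetical genes and coordinates, for illustration only
spans = {"GENEA": Span("chr1", 100_000, 150_000), "GENEB": Span("chr7", 500_000, 560_000)}
svs = [(("chr7", 530_250), ("chr1", 142_800))]
print(dna_supports_fusion("GENEA", "GENEB", svs, spans))
```

Gene-level overlap alone does not show that the breakpoints preserve the reading frame. Confirming that requires exon-level annotation of both partners.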
Successful fusion detection requires careful selection of laboratory reagents, computational tools, and reference databases. The table below catalogs essential resources for implementing a robust fusion detection pipeline.
Table 2: Essential research reagents and resources for fusion detection
| Category | Resource | Specific Application | Function |
|---|---|---|---|
| Wet Lab Reagents | Ribosomal RNA depletion kits | RNA library preparation | Preserves non-polyadenylated transcripts |
| Wet Lab Reagents | Biotinylated oligonucleotide panels | Targeted RNA-seq | Enrichment of fusion-related genes |
| Wet Lab Reagents | RNA spike-in controls (ERCC, fusion sequins) | Assay QC & quantification | Absolute quantification and sensitivity assessment |
| Bioinformatics Tools | STAR-Fusion, FusionCatcher | Fusion detection | Identification of chimeric transcripts from RNA-seq |
| Bioinformatics Tools | Delly, Manta | Structural variant calling | DNA-based fusion detection |
| Bioinformatics Tools | SAMtools, Picard | Data processing | BAM file processing and QC metrics |
| Reference Databases | AACR Project GENIE | Clinical genomics | Repository of clinical cancer genomics data |
| Reference Databases | ChimerDB, Mitelman Database | Fusion annotation | Curated databases of known fusion genes |
| Reference Databases | Ensembl, GENCODE | Genome annotation | Reference gene models for alignment |
The choice of biotinylated oligonucleotide panel is particularly critical: designs targeting 188-241 fusion-related genes have shown excellent coverage of clinically relevant fusions [26]. For bioinformatic analysis, STAR-Fusion and FusionCatcher provide complementary detection capabilities. Supporting evidence comes from the SMC-RNA Challenge, which benchmarked 77 fusion detection methods across 51 synthetic tumors [43]. Integration with reference databases such as ChimerDB helps prioritize clinically relevant fusions and filter out likely artifacts [9].
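Database-driven prioritization can be sketched as a lookup against a set of curated known fusions and a blacklist of recurrent artifacts. The sketch assumes both lists have already been exported as `5'GENE--3'GENE` strings, for example from ChimerDB or the Mitelman Database and from a laboratory's own calls on normal samples. The labels and the example blacklist entry are hypothetical.

```python
def prioritize(fusions, known, blacklist):
    """Label each fusion call; orientation matters, so 'A--B' and 'B--A' differ."""
    labels = {}
    for fusion in fusions:
        five_prime, three_prime = fusion.split("--")
        reciprocal = f"{three_prime}--{five_prime}"
        if fusion in blacklist:
            labels[fusion] = "filtered: recurrent artifact"
        elif fusion in known:
            labels[fusion] = "known fusion"
        elif reciprocal in known:
            labels[fusion] = "reciprocal of known fusion"
        else:
            labels[fusion] = "novel: review manually"
    return labels


known = {"EML4--ALK", "KIF5B--RET", "BCR--ABL1"}
blacklist = {"GENEX--GENEY"}  # hypothetical artifact seen in normal samples
fusion_calls = ["EML4--ALK", "ABL1--BCR", "GENEX--GENEY", "GENEP--GENEQ"]
for fusion, label in prioritize(fusion_calls, known, blacklist).items():
    print(fusion, "->", label)
```

Checking the blacklist first ensures that a recurrent artifact is never promoted merely because it resembles a catalogued fusion. Novel calls are flagged for review rather than discarded.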
The comprehensive comparison of RNA-seq and DNA-seq methodologies for fusion detection reveals a clear paradigm: while DNA-seq provides important information about genomic rearrangements, RNA-seq delivers superior sensitivity and clinical utility for identifying expressed chimeric transcripts, particularly when employing targeted enrichment approaches. The ability of RNA-seq to directly capture fusion expression, identify novel partners, and resolve complex isoforms makes it an indispensable tool for modern cancer genomics.
Looking forward, the integration of multiple technologies appears most promising for comprehensive fusion characterization. As demonstrated in recent studies, combining DNA-seq, targeted RNA-seq, and FISH achieves the highest diagnostic sensitivity while providing orthogonal validation [36]. Furthermore, emerging methodologies that validate fusion transcripts in matched whole-genome sequencing data offer powerful approaches for distinguishing high-confidence events from false positives [39]. As sequencing costs continue to decline and analytical methods mature, RNA-seq is poised to become the central technology for fusion detection in both clinical diagnostics and basic cancer research, ultimately expanding treatment options for patients with fusion-driven cancers.
Oncogenic gene fusions are major drivers in the pathogenesis of acute leukemia, with profound implications for disease classification, risk stratification, and therapeutic decision-making. The accurate detection of these rearrangements has become essential for modern precision oncology in hematologic malignancies. Currently, clinical laboratories employ diverse methodological approaches, primarily leveraging next-generation sequencing (NGS) technologies at either the DNA or RNA level. Each method offers distinct advantages and limitations in detecting these critical genetic events. This guide provides an objective comparison of DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) platforms for fusion gene detection in acute leukemia, synthesizing experimental data from recent studies to inform researchers, scientists, and drug development professionals.
A comprehensive 2025 study comparing a 108-gene targeted RNA-seq panel with optical genome mapping (OGM) in 467 acute leukemia cases provides critical insights into method-specific performance characteristics. The cohort included 360 cases of acute myeloid leukemia (AML), 89 B-lymphoblastic leukemia (B-ALL), 12 T-ALL, and 6 mixed phenotype acute leukemia (MPAL) cases [21].
Table 1: Overall Detection Performance in 467 Acute Leukemia Cases
| Metric | Result (RNA-seq vs. DNA-based OGM) |
|---|---|
| Overall concordance rate | 88.1% across all cases |
| Fusions detected uniquely by RNA-seq | 9.4% (22/234) of clinically relevant fusions |
| Fusions detected uniquely by OGM | 15.8% (37/234) of clinically relevant fusions |
| Case-level detection rate | 43.6% (206/467) of cases showed ≥1 rearrangement/fusion |
| Tier 1 aberration detection | 31.5% (147/467) of cases |
| Leukemia-type specific concordance | Varied from 41.7% (T-ALL) to 80.2% (B-ALL) |
The data reveal that both methodologies contribute uniquely to comprehensive fusion detection, with RNA-seq particularly effective for identifying expressed chimeric transcripts, while DNA-based OGM excels at detecting structural rearrangements that may not generate fusion transcripts, such as enhancer-hijacking events [21].
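The unique-detection percentages in Table 1 are fractions of the union of clinically relevant fusions found by either method. The sketch below shows that bookkeeping, assuming each method's results are represented as sets of fusion identifiers. The identifiers are placeholders. Only the 22, 37, and 234 counts come from the table; the shared count is simply the remainder.

```python
def partition_calls(rna: set[str], ogm: set[str]) -> dict:
    """Split the union of two call sets into shared and method-unique fractions."""
    union = rna | ogm
    if not union:
        raise ValueError("no calls to compare")
    n = len(union)
    return {
        "n_union": n,
        "both": len(rna & ogm) / n,
        "rna_only": len(rna - ogm) / n,
        "ogm_only": len(ogm - rna) / n,
    }


# Placeholder IDs sized to reproduce 22/234 RNA-only and 37/234 OGM-only
shared = {f"fusion{i}" for i in range(175)}
rna_calls = shared | {f"rna_only{i}" for i in range(22)}
ogm_calls = shared | {f"ogm_only{i}" for i in range(37)}
result = partition_calls(rna_calls, ogm_calls)
print({k: round(v, 3) for k, v in result.items()})
```

By construction the three fractions sum to 1. In this constructed example, fusions detected by both methods therefore make up the remaining 175/234, or about 74.8%.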
Table 2: Method-Specific Advantages and Limitations for Fusion Detection
| Aspect | RNA-seq | DNA-seq |
|---|---|---|
| Detection principle | Fusion transcripts | Genomic rearrangements |
| Sensitivity for expressed fusions | High | Variable (depends on breakpoint location) |
| Ability to detect enhancer-hijacking | Poor (20.6% concordance) | Excellent |
| Performance with intrachromosomal deletions | Slightly superior | May interpret as simple deletions |
| Dependence on expression level | High | None |
| Effect of RNA degradation | Significant concern | Not applicable |
| Novel partner discovery | Excellent with anchored multiplex PCR | Limited by probe design |
| Coverage requirements | Targeted panels sufficient | Often requires extensive intronic coverage |
The study found notably poor concordance (20.6%) for enhancer-hijacking lesions, including MECOM, BCL11B, and IGH rearrangements, many of which were not detected by RNA-seq. Conversely, RNA-seq slightly outperformed DNA-based OGM for fusions arising from intrachromosomal deletions that were sometimes labeled by OGM as simple deletions [21].
The 108-gene anchored multiplex PCR (AMP)-based RNA-Seq panel employed in the acute leukemia study utilizes specific experimental protocols optimized for hematologic malignancies [21]:
RNA Extraction and Quality Control: RNA is extracted from peripheral blood or bone marrow aspirate specimens. Quality control is critical, with RNA integrity number (RIN) typically assessed to ensure sample suitability.
Library Preparation: The AMP method utilizes unidirectional gene-specific primers (GSP2) targeting at least one of the two gene partners involved in translocation to capture novel fusion partners. This partner-agnostic approach enables discovery of previously uncharacterized fusions.
Sequencing and Analysis: Amplified targets undergo bidirectional sequencing on Illumina platforms. Sequencing reads are aligned to the human reference genome GRCh37/hg19, with fusion transcripts identified using Archer Analysis Software v6.2.7.
Validation Framework: The study established rigorous validation using the 2019 American College of Medical Genetics and Genomics (ACMG) and Clinical Genome Resource (ClinGen) Guidelines, with variants classified into three tiers based on established diagnostic, prognostic, or therapeutic relevance [21].
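The partner-agnostic logic of anchored multiplex PCR can be illustrated with a toy example. Reads begin in the primed, known gene, and whatever sequence follows the end of the known exon is matched against an index of all exons, so the partner need not be specified in advance. This is a deliberately simplified sketch, not the Archer Analysis algorithm. It assumes error-free reads that start at the first base of the known exon, and the sequences and exon names are invented.

```python
from collections import Counter


def build_kmer_index(exons: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of exon names that contain it."""
    index: dict[str, set[str]] = {}
    for name, seq in exons.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def find_partner(read: str, known_exon: str, index: dict[str, set[str]], k: int):
    """Return (junction_offset, best_partner_exon) for a read primed in `known_exon`."""
    # Junction = first position where the read stops matching the known exon
    offset = 0
    while offset < min(len(read), len(known_exon)) and read[offset] == known_exon[offset]:
        offset += 1
    tail = read[offset:]
    # Each k-mer in the unknown tail votes for every exon that contains it
    votes: Counter = Counter()
    for i in range(len(tail) - k + 1):
        for exon in index.get(tail[i:i + k], ()):
            votes[exon] += 1
    return offset, (votes.most_common(1)[0][0] if votes else None)


known_exon = "ACGTTGCAACGT"  # invented sequence for the primed (known) gene
candidate_exons = {"PARTNER1_ex2": "GGCCTTAAGGCCTTAA", "PARTNER2_ex5": "ATATATCGCGCGATAT"}
index = build_kmer_index(candidate_exons, k=5)
read = known_exon + "GGCCTTAAGGCC"
print(find_partner(read, known_exon, index, k=5))
```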
For DNA-based fusion detection, methodologies vary depending on the platform:
Hybrid Capture-Based DNA Sequencing: One study utilized a 542-gene solid tumor NGS panel with exonic probes supplemented with intronic bait probes against genes commonly involved in oncogenic fusions. This design specifically addresses the challenge of detecting breakpoints occurring in intronic regions [45].
Analytical Pipelines: The FindDNAFusion pipeline integrates multiple software tools (JuLI, Factera, and GeneFuse) to improve detection accuracy. This combinatorial approach achieved 98.0% detection accuracy for intron-tiled genes when optimized with blacklists for filtering common artifacts and criteria for selecting clinically reportable fusions [45].
Optical Genome Mapping: The OGM methodology involves:
Diagram 1: Fundamental differences between RNA-seq and DNA-seq approaches for fusion detection.
Third-generation sequencing technologies are emerging as powerful tools for fusion detection, offering advantages for analyzing complex genomic regions:
GFvoter Algorithm Performance: A novel tool employing a multivoting strategy for identifying gene fusions from long-read transcriptome sequencing data demonstrated superior performance compared to existing methods. When tested on real datasets from cancer cell lines and an AML patient sample, GFvoter achieved the highest average precision (58.6%) across nine experimental datasets, surpassing LongGF (39.5%), FusionSeeker (35.6%), and JAFFAL (30.8%) [8].
AML Transcript Isoform Diversity: Long-read sequencing of 60 primary AML bone marrow samples revealed extensive splicing abnormalities and identified 119,278 previously unannotated transcript isoforms. This isoform-level resolution enabled non-negative matrix factorization clustering that defined distinct molecular subtypes with strong correlations to patient prognosis, highlighting alternative splicing as a major contributor to AML molecular heterogeneity [46].
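Benchmarks such as the GFvoter comparison summarize precision, the fraction of reported fusions that are true, across several datasets. How the nine per-dataset values were averaged is not specified here. The sketch below therefore contrasts two common conventions, using invented counts: the mean of per-dataset precision and precision pooled over all calls. "Average precision" can also denote the area under a precision-recall curve, which is a different quantity.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported fusions that are true positives."""
    return tp / (tp + fp) if tp + fp else 0.0


def mean_precision(per_dataset: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-dataset precision (each dataset counts equally)."""
    return sum(precision(tp, fp) for tp, fp in per_dataset) / len(per_dataset)


def pooled_precision(per_dataset: list[tuple[int, int]]) -> float:
    """Precision over all calls combined (datasets with more calls weigh more)."""
    tp = sum(t for t, _ in per_dataset)
    fp = sum(f for _, f in per_dataset)
    return precision(tp, fp)


# Invented (true positive, false positive) counts for two datasets
results = [(1, 1), (9, 1)]
print(round(mean_precision(results), 3), round(pooled_precision(results), 3))
```

The two summaries diverge whenever datasets differ in the number of calls, so a comparison across tools is only fair if every tool is summarized the same way.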
Advanced computational methods are enhancing fusion detection and clinical interpretation:
k-mer-Based Classification: One study applied machine learning models trained on k-mer count matrices to predict favorable and adverse risk groups in AML patients based on RNA-seq data. This reference-free approach fragmented sequencing reads into k-mers (substrings of length k) that were indexed to provide a compressed yet comprehensive data representation [47].
Risk Stratification Performance: Models including Neural Networks, Random Forest, and eXtreme Gradient Boosting achieved over 90% accuracy in risk prediction and identified key gene signatures distinguishing ELN2017 favorable and adverse groups. This approach facilitated the selection of prognostic biomarkers with significant impacts on survival [47].
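The reference-free representation described above can be sketched in three steps:

1. Break each read into overlapping substrings of length k.
2. Accumulate k-mer counts per sample.
3. Align samples on a shared k-mer vocabulary to form a count matrix suitable as model input.

This is a plain-Python illustration under simplifying assumptions: there is no reverse-complement merging and no abundance filtering. It does not reflect how kmtricks is implemented or invoked.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count overlapping k-mers across reads, skipping k-mers with ambiguous bases (N)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_count_matrix(samples: dict[str, list[str]], k: int):
    """Return (vocabulary, {sample: counts aligned to vocabulary})."""
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    vocabulary = sorted(set().union(*per_sample.values()))
    # Missing k-mers read as 0 from a Counter, giving a dense row per sample
    matrix = {name: [counts[kmer] for kmer in vocabulary] for name, counts in per_sample.items()}
    return vocabulary, matrix


vocab, matrix = kmer_count_matrix({"patient1": ["ACGTAC"], "patient2": ["ACGNAC", "GTACG"]}, k=3)
print(vocab)
print(matrix)
```

Real transcriptomes yield very large vocabularies, which is why dedicated tools use compressed indexes and filter rare k-mers before model training.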
Diagram 2: Complementary testing workflow for comprehensive fusion detection in leukemia.
Table 3: Key Research Reagent Solutions for Fusion Detection Studies
| Reagent/Resource | Function/Application | Specific Examples/Characteristics |
|---|---|---|
| Anchored Multiplex PCR (AMP) | Target enrichment for RNA-seq | Enables novel fusion partner discovery; used in 108-gene hematology panel |
| Intronic Bait Probes | Enhanced DNA-seq fusion detection | Supplemental probes for genes commonly involved in oncogenic fusions |
| Archer Analysis Software | Fusion transcript identification | v6.2.7 used for analyzing AMP-based RNA-seq data |
| Bionano Access Software | OGM data analysis | Version 1.8.2 with HemeTargets feature files |
| FindDNAFusion Pipeline | DNA-seq fusion calling | Integrates JuLI, Factera, and GeneFuse tools; 98% accuracy for intron-tiled genes |
| GFvoter Algorithm | Long-read fusion detection | Multivoting strategy for PacBio/Nanopore data; highest precision (58.6%) among tested tools |
| Kmtricks | k-mer counting for ML approaches | Generates k-mer count matrices from RNA-seq data for machine learning applications |
The comparative analysis of DNA-seq and RNA-seq platforms for fusion detection in acute leukemia reveals distinct yet complementary strengths. RNA-seq demonstrates superior sensitivity for detecting expressed chimeric fusion transcripts and slightly better performance for fusions arising from intrachromosomal deletions. Conversely, DNA-based methods, particularly optical genome mapping, excel at identifying cryptic, enhancer-driven events that often evade transcriptomic detection.
The most comprehensive approach integrates both methodologies, because neither platform alone detects all clinically relevant rearrangements. Although overall concordance between the methods was 88.1%, only 74.7% of clinically relevant rearrangements were detected by both. Each technology uniquely identified a substantial share of clinically actionable fusions (9.4% by RNA-seq alone, 15.8% by OGM alone), underscoring the necessity of multimodal analysis for complete molecular characterization in acute leukemia [21].
For research and clinical applications, selection of appropriate methodologies should consider specific study objectives, sample quality, and resource constraints. However, the evolving landscape of fusion detection increasingly supports integrated DNA and RNA analysis to fully elucidate the genomic complexity of hematologic malignancies and advance precision medicine approaches for leukemia patients.
Oncogenic gene fusions are pivotal drivers of cancer pathogenesis, serving as essential biomarkers for diagnosis, prognosis, and therapeutic targeting across a wide spectrum of solid tumors. The accurate detection of these structural variants is therefore a cornerstone of precision oncology. Next-generation sequencing (NGS) technologies, particularly DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq), have become the primary methods for identifying these alterations. However, these platforms possess distinct and complementary strengths and limitations for fusion detection. This guide provides an objective comparison of DNA-Seq and RNA-Seq performance for identifying actionable fusions in a pan-cancer context, supported by recent experimental data and detailed methodologies to inform researchers, scientists, and drug development professionals.
Table 1: Comparative Performance of RNA-Seq and Alternative Technologies for Fusion Detection
| Metric | RNA-Seq (108-gene targeted panel [21] or WTS assay [48]) | Optical Genome Mapping (OGM) | Hybridization-Capture RNA-Seq | Amplicon-Based DNA/RNA-Seq |
|---|---|---|---|---|
| Overall Concordance | 88.1% (with OGM in leukemia) [21] | 88.1% (with RNA-Seq in leukemia) [21] | N/A | N/A |
| Unique Fusion Detection | Uniquely identified 9.4% of clinically relevant rearrangements [21] | Uniquely identified 15.8% of clinically relevant rearrangements [21] | Identified rare/novel fusions (ALK, BRAF, NRG1, NTRK3, ROS1, RET) missed by amplicon assay [14] | Detected ~82.6% of known fusions; missed 17.4% potentially novel/rare fusions [14] |
| Sensitivity | 98.4% for known fusions (WTS assay) [48] | N/A | N/A | N/A |
| Specificity | 100% (WTS assay) [48] | N/A | N/A | N/A |
| Strength in Detection | Expressed chimeric fusions; fusions from intrachromosomal deletions [21] | Cryptic, enhancer-driven events (e.g., MECOM, BCL11B, IGH rearrangements) [21] | Unbiased detection of known and novel fusions without prior knowledge of partners [14] | Targeted detection of pre-specified fusions with high efficiency [14] |
A direct comparison of a 108-gene targeted RNA-Seq panel and Optical Genome Mapping (OGM) in 467 acute leukemia cases revealed an overall concordance rate of 88.1% for gene rearrangements [21]. This high-level agreement, however, masks critical differences. OGM uniquely detected 15.8% of clinically relevant rearrangements, while RNA-Seq exclusively identified 9.4% [21]. The technological divergence is stark for specific fusion types; concordance for enhancer-hijacking lesions (e.g., involving MECOM) was markedly low at 20.6%, as these DNA-level rearrangements often do not produce fusion transcripts detectable by RNA-Seq [21]. Conversely, RNA-Seq slightly outperformed OGM for fusions arising from intrachromosomal deletions, which were sometimes misinterpreted by OGM as simple deletions [21].
In solid tumors, a study of 1,211 non-small cell lung cancer (NSCLC) specimens demonstrated the superior ability of hybridization-capture-based RNA-Seq to identify rare and novel oncogenic fusions. When used as a reflex test after negative amplicon-based testing, it successfully identified actionable fusions in 9 out of 120 cases, involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET, none of which were detected by the initial amplicon-based assay [14]. Interrogation of a large database (AACR Project Genie) revealed that an amplicon-based approach could theoretically detect 82.6% of known fusions, leaving a significant 17.4% that would be missed and potentially identified by broader capture-based methods [14].
Table 2: Actionable Fusion Landscape in Pan-Cancer Analysis
| Cancer Type | Prevalence of Actionable Fusions | Commonly Altered Genes | Tumor-Agnostic Biomarker Status |
|---|---|---|---|
| Non-Small Cell Lung Cancer (NSCLC) | 7.5% (9/120) of reflexed cases harbored actionable fusions [14]; 68.9% of identified fusions were potentially actionable [48] | ALK, ROS1, RET, NTRK, BRAF, NRG1 [14] [48] | TMB-High (16.8%), MSI-High, NTRK fusions, RET fusions [49] |
| All Solid Tumors (Pan-Cancer) | 8.4% of samples had at least one tumor-agnostic biomarker [49] | NTRK, RET, BRAF [49] | TMB-High (6.6%), MSI-High, NTRK fusions, RET fusions, BRAF V600E [49] |
| Thyroid Cancer | 30% had a tumor-agnostic biomarker [49] | BRAF [49] | BRAF V600E [49] |
| Melanoma | 22.7% had a tumor-agnostic biomarker [49] | BRAF [49] | BRAF V600E [49] |
Comprehensive genomic profiling (CGP) of 1,166 tissue samples across 29 cancer types in an Asian cohort found that at least one established tumor-agnostic biomarker—including MSI-High, TMB-High, NTRK fusions, and BRAF V600E—was present in 8.4% of samples, spanning 26 different cancer types [49]. The prevalence was particularly high in specific cancers, such as thyroid cancer (30%) and melanoma (22.7%) [49]. In NSCLC, a focused validation of a whole transcriptome sequencing (WTS) assay demonstrated that a significant majority (68.9%) of the fusions identified were potentially actionable, highlighting the critical clinical value of comprehensive fusion detection in this and other malignancies [48].
The OSU-SpARKFuse assay was designed for clinical-grade detection of gene fusions in solid tumors [18].
A novel WTS assay was developed for the detection of gene fusions, MET exon 14 skipping, and EGFRvIII alterations [48].
Gene fusions drive oncogenesis through constitutive activation of key cellular signaling pathways that promote proliferation, survival, and metastasis. The diagram below illustrates the core pathways impacted by common actionable fusions in solid tumors.
Fusions often involve receptor tyrosine kinases (RTKs) or their downstream effectors. For example, fusions involving ALK, ROS1, RET, NTRK, and FGFR genes lead to ligand-independent dimerization and constitutive activation of the kinase [48] [18]. This aberrant activation persistently stimulates two major downstream pathways: the RAS-RAF-MEK-ERK (MAPK) pathway, which drives cell proliferation, and the PI3K-AKT-mTOR pathway, which promotes cell growth, survival, and metabolic changes [48]. MET exon 14 skipping, another RNA-level alteration detectable by RNA-Seq, results in increased stability of the MET receptor and activation of these same downstream pathways, making it a potent oncogenic driver in NSCLC and other cancers [48].
Table 3: Key Reagents and Kits for RNA-Seq-Based Fusion Detection
| Reagent/Kit Type | Function | Specific Examples (from cited studies) |
|---|---|---|
| RNA Extraction Kit | Isolate high-quality RNA from challenging sample types like FFPE. | RNeasy FFPE Kit (Qiagen) [48] [18], miRNeasy Kit (Qiagen) [18] |
| RNA Quality Control Instrument | Assess RNA integrity and quantity, a critical pre-analytical step. | TapeStation 2200 (Agilent) [18], Agilent 2100 Bioanalyzer [48], NanoDrop [18] |
| rRNA Depletion Kit | Remove abundant ribosomal RNA to enrich for coding and non-coding transcripts of interest. | Ribo-Zero (Illumina) [18], NEBNext rRNA Depletion Kit [48] |
| Library Prep Kit | Convert RNA into sequencer-compatible cDNA libraries. | Illumina TruSeq Stranded Total RNA Library Kit [18], NEBNext Ultra II Directional RNA Library Prep Kit [48] |
| Targeted Capture Probes | Enrich sequencing libraries for specific genes of interest. | Custom 120-mer biotinylated probes (IDT) [18] |
| NGS Platform | Perform high-throughput sequencing of prepared libraries. | Illumina MiSeq [18], Gene+ seq 2000 [48] |
The choice between DNA-Seq and RNA-Seq for detecting actionable fusions in solid tumors is not a matter of selecting a superior technology, but rather of understanding their complementary roles. DNA-based methods like OGM are powerful for identifying structural rearrangements at the genomic level, including cryptic enhancer-hijacking events. RNA-Seq, by contrast, confirms that fusion transcripts are expressed, helps filter out non-expressed passenger rearrangements, and detects a broader range of alterations, including novel fusion partners and splice variants such as MET exon 14 skipping. For clinical and research applications aiming to maximize the detection of therapeutically actionable fusions, an integrated approach, potentially using a reflex testing model, provides the most comprehensive and clinically impactful solution.
While the comparison of RNA sequencing (RNA-seq) to DNA-based methods like Optical Genome Mapping (OGM) often focuses on fusion detection proficiency, this narrow view overlooks the profound utility of RNA-seq in modern drug discovery. The paradigm is shifting from merely detecting structural variants to understanding their functional consequences and exploiting this knowledge for therapeutic development. In acute leukemias, for instance, where RNA-seq demonstrates an 88.1% overall concordance with OGM, each method reveals distinct biological insights: OGM uniquely identifies 15.8% of clinically relevant rearrangements (particularly cryptic, enhancer-hijacking events), while RNA-seq exclusively detects 9.4% of fusions, especially those arising from intrachromosomal deletions [21]. This complementary relationship underscores that RNA-seq's true value extends far beyond structural variant detection into the realm of functional genomics and mechanism of action (MoA) elucidation.
The transcriptome provides a dynamic view of cellular states that DNA-level analyses cannot capture, positioning RNA-seq as an indispensable tool for understanding disease mechanisms, drug responses, and therapeutic opportunities. As we move toward personalized cancer treatments, the ability to connect genetic alterations to their functional transcriptional outcomes becomes increasingly critical for developing targeted therapies and predicting treatment efficacy [50]. This article explores how RNA-seq technologies are revolutionizing drug discovery by moving beyond fusion detection to provide insights into complex drug mechanisms, resistance patterns, and novel therapeutic targets.
Understanding the relative performance of RNA-seq versus DNA-based methods provides crucial context for appreciating its expanded role in drug discovery. A comprehensive 2025 study of 467 acute leukemia cases offers valuable comparative data, summarized in the table below [21].
Table 1: Comparative Performance of Targeted RNA-seq and Optical Genome Mapping in Acute Leukemia
| Performance Metric | Targeted RNA-seq (108-gene panel) | Optical Genome Mapping (OGM) |
|---|---|---|
| Overall Concordance | 88.1% with OGM | 88.1% with RNA-seq |
| Unique Detection of Clinically Relevant Rearrangements | 9.4% | 15.8% |
| Detection of Enhancer-Hijacking Lesions | Poor (20.6% concordance) | Effective |
| Detection of Fusions from Intrachromosomal Deletions | Effective | Sometimes labeled as simple deletions |
| Concordance Variation by Leukemia Type | 80.2% in B-ALL to 41.7% in T-ALL | Same variation pattern |
| Key Advantages | Detects expressed chimeric fusions; slightly better for deletion-related fusions | Better for cryptic, enhancer-driven events without fusion transcripts |
This comparative analysis reveals that RNA-seq and DNA-level methods provide complementary rather than redundant information. The fundamental distinction lies in what each method detects: RNA-seq identifies expressed chimeric fusion transcripts, while OGM reveals structural rearrangements regardless of their transcriptional activity [21]. This distinction becomes particularly important in enhancer-hijacking events, such as those involving MECOM, BCL11B, and IGH rearrangements, which frequently evade RNA-seq detection because they can activate oncogenes without generating fusion transcripts [21].
The technical limitations in fusion detection directly impact therapeutic development. For example, the poor performance of RNA-seq in detecting enhancer-hijacking lesions (20.6% concordance) means that potentially targetable events would be missed using RNA-seq alone [21]. Conversely, RNA-seq's ability to detect expressed fusions provides direct evidence of biologically active oncogenic drivers that may represent more promising drug targets.
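The concordance and unique-detection percentages above come from comparing two call sets. The Python sketch below shows one way to compute such figures; it assumes every fraction is taken over a single denominator, the union of events detected by either method, which is an illustrative definition and not necessarily the one used in [21]. The published values (88.1%, 15.8%, and 9.4%) sum to more than 100%, so they must rest on different denominators and are not reproduced here. Event identifiers are hypothetical.

```python
from typing import Dict, Set


def compare_call_sets(method_a: Set[str], method_b: Set[str]) -> Dict[str, float]:
    """Summarize agreement between two methods' positive calls.

    Each set holds hypothetical event identifiers (e.g. "case12:KMT2A::MLLT3").
    All three fractions share one denominator: the union of detected events.
    """
    union = method_a | method_b
    if not union:
        # No events detected by either method: nothing to disagree about.
        return {"concordance": 1.0, "unique_a": 0.0, "unique_b": 0.0}
    n = len(union)
    return {
        "concordance": len(method_a & method_b) / n,  # found by both methods
        "unique_a": len(method_a - method_b) / n,     # found only by method A
        "unique_b": len(method_b - method_a) / n,     # found only by method B
    }


# Hypothetical calls: OGM finds two enhancer-hijacking events RNA-seq misses,
# and RNA-seq finds a deletion-derived fusion that OGM did not report as a fusion.
ogm_calls = {"c1:BCR::ABL1", "c2:KMT2A::AFF1", "c3:MECOM-enhancer", "c4:BCL11B-enhancer"}
rna_calls = {"c1:BCR::ABL1", "c2:KMT2A::AFF1", "c5:STIL::TAL1"}
summary = compare_call_sets(ogm_calls, rna_calls)
```

Keeping all three fractions on one denominator guarantees they sum to 1, which makes the trade-off between methods easy to read; case-level concordance (counting cases negative by both methods as agreement) would require the full case list as an extra input.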
These comparative insights establish why a multi-modal approach is increasingly necessary in clinical genomics and why RNA-seq's role must expand beyond fusion detection to leverage its unique capabilities in understanding functional biology.
The emergence of pharmacotranscriptomics—the large-scale profiling of gene expression changes in response to drug perturbations—represents a fundamental shift in drug screening methodologies. This approach has developed into the third major class of drug screening, distinct from target-based and phenotype-based screening [51]. By capturing the complex transcriptional responses to drug treatments, researchers can infer mechanisms of action, identify biomarkers of response, and discover novel therapeutic applications for existing compounds.
Pharmacotranscriptomics-based drug screening (PTDS) can detect system-wide changes in gene expression following drug perturbation, enabling researchers to analyze the efficacy of drug-regulated gene sets, signaling pathways, and complex disease networks by combining large-scale transcriptomic profiling with artificial intelligence [51]. This approach is particularly valuable for understanding the complex mechanisms of traditional Chinese medicine and other multi-component therapies, where multiple targets and pathways are simultaneously engaged [51].
The integration of single-cell RNA sequencing (scRNA-seq) with drug screening has created powerful new opportunities for MoA elucidation. A landmark 2025 study demonstrated a high-throughput multiplexed scRNA-seq pharmacotranscriptomics pipeline that combined drug screening with 96-plex single-cell RNA sequencing [52]. This approach enabled the researchers to explore the heterogeneous transcriptional landscape of primary high-grade serous ovarian cancer (HGSOC) cells after treatment with 45 drugs spanning 13 distinct mechanisms of action.
Table 2: Single-Cell Pharmacotranscriptomics Experimental Design for MoA Elucidation
| Experimental Component | Specifications | Application in MoA Studies |
|---|---|---|
| Drug Library | 45 drugs, 13 mechanism of action classes | PI3K-AKT-mTOR inhibitors, Ras-Raf-MEK-ERK pathway inhibitors, CDK inhibitors, epigenetic modifiers, etc. |
| Cell Models | 3 HGSOC models: JHOS2 cell line + 2 patient-derived cancer cells (PDC2, PDC3) | Capturing inter-patient and intra-patient heterogeneity |
| Multiplexing Approach | Live-cell barcoding using antibody-oligonucleotide conjugates (Cell Hashing) | 96-plex scRNA-Seq enabling high-throughput screening |
| Cells Analyzed | 36,016 high-quality cells across 288 samples | Single-cell-resolution dissection of heterogeneous drug responses |
| Key Finding | PI3K-AKT-mTOR inhibitors induced feedback activation of RTKs (EGFR) via CAV1 upregulation | Identified drug resistance mechanism and synergistic combination (PI3K-AKT-mTOR + EGFR inhibitors) |
The power of this single-cell approach lies in its ability to resolve heterogeneous drug responses within complex cell populations. While bulk RNA-seq averages responses across all cells, scRNA-seq can identify distinct subpopulations with different drug sensitivities and resistance mechanisms [52]. In the HGSOC study, cells treated with different drug classes clustered distinctly: those treated with PI3K-AKT-mTOR, Ras-Raf-MEK-ERK, and multikinase inhibitors showed milder, model-specific transcriptional shifts, while cells treated with BET, HDAC, and CDK inhibitors formed distinct clusters enriched with cells from all three models, suggesting more consistent cross-model effects [52].
The typical workflow for single-cell pharmacotranscriptomic MoA studies involves several critical steps [52]:
1. Sample Preparation: Fresh tissue is dissociated, or frozen nuclei are used when fresh samples are unavailable. In the HGSOC study, patient-derived tumor epithelial cancer cells were isolated and cultured ex vivo at early passages to avoid loss of phenotypic identity.
2. Drug Treatment: Cells are treated with compounds at concentrations above the half-maximal effective concentration (EC50), chosen from prior drug sensitivity and resistance testing (DSRT) screens, typically for 24 hours to elicit a detectable transcriptional response.
3. Live-Cell Barcoding: After drug treatment, cells from each well are labeled with a unique pair of antibody-oligonucleotide conjugates (hashtag oligonucleotides, HTOs) targeting surface proteins such as β2-microglobulin (B2M) and CD298.
4. Cell Pooling and scRNA-seq: All barcoded cells are pooled for multiplexed single-cell RNA sequencing, which reduces cost and technical variability compared with processing each sample individually.
5. Bioinformatic Analysis: Cells are demultiplexed by their HTO barcodes and then subjected to quality control, clustering, and differential expression analysis to identify drug-specific transcriptional signatures.
This protocol enables the systematic identification of single-cell transcriptomic responses to drugs, providing unprecedented insights into the heterogeneous mechanisms of drug action in complex cancer populations [52].
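The demultiplexing step can be illustrated with a deliberately simplified Python sketch. It assumes each well is identified by a single hashtag barcode and uses fixed, hypothetical read-fraction thresholds to call each cell as belonging to one well, a doublet, or negative. The pair-based labeling and the demultiplexing method actually used in [52] may differ, and dedicated tools (for example Seurat's `HTODemux`) model hashtag counts statistically rather than with fixed cutoffs. Hashtag names, cell barcodes, and condition labels are invented for the example.

```python
from typing import Dict


def classify_cell(hto_counts: Dict[str, int],
                  min_total: int = 20,
                  min_top_fraction: float = 0.6,
                  doublet_fraction: float = 0.25) -> str:
    """Assign one cell to its dominant hashtag, or label it 'negative'/'doublet'.

    The thresholds are illustrative placeholders, not values from the study.
    """
    total = sum(hto_counts.values())
    if total < min_total:
        return "negative"  # too few hashtag reads to make a call
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_hto, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    if second_count / total >= doublet_fraction:
        return "doublet"  # two hashtags strongly present: likely cells from two wells
    if top_count / total >= min_top_fraction:
        return top_hto
    return "negative"  # no single hashtag dominates


def demultiplex(cells: Dict[str, Dict[str, int]],
                hto_to_condition: Dict[str, str]) -> Dict[str, str]:
    """Map each cell barcode to its treatment condition, or to a QC label."""
    assignments = {}
    for cell_id, counts in cells.items():
        label = classify_cell(counts)
        # Hashtags map to conditions; QC labels ('doublet', 'negative') pass through.
        assignments[cell_id] = hto_to_condition.get(label, label)
    return assignments


# Hypothetical hashtag counts for three cells and a three-well layout.
cells = {
    "AAAC": {"HTO1": 90, "HTO2": 5, "HTO3": 5},
    "AAAG": {"HTO1": 50, "HTO2": 45, "HTO3": 5},
    "AAAT": {"HTO1": 5, "HTO2": 3},
}
conditions = {"HTO1": "PI3K inhibitor", "HTO2": "EGFR inhibitor", "HTO3": "DMSO control"}
assignments = demultiplex(cells, conditions)
```

Doublets and negatives are normally discarded before clustering, so the thresholds directly trade cell yield against contamination between conditions.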
RNA-seq technologies contribute significantly to multiple stages of the drug discovery pipeline, beginning with target identification and validation. By detecting differentially expressed transcripts across disease states, RNA-seq helps uncover new molecular mechanisms of disease—an essential prerequisite for developing new drug targets [50]. For example, RNA-seq has identified distinct oncogene-driven transcriptome profiles, enabling the identification of potential targets for cancer therapy [50].
Single-cell RNA sequencing further enhances target discovery by resolving cellular heterogeneity and identifying novel cell types and subtypes that may represent promising therapeutic targets [53]. The technology has enabled the identification of molecular pathways that predict survival, therapy response, likelihood of resistance, and candidacy for alternative interventions [53]. When combined with CRISPR screening technologies, scRNA-seq enables high-content functional genomics screens that can credential and prioritize drug targets by directly linking genetic perturbations to transcriptional outcomes at single-cell resolution [53].
Chemotherapy resistance remains a major obstacle in oncology, and RNA-seq provides powerful tools to investigate its mechanisms. By comparing gene expression profiles between drug-resistant and drug-sensitive cells, researchers can identify genes and pathways associated with treatment failure [50]. In triple-negative breast cancer (TNBC), for instance, RNA-seq analysis of drug-resistant cell lines revealed significant differences in cytokine-cytokine receptor interaction pathways, pointing to candidate pathways for developing more effective treatments [50].
Small RNA-seq has proven particularly valuable for investigating the role of microRNAs (miRNAs) in regulating drug resistance. In a study of doxorubicin resistance in hepatocellular carcinoma, researchers used RNA-seq to identify down-regulated miRNAs and their associated functional pathways, providing potential targets for overcoming resistance [50].
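Comparing resistant and sensitive cells ultimately rests on per-gene expression differences. The Python sketch below computes library-size-normalized (counts per million) log2 fold changes with a pseudocount. It is an illustration only: it ignores replicate variability and dispersion, which dedicated tools such as DESeq2 or edgeR model explicitly. Gene names and counts are hypothetical.

```python
import math
from typing import Dict, List


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts to counts per million (library-size normalization)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def mean_cpm(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Average CPM per gene across replicate samples that share the same genes."""
    normalized = [cpm(s) for s in samples]
    genes = normalized[0].keys()
    return {g: sum(n[g] for n in normalized) / len(normalized) for g in genes}


def log2_fold_changes(resistant: List[Dict[str, int]],
                      sensitive: List[Dict[str, int]],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(resistant / sensitive) per gene; the pseudocount avoids division by zero."""
    r, s = mean_cpm(resistant), mean_cpm(sensitive)
    return {g: math.log2((r[g] + pseudocount) / (s[g] + pseudocount)) for g in r}


# Hypothetical two-gene counts for one resistant and one sensitive cell line.
lfc = log2_fold_changes([{"IL6": 300, "GAPDH": 700}], [{"IL6": 100, "GAPDH": 900}])
```

Here the cytokine gene is about threefold (log2 ≈ 1.58) higher in the resistant line; a real analysis would add replicates and a statistical test before calling it differentially expressed.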
RNA-seq is increasingly important for identifying biomarkers that can predict treatment response and stratify patient populations. Fusion genes, once considered rare, are now recognized as powerful biomarkers and therapeutic targets across multiple cancer types [50] [54]. RNA-seq has uncovered recurrent gene fusions in acute myeloid leukemia, breast cancer, and colorectal cancer, providing promising targets for personalized therapies [50].
Beyond fusions, RNA-seq can identify other types of cancer biomarkers, including small RNAs (such as miRNAs), non-coding RNAs (lncRNAs and circRNAs), and gene expression signatures that correlate with disease progression, recurrence, and treatment response [50]. These biomarkers are essential for developing companion diagnostics and implementing precision medicine approaches.
The following diagram illustrates how RNA-seq technologies integrate into various stages of the drug discovery and development pipeline, from initial target identification to clinical decision-making.
This diagram details the experimental workflow for multiplexed single-cell RNA-seq pharmacotranscriptomic analysis, which enables high-throughput drug screening at single-cell resolution.
Implementing RNA-seq technologies for MoA studies and drug discovery requires specialized reagents and tools. The following table details key solutions used in the featured studies.
Table 3: Essential Research Reagent Solutions for Pharmacotranscriptomic Studies
| Reagent/Tool Category | Specific Examples | Function in Experimental Pipeline |
|---|---|---|
| Cell Barcoding Systems | Cell Hashing with anti-B2M and anti-CD298 antibody-oligonucleotide conjugates [52] | Enables sample multiplexing by labeling cells from different conditions with unique barcodes before pooling |
| scRNA-seq Platforms | 10x Genomics Chromium [53] | Creates microdroplet reaction chambers for single-cell RNA capture and barcoding |
| Library Prep Kits | SMART-seq2 for plate-based protocols [53] | Provides high-sensitivity full-length transcript coverage for single cells |
| Bioinformatic Tools | Cell Ranger, STARsolo, Alevin, Kallisto-BUStools [53] | Processes raw sequencing data into cell-by-gene count matrices |
| Data Analysis Platforms | Seurat, Scanpy [53] | Performs quality control, normalization, clustering, and differential expression analysis |
| Pathway Analysis Tools | Gene set variation analysis (GSVA) [52] | Evaluates activity of biological processes and pathways from transcriptome data |
| Fusion Detection Algorithms | DEEPEST (Data-Enriched Efficient PrEcise STatistical fusion detection) [54] | Identifies gene fusions with high specificity while minimizing false positives in large datasets |
The application of RNA-seq technologies in drug discovery has evolved far beyond its initial role in fusion detection to become an indispensable tool for understanding therapeutic mechanisms of action. While DNA-level methods like OGM provide crucial information about structural variants, RNA-seq delivers unique insights into the functional consequences of these variants and other disease-associated perturbations. The emergence of single-cell pharmacotranscriptomics represents a particularly significant advance, enabling researchers to dissect heterogeneous drug responses in complex cell populations and identify resistance mechanisms that would be obscured in bulk analyses.
As artificial intelligence and machine learning increasingly integrate with transcriptomic data analysis, the potential for RNA-seq to revolutionize drug discovery continues to grow. By capturing the dynamic complexity of disease states and therapeutic responses, RNA-seq provides a powerful window into biological systems that is transforming target identification, lead optimization, and clinical development. For researchers and drug development professionals, embracing these technologies and their applications will be essential for developing the next generation of targeted therapies and personalized medicine approaches.
Next-generation sequencing (NGS) has revolutionized genomic analysis, with DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) serving as complementary technologies for detecting oncogenic alterations in cancer research. While DNA-seq provides a comprehensive view of genomic architecture, it faces significant limitations in identifying specific structural variants, particularly those involving large intronic regions and enhancer-hijacking events. These limitations have profound implications for drug development and clinical diagnostics, where missing these alterations can directly impact patient access to targeted therapies. This guide examines the technical challenges of DNA-seq through comparative experimental data and provides methodologies for implementing integrated sequencing approaches in fusion detection research.
DNA-seq encounters substantial difficulties in detecting fusion events that involve large intronic regions in partner genes. The technical limitation stems from the common practice of using fragmented DNA libraries and the positioning of primer/probe sets, which may not adequately cover extensive intronic sequences where breakpoints occur.
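The coverage problem can be pictured as a distance check: a hybridization-capture assay can only pull down a breakpoint-spanning fragment if that fragment overlaps a probe. The Python sketch below encodes this simplified rule under assumed values: a 300-bp maximum fragment length and hypothetical bait coordinates, with the 120-bp bait length mirroring the 120-mer probes listed earlier. Real capture efficiency also depends on probe design, hybridization, and repetitive sequence, which the sketch ignores.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def breakpoint_capturable(breakpoint: int, baits: List[Interval],
                          max_fragment: int = 300) -> bool:
    """Could any library fragment spanning `breakpoint` overlap a baited region?

    A fragment of length <= max_fragment that covers the breakpoint lies inside
    [breakpoint - max_fragment + 1, breakpoint + max_fragment), so capture is
    possible only if some bait overlaps that window.
    """
    lo = breakpoint - max_fragment + 1
    hi = breakpoint + max_fragment
    return any(start < hi and end > lo for start, end in baits)


# Hypothetical 120-bp baits: one near an exon boundary, one deep in a 20-kb intron.
baits = [(1_000, 1_120), (15_000, 15_120)]
near_bait = breakpoint_capturable(1_200, baits)    # within fragment reach of the first bait
deep_intron = breakpoint_capturable(8_000, baits)  # about 7 kb from any bait
```

A breakpoint in the untiled middle of a long intron returns `False`, which is the situation in which DNA-level capture misses a fusion that RNA-seq can still see as a spliced junction.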
Table: Detection Rates of RET Fusions Across Methodologies
| Detection Method | Cases Identified | Detection Rate | Key Limitations |
|---|---|---|---|
| DNA Sequencing (DNA-seq) | 40/40 | 100% (reference) | May miss fusions with breakpoints in large introns |
| Targeted RNA-seq | 39/39 | 100% | Identifies expressed fusions; requires good RNA quality |
| Whole-Transcriptome Sequencing (WTS) | 31/39 | 79.5% | Lower sensitivity for low-abundance transcripts |
| Fluorescence In Situ Hybridization (FISH) | 33/40 | 82.5% | Limited to known partners; reveals architecture [36] |
Research on RET fusions in non-small cell lung cancer (NSCLC) demonstrates that DNA-seq, while capable of initial identification, benefits substantially from RNA-seq confirmation. In a study of 40 RET+ NSCLC patients, targeted RNA-seq identified five additional RET+ cases that were missed by whole-transcriptome sequencing, highlighting its superior sensitivity for detecting these fusion events [36]. The same study reported a 92.3% concordance between DNA-seq and RNA-seq, with discordant cases potentially representing limitations in either platform's ability to detect certain fusion types.
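The detection rates in the RET table follow directly from the case counts. The short Python check below recomputes them. Note that the denominators differ (40 cases for DNA-seq and FISH, 39 for the RNA-based assays), so the percentages are only strictly comparable within the same tested cohort.

```python
from typing import Dict, Tuple


def detection_rate(detected: int, tested: int) -> float:
    """Percentage of tested RET-positive cases in which a method found the fusion."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * detected / tested


# (detected, tested) counts copied from the RET table.
ret_counts: Dict[str, Tuple[int, int]] = {
    "DNA-seq": (40, 40),
    "Targeted RNA-seq": (39, 39),
    "WTS": (31, 39),
    "FISH": (33, 40),
}
rates = {method: round(detection_rate(*counts), 1) for method, counts in ret_counts.items()}
```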
Enhancer-hijacking represents a particularly challenging structural variant for sequencing-based detection. These events occur when genomic rearrangements place enhancer elements in proximity to oncogenes, activating their expression without generating fusion transcripts, so RNA-seq has no chimeric read to detect and targeted DNA-seq panels may not cover the intergenic breakpoints involved. This mechanism is especially prevalent in acute leukemias, where it drives oncogenesis through dysregulation of key developmental genes.
Table: Detection of Enhancer-Hijacking Events in Acute Leukemia (n=467 cases)
| Method | Overall Concordance Rate | Enhancer-Hijacking Concordance / Unique Detection of Clinically Relevant Rearrangements | Commonly Missed Alterations |
|---|---|---|---|
| Optical Genome Mapping (OGM) + RNA-seq | 74.7% | 20.6% | - |
| Optical Genome Mapping (OGM) Alone | - | 15.8% uniquely detected | MECOM, BCL11B, IGH rearrangements |
| Targeted RNA-seq Alone | - | 9.4% uniquely detected | CDK6::MNX1, other enhancer-driven events [55] |
A comprehensive analysis of 467 acute leukemia cases found only 20.6% concordance between OGM and targeted RNA-seq for enhancer-hijacking lesions, compared with 93.1% for all other aberration types [55]. Across all clinically relevant rearrangements, OGM uniquely detected 15.8% and RNA-seq exclusively identified 9.4%. Enhancer-hijacking events are therefore a critical blind spot for transcriptome-based methods and require complementary technologies for comprehensive detection.
Advanced research protocols have demonstrated the utility of integrated sequencing approaches that combine the strengths of multiple technologies. These workflows typically employ DNA-seq as an initial screening tool, followed by targeted RNA-seq for verification and characterization of fusion events.
The workflow for RET fusion characterization exemplifies this integrated approach. DNA-seq serves as an initial screen using a 425-gene panel, followed by RNA quality assessment (RIN >7, appropriate 260/280 ratios). Whole-transcriptome sequencing provides broad detection capability, with targeted RNA-seq employed for cases showing inconclusive results or suspected novel fusions. FISH validation confirms potentially actionable findings, ensuring comprehensive fusion characterization [36]. This multi-modal approach maximizes sensitivity while maintaining specificity through orthogonal verification.
Reflex testing protocols represent a strategic approach to balancing comprehensive detection with cost efficiency. These protocols employ initial screening with broader but less sensitive methods, followed by targeted secondary testing for negative cases with high clinical suspicion.
In non-small cell lung cancer, research has demonstrated the utility of amplicon-based DNA/RNA sequencing as an initial test, with reflex to hybridization-capture-based RNA sequencing for driver-negative cases. In one study of 1,211 NSCLC specimens, approximately 10% required reflex testing, which identified nine oncogenic fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET that were missed by the initial amplicon-based assay [14]. Analysis of the AACR Project Genie database revealed that 17.4% of fusions in NSCLC would be undetectable by amplicon-based approaches alone, highlighting the critical importance of reflex testing protocols [14].
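The reflex logic itself is simple to express. The Python sketch below selects driver-negative specimens for the second, hybrid-capture RNA-seq assay and estimates reflex volume. The `Specimen` record and its `reflex_eligible` flag (standing in for clinical suspicion or residual material) are hypothetical, and real protocols apply laboratory-specific criteria; the 10% reflex rate and the cohort size come from the NSCLC study described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Specimen:
    specimen_id: str
    driver_found: bool            # any oncogenic driver reported by the amplicon DNA/RNA panel
    reflex_eligible: bool = True  # hypothetical flag, e.g. high suspicion and enough residual RNA


def select_for_reflex(specimens: List[Specimen]) -> List[str]:
    """IDs of driver-negative, eligible specimens to send for hybrid-capture RNA-seq."""
    return [s.specimen_id for s in specimens if not s.driver_found and s.reflex_eligible]


def expected_reflex_volume(n_specimens: int, reflex_fraction: float) -> int:
    """Rough laboratory workload estimate for the reflex assay."""
    return round(n_specimens * reflex_fraction)


batch = [Specimen("S1", True), Specimen("S2", False), Specimen("S3", False, reflex_eligible=False)]
to_reflex = select_for_reflex(batch)            # only S2 qualifies
workload = expected_reflex_volume(1_211, 0.10)  # about 10% of the 1,211-specimen cohort
```

Running the cheaper, broader assay first and reserving the second assay for driver-negative cases keeps the reflex workload near 120 specimens in a cohort of this size.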
Table: Key Research Reagents for Fusion Detection Studies
| Reagent/Category | Specific Examples | Research Function | Technical Considerations |
|---|---|---|---|
| Nucleic Acid Extraction | QIAamp DNA FFPE Tissue Kit, PAXgene RNA tubes | Preserve molecular integrity from specimens | FFPE degradation affects DNA; blood requires RNA stabilizers [56] [36] |
| Target Enrichment | Archer FusionPlex, Anchored Multiplex PCR (AMP) | Capture known/novel fusion transcripts | Partner-agnostic designs essential for novel discovery [55] |
| Library Preparation | KAPA HyperPrep, Illumina Stranded kits | Prepare sequencing libraries | Stranded protocols preserve transcript orientation [56] |
| rRNA Depletion | RNase H-based kits, Ribo-Zero | Enhance non-ribosomal sequencing | Reproducible but may cause off-target effects [56] |
| Sequencing Platforms | PacBio long-read (Iso-Seq), Illumina HiSeq | Generate sequence data | Long-read spans complex rearrangements [57] |
Workflow stages: Sample Preparation and Quality Control → Library Preparation and Sequencing → Data Analysis and Validation
The diagram illustrates two distinct mechanisms of oncogene activation with different detection requirements. Enhancer-hijacking events (top pathway) activate oncogenes through repositioning of regulatory elements without generating fusion transcripts, making them detectable primarily at the DNA level through methods like optical genome mapping. Conventional fusion events (bottom pathway) create chimeric transcripts and proteins detectable by both DNA-seq and RNA-seq, though with varying efficiency depending on the genomic architecture [55]. This mechanistic distinction explains why multi-platform approaches are necessary for comprehensive oncogenic driver detection.
DNA-seq presents significant limitations in detecting fusion events involving large intronic regions and enhancer-hijacking mechanisms, potentially missing clinically actionable alterations in cancer. The experimental data presented demonstrate that integrated approaches combining DNA-seq with targeted RNA-seq and optical genome mapping provide substantially improved detection sensitivity. For researchers and drug development professionals, implementing reflex testing protocols and leveraging complementary technologies is essential for comprehensive genomic characterization. These multi-modal strategies ensure identification of both canonical fusion events and complex structural variants, ultimately supporting more precise targeted therapy development and patient stratification.
RNA sequencing (RNA-seq) has become an indispensable tool in modern biological research, providing a comprehensive snapshot of the transcriptome that enables researchers to quantify gene expression levels, detect alternative splicing events, and identify non-coding RNAs [58]. Its ability to capture dynamic changes in gene expression under different conditions or treatments makes RNA-seq invaluable for studying various biological processes, including development, disease mechanisms, and drug responses [58]. In the specific context of fusion gene detection—a critical application in oncology research—RNA-seq provides a complementary method to DNA sequencing that may improve the identification of actionable variants [25].
However, the powerful capabilities of RNA-seq come with significant technical challenges that researchers must navigate to generate reliable data. Two of the most critical limitations concern its profound dependence on RNA quality and transcript expression levels. These technical constraints can substantially impact data quality, potentially leading to false negatives, inaccurate quantification, and ultimately, compromised biological conclusions. This guide objectively examines these pitfalls through the lens of experimental evidence and provides researchers with practical frameworks for designing robust RNA-seq experiments, particularly for fusion detection research where these factors play a decisive role in success or failure.
Unlike DNA, which is relatively stable, RNA is inherently unstable and degrades quickly once extracted from cells [4]. This chemical property poses one of the most significant practical challenges for RNA-seq workflows: samples must be preserved immediately after collection, and high-quality tissue is required [58]. Some tissue types are particularly demanding; brain tissue, for instance, must be collected quickly post-mortem to avoid degradation, which is often not feasible in clinical settings [58].
The RNA extraction process itself is "a difficult and often error prone process involving many steps with loss of sample at every step" [59]. Removing highly abundant ribosomal RNA (rRNA), which typically constitutes over 90% of total cellular RNA, further complicates the process; messenger RNA (mRNA), the fraction researchers are usually interested in, makes up only 1-2% [28]. This step adds "labor, cost and time" and depletes the original sample, creating particular challenges when working with needle biopsies, rare transcripts, or single cells [59].
The quality of starting RNA material directly influences multiple aspects of RNA-seq data quality. Formalin-Fixed Paraffin-Embedded (FFPE) samples exemplify these challenges, as "the fixation process causes RNA fragmentation and modifications leading to biased transcriptome profiles and inaccurate gene expression quantification" [58]. Artifacts like cross-linking may further interfere with sequencing processes [58].
Table 1: Impact of RNA Quality on Sequencing Metrics
| RNA Quality Metric | High-Quality RNA Impact | Degraded RNA Impact |
|---|---|---|
| RNA Integrity Number (RIN) | Higher mapping rates, more balanced coverage [28] | Increased 3' bias, reduced library complexity [28] |
| Library Complexity | Detection of more transcripts, better dynamic range [28] | Limited transcript detection, biased toward highly expressed genes [59] |
| Mapping Quality | 70-90% of reads map to reference genome [28] | Reduced mapping percentages, increased ambiguous mappings [28] |
| Base-Level Quality Scores | Uniform quality across read length [28] | Quality deterioration toward 3' end, requiring trimming [28] |
Variability in sample quality and quantity can introduce batch effects and confounding factors that complicate data interpretation [58]. This is especially problematic in clinical settings, where sample collection cannot be as tightly controlled as in experimental models. The presence of high-abundance RNAs requires additional steps to reduce background RNA and/or enrich for mRNAs, and although "these methods can help data quality, they add to the labor, cost and time required" for the experiment [59].
RNA-seq faces inherent challenges in sensitivity and noise that directly impact its ability to detect transcripts across different expression levels. The balance between sensitivity and noise is critical in RNA-seq analysis, as "technical limitations in library preparation and high sequencing depth requirements can lead to difficulties in detecting low-abundance transcripts, potentially underestimating or omitting important biological signals" [58]. Even when sequencing at sufficient depth to capture low-frequency transcripts, "the associated noise buildup can mask the transcripts that are of most importance" [58].
The fundamental issue stems from the composition of the transcriptome, where a small number of highly expressed genes can dominate the sequencing library, making it challenging to detect rare transcripts without extensive sequencing. High background noise from "sequencing errors, PCR amplification biases, and other technical artefacts can obscure genuine transcriptomic differences, making it challenging to distinguish true biological variability from experimental noise" [58]. This sensitivity limitation has direct implications for fusion detection, as fusion transcripts may be expressed at low levels despite their clinical significance.
Recent large-scale clinical studies have provided quantitative evidence of how transcript expression levels impact fusion detection sensitivity. A comprehensive analysis of approximately 80,000 samples from the Tempus Research Database compared the detection of clinically actionable fusions using both DNA-seq and whole exome capture RNA-seq [25]. The results demonstrated significant differences in detection capabilities:
Table 2: Comparative Fusion Detection Rates: DNA-seq vs RNA-seq
| Gene Fusion | Total Fusions Detected | Detected by Both RNA + DNA | DNA Only | RNA Only |
|---|---|---|---|---|
| ALK-* | 386 | 78.0% | 4.1% | 17.9% |
| BRAF-* | 289 | 30.4% | 1.4% | 68.2% |
| FGFR3-* | 307 | 73.6% | 2.9% | 23.5% |
| NTRK1/2/3-* | 198 | 65.7% | 11.1% | 23.2% |
| ROS1-* | 113 | 70.8% | 1.8% | 27.4% |
| All Fusions | 2118 | 66.1% | 4.8% | 29.1% |
Across all fusion events, 29.1% were detected only by RNA-seq, while only 4.8% were identifiable solely through DNA-seq [25]. This substantial difference highlights the complementarity of the two approaches and RNA-seq's particular value for fusion detection. The study further analyzed the therapeutic implications of these findings, noting that "fusions identified through RNA-seq alone led to a 24% increase in the number of patients who were eligible to receive matched therapies" [25].
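Because the three detection categories in Table 2 partition the combined call set, each assay's share is its "only" column plus the "both" column. The Python sketch below recomputes these shares from the published percentages: RNA-seq alone recovers about 95.2% of all fusions in the combined set, versus about 70.9% for DNA-seq alone.

```python
from typing import Dict, Tuple

# (both, DNA only, RNA only) percentages copied from Table 2.
fusion_pcts: Dict[str, Tuple[float, float, float]] = {
    "ALK-*": (78.0, 4.1, 17.9),
    "BRAF-*": (30.4, 1.4, 68.2),
    "FGFR3-*": (73.6, 2.9, 23.5),
    "NTRK1/2/3-*": (65.7, 11.1, 23.2),
    "All Fusions": (66.1, 4.8, 29.1),
}


def per_assay_share(both: float, dna_only: float, rna_only: float) -> Dict[str, float]:
    """Share of the combined (DNA + RNA) call set that each assay detects on its own."""
    return {
        "DNA-seq": round(both + dna_only, 1),
        "RNA-seq": round(both + rna_only, 1),
    }


shares = {fusion: per_assay_share(*pcts) for fusion, pcts in fusion_pcts.items()}
```

These shares are relative to fusions found by at least one assay; fusions missed by both remain invisible, so the figures are not absolute sensitivities.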
The technical reasons for RNA-seq's advantage in fusion detection relate to the nature of fusion events. DNA-seq operates at the DNA level, where "the breakpoints of fusion genes usually occur in long intronic regions, and the breakpoints of fusion genes vary across patients and diseases in real life" [4]. However, "DNA-seq cannot accurately cover the long intronic regions, which contain a large number of repetitive sequences, making it difficult to identify fusion genes" [4]. RNA-seq, in contrast, detects the expressed consequence of these genomic rearrangements, potentially providing a more functional assessment of the fusion's biological relevance.
A successful RNA-seq study requires thoughtful experimental design beginning with RNA quality assessment. The RNA-extraction protocol must be chosen based on sample characteristics—for eukaryotes, this involves deciding "whether to enrich for mRNA using poly(A) selection or to deplete rRNA" [28]. Poly(A) selection "typically requires a relatively high proportion of mRNA with minimal degradation as measured by RNA integrity number (RIN), which normally yields a higher overall fraction of reads falling onto known exons" [28]. However, many biologically relevant samples (such as tissue biopsies) cannot be obtained in sufficient quantity or quality to produce good poly(A) RNA-seq libraries and therefore require ribosomal depletion instead [28].
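The choice between poly(A) selection and rRNA depletion can be written as a simple decision rule. In the Python sketch below, the RIN cutoff of 7 echoes the RIN >7 quality criterion in the RET workflow described earlier, while the 100-ng input threshold is purely hypothetical; laboratories set these values from their own kits and validation data.

```python
def choose_rna_enrichment(rin: float, is_ffpe: bool, input_ng: float,
                          min_rin_polya: float = 7.0,
                          min_input_polya_ng: float = 100.0) -> str:
    """Pick an enrichment strategy from basic sample-quality metrics.

    Thresholds are hypothetical defaults for illustration only.
    """
    if not 1.0 <= rin <= 10.0:
        raise ValueError("RIN is reported on a 1-10 scale")
    if is_ffpe or rin < min_rin_polya or input_ng < min_input_polya_ng:
        return "rRNA depletion"   # degraded or scarce RNA: avoid relying on intact poly(A) tails
    return "poly(A) selection"    # intact, plentiful RNA: higher fraction of exonic reads


example = choose_rna_enrichment(rin=8.5, is_ffpe=False, input_ng=500)
```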
Quality control checkpoints should be implemented at multiple stages of the RNA-seq workflow:
Diagram 1: RNA-seq Quality Control Checkpoints
At the raw reads stage, quality control involves "analysis of sequence quality, GC content, the presence of adaptors, overrepresented k-mers and duplicated reads in order to detect sequencing errors, PCR artifacts or contaminations" [28]. Software tools such as the FASTX-Toolkit and Trimmomatic can be used to "discard low-quality reads, trim adaptor sequences, and eliminate poor-quality bases" [28]. For read alignment, important parameters include "the percentage of mapped reads, which is a global indicator of the overall sequencing accuracy and of the presence of contaminating DNA" [28]. Additional alignment quality metrics include "the uniformity of read coverage on exons and the mapped strand" [28].
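Quality trimming of the kind performed by tools such as Trimmomatic can be sketched with a sliding-window rule: scan each read from the 5' end and cut it where the mean Phred score in a window first drops below a threshold. The Python sketch below is a simplified illustration assuming Phred+33 encoding, a 4-base window, and a mean-quality threshold of 20; it is not Trimmomatic's implementation, which may place the cut differently.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean_q: float = 20.0, offset: int = 33) -> Tuple[str, str]:
    """Scan 5'->3' and cut the read at the start of the first low-quality window."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    scores = phred_scores(quality, offset)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality  # no window fell below the threshold


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
trimmed = sliding_window_trim("ACGTACGTAC", "IIIIII####")
untouched = sliding_window_trim("ACGTACGTAC", "IIIIIIIIII")
```

In the first example the sixth, high-quality base is also removed because it falls inside the first failing window; this conservative behavior follows from cutting at the window start.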
Optimal sequencing depth is experiment-dependent and represents a balance between cost and comprehensiveness. While "some authors will argue that as few as five million mapped reads are sufficient to quantify accurately medium to highly expressed genes in most eukaryotic transcriptomes, others will sequence up to 100 million reads to quantify precisely genes and transcripts that have low expression levels" [28]. For fusion detection, where target transcripts may be rare, deeper sequencing is generally advantageous.
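Why depth matters for rare fusion transcripts can be shown with a Poisson approximation: if a fraction f of reads derives from the fusion transcript and a fraction j of those span the junction, the expected number of supporting reads at depth N is N·f·j. The Python sketch below computes the chance of observing at least three supporting reads. The fractions and the three-read calling threshold are hypothetical, and real data are overdispersed, so the numbers are qualitative only.

```python
import math


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complementary CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def detection_probability(total_reads: float, fusion_fraction: float,
                          junction_fraction: float, min_supporting_reads: int = 3) -> float:
    """Chance of seeing enough junction-spanning reads to call a fusion."""
    expected = total_reads * fusion_fraction * junction_fraction
    return prob_at_least(min_supporting_reads, expected)


# Hypothetical rare fusion: 1 in a million reads from the transcript, 5% spanning the junction.
shallow = detection_probability(5e6, 1e-6, 0.05)  # expected 0.25 supporting reads
deep = detection_probability(100e6, 1e-6, 0.05)   # expected 5 supporting reads
```

Under these assumptions, 5 million reads give well under a 1% chance of calling the fusion, whereas 100 million reads give about 87.5%.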
The number of biological replicates is another critical design factor that "depends on both the amount of technical variability in the RNA-seq procedures and the biological variability of the system under study, as well as on the desired statistical power" [28]. Technical variation in RNA-seq experiments "stems from many sources, such as differences in quality and quantity of RNA recovered during sample preparation, library preparation batch effect, flow cell and lane effects when using Illumina technology, and adapter bias" [60]. Evidence suggests that "library preparation was the largest source of technical variation" [60].
To mitigate these effects, researchers should "randomize samples during preparation and dilute them to the same concentration" [60]. Additionally, "indexing and multiplexing samples, with all samples included on all lanes/flow cells" helps reduce lane-specific effects [60]. When complete multiplexing isn't possible, "a blocking design can be used that includes some samples from each group on each lane of sequencing" [60].
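The randomization and blocking strategy described above can be sketched as follows. This is a hypothetical helper, not taken from [60]: samples are shuffled within each experimental group and then dealt round-robin across lanes, so that every lane carries samples from every group whenever a group has at least as many members as there are lanes.

```python
import random
from collections import defaultdict

def blocked_lane_assignment(samples: dict[str, str], n_lanes: int,
                            seed: int = 0) -> dict[int, list[str]]:
    """Spread each experimental group across sequencing lanes.

    `samples` maps sample ID -> group. Samples are shuffled within each group
    (randomization) and dealt round-robin over the lanes (blocking),
    continuing from where the previous group stopped so lane loads stay even.
    """
    rng = random.Random(seed)
    by_group: dict[str, list[str]] = defaultdict(list)
    for sample_id, group in samples.items():
        by_group[group].append(sample_id)
    lanes: dict[int, list[str]] = {lane: [] for lane in range(n_lanes)}
    offset = 0
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        for i, sample_id in enumerate(members):
            lanes[(offset + i) % n_lanes].append(sample_id)
        offset += len(members)
    return lanes

# Hypothetical design: 4 tumor and 4 normal samples on 2 lanes.
samples = {f"T{i}": "tumor" for i in range(4)} | {f"N{i}": "normal" for i in range(4)}
for lane, members in blocked_lane_assignment(samples, n_lanes=2).items():
    print(lane, sorted(members))
```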
The analysis of RNA-seq data involves multiple steps, each with specific methodological considerations. A generalized workflow encompasses quality control, read alignment, quantification, and differential expression analysis, with tool selection depending on the specific experimental goals:
Diagram 2: RNA-seq Data Analysis Workflow
For read alignment, different aligners are available, with studies comparing "Gsnap, Stampy and TopHat" for their influence on detection capabilities [61]. For differential expression analysis, multiple statistical methods have been developed, including "DESeq, edgeR, Cuffdiff, baySeq, and NOISeq" [61]. These tools employ different statistical models—"edgeR method proposed by Robinson et al. has been developed based on an overdispersed Poisson model," while "Anders and Huber showed that negative binomial was superior for estimation of variability in read count type data and implemented the method as a DESeq package" [61].
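The practical difference between the Poisson and negative binomial models lies in the variance: the Poisson assumes the variance equals the mean, whereas the negative binomial adds a dispersion term, Var = mu + phi * mu^2. The sketch below estimates phi for a single gene by the method of moments. It illustrates overdispersion only; DESeq and edgeR use more elaborate estimators that share information across genes, and the replicate counts are hypothetical.

```python
from statistics import mean, variance

def moment_dispersion(counts: list[float]) -> float:
    """Method-of-moments negative binomial dispersion for one gene.

    Under the negative binomial, Var = mu + phi * mu**2; the Poisson case is
    phi = 0. Solving for phi with the sample mean m and sample variance s2
    gives (s2 - m) / m**2, truncated at zero. Needs at least two counts.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (s2 - m) / m**2)

replicates = [90, 110, 150, 50]  # hypothetical counts for one gene, 4 replicates
print(round(moment_dispersion(replicates), 4))  # 0.1633
```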
Recent systematic comparisons have revealed that analytical choices significantly impact RNA-seq results. One comprehensive study evaluated "192 pipelines using alternative methods" applied to 18 samples from two human cell lines, testing "3 trimming algorithms, 5 aligners, 6 counting methods, 3 pseudoaligners and 8 normalization approaches" [62]. The results demonstrated that "the choice of data preprocessing operations affected the performance" of downstream analyses [62].
For cross-study applications, such as building classifiers for tissue of origin prediction, preprocessing decisions including "normalization, batch effect correction, and data scaling" significantly impact performance [63]. One investigation found that "batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset" [63]. However, the same study also noted that "the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO" [63], highlighting that the optimal preprocessing approach depends on the specific application and data structure.
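As a concrete picture of what batch effect correction does, the sketch below applies the simplest location-only adjustment: for one gene, each batch's mean log-expression is shifted onto the overall mean. It is a hypothetical illustration, not ComBat or the specific method evaluated in [63]. Consistent with the mixed results reported above, whether such a step helps depends on the data structure; when batch is confounded with biology, it can also remove real signal.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(values: list[float], batches: list[str]) -> list[float]:
    """Remove per-batch location shifts from one gene's log-expression values.

    Each value has its batch mean subtracted and the grand mean added back,
    so all batches share a common centre. Scale differences are not touched.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grouped: dict[str, list[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        grouped[b].append(v)
    batch_means = {b: mean(vs) for b, vs in grouped.items()}
    grand_mean = mean(values)
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]

log_expr = [5.0, 5.2, 7.0, 7.2]  # hypothetical log2 expression, two batches
print(center_by_batch(log_expr, ["A", "A", "B", "B"]))
```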
Table 3: Key Research Reagent Solutions for RNA-seq Studies
| Reagent/Solution Category | Specific Examples | Function and Application |
|---|---|---|
| RNA Stabilization Reagents | RNAlater, PAXgene | Stabilize RNA immediately after sample collection to prevent degradation [58] |
| RNA Extraction Kits | RNeasy Plus Mini Kit (QIAGEN) | High-quality RNA extraction with genomic DNA removal [62] |
| RNA Quality Assessment | Bioanalyzer (Agilent), TapeStation | Assess RNA Integrity Number (RIN) for sample QC [62] |
| rRNA Depletion Kits | Ribo-Zero, NEBNext rRNA Depletion | Remove abundant ribosomal RNA to enhance mRNA sequencing [28] |
| Poly(A) Selection Kits | Dynabeads mRNA DIRECT | Isolate polyadenylated mRNA molecules [28] |
| Library Preparation Kits | TruSeq Stranded Total RNA, NuGEN Ovation | Convert RNA to sequencing-ready libraries [60] [62] |
| Strand-Specific Library Kits | dUTP-based methods | Preserve strand orientation information during cDNA synthesis [28] |
RNA-seq provides powerful capabilities for transcriptome analysis and fusion detection, but its effectiveness is constrained by fundamental dependencies on RNA quality and transcript expression levels. The evidence demonstrates that RNA-seq can detect approximately 29% of actionable fusions that would be missed by DNA-seq alone [25], highlighting its complementary value in comprehensive genomic profiling. However, realizing this potential requires meticulous attention to experimental design, including appropriate quality control measures, sufficient sequencing depth, and adequate biological replication.
Researchers must carefully consider their specific experimental goals when designing RNA-seq studies, as "the choice between standard and nascent RNA-seq depends on the research question and experimental objectives" [58]. Failure to consider critical factors such as RNA quality, sequencing depth, and analytical approaches "can lead to misinterpretation of results and limited insights into gene regulation dynamics" [58]. By understanding and addressing these pitfalls through rigorous experimental design and appropriate analytical choices, researchers can maximize the utility of RNA-seq data and generate robust, biologically meaningful results that advance our understanding of transcriptome biology and improve clinical detection of functionally important genetic events such as gene fusions.
Gene fusions are critical drivers in oncogenesis, serving as key biomarkers for disease classification, prognosis, and therapeutic targeting in precision oncology. The accurate detection of these complex structural variants relies heavily on optimized assay configurations, encompassing targeted panel design, sequencing coverage parameters, and bioinformatics pipeline selection. This guide provides a comprehensive comparison of RNA-seq and DNA-seq approaches for fusion gene detection, synthesizing experimental data from recent studies to inform researchers, scientists, and drug development professionals in their assay optimization strategies. The analysis focuses specifically on technical performance metrics across platforms and methodologies, providing evidence-based recommendations for clinical and research applications.
A large-scale comparative study of 467 acute leukemia cases provides critical insights into the complementary strengths of targeted RNA-seq and optical genome mapping (OGM). The study reported an overall concordance rate of 88.1% between platforms, with marked variability across leukemia subtypes, from 80.2% in B-ALL to 41.7% in T-ALL [21].
Table 1: Platform-Specific Detection Rates in Acute Leukemia (n=234 clinically relevant rearrangements)
| Detection Category | Percentage | Count | Key Examples |
|---|---|---|---|
| OGM Unique Detection | 15.8% | 37/234 | MECOM, BCL11B, IGH rearrangements |
| RNA-seq Unique Detection | 9.4% | 22/234 | Fusions from intrachromosomal deletions |
| Concordant Detection | 74.7% | 175/234 | KMT2A, BCR::ABL1 rearrangements |
| Enhancer Hijacking Events (Concordance) | 20.6% | - | MECOM, BCL11B rearrangements |
The data reveals that OGM particularly excels in identifying enhancer-hijacking lesions that often evade detection by RNA-seq, while targeted RNA-seq slightly outperforms for fusions arising from intrachromosomal deletions that OGM may misclassify as simple deletion events [21]. This underscores the platform-specific biases that must be considered during assay selection.
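Table 1 partitions the 234 rearrangements detected by either platform into concordant and platform-unique calls (37 + 22 + 175 = 234). A minimal sketch of that partition, assuming each call has already been matched across platforms and keyed as a hypothetical "case:rearrangement" label, looks like this:

```python
def platform_concordance(calls_a: set[str], calls_b: set[str]) -> dict[str, float]:
    """Fractions of all rearrangements (detected by either platform) that are
    concordant, unique to platform A, or unique to platform B."""
    union = calls_a | calls_b
    if not union:
        return {"concordant": 0.0, "a_only": 0.0, "b_only": 0.0}
    n = len(union)
    return {
        "concordant": len(calls_a & calls_b) / n,
        "a_only": len(calls_a - calls_b) / n,
        "b_only": len(calls_b - calls_a) / n,
    }

# Hypothetical per-case calls, keyed as "case:rearrangement".
ogm = {"case01:BCR::ABL1", "case02:KMT2A::MLLT3", "case03:MECOM rearrangement"}
rnaseq = {"case01:BCR::ABL1", "case02:KMT2A::MLLT3", "case04:STIL::TAL1"}
print(platform_concordance(ogm, rnaseq))
```

The hard part in practice is the matching itself: OGM reports genomic breakpoints while RNA-seq reports transcripts, so calls must be reconciled by case and partner genes before set operations like these are meaningful.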
Targeted RNA-seq demonstrates significant advantages for fusion detection sensitivity compared to conventional whole transcriptome approaches. Experimental validation using spike-in standards and cell lines shows that targeted capture achieves 50% detection of fusion sequins at 2 pM input and 100% detection between 8 pM and 31 nM input, independent of whether the panel targeted one or both fusion partners [26].
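A limit of detection such as "50% detection at 2 pM" can be read off a dilution series. The sketch below interpolates, on a log10 concentration scale, the input at which the detection rate first reaches a target value. The dilution series is hypothetical and is not the data from [26]; probit or logistic regression is a common, more rigorous alternative to simple interpolation.

```python
import math

def concentration_at_rate(conc: list[float], rate: list[float], target: float) -> float:
    """Input concentration at which detection first reaches `target`.

    Linear interpolation between adjacent dilution points on a log10
    concentration scale; `conc` must be positive and sorted ascending.
    """
    points = list(zip(conc, rate))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1 and r1 > r0:
            frac = (target - r0) / (r1 - r0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    raise ValueError("target detection rate is not bracketed by the data")

# Hypothetical dilution series (pM) and fraction of replicates detected.
conc_pm = [0.5, 1, 2, 4, 8]
detected = [0.0, 0.25, 0.5, 0.75, 1.0]
print(f"C50 ~ {concentration_at_rate(conc_pm, detected, 0.50):.2f} pM")
print(f"C95 ~ {concentration_at_rate(conc_pm, detected, 0.95):.2f} pM")
```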
Table 2: Sensitivity Comparison of RNA-seq Methodologies
| Performance Metric | Targeted RNA-seq | Conventional RNA-seq |
|---|---|---|
| Detection Sensitivity | 50% at 2 pM input | Limited for low-expression fusions |
| On-Target Rate | 93% (double-capture) | ~4% |
| Enrichment Factor | 33-59 fold | - |
| Single-Copy Fusion Detection | Reliable | Challenging |
| Novel Partner Identification | Possible | Possible |
In clinical validation, targeted RNA-seq increased the overall fusion diagnostic rate from 63% with conventional approaches (FISH, RT-PCR) to 76%, while simultaneously identifying precise fusion junctions and partners [26]. This enhanced sensitivity is particularly valuable for detecting low-abundance fusion transcripts in samples with limited tumor purity or those expressing fusion genes at low levels.
The optimized protocol for fusion detection employs a double-capture approach to maximize on-target efficiency:
RNA Extraction and Quality Control: Isolate RNA from patient specimens (blood, bone marrow, or tumor tissue) using standardized extraction kits. Assess RNA integrity using appropriate methods (e.g., Bioanalyzer) with RIN > 7.0 recommended [26] [62].
Library Preparation: Utilize stranded RNA library preparation kits (e.g., TruSeq Stranded Total RNA) following manufacturer protocols with incorporation of unique dual indexes to enable sample multiplexing [62].
Hybridization Capture: Employ biotinylated oligonucleotide probes targeting known fusion genes (188 genes for hematological malignancies, 241 genes for solid tumors) with 16-24 hour hybridization at 65°C. Include both DNA-level and RNA-level spike-in controls (ERCC, fusion sequins) for quality monitoring [26].
Post-Capture Amplification: Perform double-capture enrichment with magnetic streptavidin bead-based purification followed by 10-12 cycles of PCR amplification [26].
Sequencing: Sequence on Illumina platforms (HiSeq 2500, MiSeq, or NovaSeq) with paired-end reads (2×101 bp), targeting a minimum of 20 million reads per sample for adequate coverage [62].
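The per-sample read target above determines how many libraries can be multiplexed on one run. The helper below is a back-of-envelope sketch; the run yield and the fraction of usable reads are hypothetical inputs, and reads versus read pairs must be counted consistently with the instrument's specification.

```python
def max_samples_per_run(run_yield_reads: float, reads_per_sample: float = 20e6,
                        usable_fraction: float = 0.9) -> int:
    """Number of libraries that can share a run while each keeps `reads_per_sample`.

    `run_yield_reads` and `usable_fraction` (share of reads remaining after
    spike-ins, unassigned indexes and QC losses) are hypothetical inputs.
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return int(run_yield_reads * usable_fraction // reads_per_sample)

print(max_samples_per_run(400e6))  # hypothetical 400-million-read run -> 18
```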
Computational analysis requires specialized pipelines to distinguish true fusion events from artifacts:
Read Preprocessing: Quality trim adapters and low-quality bases using Trimmomatic, Cutadapt, or BBDuk, retaining reads with Phred score >20 and length >50 bp [62].
Alignment and Mapping: Map reads to the reference genome (GRCh37/hg19 or GRCh38/hg38) using the STAR aligner with chimeric alignment detection enabled [7] [26].
Fusion Detection: Implement multiple algorithms in parallel (STAR-Fusion, FusionCatcher, Arriba) to maximize sensitivity. Require consensus detection by at least two tools to minimize false positives [7] [26] [64].
Filtering and Annotation: Remove known artifacts, read-through transcripts, and common false positives. Annotate remaining fusions with clinical relevance (Tier 1-3 classification per ACMG/AMP guidelines) [21].
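The consensus and filtering steps above can be sketched together. The code assumes that each caller's output has already been parsed and harmonized to a common "5'GENE::3'GENE" naming convention; the gene names, coordinates, calls, blacklist, and the 200 kb read-through window are all hypothetical. The read-through rule flags same-strand neighbouring genes joined in transcriptional order, a deliberate simplification of the filters that tools such as Arriba apply internally.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int   # genomic start (bp)
    end: int     # genomic end (bp)
    strand: str  # "+" or "-"

def consensus_fusions(calls_by_tool: dict[str, set[str]], min_tools: int = 2) -> set[str]:
    """Keep fusions reported by at least `min_tools` callers."""
    support: Counter = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # count each fusion at most once per tool
    return {fusion for fusion, n in support.items() if n >= min_tools}

def is_likely_readthrough(five: Gene, three: Gene, max_gap: int = 200_000) -> bool:
    """Same-strand neighbours joined in transcriptional order (hypothetical cut-off)."""
    if five.chrom != three.chrom or five.strand != three.strand:
        return False
    gap = three.start - five.end if five.strand == "+" else five.start - three.end
    return 0 <= gap <= max_gap

def filter_fusions(fusions: set[str], genes: dict[str, Gene], blacklist: set[str]) -> list[str]:
    """Drop blacklisted artifacts and likely read-through transcripts."""
    kept = []
    for fusion in sorted(fusions):
        if fusion in blacklist:
            continue
        five_name, three_name = fusion.split("::")
        if is_likely_readthrough(genes[five_name], genes[three_name]):
            continue
        kept.append(fusion)
    return kept

# Hypothetical genes, coordinates and calls, for illustration only.
genes = {
    "GENEA": Gene("chr1", 1_000, 5_000, "+"),
    "GENEB": Gene("chr1", 8_000, 12_000, "+"),
    "GENEC": Gene("chr2", 100, 900, "+"),
    "GENED": Gene("chr2", 2_000, 3_000, "+"),
}
calls = {
    "STAR-Fusion": {"GENEA::GENEB", "GENEA::GENEC", "GENED::GENEC"},
    "FusionCatcher": {"GENEA::GENEB", "GENEA::GENEC"},
    "Arriba": {"GENEA::GENEC", "GENED::GENEC", "GENEB::GENEC"},
}
consensus = consensus_fusions(calls)
print(filter_fusions(consensus, genes, blacklist={"GENED::GENEC"}))  # ['GENEA::GENEC']
```

A naive read-through rule like this one can also discard genuine fusions produced by small intrachromosomal deletions, which the acute leukemia comparison identifies as a strength of RNA-seq, so such candidates are better flagged for review than dropped outright.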
Figure 1: Bioinformatics Pipeline for Fusion Detection. This workflow illustrates the sequential steps from raw sequencing data to final fusion calling, emphasizing the importance of multiple algorithm consensus.
Benchmarking studies evaluating fusion detection algorithms across simulated and experimental datasets reveal significant variation in performance characteristics. When tested on samples with low concentrations of fusion transcripts, Arriba demonstrated the highest sensitivity, identifying 88 of 150 simulated fusions (approximately 59%) at a fivefold expression level, compared with 57% for the next-best method [7].
Table 3: Fusion Detection Tool Performance Characteristics
| Tool | Sensitivity | Precision | Runtime | Strengths |
|---|---|---|---|---|
| Arriba | 88/150 simulated fusions (5x level) | High | <1 hour/sample | Detects intragenic rearrangements, cryptic events |
| FusionCatcher | Moderate-High | Moderate | Hours | Comprehensive gene annotation |
| STAR-Fusion | Moderate | High | Hours | Accurate breakpoint resolution |
| FusionScan | 79% recall | 60% precision | Comparable to leaders | Optimized for intact exon combinations |
| deFuse | Low-Moderate | Moderate | Hours | Good specificity |
Performance varies substantially across data types, with certain tools excelling in specific contexts. For example, Arriba detected 55 TMPRSS2-ERG fusions in the ICGC early-onset prostate cancer cohort (6% more than the next-best method) and 8 IG-BCL2/BCL6/MYC translocations in the TCGA-DLBC cohort (60% more than the next-best method) [7]. This highlights the importance of matching algorithm selection to the experimental context and the fusion types of interest.
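Sensitivity and precision figures like those in Table 3 come from comparing a tool's calls with a truth set. In the sketch below, the truth set of 150 simulated fusions and the 88 recovered calls mirror the count reported for Arriba, but the 12 false-positive calls are hypothetical, added only to show how precision and F1 are computed.

```python
def benchmark(called: set[str], truth: set[str]) -> dict[str, float]:
    """Recall (sensitivity), precision and F1 of a fusion caller against a truth set."""
    tp = len(called & truth)
    recall = tp / len(truth) if truth else 0.0
    precision = tp / len(called) if called else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "recall": recall, "precision": precision, "f1": f1}

truth = {f"SIM{i:03d}" for i in range(150)}   # 150 simulated fusions
called = {f"SIM{i:03d}" for i in range(88)}   # 88 recovered, as reported
called |= {f"FP{i}" for i in range(12)}       # 12 hypothetical false positives
metrics = benchmark(called, truth)
print({k: round(v, 3) for k, v in metrics.items()})
```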
Gene fusions activate oncogenic pathways through multiple mechanisms, predominantly via either promoter swapping leading to oncogene overexpression or creation of chimeric proteins with constitutive kinase activity.
Figure 2: Oncogenic Signaling Pathways Activated by Gene Fusions. This diagram illustrates the primary mechanisms through which fusion genes drive oncogenesis, highlighting key downstream pathways and potential therapeutic intervention points.
In pancreatic cancer, fusion genes are significantly associated with KRAS wild-type tumors and predominantly involve proteins that stimulate the MAPK signaling pathway, suggesting they functionally substitute for activating KRAS mutations [7]. Similar pathway-specific activities are observed across cancer types, with different fusion classes activating characteristic oncogenic programs.
Table 4: Essential Research Reagents and Platforms for Fusion Detection Studies
| Reagent/Platform | Function | Application Notes |
|---|---|---|
| Targeted RNA-seq Panels | Gene-specific enrichment | 188-gene (hematologic) vs. 241-gene (solid tumor) configurations |
| Archer Analysis Software | Fusion calling from AMP-based data | Optimized for anchored multiplex PCR target enrichment |
| Bionano OGM | Genome-wide structural variant detection | Complementary to RNA-seq for enhancer hijacking events |
| Spike-in Controls (ERCC, Fusion Sequins) | Quantification standards | Enable absolute sensitivity measurement and quality control |
| STAR Aligner | Spliced read alignment | Critical for chimeric junction detection in RNA-seq data |
| Illumina Sequencing Platforms | High-throughput sequencing | MiSeq, HiSeq 2500, NovaSeq for varying throughput needs |
The optimization of fusion detection assays requires careful consideration of the complementary strengths and limitations of available technologies. Targeted RNA-seq provides superior sensitivity for detecting expressed chimeric fusions, particularly those arising from intrachromosomal deletions, while OGM excels in identifying cryptic, enhancer-driven rearrangements that may be missed by transcriptome-based approaches. Bioinformatics pipeline selection significantly impacts detection accuracy, with consensus approaches using multiple algorithms (Arriba, STAR-Fusion, FusionCatcher) providing optimal sensitivity and specificity. These findings support a multimodal approach to fusion detection in clinical and research settings, where orthogonal technologies provide comprehensive structural variant characterization to inform basic cancer research and precision oncology initiatives.
In the era of precision oncology and advanced genetic diagnostics, the detection of actionable molecular alterations is paramount. Fusion genes represent a critical class of biomarkers that guide diagnosis, prognosis, and targeted treatment decisions across numerous cancers. While both DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) can identify these fusions, discrepancies between their results frequently present significant challenges in clinical and research settings. Understanding the sources of these discrepancies—whether biological, technical, or analytical—is essential for accurate interpretation and appropriate patient management. This guide objectively compares the performance of DNA-seq and RNA-seq for fusion detection, examining the underlying causes of discordant results and providing evidence-based strategies for resolution.
Discrepancies between DNA-seq and RNA-seq results arise from multiple factors spanning biological mechanisms and technical limitations.
Biological Mechanisms: True biological differences can manifest as DNA-RNA discordance. RNA editing represents one such process where the RNA sequence is altered post-transcriptionally. However, studies indicate that RNA editing explains only a minor portion of observed discrepancies [65]. More impactful is the transcriptional process itself; a fusion identified at the DNA level may not be transcribed or expressed, rendering it undetectable by RNA-seq. Conversely, trans-splicing or read-through transcription events can create fusion transcripts without an underlying genomic rearrangement [9].
Technical and Analytical Limitations: Each technology has inherent limitations. DNA-seq, particularly when using targeted panels, can miss rearrangements occurring in large intronic regions or complex genomic contexts outside the covered areas [36] [34]. The detection accuracy is also influenced by the unpredictable nature of genomic breakpoints. In contrast, RNA-seq faces challenges related to RNA quality, which is often compromised in formalin-fixed paraffin-embedded (FFPE) samples due to chemical modification and degradation [34]. Furthermore, the alignment of RNA-seq reads is complicated by phenomena such as alternative splicing and the presence of pseudogenes—dysfunctional genomic sequences with high similarity to functional genes—which can lead to misalignment and false positives [66].
Direct comparisons of DNA-seq and RNA-seq for fusion detection reveal complementary strengths and weaknesses, as summarized by concordance studies and detection rates.
Table 1: Concordance Rates Between Detection Methods for RET Fusions in NSCLC
| Comparison | Concordance Rate | Study Context |
|---|---|---|
| DNA-seq vs. RNA-seq | 92.3% | Early-stage NSCLC [36] |
| RNA-seq vs. FISH | 84.6% | Early-stage NSCLC [36] |
| DNA-seq vs. FISH | 82.5% | Early-stage NSCLC [36] |
Table 2: Detection Performance in Clinical Validation Studies
| Metric | DNA-seq Only | RNA-seq Only | Integrated DNA/RNA Approach |
|---|---|---|---|
| Sensitivity | 93.4% | 86.9% | 100% [34] |
| Specificity | 96.9% | 96.9% | 100% [34] |
| Commonly Missed Fusions | ETV6::NTRK3, CCDC6::RET | TRIM46::NTRK1, CD74::ROS1 | None (complementary detection) [34] |
The data demonstrates that while DNA-seq and RNA-seq independently show high performance, an integrated approach achieves superior sensitivity and specificity by leveraging their complementary nature. RNA-seq, especially targeted panels, can identify fusions missed by DNA-seq. For instance, in one study, targeted RNA-seq uncovered five additional RET+ cases missed by whole-transcriptome sequencing [36]. Another study on acute myeloid leukemia found that RNA-seq detected 90% of fusion events reported by routine diagnostics (karyotyping and FISH) with high evidence [9].
A robust protocol for resolving discrepancies involves a complementary testing algorithm. One validated approach begins with an amplicon-based DNA/RNA sequencing step; samples negative for oncogenic drivers are then reflexed to a more comprehensive hybridization-capture-based RNA sequencing assay [14]. This strategy identified actionable fusions in non-small cell lung carcinoma (NSCLC) that the initial amplicon-based assay had missed [14].
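The two-tier reflex logic can be expressed as a simple triage rule. In this hypothetical sketch, each case maps to the drivers reported by the initial amplicon-based assay, and cases with no reported driver are routed to hybridization-capture RNA-seq.

```python
def triage(initial_results: dict[str, list[str]]) -> dict[str, str]:
    """Next step for each case under a two-tier reflex testing algorithm.

    `initial_results` maps a case ID to the oncogenic drivers reported by the
    initial amplicon-based DNA/RNA assay; an empty list means negative.
    """
    return {
        case_id: "report" if drivers else "reflex: hybridization-capture RNA-seq"
        for case_id, drivers in initial_results.items()
    }

print(triage({"case1": ["EGFR L858R"], "case2": []}))  # hypothetical cases
```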
Another developed assay simultaneously utilizes both DNA and RNA from FFPE samples. The DNA component helps confirm the genomic rearrangement, while the RNA component confirms the expression of the fusion transcript, thereby ruling out silent rearrangements [34]. This dual-layer approach facilitates precise diagnosis and treatment.
Beyond laboratory workflows, sophisticated bioinformatics strategies are critical for accurate fusion detection from RNA-seq data. This involves using multiple state-of-the-art fusion callers (e.g., Arriba, FusionCatcher) and applying stringent, custom filtering strategies to reduce false positives [9]. Key filters include:
For DNA-seq, specialized structural variant callers like Delly are employed to identify the genomic breakpoints supporting a fusion [36].
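The reasoning used in this section to interpret paired DNA and RNA results can be summarized as a small decision function. It is a sketch: the minimum of three junction-spanning reads is an assumed threshold, and the explanation attached to each category restates the biological and technical causes discussed above rather than a validated clinical rule.

```python
def classify_fusion_evidence(dna_breakpoint: bool, rna_junction_reads: int,
                             min_reads: int = 3) -> str:
    """Interpret a candidate fusion from paired DNA and RNA evidence.

    `min_reads` is an assumed threshold for calling RNA support.
    """
    rna_support = rna_junction_reads >= min_reads
    if dna_breakpoint and rna_support:
        return "confirmed: genomic rearrangement with expressed fusion transcript"
    if dna_breakpoint:
        return "DNA-only: possibly silent (not expressed), low expression, or degraded RNA"
    if rna_support:
        return "RNA-only: breakpoint outside DNA coverage, or read-through/trans-splicing"
    return "not detected"

for dna, reads in [(True, 12), (True, 0), (False, 8), (False, 1)]:
    print(dna, reads, "->", classify_fusion_evidence(dna, reads))
```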
Successful resolution of DNA-seq/RNA-seq discrepancies relies on a suite of specialized reagents, kits, and computational tools.
Table 3: Key Research Reagent Solutions for Fusion Detection
| Item | Function | Example Use Case |
|---|---|---|
| QIAamp DNA FFPE Tissue Kit | Extracts high-quality DNA from challenging FFPE samples. | Input for DNA-based NGS fusion panels [36]. |
| KAPA Hyper Prep Kit | Prepares NGS libraries from extracted DNA. | Used in targeted DNA-seq for fusion detection [36]. |
| Stranded Ribo-Zero Depletion Kit | Removes ribosomal RNA for whole-transcriptome sequencing. | RNA library prep for comprehensive fusion screening [67]. |
| Stranded Poly-A Enrichment Kit | Selects polyadenylated RNA for sequencing. | RNA library prep focusing on mRNA [67]. |
| GeneWell Fusion Reference Standards | Spiked-in controls containing validated fusions. | Assess assay sensitivity, specificity, and limit of detection [34]. |
| Arriba & FusionCatcher | Bioinformatics tools for fusion detection from RNA-seq data. | Used in tandem for robust fusion calling in AML [9]. |
| Delly | Bioinformatics tool for calling structural variants from DNA-seq. | Identifies genomic breakpoints supporting fusions [36]. |
The discrepancy between DNA-seq and RNA-seq results in fusion gene detection is not a mere technical artifact but a multifaceted issue with biological and methodological roots. Evidence consistently shows that neither technology is infallible alone; DNA-seq can miss expressed fusions due to breakpoint location or design limitations, while RNA-seq can be confounded by low expression, poor sample quality, or complex alignment scenarios.
The most reliable path forward involves an integrated, complementary approach. Combining DNA and RNA analysis within a single assay or a reflexive testing algorithm maximizes detection sensitivity and specificity, minimizing false negatives and positives. Furthermore, employing robust bioinformatics pipelines with stringent filtering is essential for accurate data interpretation. As precision medicine continues to evolve, embracing these multi-modal diagnostic strategies will be crucial for ensuring all patients receive accurate diagnoses and benefit from the most effective targeted therapies.
Next-generation sequencing (NGS) has revolutionized cancer diagnostics, yet traditional approaches that analyze DNA and RNA separately present significant limitations in clinical practice. DNA sequencing alone struggles to reliably detect key oncogenic drivers such as gene fusions and exon-skipping events because breakpoints often occur within introns or repetitive regions that are challenging for hybridization-capture assays [68]. This diagnostic gap has clinical consequences, as these alterations are actionable targets for therapy. RNA sequencing provides direct evidence of fusion transcripts and aberrant splicing, offering a solution to these limitations [26]. However, implementing separate DNA and RNA assays requires more specimen material, increases costs, and prolongs turnaround times, all critical factors in clinical decision-making. This comparison guide examines the emerging solution: integrated DNA-RNA NGS assays that simultaneously capture multiple data types from a single workflow, offering a more comprehensive genomic profiling approach for clinical use.
Recent studies demonstrate that integrated DNA-RNA sequencing significantly outperforms DNA-only approaches in identifying clinically actionable alterations, particularly for fusion detection.
Table 1: Comparative Performance of Sequencing Approaches for Fusion Detection
| Metric | DNA-Only NGS | RNA-Only NGS | Combined DNA-RNA |
|---|---|---|---|
| Fusion Detection Sensitivity | Limited for novel/unexpected partners [68] | High for expressed fusions [26] | Highest; captures both known and novel [69] [26] |
| Exon-Skipping Detection | Challenging; indirect inference [68] | Direct detection via transcriptome [68] | Comprehensive DNA+RNA evidence [68] |
| Actionable Alteration Rate | ~80-90% of cases [70] | N/A (fusion-focused) | 98% of cases [69] |
| Novel Fusion Discovery | Limited by design | Possible but requires high expression [26] | Enhanced; captures partners missed by targeted panels [69] |
| Orthogonal Confirmation Needed | Often required for fusions [70] | Sometimes required | Reduced need due to combined evidence |
In a validation study of 2,230 clinical tumor samples, the combined assay improved fusion detection and enabled direct correlation of somatic alterations with gene expression. This approach uncovered clinically actionable alterations in 98% of cases and revealed complex genomic rearrangements that would likely have remained undetected with DNA-only testing [69]. For non-small cell lung cancer (NSCLC), a malignancy with numerous actionable fusions, one study found that approximately 10% of cases required reflex RNA sequencing to identify oncogenic drivers after initial DNA-based testing was negative [14].
Integrated assays undergo rigorous validation to ensure reliability across variant types. One study established comprehensive performance metrics using exome-wide somatic reference standards containing 3,042 single nucleotide variants (SNVs) and 47,466 copy number variations (CNVs) [69]. Concordance with orthogonal methods, however, varies considerably by alteration type and cancer type:
Table 2: Concordance Rates with Orthogonal Methods in Clinical Samples
| Alteration Type | Cancer Type | Sensitivity (%) | Specificity (%) | Notes |
|---|---|---|---|---|
| SNVs (e.g., KRAS) | Colorectal Cancer | 87.4 | 79.3 | DNA component performance [70] |
| Fusions (e.g., ALK) | NSCLC | 100 | 100 | RNA significantly enhances detection [70] |
| Fusions (e.g., ROS1) | NSCLC | 33.3 | N/A | Lower sensitivity for certain fusions with DNA-only [70] |
| Amplifications (ERBB2) | Breast Cancer | 53.7 | 99.4 | DNA-only; challenges with CNV calling [70] |
| Amplifications (ERBB2) | Gastric Cancer | 62.5 | 98.2 | DNA-only; tissue quality impacts performance [70] |
The detection threshold for variant calling significantly impacts assay performance. Studies recommend a minimum 2% variant allele frequency (VAF) threshold for optimal specificity in mutation detection, as specificity dramatically decreases below this level [71].
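VAF is the fraction of reads at a position that carry the variant allele. The helper below applies the 2% minimum described above; the read counts in the usage lines are hypothetical, and real variant callers compute VAF from quality-filtered reads rather than raw counts.

```python
def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele frequency: alt / (alt + ref), or 0.0 with no coverage."""
    depth = alt_reads + ref_reads
    return alt_reads / depth if depth else 0.0

def passes_vaf_threshold(alt_reads: int, ref_reads: int, min_vaf: float = 0.02) -> bool:
    """Keep a variant only if its VAF is at least `min_vaf` (2% by default)."""
    return vaf(alt_reads, ref_reads) >= min_vaf

print(passes_vaf_threshold(10, 490))  # VAF 2.0% -> True
print(passes_vaf_threshold(5, 495))   # VAF 1.0% -> False
```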
Several technical approaches exist for combining DNA and RNA sequencing:
The DNA/RNA co-hybrid capture sequencing (DRCC-Seq) approach is a notable example: it mixes pre-capture DNA and RNA libraries in defined proportions before a single capture reaction, yielding one sequencing library [68]. This method performs best with a 1:1 ratio of DNA to RNA probes, as higher RNA proportions degrade DNA data quality and copy number calling [68].
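How the pre-capture libraries are proportioned is not specified here. The sketch below assumes, hypothetically, that they are pooled by mass and computes the volume of each library needed for a chosen total mass and DNA fraction. Note that this is a library mass split, distinct from the 1:1 DNA:RNA probe ratio reported above; the concentrations and the 1,000 ng total are illustrative values.

```python
def pooling_volumes(dna_ng_per_ul: float, rna_ng_per_ul: float,
                    total_ng: float, dna_fraction: float = 0.5) -> tuple[float, float]:
    """Volumes (uL) of DNA and RNA libraries giving `total_ng` at a mass split.

    Hypothetical mass-based pooling; dna_fraction=0.5 is a 1:1 mass split.
    """
    if not 0.0 <= dna_fraction <= 1.0:
        raise ValueError("dna_fraction must lie between 0 and 1")
    if dna_ng_per_ul <= 0 or rna_ng_per_ul <= 0:
        raise ValueError("library concentrations must be positive")
    dna_ng = total_ng * dna_fraction
    rna_ng = total_ng - dna_ng
    return dna_ng / dna_ng_per_ul, rna_ng / rna_ng_per_ul

print(pooling_volumes(50.0, 25.0, total_ng=1000.0))  # (10.0, 20.0)
```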
Successful implementation requires meticulous attention to laboratory procedures:
Nucleic Acid Isolation: For fresh frozen solid tumors, the AllPrep DNA/RNA Mini Kit enables simultaneous isolation of both nucleic acids. For FFPE samples, dedicated kits like the AllPrep DNA/RNA FFPE Kit account for cross-linking and fragmentation [69].
Library Preparation: Input requirements typically range from 10 to 200 ng of extracted DNA or RNA. For RNA library construction from fresh frozen tissue, the TruSeq stranded mRNA kit is commonly used, while FFPE samples require specialized kits such as SureSelect XTHS2 to handle degraded material [69].
Sequencing and QC: Sequencing is typically performed on Illumina platforms such as the NovaSeq 6000. Quality control metrics include the percentage of bases at or above Q30 (>90%) and the passing filter (PF) rate (>80%). For RNA sequencing, the RNA integrity number (RIN) is crucial for assessing sample quality [69].
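The two run-level thresholds in this workflow can be computed directly from their definitions: %Q30 is the percentage of base calls with Phred quality of at least 30, and the PF rate is the percentage of clusters passing the instrument's chastity filter. The sketch assumes Phred+33-encoded quality strings and hypothetical cluster counts; in practice these metrics are reported by the sequencer's run-summary software.

```python
def percent_q30(quality_strings: list[str], offset: int = 33) -> float:
    """Percentage of base calls with Phred quality >= 30 (Phred+33 by default)."""
    total = 0
    q30 = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                q30 += 1
    return 100.0 * q30 / total if total else 0.0

def run_passes_qc(quality_strings: list[str], clusters_pf: int, clusters_raw: int,
                  min_q30: float = 90.0, min_pf: float = 80.0) -> bool:
    """Apply the >90% Q30 and >80% passing-filter thresholds."""
    pf_rate = 100.0 * clusters_pf / clusters_raw if clusters_raw else 0.0
    return percent_q30(quality_strings) > min_q30 and pf_rate > min_pf

quals = ["I" * 95 + "#" * 5]  # toy data: 'I' = Q40, '#' = Q2
print(percent_q30(quals), run_passes_qc(quals, clusters_pf=850, clusters_raw=1000))
```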
Table 3: Essential Research Reagent Solutions for Integrated NGS
| Reagent/Category | Specific Examples | Function in Workflow |
|---|---|---|
| Nucleic Acid Extraction | AllPrep DNA/RNA Mini Kit (Qiagen); Concert FFPE kits [69] [68] | Co-extraction of DNA and RNA while preserving integrity |
| Library Preparation | TruSeq stranded mRNA kit; SureSelect XTHS2; Rapid MaxDNA Lib Prep kit [69] [68] | Preparation of sequencing libraries from nucleic acid inputs |
| Hybridization Capture | SureSelect Human All Exon; Custom DNA+RNA probe panels [69] [68] | Target enrichment for exonic regions and fusion-related genes |
| Quality Control | Qubit Fluorometer; TapeStation; QIAxcel Advanced System [69] [68] | Quantification and quality assessment of inputs and libraries |
| Sequence Capture | Twist Fast Hybridization and Wash Kit; Custom probe panels [69] [68] | Target enrichment with optimized conditions for both DNA and RNA |
The computational pipeline for integrated assays requires specialized approaches:
Alignment: DNA sequencing data is typically mapped to the human genome (hg38) using BWA aligner, while RNA sequencing data uses STAR aligner for its handling of splice junctions [69].
Variant Calling: Somatic SNVs and indels are detected using optimized algorithms like Strelka2, with RNA-seq variant calling performed using tools such as Pisces [69].
Fusion Detection: A consensus approach requiring detection by multiple algorithms (e.g., STAR-Fusion and FusionCatcher) significantly reduces false positives [26].
Quality Control: Unique considerations include calculation of off-target rates, duplicate reads, and for RNA-seq, assessment of strand-specificity and DNA contamination [69].
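The off-target rate mentioned above is the fraction of reads that fall outside the capture targets. The sketch below assumes that reads are summarized by their alignment start position and that target intervals are non-overlapping, half-open, and supplied as hypothetical (chromosome, start, end) tuples; production pipelines compute this from BAM and BED files with dedicated tools.

```python
import bisect

def build_index(targets: list[tuple[str, int, int]]) -> dict[str, tuple[list[int], list[int]]]:
    """Group (chrom, start, end) intervals by chromosome, sorted by start."""
    index: dict[str, tuple[list[int], list[int]]] = {}
    for chrom, start, end in sorted(targets):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index

def in_target(index, chrom: str, pos: int) -> bool:
    """True if `pos` lies in a half-open target interval on `chrom`."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def off_target_rate(reads: list[tuple[str, int]], targets: list[tuple[str, int, int]]) -> float:
    """Fraction of (chrom, pos) reads that fall outside all target intervals."""
    if not reads:
        return 0.0
    index = build_index(targets)
    off = sum(not in_target(index, chrom, pos) for chrom, pos in reads)
    return off / len(reads)

targets = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]  # hypothetical
reads = [("chr1", 150), ("chr1", 250), ("chr1", 599), ("chr2", 10), ("chr3", 5)]
print(off_target_rate(reads, targets))  # 0.4
```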
Integrated DNA-RNA NGS Workflow
Rigorous validation of integrated assays follows a structured approach:
Reference Materials: Cell lines and synthetic standards with known mutations across varying tumor purities establish baseline performance. The 3,042 SNVs and 47,466 CNVs in the validated reference materials enable exome-wide analytical validation [69].
Orthogonal Confirmation: Comparison with established methods (FISH, RT-PCR, ddPCR) verifies results. One study showed 100% concordance for ALK fusions between NGS and orthogonal methods, though sensitivity for ROS1 fusions was lower (33.3%) with DNA-only approaches [70].
Clinical Utility Assessment: Implementation in real-world cohorts demonstrates practical value. In one clinical cohort, targeted RNA-seq increased the fusion diagnostic rate from 63% with conventional approaches (FISH, RT-PCR) to 76% [26].
The utility of combined DNA-RNA profiling extends across multiple malignancies:
Non-Small Cell Lung Cancer: Detection of targetable fusions in ALK, ROS1, RET, and NTRK is enhanced by RNA sequencing, with one study identifying these alterations in approximately 9% of reflex-tested cases [14].
Hematological Malignancies: Custom panels targeting 188 fusion-related genes, including immune receptor loci (TCR, IG), enable comprehensive profiling while simultaneously characterizing the immune repertoire [26].
Solid Tumors: Expanded panels covering 241 fusion-related genes identify both established and novel driver events in sarcoma, prostate, and other solid tumors [26].
Multi-Omic Data Enhances Clinical Utility
Integrated DNA-RNA sequencing represents a significant advancement over DNA-only approaches for comprehensive genomic profiling in clinical oncology. The combined approach enhances detection of actionable alterations, particularly gene fusions, while providing a more complete molecular portrait of tumors. The DRCC-Seq methodology offers a practical implementation strategy with optimized 1:1 DNA:RNA probe ratios, balancing data quality and comprehensive variant detection [68].
As the field advances, several developments will shape future implementations: Single-cell multi-omics technologies now enable simultaneous DNA and RNA profiling within individual cells, revealing clonal heterogeneity and genotype-phenotype relationships previously obscured by bulk sequencing [72]. Long-read sequencing technologies improve resolution of complex structural variants and repetitive regions, while advanced bioinformatics pipelines incorporating machine learning enhance variant interpretation [73].
For clinical laboratories adopting these approaches, establishing rigorous validation frameworks covering all variant types remains essential. As evidence accumulates demonstrating the clinical utility of integrated profiling, these comprehensive assays are poised to become the standard of care in precision oncology, ultimately improving patient outcomes through more accurate diagnosis and personalized treatment strategies.
Gene fusions are critical molecular drivers in cancer, with profound implications for diagnosis, prognosis, and therapeutic decision-making in oncology. The detection of these rearrangements has evolved significantly with the advent of next-generation sequencing (NGS) technologies, primarily through DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) approaches. Each method offers distinct advantages and limitations based on its underlying principles. DNA-seq identifies structural variants at the genomic level, while RNA-seq detects chimeric fusion transcripts expressed in the cell. Understanding the concordance and discordance between these platforms is essential for clinical laboratories and researchers aiming to implement robust testing protocols that maximize detection of clinically actionable alterations. This guide synthesizes evidence from recent real-world comparative studies to objectively evaluate the performance of RNA-seq versus DNA-seq for fusion gene detection across various cancer types and clinical scenarios.
Recent studies have employed rigorous head-to-head comparison designs to evaluate the performance of DNA-seq and RNA-seq for fusion detection. The most robust analyses use large, real-world patient cohorts with orthogonal validation to establish ground truth. A 2025 study (PMC12608001) compared a 108-gene targeted RNA-seq panel with optical genome mapping (OGM), a DNA-level method, in 467 acute leukemia cases, including 360 AML, 89 B-ALL, 12 T-ALL, and 6 MPAL cases [21]. Similarly, a 2025 study in Communications Medicine analyzed 2,230 clinical tumor samples using an integrated RNA and DNA exome assay, providing extensive data on fusion detection across diverse cancer types [69].
The technical protocols for nucleic acid extraction and library preparation significantly impact fusion detection capabilities. For DNA-seq, studies often use hybrid capture-based panels with extended intronic coverage to capture breakpoints in genes known to be involved in fusions. The FindDNAFusion pipeline, for instance, employs three software tools (JuLI, Factera, and GeneFuse) with a combinatorial approach to improve detection accuracy to 98.0% for DNA panels with intron-tiled bait probes [45]. For RNA-seq, the Archer FusionPlex Pan-Heme panel utilizes anchored multiplex PCR (AMP) technology, which employs gene-specific primers combined with universal adapters to capture both known and novel fusion partners without prior knowledge of the partner sequence [74].
Robust validation frameworks for integrated DNA-RNA sequencing assays typically involve three key steps: (1) analytical validation using custom reference samples containing known variants; (2) orthogonal testing in patient samples with established fusion status; and (3) assessment of clinical utility in real-world cases [69]. For instance, one validation approach used exome-wide somatic reference standards containing 3,042 SNVs and 47,466 CNVs, with multiple sequencing runs of cell lines at varying purities to establish sensitivity and specificity [69].
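The sketch below makes the sensitivity and specificity step concrete. It computes positive, negative, and overall percent agreement from a 2×2 comparison against an orthogonal reference, and adds a Wilson 95% confidence interval. The counts are hypothetical. The Wilson interval is one reasonable choice among several, and a given validation protocol may specify a different one.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fusion positive by both the assay and the reference method
    fp: int  # assay positive, reference negative
    fn: int  # assay negative, reference positive
    tn: int  # negative by both


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def agreement_metrics(c: Confusion) -> dict[str, float]:
    """Positive, negative and overall percent agreement, as fractions.

    PPA and NPA equal sensitivity and specificity only when the reference
    method is treated as ground truth.
    """
    return {
        "ppa": c.tp / (c.tp + c.fn),
        "npa": c.tn / (c.tn + c.fp),
        "opa": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
    }


# Hypothetical run: 48 reference-positive and 152 reference-negative samples.
run = Confusion(tp=45, fp=1, fn=3, tn=151)
metrics = agreement_metrics(run)
ppa_ci = wilson_interval(run.tp, run.tp + run.fn)
```

Reporting the interval alongside the point estimate shows how much a small positive set (here 48 samples) limits the precision of a sensitivity claim.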
Quality control metrics are particularly crucial for RNA-seq from clinical samples, especially formalin-fixed paraffin-embedded (FFPE) tissue, where RNA degradation can impact results. Studies have shown that while FFPE samples yield shorter RNA fragments, the detection of fusion transcripts does not significantly differ between freshly frozen and FFPE samples when using appropriate library preparation methods and bioinformatic filters [75]. Standard QC metrics for RNA-seq include ribosomal RNA contamination assessment, unique mapping rates, and expression correlation between replicate samples.
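The three QC metrics named above can be computed from read counts roughly as follows. This is a minimal sketch. The thresholds (at most 10% rRNA, at least 70% unique mapping, r of at least 0.90 between replicates) and the toy gene counts are hypothetical placeholders, not values from the cited studies.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant expression")
    return sxy / math.sqrt(sxx * syy)


def rnaseq_qc(total_reads: int, rrna_reads: int, unique_reads: int,
              rep1_counts: list[int], rep2_counts: list[int],
              max_rrna: float = 0.10, min_unique: float = 0.70,
              min_rep_r: float = 0.90) -> dict:
    """rRNA fraction, unique mapping rate and replicate correlation.

    Replicate correlation is computed on per-gene log2(count + 1) values.
    The default thresholds are hypothetical.
    """
    rep1_log = [math.log2(c + 1) for c in rep1_counts]
    rep2_log = [math.log2(c + 1) for c in rep2_counts]
    qc = {
        "rrna_fraction": rrna_reads / total_reads,
        "unique_mapping_rate": unique_reads / total_reads,
        "replicate_r": pearson(rep1_log, rep2_log),
    }
    qc["pass"] = (qc["rrna_fraction"] <= max_rrna
                  and qc["unique_mapping_rate"] >= min_unique
                  and qc["replicate_r"] >= min_rep_r)
    return qc


# Hypothetical library: 10 M reads, 80% uniquely mapped, five genes per replicate.
good = rnaseq_qc(10_000_000, 500_000, 8_000_000,
                 [100, 200, 50, 0, 1000], [110, 190, 55, 1, 950])
high_rrna = rnaseq_qc(10_000_000, 2_000_000, 8_000_000,
                      [100, 200, 50, 0, 1000], [110, 190, 55, 1, 950])
```

Log-transforming counts before correlating keeps a few highly expressed genes from dominating the replicate comparison.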
Table 1: Key Methodological Approaches in DNA-seq vs. RNA-seq Comparison Studies
| Study | Cohort Size & Cancer Type | DNA-Level Method | RNA-seq Method | Orthogonal Validation |
|---|---|---|---|---|
| PMC12608001 (2025) | 467 acute leukemia cases | Optical Genome Mapping | 108-gene AMP-based targeted panel | FISH, RT-PCR, clinical follow-up |
| Communications Medicine (2025) | 2,230 clinical tumor samples | Whole Exome Sequencing | Whole Transcriptome Sequencing | Orthogonal panels, reference standards |
| Frontiers in Molecular Biosciences (2025) | 29 colorectal cancer patients | - | STAR-Fusion on FFPE vs. fresh frozen | Multiple database annotation |
| Cancers (2024) | 264 leukemia patients | Karyotyping | 199-gene Archer FusionPlex | RT-PCR, mRNA sequencing |
Comprehensive comparative studies in acute leukemia reveal distinctive patterns of concordance between DNA- and RNA-based detection methods. A 2025 analysis of 467 acute leukemia cases demonstrated an overall concordance rate of 88.1% between targeted RNA-seq and optical genome mapping [21]. However, concordance varied substantially across leukemia subtypes, falling to 80.2% in B-ALL and to 41.7% in T-ALL, which highlights the impact of disease-specific biology on methodological performance [21].
The distribution of uniquely detected rearrangements further illuminates the complementary nature of these technologies. Among 234 clinically relevant events, OGM uniquely identified 37 (15.8%), while RNA-seq exclusively detected 22 (9.4%) [21]. This disparity stems from fundamental biological differences: RNA-seq effectively identifies expressed chimeric fusions, while OGM excels at detecting cryptic, enhancer-driven events that may not generate fusion transcripts. Enhancer-hijacking lesions involving genes such as MECOM, BCL11B, and IGH showed particularly poor concordance (20.6%) compared to all other aberrations (93.1%) [21].
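The counts above can be reconciled with a short calculation. The sketch assumes every one of the 234 events was detected by at least one platform, so the concordant count is obtained by subtraction. This event-level share need not equal the 88.1% overall concordance, which may be computed on a different denominator.

```python
def concordance_breakdown(total: int, dna_only: int, rna_only: int) -> dict[str, float]:
    """Split events into shared and platform-unique detections.

    Assumes every event was seen by at least one platform, so events seen
    by both = total - dna_only - rna_only.
    """
    both = total - dna_only - rna_only
    if both < 0:
        raise ValueError("unique counts exceed the total")

    def pct(k: int) -> float:
        return round(100 * k / total, 1)

    return {
        "both": both,
        "both_pct": pct(both),
        "dna_only_pct": pct(dna_only),
        "rna_only_pct": pct(rna_only),
        "unique_pct": pct(dna_only + rna_only),
    }


# Reported leukemia counts [21]: 37 OGM-only and 22 RNA-seq-only of 234 events.
leukemia = concordance_breakdown(total=234, dna_only=37, rna_only=22)
```

On these counts, 175 events (74.8%) are shared. Together, the unique detections make up 25.2% of clinically relevant events.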
A separate 2024 study of 264 leukemia patients validated targeted RNA-seq against conventional karyotyping and RT-PCR, demonstrating 100% concordance with RT-PCR but only 83.3% concordance with karyotyping [74]. Notably, targeted RNA-seq identified 29 fusion events missed by karyotyping, while 5 cases initially called positive by karyotyping showed no pathogenic rearrangements upon confirmatory testing with mRNA sequencing [74].
In non-small cell lung cancer (NSCLC), RET fusions have been a particular focus of methodological comparisons. A 2025 retrospective study of 40 RET+ NSCLC patients found a 92.3% concordance between DNA-seq and RNA-seq, with RNA-seq identifying five additional RET+ cases missed by DNA-seq [36]. The study employed a 425-gene DNA panel with breakpoint analysis and compared it against both whole-transcriptome sequencing and targeted RNA-seq, revealing the enhanced sensitivity of targeted RNA approaches [36].
The clinical utility of reflexive testing algorithms is evident in real-world practice. One study of 1,211 NSCLC specimens implemented a testing algorithm using amplicon-based DNA/RNA sequencing followed by reflex hybridization-capture-based RNA sequencing if initial testing was negative [14]. Among the 120 cases (approximately 10%) that underwent reflex testing, 9 oncogenic fusions were identified, including clinically actionable alterations in ALK, BRAF, NRG1, NTRK3, ROS1, and RET. None of these had been detected by the initial amplicon-based assay [14].
Analysis of the AACR Project GENIE database, encompassing 20,900 NSCLC cases, revealed that of 1,081 fusion-positive cases, 893 (82.6%) could theoretically be detected by amplicon-based assays. The remaining 188 (17.4%) would require more comprehensive approaches [14].
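The yield figures from this cohort reduce to simple proportions. The sketch below recomputes them from the reported counts. "Tests per additional fusion" is a derived figure added here for illustration; the study does not report it.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Reflex NSCLC cohort [14]
total_specimens = 1211
reflex_tested = 120
reflex_fusions = 9

reflex_rate = pct(reflex_tested, total_specimens)            # share of specimens reflexed
reflex_yield = pct(reflex_fusions, reflex_tested)            # fusions found per reflex test
tests_per_fusion = round(reflex_tested / reflex_fusions, 1)  # derived, not reported

# AACR Project GENIE NSCLC fusion-positive cases [14]
genie_fusion_positive = 1081
amplicon_detectable = 893
amplicon_share = pct(amplicon_detectable, genie_fusion_positive)
needs_broader_assay = genie_fusion_positive - amplicon_detectable
needs_broader_share = pct(needs_broader_assay, genie_fusion_positive)
```

On these counts, about 9.9% of specimens were reflexed. The reflex assay found a fusion in 7.5% of them, or roughly one fusion per 13 reflex tests.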
Table 2: Concordance Rates Between DNA-seq and RNA-seq Across Cancer Types
| Cancer Type | Overall Concordance Rate | DNA-Level Unique Detection | RNA-Seq Unique Detection | Key Discordant Fusion Types |
|---|---|---|---|---|
| Acute Leukemia (all types) | 88.1% [21] | 15.8% of events (OGM) [21] | 9.4% of events [21] | Enhancer-hijacking lesions (MECOM, BCL11B, IGH) |
| B-ALL | 80.2% [21] | - | - | - |
| T-ALL | 41.7% [21] | - | - | - |
| NSCLC (RET fusions) | 92.3% [36] | - | 5 additional cases by RNA-seq [36] | Noncanonical RET partners |
| Colorectal Cancer | Not assessed (RNA-seq only); fusion detection did not differ significantly between FFPE and fresh frozen samples [75] | - | - | - |
The biological nature of genomic rearrangements fundamentally impacts their detection by DNA-seq versus RNA-seq. Enhancer hijacking events represent a key category prone to discordant detection. These lesions reposition enhancer elements to drive oncogene expression without generating fusion transcripts, making them detectable by DNA-based methods but largely invisible to RNA-seq [21]. In acute leukemia, this explains why rearrangements involving MECOM and BCL11B show particularly low concordance between platforms [21].
Conversely, RNA-seq slightly outperforms DNA-based methods for fusions arising from intrachromosomal deletions that are sometimes labeled by OGM as simple deletions rather than rearrangements [21]. The expression level of fusion transcripts also critically impacts detectability by RNA-seq. Low-expression fusions may fall below the detection threshold of RNA-seq assays, while DNA-based methods remain unaffected by transcriptional activity [36].
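A simple counting model shows why expression level matters for RNA-seq but not for DNA-level assays. The sketch rests on three assumptions:

- Fusion-supporting reads follow a Poisson distribution.
- The Poisson mean scales with sequencing depth, fusion expression, and tumor RNA content.
- A caller needs a minimum number of supporting reads to make a call. Three is used here as a hypothetical threshold.

The model ignores mappability, capture bias, and caller-specific filters.

```python
import math


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complement of the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    lower_tail = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        lower_tail += term
    return max(0.0, 1.0 - lower_tail)


def detection_probability(total_reads: float, fusion_read_fraction: float,
                          tumor_fraction: float, min_support: int = 3) -> float:
    """Chance of observing at least `min_support` fusion-supporting reads.

    fusion_read_fraction: share of reads from a pure tumor sample that would
        support the fusion junction (hypothetical; tracks expression level).
    tumor_fraction: share of the sample's RNA contributed by tumor cells.
    """
    expected_reads = total_reads * fusion_read_fraction * tumor_fraction
    return prob_at_least(min_support, expected_reads)


# Same depth and tumor content; only fusion expression differs (hypothetical values).
high_expr = detection_probability(20e6, 1e-6, 0.5)  # about 10 expected reads
low_expr = detection_probability(20e6, 1e-7, 0.5)   # about 1 expected read
```

A tenfold drop in expression takes detection from near-certain to under 10% at the same depth. A DNA-level assay's evidence for the breakpoint does not depend on that expression term.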
Technical artifacts also contribute to discordance. DNA-seq may identify rearrangements that involve non-expressed genes or create out-of-frame junctions, so that no functional fusion transcript is produced [39]. Similarly, RNA-seq can detect trans-splicing events or read-through transcripts that do not correspond to actual genomic rearrangements [75]. These biological and technical factors necessitate careful interpretation of discordant results between platforms.
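A reading-frame check can triage whether a DNA-level call could yield a functional chimeric transcript. The sketch below works under simplifying assumptions:

- Coding lengths are supplied for each partner, for example the summed CDS lengths of the retained and lost exons.
- The 5' partner's expression status is known.
- The three output labels are hypothetical categories, not a standard nomenclature.

```python
def is_in_frame(five_prime_coding_nt: int, three_prime_skipped_coding_nt: int) -> bool:
    """True if the 3' partner's reading frame is preserved across the junction.

    five_prime_coding_nt: coding nucleotides of the 5' partner retained
        upstream of the junction.
    three_prime_skipped_coding_nt: coding nucleotides of the 3' partner lying
        upstream of the junction, and therefore lost.
    The frames match when both counts leave the same codon phase.
    """
    return (five_prime_coding_nt - three_prime_skipped_coding_nt) % 3 == 0


def predicted_consequence(five_prime_expressed: bool, five_prime_coding_nt: int,
                          three_prime_skipped_coding_nt: int) -> str:
    """Coarse triage label for a DNA-level fusion call (hypothetical categories)."""
    if not five_prime_expressed:
        # The 5' partner usually supplies the promoter; if it is silent,
        # no chimeric transcript is expected for RNA-seq to detect.
        return "no transcript expected"
    if is_in_frame(five_prime_coding_nt, three_prime_skipped_coding_nt):
        return "in-frame"
    return "out-of-frame"
```

For example, retaining 300 coding nucleotides from the 5' partner and losing 150 from the 3' partner keeps the frame. Retaining 301 does not.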
Sample quality profoundly impacts method performance, particularly for RNA-seq. FFPE-derived RNA is often degraded, potentially affecting fusion detection sensitivity [75]. However, a 2025 study comparing matched FFPE and freshly frozen colorectal cancer samples found no statistically significant difference in fusion detection rates when using appropriate library preparation methods optimized for degraded RNA [75].
DNA-seq panels vary significantly in their coverage of intronic regions where breakpoints occur. Panels without comprehensive intron tiling may miss rearrangements in key genes, while those with extended intronic coverage improve detection but increase sequencing costs and data analysis complexity [45]. The FindDNAFusion study demonstrated that a combinatorial bioinformatics approach applied to DNA panels with intron-tiled bait probes could achieve 98.0% accuracy [45].
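The combinatorial idea can be illustrated with a simple vote across callers. The tool names come from the text, but several choices here are hypothetical: the record fields, the 10 bp breakpoint tolerance, and the two-of-three rule. This is not FindDNAFusion's published logic, and it does not parse the tools' real output formats.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    caller: str  # e.g. "JuLI", "Factera", "GeneFuse"
    gene5: str   # 5' partner
    gene3: str   # 3' partner
    pos5: int    # breakpoint coordinate in the 5' partner
    pos3: int    # breakpoint coordinate in the 3' partner


def _close(a: Call, b: Call, tolerance: int) -> bool:
    return abs(a.pos5 - b.pos5) <= tolerance and abs(a.pos3 - b.pos3) <= tolerance


def consensus(calls: list[Call], min_callers: int = 2,
              tolerance: int = 10) -> list[tuple[str, str, tuple[str, ...]]]:
    """Return (gene5, gene3, supporting callers) for gene pairs passing the vote.

    Calls are grouped by gene pair. Within a group, each call is tried as an
    anchor, and the largest set of distinct callers whose breakpoints lie
    within `tolerance` bp of it is taken as the support.
    """
    groups: dict[tuple[str, str], list[Call]] = defaultdict(list)
    for call in calls:
        groups[(call.gene5, call.gene3)].append(call)

    kept = []
    for (gene5, gene3), group in groups.items():
        support = max(
            ({c.caller for c in group if _close(c, anchor, tolerance)} for anchor in group),
            key=len,
        )
        if len(support) >= min_callers:
            kept.append((gene5, gene3, tuple(sorted(support))))
    return kept


# Hypothetical coordinates for illustration only.
calls = [
    Call("JuLI", "EML4", "ALK", 1_000_100, 2_000_200),
    Call("Factera", "EML4", "ALK", 1_000_105, 2_000_198),
    Call("GeneFuse", "EML4", "ALK", 1_005_000, 2_000_200),  # breakpoint disagrees
    Call("GeneFuse", "KIF5B", "RET", 3_000_000, 4_000_000),  # single-caller call
]
result = consensus(calls)
```

Requiring agreement on breakpoint position, not just on gene pair, keeps two callers from "agreeing" on different rearrangements that happen to involve the same genes.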
RNA-seq chemistry also influences detection capabilities. Amplicon-based approaches offer high sensitivity for known fusions but may miss novel partners, while hybridization-capture methods provide more comprehensive coverage but require higher RNA input and are more susceptible to degradation effects [14]. The choice of bioinformatics pipelines substantially impacts both false-positive and false-negative rates across both platforms [39].
The choice between DNA-seq and RNA-seq for fusion detection has direct implications for therapeutic decision-making. In NSCLC, the identification of RET fusions dictates eligibility for RET inhibitors such as selpercatinib and pralsetinib [36]. Studies show that targeted RNA-seq can identify additional RET+ cases missed by DNA-seq alone, potentially expanding the population eligible for these targeted therapies [36].
In leukemia, the detection of specific fusions directly influences risk stratification and treatment selection. For example, KMT2A rearrangements in AML warrant more intensive induction regimens and allogeneic transplantation in first remission, while RUNX1::RUNX1T1 fusions may respond to chemotherapy alone [74]. The superior detection of enhancer-hijacking events by DNA-based methods ensures appropriate risk assignment for patients who might otherwise be misclassified [21].
Comprehensive fusion detection also facilitates identification of rare or novel fusion events with clinical relevance. In one study of colorectal cancer, RNA-seq identified a potentially actionable LRRFIP2::ALK fusion not previously described in this cancer type, with an intact tyrosine kinase domain that could be targeted by ALK inhibitors [75]. Such findings underscore the therapeutic opportunities enabled by thorough fusion profiling.
Based on concordance data from real-world studies, integrated testing approaches maximize clinical sensitivity for fusion detection. Sequential testing algorithms, beginning with DNA-based panels followed by RNA-seq for negative cases, provide a cost-effective strategy for comprehensive profiling [14]. In one such algorithm, reflex hybridization-capture RNA-seq identified 9 additional oncogenic fusions in 120 NSCLC cases that had been negative on the initial amplicon-based DNA/RNA assay [14].
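A reflex algorithm of this kind can be expressed as a small decision function. The sketch encodes the rule described in this review: reflex to RNA-seq when no driver is found, or when a fusion is still suspected. The field names and the handling of insufficient RNA are hypothetical additions.

```python
from dataclasses import dataclass


@dataclass
class InitialResult:
    driver_found: bool           # actionable driver reported by the first-line assay
    high_fusion_suspicion: bool  # hypothetical flag, e.g. clinicopathologic features
    rna_sufficient: bool         # enough RNA left for a second assay


REPORT = "report initial result"
REFLEX = "reflex to hybridization-capture RNA-seq"
INSUFFICIENT = "report; RNA insufficient for reflex testing"


def next_step(result: InitialResult) -> str:
    """Reflex when no driver was found or a fusion is still strongly suspected."""
    needs_reflex = (not result.driver_found) or result.high_fusion_suspicion
    if not needs_reflex:
        return REPORT
    if not result.rna_sufficient:
        return INSUFFICIENT
    return REFLEX
```

Keeping the rule in one function makes the branch taken for each specimen easy to audit when reviewing reflex rates.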
For clinical scenarios where tissue is limited or rapid turnaround is essential, targeted RNA-seq panels offer a practical solution with high sensitivity for therapeutically relevant fusions. The 199-gene Archer FusionPlex panel demonstrated 100% concordance with RT-PCR in leukemia samples while identifying novel fusions such as RUNX1::DOPEY2, RUNX1::MACROD2, and ZCCHC7::LRP1B [74].
Parallel DNA and RNA testing from a single specimen provides the most comprehensive approach, particularly for cancers with diverse fusion mechanisms and partners. The 2025 study in Communications Medicine validated a combined RNA and DNA exome assay across 2,230 tumors, demonstrating improved detection of actionable alterations in 98% of cases [69]. This integrated approach enabled direct correlation of somatic alterations with gene expression and revealed complex genomic rearrangements that would likely have remained undetected with either method alone [69].
Table 3: Key Research Reagent Solutions for Fusion Detection Studies
| Reagent/Tool Category | Specific Examples | Function in Fusion Detection |
|---|---|---|
| DNA-seq Panels | GeneseeqPrime 425-gene panel [36] | Comprehensive genomic breakpoint detection with extended intronic coverage |
| RNA-seq Panels | Archer FusionPlex Pan-Heme (199 genes) [74] | Targeted detection of fusion transcripts via anchored multiplex PCR |
| Library Prep Kits | TruSeq stranded mRNA kit [69], KAPA RNA Hyper with rRNA Erase [75] | Library construction from RNA, ribosomal RNA depletion |
| Bioinformatics Tools | STAR-Fusion [75], FindDNAFusion [45], Archer Analysis [74] | Fusion transcript identification, genomic breakpoint calling |
| Reference Standards | GeneWell fusion reference standards [34] | Analytical validation, limit of detection studies |
Comparative analyses across diverse cancer types consistently demonstrate that DNA-seq and RNA-seq provide complementary rather than redundant information for fusion gene detection. The 88.1% overall concordance rate observed in a large acute leukemia cohort, with platform-specific unique detections accounting for approximately 25% of clinically relevant events, underscores the limitations of relying on a single methodology [21]. Biological mechanisms, particularly enhancer-hijacking events that do not generate fusion transcripts, fundamentally drive these discordances and necessitate DNA-based detection approaches [21].
For clinical laboratories and research institutions, the evidence supports integrated testing algorithms that combine DNA and RNA analysis to maximize detection of actionable fusions. Reflexive testing pathways, beginning with comprehensive DNA panels followed by targeted RNA-seq for negative cases, provide a practical balance between cost-effectiveness and sensitivity [14]. For precision oncology initiatives where tissue is limited or comprehensive profiling is prioritized, parallel DNA and RNA sequencing from a single sample offers the most complete characterization of the fusion landscape [69]. As therapeutic options targeting gene fusions continue to expand, ensuring their reliable detection through multimodal approaches becomes increasingly critical for optimizing patient outcomes across the spectrum of hematologic and solid tumor malignancies.
The precise detection of genomic rearrangements, particularly fusion genes, is a critical component of cancer diagnosis, prognosis, and therapeutic decision-making. The establishment of rigorous analytical sensitivity and specificity parameters forms the foundation of any reliable clinical detection assay. This guide provides a systematic comparison of two prominent technological approaches for fusion detection: targeted RNA sequencing (RNA-seq) and optical genome mapping (OGM) as a DNA-level method. As multi-modal testing becomes increasingly common in diagnostic settings, understanding the performance characteristics, limitations, and complementary strengths of these platforms is essential for researchers, clinical laboratories, and drug development professionals navigating the complex landscape of genomic structural variant detection [21].
A comprehensive 2025 study directly compared a 108-gene targeted RNA-seq panel with OGM across 467 acute leukemia cases, providing robust performance data in a clinical context [21]. The findings demonstrate distinct and complementary strengths for each technology.
Table 1: Overall Performance Metrics in Acute Leukemia (n=467 cases)
| Performance Metric | Targeted RNA-seq | Optical Genome Mapping (OGM) | Combined Approach |
|---|---|---|---|
| Overall Concordance | 88.1% with OGM | 88.1% with RNA-seq | - |
| Unique Detection of Clinically Relevant Rearrangements | 22/234 (9.4%) | 37/234 (15.8%) | 59/234 (25.2%) |
| Concordance in T-ALL | 41.7% with OGM | 41.7% with RNA-seq | - |
| Enhancer-Hijacking Lesions (e.g., MECOM, BCL11B) | Poorly detected (20.6% concordance with OGM) | Effectively detected | - |
| Fusions from Intrachromosomal Deletions | Effectively detected | Sometimes called as simple deletions | - |
The data reveal that while overall concordance is high, each method uniquely contributes to the diagnostic yield. OGM demonstrated a significant advantage in detecting cryptic, enhancer-driven rearrangements that often evade RNA-based detection. Conversely, targeted RNA-seq showed slightly superior performance for fusions arising from intrachromosomal deletions, which OGM sometimes misclassified as simple deletions [21]. This underscores the principle that the choice of platform profoundly influences the spectrum of detectable alterations.
Table 2: Assay Performance Across Leukemia Subtypes
| Leukemia Type | Cases (n) | Tier 1 Aberration Detection Rate | Key Fusion Examples |
|---|---|---|---|
| Acute Myeloid Leukemia (AML) | 360 | 23.9% | KMT2A, MECOM |
| B-Acute Lymphoblastic Leukemia (B-ALL) | 89 | 60.7% | BCR::ABL1 |
| T-Acute Lymphoblastic Leukemia (T-ALL) | 12 | Not reported | BCL11B |
| Mixed Phenotype Acute Leukemia (MPAL) | 6 | Not reported | - |
Determining the Limit of Detection (LoD) is a critical step in assay validation. The following sections detail common experimental approaches used to establish analytical sensitivity for both RNA-seq and DNA-based methods.
For RNA-seq fusion detection, a standard protocol for determining LoD involves serial dilution experiments. In one validation study, RNA from the H2228 cell line (which harbors a known EML4-ALK fusion) was diluted into fusion-negative background RNA. The results demonstrated reliable fusion detection down to a 10% variant allele frequency, establishing the assay's LoD at this level [27].
For even more precise quantification, studies use synthetic spike-in controls. One approach spiked in silico-generated fusion transcripts into RNA-seq data from non-malignant cells (H1 human embryonic stem cells) at nine different expression levels, ranging from five- to 200-fold [7]. Another spiked synthetic RNA molecules mimicking oncogenic fusions into RNA libraries at 10 different input amounts (from 10^-8.57 to 10^-3.47 pmol) across 20 replicates [7]. A separate study spiked "fusion sequins" (synthetic spike-in controls for fusion genes) into cell line RNA, achieving 50% detection at 2 pM input and 100% detection across a dynamic range of 8 pM to 31 nM [26].
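Replicate outcomes from such series are usually summarized as a detection (hit) rate per input level. The sketch below makes two estimates:

- **LoD:** the lowest level at which at least 95% of replicates are called positive, and at least 95% are also positive at every higher level. This is a common convention, assumed here.
- **50%-detection level:** estimated by interpolation on a log scale between the two levels that bracket 50%.

The replicate outcomes are invented for illustration.

```python
import math
from typing import Optional


def hit_rates(results: dict[float, list[bool]]) -> dict[float, float]:
    """Fraction of replicates in which the fusion was called, per input level."""
    return {level: sum(calls) / len(calls) for level, calls in results.items()}


def lod_by_hit_rate(results: dict[float, list[bool]],
                    threshold: float = 0.95) -> Optional[float]:
    """Lowest level whose hit rate, and that of every higher level, meets
    `threshold`; None if even the highest level fails."""
    rates = hit_rates(results)
    lod = None
    for level in sorted(rates, reverse=True):
        if rates[level] < threshold:
            break
        lod = level
    return lod


def level_at_50_percent(results: dict[float, list[bool]]) -> Optional[float]:
    """Input level giving 50% detection, interpolated on a log10 scale between
    the two levels that bracket 0.5; None if 0.5 is not bracketed."""
    rates = hit_rates(results)
    levels = sorted(rates)
    for low, high in zip(levels, levels[1:]):
        r_low, r_high = rates[low], rates[high]
        if r_low < 0.5 <= r_high:
            frac = (0.5 - r_low) / (r_high - r_low)
            log_level = math.log10(low) + frac * (math.log10(high) - math.log10(low))
            return 10 ** log_level
    return None


# Hypothetical 20-replicate series, input in pM; outcomes are invented.
series = {
    0.5: [True] * 2 + [False] * 18,
    2.0: [True] * 10 + [False] * 10,
    8.0: [True] * 20,
    31.0: [True] * 20,
}
```

With these invented outcomes, the hit-rate LoD is 8 pM and the 50%-detection level is about 2 pM.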
Although the studies summarized here focus mainly on RNA-seq validation, OGM and other DNA-based methods require equally rigorous LoD studies. These typically involve diluting DNA from cell lines with known structural variants into wild-type DNA, followed by statistical analysis to determine the lowest variant allele fraction that can be detected with high confidence. The HemoTargets and hg38 primary-transcript feature files are commonly used in OGM data analysis [21].
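When planning such a dilution series, the expected variant allele fraction at each step follows from the mixing ratio. The sketch assumes a heterozygous structural variant (one of two alleles) and equal ploidy in the cell line and the wild-type DNA. Copy-number changes at the locus would alter these values.

```python
def expected_vaf(cell_line_fraction: float, sv_alleles: int = 1, ploidy: int = 2) -> float:
    """Expected variant allele fraction after mixing SV-positive cell-line DNA
    into wild-type DNA.

    Assumes equal ploidy at the locus in both DNA sources and that the cell
    line carries the SV on `sv_alleles` of its `ploidy` alleles
    (1 of 2 = heterozygous).
    """
    if not 0.0 <= cell_line_fraction <= 1.0:
        raise ValueError("cell_line_fraction must be between 0 and 1")
    return cell_line_fraction * sv_alleles / ploidy


def dilution_series(start: float = 1.0, fold: float = 2.0,
                    steps: int = 6) -> list[tuple[float, float]]:
    """(cell-line fraction, expected VAF) pairs for a serial dilution."""
    return [(start / fold**i, expected_vaf(start / fold**i)) for i in range(steps)]


plan = dilution_series()  # twofold steps from undiluted cell-line DNA
```

Observed calls at each step can then be compared with these expected fractions to locate the lowest level detected reliably.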
The performance differences between RNA-seq and OGM stem from their fundamental technological principles.
Hybridization capture-based targeted RNA-seq uses biotinylated oligonucleotide probes to enrich for transcripts of interest prior to sequencing. AMP-based panels, such as the one used in the leukemia comparison, instead enrich with gene-specific primers [21]. Capture enrichment substantially increases coverage of targeted genes, achieving up to 93% on-target reads and a 33- to 59-fold enrichment compared with standard RNA-seq [26]. This enhanced coverage directly improves sensitivity for lowly expressed fusion transcripts.
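One way to express fold enrichment is as the ratio of on-target read fractions with and without capture. That ratio translates directly into reads available per targeted gene. In the sketch, only the 93% on-target figure comes from the text. The 2% whole-transcriptome baseline, the 10 million reads, and the 500-gene panel size are hypothetical.

```python
def fold_enrichment(on_target_capture: float, on_target_baseline: float) -> float:
    """Fraction of reads on targeted genes after capture, divided by the
    fraction on the same genes in an unenriched (whole-transcriptome) library."""
    return on_target_capture / on_target_baseline


def reads_per_target(total_reads: int, on_target_fraction: float, n_targets: int) -> float:
    """Mean usable reads per targeted gene (ignores expression differences)."""
    return total_reads * on_target_fraction / n_targets


enrichment = fold_enrichment(0.93, 0.02)                     # hypothetical baseline
per_gene_capture = reads_per_target(10_000_000, 0.93, 500)   # hypothetical panel
per_gene_baseline = reads_per_target(10_000_000, 0.02, 500)
```

At equal sequencing depth, the extra reads per targeted gene are what let a lowly expressed fusion cross a caller's minimum-support threshold.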
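Key Strengths: high sensitivity for expressed fusion transcripts, including lowly expressed ones when coverage is enriched; precise resolution of the fusion junction and partner; effective detection of fusions arising from intrachromosomal deletions [21].
Key Limitations: cannot detect rearrangements that do not produce fusion transcripts, such as enhancer-hijacking lesions involving MECOM or BCL11B (20.6% concordance with OGM); performance depends on RNA quality and transcript expression level [21].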
OGM operates at the DNA level, using ultra-high-molecular-weight DNA that is labeled at specific enzyme recognition sites and imaged to create a genome-wide physical map.
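Key Strengths: genome-wide surveillance that is agnostic to transcriptional activity; effective detection of cryptic, enhancer-driven rearrangements and other non-transcriptional structural variants [21].
Key Limitations: requires ultra-high-molecular-weight DNA; fusions arising from intrachromosomal deletions may be called as simple deletions rather than rearrangements [21].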
Technological Synergy in Fusion Detection
Successful implementation of a fusion detection assay requires careful selection of reagents and computational tools.
Table 3: Research Reagent Solutions for Fusion Detection Assays
| Category | Specific Tool / Reagent | Function in Assay |
|---|---|---|
| Target Enrichment | TruSight RNA Pan-Cancer Panel (Illumina) [76] | Targeted capture of 1,385 cancer-related genes for RNA-seq |
| Target Enrichment | Anchored Multiplex PCR (AMP) [21] | Target enrichment method for targeted RNA-seq panels |
| Bioinformatic Pipelines | STAR-Fusion [7] [76] | Algorithm for fusion detection from RNA-seq data |
| Bioinformatic Pipelines | FusionCatcher [7] [26] | Fusion detection algorithm for RNA-seq data |
| Bioinformatic Pipelines | Arriba [7] [76] | High-sensitivity fusion detection algorithm for RNA-seq data |
| Validation Tools | Fusion Sequins [26] | Synthetic spike-in RNA standards for quantification |
| Validation Tools | ERCC RNA Spike-In Controls [26] | External RNA controls for quality assessment |
| Analysis Software | Bionano Access & VIA Software [21] | OGM data analysis and visualization |
| Analysis Software | Archer Analysis [21] | Software for variant calling in AMP-based sequencing |
The establishment of analytical sensitivity and specificity is not merely a regulatory requirement but a fundamental scientific practice that directly impacts patient care. Targeted RNA-seq excels where functional transcript detection is paramount, offering high sensitivity and precise junction resolution for expressed fusions. In contrast, OGM provides a genome-wide surveillance capability that is agnostic to transcriptional activity, making it indispensable for detecting enhancer hijacking and other non-transcriptional structural variants. Rather than viewing these technologies as competitive, the most comprehensive approach for fusion detection in clinical and research settings involves their strategic integration. The synergistic use of both RNA and DNA-level analyses maximizes detection sensitivity and ensures that clinically significant structural variants are not overlooked, ultimately advancing the goals of precision oncology.
Oncogenic gene fusions are critical drivers in numerous cancers, with profound implications for diagnosis, prognosis, and targeted therapy selection. The accurate detection of these hybrid genes, formed through chromosomal rearrangements like translocations, deletions, and inversions, is therefore paramount in clinical oncology and research [3]. For years, the question has persisted: what is the optimal molecular method for fusion detection? Next-generation sequencing (NGS) technologies have emerged as powerful tools, but they are primarily split into two approaches: DNA sequencing (DNA-seq), which interrogates the genome for structural variants, and RNA sequencing (RNA-seq), which captures expressed fusion transcripts. While each method has its strengths, relying on either one alone can lead to missed detections. This case study objectively compares the performance of these platforms and demonstrates, through experimental data, that an integrated DNA and RNA sequencing approach provides the most comprehensive detection of clinically relevant gene fusions, ultimately enhancing patient stratification for targeted therapies.
Head-to-head comparisons in clinical cohorts reveal that DNA-seq and RNA-seq have complementary detection capabilities, with neither platform identifying all fusion events on its own. The following tables summarize key performance metrics from recent studies.
Table 1: Comparative Detection Rates in Acute Leukemia (467 cases) [21]
| Detection Category | Clinically Relevant Rearrangements | Percentage of Total | Notable Strengths |
|---|---|---|---|
| Detected only by OGM (DNA-level) | 37 / 234 | 15.8% | Superior for enhancer-hijacking lesions (e.g., MECOM, BCL11B, IGH rearrangements) |
| Detected only by targeted RNA-seq | 22 / 234 | 9.4% | Better for fusions from intrachromosomal deletions and expressed chimeric fusions |
| Detected by both (concordant) | 175 / 234 | 74.8% | --- |
Table 2: Performance in Solid Tumors (Non-Small Cell Lung Cancer and other solid tumors) [34] [14] [37]
| Study Context | Method | Key Finding | Implication |
|---|---|---|---|
| Early-Stage NSCLC (RET fusions) | DNA-seq | Identified putative RET+ cases | Can miss fusions involving large introns or complex rearrangements [37] |
| Early-Stage NSCLC (RET fusions) | Targeted RNA-seq | Identified additional actionable RET+ cases missed by other methods | Higher sensitivity for detecting expressed fusion transcripts [37] |
| 120 Reflex NSCLC Cases | Amplicon-based DNA/RNA assay | Missed 9 oncogenic fusions | Limitations in detecting rare/novel fusions with amplicon-based designs [14] |
| 120 Reflex NSCLC Cases | Reflex Hybridization-Capture RNA-seq | Detected 9 fusions (in ALK, BRAF, NRG1, NTRK3, ROS1, RET) | Essential for maximizing detection of rare and novel oncogenic fusions [14] |
| 60 Clinical Solid Tumor Samples | DNA-based NGS alone | 93.4% (57/61) concordance with previous results | Missed fusions like ETV6::NTRK3 and CCDC6::RET [34] |
| 60 Clinical Solid Tumor Samples | RNA-based NGS alone | 86.9% (53/61) concordance with previous results | Missed fusions like TRIM46::NTRK1 and CD74::ROS1 [34] |
| 60 Clinical Solid Tumor Samples | Integrated DNA/RNA NGS | 100% sensitivity and specificity; identified and validated a previously missed TPM3::NTRK1 fusion | Combined approach corrects for individual method limitations [34] |
The comparative data is derived from rigorously validated clinical and research assays. The following section outlines the standard experimental protocols used to generate these findings.
DNA-level methods identify the genomic breakpoints where two independent genes have joined. Optical Genome Mapping (OGM), which images labeled ultra-high-molecular-weight DNA rather than sequencing it, and targeted DNA sequencing panels are two prominent techniques.
RNA sequencing detects the chimeric transcripts that result from gene fusions, providing direct evidence of expression.
The most robust diagnostic strategy involves a complementary workflow. A sample first undergoes targeted DNA sequencing. If negative for a driver mutation or if there is a high clinical suspicion of a fusion, it is reflexed to targeted RNA sequencing. This combined approach ensures that fusions missed by one method (e.g., due to large introns in DNA-seq or low expression in RNA-seq) can be captured by the other [14] [37]. The conceptual relationship between these methods in detecting a fusion event is illustrated below.
The performance disparities between DNA-seq and RNA-seq are not random but stem from fundamental biological and technical factors.
Table 3: Key Research Reagent Solutions for Fusion Detection
| Item / Solution | Function in Experiment | Specific Examples / Notes |
|---|---|---|
| Targeted RNA-Seq Panels | Multiplexed detection of known and novel fusion transcripts from RNA. | Archer (AMP-based), Paragon Genomics AccuFusion (amplicon-based), Hybridization-capture panels [21] [77]. |
| Targeted DNA-Seq Panels | Interrogation of genomic DNA for structural variants and breakpoints. | Large panels (e.g., 425-gene DNA panel) often using hybrid capture technology [37]. |
| Optical Genome Mapping (OGM) | Genome-wide detection of structural variants without sequencing, at the DNA level. | Bionano Genomics platform; effective for enhancer-hijacking and cryptic rearrangements [21]. |
| Bioinformatics Pipelines | Critical for analyzing NGS data, calling fusions, and filtering false positives. | Arriba, FusionCatcher (for RNA-seq); Delly (for DNA-seq); custom filtering strategies are essential [9] [8]. |
| Reference Standards | Assay validation, determining sensitivity, specificity, and limit of detection. | Commercial fusion RNA reference materials (e.g., Seraseq Fusion RNA Mix) [34] [77]. |
Oncogenic fusions typically create constitutively active proteins that drive tumor growth through key signaling pathways, making them prime targets for therapy. The central pathway activated by many receptor tyrosine kinase (RTK) fusions is illustrated below.
The clinical significance of detecting these fusions is profound. For instance, the presence of an EML4-ALK fusion in non-small cell lung cancer (NSCLC) makes patients eligible for ALK tyrosine kinase inhibitors like crizotinib and ceritinib, which have significantly improved outcomes [3]. Similarly, NTRK fusions across various tumor types can be targeted with TRK inhibitors such as larotrectinib and entrectinib [3]. Likewise, the BCR-ABL1 fusion, the hallmark of chronic myeloid leukemia, is effectively targeted by imatinib and other TKIs [3]. Accurate detection is the critical first step that unlocks these targeted treatment options for patients.
The evidence from multiple clinical studies is clear: DNA-seq and RNA-seq are complementary, not redundant, technologies for gene fusion detection. DNA-based methods excel in identifying structural rearrangements, including those that do not produce fusion transcripts, while RNA-based methods directly capture the expressed chimeric products, often with higher sensitivity for fusions arising from complex genomic regions. Relying on a single methodology inevitably creates diagnostic blind spots, potentially depriving a subset of patients of life-changing targeted therapies. Therefore, an integrated diagnostic approach, leveraging the strengths of both DNA and RNA sequencing, represents the new gold standard for comprehensive fusion detection in oncology research and clinical practice. This synergistic strategy ensures the highest possible detection rate for these critical oncogenic drivers, ultimately advancing the goals of precision medicine.
The accurate detection of RET (REarranged during Transfection) fusions is critical for guiding targeted therapy in multiple cancers, including non-small cell lung cancer (NSCLC) and thyroid cancer. This case study objectively compares the performance of various molecular diagnostic platforms—RNA sequencing (RNA-seq), DNA sequencing (DNA-seq), fluorescence in situ hybridization (FISH), and immunohistochemistry (IHC)—for identifying these clinically actionable alterations. As selective RET inhibitors like selpercatinib and pralsetinib demonstrate response rates of 64-70% in RET-altered cancers, optimal detection methods directly impact patient eligibility for effective treatments [78]. Evidence from recent studies indicates that an integrative approach, combining DNA-seq with RNA-seq, achieves the most comprehensive detection profile, overcoming the limitations inherent in any single methodology [36] [79].
RET fusions are oncogenic drivers resulting from chromosomal rearrangements that fuse the 3' kinase domain of RET with the 5' domain of a partner gene. This rearrangement leads to constitutive activation of the RET tyrosine kinase, promoting tumorigenesis through unchecked cellular proliferation and survival signals [78]. The prevalence of RET fusions varies by tumor type, occurring in approximately 1-2% of NSCLC cases, ~10% of papillary thyroid cancers, and at lower frequencies in other solid tumors [36] [79]. Over 100 different partner genes have been identified, with KIF5B, CCDC6, and NCOA4 being the most common. The distribution of these partners is cancer-type specific: KIF5B predominates in lung cancer (66-68%), while CCDC6 and NCOA4 are more frequent in thyroid cancer [78]. This diversity, coupled with breakpoints predominantly located in intron 11 (87% of cases), presents a significant challenge for detection assays [78].
Table 1: Performance Metrics of RET Fusion Detection Platforms
| Detection Platform | Sensitivity (%) | Specificity (%) | Key Strengths | Key Limitations |
|---|---|---|---|---|
| DNA Sequencing (DNA-seq) | 100 [79] | 99.6 [79] | High-throughput; detects genomic breakpoints; good specificity [79]. | May miss fusions with large introns or complex rearrangements [36]. |
| RNA Sequencing (RNA-seq) | N/A | N/A | Confirms expressed fusion transcripts; identifies novel partners; assesses functionality [36] [78]. | Dependent on RNA quality and gene expression levels [36]. |
| Fluorescence In Situ Hybridization (FISH) | 91.7 [79] | N/A | Partner-agnostic; single-cell resolution; standardized for some cancers [79] [78]. | Lower sensitivity for NCOA4::RET (66.7%); cannot identify partner gene; subjective interpretation [79] [78]. |
| Immunohistochemistry (IHC) | Variable by partner [79] | ~82 [79] | Low cost; fast turnaround; readily available in most labs [79]. | Low overall sensitivity; variable specificity (40-85%); not recommended for standalone use [78]. |
Table 2: Partner-Gene Dependent Performance of FISH and IHC
| Fusion Partner | FISH Sensitivity | IHC Sensitivity |
|---|---|---|
| KIF5B::RET | High | 100% [79] |
| CCDC6::RET | High | 88.9% [79] |
| NCOA4::RET | 66.7% [79] | 50% [79] |
Studies directly comparing these methodologies shed light on their concordance. A 2025 study on early-stage NSCLC found 92.3% concordance between DNA-seq and RNA-seq for identifying RET fusions, compared with 84.6% between RNA-seq and FISH and 82.5% between DNA-seq and FISH [36]. Despite this high inter-method agreement, each platform also detected fusions that the others missed, underscoring their complementarity.
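Concordance figures like these are often reported as simple percent agreement. The sketch below assumes that definition (the share of samples tested by both methods on which the two calls agree) and computes it for every pair of platforms; the cited study may have used a different denominator, and the sample IDs and calls are hypothetical toy data, not results from [36].

```python
from itertools import combinations


def percent_agreement(calls_a, calls_b):
    """Percentage of samples, tested by both platforms, on which the calls agree.

    calls_a, calls_b: dict mapping sample ID -> True (fusion detected) / False.
    Samples tested by only one platform are excluded from the denominator.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no samples tested by both platforms")
    agree = sum(calls_a[s] == calls_b[s] for s in shared)
    return 100.0 * agree / len(shared)


def pairwise_concordance(platforms):
    """Percent agreement for every pair of platforms (pairs in alphabetical order)."""
    return {
        (a, b): percent_agreement(platforms[a], platforms[b])
        for a, b in combinations(sorted(platforms), 2)
    }


# Hypothetical per-sample RET fusion calls (toy data only).
platforms = {
    "DNA-seq": {"S1": True, "S2": True, "S3": False, "S4": False},
    "RNA-seq": {"S1": True, "S2": False, "S3": False, "S4": False},
    "FISH": {"S1": True, "S2": True, "S3": True, "S4": False},
}
for pair, pct in pairwise_concordance(platforms).items():
    print(pair, f"{pct:.1f}%")
```

With these toy calls, DNA-seq and RNA-seq agree on 3 of 4 samples (75.0%), while FISH and RNA-seq agree on only 2 of 4 (50.0%). Percent agreement does not correct for chance agreement, so in cohorts where most samples are fusion-negative a statistic such as Cohen's kappa is often reported alongside it.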
Notably, DNA-seq sometimes identifies structural variants of unknown significance (SVUS). In one pan-cancer study, 37.5% (12/32) of these RET SVUS were confirmed as oncogenic fusions by RNA-seq, emphasizing the necessity of RNA-level confirmation for ambiguous DNA findings [79]. Conversely, FISH can be positive in cases where RNA-seq does not detect a fusion transcript, as was observed in 87.5% (7/8) of RNA-negative RET SVUS cases [79]. This discordance may arise from technical factors or biologically inactive rearrangements.
The MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets) assay is a hybridization capture-based next-generation sequencing (NGS) method performed on formalin-fixed, paraffin-embedded (FFPE) tissue [79].
The Archer FusionPlex assay utilizes Anchored Multiplex PCR (AMP) for targeted RNA-seq to detect fusion transcripts, even with unknown partners [36] [78].
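A novel bioinformatic approach for fusion detection analyzes the 5'/3' coverage imbalance of a gene in RNA-seq data. This method is particularly useful for identifying 3' fusions of druggable kinases like RET: when the partner gene's promoter drives transcription of RET's 3' exons, read coverage rises sharply downstream of the breakpoint relative to the 5' exons, even if no junction-spanning reads are captured.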
Coverage Imbalance Analysis Workflow: A bioinformatics pipeline for detecting gene fusions based on asymmetrical RNA-seq read coverage between the 5' and 3' ends of a gene.
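A minimal sketch of the core scoring step, assuming per-exon mean read depths for a single gene are already available and ordered 5' to 3'. The function `coverage_imbalance`, the pseudocount, and the cutoff in the usage lines are hypothetical choices for illustration, not the published method; real pipelines also normalize against a reference cohort, because exon-level coverage varies for technical reasons even without a fusion.

```python
import math
from statistics import mean


def coverage_imbalance(exon_cov, min_exons=2, pseudo=1.0):
    """Scan every exon boundary of one gene for the split maximizing 3'/5' coverage.

    exon_cov: per-exon mean read depth, ordered 5' -> 3' along the transcript.
    min_exons: minimum number of exons required on each side of a split.
    pseudo: pseudocount that keeps the ratio finite when coverage is zero.
    Returns (k, log2_ratio): the 3' segment starts at 0-based exon index k.
    """
    best_k, best_score = None, -math.inf
    for k in range(min_exons, len(exon_cov) - min_exons + 1):
        five_prime = mean(exon_cov[:k])    # mean depth upstream of the split
        three_prime = mean(exon_cov[k:])   # mean depth downstream of the split
        score = math.log2((three_prime + pseudo) / (five_prime + pseudo))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical RET profile: 20 exons, low depth over exons 1-11, high over 12-20.
ret_cov = [2] * 11 + [60] * 9
k, score = coverage_imbalance(ret_cov)
is_candidate = score >= 2.0  # hypothetical cutoff, not a validated threshold
print(f"3' segment starts at exon {k + 1}, log2 ratio {score:.2f}, candidate={is_candidate}")
```

Here the best split falls at index 11, i.e. RET exon 12, which matches a breakpoint in intron 11, the most common location noted above.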
Given the limitations of individual platforms, clinical laboratories are increasingly adopting reflex testing algorithms to maximize detection rates. A common and effective strategy involves starting with a broad, DNA-based NGS panel to screen for a wide range of genomic alterations, including point mutations, copy number changes, and known fusions.
Reflex to RNA-seq: Cases that are negative for a clear mitogenic driver or that harbor structural variants of unknown significance (SVUS) on DNA-seq are automatically "reflexed" to a targeted RNA-seq assay [14]. This approach significantly improves the detection of rare and novel oncogenic fusions. In one study of 1,211 NSCLC specimens, approximately 10% required reflex RNA testing, which successfully identified actionable fusions in 9 cases (including ALK, BRAF, NRG1, NTRK3, ROS1, and RET) that were missed by the initial amplicon-based DNA assay [14].
Complementary FISH/IHC: In specific diagnostic scenarios, or when NGS is inconclusive/unavailable, FISH and IHC can provide orthogonal validation. However, their variable performance, particularly for fusions involving NCOA4, must be considered [79].
Integrated RET Fusion Testing Algorithm: A decision-tree workflow illustrating a reflex testing model that combines DNA and RNA sequencing for comprehensive fusion detection.
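The reflex logic described above can be expressed as a small decision function. This is an illustrative sketch only: the `DnaSeqResult` fields and the returned action strings are hypothetical, and real laboratory algorithms add criteria (tumor content, RNA quality, specimen availability) that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class DnaSeqResult:
    """Hypothetical summary of a DNA-based NGS report."""
    drivers: list = field(default_factory=list)     # clear mitogenic drivers reported
    svus_genes: list = field(default_factory=list)  # genes with SVs of unknown significance
    ngs_conclusive: bool = True                     # False if NGS failed or was inconclusive


def next_step(dna: DnaSeqResult) -> str:
    """Return the next test for a case under the reflex model (illustrative only)."""
    if not dna.ngs_conclusive:
        # NGS inconclusive or unavailable: fall back to orthogonal FISH/IHC.
        return "orthogonal FISH/IHC (lower sensitivity for NCOA4::RET)"
    if dna.svus_genes:
        # Ambiguous DNA structural variants need RNA-level confirmation.
        return "reflex to targeted RNA-seq to confirm SVUS in " + ", ".join(dna.svus_genes)
    if not dna.drivers:
        # No clear driver: screen RNA for fusions the DNA assay may have missed.
        return "reflex to targeted RNA-seq (driver-negative on DNA-seq)"
    return "report DNA-seq result; no reflex needed"


for case in [DnaSeqResult(drivers=["EGFR L858R"]), DnaSeqResult(svus_genes=["RET"]), DnaSeqResult()]:
    print(next_step(case))
```

Checking for SVUS before checking for a driver means an ambiguous structural variant triggers RNA confirmation even when another alteration has been reported, reflecting the description above that either condition alone warrants reflex testing.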
Table 3: Essential Reagents and Kits for RET Fusion Analysis
| Product/Technology | Primary Function | Application Context |
|---|---|---|
| QIAamp DNA FFPE Tissue Kit (Qiagen) | High-quality DNA extraction from challenging FFPE samples. | DNA-seq library prep [36]. |
| KAPA Hyper Prep Kit (Roche) | Library construction for NGS, compatible with hybridization capture. | DNA-seq library preparation for assays like MSK-IMPACT [36]. |
| Archer FusionPlex Solid/Lymphoma Panels | Targeted RNA-seq library prep using Anchored Multiplex PCR (AMP). | Detection of fusion transcripts from RNA, agnostic to known partners [21] [78]. |
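| BWA (Burrows-Wheeler Aligner) | Alignment of sequencing reads to a reference genome. | Primary alignment step in the DNA-seq and RNA-seq pipelines [36]; because BWA is not splice-aware, whole-transcriptome RNA-seq is usually aligned with STAR or HISAT2 instead. |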
| DELLY | Structural variant caller from DNA-seq data. | Identification of genomic rearrangements, including RET fusions, from DNA-seq [79]. |
| Trimmomatic | Pre-processing of raw NGS reads to remove adapters and low-quality bases. | Quality control (QC) step in both DNA-seq and RNA-seq pipelines [36]. |
RET Signaling and Fusion Mechanism: A comparison of ligand-dependent normal RET signaling versus the constitutive activation caused by oncogenic gene fusions.
No single platform is universally superior for detecting RET fusions. DNA-seq offers high sensitivity and specificity for known fusions but can miss functionally relevant events or yield inconclusive SVUS. RNA-seq directly confirms expressed fusion transcripts and identifies novel partners, making it ideal for confirming DNA findings and for interrogating cases that are both fusion-negative and driver-negative on DNA-seq. FISH and IHC play more limited roles because their sensitivity varies by fusion partner and they cannot identify the specific partner gene.
The evidence strongly supports an integrative diagnostic approach. A synergistic workflow, beginning with DNA-seq and reflexing to RNA-seq in ambiguous or negative cases, provides the most comprehensive and clinically actionable profiling for RET fusions. This strategy maximizes patient eligibility for life-saving targeted therapies like selpercatinib and pralsetinib, embodying the precision medicine mandate in modern oncology.
Gene fusions are critical molecular biomarkers in cancer, influencing diagnosis, prognosis, and therapeutic decisions. The accurate detection of these aberrations depends heavily on the choice of genomic approach—DNA-based or RNA-based sequencing—and the specific bioinformatics tools employed. This guide provides an objective comparison of fusion detection methodologies and tools, synthesizing performance data from recent studies to inform researchers and clinicians in selecting optimal approaches for their specific applications. Evidence consistently demonstrates that integrated DNA-RNA sequencing approaches maximize detection sensitivity for clinically relevant fusions, with performance varying significantly across tools and cancer types [21] [34] [69].
Table 1: Overall Performance Comparison of Fusion Detection Approaches
| Approach | Key Strengths | Key Limitations | Ideal Use Cases |
|---|---|---|---|
| RNA-seq Tools | High sensitivity for expressed chimeric transcripts; identifies fusion products with functional potential | May miss enhancer-hijacking events; dependent on expression levels | Routine fusion screening in clinical settings; therapy selection |
| DNA-seq Tools | Detects structural variants regardless of expression; identifies cryptic, enhancer-driven events | May miss fusions from intrachromosomal deletions; cannot confirm expression | Research discovery; comprehensive structural variant detection |
| Combined DNA-RNA | Maximizes detection of both structural variants and expressed fusions; highest clinical utility | Higher cost and computational requirements; complex workflow | Precision oncology; complex diagnostic cases |
Gene fusions arise from genomic rearrangements including chromosomal translocations, deletions, inversions, or duplications, and serve as important diagnostic, prognostic, and predictive biomarkers in oncology. The detection of these events can be approached at either the DNA level, identifying structural rearrangements in the genome, or at the RNA level, identifying chimeric transcripts resulting from these rearrangements. Each approach offers distinct advantages and limitations, with recent studies demonstrating their complementary nature [21] [34].
DNA-based methods excel at identifying structural variants regardless of their transcriptional activity, making them particularly valuable for detecting enhancer-hijacking events that may not produce fusion transcripts but can activate oncogenes through positional effects. In contrast, RNA-based methods detect expressed fusion products, confirming that the DNA rearrangement is actually transcribed; the encoded fusion proteins are often the direct targets of therapeutic intervention. Understanding these fundamental differences is crucial for selecting appropriate detection strategies in both research and clinical settings [21] [69].
Multiple studies have comprehensively evaluated the performance of fusion detection tools using various benchmarking datasets, including simulated fusion transcripts, spike-in controls, and clinically validated samples. The sensitivity, specificity, and computational efficiency vary considerably across tools.
Table 2: Performance Metrics of Leading RNA-seq Fusion Detection Tools
| Tool | Sensitivity (%) | Specificity (%) | Computational Efficiency | Key Features |
|---|---|---|---|---|
| Arriba | ~59-100* | High | Fast (<1 hour/sample) | Detects fusions, intragenic rearrangements, truncations |
| Fusion-Bloom | 96* | High | Moderate (10-12 hours/100M reads) | de novo assembly approach; base-pair precision |
| STAR-Fusion | High | High | Fast | Based on STAR aligner; well-documented |
| FusionCatcher | High | High | Moderate | Comprehensive pipeline; multiple alignment tools |
| JAFFA | Moderate | High | Moderate | Hybrid assembly approach; good for long reads |
| deFuse | Moderate | Moderate | Slow | Early tool; largely superseded by newer methods |
*Sensitivity varies based on expression levels and dataset characteristics
In a landmark comparison of 12 fusion detection tools, performance varied significantly based on RNA-seq data quality, read length, and sequencing depth. Most tools showed trade-offs between sensitivity and false discovery rates, with no single tool performing optimally across all datasets [80]. However, more recent evaluations have identified several tools that consistently outperform others.
Arriba demonstrates particularly strong performance across multiple benchmarking datasets, identifying 88 of 150 simulated fusions at the lowest expression level (5-fold), all synthetic fusions in spike-in experiments, and 78 validated fusions in the MCF-7 cell line. Depending on the dataset, this corresponds to a sensitivity advantage of 13-60% over the next best method [7]. Fusion-Bloom also performs well, detecting 48 of 50 known fusions with zero false positives in one benchmark and all fusions at every molarity in spike-in experiments [81].
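Sensitivity and precision in these benchmarks reduce to set comparisons between a tool's calls and a truth set. The sketch below assumes fusions are matched by ordered gene pair (5' partner::3' partner) only; many published benchmarks additionally require the predicted breakpoint to fall within a tolerance of the true one, which is not modeled here. The gene names are placeholders.

```python
def normalize(fusion: str) -> tuple:
    """Parse '5PRIME::3PRIME' into an ordered, case-insensitive gene pair."""
    five, three = fusion.upper().split("::")
    return five, three


def benchmark(calls, truth):
    """Compare predicted fusions with a truth set, matching on ordered gene pair only."""
    called = {normalize(f) for f in calls}
    expected = {normalize(f) for f in truth}
    tp = len(called & expected)  # true fusions that were reported
    fp = len(called - expected)  # reported fusions absent from the truth set
    fn = len(expected - called)  # true fusions that were missed
    return {
        "TP": tp, "FP": fp, "FN": fn,
        "sensitivity": tp / (tp + fn) if expected else float("nan"),
        "precision": tp / (tp + fp) if called else float("nan"),
    }


# Hypothetical truth set of 50 fusions; the tool reports 48 of them and nothing else.
truth = [f"GENE{i}::PARTNER{i}" for i in range(50)]
print(benchmark(truth[:48], truth))
```

The usage lines reproduce the arithmetic of the Fusion-Bloom result above: 48 of 50 detected with no false positives gives a sensitivity of 0.96 and a precision of 1.0. By the same arithmetic, Arriba's 88 of 150 simulated fusions at the 5-fold level corresponds to a sensitivity of about 0.59.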
A comprehensive 2025 study comparing targeted RNA-seq and optical genome mapping (OGM) in 467 acute leukemia cases revealed striking differences in detection capabilities between approaches. The overall concordance rate was 88.1%, but significant variations emerged when examining specific fusion types [21].
RNA-seq slightly outperformed OGM for fusions arising from intrachromosomal deletions, which were sometimes misinterpreted by OGM as simple deletions. Conversely, OGM uniquely detected 37 of 234 (15.8%) clinically relevant rearrangements, while RNA-seq exclusively identified 22 of 234 (9.4%). The most dramatic difference was observed for enhancer-hijacking lesions (including MECOM, BCL11B, and IGH rearrangements), which showed only 20.6% concordance between platforms, with many events missed by RNA-seq [21].
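Quantifying this complementarity amounts to partitioning the union of calls into shared and platform-exclusive sets. In this sketch the percentages use the union of events detected by either platform as the denominator, an assumption that may not match how [21] computed its figures; the event IDs are hypothetical.

```python
def partition_calls(platform_a, platform_b, names=("OGM", "RNA-seq")):
    """Split two platforms' rearrangement calls into shared and exclusive sets.

    Returns label -> (count, percentage of all events detected by either platform).
    """
    a, b = set(platform_a), set(platform_b)
    union = a | b
    if not union:
        raise ValueError("no calls from either platform")
    parts = {
        "both": a & b,
        f"{names[0]} only": a - b,
        f"{names[1]} only": b - a,
    }
    return {label: (len(s), 100.0 * len(s) / len(union)) for label, s in parts.items()}


ogm_calls = {"evt1", "evt2", "evt3", "evt4"}  # hypothetical event IDs
rna_calls = {"evt1", "evt2", "evt5"}
print(partition_calls(ogm_calls, rna_calls))
```

In practice, calls must first be harmonized so that an OGM breakpoint pair and an RNA-seq fusion transcript representing the same event share one identifier, which requires care because the two platforms report events differently.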
These findings underscore the complementary nature of DNA and RNA-based approaches. RNA-seq proves more sensitive for detecting expressed chimeric fusions, while OGM (a DNA-level method) excels at identifying cryptic, enhancer-driven events that do not generate fusion transcripts [21].
Recognizing the limitations of single-modality approaches, researchers have developed integrated DNA-RNA sequencing assays that simultaneously leverage both data types. A 2025 validation study of a combined RNA and DNA exome assay across 2,230 clinical tumor samples demonstrated significantly improved detection of clinically actionable alterations compared to DNA-only testing [69].
This integrated approach enabled direct correlation of somatic alterations with gene expression, recovery of variants missed by DNA-only testing, and improved detection of gene fusions. The assay uncovered clinically actionable alterations in 98% of cases and revealed complex genomic rearrangements that would likely have remained undetected without RNA data [69].
Similarly, a custom-designed integrated DNA and RNA-based NGS assay for solid tumors demonstrated 100% sensitivity and specificity after confirming a previously false-negative TPM3::NTRK1 fusion. The study found that DNA and RNA results complemented each other, with each modality detecting fusions missed by the other [34].
Robust evaluation of fusion detection tools requires diverse benchmarking datasets that mimic real-world scenarios:
In silico simulated datasets: Computer-generated fusion transcripts merged into real RNA-seq data from benign tissue, enabling precise sensitivity measurements across expression levels (typically 5- to 200-fold) [7] [80].
Spike-in reference standards: Synthetic RNA molecules mimicking oncogenic fusions spiked into RNA libraries at varying concentrations (e.g., 10^-8.57 to 10^-3.47 pmol), allowing sensitivity limits to be determined [34] [7].
Cell line datasets: Well-characterized cancer cell lines (e.g., MCF-7) with orthogonally validated fusions, providing real-world performance assessment [7] [80].
Clinical patient cohorts: Samples from defined patient populations (e.g., ICGC early-onset prostate cancer cohort) with known prevalence of specific fusions [7].
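For the in silico category, a simulated fusion is a chimeric sequence built from the retained 5' exons of one gene and the retained 3' exons of another, from which reads are then sampled. The toy sketch below assumes single-end reads of fixed length drawn uniformly, and interprets the expression level as mean fold coverage of the fusion transcript; established simulators also model fragment sizes, paired ends, sequencing errors, and merging into background RNA-seq, none of which is attempted here. The function names are hypothetical.

```python
import random


def make_fusion_transcript(five_exons, three_exons):
    """Concatenate the 5' partner's retained exons with the 3' partner's retained exons.

    Returns the chimeric sequence and the 0-based index of the first 3'-partner base.
    """
    five = "".join(five_exons)
    return five + "".join(three_exons), len(five)


def simulate_reads(transcript, junction, fold, read_len=100, seed=0):
    """Sample fixed-length reads uniformly at ~`fold`x mean coverage.

    Returns the reads and the number that span the fusion junction.
    """
    if len(transcript) < read_len:
        raise ValueError("transcript shorter than read length")
    rng = random.Random(seed)
    n_reads = max(1, round(fold * len(transcript) / read_len))
    reads, spanning = [], 0
    for _ in range(n_reads):
        start = rng.randint(0, len(transcript) - read_len)  # inclusive bounds
        reads.append(transcript[start:start + read_len])
        # Spans the junction if it holds both the last 5' base and the first 3' base.
        if start < junction < start + read_len:
            spanning += 1
    return reads, spanning


# Toy sequences: poly-A stands in for 5' partner exons, poly-C for 3' partner exons.
transcript, junction = make_fusion_transcript(["A" * 300, "A" * 300], ["C" * 400])
for fold in (5, 50, 200):
    reads, spanning = simulate_reads(transcript, junction, fold)
    print(fold, len(reads), spanning)
```

Counting junction-spanning reads matters because split reads crossing the fusion junction (along with discordant read pairs) are the primary evidence used by most RNA-seq fusion callers; at low fold levels only a handful of such reads exist, which is why sensitivity drops at the lowest expression levels.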
Comprehensive tool validation should incorporate multiple approaches:
Orthogonal validation: Confirmation of predicted fusions using independent methods such as FISH, RT-PCR, or Sanger sequencing [34] [80].
Tiered classification: Classification of variants according to established guidelines (e.g., ACMG/ClinGen, AMP/ASCO/CAP) into tiers based on clinical relevance [21].
Limit of detection (LOD) assessment: Determination, through serial dilution experiments, of the minimum abundance at which a fusion is reliably detected (e.g., 5% mutation abundance for DNA; 250-400 copies per 100 ng of input for RNA) [34].
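The LOD step can be sketched as finding the lowest dilution level whose replicate detection rate stays at or above a target hit rate. The 95% target, the replicate counts, and the copy-number levels below are hypothetical illustrations (a 95% hit rate is a common convention, assumed here), not values from [34].

```python
def estimate_lod(dilution_results, min_rate=0.95):
    """Lowest input level whose replicate hit rate is >= min_rate.

    Levels are scanned from highest to lowest, stopping at the first level that
    fails. dilution_results maps input level -> list of per-replicate detection
    bools. Returns None if even the highest level fails.
    """
    lod = None
    for level in sorted(dilution_results, reverse=True):
        reps = dilution_results[level]
        hit_rate = sum(reps) / len(reps)
        if hit_rate < min_rate:
            break
        lod = level
    return lod


# Hypothetical fusion copies per 100 ng input, 20 replicates per level.
series = {
    1000: [True] * 20,
    400: [True] * 20,
    250: [True] * 19 + [False],
    100: [True] * 12 + [False] * 8,
}
print(estimate_lod(series))
```

Stopping at the first failing level guards against reporting an LOD below a level that did not meet the target, even if a lower level happened to pass by chance.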
Figure 1: Comprehensive Fusion Detection Workflow integrating both DNA and RNA sequencing approaches for maximal sensitivity.
Figure 2: Decision Framework for selecting appropriate fusion detection strategies based on research goals, resources, and fusion types of interest.
Table 3: Essential Research Reagents and Computational Tools for Fusion Detection Studies
| Category | Specific Products/Tools | Function/Purpose |
|---|---|---|
| Wet Lab Reagents | TruSeq Stranded mRNA kit (Illumina); SureSelect XT HS2 (Agilent); AllPrep DNA/RNA kits (Qiagen) | Library preparation; nucleic acid extraction |
| Reference Standards | GeneWell fusion reference standards; synthetic spike-in RNA controls; characterized cell lines (e.g., MCF-7) | Assay validation; sensitivity determination; quality control |
| Computational Tools | STAR, HISAT2, BWA aligners; Fusion-Bloom, Arriba, STAR-Fusion fusion detectors; DESeq2, edgeR for expression | Data analysis; fusion detection; differential expression |
| Validation Tools | BLAT, BLAST, Sanger sequencing; IGV visualization; orthogonal assays (FISH, RT-PCR) | Results confirmation; visual verification; experimental validation |
Based on comprehensive benchmarking studies, the following recommendations emerge for selecting fusion detection approaches:
For clinical diagnostics: Implement combined DNA-RNA sequencing where possible, as this approach detects the broadest range of clinically actionable fusions; in a validation study of 2,230 clinical tumor samples, a combined assay uncovered clinically actionable alterations in 98% of cases [69].
For clinical settings with limited resources: Prioritize RNA-seq with high-performance tools like Arriba or STAR-Fusion, which offer the best balance of sensitivity, speed, and accuracy for detecting therapeutically relevant expressed fusions [7].
For research discovery: Select approaches based on the biological questions. DNA-based methods (OGM, DNA-seq with intronic tiling) are superior for identifying structural variants and enhancer hijacking events, while RNA-based methods excel at detecting functional fusion transcripts [21].
For method validation: Employ standardized benchmarking datasets including spike-in controls, simulated fusions, and orthogonally validated samples to properly assess tool performance [7] [80].
As sequencing technologies continue to evolve and computational methods improve, the integration of multi-omic approaches will likely become standard practice in both research and clinical settings, further enhancing our ability to detect these critical genomic events with implications for cancer diagnosis and treatment.
The choice between RNA-seq and DNA-seq for fusion detection is not a matter of selecting a superior technology, but of understanding their powerful synergy. DNA-seq effectively identifies genomic rearrangements, including those that may not be expressed, while RNA-seq provides direct evidence of oncogenic, expressed fusion transcripts and often discovers novel partners. Robust validation studies and real-world clinical data consistently demonstrate that a combined approach significantly increases the detection of clinically actionable fusions—by over 21% in pan-cancer cohorts—compared to either method alone. For the future of precision medicine, integrating DNA and RNA sequencing into comprehensive genomic profiling is paramount. This strategy ensures the most complete molecular diagnosis, expands the population of patients eligible for matched targeted therapies, and ultimately paves the way for improved clinical outcomes across a wide spectrum of cancers.