RNA-seq vs. DNA-seq for Gene Fusion Detection: A Comprehensive Guide for Researchers and Clinicians

Kennedy Cole Dec 02, 2025

Abstract

Accurate detection of oncogenic gene fusions is critical for cancer classification, prognosis, and targeted therapy. This article provides a systematic comparison of RNA sequencing (RNA-seq) and DNA sequencing (DNA-seq) methodologies for identifying these crucial biomarkers. We explore the foundational principles defining each technology's strengths, delve into their specific applications across cancer types and drug discovery, address common challenges and optimization strategies, and present rigorous validation data and performance benchmarks. For researchers, scientists, and drug development professionals, this review synthesizes evidence demonstrating that RNA-seq and DNA-seq are highly complementary. Integrating both approaches maximizes detection sensitivity for clinically actionable fusions, thereby optimizing patient stratification for precision oncology.

The Biological and Technical Foundations of Fusion Detection

What Are Gene Fusions? Defining Key Drivers in Cancer Pathogenesis

Gene fusions, hybrid genes formed from the juxtaposition of two previously independent genes, are well-established as potent driver mutations in cancer pathogenesis [1] [2]. These molecular events arise from chromosomal rearrangements such as translocations, inversions, and deletions, leading to the production of fusion proteins with oncogenic properties, such as constitutively active tyrosine kinases or aberrant transcription factors [2] [3]. Their significance is underscored by their role as defining features of certain cancer subtypes and as prime targets for therapeutic intervention, making their accurate detection a critical focus in oncological research and precision medicine [2] [3].

The Molecular Biology of Oncogenic Fusions

The formation of a gene fusion typically originates from a DNA-level rearrangement. Key mechanisms include translocation, where segments from two different chromosomes break and swap places; deletion, which removes an intervening DNA segment to bring two genes together; and inversion, where a chromosome segment is reversed end-to-end [2] [3]. The classic example is the BCR-ABL1 fusion, resulting from a reciprocal translocation between chromosomes 9 and 22 that forms the Philadelphia chromosome, a hallmark of chronic myeloid leukemia (CML) [2] [3]. This fusion produces a constitutively active tyrosine kinase that drives uncontrolled cell proliferation [3].

Oncogenic fusion proteins can function through several mechanisms. Many, like EML4-ALK in non-small cell lung cancer (NSCLC), lead to constitutive activation of tyrosine kinases, perpetually stimulating growth and survival pathways such as MAPK and PI3K-AKT [2] [3]. Others, such as TMPRSS2-ERG in prostate cancer, place a transcription factor under the control of a strong promoter, leading to its deregulated overexpression and disrupting normal gene expression programs [2]. A third mechanism, exemplified by surface-bound NRG1 fusions, can drive aberrant paracrine signaling by activating receptors on neighboring cells [2]. The diagram below illustrates these key mechanisms through which gene fusions drive oncogenesis.

[Diagram: Mechanisms by which gene fusions drive oncogenesis.
Mechanism 1, constitutive kinase activation: the fusion creates a constitutively active kinase (e.g., BCR-ABL1, EML4-ALK), leading to ligand-independent activation of signaling pathways (MAPK, PI3K-AKT).
Mechanism 2, promoter-driven overexpression: a strong promoter drives overexpression of an oncogene (e.g., TMPRSS2-ERG), leading to deregulation of transcription and cellular identity.
Mechanism 3, aberrant paracrine signaling: a ligand fusion causes aberrant receptor activation (e.g., NRG1 fusions), leading to activation of signaling in neighboring cells.]

Methodological Showdown: DNA-Seq vs. RNA-Seq for Fusion Detection

The accurate identification of gene fusions is foundational for both research and clinical decision-making. Next-generation sequencing (NGS) offers two primary approaches: DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq), each with distinct advantages and limitations [4].

DNA-seq (including whole-genome and targeted sequencing) aims to detect the underlying genomic rearrangement that creates the fusion gene. However, this can be challenging because breakpoints often fall within long, repetitive intronic regions, making them difficult to amplify, sequence, and map accurately [4]. While DNA-seq can confirm a structural variant is present, it cannot confirm whether it is transcribed into a functional, expressed fusion transcript [5].

RNA-seq directly sequences the transcriptome, capturing the expressed RNA molecules. This makes it uniquely powerful for identifying the functional, expressed fusion transcripts present in the cell [4] [6]. Because introns are spliced out of mature mRNA, RNA-seq reads cross the exon-exon fusion junction directly, which provides a more direct and often more sensitive method for detecting the relevant chimeric RNA, provided the fusion gene is actively expressed [4]. The fundamental differences between these two approaches are summarized in the table below.

Table 1: Core Differences Between DNA-seq and RNA-seq for Fusion Gene Detection

Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq)
Target Molecule | Genomic DNA | RNA (reverse-transcribed to cDNA)
Primary Purpose | Identify structural rearrangements & breakpoints | Identify expressed fusion transcripts
Key Challenge | Breakpoints in long, repetitive introns; cannot confirm expression [4] | Cannot detect fusions with low/no expression [4]
Information Gained | Presence of genetic alteration | Functional, transcribed mRNA product
Ideal Use Case | Comprehensive discovery of structural variants | Identifying expressed, potentially actionable oncogenic drivers

The following diagram outlines the generic workflow for detecting gene fusions from RNA-seq data, highlighting the key steps from sample preparation to final validation.

[Diagram: Generic RNA-seq fusion detection workflow. Sample (tissue/blood) -> RNA extraction -> library preparation & sequencing -> bioinformatics analysis (quality control & read filtering -> read alignment to reference genome -> fusion detection using specialized tools; visualization & manual review, e.g., IGV) -> orthogonal validation (e.g., RT-PCR, FISH) -> validated gene fusion.]

Performance Benchmarking: Fusion Detection Tools and Technologies

The bioinformatic detection of fusions from RNA-seq data relies on specialized algorithms that identify chimeric reads—sequences that map to two different genes. Multiple tools have been developed, each with different strengths. Arriba is a fast, accurate algorithm designed for clinical applications, demonstrating high sensitivity even with few supporting reads [7]. STAR-Fusion is another widely used, accurate tool known for its reliability [7]. With the advent of long-read sequencing (e.g., PacBio, Oxford Nanopore), new tools like GFvoter have emerged, leveraging longer reads to span complex fusion junctions with high precision [8].
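To make the idea of a chimeric read concrete, the following minimal sketch counts reads whose aligned segments fall in two different genes. The gene names, coordinates, `Segment` type, and `find_chimeric_reads` helper are all invented for illustration; this is not the data model or algorithm of Arriba, STAR-Fusion, or any other caller, which apply many additional filters.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """One aligned piece of a read (hypothetical, simplified record)."""
    read_id: str
    chrom: str
    pos: int  # leftmost aligned position of this segment


# Toy annotation: gene -> (chromosome, start, end). Coordinates are invented.
GENES = {
    "GENE_A": ("chr1", 1_000, 5_000),
    "GENE_B": ("chr2", 10_000, 20_000),
}


def gene_at(chrom: str, pos: int) -> Optional[str]:
    """Return the toy gene overlapping a position, or None."""
    for name, (g_chrom, start, end) in GENES.items():
        if chrom == g_chrom and start <= pos <= end:
            return name
    return None


def find_chimeric_reads(segments):
    """Count, per gene pair, the reads whose segments hit two distinct genes."""
    genes_per_read = {}
    for seg in segments:
        gene = gene_at(seg.chrom, seg.pos)
        if gene is not None:
            genes_per_read.setdefault(seg.read_id, set()).add(gene)

    support = Counter()
    for genes in genes_per_read.values():
        if len(genes) == 2:  # read touches exactly two different genes
            support[tuple(sorted(genes))] += 1
    return support


segments = [
    Segment("r1", "chr1", 4_900), Segment("r1", "chr2", 10_050),  # chimeric
    Segment("r2", "chr1", 1_200), Segment("r2", "chr1", 1_350),   # same gene
    Segment("r3", "chr2", 15_000), Segment("r3", "chr1", 4_950),  # chimeric
]
support = find_chimeric_reads(segments)
print(dict(support))
```

The per-pair count plays the role of "supporting reads": a caller described as sensitive "even with few supporting reads" is one that can still report a pair when this count is small.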

Table 2: Performance Benchmark of Fusion Detection Tools on Real and Simulated RNA-seq Datasets

Tool | Average Precision | Average Recall (Sensitivity) | Key Performance Insight
GFvoter (Long-read) | 58.6% | Comparable or superior to other tools | Achieved the highest average F1 score (0.569), indicating best precision-recall balance [8].
LongGF (Long-read) | 39.5% | Varies by dataset | Lower precision compared to GFvoter [8].
JAFFAL (Long-read) | 30.8% | Varies by dataset | Lower precision compared to GFvoter [8].
Arriba (Short-read) | High (specific data not shown) | High | Rediscovered 88/150 simulated fusions at low expression; superior sensitivity on multiple benchmarks [7].
FusionCatcher (Short-read) | High (specific data not shown) | High | Identified all synthetic spike-in fusions in benchmark [7].

In clinical cohorts, RNA-seq recovers most fusions found by conventional diagnostics. A 2021 study on 806 acute myeloid leukemia (AML) samples found that RNA-seq detected 90% of fusion events that were reported with high evidence by conventional diagnostics (karyotyping, FISH, RT-PCR) [9]. Similarly, a 2024 study in acute leukemia demonstrated an 83.3% sensitivity for RNA-seq compared to conventional methods, while also identifying novel fusions missed by standard approaches [6].
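The precision, recall (sensitivity), and F1 figures quoted in this section follow the standard definitions, which can be sketched as below. The true-positive, false-positive, and false-negative counts are hypothetical values chosen for illustration and are not taken from any of the cited benchmarks.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported fusions that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real fusions that were reported (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical benchmark: 60 true fusions reported, 40 false calls, 20 missed.
tp, fp, fn = 60, 40, 20
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.600 recall=0.750 F1=0.667
```

Note that an F1 score averaged over several datasets is generally not the same as the F1 of the averaged precision and recall, so an average F1 reported alongside an average precision cannot be converted back into a single recall value.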

The Single-Cell and Long-Read Revolution

Recent technological advances are further refining fusion detection. Single-cell RNA-seq (scRNA-seq) allows researchers to detect fusions at the single-cell level, revealing tumor heterogeneity and identifying rare subclones harboring driver fusions. The tool scFusion was developed specifically for this purpose, effectively controlling for the high technical noise in scRNA-seq data to identify fusions with high sensitivity and a low false discovery rate [10]. Meanwhile, long-read transcriptome sequencing (e.g., PacBio) produces reads that are thousands of bases long, enabling a single read to span an entire fusion junction without assembly, simplifying detection and reducing false positives [8].

Experimental Design and the Research Toolkit

A typical experiment to identify gene fusions via RNA-seq involves a multi-step process. First, total RNA is extracted from tumor samples or cell lines, ensuring high quality and integrity (RIN > 8). The RNA is then used to prepare a sequencing library, which is typically sequenced on an Illumina platform to generate high-throughput short reads (e.g., 2x150 bp) [7] [6].

For bioinformatics analysis, the raw sequencing reads are first processed for quality control using tools like FastQC. High-quality reads are then aligned to a reference genome (e.g., GRCh38) using a splice-aware aligner such as STAR [7] [10]. The aligned data is subsequently analyzed by one or more fusion detection algorithms (e.g., Arriba, STAR-Fusion). Running two tools and combining their predictions is a common practice to improve robustness [7]; taking the intersection favors precision, while taking the union favors sensitivity. The final list of high-confidence fusion calls must undergo manual inspection in a genome browser (e.g., IGV) and orthogonal validation using an independent method such as RT-PCR or FISH [6] [9].
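The union/intersection step can be sketched with plain set operations. The example assumes each caller's output has already been parsed into (5' gene, 3' gene) tuples; the call lists and the `GENE_X`-style placeholders are invented, and the parsing of real Arriba or STAR-Fusion output files is deliberately left out.

```python
# Hypothetical, already-parsed calls from two fusion callers.
# Each call is a (5' gene, 3' gene) tuple; orientation is kept as reported.
caller_a_calls = {("BCR", "ABL1"), ("EML4", "ALK"), ("GENE_X", "GENE_Y")}
caller_b_calls = {("BCR", "ABL1"), ("EML4", "ALK"), ("GENE_P", "GENE_Q")}

# Intersection: reported by both callers (fewer false positives).
consensus = caller_a_calls & caller_b_calls

# Union: reported by at least one caller (fewer missed fusions).
candidates = caller_a_calls | caller_b_calls

# Calls seen by only one caller are natural targets for manual review.
single_caller = candidates - consensus

print("consensus:", sorted(consensus))
print("needs review:", sorted(single_caller))
```

Which combination to use depends on the downstream step: a consensus list suits automated reporting, whereas the union is more appropriate when every candidate will be inspected in a genome browser and validated orthogonally anyway.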

Table 3: Essential Research Reagent Solutions for Fusion Detection Studies

Reagent / Tool Category | Example Products | Critical Function in Experiment
RNA Extraction & QC | TRIzol, Qiagen RNeasy Kits, Agilent Bioanalyzer | Isolate high-quality, intact RNA for accurate transcriptome representation.
Library Prep Kits | Illumina Stranded mRNA Prep | Convert RNA into a sequence-ready library, often with barcoding for multiplexing.
Alignment Software | STAR, HISAT2, Minimap2 (for long-reads) | Map sequencing reads to the reference genome, crucially identifying splice and fusion junctions.
Fusion Callers | Arriba, STAR-Fusion, GFvoter, FusionCatcher | Apply specialized algorithms to aligned reads to identify and filter candidate gene fusions.
Validation Reagents | FISH probes, PCR primers, TaqMan assays | Provide independent, orthogonal confirmation of high-priority fusion events.

Gene fusions are critical drivers of oncogenesis, functioning through diverse mechanisms such as constitutive kinase activation and transcriptional deregulation. While DNA-seq can identify the genomic rearrangements behind fusions, RNA-seq is the more direct method for detecting the expressed, functional fusion transcripts that are most relevant for cancer biology and targeted therapy. The ongoing development of more accurate bioinformatics tools like Arriba and GFvoter, coupled with newer technologies like single-cell and long-read sequencing, is steadily enhancing our detection capabilities. The integration of RNA-seq into clinical workflows provides a comprehensive and powerful approach to uncovering these key molecular alterations, ultimately advancing precision oncology and improving patient outcomes.

Structural variants (SVs) represent a category of genomic alterations involving segments of DNA larger than 50 base pairs, including deletions, duplications, inversions, translocations, and insertions. These variants play significant roles in human disease, particularly in cancer, where they can drive tumorigenesis through mechanisms such as oncogene activation, tumor suppressor inactivation, and the creation of novel fusion genes. DNA sequencing (DNA-seq) provides the fundamental technology for directly interrogating the genomic blueprint to identify these structural alterations at their source. Unlike RNA sequencing (RNA-seq), which examines the transcriptomic consequences of genetic changes, DNA-seq reveals the underlying architectural variations in the genome itself, offering complementary insights for comprehensive genomic profiling in both research and clinical diagnostics.

The ability to accurately detect SVs has profound implications for understanding cancer biology and advancing personalized medicine. Numerous SVs are now recognized as clinically actionable biomarkers, with fusion genes involving drivers such as ALK, RET, ROS1, and NTRK serving as prime examples for which targeted therapies have been developed. However, the detection of these variants presents substantial technical challenges, leading to the development of diverse DNA-seq approaches with varying capabilities and limitations for comprehensive structural variant interrogation.

DNA Sequencing Methodologies for Structural Variant Detection

DNA-seq Approach Categories

DNA sequencing approaches for structural variant detection can be broadly categorized into three main methodologies, each with distinct strengths and limitations for SV identification:

Whole Genome Sequencing (WGS) sequences the entire DNA genome, enabling the detection of virtually any type of mutation throughout both coding and non-coding regions. This approach can identify single nucleotide variants (SNVs), insertions and deletions (indels), structural variants, and copy number variations (CNVs) across the complete genome. WGS is particularly valuable for discovering novel structural variants in regions outside traditional exonic targets and for analyzing samples without established reference genomes [4].

Whole Exome Sequencing (WES) focuses specifically on sequencing the protein-coding regions (exons) of the genome, which represent approximately 1-2% of the human genome. This targeted approach efficiently identifies SNVs and indels within exonic regions while omitting regulatory elements such as promoters and enhancers. While WES is more cost-effective and generates less data than WGS, its limited genomic coverage reduces its effectiveness for detecting structural variants that involve non-coding or intergenic regions [4].

Targeted Sequencing concentrates on a predetermined subset of genomic regions, such as specific genes known to be involved in disease pathways. This approach offers the most cost-effective and focused analysis, with enhanced sensitivity for detecting low-frequency variants—a particular advantage in heterogeneous samples like tumors. However, its targeted nature means it can only identify structural variants within the preselected genomic regions and may miss novel or unexpected rearrangements [4].

Experimental Workflow for DNA-seq-Based SV Detection

The standard workflow for detecting structural variants via DNA-seq involves multiple critical steps from sample preparation through bioinformatic analysis. The following diagram illustrates this comprehensive process:

[Diagram: Sample -> DNA extraction -> fragmentation & library prep -> sequencing -> read alignment to reference genome -> SV calling algorithms -> variant annotation & filtering]

Figure 1: DNA-seq Structural Variant Detection Workflow

The process begins with DNA extraction from patient samples (e.g., blood, saliva, tissue biopsies), leveraging DNA's relative stability compared to RNA. Following extraction, DNA undergoes fragmentation and library preparation, which may include mechanical shearing, adaptor ligation, and PCR amplification depending on the specific protocol. The prepared libraries are then sequenced using platforms such as Illumina, Ion Torrent, PacBio, or Oxford Nanopore, each offering different trade-offs in read length, accuracy, and throughput [4] [11].

The resulting sequencing reads are aligned to a reference genome using tools such as BWA or Bowtie, which map the short DNA fragments to their corresponding genomic positions. SV calling algorithms then analyze the aligned reads for patterns indicative of structural variants, such as discordant read pairs, split reads, or read depth anomalies. Commonly used tools in these pipelines include the dedicated SV callers CNVnator (read depth) and Lumpy (discordant pairs and split reads), alongside general-purpose toolkits such as GATK and Samtools [4]. Finally, detected variants undergo annotation and filtering to determine their potential functional consequences using tools like ANNOVAR or VEP, and to prioritize likely pathogenic events based on population frequency, predicted impact on coding sequences, and overlap with known regulatory elements.
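The discordant read pair signal mentioned above can be illustrated with a small classifier. It assumes standard paired-end libraries in which concordant mates face each other (left mate on the forward strand, right mate on the reverse strand); the `ReadPair` type, the insert-size thresholds, and the category labels are illustrative assumptions rather than the logic of Lumpy or any other specific caller.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Simplified paired-end alignment record (hypothetical)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_span: int = 200, max_span: int = 600) -> str:
    """Label a read pair by the SV type its geometry would suggest.

    Span is approximated by the distance between leftmost positions;
    the thresholds are arbitrary example values.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"

    # Order the two mates by genomic position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion_candidate"      # mates point the same way
    if (left_strand, right_strand) == ("-", "+"):
        return "duplication_candidate"    # mates point away from each other

    span = right_pos - left_pos
    if span > max_span:
        return "deletion_candidate"       # mates farther apart than expected
    if span < min_span:
        return "insertion_candidate"      # mates closer than expected
    return "concordant"


pairs = [
    ReadPair("chr9", 100, "+", "chr22", 500, "-"),
    ReadPair("chr1", 1_000, "+", "chr1", 1_400, "-"),
    ReadPair("chr1", 1_000, "+", "chr1", 9_000, "-"),
    ReadPair("chr1", 1_000, "+", "chr1", 1_400, "+"),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

In practice a single discordant pair is weak evidence; callers cluster many pairs with consistent geometry and combine them with split-read and read-depth signals before reporting a variant.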

Performance Comparison: DNA-seq Platforms and Methodologies

DNA-seq Platform Performance Characteristics

Different DNA sequencing platforms offer distinct performance characteristics that significantly impact their effectiveness for structural variant detection. The following table summarizes key metrics across major sequencing platforms:

Table 1: Performance Comparison of DNA Sequencing Platforms for SV Detection

Platform | Read Length | Accuracy | Key Strengths for SV Detection | Primary Limitations
Illumina HiSeq/NovaSeq | Short (150-250 bp) | High (>99.9%) | Most consistent genome coverage; robust indel detection [12] | Limited in repetitive regions; short reads hamper complex SV resolution
PacBio HiFi | 10-25 kb | >99.9% (HiFi consensus) | Excellent for complex regions; high mapping accuracy; top SV detection performance [13] | Higher cost per genome; moderate throughput
Oxford Nanopore | Up to >1 Mb | ~98-99.5% (Q20+ chemistry) | Ultra-long reads resolve large SVs; portability; real-time analysis [13] | Historically lower accuracy (improving with recent chemistry)
Ion Torrent | Mid-length | Mid-accuracy | Fast turnaround; lower capital cost [12] | Higher error rates in homopolymers; moderate read lengths
BGISEQ-500/MGISEQ-2000 | Short | Low error rates | Competitive cost structure | Limited independent validation in clinical settings

The Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study comprehensively benchmarked these platforms, revealing that each exhibits particular strengths depending on the variant type and genomic context being interrogated [12]. Among short-read instruments, Illumina's HiSeq 4000 and X10 systems provided the most consistent, highest genome coverage, while NovaSeq 6000 using 2 × 250-bp read chemistry proved most robust for capturing known insertion/deletion events. For long-read platforms, PacBio circular consensus sequencing (CCS) demonstrated the highest reference-based mapping rate and lowest non-mapping rate, with both PacBio CCS and Oxford Nanopore technologies showing superior sequence mapping in repeat-rich areas and across homopolymers [12].

DNA-seq Method Performance in Clinical Detection

In clinical settings, the performance of DNA-seq methodologies varies significantly based on the specific application and variant type being targeted. The following table compares the detection capabilities across DNA-seq approaches for key structural variants:

Table 2: DNA-seq Method Performance for Oncogenic Fusion Detection in Clinical Samples

Method | Detection Rate for Known Fusions | Advantages | Limitations
Amplicon-based DNA/RNA-seq | 82.6% of theoretical fusion detection capability [14] | Streamlined workflow; cost-effective for targeted detection | Misses rare/novel fusions; limited by primer design
Hybridization-capture-based RNA-seq (reflex testing) | Additional ~10% yield over amplicon-based alone [14] | Improved rare/novel fusion detection; maximizes therapy eligibility | Requires secondary testing; increased cost and time
Short-read WGS | Variable depending on coverage and bioinformatics | Comprehensive genome-wide coverage | Misses complex rearrangements in repetitive regions
Long-read WGS | Highest for complex SVs | Resolves repetitive regions; phased variant calling | Higher cost; emerging clinical validation

A study of 1,211 non-small cell lung carcinoma specimens highlights these performance differences, showing that approximately 10% of cases required reflex hybridization-capture-based RNA sequencing after initial negative amplicon-based DNA/RNA sequencing [14]. In these reflex-tested cases, otherwise missed clinically actionable fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET were identified—none of which were detected by the initial amplicon-based assay. Analysis of the American Association for Cancer Research Project Genie database (v15.1) encompassing 20,900 NSCLC cases confirmed that while amplicon-based assays could theoretically detect 82.6% of known fusions, a significant minority require alternative approaches for identification [14].

DNA-seq Versus RNA-seq for Fusion Gene Detection

Comparative Detection Approaches

The detection of fusion genes represents a critical application of structural variant analysis in cancer genomics, with both DNA-seq and RNA-seq offering complementary approaches. The fundamental differences between these methodologies are illustrated in the following diagram:

[Diagram: DNA-seq vs. RNA-seq fusion detection.
DNA-seq approach: detects genomic rearrangements (translocations, inversions, deletions). Strengths: direct genomic evidence; identifies non-expressed fusions; reveals mechanism. Limitations: breakpoints often in long introns; repetitive regions challenging; functional impact uncertain.
RNA-seq approach: detects fusion transcripts (expressed chimeric RNAs). Strengths: confirms functional expression; identifies transcribed breakpoints; reveals splicing patterns. Limitations: misses non-expressed fusions; expression level dependent; transcriptional noise.]

Figure 2: DNA-seq vs. RNA-seq Approaches to Fusion Detection

DNA-seq identifies fusion genes by detecting the underlying genomic rearrangements that bring two separate genes into proximity, such as chromosomal translocations, inversions, or deletions. This approach provides direct evidence of the structural variant at the DNA level but faces challenges when breakpoints occur within long intronic regions or repetitive sequences, which are difficult to resolve with short-read technologies [4]. Additionally, DNA-seq cannot determine whether a genomic rearrangement produces a functionally expressed fusion transcript.

In contrast, RNA-seq detects the chimeric transcripts resulting from expressed fusion genes, providing direct evidence of functional consequences at the transcript level. This approach naturally focuses on clinically relevant expressed fusions and avoids the challenges of intronic breakpoint mapping. However, RNA-seq may miss genomic rearrangements that do not produce stable transcripts or those expressed at low levels, and it can be confounded by transcriptional noise or trans-splicing events [4].
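The way the two assays complement each other can be summarized as a small decision table. The function below is a sketch of the reasoning in the two preceding paragraphs only; the function name and the wording of the interpretations are assumptions made for illustration and do not constitute a clinical reporting rule.

```python
def interpret_fusion_evidence(dna_rearrangement: bool, rna_transcript: bool) -> str:
    """Map DNA-level and RNA-level findings to a hedged interpretation."""
    if dna_rearrangement and rna_transcript:
        return "expressed fusion: rearrangement and chimeric transcript both detected"
    if dna_rearrangement:
        return ("rearrangement only: fusion may be unexpressed, unstable, "
                "or expressed below the RNA-seq detection limit")
    if rna_transcript:
        return ("transcript only: breakpoint may lie in a long intron or repeat "
                "that DNA-seq could not resolve, or the signal may reflect "
                "transcriptional noise or trans-splicing")
    return "no fusion evidence in either assay"


for dna, rna in [(True, True), (True, False), (False, True), (False, False)]:
    print(f"DNA={dna!s:5} RNA={rna!s:5} -> {interpret_fusion_evidence(dna, rna)}")
```

The two discordant rows are where combined testing adds the most information, since each assay alone would give an incomplete or ambiguous answer.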

Clinical Performance Data

Comparative studies in clinical cohorts demonstrate the complementary value of DNA and RNA sequencing approaches. In an analysis of 806 acute myeloid leukemia samples, routine diagnostic methods (primarily karyotyping and FISH) identified 138 true fusions, with RNA-seq detecting 89.9% of these benchmark fusions [9]. Notably, the samples in which RNA-seq failed to detect fusion genes generally had lower and more inhomogeneous sequence coverage, particularly for genes including CBFB and KMT2A [9].

Long-read sequencing technologies have emerged as particularly powerful tools for fusion detection, as they can span complex rearrangement structures and provide complete transcript information. PacBio's HiFi sequencing enables full-length RNA isoform sequencing (Iso-Seq), which resolves complex fusions with precise breakpoints and complete sequence readouts of associated fusion transcripts [15]. Similarly, Oxford Nanopore technologies generate ultra-long reads capable of encompassing entire fusion transcripts in single sequencing reads [13]. Recent tools such as GFvoter, designed specifically for long-read transcriptome data, have demonstrated superior performance in fusion detection, achieving the highest F1 scores across multiple experimental datasets compared to alternative methods [8].

Table 3: Essential Research Reagent Solutions for DNA-seq SV Detection

Reagent/Resource | Function | Application Notes
High-molecular-weight DNA extraction kits | Preserve long DNA fragments for optimal SV detection | Critical for long-read sequencing; maintain DNA integrity
Fragment libraries | Prepare DNA for sequencing through fragmentation and adaptor ligation | Vary by platform; mechanical shearing common for WGS [11]
Hybridization capture baits | Enrich specific genomic regions for targeted sequencing | Enable focused SV detection in genes of interest
BLESS/DSBCapture/BLISS reagents | Map DNA double-strand breaks (DSBs) experimentally | Identify DSB-prone regions linked to SV formation [16]
Chromatin immunoprecipitation (ChIP) reagents | Profile protein-DNA interactions and histone modifications | Understand SV formation in chromatin context [16]
ATAC-seq reagents | Assess chromatin accessibility genome-wide | Correlate open chromatin with SV susceptibility [17]
BWA/Bowtie alignment tools | Map sequencing reads to reference genomes | Foundation for SV detection pipelines [4]
GATK/Samtools variant callers | Identify genetic variants from aligned reads | Detect SNVs/indels; prerequisite for some SV callers [4]
CNVnator/Lumpy/SVIM | Specifically detect structural variants | Specialized for different SV types and size ranges [13] [4]
ANNOVAR/VEP | Annotate functional consequences of variants | Prioritize potentially pathogenic SVs [4]

DNA sequencing provides an essential foundation for interrogating the genomic blueprint of structural variants, offering direct detection of chromosomal rearrangements at their origin. While multiple DNA-seq approaches exist—from targeted panels to whole-genome sequencing—each presents distinct advantages and limitations for comprehensive SV detection. The emergence of long-read sequencing technologies has significantly improved the resolution of complex structural variants, particularly in repetitive regions that challenge short-read platforms.

In the clinical data reviewed here, amplicon-based DNA/RNA sequencing could theoretically detect approximately 82.6% of known oncogenic fusions, and roughly 10% of cases in the NSCLC cohort went on to reflex hybridization-capture-based RNA-seq, which recovered actionable fusions the initial assay had missed [14]. This demonstrates the complementary nature of genomic and transcriptomic approaches for comprehensive fusion detection. As sequencing technologies continue to advance, with both PacBio HiFi and Oxford Nanopore platforms achieving increasingly higher accuracy and longer read lengths, the integration of DNA and RNA sequencing approaches will likely become standard practice in clinical diagnostics, ultimately expanding patient eligibility for targeted therapies and clinical trials through improved detection of rare and novel structural variants.

Gene fusions, hybrid molecules formed by the joining of two previously separate genes, represent a critical class of genomic alterations with profound implications in cancer research and therapeutic development. These chimeric entities typically arise from chromosomal rearrangements such as translocations, inversions, or deletions, and can function as powerful oncogenic drivers by activating proto-oncogenes or inactivating tumor suppressors. The detection of fusion transcripts has become indispensable for disease classification, risk stratification, and therapeutic decision-making, particularly with the growing availability of targeted therapies against fusion-driven cancers.

The transcriptome represents the complete set of RNA transcripts produced by the genome at any given time, providing a dynamic view of genetic activity. Within this landscape, RNA sequencing (RNA-seq) has emerged as a powerful methodology for capturing expressed fusion transcripts, offering distinct advantages over DNA-based approaches. While DNA sequencing reveals the structural blueprint of genetic alterations, RNA-seq directly interrogates the functional expression of these changes, which helps separate expressed fusions with oncogenic potential from silent rearrangements that may not contribute to tumorigenesis. This fundamental distinction positions RNA-seq as an essential tool for comprehensive fusion transcript characterization in both research and clinical settings.

RNA-seq Versus DNA-seq: A Fundamental Comparison for Fusion Detection

The choice between RNA-seq and DNA-seq for fusion detection hinges on their complementary strengths and limitations. DNA-based approaches, including whole-genome sequencing (WGS), can identify structural variants across the entire genome but face challenges in determining the functional consequences of these alterations. The breakpoints of fusion genes often occur within long intronic regions containing repetitive sequences, making them difficult to resolve and accurately identify using DNA-seq [4]. Furthermore, DNA-seq cannot distinguish between expressed, potentially oncogenic fusions and silent rearrangements that may not contribute to disease pathogenesis.

In contrast, RNA-seq directly sequences the transcriptome, capturing evidence of fusion transcripts that are actively expressed. This approach naturally enriches for exonic sequences and provides direct evidence of chimeric transcripts, bypassing the challenges posed by intronic regions. Additionally, RNA-seq can reveal the exact breakpoints at the transcript level and identify different fusion isoforms that may arise from the same genomic rearrangement [4]. The table below summarizes the key distinctions between these approaches for fusion detection:

Table: Comparison of DNA-seq and RNA-seq for Fusion Gene Detection

Feature | DNA-seq | RNA-seq
Target | Genomic DNA structure | Expressed RNA transcripts
Breakpoint Resolution | Challenging in repetitive intronic regions | Focused on exonic regions; precise transcript breakpoints
Functional Insight | Identifies structural variants without expression context | Directly detects expressed, potentially functional fusions
Fusion Isoforms | Limited ability to resolve different transcript isoforms | Can identify multiple fusion isoforms from same rearrangement
Coverage Requirements | Requires deep coverage across introns and exons | Naturally enriches for exonic sequences
Therapeutic Relevance | May detect silent rearrangements without functional impact | Prioritizes expressed fusions with potential clinical actionability
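The "Fusion Isoforms" row can be illustrated by grouping transcript-level junctions by gene pair, so that several exon-exon junctions from the same gene pair show up as distinct isoforms. The junction list and exon numbers below are illustrative placeholders, not results from any cited dataset, and `group_isoforms` is a hypothetical helper.

```python
from collections import defaultdict

# Each call: (5' gene, last 5' exon, 3' gene, first 3' exon).
# Exon numbers are illustrative only.
junction_calls = [
    ("EML4", 13, "ALK", 20),
    ("EML4", 6, "ALK", 20),
    ("EML4", 13, "ALK", 20),   # same junction observed again
    ("TMPRSS2", 1, "ERG", 4),
]


def group_isoforms(calls):
    """Collect the distinct exon-exon junctions seen for each gene pair."""
    isoforms = defaultdict(set)
    for gene5, exon5, gene3, exon3 in calls:
        isoforms[(gene5, gene3)].add((exon5, exon3))
    return {pair: sorted(junctions) for pair, junctions in isoforms.items()}


isoforms = group_isoforms(junction_calls)
for (gene5, gene3), junctions in isoforms.items():
    print(f"{gene5}-{gene3}: {len(junctions)} isoform(s) {junctions}")
```

A DNA-level call would typically report one genomic breakpoint pair for such a rearrangement, whereas the transcript-level view separates the junctions that are actually spliced together.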

Despite these advantages, RNA-seq has limitations, including its dependence on adequate RNA quality and quantity, and the challenge of detecting fusions involving genes with low expression levels. The most comprehensive approach often involves combining both DNA and RNA-level analyses to obtain a complete picture of genomic rearrangements and their functional consequences.

RNA-seq Methodologies: Experimental Approaches and Platforms

RNA-seq Workflow Fundamentals

The standard RNA-seq workflow begins with RNA extraction from patient samples, which can include fresh frozen tissue, formalin-fixed paraffin-embedded (FFPE) specimens, or cell lines. Due to RNA's inherent instability compared to DNA, careful preservation and extraction methods are critical to maintain RNA integrity. The extracted RNA undergoes reverse transcription to complementary DNA (cDNA), followed by library preparation and next-generation sequencing. Specific variations in library preparation methodology define the major RNA-seq approaches for fusion detection.

The following diagram illustrates the core workflow and decision points in RNA-seq for fusion transcript detection:

[Diagram: RNA-seq workflow for fusion transcript detection. Sample Collection (Blood, Tissue, FFPE) -> RNA Extraction & QC -> Library Preparation (Targeted Amplicon (PCR-based), Hybridization Capture, or Whole Transcriptome) -> Sequencing -> Bioinformatic Analysis -> Fusion Transcript Identification]

Targeted RNA-seq Approaches

Targeted RNA-seq methods focus sequencing power on specific genes of interest, offering enhanced sensitivity for detecting low-abundance fusion transcripts. Amplicon-based approaches utilize gene-specific primers to amplify targeted regions, making them particularly effective when prior knowledge of potential fusion partners exists. Studies have demonstrated that amplicon-based assays can achieve sensitivity of 93.3% and specificity of 100% for fusion detection [18] [19]. These methods typically employ unique molecular identifiers (UMIs) to mitigate PCR amplification biases and improve detection accuracy.
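To make the role of UMIs concrete, the following minimal Python sketch collapses reads that share a junction, an alignment start, and a UMI into a single molecule, so that PCR duplicates are not counted as independent support. The read tuples, the gene pair, and the function name `count_unique_molecules` are illustrative assumptions rather than the format of any specific assay. Production tools also tolerate sequencing errors within the UMI, which this sketch ignores.

```python
from collections import defaultdict

def count_unique_molecules(reads):
    """Count distinct (start, UMI) combinations per fusion junction.

    `reads` is an iterable of (junction_id, start_position, umi) tuples.
    Reads that agree on all three values are treated as PCR duplicates.
    """
    molecules = defaultdict(set)
    for junction_id, start, umi in reads:
        molecules[junction_id].add((start, umi))
    return {junction: len(keys) for junction, keys in molecules.items()}

# Hypothetical reads: five reads, but only three original molecules.
reads = [
    ("EML4-ALK", 1050, "ACGTAC"),
    ("EML4-ALK", 1050, "ACGTAC"),  # PCR duplicate of the read above
    ("EML4-ALK", 1050, "TTGCAA"),  # same start, different molecule
    ("EML4-ALK", 1062, "ACGTAC"),
    ("EML4-ALK", 1062, "ACGTAC"),  # PCR duplicate
]
unique_support = count_unique_molecules(reads)
print(unique_support)
```

Counting molecules rather than raw reads keeps a heavily amplified single template from looking like strong evidence for a fusion.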

Hybridization capture-based methods use complementary probes to enrich for target genes before sequencing. This approach offers greater flexibility for detecting novel fusion partners compared to amplicon-based methods. A recent study of non-small cell lung cancer specimens found that adding reflex hybridization capture-based RNA-seq identified actionable oncogenic fusions in approximately 10% of cases that were missed by initial amplicon-based testing [14]. These fusions involved clinically relevant genes including ALK, BRAF, NRG1, NTRK3, ROS1, and RET.

Whole Transcriptome and Long-Read Sequencing

Whole transcriptome sequencing provides an unbiased approach to fusion discovery by sequencing all expressed genes without prior selection. This method enables detection of novel fusion events without predetermined expectations about fusion partners but typically requires higher sequencing depth and more extensive bioinformatic analysis. Recent advances in long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore now enable full-length isoform sequencing, providing unprecedented resolution of fusion transcript structures [20]. These technologies are particularly valuable for resolving complex fusion isoforms and identifying fusions in single-cell transcriptomes.

Performance Benchmarking: Quantitative Comparisons Across Platforms

Detection Sensitivity and Specificity

Comparative studies provide critical insights into the performance characteristics of different RNA-seq approaches. In a comprehensive analysis of 806 acute myeloid leukemia samples, RNA-seq detected 90% of fusion events that were reported by routine diagnostic methods with high evidence, demonstrating strong concordance with established techniques [9]. The performance varied based on sequencing coverage, with samples exhibiting lower and inhomogeneous coverage showing reduced sensitivity, particularly for fusions involving CBFB and KMT2A.

A large-scale study comparing targeted RNA-seq with optical genome mapping (OGM) in 467 acute leukemia cases revealed an overall concordance rate of 88.1% for fusion detection [21]. The performance differed significantly based on fusion type: OGM uniquely detected 15.8% of clinically relevant rearrangements, while RNA-seq exclusively identified 9.4%. This highlights the complementary nature of different technologies, with RNA-seq demonstrating superior detection of expressed chimeric fusions, while OGM more effectively identified cryptic, enhancer-driven events that may not generate fusion transcripts.

Platform-Specific Performance Characteristics

Table: Performance Comparison of RNA-seq Fusion Detection Approaches

| Platform/Method | Sensitivity | Specificity | Key Strengths | Study Details |
|---|---|---|---|---|
| Targeted Amplicon (QIAseq) | 93.3% | 100% | Optimal for low-input samples; UMIs reduce false positives | 74 positive, 36 negative controls [19] |
| Hybridization Capture | ~90% (for reflex testing) | ~100% | Detects novel fusions; complements amplicon-based methods | Identified fusions in 10% of NSCLC cases missed by amplicon [14] |
| Whole Transcriptome | 89.9% | Varies by tools | Unbiased discovery; detects novel fusions | 806 AML samples; coverage-dependent [9] |
| Long-read Sequencing | Superior for complex isoforms | High with proper tools | Resolves full-length fusion structures | CTAT-LR-Fusion tool benchmarking [20] |

Bioinformatics Pipelines: From Raw Data to Fusion Calls

Computational Tools and Algorithms

The accurate identification of fusion transcripts from RNA-seq data requires sophisticated computational approaches. Current methods primarily fall into two categories: read-mapping approaches that align sequences to reference genomes or transcriptomes to identify discordant reads, and de novo assembly-based approaches that reconstruct transcripts before identifying chimeric sequences. Benchmarking studies have evaluated numerous fusion detection tools, with STAR-Fusion, Arriba, and STAR-SEQR consistently demonstrating high accuracy and fast performance for fusion detection on cancer transcriptomes [22].
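The read-mapping idea can be illustrated with a small Python sketch that tallies read pairs whose two mates align to different genes. It is a simplified assumption of how such evidence is gathered, not the algorithm of STAR-Fusion, Arriba, or any other named tool. The input format, the `min_support` threshold, and the function name are hypothetical, and gene order is ignored here even though real callers track the 5' and 3' partner.

```python
from collections import Counter

def find_candidate_fusions(read_pairs, min_support=2):
    """Tally read pairs whose mates map to two different genes.

    `read_pairs` holds (gene_of_mate1, gene_of_mate2) tuples, with None
    for a mate that did not map to an annotated gene.
    """
    support = Counter()
    for gene_mate1, gene_mate2 in read_pairs:
        if gene_mate1 is None or gene_mate2 is None:
            continue  # unannotated mate: no gene-level evidence
        if gene_mate1 != gene_mate2:
            # Sort so that (A, B) and (B, A) count toward the same candidate.
            support[tuple(sorted((gene_mate1, gene_mate2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical gene assignments for six read pairs.
read_pairs = [
    ("BCR", "ABL1"),
    ("ABL1", "BCR"),
    ("BCR", "BCR"),     # concordant pair, not evidence of a fusion
    ("TP53", "MYC"),    # single discordant pair, below the threshold
    ("BCR", "ABL1"),
    ("GAPDH", None),
]
candidates = find_candidate_fusions(read_pairs)
print(candidates)
```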

Performance varies significantly among tools, with mapping-based approaches generally outperforming assembly-based methods in terms of sensitivity. In simulated data benchmarks, Arriba, Pizzly, STAR-SEQR, and STAR-Fusion emerged as top performers, while methods requiring de novo transcriptome assembly exhibited high precision but suffered from comparably low sensitivity [22]. Fusion detection sensitivity is notably affected by fusion expression levels, with most tools performing better for moderately and highly expressed fusions.

Validation and Integration with Genomic Data

The high rate of false positives represents a significant challenge in fusion transcript detection, necessitating robust validation strategies. Integration with whole-genome sequencing (WGS) data provides orthogonal confirmation of fusion events at the DNA level. Recently developed pipelines for validating fusion transcripts in matched WGS data have demonstrated superior sensitivity and speed compared to established structural variant callers like BreakDancer and Manta [23]. These approaches use focused searches based on RNA-seq fusion predictions to identify supporting evidence in WGS data, significantly reducing computational requirements while maintaining high sensitivity.
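A focused search of this kind can be sketched in Python as follows. The sketch assumes that an RNA-seq prediction supplies approximate genomic coordinates for both partners and that WGS read pairs are available as mate positions. The coordinates, the `window` size, and the function name are hypothetical and do not reproduce the pipeline in [23]. Because genomic breakpoints may lie far from the transcript junction inside an intron, a real search window would need to reflect intron length.

```python
def supporting_wgs_pairs(prediction, wgs_pairs, window=5000):
    """Return WGS read pairs with one mate near each predicted breakpoint.

    prediction: ((chrom_a, pos_a), (chrom_b, pos_b)) from an RNA-seq call.
    wgs_pairs: list of ((chrom, pos), (chrom, pos)) mate alignments.
    """
    def near(locus, target):
        return locus[0] == target[0] and abs(locus[1] - target[1]) <= window

    side_a, side_b = prediction
    hits = []
    for mate1, mate2 in wgs_pairs:
        forward = near(mate1, side_a) and near(mate2, side_b)
        swapped = near(mate1, side_b) and near(mate2, side_a)
        if forward or swapped:
            hits.append((mate1, mate2))
    return hits

# Hypothetical prediction and WGS read pairs.
prediction = (("chr22", 23_290_000), ("chr9", 130_850_000))
wgs_pairs = [
    (("chr22", 23_291_200), ("chr9", 130_848_900)),  # supports the fusion
    (("chr9", 130_851_000), ("chr22", 23_288_500)),  # supports it, mates swapped
    (("chr22", 23_291_000), ("chr22", 23_291_400)),  # ordinary concordant pair
    (("chr1", 1_000), ("chr9", 130_850_100)),        # unrelated partner
]
hits = supporting_wgs_pairs(prediction, wgs_pairs)
print(len(hits))
```

Restricting the search to reads near predicted partners is what lets this style of validation skip genome-wide structural variant calling.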

The Scientist's Toolkit: Essential Research Reagents and Materials

Successful detection of fusion transcripts requires careful selection of laboratory reagents and computational resources. The following table outlines essential components of a robust fusion detection workflow:

Table: Essential Research Reagents and Materials for Fusion Transcript Detection

| Category | Specific Products/Tools | Function and Application Notes |
|---|---|---|
| RNA Extraction | miRNeasy Kit (Qiagen), miRNeasy FFPE Kit | Maintain RNA integrity; specialized protocols for FFPE samples |
| Library Prep | QIAseq RNAscan Custom Panel, Illumina TruSeq Stranded Total RNA | Target-specific vs. whole transcriptome approaches |
| rRNA Depletion | Ribo-Zero (Illumina) | Remove ribosomal RNA to enrich for mRNA targets |
| Target Enrichment | OSU-SpARKFuse custom probes, xGen Lockdown Probes | Hybridization capture for targeted sequencing |
| Sequencing | Illumina MiSeq, NextSeq; PacBio Sequel; Oxford Nanopore | Platform selection based on read length and accuracy needs |
| Bioinformatics | STAR-Fusion, Arriba, CTAT-LR-Fusion, SeekFusion | Fusion detection algorithms with varying performance characteristics |
| Validation | OncoScan FFPE Assay Kit, RT-PCR, Orthogonal WGS | Confirm fusion events identified by RNA-seq |

RNA-seq has established itself as an indispensable technology for capturing expressed fusion transcripts, providing critical functional insights that complement DNA-level structural information. The optimal approach to fusion detection depends on specific research objectives, sample characteristics, and available resources. Targeted methods offer high sensitivity for known fusions in challenging samples such as FFPE, while whole transcriptome and long-read approaches enable novel fusion discovery and isoform resolution. As sequencing technologies and computational methods continue to improve, RNA-seq is likely to remain central to advancing our understanding of fusion transcripts in cancer biology and therapeutic development.

In the field of cancer genomics, the accurate detection of fusion genes is crucial for diagnosis, prognosis, and guiding targeted therapies. Two primary sequencing approaches—DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq)—offer distinct technological pathways for this detection, each with fundamental differences in what they measure: genomic breakpoints versus transcript expression. DNA-seq identifies structural rearrangements at the DNA level, including the precise breakpoints in the genome where different genes have joined. In contrast, RNA-seq detects the RNA transcripts that are actually expressed from such rearrangements, revealing the functional fusion products [4] [24]. This guide provides an objective comparison of their performance, supported by experimental data and detailed methodologies.

Core Technological Comparison

The following table summarizes the fundamental differences between DNA-seq and RNA-seq in the context of fusion gene detection.

| Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq) |
|---|---|---|
| Detection Principle | Identifies structural rearrangements and breakpoints in the genome itself [4]. | Identifies chimeric transcripts that are expressed and spliced [4] [24]. |
| Molecular Target | Genomic DNA (including introns and exons) [4]. | Complementary DNA (cDNA) derived from processed mRNA (exons only) [4] [24]. |
| Key Advantage | Can detect rearrangements regardless of whether they are expressed as RNA [4]. | Directly confirms expression; avoids sequencing long introns by focusing on spliced exon junctions [4] [24]. |
| Key Challenge | Breakpoints often lie in long, repetitive intronic regions, making them difficult to cover and sequence [4]. | Requires high-quality RNA and sufficient expression of the fusion transcript for detection [4]. |

The diagram below illustrates the core logical relationship between what each technology detects and its corresponding output.

[Diagram: Fusion detection logical flow. Genomic DNA -> DNA Breakpoint in Intron/Exon -> DNA-Seq Detects Breakpoint. DNA Breakpoint in Intron/Exon -> Transcribed Fusion RNA -> Spliced Fusion mRNA (Introns Removed) -> RNA-Seq Detects Exon-Exon Junction]

Performance and Clinical Validation Data

Empirical studies and large-scale clinical validations consistently demonstrate the performance characteristics of each method. The following table quantifies their relative strengths and limitations.

| Study & Context | Key Finding on DNA-seq | Key Finding on RNA-seq | Experimental Detail |
|---|---|---|---|
| Tempus (Real-World, n=~80k) [25] | Detected only 4.8% of actionable fusions exclusively. | Detected 29.1% of actionable fusions exclusively, a ~6x increase over DNA-seq alone. | Assay: Tempus xT (DNA-seq panel for 21 fusions + whole-exome RNA-seq). |
| Targeted RNA-seq (Clinical Cohort) [26] | N/A (Compared to FISH/RT-PCR). | Increased overall diagnostic rate from 63% (conventional methods) to 76%. | Assay: Custom targeted RNA-seq panels for hematological and solid tumors. |
| FFPE Tumor Validation [27] | A DNA panel missed a MET fusion (false negative). | The RNA-seq assay identified the MET fusion and 26 extra fusions; 77% were validated. | Sample: Formalin-Fixed, Paraffin-Embedded (FFPE) tumor samples. |
| Acute Myeloid Leukemia Study [9] | Routine diagnostics (karyotyping, FISH, PCR) identified 107/138 fusions with high evidence. | Detected 115/138 fusions with high evidence, showing strong concordance and complementary value. | Sample: 806 patient samples; Tools: Arriba and FusionCatcher. |
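The figures in the table can be turned into directly comparable rates with a few lines of arithmetic. The Python sketch below uses only the counts and percentages quoted above; the variable names are illustrative.

```python
def detection_rate(detected, total):
    """Fraction of known fusions that a method reported."""
    return detected / total

# Acute myeloid leukemia study [9]: 138 fusions with high evidence.
aml_total = 138
routine_rate = detection_rate(107, aml_total)
rnaseq_rate = detection_rate(115, aml_total)

# Tempus cohort [25]: share of actionable fusions found by one assay only.
rna_only_percent = 29.1
dna_only_percent = 4.8
exclusive_ratio = rna_only_percent / dna_only_percent

print(f"Routine diagnostics: {routine_rate:.1%}")
print(f"RNA-seq:             {rnaseq_rate:.1%}")
print(f"RNA-only vs DNA-only detections: {exclusive_ratio:.1f}x")
```

Expressed this way, routine diagnostics recovered about 77.5% and RNA-seq about 83.3% of the high-evidence AML fusions, and the roughly sixfold Tempus figure follows from 29.1 / 4.8.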

Detailed Experimental Protocols

DNA-Seq for Fusion Detection

The DNA-based approach focuses on identifying the genomic locus where a chromosomal rearrangement has occurred.

Workflow Overview:

  • DNA Extraction & Library Prep: Genomic DNA is extracted from cells or tissues (e.g., blood, saliva, FFPE tissue). It is then fragmented, and sequencing adapters are ligated [4].
  • Sequencing: The library is sequenced using panels (targeted), whole exome (WES), or whole genome (WGS) approaches. For fusions, WGS is most comprehensive but also the most costly [4].
  • Alignment & Breakpoint Calling: Reads are aligned to a reference genome (e.g., using BWA, Bowtie2). Structural variant callers (e.g., Manta, DELLY) then look for discordant read pairs and split reads that indicate a structural variant, pinpointing the breakpoint coordinates [4].
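The discordant-pair signal mentioned in the last step can be sketched as a simple rule: a pair is suspicious when its mates map to different chromosomes or much farther apart than the library's fragment size allows. The `ReadPair` class, the `max_insert` threshold, and the coordinates below are hypothetical. Real callers also use read orientation, mapping quality, and split-read evidence.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def is_discordant(pair, max_insert=1000):
    """Flag pairs on different chromosomes or spaced beyond max_insert."""
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_insert

# Hypothetical alignments.
pairs = [
    ReadPair("chr9", 1_000, "chr9", 1_350),    # normal fragment
    ReadPair("chr9", 1_000, "chr22", 500),     # interchromosomal
    ReadPair("chr2", 1_000, "chr2", 250_000),  # mates too far apart
]
flags = [is_discordant(p) for p in pairs]
print(flags)
```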

Limitations: The fundamental challenge is that breakpoints for gene fusions often occur within long intronic regions. These regions are difficult to cover with sufficient sequencing depth, and their repetitive nature complicates accurate alignment and variant calling [4].

RNA-Seq for Fusion Detection

The RNA-based approach skips the DNA breakpoint and instead focuses on the expressed, spliced mRNA product of the fusion gene.

Workflow Overview:

  • RNA Extraction & QC: RNA, which is more prone to degradation than DNA, is extracted from the sample. Quality control (e.g., the RNA integrity number, RIN) is critical [28].
  • Library Preparation: The extracted RNA is reverse-transcribed into cDNA. A key decision is whether to use poly(A) selection to enrich for mRNA or ribosomal depletion to capture a broader set of RNA species. Strand-specific protocols are preferred as they preserve the directionality of the transcript [28].
  • Sequencing: Standard short-read (e.g., Illumina) or long-read (e.g., PacBio, Nanopore) platforms are used. Targeted RNA-seq can be employed to enrich for genes of interest, increasing sensitivity for low-expression fusions [26].
  • Fusion Calling: Reads are aligned to a reference genome/transcriptome (e.g., using STAR, HISAT2). Fusion detection tools (e.g., FusionCatcher, Arriba, STAR-Fusion) are then used to identify reads that span the novel exon-exon junction created by the fusion event [29] [9] [26].
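One common way to raise confidence after the fusion-calling step is to run two callers and keep only fusions that both report with adequate read support. The Python sketch below assumes each caller's output has already been reduced to a dictionary of fusion name to supporting read count. That format, the `min_reads` threshold, the placeholder names, and the counts are hypothetical and do not mirror the native output of any tool.

```python
def consensus_fusions(calls_a, calls_b, min_reads=3):
    """Keep fusions reported by both callers with enough support in each."""
    shared = set(calls_a) & set(calls_b)
    return sorted(
        fusion for fusion in shared
        if calls_a[fusion] >= min_reads and calls_b[fusion] >= min_reads
    )

# Hypothetical supporting-read counts from two callers.
caller_a = {"KMT2A--MLLT3": 12, "RUNX1--RUNX1T1": 8, "GENEX--GENEY": 2}
caller_b = {"KMT2A--MLLT3": 9, "RUNX1--RUNX1T1": 2, "CBFB--MYH11": 5}

high_confidence = consensus_fusions(caller_a, caller_b)
print(high_confidence)
```

Requiring agreement trades some sensitivity for precision, so calls seen by only one tool are usually reviewed rather than discarded outright.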

[Diagram: RNA-seq fusion detection workflow. Wet-lab steps: Extract Total RNA -> RNA Quality Control (Check RIN Score) -> Library Prep: Reverse Transcribe to cDNA, Enrich (polyA/deplete) -> Sequence (Illumina, PacBio, Nanopore). Bioinformatic analysis: Quality Control & Read Trimming (FastQC, Trimmomatic) -> Alignment to Reference (STAR, HISAT2, Minimap2) -> Fusion Calling & Filtering (FusionCatcher, Arriba, STAR-Fusion) -> Visualization & Validation]

The Scientist's Toolkit

Successful detection and validation of fusion genes require a combination of laboratory and computational resources. The following table lists essential solutions and their functions.

| Research Reagent / Tool | Function / Application |
|---|---|
| Arriba & FusionCatcher | Widely used, state-of-the-art fusion detection software tools that are often used in conjunction for high-confidence calling [9] [26]. |
| STAR-Fusion | Another accurate and widely used fusion detection algorithm, based on the STAR aligner [26]. |
| Targeted RNA-seq Panels | Biotinylated oligonucleotide probes designed to enrich for hundreds of known fusion-related genes, dramatically increasing sensitivity for low-expression fusions and enabling work with degraded samples [26]. |
| FFPE-RNA Extraction Kits | Specialized reagents for extracting usable RNA from Formalin-Fixed, Paraffin-Embedded (FFPE) tissue blocks, the most common form of clinical archiving [27]. |
| Spike-in Control RNAs | Synthetic RNA controls (e.g., ERCC, fusion sequins) spiked into samples to quantitatively evaluate the sensitivity, accuracy, and limit of detection of the entire RNA-seq workflow [26]. |
| Long-read Aligners (Minimap2) | Essential software for aligning data from long-read sequencing technologies (PacBio, Nanopore), which is crucial for tools like GFvoter [8]. |

The choice between DNA-seq and RNA-seq for fusion gene detection is not a matter of one being universally superior, but rather of understanding their complementary strengths. DNA-seq is unparalleled in identifying the genomic architecture and breakpoints of structural rearrangements. However, for confirming the expression of a functionally consequential fusion transcript with high sensitivity and clinical actionability, RNA-seq has demonstrated a clear and significant advantage. The most robust clinical and research practice is to utilize these technologies in tandem, where DNA-seq provides the structural context and RNA-seq delivers functional validation of the expressed fusion, ensuring the most comprehensive and accurate detection for precision oncology.

The diagnosis of gene fusions, critical drivers in cancer, has historically relied on traditional molecular techniques such as fluorescence in situ hybridization (FISH) and reverse transcription polymerase chain reaction (RT-PCR). Though highly sensitive, these methods are typically limited to testing for a single fusion gene per assay, often resulting in a lengthy, iterative, and costly diagnostic path. Furthermore, they are unable to identify novel fusion gene partners or resolve complex structural rearrangements, with false-negative results from non-tested fusions being a leading cause of misdiagnosis in haematological cancers [26]. The advent of next-generation sequencing (NGS) has fundamentally transformed this landscape by enabling genome-wide surveillance of fusion genes with nucleotide-level resolution. Among NGS approaches, a key distinction exists between DNA-sequencing (DNA-seq) and RNA-sequencing (RNA-seq) methods, each with unique strengths and limitations for fusion detection. This guide compares the performance of these platforms for fusion detection.

Methodological Comparison: DNA-Seq vs. RNA-Seq for Fusion Detection

DNA-seq and RNA-seq assays employ distinct laboratory methods and bioinformatic pipelines to identify gene fusions. DNA-based NGS (including whole-genome, whole-exome, or targeted panels) detects rearrangements at the genomic DNA level by identifying sequencing reads that span breakpoints between different genes or chromosomal regions. In contrast, RNA-based NGS detects the expressed transcript resulting from a gene fusion, effectively capturing the chimeric RNA molecule. Common RNA-seq enrichment methods include anchored multiplex PCR (AMP), amplicon-based multiplex PCR, and hybrid capture-based enrichment [30] [26].

Table 1: Core Methodological Differences Between DNA-seq and RNA-seq for Fusion Detection

| Feature | DNA-Sequencing (DNA-seq) | RNA-Sequencing (RNA-seq) |
|---|---|---|
| Target Molecule | Genomic DNA | Messenger RNA (transcriptome) |
| Detection Principle | Identifies structural rearrangements and breakpoints in the DNA sequence | Identifies chimeric fusion transcripts |
| Key Enrichment Methods | Hybrid-capture, Amplicon | Anchored Multiplex PCR, Hybrid-capture, Amplicon |
| Ability to Detect Novel Partners | Limited to targeted genomic regions; can be challenging | High, especially with anchored multiplex or hybrid-capture methods |
| Confirmation of Expression | No; identifies potential but not necessarily expressed fusions | Yes; directly confirms the fusion is transcribed |
| Influence of Gene Expression | Independent of expression level | Dependent on transcript abundance |

A critical advancement in the diagnostic workup is the use of reflex testing protocols. Studies in non-small-cell lung carcinoma (NSCLC) have demonstrated that an algorithm using an initial amplicon-based DNA/RNA test, followed by reflex hybridization-capture–based RNA-seq for negative cases, significantly improves the detection of rare and novel oncogenic fusions, thereby maximizing patient eligibility for targeted therapies [14].
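The reflex logic described here amounts to a short decision procedure, sketched below in Python. The function name, the callable used to stand in for the second assay, and the example findings are hypothetical. The sketch only encodes the order of testing, not any laboratory's reporting rules.

```python
def reflex_workflow(initial_result, run_reflex_assay):
    """Return (driver, source) following an amplicon-first, reflex-second order.

    initial_result: driver found by the initial amplicon-based DNA/RNA test,
        or None when that test is negative.
    run_reflex_assay: zero-argument callable that performs the reflex
        hybrid-capture RNA-seq and returns a fusion or None.
    """
    if initial_result is not None:
        return initial_result, "initial amplicon-based DNA/RNA test"
    reflex_result = run_reflex_assay()  # only run for driver-negative cases
    if reflex_result is not None:
        return reflex_result, "reflex hybrid-capture RNA-seq"
    return None, "no driver detected"

# Hypothetical cases.
case_a = reflex_workflow("EGFR L858R", lambda: None)
case_b = reflex_workflow(None, lambda: "CD74-NRG1")
case_c = reflex_workflow(None, lambda: None)
print(case_a, case_b, case_c, sep="\n")
```

Passing the reflex assay as a callable reflects the point of the algorithm: the second, broader test is performed only when the first one finds nothing.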

Performance Benchmarking: Sensitivity, Concordance, and Unique Detections

Head-to-head comparative studies reveal that RNA-seq and DNA-seq platforms are largely complementary, with each method uniquely detecting a subset of clinically significant rearrangements.

Concordance and Discrepancies in Acute Leukemia

A large-scale study of 467 acute leukemia cases directly compared a 108-gene targeted RNA-seq panel with Optical Genome Mapping (OGM), a DNA-level structural variant mapping technique. The results demonstrated an overall concordance rate of 88.1% for clinically relevant events [21]. However, each method contributed unique findings [21]:

  • OGM (DNA-level) uniquely detected 15.8% of rearrangements. These were predominantly enhancer-hijacking lesions (e.g., involving MECOM, BCL11B, and IGH), which do not generate fusion transcripts and are therefore invisible to RNA-seq. Concordance for this class of aberrations was very low (20.6%).
  • RNA-seq uniquely detected 9.4% of rearrangements. It showed slightly better performance for fusions arising from intrachromosomal deletions, which were sometimes misinterpreted by OGM as simple deletions.
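The shared and method-specific categories above map naturally onto set operations. The Python sketch below uses invented event labels to show the bookkeeping. It does not reproduce the study's data or its exact definition of concordance, which the source does not spell out.

```python
def compare_calls(ogm_events, rnaseq_events):
    """Split events into shared and method-specific groups."""
    ogm, rna = set(ogm_events), set(rnaseq_events)
    return {
        "concordant": sorted(ogm & rna),
        "ogm_only": sorted(ogm - rna),
        "rnaseq_only": sorted(rna - ogm),
    }

# Hypothetical calls for a handful of cases.
ogm_events = ["BCR::ABL1", "KMT2A::AFF1", "MECOM rearrangement", "IGH rearrangement"]
rnaseq_events = ["BCR::ABL1", "KMT2A::AFF1", "NUP98::NSD1"]

comparison = compare_calls(ogm_events, rnaseq_events)
for group, events in comparison.items():
    print(group, events)
```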

Detection of Gene Fusions in Solid Tumors

In solid tumors, RNA-seq has proven highly effective, particularly for biomarker-driven therapies. For example, in the detection of NTRK fusions, which are the target of FDA-approved TRK inhibitors, RNA-seq is one of the most sensitive methods [30]. A comparative study of three RNA-seq chemistries found that while amplicon-based multiplex PCR had the lowest limit of detection, both hybrid-capture and anchored multiplex PCR methods were superior for detecting NTRK fusions with uncommon or novel partners [30].

Table 2: Performance Comparison of RNA-seq Assay Types for Fusion Detection

| Performance Metric | Amplicon-Based Multiplex PCR | Anchored Multiplex PCR (AMP) | Hybrid-Capture-Based |
|---|---|---|---|
| Analytical Sensitivity | Highest (Lowest Limit of Detection) | High | High |
| Ability to Detect Novel/Uncommon Partners | Limited | High | High |
| Example Clinical Utility | Detecting known, common fusions | Discovery; complex rearrangements | Comprehensive profiling; reflex testing |

The analytical performance of RNA-seq has been rigorously validated for clinical use. One study developed an RNA-seq assay for formalin-fixed, paraffin-embedded (FFPE) tumors, demonstrating it could identify all spiked-in NTRK fusions from reference material and achieved a detection limit down to 10% tumor content in dilution experiments [27]. The assay showed 83.3% sensitivity against a DNA panel and successfully identified additional fusions not covered by the DNA assay [27].

Experimental Protocols and Workflows

Detailed Methodologies from Cited Studies

To ensure reproducibility, below are the core experimental protocols from key studies cited in this guide.

Targeted RNA-seq for Fusion Detection in Leukemia (from [21])

  • RNA Extraction: RNA is extracted from peripheral blood or bone marrow aspirate specimens.
  • Library Preparation: The Anchored Multiplex PCR (AMP) method is used for target enrichment. This chemistry utilizes unidirectional gene-specific primers (GSP2) targeting at least one of the two gene partners involved in a translocation, enabling the capture of novel fusion partners.
  • Sequencing: Amplified targets are sequenced using bidirectional sequencing on an Illumina sequencer. Sequencing reads are aligned to the human reference genome GRCh37/hg19.
  • Fusion Calling: Fusion transcripts are identified using Archer Analysis Software v6.2.7.
  • Variant Interpretation: Results are classified according to ACMG/ClinGen and AMP/ASCO/CAP guidelines into tiers based on clinical relevance.

Validation of an RNA-seq Assay for FFPE Tumors (from [27])

  • Sample Type: Formalin-fixed, paraffin-embedded (FFPE) tumor samples.
  • Performance Assessment:
    • Spike-in Recovery: Identified all 15 spiked-in NTRK fusions from RNA reference material.
    • Cell Line Validation: Detected six known fusions from five cancer cell lines.
    • Limit of Detection (LOD): Assessed using serial dilutions of RNA from the H2228 cell line, determining LOD at 10% tumor content.
    • Reproducibility: Good intra-assay and inter-assay reproducibility was observed in three specimens.
    • Clinical Validation: Tested against a DNA-based NGS panel in clinical specimens.
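A limit-of-detection readout from a dilution series can be sketched as follows: walk down from the highest tumor fraction and report the lowest level at which every replicate still detected the fusion. The replicate outcomes below are invented for illustration and are not the study's data. The all-replicates rule and the function name are also assumptions, since validation plans often use a hit-rate criterion such as 95% detection instead.

```python
def limit_of_detection(dilution_results):
    """Lowest tumor fraction detected in all replicates without a gap above it.

    dilution_results maps tumor fraction -> list of booleans, one per replicate.
    Returns None when even the highest fraction was not fully detected.
    """
    lod = None
    for fraction in sorted(dilution_results, reverse=True):
        if all(dilution_results[fraction]):
            lod = fraction
        else:
            break  # stop at the first level with a missed replicate
    return lod

# Hypothetical serial dilution with three replicates per level.
dilution_results = {
    0.50: [True, True, True],
    0.25: [True, True, True],
    0.10: [True, True, True],
    0.05: [True, False, False],
}
lod = limit_of_detection(dilution_results)
print(lod)
```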

Workflow Diagram: Reflex Testing Algorithm in NSCLC

The following diagram illustrates the integrated DNA/RNA-seq reflex testing workflow used to maximize fusion detection in non-small-cell lung cancer, as described in the research [14].

[Diagram: Reflex testing algorithm in NSCLC. NSCLC Specimen Received -> Initial Amplicon-Based DNA/RNA Sequencing -> Oncogenic Driver Detected? Yes: Driver Identified, Informs Targeted Therapy. No: Reflex to Hybrid-Capture Based RNA Sequencing -> Rare/Novel Fusion Identified, Maximizes Clinical Eligibility]

Successful implementation of NGS-based fusion detection requires a suite of specialized reagents, kits, and computational tools.

Table 3: Key Research Reagent Solutions for NGS-Based Fusion Detection

| Item | Function/Description | Example Kits/Tools |
|---|---|---|
| RNA Extraction Kits | Isolate high-quality RNA from cell, tissue, or FFPE samples. | QIAGEN RNeasy Kit [31], TRIzol-based methods [31] |
| Library Prep Kits | Prepare sequencing libraries from extracted RNA. | Illumina TruSeq mRNA stranded, NEBNext Ultra II RNA, Lexogen QuantSeq-Pool, Alithea MERCURIUS BRB-seq [31] |
| Target Enrichment | Enrich for target genes/transcripts prior to sequencing. | Archer AMP Kit (Anchored Multiplex PCR) [21], Hybrid-capture probes [26] |
| Sequencing Platforms | Instruments to perform high-throughput sequencing. | Illumina NovaSeq 6000, Illumina MiSeq [31] [32] |
| Bioinformatics Tools | Align sequences, detect fusions, and interpret variants. | STAR aligner [31], Arriba [7], STAR-Fusion, FusionCatcher [7] [26], Archer Analysis [21] |
| RNA Quality Control | Assess RNA integrity and quantity prior to library prep. | Agilent Bioanalyzer RNA-6000-Nano chip [31] |

Cost and Operational Considerations

The cost of RNA-sequencing varies significantly based on the library preparation method and sequencing depth. A detailed breakdown shows that library preparation is often the most expensive step [31]. When using a high-throughput NovaSeq S4 flow cell at full capacity, total costs per sample (excluding labor) can range from approximately $37 (using a highly multiplexed kit like BRB-seq at 5M reads) to $114 (using Illumina's TruSeq kit at ≥25M reads) [31]. Core facility pricing from Northwestern University provides a commercial benchmark, listing mRNA-seq complete services (library prep, sequencing, and standard bioinformatics) at $380 per sample for institutional users [32]. These figures highlight that while NGS has become more accessible, budgeting must carefully consider the trade-offs between cost, sequencing depth, and the comprehensiveness of the assay.
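For budgeting, the per-sample figures quoted above can be scaled to a cohort with a short calculation. The Python sketch below uses only those quoted prices; the cohort size of 96 is an arbitrary example. The first two figures exclude labor while the core facility price bundles library prep, sequencing, and standard bioinformatics, so the totals are not a like-for-like comparison.

```python
# Per-sample costs in USD as quoted in the text [31] [32].
COST_PER_SAMPLE_USD = {
    "BRB-seq, 5M reads (excl. labor)": 37,
    "TruSeq, >=25M reads (excl. labor)": 114,
    "Core facility mRNA-seq (full service)": 380,
}

def cohort_cost(n_samples, cost_table=COST_PER_SAMPLE_USD):
    """Total cost per option for a cohort of n_samples."""
    return {option: price * n_samples for option, price in cost_table.items()}

totals = cohort_cost(96)  # example cohort size, chosen arbitrarily
for option, total in totals.items():
    print(f"{option}: ${total:,}")
```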

The evolution from FISH and PCR to NGS platforms has fundamentally changed the diagnostic landscape for gene fusions. The evidence reviewed here indicates that DNA-seq and RNA-seq are not competing but complementary technologies. DNA-level methods like OGM are superior for detecting structural rearrangements that may not result in fusion transcripts, such as enhancer-hijacking events. Conversely, RNA-seq directly confirms the expression of a chimeric fusion, is more sensitive for fusions arising from intrachromosomal deletions, and excels at identifying novel fusion partners, making it indispensable for comprehensive biomarker testing. The most effective modern diagnostic algorithms, therefore, leverage the strengths of both approaches, often through reflexive testing protocols. As sequencing costs continue to decline and bioinformatic tools like Arriba [7] improve in speed and accuracy, the integration of multi-modal NGS testing is likely to become standard practice, helping patients receive a more precise diagnosis and access to targeted therapies.

Practical Applications in Research and Clinical Diagnostics

The detection of gene rearrangements, such as those producing oncogenic fusions, represents a critical component of precision oncology and genetic disease diagnosis. However, the presence of large intronic regions (stretches of non-coding DNA that can span tens to hundreds of kilobases) poses a formidable challenge for conventional DNA sequencing (DNA-seq) technologies. These intronic regions often contain breakpoints where structural rearrangements occur, yet their length and repetitive nature can obscure detection using standard approaches. While DNA-seq provides essential information about genomic architecture, its limitations in resolving breakpoints within extensive intronic sequences have driven the development of complementary technologies, most notably RNA sequencing (RNA-seq).

The fundamental challenge lies in the technological constraints of most widely-used DNA-seq platforms. Short-read sequencing, while excellent for identifying single nucleotide variants and small insertions/deletions, struggles to span large intronic regions where breakpoints may reside. This limitation becomes clinically significant when rearrangements in these regions produce functionally important fusion genes or disrupt normal gene function. Consequently, understanding the specific scenarios where DNA-seq succeeds versus when it requires augmentation from other methods is essential for researchers and clinicians designing diagnostic approaches.

Technological Limitations of DNA-Seq for Intronic Breakpoints

The Intronic Blind Spot in DNA Sequencing

Conventional DNA-seq approaches face several inherent limitations when targeting rearrangements with breakpoints in large intronic regions. The primary issue stems from library preparation methods and read length constraints. Most targeted DNA-seq panels use hybrid capture or amplicon-based approaches designed to cover exonic regions and occasionally their immediate flanking sequences. This design inevitably creates gaps in coverage across large introns, resulting in an inability to detect breakpoints occurring in these under-covered regions [14].
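The coverage gap can be made concrete with a small Python sketch that builds capture targets from exon coordinates plus a flanking margin and then asks whether a breakpoint falls inside any target. The exon coordinates, the 50 bp flank, and the function names are hypothetical and are not taken from any real panel design.

```python
def capture_intervals(exons, flank=50):
    """Extend each (start, end) exon by a flanking margin on both sides."""
    return [(start - flank, end + flank) for start, end in exons]

def is_covered(position, intervals):
    """True when the position lies inside at least one capture interval."""
    return any(start <= position <= end for start, end in intervals)

# Hypothetical gene with two exons separated by a ~24 kb intron.
exons = [(1_000, 1_200), (25_000, 25_150)]
targets = capture_intervals(exons)

breakpoint_near_exon = is_covered(1_225, targets)     # inside the flank
breakpoint_deep_intron = is_covered(12_000, targets)  # mid-intron
print(breakpoint_near_exon, breakpoint_deep_intron)
```

Only the breakpoint close to an exon is sequenced; the mid-intron breakpoint is invisible to this design unless the panel also tiles probes across the intron.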

The fundamental detection challenge arises because a breakpoint can fall anywhere within an intron that may span tens of kilobases, so an assay must cover the entire intron to be sure of sequencing it. With short-read sequencing (typically 75-300 bp), a rearrangement is recognized only from reads that cross the junction (split reads) or from read pairs whose mates flank it (discordant pairs). Paired-end sequencing therefore provides useful context, but inferring precise breakpoint locations remains challenging when they fall within the repetitive or low-complexity sequences common in intronic regions, where short reads often cannot be placed uniquely [13].

Additionally, the bioinformatic pipelines used to identify structural variants from DNA-seq data often rely on discordant read pairs and split reads as signals of rearrangement events. For breakpoints in large introns, especially those with repetitive elements, these signatures can be difficult to distinguish from mapping artifacts or technical noise. The problem is particularly pronounced for complex rearrangements involving multiple breakpoints, where the linear distance between genomic features further complicates accurate reconstruction [33].

DNA-Seq Versus RNA-Seq: Fundamental Differences in Detection Principles

The table below summarizes the core methodological differences between DNA-seq and RNA-seq approaches for rearrangement detection:

Table 1: Core Methodological Differences Between DNA-seq and RNA-seq for Rearrangement Detection

Feature | DNA-Seq Approach | RNA-Seq Approach
Target Material | Genomic DNA | Processed messenger RNA
Breakpoint Detection | Direct detection of genomic breakpoints | Detection of expressed fusion transcripts
Intronic Region Impact | Limited by intron size and repetitive elements | Introns removed during RNA processing
Functional Relevance | Identifies structural variants regardless of functional impact | Confirms expression of fusion products
Coverage Requirements | Requires continuous coverage across potential breakpoint regions | Requires coverage of exon boundaries
Novel Partner Discovery | Limited to designed target regions | Can identify novel partners via untargeted methods

DNA-seq identifies structural variants at the genomic level by directly sequencing DNA and looking for abnormalities in sequence arrangement. In contrast, RNA-seq detects the transcriptional consequences of these rearrangements—specifically, the fusion transcripts that result from chromosomal rearrangements [34]. This fundamental difference explains their complementary strengths: DNA-seq can potentially identify all structural variants regardless of their functional consequences, while RNA-seq confirms which variants are actually expressed and likely functionally relevant.

For intronic breakpoints specifically, RNA-seq possesses a distinct advantage because RNA splicing removes introns as pre-mRNA matures into mRNA. Consequently, RNA-seq only needs to sequence across exon-exon junctions, bypassing the large intronic regions that hamper DNA-seq approaches [35]. This enables RNA-seq to detect fusion events regardless of the genomic distance or complexity between partner genes, provided the fusion is expressed at detectable levels.
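A small sketch can make the splicing argument concrete. It assumes a hypothetical forward-strand 5' partner gene described only by its exon end coordinates (exon 3 is taken to start shortly before 60,000), and shows that breakpoints scattered across one large intron all map to the same exon junction in the transcript. The coordinates and the function name are illustrative, not drawn from the cited studies.

```python
from bisect import bisect_right

# Hypothetical 5' partner gene: genomic end coordinate of each exon (forward strand).
# Intron 2, between exon 2 and exon 3, is deliberately large.
EXON_ENDS = [1_200, 4_800, 60_000]


def last_retained_exon(exon_ends, breakpoint):
    """Return the 1-based number of the last exon lying wholly upstream of the breakpoint."""
    return bisect_right(exon_ends, breakpoint)


# Three different genomic breakpoints, all inside intron 2.
breakpoints = [10_000, 25_000, 55_000]
junction_exons = {bp: last_retained_exon(EXON_ENDS, bp) for bp in breakpoints}
print(junction_exons)
```

A DNA assay would need coverage at each of the three positions to see these events, whereas an RNA assay only needs reads across the single junction that follows exon 2.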

Comparative Performance Data: DNA-Seq Versus Alternative Methods

Clinical Study Findings in Oncology

Recent clinical studies directly comparing DNA-seq and RNA-seq performance have quantified the detection gap for rearrangements with challenging genomic architectures. In non-small cell lung cancer (NSCLC), a study of 1,211 specimens found that approximately 10% of cases required reflex testing with hybridization-capture-based RNA-seq after initial amplicon-based DNA/RNA sequencing yielded negative results despite clinical suspicion. Among these reflex cases, oncogenic fusions involving genes including ALK, BRAF, NRG1, NTRK3, ROS1, and RET were identified—none of which were detected by the initial amplicon-based assay [14].

A focused investigation of RET fusions in early-stage NSCLC provided further insight into method-specific sensitivities. In this study, DNA-seq successfully identified putative RET+ cases, but the subsequent RNA-seq analysis demonstrated enhanced detection capabilities. Targeted RNA-seq specifically uncovered five additional RET+ cases that were missed by whole-transcriptome sequencing, highlighting both the value of RNA-based detection and the performance differences between RNA-seq approaches [36] [37]. The concordance rates between methods were notably high but imperfect: 92.3% between DNA-seq and RNA-seq, and 82.5% between DNA-seq and FISH, underscoring that each method captures a slightly different subset of rearrangements [36].

The performance gap varies significantly across cancer types and specific genes. In acute leukemia, a comprehensive comparison of targeted RNA-seq and optical genome mapping (OGM) in 467 cases revealed an overall concordance of 88.1% for fusion detection. However, the detection rates were highly variable, with RNA-seq uniquely identifying 9.4% of clinically relevant rearrangements, while OGM exclusively detected 15.8% [21]. This suggests that the optimal testing approach may need to be tailored to specific clinical contexts and target genes.

Table 2: Clinical Detection Rates of Oncogenic Fusions Across Methodologies

Study Context | DNA-Seq Detection Rate | RNA-Seq Detection Rate | Key Findings
NSCLC (n=1,211) [14] | ~90% of fusions (estimated from database review) | Identified 100% of fusions in reflex cohort | 10% of cases required RNA-seq reflex testing; hybridization-capture RNA-seq found actionable fusions missed by the initial amplicon-based DNA/RNA assay
RET+ Early-Stage NSCLC (n=40) [36] [37] | 92.3% concordance with RNA-seq | 100% detection in confirmed RET+ cases | Targeted RNA-seq identified 5 additional cases missed by whole-transcriptome sequencing
Acute Leukemia (n=467) [21] | Not separately reported | 88.1% overall concordance with OGM | RNA-seq better for fusions from intrachromosomal deletions; OGM superior for enhancer-hijacking events
Solid Tumors (n=60) [34] | 93.4% concordance with reference methods | 86.9% concordance with reference methods | Integrated DNA/RNA testing achieved 100% sensitivity and specificity

Specialized Applications and Case Studies

Beyond oncology, the limitations of DNA-seq for detecting intronic variants have significant implications for genetic disease diagnosis. A compelling case report described a patient with clinical Cowden syndrome who had negative targeted DNA sequencing results. Through concurrent RNA testing, researchers identified a deep intronic PTEN pathogenic variant that disrupted normal splicing [38]. This variant would have remained undetected by standard DNA-seq approaches, which typically only capture exons and short flanking intronic sequences. The discovery enabled accurate risk assessment and clinical management for the patient and their family members.

The integration of DNA and RNA sequencing can also resolve complex structural variants that evade characterization by single-method approaches. In one study investigating copy number gains, researchers utilized long-read sequencing on both DNA and cDNA to precisely map breakpoints at single-base resolution. This integrated approach revealed intricate rearrangement structures and their functional consequences on transcription, providing insights that would have been impossible with DNA-seq alone [33].

Emerging technologies like long-read sequencing offer potential solutions to some limitations of short-read DNA-seq. Pacific Biosciences HiFi and Oxford Nanopore Technologies can generate reads spanning kilobases to megabases, potentially capturing large intronic regions and complex rearrangements in a single read [13]. However, these technologies currently face challenges related to cost, throughput, and analytical validation for routine clinical use, suggesting they will complement rather than immediately replace established DNA-seq and RNA-seq approaches.

Experimental Approaches and Methodologies

DNA-Seq Protocols for Structural Variant Detection

Standardized protocols for DNA-seq-based rearrangement detection typically begin with sample preparation from formalin-fixed paraffin-embedded (FFPE) tissue or fresh frozen specimens. For targeted DNA-seq approaches, hybrid capture or amplicon-based methods are used to enrich for genomic regions of interest. In one representative study investigating RET fusions in NSCLC, researchers employed a 425-gene panel with the following workflow: genomic DNA extraction using the QIAamp DNA FFPE Tissue kit, quality assessment via Nanodrop and Qubit fluorometry, library preparation with the KAPA Hyper Prep kit, and sequencing on Illumina HiSeq4000 platforms [36].

The bioinformatic analysis typically involves alignment to a reference genome (e.g., hg19/GRCh37) using tools like the Burrows-Wheeler Aligner (BWA), followed by variant calling with specialized structural variant detection algorithms. In the RET fusion study, researchers used Delly for somatic gene fusion detection after standard processing with GATK for base quality recalibration and local realignment [36]. For comprehensive variant interpretation, detected rearrangements are often manually verified using visualization tools such as the Integrative Genomics Viewer (IGV).

The limit of detection for DNA-seq assays varies based on sequencing depth and variant allele frequency. Validation studies of integrated DNA/RNA assays have demonstrated stable fusion detection at DNA mutational abundances as low as 5%, though performance depends on the specific fusion characteristics [34]. Intra-assay and inter-assay reproducibility validation is essential, with studies typically demonstrating complete concordance across replicates when quality metrics are maintained.
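The dependence of detection on depth and variant allele frequency can be sketched with a simple binomial model. This is an illustrative simplification, not the validation method of the cited assays: it assumes each read covering the breakpoint independently supports the fusion with probability equal to the variant allele frequency, and the `min_reads` threshold of 5 is a hypothetical calling rule.

```python
from math import comb


def prob_detect(depth: int, vaf: float, min_reads: int) -> float:
    """P(at least min_reads fusion-supporting reads) under Binomial(depth, vaf)."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Detection probability at 5% allele frequency for several breakpoint depths.
for depth in (100, 200, 500):
    print(depth, round(prob_detect(depth, vaf=0.05, min_reads=5), 3))
```

Under these assumptions the probability of reaching the read threshold rises steeply with depth, which is one reason a stated limit of detection only holds at the coverage used during validation.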

RNA-Seq Methodologies for Fusion Transcript Identification

RNA-seq protocols for fusion detection address different technical challenges, particularly RNA quality preservation from clinical specimens. A typical workflow begins with RNA extraction using specialized kits such as the RNeasy FFPE kit, followed by quality and quantity measurement with Qubit RNA HS assays. For targeted RNA-seq, custom-designed probes enrich for specific transcripts or gene regions of interest, improving detection sensitivity for low-abundance fusion transcripts [36].

Two primary enrichment strategies dominate clinical RNA-seq for fusions: anchored multiplex PCR (AMP) and hybrid-capture-based approaches. The AMP method uses unidirectional gene-specific primers to capture known and novel fusion partners, making it particularly valuable for detecting rearrangements with previously uncharacterized partners. In contrast, hybrid-capture approaches use biotinylated probes to pull down target transcripts, offering broader coverage of potential fusion events [21] [35].

The analytical sensitivity of RNA-seq fusion detection depends on transcript abundance rather than genomic characteristics. Studies have demonstrated reliable fusion detection with RNA input as low as 250-400 copies/100 ng total RNA [34]. For the bioinformatic identification of fusions, tools like FusionCatcher and Archer Analysis Software align sequencing reads to reference genomes and apply filters to distinguish true fusion transcripts from artifacts. The high sensitivity of RNA hybrid-capture sequencing is evidenced by its ability to identify numerous oncogenic and likely oncogenic NTRK fusions across diverse tumor types in real-world clinical settings [35].

[Diagram: DNA is transcribed to RNA, which is translated to protein. A DNA rearrangement with intronic breakpoints yields a fusion transcript (detected by RNA-seq, bypassing introns), which produces an oncogenic fusion protein that drives cancer.]

Diagram 1: Molecular Biology of Fusion Detection. This diagram illustrates the central dogma of biology and how fusion genes with intronic breakpoints create oncogenic proteins. RNA-seq bypasses the challenge of large introns by detecting the expressed fusion transcript directly.

Essential Research Reagents and Tools

The experimental approaches discussed require specialized reagents and computational tools to successfully detect rearrangements with large intronic breakpoints. The following table catalogues key solutions used in the cited studies:

Table 3: Essential Research Reagents and Tools for Rearrangement Detection Studies

Category | Specific Product/Platform | Application Note
DNA Extraction | QIAamp DNA FFPE Tissue Kit (Qiagen) | Optimized for challenging clinical samples
RNA Extraction | RNeasy FFPE Kit (Qiagen) | Maintains RNA integrity from archived specimens
Target Enrichment | Anchored Multiplex PCR (Archer) | Captures novel fusion partners
Target Enrichment | Hybrid-Capture Probes (Illumina) | Broad coverage of fusion events
Sequencing Platform | Illumina HiSeq4000 | Workhorse for clinical NGS
Long-Read Platform | PacBio HiFi Sequencing | Resolves complex structural variants
Long-Read Platform | Oxford Nanopore PromethION | Ultra-long reads for spanning introns
Variant Caller | Delly | Specialized for structural variants
Fusion Detection | FusionCatcher | Identifies fusion transcripts from RNA-seq
Visualization | Integrative Genomics Viewer (IGV) | Manual verification of rearrangements

Each solution addresses specific technical challenges in detecting rearrangements with intronic breakpoints. For example, specialized extraction kits maintain nucleic acid integrity despite degradation in FFPE samples, while targeted enrichment approaches ensure sufficient coverage of relevant genomic regions or transcripts. The choice between detection platforms involves trade-offs between read length, accuracy, throughput, and cost, with each technology offering distinct advantages for particular applications.

[Diagram: Sample collection (FFPE/fresh frozen) feeds two parallel pathways. DNA-seq pathway: DNA extraction -> library preparation (hybrid capture/amplicon) -> sequencing (short/long-read) -> variant calling (Delly, other SV callers). RNA-seq pathway: RNA extraction -> library preparation (AMP/hybrid capture) -> sequencing -> fusion detection (FusionCatcher). Both pathways converge on integrated analysis, followed by the clinical report.]

Diagram 2: Experimental Workflow for Rearrangement Detection. This diagram outlines parallel DNA-seq and RNA-seq pathways for comprehensive rearrangement detection, culminating in integrated analysis that compensates for the limitations of each individual method.

The detection of rearrangements with large intronic breakpoints remains a challenging frontier in genomic analysis. While DNA-seq provides critical information about genomic architecture, its limitations in spanning large intronic regions necessitate complementary approaches. RNA-seq offers a powerful solution by detecting the expressed consequences of these rearrangements, effectively bypassing the challenges posed by intronic sequences. The most effective diagnostic and research strategies increasingly employ integrated approaches that combine the strengths of both technologies.

Evidence from multiple clinical studies demonstrates that reflexive testing algorithms—where RNA-seq follows negative DNA-seq results in clinically suspicious cases—significantly improve detection rates for actionable rearrangements. As sequencing technologies evolve, long-read approaches may eventually overcome current limitations, but for now, the strategic combination of DNA and RNA sequencing represents the most comprehensive approach for detecting rearrangements with large intronic breakpoints. For researchers and clinicians, this integrated paradigm maximizes sensitivity while providing orthogonal validation of biologically significant fusion events.

The accurate identification of expressed chimeric transcripts, commonly known as fusion genes, has become a cornerstone of modern cancer diagnostics and therapeutic decision-making. These hybrid genes, formed through chromosomal rearrangements such as translocations, inversions, or deletions, act as powerful oncogenic drivers in numerous cancer types, accounting for approximately 20% of human cancer morbidity [26]. The detection of these fusions is particularly crucial as they represent actionable therapeutic targets, with inhibitors such as crizotinib (targeting EML4-ALK) showing remarkable clinical efficacy in treating fusion-positive cancers [26]. While traditional methods like fluorescence in situ hybridization (FISH) and reverse-transcription polymerase chain reaction (RT-PCR) have been diagnostic mainstays, they are inherently limited to assessing predefined targets, potentially missing novel or rare fusion events [9] [26].

The emergence of next-generation sequencing (NGS) technologies, particularly RNA sequencing (RNA-seq), has revolutionized fusion detection by enabling transcriptome-wide surveillance with nucleotide-level resolution. However, a significant methodological question remains: how does RNA-seq compare to DNA sequencing (DNA-seq) for reliable fusion identification in clinical and research settings? This guide provides a comprehensive, data-driven comparison of these approaches, evaluating their performance characteristics, practical applications, and implementation requirements to inform researchers and clinicians in selecting optimal strategies for fusion gene detection.

Performance Comparison: RNA-seq vs. DNA-seq for Fusion Detection

Direct comparative studies reveal distinct performance advantages and limitations of RNA-seq and DNA-seq approaches for fusion detection. The table below summarizes key performance metrics based on recent clinical and technical evaluations.

Table 1: Performance comparison of RNA-seq and DNA-seq for fusion detection

Performance Metric | RNA-seq | DNA-seq | Evidence
Detection Rate | 76% (targeted RNA-seq) | Used as reference standard | [26]
Sensitivity for Canonical Fusions | 79.5-92.3% | 92.3% concordance with RNA-seq | [36]
Sensitivity for Novel Partners | High (partner-agnostic) | Limited to designed targets | [36] [26]
Ability to Confirm Expression | Direct evidence | Indirect inference | [39] [26]
Concordance with FISH | 84.6% | 82.5% | [36]
Major Limitation | RNA quality/expression level | Large introns/regulatory elements | [36]

The data demonstrate that targeted RNA-seq significantly improves the overall diagnostic rate compared to conventional approaches (76% vs. 63%) [26]. In head-to-head comparisons for RET fusion detection in NSCLC, RNA-seq and DNA-seq showed high concordance (92.3%), though targeted RNA-seq identified additional positive cases missed by whole-transcriptome sequencing and DNA-seq [36]. This enhanced sensitivity is attributed to RNA-seq's direct capture of expressed fusion transcripts, which circumvents the difficulty DNA-seq has with large intronic regions where breakpoints often occur [36].

RNA-seq particularly excels in identifying noncanonical fusion partners, which are increasingly recognized as clinically relevant. One study of 120 NSCLC cases reflexed to hybridization-capture-based RNA sequencing identified actionable fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET that were not detected by amplicon-based DNA/RNA testing [14]. This partner-agnostic capability makes RNA-seq invaluable for comprehensive fusion profiling, especially in cancers with diverse fusion partners.

Experimental Protocols and Methodologies

Targeted RNA-seq Laboratory Workflow

The targeted RNA-seq approach employs probe-based enrichment to overcome sensitivity limitations of whole-transcriptome sequencing for fusion detection [26]:

  • RNA Extraction and Quality Control: Extract total RNA from tumor samples (fresh frozen or FFPE). Assess RNA integrity using Bioanalyzer, requiring RIN score ≥7 for library construction [40].

  • Library Preparation: Convert RNA to double-stranded cDNA and add sequencing adapters. The use of ribosomal RNA depletion rather than poly-A selection is recommended as it preserves non-coding and degraded transcripts often present in FFPE samples [40].

  • Target Enrichment: Hybridize libraries with biotinylated oligonucleotide probes targeting exons of genes frequently involved in fusions. One validated panel design targets 188 genes for hematological malignancies and 241 genes for solid tumors, with overlapping coverage of 43 core fusion genes [26]. Perform double-capture to increase on-target rates to >90% [26].

  • Sequencing: Sequence enriched libraries on Illumina platforms (HiSeq, NovaSeq, NextSeq, or MiSeq). Recommended depth is 20-30 million paired-end reads per sample (2×100 bp or 2×150 bp) to adequately capture fusion junctions [41] [42].

Bioinformatic Analysis for Fusion Detection

The computational identification of fusions requires specialized pipelines to handle high false-positive rates common in RNA-seq data:

  • Quality Control and Preprocessing: Assess raw read quality with FastQC. Trim adapters and low-quality bases using Trimmomatic [42].

  • Alignment and Quantification: Map reads to the reference genome (e.g., GRCh38) using the STAR aligner, which accurately handles splice junctions [41] [42]. Generate count matrices with featureCounts [42].

  • Fusion Calling: Execute multiple fusion detection algorithms (minimum of two recommended) such as STAR-Fusion and FusionCatcher to increase confidence [26] [43]. These tools identify chimeric reads spanning fusion junctions.

  • False Positive Filtering: Implement stringent filtering to remove artifacts:

    • Apply a promiscuity score to exclude genes frequently called in implausible fusions [9]
    • Require a minimum of 2 RNA-seq replicates supporting each fusion [44]
    • Calculate a Fusion Transcript Score to ensure adequate expression relative to partner genes [9]
    • Compare against a blacklist of known artifacts [9]
  • Validation: Confirm high-confidence fusions in matched whole-genome sequencing data using discordant read pairs and soft-clipped alignments to identify supporting genomic breakpoints [39].
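The filtering steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used in the cited work: "promiscuity" is approximated here as the number of distinct partners a gene has in the call set, the thresholds are hypothetical, the Fusion Transcript Score is omitted because its formula is not given here, and the gene names `NOISY`, `A1` to `A4`, `ART1`, and `ART2` are placeholders.

```python
from collections import defaultdict


def filter_fusions(calls, blacklist, max_partners=3, min_replicates=2):
    """Keep gene pairs that pass blacklist, promiscuity, and replicate filters.

    calls: list of dicts with keys 'gene5', 'gene3', 'replicates'.
    """
    # Count distinct partners per gene as a stand-in promiscuity measure.
    partners = defaultdict(set)
    for c in calls:
        partners[c["gene5"]].add(c["gene3"])
        partners[c["gene3"]].add(c["gene5"])

    kept = []
    for c in calls:
        pair = (c["gene5"], c["gene3"])
        if pair in blacklist:
            continue  # known artifact
        if max(len(partners[c["gene5"]]), len(partners[c["gene3"]])) > max_partners:
            continue  # a partner gene is called with too many different genes
        if c["replicates"] < min_replicates:
            continue  # not reproduced across replicates
        kept.append(pair)
    return kept


calls = [
    {"gene5": "EML4", "gene3": "ALK", "replicates": 2},
    {"gene5": "KIF5B", "gene3": "RET", "replicates": 1},
    {"gene5": "NOISY", "gene3": "A1", "replicates": 2},
    {"gene5": "NOISY", "gene3": "A2", "replicates": 2},
    {"gene5": "NOISY", "gene3": "A3", "replicates": 2},
    {"gene5": "NOISY", "gene3": "A4", "replicates": 2},
    {"gene5": "ART1", "gene3": "ART2", "replicates": 3},
]
kept = filter_fusions(calls, blacklist={("ART1", "ART2")})
print(kept)
```

In this toy call set only the EML4-ALK pair survives: the single-replicate call, the gene with four partners, and the blacklisted pair are all removed.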

DNA-Seq Fusion Detection Protocol

DNA-based fusion detection employs different principles and analytical approaches:

  • DNA Extraction and Library Prep: Extract genomic DNA from tumor and normal tissue. Use targeted capture or amplicon-based panels (e.g., 425-gene panel) focusing on intronic regions of genes known to harbor fusions [36].

  • Sequencing and Structural Variant Calling: Sequence to high coverage (typically >200x). Process reads through alignment (BWA-MEM), indel realignment (GATK), and structural variant calling using tools like Delly to identify genomic rearrangements [36].

  • Integration with RNA Evidence: Overlap DNA breakpoints with RNA-seq fusion calls to confirm transcriptional activity [39]. This integrated approach provides the highest confidence in fusion validation.
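The integration step can be sketched as a set comparison of gene pairs reported by each assay. This is an illustrative simplification: it assumes calls have already been reduced to gene pairs, it ignores orientation and exact breakpoint coordinates, and the example calls are toy data rather than results from the cited studies.

```python
def integrate(rna_fusions, dna_breakpoints):
    """Classify each gene pair by which assay(s) support it (orientation ignored)."""
    rna = {frozenset(pair) for pair in rna_fusions}
    dna = {frozenset(pair) for pair in dna_breakpoints}
    return {
        "both": rna & dna,      # expressed fusion with a supporting genomic breakpoint
        "rna_only": rna - dna,  # e.g. breakpoint in an intron the DNA panel does not cover
        "dna_only": dna - rna,  # e.g. rearrangement that is not expressed as a fusion transcript
    }


result = integrate(
    rna_fusions=[("EML4", "ALK"), ("KIF5B", "RET")],
    dna_breakpoints=[("ALK", "EML4"), ("CCDC6", "RET")],
)
for category, pairs in result.items():
    print(category, sorted(sorted(p) for p in pairs))
```

Pairs supported by both assays carry the highest confidence, while the two single-assay categories correspond to the method-specific blind spots discussed in this guide.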

Visualizing Fusion Detection Workflows

The following diagram illustrates the comprehensive integrated approach for fusion gene detection, combining both RNA-seq and DNA-seq methodologies:

[Figure: A tumor sample splits into two branches. RNA branch: RNA extraction & QC (RIN≥7) -> library prep (rRNA depletion) -> sequencing (20-30M PE reads) -> alignment (STAR) & quantification -> fusion calling (STAR-Fusion, FusionCatcher) -> false positive filtering (promiscuity score, expression). DNA branch: DNA extraction -> library prep (targeted capture) -> sequencing (>200x coverage) -> alignment (BWA-MEM) & SV calling. Both branches feed integrated analysis (RNA + DNA evidence), which yields high-confidence fusion genes.]

Figure 1: Integrated RNA-seq and DNA-seq fusion detection workflow

Successful fusion detection requires careful selection of laboratory reagents, computational tools, and reference databases. The table below catalogs essential resources for implementing a robust fusion detection pipeline.

Table 2: Essential research reagents and resources for fusion detection

Category | Resource | Specific Application | Function
Wet Lab Reagents | Ribosomal RNA depletion kits | RNA library preparation | Preserves non-polyadenylated transcripts
Wet Lab Reagents | Biotinylated oligonucleotide panels | Targeted RNA-seq | Enrichment of fusion-related genes
Wet Lab Reagents | RNA spike-in controls (ERCC, fusion sequins) | Assay QC & quantification | Absolute quantification and sensitivity assessment
Bioinformatics Tools | STAR-Fusion, FusionCatcher | Fusion detection | Identification of chimeric transcripts from RNA-seq
Bioinformatics Tools | Delly, Manta | Structural variant calling | DNA-based fusion detection
Bioinformatics Tools | SAMtools, Picard | Data processing | BAM file processing and QC metrics
Reference Databases | AACR Project GENIE | Clinical genomics | Repository of clinical cancer genomics data
Reference Databases | ChimerDB, Mitelman Database | Fusion annotation | Curated databases of known fusion genes
Reference Databases | Ensembl, GENCODE | Genome annotation | Reference gene models for alignment

The selection of biotinylated oligonucleotide panels is particularly critical, with designs targeting 188-241 fusion-related genes showing excellent coverage of clinically relevant fusions [26]. For bioinformatic analysis, combining STAR-Fusion and FusionCatcher provides complementary detection capabilities, and the SMC-RNA Challenge, which benchmarked 77 fusion detection methods across 51 synthetic tumors, offers an independent reference for evaluating such tools [43]. Integration with reference databases like ChimerDB helps prioritize clinically relevant fusions and filter out likely artifacts [9].

The comprehensive comparison of RNA-seq and DNA-seq methodologies for fusion detection reveals a clear paradigm: while DNA-seq provides important information about genomic rearrangements, RNA-seq delivers superior sensitivity and clinical utility for identifying expressed chimeric transcripts, particularly when employing targeted enrichment approaches. The ability of RNA-seq to directly capture fusion expression, identify novel partners, and resolve complex isoforms makes it an indispensable tool for modern cancer genomics.

Looking forward, the integration of multiple technologies appears most promising for comprehensive fusion characterization. As demonstrated in recent studies, combining DNA-seq, targeted RNA-seq, and FISH achieves the highest diagnostic sensitivity while providing orthogonal validation [36]. Furthermore, emerging methodologies that validate fusion transcripts in matched whole-genome sequencing data offer powerful approaches for distinguishing high-confidence events from false positives [39]. As sequencing costs continue to decline and analytical methods mature, RNA-seq is poised to become the central technology for fusion detection in both clinical diagnostics and basic cancer research, ultimately expanding treatment options for patients with fusion-driven cancers.

Oncogenic gene fusions are major drivers in the pathogenesis of acute leukemia, with profound implications for disease classification, risk stratification, and therapeutic decision-making. The accurate detection of these rearrangements has become essential for modern precision oncology in hematologic malignancies. Currently, clinical laboratories employ diverse methodological approaches, primarily leveraging next-generation sequencing (NGS) technologies at either the DNA or RNA level. Each method offers distinct advantages and limitations in detecting these critical genetic events. This guide provides an objective comparison of DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) platforms for fusion gene detection in acute leukemia, synthesizing experimental data from recent studies to inform researchers, scientists, and drug development professionals.

Performance Comparison: DNA-seq vs. RNA-seq in Clinical Studies

Concordance and Unique Detection Capabilities

A comprehensive 2025 study comparing a 108-gene targeted RNA-seq panel with optical genome mapping (OGM) in 467 acute leukemia cases provides critical insights into method-specific performance characteristics. The cohort included 360 cases of acute myeloid leukemia (AML), 89 B-lymphoblastic leukemia (B-ALL), 12 T-ALL, and 6 mixed phenotype acute leukemia (MPAL) cases [21].

Table 1: Overall Detection Performance in 467 Acute Leukemia Cases

Metric | RNA-seq Performance | DNA-based OGM Performance
Overall concordance rate | 88.1% across all cases | 88.1% across all cases
Unique fusion detection | 9.4% (22/234) of clinically relevant fusions | 15.8% (37/234) of clinically relevant fusions
Case-level detection rate | 43.6% (206/467) of cases showed ≥1 rearrangement/fusion | 43.6% (206/467) of cases showed ≥1 rearrangement/fusion
Tier 1 aberration detection | 31.5% (147/467) of cases | 31.5% (147/467) of cases
Leukemia-type specific concordance | Varied from 41.7% (T-ALL) to 80.2% (B-ALL) | Varied from 41.7% (T-ALL) to 80.2% (B-ALL)

The data reveal that both methodologies contribute uniquely to comprehensive fusion detection, with RNA-seq particularly effective for identifying expressed chimeric transcripts, while DNA-based OGM excels at detecting structural rearrangements that may not generate fusion transcripts, such as enhancer-hijacking events [21].
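The method-specific percentages can be reproduced from the reported counts with a few lines of arithmetic. The sketch below uses only counts stated in the table; the number of fusions found by both methods is derived here by subtraction, on the assumption that each of the 234 clinically relevant fusions was detected by at least one method, and is not a figure quoted from the study.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


CASES = 467             # acute leukemia cases in the cohort
RELEVANT_FUSIONS = 234  # clinically relevant rearrangements/fusions
RNA_ONLY = 22           # detected by RNA-seq but not OGM
OGM_ONLY = 37           # detected by OGM but not RNA-seq
TIER1_CASES = 147       # cases with a Tier 1 aberration

# Derived value (assumption: every fusion was seen by at least one method).
both = RELEVANT_FUSIONS - RNA_ONLY - OGM_ONLY

rates = {
    "rna_only_pct": pct(RNA_ONLY, RELEVANT_FUSIONS),
    "ogm_only_pct": pct(OGM_ONLY, RELEVANT_FUSIONS),
    "both_methods_pct": pct(both, RELEVANT_FUSIONS),
    "tier1_case_pct": pct(TIER1_CASES, CASES),
}
print(both, rates)
```

Expressing unique detections against the same denominator makes clear that roughly a quarter of clinically relevant events in this cohort depended on a single method.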

Technology-Specific Strengths and Limitations

Table 2: Method-Specific Advantages and Limitations for Fusion Detection

Aspect | RNA-seq | DNA-seq
Detection principle | Fusion transcripts | Genomic rearrangements
Sensitivity for expressed fusions | High | Variable (depends on breakpoint location)
Ability to detect enhancer-hijacking | Poor (20.6% concordance) | Excellent
Performance with intrachromosomal deletions | Slightly superior | May interpret as simple deletions
Dependence on expression level | High | None
Effect of RNA degradation | Significant concern | Not applicable
Novel partner discovery | Excellent with anchored multiplex PCR | Limited by probe design
Coverage requirements | Targeted panels sufficient | Often requires extensive intronic coverage

The study found notably poor concordance (20.6%) for enhancer-hijacking lesions, including MECOM, BCL11B, and IGH rearrangements, many of which were not detected by RNA-seq. Conversely, RNA-seq slightly outperformed DNA-based OGM for fusions arising from intrachromosomal deletions that were sometimes labeled by OGM as simple deletions [21].

Experimental Protocols and Methodologies

Targeted RNA-seq Methodology

The 108-gene anchored multiplex PCR (AMP)-based RNA-Seq panel employed in the acute leukemia study utilizes specific experimental protocols optimized for hematologic malignancies [21]:

RNA Extraction and Quality Control: RNA is extracted from peripheral blood or bone marrow aspirate specimens. Quality control is critical, with RNA integrity number (RIN) typically assessed to ensure sample suitability.

Library Preparation: The AMP method utilizes unidirectional gene-specific primers (GSP2) targeting at least one of the two gene partners involved in translocation to capture novel fusion partners. This partner-agnostic approach enables discovery of previously uncharacterized fusions.

Sequencing and Analysis: Amplified targets undergo bidirectional sequencing on Illumina platforms. Sequencing reads are aligned to the human reference genome GRCh37/hg19, with fusion transcripts identified using Archer Analysis Software v6.2.7.

Validation Framework: The study established rigorous validation using the 2019 American College of Medical Genetics and Genomics (ACMG) and Clinical Genome Resource (ClinGen) Guidelines, with variants classified into three tiers based on established diagnostic, prognostic, or therapeutic relevance [21].

DNA-Based Sequencing Approaches

For DNA-based fusion detection, methodologies vary depending on the platform:

Hybrid Capture-Based DNA Sequencing: One study utilized a 542-gene solid tumor NGS panel with exonic probes supplemented with intronic bait probes against genes commonly involved in oncogenic fusions. This design specifically addresses the challenge of detecting breakpoints occurring in intronic regions [45].

Analytical Pipelines: The FindDNAFusion pipeline integrates multiple software tools (JuLI, Factera, and GeneFuse) to improve detection accuracy. This combinatorial approach achieved 98.0% detection accuracy for intron-tiled genes when optimized with blacklists for filtering common artifacts and criteria for selecting clinically reportable fusions [45].
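One way to picture a combinatorial approach of this kind is a simple vote across callers followed by blacklist filtering. The sketch below is a hypothetical voting rule, not the published FindDNAFusion logic: the `min_tools` threshold, the per-tool call lists, and the placeholder pair `ART1`-`ART2` are invented for illustration and do not represent real output from JuLI, Factera, or GeneFuse.

```python
from collections import Counter


def combine_calls(calls_by_tool, blacklist=frozenset(), min_tools=2):
    """Return gene pairs reported by at least min_tools callers and not blacklisted."""
    votes = Counter()
    for pairs in calls_by_tool.values():
        for pair in set(pairs):  # each tool votes at most once per gene pair
            votes[pair] += 1
    return sorted(
        pair for pair, n in votes.items()
        if n >= min_tools and pair not in blacklist
    )


# Toy call sets keyed by tool name.
calls_by_tool = {
    "JuLI": [("EML4", "ALK"), ("KIF5B", "RET")],
    "Factera": [("EML4", "ALK")],
    "GeneFuse": [("EML4", "ALK"), ("KIF5B", "RET"), ("ART1", "ART2")],
}
consensus = combine_calls(calls_by_tool)
print(consensus)
```

Requiring agreement between callers trades some sensitivity for fewer false positives, so the threshold would need tuning against validated samples before any clinical use.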

Optical Genome Mapping: The OGM methodology involves:

  • Ultra-high-molecular-weight genomic DNA extraction from fresh bone marrow aspirates
  • DNA labeling with specific fluorophores
  • DNA molecule imaging and genome construction
  • Data analysis using Bionano Access software (version 1.8.2) and/or VIA (version 7.1) with HemeTargets and hg38-primary transcripts feature files [21]

RNA-seq approach: detects fusion transcripts (expressed chimeric RNA)

  • Strengths: high sensitivity for expressed fusions; identifies novel partners; confirms functional transcripts
  • Limitations: affected by RNA degradation; misses enhancer-hijacking; expression-level dependent

DNA-seq approach: detects genomic rearrangements (DNA structural variants)

  • Strengths: detects regulatory rearrangements; not affected by expression; identifies breakpoint locations
  • Limitations: may miss complex rearrangements; requires extensive coverage; can misinterpret deletions

Diagram 1: Fundamental differences between RNA-seq and DNA-seq approaches for fusion detection.

Emerging Technologies and Advanced Applications

Long-Read Transcriptome Sequencing

Third-generation sequencing technologies are emerging as powerful tools for fusion detection, offering advantages for analyzing complex genomic regions:

GFvoter Algorithm Performance: A novel tool employing a multivoting strategy for identifying gene fusions from long-read transcriptome sequencing data demonstrated superior performance compared to existing methods. When tested on real datasets from cancer cell lines and an AML patient sample, GFvoter achieved the highest average precision (58.6%) across nine experimental datasets, surpassing LongGF (39.5%), FusionSeeker (35.6%), and JAFFAL (30.8%) [8].
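
For readers less familiar with the metric, precision is the share of reported fusions that are true. The sketch below shows the calculation under one assumption the source does not confirm: that the reported average is an unweighted mean of per-dataset precision. Function names are illustrative.

```python
def precision(called, truth):
    """Precision = true positives / all calls made."""
    called, truth = set(called), set(truth)
    if not called:
        return 0.0
    return len(called & truth) / len(called)


def mean_precision(datasets):
    """Unweighted average of per-dataset precision over (called, truth) pairs."""
    scores = [precision(called, truth) for called, truth in datasets]
    return sum(scores) / len(scores)
```

Precision says nothing about missed fusions, so it is normally read alongside recall when comparing tools.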

AML Transcript Isoform Diversity: Long-read sequencing of 60 primary AML bone marrow samples revealed extensive splicing abnormalities and identified 119,278 previously unannotated transcript isoforms. This isoform-level resolution enabled non-negative matrix factorization clustering that defined distinct molecular subtypes with strong correlations to patient prognosis, highlighting alternative splicing as a major contributor to AML molecular heterogeneity [46].

Machine Learning Approaches in Transcriptomic Analysis

Advanced computational methods are enhancing fusion detection and clinical interpretation:

k-mer-Based Classification: One study applied machine learning models trained on k-mer count matrices to predict favorable and adverse risk groups in AML patients based on RNA-seq data. This reference-free approach fragmented sequencing reads into k-mers (substrings of length k) that were indexed to provide a compressed yet comprehensive data representation [47].
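
The k-mer representation can be illustrated with a small pure-Python sketch. It is a toy version of what a dedicated k-mer counter does at scale: it ignores reverse complements, sequencing errors, and abundance filtering, and the function names are assumptions made for the example.

```python
from collections import Counter


def count_kmers(read, k):
    """Count all overlapping substrings of length k in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def kmer_matrix(samples, k):
    """Build a samples x k-mers count matrix.

    samples: dict mapping sample name -> list of read sequences.
    Returns (sorted k-mer list, {sample: [count per k-mer]}).
    """
    per_sample = {}
    for name, reads in samples.items():
        total = Counter()
        for read in reads:
            total.update(count_kmers(read, k))
        per_sample[name] = total
    # Column order is the sorted union of k-mers seen in any sample.
    kmers = sorted(set().union(*per_sample.values()))
    return kmers, {s: [c[km] for km in kmers] for s, c in per_sample.items()}
```

Each row of the resulting matrix is a feature vector for one sample, which is the form a classifier would take as input; no reference genome or alignment is involved.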

Risk Stratification Performance: Models including Neural Networks, Random Forest, and eXtreme Gradient Boosting achieved over 90% accuracy in risk prediction and identified key gene signatures distinguishing ELN2017 favorable and adverse groups. This approach facilitated the selection of prognostic biomarkers with significant impacts on survival [47].

Leukemia sample (blood/bone marrow), analyzed along two paths:

DNA analysis path

  • Methods: hybrid-capture DNA-seq; optical genome mapping; whole genome sequencing
  • Outputs: structural variants; breakpoint locations; enhancer hijacking events

RNA analysis path

  • Methods: targeted RNA-seq panels; whole transcriptome sequencing; long-read RNA-seq
  • Outputs: expressed fusion transcripts; alternative splicing; novel isoform discovery

Integrated analysis (DNA and RNA outputs combined): comprehensive fusion detection; complete molecular profiling; enhanced clinical interpretation

Diagram 2: Complementary testing workflow for comprehensive fusion detection in leukemia.

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 3: Key Research Reagent Solutions for Fusion Detection Studies

Reagent/Resource | Function/Application | Specific Examples/Characteristics
Anchored Multiplex PCR (AMP) | Target enrichment for RNA-seq | Enables novel fusion partner discovery; used in 108-gene hematology panel
Intronic Bait Probes | Enhanced DNA-seq fusion detection | Supplemental probes for genes commonly involved in oncogenic fusions
Archer Analysis Software | Fusion transcript identification | v6.2.7 used for analyzing AMP-based RNA-seq data
Bionano Access Software | OGM data analysis | Version 1.8.2 with HemeTargets feature files
FindDNAFusion Pipeline | DNA-seq fusion calling | Integrates JuLI, Factera, and GeneFuse tools; 98% accuracy for intron-tiled genes
GFvoter Algorithm | Long-read fusion detection | Multivoting strategy for PacBio/Nanopore data; highest precision (58.6%) among tested tools
Kmtricks | k-mer counting for ML approaches | Generates k-mer count matrices from RNA-seq data for machine learning applications

The comparative analysis of DNA-seq and RNA-seq platforms for fusion detection in acute leukemia reveals distinct yet complementary strengths. RNA-seq demonstrates superior sensitivity for detecting expressed chimeric fusion transcripts and slightly better performance for fusions arising from intrachromosomal deletions. Conversely, DNA-based methods, particularly optical genome mapping, excel at identifying cryptic, enhancer-driven events that often evade transcriptomic detection.

The most comprehensive approach integrates both methodologies, as neither platform alone detects all clinically relevant rearrangements. The 88.1% overall concordance rate between methods, with each technology uniquely identifying a substantial share of clinically relevant rearrangements (9.4% by RNA-seq alone, 15.8% by OGM alone), underscores the necessity of multimodal analysis for complete molecular characterization in acute leukemia [21].

For research and clinical applications, selection of appropriate methodologies should consider specific study objectives, sample quality, and resource constraints. However, the evolving landscape of fusion detection increasingly supports integrated DNA and RNA analysis to fully elucidate the genomic complexity of hematologic malignancies and advance precision medicine approaches for leukemia patients.

Oncogenic gene fusions are pivotal drivers of cancer pathogenesis, serving as essential biomarkers for diagnosis, prognosis, and therapeutic targeting across a wide spectrum of solid tumors. The accurate detection of these structural variants is therefore a cornerstone of precision oncology. Next-generation sequencing (NGS) technologies, particularly DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq), have become the primary methods for identifying these alterations. However, these platforms possess distinct and complementary strengths and limitations for fusion detection. This guide provides an objective comparison of DNA-Seq and RNA-Seq performance for identifying actionable fusions in a pan-cancer context, supported by recent experimental data and detailed methodologies to inform researchers, scientists, and drug development professionals.

Performance Comparison: RNA-Seq vs. DNA-Seq for Fusion Detection

Detection Rates and Technological Concordance

Table 1: Comparative Performance of RNA-Seq and Alternative Technologies for Fusion Detection

Metric | Targeted RNA-Seq (108-gene panel) | Optical Genome Mapping (OGM) | Hybridization-Capture RNA-Seq | Amplicon-Based DNA/RNA-Seq
Overall Concordance | 88.1% (with OGM in leukemia) [21] | 88.1% (with RNA-Seq in leukemia) [21] | N/A | N/A
Unique Fusion Detection | Uniquely identified 9.4% of clinically relevant rearrangements [21] | Uniquely identified 15.8% of clinically relevant rearrangements [21] | Identified rare/novel fusions (ALK, BRAF, NRG1, NTRK3, ROS1, RET) missed by amplicon assay [14] | Detected ~82.6% of known fusions; missed 17.4% potentially novel/rare fusions [14]
Sensitivity | 98.4% for known fusions (WTS assay) [48] | N/A | N/A | N/A
Specificity | 100% (WTS assay) [48] | N/A | N/A | N/A
Strength in Detection | Expressed chimeric fusions; fusions from intrachromosomal deletions [21] | Cryptic, enhancer-driven events (e.g., MECOM, BCL11B, IGH rearrangements) [21] | Unbiased detection of known and novel fusions without prior knowledge of partners [14] | Targeted detection of pre-specified fusions with high efficiency [14]

A direct comparison of a 108-gene targeted RNA-Seq panel and Optical Genome Mapping (OGM) in 467 acute leukemia cases revealed an overall concordance rate of 88.1% for gene rearrangements [21]. This high-level agreement, however, masks critical differences. OGM uniquely detected 15.8% of clinically relevant rearrangements, while RNA-Seq exclusively identified 9.4% [21]. The technological divergence is stark for specific fusion types; concordance for enhancer-hijacking lesions (e.g., involving MECOM) was markedly low at 20.6%, as these DNA-level rearrangements often do not produce fusion transcripts detectable by RNA-Seq [21]. Conversely, RNA-Seq slightly outperformed OGM for fusions arising from intrachromosomal deletions, which were sometimes misinterpreted by OGM as simple deletions [21].
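
The shared and platform-unique percentages quoted above are set arithmetic over the calls made by each platform. The sketch below shows that arithmetic under an assumption the source does not spell out: that the denominator is the union of events detected by either method. The function name and the event identifiers are hypothetical.

```python
def detection_breakdown(rna_calls, ogm_calls):
    """Split the union of calls into shared and platform-unique percentages."""
    rna, ogm = set(rna_calls), set(ogm_calls)
    union = rna | ogm
    if not union:
        raise ValueError("no calls from either platform")

    def pct(subset):
        # Percentage of all detected events, rounded to one decimal place.
        return round(100 * len(subset) / len(union), 1)

    return {
        "both": pct(rna & ogm),
        "rna_only": pct(rna - ogm),
        "ogm_only": pct(ogm - rna),
    }
```

Running the same function per fusion class (for example, enhancer-hijacking events only) is one way to expose the class-specific gaps that an overall figure hides.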

In solid tumors, a study of 1,211 non-small cell lung cancer (NSCLC) specimens demonstrated the superior ability of hybridization-capture-based RNA-Seq to identify rare and novel oncogenic fusions. When used as a reflex test after negative amplicon-based testing, it successfully identified actionable fusions in 9 out of 120 cases, involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET, none of which were detected by the initial amplicon-based assay [14]. Interrogation of a large database (AACR Project Genie) revealed that an amplicon-based approach could theoretically detect 82.6% of known fusions, leaving a significant 17.4% that would be missed and potentially identified by broader capture-based methods [14].

Pan-Cancer Actionability of Fusions

Table 2: Actionable Fusion Landscape in Pan-Cancer Analysis

Cancer Type | Prevalence of Actionable Fusions | Commonly Altered Genes | Tumor-Agnostic Biomarker Status
Non-Small Cell Lung Cancer (NSCLC) | ~10% of reflexed cases harbored actionable fusions [14]; 68.9% of identified fusions were potentially actionable [48] | ALK, ROS1, RET, NTRK, BRAF, NRG1 [14] [48] | TMB-High (16.8%), MSI-High, NTRK fusions, RET fusions [49]
All Solid Tumors (Pan-Cancer) | 8.4% of samples had at least one tumor-agnostic biomarker [49] | NTRK, RET, BRAF [49] | TMB-High (6.6%), MSI-High, NTRK fusions, RET fusions, BRAF V600E [49]
Thyroid Cancer | 30% had a tumor-agnostic biomarker [49] | BRAF [49] | BRAF V600E [49]
Melanoma | 22.7% had a tumor-agnostic biomarker [49] | BRAF [49] | BRAF V600E [49]

Comprehensive genomic profiling (CGP) of 1,166 tissue samples across 29 cancer types in an Asian cohort found that at least one established tumor-agnostic biomarker—including MSI-High, TMB-High, NTRK fusions, and BRAF V600E—was present in 8.4% of samples, spanning 26 different cancer types [49]. The prevalence was particularly high in specific cancers, such as thyroid cancer (30%) and melanoma (22.7%) [49]. In NSCLC, a focused validation of a whole transcriptome sequencing (WTS) assay demonstrated that a significant majority (68.9%) of the fusions identified were potentially actionable, highlighting the critical clinical value of comprehensive fusion detection in this and other malignancies [48].

Experimental Protocols and Methodologies

Targeted RNA-Sequencing Assay (OSU-SpARKFuse)

The OSU-SpARKFuse assay was designed for clinical-grade detection of gene fusions in solid tumors [18].

  • Probe Design: The assay targets complete transcripts from 93 kinase and transcription factor genes. Using RefSeq transcripts, 5'-biotinylated 120-mer probes were designed to be non-overlapping and allowed to cross exon-exon junctions. Probes for exons shorter than 120 bp were centered over the exon and extended into intronic regions. The final design included 3,143 target probes, with additional probes for genomic DNA contamination assessment and external RNA controls [18].
  • RNA Extraction & QC: RNA is extracted from FFPE or fresh-frozen tissues using silica-based membrane kits (e.g., miRNeasy FFPE kit, Qiagen). Quality control is performed using a TapeStation 2200 (Agilent) or similar system. The DV200 value (percentage of RNA fragments >200 nucleotides) is critical; samples with DV200 ≤ 30% skip the fragmentation step in library preparation [18].
  • Library Prep & Sequencing: Ribosomal RNA is depleted using Ribo-Zero (Illumina). Following fragmentation (unless skipped for degraded samples), cDNA synthesis, A-tailing, and ligation of unique indexes are performed using the Illumina TruSeq Stranded Total RNA Library Kit. Libraries are amplified, pooled, and hybridized with the custom probe set for 16-18 hours. Captured libraries are washed, amplified, and sequenced on an Illumina MiSeq system with 2x100 bp paired-end reads [18].
  • Validation Performance: This validated assay demonstrated 93.3% sensitivity and 100% specificity for fusion detection, with high repeatability (96.3% intrarun concordance) and reproducibility (94.4% interrun concordance) [18].
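
The DV200-based branch in the library preparation step can be written as a small decision rule. This is a simplified sketch: it counts fragments equally, whereas instruments derive DV200 from the electropherogram signal, and the function names are assumptions. The 30% default mirrors the threshold stated for this assay.

```python
def dv200(fragment_lengths):
    """Percentage of RNA fragments longer than 200 nucleotides."""
    if not fragment_lengths:
        raise ValueError("no fragments supplied")
    longer = sum(1 for n in fragment_lengths if n > 200)
    return 100 * longer / len(fragment_lengths)


def needs_fragmentation(dv200_pct, skip_at_or_below=30.0):
    """Degraded samples (DV200 <= threshold) skip the fragmentation step."""
    return dv200_pct > skip_at_or_below
```

Keeping the threshold as a parameter makes the same rule reusable for protocols that set a different cutoff.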

Whole Transcriptome Sequencing (WTS) Assay

A novel WTS assay was developed for the detection of gene fusions, MET exon 14 skipping, and EGFRvIII alterations [48].

  • Sample Requirements: The protocol requires FFPE samples with a tumor content exceeding 20%, stored at 4°C for less than one year to minimize RNA degradation. A DV200 value of ≥ 30% is defined as the threshold for acceptable RNA degradation [48].
  • RNA Extraction & Library Prep: Total RNA is extracted from FFPE sections using the RNeasy FFPE Kit (Qiagen). Ribosomal RNA is removed using the NEBNext rRNA Depletion Kit. For samples with DV200 ≤ 50%, the fragmentation step is omitted. cDNA synthesis and NGS library preparation are performed using the NEBNext Ultra II Directional RNA Library Prep Kit. Sequencing is performed to generate an average of 25 Gb of 100 bp paired-end data per sample [48].
  • Bioinformatic Filtering: To reduce false positives, a curated list of 553 reportable genes is used, filtering the transcriptome from ~22,000 genes down to those with known diagnostic, prognostic, or therapeutic value [48].
  • Performance Characteristics: This WTS assay achieved a sensitivity of 98.4% (62/63 known fusions) and a specificity of 100% [48].
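
The sample gate, the reportable-gene filter, and the sensitivity figure above can be expressed as three short functions. This is a hedged sketch, not the published pipeline: the function names are hypothetical, and the rule that a call is kept when at least one partner is on the reportable list is an assumption, since the source does not say whether one or both partners must be listed.

```python
def passes_wts_qc(tumor_content_pct, dv200_pct):
    """Sample gate from the protocol: tumor content > 20% and DV200 >= 30%."""
    return tumor_content_pct > 20 and dv200_pct >= 30


def reportable_fusions(calls, reportable_genes):
    """Keep calls in which at least one partner is on the reportable list."""
    return [(a, b) for a, b in calls if a in reportable_genes or b in reportable_genes]


def sensitivity(true_positives, false_negatives):
    """Fraction of known positives that the assay detected."""
    return true_positives / (true_positives + false_negatives)
```

With 62 of 63 known fusions detected, `sensitivity(62, 1)` evaluates to roughly 0.984, matching the reported 98.4%.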

Signaling Pathways and Oncogenic Mechanisms

Gene fusions drive oncogenesis through constitutive activation of key cellular signaling pathways that promote proliferation, survival, and metastasis. The diagram below illustrates the core pathways impacted by common actionable fusions in solid tumors.

Oncogenic Signaling Pathways Activated by Gene Fusions

Fusions often involve receptor tyrosine kinases (RTKs) or their downstream effectors. For example, fusions involving ALK, ROS1, RET, NTRK, and FGFR genes lead to ligand-independent dimerization and constitutive activation of the kinase [48] [18]. This aberrant activation persistently stimulates two major downstream pathways: the RAS-RAF-MEK-ERK (MAPK) pathway, which drives cell proliferation, and the PI3K-AKT-mTOR pathway, which promotes cell growth, survival, and metabolic changes [48]. MET exon 14 skipping, another RNA-level alteration detectable by RNA-Seq, results in increased stability of the MET receptor and activation of these same downstream pathways, making it a potent oncogenic driver in NSCLC and other cancers [48].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Reagents and Kits for RNA-Seq-Based Fusion Detection

Reagent/Kits | Function | Specific Examples (from cited studies)
RNA Extraction Kit | Isolate high-quality RNA from challenging sample types like FFPE. | RNeasy FFPE Kit (Qiagen) [48] [18], miRNeasy Kit (Qiagen) [18]
RNA Quality Control Instrument | Assess RNA integrity and quantity, a critical pre-analytical step. | TapeStation 2200 (Agilent) [18], Agilent 2100 Bioanalyzer [48], NanoDrop [18]
rRNA Depletion Kit | Remove abundant ribosomal RNA to enrich for coding and non-coding transcripts of interest. | Ribo-Zero (Illumina) [18], NEBNext rRNA Depletion Kit [48]
Library Prep Kit | Convert RNA into sequencer-compatible cDNA libraries. | Illumina TruSeq Stranded Total RNA Library Kit [18], NEBNext Ultra II Directional RNA Library Prep Kit [48]
Targeted Capture Probes | Enrich sequencing libraries for specific genes of interest. | Custom 120-mer biotinylated probes (IDT) [18]
NGS Platform | Perform high-throughput sequencing of prepared libraries. | Illumina MiSeq [18], Gene+ seq 2000 [48]

The choice between DNA-Seq and RNA-Seq for detecting actionable fusions in solid tumors is not a matter of selecting a superior technology, but rather of understanding their complementary roles. DNA-based methods like OGM are powerful for identifying structural rearrangements at the genomic level, including cryptic enhancer-hijacking events. RNA-Seq, in turn, confirms that a fusion transcript is actually expressed, which helps separate functional fusions from transcriptionally silent rearrangements, and it detects a broader range of alterations, including novel fusion partners and splice variants like MET exon 14 skipping. For clinical and research applications aiming to maximize the detection of therapeutically actionable fusions, an integrated approach, potentially using a reflex testing model, provides the most comprehensive and clinically impactful solution.

While the comparison of RNA sequencing (RNA-seq) to DNA-based methods like Optical Genome Mapping (OGM) often focuses on fusion detection proficiency, this narrow view overlooks the profound utility of RNA-seq in modern drug discovery. The paradigm is shifting from merely detecting structural variants to understanding their functional consequences and exploiting this knowledge for therapeutic development. In acute leukemias, for instance, where RNA-seq demonstrates an 88.1% overall concordance with OGM, each method reveals distinct biological insights: OGM uniquely identifies 15.8% of clinically relevant rearrangements (particularly cryptic, enhancer-hijacking events), while RNA-seq exclusively detects 9.4% of fusions, especially those arising from intrachromosomal deletions [21]. This complementary relationship underscores that RNA-seq's true value extends far beyond structural variant detection into the realm of functional genomics and mechanism of action (MoA) elucidation.

The transcriptome provides a dynamic view of cellular states that DNA-level analyses cannot capture, positioning RNA-seq as an indispensable tool for understanding disease mechanisms, drug responses, and therapeutic opportunities. As we move toward personalized cancer treatments, the ability to connect genetic alterations to their functional transcriptional outcomes becomes increasingly critical for developing targeted therapies and predicting treatment efficacy [50]. This article explores how RNA-seq technologies are revolutionizing drug discovery by moving beyond fusion detection to provide insights into complex drug mechanisms, resistance patterns, and novel therapeutic targets.

Comparative Performance in Fusion Detection: Setting the Stage

Methodological Strengths and Limitations in Structural Variant Detection

Understanding the relative performance of RNA-seq versus DNA-based methods provides crucial context for appreciating its expanded role in drug discovery. A comprehensive 2025 study of 467 acute leukemia cases offers valuable comparative data, summarized in the table below [21].

Table 1: Comparative Performance of Targeted RNA-seq and Optical Genome Mapping in Acute Leukemia

Performance Metric | Targeted RNA-seq (108-gene panel) | Optical Genome Mapping (OGM)
Overall Concordance | 88.1% with OGM | 88.1% with RNA-seq
Unique Detection of Clinically Relevant Rearrangements | 9.4% | 15.8%
Detection of Enhancer-Hijacking Lesions | Poor (20.6% concordance) | Effective
Detection of Fusions from Intrachromosomal Deletions | Effective | Sometimes labeled as simple deletions
Concordance Variation by Leukemia Type | 80.2% in B-ALL to 41.7% in T-ALL | Same variation pattern
Key Advantages | Detects expressed chimeric fusions; slightly better for deletion-related fusions | Better for cryptic, enhancer-driven events without fusion transcripts

This comparative analysis reveals that RNA-seq and DNA-level methods provide complementary rather than redundant information. The fundamental distinction lies in what each method detects: RNA-seq identifies expressed chimeric fusion transcripts, while OGM reveals structural rearrangements regardless of their transcriptional activity [21]. This distinction becomes particularly important in enhancer-hijacking events, such as those involving MECOM, BCL11B, and IGH rearrangements, which frequently evade RNA-seq detection because they can activate oncogenes without generating fusion transcripts [21].

Implications for Drug Discovery

The technical limitations in fusion detection directly impact therapeutic development. For example, the poor performance of RNA-seq in detecting enhancer-hijacking lesions (20.6% concordance) means that potentially targetable events would be missed using RNA-seq alone [21]. Conversely, RNA-seq's ability to detect expressed fusions provides direct evidence of biologically active oncogenic drivers that may represent more promising drug targets.

These comparative insights establish why a multi-modal approach is increasingly necessary in clinical genomics and why RNA-seq's role must expand beyond fusion detection to leverage its unique capabilities in understanding functional biology.

The Expanding Role of RNA-seq in Mechanism of Action Elucidation

Pharmacotranscriptomics: A New Paradigm for Drug Screening

The emergence of pharmacotranscriptomics—the large-scale profiling of gene expression changes in response to drug perturbations—represents a fundamental shift in drug screening methodologies. This approach has developed into the third major class of drug screening, distinct from target-based and phenotype-based screening [51]. By capturing the complex transcriptional responses to drug treatments, researchers can infer mechanisms of action, identify biomarkers of response, and discover novel therapeutic applications for existing compounds.

Pharmacotranscriptomics-based drug screening (PTDS) can detect system-wide changes in gene expression following drug perturbation, enabling researchers to analyze the efficacy of drug-regulated gene sets, signaling pathways, and complex disease networks by combining large-scale transcriptomic profiling with artificial intelligence [51]. This approach is particularly valuable for understanding the complex mechanisms of traditional Chinese medicine and other multi-component therapies, where multiple targets and pathways are simultaneously engaged [51].

Single-Cell Pharmacotranscriptomics: Resolving Heterogeneity

The integration of single-cell RNA sequencing (scRNA-seq) with drug screening has created powerful new opportunities for MoA elucidation. A landmark 2025 study demonstrated a high-throughput multiplexed scRNA-seq pharmacotranscriptomics pipeline that combined drug screening with 96-plex single-cell RNA sequencing [52]. This approach enabled the researchers to explore the heterogeneous transcriptional landscape of primary high-grade serous ovarian cancer (HGSOC) cells after treatment with 45 drugs spanning 13 distinct mechanisms of action.

Table 2: Single-Cell Pharmacotranscriptomics Experimental Design for MoA Elucidation

Experimental Component | Specifications | Application in MoA Studies
Drug Library | 45 drugs, 13 mechanism of action classes | PI3K-AKT-mTOR inhibitors, Ras-Raf-MEK-ERK pathway inhibitors, CDK inhibitors, epigenetic modifiers, etc.
Cell Models | 3 HGSOC models: JHOS2 cell line + 2 patient-derived cancer cells (PDC2, PDC3) | Capturing inter-patient and intra-patient heterogeneity
Multiplexing Approach | Live-cell barcoding using antibody-oligonucleotide conjugates (Cell Hashing) | 96-plex scRNA-Seq enabling high-throughput screening
Cells Analyzed | 36,016 high-quality cells across 288 samples | Single-cell-resolution dissection of heterogeneous drug responses
Key Finding | PI3K-AKT-mTOR inhibitors induced feedback activation of RTKs (EGFR) via CAV1 upregulation | Identified drug resistance mechanism and synergistic combination (PI3K-AKT-mTOR + EGFR inhibitors)

The power of this single-cell approach lies in its ability to resolve heterogeneous drug responses within complex cell populations. While bulk RNA-seq averages responses across all cells, scRNA-seq can identify distinct subpopulations with different drug sensitivities and resistance mechanisms [52]. In the HGSOC study, cells treated with different drug classes clustered distinctly: those treated with PI3K-AKT-mTOR, Ras-Raf-MEK-ERK, and multikinase inhibitors showed milder, model-specific transcriptional shifts, while cells treated with BET, HDAC, and CDK inhibitors formed distinct clusters enriched with cells from all three models, suggesting more consistent cross-model effects [52].

Experimental Protocols for MoA Studies

The typical workflow for single-cell pharmacotranscriptomic MoA studies involves several critical steps [52]:

  • Sample Preparation: Fresh tissue dissociation or use of frozen nuclei when fresh samples are unavailable. For the HGSOC study, patient-derived tumor epithelial cancer cells were isolated and cultured ex vivo at early passages to avoid loss of phenotypic identity.

  • Drug Treatment: Cells are treated with compounds at concentrations above the half-maximal effective concentration (EC50) based on prior drug sensitivity and resistance testing (DSRT) screens, typically for 24 hours to elicit a detectable transcriptional response.

  • Live-Cell Barcoding: Following drug treatments, cells from each well are labeled with a unique pair of antibody-oligonucleotide conjugates (Hashtag oligos or HTOs) targeting surface proteins like β2 microglobulin (B2M) and CD298.

  • Cell Pooling and scRNA-seq: All barcoded cells are pooled together for multiplexed single-cell RNA sequencing, significantly reducing costs and technical variability compared to processing samples individually.

  • Bioinformatic Analysis: The data processing includes demultiplexing cells based on their HTO barcodes, quality control, clustering, and differential expression analysis to identify drug-specific transcriptional signatures.

This protocol enables the systematic identification of single-cell transcriptomic responses to drugs, providing unprecedented insights into the heterogeneous mechanisms of drug action in complex cancer populations [52].
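
The demultiplexing step can be sketched as follows: since each well carries a unique pair of hashtag oligos, a cell is assigned to the well whose pair matches its two strongest hashtags. This is a deliberately simple illustration with a fixed count cutoff. Real pipelines typically model background hashtag signal statistically, and the `min_count` value, HTO names, and well labels here are hypothetical.

```python
def demultiplex_cell(hto_counts, pair_to_well, min_count=10):
    """Assign one cell to a well from its two most abundant hashtag oligos.

    hto_counts: dict HTO name -> UMI count for this cell.
    pair_to_well: dict frozenset({hto_a, hto_b}) -> well label.
    Returns the well label, or None when the call is ambiguous or unknown.
    """
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2 or ranked[1][1] < min_count:
        return None  # too little signal for a confident pair
    if len(ranked) > 2 and ranked[2][1] >= min_count:
        return None  # a third strong hashtag suggests a doublet
    top_pair = frozenset((ranked[0][0], ranked[1][0]))
    return pair_to_well.get(top_pair)
```

Cells returned as `None` would be excluded before clustering and differential expression, so that each remaining cell maps to exactly one drug condition.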

RNA-seq in the Drug Discovery Pipeline

Target Identification and Validation

RNA-seq technologies contribute significantly to multiple stages of the drug discovery pipeline, beginning with target identification and validation. By detecting differentially expressed transcripts across disease states, RNA-seq helps uncover new molecular mechanisms of disease—an essential prerequisite for developing new drug targets [50]. For example, RNA-seq has identified distinct oncogene-driven transcriptome profiles, enabling the identification of potential targets for cancer therapy [50].

Single-cell RNA sequencing further enhances target discovery by resolving cellular heterogeneity and identifying novel cell types and subtypes that may represent promising therapeutic targets [53]. The technology has enabled the identification of molecular pathways that predict survival, therapy response, likelihood of resistance, and candidacy for alternative interventions [53]. When combined with CRISPR screening technologies, scRNA-seq enables high-content functional genomics screens that can credential and prioritize drug targets by directly linking genetic perturbations to transcriptional outcomes at single-cell resolution [53].

Understanding Drug Resistance and Sensitivity

Chemotherapy resistance remains a major obstacle in oncology, and RNA-seq provides powerful tools to investigate its mechanisms. By comparing gene expression profiles between drug-resistant and sensitive cells, researchers can identify genes and pathways associated with treatment failure [50]. In triple-negative breast cancer (TNBC), for instance, RNA-seq analysis of drug-resistant cell lines revealed significant differences in cytokine-cytokine receptor interaction pathways, providing new ideas for developing more effective treatments [50].
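
At its simplest, comparing resistant and sensitive profiles comes down to a per-gene fold change. The sketch below computes a log2 ratio of mean expression with a pseudocount. It is only an illustration of the comparison: published analyses rely on dedicated statistical frameworks with proper normalization and significance testing, and the function name and input format are assumptions.

```python
import math


def log2_fold_change(resistant, sensitive, pseudocount=1.0):
    """log2 ratio of mean expression (resistant vs. sensitive) per gene.

    resistant, sensitive: dict gene -> list of normalized expression values.
    A pseudocount avoids division by zero for unexpressed genes.
    Only genes present in both groups are reported.
    """
    result = {}
    for gene in resistant.keys() & sensitive.keys():
        mean_r = sum(resistant[gene]) / len(resistant[gene])
        mean_s = sum(sensitive[gene]) / len(sensitive[gene])
        result[gene] = math.log2((mean_r + pseudocount) / (mean_s + pseudocount))
    return result
```

Positive values indicate higher expression in resistant cells; genes with large absolute values would then be passed to pathway enrichment to look for signals such as cytokine-receptor interactions.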

Small RNA-Seq has proven particularly valuable for investigating the role of microRNAs (miRNAs) in regulating drug resistance. In a study of doxorubicin resistance in hepatocellular carcinoma, researchers used RNA-seq to identify down-regulated miRNAs and their associated functional pathways, providing potential targets for overcoming resistance [50].

Biomarker Discovery and Patient Stratification

RNA-seq is increasingly important for identifying biomarkers that can predict treatment response and stratify patient populations. Fusion genes, once considered rare, are now recognized as powerful biomarkers and therapeutic targets across multiple cancer types [50] [54]. RNA-seq has uncovered recurrent gene fusions in acute myeloid leukemia, breast cancer, and colorectal cancer, providing promising targets for personalized therapies [50].

Beyond fusions, RNA-seq can identify various other cancer biomarker types, including small RNAs (such as miRNAs), various non-coding RNAs (lncRNAs and circRNAs), and gene expression signatures that correlate with disease progression, recurrence, and treatment response [50]. These biomarkers are essential for developing companion diagnostics and implementing precision medicine approaches.

Visualization of Key Concepts

RNA-seq in the Drug Discovery Workflow

The following diagram illustrates how RNA-seq technologies integrate into various stages of the drug discovery and development pipeline, from initial target identification to clinical decision-making.

  • Target identification: bulk RNA-seq (disease transcriptome); single-cell RNA-seq (cellular heterogeneity)
  • Target validation: single-cell RNA-seq (CRISPR screening)
  • Lead optimization: pharmacotranscriptomics (MoA elucidation)
  • Preclinical development: pharmacotranscriptomics (efficacy and toxicity)
  • Clinical development: bulk RNA-seq (biomarker discovery)

Single-Cell Pharmacotranscriptomics Workflow

This diagram details the experimental workflow for multiplexed single-cell RNA-seq pharmacotranscriptomic analysis, which enables high-throughput drug screening at single-cell resolution.

  • Inputs: drug library (45 drugs) and cell models (HGSOC)
  • 96-well plate drug treatment (24 h)
  • Live-cell barcoding (HTO) with antibody-oligo conjugates
  • Cell pooling (36,016 cells)
  • scRNA-seq processing, followed by demultiplexing
  • Bioinformatic analysis of heterogeneous responses
  • MoA insights: mechanism elucidation; resistance pathways; combination therapies

The Scientist's Toolkit: Essential Research Reagent Solutions

Implementing RNA-seq technologies for MoA studies and drug discovery requires specialized reagents and tools. The following table details key solutions used in the featured studies.

Table 3: Essential Research Reagent Solutions for Pharmacotranscriptomic Studies

Reagent/Tool Category | Specific Examples | Function in Experimental Pipeline
Cell Barcoding Systems | Cell Hashing with anti-B2M and anti-CD298 antibody-oligonucleotide conjugates [52] | Enables sample multiplexing by labeling cells from different conditions with unique barcodes before pooling
scRNA-seq Platforms | 10x Chromium technology [53] | Creates microdroplet reaction chambers for single-cell RNA capture and barcoding
Library Prep Kits | SMART-seq2 for plate-based protocols [53] | Provides high-sensitivity full-length transcript coverage for single cells
Bioinformatic Tools | Cell Ranger, STARsolo, Alevin, Kallisto-BUStools [53] | Processes raw sequencing data into cell-by-gene count matrices
Data Analysis Platforms | Seurat, Scanpy [53] | Performs quality control, normalization, clustering, and differential expression analysis
Pathway Analysis Tools | Gene set variation analysis (GSVA) [52] | Evaluates activity of biological processes and pathways from transcriptome data
Fusion Detection Algorithms | DEEPEST (Data-Enriched Efficient PrEcise STatistical fusion detection) [54] | Identifies gene fusions with high specificity while minimizing false positives in large datasets

The application of RNA-seq technologies in drug discovery has evolved far beyond its initial role in fusion detection to become an indispensable tool for understanding therapeutic mechanisms of action. While DNA-level methods like OGM provide crucial information about structural variants, RNA-seq delivers unique insights into the functional consequences of these variants and other disease-associated perturbations. The emergence of single-cell pharmacotranscriptomics represents a particularly significant advance, enabling researchers to dissect heterogeneous drug responses in complex cell populations and identify resistance mechanisms that would be obscured in bulk analyses.

As artificial intelligence and machine learning increasingly integrate with transcriptomic data analysis, the potential for RNA-seq to revolutionize drug discovery continues to grow. By capturing the dynamic complexity of disease states and therapeutic responses, RNA-seq provides a powerful window into biological systems that is transforming target identification, lead optimization, and clinical development. For researchers and drug development professionals, embracing these technologies and their applications will be essential for developing the next generation of targeted therapies and personalized medicine approaches.

Navigating Challenges and Optimizing Detection Assays

Next-generation sequencing (NGS) has revolutionized genomic analysis, with DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) serving as complementary technologies for detecting oncogenic alterations in cancer research. While DNA-seq provides a comprehensive view of genomic architecture, it faces significant limitations in identifying specific structural variants, particularly those involving large intronic regions and enhancer-hijacking events. These limitations have profound implications for drug development and clinical diagnostics, where missing these alterations can directly impact patient access to targeted therapies. This guide examines the technical challenges of DNA-seq through comparative experimental data and provides methodologies for implementing integrated sequencing approaches in fusion detection research.

Technical Limitations of DNA-seq in Fusion Detection

The Challenge of Large Intronic Regions

DNA-seq encounters substantial difficulties in detecting fusion events that involve large intronic regions in partner genes. The technical limitation stems from the common practice of using fragmented DNA libraries and the positioning of primer/probe sets, which may not adequately cover extensive intronic sequences where breakpoints occur.

Table: Detection Rates of RET Fusions Across Methodologies

Detection Method | Cases Identified | Detection Rate | Key Limitations
DNA Sequencing (DNA-seq) | 40/40 | 100% (reference) | May miss fusions with breakpoints in large introns
Targeted RNA-seq | 39/39 | 100% | Identifies expressed fusions; requires good RNA quality
Whole-Transcriptome Sequencing (WTS) | 31/39 | 79.5% | Lower sensitivity for low-abundance transcripts
Fluorescence In Situ Hybridization (FISH) | 33/40 | 82.5% | Limited to known partners; reveals architecture [36]

Research on RET fusions in non-small cell lung cancer (NSCLC) demonstrates that DNA-seq, while capable of initial identification, benefits substantially from RNA-seq confirmation. In a study of 40 RET+ NSCLC patients, targeted RNA-seq identified eight additional RET+ cases that were missed by whole-transcriptome sequencing (39/39 versus 31/39), highlighting its superior sensitivity for detecting these fusion events [36]. The same study reported a 92.3% concordance between DNA-seq and RNA-seq, with discordant cases potentially representing limitations in either platform's ability to detect certain fusion types.
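As a quick check on the table, the detection rates can be recomputed from the reported case counts. The sketch below is illustrative only; the counts are copied from the table, while the function and variable names are made up for this example.

```python
def detection_rate(detected: int, tested: int) -> float:
    """Percentage of tested cases in which the fusion was detected, to one decimal."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of cases")
    return round(100.0 * detected / tested, 1)


# (cases identified, cases tested) per method, as listed in the table [36].
RET_RESULTS = {
    "DNA-seq": (40, 40),
    "Targeted RNA-seq": (39, 39),
    "WTS": (31, 39),
    "FISH": (33, 40),
}

RET_RATES = {method: detection_rate(d, t) for method, (d, t) in RET_RESULTS.items()}
```

Note that the denominators differ between methods (39 versus 40 cases), so the percentages are not computed over an identical set of samples.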

Enhancer-Hijacking Events: The Invisible Threat

Enhancer-hijacking represents a particularly challenging structural variant for DNA-seq to detect. These events occur when genomic rearrangements place enhancer elements in proximity to oncogenes, activating their expression without generating fusion transcripts. This mechanism is especially prevalent in acute leukemias, where it drives oncogenesis through dysregulation of key developmental genes.

Table: Detection of Enhancer-Hijacking Events in Acute Leukemia (n=467 cases)

Measure | Value | Representative Alterations
Overall concordance, OGM vs. targeted RNA-seq | 74.7% | -
Concordance for enhancer-hijacking events | 20.6% | -
Rearrangements detected by OGM only | 15.8% | MECOM, BCL11B, IGH rearrangements
Rearrangements detected by targeted RNA-seq only | 9.4% | CDK6::MNX1, other enhancer-driven events [55]

A comprehensive analysis of 467 acute leukemia cases found only 20.6% concordance between OGM and targeted RNA-seq for enhancer-hijacking lesions, compared with 93.1% for all other aberration types [55]. Across all clinically relevant rearrangements, OGM uniquely detected 15.8%, while RNA-seq exclusively identified 9.4%. These findings highlight how enhancer-hijacking events represent a critical blind spot for transcriptome-based methods, requiring complementary technologies for comprehensive detection.

Experimental Approaches for Comprehensive Fusion Detection

Integrated DNA and RNA Sequencing Workflows

Advanced research protocols have demonstrated the utility of integrated sequencing approaches that combine the strengths of multiple technologies. These workflows typically employ DNA-seq as an initial screening tool, followed by targeted RNA-seq for verification and characterization of fusion events.

NSCLC Tumor Sample → DNA-seq Screening (425-gene panel) → RNA Quality Control (RIN >7, 260/280 ratio) → Whole-Transcriptome Sequencing (WTS)
WTS → Targeted RNA-seq (partner-agnostic panel) [if WTS inconclusive or novel fusion suspected]
WTS → FISH Confirmation [for validation]
Targeted RNA-seq → FISH Confirmation → Comprehensive Fusion Characterization

The workflow for RET fusion characterization exemplifies this integrated approach. DNA-seq serves as an initial screen using a 425-gene panel, followed by RNA quality assessment (RIN >7, appropriate 260/280 ratios). Whole-transcriptome sequencing provides broad detection capability, with targeted RNA-seq employed for cases showing inconclusive results or suspected novel fusions. FISH validation confirms potentially actionable findings, ensuring comprehensive fusion characterization [36]. This multi-modal approach maximizes sensitivity while maintaining specificity through orthogonal verification.

Reflex Testing Protocols for Enhanced Detection

Reflex testing protocols represent a strategic approach to balancing comprehensive detection with cost efficiency. These protocols employ initial screening with broader but less sensitive methods, followed by targeted secondary testing for negative cases with high clinical suspicion.

In non-small cell lung cancer, research has demonstrated the utility of amplicon-based DNA/RNA sequencing as an initial test, with reflex to hybridization-capture-based RNA sequencing for driver-negative cases. In one study of 1,211 NSCLC specimens, approximately 10% required reflex testing, which identified nine oncogenic fusions involving ALK, BRAF, NRG1, NTRK3, ROS1, and RET that were missed by the initial amplicon-based assay [14]. Analysis of the AACR Project Genie database revealed that 17.4% of fusions in NSCLC would be undetectable by amplicon-based approaches alone, highlighting the critical importance of reflex testing protocols [14].
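The reflex logic described above can be sketched as a small routing function. This is a minimal illustration under stated assumptions: the `Result` class, the `reflex_test` function, and the idea of passing the second assay as a callable are hypothetical names and design choices for this example, not the interface of any cited laboratory system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    specimen_id: str
    driver: Optional[str]  # reported driver alteration, or None if none found
    source: str            # which assay produced the final call


def reflex_test(
    specimen_id: str,
    amplicon_driver: Optional[str],
    capture_rna_assay: Callable[[str], Optional[str]],
) -> Result:
    """Run the second-line assay only when the first-line assay is driver-negative."""
    if amplicon_driver is not None:
        # A driver was found by the initial amplicon-based test: no reflex needed.
        return Result(specimen_id, amplicon_driver, "amplicon")
    # Driver-negative: reflex to hybridization-capture RNA sequencing.
    return Result(specimen_id, capture_rna_assay(specimen_id), "reflex")
```

Passing the reflex assay as a callable keeps the more expensive test lazy, so it only runs for the driver-negative subset.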

The Scientist's Toolkit: Essential Research Reagents

Table: Key Research Reagents for Fusion Detection Studies

Reagent/Category | Specific Examples | Research Function | Technical Considerations
Nucleic Acid Extraction | QIAamp DNA FFPE Tissue Kit, PAXgene RNA tubes | Preserve molecular integrity from specimens | FFPE degradation affects DNA; blood requires RNA stabilizers [56] [36]
Target Enrichment | Archer FusionPlex, Anchored Multiplex PCR (AMP) | Capture known/novel fusion transcripts | Partner-agnostic designs essential for novel discovery [55]
Library Preparation | KAPA HyperPrep, Illumina Stranded kits | Prepare sequencing libraries | Stranded protocols preserve transcript orientation [56]
rRNA Depletion | RNase H-based kits, Ribo-Zero | Enhance non-ribosomal sequencing | Reproducible but may cause off-target effects [56]
Sequencing Platforms | PacBio long-read (Iso-Seq), Illumina HiSeq | Generate sequence data | Long-read spans complex rearrangements [57]

Experimental Protocol: Integrated Fusion Detection

Sample Preparation and Quality Control

  • Extract DNA and RNA from matched patient samples (FFPE or fresh frozen)
  • Assess DNA quality via Nanodrop 2000 and Qubit dsDNA HS Assay
  • Evaluate RNA integrity using RIN values >7 and distinct 28S/18S ribosomal peaks
  • Perform rRNA depletion using RNase H-based methods to enhance detection of non-ribosomal transcripts [56]
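The RNA quality gate in this step can be expressed as a simple check. The RIN > 7 threshold comes from the protocol above; the acceptable A260/280 window of 1.8 to 2.2 is an assumption added for illustration, since the text only asks for "appropriate" ratios. The function name is hypothetical.

```python
from typing import Tuple


def passes_rna_qc(
    rin: float,
    a260_280: float,
    min_rin: float = 7.0,
    ratio_range: Tuple[float, float] = (1.8, 2.2),  # assumed window, not from the source
) -> bool:
    """Return True if a sample meets the RIN and purity criteria."""
    low, high = ratio_range
    # RIN must be strictly greater than the threshold, as stated in the protocol.
    return rin > min_rin and low <= a260_280 <= high
```

Samples failing this gate could still be routed to DNA-seq alone, since the text notes that DNA is more tolerant of degraded material than RNA.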

Library Preparation and Sequencing

  • Prepare DNA-seq libraries using a comprehensive gene panel (e.g., 425-cancer gene panel)
  • Construct RNA-seq libraries using both whole-transcriptome and targeted approaches
  • Employ stranded library protocols to preserve transcript orientation
  • Sequence on appropriate platforms (Illumina for short-read, PacBio for long-read applications) [36]

Data Analysis and Validation

  • Align sequences to reference genome (hg19/GRCh37) using BWA-MEM
  • Call fusion variants using dedicated algorithms (Delly, Archer Analysis)
  • Validate positive findings using orthogonal methods (FISH)
  • Interpret variants according to ACMG/AMP guidelines for clinical relevance [55] [36]

Signaling Pathways and Biological Impact

Enhancer-hijacking pathway: Enhancer Element → Oncogene (e.g., MECOM) [enhancer hijacking] → Oncogene Overexpression → Constitutive Signaling Activation → Uncontrolled Proliferation → Leukemogenesis
Fusion pathway: Genomic Rearrangement (DNA level) → Chimeric Fusion Transcript → Oncogenic Fusion Protein → Constitutive Kinase Activity → Cellular Transformation

The diagram illustrates two distinct mechanisms of oncogene activation with different detection requirements. Enhancer-hijacking events (top pathway) activate oncogenes through repositioning of regulatory elements without generating fusion transcripts, making them detectable primarily at the DNA level through methods like optical genome mapping. Conventional fusion events (bottom pathway) create chimeric transcripts and proteins detectable by both DNA-seq and RNA-seq, though with varying efficiency depending on the genomic architecture [55]. This mechanistic distinction explains why multi-platform approaches are necessary for comprehensive oncogenic driver detection.

DNA-seq presents significant limitations in detecting fusion events involving large intronic regions and enhancer-hijacking mechanisms, potentially missing clinically actionable alterations in cancer. The experimental data presented demonstrates that integrated approaches combining DNA-seq with targeted RNA-seq and optical genome mapping provide substantially improved detection sensitivity. For researchers and drug development professionals, implementing reflexive testing protocols and leveraging complementary technologies is essential for comprehensive genomic characterization. These multi-modal strategies ensure identification of both canonical fusion events and complex structural variants, ultimately supporting more precise targeted therapy development and patient stratification.

RNA sequencing (RNA-seq) has become an indispensable tool in modern biological research, providing a comprehensive snapshot of the transcriptome that enables researchers to quantify gene expression levels, detect alternative splicing events, and identify non-coding RNAs [58]. Its ability to capture dynamic changes in gene expression under different conditions or treatments makes RNA-seq invaluable for studying various biological processes, including development, disease mechanisms, and drug responses [58]. In the specific context of fusion gene detection—a critical application in oncology research—RNA-seq provides a complementary method to DNA sequencing that may improve the identification of actionable variants [25].

However, the powerful capabilities of RNA-seq come with significant technical challenges that researchers must navigate to generate reliable data. Two of the most critical limitations concern its profound dependence on RNA quality and transcript expression levels. These technical constraints can substantially impact data quality, potentially leading to false negatives, inaccurate quantification, and ultimately, compromised biological conclusions. This guide objectively examines these pitfalls through the lens of experimental evidence and provides researchers with practical frameworks for designing robust RNA-seq experiments, particularly for fusion detection research where these factors play a decisive role in success or failure.

The Fundamental Challenge: RNA Quality Dependence

The Technical Basis of RNA Instability

Unlike DNA, which is relatively stable and does not degrade rapidly, RNA is inherently unstable and degrades quickly once extracted from cells [4]. This fundamental chemical property poses one of the most significant practical challenges for RNA-seq workflows. The instability of RNA necessitates careful preservation immediately after sample collection, as high-quality tissue is required due to RNA's rapid degradation potential [58]. Specific tissue types present particular challenges; for instance, brain tissue must be collected quickly post-mortem to avoid degradation, which is often not feasible in clinical settings [58].

The RNA extraction process itself is "a difficult and often error prone process involving many steps with loss of sample at every step" [59]. The requirement to remove highly abundant ribosomal RNA (rRNA), which typically constitutes over 90% of total RNA in the cell, further complicates the process, leaving only the 1-2% comprising messenger RNA (mRNA) that researchers are typically interested in [28]. This process adds "labor, cost and time" and depletes the amount of original sample, creating particular challenges when working with needle biopsies, rare transcripts, or single cells [59].

Impact of Sample Quality on Results

The quality of starting RNA material directly influences multiple aspects of RNA-seq data quality. Formalin-Fixed Paraffin-Embedded (FFPE) samples exemplify these challenges, as "the fixation process causes RNA fragmentation and modifications leading to biased transcriptome profiles and inaccurate gene expression quantification" [58]. Artifacts like cross-linking may further interfere with sequencing processes [58].

Table 1: Impact of RNA Quality on Sequencing Metrics

RNA Quality Metric | High-Quality RNA Impact | Degraded RNA Impact
RNA Integrity Number (RIN) | Higher mapping rates, more balanced coverage [28] | Increased 3' bias, reduced library complexity [28]
Library Complexity | Detection of more transcripts, better dynamic range [28] | Limited transcript detection, biased toward highly expressed genes [59]
Mapping Quality | 70-90% of reads map to reference genome [28] | Reduced mapping percentages, increased ambiguous mappings [28]
Base-Level Quality Scores | Uniform quality across read length [28] | Quality deterioration toward 3' end, requiring trimming [28]

Variability in sample quality and quantity can introduce batch effects and confounding factors that complicate data interpretation [58]. This is especially problematic in clinical settings where sample collection cannot be as tightly controlled as in experimental models. The presence of high abundance RNAs requires additional steps to reduce background RNA and/or enrich for mRNAs, and although "these methods can help data quality, they add to the labor, cost and time required" for the experiment [59].

The Expression Level Dilemma: Sensitivity Limitations

Technical Limitations in Detecting Low-Abundance Transcripts

RNA-seq faces inherent challenges in sensitivity and noise that directly impact its ability to detect transcripts across different expression levels. The balance between sensitivity and noise is critical in RNA-seq analysis, as "technical limitations in library preparation and high sequencing depth requirements can lead to difficulties in detecting low-abundance transcripts, potentially underestimating or omitting important biological signals" [58]. Even when sequencing at sufficient depth to capture low-frequency transcripts, "the associated noise buildup can mask the transcripts that are of most importance" [58].

The fundamental issue stems from the composition of the transcriptome, where a small number of highly expressed genes can dominate the sequencing library, making it challenging to detect rare transcripts without extensive sequencing. High background noise from "sequencing errors, PCR amplification biases, and other technical artefacts can obscure genuine transcriptomic differences, making it challenging to distinguish true biological variability from experimental noise" [58]. This sensitivity limitation has direct implications for fusion detection, as fusion transcripts may be expressed at low levels despite their clinical significance.

Quantitative Evidence: RNA-seq vs. DNA-seq for Fusion Detection

Recent large-scale clinical studies have provided quantitative evidence of how transcript expression levels impact fusion detection sensitivity. A comprehensive analysis of approximately 80,000 samples from the Tempus Research Database compared the detection of clinically actionable fusions using both DNA-seq and whole exome capture RNA-seq [25]. The results demonstrated significant differences in detection capabilities:

Table 2: Comparative Fusion Detection Rates: DNA-seq vs RNA-seq

Gene Fusion | Total Fusions Detected | Detected by Both RNA + DNA | DNA Only | RNA Only
ALK-* | 386 | 78.0% | 4.1% | 17.9%
BRAF-* | 289 | 30.4% | 1.4% | 68.2%
FGFR3-* | 307 | 73.6% | 2.9% | 23.5%
NTRK1/2/3-* | 198 | 65.7% | 11.1% | 23.2%
ROS1-* | 113 | 70.8% | 1.8% | 27.4%
All Fusions | 2118 | 66.1% | 4.8% | 29.1%

Across all fusion events, 29.1% were detected only by RNA-seq, while only 4.8% were identifiable solely through DNA-seq [25]. This substantial difference highlights the complementarity of the two approaches and RNA-seq's particular value for fusion detection. The study further analyzed the therapeutic implications of these findings, noting that "fusions identified through RNA-seq alone led to a 24% increase in the number of patients who were eligible to receive matched therapies" [25].
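The three shares in the table can be combined to show how much of the total each assay would recover on its own. The sketch below is illustrative arithmetic only: the percentages are copied from the table, while `modality_yield` and `FUSION_SHARES` are made-up names. The result is a share of fusions found by either assay, not a true sensitivity, because fusions missed by both assays are not counted.

```python
# (both, DNA-seq only, RNA-seq only) as percentages, copied from the table [25].
FUSION_SHARES = {
    "ALK": (78.0, 4.1, 17.9),
    "BRAF": (30.4, 1.4, 68.2),
    "FGFR3": (73.6, 2.9, 23.5),
    "NTRK1/2/3": (65.7, 11.1, 23.2),
    "ROS1": (70.8, 1.8, 27.4),
    "All fusions": (66.1, 4.8, 29.1),
}


def modality_yield(both: float, dna_only: float, rna_only: float) -> dict:
    """Share of all detected fusions that each assay would find when used alone."""
    total = both + dna_only + rna_only
    return {
        "DNA-seq": round(100.0 * (both + dna_only) / total, 1),
        "RNA-seq": round(100.0 * (both + rna_only) / total, 1),
    }
```

For the "All fusions" row this gives roughly 70.9% for DNA-seq alone and 95.2% for RNA-seq alone, and the gap is widest for BRAF, where DNA-seq alone recovers about a third of the detected fusions.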

The technical reasons for RNA-seq's advantage in fusion detection relate to the nature of fusion events. DNA-seq operates at the DNA level, where "the breakpoints of fusion genes usually occur in long intronic regions, and the breakpoints of fusion genes vary across patients and diseases in real life" [4]. However, "DNA-seq cannot accurately cover the long intronic regions, which contain a large number of repetitive sequences, making it difficult to identify fusion genes" [4]. RNA-seq, in contrast, detects the expressed consequence of these genomic rearrangements, potentially providing a more functional assessment of the fusion's biological relevance.

Experimental Design Considerations for Optimized RNA-seq

Sample Quality Assessment and Quality Control

A successful RNA-seq study requires thoughtful experimental design beginning with RNA quality assessment. The RNA-extraction protocol must be chosen based on sample characteristics—for eukaryotes, this involves deciding "whether to enrich for mRNA using poly(A) selection or to deplete rRNA" [28]. Poly(A) selection "typically requires a relatively high proportion of mRNA with minimal degradation as measured by RNA integrity number (RIN), which normally yields a higher overall fraction of reads falling onto known exons" [28]. However, many biologically relevant samples (such as tissue biopsies) cannot be obtained in sufficient quantity or quality to produce good poly(A) RNA-seq libraries and therefore require ribosomal depletion instead [28].

Quality control checkpoints should be implemented at multiple stages of the RNA-seq workflow:

Sample Collection (RNA Integrity) → RNA Extraction (RIN Assessment) → Library Preparation (Fragment Analysis) → Sequencing (Quality Metrics) → Raw Reads (QC: FastQC) → Read Alignment (QC: RSeQC) → Quantification (Gene Counts)

Diagram 1: RNA-seq Quality Control Checkpoints

At the raw reads stage, quality control involves "analysis of sequence quality, GC content, the presence of adaptors, overrepresented k-mers and duplicated reads in order to detect sequencing errors, PCR artifacts or contaminations" [28]. Software tools such as the FASTX-Toolkit and Trimmomatic can be used to "discard low-quality reads, trim adaptor sequences, and eliminate poor-quality bases" [28]. For read alignment, important parameters include "the percentage of mapped reads, which is a global indicator of the overall sequencing accuracy and of the presence of contaminating DNA" [28]. Additional alignment quality metrics include "the uniformity of read coverage on exons and the mapped strand" [28].
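To make the read-level filtering concrete, here is a minimal sketch that scores reads by mean Phred quality and drops those below a cutoff. It assumes Phred+33 encoded quality strings, and the cutoff of 20 is an illustrative choice. This is a simplified stand-in for the idea of discarding low-quality reads, not a reimplementation of Trimmomatic or the FASTX-Toolkit.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records: List[Read], min_mean_q: float = 20.0) -> List[Read]:
    """Keep reads whose mean base quality is at least min_mean_q."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_q]
```

Production trimmers typically work on sliding windows along the read rather than a whole-read mean, which lets them rescue the high-quality portion of a read whose quality drops toward the 3' end.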

Sequencing Depth and Replicate Planning

Optimal sequencing depth is experiment-dependent and represents a balance between cost and comprehensiveness. While "some authors will argue that as few as five million mapped reads are sufficient to quantify accurately medium to highly expressed genes in most eukaryotic transcriptomes, others will sequence up to 100 million reads to quantify precisely genes and transcripts that have low expression levels" [28]. For fusion detection, where target transcripts may be rare, deeper sequencing is generally advantageous.
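One way to reason about depth for rare fusion transcripts is a simple Poisson sampling model. The sketch below assumes that reads are drawn independently and that a fixed fraction of all reads span the fusion junction; both are simplifying assumptions, and the fraction used in the usage note is hypothetical rather than measured.

```python
import math


def prob_detect(total_reads: float, junction_fraction: float, min_supporting_reads: int = 2) -> float:
    """Probability of seeing at least min_supporting_reads junction-spanning reads.

    Models the junction read count as Poisson with mean total_reads * junction_fraction.
    """
    lam = total_reads * junction_fraction
    p_below = sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(min_supporting_reads)
    )
    return 1.0 - p_below
```

With a hypothetical junction fraction of 1e-7, five million reads give about a 9% chance of observing two or more supporting reads, while 100 million reads give a chance above 99%, which is the intuition behind sequencing deeper for low-abundance fusions.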

The number of biological replicates is another critical design factor that "depends on both the amount of technical variability in the RNA-seq procedures and the biological variability of the system under study, as well as on the desired statistical power" [28]. Technical variation in RNA-seq experiments "stems from many sources, such as differences in quality and quantity of RNA recovered during sample preparation, library preparation batch effect, flow cell and lane effects when using Illumina technology, and adapter bias" [60]. Evidence suggests that "library preparation was the largest source of technical variation" [60].

To mitigate these effects, researchers should "randomize samples during preparation and dilute them to the same concentration" [60]. Additionally, "indexing and multiplexing samples, with all samples included on all lanes/flow cells" helps reduce lane-specific effects [60]. When complete multiplexing isn't possible, "a blocking design can be used that includes some samples from each group on each lane of sequencing" [60].
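The blocking idea can be sketched as a lane assignment that spreads every experimental group across all lanes. This is an illustrative example with hypothetical names; it shuffles within each group and then deals samples to lanes round-robin, so each lane receives samples from every group whenever a group has at least as many samples as there are lanes.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def blocked_lane_assignment(
    samples_by_group: Dict[str, List[str]],
    n_lanes: int,
    seed: int = 0,
) -> Dict[int, List[Tuple[str, str]]]:
    """Assign samples to lanes so that each group is balanced across lanes."""
    rng = random.Random(seed)  # fixed seed keeps the design reproducible
    lanes: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # randomize order within the group
        for i, sample in enumerate(shuffled):
            lanes[i % n_lanes].append((group, sample))
    return dict(lanes)
```

Because group membership is never confounded with lane, a lane-specific artifact affects all groups roughly equally and can be modeled as a blocking factor downstream.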

Analytical Workflows and Computational Tools

RNA-seq Data Analysis Pipeline

The analysis of RNA-seq data involves multiple steps, each with specific methodological considerations. A generalized workflow encompasses quality control, read alignment, quantification, and differential expression analysis, with tool selection depending on the specific experimental goals:

Raw Reads (FASTQ Files) → Quality Control (FastQC, NGSQC) → Trimming & Filtering (Trimmomatic, FASTX-Toolkit) → Read Alignment (STAR, HISAT2, TopHat2) → Quantification (featureCounts, HTSeq) → Normalization (TPM, FPKM, DESeq2, edgeR) → Differential Expression (DESeq2, edgeR, Cuffdiff) → Functional Analysis (GO Enrichment, GSEA)

Diagram 2: RNA-seq Data Analysis Workflow
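As an example of the normalization step in this workflow, transcripts per million (TPM) can be computed from raw counts and feature lengths in a few lines. The gene names and values in the usage note are hypothetical, and the function is a teaching sketch rather than a replacement for an established quantification tool.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to TPM using feature lengths in base pairs."""
    # Step 1: length-normalize each count (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: scale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}
```

For two hypothetical genes with 100 reads each and lengths of 1,000 bp and 2,000 bp, the shorter gene receives two thirds of the million, reflecting its higher read density per kilobase.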

For read alignment, different aligners are available, with studies comparing "Gsnap, Stampy and TopHat" for their influence on detection capabilities [61]. For differential expression analysis, multiple statistical methods have been developed, including "DESeq, edgeR, Cuffdiff, baySeq, and NOISeq" [61]. These tools employ different statistical models—"edgeR method proposed by Robinson et al. has been developed based on an overdispersed Poisson model," while "Anders and Huber showed that negative binomial was superior for estimation of variability in read count type data and implemented the method as a DESeq package" [61].
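The practical difference between a Poisson and a negative binomial model is the mean-variance relationship. The sketch below uses the standard parameterization in which the negative binomial variance is the mean plus a dispersion term times the squared mean; the dispersion value in the usage note is an arbitrary illustration, not an estimate from any dataset.

```python
def poisson_variance(mean: float) -> float:
    """Under a Poisson model the variance equals the mean."""
    return mean


def negative_binomial_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance: mean + dispersion * mean**2."""
    return mean + dispersion * mean**2
```

For a gene with a mean count of 100 and a dispersion of 0.1, the Poisson model expects a variance of 100 while the negative binomial allows 1,100, which is why count models with a dispersion parameter are better suited to the extra variability seen between biological replicates.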

Impact of Analysis Choices on Results

Recent systematic comparisons have revealed that analytical choices significantly impact RNA-seq results. One comprehensive study evaluated "192 pipelines using alternative methods" applied to 18 samples from two human cell lines, testing "3 trimming algorithms, 5 aligners, 6 counting methods, 3 pseudoaligners and 8 normalization approaches" [62]. The results demonstrated that "the choice of data preprocessing operations affected the performance" of downstream analyses [62].

For cross-study applications, such as building classifiers for tissue of origin prediction, preprocessing decisions including "normalization, batch effect correction, and data scaling" significantly impact performance [63]. One investigation found that "batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset" [63]. However, the same study also noted that "the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO" [63], highlighting that the optimal preprocessing approach depends on the specific application and data structure.

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 3: Key Research Reagent Solutions for RNA-seq Studies

Reagent/Solution Category | Specific Examples | Function and Application
RNA Stabilization Reagents | RNAlater, PAXgene | Stabilize RNA immediately after sample collection to prevent degradation [58]
RNA Extraction Kits | RNeasy Plus Mini Kit (QIAGEN) | High-quality RNA extraction with genomic DNA removal [62]
RNA Quality Assessment | Bioanalyzer (Agilent), TapeStation | Assess RNA Integrity Number (RIN) for sample QC [62]
rRNA Depletion Kits | Ribo-Zero, NEBNext rRNA Depletion | Remove abundant ribosomal RNA to enhance mRNA sequencing [28]
Poly(A) Selection Kits | Dynabeads mRNA DIRECT | Isolate polyadenylated mRNA molecules [28]
Library Preparation Kits | TruSeq Stranded Total RNA, NuGEN Ovation | Convert RNA to sequencing-ready libraries [60] [62]
Strand-Specific Library Kits | dUTP-based methods | Preserve strand orientation information during cDNA synthesis [28]

RNA-seq provides powerful capabilities for transcriptome analysis and fusion detection, but its effectiveness is constrained by fundamental dependencies on RNA quality and transcript expression levels. The evidence demonstrates that RNA-seq can detect approximately 29% of actionable fusions that would be missed by DNA-seq alone [25], highlighting its complementary value in comprehensive genomic profiling. However, realizing this potential requires meticulous attention to experimental design, including appropriate quality control measures, sufficient sequencing depth, and adequate biological replication.

Researchers must carefully consider their specific experimental goals when designing RNA-seq studies, as "the choice between standard and nascent RNA-seq depends on the research question and experimental objectives" [58]. Failure to consider critical factors such as RNA quality, sequencing depth, and analytical approaches "can lead to misinterpretation of results and limited insights into gene regulation dynamics" [58]. By understanding and addressing these pitfalls through rigorous experimental design and appropriate analytical choices, researchers can maximize the utility of RNA-seq data and generate robust, biologically meaningful results that advance our understanding of transcriptome biology and improve clinical detection of functionally important genetic events such as gene fusions.

Gene fusions are critical drivers in oncogenesis, serving as key biomarkers for disease classification, prognosis, and therapeutic targeting in precision oncology. The accurate detection of these complex structural variants relies heavily on optimized assay configurations, encompassing targeted panel design, sequencing coverage parameters, and bioinformatics pipeline selection. This guide provides a comprehensive comparison of RNA-seq and DNA-seq approaches for fusion gene detection, synthesizing experimental data from recent studies to inform researchers, scientists, and drug development professionals in their assay optimization strategies. The analysis focuses specifically on technical performance metrics across platforms and methodologies, providing evidence-based recommendations for clinical and research applications.

Performance Comparison of Fusion Detection Approaches

RNA-seq vs. Optical Genome Mapping for Structural Variant Detection

A large-scale comparative study of 467 acute leukemia cases provides critical insights into the complementary strengths of targeted RNA-seq and optical genome mapping (OGM) technologies. The research demonstrated an overall concordance rate of 74.7% between platforms (175 of 234 clinically relevant rearrangements), with significant variability across leukemia subtypes, ranging from 80.2% in B-ALL to 41.7% in T-ALL [21].

Table 1: Platform-Specific Detection Rates in Acute Leukemia (n=234 clinically relevant rearrangements)

Detection Category | Percentage | Count | Key Examples
OGM Unique Detection | 15.8% | 37/234 | MECOM, BCL11B, IGH rearrangements
RNA-seq Unique Detection | 9.4% | 22/234 | Fusions from intrachromosomal deletions
Concordant Detection | 74.7% | 175/234 | KMT2A, BCR::ABL1 rearrangements
Enhancer Hijacking Events (Concordance) | 20.6% | - | MECOM, BCL11B rearrangements

The data reveals that OGM particularly excels in identifying enhancer-hijacking lesions that often evade detection by RNA-seq, while targeted RNA-seq slightly outperforms for fusions arising from intrachromosomal deletions that OGM may misclassify as simple deletion events [21]. This underscores the platform-specific biases that must be considered during assay selection.
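The counts in the table also allow a per-platform view: how many of the 234 clinically relevant rearrangements each technology detected in total, whether uniquely or concordantly. The sketch below is simple arithmetic on the reported counts; the variable and function names are illustrative.

```python
# Counts of clinically relevant rearrangements, copied from the table [21].
CONCORDANT = 175
OGM_ONLY = 37
RNA_ONLY = 22
TOTAL = CONCORDANT + OGM_ONLY + RNA_ONLY


def platform_yield(unique: int, concordant: int, total: int) -> float:
    """Percentage of all rearrangements detected by one platform, to one decimal."""
    return round(100.0 * (unique + concordant) / total, 1)


OGM_YIELD = platform_yield(OGM_ONLY, CONCORDANT, TOTAL)  # OGM alone
RNA_YIELD = platform_yield(RNA_ONLY, CONCORDANT, TOTAL)  # targeted RNA-seq alone
```

On these counts OGM alone recovers about 90.6% and targeted RNA-seq alone about 84.2% of the rearrangements, so neither platform reaches the combined total by itself.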

Targeted vs. Conventional RNA-seq for Fusion Detection

Targeted RNA-seq demonstrates significant advantages for fusion detection sensitivity compared to conventional whole transcriptome approaches. Experimental validation using spike-in standards and cell lines shows that targeted capture achieves 50% detection of fusion sequins at 2 pM input and 100% detection between 8 pM and 31 nM input, independent of whether the panel targeted one or both fusion partners [26].

Table 2: Sensitivity Comparison of RNA-seq Methodologies

Performance Metric | Targeted RNA-seq | Conventional RNA-seq
Detection Sensitivity | 50% at 2 pM input | Limited for low-expression fusions
On-Target Rate | 93% (double-capture) | ~4%
Enrichment Factor | 33-59 fold | -
Single-Copy Fusion Detection | Reliable | Challenging
Novel Partner Identification | Possible | Possible

In clinical validation, targeted RNA-seq increased the overall fusion diagnostic rate from 63% with conventional approaches (FISH, RT-PCR) to 76%, while simultaneously identifying precise fusion junctions and partners [26]. This enhanced sensitivity is particularly valuable for detecting low-abundance fusion transcripts in samples with limited tumor purity or those expressing fusion genes at low levels.

Experimental Protocols and Methodologies

Targeted RNA-seq Laboratory Workflow

The optimized protocol for fusion detection employs a double-capture approach to maximize on-target efficiency:

  • RNA Extraction and Quality Control: Isolate RNA from patient specimens (blood, bone marrow, or tumor tissue) using standardized extraction kits. Assess RNA integrity using appropriate methods (e.g., Bioanalyzer) with RIN > 7.0 recommended [26] [62].

  • Library Preparation: Utilize stranded RNA library preparation kits (e.g., TruSeq Stranded Total RNA) following manufacturer protocols with incorporation of unique dual indexes to enable sample multiplexing [62].

  • Hybridization Capture: Employ biotinylated oligonucleotide probes targeting known fusion genes (188 genes for hematological malignancies, 241 genes for solid tumors) with 16-24 hour hybridization at 65°C. Include both DNA-level and RNA-level spike-in controls (ERCC, fusion sequins) for quality monitoring [26].

  • Post-Capture Amplification: Perform double-capture enrichment with magnetic streptavidin bead-based purification followed by 10-12 cycles of PCR amplification [26].

  • Sequencing: Sequence on Illumina platforms (HiSeq 2500, MiSeq, or NovaSeq) with paired-end reads (2×101 bp) targeting minimum 20 million reads per sample for adequate coverage [62].

Bioinformatic Processing and Fusion Calling

Computational analysis requires specialized pipelines to distinguish true fusion events from artifacts:

  • Read Preprocessing: Trim adapters and low-quality bases using Trimmomatic, Cutadapt, or BBDuk, retaining reads with Phred score >20 and length >50 bp [62].

  • Alignment and Mapping: Map reads to reference genome (GRCh37/hg19 or GRCh38/hg38) using STAR aligner with chimeric alignment detection enabled [7] [26].

  • Fusion Detection: Implement multiple algorithms in parallel (STAR-Fusion, FusionCatcher, Arriba) to maximize sensitivity. Require consensus detection by at least two tools to minimize false positives [7] [26] [64].

  • Filtering and Annotation: Remove known artifacts, read-through transcripts, and common false positives. Annotate remaining fusions with clinical relevance (Tier 1-3 classification per ACMG/AMP guidelines) [21].
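The consensus rule in the fusion detection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: it assumes each caller's output has already been parsed into (5' gene, 3' gene) pairs, the function name `consensus_fusions` and the example calls are hypothetical, and `GENE_A`/`GENE_B` are placeholders. Matching on gene pairs alone ignores breakpoint coordinates, which a production pipeline would likely compare as well.

```python
from collections import Counter


def consensus_fusions(calls_by_tool, min_tools=2):
    """Return fusions reported by at least `min_tools` callers.

    calls_by_tool maps a tool name to an iterable of (gene5, gene3) pairs.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        # Use a set so a tool reporting the same pair twice counts once.
        for pair in set(calls):
            support[pair] += 1
    return sorted(pair for pair, n in support.items() if n >= min_tools)


# Hypothetical per-tool output, already parsed into gene pairs.
calls = {
    "STAR-Fusion": [("BCR", "ABL1"), ("KMT2A", "AFDN")],
    "FusionCatcher": [("BCR", "ABL1"), ("GENE_A", "GENE_B")],
    "Arriba": [("BCR", "ABL1"), ("KMT2A", "AFDN")],
}

consensus = consensus_fusions(calls)
print(consensus)  # [('BCR', 'ABL1'), ('KMT2A', 'AFDN')]
```

Here the two fusions supported by at least two callers are kept, and the call made by a single tool is dropped.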

[Diagram: raw RNA-seq FASTQ files → quality control and trimming → alignment to reference genome → fusion detection algorithms (STAR-Fusion, FusionCatcher, Arriba, run in parallel) → consensus fusion calls → annotation and clinical reporting → final fusion list]

Figure 1: Bioinformatics Pipeline for Fusion Detection. This workflow illustrates the sequential steps from raw sequencing data to final fusion calling, emphasizing the importance of multiple algorithm consensus.

Bioinformatics Tool Performance

Comparative Accuracy of Fusion Detection Algorithms

Benchmarking studies evaluating fusion detection algorithms across simulated and experimental datasets reveal significant variation in performance characteristics. When tested on samples with low concentrations of fusion transcripts, Arriba demonstrated superior sensitivity, identifying 88 of 150 simulated fusions at fivefold expression level compared to 57% for the next best method [7].

Table 3: Fusion Detection Tool Performance Characteristics

Tool | Sensitivity | Precision | Runtime | Strengths
Arriba | 88/150 simulated fusions (5x level) | High | <1 hour/sample | Detects intragenic rearrangements, cryptic events
FusionCatcher | Moderate-High | Moderate | Hours | Comprehensive gene annotation
STAR-Fusion | Moderate | High | Hours | Accurate breakpoint resolution
FusionScan | 79% recall | 60% precision | Comparable to leaders | Optimized for intact exon combinations
deFuse | Low-Moderate | Moderate | Hours | Good specificity

Performance varies substantially across data types, with certain tools excelling in specific contexts. For example, Arriba detected 55 TMPRSS2-ERG fusions in the ICGC early-onset prostate cancer cohort (6% more than the next best method) and 8 IG-BCL2/BCL6/MYC translocations in the TCGA-DLBC cohort (60% more than the next best method) [7]. This highlights the importance of matching algorithm selection to experimental context and fusion types of interest.

Signaling Pathways and Biological Context

Gene fusions activate oncogenic pathways through multiple mechanisms, predominantly via either promoter swapping leading to oncogene overexpression or creation of chimeric proteins with constitutive kinase activity.

[Diagram: fusion gene formation → promoter swapping (overexpression) or chimeric protein (constitutive activity) → oncogenic signaling activation → MAPK, PI3K/AKT, and JAK/STAT pathways → cell proliferation and survival → therapeutic inhibition (feedback)]

Figure 2: Oncogenic Signaling Pathways Activated by Gene Fusions. This diagram illustrates the primary mechanisms through which fusion genes drive oncogenesis, highlighting key downstream pathways and potential therapeutic intervention points.

In pancreatic cancer, fusion genes are significantly associated with KRAS wild-type tumors and predominantly involve proteins that stimulate the MAPK signaling pathway, suggesting they functionally substitute for activating KRAS mutations [7]. Similar pathway-specific activities are observed across cancer types, with different fusion classes activating characteristic oncogenic programs.

Research Reagent Solutions

Table 4: Essential Research Reagents and Platforms for Fusion Detection Studies

Reagent/Platform | Function | Application Notes
Targeted RNA-seq Panels | Gene-specific enrichment | 188-gene (hematologic) vs. 241-gene (solid tumor) configurations
Archer Analysis Software | Fusion calling from AMP-based data | Optimized for anchored multiplex PCR target enrichment
Bionano OGM | Genome-wide structural variant detection | Complementary to RNA-seq for enhancer hijacking events
Spike-in Controls (ERCC, Fusion Sequins) | Quantification standards | Enable absolute sensitivity measurement and quality control
STAR Aligner | Spliced read alignment | Critical for chimeric junction detection in RNA-seq data
Illumina Sequencing Platforms | High-throughput sequencing | MiSeq, HiSeq 2500, NovaSeq for varying throughput needs

The optimization of fusion detection assays requires careful consideration of the complementary strengths and limitations of available technologies. Targeted RNA-seq provides superior sensitivity for detecting expressed chimeric fusions, particularly those arising from intrachromosomal deletions, while OGM excels in identifying cryptic, enhancer-driven rearrangements that may be missed by transcriptome-based approaches. Bioinformatics pipeline selection significantly impacts detection accuracy, with consensus approaches using multiple algorithms (Arriba, STAR-Fusion, FusionCatcher) providing optimal sensitivity and specificity. These findings support a multimodal approach to fusion detection in clinical and research settings, where orthogonal technologies provide comprehensive structural variant characterization to inform basic cancer research and precision oncology initiatives.

In the era of precision oncology and advanced genetic diagnostics, the detection of actionable molecular alterations is paramount. Fusion genes represent a critical class of biomarkers that guide diagnosis, prognosis, and targeted treatment decisions across numerous cancers. While both DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) can identify these fusions, discrepancies between their results frequently present significant challenges in clinical and research settings. Understanding the sources of these discrepancies—whether biological, technical, or analytical—is essential for accurate interpretation and appropriate patient management. This guide objectively compares the performance of DNA-seq and RNA-seq for fusion detection, examining the underlying causes of discordant results and providing evidence-based strategies for resolution.

Discrepancies between DNA-seq and RNA-seq results arise from multiple factors spanning biological mechanisms and technical limitations.

  • Biological Mechanisms: True biological differences can manifest as DNA-RNA discordance. RNA editing represents one such process where the RNA sequence is altered post-transcriptionally. However, studies indicate that RNA editing explains only a minor portion of observed discrepancies [65]. More impactful is the transcriptional process itself; a fusion identified at the DNA level may not be transcribed or expressed, rendering it undetectable by RNA-seq. Conversely, trans-splicing or read-through transcription events can create fusion transcripts without an underlying genomic rearrangement [9].

  • Technical and Analytical Limitations: Each technology has inherent limitations. DNA-seq, particularly when using targeted panels, can miss rearrangements occurring in large intronic regions or complex genomic contexts outside the covered areas [36] [34]. The detection accuracy is also influenced by the unpredictable nature of genomic breakpoints. In contrast, RNA-seq faces challenges related to RNA quality, which is often compromised in formalin-fixed paraffin-embedded (FFPE) samples due to chemical modification and degradation [34]. Furthermore, the alignment of RNA-seq reads is complicated by phenomena such as alternative splicing and the presence of pseudogenes—dysfunctional genomic sequences with high similarity to functional genes—which can lead to misalignment and false positives [66].

Performance Comparison: DNA-seq vs. RNA-seq

Direct comparisons of DNA-seq and RNA-seq for fusion detection reveal complementary strengths and weaknesses, as summarized by concordance studies and detection rates.

Table 1: Concordance Rates Between Detection Methods for RET Fusions in NSCLC

Comparison | Concordance Rate | Study Context
DNA-seq vs. RNA-seq | 92.3% | Early-stage NSCLC [36]
RNA-seq vs. FISH | 84.6% | Early-stage NSCLC [36]
DNA-seq vs. FISH | 82.5% | Early-stage NSCLC [36]

Table 2: Detection Performance in Clinical Validation Studies

Metric | DNA-seq Only | RNA-seq Only | Integrated DNA/RNA Approach
Sensitivity | 93.4% | 86.9% | 100% [34]
Specificity | 96.9% | 96.9% | 100% [34]
Commonly Missed Fusions | ETV6::NTRK3, CCDC6::RET | TRIM46::NTRK1, CD74::ROS1 | None (complementary detection) [34]

The data demonstrate that while DNA-seq and RNA-seq each perform well independently, an integrated approach achieves higher sensitivity and specificity by leveraging their complementary nature: each method recovers fusions that the other misses. Assay design within RNA-seq also matters. In one study, targeted RNA-seq uncovered five additional RET+ cases missed by whole-transcriptome sequencing [36]. Another study on acute myeloid leukemia found that RNA-seq detected 90% of fusion events reported by routine diagnostics (karyotyping and FISH) with high evidence [9].
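To make the complementarity argument concrete, the sketch below computes sensitivity and specificity for DNA-only, RNA-only, and combined calls on a toy cohort. Everything here is an assumption for illustration: the five samples, their calls, and the "positive if either assay is positive" combination rule are hypothetical and are not taken from the cited studies. Such a union rule can only keep or raise sensitivity; its effect on specificity depends on how discordant calls are adjudicated.

```python
def sensitivity_specificity(calls, truth):
    """Compute (sensitivity, specificity).

    calls and truth map sample ID -> bool (fusion called / truly present).
    """
    tp = sum(calls[s] and truth[s] for s in truth)
    fn = sum((not calls[s]) and truth[s] for s in truth)
    tn = sum((not calls[s]) and (not truth[s]) for s in truth)
    fp = sum(calls[s] and (not truth[s]) for s in truth)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: three fusion-positive and two negative samples.
truth = {"S1": True, "S2": True, "S3": True, "S4": False, "S5": False}
dna = {"S1": True, "S2": False, "S3": True, "S4": False, "S5": False}
rna = {"S1": True, "S2": True, "S3": False, "S4": False, "S5": False}

# Assumed combination rule: positive if either assay calls the fusion.
combined = {s: dna[s] or rna[s] for s in truth}

for name, calls in [("DNA-seq", dna), ("RNA-seq", rna), ("Combined", combined)]:
    sens, spec = sensitivity_specificity(calls, truth)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example each assay misses a different positive sample, so each alone reaches two of three positives while the combined call recovers all three.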

Methodologies for Resolving Discrepancies

Integrated DNA-RNA Sequencing Workflow

A robust protocol for resolving discrepancies involves a complementary testing algorithm. One validated approach begins with an initial amplicon-based DNA/RNA sequencing step. If this step is negative for oncogenic drivers, the sample is reflexed to more comprehensive hybridization-capture-based RNA sequencing [14]. This strategy successfully identified actionable fusions in non-small cell lung carcinoma (NSCLC) that were missed by the initial amplicon-based assay [14].
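The reflex logic described above can be expressed as a short decision function. This is a sketch only: the two assay functions are hypothetical stand-ins that return a list of detected drivers, and the sample records are invented for illustration.

```python
def reflex_workflow(sample, amplicon_assay, capture_rna_assay):
    """Run the amplicon assay first; reflex to capture RNA-seq if it finds no driver."""
    drivers = amplicon_assay(sample)
    if drivers:
        return {"source": "amplicon", "drivers": drivers}
    # Negative or inconclusive first-line result: run the broader assay.
    drivers = capture_rna_assay(sample)
    return {"source": "reflex_capture_rna", "drivers": drivers}


# Hypothetical stand-ins for the two assays.
def amplicon_assay(sample):
    return sample.get("amplicon_hits", [])


def capture_rna_assay(sample):
    return sample.get("capture_hits", [])


sample_a = {"amplicon_hits": ["EML4::ALK"]}
sample_b = {"amplicon_hits": [], "capture_hits": ["CD74::ROS1"]}

result_a = reflex_workflow(sample_a, amplicon_assay, capture_rna_assay)
result_b = reflex_workflow(sample_b, amplicon_assay, capture_rna_assay)
print(result_a)
print(result_b)
```

The second sample illustrates the case the reflex step is meant to catch: a fusion absent from the first-line result but found by the capture-based assay.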

Another developed assay simultaneously utilizes both DNA and RNA from FFPE samples. The DNA component helps confirm the genomic rearrangement, while the RNA component confirms the expression of the fusion transcript, thereby ruling out silent rearrangements [34]. This dual-layer approach facilitates precise diagnosis and treatment.

[Diagram: FFPE tumor sample → concurrent DNA and RNA extraction → initial amplicon-based DNA/RNA NGS test → oncogenic driver found? Yes: actionable fusion identified → precise diagnosis and targeted therapy. No: negative/inconclusive result → reflex to hybridization-capture-based RNA-seq → precise diagnosis and targeted therapy]

Analytical and Bioinformatics Strategies

Beyond laboratory workflows, sophisticated bioinformatics strategies are critical for accurate fusion detection from RNA-seq data. This involves using multiple state-of-the-art fusion callers (e.g., Arriba, FusionCatcher) and applying stringent, custom filtering strategies to reduce false positives [9]. Key filters include:

  • Promiscuity Score (PS): Excludes fusion events where partner genes are frequently called in other distinct fusions, as these are likely artifacts [9].
  • Fusion Transcript Score (FTS): Measures the expression of a fusion relative to its partner genes; low FTS indicates a likely artifact [9].
  • Robustness Score (RS): Defined as the ratio between the number of samples in which a fusion passed all filters and the total number of samples where it was called [9].
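Of the three filters, the Robustness Score has the most explicit definition, so it is the easiest to sketch. The code below implements the ratio as stated in the list above. The input layout (one record per sample and fusion, with a pass/fail flag), the function name, and the example records are assumptions for illustration, and `GENE_A::GENE_B` is a placeholder.

```python
from collections import defaultdict


def robustness_scores(calls):
    """Compute RS per fusion.

    calls: iterable of (sample_id, fusion, passed_all_filters) records.
    RS = samples where the fusion passed all filters / samples where it was called.
    """
    called = defaultdict(set)
    passed = defaultdict(set)
    for sample, fusion, ok in calls:
        called[fusion].add(sample)
        if ok:
            passed[fusion].add(sample)
    return {fusion: len(passed[fusion]) / len(called[fusion]) for fusion in called}


# Hypothetical call records across three samples.
records = [
    ("P1", "RUNX1::RUNX1T1", True),
    ("P2", "RUNX1::RUNX1T1", True),
    ("P3", "RUNX1::RUNX1T1", False),
    ("P1", "GENE_A::GENE_B", False),
    ("P2", "GENE_A::GENE_B", False),
]

rs = robustness_scores(records)
for fusion, score in rs.items():
    print(f"{fusion}: RS = {score:.2f}")
```

A fusion that is called often but rarely survives filtering receives a low RS, which flags it as a probable recurrent artifact.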

For DNA-seq, specialized structural variant callers like Delly are employed to identify the genomic breakpoints supporting a fusion [36].

The Scientist's Toolkit: Essential Research Reagents and Solutions

Successful resolution of DNA-seq/RNA-seq discrepancies relies on a suite of specialized reagents, kits, and computational tools.

Table 3: Key Research Reagent Solutions for Fusion Detection

Item | Function | Example Use Case
QIAamp DNA FFPE Tissue Kit | Extracts high-quality DNA from challenging FFPE samples. | Input for DNA-based NGS fusion panels [36].
KAPA Hyper Prep Kit | Prepares NGS libraries from extracted DNA. | Used in targeted DNA-seq for fusion detection [36].
Stranded Ribo-Zero Depletion Kit | Removes ribosomal RNA for whole-transcriptome sequencing. | RNA library prep for comprehensive fusion screening [67].
Stranded Poly-A Enrichment Kit | Selects polyadenylated RNA for sequencing. | RNA library prep focusing on mRNA [67].
GeneWell Fusion Reference Standards | Spiked-in controls containing validated fusions. | Assess assay sensitivity, specificity, and limit of detection [34].
Arriba & FusionCatcher | Bioinformatics tools for fusion detection from RNA-seq data. | Used in tandem for robust fusion calling in AML [9].
Delly | Bioinformatics tool for calling structural variants from DNA-seq. | Identifies genomic breakpoints supporting fusions [36].

The discrepancy between DNA-seq and RNA-seq results in fusion gene detection is not a mere technical artifact but a multifaceted issue with biological and methodological roots. Evidence consistently shows that neither technology is infallible alone; DNA-seq can miss expressed fusions due to breakpoint location or design limitations, while RNA-seq can be confounded by low expression, poor sample quality, or complex alignment scenarios.

The most reliable path forward involves an integrated, complementary approach. Combining DNA and RNA analysis within a single assay or a reflexive testing algorithm maximizes detection sensitivity and specificity, minimizing false negatives and positives. Furthermore, employing robust bioinformatics pipelines with stringent filtering is essential for accurate data interpretation. As precision medicine continues to evolve, embracing these multi-modal diagnostic strategies will be crucial for ensuring all patients receive accurate diagnoses and benefit from the most effective targeted therapies.

Next-generation sequencing (NGS) has revolutionized cancer diagnostics, yet traditional approaches that analyze DNA and RNA separately present significant limitations in clinical practice. DNA sequencing alone struggles to reliably detect key oncogenic drivers such as gene fusions and exon-skipping events because breakpoints often occur within introns or repetitive regions that are challenging for hybridization-capture assays [68]. This diagnostic gap has clinical consequences, as these alterations represent actionable targets for targeted therapies. RNA sequencing provides direct evidence of fusion transcripts and aberrant splicing, offering a solution to these limitations [26]. However, implementing separate DNA and RNA assays requires more specimen material, increases costs, and prolongs turnaround times—critical factors in clinical decision-making. This comparison guide examines the emerging solution: integrated DNA-RNA NGS assays that simultaneously capture multiple data types from a single workflow, offering a more comprehensive genomic profiling approach for clinical use.

Performance Comparison: Combined Assays Versus Traditional Approaches

Diagnostic Yield and Technical Performance

Recent studies demonstrate that integrated DNA-RNA sequencing significantly outperforms DNA-only approaches in identifying clinically actionable alterations, particularly for fusion detection.

Table 1: Comparative Performance of Sequencing Approaches for Fusion Detection

Metric | DNA-Only NGS | RNA-Only NGS | Combined DNA-RNA
Fusion Detection Sensitivity | Limited for novel/unexpected partners [68] | High for expressed fusions [26] | Highest; captures both known and novel [69] [26]
Exon-Skipping Detection | Challenging; indirect inference [68] | Direct detection via transcriptome [68] | Comprehensive DNA+RNA evidence [68]
Actionable Alteration Rate | ~80-90% of cases [70] | N/A (fusion-focused) | 98% of cases [69]
Novel Fusion Discovery | Limited by design | Possible but requires high expression [26] | Enhanced; captures partners missed by targeted panels [69]
Orthogonal Confirmation Needed | Often required for fusions [70] | Sometimes required | Reduced need due to combined evidence

In a validation study of 2,230 clinical tumor samples, the combined assay improved fusion detection and enabled direct correlation of somatic alterations with gene expression. This approach uncovered clinically actionable alterations in 98% of cases and revealed complex genomic rearrangements that would likely have remained undetected with DNA-only testing [69]. For non-small cell lung cancer (NSCLC)—a malignancy with numerous actionable fusions—one study found that approximately 10% of cases required reflex RNA sequencing to identify oncogenic drivers after initial DNA-based testing was negative [14].

Analytical Validation and Concordance

Integrated assays undergo rigorous validation to ensure reliability across variant types. One study established comprehensive performance metrics using exome-wide somatic reference standards containing 3,042 single nucleotide variants (SNVs) and 47,466 copy number variations (CNVs) [69]. When compared to orthogonal methods, combined approaches show variable but generally high concordance depending on the alteration type:

Table 2: Concordance Rates with Orthogonal Methods in Clinical Samples

Alteration Type | Cancer Type | Sensitivity (%) | Specificity (%) | Notes
SNVs (e.g., KRAS) | Colorectal Cancer | 87.4 | 79.3 | DNA component performance [70]
Fusions (e.g., ALK) | NSCLC | 100 | 100 | RNA significantly enhances detection [70]
Fusions (e.g., ROS1) | NSCLC | 33.3 | N/A | Lower sensitivity for certain fusions with DNA-only [70]
Amplifications (ERBB2) | Breast Cancer | 53.7 | 99.4 | DNA-only; challenges with CNV calling [70]
Amplifications (ERBB2) | Gastric Cancer | 62.5 | 98.2 | DNA-only; tissue quality impacts performance [70]

The detection threshold for variant calling significantly impacts assay performance. Studies recommend a minimum 2% variant allele frequency (VAF) threshold for optimal specificity in mutation detection, as specificity dramatically decreases below this level [71].
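A VAF cutoff of this kind reduces to a one-line comparison. The sketch below assumes the threshold is inclusive (a variant at exactly 2% passes), which the cited recommendation does not specify; the function name and read counts are hypothetical.

```python
def passes_vaf(alt_reads, total_reads, min_vaf=0.02):
    """Return True if the variant allele frequency meets the minimum threshold."""
    if total_reads <= 0:
        return False  # no coverage, so VAF is undefined
    return alt_reads / total_reads >= min_vaf


# Hypothetical read counts at two variant positions.
print(passes_vaf(12, 400))  # VAF 3% -> True
print(passes_vaf(5, 500))   # VAF 1% -> False
```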

Methodological Approaches: Experimental Design and Workflow Optimization

Integrated Assay Configurations

Several technical approaches exist for combining DNA and RNA sequencing:

  • Parallel Sequencing: DNA and RNA are processed separately then sequenced concurrently [69]
  • Sequential Testing: RNA sequencing follows DNA testing as a reflex [14]
  • Unified Co-Capture: Single-tube capture of pre-captured DNA and RNA libraries [68]

The DNA/RNA co-hybrid capture sequencing (DRCC-Seq) approach demonstrates particular innovation by mixing pre-captured DNA and RNA libraries in defined proportions before a single capture reaction, creating one sequencing library [68]. This method shows optimal performance with a 1:1 ratio of DNA to RNA probes, as higher RNA proportions negatively impact DNA data quality and copy number calling [68].

Laboratory Procedures and Quality Control

Successful implementation requires meticulous attention to laboratory procedures:

Nucleic Acid Isolation: For fresh frozen solid tumors, the AllPrep DNA/RNA Mini Kit enables simultaneous isolation of both nucleic acids. For FFPE samples, dedicated kits like the AllPrep DNA/RNA FFPE Kit account for cross-linking and fragmentation [69].

Library Preparation: Input requirements typically range from 10-200 ng of extracted DNA or RNA. For RNA library construction from fresh frozen tissue, the TruSeq stranded mRNA kit is commonly used, while FFPE samples require specialized kits like SureSelect XTHS2 to handle degraded material [69].

Sequencing and QC: Sequencing is typically performed on Illumina platforms such as NovaSeq 6000. Quality control metrics include Q30 scores >90% and passing filter (PF) rates >80%. For RNA sequencing, metrics like RNA integrity number (RIN) are crucial for assessing sample quality [69].
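The run-level thresholds above lend themselves to a simple automated gate. This is a minimal sketch assuming the metrics have already been extracted into a dictionary; the metric keys (`pct_q30`, `pct_pf`) and the function name are illustrative and do not correspond to any particular instrument's output format.

```python
# Minimum values quoted in the text: Q30 > 90% and PF > 80%.
QC_THRESHOLDS = {"pct_q30": 90.0, "pct_pf": 80.0}


def failed_qc_metrics(run_metrics, thresholds=QC_THRESHOLDS):
    """Return the names of metrics that do not strictly exceed their threshold.

    A metric missing from run_metrics is treated as a failure.
    """
    return [
        name
        for name, minimum in thresholds.items()
        if run_metrics.get(name, float("-inf")) <= minimum
    ]


# Hypothetical run: Q30 passes, PF rate falls short.
run = {"pct_q30": 92.5, "pct_pf": 78.0}
failures = failed_qc_metrics(run)
print(failures)  # ['pct_pf']
```

Returning the list of failed metrics, rather than a single pass/fail flag, makes it easier to record why a run was rejected.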

Table 3: Essential Research Reagent Solutions for Integrated NGS

Reagent/Category | Specific Examples | Function in Workflow
Nucleic Acid Extraction | AllPrep DNA/RNA Mini Kit (Qiagen); Concert FFPE kits [69] [68] | Co-extraction of DNA and RNA while preserving integrity
Library Preparation | TruSeq stranded mRNA kit; SureSelect XTHS2; Rapid MaxDNA Lib Prep kit [69] [68] | Preparation of sequencing libraries from nucleic acid inputs
Hybridization Capture | SureSelect Human All Exon; Custom DNA+RNA probe panels [69] [68] | Target enrichment for exonic regions and fusion-related genes
Quality Control | Qubit Fluorometer; TapeStation; QIAxcel Advanced System [69] [68] | Quantification and quality assessment of inputs and libraries
Sequence Capture | Twist Fast Hybridization and Wash Kit; Custom probe panels [69] [68] | Target enrichment with optimized conditions for both DNA and RNA

Bioinformatics Analysis Frameworks

The computational pipeline for integrated assays requires specialized approaches:

Alignment: DNA sequencing data is typically mapped to the human genome (hg38) using BWA aligner, while RNA sequencing data uses STAR aligner for its handling of splice junctions [69].

Variant Calling: Somatic SNVs and indels are detected using optimized algorithms like Strelka2, with RNA-seq variant calling performed using tools such as Pisces [69].

Fusion Detection: A consensus approach requiring detection by multiple algorithms (e.g., STAR-Fusion and FusionCatcher) significantly reduces false positives [26].

Quality Control: Unique considerations include calculation of off-target rates, duplicate reads, and for RNA-seq, assessment of strand-specificity and DNA contamination [69].

[Diagram: sample input (FF or FFPE tissue) → nucleic acid extraction (AllPrep DNA/RNA Kit) → DNA library prep (fragmentation, adapter ligation) and RNA library prep (fragmentation, cDNA synthesis) → library pooling (indexed DNA + RNA libraries) → hybridization capture (combined DNA+RNA probe panel) → NGS sequencing (Illumina NovaSeq 6000) → bioinformatics analysis (variant calling, fusion detection) → clinical report (integrated DNA+RNA alterations)]

Integrated DNA-RNA NGS Workflow

Clinical Validation and Utility

Analytical Validation Frameworks

Rigorous validation of integrated assays follows a structured approach:

Reference Materials: Use of cell lines and synthetic standards with known mutations across varying tumor purities establishes baseline performance. The 3,042 SNVs and 47,466 CNVs in validated reference materials enable exome-wide analytical validation [69].

Orthogonal Confirmation: Comparison with established methods (FISH, RT-PCR, ddPCR) verifies results. One study showed 100% concordance for ALK fusions between NGS and orthogonal methods, though sensitivity for ROS1 fusions was lower (33.3%) with DNA-only approaches [70].

Clinical Utility Assessment: Implementation in real-world cohorts demonstrates practical value. In one clinical cohort, targeted RNA-seq increased the fusion diagnostic rate from 63% with conventional approaches (FISH, RT-PCR) to 76% [26].

Clinical Application Across Cancer Types

The utility of combined DNA-RNA profiling extends across multiple malignancies:

Non-Small Cell Lung Cancer: Detection of targetable fusions in ALK, ROS1, RET, and NTRK is enhanced by RNA sequencing, with one study identifying these alterations in approximately 9% of reflex-tested cases [14].

Hematological Malignancies: Custom panels targeting 188 fusion-related genes, including immune receptor loci (TCR, IG), enable comprehensive profiling while simultaneously characterizing the immune repertoire [26].

Solid Tumors: Expanded panels covering 241 fusion-related genes identify both established and novel driver events in sarcoma, prostate, and other solid tumors [26].

[Diagram: DNA sequencing only provides SNVs/indels, copy number variations, and tumor mutational burden, but is limited by missed fusions and indirect splice detection. The integrated solution, combined DNA-RNA sequencing, adds direct fusion detection, expression validation, and enhanced actionability. Clinical decision impact: 98% actionable alteration rate, personalized treatment, improved outcomes]

Multi-Omic Data Enhances Clinical Utility

Integrated DNA-RNA sequencing represents a significant advancement over DNA-only approaches for comprehensive genomic profiling in clinical oncology. The combined approach enhances detection of actionable alterations, particularly gene fusions, while providing a more complete molecular portrait of tumors. The DRCC-Seq methodology offers a practical implementation strategy with optimized 1:1 DNA:RNA probe ratios, balancing data quality and comprehensive variant detection [68].

As the field advances, several developments will shape future implementations. Single-cell multi-omics technologies now enable simultaneous DNA and RNA profiling within individual cells, revealing clonal heterogeneity and genotype-phenotype relationships previously obscured by bulk sequencing [72]. Long-read sequencing technologies improve resolution of complex structural variants and repetitive regions, while advanced bioinformatics pipelines incorporating machine learning enhance variant interpretation [73].

For clinical laboratories adopting these approaches, establishing rigorous validation frameworks covering all variant types remains essential. As evidence accumulates demonstrating the clinical utility of integrated profiling, these comprehensive assays are poised to become the standard of care in precision oncology, ultimately improving patient outcomes through more accurate diagnosis and personalized treatment strategies.

Benchmarking Performance and Clinical Validation

Gene fusions are critical molecular drivers in cancer, with profound implications for diagnosis, prognosis, and therapeutic decision-making in oncology. The detection of these rearrangements has evolved significantly with the advent of next-generation sequencing (NGS) technologies, primarily through DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq) approaches. Each method offers distinct advantages and limitations based on its underlying principles. DNA-seq identifies structural variants at the genomic level, while RNA-seq detects chimeric fusion transcripts expressed in the cell. Understanding the concordance and discordance between these platforms is essential for clinical laboratories and researchers aiming to implement robust testing protocols that maximize detection of clinically actionable alterations. This guide synthesizes evidence from recent real-world comparative studies to objectively evaluate the performance of RNA-seq versus DNA-seq for fusion gene detection across various cancer types and clinical scenarios.

Methodological Frameworks for Comparative Studies

Experimental Designs in Recent Comparative Literature

Recent studies have employed rigorous head-to-head comparison designs to evaluate the performance of DNA-seq and RNA-seq for fusion detection. The most robust analyses utilize large, real-world patient cohorts with orthogonal validation to establish ground truth. A 2025 study (PMC12608001) compared a 108-gene targeted RNA-seq panel with optical genome mapping (OGM) in 467 acute leukemia cases, including 360 AML, 89 B-ALL, 12 T-ALL, and 6 MPAL cases [21]. Similarly, a 2025 study in Communications Medicine analyzed 2,230 clinical tumor samples using an integrated RNA and DNA exome assay, providing extensive data on fusion detection across diverse cancer types [69].

The technical protocols for nucleic acid extraction and library preparation significantly impact fusion detection capabilities. For DNA-seq, studies often use hybrid capture-based panels with extended intronic coverage to capture breakpoints in genes known to be involved in fusions. The FindDNAFusion pipeline, for instance, employs three software tools (JuLI, Factera, and GeneFuse) with a combinatorial approach to improve detection accuracy to 98.0% for DNA panels with intron-tiled bait probes [45]. For RNA-seq, the Archer FusionPlex Pan-Heme panel utilizes anchored multiplex PCR (AMP) technology, which employs gene-specific primers combined with universal adapters to capture both known and novel fusion partners without prior knowledge of the partner sequence [74].

Analytical Validation Frameworks

Robust validation frameworks for integrated DNA-RNA sequencing assays typically involve three key steps: (1) analytical validation using custom reference samples containing known variants; (2) orthogonal testing in patient samples with established fusion status; and (3) assessment of clinical utility in real-world cases [69]. For instance, one validation approach used exome-wide somatic reference standards containing 3,042 SNVs and 47,466 CNVs, with multiple sequencing runs of cell lines at varying purities to establish sensitivity and specificity [69].

Quality control metrics are particularly crucial for RNA-seq from clinical samples, especially formalin-fixed paraffin-embedded (FFPE) tissue, where RNA degradation can impact results. Studies have shown that while FFPE samples yield shorter RNA fragments, the detection of fusion transcripts does not significantly differ between fresh frozen and FFPE samples when using appropriate library preparation methods and bioinformatic filters [75]. Standard QC metrics for RNA-seq include ribosomal RNA contamination assessment, unique mapping rates, and expression correlation between replicate samples.

Table 1: Key Methodological Approaches in DNA-seq vs. RNA-seq Comparison Studies

Study | Cohort Size & Cancer Type | DNA-seq Method | RNA-seq Method | Orthogonal Validation
PMC12608001 (2025) | 467 acute leukemia cases | Optical Genome Mapping | 108-gene AMP-based targeted panel | FISH, RT-PCR, clinical follow-up
Communications Medicine (2025) | 2,230 clinical tumor samples | Whole Exome Sequencing | Whole Transcriptome Sequencing | Orthogonal panels, reference standards
Frontiers in Molecular Biosciences (2025) | 29 colorectal cancer patients | - | STAR-Fusion on FFPE vs. fresh frozen | Multiple database annotation
Cancers (2024) | 264 leukemia patients | Karyotyping | 199-gene Archer FusionPlex | RT-PCR, mRNA sequencing

Quantitative Concordance Data Across Cancer Types

Hematologic Malignancies

Comparative studies in acute leukemia reveal distinctive patterns of concordance between DNA- and RNA-based detection methods. A 2025 analysis of 467 acute leukemia cases demonstrated an overall concordance rate of 88.1% between targeted RNA-seq and optical genome mapping [21]. However, concordance varied substantially by subtype, falling to 80.2% in B-ALL and 41.7% in T-ALL, which highlights the impact of disease-specific biology on methodological performance [21].

The distribution of uniquely detected rearrangements further illuminates the complementary nature of these technologies. Among 234 clinically relevant events, OGM uniquely identified 37 (15.8%), while RNA-seq exclusively detected 22 (9.4%) [21]. This disparity stems from fundamental biological differences: RNA-seq effectively identifies expressed chimeric fusions, while OGM excels at detecting cryptic, enhancer-driven events that may not generate fusion transcripts. Enhancer-hijacking lesions involving genes such as MECOM, BCL11B, and IGH showed particularly poor concordance (20.6%) compared to all other aberrations (93.1%) [21].
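The percentages above follow directly from the reported counts. The short sketch below reproduces them, assuming that every one of the 234 clinically relevant events was detected by at least one platform, so that the remainder after subtracting the two unique-detection counts is the concordant set. Variable names are illustrative.

```python
def percent(part, total):
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)

total_events = 234   # clinically relevant rearrangements [21]
ogm_only = 37        # detected by OGM only
rna_only = 22        # detected by targeted RNA-seq only

# Assumption: each event was seen by at least one of the two platforms.
detected_by_both = total_events - ogm_only - rna_only

print("OGM only:     ", ogm_only, f"({percent(ogm_only, total_events)}%)")
print("RNA-seq only: ", rna_only, f"({percent(rna_only, total_events)}%)")
print("Both methods: ", detected_by_both, f"({percent(detected_by_both, total_events)}%)")
print("Either unique:", ogm_only + rna_only, f"({percent(ogm_only + rna_only, total_events)}%)")
```

Roughly one in four clinically relevant events in this cohort would have been missed by one of the two platforms used alone.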

A separate 2024 study of 264 leukemia patients validated targeted RNA-seq against conventional karyotyping and RT-PCR, demonstrating 100% concordance with RT-PCR but only 83.3% concordance with karyotyping [74]. Notably, targeted RNA-seq identified 29 fusion events missed by karyotyping, while 5 cases initially called positive by karyotyping showed no pathogenic rearrangements upon confirmatory testing with mRNA sequencing [74].

Solid Tumors

In non-small cell lung cancer (NSCLC), RET fusions have been a particular focus of methodological comparisons. A 2025 retrospective study of 40 RET+ NSCLC patients found a 92.3% concordance between DNA-seq and RNA-seq, with RNA-seq identifying five additional RET+ cases missed by DNA-seq [36]. The study employed a 425-gene DNA panel with breakpoint analysis and compared it against both whole-transcriptome sequencing and targeted RNA-seq, revealing the enhanced sensitivity of targeted RNA approaches [36].

The clinical utility of reflexive testing algorithms is evident in real-world practice. One study of 1,211 NSCLC specimens implemented a testing algorithm using amplicon-based DNA/RNA sequencing followed by reflex hybridization-capture-based RNA sequencing if initial testing was negative [14]. Among 120 cases (approximately 10%) that underwent reflex testing, 9 oncogenic fusions were identified, including clinically actionable alterations in ALK, BRAF, NRG1, NTRK3, ROS1, and RET – none of which were detected by the initial amplicon-based assay [14].

Analysis of the AACR Project GENIE database, encompassing 20,900 NSCLC cases, revealed that of 1,081 fusion-positive cases, 893 (82.6%) could theoretically be detected by amplicon-based assays. The remaining 188 cases (17.4%) would require more comprehensive approaches [14].

Table 2: Concordance Rates Between DNA-seq and RNA-seq Across Cancer Types

Cancer Type | Overall Concordance Rate | DNA-Seq Unique Detection | RNA-Seq Unique Detection | Key Discordant Fusion Types
Acute Leukemia (all types) | 88.1% [21] | 15.8% [21] | 9.4% [21] | Enhancer-hijacking lesions (MECOM, BCL11B, IGH)
B-ALL | 80.2% [21] | - | - | -
T-ALL | 41.7% [21] | - | - | -
NSCLC (RET fusions) | 92.3% [36] | - | 5 additional cases by RNA-seq [36] | Noncanonical RET partners
Colorectal Cancer | No significant difference in FFPE vs. fresh frozen [75] | - | - | -

Technical Factors Influencing Concordance

Biological Mechanisms Underlying Discordant Results

The biological nature of genomic rearrangements fundamentally impacts their detection by DNA-seq versus RNA-seq. Enhancer hijacking events represent a key category prone to discordant detection. These lesions reposition enhancer elements to drive oncogene expression without generating fusion transcripts, making them detectable by DNA-based methods but largely invisible to RNA-seq [21]. In acute leukemia, this explains why rearrangements involving MECOM and BCL11B show particularly low concordance between platforms [21].

Conversely, RNA-seq slightly outperforms DNA-based methods for fusions arising from intrachromosomal deletions that are sometimes labeled by OGM as simple deletions rather than rearrangements [21]. The expression level of fusion transcripts also critically impacts detectability by RNA-seq. Low-expression fusions may fall below the detection threshold of RNA-seq assays, while DNA-based methods remain unaffected by transcriptional activity [36].

Technical artifacts also contribute to discordance. DNA-seq may identify rearrangements in non-expressed genes or non-functional open reading frames that never produce fusion transcripts [39]. Similarly, RNA-seq can detect trans-splicing events or read-through transcripts that do not correspond to actual genomic rearrangements [75]. These biological and technical factors necessitate careful interpretation of discordant results between platforms.

Sample Quality and Platform-Specific Limitations

Sample quality profoundly impacts method performance, particularly for RNA-seq. FFPE-derived RNA is often degraded, potentially affecting fusion detection sensitivity [75]. However, a 2025 study comparing matched FFPE and freshly frozen colorectal cancer samples found no statistically significant difference in fusion detection rates when using appropriate library preparation methods optimized for degraded RNA [75].

DNA-seq panels vary significantly in their coverage of intronic regions where breakpoints occur. Panels without comprehensive intron tiling may miss rearrangements in key genes, while those with extended intronic coverage improve detection but increase sequencing costs and data analysis complexity [45]. The FindDNAFusion study demonstrated that a combinatorial bioinformatics approach applied to DNA panels with intron-tiled bait probes could achieve 98.0% accuracy [45].

RNA-seq chemistry also influences detection capabilities. Amplicon-based approaches offer high sensitivity for known fusions but may miss novel partners, while hybridization-capture methods provide more comprehensive coverage but require higher RNA input and are more susceptible to degradation effects [14]. The choice of bioinformatics pipelines substantially impacts both false-positive and false-negative rates across both platforms [39].

Clinical Implications and Integrative Testing Approaches

Impact on Patient Care

The choice between DNA-seq and RNA-seq for fusion detection has direct implications for therapeutic decision-making. In NSCLC, the identification of RET fusions dictates eligibility for RET inhibitors such as selpercatinib and pralsetinib [36]. Studies show that targeted RNA-seq can identify additional RET+ cases missed by DNA-seq alone, potentially expanding the population eligible for these targeted therapies [36].

In leukemia, the detection of specific fusions directly influences risk stratification and treatment selection. For example, KMT2A rearrangements in AML warrant more intensive induction regimens and allogeneic transplantation in first remission, while RUNX1::RUNX1T1 fusions may respond to chemotherapy alone [74]. The superior detection of enhancer-hijacking events by DNA-based methods ensures appropriate risk assignment for patients who might otherwise be misclassified [21].

Comprehensive fusion detection also facilitates identification of rare or novel fusion events with clinical relevance. In one study of colorectal cancer, RNA-seq identified a potentially actionable LRRFIP2::ALK fusion not previously described in this cancer type, with an intact tyrosine kinase domain that could be targeted by ALK inhibitors [75]. Such findings underscore the therapeutic opportunities enabled by thorough fusion profiling.

Optimized Testing Algorithms

Based on concordance data from real-world studies, integrated testing approaches maximize clinical sensitivity for fusion detection. Sequential testing algorithms, in which an initial panel is followed by reflex RNA-seq for negative cases, provide a cost-effective strategy for comprehensive profiling [14]. In the NSCLC series described above, reflex hybridization-capture RNA-seq identified 9 oncogenic fusions among 120 reflex-tested cases that the initial amplicon-based DNA/RNA assay had missed [14].

For clinical scenarios where tissue is limited or rapid turnaround is essential, targeted RNA-seq panels offer a practical solution with high sensitivity for therapeutically relevant fusions. The 199-gene Archer FusionPlex panel demonstrated 100% concordance with RT-PCR in leukemia samples while identifying novel fusions such as RUNX1::DOPEY2, RUNX1::MACROD2, and ZCCHC7::LRP1B [74].

Parallel DNA and RNA testing from a single specimen provides the most comprehensive approach, particularly for cancers with diverse fusion mechanisms and partners. The 2025 study in Communications Medicine validated a combined RNA and DNA exome assay across 2,230 tumors, demonstrating improved detection of actionable alterations in 98% of cases [69]. This integrated approach enabled direct correlation of somatic alterations with gene expression and revealed complex genomic rearrangements that would likely have remained undetected with either method alone [69].

Essential Research Reagents and Tools

Table 3: Key Research Reagent Solutions for Fusion Detection Studies

Reagent/Tool Category | Specific Examples | Function in Fusion Detection
DNA-seq Panels | GeneseeqPrime 425-gene panel [36] | Comprehensive genomic breakpoint detection with extended intronic coverage
RNA-seq Panels | Archer FusionPlex Pan-Heme (199 genes) [74] | Targeted detection of fusion transcripts via anchored multiplex PCR
Library Prep Kits | TruSeq stranded mRNA kit [69], KAPA RNA Hyper with rRNA Erase [75] | Library construction from RNA, ribosomal RNA depletion
Bioinformatics Tools | STAR-Fusion [75], FindDNAFusion [45], Archer Analysis [74] | Fusion transcript identification, genomic breakpoint calling
Reference Standards | GeneWell fusion reference standards [34] | Analytical validation, limit of detection studies

DNA-seq vs RNA-seq Fusion Detection Workflow

[Figure: workflow diagram. A tumor sample feeds two parallel pathways. DNA-seq pathway: DNA extraction and library prep → DNA sequencing (whole exome/genome) → structural variant calling (FindDNAFusion) → genomic breakpoints detected. RNA-seq pathway: RNA extraction and library prep → RNA sequencing (targeted/whole transcriptome) → fusion transcript calling (STAR-Fusion) → expressed fusion transcripts detected. Both outputs feed a concordance analysis (88.1% overall).]

Factors Influencing Concordance and Discordance

[Figure: factors feeding into the overall concordance rate (88.1%). DNA-seq unique detection: enhancer hijacking events (MECOM, BCL11B), non-expressed or silent rearrangements, comprehensive intronic coverage advantage. RNA-seq unique detection: complex rearrangements with simple DNA patterns, highly expressed fusion transcripts, novel partner detection capability. Discordance factors: biological mechanisms (enhancer hijacking vs. fusion transcripts), technical limitations (coverage gaps, degradation), bioinformatic pipeline differences.]

Comparative analyses across diverse cancer types consistently demonstrate that DNA-seq and RNA-seq provide complementary rather than redundant information for fusion gene detection. The 88.1% overall concordance rate observed in large leukemia cohorts, with platform-specific unique detections accounting for approximately 25% of clinically relevant events, underscores the limitations of relying on a single methodology [21]. Biological mechanisms, particularly enhancer hijacking events that do not generate fusion transcripts, fundamentally drive these discordances and necessitate DNA-based detection approaches [21].

For clinical laboratories and research institutions, the evidence supports integrated testing algorithms that combine DNA and RNA analysis to maximize detection of actionable fusions. Reflexive testing pathways, beginning with comprehensive DNA panels followed by targeted RNA-seq for negative cases, provide a practical balance between cost-effectiveness and sensitivity [14]. For precision oncology initiatives where tissue is limited or comprehensive profiling is prioritized, parallel DNA and RNA sequencing from a single sample offers the most complete characterization of the fusion landscape [69]. As therapeutic options targeting gene fusions continue to expand, ensuring their reliable detection through multimodal approaches becomes increasingly critical for optimizing patient outcomes across the spectrum of hematologic and solid tumor malignancies.

The precise detection of genomic rearrangements, particularly fusion genes, is a critical component of cancer diagnosis, prognosis, and therapeutic decision-making. The establishment of rigorous analytical sensitivity and specificity parameters forms the foundation of any reliable clinical detection assay. This guide provides a systematic comparison of two prominent technological approaches for fusion detection: targeted RNA sequencing (RNA-seq) and optical genome mapping (OGM) as a DNA-level method. As multi-modal testing becomes increasingly common in diagnostic settings, understanding the performance characteristics, limitations, and complementary strengths of these platforms is essential for researchers, clinical laboratories, and drug development professionals navigating the complex landscape of genomic structural variant detection [21].

Performance Comparison: Targeted RNA-seq vs. Optical Genome Mapping

A comprehensive 2025 study directly compared a 108-gene targeted RNA-seq panel with OGM across 467 acute leukemia cases, providing robust performance data in a clinical context [21]. The findings demonstrate distinct and complementary strengths for each technology.

Table 1: Overall Performance Metrics in Acute Leukemia (n=467 cases)

Performance Metric | Targeted RNA-seq | Optical Genome Mapping (OGM) | Combined Approach
Overall Concordance | 88.1% with OGM | 88.1% with RNA-seq | -
Unique Detection of Clinically Relevant Rearrangements | 22/234 (9.4%) | 37/234 (15.8%) | 59/234 (25.2%)
Concordance in T-ALL | 41.7% with OGM | 41.7% with RNA-seq | -
Enhancer-Hijacking Lesions (e.g., MECOM, BCL11B) | Poor detection (20.6% concordance with OGM) | Effective detection | -
Fusions from Intrachromosomal Deletions | Effective detection | May interpret as simple deletions | -

The data show that while overall concordance is high, each method uniquely contributes to the diagnostic yield. OGM demonstrated a significant advantage in detecting cryptic, enhancer-driven rearrangements that often evade RNA-based detection. Conversely, targeted RNA-seq showed slightly superior performance for fusions arising from intrachromosomal deletions, which OGM sometimes misclassified as simple deletions [21]. The choice of platform therefore determines which alterations can be detected.

Table 2: Assay Performance Across Leukemia Subtypes

Leukemia Type | Cases (n) | Tier 1 Aberration Detection Rate | Key Fusion Examples
Acute Myeloid Leukemia (AML) | 360 | 23.9% | KMT2A, MECOM
B-Acute Lymphoblastic Leukemia (B-ALL) | 89 | 60.7% | BCR::ABL1
T-Acute Lymphoblastic Leukemia (T-ALL) | 12 | Not reported | BCL11B
Mixed Phenotype Acute Leukemia (MPAL) | 6 | Not reported | -

Establishing Limits of Detection: Experimental Protocols

Determining the Limit of Detection (LoD) is a critical step in assay validation. The following sections detail common experimental approaches used to establish analytical sensitivity for both RNA-seq and DNA-based methods.

LoD Determination for RNA-seq Assays

For RNA-seq fusion detection, a standard protocol for determining LoD involves serial dilution experiments. In one validation study, RNA from the H2228 cell line (which harbors a known EML4-ALK fusion) was diluted into fusion-negative background RNA. The results demonstrated reliable fusion detection down to a 10% variant allele frequency, establishing the assay's LoD at this level [27].

For even more precise quantification, studies utilize synthetic spike-in controls. One such approach used in silico-generated fusion transcripts spiked into RNA-seq data from benign tissue (H1 human embryonic stem cells) at nine different expression levels, ranging from five- to 200-fold [7]. Another employed synthetic RNA molecules mimicking oncogenic fusions, which were spiked into RNA libraries at 10 different concentrations (from 10^-8.57 pMol to 10^-3.47 pMol) across 20 replicates [7]. A separate study used "fusion sequins" (spike-in controls for fusion genes) spiked into cell line RNA, achieving 50% detection at 2 pM input and 100% detection across a dynamic range of 8 pM to 31 nM [26].
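A dilution or spike-in series is typically reduced to a single LoD by finding the lowest input level at which the fusion is still called in nearly all replicates. The sketch below assumes a 95% hit-rate criterion, which is a common convention rather than a value taken from the cited studies, and the replicate data are invented purely for illustration.

```python
def limit_of_detection(replicate_calls, min_hit_rate=0.95):
    """Return the lowest input level whose hit rate, and that of every
    higher level, is at least `min_hit_rate`; None if no level qualifies.

    `replicate_calls` maps an input level (e.g. percent tumor RNA) to a
    list of booleans, one per replicate (True = fusion called).
    """
    lod = None
    for level in sorted(replicate_calls, reverse=True):
        calls = replicate_calls[level]
        hit_rate = sum(calls) / len(calls)
        if hit_rate < min_hit_rate:
            break  # stop at the first level that fails the criterion
        lod = level
    return lod

# Hypothetical dilution series: percent fusion-positive RNA -> replicate calls.
dilution_series = {
    50: [True] * 20,
    25: [True] * 20,
    10: [True] * 20,
    5: [True] * 12 + [False] * 8,
    1: [False] * 20,
}

print("Estimated LoD (% input):", limit_of_detection(dilution_series))
```

Walking from the highest level downward and stopping at the first failure avoids reporting a low level that passed by chance while an intermediate level failed.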

LoD Determination for DNA-based Methods

Although the studies reviewed here focus more on RNA-seq validation, OGM and other DNA-based methods similarly require rigorous LoD studies. These typically involve diluting DNA from cell lines with known structural variants into wild-type DNA, followed by statistical analysis to determine the lowest variant allele fraction that can be detected with high confidence. The HemoTargets and hg38-primary transcript feature files are commonly used in OGM data analysis for this purpose [21].

Complementary Strengths and Weaknesses: A Technological Perspective

The performance differences between RNA-seq and OGM stem from their fundamental technological principles.

Targeted RNA Sequencing

Targeted RNA-seq uses biotinylated oligonucleotide probes to enrich for transcripts of interest prior to sequencing. This enrichment leads to a significant increase in coverage for targeted genes—achieving up to 93% on-target reads and a 33- to 59-fold enrichment compared to standard RNA-seq [26]. This enhanced coverage directly improves sensitivity for lowly expressed fusion transcripts.
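Fold enrichment can be read as a ratio of on-target read fractions between a targeted library and a standard RNA-seq library from the same sample. The sketch below assumes that definition; the cited study may compute enrichment differently, and the read counts are hypothetical.

```python
def on_target_fraction(on_target_reads, total_reads):
    """Fraction of reads that map to the panel's target regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads

def fold_enrichment(targeted, untargeted):
    """Ratio of on-target fractions; each argument is (on_target, total)."""
    return on_target_fraction(*targeted) / on_target_fraction(*untargeted)

# Hypothetical read counts for the same sample prepared two ways.
targeted_library = (930_000, 1_000_000)   # 93% on-target after capture
standard_library = (20_000, 1_000_000)    # 2% on-target without capture

print(f"{fold_enrichment(targeted_library, standard_library):.1f}-fold enrichment")
```

Under this definition, the achievable fold enrichment depends on how small a share of the untargeted transcriptome the panel genes represent, which is one reason reported values span a range.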

Key Strengths:

  • Direct Evidence of Expression: Detects functionally relevant fusion transcripts.
  • Sensitivity: Capable of identifying fusions even at low expression levels due to targeted enrichment [26].
  • Resolution: Provides single-base resolution of fusion junctions [26].
  • Additional Data: Simultaneously provides gene expression quantification and can profile immune receptor repertoires [26].

Key Limitations:

  • Transcriptional Dependency: Cannot detect rearrangements in non-expressed genes or promoter/enhancer hijacking events that do not produce fusion transcripts [21].
  • RNA Quality Dependency: Performance is highly dependent on RNA integrity, which can be challenging with formalin-fixed, paraffin-embedded (FFPE) samples [27].

Optical Genome Mapping (OGM)

OGM operates at the DNA level, using ultra-high-molecular-weight DNA that is labeled at specific enzyme recognition sites and imaged to create a genome-wide physical map.

Key Strengths:

  • Genome-Wide View: Capable of detecting structural variants across the entire genome without prior knowledge [21].
  • Non-Transcriptional Events: Effectively identifies enhancer-hijacking lesions and other structural rearrangements that do not result in fusion transcripts [21].
  • Single-Molecule Resolution: Provides long-range genomic context.

Key Limitations:

  • Functional Ambiguity: Cannot distinguish between expressed fusion transcripts and silent genomic rearrangements.
  • Resolution Limits: May miss small intragenic rearrangements or misinterpret fusions from intrachromosomal deletions as simple deletions [21].
  • Complex Analysis: Requires sophisticated bioinformatics pipelines for variant calling.

[Figure: a sample input is analyzed by both DNA analysis (optical genome mapping) and targeted RNA-seq. OGM strengths: detects enhancer hijacking, genome-wide view, non-transcriptional events. OGM limitations: functional ambiguity, may miss small deletions. RNA-seq strengths: direct evidence of expression, fusion junction resolution, expression quantification. RNA-seq limitations: transcriptional dependency, RNA quality sensitivity. Together the two methods lead to comprehensive fusion detection.]

Technological Synergy in Fusion Detection

Successful implementation of a fusion detection assay requires careful selection of reagents and computational tools.

Table 3: Research Reagent Solutions for Fusion Detection Assays

Category | Specific Tool / Reagent | Function in Assay
Target Enrichment | TruSight RNA Pan-Cancer Panel (Illumina) [76] | Targeted capture of 1385 cancer-related genes for RNA-seq
Target Enrichment | Anchored Multiplex PCR (AMP) [21] | Target enrichment method for targeted RNA-seq panels
Bioinformatic Pipelines | STAR-Fusion [7] [76] | Algorithm for fusion detection from RNA-seq data
Bioinformatic Pipelines | FusionCatcher [7] [26] | Fusion detection algorithm
Bioinformatic Pipelines | Arriba [7] [76] | High-sensitivity fusion detection algorithm
Validation Tools | Fusion Sequins [26] | Synthetic spike-in RNA standards for quantification
Validation Tools | ERCC RNA Spike-In Controls [26] | External RNA controls for quality assessment
Analysis Software | Bionano Access & VIA Software [21] | OGM data analysis and visualization
Analysis Software | Archer Analysis [21] | Software for variant calling in AMP-based sequencing

The establishment of analytical sensitivity and specificity is not merely a regulatory requirement but a fundamental scientific practice that directly impacts patient care. Targeted RNA-seq excels where functional transcript detection is paramount, offering high sensitivity and precise junction resolution for expressed fusions. In contrast, OGM provides a genome-wide surveillance capability that is agnostic to transcriptional activity, making it indispensable for detecting enhancer hijacking and other non-transcriptional structural variants. Rather than viewing these technologies as competitive, the most comprehensive approach for fusion detection in clinical and research settings involves their strategic integration. The synergistic use of both RNA and DNA-level analyses maximizes detection sensitivity and ensures that clinically significant structural variants are not overlooked, ultimately advancing the goals of precision oncology.

Oncogenic gene fusions are critical drivers in numerous cancers, with profound implications for diagnosis, prognosis, and targeted therapy selection. The accurate detection of these hybrid genes, formed through chromosomal rearrangements like translocations, deletions, and inversions, is therefore paramount in clinical oncology and research [3]. For years, the question has persisted: what is the optimal molecular method for fusion detection? Next-generation sequencing (NGS) technologies have emerged as powerful tools, but they are primarily split into two approaches: DNA sequencing (DNA-seq), which interrogates the genome for structural variants, and RNA sequencing (RNA-seq), which captures expressed fusion transcripts. While each method has its strengths, relying on either one alone can lead to missed detections. This case study objectively compares the performance of these platforms and demonstrates, through experimental data, that an integrated DNA and RNA sequencing approach provides the most comprehensive detection of clinically relevant gene fusions, ultimately enhancing patient stratification for targeted therapies.

Performance Comparison: DNA-seq vs. RNA-seq

Head-to-head comparisons in clinical cohorts reveal that DNA-seq and RNA-seq have complementary detection capabilities, with neither platform identifying all fusion events on its own. The following tables summarize key performance metrics from recent studies.

Table 1: Comparative Detection Rates in Acute Leukemia (467 cases) [21]

Method | Clinically Relevant Rearrangements Uniquely Detected | Percentage of Total | Notable Strengths
OGM (DNA-level) | 37 / 234 | 15.8% | Superior for enhancer-hijacking lesions (e.g., MECOM, BCL11B, IGH rearrangements)
Targeted RNA-Seq | 22 / 234 | 9.4% | Better for fusions from intrachromosomal deletions and expressed chimeric fusions
Concordant Findings (both methods) | 175 / 234 | 74.8% | -

Table 2: Performance in Solid Tumors (Non-Small Cell Lung Cancer and other solid tumors) [34] [14] [37]

Study Context | Method | Key Finding | Implication
Early-Stage NSCLC (RET fusions) | DNA-seq | Identified putative RET+ cases | Can miss fusions involving large introns or complex rearrangements [37]
Early-Stage NSCLC (RET fusions) | Targeted RNA-seq | Identified additional actionable RET+ cases missed by other methods | Higher sensitivity for detecting expressed fusion transcripts [37]
120 Reflex NSCLC Cases | Amplicon-based DNA/RNA assay | Missed 9 oncogenic fusions | Limitations in detecting rare/novel fusions with amplicon-based designs [14]
120 Reflex NSCLC Cases | Reflex Hybridization-Capture RNA-seq | Detected 9 fusions (in ALK, BRAF, NRG1, NTRK3, ROS1, RET) | Essential for maximizing detection of rare and novel oncogenic fusions [14]
60 Clinical Solid Tumor Samples | DNA-based NGS alone | 93.4% (57/61) concordance with previous results | Missed fusions like ETV6::NTRK3 and CCDC6::RET [34]
60 Clinical Solid Tumor Samples | RNA-based NGS alone | 86.9% (53/61) concordance with previous results | Missed fusions like TRIM46::NTRK1 and CD74::ROS1 [34]
60 Clinical Solid Tumor Samples | Integrated DNA/RNA NGS | 100% sensitivity and specificity; identified and validated a previously missed TPM3::NTRK1 fusion | Combined approach corrects for individual method limitations [34]

Experimental Insights and Methodologies

The comparative data is derived from rigorously validated clinical and research assays. The following section outlines the standard experimental protocols used to generate these findings.

DNA-Based Fusion Detection Protocols

DNA sequencing methods identify the genomic breakpoints where two independent genes have joined. Optical Genome Mapping (OGM) and targeted DNA panels are two prominent techniques.

  • Optical Genome Mapping (OGM): This technique utilizes ultra-high-molecular-weight DNA. The DNA is fluorescently labeled at specific enzyme recognition sites, linearized in nanochannels, and imaged. The resulting molecule maps are assembled and compared to a reference genome to identify large structural variants, including those leading to gene fusions [21]. Analysis is typically performed using manufacturer-specific software like Bionano Access.
  • Targeted DNA Sequencing Panels: This approach involves extracting genomic DNA from patient samples (e.g., fresh blood or FFPE tissue). Sequencing libraries are prepared, often using hybrid capture-based methods with probes designed to target the intronic regions of genes of interest where breakpoints frequently occur. The libraries are sequenced on platforms such as the Illumina HiSeq 4000, and the data are analyzed with bioinformatic tools such as Delly to call structural variants and fusions [37].

RNA-Based Fusion Detection Protocols

RNA sequencing detects the chimeric transcripts that result from gene fusions, providing direct evidence of expression.

  • Targeted RNA Sequencing: RNA is extracted from specimens and converted to complementary DNA (cDNA). Target enrichment is achieved through either amplicon-based (e.g., Archer) or hybridization-capture-based methods. Amplicon-based panels use gene-specific primers to amplify known and novel fusion partners, while capture-based panels use probes to pull down target regions [21] [14]. After sequencing, fusion transcripts are identified using specialized software such as Archer Analysis, FusionCatcher, or Arriba [21] [9].
  • Whole-Transcriptome Sequencing (WTS): This method sequences the entire complement of RNA in a sample without prior target enrichment. While it allows for hypothesis-free discovery, it requires a higher sequencing depth and is more susceptible to issues with RNA quality, especially in FFPE samples [37].

Integrated DNA/RNA Sequencing Workflow

The most robust diagnostic strategy involves a complementary workflow. A sample first undergoes targeted DNA sequencing. If negative for a driver mutation or if there is a high clinical suspicion of a fusion, it is reflexed to targeted RNA sequencing. This combined approach ensures that fusions missed by one method (e.g., due to large introns in DNA-seq or low expression in RNA-seq) can be captured by the other [14] [37]. The conceptual relationship between these methods in detecting a fusion event is illustrated below.

[Figure: two routes from chromosomal DNA to a detected gene fusion. DNA route: chromosomal DNA → DNA sequencing (DNA-seq) → genomic breakpoint → detected gene fusion. RNA route: chromosomal DNA → transcription → fusion transcript (RNA) → RNA sequencing (RNA-seq) → detected gene fusion.]
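The reflex logic described in this section can be sketched as a small decision function. This is a simplified illustration, not a clinical algorithm: the assay functions are stand-ins supplied by the caller, the sample identifiers and results are hypothetical, and "driver" is reduced to a single string or `None`.

```python
def reflex_test(sample_id, run_dna, run_rna, high_suspicion=False):
    """DNA-first testing with reflex to RNA-seq.

    `run_dna` and `run_rna` are caller-supplied functions that return a
    driver/fusion name or None. RNA-seq is run when DNA-seq finds no driver
    or when clinical suspicion of a fusion is high.
    """
    dna_call = run_dna(sample_id)
    if dna_call is not None and not high_suspicion:
        return {"finding": dna_call, "sources": ["DNA-seq"], "reflexed": False}

    rna_call = run_rna(sample_id)
    sources = []
    if dna_call is not None:
        sources.append("DNA-seq")
    if rna_call is not None:
        sources.append("RNA-seq")
    return {"finding": rna_call or dna_call, "sources": sources, "reflexed": True}

# Hypothetical assay results keyed by sample ID (None = nothing detected).
dna_results = {"S1": "KIF5B::RET", "S2": None, "S3": None}
rna_results = {"S1": "KIF5B::RET", "S2": "CD74::ROS1", "S3": None}

for sample_id in dna_results:
    print(sample_id, reflex_test(sample_id, dna_results.get, rna_results.get))
```

Passing the assays in as functions keeps the decision rule separate from any particular panel or pipeline, so the same logic applies whether the first-line test is a DNA panel or a combined amplicon assay.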

Analysis of Detection Gaps and Biological Causes

The performance disparities between DNA-seq and RNA-seq are not random but stem from fundamental biological and technical factors.

  • Enhancer-Hijacking Events: DNA-seq (specifically OGM) is uniquely powerful in identifying rearrangements that place an oncogene under the control of a highly active enhancer from another gene (e.g., IGH rearrangements). These events can dramatically upregulate the oncogene without producing a fusion transcript, making them invisible to RNA-seq [21]. One study found concordance for these lesions was as low as 20.6% between RNA-seq and OGM [21].
  • Intronic Size and Breakpoint Complexity: DNA panels can miss fusions if the genomic breakpoints fall in large intronic regions not covered by the design probes or in complex regions that are difficult to map. Since RNA-seq sequences the spliced transcript, it bypasses introns and can often detect fusions that DNA-seq misses [37].
  • Low or Tissue-Restricted Expression: The primary limitation of RNA-seq is its dependence on adequate expression of the fusion transcript. If the fusion is expressed at very low levels or only in a small subset of cells, the RNA-seq assay may fail to detect it, whereas DNA-seq can identify the structural variant regardless of expression status [34].
  • Fusion Interpretation: Some intrachromosomal deletions that generate in-frame fusion transcripts are sometimes labeled as simple deletions by OGM, leading to potential misclassification unless confirmed by RNA-seq [21].
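When results from the two platforms disagree, the factors listed above suggest which explanations to consider first. The sketch below encodes that triage as a lookup; the category labels and function name are illustrative, and the candidate explanations are hypotheses to investigate rather than conclusions.

```python
def triage(dna_detected, rna_detected):
    """Classify a DNA/RNA result pair and list explanations worth checking."""
    if dna_detected and rna_detected:
        return "concordant", []
    if dna_detected:
        return "DNA-only", [
            "enhancer hijacking without a fusion transcript",
            "low or tissue-restricted expression of the fusion",
            "rearrangement in a non-expressed gene",
        ]
    if rna_detected:
        return "RNA-only", [
            "breakpoint in an intron not covered by the DNA panel",
            "intrachromosomal deletion labeled as a simple deletion",
            "read-through or trans-splicing transcript without a genomic rearrangement",
        ]
    return "negative", []

category, explanations = triage(dna_detected=True, rna_detected=False)
print(category)
for item in explanations:
    print(" -", item)
```

A laboratory would typically pair each explanation with a confirmatory step, for example orthogonal testing of an RNA-only call before reporting it.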

Table 3: Key Research Reagent Solutions for Fusion Detection

Item / Solution | Function in Experiment | Specific Examples / Notes
Targeted RNA-Seq Panels | Multiplexed detection of known and novel fusion transcripts from RNA. | Archer (AMP-based), Paragon Genomics AccuFusion (amplicon-based), hybridization-capture panels [21] [77].
Targeted DNA-Seq Panels | Interrogation of genomic DNA for structural variants and breakpoints. | Large panels (e.g., 425-gene DNA panel), often using hybrid capture technology [37].
Optical Genome Mapping (OGM) | Genome-wide detection of structural variants without sequencing, at the DNA level. | Bionano Genomics platform; effective for enhancer-hijacking and cryptic rearrangements [21].
Bioinformatics Pipelines | Critical for analyzing NGS data, calling fusions, and filtering false positives. | Arriba, FusionCatcher (for RNA-seq); Delly (for DNA-seq); custom filtering strategies are essential [9] [8].
Reference Standards | Assay validation, determining sensitivity, specificity, and limit of detection. | Commercial fusion RNA reference materials (e.g., Seraseq Fusion RNA Mix) [34] [77].

Signaling Pathways and Clinical Actionability

Oncogenic fusions typically create constitutively active proteins that drive tumor growth through key signaling pathways, making them prime targets for therapy. The central pathway activated by many receptor tyrosine kinase (RTK) fusions is illustrated below.

[Figure: Oncogenic fusion protein (e.g., EML4-ALK, RET fusions) → constitutive dimerization and activation → MAPK pathway (proliferation), PI3K/AKT pathway (cell survival), and JAK/STAT pathway → uncontrolled cell growth and cancer progression.]

The clinical significance of detecting these fusions is profound. For instance, the presence of an EML4-ALK fusion in non-small cell lung cancer (NSCLC) makes patients eligible for ALK tyrosine kinase inhibitors like crizotinib and ceritinib, which have significantly improved outcomes [3]. Similarly, NTRK fusions across various tumor types can be targeted with TRK inhibitors such as larotrectinib and entrectinib [3]. The BCR-ABL1 fusion, hallmark of chronic myeloid leukemia, is successfully treated with imatinib and other TKIs [3]. Accurate detection is the critical first step that unlocks these targeted treatment options for patients.

The evidence from multiple clinical studies is clear: DNA-seq and RNA-seq are complementary, not redundant, technologies for gene fusion detection. DNA-based methods excel in identifying structural rearrangements, including those that do not produce fusion transcripts, while RNA-based methods directly capture the expressed chimeric products, often with higher sensitivity for fusions arising from complex genomic regions. Relying on a single methodology inevitably creates diagnostic blind spots, potentially depriving a subset of patients of life-changing targeted therapies. Therefore, an integrated diagnostic approach, leveraging the strengths of both DNA and RNA sequencing, represents the new gold standard for comprehensive fusion detection in oncology research and clinical practice. This synergistic strategy ensures the highest possible detection rate for these critical oncogenic drivers, ultimately advancing the goals of precision medicine.

The accurate detection of RET (REarranged during Transfection) fusions is critical for guiding targeted therapy in multiple cancers, including non-small cell lung cancer (NSCLC) and thyroid cancer. This case study objectively compares the performance of various molecular diagnostic platforms—RNA sequencing (RNA-seq), DNA sequencing (DNA-seq), fluorescence in situ hybridization (FISH), and immunohistochemistry (IHC)—for identifying these clinically actionable alterations. As selective RET inhibitors like selpercatinib and pralsetinib demonstrate response rates of 64-70% in RET-altered cancers, optimal detection methods directly impact patient eligibility for effective treatments [78]. Evidence from recent studies indicates that an integrative approach, combining DNA-seq with RNA-seq, achieves the most comprehensive detection profile, overcoming the limitations inherent in any single methodology [36] [79].

RET fusions are oncogenic drivers resulting from chromosomal rearrangements that fuse the 3' kinase domain of RET with the 5' domain of a partner gene. This rearrangement leads to constitutive activation of the RET tyrosine kinase, promoting tumorigenesis through unchecked cellular proliferation and survival signals [78]. The prevalence of RET fusions varies by tumor type, occurring in approximately 1-2% of NSCLC cases, ~10% of papillary thyroid cancers, and at lower frequencies in other solid tumors [36] [79]. Over 100 different partner genes have been identified, with KIF5B, CCDC6, and NCOA4 being the most common. The distribution of these partners is cancer-type specific: KIF5B predominates in lung cancer (66-68%), while CCDC6 and NCOA4 are more frequent in thyroid cancer [78]. This diversity, coupled with breakpoints predominantly located in intron 11 (87% of cases), presents a significant challenge for detection assays [78].

Performance Comparison of Detection Platforms

Quantitative Platform Performance Metrics

Table 1: Performance Metrics of RET Fusion Detection Platforms

| Detection Platform | Sensitivity (%) | Specificity (%) | Key Strengths | Key Limitations |
| --- | --- | --- | --- | --- |
| DNA Sequencing (DNA-seq) | 100 [79] | 99.6 [79] | High-throughput; detects genomic breakpoints; good specificity [79]. | May miss fusions with large introns or complex rearrangements [36]. |
| RNA Sequencing (RNA-seq) | N/A | N/A | Confirms expressed fusion transcripts; identifies novel partners; assesses functionality [36] [78]. | Dependent on RNA quality and gene expression levels [36]. |
| Fluorescence In Situ Hybridization (FISH) | 91.7 [79] | N/A | Partner-agnostic; single-cell resolution; standardized for some cancers [79] [78]. | Lower sensitivity for NCOA4-RET (66.7%); cannot identify partner gene; subjective interpretation [79] [78]. |
| Immunohistochemistry (IHC) | Variable by partner [79] | ~82 [79] | Low cost; fast turnaround; readily available in most labs [79]. | Low overall sensitivity; variable specificity (40-85%); not recommended for standalone use [78]. |

Table 2: Partner-Gene Dependent Performance of FISH and IHC

| Fusion Partner | FISH Sensitivity | IHC Sensitivity |
| --- | --- | --- |
| KIF5B::RET | High | 100% [79] |
| CCDC6::RET | High | 88.9% [79] |
| NCOA4::RET | 66.7% [79] | 50% [79] |

Concordance Analysis Between Platforms

Studies directly comparing these methodologies reveal critical insights into their concordance. A 2025 study on early-stage NSCLC found a 92.3% concordance between DNA-seq and RNA-seq for identifying RET fusions. The concordance between RNA-seq and FISH was 84.6%, and between DNA-seq and FISH was 82.5% [36]. This high inter-method agreement is counterbalanced by unique detections from each platform, underscoring their complementarity.

Notably, DNA-seq sometimes identifies structural variants of unknown significance (SVUS). In one pan-cancer study, 37.5% (12/32) of these RET SVUS were confirmed as oncogenic fusions by RNA-seq, emphasizing the necessity of RNA-level confirmation for ambiguous DNA findings [79]. Conversely, FISH can be positive in cases where RNA-seq does not detect a fusion transcript, as was observed in 87.5% (7/8) of RNA-negative RET SVUS cases [79]. This discordance may arise from technical factors or biologically inactive rearrangements.
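
The concordance and confirmation figures above are simple proportions. As a minimal sketch of how such values can be computed from per-sample calls, the example below uses hypothetical sample IDs and toy calls (they are not data from the cited studies); only the 12/32 and 7/8 counts come from the text.

```python
def percent_agreement(calls_a, calls_b):
    """Percentage of shared samples on which two platforms give the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no samples were tested on both platforms")
    agree = sum(calls_a[s] == calls_b[s] for s in shared)
    return 100.0 * agree / len(shared)


def percentage(numerator, denominator):
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Hypothetical toy calls (True = RET fusion reported by that platform).
dna_calls = {"S1": True, "S2": True, "S3": False, "S4": False}
rna_calls = {"S1": True, "S2": False, "S3": False, "S4": False}

toy_concordance = percent_agreement(dna_calls, rna_calls)

# Counts quoted in the text: RET SVUS confirmed by RNA-seq, and
# FISH-positive cases among RNA-negative RET SVUS.
svus_confirmed = percentage(12, 32)
fish_positive_rna_negative = percentage(7, 8)

print(toy_concordance, svus_confirmed, fish_positive_rna_negative)
```

Percent agreement is the simplest concordance measure; it does not correct for agreement expected by chance, which matters when fusion-positive cases are rare.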

Detailed Experimental Protocols

DNA Sequencing (MSK-IMPACT Assay)

The MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets) assay is a hybridization capture-based next-generation sequencing (NGS) method performed on formalin-fixed, paraffin-embedded (FFPE) tissue [79].

  • DNA Extraction & QC: Genomic DNA is extracted using specialized kits for FFPE tissue. Quality and quantity are assessed using a NanoDrop 2000 and a Qubit fluorometer with the dsDNA HS assay kit [36].
  • Library Preparation & Target Enrichment: Sequencing libraries are prepared using the KAPA Hyper Prep kit. The libraries are hybridized to custom bait sets targeting all exons and select introns (including RET introns 9-11) of hundreds of cancer-related genes [79].
  • Sequencing & Analysis: The enriched library is sequenced on an Illumina HiSeq 4000 system. Sequence reads are aligned to the human reference genome (GRCh37/hg19) using the Burrows-Wheeler Aligner (BWA). Structural variants (SVs) are called using DELLY, and all RET SVs are manually reviewed [79].

RNA Sequencing (Archer FusionPlex Assay)

The Archer FusionPlex assay utilizes Anchored Multiplex PCR (AMP) for targeted RNA-seq to detect fusion transcripts, even with unknown partners [36] [78].

  • RNA Extraction & QC: RNA is extracted from peripheral blood or bone marrow aspirate specimens (or FFPE tissue). Quality and integrity are critical for success.
  • Library Preparation (AMP): The method uses unidirectional gene-specific primers (GSP2) targeting one of the partner genes involved in a translocation. This allows for the capture of novel fusion partners. Adapter-ligated primers are used to amplify the targets [21] [78].
  • Sequencing & Analysis: Amplified targets are sequenced on an Illumina platform. Sequencing reads are aligned to the human reference genome (e.g., GRCh37/hg19) using analysis software such as Archer Analysis [21]. Fusion transcripts are typically reported only if they meet quality thresholds, such as at least 5 unique supporting reads and at least 3 reads with unique start sites [79].
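
The read-support thresholds in the last step can be expressed as a simple filter. The sketch below is illustrative only: the `FusionCandidate` fields and the example candidates are hypothetical and do not reflect the actual output schema of Archer Analysis or any other software.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    unique_reads: int        # unique reads supporting the fusion junction
    unique_start_sites: int  # supporting reads with distinct start positions


def passes_read_support(candidate, min_reads=5, min_start_sites=3):
    """Apply the example thresholds quoted in the text (5 reads, 3 start sites)."""
    return (candidate.unique_reads >= min_reads
            and candidate.unique_start_sites >= min_start_sites)


# Toy candidates, not real assay results.
candidates = [
    FusionCandidate("KIF5B::RET", unique_reads=42, unique_start_sites=11),
    FusionCandidate("CCDC6::RET", unique_reads=6, unique_start_sites=2),
    FusionCandidate("NCOA4::RET", unique_reads=3, unique_start_sites=3),
]

reportable = [c.name for c in candidates if passes_read_support(c)]
print(reportable)
```

Requiring distinct start sites guards against a single molecule amplified many times being counted as independent evidence.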

Coverage Imbalance Analysis

A novel bioinformatic approach for fusion detection involves analyzing the 5'/3' coverage imbalance in RNA-seq data. This method is particularly useful for identifying 3' fusions of druggable kinases like RET.

  • Principle: In a normal, unfused gene, RNA-seq read coverage is relatively balanced across the transcript. When an oncogenic fusion occurs, the 3' portion of the gene (e.g., the RET kinase domain) comes under the control of a highly active promoter from the 5' partner gene. This leads to a marked increase in read coverage for the 3' exons of RET compared to its 5' exons [78].
  • Workflow: RNA-seq reads are aligned to the reference transcriptome. The coverage depth for each exon of the target gene (RET) is calculated. A significant imbalance ratio, favoring the 3' exons, is indicative of a potential fusion event. This method can serve as a robust complement to traditional fusion callers [78].

[Figure: RNA-seq read alignment → calculate exon coverage depth → compute 3'-to-5' coverage ratio → ratio above threshold? No: fusion unlikely. Yes: potential RET fusion identified.]

Coverage Imbalance Analysis Workflow: A bioinformatics pipeline for detecting gene fusions based on asymmetrical RNA-seq read coverage between the 5' and 3' ends of a gene.
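
A minimal sketch of the coverage imbalance calculation is given below. It assumes per-exon mean depths have already been computed, expresses the imbalance as 3' coverage over 5' coverage so that larger values indicate stronger 3' bias, and takes exon 12 as the first 3' exon because most RET breakpoints fall in intron 11. The threshold of 3.0, the pseudocount, and the toy depth values are illustrative assumptions, not published parameters.

```python
from statistics import mean


def three_to_five_prime_ratio(exon_depth, first_3p_exon, pseudocount=1.0):
    """Ratio of mean 3' exon coverage to mean 5' exon coverage.

    exon_depth maps exon number -> mean read depth. The pseudocount
    avoids division by zero when the 5' exons are not expressed at all.
    """
    five_prime = [d for exon, d in exon_depth.items() if exon < first_3p_exon]
    three_prime = [d for exon, d in exon_depth.items() if exon >= first_3p_exon]
    if not five_prime or not three_prime:
        raise ValueError("need coverage on both sides of the split point")
    return (mean(three_prime) + pseudocount) / (mean(five_prime) + pseudocount)


def flag_imbalance(exon_depth, first_3p_exon=12, threshold=3.0):
    """True when 3' coverage exceeds 5' coverage by more than the threshold."""
    return three_to_five_prime_ratio(exon_depth, first_3p_exon) > threshold


# Toy profiles: uniform coverage versus strong 3' over-representation.
balanced = {exon: 20.0 for exon in range(1, 21)}
imbalanced = {exon: (4.0 if exon < 12 else 120.0) for exon in range(1, 21)}

print(flag_imbalance(balanced), flag_imbalance(imbalanced))
```

In practice a threshold would need to be calibrated on fusion-negative samples, since 3' bias from RNA degradation in FFPE material can also skew coverage.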

Integrated Testing Strategies & Workflows

Given the limitations of individual platforms, clinical laboratories are increasingly adopting reflex testing algorithms to maximize detection rates. A common and effective strategy involves starting with a broad, DNA-based NGS panel to screen for a wide range of genomic alterations, including point mutations, copy number changes, and known fusions.

  • Reflex to RNA-seq: Cases that are negative for a clear mitogenic driver or that harbor structural variants of unknown significance (SVUS) on DNA-seq are automatically "reflexed" to a targeted RNA-seq assay [14]. This approach significantly improves the detection of rare and novel oncogenic fusions. In one study of 1,211 NSCLC specimens, approximately 10% required reflex RNA testing, which successfully identified actionable fusions in 9 cases (including ALK, BRAF, NRG1, NTRK3, ROS1, and RET) that were missed by the initial amplicon-based DNA assay [14].

  • Complementary FISH/IHC: In specific diagnostic scenarios, or when NGS is inconclusive/unavailable, FISH and IHC can provide orthogonal validation. However, their variable performance, particularly for fusions involving NCOA4, must be considered [79].

[Figure: Tumor sample (FFPE/fresh) → DNA sequencing (e.g., MSK-IMPACT) → driver identified or RET SVUS? Clear driver: report result. No driver or RET SVUS: reflex to RNA sequencing (e.g., Archer FusionPlex) → fusion transcript detected? Yes: report RET fusion positive. No: report RET fusion negative or investigate further.]

Integrated RET Fusion Testing Algorithm: A decision-tree workflow illustrating a reflex testing model that combines DNA and RNA sequencing for comprehensive fusion detection.
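
The reflex logic can be sketched as two small decision functions. This is an illustration of the workflow described above, not a validated clinical rule set; the function and enum names are hypothetical, and the choice to reflex whenever a RET SVUS is present, even alongside another driver, is an assumption.

```python
from enum import Enum


class NextStep(Enum):
    REPORT_DNA_RESULT = "report DNA-seq result"
    REFLEX_TO_RNA = "reflex to targeted RNA-seq"


class RnaOutcome(Enum):
    FUSION_POSITIVE = "report RET fusion positive"
    FUSION_NEGATIVE = "report RET fusion negative or investigate further"


def after_dna_seq(clear_driver_found, ret_svus_found):
    """Decide what follows the initial DNA-based panel."""
    # Assumption: an ambiguous RET SVUS always triggers RNA confirmation.
    if ret_svus_found or not clear_driver_found:
        return NextStep.REFLEX_TO_RNA
    return NextStep.REPORT_DNA_RESULT


def after_rna_seq(fusion_transcript_detected):
    """Interpret the reflex RNA-seq result."""
    if fusion_transcript_detected:
        return RnaOutcome.FUSION_POSITIVE
    return RnaOutcome.FUSION_NEGATIVE


print(after_dna_seq(clear_driver_found=False, ret_svus_found=False).value)
print(after_rna_seq(fusion_transcript_detected=True).value)
```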

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents and Kits for RET Fusion Analysis

| Product/Technology | Primary Function | Application Context |
| --- | --- | --- |
| QIAamp DNA FFPE Tissue Kit (Qiagen) | High-quality DNA extraction from challenging FFPE samples. | DNA-seq library prep [36]. |
| KAPA Hyper Prep Kit (Roche) | Library construction for NGS, compatible with hybridization capture. | DNA-seq library preparation for assays like MSK-IMPACT [36]. |
| Archer FusionPlex Solid/Lymphoma Panels | Targeted RNA-seq library prep using Anchored Multiplex PCR (AMP). | Detection of fusion transcripts from RNA, agnostic to known partners [21] [78]. |
| BWA (Burrows-Wheeler Aligner) | Alignment of sequencing reads to a reference genome. | Standard primary analysis step for both DNA-seq and RNA-seq data [36]. |
| DELLY | Structural variant caller from DNA-seq data. | Identification of genomic rearrangements, including RET fusions, from DNA-seq [79]. |
| Trimmomatic | Pre-processing of raw NGS reads to remove adapters and low-quality bases. | Quality control (QC) step in both DNA-seq and RNA-seq pipelines [36]. |

RET Signaling Pathway and Fusion Mechanism

RET Signaling and Fusion Mechanism: A comparison of ligand-dependent normal RET signaling versus the constitutive activation caused by oncogenic gene fusions.

No single platform is universally superior for detecting RET fusions. DNA-seq offers high sensitivity and specificity for known fusions but can miss functionally relevant events or yield inconclusive SVUS. RNA-seq directly confirms expressed fusion transcripts and identifies novel partners, making it ideal for confirming DNA findings and for interrogating cases in which DNA-seq found neither a fusion nor another driver. FISH and IHC have specific, more limited roles due to variable, partner-dependent sensitivity and an inability to identify the specific fusion partner.

The evidence strongly supports an integrative diagnostic approach. A synergistic workflow, beginning with DNA-seq and reflexing to RNA-seq in ambiguous or negative cases, provides the most comprehensive and clinically actionable profiling for RET fusions. This strategy maximizes patient eligibility for life-saving targeted therapies like selpercatinib and pralsetinib, embodying the precision medicine mandate in modern oncology.

Benchmarking Bioinformatics Tools for Accurate Fusion Calling

Gene fusions are critical molecular biomarkers in cancer, influencing diagnosis, prognosis, and therapeutic decisions. The accurate detection of these aberrations depends heavily on the choice of genomic approach—DNA-based or RNA-based sequencing—and the specific bioinformatics tools employed. This guide provides an objective comparison of fusion detection methodologies and tools, synthesizing performance data from recent studies to inform researchers and clinicians in selecting optimal approaches for their specific applications. Evidence consistently demonstrates that integrated DNA-RNA sequencing approaches maximize detection sensitivity for clinically relevant fusions, with performance varying significantly across tools and cancer types [21] [34] [69].

Table 1: Overall Performance Comparison of Fusion Detection Approaches

| Approach | Key Strengths | Key Limitations | Ideal Use Cases |
| --- | --- | --- | --- |
| RNA-seq Tools | High sensitivity for expressed chimeric transcripts; identifies fusion products with functional potential | May miss enhancer-hijacking events; dependent on expression levels | Routine fusion screening in clinical settings; therapy selection |
| DNA-seq Tools | Detects structural variants regardless of expression; identifies cryptic, enhancer-driven events | May miss fusions from intrachromosomal deletions; cannot confirm expression | Research discovery; comprehensive structural variant detection |
| Combined DNA-RNA | Maximizes detection of both structural variants and expressed fusions; highest clinical utility | Higher cost and computational requirements; complex workflow | Precision oncology; complex diagnostic cases |

Gene fusions arise from genomic rearrangements including chromosomal translocations, deletions, inversions, or duplications, and serve as important diagnostic, prognostic, and predictive biomarkers in oncology. The detection of these events can be approached at either the DNA level, identifying structural rearrangements in the genome, or at the RNA level, identifying chimeric transcripts resulting from these rearrangements. Each approach offers distinct advantages and limitations, with recent studies demonstrating their complementary nature [21] [34].

DNA-based methods excel at identifying structural variants regardless of their transcriptional activity, making them particularly valuable for detecting enhancer-hijacking events that may not produce fusion transcripts but can activate oncogenes through positional effects. In contrast, RNA-based methods detect expressed fusion products, providing functional validation of the DNA rearrangement and often representing the direct targets for therapeutic intervention. Understanding these fundamental differences is crucial for selecting appropriate detection strategies in both research and clinical settings [21] [69].

Performance Benchmarking of Fusion Detection Tools

Comparative Analysis of RNA-seq Fusion Detection Tools

Multiple studies have comprehensively evaluated the performance of fusion detection tools using various benchmarking datasets, including simulated fusion transcripts, spike-in controls, and clinically validated samples. The sensitivity, specificity, and computational efficiency vary considerably across tools.

Table 2: Performance Metrics of Leading RNA-seq Fusion Detection Tools

| Tool | Sensitivity (%) | Specificity (%) | Computational Efficiency | Key Features |
| --- | --- | --- | --- | --- |
| Arriba | 88-100* | High | Fast (<1 hour/sample) | Detects fusions, intragenic rearrangements, truncations |
| Fusion-Bloom | 96* | High | Moderate (10-12 hours/100M reads) | de novo assembly approach; base-pair precision |
| STAR-Fusion | High | High | Fast | Based on STAR aligner; well-documented |
| FusionCatcher | High | High | Moderate | Comprehensive pipeline; multiple alignment tools |
| JAFFA | Moderate | High | Moderate | Hybrid assembly approach; good for long reads |
| deFuse | Moderate | Moderate | Slow | Early tool; largely superseded by newer methods |

*Sensitivity varies based on expression levels and dataset characteristics

In a landmark comparison of 12 fusion detection tools, performance varied significantly based on RNA-seq data quality, read length, and sequencing depth. Most tools showed trade-offs between sensitivity and false discovery rates, with no single tool performing optimally across all datasets [80]. However, more recent evaluations have identified several tools that consistently outperform others.

Arriba demonstrates particularly strong performance across multiple benchmarking datasets, identifying 88 of 150 simulated fusions at the lowest expression level (5-fold), all synthetic fusions in spike-in experiments, and 78 validated fusions in the MCF-7 cell line. This represents a sensitivity surplus of 13-60% compared to the next best method depending on the dataset [7]. Fusion-Bloom also shows excellent performance, detecting 48 of 50 known fusions with zero false positives in one benchmark, and all fusions across all molarities in spike-in experiments [81].

DNA vs. RNA Sequencing for Fusion Detection

A comprehensive 2025 study comparing targeted RNA-seq and optical genome mapping (OGM) in 467 acute leukemia cases revealed striking differences in detection capabilities between approaches. The overall concordance rate was 88.1%, but significant variations emerged when examining specific fusion types [21].

RNA-seq slightly outperformed OGM for fusions arising from intrachromosomal deletions, which were sometimes misinterpreted by OGM as simple deletions. Conversely, OGM uniquely detected 37 of 234 (15.8%) clinically relevant rearrangements, while RNA-seq exclusively identified 22 of 234 (9.4%). The most dramatic difference was observed for enhancer-hijacking lesions (including MECOM, BCL11B, and IGH rearrangements), which showed only 20.6% concordance between platforms, with many events missed by RNA-seq [21].

These findings underscore the complementary nature of DNA and RNA-based approaches. RNA-seq proves more sensitive for detecting expressed chimeric fusions, while OGM (a DNA-level method) excels at identifying cryptic, enhancer-driven events that do not generate fusion transcripts [21].

Integrated DNA-RNA Sequencing Approaches

Recognizing the limitations of single-modality approaches, researchers have developed integrated DNA-RNA sequencing assays that simultaneously leverage both data types. A 2025 validation study of a combined RNA and DNA exome assay across 2,230 clinical tumor samples demonstrated significantly improved detection of clinically actionable alterations compared to DNA-only testing [69].

This integrated approach enabled direct correlation of somatic alterations with gene expression, recovery of variants missed by DNA-only testing, and improved detection of gene fusions. The assay uncovered clinically actionable alterations in 98% of cases and revealed complex genomic rearrangements that would likely have remained undetected without RNA data [69].

Similarly, a custom-designed integrated DNA and RNA-based NGS assay for solid tumors demonstrated 100% sensitivity and specificity after confirming a previously false-negative TPM3::NTRK1 fusion. The study found that DNA and RNA results complemented each other, with each modality detecting fusions missed by the other [34].
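
Combining results from the two modalities amounts to taking the union of calls while recording which assay level supports each event. The sketch below illustrates that bookkeeping; the event names and their assignment to DNA or RNA are hypothetical toy data, not results from the cited studies.

```python
def integrate_calls(dna_calls, rna_calls):
    """Label each event by the assay level(s) that support it."""
    dna, rna = set(dna_calls), set(rna_calls)
    evidence = {}
    for event in sorted(dna | rna):
        if event in dna and event in rna:
            evidence[event] = "DNA+RNA"
        elif event in dna:
            evidence[event] = "DNA only"
        else:
            evidence[event] = "RNA only"
    return evidence


# Hypothetical calls for a single sample.
dna_calls = ["KIF5B::RET", "IGH rearrangement"]
rna_calls = ["KIF5B::RET", "TPM3::NTRK1"]

merged = integrate_calls(dna_calls, rna_calls)
for event, support in merged.items():
    print(f"{event}: {support}")
```

Keeping the evidence label alongside each call makes it easy to route single-modality findings to orthogonal confirmation instead of discarding them.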

Experimental Protocols for Benchmarking

Standardized Benchmarking Datasets

Robust evaluation of fusion detection tools requires diverse benchmarking datasets that mimic real-world scenarios:

  • In silico simulated datasets: Computer-generated fusion transcripts merged into real RNA-seq data from benign tissue, enabling precise sensitivity measurements across expression levels (typically 5- to 200-fold) [7] [80].

  • Spike-in reference standards: Synthetic RNA molecules mimicking oncogenic fusions spiked into RNA libraries at varying concentrations (e.g., 10^-8.57 pmol to 10^-3.47 pmol), allowing sensitivity limits to be determined [34] [7].

  • Cell line datasets: Well-characterized cancer cell lines (e.g., MCF-7) with orthogonally validated fusions, providing real-world performance assessment [7] [80].

  • Clinical patient cohorts: Samples from defined patient populations (e.g., ICGC early-onset prostate cancer cohort) with known prevalence of specific fusions [7].
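
Whichever dataset type is used, scoring a tool comes down to comparing its predicted fusions against a truth set. The following is a minimal sketch under stated assumptions: fusions are matched by ordered gene pair only (breakpoint coordinates are ignored), pair names use either "::" or "--" as the separator, and the truth and prediction lists are toy values, with GENE1::GENE2 standing in for a false positive.

```python
def normalize(fusion_name):
    """Return an ordered (5' gene, 3' gene) tuple in upper case."""
    five_prime, three_prime = fusion_name.replace("--", "::").split("::")
    return (five_prime.strip().upper(), three_prime.strip().upper())


def benchmark(predicted, truth):
    """Sensitivity and precision of predicted fusions against a truth set."""
    pred = {normalize(p) for p in predicted}
    true = {normalize(t) for t in truth}
    tp = len(pred & true)
    return {
        "tp": tp,
        "fn": len(true - pred),
        "fp": len(pred - true),
        "sensitivity": tp / len(true) if true else 0.0,
        "precision": tp / len(pred) if pred else 0.0,
    }


# Toy truth set and tool output (not real benchmark data).
truth = ["EML4::ALK", "KIF5B::RET", "TPM3::NTRK1", "BCR::ABL1"]
predicted = ["EML4--ALK", "kif5b::ret", "BCR::ABL1", "GENE1::GENE2"]

scores = benchmark(predicted, truth)
print(scores)
```

Precision is reported here instead of specificity because the number of true negatives (all gene pairs that are not fused) is not well defined for fusion calling.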

Validation Methodologies

Comprehensive tool validation should incorporate multiple approaches:

  • Orthogonal validation: Confirmation of predicted fusions using independent methods such as FISH, RT-PCR, or Sanger sequencing [34] [80].

  • Tiered classification: Classification of variants according to established guidelines (e.g., ACMG/ClinGen, AMP/ASCO/CAP) into tiers based on clinical relevance [21].

  • Limit of detection (LOD) assessment: Determination of minimum mutation abundance (e.g., 5% for DNA, 250-400 copies/100 ng for RNA) for reliable fusion detection through serial dilution experiments [34].
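
For the serial dilution step, one common way to summarize the data is to take the lowest input level at which the fusion is still detected in a required fraction of replicates, with all higher levels also passing. The sketch below assumes a 95% detection-rate criterion and uses hypothetical replicate counts; neither is taken from the cited validation study.

```python
def limit_of_detection(dilution_results, min_detection_rate=0.95):
    """Lowest input level meeting the detection rate, given that all
    higher levels also meet it.

    dilution_results maps input level (e.g., copies per 100 ng) to a
    (detected_replicates, total_replicates) tuple.
    Returns None if even the highest level fails.
    """
    lod = None
    for level in sorted(dilution_results, reverse=True):
        detected, total = dilution_results[level]
        if total == 0 or detected / total < min_detection_rate:
            break  # stop at the first level that fails the criterion
        lod = level
    return lod


# Hypothetical dilution series: copies per 100 ng -> (detected, replicates).
series = {
    1000: (20, 20),
    500: (20, 20),
    250: (20, 20),
    125: (17, 20),
    60: (5, 20),
}

lod = limit_of_detection(series)
print(lod)
```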

Visualizing Fusion Detection Workflows

[Figure: Sample collection (DNA/RNA) feeds two parallel arms. DNA arm: DNA sequencing (WES/OGM/targeted) → DNA-based fusion detection (FindDNAFusion, OGM) → structural variants, enhancer hijacking, cryptic rearrangements. RNA arm: RNA sequencing (RNA-seq) → RNA-based fusion detection (Arriba, STAR-Fusion, Fusion-Bloom) → expressed chimeric transcripts, fusion products from deletions, functional fusion RNAs. Both arms → results integration → clinical reporting.]

Figure 1: Comprehensive Fusion Detection Workflow integrating both DNA and RNA sequencing approaches for maximal sensitivity.

Decision Framework for Tool Selection

[Figure: Decision tree starting from the primary goal. Clinical diagnostics: combined DNA-RNA approach (Arriba + FindDNAFusion). Research discovery, by fusion type of interest: enhancer hijacking or cryptic rearrangements → DNA-focused (OGM, FindDNAFusion); expressed fusions or therapeutic targets → RNA-focused (Arriba, STAR-Fusion). Available resources: very limited → single platform with high sensitivity (Arriba or STAR-Fusion); limited, with high required sensitivity → RNA-focused (Arriba, FusionCatcher).]

Figure 2: Decision Framework for selecting appropriate fusion detection strategies based on research goals, resources, and fusion types of interest.

Table 3: Essential Research Reagents and Computational Tools for Fusion Detection Studies

| Category | Specific Products/Tools | Function/Purpose |
| --- | --- | --- |
| Wet Lab Reagents | TruSeq stranded mRNA kit (Illumina); SureSelect XTHS2 (Agilent); AllPrep DNA/RNA kits (Qiagen) | Library preparation; nucleic acid extraction |
| Reference Standards | GeneWell fusion reference standards; synthetic spike-in RNA controls; characterized cell lines (e.g., MCF-7) | Assay validation; sensitivity determination; quality control |
| Computational Tools | STAR, HISAT2, BWA aligners; Fusion-Bloom, Arriba, STAR-Fusion fusion detectors; DESeq2, edgeR for expression | Data analysis; fusion detection; differential expression |
| Validation Tools | BLAT, BLAST, Sanger sequencing; IGV visualization; orthogonal assays (FISH, RT-PCR) | Results confirmation; visual verification; experimental validation |

Based on comprehensive benchmarking studies, the following recommendations emerge for selecting fusion detection approaches:

  • For clinical diagnostics: Implement combined DNA-RNA sequencing where possible, as this approach detects the broadest range of clinically actionable fusions, with demonstrated utility in 98% of cases in large validation studies [69].

  • For clinical settings with limited resources: Prioritize RNA-seq with high-performance tools like Arriba or STAR-Fusion, which offer the best balance of sensitivity, speed, and accuracy for detecting therapeutically relevant expressed fusions [7].

  • For research discovery: Select approaches based on the biological questions. DNA-based methods (OGM, DNA-seq with intronic tiling) are superior for identifying structural variants and enhancer hijacking events, while RNA-based methods excel at detecting functional fusion transcripts [21].

  • For method validation: Employ standardized benchmarking datasets including spike-in controls, simulated fusions, and orthogonally validated samples to properly assess tool performance [7] [80].

As sequencing technologies continue to evolve and computational methods improve, the integration of multi-omic approaches will likely become standard practice in both research and clinical settings, further enhancing our ability to detect these critical genomic events with implications for cancer diagnosis and treatment.

Conclusion

The choice between RNA-seq and DNA-seq for fusion detection is not a matter of selecting a superior technology, but of understanding their powerful synergy. DNA-seq effectively identifies genomic rearrangements, including those that may not be expressed, while RNA-seq provides direct evidence of oncogenic, expressed fusion transcripts and often discovers novel partners. Robust validation studies and real-world clinical data consistently demonstrate that a combined approach significantly increases the detection of clinically actionable fusions—by over 21% in pan-cancer cohorts—compared to either method alone. For the future of precision medicine, integrating DNA and RNA sequencing into comprehensive genomic profiling is paramount. This strategy ensures the most complete molecular diagnosis, expands the population of patients eligible for matched targeted therapies, and ultimately paves the way for improved clinical outcomes across a wide spectrum of cancers.

References