This article provides a comprehensive overview of RNA sequencing (RNA-seq) for detecting clinically relevant gene fusions in cancer. It explores the foundational role of gene fusions as diagnostic biomarkers and therapeutic targets, detailing methodological advances from targeted panels to long-read sequencing. The content addresses key challenges in troubleshooting and optimization, including sample preparation and bioinformatic pipelines. Finally, it synthesizes evidence from comparative validation studies, demonstrating how integrating RNA-seq with DNA-based methods significantly improves detection rates for actionable fusions, thereby advancing precision oncology and drug development.
Gene fusions, also known as chimeric genes, are hybrid genes formed when two previously separate genes become juxtaposed due to chromosomal rearrangements. These rearrangements represent a critical class of somatic alterations in cancer and function as strong oncogenic drivers in numerous malignancies [1] [2]. The resulting fusion proteins can exhibit novel functional properties or altered expression patterns that disrupt normal cellular processes, ultimately leading to tumorigenesis. The clinical importance of gene fusions has grown substantially with the development of targeted therapies, making their detection crucial for optimal treatment selection [1].
The processes of tumorigenesis and development are intricate, involving numerous genes and molecular pathways. Fusion genes, as direct products of abnormal chromosomal rearrangements, are now recognized as key factors in the formation of many types of tumors [2]. In recent years, advancements in sequencing technology and bioinformatics have accelerated the discovery of fusion genes associated with specific tumor types, expanding our understanding of their roles in cancer biology and their potential as therapeutic targets.
Gene fusions arise through several distinct mechanisms of DNA rearrangement, each involving different types of chromosomal alterations [1]:
The majority of oncogenic fusions are in-frame mutations that affect exonic regions of two protein-coding genes [1]. Interestingly, chimeric proteins can also arise without genomic rearrangement through mechanisms such as aberrant read-through transcription, where the transcription process does not properly terminate at the end of a gene and continues into the next gene (e.g., SCNN1A-TNFRSF1A) [1]. Fusion transcripts may also arise by trans or cis splicing of mRNA [1].
Oncogenic fusion proteins drive cancer development through several distinct mechanisms:
Table 1: Common Gene Fusion Types and Their Functional Consequences
| Fusion Type | Functional Consequence | Representative Examples | Primary Signaling Pathways Affected |
|---|---|---|---|
| Kinase Fusions | Constitutive kinase activation | BCR-ABL, EML4-ALK, TPM3-NTRK1 | PI3K/AKT, MAPK, JAK/STAT |
| Transcription Factor Fusions | Altered gene expression programs | PML-RARα, TMPRSS2-ERG, PAX3-FOXO1 | Cell differentiation, apoptosis |
| Ligand Fusions | Aberrant receptor activation | NRG1 fusions | ErbB signaling |
| Promoter Swap Fusions | Oncogene overexpression | IGH-FGFR3 | Various oncogenic pathways |
Oncogenic fusion proteins have been shown to drive or contribute to cancer development through both cell-autonomous and non-cell-autonomous mechanisms. For instance, in rhabdomyosarcoma, tumor cells with PAX3-FOXO1 fusion can modulate the tumor microenvironment to enhance cancer and recipient cell motility, favoring metastatic disease [1]. Similarly, cell-surface-bound NRG1 fusion proteins are thought to drive paracrine signaling via RTKs on neighboring cells [1].
Gene fusions were first discovered in hematologic malignancies, with the Philadelphia chromosome in chronic myeloid leukemia (CML) as the first example [1]. This chromosomal abnormality, identified in 1960 and later found to arise from a translocation between chromosomes 9 and 22, results in the BCR-ABL fusion gene [1]. This fusion generates a constitutively active tyrosine kinase that drives leukemogenesis and is found in almost all cases of CML [1].
Other significant fusions in hematologic cancers include PML-RARα in acute promyelocytic leukemia and various ALK fusions in anaplastic large cell lymphoma (ALCL). The TPM3-ALK fusion, for instance, has been reported in ALCL, where it drives aberrant ALK expression closely associated with malignant transformation of lymphoid cells [3].
In solid tumors, gene fusions occur across a broad spectrum of malignancies. The first fusion reported in solid tumors was CTNNB1-PLAG1 in salivary gland adenoma [1]. Large-scale genomic studies have since revealed the extensive landscape of fusion genes across solid tumors.
A comprehensive analysis of 9,624 tumors across 33 cancer types identified 25,664 fusions, with a 63.3% validation rate using whole-genome sequencing data [4]. This study suggested that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of cases [4].
Table 2: Prevalence of Select Gene Fusions Across Cancer Types
| Cancer Type | Fusion | Prevalence | Clinical Actionability |
|---|---|---|---|
| Chronic Myeloid Leukemia | BCR-ABL | ~100% [1] | FDA-approved TKIs |
| Prostate Adenocarcinoma | TMPRSS2-ERG | 38.2% [4] | Under investigation |
| Lung Adenocarcinoma | EML4-ALK | 1.0% [4] | FDA-approved ALK inhibitors |
| Thyroid Carcinoma | CCDC6-RET | 4.2% [4] | FDA-approved RET inhibitors |
| Cholangiocarcinoma | FGFR2-BICC1 | 5.6% [4] | FDA-approved FGFR inhibitors |
| Head and Neck Cancer | FGFR3-TACC3 | 2.8% overall fusion prevalence [5] | Potential target |
| Various Solid Tumors | NTRK fusions | 0.35% overall prevalence [6] | FDA-approved TRK inhibitors |
Recent large-scale studies have provided further insights into fusion prevalence. A 2025 pan-cancer analysis of 67,278 patients receiving both RNA- and DNA-based next-generation sequencing (NGS) found that 2.2% had at least one of nine fusions with FDA-approved matched therapies [7]. Notably, 29% of these fusions were detected outside of FDA-approved indications, highlighting the potential for expanding targeted therapy applications [7].
The prevalence of specific fusion types varies considerably across cancer types. For NTRK fusions, a real-world study of 19,591 solid tumor samples found an overall prevalence of 0.35%, with the highest frequencies in glioblastoma (1.91%), small intestine tumors (1.32%), and head and neck tumors (0.95%) [6].
Historically, gene fusions were detected using traditional methods that remain relevant in clinical practice:
While these methods have been widely used, they have limitations, particularly poor compatibility with multiplexing, which prevents simultaneous interrogation of multiple fusion genes [8].
The emergence of NGS technology has revolutionized fusion detection by enabling simultaneous sequencing of numerous genes in parallel [8]. Both DNA-based and RNA-based NGS approaches are employed, each with distinct advantages and limitations:
DNA-based NGS identifies genomic rearrangements at the DNA level but requires extensive coverage due to unpredictable breakpoints and potential blind spots within targeted areas [8].
RNA-based NGS detects expressed fusion transcripts, providing direct evidence of functionally relevant fusions. However, RNA is more susceptible to degradation, especially in formalin-fixed paraffin-embedded (FFPE) samples [8].
Recent studies have demonstrated that combined DNA and RNA sequencing significantly improves fusion detection. A 2025 study showed that concurrent RNA- and DNA-based NGS increased the detection of driver gene fusions by 21% compared with DNA-NGS alone [7]. Another study developing an integrated DNA and RNA-based targeted sequencing assay reported 100% sensitivity and specificity in detecting fusions in clinical samples [8].
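The relative gain reported for combined testing can be made concrete with a small calculation. The sketch below is illustrative only: the sample identifiers and fusion calls are hypothetical, and the function name `relative_gain` is an assumption rather than part of any cited assay. It computes the percent increase in detected fusions when RNA calls are added to a DNA-only call set.

```python
def relative_gain(dna_calls: set, rna_calls: set) -> float:
    """Percent increase in detected fusions when RNA calls are added to DNA calls."""
    if not dna_calls:
        raise ValueError("DNA call set is empty; relative gain is undefined")
    combined = dna_calls | rna_calls  # a fusion counts once even if both assays see it
    return 100.0 * (len(combined) - len(dna_calls)) / len(dna_calls)


# Hypothetical calls keyed as "sample:fusion" (not data from the cited studies)
dna = {"S1:EML4-ALK", "S2:KIF5B-RET", "S3:CCDC6-RET", "S4:CD74-ROS1", "S5:FGFR2-BICC1"}
rna = {"S1:EML4-ALK", "S2:KIF5B-RET", "S6:TPM3-NTRK1"}

gain = relative_gain(dna, rna)
print(f"{gain:.0f}% more fusions detected with DNA+RNA than with DNA alone")
```

Note that the gain is expressed relative to the DNA-only count, so the same number of RNA-only findings yields a larger percentage when the DNA baseline is small.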
The following protocol outlines an integrated approach for gene fusion detection using both DNA and RNA NGS:
Sample Preparation:
Library Preparation:
Sequencing and Analysis:
Validation:
This integrated approach has been shown to overcome the limitations of single-method approaches, with one study demonstrating that combined DNA and RNA sequencing identified a TPM3-NTRK1 fusion that was missed by DNA-only analysis [8].
Oncogenic fusion proteins typically activate key signaling pathways that drive tumorigenesis. The diagram below illustrates the major pathways activated by different types of gene fusions:
Diagram 1: Signaling Pathways Activated by Oncogenic Gene Fusions
The diagram above illustrates how different categories of gene fusions activate distinct signaling cascades that ultimately drive oncogenic processes. Kinase fusions typically activate multiple pathways simultaneously, including the RTK/RAS/MAPK, PI3K/AKT/mTOR, and JAK/STAT pathways, leading to enhanced cell proliferation and survival [1] [3]. Transcription factor fusions primarily alter gene expression programs, which can block differentiation and increase metastatic potential [1]. Ligand fusions, such as NRG1 fusions, activate receptor tyrosine kinase signaling through aberrant paracrine or autocrine mechanisms [1].
Table 3: Essential Research Reagents for Gene Fusion Studies
| Reagent Category | Specific Examples | Function/Application | Technical Notes |
|---|---|---|---|
| Nucleic Acid Extraction Kits | FFPE RNA/DNA extraction kits | Isolation of high-quality nucleic acids from archived specimens | Optimized for degraded FFPE material; include DNase/RNase treatment steps |
| Library Preparation Kits | Tempus xT (DNA), Tempus xR (RNA) [7] | Preparation of sequencing libraries | DNA panels should cover relevant intronic regions; RNA methods should capture fusion transcripts |
| Reference Standards | Commercial fusion standards (e.g., GeneWell) [8] | Assay validation and quality control | Contain known fusion variants at defined concentrations |
| Hybrid Capture Reagents | Custom bait panels | Enrichment of target genes | Design should include known and novel fusion partners |
| Reverse Transcription Kits | High-efficiency RT enzymes | cDNA synthesis for RNA-seq | Critical for obtaining full-length transcripts from degraded RNA |
| Sequencing Controls | Spike-in RNA controls, positive control samples | Monitoring technical performance | Should include samples with known fusion status |
| Analysis Software | STAR-Fusion, Arriba, EricScript [4] [5] | Bioinformatics detection of fusions | Use multiple algorithms to improve sensitivity/specificity |
| Orthogonal Validation Reagents | FISH probes, IHC antibodies, Sanger sequencing reagents | Confirmation of NGS findings | Essential for validating novel fusions |
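The recommendation in Table 3 to use multiple algorithms can be sketched as a simple voting scheme. This is a minimal illustration under stated assumptions: the tool names are used only as dictionary labels, the calls are hypothetical, no real tool output is parsed, and `consensus_fusions` and `min_votes` are names introduced here for illustration.

```python
from collections import Counter


def consensus_fusions(calls_by_tool: dict, min_votes: int = 2) -> dict:
    """Keep fusions reported by at least `min_votes` callers.

    Each fusion is a (5' gene, 3' gene) tuple so partner order is preserved.
    """
    votes = Counter()
    for calls in calls_by_tool.values():
        votes.update(set(calls))  # each tool votes at most once per fusion
    return {fusion: n for fusion, n in votes.items() if n >= min_votes}


# Hypothetical calls for one sample; GENE_A/GENE_B is a placeholder artifact
calls_by_tool = {
    "STAR-Fusion": {("EML4", "ALK"), ("TPM3", "NTRK1")},
    "Arriba": {("EML4", "ALK"), ("TPM3", "NTRK1"), ("GENE_A", "GENE_B")},
    "EricScript": {("EML4", "ALK")},
}

consensus = consensus_fusions(calls_by_tool)
for (gene5, gene3), n in sorted(consensus.items()):
    print(f"{gene5}-{gene3}: supported by {n} callers")
```

Raising `min_votes` favors specificity, while lowering it to 1 (a plain union) favors sensitivity; the appropriate setting would need to be established during assay validation.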
The following diagram illustrates a comprehensive workflow for gene fusion detection integrating both DNA and RNA sequencing approaches:
Diagram 2: Integrated DNA and RNA Sequencing Workflow for Fusion Detection
This workflow highlights the importance of parallel DNA and RNA analysis to maximize detection sensitivity. Studies have demonstrated that this integrated approach increases the detection of actionable fusions by 21-127% compared to DNA sequencing alone [7]. The complementary nature of DNA and RNA sequencing helps overcome the limitations of each individual method: DNA sequencing can detect genomic rearrangements regardless of expression, while RNA sequencing confirms functionally expressed fusions and can identify events missed by DNA analysis due to breakpoint location or complexity [8].
Gene fusions represent critical oncogenic drivers across both hematologic malignancies and solid tumors. Understanding their mechanisms of formation, prevalence across cancer types, and the signaling pathways they activate is essential for advancing cancer research and therapy. The development of integrated detection methodologies combining DNA and RNA sequencing has significantly improved our ability to identify these important alterations, directly impacting therapeutic decisions and patient outcomes. As targeted therapies continue to advance, comprehensive fusion testing will play an increasingly vital role in precision oncology, enabling more patients to benefit from matched targeted treatments.
Oncogenic gene fusions are hybrid genes formed when two previously separate genes become juxtaposed through genomic rearrangements such as chromosomal translocations, inversions, deletions, or duplications [1]. These molecular events represent a critical class of oncogenic drivers across a broad spectrum of cancers, with profound implications for tumor initiation, progression, and therapeutic targeting [1]. Research indicates that gene fusions drive cancer development in approximately 16.5% of all cancer cases and act as the sole driver in more than 1% of cases [9]. The clinical significance of these fusions stems from their dual role as defining diagnostic markers and actionable therapeutic targets, making their detection imperative for modern precision oncology.
The constitutive activation of tyrosine kinases through fusion events represents a common oncogenic mechanism. For instance, the BCR-ABL fusion in chronic myeloid leukemia and EML4-ALK fusion in non-small cell lung cancer result in aberrant, ligand-independent signaling that drives uncontrolled cellular proliferation and survival [1]. Beyond kinase activation, gene fusions can also drive oncogenesis through alternative mechanisms, including the juxtaposition of strong promoters that drive overexpression of proto-oncogenes or the creation of novel chimeric transcription factors that alter transcriptional programs [1]. The resulting fusion proteins can activate multiple critical signaling pathways, including PI3K-AKT, MAPK, and Rho GTPase pathways, establishing oncogenic addiction that can be therapeutically exploited [1].
Gene fusions serve as defining diagnostic markers for specific cancer types and subtypes, enabling precise pathological classification. The detection of particular gene fusions can distinguish histologically similar tumors and guide accurate diagnosis, which is fundamental for appropriate treatment selection [1]. Several fusion-driven cancers are now recognized as distinct entities in diagnostic classifications, including the World Health Organization (WHO) classification of tumors.
Table 1: Gene Fusions as Diagnostic Biomarkers in Specific Cancers
| Cancer Type | Diagnostic Fusion | Clinical Significance |
|---|---|---|
| Chronic Myeloid Leukemia | BCR-ABL1 | Defining diagnostic marker [9] |
| Secretory Breast Cancer | ETV6-NTRK3 | Present in ~92% of cases; diagnostic biomarker [10] |
| Synovial Sarcoma | SS18-SSX | Characteristic marker [10] |
| Dermatofibrosarcoma Protuberans | COL1A1-PDGFB | Specific marker [10] |
| Ependymoma | RELA fusion | Distinct entity in WHO CNS tumor classification [10] |
| Inflammatory Myofibroblastic Tumor | ALK fusions | Clarifies diagnosis due to high specific expression [11] |
| Lipofibromatosis-like Neural Tumor | NTRK1 fusions | Distinguishes from histologically similar lipofibromatosis [11] |
Gene fusions serve as powerful predictive biomarkers for response to targeted therapies. Cancers driven by fusion products, particularly those involving tyrosine kinases, often demonstrate remarkable responses to matched targeted agents, exemplifying the paradigm of precision oncology [1]. The predictive value of these fusions has led to the development of tumor-agnostic treatment approaches, where therapies are approved based on the molecular alteration rather than the tumor's tissue of origin.
The combined prevalence of actionable fusions with FDA-approved targeted therapies represents a significant proportion of cancer patients who can benefit from matched targeted treatments. A recent large-scale pan-cancer analysis of 67,278 patients found that 2.2% harbored at least one of nine fusions with an FDA-approved matched therapy, with RNA sequencing increasing the detection of these driver gene fusions by 21% compared to DNA sequencing alone [7]. Furthermore, 29% of these actionable fusions were detected outside of their FDA-approved indications, highlighting the potential for expanding targeted therapy benefits to additional patient populations [7].
Table 2: Clinically Actionable Gene Fusions and Approved Therapies
| Gene Fusion | Primary Cancer Types | Approved Targeted Therapies | Level of Evidence |
|---|---|---|---|
| ALK fusions | NSCLC, Inflammatory Myofibroblastic Tumor | Crizotinib, Ceritinib, Alectinib [11] [1] | FDA-approved in specific indications |
| NTRK fusions | Multiple solid tumors (tumor-agnostic) | Larotrectinib, Entrectinib [1] | Tumor-agnostic FDA approval |
| RET fusions | NSCLC, Thyroid Cancer | Selpercatinib, Pralsetinib [11] [1] | FDA-approved in specific indications |
| ROS1 fusions | NSCLC | Crizotinib, Entrectinib [11] [1] | FDA-approved in specific indications |
| FGFR2/3 fusions | Cholangiocarcinoma, Bladder Cancer | Erdafitinib, Pemigatinib [1] [7] | FDA-approved in specific indications |
| NRG1 fusions | NSCLC, Pancreatic Cancer | Afatinib (under investigation) [1] | Clinical trials |
The prognostic significance of gene fusions varies considerably across different cancer types and specific fusion events. Some fusions are associated with more aggressive disease courses and worse outcomes, while others may define cancer subtypes with more favorable prognoses [1]. For instance, in pediatric thyroid cancers, patients with RET or NTRK fusions were more likely to have metastatic disease and worse outcomes compared to those with BRAF-mutant disease [1]. In contrast, FGFR2 fusions in cholangiocarcinoma were grouped in a cluster of genetic alterations with the best prognosis [1]. This variability underscores the importance of context-specific interpretation of the prognostic implications of gene fusions.
RNA sequencing has emerged as a powerful tool for gene fusion detection due to its ability to directly detect expressed fusion transcripts. Several targeted and whole transcriptome sequencing approaches have been developed and validated for clinical use:
Targeted RNA Sequencing: The FoundationOneRNA assay is a hybrid-capture-based targeted RNA sequencing test designed to detect fusions in 318 genes and measure expression of 1521 genes. Analytical validation demonstrated a positive percent agreement (PPA) of 98.28% and negative percent agreement (NPA) of 99.89% compared to orthogonal methods [12]. The assay successfully identified a low-level BRAF fusion missed by orthogonal whole transcriptome RNA sequencing, confirming its high sensitivity [12].
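For readers unfamiliar with agreement statistics, PPA and NPA are computed from a concordance table against the orthogonal method: PPA = TP / (TP + FN) and NPA = TN / (TN + FP). The sketch below uses hypothetical counts chosen for easy arithmetic; they are not the counts behind the figures reported in [12].

```python
def percent_agreement(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Return (PPA, NPA) in percent relative to an orthogonal comparator.

    tp/fn: comparator-positive cases the assay did / did not detect
    tn/fp: comparator-negative cases the assay called negative / positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one comparator-positive and one comparator-negative case")
    ppa = 100.0 * tp / (tp + fn)
    npa = 100.0 * tn / (tn + fp)
    return ppa, npa


# Hypothetical concordance counts (illustration only)
ppa, npa = percent_agreement(tp=48, fn=2, tn=990, fp=10)
print(f"PPA = {ppa:.2f}%, NPA = {npa:.2f}%")
```

The terms PPA and NPA are used instead of sensitivity and specificity because the comparator is another assay rather than a known ground truth.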
Whole Transcriptome Sequencing (WTS): A novel WTS assay for detection of gene fusions, MET exon 14 skipping, and EGFR vIII alterations achieved 98.4% sensitivity, correctly identifying 62 out of 63 known gene fusions with 100% specificity [10]. The assay established optimal performance thresholds at DV200 ≥ 30% for RNA degradation, >100 ng RNA input, >40 copies/ng fusion expression, and >80 million mapped reads [10].
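The sample-level thresholds above lend themselves to an automated quality gate. The following is a minimal sketch, assuming the three sample-level metrics are already available as numbers; the class and function names are hypothetical. The fourth threshold (>40 copies/ng fusion expression) is not checked here because it describes the fusion transcript rather than a metric that can be measured before calling.

```python
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    dv200_percent: float          # % of RNA fragments longer than 200 nt
    rna_input_ng: float           # RNA mass used for library preparation
    mapped_reads_millions: float  # mapped reads after alignment


def qc_failures(m: SampleMetrics) -> list:
    """List the thresholds from [10] that a sample does not meet (empty if it passes)."""
    failures = []
    if m.dv200_percent < 30:          # requirement: DV200 >= 30%
        failures.append("DV200 below 30%")
    if m.rna_input_ng <= 100:         # requirement: > 100 ng
        failures.append("RNA input not above 100 ng")
    if m.mapped_reads_millions <= 80:  # requirement: > 80 million
        failures.append("mapped reads not above 80 million")
    return failures


# Hypothetical samples
good = SampleMetrics(dv200_percent=45.0, rna_input_ng=150.0, mapped_reads_millions=95.0)
degraded = SampleMetrics(dv200_percent=22.0, rna_input_ng=150.0, mapped_reads_millions=60.0)

for name, sample in [("good", good), ("degraded", degraded)]:
    problems = qc_failures(sample)
    print(name, "PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to decide whether a sample should be re-extracted, re-sequenced, or reported with a caveat.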
Integrated DNA and RNA Sequencing: Combining RNA sequencing with whole exome sequencing (WES) from a single tumor sample substantially improves detection of clinically relevant alterations. Applied to 2230 clinical tumor samples, this integrated approach enabled direct correlation of somatic alterations with gene expression, recovery of variants missed by DNA-only testing, and improved detection of gene fusions [13]. The combined assay uncovered clinically actionable alterations in 98% of cases and revealed complex genomic rearrangements that would likely have remained undetected without RNA data [13].
Sample Preparation and Quality Control
Library Preparation and Sequencing
Bioinformatic Analysis
Table 3: Essential Research Reagents for RNA Sequencing-Based Fusion Detection
| Reagent/Category | Specific Examples | Function in Workflow |
|---|---|---|
| RNA Extraction Kits | RNeasy FFPE Kit (Qiagen), AllPrep DNA/RNA FFPE Kit | Nucleic acid isolation from challenging FFPE samples [13] [10] |
| RNA Quality Control | Agilent 2100 Bioanalyzer, TapeStation 4200, Qubit assays | Assessment of RNA integrity, quantity, and suitability for sequencing [13] [10] |
| Library Prep Kits | NEBNext Ultra II Directional RNA Library Prep Kit, TruSeq stranded mRNA kit | Construction of sequencing libraries from RNA templates [13] [10] |
| rRNA Depletion | NEBNext rRNA Depletion Kit | Removal of ribosomal RNA to enrich for coding transcripts [10] |
| Target Capture | SureSelect XTHS2 RNA kit (Agilent) | Hybrid-capture enrichment for targeted RNA sequencing approaches [13] |
| Sequencing Platforms | Illumina NovaSeq 6000, Gene+ seq 2000 | High-throughput sequencing of RNA libraries [13] [10] |
Oncogenic fusion proteins drive cancer development through multiple mechanisms, most commonly by constitutively activating tyrosine kinase signaling or altering transcriptional programs. The diagram below illustrates the key signaling pathways activated by oncogenic gene fusions and their downstream effects.
The molecular mechanisms of fusion-driven oncogenesis extend beyond cell-autonomous signaling. Fusion-positive cancer cells can modulate the tumor microenvironment through paracrine signaling. For example, in rhabdomyosarcoma, PAX3-FOXO1 fusion alters exosome content, driving pro-tumorigenic signaling in recipient cells [1]. Similarly, cell-surface-bound NRG1 fusion proteins can drive paracrine signaling via RTKs on neighboring cells [1]. These microenvironmental effects highlight the broad impact of oncogenic fusions on tumor biology.
The accurate detection of clinically relevant gene fusions requires an integrated approach that combines DNA and RNA sequencing methodologies. The workflow below illustrates the complementary nature of these technologies in identifying fusion events.
The complementary nature of DNA and RNA sequencing is evidenced by multiple studies demonstrating that combined approaches significantly improve fusion detection rates. In a large pan-cancer analysis, concurrent RNA and DNA sequencing increased the detection of driver gene fusions by 21% compared with DNA sequencing alone [7]. Similarly, a targeted sequencing study found that integrated DNA and RNA testing could identify fusions that would be missed by either method alone, with DNA and RNA assays independently showing false-negative rates that were compensated for by the complementary method [11].
The comprehensive characterization of gene fusions as diagnostic, prognostic, and predictive biomarkers represents a cornerstone of modern precision oncology. The integration of RNA sequencing with DNA-based genomic profiling has demonstrated significant improvements in detection sensitivity, with combined approaches identifying actionable alterations in up to 98% of cases [13]. As the landscape of fusion-targeted therapies continues to expand, with tumor-agnostic approvals for NTRK and RET fusions and ongoing investigations for numerous other targets, the clinical imperative for robust fusion detection will only intensify.
Future developments in fusion detection technology, including long-read transcriptome sequencing and advanced computational methods like GFvoter, promise further enhancements in detection accuracy [9]. Meanwhile, the growing recognition of fusions occurring outside their classic indications highlights the importance of comprehensive molecular profiling across diverse cancer types. As biomarker-driven treatment strategies continue to evolve, the systematic implementation of integrated DNA and RNA sequencing approaches will be essential for maximizing therapeutic opportunities for cancer patients harboring these clinically significant genomic alterations.
Oncogenic gene fusions are hybrid genes arising from chromosomal rearrangements such as translocations, inversions, deletions, or tandem duplications, and represent a critical class of therapeutic targets in precision oncology [1]. These fusion events can result in constitutive activation of tyrosine kinases or aberrant expression of transcription factors, driving uncontrolled cell proliferation and survival through the disruption of key signaling pathways including RAS/MAPK, PI3K/AKT, and JAK/STAT [14] [1]. The detection of these fusions has become essential for optimal cancer diagnosis, prognosis, and treatment selection, particularly with the development of highly effective targeted therapies.
RNA sequencing has emerged as a powerful tool for fusion detection, offering several advantages over DNA-based approaches and traditional methods like FISH and IHC. While DNA-based NGS can identify genomic rearrangements, it often struggles with large intronic regions where breakpoints frequently occur. RNA-seq directly captures the expressed fusion transcript, providing functional evidence of the rearrangement and confirming the maintenance of the reading frame and integrity of kinase domains [14] [7]. The clinical utility of comprehensive fusion profiling is underscored by real-world data showing that concurrent RNA and DNA sequencing increases the detection of clinically actionable fusions by 21-127% compared to DNA sequencing alone [7].
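Because RNA-seq reveals the exact exon-exon junction, reading-frame maintenance can be checked directly. The sketch below illustrates the basic rule under simplifying assumptions: both breakpoints fall within coding sequence, and coding offsets have already been derived from a transcript annotation. The function name and the numeric offsets are hypothetical and do not correspond to real transcript coordinates.

```python
def is_in_frame(five_prime_coding_bases: int, three_prime_coding_offset: int) -> bool:
    """Check whether a fusion junction preserves the 3' partner's reading frame.

    five_prime_coding_bases: coding nucleotides retained from the 5' partner,
        counted from its start codon through the last base before the junction.
    three_prime_coding_offset: coding nucleotides of the 3' partner that lie
        upstream of the junction in its native transcript (and are therefore lost).

    The first retained base of the 3' partner keeps its native codon position
    only if both counts leave the same remainder modulo 3.
    """
    if five_prime_coding_bases < 0 or three_prime_coding_offset < 0:
        raise ValueError("coding offsets must be non-negative")
    return five_prime_coding_bases % 3 == three_prime_coding_offset % 3


# Hypothetical junctions (offsets are illustrative, not real coordinates)
print(is_in_frame(1500, 3171))  # both remainders are 0
print(is_in_frame(1501, 3171))  # remainders differ, so the frame shifts
```

A complete check would also confirm that no premature stop codon is introduced at the junction and that the kinase domain of the 3' partner lies downstream of the breakpoint; those steps are omitted here.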
The prevalence of targetable fusions varies significantly across cancer types, with some occurring at high frequencies in specific rare tumors while appearing at lower frequencies across more common malignancies. The table below summarizes the prevalence and clinical characteristics of key oncogenic fusions.
Table 1: Prevalence and Characteristics of Key Oncogenic Fusions
| Fusion | Prevalence in NSCLC | Other Cancer Types with Significant Prevalence | Common Fusion Partners | Clinical Characteristics |
|---|---|---|---|---|
| ALK | 3-8% [14] | Anaplastic Large Cell Lymphoma (50-80%) [14]; Inflammatory Myofibroblastic Tumors (50-60%) [14] | EML4, STRN, NPM1, TPM3 [14] | Oncogenic addiction to ALK signaling; responsive to TKIs [14] |
| RET | 1-2% [15] | Papillary Thyroid Carcinoma [11] | KIF5B, CCDC6 [11] | Higher proportion of never smokers (36%) and adenocarcinoma histology (88%) [15] [16] |
| ROS1 | 1-2% [17] | - | CD74, SLC34A2 [11] | - |
| NTRK1/2/3 | <1% [6] | Glioblastoma (1.91%) [6]; Small Intestine (1.32%) [6]; Secretory Breast Carcinoma (>90%) [6] | ETV6, TPM3, LMNA [6] | Tumor-agnostic FDA approvals; often mutually exclusive with other drivers [6] |
Oncogenic fusions involving receptor tyrosine kinases typically result in constitutive activation of downstream signaling cascades that promote cell survival, proliferation, and differentiation. The diagram below illustrates the common signaling pathways activated by ALK, RET, ROS1, and NTRK fusions.
Multiple RNA-based NGS approaches have been developed for fusion detection, each with distinct advantages and limitations. The selection of an appropriate methodology depends on factors including the required sensitivity, need for novel fusion discovery, cost considerations, and sample quality.
Table 2: Comparison of RNA Sequencing Methodologies for Fusion Detection
| Methodology | Principle | Advantages | Limitations | Representative Platforms |
|---|---|---|---|---|
| Whole Transcriptome Sequencing (WTS) | Sequencing of all polyadenylated RNA transcripts | Unbiased detection of known and novel fusions; comprehensive biomarker analysis [14] | Complex bioinformatic analysis; low sensitivity for fusions with low expression [14] | Standard RNA-seq protocols |
| Hybrid-Capture-Based RNA Sequencing | Solution-based hybridization to target transcripts using bait panels | High sensitivity for known fusions; robust performance with FFPE samples [6] [18] | Limited to pre-designed targets; may miss novel partners outside panel [6] | Tempus xR [7]; Illumina RNA Panels [6] |
| Amplicon-Based RNA Sequencing | Multiplex PCR amplification of target regions | High sensitivity for known targets; cost-effective [14] | Limited to predefined targets; false positives from primer artifacts [18] | TruSight RNA Fusion Panel [14]; OncoFu Elite [14] |
| Anchored Multiplex PCR | Gene-specific priming combined with universal adapters | Ability to detect fusions with unknown partners; requires less input RNA [14] | Limited by primer design; may miss some fusion variants | FusionPlex [14] |
The most comprehensive approach for fusion detection involves parallel sequencing of both DNA and RNA from tumor samples. This integrated workflow maximizes sensitivity and specificity while providing complementary information about genomic rearrangements and their functional transcriptional consequences.
Principle: This protocol utilizes biotinylated oligonucleotide probes to enrich for target RNA sequences prior to sequencing, enabling highly sensitive detection of fusion transcripts even in degraded FFPE samples [6].
Sample Requirements:
Procedure:
RNA Quality Control
Library Preparation
Hybrid Capture Enrichment
Sequencing
Bioinformatic Analysis
Quality Control Metrics:
Table 3: Essential Research Reagents for Fusion Detection Studies
| Reagent/Category | Specific Examples | Function | Application Notes |
|---|---|---|---|
| NGS Library Prep Kits | Illumina TruSight RNA Fusion Panel; Archer FusionPlex | Target enrichment and library preparation for RNA sequencing | Archer uses anchored multiplex PCR; Illumina uses hybrid capture [14] |
| Reference Standards | GeneWell Fusion Reference Standards (contain 10 fusions across ALK, ROS1, RET, NTRK) [11] | Assay validation and quality control | Essential for establishing limit of detection (5% for DNA, 250-400 copies for RNA) [11] |
| Hybrid Capture Panels | Tempus xR (whole transcriptome); Labcorp CGP Panel | Comprehensive fusion detection via RNA baits | Labcorp panel identified 73 NTRK fusions in 19,591 samples (0.35% prevalence) [6] |
| Bioinformatic Tools | STAR-Fusion; Arriba; AGFusion; FusionCatcher | Fusion detection from RNA-seq data | Ensemble approaches combining multiple algorithms improve accuracy [7] |
| Validation Reagents | FISH probes; IHC antibodies; Sanger sequencing | Orthogonal validation of NGS findings | Critical for confirming novel fusions and borderline positive cases [11] |
The identification of oncogenic fusions has led to the development of highly effective targeted therapies, with several receiving FDA approval in both tumor-agnostic and indication-specific contexts.
Table 4: Approved Targeted Therapies for Oncogenic Fusions
| Fusion | Approved Therapies | Approval Context | Clinical Response |
|---|---|---|---|
| ALK | Crizotinib, Ceritinib, Alectinib [11] | NSCLC-specific | Standard care in ALK+ NSCLC [14] |
| RET | Selpercatinib, Pralsetinib [19] [15] | Tumor-agnostic for RET fusions | Pralsetinib: ORR 70.3%, mPFS 13.1 mos, mOS 44.3 mos [19] |
| NTRK | Larotrectinib, Entrectinib, Repotrectinib [6] | Tumor-agnostic for NTRK fusions | Larotrectinib: ORR 79% [6]; Entrectinib: ORR 61.2% [6] |
| ROS1 | Crizotinib, Entrectinib [11] | NSCLC-specific | - |
Despite initial efficacy, resistance to targeted therapies inevitably develops through multiple mechanisms. Understanding these pathways is essential for developing sequential treatment strategies.
Resistance Mechanisms:
Emerging Therapeutic Strategies:
Artificial Intelligence in Fusion Prediction: Deep learning models applied to H&E-stained whole slide images can predict ALK and ROS1 fusions with ROC AUCs of 0.84-0.85, serving as potential prescreening tools before confirmatory molecular testing [17]. These vision transformer models utilize a two-step training approach, first learning general cancer morphology patterns before specializing in specific fusion prediction.
Liquid Biopsy Applications: Circulating tumor DNA and RNA analyses are emerging as non-invasive methods for fusion detection and resistance monitoring, and are particularly valuable when tissue biopsies are impractical.
The future of fusion-targeted therapy includes several promising directions:
The comprehensive detection of oncogenic fusions through RNA sequencing represents a critical component of precision oncology. The integration of multiple testing modalities, particularly combined DNA and RNA sequencing, significantly enhances detection rates of these therapeutic targets. As the field advances, the ongoing development of more sensitive detection methods, novel therapeutic agents, and sophisticated resistance-management strategies will continue to improve outcomes for patients with fusion-driven cancers. Research and clinical practice must prioritize comprehensive molecular profiling to fully realize the potential of targeted therapies across the spectrum of oncogenic fusions.
The identification of oncogenic gene fusions is critical for cancer diagnosis, prognosis, and targeted treatment selection. For years, fluorescence in situ hybridization (FISH), immunohistochemistry (IHC), and reverse transcription polymerase chain reaction (RT-PCR) have served as cornerstone techniques in clinical molecular pathology. However, as our understanding of cancer genomics expands, significant limitations of these traditional methods have emerged. This application note systematically details the technical and clinical constraints of FISH, IHC, and RT-PCR relative to RNA sequencing, supported by quantitative performance data and experimental protocols. The transition toward comprehensive genomic approaches like RNA-based next-generation sequencing (NGS) is becoming increasingly necessary to fully realize the potential of precision oncology.
The table below summarizes the key characteristics and limitations of FISH, IHC, and RT-PCR based on current clinical studies.
Table 1: Comparative Analysis of Traditional Gene Fusion Detection Methods
| Method | Typical Applications | Key Advantages | Major Limitations | Reported Sensitivity | Reported Specificity |
|---|---|---|---|---|---|
| FISH | Detection of ALK, ROS1, RET rearrangements in NSCLC [20] | Single-cell resolution; partner-agnostic [21] | Limited multiplexing capability; subjective interpretation; inability to identify fusion partners or breakpoints [21] [11] | ~70-99% (varies by gene and platform) [22] | ~80-99% (varies by gene and platform) [22] |
| IHC | Protein expression analysis; ALK, HER2 status detection [20] [23] | Low cost; rapid turnaround; preserves tissue architecture [22] | Variable sensitivity/specificity dependent on antibody and fusion partner; semi-quantitative [21] | ~60% for RET [21]; 97% for HER2 (AI-assisted) [23] | 40-85% for RET [21]; 82% for HER2 (AI-assisted) [23] |
| RT-PCR | Known fusion variant detection (e.g., EML4-ALK) [22] | Rapid; high sensitivity for known fusions; works with limited tissue [22] | Cannot detect novel fusion partners; susceptible to RNA degradation [11] [24] | 100% (ALK vs FISH/Sequencing) [22] | 94% (ALK vs FISH/Sequencing) [22] |
Experimental Protocol for FISH-Based Fusion Detection:
Key Limitations: FISH performs particularly poorly for fusions arising from pericentric inversions in which the partner genes lie close together (e.g., KIF5B and RET on chromosome 10), because the separation of break-apart probe signals is small [21]. The method cannot identify the specific fusion partner, which has emerging clinical relevance for predicting treatment response [21]. Furthermore, FISH requires specialized expertise for interpretation, lacks standardized cutoff criteria across laboratories, and may yield positive results that are not confirmed at the transcript level [21].
Experimental Protocol for IHC-Based Fusion Protein Detection:
Key Limitations: IHC sensitivity is highly dependent on the specific fusion partner. For RET fusions, sensitivity ranges from 100% for KIF5B-RET to approximately 50% for NCOA4-RET [21]. The method suffers from significant inter-observer variability, with studies showing substantial discordance between laboratories due to lack of standardization in reagents and training [22]. While artificial intelligence approaches are emerging to address these limitations, they require further validation before widespread clinical implementation [23].
Experimental Protocol for RT-PCR Fusion Detection:
Key Limitations: RT-PCR is fundamentally limited to detecting known fusion variants with predefined breakpoints [11]. The requirement for intact, high-quality RNA presents significant challenges with FFPE specimens, where RNA is often degraded [24]. The method's sensitivity decreases dramatically when fusion transcripts are expressed at low levels or when the RNA input is below optimal levels (typically <250-400 copies/100 ng) [11].
The diagram below illustrates the fundamental difference in what each traditional detection method actually measures in the central dogma of molecular biology.
Diagram Title: What Traditional Methods Detect in Molecular Pathology
Table 2: Key Research Reagents for Traditional Fusion Detection Methods
| Reagent/Category | Specific Examples | Function and Application Notes |
|---|---|---|
| FISH Probes | Vysis ALK Break Apart FISH Probe Kit (Abbott Molecular) [22] | Designed to detect ALK rearrangements regardless of partner; requires fluorescence microscopy and specific expertise for interpretation |
| IHC Antibodies | Ventana ALK (D5F3) CDx Assay; Novocastra 5A4 (Leica) [22] | Clone D5F3 is FDA-approved companion diagnostic; 5A4 is widely validated; both require standardized antigen retrieval and detection systems |
| RNA Extraction Kits | RNeasy FFPE Kit (Qiagen) [24] | Critical for RT-PCR and RNA-based methods; includes DNase treatment step to remove genomic DNA contamination |
| RT-PCR Kits | ALK RGQ RT-PCR Kit (QIAGEN) [22] | Single-tube quantitative real-time PCR assay for automated ALK expression interpretation; requires high-quality RNA input |
| Control Materials | Commercial fusion reference standards (e.g., GeneWell) [11] | Contain spiked-in fusion transcripts (EML4::ALK, CD74::ROS1, CCDC6::RET) for assay validation and quality control |
The limitations of traditional gene fusion detection methods - including limited multiplexing capability, inability to detect novel fusions, variable sensitivity and specificity, and technical challenges with sample quality - present significant constraints in the era of precision oncology. While FISH, IHC, and RT-PCR remain valuable for specific clinical scenarios, their individual shortcomings highlight the necessity for more comprehensive approaches. RNA-based next-generation sequencing emerges as a powerful solution that overcomes these limitations, enabling simultaneous detection of known and novel fusions across multiple genes with high sensitivity and specificity. The integration of advanced computational methods and multi-analyte approaches will further enhance the detection of clinically relevant gene fusions, ultimately improving patient outcomes through more accurate diagnosis and targeted treatment selection.
Gene fusions represent a critical class of molecular alterations in cancer, serving as diagnostic, prognostic, and predictive biomarkers for targeted therapies. The detection of these transcripts in clinical specimens, particularly formalin-fixed paraffin-embedded (FFPE) tissue, presents significant technical challenges. Targeted RNA sequencing (RNA-Seq) has emerged as a powerful solution, offering enhanced sensitivity and specificity for fusion detection compared to whole transcriptome approaches. This application note details the fundamental design principles, experimental protocols, and validation frameworks for developing targeted RNA-Seq panels specifically optimized for capturing fusion transcripts in cancer research, providing researchers and drug development professionals with a comprehensive guide for implementing this technology in both basic and translational settings.
Gene fusions are hybrid genes formed through chromosomal rearrangements including translocations, deletions, inversions, or duplications. These molecular events can result in the expression of chimeric proteins with oncogenic properties or place proto-oncogenes under the control of strong promoter elements, driving tumorigenesis. Approximately one-third of soft tissue tumors and a wide array of other solid tumors harbor clinically relevant gene fusions [11]. Notably, fusions involving genes such as ALK, ROS1, RET, and NTRK family members have been well-characterized and serve as biomarkers for matched targeted therapies that have demonstrated remarkable clinical efficacy [11] [25].
The accurate detection of fusion transcripts is therefore paramount in modern cancer research and precision oncology. While traditional methods like fluorescence in situ hybridization (FISH) and immunohistochemistry (IHC) remain in use, they multiplex poorly, which prevents simultaneous interrogation of multiple fusion genes [11] [26]. Next-generation sequencing (NGS) technologies, particularly RNA-Seq, enable comprehensive profiling of fusion transcripts. However, the low quality and fragmented nature of RNA extracted from FFPE samples, the standard specimen type in pathology, pose a substantial challenge for sequencing assays [11] [26] [27]. Targeted RNA-Seq addresses these limitations by focusing sequencing power on specific genes of interest, thereby improving sensitivity, reducing costs, and enabling robust analysis of the degraded samples typical in clinical research.
Targeted RNA-Seq employs two primary strategies to enrich for specific RNA transcripts: amplicon sequencing and hybridization capture [28].
The choice between these methods depends on the research application, sample quality, and the desired balance between sensitivity, specificity, and coverage.
The selection of genes to include in a panel is driven by the research context. A well-designed panel should include:
FFPE processing causes RNA fragmentation and chemical modification, making downstream analysis difficult [11] [26]. Key design adaptations include:
The following section outlines a standard workflow for fusion detection using a targeted RNA-Seq approach, from sample preparation to data analysis.
Table 1: Key Research Reagent Solutions for Library Preparation
| Reagent / Kit | Function | Application Note |
|---|---|---|
| FusionPlex Solid Tumor Kit (ArcherDX) | Multiplex PCR-based library preparation for fusion detection. | Validated for FFPE samples; includes panels for sarcoma and carcinoma [26]. |
| Biotin-labeled Probes | Hybridization and capture of target RNA sequences. | Used in capture-based targeted sequencing; require careful design for specificity [28]. |
| Streptavidin Magnetic Beads | Enrichment of probe-bound target RNA fragments. | Essential for post-hybridization wash and capture in hybridization-based methods [28]. |
| Reverse Transcriptase | Synthesis of complementary DNA (cDNA) from RNA templates. | First step in converting captured RNA into a sequencer-compatible library. |
| Platform-specific Adapters | Enable binding of library fragments to sequencing flow cells. | Contain indices for sample multiplexing. |
The library preparation process varies by method but generally follows these steps:
The following workflow diagram illustrates the complete process from sample to analysis:
Robust validation is critical for deploying a targeted RNA-Seq assay in a research setting. Performance is measured using well-characterized reference standards and clinical samples.
Table 2: Representative Performance Metrics from Validation Studies
| Validation Parameter | Representative Data | Experimental Details |
|---|---|---|
| Limit of Detection (LOD) - DNA | Mutational abundance down to 5% [11]. | Serial dilution experiments with fusion-spiked reference standards. |
| Limit of Detection (LOD) - RNA | 250–400 copies/100 ng input RNA [11]. | Serial dilution of RNA from positive cell lines (e.g., H2228 for EML4-ALK). |
| Sensitivity | 100% (28/28 known positive clinical samples) [11]. | Comparison of assay results against known fusion status from previous tests (NGS or FISH). |
| Specificity | 96.9%-100% (after orthogonal resolution of discordant results) [11] [27]. | Testing of fusion-negative samples and confirmation of discordant results by orthogonal methods (e.g., Sanger sequencing). |
| Reproducibility | 100% concordance in intra-run and inter-run replicates [11]. | Testing of multiple replicates (n=3) within a single run and across different sequencing runs. |
The choice of fusion-calling algorithm significantly impacts results. One study on 190 FFPE samples found that while the ArcherDX Analysis Suite (ADx) demonstrated high sensitivity, the open-source tools Arriba (ARR) and STAR-Fusion (SFU) showed lower sensitivity but could provide valuable orthogonal support, especially for low-quality data [26]. Combining multiple callers can therefore improve the robustness of fusion detection.
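As a minimal sketch of how calls from several tools might be combined, the snippet below keeps any fusion reported by at least a chosen number of callers. The function names, the `5'::3'` key format, and the example calls are assumptions made for illustration; they are not the output formats of the tools named above, and the threshold would need to be tuned against reference samples.

```python
def normalize(call):
    """Return a caller-independent key such as 'EML4::ALK' (5' partner first)."""
    five_prime, three_prime = call
    return f"{five_prime.upper()}::{three_prime.upper()}"


def consensus_calls(calls_by_tool, min_support=2):
    """Map each fusion reported by >= min_support tools to the supporting tools."""
    support = {}
    for tool, calls in calls_by_tool.items():
        # A set removes duplicate reports of the same fusion within one tool.
        for key in {normalize(c) for c in calls}:
            support.setdefault(key, set()).add(tool)
    return {k: sorted(v) for k, v in support.items() if len(v) >= min_support}


# Illustrative calls only; these are not data from the cited study.
calls_by_tool = {
    "ADx": [("EML4", "ALK"), ("KIF5B", "RET"), ("CD74", "ROS1")],
    "ARR": [("EML4", "ALK"), ("KIF5B", "RET")],
    "SFU": [("eml4", "alk")],
}

for fusion, tools in sorted(consensus_calls(calls_by_tool).items()):
    print(fusion, "supported by", ", ".join(tools))
```

Setting `min_support=1` returns the union of all calls, which favors sensitivity; raising it favors specificity. Calls supported by a single tool are natural candidates for orthogonal confirmation rather than automatic rejection.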
While RNA-Seq directly captures expressed fusion transcripts, DNA-based NGS can identify genomic rearrangements that may not be transcribed or may be difficult to capture due to RNA degradation. An integrated approach that simultaneously uses DNA and RNA-based NGS maximizes detection sensitivity [11]. Studies have shown that DNA and RNA results can complement each other, with some fusions being detected only at one level [11]. For instance, DNA-based assays may miss fusions due to large intronic regions or complex rearrangements, while RNA-based assays may miss fusions if the transcript is expressed at very low levels or is unstable.
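The complementary relationship described above can be sketched with simple set operations. The helper name and the example fusions are hypothetical; the point is only that the combined call set is the union of both assays, and that DNA-only and RNA-only events are worth reporting separately.

```python
def partition_calls(dna_calls, rna_calls):
    """Split fusion calls into concordant, DNA-only, and RNA-only groups."""
    dna, rna = set(dna_calls), set(rna_calls)
    return {
        "both": sorted(dna & rna),       # detected at both levels
        "dna_only": sorted(dna - rna),   # rearrangement seen, transcript not detected
        "rna_only": sorted(rna - dna),   # transcript seen, breakpoint missed in DNA
        "combined": sorted(dna | rna),   # what an integrated assay would report
    }


# Hypothetical calls for a single cohort, for illustration only.
dna_calls = ["EML4::ALK", "KIF5B::RET", "CCDC6::RET"]
rna_calls = ["EML4::ALK", "KIF5B::RET", "ETV6::NTRK3"]

for group, fusions in partition_calls(dna_calls, rna_calls).items():
    print(f"{group}: {fusions}")
```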
The following diagram conceptualizes this complementary relationship:
Targeted RNA-Seq represents a highly sensitive, specific, and cost-effective methodology for the detection of clinically relevant gene fusions in cancer research. Its design, optimized for challenging, real-world samples like FFPE tissue, makes it particularly suited for both retrospective and prospective studies. Success hinges on several key factors: prudent panel design encompassing actionable genes, a robust laboratory workflow validated for degraded RNA, and a bioinformatics pipeline that leverages complementary algorithms to minimize false positives and negatives. As the landscape of therapeutic targets continues to expand, the implementation of rigorously designed and validated targeted RNA-Seq panels will be indispensable for unraveling the molecular drivers of cancer and advancing drug development.
In precision oncology, DNA sequencing reveals the genetic blueprint of a tumor, but it cannot determine which mutations are actively transcribed into messenger RNA and are therefore more likely to produce functional proteins that drive cancer progression. This fundamental limitation creates a "DNA-to-protein divide" in clinical diagnostics [31]. RNA sequencing (RNA-Seq) bridges this critical gap by providing a snapshot of the actively expressed mutational landscape, enabling more accurate cancer diagnosis, prognosis, and therapeutic targeting [32] [31]. This Application Note details experimental and bioinformatic protocols for using RNA-Seq to identify expressed mutations, with emphasis on clinically-actionable gene fusions in cancer research.
RNA-Seq demonstrates high sensitivity and specificity for detecting expressed mutations, particularly gene fusions. The following table summarizes key performance metrics from recent studies:
Table 1: Performance Metrics of RNA-Seq Assays for Gene Fusion Detection
| Metric | Performance Value | Experimental Context | Citation |
|---|---|---|---|
| Sensitivity | 98.4% (62/63 known fusions) | Whole Transcriptome Sequencing (WTS) assay on clinical samples | [24] |
| Specificity | 100% (0 false positives in 21 negative samples) | Same WTS assay on fusion-negative samples | [24] |
| Precision (Average) | 58.6% | GFvoter performance across multiple datasets | [9] |
| Advantage over DNA-Seq | Identifies 18% additional somatic SNVs in lung cancer | Comparative analysis of paired RNA-seq and DNA-seq data | [31] |
These performance characteristics make RNA-Seq particularly valuable for clinical applications where detecting expressed mutations directly influences treatment decisions. For instance, one study found that nearly one-fifth of somatic single nucleotide variants (SNVs) detected by DNA sequencing were not transcribed, suggesting they may have limited clinical relevance for targeted therapies [31].
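The sensitivity and specificity figures in Table 1 follow directly from the reported counts. The short calculation below reproduces them, assuming that the one missed fusion is the only false negative and that all 21 negative samples were called negative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive samples that the assay calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative samples that the assay calls negative."""
    return true_neg / (true_neg + false_pos)


# Counts taken from Table 1: 62 of 63 known fusions detected,
# and 0 false positives among 21 fusion-negative samples.
sens = sensitivity(true_pos=62, false_neg=1)
spec = specificity(true_neg=21, false_pos=0)

print(f"Sensitivity: {sens:.1%}")
print(f"Specificity: {spec:.1%}")
```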
Proper sample preparation is critical for successful RNA-Seq analysis. The following protocol outlines key steps for processing cancer specimens:
The computational analysis of RNA-Seq data involves multiple steps to identify expressed mutations accurately:
Diagram: RNA-Seq Fusion Detection Workflow
RNA-Seq provides critical functional validation of DNA-identified mutations, with several key applications in oncology:
Therapeutic Target Prioritization: RNA-Seq confirms whether mutations identified by DNA sequencing are actually expressed, helping prioritize targets with clinical relevance. Studies show RNA-Seq uniquely identifies variants with significant pathological relevance that were missed by DNA-Seq alone [31].
Gene Fusion Detection: Gene fusions are important drivers of cancer and serve as diagnostic biomarkers and therapeutic targets. RNA-Seq enables unbiased detection of both known and novel fusion events within any expressed gene [9] [24].
Comprehensive Mutation Profiling: Beyond fusions, RNA-Seq detects alternative splicing events, exon skipping variants (e.g., MET exon 14 skipping in NSCLC), and expressed single nucleotide variants that may impact protein function [24].
Biomarker Discovery: Expression patterns from RNA-Seq can classify cancer subtypes, predict treatment response, and identify resistance mechanisms, contributing to more personalized treatment approaches [32].
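For exon-skipping events such as MET exon 14 skipping, one common way to quantify the event from RNA-Seq is to compare junction-spanning reads that support inclusion with those that support skipping. The sketch below is an illustrative junction-count calculation, not the method of the cited studies; the function name and the read counts are hypothetical.

```python
def exon_skipping_fraction(inc_upstream, inc_downstream, skip):
    """Fraction of junction evidence supporting exon skipping.

    inc_upstream   -- reads spanning the upstream-exon/target-exon junction
    inc_downstream -- reads spanning the target-exon/downstream-exon junction
    skip           -- reads spanning the upstream-exon/downstream-exon junction
    Returns None when no junction reads are available.
    """
    # Inclusion produces two junctions per transcript, so average them.
    inclusion = (inc_upstream + inc_downstream) / 2
    total = inclusion + skip
    if total == 0:
        return None
    return skip / total


# Hypothetical counts for MET exons 13-14, 14-15, and 13-15 junctions.
fraction = exon_skipping_fraction(inc_upstream=40, inc_downstream=60, skip=50)
print(f"Skipping fraction: {fraction:.2f}")
```

Any threshold for calling a sample positive would need to be established during assay validation, since the fraction also depends on tumor content and coverage.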
Table 2: Clinically-Actionable Mutations Detectable by RNA-Seq
| Mutation Type | Cancer Examples | Clinical Significance | Citation |
|---|---|---|---|
| Gene Fusions (ALK, ROS1, RET, NTRK) | Non-small cell lung cancer (NSCLC) | FDA-approved targeted therapies available | [24] |
| Exon Skipping (MET exon 14) | Lung adenocarcinoma, lung sarcomatoid carcinoma | Emerging therapeutic target; responds to MET inhibitors | [24] |
| Expressed SNVs | Various solid tumors | Indicates clinically relevant mutations; 18% of DNA-level SNVs not transcribed | [31] |
Table 3: Essential Reagents and Tools for RNA-Seq Mutation Detection
| Category | Specific Product/Tool | Application/Function | Citation |
|---|---|---|---|
| RNA Extraction | RNeasy FFPE Kit (Qiagen) | RNA isolation from FFPE tissue samples | [24] |
| Library Prep | NEBNext Ultra II Directional RNA Library Prep Kit | cDNA synthesis and library construction | [24] |
| rRNA Depletion | NEBNext rRNA Depletion Kit | Remove ribosomal RNA to enrich coding transcripts | [24] |
| Sequencing Platforms | Illumina NovaSeq, PacBio SMRT, Oxford Nanopore | High-throughput sequencing; long-read technologies ideal for isoform detection | [35] [36] |
| Alignment Tools | STAR, GSNAP, TopHat2, Minimap2 | "Splicing-aware" alignment of RNA-Seq reads | [33] [9] |
| Fusion Detection | GFvoter, LongGF, JAFFAL, FusionSeeker | Specialized algorithms for identifying fusion transcripts | [9] |
| Differential Expression | edgeR, limma, DESeq2 | Statistical analysis of gene expression changes | [34] |
Diagram: Bridging the DNA-to-Protein Divide with RNA-Seq
RNA sequencing provides a critical functional bridge between DNA-level mutations and their protein products, enabling more accurate cancer diagnostics and therapeutic decision-making. By confirming which mutations are actively expressed, RNA-Seq addresses fundamental limitations of DNA-only approaches in precision oncology. The experimental and computational protocols detailed in this Application Note provide researchers with robust methods for detecting expressed mutations, particularly gene fusions, across various cancer types. As sequencing technologies and bioinformatic tools continue to advance, RNA-Seq is poised to play an increasingly central role in cancer research and clinical oncology, ultimately improving patient outcomes through more precise molecular profiling.
Oncogenic gene fusions are hybrid genes resulting from chromosomal rearrangements such as translocations, inversions, deletions, or tandem duplications [1]. These fusions act as powerful drivers in numerous cancers, with products often constituting constitutively active tyrosine kinases or overexpressed transcription factors that lead to uncontrolled cell growth and proliferation [1]. The reliable detection of these fusions is critical for personalized cancer therapy, especially with the advent of targeted treatments like TRK inhibitors for NTRK fusions and selective RET inhibitors for RET fusions, which can produce profound responses in patients whose cancers harbor these alterations [37] [1].
While traditional methods like fluorescence in situ hybridization (FISH) and reverse transcription-polymerase chain reaction (RT-PCR) have proven utility, they possess inherent limitations in identifying novel and noncanonical fusion genes [38]. Next-generation sequencing (NGS) technologies have therefore become the cornerstone of modern molecular diagnostics. DNA-based NGS (DNA-seq) is adept at identifying genomic breakpoints but can miss fusions involving large introns or complex rearrangements [38]. RNA-based NGS (RNA-seq) directly captures expressed fusion transcripts, offering enhanced sensitivity and the ability to detect unknown partners, but its effectiveness depends on RNA quality and expression levels [37] [38]. An integrated DNA and RNA sequencing approach overcomes the limitations of either method alone, ensuring comprehensive detection of both known and novel oncogenic fusions for optimal therapeutic decision-making.
The following diagram outlines the comprehensive workflow for integrated DNA and RNA sequencing, from sample preparation to final analysis.
The pre-analytical phase is critical for assay success, particularly for labile RNA.
DNA-seq identifies genomic rearrangements at the DNA level.
RNA-seq directly identifies expressed fusion transcripts, overcoming limitations of DNA-seq.
A systematic comparison of DNA-seq, RNA-seq, and FISH for detecting RET fusions in early-stage NSCLC demonstrates the strengths of an integrated approach [38].
Table 1: Concordance Rates Between Detection Methods for RET Fusions
| Comparison | Concordance Rate | Key Findings |
|---|---|---|
| DNA-seq vs. RNA-seq | 92.3% (36/39 cases) | High concordance, but some fusions missed by each method individually. |
| RNA-seq vs. FISH | 84.6% (33/39 cases) | Targeted RNA-seq identified 5 additional RET+ cases missed by WTS. |
| DNA-seq vs. FISH | 82.5% (33/40 cases) | FISH provides visual confirmation but may miss non-canonical fusions. |
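The concordance rates in Table 1 are simple proportions of cases in which two methods agree. The snippet below recomputes them from the case counts given in the table; the function name is an illustrative choice.

```python
def concordance(agreeing_cases, total_cases):
    """Percentage of cases in which two methods give the same result."""
    if total_cases == 0:
        raise ValueError("total_cases must be greater than zero")
    return 100 * agreeing_cases / total_cases


# (agreeing cases, total cases) as reported in Table 1.
comparisons = {
    "DNA-seq vs. RNA-seq": (36, 39),
    "RNA-seq vs. FISH": (33, 39),
    "DNA-seq vs. FISH": (33, 40),
}

rates = {name: concordance(a, n) for name, (a, n) in comparisons.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```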
The same study highlighted critical performance differences between RNA-seq approaches [38].
Table 2: Detection Performance of RNA-seq Methods
| Method | Detection Rate | Sensitivity | Advantages |
|---|---|---|---|
| Whole-Transcriptome Sequencing (WTS) | 79.5% (31/39 cases) | Moderate | Unbiased detection of known and novel fusions. |
| Targeted RNA-seq | Higher than WTS | Enhanced | Identified 5 additional RET+ cases missed by WTS; optimal for low-quality RNA. |
Table 3: Essential Research Reagents and Kits for Integrated Fusion Detection
| Item | Function | Example Product |
|---|---|---|
| FFPE DNA/RNA Extraction Kit | Simultaneous co-extraction of nucleic acids from archived tumor tissue. | QIAamp DNA FFPE Tissue Kit [38] |
| DNA Library Prep Kit | Preparation of sequencing-ready libraries from gDNA. | KAPA Hyper Prep Kit [38] |
| RNA Library Prep Kit | Construction of strand-specific RNA-seq libraries. | NEBNext Ultra DNA Library Prep Kit for Illumina [33] |
| Target Enrichment Panel | Hybrid-capture or amplicon-based panels for focused sequencing. | GeneseeqPrime 425-gene panel [38] |
| NGS Platform | High-throughput sequencing of prepared libraries. | Illumina HiSeq 4000 [38] |
| Alignment & Analysis Software | Processing of raw sequence data, alignment, and variant calling. | BWA-MEM, GATK, Delly [38] |
Oncogenic gene fusions activate key signaling pathways that drive tumor growth and survival. The diagram below illustrates the primary signaling cascades dysregulated by kinase fusion proteins.
These constitutive signals promote tumorigenesis through multiple mechanisms [1]:
Targeted therapies, such as small-molecule inhibitors, directly bind to and inhibit the constitutively active kinase domain of the fusion protein, thereby blocking these oncogenic signals [1].
The integrated DNA and RNA NGS assay provides a comprehensive diagnostic solution for detecting oncogenic gene fusions. This approach leverages the unique strengths of each method: DNA-seq reliably identifies genomic breakpoints and structural rearrangements, while RNA-seq confirms the expression of functional fusion transcripts and detects events missed by DNA-based methods due to large introns or complex rearrangements [38].
The clinical implications of this integrated approach are profound. It ensures that patients with rare or non-canonical fusions are identified, making them eligible for highly effective targeted therapies. This is the cornerstone of tumor-agnostic treatment, where the molecular alteration, rather than the tumor's tissue of origin, dictates the therapeutic strategy [1]. Furthermore, characterizing the specific fusion partner and structure can provide insights into clinical behavior and potential resistance mechanisms.
For implementing this assay, several best practices are recommended. Laboratories should standardize pre-analytical procedures, including tumor enrichment via macrodissection and optimal handling of FFPE tissue to preserve RNA integrity [37]. A validated bioinformatics pipeline is essential for accurate fusion calling from both DNA and RNA data. Finally, maintaining flexibility in RNA input and having protocols for complementary testing (e.g., FISH or IHC) for challenging specimens is crucial for maximizing clinical utility [37] [38].
In conclusion, an integrated DNA and RNA NGS assay represents a superior methodology for comprehensive fusion detection in oncology. It aligns with the principles of precision medicine by ensuring that all patients with actionable oncogenic fusions are identified, thereby optimizing their treatment outcomes and paving the way for continued advances in cancer therapy.
Long-read transcriptome sequencing has emerged as a transformative technology for detecting complex structural variants and gene fusions in cancer research. Unlike short-read sequencing, long-read technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) generate reads spanning thousands to millions of bases, enabling direct observation of full-length transcripts and complex rearrangement events. This capability is particularly valuable for identifying driver mutations and therapeutic targets in oncology, where gene fusions play a significant role in approximately 16.5% of cancer cases. This application note details experimental protocols, benchmarking data, and analytical workflows for implementing long-read sequencing in cancer genomics research, providing researchers with practical frameworks for detecting clinically relevant structural variants.
Long-read sequencing technologies have revolutionized structural variant detection by overcoming inherent limitations of short-read approaches. The two leading platforms, PacBio and Oxford Nanopore, employ fundamentally different methodologies each with distinct advantages for transcriptome analysis:
The key advantage of both platforms is their ability to span repetitive regions and complex genomic architectures that typically confound short-read technologies, enabling more comprehensive characterization of the cancer transcriptome [40].
Table 1: Comparison of Long-Read Sequencing Platforms
| Feature | PacBio HiFi | Oxford Nanopore (ONT) |
|---|---|---|
| Read Length | 10-25 kb (HiFi reads) | Up to >1 Mb (typical reads 20-100 kb) |
| Accuracy | >99.9% (HiFi consensus) | ~98-99.5% (Q20+ with recent improvements) |
| Throughput | Moderate–High (up to ~160 Gb/run Sequel IIe) | High (varies by device; PromethION > Tb) |
| Instrument Cost | High (Sequel IIe system) | Lower (MinION, GridION, scalable options) |
| Best Applications | Clinical-grade variant detection, haplotype phasing | Large SV detection, point-of-care, real-time analysis |
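The accuracy figures in Table 1 can be related to Phred quality scores through the standard definition Q = -10 log10(error probability). The conversion below is a small worked example of that formula; the function names are illustrative.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred quality score."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(q):
    """Convert a Phred quality score back to per-base accuracy."""
    return 1 - 10 ** (-q / 10)


for acc in (0.98, 0.99, 0.995, 0.999):
    print(f"accuracy {acc:.3f} -> Q{accuracy_to_phred(acc):.1f}")
```

Under this definition, 99% accuracy corresponds to Q20 and 99.9% to Q30, which is why each additional "nine" of accuracy matters for resolving breakpoints at base-level precision.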
The analysis of long-read transcriptome sequencing data requires specialized bioinformatics workflows tailored to the unique characteristics of long reads. A robust analytical pipeline includes the following critical steps:
Long reads must be aligned to a reference genome using specialized aligners that accommodate their length and error profiles. Minimap2 is widely used for this purpose, though alternatives like Winnowmap2 and ngmlr offer complementary strengths [9]. Quality control should assess read length distribution, base-calling quality scores, and adapter contamination.
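A minimal sketch of how a spliced long-read alignment command for Minimap2 might be assembled is shown below. The `splice` and `splice:hq` presets and the `-uf` option are documented Minimap2 options; the platform labels, file names, and thread count are placeholders assumed for this example. The function only builds the command and does not run the aligner.

```python
import shlex

# Preset choices per read type. "-uf" assumes reads are already oriented
# in the transcript strand (as with processed PacBio full-length cDNA reads).
PRESETS = {
    "ont_cdna": ["-ax", "splice"],
    "pacbio_hifi": ["-ax", "splice:hq", "-uf"],
}


def build_minimap2_cmd(reference, reads, platform, threads=8):
    """Return the minimap2 argument list for spliced alignment (SAM to stdout)."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform: {platform!r}")
    return ["minimap2", *PRESETS[platform], "-t", str(threads), reference, reads]


# Placeholder file names for illustration.
cmd = build_minimap2_cmd("GRCh38.fa", "tumor_reads.fastq.gz", "ont_cdna")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` with standard output redirected to a SAM file. Keeping command construction separate from execution makes the chosen parameters easy to log alongside the results.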
Specialized algorithms identify potential gene fusions by detecting reads that align to multiple genomic locations. The GFvoter pipeline employs a multivoting strategy that integrates multiple alignment and fusion detection tools to improve accuracy [9]. This approach demonstrates how combining evidence from multiple methods reduces false positives while maintaining sensitivity.
Detected variants require annotation to determine their potential functional impact. This includes assessing whether fusions preserve open reading frames, affect functional domains, or disrupt regulatory elements. Integration with cancer gene databases helps prioritize clinically relevant events.
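Whether a fusion preserves the open reading frame can be checked with modular arithmetic on the coding-sequence (CDS) coordinates of the breakpoint. The sketch below is deliberately simplified: it assumes both breakpoints fall within coding sequence and ignores UTR breakpoints, premature stop codons, and alternative isoforms. The function and parameter names are hypothetical.

```python
def is_in_frame(five_prime_cds_bases, three_prime_cds_start):
    """Return True if the 3' partner keeps its native reading frame.

    five_prime_cds_bases  -- number of coding bases retained from the 5' partner
    three_prime_cds_start -- 1-based CDS position of the first base retained
                             from the 3' partner
    """
    # Codon position (0, 1, or 2) that the first 3' base occupies in the fusion.
    fusion_phase = five_prime_cds_bases % 3
    # Codon position that the same base occupies in the native 3' transcript.
    native_phase = (three_prime_cds_start - 1) % 3
    return fusion_phase == native_phase


# Hypothetical breakpoints for illustration.
print(is_in_frame(300, 301))  # 5' partner ends on a codon boundary, 3' starts on one
print(is_in_frame(301, 301))  # one extra base from the 5' partner shifts the frame
```

In practice, annotation tools derive these coordinates from transcript models, so the result depends on which isoform of each partner gene is chosen.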
Recent evaluations of fusion detection tools demonstrate significant variability in performance metrics. GFvoter, which employs a multivoting strategy combining multiple aligners and fusion callers, has shown superior performance compared to existing methods [9].
Table 2: Performance Comparison of Fusion Detection Tools on Real and Simulated Datasets
| Tool | Average Precision | Average Recall | Average F1 Score | Key Strengths |
|---|---|---|---|---|
| GFvoter | 58.6% | Variable by dataset | 0.569 | Best precision-recall balance, multivoting strategy |
| LongGF | 39.5% | Variable by dataset | 0.407 | Good recall for certain fusion types |
| JAFFAL | 30.8% | Variable by dataset | 0.386 | Comprehensive fusion annotation |
| FusionSeeker | 35.6% | Variable by dataset | 0.291 | High precision on specific datasets |
In a comparative analysis using both simulated datasets and real cancer cell line data (including MCF-7, HCT-116, and A549 lines), GFvoter consistently achieved the highest F1 scores across nine experimental datasets, with values ranging from 0.080 to 0.972 [9]. Notably, GFvoter uniquely detected the RPS6KB1-VMP1 gene fusion in the MCF-7 breast cancer cell line, which was missed by all other tools evaluated [9].
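For reference, the F1 score reported in Table 2 is the harmonic mean of precision and recall. The calculation below uses hypothetical counts, since per-dataset recall values are not given here.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) computed from fusion-call counts."""
    called = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical dataset: 8 true fusions called, 2 false calls, 4 fusions missed.
p, r, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=4)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is computed per dataset and then averaged, an average F1 cannot be recovered from average precision and average recall. This is one reason a tool can rank differently on the two columns of Table 2.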
The performance advantage of GFvoter stems from its multivoting methodology, which integrates:
This approach demonstrates how combining evidence from multiple analytical methods can overcome the limitations of individual tools, particularly for detecting challenging fusion events with complex breakpoints or occurring in repetitive regions [9].
Begin with high-quality RNA extracted from cancer cells or tissues:
The following protocol is adapted from methods described in recent publications [9] [42]:
Successful implementation of long-read transcriptome sequencing requires specific reagents and computational resources. The following table details essential components for a complete workflow:
Table 3: Essential Research Reagents and Computational Tools for Long-Read Transcriptome Sequencing
| Category | Specific Product/Tool | Function/Purpose |
|---|---|---|
| RNA Extraction | Trizol reagent, PicoPure RNA Isolation Kit | High-quality RNA extraction from cells/tissues |
| Library Prep | NEBNext Ultra DNA Library Prep Kit, SMRTbell Express Template Prep Kit | cDNA library construction for sequencing |
| Sequencing | PacBio Sequel IIe SMRT Cells, ONT PromethION Flow Cells | Platform-specific sequencing substrates |
| Alignment | Minimap2, Winnowmap2, ngmlr | Mapping long reads to reference genomes |
| Fusion Detection | GFvoter, LongGF, JAFFAL | Identifying gene fusions from aligned reads |
| Quality Control | FastQC, Trimmomatic | Assessing and improving read quality |
| Visualization | Integrative Genomics Viewer (IGV) | Visualizing aligned reads and fusion events |
The application of long-read sequencing to cancer genomics has yielded significant insights into complex structural variants driving oncogenesis. In a recent study of acute myeloid leukemia (AML), long-read transcriptome sequencing identified gene fusions that had escaped detection by short-read technologies [9]. These findings demonstrate how long-read approaches can resolve the full complexity of cancer transcriptomes.
Another illustrative case comes from analysis of the MCF-7 breast cancer cell line, where GFvoter detected the RPS6KB1-VMP1 fusion while other tools failed [9]. This fusion represents a clinically relevant event that may serve as both a diagnostic biomarker and therapeutic target. The ability to detect such events with high confidence enables researchers to build more complete inventories of driver mutations across cancer types.
Long-read sequencing also excels at characterizing variants in repetitive regions and segmental duplications, which are frequently overlooked in short-read analyses [39]. This capability is particularly valuable for understanding genome instability in cancer and identifying non-canonical gene fusions that may contribute to therapy resistance.
As long-read sequencing technologies continue to evolve, several trends are shaping their future application in cancer research:
The integration of long-read sequencing into mainstream cancer research requires continued development of user-friendly analytical tools and standardized protocols. As these technologies become more accessible, they are poised to significantly advance our understanding of cancer genomics and expand the repertoire of actionable targets for precision oncology.
Formalin-fixed, paraffin-embedded (FFPE) tissues represent an invaluable resource for cancer genomics research, particularly in the context of RNA sequencing for gene fusion detection. However, the degraded nature of FFPE-derived RNA, chemical modifications from fixation, and low yields present significant analytical challenges. This application note synthesizes current methodologies and data to provide optimized protocols for FFPE RNA quality assessment, library preparation, and sequencing analysis. By implementing rigorous quality control metrics and selecting appropriate sequencing strategies, researchers can reliably overcome these challenges to detect clinically relevant gene fusions, including ALK, ROS1, RET, and NTRK fusions, with significant implications for cancer diagnosis and therapeutic development.
FFPE tissues are the most widely available clinical specimens, offering unparalleled access to retrospective cohorts with extensive clinical annotation. Their importance in oncology is underscored by the critical role that gene fusions play as diagnostic biomarkers and therapeutic targets in solid tumors. Gene fusions derived from genomic rearrangements occur in approximately one-third of soft tissue tumors and drive oncogenesis in 16.5% of all cancer cases [11] [9]. However, the formalin fixation process fragments RNA, induces chemical modifications, and cross-links nucleic acids to proteins, resulting in suboptimal RNA integrity that complicates downstream sequencing applications [44]. The poly-A tails essential for many library preparation methods are often degraded in FFPE-RNA, limiting the effectiveness of oligo-dT based reverse transcription [44]. This technical guide addresses these challenges through systematic quality control, optimized protocols, and analytical frameworks specifically designed for fusion detection in FFPE specimens.
Precise quality assessment is fundamental for successful FFPE-RNA sequencing. The traditional RNA Integrity Number (RIN) is often unreliable for FFPE-RNA due to extensive fragmentation. Instead, fragment size distribution metrics such as DV200 and DV100 (the percentage of RNA fragments longer than 200 or 100 nucleotides, respectively) provide more accurate quality predictions:
Research demonstrates that samples with median RNA concentration <18.9 ng/µL and pre-capture library Qubit values <2.08 ng/µL frequently fail bioinformatics QC, whereas successful samples typically exhibit concentrations >40.8 ng/µL and library yields >5.82 ng/µL [45]. A decision tree model incorporating these metrics achieved an F-score of 0.848 in predicting sequencing success [45].
Table 1: Quality Control Decision Thresholds for FFPE-RNA
| Metric | Threshold for Success | Threshold for Failure | Assessment Method |
|---|---|---|---|
| RNA Concentration | >25 ng/µL | <18.9 ng/µL | Qubit Fluorometer |
| DV200 Value | >40% | <30% | Agilent Bioanalyzer |
| DV100 Value | >50% | <40% | Agilent Bioanalyzer |
| Library Concentration | >1.7 ng/µL | <1.7 ng/µL | Qubit dsDNA HS Assay |
| 28S:18S Ratio | ~2:1 (for intact RNA) | <1.5:1 | Denaturing Agarose Gel |
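The thresholds in the table can be turned into a simple triage rule. The sketch below uses only the RNA concentration and DV200 rows; the function name `triage_ffpe_rna` and the way the two metrics are combined (fail if either is below its failure cutoff, pass only if both exceed their success cutoffs) are assumptions, not a rule taken from the cited study.

```python
def triage_ffpe_rna(conc_ng_per_ul, dv200_pct):
    """Classify an FFPE RNA sample as 'pass', 'fail', or 'borderline'."""
    # Failure thresholds from the table: either metric below its cutoff.
    if conc_ng_per_ul < 18.9 or dv200_pct < 30:
        return "fail"
    # Success thresholds from the table: both metrics above their cutoffs.
    if conc_ng_per_ul > 25 and dv200_pct > 40:
        return "pass"
    # Anything in between is flagged for manual review (an assumption).
    return "borderline"


# Hypothetical samples given as (concentration in ng/µL, DV200 in %).
samples = {"S1": (30.0, 55.0), "S2": (15.0, 60.0), "S3": (22.0, 35.0)}
triage = {name: triage_ffpe_rna(c, d) for name, (c, d) in samples.items()}
```

Samples that land in the borderline band are the ones where a low-input or 3' protocol may be worth considering before committing sequencing capacity.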
The recommended workflow includes:
Table 2: RNA Input Requirements for Different Sequencing Approaches
| Sequencing Approach | Example Kits | Recommended Input | Optimal FFPE Application |
|---|---|---|---|
| Short-read RNA-seq | Illumina TruSeq Stranded mRNA | 100 ng | Higher quality samples (DV200 > 50%) |
| Short-read RNA-seq (Low-input) | NEBNext Ultra II Directional RNA | 10 ng | Limited samples with moderate quality |
| Long-read RNA-seq | PacBio SMRTbell Prep 3.0 | 300-500 ng | Intact transcript sequencing (rare for FFPE) |
| Ultra-low-input RNA-seq | SMART-Seq mRNA LP | 10 pg | Extremely limited or poor quality samples |
| 3' mRNA-seq | QuantSeq FFPE | 10 ng-1 μg | Degraded samples, gene expression focus |
| Whole Transcriptome | CORALL FFPE | Varies | Fusion detection, isoform analysis |
The selection of appropriate input requirements depends on both sample availability and research objectives. For gene fusion detection, studies demonstrate that an integrated DNA and RNA-based approach achieves optimal results, with DNA-based NGS reliably detecting fusions at 5% mutational abundance and RNA-based NGS requiring 250-400 copies/100 ng input [11]. This combined approach increased detection of clinically actionable fusions by 21% compared to DNA-NGS alone in a pan-cancer study of 67,278 patients [7].
Two predominant strategies have emerged for FFPE-RNA library preparation:
A. Ribosomal Depletion Workflow:
B. 3' mRNA-Seq Workflow:
Table 3: Performance Comparison of FFPE-Compatible Library Prep Kits
| Kit Name | Workflow | Input Requirement | Key Advantages | Fusion Detection Capability |
|---|---|---|---|---|
| TaKaRa SMARTer Stranded Total RNA-Seq v2 | Ribosomal depletion | 1 ng-200 ng | 20-fold lower input requirement | Suitable with sufficient coverage |
| Illumina Stranded Total RNA Prep with Ribo-Zero Plus | Ribosomal depletion | Standard input | Superior alignment rates (61.65% unique mapping) | Excellent with uniform coverage |
| NEBNext Ultra II Directional RNA | Ribosomal depletion | 10 ng | Cost-effective, well-validated | Reliable with confirmatory DNA-seq |
| QuantSeq FFPE | 3' mRNA-seq | 10 ng-1 μg | Focused on 3' end, cost-efficient | Limited to 3' fusion partners |
Comparative studies indicate that while the standard-input ribosomal depletion kit (Kit B, here the Illumina Stranded Total RNA Prep with Ribo-Zero Plus) achieves better alignment performance and lower duplication rates (10.73% vs 28.48%), the low-input kit (Kit A, here the TaKaRa SMARTer Stranded Total RNA-Seq v2) can achieve comparable gene expression quantification with 20-fold less RNA input [47]. Both approaches show 83.6-91.7% concordance in differential expression analysis, confirming their reliability for FFPE transcriptomics [47].
Materials:
Procedure:
Dual-Modality Approach: For comprehensive fusion detection, implement both DNA and RNA sequencing:
DNA Library Preparation:
RNA Library Preparation:
Sequencing Parameters:
Fusion Detection Pipeline:
RNA-based Fusion Calling:
Validation:
Table 4: Key Research Reagent Solutions for FFPE RNA Fusion Detection
| Product Category | Specific Examples | Function and Application |
|---|---|---|
| Nucleic Acid Extraction | AllPrep DNA/RNA FFPE Kit (Qiagen) | Co-extraction of DNA and RNA from FFPE samples |
| RNA Quality Assessment | Agilent RNA 6000 Nano Kit | Fragment analysis and DV200 calculation |
| RNA Quantitation | Qubit RNA HS Assay | Accurate RNA concentration measurement |
| rRNA Depletion Kits | NEBNext rRNA Depletion Kit | Removal of ribosomal RNA for total RNA-seq |
| Low-Input RNA Library Prep | TaKaRa SMARTer Stranded Total RNA-Seq Kit v2 | Library preparation from limited RNA input (1 ng) |
| 3' mRNA-Seq Kits | QuantSeq FFPE (Lexogen) | 3' focused gene expression profiling |
| Whole Transcriptome Kits | CORALL FFPE (Lexogen) | Full transcriptome coverage for fusion detection |
| DNA NGS Panels | Tempus xT (648 genes) | Targeted DNA sequencing for fusion detection |
| RNA NGS Panels | Tempus xR (Whole transcriptome) | RNA sequencing for fusion validation |
| Fusion Validation | FusionPlex BBI Solid Tumor Panel | Orthogonal validation of fusion events |
FFPE tissues remain indispensable for cancer research, particularly in the era of precision oncology where gene fusion detection directly informs therapeutic decisions. By implementing rigorous quality control metrics (DV200/DV100, concentration thresholds), selecting appropriate library preparation strategies based on RNA quality, and utilizing integrated DNA-RNA sequencing approaches, researchers can successfully overcome the inherent challenges of FFPE-derived nucleic acids. The combined DNA-RNA NGS approach increases detection of clinically actionable fusions by 21% compared to DNA sequencing alone, potentially expanding patient eligibility for matched targeted therapies [7]. As sequencing technologies evolve, particularly long-read transcriptome sequencing and advanced computational tools like GFvoter, the detection of complex fusion events and their isoforms will continue to improve, further unlocking the potential of archived FFPE collections for transformative cancer research.
Oncogenic gene fusions are potent drivers of cancer and represent critical biomarkers for targeted therapies [1]. The reliable detection of these fusions via RNA sequencing is paramount for personalized cancer treatment, making the rigorous validation of analytical methods a cornerstone of clinical genomics. Establishing robust detection limits—specifically the Limit of Detection (LOD) and reproducibility—ensures that assays can reliably identify therapeutically relevant fusions, even at low expression levels or in challenging sample types like Formalin-Fixed, Paraffin-Embedded (FFPE) tissues [48] [27]. This application note details the experimental protocols and metrics essential for validating an RNA sequencing assay for gene fusion detection in cancer research, providing a framework for researchers, scientists, and drug development professionals.
Various targeted RNA sequencing approaches have been developed and validated to detect gene fusions with high sensitivity and specificity. The table below summarizes the performance metrics reported for different assay methodologies.
Table 1: Analytical Validation Metrics of RNA Sequencing Assays for Fusion Detection
| Assay / Study | Methodology | Positive Percent Agreement (PPA) | Negative Percent Agreement (NPA) | Limit of Detection (LOD) | Reproducibility |
|---|---|---|---|---|---|
| FoundationOneRNA [49] [50] | Hybrid-capture-based targeted RNA-seq (318 fusion genes) | 98.28% | 99.89% | 21-85 supporting reads; 1.5-30 ng RNA input | 100% (10/10 pre-defined fusions) |
| RNA Fusion Panel (ArcherDX) [48] | Anchored Multiplex PCR (AMP) targeting 17 genes | >99% | >99% | 50 copies for most fusion transcripts | >99% |
| Tempus xR [7] | Whole-transcriptome RNA-seq | 98.2% | 99.993% | Limit of blank (LOB) set at 3 total supporting reads; ≥4 reads for positive call | Established via GLMM* |
| Targeted RNA-seq (Research) [51] | Two panels for hematological (188 genes) and solid (241 genes) tumors | Increased diagnostic rate from 63% to 76% vs. FISH/RT-PCR | N/A | 50% detection at 2 pM input (using spike-in standards) | High concordance with previous diagnoses |
*GLMM: Generalized Linear Mixed Model, a statistical approach for characterizing reproducibility [52].
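For readers less familiar with the agreement metrics in the table, PPA and NPA are computed against an orthogonal comparator in the same way as sensitivity and specificity. The sketch below shows the arithmetic; the function name and the counts are hypothetical and do not come from any of the cited validation studies.

```python
def percent_agreement(tp, fp, tn, fn):
    """Return (PPA, NPA) in percent from a 2x2 comparison with a comparator.

    tp/fn: comparator-positive samples called positive/negative by the assay.
    tn/fp: comparator-negative samples called negative/positive by the assay.
    """
    ppa = 100.0 * tp / (tp + fn)
    npa = 100.0 * tn / (tn + fp)
    return ppa, npa


# Hypothetical counts for illustration only.
ppa, npa = percent_agreement(tp=95, fp=10, tn=990, fn=5)
```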
The LOD is the lowest concentration of an analyte that can be reliably detected by an assay. For gene fusion detection, this is often defined in terms of input RNA and supporting sequencing reads.
This protocol outlines the standard procedure for establishing the analytical sensitivity of an RNA sequencing assay for fusion detection.
I. Materials and Reagents
II. Experimental Procedure
III. Data Analysis and LOD Calculation
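One common way to summarize a dilution series is the hit-rate approach: the LOD is taken as the lowest input level at which the fusion is still detected in a chosen fraction of replicates. The sketch below assumes a 95% hit-rate criterion and a hypothetical dilution series; neither the criterion nor the numbers are taken from the protocols cited here.

```python
def estimate_lod(hits_by_level, min_hit_rate=0.95):
    """Return the lowest input level that still meets the hit-rate criterion.

    hits_by_level maps an input level (e.g. ng RNA or copies) to a
    (detected, total) tuple of replicate counts. Levels are walked from
    highest to lowest and the search stops at the first failing level,
    so an isolated pass below a failing level is ignored.
    """
    lod = None
    for level in sorted(hits_by_level, reverse=True):
        detected, total = hits_by_level[level]
        if total == 0 or detected / total < min_hit_rate:
            break
        lod = level
    return lod


# Hypothetical dilution series: input level -> (replicates detected, total).
dilution_series = {
    100: (20, 20),
    50: (20, 20),
    25: (20, 20),
    12.5: (14, 20),
    6.25: (20, 20),  # ignored: lies below a failing level
}
lod = estimate_lod(dilution_series)
```

A probit or logistic fit over the same data gives a smoother estimate when more dilution levels are available.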
Reproducibility measures the precision of an assay under varying conditions, such as different operators, instruments, or days. It ensures consistent results across routine clinical use.
I. Materials and Reagents
II. Experimental Procedure
III. Data Analysis and Interpretation
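A simple summary of reproducibility is the fraction of pre-defined fusions that are called in every replicate run, which is the form in which a result such as "10/10 pre-defined fusions" is reported. The sketch below computes that fraction; it is a deliberately simpler measure than the GLMM mentioned earlier, and the function name and example calls are hypothetical.

```python
def reproducibility(calls_by_replicate, expected_fusions):
    """Fraction of expected fusions detected in every replicate."""
    expected = set(expected_fusions)
    detected_everywhere = set(expected)
    for calls in calls_by_replicate:
        # Keep only fusions that this replicate also reported.
        detected_everywhere &= set(calls)
    return len(detected_everywhere) / len(expected)


# Hypothetical replicate runs across operators, instruments, or days.
expected = {"EML4::ALK", "BCR::ABL1"}
replicates = [
    {"EML4::ALK", "BCR::ABL1"},
    {"EML4::ALK", "BCR::ABL1"},
    {"EML4::ALK"},
]
score = reproducibility(replicates, expected)
```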
The following diagram illustrates the complete workflow, from sample preparation to analytical validation, for establishing robust detection limits in gene fusion detection.
The table below lists key reagents and materials critical for successfully establishing detection limits for gene fusion detection assays.
Table 2: Essential Research Reagents for RNA Fusion Assay Validation
| Reagent / Material | Function | Example Products / Notes |
|---|---|---|
| FFPE RNA Extraction Kit | Isolates high-quality RNA from archived clinical samples. | Qiagen AllPrep DNA/RNA FFPE kit [48]. |
| RNA Quantitation Kit | Accurately measures RNA concentration; more reliable for FFPE RNA than absorbance. | Qubit fluorometric quantification [48]. |
| cDNA Synthesis Kit | Generates complementary DNA from RNA templates for library prep. | Archer FusionPlex reagents [48]. |
| Targeted RNA Library Prep Kit | Prepares sequencing libraries enriched for fusion-related genes. | FoundationOneRNA; Archer FusionPlex [49] [48]. |
| Fusion-Positive Control RNA | Serves as a positive control for assay development and LOD studies. | RNA from cell lines like K562 (BCR-ABL1), H2228 (EML4-ALK) [51] [27]. |
| Fusion Spike-in Controls | Synthetic RNA molecules used for absolute quantification of LOD. | Fusion sequins [51]. |
| NGS Platform | Executes high-throughput sequencing of prepared libraries. | Illumina MiSeq, NextSeq [48]. |
| Bioinformatic Tools | Software for identifying fusion transcripts from raw sequencing data. | STAR-Fusion, FusionCatcher, JAFFAL, GFvoter [51] [9]. |
In the context of RNA sequencing for gene fusion detection in cancer research, bioinformatic pipelines face a fundamental challenge: minimizing false negatives to ensure that clinically actionable fusions are not missed, while controlling false positives to avoid misleading clinical conclusions and unnecessary follow-up testing. Gene fusions are a hallmark of cancer, driving approximately 20% of human cancer morbidity, and their accurate identification is paramount for diagnosis, prognosis, and guiding targeted therapies [24]. However, fusion detection remains challenging due to artifacts introduced during library preparation, sequence alignment difficulties, and the inherent complexity of genomic rearrangements [30]. This application note details established protocols and data analysis strategies to optimize this critical balance, enabling reliable identification of therapeutically relevant gene fusions in clinical and research settings.
The choice of computational tools significantly impacts the balance between sensitivity and specificity. Independent benchmarking studies provide crucial quantitative data for informed tool selection.
Table 1: Performance Comparison of Short-Read RNA-Seq Fusion Detection Tools
| Tool | Sensitivity (Recall) | Precision | Key Strengths | Runtime Efficiency |
|---|---|---|---|---|
| Arriba | 88/150 (58.7%) fusions at 5x expression level [30] | High (Superior enrichment of validated predictions) [30] | Fast; sensitive detection of low-expression fusions; identifies fusions with intergenic breakpoints [30] | <1 hour per sample [30] |
| STAR-Fusion | Varies by dataset and expression level [30] | Varies by dataset and expression level [30] | Commonly used; part of a widely adopted RNA-seq aligner suite [30] | Computationally demanding [30] |
| FusionCatcher | Lower than Arriba on benchmark data [30] | Lower than Arriba on benchmark data [30] | Can use a list of known fusions for sensitive parameters [30] | Computationally demanding [30] |
| JAFFA | Lower than Arriba on benchmark data [30] | Lower than Arriba on benchmark data [30] | Hybrid (assembly and alignment) approach [30] | Computationally intensive, scalability challenges [53] |
For long-read transcriptome sequencing (PacBio, Nanopore), new tools are emerging. GFvoter, a tool that employs a multivoting strategy, has demonstrated superior performance on real and simulated datasets, achieving an average F1 score (harmonic mean of precision and recall) of 0.569, outperforming LongGF, JAFFAL, and FusionSeeker [9]. GFvoter successfully identified the RPS6KB1::VMP1 fusion in the MCF-7 cell line, which was missed by other tools tested [9].
Table 2: Performance of Long-Read RNA-Seq Fusion Detection Tools (Real Datasets)
| Tool | Average Precision | Average Recall | Average F1 Score |
|---|---|---|---|
| GFvoter | 58.6% | Varies by dataset | 0.569 |
| LongGF | 39.5% | Varies by dataset | 0.407 |
| JAFFAL | 30.8% | Varies by dataset | 0.386 |
| FusionSeeker | 35.6% | Varies by dataset | 0.291 |
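The metrics in the table follow the usual definitions, with F1 as the harmonic mean of precision and recall for a single dataset. The sketch below shows the calculation for one set of calls against a truth set; the function name and placeholder fusions are hypothetical. Note that an average F1 across datasets, as reported in the table, cannot be recomputed from the average precision alone.

```python
def precision_recall_f1(called, truth):
    """Compute precision, recall, and F1 for one dataset."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # calls that match a known fusion
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical calls and truth set with placeholder names.
called = {"A::B", "C::D", "E::F", "G::H"}
truth = {"A::B", "C::D", "I::J"}
precision, recall, f1 = precision_recall_f1(called, truth)
```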
This protocol is validated for formalin-fixed, paraffin-embedded (FFPE) tumor samples, a common but challenging material source in clinical practice [24].
Materials and Reagents:
Procedure:
This protocol uses a custom panel to simultaneously interrogate fusions at the DNA and RNA level, providing complementary information that enhances detection accuracy [11].
Materials and Reagents:
Procedure:
Key Finding: The DNA and RNA assays are complementary. In one study, the DNA assay missed fusions (e.g., ETV6::NTRK3, CCDC6::RET) that the RNA assay detected, and conversely, the RNA assay missed some fusions (e.g., TRIM46::NTRK1, CD74::ROS1) that the DNA assay detected. The integrated approach achieved 100% sensitivity and specificity after resolving discrepancies [11].
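The complementarity described above amounts to taking the union of the two call sets while tracking which assay supports each fusion. In the sketch below, the DNA-only and RNA-only fusions are the examples named in the text; the shared EML4::ALK call and the function name `merge_calls` are hypothetical additions for illustration.

```python
def merge_calls(dna_calls, rna_calls):
    """Combine DNA and RNA fusion calls and label the supporting evidence."""
    dna_calls, rna_calls = set(dna_calls), set(rna_calls)
    return {
        "both": dna_calls & rna_calls,
        "dna_only": dna_calls - rna_calls,
        "rna_only": rna_calls - dna_calls,
        "integrated": dna_calls | rna_calls,
    }


dna_calls = {"TRIM46::NTRK1", "CD74::ROS1", "EML4::ALK"}
rna_calls = {"ETV6::NTRK3", "CCDC6::RET", "EML4::ALK"}
merged = merge_calls(dna_calls, rna_calls)
```

Fusions in the `dna_only` and `rna_only` groups are the natural candidates for orthogonal confirmation before reporting.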
The following diagram illustrates the critical steps and quality control checkpoints in the sample preparation workflow that directly influence the false positive/negative rate.
A multi-layered bioinformatic approach is essential to filter out artifacts while retaining true positive fusions. The following diagram outlines a consensus strategy.
Key Filtering Steps:
Table 3: Key Reagents and Kits for RNA-Seq Based Fusion Detection
| Reagent/Kits | Primary Function | Application Note |
|---|---|---|
| RNeasy FFPE Kit (Qiagen) | RNA extraction from FFPE tissue | Maintains RNA integrity for degraded samples; critical for archival clinical material [24]. |
| NEBNext rRNA Depletion Kit (NEB) | Removal of ribosomal RNA | Enriches for mRNA transcripts; preferred over poly-A selection for degraded RNA [24]. |
| NEBNext Ultra II Directional RNA Library Prep Kit (NEB) | Preparation of sequencing-ready libraries | Compatible with low-input and degraded RNA from FFPE samples [24]. |
| QIAseq RNAscan Custom Panel (Qiagen) | Targeted PCR-based RNA-seq | Uses UMI for error correction; highly sensitive for low-expression fusions in a defined gene set [53]. |
| ERCC RNA Spike-In Mix (Thermo Fisher) | External RNA controls | Monitors technical performance and enables cross-laboratory benchmarking of expression data [54]. |
| Commercial Fusion Reference Standards (e.g., GeneWell) | Positive controls for validation | Contains validated fusion transcripts for determining assay LOD, precision, and accuracy [11]. |
Optimizing a bioinformatic pipeline for gene fusion detection requires a holistic approach that integrates stringent wet-lab protocols, judicious selection of computational tools, and multi-step bioinformatic filtering. Adherence to strict RNA quality controls (DV200 ≥ 30%), sufficient sequencing depth (>80M reads), and the use of integrated DNA/RNA or targeted RNA-seq approaches can significantly enhance detection accuracy. Employing modern tools like Arriba for short-read data or GFvoter for long-read data, combined with advanced filtering strategies such as coverage imbalance analysis, provides a robust framework for balancing sensitivity and specificity. This optimized workflow ensures reliable identification of clinically actionable gene fusions, ultimately supporting precise diagnosis and personalized treatment in oncology.
Within cancer research, the reliable detection of gene fusions via RNA sequencing (RNA-Seq) is a critical component of molecular diagnostics and therapeutic targeting. Achieving reliable results, however, is profoundly dependent on two fundamental experimental parameters: sample size (the number of biological replicates) and sequencing depth (the number of reads per sample). Inadequate attention to either can lead to both false-positive and false-negative findings, compromising the validity of the study and its potential for clinical translation. This Application Note provides a structured framework, grounded in empirical evidence and statistical principles, to guide researchers in determining these parameters for robust gene fusion detection in the context of cancer.
Sample size estimation for RNA-Seq experiments is based on a negative binomial distribution, which accurately models the over-dispersed nature of count data generated by sequencing [55]. The core formula for calculating the required number of samples per group (n) is [55]:
Where the key parameters are:
The values zα/2 and zβ are the critical values from the standard normal distribution corresponding to α and β. For example, for α=0.05 and β=0.10 (90% power), zα/2 = 1.96 and zβ = 1.28 [55].
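Because the formula itself is not reproduced above, the sketch below assumes the widely used closed form for negative binomial RNA-seq counts, n = 2 (zα/2 + zβ)² (1/μ + σ²) / (ln Δ)², where μ is the mean read count, σ is the coefficient of variation (dispersion), and Δ is the fold change to be detected. Whether this matches the cited formula term for term is an assumption, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(mu, sigma, fold_change, alpha=0.05, power=0.90):
    """Samples per group under the assumed negative binomial formula."""
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90% power
    n = (2 * (z_alpha + z_beta) ** 2 * (1 / mu + sigma ** 2)
         / math.log(fold_change) ** 2)
    return math.ceil(n)


# Hypothetical inputs: mean count 10, coefficient of variation 0.4, 2-fold change.
n_required = samples_per_group(mu=10, sigma=0.4, fold_change=2)
```

The `1 / mu` term shrinks as read counts grow, so for well-covered genes the biological variability `sigma` dominates the required sample size.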
While power calculations are essential, empirical down-sampling studies from large datasets provide practical guidance. A recent large-scale analysis in murine models offers critical insights applicable to cancer studies [56].
Table 1: Impact of Sample Size on False Discovery Rate (FDR) and Sensitivity [56]
| Sample Size (N per group) | False Discovery Rate (FDR) | Sensitivity | Recommendation |
|---|---|---|---|
| N ≤ 4 | Unacceptably High (e.g., >50%) | Very Low | Highly misleading; results are unreliable |
| N = 5 | High | Low | Fails to recapitulate the full expression signature |
| N = 6-7 | Decreases to ~50% | Increases to ~50% | Minimum threshold for consistent results |
| N = 8-12 | Further reduced, tapering observed | Significantly improved (e.g., >50%) | Optimal range for reliable discovery |
This analysis demonstrated that raising the fold-change cutoff is not an effective substitute for increasing sample size, as this strategy inflates effect sizes and substantially reduces detection sensitivity [56]. For gene fusion studies, where the goal is to detect a qualitative "present/absent" event rather than a quantitative fold-change, sufficient biological replicates remain crucial for accurately estimating the prevalence of a fusion within a population and for accounting for biological variability in expression.
Using a single, conservatively chosen value for read count (μ) and dispersion (σ) often leads to over-estimated sample sizes [57]. The recommended approach is to utilize distributions of these parameters from real RNA-seq data of a similar type (e.g., from public repositories like The Cancer Genome Atlas - TCGA).
The RnaSeqSampleSize R package implements this empirical data-based method [57]. The workflow involves:
Table 2: Comparison of Sample Size Estimation Methods for a Hypothetical Study
| Estimation Method | Key Features | Input Parameters | Estimated Sample Size (Example) |
|---|---|---|---|
| Single-Value (Conservative) | Uses one minimal read count and maximal dispersion; often overestimates. | Min read count=10, Max dispersion=2.0, FC=2, Power=0.8, FDR=0.05 | 168 [57] |
| Empirical Data-Based (RnaSeqSampleSize) | Uses real distributions of read counts and dispersions from a reference dataset. | TCGA READ dataset, FC=2, Power=0.8, FDR=0.05 | 42 [57] |
The following diagram illustrates the statistical workflow for determining sample size.
Sequencing depth must be aligned with the primary analytical goal of the study. While standard gene expression analysis can be performed with moderate depth, the confident detection of gene fusions and other complex features demands greater sequencing effort [58].
Table 3: Recommended Sequencing Depth and Read Length by RNA-Seq Application
| Application | Recommended Depth (Mapped Reads) | Recommended Read Length | Rationale |
|---|---|---|---|
| Standard Gene Expression | 20 - 40 million [59] [58] | 50 - 75 bp, paired-end [60] | Cost-effective for robust quantification of highly expressed genes. |
| Isoform Detection & Splicing | ≥ 100 million [58] | 2x75 bp or 2x100 bp, paired-end [58] [60] | Increased depth and length are required to cover low-abundance junctions and resolve complex splice variants. |
| Fusion Gene Detection | 60 - 100 million [58] | 2x75 bp as baseline, 2x100 bp for better resolution [58] | Ensures sufficient "split-read" support to anchor breakpoints and identify novel fusion partners. |
| Allele-Specific Expression | ~100 million [58] | 2x75 bp or longer, paired-end | Higher depth is essential to accurately estimate variant allele frequencies and minimize sampling error. |
For fusion detection, the use of paired-end reads is critical. This approach sequences both ends of each cDNA fragment, providing two independent data points that are invaluable for mapping reads across breakpoint junctions and resolving complex rearrangements [61] [62]. A read length of 2x100 bp provides cleaner junction resolution compared to 2x75 bp [58].
A key principle in experimental design is that for a fixed budget, increasing the number of biological replicates (sample size) often provides a greater return on investment than increasing the sequencing depth per sample [56]. While deep sequencing is necessary for fusion detection, an underpowered study with few replicates sequenced very deeply will still yield biologically irreproducible results. The priority should be to first determine the adequate sample size to account for biological variation, then allocate remaining resources to achieve the recommended sequencing depth for the application.
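The allocation order described above (fix the sample size first, then spend what remains on depth) can be sketched as a small calculation. All cost figures and the function name are hypothetical placeholders; actual library and sequencing costs vary widely by platform and provider.

```python
def depth_per_sample(budget, n_per_group, n_groups,
                     library_cost, cost_per_million_reads):
    """Million reads per sample affordable after paying for all libraries."""
    n_samples = n_per_group * n_groups
    remaining = budget - n_samples * library_cost
    if remaining <= 0:
        return 0.0  # the budget does not even cover library preparation
    return remaining / (n_samples * cost_per_million_reads)


# Hypothetical budget and unit costs in arbitrary currency units.
depth_8 = depth_per_sample(20000, 8, 2, library_cost=150, cost_per_million_reads=10)
depth_12 = depth_per_sample(20000, 12, 2, library_cost=150, cost_per_million_reads=10)
```

Comparing the resulting depths against the 60-100 million read recommendation for fusion detection shows quickly whether a given replicate count is affordable.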
Targeted RNA sequencing panels offer a highly sensitive and cost-effective approach for detecting recurrent gene fusions in a clinical diagnostic setting, with a demonstrated turnaround time of fewer than five days [63]. The following protocol, adapted from a clinical leukemia study, details this workflow.
Workflow Overview: The process involves using an anchored multiplex PCR panel for target enrichment. This method uses gene-specific primers combined with universal adapters to amplify sequences of interest without prior knowledge of the exact partner gene or breakpoint, making it ideal for discovering novel fusions [63].
Step-by-Step Protocol:
RNA Extraction and QC: Extract total RNA from patient samples (e.g., bone marrow or blood) using a standard Trizol-based or column-based protocol. Assess RNA integrity (RIN or RQS) and quantity. For formalin-fixed, paraffin-embedded (FFPE) samples, use DV200 as a metric; a DV200 > 50% is generally acceptable, but rRNA depletion is preferred over poly(A) selection for degraded samples [58].
cDNA Synthesis: Convert 1.0 - 1.5 μg of total RNA into complementary DNA (cDNA) using reverse transcriptase (e.g., M-MLV) and oligo(dT) or random hexamer primers [63].
Targeted Library Preparation (Anchored Multiplex PCR):
Library QC and Pooling: Quantify the final libraries using a fluorometric method (e.g., Qubit). Assess library size distribution using a Bioanalyzer or TapeStation. Normalize and pool the barcoded libraries in equimolar ratios.
Sequencing: Sequence the pooled library on a benchtop sequencer (e.g., Illumina MiSeq). Aim for a sequencing depth of 3-5 million reads per sample for a targeted panel, which is sufficient for sensitive fusion detection [60]. Use a 2x75 bp or 2x100 bp paired-end run configuration.
Bioinformatic Analysis: Demultiplex the sequencing data and align reads to the reference genome using a splice-aware aligner like STAR [59]. Use a fusion detection algorithm (e.g., FusionScan [61]) to identify split reads and discordant read pairs that map to two distinct genes. FusionScan achieves high precision and recall by requiring multiple split reads that join intact exons and implementing extensive filtering to remove false positives.
Experimental Validation: Confirm putative fusion candidates identified by RNA-Seq using an independent method, such as RT-PCR followed by Sanger sequencing. This is a critical step for verifying results, especially for novel fusions [63].
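For the Library QC and Pooling step above, equimolar pooling requires converting each library's mass concentration and mean fragment size into molarity. The sketch below uses the standard approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the 10 fmol target, and the example libraries are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration (ng/µL) to nM."""
    # 660 g/mol is the approximate mass of one double-stranded base pair.
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_per_library=10.0):
    """Volume (µL) of each library that contributes the same molar amount.

    libraries maps a name to (concentration in ng/µL, mean size in bp).
    1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, bp)
        for name, (conc, bp) in libraries.items()
    }


# Hypothetical libraries: (Qubit concentration in ng/µL, mean fragment size in bp).
libraries = {"lib_1": (2.0, 400), "lib_2": (4.0, 400), "lib_3": (2.0, 300)}
volumes = equimolar_volumes(libraries)
```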
Table 4: Key Research Reagent Solutions for Targeted RNA-Seq Fusion Detection
| Item | Function / Description | Example Product / Method |
|---|---|---|
| Targeted RNA-Seq Panel | A pre-designed set of probes/primers to enrich for genes known to be involved in fusions in a specific cancer type. | Archer FusionPlex Panels, Illumina TruSight RNA Panels [63] [60] |
| Reverse Transcriptase | Enzyme that synthesizes cDNA from an RNA template. | M-MLV Reverse Transcriptase [63] |
| High-Fidelity DNA Polymerase | Enzyme for PCR amplification with low error rate, critical for accurate sequencing. | Q5 Hot-Start Polymerase |
| Library Quantification Kit | Fluorometric assay for accurate quantification of DNA library concentration prior to sequencing. | Qubit dsDNA HS Assay Kit |
| Bioanalyzer/TapeStation | Microfluidic capillary electrophoresis system for assessing library fragment size distribution and quality. | Agilent Bioanalyzer 2100 |
| Stranded RNA Library Prep Kit | For whole-transcriptome approaches, preserves strand information of originating transcript. | Illumina Stranded Total RNA Prep [62] |
| rRNA Depletion Kit | Removes abundant ribosomal RNA, increasing the sequencing capacity for mRNA and other RNAs. | Illumina Ribo-Zero rRNA Removal Kit |
| Fusion Detection Software | Bioinformatics tool to identify split reads and discordant read pairs indicative of gene fusions. | FusionScan [61], STAR-Fusion |
The rigorous detection of gene fusions in cancer research using RNA-Seq is a process heavily dependent on sound experimental design. Researchers must justify their choices of sample size and sequencing depth based on statistical principles, empirical evidence, and the specific requirements of their biological question. As demonstrated, a sample size of fewer than six biological replicates per group carries a high risk of generating unreliable and irreproducible data, while a depth of 60-100 million paired-end reads provides a solid foundation for confident fusion detection. By adhering to the protocols and guidelines outlined in this document, researchers can optimize resource allocation, enhance the reliability of their findings, and ensure that their conclusions stand up to scientific and clinical scrutiny.
Gene fusions are a critical class of oncogenic drivers in cancer pathogenesis, with significant implications for disease classification, prognosis, and therapeutic targeting [64] [7]. The detection of these structural variants has evolved substantially with the advent of advanced genomic technologies. While DNA sequencing (DNA-Seq) has been a cornerstone in genomic profiling, it faces limitations in detecting certain fusion events due to large intronic regions and complex structural rearrangements [7]. RNA sequencing (RNA-Seq) emerged as a powerful complementary approach that directly captures expressed fusion transcripts, overcoming some of DNA-Seq's limitations [7]. More recently, optical genome mapping (OGM) has introduced a novel approach for comprehensive structural variant detection without sequencing [64] [65]. This application note provides a comparative analysis of these technologies, focusing on their respective strengths, limitations, and implementation protocols for gene fusion detection in cancer research and drug development.
Table 1: Comparative Performance of Genomic Technologies for Fusion Detection
| Technology | Detection Principle | Key Advantages | Major Limitations | Best Applications |
|---|---|---|---|---|
| DNA-Seq | DNA-level variant detection across targeted genes | Detects coding variants, CNVs, and some fusions; established workflows | Limited fusion detection due to large introns; misses enhancer hijacking | First-line genomic profiling; SNV/indel detection |
| RNA-Seq | Sequencing of expressed transcripts | Direct evidence of functional fusion transcripts; unaffected by breakpoint location | Limited to expressed fusions; misses events without fusion transcript generation | Therapeutic target validation; expressed fusion confirmation |
| OGM | Imaging of ultra-high molecular weight DNA | Genome-wide SV detection; identifies cryptic rearrangements and enhancer hijacking | May label fusions arising from small intrachromosomal deletions as simple deletions | Comprehensive SV profiling; complex rearrangement analysis |
Recent large-scale studies have demonstrated the complementary nature of these technologies. A pan-cancer analysis of 67,278 patients revealed that combining RNA-Seq with DNA-Seq increased the detection of clinically actionable gene fusions by 21% compared to DNA-Seq alone [7]. The study identified 1,497 patients (2.2%) with at least one of nine therapeutically targetable fusions, with 29% of these fusions detected outside their FDA-approved cancer indications, highlighting the value of comprehensive genomic profiling [7].
A focused analysis in acute leukemia compared a 108-gene targeted RNA-Seq panel with OGM in 467 cases [64] [66]. The overall concordance rate between methods was 74.7% for 234 detected gene rearrangements and fusions. However, significant differences emerged in specific detection capabilities: OGM uniquely identified 37 of 234 (15.8%) clinically relevant rearrangements, while RNA-Seq exclusively identified 22 of 234 (9.4%) [64]. The technologies showed markedly different performance profiles depending on the biological mechanism of the structural variant.
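The percentages in this comparison can be reproduced from the reported counts. The sketch below is illustrative arithmetic only; it assumes that every rearrangement not unique to one platform was detected by both, which is an inference from the figures above rather than a count stated in the source.

```python
# Counts reported in the text for the acute leukemia comparison [64].
total_rearrangements = 234
ogm_only = 37
rna_only = 22

# Assumption: everything not unique to one platform was found by both.
concordant = total_rearrangements - ogm_only - rna_only


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("OGM only:   ", pct(ogm_only, total_rearrangements))    # 15.8
print("RNA-Seq only:", pct(rna_only, total_rearrangements))   # 9.4
print("Concordant: ", pct(concordant, total_rearrangements))  # 74.8
```

Under that assumption the derived concordance is 175 of 234, which rounds to 74.8% and is consistent with the reported 74.7% to within rounding.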
Table 2: Detection Performance Across Acute Leukemia Types
| Leukemia Type | Cases with ≥1 Rearrangement | Tier 1 Aberration Detection Rate | RNA-Seq/OGM Concordance |
|---|---|---|---|
| AML (n=360) | 36.1% | 23.9% | High (specific rate not provided) |
| B-ALL (n=89) | ~75% (estimated) | 60.7% | 80.2% |
| T-ALL (n=12) | 75% | Not specified | 41.7% |
| Overall (n=467) | 43.6% | 31.5% | 74.7% |
Enhancer-hijacking lesions demonstrated particularly poor concordance between platforms (20.6% versus 93.1% for all other aberrations, p < 0.001) [64]. These events, including MECOM, BCL11B, and IGH rearrangements, were predominantly detected by OGM, as they often do not generate fusion transcripts targeted by RNA-Seq panels [64] [66]. Conversely, RNA-Seq slightly outperformed OGM for fusions arising from intrachromosomal deletions that were sometimes labeled by OGM as simple deletions [64].
Sample Preparation and Library Construction
Data Analysis Pipeline
Sample Preparation and Imaging
Data Analysis
Sample Preparation and Sequencing
Structural Variant Analysis
Table 3: Essential Research Reagents and Platforms
| Category | Product/Platform | Application | Key Features |
|---|---|---|---|
| RNA Extraction | PAXgene Blood RNA kit (Qiagen) | RNA stabilization from blood samples | Maintains RNA integrity for transcriptome studies |
| RNA-Seq Library | NEBNext Ultra Directional RNA Library Prep Kit | Whole transcriptome library preparation | Preserves strand information; compatible with rRNA depletion |
| Targeted RNA-Seq | Archer FusionPlex Panels | Fusion detection in targeted genes | AMP chemistry for novel partner identification |
| UHMW DNA Isolation | Bionano Prep SP Blood and Cell DNA Isolation Kit | DNA extraction for OGM | Maintains long DNA fragments essential for mapping |
| OGM Labeling | Bionano DLS Labeling Kit | Fluorescent DNA labeling | Sequence-specific labeling for pattern recognition |
| OGM Platform | Bionano Saphyr System | Genome imaging and analysis | High-throughput imaging with nanochannel arrays |
| Analysis Software | Bionano Access & Solve | OGM data analysis | SV calling, visualization, and annotation |
| Fusion Callers | STAR-Fusion, Archer Analysis | RNA-Seq fusion detection | Sensitive algorithms for fusion transcript identification |
The integration of RNA-Seq and OGM provides a comprehensive approach for structural variant detection, with each technology compensating for the limitations of the other. In pediatric acute lymphoblastic leukemia (ALL), the combination of digital MLPA and RNA-Seq achieved precise classification in 95% of cases, significantly outperforming standard-of-care techniques (46.7%) [67]. OGM as a standalone test demonstrated superior resolution in detecting chromosomal gains and losses (51.7% vs. 35%) and gene fusions (56.7% vs. 30%) compared to conventional methods [67].
For complex structural variants, OGM provides unparalleled capability in resolving intricate rearrangement patterns. In one study of neurodevelopmental disorders, OGM detected a complex rearrangement involving chromosomes 2 and 6 that was much more complex than indicated by conventional cytogenetic analysis [69]. The technology revealed that 17 segments of 6q15 spanning 9.3 Mb were disarranged and joined to 2q11.2, demonstrating its power in resolving complicated genomic architectures.
RNA-Seq plays a critical role in validating the functional consequences of structural variants identified at the DNA level. In the analysis of constitutional abnormalities, RNA-Seq confirmed the pathogenicity of three SVs detected by OGM by demonstrating aberrant expression of the affected genes [69]. This integrated approach solved three previously undiagnosed cases of neurodevelopmental disorders, including a deletion encompassing the promoter and 5'UTR of MBD5 and an intragenic duplication of PAFAH1B1 [69].
The synergy between these technologies is particularly evident in cancer research, where both DNA-level rearrangements and their functional RNA consequences are critical for understanding disease mechanisms and identifying therapeutic targets. The combination provides a complete picture from structural variant to functional outcome, enabling more accurate diagnosis and targeted treatment strategies.
RNA-Seq, DNA-Seq, and OGM offer complementary approaches for gene fusion detection in cancer research. RNA-Seq excels at identifying expressed fusion transcripts with direct therapeutic relevance, while OGM provides comprehensive genome-wide structural variant detection, particularly for cryptic rearrangements and enhancer hijacking events. DNA-Seq remains valuable for detecting coding variants and copy number changes. The integration of these technologies, through workflows and protocols outlined in this application note, provides researchers with a powerful toolkit for comprehensive genomic characterization. This multi-platform approach maximizes detection of clinically actionable variants, ultimately advancing precision oncology and drug development efforts.
The accurate detection of gene fusions is a critical component of precision oncology, directly influencing diagnostic stratification and therapeutic decisions. While fluorescence in situ hybridization (FISH) and reverse transcription-polymerase chain reaction (RT-PCR) have long been the standard methods in clinical practice, RNA sequencing (RNA-seq) presents a powerful alternative with the potential for higher multiplexing capability and the discovery of novel fusion partners. This document details the experimental protocols and summarizes key concordance data from recent clinical studies that validate targeted RNA-seq against FISH and RT-PCR across various patient cohorts and cancer types. The evidence supports the integration of RNA-seq into comprehensive genomic profiling workflows to enhance the detection of clinically actionable gene fusions.
Recent validation studies across different solid tumors and hematological malignancies have consistently demonstrated high concordance between RNA-based next-generation sequencing (NGS) and traditional methods.
Table 1: Concordance of RNA-seq with Orthogonal Methods in Clinical Cohorts
| Cancer Type / Cohort | RNA-seq Method | Orthogonal Method | Key Concordance Findings | Citation |
|---|---|---|---|---|
| Diverse Solid Tumors (n=160) | FoundationOneRNA (Targeted) | Orthogonal NGS (DNA- or RNA-based) | PPA: 98.28%; NPA: 99.89% | [70] [50] |
| Early-stage NSCLC (RET+) (n=39) | Whole-Transcriptome Sequencing (WTS) | FISH | Concordance: 84.6% | [38] |
| Early-stage NSCLC (RET+) (n=40) | DNA-seq | FISH | Concordance: 82.5% | [38] |
| Broad Clinical Cancer Cohort | Targeted RNA-seq (FusionPlex) | FISH & RT-PCR | Increased diagnostic rate to 76%, from 63% with conventional methods alone | [51] |
| Hematological Malignancies (n=21) | Targeted RNA-seq (FusionPlex) | Whole-Transcriptome Sequencing | Concordance for known fusions: 100% | [71] |
PPA: Positive Percent Agreement; NPA: Negative Percent Agreement
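PPA and NPA are computed from a two-by-two comparison of the assay under evaluation against the orthogonal method, which is treated as the comparator rather than as ground truth. The following minimal sketch shows the calculation; the function name `agreement` and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def agreement(both_pos, assay_only_pos, comparator_only_pos, both_neg):
    """Return (PPA, NPA) as fractions.

    PPA = both positive / all comparator-positive calls
    NPA = both negative / all comparator-negative calls
    """
    comparator_pos = both_pos + comparator_only_pos
    comparator_neg = both_neg + assay_only_pos
    if comparator_pos == 0 or comparator_neg == 0:
        raise ValueError("comparator must have both positive and negative calls")
    return both_pos / comparator_pos, both_neg / comparator_neg


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    ppa, npa = agreement(both_pos=45, assay_only_pos=2,
                         comparator_only_pos=5, both_neg=948)
    print(f"PPA = {ppa:.2%}, NPA = {npa:.2%}")
```

Reporting agreement instead of sensitivity and specificity is the usual choice when the comparator is itself an imperfect method such as FISH or another NGS assay.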
These studies underscore the robustness of RNA-seq. The FoundationOneRNA assay demonstrated exceptional accuracy in a diverse set of clinical solid tumor specimens [70] [50]. In a focused study on early-stage non-small cell lung cancer (NSCLC) with RET fusions, RNA-seq showed high concordance with FISH, comparable to that of DNA-seq [38]. Furthermore, targeted RNA-seq has been shown to increase the overall diagnostic yield of fusion genes in a clinical cohort, identifying actionable fusions that were missed by conventional testing algorithms [51].
This protocol is adapted from validation studies that compared RNA-seq with FISH and RT-PCR using existing patient samples [38] [70].
This protocol outlines the steps for determining the analytical sensitivity (Limit of Detection) and precision of an RNA-seq fusion assay, which is critical for clinical application [70].
The diagram below illustrates the end-to-end process for validating RNA-seq for fusion detection against standard methods.
This diagram outlines the decision logic for when different molecular methods are most applicable and how their results complement each other.
Table 2: Essential Reagents and Kits for RNA-seq Fusion Detection Studies
| Item | Function/Application | Example Products |
|---|---|---|
| Nucleic Acid Co-Extraction Kit | Simultaneous purification of DNA and RNA from a single FFPE sample, preserving the limited specimen. | AllPrep DNA/RNA FFPE Kit (Qiagen) [13] |
| Targeted RNA-seq Panel | Biotinylated probe sets for enriching fusion-related transcripts prior to sequencing, increasing sensitivity. | FoundationOneRNA Panel [70], Custom Panels (e.g., 241-gene solid tumor panel) [51] |
| RNA Spike-in Controls | Synthetic RNA molecules added to samples to quantitatively assess sensitivity, enrichment efficiency, and detectability limits. | ERCC Spike-in Mix, Fusion Sequins [51] |
| Library Prep Kit | Prepares RNA sequencing libraries from input RNA, often optimized for degraded FFPE-derived RNA. | TruSeq Stranded mRNA Kit (Illumina) [13], SureSelect XTHS2 RNA Kit (Agilent) [13] |
| Bioinformatic Tools | Software packages for aligning sequencing reads and calling fusion events with high specificity. | STAR-Fusion, FusionCatcher [51] |
| Orthogonal Validation Assay | Independent method to confirm novel or uncertain fusion calls identified by RNA-seq. | Digital PCR Assays [71], FISH Probes [70] |
The protocols and data summarized herein provide a robust framework for the clinical validation of RNA sequencing in the detection of gene fusions. The high concordance rates with established methods like FISH and RT-PCR, combined with the inherent advantages of multiplexing and novel fusion discovery, position targeted RNA-seq as an indispensable tool in modern oncology. Its integration into comprehensive genomic profiling, often alongside DNA-based analysis, ensures the most complete molecular characterization of tumors, ultimately facilitating personalized treatment strategies and improving patient outcomes.
The emergence of tumour-agnostic therapeutic paradigms, where treatment is guided by molecular alterations rather than tissue of origin, has fundamentally shifted diagnostic requirements in clinical oncology [73]. This precision medicine approach creates a pressing need for comprehensive genomic profiling that can reliably detect a broad spectrum of biomarker classes across diverse cancer types. While DNA-based next-generation sequencing (NGS) has become a cornerstone of molecular diagnostics, it possesses inherent limitations for detecting certain structural variants and expressed biomarkers. The integration of RNA sequencing with DNA analysis presents a powerful solution to these limitations, significantly expanding the diagnostic capability available to researchers, scientists, and drug development professionals. This application note synthesizes recent pan-cancer evidence demonstrating the substantial added diagnostic yield of combined RNA/DNA sequencing, provides validated experimental protocols for implementation, and illustrates key bioinformatic workflows essential for clinical-grade fusion detection.
DNA-based sequencing approaches, whether employing targeted panels or whole exome sequencing, effectively detect single nucleotide variants (SNVs), small insertions/deletions (indels), and copy number variations (CNVs). However, they face inherent challenges in identifying gene fusions, exon-skipping events, and other expressed alterations that are critical therapeutic targets.
The fundamental limitation stems from the nature of genomic rearrangements. DNA sequencing relies on sufficient coverage across breakpoint regions, which can be expansive and unpredictable. For instance, DNA-based assays may miss functionally critical fusions due to several factors [27]:
These limitations are not merely theoretical. In one validation study, an RNA sequencing assay identified a previously missed MET fusion in a clinical sample that had been characterized as fusion-negative by a DNA panel. This fusion was subsequently confirmed by RT-PCR and Sanger sequencing, revealing a false-negative result in the DNA-only approach [27].
Table 1: Limitations of DNA-Only Sequencing for Fusion Detection
| Limitation Factor | Impact on Detection | Clinical Consequence |
|---|---|---|
| Breakpoints in non-coding regions | Inaccessible to targeted panels | Missed actionable fusions (e.g., NTRK, RET) |
| Large intronic regions | Incomplete coverage of potential breakpoints | Reduced sensitivity for known fusion drivers |
| Complex rearrangements | Difficult to reconstruct from short reads | Failure to detect novel fusion combinations |
| Expression ambiguity | Inability to confirm transcription of rearranged genes | Potential reporting of biologically irrelevant fusions |
The most consistently documented benefit of integrated RNA/DNA sequencing is the significant improvement in gene fusion detection. Evidence from large-scale clinical cohorts demonstrates that RNA sequencing complements DNA analysis by both verifying rearrangements identified at the DNA level and discovering additional fusions missed by DNA-only approaches.
In a comprehensive analysis of 2230 clinical tumor samples, the integration of whole exome sequencing (WES) with RNA sequencing improved the detection of gene fusions and enabled direct correlation of somatic alterations with gene expression profiles [13]. The combined approach recovered clinically actionable fusions that would have been missed by DNA-only testing, with complex genomic rearrangements particularly likely to remain undetected without RNA data.
Similarly, a pan-cancer study of 1166 tissue samples encompassing 29 cancer types utilized a comprehensive DNA/RNA profiling panel and found that 62.3% of samples harbored at least one actionable biomarker [73]. While the vast majority of somatic variants (4.6%) were identified through DNA analysis, a critical 0.1% of significant findings—particularly fusions—were exclusively detected via RNA sequencing. This study further identified at least one tumour-agnostic biomarker (including MSI-high, TMB-high, NTRK/RET fusions, and BRAF V600E) in 8.4% of samples across 26 different cancer types, highlighting the importance of comprehensive profiling for tumour-agnostic treatment strategies.
Table 2: Added Diagnostic Yield of Combined RNA/DNA Sequencing in Pan-Cancer Studies
| Study Cohort | Detection Method | Key Findings on Added Yield | Clinical Impact |
|---|---|---|---|
| 2230 clinical tumor samples [13] | WES + RNA-seq | Improved fusion detection; Recovery of variants missed by DNA-only; Complex rearrangement identification | Actionable alterations found in 98% of cases |
| 1166 Asian cohort samples (29 cancer types) [73] | DNA/RNA CGP panel | 0.1% of significant findings exclusively from RNA; Tumor-agnostic biomarkers in 8.4% across 26 cancers | 62.3% samples had actionable biomarkers |
| 60 FFPE solid tumors [11] | Integrated DNA/RNA NGS | Additional TPM3::NTRK1 fusion identified; 100% sensitivity after calibration | Complementarity of DNA and RNA levels |
| 101 NSCLC samples [24] | Whole transcriptome sequencing | 68.9% of identified fusions were potentially actionable | Higher actionability rate in NSCLC |
The analytical performance of combined sequencing approaches has been rigorously validated against reference standards and orthogonal methods. In one development study, a custom-designed integrated DNA/RNA NGS assay accurately identified all 10 different fusion types in commercial reference standards and 29 fusions (including 16 different forms) in 60 clinical solid tumor samples [11]. The assay demonstrated 100% sensitivity and 96.9% specificity after identifying an additional TPM3::NTRK1 fusion that was initially missed by previous testing methods.
For fusion detection specifically, RNA sequencing alone has demonstrated exceptional performance characteristics. One whole transcriptome sequencing (WTS) assay successfully identified 62 out of 63 known gene fusions, achieving a sensitivity of 98.4% with 100% specificity across validation cohorts [24]. The same study established optimal RNA quality thresholds (DV200 ≥ 30%), input requirements (>100 ng), and sequencing depth (>80 million mapped reads) for reliable fusion detection in clinical samples.
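The three thresholds reported for that WTS assay (DV200 ≥ 30%, RNA input > 100 ng, and > 80 million mapped reads) lend themselves to an automated sample gate. The sketch below is one possible way to encode them; `SampleQC` and `qc_failures` are hypothetical names, and the thresholds apply to the cited assay [24] and may not transfer to other protocols.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record."""
    sample_id: str
    dv200_pct: float              # % of RNA fragments longer than 200 nt
    input_ng: float               # RNA input mass in nanograms
    mapped_reads_millions: float  # mapped reads, in millions


def qc_failures(s):
    """List which of the quoted thresholds a sample fails (empty list = pass)."""
    failures = []
    if s.dv200_pct < 30:
        failures.append("DV200 below 30%")
    if s.input_ng <= 100:
        failures.append("RNA input not above 100 ng")
    if s.mapped_reads_millions <= 80:
        failures.append("mapped reads not above 80M")
    return failures


if __name__ == "__main__":
    samples = [
        SampleQC("S1", dv200_pct=45, input_ng=150, mapped_reads_millions=95),
        SampleQC("S2", dv200_pct=22, input_ng=150, mapped_reads_millions=60),
    ]
    for s in samples:
        status = qc_failures(s) or ["PASS"]
        print(s.sample_id, "; ".join(status))
```

Returning the list of failed criteria, rather than a single boolean, makes it easier to decide whether a sample should be re-extracted, re-sequenced, or reported with a caveat.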
The clinical actionability of findings from combined sequencing is particularly noteworthy. In non-small cell lung cancer (NSCLC), where fusion drivers are well-established, 68.9% of fusions identified by WTS were potentially actionable [24]. This high actionability rate underscores the therapeutic importance of comprehensive fusion detection, particularly in molecularly-defined cancer subtypes.
Sample Requirements:
Extraction Protocol:
Quality Control Thresholds:
Dual-Stranded RNA Library Preparation:
Whole Exome Library Preparation:
Sequencing Parameters:
Figure 1: Integrated RNA and DNA Sequencing Workflow. The protocol encompasses co-extraction, quality control, library preparation, and integrated bioinformatic analysis for comprehensive genomic profiling.
Reference Genome: GRCh38/hg38 with alt-aware alignment [13]
DNA Sequencing Analysis:
RNA Sequencing Analysis:
Somatic Variant Calling (DNA):
RNA-Based Fusion Detection:
Integrated Analysis:
Figure 2: Bioinformatic Pipeline for Integrated DNA and RNA Analysis. Parallel analysis of DNA and RNA sequencing data followed by integrated interpretation enhances variant detection and clinical actionability.
Table 3: Essential Research Reagents for Combined RNA/DNA Sequencing
| Reagent/Category | Specific Examples | Function & Application |
|---|---|---|
| Nucleic Acid Extraction | AllPrep DNA/RNA FFPE Kit (Qiagen); RNeasy FFPE Kit (Qiagen) | Co-extraction of DNA and RNA while preserving quality; Specialized for challenging FFPE samples [13] [24] |
| Library Preparation | TruSeq stranded mRNA kit (Illumina); NEBNext Ultra II Directional RNA Library Prep Kit; SureSelect XTHS2 | Construction of sequencing libraries; Maintains strand specificity; Compatible with degraded RNA [13] [24] |
| Target Enrichment | SureSelect Human All Exon V7 + UTR (Agilent); Custom targeted panels | Capture of exonic regions and untranslated regions; Focus on clinically relevant genes [13] [73] |
| RNA Depletion | NEBNext rRNA Depletion Kit (Human/Mouse/Rat) | Removal of ribosomal RNA to enrich for mRNA; Improves sequencing efficiency [24] |
| Reference Materials | GeneWell fusion reference standards; Commercial FFPE reference materials | Analytical validation; Quality control; Establishing limits of detection [11] |
| Bioinformatic Tools | GFvoter; LongGF; JAFFAL; FusionSeeker; Strelka2; STAR | Specialized fusion detection; Somatic variant calling; Splice-aware alignment [13] [9] |
Gene fusions identified through combined RNA/DNA sequencing frequently drive oncogenesis through constitutive activation of key signaling pathways. Understanding these pathways is essential for interpreting the functional significance of fusion events and identifying potential therapeutic vulnerabilities.
The most clinically significant fusions typically function as oncogenic drivers through several mechanisms:
For example, MET exon 14 skipping alterations—detectable through RNA sequencing—lead to increased MET protein stability and subsequent activation of downstream RAS-RAF-MEK-ERK and PI3K/AKT pathways, promoting tumor growth and proliferation [24].
Figure 3: Signaling Pathways Activated by Oncogenic Fusions. Gene fusions detected through combined RNA/DNA sequencing drive oncogenesis through constitutive activation of key proliferative and survival pathways.
The accumulated pan-cancer evidence unequivocally demonstrates that combined RNA/DNA sequencing provides substantial added diagnostic yield compared to DNA-only approaches. This integrated methodology enhances detection of clinically actionable gene fusions, resolves ambiguous findings from DNA sequencing alone, and provides a more comprehensive molecular portrait of individual tumors. The protocols and analytical frameworks presented herein provide researchers and drug development professionals with validated methodologies for implementing this powerful approach in both basic research and clinical translation contexts. As precision medicine continues to evolve toward increasingly biomarker-driven treatment paradigms, the integration of transcriptional data with genomic profiling will become increasingly essential for optimizing therapeutic strategies and advancing drug development.
Gene fusions are pivotal drivers in oncogenesis, serving as critical biomarkers for cancer diagnosis, prognosis, and targeted therapy development [25] [76]. The detection of these chimeric transcripts has been revolutionized by RNA sequencing (RNA-seq), with both short-read and long-read technologies offering distinct advantages and challenges for accurate fusion calling [77] [78]. This application note provides a comprehensive evaluation of bioinformatics tools for fusion detection across sequencing platforms, presenting structured performance comparisons, detailed experimental protocols, and standardized workflows to guide researchers in selecting and implementing optimal fusion calling strategies in cancer research.
The evolution of sequencing technologies has fundamentally transformed fusion detection paradigms. Short-read RNA-seq from Illumina platforms has enabled cost-effective, high-throughput profiling but struggles to resolve complex fusion isoforms and breakpoints due to read length limitations [25]. Conversely, long-read technologies from PacBio and Oxford Nanopore Technologies (ONT) capture full-length transcript sequences, providing unambiguous fusion isoform characterization but historically facing higher error rates and throughput challenges [78] [79]. Recent advancements in both platforms have significantly improved accuracy and throughput, making long-read fusion detection increasingly viable for clinical and research applications [25] [79].
Multiple comprehensive studies have evaluated the performance of fusion detection tools for short-read RNA-seq data. Table 1 summarizes the key performance metrics of leading tools based on benchmarking studies involving simulated data and real cancer cell line RNA-seq [77].
Table 1: Performance Comparison of Short-Read Fusion Detection Tools
| Tool | Sensitivity | Precision | Execution Speed | Key Strengths |
|---|---|---|---|---|
| Arriba | High | High | Fast (minutes) | Excellent sensitivity for driver fusions; fast runtime [77] [80] |
| STAR-Fusion | High | High | Fast | Robust alignment-based approach; user-friendly output [77] |
| STAR-SEQR | High | High | Fast | Integrates with STAR aligner; good performance on real data [77] |
| FusionCatcher | Moderate | Moderate | Moderate | Comprehensive filtering; detects wide range of fusion types [81] |
| JAFFA | Moderate | High | Slow | Assembly-based approach; good precision [77] |
| FusionMap | Moderate | Moderate | Moderate | Windows-compatible; reference comparison tool [81] |
Benchmarking analyses reveal that Arriba, STAR-Fusion, and STAR-SEQR consistently demonstrate superior accuracy and efficiency for fusion detection in cancer transcriptomes [77]. These tools effectively identified clinically relevant driver fusions in pancreatic cancer samples, including ALK, BRAF, FGFR2, NRG1, NTRK1, NTRK3, RET, and ROS1 fusions, which were significantly associated with KRAS wild-type tumors and involved proteins stimulating the MAPK signaling pathway [80]. When applied to a large collection of published pancreatic cancer samples (n = 803), Arriba specifically identified various driver fusions affecting druggable proteins [80].
The development of specialized tools for long-read data has accelerated with improvements in PacBio and ONT technologies. Table 2 presents performance metrics for long-read fusion callers based on benchmarking with simulated and genuine long-read RNA-seq [25] [78].
Table 2: Performance Comparison of Long-Read Fusion Detection Tools
| Tool | Platform | Sensitivity | Precision | Unique Capabilities |
|---|---|---|---|---|
| CTAT-LR-Fusion | PacBio, ONT | High | High | Bulk and single-cell support; short-read integration [25] |
| JAFFAL | PacBio, ONT | High | Moderate | Comprehensive fusion annotation; filters false positives [78] |
| LongGF | PacBio, ONT | Moderate | High | Good for full-length fusion transcripts [25] [78] |
| FusionSeeker | PacBio, ONT | Moderate | Moderate | Specialized for isoform sequencing [25] [78] |
| Anchored-fusion | Short-read | High | High | Targeted detection; deep learning filtering [76] |
Recent benchmarking demonstrates that CTAT-LR-Fusion exceeds the fusion detection accuracy of alternative methods on both simulated and genuine long-read RNA-seq data [25]. The tool's modularized software includes chimeric read extraction, fusion transcript identification, expression quantification, gene fusion annotation, and interactive visualization capabilities [25]. When combining short and long reads in CTAT-LR-Fusion, researchers can maximize the detection of fusion splicing isoforms and fusion-expressing tumor cells in both bulk and single-cell RNA-seq applications [25].
For challenging detection scenarios involving sequence homology, Anchored-fusion employs a novel approach that anchors a user-specified gene of interest and incorporates a hierarchical view learning and distillation (HVLD) deep learning framework to filter false positives while maintaining sensitivity [76]. This method is particularly valuable for detecting fusion genes with low sequencing depth, such as in single-cell and clinical contexts [76].
Integrating multiple tools and sequencing technologies can significantly enhance fusion detection accuracy. A benchmarking study of eight long-read structural variant callers demonstrated that combining multiple tools and testing different combinations substantially improves validation of somatic alterations in cancer genomes [82]. The study employed Sniffles, cuteSV, Delly, DeBreak, Dysgu, NanoVar, SVIM, and Severus on paired tumor and matched normal samples from lung cancer and melanoma cell lines, revealing that different tools have complementary strengths for various variant types [82].
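One simple way to combine callers is a voting scheme that keeps a fusion only when a minimum number of tools report it. The sketch below illustrates that idea for gene-pair calls; it is an assumption made for illustration, not the procedure used in the cited benchmark [82], and the input format (lists of 5'/3' gene symbol pairs per tool) and the example calls are hypothetical.

```python
from collections import defaultdict


def normalize(gene5, gene3):
    """Canonical fusion label: upper-case symbols joined in 5'::3' order."""
    return f"{gene5.strip().upper()}::{gene3.strip().upper()}"


def consensus_calls(calls_by_tool, min_tools=2):
    """Keep fusions reported by at least `min_tools` distinct callers.

    calls_by_tool maps a tool name to an iterable of (gene5, gene3) pairs.
    Returns {fusion_label: sorted list of supporting tools}.
    """
    support = defaultdict(set)
    for tool, pairs in calls_by_tool.items():
        for gene5, gene3 in pairs:
            support[normalize(gene5, gene3)].add(tool)
    return {
        fusion: sorted(tools)
        for fusion, tools in support.items()
        if len(tools) >= min_tools
    }


if __name__ == "__main__":
    # Hypothetical calls, already parsed from each tool's own output format.
    calls = {
        "arriba": [("EML4", "ALK"), ("KIF5B", "RET")],
        "star_fusion": [("eml4", "alk")],
        "fusioncatcher": [("KIF5B", "RET"), ("TPM3", "NTRK1")],
    }
    for fusion, tools in consensus_calls(calls).items():
        print(fusion, tools)
```

Voting trades sensitivity for precision: a fusion seen by a single caller (here the hypothetical TPM3::NTRK1 call) is dropped, so single-tool calls in known driver genes may still deserve manual review.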
For comprehensive fusion characterization, a hybrid approach leveraging both short-read and long-read technologies provides optimal results. Short-read data offers higher sequencing depth for initial detection, while long-read data enables complete isoform resolution [25] [79]. This strategy is particularly valuable in single-cell contexts, where long-read sequencing retains transcripts shorter than 500 bp and enables removal of degraded cDNA contaminated by template switching oligos, artifacts identifiable only from full-length transcripts [79].
The following protocol outlines a comprehensive workflow for fusion detection from RNA-seq data, adaptable to both short-read and long-read platforms.
RNA Extraction and Quality Control
Library Preparation Optimization
Platform-Specific Considerations
The computational workflow for fusion detection involves sequential steps from raw data processing to final fusion calling, with platform-specific considerations at each stage.
Short-Read Data
Long-Read Data
Short-Read Alignment
Long-Read Alignment
Fusion Calling Execution
Result Filtering and Annotation
Visual Validation
Table 3: Key Research Reagents and Materials for Fusion Detection Studies
| Reagent/Material | Function | Example Products |
|---|---|---|
| RNA Extraction Kits | High-quality RNA isolation with DNA removal | Qiagen RNeasy Plus, Zymo Quick-RNA |
| RNA Integrity Tools | Assess RNA quality for library preparation | Agilent Bioanalyzer RNA Nano, TapeStation |
| Library Prep Kits | Platform-specific cDNA library construction | Illumina Stranded mRNA Prep, ONT PCR-cDNA Kit, PacBio MAS-ISO-seq Kit |
| UMI Adapters | Unique molecular identifiers for duplicate removal | IDT UMI adapters, ONT UMI kits |
| Reverse Transcriptase | High-processivity cDNA synthesis | SuperScript IV, Induro RT (for direct RNA) |
| Exonuclease I | Prevent internal priming in cDNA synthesis | NEB Exonuclease I [78] |
| Reference Standards | Positive controls for fusion detection | Universal Human Reference RNA, Seraseq Fusion RNA standards |
Table 4: Essential Computational Resources for Fusion Analysis
| Resource Type | Specific Tools/Resources | Application Context |
|---|---|---|
| Quality Control | FastQC, MultiQC, NanoPlot | Pre-alignment quality assessment |
| Alignment Tools | STAR, minimap2, HISAT2 | Read mapping to reference genome |
| Fusion Callers | Arriba, STAR-Fusion, CTAT-LR-Fusion, JAFFAL | Platform-specific fusion detection |
| Annotation Databases | COSMIC, Mitelman, ChimerDB | Biological and clinical interpretation |
| Visualization Tools | IGV, IGV.js, FusionInspector | Result validation and exploration |
| Benchmarking Sets | COLO829 truth set, UHRR with spike-ins | Performance validation [82] [78] |
The evolving landscape of fusion detection tools presents researchers with multiple robust options for identifying cancer-relevant gene fusions across sequencing platforms. Short-read methods like Arriba and STAR-Fusion offer speed and sensitivity for large-scale studies, while long-read tools like CTAT-LR-Fusion provide unparalleled resolution of fusion isoforms and breakpoints. The optimal fusion detection strategy often involves combining multiple tools and sequencing technologies to leverage their complementary strengths, with careful attention to platform-specific library preparation and computational analysis parameters.
As sequencing technologies continue to advance, with improvements in read length, accuracy, and throughput, fusion detection methodologies will likely converge towards long-read-dominated workflows that provide complete transcriptome characterization. The integration of machine learning approaches, as demonstrated by Anchored-fusion's HVLD framework, represents a promising direction for enhancing detection sensitivity while controlling false positives. By adopting the standardized protocols and performance benchmarks outlined in this application note, researchers can implement reliable, reproducible fusion detection pipelines to advance cancer genomics research and precision oncology applications.
RNA sequencing has firmly established itself as an indispensable tool for the precise detection of oncogenic gene fusions, directly impacting cancer diagnosis and the expansion of targeted therapeutic options. Integrating RNA-seq with DNA-based NGS is critical: a real-world pan-cancer study found that adding RNA-seq increased the detection of clinically actionable fusions by 21% compared with DNA-seq alone. While targeted panels offer sensitive and cost-effective clinical profiling, long-read sequencing presents a promising frontier for resolving complex rearrangements. Future work should focus on standardizing bioinformatic pipelines, validating assays across diverse cancer types, and further integrating RNA-seq data into clinical trial designs to unlock the full potential of precision medicine. This multi-faceted approach will ultimately improve patient outcomes by ensuring that fusion-driven cancers are accurately identified and matched with effective therapies.