NGS for Microsatellite Instability Testing: A Comprehensive Review of Performance, Applications, and Clinical Validation

Jackson Simmons, Nov 26, 2025

This article provides a comprehensive analysis of next-generation sequencing (NGS) for detecting microsatellite instability (MSI), a critical biomarker for immunotherapy response and Lynch syndrome identification. Tailored for researchers, scientists, and drug development professionals, it explores the foundational biology of MSI, details the development and application of novel NGS algorithms and panels, addresses key troubleshooting and optimization challenges, and presents extensive validation data comparing NGS performance against traditional methods like immunohistochemistry (IHC) and PCR. The synthesis of recent large-scale studies offers crucial insights for implementing NGS-MSI in clinical research and therapeutic development.

The Biological and Clinical Foundation of Microsatellite Instability

The DNA mismatch repair (MMR) system is a critical guardian of genomic stability, responsible for correcting base-base mismatches and insertion-deletion loops (IDLs) that arise during DNA replication [1]. This sophisticated repair mechanism involves a coordinated complex of proteins including MutS homologs (MSH2, MSH3, MSH6) and MutL homologs (MLH1, PMS2, MLH3) that sequentially detect, excise, and repair mismatched DNA regions [1]. The MMR process begins with error recognition by MutS heterodimers—MutSα (MSH2-MSH6) for base mismatches and small IDLs, and MutSβ (MSH2-MSH3) for larger IDLs [1]. Following recognition, MutL complexes, particularly MutLα (MLH1-PMS2), are recruited to initiate the excision and repair process [1].

When the MMR system becomes compromised through genetic, epigenetic, or acquired factors, the resulting deficiency (dMMR) leads to the accumulation of replication errors, particularly in microsatellite regions [1]. Microsatellites, also known as short tandem repeats (STRs), consist of short repetitive DNA sequences (1-6 base pair units) distributed throughout the genome that are particularly prone to replication errors due to their repetitive structure [2] [3]. The failure to repair these errors results in microsatellite instability (MSI), characterized by length alterations in microsatellite regions that are detectable as novel alleles not present in corresponding germline DNA [4].

MSI has emerged as a crucial biomarker in oncology with significant implications for cancer diagnosis, prognosis, and treatment selection. Tumors exhibiting high levels of MSI (MSI-H) demonstrate markedly increased mutational burden and enhanced responsiveness to immune checkpoint inhibitors [5] [6]. This comprehensive review examines the molecular mechanisms underlying MMR deficiency, explores the genomic consequences of MSI, and provides an objective comparison of current methodologies for MSI detection with particular emphasis on next-generation sequencing (NGS) performance characteristics.

Molecular Mechanisms of Mismatch Repair Deficiency

The Canonical MMR Pathway

The human MMR system operates through a highly conserved mechanism that ensures replication fidelity by correcting mispaired bases and small insertion-deletion loops. The process begins with error recognition by MutS homolog complexes that scan newly synthesized DNA [1]. The MSH2-MSH6 (MutSα) heterodimer specializes in detecting single base-base mismatches and small IDLs (1-2 nucleotides), while MSH2-MSH3 (MutSβ) identifies larger IDLs (up to approximately 16 nucleotides) [1]. Upon binding to mismatched DNA, MutS complexes undergo conformational changes facilitated by ATP binding, forming a sliding clamp that recruits downstream repair effectors [1].

Following mismatch recognition, MutL homolog complexes are recruited to initiate the excision phase. The primary heterodimer MLH1-PMS2 (MutLα) interacts with the mismatch-bound MutS complex in an ATP-dependent manner and introduces single-strand nicks in the newly synthesized DNA strand [1]. The ability to distinguish the nascent strand from the parental template is critical for repair fidelity and is facilitated by pre-existing nicks in the lagging strand or through interactions with replication factors such as proliferating cell nuclear antigen (PCNA) on the leading strand [1].

The excision process is mediated by exonuclease 1 (EXO1), which removes the DNA segment containing the mismatch in a 5'→3' direction [1]. EXO1 activity is tightly regulated to prevent excessive DNA degradation. During excision, replication protein A (RPA) binds to the resulting single-stranded DNA to prevent secondary structure formation and protect against further degradation [1]. Following excision, DNA polymerase δ resynthesizes the removed segment using the undamaged strand as a template, with PCNA enhancing polymerase processivity. The final nick is sealed by DNA ligase I, completing the repair process [1].

Figure 1: Canonical Mismatch Repair Pathway. The MMR system corrects replication errors through sequential recognition, excision, and resynthesis steps. MutS complexes (MutSα/MutSβ) recognize mismatches, MutLα initiates repair, EXO1 excises erroneous DNA, and polymerase δ/ligase I complete resynthesis and sealing. PCNA facilitates strand discrimination and polymerase processivity [1].

Mechanisms of MMR Deficiency

MMR deficiency can arise through multiple mechanisms, including germline mutations, somatic alterations, and epigenetic silencing. Germline mutations in MMR genes (particularly MLH1, MSH2, MSH6, and PMS2) are the hallmark of Lynch syndrome, which predisposes individuals to various cancers, most notably colorectal and endometrial carcinomas [1]. These inherited mutations follow Knudson's two-hit hypothesis: one allele is inactivated in the germline and the second is inactivated somatically in tumor tissue [1].

Somatic alterations in MMR genes represent another pathway to MMR deficiency, occurring exclusively in tumor cells rather than the germline. These include point mutations, small insertions/deletions, and copy number alterations that disrupt gene function [1]. Somatic hypermethylation of the MLH1 promoter represents a common epigenetic mechanism of MMR deficiency, particularly in sporadic colorectal cancers [6]. This methylation silences MLH1 expression, leading to loss of MutLα function and subsequent MSI [6].

Emerging research has revealed non-canonical functions of MMR proteins beyond replication error correction. Recent studies indicate that MMR pathway components participate in oxidative DNA damage repair, with synthetic lethal interactions observed between MMR genes and polymerase beta (POLβ) or polymerase gamma (POLγ) [1]. The MSH2/POLβ synthetic lethality associates with accumulation of 8-oxo-G lesions in nuclear DNA, while MLH1/POLγ deficiency leads to buildup of mitochondrial 8-oxo-G lesions [1]. MMR proteins also contribute to DNA demethylation processes through long-patch base excision repair mechanisms, processing deamination products like dU•G, 5ohU•G, T•G, and 5hmU•G mismatches that may trigger active DNA demethylation [1].

Genomic Consequences of MMR Deficiency

Microsatellite Instability and Mutational Signatures

The primary consequence of MMR deficiency is microsatellite instability, which serves as a phenotypic marker of dMMR. Microsatellites are particularly vulnerable to replication errors due to their repetitive nature, which promotes DNA polymerase slippage during replication [3]. In MMR-proficient cells, these slippage errors are efficiently corrected, but in dMMR cells, they persist as insertion or deletion mutations, altering the length of microsatellite regions [2].

MSI has a profound impact on the cancer genome, generating a hypermutator phenotype characterized by significantly elevated mutation rates compared to MMR-proficient tumors [5]. This hypermutation extends beyond microsatellite regions to include single nucleotide variants and indels throughout the genome [5]. The resulting increased tumor mutational burden (TMB) contributes to enhanced immunogenicity, as neoantigens derived from somatic mutations are recognized by the immune system [5] [6].

MMR-deficient tumors exhibit distinctive mutational signatures dominated by insertions and deletions at mononucleotide repeats, particularly polyA/T tracts [2]. This signature reflects the preferential activity of MutSα on single-base mismatches and single-nucleotide IDLs [1]. Specific genes containing coding microsatellites become frequent mutation targets in MSI-H tumors, including ACVR2A, TGFBR2, BAX, MSH3, and MSH6 [2]. The ACVR2A gene, which contains a polyA tract in its coding sequence, demonstrates particularly high mutation frequency in MSI-H tumors, with one study reporting the chr2:g.148683686del (ACVR2A: c.1310del) mutation in 66.6% of MSI-H cases [2].

Biological and Clinical Implications

The genomic consequences of MMR deficiency have significant biological and clinical implications. MSI-H tumors exhibit altered responses to conventional chemotherapeutic agents, with numerous studies demonstrating increased sensitivity to various DNA-damaging drugs [5]. In colorectal cancer, MSI-H cell lines show significantly lower IC50 values for 5-fluorouracil, oxaliplatin, and irinotecan compared to microsatellite stable (MSS) cell lines [5]. This enhanced sensitivity appears to be mediated through downregulation of DNA damage response pathways, particularly non-homologous end joining (NHEJ), in MSI-H colorectal cancers [5].

From a therapeutic perspective, the high mutational burden of MSI-H tumors generates abundant neoantigens that make these cancers particularly susceptible to immune checkpoint blockade [5] [6]. In 2017, pembrolizumab received landmark FDA approval as the first tissue-agnostic cancer treatment for MSI-H/dMMR solid tumors, establishing MSI status as a predictive biomarker for immunotherapy response across multiple cancer types [6]. This approval underscores the clinical importance of accurate MSI status determination in treatment selection.

MSI also serves as a valuable diagnostic marker for Lynch syndrome identification. Detection of MSI in tumors should prompt genetic counseling and testing for germline MMR gene mutations, enabling targeted cancer surveillance and risk-reducing interventions for affected individuals and family members [6].

Methodologies for MSI Detection: Comparative Performance Analysis

Traditional MSI Detection Methods

Immunohistochemistry (IHC) and polymerase chain reaction (PCR)-based fragment analysis have long been considered the gold standard methods for MSI detection in clinical diagnostics [2] [6]. IHC indirectly assesses MMR function by detecting nuclear expression of MLH1, MSH2, MSH6, and PMS2 proteins in tumor tissue [6]. Loss of expression of one or more MMR proteins suggests MMR deficiency and correlates strongly with MSI [6]. While IHC provides information about which specific MMR protein may be affected, it has limitations including susceptibility to interpretation errors due to heterogeneous staining and inability to detect non-truncating mutations that preserve antigenicity despite functional impairment [2].

PCR-based MSI testing directly evaluates microsatellite length alterations by comparing tumor DNA to matched normal DNA at specific microsatellite loci [2]. The most widely adopted approach uses a panel of five quasi-monomorphic mononucleotide repeats, which provides improved performance compared to the original Bethesda panel [2]. While PCR demonstrates high concordance with IHC (up to 97%), its application is primarily validated for colorectal cancers, and performance in other malignancies remains less established [2]. Additionally, PCR requires matched normal tissue for accurate interpretation and assesses only a limited number of loci [6].

Table 1: Comparison of Traditional MSI Detection Methods

| Parameter | Immunohistochemistry (IHC) | PCR-Based Fragment Analysis |
| --- | --- | --- |
| Principle | Detects MMR protein expression via antibodies | Detects length alterations in microsatellite loci |
| Target | MLH1, MSH2, MSH6, PMS2 proteins | 5-6 mononucleotide microsatellite loci |
| Advantages | Identifies affected MMR protein; no normal tissue required | Direct measurement of MSI; highly standardized |
| Limitations | Cannot detect non-truncating mutations; subjective interpretation | Requires matched normal tissue; limited loci |
| Concordance with Reference | ~97% with PCR [2] | Gold standard |
| Optimal Application | First-line screening; Lynch syndrome identification | Confirmatory testing; when IHC ambiguous |

Next-Generation Sequencing for MSI Detection

Next-generation sequencing has emerged as a comprehensive approach for MSI detection that offers several advantages over traditional methods [2] [6]. NGS-based MSI analysis evaluates dozens to hundreds of microsatellite loci simultaneously, providing broader genomic coverage than PCR-based methods [2]. This expanded coverage may improve sensitivity, particularly in non-colorectal cancers where traditional panels show reduced performance [2]. NGS platforms can simultaneously assess MSI status, tumor mutational burden, and specific genetic alterations in a single assay, maximizing information from limited tissue samples [6].

A key advantage of many NGS-based MSI approaches is that they do not require matched normal tissue: computational algorithms compare tumor microsatellite length distributions to reference models derived from stable samples [2] [7]. Multiple bioinformatics tools have been developed for NGS-based MSI determination, including MSIsensor, MSI-ColonCore, and MIAmS, which employ various algorithms to classify microsatellite (MS) status based on length distribution patterns at microsatellite loci [2] [7].

Validation studies have demonstrated strong overall concordance between NGS and traditional methods. One large-scale retrospective analysis of 35,563 pan-cancer cases reported high performance for an NGS-based algorithm (MSIDRL) using 100 carefully selected microsatellite loci [2]. Another study evaluating Illumina's targeted NGS panels (TruSight Tumor 170 and TruSight Oncology 500) in 331 cancer patients found high overall agreement with PCR (AUC=0.922), though discrimination between MSI-H and MSS cases was weaker in colorectal cancers (AUC=0.867) [6].
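The AUC values quoted above summarize how well a continuous NGS-derived MSI score separates reference-defined MSI-H cases from MSS cases. As a minimal sketch, assuming hypothetical scores and labels (not data from the cited studies), AUC can be computed as the probability that a randomly chosen MSI-H sample scores higher than a randomly chosen MSS sample:

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win
    (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical MSI scores (% unstable sites) with reference labels (1 = MSI-H).
example_scores = [2.1, 4.0, 9.5, 15.2, 30.1, 12.0, 6.3, 41.7]
example_labels = [0, 0, 1, 1, 1, 0, 0, 1]
print(roc_auc(example_scores, example_labels))
```

An AUC below 1.0 in this toy data comes from one MSS sample scoring above one MSI-H sample, which is the kind of overlap that motivates borderline ranges in practice.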

Table 2: Performance Characteristics of NGS-Based MSI Detection Across Studies

| Study | NGS Method | Sample Size | Cancer Types | Concordance with Reference | Limitations |
| --- | --- | --- | --- | --- | --- |
| Wang et al. (2025) [2] | MSIDRL (100 loci) | 35,563 | Pan-cancer | High overall performance | Retrospective design |
| Illumina Panel Study (2025) [6] | TruSight Tumor 170/TSO500 | 331 | Multiple (71.2% CRC) | AUC=0.922 overall; AUC=0.867 for CRC | Lower sensitivity in CRC |
| NSCLC Study (2025) [7] | MIAmS (7 loci) | 1,547 | Lung cancer | 100% for dMMR cases | Rare MSI in lung cancer (0.39%) |
| MSKCC Study [2] | NGS-MSI | Multiple | Multiple | 99.4% (CRC/endometrial); 96.6% (other) | Literature reference |

Artificial intelligence and machine learning approaches are being integrated into NGS-based MSI detection to improve classification accuracy. The MIAmS tool employs two independent classifiers (MSINGS and SVCpairs) that use machine learning models of length distributions for stable and unstable microsatellites, combining predictions to determine MS status with high confidence [7]. These computational advances address the challenge of overlapping score distributions between MSI-H and MSS cases that can complicate interpretation, particularly in colorectal cancers [6].

Experimental Protocols for MSI Detection

NGS-Based MSI Detection Workflow

The protocol for NGS-based MSI detection begins with sample preparation and DNA extraction from formalin-fixed paraffin-embedded (FFPE) tumor tissue [6] [7]. Specimens should contain sufficient tumor content (typically ≥30% tumor nuclei) as determined by pathological assessment [7]. DNA extraction is performed using standardized kits, such as the Maxwell RSC DNA FFPE Kit, followed by quantification using fluorometric methods like Qubit dsDNA High Sensitivity Assay [7].

Library preparation involves targeted enrichment of microsatellite loci and relevant cancer genes. For example, the Advanta Solid Tumor NGS Library Prep Assay enriches 7 microsatellite loci and 49 cancer-related genes (245 kb total) [7]. The process typically includes initial PCR amplification of target regions, purification with AMPure XP beads, followed by a second PCR to integrate sequencing adapters [7]. Libraries are then quantified, normalized, and sequenced on platforms such as Illumina NextSeq with 2×150 bp paired-end reads [7].

Bioinformatic analysis constitutes a critical component of NGS-based MSI detection. The process involves aligning sequencing reads to a reference genome, followed by microsatellite length analysis at designated loci [2] [7]. For each microsatellite locus, the algorithm calculates the proportion of reads deviating from the expected reference length, often defined by a "diacritical repeat length" (DRL) that maximizes separation between stable and unstable distributions [2]. The unstable locus count (ULC) or MSI score is then calculated by summing loci exceeding specific instability thresholds [2].
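The per-locus scoring described above can be sketched in a few lines. This is a simplified illustration, not the published algorithm: the locus names, read-length data, and the 0.3 fraction cutoff are hypothetical, and a locus is treated as unstable simply when the share of reads deviating from its reference repeat length exceeds that cutoff.

```python
from collections import Counter


def deviating_fraction(read_lengths, reference_length):
    """Fraction of reads whose repeat length differs from the reference length."""
    if not read_lengths:
        raise ValueError("locus has no covering reads")
    counts = Counter(read_lengths)
    return 1.0 - counts[reference_length] / len(read_lengths)


def unstable_locus_count(loci, fraction_cutoff=0.3):
    """Count loci whose deviating-read fraction exceeds the (assumed) cutoff."""
    return sum(
        1
        for lengths, ref in loci.values()
        if deviating_fraction(lengths, ref) > fraction_cutoff
    )


# Hypothetical loci: name -> (observed repeat length per read, reference length)
loci = {
    "locus_A": ([25] * 90 + [24] * 10, 25),               # 10% deviating
    "locus_B": ([26] * 40 + [23] * 35 + [22] * 25, 26),   # 60% deviating
    "locus_C": ([21] * 55 + [19] * 45, 21),               # 45% deviating
}
print(unstable_locus_count(loci))
```

Real pipelines replace the fixed cutoff with locus-specific baselines learned from stable samples, since background stutter differs between repeat tracts.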

Figure 2: NGS-Based MSI Detection Workflow. The process begins with DNA extraction from FFPE tissue, followed by library preparation targeting microsatellite loci, sequencing, and bioinformatic analysis. MSI classification is based on unstable locus count (ULC) relative to established cutoffs [2] [6] [7].

PCR-Based MSI Reference Method

The reference PCR-based method for MSI detection follows established protocols using panels of mononucleotide markers. The Promega MSI Analysis System, which employs five quasi-monomorphic mononucleotide repeats (BAT-25, BAT-26, NR-21, NR-24, and MONO-27), represents a widely adopted approach [2]. The protocol involves DNA extraction from paired tumor and normal tissue, PCR amplification with fluorescently labeled primers, capillary electrophoresis for fragment analysis, and size determination using genetic analyzers [2].

A marker is scored as unstable when the tumor DNA shows novel peaks that are absent from the corresponding normal pattern [2]. With the 5-marker panel, samples are classified as MSI-H when ≥2 markers show instability, MSI-L (low instability) when only one marker shows instability, and MSS when no unstable markers are detected [2]. This method benefits from extensive validation and standardization but requires matched normal tissue for accurate interpretation [2].
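The classification rule for the five-marker panel reduces to counting unstable markers. A minimal sketch follows; the function name and input format are illustrative assumptions, while the marker names and thresholds are those given in the text.

```python
PANEL_MARKERS = ("BAT-25", "BAT-26", "NR-21", "NR-24", "MONO-27")


def classify_pcr_msi(unstable_by_marker):
    """Classify a sample from per-marker instability calls (True = unstable)."""
    missing = [m for m in PANEL_MARKERS if m not in unstable_by_marker]
    if missing:
        raise ValueError(f"missing calls for markers: {missing}")
    n_unstable = sum(bool(unstable_by_marker[m]) for m in PANEL_MARKERS)
    if n_unstable >= 2:
        return "MSI-H"
    if n_unstable == 1:
        return "MSI-L"
    return "MSS"


# Hypothetical sample with two unstable markers.
sample = {"BAT-25": True, "BAT-26": True, "NR-21": False,
          "NR-24": False, "MONO-27": False}
print(classify_pcr_msi(sample))
```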

Table 3: Essential Research Reagents for MSI Detection Studies

| Category | Specific Product/Kit | Manufacturer/Provider | Application Notes |
| --- | --- | --- | --- |
| DNA Extraction | Maxwell RSC DNA FFPE Kit | Promega | Optimal for degraded FFPE-derived DNA [7] |
| DNA Quantification | Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Fluorometric quantification for NGS [7] |
| PCR-Based MSI | MSI Analysis System | Promega | Five mononucleotide markers; reference method [2] |
| NGS Library Prep | TruSight Oncology 500 | Illumina | Comprehensive profiling including MSI, TMB [6] |
| NGS Library Prep | Advanta Solid Tumor NGS | Fluidigm | 7 MS loci + 49 gene panel [7] |
| Sequencing | NextSeq 500/550 | Illumina | 2×150 bp for MSI analysis [7] |
| IHC Antibodies | Anti-MLH1 (clone ES05) | Agilent Technologies | MMR protein detection [7] |
| IHC Antibodies | Anti-MSH2 (clone FE11) | Agilent Technologies | MMR protein detection [7] |
| IHC Antibodies | Anti-MSH6 (clone EP49) | Agilent Technologies | MMR protein detection [7] |
| IHC Antibodies | Anti-PMS2 (clone EP51) | Agilent Technologies | MMR protein detection [7] |
| Bioinformatics Tools | MSIsensor | - | NGS-based MSI detection [2] |
| Bioinformatics Tools | MIAmS | - | Machine learning approach for MSI [7] |

The landscape of MSI detection has evolved significantly from traditional IHC and PCR methods to encompass sophisticated NGS-based approaches that provide comprehensive genomic profiling. While traditional methods maintain their role as established standards, NGS offers distinct advantages in terms of multiplexing capability, expanded genomic coverage, and elimination of the matched normal requirement. Performance validation across multiple studies demonstrates strong overall concordance between NGS and reference methods, though careful attention to platform-specific thresholds and analytical validation remains essential, particularly for non-colorectal malignancies.

The mechanistic understanding of MMR deficiency and its genomic consequences continues to expand, revealing novel biological insights with direct clinical relevance. As MSI solidifies its role as a predictive biomarker for immunotherapy response across diverse cancer types, the accurate and reliable detection of MSI status becomes increasingly critical in clinical decision-making. Future directions will likely focus on standardizing NGS-based MSI detection protocols, establishing consensus guidelines for interpretation, and further elucidating the non-canonical functions of MMR proteins in genome maintenance and cellular homeostasis.

MSI as a Pan-Cancer Predictive Biomarker for Immunotherapy

Microsatellite instability-high (MSI-H) status has emerged as a critical pan-cancer biomarker for predicting response to immune checkpoint inhibitors (ICIs), revolutionizing treatment across multiple cancer types. Microsatellites are short, repetitive DNA sequences prone to errors during replication, and MSI occurs when the DNA mismatch repair (MMR) system is compromised, leading to accumulated insertion/deletion mutations throughout the genome [2] [6]. The MSI-H phenotype results from deficient MMR (dMMR), which can arise from germline mutations (as in Lynch syndrome), somatic alterations, or epigenetic changes such as MLH1 promoter hypermethylation [6] [8].

A recent comprehensive meta-analysis of randomized clinical trials demonstrated that MSI-H status predicts exceptional benefit from immunotherapy across tumor types, with significantly improved progression-free survival (HR = 0.36, 95% CI 0.28-0.46; p < 0.0001) and overall survival (HR = 0.35, 95% CI 0.27-0.46; p < 0.00001) compared to chemotherapy [9]. This establishes MSI-H as a tumor-agnostic predictive biomarker supporting therapy selection regardless of cancer origin.

MSI Testing Methodologies: A Comparative Analysis

Multiple laboratory methods exist for determining MSI or MMR status, each with distinct technical approaches, strengths, and limitations. The following section provides a detailed comparison of these methodologies and their performance characteristics.

Comparison of MSI Detection Methods

Table 1: Comparison of primary methodologies for MSI/MMR status detection

| Method | Target | Mechanism | Advantages | Limitations |
| --- | --- | --- | --- | --- |
| Immunohistochemistry (IHC) | MMR proteins (MLH1, MSH2, MSH6, PMS2) | Detects loss of nuclear protein expression | Cost-effective, widely available, identifies specific deficient protein [6] [8] | Indirect measure of MSI; ambiguous staining interpretation challenges [6] [10] |
| PCR-based MSI | 5-6 mononucleotide microsatellite loci | Fragment length analysis of microsatellite markers | Direct measure of MSI; established gold standard [6] [10] | Requires matched normal tissue; limited loci; primarily validated for colorectal cancer [2] [6] |
| Next-Generation Sequencing (NGS) | Dozens to thousands of microsatellite loci | Sequencing-based analysis of length variations in repetitive regions | Comprehensive genomic profiling; no matched normal required; pan-cancer applicability [2] [6] [11] | Higher cost; lack of standardized thresholds across platforms [6] [10] |

Performance Metrics of NGS-Based MSI Detection

Multiple studies have evaluated the analytical performance of NGS-based MSI detection compared to traditional methods. The concordance rates vary depending on the specific NGS platform and tumor types evaluated.
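Concordance figures of this kind are typically derived from a 2×2 comparison of the NGS call against the reference method. The sketch below, using hypothetical counts rather than data from any cited study, computes overall, positive, and negative percent agreement:

```python
def agreement_metrics(tp, fp, fn, tn):
    """Agreement of a test method with a reference method from 2x2 counts.

    tp: MSI-H by both; fp: MSI-H by NGS only;
    fn: MSI-H by reference only; tn: MSS by both.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must include both MSI-H and MSS cases")
    total = tp + fp + fn + tn
    return {
        "OPA": (tp + tn) / total,   # overall percent agreement
        "PPA": tp / (tp + fn),      # positive percent agreement
        "NPA": tn / (tn + fp),      # negative percent agreement
    }


# Hypothetical validation cohort of 200 samples.
metrics = agreement_metrics(tp=45, fp=2, fn=3, tn=150)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because MSI-H is rare in most tumor types, overall agreement can look high even when positive agreement is modest, so reporting PPA and NPA separately is usually more informative.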

Table 2: Analytical performance of NGS-based MSI detection across validation studies

| Study | NGS Platform | Reference Method | Sample Size | Concordance | Notes |
| --- | --- | --- | --- | --- | --- |
| Caris Life Sciences [12] | WES/WTS-based assay | IHC-MMR | 191,767 solid tumors | 99.69% agreement | Largest concordance study; led to FDA approval as CDx |
| FoundationOne CDx [11] | F1CDx (2,000+ loci) | PCR & IHC | 264 (PCR); 279 (IHC) | 97.7% (PCR); 97.8% (IHC) | FDA-approved companion diagnostic for pembrolizumab |
| Illumina Panels [6] [10] | TruSight Oncology 500 | PCR | 314 tumors | AUC = 0.922 | Recommended cut-off: MSI score ≥13.8%; borderline range: 8.7%-13.8% |
| MSIDRL Algorithm [2] | 733-gene panel | PCR | 35,563 pan-cancer cases | High concordance | Novel algorithm; identified 7 optimal loci for pan-cancer MSI detection |
| Multi-Assay Comparison [8] | VariantPlex, AVENIO, TSO500 | IHC | 139 tumors | Strong correlation | 12/139 (8.6%) MSI-H; 2 MSI-H cases showed retained MMR protein expression |

Biological Basis of MSI Testing

[Figure: Molecular pathway connecting MMR deficiency to microsatellite instability and its implications for immunotherapy response]

Experimental Protocols for NGS-Based MSI Detection

FoundationOne CDx Fraction-Based MSI Analysis

The FoundationOne CDx (F1CDx) assay employs a fraction-based MSI analysis that has received FDA approval as a companion diagnostic for pembrolizumab in MSI-H solid tumors [11]. The detailed methodology includes:

  • Sample Requirements: Formalin-fixed paraffin-embedded (FFPE) tumor tissue with at least 20% nucleated tumor cells and minimum 50 ng DNA input [11]
  • Sequencing Method: Hybridization-based capture targeting 324 genes plus >2,000 microsatellite loci for comprehensive genomic profiling [11]
  • MSI Calculation: The fraction of unstable loci is calculated as (number of unstable microsatellite loci) / (number of evaluable microsatellite loci) [11]
  • Classification Thresholds:
    • MSI-H: FB-MSI score ≥ 0.0124
    • MSS: FB-MSI score ≤ 0.0041
    • MSI-Equivocal: FB-MSI score >0.0041 and <0.0124 [11]
  • Validation: Demonstrated 97.7% concordance with PCR and 97.8% with IHC across multiple solid tumor types [11]
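The fraction-based score and thresholds listed above translate directly into a small classifier. This is a sketch of the stated arithmetic only, not the vendor's implementation; the function name and example counts are hypothetical.

```python
MSI_H_MIN = 0.0124   # FB-MSI score >= this value: MSI-H
MSS_MAX = 0.0041     # FB-MSI score <= this value: MSS


def fraction_based_msi(n_unstable, n_evaluable):
    """Return (score, status) from unstable and evaluable locus counts."""
    if n_evaluable <= 0:
        raise ValueError("no evaluable microsatellite loci")
    if not 0 <= n_unstable <= n_evaluable:
        raise ValueError("unstable count must lie between 0 and evaluable count")
    score = n_unstable / n_evaluable
    if score >= MSI_H_MIN:
        status = "MSI-H"
    elif score <= MSS_MAX:
        status = "MSS"
    else:
        status = "MSI-Equivocal"
    return score, status


# Hypothetical samples, each with 2,000 evaluable loci.
for unstable in (5, 15, 40):
    print(unstable, fraction_based_msi(unstable, 2000))
```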

MSIDRL Novel Algorithm Development

A large-scale retrospective study of 35,563 Chinese pan-cancer cases introduced MSIDRL, a novel NGS-based MSI detection algorithm [2]:

  • Locus Selection: Initially selected the 500 most robust noncoding MS loci from colorectal circulating tumor DNA whole-exome sequencing assays [2]
  • Training Set: 105 pan-cancer FFPE samples with predefined MSI status (31 MSI-H and 74 MSI-L/MSS) by PCR [2]
  • Algorithm Development: For each locus, defined a "diacritical repeat length" (DRL) that maximizes cumulative read count difference between MSI-H and MSS samples [2]
  • Classification Metric: Unstable locus count (ULC) - the count of panel MS loci with binomial test p-values below established cutoffs [2]
  • Optimal Threshold: Determined ULC cutoff of 11 based on bimodal distribution observed across 35,563 cases [2]
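The ULC logic in the steps above can be illustrated with a per-locus one-sided binomial test. This is a hedged sketch rather than the published MSIDRL code: counting reads at or below the DRL, the per-locus background rate, the p-value cutoff of 1e-3, and calling MSI-H at ULC ≥ 11 are all assumptions made for illustration; only the ULC cutoff value of 11 comes from the text.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def count_unstable_loci(loci, p_cutoff=1e-3):
    """loci: iterable of (reads_at_or_below_drl, total_reads, background_rate)."""
    return sum(1 for k, n, p0 in loci if binom_sf(k, n, p0) < p_cutoff)


def classify_by_ulc(ulc, ulc_cutoff=11):
    """Assumed rule: ULC at or above the cutoff is called MSI-H."""
    return "MSI-H" if ulc >= ulc_cutoff else "MSS"


# Hypothetical sample: 12 loci with a clear excess of shortened reads
# and 5 loci consistent with the 5% background rate.
sample_loci = [(60, 200, 0.05)] * 12 + [(10, 200, 0.05)] * 5
ulc = count_unstable_loci(sample_loci)
print(ulc, classify_by_ulc(ulc))
```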

Illumina Panel Validation Protocol

A real-world evaluation of Illumina's targeted NGS panels established optimized workflows for routine molecular diagnostics [6] [10]:

  • Cohort Design: 331 tumor samples representing various cancer types, with 314 meeting quality thresholds (minimum 40 usable MSI sites) [10]
  • Reference Standard: Fluorescent multiplex PCR targeting six mononucleotide repeat markers [10]
  • ROC Analysis: Demonstrated overall AUC of 0.922, with tumor-type variability (colorectal cancer: AUC = 0.867; prostate cancer: AUC = 1.00) [10]
  • Optimal Cut-off: Recommended MSI score cut-off value of ≥13.8%, with borderline range (≥8.7% to <13.8%) requiring additional confirmation [10]
  • TMB Integration: For borderline cases, incorporation of tumor mutational burden significantly improved diagnostic accuracy [10]
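The quality threshold and cut-offs reported for this evaluation can be combined into a simple decision rule. The sketch below encodes only the numbers stated above; the function name, the "QC-fail" label, and the treatment of scores below 8.7% as MSS are assumptions, and the TMB-based refinement of borderline cases is not modeled.

```python
MIN_USABLE_SITES = 40     # minimum usable MSI sites for a valid result
MSI_H_CUTOFF = 13.8       # MSI score (%) at or above this: MSI-H
BORDERLINE_MIN = 8.7      # MSI score (%) from here up to the cutoff: borderline


def classify_msi_score(msi_score_pct, usable_sites):
    """Classify a sample from its MSI score (%) and usable site count."""
    if usable_sites < MIN_USABLE_SITES:
        return "QC-fail"
    if msi_score_pct >= MSI_H_CUTOFF:
        return "MSI-H"
    if msi_score_pct >= BORDERLINE_MIN:
        return "borderline"   # needs orthogonal confirmation (e.g. PCR or IHC)
    return "MSS"


# Hypothetical samples: (MSI score in %, usable sites)
for score, sites in [(2.5, 110), (10.2, 95), (27.4, 120), (30.0, 25)]:
    print(score, sites, classify_msi_score(score, sites))
```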

Essential Research Reagents and Tools

Table 3: Key research reagents and platforms for NGS-based MSI detection

| Category | Specific Products/Assays | Primary Function | Applications |
| --- | --- | --- | --- |
| NGS Panels | FoundationOne CDx, TruSight Oncology 500, AVENIO CGP Kit | Comprehensive genomic profiling with MSI assessment | Therapy selection, clinical trial enrollment [11] [6] [8] |
| MSI Algorithms | MSIsensor, MSIDRL, Fraction-Based Analysis | Bioinformatic tools for MSI classification from NGS data | MSI status determination, biomarker discovery [2] [11] |
| Reference Materials | Promega MSI Analysis System v1.2, Bethesda Panel | Gold standard reference methods for validation | Assay validation, concordance testing [11] [6] |
| IHC Reagents | MLH1 (ES05), MSH2 (FE11), MSH6 (EP49), PMS2 (EP51) antibodies | Protein expression analysis for MMR deficiency | Orthogonal confirmation, mechanism determination [8] |

Clinical Validation and Predictive Value

Immunotherapy Response Correlations

The clinical validity of MSI-H as a predictive biomarker for immunotherapy response is well established across multiple studies:

  • Objective Response Rates: In a retrospective analysis of the KEYNOTE-158 and KEYNOTE-164 trials, patients with MSI-H tumors determined by F1CDx demonstrated an objective response rate of 43.0% to pembrolizumab [11]
  • Real-World Outcomes: Analysis of Caris' database of over 190,000 patients confirmed that dMMR/MSI-H tumors had significantly better overall and post-immunotherapy survival compared to MMR-proficient/MSI-Stable tumors [12]
  • Pan-Cancer Efficacy: Subgroup analyses of the meta-analysis data highlighted significant PFS benefits across tumor types: colorectal (HR = 0.28), gastric (HR = 0.43), and endometrial cancers (HR = 0.34) [9]

Emerging Biomarkers and Complementary Approaches

While MSI-H remains a robust biomarker, research continues to refine prediction of immunotherapy response:

  • Multi-Omic Algorithms: Tempus' Immune Profile Score (IPS) demonstrates potential for stratifying ICI outcomes in rare cancers and microsatellite stable colorectal cancer [13]
  • Tumor Mutational Burden: Ultrahigh TMB (≥40 mutations/Mb) is associated with significantly improved survival outcomes in patients treated with ICIs across multiple cancer types [13]
  • MAIT Cell Abundance: Mucosal-associated invariant T cells show varied distribution across solid tumors and correlate with clinical factors including MSI status [13]

The comprehensive analysis of MSI testing methodologies demonstrates that NGS-based approaches provide accurate, reliable detection of MSI-H status across diverse cancer types. The high concordance with traditional methods (generally >95%), combined with the ability to simultaneously assess multiple genomic biomarkers, positions NGS as a valuable tool for comprehensive molecular profiling in both clinical and research settings.

While standardization challenges remain for NGS-based MSI detection thresholds across platforms, the accumulating evidence supports its clinical utility for identifying patients likely to benefit from immunotherapy. The integration of MSI status with complementary biomarkers such as TMB and transcriptomic signatures represents the future of precision immuno-oncology, enabling more precise patient stratification and optimized treatment selection.

Microsatellite instability-high (MSI-H) is a critical biomarker in oncology, resulting from a deficient DNA mismatch repair (dMMR) system. It leads to an accelerated accumulation of mutations in repetitive genomic sequences known as microsatellites. The clinical significance of MSI-H has expanded beyond its prognostic value to become a key predictor of response to immune checkpoint inhibitors across multiple cancer types. For researchers and drug development professionals, understanding the epidemiological landscape of MSI-H—its prevalence across different cancers and the methodologies for its detection—is fundamental to advancing precision oncology. This guide provides a comprehensive analysis of MSI-H prevalence data and objectively compares the performance of various detection technologies, with a specific focus on the evolving role of next-generation sequencing (NGS) in a pan-cancer context.

MSI-H Prevalence Across Major Cancer Types

The prevalence of MSI-H varies significantly across different cancer types. Table 1 consolidates data from large-scale studies to provide a comparative overview of MSI-H frequency in major cancers.

Table 1: MSI-H Prevalence Across Different Cancer Types

Cancer Type | MSI-H Prevalence | Data Source | Sample Size
Endometrial Cancer | 16.85% - 31.2% | Real-world PCR study [14], NGS study [15] | 1,389 (PCR)
Colorectal Cancer | 3.78% - 15%* | Real-world PCR study [14], CRC resource [16] | 10,226 (PCR)
Gastric Cancer | 6.74% - 8.6% | Real-world PCR study [14], NGS study [15] | 1,929 (PCR)
Small Intestinal Cancer | 8.63% | Real-world PCR study [14] | Not specified
Duodenal Cancer | 5.60% | Real-world PCR study [14] | Not specified
Adrenocortical Carcinoma | Notable (precise % not stated) | TCGA/TARGET analysis [17] | 92 (NGS)
Cervical Cancer | Notable (precise % not stated) | TCGA/TARGET analysis [17] | 305 (NGS)
Mesothelioma | Notable (precise % not stated) | TCGA/TARGET analysis [17] | 83 (NGS)
Breast Cancer | 0.63% | Plasma-based NGS study [18] | 6,718 (NGS)
All Solid Tumors (Pan-Cancer) | 3.72% | Real-world PCR study [14] | 26,237 (PCR)
39 Cancer Types (Pan-Cancer) | 3.8% | TCGA/TARGET analysis [17] | 11,139 (NGS)

*The 15% figure for colorectal cancer is cited by a patient resource [16], while the 3.78% comes from a real-world study that likely tested primarily advanced-stage patients [14], highlighting how prevalence can vary by clinical context.

The data show that MSI-H is most frequently observed in endometrial, colorectal, and gastric cancers. A large-scale real-world study of 26,237 samples using PCR-based testing found MSI-H in 30 different cancer types, with the highest frequencies in endometrial (16.85%), small intestinal (8.63%), and gastric (6.74%) cancers [14]. Furthermore, a recent NGS-based study of 35,563 Chinese pan-cancer cases classified cancers into epidemiological clusters, identifying uterine, gastric, and bowel cancers as common cancers with high MSI-H prevalence, together contributing approximately 80% of all MSI-H cases [15] [2].

It is crucial for researchers to note that prevalence can differ within cancer subtypes. For instance, the same NGS study found a statistically significant difference in MSI-H prevalence between colon cancer (10.66%) and rectal cancer (2.19%) [15]. Demographic factors also play a role; the real-world PCR study found a significantly higher MSI-H frequency in female patients (4.75%) compared to males (2.62%), and a higher frequency in younger patients (<40 years) and older patients (≥80 years) [14].

Understanding future epidemiological trends is vital for health system planning and resource allocation for targeted therapies. A statistical modelling study projected the prevalence of solid tumors with key pan-tumor biomarkers, including MSI, in Australia up to 2042 [19] [20].

The study estimated that the 5-year prevalence of individuals diagnosed with any solid cancer (regardless of biomarker status) will increase by 54.2%, from 438,346 in 2018 to 675,722 in 2042, driven largely by population growth and ageing [19]. Accordingly, the 5-year prevalence of individuals with advanced disease at diagnosis whose tumors exhibit MSI is projected to increase from 2,484 to 3,553 [19]. This represents about 2.3% of the 5-year prevalence of individuals with advanced disease at diagnosis, underscoring that while the absolute number is rising, MSI-H remains a biomarker present in a minority of advanced cancer patients [19] [20].
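
The growth figures quoted above can be re-derived with a few lines of arithmetic. The sketch below uses only the four counts reported from [19]; the variable and function names are illustrative and not part of the cited study.

```python
# Counts quoted in the text from the Australian projection study [19].
prevalence_2018 = 438_346      # 5-year prevalence, any solid cancer, 2018
prevalence_2042 = 675_722      # 5-year prevalence, any solid cancer, 2042
msi_advanced_2018 = 2_484      # advanced disease at diagnosis with MSI, 2018
msi_advanced_2042 = 3_553      # advanced disease at diagnosis with MSI, 2042


def percent_increase(start: float, end: float) -> float:
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100.0


all_cancer_growth = percent_increase(prevalence_2018, prevalence_2042)
msi_growth = percent_increase(msi_advanced_2018, msi_advanced_2042)

print(f"All solid cancers: {all_cancer_growth:.1f}% increase")
print(f"Advanced tumors with MSI: {msi_growth:.1f}% increase")
```

By this calculation the advanced MSI group grows by roughly 43%, somewhat less than the approximately 54% growth in overall 5-year prevalence.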

Detection Methodologies: A Technical Comparison

Accurate determination of MSI status is critical for therapeutic decision-making. The dominant methodologies are immunohistochemistry (IHC), polymerase chain reaction (PCR), and next-generation sequencing (NGS). Table 2 provides a technical comparison of these approaches.

Table 2: Comparison of MSI-H Detection Methodologies

Feature | IHC (Immunohistochemistry) | PCR (Polymerase Chain Reaction) | NGS (Next-Generation Sequencing)
Detection Principle | Indirect protein-level detection of MMR deficiency (MLH1, MSH2, MSH6, PMS2) [15] | Direct DNA-level detection of length shifts in microsatellite loci [15] | Direct DNA-level detection of length shifts and sequence variations in microsatellite loci [15] [21]
Typical Markers | 4 MMR proteins | 5-6 mononucleotide repeats (e.g., Promega panel) [15] [22] | Varies; from 7 [15] to 9 [21] to 100+ [15] loci
Tissue Requirements | Tumor tissue (normal not always required) | Paired tumor-normal tissue | Can use paired tumor-normal or tumor-only with sophisticated algorithms
Key Advantage | Low cost, widely available, identifies specific deficient protein | Established "gold standard," high concordance with IHC in CRC [15] | High sensitivity/specificity, pan-cancer applicability, integrates with other genomic profiling (TMB, SNVs) [15] [21]
Key Limitation | ~5-11% of MSI cases caused by non-truncating mutations can yield false negatives [15] | Primarily validated for colorectal cancer; performance in other cancers may be suboptimal [15] | Higher cost, computational complexity, bioinformatics expertise required
Reported Concordance with PCR | ~97% (in colorectal cancer) [15] | N/A | 99.4% in CRC/Endometrial; 96.6% in other cancers [15]

The following workflow diagram illustrates the general process for developing and validating an NGS-based MSI assay, as exemplified by recent studies:

NGS-Based MSI Assay Development Workflow

Experimental Protocols and Data Analysis

This section details specific experimental workflows from key studies to provide a reference for researchers designing similar validation protocols.

Large-Scale NGS Validation Protocol

A 2024 study analyzing 35,563 pan-cancer cases developed a novel NGS-based MSI detector called MSIDRL [15] [2]. The experimental protocol was as follows:

  • Locus Selection & Panel Design: Initially, 500 robust noncoding microsatellite loci were selected from circulating tumor DNA whole-exome sequencing assays. Capture probes were designed for these loci to create a prototype panel [15].
  • Algorithm Training: A training set of 105 pan-cancer FFPE samples with pre-defined MSI status (31 MSI-H, 74 MSI-L/MSS) via PCR was sequenced with the prototype panel. For each locus, a "diacritical repeat length" (DRL) was defined as the repeat length that maximized the cumulative read count difference between MSI-H and MSI-L/MSS samples [15] [2].
  • Statistical Classification: Reads for each locus in a sample were classified as "stable" (longer than DRL) or "unstable" (shorter than or equal to DRL). A binomial test was used to compare the sample's unstable read fraction against a background noise level calculated from MSI-L/MSS samples [15] [2].
  • Final Panel Construction: The top 100 most sensitive loci were selected to form the final panel. The final MSI status was determined by counting the number of unstable loci (ULC), with a ULC ≥11 classifying a sample as MSI-H [15] [2].
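
The classification logic described above can be sketched as follows. This is a minimal illustration, not the published MSIDRL implementation: the function names, the data layout (a dict of read repeat lengths per locus), and the significance level `alpha` are assumptions, since the text does not state the p-value cutoff used. Only the "shorter than or equal to DRL" rule, the binomial test against a background rate, and the ULC ≥11 threshold come from the description.

```python
from math import exp, lgamma, log


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def locus_is_unstable(read_lengths, drl, background_rate, alpha=0.01):
    """One-sided binomial test of the unstable-read fraction at a single locus.

    Reads with repeat length <= DRL are counted as unstable.
    alpha is an assumed significance level, not a published value.
    """
    n = len(read_lengths)
    if n == 0:
        return False
    unstable_reads = sum(1 for length in read_lengths if length <= drl)
    return binomial_sf(unstable_reads, n, background_rate) < alpha


def call_msi(sample, panel, ulc_cutoff=11):
    """Return (unstable locus count, status).

    sample: locus name -> list of observed repeat lengths (one per read)
    panel:  locus name -> (DRL, background unstable-read rate from MSI-L/MSS samples)
    """
    ulc = sum(
        locus_is_unstable(sample.get(locus, []), drl, background)
        for locus, (drl, background) in panel.items()
    )
    return ulc, ("MSI-H" if ulc >= ulc_cutoff else "MSI-L/MSS")
```

In practice the per-locus DRL and background rate would be learned from a training set such as the 105 FFPE samples described above; here they are simply passed in.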

Concordance Testing Protocol

A common method for validating new NGS-based MSI tests is to assess their concordance against established methods (PCR or IHC). A study describing a highly sensitive pan-cancer test using novel long mononucleotide repeat (LMR) markers provides a template for this validation [22]:

  • Sample Cohort: 469 tumor samples across 20 cancer types, including 319 from patients with Lynch syndrome, were assembled [22].
  • Reference Standard: dMMR status as determined by immunohistochemistry was used as the reference standard [22].
  • Testing: All samples were tested with the novel LMR MSI Analysis System and the results were compared against both the reference standard and an established PCR method (the Promega pentaplex panel) [22].
  • Metrics Calculation: Sensitivity for detecting dMMR was calculated as 99% for CRC and 96% for non-CRC tumors. The overall percent agreement between the LMR and Promega panels was 99% for CRC and 89% for non-CRC tumors [22].
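
The two metrics reported in this protocol can be computed from paired calls as sketched below. The function names and the boolean/string encoding of calls are illustrative assumptions; the toy data in the usage lines are invented for demonstration and are not results from [22].

```python
def sensitivity(test_positive, reference_positive):
    """Fraction of reference-positive samples that the test also calls positive."""
    pairs = list(zip(test_positive, reference_positive))
    true_pos = sum(1 for t, r in pairs if r and t)
    false_neg = sum(1 for t, r in pairs if r and not t)
    if true_pos + false_neg == 0:
        raise ValueError("no reference-positive samples")
    return true_pos / (true_pos + false_neg)


def overall_percent_agreement(calls_a, calls_b):
    """Percentage of samples on which two assays give the same call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return 100.0 * matches / len(calls_a)


# Toy example (invented data): IHC dMMR status as reference, new assay as test.
ihc_dmmr = [True, True, True, True, False, False]
new_assay_msi_h = [True, True, True, False, False, False]
print(sensitivity(new_assay_msi_h, ihc_dmmr))

# Toy example (invented data): agreement between two MSI panels.
panel_a = ["MSI-H", "MSS", "MSI-H", "MSS"]
panel_b = ["MSI-H", "MSS", "MSS", "MSS"]
print(overall_percent_agreement(panel_a, panel_b))
```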

The Scientist's Toolkit: Essential Research Reagents and Materials

For scientists embarking on MSI-H research, selecting the appropriate tools is critical. The following table details key reagents and their functions as derived from the analyzed experimental protocols.

Table 3: Essential Research Reagents and Materials for MSI-H Studies

Reagent / Material | Critical Function | Examples from Literature
FFPE Tumor Tissue | Source of tumor DNA; standard for clinical validation studies. | Used in large-scale NGS study [15] and LMR panel validation [22].
Matched Normal Sample | Germline DNA control for distinguishing somatic mutations. | Blood or tumor-adjacent tissue used in NGS study with paired samples [21].
DNA Extraction Kit | Isolate high-quality DNA from tissue or blood samples. | QIAsymphony DNA Mini kit (Qiagen) [14]; QIAamp DNA FFPE Tissue Kit (Qiagen) [21].
Targeted NGS Panel | Captures microsatellite loci and/or genes of interest for sequencing. | Custom 733-gene panel [15]; custom 2.2 Mb panel [21]; 74-gene cfDNA panel (Guardant360) [18].
MSI Caller Software | Bioinformatics algorithm to analyze NGS data and determine MSI status. | MANTIS [17], MSIsensor [15], MSIDRL (custom algorithm) [15].
IHC Antibodies | Detect expression of MMR proteins (MLH1, MSH2, MSH6, PMS2). | IHC kits (OriGene) used for dMMR validation [21].
PCR-Based MSI Kit | Gold standard method for validating NGS-based MSI calls. | MSI detection kit (Microread Genetics) [21]; Promega pentaplex MSI panel [22].

The epidemiology of MSI-H is characterized by its variable prevalence across cancer types, with the highest frequencies observed in endometrial, colorectal, and gastric cancers. The projected increase in the absolute number of patients with MSI-H tumors underscores the growing importance of this biomarker. While IHC and PCR remain foundational clinical methods, NGS-based detection is increasingly demonstrating high concordance with these standards, offering the significant advantages of pan-cancer applicability and integration with comprehensive genomic profiling. For researchers and drug developers, the choice of detection methodology must be guided by the specific research question, considering factors such as the cancer types under investigation, the required throughput, and the need for concomitant genomic data. The continued refinement of NGS protocols and algorithms, as evidenced by recent large-scale studies, promises to further enhance the sensitivity and specificity of MSI-H detection, thereby optimizing patient selection for immunotherapy.

NGS-Based MSI Detection: Algorithms, Panels, and Workflow Integration

Microsatellite Instability (MSI) has emerged as a pivotal biomarker in oncology, transitioning from a diagnostic marker for Lynch syndrome to a pan-cancer predictive biomarker for response to immune checkpoint inhibitor therapy [15] [23]. Microsatellites, or short tandem repeats (STRs), are short repetitive DNA sequences (1-6 nucleotide units) ubiquitous throughout the human genome [15] [2]. When the DNA mismatch repair (MMR) system is compromised, errors accumulate during DNA replication, leading to insertions or deletions at these microsatellite loci—a phenomenon recognized as MSI [23]. The detection of high-level MSI (MSI-H) is critically important as it identifies patients who may benefit significantly from immunotherapy, regardless of their cancer type [23] [6].

The evolution of MSI detection methodologies reflects major technological shifts in molecular pathology. For years, fragment analysis via PCR (PCR-MSI) and immunohistochemistry (IHC) for MMR proteins were the established gold standards [15] [6]. While these methods remain valuable, the advent of next-generation sequencing (NGS) has introduced a paradigm shift, offering unprecedented scalability, multiplexing capability, and compatibility with comprehensive genomic profiling [23] [24]. This guide provides an objective comparison of these technologies, focusing on the core principles and performance data essential for research and diagnostic applications.

Traditional Methodologies: The Foundational Standards

Polymerase Chain Reaction (PCR) with Fragment Analysis

The PCR-based method directly assesses the integrity of the MMR system by detecting length alterations in specific microsatellite regions.

  • Core Principle: This technique utilizes fluorescently labeled primers to co-amplify a panel of microsatellite markers (typically 5-8 loci, such as the Bethesda or Promega panels) from both tumor and matched normal DNA. The amplified fragments are separated by capillary electrophoresis, and the resulting chromatograms are compared. Instability is declared when novel peaks, representing insertions or deletions, appear in the tumor sample that are absent in the normal sample [23] [6].
  • Key Characteristics:
    • Loci Interrogated: Limited panel of 5-8 markers [23].
    • Throughput: Moderate, limited by the number of markers.
    • Tissue Requirement: Requires matched normal tissue for accurate interpretation to distinguish MSI from natural polymorphisms [6].

Immunohistochemistry (IHC)

IHC provides an indirect assessment of MMR function by evaluating the expression of key proteins in the pathway.

  • Core Principle: This method uses antibodies to detect the nuclear presence of the four core MMR proteins: MLH1, MSH2, MSH6, and PMS2. Loss of nuclear expression in tumor cells, while internal control cells (e.g., stromal cells) retain staining, indicates a deficient MMR system (dMMR), which is highly correlated with MSI-H status [15] [6].
  • Key Characteristics:
    • Target: Protein expression, not DNA sequence.
    • Advantage: Can pinpoint the specific affected MMR protein.
    • Limitation: Approximately 5-11% of MSI-H samples are caused by non-truncating mutations that result in dysfunctional but antigenically intact proteins, leading to false-negative results [15].

Table 1: Comparison of Traditional MSI/MMR Detection Methods

Feature | PCR-Fragment Analysis | Immunohistochemistry (IHC)
Analytical Target | DNA length alterations at microsatellite loci | Protein expression of MMR genes (MLH1, MSH2, MSH6, PMS2)
Primary Output | Microsatellite Instability (MSI) status | Mismatch Repair (MMR) proficiency status
Key Advantage | Direct measurement of genomic instability | Identifies specific deficient protein
Key Limitation | Limited loci; requires matched normal | False negatives with non-truncating mutations [15]
Concordance | ~97% with IHC [15] | ~97% with PCR [15]

The NGS Paradigm: Expanded Loci and Computational Analysis

Next-generation sequencing has redefined MSI detection by moving from a handful of markers to the simultaneous analysis of hundreds to thousands of microsatellite loci, which can be integrated into larger gene panels for a comprehensive genomic portrait of a tumor [15] [23].

Core Technical Principles of NGS-based MSI

NGS-based methods detect MSI by identifying shifts in the length distribution of microsatellite loci from sequencing data. The general workflow involves:

  • Library Preparation & Target Capture: DNA is extracted, and sequencing libraries are prepared. Hybridization capture probes are used to enrich thousands of genomic regions, including hundreds of microsatellite loci [23].
  • Sequencing: The enriched libraries are sequenced on a platform such as Illumina or PacBio.
  • Bioinformatic Analysis: Custom algorithms analyze the sequenced reads to quantify the level of instability.

Several sophisticated algorithms have been developed for this purpose, each with a unique approach to calculating an MSI score:

  • MSIsensor: Quantifies the percentage of unstable microsatellite loci by comparing the length distribution of microsatellites between tumor and normal samples [23].
  • MANTIS (Microsatellite Analysis for Normal-Tumor InStability): Calculates an instability score from the difference between the tumor and normal read-length distributions at each locus (by default the summed absolute, or "stepwise", difference), averaged across a large set of loci [23].
  • mSINGS (Microsatellite Instability by Next-Generation Sequencing): Determines the percentage of unstable loci in a tumor sample by comparing them to a baseline established from control samples, enabling tumor-only analysis [24].
  • MSIDRL (Diacritical Repeat Length): A novel algorithm that defines a "diacritical repeat length" for each locus. It classifies reads as "stable" or "unstable" based on this threshold and uses a binomial test to determine instability, reporting an Unstable Locus Count (ULC) [15].
  • MSIdetect: Employs a curve-fitting algorithm tailored for tumor-only data, modeling the impact of indel burden and tumor content on read coverage at a curated set of homopolymer regions [24].
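
To make the tumor-only, baseline-comparison idea concrete, the sketch below flags a locus as unstable when the tumor shows more distinct repeat lengths than expected from control samples, then reports the fraction of unstable loci. It follows the general approach described for mSINGS but is not that tool's code: the per-locus feature (number of alleles above a read-fraction cutoff), the 5% cutoff, and the mean + 3 SD rule are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def allele_count(read_lengths, min_fraction=0.05):
    """Count distinct repeat lengths with at least min_fraction of the modal
    length's read support (min_fraction is an assumed noise filter)."""
    counts = Counter(read_lengths)
    if not counts:
        return 0
    top = max(counts.values())
    return sum(1 for c in counts.values() if c >= min_fraction * top)


def unstable_fraction(tumor, baseline, z=3.0):
    """Fraction of evaluable loci whose allele count exceeds the control baseline.

    tumor:    locus name -> list of repeat lengths observed in tumor reads
    baseline: locus name -> list of allele counts from MSI-stable control samples
    z:        number of standard deviations above the baseline mean (assumed)
    """
    evaluated = 0
    unstable = 0
    for locus, reads in tumor.items():
        controls = baseline.get(locus, [])
        if len(controls) < 2 or not reads:
            continue  # not enough data to evaluate this locus
        cutoff = mean(controls) + z * stdev(controls)
        evaluated += 1
        if allele_count(reads) > cutoff:
            unstable += 1
    return unstable / evaluated if evaluated else 0.0
```

A sample-level call would then compare this fraction with a cutoff validated against PCR or IHC on the laboratory's own panel.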

Key Advantages of NGS-Based MSI Detection

  • Expanded Loci Interrogation: NGS can evaluate hundreds to thousands of microsatellite loci, vastly improving statistical power and robustness compared to traditional 5-marker panels [15] [23].
  • Pan-Cancer Applicability: While traditional PCR panels were optimized for colorectal cancer, NGS panels with broad locus coverage demonstrate high accuracy across diverse cancer types, including endometrial, gastric, and more [15] [24].
  • Integration with Comprehensive Profiling: An NGS test can simultaneously determine MSI status, tumor mutational burden (TMB), single nucleotide variants (SNVs), and copy number variations (CNVs) from a single sample, optimizing tissue use and cost-efficiency [23] [6].
  • Tumor-Only Analysis Capability: Advanced algorithms like mSINGS and MSIdetect allow for accurate MSI detection even without a matched normal sample, addressing a common clinical challenge [24].

Diagram 1: Core NGS-based MSI detection workflow, highlighting the key wet-lab and bioinformatics steps, and the decision point for using a matched normal sample.

Performance Comparison: NGS vs. Traditional Methods

Large-scale retrospective studies and real-world evaluations provide robust data on the performance of NGS-based MSI testing compared to the traditional gold standards.

Concordance with Traditional Methods

A large-scale study of 35,563 pan-cancer cases using an NGS-based method (MSIDRL) demonstrated the capability of NGS to accurately classify MSI status across a wide spectrum of malignancies [15]. Another real-world study of 314 tumors compared Illumina's targeted NGS panels (TruSight Tumor 170/Oncology 500) against PCR-MSI, showing an overall high concordance with an Area Under the Curve (AUC) of 0.922 [6]. However, performance varies by cancer type, as seen in Table 2.

Table 2: Diagnostic Performance of NGS-based MSI Detection Across Cancer Types (vs. PCR-MSI)

Cancer Type | Sample Size (n) | AUC | Notes
Overall Pan-Cancer | 314 | 0.922 | High concordance with reference PCR [6]
Prostate Cancer | 58 | 1.00 | Perfect agreement in this cohort [6]
Biliary Tract Cancer | 11 | 1.00 | Perfect agreement, though sample size is small [6]
Colorectal Cancer (CRC) | 201 | 0.867 | Good but lower accuracy; broader score variability [6]
Endometrial Cancer | 88* | 0.9926* | High accuracy; other studies note it can be a challenging type [24] [25]
Stomach Adenocarcinoma | 428* | 1.00* | Excellent performance in validation cohort [24]

Note: Data for endometrial and stomach cancer marked with * are from a separate validation study using MSIdetect and TCGA WES data [24].
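
For readers reproducing this kind of comparison, the AUC can be computed directly from NGS MSI scores and PCR reference labels as the probability that a randomly chosen MSI-H sample scores higher than a randomly chosen non-MSI-H sample. The sketch below uses that rank-based definition; the function name and the toy scores are illustrative and are not data from [6] or [24].

```python
def roc_auc(scores, is_msi_h):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, label in zip(scores, is_msi_h) if label]
    negatives = [s for s, label in zip(scores, is_msi_h) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (invented scores): three PCR MSI-H and three PCR MSS samples.
ngs_scores = [30.0, 25.0, 12.0, 5.0, 3.0, 14.0]
pcr_msi_h = [True, True, True, False, False, False]
print(roc_auc(ngs_scores, pcr_msi_h))
```

An AUC of 1.00, as reported for some cohorts above, means every MSI-H sample scored higher than every non-MSI-H sample; it does not by itself fix a classification threshold.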

Addressing Discordance and Optimizing Panels

Despite high overall concordance, discrepancies between NGS and PCR do occur, particularly in non-colorectal cancers [15]. These can arise from:

  • Tissue-of-Origin Effects: The baseline stability of microsatellites can vary between different healthy tissues [24].
  • Ethnicity-Specific Polymorphisms: Natural variations in microsatellite length across populations can affect results, especially in tumor-only analyses [24].
  • Panel Optimization: Many PCR panels were designed specifically for colorectal cancer, and their performance may be suboptimal in other cancer types [15] [24].

Research has shown that curating optimized panels of microsatellite loci can significantly improve pan-cancer performance. For instance, one study distilled a set of 7 highly sensitive MS loci suitable for pan-cancer MSI detection, while others have identified ~100 homopolymer regions that are minimally variable between tissues and individuals, enhancing the accuracy of tumor-only algorithms like MSIdetect [15] [24].

Comparing NGS Platforms: Illumina vs. PacBio

The choice of sequencing platform can significantly impact MSI detection accuracy, particularly in challenging homopolymer-rich regions.

  • Illumina (Sequencing by Synthesis - SBS): The dominant short-read sequencing technology, known for its high throughput and scalability. It achieves high base-level accuracy, with the majority of bases exceeding Q30 (99.9% accuracy) [26].
  • PacBio (Sequencing by Binding - SBB): A newer short-read technology that decouples nucleotide interrogation and incorporation. This reduces errors in repetitive regions and results in a higher theoretical accuracy, with scores up to Q40 (99.99% accuracy) [27] [28]. PacBio also offers long-read sequencing (HiFi), which is excellent for complex regions but is not the primary focus for targeted MSI panels.
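
The quality scores quoted above follow the standard Phred definition, Q = -10·log10(p), where p is the per-base error probability. A minimal sketch of the conversion (function names are illustrative):

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(p)


for q in (30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.2%}")
```

Moving from Q30 to Q40 is therefore a tenfold reduction in the expected per-base error rate, from 1 in 1,000 to 1 in 10,000.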

Performance in Microsatellite Regions

Homopolymers are notoriously difficult for conventional SBS to sequence accurately due to a phenomenon known as "phasing," in which molecules within a cluster fall out of step with one another during synthesis. Preliminary data from the DOvEEgene project suggest that the PacBio Onso system (using SBB) identifies more unstable microsatellite loci in targeted panel regions than SBS methods [27]. This is attributed to reduced phasing issues and more accurate enumeration of homopolymer length, which is the fundamental basis for MSI detection [27].

A recent benchmark by the Genome in a Bottle consortium further demonstrated that the PacBio Onso system had the lowest mismatch rate among all short-read technologies evaluated on a matched tumor-normal pair [28]. This lower error rate is crucial for confidently detecting the subtle insertion/deletion signatures that define MSI.

Table 3: Comparison of NGS Platforms for MSI Detection

Feature | Illumina (SBS) | PacBio (SBB - Onso)
Core Technology | Sequencing by Synthesis | Sequencing by Binding
Typical Read Length | Short-reads | Short-reads
Theoretical Accuracy | >Q30 (99.9%) [26] | Up to Q40 (99.99%) [27]
Throughput & Scalability | Very High | High
Key Advantage for MSI | High throughput, established workflows, lower cost | Superior accuracy in homopolymers, lower false positives [27]
Reported Benefit | Standard for most clinical NGS panels | Improved detection of unstable loci [27]

Essential Protocols and Research Reagents

Detailed Experimental Protocol: NGS-based MSI Workflow

The following protocol is adapted from established methodologies for targeted sequencing-based MSI detection [23].

  • DNA Extraction:

    • Use dedicated kits for your sample type (e.g., QIAamp DNA FFPE Tissue Kit for FFPE samples or QIAamp DNA Blood Mini Kit for blood).
    • Quantify DNA using a fluorometer (Qubit) and assess quality/fragment size via electrophoresis (TapeStation).
  • Library Preparation:

    • Fragment genomic DNA to a desired size (e.g., 200-300bp).
    • Perform end-repair, A-tailing, and ligation of sequencing adapters.
    • Clean up libraries using magnetic beads.
  • Target Enrichment:

    • Hybridize the library to biotinylated probes designed to capture a panel of microsatellite loci (e.g., 100 loci) often combined with a larger gene panel (e.g., 1 Mb).
    • Capture probe-bound fragments using streptavidin-coated magnetic beads.
    • Wash away non-specific fragments and amplify the captured library via PCR.
  • Sequencing:

    • Pool the enriched libraries.
    • Sequence on an appropriate NGS platform (e.g., Illumina NextSeq, PacBio Onso) to achieve high coverage (e.g., >500x) over the target regions.
  • Bioinformatic Analysis (Using MANTIS as an example):

    • Sequence Alignment: Map sequencing reads to the human reference genome (e.g., hg19) using an aligner like BWA-MEM.
    • Microsatellite Locus Analysis: For each target microsatellite locus, count the number of reads supporting each observed allele length in both the tumor and matched normal samples.
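    • MSI Score Calculation (MANTIS): For each locus, calculate the divergence from the normal sample. MANTIS's default stepwise-difference metric sums the absolute differences between the normalized tumor and normal read fractions over all observed allele lengths k: D = Σ_k |Tumor_Allele_Fraction_k - Normal_Allele_Fraction_k|. The MANTIS score is the average of D across all analyzed loci.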
    • Classification: A sample is typically classified as MSI-H if the MANTIS score exceeds a pre-defined threshold (e.g., 0.4) [23].
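
The locus analysis, scoring, and classification steps can be sketched as below for a tumor-normal pair. This is a simplified illustration and not the MANTIS source code. It uses the sum of absolute differences between normalized length distributions, which is how the MANTIS stepwise-difference metric (the one a 0.4 cutoff is usually applied to) is generally described; the data layout, the `min_reads` filter, and the function names are assumptions.

```python
from collections import Counter


def length_fractions(read_lengths):
    """Normalize per-length read counts so they sum to 1 for one locus."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def locus_difference(tumor_reads, normal_reads):
    """Sum of absolute differences between tumor and normal length fractions."""
    t = length_fractions(tumor_reads)
    n = length_fractions(normal_reads)
    return sum(abs(t.get(k, 0.0) - n.get(k, 0.0)) for k in set(t) | set(n))


def instability_score(tumor, normal, min_reads=30):
    """Average per-locus difference over loci covered in both samples.

    tumor / normal: locus name -> list of repeat lengths (one per read).
    min_reads is an assumed coverage filter.
    """
    diffs = [
        locus_difference(tumor[locus], normal[locus])
        for locus in tumor
        if locus in normal
        and len(tumor[locus]) >= min_reads
        and len(normal[locus]) >= min_reads
    ]
    if not diffs:
        raise ValueError("no locus passed the coverage filter")
    return sum(diffs) / len(diffs)


def classify(score, threshold=0.4):
    """Apply the example threshold quoted in the protocol."""
    return "MSI-H" if score > threshold else "MSS"
```

The threshold should be re-validated for each panel and sequencing depth rather than reused as-is.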

The Scientist's Toolkit: Key Research Reagents & Solutions

Table 4: Essential Reagents and Materials for NGS-based MSI Detection

Item | Function / Application | Example Products
DNA Extraction Kits | Isolate high-quality DNA from FFPE tissue or blood. | QIAamp DNA FFPE Tissue Kit, QIAamp DNA Blood Mini Kit [23]
Library Prep Kit | Prepare sequencing libraries for NGS. | Illumina DNA Prep, KAPA HyperPrep
Target Capture Panels | Enrich microsatellite loci and cancer genes. | Custom-designed panels (e.g., 733-gene panel with 100 MS loci [15]), TruSight Oncology 500 [6]
NGS Platforms | Perform high-throughput sequencing. | Illumina NovaSeq/PacBio Onso [27] [26]
QC Instruments | Assess DNA/RNA quality, quantity, and library fragment size. | Agilent TapeStation, Qubit Fluorometer [23]
Bioinformatics Tools | Analyze sequencing data and determine MSI status. | MANTIS [23], MSIsensor [23], MSIdetect [24], mSINGS [24]

The transition from fragment analysis to NGS sequencing of microsatellite loci represents a significant advancement in molecular oncology. NGS offers a powerful, multiplexed approach that provides high accuracy, pan-cancer applicability, and the unique ability to integrate MSI status with other genomic biomarkers like TMB in a single assay.

While traditional methods retain their value in specific contexts, the expanded locus coverage, computational sophistication, and integrative potential of NGS make it the superior tool for research and clinical environments aiming for a comprehensive genomic profile. The emerging data from highly accurate sequencing platforms like PacBio Onso further suggests that the field will continue to evolve towards even greater precision in detecting this critical biomarker, ultimately helping to identify more patients who may benefit from targeted immunotherapies.

The evolution of molecular diagnostics has witnessed a pivotal shift from traditional, limited biomarker sets to comprehensive pan-cancer panels that leverage next-generation sequencing (NGS). This transition, particularly evident in microsatellite instability (MSI) testing, is driven by the need for expanded genomic coverage that enables simultaneous detection of diverse variant types across multiple cancer types. Pan-cancer panels address critical limitations of conventional methods like polymerase chain reaction (PCR) and immunohistochemistry (IHC) by integrating analysis of single-nucleotide variants (SNVs), insertions/deletions (indels), copy number variations (CNVs), gene fusions, and key biomarkers such as MSI and tumor mutational burden (TMB) within a single workflow. This guide objectively compares the performance of these emerging comprehensive panels against traditional approaches, providing experimental validation data and methodological insights to inform research and drug development in precision oncology.

Traditional cancer biomarker analysis has relied on modality-specific assays with limited genomic coverage. PCR-based MSI testing typically examines only five mononucleotide markers, while IHC assesses protein expression of four mismatch repair (MMR) genes [6]. Similarly, initial NGS panels targeted small genomic regions (∼35 kb) covering hotspots in approximately 48 genes [29]. While these methods provide valuable data, their narrow focus necessitates multiple parallel tests to obtain comprehensive genomic information, consuming precious tumor samples and potentially missing clinically relevant alterations.

Pan-cancer panels represent a paradigm shift, systematically expanding loci coverage to address the distinct molecular landscapes of diverse cancers. For instance, the SJPedPanel covers 5,275 coding exons, 297 intronic regions for fusion detection, and 7,590 polymorphic sites for copy-number alteration analysis—representing approximately 0.15% of the human genome yet capturing the majority of pediatric cancer drivers [30]. Similarly, the TruSight Oncology 500 (TSO500) assay targets 523 genes for SNVs/indels, 69 genes for CNVs, 55 genes for fusions, plus MSI and TMB biomarkers [31]. This expanded coverage is particularly crucial for childhood cancers, where 62% of driver alterations are CNVs or structural variations with boundaries typically falling outside protein-coding regions [30].

Performance Comparison: Quantitative Data Analysis

Table 1: Analytical Performance Comparison Between Traditional and Pan-Cancer NGS Methods for MSI Detection

Method Characteristic | Traditional PCR (Bethesda Panel) | NGS Pan-Cancer Panels | Performance Implications
Loci Number | 5 mononucleotide markers [6] | ~100-1,880 loci [2] [32] | Increased statistical power and reliability
Analytical Concordance | Reference standard | AUC 0.922 [6]; up to 99.4% concordance [2] | High reliability for clinical decision-making
Optimal Score Threshold | Instability at ≥2 loci [6] | MSI score ≥13.8% [6] | Objective, quantitative classification
Tumor Types Validated | Primarily colorectal [2] | Pan-cancer (≥13 tumor types) [32] | Broad research application
Normal Tissue Requirement | Required [6] | Not required [6] | Simplified workflow with limited samples

Table 2: Comprehensive Genomic Profiling Capabilities of Expanded Panels

Genomic Feature | Traditional Hotspot Panels | Comprehensive Panels (e.g., TSO500) | Research Advantage
Genes Interrogated | 48 genes [29] | 523 genes [31] | Unbiased discovery of novel biomarkers
Variant Types Detected | SNVs, indels [29] | SNVs, indels, CNVs, fusions, MSI, TMB [31] | Comprehensive variant profiling from single assay
Target Region Size | ~35 kb [29] | ~1.5-2.2 Mb [32] [31] | Enhanced TMB calculation accuracy
DNA Input Requirement | >250 ng [29] | 80 ng [31] | Suitable for limited/precious samples
Fusion Detection | Not available | 55 driver genes via RNA sequencing [31] | Identifies novel fusion partners

The expanded loci coverage of pan-cancer panels directly translates to enhanced detection capabilities. Validation studies demonstrate that panels covering >400 genes with at least 1.5 Mb of sequence content can accurately calculate TMB, a critical immunotherapy biomarker [31]. For MSI detection, NGS methods examining hundreds of loci show 97.0% sensitivity and >95.0% specificity compared to reference PCR methods [32], and a separate evaluation reported an overall AUC of 0.922 against PCR across multiple tumor types [6]. Sensitivity is also maintained for low-frequency variants, with the SJPedPanel detecting approximately 95% of variants at allele fractions as low as 0.5% [30].

Experimental Protocols and Validation Methodologies

MSI Detection Using NGS Panels

Protocol Principle: NGS-based MSI detection quantifies length variability at numerous microsatellite loci compared to the reference genome, without requiring matched normal tissue [6] [32].

Step-by-Step Workflow:

  • Loci Selection: Identify 100-1,880 mononucleotide homopolymer loci (7-39 bp repeats) from targeted sequencing data, prioritizing intronic regions with reference lengths of 10-20 bp for optimal alignment [32].
  • Library Preparation: Extract DNA (minimum 80 ng) from FFPE samples. Fragment DNA, ligate adapters, and perform hybrid capture using panel-specific probes (e.g., TSO500) [31].
  • Sequencing: Sequence on Illumina platforms (NovaSeq, NextSeq) to achieve minimum median depth of 250× at each microsatellite locus [32].
  • Variant Calling: For each locus, analyze all reads spanning the repeat region to determine allelic length distribution. Calculate mean and variance of allelic lengths for each sample [32].
  • MSI Scoring: Apply principal component analysis (PCA) to combine mean and variance information across all loci. Use the first principal component (PC1) as the MSI score, which typically explains 45% of data variance [32].
  • Classification: Classify samples as MSI-High (MSI-H) using validated score thresholds (e.g., ≥13.8%), with borderline ranges (≥8.7% to <13.8%) requiring additional TMB integration for accurate classification [6].
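The variant calling and MSI scoring steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline published in [32]: the function names, the column standardization, and the sign convention for PC1 are assumptions made for the example.

```python
import numpy as np


def locus_mean_var(read_lengths):
    """Mean and (population) variance of repeat lengths across spanning reads."""
    x = np.asarray(read_lengths, dtype=float)
    return x.mean(), x.var()


def pc1_scores(means, variances):
    """Combine per-locus means and variances into one score per sample.

    means, variances: arrays of shape (n_samples, n_loci).
    Returns (scores, explained_fraction) for the first principal component.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n_loci = means.shape[1]

    # Stack features, then centre and scale each column (assumed preprocessing).
    X = np.hstack([means, variances])
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information
    X = X / std

    # PCA via SVD: rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[0]
    explained = S[0] ** 2 / np.sum(S ** 2)

    # The sign of a principal component is arbitrary; orient it so that
    # higher length variance gives a higher score (an assumed convention).
    if np.dot(scores, X[:, n_loci:].mean(axis=1)) < 0:
        scores = -scores
    return scores, explained
```

The returned `explained` value is the fraction of variance captured by PC1 for the cohort supplied; whether it lands near the 45% quoted above depends entirely on the data.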

Validation Approach: Validate against reference PCR methods using 314 tumor samples across multiple cancer types. ROC curve analysis demonstrates AUC values of 0.922 overall, with perfect agreement (AUC=1.00) in prostate and biliary tract cancers [6].
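For readers who want to reproduce this kind of ROC analysis, the AUC can be computed directly from MSI scores and reference PCR labels without plotting a curve. The sketch below uses the rank-based (Mann-Whitney) definition; the function name and the toy inputs in the comments are illustrative assumptions, not data from [6].

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: NGS MSI scores; labels: 1 for reference MSI-H, 0 for reference MSS.
    Equals the probability that a random MSI-H sample scores higher than a
    random MSS sample, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.00, as reported for some tumor types, means every reference MSI-H sample scored above every reference MSS sample. It does not by itself fix a classification threshold.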

Comprehensive Genomic Profiling Validation

Protocol Principle: Simultaneously detect multiple variant types (SNVs, indels, CNVs, fusions) and biomarkers (MSI, TMB) from single DNA and RNA inputs [31].

Step-by-Step Workflow:

  • Sample Preparation: Extract DNA (80 ng) and RNA (40 ng) from FFPE samples with tumor content as low as 10%. Quantify using fluorometric methods (Qubit) [31].
  • Library Preparation: Process DNA and RNA separately using hybrid capture-based targeting (TSO500). Enrich target regions covering 523 genes (DNA) and 55 fusion driver genes (RNA) [31].
  • Sequencing: Sequence on Illumina platforms to achieve sufficient depth (>100×) for variant detection and biomarker assessment [31].
  • Variant Calling: Use integrated bioinformatics pipeline to call:
    • SNVs/indels at ≥5% variant allele frequency (VAF)
    • CNVs with ≥2.3-fold change
    • Gene fusions with ≥5 supporting reads [31]
  • Biomarker Calculation:
    • MSI: Analyze 1,880 homopolymer loci using PCA method [32]
    • TMB: Calculate as mutations per megabase (mut/Mb) from coding region variants [31]
  • Data Integration: Combine all variant types and biomarkers into comprehensive genomic profile.
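The reporting thresholds listed in the variant calling step can be expressed as a small filter. This is only a sketch of how such cut-offs might be applied; the `Call` record and its fields are hypothetical and do not reflect the output format of any specific pipeline.

```python
from dataclasses import dataclass


@dataclass
class Call:
    kind: str     # "snv", "indel", "cnv", or "fusion"
    value: float  # VAF (0-1) for snv/indel, fold change for cnv, read count for fusion


# Thresholds quoted in the workflow above.
THRESHOLDS = {"snv": 0.05, "indel": 0.05, "cnv": 2.3, "fusion": 5}


def passes(call):
    """True if the call meets the reporting threshold for its variant type."""
    return call.value >= THRESHOLDS[call.kind]


def reportable(calls):
    """Keep only calls that meet their type-specific threshold."""
    return [c for c in calls if passes(c)]
```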

Validation Approach: Perform extensive validation using 170 clinical samples. Assess precision, accuracy, sensitivity, specificity, and limit-of-detection for all variant types. Demonstrate >99% concordance for SNVs, indels, CNVs, and fusions; >99% sensitivity/specificity for MSI; and reliable TMB measurement around validated thresholds [31].

Diagram 1: Integrated Workflow for Pan-Cancer Genomic Profiling. Comprehensive panels simultaneously extract multiple data types from single tumor samples, enabling efficient biomarker discovery.

Advantages in Research and Diagnostic Applications

Enhanced Detection of Rare and Novel Variants

The expanded coverage of pan-cancer panels significantly improves detection of rare variants and novel biomarkers. In pediatric cancers, the SJPedPanel covers 86% of pathogenic variants, including 82% of rearrangements responsible for fusion oncoproteins—alterations frequently missed by exome-only approaches [30]. Similarly, large-scale pan-cancer analysis of 35,563 cases identified novel MSI-associated genes and variants, including a frequent deletion in ACVR2A (detected in 66.6% of MSI-H cases) that may represent a new biomarker [2].

The increased genomic coverage also enables more accurate tumor mutation burden calculation, which requires panels covering at least 1.5 Mb [31]. This comprehensive profiling reveals pathway-level insights, with MSI-H tumors showing selective enrichment of alterations in WNT, phosphatidylinositol 3-kinase, and NOTCH signaling pathways across multiple cancer types [32].

Application to Challenging Research Samples

Pan-cancer panels demonstrate superior performance with low-quality or limited input samples common in research settings. The SJPedPanel reliably detects low-frequency drivers (∼80% detection at allele fraction 0.2%) in morphologic remission samples, enabling minimal residual disease monitoring [30]. For FFPE samples with tumor content as low as 10%, the TSO500 assay maintains >99% sensitivity and specificity for most variant types with input requirements of only 80 ng DNA and 40 ng RNA [31].

This sensitivity extends to fusion detection, where RNA-based NGS analysis identifies both known and novel fusion partners, replacing labor-intensive FISH assays that require multiple probes and tissue sections [31]. The ability to simultaneously detect SNVs, CNVs, fusions, MSI, and TMB from single limited samples makes comprehensive panels particularly valuable for rare tumor types or precious biobank specimens.

Research Reagent Solutions

Table 3: Essential Research Tools for Implementing Expanded Pan-Cancer Panels

| Reagent/Resource | Function | Application in Validation |
| --- | --- | --- |
| TruSight Oncology 500 (Illumina) | Comprehensive genomic profiling | Detects SNVs, indels, CNVs, fusions, MSI, TMB in 523 genes [31] |
| FFPE DNA/RNA Extraction Kits (Promega) | Nucleic acid isolation from archival samples | Extract DNA/RNA from FFPE samples with 10-90% tumor content [31] |
| Qubit Fluorometer (Thermo Fisher) | Accurate nucleic acid quantification | Measure DNA/RNA concentration prior to library preparation [31] |
| Twist Targeted Enrichment | Hybrid capture-based target enrichment | Covers exons, UTRs, and relevant intron regions in pan-cancer panels [29] |
| MSI-PCR Reference Kit (Promega) | Orthogonal MSI validation | Five mononucleotide marker panel for method comparison [6] |
| Cell Line Controls (e.g., COLO829) | Assay performance monitoring | Gold-standard cell lines for sensitivity and limit-of-detection studies [30] |

The expanded loci coverage offered by pan-cancer marker panels represents a significant advancement over traditional limited biomarker sets. By comprehensively targeting hundreds of genes and thousands of genomic regions, these panels enable simultaneous detection of multiple variant types and biomarkers from single assays, providing researchers with a more complete molecular portrait of tumors. Validation studies consistently demonstrate superior analytical performance, with high concordance to traditional methods while expanding capabilities to include TMB assessment, fusion detection, and CNV analysis. Particularly for immunotherapy research, where MSI and TMB serve as key predictive biomarkers, comprehensive panels offer a practical solution for pan-cancer biomarker assessment across diverse malignancy types. As precision oncology continues to evolve toward tissue-agnostic treatment approaches, these expanded panels will play an increasingly vital role in biomarker discovery and validation, ultimately accelerating the development of targeted therapies and immunotherapies across the cancer spectrum.

The advent of next-generation sequencing (NGS) has revolutionized cancer molecular profiling by enabling the simultaneous assessment of critical biomarkers from a single assay. This guide provides a comparative analysis of methodologies for the integrated analysis of microsatellite instability (MSI), tumor mutational burden (TMB), and somatic mutations. We evaluate the performance of various NGS-based approaches against traditional techniques, presenting experimental data on concordance rates, sensitivity, and specificity. The data demonstrates that NGS panels offer a comprehensive solution for biomarker identification, with significant implications for immunotherapy response prediction and patient stratification in oncology research and drug development.

Next-generation sequencing (NGS) has emerged as a powerful platform for comprehensive genomic profiling in oncology, allowing researchers to interrogate multiple biomarkers simultaneously from limited tissue samples. The integration of microsatellite instability (MSI), tumor mutational burden (TMB), and somatic mutation analysis provides a holistic view of the tumor genomic landscape, with particular relevance for predicting response to immune checkpoint inhibitors (ICIs) [33] [32]. MSI is a genomic signature resulting from deficiency in the DNA mismatch repair (MMR) system, characterized by increased insertion-deletion mutations in short tandem repeat sequences [34]. TMB quantifies the total number of somatic mutations per megabase of DNA, serving as an indicator of genomic instability and potential neoantigen load [35] [36]. Together with specific somatic mutations, these biomarkers provide complementary information for therapeutic decision-making and clinical trial design.

While immunohistochemistry (IHC) for MMR proteins and PCR-based MSI testing have historically been the gold standards, NGS-based approaches offer distinct advantages for research applications, including higher throughput, ability to analyze multiple biomarkers from a single assay, and applicability across diverse cancer types [24] [2] [12]. This guide objectively compares the performance characteristics of various NGS methodologies against traditional techniques and each other, providing researchers with experimental data to inform their profiling platform selection.

Methodological Comparison of MSI Detection Techniques

Traditional versus NGS-Based MSI Detection

Multiple methodologies exist for detecting microsatellite instability, each with distinct technical approaches and performance characteristics. Traditional methods include immunohistochemistry (IHC), which detects loss of MMR protein expression, and PCR-based fragment analysis, which assesses length variations in specific microsatellite loci [34]. NGS-based approaches sequence a larger number of microsatellite regions and apply specialized algorithms to determine MSI status, offering the advantage of pan-cancer application and integration with other genomic biomarkers [24] [2].

Table 1: Comparison of MSI Detection Methodologies

| Method Type | Technical Approach | Advantages | Limitations | Best Applications |
| --- | --- | --- | --- | --- |
| Immunohistochemistry (IHC) | Antibody staining for MMR proteins (MLH1, MSH2, MSH6, PMS2) | Rapid, low cost, widely available | Indirect measure, cannot detect non-truncating mutations | Colorectal, endometrial, gastric cancers |
| PCR-Based Fragment Analysis | Capillary electrophoresis of amplified microsatellite loci | High sensitivity for designed loci | Limited loci (typically 5-8), tissue-specific performance issues | Colorectal cancer, validation studies |
| NGS with Principal Component Analysis | PCA of length variations across multiple homopolymers | High throughput, pan-cancer application, integrates with other NGS data | Computational complexity, requires validation | Research settings, comprehensive genomic profiling |
| MSIdetect Algorithm | Curve-fitting algorithm accounting for tumor content and indel burden | Effective with tumor-only samples, multiple cancer types | Performance varies by tissue of origin | Tumor-only samples, multi-cancer studies |
| Octaplex CaBio-MSID | 8 novel microsatellites with classification algorithm | Optimized for colorectal and endometrial cancers | Limited validation in other cancer types | Colorectal and endometrial cancer research |

Performance Metrics Across NGS MSI Detection Methods

Various NGS-based algorithms have been developed for MSI detection, each employing different computational approaches and microsatellite panels. Studies have directly compared the performance of these methods across different cancer types.

Table 2: Performance Metrics of NGS-Based MSI Detection Algorithms

| Algorithm | Cancer Type | Sensitivity | Specificity | Concordance with Traditional Methods | Sample Requirements |
| --- | --- | --- | --- | --- | --- |
| MSIdetect | Colorectal | 98.3% | 98.6% | 96.3% overall concordance | Tumor-only acceptable |
| MSIdetect | Endometrial | 96.1% | 98.6% | Varies by tissue type | Tumor-only acceptable |
| MSIdetect | Stomach | 100% | 100% | High concordance | Tumor-only acceptable |
| MANTIS | Colorectal | 98.1% | 99.9% | >97.4% across cancer types | Requires matched normal |
| MANTIS | Endometrial | 97.4% | 99.3% | High concordance | Requires matched normal |
| MANTIS | Stomach | 100% | 100% | High concordance | Requires matched normal |
| Octaplex CaBio-MSID | Colorectal | 98.4% | 98.4% | Outperforms Idylla, FoundationOne | Tumor-only acceptable |
| Octaplex CaBio-MSID | Endometrial | 89.3% | 100% | Good for endometrial | Tumor-only acceptable |
| NGS PCA Method | Pan-Cancer | 97.0% | >95.0% | 97% vs. PCR/IHC (65/67 samples) | Tumor-only acceptable |

Large-scale validation studies have demonstrated the robustness of NGS-based MSI detection. A comprehensive analysis of 191,767 solid tumors revealed a remarkably low discordance rate of only 0.31% between NGS-MSI and IHC-MMR, supporting the non-inferiority of NGS for identifying MMR-deficient tumors [12]. The study further demonstrated that dMMR/MSI-H tumors showed better overall and post-immunotherapy survival compared to MMR-proficient/MSI-Stable tumors, validating the clinical utility of NGS-based classification.
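The metrics reported in this section (sensitivity, specificity, concordance, discordance) all derive from the same 2×2 comparison of NGS calls against a reference method. The sketch below shows the arithmetic. The split of counts in the usage comment is invented for illustration and is not taken from any cited study; only the 65/67 total appears in the table above.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Summary metrics for NGS-MSI calls versus a reference method.

    tp: MSI-H by both methods      fp: MSI-H by NGS only
    tn: MSS by both methods        fn: MSI-H by reference only
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "concordance": (tp + tn) / total,
        "discordance": (fp + fn) / total,
    }


# Hypothetical split of 67 samples with 65 concordant calls.
example = diagnostic_metrics(tp=32, fp=1, tn=33, fn=1)
```

Here `example["concordance"]` is 65/67, about 97.0%, matching the form in which concordance is reported for the PCA method.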

Experimental Protocols for Integrated Biomarker Analysis

Sample Preparation and Quality Control

Robust sample preparation is fundamental for reliable MSI, TMB, and somatic mutation analysis. The following protocol outlines key steps for processing formalin-fixed paraffin-embedded (FFPE) tissue samples, the most common specimen type in oncology research:

  • Tissue Selection and Macro-dissection: Select FFPE tissue blocks with optimal tumor cell content (>20% tumor cells as determined by hematoxylin and eosin staining). For heterogeneous samples, perform macro-dissection under morphological guidance to enrich tumor cell percentage [33].

  • DNA Extraction: Extract DNA using specialized FFPE extraction kits (e.g., MagCore Genomic DNA FFPE One-Step Kit). Quantify DNA using fluorometric methods (e.g., Qubit dsDNA HS Assay) rather than spectrophotometry due to better accuracy with degraded FFPE DNA [33] [35].

  • DNA Quality Assessment: Evaluate DNA quality in triplicate using FFPE QC kits. Accept samples with a minimum of 300 ng total DNA and A260/280 ratio ≥1.8. For TMB analysis, ensure the tumor cell percentage is accurately determined as it directly impacts variant allele frequency calculations [35] [36].

  • Library Preparation: Fragment DNA to 90-250 bp fragments using sonication (e.g., Covaris M220). Perform end repair, A-tailing, and adapter ligation following manufacturer protocols. For hybrid capture-based NGS panels, use probes targeting cancer-related genes (e.g., Illumina TruSight Oncology 500 targeting 523 genes or similar panels) [35] [36].
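The acceptance criteria in the steps above lend themselves to a simple pre-library QC gate. The sketch below encodes the three numeric criteria quoted in this protocol (tumor content >20%, at least 300 ng total DNA, A260/280 ≥1.8). The function name and return format are assumptions for illustration, and individual laboratories will set their own validated limits.

```python
def qc_failures(tumor_fraction, total_dna_ng, a260_280):
    """Return a list of failed criteria; an empty list means the sample passes.

    tumor_fraction: tumor cell content as a fraction (0-1) from H&E review.
    total_dna_ng:   total extracted DNA in nanograms (fluorometric).
    a260_280:       absorbance ratio used as a purity indicator.
    """
    failures = []
    if tumor_fraction <= 0.20:
        failures.append("tumor content not above 20%; consider macro-dissection")
    if total_dna_ng < 300:
        failures.append("total DNA below 300 ng")
    if a260_280 < 1.8:
        failures.append("A260/280 below 1.8")
    return failures
```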

Sequencing and Bioinformatics Analysis

Following library preparation and sequencing, specialized bioinformatics pipelines process the raw data to generate MSI, TMB, and somatic mutation calls:

  • Sequencing Parameters: Sequence libraries on Illumina platforms (e.g., NovaSeq 6000) with a minimum median depth of 150-250× across the target region to ensure accurate variant detection [33].

  • MSI Analysis Algorithms:

    • For PCA-based methods: Identify 1,880+ mononucleotide homopolymers of 7-39 bp in repeat length. For each locus, calculate mean and variance of allelic length across all spanning reads. Apply principal component analysis to generate an MSI score, with the first principal component (PC1) typically explaining 45% of data variance [32].
    • For MSIdetect: Apply curve-fitting algorithm that models the impact of indel burden and tumor content on read coverage at approximately 100 homopolymer regions that show minimal variability between tissues and individuals [24].
  • TMB Calculation:

    • Count all coding somatic mutations including single nucleotide variants and small insertions/deletions with ≥5% variant allele frequency.
    • Exclude known driver mutations and variants with population frequency ≥0.5% in population databases (e.g., gnomAD, dbSNP) to filter germline polymorphisms.
    • Calculate TMB as total mutations divided by the size of the coding region captured in megabases (mut/Mb) [35] [36].
  • Somatic Mutation Calling:

    • Align sequences to reference genome (GRCh37/hg19) using tools like Burrows-Wheeler Aligner.
    • Call variants using matched tumor-normal pairs when available, or use population databases for tumor-only designs.
    • Annotate variants using databases such as COSMIC, ClinVar, and CIViC to determine clinical significance [33].
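The TMB calculation described above reduces to a filter followed by a division. The following is a minimal sketch under stated assumptions: the variant dictionary keys (`coding`, `vaf`, `pop_freq`, `known_driver`) are hypothetical field names, and real pipelines apply additional panel-specific filters.

```python
def tumor_mutational_burden(variants, coding_region_mb,
                            min_vaf=0.05, max_pop_freq=0.005):
    """TMB in mutations per megabase.

    variants: iterable of dicts with keys
        coding (bool), vaf (0-1), pop_freq (0-1), known_driver (bool).
    coding_region_mb: size of the captured coding region in megabases.
    """
    counted = [
        v for v in variants
        if v["coding"]
        and v["vaf"] >= min_vaf               # >=5% variant allele frequency
        and v["pop_freq"] < max_pop_freq      # drop likely germline polymorphisms
        and not v["known_driver"]             # drop known driver mutations
    ]
    return len(counted) / coding_region_mb
```

Because the denominator is the captured coding region, the same tumor can yield different TMB values on panels of different size, which is one reason cross-panel comparisons near a threshold need care.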

The following diagram illustrates the comprehensive workflow for integrated MSI, TMB, and somatic mutation analysis:

Technical Standards and Validation Frameworks

Quality Assurance and Best Practices

Implementation of NGS-based MSI, TMB, and somatic mutation testing requires adherence to established quality standards. The EMQN best practice guidelines provide key recommendations for laboratories developing these assays [34]:

  • Laboratory Accreditation: Laboratories must follow established good laboratory practice and demonstrate compliance with internationally recognized standards (e.g., ISO 15189:2022) through formal accreditation.

  • Test Validation: All tests should be thoroughly validated in individual laboratories prior to implementation, with regular participation in external quality assessment (EQA) schemes.

  • Terminology Standardization:

    • MSI-H: Significant increase in microsatellite insertion-deletion variants indicating dMMR
    • MSS: No evidence of microsatellite instability, indicating proficient MMR
    • MSI-L/MSI-I: Intermediate categories requiring careful interpretation based on clinical context
  • Minimum Performance Standards: Assays must demonstrate high sensitivity (>95%) and specificity (>95%) compared to validated reference methods, with appropriate limits of detection established for low tumor purity samples.

TMB Methodological Considerations

TMB calculation shows significant methodological variability that researchers must consider when designing studies:

Table 3: Comparison of TMB Detection Methodologies

| Method Aspect | Tumor-Only (TO) Approach | Tumor-Control (TC) Approach |
| --- | --- | --- |
| Basic Principle | Compares tumor tissue sequencing data with population databases | Simultaneously sequences tumor tissue and matched normal (e.g., white blood cells) |
| Genes Covered | 523 genes (e.g., Illumina TruSight Oncology 500) | 425 genes (e.g., Shihe No.1 NSCLC Panel) |
| Germline Filtering | Relies on population frequency databases (dbSNP, ExAC, gnomAD) | Direct comparison with patient's normal DNA |
| Advantages | Requires less sample material, lower cost | More accurate germline mutation discrimination |
| Limitations | Potential for false positives from germline variants | Requires additional normal sample, higher cost |
| Concordance | 92% with TC methods | 92% with TO methods |
| Statistical Agreement | Cohen's kappa = 0.833 (good consistency) | Chi-square test shows significant differences (p<0.001) |

Recent studies have highlighted that different NGS methods can identify different mutation sites, directly impacting TMB calculation, particularly near the critical 10 mut/Mb threshold used for immunotherapy decisions [35] [36]. This emphasizes the importance of consistent methodology within research studies and the potential need for orthogonal validation when results are borderline.
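Agreement between two TMB methods around a decision threshold is usually summarized with Cohen's kappa, which corrects raw concordance for agreement expected by chance. The sketch below dichotomizes paired TMB values at 10 mut/Mb and computes kappa. The function names are illustrative, and any input data would be the reader's own; none is taken from [35] or [36].

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two lists of binary calls on the same samples."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    pa = sum(bool(a) for a in calls_a) / n   # fraction called high by method A
    pb = sum(bool(b) for b in calls_b) / n   # fraction called high by method B
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # both methods give the same single class for all samples
    return (observed - expected) / (1 - expected)


def tmb_high(values, threshold=10.0):
    """Dichotomize TMB values (mut/Mb) at the given threshold."""
    return [v >= threshold for v in values]
```

A typical call would be `cohens_kappa(tmb_high(to_values), tmb_high(tc_values))`, where `to_values` and `tc_values` are paired tumor-only and tumor-control results for the same tumors.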

Research Reagent Solutions

Selecting appropriate research reagents is critical for successful implementation of integrated MSI, TMB, and somatic mutation profiling. The following table outlines key solutions used in the studies referenced in this guide:

Table 4: Essential Research Reagents for NGS-Based Biomarker Profiling

| Reagent Category | Specific Product Examples | Primary Function | Considerations for Research Use |
| --- | --- | --- | --- |
| DNA Extraction Kits | MagCore Genomic DNA FFPE One-Step Kit | High-quality DNA extraction from FFPE tissues | Optimized for degraded FFPE DNA; includes DNase treatment |
| Target Enrichment Panels | Illumina TruSight Oncology 500 (523 genes) | Comprehensive genomic profiling | Includes content for TMB, MSI, somatic mutations; 1.9 Mb coding region |
| Target Enrichment Panels | FoundationOne CDx (315-395 genes) | Solid tumor comprehensive genomic profiling | 2.1 Mb DNA content; includes MSI and TMB algorithms |
| Target Enrichment Panels | Shihe No.1 NSCLC Panel (425 genes) | TMB-focused solid tumor profiling | Designed specifically for TMB detection; tumor-control approach |
| Library Preparation | Hybrid capture-based NGS kits | Target enrichment and library construction | Dual hybridization/capture steps improve specificity |
| QC and Validation | FFPE QC Kit (Illumina) | DNA quality assessment | Triplicate evaluation recommended for reliable results |
| QC and Validation | Standard references (e.g., Jingliang Biotechnology) | Assay validation | Contain hotspot mutation sites for accuracy verification |

Discussion and Future Directions

The integrated analysis of MSI, TMB, and somatic mutations through NGS represents a significant advancement in cancer genomics research. The methodological comparisons presented in this guide demonstrate that NGS-based approaches offer robust, comprehensive biomarker assessment suitable for research applications across multiple cancer types. The high concordance rates between NGS and traditional methods, coupled with the ability to analyze multiple biomarkers from limited tissue, position NGS as an efficient platform for preclinical studies and clinical trial biomarker stratification.

Future developments in this field will likely focus on standardizing TMB calculation across different panels, improving algorithms for rare cancer types, and integrating transcriptional profiles to provide even more comprehensive biomarker information. The recent FDA approval of whole exome sequencing-based companion diagnostics (e.g., Caris's MI Cancer Seek) highlights the growing acceptance of comprehensive genomic profiling for clinical decision-making, which will further influence research applications [12]. As the field evolves, continued method comparison and validation will be essential to ensure research findings are robust, reproducible, and translatable to clinical benefit.

Overcoming Challenges in NGS-MSI Testing: Indeterminate Results and Technical Pitfalls

Next-generation sequencing (NGS) has revolutionized microsatellite instability (MSI) testing, a critical biomarker for predicting response to immune checkpoint inhibitors across multiple solid tumors. However, a significant challenge persists: a subset of tests results in an unclassifiable or "indeterminate" result. These inconclusive findings, reported as MSI-Indeterminate (MSI-I), MSI-Equivocal (MSI-E), or MSI borderline, create clinical dilemmas by failing to provide actionable guidance for therapeutic decisions. This analysis examines the frequency, root causes, and methodological comparisons of indeterminate MSI results from NGS testing, providing researchers and drug developers with evidence-based insights into test performance limitations and optimization strategies.

Frequency of Indeterminate MSI Results in NGS Testing

The occurrence of indeterminate MSI results is not merely an edge case but a consistent feature of NGS-based testing across platforms and laboratories. Evidence from large-scale studies reveals that indeterminate calls affect a substantial minority of clinical samples, with rates varying based on the specific NGS assay, bioinformatic pipeline, and tumor types analyzed.

Table 1: Reported Frequencies of Indeterminate MSI Results in NGS Assays

| Study or Assay Context | Sample Size | Indeterminate Rate | Reported Terminology |
| --- | --- | --- | --- |
| Large Cohort Study [37] | 191,767 solid tumor samples | 8.66% (16,607 samples) | "Indeterminant" |
| General NGS Assays [37] | Multiple studies | 3.2% - 8.9% | MSI-I, MSI-E, Borderline |
| MiMSI Algorithm Test Cohort [38] | 317 samples | 0.3% (1 sample) | MSI-Indeterminate (MSI-ind) |
| Illumina Panel Validation [6] | 331 initial samples | 5.1% (17 samples excluded for <40 usable MSI sites) | "Unsuitable for analysis" |

The data show that indeterminate results are a pervasive challenge: one of the largest cohorts reported an inconclusive outcome in 8.66% of tests, roughly 1 in 12 [37]. While advanced algorithms like MiMSI show promise in reducing this rate, the underlying technical limitations that drive these findings remain a critical focus for assay improvement.
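The rates in Table 1 come from cohorts of very different size, so their precision differs considerably. A quick way to see this is to recompute each rate with a confidence interval. The sketch below uses the Wilson score interval; the choice of interval and the function name are assumptions for illustration, and the counts are those reported in the table.

```python
import math


def rate_with_wilson_ci(k, n, z=1.96):
    """Proportion k/n with an approximate 95% Wilson score interval."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


# Counts taken from Table 1 above.
cohorts = {
    "Large cohort [37]": (16_607, 191_767),
    "MiMSI test cohort [38]": (1, 317),
    "Illumina panel validation [6]": (17, 331),
}
for name, (k, n) in cohorts.items():
    p, lo, hi = rate_with_wilson_ci(k, n)
    print(f"{name}: {p:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

The interval for a single indeterminate sample out of 317 is wide relative to its point estimate, which is worth keeping in mind when comparing algorithms on small test sets.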

Root Causes of Indeterminate MSI Classifications

The inability to definitively classify MSI status stems from several technical and biological factors that degrade the signal-to-noise ratio in NGS data. Understanding these causes is essential for developing more robust assays and appropriate clinical validation protocols.

Low Tumor Purity and Content

Low tumor purity, which dilutes the MSI signal, represents one of the most significant challenges. The MSI phenotype is a characteristic of tumor cells, and when a sample contains a high proportion of non-malignant cells, the detection of length alterations at microsatellite loci becomes statistically unreliable [37] [38]. This effect is particularly pronounced in algorithms like MSISensor that compare tumor and normal distributions, as low purity minimizes detectable differences between them [38].
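The dilution effect can be made concrete with a toy binomial model. The sketch below assumes that the fraction of reads carrying a shifted allele at a locus is the tumor purity multiplied by the fraction of tumor alleles that are shifted, with equal copy number in tumor and normal cells. These are simplifying assumptions for illustration only and do not describe how MSISensor or any other tool models the data.

```python
from math import comb


def expected_shift_fraction(purity, tumor_allele_fraction=0.5):
    """Expected fraction of reads showing a length-shifted allele (toy model)."""
    return purity * tumor_allele_fraction


def prob_at_least_k(depth, fraction, k):
    """P(at least k of `depth` reads carry the shifted allele), binomial model."""
    below = sum(
        comb(depth, i) * fraction ** i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - below


# Compare a 10% and a 50% purity sample at 250x depth, requiring 20 reads.
low = prob_at_least_k(250, expected_shift_fraction(0.10), 20)
high = prob_at_least_k(250, expected_shift_fraction(0.50), 20)
```

Under these assumptions `low` is far smaller than `high`, which illustrates why the same locus can be confidently unstable in a high-purity sample and uninformative in a low-purity one.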

Inadequate Sample Quality and Quantity

The integrity and amount of input material directly impact sequencing reliability. Key limitations include:

  • Degraded DNA: Formalin-fixed, paraffin-embedded (FFPE) specimens, the standard in oncology, often yield fragmented DNA [37].
  • Low DNA Input: NGS assays typically require more DNA input (10-50 ng) than PCR-based methods (1-2 ng) [37].
  • Insufficient Tissue: Small biopsy samples may not yield enough material for adequate DNA extraction, making retesting unfeasible [37].

Sequencing and Analytical Limitations

Technical aspects of the NGS workflow itself contribute to indeterminate calls:

  • Insufficient Sequence Coverage: Low read depth at microsatellite loci reduces confidence in calling insertions or deletions [37].
  • Bioinformatic Challenges: The lack of standardized thresholds, algorithms, and reporting criteria across platforms leads to variability in how confidence is assigned to results [6].
  • Panel Design Differences: The number of microsatellite loci analyzed varies widely (from 5 to 7,000), and the selection of markers influences robustness [37].

Methodological Comparisons: NGS vs. PCR for MSI Detection

Understanding indeterminate rates requires comparing NGS to the established gold standard, PCR-based fragment analysis. Each method possesses distinct technical profiles that influence their performance in clinical and research settings.

Table 2: Technical Comparison of PCR vs. NGS for MSI Detection

| Parameter | PCR-Based MSI Testing | NGS-Based MSI Testing |
| --- | --- | --- |
| DNA Input Requirements | Minimal (1-2 ng) [37] | Substantial (10-50 ng or more) [37] |
| Matched Normal Requirement | Typically required [37] | Varies by assay; some do not require [6] [37] |
| Throughput | Medium (1-96 samples) [37] | High (>96 samples) [37] |
| Standardization | Well-standardized markers and interpretation guidelines [6] [37] | Lack of standardized thresholds, panels, and algorithms [6] [37] |
| Additional Data | Provides MSI status only [37] | Simultaneously detects other genomic alterations (e.g., TMB, MMR gene mutations) [6] [37] |
| Key Limitation | Does not identify specific gene mutations [37] | Higher indeterminate rates due to technical and analytical complexities [37] |

The comparative analysis reveals a fundamental trade-off: NGS offers comprehensive genomic profiling in a single test but at the cost of higher technical stringency and greater potential for indeterminate results. In contrast, PCR provides a focused, highly reliable assessment of MSI status with lower technical failure rates.

Experimental Protocols and Advanced Algorithm Performance

Evaluating the performance of NGS-based MSI calling requires rigorous validation against orthogonal methods. The following experimental approach is representative of high-quality studies in this field.

Protocol for Validating NGS-MSI Assays

A typical validation protocol involves:

  • Cohort Selection: Assembling a dataset of tumor samples with known MSI status determined by gold-standard methods (PCR and/or IHC). This cohort should deliberately include challenging cases, such as those with low tumor purity, to stress-test the algorithm [38].
  • Sequencing and Data Generation: Performing targeted NGS using panels (e.g., Illumina's TruSight Oncology 500, MSK-IMPACT, or custom panels). The validation cohort in one study started with 331 tumor samples, excluding 17 that did not meet quality metrics (e.g., having at least 40 usable MSI sites), resulting in a final validation cohort of 314 samples [6].
  • Bioinformatic Analysis: Processing sequencing data through the MSI calling algorithm (e.g., MSISensor, MiMSI, or a novel method). For the MiMSI algorithm, this involves converting aligned reads at each microsatellite site into a vector representation for analysis by a deep neural network [38].
  • Concordance Analysis: Comparing NGS-derived MSI calls to the reference method results. Statistical performance metrics including sensitivity, specificity, and area under the ROC curve (AUC) are calculated. For instance, one study reported an overall AUC of 0.922, with lower performance in colorectal cancers (AUC = 0.867) [6].

Performance of Advanced Machine Learning Algorithms

Emerging machine learning approaches are being developed to address the limitations of traditional statistical models, particularly in low-purity samples.

  • The MiMSI Algorithm: This deep multiple instance learning (MIL) framework treats each microsatellite region as an "instance" and the entire tumor sample as a "bag" of instances. It uses a convolutional neural network to analyze sequence data from these regions, pooling the information to predict a sample-level MSI status. This approach allows it to more effectively handle samples where only a subset of loci provides a clear signal [38].
  • Performance Gains: In a direct comparison on a challenging test set, MiMSI demonstrated a significantly higher sensitivity (0.895) and auROC (0.971) compared to MSISensor (sensitivity: 0.67; auROC: 0.907) [38]. This demonstrates the potential of advanced computational methods to reduce false negatives and, by extension, potentially decrease the rate of indeterminate calls in difficult samples.
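The bag-and-instance idea can be illustrated with a generic multiple instance learning forward pass. The sketch below is not the MiMSI architecture: it replaces the convolutional network with a single linear embedding, uses attention-weighted pooling as one common pooling choice, and takes untrained weights as inputs. All names and shapes are assumptions for illustration.

```python
import numpy as np


def mil_forward(bag, W_embed, w_attn, w_out, b_out=0.0):
    """Generic MIL forward pass for one sample.

    bag:     (n_loci, n_features) array, one row per microsatellite locus.
    W_embed: (n_features, d) embedding weights (stand-in for a CNN encoder).
    w_attn:  (d,) attention weights scoring how informative each locus is.
    w_out:   (d,) classifier weights applied to the pooled embedding.
    Returns (probability, attention_weights).
    """
    H = np.tanh(bag @ W_embed)                 # instance embeddings, (n_loci, d)
    logits = H @ w_attn                        # one relevance score per locus
    attn = np.exp(logits - logits.max())       # numerically stable softmax
    attn = attn / attn.sum()
    pooled = attn @ H                          # sample-level embedding, (d,)
    prob = 1.0 / (1.0 + np.exp(-(pooled @ w_out + b_out)))
    return prob, attn
```

Because pooling works on however many rows the bag contains, the same model handles samples with different numbers of covered loci, and the attention weights indicate which loci drove the sample-level call.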

The following diagram illustrates the core logical workflow of the MiMSI algorithm, highlighting how its multiple instance learning framework addresses data from individual microsatellite loci to generate a sample-level classification.

Key research reagents and computational tools are fundamental to conducting and advancing MSI detection research. The following table outlines critical components used in the development and validation of NGS-based MSI tests.

Table 3: Key Research Reagent Solutions for NGS-Based MSI Testing

| Reagent / Resource | Function / Application | Example Use in Context |
| --- | --- | --- |
| Targeted NGS Panels | Captures and sequences genomic regions of interest, including microsatellite loci. | Illumina's TruSight Tumor 170, TruSight Oncology 500, and custom panels (e.g., a 733-gene panel) are used for comprehensive profiling [2] [6]. |
| DNA Library Prep Kits | Prepares fragmented DNA for sequencing by adding adapters and indexing. | The SureSelect XT HS2 DNA Reagent Kit was used for library preparation in a study on inherited bone marrow failure syndromes [39]. |
| Reference Databases | Provides pathogenicity annotations for genetic variants. | ClinVar, OMIM, and HGMD are used to interpret variants found in MMR genes (MLH1, MSH2, MSH6, PMS2) [40] [39]. |
| Bioinformatic Tools | Aligns sequences, calls variants, and performs specialized MSI analysis. | MSISensor and the novel MiMSI are specialized tools for MSI classification from NGS data [2] [38]. GATK HaplotypeCaller is used for general variant calling [39]. |
| Validation Resources | Serves as the gold standard for validating NGS-based MSI calls. | Orthogonal testing methods include PCR-based fragment analysis (e.g., Promega kit) and Immunohistochemistry (IHC) for MMR proteins [6] [37] [38]. |

Indeterminate results in NGS-based MSI testing represent a significant challenge at the intersection of technology and clinical application, with root causes in tumor sampling, nucleic acid quality, and bioinformatic interpretation. As the field progresses, the integration of advanced machine learning techniques like multiple instance learning shows demonstrable promise in mitigating these issues, particularly for low-purity samples that traditionally confound standard algorithms. For researchers and drug developers, the path forward requires a dual focus: continued refinement of wet-lab protocols to maximize sample quality, and the development of more sophisticated, validated computational methods that can extract clear signals from complex genomic data. Overcoming the "indeterminate call" is essential for fully realizing the potential of precision oncology and ensuring that all patients who may benefit from immunotherapy can be reliably identified.

Microsatellite Instability (MSI) has emerged as a critical biomarker in oncology, with significant implications for prognosis and treatment selection, particularly for immune checkpoint inhibitor therapy [37] [41]. The accurate detection of MSI status depends heavily on pre-analytical factors, especially the quality and quantity of input DNA and tumor purity [37]. Next-generation sequencing (NGS) has gained widespread adoption for MSI testing due to its ability to simultaneously assess multiple genomic biomarkers [37] [2]. However, this method presents specific challenges regarding sample requirements that differ substantially from traditional polymerase chain reaction (PCR)-based approaches [37]. This guide objectively compares the sample requirements for NGS-based and PCR-based MSI testing, providing researchers with evidence-based data to navigate these critical pre-analytical variables and optimize testing outcomes.

Technical Comparison: DNA and Tumor Purity Requirements

The sample requirements for MSI testing vary significantly between PCR and NGS methods, impacting DNA input, tumor purity, and success rates across different sample types.

Table 1: Direct Comparison of Sample Requirements for MSI Testing Methods

Parameter | PCR-Based Methods | NGS-Based Methods
Minimum DNA Input | 1-2 ng for amplification reaction [37] | 10-50 ng or more [37]
Typical Tissue Requirements | 1-5 unstained FFPE slides [37] | Often requires more slides than PCR [37]
Minimum Tumor Purity | 20-40% [37] | Varies, but generally requires higher purity [37]
DNA Quality Requirements | Moderately stringent [37] | Highly stringent [37]
Sample Failure/Indeterminate Rates | Not specifically reported | 3.2%-8.9% of solid tumor samples [37]
Impact of Low Purity/Degraded DNA | More tolerant of suboptimal samples [37] | Prone to false negatives or indeterminate results with low purity/degraded DNA [37]

The Clinical Impact of Variable Sample Requirements

NGS assays demonstrate higher rates of indeterminate results (3.2%-8.9%) compared to PCR-based methods [37]. These indeterminate calls, variously reported as MSI-I (indeterminate), MSI-E (equivocal), MSI borderline, or "cannot be determined," frequently occur in samples with low tumor purity (which dilutes the MSI signal), low DNA input, or degraded DNA from FFPE samples [37]. One large-scale study of 191,767 solid tumor samples found indeterminate results in 16,607 cases (8.66%) [37]. Such indeterminate results hinder clinical decision-making and may necessitate confirmatory testing with orthogonal methods such as PCR or immunohistochemistry (IHC) [37].
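
As a quick arithmetic check on the cohort figure quoted above, the following minimal sketch recomputes the indeterminate rate from the two reported counts and attaches a Wilson score interval. The function name and the choice of a Wilson interval are illustrative assumptions, not part of the cited study.

```python
import math


def proportion_with_wilson_ci(events: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% interval (an assumption here).
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need total > 0 and 0 <= events <= total")
    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half


# Counts as quoted in the text: 16,607 indeterminate calls among 191,767 samples.
rate, low, high = proportion_with_wilson_ci(16_607, 191_767)
print(f"indeterminate rate: {rate:.2%} (approx. 95% CI {low:.2%} to {high:.2%})")
```

The point estimate reproduces the 8.66% stated in the text; the interval is included only to show how narrow the uncertainty is at this cohort size.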

Experimental Protocols and Validation Data

Understanding the experimental approaches used to establish sample requirements provides context for the comparative performance data.

Methodologies for Determining Sample Adequacy

Tumor Purity Assessment: Multiple approaches exist for determining tumor purity. Traditional pathological assessment involves estimating neoplastic cellularity by light microscopy examination of H&E-stained sections [42]. Computational methods such as ABSOLUTE, ASCAT, and THetA2 leverage genomic data to estimate purity, though studies show these algorithms may not consistently add value beyond pathologist estimation and have notable failure rates [42]. More recently, transcriptome-based approaches like PUREE use machine learning to estimate tumor purity from gene expression data, demonstrating high accuracy across solid tumor types [43].

DNA Quality Control: Standard quality assessment for NGS includes profile characterization to ensure high molecular weight DNA (>50 kb) and absence of RNA contamination [44]. Purity measurements using spectrophotometry (A260/280 ratio ~1.8 and A260/230 ratio >2.0) are essential, with fluorometric methods (e.g., Qubit with PicoGreen) representing the gold standard for DNA quantification [44].
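
The quality criteria above can be expressed as a simple pre-sequencing gate. The sketch below is illustrative only: the `DnaQc` record, the `qc_flags` function, the ±0.2 tolerance around an A260/280 of 1.8, and the 10 ng minimum (taken from the lower end of the NGS input range in Table 1) are assumptions that a laboratory would replace with its own validated limits.

```python
from dataclasses import dataclass


@dataclass
class DnaQc:
    a260_280: float   # spectrophotometric purity ratio
    a260_230: float   # spectrophotometric purity ratio
    yield_ng: float   # fluorometric (e.g., Qubit) total yield in ng


def qc_flags(sample: DnaQc, min_input_ng: float = 10.0,
             ratio_tolerance: float = 0.2) -> list[str]:
    """Return human-readable QC warnings; an empty list means no flag raised."""
    flags = []
    # Text criterion: A260/280 of about 1.8 (tolerance is an assumed value).
    if abs(sample.a260_280 - 1.8) > ratio_tolerance:
        flags.append("A260/280 outside ~1.8")
    # Text criterion: A260/230 above 2.0.
    if sample.a260_230 <= 2.0:
        flags.append("A260/230 not above 2.0")
    # Assumed minimum input for NGS, from the 10-50 ng range in Table 1.
    if sample.yield_ng < min_input_ng:
        flags.append("fluorometric yield below assumed NGS minimum")
    return flags


good = qc_flags(DnaQc(a260_280=1.82, a260_230=2.1, yield_ng=40.0))
poor = qc_flags(DnaQc(a260_280=1.5, a260_230=1.6, yield_ng=4.0))
print(good)
print(poor)
```

A sample that raises flags is not necessarily unusable; per Table 1, PCR-based testing tolerates lower input and may be the better route for such material.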

Performance Validation Studies

NGS Method Validation: Guidelines from the Association for Molecular Pathology and College of American Pathologists recommend using reference cell lines and materials to establish performance characteristics for NGS oncology panels, including determining positive percent agreement and positive predictive value for each variant type [45]. Studies directly comparing MSI detection methods have found that NGS-based approaches show high concordance with PCR in colorectal cancers (99.4%), whereas concordance falls to 96.6% in cancers other than colorectal and endometrial [2].
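
For readers setting up a method comparison, the agreement statistics named above follow directly from a 2x2 table of NGS calls against the reference method. This is a minimal sketch; the function name is an assumption and the example counts are hypothetical, chosen only to show the calculation, not taken from any cited study.

```python
def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Agreement between a test method (NGS) and a reference method (e.g., PCR).

    tp: MSI-H by both; fp: MSI-H by NGS only;
    fn: MSI-H by reference only; tn: MSS by both.
    """
    def ratio(num, den):
        # Return None rather than raising when a category is empty.
        return num / den if den else None

    return {
        "ppa": ratio(tp, tp + fn),            # positive percent agreement
        "npa": ratio(tn, tn + fp),            # negative percent agreement
        "ppv": ratio(tp, tp + fp),            # positive predictive value
        "overall": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
metrics = agreement_metrics(tp=30, fp=1, fn=1, tn=73)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting these values per tumor type, rather than only pooled, makes tumor-type differences in concordance such as those described above visible.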

Algorithm Performance: The performance of NGS-based MSI detection depends significantly on the bioinformatic algorithm used. One study evaluating MSISensor found it failed to diagnose MSI in 16% of MSI/dMMR metastatic colorectal cancers, with misdiagnosis rates even higher (32%) in another cohort of metastatic colorectal cancers [46]. Novel algorithms like MSICare have demonstrated improved sensitivity, detecting 100% of cases with true MSI status in validation cohorts [46].

Technical Workflows and Decision Pathways

The following diagram illustrates the key decision points and technical workflows for MSI testing, highlighting how sample quality impacts the testing pathway:

Figure 1: MSI Testing Workflow and Sample Quality Decision Pathway

Essential Research Reagent Solutions

Successful MSI testing requires specific reagents and materials to ensure accurate results. The following table outlines key solutions for navigating sample requirements:

Table 2: Essential Research Reagent Solutions for MSI Testing

Reagent/Material | Function/Application | Considerations
FFPE Tissue Sections | Primary DNA source for MSI testing [37] | Optimal thickness 5-10 µm; adjacent H&E sections for tumor purity assessment [42]
DNA Extraction Kits | Isolation of high-quality DNA from FFPE [37] | Select kits demonstrating high yield from limited input; automation-compatible [44]
Quality Control Assays | Assessment of DNA quantity/quality [44] | Fluorometric methods (Qubit/PicoGreen) preferred over spectrophotometry [44]
Reference DNA Standards | Assay validation and quality control [45] | Cell line DNA with known MSI status for validation [45]
Targeted Capture Panels | NGS-based MSI detection [2] | Panels with optimized microsatellite markers (e.g., 7-locus pan-cancer panel) [2]
PCR-Based MSI Kits | Gold-standard MSI detection [37] | Commercial kits (e.g., Promega) using quasi-monomorphic mononucleotide repeats [37]
Library Preparation Kits | NGS library construction from limited DNA [47] | Kits supporting low DNA input (100 pg) with efficient adapter ligation [47]

The selection between PCR and NGS for MSI testing involves careful consideration of sample quality and quantity requirements. While NGS offers the advantage of simultaneous multi-biomarker analysis, it demands higher DNA input, superior DNA quality, and greater tumor purity compared to PCR-based methods [37]. Researchers working with limited or suboptimal samples should consider PCR as a more robust option, while those with adequate sample material may benefit from the comprehensive genomic profiling provided by NGS. The development of novel computational methods for tumor purity estimation [43] and optimized NGS algorithms [46] continues to address these challenges, promising improved MSI detection across diverse sample types in the future.

The accurate detection of microsatellite instability (MSI) using next-generation sequencing (NGS) is a cornerstone of modern precision oncology, guiding immunotherapy decisions for patients with solid tumors. The analytical performance of these bioinformatic pipelines hinges on two fundamental concepts: the determination of robust classification thresholds and the optimization of the signal-to-noise ratio (SNR). Thresholds define the cut-off values that distinguish MSI-High (MSI-H) from microsatellite stable (MSS) tumors, a critical clinical classification. Simultaneously, the SNR quantifies the ability to separate true biological signals from technical and biological background noise, which is endemic to high-throughput genomic data [48] [49]. The optimization of these factors is not merely a technical exercise; it directly impacts diagnostic accuracy, with significant implications for patient treatment and clinical outcomes. Within the broader thesis of NGS performance for MSI testing, this guide objectively compares how different methodological approaches and commercial products handle these challenges, providing researchers and developers with a framework for evaluation and implementation.

MSI Detection Methods: A Comparative Landscape

Gold Standards and Emerging NGS Approaches

The landscape of MSI testing is characterized by a transition from traditional, established methods to comprehensive NGS-based assays.

  • IHC and PCR: The Established Paradigms: Immunohistochemistry (IHC) indirectly assesses MSI status by detecting the loss of mismatch repair (MMR) proteins (MLH1, MSH2, MSH6, PMS2) via staining [2] [37]. While it can indicate which protein is affected, its results can be influenced by pre-analytical processing and non-truncating mutations that preserve antigenicity [2]. Polymerase chain reaction (PCR)-based testing, often considered a gold standard, directly evaluates MSI by using capillary electrophoresis to detect length alterations in a small panel of five to six mononucleotide microsatellite regions by comparing tumor DNA to matched normal DNA [6] [37]. Its high reproducibility is well-established, though the standard panels are primarily validated for colorectal cancer [2].

  • NGS: A Multi-Faceted Tool: NGS-based methods represent a paradigm shift, enabling the simultaneous assessment of hundreds to thousands of microsatellite loci while also profiling other genomic biomarkers like tumor mutational burden (TMB) and specific gene mutations [6] [37]. These methods detect MSI by identifying insertions or deletions within the sequenced microsatellite regions through sophisticated bioinformatic algorithms [2]. A key advantage is that many NGS assays can determine MSI status without a matched normal sample, simplifying the testing process [6]. However, a significant challenge is the lack of standardization in bioinformatic analyses, including the number and type of loci interrogated, the algorithms used, and the thresholds for final classification [6] [37].

Technical Comparison of MSI Detection Methods

Table 1: Comparative advantages and limitations of PCR and NGS for MSI detection.

Feature | PCR-based MSI Testing | NGS-based MSI Testing
Principle | Fragment length analysis of 5-6 microsatellite loci [6] | Sequencing of dozens to thousands of microsatellite loci [2] [37]
DNA Input | Low (1-5 ng) [37] | High (10-50 ng or more) [37]
Normal Sample | Required (for most assays) [37] | Not always required [6]
Throughput | Medium to High (1-96 samples) [37] | Very High (>96 samples) [37]
Additional Data | Provides MSI status only | Simultaneously detects MSI, TMB, mutations, and other genomic alterations [6] [12]
Standardization | Well-standardized markers and interpretation [6] | Lack of standardization; varies by assay and lab [6] [37]
Key Challenge | Primarily validated for colorectal cancer [2] | Indeterminate results in ~3-9% of cases, requiring orthogonal confirmation [37]

Experimental Protocols for Threshold Determination and SNR Optimization

Developing an NGS-based MSI Classifier

The development of a novel NGS-based MSI detector, as exemplified by the MSIDRL algorithm, involves a multi-stage process for threshold determination [2].

  • Marker Selection and Panel Design: Initially, hundreds of robust non-coding microsatellite loci are selected from whole-exome sequencing data. Capture probes are designed for these loci to create a prototype sequencing panel [2].
  • Training with Reference Samples: The prototype panel is used to sequence a training set of pan-cancer FFPE samples (e.g., 105 samples) with pre-defined MSI status determined by PCR (31 MSI-H and 74 MSS). For each microsatellite locus, the cumulative read counts by repeat length are compared between MSI-H and MSS samples. The repeat length that maximizes the difference in read count distributions is defined as the "diacritical repeat length" (DRL) for that locus [2].
  • Calculation of Background Noise and Locus Scoring: Background noise (B_i) for each locus i is calculated as the ratio of "unstable" reads (length ≤ DRL) to total reads in the MSS samples. For any sample j, the proportion of unstable reads (b_ij) is compared to the background noise using a binomial test. A p-value cutoff (P_i) for each locus is determined to achieve a specificity of ≥99% while maximizing sensitivity [2].
  • Final Panel and Threshold Determination: The top-performing loci are selected for the final panel. The unstable locus count (ULC)—the number of loci with p-values exceeding their cutoffs—is calculated for each sample. The optimal ULC cutoff (e.g., 11) to classify a sample as MSI-H is determined by analyzing the bimodal distribution of ULC scores across a very large cohort (e.g., 35,563 cases) [2].
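
The locus-scoring steps above can be sketched in a few lines. This is not the published MSIDRL implementation: pooling reads across MSS samples to obtain B_i, using a one-sided upper-tail binomial test, and calling a locus unstable when that p-value falls below its cutoff are all assumptions made for illustration, as are the function names, the toy read counts, and the 1e-3 cutoff.

```python
from math import exp, lgamma, log


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def unstable_and_total(read_counts: dict, drl: int):
    """read_counts maps repeat length -> read count; unstable means length <= DRL."""
    total = sum(read_counts.values())
    unstable = sum(c for length, c in read_counts.items() if length <= drl)
    return unstable, total


def background_noise(mss_samples: list, drl: int) -> float:
    """B_i: unstable reads over total reads, pooled across MSS samples (assumed)."""
    pairs = [unstable_and_total(s, drl) for s in mss_samples]
    return sum(u for u, _ in pairs) / sum(t for _, t in pairs)


def locus_is_unstable(read_counts: dict, drl: int, b_i: float, p_cutoff: float) -> bool:
    k, n = unstable_and_total(read_counts, drl)
    return binom_upper_tail(k, n, b_i) < p_cutoff


def unstable_locus_count(sample: dict, model: dict) -> int:
    """ULC: number of loci in the sample called unstable under the model."""
    return sum(
        locus_is_unstable(sample[locus], drl, b_i, p_cut)
        for locus, (drl, b_i, p_cut) in model.items()
        if locus in sample
    )


# Toy data for a single hypothetical locus with DRL = 14.
mss_training = [{15: 90, 14: 8, 13: 2}, {15: 95, 14: 5}]
b_i = background_noise(mss_training, drl=14)          # 15 unstable of 200 reads
model = {"locus_A": (14, b_i, 1e-3)}
tumor = {"locus_A": {15: 50, 14: 30, 13: 20}}
stable = {"locus_A": {15: 93, 14: 7}}
print(unstable_locus_count(tumor, model), unstable_locus_count(stable, model))
```

With a full panel, the resulting ULC would then be compared against the cohort-derived cutoff (11 in the cited study) to assign MSI-H; whether that comparison is inclusive is not stated in the text and would need to be checked against the original publication.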

Protocol for SNR Evaluation in Genomic Assays

While the specific term "SNR" is less common in clinical MSI literature, the underlying principle of maximizing the true signal (MSI) over background (technical noise, biological variation) is central. The following protocol, adapted from general microarray and biological computation research, provides a framework [48] [49].

  • Define Signal and Noise: The signal is the quantitative measurement intended to represent the biological phenomenon (e.g., the proportion of unstable microsatellite loci in a sample). Noise encompasses all sources of variation that obscure this signal, including sequencing errors, low tumor purity, and DNA degradation [37] [49].
  • Calculate SNR: A common method for calculating SNR in fluorescence-based assays is: SNR = (Signal_Mean - Background_Mean) / Standard_Deviation_of_Background [50]. A ratio of 3.0 is often the lower limit for reliable detection [50]. For biological data with log-normal distributions (common in gene expression), a geometric SNR calculation is more appropriate: SNR_dB = 20 * log10( |log10(μ_g,true / μ_g,false)| / (2 * log10(σ_g)) ), where μ_g,true and μ_g,false are the geometric means of the "true" and "false" states and σ_g is the geometric standard deviation [49].
  • Optimize Hybridization and Washes: In wet-lab procedures, non-specific hybridization is a primary source of background noise. Optimizing hybridization conditions and stringency washes is the most effective way to increase the SNR numerator [50].
  • Optimize Instrument Settings: For scanning instruments, the photomultiplier tube (PMT) gain must be determined empirically. The relationship between SNR and PMT gain should be measured across a range of target concentrations to find the setting that provides the highest SNR without signal saturation [50].
  • Apply Noise-Reduction in Bioinformatics: In computational analysis, less frequent sequences or data points can be a source of noise. One effective strategy is to remove less frequent sequences, which has been shown to partition an additional 25% of variance from noise to explanatory factors in statistical models, thereby improving the SNR [51].
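
The two SNR formulas given in the protocol above can be written out directly. The sketch below is a minimal illustration; the function names and example values are hypothetical, and the use of the sample (n - 1) standard deviation for the background is an assumption, since the text does not specify which estimator is intended.

```python
import math
import statistics


def simple_snr(signal: list, background: list) -> float:
    """(mean signal - mean background) / standard deviation of background."""
    return ((statistics.mean(signal) - statistics.mean(background))
            / statistics.stdev(background))  # sample stdev (assumed)


def geometric_snr_db(mu_g_true: float, mu_g_false: float, sigma_g: float) -> float:
    """SNR_dB = 20 * log10(|log10(mu_g_true / mu_g_false)| / (2 * log10(sigma_g)))."""
    if sigma_g <= 1.0:
        raise ValueError("geometric standard deviation must exceed 1")
    separation = abs(math.log10(mu_g_true / mu_g_false))
    if separation == 0.0:
        return float("-inf")  # identical states carry no signal
    return 20 * math.log10(separation / (2 * math.log10(sigma_g)))


# Hypothetical intensity readings.
snr = simple_snr(signal=[120, 130, 125], background=[20, 22, 18, 20])
print(f"simple SNR: {snr:.1f} (detectable: {snr >= 3.0})")

# Two states separated by two decades with a geometric SD of sqrt(10).
snr_db = geometric_snr_db(mu_g_true=1000.0, mu_g_false=10.0, sigma_g=10 ** 0.5)
print(f"geometric SNR: {snr_db:.2f} dB")
```

The 3.0 detection limit applied in the first example is the value quoted in the text for fluorescence-based assays; it is not a validated threshold for MSI scores.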

Performance Data: Concordance and Thresholds in Practice

Real-World Concordance and Threshold Validation

Independent studies and large-scale corporate validations provide critical data on the performance of NGS-based MSI testing compared to traditional methods.

Table 2: Performance metrics of NGS-MSI from recent studies.

Study / Assay | Study Cohort | Reference Method | Key Performance Findings
Illumina TST170/TSO500 [6] | 314 tumors (multiple types) | PCR (6 markers) | Overall AUC = 0.922; Colorectal cancer AUC = 0.867; Optimal MSI score cut-off: ≥13.8%; Borderline range: 8.7% to <13.8%
Caris MI Cancer Seek [12] | 191,767 solid tumors (pan-cancer) | IHC | 99.69% concordance between NGS-MSI and IHC-MMR; Discordance observed in only 0.31% of samples
MSIDRL Algorithm [2] | 35,563 pan-cancer cases | PCR | Bimodal ULC distribution observed; ULC cut-off of 11 determined optimal for MSI-H classification
NGS-MSI (Various) [37] | Literature Review | PCR/IHC | ~3.2% - 8.9% of solid tumor samples yield "indeterminate" or "unknown" MSI results

Analysis of Discordant and Indeterminate Results

Despite high overall concordance, discordant and indeterminate results highlight areas for pipeline optimization. A large-scale study found minimal (0.31%) discordance between NGS-MSI and IHC-MMR, with no difference in overall survival between patients in the discordant group, suggesting clinical equivalence [12]. A more frequent challenge is the "indeterminate" result, where NGS assays fail to classify a sample as MSI-H or MSS with confidence. This occurs in approximately 3.2-8.9% of cases and is often due to low tumor purity, low DNA input, degraded DNA, or insufficient sequencing coverage [37]. For samples falling into a "borderline" zone (e.g., MSI score of 8.7% to 13.8% in one study), integrating Tumor Mutational Burden (TMB) into the classification workflow can significantly improve diagnostic accuracy. If results remain inconclusive, orthogonal confirmation by PCR or IHC is recommended [6].

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Key reagents, tools, and platforms for MSI pipeline development and evaluation.

Item | Function/Description | Example Use in MSI Research
FFPE Tumor Specimens | The primary source material for DNA extraction in clinical MSI testing. | Required for both PCR and NGS methods; quality and tumor purity directly impact SNR and classification accuracy [37].
Targeted NGS Panels | Gene panels designed to capture microsatellite loci and other genomic regions of interest. | Illumina TruSight Oncology 500 and TruSight Tumor 170 are examples used for integrated MSI and genomic profiling [6].
MSI Algorithms (e.g., MSIsensor, MSIDRL) | Bioinformatics tools to analyze NGS data and calculate MSI scores. | MSIDRL uses a diacritical repeat length and binomial test to classify unstable loci [2].
PCR-based MSI Kits | Commercially available kits for gold-standard fragment analysis. | Used for orthogonal confirmation of NGS results, especially for indeterminate or borderline cases [6] [37].
IHC Antibodies (MLH1, MSH2, MSH6, PMS2) | Antibodies for detecting MMR protein expression. | Used for initial screening or orthogonal confirmation of dMMR status [6] [37].
Automated ML Tools (e.g., TPOT) | Frameworks for optimizing machine learning pipelines using genetic programming. | Can be applied to explore optimal combinations of data pre-processing, feature selection, and classifiers for MSI prediction [52].

Workflow and Pathway Diagrams

NGS-MSI Bioinformatics Pipeline

The following diagram illustrates the key steps in a bioinformatic pipeline for determining MSI status from raw sequencing data, culminating in the critical threshold decision.

MSI-H Threshold Determination Process

This diagram outlines the experimental process for establishing a robust MSI-H classification threshold, as described in the development of the MSIDRL algorithm.

The optimization of bioinformatic pipelines for NGS-based MSI testing is an iterative process centered on the precise determination of classification thresholds and the continuous improvement of the signal-to-noise ratio. As the data demonstrates, NGS platforms achieve high concordance with traditional methods across large, pan-cancer cohorts, validating their clinical utility [2] [12]. However, challenges remain in standardizing analytical approaches and managing indeterminate cases. Future advancements will likely come from the integration of multi-modal data, such as TMB, and the application of sophisticated machine learning frameworks like TPOT [6] [52] to further refine classification boundaries. For researchers and clinicians, this necessitates a thorough understanding of the experimental protocols and performance characteristics underlying these powerful tools, ensuring that MSI status remains a reliable beacon for guiding cancer immunotherapy.

Integrating TMB to Resolve Borderline MSI Classifications

Microsatellite instability (MSI) status serves as a crucial biomarker for predicting response to immune checkpoint inhibitors and identifying Lynch syndrome. While next-generation sequencing (NGS) has emerged as a comprehensive platform for MSI detection, a significant challenge remains in classifying samples that fall into borderline or indeterminate ranges. This review examines the growing evidence supporting the integration of tumor mutational burden (TMB) as a complementary biomarker to resolve these ambiguous cases. We evaluate experimental data demonstrating how TMB enhances diagnostic accuracy in MSI-borderline cases, compare the performance of different testing methodologies, and provide detailed protocols for implementing this integrated approach in clinical research settings. The synthesis of current evidence indicates that TMB integration significantly improves MSI classification confidence, particularly for samples with MSI scores ranging from 8.7% to 13.8%, ultimately optimizing patient stratification for immunotherapy.

The widespread adoption of next-generation sequencing (NGS) for microsatellite instability (MSI) detection has revolutionized molecular diagnostics by enabling simultaneous assessment of multiple biomarkers from limited tissue samples. However, a significant limitation of NGS-based MSI testing emerges in the gray zone of borderline classifications, where MSI scores fall between clearly stable and highly unstable thresholds. These indeterminate results, which various platforms classify as "MSI-indeterminate," "MSI-equivocal," or "MSI-borderline," affect approximately 3.2%-8.9% of solid tumor samples, with one cohort of 191,767 samples reporting a rate of 8.66% [37]. This classification uncertainty poses substantial challenges for clinical decision-making and trial eligibility.

The fundamental issue stems from several technical and biological factors. Tumor samples with low purity (typically below 20%) can dilute the MSI signal, while degraded DNA from FFPE specimens may yield insufficient coverage of microsatellite loci [37]. Additionally, the absence of standardized bioinformatic algorithms and thresholds across NGS platforms creates variability in MSI scoring [53] [37]. These challenges necessitate complementary approaches to resolve indeterminate cases, with tumor mutational burden (TMB) emerging as a particularly promising biomarker due to its mechanistic relationship with mismatch repair deficiency.

Recent evidence suggests that integrating TMB measurements can significantly improve diagnostic accuracy for samples with borderline MSI scores [53]. This review systematically evaluates the experimental data supporting this integrated approach, compares methodology performance, and provides detailed protocols for research implementation, addressing a critical need in precision oncology.

Experimental Evidence: TMB Integration Enhances MSI Classification

Key Studies Demonstrating TMB Utility in Borderline MSI

Real-World Performance of Illumina NGS Panels

A comprehensive retrospective study evaluating Illumina's TruSight Tumor 170 and TruSight Oncology 500 panels in 331 cancer patients demonstrated the challenge of borderline MSI classifications and the value of TMB integration. The research established an MSI score cut-off of ≥13.8% for defining MSI-High (MSI-H) status, with high overall concordance to PCR-based testing (AUC = 0.922). However, the study identified a critical borderline range of ≥8.7% to <13.8% where MSI classification remained uncertain [53] [6].

Within this borderline range, researchers found that integrating TMB significantly improved diagnostic accuracy. The mechanism behind this enhancement lies in the biological relationship between mismatch repair deficiency and hypermutation. Deficient MMR systems cause simultaneous instability in microsatellite regions and increased mutation accumulation throughout the genome, manifesting as both MSI and high TMB [54]. The study recommended orthogonal confirmation using MSI-PCR for samples that remained inconclusive after TMB integration, highlighting a sequential approach to borderline case resolution [53].

Large-Scale Pan-Cancer Validation

A large-scale retrospective analysis of 35,563 Chinese pan-cancer cases further validated the relationship between MSI and TMB, providing additional evidence for their integrated use. The study developed a novel NGS-based MSI detection algorithm (MSIDRL) that examined MSI-H prevalence across cancer types and identified MSI-associated genes and variants [2].

The research demonstrated that tumors with high MSI status typically exhibited elevated TMB levels, particularly in cancers with the highest MSI-H prevalence (uterine, gastric, and colorectal cancers). This correlation provides the biological rationale for using TMB as an adjunctive biomarker when MSI scores fall into borderline ranges. The study additionally distilled 7 MS loci particularly suitable for pan-cancer MSI detection, offering potential standardization for future assay development [2].

Table 1: Key Studies Supporting TMB Integration for Borderline MSI Classification

Study | Cohort Size | NGS Platform | Borderline Range Definition | TMB Integration Benefit
Škerl et al. [53] | 331 patients | TruSight Tumor 170 / TruSight Oncology 500 | MSI score: 8.7%-13.8% | Significant improvement in diagnostic accuracy
CheckMate 142 Analysis [55] | 59 WES-evaluable | Whole exome sequencing | Not specified | TMB associated with improved response to combo immunotherapy
Large-scale Pan-Cancer [2] | 35,563 cases | 733-gene NGS LDT | ULC: ~10-90 (bimodal distribution) | Correlation between high MSI and elevated TMB established

Biological Mechanisms Linking MSI and TMB

The synergistic relationship between MSI and TMB stems from their shared underlying mechanism of DNA mismatch repair deficiency. When MMR genes (MLH1, MSH2, MSH6, PMS2) are impaired through genetic or epigenetic alterations, the cellular ability to correct DNA replication errors is compromised [54]. This deficiency particularly affects microsatellite regions—short tandem repeats of 1-6 nucleotides that are prone to replication slippage. The resultant insertions or deletions at these loci manifest as MSI [2].

Concurrently, MMR deficiency permits accumulation of mutations throughout the genome, leading to elevated TMB. This relationship explains why approximately 16% of cancers with high TMB (defined as >20 mutations/Mb) also demonstrate MMR deficiency and high MSI [54]. The biological link creates a mechanistic foundation for using TMB to clarify ambiguous MSI classifications, particularly in tumors where technical factors may obscure the MSI signal despite underlying MMR deficiency.

Methodology Comparison: Technical Approaches for MSI and TMB Assessment

NGS-Based MSI Detection Methods

NGS-based MSI detection methodologies vary significantly in their technical approaches, contributing to the challenge of standardized classification. The fundamental principle involves sequencing multiple microsatellite loci and comparing the profiles to reference standards to identify instability patterns. Key methodological variations include:

  • Loci Selection: The number of microsatellite loci analyzed ranges widely from 5 to 7,000 across different platforms, with significant implications for sensitivity and specificity [37]. Mononucleotide repeats are generally more sensitive for MSI detection compared to di-, tri-, tetra-, or pentanucleotide repeats [37].

  • Bioinformatic Algorithms: Computational approaches for MSI status determination include MSIsensor, MSIsensor2, and mSINGS, each with different scoring systems and thresholds [2]. The novel MSIDRL algorithm introduced in the large pan-cancer study utilizes a "diacritical repeat length" concept and binomial testing to classify instability at each locus [2].

  • Scoring Systems: MSI scoring methods include percentage-based systems (e.g., Illumina's MSI score), unstable locus count (ULC) methods, and statistical probability approaches. The absence of standardization across these systems contributes to classification variability, particularly in borderline cases [53] [2].

Table 2: Comparison of MSI Detection Methodologies

Parameter | PCR-Based MSI | NGS-Based MSI | IHC for MMR Proteins
Principle | Fragment length analysis of 5-6 microsatellite markers | Sequencing of 5 to 7,000 microsatellite loci | Protein expression of MLH1, MSH2, MSH6, PMS2
DNA Input | 1-2 ng [37] | 10-50 ng or more [37] | Not applicable (protein-based)
Matched Normal | Required (except QMVR method) [37] | Varies by assay [37] | Not required
Throughput | Medium (1-96 samples) [37] | High (>96 samples) [37] | Medium
Advantages | Gold standard, minimal sample requirements, standardized | Simultaneous detection of other genomic alterations, no normal required for some assays | Identifies affected MMR protein, guides germline testing
Limitations | Does not identify gene mutations | Lack of standardization, bioinformatic complexity, indeterminate results | Epigenetic alterations may cause false negatives

TMB Measurement Approaches

TMB quantification methodologies also exhibit significant variability that researchers must consider when integrating TMB with MSI classifications:

  • Targeted Panels vs. Whole Exome Sequencing: Targeted NGS panels estimate TMB from several hundred genes, while whole exome sequencing (WES) provides comprehensive measurement. Correlations between panel-based and WES-based TMB require careful validation [55] [54].

  • TMB Calculation Methods: TMB is typically reported as mutations per megabase (mut/Mb), with thresholds for "high" TMB varying across cancer types. The arbitrary cut-off of 10 mut/Mb approved for pembrolizumab has demonstrated suboptimal performance in some contexts, with evidence supporting higher thresholds (16-20 mut/Mb) for certain cancers [54].

  • Tumor Purity Considerations: Both MSI and TMB measurements are affected by tumor purity, with samples below 20% tumor content particularly prone to inaccurate classification [37]. Computational methods to adjust for tumor purity can improve accuracy for both biomarkers.
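
Because TMB is reported as mutations per megabase, the same mutation count yields different values depending on how much sequence a panel covers. The sketch below shows the unit conversion and a parameterized high/low call; the function names, the 1.3 Mb panel footprint, and the mutation count are hypothetical, while the 10 and 20 mut/Mb thresholds are the values discussed in the text.

```python
def tmb_mut_per_mb(mutation_count: int, covered_bases: int) -> float:
    """Mutations per megabase over the covered (callable) region."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count * 1_000_000 / covered_bases


def is_tmb_high(tmb: float, threshold: float) -> bool:
    """Threshold is left to the caller because it varies by assay and cancer type."""
    return tmb >= threshold


# Hypothetical panel: 26 eligible mutations over 1.3 Mb of covered sequence.
tmb = tmb_mut_per_mb(mutation_count=26, covered_bases=1_300_000)
print(f"TMB = {tmb:.1f} mut/Mb")
print("high at 10 mut/Mb:", is_tmb_high(tmb, threshold=10.0))
print("high at 20 mut/Mb:", is_tmb_high(tmb, threshold=20.0))
```

Which mutations count toward the numerator (for example, whether synonymous variants are included) differs between assays and is deliberately left outside this sketch.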

Integrated Classification Workflows

The experimental evidence supports a sequential workflow for resolving borderline MSI cases through TMB integration. The following diagram illustrates this decision process:

Research Reagent Solutions and Experimental Protocols

Essential Research Materials for Integrated MSI/TMB Analysis

Table 3: Key Research Reagents and Platforms for MSI/TMB Integration Studies

Reagent/Platform | Type | Primary Research Function | Considerations for Borderline Cases
TruSight Oncology 500 (Illumina) | Targeted NGS panel | Simultaneous MSI and TMB assessment | Established cut-off of ≥13.8% for MSI-H; borderline range: 8.7%-13.8% [53]
MSI-PCR Kit (Promega) | PCR-based MSI detection | Orthogonal confirmation for borderline cases | Gold standard; requires matched normal; minimal DNA input [37]
FFPE DNA Extraction Kits | Sample preparation | Obtain high-quality DNA from archival tissues | Critical for minimizing indeterminate results; degraded DNA increases borderline risk [37]
Tumor Enrichment Tools | Sample preparation | Macrodissection or microdissection to increase tumor purity | Tumor purity <20% increases borderline classification likelihood [37]
MSIsensor | Bioinformatics tool | MSI status calling from NGS data | Open-source algorithm; requires optimization for specific panels [2]

Detailed Experimental Protocol for Borderline MSI Resolution

Based on the synthesized evidence, the following step-by-step protocol is recommended for resolving borderline MSI classifications in research settings:

Sample Preparation and Quality Control
  • DNA Extraction: Extract DNA from FFPE tumor sections using specialized kits designed for degraded material. Ensure tumor purity exceeds 20% through pathologist-guided macrodissection if necessary [37].
  • Quality Assessment: Quantify DNA using fluorometric methods and assess degradation through genomic quality number (GQN) or similar metrics. Proceed only with samples meeting platform-specific quality thresholds.
  • NGS Library Preparation: Prepare sequencing libraries using validated NGS panels (e.g., TruSight Oncology 500) according to manufacturer protocols, incorporating unique molecular identifiers to reduce artifacts.
Sequencing and Data Analysis
  • Sequencing Parameters: Sequence to a minimum depth of 500x for targeted panels, ensuring adequate coverage of microsatellite regions.
  • MSI Scoring: Calculate MSI scores using platform-specific algorithms. For Illumina panels, apply the recommended threshold of ≥13.8% for MSI-H and <8.7% for MSS [53].
  • TMB Calculation: Determine TMB in mutations per megabase, applying appropriate panel-specific corrections for comparison to WES-based standards.
Integrated Classification for Borderline Cases
  • Borderline Identification: Flag samples with MSI scores between 8.7% and 13.8% as borderline [53].
  • TMB Integration: For borderline cases, classify as MSI-H if TMB exceeds 20 mut/Mb (or cancer-specific optimized thresholds) [54].
  • Orthogonal Confirmation: Subject TMB-low borderline cases to MSI-PCR using quasimonomorphic mononucleotide repeat panels or MMR IHC for definitive classification [53] [37].
Data Interpretation and Reporting
  • Documentation: Clearly report integrated MSI/TMB status, specifying the methodology and thresholds used.
  • Uncertain Cases: Acknowledge biologically ambiguous cases where discordance between MSI and TMB persists despite orthogonal testing, as these may represent technical artifacts or biologically distinct subsets.
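
The classification steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated pipeline: the function and constant names are hypothetical, the thresholds are the ones quoted in the protocol (13.8%, 8.7%, 20 mut/Mb), and the TMB helper applies no panel-specific correction toward WES-based values.

```python
from typing import Optional

# Thresholds quoted in the protocol above; treat them as panel-specific.
MSI_H_CUTOFF = 13.8      # MSI score (%) at or above which a sample is MSI-H
BORDERLINE_LOW = 8.7     # MSI score (%) below which a sample is MSS
TMB_HIGH_CUTOFF = 20.0   # mut/Mb; cancer-specific values may differ


def tmb_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Mutations per megabase over the sequenced territory (uncorrected)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


def classify_integrated(msi_score: float, tmb: Optional[float] = None) -> str:
    """Combine an MSI score (%) with TMB (mut/Mb) for borderline samples."""
    if msi_score >= MSI_H_CUTOFF:
        return "MSI-H"
    if msi_score < BORDERLINE_LOW:
        return "MSS"
    # Borderline range: TMB is consulted only here.
    if tmb is not None and tmb > TMB_HIGH_CUTOFF:
        return "MSI-H (borderline score, TMB-high)"
    return "Borderline: refer for MSI-PCR or MMR IHC"


# Hypothetical sample: 50 mutations over 2 Mb of covered territory.
print(classify_integrated(10.5, tmb_per_mb(50, 2_000_000)))
```

Keeping the thresholds as named constants makes it easier to swap in laboratory-validated or cancer-specific values without touching the decision logic.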

Clinical Implications and Research Applications

Predictive Value for Immunotherapy Response

The integration of MSI and TMB has particular relevance for predicting response to immune checkpoint inhibitors, with different implications for monotherapy versus combination regimens. Analysis from the CheckMate 142 study revealed that in patients with MSI-H/dMMR metastatic colorectal cancer, higher expression of inflammation-related gene signatures was associated with improved response to nivolumab monotherapy. In contrast, higher TMB, tumor indel burden, and degree of MSI were associated with improved response to nivolumab plus ipilimumab combination therapy [55].

This differential association underscores the complementary nature of these biomarkers. While MSI status reflects underlying MMR deficiency and associated immune activation, TMB may provide additional information about neoantigen load that becomes particularly important for combination immunotherapy approaches. For borderline MSI cases, TMB integration may therefore help optimize therapy selection between monotherapy and combination regimens.

Biological Insights from Multi-Omics Approaches

Advanced multi-omics analyses have revealed substantial heterogeneity within MSI-H tumors that further supports the value of integrated biomarker approaches. Proteomic profiling of MSI-H colorectal cancers has identified distinct subtypes with different clinical outcomes, mutational signatures, and pathway enrichments [56]. The MSI-H1 subtype exhibits enrichment of DNA replication and mismatch repair pathways, higher TMB, and prolonged survival, while the MSI-H2 subtype shows extracellular matrix-receptor interaction, high stromal infiltration, and poorer outcomes [56].

These findings indicate that even within definitively MSI-H tumors, additional stratification by TMB and other molecular features may enhance prognostic prediction and therapeutic targeting. For borderline MSI cases, multi-omics approaches could provide further resolution of biologically distinct subsets with different clinical behaviors.

The integration of TMB to resolve borderline MSI classifications represents a significant advancement in precision oncology, addressing a critical challenge in NGS-based biomarker implementation. Experimental evidence consistently demonstrates that this integrated approach improves diagnostic accuracy, particularly for samples with MSI scores falling between 8.7% and 13.8%. The biological rationale stems from the shared mechanism of MMR deficiency driving both microsatellite instability and genome-wide hypermutation.

Future research directions should focus on standardizing TMB and MSI measurement across platforms, establishing cancer-specific thresholds for integrated classification, and validating the clinical utility of this approach in prospective trials. Additionally, exploring the relationship between MSI, TMB, and other emerging biomarkers such as tumor indel burden and gene expression signatures may further refine patient stratification. As NGS continues to evolve as a comprehensive diagnostic platform, algorithmic integration of multiple biomarkers will be essential for maximizing the clinical value of genomic testing in oncology.

Validation and Concordance: Benchmarking NGS Against IHC and PCR

The integration of next-generation sequencing (NGS) into clinical diagnostics has necessitated rigorous evaluation of its concordance with established gold standard methods for detecting microsatellite instability (MSI). This guide synthesizes evidence from large-scale retrospective studies to objectively compare the performance of NGS against polymerase chain reaction (PCR) and immunohistochemistry (IHC). Data demonstrate that NGS achieves high overall concordance with PCR, often exceeding 98%, though performance varies across tumor types. The comprehensive genomic profiling capability of NGS, coupled with its high accuracy, positions it as a powerful alternative to traditional methods, particularly in tissue-limited scenarios and pan-cancer applications.

Microsatellite instability (MSI) has emerged as a crucial pan-cancer biomarker, predicting response to immune checkpoint inhibitors and identifying patients with Lynch syndrome. The DNA mismatch repair (MMR) system, composed of proteins MLH1, MSH2, MSH6, and PMS2, corrects errors during DNA replication. Deficient MMR (dMMR) leads to MSI, characterized by length alterations in short, repetitive DNA sequences known as microsatellites. For years, immunohistochemistry (IHC) for MMR protein expression and PCR-based fragment analysis have served as diagnostic gold standards. IHC detects loss of MMR protein expression, while PCR directly identifies length variations in specific microsatellite markers. With the advent of NGS, which analyzes dozens to hundreds of microsatellite loci while simultaneously assessing other genomic biomarkers, understanding its concordance with traditional methods is essential for clinical implementation. This guide examines large-scale evidence validating NGS performance against these established standards.

Methodological Frameworks: Comparing Testing Platforms

Established Gold Standards: IHC and PCR

Immunohistochemistry (IHC) visually assesses the nuclear expression of four MMR proteins (MLH1, MSH2, MSH6, and PMS2) in tumor tissues. Loss of expression in one or more proteins suggests dMMR. While cost-effective and widely accessible, IHC results can be affected by pre-analytical variables, staining interpretation, and non-truncating mutations that preserve antigenicity despite functional loss [2] [57].

PCR-Based Methods amplify a panel of five mononucleotide markers (e.g., BAT-25, BAT-26, NR-21, NR-24, MONO-27). MSI-High (MSI-H) status is assigned when ≥2 markers show instability, while Microsatellite Stable (MSS) tumors show no instability. PCR directly detects the functional consequence of dMMR but typically requires matched normal tissue for comparison and analyzes a limited number of loci [58].
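
The marker-count rule can be written as a short function. This is a sketch only: the function name is hypothetical, and labelling a single unstable marker as MSI-L is an assumption based on common convention, since the text above defines only the MSI-H (≥2 markers) and MSS (no instability) cases.

```python
from typing import Iterable

# Five mononucleotide markers named in the text above.
PENTAPLEX = ("BAT-25", "BAT-26", "NR-21", "NR-24", "MONO-27")


def pcr_msi_call(unstable_markers: Iterable[str]) -> str:
    """Classify a tumor from the panel markers that showed a length shift."""
    unknown = set(unstable_markers) - set(PENTAPLEX)
    if unknown:
        raise ValueError(f"markers not in panel: {sorted(unknown)}")
    n_unstable = len(set(unstable_markers))
    if n_unstable >= 2:
        return "MSI-H"
    if n_unstable == 0:
        return "MSS"
    return "MSI-L"  # assumption: exactly one unstable marker


print(pcr_msi_call(["BAT-25", "NR-21"]))
```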

Next-Generation Sequencing: A Multiplexed Approach

NGS-based MSI detection leverages targeted gene panels to sequence dozens to hundreds of microsatellite loci. MSI status is determined bioinformatically by calculating the percentage of unstable loci. Common panels include:

  • TruSight Oncology 500 (Illumina): Assesses ~130 loci
  • FoundationOne CDx: Analyzes 95 loci
  • AVENIO CGP Kit (Roche): Uses a proprietary algorithm
  • VariantPlex Solid Tumor Focus v2 (ArcherDx): Evaluates 108-111 loci

NGS provides a quantitative MSI score and can be performed without matched normal tissue. Its key advantage is the simultaneous assessment of MSI, tumor mutational burden (TMB), and other genomic alterations from a single test [2] [6] [8].

Table 1: Core Methodological Characteristics of MSI Testing Platforms

| Method | Target | Readout | Key Advantages | Key Limitations |
|---|---|---|---|---|
| IHC | MMR proteins (MLH1, MSH2, MSH6, PMS2) | Protein expression (presence/absence) | Low cost, rapid turnaround, identifies affected protein | Subjective interpretation, false negatives with non-truncating mutations |
| PCR | 5-6 mononucleotide repeat markers | Fragment length analysis | Direct detection of MSI, high sensitivity/specificity for colorectal cancer | Requires matched normal tissue, limited loci, less validated for non-colorectal cancers |
| NGS | 95-130+ microsatellite loci + gene panel | Sequence alignment, quantitative MSI score | Pan-cancer application, simultaneous genomic profiling, no normal tissue required | Higher cost, complex bioinformatics, lack of standardized thresholds |

Concordance Data: NGS Versus Traditional Methods

Large-scale retrospective analyses demonstrate high overall concordance between NGS and PCR-based methods. A study of 35,563 pan-cancer cases utilizing a novel NGS algorithm (MSIDRL) found a clear bimodal distribution of unstable locus counts (ULC), enabling precise classification of MSI-H and MSS tumors [2]. Another analysis of 314 tumors across multiple cancer types reported an area under the curve (AUC) of 0.922 for NGS compared to PCR, indicating excellent discriminatory power [6] [10]. A separate investigation of 80 solid tumors found a 98.8% concordance rate between NGS and PCR, with positive and negative predictive values of 100% and 98.7%, respectively [58].
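
For readers reproducing such figures, the sketch below shows how overall concordance and predictive values follow from a 2x2 comparison of NGS against PCR. The function name is hypothetical, and the counts in the usage line are illustrative values chosen only to be arithmetically consistent with the percentages quoted for the 80-tumor study; the study's actual contingency table is not given here.

```python
def concordance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Agreement statistics for NGS (test) versus PCR (reference).

    tp: MSI-H by both methods     fp: MSI-H by NGS only
    fn: MSI-H by PCR only         tn: MSS by both methods
    """
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as None.
        return num / den if den else None

    total = tp + fp + fn + tn
    return {
        "concordance": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Illustrative counts only (assumed, not taken from the study).
metrics = concordance_metrics(tp=4, fp=0, fn=1, tn=75)
print({name: f"{value:.1%}" for name, value in metrics.items()})
```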

Tumor-Specific Performance Variations

While overall concordance is high, performance varies across cancer types. In colorectal cancer (CRC), studies report slightly lower concordance (AUC 0.867) due to broader score variability [6] [10]. In contrast, perfect agreement (AUC 1.00) has been observed in prostate and biliary tract cancers, though smaller sample sizes in these malignancies warrant cautious interpretation [6]. Endometrial cancer presents particular challenges, with one study reporting 88.6% sensitivity for NGS compared to PCR, attributed to cases with "subtle MSI+ phenotype" where instability appears in fewer than five markers [59].

Table 2: Concordance Metrics Between NGS and PCR Across Cancer Types

| Cancer Type | Sample Size | Concordance Metric | Reported Value | Study |
|---|---|---|---|---|
| Pan-Cancer | 314 | AUC | 0.922 | [6] [10] |
| Colorectal | 201 | AUC | 0.867 | [6] [10] |
| Prostate | 58 | AUC | 1.00 | [6] |
| Biliary Tract | 11 | AUC | 1.00 | [6] |
| Endometrial | 55 | Sensitivity | 88.6% | [59] |
| Mixed Solid Tumors | 80 | Overall Concordance | 98.8% | [58] |

Concordance with IHC Standards

NGS also shows strong correlation with IHC. A 2025 study of 139 tumors, comprising 51 colorectal carcinomas, 22 pancreatic ductal adenocarcinomas, and other malignancies, found that 10 of 12 MSI-H tumors exhibited MMR protein loss by IHC [8]. The two discordant cases (a mucinous adenocarcinoma of omental origin and a mucinous colon adenocarcinoma) were MSI-H by NGS but retained MMR protein expression, potentially representing false-negative IHC results or unusual biological mechanisms [8]. In endometrial cancer, IHC demonstrated 89.3% sensitivity and 87.3% specificity compared to PCR, indicating substantial but imperfect agreement [57].

Special Considerations and Discordant Cases

Algorithm Thresholds and Standardization

A significant challenge in NGS-based MSI testing is the lack of standardized thresholds across platforms. Different panels and algorithms employ varying cut-off values for MSI classification. For instance, the Illumina TSO500 panel has proposed an optimal MSI score cut-off of ≥13.8%, with a "borderline" range of ≥8.7% to <13.8% where integration with TMB improves classification accuracy [6]. Similarly, the VariantPlex panel classifies samples with >30% unstable loci as MSI-H, <20% as MSS, and 20-30% as MSI-Intermediate [8]. This variability underscores the need for laboratory-specific validation and, in borderline cases, orthogonal confirmation with PCR or IHC.
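
To make the consequence of non-standardized thresholds concrete, the sketch below applies the two rule sets quoted above to the same percentage of unstable loci. The `PanelRule` structure and function names are hypothetical; only the cut-off values and their inclusive or exclusive boundaries are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelRule:
    low: float            # scores below this are MSS
    high: float           # MSI-H boundary
    high_inclusive: bool  # True: score >= high is MSI-H; False: score > high
    middle_label: str     # label for scores between low and high


RULES = {
    "TSO500": PanelRule(8.7, 13.8, True, "Borderline"),
    "VariantPlex": PanelRule(20.0, 30.0, False, "MSI-Intermediate"),
}


def percent_unstable(unstable_loci: int, assessed_loci: int) -> float:
    """MSI score as the percentage of assessed loci called unstable."""
    if assessed_loci <= 0:
        raise ValueError("assessed_loci must be positive")
    return 100.0 * unstable_loci / assessed_loci


def classify(panel: str, score: float) -> str:
    rule = RULES[panel]
    is_high = score >= rule.high if rule.high_inclusive else score > rule.high
    if is_high:
        return "MSI-H"
    if score < rule.low:
        return "MSS"
    return rule.middle_label


score = percent_unstable(25, 100)
print(classify("TSO500", score), "|", classify("VariantPlex", score))
```

The same numeric score lands in different categories under the two rule sets, which is why a score is only interpretable alongside the panel and algorithm that produced it.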

Discordant results between methods may arise from several factors:

  • Non-truncating MMR mutations: IHC may detect protein expression despite functional deficiency, leading to false-negative results [2]
  • Subtle MSI phenotypes: Particularly in endometrial cancer, minimal instability may be detected by PCR but fall below NGS thresholds [59]
  • Heterogeneous staining patterns: Technical artifacts in IHC interpretation can produce false results [57] [8]
  • Tumor type-specific markers: Traditional PCR panels were optimized for colorectal cancer and may perform less optimally in other malignancies [2]

Diagram: Sources and Implications of Method Discordance in MSI Testing

The Evolving Diagnostic Landscape: Emerging Technologies

Artificial Intelligence and Digital Pathology

Deep learning (DL) approaches applied to histopathology images represent a promising emerging technology. A 2025 meta-analysis of 30 studies found that pathology slide-based DL models achieved a pooled sensitivity of 0.90 and specificity of 0.86 for detecting MSI in colorectal cancer [60]. In external validation, sensitivity was 0.88 and specificity 0.84, suggesting that the models generalize reasonably well [60]. These approaches could potentially serve as rapid, cost-effective screening tools, though they currently require validation against molecular standards.

Integrated Testing Algorithms

Given the complementary strengths and limitations of each method, integrated testing algorithms optimize diagnostic accuracy. One proposed workflow begins with IHC as an accessible first-line test, proceeding to PCR or NGS in cases with indeterminate results, retained protein expression despite high clinical suspicion, or when comprehensive genomic profiling is desired [6] [8]. For NGS results falling into borderline ranges, integration with TMB assessment and orthogonal confirmation with PCR is recommended [6].

Diagram: Proposed Integrated Testing Algorithm for MSI/dMMR Detection
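
The routing logic of the proposed workflow can be approximated as follows. This is a hypothetical sketch: the function name, the three IHC result labels, and the reason strings are assumptions made for illustration, and the triggers simply mirror those listed in the paragraph above.

```python
from typing import List

IHC_RESULTS = {"loss", "retained", "indeterminate"}


def reflex_reasons(ihc_result: str,
                   high_clinical_suspicion: bool = False,
                   cgp_desired: bool = False) -> List[str]:
    """Return the reasons, if any, to follow first-line IHC with PCR or NGS."""
    if ihc_result not in IHC_RESULTS:
        raise ValueError(f"unknown IHC result: {ihc_result!r}")
    reasons = []
    if ihc_result == "indeterminate":
        reasons.append("indeterminate IHC")
    if ihc_result == "retained" and high_clinical_suspicion:
        reasons.append("retained expression despite high clinical suspicion")
    if cgp_desired:
        reasons.append("comprehensive genomic profiling desired")
    return reasons


# An empty list means the IHC result stands without molecular follow-up.
print(reflex_reasons("retained", high_clinical_suspicion=True))
```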

Essential Research Reagents and Platforms

Table 3: Key Research Reagents and Platforms for MSI Detection

| Category | Specific Product/Platform | Primary Function in MSI Research |
|---|---|---|
| NGS Panels | TruSight Oncology 500 (Illumina) | Targeted sequencing of 523 genes + MSI loci |
| NGS Panels | FoundationOne CDx | Comprehensive genomic profiling with MSI assessment |
| NGS Panels | AVENIO CGP Kit (Roche) | 324-gene panel with integrated MSI scoring |
| PCR Systems | Promega MSI Analysis System | Fragment analysis of 5 mononucleotide markers |
| PCR Systems | MSI-IVD Kit (FALCO) | PCR-based companion diagnostic |
| IHC Reagents | MLH1 (ES05, Dako) | Detection of MLH1 protein expression |
| IHC Reagents | MSH2 (FE11, Dako) | Detection of MSH2 protein expression |
| IHC Reagents | MSH6 (EP49, Dako) | Detection of MSH6 protein expression |
| IHC Reagents | PMS2 (EP51, Dako) | Detection of PMS2 protein expression |
| Bioinformatics Tools | MSIsensor | NGS-based MSI detection algorithm |
| Bioinformatics Tools | MSIDRL | Novel algorithm for pan-cancer MSI detection |

Large-scale retrospective studies consistently demonstrate that NGS achieves high concordance with both PCR and IHC gold standards for MSI detection, with overall agreement frequently exceeding 95-98%. The multiplexing capability of NGS, providing simultaneous assessment of MSI, TMB, and other genomic alterations from limited tissue, offers distinct advantages in the era of precision oncology. However, tumor-specific performance variations, borderline cases, and unique biological scenarios necessitate method-specific understanding. Integrated testing algorithms, leveraging the complementary strengths of IHC, PCR, and NGS, will provide the most robust approach for accurate MSI classification across diverse cancer types and clinical contexts.

Microsatellite instability (MSI) has emerged as a critical biomarker for predicting response to immune checkpoint inhibitors and identifying hereditary cancer syndromes across a wide spectrum of solid tumors [2] [6] [37]. While traditional methods like immunohistochemistry (IHC) and polymerase chain reaction (PCR) have long been considered the gold standards, next-generation sequencing (NGS)-based MSI detection is increasingly being adopted in clinical and research settings for its ability to interrogate hundreds to thousands of microsatellite loci simultaneously and integrate with other genomic analyses [2] [37]. However, the performance of NGS-based MSI testing is not uniform across all cancer types, demonstrating consistently high accuracy in gastrointestinal (GI) cancers, particularly colorectal cancer (CRC), while showing greater variability in other malignancies [2] [6] [46]. This variability stems from multiple factors, including differences in the underlying biology of microsatellites across tissues, the specific NGS methodology and bioinformatic algorithms employed, and technical considerations such as panel size and tumor purity [6] [37] [61]. This guide provides a comprehensive comparison of NGS-based MSI testing performance across tumor types, synthesizing current experimental data to inform researchers, scientists, and drug development professionals in their assay selection and validation processes.

Performance Comparison of NGS-Based MSI Testing

Concordance Rates with Reference Methods Across Tumor Types

The diagnostic performance of NGS-based MSI testing varies significantly when compared to reference methods like PCR or IHC across different cancer types. The table below summarizes key performance metrics from recent large-scale studies.

Table 1: Performance Metrics of NGS-Based MSI Testing Across Tumor Types

| Cancer Type | Study | Sample Size | Reference Method | Concordance/AUC | Key Findings |
|---|---|---|---|---|---|
| Pan-Cancer | Wang et al. [2] | 35,563 cases | PCR | Not specified | Bimodal ULC distribution; MSI-H prevalence highest in UTNP, GACA, BWCA. |
| Multiple Solid Tumors | Illumina Panel Study [6] | 314 samples | PCR | AUC: 0.922 (Overall) | High overall concordance, but sensitivity lower in CRC. |
| Colorectal Cancer | Illumina Panel Study [6] | 201 samples | PCR | AUC: 0.867 | Broader score variability and overlapping distributions observed. |
| Prostate Cancer | Illumina Panel Study [6] | 58 samples | PCR | AUC: 1.00 | Perfect agreement with reference method in this cohort. |
| Colorectal Cancer | MSICare Study [46] | 102 mCRC patients | PCR & IHC | Sensitivity: 100% | Outperformed MSISensor, especially in MSH6/PMS2-deficient tumors. |
| Endometrial Cancer | Octaplex CaBio-MSID [25] | 88 samples | Not specified | Sensitivity: 89.3%, Specificity: 100% | High specificity, but slightly reduced sensitivity versus CRC performance. |

Prevalence of MSI-H Across Cancer Types

The prevalence of MSI-H is a key factor in understanding the clinical utility of testing in different malignancies. A large-scale retrospective analysis of 35,563 pan-cancer cases revealed distinct clusters of MSI-H prevalence [2]:

  • High-Prevalence Cancers: Uterine/endometrial (UTNP), gastric (GACA), and biliary tract (BWCA) cancers. These three cancer types contributed approximately 80% of all MSI-H cases identified in the study.
  • Lower-Prevalence Cancers: Bladder (BITC), liver (LICA), oropharyngeal (OFPC), and pancreatic (PACA) cancers.
  • Rare MSI-H Cancers: Lung cancer (LUCA) was noted as the most prevalent cancer in the cohort, but MSI-H status was rare.

Significant differences in MSI-H prevalence were also observed within cancer subtypes. For example, colon cancer showed a significantly higher MSI-H prevalence (10.66%) compared to rectal cancer (2.19%) [2].

Experimental Protocols and Methodologies

Common NGS-Based MSI Detection Workflows

The following diagram illustrates a generalized workflow for NGS-based MSI detection, synthesizing common elements from the methodologies of the studies cited in this guide.

Diagram 1: Generic NGS-MSI Testing Workflow. This workflow shows the key steps from sample collection to integrated genomic reporting, highlighting the bioinformatic analysis phase where MSI status is determined.

Detailed Methodologies from Key Studies

MSIDRL Algorithm Development and Validation

A large-scale retrospective analysis of 35,563 Chinese pan-cancer cases introduced a novel NGS-based MSI detection algorithm named MSIDRL [2].

  • Panel Development: The process began with the selection of the top 500 most robust noncoding microsatellite loci from colorectal cancer circulating tumor DNA whole-exome sequencing assays. Capture probes were designed for these loci to form a prototype panel.
  • Training Set: The prototype panel was tested on a training set of 105 pan-cancer FFPE samples (31 MSI-H and 74 MSI-L/MSS) with predefined PCR-based MSI status.
  • Algorithm Core - Diacritical Repeat Length (DRL): For each MS locus, the repeat length that maximized the cumulative read count difference between MSI-H and MSI-L/MSS samples was defined as its DRL. Reads longer than the DRL were classified as "stable" (SRC), while those shorter than or equal to the DRL were "unstable" (URC).
  • Background Noise Calculation and Classification: Background noise (B_i) for each locus i was calculated from MSI-L/MSS samples. For a given sample j and locus i, the binomial test was used to compare the observed unstable read fraction (b_ij) to B_i. The top 100 most sensitive loci were selected for the final panel.
  • Unstable Locus Count (ULC) and Cutoff: The ULC for a sample is the count of loci with binomial test p-values below predefined, locus-specific cutoffs. Analysis of the pan-cancer ULC distribution revealed a bimodal pattern, leading to a validated ULC cutoff of 11 for classifying MSI-H status [2].
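
The DRL and ULC steps can be illustrated with a small standard-library sketch. It is not the published MSIDRL implementation: the function names and input layout (repeat length mapped to read count) are hypothetical, read counts are normalized to fractions so that training groups of different depth are comparable (an assumption), and the test is taken to be one-sided in the direction of excess unstable reads (also an assumption).

```python
from math import comb


def find_drl(msih_hist: dict, mss_hist: dict) -> int:
    """Repeat length maximizing cumulative(MSI-H) - cumulative(MSS) read fraction."""
    tot_h, tot_s = sum(msih_hist.values()), sum(mss_hist.values())
    if tot_h == 0 or tot_s == 0:
        raise ValueError("both training groups need reads at this locus")
    best_len, best_diff = None, float("-inf")
    cum_h = cum_s = 0
    for length in sorted(set(msih_hist) | set(mss_hist)):
        cum_h += msih_hist.get(length, 0)
        cum_s += mss_hist.get(length, 0)
        diff = cum_h / tot_h - cum_s / tot_s
        if diff > best_diff:
            best_len, best_diff = length, diff
    return best_len


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); adequate for modest read depths only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def unstable_locus_count(sample: dict, drl: dict, background: dict,
                         p_cutoff: dict) -> int:
    """Count loci whose unstable-read fraction significantly exceeds B_i."""
    ulc = 0
    for locus, hist in sample.items():
        n = sum(hist.values())
        k = sum(c for length, c in hist.items() if length <= drl[locus])  # URC
        if n and binom_upper_tail(k, n, background[locus]) < p_cutoff[locus]:
            ulc += 1
    return ulc


# Toy training histograms for one locus (made-up numbers).
msih = {20: 5, 21: 10, 22: 30, 23: 25, 24: 20, 25: 10}
mss = {23: 5, 24: 25, 25: 70}
drl_a = find_drl(msih, mss)
b_a = sum(c for length, c in mss.items() if length <= drl_a) / sum(mss.values())
sample = {"A": {22: 8, 25: 12}, "B": {25: 20}}
print(drl_a, unstable_locus_count(
    sample, {"A": drl_a, "B": drl_a}, {"A": b_a, "B": b_a}, {"A": 0.01, "B": 0.01}))
```

In the published workflow the resulting ULC is compared against the cutoff of 11; whether that boundary is inclusive is not stated above, so it is left out of the sketch. For clinical read depths, a library routine such as `scipy.stats.binomtest` would likely be more robust than the direct sum used here.
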
Real-World Evaluation of Illumina Targeted Panels

A 2025 retrospective study evaluated the performance of MSI detection using Illumina’s TruSight Tumor 170 and TruSight Oncology 500 targeted panels in a real-world cohort of 331 cancer patients [6].

  • Validation Cohort: The final validation cohort included 314 tumor samples after excluding 17 that did not meet the quality control threshold of at least 40 usable MSI sites.
  • Reference Standard: MSI status was determined by fluorescent multiplex PCR analysis of six mononucleotide repeat markers. The cohort included 28 MSI-H and 286 microsatellite-stable (MSS) tumors.
  • ROC Analysis: Diagnostic performance was assessed using Receiver Operating Characteristic (ROC) curve analysis, which yielded an area under the curve (AUC) of 0.922 for the entire cohort.
  • Cut-off Determination and Borderline Group: The study determined an optimal MSI score cut-off of ≥13.8% for classifying MSI-H. A novel aspect was the introduction of a "borderline" group (MSI score ≥8.7% to <13.8%). For samples in this equivocal range, the integration of tumor mutational burden (TMB) into the classification workflow significantly improved diagnostic accuracy. Orthogonal confirmation by MSI-PCR was advised for inconclusive cases [6].
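
As a rough illustration of the ROC step, the sketch below computes AUC from MSI scores and PCR labels and picks a cut-off by Youden's J. The study's actual cut-off selection method is not described above, so the use of Youden's J is an assumption, and the scores and labels are toy values, not study data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that an MSI-H sample outscores an MSS sample."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one MSI-H and one MSS sample")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Score threshold (call MSI-H if score >= cut) maximizing sens + spec - 1."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one MSI-H and one MSS sample")
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


# Toy MSI scores (%) with PCR reference labels (1 = MSI-H, 0 = MSS).
scores = [1.0, 2.5, 4.0, 9.0, 12.0, 15.0, 22.0, 40.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
print(roc_auc(scores, labels), youden_cutoff(scores, labels))
```
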
MSICare: A Novel Algorithm for Improved CRC Detection

Research specifically focused on colorectal cancer revealed limitations in the FDA-approved MSISensor algorithm, particularly in metastatic CRC (mCRC) and MSH6/PMS2-deficient settings, prompting the development of the MSICare test [46].

  • Study Cohorts: The study utilized multiple cohorts: a prospective multicenter cohort of 102 mCRC patients (C1), an independent retrospective cohort of 113 patients (C2), a public TCGA series of 118 CRC patients (C3), and a final validation cohort of 152 new CRC patients (C4).
  • Central Reassessment: All samples were centrally reassessed for MSI and MMR status using the reference methods of pentaplex PCR and IHC.
  • Performance Gap of MSISensor: At the exome level, MSISensor failed to diagnose MSI in 16% (4/25) of MSI/dMMR mCRC in C1 and 32% (8/25) in C2. Misdiagnosed cases included four mCRCs treated with immune checkpoint inhibitors, three of which showed a clinical response.
  • MSICare Superior Performance: The MSICare algorithm, when applied to exome data, detected 100% of true MSI cases in C1 and C2. Its high performance (sensitivity 99.3%, specificity 100%) was maintained in the validation cohort (C4) after targeted sequencing with an optimized microsatellite marker set (MSIDIAG) [46].

Technical Factors Influencing NGS-MSI Performance

Key Factors for Stable MSI Detection

Experimental data highlights several technical factors critically influencing the stability and reliability of NGS-based MSI detection [61]:

  • Panel Size: The number of microsatellite loci analyzed can vary widely (from 5 to 7,000). While studies show that panels of different sizes (e.g., whole-exome, pan-cancer, small specific panels) can distinguish MSI-H from MSS, the specific MSI percentage values fluctuate. This necessitates panel-specific threshold establishment [37] [61].
  • Paired vs. Tumor-Only Analysis: The use of matched normal tissue for comparison (paired analysis) provides a more stable MSI% calculation. Tumor-only analysis, which relies on a built-in baseline model, can lead to an inflation of the MSI% value in MSS samples, making separation from MSI-H samples more challenging and precluding direct comparison of thresholds across different panels [61].
  • Sequencing Methods: Data suggests that different sequencing platforms (e.g., MGI vs. Illumina) and read lengths (e.g., PE150 vs. PE100) have a minimal impact on MSI% values and the specific MSI loci detected. However, sequencing depth is a factor; while MSI-H and MSS samples can be distinguished even at low depths (~50x), the calculated MSI% tends to increase with higher sequencing depth [61].

Indeterminate Results and Limitations

A significant challenge in clinical practice is the occurrence of indeterminate or equivocal results from NGS-based MSI tests. These can be reported as "MSI-I" (indeterminate), "MSI-E" (equivocal), "MSI borderline," or "cannot be determined" [37].

  • Incidence: Studies report that indeterminate rates can range from approximately 3.2% to 8.9% of solid tumor samples, with one large cohort study of 191,767 samples finding an indeterminate rate of 8.66% [37].
  • Causes: Primary causes include low tumor purity, low DNA input or degradation (common with FFPE samples), insufficient tumor tissue, and insufficient sequence coverage at microsatellite loci [37].
  • Clinical Implications: An "MSI Indeterminate" result is a technical classification failure, not a biological finding, and it hinders clinical decision-making. In such cases, confirmatory testing with orthogonal methods like PCR or IHC is necessary [6] [37].
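
A pre-classification quality gate is one way to keep technical failures separate from biological calls. The sketch below is illustrative: the function names are hypothetical, the 40-usable-site minimum is the QC threshold reported for the Illumina panel study above, and the 20% purity floor is borrowed from the tumor purity guidance elsewhere in this guide. Both would need laboratory-specific validation.

```python
from typing import List


def qc_failures(tumor_purity_pct: float,
                usable_msi_sites: int,
                min_purity_pct: float = 20.0,
                min_usable_sites: int = 40) -> List[str]:
    """List the technical reasons a sample should not receive an MSI call."""
    failures = []
    if tumor_purity_pct < min_purity_pct:
        failures.append("tumor purity below minimum")
    if usable_msi_sites < min_usable_sites:
        failures.append("too few usable MSI sites")
    return failures


def report_status(msi_call: str, failures: List[str]) -> str:
    """Report 'MSI indeterminate' whenever any QC check failed."""
    return "MSI indeterminate" if failures else msi_call


problems = qc_failures(tumor_purity_pct=15.0, usable_msi_sites=62)
print(report_status("MSS", problems), problems)
```

Returning the list of failed checks alongside the status makes it easier to decide between re-extraction, re-sequencing, or orthogonal PCR/IHC testing.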

Table 2: The Scientist's Toolkit: Key Reagents and Materials for NGS-MSI Research

| Item | Function/Description | Key Considerations |
|---|---|---|
| FFPE Tumor Tissue | The primary source material for DNA extraction. | Quality and quantity are paramount; degradation and low tumor purity are major causes of test failure [6] [37]. |
| Targeted NGS Panel | A set of probes designed to capture specific microsatellite loci and/or genes of interest. | Panel size and the specific loci selected are critical for performance and vary by vendor (e.g., Illumina TST170, TSO500) [6] [61]. |
| Bioinformatic Algorithm | Software to analyze sequencing data and calculate MSI status (e.g., MSISensor, MSICare, MSIDRL). | Algorithm choice significantly impacts accuracy, especially in non-CRC and MSH6/PMS2-deficient cancers [2] [46]. |
| Matched Normal DNA | Germline DNA from the same patient (e.g., from blood or normal tissue). | Required for paired analysis to control for natural polymorphism; its use reduces the false positives that can arise with tumor-only analysis [61]. |
| Reference Control Materials | Samples with known MSI status (MSI-H and MSS). | Essential for assay validation, calibration, and ongoing quality control to ensure analytical accuracy [6] [46]. |

NGS-based MSI testing represents a powerful tool in oncology research and drug development, offering high-throughput integration with other genomic biomarkers like TMB. The body of evidence confirms its robust performance in gastrointestinal cancers such as colorectal and gastric cancers, with high sensitivity and specificity relative to gold-standard methods. However, its performance is not universal, with variability observed across different cancer types and technical settings. Key factors such as the choice of bioinformatic algorithm, the design of the MS locus panel, the use of matched normal tissue, and sample quality must be carefully considered and optimized. For cancers where NGS performance is less established or in cases where NGS yields indeterminate results, orthogonal confirmation with PCR or IHC remains a critical step to ensure accurate patient stratification and reliable research outcomes. Future efforts toward standardizing panels, algorithms, and reporting criteria will further enhance the utility of NGS-based MSI testing across the full spectrum of human cancers.

Microsatellite instability (MSI) is a genomic characteristic caused by dysfunction of the DNA mismatch repair (MMR) system, leading to accumulated insertion and deletion mutations in short tandem repeat sequences throughout the genome [37]. The accurate detection of high levels of MSI (MSI-H) has become critically important in clinical oncology, as it serves as both a biomarker for Lynch syndrome—the most common hereditary cancer syndrome—and a predictive biomarker for response to immune checkpoint inhibitor therapy [34] [62]. The College of American Pathologists (CAP) has established evidence-based guidelines to optimize testing methods, while the U.S. Food and Drug Administration (FDA) maintains oversight through its approval of companion diagnostics [62] [63]. This review examines the current regulatory and guideline landscape governing MSI testing methodologies, with particular focus on the comparative performance of next-generation sequencing (NGS) against established testing platforms.

Current Testing Methodologies and Technical Specifications

Methodological Approaches to MSI Detection

MSI testing by polymerase chain reaction (PCR) has been considered the gold-standard method since the early 2000s [37]. This technique utilizes fluorescent multiplexed PCR fragment length analysis to measure instability within a small panel of well-characterized microsatellite loci [37]. The analysis is performed using capillary electrophoresis, which separates DNA fragments by size and electrical charge, enabling comparative analysis between a patient's tumor and matched normal DNA [37]. Laboratories typically use quasi-monomorphic microsatellite sequences, which improve assay performance by reducing population variability [37].

MSI testing by next-generation sequencing (NGS) employs various approaches to determine MSI status through sequencing technology [37]. Unlike PCR-based methods, NGS can interrogate from as few as 5 to as many as 7,000 microsatellite loci of varying nucleotide repeat lengths [37]. The bioinformatic analyses used to determine MSI status are not standardized and differ significantly based on sequencing approach, algorithmic calculations, statistical approaches for determining MSI-H status, and established thresholds for calling results [37]. This methodological diversity contributes to variability in test performance across different cancer types [37].

Immunohistochemistry (IHC) for MMR protein detection represents a complementary protein-based approach that assesses nuclear expression of the four core MMR proteins (MLH1, MSH2, MSH6, and PMS2) [34]. While not directly measuring microsatellite instability, loss of nuclear staining for these proteins indicates MMR deficiency that typically corresponds with MSI-H status [34]. CAP guidelines specifically address the role of these different testing modalities in various clinical contexts [62].

Technical Comparison of MSI Testing Platforms

Table 1: Technical Specifications of MSI Testing Methodologies

| Parameter | MSI by PCR | MSI by NGS | MMR by IHC |
|---|---|---|---|
| DNA Input Requirements | 1-2 ng DNA [37] | 10-50 ng or more DNA [37] | Not applicable (protein-based) |
| Tissue Requirements | 1-5 unstained FFPE slides (20-40% tumor purity) [37] | Varies widely; often requires more slides [37] | 1+ unstained FFPE slides |
| Throughput Capability | Medium to high (1-96 samples) [37] | High (>96 samples) [37] | Medium |
| Bioinformatic Requirements | Not required [37] | Complex pipeline required [37] | Not required |
| Technical Skill Level | Basic molecular skillset [37] | Highly specialized molecular skillset [37] | Pathology interpretation skills |
| Matched Normal Requirement | Required (except with QMVR method) [37] | Varies by assay [37] | Not required |

CAP Guideline Recommendations and Evidence Assessment

Key Recommendations for Testing Modalities

The CAP guideline, developed in collaboration with the Association for Molecular Pathology and Fight Colorectal Cancer, provides six specific recommendations and three good practice statements grounded in systematically reviewed clinical evidence [62]. A critical finding from the guideline development process was that no single test captures all patients with mismatch repair deficiency, indicating that a one-size-fits-all approach is not appropriate [64]. The panel found that while MMR IHC, PCR-based MSI, and MSI-NGS may be largely interchangeable for adenocarcinomas of the gastrointestinal tract, significant performance differences emerge outside of the GI tract [62] [64].

For cancer types outside the GI tract, the CAP guideline indicates that MSI-PCR and MSI-NGS approaches demonstrate reduced reliability [62] [64]. The panel noted that while these methods could potentially be optimized for individual cancer types, such optimization would likely prove too daunting for most clinical laboratories [62]. Consequently, the guideline broadly recommends MMR IHC as a default starting point for non-GI cancers, while acknowledging that interpretation of these IHC tests requires significant pathologist experience [64].
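The broad rule summarized in the two paragraphs above can be written as a small lookup. This is a simplification for illustration, not a restatement of the guideline's individual recommendations: the function name and the single boolean input are hypothetical, and actual test selection depends on tumor type, specimen, and laboratory validation.

```python
from typing import List


def first_line_options(is_gi_adenocarcinoma: bool) -> List[str]:
    """Starting-point assays under the simplified rule described in the text."""
    if is_gi_adenocarcinoma:
        # Described as largely interchangeable for GI tract adenocarcinomas.
        return ["MMR IHC", "MSI-PCR", "MSI-NGS"]
    # Outside the GI tract, MMR IHC is the default starting point.
    return ["MMR IHC"]


print(first_line_options(True))
print(first_line_options(False))
```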

Evidence Quality and Limitations

The CAP expert panels employ the GRADE (Grading of Recommendations Assessment, Development and Evaluation) Evidence to Decision Framework, which considers not only the level of evidence but also the balance of benefits and harms, values and preferences, resources, health equity, accessibility, and feasibility [62]. During the guideline development process, panelists noted that methodology details in the published NGS literature were often vague or buried in supplemental files, which risks overstating findings and slowing progress toward optimal patient care [64]. The panel also identified a significant problem: many published pathology biomarker studies use archived tissues that were not systematically collected, which introduces bias and lowers the certainty of evidence [62].

FDA Regulatory Landscape and Companion Diagnostics

Recent FDA Approvals and Their Implications

The FDA has recently approved the Promega OncoMate MSI Dx Analysis System as a companion diagnostic to identify patients with microsatellite stable (MSS) endometrial carcinoma who may benefit from treatment with KEYTRUDA (pembrolizumab) in combination with LENVIMA (lenvatinib) [63] [65]. This PCR-based assay evaluates MSI status in tumor tissue to guide treatment decisions and support precision oncology strategies in endometrial carcinoma [63]. It is the first Promega companion diagnostic to receive FDA approval and underscores the critical role of diagnostics in accurately matching patients with targeted therapies [63].

The regulatory landscape for laboratory-developed tests (LDTs) continues to evolve, with a recent U.S. District Court decision halting the FDA's efforts to regulate LDTs as medical devices [66]. However, experts emphasize that the decision was narrow and does not eliminate FDA authority entirely, leaving laboratories to navigate uncertain regulatory terrain [66]. The CAP guideline acknowledges that many clinical laboratories currently use highly effective LDTs to assess MMR IHC, MSI-PCR, and MSI-NGS, provided they follow local accreditation body validation requirements [62].

Tissue-Agnostic Indications and Testing Standards

The 2017 FDA approval of pembrolizumab for patients with MSI-H and/or dMMR cancers represented the first tissue-agnostic approval of an oncology drug, creating new demands for standardized testing across cancer types [64]. This approval was initially problematic for pathologists because it lacked specific guidance on which assays would be best to determine MSI-H or dMMR status [64]. The subsequent development of evidence-based guidelines has helped pathologists meet these challenges by providing clarity on testing methodologies across different cancer types [64].

Table 2: FDA-Approved Tissue-Agnostic Indications Linked to MSI Status

Biomarker | FDA-Approved Therapy | Cancer Indications | Recommended Testing Methods
MSI-H/dMMR | KEYTRUDA (pembrolizumab) [67] | All solid tumors [67] | MMR IHC, MSI-PCR, or NGS [68]
MSI-H/dMMR | JEMPERLI (dostarlimab) [67] | All solid tumors [67] | MMR IHC, MSI-PCR, or NGS [68]
Not MSI-H (MSS) | KEYTRUDA + LENVIMA [63] [67] | Endometrial carcinoma [63] | PCR-based testing [63]

Performance Comparison: NGS Versus Established Methods

Analytical Performance Across Tumor Types

The CAP guideline analysis revealed that NGS assays perform well for detecting DNA MMR deficiency and high levels of microsatellite instability in adenocarcinomas from the GI tract, but show a marked drop in reliability for cancer types outside the GI tract [62]. Similar performance limitations were noted for standard MSI-PCR assays outside their optimized contexts [62]. The literature suggests that NGS assays may need to be optimized for individual cancer types, an approach that is likely not feasible for most clinical laboratories [62].

A significant limitation of NGS-based MSI testing is the rate of indeterminate or equivocal results, which can hinder appropriate therapeutic decisions [37]. Studies suggest that NGS assays return an MSI "unknown" call in approximately 3.2%-8.9% of solid tumor samples, with one large cohort study of 191,767 solid tumor samples finding indeterminate results in 8.66% of cases [37]. These indeterminate calls typically stem from technical limitations such as low tumor purity, low DNA input, degraded FFPE samples, or insufficient sequence coverage at microsatellite loci [37].
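To put these percentages in absolute terms, the short calculation below converts the reported rates into approximate sample counts. The inputs are the figures quoted above; because the published percentage is rounded, the resulting count is an estimate and not a number taken from the study.

```python
cohort_size = 191_767          # solid tumor samples in the large cohort
indeterminate_rate = 0.0866    # 8.66% reported as indeterminate

# Approximate number of samples left without a definitive MSI call.
approx_indeterminate = round(cohort_size * indeterminate_rate)
print(approx_indeterminate)    # prints 16607

# The reported range (3.2%-8.9%) expressed per 1,000 samples tested.
low_rate, high_rate = 0.032, 0.089
per_thousand = (round(1000 * low_rate), round(1000 * high_rate))
print(per_thousand)            # prints (32, 89)
```

On these figures, roughly 16,600 samples in that cohort would have needed orthogonal testing or repeat analysis to reach a definitive MSI status.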

Advantages and Limitations of Testing Platforms

Table 3: Performance Characteristics of MSI Testing Methodologies

Performance Characteristic | MSI by PCR | MSI by NGS | MMR by IHC
Sensitivity for dMMR Tumors | High with standardized markers [37] | Variable depending on cancer type [62] | High for most tumors [62]
Reproducibility | Highly reproducible [37] | Variable due to lack of standardization [37] | Dependent on pathologist experience [64]
Indeterminate Rate | Low | 3.2%-8.9% [37] | Low
Additional Genomic Information | None | Simultaneous detection of other genomic alterations [37] | None
Optimization for Different Cancers | Possible but challenging [62] | Possible but daunting for most labs [62] | Generally applicable
Ability to Detect Epigenetic Changes | Indirectly through MSI signature | Indirectly through MSI signature | Direct (loss of protein expression)

Standardization and Reporting Considerations

Terminology and Reporting Standards

The EMQN best practice guidelines emphasize the importance of standardized terminology in MSI analysis, noting that errors in terminology are common [34]. The term "MSI-H" should be used to describe a significant increase in microsatellite insertion-deletion variants that indicates dMMR, while "microsatellite stable" (MSS) refers to the absence of such evidence [34]. The category of "MSI-low" (MSI-L) has uncertain clinical significance and typically indicates proficient MMR, though it may indicate dMMR in some specific contexts [34].

NGS-based MSI assays may introduce categories such as "MSI-indeterminate" (MSI-I) or equivalent terms to account for the continuum of microsatellite instability and indicate uncertainty in results [34]. This lack of definitive categorization can complicate clinical decision-making, often necessitating confirmatory testing using orthogonal methods such as PCR or IHC [37]. The CAP provides template reporting formats to standardize how results of DNA mismatch repair testing are communicated for patients being considered for checkpoint inhibitor immunotherapy [68].

Specimen Requirements and Quality Considerations

Both PCR and NGS assays evaluating MSI in solid tumors use FFPE tumor specimens, but their specific requirements for tissue volume, DNA input, and tumor purity vary considerably [37]. PCR-based assays typically require only 1-2 ng of DNA input for amplification reactions, while NGS assays often require 10-50 ng or more to ensure sufficient sequencing coverage and depth [37]. Minimum tissue volume requirements also differ, with PCR assays generally requiring less material than NGS approaches [37]. These distinctions become particularly important for small biopsy specimens where tissue availability is limited [37].
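A simple pre-analytic check illustrates why this matters for small biopsies. The sketch below is hypothetical: it uses the lower bounds of the input ranges quoted above as minimums, whereas a real laboratory would apply the validated requirements of its own assays and would also consider tumor purity and DNA quality.

```python
from typing import Dict, List

# Assumed minimum DNA input in nanograms, taken from the lower bound of
# each range quoted in the text; actual assay requirements may differ.
MIN_INPUT_NG: Dict[str, float] = {"MSI-PCR": 1.0, "MSI-NGS": 10.0}


def feasible_assays(dna_yield_ng: float) -> List[str]:
    """List the assays whose assumed minimum input the extracted DNA meets."""
    return [assay for assay, minimum in MIN_INPUT_NG.items()
            if dna_yield_ng >= minimum]


print(feasible_assays(5.0))    # a small biopsy yielding 5 ng of DNA
print(feasible_assays(25.0))   # a resection specimen yielding 25 ng
```

With these assumed minimums, a 5 ng extraction supports PCR-based testing only, while a 25 ng extraction supports either method.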

Experimental Protocols and Methodological Details

Standardized MSI Testing Workflow

Diagram 1: MSI testing by PCR fragment analysis

NGS-Based MSI Testing Methodology

Diagram 2: NGS-based MSI testing

Research Reagent Solutions for MSI Testing

Table 4: Essential Research Reagents and Materials for MSI Testing

Reagent/Material | Function | Technical Specifications
Formalin-Fixed Paraffin-Embedded (FFPE) Tissue | Source of tumor DNA for analysis | 20-40% tumor purity; 1-5 unstained slides for PCR; potentially more for NGS [37]
DNA Extraction Kits | Isolation of high-quality DNA from FFPE tissue | Minimum yield: 1-2 ng for PCR; 10-50 ng for NGS [37]
Microsatellite Marker Panels | PCR amplification targets | 5-7 mononucleotide repeats (e.g., BAT-25, BAT-26); quasi-monomorphic markers preferred [37] [68]
Capillary Electrophoresis System | Fragment separation and analysis | Fluorescent detection; size resolution of 1-2 base pairs [37]
NGS Library Preparation Kits | Preparation of sequencing libraries | Compatibility with targeted panels or whole exome/transcriptome sequencing [67]
Bioinformatic Analysis Pipeline | MSI status determination from NGS data | Algorithm for scoring instability; established thresholds for MSI-H/MSS calling [37] [34]

The regulatory and guideline perspectives on MSI testing continue to evolve as evidence accumulates regarding the performance characteristics of different testing methodologies across diverse cancer types. The CAP guidelines provide critical evidence-based recommendations that emphasize context-dependent test selection, recognizing that no single testing modality optimally identifies all patients with MMR deficiency [62] [64]. While NGS offers the advantage of simultaneous multigene assessment, its performance for MSI detection varies significantly across cancer types, frequently producing indeterminate results that require orthogonal confirmation [37].

FDA approvals of companion diagnostics continue to shape the testing landscape, with recent decisions reinforcing the role of PCR-based methodologies in specific clinical contexts such as endometrial carcinoma [63] [65]. The ongoing development of comprehensive NGS assays with FDA-approved companion diagnostic indications represents a promising direction for the field, particularly as the number of tissue-agnostic therapeutic indications continues to grow [67]. Future progress will depend on improved standardization of testing methodologies, enhanced bioinformatic approaches, and more systematic collection of validation data across diverse cancer types [62] [34].

Conclusion

The integration of NGS for MSI testing represents a significant advancement in precision oncology, offering a comprehensive, highly accurate, and efficient method for identifying a key immunotherapy biomarker. While strong concordance with traditional methods is established, NGS provides distinct advantages through expanded genomic insights, including simultaneous assessment of TMB and other genomic alterations. Future directions should focus on standardizing bioinformatic algorithms and reporting criteria, optimizing panels for non-traditional cancer types, and further validating the clinical utility of NGS-MSI in guiding treatment decisions and predicting patient outcomes. For researchers and drug developers, NGS is an indispensable tool that enriches biomarker discovery and paves the way for more personalized cancer therapies.

References