Zixuan Huang‡ ab, Yunpei Si‡ ab, Yi Zhang‡ ab, Zicheng Huang c, Xuehao Xiu ab, Yunshan Wang d, YuDong Wang *c, Chunhai Fan e and Ping Song *ab
a The International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, National Center for Translational Medicine, Shanghai, 200030, China. E-mail: songpingsjtu@sjtu.edu.cn
b School of Biomedical Engineering, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China
c Department of Gynecologic Oncology, The International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University, Shanghai, 200240, China. E-mail: wangyudong@shsmu.edu.cn
d Department of Clinical Laboratory, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, 250021, Shandong, China
e State Key Laboratory of Synergistic Chem-Bio Synthesis, School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
First published on 2nd June 2025
Formalin-fixed paraffin-embedded (FFPE) samples are widely used in cancer research and clinical diagnostics for preserving tissue morphology and enabling long-term storage. However, FFPE-induced DNA degradation, crosslinking, and inconsistent quality control significantly hinder their utility in molecular analyses. In this study, we established a robust nanoscale quality control (QC) framework incorporating gel electrophoresis and quantitative polymerase chain reaction (qPCR) to evaluate DNA integrity in clinical tissue FFPE samples. Our findings demonstrate a quantifiable inverse correlation between the degree of DNA fragmentation and amplification efficiency in FFPE samples. Further analysis of 26 single nucleotide polymorphism loci using targeted next-generation sequencing demonstrated substantial improvements in DNA integrity after enzymatic repair. A comparative whole-exome sequencing analysis of endometrial carcinoma samples with different archival durations demonstrated significantly increased damage levels across multiple genomic features in long-term stored specimens, highlighting the cumulative impact of archival duration. These findings emphasize the detrimental effects of prolonged storage on FFPE DNA quality. Our QC framework enables effective sample stratification, facilitating the selection of high-integrity specimens for sequencing and guiding heavily degraded samples toward targeted short-amplicon assays. This strategy provides a standardized approach to assess the integrity of FFPE-derived DNA, supporting accurate and reproducible use of archival biospecimens in clinical genomics.
New concepts
We present a novel nanoscale-resolved DNA integrity assessment platform that integrates qPCR, gel electrophoresis and next generation sequencing (NGS) to unravel the molecular effects of formalin-fixed paraffin-embedded (FFPE) processing and archival storage. Unlike existing studies that rely on isolated methods, our work uniquely combines macro- and nanoscale techniques to reveal formaldehyde-induced DNA fragmentation. A key innovation is the development of a rapid quality control framework using qPCR and gel electrophoresis for efficient pre-screening of archival FFPE samples, addressing a critical bottleneck in clinical genomics. Additionally, through NGS, we reveal storage-dependent biases in sequencing uniformity, variant allele frequencies, and GC-rich sequence retention, offering direct molecular evidence of DNA lesions affecting clinical sequencing. Moreover, our findings demonstrate that enzymatic repair strategies reduce base substitution artifacts while notably improving amplification efficiency at genomic sites that were previously underrepresented or undetectable prior to repair. Our work establishes a pipeline that bridges nanoscale DNA damage characterization with actionable genomic workflows, improving the reliability of FFPE-derived data in precision oncology. This approach not only enhances the use of archival biospecimens but also provides new insights into optimizing FFPE sample utility for retrospective cancer research and beyond.
FFPE DNA degradation poses significant challenges over time, particularly in archival storage settings.24–26 Even under controlled conditions, progressive fragmentation and depurination lead to shorter amplifiable fragments,27 which impede PCR amplification efficiency and sequencing uniformity.28,29 Research indicates that DNA integrity declines substantially after years of storage, with FFPE samples stored for over 7 years frequently failing to meet quality thresholds for reliable genomic analysis.30 This degradation manifests as reduced library yields, increased shifts in variant allele frequencies (VAFs), and biases in GC-rich sequence retention.31 Despite these challenges, recent advances in sample preparation, repair, and analytical techniques have improved the utility of FFPE samples for genomic studies, enabling researchers to extract valuable information from these historically challenging specimens.32,33 Enzymatic repair kits, such as PreCR, aim to restore DNA integrity by addressing base damage, including the excision of deaminated cytosines and the repair of oxidized guanine.34 This repair capacity enables the recovery of PCR-amplifiable templates from degraded DNA, thereby improving the fidelity of downstream genomic analyses by mitigating FFPE-induced sequencing artifacts.35
FFPE processing and storage introduce variable degrees of DNA damage, necessitating thorough evaluation prior to genomic applications. Therefore, we employed gel electrophoresis and qPCR to systematically compare DNA extracted from FF and FFPE specimens, establishing a standardized QC framework for the rapid assessment and screening of FFPE samples. To further elucidate the impact of DNA degradation and repair on sequencing accuracy, we performed targeted NGS on both untreated FFPE samples and those treated with a commercial DNA repair kit, focusing on 26 single nucleotide polymorphism (SNP) loci. Additionally, we conducted whole-exome sequencing (WES) on FFPE samples archived for different durations to evaluate the effects of DNA fragmentation and mutation profiles on sequencing performance. This multi-tiered analytical approach enables the efficient screening of FFPE samples, facilitating optimal resource allocation for downstream applications. Specifically, high-integrity samples are prioritized for applications requiring long DNA fragments, such as whole-exome sequencing (WES) and gene fusion detection,36 while severely degraded samples are directed toward targeted short-amplicon assays. By bridging the gap between the characterization of FFPE-induced damage and the implementation of practical QC strategies, our study supports the reliable utilization of archived biospecimens in clinical genomics and contributes to improving the accuracy and reproducibility of FFPE-based genomic research.
The qPCR analysis was conducted on a CFX96 Real-Time PCR Thermal System (Bio-Rad) with a reaction volume of 10 μL, comprising 5 μL of 2× SYBR Green master mix, 1 μL of 4 μM forward primer, 1 μL of 4 μM reverse primer, 2 μL of nuclease-free water, and 1 μL of extracted gDNA. The qPCR was initiated at 95 °C for 2 min, followed by thermal cycling consisting of denaturation at 95 °C for 10 s and combined annealing and extension at 60 °C for 30 s.
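As a quick consistency check on the reaction mix above, the component volumes can be summed and the final working concentrations derived by simple dilution. The sketch below is illustrative only; the dictionary layout and function names are assumptions, and only the volumes and stock concentrations come from the text.

```python
# Component volumes (uL) and stock concentrations as stated in the methods text.
# The data layout and helper names are illustrative assumptions.
REACTION_MIX = {
    "SYBR Green master mix": {"volume_ul": 5.0, "stock": 2.0, "unit": "x"},
    "forward primer": {"volume_ul": 1.0, "stock": 4.0, "unit": "uM"},
    "reverse primer": {"volume_ul": 1.0, "stock": 4.0, "unit": "uM"},
    "nuclease-free water": {"volume_ul": 2.0, "stock": None, "unit": None},
    "extracted gDNA": {"volume_ul": 1.0, "stock": None, "unit": None},
}


def total_volume_ul(mix):
    """Sum the volumes of all components in the reaction."""
    return sum(component["volume_ul"] for component in mix.values())


def final_concentration(stock, volume_ul, total_ul):
    """Dilution rule C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


total = total_volume_ul(REACTION_MIX)
print(f"total volume: {total} uL")
for name, component in REACTION_MIX.items():
    if component["stock"] is not None:
        final = final_concentration(component["stock"], component["volume_ul"], total)
        print(f"{name}: {final:g} {component['unit']} final")
```

Under these stated volumes the mix sums to the 10 μL reaction volume, the master mix ends at 1×, and each primer at 0.4 μM.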
Additionally, denaturing polyacrylamide gel electrophoresis (PAGE) was conducted using a 10% denaturing gel. The gel was prepared by dissolving 20 g of urea in 10 mL of 5× TBE buffer using sonication. Following this, 100 μL of 10% ammonium persulfate and 5 μL of TEMED were added to initiate polymerization, ensuring a final TBE concentration of 1×. The gel was cast and allowed to polymerize at room temperature. For sample preparation, 5 μL of extracted DNA was mixed with 5 μL of 2× urea-based denaturing sample buffer and heated at 95 °C for 5 min. After denaturation, the samples were stored on ice until loading onto the gel. Electrophoresis was carried out at 120 V in 1× TBE buffer at room temperature, with progress monitored as the samples migrated through the denaturing gel matrix.
Libraries were sequenced on Illumina platforms using a paired-end 150 bp (PE150) strategy after pooling based on concentration and data requirements. Raw sequencing data underwent quality trimming with Fastp, which removed read pairs containing adapter contamination (>10 nucleotide alignment with ≤10% mismatches), reads with >10% uncertain bases, or those with >50% low-quality bases (Phred score < 5). Cleaned reads were aligned to the GRCh38 reference genome using BWA-MEM v0.7.17. Base quality recalibration was implemented through GATK 4.0.2.1 in a two-step workflow: first, generating covariate tables with BaseRecalibrator using known variants, followed by the application of recalibration parameters genome-wide using ApplyBQSR. Variant calling was conducted with GATK HaplotypeCaller, and low-confidence variants were excluded based on filtering criteria (GQ < 20, DP < 10 or > 500, QUAL < 30) prior to downstream analysis.
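The variant filtering criteria stated above (GQ < 20, DP < 10 or > 500, QUAL < 30 excluded) can be expressed as a single predicate. This is a minimal sketch, not the authors' pipeline: the `Call` record and `passes_filters` function are hypothetical names, and in practice these fields would be read from the VCF produced by HaplotypeCaller.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical minimal record for one variant call."""
    gq: float    # genotype quality
    dp: int      # read depth
    qual: float  # variant quality score


def passes_filters(call):
    """Keep a call only if it avoids every exclusion criterion in the text:
    GQ < 20, DP < 10 or DP > 500, QUAL < 30."""
    return call.gq >= 20 and 10 <= call.dp <= 500 and call.qual >= 30


# Illustrative values only, not data from the study.
calls = [Call(gq=45, dp=120, qual=60.0), Call(gq=12, dp=80, qual=55.0), Call(gq=50, dp=700, qual=90.0)]
kept = [c for c in calls if passes_filters(c)]
print(f"{len(kept)} of {len(calls)} calls retained")
```

Values exactly at a threshold (GQ = 20, DP = 10 or 500, QUAL = 30) are retained here, following the strict inequalities in the text.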
In this study, we employed a combined approach of qPCR and gel electrophoresis for the preliminary assessment of FFPE samples (Fig. 1b). The gDNA extracted from human cell lines including NA18562, NA18537, HEK293T, and HeLa was used as the reference for comparisons. Given that DNA measures 0.34 nm per base pair, gel electrophoresis reveals that DNA from FF references typically exceeds 1000 base pairs (approximately 340 nm), whereas DNA from FFPE samples is shorter than 340 nm.38 This fragmentation was further corroborated by qPCR analysis: the reference DNA typically shows decreasing Ct values with increasing amplicon length, whereas damaged DNA extracted from FFPE samples does not follow this trend and often yields higher Ct values as the amplicon length increases.
To further evaluate the impact of FFPE-induced DNA damage on sequencing outcomes, we performed NGS analysis for FFPE samples (Fig. 1c). The results revealed distinct patterns of DNA degradation, with older samples exhibiting higher levels of fragmentation and artifactual mutations, which were consistent with prolonged archival storage. These findings underscore the importance of implementing a standardized QC workflow to ensure the reliability of FFPE-derived DNA for genomic analyses.
For intact reference DNA samples, the qPCR amplification Ct value decreases as the amplicon length increases due to enhanced fluorescence signal accumulation per cycle in longer fragments. Consequently, the ΔCt between the Ct value of the long amplicon and the short amplicon is negative. In FF samples, ΔCt values (Ct260bp – Ct50bp) ranged from –2.0 to –0.7 for the 3 SNPs, closely matching those of the reference NA18562 (–2.0 to –1.0), both exhibiting a consistent trend of decreasing Ct with increasing amplicon length. In contrast, FFPE samples consistently exhibited an increase in Ct values as amplicon length increased from 50 bp to 260 bp, with final Ct increases ranging from 3.9 to 5.5 for these loci (Fig. 2a), indicating substantial DNA fragmentation.
This divergence reflects the limited availability of FFPE DNA templates for long template amplification, highlighting the fragmented nature of FFPE-derived DNA. For example, at the rs206781 locus, the Ct difference between FFPE and FF samples was 2.1 cycles for the amplification of a 50 bp sequence, suggesting that FFPE samples retained only 23.3% of the effective DNA concentration compared to FF samples. However, for a 260 bp amplicon, the Ct difference increased to 8.3 cycles, with FFPE DNA showing just 0.3% of the effective DNA concentration relative to FF DNA (Fig. 2b). This dramatic reduction in amplifiable DNA as amplicon size increases highlights the severe fragmentation of FFPE-derived DNA, which significantly impedes efficient PCR amplification.
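The percentages quoted above follow from the usual exponential relationship between Ct and starting template. The sketch below assumes perfect doubling per cycle (100% amplification efficiency), which the text does not state explicitly; the function names and the `efficiency` parameter are illustrative.

```python
def delta_ct(ct_long, ct_short):
    """ΔCt within one sample: Ct of the long amplicon minus Ct of the short one.
    Negative values match the intact-DNA trend; positive values indicate fragmentation."""
    return ct_long - ct_short


def relative_template_fraction(ct_gap, efficiency=1.0):
    """Fraction of amplifiable template implied by a Ct gap between two samples.

    Assumes each cycle multiplies the product by (1 + efficiency);
    efficiency=1.0 corresponds to perfect doubling (an assumption).
    """
    return (1.0 + efficiency) ** (-ct_gap)


# Ct gaps between FFPE and FF at rs206781, taken from the text.
for amplicon_bp, gap in ((50, 2.1), (260, 8.3)):
    fraction = relative_template_fraction(gap)
    print(f"{amplicon_bp} bp amplicon: gap {gap} cycles -> {fraction:.1%} of FF template")
```

With these inputs the calculation reproduces the 23.3% (50 bp) and 0.3% (260 bp) figures given above. A lower assumed efficiency would yield somewhat larger fractions for the same Ct gap.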
Agarose gel electrophoresis further revealed distinct fragmentation patterns among sample types (Fig. 2c). The reference NA18572 exhibited superior DNA integrity, characterized by longer fragment lengths and well-defined band distributions. FF samples also predominantly retained high-molecular-weight DNA (>1 kb), whereas FFPE samples exhibited significant fragmentation, with most fragments being less than 1 kb. This degradation in FFPE samples is consistent with the formalin-induced DNA damage observed in qPCR analysis, further highlighting the impact of FFPE processing on DNA quality.
Additionally, denaturing PAGE further distinguished the quality differences between FF and FFPE samples (Fig. 2d). The FF DNA exhibited discrete high-molecular-weight bands, whereas FFPE samples displayed a heterogeneous fragment distribution with significant migration of sub-100 nt fragments. This smearing pattern confirms single-stranded DNA (ssDNA) fragmentation in FFPE samples, consistent with formalin-induced crosslinking and hydrolytic damage mechanisms.
The observed fragmentation patterns directly account for the differential PCR amplification efficiencies between FF and FFPE samples. Formaldehyde fixation induces extensive DNA fragmentation through crosslinking and hydrolytic cleavage, severely limiting the availability of intact templates for long amplicons. As a result, FFPE-derived DNA exhibits increased Ct values with amplicon length, reflecting a progressive loss of amplifiable template regions. In contrast, FF samples, which retain higher molecular weight DNA, maintain efficient amplification across a range of target sizes, with decreasing Ct values for longer amplicons. The commercially available reference DNA further confirmed this trend, demonstrating amplification kinetics characteristic of intact gDNA. Collectively, these findings highlight the substantial impact of FFPE processing on DNA integrity, emphasizing the need for rigorous quality control measures to optimize the use of FFPE-derived DNA in molecular analyses.
To further investigate the impact of sample preparation time on DNA integrity, we analyzed EC FFPE tissue specimens spanning multiple archival years (2018–2024). As before, we employed the same forward primer (FP) with different reverse primers to generate amplicons of varying lengths (50 bp to 250 bp) targeting 4 SNPs (Fig. 3a and b and Tables S4–S6, ESI†). This approach enabled a quantitative assessment of DNA fragmentation through differential amplification efficiency. Taking rs206781 as an example, we compared ΔCt values between 250 bp and 50 bp amplicons for FF and FFPE samples, revealing a significant correlation between ΔCt values and archival age (Fig. 3b and Fig. S1, ESI†). One-way ANOVA demonstrated significant differences in ΔCt values among groups (F(4,32) = 20.2, p < 0.001). Tukey HSD post hoc comparisons revealed distinct patterns of nucleic acid degradation across storage durations. References exhibited significantly lower ΔCt values than samples from 2018 (MD = −6.48, p < 0.001), 2019 (MD = −4.82, p < 0.001), and 2023 (MD = −3.05, p = 0.025), supporting their use as undamaged controls and highlighting the extent of damage in samples stored for more than one year. However, the difference between references and samples prepared in 2024 was not statistically significant (MD = −1.59, p = 0.459), suggesting minimal damage in the most recently prepared FFPE samples. Samples prepared in 2018 demonstrated significantly higher degradation compared to 2023 (MD = 3.43, p < 0.001) and 2024 samples (MD = 4.89, p < 0.001). Similarly, 2019 samples showed significantly increased ΔCt values relative to 2024 samples (MD = 3.23, p = 0.014). These findings demonstrate a clear time-dependent increase in nucleic acid damage, with critical degradation occurring in samples stored beyond 6 years.
Furthermore, HCC samples mirrored this degradation profile (F(2,12) = 27.5, p < 0.0001), with post-hoc Tukey HSD tests confirming significant ΔCt differences between specimens prepared during 2018–2020 and 2024-prepared specimens (MD = −2.87, p = 0.0049), as well as between the 2018–2020 specimens and references (MD = −5.66, p < 0.0001). These findings collectively demonstrate time-dependent nucleic acid damage in FFPE-preserved tissues across multiple cancer types.
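For readers less familiar with the F(df1, df2) notation used above, the following sketch computes a one-way ANOVA F statistic and its degrees of freedom from grouped ΔCt values. The numbers in the example are toy values chosen for easy arithmetic, not data from this study, and the function name is an assumption. The p-value and the Tukey HSD comparisons require the F and studentized range distributions and are not reproduced here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ΔCt values for three hypothetical groups (NOT data from the study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

By the same formulas, the reported F(4,32) corresponds to 5 groups and 37 observations, and F(2,12) to 3 groups and 15 observations.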
Cross-tissue analysis of 2024-prepared FFPE specimens revealed significant ΔCt value disparities between HCC and EC groups (permutation test: median difference = −1.218, p = 0.015, Fig. 3c). This pronounced divergence suggests that HCC-associated DNA undergoes accelerated formalin-induced fragmentation during FFPE processing. Such inter-tumor variability in DNA degradation may be attributed to tumor-specific intrinsic factors combined with pre-analytical variables.39
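A permutation test on the difference of medians, as referenced above, can be sketched as follows. This is a generic two-sided implementation under stated assumptions (random label shuffling, a fixed seed, and an add-one p-value correction); the authors' exact procedure is not described in the text, and the example values are synthetic.

```python
import random
from statistics import median


def permutation_test_median(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test for the difference in medians.

    Returns (observed_difference, p_value). Group labels are shuffled
    n_perm times; the p-value is the share of shuffles whose absolute
    median difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = median(group_a) - median(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = median(pooled[:n_a]) - median(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float ties
            hits += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic ΔCt values for two hypothetical tissue groups (NOT study data).
group_1 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
group_2 = [5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
obs, p = permutation_test_median(group_1, group_2, n_perm=2000)
print(f"median difference = {obs:.3f}, p = {p:.4f}")
```

A permutation approach makes no normality assumption, which suits the small group sizes typical of archival specimen comparisons.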
Control experiments revealed genome-wide DNA degradation in the S1-2018 specimen, with systematically elevated ΔCt values across all tested loci (ΔCt > 1.47) compared to reference samples (ΔCt < −0.86) (Fig. 3d and Fig. S2–S5, ESI†). This pan-genomic degradation pattern suggests that the observed fragmentation results from intrinsic genome-wide degradation processes rather than sequence-specific amplification artifacts, thereby validating the reliability of qPCR for assessing global DNA integrity in archival specimens.
Agarose gel electrophoresis further validated the progressive DNA fragmentation patterns in FFPE samples (Fig. 3e). Notably, the main bands for FF samples were generally larger than 12 000 bp, while both EC and HCC FFPE samples exhibited significant degradation and smearing. Based on the primary bands, we assessed the degradation severity as follows: S4-2024 < S2-2019 < S5-2024 < S7-2024 < S6-2023 < S8-2024 < S3-2023 < S1-2018.
This fragmentation hierarchy directly paralleled the ΔCt rankings observed in qPCR analysis, where specimens with more severe fragmentation exhibited systematically elevated ΔCt values. The correlation between reduced amplification efficiency (reflected by higher ΔCt) and lower molecular weight DNA underscores the mechanistic link between strand break accumulation and polymerase accessibility limitations. These findings validate ΔCt measurements as a robust, PCR-based QC metric for assessing DNA integrity in archival biospecimens. Standardizing ΔCt thresholds against reference DNA, combined with gel electrophoresis validation, provides a practical framework for assessing FFPE sample suitability in downstream genomic applications. Previous methods for assessing FFPE-derived DNA each have limitations: NanoDrop spectrophotometry measures DNA concentration but does not reflect DNA integrity,40 while gel electrophoresis offers visual assessment with inherent subjectivity and variability.41 The Q-ratio method evaluates DNA integrity by comparing the qPCR amplification of long and short fragments.30 Herein, we combined gel electrophoresis and qPCR to comprehensively assess DNA integrity in clinical FFPE samples, providing a cost-effective quality control strategy for clinical workflows and uncovering time-dependent nucleic acid damage across different carcinoma tissue types.
To further elucidate time-dependent degradation effects, we analyzed a matched pair of FF and FFPE samples preserved for 4 years, as well as the FFPE samples repaired using the PreCR kit. The observed disparity in read counts across samples reflects a complex interplay of DNA fragmentation dynamics, repair-induced template redistribution, and PCR amplification biases. FF samples demonstrated significantly higher mean read counts at the 26 SNP loci, reaching 8155, compared to 172 in FFPE. The distribution of read counts across these loci was also more uniform in FF samples, with a CV of 1.46 and a Gini index of 0.65, compared to a CV of 1.53 and a Gini index of 0.69 in FFPE samples. This discrepancy reflects the reduced number of amplifiable templates caused by FFPE-associated DNA fragmentation (Fig. 4b, more details in Fig. S6, ESI†). Notably, loci with initially high read proportions, defined as the percentage of total reads across all 26 loci in FFPE samples, generally exhibited reduced proportions following PreCR treatment (Fig. 4c). For instance, the read proportion of rs10104396 decreased from 20.16% to 9.14% after PreCR treatment. In contrast, loci with negligible or undetectable initial read proportions showed substantial increases, such as rs869720, which rose from 0.02% to 3.56%, and rs1898170, which increased from 0% to 0.36%. This phenomenon may be attributed to the enhanced amplification and capture of certain loci that benefited from better repair, leading to an increase in their read proportion. Given the fixed sequencing data volume, this resulted in a relative decrease in the read proportion of other loci.
Additionally, the repair process may have altered DNA integrity, fragment length distribution, and amplification preferences, collectively contributing to the observed changes in read proportion distribution. These bidirectional shifts underscore the efficacy of the PreCR kit in repairing FFPE-induced DNA lesions. The treatment not only restored amplifiable DNA, enabling the detection of previously undetectable loci, but also enhanced the amplification of underrepresented loci, thereby improving the overall quality of genomic analysis.
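The uniformity metrics discussed above (CV, Gini index, and per-locus read proportion) can be computed from a table of read counts as sketched below. The text does not specify which standard deviation or Gini formulation was used, so the population standard deviation and the sorted-rank Gini formula here are assumptions; the function names and locus labels in the example are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """CV = standard deviation / mean (population SD assumed)."""
    return pstdev(counts) / mean(counts)


def gini_index(counts):
    """Gini index via the sorted-rank formula: 0 means perfectly even coverage,
    values approaching 1 mean reads are concentrated in a few loci."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def read_proportions(counts_by_locus):
    """Percentage of total reads contributed by each locus."""
    total = sum(counts_by_locus.values())
    return {locus: 100.0 * count / total for locus, count in counts_by_locus.items()}


# Illustrative counts only (NOT the study's data); locus names are placeholders.
example = {"locus_A": 400, "locus_B": 50, "locus_C": 0, "locus_D": 50}
values = list(example.values())
print(f"CV = {coefficient_of_variation(values):.2f}, Gini = {gini_index(values):.2f}")
print(read_proportions(example))
```

Because read proportions are normalized to a fixed total, an increase at repaired loci necessarily lowers the proportions at others, which is the redistribution effect described above.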
A comparative analysis of mutation patterns across FFPE, FFPE-repaired, and FF samples reveals critical insights into formalin-induced artifacts and repair efficacy (Fig. 4d). The most prevalent mutation observed was A > G, with proportions of 18.02% in FFPE, 16.81% in FFPE-repaired, and 14.88% in FF samples. While A > G transitions are traditionally attributed to sequencing errors rather than formalin damage, the elevated rate in FFPE compared to FF suggests potential compounding effects of formalin fixation on error-prone sequencing contexts. This trend aligns with reports of oxidative damage exacerbating sequencing inaccuracies in archival samples.
Notably, the C > T transition, a well-documented formalin artifact,32,42,43 decreased progressively from 14.68% in FFPE to 14.40% in FFPE-repaired and 13.85% in FF. This gradation demonstrates the partial restoration of DNA integrity through PreCR repair, bridging the molecular disparity between FFPE and FF specimens. The repair process appears particularly effective against deamination damage, though complete normalization to FF levels remains elusive. Interestingly, G > A mutations exhibited minimal variation across sample types, ranging from 14.08% to 14.94%, potentially indicating mechanisms that are independent of fixation damage. Similarly, T > C mutations displayed a range of 14.64% to 17.56%, also suggesting mechanisms that are not influenced by fixation damage.
To further investigate the impact of DNA fragmentation, we visualized read alignment at rs206781 using IGV (Fig. 5b). The results revealed that S5-2024 contained a higher number of longer reads at this locus compared to S1-2018. This discrepancy likely explains why Ct values in multi-amplicon length qPCR increased more rapidly for S1-2018 as amplicon length increased, highlighting the challenge of amplifying longer DNA templates in older FFPE samples. These findings underscore the progressive degradation of DNA in long-stored FFPE samples, which poses significant challenges for applications requiring high-quality sequencing data.
Furthermore, we performed mutation analysis on both samples. After filtering out common SNPs and known COSMIC mutations, S1-2018 exhibited a higher mutation density at VAF ranges of 35.8–56.0%, while S5-2024 showed greater density at VAF < 35.8% (Fig. 5c). One possible explanation for this pattern is the accumulation of formalin-induced artifacts over time, particularly deamination events that lead to C > T transitions. These artifacts tend to manifest at higher VAFs due to their widespread occurrence across DNA molecules.44 Taken together, these findings highlight the compounded effects of prolonged storage on both DNA integrity and mutation profiles in FFPE samples.
To further elucidate the impact of storage duration on sequencing performance, we analyzed differences in read density between S1-2018 and S5-2024 under identical GC content and sequencing depth conditions (Fig. 5d). Our results demonstrated that S1-2018 exhibited higher read density than S5-2024 in low-depth regions (50–150 reads) with GC content ranging from 0.4 to 0.6. However, across a broader depth range (150–400 reads), S1-2018 consistently displayed lower read density. This pattern likely reflects the more extensive DNA damage in S1-2018, which reduces amplification efficiency during library preparation and consequently results in lower sequencing coverage in mid-to-high depth regions. Conversely, the relatively higher read density observed for S1-2018 in low-depth regions may be attributed to the preferential sequencing of shorter, more degraded DNA fragments, which fail to contribute meaningfully to deeper coverage. Moreover, S5-2024 exhibited higher read density in regions with lower GC content, a finding consistent with previous studies indicating that DNA with low GC content is more susceptible to degradation over extended storage periods.45 This observation supports the hypothesis that prolonged storage selectively exacerbates the degradation of AT-rich regions, likely due to their lower thermodynamic stability and increased susceptibility to hydrolytic damage.
Overall, these findings provide valuable insights into how FFPE storage duration influences DNA integrity and sequencing outcomes. Specifically, prolonged storage appears to intensify GC-biased degradation while also altering fragment length distribution and sequencing depth profiles. These results underscore the critical importance of accounting for storage duration when interpreting sequencing data from archival FFPE samples, particularly in the context of mutation analysis and genomic profiling.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5nh00176e
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025