Open Access Article
Yiyue Jiang, Yuan Jiang and Pingfeng Wang*
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA. E-mail: pingfeng@illinois.edu
First published on 6th May 2026
Thermal runaway (TR) in modern lithium-ion battery packs is a critical safety concern due to its potential to cause fires or explosions. When a single cell experiences TR, the intense heat and exothermic reactions can rapidly propagate to neighboring cells, leading to thermal runaway propagation (TRP) at the module or pack level. This review provides a comprehensive examination of TR and TRP, spanning from fundamental mechanisms through advanced modeling techniques to practical mitigation measures and pack-level design optimization strategies. We first delineate the chemical and thermal mechanisms that initiate TR and govern cell-to-cell propagation. We then review state-of-the-art modeling methods, from reduced-order analytical models to detailed 3D multiphysics simulations, as well as emerging data-driven models that predict the onset and propagation of TR events. The strengths and limitations of these modeling approaches are compared in the context of safety prediction. Finally, we discuss current TRP mitigation strategies and emphasize safety-conscious design optimization for battery packs. By integrating improved thermal management, protective materials, and optimized pack architecture, the review highlights how design optimization can minimize propagation risks. Through this holistic approach, the article offers insights to guide researchers in developing next-generation battery packs with enhanced safety and resilience against TR.
Given the high demand for LIBs, the high energy content and reactive chemistry inside a battery cell also pose serious safety challenges. Foremost among these is the risk of thermal runaway (TR), a phenomenon in which a battery cell undergoes self-accelerating exothermic reactions, leading to rapid temperature and pressure escalation. TR events can be triggered by various abuse conditions or faults, such as overheating, overcharging, internal short circuits, or mechanical damage.16 Once initiated, TR results in the violent release of heat and gas, often rupturing the cell and potentially causing fire or explosion.17 The core functioning components in LIB applications, such as EVs and stationary energy storage systems (ESSs), typically contain numerous cells packed tightly together for high performance. Under such circumstances, the failure of a single cell can trigger a domino effect and result in cascading failure. The intense heat and flame from a runaway cell may quickly spread to adjacent cells, causing them to overheat and enter TR in turn, a process known as thermal runaway propagation (TRP). In essence, what begins as a localized cell failure can escalate into a chain reaction engulfing an entire module or pack. This propagation phenomenon is widely recognized as a critical safety concern unique to battery systems, as it can dramatically multiply the severity of an incident. The consequences of TR and TRP events are often catastrophic and have been documented in multiple accidents under different application scenarios.17,18 Such incidents illustrate how TRP can compromise the safety of EVs or ESSs within minutes, posing grave risks to users and surrounding infrastructure.
Preventing or limiting TRP can be exceedingly challenging. Once a chain reaction begins, the heat release and fire can be self-sustaining and difficult to suppress. TRP effectively negates the cell-level safety measures, making the entire battery pack susceptible to total failure. Indeed, TRP has been identified as one of the most formidable barriers to the wider application of LIB technology in transportation and grid storage.19,20 Ensuring battery safety under all conditions has, therefore, become a top priority for manufacturers, researchers, and regulators. Modern battery packs are being designed with myriad safety features to mitigate propagation risks; yet, completely eliminating the TR hazard remains an unsolved engineering problem.
In light of the severity and complexity of this issue, there is a compelling need for a comprehensive review of TRP in contemporary battery packs. Research activity on LIB safety has intensified substantially in recent years, producing a wealth of studies on TR mechanisms, modeling, and mitigation. Many review articles have also appeared to cover related topics.21–23 Nevertheless, a synthesis focusing specifically on TRP in modern, high-energy battery packs is still warranted. This paper aims to provide a holistic overview of the state-of-the-art understanding of TRP, consolidating findings from the latest literature, including the current progress in TRP research and the strategies being developed to enhance battery safety, and identifying the key challenges that lie ahead.
To provide a comprehensive review of this topic, we organize the paper progressively: it starts with the mechanisms of TR and TRP to build an overall understanding of the hazards, moves on to the modeling methods used to emulate TRP, and finally discusses the mitigation and design optimization methods that address the problem. Fig. 1 shows the overall thread of this paper, and each point is elaborated in the following sections. We first examine the fundamental mechanisms of TR and TRP, as well as the influential factors that govern TR propagation between cells (Section 2). This is followed by a review of the modeling and simulation techniques used to study TRP, ranging from empirical models to advanced multiphysics simulations (Section 3). In Section 4, we discuss current mitigation and prevention strategies for TRP, including materials, pack design innovations, and operational controls aimed at stopping or slowing propagation. Finally, Section 5 addresses the remaining challenges, safety standards, and future outlook for TRP, highlighting the research directions and technological developments needed to achieve inherently safer battery packs. Together, these sections provide a comprehensive assessment of TRP in modern LIB packs and outline pathways toward improved safety in the era of electrification.
The external causes of TR can be grouped into three main categories: mechanical abuse, electrical abuse, and thermal abuse.16
(1) Mechanical abuse: typical mechanical abuse includes crushing, collision, and penetration. In most cases, mechanical abuse triggers TR by structurally deforming the battery cell, which in turn causes an internal short circuit (ISC), an electrical abuse condition.
(2) Electrical abuse: typical electrical abuse includes ISC and overcharging. If an electrical abuse condition occurs in a LIB cell, the resulting Joule heat raises the temperature of the battery cell, creating the conditions for thermal abuse.
(3) Thermal abuse: typical thermal abuse includes overheating. When the temperature of a battery cell reaches a certain point, a chain of exothermic reactions takes place inside the LIB cell and releases a large amount of heat, finally resulting in extremely high temperatures of up to 800–1000 °C.
After TR is triggered, the LIB cell undergoes three stages according to ref. 27: (1) the onset of overheating, (2) the heat accumulation and gas release process, and (3) combustion and explosion. In the first stage, the battery system begins to overheat due to the aforementioned causes, and the temperature starts to rise. In the second stage, the temperature rises quickly because of various exothermic reactions occurring in different components inside the LIB cell. Finally, in the third stage, the highly volatile or flammable gases released during the reactions cause combustion and explosion, leading to severe safety hazards.
Viewed at the micro scale, the heat production process consists of various exothermic reactions that start at different temperatures.28–30 In order of onset temperature, from low to high, they are mainly: (1) solid electrolyte interface (SEI) decomposition (80–120 °C),24,28 (2) anode–electrolyte reaction (120–250 °C, after SEI decomposition),24,31 (3) decomposition of cathode (onset temperature varies with cathode materials: LixMn2O4 around 150 °C,32 LixCoO2 around 180 °C,33 LiyNiO2 around 200 °C,34 LiNi1−y−zCoyAlzO2 around 200 °C,35 LiNixCoyMnzO2 around 150–300 °C depending on the values of x, y, z,36 LiFePO4 around 310 °C37), (4) electrolyte decomposition (185 to >350 °C),38 (5) graphite anode decomposition (>250 °C).39 Each reaction releases enough heat to raise the battery temperature to the onset point of the next reaction, leading to an acute rise in LIB cell temperature, which further causes combustion and explosion.
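The staged sequence above can be illustrated with a small lookup. This is a minimal sketch that uses only the lower onset temperatures quoted in the paragraph. The short cathode labels (LMO, LCO, LNO, NCA, LFP) and the function name `reactions_started` are assumptions introduced here for illustration, and the composition-dependent NCM range is omitted.

```python
# Lower onset temperatures (deg C) quoted in the paragraph above.
# Upper bounds and material-specific spreads are deliberately ignored in this sketch.
REACTION_ONSET_C = [
    ("SEI decomposition", 80.0),
    ("anode-electrolyte reaction", 120.0),
    ("electrolyte decomposition", 185.0),
    ("graphite anode decomposition", 250.0),
]

# Approximate cathode decomposition onsets (deg C) from the same paragraph.
# The short keys are shorthand labels chosen here, not notation from the text.
CATHODE_ONSET_C = {
    "LMO": 150.0,  # LixMn2O4
    "LCO": 180.0,  # LixCoO2
    "LNO": 200.0,  # LiyNiO2
    "NCA": 200.0,  # LiNi1-y-zCoyAlzO2
    "LFP": 310.0,  # LiFePO4
}


def reactions_started(temp_c, cathode):
    """Return the reactions whose quoted onset temperature has been reached."""
    started = [name for name, onset in REACTION_ONSET_C if temp_c >= onset]
    if temp_c >= CATHODE_ONSET_C[cathode]:
        started.append("cathode decomposition")
    return started


if __name__ == "__main__":
    for temp in (100.0, 190.0, 320.0):
        print(temp, reactions_started(temp, "LFP"))
```

With these thresholds, a cell at 190 °C has already passed the cathode onset for the LCO entry but not for the LFP entry, which mirrors the difference in thermal stability described in the text.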
The mathematical models of these chemical reactions can be established through Arrhenius equations:40–42
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
The generated heat through these reactions can be modeled as:
$$\dot{Q}_{gen} = \dot{Q}_{s} + \dot{Q}_{a} + \dot{Q}_{c} + \dot{Q}_{e} + \dot{Q}_{ec} \tag{7}$$
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() | (11) |
![]() | (12) |
Eqn (2)–(6) represent the reactions occurring during TR. xa and xs represent the Li-ion present in the anode and SEI; z is the dimensionless thickness of the SEI (z0 is its initial value); α is the degree of conversion of the cathode; ce is the dimensionless concentration of the electrolyte. The symbols A and E represent the frequency factor and the activation energy, respectively, and the subscripts stand for different reactions: s for SEI decomposition, a for anode decomposition, c for cathode decomposition, e for electrolyte decomposition, and ec for electrochemical reactions. ISCcond is an indicator representing whether the ISC is triggered inside the battery; if triggered, its value is set to 1; otherwise, it is set to 0. kB is the Boltzmann constant. It is worth noting that as a triggering method, ISC can significantly influence the thermal behaviors after TR starts inside a LIB.43,44
Eqn (7) represents the total heat generation produced by the exothermic reaction heat, which is the sum of different exothermic heats. Eqn (8)–(12) list the heat release rate of each reaction, where m and h stand for the mass and the enthalpy or heat released by different reactions, respectively. Combined with eqn (1)–(6), the total heat released by TR can be modeled. The specific parameter values can be looked up in ref. 40–42.
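To make the coupling between Arrhenius kinetics and heat release concrete, the following minimal sketch integrates a single first-order lumped reaction under adiabatic conditions. It assumes a generic rate law of the form A·x·exp(−E/(kB·T)) and a heat release rate equal to mass times enthalpy times reaction rate, which may differ in detail from eqn (1)–(12). All parameter values are illustrative placeholders rather than the values reported in ref. 40–42, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def arrhenius_rate(x, temp_k, freq_factor, act_energy):
    """Consumption rate -dx/dt = A * x * exp(-E / (k_B * T)) of a first-order lumped reaction."""
    return freq_factor * x * math.exp(-act_energy / (K_B * temp_k))


def simulate_adiabatic(temp0_k, t_end, dt,
                       freq_factor=1.667e15,   # 1/s, placeholder
                       act_energy=2.243e-19,   # J per reacting unit, placeholder
                       m_react=0.01,           # kg of reacting material, placeholder
                       enthalpy=2.57e5,        # J/kg released, placeholder
                       m_cell=0.045,           # kg, placeholder cell mass
                       cp=1000.0):             # J/(kg K), placeholder specific heat
    """Explicit-Euler integration of one reaction in an adiabatic cell (no heat loss)."""
    x, temp = 1.0, temp0_k
    for _ in range(int(round(t_end / dt))):
        rate = arrhenius_rate(x, temp, freq_factor, act_energy)
        rate = min(rate, x / dt)               # never consume more reactant than remains
        q_gen = m_react * enthalpy * rate      # heat release rate, W
        x -= rate * dt
        temp += q_gen * dt / (m_cell * cp)     # adiabatic temperature update
    return x, temp


if __name__ == "__main__":
    x_end, temp_end = simulate_adiabatic(400.0, 3600.0, 0.1)
    print(f"remaining fraction = {x_end:.3e}, final temperature = {temp_end:.1f} K")
```

Summing several such reactions, each with its own frequency factor, activation energy, mass, and enthalpy, gives a total source term in the spirit of eqn (7). A stiff ODE solver would be a more robust choice than explicit Euler for production use.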
Once TR is initiated in a single cell, propagation to neighboring cells typically follows. A large amount of heat from the TR battery will be transferred to adjacent battery cells due to a combination of conduction, convection, and radiation, leading them to soon fall into an overheating condition.23 If the exothermic reactions of their components are triggered because of the high temperature, TR will start inside those adjacent battery cells. Finally, a chain reaction of TRP will burn out the entire battery pack, which contains numerous battery cells.45 Fig. 2 shows the sketch of the internal structure of a LIB and the main exothermic reactions.
$$\Delta\dot{E} = Q + \dot{Q}_{ht}, \tag{13}$$
where $\dot{Q}_{ht}$ is the heat transfer intensity. Eqn (13) can be further broken down into eqn (14):
![]() | (14) |
The terms $T$ and $t$ represent temperature and time. The left side of eqn (14) represents $\Delta\dot{E}$ in eqn (13). $\rho_b$ and $C_{p,b}$ are the bulk density and the specific heat capacity of the battery cell, respectively. On the right side of eqn (14), $\dot{Q}_{gen}$ represents the generated heat, primarily from the chemical heat of the TR process, and also includes some electrical heat from the Joule heating process. The rest of the terms of the equation stand for $\dot{Q}_{ht}$. $\nabla\cdot(k\nabla T)$ represents the conductive heat within solid materials, where $k$ is the thermal conductivity. $-\nabla\cdot(\rho_f C_{p,f}\mathbf{v}T)$ represents the convection heat within the natural or forced flow of air or cooling fluid when applicable, where $\rho_f$ and $C_{p,f}$ represent the density and specific heat capacity of the fluid, and $\mathbf{v}$ represents the velocity field of the fluid. $\sigma\varepsilon(T^4 - T_{sur}^4)$ is the radiation heat, where $\sigma$ is the Stefan–Boltzmann constant, $\varepsilon$ is the emissivity, and $T_{sur}$ is the temperature of the surrounding area. Finally, the term $\dot{Q}_{med}$ represents the heat generation through the medium in which the batteries are soaked. It is 0 when the medium is air, while it has a negative sign when using a heat absorption material, such as the phase change material (PCM), as the medium. These influential factors are sketched in Fig. 3 including each term in eqn (14).
Therefore, according to eqn (14) and Fig. 3, apart from battery physical properties such as density and specific heat, the propagation process (as represented by ∂T/∂t on the left side of the equation) can be influenced by the factors on the right side of the equation, including the heat generation from batteries, heat transfer (conduction, convection, and radiation), and the heat absorption from the medium.
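The role of each term can be illustrated with a lumped (zero-dimensional) version of this energy balance, in which the whole cell has a single temperature and exchanges heat through one surface. This is a simplifying sketch, not the spatially resolved form of eqn (14): the function names, the convective coefficient `h_conv`, the exchange area `area`, and the sign convention (losses subtracted, a heat-absorbing medium entered as a negative `q_med`) are assumptions made here.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def convective_loss(temp_k, t_sur_k, h_conv, area):
    """Heat leaving the cell surface by convection, W."""
    return h_conv * area * (temp_k - t_sur_k)


def radiative_loss(temp_k, t_sur_k, emissivity, area):
    """Heat leaving the cell surface by radiation, W."""
    return SIGMA * emissivity * area * (temp_k ** 4 - t_sur_k ** 4)


def lumped_dT_dt(temp_k, q_gen, q_med, t_sur_k, mass, cp, h_conv, emissivity, area):
    """Rate of temperature change (K/s) of a lumped cell.

    q_gen: heat generation inside the cell, W
    q_med: medium term, W (0 for air, negative for a heat-absorbing medium)
    """
    q_loss = (convective_loss(temp_k, t_sur_k, h_conv, area)
              + radiative_loss(temp_k, t_sur_k, emissivity, area))
    return (q_gen - q_loss + q_med) / (mass * cp)


if __name__ == "__main__":
    # Placeholder values for a hot cell in 300 K surroundings
    common = dict(t_sur_k=300.0, mass=0.07, cp=1000.0,
                  h_conv=10.0, emissivity=0.8, area=0.004)
    print(lumped_dT_dt(800.0, q_gen=500.0, q_med=0.0, **common))
    print(lumped_dT_dt(800.0, q_gen=500.0, q_med=-100.0, **common))
```

In this simplified form, a more negative medium term or a larger surface loss directly lowers the heating rate, which is the qualitative dependence the paragraph above describes.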
Research has shown that batteries with a higher SOC have a higher probability of TR and a more intense TR reaction.47 Intuitively, a higher SOC means more stored energy and reactive material, so cells at 100% SOC experience more violent TR with greater heat release (and larger flame/jets) than cells at lower SOC. Chen et al.48 tested cells with 30% SOC and 100% SOC, and observed that high-energy 21700 cells at 100% SOC underwent more aggressive TR than cells at 30% SOC. However, propagation scenarios are different. A series of experiments conducted by Wang et al.49 have shown that battery packs with 40% and 60% SOC exhibited obvious propagation among battery cells with the highest temperature increasing rate, whereas propagation did not happen in 80% and 100% SOC packs. They attributed this phenomenon to the larger amount of combustible gases released by 50% SOC cells compared to fully charged 100% SOC cells; these gases were more likely to be heated and ignited by the high temperature during TR, which aligned with the studies of Zhong et al.50 Karmakar et al.51 also discovered in their experiments that while TR propagated in battery modules with lower SOC, it did not propagate in the module with 100% SOC, since the rapid heat release did not transfer sufficient heat to trigger TR in the neighboring cells.
Since battery aging can significantly influence the normal cycling performance of a battery cell, it is also considered a crucial factor contributing to the probability of TR occurrence.52 Zhao et al.53 tested the TR of battery cells at different cycle ages, from fresh cells to cells aged for 400 cycles whose state of health (SOH) had dropped to 80%. They discovered that, compared to fresh battery cells, cells with more cycles had a lower TR triggering temperature, meaning that they were more prone to TR. However, their peak TR temperature was lower than that of fresh cells. In contrast, experiments conducted by Wang et al.49 did not find a significant influence of battery aging on the TR propagation rate. They tested 5 battery cell groups with different aging degrees, and the propagation interval between adjacent battery cells did not exhibit an obvious trend as the cycle life changed. By comparing the TR and TRP processes of battery packs with 100% and 90% SOH, Han et al.54 found that TR started earlier in the aged battery module. Their results also show that the propagation intervals between two adjacent cells remained similar under the two SOH values.
The chemistry of the battery components also plays a role in TR and thus may influence TRP. As discussed in Section 2.1, different cathode materials have different thermal stability and reaction enthalpies, causing different reaction temperatures. Schöberl et al.55 tested NCM811 and LFP battery cells, which were used in different EV models, and found that the NCM811 cells were 9 times faster in TR reaction speed and 5 times faster in the whole propagation interval than the LFP cells. Batteries whose cathodes are composed of the same elements may still exhibit different TRP characteristics because of different element concentrations. Li et al.56 tested 3 prismatic battery packs with different NCM cathodes. Their test results indicated that high-Ni cathodes showed a faster heat rise and faster propagation than low-Ni cathodes. Apart from electrode materials, the electrolyte chemistry or form may also influence heat generation in a single battery cell, thus changing the TRP behaviors. Chen et al.57 compared the effects of a quasi-solid-state electrolyte and a liquid electrolyte on TR behaviors, and the results showed that the quasi-solid-state electrolyte had higher thermal stability, better inhibition of side reactions, and a higher TR onset temperature.
Overall, battery heat generation is mainly correlated with the electrochemical status and materials of the battery cells. In addition, the TR behavior of a single battery may differ from the TRP behavior of a battery pack, meaning that conditions that make TR easier in a single battery do not necessarily lead to easier or quicker propagation in a battery pack. A higher SOC causes a more intense TR reaction in a single battery cell, but battery packs with around 50% SOC are more prone to TRP because of the larger release of flammable gas. TR is more likely to happen in more aged battery cells, but the propagation speed does not differ much between fresh and aged battery packs. Finally, electrode and electrolyte materials strongly influence the thermal stability of the battery cells, which makes material selection important for the safety design of battery cells.
In summary, all three heat transfer modes can contribute to the propagation, but the relative significance of each mode depends on the pack geometry and conditions. Conduction can be dominant with small cell spacing and solid connections, whereas convection becomes significant with open airflow or venting, and radiation prevails at larger gaps and higher temperatures. Changing the dominant heat transfer mode may also change the amount of heat transferred, thus influencing the propagation behaviors. The battery pack configuration can largely influence cell-to-cell heat transfer during propagation, specifically through the inter-cell spacing, the geometric arrangement of the battery pack, the heat transfer medium, the side plates, the connection methods, etc. However, designing a battery pack around these factors involves trade-offs. For example, the interstitial material must be chosen with a suitable thermal conductivity, and the inter-cell spacing must balance the energy density against the safety of the battery pack. These trade-offs remain an open design challenge.
$\dot{Q}_{med}$ in eqn (14). Pack-scale studies consistently show that changing the heat-absorption characteristics of the inter-cell medium reshapes how TR propagates. In an open module with only air as the medium, this term is effectively zero since air has negligible heat absorption aside from carrying away some heat via convection. In contrast, filling the space with a PCM or other coolant can significantly change the propagation behavior. PCMs absorb a large amount of heat when they change phase (usually by melting), thus acting as thermal buffers. Incorporating PCM around cells is a promising strategy to prevent TRP because the PCM's endothermic phase change draws in heat. Ma et al.64 investigated the TRP behaviors of a composite PCM (CPCM) inside a battery pack and identified critical PCM thermophysical property thresholds for halting propagation. In their model, they marked a boundary between a zone where TR propagates and another zone where TR is inhibited. Beyond this modeled inhibition boundary, direct module experiments demonstrate the same trend: flame-retardant or high-conductivity CPCMs reduce neighbor-cell peak temperatures and delay or prevent ignition, provided that their latent capacity is not overly diluted by fillers.65,66 Those studies have provided solid evidence that the heat absorption of the heat transfer medium, CPCM in their cases, could dramatically alter the TR or TRP behaviors.
The geometry and coverage of the absorbing medium around cells are among the factors that influence the thermophysical behavior of the PCM. In a parametric module study, Luo et al.66 reported a critical PCM layer thickness above which propagation is arrested; notably, the critical thickness varied non-monotonically with cell thickness, highlighting that the required heat-sink layer depends on both module geometry and PCM transport properties. Menz et al.67 found that thin, strategically placed endothermic barriers between cells can fully block cell-to-cell transfer at a much lower mass cost. Model–test optimization of such inter-cell, heat-absorbing barriers provided a practical route to meet safety targets without sacrificing energy density.
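The buffering capacity of a PCM layer can be estimated with a simple heat budget that adds sensible heating of the solid, latent heat at the melting point, and sensible heating of the liquid. The sketch below assumes isothermal melting at a single temperature, whereas real CPCMs melt over a range. The function name and all property values are hypothetical and do not come from the cited studies.

```python
def pcm_heat_absorbed(mass, cp_solid, cp_liquid, latent_heat, t_melt, t_start, t_end):
    """Heat (J) absorbed by a PCM warming from t_start to t_end (same temperature unit).

    Assumes the PCM is solid below t_melt, liquid above it, and melts
    isothermally at t_melt, absorbing latent_heat (J/kg) in the process.
    """
    if t_end < t_start:
        raise ValueError("this sketch only covers heating (t_end >= t_start)")
    # Sensible heat taken up while the PCM is still solid
    q_solid = mass * cp_solid * max(0.0, min(t_end, t_melt) - t_start)
    # Latent heat, counted only if the melting point is crossed
    q_latent = mass * latent_heat if t_start < t_melt <= t_end else 0.0
    # Sensible heat taken up after melting
    q_liquid = mass * cp_liquid * max(0.0, t_end - max(t_start, t_melt))
    return q_solid + q_latent + q_liquid


if __name__ == "__main__":
    # Placeholder properties: 0.1 kg PCM, 2000 J/(kg K), 200 kJ/kg, melting at 45 deg C
    below = pcm_heat_absorbed(0.1, 2000.0, 2000.0, 2.0e5, 45.0, 25.0, 40.0)
    across = pcm_heat_absorbed(0.1, 2000.0, 2000.0, 2.0e5, 45.0, 25.0, 60.0)
    print(below, across)
```

With these placeholder numbers, most of the absorbed heat comes from the latent term once the melting point is crossed, which is why diluting the latent capacity with fillers weakens the buffering effect noted above.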
The thermophysical properties of the PCM also have a large influence. For example, the thermal conductivity controls how fast heat can be transferred through the medium. For CPCM, conductive scaffolds, such as expanded graphite (EG), spread heat from the trigger cell into larger PCM volumes quickly, increasing early-time absorption. However, if conductivity is raised without maintaining adequate latent capacity, lateral conduction can also expedite heat delivery to neighbors. Pack-scale modeling and tests have observed this trade-off: conductivity enhancements shorten the preheating period of adjacent cells unless balanced by sufficient latent enthalpy and/or spatial decoupling.66 Filler fraction and processing also set anisotropy. Compression-aligned EG networks yield different thermal conductivity in different directions, so orienting the high thermal conductivity direction along outbound heat-flow paths improves buffering where it matters most.68 Talele et al.69 explored how the percentage of added EG in a paraffin/EG CPCM delays the onset of TR. They discovered that, with 30% EG and a thermal conductivity of 13.8 W m−1 K−1, the maximum possible delay was achieved for the LiFePO4 cathode. Besides thermal conductivity, other thermophysical properties, including melting temperature, melting range, and composition, also steer the TRP behaviors. Using a hydrated-salt PCM, Zhi et al.70 discovered that it had good thermophysical properties for the battery pack thermal management system and could significantly relieve TRP. In contrast, some other CPCMs, such as paraffinic ones with a poorly matched melting temperature or insufficient flame retardancy, can delay venting but, once ignited, intensify flame spread. Several studies have therefore combined absorbing media with flame-retardant chemistries or hybrid structures to secure both endothermy and fire tolerance. Chen et al.65 synthesized a flame-retardant CPCM by adding an intumescent flame retardant to the PCM, and this CPCM could efficiently decrease the risk of TRP and prolong the propagation time. Li et al.71 proposed a thermal barrier containing nano-ceramic fiber, PCM, and a mica cross skeleton, which largely blocked or postponed TRP within an NCM battery module. Wang et al.72 investigated a flame-retardant MXene-based CPCM containing various components, including ammonium polyphosphate and zinc hydroxy stannate as flame retardants. Those components enabled a V0 flame-retardant rating and a noticeable cooling effect on the battery temperature.
Finally, in addition to the aforementioned internal properties of CPCM, external factors such as interfacial and contact effects also determine how much heat actually enters the medium during TR or TRP. In controlled single-cell abuse experiments conducted by Chen et al.,73 aerogel, thermally conductive gel, and PCM wrapped around a battery cell effectively suppressed the rate of temperature rise within the battery, thus delaying safety-valve rupture and postponing TR onset. At the module level, Xiao et al.74 combined CPCM with liquid-cooling plates and discovered that increasing the coolant flow rate primarily delayed TRP on both sides of the cooling plate. They also concluded that reducing the thermal conductivity of the CPCM can significantly delay TRP, since the heat produced in each cell was concentrated at the edge of the battery and was hard to transfer through the CPCM and the cooling plate.
In summary, the existing studies have formed a clear exploration of the mechanisms of TR and its propagation in a battery pack. These mechanisms serve as a strong theoretical base for the application-level implementations of the following modeling and suppression works, which further facilitate the design optimizations, as will be demonstrated later in the paper.
This section organizes the modeling methods into three categories. The first comprises reduced-order TRP models, which are developed for simplicity and quick evaluation at the cost of pack details. The second comprises detailed 3D multiphysics models, established with computational fluid dynamics (CFD)/finite element (FE) frameworks that couple different physics, including TR reaction kinetics, ejecta, and cell-to-cell heat transfer. The third comprises surrogate models and data-driven methods, which enable rapid hazard screening and online prognosis.
Before discussing specific model classes, it is useful to clarify that most TRP thermal models rely on two groups of assumptions, namely assumptions for internal heat generation and assumptions for boundary conditions. For heat generation, reduced-order models usually lump the thermal runaway reactions into a small number of equivalent source terms, often represented by Arrhenius-type kinetics or by experimentally calibrated heat-release functions. In some cases, the heat released by internal short circuit is further added through a simplified event-gated or state-dependent term. By contrast, detailed 3D models usually distribute the heat source in space, but they still often simplify the underlying chemistry by prescribing equivalent runaway kinetics or a fitted heat-source profile once thermal runaway is triggered, rather than resolving every electrochemical process during normal charge and discharge. Therefore, the predicted temperature rise and propagation timing are strongly affected by how the onset temperature, total heat release, and release rate are defined.
Boundary conditions are equally important because they determine how much of the generated heat remains inside the cell or is transferred to neighboring cells and the surroundings. At the single-cell level, adiabatic boundary conditions are often used to identify the intrinsic heat-release behavior, which is why the heat-loss term can be set to zero in some calibration models. At the module or pack level, however, the boundaries are generally non-adiabatic and are represented through thermal contact resistances between cells and structural parts, convective heat exchange with air or coolant, radiative exchange with surrounding surfaces, and, when applicable, heat absorption by a medium such as phase change material. In CFD-based models, these assumptions may be further specified through inlet/outlet, velocity or pressure, and ambient-temperature settings. As a result, even for the same cell chemistry, different heat-source formulations or boundary-condition definitions may lead to different predictions of peak temperature, trigger time, and dominant propagation pathway.
650-cell pack. Jiang et al.81 established a TRNM based on the heat transfer characteristics of the battery pack, with discussions of the TR prevention effect of PCM, exhibiting high computational efficiency and accuracy in modeling and predicting TR propagation. He et al.82 built a highly accurate reduced-order model for the TRP by redistributing heat source terms and correcting TR trigger criteria. They also considered the heat dissipation through the liquid cooling plates.
When establishing the model, some of the studies started from the heat generation of a single cell,80,81 while some others started from considering the heat flow in each single node.82 For the former type, the governing equation of the physics in a single node i with mass Mi and specific heat Cp,i can be considered as:80
![]() | (15) |
The chemical heat release Qr of each node can be modeled through multiple lumped reactions following the Arrhenius law, similar to eqn (1)–(6) described in Section 2.1. In ref. 80, the electrical heat Qe was represented in an event-gated form that is activated when T > TTR, where TTR is the triggering temperature of TR:
![]() | (16) |
In ref. 81, this ISC electrical heat is additionally associated with the changing SOC to account for the venting effect:
![]() | (17) |
![]() | (18) |
After the heat release model of a single cell is established, the next step is to couple all the nodes into a network model. In the thermal network construction, each single cell becomes a node. In some models, heat leaves a node via 6 paths, from the 6 sides of a prismatic battery cell to its neighbors or to the ambient; other models categorize nodes by location and write an energy balance for each type.80,81 Along a path, the total resistance is the sum of the conduction, contact, and external heat-loss resistances, which are represented using Rz (Table 9, Feng et al.80), or of the conductive, contact, convective, and radiative resistances (Table 3, Jiang et al.81).
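The series summation of resistances along one heat path can be sketched as follows. The conduction and convection expressions are the standard textbook forms (thickness divided by conductivity times area, and one over heat transfer coefficient times area). The function names and numerical values are illustrative assumptions and are not taken from the tables of ref. 80 or 81.

```python
def conduction_resistance(thickness_m, conductivity_w_mk, area_m2):
    """Plane-wall conduction resistance, K/W."""
    return thickness_m / (conductivity_w_mk * area_m2)


def convection_resistance(h_w_m2k, area_m2):
    """Surface convection resistance, K/W."""
    return 1.0 / (h_w_m2k * area_m2)


def series_resistance(*resistances):
    """Total resistance of elements connected in series along one path, K/W."""
    return sum(resistances)


def heat_flow(t_hot, t_cold, r_total):
    """Heat flow (W) driven by a temperature difference across r_total."""
    return (t_hot - t_cold) / r_total


if __name__ == "__main__":
    # Placeholder path: 2 mm barrier layer, a contact resistance, then convection to ambient
    r_cond = conduction_resistance(0.002, 0.2, 0.01)
    r_contact = 0.5
    r_conv = convection_resistance(10.0, 0.01)
    r_path = series_resistance(r_cond, r_contact, r_conv)
    print(r_path, heat_flow(600.0, 25.0, r_path))
```

Because the resistances add in series, inserting a low-conductivity barrier raises the total path resistance and proportionally lowers the heat flow for a given temperature difference.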
A threshold-based triggering criterion could be adopted to determine whether the neighboring cells go into TR. Because side heating raises the edge temperature first, the TR event is triggered when a front-edge temperature estimate for the i-th battery, which can be calculated through eqn (19), reaches TTR:80
![]() | (19) |
Here, the subscript i in eqn (19) denotes the i-th battery in the pack, while the superscripts f and b denote the front and back cell of that battery, respectively. Therefore, Tfi and Tbi represent the temperatures of the front and back cell in the i-th battery. This notation follows the two-cell representation of one battery adopted in Feng et al.'s study.80 Note that this index definition is different from that in eqn (20), where the subscript i denotes the node index in the thermal network. To avoid confusion, eqn (19) describes the cell temperatures within the i-th battery, whereas eqn (20) describes the energy balance of node i in the reduced-order model.
In contrast, ref. 82 considered the overall heat flow of a cell from the beginning. They used the heat balance equation:
![]() | (20) |
The summation term in eqn (20) represents the sum of all heat flows into and out of node i, in which the heat flow from node j to node i is defined in terms of the thermal resistance Rji between nodes j and i.
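A node-wise heat balance combined with a threshold trigger can be sketched for a one-dimensional chain of cells. This is a simplified illustration, not the model of ref. 80–82: it assumes that the heat flow between neighboring nodes is the temperature difference divided by a constant resistance, that each triggered cell releases a fixed energy at constant power, and that every cell also loses heat to the ambient through a fixed resistance. All parameter values and the function name `simulate_chain` are hypothetical.

```python
def simulate_chain(n_cells=5, r_cell=0.5, r_amb=5.0, heat_capacity=880.0,
                   t_trigger=150.0, tr_energy=6.0e5, tr_duration=20.0,
                   t_amb=25.0, dt=0.5, t_end=3000.0):
    """Return the TR trigger time (s) of each cell, or None if it never triggers.

    r_cell: resistance between neighboring cells, K/W
    r_amb: resistance from each cell to ambient, K/W
    heat_capacity: mass times specific heat of one cell, J/K
    """
    temps = [t_amb] * n_cells
    trigger_time = [None] * n_cells
    trigger_time[0] = 0.0  # cell 0 is the initiating (abused) cell
    for step in range(int(round(t_end / dt))):
        now = step * dt
        new_temps = []
        for i in range(n_cells):
            # Net heat flow into node i: sum of (T_j - T_i) / R_ji over its paths
            q_net = (t_amb - temps[i]) / r_amb
            if i > 0:
                q_net += (temps[i - 1] - temps[i]) / r_cell
            if i < n_cells - 1:
                q_net += (temps[i + 1] - temps[i]) / r_cell
            # Event-gated TR source: active for tr_duration after triggering
            q_tr = 0.0
            if trigger_time[i] is not None and now - trigger_time[i] < tr_duration:
                q_tr = tr_energy / tr_duration
            new_temps.append(temps[i] + dt * (q_net + q_tr) / heat_capacity)
        temps = new_temps
        # Threshold criterion: a cell enters TR once it reaches t_trigger
        for i in range(n_cells):
            if trigger_time[i] is None and temps[i] >= t_trigger:
                trigger_time[i] = now + dt
    return trigger_time


if __name__ == "__main__":
    print("low inter-cell resistance: ", simulate_chain(r_cell=0.5))
    print("high inter-cell resistance:", simulate_chain(r_cell=50.0))
```

In this toy setting, raising the inter-cell resistance (for example, by a thermal barrier) redirects most of the released heat to the ambient path, so the neighboring cells stay below the trigger threshold.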
Apart from rapid modeling for the TRP inside a pack with a specific design, another significance of the TRNM methods is that they reveal several design insights for propagation prevention or mitigation, including: (1) raising the separator collapse temperature TTR; (2) reducing electrical heat release ΔHe; (3) increasing external heat dissipation hdis; (4) adding thermal barriers between cells.80 Also, coolants such as PCM modules were demonstrated to delay or block TRP by absorbing latent heat.81
TRNMs couple the cells through thermal resistances during TRP, which facilitates the consideration of thermal barriers among the battery cells. From the modeling process and the coupling inside a thermal network, it can be deduced that TRNMs are most effective for prismatic battery cells: their regular box-like shape makes it straightforward to define clear conduction paths between adjacent cells and between cells and accessories (such as cooling plates). Compared with cylindrical cells, which are radially symmetric, and pouch cells, whose thickness varies, prismatic cells lend themselves more readily to a resistance network structure. Therefore, TRNMs are not equally easy to build for all TRP modeling cases.
Other types of reduced-order models have also been developed in existing studies. For example, although many models were built for prismatic batteries because of their regular shape, Zeng et al.83 developed a model for 18650 battery cells, considering the specific cylindrical geometry of a single battery cell and of the battery pack. They defined the TRP energy balance equation by splitting the solid and gaseous parts, considering the internal heat generation of a battery cell, heat transfer to other battery cells, heat loss, and vented gas. Coupling the heat transfer between two cells is challenging for cylindrical packs, because the geometry and stacking arrangement, including the stacking angles, must be taken into account. Li et al.84 instead derived a reduced-order model from finite element (FE) analysis. Their model was based on the Arnoldi method in the Krylov subspace, saving computational resources while still obtaining the temperature at specified points.
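To illustrate the idea behind Krylov-subspace reduction, the sketch below applies the Arnoldi iteration to a toy linear heat-conduction system. It is a generic textbook version, not the formulation of ref. 84. The 20-node conduction chain, the helper names (`conduction_chain`, `arnoldi`), and the chosen subspace size are assumptions made only for this example.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conduction_chain(n, k=1.0):
    """Toy system matrix A of dT/dt = A T + b u for n nodes in a 1D chain
    with insulated ends (k is a normalized conductance per unit capacity)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                A[i][j] = k
                A[i][i] -= k
    return A


def arnoldi(A, b, m):
    """Return an orthonormal basis V of span{b, Ab, ..., A^(m-1) b}
    and the projected m x m operator H = V^T A V."""
    norm_b = math.sqrt(dot(b, b))
    V = [[x / norm_b for x in b]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        h_next = math.sqrt(dot(w, w))
        if j + 1 == m or h_next < 1e-12:  # basis complete or subspace exhausted
            break
        H[j + 1][j] = h_next
        V.append([wk / h_next for wk in w])
    return V, H


# Reduce a 20-node chain heated at node 0 to a 6-state model.
A_full = conduction_chain(20)
b_full = [1.0] + [0.0] * 19
V, H = arnoldi(A_full, b_full, 6)
```

The reduced state z evolves as dz/dt = H z plus the projected input, and temperatures at selected nodes are recovered from the basis vectors in `V`, so only a 6 x 6 system is integrated instead of the full one.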
Despite their merits in rapid prediction and computational resource savings, there are, of course, some strong assumptions underlying the establishment of such reduced-order TRP models. For example, many studies assumed a spatially uniform temperature within each node, which means that they lack spatial details and do not capture localized temperature gradients or the complex geometry effects within a module.77 For instance, the battery cell surface can be unevenly heated during propagation, while TRNMs are not able to capture this property. Moreover, defining effective thermal resistance values for irregular pack layouts or cooling components can be challenging.
Table 1 lists the reduced-order TRP models discussed above. Many of them were built for prismatic packs because of their regular shapes, since pack geometry and cell shape strongly influence how the model is constructed. Models better suited to other battery types, such as cylindrical and pouch cells, are therefore needed to broaden the applicability of reduced-order modeling.
| Ref. | Governing equations | Inter-cell coupling | Battery type |
|---|---|---|---|
| Feng et al.80 | ![]() | Thermal resistance network | Prismatic packs |
| Jiang et al.81 | ![]() | Thermal resistance network | Prismatic packs |
| He et al.82 | ![]() | Thermal resistance network | Prismatic packs |
| Zeng et al.83 | ![]() | Direct heat transfer | Cylindrical packs |
| Li et al.84 | ![]() | Interpolation in FE analysis | Prismatic packs |
Fig. 4 compares three representative reduced-order modeling routes for TRP. The thermal resistance network model in Fig. 4(a) idealizes the battery pack as a set of lumped thermal nodes connected by equivalent resistances, enabling rapid evaluation of inter-cell heat transfer and pack-level propagation trends.80 The direct heat-transfer model in Fig. 4(b) retains a more explicit description of the dominant heat-transfer pathways between neighboring cells, such as radiation, effective conduction/convection, and ejecta-related heating, thereby providing a more physics-informed representation of propagation while still keeping the computational cost relatively low.83 Different from these purely lumped formulations, the reduced-order framework in Fig. 4(c) starts from a higher-fidelity FEM heat-transfer problem and then applies state-space reduction to preserve part of the spatial temperature information with substantially improved efficiency.84 Together, these methods illustrate the main trade-off of reduced-order TRP modeling, namely, simplicity and speed versus physical and spatial fidelity.
Fig. 4 Frameworks of different types of existing reduced-order models from different studies: (a) thermal resistance network model (reproduced from ref. 80 with permission from Elsevier, copyright 2026), (b) direct heat transfer model (reproduced from ref. 83 with permission from Elsevier, copyright 2026), (c) reduced-order model used for interpolations in FEM (reproduced from ref. 84 with CC-BY license).
In short, reduced-order models trade away granular detail for simplicity and speed. They are best suited for high-level risk assessments and parametric studies under specific cases, but more detailed approaches are needed to investigate the fine-scale thermal behaviors during propagation and to develop a more universally applicable method.
From a practice-oriented perspective, reduced-order models are often sufficient when the main objective is to compare pack concepts, screen a large design space, or estimate pack-level indicators such as propagation time, peak temperature, and the sequence of neighboring-cell failure. In these cases, the designer is usually more concerned with the relative ranking of layouts, spacings, cooling strategies, or barrier configurations than with resolving every local temperature gradient or reaction pathway in detail. Because such models rely on a smaller set of effective parameters and can often be calibrated directly from module-level or pack-level tests, they are especially attractive for early-stage engineering design, optimization, and uncertainty quantification, where many repeated evaluations are required. In other words, when the decision of interest is at the pack level and the dominant need is speed, robustness, and trend identification rather than local mechanistic resolution, simpler lumped or reduced-order models are often the more appropriate choice.
Detailed 3D TRP models are typically built on the energy conservation equations applied to each cell and its surroundings, often using finite-element or finite-volume methods to resolve the geometry of the module.46 In 3D propagation simulations, each cell that undergoes runaway is represented as a distributed heat source, while heat conduction, convection, and radiation are modeled through the module structure. Although 3D models resolve spatial temperature gradients, they still require simplified boundary conditions at the pack boundaries and interfaces. In practice, these are commonly prescribed as convection and radiation to the ambient on exposed surfaces, thermal-contact conditions between cells and structural parts, and inlet/outlet conditions for coolant or vent-gas domains when fluid flow is included. Therefore, the higher fidelity of 3D models mainly comes from resolving the geometry and local temperature field, rather than from eliminating modeling assumptions altogether.
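For orientation, a generic form of the energy equation and the boundary conditions named above can be written as follows. This is the standard textbook statement rather than the formulation of any particular study. The symbols are assumptions introduced here: density ρ, specific heat c_p, thermal conductivity k, volumetric runaway heat source q̇_TR, convection coefficient h, emissivity ε, Stefan–Boltzmann constant σ, ambient temperature T_∞, and area-specific contact resistance R''_c.

```latex
\begin{align}
  \rho c_p \frac{\partial T}{\partial t}
    &= \nabla \cdot \left( k \nabla T \right) + \dot{q}_{\mathrm{TR}}
    && \text{(energy conservation in each solid domain)} \\
  -k \frac{\partial T}{\partial n}
    &= h \left( T - T_\infty \right)
     + \varepsilon \sigma \left( T^4 - T_\infty^4 \right)
    && \text{(convection and radiation on exposed surfaces)} \\
  q''_{a \to b}
    &= \frac{T_a - T_b}{R''_c}
    && \text{(thermal contact between parts } a \text{ and } b\text{)}
\end{align}
```

The simplifications mentioned in the text enter mainly through h, ε, and R''_c, which are usually treated as effective or calibrated values rather than resolved quantities.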
Foundational studies date back to 2007, when Spotnitz et al.85 developed models to predict the abuse tolerance of Li-ion cells and packs. Around the same time, Kim et al.42 introduced a 3D thermal abuse model for individual lithium-ion cells, coupling electrical, thermal, and electrochemical aspects within a cell. More recently, Feng et al.46 proposed a 3D TR propagation model for a large-format cell module, simplifying the internal chemistry with empirical kinetics. Key findings of this model include strategies to suppress propagation, such as increasing the cell's TR trigger temperature, improving heat dissipation, and adding thermal insulation between cells. Larsson et al.86 modeled TRP and demonstrated that physical barriers could markedly slow or prevent propagation, showing that pack geometry and thermal barriers are critical factors. Similarly, Chen et al.87 modeled the influence of a flame-retardant polypropylene thermal barrier layer on TRP behavior based on their experiments. Their simulation showed that the thermal barrier significantly suppressed heat transfer through radiation and flaming convection during the intensive ejection stages. Coman et al.41 conducted experiments and developed a corresponding simulation in COMSOL Multiphysics v5.2 for TRP triggered by a self-designed internal heating device. That study confirmed the importance of heat conduction in the propagation process and established a controlled way of initiating TRP tests. TRP experiments are often used to inform and calibrate simulation models. Xiong et al.88 established a TRP simulation model based on their experiment, from which they obtained data on temperature, voltage, gas, and pressure changes. They also offered several pack design suggestions based on their experiment and model, for example, strong structural integrity and proper ventilation.
As models grew more sophisticated, researchers examined the dominant heat-transfer pathways during propagation. Lamb et al.89 experimentally observed propagation in 10-cell cylindrical and pouch modules and found that heat conduction through the module connecting posts was the primary propagation pathway. This finding was further reflected in later modeling studies. Tang et al.90 developed a 3D CFD-based thermal runaway propagation model for a cylindrical 18650 module, in which the cell heat source was represented using a thermal-abuse reaction framework together with an electrically connected cell model. It should be noted that, although ref. 90 referred to this framework as an electrochemical–thermal coupling model, the heat-generation submodel mainly describes exothermic side reactions during thermal runaway, such as SEI decomposition, electrode–electrolyte reactions, binder decomposition, and electrolyte decomposition, rather than the electrochemical reactions governing normal charging and discharging. Using this framework, they quantified the relative contributions of different heat-transfer modes and showed that, for cells directly adjacent to the failing cell, conductive heat transfer through solid contacts dominates, whereas for cells farther from the failing cell, thermal radiation becomes the main heat-exchange mode. This indicates that pack layout and connection design can significantly alter propagation behavior. Jin et al.91 built a TRP simulation model calibrated against propagation tests. They analyzed the heat flow across several heat transfer interfaces and attributed propagation to the accumulation of heat energy, pointing out that the key to delaying or inhibiting TRP is to reduce the heat flux from the triggered battery to the batteries that have not yet entered TR.
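The exothermic side reactions listed above are commonly represented by Arrhenius-type rate laws in thermal-abuse models. The sketch below shows this for SEI decomposition only, as a first-order reaction. The parameter values are placeholders of the order used in the thermal-abuse literature and are assumptions for illustration, not values taken from ref. 90. The function names are likewise hypothetical.

```python
import math

R_GAS = 8.314      # universal gas constant, J/(mol K)

# Placeholder kinetic parameters for SEI decomposition (illustrative only).
A_SEI = 1.667e15   # frequency factor, 1/s
EA_SEI = 1.3508e5  # activation energy, J/mol
H_SEI = 2.57e5     # reaction heat per unit mass of carbon, J/kg
W_C = 610.4        # carbon content per unit volume, kg/m^3


def sei_rate(c_sei, T):
    """dc_sei/dt in 1/s for first-order Arrhenius decomposition at T (K).

    c_sei is the dimensionless amount of metastable SEI still available.
    """
    return -A_SEI * c_sei * math.exp(-EA_SEI / (R_GAS * T))


def sei_heat(c_sei, T):
    """Volumetric heat generation in W/m^3 from SEI decomposition."""
    return H_SEI * W_C * (-sei_rate(c_sei, T))


# Heat generation is negligible near room temperature and rises steeply
# with temperature, which is what makes the process self-accelerating.
q_room = sei_heat(0.15, 300.0)
q_hot = sei_heat(0.15, 420.0)
```

In a full thermal-abuse model, analogous terms for the other side reactions would be summed and used as the volumetric source in the energy equation.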
Many propagation simulations assume that a cell is already in TR and omit normal battery operation in order to focus on the propagation process itself, whereas some multiphysics models connect propagation with the abuse conditions that trigger it. For the TR of a single cell, Domalanta et al.92 presented one such model for a lithium polymer cell. They included multiple reaction steps within COMSOL Multiphysics and validated the model against experiments, revealing that the cathode–electrolyte reaction contributed the highest heat generation. For propagation in a whole battery pack, Qi et al.93 explored a TRP model under overcharge and analyzed the influences of current, convection coefficient, and gap between batteries. They also found that when the gap exceeded a certain value, the clamp between batteries became a vital heat conduction path for TRP. Such models simulate a battery's normal behavior, failure triggering, and propagation within one framework. Lyu et al.94 investigated the TR and TRP behaviors of a pouch cell pack related to tab overheating. They used an electro-thermal model to calculate the heat generation during the normal charge–discharge stage, coupled with a lumped TR and propagation model. They found that several factors, such as charging C-rate, insulator, and tab connection model, influence the TRP behaviors. Wang et al.95 discussed the TRP of fast-charging NMC batteries. Their results showed that increasing the triggering temperature and battery spacing and decreasing the charging C-rate could effectively reduce the TRP risk.
Some researchers have also started to incorporate the vent gas and flame propagation phenomena that occur when a battery cell vents hot gases or ignites, for more realistic modeling. For the TR of a single battery cell, Kong et al.96 developed a coupled conjugate heat transfer and CFD model to capture the venting process and jet-fire behavior of an 18650 cell undergoing TR. They concluded that the gas ejecta might cause an abrupt temperature rise of the air above the cell when the venting valve opens, and that a higher SOC tended to give a higher maximum flame temperature and a shorter onset time, indicating how the vent gas and flame might affect propagation. On the pack side, Yao et al.97 explored TR, gas release, and propagation under different SOC levels of cylindrical 18650 packs through testing and a multiphysics coupled model. They showed how the gas ejecta and flames heat the surrounding areas and ignite the nearby cells, providing a detailed picture of flame-driven propagation. Mishra et al.98 built a TRP simulation model accounting for multiple coupled non-linear phenomena, including vent gas flow and combustion. Their results demonstrated the critical importance of these factors in heat transfer during TRP. The 3D TRP simulation model developed by Takagishi et al.,99 containing 98 battery cells heated by a burner, evaluated multiple steps during TRP including gas ejection. They discovered that the ejected gas dominated the velocity field in their battery pack chamber, affecting the temperature distribution and the TRP processes. Weber et al.100 proposed a simulation framework incorporating vented gas and the related thermal transport changes produced by pouch cells under TR. The simulation was validated by two battery cell propagation experiments in an autoclave. Their results showed that propagation became faster as the pressure increased.
Parametric analysis is one of the most important tasks in such modeling studies. By investigating the influence of each model parameter on the modeling results, researchers can make pack design suggestions aimed at mitigating TRP. In the model built by Wang et al.,101 several thermophysical parameters were investigated during TRP, such as self-heating and triggering temperature and mass loss. Their results indicated that some of these parameters played a crucial role in TRP, and that manipulating them might effectively delay it. Xia et al.102 conducted a safety risk assessment study for TR and TRP with a multiphysics model. They used a probabilistic model to calculate the safety risk of TR and TRP under different parameters, such as ISC triggering temperature, internal resistance, and cell-to-cell inconsistency within the pack. Zhang et al.103 explored the influence of parameters related to the dynamic heat conductivity and heat convection coefficient on the TRP characteristics and prevention effectiveness. Their parametric study identified the boundary, in terms of these two parameters, for TRP inhibition.
Table 2 summarizes the features of each type of 3D high-fidelity simulation, including high-fidelity 3D models, thermal-fluid models, and electrochemical multiphysics models. Because these modeling approaches require only computational resources to run and can resolve the details of the TRP process that researchers want to capture, a common practice is to calibrate or validate the simulation model against small-scale TRP experiments. In this way, the high cost of large-scale destructive battery tests is avoided while modeling accuracy is maintained.
| Modeling approach | Coupled physics | Fidelity | Primary purposes |
|---|---|---|---|
| High-fidelity 3D thermal models (FEM) | Full 3D conduction in modules with detailed geometry including tabs or cooling plates, boundary radiation included. | Heat transfer throughout the battery pack. | • Resolve heat pathways, hot spots, and timing of cell-to-cell propagation. • Validate reduced-order models. • Evaluate spacing, pack configurations, and cooling. • Identify critical components for sensing or barriers. |
| Thermal–fluid (thermal + CFD) | Solid conduction with CFD for vent gas flow, also convective heat transfer. Possible combustion, radiation, or existing turbulence. | Fluid and gas flow for simulating the ejecta, venting gas, and flame. | • Study cascade failure involving gas flow, flame spread, pressure rise. • Optimize vent routing and exhaust/deflectors. • Support vent port placement and suppressant strategies. |
| Electrochemical multiphysics (electro–thermal–mechanical) | Reaction kinetics and structural deformation, including short-circuit, electrical coupling, and structural rupture. | Integrates electrochemistry and mechanical deformation, with inter-cell reactions and mechanical stresses and short-circuit. | • Trigger analysis under abuse crash. • Predict when deformation leads to TR and propagation. • Inform casing strength, restraint, electrical disconnects. |
Fig. 5 summarizes representative frameworks of 3D detailed and multiphysics models for TRP. The high-fidelity thermal models in Fig. 5(a1) explicitly resolve the three-dimensional geometry of the battery module and directly calculate conductive, convective, and radiative heat transfer, thereby capturing local hot spots and the spatiotemporal evolution of cell-to-cell propagation.46 Building upon this basis, the coupled model with thermal barriers in Fig. 5(a2) further incorporates protective components such as inter-cell barriers and enclosure structures, making it possible to evaluate how material design and pack architecture modify heat-transfer pathways and suppress propagation.87 In Fig. 5(b), the thermal–fluid model extends the formulation by accounting for vent gas flow, jetting, and flame evolution, which is particularly important when ejecta-induced convection and combustion impose substantial thermal loads on neighboring cells.96 For Fig. 5(c) it should be noted that the model in Xia et al.'s study102 is not a full physics-based electrochemical model at the electrode/material level; rather, it is an equivalent-circuit-based electro-thermal coupling framework further combined with thermal-abuse and fluid-dynamics submodels. Such a treatment enables the analysis of electrical inconsistency, ISC-related heating, and pack-level heat transfer during TR and TRP, while avoiding the substantially higher complexity and computational cost of fully physics-based electrochemical models, which are still rarely adopted in TRP modeling.102
Fig. 5 Frameworks of different 3D detailed modeling studies: (a) high-fidelity thermal model: (a1) 3D thermal model (reproduced from ref. 46 with permission from Elsevier, copyright 2026), (a2) model coupled with other systems such as thermal barriers (reproduced from ref. 87 with permission from Elsevier, copyright 2026); (b) thermal-fluid model (developed for the vent gas and flame) (reproduced from ref. 96 with permission from Elsevier, copyright 2026); (c) equivalent-circuit-based electro-thermal model (reproduced from ref. 102 with permission from Elsevier, copyright 2026).
Crucially, 3D simulations can evaluate how design modifications impact propagation in ways that lumped models cannot. The insights, derived from the 3D modeling and confirmed by experiments in some cases, directly inform safer pack design by highlighting which engineering changes most strongly influence propagation outcomes. Furthermore, unlike the nodal models, full 3D simulations capture the uneven heating and multidirectional heat transfer in a module, which is essential for assessing protective components (walls, vents, cooling plates, etc.) on a local scale.
Nevertheless, the added fidelity of detailed electrochemical–thermal or multiphysics models is most justified when local mechanisms directly determine the engineering conclusion. Typical examples include resolving severe spatial non-uniformity, analyzing abuse-trigger pathways such as overcharge, internal short circuit, or mechanical deformation, evaluating vent-gas/flame impingement and local heat-transfer routes, or assessing the effectiveness of tabs, cooling plates, enclosures, vents, and thermal barriers at the component level. In such scenarios, simplified lumped descriptions may miss the dominant failure mechanism. At the same time, it should be recognized that these models are notoriously difficult to calibrate, since they may involve many coupled material, kinetic, and transport parameters, often on the order of several tens, whose identifiability is limited by the availability of abuse-test data. Therefore, for practical TRP studies, high-fidelity electrochemical–thermal models are most valuable when the additional local physics is expected to change the design conclusion, whereas equivalent-circuit-based or simplified thermal-abuse models are often more pragmatic for routine pack-level analyses.
Traditional machine learning (ML) methods, such as decision trees, k-nearest neighbors (kNN), support vector machines (SVMs), and random forests (RF), have been widely applied as surrogate models to capture battery TRP patterns for risk classification and fault localization. Jia et al.104 established a multiphysics electrochemo-mechanical model to investigate TR in a single pouch cell triggered by internal short circuits. Approximately 300 000 simulated electrical and thermal data points across varying SOCs and C-rates were used to train decision tree and SVM classifiers, achieving rapid four-level risk identification within minutes and F1 scores greater than 0.93. Daniels et al.105 developed an RF framework for TR localization in cylindrical battery packs. Validated CFD simulations generated temperature fields and guided optimal thermocouple placement by identifying high-sensitivity regions. The model successfully learned nonlinear mappings between sensor readings and fault location, offering an accurate and efficient surrogate for early TR detection. To further strengthen robustness, the same group advanced this work by introducing a stacked ensemble model that combined the complementary strengths of kNN, RF, and extreme gradient boosting (XGBoost).106 Extending beyond purely data-driven methods, Chen et al.107 integrated ML with physics-based modeling within a hybrid TR warning framework: unsupervised K-means clustering with a dynamic time warping distance captured abnormal data patterns, whilst the Bernardi thermal model estimated internal heating dynamics. An SVM-based fusion strategy combined both outputs, achieving 25 minutes of advance warning with fewer false alarms, thereby improving early TR detection in terms of both reliability and lead time.
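The dynamic time warping (DTW) distance mentioned above can be sketched in a few lines. The version below is the classic dynamic-programming recurrence with an absolute-difference cost, and it is not the specific implementation of ref. 107. The clustering step is not reproduced, and the short voltage-like sequences are invented purely for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b repeats
                                 D[i][j - 1],      # b advances, a repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


# Hypothetical cell-voltage traces (V): a reference, a time-shifted copy,
# and a trace with an abnormal voltage drop.
reference = [4.0, 4.0, 3.9, 3.8, 3.7]
delayed = [4.0, 4.0, 4.0, 3.9, 3.8, 3.7]
abnormal = [4.0, 3.5, 3.0, 2.5, 2.0]

d_delayed = dtw_distance(reference, delayed)
d_abnormal = dtw_distance(reference, abnormal)
```

Because DTW tolerates time shifts, the delayed trace stays close to the reference while the abnormal trace does not, which is the property that makes this distance useful for clustering operating data of different lengths or phases.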
Design optimization is another critical aspect of battery TRP research, where repeated performance evaluations are required to search for optimal solutions. To alleviate the prohibitive cost of high-fidelity simulations, surrogate models are trained on simulation or experimental datasets to approximate global system responses, facilitating rapid inference and efficient multi-objective optimization. Li et al.108 built a response surface surrogate model trained on an electrochemical–thermal TRP simulation dataset. This framework, coupled with a multi-objective particle swarm optimization algorithm, improved hybrid battery thermal management system (BTMS) performance by reducing the maximum temperature by 4.37% and pump energy consumption by 75%. However, response surface models are restricted by their polynomial form and struggle to capture complex nonlinearities in high-dimensional design spaces. To overcome this limitation, an adaptive Kriging surrogate with truncated high-dimensional model expansion was proposed, retaining only first- and second-order interactions to efficiently capture nonlinear effects with reduced complexity.109,110 The Kriging surrogate reduced each evaluation from several hours of CFD to only seconds, expediting design optimization of the CPCM configuration and cooling/insulation layouts for TRP suppression in prismatic and cylindrical modules.
Beyond accelerating design optimization, surrogate models are also valuable for uncertainty quantification (UQ) in TRP thermal modeling. In practice, TRP predictions are sensitive not only to nominal model inputs, but also to their uncertainty. Typical uncertain parameters include thermal conductivity, heat transfer coefficients, thermal contact resistance, triggering temperature, SOC, PCM thermophysical properties, inter-cell spacing, and boundary-condition-related quantities. Variations in these inputs may arise from manufacturing inconsistency, cell-to-cell variability, operational fluctuations, and modeling assumptions. Once propagated through a thermal model, such uncertainties can affect not only the predicted peak temperature and TR onset time, but also the propagation interval, the sequence of neighboring-cell failure, and the final extent of propagation. Therefore, surrogate models that can efficiently emulate high-fidelity thermal simulations are particularly suitable for repeated stochastic sampling, uncertainty propagation, and global sensitivity analysis.
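As a minimal sketch of uncertainty propagation through a thermal model, the snippet below samples uncertain inputs and estimates the probability that a neighboring cell reaches its trigger temperature. The "model" is a deliberately simple two-node lumped network with a closed-form peak temperature rise, standing in for a surrogate of a high-fidelity simulation. All distributions, parameter values, and function names are assumptions made for illustration.

```python
import math
import random


def neighbor_peak_rise(dT0, R_cell, R_amb):
    """Peak temperature rise (K) of a neighbor cell in a two-node network.

    Two identical lumped cells are each linked to ambient via R_amb and to
    each other via R_cell. Cell 0 starts dT0 above ambient and the neighbor
    starts at ambient. The neighbor's rise is dT0 * exp(-a t) * sinh(b t),
    whose maximum depends only on r = b / a = R_amb / (R_amb + R_cell).
    """
    r = R_amb / (R_amb + R_cell)
    x = math.atanh(r)                 # b * t at the peak
    return dT0 * math.exp(-x / r) * math.sinh(x)


def propagation_probability(mean_R_cell, n_samples=2000, seed=0):
    """Monte Carlo estimate of P(neighbor peak rise >= trigger rise)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Assumed input uncertainties (illustrative only).
        R_cell = mean_R_cell * rng.lognormvariate(0.0, 0.2)  # contact/barrier, K/W
        dT0 = rng.gauss(600.0, 60.0)          # rise of the failed cell, K
        trigger_rise = rng.gauss(130.0, 15.0)  # rise needed to trigger TR, K
        if neighbor_peak_rise(dT0, R_cell, R_amb=5.0) >= trigger_rise:
            hits += 1
    return hits / n_samples


p_tight = propagation_probability(mean_R_cell=0.5)    # cells in close contact
p_barrier = propagation_probability(mean_R_cell=20.0)  # strong thermal barrier
```

The same loop applies unchanged when `neighbor_peak_rise` is replaced by a trained surrogate, which is where the computational saving over direct high-fidelity sampling arises.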
Gaussian-process-based surrogates are especially attractive in this context because they provide not only fast predictions, but also probabilistic estimates with quantified variance.111,112 Existing studies have shown that combining such surrogates with sensitivity analysis can identify which uncertain inputs dominate different TR metrics. For example, Yeardley et al.113 employed a Gaussian-process model to compute time-dependent Sobol indices for TR abuse and showed that emissivity and thermal conductivity can strongly influence the onset and peak-temperature responses. Zhang et al.114 further integrated adaptive Kriging into TRP uncertainty quantification by propagating stochastic variations in SOC, PCM thermal conductivity, and inter-cell spacing to propagation outcomes. Their results indicated that the uncertainty of TRP becomes most pronounced at intermediate SOC, especially around 40%–50%, and that once propagation proceeds from the first cell to the second one, the probability of subsequent cascading failure increases sharply. These findings show that UQ is useful not only for estimating uncertainty bands in temperature prediction, but also for quantifying escalation risk under uncertain conditions.
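For readers unfamiliar with Sobol indices, the sketch below estimates first-order indices by plain Monte Carlo sampling with a pick-and-freeze estimator. It does not reproduce the Gaussian-process-based, time-dependent analysis of ref. 113. The toy response, in which the second input is twice as influential as the first, and the function name `first_order_sobol` are assumptions for illustration.

```python
import random


def first_order_sobol(f, n_inputs, n_samples=40000, seed=0):
    """First-order Sobol indices of f for independent inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    yA = [f(x) for x in A]
    yB = [f(x) for x in B]
    y_all = yA + yB
    mean = sum(y_all) / len(y_all)
    var = sum((y - mean) ** 2 for y in y_all) / (len(y_all) - 1)
    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for a, b, ya, yb in zip(A, B, yA, yB):
            x = list(a)
            x[i] = b[i]                 # sample A with input i taken from B
            acc += (yb - mean) * (f(x) - ya)
        indices.append(acc / n_samples / var)
    return indices


def toy_response(x):
    """Toy stand-in for a TR metric: x[0] and x[1] are normalized inputs."""
    return x[0] + 2.0 * x[1]


S = first_order_sobol(toy_response, n_inputs=2)
```

For this linear toy response the exact indices are 0.2 and 0.8, so the estimate mainly serves as a check of the estimator before it is applied to a surrogate of a real thermal model.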
More broadly, uncertainty-aware data-driven surrogates further extend this idea to online prognosis. Ouyang et al.115 developed a predictor that combines fuzzy inference with a multi-task CNN-LSTM architecture to account for variations in SOC, charging/discharging conditions, and trigger locations. Such efforts suggest that future surrogate-assisted TRP thermal modeling should move beyond a single deterministic prediction and instead provide robust prediction intervals, probabilistic risk measures, and safety margins under cell-to-cell variability and uncertain boundary conditions. Accordingly, surrogate-assisted UQ can serve as an important complement to nominal TRP prediction, especially for robust pack design, probabilistic safety assessment, and decision-making under uncertainty.
Although traditional ML methods and Bayesian modeling have shown effectiveness in TRP surrogate modeling, they often struggle with highly nonlinear behaviors and complex spatial-temporal feature interactions within battery systems. Neural networks (NNs), thanks to their flexible architectures and representation learning ability, can exploit large simulation and experimental datasets to approximate intricate input–output mappings with higher accuracy and generalization. As the most fundamental architecture, fully-connected neural networks (FCNNs) stack dense layers to capture nonlinear dependencies and serve as a basis for more advanced designs. Yan et al.116 employed a lightweight FCNN trained on simulation data to forecast cell temperature trajectories and TRP intervals under various SOC, ambient temperature, and heater power conditions, enabling rapid what-if evaluation and safety screening without repeated tests. Zhu et al.117 proposed a hybrid multiphysics framework for module-level temperature distribution monitoring to prevent TR in cylindrical batteries. Because a low-fidelity lumped battery model cannot resolve the non-uniformity of the temperature field, an FCNN was introduced to compensate for spatial errors and was then integrated with an unscented Kalman filter for real-time temperature state estimation. To predict long-term TRP behavior within battery packs, Lekoane et al.118 applied a layer-recurrent NN to learn the complex mapping between voltage, current, SOC, and feedback temperature from COMSOL-generated simulation data. By incorporating temporal recurrence, the model captured dynamic dependencies better than FCNN and Elman NN, yielding lower prediction errors, particularly beyond 1000 s.
Battery system layout, heat transfer parameters, and temperature fields exhibit strong spatial correlations that FCNNs struggle to capture. Convolutional neural networks (CNNs),119,120 through local receptive fields and weight sharing, can efficiently extract spatial features from temperature maps of FE/CFD outputs, thereby facilitating accurate representation of TRP patterns. Treating the temperature distribution within battery packs as images, Goswami et al.120 trained CNNs on heat maps generated from a coupled electrochemical–thermal–degradation model to classify safe, critical, and TR states. A YOLO-based CNN module further localized hotspots, enabling simultaneous TR stage identification and initiation-site detection for early warning. Wang et al.121 not only employed CNNs to learn the spatial temperature distribution in battery packs, but also applied FCNNs to predict TR onset time based on CFD simulations. The hybrid framework kept the TR temperature prediction error below 10 °C and the onset time prediction within 2 s. Apart from CNN models, graph neural networks (GNNs) also provide a flexible way to model spatial correlations by directly representing battery cells or sensors as nodes and their thermal couplings as edges.122 The graph-based formulation allows efficient aggregation of inter-cell interactions and accommodates irregular sensor layouts, thereby supporting accurate hotspot forecasting and TRP risk assessment beyond the grid-based representations required by CNNs.
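To make the graph-based formulation concrete, the sketch below performs one message-passing step on a small cell graph, with cells as nodes and thermal couplings as weighted edges. The weights are fixed rather than trained, the node feature is a single normalized temperature, and the function name `message_passing_step` is hypothetical. It illustrates only the aggregation idea, not the architecture of ref. 122.

```python
def message_passing_step(features, edges, w_self, w_nbr):
    """One GNN-style update with coupling-weighted mean aggregation.

    features: one scalar feature per node (e.g. normalized temperature)
    edges:    list of (i, j, g) undirected edges with coupling weight g
    Returns   relu(w_self * h_i + w_nbr * weighted_mean_of_neighbors(h)).
    """
    n = len(features)
    agg = [0.0] * n
    wsum = [0.0] * n
    for i, j, g in edges:
        agg[i] += g * features[j]
        wsum[i] += g
        agg[j] += g * features[i]
        wsum[j] += g
    out = []
    for i in range(n):
        nbr = agg[i] / wsum[i] if wsum[i] > 0.0 else 0.0
        out.append(max(0.0, w_self * features[i] + w_nbr * nbr))
    return out


# Three cells in a row; only cell 0 is hot initially.
cell_edges = [(0, 1, 1.0), (1, 2, 1.0)]
h0 = [1.0, 0.0, 0.0]
h1 = message_passing_step(h0, cell_edges, w_self=0.5, w_nbr=0.5)
h2 = message_passing_step(h1, cell_edges, w_self=0.5, w_nbr=0.5)
```

Each step spreads information by one hop, so cell 2 only "sees" the hot cell after the second step. Because the edge list can be arbitrary, the same update works for irregular layouts that a pixel grid cannot represent directly.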
It should be noted that both CNN and GNN models are limited in capturing long-term temporal dependencies. Since TR involves strongly time-dependent processes such as heat accumulation and propagation delay, sequence models including recurrent neural networks (RNNs),118 long short-term memory (LSTM),123,124 and gated recurrent units (GRU)125 have been introduced to explicitly model sequential dynamics. Zhang et al.123 proposed a multi-mode multi-task data-driven framework that integrated 3D CNN features from thermal images with LSTM features from operating data for TRP prediction. A vanilla LSTM was also introduced to predict battery temperature under normal conditions. By incorporating memory cells and gating mechanisms, LSTM effectively captures temporal dependencies across sequences, thereby improving the accuracy of long-horizon TRP forecasting. Treating time-dependent battery heat maps as sequences of images, Li et al.124 applied a CNN-LSTM framework that combined CNNs for spatial feature extraction with LSTMs for temporal modeling of CFD-generated thermal contours. With the inputs preprocessed into sequences of 8 consecutive thermal images, the model achieved 96.7% accuracy with high recall. To provide early warning of TR under mechanical abuse, Li et al.126 proposed a model-switching architecture that combines an LSTM for failure onset prediction with a 1D CNN-LSTM for post-failure temperature evolution. Trained on experimental and mechanical–electrical–thermal simulation datasets, the framework achieved less than 1 °C error in most tests for multi-level warnings. Battery systems inherently operate under uncertain conditions. To address the effects of variations in SOC, cycling modes, and abuse locations on TRP behavior, Ouyang et al.115 developed an uncertainty-aware predictor combining fuzzy logic with a multi-task CNN-LSTM surrogate. However, LSTM architectures often suffer from gradient decay and high parameterization in long sequences.
In contrast, GRU127 employs a simplified gating mechanism with fewer parameters, and attention mechanisms128 dynamically weight informative features, together enabling more efficient and robust long-horizon TRP prediction. Inspired by this intuition, Li et al.125 proposed a multimodal TRP predictor that integrated GRU-based multi-scale gated fusion with bidirectional cross-attention to couple 2D thermal images and 1D temperature data. This framework not only stabilized feature fusion under limited samples but also reduced prediction errors by over 25% compared with CNN-GRU baselines.
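The sequence models above consume fixed-length windows of consecutive frames. As a minimal sketch of this preprocessing step, the function below stacks a time series of thermal images into overlapping windows. The window length of 8 follows the value quoted above, whereas the stride of one frame, the array shapes, and the function name are assumptions made for illustration only and are not taken from the cited studies.

```python
import numpy as np

def make_windows(frames, window=8):
    """Stack consecutive frames into overlapping sequences.

    frames : array of shape (T, H, W), one thermal image per time step
    returns: array of shape (T - window + 1, window, H, W)
    """
    frames = np.asarray(frames)
    n_frames = frames.shape[0]
    if n_frames < window:
        raise ValueError("need at least `window` frames")
    # Assumed stride of one frame between successive windows
    return np.stack([frames[i:i + window] for i in range(n_frames - window + 1)])

# Ten synthetic 4 x 4 frames yield three windows of 8 frames each
frames = np.random.default_rng(0).normal(25.0, 1.0, size=(10, 4, 4))
windows = make_windows(frames)
```

Each window can then be passed frame by frame through a spatial encoder (CNN) and the resulting feature sequence through a recurrent layer (LSTM or GRU).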
Although purely data-driven models have shown promise in TRP prediction, their accuracy still relies heavily on large volumes of high-fidelity data, which are impractical to obtain in engineering scenarios due to limited time, budget, and computational resources. Additionally, the performance of surrogate models often deteriorates when extrapolating beyond training domains. Physics-informed machine learning (PIML)129–131 has emerged as a promising paradigm that addresses these limitations by embedding domain knowledge and partial differential equations (PDEs) into NN structures or the training process, thereby reducing the reliance on extensive datasets whilst enhancing physical consistency and generalization capability under unseen conditions.132 Kim et al.133 proposed a multiphysics-informed NN for single-cell TR prediction, embedding an energy-balance PDE and Arrhenius kinetics into two coupled FCNNs. Even with partially labeled temperature data, the network successfully predicted species concentration and long-term TR temperature evolution. Considering the fidelity gap induced by 2D heat transfer simplification, Jiang et al.134 proposed a multi-fidelity PI-CNN for battery pack heat map prediction under varying layouts. By coupling a PDE-constrained UNet backbone with a supervised lightweight projection head, the framework achieved high accuracy with a maximum error of only 0.02 °C. The two-stage training strategy reduced high-fidelity data requirements to one-fifth whilst maintaining the desired accuracy, thereby enabling efficient design-space exploration and facilitating subsequent optimization of battery layout.135 Jeong et al.136 embedded heat transfer and chemical degradation knowledge within the loss function of a physics-informed deep operator network (DeepONet), which significantly outperformed its data-only counterpart for TR temperature prediction beyond 30 minutes.
As a representative neural operator framework, PI-DeepONet not only embeds physical constraints to ensure consistency, but also learns mappings between function spaces, enabling extrapolation across diverse operating and boundary conditions.137 This provides better flexibility and generalization compared with PI-FCNN and PI-CNN architectures, which are typically limited to fixed operating conditions and domain discretizations.
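The idea of embedding physics in the training process can be illustrated with a composite loss that adds a governing-equation residual to the usual data misfit. The sketch below is not the formulation of any cited study. It assumes a lumped single-cell energy balance, m·cp·dT/dt = Q_gen − hA·(T − T_amb), with one Arrhenius heat source and no reactant depletion, and it evaluates the residual by finite differences on a predicted temperature trajectory rather than by automatic differentiation. All parameter names and values are hypothetical.

```python
import numpy as np

R_GAS = 8.314  # universal gas constant, J/(mol K)

def arrhenius_heat(temp_k, pre_factor, activation_energy, reaction_heat):
    """Heat release rate (W) of a single lumped Arrhenius reaction."""
    return reaction_heat * pre_factor * np.exp(-activation_energy / (R_GAS * temp_k))

def energy_residual(temp_k, time_s, p):
    """Residual of m*cp*dT/dt = Q_gen - hA*(T - T_amb) at every time sample."""
    dT_dt = np.gradient(temp_k, time_s)  # finite-difference time derivative
    q_gen = arrhenius_heat(temp_k, p["pre_factor"],
                           p["activation_energy"], p["reaction_heat"])
    q_loss = p["hA"] * (temp_k - p["T_amb"])
    return p["m_cp"] * dT_dt - q_gen + q_loss

def physics_informed_loss(temp_pred, time_s, temp_obs, obs_idx, p, weight=1.0):
    """Data misfit at the labeled samples plus weighted physics residual."""
    data_term = np.mean((temp_pred[obs_idx] - temp_obs) ** 2)
    physics_term = np.mean(energy_residual(temp_pred, time_s, p) ** 2)
    return data_term + weight * physics_term

# Hypothetical lumped parameters, for illustration only
params = {"pre_factor": 1.0e10, "activation_energy": 1.2e5, "reaction_heat": 5.0e4,
          "hA": 0.5, "T_amb": 300.0, "m_cp": 40.0}
t = np.linspace(0.0, 60.0, 61)
T_pred = 300.0 + 0.5 * t                      # stand-in for a network output
loss = physics_informed_loss(T_pred, t, np.array([300.0, 330.0]),
                             np.array([0, 60]), params)
```

The physics term is evaluated at every time sample, including those without labels, which is why such a loss can be trained with partially labeled temperature data. The `weight` factor reflects the balance between data fit and physics residual noted in Table 3; in practice the two terms usually need to be normalized before they are combined.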
Table 3 summarizes the capabilities and limitations of each model type. These models share several benefits and limitations. All are faster than 3D simulation and better suited to real-time online prediction, yet all are data-dependent and require large amounts of high-quality data that are hard to obtain. PINNs can, to some extent, address the interpretability and data-requirement problems of the other two types of methods, but they add considerable complexity to the training process. The method should therefore be chosen according to the task at hand.
| Approach | Inputs to outputs | Models | Capabilities | Limitations |
|---|---|---|---|---|
| ML classifiers | Sensor data including voltage, temperature, etc. → risk class (TR/no-TR, risk level). | Tree ensembles, SVM, shallow NNs. | • Detect subtle signs and classify TR risk at an early stage. | • Require extensive failure data covering diverse conditions. |
| | | | • Fast inference, suitable for onboard BMSs. | • Prone to overfitting and lack predictive ability outside the training distribution. |
| | | | • Can reach high prediction accuracy. | • Black-box method without physical insights. |
| Neural-network surrogates | Time–space sensor data or simulation snapshots → temperature evolution, TR timelines, and peak temperature. | Complex NN models such as LSTM, CNN, autoencoders. | • Emulate complex TRP behavior much faster than high-fidelity models. | • Require large training datasets covering varied scenarios. |
| | | | • Useful for online prognosis, allowing emergency controls to trigger. | • Hard to extrapolate to conditions outside the training dataset. |
| | | | • Capture nonlinear dependencies that are hard to model explicitly. | • May produce results that violate physical constraints. |
| Hybrid & physics-informed ML | Sensor data + physics priors → fields consistent with reaction kinetics. | PINNs, neural operators such as DeepONet. | • Models follow the fundamental physical laws, thus requiring less data and remaining valid under extrapolation. | • Complex training requiring expertise in both physics and deep learning. |
| | | | • High-speed and high-fidelity surrogates of 3D simulation models. | • Balancing the data-fit and physics residuals during training can be tricky. |
| | | | • More interpretable and can provide insights into the physics. | • Still depend on high-quality data to an extent. |
A further practical consideration in model selection is battery aging. As discussed in Section 2.2.1, aging can shift TR triggering behavior, but its effects are strongly dependent on usage history, chemistry, and degradation mode. This makes aging particularly difficult to represent with fully physics-based models, because the required degradation parameters and their evolution are rarely known in sufficient detail for safety applications. For this reason, simpler semi-empirical, reduced-order, or hybrid models can be more attractive for aged batteries and second-life systems, especially when they can be re-identified from pack-level measurements such as temperature, voltage, and resistance. In practice, a useful guideline is to start from the simplest model that can answer the target question with acceptable accuracy, and only move to higher-fidelity multiphysics descriptions when localized mechanisms, trigger pathways, or component-level interactions must be resolved explicitly.
Fig. 6 shows the frameworks and results of representative surrogate-modeling studies of TR, TRP, and related thermal prediction. Classifier methods typically separate the dangerous and safe regions of the cell or pack configuration parameter space with a decision boundary, which helps the designer select pack design parameters in a more targeted way. Surrogate ML and NN models aim to predict quantitative TR and TRP outcomes, such as peak temperature and propagation time. These methods can be combined with uncertainty quantification techniques to provide more information on the reliability of the cell or pack design. Finally, PINN methods integrate physical information into the surrogate models, adding interpretability to data-driven models. Although their training and model construction are more complex, these models offer lower data requirements and better consistency with physical laws, which the previous two types of models lack. Therefore, when establishing a surrogate model of TRP, one should choose the modeling method carefully, considering the features of each model.
Fig. 6 Framework or results in the existing studies of surrogate models: (a) ML classifiers: (a1) SVM safety risk classification (reproduced from ref. 104 with permission from Elsevier, copyright 2026), (a2) RF faulty cell position prediction (reproduced from ref. 105 with permission from Elsevier, copyright 2026), (a3) the fault boundary identified by an ML-physical combination model (reproduced from ref. 107 with CC-BY license); (b) neural network surrogate model: (b1) design optimization through surrogate modeling considering cooling of PCM (reproduced from ref. 108 with permission from Elsevier, copyright 2026), (b2) surrogate modeling using Kriging and incorporating uncertainty quantification (reproduced from ref. 114 with CC-BY license), (b3) TRP early warning strategy incorporating LSTM modeling (reproduced from ref. 123 with permission from Elsevier, copyright 2026), (b4) TRP prediction using a framework of CNN-LSTM with uncertainty (reproduced from ref. 115 with permission from Elsevier, copyright 2026); (c) PINN modeling: (c1) PINN modeling and prediction for TR, incorporating various exothermic processes inside the battery cell (reproduced from ref. 133 with CC-BY license), (c2) heat map prediction using PINN (reproduced from ref. 134 with permission from Elsevier, copyright 2026), (c3) TR prediction through physics-informed DeepONet (reproduced from ref. 136 with permission from Elsevier, copyright 2026).
To conclude, physics-based and data-driven surrogate models have greatly advanced TRP research by facilitating efficient risk classification, fault localization, design optimization, and uncertainty quantification. Traditional ML methods, such as SVM and RF, offered early successes, whereas NNs, including FCNN, CNN, GNN, and recurrent variants, further improved the modeling of nonlinear spatiotemporal dynamics, supporting long-term forecasting and early warnings. Notably, PIML has emerged to overcome data scarcity and extrapolation challenges by embedding physical laws into NNs. This shift from empirical data-driven surrogates to PIML and neural operators marks a key step toward reliable, interpretable, and generalizable TRP prediction for next-generation battery safety design and management.
Fig. 7 TRP mitigation and prevention technologies: (a) thermal management strategies, including (a1) active cooling via cold plate (reproduced from ref. 140 with permission from Elsevier, copyright 2026) and air flow (reproduced from ref. 141 with permission from Elsevier, copyright 2026), (a2) passive thermal management through PCM (reproduced from ref. 142 with permission from Elsevier, copyright 2026), and (a3) insulating material to reduce heat propagation (reproduced from ref. 143 with permission from Elsevier, copyright 2026); (b) fire suppression technologies, such as liquid nitrogen cooling (reproduced from ref. 144 with permission from Elsevier, copyright 2026); (c) early detection and monitoring systems, including (c1) electrical and temperature sensing (reproduced from ref. 145 with CC-BY license), (c2) gas sensing (reproduced from ref. 146 with CC-BY license), and (c3) acoustic detection (reproduced from ref. 147 with permission from Elsevier, copyright 2026).
As shown in Fig. 7(a1), active cooling mainly relies on cold plates placed between adjacent cells or on module sides, together with forced air flow through the module, to extract heat and maintain the cell temperature within a safer range. Such approaches are intended to reduce the temperature rise of neighboring cells before they reach the TR triggering condition. By continuously removing heat from the module, active cooling can delay TR triggering and, in some cases, interrupt further propagation. Modern EV battery packs employ active cooling, such as liquid coolant circuits, refrigerant loops, or forced air cooling, to maintain cell temperatures within safe limits during operation and to rapidly dissipate heat under abuse conditions. Liquid cooling is particularly effective since it can absorb large amounts of heat and even handle extreme cases like early-stage TR, owing to the high boiling points and heat capacities of many coolants.148 For instance, direct liquid cooling with the help of a heat pipe and phase-change liquid in an EV pack was shown to prevent TRP by keeping the adjacent cell temperatures below 185 °C when a neighboring cell went into runaway.141 Immersion cooling is an emerging approach in some ESS designs.140 In this approach, the battery is submerged in a non-conductive dielectric fluid that makes direct contact with the cells.149 Compared to indirect cooling, direct fluid contact yields more uniform temperatures and suppresses TR propagation more effectively.150 Active cooling, which includes air cooling, heat pipe cooling, fluid cooling, and mixed refrigeration processes, also provides a safety buffer during thermal incidents.151 However, these systems add complexity, weight, and cost, and they require sufficient coolant flow rate and power. If cooling is insufficient or disabled, propagation can proceed unchecked.148 Additionally, forced-air cooling has limitations.
On the one hand, forced-air cooling performs reasonably well for certain types of batteries but is not ideal for those with higher heat densities.152 On the other hand, forced-air cooling has a lower heat transfer coefficient and worse thermal conductivity compared with liquid cooling.153 Besides, the introduction of airflow during a battery fire may inadvertently supply oxygen and worsen flames.154 Thus, designers must carefully balance cooling effectiveness with system complexity.
Passive thermal management using PCMs or heat sinks provides an extra safety layer by absorbing excess heat without external power. Fig. 7(a2) shows a representative PCM-based passive thermal management strategy. In this case, the cells are surrounded by a phase-change medium that absorbs a large amount of heat through endothermic melting. Therefore, the PCM acts as a thermal buffer, slows down the temperature rise of adjacent cells, and delays the onset of TRP. Compared with active cooling, this strategy does not rely on continuous external power input, but its effectiveness strongly depends on the latent heat, thermal conductivity, and geometric coverage of the PCM. PCMs melt or undergo phase change during overheating, soaking up heat through latent heat absorption and stabilizing cell temperature.142 Studies have shown that integrating PCM around cells can delay or even prevent propagation. Wilke et al.155 demonstrated that, in a small pack where TR propagated through all cells without PCM, adding a phase-change composite layer let the failing cell burn out without igniting its neighbors. Dai et al.156 reported that using paraffin wax as PCM could delay the onset of runaway by several minutes and reduce peak cell temperatures by tens of degrees. Also, Javani et al.157 found that PCM could maintain a more uniform temperature distribution on a battery cell and avoid hot spots. CPCMs address the key limitation of pure PCMs, their low thermal conductivity, by enhancing heat spreading within the material.158 This helps the PCM melt uniformly and capture heat effectively before the cell temperature spikes, thereby blocking propagation more efficiently. When PCMs are combined with other passive heat spreaders, such as copper or aluminum heat sinks and heat pipes, heat can be further drawn away from an endangered cell, flattening thermal gradients.
While PCMs and passive elements do not actively cool the pack under normal conditions, they buy crucial time during thermal incidents. However, one drawback of PCM cooling is that once the PCM's phase-change heat capacity is expended, it no longer absorbs heat. Furthermore, flammable PCMs can easily burn with the TR cell and exacerbate the propagation.
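The finite buffering capacity of a PCM can be estimated with a simple heat budget: the sensible heat needed to reach the melting point plus the latent heat of fusion. The sketch below illustrates this bookkeeping under stated assumptions. The property values are hypothetical round numbers of the order typical for paraffin-type materials, not data from the cited studies, and heat losses to the surroundings are ignored.

```python
def pcm_capacity_j(mass_kg, latent_heat_j_per_kg, cp_j_per_kg_k, t_start_c, t_melt_c):
    """Heat (J) a PCM can absorb before it is fully melted."""
    sensible = mass_kg * cp_j_per_kg_k * max(t_melt_c - t_start_c, 0.0)
    latent = mass_kg * latent_heat_j_per_kg
    return sensible + latent

def pcm_exhausted(heat_input_j, capacity_j):
    """True once the incoming heat exceeds the PCM's buffering capacity."""
    return heat_input_j >= capacity_j

# Hypothetical example: 0.1 kg of PCM, 200 kJ/kg latent heat,
# 2000 J/(kg K) specific heat, heated from 25 C to a 55 C melting point
capacity = pcm_capacity_j(0.1, 200e3, 2000.0, 25.0, 55.0)
still_buffering = not pcm_exhausted(15e3, capacity)
```

Comparing `capacity` with the heat expected to reach the PCM from a failing cell indicates whether the PCM can buffer a full TR event or will only delay propagation.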
In addition to heat-absorbing passive cooling, purely insulating materials are used to isolate cells thermally. Advanced thermal barrier sheets (ceramic fiber mats, aerogel blankets, intumescent polymers) are placed between cells or modules to block heat transfer and flame spread. These barriers do not cool the cell but slow down heat propagation, giving adjacent cells more time to vent or for external interventions to take effect. One such passive strategy is presented in Fig. 7(a3), where an insulating material is inserted between cells to interrupt heat propagation. This is based on a sandwich-structured aerogel/copper insulation board. Its mechanism is not limited to blocking conduction. First, the low-emissivity copper surfaces strongly suppress radiative heat transfer. In the referenced study, without the insulation board, the heat received by the adjacent cell was composed of approximately 65% radiation and 35% convection, whereas after introducing the insulation board the radiative contribution decreased to only about 4%. Accordingly, the remaining heat-transfer pathway shown in Fig. 7(a3) becomes predominantly convective (about 96%), which explains the values marked in the figure. Second, the board also reduces the air flow between the heat source and the neighboring cell, thereby weakening convective heat transfer. Third, owing to its low thermal conductivity, a large temperature gradient can be established across the board, further limiting conductive heat leakage. Therefore, Fig. 7(a3) illustrates that insulating materials can simultaneously suppress radiation, reduce convection, and hinder conduction, and thus markedly slow down TRP. Other techniques have also been reported. For instance, Yu et al.143 developed a sandwich-structured composite that delays TRP by more than 6 hours in NMC modules. Aerogel is also a good heat insulation choice for constructing thermal barriers.159,160 Ceramic fibers are another type of thermal barrier.
The Al2SiO5 nanofiber membranes developed by Zhao et al.161 exhibited an ultra-low thermal conductivity across a wide temperature range and excellent delay efficacy for TRP. Commercial products like intumescent polymer sheets serve as cell-to-cell firewalls, designed to stop propagation even in densely packed modules. The effectiveness of thermal barriers has been confirmed in numerous studies, though complete suppression is not always guaranteed. Key properties for a good thermal barrier include low thermal diffusivity, high decomposition temperature, and structural stability under flame to resist TRP. A trade-off exists because thicker or multi-layer barriers improve safety but reduce energy density and may impede heat dissipation during normal operation. Thus, designers optimize barrier placement (often only between module units or at critical interfaces, rather than between every cell) to contain potential flames while managing weight/space constraints. Overall, combining active cooling (handling gradual heat build-up) with passive thermal barriers (for extreme events) is considered a best practice in pack design for EVs and ESSs.65,162
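The trade-off between barrier thickness and protection can be illustrated with steady one-dimensional conduction through layers in series, where each layer contributes a thermal resistance d/(k·A). This is a textbook estimate, not a model from the cited studies. It neglects radiation, convection, contact resistance, and transient effects, and the thickness and conductivity values are hypothetical.

```python
def series_conduction_w(layers, area_m2, t_hot_c, t_cold_c):
    """Steady conductive heat flow (W) through layers stacked in series.

    layers: list of (thickness_m, conductivity_w_per_m_k) tuples
    """
    # Each layer adds a thermal resistance d / (k * A), in K/W
    resistance = sum(d / (k * area_m2) for d, k in layers)
    return (t_hot_c - t_cold_c) / resistance

# Hypothetical 2 mm aerogel-like sheet (k = 0.02 W/(m K)) on a 0.01 m^2 cell face
thin = series_conduction_w([(0.002, 0.02)], 0.01, 600.0, 100.0)
# Doubling the thickness halves the conducted heat but takes up pack volume
thick = series_conduction_w([(0.004, 0.02)], 0.01, 600.0, 100.0)
```

Because the resistances add, a thin high-conductivity layer such as copper foil contributes almost nothing to the conductive resistance; in the sandwich board described above, its benefit comes from suppressing radiation.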
In EVs and ESSs, active fire suppression systems can detect a thermal event and deploy a fire-quenching agent. For instance, many stationary battery containers are equipped with automatic systems that flood the enclosure with fire suppressants like inert gas, aerosolized dry chemicals, or water mist when TR is detected. The goal is to cool the batteries and starve the fire of oxygen. Water remains one of the most effective firefighting agents for LIB fires, primarily due to its high heat capacity. Zhang et al.163 tested 21700-format cells and showed that a fine water spray can effectively suppress TR: the longer the spray duration, the better the TR suppression achieved. Wu et al.164 developed a dry powder extinguishing agent that shortened the TR duration and suppressed the jet flames. Liu et al.165 explored the control strategy and the cooling effect of water mist through a series of TRP experiments, which showed that successful cooling depended on the latent heat of water evaporation. Spray strategies like water mist and other extinguishing agents can effectively limit the peak temperatures and suppress TR propagation. A limitation of plain water is the Leidenfrost effect: if cell surfaces are extremely hot, water droplets can vaporize upon contact and fail to wet the surface, reducing cooling efficacy.166,167 Possible solutions include adding surfactants or salts to the water mist, which reduce water's surface tension and raise its boiling point.168 Such mixtures adhere better to burning cells and absorb heat more rapidly. Similarly, a proprietary blend of fire suppressant chemicals combined with fine water mist achieved greater flame knockdown than either agent alone. Yao et al.169 investigated the synergistic effect of gaseous extinguishing agents (CO2, C6F12O) combined with intermittent water mist spray in the suppression of TRP, and explored the optimal strategy that minimized the water volume and improved the efficiency. Gao et al.144 proposed an alternating water mist and liquid nitrogen cooling strategy and tested its deflagration suppression efficiency on LFP and NCM battery modules. These approaches both cool the battery and form protective layers that smother flames.
Regarding safety design, an ideal TRP suppression agent or method for LIB fires should act fast, be electrically non-conductive to avoid shorting electronics, leave minimal residue, and be safe for personnel and the environment. In vehicles, fully integrated suppression systems are not yet common due to space and weight constraints, but some manufacturers have begun incorporating thermal sensors that can trigger ventilation or even small suppressant canisters. For large ESS containers, clean-agent gas systems and aerosol generators are popular. They fill the enclosure with fire-quenching gas when activated. These systems can contain a battery fire long enough for external firefighting or for the batteries to cool. Notably, standards such as UL 9540A for ESSs and UNECE Regulation No. 100 Rev.3 for EVs now require demonstrating that TR fires are either prevented from propagating or else effectively contained and suppressed. This has driven innovation in pack designs that incorporate both fire-resistant construction and on-demand suppression. In summary, fire suppression technologies are crucial for minimizing the damage and danger of cascading failure once a TR has started, especially in high-energy systems where an unchecked battery fire could be catastrophic.
The conventional BMS already tracks cell voltages, temperatures, and currents for performance management. These signals can also serve as first-level TR indicators. For example, a sudden voltage drop may indicate an internal short, and an abnormal local temperature rise can signal failing cells. However, relying on standard sensors alone has limitations, since temperature sensors on cell surfaces may not detect fast internal heating until it's too late.170 Similarly, a cell's voltage might not dip significantly until catastrophic failure is imminent. Thus, researchers have proposed enhanced monitoring such as temperature sensor arrays,171 pressure sensors,172 or strain parameter monitoring.173 Moreover, synthetic monitoring indicators may provide comprehensive and robust detection. For example, Gu et al.174 presented a “state of safety” model that uses strain evolution data from cell expansion alongside voltage and temperature to warn of TR well before it happens. High-fidelity sensing of internal resistance or impedance is another avenue. Gomez et al.145 suggested that an increase in internal resistance can precede TR, so periodic electrochemical impedance spectroscopy or pulse resistance measurements can provide early warning. They also indicated that a technique monitoring subtle changes in a cell's resistance under small current pulses was able to detect the onset of TR in advance, by recognizing the signature of lithium plating or SEI breakdown inside the cell.
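The first-level indicators described above, a sudden voltage drop or an abnormal temperature rise, can be expressed as simple rate checks on consecutive BMS samples. The following sketch illustrates that logic only. The threshold values and the function name are hypothetical and would have to be calibrated for a specific cell and sensor arrangement, and the approach shares the limitation noted above that surface signals may lag internal heating.

```python
def first_level_flags(voltages_v, temps_c, dt_s, dv_limit_v=-0.05, dtdt_limit_c_per_s=1.0):
    """Indices of samples showing a sudden voltage drop or fast temperature rise.

    dv_limit_v and dtdt_limit_c_per_s are illustrative thresholds, not
    validated values.
    """
    flags = []
    for k in range(1, len(voltages_v)):
        dv = voltages_v[k] - voltages_v[k - 1]           # change in voltage per sample
        dt_rate = (temps_c[k] - temps_c[k - 1]) / dt_s   # temperature rate, C/s
        if dv <= dv_limit_v or dt_rate >= dtdt_limit_c_per_s:
            flags.append(k)
    return flags

# Synthetic trace: a voltage dip at sample 2, then a temperature jump at sample 3
flags = first_level_flags([4.1, 4.1, 3.9, 3.9], [25.0, 25.1, 25.2, 28.0], dt_s=1.0)
```

Such rule-based checks are cheap enough for onboard use and can serve as a trigger for the more sensitive strain, impedance, gas, or acoustic diagnostics discussed in this section.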
Another promising early warning method is gas sensing.146 When a Li-ion cell undergoes abuse, it often releases small amounts of combustible gases even before significant heat or smoke is generated. Gas sensors also react more quickly than battery surface temperature sensors, making gas sensing an important means of detecting early-stage TR.175 These off-gases result from the decomposition of the electrolyte and electrode SEI layer at elevated temperatures. Torres-Castro et al.176 explored an early-stage TR diagnostics method combining electrochemical impedance spectroscopy (EIS) and gas sensing technology, taking advantage of the quick response of both methods in TR detection. In their tests, the gas sensors could offer a warning time of more than 6 minutes under overcharge failure of a battery pack, and this time could enable the BMS to isolate the affected module and halt charging, preventing TR progression. Besides their quick response, gas sensors also tend to be more sensitive in large systems, where a localized temperature spike might be missed but gases vented from a cell will spread and can be picked up by a nearby sensor.176 Combining sensors for key species such as CO2, CO, H2, and volatile organics, sometimes in tandem with ML algorithms that distinguish failure signatures from background levels, is therefore a promising path for TR diagnostics.
Acoustic detection has also emerged among TR early detection techniques.147 When gas builds up inside a cell and the internal pressure rises, the opening of a safety vent or the rupture of the cell case produces an audible or ultrasonic signature.177 This tends to catch the moment of failure rather than early pre-failure signals, but it can still trigger alarms or suppression systems a split second faster than temperature rise detection. Liu et al.178 proposed a multi-domain acoustic signal fusion method, in which the anomalies were screened by two levels of algorithms to ensure accurate detection. Tam et al.179 obtained an early-stage TR acoustic dataset containing 1330 acoustic samples, then used ML methods to accurately classify these data and detect the anomalies. As these studies show, algorithms and data-driven methods play an important role in acoustic signal processing. The combination of acoustic sensors and efficient data-driven signal processing methods can increase early-stage detection accuracy.
Table 4 provides a comparative overview of the mitigation and prevention technologies for TRP discussed above, categorized by their primary mechanism. Each category of strategies addresses a different aspect of the problem and has its own advantages and limitations. The most effective approach to TRP mitigation and prevention is often a combination of methods within an optimized battery safety design, rather than a single technique alone.
| Category | Mechanism | Techniques | Advantages | Limitations |
|---|---|---|---|---|
| Thermal management | Regulates or absorbs heat to keep cell temperature below critical thresholds. | (1) Active cooling, | (1) Generally passive and reliable once implemented, | (1) Additional weight/complexity and cost, |
| | | (2) PCMs, | (2) Different strategies can be combined for synergy. | (2) Not usually sufficient alone under extreme abuse. |
| | | (3) Thermal barriers. | | |
| Fire suppression | Actively cool and/or chemically quench flames once TR has started. | (1) Fire-resistant coatings on cell surface, | (1) Final defense to prevent small fires from becoming cascade failure disasters, | (1) Only deploys after failure initiation, |
| | | (2) Gaseous suppression agents or water mist. | (2) Coatings and additives work passively to reduce flammability. | (2) Inappropriate suppression agent reduces efficiency, |
| | | | | (3) Challenging to install in limited spaces such as EVs/portable devices, |
| | | | | (4) Possible re-ignition if root cause not solved. |
| Early detection & monitoring | Sensors and algorithms that continuously monitor for early signs of TR, allowing intervention before propagation. | (1) Operational signal (voltage, temperature, etc.) monitoring, | (1) Proactively prevent TR at an early stage, | (1) Requires additional calibration and maintenance to ensure sensors function correctly, |
| | | (2) Gas sensing, | (2) Sensor installation has minimal impact on the system. | (2) Missed alarms can be caused by false positives or sensor failures, |
| | | (3) Acoustic detection. | | (3) Other physically effective mitigation methods are required once a failure is warned. |
In this section, we will review how safety considerations drive design optimizations, balancing preventative measures and mitigative measures, as shown in Fig. 8. We will discuss theoretical frameworks and practical methodologies for safety-oriented design, covering passive strategies built into the pack's architecture and active strategies involving control and sensing. Finally, we highlight optimization techniques that incorporate safety as a key objective alongside performance and cost. This balanced review spans both conceptual advances and real-world engineering solutions.
Fig. 8 Design optimization considering battery pack safety: (a) structural and passive safety strategies considering (a1) mechanical integrity (reproduced from ref. 180 with permission from Elsevier, copyright 2026) and (a2) passive TR prevention (reproduced from ref. 181 with CC-BY license); (b) active safety measures and management systems (reproduced from ref. 182 with permission from Elsevier, copyright 2026); (c) safety-conscious and comprehensive design optimization (reproduced from ref. 109 with permission from Elsevier, copyright 2026).
Passive design measures aim to contain or slow TRP as discussed in Section 4.1.1, buying time for occupants to evacuate or for active systems to respond. First, structural or geometrical design optimizations can significantly change TRP behavior. Li et al.187 developed a framework combining CFD simulation and ML for pack layout design optimization. Combined with an optimized BMS, they found the best design setup. Thermal insulation barriers and heat-absorbing components are two types of passive containment. Thermal barriers typically have very low thermal conductivity and high heat resistance, while heat-absorbing components directly draw heat released during the TRP process, buffering the temperature rise. Both methods serve to localize the heat of a failing cell, preventing it from immediately igniting neighbors. For instance, Rui et al.188 applied thin silica–aerogel sheets between cells, successfully blocking TRP even in high-energy cell modules. Higher-density aerogels provide better insulation at extreme temperatures, enabling safer battery designs for energy-dense chemistries. Aerogels with optimized density and thickness can withstand the intense heat of a runaway cell and delay or entirely stop the domino effect. Xie et al.189 proposed a propagation suppression method coupling passive insulation with active liquid cooling, which was validated through a numerical model. They found that, given the synergistic effect of the coolant flow rate, appropriately thinning the thermal insulation material can still suppress the propagation. There are also many design studies related to PCMs, addressing either structural optimization or the design of different varieties of PCMs. Huang et al.190 reported that embedding flame-retardant PCM pads in modules significantly delays the onset of TR and sometimes prevents propagation entirely. If formulated with flame-retardant additives, PCMs can also smother sparks or flames.
Zhao et al.191 proposed a CMCP combining PEG2000, epoxy resin, and EG to improve dispersion and thermal stability. These PCMs act as built-in fuses: under normal operation, they enhance heat dissipation, and under abuse, they absorb energy and form an insulating layer. It is worth noting that, beyond standalone insulation layers that only focus on suppressing TRP, design optimization should pay more attention to hybrid thermal management strategies that balance cooling under normal operation with emergency insulation. In practice, the thermal management system of a battery pack must deliver both thermal efficiency, to optimize performance under normal conditions, and safety under extreme failures.
Dangerous ejecta vented by TR cells need to be managed through battery pack design during TRP. Some EV packs aim to direct the venting gas flow or energetic ejecta away from critical components or passengers by properly positioning the battery cells and designing the pack enclosure.192 Other studies have looked directly into battery pack design with venting gas in mind. Srinivasan et al.181 proposed a series of iterative battery module designs with different vent channels. Since the clogging of venting channels is key to propagation, they focused on the transmission of the energetic materials in the ejecta and validated the design through simulations and experimental testing. Directing the venting gas and ejecta can be very important for preventing TRP, and the influence of ejecta on TRP has been investigated in numerous studies,28,75,193 yet there is still considerable room to explore design optimization that accounts for venting gas and ejecta.
Active BTMSs are a crucial safety element among the active design measures. The primary role of a BTMS is to keep cell temperatures in an optimal range to ensure performance; it is also designed to prevent overheating and extreme safety hazards. Active BTMS designs often combine different active cooling systems so that the safety performance exceeds that of a standalone active cooling method. Najafi et al.195 experimentally built a low-cost BTMS combining air cooling and thermoelectric modules in an aluminum block pack with 48 18650 battery cells. They found that thermoelectric-only cooling is insufficient at higher loads, while their combination strategy extended runtime and lowered surface temperatures, pointing to a practical, compact path for basic pack safety in light-duty uses. There are also design trials that combine active and passive measures to amplify the merits of both in TRP prevention. Sun et al.196 developed a hybrid BTMS combining active liquid cooling with passive media, integrating a snake-shaped coolant tube and interstitial fillers, with either copper foam to boost thermal diffusivity or expanded-graphite/paraffin PCM for latent-heat buffering. Their TRP mitigation assessment in a 2D numerical model showed that it outperformed natural air cooling across various scenarios and conditions. Although a BTMS can both regulate the normal operating temperature and prevent TRP, different designs emphasize different functions. A comparison study conducted by Yang et al.182 contrasted liquid cold plates on different sides, insulation, and PCM in terms of TRP, cooling, and energy density within one validated CFD framework. It concluded that inter-cell cold plates with sufficient flow best halt propagation but worsen uniformity or energy density, whereas insulation or PCM favors energy density yet underperforms on TR and cooling, leading to clear recommendations for small EVs, large EVs, and ESSs. When conducting a design optimization study on BTMSs, one should select methods according to these characteristics.
Finally, active strategies may involve automatic fire suppression systems within the battery pack design. Since LIB fires spread very quickly and are very dangerous, early response is essential to control such a situation. Three main parts, namely a monitoring system, a signal processing system, and an extinguishing system, should be considered when designing a full fire protection system inside a battery pack.197 These components were introduced separately in the preceding parts of Section 4.1. How to integrate them into a complete fire suppression system is a promising topic. Some industrial solutions exist, e.g., detecting anomalies by monitoring gas production behavior and then using liquid N2 to prevent combustion.197 The design of a fire suppression system should consider these components, which are connected in series, as a whole. This brings extra requirements in certain scenarios. For instance, an ESS may have sufficient space to hold large and complete equipment sets, while an EV probably does not, and neither do much smaller products such as drones or anthropomorphic robots. Therefore, the system should be tailored to each application to ensure fit and safety effectiveness.
One approach is to treat safety requirements as constraints in the optimization problem. The design is then optimized for performance objectives subject to all safety constraints being satisfied, which ensures that any resulting design automatically fulfills the safety standards. For instance, in optimizing a pack enclosure, Shui et al.184 set deformation under shock and vibrational frequency targets as constraints alongside weight minimization. The outcome was a Pareto-optimal set of designs balancing weight against mechanical safety, from which a final safe design was chosen. Yao et al.198 validated their model against nail-penetration and external-heating TRP tests and then used it to explore hybrid pack layout designs. The study showed that alternating or firewall-like LFP placements could arrest propagation, while denser NCM-heavy stacks need added convection or thin aerogel insulation to stay safe. Another important approach is to include safety directly in the objective function, effectively quantifying a safety performance measure that the optimizer tries to maximize alongside other objectives. This is seen in recent works using multi-disciplinary design optimization.199,200 The optimizer then tries to maximize operational performance and safety simultaneously, and decision-makers can refer to the results to select a balanced design.
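The difference between the two formulations can be illustrated with a minimal sketch. The candidate designs, the mass values, and the scalar `safety_margin` score below are hypothetical and are not taken from any of the cited studies; they only show how a safety constraint selects a single design, whereas a safety objective returns a set of non-dominated trade-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    mass_kg: float        # performance objective, lower is better
    safety_margin: float  # hypothetical safety score, higher is better


# Hypothetical candidate designs; the numbers are illustrative only.
CANDIDATES = [
    Design("A", 10.0, 0.20),
    Design("B", 12.0, 0.55),
    Design("C", 13.0, 0.50),
    Design("D", 15.0, 0.80),
    Design("E", 18.0, 0.85),
]


def best_under_constraint(designs, min_margin):
    """Safety as a constraint: lightest design with margin >= min_margin."""
    feasible = [d for d in designs if d.safety_margin >= min_margin]
    return min(feasible, key=lambda d: d.mass_kg) if feasible else None


def pareto_front(designs):
    """Safety as an objective: keep designs not dominated in (mass, margin)."""
    def dominates(a, b):
        no_worse = a.mass_kg <= b.mass_kg and a.safety_margin >= b.safety_margin
        better = a.mass_kg < b.mass_kg or a.safety_margin > b.safety_margin
        return no_worse and better

    return [d for d in designs if not any(dominates(o, d) for o in designs)]


chosen = best_under_constraint(CANDIDATES, min_margin=0.5)
front = pareto_front(CANDIDATES)
print("constraint formulation:", chosen.name)
print("objective formulation:", [d.name for d in front])
```

With these assumed numbers, the constraint formulation returns design B, while the objective formulation keeps A, B, D, and E and leaves the final choice to the decision-maker. Design C drops out because B is both lighter and safer.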
Multi-physics simulation plays a major role in safety-driven optimization. Designers now employ coupled simulations that capture electrical, thermal, and mechanical behavior under fault conditions. For example, a coupled electro-thermal model can simulate a cell overcharge and subsequent TR, feeding its output into a numerical model of fire spread. By integrating these into an optimization loop, one can optimize parameters such as cell spacing, vent size, or coolant flow rate to minimize the chance of catastrophic outcomes. Zhang et al.110 took such an approach by developing a high-dimensional surrogate model for TRP in a battery module, then running an optimization to design a hybrid mitigation system. The system combined low-conductivity barriers with high-conductivity cooling elements, and the optimization tuned their relative properties to best dissipate everyday heat while still blocking thermal emergencies. The optimized design balanced the competing demands of heat transfer and insulation, significantly improving the module's safety margin. Rui et al.188 refined a hybrid cooling plus insulation strategy for TR prevention through both simulation and experiments. Their final results mapped safe and unsafe regions with respect to heat dissipation and thermal insulation, offering guidance for further design optimization considering TRP.
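The idea of mapping safe and unsafe regions over heat dissipation and thermal insulation can be sketched with a deliberately simple two-cell lumped model. This is not the model of any cited study. It assumes that the failed cell cools exponentially from a fixed peak temperature, that the neighbor is a single thermal mass heated through a barrier resistance and cooled by a lumped conductance, and that the neighbor's own self-heating, venting, and ejecta are ignored. All parameter values are hypothetical.

```python
import math

# All parameter values below are hypothetical, for illustration only.
T_AMB = 25.0         # ambient / coolant temperature, degC
T_PEAK = 800.0       # assumed peak temperature of the failed cell, degC
TAU_TRIGGER = 120.0  # assumed cool-down time constant of the failed cell, s
C_CELL = 800.0       # assumed heat capacity of the neighbor cell, J/K
T_ONSET = 180.0      # assumed TR onset temperature of the neighbor, degC


def neighbor_peak_temp(r_barrier, h_cool, t_end=1800.0, dt=0.5):
    """Peak neighbor temperature (degC) from explicit Euler integration.

    r_barrier: thermal resistance between the two cells, K/W
    h_cool:    lumped cooling conductance to the coolant, W/K
    """
    t2 = T_AMB
    peak = t2
    for k in range(int(t_end / dt)):
        t = k * dt
        # Failed cell temperature: exponential decay from the assumed peak.
        t1 = T_AMB + (T_PEAK - T_AMB) * math.exp(-t / TAU_TRIGGER)
        q_in = (t1 - t2) / r_barrier    # heat conducted through the barrier, W
        q_out = h_cool * (t2 - T_AMB)   # heat removed by the coolant, W
        t2 += dt * (q_in - q_out) / C_CELL
        peak = max(peak, t2)
    return peak


def safety_map(r_values, h_values):
    """Classify each (barrier resistance, cooling conductance) pair."""
    return {
        (r, h): neighbor_peak_temp(r, h) < T_ONSET
        for r in r_values
        for h in h_values
    }


R_VALUES = [0.2, 0.4, 2.0]  # K/W, hypothetical barrier options
H_VALUES = [0.0, 2.0]       # W/K, cooling off / cooling on
SAFE = safety_map(R_VALUES, H_VALUES)
for (r, h), ok in SAFE.items():
    print(f"R={r} K/W, h={h} W/K -> {'safe' if ok else 'unsafe'}")
```

Under these assumptions, the thinnest barrier is unsafe with or without cooling and the thickest is safe in both cases. The intermediate barrier moves from unsafe to safe once the cooling term is switched on, which is the kind of insulation-versus-dissipation boundary an optimizer would search along.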
To make such optimization computationally feasible, researchers are increasingly using surrogate modeling and ML techniques. High-fidelity simulations of battery abuse scenarios can be extremely time-consuming, whereas surrogate models can approximate their results much faster. Zhang et al.109 built and experimentally validated a CFD hybrid pack model, then used it to compare the proposed method against baselines under cycling and abuse scenarios. The surrogate-based design optimization flagged PCM conductivity/thickness, heat-pipe length, and inlet velocity as key variables, and the results yielded a cooler yet uniform module and faster post-runaway cool-down. Li et al.201 built a 3D CFD model of an 8-cell air-cooled module, used Latin-hypercube sampling to probe gap and airflow variables, and adopted Kriging to stand in for the costly solver, while sensitivity plots clarified the physics. Coupled with a multi-objective genetic search, the surrogate guided a compact uneven-gap layout that cut temperature difference, non-uniformity, and volume, with a clear Pareto front and sensitivity bar chart steering the trade-offs.
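The sample, fit, and search workflow described above can be sketched in a few lines. To keep the example dependency-free, a least-squares quadratic response surface stands in for Kriging, and an exhaustive grid search stands in for the multi-objective genetic search. The function `toy_solver` is a hypothetical placeholder for a costly CFD run, chosen to be exactly quadratic so that the fit can be checked. The normalized variables `gap` and `flow` are likewise assumptions made for illustration.

```python
import random


def latin_hypercube(n, dims, rng):
    """Return n points in [0, 1)^dims with one sample per stratum per axis."""
    cols = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)
        cols.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in cols) for i in range(n)]


def toy_solver(gap, flow):
    """Hypothetical stand-in for a costly CFD run (temperature spread)."""
    return 2.0 + 3.0 * (gap - 0.6) ** 2 + 2.0 * (flow - 0.7) ** 2


def features(g, v):
    """Quadratic basis in the two normalized design variables."""
    return [1.0, g, v, g * g, v * v, g * v]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of the quadratic basis via the normal equations."""
    rows = [features(*x) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, atb)


def predict(coef, g, v):
    return sum(c * f for c, f in zip(coef, features(g, v)))


rng = random.Random(0)
samples = latin_hypercube(20, 2, rng)                # 20 "expensive" runs
coef = fit_quadratic(samples, [toy_solver(*s) for s in samples])
grid = [i / 20 for i in range(21)]                   # cheap surrogate search
best = min(((g, v) for g in grid for v in grid), key=lambda p: predict(coef, *p))
print("surrogate optimum (gap, flow):", best)
```

Only the 20 sampled points call the expensive function; the 441 grid evaluations run on the surrogate. In a real study the surrogate would not reproduce the solver exactly, so the candidate optimum would normally be re-checked with the high-fidelity model before being accepted.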
In summary, pack safety optimization is a holistic endeavor. It spans choosing safer cell chemistries and formats, determining module layout and spacing, selecting materials and thicknesses for thermal barriers, designing cooling and venting systems, and programming the BMS with appropriate algorithms, all within a unified optimization scheme. The literature reflects a trend toward such integration: instead of addressing thermal issues, mechanical crashworthiness, and electrical safety in isolation, the state of the art is to tackle them together through co-optimization. By doing so, one can avoid solutions that fix one problem but worsen another. For instance, simply adding heavy armor for crash safety could drastically reduce EV range, while a co-optimization would seek a better solution that meets crash requirements with minimal weight penalty. Thanks to high-performance computing and AI, designers can explore countless design variations to find those rare combinations that achieve both high energy and high safety. The outcome is a new generation of battery packs that are inherently safer by design, rather than relying solely on add-on protective measures. Table 5 provides a summary of key safety-driven design strategies and how they contribute to overall pack safety, along with representative studies from recent literature.
| Optimization direction | Design strategies | Implementation examples |
|---|---|---|
| Mechanical integrity & passive prevention (crash, vibration, passive TRP containment) | (1) Robust enclosure materials and shock absorption. | (1) Enclosure with metal composites to resist impact deformation; additional components or materials to absorb vibration. |
| | (2) Multi-objective structural optimization balancing safety vs. volume and energy density, or weight vs. stiffness and strength. | (2) Optimized interlaced cell arrangement, or module segmentation and spacing to create natural firebreaks. |
| | (3) Heat insulation blocks heat transfer and flame spread, and heat-absorbing materials soak up excess heat during abuse. | (3) Thermal barriers such as ceramic fiber, aerogels, etc., applied between cells; heat-absorbing materials, such as PCM, filling the interstice of the pack. |
| | (4) Venting mechanisms to safely relieve pressure and direct flames. | (4) Pack vents, burst panels. |
| Active thermal management (overheat mitigation & fire suppression) | (1) Cooling systems for extracting heat during high loads and keeping cell temperature uniform. | (1) Liquid cooling systems (coolant channels or plates) and forced-air cooling. |
| | (2) Dynamic thermal control: TR signal monitoring triggers increasing cooling power. | (2) Sensors trigger increased cooling if a hotspot or abnormal temperature rise is detected. |
| | (3) Heating/cooling elements to prevent extreme conditions. | (3) Pre-heating in cold to avoid lithium plating, cooling in heat to avoid overheating. |
| Integrated safety design (holistic & optimized solutions) | (1) Integrated signal monitoring-response chain design. | (1) Continuous cell signal monitoring with automated shutdown or power reduction on detecting unsafe conditions. |
| | (2) Multi-layer safety: combining passive and active measures. | (2) Thermal barrier + cooling + sensing + suppression together. |
| | (3) Safety-oriented optimization. | (3) Simulation-driven plus algorithms to find optimal designs that satisfy safety constraints. |
| | (4) Surrogate modeling/advanced computation method (integrated with numerical simulations or experiments). | (4) ML models TRP outcomes quickly; topology optimization discovers designs that improve safety. |
Accordingly, this section provides a dedicated outlook on the key obstacles, in Section 5.1, and next steps, in Section 5.2, to guide more robust studies on safer battery packs. We first identify major technical and institutional challenges that continue to hinder TRP mitigation, modeling, and standardization. These challenges range from mechanistic uncertainties and data scarcity to the lack of common testing protocols, model fidelity issues, and practical implementation constraints. We then propose corresponding future directions, outlining a concise guideline for researchers, industry practitioners, and policy-makers in prioritizing efforts that directly address the most pressing TRP issues. This outlook may have limitations or leave some points uncovered, but we hope this challenge-scope framework can inspire future related studies.
Mechanistically, when one cell undergoes TR, its exothermic reactions rapidly elevate its temperature and eject energy that can ignite adjacent cells via direct conduction, radiative heating, and convective transfer of hot ejecta. Critical factors such as cell chemistry, SOC, and battery age strongly influence the likelihood of runaway initiation and the speed of its propagation through the pack. Because full-scale propagation experiments are hazardous and costly, researchers increasingly rely on modeling and simulation to study TRP. Accordingly, this review outlined how predictive models now range from reduced-order thermal network models for fast, low-complexity analysis to high-fidelity 3D multiphysics simulations that resolve coupled thermal–electrochemical–structural dynamics, as well as data-driven surrogate models that enable rapid risk screening and design optimization. These tools have become indispensable for probing TRP behavior and guiding safer battery designs. For suppression and mitigation, a variety of preventive and protective techniques have been developed, including BTMSs, fire suppression measures, and early fault detection and monitoring. Meanwhile, safety-conscious pack design modifications significantly improve a pack's inherent resistance to propagation. Passive and active safeguards must be integrated to provide sufficient redundancy, since no single countermeasure is sufficient on its own and only a coordinated combination of strategies yields effective TRP prevention. An optimized battery pack therefore employs multiple layers of protection to collectively mitigate the onset of TR and block its propagation.
Finally, several persistent challenges and future research needs are identified from the review of existing studies, with the aim of guiding future work in a more effective and comprehensive direction. Across current policies, industrial applications, and academic studies, research gaps remain in safety standards and test protocols, data effectiveness, balanced pack design that considers operational performance, early failure detection, and the limited range of application scenarios studied. Accordingly, several future directions for modeling and safety design considering TRP have been proposed to bridge these gaps. These views may have shortcomings and limitations, but we hope that they can inspire and stimulate future research to better overcome the safety problems within a battery pack. Progress on these fronts will be crucial for the next generation of LIB packs to be inherently resilient against TRP while still delivering the performance and longevity demanded by modern clean energy applications.
This journal is © The Royal Society of Chemistry 2026