Open Access Article
Ahmed Abdelrahman and Jie Liu*
Department of Mechanical and Aerospace Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada. E-mail: jie.liu@carleton.ca
First published on 7th May 2026
Accurate estimation of the state of health (SOH) of lithium-ion batteries is essential to ensure their reliable performance and operational safety. This study applies functional principal component analysis (FPCA) to incremental capacity analysis (ICA) curves and propagates the extracted functional features through a two-stage learning architecture, in which a linear regression model captures the dominant degradation trend and a Gaussian process regression (GPR) layer corrects the nonlinear residuals. This multi-stage framework enables accurate prediction of both SOH evolution and end-of-life (EOL) timing from early-life data. First, ICA is used to extract characteristic voltage–capacity features that encapsulate the electrochemical degradation behavior of the cells. These features are then decomposed with FPCA, which identifies the dominant degradation modes and their evolution over the cycling data. The resulting functional components are mapped to the SOH through a hybrid regression framework that combines the interpretability of linear regression with the nonlinear learning capability of GPR. The proposed approach achieves a mean absolute percentage error (MAPE) of 2.99% for SOH estimation and an average EOL prediction error of 4.95%, indicating that it characterizes lithium-ion battery degradation effectively.
Broader context
Accurate early-life estimation of the State of Health (SoH) and End of Life (EOL) of lithium-ion batteries is critical for ensuring safety, reliability, and cost-effective operation in electric vehicles (EVs) and energy storage systems (ESS). Conventional model-based and data-driven approaches often rely on predefined degradation models, which limits their ability to generalize across varying operating conditions and aging behaviors and has led to growing interest in non-parametric degradation analysis techniques. At the same time, Incremental Capacity Analysis (ICA) provides valuable insight into electrochemical degradation mechanisms; however, its high dimensionality and sensitivity to noise hinder its direct application in prognostic frameworks. To address these challenges, this work adopts a functional data analysis approach in which ICA curves are treated as continuous functions and decomposed using Functional Principal Component Analysis (FPCA) to extract dominant degradation modes and their evolution across the battery's life cycle. The resulting low-dimensional functional features are then propagated through a two-stage regression model, where linear regression captures the dominant global aging trend and Gaussian Process Regression (GPR) adaptively models nonlinear residual behavior. This hybrid and interpretable framework enables accurate prediction of SoH evolution and EOL timing from early-life data, highlighting the potential of FPCA-based functional representations for next-generation battery prognostics.
000, about the size of the European market 5 years earlier. In Canada, new registrations of battery electric vehicles went up by about 57% in 2024 compared to the previous year, and plug-in hybrids rose by 75% in the same period.2 According to a recent report, one in four new cars sold in 2025 will be electric, with China as the clear EV leader.3
Such growth in the adoption of electric vehicles has led to a much greater reliance on lithium-ion batteries as the main source of power.4 With this widespread usage comes the challenge of dealing with how these batteries age and lose capacity over time. As drivers depend on their vehicles for both performance and range, battery state of health (SOH) estimation and prediction have become increasingly important.5
SOH estimation for lithium-ion batteries (LIBs) has been investigated in various ways.6 The most common approaches are model-based methods and data-driven methods. Model-based methods involve parameter estimation of battery equivalent circuit models or electrochemical models.7
Model-based methods, including equivalent-circuit,8 electrochemical,9 and degradation-mechanism-based formulations, are attractive because they are physically interpretable and can embed known battery behaviour. However, their practical use is often constrained by parameter identifiability,10 sensitivity to operating conditions, chemistry dependence, and the computational burden associated with online implementation of electrochemical models. Recent reviews have emphasized that no single physics-based model can fully capture the coupled and evolving degradation pathways of lithium-ion batteries across varying temperatures, current rates, and usage histories, especially under realistic field conditions.11 These issues become even more pronounced when transferring calibrated models across cells and chemistries.
Data-driven methods, by contrast, are well suited to learning nonlinear degradation patterns directly from measured voltage, current, temperature, or diagnostic features, and they have achieved strong predictive accuracy in many controlled datasets. Nevertheless, their limitations are now well documented. First, they are often heavily dependent on large, well-labeled, and distributionally representative datasets,12 which are difficult to obtain in early life or under real-world EV operation. Second, many models suffer from weak generalization when the testing domain differs from the training domain in terms of operating window, charging protocol, ambient temperature, or cell-to-cell variability. Third, many deep learning models remain difficult to interpret physically, which limits trust and deployment in safety-critical battery management settings.13
A further limitation shared by both model-based and data-driven methods is their difficulty in delivering reliable early-life prediction. Early degradation trajectories are typically subtle, noisy, and partially confounded by formation effects, hysteresis, and operating-condition variability. As a result, models built only on cycle index or simple scalar health indicators may miss higher-order structure in the degradation trajectory. This limitation has motivated the move toward richer signal representations and multi-source features. For example, internal-resistance-informed learning has been proposed as a way to bridge equivalent-circuit information and data-driven mapping, improving robustness and interpretability.
In response to these limitations, recent research has increasingly shifted toward hybrid physics-informed machine learning methods.14,15 These approaches aim to combine the extrapolation capability and interpretability of physics-based models with the flexibility and pattern-recognition strength of data-driven techniques. Aykol et al.16 argued that neither purely physics-based nor purely data-driven approaches are sufficient in isolation, and proposed several integration strategies that explicitly fuse mechanistic understanding with statistical learning for battery lifetime prediction. Building on this perspective, subsequent studies have operationalized hybrid frameworks in practical settings. For instance, Nascimento et al.17 developed hybrid physics-informed neural networks for battery modelling and prognosis, while Kohtz et al.18 introduced a physics-informed learning framework leveraging partial charging segments with embedded physical constraints. More recently, Wang et al.19 proposed a physics-informed neural network capable of stable SOH estimation across varying battery chemistries and operating conditions. Collectively, these studies indicate that hybrid approaches can improve generalization, reduce data requirements, and enforce physically consistent predictions, particularly when full-cycle labelled datasets are unavailable.
Another important emerging direction is uncertainty-aware prognostics. Conventional SOH estimation methods typically produce deterministic point predictions, which can be misleading given the stochastic nature of degradation and the significant variability observed across battery populations. To address this, probabilistic and uncertainty-aware frameworks have been developed to quantify prediction confidence and support more reliable decision-making in battery management systems. Thelen et al.20 provided a comprehensive review of probabilistic machine learning approaches for battery diagnostics and emphasized uncertainty quantification as a critical requirement for real-world deployment. In a similar vein, Zou et al.21 introduced a Bayesian model averaging approach for SOH estimation that captures both model-form and parameter uncertainty, enabling probabilistic outputs alongside point estimates. Earlier work by Richardson et al.22 also highlighted the importance of uncertainty quantification in battery lifetime prediction using Gaussian process regression. These developments are particularly important for early-life prediction, where limited degradation information and higher noise levels inherently increase uncertainty, and deterministic models may otherwise lead to overconfident and potentially misleading predictions.
Despite these advancements, a fundamental limitation remains: most existing approaches, whether physics-based, data-driven, or hybrid, still rely on compressed representations of battery degradation, typically in the form of scalar indicators such as capacity and internal resistance or a limited set of engineered features extracted from voltage and current signals. While hybrid frameworks improve robustness and predictive reliability, they generally operate on reduced-dimensional representations that may not fully capture the underlying structure of degradation processes.
In particular, degradation in lithium-ion batteries is reflected in the continuous evolution of electrochemical signals, such as voltage profiles, incremental capacity curves,23 and differential voltage characteristics. Prior studies have shown that these signals contain rich information about electrode-level aging mechanisms, phase transitions, and kinetic limitations that are not adequately preserved when reduced to scalar features or sparse descriptors. Even in advanced machine learning pipelines, feature extraction is often performed as a preprocessing step, resulting in the loss of important shape-based information and temporal dependencies embedded in the full signal.
This creates a critical gap in existing SOH estimation and prognostics frameworks: the lack of methodologies that explicitly model degradation as a functional object rather than a collection of discrete features. As a result, subtle yet systematic changes in curve morphology, particularly in early life, may remain undetected or underutilized.
To address this limitation, this paper introduces a multi-stage functional framework for early and accurate prediction of lithium-ion battery (LIB) degradation.
The key contributions are summarized as follows:
1. Functional decomposition and multi-stage predictive learning: this work establishes a functional-learning approach in which ICA curves are represented in a compact FPCA subspace that captures the dominant electrochemical ageing modes, including peak migration and compression. Instead of relying on a single regression model, we propose a multi-stage hybrid architecture that first extracts the deterministic ageing trajectory through linear regression and then models the remaining nonlinear, cycle-dependent dynamics using Gaussian process regression (GPR). This layered structure delivers higher predictive accuracy, improved robustness to noise, and enhanced physical interpretability, outperforming conventional single-stage ICA-based approaches.
2. Early-life SoH and EOL forecasting with an early-cycle observation window: the proposed framework delivers high-accuracy prediction of SoH evolution and end of life (EOL) using only the first 33% of the battery's operational life, offering a level of early-cycle sensitivity that is rarely achieved in existing prognostic methods. By exploiting the functional structure embedded in ICA curves and refining the remaining nonlinear dynamics through an uncertainty-aware GPR layer, the model provides reliable long-horizon forecasts at the battery's optimal operating temperature.
Such early and accurate degradation prediction is pivotal for a wide range of applications. For EVs, early EOL prediction is crucial for anticipating when a battery pack will no longer meet performance requirements, supporting informed decisions on vehicle resale value, second-life deployment, and long-term fleet planning. In smartphones, laptops, and other portable electronics, it helps prevent unexpected failures and enables proactive battery replacement. The framework therefore enhances reliability and health-aware management across both large-scale and consumer battery systems.
The graphical abstract shown in Fig. 1 illustrates the proposed multi-stage functional framework for early prediction of lithium-ion battery degradation and SoH. The process begins with the extraction of ICA features from voltage–capacity profiles, which are subsequently smoothed, aligned, and normalized to form comparable functional representations. FPCA is then applied to capture the dominant degradation modes as a set of low-dimensional functional scores. These scores are fed into a two-stage regression model, combining the global linear trend with nonlinear residual learning via GPR, to estimate both SoH and EOL at different operating temperatures. The integrated workflow bridges electrochemical interpretability with statistical learning, enabling accurate and early EOL prediction after observing only one-third of the total cycle life.
FPCA was first brought into use for MRI by Viviani et al.,25 mainly to find and summarize the main patterns in brain activity data.
FPCA has been applied alongside several types of battery diagnostic curves to capture degradation behaviour compactly. One well-known diagnostic tool is the ICA curve, which can reveal distinct signatures of aging and has been widely used in SoH estimation. FPCA has been adopted in battery prognostics to reduce high-dimensional cycle data into a few informative modes of variation and then regress health indicators on those mode scores. Early work modeled discharge-voltage trajectories with FPCA and a Bayesian updating scheme to predict residual life, demonstrating that nonparametric functional representations can capture degradation dynamics effectively.26 Subsequent studies applied FPCA to multi-sensor monitoring streams (voltage, current, temperature) and used sparse regression to predict capacity/SOH from functional scores, showing competitive accuracy on the NASA datasets.27 More recently, nonparametric degradation frameworks have incorporated FPCA within reliability settings to handle incomplete/truncated signals for remaining useful life (RUL) prediction.28 Beyond operational datasets, FPCA has also been used to compress and forecast time-resolved outputs in battery manufacturing simulations, highlighting its utility as a general functional-data compressor before downstream prediction.29 While FPCA has demonstrated strong capability in extracting low-dimensional degradation features across diverse battery signals, accurate early-life EOL prediction remains challenging, with errors generally exceeding 6–10% or tens of cycles MAE. This motivates the development of a more structure-aware functional framework capable of leveraging derivative-based diagnostics such as ICA while maintaining robustness to noise and misalignment.
In parallel, the ICA literature has largely focused on extracting peak/shape features from incremental-capacity curves and then learning SOH/RUL with statistical or machine-learning models. Recent works pair ICA-derived indicators with Gaussian Process Regression (GPR) to forecast RUL in high-capacity cells.30 Separately, GPR has been extensively employed for capacity/SOH estimation directly from voltage–time snippets or operating-condition inputs without ICA.31,32 To the best of our knowledge, no existing work has jointly leveraged ICA, FPCA, and a two-stage modelling architecture combining linear regression with Gaussian process residual learning for SOH estimation and EOL prediction. Our approach is the first to formalize ICA curves as functional representations of ageing behaviour, extract degradation modes via FPCA, and then model capacity evolution by using a linear model for the global trend followed by a GPR model that captures nonlinear residual behaviour. This integrated, mode-aware formulation enables earlier and more accurate EOL estimation than traditional scalar-feature or single-model approaches.
However, ICA curves present practical challenges: they are often noisy, highly sensitive to testing conditions, and produce large volumes of data that are difficult to process directly. To address these limitations, this study applies FPCA to extracted ICA curves, capturing the dominant degradation patterns and transforming them into a compact set of representative features. These FPCA scores are then tracked across cycles, enabling a direct connection between their evolution and the battery's degradation behavior. Subsequently, the extracted functional features are used as inputs to a multi-stage regression framework that combines linear regression with GPR to predict the battery's SoH and end of life (EOL).
Before going into detail about the ICA processing and the structure of the predictive model, it is important to recognize the challenges inherent in using ICA for functional learning. ICA curves are highly sensitive to noise, easily lose monotonicity, and exhibit nonlinear peak shifts with ageing, all of which complicate their direct use in functional decomposition. Applying FPCA to such data is nontrivial: misalignment, noise amplification, and mixed electrochemical effects can produce principal components that are difficult to interpret and unstable across cycles. Even when meaningful components are extracted, their relationship with capacity is only approximately linear, leaving residual nonlinear structure unmodeled and necessitating an additional modeling stage to capture the remaining degradation dynamics.
These challenges motivate the need for a carefully designed preprocessing pipeline and a multi-stage modelling strategy. In the next subsection, we outline how ICA monotonic reconstruction, voltage-axis alignment, functional centering, and staged linear–nonlinear regression are integrated to overcome these limitations and yield stable, interpretable, and cycle-sensitive prognostic features.
The ICA curve is obtained by differentiating discharge/charge capacity with respect to voltage. Given time-sampled data (V(t), I(t)), the accumulated capacity is:
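| $Q(t) = \int_{0}^{t} I(\tau)\,\mathrm{d}\tau$ | (1) |
The ICA curve is then estimated as:
| $\left.\dfrac{\mathrm{d}Q}{\mathrm{d}V}\right\rvert_{V_k} \approx \dfrac{Q_{k+1} - Q_k}{V_{k+1} - V_k}$ | (2) |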
1. Monotonic construction of Q(V)
During constant-current discharge, the instantaneous capacity is obtained by integrating the current over time as shown in eqn (1):
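| $Q(t) = \int_{0}^{t} I(\tau)\,\mathrm{d}\tau$ | (1) |
Where several samples share the same voltage Vj, they are replaced by a single median capacity:
| $Q(V_j) = \operatorname{median}\{Q_k \mid V_k = V_j\}$ | (3) |
This step guarantees that Q(V) is single-valued and monotonic, enabling a stable numerical derivative dQ/dV for ICA computation.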
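The duplicate-voltage step of eqn (3) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the rounding precision used to decide when two voltage readings count as equal are assumptions.

```python
from collections import defaultdict
from statistics import median

def collapse_duplicate_voltages(voltage, capacity, decimals=3):
    """Return one median capacity per distinct voltage value.

    `decimals` sets when two readings are treated as the same voltage;
    it is an assumption of this sketch, not a value from the study.
    """
    groups = defaultdict(list)
    for v, q in zip(voltage, capacity):
        groups[round(v, decimals)].append(q)
    v_unique = sorted(groups)                      # increasing voltage axis
    q_median = [median(groups[v]) for v in v_unique]
    return v_unique, q_median

# Toy discharge segment: the reading 3.900 V appears three times.
v = [4.000, 3.950, 3.900, 3.900, 3.900, 3.850]
q = [0.00, 0.10, 0.19, 0.20, 0.24, 0.30]
v_u, q_u = collapse_duplicate_voltages(v, q)
```

After this step every voltage value maps to exactly one capacity, so the finite-difference derivative never divides by a zero voltage increment.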
2. Voltage alignment to a common axis
To enable point-wise comparison across cycles, each capacity curve Q(V) is interpolated to a common voltage axis Vi∈[2.5, 4.25] V using:
| Qi = interp(Q(Vk), Vk, Vi) | (4) |
This alignment ensures that all curves share identical voltage coordinates, thereby allowing element-wise operations such as mean subtraction and covariance computation across cycles. Such consistency is essential for building the covariance matrix used in functional principal component analysis (FPCA), defined as:
| $C = \dfrac{1}{N}\sum_{i=1}^{N} (Q_i - \bar{Q})(Q_i - \bar{Q})^{\mathsf{T}}$ | (5) |
where Qi is the aligned curve of cycle i, N is the number of cycles, and Q̄ is their mean function.
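A short sketch of the alignment and covariance steps is given below, assuming NumPy is available. The grid density, the synthetic cycles, and the 1/N normalisation of the covariance are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def align_to_common_axis(v_k, q_k, v_grid):
    """Linearly interpolate one cycle's Q(V) onto the shared voltage grid."""
    order = np.argsort(v_k)                  # np.interp needs increasing x values
    return np.interp(v_grid, np.asarray(v_k)[order], np.asarray(q_k)[order])

v_grid = np.linspace(2.5, 4.25, 8)           # coarse grid, for illustration only

# Two synthetic cycles sampled at different voltages (discharge order).
cycles = [
    (np.array([4.25, 3.8, 3.2, 2.5]), np.array([0.0, 0.6, 1.4, 2.0])),
    (np.array([4.25, 3.9, 3.5, 3.0, 2.5]), np.array([0.0, 0.4, 0.9, 1.5, 1.9])),
]
Q = np.vstack([align_to_common_axis(v, q, v_grid) for v, q in cycles])

Q_mean = Q.mean(axis=0)                      # pointwise mean curve
Q_centered = Q - Q_mean                      # deviation of each cycle from the mean
C = Q_centered.T @ Q_centered / Q.shape[0]   # covariance matrix, 1/N normalisation (assumed)
```

Because every row of `Q` lives on the same grid, the mean subtraction and the outer products in the covariance are plain element-wise array operations.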
3. Smoothing and filtering
Differentiation amplifies noise; hence smoothing is applied to Q(V) or directly to dQ/dV.
The ICA smoothing strategy was selected based on a sensitivity analysis that quantified the impact of preprocessing on the downstream functional principal component analysis (fPCA). Several filtering techniques were evaluated, including moving average, Gaussian smoothing, Savitzky–Golay (SG), and weighted Savitzky–Golay (WSG). The evaluation was performed relative to the raw ICA baseline by measuring changes in the mean ICA curve, explained variance structure, and fPCA score stability.
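The smoothing methods are defined as follows, where fn denotes the raw sample at grid point n, f̂n the smoothed value, and M = 2m + 1 the window size. The moving-average filter is
| $\hat{f}_n = \dfrac{1}{M}\sum_{j=-m}^{m} f_{n+j}$ | (6) |
The Gaussian filter is
| $\hat{f}_n = \dfrac{\sum_{j} \exp\!\left(-j^{2}/2\sigma^{2}\right) f_{n+j}}{\sum_{j} \exp\!\left(-j^{2}/2\sigma^{2}\right)}$ | (7) |
The Savitzky–Golay filter fits a local polynomial:
| $\min_{a_0,\dots,a_p} \sum_{j=-m}^{m} \Big( f_{n+j} - \sum_{k=0}^{p} a_k j^{k} \Big)^{2}, \qquad \hat{f}_n = a_0$ | (8) |
The weighted variant becomes:
| $\min_{a_0,\dots,a_p} \sum_{j=-m}^{m} w_j \Big( f_{n+j} - \sum_{k=0}^{p} a_k j^{k} \Big)^{2}, \qquad \hat{f}_n = a_0$ | (9) |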
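For reference, the two simplest filters in this comparison can be sketched with NumPy as below. The edge handling (repeating the end values) and the truncation of the Gaussian kernel at three standard deviations are assumptions of this sketch; the study does not state how boundaries were treated.

```python
import numpy as np

def moving_average(x, window):
    """Centred moving average over an odd window; edges repeat the end values."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def gaussian_smooth(x, sigma, truncate=3.0):
    """Discrete Gaussian filter with weights normalised to sum to one."""
    half = int(truncate * sigma + 0.5)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Synthetic noisy peak standing in for a raw ICA curve.
rng = np.random.default_rng(0)
v = np.linspace(2.5, 4.2, 100)
raw = np.exp(-0.5 * ((v - 3.6) / 0.1) ** 2) + 0.05 * rng.standard_normal(v.size)
ma5 = moving_average(raw, 5)
gauss2 = gaussian_smooth(raw, 2.0)
```

Both filters use symmetric weights, so they leave straight-line segments unchanged but flatten sharp peaks, which is the kind of distortion the sensitivity analysis is designed to detect.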
Table 1 summarizes the most relevant metrics from the sensitivity analysis. The weighted Savitzky–Golay filter achieved the lowest overall instability among the filtered cases, with improved score stability and strong preservation of the mean ICA shape. Although Gaussian smoothing resulted in smaller variance shifts, it significantly degraded the fPCA structure. Therefore, the Savitzky–Golay family was selected due to its balance between smoothing and structural preservation.
| Filter | RMSE | EV shift | fPC corr. | Score RMSE |
|---|---|---|---|---|
| WSG (11,2) | 0.0068 | 0.310 | 0.572 | 0.658 |
| Moving (5) | 0.0107 | 0.370 | 0.567 | 0.784 |
| SG (11,2) | 0.0100 | 0.467 | 0.399 | 1.024 |
| Gaussian (2) | 0.0154 | 0.242 | 0.339 | — |

Lower RMSE and variance shifts, and higher correlations indicate better performance.
To further justify the preprocessing choice, a dedicated sensitivity analysis was conducted on the Savitzky–Golay parameters, namely the window size M and polynomial order p. The results are summarized in Table 2. It is observed that smaller window sizes with higher polynomial orders provide the most stable fPCA representation.
| (M,p) | RMSE | EV shift | fPC corr. | Score RMSE |
|---|---|---|---|---|
| (5,4) | ≈0 | ≈0 | 1.000 | ≈0 |
| (7,4) | 0.0017 | 0.052 | 0.541 | 0.541 |
| (9,4) | 0.0033 | 0.176 | 0.678 | 0.504 |
| (5,3) | 0.0021 | 0.049 | 0.497 | 0.559 |
| (5,2) | 0.0023 | 0.049 | 0.497 | 0.559 |
In particular, the configuration (M,p) = (5,4) produced nearly identical results to the raw baseline. This is expected: a fourth-order polynomial fitted to five points interpolates them exactly, so this setting applies essentially no smoothing. Among practical smoothing configurations, (7,4) and (9,4) offered the best trade-off between noise reduction and preservation of the underlying functional structure. Larger window sizes consistently increased score RMSE and reduced fPC correlation, confirming that excessive smoothing distorts the extracted principal modes.
Overall, this analysis demonstrates that the fPCA results are sensitive to ICA preprocessing, and that Savitzky–Golay filtering with carefully selected parameters provides a robust and interpretable solution. The final choice is therefore guided by quantitative stability criteria rather than arbitrary smoothing assumptions.
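The behaviour of the (5,4) setting can be checked with a small local least-squares sketch of Savitzky–Golay smoothing. The function below is an illustrative re-implementation, not the routine used in the study; the triangular weights for the weighted variant are an assumption, and edge samples are simply left unchanged.

```python
import numpy as np

def local_poly_smooth(x, window, order, weights=None):
    """Savitzky-Golay-style smoothing by (weighted) local least squares.

    A polynomial of degree `order` is fitted to each odd-length window and
    evaluated at the window centre. Edge samples are left unchanged.
    """
    x = np.asarray(x, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)    # columns j^0 ... j^p
    w = np.ones(window) if weights is None else np.asarray(weights, float)
    sw = np.sqrt(w)[:, None]
    # Row 0 of the pseudo-inverse yields a_0, the fitted value at the centre.
    centre_row = np.linalg.pinv(sw * A)[0] * sw[:, 0]
    y = x.copy()
    for i in range(half, len(x) - half):
        y[i] = centre_row @ x[i - half:i + half + 1]
    return y

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 3, 60)) + 0.05 * rng.standard_normal(60)

same = np.allclose(local_poly_smooth(x, 5, 4), x)          # exact interpolation
changed = not np.allclose(local_poly_smooth(x, 7, 4), x)   # genuine smoothing

tri = 6 - np.abs(np.arange(-5, 6))                         # assumed triangular weights
wsg = local_poly_smooth(x, 11, 2, weights=tri)
```

With five points and five polynomial coefficients the fit has no residual degrees of freedom, which is why the (5,4) row reproduces the raw baseline, whereas (7,4) already alters the signal.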
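The Savitzky–Golay filter, which is commonly used for this purpose, performs a local polynomial fit and computes the smoothed value or its derivative analytically from the fitted coefficients:
| $\hat{Q}(V_n) = a_0$ | (10) |
| $\left.\dfrac{\mathrm{d}Q}{\mathrm{d}V}\right\rvert_{V_n} \approx \dfrac{a_1}{\Delta V}$ | (11) |
where a0 and a1 are the constant and linear coefficients of the polynomial fitted around grid point n (expressed in units of the sample offset), and ΔV is the spacing of the common voltage grid.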
Let xi(v) denote the ICA curve (e.g., dQ/dV) for cycle i ∈ {1, …, N}, defined on a common voltage domain v ∈ [a, b] (e.g., [2.5, 4.2] V). This represents the studied cells' normal working range. FPCA seeks a mean function μ(v) and orthonormal eigenfunctions {ϕk(v)}k≥1 such that
| $x_i(v) = \mu(v) + \sum_{k \ge 1} \xi_{i,k}\,\phi_k(v)$ | (12) |
where ξi,k is the score of cycle i on the k-th eigenfunction.
The FPCA mean is the pointwise average over cycles. The mean ICA curve gives the average behavior, and centering shows how each cycle differs from that average. By looking at how these differences progress from one cycle to the next, we can see the signs of degradation taking shape:
| $\mu(v) = \dfrac{1}{N}\sum_{i=1}^{N} x_i(v), \qquad \tilde{x}_i(v) = x_i(v) - \mu(v)$ | (13) |
Covariance shows how different parts of the ICA curve change together across cycles. In batteries, this means that if one peak shifts or shrinks, covariance reveals which other regions of the curve tend to change at the same time, pointing to linked degradation processes.
| $C(s,t) = \dfrac{1}{N}\sum_{i=1}^{N} \tilde{x}_i(s)\,\tilde{x}_i(t)$ | (14) |
After covariance, we extract eigenfunctions. Eigenfunctions are the main patterns of how the ICA curves change. Each one highlights a different way the battery ages, like peaks shrinking or shifting, and their scores tell us how much each cycle follows that pattern.
The main functional patterns of degradation are obtained as eigenfunctions of the covariance operator, solving
| $\int_a^b C(s,t)\,\phi_k(t)\,\mathrm{d}t = \lambda_k\,\phi_k(s)$ | (15) |
| $\int_a^b \phi_j(v)\,\phi_k(v)\,\mathrm{d}v = \delta_{jk}$ | (16) |
Each eigenfunction ϕk(v) represents a dominant mode of variation, and its eigenvalue λk quantifies the variance explained by that mode.
Once the eigenfunctions are obtained, each ICA curve can be represented as a combination of these orthogonal basis functions:
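| $x_i(v) \approx \mu(v) + \sum_{k=1}^{K} \xi_{i,k}\,\phi_k(v)$ | (17) |
| $\xi_{i,k} = \int_a^b \big(x_i(v) - \mu(v)\big)\,\phi_k(v)\,\mathrm{d}v$ | (18) |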
The score ξi,k measures how strongly cycle i exhibits the degradation pattern described by eigenfunction ϕk(v).
The total functional variance is given by
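| $\sigma_{\mathrm{tot}}^{2} = \sum_{k \ge 1} \lambda_k$ | (19) |
and the explained variance ratio of the k-th component is
| $\mathrm{EVR}_k = \dfrac{\lambda_k}{\sum_{j \ge 1} \lambda_j}$ | (20) |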
The cumulative explained variance ratio (CEVR) for the first K components is
| $\mathrm{CEVR}(K) = \dfrac{\sum_{k=1}^{K} \lambda_k}{\sum_{j \ge 1} \lambda_j}$ | (21) |
In this study, five principal components were retained, which together captured more than 95% of the total functional variance.
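The FPCA steps described in this section (mean, centring, covariance, eigen-decomposition, scores, and cumulative explained variance) can be sketched on a uniform voltage grid as follows. This is a discretised illustration under stated assumptions: the synthetic curves (a single peak that shrinks and shifts with cycling), the 1/N covariance normalisation, and the simple rectangle-rule quadrature are all choices made for the example, not details taken from the study.

```python
import numpy as np

def fpca(curves, dv, n_components):
    """Discretised FPCA of curves sampled on a uniform grid with spacing dv."""
    X = np.asarray(curves, dtype=float)        # shape (N cycles, P grid points)
    mu = X.mean(axis=0)                        # mean function
    Xc = X - mu                                # centred curves
    C = Xc.T @ Xc / X.shape[0]                 # covariance surface C(s, t)
    # Rectangle-rule quadrature turns the integral eigenproblem into (C dv) u = lambda u.
    evals, evecs = np.linalg.eigh(C * dv)
    order = np.argsort(evals)[::-1]            # sort modes by decreasing variance
    evals = np.clip(evals[order], 0.0, None)
    phi = evecs[:, order] / np.sqrt(dv)        # rescale so sum(phi_k**2) * dv = 1
    scores = Xc @ phi[:, :n_components] * dv   # projection of each centred curve
    cevr = np.cumsum(evals) / evals.sum()
    return mu, phi[:, :n_components], scores, cevr[:n_components]

# Synthetic ICA-like curves: one peak that shrinks and shifts with cycle number.
v = np.linspace(2.5, 4.2, 120)
dv = v[1] - v[0]
cycles = np.arange(40)
curves = np.array([
    (1.0 - 0.01 * c) * np.exp(-0.5 * ((v - 3.6 + 0.002 * c) / 0.08) ** 2)
    for c in cycles
])

mu, phi, scores, cevr = fpca(curves, dv, n_components=5)
recon = mu + scores @ phi.T                    # truncated reconstruction of each curve
```

Here `scores[i, k]` plays the role of ξi,k and `cevr[K - 1]` the role of CEVR(K); plotting the score columns against cycle number shows how each mode evolves with ageing.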
The FPCA scores derived in this section serve as the input features to the proposed hybrid prediction framework, which combines linear regression and GPR to estimate the battery's SOH. The following section 3.4 presents the structure and formulation of this predictive model in detail.
We retain only the first few (typically up to five) fPCA scores for each cycle, as these capture most of the functional variability while avoiding overfitting. These compact, information-rich scores form the input feature vector for the regression model.
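The chosen kernel is the radial basis function (RBF) kernel, which has the form:
$k(\mathbf{x}, \mathbf{x}') = \sigma_f^{2} \exp\!\left(-\dfrac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\ell^{2}}\right)$
where σf² is the signal variance and ℓ is the length scale.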
The RBF kernel was chosen because battery degradation and capacity fade happen gradually, not suddenly. Cycles that are close to each other in time or have similar FPCA features are expected to show similar capacity values. The RBF kernel captures this smooth and continuous behavior by making nearby points strongly correlated and distant points weakly correlated. After the linear regression step, the remaining residuals are small and smooth, with no large jumps or oscillations, so the RBF kernel is well suited to model these nonlinear corrections.
The final prediction is obtained by combining the linear mean estimate ŷlin and the GP-predicted residual mean r̂GP:
$\hat{y} = \hat{y}_{\mathrm{lin}} + \hat{r}_{\mathrm{GP}}$
The GP also provides a posterior variance that quantifies the predictive uncertainty of the residual correction, offering confidence intervals for each prediction. This hybrid model therefore provides both interpretability and flexibility while maintaining calibrated uncertainty estimates.
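A compact NumPy sketch of the two-stage idea is shown below: ordinary least squares for the trend, followed by a Gaussian process with an RBF kernel fitted to the residuals. It is illustrative only. The hyperparameters are fixed by hand rather than optimised, the one-dimensional synthetic feature stands in for the FPCA scores, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, signal_var):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def fit_hybrid(X, y, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
    """Stage 1: least-squares linear trend. Stage 2: GP on the residuals."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    residual = y - Xb @ beta
    K = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, residual))
    return {"beta": beta, "X": X, "L": L, "alpha": alpha,
            "ls": length_scale, "sv": signal_var}

def predict_hybrid(model, X_new):
    """Return the combined mean and the GP posterior variance of the residual."""
    Xb = np.column_stack([np.ones(len(X_new)), X_new])
    y_lin = Xb @ model["beta"]                         # linear trend
    K_s = rbf_kernel(X_new, model["X"], model["ls"], model["sv"])
    r_gp = K_s @ model["alpha"]                        # GP residual mean
    v = np.linalg.solve(model["L"], K_s.T)
    var = model["sv"] - (v ** 2).sum(axis=0)           # latent posterior variance
    return y_lin + r_gp, np.clip(var, 0.0, None)

# Synthetic example: linear fade plus a small smooth nonlinear residual.
x = np.linspace(-2.0, 2.0, 40)
X = x[:, None]
y = 1.0 - 0.05 * x + 0.01 * np.sin(3.0 * x)

model = fit_hybrid(X, y, length_scale=0.5)
y_hat, var = predict_hybrid(model, X)
y_far, var_far = predict_hybrid(model, np.array([[10.0]]))
```

Far from the training inputs the kernel correlations vanish, so the prediction falls back to the linear trend and the residual variance returns to its prior value; this is one way to see why the linear stage governs long-horizon extrapolation while the GP refines the fit near observed data.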
The metrics used for model evaluation were the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and the coefficient of determination (R2), defined as follows:
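| $\mathrm{MAPE} = \dfrac{100\%}{n}\sum_{i=1}^{n} \left\lvert \dfrac{y_i - \hat{y}_i}{y_i} \right\rvert$ | (22) |
| $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert$ | (23) |
| $R^{2} = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}$ | (24) |
where yi is the measured value, ŷi the predicted value, ȳ the mean of the measured values, and n the number of evaluated points.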
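The three metrics can be written out directly in plain Python, as in the sketch below. The SOH values are made-up numbers chosen only to exercise the functions.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, in the units of y."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative SOH values (fractions of initial capacity), not data from the study.
soh_true = [1.00, 0.95, 0.90, 0.85, 0.80]
soh_pred = [1.00, 0.94, 0.91, 0.85, 0.82]
flat_pred = [1.00] * 5     # a model that predicts no degradation at all

scores_good = (mape(soh_true, soh_pred), mae(soh_true, soh_pred), r2(soh_true, soh_pred))
r2_flat = r2(soh_true, flat_pred)
```

The flat predictor gives a negative R², because its squared error exceeds the spread of the measurements around their own mean; this is the regime a model falls into when it is trained on too few cycles to see any degradation trend.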
In this work, the input features x are taken as the principal component (FPCA) scores extracted from ICA curves, which effectively reduce the dimensionality of the degradation signals. The outputs y are the measured SOH values. The hybrid model utilized in this study learns the nonlinear mapping from FPCA scores to SOH, while also quantifying the uncertainty of predictions. This property is particularly valuable in battery prognostics, where reliability and confidence in predictions are crucial (Fig. 3).
Fig. 3 Overall FPCA–ICA pipeline showing (a) mean ICA curve, (b) centered curves, (c) FPCA scores, (d) explained variance ratio and cumulative variance curve, and (e) capacity prediction.
To investigate the influence of training size on prediction accuracy, the model was trained using progressively larger portions of early-life cycling data, as summarized in Table 3. The MAPE decreases consistently as additional cycles are included, indicating that longer training histories provide richer information about the underlying degradation process, although R2 peaks at 100 training cycles. Nonetheless, using approximately 33% of the total dataset (around 160 cycles) is already sufficient to capture the dominant aging behavior and to support reliable SoH prediction. While further extending the training window continues to yield incremental improvements, the 33% training horizon was chosen because it enables accurate and stable prediction while allowing SoH estimation to be performed as early as possible in the battery lifetime.
| Training cycles | MAPE (%) | R2 |
|---|---|---|
| 1 | 9.35 | −2.619 |
| 10 | 5.42 | −0.420 |
| 50 | 1.47 | 0.884 |
| 100 | 0.96 | 0.938 |
| 200 | 0.89 | 0.926 |
As discussed earlier, the proposed hybrid linear + GPR model integrates a linear regression layer to capture the global degradation trend and a GPR layer to model the nonlinear residuals. Building on this framework, the FPCA scores extracted from the incremental capacity curves were employed as the primary input features for model training.
For comparison, a standalone GPR model was also tested to evaluate the benefit of hybridization. While the GPR-only approach captured the general degradation behavior, its performance was limited, achieving a mean absolute percentage error (MAPE) of 2.18% and a coefficient of determination (R2) of 0.531 at 155 training cycles. In contrast, the hybrid model demonstrated superior accuracy, with a MAPE of 0.78% and an R2 of 0.936.
Furthermore, the hybrid model was able to forecast the 80% state-of-health (SOH) threshold at cycle 476, closely matching the experimentally observed end-of-life (EOL) at cycle 480. This strong agreement highlights the model's reliability and robustness across the entire degradation range.
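As a worked example, the threshold-crossing rule and the size of the gap between the two cycle counts quoted above can be computed as follows. Expressing the EOL error as a percentage of the observed EOL cycle is an assumption of this sketch, since the error definition is not written out here, and the synthetic fade curve is purely illustrative.

```python
def first_crossing(cycles, soh, threshold=0.80):
    """Return the first cycle at which SOH falls to or below the threshold."""
    for c, s in zip(cycles, soh):
        if s <= threshold:
            return c
    return None   # threshold not reached within the available cycles

def eol_error_percent(predicted_eol, actual_eol):
    """Absolute EOL error relative to the observed EOL (assumed definition)."""
    return 100.0 * abs(predicted_eol - actual_eol) / actual_eol

# Cycle numbers quoted in the text: predicted 476, observed 480.
err = eol_error_percent(476, 480)

# Synthetic linear fade, used only to demonstrate the crossing search.
cycles = list(range(600))
soh = [1.0 - 0.00043 * c for c in cycles]
eol_cycle = first_crossing(cycles, soh)
```

A four-cycle gap on a 480-cycle life corresponds to an error below one percent under this definition.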
As illustrated in Fig. 4, the hybrid approach provides a smoother and more accurate trajectory than the standalone GPR model, especially in the later stages of cycling, where degradation tends to accelerate. The improved alignment with experimental data validates the effectiveness of combining linear and nonlinear components in a single predictive framework.
The FPCA scores were used as inputs to the two-stage model, which resulted in a mean absolute percentage error (MAPE) of 0.89% and a coefficient of determination (R2) of 0.926. Based on these features, the model was able to forecast the 80% SOH threshold at cycle 476, which is in close agreement with the actual end-of-life observed at cycle 480.
The dataset used in this study is the NASA Randomized Battery Usage Dataset (Table 4),34 which contains cycling experiments on commercial 18650 lithium-ion cells. For consistency, the regular battery subset was selected, as these cells underwent repeatable charge–discharge cycles under controlled conditions. This makes them particularly suitable for ICA and FPCA, as the data provide clear degradation trajectories without additional variability introduced by randomized stress tests. In this dataset, the batteries were cycled through full discharge. This ensured that the complete usable charge of the cell was measured in every cycle.
| Field (units) | Symbol | Description |
|---|---|---|
| Time (s) | t | Elapsed time since experiment start. |
| Mode (—) | — | 0 = rest, 1 = charge, −1 = discharge. |
| Voltage (V) | V | Terminal cell voltage. |
| Current (A) | I | Applied current (discharge +, charge −). |
| Capacity (Ah) | Q | Cumulative transferred charge. |
| Temperature (°C) | T | Cell surface temperature (if available). |
| File index (—) | — | One file per battery (e.g., battery01.csv). |
Moreover, the same preprocessing pipeline applied to the lab dataset was also consistently implemented on the NASA dataset to ensure a unified and fixed methodology across all analyses.
To obtain the capacity, Coulomb counting was applied, where the discharge current is integrated over time. This method makes it possible to quantify the capacity for each cycle and track its reduction over time, thus allowing the degradation of the battery to be observed clearly.
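A minimal sketch of Coulomb counting for a single cycle is given below, using the field conventions of Table 4 (time in seconds, discharge current positive, mode −1 for discharge). The function names and the trapezoidal rule are illustrative assumptions, as is taking SOH as the ratio to the first-cycle capacity.

```python
def discharge_capacity_ah(time_s, current_a, mode):
    """Trapezoidal integral of the discharge current over time, in ampere-hours."""
    charge_as = 0.0  # accumulated charge in ampere-seconds
    for k in range(1, len(time_s)):
        # Integrate only intervals whose two endpoints are both in discharge mode.
        if mode[k - 1] == -1 and mode[k] == -1:
            dt = time_s[k] - time_s[k - 1]
            charge_as += 0.5 * (current_a[k - 1] + current_a[k]) * dt
    return charge_as / 3600.0  # 1 Ah = 3600 A s

def soh_from_capacities(capacities_ah):
    """SOH per cycle, assumed here to be capacity relative to the first cycle."""
    reference = capacities_ah[0]
    return [c / reference for c in capacities_ah]

# A 2 A discharge held for one hour, followed by a rest sample.
t = [0.0, 1800.0, 3600.0, 4000.0]
i = [2.0, 2.0, 2.0, 0.0]
m = [-1, -1, -1, 0]
print(discharge_capacity_ah(t, i, m))
```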
ICA curves were generated for all charge–discharge cycles to capture the evolution of the battery's electrochemical behavior over time. To evaluate the model's predictive capability, the first 33% of the available cycles were used for training, while the remaining cycles were reserved for testing. This approach ensures that the model learns only from the early degradation phase and is then challenged to predict the long-term performance, reflecting a realistic and forward-looking scenario for battery health estimation.
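The two operations described here, forming an ICA curve and splitting the cycles chronologically, can be sketched as follows. The finite-difference scheme with a minimum voltage step (`min_dv`) is a common way to limit noise amplification in dQ/dV and is an assumption here, not the smoothing procedure of this study. Both function names are hypothetical.

```python
def incremental_capacity(voltage, capacity, min_dv=1e-3):
    """Finite-difference dQ/dV, skipping steps whose voltage change is below min_dv."""
    v_mid, dqdv = [], []
    j = 0  # index of the last accepted sample
    for k in range(1, len(voltage)):
        dv = voltage[k] - voltage[j]
        if abs(dv) >= min_dv:
            v_mid.append(0.5 * (voltage[k] + voltage[j]))
            dqdv.append((capacity[k] - capacity[j]) / dv)
            j = k
    return v_mid, dqdv

def early_life_split(items, train_fraction=0.33):
    """Chronological split: the first fraction for training, the rest for testing."""
    n_train = max(1, int(len(items) * train_fraction))
    return items[:n_train], items[n_train:]

# Example: 100 cycle indices split so that training sees only early life.
train_cycles, test_cycles = early_life_split(list(range(100)))
print(len(train_cycles), len(test_cycles))
```

The split is deliberately not shuffled, so no late-life cycle can leak into the training set.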
As presented in Fig. 6, the model effectively reproduces the actual degradation trajectory of the battery. The quantitative results summarized in Table 7 confirm this performance, achieving an MAPE of 2.99% in estimating the SoH and an average EOL prediction error of 4.95%. These results demonstrate that the model maintains strong predictive accuracy even when trained on a limited portion of the data, highlighting its ability to generalize degradation trends beyond the training range.
Further comparison in Fig. 7 shows a close correspondence between the predicted and actual EOL cycles. The predicted curve follows the measured degradation path with high consistency, with minor deviations appearing only near the end-of-life region, where degradation dynamics become more nonlinear and difficult to model. Despite this, the multi-stage functional framework effectively captures the underlying progression of degradation, illustrating the robustness of combining ICA-based feature extraction with FPCA–GPR modeling.
Following the evaluation of SoH estimation and EOL prediction accuracy, the predictive models were further examined in the functional domain through ICA reconstruction. Specifically, ICA curves were reconstructed from predicted FPCA scores using two approaches: a GPR-only score model and a hybrid linear + GPR residual model, and compared against the experimentally derived ICA curves at representative mid-to-late life cycles. This analysis provides insight beyond capacity-level metrics by assessing how well each model preserves the underlying electrochemical degradation signatures encoded in the ICA profiles.
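The reconstruction step can be illustrated with a discretized stand-in for FPCA, namely ordinary PCA (via SVD) of ICA curves sampled on a shared voltage grid. This is a simplifying assumption: any basis expansion or smoothing used in the actual FPCA is not reproduced, and the function names are hypothetical. A curve is rebuilt as the mean curve plus the score-weighted sum of the principal components, and the curve-level MAPE is then computed pointwise.

```python
import numpy as np

def fit_fpca(curves, n_components):
    """curves: (n_cycles, n_grid) array of ICA curves on a shared voltage grid."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions (discretized eigenfunctions).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return mean_curve, components, scores

def reconstruct(mean_curve, components, scores):
    """Mean curve plus score-weighted sum of components."""
    return mean_curve + scores @ components

def curve_mape(true_curve, recon_curve, eps=1e-12):
    """Pointwise MAPE (%) between a measured and a reconstructed curve."""
    rel = np.abs(true_curve - recon_curve) / (np.abs(true_curve) + eps)
    return 100.0 * float(np.mean(rel))

# Synthetic curves: a fixed peak whose height decays over eight cycles.
grid = np.linspace(3.0, 4.2, 50)
peak = np.exp(-((grid - 3.7) / 0.1) ** 2)
curves = 1.0 + np.outer(np.linspace(1.0, 0.6, 8), peak)
mean_curve, components, scores = fit_fpca(curves, n_components=1)
recon = reconstruct(mean_curve, components, scores)
```

In the setting described above, `scores` would be replaced by the scores predicted by the GPR-only or hybrid model before calling `reconstruct`.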
Fig. 5 shows that both models capture the overall evolution of the ICA profiles; however, the hybrid linear + GPR approach yields more accurate reconstructions than the GPR-only model. As summarized in Table 5, the MAPE between the true ICA curves and the FPCA-based reconstructed ICA curves (using predicted scores) is lower for the hybrid model at both cycles examined: 1.97% versus 8.15% at cycle 200, and 10.55% versus 13.40% at cycle 250. For both models, the reconstruction error increases toward late life.
| Cycle | Model | MAPE (%) |
|---|---|---|
| 200 | GPR-only | 8.15 |
| 200 | Linear + GPR | 1.97 |
| 250 | GPR-only | 13.40 |
| 250 | Linear + GPR | 10.55 |
To further demonstrate the competitiveness and robustness of the proposed methodology, a comparison is conducted against representative state-of-the-art studies from the literature focusing on early-life prediction of lithium-ion battery degradation. As shown in Table 6, the proposed hybrid FPCA-based framework achieves competitive performance in both SOH estimation and EOL prediction, while maintaining interpretability and reduced model complexity compared to deep learning approaches.
| Paper | Methodology | Cycles used | EOL error |
|---|---|---|---|
| Online lifetime prediction for lithium-ion batteries35 | Data-driven prediction with cycle-based updates and ensembling | 200 | 6.1% (MAPE) |
| Early prediction of lithium-ion degradation trajectories36 | Voltage curve feature extraction with regression-based prediction | 100 | 10.0% (MAPE) |
| Aging trajectory and end-of-life prediction via similar fragment extraction37 | Capacity degradation trajectory analysis using fragment extraction | Not explicitly specified | 3.23% (abs. rel. error) |
| Lithium-ion battery end of life prediction based on the decelerating aging point38 | Feature-based prediction using degradation characteristics | Not explicitly specified | 8.7% (MAPE, max) |
| Proposed FPCA + linear + GPR method | ICA-based FPCA feature extraction with hybrid regression | 33% of cycles | 4.95% (EOL error) |
| Battery | Actual EOL | Pred. EOL | Δ (cycles) | MAE (Ah) | MAPE (%) |
|---|---|---|---|---|---|
| 1st | 160 | 151 | −9 | 0.0691 | 3.63 |
| 2nd | 590 | 611 | +21 | 0.0321 | 2.07 |
| 3rd | 340 | 353 | +13 | 0.0395 | 3.69 |
| 4th | 280 | 261 | −19 | 0.0382 | 2.57 |
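The averages quoted in the text can be recovered directly from the per-battery rows of Table 7. The short check below assumes that the average EOL error is the mean of |Δ| divided by the actual EOL for each battery, and that the overall MAPE is the unweighted mean of the per-battery MAPE values. Under these assumptions the results round to 4.95% and 2.99%, in agreement with the reported figures.

```python
# (battery, actual EOL, predicted EOL, SoH MAPE in %) copied from Table 7
rows = [
    ("1st", 160, 151, 3.63),
    ("2nd", 590, 611, 2.07),
    ("3rd", 340, 353, 3.69),
    ("4th", 280, 261, 2.57),
]

# Relative EOL error per battery, as a percentage of the actual EOL.
eol_errors = [abs(pred - actual) / actual * 100.0 for _, actual, pred, _ in rows]

mean_eol_error = sum(eol_errors) / len(eol_errors)
mean_mape = sum(mape for *_, mape in rows) / len(rows)

print(f"mean SoH MAPE:  {mean_mape:.2f}%")
print(f"mean EOL error: {mean_eol_error:.2f}%")
```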
It should be noted that the reported EOL error metrics across the literature are not fully standardized, with some studies reporting mean absolute percentage error (MAPE), while others report absolute relative error or maximum error. Despite these differences, the comparison provides a general benchmark for evaluating prediction accuracy.
Although certain methods in the literature, such as ref. 37, report lower EOL prediction errors, these approaches often rely on a larger or unspecified portion of the degradation trajectory. In contrast, the proposed method achieves competitive accuracy while utilizing only early-life data (33% of cycles), highlighting its effectiveness for early prediction applications.
Overall, these findings indicate that early-cycle data can be used to accurately predict future battery health and lifespan. The proposed approach offers a practical and data-efficient solution for real-time health forecasting, reducing the need for extensive long-term testing while maintaining a high level of predictive reliability.
One limitation of this work is that it is developed and validated on a relatively limited dataset, which may not fully represent the variability seen across larger battery populations and real operating conditions. In addition, the framework relies on full charge–discharge data, whereas in practical EV applications, batteries are often only partially cycled, making such data difficult to obtain. This could limit how directly the approach can be used in real systems. Finally, although the method performs well under controlled conditions, applying it within real-world battery management systems may present challenges due to measurement noise, changing operating conditions, and practical implementation constraints.
Future work will focus on extending the framework to operate effectively with partial charge–discharge data, enabling its use in more realistic scenarios where full cycling is not available. This will make the approach more applicable to real-world battery management systems, particularly in EVs where operating conditions are highly dynamic. In parallel, a hybrid FPCA framework will be developed by integrating a weighted covariance matrix, allowing greater emphasis on voltage regions more strongly correlated with capacity degradation. This enhancement is expected to improve the model's ability to capture critical degradation signatures and thereby increase the accuracy of SoH prediction. Further validation will also be carried out using real-world EV datasets to assess the robustness and practical applicability of the proposed approach.
This journal is © The Royal Society of Chemistry 2026