Open Access Article
Sreesanth Kolangaravalappila,
Ramandeep Singha,
Pooja Jamdagnib and
Ashok Kumar
*a
aDepartment of Physics, Central University of Punjab, Bathinda, 151401, India. E-mail: ashokphy@cup.edu.in
bDepartment of Computational Sciences, Central University of Punjab, Bathinda, 151401, India
First published on 17th December 2025
The rising demand for clean and green energy sources has sparked global interest in sustainable hydrogen production technologies. To address this problem, photocatalytic water splitting has emerged as a promising solution for the sustainable production of green hydrogen and oxygen. We investigate the hydrogen adsorption Gibbs free energy for hydrogen evaluation reaction (HER) and rate-limiting Gibbs free energy for oxygen evolution reaction (OER) to analyse the catalytic activity of the transition metal (TM) intercalated PtXY/ζ-phosphorene (X ≠ Y; X, Y = S, Se, Te) van der Waals heterostructures (vdWHs). Our workflow involves generating a large dataset, followed by performing high-throughput first-principles density functional theory (DFT) calculations on a small fraction of the dataset to obtain the training dataset for a machine learning (ML) framework. Incorporating the ML with the DFT workflow, we obtained 13 potential catalysts for HER and 6 potential catalysts for OER. The interlayer distance of the heterostructures and the bond length between the Pt and X atom emerged as the most influential features for HER, whereas the choice of adsorption site is one of the major OER descriptors. Overall, ML approach integrated with high-throughput first principles calculations is promising for the prediction of potential TM-intercalated vdWHs photocatalysts for water splitting.
Photocatalytic water splitting is a promising and one of the leading methods for hydrogen production, harnessing two of the most abundant and renewable resources, water and sunlight. The low-cost implementation, low energy consumption, and minimal infrastructure requirement for photocatalytic water splitting offer a financially viable and efficient resource.5 There are three significant steps in the water splitting photocatalysis process: electron–hole pairs generation by sunlight with photon energy exceeding the semiconductor photocatalyst band gap; photogenerated charge carriers separation and migration to the surface of photocatalyst; the hydrogen evolution reaction (HER) to produce hydrogen and oxygen evolution reaction (OER) to produce oxygen by the photogenerated electrons in conduction band and photogenerated holes in valence band of photocatalyst.6–8 Unlike conventional hydrogen production methods such as electrolysis, thermal water splitting, and cracking of petroleum, the photocatalytic water splitting is environmentally friendly and less expensive.9
Two-dimensional (2D) Janus transition metal dichalcogenides (JTMDCs) exhibit a unique asymmetric structure, where a transition metal layer is sandwiched between two chalcogen layers, thereby breaking the structural symmetry along the z-direction.10 This asymmetry creates an intrinsic dipole, which can significantly enhance photocatalytic performance compared to conventional symmetric TMDCs.11,12 Also, JTMDs show high adsorption coefficients and low exciton binding energy, which are beneficial for photocatalysis applications.13,14 Among the many promising JTMDs, PtSSe is experimentally synthesized and has desirable properties that make it suitable for water splitting photocatalysis.15 Janus PtSSe demonstrates an impressive solar-to-hydrogen (STH) conversion efficiency (∼18%), considerably higher than other active photocatalytic materials such as Ga2S3 (6.4%), Ga2SSe bilayer (7.42%), pentagonal PdSe2 (12.59%), and even Janus WSSe (14.46%).16
In the recent past, phosphorene allotropes have gained the interest of researchers. Heterostructures of blue phosphorene and black phosphorene have shown promising photocatalytic water-splitting activity.17,18 The PtSSe/ζ-phosphorene heterostructure has been demonstrated to exhibit semiconducting properties with impressively high carrier mobility of ∼103 cm2 V−1 s−1, and a type-II mechanism favourable for water splitting photocatalysis.19 It has also been shown that the STH conversion efficiency of these heterostructures can exceed 10%, which motivates further investigation into the characteristics of phosphorene.
The unique features, such as effective charge separation due to the interfacial electric field in van der Waals heterostructures (vdWHs), make them suitable candidates for photocatalytic water splitting.20 vdWHs also show excellent solar light absorption ability, e.g., MoSe2/HfS2 heterostructures demonstrate a strengthened optical absorption intensity in both the visible and infrared regions compared to the individual HfS2 and MoSe2 monolayers;21 GeI2/C2N vdWHs display a higher redshift around 720 nm than either of its constituent monolayers, enhancing its ability to capture solar energy.22
Furthermore, Janus heterostructures possess some distinctive characteristics that make them highly desirable for photocatalytic water splitting. The coupling of the intrinsic intralayer polarization in the Janus layer with the interlayer built-in polarisation field provides an additional degree of freedom for tuning the photocatalytic properties. This intrinsic polarization plays a vital role in achieving spatial charge-carrier separation across the interface.23 Beyond charge separation, the Janus layer polarization also enables modulation of band alignment through variations in stacking order. For instance, in the BlueP/MoSSe heterostructure, altering the stacking sequence transforms the band alignment from type-I to type-II, a configuration particularly favourable for photocatalytic applications.23 PtSSe-based heterostructures have shown overall excellent results for photocatalytic water splitting.24
Also, the metal atom intercalation can significantly influence the photocatalytic properties of vdWHs, such as improving the charge carrier's transport, alteration in the band gap.25 In addition, metal atoms, when intercalated into layered structures, can create an optimised charge transfer pathway.20 When alkali metals are intercalated into graphitic carbon nitride, it induces the charge redistribution that accelerates the separation efficiency of photogenerated carriers, with intercalated atoms bridging the adjacent layers and enhancing the interlayer charge transportation.26
Although conventional density functional theory (DFT) calculations are computationally intensive for heterostructures, accommodating lattice mismatch often requires a supercell, which can increase the number of atoms and computational costs. Incorporating a machine learning (ML) workflow into the conventional DFT method could significantly reduce the time and resources needed. Furthermore, the size of the dataset for high-throughput computation is often in the order of hundreds or thousands; exploring each of them poses a huge challenge. However, machine learning-assisted screening solves this constraint to a great degree.27 Machine learning is also useful in capturing the complex patterns in the dataset, and analysing these patterns could give us insights into how the results depend upon various material properties.28–30
Based on the insights gathered from previous research, we have investigated Janus PtSSe/ζ-phosphorene (X ≠ Y, X, Y = S,Se,Te) heterostructures intercalated with 3d-transition-metal elements for photocatalytic water splitting, utilising high-throughput DFT computation and machine learning. Intercalating transition metals into selected sites between the layers of the heterostructures and adsorbing H, O, OH, and OOH on particular adsorption sites, a dataset using high-throughput DFT computation has been generated for machine learning-assisted screening. Machine learning has proven to be a valuable tool, as evident from its performance. The results are further analysed for potential photocatalytic activity descriptors.
The schematic representation of the methodology is shown in Fig. 1. The features used to train the ML models are presented in Table S2 of the SI. We employed a set of supervised ML regression algorithms, including linear models such as ridge and LASSO, which incorporate regularisation to prevent overfitting, as well as ensemble and tree-based models such as random forest regressor (RFR), gradient boosting regressor (GBR), and ada boost regressor (ABR), which were extensively used previously.42 Also, an artificial neural network (ANN), specifically a multi-layer perceptron model (MLP), was considered due to its ability to capture complex non-linear relations.43 Additionally, neighbor-based models such as KNeighbors Regressor (KNR) were also utilized. The various ML models used in the study is given in Table S3, SI.
The K-fold cross-validation method is used for model selection with 10-fold splits. The randomized search CV method is used to optimize the hyperparameters. The quantitative evaluation of the optimized model is carried out using standard regression metrics: mean absolute error (MAE) and coefficient of determination (R2). The factors driving the predictions of the ML model are crucial for identifying the key indicators of photocatalytic activity. SHaply Additive explanations (SHAP)44 is a robust framework for interpreting ML models. SHAP is based on cooperative game theory, which enables the quantitative assessment of the importance of each input feature and provides insights into the model's final prediction.
| EB = EPtXY/ξ−P − EPtXY − Eξ−P | (1) |
The calculated value of binding energy of PtSSe/ζ-phosphorene (S-side), PtSSe/ζ-phosphorene (Se-side), PtSTe/ζ-phosphorene (S-side), PtSTe/ζ-phosphorene (Te-side), PtSeTe/ζ-phosphorene (Se-side), and PtSeTe/ζ-phosphorene (Te-side), is −37 meV Å−2, −22 meV Å−2, −25 meV Å−2, −21 meV Å−2, −24 meV Å−2 and −22 meV Å−2, respectively. The negative value of the binding energy indicates good energetic stability of these heterostructures. The calculated binding energies are comparable with other layered crystals, such as graphite48 and MoS2.49
For the six stable primary heterostructures, we investigated two distinct interlayer sites for the intercalation of ten 3d-transition-metal atoms: Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu and Zn. The relative position of the intercalating sites is illustrated in Fig. 3. In this way, we obtained 120 intercalated configurations from each with two intercalation sites with different metals atoms. After obtaining the optimised 120 structures, we calculated the binding energy of the intercalated transition metal element as:
| Eb = ETM/subtrate − ETM − Esubtrate | (2) |
![]() | ||
| Fig. 4 Heatmap of the binding energy of transition metals intercalated at site 1 and site 2 of Janus PtXY/ζ-phosphorene (X ≠ Y; X, Y = S, Se, Te) heterostructures. | ||
After obtaining the 120 intercalated vdWHs and evaluating their stability through binding energy, we identified two active sites for hydrogen adsorption, with one on each layer of the heterostructure. The active site ‘one’ is on top of the topmost phosphorus atom of the phosphorene layer, while active site ‘two’ is located above the topmost chalcogen atom of the PtXY layer. The position of the adsorption sites relative to the layers is illustrated in Fig. 3(b). Each of 120 intercalated heterostructures features two distinct adsorption sites, resulting in a total of 240 configurations for hydrogen adsorptions. For hydrogen adsorption, we prepared a set of 240 configurations from which 120 entries (50% of the dataset) were randomly selected for further DFT calculations. We used the Panda's library in Python for the random selection of data entries. Random selection ensures that the training data can represent the entire dataset. For the adsorption calculation, all the atoms of the intercalated vdWHs are kept fixed in their equilibrium positions, and the adsorbed hydrogen atom is allowed to move freely in the z-direction.
Similarly, we have prepared a dataset of 240 entries each for OER calculations involving adsorption of O, OH, and OOH. We separately selected 120 random entries from these datasets for DFT calculation. In the case of OER, rate-limiting Gibbs free energy is the target property, so the rate-limiting step is determined from the Gibbs free energy values. In this way, we obtained the dataset using high-throughput computation for HER and OER, with hydrogen adsorption Gibbs free energy and rate-limiting Gibbs free energy being the target properties for the respective processes. The details of Gibbs free energy calculations are given in SI.
Next, we addressed the overfitting problem of the ensemble models, when the model is too complicated for the problem at hand, so the model learns from the training dataset, on the underlying noise, so as a result, the model performs exceptionally well for the training dataset but performs worse when presented with an unfamiliar prediction dataset. To check whether the selected model is overfitting, we employed a train-test verification approach. We trained the models with 80% of the available dataset and the remaining 20% test set, which allows us to compare the performance of the seen and unseen datasets. Train-test verification metrics are given in Table S5 (SI). The ExtraTreeRegressor achieved a perfect score for the training dataset, but slightly reduced the lower performance for the testing data, indicating the potential for overfitting. The remaining model exhibits comparable performance on both the training and testing datasets. We used best best-performing model, RandomForestRegressor, for further study.
Now we performed the recursive feature elimination process on the random forest regressor (RFR) model with the best combination of hyperparameters (n_estimators = 80, min_samples_split = 2, min_samples_leaf = 1, max_features = 15, max_depth = 30). A learning curve depicting the variation of model accuracy (R2) with the number of samples in the training dataset is plotted (Fig. S13, SI). For a higher number of training samples, the accuracy of both the training dataset and the testing dataset is close to each other. After the feature elimination and hyperparameter optimisation process, the model metrics are obtained as R2 = 0.84 and MAE = 0.11 eV. This model, trained on the DFT-generated dataset, is further used to predict the Gibbs free energy of the rest of the dataset. During the filtration of the most promising candidates of the datasets, we only considered those structures with Gibbs free energy lying between −0.3 and 0.3 eV,50 and also, while selecting a photocatalyst, we considered the thermodynamic stability of those which having binding energies more than −5 eV.
The plot between Gibbs free energy and dataset index is shown in Fig. 5; the selection criterion for Gibbs free energy is denoted by horizontal dotted lines. We observed that the structures in this region are of PtSTe/ζ-phosphorene (Te side) heterostructures. Also, by flagging the scatter points with the adsorption sites Fig. 5(a), the adsorption site 2 is the most favourable for hydrogen adsorption. Adsorption site 2 is the one on the chalcogen of the Janus layer and is not facing the ζ-phosphorene layer. Fig. 5(d) represents the Gibbs free energy values obtained from DFT calculations and ML predictions, which are plotted for both training and testing data. The blue line shows perfect matching between the two values, indicating that our model closely matches the actual values. The dotted lines show a deviation of ±0.1 eV most of our data points are well within this limit.
Furthermore, we employed SHAP analysis to examine the influence of each feature on the model's final output, identifying the best descriptors of photocatalytic activity among the selected features. The SHAP analysis results are depicted in Fig. 6. From SHAP analysis, we observed that the Pt–X bond length (L_Pt–X), interlayer thickness (D_layer), choice of intercalation sites (A_site_A_1, A_site_A_2), Pt–Y bond length (L_Pt–Y), and binding energy show the most excellent mean SHAP value. A bar diagram of the average SHAP value for each feature, along with a waterfall plot for a particular prediction, is provided in Fig. S14 and S15 of the SI. The SHAP analysis of two individual predictions (Fig. S15, SI), one corresponding to a promising catalyst i.e. Cr intercalated PtSTe/zeta-P heterostructure (Te side) with ΔG = 0.06 eV and another for a poor catalyst i.e. Zn intercalated PtSSe/zeta-P heterostructure (S side) with (ΔG = 1.11 eV), indicate that the interlayer distance and Pt–X bond length make the most significant contributions to reaching the final prediction. The good catalyst has an interlayer distance of 2.88 Å and whereas poor catalyst has 2.56 Å. It has also been observed in the previous studies that increased interlayer distance leads to the good HER activity by iimproving the Gibbs free energy.51,52 Based on the benchmark of Gibbs free energy between the limits −0.3 eV to 0.3 eV, the 21 structures are selected (Table S6, SI), out of which 13 structures having binding energy more than −5 eV are given in Table 1.
| Heterostructures | Gibbs free energy (eV) | Binding energy (eV) |
|---|---|---|
| PtSTe/zeta-P(Te,V,I1,A2) | 0.09 | −5.94 |
| PtSTe/zeta-P(Te,Cr,I1,A2) | 0.06 | −6.30 |
| PtSTe/zeta-P(Te,Mn,I1,A2) | 0.05 | −6.23 |
| PtSTe/zeta-P(Te,Fe,I1,A2) | −0.0005 | −5.89 |
| PtSTe/zeta-P(Te,Co,I1,A2) | −0.06 | −5.42 |
| PtSTe/zeta-P(Te,Ti,I2,A2) | 0.10 | −5.98 |
| PtSTe/zeta-P(Te,V,I2,A2) | 0.08 | −6.41 |
| PtSTe/zeta-P(Te,Cr,I2,A2) | 0.05 | −6.68 |
| PtSTe/zeta-P(Te,Mn,I2,A2) | 0.03 | −6.79 |
| PtSTe/zeta-P(Te,Fe,I2,A2) | 0.003 | −6.61 |
| PtSTe/zeta-P(Te,Co,I2,A2) | −0.05 | −6.20 |
| PtSTe/zeta-P(Te,Ni,I2,A2) | 0.03 | −5.58 |
To benchmark the performance of our selected structures, we compared them with the commercially used Pt catalyst, which exhibits a hydrogen evolution Gibbs free energy close to zero. Our calculated Gibb's free energy values span from 0.09 eV to −0.0005 eV, highlighting their excellent catalytic potential relative to the Pt benchmark. Comparable efficiencies have been reported for transition-metal–intercalated Ti-doped WS2 bilayers, which exhibit Gibbs free energies in the range of 0.003–0.083 eV.53 Similarly, metal-intercalated bilayer borophene demonstrated Gibbs free energy range −0.082 to 0.183 eV.54 Furthermore, cation-intercalated 1T-MoS2 exhibited Gibbs free energies of the same order, further reinforcing the strong catalytic promise of our findings.55
We now performed recursive feature elimination on the SVR model. After feature elimination, we obtained 30 features for future training. With hyperparameter tuning using Randomised search CV, we were able to find the best set of hyperparameters for the model. The selected final SVR model with the optimised hyperparameters is given as SVR (kernel = ‘rbf’, C = 20, tol = 1 × 10−5, shrinking = False, gamma = 0.1). The learning curve for the SVR model is depicted in Fig. S13(b), which shows favourable behaviour. The final SVR model exhibits excellent performance with metrics MAE = −0.21 and R2 = 0.83.
A graph between the rate-limiting value of Gibbs free energy and the dataset index is plotted in Fig. 7. The scatter points are coloured based on the adsorption site in Fig. 7(a). The adsorption site 2 shows the minimum limiting Gibbs free energy value. The intercalation sites do not show any pattern with the rate-limiting Gibbs free energy value (Fig. 7(b)). The graph of rate-limiting Gibbs free energy (true values vs. ML predicted values) is given in Fig. 7(c). The values show a close correspondence with the ideal behaviour and most of the points are within he ±0.1 eV limit as denoted by the dotted lines.
Next, to identify the most important features among the selected features for OER descriptors, we use a permutation importance technique.56 The permutation importance value of each feature is given in Fig. 8. All the selected features have positive permutation importance, which means that shuffling the feature values increases the error of the models or the feature has a positive influence on the model prediction. The choice of the adsorption site is the important feature by far.
From the ML predictions, we have selected 10 structures with the minimum rate-limiting Gibbs free energy values (Table S9, SI), out of which 6 structures exhibit a binding energy greater than −5 eV. The structures, along with their rate-limiting Gibbs free energy value and binding energy, are given in Table 2. The selected structures have rate-limiting Gibbs free energy values more than 3.5 eV. These values are comparable with previous findings or Janus PtSSe-based vdWHs.24 With this, we can narrow down our search to ten promising candidates from a dataset of 240 structures. Feature importance analysis unveils valuable insights into the catalytic activity descriptors. The selected structures also show optimistic binding energy values.
| Heterostructure | Rate-limiting Gibbs free energy change (eV) | Binding energy (eV) |
|---|---|---|
| PtSTe/zeta-P(S,V,I_1,A_2) | 3.68 | −6.13 |
| PtSTe/zeta-P(S,Co,I_1,A_2) | 3.69 | −5.66 |
| PtSTe/zeta-P(S,Sc,I_2,A_2) | 3.67 | −6.76 |
| PtSTe/zeta-P(S,Fe,I_2,A_2) | 3.66 | −6.83 |
| PtSTe/zeta-P(S,Co,I_2,A_2) | 3.57 | −6.33 |
| PtSTe/zeta-P(S,Ni,I_2,A_2) | 3.56 | −5.62 |
We now calculate the projected electronic bands structure of pristine and intercalated PtSTe/ζ-phosphorene heterostructure as shown in Fig. S21(a) (SI). The corresponding band alignments of pristine heterostructure (Fig. S22, SI) reveals a type-II configuration, favourable to HER and OER activity. In contrast, the Fe- and Mn-intercalated PtSTe/ζ-phosphorene shows a metallic behaviour (Fig. S23, SI), arising from the introduction of states at the Fermi level by the intercalated atoms. Intercalation of a metal atom has been shown to improve photocatalytic activity by acting as a bridge between layers for charge transfer.57
| This journal is © The Royal Society of Chemistry 2026 |