Open Access Article
Byron MC Michael Kadum*ᵃ and Mageswary Karpudewanᵇ
ᵃ Kolej Matrikulasi Labuan, Malaysian Institute of Chemistry, Malaysia. E-mail: byrnkdm@gmail.com
ᵇ School of Educational Studies, Universiti Sains Malaysia, 11800, Penang, Malaysia
First published on 20th May 2026
Gender comparisons of experimental self-efficacy (ESE) in chemistry laboratories are typically based on mean scores, which do not reveal how ESE beliefs are organised as an interconnected system. A cross-sectional sample of 655 Malaysian pre-university chemistry students (179 male; 476 female) completed a 12-item ESE measure spanning conceptual understanding, procedural complexity, laboratory hazards, and resource sufficiency. Measurement invariance was examined using multi-group CFA; configural invariance was supported, with borderline evidence for full metric invariance, so gender comparisons are interpreted cautiously as patterns under this operationalisation. Gender-specific networks were estimated using regularised Gaussian graphical models; strength centrality and permutation-based network comparison tests (NCTs) were used to evaluate gender differences, with bootstrap and case-dropping analyses assessing robustness. Both networks were domain-clustered, with strongest within-domain connections and relatively limited cross-domain links. Descriptively, conceptual understanding (item 3), “I am confident that I understand the chemical processes in the experiment,” was the most central node in the female network, whereas laboratory hazards (item 2), “I am confident of working in the laboratory without chemical spillage,” was the most central node in the male network; however, NCT indicated no differences in network structure or global strength, and no individual edges or centralities differed significantly. Overall, ESE showed similar network organisation across gender in this context. The identified clusters and cross-domain links, particularly those involving conceptual sense-making, provide hypotheses for future research on laboratory support rather than direct evidence of instructional effects.
Across chemistry education research, self-efficacy is consistently framed as consequential because it shapes students’ motivation and learning behaviours. Importantly, chemistry self-efficacy is not only an outcome of instruction but may also operate as a mechanism through which learning experiences are translated into sustained engagement and achievement in chemistry. For instance, among first-year chemistry undergraduates in New Zealand, students with higher self-efficacy tended to attribute success to effort, whereas those with lower self-efficacy were more likely to interpret difficulties as a lack of ability – an appraisal associated with discontinuing their studies (Dalgety and Coll, 2006). Consistent with this, Naibert et al. (2024), using data from 1485 undergraduates across six U.S. institutions, reported that stronger chemistry self-efficacy was associated with a mastery-oriented focus rather than failure avoidance, which in turn aligned with better performance on high-stakes assessments. Similar patterns are evident at the secondary level: among 370 Israeli Grade 11 students, higher chemistry self-efficacy was linked to greater willingness to engage with complex problem-solving rather than avoidance (Avargil, 2019).
Evidence in chemistry education indicates that students’ chemistry self-efficacy is not static, but can shift over time in response to instructional experiences and performance-related feedback. For example, studies have tracked chemistry self-efficacy repeatedly during a semester and reported systematic within-semester change alongside achievement patterns, consistent with self-efficacy being sensitive to ongoing course experiences (e.g., assessment outcomes, mastery experiences, and classroom demands) (Villafañe et al., 2014, 2016). Complementary longitudinal work using self-efficacy items embedded within established chemistry attitudes/experiences instruments likewise shows that students’ self-efficacy can vary across timepoints as they progress through chemistry learning experiences (Dalgety and Coll, 2006). Taken together, these findings support conceptualising chemistry self-efficacy as a course-responsive belief that may shape students’ persistence and longer-term decisions about continuing in STEM pathways (Villafañe et al., 2016).
The chemistry laboratory introduces performance demands that are partly distinct from those typically emphasised in classroom learning. Beyond conceptual understanding, students must demonstrate competence in handling chemicals and glassware, following and adapting procedures, making accurate measurements, observing and recording changes systematically, troubleshooting errors, and drawing defensible conclusions from imperfect data. At the same time, these demands are affectively charged: laboratory activities can elicit anxiety related to hazards, breakage, contamination, or “getting the wrong result.” Such emotional arousal is theoretically relevant because physiological and affective states are one of the four principal sources of self-efficacy beliefs (Bandura, 1997). Consistent with this, chemistry laboratory research shows that uncertainty and anxiety can shape how confidently students approach practical tasks. For example, Kurbanoglu and Akin (2010), studying chemistry undergraduates at four Turkish universities, reported that affective responses to laboratory uncertainty were associated with students’ confidence in undertaking experimental work. Similarly, Galloway and Bretz (2015), surveying 3583 undergraduates across 15 U.S. institutions, found that negative affective expectations (e.g., worry and confusion) can operate as self-fulfilling prophecies, inhibiting the development of laboratory confidence. Notably, however, Wong et al. (2021) reported a contrasting pattern among 320 Malaysian 14-year-old secondary school students: despite limited exposure to hands-on laboratory work, students reported high practical self-efficacy. The authors further suggested that this perceived experimental self-efficacy was fully mediated by students’ conceptions of learning science rather than by direct practical experience, highlighting that students’ interpretive frames may also shape efficacy beliefs for what it means to learn science.
Prior literature suggests that experimental self-efficacy (ESE) is multidimensional, with students differentiating confidence across key laboratory competencies such as safe practice, apparatus handling, procedural execution, measurement/data collection, and interpretation, rather than holding a single global judgement (Uzuntiryaki-Kondakci and Capa-Aydin, 2013; Alkan, 2016). Extending this laboratory-specific framing, Kolil et al. (2020, 2023) empirically characterised ESE as manifesting through four dominant, laboratory-relevant facets – conceptual understanding, procedural complexity, laboratory hazards, and sufficiency of resources. This facet structure may be interpreted in relation to Bandura's (1997) four sources of self-efficacy – mastery experiences, vicarious experiences, social persuasion, and physiological/affective states. This operationalisation of ESE provides a domain-appropriate lens for representing the kinds of competence judgements Malaysian pre-university students must make to participate productively in high-stakes practical work (Abdullah et al., 2022; Matriculation Division, 2022a, 2022b).
Notwithstanding these null findings, one study indicates that gender differences can emerge more clearly when ESE is assessed within a chemistry laboratory anxiety–self-efficacy framework. In a sample of 363 Turkish pre-service science teachers, female participants reported significantly higher ESE than male participants (Kırbaşlar et al., 2015). This finding suggests that gender effects may be more detectable in certain teacher-education populations and measurement contexts – particularly when laboratory confidence is conceptualised alongside affective factors that shape laboratory participation (e.g., anxiety in chemistry laboratory environments).
Crucially, the literature suggests that gendered patterns in self-efficacy may be expressed less as stable mean-level gaps and more as differences in how efficacy beliefs are measured, organised, and connected to learning processes. In chemistry education, this concern is reflected in the importance of establishing measurement equivalence before interpreting gender differences, because apparent group differences are meaningful only when the instrument functions comparably across gender groups (Rocabado et al., 2020). This point is reinforced by Saputra et al. (2024), who examined the Indonesian version of the High School Chemistry Self-Efficacy Scale through confirmatory and multigroup invariance analyses before interpreting gender- and grade-level differences. Consistent with this broader point, Sezgintürk and Sungur (2020), using a multidimensional model of science self-efficacy that included a practical-work dimension (i.e., ESE), found no overall multivariate gender difference and almost identical practical-work self-efficacy means for girls and boys. However, the explanatory structure differed: the set of predictors accounted for more variance in girls’ practical-work self-efficacy than in boys’, implying that the drivers of laboratory-related confidence may differ by gender even when average levels are similar. In chemistry education, Villafañe et al. (2014) similarly showed that chemistry self-efficacy patterns may vary across time and demographic subgroups, suggesting that self-efficacy should not be interpreted only as a fixed mean-level attribute. Related science education research also suggests that gender can shape the affective and motivational conditions surrounding self-efficacy: Kiran and Sungur (2012), for example, found no gender difference in science self-efficacy or strategy use, but reported gender differences in emotional arousal and interpersonal sources of efficacy. 
Collectively, these findings support the view that gender may influence how self-efficacy beliefs are organised, supported, and connected to broader learning processes, rather than producing a uniform “female lower/male higher” pattern.
Evidence that gender can condition the translation of ESE into outcomes is clearest in work using ESE scores across scientific disciplines and examining their relations with laboratory performance. For example, Kolil et al. (2023) reported that ESE was positively associated with laboratory scores in both groups, but the gender interaction differed in direction and magnitude across disciplinary contexts. In their chemistry group, the ESE × Gender interaction was positive (B = 0.213, β = 0.066, p = 0.027), indicating a stronger ESE-laboratory score relationship for males than for females. In their physics/biology group, however, the interaction was negative and somewhat larger in absolute size (B = −0.266, β = −0.090, p < 0.001), indicating a weaker relationship for males and a stronger one for females. Thus, gender was not a consistent direct predictor of laboratory performance, but it did condition how strongly ESE related to laboratory scores across disciplinary contexts (Kolil et al., 2023). In chemistry laboratory curriculum interventions, improvements in inquiry-related self-efficacy have also been observed for both men and women, indicating that well-designed laboratory experiences can raise experimental confidence across genders rather than amplifying gaps (Winkelmann et al., 2015). Taken together, the evidence indicates that gender differences in ESE are best understood as context- and model-dependent: mean differences are often small or absent, but gender may still matter through (i) differences in belief structure and (ii) differences in how efficacy beliefs connect to performance (Sezgintürk and Sungur, 2020; Kolil et al., 2023).
Psychometric network analysis operationalises this systems perspective by modelling ESE items as nodes and estimating edges as conditional associations (typically regularised partial correlations) between items after controlling for all others (Epskamp et al., 2018; Borsboom et al., 2021). Conceptually, this approach aligns with the idea that psychological attributes can emerge from direct interactions among components, rather than being explained solely by a latent common cause (Borsboom et al., 2021). For ESE in chemistry, this is theoretically meaningful: beliefs about safe handling, glassware confidence, procedural fluency, and interpretive competence are likely to co-activate during real practical work and may reinforce one another through repeated mastery experiences, feedback, and affective responses in the lab. Network models can therefore reveal which beliefs form tight clusters, which beliefs bridge across domains, and which conditional links represent key “routes” through which confidence may generalise from one aspect of laboratory work to another (Epskamp et al., 2018; Borsboom et al., 2021). These structural insights are largely inaccessible when items are collapsed into subscale totals.
A further justification is that network models can identify central and bridging components of ESE – beliefs that are highly connected within the network or that connect otherwise distinct clusters – thereby offering a principled way to locate potential leverage points for instructional design and intervention (Epskamp et al., 2018). This is particularly useful in chemistry laboratory contexts because small improvements in a central belief (e.g., confidence in handling hazards or apparatus) may plausibly cascade into other connected components (e.g., willingness to engage, persistence with procedures, or confidence in interpreting results). Network methodology also emphasises the need to evaluate the robustness of estimated edges and centrality indices, encouraging researchers to report accuracy/stability evidence (e.g., bootstrapped confidence intervals) rather than treating network features as fixed properties (Epskamp et al., 2018).
Importantly, psychometric network analysis strengthens gender-focused inquiry by moving beyond the question of whether male and female students differ in overall level to whether they differ in structure – that is, whether the organisation of efficacy beliefs differs by gender even when mean differences are mixed or small. The network comparison test (NCT) enables formal tests of group differences in (i) overall connectivity (global strength) and (ii) network structure (patterns of conditional associations), which directly matches the theoretical possibility that gender effects operate through different belief architectures rather than through simple mean gaps (van Borkulo et al., 2023). A psychometric network approach provides an empirically and theoretically grounded way to advance knowledge on ESE in chemistry by (a) mapping how specific confidence beliefs cohere under authentic laboratory demands, (b) identifying central and bridging beliefs that may be instructionally strategic, and (c) testing whether the organisation of those beliefs differs by gender – thereby offering explanations that complement, rather than duplicate, score-based comparisons (Epskamp et al., 2018; Borsboom et al., 2021; van Borkulo et al., 2023).
In chemistry laboratories, self-efficacy is rarely isolated to a single skill. Feeling capable of working safely may shape willingness to engage with uncertain outcomes; confidence in procedural control may influence whether students persist when results deviate from expectation; and confidence in interpreting evidence may determine whether students treat unexpected data as informative rather than as failure. A network perspective is therefore useful because it models ESE as an interconnected belief system rather than as independent subscale scores. Importantly, if the belief framework differs by gender, then instructional supports may need to be targeted differently; conversely, if the framework is invariant, then universal leverage points for strengthening ESE can be prioritised.
There is currently no consensus on the exact a priori power requirements or minimum sample sizes for cross-sectional psychological network models, as the required N depends on the number of nodes, network sparsity, and underlying edge weights (Epskamp et al., 2017, 2018; Hevey, 2018). Simulation work on Ising and Gaussian graphical models by Epskamp et al. (2018) indicated that the sample sizes typically used in psychological research can produce poorly recovered networks, whereas larger samples yield more accurate edge-weight estimates and more stable centrality indices. In their tutorial on network accuracy, Epskamp et al. (2018) varied sample size between 100 and 5000 and demonstrated that the correlation-stability (CS) coefficient for centrality increases as a function of N, recommending that CS values should not fall below 0.25 and preferably be above 0.50 before centrality differences are interpreted. In a related methodological paper, Epskamp et al. (2017) emphasised that network models often contain many parameters relative to typical psychological sample sizes and cautioned that results may depend heavily on modelling assumptions when N is small, whereas larger samples tend to yield more robust and accurate networks. Hevey (2018) similarly argued that larger samples produce more stable and interpretable networks and noted the limited empirical basis for conventional a priori power analyses for network models. Although these simulation and tutorial studies were conducted mainly in clinical and health-psychology contexts, they examined general statistical properties of regularised psychological network models for Likert-type variables rather than disorder-specific content (Epskamp et al., 2017, 2018; Hevey, 2018). 
The Experimental Self-Efficacy (ESE) items used in this study are likewise domain-specific psychological indicators (science/chemistry self-efficacy) measured on an ordinal response scale, so the same issues of parameter-to-sample-size ratio, sparsity, and centrality stability are applicable. In chemistry education research, large-scale survey studies typically involve a few hundred participants; accordingly, the present study drew on these domain-general methodological recommendations to justify a relatively large sample for a modest 12-node network, to consider the adequacy of the male and female subgroup Ns separately, and to interpret the smaller male network more cautiously.
(a) What is the network structure of ESE for male and female students?
(b) Which ESE items emerge as most central within each gender network?
(c) Are the male and female ESE networks invariant in overall structure and global strength?
The study site was a government-funded pre-university college that offers a science curriculum broadly aligned with the Cambridge Advanced Level (A-Level). English was the primary medium of instruction, and all examination questions were set in English. Students admitted to this college were typically high performers in the Sijil Pelajaran Malaysia (SPM) examination, which is comparable to the British General Certificate of Education Ordinary Level (GCE O-Level). After completing the one-year programme, most students progressed to STEM-related undergraduate degrees at universities in Malaysia and abroad. These representational demands are particularly relevant in Malaysia's pre-university chemistry context, where laboratory learning is structured, compulsory, and high-stakes (Matriculation Division, 2022a, 2022b). The pre-university programme in Malaysia requires students to complete practical work routinely (e.g., scheduled laboratory sessions and a set number of experiments each semester), with explicit expectations related to safe practice, correct technique, accurate measurement, error awareness, and evidence-based conclusions (Abdullah et al., 2022). In such settings, successful participation depends not only on conceptual knowledge but also on students’ confidence to plan, execute, and interpret experimental work under real constraints. Therefore, self-efficacy becomes pivotal in shaping whether students initiate tasks, invest effort, and persist when laboratory outcomes are ambiguous or do not match expectations.
The college was selected by convenience sampling, based on practical considerations such as travel distance and the feasibility of obtaining administrative approval. Malaysia has 15 government-funded pre-university colleges, each typically enrolling about 1500 to 3000 students per intake. The participating college usually admits between 1900 and 2400 science students per cohort and follows a centrally designed national STEM curriculum. Its students are competitively selected through a national merit-based matriculation process. For the science stream, applicants are typically required to obtain at least a credit in Bahasa Melayu, English, Additional Mathematics, Chemistry, and either Physics or Biology, a grade B in Mathematics, and a pass in History. Selection is then made on a meritocratic basis, considering academic merit, co-curricular achievement, and family income; the merit calculation allocates 90% to academic performance and 10% to co-curricular performance. Consequently, although the data formally came from a single institution, the context shares common curricular features with other colleges, so findings may be most transferable to similar pre-university settings; however, generalisation beyond this site requires replication.
The final overall sample of 655 pre-university students substantially exceeded the N = 359 used in Epskamp et al.’s (2018) work on PTSD and fell in the mid-to-upper range of sample sizes (100–5000) examined in their simulation studies. Because networks were estimated separately for males and females, the adequacy of these subgroup sizes was also considered. Epskamp et al. (2018) showed that regularised Gaussian graphical models can recover the general structure of sparse networks at N ≈ 100, with edge-weight and centrality estimates becoming increasingly stable as N increases. The female subsample (N = 476) lay in the middle of this simulated range, whereas the male subsample (N = 179) was closer to the lower bound but still above N = 100. For a 12-node network, this corresponded to a case-to-node ratio of roughly 40:1 for females and 15:1 for males, which compares favourably with examples and recommendations in the methodological literature (Epskamp et al., 2017, 2018; Hevey, 2018). Taken together, the overall N = 655 and the subgroup Ns (179 males, 476 females) placed this study in a range where prior work suggests that regularised psychological networks can be estimated with satisfactory accuracy and stability, while still warranting somewhat more cautious interpretation for the smaller male network.
The instrument development and administration process proceeded in four stages. First, the initial adapted ESE items were reviewed by three chemistry education subject matter experts on 5 December 2024 to gather evidence based on test content and identify wording revisions. Second, the revised pilot version was administered on 7 January 2025 to two randomly selected intact chemistry lecture groups, comprising 221 pre-university students, using a Google Form after their chemistry lecture session, with permission obtained from the respective lecturer to use approximately 10 to 15 minutes of class time. Third, response-process interviews with seven students were conducted on 17 January 2025 to identify any remaining wording difficulties. Fourth, the resulting final 12-item version was administered to the main sample from 21 to 25 April 2025 using a Google Form with eight randomly selected intact chemistry lecture groups that had not participated in the pilot phase. With permission from the respective lecturers, students completed the survey during the final 10 to 15 minutes of their chemistry lecture sessions. In both the pilot and main phases, students were asked to respond with reference to their current face-to-face chemistry laboratory experiences within the pre-university chemistry programme.
CU = conceptual understanding; LH = laboratory hazards; PC = procedural complexity; SR = sufficiency of resources.

| Item | Statement | Item-rest correlation (estimate) | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| CU1 | I believe I have a good understanding of the theory behind laboratory experiments before performing experiments. | 0.637 | 0.551 | 0.709 |
| CU2 | Experimental concepts become clearer to me as I perform the experiment. | 0.625 | 0.538 | 0.700 |
| CU3 | I am confident that I understand the chemical processes in the experiment. | 0.696 | 0.621 | 0.758 |
| LH1 | I can usually handle the glass apparatus in the laboratory on my own without any fear of breakage and injury. | 0.505 | 0.400 | 0.598 |
| LH2 | I am confident of working in the laboratory without chemical spillage. | 0.585 | 0.491 | 0.665 |
| LH3 | I am always alert in the laboratory and have minimal accidents. | 0.539 | 0.438 | 0.626 |
| PC1 | After an experiment, I have no difficulty figuring out how my calculation procedures and errors affected my results. | 0.713 | 0.641 | 0.772 |
| PC2 | When presented with laboratory results, I know how to interpret them and make correct conclusions from them. | 0.795 | 0.741 | 0.839 |
| PC3 | I do not struggle with processing information in background articles and relating them to my own laboratory procedures and results. | 0.763 | 0.702 | 0.813 |
| SR1 | I find it easy to complete the exercise in the laboratory even though there is limited personal participation in performing experiments. | 0.652 | 0.569 | 0.722 |
| SR2 | It is easy for me to understand theory and concepts properly in spite of limited availability of physical instruments. | 0.728 | 0.659 | 0.784 |
| SR3 | I do not find it challenging to understand an experiment even if there is only one try due to limited availability of chemicals. | 0.725 | 0.655 | 0.782 |
For the network analysis, each of the 12 ESE items was treated as a node, and an edge represented the unique association between two items after statistically controlling for the remaining items. There were no missing responses across the 12 ESE items; therefore, all cases were retained for network estimation. Networks were estimated for the full sample and separately for male and female students using Gaussian graphical models (GGMs), as implemented in the Network (Frequentist) module in JASP. In this framework, the network is based on partial correlations, meaning that an edge is retained only when two items remain associated after the other items in the network have been considered (Hevey, 2018).
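As a minimal illustration of what an edge represents in this framework, the following sketch converts a correlation matrix into partial correlations through its inverse (the precision matrix). It omits regularisation and is not the JASP implementation; the three-item correlation values and the function names are hypothetical and chosen only for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of items i and j given all others: -k_ij / sqrt(k_ii * k_jj)."""
    k = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical correlations among three items A, B, C.
# The A-C correlation (0.25) equals r_AB * r_BC, so it is fully explained by B.
corr = [[1.0, 0.5, 0.25],
        [0.5, 1.0, 0.5],
        [0.25, 0.5, 1.0]]
pcor = partial_correlations(corr)
```

In this toy case the A–C edge vanishes once B is controlled for, whereas the A–B edge remains, which is the sense in which an edge is "retained" only for unique associations.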
To avoid an overly dense network driven by trivial associations, the least absolute shrinkage and selection operator (LASSO) regularisation was applied. This procedure shrinks very small edge weights towards zero, producing a sparser and more interpretable network. The extended Bayesian information criterion (EBIC), as used in the EBICglasso estimator, was then applied to select the most parsimonious network, with hyperparameter γ = 0.5 to balance simplicity and edge retention. Because responses were recorded on a 5-point Likert scale, items were treated as approximately continuous, following common practice in applied network psychometrics, although this choice may influence edge estimates (Hevey, 2018; Huth et al., 2023). To address this, robustness analyses were conducted and small edges were interpreted cautiously (Hevey, 2018; Huth et al., 2023; Diaz-Milanes et al., 2024). In addition, the gender-specific networks were re-estimated using higher γ values (0.75 and 1.00) as a sensitivity check. These analyses produced identical weight matrices and network structures, indicating that the findings were robust to stricter regularisation.
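For reference, the EBIC is commonly written for Gaussian graphical models in the form below; the source does not spell out the formula, so the notation is an assumption. Here ℓ is the log-likelihood of the estimated precision matrix, |E| the number of non-zero edges, n the sample size, and p the number of nodes (12 in this study).

```latex
\mathrm{EBIC}_{\gamma} = -2\,\ell(\hat{\Theta}) + |E| \log n + 4\,\gamma\,|E| \log p
```

Under this form, γ = 0 reduces to the ordinary BIC, and larger γ penalises additional edges more heavily, which is why higher γ values serve as a stricter sensitivity check.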
Networks were visualised using a force-directed (Fruchterman–Reingold) layout, with thicker edges indicating stronger partial correlations. Node centrality was quantified using the indices available in JASP – strength, closeness, betweenness, and expected influence – to characterise how strongly and how efficiently each node is connected to the rest of the network (Hevey, 2018; Huth et al., 2023; Diaz-Milanes et al., 2024). Higher centrality values were interpreted as indicating items that may play a more influential role in the ESE network.
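To make the two weight-based indices concrete, the sketch below computes strength (sum of absolute edge weights) and expected influence (sum of signed edge weights) for each node. The four-node weight matrix is hypothetical and includes one negative edge purely to show where the two indices diverge; it does not reproduce the estimated ESE networks.

```python
def strength(weights):
    """Node strength: sum of absolute edge weights attached to each node."""
    return [sum(abs(w) for j, w in enumerate(row) if j != i)
            for i, row in enumerate(weights)]


def expected_influence(weights):
    """Expected influence: sum of signed edge weights attached to each node."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(weights)]


# Hypothetical symmetric weight matrix (partial correlations) for four nodes.
W = [[0.0, 0.3, -0.1, 0.0],
     [0.3, 0.0, 0.2, 0.1],
     [-0.1, 0.2, 0.0, 0.4],
     [0.0, 0.1, 0.4, 0.0]]

node_strength = strength(W)
node_ei = expected_influence(W)
```

When every edge is positive the two indices coincide, so they differ here only for the two nodes joined by the negative edge.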
The accuracy and stability of the estimated networks were evaluated using non-parametric bootstrapping procedures, following established guidelines for psychological network analysis (Epskamp et al., 2018; Hevey, 2018; Huth et al., 2023). Bootstrapped 95% confidence intervals were computed for each edge weight to assess estimation precision; narrower intervals were taken to indicate more reliable edges. Stability of centrality indices (especially node strength and expected influence) was assessed via case-dropping subset bootstrapping and summarised using the correlation-stability (CS) coefficient, which reflects the maximum proportion of cases that can be dropped while retaining a high correlation (r ≥ 0.70) between the original and subset-based centrality estimates (Epskamp et al., 2018; Hevey, 2018). In line with current recommendations, CS values of at least 0.25 were regarded as acceptable, with values of 0.50 or higher considered preferable for drawing substantive conclusions about central nodes (Epskamp et al., 2018).
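The logic of the case-dropping procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure as implemented in JASP: plain Pearson correlations stand in for the regularised partial-correlation network, the data are simulated, and the drop proportions, number of subsets, and all function names are hypothetical choices.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def node_strength(rows):
    """Assumption: plain correlations stand in for regularised partial correlations."""
    cols = list(zip(*rows))
    p = len(cols)
    return [sum(abs(pearson(cols[i], cols[j])) for j in range(p) if j != i)
            for i in range(p)]


def cs_coefficient(rows, drops=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7),
                   reps=50, r_min=0.7, prob=0.95, seed=1):
    """Largest drop proportion at which r >= r_min in at least `prob` of subsets."""
    rng = random.Random(seed)
    full = node_strength(rows)
    cs = 0.0
    for drop in drops:
        keep = round(len(rows) * (1 - drop))
        hits = sum(
            pearson(full, node_strength(rng.sample(rows, keep))) >= r_min
            for _ in range(reps)
        )
        if hits / reps >= prob:
            cs = max(cs, drop)
    return cs


# Simulated (not real) responses: five items driven by one factor, unequal loadings.
sim = random.Random(0)
loadings = [0.9, 0.8, 0.6, 0.4, 0.2]
data = []
for _ in range(200):
    f = sim.gauss(0, 1)
    data.append([l * f + (1 - l * l) ** 0.5 * sim.gauss(0, 1) for l in loadings])

cs = cs_coefficient(data)
```

The returned value is read against the thresholds described above: at least 0.25 as acceptable and 0.50 or higher as preferable.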
In addition to standard centrality indices, bridge expected influence (BEI) was computed to quantify which ESE items connect different domains. Clusters were pre-specified according to the four ESE domains (CU, LH, PC, SR). One-step BEI was operationalised as the sum of a node's edge weights linking it to nodes outside its own domain (Jones et al., 2021). Because all retained edges in the estimated networks were positive, BEI yields the same rank ordering as bridge strength, but retains an interpretation aligned with expected influence. To evaluate whether the estimated networks recover the hypothesised four-domain organisation, community detection was performed using the Louvain algorithm and modularity (Q) was computed for each gender-specific network (Blondel et al., 2008). Bridge indices and modularity results were used descriptively to strengthen the interpretation of cross-domain connectivity patterns.
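The two descriptive quantities in this paragraph can be sketched as below. One-step BEI follows the definition given above (sum of a node's edge weights to nodes outside its own domain). The modularity function evaluates Q for a given partition only and does not perform Louvain community detection; it assumes all weights are positive. The four-node matrix and domain labels are hypothetical.

```python
def bridge_expected_influence(weights, domains):
    """One-step BEI: sum of a node's edge weights to nodes in other domains."""
    n = len(weights)
    return [sum(weights[i][j] for j in range(n) if domains[j] != domains[i])
            for i in range(n)]


def modularity(weights, domains):
    """Weighted modularity Q of a fixed partition (assumes positive weights)."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    total = sum(degree)  # equals 2m for a symmetric matrix with zero diagonal
    q = 0.0
    for i in range(n):
        for j in range(n):
            if domains[i] == domains[j]:
                q += weights[i][j] - degree[i] * degree[j] / total
    return q / total


# Hypothetical network: two items per domain, strong within-domain edges.
W = [[0.0, 0.4, 0.1, 0.0],
     [0.4, 0.0, 0.05, 0.0],
     [0.1, 0.05, 0.0, 0.3],
     [0.0, 0.0, 0.3, 0.0]]
domains = ["CU", "CU", "LH", "LH"]

bei = bridge_expected_influence(W, domains)
q = modularity(W, domains)
```

A node with high BEI carries most of the cross-domain connectivity, while a higher Q indicates that edge weight is concentrated within the pre-specified domains.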
Before comparing networks by gender, the measurement invariance of the four-factor ESE model was evaluated using multi-group CFA. The configural model showed acceptable fit (CFI = 0.924, RMSEA = 0.059, SRMR = 0.056), supporting configural invariance across males and females. Imposing equality constraints on factor loadings (metric invariance) resulted in a small reduction in model fit (CFI = 0.910; ΔCFI = −0.014; ΔRMSEA = 0.001; ΔSRMR = 0.016), although the χ² difference test was non-significant (Δχ²(12) = 17.921, p = 0.118). Adding intercept constraints (scalar invariance) did not further degrade fit relative to the metric model (CFI = 0.918; ΔCFI = 0.008; ΔRMSEA = −0.005; ΔSRMR = 0.000; Δχ²(8) = 4.525, p = 0.807). Collectively, these results support configural invariance, indicating that the same four-factor ESE configuration was recovered across males and females. However, support for full metric invariance was only borderline, meaning that exact equality of item-factor loadings across gender was not clearly established. Because metric invariance concerns whether the strength of item-factor relations is similar across groups, incomplete loading equivalence limits stronger claims that individual items are functioning in the same way for males and females (Gregorich, 2006; Rocabado et al., 2020). Accordingly, the gender-based network comparisons in the present study are interpreted cautiously as descriptive comparisons of broad network organisation under the current operationalisation, rather than as definitive evidence that specific item-level contrasts reflect substantively different gender effects in ESE (Sass, 2011; Rocabado et al., 2020). In other words, the findings are interpreted mainly in terms of whether the overall arrangement of ESE relations appears broadly similar across groups, while recognising that stronger between-group claims would require stronger evidence of invariance.
To investigate gender differences in the structure of ESE, independent-group network comparison tests were conducted on the male and female networks. Permutation-based tests were used to examine (a) global strength invariance (i.e., whether the sum of absolute edge weights differed between groups) and (b) network structure invariance (i.e., whether the overall pattern of edge weights differed), mirroring procedures used in recent applications comparing symptom networks across gender (Zhang et al., 2024). Statistical significance was evaluated at α = 0.05, and differences in global strength and selected edge weights were reported to aid interpretation of any observed group differences (Hevey, 2018; Zhang et al., 2024).
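The logic of the two permutation tests can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented in the study: pairwise Pearson correlations stand in for the regularised Gaussian graphical model, the data are simulated, and all function names are hypothetical. Global strength is compared through the absolute difference in summed absolute edge weights, and network structure through the largest absolute edge-weight difference.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def correlation_edges(rows):
    """Stand-in estimator: Pearson correlations as edge weights.
    (Assumption: the study used regularised partial correlations instead.)"""
    cols = list(zip(*rows))
    edges = {}
    for i, j in combinations(range(len(cols)), 2):
        mi, mj = mean(cols[i]), mean(cols[j])
        si, sj = pstdev(cols[i]), pstdev(cols[j])
        cov = mean([(a - mi) * (b - mj) for a, b in zip(cols[i], cols[j])])
        edges[(i, j)] = cov / (si * sj) if si and sj else 0.0
    return edges

def nct_statistics(rows_a, rows_b):
    """Return (global strength difference, maximum edge difference)."""
    ea, eb = correlation_edges(rows_a), correlation_edges(rows_b)
    strength_diff = abs(sum(map(abs, ea.values())) - sum(map(abs, eb.values())))
    structure_diff = max(abs(ea[k] - eb[k]) for k in ea)
    return strength_diff, structure_diff

def permutation_nct(rows_a, rows_b, n_perm=200, seed=1):
    """Permutation p-values for global strength and network structure."""
    rng = random.Random(seed)
    obs_s, obs_m = nct_statistics(rows_a, rows_b)
    pooled = rows_a + rows_b
    hits_s = hits_m = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        s, m = nct_statistics(pooled[:len(rows_a)], pooled[len(rows_a):])
        hits_s += s >= obs_s
        hits_m += m >= obs_m
    return (hits_s + 1) / (n_perm + 1), (hits_m + 1) / (n_perm + 1)

# Simulated data: both groups come from the same process (no true difference).
data_rng = random.Random(0)

def simulate(n, n_items=3):
    rows = []
    for _ in range(n):
        shared = data_rng.gauss(0, 1)
        rows.append([shared + data_rng.gauss(0, 1) for _ in range(n_items)])
    return rows

p_strength, p_structure = permutation_nct(simulate(60), simulate(40))
```

Each p-value is the share of permuted data sets whose statistic is at least as large as the observed one, so it would be compared against α = 0.05 in the same way as described above.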
| Item | Overall (N = 655) | Female (N = 476) | Male (N = 179) |
|---|---|---|---|
| CU1 | 3.508 ± 0.820 | 3.481 ± 0.809 | 3.581 ± 0.847 |
| CU2 | 4.241 ± 0.766 | 4.267 ± 0.761 | 4.173 ± 0.778 |
| CU3 | 3.562 ± 0.814 | 3.534 ± 0.795 | 3.637 ± 0.859 |
| LH1 | 3.921 ± 0.945 | 3.901 ± 0.936 | 3.972 ± 0.968 |
| LH2 | 3.829 ± 0.901 | 3.813 ± 0.925 | 3.872 ± 0.835 |
| LH3 | 4.366 ± 0.760 | 4.399 ± 0.760 | 4.279 ± 0.757 |
| PC1 | 3.333 ± 0.847 | 3.309 ± 0.836 | 3.397 ± 0.877 |
| PC2 | 3.440 ± 0.803 | 3.437 ± 0.772 | 3.447 ± 0.881 |
| PC3 | 3.412 ± 0.845 | 3.403 ± 0.827 | 3.436 ± 0.893 |
| SR1 | 3.350 ± 0.952 | 3.311 ± 0.945 | 3.453 ± 0.967 |
| SR2 | 3.366 ± 0.871 | 3.326 ± 0.857 | 3.475 ± 0.901 |
| SR3 | 3.337 ± 0.937 | 3.313 ± 0.916 | 3.402 ± 0.992 |
Fig. 1 Gender-based comparison of network edge structure in Malaysian pre-university chemistry students.
In terms of edge structure, both networks were characterised by stronger within-construct than between-construct connectivity, producing clear clustering across the four ESE domains (CU, LH, PC, and SR). Supporting this pattern, the female network retained all 12 within-construct edges, whereas the male network retained 11 of 12 (with one within-PC edge shrunk to zero, i.e., between PC1 and PC3). Moreover, within-construct edges were substantially larger on average than cross-construct edges (female: M_within ≈ 0.278 vs. M_between ≈ 0.068; male: M_within ≈ 0.291 vs. M_between ≈ 0.070), indicating a modular organisation with strong internal coherence and comparatively weaker bridging ties. Practically, this means that students’ confidence beliefs tended to reinforce one another most strongly within the same ESE domain, whereas cross-domain links were present but comparatively limited. Although the female network was marginally denser overall, this should not be interpreted as materially greater between-domain connectivity, because cross-domain edge weights were very similar across gender and the later network comparison test indicated no significant difference in global structure.
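The edge counts above follow from the item layout: four domains of three items give 12 possible within-domain pairs out of 66 pairs in total. The sketch below enumerates these pairs and averages edge weights by type. It assumes that the means are taken over retained (non-zero) edges only, which the text does not specify, and it uses hypothetical toy weights rather than the study's estimates.

```python
from itertools import combinations

# Item labels follow the document's naming (CU1 ... SR3).
items = [f"{d}{i}" for d in ("CU", "LH", "PC", "SR") for i in (1, 2, 3)]
pairs = list(combinations(items, 2))
within = [(a, b) for a, b in pairs if a[:2] == b[:2]]
between = [(a, b) for a, b in pairs if a[:2] != b[:2]]

def mean_retained(pair_list, weights):
    """Mean weight over retained (non-zero) edges among the given pairs.
    Assumption: edges shrunk to zero are excluded from the average."""
    kept = [weights[p] for p in pair_list if weights.get(p, 0.0) != 0.0]
    return sum(kept) / len(kept) if kept else 0.0

# Toy weights (hypothetical, NOT the study's estimates).
toy = {
    ("CU1", "CU3"): 0.30,
    ("LH1", "LH2"): 0.40,
    ("CU2", "LH3"): 0.20,
    ("PC2", "SR3"): 0.10,
}
m_within = mean_retained(within, toy)
m_between = mean_retained(between, toy)
```

Averaging over all possible pairs instead, with absent edges counted as zero, would lower both means, so the choice of denominator matters when comparing such summaries across studies.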
In the female network (see SI, Appendix 1: Table A), the strongest and most interpretable retained edges were concentrated within the same ESE domains, particularly laboratory hazards and sufficiency of resources. Specifically, the strongest conditional associations were observed among laboratory hazards items – LH2–LH3 (0.408) and LH1–LH2 (0.407) – followed closely by strong cohesion within sufficiency of resources (SR1–SR2 = 0.404; SR2–SR3 = 0.373). Within-domain connectivity was also evident for conceptual understanding (CU1–CU3 = 0.348; CU2–CU3 = 0.315) and procedural complexity (PC2–PC3 = 0.318; PC1–PC2 = 0.260). Substantively, this pattern suggests that safety-related judgements such as avoiding spillage, handling apparatus safely, and remaining alert in the laboratory tended to cohere closely, as did confidence judgements related to coping with resource constraints. Likewise, the conceptual items clustered around sense-making during experiments, indicating that confidence in understanding theory, gaining clarity during experimentation, and understanding chemical processes tended to reinforce one another.
A comparable pattern emerged in the male network, as presented in Fig. 1 and Appendix 2: Table B (see SI), although the single strongest retained edge occurred within procedural complexity, where PC2–PC3 (0.522) was the largest connection in the entire network. Strong within-construct links were also observed for laboratory hazards (LH1–LH2 = 0.464; LH1–LH3 = 0.243; LH2–LH3 = 0.223), indicating that confidence in safe handling, avoiding spillage, and remaining alert tended to form a tightly coupled safety-technique cluster. Sufficiency of resources also showed coherent within-domain clustering (SR2–SR3 = 0.386; SR1–SR2 = 0.225; SR1–SR3 = 0.188), suggesting that confidence under limited participation, limited instruments, and one-try-only conditions was experienced as a connected set of resource-related judgements. Conceptual understanding remained internally cohesive as well, led by CU1–CU3 (0.368) and CU2–CU3 (0.180), indicating that confidence in understanding theory, gaining conceptual clarity during experimentation, and understanding chemical processes was organised as a sense-making cluster. The procedural complexity pattern is especially informative because PC1 and PC3 were both linked to PC2, but not directly to each other. This suggests that these two forms of procedural confidence may converge on a shared interpretive core rather than forming a single undifferentiated procedural block. Specifically, PC1 concerns understanding how calculation procedures and errors affected results, whereas PC3 concerns relating background information to one's own procedures and results; both judgements plausibly converge on PC2, which focuses on interpreting laboratory results and making correct conclusions. Once this shared interpretive component is considered, the direct PC1–PC3 relation may no longer remain. In this sense, PC2 appears to function as a local interpretive hub within the procedural complexity domain.
Beyond these within-construct clusters, a smaller number of cross-construct “bridge” edges helped connect the ESE domains (Fig. 1). In the female network, the largest cross-construct connection was CU2–LH3 (0.204), followed by bridges linking procedural complexity with resources (PC2–SR3 = 0.160) and conceptual understanding with procedural complexity (CU3–PC1 = 0.153; CU1–PC1 = 0.135; CU3–PC3 = 0.126). In the male network, the most prominent bridges included CU1–PC3 (0.155), CU2–LH3 (0.154), CU3–SR3 (0.152), and links involving hazards with resources/procedures (LH2–SR1 = 0.151; LH2–PC1 = 0.139). Although these cross-domain edges were smaller than the strongest within-domain edges, they remain important because they indicate where confidence in one domain may connect to confidence in another. Overall, these patterns show that while both gender networks are primarily organised around strong within-domain coherence, the retained cross-domain edges consistently connect conceptual understanding with the other ESE components, particularly procedural complexity and laboratory hazards.
Bridge expected influence (BEI) was then used to quantify which items most strongly connect domains. As shown in Table 3, in the female network, bridge connectivity was most prominent for procedural complexity items, with PC3 (BEI = 0.590) emerging as the strongest bridge node, followed by PC1 (0.456) and PC2 (0.448). These bridge effects reflected cross-domain links connecting procedural processing with conceptual understanding and constraint-related confidence (e.g., PC3–CU3 = 0.126; PC3–SR1 = 0.095; PC3–SR2 = 0.092; PC3–LH1 = 0.112). In the male network (Table 3), bridge connectivity was most prominent for CU3 (0.521) and LH2 (0.514), followed by CU1 (0.481) and PC3 (0.471). Notably, LH2 showed cross-domain links with resource, procedural, and conceptual nodes (e.g., LH2–SR1 = 0.151; LH2–PC1 = 0.139; LH2–CU1 = 0.130), whereas CU3 primarily linked conceptual understanding with procedural and resource-related confidence (e.g., CU3–SR3 = 0.152; CU3–PC1 = 0.117; CU3–PC2 = 0.114). These bridge patterns help explain why some item pairs are not directly connected despite belonging to the same broader ESE system: confidence judgements may relate to one another indirectly through local hubs or cross-domain connectors rather than through a single direct edge.
| Rank | Female node | Female BEI | Male node | Male BEI |
|---|---|---|---|---|
| 1 | PC3 | 0.590 | CU3 | 0.521 |
| 2 | PC1 | 0.456 | LH2 | 0.514 |
| 3 | PC2 | 0.448 | CU1 | 0.481 |
| 4 | CU1 | 0.446 | PC3 | 0.471 |
| 5 | CU3 | 0.444 | SR1 | 0.412 |

Note: Higher BEI = more cross-domain connectivity.
Because the network comparison test indicated no statistically significant gender differences in global structure or strength, these bridge patterns are interpreted descriptively as potential connective “routes” within each gender network rather than as evidence of robust between-group differences.
In the female network, CU3 was the most central item, exhibiting the highest strength (S = 1.430), alongside comparatively high betweenness (B = 1.992) and closeness (C = 1.614). The next highest-strength nodes were SR2 (S = 1.105), PC3 (S = 0.971), and PC2 (S = 0.852), indicating that these items contributed most strongly to overall connectivity. Conversely, CU2 showed the lowest strength (S = −1.764), followed by SR1 (S = −1.237) and SR3 (S = −0.953).
In the male network, LH2 emerged as the strongest hub, with the highest strength (S = 1.729), and it also showed the highest betweenness (B = 2.773) and closeness (C = 1.784). Other high-strength nodes were PC2 (S = 1.177) and CU3 (S = 0.980), followed by PC3 (S = 0.550) and CU1 (S = 0.511). The lowest strength values were observed for PC1 (S = −1.340), CU2 (S = −1.177), and LH3 (S = −1.165).
Descriptively, CU3 was the most prominent node in the female network and LH2 in the male network, while centrality patterns were otherwise broadly similar across gender. These differences in centrality were not statistically significant in the network comparison test and are therefore interpreted as exploratory patterns.
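The negative strength values reported above suggest that the centrality indices are standardised rather than raw sums. A minimal sketch of that reading follows; it assumes z-scores computed across nodes within a network using the sample standard deviation, which the text does not confirm, and it uses hypothetical toy weights and names.

```python
from statistics import mean, stdev

def standardised_strength(edges, nodes):
    """Node strength (sum of absolute edge weights) as within-network z-scores.
    Assumption: standardisation uses the sample SD across nodes."""
    raw = {n: 0.0 for n in nodes}
    for (a, b), weight in edges.items():
        raw[a] += abs(weight)
        raw[b] += abs(weight)
    values = list(raw.values())
    mu, sd = mean(values), stdev(values)
    return {n: (raw[n] - mu) / sd for n in nodes}

# Toy network (hypothetical weights, NOT the study's estimates).
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 0.4, ("A", "C"): 0.3, ("A", "D"): 0.2, ("B", "C"): 0.1}
z = standardised_strength(edges, nodes)
most_central = max(z, key=z.get)
```

On this reading, a negative value indicates below-average connectivity relative to the other nodes in the same network, not a negative sum of edge weights.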
A key contribution of the network approach in this study is not simply that items cluster into the four expected domains – an outcome already consistent with prior evidence supporting a four-factor structure – but that the network quantifies which specific items connect domains and therefore may represent plausible leverage points for broadening students’ ESE. Cluster detection recovered the four intended ESE domains exactly in both gender networks (female Q = 0.364; male Q = 0.345), indicating coherent domain organisation. However, BEI highlighted a small subset of items with disproportionately high cross-domain connectivity, providing more specific hypotheses for future intervention studies than domain-level scores alone. Against this structural backdrop, node centrality was examined to identify the most interconnected beliefs within each gender network.
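For readers unfamiliar with the modularity index Q reported above, the sketch below evaluates Newman's weighted modularity for a given partition. It does not implement the Louvain search itself; it only scores a partition that is supplied. The graph is a hypothetical toy example (two triangles joined by one edge), not one of the estimated ESE networks.

```python
def modularity(edges, community):
    """Weighted modularity: Q = sum_c [ L_c / m - (d_c / (2m))^2 ],
    where m is total edge weight, L_c the weight inside community c,
    and d_c the summed weighted degree of nodes in c."""
    m = sum(edges.values())
    degree = {n: 0.0 for n in community}
    inside = {}
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
        if community[a] == community[b]:
            inside[community[a]] = inside.get(community[a], 0.0) + weight
    q = 0.0
    for c in set(community.values()):
        d_c = sum(d for n, d in degree.items() if community[n] == c)
        q += inside.get(c, 0.0) / m - (d_c / (2 * m)) ** 2
    return q

# Toy graph: two triangles joined by a single edge, unit weights.
edges = {
    (1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0,
    (4, 5): 1.0, (4, 6): 1.0, (5, 6): 1.0,
    (3, 4): 1.0,
}
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {n: "all" for n in two_groups}

q_two = modularity(edges, two_groups)  # 5/14, about 0.357
q_one = modularity(edges, one_group)   # 0 for a single community
```

A partition that places every node in one community scores Q = 0, so positive values indicate more within-community edge weight than expected from node degrees alone.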
Centrality estimates (strength) indicated that CU3 (“I am confident that I understand the chemical processes in the experiment”) was the most central node in the female network, whereas LH2 (“I am confident of working in the laboratory without chemical spillage”) was the most central node in the male network. Descriptively, this suggests that conceptual process understanding may function as a key organising belief among females, while spillage-related safety/technique confidence may be a salient hub among males. However, these contrasts should be interpreted as exploratory because centrality is sensitive to sampling variability and should not be over-interpreted without robust stability evidence (Epskamp et al., 2018).
Permutation-based network comparison provided no evidence of gender differences in overall network structure or global strength, indicating that the conditional association framework of ESE appears broadly similar across male and female students in this Malaysian pre-university sample. Given the borderline evidence for full metric invariance, this result is interpreted as no evidence of network differences under the current operationalisation, rather than as definitive evidence that the underlying construct is fully invariant across gender. This framing reinforces the value of formal network comparison and measurement evaluation, rather than relying on visual inspection of apparent network differences (van Borkulo et al., 2023).
The dominance of within-domain edges across both networks suggests that students hold tightly coupled confidence beliefs within each domain – understanding chemical processes (CU), managing hazards (LH), navigating procedural demands (PC), and coping with resource constraints (SR). This pattern supports the conceptualisation of ESE as task- and context-specific judgements about capability in authentic laboratory conditions, consistent with self-efficacy theory (Bandura, 1986, 1997). From a network psychometrics standpoint, such clustering is expected when items reflect strongly related components of a broader competence perception (Hevey, 2018; Borsboom et al., 2021).
Although fewer and generally smaller, cross-domain connections were theoretically meaningful and were concentrated in a limited set of bridge items. In the female network, BEI was highest for PC3 (0.590), followed by PC1 (0.456) and PC2 (0.448), indicating that procedural complexity beliefs – especially confidence in connecting preparatory information to procedures and outcomes – were most strongly positioned to link domains (e.g., PC3–CU3; PC3–LH1; PC3–SR1/SR2). In the male network, BEI was highest for CU3 (0.521) and LH2 (0.514), suggesting that conceptual process understanding and spillage-related technique/safety confidence were the most prominent cross-domain connectors (e.g., CU3–SR3; CU3–PC1/PC2; LH2–SR1; LH2–PC1; LH2–CU1). Taken together, these bridge patterns reinforce the practical reality of chemistry laboratory work: students must align conceptual reasoning with procedural execution while managing hazards and constraints (Villafañe et al., 2016; Avargil, 2019).
The emergence of CU3 (female) and LH2 (male) as prominent hubs is consistent with the idea that different belief elements can “anchor” a student's broader confidence system. However, centrality should be interpreted cautiously because (i) it does not imply causality, and (ii) some centrality indices (especially closeness and betweenness) are less stable in estimated psychological networks, motivating the emphasis on strength and expected influence supported by bootstrap accuracy and case-dropping procedures (Epskamp et al., 2018; Hevey, 2018). Accordingly, these node-level contrasts are best treated as hypothesis-generating.
In addition to CU3, SR2 also appeared relatively central in the female network. Descriptively, this suggests that confidence in understanding chemistry under resource-constrained laboratory conditions may have been more strongly integrated into the broader ESE system for female students. In practical terms, this may indicate that for female students, resource-related confidence was tied not only to perceived adequacy of materials, but also to their broader sense of conceptual functioning in the laboratory. However, this pattern should be interpreted cautiously because the network comparison test did not indicate statistically significant gender differences in centrality.
The present results align with the wider literature showing that gender differences in ESE are not consistently observed and may depend on population, learning design, and how laboratory confidence is conceptualised and measured (Winkelmann et al., 2015; Wahyudiati et al., 2020). At the same time, studies have reported gender differences under particular conditions – for example, when laboratory confidence is examined alongside affective variables such as laboratory anxiety, or in specific teacher-education contexts (Kırbaşlar et al., 2015). The current findings, therefore, extend prior work in two ways: (i) they show small gender differences at the mean level, and (ii) they demonstrate that gender similarity also appears at the level of belief architecture, with comparable within-domain organisation and bridging patterns, consistent with the modularity and BEI results reported above (Borsboom et al., 2021; van Borkulo et al., 2023).
The lack of detectable gender differences in network structure or global strength may reflect several plausible mechanisms:
(a) Shared laboratory ecology and standardised expectations. In a high-stakes pre-university setting, students often experience similar laboratory routines, assessment demands, and structured practical formats, which may reduce the formation of divergent confidence pathways by gender (Abdullah et al., 2022; Matriculation Division, 2022a, 2022b).
(b) Task-specific measurement attenuates broad gender generalisations. Because ESE is measured as confidence in clearly defined practical-work demands (e.g., understanding processes, avoiding spillage, interpreting results under constraints), gender differences that sometimes appear in broader “science confidence” measures may be less pronounced here (Bandura, 1997; Villafañe et al., 2016).
(c) Borderline metric invariance encourages cautious inference. Multi-group CFA supported configural invariance but showed only borderline evidence for metric invariance, which reinforces the interpretation of gender comparisons primarily as overall patterning rather than as strong claims about differences in measurement functioning.
Within social cognitive theory, self-efficacy refers to judgements of capability to organise and execute actions needed to attain goals, and these judgements are shaped through learning experiences in specific contexts (Bandura, 1986, 1997). The current network findings complement this perspective by illustrating that ESE is not merely “high or low,” but a system of mutually connected capability beliefs across conceptual, procedural, safety, and constraint-related facets of laboratory work. In network terms, clusters reflect tightly linked belief components, while bridging edges suggest how certain beliefs (notably conceptual understanding) may coordinate confidence across domains (Hevey, 2018; Borsboom et al., 2021). This provides a theoretically grounded basis for thinking about laboratory support: targeted strengthening of strategically connected beliefs may help reinforce the broader ESE system.
The present findings suggest several tentative pedagogical considerations for chemistry laboratory teaching and student support; however, these should be interpreted cautiously because the current network analysis is cross-sectional and does not directly test instructional effects:
(a) Support conceptual sense-making during experiments. The recurrent cross-domain involvement of conceptual-understanding items suggests that helping students connect observations, procedures, and underlying chemical processes may be a reasonable focus for laboratory support (Villafañe et al., 2016; Avargil, 2019). For instance, prompts that explicitly link procedure → observation → chemical process → conclusion may be worth examining in future instructional designs, although the present findings do not establish causal effects.
(b) Treat spillage-related confidence as an exploratory area for targeted support. The prominence of LH2 in the male network suggests that confidence related to avoiding chemical spillage may be one potentially connected aspect of laboratory self-efficacy (Towns et al., 2015; Kolil et al., 2020). However, because node-level contrasts were not statistically significant in the network comparison test, this pattern is best treated as exploratory rather than as evidence of a confirmed instructional entry point.
(c) Examine interpretation support under constrained conditions. Bridge connections involving procedural complexity and resource-related confidence suggest that students’ confidence in interpreting laboratory work may be linked to how they experience limited time, limited equipment, or limited opportunities to repeat experiments. Accordingly, scaffolds that support interpretation and planning under such constraints may warrant further examination in future intervention studies (Towns et al., 2015; Kolil et al., 2020).
Several limitations should frame interpretation. First, the networks are cross-sectional; edges represent conditional associations and do not imply causal direction (Hevey, 2018; Borsboom et al., 2021). Second, small edges showed wide and overlapping confidence intervals, so fine-grained ranking of weaker connections should be avoided (Epskamp et al., 2018). Third, the male subsample was smaller, which may reduce sensitivity for subtle group differences. Finally, because multi-group CFA showed borderline evidence for full metric invariance, gender comparisons should be interpreted cautiously as reflecting group patterning under this operationalisation; replication with explicit partial invariance testing and/or item-level non-invariance modelling would strengthen inference. Future research could (i) use longitudinal or intervention designs to examine whether strengthening conceptual sense-making changes downstream procedural/safety/resource confidence, (ii) compare networks across institutions with different laboratory infrastructures, and (iii) test moderators more proximal than gender (e.g., prior laboratory exposure, laboratory anxiety, or perceived instructional support), consistent with prior evidence that gender effects can become clearer in affect-linked measurement contexts (Kırbaşlar et al., 2015; Sezgintürk and Sungur, 2020).
Across both gender groups, the estimated networks showed a coherent, domain-aligned structure: the most prominent connections occurred within each ESE domain, while fewer and smaller cross-domain links connected domains. In practical terms, this pattern suggests that students’ confidence in laboratory work is not a single undifferentiated belief, but a clustered system where capability judgements cohere most strongly around related demands (e.g., procedural demands with procedural demands; hazard-related demands with hazard-related demands). At the same time, the networks retained several interpretable bridge connections, with conceptual understanding – especially confidence in understanding chemical processes – appearing repeatedly as a linking component that connects to procedural and safety-related confidence, indicating a plausible coordinating role for sense-making within the overall ESE system.
At the node level, strength centrality suggested that CU3 (“I am confident that I understand the chemical processes in the experiment”) was most central in the female network, whereas LH2 (“I am confident of working in the laboratory without chemical spillage”) was most central in the male network. However, these contrasts should be treated as exploratory rather than definitive gender differences. The network comparison tests indicated no evidence of gender differences in overall network structure or global strength, and neither individual edges nor centrality indices differed significantly between groups. Moreover, robustness analyses reinforced a cautious interpretive stance: stronger edges were estimated more precisely, whereas many smaller edges had wide and overlapping confidence intervals, and closeness/betweenness showed lower stability, motivating an interpretation that prioritises strength (and expected influence where relevant).
Collectively, these findings contribute to knowledge in three main ways. First, they show that for Malaysian pre-university chemistry students, gender similarity is evident not only in overall patterns but also in the architecture of ESE – the way confidence components cohere within domains and connect across domains – providing the type of insight that network psychometrics is designed to offer. Second, the findings foreground conceptual sense-making as a potentially strategic component for supporting broader laboratory confidence, given its recurring bridging role across domains. Third, the results offer a practical reframing of ESE: rather than treating ESE as simply “high” or “low,” the network perspective supports viewing chemistry practical-work self-efficacy as a mutually reinforcing system of capability beliefs, suggesting that targeted support aimed at strategically connected beliefs may have wider benefits across the ESE network.
The study also suggests plausible areas of focus for chemistry laboratory teaching and student support. Instructional designs that prioritise linking observation to explanation may strengthen confidence beyond conceptual understanding alone, particularly when prompts explicitly connect procedure → observation → chemical process → conclusion. In addition, the prominence of spillage-related confidence (LH2) in the male network suggests that technique-focused supports – such as structured practice for safe transfer/pouring and feedback on handling – may function as an accessible “micro-skill” entry point for strengthening broader laboratory confidence. Finally, bridges involving procedural complexity and resources indicate that confidence in interpreting results may be intertwined with perceived constraints; therefore, scaffolds for interpretation and planning routines may be especially valuable in contexts where time, instruments, or opportunities to repeat experiments are limited.
In conclusion, this study provides evidence that ESE among Malaysian pre-university chemistry students is organised as a structured, interconnected system of laboratory confidence beliefs, and that this system appears broadly similar across gender in both overall connectivity and structure. These patterns may inform future longitudinal or intervention studies by identifying confidence components that appear structurally connected within the ESE network; however, they should not be interpreted as direct evidence that targeting these components would causally strengthen broader laboratory confidence.
Supplementary information (SI) contains additional gender-specific Experimental Self-Efficacy (ESE) network analysis outputs, including edge-weight matrices for the female and male networks, item-level centrality indices for both networks, and robustness plots showing bootstrapped edge-weight confidence intervals and centrality stability curves. See DOI: https://doi.org/10.1039/d6rp00083e.
This journal is © The Royal Society of Chemistry 2026