Lilian Daniala, Jenna Koenenb and Rüdiger Tiemann*a
aChemistry Education Research, Chemistry Department, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany. E-mail: ruediger.tiemann@hu-berlin.de
bChemistry Education, Department of Educational Sciences, School of Social Sciences and Technology, Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
First published on 2nd September 2025
Critical thinking (CT), the active reflection upon one's experience and knowledge while searching for necessary information through inquiry, is a fundamental competency in science education. Transitioning science teaching from passive rote learning to emphasizing CT skills is essential for promoting inquiry-based learning and scientific argumentation. However, fostering and assessing CT within scientific inquiry and laboratory-based learning environments continues to present significant challenges. This study examined the impact of a modified laboratory manual (LM) integrating cognitive prompts designed to enhance CT skills and dispositions in an undergraduate physical chemistry laboratory course. Using a mixed methods approach with pre- and post-experimental design, we assessed CT outcomes with the California Critical Thinking Disposition Inventory (CCTDI) and the California Critical Thinking Skills Test (CCTST), supplemented by open-ended questionnaires and semi-structured interviews with both teaching staff and students to evaluate perceptions of the intervention. Participants included 31 second-year undergraduate students randomly assigned to either an experimental group (n = 11) that used the CT-focused modified LM or a control group (n = 20) that followed the traditional LM. Results showed no observable differences between groups on the CCTST. However, a statistically significant decrease was observed in the control group's CT dispositions, both in the overall CCTDI score and in four of seven subscales, while the experimental group maintained their CT dispositions. The four affected subscales aligned specifically with the modifications’ objectives, while the remaining three were unrelated to the original LM and course objectives. Qualitative findings from interviews corroborated these results, indicating that the targeted modifications effectively sustained and enhanced CT dispositions in undergraduate laboratory settings. The study highlights the importance of incorporating CT through structured learning activities in undergraduate science education to maintain student engagement and CT dispositions, while promoting higher-order thinking skills.
An emerging body of literature emphasizes the importance of integrating CT into educational curricula, particularly in higher education (Dominguez, 2018; Zahavi and Friedman, 2019; Cruz et al., 2021; Andreucci-Annunziata et al., 2023), and adapting teaching methods to better foster CT skills within specific professional contexts (Niu et al., 2013; Dumitru et al., 2018). Despite the growing emphasis on developing CT skills in undergraduate (UG) science education, significant gaps remain in understanding how laboratory-based interventions can effectively foster CT (Danczak et al., 2017; Bowen, 2022). Existing research has predominantly focused on: (1) school-level chemistry education (e.g. van Brederode et al., 2020), (2) variations in CT definitions and perceptions (e.g. Danczak et al., 2017; Bowen, 2022), and (3) redesigning curricula involving changes in instructional strategies (e.g. Abrami et al., 2008; Andreucci-Annunziata et al., 2023). Comparatively few studies have investigated the integration of CT instruction into undergraduate laboratory courses (van Brederode et al., 2020), particularly within the unique constraints of undergraduate chemistry education (UGCE).
This study aims to address this gap by designing and evaluating an intervention that incorporates CT-promoting instructional strategies into undergraduate physical chemistry laboratory manuals (LM). By examining CT skills, dispositions, and both instructor and student perspectives, this research contributes to advancing evidence-based laboratory instruction models for chemistry educators.
RQ1: To what extent do the modifications, including cognitive prompts, affect students' CT skills and dispositions compared to traditionally designed LM?
RQ2: How do students and teaching staff (TS) view the modifications?
The intervention was designed using three key theoretical foundations: CT principles (Ennis, 1985; Facione, 1990), cognitive load theory (Sweller et al., 2011), and meaningful learning (Ausubel et al., 1978). These frameworks guided two specific pedagogical features (as detailed in the methods section): (1) the integration of cognitive prompts (Galloway and Bretz, 2015; Rodriguez and Towns, 2018) and (2) prior knowledge activation (Merrill, 2002; Galloway and Bretz, 2015). This design aligns with the framework for pre-laboratory instruction proposed by Agustian and Seery (2017), which emphasizes the role of supportive information in preparing students for complex learning environments such as laboratory settings.
Additionally, the study adopts the immersion approach to CT instruction (Ennis, 1989), whereby CT is fostered implicitly through the design of the instructional material, rather than explicitly taught. This method minimizes potential biases introduced by variations in teaching strategies or the instructors’ views of CT. Collectively, these theoretical perspectives establish the foundation of the study's approach to support CT in an undergraduate physical chemistry laboratory course.
The APA-Delphi report frames CT as comprising both abilities (a set of cognitive skills) and dispositions (such as personal traits, motivations, and emotional intelligence), with these dimensions being closely related (Facione et al., 1994; Facione, 2000; Dominguez, 2018; Dumitru et al., 2018). This definition remains valid and widely used today (Andreucci-Annunziata et al., 2023); within it, CT dispositions represent the consistent internal motivation to employ CT skills (Facione, 2000). Both CT skills and dispositions should be fundamental outcomes of undergraduate education (Jones et al., 1995).
The NRC (2012) highlights that well-designed laboratory experiences can promote students’ competencies in scientific practices, including experimental design, CT, and scientific argumentation. Traditional verification-based laboratories, however, dominate UGCE, often limiting intellectual autonomy and reinforcing lecture content without deepening conceptual understanding (Weaver et al., 2008). Undergraduate Research Experiences (UGRE) offer students hands-on immersion in research, fostering deeper learning, skills development, and positive impacts on career pathways. However, they are typically available only to advanced students, missing critical early engagement opportunities (Nagda et al., 1998; Russell et al., 2007).
Transitioning high achievers to inquiry-based labs can be challenging, but interestingly, middle and lower-performing students often excel in such environments, showing the potential benefits of earlier inquiry integration (Weaver et al., 2008). NRC (2012) suggests that early inclusion of inquiry elements can enhance UGCE by improving students’ self-efficacy and skill in scientific practices, though cognitive overload and insufficient metacognitive support in undergraduate chemistry laboratories (UGCL) can drive students toward rote learning and memorization, compromising meaningful learning (Agustian, 2022).
The NRC report also notes that experts often overlook the novice perspective due to an “expert blind spot”, which can hinder effective guidance. Additionally, complex laboratory tasks can overwhelm working memory, especially when prior knowledge is insufficient (Galloway and Bretz, 2015). Furthermore, inconsistent alignment of laboratory goals across UGCL divisions – particularly regarding lecture connections, communication skills, and uncertainty analysis – underscores the need for clearer objectives and faculty engagement (Bruck et al., 2010). Despite these challenges, UGCL retains significant potential for developing higher-order thinking, such as CT (Agustian et al., 2022).
While metacognitive strategies can complement learning and self-assessment (e.g., Quintana et al., 2005; White and Frederiksen, 2005), this study focuses on cognitive prompts for two main reasons: (1) cognitive prompts alone or combined with metacognitive prompts have fostered successful learning (Berthold et al., 2007), and (2) limiting tasks to context-relevant cognitive prompts reduces working memory load while highlighting key learning aspects, as recommended by the NRC DBER (2012) report.
Meaningful learning in UGCL, as outlined by Galloway and Bretz's (2015) adaptation of Ausubel et al.'s (1978) framework, occurs if three conditions are met: (i) relevant prior knowledge for anchoring new information; (ii) re-organized learning materials connecting prior and new knowledge; and (iii) students’ conscious effort to make meaningful connections (metacognitive awareness). When any of these conditions is inadequately addressed, students may resort to rote memorization (Grove and Bretz, 2012), as supported by Ausubel and Novak's Assimilation Theory, which emphasizes the central role of the learner in constructing knowledge. These conditions are central to Agustian and Seery's (2017) framework, which applies Cognitive Load Theory (Sweller et al., 2011) to optimize laboratory preparation and learning. The framework addresses both supportive information (high-complexity: conceptual understanding and underlying principles) and procedural information (low-complexity: task-specific, step-by-step guidance) in the LM, allowing students to construct a meaningful mental schema for the experiment.
Given the necessity of the development of CT, this study investigates how modified LMs incorporating cognitive prompts and prior knowledge activation affect CT skills and dispositions in the context of undergraduate physical chemistry laboratory courses. To avoid potential biases from differing CT definitions among TS and students (Danczak et al., 2017; Bowen, 2022) and instructional variations between experimental and control groups, this study employs Ennis’ (1989) implicit immersion approach to CT, as detailed in the following section.
During each lab session, students work in pairs or groups of three to conduct experiments while making real-time methodological decisions. They evaluate procedure adequacy and data validity and decide whether measurements need repetition, justifying their choices using course content. The course runs twice a week and includes 16 fixed experimental stations, with each student group performing a different pre-set experiment per session before rotating through all units. After concluding the experiment, students submit written lab-reports that present their results and analyses based on the guidelines provided in the LM.
The overall aims of the course, which are expected by the TS, are (a) to familiarize students with experimental methods, tools, processes, calculations and data analysis that are central to physical chemistry, and (b) to enhance their ability to apply and justify theoretical knowledge in practical decision-making within the lab setting.
– Block A was assigned as the control group (n = 20: 12 males, 8 females) using the original LM; and,
– Block B was assigned as the experimental group (n = 11: 7 males, 4 females) using the modified LM.
To minimize bias, neither students nor TS were informed about the group distinction in the research design.
The prompts were integrated to challenge students to reflect on their conceptual understanding of the underlying chemical and experimental principles, make experimental predictions, justify procedures, and evaluate their outcomes against theoretical concepts. This aligns with Agustian and Seery's framework, which proposes that effective pre-laboratory preparation must address both supportive information (high-complexity: conceptual understanding and underlying principles) and procedural information (low-complexity: task-specific, step-by-step guidance) as an integral part of the preparation and the LM, helping students construct a meaningful schema.
The intervention aimed to reduce cognitive overload (Galloway and Bretz, 2015) by addressing extraneous and germane cognitive load (Sweller et al., 2011) through two core modifications: (1) prior knowledge activation – helping students organize conceptual schemas before entering the lab, and (2) cognitive prompts – directing their attention to critical decision points during the experiment (see details about the modifications below). By supporting cognitive processing in this way, the intervention sought to enhance students’ capacity to apply CT during laboratory work without overwhelming their working memory.
In addition, the modifications met the following characteristics essential for higher education (van Brederode et al., 2020): (1) suitable for the undergraduate level; (2) promoting CT in a non-inquiry lab; (3) optional for students, with no additional staff burden; (4) transferable across the curriculum; (5) resilient to variations in TS turnover, styles, evaluation practices, and syllabi; (6) sustainable and independent of the presence of the researcher; and (7) low-cost and easy to implement. Unlike in school-based education (van Brederode et al., 2020), these characteristics are even more important to maintain in the context of higher education and UGCL.
1. “Activation” of prior knowledge: in a pre-study involving surveys and informal discussions with students and TS, students often complained that the required prior knowledge section in the LM, one of the crucial parts of their preparation, was ambiguous, leaving them uncertain and anxious. Therefore, the first effective learning principle, “activation” (Merrill, 2002), was applied by reformatting learning goals into operational descriptions. This clarification of the learning objectives aligns with Galloway and Bretz's (2015) first condition for meaningful learning.
2. Gained knowledge and skills: a section categorizing expected learning and competency outcomes was added to the LM in three domains: content knowledge, laboratory skills, and data analysis skills. This addressed students’ difficulties in connecting individual experiments to broader contexts, a concern also noted by laboratory supervisors.
These first two modifications support transfer of learning (Sousa, 2011) by:
– facilitating “transfer during learning” (prior knowledge application), and,
– enabling “transfer of learning” (new knowledge application).
According to the NRC (2012), prior knowledge plays a crucial role, either enhancing new learning through positive transfer or hindering it through negative transfer. Meaningful learning (Ausubel et al., 1978) involves connecting new information to existing knowledge, helping students create a coherent framework for future use. In contrast, rote memorization often results in isolated facts that lack the connections necessary for immediate application (transfer during learning) and long-term retention across courses (transfer of learning).
These components of the modification align with the second guiding principle presented by Seery and colleagues (2024) to ensure coherence with intended learning goals for students and TS.
3. Adding prompts: the LM layout was transformed from portrait to landscape orientation and split into two parts. The left side (about two-thirds of the document's width) contained the original LM with the first two modifications, while the right side added prompts parallel to the relevant text (see Fig. 1). This spatial proximity follows the Theory of Multimedia Learning (Mayer, 2014) and Cognitive Load Theory (Sweller et al., 2011; NRC, 2012). The prompts help students relate theoretical knowledge to experimental practices and activate their critical thinking about the information and procedures, as well as their significance to the broader context of the curriculum (Galloway and Bretz, 2015). The prompts aimed to:
Fig. 1 The original LM (left) and modified LM (right) – prior knowledge “activation” (marked in green), gained knowledge and skills (marked in yellow), and prompts to foster CT (marked in red).
– link theoretical knowledge and experimental practices;
– anticipate results for better experimental judgments;
– justify procedures and consider alternatives;
– connect the obtained data to the activity aim and the theoretical background; and,
– enable generalization and transferability.
This modification aligns with Seery et al.'s (2024) third guiding principle, which advocates incorporating pre-laboratory activities in the form of information, prompts, or questions so that students are better prepared for learning in a complex and cognitively demanding environment, and with Rodriguez and Towns's (2018) emphasis on meaningful experimental procedure and data interpretation.
It is important to note that this study followed Ennis's (1989) immersion approach, in which CT is fostered implicitly, without explicit instruction or participant awareness of the CT focus. This approach was favored to minimize potential biases from TS's and students' perceptions and interpretations of CT (Danczak et al., 2017; Bowen, 2022); such effects were found to be central in influencing instruction and students' CT abilities in data interpretation (van Brederode et al., 2020). In line with Rodriguez and Towns (2018), our approach was to foster CT without requiring a full redesign of the existing LM, ensuring that students in the control and experimental groups received the same instructions and explanations. Fig. 1 illustrates the original vs. modified LM versions with the modifications marked in colors.
CCTST and subscales | Description |
---|---
Overall – 34-point scale | Measures the reasoning skills used in the process of reflectively deciding what to believe or what to do. |
Analysis – 7-point scale | Ability to identify assumptions, reasons, and claims, and to examine how they interact in the formation of arguments. The tool relies on using charts, graphs, diagrams, spoken language, and documents to gather information. |
Inference – 16-point scale | Drawing correct conclusions from reasons and evidence, in offering thoughtful suggestions and hypotheses. |
Evaluation – 11-point scale | Credibility assessment of sources of information and the claims by providing the evidence, reasons, methods, criteria, or assumptions behind the claims and conclusions. |
Induction – 17-point scale | Decision-making that relies on inductive reasoning to form a confident, though not certain, basis for belief in conclusions and a reasonable foundation for action.
Deduction – 17-point scale | Decision-making in well-defined contexts relies on deductive reasoning, which uses established rules, principles, and core beliefs to arrive at conclusions with certainty. It starts with accepted premises for forming a conclusion, with no room for uncertainty. |
CCTDI and subscale | Description |
---|---
Overall – 420-point scale | Assesses individual beliefs, expectations, intentions, values, and perceptions regarding critical thinking based on agree-or-disagree items. |
Truth-seeking | The habit of always desiring the best possible understanding of any given situation; it is following reasons and evidence wherever they may lead, even if they lead one to question cherished beliefs. |
Open-mindedness | Tolerance toward the opinions of others, even if we do not agree, knowing that often we all hold beliefs which make sense only from our perspectives. |
Analyticity | Being alert to what happens next and striving to anticipate both the good and the bad potential consequences or outcomes of situations, choices, proposals, and plans.
Systematicity | Approaching problems in a disciplined, orderly, and systematic way. |
Confidence in reasoning | Trusting reflective thinking to solve problems and to make decisions. |
Inquisitiveness | Being curious and eager to acquire new knowledge and to learn the explanations for things, even when the applications of that new learning are not immediately apparent. |
Maturity of judgment | Seeing the complexity of issues and yet striving to make timely decisions even in the absence of complete knowledge. |
For RQ2, the qualitative data were collected through open-ended questionnaires administered to the control and experimental groups at the end of the course, and semi-structured interviews conducted with the instructor and two students from the experimental group after completing the modified learning activities. The interviews with the students allowed for a deeper insight into the open-ended responses and for triangulation of findings across different data sources.
All assessments and interviews were conducted outside regular teaching hours and in the absence of TS to maintain independence from the instructional environment. Scores on the pre- and post-tools, and verbal and written responses were anonymized and not disclosed to the TS until after the conclusion of the course and final grading. This ensured the independence of research data from instructional and grading processes. All tools and raw data were only accessible to the research team members. These measures were implemented to uphold ethical standards, mitigate potential conflicts of interest arising from the researchers’ dual roles, and ensure the validity and fairness of the study.
To address RQ2 regarding participant perspectives, qualitative analysis of open-ended questionnaire responses was used, based on the principles of Constructivist Grounded Theory (CGT) (Charmaz, 2006). This methodology was chosen as it facilitates the exploration of open-ended data while acknowledging the researcher's role in co-constructing meaning alongside participants (Bowen, 2022). The analysis considered, grouped, and reported all written responses. One student from the control group left the questions unanswered, resulting in 19 responses in the control group and 11 in the experimental group, as reported in the Results section. Responses were grouped based on recurring ideas and categories that emerged inductively from the data.
Similarly, the qualitative data from the semi-structured interviews were subject to inductive thematic analysis concerning RQ2. Consistent with CGT, the interpretations are presented not as objective truths but as co-constructed insights shaped by both participant narratives and the researcher's analytical perspective (Charmaz, 2006; Bowen, 2022). This analytical approach has established applicability in comparable educational research contexts (Merriam and Tisdell, 2016; Creswell and Poth, 2017).
Table 3 Pre- and post-test scores and within-group comparisons for the CCTDI and CCTST

Inst. | Gr | Pre M | Pre SD | Post M | Post SD | Mean diff., SD diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
CCTDIa | Conc | 287.15 | 26.33 | 277.55 | 26.23 | 9.60, 22.48 | 1.910e | 19 | 1.729 | 0.427
CCTDIa | Expd | 296.27 | 28.11 | 301.18 | 21.34 | −4.91, 27.98 | −0.582 | 10 | 1.813 | 0.175
CCTSTb | Con | 18.05 | 6.30 | 19.05 | 6.13 | −1.00, 2.83 | −1.581 | 19 | 1.729 | 0.353
CCTSTb | Exp | 18.91 | 5.84 | 18.82 | 3.82 | 0.09, 4.04 | 0.075 | 10 | 1.813 | 0.023

a 420-point scale. b 34-point scale. c Con stands for the control group (N = 20). d Exp stands for the experimental group (N = 11). e Significant according to the 1-tailed critical t-value (p < 0.05).
A similar analysis was conducted on the seven subscales of the CCTDI (see Table 4), where two subscales showed similar statistically significant declines: Confidence in reasoning (t(19) = 1.804, p < 0.05, d = 0.4) and Inquisitiveness (t(19) = 2.228, p < 0.05, d = 0.5). In both cases, the control group's post-test mean scores were lower than its pre-test mean scores.
Table 4 Pre- and post-test scores and within-group comparisons for the two CCTDI subscales with significant change

CCTDI/subscalea | Gr | Pre M | Pre SD | Post M | Post SD | Mean diff., SD diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
Confidence in reasoning | Conb | 40.55 | 5.99 | 38.95 | 5.54 | 1.60, 3.97 | 1.804e | 19 | 1.729 | 0.404
Confidence in reasoning | Expc | 44.00 | 4.04 | 43.09 | 5.00 | 0.91, 4.01 | 0.752 | 10 | 1.812 | 0.023
Inquisitiveness | Con | 47.20 | 7.37 | 43.75 | 6.68 | 3.45, 6.92 | 2.228d | 19 | 1.729 | 0.498
Inquisitiveness | Exp | 48.72 | 7.51 | 48.91 | 6.41 | −0.18, 7.76 | −0.078 | 10 | 1.812 | 0.227

a Presented are the two subscales that showed a statistically significant pre- to post-test difference; 60-point scale. b Con stands for the control group (N = 20). c Exp stands for the experimental group (N = 11). d p < 0.05. e Significant according to the 1-tailed critical t-value (p < 0.05).
In the experimental group, a non-significant numerical improvement was observed, as shown in Tables 3 and 4, and no significant difference was observable in the subscales.
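As a transparency check, the within-group statistics in Tables 3 and 4 can be re-derived from the published summary values alone. The following Python sketch is illustrative only, not the authors' analysis script; it assumes the standard paired-samples formula (t equals the mean difference divided by the standard error of the differences) and obtains the one-tailed critical value from scipy.

```python
# Illustrative re-computation of the within-group (paired) t-tests in
# Tables 3 and 4 from the reported mean difference and SD of the
# differences. A sketch, not the authors' analysis script.
from math import sqrt
from scipy import stats

def paired_t_from_summary(mean_diff, sd_diff, n, alpha=0.05):
    """Paired-samples t statistic and the 1-tailed critical t value."""
    t = mean_diff / (sd_diff / sqrt(n))   # t = mean diff / SE of differences
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)   # one-tailed critical value
    return t, df, t_crit

# Control group, overall CCTDI (Table 3): mean diff 9.60, SD diff 22.48, n = 20
t, df, t_crit = paired_t_from_summary(9.60, 22.48, 20)
print(f"t({df}) = {t:.3f}, t-crit = {t_crit:.3f}")  # t(19) = 1.910, t-crit = 1.729

# Control group, Inquisitiveness (Table 4): mean diff 3.45, SD diff 6.92, n = 20
t, df, t_crit = paired_t_from_summary(3.45, 6.92, 20)
print(f"t({df}) = {t:.3f}, t-crit = {t_crit:.3f}")  # t(19) = 2.230 (reported: 2.228)
```

Both outputs match the reported statistics up to rounding of the published means and standard deviations.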
Table 5 Between-group (control vs. experimental) comparisons of CCTDI overall and subscale scores at pre- and post-test

CCTDI/subscalea | Test | Contr. (20) M | Contr. SD | Exper. (11) M | Exper. SD | Mean diff., SE diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
Overall | Pre | 287.15 | 26.33 | 296.27 | 28.11 | −9.12, 10.12 | −0.902 | 29 | 1.699 | 0.34
Overall | Post | 277.55 | 26.23 | 301.18 | 21.34 | −23.63, 9.25 | −2.554b | 29 | 1.699 | 0.99
Confidence in reasoning | Pre | 40.55 | 5.99 | 44.00 | 4.04 | −3.45, 2.03 | −1.701 | 29 | 1.699 | 0.67
Confidence in reasoning | Post | 38.95 | 5.54 | 43.09 | 5.00 | −4.14, 2.01 | −2.057b | 29 | 1.699 | 0.78
Inquisitiveness | Pre | 47.20 | 7.37 | 48.72 | 7.51 | −1.53, 2.78 | −0.548 | 29 | 1.699 | 0.21
Inquisitiveness | Post | 43.75 | 6.68 | 48.90 | 6.41 | −5.16, 2.473 | −2.086b | 29 | 1.699 | 0.79
Analyticity | Pre | 41.50 | 6.49 | 43.54 | 5.68 | −2.04, 2.33 | −0.875 | 29 | 1.699 | 0.34
Analyticity | Post | 40.00 | 5.26 | 46.09 | 3.14 | −6.09, 1.74 | −3.495c | 29 | 1.699 | 1.41
Systematicity | Pre | 38.80 | 6.05 | 41.54 | 4.08 | −2.75, 2.04 | −1.340 | 29 | 1.699 | 0.53
Systematicity | Post | 38.30 | 5.16 | 41.72 | 4.19 | −3.42, 1.82 | −1.882d | 29 | 1.699 | 0.73

a Presented are the overall mean scores and the four subscales that showed a statistically significant difference between the research groups on the CCTDI. b p < 0.05. c p < 0.005. d Significant according to the 1-tailed critical t-value (p < 0.05).
Post-test comparisons revealed distinctly different results for the CCTDI. Significant differences occurred in the overall post-test score and in four of the seven subscales: Confidence in reasoning and Inquisitiveness (as identified in the previous analyses), plus Analyticity and Systematicity. The control group differed significantly from the experimental group, with Cohen's d-values ranging from 0.73 to 1.41 (see Table 5). According to Cohen's (1992) guidelines, these values indicate large effect sizes; the higher the d-value, the greater the probability that a randomly selected student from the experimental group scores higher than a randomly selected student from the control group.
These results show that while both groups started at relatively similar dispositions toward CT, they diverged significantly following the intervention, with the control group showing lower mean scores than the experimental group across multiple measures. According to the results, the experimental group improved numerically between the pre- and post-test. However, these improvements were not statistically significant, potentially due to the small sample size of the experimental group.
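The between-group comparisons in Table 5 can be reconstructed in the same spirit from the group means and SDs. The sketch below assumes a pooled-variance independent-samples t-test; for Cohen's d, the published |d| values are most closely reproduced with the unweighted standardizer sqrt((s1² + s2²)/2), which is our inference rather than a convention documented in the paper.

```python
# Illustrative reconstruction of the between-group comparisons in Table 5.
# Assumptions (not stated in the paper): pooled-variance t-test, and
# Cohen's d standardized by sqrt((s1^2 + s2^2) / 2).
from math import sqrt

def independent_t_and_d(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic and Cohen's |d| from summary statistics."""
    df = n1 + n2 - 2
    sd_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (m1 - m2) / (sd_pooled * sqrt(1 / n1 + 1 / n2))
    d = abs(m1 - m2) / sqrt((s1**2 + s2**2) / 2)   # unweighted standardizer
    return t, df, d

# Overall CCTDI post-test (Table 5): control (n = 20) vs. experimental (n = 11)
t, df, d = independent_t_and_d(277.55, 26.23, 20, 301.18, 21.34, 11)
print(f"t({df}) = {t:.3f}, |d| = {d:.2f}")  # t(29) = -2.553, |d| = 0.99

# Analyticity post-test (Table 5)
t, df, d = independent_t_and_d(40.00, 5.26, 20, 46.09, 3.14, 11)
print(f"t({df}) = {t:.3f}, |d| = {d:.2f}")  # t(29) = -3.497 (reported: -3.495), |d| = 1.41
```

Reproducing the Overall and Analyticity post-test rows gives t(29) ≈ −2.55 and −3.50 with |d| ≈ 0.99 and 1.41, matching Table 5 up to rounding.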
The positive feedback from the TS was based on their observations in the laboratory, the oral exam that tested students’ understanding and preparedness for the execution of the experiment, and analysis of submitted lab-reports. The following is a direct quote of one TS member when asked, “What do you think about the modified LM?”:
I use these questions in my [oral exam] because they are helpful to me to know if they understood everything or not … Like the question about autocatalytic reaction and the relationship between Iodine concentration and calibration curve. I think this is also helpful for other experiments because they know how this calibration actually works and how they can be used in other experiments … [the students] are more experienced and understand the theory behind the experiment more, they can relate the theoretical background better to the experiment. I also think that this is helpful to understand their uses in other experiments. With the questions they have less problems in understanding the experiment, I in previous years usually ask them about the steps, what you need to do now and now. and with the questions I see that they have less problems in understanding the experiment itself now and what they need to do … for experiment like T3 or T2 [unmodified learning activities] it would be nice to have something like this, because this is something that takes a lot of time, not only for me but also for them. Everyone could quit a little bit earlier when they understand better [smiles] … I must say it is helpful but it should be added to other experiments. [MQ_K8_K11_Sup_Int]
Comparative analysis of end-of-course questionnaires revealed notable differences between groups. When asked what was challenging in the course, 42% of control group students (8 of 19) attributed the most difficulty to the oral exam and to ambiguities in the execution of the experiment. In contrast, only 18% of experimental group students (2 of 11) did so. Of these two, one referred specifically to the oral exams of unmodified learning activities (see first quote below), while the other implied having disregarded the modifications during preparation. The following are the quotes from these two students in the experimental group:
Writing [reports] for the experiments T2 or T3. Moreover, the preparation for the [oral exam] of the experiments mentioned. [25929LILA_B3_Q13]
I [didn’t] answer them, I just read the questions and thought about it, whether I am able to answer them immediately. [01973ANRI_B3_Q17]
Examples from students who mentioned the oral exam as challenging:
Mostly the [oral exam], because the preparation for them was really [extensive]. [02945MAAB_A5_Q13]*
The [oral exam]. Every Assistant had another focus, moreover the [oral exam]'s difficulty was not equal. [12983JUEL_A8_Q13]*
Some even added that there was ambiguity about what they needed to demonstrate in the oral exam regarding chemical background and experiment execution proficiency.
To say exactly what is requested in the [oral exam] [02945MAAB_A5_Q16]*
If [we are supposed to depend] on the script, then this is what can be asked in the [oral exam], please do this and ask some things about the issues mentioned on the script. [12983JUEL_A8_Q16]*
The performances of the experiments should be described even more precisely in the script. [15963MAPA_A2_Q16]*
* All quotes provided were from students in the control group.
Lab reports emerged as another significant challenge, mentioned by 58% of control group respondents (11 students) compared to 36% of experimental group respondents (4 students). Most of the difficulties were attributed to the time-consuming preparation, tight submission deadlines, and the late feedback students received on these lab-reports.
The open-ended experimental group questionnaire included additional items specifically addressing the modifications. Although students mostly agreed that there was no reduction in preparation time, they noted benefits including deeper conceptual understanding, improved theoretical knowledge and experimental connections, and clearer expectations about experimental data.
Students commented: [The modifications] guided my preparation of the experiment, but not the intensity of the preparation [01955MASC_B13_Q17]
Performance and the knowledge which was necessary were even more understandable and better structured. [23953VIRU_B3_Q18]
The preparation took a lot of energy and thinking a lot. Because there was lot of theoretical background. But the experiment was not so hard [01955MASC_B13_Stu_Int].
Another student emphasized how the revised prior knowledge section helped focus their preparation.
The additional questions didn’t influence the way I [prepared]. Because the questions given in the introduction (previous knowledge) were very detailed and as preparation for the [oral exam] it was enough to practice on the questions and answer them. [25929LILA_B3_Q17]
The semi-structured interviews further showed that despite the time investment, students valued how the modifications helped them to better understand the relevant chemical background and develop more realistic expectations from the experiment.
I think [the modifications] are very good, it became easier for me to prepare myself to the experiment, and I think I spent more time answering all the questions and understanding the whole manual and manuscript… I read the introduction (Theoretical Background) and then looked at the questions and took 2 hours to make solvatochromy clear to me, but after that, after 2 hours, I was able to explain the things here in the rest of the script. So after the preparation I read the whole manuscript and talked about the other questions, yes [my partner and I], we talked a lot on the phone. [01955MASC_B13_Stu_Int]
Some students expressed reservations about the modified format, particularly the unfamiliar placement of questions alongside relevant text, while others perceived the added text (especially the expected gained knowledge and skills, and the prompts) as extra workload. However, 8 out of 11 students in the experimental group (73%) recommended extending these modifications to the rest of the course LM.
The significant decrease in the dispositions toward CT in the control group may reflect Kuhn's (1999) argument about applying CT. Kuhn argues that thinking carefully and reflectively is a demanding process and that, to make CT a habit of mind, one needs to see its value. She argues further that recognizing the value of such thinking is a key factor in dispositions and, eventually, affects its application. In the traditional format of the examined course, students followed predetermined procedures to obtain expected results and produce technical reports, limiting opportunities for reflective thinking. Without explicit feedback or engagement with CT processes, students may neglect these demanding cognitive practices (Dunbar and Fugelsang, 2005; Kuhn, 2010).
Analysis of the attributions of the four affected CCTDI subscales (see Table 5) showed their direct alignment with the aims of the LM modifications, which included:
− conceptual overviews linking experimental protocols to broader learning goals;
− procedural guidance for relevant laboratory techniques;
− questions aimed at linking theoretical understanding with practical experimental practices;
− prompting students to anticipate outcomes to enable better decision-making during the experiment;
− prompts to guide students to justify procedures and consider alternative approaches;
− prompts to help students interpret results/observations and relate them to the learning objectives and underlying theoretical concepts; and,
− fostering the ability to generalize gained knowledge and skills to broader contexts.
The knowledge activation and cognitive prompts specifically targeted the subscale Analyticity by encouraging students to question the significance of each step and consider alternative approaches. These types of prompts are also connected to the subscale Inquisitiveness by stimulating students’ willingness to acquire new knowledge and to understand relationships between concepts.
The Systematicity subscale was addressed through prompts that alternately directed students to focus on experimental details (“zoom in”) and broader implications (“zoom out”), fostering a systematic and holistic understanding of the applied experimental technique. Confidence in reasoning developed as students evaluated their knowledge gaps and learning needs.
By contrast, Truth-seeking (evaluating personal beliefs and arguments), Open-mindedness (considering others’ opinions), and Maturity of judgment (making timely decisions on complex issues) were neither part of the original LM design nor explicitly addressed by our prompts, as they fell outside the lab's structured format and learning objectives.
This pattern aligns with Insight Assessment's (2024) CCTDI findings, revealing that only three out of the seven subscales—Truth-seeking, Open-mindedness, and Maturity of judgment—increased significantly during enrollment in higher education. Our study observed that the control group's overall score decreased significantly (t = 1.910, p < 0.05), suggesting a potential decline in CT dispositions over time. In contrast, the experimental group showed a non-significant numerical improvement in dispositions. Further analysis showed that the same four CCTDI subscales that did not improve in the Insight Assessment study were major contributors to the decline in our control group, while the three improved subscales in the previous study remained stable in ours. This pattern suggests that certain dispositional aspects of CT may be more resistant to enhancement and may even decline without targeted intervention.
This parallel extends to CT skills: Insight Assessment also reported that the CCTST mean score increased by 1.4 points (t = 9.10, p < 0.001), with undergraduate students' average scores of 16.5 in 2012 and 16.3 in 2019, while graduate students’ average scores rose from 19.0 to 20.0, with variations across institution types. Similarly, our study observed a 1-point gain in the control group versus negligible change in the experimental group.
The results also highlight the importance of reducing cognitive load—an issue that Galloway and Bretz (2015) raised—as overly complex laboratory tasks can overwhelm working memory and prevent meaningful engagement, particularly relevant in physical chemistry, where content and procedural complexity are high.
By focusing students on specific, theory-aligned tasks, the study's prompts likely mitigated cognitive overload, helping them better connect theoretical knowledge and experimental procedures. This approach aligned with Seery et al.'s (2024) third guiding principle for laboratory learning, scaffolding student preparation and engagement while reducing in-session cognitive load, a critical factor in facilitating effective learning.
Our findings are consistent with prior research on inquiry-based learning (Linn et al., 2006; Chin and Osborne, 2010), reinforcing the importance of scaffolding and metacognitive reflections in laboratory instruction (Schraw and Dennison, 1994). From the cognitive load perspective (Sweller et al., 2011), the results suggest that structured pre-laboratory instructional materials, particularly those combining conceptual and procedural guidance, play an important role in reducing students' cognitive load and enabling higher-order thinking during laboratory sessions (Danial et al., 2025; Schmidt-McCormack et al., 2017; Seery et al., 2024).
In our study, the modified learning materials’ targeted questions and prompts, explicitly aligned with each experiment's cognitive and conceptual demands, scaffolded students' reasoning in advance. This allowed knowledge access during execution without cognitive overload, helping the experimental group sustain their CT dispositions and exhibit greater engagement in complex scientific reasoning. Thus, rather than redesigning laboratory activities themselves, integrating structured CT prompts into existing LMs served as an effective strategy for promoting cognitive efficiency and deeper scientific engagement (Galloway and Bretz, 2015; Agustian and Seery, 2017).
The first two modification components aligned with Seery et al.'s (2024) second guiding principle, which emphasizes the importance of coherence and consistency with intended learning goals among the professional learning community. They identified misalignment between student, instructor, and staff goals as a key challenge in laboratory learning, recommending clear objectives emphasizing higher-order thinking skills, experimental competencies, and theory-practice integration. However, student feedback indicated that the added text (marked in green and yellow in Fig. 1) seemed excessive, suggesting a need for a more concise presentation.
The findings from the interviews highlight the positive impact of the modified LM on student engagement and understanding. In particular, both students and TS reported enhanced scientific comprehension and experiment preparation, with TS noting stronger theory–practice connections. This aligns with Keen and Sevian's (2022) conclusions that curriculum design should account for students' struggles through clear, purpose-driven tasks and challenge anticipation. The students in the experimental group showed better theory-experiment linking, resulting in more focused preparation and fewer students reporting the oral exam as challenging compared to the control group.
Despite these benefits, some students found the revised structure unfamiliar, particularly the placement of questions alongside the text, indicating potential areas for refinement in future iterations.
As Seery et al. (2024) argue, laboratory instruction offers a wide range of potential learning outcomes, yet practical constraints require careful selection of learning outcomes based on course structure, student profile, and curricular stage. Our study focused specifically on CT and meaningful learning aspects in a laboratory setting within these practical boundaries.
Additionally, the study's immersion approach (Ennis, 1989) did not explicitly address CT principles, which may have limited the intervention's effectiveness. A more direct emphasis on CT as a learning objective could better support students’ long-term retention and application of these skills.
Finally, while data collection took place several years ago, the course structure and its associated pedagogical challenges – including passive engagement and underdeveloped CT skills – remain unchanged, so the findings retain their relevance. In light of the growing post-pandemic awareness of self-regulated and reflective learning, they remain well suited to informing sustainable instructional improvements.
While offering practical strategies for STEM educators through demonstrating how relatively minor modifications, such as embedding cognitive prompts, can significantly sustain CT dispositions in undergraduate students, this study also highlights two crucial considerations: the domain-specific nature of CT and the necessity of targeted curricular interventions. Our approach successfully balanced multiple implementation factors by aligning learning goals, integrating targeted pre-laboratory preparation, and maintaining practical feasibility for educators.
In light of the current literature on CT assessments (self-reports and non-specific tools) at the undergraduate level (NASEM, 2017) and on domain-specificity and CT-transferability (Ennis, 1989), there remains a need to develop more authentic assessment tools for CT performance in real-world settings. Furthermore, given that traditional laboratory goals in UGCL have been shown to emphasize procedural knowledge over conceptual understanding (Bruck et al., 2010), future research should explore the integration of more direct CT training and scaffolded learning tasks across the curriculum. The strategic embedding of prompts and reflective tasks across laboratory sequences may yield more consistent and transferable gains in students’ CT abilities.
Participation in the study was entirely voluntary and anonymous, with all participants being adult students who willingly provided informed consent. Participants were informed about the study's purpose and their right to withdraw at any time without any consequences. No disadvantages of any kind were associated with non-participation, thereby ensuring that participation was free from compulsion. To protect privacy and confidentiality, all data were anonymized, with participant identities concealed and replaced by coded identifiers to facilitate connections between pre- and post-assessment without compromising anonymity.
To uphold fairness and equity, all participants were subsequently granted access to the same materials provided to other groups after the study, ensuring no group experienced unequal treatment.