Lilian Daniala, Jenna Koenenb and Rüdiger Tiemann*a
aChemistry Education Research, Chemistry Department, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany. E-mail: ruediger.tiemann@hu-berlin.de
bChemistry Education, Department of Educational Sciences, School of Social Sciences and Technology, Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
First published on 2nd September 2025
Critical thinking (CT), the active reflection upon one's experience and knowledge while searching for necessary information through inquiry, is a fundamental competency in science education. Transitioning science teaching from passive rote learning to emphasizing CT skills is essential for promoting inquiry-based learning and scientific argumentation. However, fostering and assessing CT within scientific inquiry and laboratory-based learning environments continues to present significant challenges. This study examined the impact of a modified laboratory manual (LM) integrating cognitive prompts designed to enhance CT skills and dispositions in an undergraduate physical chemistry laboratory course. Using a mixed methods approach with pre- and post-experimental design, we assessed CT outcomes with the California Critical Thinking Disposition Inventory (CCTDI) and the California Critical Thinking Skills Test (CCTST), supplemented by open-ended questionnaires and semi-structured interviews with both teaching staff and students to evaluate perceptions of the intervention. Participants included 31 second-year undergraduate students randomly assigned to either an experimental group (n = 11) that used the CT-focused modified LM or a control group (n = 20) that followed the traditional LM. Results showed no observable differences between groups on the CCTST. However, a statistically significant decrease was observed in the control group's CT dispositions, both in the overall CCTDI score and in four of seven subscales, while the experimental group maintained their CT dispositions. The four affected subscales aligned specifically with the modifications’ objectives, while the remaining three were unrelated to the original LM and course objectives. Qualitative findings from interviews corroborated these results, indicating that the targeted modifications effectively sustained and enhanced CT dispositions in undergraduate laboratory settings. The study highlights the importance of incorporating CT through structured learning activities in undergraduate science education to maintain student engagement and CT dispositions, while promoting higher-order thinking skills.
An emerging body of literature emphasizes the importance of integrating CT into educational curricula, particularly in higher education (Dominguez, 2018; Zahavi and Friedman, 2019; Cruz et al., 2021; Andreucci-Annunziata et al., 2023), and adapting teaching methods to better foster CT skills within specific professional contexts (Niu et al., 2013; Dumitru et al., 2018). Despite the growing emphasis on developing CT skills in undergraduate (UG) science education, significant gaps remain in understanding how laboratory-based interventions can effectively foster CT (Danczak et al., 2017; Bowen, 2022). Existing research has predominantly focused on: (1) school-level chemistry education (e.g. van Brederode et al., 2020), (2) variations in CT definitions and perceptions (e.g. Danczak et al., 2017; Bowen, 2022), and (3) redesigning curricula involving changes in instructional strategies (e.g. Abrami et al., 2008; Andreucci-Annunziata et al., 2023). Comparatively few studies have investigated the integration of CT instruction into undergraduate laboratory courses (van Brederode et al., 2020), particularly within the unique constraints of undergraduate chemistry education (UGCE).
This study aims to address this gap by designing and evaluating an intervention that incorporates CT-promoting instructional strategies into undergraduate physical chemistry laboratory manuals (LM). By examining CT skills, dispositions, and both instructor and student perspectives, this research contributes to advancing evidence-based laboratory instruction models for chemistry educators.
RQ1: To what extent do the modifications, including cognitive prompts, affect students' CT skills and dispositions compared to traditionally designed LM?
RQ2: How do students and teaching staff (TS) view the modifications?
The intervention was designed using three key theoretical foundations: CT principles (Ennis, 1985; Facione, 1990), cognitive load theory (Sweller et al., 2011), and meaningful learning (Ausubel et al., 1978). These frameworks guided two specific pedagogical features (as detailed in the methods section): (1) the integration of cognitive prompts (Galloway and Bretz, 2015; Rodriguez and Towns, 2018) and (2) prior knowledge activation (Merrill, 2002; Galloway and Bretz, 2015). This design aligns with the framework for pre-laboratory instruction proposed by Agustian and Seery (2017), which emphasizes the role of supportive information in preparing students for complex learning environments such as laboratory settings.
Additionally, the study adopts the immersion approach to CT instruction (Ennis, 1989), whereby CT is fostered implicitly through the design of the instructional material, rather than explicitly taught. This method minimizes potential biases introduced by variations in teaching strategies or the instructors’ views of CT. Collectively, these theoretical perspectives establish the foundation of the study's approach to support CT in an undergraduate physical chemistry laboratory course.
The APA-Delphi report frames CT as comprising both abilities (a set of cognitive skills) and dispositions (such as personal traits, motivations, and emotional intelligence), with these dimensions being closely related (Facione et al., 1994; Facione, 2000; Dominguez, 2018; Dumitru et al., 2018). This definition remains valid and widely used today (Andreucci-Annunziata et al., 2023); within it, CT dispositions represent the consistent internal motivation to employ CT skills (Facione, 2000). Both CT skills and dispositions should be fundamental outcomes of undergraduate education (Jones et al., 1995).
The NRC (2012) highlights that well-designed laboratory experiences can promote students’ competencies in scientific practices, including experimental design, CT, and scientific argumentation. Traditional verification-based laboratories, however, dominate UGCE, often limiting intellectual autonomy and reinforcing lecture content without deepening conceptual understanding (Weaver et al., 2008). Undergraduate Research Experiences (UGRE) offer students hands-on immersion in research, fostering deeper learning, skills development, and positive impacts on career pathways. However, they are typically available only to advanced students, missing critical early engagement opportunities (Nagda et al., 1998; Russell et al., 2007).
Transitioning high achievers to inquiry-based labs can be challenging, but interestingly, middle and lower-performing students often excel in such environments, showing the potential benefits of earlier inquiry integration (Weaver et al., 2008). NRC (2012) suggests that early inclusion of inquiry elements can enhance UGCE by improving students’ self-efficacy and skill in scientific practices, though cognitive overload and insufficient metacognitive support in undergraduate chemistry laboratories (UGCL) can drive students toward rote learning and memorization, compromising meaningful learning (Agustian, 2022).
The NRC report also notes that experts often overlook the novice perspective due to an “expert blind spot”, which can hinder effective guidance. Additionally, complex laboratory tasks can overwhelm working memory, especially when prior knowledge is insufficient (Galloway and Bretz, 2015). Furthermore, inconsistent alignment of laboratory goals across UGCL divisions – particularly regarding lecture connections, communication skills, and uncertainty analysis – underscores the need for clearer objectives and faculty engagement (Bruck et al., 2010). Despite these challenges, UGCL retains significant potential for developing higher-order thinking, such as CT (Agustian et al., 2022).
While metacognitive strategies can complement learning and self-assessment (e.g., Quintana et al., 2005; White and Frederiksen, 2005), this study focuses on cognitive prompts for two main reasons: (1) cognitive prompts alone or combined with metacognitive prompts have fostered successful learning (Berthold et al., 2007), and (2) limiting tasks to context-relevant cognitive prompts reduces working memory load while highlighting key learning aspects, as recommended by the NRC DBER (2012) report.
Meaningful learning in UGCL, as outlined by Galloway and Bretz's (2015) adaptation of Ausubel et al.'s (1978) framework, occurs if three conditions are met: (i) relevant prior knowledge for anchoring new information; (ii) re-organized learning materials connecting prior and new knowledge; and (iii) students’ conscious effort to make meaningful connections (metacognitive awareness). When any of these conditions is inadequately addressed, students may resort to rote memorization (Grove and Bretz, 2012), as supported by Ausubel and Novak's Assimilation Theory, which emphasizes the central role of the learner in constructing knowledge. These conditions are central to Agustian and Seery's (2017) framework, which applies Cognitive Load Theory (Sweller et al., 2011) to optimize laboratory preparation and learning. The framework addresses both supportive information (high-complexity: conceptual understanding and underlying principles) and procedural information (low-complexity: task-specific, step-by-step guidance) in the LM, allowing students to construct a meaningful mental schema for the experiment.
Given the necessity of the development of CT, this study investigates how modified LMs incorporating cognitive prompts and prior knowledge activation affect CT skills and dispositions in the context of undergraduate physical chemistry laboratory courses. To avoid potential biases from differing CT definitions among TS and students (Danczak et al., 2017; Bowen, 2022) and instructional variations between experimental and control groups, this study employs Ennis’ (1989) implicit immersion approach to CT, as detailed in the following section.
During each lab session, students work in pairs or groups of three to conduct experiments while making real-time methodological decisions. They evaluate procedure adequacy and data validity and decide whether measurements need repetition, justifying their choices using course content. The course runs twice a week and includes 16 fixed experimental stations, with each student group performing a different pre-set experiment per session before rotating through all units. After concluding the experiment, students submit written lab-reports that present their results and analyses based on the guidelines provided in the LM.
The overall aims of the course, which are expected by the TS, are (a) to familiarize students with experimental methods, tools, processes, calculations and data analysis that are central to physical chemistry, and (b) to enhance their ability to apply and justify theoretical knowledge in practical decision-making within the lab setting.
– Block A was assigned as the control group (n = 20: 12 males, 8 females) using the original LM; and,
– Block B was assigned as the experimental group (n = 11: 7 males, 4 females) using the modified LM.
To minimize bias, neither students nor TS were informed about the group distinction in the research design.
The prompts were integrated to challenge students to reflect on their conceptual understanding of the underlying chemical and experimental principles, make experimental predictions, justify procedures, and evaluate their outcomes against theoretical concepts. This aligns with Agustian and Seery's framework, which proposes that effective pre-laboratory preparation must address both supportive information (high-complexity: conceptual understanding and underlying principles) and procedural information (low-complexity: task-specific, step-by-step guidance) as an integral part of the preparation and the LM, helping students construct a meaningful schema.
The intervention aimed to reduce cognitive overload (Galloway and Bretz, 2015) by addressing extraneous and germane cognitive load (Sweller et al., 2011) through two core modifications: (1) prior knowledge activation – helping students organize conceptual schemas before entering the lab, and (2) cognitive prompts – directing their attention to critical decision points during the experiment (see details about the modifications below). By supporting cognitive processing in this way, the intervention sought to enhance students’ capacity to apply CT during laboratory work without overwhelming their working memory.
In addition, the modifications met the following characteristics essential for higher education (van Brederode et al., 2020): (1) suitable for the undergraduate level; (2) promoting CT in a non-inquiry lab; (3) optional for students, with no additional staff burden; (4) transferable across the curriculum; (5) resilient to variations in TS turnover, styles, evaluation practices, and syllabi; (6) sustainable and independent of the presence of the researcher; and (7) low-cost and easy to implement. Unlike in school-based education (van Brederode et al., 2020), these characteristics are even more important to maintain in the context of higher education and UGCL.
1. “Activation” of prior knowledge: in a pre-study involving surveys and informal discussions with students and TS, students often complained that the required prior knowledge section in the LM, one of the crucial parts of their preparation, was ambiguous, leaving them uncertain and anxious. Therefore, the first effective learning principle, “activation” (Merrill, 2002), was applied by reformatting learning goals into operational descriptions. This clarification of the learning objectives aligns with Galloway and Bretz's (2015) first condition for meaningful learning.
2. Gained knowledge and skills: a section categorizing expected learning and competency outcomes was added to the LM in three domains: content knowledge, laboratory skills, and data analysis skills. This addressed students’ difficulties in connecting individual experiments to broader contexts, a concern also noted by laboratory supervisors.
These first two modifications support transfer of learning (Sousa, 2011) by:
– facilitating “transfer during learning” (prior knowledge application), and,
– enabling “transfer of learning” (new knowledge application).
According to the NRC (2012), prior knowledge plays a crucial role, either enhancing new learning through positive transfer or hindering it through negative transfer. Meaningful learning (Ausubel et al., 1978) involves connecting new information to existing knowledge, helping students create a coherent framework for future use. In contrast, rote memorization often results in isolated facts that lack the connections necessary for immediate application (transfer during learning) and long-term retention across courses (transfer of learning).
These components of the modification align with the second guiding principle presented by Seery and colleagues (2024) to ensure coherence with intended learning goals for students and TS.
3. Adding prompts: the LM layout was transformed from portrait to landscape orientation and split into two parts. The left side (about two-thirds of the document's width) contained the original LM with the first two modifications, while the right side added prompts parallel to the relevant text (see Fig. 1). This spatial proximity follows the Theory of Multimedia Learning (Mayer, 2014) and Cognitive Load Theory (Sweller et al., 2011; NRC, 2012). The prompts help students relate theoretical knowledge to experimental practices and activate their critical thinking about the information and procedures, as well as their significance to the broader context of the curriculum (Galloway and Bretz, 2015). The prompts aimed to:
Fig. 1 The original LM (left) and modified LM (right) – prior knowledge “activation” (marked in green), gained knowledge and skills (marked in yellow), and prompts to foster CT (marked in red).
– link theoretical knowledge and experimental practices;
– anticipate results for better experimental judgments;
– justify procedures and consider alternatives;
– connect the obtained data to the activity aim and the theoretical background; and,
– enable generalization and transferability.
This modification aligns with Seery et al.'s (2024) third guiding principle, which advocates incorporating pre-laboratory activities in the form of information, prompts, or questions so that students are better prepared for learning in a complex and cognitively demanding environment, and with Rodriguez and Towns's (2018) emphasis on meaningful experimental procedure and data interpretation.
It is important to note that this study followed Ennis's (1989) immersion approach, in which CT is fostered implicitly, without explicit instruction or participant awareness of the CT focus. This approach was favored to minimize potential biases from TS's and students' perceptions and interpretations of CT (Danczak et al., 2017; Bowen, 2022); such effects were found to be central in influencing instruction and students' CT abilities in data interpretation (van Brederode et al., 2020). In line with Rodriguez and Towns (2018), our approach was to foster CT without requiring a full redesign of the existing LM, ensuring that students in the control and experimental groups received the same instructions and explanations. Fig. 1 illustrates the original vs. modified LM versions with the modifications marked in colors.
CCTST and subscales | Description |
---|---
Overall – 34-point scale | Measures the reasoning skills used in the process of reflectively deciding what to believe or what to do. |
Analysis – 7-point scale | Ability to identify assumptions, reasons, and claims, and to examine how they interact in the formation of arguments. The tool relies on using charts, graphs, diagrams, spoken language, and documents to gather information. |
Inference – 16-point scale | Drawing correct conclusions from reasons and evidence, in offering thoughtful suggestions and hypotheses. |
Evaluation – 11-point scale | Credibility assessment of sources of information and the claims by providing the evidence, reasons, methods, criteria, or assumptions behind the claims and conclusions. |
Induction – 17-point scale | Decision-making that relies on inductive reasoning to form a confident, though not certain, basis for belief in conclusions and a reasonable foundation for action.
Deduction – 17-point scale | Decision-making in well-defined contexts relies on deductive reasoning, which uses established rules, principles, and core beliefs to arrive at conclusions with certainty. It starts with accepted premises for forming a conclusion, with no room for uncertainty. |
CCTDI and subscale | Description |
---|---
Overall – 420-point scale | Assesses individual beliefs, expectations, intentions, values, and perceptions regarding critical thinking based on agree-or-disagree items. |
Truth-seeking | The habit of always desiring the best possible understanding of any given situation; it is following reasons and evidence wherever they may lead, even if they lead one to question cherished beliefs. |
Open-mindedness | Tolerance toward the opinions of others, even if we do not agree, knowing that often we all hold beliefs which make sense only from our perspectives. |
Analyticity | Being alert to what happens next and striving to anticipate both the good and the bad potential consequences or outcomes of situations, choices, proposals, and plans.
Systematicity | Approaching problems in a disciplined, orderly, and systematic way. |
Confidence in reasoning | Trusting reflective thinking to solve problems and to make decisions. |
Inquisitiveness | Being curious and eager to acquire new knowledge and to learn the explanations for things, even when the applications of that new learning are not immediately apparent. |
Maturity of judgment | Seeing the complexity of issues and yet striving to make timely decisions even in the absence of complete knowledge. |
For RQ2, the qualitative data were collected through open-ended questionnaires administered to the control and experimental groups at the end of the course, and semi-structured interviews conducted with the instructor and two students from the experimental group after completing the modified learning activities. The interviews with the students allowed for a deeper insight into the open-ended responses and for triangulation of findings across different data sources.
All assessments and interviews were conducted outside regular teaching hours and in the absence of TS to maintain independence from the instructional environment. Scores on the pre- and post-tools, and verbal and written responses were anonymized and not disclosed to the TS until after the conclusion of the course and final grading. This ensured the independence of research data from instructional and grading processes. All tools and raw data were only accessible to the research team members. These measures were implemented to uphold ethical standards, mitigate potential conflicts of interest arising from the researchers’ dual roles, and ensure the validity and fairness of the study.
To address RQ2 regarding participant perspectives, qualitative analysis of open-ended questionnaire responses was used, based on the principles of Constructivist Grounded Theory (CGT) (Charmaz, 2006). This methodology was chosen as it facilitates the exploration of open-ended data while acknowledging the researcher's role in co-constructing meaning alongside participants (Bowen, 2022). The analysis considered, grouped, and reported all written responses. One student from the control group left the questions unanswered, resulting in 19 responses in the control group and 11 in the experimental group, as reported in the Results section. Responses were grouped based on recurring ideas and categories that emerged inductively from the data.
Similarly, the qualitative data from the semi-structured interviews were subject to inductive thematic analysis concerning RQ2. Consistent with CGT, the interpretations are presented not as objective truths but as co-constructed insights shaped by both participant narratives and the researcher's analytical perspective (Charmaz, 2006; Bowen, 2022). This analytical approach has established applicability in comparable educational research contexts (Merriam and Tisdell, 2016; Creswell and Poth, 2017).
Table 3 Pre- and post-test scores and within-group comparisons for the CCTDI and CCTST

Inst. | Gr | Pre M | Pre SD | Post M | Post SD | Mean diff., SD diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
CCTDIa | Conc | 287.15 | 26.33 | 277.55 | 26.23 | 9.60, 22.48 | 1.910e | 19 | 1.729 | 0.427
CCTDIa | Expd | 296.27 | 28.11 | 301.18 | 21.34 | −4.91, 27.98 | −0.582 | 10 | 1.813 | 0.175
CCTSTb | Con | 18.05 | 6.30 | 19.05 | 6.13 | −1.00, 2.83 | −1.581 | 19 | 1.729 | 0.353
CCTSTb | Exp | 18.91 | 5.84 | 18.82 | 3.82 | 0.09, 4.04 | 0.075 | 10 | 1.813 | 0.023

a 420-point scale. b 34-point scale. c Con stands for the control group (N = 20). d Exp stands for the experimental group (N = 11). e Significant according to the 1-tailed critical t-value (p < 0.05).
A similar analysis was conducted on the seven subscales of the CCTDI (see Table 4), where two subscales showed similar statistically significant declines: Confidence in reasoning (t(19) = 1.804, p < 0.05, d = 0.4) and Inquisitiveness (t(19) = 2.228, p < 0.05, d = 0.5). In both cases, the control group's post-test mean scores were lower than its pre-test mean scores.
Table 4 Pre- and post-test scores and within-group comparisons for the two CCTDI subscales with significant change

CCTDI/subscalea | Gr | Pre M | Pre SD | Post M | Post SD | Mean diff., SD diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
Confidence in reasoning | Conb | 40.55 | 5.99 | 38.95 | 5.54 | 1.60, 3.97 | 1.804e | 19 | 1.729 | 0.404
Confidence in reasoning | Expc | 44.00 | 4.04 | 43.09 | 5.00 | 0.91, 4.01 | 0.752 | 10 | 1.812 | 0.023
Inquisitiveness | Con | 47.20 | 7.37 | 43.75 | 6.68 | 3.45, 6.92 | 2.228d | 19 | 1.729 | 0.498
Inquisitiveness | Exp | 48.72 | 7.51 | 48.91 | 6.41 | −0.18, 7.76 | −0.078 | 10 | 1.812 | 0.227

a Presented are the two subscales that showed a statistically significant pre- to post-test difference; 60-point scale. b Con stands for the control group (N = 20). c Exp stands for the experimental group (N = 11). d p < 0.05. e Significant according to the 1-tailed critical t-value (p < 0.05).
In the experimental group, a non-significant numerical improvement was observed, as shown in Tables 3 and 4, and no significant difference was observable in the subscales.
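As a transparency check, the within-group statistics in Tables 3 and 4 can be re-derived from the published summary values alone. The following Python sketch is illustrative only, not the authors' analysis script; it assumes the standard paired-samples formula (t equals the mean difference divided by the standard error of the differences) and obtains the one-tailed critical value from scipy.

```python
# Illustrative re-computation of the within-group (paired) t-tests in
# Tables 3 and 4 from the reported mean difference and SD of the
# differences. A sketch, not the authors' analysis script.
from math import sqrt
from scipy import stats

def paired_t_from_summary(mean_diff, sd_diff, n, alpha=0.05):
    """Paired-samples t statistic and the 1-tailed critical t value."""
    t = mean_diff / (sd_diff / sqrt(n))   # t = mean diff / SE of differences
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)   # one-tailed critical value
    return t, df, t_crit

# Control group, overall CCTDI (Table 3): mean diff 9.60, SD diff 22.48, n = 20
t, df, t_crit = paired_t_from_summary(9.60, 22.48, 20)
print(f"t({df}) = {t:.3f}, t-crit = {t_crit:.3f}")  # t(19) = 1.910, t-crit = 1.729

# Control group, Inquisitiveness (Table 4): mean diff 3.45, SD diff 6.92, n = 20
t, df, t_crit = paired_t_from_summary(3.45, 6.92, 20)
print(f"t({df}) = {t:.3f}, t-crit = {t_crit:.3f}")  # t(19) = 2.230 (reported: 2.228)
```

Both outputs match the reported statistics up to rounding of the published means and standard deviations.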
Table 5 Between-group (control vs. experimental) comparisons of CCTDI overall and subscale scores at pre- and post-test

CCTDI/subscalea | Test | Contr. (20) M | Contr. SD | Exper. (11) M | Exper. SD | Mean diff., SE diff. | t | df | t-crit | |d|
---|---|---|---|---|---|---|---|---|---|---
Overall | Pre | 287.15 | 26.33 | 296.27 | 28.11 | −9.12, 10.12 | −0.902 | 29 | 1.699 | 0.34
Overall | Post | 277.55 | 26.23 | 301.18 | 21.34 | −23.63, 9.25 | −2.554b | 29 | 1.699 | 0.99
Confidence in reasoning | Pre | 40.55 | 5.99 | 44.00 | 4.04 | −3.45, 2.03 | −1.701 | 29 | 1.699 | 0.67
Confidence in reasoning | Post | 38.95 | 5.54 | 43.09 | 5.00 | −4.14, 2.01 | −2.057b | 29 | 1.699 | 0.78
Inquisitiveness | Pre | 47.20 | 7.37 | 48.72 | 7.51 | −1.53, 2.78 | −0.548 | 29 | 1.699 | 0.21
Inquisitiveness | Post | 43.75 | 6.68 | 48.90 | 6.41 | −5.16, 2.473 | −2.086b | 29 | 1.699 | 0.79
Analyticity | Pre | 41.50 | 6.49 | 43.54 | 5.68 | −2.04, 2.33 | −0.875 | 29 | 1.699 | 0.34
Analyticity | Post | 40.00 | 5.26 | 46.09 | 3.14 | −6.09, 1.74 | −3.495c | 29 | 1.699 | 1.41
Systematicity | Pre | 38.80 | 6.05 | 41.54 | 4.08 | −2.75, 2.04 | −1.340 | 29 | 1.699 | 0.53
Systematicity | Post | 38.30 | 5.16 | 41.72 | 4.19 | −3.42, 1.82 | −1.882d | 29 | 1.699 | 0.73

a Presented are the overall mean scores and the four subscales that showed a statistically significant difference between the research groups on the CCTDI. b p < 0.05. c p < 0.005. d Significant according to the 1-tailed critical t-value (p < 0.05).
Post-test comparisons revealed distinctly different results for the CCTDI. Significant differences occurred in the overall post-test score and in four of the seven subscales: Confidence in reasoning and Inquisitiveness (as identified in the previous analyses), plus Analyticity and Systematicity. The control group differed significantly from the experimental group, with Cohen's d-values ranging from 0.73 to 1.41 (see Table 5). According to Cohen's (1992) guidelines, these values indicate large effect sizes; the higher the d-value, the greater the probability that a randomly selected student from the experimental group scores higher than a randomly selected student from the control group.
These results show that while both groups started at relatively similar dispositions toward CT, they diverged significantly following the intervention, with the control group showing lower mean scores than the experimental group across multiple measures. According to the results, the experimental group improved numerically between the pre- and post-test. However, these improvements were not statistically significant, potentially due to the small sample size of the experimental group.
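The between-group comparisons in Table 5 can be reconstructed in the same spirit from the group means and SDs. The sketch below assumes a pooled-variance independent-samples t-test; for Cohen's d, the published |d| values are most closely reproduced with the unweighted standardizer sqrt((s1² + s2²)/2), which is our inference rather than a convention documented in the paper.

```python
# Illustrative reconstruction of the between-group comparisons in Table 5.
# Assumptions (not stated in the paper): pooled-variance t-test, and
# Cohen's d standardized by sqrt((s1^2 + s2^2) / 2).
from math import sqrt

def independent_t_and_d(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic and Cohen's |d| from summary statistics."""
    df = n1 + n2 - 2
    sd_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (m1 - m2) / (sd_pooled * sqrt(1 / n1 + 1 / n2))
    d = abs(m1 - m2) / sqrt((s1**2 + s2**2) / 2)   # unweighted standardizer
    return t, df, d

# Overall CCTDI post-test (Table 5): control (n = 20) vs. experimental (n = 11)
t, df, d = independent_t_and_d(277.55, 26.23, 20, 301.18, 21.34, 11)
print(f"t({df}) = {t:.3f}, |d| = {d:.2f}")  # t(29) = -2.553, |d| = 0.99

# Analyticity post-test (Table 5)
t, df, d = independent_t_and_d(40.00, 5.26, 20, 46.09, 3.14, 11)
print(f"t({df}) = {t:.3f}, |d| = {d:.2f}")  # t(29) = -3.497 (reported: -3.495), |d| = 1.41
```

Reproducing the Overall and Analyticity post-test rows gives t(29) ≈ −2.55 and −3.50 with |d| ≈ 0.99 and 1.41, matching Table 5 up to rounding.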
The positive feedback from the TS was based on their observations in the laboratory, the oral exam that tested students’ understanding and preparedness for the execution of the experiment, and analysis of submitted lab-reports. The following is a direct quote of one TS member when asked, “What do you think about the modified LM?”:
I use these questions in my [oral exam] because they are helpful to me to know if they understood everything or not … Like the question about autocatalytic reaction and the relationship between Iodine concentration and calibration curve. I think this is also helpful for other experiments because they know how this calibration actually works and how they can be used in other experiments … [the students] are more experienced and understand the theory behind the experiment more, they can relate the theoretical background better to the experiment. I also think that this is helpful to understand their uses in other experiments. With the questions they have less problems in understanding the experiment, I in previous years usually ask them about the steps, what you need to do now and now. and with the questions I see that they have less problems in understanding the experiment itself now and what they need to do … for experiment like T3 or T2 [unmodified learning activities] it would be nice to have something like this, because this is something that takes a lot of time, not only for me but also for them. Everyone could quit a little bit earlier when they understand better [smiles] … I must say it is helpful but it should be added to other experiments. [MQ_K8_K11_Sup_Int]
Comparative analysis of end-of-course questionnaires revealed notable differences between groups. When asked what was challenging in the course, 42% of control group students (8 of 19) attributed the most difficulty to the oral exam and to ambiguities in the execution of the experiment. In contrast, only 18% of experimental group students (2 of 11) did so. Of these two, one referred specifically to the oral exams of unmodified learning activities (see first quote below), while the other implied having disregarded the modifications during preparation. The following are the quotes from these two students in the experimental group:
Writing [reports] for the experiments T2 or T3. Moreover, the preparation for the [oral exam] of the experiments mentioned. [25929LILA_B3_Q13]
I [didn’t] answer them, I just read the questions and thought about it, whether I am able to answer them immediately. [01973ANRI_B3_Q17]
Examples from students who mentioned the oral exam as challenging:
Mostly the [oral exam], because the preparation for them was really [extensive]. [02945MAAB_A5_Q13]*
The [oral exam]. Every Assistant had another focus, moreover the [oral exam]'s difficulty was not equal. [12983JUEL_A8_Q13]*
Some even added that there was ambiguity about what they needed to demonstrate in the oral exam regarding chemical background and experiment execution proficiency.
To say exactly what is requested in the [oral exam] [02945MAAB_A5_Q16]*
If [we are supposed to depend] on the script, then this is what can be asked in the [oral exam], please do this and ask some things about the issues mentioned on the script. [12983JUEL_A8_Q16]*
The performances of the experiments should be described even more precisely in the script. [15963MAPA_A2_Q16]*
* All quotes provided were from students in the control group.
Lab reports emerged as another significant challenge, mentioned by 58% of control group respondents (11 students) compared to 36% of experimental group respondents (4 students). Most of the difficulties were attributed to the time-consuming preparation, tight submission deadlines, and the late feedback students received on these lab-reports.
The open-ended experimental group questionnaire included additional items specifically addressing the modifications. Although students mostly agreed that there was no reduction in preparation time, they noted benefits including deeper conceptual understanding, improved theoretical knowledge and experimental connections, and clearer expectations about experimental data.
Students commented: [The modifications] guided my preparation of the experiment, but not the intensity of the preparation [01955MASC_B13_Q17]
Performance and the knowledge which was necessary were even more understandable and better structured. [23953VIRU_B3_Q18]
The preparation took a lot of energy and thinking a lot. Because there was lot of theoretical background. But the experiment was not so hard [01955MASC_B13_Stu_Int].
Another student emphasized how the revised prior knowledge section helped focus their preparation.
The additional questions didn’t influence the way I [prepared]. Because the questions given in the introduction (previous knowledge) were very detailed and as preparation for the [oral exam] it was enough to practice on the questions and answer them. [25929LILA_B3_Q17]
The semi-structured interviews further showed that despite the time investment, students valued how the modifications helped them to better understand the relevant chemical background and develop more realistic expectations from the experiment.
I think [the modifications] are very good, it became easier for me to prepare myself to the experiment, and I think I spent more time answering all the questions and understanding the whole manual and manuscript… I read the introduction (Theoretical Background) and then looked at the questions and took 2 hours to make solvatochromy clear to me, but after that, after 2 hours, I was able to explain the things here in the rest of the script. So after the preparation I read the whole manuscript and talked about the other questions, yes [my partner and I], we talked a lot on the phone. [01955MASC_B13_Stu_Int]
Some students expressed reservations about the modified format, particularly the unfamiliar placement of questions alongside relevant text, while others perceived the added text (especially the expected gained knowledge and skills, and the prompts) as extra workload. However, 8 out of 11 students in the experimental group (73%) recommended extending these modifications to the rest of the course LM.
The significant decrease in the dispositions toward CT in the control group may reflect Kuhn's (1999) argument about applying CT. Kuhn argues that thinking carefully and reflectively is a demanding process and that, to make CT a habit of mind, one needs to see its value. She argues further that recognizing the value of such thinking is a key factor in dispositions and, eventually, affects its application. In the traditional format of the examined course, students followed predetermined procedures to obtain expected results and produce technical reports, limiting opportunities for reflective thinking. Without explicit feedback or engagement with CT processes, students may neglect these demanding cognitive practices (Dunbar and Fugelsang, 2005; Kuhn, 2010).
Analysis of the attributions of the four affected CCTDI subscales (see Table 5) showed their direct alignment with the aims of the LM modifications, which included:
− conceptual overviews linking experimental protocols to broader learning goals;
− procedural guidance for relevant laboratory techniques;
− questions aimed at linking theoretical understanding with practical experimental practices;
− prompting students to anticipate outcomes to enable better decision-making during the experiment;
− prompts to guide students to justify procedures and consider alternative approaches;
− prompts to help students interpret results/observations and relate them to the learning objectives and underlying theoretical concepts; and,
− fostering the ability to generalize gained knowledge and skills to broader contexts.
The knowledge activation and cognitive prompts specifically targeted the subscale Analyticity by encouraging students to question the significance of each step and consider alternative approaches. These types of prompts are also connected to the subscale Inquisitiveness by stimulating students’ willingness to acquire new knowledge and to understand relationships between concepts.
The Systematicity subscale was addressed through prompts that alternately directed students to focus on experimental details (“zoom in”) and broader implications (“zoom out”), fostering a systematic and holistic understanding of the applied experimental technique. Confidence in reasoning developed as students evaluated their knowledge gaps and learning needs.
By contrast, Truth-seeking (evaluating personal beliefs and arguments), Open-mindedness (considering others’ opinions), and Maturity of judgment (making timely decisions on complex issues) were neither part of the original LM design nor explicitly addressed by our prompts, as they fell outside the lab's structured format and learning objectives.
This pattern aligns with Insight Assessment's (2024) CCTDI findings, revealing that only three out of the seven subscales—Truth-seeking, Open-mindedness, and Maturity of judgment—increased significantly during enrollment in higher education. Our study observed that the control group's overall score decreased significantly (t = 1.910, p < 0.05), suggesting a potential decline in CT dispositions over time. In contrast, the experimental group showed a non-significant numerical improvement in dispositions. Further analysis showed that the same four CCTDI subscales that did not improve in the Insight Assessment study were major contributors to the decline in our control group, while the three improved subscales in the previous study remained stable in ours. This pattern suggests that certain dispositional aspects of CT may be more resistant to enhancement and may even decline without targeted intervention.
This parallel extends to CT skills: Insight Assessment also reported that the CCTST mean score increased by 1.4 points (t = 9.10, p < 0.001), with undergraduate students' average scores of 16.5 in 2012 and 16.3 in 2019, while graduate students’ average scores rose from 19.0 to 20.0, with variations across institution types. Similarly, our study observed a 1-point gain in the control group versus negligible change in the experimental group.
The results also highlight the importance of reducing cognitive load—an issue that Galloway and Bretz (2015) raised—as overly complex laboratory tasks can overwhelm working memory and prevent meaningful engagement, particularly relevant in physical chemistry, where content and procedural complexity are high.
By focusing students on specific, theory-aligned tasks, the study's prompts likely mitigated cognitive overload, helping them better connect theoretical knowledge and experimental procedures. This approach aligned with Seery et al.'s (2024) third guiding principle for laboratory learning, scaffolding student preparation and engagement while reducing in-session cognitive load, a critical factor in facilitating effective learning.
Our findings are consistent with prior research on inquiry-based learning (Linn et al., 2006; Chin and Osborne, 2010), reinforcing the importance of scaffolding and metacognitive reflections in laboratory instruction (Schraw and Dennison, 1994). From the cognitive load perspective (Sweller et al., 2011), the results suggest that structured pre-laboratory instructional materials, particularly those combining conceptual and procedural guidance, play an important role in reducing students' cognitive load and enabling higher-order thinking during laboratory sessions (Danial et al., 2025; Schmidt-McCormack et al., 2017; Seery et al., 2024).
In our study, the modified learning materials’ targeted questions and prompts, explicitly aligned with each experiment's cognitive and conceptual demands, scaffolded students' reasoning in advance. This allowed knowledge access during execution without cognitive overload, helping the experimental group sustain their CT dispositions and exhibit greater engagement in complex scientific reasoning. Thus, rather than redesigning laboratory activities themselves, integrating structured CT prompts into existing LMs served as an effective strategy for promoting cognitive efficiency and deeper scientific engagement (Galloway and Bretz, 2015; Agustian and Seery, 2017).
The first two modification components aligned with Seery et al.'s (2024) second guiding principle, which emphasizes the importance of coherence and consistency with intended learning goals among the professional learning community. They identified misalignment between student, instructor, and staff goals as a key challenge in laboratory learning, recommending clear objectives emphasizing higher-order thinking skills, experimental competencies, and theory-practice integration. However, student feedback indicated that the added text (marked in green and yellow in Fig. 1) seemed excessive, suggesting a need for a more concise presentation.
The findings from the interviews highlight the positive impact of the modified LM on student engagement and understanding. In particular, both students and TS reported enhanced scientific comprehension and experiment preparation, with TS noting stronger theory–practice connections. This aligns with Keen and Sevian's (2022) conclusions that curriculum design should account for students' struggles through clear, purpose-driven tasks and challenge anticipation. The students in the experimental group showed better theory-experiment linking, resulting in more focused preparation and fewer students reporting the oral exam as challenging compared to the control group.
Despite these benefits, some students found the revised structure unfamiliar, particularly the placement of questions alongside the text, indicating potential areas for refinement in future iterations.
As Seery et al. (2024) argue, laboratory instruction offers a wide range of potential learning outcomes, yet practical constraints require careful selection of learning outcomes based on course structure, student profile, and curricular stage. Our study focused specifically on CT and meaningful learning aspects in a laboratory setting within these practical boundaries.
Additionally, the study's immersion approach (Ennis, 1989) did not explicitly address CT principles, which may have limited the intervention's effectiveness. A more direct emphasis on CT as a learning objective could better support students’ long-term retention and application of these skills.
Finally, while data collection took place several years ago, the course structure and its associated pedagogical challenges – including passive engagement and underdeveloped CT skills – remain unchanged, so the findings retain their relevance. In light of the growing post-pandemic awareness of self-regulated and reflective learning, they remain well suited to informing sustainable instructional improvements.
While offering practical strategies for STEM educators through demonstrating how relatively minor modifications, such as embedding cognitive prompts, can significantly sustain CT dispositions in undergraduate students, this study also highlights two crucial considerations: the domain-specific nature of CT and the necessity of targeted curricular interventions. Our approach successfully balanced multiple implementation factors by aligning learning goals, integrating targeted pre-laboratory preparation, and maintaining practical feasibility for educators.
In light of the current literature on CT assessments (self-reports and non-specific tools) at the undergraduate level (NASEM, 2017) and on domain-specificity and CT-transferability (Ennis, 1989), there remains a need to develop more authentic assessment tools for CT performance in real-world settings. Furthermore, given that traditional laboratory goals in UGCL have been shown to emphasize procedural knowledge over conceptual understanding (Bruck et al., 2010), future research should explore the integration of more direct CT training and scaffolded learning tasks across the curriculum. The strategic embedding of prompts and reflective tasks across laboratory sequences may yield more consistent and transferable gains in students’ CT abilities.
Participation in the study was entirely voluntary and anonymous, with all participants being adult students who willingly provided informed consent. Participants were informed about the study's purpose and their right to withdraw at any time without any consequences. No disadvantages of any kind were associated with non-participation, thereby ensuring that participation was free from compulsion. To protect privacy and confidentiality, all data were anonymized, with participant identities concealed and replaced by coded identifiers to facilitate connections between pre- and post-assessment without compromising anonymity.
To uphold fairness and equity, all participants were subsequently granted access to the same materials provided to other groups after the study, ensuring no group experienced unequal treatment.