Investigating students' expectations and engagement in general and organic chemistry laboratory courses
Received 11th September 2024, Accepted 26th October 2024
First published on 28th October 2024
Abstract
Students’ expectations for their laboratory coursework are theorized to have an impact on their learning experiences and behaviors, such as engagement. Before students’ expectations and engagement can be explored in different types of undergraduate chemistry laboratory courses, appropriate measures of these constructs must be identified, and evidence of validity and reliability for the data collected with these instruments must be investigated. This study collected evidence related to response process validity, internal structure validity, and single administration reliability for version 2 of the Meaningful Learning in the Laboratory Instrument (MLLIv2) and a measure of student engagement in the undergraduate chemistry laboratory. Additionally, evidence of consequential validity was assessed through measurement invariance, providing support for the comparison of latent means between the groups. Differences in students’ expectations and engagement were found based on course level (general vs. organic chemistry) and pedagogical style (cookbook vs. inquiry-based).
Introduction
Laboratory courses constitute a significant portion of undergraduate training in the field of chemistry (American Chemical Society and Committee on Professional Training, 2015; Royal Society of Chemistry, 2019). Because of their prominence, chemistry education researchers have invested a great deal of effort toward understanding and improving students’ learning experiences within the teaching laboratory (Bretz, 2019; Grushow et al., 2021). One of the ways that researchers and practitioners have begun to update these courses is by moving away from confirmatory “cookbook” style laboratory activities. There are a variety of pedagogical techniques that can be used to augment or replace cookbook-style laboratory activities, including inquiry-based laboratory activities and research-based laboratory activities (Weaver et al., 2008). This paper will explore and, when appropriate, compare students’ affective experiences in cookbook-style and argument-driven inquiry (ADI) (Walker et al., 2011; Sampson and Walker, 2012; Walker et al., 2012) laboratory courses. This specific inquiry-based laboratory course style was selected for study as the laboratory curriculum at the authors’ institution was undergoing transformation and incorporating ADI into the general chemistry labs. ADI-based laboratory activities are “designed to provide students with an opportunity to develop their own method to generate data, to carry out investigations, use data to answer research questions, write, and be more reflective as they work” (Walker et al., 2011). Due to the iterative and reflective nature of ADI (Walker et al., 2011), in addition to its adoption across countries and educational levels (Fakhriyah et al., 2021), investigations related to ADI are useful to a wide variety of STEM education researchers and practitioners.
As the chemistry education community strives to improve undergraduate lab courses, it is important for researchers and practitioners to investigate how these pedagogical changes impact the student learning experience. One of the ways the community can investigate students’ laboratory learning experiences is through an affective lens. Seery and colleagues discussed four “guiding principles regarding laboratory learning” that were informed by existing educational literature. One of these four principles was that the “[c]onsideration of learners’ emotions, motivations, and expectations is imperative in laboratory settings” (Seery et al., 2019). Additionally, Murphy et al. argued that “affective aspects [of learning in STEM] need to be understood and explicitly addressed as part of any successful strategy to improve engagement in STEM education, and to address the significant gender equity issues associated with STEM” (Murphy et al., 2019). While both of these articles highlight the importance of affective learning experiences in STEM, our current understanding of the ways that these constructs manifest in the laboratory setting is limited (Bretz, 2019; Seery et al., 2019). These arguments highlight a need within chemistry education research to improve our current understanding of the affective learning experiences that occur in chemistry laboratory settings.
For example, chemistry education literature suggests that students’ expectations for their laboratory coursework likely have an impact on their learning experiences and behaviors, such as engagement (DeKorver and Towns, 2015; Galloway and Bretz, 2015a, b). A 2015 article by Galloway and Bretz argued:
“If students have expectations to be excited to do chemistry and to learn… then the instructor has a responsibility… to design the curriculum in such a way that the student can choose to engage in those experiences” (Galloway and Bretz, 2015b).
While education literature suggests that students’ expectations and engagement are important aspects of their laboratory experiences, these constructs have not yet been widely investigated across laboratory pedagogical styles. Before students’ expectations and engagement can be explored in different types of undergraduate chemistry laboratory courses, appropriate measures of these constructs must be identified, and evidence of validity and reliability for the data collected with these instruments must be investigated. Measures of student expectations (Galloway and Bretz, 2015a; Vaughan et al., 2024) and engagement (Smith and Alonso, 2020) currently exist in the chemistry education literature for the undergraduate chemistry laboratory environment. The aim of this paper is to investigate the validity and reliability evidence for the data collected with these measures in the populations of interest. As a part of this aim, evidence of consequential validity will be assessed through measurement invariance. If appropriate measurement invariance is found, comparisons will be made between the scores of different respondent groups for each of the two measures, including by course level (general vs. organic chemistry) and pedagogical style (cookbook vs. ADI).
Measuring students’ laboratory expectations
The Meaningful Learning in the Laboratory Instrument (MLLI) was designed to measure the learning expectations (pre-course) and experiences (post-course) of students enrolled in general and organic chemistry laboratory courses (Galloway and Bretz, 2015a). The instrument operationalized Novak's Theory of Meaningful Learning (Novak, 1993, 2003, 2010), which states that in order for meaningful learning to occur, the cognitive, affective, and psychomotor domains of learning must all be addressed (Novak, 1993). The developers of the MLLI intended for the instrument to focus on students’ cognitive and affective learning expectations. There are 30 items included in the original version of the MLLI, each of which was author-assigned into one of three categories: cognitive (16 items), affective (8 items), and cognitive/affective (6 items). Additionally, each of the items was also identified as either positively worded (16 items) or negatively worded (14 items). All of the items included in the MLLI share a common item stem. For the pre-course version of the instrument, the item stem is: ‘When performing experiments in my chemistry laboratory course this semester, I expect…’. After reading the item stem, participants are prompted to respond to a variety of statements, for example, ‘… to learn chemistry that will be useful in my life’. Students respond to each of these statements by moving a slider bar in 1% increments on a 0 (0%, completely disagree) to 100 (100%, completely agree) response scale (Galloway and Bretz, 2015a).
A recent investigation, conducted by the authors of this study, into the validity and reliability evidence for data collected with the MLLI resulted in the development of the MLLIv2 (Vaughan et al., 2024). The MLLIv2 contains 16 of the original 30 MLLI items. The item wording and response scale remain unchanged from the original version of the instrument. General and organic chemistry laboratory data collected with the MLLIv2 showed evidence for a two-factor structure, where the factors “likely represent student expectations that contribute to meaningful learning [9-items] and student expectations that detract from meaningful learning [7-items]” (Vaughan et al., 2024). Evidence of response process validity was collected for each of the 16 items included in the MLLIv2. Additionally, evidence of internal structure validity and single administration reliability has been collected for each of the factors identified in the MLLIv2. This study intends to provide additional insight related to data collected with the MLLIv2 by investigating evidence of measurement invariance.
Measuring students’ laboratory engagement
In 2020, Smith and Alonso published a measure of student engagement in the undergraduate chemistry laboratory. While this instrument was unnamed in the original Smith and Alonso publication, for ease, we will refer to the measure for the remainder of this manuscript as the Undergraduate Chemistry Laboratory Engagement measure (UCL-Engagement). The UCL-Engagement was originally used to investigate differences in student engagement between particular experiments in a general chemistry course. Aspects of engagement measured through the survey include cognitive, behavioral, and emotional engagement, and the survey items focus on activities during the pre-laboratory introduction, laboratory procedures, and data collection (Smith and Alonso, 2020). The measure contains 25 items using a four-point Likert-type response scale ranging from strongly disagree to strongly agree. The 25 items are split into six factors: Cognitive Engagement in Data and Overall, Negative Emotional Engagement in Lab Procedures, Positive Emotional Engagement in Lab Procedures, Behavioral Engagement in Lab Procedures, Cognitive Engagement in Lab Procedures, and Negative Emotional Engagement in Data Collection. By investigating the factor structure of the UCL-Engagement, the original authors found that overall, the survey results “followed the dimensions of engagement established in the literature, within the context of the general chemistry laboratory” (Smith and Alonso, 2020). While there is some evidence of validity and reliability for the data collected with this measure in general chemistry laboratory courses, it has not yet been investigated in other undergraduate laboratory courses.
Validity and reliability
Before the data generated by a measure can be scored and interpreted, evidence supporting the validity and reliability of the data must be gathered (Lewis, 2022; Stains, 2022). Evidence in support of validity suggests that an instrument measures what it is intended to measure, while evidence in support of reliability provides information about the consistency of the data (Arjoon et al., 2013; American Educational Research Association, 2014). While there are various types of validity and reliability evidence that can be assessed for a particular measure, this study will present evidence in support of: (1) response process validity (which is concerned with the ways that participants interpret and respond to the items (Collins, 2003; Arjoon et al., 2013; Deng et al., 2021)), (2) internal structure validity (which is concerned with the relations between items and latent constructs and how these relations match the hypothesized structure of the construct (Worthington and Whittaker, 2006; Arjoon et al., 2013)), (3) measurement invariance (which is concerned with ensuring that the data structure is supported at the group level (Rocabado et al., 2020)), and (4) single administration reliability (which is concerned with how consistent participants’ responses are to items measuring the same construct (Komperda et al., 2018; Taber, 2018)). Each of these aspects of validity and reliability will be further discussed in the Methods section of this manuscript.
Research questions
As part of a larger project to explore the relations among students’ expectations, buy-in, and engagement in lower-division undergraduate chemistry laboratory courses, this study aims to investigate the validity and reliability evidence for the data collected with the MLLIv2 (as our measure of students’ expectations) and the UCL-Engagement (as our measure of students’ engagement) for use in evaluating these constructs within each course and comparing each construct across course environments. Therefore, the research questions guiding this study are as follows:
1. What evidence of validity and reliability supports interpreting data collected with the MLLIv2 and the UCL-Engagement in the environments of interest to this study?
2. If sufficient evidence of validity and reliability is found to support the interpretation of data collected with each measure, how do students’ laboratory expectations and engagement differ between courses and pedagogical styles?
Methods
The Institutional Review Board (IRB) at Portland State University approved the collection of all data included in this study. Additionally, informed consent was obtained from all student participants. This study intends to present evidence to support the validity and reliability of data collected with both the MLLIv2 and the UCL-Engagement. This study builds on previous work reporting on the collection and analysis of evidence to support response process validity, structural validity, and single administration reliability for data collected with the MLLIv2 (Vaughan et al., 2024). The MLLIv2 data presented in this study constitute a subset of the data presented in the original publication for the MLLIv2 (Vaughan et al., 2024) and add to our current understanding of the instrument by investigating evidence of consequential validity within the data. While some details related to the collection and analysis of MLLIv2 data will be described here to provide necessary context, the authors refer interested readers to the original publication for additional details.
Data collection
Two forms of data were collected in this study: qualitative response process validity data and quantitative survey data used to assess evidence of structural validity, consequential validity, and single administration reliability. Evidence related to response process validity for the MLLIv2 has previously been reported for the populations of interest in this study (Vaughan et al., 2024). Therefore, qualitative data related to response process validity will only be discussed for the UCL-Engagement. Quantitative survey data will be presented for both the MLLIv2 and UCL-Engagement.
Qualitative data collection
For the UCL-Engagement, response process data were collected in two ways: student interviews and free response survey items. All response process interviews related to this study were conducted at the end of the fall term and the beginning of the winter term in the 2020/2021 academic year. Interview participants included students who took a first-term general or organic chemistry laboratory course at Portland State University in the fall of 2020. Students’ interest in participating in interviews and informed consent were collected via Qualtrics. Participants took part in interviews via Zoom, and each interview lasted approximately 45 minutes. During each interview, participants were provided with a copy of the UCL-Engagement and were asked to independently respond to each of the items. Upon completion of the survey, interview participants were then asked to read each of the items aloud, state which value they selected on the Likert-type response scale (1–4; strongly disagree to strongly agree), and explain why they selected each value. As needed, follow-up questions were asked by the interviewer to clarify participants’ understanding of the items and/or response reasoning.
Additional response process data were collected for the UCL-Engagement via open-ended survey items. Survey responses were collected from students taking a second-term general or organic chemistry laboratory course at Portland State University in the winter of 2021. All survey items were distributed via Qualtrics in the last two weeks of the laboratory course. Survey participants were recruited via an in-class video announcement that was pre-recorded by the first author (E. B. V.) and presented by the graduate teaching assistants in each laboratory section. The video was then posted to each laboratory section's learning management site, along with written recruitment materials and a link to the Qualtrics survey. Students were offered a nominal amount of extra credit for accessing the survey. Within the survey, students were randomly presented with eight or nine of the items (dependent upon item randomization) included in the UCL-Engagement and asked to respond using the four-point Likert scale. After each survey item, students were asked to describe why they selected their response value. For each of the 25 engagement items, four interview responses and approximately 100 open-ended written responses were collected.
Quantitative data collection
The survey data presented in this study were collected at Portland State University (PSU) and East Carolina University (ECU) (Table 1). PSU participants included students enrolled in the fall 2021 or fall 2022 first-term general chemistry laboratory course and students enrolled in the winter 2022 or winter 2023 first-term organic chemistry laboratory course. At PSU, the fall 2021 general chemistry laboratory course and all organic chemistry courses employed cookbook-style laboratory activities. In the fall of 2022, the general chemistry laboratory course employed argument-driven inquiry (ADI) style laboratory activities. ECU participants included students enrolled in a first-semester general or organic chemistry laboratory course in the fall of 2021. While two years of data were collected at PSU, data were only collected at ECU in the 2021/2022 academic year. During that year, all general chemistry laboratory courses taught at ECU employed ADI-style laboratory activities, while all organic chemistry courses employed cookbook-style laboratory activities.
All quantitative data described in this study were collected via pre-course and post-course Qualtrics surveys. Participants were recruited to complete each survey via a video announcement that was pre-recorded by the first author (E. B. V.) and presented in each laboratory section. Additionally, the video was posted to each laboratory section's learning management site, along with written recruitment materials and a link to the Qualtrics survey. In the general and organic chemistry teaching laboratories at PSU, the pre- and post-course surveys were treated as assignments, and students were assigned a grade for completing each survey. While each survey was graded for completion, students were able to choose whether or not their data could be used for research purposes. Alternatively, students at ECU were offered a nominal amount of extra credit for accessing each survey. The pre-course survey was focused on measuring students’ pre-course laboratory expectations and was distributed to students during the first two weeks of each laboratory course. In the 2021/2022 academic year, the pre-course survey included all 30 of the original MLLI items, although only the MLLIv2 items will be analyzed in this study. When only a portion of an administered survey is analyzed, it is possible that the function of the analyzed items is related to the items that have been omitted. To ensure that the omitted items were not impacting the data analysis, it is necessary to collect additional evidence of validity and reliability for data collected with the reduced item set via an additional data collection (Hancock et al., 2010; Knekta et al., 2019). Therefore, after the MLLIv2 was established (Vaughan et al., 2024), the pre-course survey administered in the 2022/2023 academic year contained only the 16 MLLIv2 items. While the MLLI, and therefore the MLLIv2, was originally designed to measure students’ pre-course laboratory expectations and post-course laboratory experiences (Galloway and Bretz, 2015a; Vaughan et al., 2024), the study reported in this manuscript is part of a larger project that is focused on investigating the relations among students’ pre-course laboratory expectations and their post-course buy-in and engagement. Therefore, while the post-course MLLIv2 has been shown to produce valid and reliable data (Vaughan et al., 2024), it will not be presented in this study. The post-course survey was distributed to students during the final two weeks of each laboratory course and was focused on investigating students’ laboratory engagement. In order to prevent item-order effects in the survey data, the pre-course and post-course survey items were randomized for each participant. Additionally, a check item was included in each survey to allow for the removal of responses from students who were not fully reading the items. Before analysis of the quantitative data began, incomplete surveys, duplicate participant responses, and responses from participants who incorrectly marked the check item were removed.
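To make these cleaning steps concrete, the following is a minimal sketch in R (the language used for all analyses in this study). The data frame and column names, and the expected check-item response, are hypothetical placeholders rather than the actual variables used in this work.

```r
library(dplyr)

# A sketch of the survey-cleaning steps described above, assuming a raw
# survey export with hypothetical columns "progress" (percent complete),
# "student_id", and "check_item" (with 2 as the instructed response)
cleaned <- raw_responses %>%
  filter(progress == 100) %>%                 # remove incomplete surveys
  distinct(student_id, .keep_all = TRUE) %>%  # remove duplicate participant responses
  filter(check_item == 2)                     # remove failed check-item responses
```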
Table 1 Cleaned student response totals for the MLLIv2 (pre-course) and the UCL-Engagement (post-course). The pedagogical style of each course is denoted as either CB (Cookbook) or ADI (Argument-Driven Inquiry)

| Course | Expectation survey (n) | Engagement survey (n) |
| --- | --- | --- |
| **Portland State University (PSU) 2021/2022** | | |
| General Chemistry (CB) | 192 | 148 |
| Organic Chemistry (CB) | 109 | 93 |
| **Portland State University (PSU) 2022/2023** | | |
| General Chemistry (ADI) | 276 | 226 |
| Organic Chemistry (CB) | 119 | 108 |
| **East Carolina University (ECU) 2021/2022** | | |
| General Chemistry (ADI) | 285 | 163 |
| Organic Chemistry (CB) | 97 | 94 |
| **Totals** | 1078 | 832 |
Analysis methods
Qualitative data analysis
In order to assess evidence of response process validity for data collected with the UCL-Engagement, interview transcripts and written responses to survey items were analyzed to determine if participants were responding to the items as intended. This analysis was conducted by two researchers (E. B. V. and S. T.), who each individually read through all of the participant responses and flagged items that did not appear to be functioning properly. Reasons for flagging items included: (1) the participant's reasoning did not match the value they selected on the response scale, (2) the participant expressed confusion or did not seem to understand the meaning of an item, (3) the participant's explanation indicated that their interpretation of an item was different from its intended meaning, (4) the participant responded that the item was not relevant, and (5) the participant interpreted two or more items as being redundant. Once each of the items had been analyzed by both researchers, they came together to discuss the clarity and relevance of any flagged items in order to come to a consensus.
Quantitative data analysis
Quantitative data analysis methods for this study included: investigation of evidence for internal structure validity via confirmatory factor analysis (CFA), investigation of evidence for single administration reliability via the calculation of McDonald's Omega, investigation of evidence to support the validity of group comparisons via measurement invariance, and, when appropriate, group comparisons via structured means modeling (SMM). All negatively worded survey items were reverse coded before data analysis began.
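As a concrete illustration of the reverse-coding step, the sketch below shows one way it could be implemented in R. The item names are hypothetical placeholders; the recoding formulas follow from each instrument's response scale (0–100 for the MLLIv2 and 1–4 for the UCL-Engagement).

```r
# Reverse code negatively worded items before analysis (hypothetical item names).
# MLLIv2 items use a 0-100 scale, so a reversed score is 100 - x;
# UCL-Engagement items use a 1-4 Likert-type scale, so a reversed score is 5 - x.
mlli_negative <- c("mlli_neg1", "mlli_neg2")  # placeholders for negative MLLIv2 items
eng_negative  <- c("eng_neg1", "eng_neg2")    # placeholders for negative engagement items

survey_data[mlli_negative] <- lapply(survey_data[mlli_negative], function(x) 100 - x)
survey_data[eng_negative]  <- lapply(survey_data[eng_negative],  function(x) 5 - x)
```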
Confirmatory factor analysis
To investigate evidence in support of the structural validity for the data collected with the MLLIv2 and the UCL-Engagement, this study employed confirmatory factor analysis (CFA). All CFAs were completed using the statistical program R (version 4.2.0 (2022-04-22)) with the package lavaan (version 0.6-11). Because the MLLIv2 employs a 0 to 100 item response scale, data collected with the instrument can be treated as continuous. Therefore, a maximum likelihood estimator with Satorra–Bentler adjustment and robust standard errors (MLM) was used (Satorra and Bentler, 1994). Appropriate fit statistics were calculated and interpreted for goodness of the data-model fit, where the guidelines for good fit are CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08, and for adequate fit are CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, and SRMR ≤ 0.10 (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010). Alternatively, the UCL-Engagement utilizes a 4-point Likert-type response scale, which indicates that data collected with this instrument should be treated as categorical. Therefore, a robust diagonally weighted least squares estimator (WLSMV) would provide the most appropriate treatment of the data. While this type of estimator would ideally be used to investigate the factor structure of the dataset, the sample sizes of the comparison groups do not provide sufficient power to use this estimator for measurement invariance testing and latent means comparisons. Therefore, a maximum likelihood estimator with Satorra–Bentler adjustment and robust standard errors, which provides a slightly inferior treatment of the categorical data, was used for all analyses in order to maintain consistency (Li, 2016).
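For readers who wish to replicate this analysis setup, a minimal lavaan sketch of the MLLIv2 CFA is shown below. The item names (pos1–pos9, neg1–neg7) and the data frame name are hypothetical placeholders; the estimator call mirrors the MLM (Satorra–Bentler) approach described above.

```r
library(lavaan)

# Two-factor MLLIv2 model: nine "positive" items and seven "negative" items
# (item names are hypothetical placeholders for the 16 MLLIv2 items)
mlliv2_model <- '
  positive =~ pos1 + pos2 + pos3 + pos4 + pos5 + pos6 + pos7 + pos8 + pos9
  negative =~ neg1 + neg2 + neg3 + neg4 + neg5 + neg6 + neg7
'

# MLM = maximum likelihood with a Satorra-Bentler scaled test statistic
# and robust standard errors
fit <- cfa(mlliv2_model, data = mlli_data, estimator = "MLM")

# Robust fit indices corresponding to the cutoffs discussed in the text
fitMeasures(fit, c("chisq.scaled", "df", "pvalue.scaled",
                   "cfi.robust", "tli.robust", "rmsea.robust", "srmr"))
```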
McDonald's omega
McDonald's omega was used to investigate the single administration reliability of each unidimensional factor included in the MLLIv2 and the UCL-Engagement. Values for omega range from 0 to 1, where a value of 1 indicates that all of the observed variance can be attributed to the latent construct. Therefore, higher omega values provide evidence in support of the internal consistency of the items (McDonald, 1999).
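A sketch of this reliability check using the semTools package follows; it fits a single-factor CFA for one MLLIv2 factor (to evaluate unidimensionality) and then computes reliability coefficients, of which omega is the one reported here. Item and object names are again hypothetical placeholders.

```r
library(lavaan)
library(semTools)

# Single-factor CFA to evaluate the unidimensionality of the positive factor
# (hypothetical item names)
positive_fit <- cfa('positive =~ pos1 + pos2 + pos3 + pos4 + pos5 +
                                 pos6 + pos7 + pos8 + pos9',
                    data = mlli_data, estimator = "MLM")

# reliability() returns several coefficients; the "omega" row corresponds
# to McDonald's omega for the latent factor
reliability(positive_fit)
```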
Measurement invariance
Before the MLLIv2 or the UCL-Engagement can be used to make comparisons between the scores of different respondent groups, measurement invariance must be established to ensure that the data structure is supported at the group level. Establishing measurement invariance requires the investigation of sequential, increasingly constrained steps, with sufficient evidence of data-model fit being established at each step (Rocabado et al., 2020). The first step of measurement invariance is configural invariance, which requires simultaneously testing the unconstrained factor model for each group. If evidence of configural invariance was found, then metric invariance was assessed by constraining the factor loadings across groups. Finally, scalar invariance was evaluated by constraining both the factor loadings and the item intercepts across groups. Each step of measurement invariance was evaluated using the recommended cutoffs for changes in data-model fit between subsequent levels: ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance and ≤ 0.010 for scalar invariance (Chen, 2007).
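The sketch below illustrates this sequence as a multiple-group CFA in lavaan, using the model defined earlier and a hypothetical grouping variable ("course"). Each step adds the constraints described above, and changes in CFI, RMSEA, and SRMR between adjacent steps can then be compared against the Chen (2007) cutoffs.

```r
# Configural: same factor structure in each group, all parameters free
configural <- cfa(mlliv2_model, data = mlli_data,
                  group = "course", estimator = "MLM")

# Metric: factor loadings constrained to be equal across groups
metric <- cfa(mlliv2_model, data = mlli_data,
              group = "course", estimator = "MLM",
              group.equal = "loadings")

# Scalar: loadings and item intercepts constrained to be equal across groups
scalar <- cfa(mlliv2_model, data = mlli_data,
              group = "course", estimator = "MLM",
              group.equal = c("loadings", "intercepts"))

# Collect the fit indices whose changes are evaluated between levels
sapply(list(configural = configural, metric = metric, scalar = scalar),
       fitMeasures, fit.measures = c("cfi.robust", "rmsea.robust", "srmr"))
```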
Structured means modeling
When appropriate evidence of scalar invariance was found for either the MLLIv2 or the UCL-Engagement, structured means modeling (SMM) was used to compare latent mean differences between the two groups. In order to compare latent means, one group is selected as the reference group, while the other group is deemed the comparison group. When the latent mean of the reference group is set to zero, the resulting latent mean of the comparison group is the difference between the two latent means. In this study, the latent means were compared between all general chemistry (assigned as the reference group) and organic chemistry courses aggregated across both institutions. Additionally, the latent means were compared between the cookbook-style general chemistry course at PSU (fall 2021) and the ADI-style general chemistry course at PSU (fall 2022). The effect size of each analysis was calculated as the absolute difference between latent means divided by the square root of the pooled variance. While this calculation aligns with Cohen's d, it is important to note that latent means are considered to be free from measurement error. Therefore, effect size values are expected to be larger than those seen for measured variables.
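A sketch of this latent means comparison in lavaan follows. With scalar constraints imposed and a mean structure estimated, lavaan fixes the latent means of the first group (the reference group) to zero by default, so the estimated latent means of the comparison group are the latent mean differences. The effect size calculation at the end follows the pooled-variance formula described above; the group sizes, latent variances, and mean difference would be extracted from the fitted model, and the specific values shown are illustrative placeholders.

```r
# Structured means model: scalar constraints plus an estimated mean structure.
# Latent means in the reference group are fixed to zero, so the comparison
# group's estimated latent means are the latent mean differences.
smm_fit <- cfa(mlliv2_model, data = mlli_data,
               group = "course", estimator = "MLM",
               meanstructure = TRUE,
               group.equal = c("loadings", "intercepts"))

# Latent mean differences appear among the comparison group's estimates
parameterEstimates(smm_fit)

# Effect size: |latent mean difference| / sqrt(pooled latent variance),
# analogous to Cohen's d (all values below are illustrative placeholders)
n1 <- 753; n2 <- 325      # group sample sizes
var1 <- 460; var2 <- 460  # hypothetical latent variances from the fitted model
mean_diff <- -3.025       # hypothetical latent mean difference
pooled_var  <- ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
effect_size <- abs(mean_diff) / sqrt(pooled_var)
```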
Results and discussion
As two measures are presented within this manuscript, the following sections first address the evidence supporting the MLLIv2. Evidence in support of the UCL-Engagement measure follows.
Evidence of validity and reliability for data collected with the MLLIv2
Evidence of structural validity and single administration reliability
Evidence of validity and reliability for data collected with the MLLIv2, including response process validity, structural validity, and single administration reliability, has been previously reported by the authors of this manuscript (Vaughan et al., 2024). Additional evidence in support of structural validity and single administration reliability for the environments of interest to this study can be found in Table 2. Standardized factor loadings for individual items can be found in the Appendix, Tables 15 and 16.
Table 2 Data-model fit statistics and omega values for the MLLIv2 (n = 1078)

| Model | χ² (df) | p-value | CFI | TLI | RMSEA [90% CI] | SRMR | Omega |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Complete model** | | | | | | | |
| Two-factor | 448.567 (103) | <0.001 | *0.940* | *0.930* | **0.056 [0.051–0.061]** | **0.053** | — |
| **Single factors** | | | | | | | |
| Positive | 145.055 (27) | <0.001 | *0.940* | *0.920* | *0.064 [0.055–0.072]* | **0.036** | 0.85 |
| Negative | 67.279 (14) | <0.001 | **0.981** | **0.972** | **0.059 [0.048–0.072]** | **0.022** | 0.90 |

Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
Evidence in support of structural validity for data collected with the MLLIv2 was assessed via CFA. The expected two-factor structure (positive expectations that contribute to meaningful learning and negative expectations that detract from meaningful learning) was found to have adequate-to-good data-model fit. Once evidence in support of structural validity was established for the full two-factor model, evidence in support of single administration reliability was investigated. First, a single-factor CFA was conducted for each of the MLLIv2 factors to evaluate their unidimensionality. Evidence of adequate-to-good data-model fit was found for the positive factor, and evidence of good data-model fit was found for the negative factor. Once each of the two factors was found to be unidimensional, McDonald's omega was calculated for each factor. Values of 0.85 and 0.90 were found for the positive and negative factors, respectively. These values provide evidence in support of the internal consistency of the items that make up each unidimensional factor of the MLLIv2. Combined, the factor structure and single administration reliability support the reporting of positive and negative expectation scores from the data collected in this study.
Measurement invariance testing
Before students’ expectations could be compared by course level (i.e., general chemistry vs. organic chemistry) and pedagogical style (i.e., cookbook vs. ADI), evidence of consequential validity was assessed through measurement invariance testing. Evidence of scalar invariance was found for each of the comparisons, providing support for the comparison of latent group means via SMM. Data-model fit statistics and their differences among each level of measurement invariance testing can be found in Table 3 (Course Level Comparisons) and Table 4 (Pedagogical Style Comparisons).
Table 3 Data-model fit statistics for measurement invariance by course for the MLLIv2

**Baseline data-model fit by group**ᵃ

| Group | χ² (df) | p-value | CFI | SRMR | RMSEA |
| --- | --- | --- | --- | --- | --- |
| General chemistry (n = 753) | 337.525 (103) | <0.001 | *0.940* | **0.055** | **0.055** |
| Organic chemistry (n = 325) | 239.416 (103) | <0.001 | *0.929* | **0.064** | *0.064* |

**Invariance data-model fit by level**ᵇ

| Testing level | χ² (df) | p-value | CFI | SRMR | RMSEA | ΔCFI | ΔSRMR | ΔRMSEA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Configural | 578.730 (206) | <0.001 | 0.936 | 0.058 | 0.058 | — | — | — |
| Metric | 595.397 (220) | <0.001 | 0.936 | 0.060 | 0.056 | **0.000** | **0.002** | **0.002** |
| Scalar | 670.599 (234) | <0.001 | 0.928 | 0.062 | 0.059 | **−0.008** | **0.002** | **0.003** |

ᵃ Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
ᵇ Bold values indicate changes in data-model fit between subsequent levels that met the suggested cutoff criteria (ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance and ≤ 0.010 for scalar invariance) (Chen, 2007).
Table 4 Data-model fit statistics for measurement invariance by pedagogical style for the MLLIv2

**Baseline data-model fit by group**ᵃ

| Group | χ² (df) | p-value | CFI | SRMR | RMSEA |
| --- | --- | --- | --- | --- | --- |
| Cookbook (n = 148) | 204.875 (103) | <0.001 | 0.890 | **0.073** | *0.072* |
| ADI (n = 226) | 264.179 (103) | <0.001 | *0.902* | **0.072** | *0.075* |

**Invariance data-model fit by level**ᵇ

| Testing level | χ² (df) | p-value | CFI | SRMR | RMSEA | ΔCFI | ΔSRMR | ΔRMSEA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Configural | 468.177 (206) | <0.001 | 0.891 | 0.073 | 0.074 | — | — | — |
| Metric | 479.287 (220) | <0.001 | 0.894 | 0.071 | 0.075 | **0.003** | **0.002** | **0.001** |
| Scalar | 520.680 (234) | <0.001 | 0.885 | 0.077 | 0.072 | **−0.009** | **0.006** | **−0.003** |

ᵃ Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
ᵇ Bold values indicate changes in data-model fit between subsequent levels that met the suggested cutoff criteria (ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance and ≤ 0.010 for scalar invariance) (Chen, 2007).
Latent means comparisons
Structured means modeling was used to conduct latent means comparisons between students enrolled in first-term general chemistry and first-term organic chemistry (Table 5). When students’ positive expectations were compared, no significant difference was found. When students’ negative expectations were compared, it was found that organic chemistry students scored lower on the measure with a small effect size. Because all negatively worded items were reverse coded for analysis, this result indicates that organic chemistry students are more likely to report negative expectations than students in general chemistry at the beginning of each course. Because many of the negative expectation items are related to negative affective experiences (e.g., being confused, disorganized, frustrated, or intimidated), this difference may be attributed to the fact that students often perceive organic chemistry to be more difficult than general chemistry (Wasacz, 2010).
Table 5 Latent means comparisons between responses from general chemistry (n = 753, reference group) and organic chemistry (n = 325, comparison group) students on the MLLIv2. Bolded values indicate a significant effect

| Measure | Latent mean difference (effect size) |
| --- | --- |
| Positive expectations | 1.020 (n/a) |
| Negative expectations | **−3.025 (0.14)** |
In order to control for as many variables as possible, latent means comparisons between pedagogical styles (Table 6) were only conducted in the case where course level (general chemistry) and institution (PSU) were the same between the two groups. When students’ negative expectations were compared by the pedagogical style of their laboratory course, no significant difference was found between the cookbook-style course and the ADI-style course. Alternatively, when students’ positive expectations were compared by the pedagogical style of their laboratory course, it was found that students enrolled in the ADI-style course scored higher on the measure than students enrolled in the cookbook-style course with a small effect size. Because students’ expectations were surveyed at the beginning of each course (i.e., before students’ expectations were theoretically impacted by their laboratory experiences), this result is unexpected. Without additional data, it is difficult to identify the reason for this difference. Possible factors which could contribute to this difference include the timing of survey distribution and COVID-19-related influences. While students were required to complete the preliminary expectation survey before their second lab meeting, it is possible that students’ experiences in the first week of the course impacted their responses to the survey items. Because the first-week activities differed between the cookbook-style and ADI-style labs, these activities may have had different impacts on the expectations of the students within each course. For example, on the first day of all ADI-style labs, students were presented with a brief overview of the ADI instructional model (Walker et al., 2011) and the positive impacts that ADI-style laboratory courses have had on student learning experiences (Walker et al., 2012; Walker and Sampson, 2013; Hosbein et al., 2021). No such unpacking of the course pedagogical style was provided in the cookbook-style laboratory courses. Additionally, it is worth noting that student data from the cookbook-style course were collected in fall 2021. At PSU, this was many students’ first in-person term following the university-wide closure caused by the COVID-19 pandemic. It is possible that students in this year were less likely to hold positive expectations for their laboratory experiences due to their unconventional learning experiences in the prior year.
Table 6 Latent means comparisons between responses from cookbook-style (n = 148, reference group) and ADI-style (n = 226, comparison group) students on the MLLIv2. Bolded values indicate a significant effect

| Measure | Latent mean difference (effect size) |
| --- | --- |
| Positive expectations | **4.048 (0.23)** |
| Negative expectations | −0.578 (n/a) |
Evidence of validity and reliability for data collected with the UCL-engagement
Response process validity
Before any quantitative data collected with the UCL-Engagement were analyzed, response process data were used to investigate if participants were responding to the items as intended. While the wording of the items included in this instrument is not specific to any one chemistry laboratory course, the instrument was originally designed to investigate general chemistry students’ engagement in individual laboratory activities. For this reason, it was especially important to investigate the ways in which organic chemistry students responded to the survey items. Analysis of response process interviews and free response items indicated that students enrolled in both general chemistry and organic chemistry laboratory courses interpreted and responded to each of the 25 items included in the engagement measure as expected (Table 7).
Table 7 General and organic chemistry student responses which provide evidence of response process validity for data collected with the UCL-Engagement. Response values ranged from strongly disagree (1) to strongly agree (4)

| Survey item | General chemistry: value | General chemistry: reasoning | Organic chemistry: value | Organic chemistry: reasoning |
| --- | --- | --- | --- | --- |
| I participated fully when using the lab equipment and instruments during the lab procedures. | 4 | “I did participate fully when using the lab equipment and instruments during the lab procedures. I don't think there was one time I didn't actively and fully participate.” | 4 | “I did my best to use equipment properly and efficiently while in in-person labs.” |
| I put a lot of effort into following the lab procedures properly. | 4 | “I am very careful about following the lab procedures properly, I don't like to or want to mess up the procedure because I didn't do something properly. There isn't a way you can skip a step in the procedure, you have to do everything in order to produce accurate results and an accurate lab.” | 4 | “I believe I followed the lab procedure since I had to read it several times to write my procedures in lab reports.” |
| I participated fully in using and/or preparing the chemicals during the lab procedures. | 3 | “I did participate in using and/or preparing the chemicals during lab procedures, but typically we would prepare the chemicals during the lab procedures collectively as a group.” | 4 | “I did my best to use/prepare chemicals as properly and efficiently as possible while in in-person labs.” |
| I participated fully when using the glassware during the lab procedures. | 4 | “I participated fully when managing glassware during the lab process as glassware is mandatory to use in most of the labs.” | 4 | “I did my best to use and be careful with glassware.” |
| I felt interested when using the lab equipment and instruments during the lab procedures. | 2 | “I didn't find using the lab equipment that interesting. It was more hard and confusing than interesting.” | 3 | “It was interesting to use real scientific equipment to perform experiments.” |
| I found it interesting to use and/or prepare the chemicals during the lab procedures. | 4 | “I found it interesting to use some of the chemicals as it led to some interesting reactions as results.” | 3 | “I [was] very interested to prepare the chemicals.” |
| I found it interesting to use the glassware during the lab procedures. | 3 | “I thought that the various flasks and glassware used within the experiments were interesting.” | 2 | “It was fun to perform experiments, not necessarily to use glassware.” |
| I felt interested when following the laboratory procedures. | 2 | “I find procures boring but they are essential to be successful.” | 3 | “Some of the lab procedures were interesting, especially for topics that I could understand fully.” |
| I felt nervous about using the glassware during the lab procedures. | 1 | “I selected strongly disagree because I feel comfortable handling glassware.” | 4 | “I am very clumsy and prefer not to handle glass or breakable things.” |
| I felt unsure when following the lab procedures. | 3 | “Lab procedures were straight forward, however there were some gaps in the manual.” | 2 | “Some parts were easy to understand while some needed another once over but I never felt like I did not get the concept at all.” |
| I felt insecure about using the lab equipment and instruments during the lab procedures. | 1 | “I feel confident that I can use the equipment safely.” | 1 | “I know how to use a lot of the lab materials.” |
| I felt nervous when using and/or preparing the chemicals during the lab procedures. | 1 | “No I did not because … the lab manuals did a great job of explaining [the procedures].” | 2 | “No, I think the instructions [for preparing chemicals] are pretty clear.” |
| I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another. | 3 | “I definitely put effort in to trying to figure out what was going on. However, a lot of the time it didn't make a lot of sense when it was just, ‘Do this.’ type stuff rather than any actual reasoning for why we are doing it.” | 4 | “I strongly agreed because in order to excel in this class you need to understand how each piece of the lab interacts with each-other.” |
| I tried hard to understand why specific lab equipment and instruments were used during the lab procedures. | 4 | “I felt it was necessary in appropriately and fully answering lab questions. It helped me understand the procedures and methodologies used.” | 4 | “I put strongly agree because understanding the equipment and instruments is key to understanding how and why the lab functions the way it does.” |
| I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures. | 2 | “Not really. I just followed the instructions and didn't think much about the chemicals.” | 3 | “I did try to do this, but not always successfully. The labs where we needed to draw out the reaction mechanisms really helped me with this aspect.” |
| I tried hard to understand why specific glassware was used during the lab procedures. | 2 | “While I can think of some circumstances where certain glassware is better than others It was not a detail I found myself needing to pay attention to for the purposes of these labs.” | 1 | “I did not pay close attention to the glassware. I felt as if it could all be interchangeable.” |
| I felt insecure about the data/observations I was collecting during the lab. | 3 | “Sometimes I was unsure if the data I was collecting was correct because I did not know roughly what data I should be getting.” | 2 | “Overall I felt fine about data I collected, however I feel in some cases if you are physically doing the lab it makes it easier to understand what data you should be collecting since you are physically going through the set up and taking the time to understand what you should be observing.” |
| I felt unsure about accurately recording the data from the instruments, equipment, glassware, and my observations. | 3 | “I often felt unsure of my measurements. Especially when it came to weights because of the [scale] fluctuations between different values.” | 2 | “Significant figures and keeping track of units is difficult sometimes.” |
| I felt worried about collecting all the data/observations before the end of the lab. | 1 | “Running out of time never felt like a concern to me.” | 1 | “We have more than enough time in the lab class to ask for clarification… I did not feel worried about collecting data before the end of class.” |
| I put a lot of effort into understanding and evaluating the data/observations as I was collecting them during the lab. | 4 | “It is important to understand the data collected because it must be evaluated during the lab report. So, understanding the data/observations is important for understanding and for lab report production.” | 3 | “[Y]ou need to truly understand all the data you collect if you're going to be able to write a report on it. Therefore I did a relatively good job at making sure that at least everything I observed/noted made sense.” |
| I put a lot of effort into thinking about techniques to help me accurately record the data from the instruments, equipment, glassware, and my observations. | 4 | “I worked hard to collect correct data because if I do not have accurate data, then the results of my lab will be messed up and it will be much harder later.” | 4 | “Yes, as I had never utilized some of the equipment before I had to really consider what was occurring and why.” |
| I put a lot of effort into planning how I would collect all the data/observations before the end of the lab. | 4 | “I would read the instructions and see how its all done before starting, too annoying to redo all the data [collection]!” | 4 | “Yes because I want to make sure my data is a accurate as possible since they are tied to my grade. I also want to see what mistakes I have made and try to correct them for future labs.” |
| I tried hard to connect the lab experiment to the theory from lecture. | 2 | “I didn't try very hard to do this. Sometimes it happened naturally, other times I just followed instructions.” | 4 | “I do try hard to connect the experiment to theory from lecture because understanding the lab helps me comprehend concepts from lecture that I may not completely understand.” |
| I put a lot of effort into thinking about real world applications and connections to the lab experiment. | 1 | “I don't put a lot of thought into chemistry in the real world” | 3 | “I'm always fascinated by how the real apply these concepts learned, so I usually look up where these experiments used in the real world.” |
| I tried hard to understand the lab concepts rather than just memorizing them. | 4 | “Memorizing the material won't do me any good and so I try my best to understand and connect with the ideas.” | 1 | “I completely memorize over understand as it's not knowledge I need in the real world. This is just knowledge I need to get a degree.” |
For example, when asked to respond to the item ‘I felt nervous about using the glassware during the lab procedures,’ one of the general chemistry students selected a Likert-scale value of 1 (strongly disagree). When asked to explain their reasoning for selecting strongly disagree, the student said, “I selected strongly disagree because I feel comfortable handling glassware.” Alternatively, one of the organic chemistry students selected a Likert-scale value of 4 (strongly agree) for the same item. The reasoning that this student provided was: “I am very clumsy and prefer not to handle glass or breakable things.” The clarity in item interpretation, combined with the consistency in use of the response scale observed in this portion of the study, provides evidence that these items function well with both the general and organic chemistry students in this study.
Internal structure validity
Evidence in support of structural validity for data collected with the UCL-Engagement was assessed via CFA. The survey developers originally investigated the factor structure of the instrument via exploratory factor analysis (Smith and Alonso, 2020). Based on this analysis, the expected structure of the instrument includes six correlated factors: Cognitive Engagement in Data and Overall, Negative Emotional Engagement in Lab Procedures, Positive Emotional Engagement in Lab Procedures, Behavioral Engagement in Lab Procedures, Cognitive Engagement in Lab Procedures, and Negative Emotional Engagement in Data Collection (Fig. 1). While evidence of adequate-to-good data-model fit was found for the six-factor correlated model when it was investigated via CFA using the complete aggregated set of student data (n = 832, Table 8), the correlations for two pairs of factors were found to be greater than 0.85.
Fig. 1 The expected factor structure of the UCL-Engagement. Red and blue factor correlations indicate factors which may not be measuring unique components of engagement.
Table 8 Data-model fit statistics for the UCL-Engagement (n = 832)

| Model | χ² (df) | p-value | CFI | TLI | SRMR | RMSEA [90% CI] |
| --- | --- | --- | --- | --- | --- | --- |
| Six-factor correlated | 752.311 (260) | <0.001 | *0.939* | *0.930* | **0.052** | **0.048 [0.044–0.051]** |

Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
Correlation values close to one suggest that the two correlated factors may be measuring the same construct. The two highly correlated factor pairs were: (1) Cognitive Engagement in Data and Overall and Cognitive Engagement in Lab Procedures (0.860), and (2) Negative Emotional Engagement in Lab Procedures and Negative Emotional Engagement in Data Collection (0.875). Because the factors within each highly correlated pair share a theoretical component of engagement (i.e., cognitive engagement and emotional engagement, respectively), it was decided to explore a four-factor correlated model for the UCL-Engagement, in which each pair of highly correlated factors was collapsed into a single factor (Fig. 2). Within the four-factor correlated model, the largest correlation (0.680) was found to be between cognitive engagement and positive emotional engagement. While this correlation is not large enough to raise concern that these two factors are measuring the same latent construct, it is an interesting outcome. Most education researchers agree that the three dimensions of engagement (i.e., cognitive, emotional, and behavioral) are highly interrelated (Fredricks et al., 2004, 2016). Specifically, Sinatra and colleagues argue that cognitive engagement can be particularly difficult to define and measure because “many of the dimensions of cognitive engagement overlap with dimensions of both behavioral engagement (i.e., effort) and emotional engagement” (Sinatra et al., 2015). One possible explanation for the moderate correlation between cognitive and emotional engagement seen in this study is the idea that positive and negative emotions can be either activating or deactivating in regard to engagement (Pekrun, 2006; Sinatra et al., 2015; Naibert et al., 2022). As Sinatra and colleagues argue, “[t]heoretically, both negative and positive emotions can facilitate activation of attention and engagement; however, research to date has shown an advantage for positive emotions over negative in promoting engagement” (Broughton et al., 2013; Heddy and Sinatra, 2013; Sinatra et al., 2015). Ultimately, evidence of adequate-to-good data-model fit was found for the four-factor correlated model (Table 9). Standardized factor loadings for individual items can be found in the Appendix, Tables 17–19.
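The latent factor correlations that motivated this decision can be read directly from a fitted lavaan model, as in the brief sketch below (the fitted-model object name is a placeholder for the six-factor CFA described above).

```r
library(lavaan)

# Model-implied correlations among the latent engagement factors;
# values approaching 1 suggest that two factors may be measuring
# the same underlying construct
lavInspect(six_factor_fit, what = "cor.lv")
```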
Fig. 2 The condensed factor structure of the UCL-Engagement.
Table 9 Data-model fit statistics for the UCL-Engagement (n = 832)

| Model | χ² (df) | p-value | CFI | TLI | SRMR | RMSEA [90% CI] |
| --- | --- | --- | --- | --- | --- | --- |
| Four-factor correlated | 873.316 (269) | <0.001 | *0.925* | *0.917* | **0.054** | **0.052 [0.049–0.055]** |

Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
The values of the data-model fit statistics investigated in this study were found to be approximately equivalent between the six-factor model and the four-factor model. Therefore, the decision was made to move forward with the four-factor correlated model. It is worth noting that appropriate values of data-model fit were found for both the six-factor model and the four-factor model, despite the use of a maximum likelihood estimator. Had the sample sizes been sufficient to support a more suitable estimator for categorical data, such as a robust diagonally weighted least squares estimator, for all analyses, the data-model fit would likely have been further strengthened.
Single administration reliability
Evidence in support of single administration reliability for data collected with the UCL-Engagement was assessed via CFA and the calculation of McDonald's omega (Table 10). First, a single-factor CFA was conducted for each of the four factors to evaluate their unidimensionality. McDonald's omega was then calculated for each of the four factors, and acceptable values of single administration reliability (≥0.78) were found. These values provide evidence in support of the internal consistency of the items that make up each unidimensional factor included in the UCL-Engagement. Combined, the factor structure and single administration reliability support the reporting of four separate engagement scores from the data collected in this study.
Table 10 Data-model fit statistics and omega values for the individual factors included in the UCL-Engagement (n = 832)

| Factor | χ² (df) | p-value | CFI | TLI | SRMR | RMSEA [90% CI] | Omega |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cognitive engagement | 287.648 (35) | <0.001 | *0.900* | 0.871 | **0.052** | 0.093 [0.085–0.102] | 0.89 |
| Negative emotional engagement | 109.441 (14) | <0.001 | *0.945* | *0.917* | **0.037** | 0.091 [0.077–0.104] | 0.87 |
| Positive emotional engagement | 2.122 (2) | 0.346 | **1.000** | **1.000** | **0.006** | **0.009 [0.000–0.067]** | 0.84 |
| Behavioral engagement | 2.716 (2) | 0.257 | **0.999** | **0.997** | **0.009** | **0.040 [0.000–0.145]** | 0.78 |

Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
Measurement invariance testing
Evidence of scalar invariance was found for the group comparisons by course level (i.e., general chemistry vs. organic chemistry) and pedagogical style (i.e., cookbook vs. ADI), providing support for the comparison of latent group means via SMM. Data-model fit statistics and their differences among each level of measurement invariance testing can be found in Table 11 (Course Level Comparisons) and Table 12 (Pedagogical Style Comparisons).
Table 11 Data-model fit statistics for measurement invariance by course for the UCL-Engagement

Baseline data-model fit by group:a
| Group | χ² (df) | p-value | CFI | SRMR | RMSEA |
| General chemistry (n = 537) | 693.887 (269) | <0.001 | 0.923 | 0.058 | 0.054 |
| Organic chemistry (n = 295) | 470.250 (269) | <0.001 | 0.922 | 0.057 | 0.050 |

Invariance data-model fit by level:b
| Testing level | χ² (df) | p-value | CFI | SRMR | RMSEA | ΔCFI | ΔSRMR | ΔRMSEA |
| Configural | 1164.954 (538) | <0.001 | 0.923 | 0.058 | 0.053 | — | — | — |
| Metric | 1212.450 (559) | <0.001 | 0.919 | 0.062 | 0.053 | −0.004 | 0.004 | 0.000 |
| Scalar | 1309.828 (580) | <0.001 | 0.911 | 0.063 | 0.055 | −0.008 | 0.001 | 0.002 |

a Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
b Bold values indicate changes in data-model fit between subsequent levels that met the suggested cutoff criteria (ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance and ≤ 0.010 for scalar invariance) (Chen, 2007).
Table 12 Data-model fit statistics for measurement invariance by pedagogical style for the UCL-Engagement

Baseline data-model fit by group:a
| Group | χ² (df) | p-value | CFI | SRMR | RMSEA |
| Cookbook (n = 148) | 477.230 (269) | <0.001 | 0.880 | 0.081 | 0.072 |
| ADI (n = 226) | 429.307 (269) | <0.001 | 0.925 | 0.071 | 0.051 |

Invariance data-model fit by level:b
| Testing level | χ² (df) | p-value | CFI | SRMR | RMSEA | ΔCFI | ΔSRMR | ΔRMSEA |
| Configural | 909.447 (538) | <0.001 | 0.904 | 0.061 | 0.075 | — | — | — |
| Metric | 934.982 (559) | <0.001 | 0.902 | 0.066 | 0.080 | −0.002 | 0.005 | 0.005 |
| Scalar | 999.760 (580) | <0.001 | 0.893 | 0.062 | 0.082 | −0.009 | −0.004 | 0.002 |

a Italic values indicate goodness-of-fit values that met the suggested cutoff criteria for adequate fit (CFI & TLI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.10); bold values indicate goodness-of-fit values that met the suggested cutoff criteria for good fit (CFI & TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) (Browne and Cudeck, 1993; Hu and Bentler, 1999; Marsh et al., 2004; Kline, 2005; Hair et al., 2010; Schweizer, 2010).
b Bold values indicate changes in data-model fit between subsequent levels that met the suggested cutoff criteria (ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance and ≤ 0.010 for scalar invariance) (Chen, 2007).
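The decision rule applied in Tables 11 and 12 can be summarized in a few lines of code. The sketch below treats the Chen (2007) cutoffs as magnitude checks, a simplification of the directional criteria; the example values are the course-level deltas from Table 11.

```python
# Minimal sketch of the Chen (2007) change-in-fit criteria used in
# Tables 11 and 12 (treating cutoffs as magnitudes for simplicity):
# |dCFI| <= 0.010, |dRMSEA| <= 0.015, and |dSRMR| <= 0.030 for metric
# invariance or <= 0.010 for scalar invariance.
def supports_invariance(level, d_cfi, d_srmr, d_rmsea):
    srmr_cut = 0.030 if level == "metric" else 0.010
    return (abs(d_cfi) <= 0.010
            and abs(d_srmr) <= srmr_cut
            and abs(d_rmsea) <= 0.015)

# Course-level comparison deltas from Table 11
print(supports_invariance("metric", -0.004, 0.004, 0.000))  # True
print(supports_invariance("scalar", -0.008, 0.001, 0.002))  # True
```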
Latent means comparisons
Structured means modeling was used to compare the latent means of students enrolled in first-term general chemistry and first-term organic chemistry (Table 13). No significant difference was found when Cognitive Engagement was compared between groups. When students' Negative Emotional Engagement, Positive Emotional Engagement, and Behavioral Engagement were compared, organic chemistry students scored higher on each of these measures, with small effect sizes. These results indicate that organic chemistry students are more likely than general chemistry students to report positive emotional engagement and behavioral engagement at the end of their course. Because the negatively worded items were reverse coded for analysis, these results also indicate that organic chemistry students are less likely than general chemistry students to report negative emotional engagement at the end of their course. Some of the differences between general and organic chemistry students' reported levels of engagement may be due to the fact that organic chemistry students have already completed a year of general chemistry lab and therefore have some experience in the laboratory. Additionally, not all students who take general chemistry continue on to organic chemistry; therefore, these differences may be related to students' interest levels and/or previous success in chemistry courses.
Table 13 Latent means comparisons between responses from general chemistry (n = 537, reference group) and organic chemistry (n = 295, comparison group) students on the UCL-Engagement. Bolded values indicate a significant effect

| Measure | Latent mean difference (effect size) |
| Cognitive engagement in data and overall | 0.071 (n/a) |
| Negative emotional engagement in lab procedures | 0.126 (0.20) |
| Positive emotional engagement in lab procedures | 0.170 (0.27) |
| Behavioral engagement in lab procedures | 0.191 (0.37) |
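The manuscript does not spell out the effect size formula behind Table 13, so the sketch below illustrates one common convention for latent mean differences: dividing the difference by the pooled latent standard deviation, in the spirit of the standardized latent effect sizes discussed by Hancock and colleagues (2010). The factor variance values used here are hypothetical and chosen only to show the arithmetic.

```python
# Minimal sketch of a standardized latent effect size: the latent mean
# difference divided by the pooled latent standard deviation. This is
# one common convention, not necessarily the exact formula used for
# Table 13; the factor variances below are hypothetical.
import math

def latent_effect_size(mean_diff, var_ref, var_comp, n_ref, n_comp):
    pooled_var = (((n_ref - 1) * var_ref + (n_comp - 1) * var_comp)
                  / (n_ref + n_comp - 2))
    return mean_diff / math.sqrt(pooled_var)

# A latent mean difference of 0.126 (Table 13) with hypothetical pooled
# factor variances of 0.40 in each group gives an effect size near 0.20.
print(round(latent_effect_size(0.126, 0.40, 0.40, 537, 295), 2))  # 0.2
```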
As described for the MLLIv2, latent means comparisons between pedagogical styles (Table 14) were only conducted in the case where course level (general chemistry) and institution (PSU) were the same between the two groups. When students’ Cognitive Engagement, Negative Emotional Engagement, and Behavioral Engagement were compared between pedagogical groups, no significant differences were found. When students’ Positive Emotional Engagement was compared by the pedagogical style of their laboratory course, it was found that students enrolled in the ADI-type course scored higher on the measure than students enrolled in the cookbook-type course with a small effect size. One possible reason for this difference could be attributed to the design of the ADI-style laboratory activities. Within ADI-style activities, students are provided “with an opportunity to develop their own method[s] to generate data” (Walker et al., 2011). This level of autonomy in data collection may lead to students enrolled in ADI-type courses reporting higher levels of positive emotional engagement in lab procedures.
Table 14 Latent means comparisons between responses from cookbook-style (n = 148, reference group) and ADI-style (n = 226, comparison group) students on the UCL-Engagement. Bolded values indicate a significant effect

| Measure | Latent mean difference (effect size) |
| Cognitive engagement in data and overall | −0.003 (n/a) |
| Negative emotional engagement in lab procedures | 0.005 (n/a) |
| Positive emotional engagement in lab procedures | 0.116 (0.18) |
| Behavioral engagement in lab procedures | 0.056 (n/a) |
Conclusions
The purpose of this study was to investigate the validity and reliability evidence for data collected with the MLLIv2, a measure of students’ expectations, and the UCL-Engagement, a measure of student engagement, in the undergraduate chemistry laboratory. This evidence was then used to support the evaluation of these constructs within different course levels (general and organic chemistry) and pedagogical styles (cookbook and ADI) and the comparison of each construct across course environments. To address research question one, What evidence of validity and reliability supports interpreting data collected with the MLLIv2 and the UCL-Engagement in the environments of interest to this study?, evidence related to response process validity, internal structure validity, and single administration reliability was investigated for each of the two measures. Appropriate evidence was found to support each of the types of validity and reliability for data collected with each of the two measures of interest. Additionally, evidence of consequential validity was assessed through measurement invariance. Evidence of scalar invariance was found for each of the comparisons (general vs. organic chemistry and cookbook vs. ADI), providing support for the comparison of latent means between the groups.
Because appropriate evidence of scalar invariance was found, research question two was addressed: If sufficient evidence of validity and reliability is found to support the interpretation of data collected with each measure, how do students' laboratory expectations and engagement differ between courses and pedagogical styles? Differences in students' expectations and engagement were found based on course level (general vs. organic chemistry) and pedagogical style (cookbook vs. ADI). Overall, organic chemistry students were found to have higher negative expectations at the beginning of their laboratory course than general chemistry students. Organic chemistry students were also found to have higher positive emotional engagement and behavioral engagement, and lower negative emotional engagement, than general chemistry students. Additionally, students enrolled in an ADI-style general chemistry course were found to have higher positive emotional engagement than students enrolled in a cookbook-style general chemistry course. However, because students' expectations are theorized to impact their levels of engagement (Galloway and Bretz, 2015b), and students' incoming levels of positive expectations were also found to differ between these two groups, this difference should be interpreted with caution. Work currently in review by authors E. B. V. and J. B. explores the relations between these constructs.
Limitations
It is possible that the results of this study were impacted by the Covid-19 shutdown. For example, the qualitative portion of the study was completed in virtual laboratory courses which used online laboratory simulations (during the Covid-19 shutdown). While this delivery method is atypical for the laboratory courses of interest to this study, the qualitative data collection was focused on students' interpretation and understanding of survey items related to expectations and engagement, not their reported expectations or engagement. Therefore, this data collection still provided useful insight for this study, despite the atypical course delivery method. The quantitative data collection was conducted during the first and second academic years after students returned to in-person courses. The authors acknowledge that students' expectations and engagement may have been influenced by the rapid transitions in learning format, especially for in-person organic chemistry students who completed their general chemistry laboratory course online and therefore had little or no hands-on laboratory experience when entering organic chemistry.
Implications for research
This project has a variety of implications for chemistry education researchers. First and foremost, this study provides additional evidence in support of the validity and reliability of data collected with two self-report surveys related to students’ laboratory expectations and laboratory engagement (i.e., MLLIv2 and UCL-Engagement, respectively). These survey instruments may be of use to chemistry education researchers who are interested in further investigating laboratory courses. That said, researchers who are interested in using these measures in novel environments are encouraged to collect additional evidence of validity and reliability as appropriate for their research questions. For example, because these survey instruments were originally developed and subsequently adapted and/or tested in general and organic laboratory environments in the United States, the existing items may not be appropriate for international chemistry laboratory courses, which may vary in purpose and/or approach. Researchers who are interested in using either of these instruments should critically consider the content of the items to ensure that these measures are appropriate for their environments.
Second, this study is novel in its comparison of students' expectations and engagement across pedagogical styles (cookbook vs. ADI). That said, the results reported in this study are limited to a single institution. Therefore, additional data collections across a variety of institutions could add further evidence to this line of research exploring the potential differences in students' affective learning experiences by pedagogical style. Additionally, it is well known that argument-driven inquiry is not the only pedagogical alternative to cookbook style laboratory activities; a wide variety of inquiry-based and research-based laboratory activities have been reported across the STEM education literature (Weaver et al., 2008; Beck et al., 2014; Agustian et al., 2022). Using the MLLIv2 and the UCL-Engagement to investigate laboratory courses which employ different pedagogical styles could provide education researchers and practitioners with a better understanding of students' experiences within a range of learning environments, and how they compare to one another.
Finally, the validity and reliability evidence collected in this study sets the groundwork for investigating the relations between students’ expectations and engagement, and possible reasons for differences between groups. However, before a student can engage in a laboratory course, they must make the decision to do so. This decision to engage has been described and studied in STEM education literature as ‘buy-in’ (Cavanagh et al., 2016; Shaw et al., 2019; Wang et al., 2021). While the relations among students’ expectations, buy-in, and engagement have been theorized, chemistry education researchers have not yet meaningfully investigated the relations among these constructs in the laboratory environment. That said, a related study has been published investigating the relations among students’ perspectives (trust in instructors and growth mindset), buy-in, and engagement as they relate to evidence-based teaching practices (EBPs) in undergraduate STEM lecture courses (Wang et al., 2021). This study resulted in “a framework that depicts how trust and growth mindset increase buy-in to EBPs, which positively influences student outcomes such as engagement in self-regulatory learning strategies…” The existence of these relations among student perspectives, buy-in, and engagement in STEM lecture courses provides further support that similar relations may exist among these constructs in the laboratory environment.
Implications for teaching
This study provides additional evidence of validity and reliability for data collected with a measure of students' laboratory expectations (i.e., MLLIv2) and a measure of students' laboratory engagement (i.e., UCL-Engagement). Using these tools to collect feedback related to students' affective learning experiences could provide instructors with an improved understanding of students' perspectives within their own laboratory courses. This insight is especially useful for instructors who are interested in gauging the impact of pedagogical changes that they make to their own laboratory courses.
Additionally, the group comparisons conducted in this study found that students enrolled in an ADI-style general chemistry course reported higher positive expectations and higher positive emotional engagement in lab procedures. Although these results should be interpreted with caution, given the unexpected differences in students' incoming expectations, they further support the extensive body of research (Weaver et al., 2008; Sampson and Walker, 2012; Walker et al., 2012; Beck et al., 2014) suggesting that inquiry-style laboratory courses generally provide students with better learning experiences than cookbook-style laboratory courses.
Data availability
The data are not publicly available as approval for this study did not include permission for sharing data publicly.
Conflicts of interest
There are no conflicts to declare.
Appendix
Table 15 Standardized factor loadings for the MLLIv2 (n = 1078)

Item stem: When performing experiments in my chemistry laboratory course this semester, I expect…

| Item | Statement | Standardized factor loading |

Positive items
| 1 | To learn chemistry that will be useful in my life. | 0.622 |
| 2 | To make decisions about what data to collect. | 0.505 |
| 3 | To experience moments of insight. | 0.657 |
| 4 | To be excited to do chemistry. | 0.662 |
| 5 | To develop confidence in the laboratory. | 0.649 |
| 6 | To interpret my data beyond only doing calculations. | 0.512 |
| 7 | To use my observations to understand the behavior of atoms and molecules. | 0.679 |
| 8 | To be intrigued by the instruments. | 0.559 |
| 9 | To learn chemistry that will be useful in my life. | 0.681 |

Negative items
| 10 | To feel unsure about the purpose of the procedures. | 0.729 |
| 11 | To be confused about how the instruments work. | 0.753 |
| 12 | To feel disorganized. | 0.694 |
| 13 | To be confused about the underlying concepts. | 0.811 |
| 14 | To be frustrated. | 0.747 |
| 15 | To feel intimidated. | 0.749 |
| 16 | To be confused about what my data mean. | 0.806 |
Table 16 Standardized factor loadings for the 16 items of each individual factor of the MLLIv2 (n = 1078)

| Item | Positive | Item | Negative |
| 1 | 0.621 | 10 | 0.727 |
| 2 | 0.509 | 11 | 0.755 |
| 3 | 0.662 | 12 | 0.691 |
| 4 | 0.655 | 13 | 0.812 |
| 5 | 0.642 | 14 | 0.745 |
| 6 | 0.515 | 15 | 0.750 |
| 7 | 0.680 | 16 | 0.807 |
| 8 | 0.564 | | |
| 9 | 0.681 | | |
Table 17 Standardized factor loadings for the 25-item six-factor structure of the UCL-Engagement (n = 832)

| Item | Statement | Standardized factor loading |

Cognitive engagement in data and overall
| 1 | I put a lot of effort into understanding and evaluating the data/observations as I was collecting them during the lab | 0.715 |
| 2 | I put a lot of effort into thinking about techniques to help me accurately record the data from the instruments, equipment, glassware, and my observations | 0.742 |
| 3 | I put a lot of effort into planning how I would collect all the data/observations before the end of the lab | 0.680 |
| 4 | I tried hard to connect the lab experiment to the theory from lecture | 0.654 |
| 5 | I put a lot of effort into thinking about real world applications and connections to the lab experiment | 0.708 |
| 6 | I tried hard to understand the lab concepts rather than just memorizing them | 0.661 |

Cognitive engagement in lab procedures
| 7 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.715 |
| 8 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.668 |
| 9 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.729 |
| 10 | I tried hard to understand why specific glassware was used during the lab procedures | 0.598 |

Negative emotional engagement in lab procedures
| 11 | I felt nervous about using the glassware during the lab procedures | 0.643 |
| 12 | I felt unsure when following the lab procedures | 0.600 |
| 13 | I felt insecure about using the lab equipment and instruments during the lab procedures | 0.810 |
| 14 | I felt nervous when using and/or preparing the chemicals during the lab procedures | 0.742 |

Negative emotional engagement in data collection
| 15 | I felt insecure about the data/observations I was collecting during the lab | 0.821 |
| 16 | I felt unsure about accurately recording the data from the instruments, equipment, glassware, and my observations | 0.724 |
| 17 | I felt worried about collecting all the data/observations before the end of the lab | 0.664 |

Positive emotional engagement in lab procedures
| 18 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.911 |
| 19 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.644 |
| 20 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.885 |
| 21 | I participated fully when using the glassware during the lab procedures | 0.910 |

Behavioral engagement in lab procedures
| 22 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.715 |
| 23 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.668 |
| 24 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.729 |
| 25 | I tried hard to understand why specific glassware was used during the lab procedures | 0.598 |
Table 18 Standardized factor loadings for the 25-item four-factor structure of the UCL-Engagement (n = 832)

| Item | Statement | Standardized factor loading |

Cognitive engagement
| 1 | I put a lot of effort into understanding and evaluating the data/observations as I was collecting them during the lab | 0.715 |
| 2 | I put a lot of effort into thinking about techniques to help me accurately record the data from the instruments, equipment, glassware, and my observations | 0.724 |
| 3 | I put a lot of effort into planning how I would collect all the data/observations before the end of the lab | 0.658 |
| 4 | I tried hard to connect the lab experiment to the theory from lecture | 0.641 |
| 5 | I put a lot of effort into thinking about real world applications and connections to the lab experiment | 0.698 |
| 6 | I tried hard to understand the lab concepts rather than just memorizing them | 0.651 |
| 7 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.666 |
| 8 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.588 |
| 9 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.724 |
| 10 | I tried hard to understand why specific glassware was used during the lab procedures | 0.524 |

Negative emotional engagement
| 11 | I felt nervous about using the glassware during the lab procedures | 0.616 |
| 12 | I felt unsure when following the lab procedures | 0.619 |
| 13 | I felt insecure about using the lab equipment and instruments during the lab procedures | 0.777 |
| 14 | I felt nervous when using and/or preparing the chemicals during the lab procedures | 0.706 |
| 15 | I felt insecure about the data/observations I was collecting during the lab | 0.773 |
| 16 | I felt unsure about accurately recording the data from the instruments, equipment, glassware, and my observations | 0.698 |
| 17 | I felt worried about collecting all the data/observations before the end of the lab | 0.646 |

Positive emotional engagement
| 18 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.722 |
| 19 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.802 |
| 20 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.740 |
| 21 | I participated fully when using the glassware during the lab procedures | 0.745 |

Behavioral engagement
| 22 | I put a lot of effort into understanding the design of the lab procedures in terms of how different parts of the lab procedures were related to one another | 0.911 |
| 23 | I tried hard to understand why specific lab equipment and instruments were used during the lab procedures | 0.643 |
| 24 | I put a lot of effort into understanding why specific chemicals were used and/or prepared during the lab procedures | 0.885 |
| 25 | I tried hard to understand why specific glassware was used during the lab procedures | 0.911 |
Table 19 Standardized factor loadings for the 25 items of each individual factor of the UCL-Engagement (n = 832)

| Item | Cognitive engagement | Item | Negative emotional engagement | Item | Positive emotional engagement | Item | Behavioral engagement |
| 1 | 0.710 | 11 | 0.618 | 18 | 0.718 | 22 | 0.912 |
| 2 | 0.719 | 12 | 0.615 | 19 | 0.798 | 23 | 0.637 |
| 3 | 0.656 | 13 | 0.777 | 20 | 0.752 | 24 | 0.886 |
| 4 | 0.648 | 14 | 0.708 | 21 | 0.741 | 25 | 0.911 |
| 5 | 0.702 | 15 | 0.773 | | | | |
| 6 | 0.654 | 16 | 0.698 | | | | |
| 7 | 0.660 | 17 | 0.646 | | | | |
| 8 | 0.587 | | | | | | |
| 9 | 0.727 | | | | | | |
| 10 | 0.525 | | | | | | |
References
- Agustian H. Y., Finne L. T., Jørgensen J. T., Pedersen M. I., Christiansen F. V., Gammelgaard B. and Nielsen J. A., (2022), Learning outcomes of university chemistry teaching in laboratories: a systematic review of empirical literature, Rev. Educ., 10(2), e3360 DOI:10.1002/rev3.3360.
- American Chemical Society and Committee on Professional Training, (2015), Undergraduate Professional Education in Chemistry: ACS Guidelines and Evaluation Procedures for Bachelor's Degree Programs, American Chemical Society.
- American Educational Research Association, (2014), Standards for Educational and Psychological Testing, American Educational Research Association (AERA).
- Arjoon J. A., Xu X. and Lewis J. E., (2013), Understanding the State of the Art for Measurement in Chemistry Education Research: Examining the Psychometric Evidence, J. Chem. Educ., 90(5), 536–545 DOI:10.1021/ed3002013.
- Beck C., Butler A. and Burke da Silva K., (2014), Promoting Inquiry-Based Teaching in Laboratory Courses: Are We Meeting the Grade? CBE—Life Sci. Educ., 13(3), 444–452 DOI:10.1187/cbe.13-12-0245.
- Bretz S. L., (2019), Evidence for the Importance of Laboratory Courses, J. Chem. Educ., 96(2), 193–195 DOI:10.1021/acs.jchemed.8b00874.
- Broughton S. H., Sinatra G. M. and Nussbaum E. M., (2013), “Pluto Has Been a Planet My Whole Life!” Emotions, Attitudes, and Conceptual Change in Elementary Students’ Learning about Pluto's Reclassification, Res. Sci. Educ., 43(2), 529–550 DOI:10.1007/s11165-011-9274-x.
- Browne M. W. and Cudeck R., (1993), Alternative ways of assessing model fit, in Bollen K. A. and Long J. S. (ed.), Testing structural equation models, Sage Publications, Inc., pp. 136–162.
- Cavanagh A. J., Aragón O. R., Chen X., Couch B. A., Durham M. F., Bobrownicki A., Hanauer D. I. and Graham M. J., (2016), Student Buy-In to Active Learning in a College Science Course, CBE—Life Sci. Educ., 15(4), ar76 DOI:10.1187/cbe.16-07-0212.
- Chen F. F., (2007), Sensitivity of Goodness of Fit Indexes to Lack of Measurement Invariance, Struct. Equation Modeling: A Multidisciplinary J., 14(3), 464–504 DOI:10.1080/10705510701301834.
- Collins D., (2003), Pretesting survey instruments: an overview of cognitive methods, Quality Life Res., 12(3), 229–238 DOI:10.1023/A:1023254226592.
- DeKorver B. K. and Towns M. H., (2015), General Chemistry Students’ Goals for Chemistry Laboratory Coursework, J. Chem. Educ., 92(12), 2031–2037 DOI:10.1021/acs.jchemed.5b00463.
- Deng J. M., Streja N. and Flynn A. B., (2021), Response Process Validity Evidence in Chemistry Education Research, J. Chem. Educ., 98(12), 3656–3666 DOI:10.1021/acs.jchemed.1c00749.
- Fakhriyah F., Rusilowati A., Wiyanto W. and Susilaningsih E., (2021), Argument-Driven Inquiry Learning Model: A Systematic Review, Int. J. Res. Educ. Sci., 767–784 DOI:10.46328/ijres.2001.
- Fredricks J. A., Blumenfeld P. C. and Paris A. H., (2004), School Engagement: Potential of the Concept, State of the Evidence, Rev. Educ. Res., 74(1), 59–109 DOI:10.3102/00346543074001059.
- Fredricks J. A., Filsecker M. and Lawson M. A., (2016), Student engagement, context, and adjustment: addressing definitional, measurement, and methodological issues, Learn. Instruct., 43, 1–4 DOI:10.1016/j.learninstruc.2016.02.002.
- Galloway K. R. and Bretz S. L., (2015a), Development of an Assessment Tool To Measure Students’ Meaningful Learning in the Undergraduate Chemistry Laboratory, J. Chem. Educ., 92(7), 1149–1158 DOI:10.1021/ed500881y.
- Galloway K. R. and Bretz S. L., (2015b), Measuring Meaningful Learning in the Undergraduate General Chemistry and Organic Chemistry Laboratories: A Longitudinal Study, J. Chem. Educ., 92(12), 2019–2030 DOI:10.1021/acs.jchemed.5b00754.
- Grushow A., Hunnicutt S., Muñiz M., Reisner B. A., Schaertel S. and Whitnell R., (2021), Journal of Chemical Education Call for Papers: Special Issue on New Visions for Teaching Chemistry Laboratory, J. Chem. Educ., 98(11), 3409–3411 DOI:10.1021/acs.jchemed.1c01000.
- Hair J. F., Black W. C., Babin B. J. and Anderson R. E., (2010), Multivariate Data Analysis, 7th edn, Pearson.
- Hancock G. R., Stapleton L. M. and Mueller R. O. (ed.), (2010), The Reviewer's Guide to Quantitative Methods in the Social Sciences, Routledge DOI:10.4324/9780203861554.
- Heddy B. C. and Sinatra G. M., (2013), Transforming Misconceptions: Using Transformative Experience to Promote Positive Affect and Conceptual Change in Students Learning About Biological Evolution, Sci. Educ., 97(5), 723–744 DOI:10.1002/sce.21072.
- Hosbein K. N., Lower M. A. and Walker J. P., (2021), Tracking Student Argumentation Skills across General Chemistry through Argument-Driven Inquiry Using the Assessment of Scientific Argumentation in the Classroom Observation Protocol, J. Chem. Educ., 98(6), 1875–1887 DOI:10.1021/acs.jchemed.0c01225.
- Hu L. and Bentler P. M., (1999), Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equation Model., 6(1), 1–55 DOI:10.1080/10705519909540118.
- Kline R. B., (2005), Principles and practice of structural equation modeling, Guilford Press.
- Knekta E., Runyon C. and Eddy S., (2019), One Size Doesn’t Fit All: Using Factor Analysis to Gather Validity Evidence When Using Surveys in Your Research, CBE—Life Sci. Educ., 18(1), rm1 DOI:10.1187/cbe.18-04-0064.
- Komperda R., Pentecost T. C. and Barbera J., (2018), Moving beyond Alpha: A Primer on Alternative Sources of Single-Administration Reliability Evidence for Quantitative Chemistry Education Research, J. Chem. Educ., 95(9), 1477–1491 DOI:10.1021/acs.jchemed.8b00220.
- Lewis S. E., (2022), Considerations on validity for studies using quantitative data in chemistry education research and practice, Chem. Educ. Res. Pract., 23(4), 764–767 DOI:10.1039/D2RP90009B.
- Li C.-H., (2016), Confirmatory factor analysis with ordinal data: comparing robust maximum likelihood and diagonally weighted least squares, Behav. Res. Methods, 48(3), 936–949 DOI:10.3758/s13428-015-0619-7.
- Marsh H. W., Hau K.-T. and Wen Z., (2004), In Search of Golden Rules: Comment on Hypothesis-Testing Approaches to Setting Cutoff Values for Fit Indexes and Dangers in Overgeneralizing Hu and Bentler's (1999) Findings, Struct. Equation Model., 11(3), 320–341 DOI:10.1207/s15328007sem1103_2.
- McDonald R. P., (1999), Test theory: A unified treatment, L. Erlbaum Associates.
- Murphy S., MacDonald A., Wang C. A. and Danaia L., (2019), Towards an Understanding of STEM Engagement: A Review of the Literature on Motivation and Academic Emotions, Canadian J. Sci., Math. Technol. Educ., 19(3), 304–320 DOI:10.1007/s42330-019-00054-w.
- Naibert N., Vaughan E. B., Lamberson K. M. and Barbera J., (2022), Exploring Student Perceptions of Behavioral, Cognitive, and Emotional Engagement at the Activity Level in General Chemistry, J. Chem. Educ., 99(3), 1358–1367 DOI:10.1021/acs.jchemed.1c01051.
- Novak J. D., (1993), Human constructivism: a unification of psychological and epistemological phenomena in meaning making, Int. J. Personal Construct Psychol., 6(2), 167–193 DOI:10.1080/08936039308404338.
- Novak J. D., (2003), The Promise of New Ideas and New Technology for Improving Teaching and Learning, Cell Biol. Educ., 2, 122–132 DOI:10.1187/cbe.02-11-0059.
- Novak J., (2010), Learning, Creating, and Using Knowledge: concept maps as facilitative tools in schools and corporations, J. E-Learning Knowledge Soc., 6(3), 21–30.
- Pekrun R., (2006), The Control-Value Theory of Achievement Emotions: Assumptions, Corollaries, and Implications for Educational Research and Practice, Educ. Psychol. Rev., 18(4), 315–341 DOI:10.1007/s10648-006-9029-9.
- Rocabado G. A., Komperda R., Lewis J. E. and Barbera J., (2020), Addressing diversity and inclusion through group comparisons: a primer on measurement invariance testing, Chem. Educ. Res. Pract., 21(3), 969–988 DOI:10.1039/D0RP00025F.
- Royal Society of Chemistry, (2019), Accreditation of degree programmes, Royal Society of Chemistry.
- Sampson V. and Walker J. P., (2012), Argument-Driven Inquiry as a Way to Help Undergraduate Students Write to Learn by Learning to Write in Chemistry, Int. J. Sci. Educ., 34(10), 1443–1485 DOI:10.1080/09500693.2012.667581.
- Satorra A. and Bentler P. M., (1994), Corrections to test statistics and standard errors in covariance structure analysis, Latent variables analysis: Applications for developmental research, Sage Publications, Inc.
- Schweizer K., (2010), Some guidelines concerning the modeling of traits and abilities in test construction, Eur. J. Psychol. Assess., 26(1), 1–2 DOI:10.1027/1015-5759/a000001.
- Seery M. K., Agustian H. Y. and Zhang X., (2019), A Framework for Learning in the Chemistry Laboratory, Isr. J. Chem., 59(6–7), 546–553 DOI:10.1002/ijch.201800093.
- Shaw T. J., Yang S., Nash T. R., Pigg R. M. and Grim J. M., (2019), Knowing is half the battle: assessments of both student perception and performance are necessary to successfully evaluate curricular transformation, PLoS One, 14(1), e0210030 DOI:10.1371/journal.pone.0210030.
- Sinatra G. M., Heddy B. C. and Lombardi D., (2015), The Challenges of Defining and Measuring Student Engagement in Science, Educ. Psychol., 50(1), 1–13 DOI:10.1080/00461520.2014.1002924.
- Smith K. C. and Alonso V., (2020), Measuring student engagement in the undergraduate general chemistry laboratory, Chem. Educ. Res. Pract., 21(1), 399–411 DOI:10.1039/C8RP00167G.
- Stains M., (2022), Keeping Up-to-Date with Chemical Education Research Standards, J. Chem. Educ., 99(6), 2213–2216 DOI:10.1021/acs.jchemed.2c00488.
- Taber K. S., (2018), The Use of Cronbach's Alpha When Developing and Reporting Research Instruments in Science Education, Res. Sci. Educ., 48(6), 1273–1296 DOI:10.1007/s11165-016-9602-2.
- Vaughan E. B., Montoya-Cowan A. and Barbera J., (2024), Investigating evidence in support of validity and reliability for data collected with the meaningful learning in the laboratory instrument (MLLI), Chem. Educ. Res. Pract., 25(1), 313–326 DOI:10.1039/D3RP00121K.
- Walker J. P. and Sampson V., (2013), Argument-Driven Inquiry: Using the Laboratory To Improve Undergraduates’ Science Writing Skills through Meaningful Science Writing, Peer-Review, and Revision, J. Chem. Educ., 90(10), 1269–1274 DOI:10.1021/ed300656p.
- Walker J., Sampson V., Grooms J., Anderson B. and Zimmerman C., (2012), Argument-Driven Inquiry in undergraduate chemistry labs: the impact on students’ conceptual understanding, argument skills, and attitudes toward science, J. College Sci. Teach., 41, 82.
- Walker J. P., Sampson V. and Zimmerman C. O., (2011), Argument-Driven Inquiry: An Introduction to a New Instructional Model for Use in Undergraduate Chemistry Labs, J. Chem. Educ., 88(8), 1048–1056 DOI:10.1021/ed100622h.
- Wang C., Cavanagh A. J., Bauer M., Reeves P. M., Gill J. C., Chen X., Hanauer D. I. and Graham M. J., (2021), A Framework of College Student Buy-in to Evidence-Based Teaching Practices in STEM: The Roles of Trust and Growth Mindset, CBE—Life Sci. Educ., 20(4), ar54 DOI:10.1187/cbe.20-08-0185.
- Wasacz J. T., (2010), Organic chemistry preconceptions and their correlation to student success.
- Weaver G. C., Russell C. B. and Wink D. J., (2008), Inquiry-based and research-based laboratory pedagogies in undergraduate science, Nat. Chem. Biol., 4(10), 577–580 DOI:10.1038/nchembio1008-577.
- Worthington R. L. and Whittaker T. A., (2006), Scale Development Research: A Content Analysis and Recommendations for Best Practices, Counseling Psychol., 34(6), 806–838 DOI:10.1177/0011000006288127.