Investigation of evidence for the internal structure of a modified science motivation questionnaire II (mSMQ II): a failed attempt to improve instrument functioning across course, subject, and wording variants

Regis Komperda a, Kathryn N. Hosbein b, Michael M. Phillips c and Jack Barbera *b
aDepartment of Chemistry & Biochemistry, Center for Research in Mathematics and Science Education, San Diego State University, USA
bDepartment of Chemistry, Portland State University, USA. E-mail: jbarbera@pdx.edu
cSchool of Psychological Sciences, University of Northern Colorado, USA

Received 29th January 2020 , Accepted 7th April 2020

First published on 8th April 2020


Abstract

The Science Motivation Questionnaire II (SMQ II) was developed to measure aspects of student motivation in college-level science courses. Items on the SMQ II are structured such that the word ‘science’ can be replaced with any discipline title (e.g., chemistry) to produce a discipline-specific measure of student motivation. Since its original development as the Science Motivation Questionnaire and subsequent refinement, the SMQ II and its discipline-specific variants have been used in a number of science education studies. However, many studies have failed to produce acceptable validity evidence for their data based on the proposed internal structure of the instrument. This study investigated whether modifications could be made to the SMQ II such that it produces consistent structural evidence across its use in various forms. A modified SMQ II (mSMQ II) was tested with wording variants (‘science’ and ‘biology’ or ‘chemistry’) in general biology and in preparatory and general chemistry courses at several institutions. Exploratory and confirmatory factor analyses were used to cull problematic items and evaluate the structure of the data based on the relations posited by the SMQ II developers. While extensive revisions resulted in acceptable data-model fit for the five-factor structural models in most course and wording conditions, significant issues arose for the single-factor scales. Therefore, potential users are cautioned about the utility of the SMQ II or its variants to support the evaluation of classroom practices. A reflective review of the theoretical underpinnings of the SMQ II scales calls into question the original framing of the scales and suggests potential alternatives for consideration.


Introduction

The discipline-based education research community continues to recognize the importance of including the affective domain when studying student outcomes in science courses (National Research Council, 2012; Fortus, 2014). Motivation is one aspect of the affective domain frequently investigated in the field of chemistry education research (Black and Deci, 2000; Zusho et al., 2003; Chan and Bauer, 2014; González and Paoloni, 2015; Salta and Koulougliotis, 2015; Ferrell et al., 2016; Ardura and Pérez-Bitrián, 2018; Austin et al., 2018; Liu et al., 2018) as well as in other science fields (Simpkins et al., 2006; Glynn et al., 2009; Olimpo et al., 2016; Schumm and Bogner, 2016; Young et al., 2018; Zeyer, 2018). One commonality among all these studies is their use of self-report survey instruments for measuring student motivation.

When developing instruments to measure unobservable (i.e., latent) traits such as motivation, it is necessary to align the items on the instrument with a theoretical framework for the latent variable (American Educational Research Association et al., 2014). In the case of motivation, the literature contains multiple theoretical frameworks including social-cognitive theory (Bandura, 1993), self-determination theory (Ryan and Deci, 2000), and expectancy-value theory (Wigfield and Eccles, 2000), among others. One instrument combining multiple motivation theories is the Science Motivation Questionnaire (SMQ; Glynn and Koballa, 2006; Glynn et al., 2009), which was later revised by the developers into the Science Motivation Questionnaire II (SMQ II; Glynn et al., 2011).

Theoretical framework of the SMQ II

Glynn and colleagues’ work (e.g., 2006; 2007; 2009; 2011) on the Science Motivation Questionnaire (SMQ and SMQ II) used self-regulation as the overarching framework for their motivation instruments, which across versions include individual motivation scales for intrinsic motivation, extrinsic motivation [later split into grade and career], self-determination, self-efficacy, relevance, and anxiety. Yet, many of the scales do not address the unique aspects of how students self-regulate their thoughts, actions, environment, and motivation to achieve their academic goals (Zimmerman, 2000). As Eccles and Wigfield (2002) note in their review, self-regulated learning tends to include three important aspects: self-observation while engaged with an academic task, self-judgment regarding one's performance, and self-evaluation or reactions to one's performance after a task has been completed. Given the absence of items aligned with these aspects, a reflective evaluation is needed of the conceptual/theoretical underpinnings of the SMQ II in addition to the psychometric characteristics of the scales.

Within the SMQ II (Glynn et al., 2011), the only items that specifically address aspects of self-regulation are on the self-determination scale (factor 3; p. 1167), as these items focus on study preparation and effort exertion for studying science (e.g., “I put enough effort into learning science” or “I prepare well for science tests and labs”). Additionally, the scale itself might not align with a framework of self-determination, particularly if, as described by the authors (p. 1161), this definition arises from self-determination theory (SDT; Deci and Ryan, 2000), which centers on the three psychological needs of autonomy, competence, and relatedness. When these three needs are met (to varying degrees), an individual's actions are more self-determined, which can influence regulatory styles (as described in SDT) across the extrinsic–intrinsic continuum. Self-determined actions are growth-oriented and are not overly impacted by external influences, which is how self-determined actions relate to the distinction between intrinsic and extrinsic motivation. This continuum contrasts with the dichotomy implied by the separation of the intrinsic motivation scale from the extrinsic scale in the SMQ and, later, from the extrinsically focused grade and career scales in the SMQ II. An additional concern regarding the theoretical framework of the SMQ II scales is the inclusion of self-determination as a distinct construct. Though Ryan and Deci describe their theory of motivation as self-determination theory (SDT), self-determination tends to describe motivated action when one's psychological needs are being met (Ryan and Deci, 2000) and can range on a continuum from extrinsic to intrinsic motivation rather than being a distinct construct.

The primary support for Glynn and colleagues’ (2011) proposed theoretical framework for the SMQ II comes from analyses of the internal structure of the instrument using both exploratory and confirmatory factor analysis. Factor analysis statistical techniques allow researchers to determine if instrument data are aligned with a hypothesized internal structure, a form of model testing critical to the practice of science (Grosslight et al., 1991). In the case of the SMQ II, this takes the form of a five-factor model containing five distinct yet related aspects of motivation (intrinsic, extrinsic [grade and career], self-determination, and self-efficacy). Results from exploratory factor analysis in prior work (Glynn et al., 2009) showed that extrinsic motivation consisted of two separate but related components: grade and career motivation. When extrinsic motivation was split into these two components, the five-factor motivation model was shown to provide adequate fit to samples of students within major and non-major biology courses (Glynn et al., 2011). While these two types of extrinsic motivation were supported through factor analysis, these results alone do not provide strong theoretical support for the new constructs. If subsequent data are found to poorly fit this model, or if the aspects of motivation measured by the SMQ II are not found to be distinct factors within additional samples, this provides an opportunity to examine potential issues with the underlying model of motivation or the items developed to measure it and further refine the items and/or model. This model testing should occur at each use of the instrument to support the validity of the data collected and ensure the results can be interpreted in a meaningful way (American Educational Research Association et al., 2014).

As assessment instruments are commonly used within the chemistry education community to provide insight into the impacts of classroom practice, it is imperative that the data produced by an assessment instrument show evidence of validity and reliability (Arjoon et al., 2013). Furthermore, if assessment items used to measure a relevant trait, such as self-efficacy, are not shown to align with a theory of self-efficacy, the interpretation of the results may not be reflective of a learning environment's support of, or impact on, the trait. Therefore, interpreting data from assessment instruments that do not show evidence of validity and reliability can lead to misinformed judgements about classroom practice. As the SMQ II is purported to be an assessment tool that can be used across a range of courses and disciplines (Glynn et al., 2011), data from its different wording variants and applications must be equally supported by evidence.

Prior studies of SMQ II internal structure

The SMQ II is a revision of the SMQ by the original developers based on both student interviews and factor analysis of 1,450 student responses to original and revised items (see Table S1, ESI for the trajectory of the scales and item modifications; Glynn et al., 2009, 2011). Though this level of development and testing is commendable, it is a common misconception that once an instrument is published in the literature it is “validated” for all uses regardless of changes in population, context, or wording (Barbera and VandenPlas, 2011). The proliferation of variations of the SMQ II in the STEM education literature provides opportunities to examine how frequently evidence is found to support the hypothesized five-factor structure of the SMQ II.

As the SMQ II developers intentionally designed the instrument such that the word ‘science’ could be replaced with any other specific discipline (Glynn et al., 2011), many versions of the SMQ II can be found in the literature using wording such as biology, chemistry, organic chemistry, histology, math, nanotechnology, pharmacy, physics, and technology (Tosun, 2013; Campos-Sánchez et al., 2014; Riccitelli, 2015; Salta and Koulougliotis, 2015; Srisawasdi, 2015; Hibbard et al., 2016; Kassaee, 2016; Kwon, 2016; Mahrou et al., 2016; Olimpo et al., 2016; Cleveland et al., 2017; Reece and Butler, 2017; Yamamura and Takehira, 2017; Ardura and Pérez-Bitrián, 2018; Austin et al., 2018; Cagande and Jugar, 2018; Komperda et al., 2018a; Young et al., 2018). The popularity of the SMQ II also extends to discipline-based education researchers who have translated it from English into at least seven other languages (Tosun, 2013; Campos-Sánchez et al., 2014; Salta and Koulougliotis, 2015; Srisawasdi, 2015; Schumm and Bogner, 2016; Shin et al., 2017; Yamamura and Takehira, 2017; Ardura and Pérez-Bitrián, 2018; Vasques et al., 2018).

Investigation of the internal structure of the SMQ II has utilized analysis techniques both with and without a priori models of how the items should be related. Analyses without an a priori model generally fall under the classification of exploratory factor analysis (EFA), although of the two techniques most commonly used with the SMQ II, principal components analysis and principal axis factoring, the former is frequently described as a data reduction technique rather than a factoring approach (Henson and Roberts, 2006). As described earlier, the theoretical framework of the SMQ II describes motivation as a “multicomponent construct” composed of “types and attributes of motivation” (Glynn et al., 2011, p. 1161) including intrinsic motivation, self-determination, self-efficacy, grade motivation, and career motivation. Of the seven studies using EFA techniques, four identified five factors aligned with the proposed theoretical framework (Glynn et al., 2011; Kwon, 2016; Schmid and Bogner, 2017; Ardura and Pérez-Bitrián, 2018). After initially failing to find a five-factor solution, Austin et al. (2018) removed a majority of the intrinsic items, resulting in a combined intrinsic/career factor described as ‘relevance.’ Yamamura and Takehira (2017) also obtained a four-factor solution after removing 12 items due to low association with a factor, including all the self-efficacy items. The last study only utilized three scales from the SMQ II (self-efficacy, self-determination, and career), which resulted in a three-factor solution (Schumm and Bogner, 2016). These studies (Table S2, ESI) provide some support that the items are aligned with their intended factors, but moving to a confirmatory framework provides the ability to test data against a previously specified model and restricts items to associating with only a single factor.

The SMQ II developers specified a correlated five-factor model with five items belonging to each factor (Glynn et al., 2011). Therefore, data collected from administration of the SMQ II can be tested against this a priori model and evaluated with typical data-model fit criteria (Hu and Bentler, 1999). These data-model fit criteria take the form of examining the value of various fit indices and comparing them to suggested cutoff values, generally a CFI and/or TLI at or above 0.95, RMSEA at or below 0.06, and SRMR at or below 0.08. Direct comparison of data-model fit across studies with the SMQ II is difficult due to variations in the wording of the items as either science or discipline-specific, and editing or removal of the items themselves (see Table S3, ESI for a list of studies, variations, and fit values). However, in general the Hu and Bentler cutoff criteria were not met by most studies (Glynn et al., 2011; Salta and Koulougliotis, 2015; Kwon, 2016; Ardura and Pérez-Bitrián, 2018; Komperda et al., 2018a; Vasques et al., 2018) unless the instrument was modified by removing items or entire scales (Tosun, 2013; Yamamura and Takehira, 2017).
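As an illustration of how such criteria can be checked in practice, the following sketch fits a CFA in R with the lavaan package and compares the resulting indices to the Hu and Bentler cutoffs; the data and three-factor model are lavaan's built-in example, used here only to show the mechanics rather than any SMQ II analysis.

```r
library(lavaan)

# Illustrative only: lavaan's built-in HolzingerSwineford1939 data and
# three-factor model, not SMQ II data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Extract the indices discussed above and compare to the suggested cutoffs
indices <- fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
acceptable <- indices["cfi"] >= 0.95 & indices["tli"] >= 0.95 &
  indices["rmsea"] <= 0.06 & indices["srmr"] <= 0.08
```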

Additional limitations for the direct comparison of CFA results across studies are due to the low frequency with which information is reported about the estimator chosen for the factor analysis (Table S3, ESI) and justification that the properties of the data supported the use of the chosen estimator. For example, when descriptive statistics are reported for SMQ II items or scales it is frequently found that responses to the grade motivation items are much higher (more positive) than the other scales (Glynn et al., 2011; Salta and Koulougliotis, 2015; Hibbard et al., 2016; Ardura and Pérez-Bitrián, 2018; Austin et al., 2018; Komperda et al., 2018a). This could indicate potential issues with non-normality of the data or collapsing of the five-point response scale such that it essentially functions only as a two- or three-point scale for some items. In any of these cases it would be recommended to move from the typical maximum likelihood (ML) estimator to a robust estimator (MLR) that provides a correction for non-normality or a mean- and variance-adjusted weighted least squares (WLSMV) estimator for categorical data (Finney and DiStefano, 2013). Studies employing the WLSMV estimator with the full 25-item SMQ II have found slightly better data-model fit than those employing the ML estimator (Komperda et al., 2018a). The inconsistency among the CFA results suggests the need to examine causes for this variation, particularly as they relate to alignment between the theoretical framework of the SMQ II, the individual items, and student responses. Providing evidence of these alignments is paramount to ensuring that the instrument has solid theoretical support and can further be used to inform instructional practices.
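The estimator choice itself is a single argument in most CFA software. The sketch below shows how the three estimators discussed here would be requested in lavaan; 'model' and 'responses' are hypothetical placeholders for a model string and a data frame of five-point item responses, not objects from the studies reviewed.

```r
library(lavaan)

# 'model' and 'responses' are hypothetical placeholders

# Normal-theory maximum likelihood (the default in many programs)
fit_ml <- cfa(model, data = responses, estimator = "ML")

# Robust ML: corrects standard errors and the test statistic for non-normality
fit_mlr <- cfa(model, data = responses, estimator = "MLR")

# WLSMV: treats the five response options as ordered categories
fit_wlsmv <- cfa(model, data = responses, estimator = "WLSMV",
                 ordered = names(responses))
```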

Research goals

In light of inconsistent evidence for the internal structure of the SMQ II in the literature, this research investigated modifications that could potentially improve the functioning of the instrument, as measured within a factor analysis framework. Prior work by Komperda, Hosbein, and Barbera with the original 25-item version of the SMQ II (2018) had identified both overall poor data-model fit and differences in data-model fit across wordings (science and chemistry) and course types (introductory chemistry and general chemistry) but did not explore possible explanations for the model misfit or alternative items to improve model fit. In order for instructors or other researchers to be able to utilize the SMQ II for comparisons of motivational impacts, the instrument needs to show similar psychometric characteristics across the varied measurement contexts. Therefore, the overall goal of the current research was to determine if modifications could be made in such a way that both improved overall data-model fit and minimized differences in data-model fit across different measurement contexts, that is, in different types of courses with both the science and discipline-specific wording. These outcomes would align with the original goals of the SMQ II developers to have evidence for the functioning of discipline-specific versions of the instrument. This work was driven by two primary research questions.

1. What are potential reasons for the inconsistent validity evidence based on the internal structure of the SMQ II as proposed by the SMQ II developers?

2. If these issues are addressed, will a modified SMQ II that is aligned with the theoretical framework proposed by the SMQ II developers have acceptable internal structure across different wordings and course contexts?

These research questions were addressed in two phases with independent samples. The goal of phase one was to identify modifications that could be made to the SMQ II to improve the functioning of both the science and discipline-specific wordings when administered to undergraduate students. The result of phase one was the development of a modified Science Motivation Questionnaire II (mSMQ II). The goal of phase two was to assess the functioning of the mSMQ II in a new sample of undergraduate students enrolled in science courses in order to evaluate whether the modifications resulted in improved data-model fit relative to the SMQ II. A factor analysis framework was chosen for this research to align with previous work done by the instrument developers (Glynn et al., 2011) and to provide a point of comparison to the previously discussed SMQ II studies. The methods and results from each phase are reported sequentially. Within each phase, human subjects IRB approval was obtained from Portland State University and appropriate participant consent was gathered from the study populations. Any incentives, if provided, are noted within each population description.

Phase one

Methods

In light of the limited evidence provided in the literature for the structural validity of SMQ II scores, the first phase of this research investigated potential threats to validity due to interpretation of the response scale and individual items. As the internal structure of an instrument can be impacted by issues with the response scale and/or the interpretation of items by the target population, three sources of information were utilized in phase one: expert reviews, student response processes, and best practices in survey wording. Expert reviews were solicited using an online survey of experts in educational research and measurement, asking them to determine if the five-point frequency-based response scale on the original SMQ II, ranging from never to always, was appropriately aligned with the wording of the items themselves. Interviews on a subset of SMQ II items were conducted with a convenience sample of general chemistry students to better understand their response processes related to both the frequency-based response scale and their interpretation of item wording. The research team used the results of the expert response scale survey, the student interviews, and best practices in survey wording (Krosnick and Presser, 2010) to make modifications to the SMQ II, resulting in the mSMQ II.
Participants and data collection.
Expert response scale survey. A total of 12 experts with experience in discipline-based education research (n = 8) or educational measurement (n = 4) were invited to and participated in an online survey about the type of response scale that was best suited to the wording of each SMQ II item. Experts were blinded to the fact that the items were from the SMQ II and that the original response scale was frequency-based. Experts were asked to sort items based on their perception of whether each item could best be answered with a frequency-based scale (never–rarely–sometimes–usually–always), a Likert-type scale (disagree–somewhat disagree–neither disagree nor agree–somewhat agree–agree), or that both scales would be equally appropriate. The survey was conducted in the same term as the student interviews and the experts were not compensated for their time.
Student interviews. Student interviews on the SMQ II items were conducted as part of a larger project investigating how wording of survey items affects student responses. Students were recruited during the winter 2017 term from both on-sequence and off-sequence large-enrollment general chemistry courses for science majors at Portland State University. A total of 40 student interviews were conducted, representing 5% of the overall course enrollment. The students participating in the interviews represent only a subset of students ultimately involved in the larger psychometric analyses conducted in phase 2 of this project. However, these students are representative of the population in which the SMQ II has been utilized, which is with university-level students in their first or second year of a majors-level science curriculum.

During the interview, students were provided a paper copy of the SMQ II and randomly presented with either the science or chemistry wording of the items. Students read all of the items silently and circled their responses on the original frequency-based scale. Students were asked to explain their responses to a subset of items, which were identified by the research team as having potentially good or poor fits to the response scale and/or a hypothesized item category based on a prior study (Komperda et al., 2018a). Next, students were asked to go over the entire survey again and explain if any item responses would change if the other wording (science or chemistry) were substituted. Specific demographics were not collected for the interview participants and each was provided a $10 gift card for their time.

Analysis.
Expert response scale survey analysis. Responses to the expert survey were summarized as percentages to identify which items experts perceived to be more aligned with the original five-point frequency-based response scale (never–rarely–sometimes–usually–always), a five-point Likert-type (disagree–somewhat disagree–neither disagree nor agree–somewhat agree–agree) response scale, or where the response scales were seen as equally appropriate for the wording of the item.
Student interview coding. The audio files of the interviews were transcribed by a commercial transcription service. All analysis was conducted on the interview transcripts, led by author Hosbein. Cognitive interview analysis (Beatty, 2004) was used by author Hosbein to establish a coding rubric. This entailed carefully reading each transcript and documenting the language students used in explaining their responses. These reviews were used to establish initial codes regarding students’ scale-based language. Example codes are provided in Table S4 (ESI). All coding was completed using QSR NVivo 8. Author Hosbein and a secondary (non-author) researcher independently coded all 40 transcripts using the initial rubric. After discussing discrepancies in coding, the rubric was revised and additional transcripts were independently re-coded. The final rubric had an acceptable Cohen's kappa value of 0.8 (Hallgren, 2012) and was used by author Hosbein to re-code the remaining transcripts.
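For readers unfamiliar with this type of agreement check, the sketch below shows one way Cohen's kappa can be computed in R with the psych package; the two coder columns and the numeric codes (1 = frequency-based, 2 = quantity-based) are hypothetical and do not represent the study data.

```r
library(psych)

# Hypothetical ratings: one row per coded excerpt, one column per coder,
# with 1 = frequency-based and 2 = quantity-based
codes <- data.frame(coder1 = c(1, 2, 1, 1, 2, 1),
                    coder2 = c(1, 2, 2, 1, 2, 1))

# Cohen's kappa for two raters; values near 0.8 are commonly read as
# acceptable agreement
cohen.kappa(codes)
```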

Results and discussion

Expert response scale survey. Expert preference for a response scale was determined by simple majority for one of the two response scales. Experts showed a preference for the original frequency-based response scale for only five of the 25 items, whereas for 12 items experts preferred the Likert-type response scale. For the remaining items, experts either showed no majority preference for either response scale, or felt that either scale would be acceptable. Overall, self-determination labeled items were most likely to be judged to fit best on the frequency-based scale whereas items labeled as intrinsic, self-efficacy, grade, and career motivation were more likely to be judged to align with a Likert-type scale. Detailed results are provided in Table S5 (ESI).
Student interviews. Student responses were coded as frequency-based when a student explicitly used the frequency-based scale words (never, rarely, sometimes, usually, always) or other time-based words (often, typically, never really, and most of the time) in their response. Students also used the frequency scale in a more quantity-based way, with responses more aligned with “how much” rather than “how often.” A quantity code was assigned when a student used language that involved quantity in their response that was not specifically frequency-based or made a comparison between two things that implied a quantity. Language that reflected the quantity code included the words “really”, “a lot”, or “it depends.” If both frequency-based and quantity-based language were used to explain a response, the response was coded as frequency-based. Examples of each code are provided in Table S4 (ESI).

Though students were not asked to fully describe their response process for all SMQ II items during the interviews, the results from the 12 items students responded to show similarities between language used by students and the preferred response scale identified by the experts. The three intrinsic labeled items explored in the interviews (I2, I3, and I5; wording given in Table 1) were coded as having frequency-based responses in less than 40% of instances (32%, 25%, and 38%, respectively), which aligned with expert preferences not to use a frequency-based response scale for these items. Similar responses were seen for the self-efficacy item SE2 in which frequency-based codes were used with 32% of responses and for the three career items explored in the interviews (C1, C2, and C3) with less than a quarter of students using frequency-based language (15%, 22%, and 15%, respectively). Only two items explored in the interviews, a self-efficacy item (SE1) and a grade item (G4) showed a majority use of frequency-based language (65% and 57% of codes, respectively), which is also consistent with the experts’ evaluation of being more aligned with a frequency-based response scale than a Likert-type scale.

Table 1 Scale, label, and wording for SMQ II and mSMQ II items where ___ was replaced with either biology, chemistry, or science
Scale Item SMQ II wording Itema mSMQ II wording
a Items removed after exploratory factor analysis are indicated with asterisks.
Intrinsic I1 The science I learn is relevant to my life I1* The ___ I learn is relevant to my life
I1a* The ___ I learn is relevant to the world around me
I2 Learning science is interesting I2 Learning ___ is interesting
I3 Learning science makes my life more meaningful I3a Learning ___ helps me understand the world around me
I3b Learning ___ increases my appreciation of the world around me
I4 I am curious about discoveries in science I4 I am curious about discoveries in ___
I5 I enjoy learning science I5 I enjoy learning ___
Self-determination SD1 I put enough effort into learning science SD1a I put effort into learning ___ well
SD2 I use strategies to learn science well SD2* I use strategies to learn ___ well
SD3 I spend a lot of time learning science SD3 I spend a lot of time learning ___
SD4 I prepare well for science tests and labs SD4a* I prepare well for ___ tests
SD5 I study hard to learn science SD5 I study hard to learn ___
SD5a* I use a lot of mental energy learning ___
Self-efficacy SE1 I am confident I will do well on science tests SE1 I am confident I will do well on ___ tests
SE2 I am confident I will do well on science labs and projects SE2a I am confident I will do well on ___ assignments
SE3 I believe I can master science knowledge and skills SE3a* I believe I can master ___ knowledge
SE4 I believe I can earn a grade of “A” in science SE4a I believe I can earn the grade I want in ___
SE5 I am sure I can understand science SE5* I am sure I can understand ___
Grade G1 I like to do better than the other students on science tests G1a* I like to do better than the other students in ___
G2 Getting a good science grade is important to me G2 Getting a good ___ grade is important to me
G3 It is important that I get an “A” in science G3a It is important that I earn the grade I want in ___
G4 I think about the grade I will get in science G4 I think about the grade I will get in ___
G4a* I worry about my ___ grade
G5 Scoring high on science tests and labs matters to me G5a Scoring high on ___ tests matters to me
Career C1 Learning science will help me get a good job C1 Learning ___ will help me get a good job
C2 Knowing science will give me a career advantage C2 Knowing ___ will give me a career advantage
C3 Understanding science will benefit me in my career C3 Understanding ___ will benefit me in my career
C4 My career will involve science C4 My career will involve ___
C5 I will use science problem-solving skills in my career C5* I will use ___ problem-solving skills in my career


When asking students to explain their responses to some of the SMQ II items, recurrent issues with students' interpretation arose. For example, when explaining their responses to the intrinsic item (I1) “The science I learn is relevant to my life”, 17% of students referenced their career as the reason that science (or chemistry) was relevant to their lives. This was unexpected since the SMQ II contains a separate set of items intended to address students' career motivation. Similar overlap with thinking about future careers was seen in responses to the grade item (G4) “I think about the grade I will get in science”, where students cited pressure from graduate or professional school as the reason they think about their grade.

When responding to the intrinsic item (I3) “Learning science makes my life more meaningful,” students were unsure of what “meaningful” meant in that context. The most frequent way students described “meaningful” was that learning science (or chemistry) helped them to better understand the world around them. Students also expressed confusion about other vague phrases such as “relevant” (I1) and “think about” (G4), suggesting that the wording of these items could be made clearer to improve response process validity.

Another commonly observed response was for students to ignore a portion of an item when formulating their response. An example of this is with the self-efficacy item (SE2) “I am confident I will do well on science labs and projects.” Some students explicitly mentioned only focusing on the lab portion of the question because their course did not involve projects while other students responding to this question made a comparison between labs and tests. In these instances, students may be interpreting projects to mean tests or simply ignoring the project portion of the question. In either case, this suggests problems with the wording of the item due to the presence of multiple topics within a single item (i.e., the item is double-barreled) or a topic in the item not being applicable to the typical experience of a student in a general chemistry course (e.g., having projects). Full student quotes are provided in Table S6 (ESI).

Revisions to SMQ II. The results of both the expert response scale surveys and student interviews were in general agreement that items on the grade, career, and intrinsic scales were not well aligned with a frequency-based response scale. In lieu of using two different response scales within the same instrument, the modified SMQ II (mSMQ II) response scale was changed to the more traditional Likert-type response scale containing five scale points (disagree–somewhat disagree–neither disagree nor agree–somewhat agree–agree), as presented in the expert response scale survey. Similar changes have been made by other users of the SMQ II (Srisawasdi, 2015; Kwon, 2016; Olimpo et al., 2016; Childers and Jones, 2017; Schmid and Bogner, 2017).

During the student interviews, two intrinsic items (I1 and I3) were identified as potentially causing unintended student responses. In the first item, I1, students were considering their career as a reason that science or chemistry was relevant to their lives, which could cause this item to be more associated with responses to items on the career scale rather than the intrinsic motivation scale. For that reason, an additional version of the item, “The science I learn is relevant to the world around me,” was added to the survey in order to test whether more general wording could avoid having the item align with the career items. The second item, I3, was revised into two different wordings to address different reasons why science or a specific discipline may be “meaningful” to students, namely by increasing their understanding or their appreciation of it. The phrase “world around me” was used in these revisions to align with the revision to item I1. Additionally, students describing their response to item G4 emphasized thinking about their grade in terms of worrying about requirements for graduate or professional school, so an additional item was added to explore the “worry” aspect of thinking about grades.

Other modifications were made to remove or separate double-barreled items. As in Salta and Koulougliotis (2015), the word “labs” was removed from items SD4, SE2, and G5 rather than removing the items entirely (Austin et al., 2018), since most of the classes had separate lab and lecture components. Additionally, the word “projects” was removed from SE2 and replaced with the more general “assignments” since not all classes have projects. Similarly, item SE3 had “skills” removed since that word appeared more aligned with a laboratory context.

The final set of modifications was made to deal with phrases that were either overly vague or overly focused on a specific aspect of course performance. One self-determination item, SD1, “I put enough effort into learning science”, was modified into the more concrete “I put effort into learning science well”, which better aligned it with item SD2 about using “strategies to learn science well.” Similarly, a second version of item SD5, “I study hard to learn science”, was written more concretely as “I use a lot of mental energy learning science.” In a grade item, G1, specifically focused on students doing better than others on tests, the test-specific language was dropped to account for other aspects of course performance. Similarly, two items, SE4 and G3, focused students on earning an “A” in science; in recognition that this is not necessarily the goal for all students, these were reworded in terms of earning “the grade I want” in science. The complete set of mSMQ II items is provided in Table 1. Items that stayed the same from the SMQ II to the mSMQ II retain the same numbering. Items that have been revised have a letter after their designation (e.g., I1a) to indicate that they are a revision of a pre-existing SMQ II item, I1 in this case. In total, 29 items appeared on the mSMQ II.

Phase two

Methods

In phase two, the mSMQ II was administered to a nationwide sample of undergraduate students using both the science and discipline-specific wordings, which were aligned with the discipline of the course in which the students were enrolled, either biology or chemistry. Following a similar process to that used by the SMQ II authors (Glynn et al., 2011), the mSMQ II responses were randomly split into two equally sized datasets balanced across course, wording, and self-reported gender of respondent to avoid unintentional bias in the datasets. Exploratory factor analysis (EFA) was used on the first half of the data (the ‘training’ dataset) to examine potential issues with the functioning of the modified and added items. Results from the EFA were used to identify and remove problematic items. Full details of the EFA methods and results are contained in the ESI. Confirmatory factor analysis (CFA) was used with the remaining data (the ‘testing’ dataset) to test the data-model fit for a five-factor model of the mSMQ II with the reduced set of items, representing the structure hypothesized by the original developers. Additional CFAs were performed on single-factor models representing the individual aspects of motivation comprising the mSMQ II. These individual factor models were also used to provide information about the reliability of the individual motivation scales.
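A balanced random split of this kind can be produced in a few lines of R. The sketch below assumes a data frame named responses with course, wording, and gender columns (hypothetical names) and is offered only as one possible implementation, not the code used in the study.

```r
library(dplyr)

set.seed(2017)  # arbitrary seed so the split is reproducible

# Assign each response to a half within every course x wording x gender stratum
responses <- responses %>%
  group_by(course, wording, gender) %>%
  mutate(half = sample(rep(c("training", "testing"), length.out = n()))) %>%
  ungroup()

training <- filter(responses, half == "training")  # used for the EFA
testing  <- filter(responses, half == "testing")   # held out for the CFA
```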
Participants and data collection. Student participants were recruited by contacting instructors of chemistry and biology courses at nine different colleges and universities across the United States. Courses were classified as preparatory chemistry if the official description indicated that the course was designed to prepare academically weaker students for eventual enrollment in a general chemistry course. Courses were classified as general chemistry if the official course description either had a required or recommended prerequisite of secondary school chemistry or equivalent or if the description indicated that the course was designed for science or engineering majors. Courses were classified as general biology if the official course description indicated that the course was designed for science majors. Data were collected in both on-sequence first term and off-sequence second term general chemistry courses and general biology courses, with data for each discipline combined into one dataset.

A link to the online mSMQ II, created in Qualtrics, was provided to each course instructor. The instructor was asked to provide this link to students through their course management website and also to play a brief video in class in which a research team member described the purpose of the study and the consent process to students. No identifying student information was collected on the survey itself. Most of the course instructors offered extra credit for student participation in the survey. If extra credit was offered, students were taken to a separate survey where they entered their name and university ID for identification purposes for extra credit only. All surveys were open for a non-exam week selected by the instructor between the end of October and the end of November 2017. When taking the survey, students were randomly presented with either the science or discipline-specific wording (biology or chemistry) for all mSMQ II items. The items were presented in a randomized order followed by demographic questions about gender, race/ethnicity, and declared major.

Analysis. All data cleaning and analysis steps for phase two were performed using R version 3.5.0 (R Core Team, 2019).
Data cleaning. A total of 3386 raw survey responses were obtained. Responses for an entire course were excluded from the full data set if the course response rate was under 25% of enrollment, leaving 3101 responses. Next, individual responses were removed if the student did not correctly answer a ‘participant check’ item asking students to select “Disagree.” Additionally, missing data were addressed through listwise deletion. As a result of these cleaning steps the final data set contained 2487 responses corresponding to 73% of the raw data.
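These cleaning steps translate directly into a short filtering pipeline. The sketch below assumes hypothetical column names (course_id, enrollment, check_item) and is illustrative rather than the script used for the study.

```r
library(dplyr)

cleaned <- raw_responses %>%
  # 1. Drop entire courses whose response rate fell below 25% of enrollment
  group_by(course_id) %>%
  filter(n() / first(enrollment) >= 0.25) %>%
  ungroup() %>%
  # 2. Keep only students who answered the participant-check item correctly
  filter(check_item == "Disagree") %>%
  # 3. Listwise deletion of responses with any remaining missing values
  na.omit()
```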
Response patterns and descriptive statistics. Descriptive statistics including mean, median, standard deviation, range, skew and kurtosis were calculated for each wording and course condition using the R psych package (version 1.8.4; Revelle, 2018).
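As a point of reference, the psych package reports all of these statistics from a single call. The sketch below assumes the cleaned data frame and a character vector of mSMQ II item column names (both hypothetical), grouped by course and wording condition.

```r
library(psych)

# 'cleaned' and 'msmq_items' are hypothetical objects: the cleaned responses
# and a character vector naming the mSMQ II item columns
describeBy(cleaned[, msmq_items],
           group = interaction(cleaned$course, cleaned$wording))
```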
Confirmatory factor analysis. After identifying and removing problematic items based on the EFA results (see EFA section in ESI), confirmatory factor analysis (CFA) was used to test models of the mSMQ II with the reduced number of items (as noted in Table 1). In line with the original model used by the developers, and supported by the EFA results, a correlated five-factor model of the mSMQ II items was tested for all course and wording conditions using the testing dataset previously partitioned from the general biology and general chemistry courses and the full preparatory chemistry data set. A training dataset was not generated from the preparatory chemistry data due to the smaller sample size. Lastly, using the combined training and testing data sets, single-factor models were tested for each of the five reduced sets of items representing aspects of motivation (e.g., intrinsic, career, etc.).
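For concreteness, the correlated five-factor model with the 19 retained items can be written in lavaan syntax as shown below. Item column names are assumed to match the labels in Table 1, 'testing' stands in for the held-out dataset, and the code is a sketch of the model structure rather than the authors' exact script.

```r
library(lavaan)

# Correlated five-factor model for the 19 retained mSMQ II items (Table 1);
# cfa() freely estimates the factor covariances by default
five_factor_model <- '
  intrinsic          =~ I2 + I3a + I3b + I4 + I5
  self_determination =~ SD1a + SD3 + SD5
  self_efficacy      =~ SE1 + SE2a + SE4a
  grade              =~ G2 + G3a + G4 + G5a
  career             =~ C1 + C2 + C3 + C4
'
msmq_items <- c("I2", "I3a", "I3b", "I4", "I5", "SD1a", "SD3", "SD5",
                "SE1", "SE2a", "SE4a", "G2", "G3a", "G4", "G5a",
                "C1", "C2", "C3", "C4")

fit_5f <- cfa(five_factor_model, data = testing,
              estimator = "WLSMV", ordered = msmq_items)
```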

The self-determination and self-efficacy scales of the mSMQ II only consisted of three items each after removing poorly functioning items. With only three items and no restrictions on the strength of associations between an item and a factor, known as loadings, a single-factor model has zero degrees of freedom and data-model fit cannot be tested. Constraining loadings on a factor to be equal (i.e., a tau-equivalent model) restores degrees of freedom to the model so that data-model fit can be tested (Komperda et al., 2018b). While tau-equivalent models are more restrictive and therefore less likely to achieve acceptable data-model fit than unconstrained (i.e., congeneric) models, it is necessary to use them when a factor has fewer than four items for the aforementioned reasons. Therefore, tau-equivalent single-factor models were tested for the self-determination and self-efficacy scales, while congeneric single-factor models were tested for the intrinsic, grade, and career scales.
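In lavaan, the tau-equivalent constraint amounts to giving every loading on a factor the same label, with the factor variance fixed for identification. The sketch below shows this for the three-item self-determination scale and a congeneric counterpart for the grade scale; column names are assumed to match Table 1 and 'full_data' stands in for the combined training and testing responses.

```r
library(lavaan)

# Tau-equivalent single-factor model: the shared label 'a' constrains the
# three loadings to be equal; std.lv = TRUE fixes the factor variance to 1
sd_model <- 'self_determination =~ a*SD1a + a*SD3 + a*SD5'
fit_sd <- cfa(sd_model, data = full_data, std.lv = TRUE,
              estimator = "WLSMV", ordered = c("SD1a", "SD3", "SD5"))

# Congeneric (unconstrained) single-factor model, e.g., for the grade items
grade_model <- 'grade =~ G2 + G3a + G4 + G5a'
fit_grade <- cfa(grade_model, data = full_data,
                 estimator = "WLSMV", ordered = c("G2", "G3a", "G4", "G5a"))
```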

In recognition of the ordinal and highly skewed properties of the mSMQ II data, the robust diagonally weighted least squares (WLSMV) estimator was used (Finney and DiStefano, 2013). As the WLSMV estimator was expected to show better data-model fit than robust maximum likelihood (MLR) due to the properties of the data, fit indices from both estimators are provided for comparison purposes. Data-model fit was evaluated using a set of indices appropriate for the estimator used (Hu and Bentler, 1999; Yu, 2002; Beauducel and Herzberg, 2006; Xia and Yang, 2018). For the WLSMV estimator, values of CFI and TLI ≥ 0.95 and RMSEA ≤ 0.05 were used to indicate acceptable data-model fit. Since previous studies demonstrated that the SRMR does not function well with the WLSMV estimator when there is a small number of response categories, the SRMR was not used to make data-model fit assessments for this estimator. For the MLR estimator, values of CFI and TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08 were used to determine acceptable data-model fit. For both estimators, a model was deemed to have acceptable data-model fit when all fit indices were acceptable. All CFA models were analyzed using the lavaan package in R (version 0.6–1; Rosseel, 2012).

Reliability. There are multiple ways to assess the reliability of data obtained from a survey instrument (American Educational Research Association et al., 2014); for this research single-administration reliability values were reported since the instrument was only administered at one time point and the sample size was large enough to examine the internal structure of the instrument (Komperda et al., 2018b). It is not appropriate to address the reliability of the instrument as a whole due to the multidimensional way in which motivation was conceptualized by the original developers as incorporating five distinct factors (Cronbach, 1951). Instead, single-administration reliability values were calculated for each motivation scale showing acceptable data-model fit to a single-factor model tested during the CFA portion of the analysis. All reliability calculations were performed using polychoric correlations to account for the ordinal nature of the data (Gadermann et al., 2012) with the userfriendlyscience R package (version 0.7.1; Peters, 2017).
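The study reports reliability from the userfriendlyscience package; as one comparable route, the psych package's omega() function can also return an omega coefficient computed from polychoric correlations, as sketched below for a hypothetical data frame containing only the retained grade-scale items from one course and wording condition.

```r
library(psych)

# 'grade_items' is a hypothetical data frame holding only the retained
# grade-scale columns (G2, G3a, G4, G5a) for one course/wording condition.
# poly = TRUE bases the estimate on the polychoric correlation matrix.
omega(grade_items, nfactors = 1, poly = TRUE)
```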

Results and discussion

Participant characteristics. After data cleaning there were 2487 usable responses from students who responded to all items on the survey. A majority of the students surveyed were enrolled in general chemistry courses (Table 2). Across the various course and wording conditions students were primarily female (ranging from 51–77%) and white (ranging from 60–75%). Students enrolled in the preparatory chemistry courses and general biology courses were primarily biology pre-health majors. In general chemistry, students were primarily engineering majors (30% of science wording and 31% of chemistry wording) or biology pre-health majors (21% of science wording and 20% of chemistry wording).
Table 2 Student responses and demographic information for mSMQ II administration
Course Wording Responses Female (%) White (%) Top major (%)
Preparatory chemistry Science 139 61 75 Biology pre-health (40)
Chemistry 137 76 74 Biology pre-health (35)
General chemistry Science 835 55 63 Engineering (30)
Chemistry 855 51 60 Engineering (31)
General biology Science 258 77 61 Biology pre-health (29)
Biology 263 76 66 Biology pre-health (28)


Descriptive statistics. Responses to the mSMQ II items followed a similar pattern to previous studies (Komperda et al., 2018a) where students were more likely to respond positively (4 or 5 on the 5-point response scale) to items with the science wording relative to the discipline-specific wording. These differential response patterns were most pronounced for items associated with intrinsic and career motivation. Of the five motivation aspects, the grade motivation items had the most strongly positive responses, regardless of wording, which is aligned with what other researchers have reported (Glynn et al., 2011; Salta and Koulougliotis, 2015; Hibbard et al., 2016; Ardura and Pérez-Bitrián, 2018; Austin et al., 2018; Komperda et al., 2018a). Detailed descriptive statistics are provided in Table S7 (ESI).
Exploratory factor analysis. The EFA results were used to identify potentially problematic items that should be removed before moving into a confirmatory framework. Items were determined to be problematic if they showed low relation to their intended scale factor, if they showed evidence of association with more than one factor, or if they displayed an inconsistent pattern of association with a factor across different wording and course conditions. This last condition is particularly important for the mSMQ II since the original SMQ II is intended to be used to measure motivation in different contexts. Items identified as problematic are indicated in Table 1 and in Fig. S1 (ESI), with an asterisk below their coefficient bar for the factor with which they were intended to be associated. The full results of the EFA can be found in the ESI.
Confirmatory factor analysis.
Five-factor models. The correlated five-factor models of the mSMQ II data represent the hypothesized framework of motivation proposed by the original developers. Testing the fit between this hypothesized model and the data collected from the mSMQ II provides evidence for the validity of that underlying theoretical framework. As expected, since the WLSMV estimator was more appropriate for the characteristics of the data, the fit indices for the 19 mSMQ II items in the five-factor model with the WLSMV estimator reached acceptable levels for more course and wording combinations than with the MLR estimator (see Table 3). Overall, these fit indices were better than those from previous studies using the WLSMV estimator (Komperda et al., 2018a) and more consistently acceptable across course and wording combinations, suggesting that the extensive revisions minimized some previous issues with instrument functioning across course and wording combinations. For WLSMV, only the general chemistry course with the chemistry wording failed to meet acceptable cutoff values, whereas with MLR half of the course and wording combinations failed to meet acceptable data-model fit (both wordings in preparatory chemistry and the science wording in general biology).
Table 3 Data-model fit for five-factor mSMQ II model (df = 142) with WLSMV and MLR estimators. Acceptable individual data-model fit indices are noted in bold. A model is deemed ‘acceptable’ when all indices for a given course/wording data set are bolded
Estimator Course Wording χ2 CFIa TLIa RMSEAa [90% CI] SRMRa
a Acceptable data-model fit values differ by estimator: for WLSMV cut-off values are CFI and TLI ≥ 0.95 and RMSEA ≤ 0.05; for MLR cut-off values are CFI and TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08.
WLSMV Preparatory chemistry Science (n = 139) 189 0.99 0.99 0.05 [0.03, 0.07]
Chemistry (n = 137) 194 0.99 0.98 0.05 [0.03, 0.07]
General chemistry Science (n = 417) 251 0.99 0.98 0.04 [0.03, 0.05]
Chemistry (n = 426) 334 0.99 0.98 0.06 [0.05, 0.06]
General biology Science (n = 128) 179 0.99 0.98 0.05 [0.02, 0.06]
Biology (n = 130) 191 0.99 0.99 0.05 [0.03, 0.07]
MLR Preparatory chemistry Science (n = 139) 251 0.90 0.88 0.07 [0.06, 0.09] 0.06
Chemistry (n = 137) 235 0.93 0.92 0.07 [0.05, 0.08] 0.06
General chemistry Science (n = 417) 227 0.97 0.96 0.04 [0.03, 0.05] 0.04
Chemistry (n = 426) 284 0.96 0.95 0.05 [0.04, 0.06] 0.04
General biology Science (n = 128) 255 0.89 0.87 0.08 [0.06, 0.09] 0.07
Biology (n = 130) 210 0.96 0.95 0.06 [0.04, 0.08] 0.05


The acceptable fit index values for some wording and course combinations with the MLR estimator align with those seen in other studies involving extensive modifications by removing items (Tosun, 2013; Yamamura and Takehira, 2017). Unfortunately, it is unclear if those studies with acceptable fit indices used the same estimator, though it is a reasonable assumption since the default estimator in many CFA programs is maximum likelihood. The MLR fit indices that did not meet acceptable values are more similar to those from studies by the original developers (CFI = 0.91; RMSEA = 0.07; SRMR = 0.04; Glynn et al., 2011) or those using the 25 SMQ II items with only modifications to the language (e.g., Greek) or the target (e.g., chemistry) of the instrument (Salta and Koulougliotis, 2015; Kwon, 2016; Ardura and Pérez-Bitrián, 2018; Vasques et al., 2018).

The results of testing the five-factor model with two different estimators suggest two possibilities. First, that the poor data-model fit seen in prior work with the unmodified SMQ II was primarily a result of using an inappropriate estimator for the characteristics of the data. However, this conclusion is suspect because previous work using the original wording and the appropriate WLSMV estimator did not show consistently acceptable data-model fit (Komperda et al., 2018a). Second, that the need for extensive revisions or removal of items in order to fit a five-factor model, as in this study and others, indicates a larger problem with the underlying theoretical framework of the instrument. This second possibility was investigated by looking at how the individual scales, which represent the individual aspects of motivation, function as independent factors. If the scales show good data-model fit as single-factor models, this indicates that they are appropriate measurements of a construct but that they do not necessarily relate to each other in the ways hypothesized by the SMQ II developers. If the individual scales do not show good data-model fit as single-factor models, this indicates issues with what the scales themselves are measuring and whether it is well aligned with existing theories of motivation.


Single-factor models. Unlike in the five-factor model results, there is less of a clear benefit from using the WLSMV estimator relative to MLR in the single-factor models. Of the 30 possible course and wording combinations, only seven had acceptable data-model fit under the WLSMV estimator whereas 11 were acceptable using MLR (shown in bold in Table 4). None of the scales showed consistently acceptable data-model fit across all course and wording combinations. Unexpectedly, the models for self-determination and self-efficacy with the loadings constrained to be equal across items were as likely to have acceptable data-model fit as other scales where this constraint was not present.
Table 4 Data-model fit for single-factor mSMQ II scales with WLSMV and MLR estimators. Acceptable individual data-model fit indices are noted in bold. A model is deemed ‘acceptable’ when all indices for a given course/wording data set are bolded
Model Estimator Course Wording χ2 CFIa TLIa RMSEAa [90% CI] SRMRa
a Acceptable data-model fit values differ by estimator: for WLSMV cut-off values are CFI and TLI ≥ 0.95 and RMSEA ≤ 0.05; for MLR cut-off values are CFI and TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08.
Intrinsic items (df = 5) WLSMV Preparatory chemistry Science (n = 139) 18 0.99 0.98 0.14 [0.08, 0.21]
Chemistry (n = 137) 42 0.98 0.96 0.23 [0.17, 0.30]
General chemistry Science (n = 835) 69 0.99 0.98 0.12 [0.10, 0.15]
Chemistry (n = 855) 166 0.98 0.96 0.19 [0.17, 0.22]
General biology Science (n = 258) 37 0.98 0.97 0.16 [0.11, 0.21]
Biology (n = 263) 48 0.99 0.97 0.18 [0.14, 0.23]
MLR Preparatory chemistry Science (n = 139) 18 0.93 0.86 0.14 [0.09, 0.20] 0.04
Chemistry (n = 137) 34 0.91 0.81 0.21 [0.15, 0.27] 0.04
General chemistry Science (n = 835) 31 0.97 0.95 0.08 [0.06, 0.10] 0.03
Chemistry (n = 855) 107 0.94 0.87 0.15 [0.13, 0.18] 0.04
General biology Science (n = 258) 44 0.90 0.81 0.17 [0.14, 0.22] 0.05
Biology (n = 263) 50 0.92 0.83 0.18 [0.15, 0.23] 0.04
Self-determination items (equal loadings; df = 2) WLSMV Preparatory chemistry Science (n = 139) 0 1.00 1.01 0.00 [0.00, 0.06]
Chemistry (n = 137) 7 0.98 0.97 0.14 [0.04, 0.25]
General chemistry Science (n = 835) 26 0.99 0.98 0.12 [0.08, 0.16]
Chemistry (n = 855) 26 0.99 0.98 0.12 [0.08, 0.16]
General biology Science (n = 258) 2 1.00 1.00 0.02 [0.00, 0.13]
Biology (n = 263) 11 0.99 0.99 0.13 [0.06, 0.21]
MLR Preparatory chemistry Science (n = 139) 1 1.00 1.02 0.00 [0.00, 0.12] 0.09
Chemistry (n = 137) 10 0.89 0.84 0.17 [0.08, 0.28] 0.15
General chemistry Science (n = 835) 13 0.97 0.95 0.08 [0.05, 0.12] 0.12
Chemistry (n = 855) 12 0.98 0.97 0.08 [0.05, 0.11] 0.09
General biology Science (n = 258) 0 1.00 1.02 0.00 [0.00, 0.04] 0.02
Biology (n = 263) 0 1.00 1.02 0.00 [0.00, 0.04] 0.02
Self-efficacy items (equal loadings; df = 2) WLSMV Preparatory chemistry Science (n = 139) 2 1.00 1.00 0.00 [0.00, 0.16]
Chemistry (n = 137) 5 1.00 1.00 0.10 [0.00, 0.22]
General chemistry Science (n = 835) 18 1.00 0.99 0.10 [0.06, 0.14]
Chemistry (n = 855) 10 1.00 1.00 0.07 [0.03, 0.11]
General biology Science (n = 258) 13 0.99 0.99 0.14 [0.07, 0.22]
Biology (n = 263) 10 0.99 0.99 0.13 [0.06, 0.21]
MLR Preparatory chemistry Science (n = 139) 3 0.99 0.98 0.06 [0.00, 0.16] 0.10
Chemistry (n = 137) 6 0.97 0.96 0.12 [0.00, 0.24] 0.09
General chemistry Science (n = 835) 24 0.97 0.95 0.12 [0.08, 0.15] 0.09
Chemistry (n = 855) 9 0.99 0.98 0.06 [0.03, 0.11] 0.05
General biology Science (n = 258) 10 0.96 0.94 0.13 [0.07, 0.19] 0.13
Biology (n = 263) 7 0.97 0.96 0.10 [0.03, 0.19] 0.08
Grade items (df = 2) WLSMV Preparatory chemistry Science (n = 139) 7 0.99 0.97 0.14 [0.04, 0.25]
Chemistry (n = 137) 3 1.00 0.99 0.06 [0.00, 0.19]
General chemistry Science (n = 835) 4 1.00 1.00 0.04 [0.00, 0.09]
Chemistry (n = 855) 7 1.00 1.00 0.06 [0.02, 0.10]
General biology Science (n = 258) 2 1.00 1.00 0.00 [0.00, 0.12]
Biology (n = 263) 2 1.00 1.00 0.01 [0.00, 0.12]
MLR Preparatory chemistry Science (n = 139) 10 0.84 0.52 0.17 [0.09, 0.26] 0.04
Chemistry (n = 137) 1 1.00 1.04 0.00 [0.00, 0.09] 0.01
General chemistry Science (n = 835) 5 0.99 0.98 0.04 [0.00, 0.07] 0.01
Chemistry (n = 855) 3 1.00 0.99 0.03 [0.00, 0.06] 0.01
General biology Science (n = 258) 2 1.00 1.02 0.00 [0.00, 0.07] 0.02
Biology (n = 263) 3 0.99 0.97 0.05 [0.00, 0.10] 0.02
Career items (df = 2) WLSMV Preparatory chemistry Science (n = 139) 0 1.00 1.00 0.00 [0.00, 0.09]
Chemistry (n = 137) 3 1.00 1.00 0.07 [0.00, 0.20]
General chemistry Science (n = 835) 22 1.00 0.99 0.11 [0.07, 0.15]
Chemistry (n = 855) 22 1.00 1.00 0.11 [0.07, 0.15]
General biology Science (n = 258) 5 1.00 0.99 0.08 [0.07, 0.17]
Biology (n = 263) 5 1.00 1.00 0.07 [0.00, 0.16]
MLR Preparatory chemistry Science (n = 139) 1 1.00 1.02 0.00 [0.00, 0.07] 0.01
Chemistry (n = 137) 2 1.00 1.00 0.02 [0.00, 0.15] 0.01
General chemistry Science (n = 835) 15 0.97 0.92 0.09 [0.06, 0.12] 0.02
Chemistry (n = 855) 10 0.99 0.98 0.07 [0.04, 0.11] 0.01
General biology Science (n = 258) 11 0.95 0.85 0.14 [0.08, 0.20] 0.03
Biology (n = 263) 4 1.00 0.99 0.06 [0.00, 0.13] 0.01


Examination of the patterns of acceptable and unacceptable data-model fit for the single-factor models provides evidence that the model of the remaining intrinsic items (I2, I3a, I3b, I4, and I5) showed consistently poor data-model fit across course and wording conditions, most notably in its high RMSEA values relative to the other scales. In contrast, the grade motivation items had acceptable data-model fit in a majority of course and wording combinations, more than any other scale. The patterns are more difficult to interpret for the self-determination and self-efficacy items given their inconsistent fit across course and wording combinations as well as the additional constraints placed on those scales in order to obtain data-model fit information. Similarly, the results for the career items were somewhat mixed, particularly when comparing across estimators.

Overall, the single-factor model results provide some evidence that the underlying issue of inconsistent fit of the SMQ II and mSMQ II data may be due to problems with the individual aspects of motivation hypothesized to underpin the instrument. Single-factor models of the original SMQ II items showed similar patterns with the WLSMV estimator, in that the intrinsic items had less support for their structure across course and wording combinations while the grade motivation items had more (Komperda et al., 2018a). Yet grade motivation is not an established construct within the motivation literature, while intrinsic motivation is. These results further support the interpretation that, even when a more appropriate estimator is used, there are underlying issues in the structure and framework of the SMQ II and mSMQ II that are responsible for the poor data-model fit.
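For readers who wish to run comparable analyses on their own data, the sketch below illustrates, in R with the lavaan package (Rosseel, 2012; R Core Team, 2019), a single-factor model with equal loadings fit under both estimators. It is a minimal sketch only, not the authors' analysis code; the data frame (msmq_data) and item names (sd1–sd3) are hypothetical placeholders.

library(lavaan)

# Single-factor model for a three-item scale with all loadings constrained
# to be equal (the repeated label 'a' imposes the constraint)
model_sd_equal <- 'selfdet =~ a*sd1 + a*sd2 + a*sd3'

# Ordinal (polychoric) treatment of the Likert-type responses with WLSMV;
# std.lv = TRUE fixes the factor variance to 1 so the common loading is free
fit_wlsmv <- cfa(model_sd_equal, data = msmq_data,
                 ordered = c("sd1", "sd2", "sd3"),
                 estimator = "WLSMV", std.lv = TRUE)

# Continuous treatment with robust maximum likelihood (MLR)
fit_mlr <- cfa(model_sd_equal, data = msmq_data,
               estimator = "MLR", std.lv = TRUE)

# Fit indices of the kind summarized in Table 4
fitMeasures(fit_wlsmv, c("chisq.scaled", "df", "cfi.scaled", "tli.scaled", "rmsea.scaled"))
fitMeasures(fit_mlr, c("chisq.scaled", "df", "cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr"))

Whether the scaled or unscaled versions of these indices are inspected is an analytic choice; the scaled versions requested here incorporate the corrections applied by the WLSMV and MLR estimators.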

Reliability. The results of the single-factor models provided in Table 4 were used to determine the course and wording conditions under which each scale showed acceptable data-model fit with either the WLSMV or MLR estimator. The models for the intrinsic, grade, and career scales are congeneric because the items were not constrained to be associated with the factor to the same extent. Under congeneric model conditions, omega is the most appropriate single-administration reliability coefficient to report (Komperda et al., 2018b). The self-determination and self-efficacy scales with loadings constrained to be equal are tau-equivalent models, for which alpha is an acceptable single-administration reliability coefficient. Since alpha and omega are identical under tau-equivalent conditions, only omega values are reported in Table 5.
Table 5 Single-administration reliability values (omega) by course, wording, and scale. No value is reported (–) for scales that did not have acceptable single-factor data-model fit
Course Preparatory chemistry General chemistry General biology
Wording Science Chemistry Science Chemistry Science Biology
Intrinsic – – – – – –
Self-determination 0.81 – – – 0.82 0.86
Self-efficacy 0.89 – – 0.86 – –
Grade – 0.90 0.90 0.88 0.89 0.90
Career 0.92 0.92 – – – 0.95


No omega value is reported in Table 5 for scales that did not meet the previously determined data-model fit criteria for each estimator (CFI and TLI ≥ 0.95 and RMSEA ≤ 0.05 for WLSMV; CFI and TLI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08 for MLR). Though the omega values reported in Table 5 for the modified scales are generally higher than previously reported reliability values for the scales (Glynn et al., 2011; Salta and Koulougliotis, 2015; Schumm and Bogner, 2016; Schmid and Bogner, 2017; Ardura and Pérez-Bitrián, 2018; Komperda et al., 2018a), they should not be interpreted as providing any indication of scale quality on their own, as there is no set threshold for acceptable reliability (Arjoon et al., 2013; Taber, 2018). Rather, reliability values are one of many pieces of evidence that should be weighed when evaluating the overall quality of data obtained from an instrument.
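For completeness, the sketch below shows one way to obtain an omega coefficient of the kind reported in Table 5 from a fitted lavaan model. It is a minimal illustration under assumed conditions (a congeneric model with continuous treatment of the items and a standardized latent variable); the item names (g1–g4) and data frame (msmq_data) are hypothetical placeholders, and packages such as semTools also provide functions for this calculation.

library(lavaan)

# Congeneric single-factor model for a hypothetical four-item grade scale
model_grade <- 'grade =~ g1 + g2 + g3 + g4'
fit_grade <- cfa(model_grade, data = msmq_data, estimator = "MLR", std.lv = TRUE)

# McDonald's omega from the standardized solution:
# omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of residual variances]
std <- standardizedSolution(fit_grade)
lambda <- std$est.std[std$op == "=~"]                                            # item loadings
theta  <- std$est.std[std$op == "~~" & std$lhs == std$rhs & std$lhs != "grade"]  # residual variances
omega  <- sum(lambda)^2 / (sum(lambda)^2 + sum(theta))
omega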

Limitations

Though this study was designed as a follow-up to a previous examination of the functioning of the SMQ II in introductory and general chemistry courses with both the science and chemistry wording (Komperda et al., 2018a), the current study did not obtain enough participants from introductory chemistry courses to test mSMQ II functioning in that population. Future work should obtain data from a more diverse population of students to provide more generalizable information about the functioning of the instrument. Similarly, though previous studies had interviewed biology students to modify items (Glynn et al., 2011), the current study only interviewed general chemistry students to examine patterns in the interpretation of the SMQ II items; therefore, additional interviews with other populations of students would be beneficial.

Conclusions

The goal of this study was to identify potential reasons for the inconsistent internal structure validity evidence for the SMQ II in the published literature and to determine if modifications to address these issues would improve the internal structure of the instrument. Though the modifications undertaken in this study ultimately resulted in acceptable data-model fit for the overall five-factor model in five out of six wording and course conditions when using an estimator appropriate for the data characteristics (WLSMV), the need for such extensive revisions here and in other studies (Tosun, 2013; Salta and Koulougliotis, 2015; Kwon, 2016; Ardura and Pérez-Bitrián, 2018) suggests there may be deeper issues with the underlying theoretical framework of the SMQ II. None of the individual scales demonstrated acceptable functioning across all course and wording conditions, and the best-functioning scale had the least theoretical support from the motivation literature. These results further support the interpretation that the items themselves are not well aligned with motivational theories and that calculation of individual scale scores provides little meaningful and interpretable information. Given the varied theoretical nature of the individual scales, it would also not be acceptable to create a total motivation score.

Though the grade motivation scale had relatively better fit than the other scales, it is not an aspect of motivation with a strong theoretical foundation in the self-regulatory literature. The grade scale was created by the developers during revision of the original SMQ into the SMQ II based on EFA results and interviews with students (Glynn et al., 2011; Table S1, ESI). The original grade (and career) items were initially intended to belong to a single factor representing extrinsic motivation, a more theoretically based aspect of motivation (Ryan and Deci, 2000), but evidence from this study does not suggest the grade and career items belong to a single factor. In addition to the theoretical concerns for the grade scale, there are practical concerns about how useful this scale is to instructors and researchers, since students tended to select responses at the far end of the response scale, resulting in a ceiling effect.

In contrast, the self-efficacy scale has more theoretical justification and, when tested with constraints, had acceptable data-model fit in some course and wording conditions, with more students making use of the entire response scale. The original set of items on the self-efficacy scale was also found to function well when translated into Spanish and used with the physics wording. For this reason, the self-efficacy scale may be useful to some practitioners, though it would be beneficial for future research to explore adding items to the scale.

It is likely that students value the science content they are learning for multiple reasons, and these reasons may overlap in ways that need to be addressed. This would explain the incongruence of the intrinsic and career items overlapping so strongly when intrinsic and extrinsic motivation are placed on opposite ends of a motivational continuum, as described by self-determination theory (Ryan and Deci, 2000). Brophy (2008) noted that students’ motivation to learn has been examined in three categories: (1) the influence of the classroom milieu, (2) students’ expectancies or self-beliefs, and (3) their perceived value of the task. The SMQ II tends to focus on the latter two, and thus the expectancy-value model (Wigfield and Eccles, 2000) might be more harmonious with the intended purpose of the measure. For example, the items for the intrinsic, career, and grade motivation scales on the SMQ II are more aligned with different task value perspectives than with the polar ends of the extrinsic-intrinsic continuum. In the expectancy-value model, Eccles and Wigfield (2002) articulate different reasons why an individual might value a task. For the SMQ II, the two most relevant appear to be utility value (e.g., the usefulness of a chemistry course for reaching one's career goals) and intrinsic value (the content is inherently interesting to learn about). Breaking down the intrinsic, career, and grade motivation scales, a student might inherently find chemistry interesting to learn about and thus want to pursue a career in chemistry; that student would then be interested in attaining good grades in her chemistry course because of their relevance to her future career pursuits. For this student, her motivation would fall somewhere between ‘identified’ and ‘integrated’ regulation on Ryan and Deci's (2000) extrinsic-intrinsic continuum.

Implications for researchers

When examining academic motivation, it is important to consider its multifaceted nature. Research has shown that expectancies (e.g., self-efficacy beliefs) are more strongly related to performance within a particular course and less strongly related to outcomes such as future major or career choice (Harackiewicz and Hulleman, 2010). Task value, on the other hand, is not as highly related to course performance, yet shows greater predictive value for future choice decisions and persistence in a major (e.g., Harackiewicz et al., 2008). Therefore, as educational researchers, we need to carefully consider why we are using particular scales and ensure the alignment of those scales with our research objectives.

With specific regard to the SMQ II and its discipline-specific variants (i.e., BMQ II, CMQ II, PMQ II, and others), researchers deciding to use this assessment instrument are encouraged to carefully consider the outcomes of this study, our prior work (Komperda et al., 2018a), and the SMQ II studies compiled within this manuscript when analyzing their data. Given the repeated lack of substantiation for the structure of the SMQ II, researchers choosing to use the instrument are urged to continue conducting single- and multi-factor CFAs to evaluate the structure of their data.
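As one illustration of such an analysis, the sketch below specifies a correlated five-factor model in R with the lavaan package (Rosseel, 2012) and requests the fit measures needed to evaluate it. The item labels and the data frame (smq_data) are generic placeholders, not the instrument's actual item codes; users would substitute the column names of their own SMQ II or mSMQ II data and decide whether an ordinal or continuous treatment of the responses is appropriate.

library(lavaan)

# Correlated five-factor model; cfa() correlates the latent factors by default
model_five <- '
  intrinsic =~ i1 + i2 + i3 + i4 + i5
  selfdet   =~ sd1 + sd2 + sd3
  selfeff   =~ se1 + se2 + se3
  grade     =~ g1 + g2 + g3 + g4
  career    =~ c1 + c2 + c3 + c4
'

# ordered = TRUE (lavaan >= 0.6-4) treats all items as ordinal so that WLSMV
# operates on polychoric correlations
fit_five <- cfa(model_five, data = smq_data, ordered = TRUE, estimator = "WLSMV")
summary(fit_five, fit.measures = TRUE, standardized = TRUE)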

While psychometric studies can provide insights into whether an assessment instrument, administered in a specific context, produces valid and reliable data, they are not necessarily designed to explain why validity or reliability might not be supported. Therefore, when structural validity issues arise, such as those that have been observed with the SMQ II, qualitative studies designed specifically to explore the underlying theoretical framework might be warranted. Consequently, it is recommended that future work on the SMQ II include cognitive interviews using open-ended probes designed to elicit the nature of an underlying construct (e.g., intrinsic motivation) within a specific context (e.g., chemistry and biology courses). Such studies would provide an understanding beyond the item-level cognitive interviews conducted to support response process validity and might provide insight into the salient features of a construct, the way in which cuing affects student ideas, and therefore why the same structural model might not be equally supported.

Implications for practitioners

As this study and several others by different researchers have been unsuccessful in supporting a key aspect of validity (i.e., structural) for the SMQ II, practitioners are strongly cautioned against selecting this tool for use in their evaluation of classroom practices. While the SMQ II and its variations are highly prevalent in the literature, there are other academic motivation instruments and scales available for use in higher education STEM learning environments. These include scales based on motivational theories such as expectancy-value theory (Ferrell and Barbera, 2015; Flake et al., 2015; Kosovich et al., 2015) and self-determination theory (Black and Deci, 2000; Hall and Webb, 2014; Jeno et al., 2017; Liu et al., 2017), as well as other more general student motivation measures (Pintrich et al., 1991). Overall, it is best to identify an instrument that is most closely aligned with the variable of interest (e.g., self-efficacy) and that has evidence of acceptable functioning in environments most similar to those in which it will be used (e.g., discipline, course, target populations). Using assessment instruments that have been shown to produce valid and reliable data in similar environments provides some level of support for the interpretations of the data produced by the instrument. If a suitable assessment instrument is not available, practitioners are encouraged to consider collaborating with an education researcher to study and/or adapt one of the currently available instruments for use in their learning environment. These types of collaborations are symbiotic: the practitioner acquires an assessment tool aligned with their specific needs, and the researcher and the larger chemistry education community gain new insights into the available tools being used to understand our educational landscape.

Conflicts of interest

There are no conflicts to declare.

References

  1. American Educational Research Association, American Psychological Association and National Council on Measurement in Education, (2014), Standards for educational and psychological testing, Washington, DC: American Educational Research Association.
  2. Ardura D. and Pérez-Bitrián A., (2018), The effect of motivation on the choice of chemistry in secondary schools: adaptation and validation of the Science Motivation Questionnaire II to Spanish students, Chem. Educ. Res. Pract., 19(3), 905–918.
  3. Arjoon J. A., Xu X. and Lewis J. E., (2013), Understanding the state of the art for measurement in chemistry education research: Examining the psychometric evidence, J. Chem. Educ., 90(5), 536–545.
  4. Austin A. C., Hammond N. B., Barrows N., Gould D. L. and Gould I. R., (2018), Relating motivation and student outcomes in general organic chemistry, Chem. Educ. Res. Pract., 19(1), 331–341.
  5. Bandura A., (1993), Perceived self-efficacy in cognitive development and functioning, Educ. Psychol., 28(2), 117–148.
  6. Barbera J. and VandenPlas J. R., (2011), All assessment materials are not created equal: the myths about instrument development, validity, and reliability, in Bunce D. M. (ed.), Investigating classroom myths through research on teaching and learning, ACS Symposium Series, Washington, DC: American Chemical Society, pp. 177–193.
  7. Beatty P., (2004), The dynamics of cognitive interviewing, in Presser S., Rothgeb J. M., Couper M. P., Lessler J. T., Martin E., Martin J. and Singer E. (ed.), Methods for testing and evaluating survey questionnaires, Hoboken, NJ: Wiley-Interscience, pp. 45–66.
  8. Beauducel A. and Herzberg P. Y., (2006), On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA, Struct. Equ. Model. A Multidiscip. J., 13(2), 186–203.
  9. Black A. E. and Deci E. L., (2000), The effects of instructors’ autonomy support and students’ autonomous motivation on learning organic chemistry: a self-determination theory perspective, Sci. Educ., 84(6), 740–756.
  10. Brophy J., (2008), Developing students’ appreciation for what is taught in school, Educ. Psychol., 43(3), 132–141.
  11. Cagande J. L. L. and Jugar R. R., (2018), The flipped classroom and college physics students’ motivation and understanding of kinematics graphs, Issues Educ. Res., 28(2), 288–307.
  12. Campos-Sánchez A., López-Núñez J. A., Carriel V., Martín-Piedra M.-Á., Sola T. and Alaminos M., (2014), Motivational component profiles in university students learning histology: a comparative study between genders and different health science curricula, BMC Med. Educ., 14, 46.
  13. Chan J. Y. K. and Bauer C. F., (2014), Identifying at-risk students in general chemistry via cluster analysis of affective characteristics, J. Chem. Educ., 91(9), 1417–1425.
  14. Childers G. and Jones M. G., (2017), Learning from a distance: High school students’ perceptions of virtual presence, motivation, and science identity during a remote microscopy investigation, Int. J. Sci. Educ., 39(3), 257–273.
  15. Cleveland L. M., Olimpo J. T. and DeChenne-Peters S. E., (2017), Investigating the relationship between instructors’ use of active-learning strategies and students’ conceptual understanding and affective changes in introductory biology: A comparison of two active-learning environments, CBE Life Sci. Educ., 16(2), 1–10.
  16. Cronbach L. J., (1951), Coefficient alpha and the internal structure of tests, Psychometrika, 16(3), 297–334.
  17. Deci E. L. and Ryan R. M., (2000), The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior, Psychol. Inq., 11(4), 227–268.
  18. Eccles J. S. and Wigfield A., (2002), Motivational beliefs, values, and goals, Annu. Rev. Psychol., 53, 109–132.
  19. Ferrell B. and Barbera J., (2015), Analysis of students’ self-efficacy, interest, and effort beliefs in general chemistry, Chem. Educ. Res. Pract., 16, 318–337.
  20. Ferrell B., Phillips M. M. and Barbera J., (2016), Connecting achievement motivation to performance in general chemistry, Chem. Educ. Res. Pract., 17(4), 1054–1066.
  21. Finney S. J. and DiStefano C., (2013), Non-normal and categorical data in structural equation modeling, in Hancock G. R. and Mueller R. O. (ed.), Structural equation modeling: a second course, Charlotte, NC: Information Age Publishing, pp. 439–492.
  22. Flake J. K., Barron K. E., Hulleman C., McCoach B. D. and Welsh M. E., (2015), Measuring cost: the forgotten component of expectancy-value theory, Contemp. Educ. Psychol., 41, 232–244.
  23. Fortus D., (2014), Attending to affect, J. Res. Sci. Teach., 51(7), 821–835.
  24. Gadermann A. M., Guhn M. and Zumbo B. D., (2012), Estimating ordinal reliability for Likert-type and ordinal item response data: a conceptual, empirical, and practical guide, Pract. Assessment, Res. Eval., 17(3), 1–13.
  25. Glynn S. M. and Koballa T. R., (2006), Motivation to learn in college science, in Mintzes J. J. and Leonard W. H. (ed.), Handbook of college science teaching, Arlington, VA: National Science Teachers Association, pp. 25–32.
  26. Glynn S. M., Taasoobshirazi G. and Brickman P., (2007), Nonscience majors learning science: a theoretical model of motivation, J. Res. Sci. Teach., 44(8), 1088–1107.
  27. Glynn S. M., Taasoobshirazi G. and Brickman P., (2009), Science motivation questionnaire: construct validation with nonscience majors, J. Res. Sci. Teach., 46(2), 127–146.
  28. Glynn S. M., Brickman P., Armstrong N. and Taasoobshirazi G., (2011), Science motivation questionnaire II: Validation with science majors and nonscience majors, J. Res. Sci. Teach., 48(10), 1159–1176.
  29. González A. and Paoloni P.-V., (2015), Perceived autonomy-support, expectancy, value, metacognitive strategies and performance in chemistry: a structural equation model in undergraduates, Chem. Educ. Res. Pract., 16(3), 640–653.
  30. Grosslight L., Unger C., Jay E. and Smith C. L., (1991), Understanding models and their use in science: Conceptions of middle and high school students and experts, J. Res. Sci. Teach., 28(9), 799–822.
  31. Hall N. and Webb D., (2014), Instructors’ support of student autonomy in an introductory physics course, Phys. Rev. Spec. Top. - Phys. Educ. Res., 10(2), 1–22.
  32. Hallgren K. A., (2012), Computing inter-rater reliability for observational data: An overview and tutorial, Tutor. Quant. Methods Psychol., 8(1), 23–34.
  33. Harackiewicz J. M. and Hulleman C. S., (2010), The importance of interest: The role of achievement goals and task values in promoting the development of interest, Soc. Personal. Psychol. Compass, 4(1), 42–52.
  34. Harackiewicz J. M., Durik A. M., Barron K. E., Linnenbrink-Garcia L. and Tauer J. M., (2008), The role of achievement goals in the development of interest: Reciprocal relations between achievement goals, interest, and performance, J. Educ. Psychol., 100(1), 105–122.
  35. Henson R. K. and Roberts J. K., (2006), Use of exploratory factor analysis in published research: Common errors and some comment on improved practice, Educ. Psychol. Meas., 66(3), 393–416.
  36. Hibbard L., Sung S. and Wells B., (2016), Examining the effectiveness of a semi-self-paced flipped learning format in a college general chemistry sequence, J. Chem. Educ., 93(1), 24–30.
  37. Hu L. and Bentler P. M., (1999), Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives, Struct. Equ. Model. A Multidiscip. J., 6(1), 1–55.
  38. Jeno L. M., Raaheim A., Kristensen S. M., Kristensen K. D., Hole T. N., Haugland M. J. and Mæland S., (2017), The relative effect of team-based learning on motivation and learning: A self-determination theory perspective, CBE Life Sci. Educ., 16(4), 1–12.
  39. Kassaee A. M., (2016), Examining the role of motivation and mindset in the performance of college students majoring in STEM fields, Middle Tennessee State University.
  40. Komperda R., Hosbein K. N. and Barbera J., (2018a), Evaluation of the influence of wording changes and course type on motivation instrument functioning in chemistry, Chem. Educ. Res. Pract., 19(1), 184–198.
  41. Komperda R., Pentecost T. C. and Barbera J., (2018b), Moving beyond alpha: A primer on alternative sources of single-administration reliability evidence for quantitative chemistry education research, J. Chem. Educ., 95(9), 1477–1491.
  42. Kosovich J. J., Hulleman C. S., Barron K. E. and Getty S., (2015), A practical measure of student motivation: Establishing validity evidence for the expectancy-value-cost scale in middle school, J. Early Adolesc., 35(5–6), 790–816.
  43. Krosnick J. A. and Presser S., (2010), Question and questionnaire design, in Marsden P. V. and Wright J. D. (ed.), Handbook of survey research, Bingley, UK: Emerald Group Publishing Limited.
  44. Kwon H., (2016), Effect of middle school students’ motivation to learn technology on their attitudes toward engineering, Eurasia J. Math. Sci. Technol. Educ., 12(9), 2281–2294.
  45. Liu Y., Ferrell B., Barbera J. and Lewis J. E., (2017), Development and evaluation of a chemistry-specific version of the academic motivation scale (AMS-Chemistry), Chem. Educ. Res. Pract., 18(1), 191–213.
  46. Liu Y., Raker J. R. and Lewis J. E., (2018), Evaluating student motivation in organic chemistry courses: moving from a lecture-based to a flipped approach with peer-led team learning, Chem. Educ. Res. Pract., 19, 251–264.
  47. Mahrou A., Ginger K. and Rowell H., (2016), Motivationally-informed interventions for at-risk STEM students, J. STEM Educ., 17(3), 77–84.
  48. National Research Council, (2012), Discipline-based education research: Understanding and improving learning in undergraduate science and engineering, Singer S. R., Nielsen N. R. and Schweingruber H. A. (ed.), Washington, DC: The National Academies Press.
  49. Olimpo J. T., Fisher G. R. and Dechenne-Peters S. E., (2016), Development and evaluation of the tigriopus course-based undergraduate research experience: Impacts on students’ content knowledge, attitudes, and motivation in a majors introductory biology course, CBE Life Sci. Educ., 15(4), 1–15.
  50. Peters G.-J. Y., (2017), userfriendlyscience: quantitative analysis made accessible, [Computer Software].
  51. Pintrich P. R., Smith D. A. F., Garcia T. and McKeachie W. J., (1991), A manual for the use of the motivated strategies for learning questionnaire (MSLQ), Ann Arbor, MI.
  52. R Core Team, (2019), R: A language and environment for statistical computing, [Computer Software].
  53. Reece A. J. and Butler M. B., (2017), Virtually the same: a comparison of STEM students content knowledge, course performance, and motivation to learn in virtual and face-to-face introductory biology laboratories, J. Coll. Sci. Teach., 46(3), 83–89.
  54. Revelle W., (2018), psych: Procedures for psychological, psychometric, and personality research, [Computer Software].
  55. Riccitelli M., (2015), Science identity's influence on community college students engagement, persistence, and performance in biology, Northcentral University.
  56. Rosseel Y., (2012), lavaan: An R package for structural equation modeling, J. Stat. Softw., 48(2), 1–36.
  57. Ryan R. M. and Deci E. L., (2000), Self-determination theory and the facilitation of intrinsic motivation, Am. Psychol., 55(1), 68–78.
  58. Salta K. and Koulougliotis D., (2015), Assessing motivation to learn chemistry: Adaptation and validation of Science Motivation Questionnaire II with Greek secondary school students, Chem. Educ. Res. Pract., 16(2), 237–250.
  59. Schmid S. and Bogner F. X., (2017), How an inquiry-based classroom lesson intervenes in science efficacy, career-orientation and self-determination, Int. J. Sci. Educ., 39(17), 2342–2360.
  60. Schumm M. F. and Bogner F. X., (2016), Measuring adolescent science motivation, Int. J. Sci. Educ., 38(3), 434–449.
  61. Shin S., Lee J. K. and Ha M., (2017), Influence of career motivation on science learning in Korean high-school students, Eurasia J. Math. Sci. Technol. Educ., 13(5), 1517–1538.
  62. Simpkins S. D., Davis-Kean P. E. and Eccles J. S., (2006), Math and science motivation: A longitudinal examination of the links between choices and beliefs, Dev. Psychol., 42(1), 70–83.
  63. Srisawasdi N., (2015), Evaluation of motivational impact of a computer-based nanotechnology inquiry learning module on the gender gap, J. Nano Educ., 7(1), 28–37.
  64. Taber K. S., (2018), The use of cronbach's alpha when developing and reporting research instruments in science education, Res. Sci. Educ., 48(6), 1273–1296.
  65. Tosun C., (2013), Adaptation of Chemistry Motivation Questionnaire-II to Turkish, J. Educ. Fac., 15(1), 173–202.
  66. Vasques D. T., Yoshida L., Ellinger J. and Maninang J. S., (2018), Validity and reliability of the science motivation questionnaire II (SMQ II) in the context of a Japanese university, New perspectives in science education, Florence: Pixel.
  67. Wigfield A. and Eccles J. S., (2000), Expectancy–value theory of achievement motivation, Contemp. Educ. Psychol., 25(1), 68–81.
  68. Xia Y. and Yang Y., (2018), RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods, Behav. Res. Methods, 1–20.
  69. Yamamura S. and Takehira R., (2017), Effect of practical training on the learning motivation profile of Japanese pharmacy students using structural equation modeling, J. Educ. Eval. Health Prof., 14, 2.
  70. Young A. M., Wendel P. J., Esson J. M. and Plank K. M., (2018), Motivational decline and recovery in higher education STEM courses, Int. J. Sci. Educ., 40(9), 1016–1033.
  71. Yu C.-Y., (2002), Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes, University of California, Los Angeles.
  72. Zeyer A., (2018), Gender, complexity, and science for all: Systemizing and its impact on motivation to learn science for different science subjects, J. Res. Sci. Teach., 55(2), 147–171.
  73. Zimmerman B. J., (2000), Attaining self-regulation: A social cognitive perspective, in Boekaerts M., Pintrich P. R. and Zeidner M. (ed.), Handbook of self-regulation, San Diego, CA: Academic Press, pp. 13–39.
  74. Zusho A., Pintrich P. R. and Coppola B., (2003), Skill and will: the role of motivation and cognition in the learning of college chemistry, Int. J. Sci. Educ., 25(9), 1081–1094.

Footnote

Electronic supplementary information (ESI) available: see DOI: 10.1039/d0rp00029a

This journal is © The Royal Society of Chemistry 2020