Towards a theoretically sound measure of chemistry students’ motivation: investigating rank-sort survey methodology to reduce response style bias

Ying Wang and Scott E. Lewis *
University of South Florida – Chemistry, 4202 E. Fowler Avenue CHE205, Tampa, Florida 33620, USA. E-mail: slewis@usf.edu

Received 29th July 2021, Accepted 18th November 2021

First published on 29th November 2021


Abstract

Prior research has demonstrated the important role of chemistry students’ affect in academic performance. Likert-scale surveys are the most prevalent tools for measuring students’ affect within chemistry education research; however, data collected through a Likert-scale survey may exhibit response style bias, which can hinder the accurate measurement of students’ affect. This study investigates the utility of a novel survey methodology, termed a rank-sort survey, in understanding students’ academic motivation in a general chemistry course. Informed by Q methodology, a rank-sort survey asks participants to rank a set of statements in terms of level of agreement, with limits in place on how many items can be assigned a particular rank. In this investigation, a rank-sort survey was developed using statements from an existing Likert-scale instrument, the Academic Motivation Scale-Chemistry. Data collected from the rank-sort surveys, compared to Likert-scale surveys, showed better alignment with self-determination theory, the underlying theoretical framework, and a better ability to predict students’ academic performance in chemistry. The study also discusses which surveys in chemistry education research are likely to benefit from adopting a rank-sort approach.


The importance of students’ affect in chemistry education has become increasingly recognized in chemistry education research (Flaherty, 2020), resulting in a growing interest in developing and investigating accurate measures of students’ affect. The strong majority of such investigations to date have relied on Likert-scale survey instruments. Data collected by Likert-scale instruments may be hindered by participants’ response style bias, a default preference for a particular response option on a Likert scale (Van Vaerenbergh and Thomas, 2012; Park and Wu, 2019). Response style bias leads to a pattern of consistent ratings across survey items, which can be particularly problematic when measuring students’ affect with a framework that contains competing dimensions. For example, self-determination theory (Ryan and Deci, 2000) describes motivation on a spectrum from controlled to autonomous. A Likert-scale survey informed by this theory allows students to rate both motivation approaches similarly, which is incongruent with the expectation of the theory. Possible explanations for such results are that response style bias led students to rate the items similarly or that the theory is not applicable in the setting. Measuring students’ relative ranking of statements belonging to each dimension offers a potential path forward, as the ranking process prevents all dimensions from receiving similar ratings. Further, because theories are evaluated based on their usefulness, if the relative ranking offers more insight into related factors such as student performance, this would be indicative of the utility of the theoretical framework in the setting. Thus, frameworks that include competing dimensions, which are prevalent within chemistry education research, could benefit from a measurement of students’ relative ranking of the dimensions. The relative ranking would serve researchers’ and practitioners’ interest in determining which of the competing dimensions is most applicable to each participant.

This work investigates rank-sort surveys as an alternative to Likert-scale surveys, intended to address the concern of response style bias and potentially provide additional insight into the measurement of students’ affect in chemistry. More specifically, this work examines students’ responses to the Academic Motivation Scale-Chemistry survey (Liu et al., 2017) when the survey is formatted as either a Likert-scale or a rank-sort survey. The data collected with each survey methodology are contrasted with consideration of the match to the theoretical basis of the survey and the ability of the survey results to predict students’ academic performance in chemistry. This predictive ability serves as external evidence of how well the results are in line with the underlying theory.

Measuring students’ affect

Students’ affect plays an important role in students’ academic performance (Nieswandt, 2007; Villafañe et al., 2016). According to a recent review of affective studies in chemistry education research (CER), students’ affect can be described as attitudes, self-efficacy, expectations, values, interest, motivation, effort beliefs or achievement emotions (Flaherty, 2020). The relationship between affective dimensions and students’ academic performance, as well as the integration among students’ feeling, thinking and doing, is well documented in a number of studies (Brandriet et al., 2013; Chan and Bauer, 2014; Liu et al., 2017; Gibbons et al., 2018; Liu et al., 2018), which generally find a positive association between students’ affect and academic performance in chemistry.

A common technique for measuring students’ affect in chemistry education research is to administer a Likert-scale survey to students. Likert-scale surveys include a series of items that belong to an underlying, latent construct, and the participant is prompted to rate each item on a specific scale (e.g. a 5-point or 7-point scale). The response categories are written to represent different levels of agreement (e.g. “Strongly agree” to “Strongly disagree”). Likert-scale surveys can be easily given to large samples of students and therefore may more readily support claims of generalizability than other, more labor-intensive data collection techniques such as focus groups or interviews. As a result, Likert-scale surveys are among the most common measurement tools in CER. To demonstrate their ubiquity, an examination of the recent articles from Flaherty's 2020 review of CER studies on students’ affect was conducted. Among 32 quantitative studies published from 2016 through 2020 that focus on students’ affect, 26 used a Likert-scale survey as a measurement instrument. However, despite this wide use, data collected through a Likert scale may face a threat from response style bias, which can result in ceiling or floor effects or high correlations among dimensions that are not intended to relate (Van Vaerenbergh and Thomas, 2012; Park and Wu, 2019). High correlations among dimensions that are thought to be independent or inversely related would indicate that the survey was not functioning as intended and would limit the utility of the survey results.

Response style bias

Response style bias refers to respondents’ tendency to prefer or avoid particular response categories, regardless of the level of the trait being measured (Van Vaerenbergh and Thomas, 2012; Park and Wu, 2019). For example, one student may default to the second most positive choice for any item the student does not have a strong impression of, while another student may do the same with the most negative choice. For surveys measuring multiple dimensions (also called latent constructs), response style bias may deflate or inflate the correlations and distort the associations among the constructs. Evidence of response style bias has been demonstrated in psychological studies. Moors (2008) explored response style behavior in participants’ answers to a five-point Likert-scale survey measuring attitudes towards family, children and immigrants. The survey was administered online to 2000 Dutch households. Using latent class analysis, the study found a clear pattern of extreme response styles in which participants tended to pick the extreme options on the scale. Park and Wu (2019) explored response style bias in participants’ responses to a four-point self-esteem Likert-type scale. Data for this study were retrieved from the openly available 2005 Longitudinal Study of Generations in California, and 1596 participants were included in the analysis. The results indicate that a model with two extremity factors (extremity in the high self-esteem direction and extremity in the low self-esteem direction) fit the data better than two other models, one positing no extremity factor and the other positing a single extremity factor. The findings suggest the presence of two distinct extreme response styles within the responses: a preference for agreement and a preference for disagreement. For the current study, some exemplars of the impact of response style bias are presented below; the interested reader can find a detailed review of the adverse effects of response style bias in Van Vaerenbergh and Thomas (2012), including the impact of response style bias on means and variances, as well as on the magnitude of correlations between variables.

Possible outcomes of a strong response style bias are ceiling or floor effects. A ceiling effect describes the case in which a large proportion of respondents score at the highest possible limit; a floor effect reflects scoring at the lowest limit (Kortlever et al., 2015). Floor and ceiling effects influence analyses because they may affect the normality of the data, resulting in Type I errors (2003), and reduce the amount of variation in the questionnaire data (Kortlever et al., 2015). In a pre/post study to evaluate the impact of instructional interventions, a ceiling effect on a premeasure would limit the ability to measure improvement over time. For example, Liu and colleagues (2018) employed the Academic Motivation Scale-Chemistry (AMS-Chemistry), a five-point Likert-scale survey, to evaluate students’ academic motivation in a lecture-based class and an active learning class. For the lecture-based class, the descriptive statistics for the pre-measures show that students’ average score on identified regulation (one of the subscales) was 4.09 with a standard deviation of 0.73. These values suggest that identified regulation has a ceiling effect, where a sizable proportion of students rated it at 5.00, the highest value possible. That is, for these students, it is not possible to detect improvement on this subscale in a pre/post comparison.
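The ceiling-effect check described above can be sketched numerically. The snippet below uses simulated five-point ratings and a hypothetical 15% cutoff for the proportion of respondents at the scale maximum; the function name, cutoff, and data are illustrative and not taken from the studies cited.

```python
import numpy as np

def ceiling_effect(scores, scale_max, threshold=0.15):
    """Report the proportion of respondents at the scale maximum and
    flag a possible ceiling effect using a (hypothetical) 15% cutoff."""
    scores = np.asarray(scores, dtype=float)
    prop_at_max = float(np.mean(scores == scale_max))
    return prop_at_max, prop_at_max >= threshold

# Simulated five-point ratings skewed toward the top of the scale,
# loosely mimicking a subscale with a mean near 4 on a 5-point scale
rng = np.random.default_rng(0)
ratings = rng.choice([3, 4, 5], size=200, p=[0.15, 0.40, 0.45])
prop, flagged = ceiling_effect(ratings, scale_max=5)
```

A flagged premeasure would signal that pre/post gains on that subscale may be undetectable for many respondents.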

Another concern with response style bias in data collected from a Likert-scale survey is correlation between dimensions that are not expected to relate. Lewis (2018) conducted a study on students’ goal orientations in general chemistry courses using a Likert-scale survey, the Achievement Goal Questionnaire (AGQ), designed to measure students’ goal orientation along three dimensions: task, self and other. These three dimensions are related to two goal orientations: the task and self dimensions describe mastery orientation and the other dimension measures performance orientation (Brotherton and Preece, 1996). According to Dweck and Leggett (1988), performance-oriented individuals seek to establish the sufficiency of their ability, and their goal is to gain positive judgement. In contrast, mastery-oriented individuals’ goal is to improve their ability and increase their competence. An individual is not expected to be performance-oriented and mastery-oriented at the same time. In the Lewis 2018 study, student responses on the AGQ were grouped using cluster analysis. The results demonstrated that 60% of the students rated all three dimensions at a similar level, leading to the cluster labels “high all”, “average all”, or “low all”. These groupings suggest a correlation among the three variables for these students. As a result, it is not clear toward which goal orientation (mastery or performance) these 60% of students are inclined, and the underlying theory does not describe students engaging in both goal orientations. Concerned with the challenge of response style bias, the current study explores whether a rank-sort survey technique, based on Q methodology, can address this challenge while maintaining the advantage of easy administration to a large sample, as seen with Likert-scale surveys.

Q methodology

Q methodology was designed to understand human subjectivity (Brown, 1996) by exploring a person's responses on a given topic and the extent to which that person's responses are shared by other individuals (McKeown and Thomas, 2013). Inherent in the methodology is asking a participant to rank a set of statements according to the participant's subjective impressions (e.g. applicability to the individual, extent of agreement), thereby creating the need to prioritize some statements over others (Kotul'áková, 2020). Thus, with Q methodology, researchers are able to explore which statements participants value more relative to others concerning a dimension.

Traditionally, Q methodology follows a four-step procedure (Brown, 1996). The first step is for the researcher to create a concourse, a large set of statements that represents the discussion regarding a particular topic. Next, the researcher selects a Q-sample, a subset of statements from the concourse that captures the essential nature of the topic. The third step is Q sorting, where respondents are asked to rank and sort the Q-sample into categories, with the number of statements per category following a predefined quasi-normal, pyramid-shaped distribution as demonstrated in Table 1. Each category is labeled in terms of a subjective impression; in the case of Table 1, the categories are labeled by extent of agreement. To place the statements in each category, respondents have to compare all the statements and provide a relative ranking of them, thereby avoiding response style bias because the same rating cannot be applied to more than a set number of statements. In practice, participants can engage in Q sorting independently, guided by detailed instructions, or with guidance provided by a facilitator. In the final step, the researcher analyzes the rank-sorted data, often using factor analysis to group participants with different opinions on the topic.

Table 1 Example of the quasi-normal distribution for 28 statements
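Because Table 1 is reproduced as an image, the sketch below uses a hypothetical quasi-normal allocation of 28 statements across seven agreement categories (2, 4, 5, 6, 5, 4, 2); the actual limits in Table 1 may differ. It illustrates how fixed category limits force a relative ranking.

```python
from collections import Counter

# Hypothetical quasi-normal limits for 28 statements across seven
# agreement categories (-3 = strongly disagree ... +3 = strongly agree);
# the actual limits shown in Table 1 may differ.
CATEGORY_LIMITS = {-3: 2, -2: 4, -1: 5, 0: 6, 1: 5, 2: 4, 3: 2}

def is_valid_sort(assignments):
    """Check that a sort places exactly the allowed number of
    statements (here, all 28) into each category."""
    counts = Counter(assignments.values())
    return all(counts.get(cat, 0) == limit
               for cat, limit in CATEGORY_LIMITS.items())

# A complete example sort that fills every category to its limit,
# forcing a relative ranking of the statements
flat = [cat for cat, n in CATEGORY_LIMITS.items() for _ in range(n)]
example_sort = {f"s{i}": cat for i, cat in enumerate(flat)}
```

Since only a fixed number of statements can share a category, a participant cannot assign the same rating across the board, which is the mechanism that limits response style bias.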


Q methodology has been widely used to uncover opinions, attitudes and values on topics in healthcare and public policy (Webler et al., 2001; Baker et al., 2014). However, it has not been extensively utilized in chemistry education research to date; only one recent study employing Q methodology was found in chemistry education. Kotul'áková (2020) applied Q methodology to identify preservice chemistry teachers’ beliefs about teaching and learning science. In that study, 69 chemistry teachers were purposefully selected and asked to rank and sort a series of 51 statements in a fixed distribution based on their level of agreement with each statement, from strongly agree to strongly disagree. Three groups of preservice teachers associated with different educational beliefs were identified. However, the author also indicated that these results should not be generalized because of the small number of participants and the purposeful sampling strategy.

Developing rank-sort surveys

Limited generalizability is an important consideration in the utility of Q methodology studies (Ramlo, 2016). To address this issue, researchers have adapted Q methodology into a survey design called a rank-sort survey. In a rank-sort survey, participants are given a Q sample of statements and asked to rank them into categories of a subjective trait. The number of statements per category is limited to a quasi-normal distribution (e.g. Table 1), which promotes a relative ranking of the statements and limits the number of ties in the ranking. Since the number of ties is limited, rank-sort surveys have the potential to alleviate the response style bias that may be found in Likert-scale surveys.

Three prior research studies have compared rank-sort and Likert-scale surveys. Prior to reviewing these studies, it may be helpful to review terminology particular to the data analytic methods they used. Data collected with a survey are commonly stored in a database where each column represents a survey item and each row a survey participant, with an array of scores that indicate each participant's rating of each item. A conventional exploratory factor analysis identifies a set of factors, each comprising survey items that are scored consistently across the participants. Thus, in conventional factor analysis a factor represents a set of items with high commonality. An alternative technique, termed inverted factor analysis and also exploratory, is conducted on the transposed version of the dataset. Inverted factor analysis finds individuals with common scoring across the items; thus, factors from an inverted factor analysis represent participants who score items in common. Possibly owing to historical precedent, researchers have commonly used inverted factor analysis with rank-sort data and conventional factor analysis with Likert-scale data, although either factor analysis technique can be used with either survey type.
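The distinction between conventional and inverted factor analysis can be illustrated with the correlation matrices each technique factors. The sketch below, using simulated ratings, stops at the correlation step; a full factor analysis would extract factors from these matrices.

```python
import numpy as np

# Simulated survey data: rows are participants, columns are items
rng = np.random.default_rng(1)
n_participants, n_items = 50, 12
data = rng.integers(1, 6, size=(n_participants, n_items)).astype(float)

# A conventional factor analysis factors the item-by-item correlation
# matrix: items scored consistently across participants load together
item_corr = np.corrcoef(data, rowvar=False)      # shape (12, 12)

# An inverted factor analysis first transposes the dataset, so the
# correlations, and hence the factors, are among participants
person_corr = np.corrcoef(data.T, rowvar=False)  # shape (50, 50)
```

The transpose is the whole difference: the same factor-extraction machinery applied to `person_corr` groups people rather than items.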

In one study, ten Klooster and colleagues (2008) evaluated individuals’ perceptions of beef by administering a rank-sort survey to 51 respondents and a Likert-scale survey to 160 respondents. Each survey used an identical set of 30 statements. The rank-sort survey was administered in person; the administration mode of the Likert-scale survey is not specified, but a reported 16% response rate implies it was administered remotely. Correlations between the average rating of each rank-sort survey item and the corresponding Likert-scale item were above 0.9, indicating general agreement in the relative ranking of the items. An inverted factor analysis was used with the rank-sort survey data and a conventional factor analysis with the Likert-scale survey data. The authors concluded that the rank-sort survey enabled characterizing varying participant perspectives about the survey target, while the Likert-scale survey characterized varying dimensions that are considerations when forming a perception of the survey target. For example, the rank-sort survey characterized one group of participants as idealistic critics, people generally positive about beef but concerned about animal welfare, while the Likert-scale survey found that beef image is characterized by dimensions such as price, food safety and animal welfare.

Thompson et al. (2012) compared rank-sort and Likert-scale surveys each designed to measure community views on a natural resource. Each survey used the same set of 36 items, and the surveys were mailed, using random assignment of survey type, to 1700 participants. This research group also employed an inverted factor analysis with the rank-sort data but evaluated the Likert-scale data with both a conventional and an inverted factor analysis. Four factors were identified in the rank-sort data and three factors with the Likert-scale conventional factor analysis. Characterizing each factor by the items most related to it, two of the rank-sort factors and two of the Likert-scale factors were similar, a third featured some differences, and the fourth rank-sort factor was seen as potentially unique to the rank-sort survey data. The inverted factor analysis on the Likert-scale data showed similarity with three of the rank-sort factors; however, the fourth rank-sort factor remained unique. The authors concluded that the rank-sort survey technique offered similar insight into participants’ perspectives as the Likert-scale survey, but that the rank-sort offered greater flexibility by characterizing groups of participants across a generalizable sample.

Eyvindson and colleagues (2014) administered a rank-sort survey in person, followed two months later by a Likert-scale survey sent by mail to the same set of respondents (N = 34). The surveys pertained to rating forest planning processes. The rank-sort and Likert-scale surveys were each analyzed using conventional and inverted factor analyses. In comparing the conventional factor analyses of the two survey approaches, the authors report no obvious similarities between the factors identified. In comparing the inverted factor analyses, the groupings of participants had an observed Spearman rank correlation of 0.64, indicating general agreement in participant grouping but also identifying some differences. The authors also note that the inverted factor analysis technique can leave some participants without a group assignment, while a conventional factor analysis followed by cluster analysis would group all the participants. Ultimately, the authors stress that reported survey results depend on the methodological choices enacted with the survey.
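The Spearman rank correlation used for such comparisons can be sketched as the Pearson correlation of ranks. The values below are hypothetical loadings, not data from the study above, and the implementation assumes no tied scores.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks.
    Assumes no tied values; scipy.stats.spearmanr covers the general case."""
    ranks_a = np.argsort(np.argsort(a)).astype(float)
    ranks_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Hypothetical loadings for six participants under two analyses;
# these are illustrative, not values from the study discussed above
scores_a = [0.82, 0.41, 0.77, 0.15, 0.60, 0.33]
scores_b = [0.79, 0.38, 0.70, 0.22, 0.55, 0.41]
rho = spearman_rho(scores_a, scores_b)
```

Because the statistic depends only on orderings, it captures agreement in how two analyses rank the same participants, which is the sense in which a value of 0.64 indicates "general agreement" with some differences.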

Combined, this set of past work highlights the potential for rank-sort and Likert-scale data to provide differing insight into participants’ views, although two concerns emerged in reviewing the past work. First, some of the past work used different analytic techniques for the two survey types. For example, ten Klooster and colleagues (2008) used an inverted factor analysis with the rank-sort survey but a conventional factor analysis with the Likert-scale survey. When different analytic techniques are used with each survey, differences in the results may be attributed either to the different survey methodologies or to the different analytic techniques, and it is not possible to distinguish the impact of each. Second, each of the past studies observed differences in results between survey types, but it is not clear which survey type generated results that are more accurate or useful. A necessary step in comparing survey methodologies is to explore how the results of each survey relate to an external measure. To determine the relative accuracy of survey responses between survey types, the external measure would be an independent measure of the same construct targeted by the surveys. To determine the relative utility of survey responses, the external measure would be one that is expected to relate to the construct targeted by the survey. Without such an external measure, one can only conclude that the differing survey methodologies generate different results; no information is gained on whether one methodology offers an advantage over the other.

This study seeks to extend the efforts to contrast data generated from rank-sort and Likert-scale surveys to chemistry education research. Informed by the above concerns, this study uses a common analytic technique with both survey designs and relates the survey results from both methodologies to an external measure to determine the relative utility of the results from each. More specifically, this study explores chemistry students’ motivation to learn chemistry using Likert-scale and rank-sort surveys and compares the survey results to academic performance in a chemistry class.

Self-determination theory

Self-determination theory (SDT) is a framework that describes human motivation and personality (Deci and Ryan, 2000; Ryan and Deci, 2000; Deci et al., 2017). The theory describes autonomy as a basic psychological need for human flourishing and adopts a multidimensional description of motivation based on the type of regulation, from controlled to autonomous (Deci and Ryan, 1987). Individuals who engage in autonomous behaviors have an internal locus of causality, and their behaviors are initiated by a sense of choice and personal volition. In contrast, individuals who engage in controlled behaviors have an external locus of causality, and their behaviors are regulated by external pressures (e.g. rewards for completing a task or punishment for not completing it) or internal pressures (e.g. feeling anxious or guilty for postponing a task).

SDT posits motivation as a continuum ranging from extrinsic motivation to intrinsic motivation, with amotivation as a state of lacking motivation (Ryan and Deci, 2000). Intrinsic motivation is viewed as the prototype of autonomy (Deci and Ryan, 1985; Ryan and Deci, 2000). It is the driving force for engaging in activities out of genuine interest and enjoyment. For example, in a chemistry class, students may freely engage in solving a problem on molecular polarity simply because they find it inherently enjoyable and interesting. Extrinsic motivation is the drive to attain consequences distinct from the inherent nature of the task (e.g. avoiding punishment or obtaining rewards) (Taylor et al., 2014). For example, chemistry students may not experience inherent enjoyment when completing practice problems on polarity; however, they study because they are interested in the outcome of a better grade.

SDT specifies different types of extrinsic motivation varying in the extent to which they are autonomous (Deci and Ryan, 1985; Ryan and Deci, 2000). These types, from the least to the most autonomous, comprise external, introjected, identified, and integrated regulation. External regulation means being motivated by an external pressure or reward, with individuals often perceiving their behaviors as being directly controlled by others. Through the process of internalization, initially external regulations can be transformed into internal regulations (Black and Deci, 2000). Introjected regulation describes a state of engaging with activities because of an internalized sense of compulsion, pressure toward standards, or self-esteem contingencies (Assor et al., 2009). An example would be a student who studies long hours to prove to themselves that they are worthy. Introjected regulation is a partial form of internalization, which is more autonomous than external regulation but still represents a controlled behavior, albeit an internally generated control (Deci et al., 2017). Identified regulation refers to a fuller internalization, in which individuals engage in a task because it is seen as valuable or worthwhile but not inherently interesting or enjoyable (Donald et al., 2020); for example, a student who does extra exercises on polarity because they believe it will help them understand the concept better. Identified regulation is more autonomous than external and introjected regulation. Finally, integrated regulation represents a fully internalized state of extrinsic motivation, in which behaviors are motivated by one's identity and core values (Ryan and Deci, 2000). It is the most autonomous of the extrinsic motivation states. SDT also considers amotivation, a feeling of ineffectiveness, lack of purpose or resistance toward action (Donald et al., 2020).
SDT describes motivation as a continuum from extrinsic to intrinsic and thus an individual is not expected to exhibit both forms of motivation simultaneously.

Academic motivation and students’ achievement

Intrinsic motivation and the more internalized extrinsic motivation states have been found to positively correlate with students’ academic outcomes in STEM courses, including persistence, engagement and exam scores (Black and Deci, 2000; Vansteenkiste et al., 2004; Austin et al., 2018). Educational studies have demonstrated the importance of intrinsic motivation for academic achievement (Deci et al., 1991; Pintrich, 2003; Niemiec and Ryan, 2009; Goldman et al., 2016). Taylor et al. (2014) explored the correlations between different types of motivation and academic performance in a meta-analysis and three empirical studies of high school and college students in Canada and Sweden. The meta-analysis results suggest that intrinsic motivation and identified regulation have a moderately strong, positive relationship with academic achievement, while introjected and external regulation have a weaker, but significant, negative relation with school achievement. The three empirical studies also showed a positive relation between intrinsic motivation and academic achievement.

Evidence for the positive correlation between intrinsic motivation and students’ academic outcomes has also been found in chemistry settings. Liu and colleagues (2017) used the AMS-Chemistry survey to measure students’ academic motivation in a college-level general chemistry course. Results showed that students’ intrinsic motivation ratings were positively associated with academic achievement at the end of the term. Austin and colleagues (2018) explored the correlation between motivation and student outcomes in general organic chemistry. In that study, motivation was operationalized as a construct that includes intrinsic motivation, career motivation, self-determination, self-efficacy and grade motivation. The results suggest that students’ performance was strongly correlated with self-efficacy and self-determination, the latter of which is related to autonomy. In summary, within SDT, intrinsic motivation is the only type of motivation that has consistently predicted students’ academic achievement.

Rationale

Although Likert-scale surveys are widely used to measure students’ affect, the data they generate may exhibit response style bias. As a result, Likert-scale data may make it difficult for researchers to determine which factors participants endorse more strongly than others. Similarly, the results of a Likert-scale survey may be less informative for instructional practice if the data do not distinguish the relative placement of the constructs being measured. The challenges associated with response style bias are particularly problematic for surveys that have subscales, each measuring a unique dimension, where the dimensions are not meant to co-occur. This study is designed to investigate a potential remedy to this issue via the use of a rank-sort survey, in which participants are not able to score all items consistently. By limiting response style bias, the rank-sort survey could serve as an alternative to Likert-scale surveys. Additionally, previous studies comparing Likert-scale and rank-sort surveys commonly used different analysis techniques, which served as a confounding factor in understanding the impact of the survey methodology. The current work expands the literature by contrasting the two survey methodologies using a consistent analytic technique for each survey. Further, no prior work was found that compared these survey methodologies by examining how well the survey results related to an external measure, so it was unclear whether one methodology offered an advantage in accuracy or utility over the other. The current work addresses this gap by relating the results of each survey methodology to an external variable to determine the utility of the data collected from each.

SDT serves as an ideal framework for exploring this issue as it describes dimensions, extrinsic and intrinsic motivation, that are not thought to co-occur. Thus, response style bias poses a particular threat, as it would result in some participants rating both forms of motivation equally, in contrast to the expectation from theory. Further, the relevance of academic motivation for academic achievement has been demonstrated in general (Black and Deci, 2000; Vansteenkiste et al., 2004; Austin et al., 2018) and more specifically within chemistry (Liu et al., 2017). It is thus possible that a rank-sort survey using the items from the AMS-Chemistry survey may generate data that describes participants more in line with the SDT framework than the Likert-scale version of the same survey. If the different survey styles result in different motivation profiles assigned to participants, it is also necessary to investigate the utility of the motivation profiles. SDT and past literature (Liu et al., 2017) predict that intrinsic motivation is positively related to academic achievement. Using this expected relationship, the motivation profiles from each methodology will be compared to academic achievement to determine which survey methodology has greater utility in measuring students’ motivation in a way that features the expected relationship with academic achievement. Thus, this work pursues the following research questions:

(1) To what extent does the use of Likert-scale versus rank-sort survey generate different descriptions of general chemistry students’ academic motivations?

(2) To what extent do different descriptions of academic motivation, generated by each survey methodology, relate to students’ academic performance in a general chemistry course?

Methodology

Ethical statement

Approval to conduct this study was obtained from the university's institutional review board as Study 001406. All data was collected with participants’ consent. Neither researcher was a teacher of the classes in which data was collected. The first author was a teaching assistant for the classes, responsible for importing and announcing assignments in the online courses. To minimize the potential for double agency, where students may mistakenly perceive participating in the study as required for the course, the second author, who was unaffiliated with the course, posted the recruiting information for the study.

Setting

This study took place at a large research-intensive university located in the southeastern United States during the Fall 2020 semester. Data was collected in nine classes of first-semester general chemistry (GC1), each taught online owing to COVID-19 restrictions. Students’ motivation in the current study might have been impacted by the online class setting because of the limited interaction with instructors and peers; therefore, it is unclear to what extent the results would generalize to an on-campus setting under non-pandemic conditions. Class sizes ranged from 140 to 208 students. The class format included two regular lecture sessions (75 minutes each) and one Peer-Led Team Learning (PLTL) session per week (Lewis, 2011). Attendance at the PLTL sessions counted for 10% of students’ grades. The classes were coordinated using the same syllabus, textbook and learning objectives, and students across the classes had common homework assignments and concurrent exams. There were three interim exams (each 15% of the grade) and one cumulative final exam (25% of the grade). Each exam contained 15 or 26 multiple-choice questions and 5 or 8 fill-in-the-number questions. For each question, there was a question bank of three different items aligned with the same learning objective, and the item presented to each student was randomly selected from the bank. Because students therefore answered different sets of items, Cronbach's alpha was not calculated for the interim and final exams. Correlation values among the exams ranged from 0.55 to 0.73, indicating convergent validity among the set of exam scores.

Instruments

Likert-scale survey. AMS-Chemistry was used to measure students’ chemistry-specific motivation (Liu et al., 2017). The original version of AMS-Chemistry is a Likert-scale instrument that asks students: “Why are you enrolled in this chemistry course?” The survey contains seven subscales: to experience, to accomplish, to know (each of these measuring intrinsic motivation), identified regulation, introjected regulation, external regulation and amotivation. The earlier constructs of integrated regulation and identified regulation from SDT were combined into a single identified regulation subscale in this survey, as both represent high autonomy within extrinsic motivation (Ryan and Deci, 2000; Liu et al., 2017). Each subscale consists of four items and the instrument contains 28 items in total. The Likert scale for this instrument has five response options: not at all, a little, moderately, a lot and exactly. In scoring the instrument in line with the instrument developers (Liu et al., 2017), these five responses are converted into a score ranging from 1 for not at all to 5 for exactly. Below are exemplar items from this instrument; the full instrument can be found in the reference (Liu et al., 2017).

Amotivation (A): “Honestly, I don’t know; I really feel that I am wasting my time taking chemistry courses.”

Extrinsic, external regulation (ER): “In order to obtain a better job later on.”

Extrinsic, introjected regulation (InR): “Because I want to show myself that I can succeed in studying chemistry.”

Extrinsic, identified regulation (IdR): “Because I believe that chemistry courses will improve my skills in my chosen career.”

Intrinsic, to experience (TE): “For the satisfaction I experience while learning about various chemistry topics.”

Intrinsic, to accomplish (TA): “For the satisfaction I experience while improving my understanding of chemistry.”

Intrinsic, to know (TK): “For the pleasure that I experience in broadening my knowledge about chemistry.”

Rank-sort survey. A rank-sort survey was designed by employing the same prompt, “Why are you enrolled in this chemistry course?”, and the same items (e.g. “In order to obtain a better job later on.”) from the Likert-scale AMS-Chemistry. While a Likert-scale survey requires rating each item on a five-point scale, a rank-sort survey asks students to sort the statements into five groups. The labels for the five groups matched the five-point scale used in the Likert-scale survey. Students’ responses were transformed into the same numeric scale as the Likert-scale survey data prior to data analysis, ranging from 1 for “not at all” to 5 for “exactly”. The transformation of rank-sort data matched prior work comparing rank-sort to Likert-scale surveys (Thompson et al., 2012; Eyvindson et al., 2014) and resulted in matrices similar to those of Likert-scale surveys, where students’ responses for each item serve as separate rows. The rank-sort survey was created in Qualtrics, an online questionnaire platform, using the “Pick, Group and Rank” question type. JavaScript code was used to limit the number of statements allowed in each group. See Fig. 7 in the Appendix for an image of how this survey was presented to students. A reading check statement asking students to sort that statement into “a little” was included among the statements, and a similar statement was included in the Likert-scale survey. The number of statements required for each group in the rank-sort survey followed a quasi-normal distribution, with one extra statement added to “a little” to account for the reading check. The resulting distribution was: 4 statements for “not at all”, 7 for “a little”, 8 for “moderately”, 6 for “a lot” and 4 for “exactly”.
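The forced distribution and scoring described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual processing code: the item identifiers are hypothetical, and the quotas follow the Q28 design (the extra "a little" slot holds the reading check, which would be dropped before analysis).

```python
# Quotas and numeric scores for the Q28 rank-sort groups (hypothetical sketch).
QUOTAS = {"not at all": 4, "a little": 7, "moderately": 8, "a lot": 6, "exactly": 4}
SCORES = {"not at all": 1, "a little": 2, "moderately": 3, "a lot": 4, "exactly": 5}

def score_response(sorted_groups):
    """sorted_groups maps each group label to the list of item ids placed there.

    Enforces the forced distribution, then assigns each item the numeric
    score of its group, yielding the same 1-5 scale as the Likert data.
    """
    for label, items in sorted_groups.items():
        assert len(items) == QUOTAS[label], f"'{label}' must hold {QUOTAS[label]} items"
    return {item: SCORES[label]
            for label, items in sorted_groups.items()
            for item in items}
```

A response sorted this way produces one row of the student-by-item score matrix used in the later analyses.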

Short version surveys

A short version of each survey (Likert-scale and rank-sort) was created with the aim of exploring the impact of instrument length on the ability of each method to measure motivation. To create the shortened Likert-scale survey, the item with the lowest factor loading on each factor in Liu et al.'s (2017) study was removed from the instrument. This resulted in a shortened Likert-scale survey with 21 items (22 with the reading check) and a corresponding shortened rank-sort survey with the number of statements per group following the pattern: 3 for “not at all”, 5 for “a little” (including the reading check), 7 for “moderately”, 4 for “a lot” and 3 for “exactly”. The original surveys will be referred to as L28 (Likert-scale with 28 items) and Q28 (rank-sort with 28 items) and the shortened surveys as L21 (Likert-scale with 21 items) and Q21 (rank-sort with 21 items).

Data collection

Students were prompted to take an online survey using Qualtrics following the second of four tests, approximately half-way through the fifteen-week semester. Upon clicking the link to the survey, students were randomly assigned to one of four surveys: L21, Q21, L28 or Q28. Students were allowed one week to finish the survey and received extra credit for completing it equal to 0.5% of their final grade in the course. Each survey required a complete response before submission: for the rank-sort surveys, all statements had to be sorted into groups, and for the Likert-scale surveys, an option had to be selected for each item.

Of the 1871 students enrolled in the nine GC1 classes, 875 students completed one of the motivation surveys and consented to the study. Students who did not respond to the reading check item in accordance with the item description were removed prior to data analysis. Of the 875 students, 28 were removed owing to the check item, resulting in complete surveys from 847 students in total. Among the 847 surveys, 213 students completed Q28, 204 completed Q21, 208 completed L28, and 222 completed L21. Students taking a rank-sort survey were more likely to be flagged by the check item (21 students, versus 7 with the Likert-scale surveys), but the rate of students missing the check item was below 5% for each survey type.

Internal structure

Past work comparing Likert-scale and rank-sort survey data relied on conventional exploratory factor analysis or inverted factor analysis. As the AMS-Chemistry has a previously defined factor structure, a confirmatory factor analysis (McFate and Olmsted III, 1999) was used to explore whether the expected factor structure was present in the survey data. A confirmatory factor analysis was conducted on each survey's results using the seven-factor model established by Liu et al. (2017). The data were treated as continuous, and a maximum likelihood robust estimator, which accounts for nonnormally distributed data, was used. In estimating the model, the first factor loading on each factor was fixed to 1 and the other parameters, including the remaining loadings, variances, and covariances, were freely estimated.

The data for each rank-sort survey failed to converge with the seven-factor model. As indicated in the literature review, one possible explanation for the failure to converge is that the limits on the number of statements respondents could place at each rating (e.g. 6 statements for “a lot” in Q28) prevented consistent ratings across the set of four items belonging to each factor. The Likert-scale survey data converged as expected with acceptable fit statistics: the L21 data converged with fit indices of CFI = 0.971, SRMR = 0.044, and RMSEA = 0.050, and the L28 data converged with fit indices of CFI = 0.945, SRMR = 0.066, and RMSEA = 0.059. Cronbach's alpha for the set of items within each of the seven factors for L21 and L28 ranged from 0.82 to 0.92, indicating consistency within each set of items. Since previous studies provide an evidence base supporting the seven-subscale internal structure, the data is presented as subscale scores, representing the average of the items assigned to each subscale. The same analysis techniques were also conducted on the by-item data and are presented in the appendix to determine the sensitivity of the results to the creation of subscales; no notable changes in the substantive outcomes were observed when conducting the analyses by-item.
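The subscale consistency reported above uses Cronbach's alpha, which can be computed directly from the item-score matrix. The sketch below is an illustrative implementation of the standard formula, not the software the authors used; the toy data in the usage example is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: (n_respondents, k_items) array of item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of summed scores
    return (k / (k - 1)) * (1 - item_var / total_var)
```

For perfectly parallel items (every respondent gives the same score to each item in the subscale), the formula returns 1; values in the 0.82 to 0.92 range, as observed here, indicate strong internal consistency.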

Results

To aid the presentation of the results, a comparison of the L28 and Q28 survey data will be conducted first. Following this comparison, the L21 and Q21 data will be presented with attention paid to the similarities or differences observed from the original comparison. To describe the academic motivation of General Chemistry I students from the L28 and Q28 surveys, the mean score on each item and the percentage of respondents selecting each response choice are presented in Fig. 1. The items are labeled and sorted according to the intended factor. The mean scores indicate that students overall had an academic motivation aligned more toward identified regulation and external regulation, with introjected regulation and the three types of intrinsic motivation rated in the middle and amotivation rated lowest. The pattern of relative rank of each item was consistent between the Likert-scale and rank-sort surveys, and the Spearman rank correlation observed was 0.96. For L28 items, the average skewness is 0.01 and the average kurtosis −0.28. For Q28 items, the average skewness is 0.23 and the average kurtosis is −0.03. The kurtosis values indicate the item scores in L28 are spread more widely across the response options than those in Q28. Fig. 1 provides a clearer picture of the distribution of students’ response choices on each item for both L28 (left) and Q28 (right). The upper x-axis is set for the mean score, and the actual mean score for each item is shown in the circle. The bottom x-axis is set for the percentage of respondents for each response choice; the length of each colored bar shows the actual percentage of respondents who selected that response choice for each item. According to the figure, the distribution of students’ ratings is roughly similar for amotivation items in both L28 and Q28, where the majority of students rated amotivation “not at all”. However, for the rest of the items, students tended to select the extreme levels (“not at all” and “a lot”) more often in L28 than in Q28. The results indicate that a rank-sort survey could help reduce extreme response style in students’ responses compared to a Likert-scale survey.
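The item-level descriptive statistics reported here (skewness, excess kurtosis, and the Spearman rank correlation between the two surveys' item means) are standard computations. The sketch below shows how they could be obtained with scipy; the random matrices are stand-ins for the actual L28 and Q28 response matrices, which are not reproduced here.

```python
import numpy as np
from scipy.stats import skew, kurtosis, spearmanr

rng = np.random.default_rng(0)
likert = rng.integers(1, 6, size=(200, 28))    # stand-in for the L28 student-by-item matrix
ranksort = rng.integers(1, 6, size=(200, 28))  # stand-in for the Q28 student-by-item matrix

# average item-level skewness and excess kurtosis, as reported in the text
item_skew = skew(likert, axis=0).mean()
item_kurt = kurtosis(likert, axis=0).mean()    # scipy reports excess kurtosis (normal = 0)

# Spearman rank correlation between the two surveys' item means
rho, p = spearmanr(likert.mean(axis=0), ranksort.mean(axis=0))
```

With the real data, the item means would be highly rank-correlated (0.96), since both surveys preserved the relative ordering of the items.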
image file: d1rp00206f-f1.tif
Fig. 1 Item-level mean scores and percentage of respondents on each response choice for L28 and Q28. TK = To know, TA = To accomplish, TE = To experience, IdR = Identified regulation, InR = Introjected regulation, ER = External regulation, A = Amotivation.

While the descriptive statistics describe the average responses for the entire cohort, the next consideration was how groups of students answered the set of items. To identify groups of students, cluster analysis was used to classify students with similar patterns across the seven subscales. Cluster analysis uses an algorithm to group members that are as similar as possible to others within a group and as different as possible from those in other groups (Clatworthy et al., 2005). The researchers aim to identify clear groupings that are large enough in size to allow for meaningful implications without obscuring differences among participants (Ryan and Huyton, 2000).

Hierarchical and K-means cluster analysis are two of the most widely used approaches in previous studies (Clatworthy et al., 2005). The data analysis for this study started with a hierarchical cluster analysis, where each student began as a single cluster and students were progressively grouped together in terms of their similarity. Squared Euclidean distance was used as the distance metric to determine similarity, and Ward's method was used as the algorithm, merging at each step the pair of clusters that minimizes the increase in the within-cluster sum of squares. The characteristics of each cluster are qualitatively described by considering the cluster centers (the average value of the participants belonging to each cluster) for each subscale relative to the scores for other subscales within the same cluster and the overall average score for the same subscale across clusters.
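The hierarchical step can be sketched with scipy's agglomerative clustering. This is an illustrative stand-in for the SPSS procedure the authors describe, using simulated subscale scores; scipy's "ward" linkage operates on Euclidean distances, which yields the same merge order as the squared-Euclidean formulation of Ward's method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# stand-in data: 150 students x 7 subscale means (1-5 scale)
subscale_scores = rng.uniform(1, 5, size=(150, 7))

# Ward's method: each student starts as its own cluster, then clusters
# are merged to minimize the increase in within-cluster sum of squares
Z = linkage(subscale_scores, method="ward")

# cut the dendrogram at three clusters, as selected in the study
labels = fcluster(Z, t=3, criterion="maxclust")
centers = np.array([subscale_scores[labels == c].mean(axis=0) for c in (1, 2, 3)])
```

Each row of `centers` is a cluster center, i.e. the average subscale profile used to qualitatively label the cluster.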

To generate the clusters, a hierarchical approach was employed using SPSS. To determine the number of clusters, a range of cluster solutions was considered and evaluated based on whether each cluster represented a unique description of the data and had sufficient sample size. The results suggested a three-cluster solution for each of the four surveys. Following that, the centers from the three-cluster solution were used as a starting point for K-means clustering. The average score on each subscale for the resulting cluster solution in the L28 and Q28 data is presented in Fig. 2.
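Seeding K-means with the hierarchical cluster centers, as described above, can be sketched as follows. This is an illustrative reconstruction with simulated data, not the authors' SPSS procedure; `hier_centers` stands in for the three centers obtained from the hierarchical step.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.uniform(1, 5, size=(150, 7))  # stand-in: 7 subscale means per student

# stand-in for the three centers produced by the hierarchical step
hier_centers = np.array([X[:50].mean(axis=0),
                         X[50:100].mean(axis=0),
                         X[100:].mean(axis=0)])

# K-means refines the hierarchical solution; n_init=1 because the
# starting centers are supplied rather than randomly initialized
km = KMeans(n_clusters=3, init=hier_centers, n_init=1).fit(X)
labels = km.labels_
```

Seeding K-means this way avoids the sensitivity of K-means to random initialization while letting it refine the cluster boundaries found by the hierarchical pass.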


image file: d1rp00206f-f2.tif
Fig. 2 Averages for each Cluster from L28 and Q28 Survey Data.

The graphs in Fig. 2 are demarcated based on the subscales. Beginning with the L28 survey, the three clusters were labeled “Average all”, “High all” and “Amotivation” based on the values observed for each cluster across the set of items. The “Average all” label was chosen because the average responses to items from external regulation, introjected regulation and intrinsic motivation all feature considerable overlap within the 2.5 to 4 range, while the scores on the remaining metrics, identified regulation and amotivation, approximated the overall averages for identified regulation (3.52 to 4.15 in Fig. 1) and amotivation (1.48 to 1.90 in Fig. 1). In addition, the other L28 clusters were more pronounced on identified regulation and amotivation. Overall, this cluster represented a group of students, comprising 42% of the sample, whose responses matched the overall pattern observed across the items.

The second cluster was labeled “High all” and contains respondents who rated both the intrinsic motivation and the extrinsic motivation subscales higher than the average, and amotivation lower than the average. This group described 23% of the sample and rated the intrinsic and extrinsic motivation subscales greater than 4.00. The third cluster was labeled “Amotivation” owing to its higher rating of the amotivation subscale than the other clusters, combined with lower levels of both intrinsic and extrinsic motivation compared to the overall average and the other clusters. The consistency in the relative ranking of each cluster across the set of items indicates a limit in this survey methodology's ability to differentiate types of motivation among students and may be indicative of a response style bias.

In the Q28 survey data, the three clusters were labeled “Identified intrinsic”, “Identified external” and “Amotivation”. Students in the “Identified external” cluster rated external regulation (3.98) higher than the intrinsic motivation subscales (2.45 to 3.01) and 0.52 higher than the overall average on the external regulation subscale. These students also rated identified regulation comparably high (4.19), although Q28 students overall also tended to rate identified regulation high (3.95). This cluster represented nearly half (45%) of the sample. Students in the second cluster, “Identified intrinsic”, also rated identified regulation high (4.03) and rated all three intrinsic subscales (3.10 to 3.65) higher than the overall averages, with an average increase of 0.44. This cluster represented nearly a third of the sample at 34%. The third cluster, “Amotivation”, has higher scores on amotivation (3.25) compared to the overall average (1.62). Students in this cluster represented 21% of the sample and rated external regulation (3.71) higher than the overall average (3.46), rated introjected regulation close to the average (3.09) and rated the other four subscales below the overall average.

To test whether the rank-sort survey identified intrinsic versus extrinsic motivation accurately, the profiles were compared based on the test scores earned in the class. The theoretical basis predicts that students who exhibit intrinsic motivation will be more apt to persist in the face of challenges and ultimately perform better on a measure of academic performance such as a test. The relationships between students’ motivation profiles (clusters) and their exam scores across the semester are presented in Fig. 3. A test of normality was conducted with each of the test scores as a dependent variable for both surveys; the results indicated that none of the test scores was normally distributed. Therefore, a Kruskal–Wallis test (a non-parametric equivalent of ANOVA) was conducted to determine if the clusters performed significantly differently on each test. The results for the Q28 clusters showed that the groups of students performed significantly differently on each test (H ranged from 17.0 to 31.0, each p < 0.05). The Kruskal–Wallis test of the L28 clusters showed similar results on the omnibus test (H ranged from 10.8 to 16.4, each p < 0.05).
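The omnibus comparison of clusters on a test score uses the Kruskal–Wallis test, as described above. The sketch below illustrates the computation with scipy on simulated scores; the group sizes and means are hypothetical stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(3)
# stand-in percent-correct scores for three motivation clusters
amotivation = rng.normal(60, 12, 45)
identified_external = rng.normal(68, 12, 95)
identified_intrinsic = rng.normal(74, 12, 73)

# Kruskal-Wallis: a rank-based omnibus test that does not assume normality
H, p = kruskal(amotivation, identified_external, identified_intrinsic)
```

A significant H (p < 0.05) indicates that at least one cluster's score distribution differs from the others, which then motivates the pairwise follow-up comparisons.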


image file: d1rp00206f-f3.tif
Fig. 3 Average test percent correct for each cluster from L28 and Q28 survey data.

An examination of the graphs in Fig. 3 shows that both survey types show a clear departure in performance for the students whose responses were classified as “Amotivation”. The L28 clusters have “High all” consistently outperforming the “Average all” group, though the differences are at times small, including less than one percent difference on Test 2. To characterize the extent of the differences, follow-on pairwise comparisons were conducted using the Mann–Whitney test to determine effect sizes as the correlation coefficient r. Coefficient r values have been described as 0.1 representing a small effect size, 0.3 medium and 0.5 large (Cohen, 1988). The pairwise differences between “High all” and “Average all” range from r = 0.05 (Test 2) to 0.14 (Final Exam). The Q28 clusters showed the “Identified intrinsic” cluster consistently outperforming the “Identified external” cluster, matching the expectation from theory. The pairwise differences between “Identified intrinsic” and “Identified external” range from r = 0.18 (Final Exam) to 0.28 (Test 3), a range of effect sizes consistently larger than that between the L28 “High all” and “Average all” groups.
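The effect size r reported for the pairwise comparisons is conventionally obtained by converting the Mann–Whitney U statistic to a Z score and dividing by the square root of the combined sample size. A possible implementation, illustrative only and using the normal approximation without tie correction, is:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def mann_whitney_r(x, y):
    """Effect size r = |Z| / sqrt(N) from a two-sided Mann-Whitney test."""
    u, _ = mannwhitneyu(x, y, alternative="two-sided")
    n1, n2 = len(x), len(y)
    # normal approximation: convert U to a Z score (no tie correction)
    mu = n1 * n2 / 2
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return abs(z) / np.sqrt(n1 + n2)
```

Identical score distributions yield r near 0, while strongly separated distributions push r toward Cohen's large-effect benchmark of 0.5 and beyond.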

Comparison of L21 and Q21

To describe the data from the L21 and Q21 surveys, the item-level mean scores and percentage of respondents on each response choice for L21 and Q21 are presented in Fig. 4. As with the longer surveys, identified regulation and external regulation were rated highest, with introjected regulation and the three types of intrinsic motivation rated in the middle and amotivation rated lowest. The pattern of relative placement was consistent between the Likert-scale and rank-sort surveys, and the Spearman rank correlation observed was 0.94. For L21 items, the average skewness is 0.01 and the average kurtosis 0.36. For Q21 items, the average skewness is 0.31 and the average kurtosis is 0.57. In particular, the kurtosis values for the amotivation items in both surveys are larger (between 3.5 and 6.5), indicating the presence of outliers on these items relative to what is expected in a normal distribution. As seen before, with the Q21 survey fewer students selected the extreme response choices compared to L21.
image file: d1rp00206f-f4.tif
Fig. 4 Item-level mean scores and percentage of respondents on each response choice for L21 and Q21. TK = To know, TA = To accomplish, TE = To experience, IdR = Identified regulation, InR = Introjected regulation, ER = External regulation, A = Amotivation.

The clustering of the data from each short survey using the same method as before led to three clusters with similar group identifications as the longer surveys (e.g. L21 had similar clusters to L28). The averages for each cluster across the subscales are presented in Fig. 5. As before, the Likert-scale survey led to clusters with uniform relative ratings across the extrinsic and intrinsic subscales, while the rank-sort survey Q21 led to the differentiation of a group that rated the intrinsic motivation subscales higher and another group that rated the extrinsic motivation subscales higher, matching the expectations of SDT. Also as before, the rank-sort survey identified a smaller percentage of students within the amotivation group than the Likert-scale survey did. Further, the amotivation group is more clearly demarcated on the amotivation subscale with the rank-sort survey.


image file: d1rp00206f-f5.tif
Fig. 5 Averages for each cluster from L21 and Q21 survey data.

The clusters were then compared on test scores to determine if the groupings offered information relevant to student performance in the course. The test scores for each group are presented in Fig. 6. With the L21 survey, there was little delineation between the amotivation cluster and the average all cluster, which differs from the L28 survey data. The Q21 data follow the same relative pattern as the Q28 data, but the gaps are not as pronounced, particularly on Tests 1 and 3. As with L28 and Q28, a normality test was conducted on each test score; the results indicated that the test scores did not follow a normal distribution. The Kruskal–Wallis test of the L21 data showed the clusters differed significantly on Test 1, Test 3 and the Final Exam (H ranged from 7.2 to 15.7, each p < 0.05). The Kruskal–Wallis test of the Q21 data showed the clusters differed significantly on Test 1, Test 2 and the Final Exam (H ranged from 7.4 to 17.1, each p < 0.05).


image file: d1rp00206f-f6.tif
Fig. 6 Average test percent correct for each cluster from L21 and Q21 survey data.

The effect sizes in the L21 clusters between “High all” and “Average all” range from r = 0.14 (Test 1) to 0.29 (Final Exam). As can be seen in Fig. 6, though, the L21 survey data showed a negligible difference between the “Average all” and “Amotivation” clusters, with an effect size of r = 0.13 for Test 1 and then ranging from 0.0047 (Test 2) to 0.05 (Final Exam). The effect sizes in the Q21 clusters between “Identified intrinsic” and “Identified external” were negligible for Test 3 (r = 0.02) but ranged from 0.11 to 0.16 on the remaining measures. The Q21 data also differentiated the amotivation group from the others; this group differed from identified intrinsic with a range of effect sizes from r = 0.16 (Test 3) to 0.48 (Test 2).

Discussion

To synopsize the results, the use of Likert-scale versus rank-sort surveys for the AMS-Chemistry led to consistent relative rankings of the items in the instrument, as shown in Fig. 1 and 4. A cluster analysis to find patterns in student responses showed that the rank-sort survey responses can be grouped as identified intrinsic, identified external and amotivation, in line with the expectation from SDT. The inclusion of identified regulation in both the identified external and identified intrinsic clusters matches the description of identified regulation as a more autonomous form of extrinsic motivation that is a precursor to intrinsic motivation (Ryan and Deci, 2000); thus students can be thought of as part of a spectrum, with identified external closer to extrinsic motivation and identified intrinsic closer to intrinsic motivation. In contrast, the same approach applied to the Likert-scale survey data found groups that were consistent across both the extrinsic and intrinsic survey items and were labeled as high all, average all and amotivation, the latter scoring lower across the set of intrinsic and extrinsic items. These groups do not offer a clear delineation of student preferences for intrinsic versus extrinsic motivation and are difficult to reconcile with the theoretical basis for the instrument. Instead, these groups may be representative of response style bias; for example, some students in the high all group may have tended to pick response options indicating agreement. It should also be noted that the cluster analysis for each survey methodology led to consistent clustering between the longer and shorter surveys, thereby supporting the reliability of the clustering results.

Finally, an expectation of SDT is that students with intrinsic motivation would perform better on measures of academic performance, an expectation that has been supported in the research literature. The clustering in the rank-sort survey data matched this expectation, with students grouped as identified intrinsic consistently outperforming the students grouped as identified external or amotivation on the chemistry tests given throughout the semester. The difference between identified intrinsic and identified external had effect sizes ranging from 0.11 to 0.28 (small to medium effect sizes), with one exception in the Q21 survey data. In contrast, the Likert-scale data was less consistent: the high all and average all clusters featured small differences (r = 0.05 to 0.14) in the L28 data, while the average all and amotivation clusters featured no discernible differences in the L21 data (r = 0 to 0.05). Further, it was not clear which group in the Likert-scale data would be expected from theory to experience greater academic success.

As a comparison of survey methodological techniques, it is important to review the limitations of this study as it concerns survey methodology recommendations. It is possible that the rank-sort survey process, and the limit on the number of items per category, led to a situation where some students had to rank items differently than their preference if a particular category was filled. Alternatively, the novelty of the survey technique to students may have also led to a misassignment of category. If present, either of these concerns would serve as a threat to the validity of the responses obtained from the rank-sort survey approach. Future research that examines students’ response processes when engaging with a rank-sort survey could explore the plausibility of these concerns. Second, the results presented describe the methodology applied to one survey at one research setting. It is unknown what effect the distribution of response options per category in the rank-sort survey had on the responses, as other distributions that approximate a normal distribution may lead to different outcomes. Studies examining the rank-sort methodology with other distributions, other surveys or at other settings may find different results. Future research is needed to determine if a rank-sort survey approach would offer benefits over Likert-scale surveys with other instruments. A review of survey designs can identify surveys in chemistry education research that may benefit from a rank-sort approach.

The rank-sort survey methodology relies on prompting students to compare the relative placement of the items given in the survey and, as a result, is best suited to surveys that have multiple dimensions (each measured by a subscale of items within the survey) which require differentiation. The AMS-Chemistry was an example of such a survey, with a dimension of intrinsic motivation and a set of dimensions related to varying degrees of extrinsic motivation. Key to this survey type is that, according to theory, these dimensions are mutually exclusive and not supposed to correlate or correspond. Another example of a potential survey is the aforementioned Achievement Goal Questionnaire. This survey was designed to measure students’ goal orientation with three dimensions: task, self and other. Dweck and Leggett (1988) proposed that individuals could be performance-oriented (measured by ‘other’) or mastery-oriented (measured by ‘task’ and ‘self’). Adapting this survey for a rank-sort approach would promote students comparing and ranking the items, which can lead to better differentiation among the task, self and other dimensions. This differentiation would be better aligned with theory than students offering similar scores across the dimensions, as has been found with a Likert-scale instrument in earlier research (Lewis, 2018). Another potential use can be found in Habig and colleagues’ (2018) work, where they sought to measure the impact of context-based learning on students’ situational interest. In that study, a Likert-scale survey developed by Schiefele (1991) was used to measure situational interest across two dimensions: feeling-related and value-related. Feeling-related is defined as “positive emotions caused by stimuli of the object of interest” and value-related is defined as “the personally perceived meaningfulness of an object”.
These two dimensions are considered conflicting and could potentially be better measured with a rank-sort survey. Additionally, Lichtenfeld and colleagues (2012) developed a survey to measure students’ achievement emotions based on the control-value theory of achievement emotions (Pekrun, 2006). This survey focused on examining three achievement emotion dimensions: enjoyment, anxiety, and boredom. According to the theory, there is an expected contrast between the positive dimension of enjoyment and the negative dimensions of anxiety and boredom. Finally, the revised study process questionnaire (Biggs et al., 2001) describes two dimensions of study processes: deep and surface. These dimensions are expected to contrast, with a deep study process characterized by a motivation to understand and a surface study process by a narrow focus on the content. Students’ study processes are expected to span a continuum between these contrasting dimensions, and thus a rank-sort approach may be beneficial.

Other survey types may be less amenable to the rank-sort approach. Unidimensional surveys, where the set of items is meant to measure one common dimension, would not benefit from the rank-sort methodology because ranking the items prevents the commonality of item ratings expected across a single dimension. For example, Geban et al. (1994) developed a Likert-scale survey to measure students’ attitudes towards chemistry; students’ attitudes were treated as a unidimensional construct and measured by 15 items on a 5-point Likert-type scale. In addition, a rank-sort survey may not be ideal for measuring a multidimensional construct when the dimensions within the instrument are expected to correlate with each other. For example, Hosbein and Barbera (2020) developed a Likert-scale survey to measure students’ chemistry identity, a construct with five dimensions: mindset, situational interest, verbal persuasion, vicarious experiences, and mastery experiences. Each of these dimensions is aligned with a common metric of students’ identity, and thus a rank-sort methodology, which promotes differentiating scores across dimensions, would not match the intent of these dimensions sharing a commonality in measuring students’ identity.

Ultimately, this work seeks to advance knowledge on how to improve the accuracy of measuring students’ affect. The measurement of students’ affect can serve multiple roles in an education setting. Improving students’ affect represents an important instructional goal, as instruction should serve not only to develop students’ knowledge and skills but also to develop students’ appreciation for a discipline (Wang et al., 2021). The measurement of students’ affect can determine whether students’ affect is an area of concern within a class setting, or it can be used to evaluate the impact of a learning intervention, or of instruction as normal, on students’ affect. Additionally, Flaherty (2020) calls for the advancement of interventions designed purposefully to improve students’ affect, which will necessarily rely on measures of students’ affect for evaluation. By improving the methods by which students’ affect is measured, each of these goals can be advanced.

Conclusion

Response style bias, the default preference for a particular response option, has been a concern with data collected through Likert-scale surveys. Little work in chemistry education research has evaluated alternative survey techniques. This study investigated the utility of a rank-sort survey methodology by comparing data collected from a rank-sort survey and a Likert-scale survey, each using identical statements from an instrument previously used in chemistry education research. The results show that data collected from the rank-sort survey are better aligned with self-determination theory. Further, the rank-sort survey responses related to students’ academic performance in the course in line with theoretical expectations. Because a rank-sort survey limits the number of statements that can be assigned to each agreement level, it can be particularly useful for surveys that include multiple dimensions which are expected to contrast. Future work will be needed to explore the potential utility of a rank-sort survey methodology in measuring other dimensions of students’ affect in chemistry education research.

Conflicts of interest

There are no conflicts of interest to declare.

Appendix

The appendix presents a copy of the rank-sort survey instrument (Q28) as it appeared to the students in this study in Fig. 7. Following the instrument, the results from the by-item analysis of the same data set are presented.
Fig. 7 Interface of the rank-sort survey.

With the by-item analysis, the cluster analysis was conducted following the same procedures as in the original methods. Cluster analysis is typically performed on subscales generated from factor analysis (e.g. external regulation, identified regulation); however, it may also be conducted on students’ responses to the set of individual items (Ryan and Huyton, 2000). The cluster analyses identified three clusters each in the L28 and Q28 survey data, and the cluster averages on each item are presented in Fig. 8.
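As an illustration of the by-item approach, the sketch below clusters simulated item-level responses and computes per-cluster item averages of the kind plotted in Fig. 8. This is a generic k-means sketch on fabricated data, not the study’s actual procedure or data; the sample size, item count and rank scale are assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Fabricated by-item data: rows are students, columns are the 28
# survey items; entries stand in for ranks or ratings.
rng = np.random.default_rng(0)
responses = rng.integers(-3, 4, size=(120, 28)).astype(float)

# Standardise each item before clustering so no item dominates.
z = (responses - responses.mean(axis=0)) / responses.std(axis=0)

# Three clusters, mirroring the three profiles found in the study.
centroids, labels = kmeans2(z, 3, minit='++', seed=1)

# Per-cluster item averages on the raw scale, one row per cluster.
profiles = np.array([responses[labels == k].mean(axis=0) for k in range(3)])
```

Plotting each row of `profiles` against the item list reproduces the kind of cluster-average display shown in Fig. 8.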


Fig. 8 Averages for each cluster from L28 and Q28 by-item survey data.

Reviewing the cluster averages for each item showed that the clusters from the by-item data qualitatively matched the clusters generated with the subscale data. In the L28 survey data, the clusters were identified as “High all”, “Average all” and “Amotivation”, with the same pattern as the subscale data. The Q28 cluster analysis of the by-item data generated clusters similar to those from the subscale data, labeled “Identified intrinsic”, “Identified external” and “Amotivation”.

Next, the clusters from the by-item data were compared based on test scores (Fig. 9). With the L28 data, the Kruskal–Wallis test showed significant differences among the three clusters on each test, with H values ranging from 11.6 to 18.7 and each analysis significant at a 0.05 Type I error threshold. For the Q28 data, the H values ranged from 17.2 to 27.5, also with each analysis significant at the 0.05 threshold. In terms of pairwise comparisons, the amotivation cluster remained below each of the other clusters. In the L28 data, the “High all” and “Average all” clusters had effect sizes ranging from r = 0.04 to 0.16, while the Q28 clusters “Identified intrinsic” and “Identified external” ranged from r = 0.18 to 0.31. As with the subscale data, the Q28 data showed consistently larger effect sizes than the L28 data with the by-item approach.
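The analysis above — an omnibus Kruskal–Wallis test followed by pairwise effect sizes — can be sketched as follows. The test-score data are fabricated, and the effect size is computed as r = |Z|/√N using the normal approximation to the Mann–Whitney U statistic, one common convention; the paper does not specify its exact computation, so this is an assumption.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

# Fabricated test scores for three motivation clusters.
rng = np.random.default_rng(42)
amotivation = rng.normal(60, 10, 40)
identified_external = rng.normal(68, 10, 50)
identified_intrinsic = rng.normal(72, 10, 45)

# Omnibus comparison: Kruskal-Wallis H across the three clusters.
H, p = kruskal(amotivation, identified_external, identified_intrinsic)

def effect_size_r(a, b):
    """Pairwise effect size r = |Z| / sqrt(N), with Z from the normal
    approximation to the Mann-Whitney U statistic."""
    n1, n2 = len(a), len(b)
    U, _ = mannwhitneyu(a, b, alternative='two-sided')
    mu = n1 * n2 / 2
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs((U - mu) / sigma) / np.sqrt(n1 + n2)

r = effect_size_r(identified_intrinsic, identified_external)
```

A significant H value licenses the pairwise follow-up; r values near 0.1, 0.3 and 0.5 are conventionally read as small, medium and large effects (Cohen, 1988).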


Fig. 9 Average test scores of the by-item clusters from L28 and Q28 survey data.

The cluster analysis and test score comparison were also conducted for the shorter surveys using the by-item approach. The cluster analyses of Q21 and L21 using the by-item data are presented in Fig. 10 and led to the same qualitative descriptions of the clusters, with L21 having “High all”, “Average all” and “Amotivation” and Q21 having “Identified intrinsic”, “Identified external” and “Amotivation”. The comparison of these clusters by test scores is presented in Fig. 11, and the relationship appears similar to what was observed with the subscale data in Fig. 6. In the L21 data, the Kruskal–Wallis statistic was significant for Test 1, Test 3 and the Final Exam (H ranged from 7.7 to 17.7) at the 0.05 threshold. For the Q21 data, the comparison was significant for Test 2 and the Final Exam (H values of 8.0 and 11.9); unlike with the subscale data, Test 1 was not significantly different among the clusters with the by-item data on this survey. In pairwise comparisons with the L21 data, “High all” and “Average all” had effect sizes ranging from r = 0.14 to 0.31, while “Average all” and “Amotivation” had effect sizes ranging from 0.00028 to 0.13, again indicating the proximity of these two groups in the L21 data. In the Q21 data, “Identified intrinsic” and “Identified external” had effect sizes ranging from 0.02 to 0.16, and “Identified intrinsic” and “Amotivation” had effect sizes ranging from 0.10 to 0.40.


Fig. 10 Averages for each cluster from L21 and Q21 by-item survey data.

Fig. 11 Average test scores of the by-item clusters from L21 and Q21 survey data.

Acknowledgements

The authors wish to acknowledge the students who took the survey, the instructors who facilitated administering the survey, and the authors of the AMS-Chemistry, without which this work could not have been conducted.

References

  1. Assor A., Kaplan H., Feinberg O. and Tal K., (2009), Combining vision with voice: A learning and implementation structure promoting teachers' internalization of practices based on self-determination theory, Theory. Res. Educ., 7, 234–243.
  2. Austin A. C., Hammond N. B., Barrows N., Gould D. L. and Gould I. R., (2018), Relating motivation and student outcomes in general organic chemistry, Chem. Educ. Res. Pract., 19, 331–341.
  3. Baker R., Wildman J., Mason H. and Donaldson C., (2014), Q-ing for health – A new approach to eliciting the public's views on health care resource allocation, Health Econ., 23, 283–297.
  4. Biggs J., Kember D. and Leung D. Y., (2001), The revised two-factor study process questionnaire: R-SPQ-2F, Br. J. Educ. Psychol., 71, 133–149.
  5. Black A. E. and Deci E. L., (2000), The effects of instructors' autonomy support and students' autonomous motivation on learning organic chemistry: A self-determination theory perspective, Sci. Educ., 84, 740–756.
  6. Brandriet A. R., Ward R. M. and Bretz S. L., (2013), Modeling meaningful learning in chemistry using structural equation modeling, Chem. Educ. Res. Pract., 14, 421–430.
  7. Brotherton P. N. and Preece P. F. W., (1996), Teaching science process skills, Int. J. Sci. Educ., 18, 65–74.
  8. Brown S. R., (1996), Q methodology and qualitative research, Qual. Health Res., 6, 561–567.
  9. Chan J. Y. K. and Bauer C. F., (2014), Identifying at-risk students in general chemistry via cluster analysis of affective characteristics, J. Chem. Educ., 91, 1417–1425.
  10. Clatworthy J., Buick D., Hankins M., Weinman J. and Horne R., (2005), The use and reporting of cluster analysis in health psychology: A review, Br. J. Health Psychol., 10, 329–358.
  11. Cohen J., (1988), Statistical Power Analysis for the Behavioral Sciences, Hillsdale: Lawrence Erlbaum Associates.
  12. Deci E. L. and Ryan R. M., (1985), The general causality orientations scale – self-determination in personality, J. Res. Pers., 19, 109–134.
  13. Deci E. L. and Ryan R. M., (1987), The support of autonomy and the control of behavior, J. Pers. Soc. Psychol., 53, 1024–1037.
  14. Deci E. L. and Ryan R. M., (2000), The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior, Psychol. Inq., 11, 227–268.
  15. Deci E. L., Olafsen A. H. and Ryan R. M., (2017), Self-determination theory in work organizations: The state of a science, Annu. Rev. Organ. Psychol. Organ. Behav., 4, 19–43.
  16. Deci E. L., Vallerand R. J., Pelletier L. G. and Ryan R. M., (1991), Motivation and education – the self-determination perspective, Educ. Psychol., 26, 325–346.
  17. Donald J. N., Bradshaw E. L., Ryan R. M., Basarkod G., Ciarrochi J., Duineveld J. J., Guo J. and Sahdra B. K., (2020), Mindfulness and its association with varied types of motivation: A systematic review and meta-analysis using self-determination theory, Pers. Soc. Psychol. Bull., 46, 1121–1138.
  18. Dweck C. S. and Leggett E. L., (1988), A social-cognitive approach to motivation and personality, Psychol. Rev., 95(2), 256.
  19. Eyvindson K., Kangas A., Hujala T. and Leskinen P., (2014), Likert versus Q-approaches in survey methodologies: discrepancies in results with same respondents, Qual. Quant., 49, 509–522.
  20. Flaherty A. A., (2020), A review of affective chemistry education research and its implications for future research, Chem. Educ. Res. Pract., 21, 698–713.
  21. Geban Ö., Ertepınar H., Yılmaz G., Altın A. and Şahbaz F., (1994), Bilgisayar destekli eğitimin öğrencilerin fen bilgisi başarılarına ve fen bilgisi ilgilerine etkisi [The effect of computer-assisted instruction on students' science achievement and interest in science], Ulusal Fen Bilimleri Eğitimi Sempozyumu, 1–2.
  22. Gibbons R. E., Xu X., Villafañe S. M. and Raker J. R., (2018), Testing a reciprocal causation model between anxiety, enjoyment and academic performance in postsecondary organic chemistry, Educ. Psychol., 38, 838–856.
  23. Goldman Z. W., Goodboy A. K. and Weber K., (2016), College students’ psychological needs and intrinsic motivation to learn: An examination of self-determination theory, Commun. Q., 65, 167–191.
  24. Habig S., Blankenburg J., van Vorst H., Fechner S., Parchmann I. and Sumfleth E., (2018), Context characteristics and their effects on students’ situational interest in chemistry, Int. J. Sci. Educ, 40, 1154–1175.
  25. Hosbein K. N. and Barbera J., (2020), Development and evaluation of novel science and chemistry identity measures, Chem. Educ. Res. Pract., 21, 852–877.
  26. Kortlever J. T., Janssen S. J., van Berckel M. M., Ring D. and Vranceanu A. M., (2015), What is the most useful questionnaire for measurement of coping strategies in response to nociception?, Clin. Orthop. Relat. Res., 473, 3511–3518.
  27. Kotul'áková K., (2020), Identifying beliefs held by preservice chemistry teachers in order to improve instruction during their teaching courses, Chem. Educ. Res. Pract., 21, 730–748.
  28. Lewis S. E., (2011), Retention and reform: An evaluation of peer-led team learning, J. Chem. Educ., 88, 703–707.
  29. Lewis S. E., (2018), Goal orientations of general chemistry students via the achievement goal framework, Chem. Educ. Res. Pract., 19, 199–212.
  30. Lichtenfeld S., Pekrun R., Stupnisky R. H., Reiss K. and Murayama K., (2012), Measuring students' emotions in the early years: The achievement emotions questionnaire-elementary school (AEQ-ES), Learn Individ. Differ., 22, 190–201.
  31. Liu Y. J., Ferrell B., Barbera J. and Lewis J. E., (2017), Development and evaluation of a chemistry-specific version of the academic motivation scale (AMS-Chemistry), Chem. Educ. Res. Pract., 18, 191–213.
  32. Liu Y. J., Raker J. R. and Lewis J. E., (2018), Evaluating student motivation in organic chemistry courses: moving from a lecture-based to a flipped approach with peer-led team learning, Chem. Educ. Res. Pract., 19, 251–264.
  33. McFate C. and Olmsted III J., (1999), Assessing student preparation through placement tests, J. Chem. Educ., 76, 562–565.
  34. McKeown B. and Thomas D., (2013), Q Methodology.
  35. Moors G., (2008), Exploring the effect of a middle response category on response style in attitude measurement, Qual. Quant., 42, 779–794.
  36. Niemiec C. P. and Ryan R. M., (2009), Autonomy, competence, and relatedness in the classroom Applying self-determination theory to educational practice, Theory. Res. Educ., 7, 133–144.
  37. Nieswandt M., (2007), Student affect and conceptual understanding in learning chemistry, J. Res. Sci. Teach., 44, 908–937.
  38. Park M. and Wu A. D., (2019), Item response tree models to investigate acquiescence and extreme response styles in Likert-type rating scales, Educ. Psychol. Meas., 79, 911–930.
  39. Pekrun R., (2006), The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice, Educ. Psychol. Rev., 18, 315–341.
  40. Pintrich P. R., (2003), A motivational science perspective on the role of student motivation in learning and teaching contexts, J. Educ. Psychol., 95, 667–686.
  41. Ramlo S., (2016), Mixed method lessons learned from 80 years of Q methodology, J. Mix. Methods Res., 10, 28–45.
  42. Ryan R. M. and Deci E. L., (2000), Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being, Am. Psychol., 55, 68–78.
  43. Ryan C. and Huyton J., (2000), Who is interested in aboriginal tourism in the Northern Territory, Australia? A cluster analysis, J. Sustain. Tour., 8, 53–88.
  44. Schiefele U., (1991), Interest, learning, and motivation, Educ. Psychol., 26, 299–323.
  45. Taylor G., Jungert T., Mageau G. A., Schattke K., Dedic H., Rosenfield S. and Koestner R., (2014), A self-determination theory approach to predicting school achievement over time: the unique role of intrinsic motivation, Contemp. Educ. Psychol., 39, 342–358.
  46. ten Klooster P. M., Visser M. and de Jong M. D. T., (2008), Comparing two image research instruments: The Q-sort method versus the Likert attitude questionnaire, Food Qual. Prefer., 19, 511–518.
  47. Thompson A. W., Dumyahn S., Prokopy L. S., Amberg S., Baumgart-Getz A., Jackson-Tyree J., Perry-Hill R., Reimer A., Robinson K. and Mase A. S., (2012), Comparing random sample Q and R methods for understanding natural resource attitudes, Field Methods, 25, 25–46.
  48. Vansteenkiste M., Simons J., Lens W., Sheldon K. M. and Deci E. L., (2004), Motivating learning, performance, and persistence: The synergistic effects of intrinsic goal contents and autonomy-supportive contexts, J. Pers. Soc. Psychol., 87, 246–260.
  49. Van Vaerenbergh Y. and Thomas T. D., (2012), Response styles in survey research: A literature review of antecedents, consequences, and remedies, Int. J. Public Opin. Res., 25, 195–217.
  50. Villafañe S. M., Xu X. and Raker J. R., (2016), Self-efficacy and academic performance in first-semester organic chemistry: Testing a model of reciprocal causation, Chem. Educ. Res. Pract., 17, 973–984.
  51. Wang Y., Rocabado G., Lewis J. E. and Lewis S. E., (2021), Prompts to promote success: Evaluating utility value and growth mindset interventions on general chemistry students’ attitude and academic performance, J. Chem. Educ., 98, 1476–1488.
  52. Webler T., Tuler S. and Krueger R., (2001), What is a good public participation process? Five perspectives from the public, Environ. Manag., 27, 435–450.

This journal is © The Royal Society of Chemistry 2022