Analyzing explanations of substitution reactions using lexical analysis and logistic regression techniques

Amber J. Dood , John C. Dood , Daniel Cruz-Ramírez de Arellano , Kimberly B. Fields and Jeffrey R. Raker *
Department of Chemistry, University of South Florida, Tampa, Florida, USA.

Received 8th July 2019 , Accepted 1st October 2019

First published on 16th October 2019

Assessments that aim to evaluate student understanding of chemical reactions and reaction mechanisms should ask students to construct written or oral explanations of mechanistic representations; students can reproduce pictorial mechanism representations with minimal understanding of the meaning of the representations. Grading such assessments is time-consuming, which is a limitation for use in large-enrollment courses and for timely feedback for students. Lexical analysis and logistic regression techniques can be used to evaluate student written responses in STEM courses. In this study, we use lexical analysis and logistic regression techniques to score a constructed-response item which aims to evaluate student explanations about what is happening in a unimolecular nucleophilic substitution (i.e., SN1) reaction and why. We identify three levels of student explanation sophistication (i.e., descriptive only, surface level why, and deeper why), and qualitatively describe student reasoning about four main aspects of the reaction: leaving group, carbocation, nucleophile and electrophile, and acid–base proton transfer. Responses scored as Level 1 (N = 113, 11%) include only a description of what is happening in the reaction and do not address the why for any of the four aspects. Level 2 responses (N = 549, 53%) describe why the reaction is occurring at a surface level (i.e., using solely explicit features or mentioning implicit features without deeper explanation) for at least one aspect of the reaction. Level 3 responses (N = 379, 36%) explain the why at a deeper level by inferring implicit features from explicit features explained using electronic effects for at least one reaction aspect. We evaluate the predictive accuracy of two binomial logistic regression models for scoring the responses with these levels, achieving 86.9% accuracy (with the testing data set) when compared to human coding. 
The lexical analysis methodology and emergent scoring framework could be used as a foundation from which to develop scoring models for a broader array of reaction mechanisms.


Research has shown that students can draw reaction mechanisms (i.e., the stepwise process of the movement of electrons over the course of a reaction) without understanding the pictorial representation of a reaction mechanism (e.g., Bhattacharyya and Bodner, 2005; Ferguson and Bodner, 2008; Grove et al., 2012); this body of research has relied on think-aloud interviews and constructed-response assessments to identify the observed lack of understanding. Combined, the work demonstrates that explanations are a necessary complement to mechanistic pictures in order to provide stronger evidence of understanding of reaction mechanisms (cf., Becker et al., 2016; Crandell et al., 2018). Despite their importance, use of constructed-response items during instruction and on formative assessments is limited due to time constraints of grading written responses and lack of automated scoring tools.

Lexical analysis is a solution to scoring written responses to constructed-response items including written explanations of reaction mechanisms. Through lexical analysis and resultant predictive models, computers are trained to code written responses to constructed-response assessments for themes and overall response levels of sophistication or correctness. Such models have been used in STEM education to formatively assess student understanding of phenomena such as acid–base chemistry (e.g., Haudek et al., 2012; Dood et al., 2018), the central dogma of biology (e.g., Prevost et al., 2016), and thermodynamics (e.g., Prevost et al., 2012). Results of computer-scored written responses provide data from which educators can tailor future instruction including remediation or forgo further discussion of topics that a majority of students already understand.

In this study, we describe the use of lexical analysis and the application of logistic regression model techniques to the scoring of a constructed-response assessment item aimed to evaluate student explanations of a substitution reaction, one of several reaction types taught in introductory organic chemistry courses. We identify three levels of explanation sophistication in student responses to what happens in the reaction and why the chemical reaction occurs mechanistically. We evaluate the predictive accuracy of a pair of binomial logistic regression models that, when combined, produce our three-level scoring of assessment responses. Lastly, we offer methods for incorporating the assessment item and resultant computer-scoring model as a formative assessment in chemistry instruction.

Student understanding of organic chemistry mechanisms

Organic chemistry mechanisms are represented by curved arrows that signify the movement of electrons from areas of high electron density to areas of low electron density (Kermack and Robinson, 1922; Bhattacharyya, 2013). Practicing organic chemists use mechanisms as tools to predict the products of reactions. Research has shown that students do not view mechanisms as tools (e.g., Bhattacharyya and Bodner, 2005; Grove et al., 2012). Instead, students memorize the pictorial representations without connecting chemical theory to the lines, letters, and symbols; they then reproduce the mechanistic pictures to earn points on assessments, where, for example, understanding of mechanisms is frequently assessed solely through reproduction of pictorial representations. Grove et al. (2012) found that students drew the chemical structures of mechanistic steps from memory and then added in arrows at the end, as if decorating the structures with arrows, instead of using mechanisms as a communicative and predictive tool. Bhattacharyya and Bodner (2005), as well, found that students were able to successfully reproduce organic chemistry mechanisms, but did not understand the chemical concepts behind the mechanisms.

Student performance in organic chemistry courses has been correlated with type of mechanistic explanation (Cooper et al., 2016; Crandell et al., 2018; Dood et al., 2018). Cooper et al. (2016) and Crandell et al. (2018) found that students who provided causal mechanistic reasoning in response to an assessment item were more likely to successfully produce the mechanistic representation. A similar constructed-response item was used in our previous work (Dood et al., 2018) which found that Lewis acid–base model use in mechanistic explanations of an acid–base proton-transfer reaction was associated with higher performance on acid–base examination items and overall exam performance.

Explanations and descriptions of pictorial representations have been shown to uncover understanding and misunderstanding of organic chemistry and related topics where merely asking students to produce representations fails to do so (Bhattacharyya and Bodner, 2005; Strickland et al., 2010; Grove et al., 2012; Cruz-Ramírez de Arellano and Towns, 2014; Anzovino and Bretz, 2015, 2016; Cooper et al., 2016; Bhattacharyya and Harris, 2018; Dood et al., 2018; Popova and Bretz, 2018). Even entry-level chemistry graduate students intending to study organic chemistry lacked the understanding of mechanisms that instructors expect of undergraduate organic chemistry students (Strickland et al., 2010). Constructed-response assessments provide a means to evaluate understanding of reaction mechanisms via student explanations about provided pictorial drawings, or coupled with assessments that have students generate their own pictorial drawings.

Reasoning in chemistry

Students and instructors use multiple types of reasoning when considering chemistry-related concepts (e.g., teleological, anthropomorphic, mechanistic, causal, causal mechanistic). Teleological reasoning implies purpose based on a certain goal (Wright, 1972, 1976; Tamir and Zohar, 1991; Talanquer, 2007). A common example of teleological reasoning is that atoms react the way they do in order to fulfill the octet rule (Talanquer, 2007). Caspari et al. (2018b) asked students to propose mechanistic arrows for organic chemistry reactions and explain why they proposed each step. Teleological reasoning was observed, with some students claiming that a mechanistic step occurred in order to make the next step possible. Similarly, students in a study conducted by Bhattacharyya and Bodner (2005) reasoned that mechanistic steps occurred in order to get to the product. Anthropomorphic reasoning, frequently used in conjunction with teleological reasoning, attributes human-like characteristics to non-human entities (e.g., “the atom wants to have a full octet”). Anthropomorphic reasoning can be constructive in cases where teachers and students are aware that anthropomorphism is being used metaphorically (Lemke, 1990; Taber and Watts, 1996). However, anthropomorphic and teleological reasoning become problematic when students think entities such as atoms and electrons actually want and need. Talanquer (2013) stated that anthropomorphic and teleological explanations provide people with a false sense of understanding, leading them not to seek out deeper understanding of phenomena because they have confused their superficial understanding with deeper understanding. Talanquer (2013) also called teleological explanations “a cognitively cheap way of satisfying a need for explanation without having to engage with more complex mechanistic reasoning” (p. 1423) and argued that students in chemistry classrooms should be given the opportunity to develop and apply mechanistic and causal explanations of chemical phenomena.

In order to help students develop reasoning beyond teleological reasoning, instruction should focus on how and why chemical processes occur at mechanistic and causal levels. According to Talanquer (2018), mechanistic reasoning “invoke[s] the existence of entities (e.g., atoms, molecules) whose properties, interactions, activities, and organization are responsible for the behaviors we observe” (p. 1906). Teaching mechanistic reasoning in the classroom is a critical part of instruction, as it describes “how the particular components of a system give rise to its behavior” (Russ et al., 2008, p. 504). In addition to mechanistic reasoning, instructional activities should encourage students to develop reasoning that explains the process of cause and effect (i.e., causal reasoning; Koslowski, 1996; Schauble, 1996; Sperber et al., 1996; Zimmerman, 2000; Abrams and Southerland, 2001). In the context of chemistry, Ferguson and Bodner (2008) define chemical mechanism as how reactions happen and chemical causality as why reactions happen, arguing that chemical causality is necessary to demonstrate full understanding of reactivity. An example from a framework presented by Crandell et al. (2018) describes causal reasoning/explanations in the context of acid–base reactions as discussing the electrostatic interaction between species, while mechanistic reasoning/explanations only describe the movement of electrons. Causal mechanistic reasoning combines aspects of causal and mechanistic reasoning. In the context of organic chemistry, Crandell et al. (2018) defined causal mechanistic reasoning as “a type of explanation of a phenomenon that identifies the causal factors and the activities of the underlying entities (electrons) to provide a stepwise account of the phenomenon from start to finish” (p. 214). Crandell et al. (2018) argued that asking students to engage in causal mechanistic reasoning is beneficial, as the act “requires that students reflect on and connect the sequence of events underlying a phenomenon and the causal drivers involved” (p. 215), therefore facilitating learning. The idea that causal mechanistic reasoning should promote learning is supported by the results of Cooper et al. (2016), who found that students who engaged in causal mechanistic reasoning were more likely to successfully produce mechanistic arrows. Therefore, instruction in organic chemistry should emphasize the mechanistic and causal reasoning behind chemical processes, including asking students to explain such processes in assessment contexts.

Reasoning about substitution reactions

Studies have explored student understanding of substitution reactions and aspects of substitution reactions (Cruz-Ramírez de Arellano and Towns, 2014; Anzovino and Bretz, 2015, 2016; Caspari et al., 2018a; Popova and Bretz, 2018; Bodé et al., 2019). An interview study by Cruz-Ramírez de Arellano and Towns (2014) considered student understanding of alkyl halide reactions similar to the SN1 reaction presented in this study. Students were asked to solve organic chemistry mechanism problems requiring them to predict the products of alkyl halide reactions. An “expert argumentation scheme” included warrants such as: classifying methanol as a weak nucleophile; knowing that chlorine will act as a leaving group and create a carbocation intermediate; and that “oxygen in the substitution product will lose its proton through acid–base chemistry with another molecule of methanol” (p. 505).

The results of these studies indicate that student reasoning about aspects of substitution reactions remains at a surface level. Students memorize which leaving groups are good, but are unable to describe why in a meaningful way, invoking the octet rule and charge to size ratio to reason about leaving group stability, and electronegativity and electron pulling as reasons for halides being good leaving groups (Caspari et al., 2018a; Popova and Bretz, 2018). Understanding of carbocation stability is limited to the number of substituents of the carbocation, again indicating reasoning that is limited to explicit features (Caspari et al., 2018a; Bodé et al., 2019). Studies on students’ understanding of nucleophiles and electrophiles have shown evidence of heavy reliance on explicit reaction features, with students using structural cues such as charges and mechanistic arrows to identify nucleophiles and electrophiles (Anzovino and Bretz, 2015, 2016).

Levels of reasoning in organic chemistry

Levels of reasoning have been used to describe student understanding of reaction mechanisms (Sevian and Talanquer, 2014; Caspari et al., 2018a; Bodé et al., 2019). Caspari et al. (2018a) asked students to explain which of two mechanistic steps had the lowest activation energy. Based on student explanations, Caspari et al. (2018a) defined levels of complexity of relations that included low complexity, middle complexity, and high complexity (Table 1). Bodé et al. (2019) presented students with mechanisms and reaction coordinate diagrams for two similar reactions and asked which reaction was most likely to proceed and why. Using the Chemical Thinking Learning Progression (CTLP) developed by Sevian and Talanquer (2014) as a framework, Bodé et al. (2019) scored student responses as descriptive, relational, linear causal, and multicomponent causal (Table 2). Most student responses (63%) fell into the linear causal category. Although multicomponent causal was considered ideal, no student responses fell into that category. Caspari et al. (2018a) and Bodé et al. (2019) included the theme of explicit versus implicit features in their scoring schemes; ideally, explanations would include implicit features inferred from explicit features. The CTLP, as used by Bodé et al. (2019), considered non-causal versus causal reasoning, with two lower levels for non-causal reasoning and two more advanced levels for causal reasoning. These studies elicited and described levels of students’ chemical reasoning using constructed-response items. As the construction of explanations in science is an important tool for learning, other studies have specifically addressed the nature of assessment prompts to promote and elicit student reasoning.
Table 1 Scoring scheme used by Caspari et al. (2018a) to analyze student understanding of reaction mechanisms
Relations with low complexity: explicit structural differences or non-electronic effects used as a cause for change.
Relations with middle complexity: implicit structural properties or non-electronic effects used as a cause for change.
Relations with high complexity: implicit structural differences used to describe electronic effects on change.

Table 2 Scoring scheme used by Bodé et al. (2019) which was adapted from the CTLP developed by Sevian and Talanquer (2014)
Descriptive: describing properties of reaction materials, explicit (surface) features of problem described.
Relational: explicit and implicit properties discussed; connections made but reasoning does not get to the why.
Linear causal: cause-and-effect relationships (i.e., the why) are discussed for single variables using explicit and implicit properties of the reaction.
Multicomponent causal: multiple variables considered using explicit and implicit features of the reaction to describe cause-and-effect relationships.

Constructed-response items to elicit chemical reasoning

Purposefully developed constructed-response items can elicit and promote student understanding of scientific representations and concepts. The Framework for K-12 Science Education names constructing explanations and engaging in argument from evidence as two of eight practices for science classrooms (National Research Council, 2011) that help develop understanding. The Framework states that scientific explanations “explain observed relationships between variables and describe the mechanisms that support cause and effect inferences about them” (p. 67). Becker et al. (2016) stated that “engaging students in the construction of scientific explanations affords them an opportunity to engage (in a scaffolded form) in the practices of science, that is, the activities that scientists engage in as they investigate the world” (p. 1714). When the proper scaffolding is provided, constructed-response items can be used to elicit student reasoning in chemistry and to promote student construction of knowledge through participation in the development of scientific explanations.

Constructed-response items are useful for eliciting student explanations when properly developed. Recommendations from previous work to better elicit student explanations included providing students with multiple representations (Becker et al., 2016; Crandell et al., 2018) and specifically asking students both what is happening and why (Cooper et al., 2016). Though useful for eliciting and developing explanations, constructed-response items are onerous to incorporate in courses, as time is required for an instructor to read and score responses. The delay in feedback due to grading is a barrier to the utility of constructed-response items for formative assessment purposes. Constructed-response items could be more useful if incorporated into automated assessment systems with computer-assisted scoring where immediate feedback is possible. Instructors would benefit from the amassed feedback provided as this feedback could be used to develop new instructional strategies or just-in-time teaching moments in the classroom. Students, as well, could benefit from immediate feedback received about their written responses. Lexical analysis techniques coupled with predictive regression models provide a means for development of the type of computer-assisted scoring necessary to achieve the utility of constructed-response items.

Lexical analysis of written responses

Lexical analysis is a technique to evaluate textual responses such as those obtained through constructed-response items. This technique codes common words and phrases used in text data; codes are then used to construct predictive models (e.g., binomial logistic regression). Such models have been used to predict correctness, conceptual understanding, and levels of understanding for a number of constructed-response items used in chemistry and biology instruction (Haudek et al., 2012; Prevost, et al., 2012, 2013; Kaplan et al., 2014; Moharreri et al., 2014; Shen et al., 2014; Prevost, et al., 2016; Dood et al., 2018).

Lexical analysis begins with reviewing text-based responses for common words and phrases. These are then combined into broader categories that, in the case of use with constructed-response items in chemistry, may indicate a concept or topic being used or not used by a student within a written response. For example, common words and phrases used to describe differences in electron density in a reactant molecule could be combined into a category called partial charges; any response to the assessment item that uses any of the words or phrases associated with this category would be marked accordingly: 1 means included in the response and 0 means not included in the response. After categories have been developed for a data set, those categories are used to develop a regression model that predicts an overall score for each response based on categories represented in that response. In previous work, we coded responses for use or non-use of a Lewis acid–base model by students when describing the mechanism of a single-proton transfer reaction (Dood et al., 2018). We used a binomial logistic regression model with categories (such as partial charges described above) as predictors of Lewis acid–base model use, a binary variable. Logistic regression models can be developed for binary, ordinal, and multinomial models based on the desired overall scoring scheme of the constructed-response item. As an example of a contrasting model, a multinomial logistic regression model was used by Prevost et al. (2016), scoring student responses as: correct, incomplete/irrelevant, or incorrect.
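To make the idea concrete, a minimal sketch of this kind of model is shown below, assuming scikit-learn (the library behind the Pedregosa et al. (2011) code cited later in this paper). The category columns, codes, and scores here are entirely synthetic, invented for illustration.

```python
# Minimal sketch: binary category codes (1 = category present in a
# response) serve as predictors of a binary human-assigned score.
# Data and category columns are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns might represent, e.g., "partial charges" and "bond forming".
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])  # human code: 1 = concept used

clf = LogisticRegression().fit(X, y)
predicted = clf.predict([[1, 0]])  # score a newly coded response
```

Once fitted, such a model scores any new response from its category codes alone, which is what makes computer-assisted scoring of fresh responses possible.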

A high level of predictive accuracy (greater than 85% agreement with human coding) is expected for determining the utility of the computer-assisted scoring of assessment items. A large number of diverse responses is necessary to develop a model with broad predictive ability. While no established guidelines exist for minimum or ideal sample size, the Automated Analysis of Constructed Response (AACR) and associated studies typically collect more than 800 responses to build and evaluate scoring models; such a sample size provides a sufficient number of responses to divide the data into a training set (∼70%) from which the model is built and refined and a testing set (∼30%) from which the model is evaluated. Use in large enrollment courses is touted as a key utility of the computer-assisted scoring of constructed-response items (Prevost et al., 2013); typical testing sets (30% of 800) equate to approximately 240 responses, a likely size of a large enrollment course.
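The ∼70/30 partition described above can be sketched as follows, again assuming scikit-learn; the data set here is a synthetic stand-in of 800 responses coded into 33 binary categories (sizes chosen to match the text).

```python
# Sketch of the ~70/30 train/test partition described above, applied to
# a synthetic set of 800 responses with 33 binary category codes.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(800, 33))  # binary category codes
y = rng.integers(0, 2, size=800)        # human-assigned scores

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
# training set: 560 responses; testing set: 240 responses
```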

Examples of lexical analysis and predictive regression models being used to analyze constructed-response assessments span only a small number of scientific disciplines: biology (Haudek et al., 2012; Prevost, et al., 2012, 2016; Moharreri et al., 2014), chemistry (Dood et al., 2018), and statistics (Kaplan et al., 2014). A collection of constructed-response items can be found in the AACR library. The project began in biology, and has now branched out to biochemistry, chemistry, and statistics topics; however, topics are often central and relevant to biology. In this study, we analyze student explanations of an SN1 reaction mechanism and develop a predictive regression model to evaluate student understanding of the SN1 reaction.

Theoretical foundation and research questions

This study embraces an emergent process for identifying and predicting classifications of students’ mechanism explanations, and presents levels of sophistication of student explanations which arose from the data set. Our analysis is framed in the current understanding in the literature of student explanations and descriptions of mechanisms. Given this understanding, the goal of this study is to accurately code students’ written responses to a constructed-response item using a predictive logistic regression model that has potential to be used in classes with large enrollment. We address two key questions:

1. How do students respond to a prompt asking for a written response about what is happening and why for an SN1 (i.e., unimolecular substitution) reaction mechanism?

2. Does lexical analysis lead to a logistic regression model for predicting level of explanation sophistication in responses to a constructed-response formative assessment item on explaining an SN1 reaction mechanism?


This work was conducted under application Pro#00028802, “Comprehensive evaluation of the University of South Florida's undergraduate and graduate chemistry curricula”, as reviewed and approved on December 13, 2016, by the University of South Florida's Institutional Review Board.

Development of constructed-response item

The constructed-response item (see Fig. 1) used in this study mirrors an item developed by Cooper et al. (2016) which asked students in a transformed general chemistry curriculum what is happening in an acid–base proton-transfer mechanism at a molecular level and why. Cooper et al. (2016) used an iterative, research-based process to develop the item. In previous work, we adapted the item for use in our organic chemistry curriculum and for scoring using lexical analysis and logistic regression (Dood et al., 2018). The constructed-response item in this study uses wording and symbolism that mirror the prompt used in our previous work; the key difference is this study explores a multistep SN1 reaction mechanism rather than a single-step proton-transfer reaction. Additionally, the predictive model in our previous work coded responses for use or non-use of the Lewis acid–base model while the predictive model in this study scores responses for overall level of explanation sophistication. To broaden the utility of the item, the leaving group was varied (i.e., bromide, chloride, and iodide). The version of the prompt with chloride as the leaving group is depicted in Fig. 1.
Fig. 1 Constructed-response item given to students with bromide as the leaving group. Additional iterations of the prompt included iodide and chloride as leaving groups.

Students were provided with separate response boxes for parts A and B of the prompt, as Cooper et al. (2016) found that including both what and why in separate parts better elicited student reasoning. However, similar to Cooper et al. (2016), responses to parts A and B were combined for analysis due to many students being unable to differentiate between what and why.

Data collection

Data were collected during the Fall 2017 and Spring 2018 semesters in the first semester of a two-semester postsecondary organic chemistry sequence with three different instructors at a large, public university in the southeast United States. The item was given as an extra credit opportunity via Qualtrics. The iteration of the item with bromide as a leaving group (N = 733) was collected during the Fall 2017 semester, and the iterations with chloride (N = 155) and iodide (N = 153) as leaving groups were collected during the Spring 2018 semester.

Development of scoring scheme

Responses were first analyzed by author AJD to inductively develop a scoring scheme via an exploratory and iterative process (cf., constant comparative analysis; Glaser and Strauss, 1967; Strauss and Corbin, 1990), allowing the scoring scheme to emerge from the data. Through this process, author AJD noted different levels of complexity in students’ explanations based on four main aspects of the presented reaction: leaving group, carbocation, nucleophile and electrophile, and proton transfer. Different explanation types were classified into categories through discussions between authors AJD and JRR, during which categories were refined and inclusion criteria were established. Author AJD then independently applied this initial scheme to all responses. The resulting preliminary categories described the level of complexity of student explanations: description only (i.e., response describes what is occurring but not why the reaction occurs), surface level reasoning (i.e., reasoning about why the reaction occurs at a surface level), and deeper reasoning (i.e., reasoning about why the reaction occurs at a deeper level). To test the fidelity of the reasoning categories, author JRR performed an interrater check by randomly selecting responses across the three categories and applying the categories; discrepancies were discussed between authors AJD and JRR until agreement was reached. Minor revisions to the category descriptions were made and author AJD applied the final scheme to the remaining responses. After the sorting process was complete, it was determined that the categories represented levels of explanation sophistication. Therefore, we refer to the categories as Level 1, Level 2, and Level 3.

Development of logistic regression model

Model development and evaluation were conducted using Python, an open-source scripting language (code available at ), and a lexical analysis framework built using SPSS Modeler. Logistic regression analysis code for Python was adapted from Pedregosa et al. (2011). The model development process is outlined in Fig. 2. Combined responses to part A and part B of the assessment item were randomly partitioned into a training set (N = 728, ∼70%) and a testing set (N = 313, ∼30%). Training data were used to build and revise the logistic regression model; testing data were used after the model was built to validate the model. The foundation of the predictive model is a hierarchical coding scheme using common words and phrases to generate categories (i.e., predictor variables). A pair of binomial logistic regression models were used to predict the human-determined score.
Fig. 2 The process of creating a logistic regression predictive model from constructed-response item to model development.
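The text does not spell out here exactly how the two binomial models combine into a three-level score; one plausible arrangement, sketched below as an assumption for illustration (not necessarily the authors' implementation), is a hierarchy in which a first model separates Level 1 from Levels 2–3 and a second separates Level 2 from Level 3.

```python
# Hypothetical sketch: combining two binomial logistic regression
# models into a three-level score. The hierarchy shown (Level 1 vs.
# rest, then Level 2 vs. Level 3) is an assumption for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_two_stage(X, levels):
    """X: binary category codes; levels: human scores in {1, 2, 3}."""
    model_a = LogisticRegression().fit(X, levels == 1)
    rest = levels != 1
    model_b = LogisticRegression().fit(X[rest], levels[rest] == 2)
    return model_a, model_b

def predict_two_stage(model_a, model_b, X):
    is_level1 = model_a.predict(X)
    is_level2 = model_b.predict(X)
    return np.where(is_level1, 1, np.where(is_level2, 2, 3))

# Tiny synthetic demonstration with three one-hot category patterns.
X = np.array([[1, 0, 0]] * 10 + [[0, 1, 0]] * 10 + [[0, 0, 1]] * 10)
levels = np.array([1] * 10 + [2] * 10 + [3] * 10)
model_a, model_b = fit_two_stage(X, levels)
predicted = predict_two_stage(model_a, model_b, X)
```

Chaining two binomial models this way keeps each decision binary, which is consistent with the paper's report that a pair of binomial models outperformed an ordinal model in this context.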

The first step in model development was to generate four coding algorithms that prepared and analyzed the text responses for inclusion in the predictive model. The first two algorithms replaced words or phrases in the responses to adjust for misspellings, equivalents, or synonyms. The last two algorithms considered combinations of words or phrases in the responses and generated categorical codes used as predictor variables in the regression models.

Equivalents. Equivalents are words or phrases that refer to exactly the same thing, including common misspellings and alternative ways of writing a term. For example, consider bromine. This term can include Br, bromide, and Br−, as students are referring to the same thing when using these in the context of our assessment item. The term bromine also includes “bromin” and “bromean” as common misspellings. The equivalents algorithm cleans responses by replacing all equivalents with the same term; in the case of our example, all of the words in the response that are equivalent to bromine (e.g., “bromin” or Br) will be replaced with the term bromine.
Types. Types are groups of words that are synonyms. For example, a type called halogen could include: bromine, chlorine, iodine, and fluorine. In the context of our assessment item, these words are synonyms because three different halogens are used as the leaving group in the three iterations of the assessment item. As with the equivalents algorithm, data are cleaned by replacing words or phrases with synonymous words or phrases (e.g., halogen).
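The two cleaning passes described above might be sketched as follows; the dictionaries here are illustrative stand-ins, not the authors' actual equivalents and types lists.

```python
# Sketch of the equivalents and types passes described above. The word
# lists are invented examples, not the study's actual dictionaries.
import re

EQUIVALENTS = {"bromine": ["bromide", "bromin", "bromean", "br"]}
TYPES = {"halogen": ["bromine", "chlorine", "iodine", "fluorine"]}

def normalize(text):
    text = text.lower()
    # Pass 1: collapse misspellings/alternate spellings to one term.
    for canonical, variants in EQUIVALENTS.items():
        for variant in variants:
            text = re.sub(rf"\b{variant}\b", canonical, text)
    # Pass 2: collapse synonymous terms to their type name.
    for type_name, members in TYPES.items():
        for member in members:
            text = re.sub(rf"\b{member}\b", type_name, text)
    return text
```

With this sketch, a response mentioning “Br”, “bromean”, or “chlorine” all normalize to the single token halogen, so later rules and categories need only match one term per concept.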
Rules. Rules are patterns of text made up of words, phrases, equivalents, or types. A rule consists of two or more words, equivalents, or types that appear in the text in sequence within six or fewer words of each other. For example, the phrase “the alcohol is attracted to the positively charged carbocation” contains a word of the type attract (attracted) and a word of the type positive (positively). Though the two words are not directly next to each other, a rule that codes for the attract and positive types would apply to this phrase.

The order of the words, equivalents, or types is important. Thus, separate rules must be created to account for discrepancies in word order, as respondents may use different wordings to mean the same thing (e.g., “the bond was formed” vs. “it formed a bond”). The rules bond forms and forms bond were thus both necessary. The rule bond forms must contain a word from the type bond and then a word from the type form. The rule forms bond must contain a word from the type form and then a word from the type bond. The necessary order of terms has the potential to distinguish between understanding and misunderstanding in cases when use of the terms is parallel with correct and incorrect application of a scientific concept or principle, though the data set in this study was not scored for correctness.
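A rule of this kind reduces to checking for an ordered co-occurrence within a fixed window. The helper below, with hypothetical ATTRACT and POSITIVE type sets, is a sketch of that check under the six-word constraint described above, not the study's implementation:

```python
def rule_matches(tokens, first_type, second_type, window=6):
    """True if a word of first_type is followed by a word of second_type
    within `window` words. Order matters, so the reverse pattern must be
    encoded as a separate rule, as described in the text."""
    for i, tok in enumerate(tokens):
        if tok in first_type:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] in second_type:
                    return True
    return False

# Hypothetical type sets for the worked example above.
ATTRACT = {"attract", "attracted", "attraction"}
POSITIVE = {"positive", "positively"}

tokens = "the alcohol is attracted to the positively charged carbocation".split()
print(rule_matches(tokens, ATTRACT, POSITIVE))  # -> True
print(rule_matches(tokens, POSITIVE, ATTRACT))  # -> False (order matters)
```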

Categories. A category contains specific words, phrases, rules, or types found in responses; a category may also be designed to denote that a response does not contain particular words, phrases, rules, or types. Any given response can be coded with all or none of the categories, with most responses being coded with multiple categories (an average of 9 categories per response out of 33 possible categories for this data set). Responses that include a given category are coded 1; responses that do not are coded 0.

An example category is bond forming; this category includes any responses that are coded as use of the rules bond forms or forms bond, as well as the phrase “new bond”. Another example category is absence of explanation; this category includes responses that omit words from types that would indicate the student has addressed why the reaction is occurring.
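A binary category coder along the lines of the two examples above might look like the following sketch; the substring checks and the short list of why-indicating words are stand-ins for the full rule machinery, which is not reproduced here:

```python
def code_categories(cleaned_tokens):
    """Assign hypothetical binary category codes (1 = present, 0 = absent)
    to a cleaned, tokenized response. Simplified illustration only."""
    text = " ".join(cleaned_tokens)
    return {
        # Fires on either ordered rule or the phrase "new bond".
        "bond_forming": int(
            "bond forms" in text or "forms bond" in text or "new bond" in text
        ),
        # Absence category: fires when no why-indicating word appears.
        "absence_of_explanation": int(
            not any(t in {"because", "since", "due"} for t in cleaned_tokens)
        ),
    }

print(code_categories("a new bond forms because bromine leaves".split()))
# -> {'bond_forming': 1, 'absence_of_explanation': 0}
```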

Logistic regression model. Category codes (i.e., 1 or 0) are then used as predictor variables in logistic regression models. In this study, we used a pair of binomial logistic regression models with human scoring as the outcome variable because they achieved higher accuracy in this context than an ordinal model. The model development process used only the training data set. Accuracy was determined as the percentage of responses for which the computer-assigned score matched the human score. An iterative process of refining the equivalents, types, rules, and category coding algorithms was conducted with the training data set until model accuracy was maximized. Published models typically report accuracy of at least 70%, although the desired level of accuracy is typically >85%. Refinement of algorithms was guided by examining how the algorithms led to false positive and false negative predicted scores. When accuracy was maximized, the resultant model was applied to the testing set.
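A minimal sketch of this modeling step, assuming the 0/1 category matrix as predictors and using scikit-learn's LogisticRegression on placeholder data (the paper does not specify which regression implementation was used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: X stands in for the responses x 33 binary category codes,
# y for a binarized human score (e.g., Level 1 vs. Levels 2/3).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1041, 33))
y = (X[:, 0] | X[:, 1]).astype(int)  # toy outcome tied to two categories

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy: fraction of computer-assigned scores matching "human" scores.
accuracy = model.score(X_test, y_test)
# Odds ratios are obtained by exponentiating the regression coefficients.
odds_ratios = np.exp(model.coef_[0])
```

In practice the iterative refinement loop would alternate between adjusting the equivalents/types/rules that generate X and refitting the model, inspecting false positives and false negatives at each pass.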

Results and discussion

RQ1: How do students respond to a prompt asking for a written response about what is happening and why for an SN1 reaction mechanism?

Analysis of responses to the constructed-response item resulted in a tri-level score based on complexity of explanation used by students when writing about what is happening and why for an SN1 reaction mechanism (see Table 3 for representative responses). A Level 1 response was limited to a description of what is happening in the reaction and did not address why the reaction occurs. A Level 2 response included a surface level explanation about why the reaction occurs. That is, the explanation was related only to explicit features of the reaction or mentioned implicit features, but with no evidence of deeper reasoning. For example, if an explanation claimed bromide is a good leaving group due to its size with no other explanation, it is impossible to tell whether the explanation is based on sound chemical reasoning or if the explanation stems from memorizing a pattern of atomic size. A Level 3 response included a deeper level explanation of why the reaction occurs (i.e., reasoning related to implicit features of the reaction that have been inferred from the explicit features of the reaction and explained using electronic effects). The three levels are hierarchical in that a Level 3 response could include similar phrases to a Level 1 or Level 2 response as to what is happening in the reaction; however, the Level 3 response would provide a deeper explanation for why the reaction is happening that is not present in a Level 1 or Level 2 response. In the complete human-coded data set, there were 113 responses coded as Level 1, 549 responses coded as Level 2, and 379 responses coded as Level 3.
Table 3 Scheme used to score the constructed-response item
Overall score and description Example
Level 1 “A nucleophilic attack is occurring. The Br is the leaving group and this happens in two steps. The O is attacking the cation and then grabbing the hydrogen.”
Response only describes what is happening in the reaction; the response does not address why the reaction is occurring.
Level 2 “This reaction occurs because bromide is a good leaving group that generally wants to break a bond in order for a nucleophile such as water to come in and balance the charge.”
Response describes why the reaction is occurring at a surface level; explanations either include only explicit reaction features or mention implicit features with no further explanation. Example reasons: stability and leaving group ability.
Level 3 “Since iodine is such a good leaving group due to its large electron shell and thus polarizability, the bond between the alpha carbon and iodine is polar. Once the iodine leaves, the alpha carbon forms a tertiary carbocation. Due to hyperconjugation and inductive effects, this is as stable as it gets in terms of carbocations. The polar secondary alcohol (with oxygen having a partial negative charge) is attracted to the full positive charge of the tertiary carbocation and therefore attacks it.”
Response describes why the reaction is occurring at a deeper level; explanation includes implicit features of the reaction that have been inferred from explicit features; explains electronic effects. Example reasons: electron density, electronegativity, and partial charges.

Though overall levels were assigned holistically, responses were noted to include four components of SN1 reactions for which explanation could be considered in levels: leaving group, carbocation, nucleophile and electrophile, and proton transfer. Responses varied in the number of components addressed, from none to all four. Responses that addressed more components generally received a higher-level score; however, a response could address all four components yet provide no explanation for why parts of the reaction act the way they do and thus still receive a Level 1 score. In addition, explanation did not need to be offered explicitly for every component to receive a Level 2 or Level 3 score. For example, a respondent could explain that the leaving group bromine will be a good leaving group because of its large atomic size (a memorized trend with no evidence of deeper reasoning and thus an example of Level 2 explanation) but describe the other parts of the reaction only at Level 1. The overall assignment would be Level 2 because the why is included at a surface level for at least one component of the reaction.

Differences in how respondents addressed the four components (i.e., leaving group, carbocation, nucleophile and electrophile, and proton transfer) provide more information about reasoning used at each scoring Level. An example response developed by the authors and considered expert-level with bromide as the leaving group is provided below with specific deeper reasoning bolded for emphasis:

An SN1 reaction is occurring. In the first step, bromide acts as a leaving group and the bond between bromine and carbon is broken. Bromine's large atomic radius allows for a stable bromide ion due to *delocalization of the negative charge*. The positively-charged tertiary carbocation is also stable because of shared electron density from surrounding methyl groups, which act as *electron donors*. In the second step, ethanol acts as a nucleophile and the carbocation acts as an electrophile. The electrostatic attraction between the carbocation and the nucleophile is greater than that of the carbocation and bromide; thus, the electron-deficient carbocation and the partial negative charge of the electron-rich oxygen combine to form a new bond. In the final step of the reaction, the positively-charged intermediate is neutralized. Ethanol is *electron-rich* and able to act as a base, and a bond can be formed between the ethanol-oxygen and the hydrogen atom on the product. The acidic hydrogen is *electron-poor* due to the positively-charged, electronegative oxygen unequally sharing electron density with it. The hydrogen–oxygen bond is broken and the electrons that formed the bond become a lone pair on oxygen, resulting in a neutral final product.

Although this response is considered expert-level for our prompt, students are not necessarily expected to be able to address all aspects of the response at this level to be successful in the course. Based on the expectations of the course, students should be able to explain the four components of the reaction at Level 2.

Leaving group. The first step of the SN1 reaction is the bond breaking between leaving group and substrate (i.e., starting material). Level 1 responses included descriptive statements such as “the Br is the leaving group, so the bond between Br and the carbon first breaks in the rate limiting step”. There is a description of what is happening in the response; however, the response lacks explanation of why the step is occurring. Level 2 responses included simple rationales. For example, “The Br group leaves because it is a good leaving group”. While the statement “good leaving group” is superficial, the evaluative statement suggests the foundation for a rationale for the step. However, this response could be mere evidence of a memorized fact rather than deeper level explanation. Of our 1041 total responses, there were 339 instances of the phrase “good leaving group”, many of which were not accompanied by further explanation of why the leaving group was good. Students may have memorized which leaving groups are relatively good and used explicit features in the pictorial representation (i.e., halide as the leaving group) to reason that the reaction was able to occur due to the goodness of the leaving group. More in-depth (Level 3) reasoning would be “bromine leaves because it is large and able to accommodate a negative charge”. We have scored this at a Level 3 based on similar higher-level explanation designations noted in previous studies showing that invoking size and its relationship to charge stabilization when explaining leaving group ability trends towards more causal explanations (Caspari et al., 2018a; Popova and Bretz, 2018).

Most students in our study articulated that the halogen leaves because it is a good leaving group. Organic chemistry students often memorize relative leaving group abilities and use these memorized trends to determine the best leaving group without understanding why that leaving group is the best; Popova and Bretz (2018) made this same observation in their work on student understanding of what makes a good leaving group. Though the memorized trend can be successful (i.e., a high grade), regurgitation of the trend is not evidence of chemical understanding. When assessments lack an explanatory component, memorization strategies will dominate student approaches to studying; when more constructed-response based assessments ask for explanations that require the why, students will need to utilize appropriate study strategies to develop the deep understanding necessary to infer implicit features from explicit features and provide causal rationales for observed phenomena such as leaving group trends. Caspari et al. (2018a) make a similar recommendation, calling for instructors to require extensive explanations from students and make underlying causal components the main focus of examination questions. It is important to note, though, that as assessments shift toward more constructed-response explanations, teaching approaches must shift as well. Instructors should build opportunities for students to learn how and to practice constructing explanations of phenomena causally during class time and on low-stakes assessments.

Carbocation. Students also described the formation of a carbocation intermediate in the reaction. Level 1 responses noted that a carbocation was formed; some also labeled the carbocation as tertiary. Some responses showed evidence of teleological reasoning, saying that the leaving group left because a stable tertiary carbocation would be formed; these teleological explanations were also scored at Level 1. Level 2 responses noted that the ‘tertiary’ carbocation was stable, sometimes noting this stability was due to the number of substituents but not elaborating on why the number of substituents matters for stability. This is evidence of a memorized pattern that more highly substituted carbocations are more stable. Level 3 responses stated that the carbocation formed is stabilized by the three electron-donating methyl groups adjacent to the cation; few responses included this level of sophistication. At Level 3, electron density should serve as the foundation for rationalizing the viability of the intermediate carbocation.

As with our critique of how students rationalized leaving groups, the lack of observed deeper level explanations using implicit features for the viability of the carbocation intermediate can also potentially be attributed to instructional and assessment strategies used in many postsecondary organic chemistry courses, including the course taken by the students who participated in this study. Though not the case in all introductory organic chemistry courses, many courses include assessments which ask students to label given carbocations as primary, secondary, tertiary, etc. Such activities may unintentionally emphasize the importance of assigning surface-level labels based on explicit features and memorizing patterns. Assessments commonly ask students to rank carbocations from least to most stable; this task can be completed by assigning the relative stability through regurgitation of a memorized trend list, however complicated the trend may get by the end of the yearlong course. The ability to correctly answer such questions solely using memorized trends can encourage students to avoid the task of developing a rationale for why the answer is correct. Using the degree of substitution on the carbocation to assert stability is a step toward developing causal reasoning for its formation; however, this surface feature-based heuristic for rationalizing stability can become problematic when multiple structural features need to be considered to determine the viability of the carbocation intermediate (e.g., allylic or benzylic carbocations). To encourage the development of deeper reasoning, students should be asked to provide a rationale for why a particular carbocation intermediate is more or less stable than a comparison carbocation intermediate. Caspari et al. (2018a) were able to elicit student explanations about carbocation stability beyond the number of substituents as deeper explanations were required to successfully answer their prompt. 
It is possible that our students would have invoked electron density more frequently had we provided them with an assessment where there were other obvious factors impacting carbocation stability, such as the neighboring carbonyl group present in the prompt used by Caspari et al. (2018a). Students should be provided with assessments that require going beyond the number of substituents to determine carbocation stability to encourage development of deeper explanations. Items containing scaffolded questions that direct and guide students to use electron density in their responses communicate the importance of rationalizing carbocation stability beyond a surface level. Students should also be asked to solve more complicated problems with answers that cannot be attained using only surface level reasoning to encourage them to develop the deeper level of explanation required to solve the problem at hand.

Nucleophile and electrophile. The second step of the reaction is the formation of a bond between an electrophile (i.e., the carbocation) and a nucleophile (i.e., an ethanol molecule in our assessment prompt). Responses were scored at Level 1 when the response only labeled the reagent and intermediate with the terms nucleophile and electrophile, respectively, and described that a bond formed between the two. Responses that included rationale such as “ethanol and the tertiary carbocation want to form a bond due to the positive charge of the carbocation” were scored at Level 2. This reasoning is both anthropomorphic and teleological but may be a step in the right direction toward developing electronic reasoning. Finally, responses that provided a rationale including electron density or partial charges, implicit features of the reaction, were scored at Level 3.

Students were much more likely to articulate the presence of a nucleophile in their responses than to articulate the presence of an electrophile. Of our 1041 responses, there were 611 instances of the term “nucleophile” and only 33 instances of the term “electrophile”. This overemphasis on nucleophiles over electrophiles is in line with the work of Anzovino and Bretz (2016) who partly attributed increased familiarity with nucleophiles to the professor's emphasis on nucleophiles over electrophiles during class. Bhattacharyya and Harris’ (2018) work suggests that the emphasis on identification of nucleophiles over electrophiles could also be attributed to the active voice typically used when describing the reaction step (i.e., saying “the nucleophile attacks the carbocation”) based on the syntax of the electron pushing formalism. Respondents in our study explicitly noted that it was one of the lone electron pairs on ethanol (i.e., the nucleophile) that formed the bond with the carbocation (i.e., the electrophile), indicating they understood something about the direction of electron flow; however, these respondents did not always label the nucleophile and electrophile with the proper chemical terminology.

Proton transfer. Responses that mentioned the proton transfer step (e.g., “once the ethanol is attached to the t-butyl, another ethanol will remove the proton”) without explanation were coded as Level 1. Many students described the proton-transfer step as occurring to “get rid of the charge” or “make the product neutral”, which they related to stability of the product. This use of teleological reasoning was associated with Level 2 scores. An ideal Level 3 explanation for this step would describe the electronegative oxygen of a second ethanol molecule attracting the partially positive hydrogen, forming a new bond, with the electrons from the broken bond becoming a lone pair on oxygen. Though students have previously been able to describe acid–base proton-transfer reactions using the Lewis acid–base model in the context of a standard-curriculum organic chemistry course (Dood et al., 2018) and causally in the context of a revised curriculum (Cooper et al., 2016; Crandell et al., 2018) when provided with a similar prompt, many students in this study simply omitted discussion about the acid–base proton-transfer step. It is likely that students were more focused on the first two steps of the reaction (i.e., the steps that occur in all SN1 reactions) and either did not feel the need to explain the proton-transfer step or did not realize it was there.
Use of anthropomorphisms. Based on the literature, we expected substantial use of anthropomorphism in responses (Taber and Watts, 1996). Instances in our dataset ranged from the humorous, “Bromine is a very heavy and electronegative atom making it a perfect leaving group because it bares the negative charges very well and honestly does better on its own living the single life” (emphasis added), to the more mundane, “Bromine is a good leaving group because it wants to be on its own”. Though studies have shown that the use of anthropomorphisms in teaching can be harmful to developing understanding (Talanquer, 2007, 2013), anthropomorphism use can also be evidence of sound chemical reasoning, as one could argue was demonstrated in the response that bromine is better “on its own living the single life”. However, this is more the exception than the rule. Many students apply humanlike qualities to molecules and are unable to give reasoning for why a reaction occurs other than that the molecule wants to. For example, when describing the final step (i.e., the proton-transfer step) of the reaction, many respondents stated that the compound did not want to have a charge, or that the compound wants to be neutral. The word “want” occurred 223 times in our dataset, with very few instances of this anthropomorphism coupled with sound chemical reasoning.

Research has shown that use of anthropomorphisms during instruction can encourage student misunderstandings, as students take the human-like models literally (Talanquer, 2007, 2013) rather than appreciate the analogy intended. Results of our study echo this, as we found that many students wrote about atoms or molecules wanting without providing causal explanations alongside anthropomorphic explanations. Though experts understand that a molecule does not want, novices who are given these anthropomorphic explanations without instruction in how they relate to chemical reasoning may believe that such rationale is sufficient for understanding the chemical activity of the system.

Overall level assignments. Overall, there were a small number (N = 113) of responses scored as Level 1; this indicates respondents in our sample are able to articulate what is happening in the reaction mechanism and, to some degree, why it is occurring at a molecular level (i.e., responses scored as Level 2 or Level 3). The prompt was designed to elicit explanations by asking separately what was happening and why it was happening (see Cooper et al., 2016); thus, an attempt to provide a rationale is expected given the structure of the assessment item. Most responses (N = 549, 53%) were coded as Level 2; 379 (36%) were coded as Level 3. This is expected, as students are heavily assessed on the what (i.e., draw the mechanism, predict the products) and less often asked why reactions occur. Caspari et al. (2018a) found that many students did not explain with high levels of complexity until prompted by the interviewer; thus, our students may be capable of inferring implicit features of the reaction and using those features to reason about the why, but simply did not feel it necessary to include this in their responses because they had not previously been required to provide this type of explanation.

RQ2: Does lexical analysis lead to a logistic regression model for predicting level of explanation sophistication in response to a constructed-response formative assessment item on explaining an SN1 reaction mechanism?

Two binomial logistic regression models were used to predict the tri-level scores for each response. The first model (i.e., Model #1) codes Level 1 responses as 0 and Level 2 and 3 responses as 1. The second model (i.e., Model #2) codes Level 1 and 2 responses as 0 and Level 3 responses as 1. The results of the two models are combined to assign Level 1, 2, and 3 to responses.
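One way to combine the two binomial predictions into a single tri-level score, assuming a 0.5 probability threshold (the paper does not report how the two model outputs were thresholded or reconciled), is:

```python
def assign_level(p_model1, p_model2, threshold=0.5):
    """Combine two binomial predictions into a tri-level score.
    Model #1 predicts Level 2/3 (vs. Level 1); Model #2 predicts Level 3
    (vs. Level 1/2). The 0.5 threshold is an assumption for illustration."""
    if p_model2 >= threshold:   # Model #2 indicates Level 3
        return 3
    if p_model1 >= threshold:   # Model #1 indicates at least Level 2
        return 2
    return 1

print(assign_level(0.9, 0.8))  # -> 3
print(assign_level(0.9, 0.2))  # -> 2
print(assign_level(0.1, 0.1))  # -> 1
```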

Following the iterative development process described in the Methods section, 91.6% overall accuracy was achieved with the training dataset; this includes an 85.0% accuracy for Level 1, a 90.3% accuracy for Level 2, and a 95.5% accuracy for Level 3. Incorrect assignments were within one level of the correct assignment; in other words, there were no Level 1 responses scored by the computer as Level 3 and there were no Level 3 responses scored by the computer as Level 1. When the same model was applied to the testing dataset, the overall prediction accuracy was 86.9%; the accuracy for Level 1 was 84.4%, the accuracy for Level 2 was 85.1%, and the accuracy for Level 3 was 90.6%. Again, all incorrect assignments were one level away from the correct assignment.

A description of categories used in each model, including rules, types, words and phrases that comprise each category, is provided in Appendices 1 and 2. Regression coefficients (β), p values, and odds ratios for each category by model are reported in Table 4.

Table 4 Descriptive summary of categories by level classification for the training set and results of two binomial logistic regressions
Category Model #1 Model #2
L1% (n = 76) L2/3% (n = 652) β OR p L1/2% (n = 456) L3% (n = 272) β OR p
a Inverse odds ratio. *p < 0.05, **p < 0.01, ***p < 0.001. For Model #1: χ2(26) = 156.36, p < 0.0001. For Model #2: χ2(33) = 742.22, p < 0.0001.
Accept/donate electrons 22.4 73.0 −0.77 2.17a 26.5 26.5 0.41 1.51
Attraction 0 8.9 2.70 14.95 0 21.3 6.46 636.5 ***
Absence of explanation 5.3 14.6 −2.92 18.51a *** 11.8 16.5 −2.98 19.77a **
Bond breaks 1.3 31.8 1.42 4.12 0.7 75.4 0.52 1.69
Bond electrons 90.1 7.8 0.11 1.12 25.9 0.7 0.29 1.34
Bond forms 7.9 20.0 1.65 5.22 14.5 25.7 0.25 1.28
Carbocation attacked 6.6 16.1 −0.19 1.20a 12.9 18.8 0.53 1.69
Carbocation 59.2 86.4 −0.10 1.11a 81.4 87.1 −1.00 0.37
Degree carbocation 34.2 52.6 1.06 2.89 48.7 54.0 −0.34 1.41a
Deprotonate 2.6 4.8 1.51 4.52 4.2 5.1 0.91 2.48
Donate hydrogen 19.7 57.7 1.65 5.22 41.9 73.5 1.02 2.77
“Don’t know” 0 27.2 −7.41 1660a * 6.1 54.8 0.66 1.93
Electron attack 1.3 34.4 −2.66 14.21a * 26.1 39.0 0.77 2.17
Electron terminology 0 35.1 2.18 8.82 * 29.6 34.6 −0.09 1.09a
Electronegativity 36.8 44.0 47.6 36.0 4.22 68.24 ***
Electrophile accepts 47.4 44.8 −0.22 1.23a 50.0 36.8 −0.64 1.91a
Eliminate charge 7.9 15.2 3.45 31.48 ** 14.9 13.6 −0.52 1.69a
Good leaving group 0 49.9 41.5 50.0 −0.21 1.24a
Leaving group leaves 0 5.1 0.80 2.23 3.7 5.9 0.16 1.17
Nucleophile attacks 1.3 2.3 −0.55 1.75a 2.6 1.5 0.14 1.15
Nucleophile/electrophile terminology 1.3 35.0 −0.14 1.15a 27.4 38.2 −0.68 1.97a
Opposites 9.2 1.1 2.4 1.1 4.83 125.8 ***
Partial charges 4.0 24.1 17.5 29.4 7.64 2070 ***
Positive/negative charges 0 9.7 3.95 51.73 ** 10.8 5.1 −0.12 1.13a
Reaction terminology 0 9.4 0.23 1.26 8.6 8.1 −0.55 1.74a
Solvent terminology 10.5 21.3 0.33 1.39 14.5 29.8 −0.92 2.51a
Stability carbocation 2.6 10.4 0.68 1.97 7.0 14.0 0.06 1.06
Stability terminology 0 6.0 0.4 13.6 −0.52 1.69a
Sterics 51.3 68.4 64.9 69.5 2.13 8.37 *
Temperature 30.3 30.2 1.95 7.04 31.1 28.7 0.60 1.82
Wants 2.6 48.5 0.76 2.14 32.5 62.5 −0.35 1.42a
Weak/strong base 6.6 24.4 16.7 32.4 −0.06 1.06a
Weak/strong nucleophile 2.6 28.2 0.97 2.63 24.1 27.9 0.36 1.43
Constant −0.65 0.52 −3.20 0.04

Regression model 1. The first binomial logistic regression model differentiates Level 1 responses (coded as 0) from Level 2 and Level 3 responses (coded as 1). Significant predictors include absence of explanation, “don’t know”, electron terminology, electron attack, eliminate charge, and positive/negative charges. Unsurprisingly, the categories absence of explanation and “don’t know” were strong predictors of a response being scored as Level 1, with inverse odds ratios of 19 and 1660, respectively. Thus, the odds of responses that included the category absence of explanation being scored as Level 1 (as opposed to Level 2/Level 3) were 19 times higher than for responses that did not include the category, and the odds of responses that claimed not to know why the reaction is occurring (i.e., included the category “don’t know”) being scored as Level 1 were 1660 times higher, with all other factors held constant. The category electron attack was also a strong predictor of a response being scored as Level 1, with an inverse odds ratio of 14. The nature of this category as a negative predictor is discussed further below. Use of electron terminology, talking about eliminating the charge in the reaction, and talking about positive and negative charges were positive predictors of a response being scored as Level 2 or Level 3. The odds of responses that included these categories being scored as Level 2 or Level 3 (as opposed to Level 1) were 9, 32, and 52 times higher, respectively, than for responses that did not include these categories, holding all other factors constant.
Regression model 2. The second binomial logistic regression model differentiated Level 1 and 2 responses (coded as 0) from Level 3 responses (coded as 1). Significant positive predictors for Level 3 include partial charges, attraction, electronegativity, sterics, and opposites. Absence of explanation is a significant negative predictor with an inverse odds ratio of 20. The categories “partial charges”, “attraction”, and “electronegativity” describe reasoning that would be included in a Level 3 response. The odds of responses coded with these categories being scored as Level 3 (as opposed to Level 1/Level 2) are 2070, 637, and 68 times higher, respectively, than for responses not coded with these categories, holding all other factors constant. The odds of responses including the category “sterics” being scored as Level 3 (as opposed to Level 1/Level 2) were eight times higher than for responses that did not include sterics, holding all other variables constant. The category opposites includes a rule that looks for the types negative and positive in any order, a rule that looks for the type positive and then the type attack (e.g., “the positive charge is being attacked”), a rule that looks for the type negative and then the type attack (e.g., “the negative charge is attacking”), a rule that looks for the type opposite and then the type charge, and the word dipole. The odds of responses in this category being scored as Level 3 over Level 1/Level 2 were 126 times higher than for responses that were not in this category, holding all other variables constant.
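Each odds ratio in Table 4 is simply the exponential of its coefficient, with negative coefficients reported as inverse odds ratios. The helper below reproduces that arithmetic; the small discrepancies with the table (e.g., 18.54 here vs. 18.51 in the table) reflect rounding of the published β values:

```python
import math

def odds_ratio(beta):
    """Odds ratio exp(beta) for a logistic regression coefficient; negative
    coefficients are reported as inverse odds ratios, as in Table 4."""
    orr = math.exp(beta)
    return orr if beta >= 0 else 1 / orr

# Two values from Table 4:
print(round(odds_ratio(-2.92), 2))  # absence of explanation, Model #1 -> 18.54
print(round(odds_ratio(7.64)))      # partial charges, Model #2 (table: 2070)
```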
Overall level assignments. An 86.9% overall accuracy with the testing dataset suggests that we have developed a suitable set of predictive regression models for scoring the constructed-response item. While this level of accuracy exceeds reported accuracies in similar studies, we recommend that use of resultant scores be limited to formative assessments; this recommendation mirrors recommendations for use by others who have developed analogous computer-assisted scoring models (Moharreri et al., 2014; Link et al., 2017).

Discussion of electrons was important across all three levels. The use of electron terminology was a predictor for a response to be scored as Level 2 or Level 3. Typically, the use of electron terminology included explanation such as describing lone pairs attracting an electrophile (deeper-level explanation) or even bromine wanting to keep its electrons to itself (i.e., surface-level teleological explanation). The use of phrases where electrons attack was a negative predictor for Level 2 or 3, as the inclusion of a phrase indicating electrons are attacking is not enough to indicate an explanation of why the reaction is occurring. However, writing about electrons attacking may be a first step toward developing an explanation for why the attack is occurring. The categories partial charges, attraction, and electronegativity were also predictors for a Level 3 score.

Another important concept that differentiated a Level 1 score from a Level 2 score is the use of charges in an explanation. The categories eliminate charge and positive/negative charges were significant predictors of Level 2 and Level 3 scores. Partial charges was a significant predictor of a Level 3 score. The category opposites was also a predictor of a Level 3 score; this category refers to positive and negative partial and formal charges interacting with each other. This corroborates findings from Anzovino and Bretz (2016), where charges were an important part of student understanding of nucleophiles and electrophiles. Most participants in the Anzovino and Bretz study, as well as Level 2 responses in our study, used charges to explain the reactivity of nucleophiles and electrophiles. Partial charges, which better explain reactivity than formal charges alone, were associated with Level 3 responses. Areas of high and low electron density (i.e., partial charges) interacting with each other indicate deeper reasoning (i.e., a Level 3 score), as partial charges are an implicit feature.


Implications for researchers

Formative assessment items can be developed to further evaluate students’ explanations of scientific phenomena. Analysis of written assessment responses, including through lexical analysis and logistic regression models, can provide insight into student explanations. Though some time is required to develop logistic regression models, once developed, they can be used for formative assessment and to evaluate the impact of interventions on student responses to constructed-response items. Our analyses were conducted using Python, an open-source scripting language, making these tools more accessible: researchers can develop predictive models without the financial barrier of purchasing proprietary software. In the field of organic chemistry, constructed-response items should be developed that cover a broader range of reaction types (e.g., SN2, E1, E2, Diels–Alder). Analogous constructed-response items could allow researchers to more deeply understand student explanations of those reaction types and uncover persistent misunderstandings across a spectrum of reaction types. Future models could include more detailed analysis of student responses, such as level assignments for explanations of specific aspects of the reaction (e.g., leaving group, carbocation, nucleophile and electrophile, and proton transfer). Additionally, constructed-response items that include case comparisons, such as those used by Bodé et al. (2019) and Caspari et al. (2018a), could be developed to further elicit student explanations, and predictive logistic regression models could be developed for such items. Predictive models could also be developed to score responses to prompts of the type used in this study based on the frameworks for levels of explanation used by Bodé et al. (2019) and Caspari et al. (2018a).
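For researchers weighing the development cost, the core of such a scoring model can be sketched in a few lines of scikit-learn (Pedregosa et al., 2011), the open-source Python library cited in our references. The sketch below is illustrative only, not our actual model: the category names, feature rows, and labels are invented stand-ins, and our full approach cascades two binomial regressions (Level 1 vs. Levels 2/3, then Level 2 vs. Level 3) rather than the single model shown.

```python
# Illustrative sketch (NOT the authors' published model): binary
# lexical-category features feeding a binomial logistic regression,
# with accuracy computed against held-out human-coded labels.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical category features; 1 = the category's rules matched a response.
CATEGORIES = ["electron terminology", "partial charges", "attraction",
              "electronegativity", "absence of explanation"]
X = [
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
# 1 = explains the "why" (Level 2 or 3), 0 = descriptive only (Level 1).
y = [1, 1, 0, 0, 1, 0, 1, 0]

# Hold out a test set, fit on the rest, and compare predictions to
# human coding of the held-out responses.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

An analogous second model separating Level 2 from Level 3 responses would complete the two-model cascade described earlier.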

Resultant predictive models could be used to develop targeted interventions for students, such as the intervention we developed to encourage students to explain what is happening in an acid–base proton-transfer mechanism, and why, using the Lewis acid–base model (Dood et al., 2019). Instructional tools like our Lewis acid–base tutorial could be developed to remedy a lack of understanding of specific topics. Such remedies should be grounded in the implications of studies on those specific parts of the reaction mechanism. For example, when evaluating the three levels of explanation sophistication in our study, we considered the four main parts of the reaction, each of which has previously been explored by other researchers: leaving groups (Popova and Bretz, 2018), carbocations (Caspari et al., 2018a; Bodé et al., 2019), nucleophiles and electrophiles (Anzovino and Bretz, 2015, 2016), and acid–base proton transfer (Cartrette and Mayo, 2011; Cooper et al., 2016; Crandell et al., 2018). An understanding and synthesis of all of these topics is required for a complete understanding of the reaction mechanism (cf. Cruz-Ramírez de Arellano and Towns, 2014). Our work could be used to develop learning tools that remedy a lack of understanding of each of these parts individually and encourage a synthesis of the components to form a coherent causal rationale for reaction mechanisms. Application program interfaces (APIs) could be developed to analyze student responses in real time and provide learners with instructional tools based on the categories that are missing from their responses and their level of understanding. Students could respond to a prompt and have the response immediately scored by the API. Students would not see their score; the score, though, would be used to direct the program to provide students with a specific tutorial meant to assist them based on their current level of explanation sophistication.
For example, if a student were scored at Level 1, a tutorial meant to aid students in developing their explanation to a Level 2 explanation could be provided. Students scored at Level 2 initially could have a tutorial aimed at developing their current Level 2 explanation into a Level 3 explanation. Students already at Level 3 could receive a tutorial that takes them beyond the levels expressed in this model or a tutorial reinforcing the concepts they have already expressed in their explanation. There are many options for what a predictive model can code. The developer of the model can choose to code for specific text patterns, phrases, or terminology at their desired grain size. Models could be developed that identify the use of specific cause-and-effect relations or implicit properties. The work presented in this paper only scratches the surface of the possibilities for lexical analysis, logistic regression models, and APIs for use in organic chemistry courses.
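The routing logic described above is deliberately simple; a hypothetical sketch of it follows. The function and tutorial names are ours for illustration only and do not correspond to any published API:

```python
# Hypothetical sketch of level-based tutorial routing: the API scores a
# response, then silently assigns a tutorial matched to the predicted level.
# Tutorial identifiers are invented placeholders.
TUTORIALS = {
    1: "describe_to_surface_why",   # develop a Level 1 response toward Level 2
    2: "surface_to_deeper_why",     # develop a Level 2 response toward Level 3
    3: "beyond_or_reinforce",       # extend or reinforce a Level 3 response
}

def route_student(predicted_level: int) -> str:
    """Return the tutorial to assign; the score itself is never shown to the student."""
    if predicted_level not in TUTORIALS:
        raise ValueError(f"unexpected level: {predicted_level}")
    return TUTORIALS[predicted_level]
```

In practice the predicted level would come from the regression models' output rather than being supplied directly, and each tutorial identifier would map to an instructional module.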

Implications for instructors

Students should be assessed on molecular explanations of why reactions occur, as assessments send students a strong message about what is important in the course (Holme et al., 2010). Other researchers have called for instructors to use assessments that ask students to engage in reasoning about the why (Cooper, 2015; Cooper et al., 2016; Caspari et al., 2018a). If instructors wish to summatively assess students on their explanations of phenomena, it is important that they adapt their teaching strategies to develop such skills in a low-stakes environment. One way to develop such skills formatively is through assignments that use logistic regression techniques to provide large numbers of students with immediate feedback. The model presented in this study is available for use by instructors in their courses. Given the imperfect accuracy of our scoring tool, we recommend, as have others, that our predictive models be used to score the constructed-response items only in formative assessment contexts. Model output includes an overall Level assignment as well as category codes; instructors can use this depth of information to tailor lectures to address areas missing from many students’ explanations. For example, if the majority of student responses are assigned Level 1, it would be important to help develop students’ explanations in such a way that students begin to explain why the reaction is occurring rather than just describing the reaction. The absence of explanation category, a negative predictor for a Level 2 or Level 3 score, flags responses that lack the lexical types associated with explanations of why the reaction is occurring, such as attract, electronegativity, partial, stability, degree, strong, and weak. Instructors can use this information about their students’ responses to emphasize these topics in lecture.

In addition, students could receive a list of categories coded for their response as well as ideal categories that were not coded by an API immediately after completing a constructed-response item. This fine-grain level of information coupled with targeted-instructional materials assigned to the student by the API based on their written response could be used by the student to better understand what level of explanation is expected of them and to direct their attention to areas they have missed.

Preparing students for success in a curriculum where examinations require them to explain the why, rather than simply regurgitate memorized information, may require additional instruction on learning how to learn and study. For example, one way to encourage and teach self-regulated learning (Zimmerman and Martinez-Pons, 1988) is through an intervention module such as the Growth and Goals project from the University of Ottawa (Flynn, 2016).

Our data suggest that students continue to use anthropomorphisms in their explanations. Instructors should limit use of anthropomorphisms and teleology when describing chemical phenomena, encouraging students to develop sound chemical reasoning. Deeper-level chemical explanations including the use of explicit features to infer implicit features should be encouraged. Talanquer (2013) suggested specifically discussing the differences between teleological and more chemically sound reasoning, such as causal mechanistic reasoning, for the same system during class, so that students learn to develop explanations and apply reasoning in a sound chemical manner.


Limitations

There are some key limitations of our study: the homogeneity of the sample, the inability to ask follow-up questions to prompt deeper explanations, the level of accuracy of the scoring model, and the focus on one specific reaction type. The response data were collected from students who took organic chemistry with one of three instructors during two different semesters; these students experienced one curriculum at one institution. Different curricula have been reported in the literature for the postsecondary yearlong course in organic chemistry (e.g., the Mechanisms before Reactions curriculum; Flynn and Ogilvie, 2015); students in other contexts may respond differently to the constructed-response item and may require modification of the coding scheme based on different emphases and levels of explanation sophistication. Despite the homogeneity of our responses, our model is generalizable in that our findings mirror those of interview-based studies and human-scored assessments reported in the literature.

Additionally, as the constructed-response item was administered using a survey, there was not an opportunity to ask students follow-up questions that may have prompted the use of deeper explanations (i.e., Level 3). Caspari et al. (2018a) were able to observe high levels of complexity in student explanations, but this complexity typically did not occur until after follow-up questions were asked by the interviewer. Given this finding, it is possible we could have elicited more Level 3 responses from students had the study used interviews.

Another limitation is the overall accuracy of the predictive model: 86.9%. Our accuracy is similar to that of other published predictive models developed for similar purposes (Haudek et al., 2012; Prevost et al., 2012, 2016; Dood et al., 2018), but is still imperfect. We thus reiterate our recommendation that the computer-assisted scoring be used for formative assessment purposes only.

We report a scoring model for one specific reaction type (i.e., SN1) with minimally varied starting materials and reagents. As students learn many reactions and several mechanism types in organic chemistry, it would be useful to expand the tool to include more SN1 reactions and develop additional tools for use with other reaction types (e.g., SN2, E1, E2). Items could also be developed that ask students to make comparisons about the feasibility of different reaction mechanisms (Caspari et al., 2018a; Bodé et al., 2019). This may require the construction of multiple models, as the same coding scheme may not fit with all mechanism types. For example, the reaction studied here includes an acid–base step at the end. Not all substitution and elimination reactions require this step. A reaction mechanism of the same type (i.e., SN1) that does not include this step would likely not fit the current predictive model as is, but the model could be modified to accommodate this difference.


Conclusions

This study has shown that three levels of explanation emerged from our sample of students in response to a constructed-response item on the SN1 reaction mechanism. The assessment item and corresponding logistic regression models presented in this study revealed themes in student understanding that corroborate those found by others in the literature. Results of our study can be used to modify teaching practices and to develop instructional tools that help students build a deeper understanding of SN1 reaction mechanisms.

Conflicts of interest

There are no conflicts to declare.


Appendix 1: categories used in logistic regression

Categories are not mutually exclusive. <type>: several words (or just one word) can be grouped into a type so that rules can be created from types. A list of types can be found in Appendix 2.

Rules allow for up to six “any words” between the types called for in the rule.
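As a concrete illustration of this rule syntax, the gap-allowing matching can be sketched as a regular expression. This is our own minimal re-implementation for readers, not the code used in the study; the type word lists are abbreviated from Appendix 2, and entries are treated as stems so that, e.g., “donate” also matches “donates”:

```python
import re

# Abbreviated, illustrative type lexicon (see Appendix 2 for the full lists).
TYPES = {
    "donate": ["donate", "give", "transfer"],
    "electrons": ["electrons", "lone pair", "electron pair"],
}

def rule_matches(first_type, second_type, response, max_gap=6):
    """True if a <first_type> word is followed by a <second_type> word
    with at most max_gap arbitrary words in between."""
    alt1 = "|".join(re.escape(w) for w in TYPES[first_type])
    alt2 = "|".join(re.escape(w) for w in TYPES[second_type])
    # \w* after the first type treats entries as stems ("donate" -> "donates");
    # (?:\W+\w+){0,max_gap} allows up to max_gap intervening "any words".
    pattern = rf"\b(?:{alt1})\w*(?:\W+\w+){{0,{max_gap}}}\W+(?:{alt2})"
    return re.search(pattern, response.lower()) is not None
```

For example, the rule <donate> + <electrons> would match “The oxygen molecule comes in and donates its electrons” (one intervening word) but not a response in which more than six words separate the two types.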

Category Absence of explanation
Description Response does not include terminology and phrases associated with the why of the reaction.
Terms/phrases/types <absence of explanation>
Rules Does NOT include:
• <attract>
• <electronegativity>
• <partial>
• <stability>
• <degree>
• <strong>
• <weak>
• “excess”
Example response(s) • “A carbocation forms where bromine left the substrate. The bromine leaves the substrate and the nucleophile attacks.”

Category Accept/donate electrons
Description Response talks about accepting or donating electrons.
Terms/phrases/types None
Rules • <donate> + <electrons>
• <accept> + <electrons>
• <lose> + <electrons>
• <keep> + <electrons>
Example response(s) • “…creates a positive carbocation which in turn accepts a pair of electrons from the alcohol.”
• “The oxygen molecule comes in and donates its electrons.”
• “The oxygen keeps the electrons that were forming the bond.”
• “Carbon and bromine are breaking to give those two electrons to bromine.”

Category Attraction
Description Response talks about the attraction of something.
Terms/phrases/types None
Rules • <attract> + <negative>
• <carbocation> + <attract>
• <opposites> + <attract>
• <attract> + <nucleophile>
• <electrons> + <attract>
• <positive> + <attract>
• <negative> + <attract>
• <attract> + <positive>
• <attract> + <electrons>
• <starting material> + <attract>
• <attract> + <starting material>
• <attract> + <bond>
• <bond> + <attract>
• <attract> + <carbocation>
• <nucleophile> + <attract>
Example response(s) • “…to form a tertiary carbocation, which then attracts the oxygen molecule from the alcohol.”
• “With ethanol acting as a nucleophile it is attracted to the carbocation and bonds to the structure.”
• “The bromine is attracting the electrons in the bond.”

Category Bond breaks
Description Response talks about the breaking of a bond.
Terms/phrases/types None
Rules • <bond> + <break>
• <break> + <bond>
Example response(s) • “Bromide's bond to carbon is being broken.”
• “The ethanol comes back around and attacks the hydrogen breaking the bond and forming t-butyl ether.”

Category Bond electrons
Description Response talks about the electrons in the bond or electrons bonding.
Terms/phrases/types None
Rules • <bond> + <electrons>
• <electrons> + <bond>
Example response(s) • “The extra pair of electrons will bond to the positive charge.”
• “The bromine is attracting the electrons in the bond.”
• “The bond electrons stay with the oxygen.”

Category Bond forms
Description Response talks about a bond forming.
Terms/phrases/types “new bond”
Rules • <bond> + <form>
• <form> + <bond>
Example response(s) • “A bond forms between the carbocation and the oxygen of the ethanol.”
• “One of the lone pairs from the alcohol group attached to the ethanol molecule attacks the carbocation and forms a bond with it.”
• “The ethanol comes in and attacks the positive carbocation forming a new bond.”

Category Carbocation
Description Response includes the carbocation type.
Terms/phrases/types <carbocation>
Rules None
Example response(s) • “A bromine takes the electrons in its bond, leaving a carbocation.”

Category Carbocation attack
Description Response talks about the carbocation being attacked or the carbocation attacking.
Terms/phrases/types None
Rules • <attack> + <carbocation>
• <attack> + <positive>
• <bond> + <carbocation>
• <carbocation> + <accept>
• <attack> + <charge>
• <carbocation> + <attack>
Example response(s) • “The alcohol attacks because the C with the cation needs to be stabilized.”
• “…leaving a tertiary carbocation which is then attacked by the ethyl alcohol.”
• “Ethanol is used to make the new structure because it will bond to the carbocation.”
• “…creates a positive carbocation which in turn accepts a pair of electrons from the alcohol.”

Category Carbocation degree
Description Response talks about the degree of the carbocation (i.e., tertiary).
Terms/phrases/types None
Rules • <degree> + <carbocation>
• <carbocation> + <degree>
Example response(s) • “Bromine leaves, leaving a tertiary carbocation.”
• “The reaction is SN1 because the carbocation is tertiary.”

Category Deprotonate
Description Response talks about deprotonation or accepting a proton.
Terms/phrases/types <deprotonate>
Rules • <accept> + <hydrogen>
• <hydrogen> + <accept>
Example response(s) • “A second alcohol comes in to deprotonate the substrate.”
• “The hydrogen is attacked by the oxygen to remove the hydrogen.”

Category Donate hydrogen
Description Response talks about donating a proton.
Terms/phrases/types “protonate”
Rules • <donate> + <hydrogen>
• <lose> + <hydrogen>
• <hydrogen> + <donate>
Example response(s) • “A proton is donated from the protonated alcohol.”
• “Water is playing the role of a base, therefore protonating the alcohol that was there.”

Category “Don’t know”
Description Response states a phrase related to “I don’t know”
Terms/phrases/types <do not know>
Rules None
Example response(s) • “I am unable to answer the question to the desired depth.”
• “I’m not sure if this is in depth enough.”

Category Electron attack
Description Response talks about electrons attacking or being used to form a bond.
Terms/phrases/types None
Rules • <electrons> + <attack>
• <attack> + <electrons>
Example response(s) • “OH is a good nucleophile because it has a pair of electrons it can attack with.”
• “An EtOH lone pair then attacks the carbocation's positive charge.”
• “It takes the electrons in the bond that was once formed with the substrate.”

Category Electron terminology
Description Response uses electron terminology.
Terms/phrases/types <electrons>
Rules None
Example response(s) • “The C–Br bond will break giving its electrons to Br.”
• “Then the oxygen from EtOH has lone pairs that will attack the carbocation.”

Category Electronegativity
Description Response invokes the concept of electronegativity.
Terms/phrases/types <electronegativity>
Rules • <share> + <electrons>
• <electrons> + <density>
• <density> + <electrons>
• <needs> + <electrons>
• <wants> + <electrons>
• <electrons> + <sharing>
• <attract> + <electrons>
• <electrons> + <attract>
Example response(s) • “The lone pairs in the oxygen are attracted to the carbocation.”
• “…it needs more electrons due to it having a cation.”
• “Since it is more electronegative than its substrate it takes the electrons in the bond.”
• “Br is taking the shared electrons.”
• “Different areas of electron density cause a shift in reactants to form new products.”

Category Electrophile accepts
Description Response talks about the electrophile accepting electrons.
Terms/phrases/types None
Rules • <electrophile> + <accept>
• <carbocation> + <accept>
Example response(s) • “An electrophile accepts a pair of electrons to form a new covalent bond.”
• “The tertiary carbocation accepts the electrons from the OH.”

Category Eliminate charge
Description Response talks about neutralizing the charge.
Terms/phrases/types <neutralize>
Rules • <stabilize> + <charge>
• <lose> + <charge>
• <eliminate> + <charge>
Example response(s) • “The remaining reaction occurs to remove the charge on water and balance the molecule.”
• “Ethanol is in excess so it will need to be used again to neutralize the alcohol group.”
• “The bond with hydrogen is broken so the oxygen can return to a neutral charge.”
• “Another ethanol molecule comes along and remove the hydrogen from the positively charged oxygen in order to eliminate the positive charge.”
• “Bromine is a very good leaving group (it is able to stabilize a negative charge).”

Category Good leaving group
Description Response describes a good leaving group.
Terms/phrases/types None
Rules • <good> + <leaving group>
• <wants to> + <leave>
Example response(s) • “This reaction occurs because bromine is a very good leaving group.”
• “The reaction occurs because the tertiary bromine wants to leave as a leaving group.”

Category Leaving group leaves
Description Response talks about the leaving group leaving.
Terms/phrases/types None
Rules • <leaving group> + <leave>
• <halogen> + <leave>
Example response(s) • “This reaction occurs because the leaving group leaves.”
• “The nucleophile has to wait for the bromine to leave in order to bond to the positive carbocation.”

Category Nucleophile attacks
Description Response talks about the nucleophile attacking, or something acting as a nucleophile.
Terms/phrases/types None
Rules • <nucleophile> + <attack>
• <attack> + <nucleophile>
• <act> + <nucleophile>
Example response(s) • “This leaves behind a positive charge where the HO of the ethanol nucleophile attacks since it has extra lone pairs.”
• “The lone pair on the oxygen acts as a nucleophile.”

Category Nucleophile/electrophile terminology
Description Response talks about a nucleophile or an electrophile.
Terms/phrases/types <nucleophile>, <electrophile>
Rules None
Example response(s) • “The nucleophile attacks the electrophile.”

Category Opposites
Description Response talks about opposing charges.
Terms/phrases/types “dipole”
Rules • <opposite> + <charge>
• <positive> + <negative>
• <negative> + <positive>
• <negative> + <attack> + <positive>
• <positive> + <attack> + <negative>
Example response(s) • “Opposite charges attract and like charges repel.”
• “The carbocation is positively charged attracting the partially negative oxygen in ethanol.”
• “OH then attacks the carbocation because a negative charge is attracted to the positive charge.”
• “The second step would be an O which maintains a partial negative charge attacking the positive compound.”
• “A positive dipole is created on the oxygen of ethanol so another ethanol can remove the hydrogen.”

Category Partial charges
Description Response talks about partial charges.
Terms/phrases/types None
Rules <partial> + <charge>
Example response(s) • “Ethanol acts as a nucleophile because of the partial negative charge on oxygen.”

Category Reaction terminology
Description Response labels the type of reaction (e.g., SN1).
Terms/phrases/types <reaction type>
Rules None
Example response(s) • “Since it is an SN1 reaction, the leaving group leaves first.”
• “SN1 is a unimolecular substitution reaction.”
• “There is also not heat present so the reaction will not favor elimination.”

Category Solvent terminology
Description Response talks about the solvent (e.g., ethanol) or solvent type (e.g., protic).
Terms/phrases/types <solvent>
Rules None
Example response(s) • “The reaction occurs because the substrate is tertiary and the solvent is polar protic.”
• “In the presence of polar aprotic solvents, the leaving group leaves and forms a carbocation.”

Category Stability of carbocation
Description Response talks about the stability of the carbocation
Terms/phrases/types None
Rules • <carbocation> + <stability>
• <stability> + <carbocation>
• <carbocation> + <deficient>
Example response(s) • “The carbocation is unstable and needs to acquire electrons to stabilize the charge.”
• “After bromine leaves, a stable tertiary carbocation forms in its place.”
• “A tertiary carbocation forms that is deficient in electrons.”

Category Stability terminology
Description Response includes terminology related to stability.
Terms/phrases/types <stability>
Rules None
Example response(s) • “This reaction occurs in order to achieve a more stable molecule.”
• “Another ethanol molecule can further stabilize the reaction by removing the hydrogen from the first oxygen.”
• “This reaction occurs due to stability.”
• “The positive charge is not stable.”

Category Sterics
Description Response includes terminology related to sterics.
Terms/phrases/types <sterics>
Rules None
Example response(s) • “Since the substrate is a tertiary carbocation, it is considered sterically hindered.”
• “This occurs because the reaction consists of a bulky base.”
• “The alkyl halide leaves first to create a carbocation and to reduce crowdedness.”

Category Temperature
Description Response includes terminology related to temperature.
Terms/phrases/types <temperature>
Rules None
Example response(s) • “Since it is tertiary and no heat or strong base, it prefers SN1.”
• “The reaction occurs because of lower temperature.”

Category “Wants”
Description Response talks about molecules “wanting” to do things.
Terms/phrases/types <wants to>
Rules None
Example response(s) • “This reaction occurs because bromine is a leaving group that generally wants to break a bond.”

Category Weak/strong base
Description Response describes the strength of the base.
Terms/phrases/types None
Rules • <weak> + <base>
• <strong> + <base>
• <base> + <weak>
• <base> + <strong>
Example response(s) • “SN1 favors weak bases.”
• “This happens because OH is a strong base.”
• “The leaving group leaves because the substrate is tertiary and the base is weak.”

Category Weak/strong nucleophile
Description Response describes the strength of the nucleophile.
Terms/phrases/types None
Rules • <weak> + <nucleophile>
• <strong> + <nucleophile>
• <nucleophile> + <weak>
• <nucleophile> + <strong>
Example response(s) • “SN1 has a weak nucleophile with a highly substituted leaving group.”
• “The reaction is SN1 because it is a tertiary substrate/electrophile and the nucleophile is a strong nucleophile.”
• “Since the nucleophile is weak it then becomes an SN1 mechanism.”
• “The leaving group and nucleophile are strong.”

Appendix 2: types used in logistic regression

Type Words included in type
Accept Accept, attach, gain, get, keep, leave with, pick up, pull off, reach, receive, steal, swipe, interact, draw
Acid Acid
Act Act
Attack Attack, grab, take
Attract Attract, draw, interact, gravitate, intermolecular force, pull, go to
Base Base
Because Because, due to, since
Bond Bond
Break Break, disconnect
Carbocation Positive carbon, cation, carbocation
Carbon Carbon
Charge Charge
Degree Degree, tertiary, quaternary
Density Rich, poor, density, deficient, sufficient, lack
Deprotonate Deprotonate, remove proton, take proton
Do not know Don’t know, don’t remember, no idea, not exactly sure, not sure, unable to answer, unknown, not too sure
Donate Donate, give, let go, take from, transfer
Electronegativity Electronegative, electropositive
Electrons Bond electrons, electron cloud, electron pair, electrons, lone pair
Electrophile Electrophile
Form Form, make, produce
Halogen Bromine, chlorine, iodine, fluorine, halogen
Hydrogen Hydrogen, proton
Ions Ion, anion, cation
Join Join, come together
Just because Appropriate, conditions, how it works, nature of reaction, just because
Leave Alone, depart, detach, exit, go away, leave, move, off by itself, separate, break, kick off, remove itself
Leaving group Leaving group, lg
Lose Lose
Need Need, require
Negative Negative, minus
Neutralize Cancel out, get rid of charge, get rid of the charge, neutralize, no charge, no net charge, not charged, remove charge, remove the charge, uncharge, eliminate charge, eliminate the charge, neutral, relieve
Nucleophile Nucleophile
Number of steps Step, more than one, multiple
Opposite Opposite, different, unbalance, unequal
Partial Partial, slight, unequal
Positive Cation, lack, not neutral, plus, positive
Preposition Between, from, with
Product Product, final
Reaction type Addition, rearrangement, eliminate, substitute, condition, e1, e2, sn1, sn2, backside attack
Sharing Share, divide, distribute
Size Small, large, big, tiny, size
Solvent Excess, polar, protic, aprotic, solvent, nonpolar
Stability Balance, stable, unstable
Starting material Original compound, starting material, reactant, reagent, substrate, ethanol, methanol, alcohol, tert-butyl bromine, tert-butyl iodine, tert-butyl chlorine, water, oxygen
Sterics Steric, bulky, hinder, crowd, constraint, inaccessible
Strong Fine, good, strong, excellent
Temperature Cold, cool, heat, hot, warm, temperature
Wants to Want, fine with, happy to, like, prefer, ready to, desire
Weak Weak, not strong
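To show how the type lexicon above feeds the regression models, the step from a raw response to a binary feature vector can be sketched as follows. The dictionaries are truncated excerpts from Appendix 2 and the helper names are ours; the plain substring matching is a simplification of the full rule system (e.g., “cation” would also match inside unrelated words):

```python
# Illustrative sketch: turn a truncated version of the Appendix 2 type
# lexicon into one binary feature per type, in a fixed order, ready to be
# a row of the regression models' design matrix.
TYPES = {
    "carbocation": ["positive carbon", "cation", "carbocation"],
    "stability": ["balance", "stable", "unstable"],
    "electrons": ["bond electrons", "electron cloud", "electron pair",
                  "electrons", "lone pair"],
}

def type_present(type_name, response):
    # Simplified substring check; the full system uses rules over types.
    text = response.lower()
    return any(word in text for word in TYPES[type_name])

def feature_vector(response, type_names=("carbocation", "stability", "electrons")):
    """One 0/1 feature per lexical type, in the order given."""
    return [int(type_present(t, response)) for t in type_names]
```

For a response such as “The carbocation is unstable.”, this sketch yields the vector [1, 1, 0]: the carbocation and stability types are present, and the electrons type is not.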


  1. Abrams E. and Southerland S., (2001), The how's and why's of biological change: how learners neglect physical mechanisms in their search for meaning, Int. J. Sci. Educ., 23, 1271–1281.
  2. Anzovino M. E. and Bretz S. L., (2016), Organic chemistry students’ fragmented ideas about the structure and function of nucleophiles and electrophiles: a concept map analysis, Chem. Educ. Res. Pract., 17, 1019–1029.
  3. Anzovino M. E. and Bretz S. L., (2015), Organic chemistry students’ ideas about nucleophiles and electrophiles: the role of charges and mechanisms, Chem. Educ. Res. Pract., 16, 797–810.
  4. Becker N., Noyes K. and Cooper M., (2016), Characterizing Students’ Mechanistic Reasoning about London Dispersion Forces, J. Chem. Educ., 93, 1713–1724.
  5. Bhattacharyya G., (2013), From source to sink: mechanistic reasoning using the electron-pushing formalism, J. Chem. Educ., 90, 1282–1289.
  6. Bhattacharyya G. and Bodner G. M., (2005), “It gets me to the product”: how students propose organic mechanisms, J. Chem. Educ., 82, 1402–1407.
  7. Bhattacharyya G. and Harris M. S., (2018), Compromised Structures: Verbal Descriptions of Mechanism Diagrams, J. Chem. Educ., 95, 366–375.
  8. Bodé N. E., Deng J. M. and Flynn A. B., (2019), Getting Past the Rules and to the WHY: Causal Mechanistic Arguments When Judging the Plausibility of Organic Reaction Mechanisms, J. Chem. Educ., 96, 1068–1082.
  9. Cartrette D. P. and Mayo P. M., (2011), Students’ understanding of acids/bases in organic chemistry contexts, Chem. Educ. Res. Pract., 12, 29–39.
  10. Caspari I., Kranz D. and Graulich N., (2018a), Resolving the complexity of organic chemistry students’ reasoning through the lens of a mechanistic framework, Chem. Educ. Res. Pract., 19, 1117–1141.
  11. Caspari I., Weinrich M. L., Sevian H. and Graulich N., (2018b), This mechanistic step is “productive”: organic chemistry students’ backward-oriented reasoning, Chem. Educ. Res. Pract., 19, 42–59.
  12. Cooper M., (2015), Why Ask Why? J. Chem. Educ., 92, 1273–1279.
  13. Cooper M., Kouyoumdjian H. and Underwood S., (2016), Investigating students’ reasoning about acid–base reactions, J. Chem. Educ., 93, 1703–1712.
  14. Crandell O., Kouyoumdjian H., Underwood S. and Cooper M., (2018), Reasoning about Reactions in Organic Chemistry: Starting It in General Chemistry, J. Chem. Educ., 96(2), 213–226.
  15. Cruz-Ramírez de Arellano D. and Towns M. H., (2014), Students’ understanding of alkyl halide reactions in undergraduate organic chemistry, Chem. Educ. Res. Pract., 15, 501–515.
  16. Dood A. J., Fields K. B. and Raker J. R., (2018), Using Lexical Analysis To Predict Lewis Acid–Base Model Use in Responses to an Acid–Base Proton-Transfer Reaction, J. Chem. Educ., 95, 1267–1275.
  17. Dood A. J., Fields K. B., Cruz-Ramírez de Arellano D. and Raker J. R., (2019), Development and evaluation of a Lewis acid–base tutorial for use in postsecondary organic chemistry courses, Can. J. Chem., 1–11.
  18. Ferguson R. and Bodner G., (2008), Making sense of the arrow-pushing formalism among chemistry majors enrolled in organic chemistry, Chem. Educ. Res. Pract., 9, 102–113.
  19. Flynn A., (2016), Growth and Goals Project Details [WWW Document]. Flynn Res. Group.
  20. Flynn A. B. and Ogilvie W. W., (2015), Mechanisms before Reactions: A Mechanistic Approach to the Organic Chemistry Curriculum Based on Patterns of Electron Flow, J. Chem. Educ., 92, 803–810.
  21. Glaser B. G. and Strauss A. L., (1967), The discovery of grounded theory: strategies for qualitative research, Chicago: Aldine Publishing.
  22. Grove N. P., Cooper M. M. and Rush K. M., (2012), Decorating with Arrows: Toward the Development of Representational Competence in Organic Chemistry, J. Chem. Educ., 89, 844–849.
  23. Haudek K. C., Prevost L. B., Moscarella R. A., Merrill J. and Urban-Lurain M., (2012), What are they thinking? Automated analysis of student writing about acid-base chemistry in introductory biology, CBE Life Sci. Educ., 11, 283–293.
  24. Holme T., Bretz S. L., Cooper M., Lewis J., Paek P., Pienta N., Stacy A., Stevens R. and Towns M., (2010), Enhancing the role of assessment in curriculum reform in chemistry, Chem. Educ. Res. Pract., 11, 92–97.
  25. Kaplan J. J., Haudek K. C., Ha M., Rogness N. and Fisher D. G., (2014), Using lexical analysis software to assess student writing in statistics, Technol. Innov. Stat. Educ., 8(1).
  26. Kermack W. O. and Robinson R., (1922), LI.—An explanation of the property of induced polarity of atoms and an interpretation of the theory of partial valencies on an electronic basis, J. Chem. Soc., Trans., 121, 427–440.
  27. Koslowski B., (1996), Theory and Evidence: The Development of Scientific Reasoning, Cambridge, MA: MIT Press.
  28. Lemke J. L., (1990), Talking Science: Language, Learning, and Values, Norwood, NJ: Ablex Publishing Corporation.
  29. Link S., Chukharev-Hudilainen E. and Ranalli J., (2017), Automated writing evaluation for formative assessment of second language writing: investigating the accuracy and usefulness of feedback as part of argument-based validation, Educ. Psychol., 37, 8–25.
  30. Moharreri K., Ha M. and Nehm R. H., (2014), EvoGrader: an online formative assessment tool for automatically evaluating written evolutionary explanations, Evol. Educ. Outreach, 7, 15.
  31. National Research Council, (2011), A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas, Washington, DC: The National Academies Press.
  32. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M. and Duchesnay É., (2011), Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830.
  33. Popova M. and Bretz S. L., (2018), Organic Chemistry Students’ Understandings of What Makes a Good Leaving Group, J. Chem. Educ., 95, 1094–1101.
  34. Prevost L. B., Haudek K., Urban-Lurain M. and Merrill J., (2012), Examining student constructed explanations of thermodynamics using lexical analysis, in 2012 Frontiers in Education Conference Proceedings, Presented at the 2012 Frontiers in Education Conference Proceedings, pp. 1–6.
  35. Prevost L. B., Haudek K. C., Norton Henry E., Urban-Lurain M. and Berry M. C., (2013), Automated text analysis facilitates using written formative assessments for just-in-time teaching in large enrollment courses, Presented at the 120th ASEE Annual Conference & Exposition.
  36. Prevost L. B., Smith M. K. and Knight J. K., (2016), Using student writing and lexical analysis to reveal student thinking about the role of stop codons in the central dogma, CBE-Life Sci. Educ., 15, ar65:1–13.
  37. Russ R. S., Scherr R. E., Hammer D. and Mikeska J., (2008), Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science, Sci. Educ., 92, 499–525.
  38. Schauble L., (1996), The development of scientific reasoning in knowledge-rich contexts, Dev. Psychol., 32, 102–119.
  39. Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15, 10–23.
  40. Shen J., Liu O. L. and Sung S., (2014), Designing Interdisciplinary Assessments in Sciences for College Students: An Example on Osmosis, Int. J. Sci. Educ., 36, 1773–1793.
  41. Sperber D., Premack D. and Premack A. J., (1996), Causal Cognition: A Multidisciplinary Debate, Oxford University Press.
  42. Strauss A. L. and Corbin J. M., (1990), Basics of qualitative research: grounded theory procedures and techniques, Newbury Park, Calif: Sage Publications.
  43. Strickland A. M., Kraft A. and Bhattacharyya G., (2010), What happens when representations fail to represent? Graduate students’ mental models of organic chemistry diagrams, Chem. Educ. Res. Pract., 11, 293–301.
  44. Taber K. S. and Watts M., (1996), The secret life of the chemical bond: students’ anthropomorphic and animistic references to bonding, Int. J. Sci. Educ., 18, 557–568.
  45. Talanquer V., (2007), Explanations and Teleology in Chemistry Education, Int. J. Sci. Educ., 29, 853–870.
  46. Talanquer V., (2013), When Atoms Want, J. Chem. Educ., 90, 1419–1424.
  47. Talanquer V., (2018), Importance of Understanding Fundamental Chemical Mechanisms, J. Chem. Educ., 95, 1905–1911.
  48. Tamir P. and Zohar A., (1991), Anthropomorphism and teleology in reasoning about biological phenomena, Sci. Educ., 75, 57–67.
  49. Wright L., (1972), Explanation and Teleology, Philos. Sci., 39, 204–218.
  50. Wright L., (1976), Teleological Explanations: An Etiological Analysis of Goals and Functions, University of California Press.
  51. Zimmerman B. J. and Martinez-Pons M., (1988), Construct Validation of a Strategy Model of Student Self-Regulated Learning, J. Educ. Psychol., 80, 284–290.
  52. Zimmerman C., (2000), The Development of Scientific Reasoning Skills, Dev. Rev., 20, 99–149.

This journal is © The Royal Society of Chemistry 2020