Analyzing explanations of substitution reactions using lexical analysis and logistic regression techniques
Assessments that aim to evaluate student understanding of chemical reactions and reaction mechanisms should ask students to construct written or oral explanations of mechanistic representations, because students can reproduce pictorial mechanism representations with minimal understanding of what the representations mean. Grading such assessments is time-consuming, which limits their use in large-enrollment courses and delays feedback to students. Lexical analysis and logistic regression techniques can be used to evaluate student written responses in STEM courses. In this study, we use lexical analysis and logistic regression techniques to score a constructed-response item which aims to evaluate student explanations about what is happening in a unimolecular nucleophilic substitution (i.e., SN1) reaction and why. We identify three levels of student explanation sophistication (i.e., descriptive only, surface-level why, and deeper why), and qualitatively describe student reasoning about four main aspects of the reaction: leaving group, carbocation, nucleophile and electrophile, and acid–base proton transfer. Responses scored as Level 1 (N = 113, 11%) include only a description of what is happening in the reaction and do not address the why for any of the four aspects. Level 2 responses (N = 549, 53%) describe why the reaction is occurring at a surface level (i.e., using solely explicit features or mentioning implicit features without deeper explanation) for at least one aspect of the reaction. Level 3 responses (N = 379, 36%) explain the why at a deeper level by inferring implicit features from explicit features explained using electronic effects for at least one reaction aspect. We evaluate the predictive accuracy of two binomial logistic regression models for scoring the responses with these levels, achieving 86.9% accuracy on the testing data set when compared to human coding.
The lexical analysis methodology and emergent scoring framework could be used as a foundation from which to develop scoring models for a broader array of reaction mechanisms.
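As a rough illustration of how two binomial logistic regression models might together yield a three-level score, the sketch below trains both models on word-presence features in pure Python. It is not the study's implementation. The six toy responses are invented for this example, the function names (`tokenize`, `train_logreg`, `score_response`) are hypothetical, and the rule for combining the two models (first "does the response address why?", then "is the why deeper?") is an assumption, since the abstract does not state how the models are combined.

```python
import math
import re

def tokenize(text):
    """Lowercase a response and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map every distinct token in the training texts to a feature index."""
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(tokens)}

def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if the token occurs, else 0.0."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] = 1.0
    return vec

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit a binomial logistic regression by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi  # gradient of the log loss
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

# Toy responses invented for illustration only (not data from the study).
texts = [
    "the bromine leaves and the nucleophile attacks the carbocation",
    "the bond breaks then methanol attacks the carbon",
    "the bromine leaves because it is a good leaving group",
    "methanol attacks because the carbocation is positive",
    "bromine leaves because its large size stabilizes the negative charge",
    "the carbocation is stable because neighboring carbons donate electron density",
]
levels = [1, 1, 2, 2, 3, 3]  # hypothetical human-coded levels

vocab = build_vocab(texts)
X = [featurize(t, vocab) for t in texts]

# Assumed split: model A separates Level 1 from Levels 2-3,
# model B separates Level 3 from Levels 1-2.
model_why = train_logreg(X, [1 if lv >= 2 else 0 for lv in levels])
model_deep = train_logreg(X, [1 if lv == 3 else 0 for lv in levels])

def score_response(text):
    """Combine the two binary predictions into a level from 1 to 3."""
    x = featurize(text, vocab)
    if predict_proba(model_why, x) < 0.5:
        return 1  # descriptive only
    return 3 if predict_proba(model_deep, x) >= 0.5 else 2

predicted = [score_response(t) for t in texts]
accuracy = sum(p == lv for p, lv in zip(predicted, levels)) / len(levels)
```

Here `accuracy` measures agreement with the toy labels on the training responses only. A real evaluation would, as in the study, compare predictions against human coding on a held-out testing data set.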