Reasoning, granularity, and comparisons in students’ arguments on two organic chemistry items

Jacky M. Deng; Alison B. Flynn

doi:10.1039/D0RP00320D


DOI: 10.1039/D0RP00320D (Paper) Chem. Educ. Res. Pract., 2021, 22, 749-771

Jacky M. Deng and Alison B. Flynn *
Department of Chemistry & Biomolecular Sciences, University of Ottawa, 10 Marie Curie, Ottawa, Ontario K1N 6N5, Canada. E-mail: alison.flynn@uOttawa.ca

Received 20th October 2020 , Accepted 5th April 2021

First published on 16th April 2021

Abstract

In a world facing complex global challenges, citizens around the world need to be able to engage in scientific reasoning and argumentation supported by evidence. Chemistry educators can support students in developing these skills by providing opportunities to justify how and why phenomena occur, including on assessments. However, little is known about how students’ arguments vary in different content areas and how their arguments might change between tasks. In this work, we investigated the reasoning, granularity, and comparisons demonstrated in students’ arguments in organic chemistry exam questions. The first question asked them to decide and justify which of three bases could drive an acid–base equilibrium to products (Q1, n = 170). The majority of arguments exhibited relational reasoning, relied on phenomenological concepts, and explicitly compared between possible claims. We then compared the arguments from Q1 with arguments from a second question on the same final exam: deciding and justifying which of two reaction mechanisms was more plausible (Q2, n = 159). The arguments in the two questions differed in terms of their reasoning, granularity, and comparisons. We discuss how course expectations related to the two questions may have contributed to these differences, as well as how educators might use these findings to further support students’ argumentation skill development in their courses.

Introduction

Citizens need to be able to argue from scientific evidence

In a world facing complex global issues (United Nations, 2015), citizens need to be able to make decisions and argue for those decisions using scientific evidence. For example, an evidence-based decision of whether to vaccinate requires deciding to rely on evidence (rather than intuition and emotion), interpreting the quality of the available evidence, and using this evidence to reason for or against a particular decision (Jones and Crow, 2017).

National frameworks for science education in the United States have identified explanations and arguments about phenomena as a key scientific practice (National Research Council, 2012), and the importance of such skills has also been articulated in Europe (European Union, 2006; Jimenez-Aleixandre and Federico-Agraso, 2009), Canada (Social Sciences and Humanities Research Council, 2018), and other international organizations (e.g., Organisation for Economic Cooperation and Development, 2006). However, chemistry education research has found that opportunities for students to argue and explain have largely been absent within traditional chemistry assessments. For example, constructing scientific explanations appeared in less than 10% of American Chemical Society (ACS) general chemistry exam items examined in 2016 (Laverty et al., 2016; Reed et al., 2017). Additionally, an ACS Exam for organic chemistry did not assess students’ ability to construct scientific explanations or arguments at all (Stowe and Cooper, 2017). To better support student development of argumentation and explanation skills, curricula have emerged that explicitly include argumentation and explanation (Talanquer and Pollard, 2010; Cooper and Klymkowsky, 2013), as well as research focused on characterizing argumentation and explanation in laboratory settings (Carmel et al., 2019).

Arguments provide insight into students’ reasoning

Arguments and explanations are distinct. An explanation is used to explain an agreed-upon fact or phenomenon (Osborne and Patterson, 2011; National Research Council, 2012), while an argument justifies a fact or phenomenon that is not agreed upon (McNeill et al., 2006; Kuhn, 2011). In an argument, the claim is in doubt and must be advanced through reasoning about the fit between the evidence and the claim (Toulmin, 1958; Osborne and Patterson, 2011). Arguments therefore provide an opportunity to investigate how students are reasoning about phenomena (Emig, 1977; Berland and Reiser, 2009; Grimberg and Hand, 2009).

Recent studies in chemistry education research have worked to characterize students’ reasoning by analysing their arguments about chemical phenomena (Sevian and Talanquer, 2014; Weinrich and Talanquer, 2016; Bodé et al., 2019; Moon et al., 2019; Moreira et al., 2019). For example, Sevian and Talanquer (2014) interviewed individuals ranging from high school chemistry students to chemistry experts (e.g., professionals in academia and industry). The interviewees were asked to construct arguments when deciding on a fuel to power a GoKart; through their responses, the researchers characterized students’ reasoning as descriptive, relational, linear causal, or multi-component causal. These modes of reasoning have since been used in other studies to characterize students’ reasoning through analysis of arguments and explanations across a variety of contexts and tasks (Sevian and Talanquer, 2014; Weinrich and Talanquer, 2016; Bodé et al., 2019; Moon et al., 2019; Moreira et al., 2019). In the present study, we analysed students’ reasoning in an acid–base context.

Acid–base equilibria are key to many domains of chemistry

Knowledge of acid–base chemistry underpins understanding of the majority of reactions in both organic chemistry and biochemistry, and previous work found that acid–base reactions are often the first reaction type taught to organic chemistry students (Stoyanovich et al., 2015).

Research on acid–base chemistry concepts has identified how many students struggle with the Brønsted–Lowry and Lewis definitions of acids and bases, applying Lewis acid–base chemistry in novel contexts (Bhattacharyya, 2006; Cartrette and Mayo, 2011; McClary and Talanquer, 2011), describing why acid–base reactions proceed in the fashion that they do (Cooper et al., 2016), and interpreting and using data related to acid–base chemistry, such as pK_a and pH data (Krajcik and Nakhleh, 1994; Orgill and Sutherland, 2008; Flynn and Amellal, 2016).

Previous research has also sought to identify students’ misconceptions about individual chemical equilibrium concepts, such as Le Chatelier's principle and chemical equilibrium equations (Wheeler and Kass, 1978; Hackling and Garnett, 1985; Banerjee, 1991; Quilez-Pardo and Solaz-Portoles, 1995; Huddle and Pillay, 1996; Voska and Heikkinen, 2000). However, work is needed that directly investigates students’ competencies using acid–base concepts within the context of chemical equilibria, as the synthesis of these two domains of chemistry underpins many of the phenomena students encounter in biochemical and biological contexts later in their studies (e.g., enzymes, ocean acidification).

Given the foundational role that acid–base chemistry plays in other reactivity, we first sought to investigate how students construct an argument within the context of an acid–base equilibrium. This content area has also yet to be investigated within the current chemistry education literature on argumentation, despite its relative importance in both general and organic chemistry and biochemistry (Duis, 2011).

Studies focused on argumentation in chemistry education have also been limited to single content areas (i.e., investigating students’ arguments for a single question), which has made it difficult to determine how different tasks might influence students’ arguments. For example, students may struggle to generate sophisticated arguments in one content area but may not struggle in other content areas. Therefore, in this work, we next compared students’ arguments in the acid–base question with a previous analysis of students’ arguments in a different content area: comparing mechanistic pathways (Bodé et al., 2019).

Analytical framework

We analysed students’ arguments using a framework with three dimensions: modes of reasoning, granularity, and comparisons (Fig. 1), as described below.


	Fig. 1 The dimensions comprising the analytical framework in this work: reasoning, granularity, and comparisons.

Modes of reasoning. Reasoning has been analysed through a variety of different lenses and frameworks in chemistry education research. These approaches include Type I and II reasoning (Talanquer, 2007, 2017; McClary and Talanquer, 2011; Maeyer and Talanquer, 2013), teleological reasoning (Talanquer, 2007; Abrams and Southerland, 2010; Trommler et al., 2018; Caspari et al., 2018a, 2018b; DeCocq and Bhattacharyya, 2019), abstractedness and abstraction (Sevian et al., 2015; Weinrich and Sevian, 2017), rules-, case-, and model-based reasoning (Windschitl et al., 2008; Kraft et al., 2010; DeCocq and Bhattacharyya, 2019), and causal, mechanistic, and causal mechanistic reasoning (Cooper et al., 2016; Crandell et al., 2018).

In this study, we analysed students’ arguments in terms of four modes of reasoning: descriptive, relational, linear causal, and multi-component causal (Sevian and Talanquer, 2014; Weinrich and Talanquer, 2016; Caspari et al., 2018a). We chose this framework because of its alignment with the intended learning outcomes of the course context in which this study was conducted, including the associated classroom activities related to crafting scientific arguments. We describe each mode below.

Descriptive arguments list or give features and/or the properties of entities (e.g., the reactants, products) without establishing connections. For example, to justify a claim that humans are causing global warming, one might give “Burning fossil fuels generates CO₂.” However, without an explicit link between the evidence and the claim, it is unclear how the evidence is connected to the claim, if at all (e.g., Why is CO₂ important in climate change? How does it have an effect?).

Relational arguments include connections between properties of the entities and their activities, but these relationships are discussed in a correlative fashion (i.e., absent of causality). In other words, connections are stated but the argument does not extend to why these links or evidence are appropriate. For example, to justify a claim that humans are causing global warming, one might state: “Humans are causing global warming because they generate CO₂ by burning fossil fuels.” Compared to the descriptive example, this argument includes an explicit link between the evidence and the claim. However, the reader is left wondering why or how CO₂ contributes to global warming.

Causal arguments include all features of a relational argument and additionally contain cause-and-effect relationships between the relevant properties of the entities and their activities. In other words, links are stated and additional reasoning explains why or how these links are relevant and/or appropriate, often by referencing scientific knowledge, principles, additional evidence, etc. Linear causal arguments establish a single chain of causal relationships between one or more pieces of evidence to justify a claim. For example, a linear causal argument to justify a claim that humans are causing global warming may be: “Humans are causing global warming because they generate CO₂ by burning fossil fuels. CO₂ is a greenhouse gas that contributes to global warming by trapping heat in the Earth's atmosphere.” Here, the second sentence serves as the reasoning that explains the relationship between the claim and evidence in the first sentence.

Multi-component causal arguments establish multiple chains of causal relationships between more than one piece of evidence to support a claim. A multi-component causal argument to justify the claim that humans are causing global warming may include the same linear causal example above, but with an added “chain” of causal reasoning to support the original claim, such as: “Humans are causing global warming because they generate CO₂ from burning fossil fuels. CO₂ is a greenhouse gas that contributes to global warming by trapping heat in the Earth's atmosphere. In addition, humans participate in agricultural activities that increase CH₄ generation, another greenhouse gas that traps heat in the Earth's atmosphere.” The argument could continue even further by describing the chemical properties of CO₂ and CH₄ that make them greenhouse gases, a concept that we describe below as levels of granularity.
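The distinctions among the four modes can be summarized as a small decision rule. The sketch below is only an illustration under simplifying assumptions: it supposes that a coder has already counted, by hand, the explicit links in an argument and the number of separate causal chains that justify those links. The function name and its two inputs are hypothetical and are not part of the coding scheme used in this study.

```python
def mode_of_reasoning(n_links: int, n_causal_chains: int) -> str:
    """Map hypothetical hand-coded counts onto the four modes of reasoning.

    n_links: explicit connections between evidence and claim.
    n_causal_chains: separate chains explaining why or how a link holds.
    """
    if n_links == 0:
        # Features are listed, but nothing is connected to the claim.
        return "descriptive"
    if n_causal_chains == 0:
        # Links are stated, but only in a correlative fashion.
        return "relational"
    if n_causal_chains == 1:
        # A single cause-and-effect chain supports the claim.
        return "linear causal"
    # More than one cause-and-effect chain supports the claim.
    return "multi-component causal"


# The global warming examples above, reduced to assumed counts.
print(mode_of_reasoning(0, 0))  # "Burning fossil fuels generates CO2."
print(mode_of_reasoning(1, 0))  # claim linked to CO2, no mechanism given
print(mode_of_reasoning(1, 1))  # CO2 chain only
print(mode_of_reasoning(2, 2))  # CO2 chain plus CH4 chain
```

Reducing an argument to two counts discards most of what a human coder attends to, so this rule should be read as a summary of the definitions rather than as a substitute for qualitative coding.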

Levels of granularity. Beyond being constructed at different levels of reasoning, arguments can be constructed at different levels of granularity (Fig. 2). For example, a justification for why aspirin acts as an acid in water may focus on pH and pK_a data (a phenomenological level) or how aspirin has a carboxylic acid functional group that is resonance-stabilized when deprotonated (underlying level that includes structural, electronic, and energetic factors) (Talanquer, 2018a). Different contexts and tasks require different levels of granularity, as different phenomena may be explained from increasingly large macroscopic perspectives (e.g., global levels and beyond) or increasingly small submicroscopic perspectives (e.g., atomic levels and beyond) (Darden, 2002). The idea of granularity has been described in other work on scientific reasoning, including scales (Talanquer, 2018b), levels (van Mil et al., 2013), nested hierarchies (Southard et al., 2017), emergence (with ideas of downward and upward causality) (Luisi, 2002), and bottom-out reasoning (Darden, 2002).


	Fig. 2 Different contexts/tasks require different levels of granularity.

In this study, we categorized students’ arguments into four levels of granularity relevant to the questions they were asked: phenomenological, energetic, structural, and electronic.

The phenomenological level captures descriptions of chemical phenomena that arise from the interactions of molecules and atoms and their structural, electronic, and energetic properties. For example, within a given context, the favoured direction of a chemical equilibrium may be a phenomenon to be explained. The interplay of structural, electronic, and energetic properties/interactions of molecules and atoms can be used to determine and justify the direction of an equilibrium. Depending on the task, arguments may also be focused on other phenomenological factors that can be equally valid; for example, pK_a data could be used to determine the direction of an acid–base equilibrium.

The structural level captures descriptions of structural features of molecules and atoms. For example, in an acid–base equilibrium context, a student's description of the relative stability of two basic atoms would be considered discussion at the structural level. In the context of an organic chemistry mechanism, a structural discussion might include identifying steric bulk around a particular reactive centre and connecting the steric interactions to the effects at the transition state (energetic level). The structural level itself contains grain sizes, such as cells, biomolecules, small molecules, molecular fragments, functional groups, and individual atoms.

The electronic level captures descriptions of electronic features of molecules and atoms. For example, electronegativity and partial charges could be used to explain reactivity at the electronic level. Other examples might compare formal charges and electron density on basic atoms to justify the direction of an equilibrium, or discuss molecular orbitals to describe electronic features of molecules.

The energetic level captures descriptions of the energetics of reactions, including thermodynamic and kinetic considerations. Descriptions at this level could include considering the relative stabilities of conjugate acids/bases to justify the direction of an equilibrium or justifying the plausibility of various reaction mechanisms based on activation energies.

In this study, we used these four levels of granularity based on the concepts and ideas identified in students’ responses, the intended learning outcomes related to the questions we analysed, as well as previous theoretical work related to chemistry students’ reasoning (Machamer et al., 2000; Luisi, 2002; van Mil et al., 2013; Southard et al., 2017; Talanquer, 2018a). Different levels of granularity may be more relevant for other contexts, such as other content areas within chemistry or other disciplines (biology, physics, etc.). For example, chemical reactions and equilibria may be the phenomena to be explained in the chemical contexts investigated in this study and the highest level of granularity needed for these contexts, while in molecular biology contexts, these phenomena may be the deepest level of granularity needed for an explanation (e.g., explaining why a substrate binds to an enzyme).

Levels of comparison. A comparison is needed when an argument involves two or more possible claims, or when there are various factors that influence an outcome, phenomenon, or claim (Toulmin, 1958). Without a comparison, a species cannot be more/less, bigger/smaller, or faster/slower than another. Comparing between alternatives is also a key aspect of scientific practice; for example, to justify why global warming is happening, one might leverage evidence to refute counterclaims (claims that global warming does not exist). In the questions used in this study, students had to argue for one of multiple claims, thereby providing an opportunity to construct arguments in which they compared their claim to alternatives. The arguments may include full, partial, or no comparisons.

Goals and research questions

We characterized students’ arguments for an acid–base equilibrium question (Q1) in terms of the concepts, links, and comparisons that were articulated, specifically using the following research question (RQ):

(1) When constructing an argument to decide which base will drive an equilibrium towards products:

(a) What concepts do students include?

(b) What links do students establish between concepts?

(c) What concepts do students use to compare between possible claims?

(d) What modes of reasoning do the arguments exhibit?

Next, we used the findings from RQ1 to compare with the analysis of a question that prompted students to compare mechanisms (Q2) (Bodé et al., 2019). Specifically, we investigated:

(2) How might students’ arguments differ on two different organic chemistry questions from a single exam in terms of reasoning, granularity, and comparisons?

Methods

Setting and course

This research was conducted within the context of an Organic Chemistry II course at a large, bilingual, research-intensive university in Canada. At this institution, introductory organic chemistry is offered across two semesters as Organic Chemistry I (OCI) and Organic Chemistry II (OCII). OCI is offered in the winter semester of students’ first year of studies while OCII is offered in both the following summer and fall. Students can take the courses in either English or French. OCII is a 12-week course (∼400 students per section) consisting of two weekly classes (1.5 hours each, mandatory, lecture or flipped format) and a voluntary tutorial session (1.5 hours) (Flynn, 2015, 2017). Assessments for the course comprise in-class participation via a classroom response system, online homework assignments, two midterms, and a final exam. Enrolment in the course is ∼75% Faculty of Science students, ∼17% Faculty of Health Sciences students, and ∼8% students from other faculties. General topics addressed in OCII include reactions with σ electrophiles (i.e., S_N1/S_N2/E1/E2 and oxidation reactions), introduction to ¹H NMR and IR spectroscopy, reactions of π electrophiles with leaving groups, and reactions with activated π nucleophiles (e.g., aldol reactions) (Flynn and Ogilvie, 2015; Ogilvie et al., 2017; Raycroft and Flynn, 2020).

Data source. We analysed and compared findings from students’ responses to two final exam questions (Fig. 3) from the OCII 2017 final exam. Question 1 (Q1, n = 170) asked students to justify the direction of an acid–base equilibrium and Question 2 (Q2, n = 159) asked students to justify why one of two similar reaction mechanisms was more plausible (S_N1 versus S_N2). For Q1, pK_a values were not provided to students, though values for chemical analogues were provided in a data table attached to the exam. Each question followed Toulmin's claim–evidence–reasoning pattern, as students were asked to (a) choose a claim given multiple options and (b) justify their choice in an argument using evidence and reasoning. Prior to our analysis, we received Research Ethics Board approval (H03-15-18).


	Fig. 3 The acid–base equilibrium question (Q1, top) and the comparing mechanisms question (Q2, bottom). Both questions prompted students for their claim, evidence, and reasoning.

The analysis of concepts, links, comparisons, and modes of reasoning for Q2 had been previously reported as part of a separate research work (Bodé et al., 2019). We used this analysis to support our investigation of RQ2. Therefore, when discussing concepts, links, and comparisons in this work (RQ1), we report only the analysis and findings for Q1.

Coding process. The first part of our analysis focused on the concepts, links, and comparisons in students’ arguments. We initially identified these components based on the expected answer to Q1 (Appendix B), which was constructed based on the intended acid–base learning outcomes from the OCII course (Appendix C). This process established content validity for the initial coding scheme, ensuring that we defined the initial scheme using concepts that matched course expectations. During the coding process, we added codes that were not present in the initial coding scheme but were present in students’ answers. We included these additional codes even if they were described in error or were irrelevant to the question.

The analysis proceeded in the following sequence:

(1) Identifying concepts present in the argument and whether these concepts were discussed correctly or with errors.

(2) Identifying links between individual concepts in the argument and whether these links were canonically correct or not.

(3) Identifying which concepts were used to explicitly compare/contrast between possible claims.

Only explicit instances of concepts were coded. For example, we only coded for the concept of “base strength” if the argument included phrases like “NaH is a strong base”. Links between concepts were said to be present only when the student was explicitly linking between concepts with words like “because”, “therefore”, “so”, etc. A concept was said to compare between claims if reference was made to one or more of the other possible claims. For example, “NaH is a stronger base than NH₃” or “NaH is a strong base and NH₃ is a weak base” would warrant a comparison code for a base strength concept code. Next, we classified the mode of reasoning of each argument as descriptive, relational, linear causal, or multi-component causal using the definitions provided in Table 1. For example, a descriptive argument was defined as one in which a student simply described concepts or features of molecules but did not make any connections between these statements (e.g., stating a claim and providing some evidence, but not connecting these ideas). In contrast, a linear causal response was said to be present if a student made a claim (e.g., “The equilibrium will favour products…”), justified that claim with a concept/feature (“because NaH is a strong base…”), and justified this connection by explaining why a strong base drives the equilibrium towards products (“A strong base drives the equilibrium towards products because it has a conjugate acid with the highest pK_a value”). Appendix A provides additional examples of the coding process for the modes of reasoning.

Table 1 Descriptions and examples of the modes of reasoning

Mode of reasoning	Description	Example
Descriptive	Claims are supported solely by describing features and/or the properties of entities (e.g., the reactants, products, etc.). No connections are established	NaH is the strongest base
Relational	Relationships between properties of the reaction materials and their activities are discussed in a correlative fashion (i.e., a cause–effect relationship is not established)	NaH will drive the equilibrium towards products because its conjugate acid has a higher pK_a value (35) than the alkyne (25)
Linear causal	All of the features of a relational argument are present and are accompanied by cause-and-effect relationships between the relevant properties of the entities and their activities	NaH will drive the equilibrium towards products because its conjugate acid (pK_a value 35) is weaker/more stable than the starting alkyne (pK_a value 25), according to the relevant pK_a values. NaOH and NH₃ are not strong enough bases to drive the equilibrium to the product side, as both of their conjugate acids are stronger than the starting alkyne (pK_a values of 15.7 and 9, respectively)
Multi-component causal	Phenomena are seen as the result of the static or dynamic interplay of more than one factor and the direct interactions of several components	The hydride is the only base of the choices that is sufficiently strong to drive the equilibrium to products. The hydride base is smaller and less electronegative than carbon, making it less able to stabilize a negative charge than the alkyne anion (conjugate base)
		The oxygen atom in NaOH is more electronegative than the carbon atom in the conjugate base, making it better able to stabilize the negative charge. Although atomic size (larger carbon atom) and hybridization suggest that the carbon atom in the conjugate base could be more stable, the pK_a values of the acid and conjugate acids support the conclusion that NaOH is the weaker base, since the stronger the acid, the weaker its conjugate base (pK_a value of H₂O = 15.7; pK_a of alkyne = 25)
		NH₃ is similarly too weak to drive the equilibrium to the product side, with the nitrogen atom being neutral in addition to being more electronegative than the carbon anion
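
The pK_a comparison used in the example arguments in Table 1 can be made quantitative. For a proton transfer from an acid HA to a base, the equilibrium constant satisfies log₁₀ K = pK_a(conjugate acid of the base) − pK_a(HA), so a positive value indicates that products are favoured. The sketch below applies this standard relation to the pK_a values quoted in Table 1; the variable and function names are illustrative and were not part of the exam or the coding scheme.

```python
# pK_a values as quoted in the Table 1 examples (illustrative names).
PKA_ALKYNE = 25.0
PKA_OF_CONJUGATE_ACID = {"NaH": 35.0, "NaOH": 15.7, "NH3": 9.0}


def log10_keq(pka_acid: float, pka_conjugate_acid: float) -> float:
    """log10 K for a proton transfer: K = Ka(acid) / Ka(conjugate acid of base)."""
    return pka_conjugate_acid - pka_acid


def drives_to_products(base: str) -> bool:
    """True when the equilibrium with the alkyne favours products (log10 K > 0)."""
    return log10_keq(PKA_ALKYNE, PKA_OF_CONJUGATE_ACID[base]) > 0


for base, pka in PKA_OF_CONJUGATE_ACID.items():
    print(base, round(log10_keq(PKA_ALKYNE, pka), 1), drives_to_products(base))
```

With these values, only NaH gives a positive log₁₀ K (35 − 25 = 10), which matches the claim made in the example arguments.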

Canonical correctness of the links was not a factor when deciding on the mode of reasoning, as (a) we were principally interested in students’ domain-general abilities to reason and (b) an argument can still be logically sound while including canonically incorrect information (Toulmin, 1958).

To support our analysis, we drew diagrams to visually represent students’ arguments. These diagrams allowed us to visually organize the units (links and concepts) within students’ arguments, helping us assign a mode of reasoning to each argument. Examples of diagrams to facilitate analysis of arguments have been previously described (Verheij, 2003; Moreira et al., 2019) and we provide examples of diagrams used in this work in Appendix A.

We assigned a level of granularity to each argument based on the granularity of the concepts identified in the first part of the analysis (Table 2). For example, in Q1, we categorized an argument relating two concepts—direction of an equilibrium and pK_a values—to be at the phenomenological level of granularity because this argument did not consider any underlying factors that contributed to these phenomena (i.e., it did not acknowledge any energetic, structural, or electronic factors). In contrast, an answer that discussed how the electronegativity of a particular atom (electronic) could be used to determine the relative stability of the molecule (energetic) and the direction of an equilibrium (phenomenological) was coded as having discussed concepts at three distinct levels of granularity.

Table 2 Examples of concepts at each level of granularity for Q1 and Q2. Concepts with a * indicate ones that students proposed in their responses but were unexpected based on course learning outcomes

Examples of concepts at each level of granularity
Level of granularity	Q1: acid–base equilibrium	Q2: comparing mechanisms
Phenomenological	Direction of equilibrium, pK_a values, conjugate acid strength, base strength, *reaction pathway	*Reaction likelihood
Structural	Atom size	Steric hindrance, number of α-carbon substituents
Electronic	Electronegativity, formal charge, *nucleophilicity	Hyperconjugation
Energetic	Relative stability	Activation energy
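
The assignment of granularity can be pictured as a lookup from coded concepts to levels, followed by collecting the distinct levels present in an argument. The sketch below uses the Q1 concepts listed in Table 2; the dictionary, the function name, and the example concept lists are hypothetical illustrations, not the procedure the coders followed.

```python
# Q1 concepts from Table 2 mapped to their level of granularity.
LEVEL_OF_CONCEPT = {
    "direction of equilibrium": "phenomenological",
    "pKa values": "phenomenological",
    "conjugate acid strength": "phenomenological",
    "base strength": "phenomenological",
    "reaction pathway": "phenomenological",
    "atom size": "structural",
    "electronegativity": "electronic",
    "formal charge": "electronic",
    "nucleophilicity": "electronic",
    "relative stability": "energetic",
}


def levels_of_granularity(concepts):
    """Return the set of distinct levels touched by an argument's coded concepts."""
    return {LEVEL_OF_CONCEPT[concept] for concept in concepts}


# An argument relating only equilibrium direction and pKa values.
shallow = levels_of_granularity(["direction of equilibrium", "pKa values"])
# An argument linking electronegativity, stability, and equilibrium direction.
deeper = levels_of_granularity(
    ["electronegativity", "relative stability", "direction of equilibrium"]
)
print(sorted(shallow), len(deeper))
```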

Lastly, we coded each argument as one of three levels of comparison—isolated, partially compared, and fully compared—based on the degree to which concepts in the argument were used to compare between the possible claims (Table 3). For example, if an argument included the concepts base strength and acid strength, but both these codes were discussed only in terms of the chosen claim, then we coded this argument as isolated. If one (but not both) of these concepts was used to compare to another possible claim (“NaH is a stronger base than NH₃”), then we coded this argument as partially compared. If both concepts were used to compare to another base (“NaH is a stronger base than NH₃, which means H₂ is a weaker conjugate acid than NH₄⁺”), then we coded this argument as fully compared.

Table 3 Descriptions for each level of comparison from Bodé et al. (2019)

Level of comparison	Description
Isolated	Concepts in argument for a claim are all discussed in isolation from the other possible claim. Concepts are never used to compare/contrast between the claims
Partially compared	Some (but not all) concepts in argument for a claim are discussed in relation to the other possible claim. These concepts are used to compare/contrast between the claims
Fully compared	All concepts in argument for a claim are discussed in relation to the other possible claim. All concepts are used to compare/contrast between the claims
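
The three levels in Table 3 follow from a comparison of two sets: the concepts coded in an argument and the subset of those concepts that were used to compare between claims. A minimal sketch of that rule is given below; the function name and the example sets are hypothetical and serve only to restate the definitions.

```python
def level_of_comparison(concepts: set, compared: set) -> str:
    """Classify an argument from its coded concepts and its compared concepts."""
    if not concepts:
        raise ValueError("an argument needs at least one coded concept")
    used_to_compare = concepts & compared
    if not used_to_compare:
        # No concept refers to another possible claim.
        return "isolated"
    if used_to_compare == concepts:
        # Every concept refers to another possible claim.
        return "fully compared"
    # Some, but not all, concepts refer to another possible claim.
    return "partially compared"


concepts = {"base strength", "acid strength"}
print(level_of_comparison(concepts, set()))
print(level_of_comparison(concepts, {"base strength"}))
print(level_of_comparison(concepts, {"base strength", "acid strength"}))
```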

Inter-rater reliability

To evaluate and improve the reliability of our qualitative analysis, a second coder analysed a subset of exams for the units outlined in Table 4 using the method described above to establish inter-rater reliability (Krippendorff, 1970; Hallgren, 2012). We used Krippendorff's α as a statistical measure to evaluate agreement between coders (Krippendorff, 1970). Unlike percent agreement, Krippendorff's α accounts for chance agreement between coders. We calculated inter-rater reliability only for the analysis of concepts, links, comparisons, and modes of reasoning, because levels of granularity and levels of comparison were derived from the coded concepts and comparisons, respectively.

Table 4 Krippendorff's α values obtained from inter-rater analysis for units in students’ arguments. Acceptable agreement: α > 0.67

Unit of analysis	Krippendorff's α, Round 1	Krippendorff's α, Round 2	Krippendorff's α, Round 3
Concepts	0.58	0.77	0.82
Links between concepts	0.42	0.71	0.86
Concepts used to compare	0.58	0.86	0.95
Mode of reasoning	0.40	0.49	0.74

For each question, after the primary coder coded the entire set of responses, the second coder used the first iteration of the codebook to code a subset of 15% of students’ arguments. Both coders then met to discuss differences between their respective analyses. The most common challenges in our coding were (1) determining whether a student was making implied references to links or comparisons and (2) determining the arguments’ mode of reasoning. For example, one argument stated “NaH is the strong base. The equilibrium is forced to the products.” In this case and similar cases, the coders were unsure about the presence/absence of implied links and comparisons. Based on these discussions, we decided to code mainly for explicit references to links and comparisons to limit the number of assumptions we could make during our analysis. Any assumptions about implied references were first discussed with other raters before making a final decision. We repeated the inter-rater process with new subsets of data (15% of the dataset) until the two coders obtained a Krippendorff's α greater than 0.67, the threshold of acceptability for inter-rater reliability (Krippendorff, 1970), for each of the units described in Table 4. Between each round of the inter-rater process, the codebook (Appendix A) was revised based on discussions between the two raters.
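
To make the agreement statistic concrete, the sketch below computes Krippendorff's α for the simplest case: two coders, nominal categories, and no missing codes, using α = 1 − D_o/D_e, where D_o is the observed disagreement and D_e is the disagreement expected by chance. This is an assumption about the variant; the text does not specify how α was computed in this study, and the codes in the example are invented for illustration.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")
    n = 2 * len(coder_a)  # total number of codes assigned
    # Each unit on which the coders disagree contributes two ordered pairs.
    observed = 2 * sum(a != b for a, b in zip(coder_a, coder_b))
    # Pooled frequency of each category across both coders.
    totals = Counter(coder_a) + Counter(coder_b)
    # Sum of n_c * n_k over all ordered pairs of different categories.
    expected = n * n - sum(count * count for count in totals.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


# Invented binary codes for ten arguments (e.g., link present = 1, absent = 0).
coder_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(coder_1, coder_2)
print(round(alpha, 3), alpha > 0.67)
```

In this invented example the coders agree on six of ten units, yet α falls far below the 0.67 threshold, which illustrates why a chance-corrected measure was preferred over percent agreement.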

Results and discussion

The following sections related to RQ1 describe findings from our analysis of the concepts, links, and comparisons identified in students’ arguments to Q1. We had collected similar data for Q2 in previous work (Bodé et al., 2019) and used this previously collected data for investigating RQ2.

RQ1a: What concepts do students include?

For Q1, we found differences in the concepts discussed depending on whether students provided a correct or incorrect claim (Fig. 4). Arguments with correct claims more frequently discussed the direction of the equilibrium, conjugate acid strength, and the pK_a values of conjugate acids. In the context of the OCII course, all three of these concepts were relevant to the claim and were key concepts employed in the expected answer for this question (Appendix B).


	Fig. 4 For Q1, concepts discussed in arguments for correct claims (n = 110, left) and incorrect claims (n = 60, right).

For incorrect claims, the two most frequently discussed concepts were base strength and reaction pathways. For example, Student 10 provided the following argument which used base strength to justify a suggested reaction pathway:

Student 10: “NaH is the strong base choice therefore it is most likely to react by deprotonating the carbon.” [emphasis by the authors]

Despite base strength being the most prevalent concept discussed in incorrect claims, the majority of arguments for incorrect claims discussed this concept incorrectly. This was found to be reflective of a broader trend, as correct claims were more frequently justified with concepts that were discussed correctly compared to incorrect claims.

RQ1b: What links do students establish between concepts?

We visualized the links made between concepts in students’ arguments for Q1 using Gephi data visualization software. Nodes represent concepts; edges (i.e., a line between two nodes) represent links between two concepts (Fig. 5). The frequency of links between two concepts is correlated with thickness of the edge. In other words, a thicker edge represents two nodes (concepts) that were more frequently connected in students’ arguments. In contrast, a node with no edges represents a concept that had no links to other concepts in the dataset.


	Fig. 5 For Q1, connections between concepts made for correct claims (left, n = 110) and incorrect claims (right, n = 60).
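As a rough illustration of how a weighted edge list could be assembled before visualization, the sketch below tallies how often each pair of concepts is linked across coded arguments. The link data and the helper name `count_links` are hypothetical; the codes follow Table 5. Treating links as undirected and counting each pair at most once per argument are assumptions made for this sketch.

```python
from collections import Counter


def count_links(coded_arguments):
    """Count undirected concept-concept links across a set of coded arguments."""
    edge_weights = Counter()
    for links in coded_arguments:
        # Count each pair at most once per argument, ignoring direction.
        unique_pairs = {tuple(sorted(pair)) for pair in links}
        edge_weights.update(unique_pairs)
    return edge_weights


# Hypothetical coded links for three arguments (codes as in Table 5).
coded_arguments = [
    [("pK_a", "Acidstr"), ("Acidstr", "Equilfav")],
    [("Basestr", "React")],
    [("Acidstr", "pK_a"), ("Acidstr", "Equilfav")],
]

for (source, target), weight in sorted(count_links(coded_arguments).items()):
    print(source, target, weight)
```

Each resulting weight corresponds to the edge thickness described above: the more arguments that link two concepts, the thicker the edge between their nodes.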

Three concepts were the most prevalent in correct claims: the direction of the equilibrium, conjugate acid strength, and pK_a values of conjugate acids. These were also the three concepts that exhibited the most frequent connections. Often, arguments for correct claims included a triad of concepts and links that included stating the respective pK_a values of the conjugate acids of the given bases, using these pK_a values to rank the relative strengths of the conjugate acids, then using these rankings to justify the extent to which an equilibrium involving each base/conjugate acid would favour a particular direction. For example, Student 116 provided the following argument which included this triad:

Student 116: “I chose NaH as a base because its conjugate acid has a pK_a value of around 36, which makes it a weaker acid than the starting material. The equilibrium will favour the side with the weaker acid. I did not choose NaOH or NH₃ because their respective conjugate acids would have a pK_a value less than that of the SM [starting material], meaning that the equilibria would favour the starting materials (pK_a ∼15.7 for H₂O and ∼10 for NH₄⁺).”

In some cases, this type of argument was expanded to include a discussion of base strength. These arguments included identifying the relationship between the relative strengths of the conjugate acids from the relative strengths of the bases, and then using these ideas in concert to determine the direction of the equilibrium.

The most common connection made in incorrect claims was between base strength and reaction pathway. In these cases, students often used base strength as the principal concept to justify how their chosen base (or all three bases) would react with the alkyne or the acyl chloride. For example, Student 43 provided the following argument, which linked NaOH being a strong base to how the base would proceed in a reaction (compared to the other options):

Student 43: “[NaOH is] a strong base that can remove the hydrogen from the alkyl chain, whereas the other bases are weaker and need more activation energy to remove the hydrogen.”

We suspect that students who linked the codes base strength and reaction pathway may have done so in a rote fashion. This link was present in both incorrect claims and correct claims; however, in correct claims, base strength was also linked to other concepts, such as conjugate acid strength.

RQ1c: What concepts do students use to compare between claims?

Fig. 6 shows how often a given concept was used in a comparison between claims. Correct claims primarily compared between claims during discussions of pK_a values of conjugate acids, conjugate acid strength, and the direction of the equilibrium. For example, Student 14 listed the pK_a values for all three conjugate acids, used these to compare the relative strength of the acids based on these values, then described which direction the equilibrium would favour in each case:


	Fig. 6 For Q1, how often each concept was used to compare between claims in arguments for both correct (left, n = 110) and incorrect (right, n = 60) claims.

Student 14: “I chose NaH as the base because its conjugate acid has a higher pK_a value than the alkyne. That means that the conjugate acid is a weak acid, weaker than the alkyne, so the reaction will favour the products. I did not choose NaOH or NH₃ because their conjugate acids have smaller values than the alkyne, driving the equilibrium towards the starting materials.”

In contrast, incorrect claims primarily compared between claims using the concepts of base strength and reaction pathways. A common example was a student stating that one base was stronger than the other two bases, leading them to conclude that the stronger base would be able to react as a base with the alkyne. For example, Student 55's argument:

Student 55: “NaH will take the H of the bonding end of the triple bond to make H₂(g). NaH is a much stronger base than NaOH and NH₃. NaOH and NH₃ are too weak to deprotonate the alkyne. NH₃ would break the triple bond and add NH₂ to the end of the triple bond. NaOH wouldn’t react at all. NaH when a solution has H⁻ floating around, which are extremely reactive.”

Determining the levels of comparison for Q1, arguments for correct claims more frequently compared against the other possible claims than arguments for incorrect claims, χ²(1, N = 170) = 11.2, p = 0.001, ϕ = 0.257 (Fig. 7). In other words, students who provided correct claims were more likely to compare and contrast between claims, while students who provided incorrect claims were more likely to discuss their claim in isolation of the other possible claims.


	Fig. 7 Levels of comparison for Q1. Students who provided correct claims (n = 110) were more likely to compare and contrast between claims, while students who provided incorrect claims (n = 60) were more likely to discuss their claim in isolation of the other possible claims.
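For a 2 × 2 table, the effect size ϕ follows from the reported test statistic as ϕ = √(χ²/N). The short check below recomputes it for the Q1 levels-of-comparison result reported above. The function name is hypothetical, and the sketch assumes the reported χ² is the uncorrected Pearson statistic.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Effect size phi for a 2 x 2 contingency table: sqrt(chi^2 / N)."""
    return math.sqrt(chi_square / n)


# Values reported for levels of comparison in Q1: chi^2(1, N = 170) = 11.2.
print(round(phi_from_chi_square(11.2, 170), 3))  # 0.257
```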

RQ1d: What modes of reasoning do the arguments exhibit?

For Q1, the majority of students (65%) provided the correct claim (i.e., chose the correct base) for which base would drive the equilibrium in question to products (Fig. 8). However, causal reasoning was present in only 31% of all answers (either linear causal or multi-component causal). Correct claims more frequently exhibited causal arguments than incorrect claims (linear causal and multi-component causal), while incorrect claims more frequently exhibited descriptive arguments than correct claims. The frequency of causal arguments was significantly different between arguments for correct claims vs. arguments for incorrect claims, χ²(1, N = 170) = 18.1, p < 0.001 with a medium effect size, ϕ = 0.33.


	Fig. 8 Modes of reasoning for students’ arguments in Q1 (correct claims, n = 110; incorrect claims, n = 60). Students who were arguing for correct claims were more likely to exhibit causal modes of reasoning.

Relational arguments were the most prevalent across all student arguments for Q1 (48% of all answers). The most common relational argument discussed how a chosen base was a strong base (base strength) that was strong enough to drive the equilibrium towards products (direction of the equilibrium). Other relational arguments were similar but discussed acid strength or pK_a values in place of base strength. The commonality here was that these arguments did not include discussions of why base strength, acid strength, or pK_a values would affect the direction of the equilibrium. In contrast, a common linear causal argument discussed how the equilibrium would favour the products due to differences in pK_a values and would then explain why these pK_a values were relevant to the claim by referencing how pK_a values enabled comparison between relative acids strengths. For example, the first part of Student 19's argument linked the direction of the equilibrium to conjugate acid strength, and justified this link with pK_a values:

Student 19: “The equilibrium of the first step is dependent on the acid–base reaction and as a result, it is dependent on which side does [sic] the stronger acid lie. Based on the structure of the reactant, the more acidic proton is at the terminal alkyne (pK_a 50 [C–H sp³] vs. 24 [C–H sp]), so the appropriate base must have a weaker conjugate acid…”

Although this argument is linear causal, it has a phenomenological level of granularity, as there is no discussion of any underlying factors that contribute to acid strength/pK_a values and the direction of an equilibrium. The latter portion of Student 19's argument does achieve a deeper level of granularity by relating these phenomena to electronic factors, such as electronegativity:

Student 19 (continued): “…Based on the electronegativity of OH and NH₃, they would serve as better bases than the alkyne as the greater electronegativity of O and N allowing the ionized forms to better stabilize a negative charge (for O, making the ⁻OH a more stable base than the ionized alkyne) and less able to stabilize a positive charge (for N, NH₄⁺ (CA for NH₃) is more acidic than alkynes and hence, shifts equilibrium to the alkyne).”

Multi-component causal arguments were only present in arguments for correct claims. The most common multi-component causal arguments justified the direction of the equilibrium using both base concepts (base strength, electronegativity) and acid concepts (conjugate acid strength, pK_a values).

RQ2: How might students’ arguments differ on two different organic chemistry questions from a single final exam in terms of reasoning, granularity, and comparisons?

The distributions for the modes of reasoning for Q1 arguments differ qualitatively from the modes of reasoning for Q2 arguments uncovered in our previous work (Bodé et al., 2019) (Fig. 9). To determine the statistical significance of these differences, we compared the respective percentages of causal and non-causal arguments between Q1 and Q2 to determine the extent to which students’ reasoning differed between the two questions. We found that arguments for Q2 had significantly more causal arguments than for Q1 (linear and multi-component), with a medium effect size, χ²(1, N = 329) = 20.456, p < 0.001, ϕ = 0.27.


	Fig. 9 Modes of reasoning for the acid–base equilibrium (Q1, n = 170) and comparing mechanisms (Q2, n = 159) questions.

Next, we determined the levels of granularity using the concepts identified in arguments for both Q1 and Q2. Each level of granularity had a different number of underlying concepts (e.g., for Q1, five concepts were considered phenomenological, while only two concepts were considered electronic). We therefore normalized the different number of concepts that could be described at each level of granularity by dividing the frequency of concepts at each level of granularity by the number of possibilities for each level (e.g., for Q1: dividing the sum of all concepts at phenomenological level by five).
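The normalization described above can be sketched as follows. The number of Q1 concepts per level (five phenomenological, two electronic) comes from the text; the frequencies and the function name `normalize_by_level` are hypothetical and serve only to show the arithmetic.

```python
def normalize_by_level(frequencies, concepts_per_level):
    """Divide the concept frequency at each level by the number of concepts at that level."""
    return {
        level: frequencies[level] / concepts_per_level[level]
        for level in frequencies
    }


# Number of Q1 concepts per level as stated in the text; frequencies are hypothetical.
concepts_per_level = {"phenomenological": 5, "electronic": 2}
frequencies = {"phenomenological": 400, "electronic": 40}

print(normalize_by_level(frequencies, concepts_per_level))
# {'phenomenological': 80.0, 'electronic': 20.0}
```

Without this step, a level with more codable concepts would appear more prevalent simply because it offers more opportunities to be coded.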

Because Q1 and Q2 assessed different conceptual knowledge and required different levels of granularity, we qualitatively compared the granularity expressed in students’ arguments for the two questions (Fig. 10). For Q1, the concepts were primarily at a phenomenological level of granularity (e.g., arguments focused on pK_a values, conjugate acid strength, direction of the equilibrium); however, some students’ arguments included concepts from more granular levels (e.g., electronegativity, formal charge, stability). For Q2, the majority of concepts were at the structural and energetic levels, which included concepts such as number of α-carbon substituents, number of carbocation substituents, and activation energy.


	Fig. 10 The proportion of concepts exhibited at each level of granularity for both Q1 (acid–base, n = 503) and Q2 (comparing mechanisms, n = 468). Descriptions for each level of granularity are described in Table 2.

We also investigated how students compared between claims in Q1 versus Q2 (Fig. 11). Students more frequently compared concepts (either partially or fully) on Q2 than Q1, χ²(1, N = 329) = 10.748, p = 0.001, ϕ = 0.18. Additionally, when investigating the relative frequencies of partial versus full comparisons, we found that students more frequently made full comparisons on Q2 than Q1, χ²(1, N = 329) = 36.170, p < 0.001, ϕ = 0.354.


	Fig. 11 Differences in the levels of comparison between Q1 (acid–base equilibrium, n = 170) and Q2 (comparing mechanisms, n = 159).

We were interested in identifying potential factors for why a single group of students produced arguments that differed in terms of reasoning, granularity, and comparisons on a single exam. Therefore, we compared the intended and enacted learning outcomes from the OCII course for the two questions (Stoyanovich et al., 2015; Raycroft and Flynn, 2020). Intended learning outcomes (ILOs) are defined as the knowledge, skills, and values students are expected to demonstrate by the end of a course (Biggs and Tang, 2011), which are often described in course syllabi. We analysed ways in which the ILOs were taught, practiced, and assessed through the course (Dixson and Worrell, 2016; Carle and Flynn, 2020; Raycroft and Flynn, 2020). First, we reviewed the OCII course syllabus for ILOs relevant to Q1 and Q2 (full list available in Appendix C). We then reviewed how these ILOs were enacted in the course notes and videos (taught), problem sets and in-class activities (practiced), and midterms and exams (assessed).

Reviewing the course materials related to Q1 and Q2, we found that how these questions were taught, practiced, and assessed aligned well with how students responded to these questions. For Q1, students were expected throughout the course to be able to justify the direction of acid–base equilibria using both chemical factors and pK_a data (Flynn; Stoyanovich et al., 2015; Flynn and Amellal, 2016). However, in cases where chemical factors were competing—for example, a base in the starting materials being resonance stabilized but the conjugate base in the products bearing a larger and more electronegative atom—students could rely on pK_a data of the acids (i.e., experimental evidence) to make their final decision. This is the case in Q1, as orbitals/hybridization suggests that NaH is more stable than the acetylene anion, but electronegativity and charge suggest the opposite. Therefore, students may have focused their arguments on pK_a data to come to a final decision, perhaps resulting in the less granular, non-causal arguments found in our analysis.

In contrast, for Q2, students were expected to leverage a combination of structural and energetic information when making decisions about whether S_N1/S_N2 and E1/E2 reactions would occur (examples of course notes in Appendix C). Further, on the midterm exam earlier in the course, students had been asked a question similar to Q2 of this study in which they were expected to justify which of two mechanisms was more plausible by establishing connections between the structural features of molecules and energetic information within reaction coordinate diagrams. These activities may have reinforced expectations throughout the class about generating more granular, causal arguments for questions like Q2, such as those we uncovered in our analysis.

Conclusions

This study provides insight into how students construct arguments when justifying the direction of an acid–base equilibrium (RQ1) as well as how students’ arguments can differ between content areas within chemistry (RQ2). This work adds to a growing body of research on analysing students’ abilities to justify claims about chemical phenomena through argumentation and reasoning.

For Q1, arguments for correct and incorrect claims were focused on different sets of concepts; arguments with correct claims more frequently discussed pK_a values and conjugate acid strength while arguments with incorrect claims more frequently discussed relative base strength and described how molecules would react (RQ1a). Arguments for correct claims more frequently linked the direction of the equilibrium to pK_a values, conjugate acid strength, and relative base strength, while incorrect claims more frequently linked relative base strength to descriptions of how molecules would react (RQ1b). Arguments for correct claims more frequently made complete comparisons between the different bases, while incorrect claims more frequently discussed claims in isolation of other possibilities. Lastly, arguments with correct claims more frequently exhibited causal reasoning (linear causal and multi-component causal), while incorrect claims more often exhibited descriptive reasoning.

Related to the second research question (RQ2), Q1 arguments demonstrated more relational reasoning compared to Q2 arguments, which demonstrated more causal reasoning. In general, concepts discussed in Q1 were more phenomenological, often focusing on pK_a values or general descriptors (e.g., strong acid, strong base) to justify claims. In comparison, arguments for Q2 more often argued using underlying factors, such as structural and energetic information, to justify their claims. Lastly, Q2 arguments exhibited more complete comparisons between claims than Q1 arguments.

Students’ arguments on the two questions broadly aligned with how these questions were taught, practiced, and assessed within the course context (Fig. 12). These findings reinforce the notion that students’ arguments (including the reasoning, granularity, and comparisons demonstrated in an argument, as shown in this work) depend on the course context, the stakes, and how well expectations are communicated, in addition to students’ actual abilities (Kelly et al., 1998; Sadler, 2004; Sadler and Zeidler, 2005; von Aufschnaiter et al., 2008; Barwell, 2018; Cian, 2020). For example, research on students’ arguments in other content areas in organic chemistry, such as delocalization, has also found that students’ arguments can differ depending on the task/context (Carle et al., 2020). In summary, from Q1 and Q2 combined, over 60% of students in this work demonstrated that they can construct causal arguments, but whether they choose to will depend on appropriateness and need (Bodé et al., 2019).


	Fig. 12 Aligning different factors within a course context can help support student achievement of the intended learning outcomes.

Implications for teaching and research

If we expect students to argue in a particular way and leverage specific concepts and/or evidence in their arguments, then as educators we need to be explicit and consistent in how we communicate these expectations within our course contexts (Fig. 12) (Bernholt and Parchmann, 2011; Stoyanovich et al., 2015; Weinrich and Talanquer, 2016; Caspari et al., 2018b; Carle and Flynn, 2020). As noted by Macrie-Shuck and Talanquer (2020): “the complex nature of mechanistic reasoning in chemistry demands integrating multiple pieces of knowledge and connecting various scales (e.g., macro, multiparticle, single-particle), dimensions (compositional, energetic), and modes of description and explanation (phenomenological, mechanical, structural). Developing mastery in this area likely demands time and sustained and concerted effort across multiple courses and areas of knowledge.” Although causal arguments are suggested to be more sophisticated modes of reasoning in various frameworks used to characterize reasoning, this mode of reasoning is not necessarily “better” than any other mode. What is considered “better” reasoning depends on the argument's context and purpose; in scientific practice and chemical thinking, less “sophisticated” arguments may be completely acceptable, practical, and successful for accomplishing a given task and meeting a certain expectation.

One potential avenue to further investigate the influence of course context and expectations on students’ arguments might be asking students to construct two arguments, each with a different mode of reasoning, level of granularity, and level of comparison, respectively, and to determine whether students are able to effectively traverse across these dimensions when constructing arguments. Another option would be to provide students with pre-constructed arguments and ask them to identify the reasoning, granularity, and comparison(s). For example, the OCII course has incorporated assessment items that explicitly prompt students to consider the different chemical factors and pK_a data involved in making decisions about chemical equilibria (Fig. 13).


	Fig. 13 Example of assessment item to prompt students to consider factors at various levels of granularity.

Limitations

Open responses such as the ones analysed in this study provide rich insight into students’ thinking; however, they are limited in that they may give an incomplete picture. For example, the design of the prompts presented in this work may have influenced the types of responses students generated (e.g., no multicomponent reasoning exhibited in Q2). We decided to analyse students’ written arguments in this work to allow for statistical analysis of trends within a larger sample. Other qualitative methods such as interviews and focus groups would provide researchers with even richer insight and more opportunities for dialogue and inquiry.

Conflicts of interest

There are no conflicts to declare.

Appendix A: codebook and coding examples

List of concepts (Table 5)

Concepts are defined as key words or phrases in a student's argument that indicate their use of a chemical property, concept, phenomenon, etc.

Table 5 Codes for concepts proposed in arguments, with notes to guide appropriate application

Code	Concept	Notes
pK_a	pK_a value of conjugate acids	Reference to the pK_a value of any (or all) of the conjugate acids described
		• May include explicit discussion of relative pK_a values without referring to actual values (e.g., pK_a value of H₂O is lower than that of H₂)
		• Correct if the pK_a value for the chosen base is stated correctly or discussed correctly relative to other bases
		• Incorrect if the pK_a for the chosen base is stated incorrectly or discussed incorrectly relative to other bases
Acidstr	Conjugate acid strength	Reference to the strength of the conjugate acid(s) of any of the bases provided
		• Must explicitly discuss acid strength of the conjugate acid(s) (using terms like “strong” and/or “weak”)
		• Do not code if argument only mentions “conjugate acid” without reference to the strength of that conjugate acid
Equilfav	Direction of equilibrium	Reference to the direction favoured in an acid–base equilibrium
		• Correct if the favoured direction is aligned with other concepts (stability, pK_a, etc.)
		• Incorrect if the favoured direction is misaligned with other concepts (stability, pK_a, etc.)
Basestr	Base strength	Reference to the strengths (“strong”, “weak”) of any of the three bases
		• Correct if NaH > NaOH > NH₃ discussed explicitly or implicitly
		○ Explicit: NaH is a stronger base than NaOH or NH₃
		○ Implicit: NaH is the strongest base
		• Incorrect if comparisons between bases are incorrect
Stab	Stability of molecule	Reference to the stability of a molecule
		• Does not need to include detailed discussion of energetics; simply saying “stable” warrants inclusion of this code
		• Also code for references to favourability of lower-energy species (“H₂ is a low energy species”)
		• Correct if the conclusion about stability is aligned with other concepts (acid strength, pK_a, etc.)
		• Incorrect if the conclusion about stability is misaligned with other concepts (acid strength, pK_a, etc.)
React	Reaction pathway	Describing how a molecule will react
		• Example: “Because NaH is a strong base, it will be able to deprotonate the alkyne”
		• Correct if the correct reactive pathway is described (base deprotonating acid)
		• Incorrect if incorrect reactive pathway is described (nucleophilic attack of carbonyl, etc.)
Electroneg	Electronegativity	Reference to the electronegativity of the basic atoms on any of the given bases
		• Code if differences in electron density are discussed or if an atom is said to have a greater capacity to attract electrons; argument does not need to explicitly say “electronegativity”
		• Correct if O > N > H discussed explicitly or implicitly
FC	Formal charge	Reference to the charges (or lack of charges) on any of the bases
		• Must be explicitly discussed as a concept in-text (“The oxygen has a negative charge, which means…”)
		• Correct if appropriate charge assigned to species
Size	Atom size	Reference to the atomic size of any of the bases
		• Correct if B > A > C discussed explicitly or implicitly
		• B > A, C acceptable. For example, “NH₃ is the largest base of the three” would be coded as correct
		• Incorrect if comparisons between bases are incorrect
LeChat	Le Châtelier's principle	References to the equilibrium “shifting” to compensate for loss of gaseous products
		• Correct if phrases describing Le Châtelier's principle are used; the argument does not need to say “Le Châtelier's principle” explicitly
Other	Other	Concepts that are unique and infrequent. Provide supplementary notes to define what “other” is referring to
Nuc	Nucleophilicity	Discussing the nucleophilicity of bases
		• Can be discussed correctly even though nucleophilicity not relevant to the argument
		• Correct if A, B > C discussed explicitly or implicitly (C is less nucleophilic and A, B are more nucleophilic)
		• Incorrect if comparisons between bases are incorrect

Coding sequence for concepts, links, and comparisons

1. Concepts

For each concept, the following considerations should be made:

• Is it discussed correctly [y(g)] (y = yes, concept is present; g = correct) or with errors [y(e)] (y = yes, concept is present; e = error)?

• Note that there may be cases where the discussion of the concept itself is correct while the link to another concept is incorrect

○ Example: NaH is the strongest base because it is the smallest of the three bases.

– Relative base strength and atom size are each discussed correctly.

2. Links

• Is the basis for the link correct (g) or incorrect (e)?

• In most cases, a link will be established in an argument as a student describes how Concept 1 impacts or explains Concept 2.

• If two concepts are linked, but the argument moved through an additional concept to link them, be sure to code all three concepts as being linked.

○ Example: if someone states that pK_a is linked to conjugate acid strength which is then linked to base strength, do not code pK_a as being linked to base strength (unless the two ideas are explicitly linked elsewhere). Links in this case would be pKa – acidstr – basestr.

3. Comparisons

• Example: “NaH is the strongest base” or “NaH is a stronger base than NaOH which is a stronger base than NH₃”

• If no comparisons are made, still include the base in reference to which the concept is being discussed.

○ If the argument only discusses the pK_a of A, then the comparison portion of the code would simply have “A”

In a given cell in an Excel spreadsheet (in this example, coding for an instance of the concept “base strength”):

Table 6 Descriptions of the four modes of reasoning and associated diagrams

Level of reasoning	Description	Example diagram
Descriptive (D)	Claims are supported solely by describing features and/or the properties of entities (e.g. the reactants, products, etc.)
Relational (R)	Relationships between properties of the reaction materials and their activities are discussed in a correlative fashion (i.e., a cause–effect relationship is not established)
Linear causal (L)	All of the features of a relational argument are present and are accompanied by cause-and-effect relationships between the relevant properties of the entities and their activities
Multi-component causal (M)	Phenomena are seen as the result of the static or dynamic interplay of more than one factor and the direct interactions of several components

Coding Examples

Example 1

Concepts: Base strength discussed incorrectly. Steric hindrance (coded as other in this context) discussed correctly.
Level of granularity: Base strength is a phenomenological concept. Steric hindrance is a structural concept.
Links: No links established between any concepts.
Mode of reasoning: Descriptive. No explicit links established between any of the concepts described.
Comparisons: NH₃ explicitly stated to be stronger base than NaOH and bulkier than NaH.
Level of comparison: Completely compared. Both concepts are explicitly used to compare between the bases.

Example 2

Concepts: Base strength discussed correctly. “Strength of lone pairs” considered as other and discussed incorrectly. A reactive pathway is described but it is incorrect.
Level of granularity: Base strength is a phenomenological concept. Describing a reactive pathway is phenomenological.
Links: Base strength is linked to the reactive pathway and this link is justified through “strength of lone pairs” (other).
Mode of reasoning: Linear causal. The suggested reactive pathway is said to be dependent on base strength, and this is justified with “strength of lone pairs” (other).
Comparisons: Base strength and “strength of lone pair” explicitly compared between all three bases. The reactive pathway suggested is only discussed in relation to NaH.
Level of comparison: Partially compared.

Example 3

Concepts: pK_a values, conjugate acid strength, base strength, direction of the equilibrium, electronegativity, stability, and Le Chatelier's principle all discussed correctly.
Level of granularity: pK_a values, conjugate acid strength, base strength, direction of the equilibrium, and Le Chatelier's principle are all phenomenological. Electronegativity is an electronic concept. Stability is an energetic concept.
Links: The direction of the equilibrium is linked to pK_a values, conjugate acid strength, base strength, electronegativity. Stability is linked to Le Chatelier's principle and electronegativity.
Mode of reasoning: Multi-component causal. The direction of equilibrium is justified in four distinct ways: (1) pK_a values, (2) conjugate acid strength, (3) electronegativity with reference to how this influences stability, and (4) Le Chatelier's principle with reference to how this influences stability.
Comparisons: pK_a values and conjugate acid strength are only discussed for NaH. Base strength and electronegativity are used to compare NaOH and NH₃. Only NaH is included in discussions about Le Chatelier's principle. All three bases are compared in discussions involving the direction of the equilibrium and stability.
Level of comparison: Partially compared.

Example 4

Concepts: pK_a values, base strength, direction of the equilibrium, and stability all discussed correctly.
Level of granularity: pK_a values, base strength, and direction of the equilibrium are all phenomenological. Stability is an energetic concept.
Links: The direction of the equilibrium is linked to pK_a values and base strength. Stability is linked to pK_a values.
Mode of reasoning: Linear causal. The direction of the equilibrium is said to be dependent on pK_a values, and this link is justified with reference to stability. Base strength is also offered as a factor, but no causal reasoning is provided for why this is relevant to the direction of an equilibrium.
Comparisons: All concepts described are discussed in a general comparison across all three claims.
Level of comparison: Fully compared.

Example 5

Concepts: pK_a values and direction of the equilibrium both discussed correctly.
Level of granularity: pK_a values and direction of the equilibrium are both phenomenological.
Links: The direction of the equilibrium is linked to pK_a values.
Mode of reasoning: Relational. The direction of the equilibrium is said to be dependent on pK_a values, but this link is not justified with additional causal reasoning.
Comparisons: Concepts are used to discuss NaH only.
Level of comparison: Isolated.

Appendix B: expected answer based on intended acid–base learning outcomes

1. Draw the major product of the reaction in the box above.

2. Circle the base below that can be used to force the equilibrium of the first step to the product side:

3. Explain your answer in part b (why you chose one base and did not choose the others), using chemical structures as part of your answer.

Relevant pK_a values in H₂O:

NH₃ (NH₄⁺): 9.2

NaOH (H₂O): 15.7

sp C–H: 24 (acetylene; estimated)

NaH (H₂): 35 (estimated, see J. Chem. Soc., Chem. Comm., 1976, 648.)

Arguments (Table 6)

Expected (based on learning outcomes and expectations in that specific question): For one of the indicated bases to drive the acid–base equilibrium in the reaction illustrated above towards the products, it must be a less stable and stronger base than the carbanion. A base with a pK_a value of its conjugate acid greater than that of the alkyne (24) will be sufficiently strong. The pK_a values of the conjugate acids of NaH, NaOH, and NH₃ are 35, 15.7, and 9.2, respectively. Therefore, NaH is the only base capable of driving the equilibrium to products as it is the only base with a pK_a of its conjugate acid (35) greater than that of the alkyne (24). Using either NaOH or NH₃ would instead result in the equilibrium favouring reactants.
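The pK_a comparison in the expected answer can also be expressed numerically. The following is a minimal sketch, not part of the original exam or scoring scheme, assuming the standard relationship log₁₀ K = pK_a(conjugate acid of the base) − pK_a(acid) for a proton-transfer equilibrium. The function and variable names are hypothetical; the pK_a values are those listed above.

```python
# pKa values of the conjugate acids of the three candidate bases (from the list above)
PKA_CONJUGATE_ACID = {"NaH": 35.0, "NaOH": 15.7, "NH3": 9.2}
PKA_ALKYNE = 24.0  # sp C-H (acetylene; estimated)


def log10_keq(pka_acid, pka_conjugate_acid_of_base):
    """Return log10(K) for the proton transfer from the acid to the base.

    Assumes K = 10 ** (pKa(conjugate acid of base) - pKa(acid)).
    """
    return pka_conjugate_acid_of_base - pka_acid


def favours_products(base):
    """True if the equilibrium with this base lies on the product side (K > 1)."""
    return log10_keq(PKA_ALKYNE, PKA_CONJUGATE_ACID[base]) > 0


if __name__ == "__main__":
    for base, pka in PKA_CONJUGATE_ACID.items():
        print(base, round(log10_keq(PKA_ALKYNE, pka), 1), favours_products(base))
```

Under these assumptions, only NaH gives a positive log₁₀ K (about 11), whereas NaOH and NH₃ give negative values (about −8.3 and −14.8), consistent with the expected answer.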

More granular causal argument (with chemical reasons as justification, molecular and atomic): Chemical properties and pK_a values together can be used to justify why NaOH and NH₃ will not be suitable bases to drive the acid–base equilibrium to products, and why NaH will be suitable. These properties are used to explain the relative stability of the base (sodium hydroxide, ammonia, or sodium hydride) and conjugate base (acetylide).

The base and conjugate base involved in the equilibrium with sodium hydroxide have two competing chemical factors affecting their stability: the oxygen atom in the hydroxide (base) is more electronegative than the carbon atom in the conjugate base, which stabilizes the negative charge of the base more than the conjugate base. The electrons in the conjugate base are in an sp-hybridized orbital, which stabilizes the electrons more than the oxygen atom's electrons in the base that are in an sp³-hybridized orbital (by virtue of being closer to the protons in the nucleus and therefore having a lesser effective negative charge). pK_a evidence indicates that the electronegativity factor dominates over the hybridization factor, with H₂O being a stronger acid (pK_a = 15.7) than the acetylene (pK_a = 24). The equilibrium favours the side with the weaker (more stable) species.

Ammonia will similarly not be a suitable base to drive the equilibrium to products. Hybridization factors stabilize the electrons on the C in the conjugate base more than on the N in the base (sp versus sp³-hybridized, as above). However, both electronegativity and charge factors contribute to greater stability of the base than the conjugate base. The greater electronegativity of N (base) compared to C (conjugate base) stabilizes the base more than the conjugate base. Since charge decreases the stability of a species, the neutral nitrogen atom (base) is more stabilized than the negatively charged carbon atom (conjugate base). pK_a evidence indicates that the electronegativity and charge factors dominate over the hybridization factor, with ammonium being a stronger acid (pK_a = 9.2) than the acetylene (pK_a = 24). The equilibrium favours the side with the weaker (more stable) species.

Sodium hydride will be a suitable base to drive the equilibrium to products. Hybridization/orbital factors stabilize the electrons on the H in the base more than on the C in the conjugate base (s versus sp-hybridized). However, both electronegativity and atom size contribute to greater stability of the conjugate base. The greater electronegativity of C (conjugate base) compared to H (base) stabilizes the conjugate base more than the base. Because carbon is a larger atom than hydrogen, the larger atom can better disperse the electron density (and negative charge), stabilizing the conjugate base more than the base. pK_a evidence indicates that the electronegativity and atom size factors dominate over the hybridization/orbital factor, with the acetylene being a stronger acid (pK_a = 24) than H₂ (pK_a = 35). The equilibrium favours the side with the weaker (more stable) species, in this case, the product.

Appendix C: CHM 2120 2017 intended and enacted learning outcomes for Q1 and Q2

Acid–base learning outcomes

Required skills

• Draw the mechanism of an acid–base reaction, given the starting materials

• Identify the acid, base, conjugate acid, and conjugate base, given the starting materials

• Deprotonate a given molecule

• Protonate a given molecule

• Draw the conjugate acid and conjugate base, given a molecule

• Estimate the pK_a value of a given molecule

Key concepts

Apply the following ideas

• The stronger the acid, the weaker its conjugate base (and vice versa)

• An equilibrium favours the direction with the weaker (more stable) species (acid or base)

• The lower the pK_a value, the stronger the acid

• The following terms are synonymous: stronger = less stable = higher energy and weaker = more stable = lower energy

Applying required skills and key concepts to:

• Compare pK_a values of acids

• Compare relative stabilities of two species (e.g., bases or acids), analyzing the effects of the following chemical principles: electronegativity, atom size, resonance, hybridization, inductive effects, charge, solvent

In the following contexts:

• Within a single molecule

• Between multiple molecules

• In an acid–base equilibrium

Integrate acid–base concepts in situations including:

• Identify the strongest acid and base that can exist in a given solvent

• Identify the predominant form of a molecule at a given pH (Henderson–Hasselbalch equation)

• Draw (or select from a list) a base/acid that could quantitatively deprotonate/protonate a given acid/base
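As an illustration of the Henderson–Hasselbalch outcome listed above, the following worked example is a sketch that is not taken from the course materials. It assumes a solution at pH 7 (a hypothetical value chosen for illustration) and uses the pK_a of ammonium (9.2) given in Appendix B.

```latex
\[
\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
\quad\Longrightarrow\quad
\frac{[\mathrm{NH_3}]}{[\mathrm{NH_4^+}]} = 10^{\,\mathrm{pH}-\mathrm{p}K_a} = 10^{\,7-9.2} \approx 6\times10^{-3}
\]
```

Under these assumptions, the protonated form (NH₄⁺) predominates at pH 7, because the pH is below the pK_a of the acid.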

Note: For associated explanations and activities, see http://www.flynnresearchgroup.com/acid-base and/or the acid–base module in https://orgchem101.com/ (Flynn).

Course learning outcomes relevant to comparing mechanisms:

• Identify the leaving group, α carbon, base, nucleophile, electrophile, and solvent in a given reaction

• Decide whether a reaction is likely to proceed via an E1/S_N1, E2, or S_N2 mechanism using structural and energetic features

• Draw a mechanism for an E1, S_N1, E2, and S_N2 reaction, given various starting materials

• Identify appropriate solvents for each reaction

• Draw and label the reaction coordinate diagram for E1, S_N1, E2, and S_N2 reactions

• Predict the major product of the reaction, including stereochemical and regiochemical considerations

Acknowledgements

We thank Myriam Carle for her assistance with the inter-rater reliability portion of this study. JD thanks the Natural Sciences and Engineering Research Council for funding in the form of a Canadian Graduate Scholarship (Master's).

Notes and references

Abrams E. and Southerland S., (2010), The how's and why's of biological change: How learners neglect physical mechanisms in their search for meaning, Int. J. Sci. Educ., 23(12), 1271–1281.
Banerjee A. C., (1991), Misconceptions of students and teachers in chemical equilibrium, Int. J. Sci. Educ., 13(4), 487–494.
Barwell R., (2018), Word problems as social texts, Numer. Soc. Pract. Glob. Local Perspect., 101–120.
Berland L. K. and Reiser B. J., (2009), Making sense of argumentation and explanation, Sci. Educ., 93, 26–55.
Bernholt S. and Parchmann I., (2011), Assessing the complexity of students’ knowledge in chemistry, Chem. Educ. Res. Pract., 12(2), 167–173.
Bhattacharyya G., (2006), Practitioner development in organic chemistry: How graduate students conceptualize organic acids, Chem. Educ. Res. Pract., 7(4), 240–247.
Biggs J. and Tang C., (2011), Aligning assessment tasks with intended learning outcomes: principles, in Teaching for Quality Learning at University, pp. 191–223.
Bodé N. E., Deng J. M. and Flynn A. B., (2019), Getting past the rules and to the WHY: Causal mechanistic arguments when judging the plausibility of organic reaction mechanisms, J. Chem. Educ., 96(6), 1068–1082.
Carle M. S. and Flynn A. B., (2020), Essential learning outcomes for delocalization (resonance) concepts: How are they taught, practiced, and assessed in organic chemistry? Chem. Educ. Res. Pract., 21(2), 622–637.
Carle M. S., El Issa R., Pilote N. and Flynn A. B., (2020), Ten essential delocalization learning outcomes: How well are they achieved? ChemRxiv, 1–28.
Carmel J. H., Herrington D. G., Posey L. A., Ward J. S., Pollock A. M. and Cooper M. M., (2019), Helping Students to “do Science”: Characterizing scientific practices in general chemistry laboratory curricula, J. Chem. Educ., 96(3), 423–434.
Cartrette D. P. and Mayo P. M., (2011), Students’ understanding of acids/bases in organic chemistry contexts, Chem. Educ. Res. Pract., 12(1), 29–39.
Caspari I., Kranz D. and Graulich N., (2018a), Resolving the complexity of organic chemistry students’ reasoning through the lens of a mechanistic framework, Chem. Educ. Res. Pract., 19(4), 1117–1141.
Caspari I., Weinrich M. L., Sevian H. and Graulich N., (2018b), This mechanistic step is “productive”: Organic chemistry students’ backward-oriented reasoning, Chem. Educ. Res. Pract., 19(1), 42–59.
Cian H., (2020), The influence of context: Comparing high school students’ socioscientific reasoning by socioscientific topic, Int. J. Sci. Educ., 42(9), 1–19.
Cooper M. and Klymkowsky M., (2013), Chemistry, life, the universe, and everything: A new approach to general chemistry, and a model for curriculum reform, J. Chem. Educ., 90(9), 1116–1122.
Cooper M. M., Kouyoumdjian H. and Underwood S. M., (2016), Investigating students’ reasoning about acid–base reactions, J. Chem. Educ., 93(10), 1703–1712.
Crandell O. M., Kouyoumdjian H., Underwood S. M. and Cooper M. M., (2018), Reasoning about reactions in organic chemistry: Starting it in general chemistry.
Darden L., (2002), Strategies for discovering mechanisms: Schema instantiation, modular subassembly, forward/backward chaining, Philos. Sci., 69(S3), 354–365.
DeCocq V. and Bhattacharyya G., (2019), TMI (Too much information)! Effects of given information on organic chemistry students’ approaches to solving mechanism tasks, Chem. Educ. Res. Pract., 20(1), 213–228.
Dixson D. D. and Worrell F. C., (2016), Formative and summative assessment in the classroom, Theory Pract., 55(2), 153–159.
Duis J. M., (2011), Organic chemistry educators’ perspectives on fundamental concepts and misconceptions: An exploratory study, J. Chem. Educ., 88(3), 346–350.
Emig J., (1977), Writing as a mode of learning, Coll. Compos. Commun., 28(2), 122–128.
European Union, (2006), Recommendation of the European Parliament and of the Council of 18 December 2006 on key competences for lifelong learning, Off. J. Eur. Union, L394/19–L394/18.
Flynn A. B., (2017), Flipped chemistry courses: Structure, aligning learning outcomes, and evaluation, in Online Approaches to Chemical Education, American Chemical Society, pp. 151–164.
Flynn A. B., (2015), Structure and evaluation of flipped chemistry courses: Organic & spectroscopy, large and small, first to third year, English and French, Chem. Educ. Res. Pract., 16(2), 198–211.
Flynn A. B., OrgChem101.
Flynn A. B. and Amellal D. G., (2016), Chemical information literacy: pK_a values-where do students go wrong? J. Chem. Educ., 93(1), 39–45.
Flynn A. B. and Ogilvie W. W., (2015), Mechanisms before reactions: A mechanistic approach to the organic chemistry curriculum based on patterns of electron flow, J. Chem. Educ., 92(5), 803–810.
Grimberg B. I. and Hand B., (2009), Cognitive pathways: Analysis of students’ written texts for science understanding, Int. J. Sci. Educ., 31(4), 503–521.
Hackling M. W. and Garnett P. J., (1985), Misconceptions of chemical equilibrium, Eur. J. Sci. Educ., 7(2), 205–214.
Hallgren K. A., (2012), Computing inter-rater reliability for observational data: An overview and tutorial, Tutor. Quant. Methods. Psychol., 8(1), 23–34.
Huddle P. A. and Pillay A. E., (1996), An in-depth study of misconceptions in stoichiometry and chemical equilibrium at a South African University, J. Res. Sci. Teach., 33(1), 65–77.
Jimenez-Aleixandre M. P. and Federico-Agraso M., (2009), Justification and persuasion about cloning: Arguments in Hwang's paper and journalistic reported versions, Res. Sci. Educ., 39(3), 331–347.
Jones M. D. and Crow D. A., (2017), How can we use the “science of stories” to produce persuasive scientific stories, Palgrave Commun., 3(1), 1–9.
Kelly G. J., Druker S. and Chen C., (1998), Students’ reasoning about electricity: Combining performance assessments with argumentation analysis, Int. J. Sci. Educ., 20(7), 849–871.
Kraft A., Strickland A. M. and Bhattacharyya G., (2010), Reasonable reasoning: Multi-variate problem-solving in organic chemistry, Chem. Educ. Res. Pract., 11(4), 281–292.
Krajcik J. S. and Nakhleh M. B., (1994), Influence of levels of information as presented by different technologies on students’ understanding of acid, base, and pH concepts, J. Res. Sci. Teach., 31(10), 1077–1096.
Krippendorff K., (1970), Estimating the reliability, systematic error and random error of interval data, Educ. Psychol. Meas., 30(1), 61–70.
Kuhn D., (2011), The skills of argument, Cambridge University Press.
Laverty J. T., Underwood S. M., Matz R. L., Posey L. A., Carmel J. H., Caballero M. D., et al., (2016), Characterizing college science assessments: The three-dimensional learning assessment protocol, PLoS One, 11(9), 1–21.
Luisi P. L., (2002), Emergence in chemistry: Chemistry as the embodiment of emergence, Found. Chem., 4(3), 183–200.
Machamer P., Darden L. and Craver C. F., (2000), Thinking about Mechanisms, Philos. Sci., 67(1), 1–25.
MacRie-Shuck M. and Talanquer V., (2020), Exploring Students' Explanations of Energy Transfer and Transformation, J. Chem. Educ., 97(12), 4225–4234.
Maeyer J. and Talanquer V., (2013), Making predictions about chemical reactivity: Assumptions and heuristics, J. Res. Sci. Teach., 50(6), 748–767.
McClary L. and Talanquer V., (2011), Heuristic reasoning in chemistry: Making decisions about acid strength, Int. J. Sci. Educ., 33(10), 1433–1454.
McNeill K. L., Lizotte D. J., Krajcik J. and Marx R. W., (2006), Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials, J. Learn. Sci., 15(2), 153–191.
Moon A., Moeller R., Gere A. R. and Shultz G. V., (2019), Application and testing of a framework for characterizing the quality of scientific reasoning in chemistry students’ writing on ocean acidification, Chem. Educ. Res. Pract., 20(3), 484–494.
Moreira P., Marzabal A. and Talanquer V., (2019), Using a mechanistic framework to characterise chemistry students’ reasoning in written explanations, Chem. Educ. Res. Pract., 20(1), 120–131.
National Research Council, (2012), A Framework for K-12 Science Education, National Academies Press.
Ogilvie W. W., Ackroyd N., Browning S., Deslongchamps G., Lee F. and Sauer E., (2017), Organic Chemistry: Mechanistic Patterns, 1st edn, Nelson Education Ltd.
Organisation for Economic Cooperation and Development, (2006), Assessing scientific, reading and mathematical literacy: A framework for PISA.
Orgill M. and Sutherland A., (2008), Undergraduate chemistry students’ perceptions of and misconceptions about buffers and buffer problems, Chem. Educ. Res. Pract., 9(2), 131–143.
Osborne J. F. and Patterson A., (2011), Scientific argument and explanation: A necessary distinction? Sci. Educ., 95(4), 627–638.
Quilez-Pardo J. and Solaz-Portoles J. J., (1995), Students’ and teachers’ misapplication of Le Chatelier's Principle: Implications for the teaching of chemical equilibrium, J. Res. Sci. Teach., 32(9), 939–957.
Raycroft M. A. R. and Flynn A. B., (2020), What works? What's missing? An evaluation model for science curricula that analyses learning outcomes through five lenses, Chem. Educ. Res. Pract., 21(4), 1110–1131.
Reed J. J., Brandriet A. R. and Holme T. A., (2017), Analyzing the role of science practices in ACS exam items, J. Chem. Educ., 94(1), 3–10.
Sadler T. D., (2004), Informal reasoning regarding socioscientific issues: A critical review of research, J. Res. Sci. Teach., 41(5), 513–536.
Sadler T. D. and Zeidler D. L., (2005), The significance of content knowledge for informal reasoning regarding socioscientific issues: Applying genetics knowledge to genetic engineering issues, Sci. Educ., 89(1), 71–93.
Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15(1), 10–23.
Sevian H., Bernholt S., Szteinberg G. A. and Auguste S., (2015), Use of representation mapping to capture abstraction in problem solving in different courses in chemistry, Chem. Educ. Res. Pract., 16(3), 429–446.
Social Sciences and Humanities Research Council, (2018), Truth Under Fire in a Post-Fact World.
Southard K. M., Espindola M. R., Zaepfel S. D. and Bolger M. S., (2017), Generative mechanistic explanation building in undergraduate molecular and cellular biology, Int. J. Sci. Educ., 39(13), 1795–1829.
Stowe R. L. and Cooper M. M., (2017), Practicing what we preach: Assessing “critical thinking” in organic chemistry, J. Chem. Educ., 94(12), 1852–1859.
Stoyanovich C., Gandhi A. and Flynn A. B., (2015), Acid–base learning outcomes for students in an introductory organic chemistry course, J. Chem. Educ., 92(2), 220–229.
Talanquer V., (2007), Explanations and teleology in chemistry education, Int. J. Sci. Educ., 29(7), 853–870.
Talanquer V., (2017), Concept inventories: Predicting the wrong answer may boost performance, J. Chem. Educ., 94(12), 1805–1810.
Talanquer V., (2018a), Assessing for chemical thinking, in Research and Practice in Chemistry Education, Springer Nature Singapore Pte Ltd, pp. 123–133.
Talanquer V., (2018b), Progressions in reasoning about structure – property relationships, Chem. Educ. Res. Pract., 19(4), 998–1009.
Talanquer V. and Pollard J., (2010), Let's teach how we think instead of what we know, Chem. Educ. Res. Pract., 11(2), 74–83.
Toulmin S., (1958), The Uses of Argument, Cambridge University Press.
Trommler F., Gresch H. and Hammann M., (2018), Students’ reasons for preferring teleological explanations, Int. J. Sci. Educ., 40(2), 159–187.
United Nations, (2015), Transforming our World: the 2030 Agenda for Sustainable Development.
Verheij B., (2003), Dialectical argumentation with argumentation schemes: An approach to legal logic, Artif. Intell. Law, 11(2–3), 167–195.
van Mil M. H. W., Boerwinkel D. J. and Waarlo A. J., (2013), Modelling molecular mechanisms: A framework of scientific reasoning to construct molecular-level explanations for cellular behaviour, Sci. Educ., 22(1), 93–118.
von Aufschnaiter C., Erduran S., Osborne J. and Simon S., (2008), Arguing to learn and learning to argue: Case studies of how students’ argumentation relates to their scientific knowledge, J. Res. Sci. Teach., 45(1), 101–131.
Voska K. W. and Heikkinen H. W., (2000), Identification and analysis of student conceptions used to solve chemical equilibrium problems, J. Res. Sci. Teach., 37(2), 160–176.
Weinrich M. L. and Sevian H., (2017), Capturing students’ abstraction while solving organic reaction mechanism problems across a semester, Chem. Educ. Res. Pract., 18(1), 169–190.
Weinrich M. L. and Talanquer V., (2016), Mapping students’ modes of reasoning when thinking about chemical reactions used to make a desired product, Chem. Educ. Res. Pract., 17(2), 394–406.
Wheeler A. E. and Kass H., (1978), Student misconceptions in chemical equilibrium, Sci. Educ., 62(2), 223–232.
Windschitl M., Thompson J. and Braaten M., (2008), Beyond the scientific method: Model-based inquiry as a new paradigm of preference for school science investigations, Sci. Educ., 92(5), 941–967.
