Analysing the impact of a discussion-oriented curriculum on first-year general chemistry students' conceptions of relative acidity

Lisa Shah a, Christian A. Rodriguez a, Monica Bartoli b and Gregory T. Rushton *ab
aDepartment of Chemistry, Stony Brook University, Stony Brook, New York 11794, USA
bInstitute for STEM Education, Stony Brook University, Stony Brook, New York 11794, USA. E-mail:

Received 4th August 2017, Accepted 19th February 2018

First published on 19th February 2018

Instructional strategies that support meaningful student learning of complex chemical topics are an important aspect of improving chemistry education. Adequately assessing the success of these approaches can be supported with the use of aligned instruments with established psychometrics. Here, we report the implementation and assessment of one such curriculum, Chemical Thinking, on first-year general chemistry students' conceptions of relative acidity using the recently-developed concept inventory, ACIDI. Our results reveal that, overall, students performed significantly better on ACIDI following instruction, with scores consistent with those previously reported for students who had completed one semester of organic chemistry. Students performed equally well on a delayed post-test administered ten weeks after final instruction, which suggests that instruction promoted a stable conceptual reprioritisation. Item analysis of ACIDI revealed that students generally made conceptual gains on items where inductive effects were the primary determinants of conjugate base stability and relative acidity. However, students overwhelmingly struggled on items where resonance was the primary determinant. Analysis of student–student arguments in active learning settings provided evidence for how the quality of student arguments impacted their conceptions. Overall, these findings suggest that students were able to avoid several superficial misconceptions cited in the literature about relative acidity, and that this topic, traditionally taught exclusively in organic chemistry, may be introduced earlier in the sequence of curricular topics. Implications for future studies on the role of argumentational aspects of student–student conversations and facilitation strategies in promoting or hindering meaningful learning are discussed.


Teaching chemistry in ways conducive to strengthening students' conceptual understanding is an important strategy for improving chemical education at the tertiary level. Traditional curricula often reward rote memorisation over deep conceptual understanding, which may inhibit or discourage students from pursuing more advanced study in the field (Grove et al., 2008; Grove and Bretz, 2012).

Acid–base chemistry is one of the most fundamental, yet challenging, topics in the discipline. The majority of organic and bioorganic processes include an acid–base reaction at some step (Bhattacharyya, 2006; Brown et al., 2009). However, current approaches to teaching acid–base chemistry tend to leave students with a superficial understanding of related concepts, which leads to the misapplication of heuristics in making qualitative predictions (Bhattacharyya, 2006; Rushton et al., 2008; McClary and Talanquer, 2011a, 2011b; Stoyanovich et al., 2014; Cooper et al., 2016). For example, at the introductory level, students are known to rely on the presence of an ‘H’ or ‘OH’ to identify acids and bases, respectively (Furió-Más et al., 2005), and have difficulty understanding the calculations associated with acid–base equilibria (e.g., pH, buffers, titrations) (Nakhleh and Krajcik, 1994; Demerouti et al., 2004a, 2004b). Little work has been done to examine students' understanding of acids and bases in more advanced college-level courses (Bhattacharyya, 2006; Ferguson and Bodner, 2008; Cartrette and Mayo, 2011; McClary and Talanquer, 2011a, 2011b; Stoyanovich et al., 2014). Bhattacharyya (2006) reported on graduate students' fragmented mental models of Brønsted acids and Ferguson and Bodner (2008) reported on undergraduates’ superficial use of arrow formalism in the mechanistic explanation of acid–base reactions. With respect to relative acidity of organic acids, McClary and Talanquer (2011a, 2011b) have examined undergraduate students' over-reliance on heuristics (e.g., using surface-level functional groups as key indicators of acidity) as a means to compensate for their lack of deep, conceptual understanding.
Specific instances of this type of reasoning are noted in the later work of McClary (McClary and Bretz, 2012), who found that students were often misled by an explicitly drawn hydrogen in a compound, taking it as an indication that the compound was more acidic than one without this representation on the grounds that it alone could be deprotonated. These findings offer a useful perspective on the knowledge gaps of “successful” students and suggest that traditional instructional methods (i.e., didactic lecture) may be less effective at facilitating long-term retention of core concepts.

A popular approach to improving performance and retention of content within higher education chemistry communities has been the use of cooperative learning pedagogies, in which students engage in the completion of activities designed to encourage collaboration and reasoned argumentation to promote learning (Cooper, 1995; Gupta, 2004; Moog et al., 2006). Recent meta-analyses of these approaches in chemistry have reported improved student achievement across instructional contexts (Warfa, 2015; Apugliese and Lewis, 2017). However, the more recent development and use of norm-referenced instruments to measure learning gains have revealed variations in reported results (Lewis and Lewis, 2005; Hein, 2012; Chase et al., 2013), which have led researchers to consider the means by which cooperative learning does or does not achieve desired student outcomes (Towns and Kraft, 2011; Stains and Vickrey, 2017). Despite these differences, few would argue that cooperative learning approaches as a whole are less effective than traditional didactic lecture pedagogies. Rather, the details of these implementations and whether or not they are being used alongside curricula that promote conceptual understanding over rote memorisation (Demerouti et al., 2004a) may be a significant factor in their observed impact on student performance.

Recently, Talanquer and co-authors have reported the development of a revised curriculum for teaching general chemistry called Chemical Thinking (Talanquer and Pollard, 2010; Sevian and Talanquer, 2014). In this curriculum, content is organised into units shaped by fundamental questions that guide the practice of chemistry (e.g., How do we distinguish substances? How do we determine structure?). These questions inform a discussion-oriented focus of the curriculum that goes beyond algorithmic applications of knowledge by providing opportunities for students to engage in conversation with peers or facilitators in order to make sense of concepts and ideas throughout the course. Lecture slides often include experimental data or graphs alongside thought-provoking questions meant to initiate class discussions, and similar materials are provided for small-group cooperative learning sessions. Given the highly-conceptual nature of this curriculum, we hypothesised that this type of intervention might be useful for promoting students' understanding of challenging concepts by encouraging and rewarding thoughtful discussion of course topics.

An implicit assumption made by instructors in cooperative learning chemistry courses is that students benefit from environments where small-group, student–student interactions complement the patterns in instructor–student conversations of more didactic settings. Advocates of active learning hold to the premise that peer discussions can encourage argumentation and socio-cognitive conflict to promote conceptual reprioritisation, making them at least as valuable as direct instruction from lectures (Cooper, 1995; Limón, 2001). The quality of these arguments likely dictates the extent to which students undergo conceptual reprioritisation (Cazden and Beck, 2003), as researchers associate students' use of well-constructed, scientific arguments with conceptual understanding (Kittleson and Southerland, 2004). Argumentation analysis has more recently been used by chemical education researchers to investigate student–student and student–instructor interactions in small groups (Cole et al., 2012; Kulatunga et al., 2013, 2014; Warfa et al., 2014). An analysis of the quality of student arguments can yield insights about the specific instructional aspects that may be beneficial or counterproductive to student learning.

However, McClary and Talanquer (2011b) note that a barrier to the uptake of new instructional approaches is the difficulty of evaluating them, owing to a lack of aligned assessment instruments, specifically those with the ability to detect student conceptions. Recently, chemical education researchers have worked to develop valid and reliable concept inventories, cognitive instruments with items designed to identify the strength with which students hold normative or alternative ideas about chemical topics. When used in conjunction with a curriculum that facilitates conceptual understanding, concept inventories are proposed as measures of the effectiveness of the approach (Bretz, 2014). In the past decade, topic-specific (e.g., chemical bonding, acid–base chemistry, equilibria, redox) concept inventories have been developed for use in general chemistry courses (Mulford and Robinson, 2002; Evans et al., 2003; Bardar et al., 2007; Brandriet et al., 2011; McClary and Bretz, 2012; Luxford and Bretz, 2014), which affords greater opportunities to compare interventions within and across institutions and topics. ACIDI is one such inventory developed by Bretz and coworkers (McClary and Bretz, 2012; Bretz and McClary, 2014), designed to investigate undergraduate students' conceptions of factors governing relative acidity based on previous studies that explore students' use of mental models in ranking and justifying trends in organic acid strength (McClary and Talanquer, 2011a). The combination of answer, reason, and confidence tiers throughout the assessment provides instructors with valuable insights about which normative or alternative conceptions students hold and how strongly they adhere to them.

The focus of this study was to evaluate a curriculum that introduces first-year general chemistry students to the topic of relative acidity using the discussion-oriented curriculum Chemical Thinking. ACIDI is well-aligned to this curriculum and was used to evaluate both specific student conceptions about acid strength and the effectiveness of instruction at helping students avoid common misconceptions cited in the literature (McClary and Talanquer, 2011a; McClary and Bretz, 2012). Students' first experience with new material was in the context of small-group, cooperative learning settings, and analysis of group conversations provided insights about how the quality of arguments may have impacted student conceptions. Our findings and conclusions speak to the effectiveness of such a curriculum at moving first-year students beyond surface-level conceptions of acid strength and help identify specific areas for improvement. From a broader perspective, implications for future studies on the role of argumentation in student–student conversations and facilitation strategies in promoting or hindering conceptual understanding are discussed.

Theoretical framework

The present study aims to analyse the impact of a discussion-based curriculum on student learning. Students' conceptual reprioritisation is assessed with a concept inventory as a means of capturing their changing knowledge states, and the quality of student arguments is analysed as a means of explaining the presence of normative or alternative conceptions. These goals are informed by the social constructivist theory of learning as well as its relationship to the argumentation framework.

To accommodate new ideas and beliefs, some researchers have suggested that a conceptual ‘reconstruction’ must occur (Hammann et al., 2008; Duit and Treagust, 2012). More recently, however, others have proposed that naive conceptions or preconceptions can coexist with scientifically normative ones (Clement, 2008; Potvin et al., 2015; Potvin, 2017), and as a result of instruction, students can come to prioritise one over the other. Literature regarding conceptual reprioritisation has primarily been situated within the context of traditional constructivist theories of learning, where the mental structures of individual learners and their pre-existing knowledge base are critical determinants of how new knowledge will be incorporated, and whether this integration of knowledge will result in the prioritisation of normative or alternative conceptions (Duit and Treagust, 1995; Limón, 2001). However, traditional constructivist theories operate under the assumption that individual learners come to prioritise certain conceptions over others when presented with information in isolation from their social and cultural environments. Sociocultural theories of learning, in contrast to this perspective, claim that individuals are inseparable from their respective cultures and that cultural tools (e.g., language, symbols) employed in social interactions mediate the internalisation of knowledge (Vygotsky, 1980, 1997). Our position, consistent with that of social constructivists, is that conceptual reprioritisation necessitates the coordination of ideas from both theories, with language as the mediating factor between social and individual planes (Leach and Scott, 2002). O'Loughlin's (1992) perspective highlights the interplay between these two views:

“…meaning making is neither exclusively a product of the person acting, nor of the activity, nor of the setting, but a dialectical interaction between all three in a given context. This too is a form of constructivism, but it is one that emphasises the subjectivity, the sociocultural situatedness, and the intrinsically dialectical nature of the process of coming to know.”

We argue that while individual learners may ultimately come to prioritise their conceptions themselves, they do not do so passively. By engaging in social interactions, they make sense of the concepts encountered in the intermental (i.e., social) plane before being able to fully internalise knowledge and prioritise certain conceptions over others.

With respect to instruction, social constructivist views support the use of discussion-based approaches to facilitate individual conceptual reprioritisation. Discussion actively promotes efforts to make meaning of the content (Moog et al., 2006; Moon et al., 2016) and explain ideas (Inagaki and Hatano, 2008; Fonseca and Chi, 2011), and allows students to evaluate the claims of others (Greene et al., 2008). It further enables students to publicise their views, consider their peers’ alternatives, and resolve issues stemming from opposing viewpoints (Warfa et al., 2014). While discussion can involve a myriad of oral interactions, argumentation plays a critical role in the construction of knowledge within the scientific community (Erduran et al., 2004) and among students in science classrooms (Driver et al., 2000). Argumentation is the specific process by which individuals with opposing initial conceptions come to accept or refute presented claims. When these claims are directly related to specific normative or alternative conceptions, students' ability to construct a persuasive and convincing argument is a critical determinant of the conceptions toward which they and their peers gravitate (Nystrand et al., 1997; Teasley, 1997; Wells, 2007; Rapp, 2014). A number of studies on the quality of argumentation in educational environments have applied Toulmin's model (Toulmin, 1958) as an analytic framework, in which categories of (i) claims, (ii) evidence/data, (iii) warrants, and (iv) rebuttals are among those used to examine the structure and quality of student arguments (Voss et al., 1983; Kelly et al., 1998; Cho and Jonassen, 2002; Erduran et al., 2004; Kenyon and Reiser, 2006). Erduran et al. (2004), for example, used Toulmin's model to develop a rubric to assess the quality of arguments based on the presence or absence of specific argument features.
In this rubric, arguments are rated Level 1 (e.g., simple claim-versus-claim) through Level 5 (e.g., claims presented with data countered by several rebuttals). We employ the same rubric in our study (see Appendix Table 4) as a means of evaluating the quality of arguments constructed by students in the cooperative learning workshops. The relative quality of student arguments serves as evidence for how they may have impacted student conceptions. Therefore, instructional approaches that encourage deep discussion are likely to elicit quality arguments between students with opposing conceptions that drive internal socio-cognitive conflict, ultimately promoting student learning.

Rationale and research questions

We contend that chemical instruction should be appropriately designed to guide learners through the process of conceptual reprioritisation. Research suggests the need to implement curricula in general chemistry that promote students' use of evaluative reasoning over memorisation to improve their conceptual understanding of acid–base chemistry (McClary and Talanquer, 2011a, 2011b). Our work aims to assess the effectiveness of one such curriculum (Chemical Thinking) (Talanquer and Pollard, 2010) at remediating students' alternative conceptions of relative acidity in a first-year general chemistry course. Using ACIDI to identify student conceptions and analysis of student–student conversations to pinpoint instances of construction of these conceptions (Brown et al., 2010; Sawyer et al., 2013), we sought to answer the following research questions:

(1) What is the effect of the curriculum on first-year general chemistry students' normative and alternative conceptions about relative acidity?

(2) How does the quality of student arguments in cooperative learning environments help explain students' normative or alternative conceptions?

Data sources and methods

Study context

The study was conducted over ten weeks in the Fall of 2016 at a large research university in the northeastern United States with a population of 230 first-year students enrolled in the first of a three-semester General Chemistry–Organic Chemistry sequence. The course was intended for advanced students who had previously taken at least two years of high school chemistry (e.g., Advanced Placement, International Baccalaureate). The Chemical Thinking curriculum was used as the primary framework for instruction. The course covered qualitative and quantitative analysis of thermodynamic stability, chemical kinetics, equilibrium, reactivity, and electrochemistry. Relevant instruction for this study included a three-week introduction to acids and bases, which covered qualitative and quantitative analysis of acid–base equilibria (2 weeks) as well as relative acidity of organic acids and bases (1 week). A graduate student identified specific instances in the curriculum (e.g., class slides, workshop prompts) where the course content was closely aligned to items on the ACIDI concept inventory. Emphasis was placed on encouraging students to analyse organic structures with respect to the identity of atoms bearing the proton(s) of interest, resonance (de)stabilisation, and inductive effects to gauge the relative acidity or basicity of molecules (see Appendix Fig. 4). The course consisted of a discussion-oriented, large lecture component (160 minutes per week) and a separate cooperative-learning ‘workshop’ (80 minutes per week). The lecture component was designed to promote student–student conversations across the entire classroom population. Specifically, the instructor would pose a question, often alongside some experimental data, one student would use a microphone to publicly respond, and then multiple students would also use a microphone to respond publicly by either supporting or critiquing their peers’ argument(s).
The public vocalisation of competing conceptions was intended to bring about the cognitive discord necessary for even non-vocal students to evaluate their ideas in the context of the several others presented. The workshop sessions occurred once a week for the entirety of the semester and were facilitated by graduate and undergraduate teaching assistants (TAs). Students were first exposed to new material in the context of the workshop settings, where they were expected to discuss new ideas with a small group of their peers for the entirety of the 80-minute session to develop their understandings. As outlined in the course syllabus, groups were expected to work cooperatively (i.e., all students participate in conversations, students present and critique each other's ideas for the entirety of the 80-minute sessions) and were prompted by TAs if needed. Data collection was approved by the Human Subjects in Research Committee of the university where the study was conducted. ACIDI was administered first as a diagnostic tool for determining student preconceptions about the relative acidity of organic acids prior to instruction, then as a post-test after the acid/base module was completed. A delayed post-test was administered after a mid-semester break, immediately at the start of the following semester, approximately 10 weeks after students had been given any related instruction on the topic. Of those who consented to participate in the study, 213 students completed the pre- and post-tests, and 117 completed the pre-, post-, and delayed post-tests. Additionally, six student groups (of three or four members each) were audio and video recorded during all workshop meetings. These recordings were used to capture argumentation structures used in this active learning environment.

Concept inventories

ACIDI was used as a diagnostic tool to identify students' normative and alternative conceptions about organic acid strength. The assessment contains nine multiple choice items, several of which are multi-tiered, with a confidence rating for each item. A three-question sequence is repeated for three sets of compounds (McClary and Bretz, 2012; Bretz and McClary, 2014). Students were (i) told which compound is most acidic of the three and asked to choose the response that best justifies why, (ii) asked to rank the relative acidity of the two remaining compounds, and (iii) then asked to choose the answer that best supported their ranking (Table 1). The concept inventory was administered during the workshop sessions as a pre-test and post-test (four weeks after the pre-test) during the Fall 2016 semester. It was re-administered as a delayed post-test at the start of the following semester (Spring 2017) during the lecture component of an introductory organic chemistry course (ten weeks after the post-test). While concept inventories have been used previously to measure conceptual reprioritisation, delayed post-tests have not been the standard in chemical education literature, although they allow researchers to more accurately investigate claims of conceptual reprioritisation following an intervention (Hameed et al., 1993; Çalık et al., 2007, 2009; Coştu et al., 2010). The delayed post-test allowed us to ascertain how strongly students held to their altered conceptions two and a half months after formal instruction had ended.
Table 1 Structure sets used in ACIDI. Reprinted from Çalık et al. (2009) by permission of Taylor & Francis Ltd, (license #4165870868313)
Set 1: [structure image c7rp00154a-u1.tif, not reproduced]
Set 2: [structure image c7rp00154a-u2.tif, not reproduced]
Set 3: [structure image c7rp00154a-u3.tif, not reproduced]

Audio/video recordings

Six student groups (with three or four members each) were selected from among those who consented to participate in the study for audio/video recording during all workshop sessions over the course of the semester. A total of nineteen students were selected for recording. After each session, recordings were transcribed and pseudonyms were ascribed to each student to protect their privacy. While recordings for all fourteen workshop sessions were analysed by the authors, analysis is presented here for just one workshop session, during which students discussed relevant content aligned to ACIDI. Six groups were recorded; however, audio was unavailable for two groups due to equipment failure during the sessions. The coded transcripts for the remaining four groups are provided in the Appendix. Two of these four transcripts are discussed in detail in the main text.

Validity and reliability

The psychometric standards for qualitative research have been previously outlined to encourage researchers to provide sufficient detail to inform future work in the field (Arjoon et al., 2013). This study utilised several established criteria for validity and reliability where possible to demonstrate the rigour of the presented work.

Content validity of ACIDI was established by an organic chemistry faculty member who teaches study participants in the following course (Organic Chemistry I). He agreed that many aspects of the instrument adequately assessed students' understanding of relative acidity, but stated that student interviews would be critical in identifying their reasons for selecting particular options. Students' cognitive interview responses to the statement “read the question and tell me what you thought it was asking” were used to further establish content validity (see Appendix). These types of interviews are valuable for assessing the nuances of particular assessment items (Carmines and Zeller, 1979). Cronbach α values (Cronbach, 1951), a measure of internal consistency, of 0.30, 0.28, and 0.29 were determined for the pre-test, post-test, and delayed post-test respectively, consistent with low α previously reported for this concept inventory (McClary and Bretz, 2012). While the accepted standard for internal consistency is α > 0.70, the authors of this inventory have described that diagnostic tools aimed at identifying student misconceptions are unlikely to meet this criterion due to the lack of redundancy among items (Bretz and McClary, 2014). The first two authors independently coded all workshop transcripts a priori using Toulmin's framework (i.e., claim, data, warrant, rebuttal) before meeting to compare codes. After comparison, the two authors were able to achieve an inter-rater reliability of 79.7%. The two authors discussed the remaining 20% until 100% agreement was reached. These authors then coded each individual or co-constructed argument using the framework developed by Erduran et al. (2004). The authors were able to independently code and subsequently compare to reach 100% agreement.
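The internal-consistency statistic reported above can be computed from a students-by-items matrix of 0/1 scores. The following is a minimal sketch of the standard Cronbach's α formula, not the authors' SPSS procedure; the function name `cronbach_alpha` and the small score matrix are hypothetical and serve only as illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student rows of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical data: four students, three dichotomously scored items.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
```

Because α depends on the ratio of summed item variances to total-score variance, an inventory whose items each probe a different conception (little redundancy) will tend to give a low value, which is the argument made above for ACIDI.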

Student interviews

The nineteen recorded students were invited via e-mail to be individually interviewed about their responses to ACIDI as well as their insights about which specific aspects of the course most impacted their conceptions of the topic. Twelve of these students agreed and were individually interviewed by one of the authors. The interview protocol is included in the Appendix. For each item, students were asked to explain (i) what they thought the question was asking, (ii) why they chose the response they did, and (iii) why they did not choose the remaining choices.

Data analysis

Students who consented to participating in the research study were given unique numerical identifiers to protect student privacy and keep data confidential. Student answers and confidence ratings on the pre-test, post-test, and delayed post-test were entered into an SPSS spreadsheet and only scores for students who provided both an answer and a corresponding confidence rating for all items are reported here (Npre/post = 213, Ndelayed post = 117). For each of the nine questions, dummy variables of 0 and 1 were used to code incorrect and correct responses, respectively. Scores for each administration were computed and converted to percentages. Mean percent correct and mean confidence levels were determined for the inventory overall, as well as for each individual item. Previous work on this inventory has established the use of the percentage of students who chose a distractor and the average student confidence for each distractor to identify the strength of alternative student conceptions (McClary and Bretz, 2012). Incorrect responses that were chosen by a percentage of students 10% greater than that due to random chance (i.e., 35% for items with four response options or 60% for items with two response options) were identified for further characterisation of the strength of each alternative conception as either spurious (mean confidence levels below 50%) or genuine (mean confidence level above 50%) (Caleon and Subramaniam, 2010).
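The distractor screening rule described above can be sketched as a small function. This is an illustrative reading of the stated thresholds rather than the authors' analysis script: the function name and labels are hypothetical, and two details are assumptions because the text does not specify them, namely that the selection threshold is inclusive and that a mean confidence of exactly 50% is treated as spurious.

```python
def classify_distractor(pct_chosen, mean_confidence, n_options):
    """Label a distractor as 'not flagged', 'spurious', or 'genuine'.

    pct_chosen      -- percentage of students who chose the distractor
    mean_confidence -- mean confidence (%) of the students who chose it
    n_options       -- number of response options on the item (2 or 4)
    """
    chance = 100.0 / n_options        # 25% for four options, 50% for two
    threshold = chance + 10.0         # 35% or 60%, as stated in the text
    if pct_chosen < threshold:        # assumption: threshold is inclusive
        return "not flagged"
    # assumption: exactly 50% confidence counts as spurious
    return "genuine" if mean_confidence > 50.0 else "spurious"


# Hypothetical examples, not values from the study.
label_a = classify_distractor(40.0, 65.0, n_options=4)
label_b = classify_distractor(62.0, 45.0, n_options=2)
label_c = classify_distractor(30.0, 80.0, n_options=4)
```

Separating the selection frequency from the confidence rating keeps the two criteria distinct: a popular distractor held with low confidence is treated differently from one that students choose and trust.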

Argumentation analysis

Toulmin's (1958) argumentation scheme was used to code student–student discussions in the workshop portion of the course. This analytical framework identifies the basis of an argument to include claims supported by data, which can be linked through warrants. Rebuttals are made to challenge assertions with opposing claims. The framework of Erduran et al. (2004) for assessing the quality of arguments was then applied to gauge relative differences in the level of student arguments in the workshop setting.


ACIDI scores

The primary goal of our study was to assess the impact of the Chemical Thinking curriculum on first-year general chemistry students' normative or alternative conceptions about relative acidity using the ACIDI concept inventory. Each inventory item was classified into one of two modes of reasoning (based on instruction) that students were expected to utilise (i.e., inductive effects or resonance stabilisation on the conjugate base), which groups items by the primary determinant of relative acidity (Fig. 1). Following formal instruction on this topic, student scores on ACIDI significantly improved from pre-test to post-test (t = 7.95, p < 0.001) with a pre-test mean of 2.59 ± 1.50 and a post-test mean of 3.74 ± 1.52 and a medium to large effect size (Cohen's d) of 0.76 (Fig. 1) (Cohen, 1988). Delayed post-test scores were not significantly different from post-test scores, indicating that students had retained their new conceptions two and a half months after instruction. On two of the nine items (i.e., one two-tiered question), students performed significantly worse (Q2: t = 6.12, p < 0.001; Q3: t = 4.52, p < 0.001) on the post-test than the pre-test.
Fig. 1 Student pre-, post-, and delayed post-test scores on ACIDI. Items are grouped by the mode of reasoning students were expected to use to answer the question. Npre/post = 213, Ndelayed post = 117. Asterisks (** and ***) represent statistically significant differences (p < 0.01 and 0.001, respectively).
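The reported effect size for the pre-test to post-test comparison can be checked from the stated means and spreads. The sketch below assumes that the ± values are standard deviations and that the pooled standard deviation is the root mean square of the two; the text does not state which convention was used, and other conventions exist for paired data. The function name is hypothetical.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using the root-mean-square of the two standard deviations."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_2 - mean_1) / pooled_sd


# Pre-test 2.59 +/- 1.50 and post-test 3.74 +/- 1.52, as reported in the text.
d = cohens_d(2.59, 1.50, 3.74, 1.52)
print(round(d, 2))  # 0.76
```

Under these assumptions the calculation reproduces the reported value of 0.76, which falls between Cohen's conventional benchmarks for medium (0.5) and large (0.8) effects.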

To investigate whether students were aware of what they understood well or poorly, mean percent confidence quotients (CDQ) were calculated for each item, as previously reported (Cartrette and Mayo, 2011) (Table 2). Briefly, the CDQ quantifies the standardised difference in confidence between those who answered an item correctly versus incorrectly. A negative CDQ is an indication that students are not aware of what they do not understand while positive values indicate that students are aware of what they understand. Unlike students in the McClary and Bretz (2012) field test sample, students in this course who responded correctly were more confident on average, with positive CDQ scores for six of the nine items. Interestingly, all five items prompting the use of inductive effects to gauge relative acidity yielded positive CDQ values, the largest of which occurred for item five. Consistent with post-test performance (Fig. 1), items two and three yielded negative CDQ values, which indicated that students were not aware of their misconceptions concerning the compounds in this set.

Table 2 Item-level confidence data for the ACIDI post-test. Total mean confidence (top row) is further broken down into means for students who responded to the item correctly or incorrectly. Mean percent confidence quotient (CDQ) = [(mean % confidence, correct) − (mean % confidence, incorrect)]/(item standard deviation in confidence)

| | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 |
|---|---|---|---|---|---|---|---|---|---|
| Mode of reasoning | Resonance | Resonance | Resonance | Resonance | Inductive effect | Inductive effect | Inductive effect | Inductive effect | Inductive effect |
| Total mean % confidence | 68.36 | 68.7 | 70.28 | 63.96 | 63.54 | 60.22 | 62.30 | 59.55 | 51.90 |
| Mean % confidence-correct | 66.43 | 63.75 | 61.25 | 66.76 | 64.07 | 63.61 | 64.21 | 63.00 | 57.36 |
| Mean % confidence-incorrect | 69.05 | 69.46 | 71.43 | 61.11 | 50.5 | 58.49 | 61.18 | 51.07 | 47.63 |
| Mean % confidence quotient | −0.12 | −0.26 | −0.46 | +0.26 | +0.55 | +0.23 | +0.13 | +0.48 | +0.39 |
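
The confidence quotient defined in the caption of Table 2 can be sketched for a single item as follows. This is a minimal illustration: the function name and the confidence values are hypothetical, and using the sample standard deviation over all respondents to the item is an assumption, since the source only says "item standard deviation in confidence".

```python
from statistics import mean, stdev


def confidence_quotient(conf_correct, conf_incorrect):
    """Standardised difference in mean confidence (correct minus incorrect).

    Assumption: the denominator is the sample SD of confidence over every
    student who answered the item, regardless of correctness.
    """
    all_conf = list(conf_correct) + list(conf_incorrect)
    return (mean(conf_correct) - mean(conf_incorrect)) / stdev(all_conf)


# Hypothetical percent-confidence ratings for one item (not study data).
correct = [80, 70, 90]
incorrect = [40, 50, 60]
cdq = confidence_quotient(correct, incorrect)
print(cdq > 0)  # True: students who answered correctly were more confident
```

A positive value corresponds to the pattern seen for items 4 to 9, and a negative value to the pattern seen for items 1 to 3.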


However, the explanation and confidence tiers on ACIDI did not fully capture the intricacies of students' reasoning and possible misconceptions behind their responses. To more thoroughly investigate how students reasoned through each question, twelve of the nineteen recorded participants were interviewed regarding their ACIDI responses.

Inductive effects

Seven of the twelve students interviewed correctly explained how their knowledge of inductive effects was used to respond to two-tiered items 8 and 9, on which students in the course made the largest gains (Fig. 1).

I think [phenol]'s definitely going to be the stronger of the two compounds as far as its acidic nature just because in [phenol] you have exactly the same compound… and then in [p-methylphenol] instead of just that you have the methyl group added to it. Again, that's going to kind of help donate and push electron density up towards the oxygen and hydrogen which are bonded together, which is going to again make it less likely to be deprotonated. (Michael)

The only difference is the methyl group. And then I think… I think carbons are supposed to be electro… electron donating groups so since there's another carbon in there it wouldn't help stabilise the conjugate base. (Sara)

Students seemed to recognise the impact of induction by neighbouring functional groups on the relative stability of the conjugate bases of organic acids, and used their knowledge of this topic to rank and justify the relative acidities of p-methylphenol and phenol (Table 1) in these items. These quotes suggest that seven of the twelve students interviewed were able to rationalise that the methyl group of p-methylphenol would destabilise the negative charge in its conjugate base relative to phenolate ion from the deprotonation of phenol, consistent with the approach presented by the Chemical Thinking curriculum.

On items 5 and 6, interviews helped identify an alternative conception held by seven of the twelve interviewed students.

because the O is closer to the H so the hydrogen being closer to the O makes it more acidic. (Carlos)

because I figured that the hydrogen would be closer and more likely to receive the inductive effect of oxygen. (Edwin)

When comparing the relative acidities of acetone and acetaldehyde (Table 1), these students assumed that the explicitly-drawn hydrogen of the aldehyde was the most acidic hydrogen, which impacted the way they appealed to the inductive effect to support their answer. While the approach of examining the inductive impact of neighbouring groups was employed in a manner consistent with Chemical Thinking, interviews revealed an alternative conception, where the idea that the aldehyde hydrogen would be deprotonated (instead of one of the α-hydrogens) misguided their reasoning.

When asked to justify why p-nitrophenol was the strongest acid among the substituted phenols (Table 1, Set 3), six of the twelve interviewed students echoed the sentiments of the following quotes.

I said that the presence of the NO2 will make [p-nitrophenol] the most acidic out of the three compounds because it has that electron pulling effect that perhaps would be better at stabilising a negative charge that would be present in the conjugate base. (Ron)

the electronegativity from the N would help draw electrons toward it so it would help stabilise the conjugate base, more or less. (Ralph)

Students' statements suggested that, by stabilising the negative charge, the electron-withdrawing nitro group of p-nitrophenol would yield the most stable conjugate base of the three species. Their unfamiliarity with the Lewis structure of the nitro group (a topic that was not explicitly covered in our course) may have prevented them from identifying resonance stabilisation as an additional influencing factor.

Overall, interviews with students regarding items 5–9 revealed that at least half of the students interviewed reasoned through these items using the normative conceptions presented in Chemical Thinking.

Resonance

An analysis of interview transcripts also revealed details about students' conceptions of resonance as it relates to acid strength. In item 4, students were asked to identify why pentane-2,4-dione is more acidic than either acetone or acetaldehyde. McClary and Bretz (2012) found that students' incorrect responses to this item were a consequence of an alternative conception that the presence of certain functional groups (e.g., COOH) determines acid strength. While three of the twelve students interviewed held this misconception, six students made statements like the following:

Just saying it has two carbonyl functional groups doesn't really say anything, it just says “oh that's there” but it's not really an explanation as why is more acidic. It doesn’t have anything to do with stability or resonance, which would justify it being more acidic. (Ralph)

I could tell that just because something's a carboxylic acid or like an oxyacid per matter doesn't necessarily justify why it's more acidic than something else. (Jim)

Half of the students interviewed clearly stated that the presence of particular functional groups in general was not a sufficient justification for increased acidity and, instead, justified this ranking as follows:

because it has the resonance to balance the conjugate base and make it more stable, which would add to the acidity of the acid. (Carlos)

So hydrogen is just bonded to carbon in all of these so the only one with resonance… well [pentane-2,4-dione] had the most resonance structures so I figured that it was the most likely to be the strongest. (Edwin)

Carlos noted that if pentane-2,4-dione was the most acidic, it must have the most stable conjugate base due to increased stabilisation of the base from the resonance structures available. Explanations of this kind are in keeping with the scientific principles presented in the course curriculum. Edwin had a similar, but more fragmented understanding of resonance, suggesting that the acid (not the conjugate base) was stabilised by resonance.

However, students performed worse following instruction on items 2 and 3, where they were asked to rank the relative acidities of pentane-2,4-dione and phenol and to justify their response. When asked to explain their reasoning, the following ideas were present in the comments made by five of the twelve interviewed students:

So in terms of looking at [pentane-2,4-dione] and [phenol], cause that's what you have to compare, [phenol] has a benzene ring as opposed to [pentane-2,4-dione] so it would be able to stabilise its own charge more and have more resonance structures, as opposed to what I tried drawing for [pentane-2,4-dione]. So I felt like [phenol] had more resonance structures and it was more stable and more likely to act like an acid instead of [pentane-2,4-dione]. (Ralph)

So if you were to take off the hydrogen of the alcohol group, the negative charge of [phenol] is better distributed throughout the benzene ring than it could be throughout whatever those are- carbonyl groups. So there's more resonance structures, yeah. I was like [expletive] certain on that one. (Libby)

Students' reliance on the quantity of resonance structures as a determinant of acid strength may explain the significant drop in scores on two-tiered items 2 and 3. Unlike the outcome in McClary and Bretz's study, many of our students did not exclusively associate the benzene ring of phenol with resonance. Five of the twelve students recognised that resonance structures were available to both compounds. The conjugate base of pentane-2,4-dione has three contributing resonance structures, where the negative charge of the conjugate base is distributed over the alpha carbon and each of the carbonyl oxygens. Phenolate has four contributing resonance structures, where the negative charge of the phenoxide anion is distributed over the oxygen and the carbons in the aromatic ring. Students generally recognised the contributing resonance structures for the conjugate bases of both compounds, but prioritised the quantity of available resonance structures over their stability (i.e., which atoms would bear the delocalised charge). These nuances of resonance stabilisation were not explicitly addressed in our course, but interview data revealed ways in which instruction might be improved to address this alternative conception in future iterations.

Table 3 summarises the most significant distractor responses (item choice) and alternative conceptions (italics) that were identified from interviews and quantified from concept inventory scores and confidence intervals. For several items, a significant alternative conception was identified even after instruction. Our findings indicate that students generally understood the impact of induction on the relative stability of conjugate bases better than that of resonance when determining acid strength. The most prevalent alternative conceptions held (i.e., impact of quantity vs. stability on resonance stabilisation, misidentifying the most acidic proton in acetaldehyde) were often not explained by simple heuristics unlike those reported in the previous study of this assessment (McClary and Bretz, 2012). This suggests that instruction may have played a role in helping students think more critically about these topics.

Table 3 Most prevalent incorrect responses to ACIDI items and the underlying alternative conceptions. Incorrect responses represent those chosen by a percentage of students 10% greater than that due to random chance (i.e., 35% for items with four response options or 60% for items with two response options). The strength of each alternative conception was deemed spurious (<50%) or genuine (>50%) based on the mean confidence of students who selected that option

| Item | Incorrect response | Underlying alternative conception | Pre-test frequency (strength) | Post-test frequency (strength) |
|---|---|---|---|---|
| 1 | Acetic acid is more acidic than pentane-2,4-dione and phenol because it is a carboxylic acid. | Functional groups primarily determine relative acidity. | 47.7% (genuine) | 61.2% (genuine) |
| 2 | Phenol is more acidic than pentane-2,4-dione. | The quantity of available resonance structures determines relative acidity. | 58.9% (spurious) | 84.6% (genuine) |
| 3 | Phenol is more acidic than pentane-2,4-dione because benzene better stabilises the conjugate base than the carbonyl groups of pentane-2,4-dione. | The quantity of available resonance structures determines relative acidity. | 48.1% (spurious) | 76.6% (genuine) |
| 6 | Acetaldehyde is more acidic than acetone because acetaldehyde has a hydrogen atom instead of another methyl group. | The aldehyde hydrogen is the acidic hydrogen and it more strongly experiences induction from the carbonyl than the methyl hydrogen of acetone. | 52.8% (spurious) | 57.5% (genuine) |
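
The selection rule described in the caption of Table 3 can be written out as a short sketch. The function names and example values below are hypothetical. Treating a response at exactly the threshold as qualifying, and a mean confidence of exactly 50% as spurious, are assumptions, because the source does not specify either boundary case.

```python
def selection_threshold(n_options, margin=10.0):
    """Percent of students expected by chance, plus a 10-point margin."""
    return 100.0 / n_options + margin


def classify_distractor(percent_selected, n_options, mean_confidence):
    """Return None if the distractor is not prevalent enough to report,
    otherwise 'genuine' or 'spurious' based on mean percent confidence.

    Assumptions: the threshold is inclusive, and a mean confidence of
    exactly 50% counts as spurious.
    """
    if percent_selected < selection_threshold(n_options):
        return None
    return "genuine" if mean_confidence > 50.0 else "spurious"


print(selection_threshold(4))  # 35.0
print(selection_threshold(2))  # 60.0

# Hypothetical distractors on a four-option item (not study data).
print(classify_distractor(40.0, 4, 65.0))  # genuine
print(classify_distractor(40.0, 4, 45.0))  # spurious
print(classify_distractor(30.0, 4, 80.0))  # None
```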

Item and argumentation analysis

At the conclusion of each interview, students were asked “Were there any specific aspects of the course that helped you come to a different understanding from pre-test to post-test?” In response to this question, seven of the twelve interviewed students stated that workshops (i.e., cooperative learning environments) were particularly helpful.

I think the place where I really learned where to find right and wrong in terms of answers is probably [the small-group] workshop. Because I know there were a lot of times when we were doing quizzes, me and my group, we would have an answer and have the best explanation for it but then somebody else would say another an answer and give another explanation for it and it kind of showed us a different way of thinking of it. (Carlos)

I would say, like, through lecture and during the workshop itself. Because my two group members usually disagreed with each other. So they had different reasoning. (Sofia)

Students seemed to appreciate the role of the workshop in exposing them to different modes of reasoning that arose from discussions with their group members. We therefore sought to investigate what role the quality of student arguments in the cooperative learning section of the course may have played in forming or altering their related conceptions (Boller et al., 1990; Osborne et al., 2004; Sandoval and Millwood, 2005). Fig. 2 shows post-test item analysis for students in each of the recorded groups. While the overall concept inventory data resemble those of the entire population of students, we noticed several interesting variations between the responses across groups. First, it is worth noting that fewer than 10% of students (16 students out of 217) responded correctly to both items two and three of the inventory. Intriguingly, two of those students happened to be in the same workshop group (Ron and Jim, Group 2) and reported confidence levels above 50% (darker green shading). Additionally, two of the students in Group 1 (Jay and Ralph) responded incorrectly to both of these items, but were among the most confident in their responses relative to students in the other groups (darker red shading). Amira, by contrast, seemed much less confident than the others in her group, as indicated by the relatively lighter shading across her inventory responses. Overall, item analysis of student responses across groups led us to more closely examine the arguments of Group 1 and Group 2 to assess whether the quality of their arguments may have impacted their conceptions on these particular inventory items.

image file: c7rp00154a-f2.tif
Fig. 2 Item-level ACIDI post-test data for students in the recorded groups. Green and red colouring indicate correct or incorrect responses to items, respectively. Within each colour, relative shading indicates reported confidence levels from 0% (light) to 100% (dark). Dashes indicate missing confidence tier data and have been shaded as 50% confidence.

To investigate the differences between these groups, we analysed recorded student–student conversations from the cooperative learning portion of the course. We sought to assess the quality of student arguments and whether students appealed to the course content to support their assertions. Toulmin's (1958) argumentation scheme was used to code discussions. As previously mentioned, the analytical framework identifies the basis of an argument to include claims supported by data, which can be linked through warrants. Rebuttals are made to challenge assertions with opposing claims (see Appendix Table 5). The established rubric of Erduran et al. (2004) for rating the level of arguments based on aspects of Toulmin's model was then employed to assess the quality of argumentation for each item discussed. Two representative conversations coded using Toulmin's model and the prompt eliciting these discussions are shown in Fig. 3. While the prompt requires students to determine relative basicity instead of acidity, students were expected to evaluate similar structural factors (i.e., induction by neighbouring functional groups, available resonance structures, sterics) when making their predictions.

image file: c7rp00154a-f3.tif
Fig. 3 Representative conversations from two recorded student groups in response to the indicated prompt. Scientific arguments were coded using Toulmin's model of argumentation (red).

In Group 1, Ralph, Jay, and Amira (who is present, but does not participate in the conversation) have a discussion led primarily by Jay around the basicity of the nitrogen atoms shown in Fig. 3. The conversation begins with Jay and Ralph co-constructing an incorrect claim about relative basicity (“to act as a base it needs to be able to accept electrons”). Despite agreeing with this incorrect claim, Jay proceeds to think out loud about the impact of induction and resonance on the basicity of each nitrogen in the molecule. He claims that N1 cannot be the most basic due to the inductive effect of the nearby carbonyl (he does not mention the additional impact of resonance here). He recognises the influence of resonance structures on basicity, but has a flawed understanding of resonance in his claim (“not really near any resonance structures”). He uses induction to decide on N2 due to the presence of “electron donating” alkyl groups. Overall, the conversation is dominated by Jay who relies primarily on claims and data to make his case. The lack of warrants and rebuttals from the other group members is indicative of a Level 2 argument (Erduran et al., 2004), and may suggest that the chain of reasoning from claims to data is faulty or only partially understood.

In Group 2, Sara, Ron, and Jim have an exchange where all three students participate in a conversation that briefly focuses on inductive effects before shifting to a discussion about resonance (Fig. 3). Sara claims that “you don’t want to have the inductive effect” in her argument for why N1 is not the most basic nitrogen, a claim that is left unchallenged by her groupmates and does not include the additional impact of resonance. In addition to making many claims, the students offer several pieces of data linked to their claims by warrants. A rebuttal is also made when Sara and Ron quickly correct Jim about the protonation state of the conjugate acid. Unlike the first group, Ron, Sara, and Jim take the time to discuss what the resonance structures of the conjugate might look like. They attempt to work through the specifics of “moving around electrons” and the consequences of doing so on the charge of the atoms. However, they ultimately ask for assistance from their TA when they are unable to rationalise the effect of resonance in their discussion. Though the students were unable to arrive at a conclusion before seeking assistance, they co-constructed a Level 5 argument, which included claims made by each student, several uses of data and warrants, and two rebuttals.

Across recorded groups, arguments about the impact of induction were minimal, though students seemed to understand inductive effects reasonably well (Fig. 1). Students spent most of their time reasoning through resonance effects, possibly because this topic is more nuanced (McClary and Talanquer, 2011b) and less familiar. While neither group was able to demonstrate a robust understanding of resonance effects on base strength, Ron and Jim (Group 2) seemed fairly confident in their correct responses to aligned inventory items two and three, perhaps owing to the more substantive arguments in their group about the related prompt. These higher quality arguments may have provided the cognitive conflict necessary for them to recognise their incomplete conceptions of resonance effects and further examine the topic outside of the workshop setting.

Discussion and implications

Overall, students in this study exhibited long-term retention of conceptual gains and primarily positive CDQ values on ACIDI following instruction, and avoided several of the misconceptions exhibited by students in the McClary and Bretz (2012) study. Students in this study performed as well as students who had already taken one semester of organic chemistry, suggesting that this topic may be introduced earlier in the sequence of curricular topics. Students displayed a deeper understanding of the role of induction in conjugate base stability and relative acid strength than of the role of resonance. Even in the case of acetaldehyde, where 58% of interviewees incorrectly identified the acidic proton, students still correctly employed reasoning about the impact of relative distance on inductive effects. The nuances of resonance stabilisation were likely an area of confusion for students, who generally emphasised the quantity of resonance structures over their stability (McClary and Talanquer, 2011b). Student conversations from the workshop portion of the course yielded insights about how students came to their conceptions about these topics. Representative conversations and item-level data for individual students suggested that the quality of student arguments may have impacted their conceptual understanding.

Students' concept inventory scores averaged less than 50% even after instruction, though their grades on course exams averaged 80–85%. These data suggest some areas for improvement in our instructional approach. In particular, it was clear that the impact of resonance on relative acidity/basicity was not thoroughly addressed or assessed in our course. While an understanding of inductive effects on relative acidity may be achieved through more simplistic evaluations of structural features, the nuances of resonance stabilisation are difficult to evaluate without an ability to explicitly identify contributing resonance structures. It is possible that students found it difficult to draw individual resonance forms that would have more accurately informed their choices. Therefore, efforts to more intentionally promote the development of this critical skill might improve student outcomes in future iterations of the course.

The workshop prompts themselves (Fig. 3) seemed to be useful for promoting student argumentation about the topic, though some groups engaged in higher quality argumentation than others. Strategies for improving student engagement and confidence in these settings (for students like Amira) may be worth considering in future implementations (Michael, 2006; Hermann, 2013). While cooperative learning settings did seem to encourage quality argumentation among some students, formal instruction on the core components of a strong argument may have improved their observed quality.

Additionally, this work has implications for the broader chemistry education community. Comparing individual item responses to the quality of student arguments suggests that these conversations may have played a role in encouraging students to move beyond heuristic reasoning and think more deeply about these topics. Future investigations of the specific aspects of student discussion that may have promoted or disrupted their conceptual understanding may be helpful toward improving the effects of cooperative learning on student performance. Examining the impact of various facilitation strategies on the quality of student conversations and performance may also yield insights about how facilitators may impact student learning. An evaluation of this discussion-oriented curriculum on other challenging topics (e.g., redox chemistry, kinetics) may help more thoroughly establish its success and areas of improvement across content areas. Finally, implementing curricula and instructional strategies that improve student reasoning and performance may reduce attrition rates in chemistry and motivate students to pursue more advanced coursework in the field.

Limitations

Threats to internal validity of single group pre-/post-test experimental designs have some bearing on our findings (Marsden and Torgerson, 2012). History, regression, and instrumentation threats are not applicable to this study. Maturation threat (i.e., the impact of the passage of time on subjects’ performance), which is typically an issue in long-term studies, is likely minimal over the three-week interval between pre-test and post-test. Testing threat (i.e., pre-test may sensitise participants in ways that impact their post-test results) may have some impact on our results. Retention in student scores across the ten-week interval between the post-test and delayed post-test may alleviate this concern somewhat. Experimental mortality (i.e., differential loss of participants across groups) is also minimally concerning, as the deadline for dropping the course had passed prior to the administration of the pre-test. Additionally, several aspects of this work limit its generalisability across contexts. Students in this study were advanced first-year students in the United States and do not fully represent all US general chemistry students or the international population of general chemistry students. The impact of instruction was examined in the context of acid–base chemistry, and may yield different insights if applied to other content areas. The small sample sizes of recorded and interviewed students are also worth noting. Furthermore, discussion-oriented instruction may be more challenging for certain student groups (e.g., English-language learners, first-generation students) than others. Studies that evaluate the impact of these types of instruction on particular groups may help improve future implementations.

Conflicts of interest

There are no conflicts to declare.


Tables 4, 5 and Fig. 4.
Table 4 Analytical framework used to assess argument quality. Adapted from Erduran et al. (2004) by permission of John Wiley and Sons (license #4205970158426)
Analytical framework used for assessing the quality of argumentation
Level 1 Level 1 argumentation consists of arguments that are a simple claim versus a counter-claim or a claim versus a claim.
Level 2 Level 2 argumentation has arguments consisting of a claim versus a claim with either data, warrants, or backings but do not contain any rebuttals.
Level 3 Level 3 argumentation has arguments with a series of claims or counter-claims with either data, warrants, or backings with the occasional weak rebuttal.
Level 4 Level 4 argumentation shows arguments with a claim with a clearly identifiable rebuttal. Such an argument may have several claims and counter-claims.
Level 5 Level 5 argumentation displays an extended argument with more than one rebuttal.
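
The levels in Table 4 can be approximated as a decision rule over tallies of coded moves. The sketch below is illustrative only: the class and function names are hypothetical, the distinction between weak and clear rebuttals is a coder's judgement that must be supplied as input, and treating a weak rebuttal without supporting grounds as Level 1 is an assumption. It is not the authors' coding procedure.

```python
from dataclasses import dataclass


@dataclass
class ArgumentTally:
    """Counts of coded moves in one group conversation (hypothetical type)."""
    claims: int
    grounds: int          # data, warrants, or backings
    weak_rebuttals: int   # judged weak by the coder
    clear_rebuttals: int  # judged clearly identifiable by the coder


def argument_level(tally):
    """Map a tally onto Levels 1 to 5 following the wording of Table 4."""
    if tally.clear_rebuttals > 1:
        return 5  # extended argument with more than one rebuttal
    if tally.clear_rebuttals == 1:
        return 4  # a clearly identifiable rebuttal
    if tally.grounds > 0 and tally.weak_rebuttals > 0:
        return 3  # supported claims with an occasional weak rebuttal
    if tally.grounds > 0:
        return 2  # supported claims, no rebuttals
    return 1      # claim versus claim or counter-claim only


# Hypothetical tallies: supported claims without rebuttals, and an
# exchange containing two clear rebuttals.
print(argument_level(ArgumentTally(3, 2, 0, 0)))  # 2
print(argument_level(ArgumentTally(5, 4, 0, 2)))  # 5
```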

Table 5 Student argument analysis codebook
Code Description Example
Claim A chemical assertion that is made by at least one student S: Yeah so we’re looking at pKb's and we know that guanine is the weakest and cytosine is the strongest.
Data A statement in which a student appeals to reliable, chemical evidence (e.g., discussion worksheets, lecture notes, periodic table, statements made by facilitators). S1: It has no oxygen to pull away and destabilize…
S2: And it also has two methyl groups surrounding it.
Warrant A statement made by at least one student which directly links a stated claim to supporting data. S3:…Wait which one is bigger, oxygen or sulfur?
S2: They’re like the same group. Let's look up the periodic table…So sulfur is bigger.
S3: So sulfur is larger and sulfur should be better at stabilizing.
Rebuttal A statement of opposition that is made by at least one student to a previously stated claim made by another student. S2: Yeah so. Let's say the proton to the proximity to the hydroxyl…
S1: (interrupts) Hydroxide.
S2: No. Hydroxide is the ion. Hydroxyl is the functional group.

image file: c7rp00154a-f4.tif
Fig. 4 Sample lecture slide.

Interview protocol

For each of the nine items:

(1) Read the question and explain what you thought it was asking.

(2) Explain why you chose the answer that you did.

(3) Explain why the remaining choices are incorrect.

For the inventory overall:

(4) Were there any words in the questions/choices that you did not understand?

(5) Were there any specific aspects of the course that helped you come to a different understanding from pre-test to post-test?

Additional group conversations around prompt in Fig. 3

Group 3 (Level 4 argument)

Barry: (reading the prompt) Identify the most basic…

Ankit: What's basic?

Barry: Trying to accept the hydrogen [claim]

Carlos: That means everything around it has to be really electronegative [claim]

Barry: Wait, why?

Carlos: Because if this is really electronegative that means it will attract the hydrogen [warrant]

Barry: Oh ok.

Carlos: Yeah.

Barry: Weren’t we just looking for most electronegative for acids and now we’re looking for the same thing for base…[claim]

Ankit: well I understand what you’re saying but then at the same time, the electronegative atoms would pull the electrons away [rebuttal].

Carlos: When acids like the…one side's positive and the other side…cuz [sic] the electrons moving toward the electronegative part, it becomes really negative so… [warrant] I think this one, you just have to find the most electronegative place…[claim] like for this it's here, (points to N1) here (points to N2), and there (points to N3) [data].

Group 4 (Level 4 argument)

Edwin: (reads prompt)…basic because they contain nitrogen? Wait why?

Samantha: It's because there are a lone pair of electrons [data] and that makes them partially negative [warrant].

Edwin: More partially negative? Is that correct? (shows paper)

Samantha: Yeah. Here it is [data].

George: This is the (audio overlaps). Well, they (TAs) were talking about like…uh, like…um… how the electron donating groups [data].

Samantha: So that means that this one would probably be basic [claim] and I’m gonna go with this one because that one has carbons on both sides but this one only has the methyl group on one end and also it has that thing (points to screen) [data]

George: So wouldn’t it be this one? (points to screen) [rebuttal]. Because this one has the electron withdrawing? [data]

Samantha: But we’re looking at the weakest acid. [rebuttal]

George: Oh the weakest that's right.

Samantha: So I think it's that one. [claim]

George: So yeah. Yeah yeah.


The authors acknowledge Dr Vicente Talanquer, Dr John Pollard, and Dr Kimberly Linenberger Cortes for their valuable guidance and feedback.


  1. Apugliese A. and Lewis S. E., (2017), Impact of instructional decisions on the effectiveness of cooperative learning in chemistry through meta-analysis, Chem. Educ. Res. Pract., 18(1), 271–278.
  2. Arjoon J. A., Xu X. and Lewis J. E., (2013), Understanding the state of the art for measurement in chemistry education research: examining the psychometric evidence, J. Chem. Educ., 90(5), 536–545.
  3. Bardar E. M., Prather E. E., Brecher K. and Slater T. F., (2007), Development and validation of the light and spectroscopy concept inventory, Astron. Educ. Rev., 5(2), 103–113.
  4. Bhattacharyya G., (2006), Practitioner development in organic chemistry: how graduate students conceptualize organic acids, Chem. Educ. Res. Pract., 7(4), 240–247.
  5. Boller G. W., Swasy J. L. and Munch J. M., (1990), Conceptualizing argument quality via argument structure, Adv. Consum. Res., 17, 321–328.
  6. Brandriet A. R., Xu X., Bretz S. L. and Lewis J. E., (2011), Diagnosing changes in attitude in first-year college chemistry students with a shortened version of Bauer's semantic differential, Chem. Educ. Res. Pract., 12(2), 271–278.
  7. Bretz S. L., (2014), Tools of Chemistry Education Research, pp. 155–168.
  8. Bretz S. L. and McClary L., (2014), Students' understandings of acid strength: how meaningful is reliability when measuring alternative conceptions? J. Chem. Educ., 92(2), 212–219.
  9. Brown W. H., Foote C. S., Iverson B. L. and Anslyn E., (2009), Organic chemistry, Brooks Cole Cengage Learning.
  10. Brown P., Sawyer K., Frey R. and Gealy D., (2010), Presented in part at the Learning in the disciplines: proceedings of the 9th international conference of the learning sciences, Chicago, IL, June 29th–July 2nd, 2010.
  11. Caleon I. and Subramaniam R., (2010), Development and application of a three-tier diagnostic test to assess secondary students' understanding of waves, Int. J. Sci. Educ., 32(7), 939–961.
  12. Çalik M., Ayas A. and Coll R. K., (2007), Enhancing pre-service elementary teachers' conceptual understanding of solution chemistry with conceptual change text, Int. J. Sci. Math. Educ., 5(1), 1–28.
  13. Çalık M., Ayas A. and Ebenezer J. V., (2009), Analogical reasoning for understanding solution rates: students' conceptual change and chemical explanations, Res. Sci. Technol. Educ., 27(3), 283–308.
  14. Carmines E. G. and Zeller R. A., (1979), Reliability and validity assessment, Sage Publications.
  15. Cartrette D. P. and Mayo P. M., (2011), Students' understanding of acids/bases in organic chemistry contexts, Chem. Educ. Res. Pract., 12(1), 29–39.
  16. Cazden C. B. and Beck S. W., (2003), in Handbook of discourse processes, ed. Graesser A. C., Gernsbacher M. A. and Goldman S. R., Mahwah, New Jersey: Lawrence Erlbaum Associates, ch. 5, pp. 165–197.
  17. Chase A., Pakhira D. and Stains M., (2013), Implementing process-oriented, guided-inquiry learning for the first time: adaptations and short-term impacts on students' attitude and performance, J. Chem. Educ., 90(4), 409–416.
  18. Cho K. L. and Jonassen D. H., (2002), The effects of argumentation scaffolds on argumentation and problem solving, Educ. Technol. Res. Dev., 50(3), 5–22.
  19. Clement J., (2008), in International handbook of research on conceptual change, ed. Vosniadou S., pp. 417–452.
  20. Cohen J., (1988), Statistical power analysis for the behavioral sciences, New York: Lawrence Erlbaum Associates.
  21. Cole R., Becker N., Towns M., Sweeney G., Wawro M. and Rasmussen C., (2012), Adapting a methodology from mathematics education research to chemistry education research: documenting collective activity, Int. J. Sci. Math. Educ., 10(1), 193–211.
  22. Cooper M. M., (1995), Cooperative learning: an approach for large enrollment courses, J. Chem. Educ., 72(2), 162–164.
  23. Cooper M. M., Kouyoumdjian H. and Underwood S. M., (2016), Investigating students' reasoning about acid-base reactions, J. Chem. Educ., 93(10), 1703–1712.
  24. Coştu B., Ayas A. and Niaz M., (2010), Promoting conceptual change in first year students' understanding of evaporation, Chem. Educ. Res. Pract., 11(1), 5–16.
  25. Cronbach L. J., (1951), Coefficient alpha and the internal structure of tests, Psychometrika, 16(3), 297–334.
  26. Demerouti M., Kousathana M. and Tsaparlis G., (2004a), Acid–base equilibria, part I: upper secondary students' misconceptions and difficulties, Chem. Educ., 9(2), 122–131.
  27. Demerouti M., Kousathana M. and Tsaparlis G., (2004b), Acid–base equilibria, part II: effect of developmental level and disembedding ability on students' conceptual understanding and problem-solving ability, Chem. Educ., 9(2), 132–137.
  28. Driver R., Newton P. and Osborne J., (2000), Establishing the norms of scientific argumentation in classrooms, Sci. Educ., 84(3), 287–312.
  29. Duit R. and Treagust D. F., (1995), in Improving science education, ed. Fraser B. J. and Walberg H. J., University of Chicago, ch. 3, pp. 46–69.
  30. Duit R. H. and Treagust D. F., (2012), Issues and challenges in science education research, Springer, pp. 43–54.
  31. Erduran S., Simon S. and Osborne J., (2004), TAPping into argumentation: developments in the application of Toulmin's argument pattern for studying science discourse, Sci. Educ., 88(6), 915–933.
  32. Evans D. L., Gray G. L., Krause S., Martin J., Midkiff C., Notaros B. M., Pavelich M., Rancour D., Reed-Rhoads T., Steif P., Streveler R. and Wage K., (2003), Progress on concept inventory assessment tools, Westminster, Colorado.
  33. Ferguson R. and Bodner G. M., (2008), Making sense of the arrow-pushing formalism among chemistry majors enrolled in organic chemistry, Chem. Educ. Res. Pract., 9(2), 102–113.
  34. Fonseca B. A. and Chi M. T. H., (2011), in Handbook of research on learning and instruction, ed. Mayer R. E. and Alexander P. A., New York: Routledge, ch. 15, pp. 296–321.
  35. Furió-Más C., Luisa Calatayud M., Guisasola J. and Furió-Gómez C., (2005), How are the concepts and theories of acid-base reactions presented? Chemistry in textbooks and as presented by teachers, Int. J. Sci. Educ., 27(11), 1337–1358.
  36. Greene J. A., Azevedo R. and Torney-Purta J., (2008), Modeling epistemic and ontological cognition: philosophical perspectives and methodological directions, Educ. Psychol., 43(3), 142–160.
  37. Grove N. P. and Bretz S. L., (2012), A continuum of learning: from rote memorization to meaningful learning in organic chemistry, Chem. Educ. Res. Pract., 13(3), 201–208.
  38. Grove N. P., Hershberger J. W. and Bretz S. L., (2008), Impact of a spiral organic curriculum on student attrition and learning, Chem. Educ. Res. Pract., 9(2), 157–162.
  39. Gupta M. L., (2004), Enhancing student performance through cooperative learning in physical sciences, Assess. Eval. High. Educ., 29(1), 63–73.
  40. Hameed H., Hackling M. W. and Garnett P. J., (1993), Facilitating conceptual change in chemical equilibrium using a CAI strategy, Int. J. Sci. Educ., 15(2), 221–230.
  41. Hammann M., Reiss M., Boulter C. and Tunnicliffe S. D., (2008), Biology in context: learning and teaching for the twenty-first century: a selection of papers presented at the 6th Conference of European Researchers in Didactics of Biology, London: UCL Institute of Education.
  42. Hein S. M., (2012), Positive impacts using POGIL in organic chemistry, J. Chem. Educ., 89(7), 860–864.
  43. Hermann K. J., (2013), The impact of cooperative learning on student engagement: results from an intervention, Act. Learn. High. Educ., 14(3), 175–187.
  44. Inagaki K. and Hatano G., (2008), in International handbook of research on conceptual change, ed. Vosniadou S., New York: Routledge, pp. 240–262.
  45. Kelly G. J., Druker S. and Chen C., (1998), Students' reasoning about electricity: combining performance assessments with argumentation analysis, Int. J. Sci. Educ., 20(7), 849–871.
  46. Kenyon L. and Reiser B. J., (2006), Presented in part at the American Educational Research Association, San Francisco, CA.
  47. Kittleson J. M. and Southerland S. A., (2004), The role of discourse in group knowledge construction: a case study of engineering students, J. Res. Sci. Teach., 41(3), 267–293.
  48. Kulatunga U., Moog R. S. and Lewis J. E., (2013), Argumentation and participation patterns in general chemistry peer-led sessions, J. Res. Sci. Teach., 50(10), 1207–1231.
  49. Kulatunga U., Moog R. S. and Lewis J. E., (2014), Use of Toulmin's argumentation scheme for student discourse to gain insight about guided inquiry activities in college chemistry, J. Coll. Sci. Teach., 43(5), 78–86.
  50. Leach J. and Scott P., (2002), Designing and evaluating science teaching sequences: an approach drawing upon the concept of learning demand and a social constructivist perspective on learning, Stud. Sci. Educ., 38(1), 115–142.
  51. Lewis S. E. and Lewis J. E., (2005), Departing from lectures: an evaluation of a peer-led guided inquiry alternative, J. Chem. Educ., 82(1), 135.
  52. Limón M., (2001), On the cognitive conflict as an instructional strategy for conceptual change: a critical appraisal, Learn. Instr., 11(4), 357–380.
  53. Luxford C. J. and Bretz S. L., (2014), Development of the bonding representations inventory to identify student misconceptions about covalent and ionic bonding representations, J. Chem. Educ., 91(3), 312–320.
  54. Marsden E. and Torgerson C. J., (2012), Single group, pre-and post-test research designs: some methodological concerns, Oxf. Rev. Educ., 38(5), 583–616.
  55. McClary L. M. and Bretz S. L., (2012), Development and assessment of a diagnostic tool to identify organic chemistry students' alternative conceptions related to acid strength, Int. J. Sci. Educ., 34(15), 2317–2341.
  56. McClary L. and Talanquer V., (2011a), Heuristic reasoning in chemistry: making decisions about acid strength, Int. J. Sci. Educ., 33(10), 1433–1454.
  57. McClary L. and Talanquer V., (2011b), College chemistry students' mental models of acids and acid strength, J. Res. Sci. Teach., 48(4), 396–413.
  58. Michael J., (2006), Where's the evidence that active learning works? Adv. Physiol. Educ., 30(4), 159–167.
  59. Moog R. S., Spencer J. N. and Straumanis A. R., (2006), Process-oriented guided inquiry learning: POGIL and the POGIL project, Metrop. Univ., 17(4), 41–52.
  60. Moon A., Stanford C., Cole R. and Towns M., (2016), The nature of students' chemical reasoning employed in scientific argumentation in physical chemistry, Chem. Educ. Res. Pract., 17(2), 353–364.
  61. Mulford D. R. and Robinson W. R., (2002), An inventory for alternate conceptions among first-semester general chemistry students, J. Chem. Educ., 79(6), 739.
  62. Nakhleh M. B. and Krajcik J. S., (1994), Influence of levels of information as presented by different technologies on students' understanding of acid, base, and pH concepts, J. Res. Sci. Teach., 31(10), 1077–1096.
  63. Nystrand M., Gamoran A., Kachur R. and Prendergast C., (1997), Opening dialogue, New York: Teachers College.
  64. O'Loughlin M., (1992), Rethinking science education: beyond Piagetian constructivism toward a sociocultural model of teaching and learning, J. Res. Sci. Teach., 29(8), 791–820.
  65. Osborne J., Erduran S. and Simon S., (2004), Enhancing the quality of argumentation in school science, J. Res. Sci. Teach., 41(10), 994.
  66. Potvin P., (2017), The coexistence claim and its possible implications for success in teaching for conceptual change, Eur. J. Sci. Math. Educ., 5(1), 55–66.
  67. Potvin P., Sauriol É. and Riopel M., (2015), Experimental evidence of the superiority of the prevalence model of conceptual change over the classical models and repetition, J. Res. Sci. Teach., 52(8), 1082–1108.
  68. Rapp D. N., (2014), Processing inaccurate information: theoretical and applied perspectives from cognitive science and the educational sciences, Cambridge, MA: MIT Press.
  69. Rushton G. T., Hardy R. C., Gwaltney K. P. and Lewis S. E., (2008), Alternative conceptions of organic chemistry topics among fourth year chemistry students, Chem. Educ. Res. Pract., 9(2), 122–130.
  70. Sandoval W. A. and Millwood K. A., (2005), The quality of students' use of evidence in written scientific explanations, Cogn. Instr., 23(1), 23–55.
  71. Sawyer K., Frey R. and Brown P., (2013), in Productive multivocality in the analysis of group interactions, ed. Suthers D. D., Lund K., Rosé C. P., Teplovs C. and Law N., Springer, pp. 191–204.
  72. Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15(1), 10–23.
  73. Stains M. and Vickrey T., (2017), Fidelity of implementation: an overlooked yet critical construct to establish effectiveness of evidence-based instructional practices, CBE Life Sci. Educ., 16(1), rm1–11.
  74. Stoyanovich C., Gandhi A. and Flynn A. B., (2014), Acid–base learning outcomes for students in an introductory organic chemistry course, J. Chem. Educ., 92(2), 220–229.
  75. Talanquer V. and Pollard J., (2010), Let's teach how we think instead of what we know, Chem. Educ. Res. Pract., 11(2), 74–83.
  76. Teasley S. D., (1997), Discourse, tools and reasoning, Springer, pp. 361–384.
  77. Toulmin S. E., (1958), The uses of argument, Cambridge University.
  78. Towns M. and Kraft A., (2011), Review and synthesis of research in chemical education from 2000–2010, in Second Committee Meeting on the Status, Contributions, and Future Directions of Discipline-Based Education Research, available:
  79. Voss J. F., Tyler S. W. and Yengo L. A., (1983), Individual Differences in Cognition, vol. 1, pp. 205–233.
  80. Vygotsky L. S., (1980), Mind in society: the development of higher psychological processes, Harvard University.
  81. Vygotsky L. S., (1997), The collected works of L. S. Vygotsky: problems of the theory and history of psychology, Springer Science & Business Media.
  82. Warfa A.-R. M., (2015), Using cooperative learning to teach chemistry: a meta-analytic review, J. Chem. Educ., 93(2), 248–255.
  83. Warfa A.-R. M., Roehrig G. H., Schneider J. L. and Nyachwaya J., (2014), Role of teacher-initiated discourses in students' development of representational fluency in chemistry: a case study, J. Chem. Educ., 91(6), 784–792.
  84. Wells G., (2007), Semiotic mediation, dialogue and the construction of knowledge, Hum. Dev., 50(5), 244–274.

This journal is © The Royal Society of Chemistry 2018