Closing the gap of organic chemistry students’ performance with an adaptive scaffold for argumentation patterns

Leonie Sabine Lieber a, Krenare Ibraj a, Ira Caspari-Gnann b and Nicole Graulich *a
aJustus-Liebig University Giessen, Institute of Chemistry Education, Heinrich-Buff-Ring 17, 35392 Giessen, Germany. E-mail: Nicole.graulich@dc.jlug.de
bDepartment of Chemistry, Tufts University, 62 Talbot Ave, Medford, MA 02155, USA

Received 10th January 2022, Accepted 30th May 2022

First published on 31st May 2022


Abstract

Building reasonable scientific arguments is a fundamental skill students need to participate in scientific discussions. In organic chemistry, students’ argumentation and reasoning skills on reaction mechanisms are described as indicators of success. However, students often experience challenges with how to structure their arguments, use scientific principles appropriately and engage in multivariate, instead of one-reason decision-making. Since every student experiences their individual challenges with a multitude of expectations, we hypothesise that students would benefit from scaffolding that is adapted to their needs. In the present study, we investigated how 64 chemistry students interacted with an adaptive scaffold that offered different ways of support based on students’ strengths and limitations with structural and conceptual aspects that are needed to build a scientific argument in organic chemistry. Based on the students’ performance in a diagnostic scaffold in which they were asked to judge the plausibility of alternative organic reaction pathways by building arguments, the students were assigned to one of four support groups that received a scaffold adapted to their respective needs. Comparing students’ performance in the diagnostic and adapted scaffolds allows us to determine quantitatively (1) to what extent the adaptive scaffold closes the gap in students’ performance and (2) whether an adaptive scaffold improves the students’ performance in their respective area of support (argumentation and/or concept knowledge). The results of this study indicate that the adaptive scaffold can adaptively advance organic chemistry students’ argumentation patterns.


Introduction

The overall goal of chemistry education is to enable students to participate in scientific discourse (McGinn and Roth, 1999; Stowe et al., 2021) by providing them with the tools to think critically, explain phenomena, and build reasoned arguments (Driver et al., 1994). However, the way we want students to build arguments often differs from how students actually build, use, and understand argumentation (McNeill and Krajcik, 2012). The most common observation is that students experience difficulties supporting their claim with evidence and justifying it with reasoning when building arguments (Deng and Flynn, 2021; Petritis et al., 2021; Lieber and Graulich, 2022), as students often do not know what counts as evidence or reasoning (Sadler, 2004) or tend to limit their argumentation to evidence and overlook reasoning. Moreover, students experience challenges with using scientific rules and principles when connecting evidence and reasoning (McNeill and Krajcik, 2012; Walker et al., 2019). As a result, students may rely on personal views such as gut feelings (Hogan and Maglienti, 2001), among other reasons because this avoids uncertainties with scientific principles. In addition to the reported challenges with structuring an argument, applying conceptual understanding when building arguments can also be challenging. Difficulties with conceptual knowledge might hinder building chemically knowledgeable and multivariate arguments (Cruz-Ramirez de Arellano and Towns, 2014; Moon et al., 2016; Pabuccu and Erduran, 2017; Deng and Flynn, 2021; Lieber and Graulich, 2022). In such cases, students make use of single concepts to justify their claim, even when it is necessary to include multiple concepts in their decision-making (McNeill et al., 2006).

Ongoing research on student argumentation clearly shows that the reported difficulties are caused by (1) missing knowledge on how to structure an argument or missing activation of that knowledge for the problem at hand, (2) missing conceptual understanding or its application required for the argument, or (3) both. Given that prior knowledge has a major impact on students’ performance, students need to receive adapted support to build upon their strengths and limitations with argumentation, and the level of conceptual understanding they bring into the classroom (Chen, 2014). Scaffolds tailored to students’ needs may adaptively support them in solving a task on their own by purposefully guiding and slowing down specific aspects of the argumentation process. Such scaffolds may, thus, help students to direct their focus to the expected structure of an argument or to consider conceptual understanding that they may not have activated without a scaffold (Wood et al., 1976). Especially when building arguments for multivariate mechanistic tasks, slowing down the reasoning process with a scaffold has been shown to assist learners in first collecting multiple relevant chemical concepts and weighing them afterward before making a decision (Caspari and Graulich, 2019; Flynn, 2021; Watts et al., 2021). McNeill et al. (2006) emphasised that scaffolding, as a flexible process, should not be rigid. Instead, scaffolding should be adjusted to students’ needs.

Previous research on argumentation in chemistry demonstrated that (1) students experience challenges with building sound arguments, (2) students experience difficulties using appropriate scientific principles, and (3) scaffolds are powerful tools to address students’ needs. However, adaptive scaffolds designed to close this performance gap are still limited (Chen, 2014).

In this study, students received scaffolded training consisting of two consecutive parts (i.e., two different data points), which we refer to as an adaptive scaffold. In the first part of the adaptive scaffold, which we refer to as the ‘diagnostic scaffold’, students received practice and support in building arguments and using concept knowledge. Students’ answers in this diagnostic scaffold served as a diagnosis for the second part of the adaptive scaffold, in which students received one of four scaffolds. These adapted scaffolds addressed the previously mentioned difficulties, resulting in four adapted scaffolds: one for argumentation patterns, one for the use of concept knowledge, one for both argumentation patterns and the use of concept knowledge, and one for students with no apparent difficulties.

Therefore, this quantitative study reports on the effectiveness of the adaptive scaffold (i.e., a combination of a diagnostic scaffold and four adapted scaffolds), designed as an online learning environment to adaptively scaffold students based on their performance of building arguments for alternative organic reaction pathways. In this manner, we investigated the extent to which the adaptive scaffold closes the gap in organic chemistry students’ performance and whether the adaptive scaffold improved students’ performance.

Theoretical framework

Claim-evidence-reasoning (CER) argumentation model

Toulmin (2003) postulated that an argument consists of six elements: a claim that is supported by data, a warrant that bridges the gap between data and claim, and a backing for the warrant. The argument also includes a qualifier and a rebuttal that weaken the claim by questioning its necessity and whether there are exceptions to the rule (Toulmin, 2003). However, the model is often too complex to introduce an argument structure to students, e.g., distinguishing between data, warrant, and backing is challenging (Erduran, 2007). Moreover, both students and teachers reported that some elements of Toulmin's argumentation model cause ambiguity (Lazarou and Erduran, 2020). Toulmin already stated that not every argument necessarily has to consist of all six components. Instead, he emphasised that the core of an argument only includes claim, data, and warrant (Toulmin, 2003). For this reason, a simplified argumentation model, the CER model, is frequently used, which consists of three components: claim, evidence and reasoning (McNeill et al., 2006; McNeill and Krajcik, 2012). Both argumentation models have in common that a claim is supported by evidence and is justified with reasoning. Specifically, the CER claim is similar to Toulmin's claim, CER evidence is similar to Toulmin's data, and CER reasoning is similar to Toulmin's warrant and backing (Toulmin, 2003; McNeill et al., 2006; Osborne and Patterson, 2011). A claim is an assertion or statement that answers a problem or question (McNeill et al., 2006; McNeill and Krajcik, 2012; Van Eemeren et al., 2014). A claim is always in doubt and therefore cannot stand alone. Thus, the claim must be advanced by known data (Osborne and Patterson, 2011). This leads to the second element, evidence, which is based on scientific data that support the claim; these data have to be appropriate and sufficient (McNeill et al., 2006; McNeill and Krajcik, 2012).
Moreover, a claim can be supported with multiple pieces of evidence since complex problems often require multivariate argumentation (McNeill and Krajcik, 2012). The third element is reasoning which acts as a bridge between claim and evidence. Reasoning answers the question as to why pieces of evidence support a claim, which requires the use of scientific principles (McNeill et al., 2006). All elements are equally important as an argument is only complete when all three elements are present, i.e., when a claim is strengthened through evidence and justified by reasoning (McNeill and Krajcik, 2012).

Scaffolding

Scaffolding is widely used in science education (Belland, 2017; Yuriev et al., 2017; Fan et al., 2020; Luo et al., 2020; Flynn, 2021). It refers to multiple ways of assistance that support students’ learning and reasoning including modelling or targeted prompting (Van de Pol et al., 2010; Kang et al., 2014). Scaffolding has been defined by Wood et al. as a process that “enables a child or a novice to solve a problem, carry out a task or achieve a goal which would be beyond his unassisted efforts” (Wood et al., 1976, p. 90). Vygotsky (1980) built upon this idea when conceptualising the zone of proximal development, which describes that learners should, ideally, work in the zone of proximal development in which they are not able to solve a task on their own but with assisted guidance. A scaffold in this sense serves as an enabler for learners to complete tasks that they cannot solve alone (Wood et al., 1976; Pea, 2004; Lajoie, 2005; Kang et al., 2014). However, if a scaffold provides too much information, or structures the problem-solving process to a very large extent, it may no longer be as effective as students do not feel challenged (McNeill et al., 2006). As a result, the support provided is strongly dependent on the individual (Van de Pol et al., 2010). This gives rise to the need for adaptive scaffolding, as we acknowledge that students’ prior knowledge and former experience often differ significantly (Vygotsky, 1980; Pea, 2004). This is supported by a scaffold's core criterion of contingency, which includes tailored and differentiated support (Van de Pol et al., 2010). Thus, an adaptive scaffold can be supportive for students’ self-regulated learning since the students’ individual needs are integrated in the design of the scaffold (Azevedo et al., 2004).

As scaffolding is a temporary process assisting students if they need support, it is also important to fade out the given support when it is no longer needed (Lajoie, 2005; McNeill et al., 2006). However, appropriate fading of support depends on the tasks’ complexity and students’ progress (Kang et al., 2014). Fading too early may have adverse effects as students might not have yet fully understood certain concepts or activities (Noroozi et al., 2017).

Scaffolding argumentation

Scaffolding can take place in a wide variety of disciplines and is also finding increasing use as a support for argumentation. In a meta-analysis of argumentation in computer-supported collaborative learning, Wecker and Fischer (2014) revealed that the occurrence of specific functional components (i.e., whether and how often claim-evidence-reasoning occur in learning material) has a significant medium to large effect (Cohen's d = 0.72) on argumentation.

In addition to the benefits of using argument components, building strong arguments also involves the appropriate use of scientific concepts (Sandoval and Millwood, 2005; Choi et al., 2013; Lieber and Graulich, 2022). Therefore, CER scaffolding does not only provide support for the structure of arguments but can also be enhanced with the incorporation of concepts (McNeill et al., 2006; Songer and Gotwals, 2012). The interplay of concept knowledge and argumentation was demonstrated by Songer and Gotwals (2012) as students’ conceptual understanding increased by using CER scaffolding in a pre-post intervention. In addition to the main components of CER scaffolds such as argumentation and concept knowledge, the type of scaffold should also be considered.

Kang et al. (2014) suggested that four of the six types of instructional scaffolding (originally analysed for English language learners (Walqui, 2006)) fulfil different functions in the construction of evidence-based explanations in a scientific context. These different types of scaffolds can be combined to be beneficial for the students. The first type is instructional modelling, which gives students clear examples of what is expected from them. This is especially important when implementing a new principle or task. Accordingly, students need to see in advance what the finished product should look like (Walqui, 2006; Kang et al., 2014). In terms of argumentation and concept knowledge, this can be achieved by providing students with examples that illustrate the structure of an argument or presenting arguments that connect concept knowledge. The second type is bridging, which functions as a link between existing and new knowledge (Walqui, 2006). This can be accomplished in scaffolds by asking students targeted questions that both activate their prior knowledge and link it directly to new content. The third type is contextualising, which refers to using language in an appropriate context, as academic language is not only different from everyday language but often also intangible. For example, pictures or films can be used to support contextualisation (Walqui, 2006). When implementing this type of scaffold in argumentation, the visualisation of problem contexts or the illustration of the argument structure (in partial steps) is beneficial. The last type of scaffolding is developing metacognition, which fosters students’ ability to evaluate and reflect on their current state of knowledge (Walqui, 2006). This function of a scaffold can be realised by prompting students to assess the difficulty of a task and their confidence, to evaluate the tasks, and to indicate whether they need help in certain areas of the task.

Research questions

Since building well-grounded arguments with the use of appropriate scientific principles is an essential skill for organic chemistry, our goal was to design an adaptive scaffold that supports students in argumentation based on their needs, i.e., argumentation and/or concept knowledge support. Therefore, students were assigned to one of four groups of an adapted scaffold based on their performance on argumentation and concept knowledge in a diagnostic scaffold. This resulted in the overarching question whether the adaptive scaffold addresses students’ experience of difficulties and closes the performance gap in a pre-post comparison. This overarching question was divided into two main research questions.

(1) To what extent does the adaptive scaffold close the gap in students’ performance?

(a) To what extent do the group performances differ after the diagnostic scaffold based on scoring argumentation and concept knowledge?

(b) Do the group performances converge after the adapted scaffolds?

(2) Does the adaptive scaffold improve students’ performance in the respective area of support (argumentation and/or concept knowledge)?

To answer these two research questions, a quantitative analysis based on students’ answers of the diagnostic scaffold and adapted scaffolds was conducted.

Methods

Participants and study setting

The study was conducted at a private, liberal arts, research-intensive university in the North-eastern United States in April and May 2021. Sixty-four students were recruited voluntarily from the Organic Chemistry II course via an announcement in the lecture. Students received extra credit for their participation in the study, which was 1% of their final course grade. Organic Chemistry I is a prerequisite course for taking Organic Chemistry II. The Organic Chemistry I and II courses cover topics such as structure and properties of molecules, nucleophilic substitutions, carboxylic acids and their derivatives, conjugated π-systems, retrosynthetic analyses and synthetic strategies. It is assumed that students are familiar and knowledgeable with these organic chemistry topics as they were discussed in class before the study was conducted. The students enrolled in the course were between 18 and 22 years of age and were majoring in chemistry, biochemistry, chemical engineering, and biology, among others. Among the students that participated in this study, 34, 29, and 1 student identified themselves as female, male, and non-binary, respectively, of which 56.3%, 31.3%, 4.7%, 4.7%, 1.6%, and 1.6% identified as white, Asian, Black, more than one race, Latino/a/x, and other races, respectively. All students created a user code as a pseudonym. The pseudonyms assigned to participants for the study as well as given names in this manuscript do not reflect their race, ethnicity, gender, or other identities.

Research instrument

To investigate how adaptively scaffolding students’ argumentation patterns and concept knowledge affects students’ argumentation skills in organic chemistry, we designed a sequence of two online scaffolds using Qualtrics, with each scaffold consisting of multiple parts (see Steps A, B, and C in Fig. 1). Students had 48 hours to complete each of the scaffolds with a completion time of approximately 60 minutes. The goal of the diagnostic scaffold was to establish a baseline of students’ argumentation skills and their use of concept knowledge. Based on how the students performed in the diagnostic scaffold, they received one of four versions of an adapted scaffold that focuses on supporting them in the area in which they had experienced the greatest difficulties in the diagnostic scaffold. Based on our previous study (Lieber and Graulich, 2022), we assumed that the students experience challenges with either (1) the appropriate use of argument components (claim, evidence, reasoning), (2) the appropriate use of chemical concepts in their argumentation, (3) both the appropriate use of argument components and chemical concepts, or (4) the ability to build multivariate arguments. The tasks underlying the scaffolds followed our previously published task design (Lieber and Graulich, 2020; Lieber and Graulich, 2022), in which students are prompted to judge the plausibility of alternative reaction pathways, which are shown in Fig. 2. Exemplary arguments for the eight alternative reaction pathways are shown in Appendix 1.
Fig. 1 Illustration of the areas of support for the four adapted scaffold groups. Dependent on their scoring after the diagnostic scaffold, the groups received an adapted scaffold for argumentation (ArgS group in yellow), use of concept knowledge (ConS group in blue), argumentation and use of concept knowledge (ArgConS in green), or multivariate reasoning (ReaS group in purple).

Fig. 2 Students judged the plausibility of four alternative reaction products for the reaction of 4-chlorobutanol and hydroxide in the diagnostic scaffold and for the reaction of methyl acetate with diisopropylamide in the adapted scaffolds. The correct reaction products are highlighted in green. There are two correct products for each reaction because the left green molecules are the precursors of the correct products.
Diagnostic scaffold. The diagnostic scaffold used was the same for all students. It served as a pre-measure to determine the differences and difficulties that students have with argumentation on the plausibility of alternative organic reaction pathways. First, students were asked to predict the product of the reaction of 4-chlorobutanol with hydroxide and to justify their decision (see Fig. 2). This was followed by a task on general argumentation patterns. Here, students received the input that an argument consists of three basic components (claim, evidence, and reasoning), what function each component has, and a concrete example of how to use them in the building of arguments (see Step A, Fig. 1). It was also explicitly discussed that a claim can be supported with several pieces of evidence and that a piece of evidence can be justified with several reasoning statements. Following this explanation, students completed three tasks in which they were given a science-related argument and had to assign the basic components (claim, evidence, reasoning) to the appropriate sentence component. Students always had the option to indicate that they do not know which basic component can be assigned to which sentence component (see Appendix 2). After the task, students received sample solutions with accompanying explanations. The next step was the activation of chemical concepts related to the reaction of 4-chlorobutanol with hydroxide, where students were given directed questions covering different general chemical concepts, e.g., to decide whether steric aspects need to be considered in the reaction or to determine at which positions the molecules involved react as a nucleophile/an electrophile/an acid/a base (see Step B, Fig. 1 and Appendix 3). The questions were chosen to apply to any type of alternative reaction pathway students will encounter. 
After the argumentation pattern task and the activation of concept knowledge, students judged the plausibility of four alternative reaction products for the reaction of 4-chlorobutanol and hydroxide (see Step C, Fig. 1 and Appendix 4). In the first step, they stated their claim by deciding whether they think the reaction product shown is plausible or implausible. In a second step, they had to support the given claim with evidence and to justify it with reasoning. After building arguments for four alternative reaction products, the students had the opportunity to revise or defend their initial claim by indicating whether they still choose the reaction product they formed at the beginning of the diagnostic scaffold or whether they choose one or more of the alternative reaction products. The diagnostic scaffold ended with evaluation questions related to the argumentation task and the questions on chemical concepts.
Adapted scaffolds. The adapted scaffolds are comparable in style to the diagnostic scaffold. All four adapted scaffolds began with students predicting the product of the reaction of methyl acetate with diisopropylamide and justifying their decision (see Fig. 2). To repeat the structure of an argument, students received an argumentation task with new science-related arguments. Students were then given the same questions on chemical concepts as in the diagnostic scaffold, but answered them on the reaction of methyl acetate with diisopropylamide. They were also given four alternative reaction products for which they formed a claim, evidence, and reasoning. Lastly, students indicated whether they wanted to keep or revise their formed reaction product.

As described briefly above, the four adapted scaffold groups differed with regard to the additional support provided to the respective group of students.


Group 1. Argumentation support (ArgS). The ArgS group received additional support for structuring and building arguments, i.e., the three components claim, evidence, and reasoning. The students were given eight different science-related arguments, divided into sentence sequences, for which they had to decide whether each sequence was a claim, evidence or reasoning. Moreover, when students built arguments by themselves while judging the plausibility of alternative reaction products, a definition of the argument components (claim, evidence, and reasoning) was added at the positions where their formation was required.
Group 2. Concept knowledge support (ConS). The ConS group obtained additional support in using appropriate chemical concepts when building arguments. The targeted questions on chemical concepts prior to the building of arguments remained unchanged, as students were first asked to activate their prior knowledge to link it to the reaction in the task. When building their arguments for the alternative reaction products, students received author-generated answers to the previous concept questions but had to interpret them by themselves, e.g., they received the pKa values of the molecules but no information about the interpretation of pKa values.
Group 3. Argumentation + concept knowledge support (ArgConS). The ArgConS group received both the additional support for the structure of arguments from the ArgS group and the additional support for the use of chemical concepts from the ConS group.
Group 4. Multivariate reasoning support (ReaS). Since the ReaS group had already built appropriate arguments including correct chemical concepts, the students received neither additional argumentation pattern support nor concept knowledge support. Instead, the students were additionally encouraged to build multiple reasoning statements to justify their claims. Therefore, in the initial argumentation pattern task, they were given three sample science-related arguments that contained multiple reasoning statements. Moreover, when students built their arguments for the alternative reaction products, the ReaS group received three additional blank slots to encourage them to use further reasoning statements for evidence in their argumentation, while the other groups only needed to provide one reasoning statement.

Data collection

The diagnostic scaffold took place in the 10th week and the adapted scaffolds were provided in the 13th week of the 14-week semester. All data collection procedures received Institutional Review Board approval for human subjects research (STUDY00001480). Students gave informed consent for their participation in the study. The collected data contain student answers to demographic questions as well as students’ responses to the diagnostic scaffold and adapted scaffold.

Data analysis

All steps of the data analysis were discussed multiple times with the co-authors and the research group. To answer the research questions, the data analysis proceeded in three consecutive steps.
Step I: scoring students’ answers to determine their performance in the diagnostic scaffold. In our previous study it was assumed that students either experienced difficulties with building arguments, using concept knowledge, or argumentation building and using concept knowledge or experienced no challenges (Lieber and Graulich, 2022). In order to analyse students’ performance in the present study, the first step was to determine an argumentation score and a concept knowledge score for each student from their answers with the diagnostic scaffold.

The task considered for the argumentation score included all of the three argumentation pattern tasks (see Appendix 2), in which sentence sequences of an argument had to be assigned to the three components, claim, evidence, and reasoning, and the building of autonomous arguments in relation to the alternative reaction products (see Appendix 4). Students could receive up to three points on each of the three argumentation pattern tasks. One point each was awarded when all sentence sequences representing either claim, evidence or reasoning were labelled correctly. In the second part, in which students had built arguments for the alternative reaction products, four points were awarded for each alternative reaction product, i.e., two points for the pieces of evidence and two points for the reasoning statements. There were no points awarded for the claim since the students had a given claim for which they only had to decide whether the alternative reaction product was plausible or implausible. For the evidence and reasoning statements, attention was paid to whether they met the following requirements. Chemical correctness was disregarded at this point. An evidence statement was considered correct when the statement relates to the claim. Thereby, the statement does not have to refer to an explicit characteristic of the reaction given but answers the question why the alternative reaction product is (im)plausible. In addition, the statement must be objective, based on data, and an explanation rather than a description. A student example which meets the criteria of a piece of evidence is “the ester has acidic alpha protons”. This built evidence supports and refers to the claim (“The reaction product is plausible”) and answers the why-question because it is an objective statement, which is based on data. In our context, data refers to structural characteristics of the molecule as well as implicit properties, e.g., acidity, as no experimental data are given. 
A student example which does not meet the criteria of a piece of evidence is “in the reaction above, what is shown is a SN2 reaction”. This self-declared piece of evidence does not support the claim in answering the why-question, but the statement could serve as a claim itself. However, the claims (i.e., plausible or implausible) are given, so students were not asked to build a claim by themselves but only to choose one. A reasoning statement was considered correct when a justification was provided as to why the evidence fits the claim. Reasoning must be objective, logical, and based on scientific principles. Thereby, it was not important whether the scientific principles were chemically correct but whether students considered scientific principles in general. A student example which meets the criteria of a reasoning statement is “these protons are acidic because the enolate conjugate can be resonance stabilized”. The student applied scientific principles to justify why the evidence fits the claim and the statement is objective. A student example which does not meet the criteria of a reasoning statement is “based on the assumption of the previous argument, the deprotonation of OH would not occur”. In this statement, no scientific principles are applied. Moreover, this statement does not serve as a justification but seems more like a conclusion. All evidence and reasoning statements were evaluated according to the criteria mentioned above. Two points (and thus full points) were awarded when all statements were formed correctly. One point was awarded when at least 50% of the statements were correct and zero points when less than 50% of the statements were correct. In total, the argumentation score consisted of 25 points.
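
The point rules described above can be summarised in a minimal Python sketch. It is only an illustration of the stated rules, not the authors' scoring procedure, which relied on human judgement of each statement. All function names, the dictionary layout of the inputs, and the handling of an empty answer (scored as zero) are assumptions made for this example.

```python
from typing import Dict, Sequence

COMPONENTS = ("claim", "evidence", "reasoning")


def statement_points(correct_flags: Sequence[bool]) -> int:
    """Return 0-2 points for a set of evidence or reasoning statements.

    Each flag is a rater's judgement of one statement (True = meets criteria).
    """
    if not correct_flags:  # assumption: no statement given -> zero points
        return 0
    share = sum(correct_flags) / len(correct_flags)
    if share == 1.0:
        return 2  # all statements formed correctly
    if share >= 0.5:
        return 1  # at least 50% correct
    return 0      # less than 50% correct


def pattern_task_points(labelled: Dict[str, Sequence[bool]]) -> int:
    """One point per component whose sentence sequences are all labelled correctly."""
    return sum(1 for c in COMPONENTS if labelled.get(c) and all(labelled[c]))


def argumentation_score(pattern_tasks, products) -> int:
    """Sum of pattern-task points (3 tasks x 3) and product points (4 x 4)."""
    pattern_total = sum(pattern_task_points(t) for t in pattern_tasks)
    product_total = sum(
        statement_points(p["evidence"]) + statement_points(p["reasoning"])
        for p in products
    )
    return pattern_total + product_total
```

With three fully correct pattern tasks and four products with fully correct evidence and reasoning, this sketch returns the stated maximum of 3 × 3 + 4 × 4 = 25 points.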

The concept knowledge score consists of both the answers to the questions on chemical concepts (Step B in Fig. 1) and the application of concept knowledge in building autonomous arguments (see Appendix 4). In the first part, students were given questions on different chemical concepts (see Appendix 3). While for most questions, students could receive a maximum of two points, they could receive a maximum of three points for the question on electronic effects as this question covered more than one concept a student could consider. In the second part, the use of chemical concepts in building arguments was assessed. Two points could be obtained for each alternative reaction product. The correct structural formation of evidence and reasoning was not scored. Zero points were given if no chemical concepts were used. One point was awarded if chemical concepts were used but at least 50% of them were incorrect, and two points were awarded if more than 50% of the concepts were used correctly. In total, the concept knowledge score consisted of 29 points.
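
The per-product rule for the second part of the concept knowledge score can be sketched in the same way. This is a hypothetical illustration only: the function names and input layout are assumptions, and the points for the concept questions are passed in as a single number because the individual questions are not listed here. The maximum of 21 points for the question part used in the example is inferred from the arithmetic 29 − 4 × 2 and is not stated explicitly in the text.

```python
from typing import Sequence


def concept_points(concepts_correct: Sequence[bool]) -> int:
    """Return 0-2 points for the chemical concepts used in one argument.

    Each flag is a rater's judgement of one concept (True = used correctly).
    """
    if not concepts_correct:
        return 0  # no chemical concepts used
    share_correct = sum(concepts_correct) / len(concepts_correct)
    # More than 50% correct -> 2 points; otherwise at least 50% are incorrect -> 1 point
    return 2 if share_correct > 0.5 else 1


def concept_knowledge_score(question_points: int, products) -> int:
    """Add the concept-question points to the points for each reaction product."""
    return question_points + sum(concept_points(p) for p in products)
```

Note that under the wording of the rule, an argument in which exactly half of the concepts are correct receives one point, not two.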

Step II: grouping into the four adapted scaffold groups. To assign each student to one of the four adapted scaffolding groups (support for argumentation (ArgS), support for concept knowledge (ConS), support for argumentation and concept knowledge (ArgConS), and support for multivariate reasoning (ReaS)), the argumentation score and the concept knowledge score were determined for each student. Here, a qualitative discussion within the research team took place for each student. We chose a percentage of the total score as the threshold, namely 65%, as it corresponds to the passing D grade at many US universities. Students who did not reach the threshold score received further support in argumentation patterns and/or the use of concept knowledge in argumentation. The threshold corresponds to 16 points (of 25 points) for the argumentation score and 18 points (of 29 points) for the concept knowledge score. These scores were continuously assessed so that each student received the support that corresponded to their performance. Fig. 3 illustrates the grouping of the four adapted scaffold groups. Students were assigned to the ArgS group when the argumentation score was less than 16 and the concept knowledge score was at least 18. The ConS group included students whose argumentation score was at least 16 and whose concept knowledge score was less than 18. When the argumentation score was less than 16 and the concept knowledge score was less than 18, a student was assigned to the ArgConS group. Students whose scores reached or exceeded both thresholds (16 for the argumentation score and 18 for the concept knowledge score) were assigned to the ReaS group.
Fig. 3 Grouping of the four adapted scaffold groups.
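The threshold rule behind this grouping can be written as a short sketch. The names are hypothetical, and rounding the 65% threshold down to whole points is an assumption that happens to reproduce the reported values of 16 and 18. The sketch covers only the numerical rule, not the qualitative discussion within the research team.

```python
import math

ARG_MAX, CON_MAX = 25, 29   # maximum scores in the diagnostic scaffold
PASS_FRACTION = 0.65        # threshold as a fraction of the total score

# Assumption: thresholds are rounded down to whole points (16 and 18).
ARG_THRESHOLD = math.floor(PASS_FRACTION * ARG_MAX)
CON_THRESHOLD = math.floor(PASS_FRACTION * CON_MAX)


def assign_group(arg_score: float, con_score: float) -> str:
    """Map a student's diagnostic scores to one of the four support groups."""
    needs_arg = arg_score < ARG_THRESHOLD
    needs_con = con_score < CON_THRESHOLD
    if needs_arg and needs_con:
        return "ArgConS"
    if needs_arg:
        return "ArgS"
    if needs_con:
        return "ConS"
    return "ReaS"
```

Applied to the diagnostic scores of the four example students reported in the results, the rule returns the groups named there: (15, 24) gives ArgS, (21, 17) gives ConS, (10, 16) gives ArgConS, and (23, 26) gives ReaS.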
Step III: quantitative analysis of students’ scores of the diagnostic scaffold and adapted scaffolds. Statistical methods were used to answer the research questions, and all analyses were conducted using the software R. For this purpose, students’ answers in the adapted scaffolds were scored in the same way as the answers in the diagnostic scaffold. The numbers and types of arguments in the task on argumentation patterns (see Step A, Fig. 1) differed in the adapted scaffolds. Therefore, we decided to only use students’ arguments built for the alternative reaction pathways (see Step C, Fig. 1 and Appendix 4) to determine the argumentation score. To allow a comparison between the diagnostic scaffold and the adapted scaffolds, the argumentation score of the diagnostic scaffold was likewise recalculated using only students’ answers to Step C. As a result, the new maximum argumentation score for the diagnostic scaffold and the adapted scaffolds was 16 points, whereas the maximum concept knowledge score remained 29 points.

First, a visual inspection of the normal distribution of students’ argumentation and concept knowledge scores was performed, which was supported by a Shapiro–Wilk test. This analysis revealed that the data were not normally distributed, which indicated the use of non-parametric tests. For all measurements, an α-level of 0.05 was used.

To determine to what extent the group performances differ after the diagnostic scaffold and whether the group performances converge after the adapted scaffolds, a Kruskal–Wallis test and subsequent post hoc comparisons with Wilcoxon rank-sum tests and Bonferroni-adjusted p-values were performed (Field et al., 2012). The Kruskal–Wallis test, the non-parametric counterpart of the one-way ANOVA, was chosen to compare more than two independent samples of different sample sizes. As the Kruskal–Wallis test only indicates whether any groups differ significantly, subsequent post hoc comparisons are necessary to identify which groups differ. In case of significant results in the post hoc comparisons, the correlation coefficient r was calculated from the z-score as a measure of effect size (Rosenthal, 1991). The effect was interpreted as small for 0.10 ≤ r < 0.30, medium for 0.30 ≤ r < 0.50, and large for r ≥ 0.50 (Cohen, 1992).

To determine whether the adaptive scaffold improved students’ performance in the areas of support, a Wilcoxon signed-rank test with Bonferroni-adjusted p-values was performed. The Wilcoxon signed-rank test as the non-parametric counterpart of the dependent t-test was chosen since we wanted to compare two dependent samples, i.e., pre-post comparisons of the same group. In the case of significant results, the correlation coefficient r was reported as the effect size.
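The quantities named in the two preceding paragraphs can be illustrated with a minimal, dependency-free sketch. It is not the R code used in the study: the function names are hypothetical, only the test statistics (not the p-values) are computed, the direction of the paired difference (post minus pre) is an assumption, and the "negligible" label for r < 0.10 is added only so that the classification function is complete.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of independent samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))  # undefined if all values are equal


def signed_rank_v(pre, post):
    """Wilcoxon signed-rank statistic V: sum of the ranks of the positive
    (post - pre) differences, with zero differences dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def effect_size_r(z, n):
    """Effect size r = |z| / sqrt(N), with N the number of observations."""
    return abs(z) / math.sqrt(n)


def effect_label(r):
    """Classify r using the cut-offs of 0.10, 0.30, and 0.50."""
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "negligible"  # assumption: label not used in the study
```

For example, `effect_label(0.43)` returns "medium" and `effect_label(0.58)` returns "large", matching the interpretation of the effect sizes reported in the results.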

Results and discussion

RQ1: To what extent does the adaptive scaffold close the gap in students’ performance?

To determine the extent to which an adaptive scaffold closes the gap in students’ performance, a Kruskal–Wallis test was performed. The analysis showed whether the groups significantly differ after the diagnostic scaffold, which serves as a pre-measure, to verify the qualitative grouping of the students. In case of significant results, subsequent post hoc comparisons with Wilcoxon rank-sum tests and Bonferroni-adjusted p-values were performed. The statistical results are summarised in Table 1.
Table 1 Results of the post hoc comparisons with Bonferroni-adjusted p-values of the Kruskal–Wallis test of the argumentation score and concept knowledge score after the diagnostic scaffold. The correlation coefficient r is reported as effect size in case of significant p-values. Significant results are highlighted in bold
| Comparisons | Argumentation score: M first group | M second group | p | r | Concept knowledge score: M first group | M second group | p | r |
|---|---|---|---|---|---|---|---|---|
| ArgS vs. ConS | 10 | 13 | **0.001** | 0.58 | 21 | 16 | **<0.001** | 0.79 |
| ArgS vs. ArgConS | 10 | 8.5 | 0.056 | | 21 | 13.5 | **<0.001** | 0.79 |
| ArgS vs. ReaS | 10 | 14 | **<0.001** | 0.68 | 21 | 21 | >0.999 | |
| ConS vs. ArgConS | 13 | 8.5 | **<0.001** | 0.78 | 16 | 13.5 | **0.003** | 0.51 |
| ConS vs. ReaS | 13 | 14 | >0.999 | | 16 | 21 | **<0.001** | 0.78 |
| ArgConS vs. ReaS | 8.5 | 14 | **<0.001** | 0.77 | 13.5 | 21 | **<0.001** | 0.77 |


After the diagnostic scaffold (pre-measure), the groups differed significantly in terms of their argumentation score, H(3) = 43.97, p < 0.001. Post hoc comparisons revealed significant differences with large effects in four of six group comparisons, indicated with black lines on the left side in Fig. 4. However, the ArgS group (yellow) and the ArgConS group (green) as well as the ConS group (blue) and the ReaS group (purple) did not vary significantly in the pre-measure. This sheds light on the appropriateness of the qualitative grouping: in the pre-measure, both groups with students’ argumentation scores below the threshold of 16 (ArgS and ArgConS) differed significantly with a large effect from the two groups in which students’ argumentation scores were above the threshold (ConS and ReaS) and who will therefore not receive additional argumentation support. Moreover, the pre-measure of the ArgS and ArgConS groups, who will receive an adapted scaffold for argumentation patterns, did not vary significantly from each other. The ConS and ReaS groups also did not differ significantly after the diagnostic scaffold and will consequently not receive an adapted scaffold for argumentation patterns. Fig. 4 illustrates the non-significant comparisons after the diagnostic scaffold (pre-measure) with dashed black lines. These results indicate that the qualitative grouping was successful for the argumentation score because the ArgS and ArgConS groups, who will receive an adapted scaffold for argumentation patterns, differed significantly from the ConS and ReaS groups, who will not receive additional support for argumentation patterns.


Fig. 4 Differences between the four groups after the diagnostic scaffold (shown in black lines) and after the adapted scaffold (shown in grey lines). In case of non-significant differences, the black and grey lines are dashed. Groups who received an adapted scaffold in the respective area are highlighted with a grey background circle. Significance levels of the group comparisons are indicated (*p < 0.05; **p < 0.01; ***p < 0.001).

After the diagnostic scaffold (pre-measure), the groups were found to differ significantly in terms of their concept knowledge score, H(3) = 50.92, p < 0.001. Subsequent post hoc comparisons demonstrated that five of the six group comparisons differed significantly with large effects, which is illustrated with black lines on the right side of Fig. 4 and summarised in Table 1. Comparison of the concept knowledge score revealed that the two groups who will subsequently not receive additional information on chemical concepts (ArgS and ReaS) did not differ significantly. The two groups who will receive an adapted scaffold on concept knowledge (ConS and ArgConS) differed significantly from each other in the pre-measure, with a large effect. The fact that these two groups differ from each other, but also from the two other groups (ArgS and ReaS), who will not receive an adapted scaffold on concept knowledge, shows that the ConS and ArgConS groups received their adapted scaffold on a legitimate basis. This means that the qualitative grouping was also successful for the concept knowledge score. It is not surprising that the ArgConS group is significantly different from all of the other groups since these students received distinctly fewer points in the concept knowledge score. These differences in score can also be observed through qualitative observations of the students’ answers, as many questions (e.g., on nucleophilicity and electrophilicity, steric effects, or electronic effects) were either answered incorrectly or with the phrase “I don't know.”

After analysing the differences in students’ performance after the diagnostic scaffold (pre-measure), we report the results of the Kruskal–Wallis test and subsequent post hoc comparisons of the group performances after the adapted scaffolds (post-measure). In terms of the argumentation score, the groups did not differ significantly in the post-measure, H(3) = 4.79, p = 0.188. This is illustrated with dashed grey lines in Fig. 4. Therefore, it can be assumed that after the adapted scaffolds, the groups have converged in terms of their argumentation score, which means that no significant differences were measurable. Regarding the concept knowledge score, the groups differed significantly from each other in the Kruskal–Wallis test, H(3) = 10.46, p = 0.015, but the subsequent post hoc tests with Bonferroni-adjusted p-values showed that only two groups differ significantly from each other after the adapted scaffolds (post-measure). The ArgConS and ReaS groups still differ significantly with a medium effect after the adapted scaffolds in the concept knowledge score, which is illustrated in Fig. 4 with a grey line. None of the other group comparisons showed a significant difference in the post hoc comparisons (see Table 2). This reveals that the four groups also largely converged in terms of the use of concept knowledge and that the performance gap diagnosed beforehand is, with this one exception, no longer significant in the post-measure. This suggests overall that the adaptive scaffold supported the respective groups and closed the gap in their performance. The difference between the ArgConS and ReaS groups can be understood when considering them as the two ends of a continuum. Students in the ArgConS group received the lowest scores in the concept knowledge score whereas the students of the ReaS group achieved the highest scores in this category. This is also noticeable in the pre-measure since the ConS and ArgConS groups differed significantly although both groups will receive an adapted scaffold for concept knowledge. However, these two extremes moved slightly closer to each other, as demonstrated by the difference between the median values of the ReaS group and the ArgConS group in the diagnostic scaffold (21 − 13.5 = 7.5) and in the adapted scaffolds (25 − 18 = 7).

Table 2 Results of the post hoc comparisons with Bonferroni-adjusted p-values of the Kruskal–Wallis test of the concept knowledge score after the adapted scaffolds (post-measure). The correlation coefficient r was reported as effect size in case of significant p-values. Significant results are highlighted in bold
| Comparisons | Concept knowledge score (adapted scaffolds): M first group | M second group | p | r |
|---|---|---|---|---|
| ArgS vs. ConS | 20 | 19 | >0.999 | |
| ArgS vs. ArgConS | 20 | 18 | >0.999 | |
| ArgS vs. ReaS | 20 | 25 | 0.543 | |
| ConS vs. ArgConS | 19 | 18 | >0.999 | |
| ConS vs. ReaS | 19 | 25 | 0.188 | |
| ArgConS vs. ReaS | 18 | 25 | **0.013** | 0.43 |


Overall, it became apparent that in the pre-measure, the groups who will receive additional support in the adapted scaffolds differed significantly, with a large effect, from those who will not receive support. This confirms that the grouping after the diagnostic scaffold was successful and reveals that a performance gap was present. After the adapted scaffolds (post-measure), the groups did not differ significantly in terms of the argumentation score and, with one exception, the concept knowledge score. The ArgConS group constitutes this exception, as it differed significantly from all the other groups regarding the concept knowledge score in the pre-measure and from the ReaS group in the post-measure. This is because the ArgConS group was conceptually weaker than all three other groups after both the diagnostic scaffold and the adapted scaffold.

RQ2: Does an adaptive scaffold improve students’ performance in the respective area of support (argumentation and/or concept knowledge)?

To compare the performance of each group in terms of argumentation score and concept knowledge score, a Wilcoxon signed-rank test with Bonferroni-adjusted p-values was performed. Fig. 5 summarises and compares the performance of each group in the diagnostic scaffold (pre) and adapted scaffolds (post). Detailed results are shown in Appendix 5. The following examples present arguments as originally built by the students. All students received free-text boxes labeled as evidence and reasoning, which allowed them to decide independently how to assign statements to evidence and reasoning, as well as how to number and order the statements (see Appendix 4, Fig. 8). Therefore, certain pieces of evidence and reasoning statements do not fulfill all required characteristics of argument components. Moreover, certain statements are not technically correct regarding scientific principles. However, these students had never built arguments before; hence, it is not surprising that they did not build perfect arguments. Instead, they improved their argumentation during the adaptive scaffold.
Fig. 5 Argumentation scores and concept knowledge scores of the four groups after the diagnostic scaffold (pre) and adapted scaffolds (post). Groups who received an adapted scaffold in the respective area are highlighted with a grey background. Horizontal stripes in box plots indicate median values. Significance levels of the pre- and post-comparisons are indicated (NS p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001).

The ArgS group increased significantly in the argumentation score from pre- (M = 10) to post- (M = 14.5) measure, with a large effect (V = 107.5, p = 0.007, r = 0.68). For the concept knowledge score, on the other hand, no significant change from pre (M = 21) to post (M = 20) was measurable (V = 47.5, p = 0.299). Therefore, the adapted scaffold on argumentation patterns supported the students significantly in building arguments but not in the use of concept knowledge. This result is encouraging as the students only received additional support on argumentation patterns. Louis, a participant in this study, serves as a student example of the ArgS group. He received 15/25 points in the argumentation score and 24/29 points in the concept knowledge score in the diagnostic scaffold, which resulted in assigning him to the ArgS group. After receiving an adapted scaffold for building arguments, Louis received 16/16 points in the argumentation score and 27/29 in the concept knowledge score. In the diagnostic scaffold, he claimed that tetrahydrofuran (THF) is an implausible product of the reaction of 4-chlorobutanol and hydroxide. Louis built his argument using the free-text boxes labeled as evidence and reasoning as follows:

Claim: The reaction product is implausible.

Evidence 1: While it is plausible that the OH− will initially deprotonate the alcohol, the molecule must contort significantly to put the alkoxide group next to the C–Cl bond.

Reasoning 1: The likelihood of getting the alkoxide group next to the C–Cl bond of the same molecule before, say, the alkoxide reacts with the C–Cl bond of a neighboring identical molecule is unlikely because the molecule would have to bend in on itself. In addition, entropics don’t favor the reaction because a ring has less entropy than a straight chain.

It is noticeable that Louis, referring to the argument structure, related the evidence to the claim and even explicitly mentioned what he claimed implausible regarding the formation of THF. However, after that, he did not clearly separate the argument components evidence and reasoning. In the first part of his reasoning statement, he described the evidence again in other words, but did not explain why “the molecule must contort significantly”. Moreover, in the second part of his reasoning statement, there is another piece of evidence and corresponding reasoning statement regarding entropy.

In the adapted scaffold, Louis then received additional argumentation support and built the argument shown below for the reaction of methyl acetate and lithium diisopropylamide (LDA) to methyl acetoacetate via a Claisen condensation.

Claim: The reaction product is plausible.

Evidence 1: The amine is more likely to act as a base than nucleophile.

Reasoning 1: The amine is sterically bulky so it can’t approach an electrophile easily but can deprotonate a molecule.

Evidence 2: The amine is basic.

Reasoning 2: Since the amine has a negative charge and doesn’t have resonance structures it will be stabilized by receiving a proton and become neutral.

Evidence 3: The ester has acidic alpha protons.

Reasoning 3: These protons are acidic because the enolate conjugate can be resonance stabilized.

Evidence 4: The enolate can attack an ester because the carbanion is nucleophilic and the carbonyl is electrophilic.

Reasoning 4: The carbanion is nucleophilic because it is negatively charged and the carbonyl is electrophilic because the oxygen pulls electron density away.

At first glance, it becomes clear that Louis separated evidence from reasoning. Each of his pieces of evidence refers to the claim and answers the why-question. Furthermore, his reasoning statements consist of scientific principles that justify the evidence. A comparison of these two arguments shows that Louis improved considerably in building arguments, not only by distinguishing between evidence and reasoning statements but also by building more pieces of evidence and reasoning, which gives depth to his argument. However, Louis can continue to improve in building arguments in the future. For example, he can concretize evidence 1 by already mentioning the structure of the amine. Furthermore, he can split evidence 4 and consider the nucleophilicity and electrophilicity separately from each other to address the electronic properties of the molecules more specifically in his reasoning statements.

When looking at the ConS group, the data analysis revealed a comparable trend: in this group, there was no significant change in the argumentation score (pre-M = 13, post-M = 14, V = 30, p = 0.823). In comparison, the concept knowledge score increased significantly from pre (M = 16) to post (M = 19), with a large effect (V = 82, p = 0.011, r = 0.66). This means that the adapted scaffold on the use of concept knowledge supported the students significantly in using concept knowledge but not in building argumentation patterns. This result is also encouraging as students in the ConS group only received additional support in the use of concept knowledge. Jessica, a participant in this study, serves as an example for the ConS group since she received 21/25 points in the argumentation score but 17/29 points in the concept knowledge score. After she worked with an adapted scaffold for the use of concept knowledge, she obtained 13/16 points in the argumentation score and 25/29 points in the concept knowledge score.

Jessica's example argument is also regarding the formation of THF as a reaction product from 4-chlorobutanol and hydroxide.

Claim: The reaction product is plausible.

Evidence 1: It forms stable products.

Reasoning 1: The Cl− is more stable than the OH, H2O is more stable than OH, and a membered ring with oxygen is relatively stable.

Evidence 2: There could be conditions where this is the most favorable reaction.

Reasoning 2: Though the diol attacking itself to form a ring is likely not the most kinetically favorable reaction, sometimes certain conditions promote reactions like that.

Jessica supported her claim with pieces of evidence by stating that the products are stable and that the reaction would be possible under certain conditions. Both pieces of evidence are rather vague and do not give a concrete indication for which conceptual reasons the formation of THF is plausible. While Jessica has not used any incorrect concepts per se, she did not elaborate on them. Instead, she justified stable products by comparing stability, but without elaborating on the reasons for stability. She also did not specify the reaction conditions. However, after getting additional information on chemical concepts in the adapted scaffold, such as the pKa values of the involved molecules or the electronegativity values, she built the following argument on the formation of methyl acetoacetate from methyl acetate and LDA.

Claim: The reaction product is plausible.

Evidence 1: The product is not entropically unfavorable.

Reasoning 1: 3 molecules become 3 molecules

Evidence 2: The negative charge on the product is stabilized by resonance.

Reasoning 2: It can put negative charge on two different oxygens (allylic to two C=O bonds).

Evidence 3: The negative charge is stabilized by inductive effects.

Reasoning 3: It is allylic to two C=O bonds, which are electron-withdrawing.

Evidence 4: The ester will be more likely to donate a proton.

Reasoning 4: It has a lower pKa than the amine.

Evidence 5: The final products are all stable.

Reasoning 5: Methanol and the amine are both stable as well as the ester and enolate compound.

Jessica, like Louis in the ArgS group, improved the quality of her argument, in her case with respect to the concept knowledge used. With the help of the additional concept information, which she still had to interpret herself, Jessica used concepts such as entropy, electronic effects, and acidity. In contrast to the diagnostic scaffold, she did not remain vague but used explicit scientific principles to support and justify her claim. Jessica tried to include a variety of scientific principles in building her arguments, since, for example, the consideration of entropy alone would not have been a sufficient justification. Furthermore, she largely avoided building reasoning statements that merely repeat statements already used in her pieces of evidence, although this is still apparent in reasoning 5. This is an evident improvement in her performance after the adapted scaffold. Nevertheless, Jessica could continue to improve in building arguments in the future. In evidence 5 and reasoning 5, it becomes apparent that she should engage again with the concept of stability, as she is unable to provide a satisfactory justification for the stability of molecules in both the diagnostic scaffold and the adapted scaffold.

The ArgConS group, who received support in both argumentation and concept knowledge, achieved a significant increase with a large effect in both the argumentation score (pre-M = 8.5, post-M = 13.5, V = 193.5, p < 0.001, r = 0.74) and the concept knowledge score (pre-M = 13.5, post-M = 18, V = 158.5, p = 0.011, r = 0.57). The student Mike is exemplary of the ArgConS group; he received 10/25 points in the argumentation score and 16/29 points in the concept knowledge score in the diagnostic scaffold. After working with the adaptive scaffold on building arguments and using concept knowledge, he obtained 16/16 points in the argumentation score and 23/29 points in the concept knowledge score. In Mike's example argument, he claimed that an alkoxide is a plausible product of the reaction of 4-chlorobutanol and hydroxide.

Claim: The reaction product is plausible.

Evidence 1: The oxygen is better stabilized.

Reasoning 1: On a larger molecule, the negative charge is better stabilized because it can be stabilized through resonance.

Evidence 2: The smaller molecule is more stable.

Reasoning 2: A water molecule is much more stable than a hydroxyl group.

Like most of his fellow students, Mike tried to build a reasoning statement for each piece of evidence. Evidence 1 meets all conditions of an evidence statement: it answers the why-question and is an explanation rather than a description. Reasoning 1 also formally meets the criteria, but technical deficiencies appear; for example, the alkoxide cannot be stabilized by resonance, which is a common misconception (Carle and Flynn, 2020). In the second part of the argument, Mike used the same argument as Jessica with respect to stability, as he tried to justify ‘the stability of molecules with their stability’ instead of justifying the stability with chemical concepts (see evidence 2 and reasoning 2). In reasoning statement 2, he did not provide any further information, so the justification of the evidence is not apparent. Furthermore, he remained vague in evidence 2 because without the reasoning statement it is not clear which molecule he referred to as “smaller molecule”. Mike then worked on an adapted scaffold for building arguments and using concept knowledge, building the following argument for the formation of methyl acetoacetate from methyl acetate and LDA.

Claim: The reaction product is plausible.

Evidence 1: The reaction is entropically favored.

Reasoning 1: There are more products than reactants which leads to increasing disorder.

Evidence 2: The oxygen is stable with the negative charge.

Reasoning 2: Oxygen is electronegative and can stabilize the negative charge after the molecule is rearranged via resonance.

Evidence 3: The nitrogen is not very stable with the negative charge.

Reasoning 3: The nitrogen is not very electronegative and thus cannot stabilize the charge as well.

Evidence 4: The negatively charged product is larger.

Reasoning 4: The negative charge can be better stabilized.

Evidence 5: The ester is a better leaving group.

Reasoning 5: The negatively charged oxygen is protonated, which makes its formation more favorable.

From an argument structure point of view, Mike improved considerably. All of his built pieces of evidence supported the claim and all reasoning statements justified the evidence. He also used scientific principles as justification. From a conceptual point of view, Mike has improved, but this does not mean that all statements were technically correct. For example, he talked about the reaction being entropically favoured due to a higher number of products compared to reactants, which is incorrect because there are three molecules involved in the reaction on both the reactant and product side. Furthermore, in evidence 5, he referred to an ester as a leaving group. Here it is impossible to understand which molecule Mike was referring to as the leaving group since no ester is split off during the reaction. Nevertheless, Mike's improvement is evident across all arguments, both in terms of argument structure and the use of concept knowledge. In the future, Mike could further be supported in building arguments. Thereby, he can concretize his justification, for example by providing a counterpart in comparisons (e.g., X is a better leaving group than Y or molecule X is larger than molecule Y).

In contrast to the other groups, students in the ReaS group did not achieve a significant improvement in either the argumentation score from pre (M = 14) to post (M = 16), V = 34, p = 0.534, or the concept knowledge score from pre (M = 21) to post (M = 25), V = 51, p = 0.118. This is not surprising because the students already received high scores in the diagnostic scaffold, which left little room for a significant increase. Rachel is a student representative of the ReaS group. She received 23/25 points in the argumentation score and 26/29 points in the concept knowledge score in the diagnostic scaffold. For the formation of THF as a product of the reaction of 4-chlorobutanol and hydroxide, Rachel built the following argument.

Claim: The reaction product is plausible.

Evidence 1: Intramolecular reactions are faster than intermolecular ones.

Reasoning 1: Rate is dependent on concentration of substrate. When the reactants are connected, there is essentially limitless substrate and thus this reaction can take place quite quickly.

Evidence 2: Hydroxide will deprotonate the hydroxyl.

Reasoning 2: The pKa of hydroxyl is similar to that of hydroxide and they will thus exist in a proton transfer equilibrium. When the alkoxide ion is formed it will react with the nearby electrophile.

Evidence 3: This reaction is enthalpically favorable.

Reasoning 3: Five membered rings are stable because they have optimal bond angles for sp3 hybridization. This means that they are lower in energy/more stable and thus this reaction will be exothermic.

All arguments built by Rachel can be used as sample solutions for other students. She was able to separate her pieces of evidence from her reasoning statements as the pieces of evidence supported her claim and the reasoning statements justified the pieces of evidence. Moreover, Rachel used several chemical concepts such as enthalpy, kinetics, and basicity in her argumentation. After the adapted scaffold, Rachel obtained 16/16 points in the argumentation score and 28/29 points in the concept knowledge score. The students of the ReaS group were the only ones who were additionally prompted to build up to three reasoning statements for one piece of evidence. For the formation of an enolate as a product of the reaction of methyl acetate and LDA, Rachel built the following argument.

Claim: The reaction product is plausible.

Evidence 1: The amide anion is highly basic.

Reasoning 1.1: High electron density on nitrogen.

Reasoning 1.2: Nitrogen adjacent to electron donating groups.

Reasoning 1.3: Bulky groups mean it won’t act as a nucleophile.

Evidence 2: The alpha proton is slightly acidic.

Reasoning 2.1: The negative charge from removing a proton will be resonance stabilized.

Reasoning 2.2: The carbonyl contributes an inductive effect.

Evidence 3: The reaction is enthalpically favorable.

Reasoning 3.1: A weaker acid is formed than on the reactant side (protonated amide).

Reasoning 3.2: A weaker base is formed (enolate) than on the reactant side.

Reasoning 3.3: The enolate is resonance stabilized.

Rachel improved in both her argumentation score and her concept knowledge score, although she had already performed well in the diagnostic scaffold. Her arguments are at a high level: her evidence and reasoning statements answered the why-questions, and her use of scientific principles is multivariate. By building up to three reasoning statements, Rachel became more detailed and justified her pieces of evidence from multiple perspectives. In the reasoning statements for her first piece of evidence, for example, she justified the basicity of LDA with the electronic effects of nitrogen and with the electronic and steric effects of the adjacent groups. In the future, Rachel could be further supported to include more scientific principles in her arguments, e.g., arguing with entropy or pKa values.

In summary, the adapted scaffolds improved students’ performance in the respective areas of support. The groups that received support in building arguments (ArgS and ArgConS) improved significantly with a large effect on the argumentation score, and the groups that received additional support in using concept knowledge (ConS and ArgConS) improved significantly with a large effect on the concept knowledge score. Only the ReaS group showed no significant improvement in either score, which is not surprising because the students of the ReaS group already had high scores in the diagnostic scaffold. Moreover, the fact that the groups improved only in the area in which they received extra support (argumentation patterns and/or concept knowledge) indicates that the improvement is not a simple training effect. Instead, the adaptive scaffold might be responsible for students’ improvement.

Conclusions and implications

In organic chemistry, the use of chemical concepts and decision-making regarding alternative reaction pathways plays a central role in determining the reaction product. Building arguments following a structured method allows the appropriate use and reasoning of chemical concepts. However, there is a current gap between what students should learn and what they learn since the focus in chemistry is often on facts and memorisation (Stowe et al., 2021), and building arguments is still rarely taught in their studies. The challenges faced by students with structuring arguments and appropriately using concept knowledge have been previously reported (Cruz-Ramirez de Arellano and Towns, 2014; Moon et al., 2016; Deng and Flynn, 2021; Lieber and Graulich, 2022).

In this study, we investigated whether adapted scaffolding that supports students in argumentation and the use of concept knowledge can make a significant difference in performance. Based on a diagnostic scaffold, which served as a pre-measure to analyse how students built arguments and how they used concept knowledge, students received an argumentation score and a concept knowledge score. They were then assigned to one of four adapted scaffolding groups: support for argumentation (ArgS), concept knowledge (ConS), argumentation and concept knowledge (ArgConS), and multivariate reasoning (ReaS). Consequently, each group received a different scaffold tailored to their needs in argumentation and/or use of concept knowledge. A second argumentation score and concept knowledge score were given to each student based on their performance in the adapted scaffolds (post-measure).
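The assignment logic described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the cut-off values `arg_cutoff` and `con_cutoff` are hypothetical placeholders, the class and function names are invented for this sketch, and the study additionally assessed each student individually rather than relying on the scores alone. Only the maximum scores (16 and 29 points) and the four group labels are taken from the text.

```python
from dataclasses import dataclass

# Maximum scores as reported in the text (16 and 29 points).
ARG_MAX = 16
CON_MAX = 29


@dataclass(frozen=True)
class DiagnosticResult:
    """Scores a student obtained in the diagnostic scaffold (pre-measure)."""
    argumentation: float
    concept_knowledge: float


def assign_support_group(result, arg_cutoff=12.0, con_cutoff=18.0):
    """Map diagnostic scores to one of the four support groups.

    The cut-offs are HYPOTHETICAL defaults chosen for illustration only;
    the study does not report them in this section.
    """
    if not 0 <= result.argumentation <= ARG_MAX:
        raise ValueError("argumentation score out of range")
    if not 0 <= result.concept_knowledge <= CON_MAX:
        raise ValueError("concept knowledge score out of range")

    needs_argumentation = result.argumentation < arg_cutoff
    needs_concepts = result.concept_knowledge < con_cutoff

    if needs_argumentation and needs_concepts:
        return "ArgConS"  # support for argumentation and concept knowledge
    if needs_argumentation:
        return "ArgS"     # support for argumentation only
    if needs_concepts:
        return "ConS"     # support for concept knowledge only
    return "ReaS"         # prompted to build several reasoning statements
```

Keeping the cut-offs as parameters reflects that the grouping is a judgment call that an instructor would tune to their own cohort and scoring scheme.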

The first research question in this study examined to what extent the support groups differed from each other in the pre- and post-measure. When evaluating students’ answers, it became apparent that the groups differed significantly from each other in the pre-measure. In particular, students grouped into the ReaS group already built well-grounded arguments in the diagnostic scaffold (pre-measure), as multiple pieces of evidence and reasoning were used as support and justification of the claim. Both evidence and reasoning statements consisted to a large extent of scientific principles and answered the why-questions. The other three groups still showed some gaps in the pre-measure in terms of argumentation and/or the use of concept knowledge. A closer look at the analysis revealed that the ArgS and the ArgConS groups each differed significantly with high effects (r = 0.58 and r = 0.78) in terms of the argumentation score from the ConS and the ReaS groups after the diagnostic scaffold. However, the ArgS and ArgConS groups, as well as the ConS and the ReaS groups, did not vary significantly from each other, which exemplifies that the argumentation score determined group performance differences in building arguments but not in terms of concept knowledge. This indicated that the grouping of the students in the ArgS and the ArgConS groups in the adapted scaffolds for argumentation patterns was successful. By comparing the groups in terms of concept knowledge, the ConS and the ArgConS groups differed significantly from the ArgS and the ReaS groups with high effects (between r = 0.77 and r = 0.79) in the pre-measure. Therefore, the grouping in the adapted scaffolds for the use of concept knowledge was also successful. An exception is evident when comparing the ConS and ArgConS groups. Both groups differed significantly from each other with a high effect (r = 0.51) after the diagnostic scaffold, although both groups received an adapted scaffold for concept knowledge. However, this can be explained by the fact that students in the ArgConS group were conceptually weaker than those in all other groups. Based on this first analysis, the initial grouping was successful and the groups received the support in the respective area needed (i.e., regarding argumentation and the use of concept knowledge). The second part of the first research question investigated whether the gap that occurred between the groups at the beginning could be closed using the adaptive scaffold. No significant differences between the groups were apparent in the argumentation score after applying the adapted scaffold. In the area of concept knowledge, a significant difference with a medium effect (r = 0.43) only occurred between the ArgConS and ReaS groups. All other group comparisons showed no significant differences. It can therefore be assumed that the adaptive scaffolds closed the gap in students’ performance. More chemical concepts were used to build arguments. The link between concept knowledge and the argument components is a key aspect in building arguments as it is considered an important part of the quality of an argument (Sandoval and Millwood, 2005; Choi et al., 2013). In this context, one might assume that argumentation and concept knowledge can be fundamentally distinguished from each other. Songer and Gotwals (2012) reported a connection between concept knowledge and argumentation, which was not found in this study. However, the absence of this connection should not be interpreted as evidence that argumentation and concept knowledge are independent of each other. The scoring process did not explicitly acknowledge this linkage, since strict attention was paid to scoring argumentation and concept knowledge separately from each other. Thus, in building arguments, technical correctness was not considered, and in the use of concept knowledge, no attention was given to whether the argument components (evidence and reasoning) were built correctly.

In the analysis of the second research question, the pre- and post-measure scores were compared to determine a possible improvement in each support group. Here, with respect to the argumentation score, the two groups that improved significantly with a high effect (r = 0.68 and r = 0.74) were those that received additional support for argumentation (ArgS and ArgConS groups). Similarly, for the concept knowledge score, only the two groups that received support for the use of concept knowledge (ConS and ArgConS groups) improved significantly with a high effect (r = 0.66 and r = 0.57). These results suggest that the adaptive scaffolds targeted areas where support was needed. Thus, this study demonstrates that an adaptive scaffold improved students’ performance in the respective area of support. In organic chemistry, scientific reasoning on reaction pathways and products requires considering multiple chemical concepts in the decision-making process so that alternative reaction products and by-products can also be considered (Popova and Bretz, 2018). The implementation of this adaptive scaffold is useful in supporting students in applying the content to context, such as in suggesting alternative reaction products (Chen, 2014; Graulich and Caspari, 2021).

Implications for teaching

The results from this study suggest that the adaptive scaffold can be useful in teaching and learning organic chemistry. Teachers should therefore consider adapting scaffolds to provide each student with the support they need. Teachers do not have to use the adaptive scaffold as a whole in their classes but can apply and adapt individual parts. The scoring system, which is the most time-consuming part of the scaffold, does not necessarily need to be applied in class. Thus, the possibility arises to separate tasks on argumentation patterns and the use of concept knowledge to support students first in building appropriate argument components (i.e., claim, evidence, and reasoning). Once students can build arguments, their arguments can be enhanced by using chemical concepts appropriately. This might be beneficial as we did not observe an improvement in students’ concept knowledge when students only received an adapted scaffold for argumentation patterns. Depending on students’ prior knowledge of building arguments, the activation of concept knowledge alone can be used and adapted to the current course topic. Building arguments can also encourage students to strengthen and connect chemical concepts by applying them in their arguments. This indicates that the adaptive scaffold can be adapted to a variety of reactions and reaction mechanisms in organic chemistry. Moreover, by expanding the argumentation pattern to include several reasoning statements per piece of evidence, students can be encouraged to use various chemical concepts in their argumentation. In addition, this material can be used collaboratively for building arguments. Here, students can evaluate each other's arguments in peer-review processes and undertake the task of scoring their peers. As a result, students experience a change in perspective that leads to an analytical decision-making process (Milkman et al., 2009). Moreover, discussing the problem together can enhance understanding (Smith et al., 2009) and reduce the inhibition about making mistakes (Coppola and Pontrello, 2014). Additionally, students have the opportunity not only to build arguments against the product in the case of implausible reaction products but also to practice building counterarguments in the group discussions.

Implications for research

This study also has implications for research. First, in a further inquiry, interviews with students could be conducted to revise and improve the scaffolds, for example, to learn in more detail how students experienced the scaffolds, what their thought processes were while solving the problems, or which difficulties they encountered. In addition, the learning environment could be extended and applied over a longer period of time. Fading can then play a central role in this process. Here, it might be important to adaptively fade the scaffold according to students’ learning progress, as the relevant content should be understood first (Kang et al., 2014; Noroozi et al., 2017). Fading also gives students greater responsibility for their learning over time (McNeill et al., 2006). Moreover, when the scaffolds are used over a longer period of time, the effectiveness of the adaptive scaffold can be further examined. This might be beneficial for an improvement in students’ concept knowledge, as Songer and Gotwals (2012) revealed in their longitudinal study. Moreover, it calls us as researchers to purposefully rethink scaffolding because not all students benefit equally from a scaffold as they experience a variety of individual challenges (Caspari and Graulich, 2019; Petritis et al., 2022). Thus, to provide students with even more individualised support, a computer-based adaptive learning system could be developed so that each student receives support exactly in the area needed. However, a large amount of data is required in advance (Zhou, 2016) and more adapted scaffolds need to be created to support content-related interpersonal differences (Shute and Zapata-Rivera, 2008) and acknowledge other individual differences such as motivational, metacognitive, and affective aspects, among others (Azevedo and Gasevic, 2019). An advantage of a computer-based adaptive learning system would be that the time and number of staff required can be reduced, so that more students have the opportunity to receive adaptive support in a shorter period of time (Dood et al., 2020; Lee et al., 2021; Yik et al., 2021). A useful application would be an automatic scoring system for students’ scaffold answers based on machine learning approaches, as scoring is the most time-consuming part of the adaptive scaffolding process.

Limitations

The conclusions discussed in the above paragraphs should be considered with caution as the study has certain limitations. First, the reactions used in the diagnostic and adapted scaffolds are only comparable to a certain extent because the organic chemistry reactions are different. However, it would not have been appropriate to use the same reaction twice as students may simply recall it in the adapted scaffolds. Both reactions were aligned well with the curriculum so that all required chemical concepts were covered in advance. Second, around one scoring point was used to assign the respective support group. However, apart from the score, each student was individually assessed as to the type of support needed based on the answers given. Lastly, we observed that students’ answers were shorter in written arguments compared to oral interviews, which resulted in answers that were not always precise. To evaluate the students objectively and to guarantee impartiality, the students’ answers were taken verbatim. An exception was made when students named, for example, molecules incorrectly, but an unambiguous assignment could be made based on the statement (e.g., mistaking hydroxide for hydroxyl).

Author contributions

Leonie Sabine Lieber: conceptualisation, formal analysis, methodology, project administration, writing – original draft, visualization. Krenare Ibraj: methodology, writing – review and editing, formal analysis. Ira Caspari-Gnann: investigation, resources, writing – review and editing, project administration. Nicole Graulich: supervision, conceptualisation, methodology, writing – review and editing, project administration.

Conflicts of interest

There are no conflicts to declare.

Appendix 1: exemplary arguments for the eight alternative reaction pathways

Fig. 6 shows exemplary arguments for the eight alternative reaction pathways of the diagnostic scaffold and adapted scaffolds. The arguments are exemplary and can be broadly extended. Moreover, arguments can be doubled because arguments for a plausible alternative reaction pathway are also arguments to justify why a certain alternative reaction pathway is implausible.
Fig. 6 Exemplary arguments for the four alternative reaction pathways in the diagnostic scaffold and the four alternative reaction pathways in the adapted scaffolds.

Appendix 2: an example of a science-related argument of the tasks on argumentation patterns

Fig. 7 shows an example of a science-related argument of the tasks on argumentation patterns. Students received tasks on argumentation patterns which varied in the number of arguments depending on their adapted scaffold. In the diagnostic scaffold, all groups received three arguments whereas, in the adapted scaffold, the ArgS and ArgConS groups received eight arguments and the ConS and ReaS groups received three arguments. All arguments were science-related. Students had to assign the argument components (claim, evidence, and reasoning) to sentence components. They could always indicate that they do not know the answer.
Fig. 7 Example task on argumentation patterns. The correct answers are blue colour-coded.

Fig. 8 An example of an alternative reaction pathway for the reaction of 4-chlorobutanol and hydroxide.

Appendix 3: questions on chemical concepts with full-point student examples

Table 3 outlines the eight questions on chemical concepts students received in the diagnostic scaffold and adapted scaffolds. For each question, there is an example of a student's answer which was awarded full points in the scoring of concept knowledge. The student examples for the question on nucleophilicity and electrophilicity as well as on acidity and basicity are bipartite because these aspects were considered separately in the scoring system.
Table 3 Illustration of the questions on chemical concepts and student examples for each question, which received full points
| Questions on chemical concepts | Student example |
| --- | --- |
| Decide whether steric aspects need to be considered in the reaction and explain why you think so. | “I do not think that steric aspects need to be considered in the reaction because the reaction is taking place on a primary alkyl halide. A primary alkyl halide only has one other non-hydrogen substituent so it is relatively unhindered. Therefore, the –OH can attack the carbon without experiencing significant hindrance from other substituents which would be steric considerations.” |
| Approximate the pKa values of the involved molecules in this reaction, or, if you do not know, outline how you think the pKa values of the different molecules compare to each other (e.g., molecule x has the highest and molecule y the lowest pKa) | “I'd say the chlorobutanol is of a pKa near 15 because I think that's the pKa of water. The –OH itself probably has a pKa of water too because the –OH is the conjugate base of water.” |
| Determine at which positions you think the involved molecules react as a nucleophile and at which positions they react as an electrophile. Explain your thinking. | “The O of the OH group on the alkane acts as a nucleophile along with the O on the hydroxide ion because they have extra electron density.” “I would expect the carbon bonded to the chlorine to be the most electrophilic site because chlorine is very electronegative and will pull electron density away from the carbon making it electrophilic. The carbon bonded to the hydroxyl group will be electrophilic for the same reason (oxygen is electronegative) but not as electronegative as the aforementioned carbon because oxygen is not as electronegative as chlorine.” |
| Determine at which positions you think the involved molecules react as an acid and at which positions they react as a base. Explain your thinking. | “The 4-chloro-1-butanol reagent will act as a weak acid due to its mildly acidic hydroxyl group. This molecule will only lose its hydroxyl proton to moderately or strongly basic species that react to form a conjugate acid with a pKa higher than that of 1-chloro-4-butanol (a higher pKa corresponds to a less acidic, and thereby lower-energy, product).” “On the hydroxide, oxygen molecule is primary source of basicity due to negative charge on it.” |
| Determine whether you think there are any effects that stabilise your product compared to the reactants. If so, explain how the effect/s stabilise the product. | “The product is stable because it is a five membered ring. This structure allows for optimal bond angles for sp3 hybridization. The hydroxyl becomes protonated to form water, which is more stable since there are no formal charges in the molecule as oxygen forms two bonds.” |
| Determine whether you think there are any entropic effects that influence the reaction process. If so, explain why you think so. | “I do not think there are entropic effects that influence the reaction. There are two starting molecules and two products.” |
| Determine whether electronic effects (e.g., inductive effects, resonance, electronegativity,…) influence the reaction process and why you think so. | “I think that the only electronic effect here is electronegativity and induction, as the Cl–C bond is polarized so that the carbon is a slightly positive center; there are no double bonds to induce resonance. The Cl is a better leaving group than OH partially because Cl is more electronegative than O, at least it is more electronegative towards other potential electrons than the effective electronegativity of an O bonded already to one H. The stronger inductive effects and electronegativity of Cl make it a better leaving group than the OH on the alcohol/chloride alkane.” |
| Decide whether the reaction is reactant- or product-favoured from an energetic perspective (enthalpy). Explain your thinking. | “Product-favored. C–O bonds are stronger than C–Cl bonds due to the shorter length of C–O, so this substitution lowers the energy of the system.” |


Appendix 4: task for building arguments for alternative reaction pathways

Fig. 8 illustrates the task of building arguments for alternative pathways for the formation of THF. Students judged the plausibility of four alternative reaction pathways. Thereby, they created a claim and built evidence and reasoning statements to support and justify their decision. The text boxes were modified in size according to the number of words in the answers.

Appendix 5: summary of the results of the Wilcoxon signed-rank test

Table 4 shows the results of the Wilcoxon signed-rank test with Bonferroni-adjusted p-values for pre-post comparisons of the argumentation score and concept knowledge score.
Table 4 Results of the Wilcoxon signed-rank test with Bonferroni-adjusted p-values for pre-post comparisons of the argumentation score and concept knowledge score. The correlation coefficient r was reported as effect size when p-values were significant. Significant results are highlighted in bold
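| Score | Group | M pre | M post | p | r |
| --- | --- | --- | --- | --- | --- |
| Argumentation score | ArgS | 10 | 14.5 | **0.007** | 0.68 |
| Argumentation score | ConS | 13 | 14 | 0.823 | |
| Argumentation score | ArgConS | 8.5 | 13.5 | **<0.001** | 0.74 |
| Argumentation score | ReaS | 14 | 16 | 0.534 | |
| Concept knowledge score | ArgS | 21 | 20 | 0.299 | |
| Concept knowledge score | ConS | 16 | 19 | **0.001** | 0.66 |
| Concept knowledge score | ArgConS | 13.5 | 18 | **0.011** | 0.57 |
| Concept knowledge score | ReaS | 21 | 25 | 0.118 | |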
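
The kind of analysis summarised in Table 4 can be illustrated with a minimal, standard-library Python sketch. It is not the authors' analysis: it assumes the normal approximation of the Wilcoxon signed-rank test without continuity correction, computes the effect size as r = |z|/√N, a form commonly attributed to Rosenthal (1991), and labels it with Cohen's (1992) benchmarks. Which N entered the published r values (number of pairs or number of observations), the number of comparisons used for the Bonferroni adjustment, and the exact test variant are not stated in this section, so all of these are left as explicit parameters or assumptions. All function names are invented for this sketch.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Wilcoxon signed-rank test for paired scores (normal approximation).

    Returns (z, two-sided p, number of non-zero differences).
    Zero differences are dropped; tied |differences| share average ranks.
    Assumption: no continuity correction, no exact small-sample p-value.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal distribution
    return z, p, n


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


def effect_size_r(z, n_obs):
    """Effect size r = |z| / sqrt(N); the caller decides what N is."""
    return abs(z) / math.sqrt(n_obs)


def effect_label(r):
    """Cohen's (1992) benchmarks for r: 0.1 small, 0.3 medium, 0.5 large."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"
```

For real analyses, an established implementation is preferable to this sketch, since it also handles exact p-values for small samples, which matter for group sizes like those in this study.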


Acknowledgements

This publication is part of the first author's doctoral (Dr rer. nat.) thesis at the Faculty of Biology and Chemistry, Justus-Liebig-University Giessen, Germany. We are thankful for all students willing to participate in the study and the professor for the opportunity to work with his students. Moreover, we thank all members of the Graulich group, Sascha Bernholt, Clay Bennett and especially David Kranz and Axel Langner for fruitful discussions and eye-opening moments. Leonie Sabine Lieber especially thanks the Verband der Chemischen Industrie (German Chemical Industry Association) for supporting her with the Kekulé Fellowship.

References

  1. Azevedo R., Cromley J. G. and Seibert D., (2004), Does adaptive scaffolding facilitate students’ ability to regulate their learning with hypermedia?, Contemp. Educ. Psychol., 29(3), 344–370.
  2. Azevedo R. and Gasevic D., (2019), Analyzing Multimodal Multichannel Data about Self-Regulated Learning with Advanced Learning Technologies: Issues and Challenges, Comput. Hum. Behav., 96, 207–210.
  3. Belland B. R., (2017), Instructional scaffolding in STEM education: Strategies and efficacy evidence, Springer Nature.
  4. Carle M. S. and Flynn A. B., (2020), Essential learning outcomes for delocalization (resonance) concepts: How are they taught, practiced, and assessed in organic chemistry?, Chem. Educ. Res. Pract., 21(2), 622–637.
  5. Caspari I. and Graulich N., (2019), Scaffolding the structure of organic chemistry students’ multivariate comparative mechanistic reasoning, Int. J. Phys. Chem. Educ., 11(2), 31–43.
  6. Chen C. H., (2014), An adaptive scaffolding e-learning system for middle school students' physics learning, Aust. J. Educ. Technol., 30(3), 342–355.
  7. Choi A., Hand B. and Greenbowe T., (2013), Students' Written Arguments in General Chemistry Laboratory Investigations, Res. Sci. Educ., 43(5), 1763–1783.
  8. Cohen J., (1992), A power primer, Psychol. Bull., 112(1), 155–159.
  9. Coppola B. P. and Pontrello J. K., (2014), Using errors to teach through a two-staged, structured review: Peer-reviewed quizzes and “What's wrong with me?”, J. Chem. Educ., 91(12), 2148–2154.
  10. Cruz-Ramirez de Arellano D. and Towns M. H., (2014), Students' understanding of alkyl halide reactions in undergraduate organic chemistry, Chem. Educ. Res. Pract., 15(4), 501–515.
  11. Deng J. M. and Flynn A. B., (2021), Reasoning, granularity, and comparisons in students' arguments on two organic chemistry items, Chem. Educ. Res. Pract., 22(3), 749–771.
  12. Dood A. J., Dood J. C., de Arellano D. C. R., Fields K. B. and Raker J. R., (2020), Analyzing explanations of substitution reactions using lexical analysis and logistic regression techniques, Chem. Educ. Res. Pract., 21(1), 267–286.
  13. Driver R., Asoko H., Leach J., Scott P. and Mortimer E., (1994), Constructing scientific knowledge in the classroom, Educ. Res., 23(7), 5–12.
  14. Erduran S., (2007), Argumentation in Science Education, Springer, pp. 47–69.
  15. Fan Y. C., Wang T. H. and Wang K. H., (2020), Studying the effectiveness of an online argumentation model for improving undergraduate students' argumentation ability, J. Comput. Assist. Learn., 36(4), 526–539.
  16. Field A., Miles J. and Field Z., (2012), Discovering statistics using R, Sage publications.
  17. Flynn A. B., (2021), Problems and Problem Solving in Chemistry Education: Analysing Data, Looking for Patterns and Making Deductions, pp. 145–165.
  18. Graulich N. and Caspari I., (2021), Designing a scaffold for mechanistic reasoning in organic chemistry, Chem. Teach. Int., 3(1), 19–30.
  19. Hogan K. and Maglienti M., (2001), Comparing the epistemological underpinnings of students' and scientists' reasoning about conclusions, J. Res. Sci. Teach., 38(6), 663–687.
  20. Kang H., Thompson J. and Windschitl M., (2014), Creating Opportunities for Students to Show What They Know: The Role of Scaffolding in Assessment Tasks, Sci. Educ., 98(4), 674–704.
  21. Lajoie S. P., (2005), Extending the scaffolding metaphor, Instr. Sci., 33(5–6), 541–557.
  22. Lazarou D. and Erduran S., (2020), “Evaluate What I Was Taught, Not What You Expected Me to Know”: Evaluating Students’ Arguments Based on Science Teachers’ Adaptations to Toulmin's Argument Pattern, J. Sci. Teach. Educ., 32(3), 306–324.
  23. Lee H. S., Gweon G. H., Lord T., Paessel N., Pallant A. and Pryputniewicz S., (2021), Machine Learning-Enabled Automated Feedback: Supporting Students' Revision of Scientific Arguments Based on Data Drawn from Simulation, J. Sci. Educ. Technol., 30(2), 168–192.
  24. Lieber L. and Graulich N., (2020), Thinking in Alternatives-A Task Design for Challenging Students' Problem-Solving Approaches in Organic Chemistry, J. Chem. Educ., 97(10), 3731–3738.
  25. Lieber L. and Graulich N., (2022), Investigating Students' Argumentation when Judging the Plausibility of Alternative Reaction Pathways in Organic Chemistry, Chem. Educ. Res. Pract., 23, 38–54.
  26. Luo X. L., Wei B., Shi M. and Xiao X., (2020), Exploring the impact of the reasoning flow scaffold (RFS) on students' scientific argumentation: based on the structure of observed learning outcomes (SOLO) taxonomy, Chem. Educ. Res. Pract., 21(4), 1083–1094.
  27. McGinn M. K. and Roth W.-M., (1999), Preparing students for competent scientific practice: Implications of recent research in science and technology studies, Educ. Res., 28(3), 14–24.
  28. McNeill K. L. and Krajcik J., (2012), Book study facilitator's guide: Supporting grade 5–8 students in constructing explanations in science: The claim, evidence and reasoning framework for talk and writing, New York: Pearson Allyn & Bacon.
  29. McNeill K. L., Lizotte D. J., Krajcik J. and Marx R. W., (2006), Supporting students' construction of scientific explanations by fading scaffolds in instructional materials, J. Learn. Sci., 15(2), 153–191.
  30. Milkman K. L., Chugh D. and Bazerman M. H., (2009), How can decision making be improved?, Perspect. Psychol. Sci., 4(4), 379–383.
  31. Moon A., Stanford C., Cole R. and Towns M., (2016), The nature of students' chemical reasoning employed in scientific argumentation in physical chemistry, Chem. Educ. Res. Pract., 17(2), 353–364.
  32. Noroozi O., Kirschner P. A., Biemans H. J. A. and Mulder M., (2017), Promoting Argumentation Competence: Extending from First- to Second-Order Scaffolding Through Adaptive Fading, Educ. Psychol., 30(1), 153–176.
  33. Osborne J. F. and Patterson A., (2011), Scientific argument and explanation: A necessary distinction?, Sci. Educ., 95(4), 627–638.
  34. Pabuccu A. and Erduran S., (2017), Beyond rote learning in organic chemistry: the infusion and impact of argumentation in tertiary education, Int. J. Sci. Educ., 39(9), 1154–1172.
  35. Pea R. D., (2004), The social and technological dimensions of scaffolding and related theoretical concepts for learning, education, and human activity, J. Learn. Sci., 13(3), 423–451.
  36. Petritis S. J., Kelley C. and Talanquer V., (2021), Exploring the impact of the framing of a laboratory experiment on the nature of student argumentation, Chem. Educ. Res. Pract., 22(1), 105–121.
  37. Petritis S. J., Kelley C. and Talanquer V., (2022), Analysis of factors that affect the nature and quality of student laboratory argumentation, Chem. Educ. Res. Pract., 23, 257–274.
  38. Popova M. and Bretz S. L., (2018), “It's Only the Major Product That We Care About in Organic Chemistry”: An Analysis of Students' Annotations of Reaction Coordinate Diagrams, J. Chem. Educ., 95(7), 1086–1093.
  39. Rosenthal R., (1991), Meta-analytic procedures for social research, Newbury Park, CA: Sage.
  40. Sadler T. D., (2004), Informal reasoning regarding socioscientific issues: A critical review of research, J. Res. Sci. Teach., 41(5), 513–536.
  41. Sandoval W. A. and Millwood K. A., (2005), The quality of students' use of evidence in written scientific explanations, Cogn. Instr., 23(1), 23–55.
  42. Shute V. J. and Zapata-Rivera D., (2008), Handbook of research on educational communications and technology, New York, NY: Taylor and Francis, pp. 277–294.
  43. Smith M. K., Wood W. B., Adams W. K., Wieman C., Knight J. K., Guild N. and Su T. T., (2009), Why peer discussion improves student performance on in-class concept questions, Science, 323(5910), 122–124.
  44. Songer N. B. and Gotwals A. W., (2012), Guiding explanation construction by children at the entry points of learning progressions, J. Res. Sci. Teach., 49(2), 141–165.
  45. Stowe R. L., Scharlott L. J., Ralph V. R., Becker N. M. and Cooper M. M., (2021), You Are What You Assess: The Case for Emphasizing Chemistry on Chemistry Assessments, J. Chem. Educ., 98(8), 2490–2495.
  46. Toulmin S. E., (2003), The Uses of Argument, Cambridge University Press.
  47. Van de Pol J., Volman M. and Beishuizen J., (2010), Scaffolding in teacher–student interaction: A decade of research, Educ. Psychol., 22(3), 271–296.
  48. Van Eemeren F. H., Garssen B., Krabbe E. C. W., Snoeck Henkemans A. F., Verheij B. and Wagemans J. H. M., (2014), Handbook of Argumentation Theory, Dordrecht: Springer, pp. 203–256.
  49. Vygotsky L. S., (1980), Mind in society: The development of higher psychological processes, Harvard University Press.
  50. Walker J. P., Van Duzor A. G. and Lower M. A., (2019), Facilitating Argumentation in the Laboratory: The Challenges of Claim Change and Justification by Theory, J. Chem. Educ., 96(3), 435–444.
  51. Walqui A., (2006), Scaffolding instruction for English language learners: A conceptual framework, Int. J. Biling. Educ. Biling., 9(2), 159–180.
  52. Watts F. M., Zaimi I., Kranz D., Graulich N. and Shultz G. V., (2021), Investigating students' reasoning over time for case comparisons of acyl transfer reaction mechanisms, Chem. Educ. Res. Pract., 22(2), 364–381.
  53. Wecker C. and Fischer F., (2014), Where is the evidence? A meta-analysis on the role of argumentation for the acquisition of domain-specific knowledge in computer-supported collaborative learning, Comput. Educ., 75, 218–228.
  54. Wood D., Bruner J. S. and Ross G., (1976), The role of tutoring in problem solving, J. Child Psychol. Psychiatry, 17(2), 89–100.
  55. Yik B. J., Dood A. J., de Arellano D. C. R., Fields K. B. and Raker J. R., (2021), Development of a machine learning-based tool to evaluate correct Lewis acid-base model use in written responses to open-ended formative assessment items, Chem. Educ. Res. Pract., 22(4), 866–885.
  56. Yuriev E., Naidu S., Schembri L. S. and Short J. L., (2017), Scaffolding the development of problem-solving skills in chemistry: guiding novice students out of dead ends and false starts, Chem. Educ. Res. Pract., 18(3), 486–504.
  57. Zhou Z. H., (2016), Learnware: on the future of machine learning, Front. Comput. Sci., 10(4), 589–590.

This journal is © The Royal Society of Chemistry 2022