Patterns of reasoning – exploring the interplay of students’ work with a scaffold and their conceptual knowledge in organic chemistry

David Kranz; Michael Schween; Nicole Graulich

doi:10.1039/D2RP00132B


DOI: 10.1039/D2RP00132B (Paper) Chem. Educ. Res. Pract., 2023, 24, 453-477


David Kranz ^a, Michael Schween ^b and Nicole Graulich *^a
^aJustus-Liebig-University, Giessen, Institute of Chemistry Education, Heinrich-Buff-Ring 17, 35392 Giessen, Germany. E-mail: Nicole.Graulich@didaktik.chemie.uni-giessen.de
^bPhilips-University, Marburg, Faculty of Chemistry, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany

Received 9th May 2022, Accepted 2nd November 2022

First published on 2nd November 2022

Abstract

Reaction mechanisms are a core component of organic chemistry, and being able to handle them is a central skill for students in this discipline. Diagnosing and fostering mechanistic reasoning is hence an important branch of chemistry education research. When reasoning about mechanisms, students often experience difficulties because they lack conceptual understanding, cannot make appropriate inferences, or struggle to link representations to chemical concepts. Instructional tools to bridge this gap are thus required in organic chemistry education. Recently, scaffolds that support students in connecting properties to reaction pathways and representations to chemical concepts have been documented to foster students’ reasoning. Although the benefits of scaffolds have been reported, the question arises of how students work with scaffolds and whether scaffolds can influence students’ scores in a conceptual knowledge test. In this study, we used a pre–post mixed methods approach to explore how students recruited from an organic chemistry course work with a written scaffold. We correlated the level of causal complexity and multivariateness expressed in their scaffold responses with their scores in a conceptual knowledge test administered before and after the task. The task consisted of scaffolded contrasting cases of two steps in which a nucleophile adds to a carbonyl carbon. The paper-pencil test used as pre- and posttest covered the respective conceptual knowledge. We qualitatively identified patterns along the dimensions of causal complexity and multivariateness in students’ written responses in the scaffold and looked for relationships between these two dimensions and students’ scores in the pre- and posttest. We found five different patterns in students’ responses and were able to show that students’ pretest scores influenced how effectively they were supported by the scaffold.
Thus, this exploratory study provides encouraging implications and insights into the use of scaffolds.

Introduction

One of the main tasks of instruction is to continually reassess students’ learning and to design new methods of supporting students in the learning process. Instructional settings that tackle students’ difficulties in organic chemistry have been developed and evaluated in various studies, for example, approaches using the lens of framing in the context of an organic chemistry lab (Petritis et al., 2021), writing-to-learn assignments (Schmidt-McCormack et al., 2019; Gupte et al., 2021), scaffolds and purposefully designed instructional prompting (Underwood et al., 2018; Caspari and Graulich, 2019; Lieber and Graulich, 2020; Deng and Flynn, 2021; Flynn, 2021; Keiner and Graulich, 2021), or substantial curricular changes (Cooper et al., 2019). Because mechanisms make up a substantial part of organic chemistry, current research increasingly focuses explicitly on promoting mechanistic reasoning.

There is evidence that scaffolding and prompting help students increase their mechanistic reasoning abilities (Caspari and Graulich, 2019; Noyes and Cooper, 2019; Crandell et al., 2020; Watts et al., 2021) and may lead to higher success in contexts where transfer is necessary (Grove et al., 2012; Caspari et al., 2018).

However, it remains unclear how students make use of scaffolds when working individually and whether a scaffold for mechanistic case comparisons is equally supportive for all students in a heterogeneous group. To answer both questions, it is necessary to look, on the one hand, at how students work with the scaffold itself (e.g., how they use a scaffold to collect arguments) and, on the other hand, at the potential gain in conceptual understanding that results from using a scaffold.

In this study, we used a pre–post mixed methods approach to explore how organic chemistry students worked with a written scaffold (i.e., their levels in the dimensions of causal complexity and multivariateness) and whether a gain in conceptual knowledge is related to these dimensions as well as to students’ score in a conceptual knowledge test taken before the scaffolded problem-solving setting (i.e., low- vs. high-score students).

Mechanistic reasoning

To explain how and why phenomena around us function as they do, mechanistic explanations (i.e., explanations of phenomena in terms of cause-effect relations) have been the standard for scientific explanations since the 17th century (Westfall, 1977; Russ et al., 2009). Mechanistic reasoning comprises tracing all steps from an initiating cause to a terminating effect to unpack a phenomenon (Machamer et al., 2000; Tabery, 2004). Machamer et al. (2000) describe mechanisms as “entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions” (Machamer et al., 2000, p. 3). To reason about reaction mechanisms from the origin (i.e., reactants) to the end (i.e., products), multiple components need to be identified (Becker et al., 2016; Caspari et al., 2018; Bodé et al., 2019). Entities are, for instance, molecules, atoms, or groups of atoms, whereas charges or orbital configurations of atoms or atom groups (e.g., functional groups) can be understood as properties of entities. Activities are all transformations that cause a change in entities and their properties (Machamer et al., 2000; Caspari et al., 2018; Watts et al., 2020). To investigate how learners reason mechanistically, several frameworks have been used to describe students’ reasoning and argumentation (Russ et al., 2008; Sevian and Talanquer, 2014; Caspari et al., 2018; Crandell et al., 2018; Bodé et al., 2019).

In their learning progression, Sevian and Talanquer (2014), for example, distinguished students’ responses by modes of reasoning: descriptive, relational, linear causal, and multicomponent. An argument is descriptive if it remains superficial and includes only explicit properties. An argument is relational if implicit properties are included but their effects are not justified. Linear causal arguments recognize cause-effect relationships but explain phenomena unicausally. Multicomponent means that several causes for a phenomenon are considered and weighed against each other. This framework has already been applied successfully (Weinrich and Talanquer, 2016). Deng and Flynn (2021) use similar criteria regarding modes of reasoning but additionally consider the granularity of the concepts involved and examine how these concepts are connected.

Caspari et al. (2018) distinguish between different levels of complexity with which students build relations between structural differences and changes in a reaction. A student statement is classified as low complexity if the reasoning leads directly from an explicit feature to a change. To be assigned to middle complexity, the explanation must additionally contain an implicit property derived from an explicit feature. For high complexity, the student must first derive an implicit property from an explicit feature and then recognize that this implicit property has an electronic effect that ultimately affects the change. The number of knowledge elements used thus increases with increasing complexity.

Crandell et al. (2018) make further distinctions regarding the causality and the mechanistic nature of an argument. In addition to the categories no answer and non-normative, answers were classified as descriptive general (what), descriptive causal (what and why), descriptive mechanistic (what and how), or causal mechanistic (what, why, and how). As the corresponding questions suggest, descriptive general means that only a simple description is given, and descriptive causal means that a causal explanation is included. For an answer to be descriptive mechanistic, a mechanistic aspect (for example, electron movement) must be included. Causal mechanistic means that students include both causal and mechanistic elements in their reasoning.
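To make the three complexity levels of Caspari et al. (2018) concrete, they can be read as a small decision rule over a statement that links an explicit structural feature to a change. The following Python sketch is illustrative only: the names `Complexity` and `classify_relation`, and the two boolean flags standing for a coder’s judgements, are hypothetical and not part of the original framework.

```python
from enum import IntEnum

class Complexity(IntEnum):
    """Ordinal complexity levels following the description of Caspari et al. (2018)."""
    LOW = 1     # explicit feature -> change
    MIDDLE = 2  # explicit feature -> implicit property -> change
    HIGH = 3    # explicit feature -> implicit property -> electronic effect -> change

def classify_relation(has_implicit_property: bool, has_electronic_effect: bool) -> Complexity:
    """Assign a complexity level to a statement linking an explicit feature to a change.

    has_implicit_property: an implicit property is derived from the explicit feature
    has_electronic_effect: that implicit property is linked to an electronic effect
    """
    if not has_implicit_property:
        # Without a derived implicit property the statement stays at the lowest level;
        # an electronic effect mentioned on its own is not counted (assumption).
        return Complexity.LOW
    if has_electronic_effect:
        return Complexity.HIGH
    return Complexity.MIDDLE

print(classify_relation(True, False).name)  # MIDDLE
```

Using an `IntEnum` keeps the levels ordered, so statements can be compared or the highest level in a response can be taken with `max`.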

Two central dimensions can be derived from these frameworks. The first is causal complexity, which characterizes the elaborateness or depth of students’ mechanistic reasoning, that is, the depth to which learners can link identified components of a mechanism to build causal arguments. The second dimension, reflected in two of the four frameworks (depending on the task used), is multivariateness, which refers to considering multiple influential factors or variables that govern an organic reaction. In this exploratory study, these two dimensions are applied as a lens to characterize students’ use of a scaffold.
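One hypothetical way to operationalize the two dimensions for a single student’s scaffold is sketched below in Python. It assumes that each relation written into the scaffold is recorded with its structural difference, the property (effect) invoked, the property change, and the number of linked components in its causal chain; multivariateness is then counted as the number of distinct (structural difference, effect) pairs, and causal complexity as the depth of the deepest chain. These operationalizations and all names (`Relation`, `multivariateness`, `causal_complexity`) are assumptions for illustration, not the coding procedure of this study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """One argument chain written into the scaffold (hypothetical record)."""
    structural_difference: str  # e.g. "Cl" in reaction A
    effect: str                 # property invoked, e.g. "-I effect"
    property_change: str        # e.g. "formation of a negative charge"
    depth: int                  # number of linked components in the chain

def multivariateness(relations):
    """Number of distinct influential factors, counted as distinct
    (structural difference, effect) pairs."""
    return len({(r.structural_difference, r.effect) for r in relations})

def causal_complexity(relations):
    """Depth of the deepest causal chain (0 if the scaffold is empty)."""
    return max((r.depth for r in relations), default=0)

# Invented response: two factors, deepest chain with 3 linked components
response = [
    Relation("Cl", "-I effect", "formation of a negative charge", 3),
    Relation("Cl", "-I effect", "nucleophilic attack", 2),
    Relation("Cl", "+M effect", "formation of a negative charge", 1),
]
print(multivariateness(response), causal_complexity(response))  # 2 3
```

Keeping the two measures separate reflects that a student can name many factors with shallow chains, or few factors with deep chains.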

Scaffolding reasoning

If we want to promote reasoning, students need guidance to activate the conceptual or procedural knowledge necessary to solve the task (Rittle-Johnson and Star, 2007; Rittle-Johnson and Star, 2009; Shemwell et al., 2015; Chin et al., 2016). One way to provide this guidance is to give students scaffolds and to divide the task into several manageable subtasks (Benson, 1997). Such sequential prompting helps students complete tasks that they would not be able to solve without guidance and improves their problem-solving abilities (Wood et al., 1976; Belland, 2011; Yuriev et al., 2017).

Scaffolding, as a label for explicit guidance, can take different forms. Yuriev et al. (2017), for example, developed a scaffold for guiding students through solving physical chemistry problems. Their scaffold helped students reflect metacognitively on their problem-solving strategies while engaged in the problem-solving process. Flynn (2021) presented question types that scaffold students’ development of synthesis skills and identified which strategies were particularly successful. Scaffolding can also help to slow down the reasoning process, for example, allowing students to reflect in depth on the concepts used (Caspari and Graulich, 2019; Graulich et al., 2021). Scaffolding can thus support students in using or activating more of their resources when solving mechanistic problems, resources they might overlook without a scaffold (Maeyer and Talanquer, 2010; Talanquer, 2014).

The ultimate aim of scaffolding is to slowly withdraw or fade out the guidance when learners become more and more proficient in solving the respective task (Puntambekar and Hubscher, 2005; McNeill et al., 2006; Lin et al., 2012). The amount of support provided by the scaffold is then gradually reduced to allow learners to apply the learned problem-solving steps independently to solve a task without further scaffolding. Deciding at which point in the learning process one should begin fading the scaffold requires an in-depth understanding of how students are working with the scaffold.

A scaffold designed by Caspari and Graulich (2019) aimed to guide students in solving mechanistic case comparison tasks. Students were prompted to identify the explicit structural differences between the compared mechanistic steps (e.g., different-sized hydrocarbon residues at the reaction center), to identify all property changes that take place from the reactants to the products (e.g., the formation of a positive charge in the elimination step of an S_N1 reaction), and finally to explain the influence of the differences on the property changes.

One finding of this study by Caspari and Graulich (2019) is that the number of detected influential factors (i.e., multivariateness) increased significantly, with a large effect size (Cohen's d = 1.272), when students’ problem-solving processes with and without the scaffold were compared. Students found more relations when guided by the scaffold than when solving the tasks intuitively. Watts et al. (2021) also found that scaffolds support students in taking more resources into account than when working on a task without scaffolding. Other studies have shown that targeted prompting increases the depth of arguments (Noyes and Cooper, 2019; Crandell et al., 2020).
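Cohen's d, the effect size reported above, is a standardized mean difference. The minimal Python sketch below shows the common independent-groups variant, which divides the mean difference by the pooled sample standard deviation. Whether Caspari and Graulich (2019) used this variant or one intended for paired observations (e.g., based on the standard deviation of within-person differences) is not stated here, so the sketch is a generic illustration of the formula; the helper name `cohens_d` and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples using the pooled standard deviation.

    d = (mean_a - mean_b) / s_pooled, with
    s_pooled = sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))
    """
    n_a, n_b = len(group_a), len(group_b)
    s_a, s_b = stdev(group_a), stdev(group_b)  # sample SDs (n - 1 denominator)
    s_pooled = math.sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / s_pooled

# Invented counts of influential factors with and without a scaffold
print(cohens_d([4, 5, 6], [1, 2, 3]))  # 3.0
```

By Cohen's conventional benchmarks, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects, which is why d = 1.272 counts as large.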

Students’ conceptual knowledge and scaffolding

Based on prior research, one could conclude that scaffolding is particularly suitable for supporting students’ problem-solving and, specifically, for fostering causal complexity and multivariateness in reasoning about organic mechanisms. Scaffolds are often framed as a tool to support especially those students who struggle to appropriately use and apply conceptual knowledge, helping them make the necessary connections between knowledge elements. However, in one of our former studies, for instance, we observed that one quarter of the participants did not profit much from the scaffold (Caspari and Graulich, 2019). Watts et al. (2021) also found that scaffolding did not activate resources equally well for all students. Thus far, these findings indicate that scaffolds might not be equally beneficial for all students, which raises the question of whether this is related to how students work with a scaffold or whether students’ prior conceptual understanding, considered as their network of knowledge elements at a certain moment in time, is related to the extent to which they profit from a scaffold. As scaffolds are often meant to guide students through a problem without providing additional conceptual knowledge, students’ activation of knowledge elements while solving a scaffolded task is crucial.

On the one hand, one can assume that for a student who reaches high scores in a prior conceptual knowledge test, and thus activates multiple knowledge elements, a scaffold that prompts them to explicitly activate knowledge elements can be redundant or distracting (Kalyuga, 2007; Oksa et al., 2010). Such an expertise reversal effect has already been documented in several studies outside chemistry with participants who differed in their prior-knowledge test scores (Homer and Plass, 2010; Nückles et al., 2010; Salden et al., 2010). On the other hand, for such a student with high conceptual understanding, further linking of concepts through scaffolded guidance might be productive if knowledge elements can be activated easily.

If a learner has a fragmented conceptual understanding (e.g., struggles to activate the knowledge elements required to solve a task independently), a scaffold can build on it (Van Der Stuyf, 2002; Lajoie, 2005) by supporting the learner in activating, organizing, and integrating resources (Hammer et al., 2005). These assumptions are supported by the principle of contingency (i.e., adaptation to students’ conceptual understanding), which is a basic characteristic of scaffolding (van de Pol et al., 2010): support for learners should be provided either at their current level of understanding or slightly above it. Conversely, if a learner shows less conceptual understanding, important resources needed to adequately solve a given task may simply be missing and may thus not be activated by the scaffold. Since a scaffold is not per se designed to provide information but rather to structure the problem-solving process, such a learner may not be able to profit from the scaffold's instructions and activate the necessary knowledge resources.

To use scaffolds meaningfully in the classroom, one should investigate to what extent students activate possible knowledge resources to form causal or multivariate arguments while working with a scaffold, and whether there are measurable indications that students’ gain in conceptual knowledge score after working with the scaffold is related to their prior knowledge (i.e., low- vs. high-score group) and/or to the quality of their work with the scaffold.

Theoretical framework

The resources framework

The study is based on Hammer and colleagues’ resources framework (Hammer and Elby, 2000; Hammer et al., 2005). In this approach, resources are assumed to be the smallest units in the epistemological reasoning process. To solve a domain-specific problem, for instance, several resources available to the learner must be activated simultaneously. Activating resources is an individual process, guided by one's own prior knowledge, experiences, and understanding of a learning situation. Elby and Hammer argue that which resources are activated when solving a problem is guided by framing, which can be described as “the cognitive ability underlying a student's sense of ‘what is going on here’ with respect to knowledge” (Elby and Hammer, 2010, p. 7). This framing can be triggered by prompts or by systematically scaffolding and sequencing a problem-solving process. However, the same frame or prompt does not necessarily lead to the same resources being activated in all learners solving the problem. Only repeated concerted activation of certain resources makes it more likely that the same cluster of resources will be activated again in a similar context, as postulated by Hammer et al. (2005).

Investigating scaffolding through the lens of this framework treats conceptual understanding not as a stable construct but as the sum of simultaneously activated resources from a network of knowledge elements that are activated in response to tasks or prompts. A scaffold may thus support the activation of different resources while solving problems compared to a non-guided setting and may strengthen the simultaneous activation of resources in new contexts. In our study, we qualitatively capture students’ resources in the scaffold through the two dimensions of causal complexity and multivariateness, which require different combinations of knowledge elements. Furthermore, conceptual knowledge tests can be considered a frame that activates students’ resources regarding their conceptual understanding. This leads to the assumption that conceptual knowledge scores are merely an indicator of resource activation for the same prompts and not a reflection of learners’ entire conceptual or prior knowledge. Working on a scaffolded contrasting cases task between two conceptual knowledge tests might thus lead to an increase in students’ test scores from pre- to posttest.

Research questions

As the application of scaffolds or instructional prompts is increasing, especially for supporting students’ mechanistic reasoning, it is important to explore how students are working with scaffolds and how this is related to students’ scores in a conceptual knowledge test. This study is thus guided by the following questions:

(1) What kind of patterns (in terms of causal complexity and multivariateness) can be observed in students’ written answers in the scaffold?

(2) To what extent does students’ gain in their conceptual knowledge score, after working with the scaffold, relate to their conceptual knowledge score before working with a scaffolded contrasting cases task (i.e., low- vs. high-score group) and to their level of causal complexity and multivariateness shown in the scaffold?

These research questions were investigated in a mixed-methods exploratory study with chemistry undergraduate students using a conceptual knowledge paper-pencil test and a scaffolded contrasting case task.

To examine how students are working with a scaffolded task, we first qualitatively analyzed students’ written solutions in the dimensions of causal complexity and multivariateness, and, second, analyzed exploratively the interplay between the qualitative work with the scaffold and students’ score in a conceptual knowledge test.

Since meta-analyses (Belland et al., 2017a; Belland et al., 2017b) as well as a systematic review (Valero Haro et al., 2019) suggest that the effects of scaffolding on students’ conceptual understanding can be detected in a pre–post design (showing significant effects), this study was also conducted in a pre–post design. Because low statistical power is one of the main disadvantages of small-sample research, we followed some of the strategies recommended by Hoyle (1999), such as reporting effect sizes of non-significant test results. We also used the less strict false discovery rate correction (FDR correction) instead of a family-wise error rate correction (FWER correction) to reduce the probability of type II errors, which is also a meaningful choice for exploratory research (Benjamini and Hochberg, 1995; Groppe et al., 2011).
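To illustrate why an FDR correction is less strict than an FWER correction, the sketch below implements the Benjamini–Hochberg step-up procedure (Benjamini and Hochberg, 1995) next to the Bonferroni correction, a common FWER method, in plain Python. The function names and p-values are invented for illustration; in practice an established implementation, such as R's `p.adjust(p, method = "BH")`, would normally be used.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns one boolean per hypothesis (True = rejected).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Largest rank k with p_(k) <= k / m * q; all hypotheses up to rank k are rejected
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected

def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction (controls the FWER): reject if p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

p = [0.3, 0.011, 0.005, 0.035, 0.02]  # invented p-values
print(benjamini_hochberg(p))  # [False, True, True, True, True]
print(bonferroni(p))          # [False, False, True, False, False]
```

With the same five invented tests, Benjamini–Hochberg retains four effects while Bonferroni retains one, which is the trade-off that reduces type II errors at the cost of admitting a controlled proportion of false discoveries.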

Methods

Context

The study was conducted in October 2019 (the first half of the semester) at a German university. N = 18 undergraduate chemistry majors (12 self-identified as female and 6 as male) between 19 and 31 years old were recruited voluntarily from an Organic Chemistry II class through an in-class announcement. The course is titled Organic Reaction Mechanisms and covers, inter alia, reactions of carboxylic acid derivatives. Basic concepts such as hyperconjugation and inductive and mesomeric effects† had already been taught in Organic Chemistry I, which is a prerequisite for Organic Chemistry II. Tasks were presented in German; quotes were translated for this article and checked multiple times by a native English speaker (Appendix 1, Table 3).

The students were informed both orally and in writing (via a consent form) about their rights regarding their data and about how the data are processed. The privacy policy was based on the regulations of the General Data Protection Regulation (GDPR) by the European Union (2016), an EU-wide uniform data protection regulation. Students were also informed that their anonymized writing and data would be analyzed and discussed by members of the research group and used for publications. Institutional Review Board approval is not required at German universities; nevertheless, the data collection followed ethical guidelines (Deutsche Forschungsgemeinschaft, 2022) and allowed each participant to drop out at any point. Informed consent was obtained from all participants. Only the authors had access to the data; digital data were stored on a local hard drive, and written tests were stored in an internal archive. Data of participants who opted out would have been deleted (i.e., files would have been deleted and written tests destroyed). However, no participant made use of this possibility.

Research instrument

The research instrument (Fig. 1 shows the study design) consisted of two parts. The first part was a conceptual knowledge test, which was used as a pre- and posttest.


	Fig. 1 Study outline in three steps: first, a 20-minute pretest consisting of a conceptual knowledge test; then, a 20-minute problem-solving phase consisting of a written task; and finally, a 20-minute posttest.

The second part was a scaffolded contrasting cases task, which was used for the instructional setting.

Conceptual knowledge test

The paper-pencil test (pre- and posttest) contained 12 single- and multiple-choice tasks (45 response options in total for analysis; some examples can be found in Appendix 2). The test contained typical organic chemistry questions, such as identifying the most electrophilic site within a molecule or sorting molecules according to their basicity, but also questions about mesomeric and inductive effects or hyperconjugation, as well as questions similar to contrasting cases with given justifications to choose from. With these types of tasks, we wanted to assess whether students were able to activate the appropriate resources to answer typical organic chemistry questions. Before conducting the main study, we piloted the paper-pencil test (N = 8), interviewed the participants on how they understood the different items, and adapted the items accordingly to ensure response process validity (Deng et al., 2021). Cronbach's α values are provided in the methodology for the quantitative part of the data analysis.
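Internal-consistency coefficients of this kind are computed from a students × items score matrix. The sketch below computes Cronbach's α in plain Python, assuming for illustration that each response option is scored dichotomously (1 = correct, 0 = incorrect) and treated as one item; how the response options were actually scored is not specified in this section. The function name `cronbach_alpha` and the small score matrix are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = students, columns = items).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; the n vs. n - 1 choice cancels
    in the ratio as long as it is applied consistently.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("at least two items are required")
    item_variances = [pvariance(column) for column in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Invented 0/1 scores: four students, three items
scores = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(cronbach_alpha(scores))  # 0.75
```

Because α depends on the number of items and on the variance of total scores, a larger sample, such as the full course cohort, generally gives a more stable estimate than a small volunteer group.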

Contrasting cases task and scaffold

A scaffolded case comparison task was chosen for this study. Case comparisons have been shown to have a positive effect of medium size (d = 0.60; 95% CI [0.47, 0.72]) on the identification of relevant variables compared to tasks without case comparisons (Alfieri et al., 2013). Promising contrasting cases have been proposed (Graulich and Schween, 2018) and used as instruments in several studies (Bodé et al., 2019; Caspari and Graulich, 2019; Rodemer et al., 2020; Deng and Flynn, 2021; Watts et al., 2021).

The contrasting case asks students to compare the rates of two addition steps of a nucleophile to carboxylic acid derivatives (Fig. 2) using a scaffold grid (Fig. 3), which is a modified version of a previously used scaffold design (Caspari and Graulich, 2019; Graulich and Caspari, 2021). Qualitative studies have already shown that this scaffold grid enabled students to identify more implicit properties (Caspari and Graulich, 2019) and that students weighed their arguments after using the scaffold, in contrast to before (Watts et al., 2021). We adapted the scaffold by placing the explicit structural difference and the property of this explicit structural difference in two separate cells to make it unambiguous for students what is expected of them. Additionally, we analyzed the answers and videos of students who participated in a pilot study and adjusted the scaffold grid accordingly to ensure response process validity (Deng et al., 2021). Furthermore, we discussed the scaffold and task with organic chemistry experts to ensure the accuracy of the content.

In reaction A, the hydroxide anion reacts with an acid chloride, and in reaction B, with an ester. The activation energy is lower in reaction A because the electron density at the electrophilic center is lower than in B. The chlorine atom attracts electrons along the σ-bond (strong negative inductive effect) and donates electrons only weakly along the π-bond (weak positive mesomeric effect). In contrast, the methoxy group has a strong electron-withdrawing effect along the σ-bond (strong negative inductive effect) and a strong electron-donating effect along the π-bond (strong positive mesomeric effect). Due to the lower electron density, the carbonyl carbon atom carries a larger partial positive charge in reaction A, which means that the electrostatic interaction between the negatively charged hydroxide anion and the partially positively charged carbonyl carbon atom is stronger than in B, and the bond can be formed more easily. Since these effects also prevail in the transition state, the transition state is lower in energy; the activation energy is therefore lower, and the reaction proceeds faster. To answer this task correctly, students would have to activate resources regarding the explicit structural differences, including their properties, and relate them to at least one property change.


	Fig. 2 Contrasting case and corresponding guiding prompt.


	Fig. 3 Schematic representation of the scaffold. On the left side, the arrows show how a relation is composed. On the right side, the subtasks through which the students were guided are listed (each subtask contained an example based on an S_N1 mechanism, intended to clarify what should be entered in which field).

To solve this written task, students were asked to fill out the scaffold grid, as illustrated in Fig. 3. A possible solution for this task based on the scaffold grid can be found in Appendix 3. The students did not receive further help regarding the resources required to solve the case; they only received the prompts included in the scaffold. Within the framing of our study, we interpreted students’ answers in each cell as an activation of resources while solving the written task.

Data collection

The data were collected during a lecture of the Organic Chemistry II class. Demographic data were collected with a questionnaire before the pretest began. Four versions each of the pre- and posttest, with the same items in differing orders, were distributed in class. To evaluate the reliability of the paper-pencil test, all 53 students attending the course took the test, without any additional materials. For the 18 volunteers who participated in the subsequent instruction, this served as the pretest. The volunteering students had 60 minutes to complete the pre- and posttest as well as the scaffolded task. The other 35 students solved traditional organic chemistry tasks for the remaining time of the lecture and also took the posttest, which allowed us to determine the reliability of the pre- and posttest with a larger number of participants. The data from the volunteering students were collected separately from the rest of the class, with two teaching assistants co-supervising the process and ensuring that no additional tools were used.

For later processing, students generated codes for anonymization (8-character strings created from information that only the students themselves know). For the presentation of the data, students were randomly assigned gender-specific pseudonyms so as to preserve their self-identified gender.

The data of the paper-pencil test (pre- and posttest) as well as students’ written answers in the scaffold were transcribed for later processing.

Data analysis

The test data were transformed into data frames (i.e., table-like data structures that resemble a matrix but whose columns can contain different types of variables). During the qualitative analysis, we first proceeded deductively and used a modified coding system (Table 1) based on the framework of Caspari et al. (2018).
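The decision logic behind the initial codes in Table 1 can be read as a sequence of checks applied to each relation in the scaffold grid. The Python sketch below expresses one such reading. The four boolean fields stand for judgements a human coder makes when reading a student's entries and, like all names here, are hypothetical; special cases, such as the rule that a property already described electronically also counts as electronic in the effect cell, are assumed to be resolved by the coder before the flags are set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedRelation:
    """Coder judgements for one relation in the scaffold grid (hypothetical)."""
    complete: bool          # effect cell filled, structural difference named,
                            # property and change unambiguously assignable
    implicit: bool          # effect and property go beyond the explicit level
    fully_electronic: bool  # every part of the chain is described electronically
    fully_causal: bool      # no part or connection of the chain is missing

# Mapping of initial codes to merged codes (first column of Table 1)
MERGED_CODE = {
    "incomplete": "incomplete",
    "explicit": "non-causal",
    "non-electronic": "non-causal",
    "non-causal electronic": "non-causal",
    "causal electronic": "causal electronic",
}

def initial_code(r: CodedRelation) -> str:
    """Assign an initial code by checking the Table 1 conditions in order."""
    if not r.complete:
        return "incomplete"
    if not r.implicit:
        return "explicit"
    if not r.fully_electronic:
        return "non-electronic"
    if not r.fully_causal:
        return "non-causal electronic"
    return "causal electronic"

relation = CodedRelation(complete=True, implicit=True, fully_electronic=True, fully_causal=False)
code = initial_code(relation)
print(code, "->", MERGED_CODE[code])  # non-causal electronic -> non-causal
```

Checking the conditions in this fixed order mirrors how each row of Table 1 builds on the conditions of the row before it.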

Table 1 Coding system. The first column shows how the codes were merged. The second column shows how they were assigned in the first step. The descriptions and the examples refer to the initial codes respectively. The coding scheme refers to the task solved by the students (see Fig. 2). The examples were written by the students into the scaffold grid (see Fig. 3). The respective cell is indicated in parentheses behind the quotes

Merged code	Initial code	Description of code	Student examples
Incomplete	Non-existing/no effect of property on change or reaction center	The cell effect of a property of an explicit structural difference on a change/reaction center is empty; or no effect of a property on a change/reaction center is described; or no explicit structural difference is mentioned; or the property of an explicit structural difference or the change cannot be unambiguously assigned.	“the reaction is faster” (effect of property on change/reaction center)
			No entries in other cells for this relation

Non-causal	Explicit	The effect of a property of an explicit structural difference on a change/reaction center is explained explicitly, or the property of an explicit structural difference is only described explicitly; an explicit structural difference is mentioned; and the property of an explicit structural difference and the change can be unambiguously assigned.	“the methoxy group only slightly polarizes the carbonyl carbon” (effect of property on change/reaction center)
			“methyl group” (explicit structural difference)
			“big substituent” (property of an explicit structural difference)
			“nucleophilic attack” (property change)
	Non-electronic	The effect of a property of an explicit structural difference on a change/reaction center is explained on an implicit level, with at least one part of the chain of arguments being non-electronic, or an electronic property/effect is not described electronically; an explicit structural difference is mentioned; and the property of an explicit structural difference and the change can be unambiguously assigned.	“mesomeric stabilization of the negative charge → greater energetic lowering of the p-orbital → effects already active in transition state → lowering of activation energy” (effect of property on change/reaction center)
			“Cl” (explicit structural difference)
			“greater +M effect” (property of an explicit structural difference)
			“formation of a negative charge” (property change)
	Non-causal electronic	All conditions of the previous code except the presence of non-electronic arguments are met: all parts of the chain of arguments are described electronically, but the chain lacks at least one part or connection needed to formulate a completely causal explanation. If a property has already been described electronically and is then mentioned in the cell effect of a property of an explicit structural difference on a change/reaction center, the property is also treated as electronically described in this cell.	“by the −I effect, Cl attracts the electrons to itself, thus enabling a nucleophilic attack, whereby a negative charge enters the system” (effect of property on change/reaction center)
			“Cl” (explicit structural difference)
			“electron-withdrawing effect” (property of an explicit structural difference)
			“formation of a negative charge (full p-orbital)” (property change)

Causal electronic	Causal electronic	All conditions of the previous code are met, except that the chain of arguments is completely causal.	“pulls electrons from [carbonyl] carbon, [carbonyl] carbon δ+ -> accelerates” (effect of property on change/reaction center)
			“Chloride” (explicit structural difference)
			“[is] strongly electronegative, draws electrons (−I)” (property of an explicit structural difference)
			“formation of a negative charge on the oxygen” (property change)

The coding system was refined through continuous discussion among the authors until agreement was reached. Patterns were then derived from the data in the dimensions of causal complexity and multivariateness. In the second, exploratory quantitative part of our analysis, we analyzed how students’ gain in their conceptual knowledge score was related to their score in the conceptual knowledge test before the intervention (i.e., low- vs. high-score group) and to the dimensions of causal complexity and multivariateness. The software MAXQDA (VERBI Software, 2019) was used for the qualitative analysis, and RStudio (RStudio Team, 2022) with the programming language R (R Core Team, 2021) for the quantitative analysis. Plots were created with the R libraries ggplot2 (Wickham, 2016) and ggstatsplot (Patil, 2021) and post-edited with the software Affinity Designer (Serif (Europe), 2022).

Qualitative analysis

Step 1: characterizing students’ work with the scaffold. Initially, we tried to capture as many nuances as possible in the students’ written solutions in the scaffold. In our coding, we focused on the cells of the scaffold grid in which the effect (e.g., lower electron density or increased attraction between nucleophile and electrophile) of the property (e.g., electron-withdrawing effect) of an explicit structural difference (e.g., chlorine vs. methoxy group) on a property change (e.g., formation of a bond between nucleophile and electrophile) or on the reaction center (e.g., carbonyl carbon) was described (see the yellow cells in Fig. 3, subtask d). Giving an elaborated answer to the prompt for each of these cells requires the learner to activate different resources. Three prompts (subtasks a–c in Fig. 3) explicitly ask for a specific part of a mechanistic (i.e., cause–effect) relation between an explicit structural difference and a property change; these prompts should thus trigger students to activate the resources that provide this particular information. The prompt for the four yellow cells of the scaffold grid (subtask d in Fig. 3) asks students to link this information, i.e., to integrate the activated resources. Chemical correctness was not assessed in the coding, as our main interest was how and in what depth the students used the scaffold. The remaining cells were also included in the coding to obtain an accurate impression of what the students had written. In the following, students’ answers in the coded cells (see the framed cells in Fig. 4(a)) are referred to as relations.


	Fig. 4 Process of the steps of analysis (qualitative). First, codes were assigned to all students using the coding scheme. Then, the dimensions electronic and causal were combined into the dimension of causal complexity. Finally, the assigned codes were visualized, with multivariateness added as a second dimension.

Students could consider none, one, or two different property changes in their relations (i.e., none, one, or both rows of the framed cells in Fig. 4(a)). If a student formed only one relation or recognized only one property change, the argumentation was considered univariate. If, instead, two different property changes were taken into account, the argumentation was considered multivariate.

The initial coding system (Table 1) distinguished five codes; however, when comparing the coded transcripts (Fig. 4, Section (a)) with each other, we noticed similarities in terms of causal complexity. The dimension of causal complexity reflects the degree to which resources were activated and integrated to answer the last subtask (subtask d in Fig. 3). In the column Description of code (Table 1), we outline which key elements and links students’ answers had to contain to be assigned to a certain level of causal complexity.

The codes were merged as shown in Fig. 4, Section (b), from five initial coding categories for students’ relations into three merged codes for the causal complexity dimension. This made it possible to minimize ambiguities in students’ written answers and to find more accurate labels for students’ use and integration of resources. If a student had not established any complete relation from an explicit structural difference to a property change, the merged code incomplete (i.e., no resource activation or fragmented resource activation without integration) was assigned to the relations on that property change (i.e., the row of relations regarding this property change, see Fig. 4, Section (b)). As soon as one of these relations received an initial code from explicit to non-causal electronic (and none was coded causal electronic), the relations were assigned the merged code non-causal (i.e., activation of resources and partial integration).
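The merging rule can be read as a precedence order over the initial codes of all relations on one property change. The sketch below is only an illustration of that rule, not the authors' procedure (the coding itself was done by hand in MAXQDA); the string labels are hypothetical stand-ins for the initial codes in Table 1.

```python
# Hypothetical string labels for the five initial codes in Table 1.
NO_EFFECT = "no effect of property on change/reaction center"
EXPLICIT = "explicit"
NON_ELECTRONIC = "non-electronic"
NON_CAUSAL_ELECTRONIC = "non-causal electronic"
CAUSAL_ELECTRONIC = "causal electronic"

# Initial codes that are merged into "non-causal".
NON_CAUSAL_CODES = {EXPLICIT, NON_ELECTRONIC, NON_CAUSAL_ELECTRONIC}


def merge_codes(initial_codes):
    """Return the merged code for all relations on one property change.

    One causal electronic relation is enough for the highest merged code;
    otherwise any explicit, non-electronic, or non-causal electronic relation
    yields "non-causal"; empty cells or only "no effect" codes yield "incomplete".
    """
    codes = set(initial_codes)
    if CAUSAL_ELECTRONIC in codes:
        return "causal electronic"
    if codes & NON_CAUSAL_CODES:
        return "non-causal"
    return "incomplete"
```

For example, `merge_codes([EXPLICIT, NON_CAUSAL_ELECTRONIC])` gives `"non-causal"`, while adding a single causal electronic relation to the same list lifts the whole property change to `"causal electronic"`.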

The initial code explicit was assigned whenever a student remained on an explicit level while describing the effect of a property of an explicit structural difference on a property change/the reaction center. Mary (Appendix 4), for example, mentioned an electronic property (i.e., electron pushing) for the explicit structural difference (i.e., halogen) but did not relate this property to the effect on the property change/the reaction center (i.e., steric hindrance on the tetrahedron). Hence, she compared the molecules on an explicit level to explain the effect.

We used the initial code non-electronic for relations that contained implicit properties and described their effect on a property change/the reaction center without explaining them electronically. Emma (Appendix 4), for example, used an implicit property (i.e., mesomeric effect by ester) and related it to the property change (i.e., stabilization of the negative charge via mesomeric effect) but did not explain the property on an electronic level (e.g., that the electrons can be shifted along the π-bond).

Relations were coded as non-causal electronic in the initial coding whenever a student related an electronically explained property to the property change/the reaction center while the argumentation was not completely causal (i.e., a part of the causal chain was missing). Alexander (Fig. 8), for example, related an electronically described property (i.e., can transfer electron density by mesomeric effects) to the reaction center (i.e., can stabilize the emerging carbocation well via +M-effect; the +M effect, i.e., the positive mesomeric effect, is not applicable in this context, since there are no π-bonds in the products) but did not explain why the mesomeric effect stabilizes the carbocation (he used the word carbocation here to describe the negatively charged product). Hence, the causal chain was incomplete.

As soon as at least one relation (from an explicit structural difference to a property change/the reaction center) could be categorized as causal electronic, the merged code causal electronic (i.e., activation and integration of resources) was assigned as the overall code for the relations on this property change. The initial code causal electronic was assigned whenever a student built a relation from a property of an explicit structural difference to a property change or the reaction center, described the property electronically, and argued causally without leaving a gap in the causal chain. Clara (Fig. 5), for example, described the property electronically (i.e., strong electron-withdrawing effect) and related it causally to the reaction center (i.e., due to the −I effect, electron density is withdrawn, which is why the partial positive charge is amplified and the attack takes place more quickly).


	Fig. 5 Process from coding to pattern classification of Clara. At the top is an example of an answer the student wrote into the scaffold grid, with the corresponding code for this relation. At the bottom, the development from the initial codes to the pattern is shown (the code corresponding to the shown written solution is highlighted).

In addition, we coded whether students reasoned in a uni- or multivariate way (i.e., whether they considered none, one, or two different property changes, each in at least one complete relation). To get the highest code in both dimensions (i.e., causal complexity and multivariateness), a student had to form at least one causal electronic relation to each of two different property changes. A pattern derived from these codes would then fill the respective quadrants in both the causal complexity and the multivariateness dimension. Based on this coding, patterns for students’ written answers were formed by filling (or not filling) the quadrants. The different pattern types exist alongside each other and are not hierarchically organized.

Daniel’s written response in Fig. 6 illustrates the procedure. In his answer in the cell effect of property on change/reaction center, for example, it is not apparent that the property (i.e., electron withdrawing → electronegative) of an explicit structural difference (i.e., Cl) is linked to the property change (i.e., formation of a negative charge) or to the reaction center (i.e., the carbonyl carbon). Daniel therefore received the code no effect of property on change/reaction center for his explanation, which resulted in the merged code incomplete in the causal complexity dimension. Since this code was assigned for both potential property changes, Daniel reached an appropriate level in neither causal complexity nor multivariateness, and the pattern was left blank for his scaffold grid (Fig. 6).


	Fig. 6 Process from coding to pattern classification of Daniel. At the top is an example of an answer the student wrote into the scaffold grid, with the corresponding code for this relation. At the bottom, the development from the initial codes to the pattern is shown (the code corresponding to the shown written solution is highlighted).

Clara (Fig. 5), as mentioned above, got the merged code causal electronic for the relations to one of the two property changes. She got the code incomplete for the relations to the other property change. Thus, her pattern is filled in the causal complexity but not in the multivariateness dimension (more examples in Appendix 4).

Quantitative analysis

Determining score groups. In the first step, we scored the paper-pencil tests. For each answer option, we determined the proportion of students who answered it correctly and eliminated options that were answered correctly too often (>90%; two answer options) or too rarely (<10%; one answer option), since they did not differentiate between students. This left a total of 41 answer options. We did not conduct a Rasch analysis or other logistic item-response-theory models, since our sample was too small (the recommended sample size for Rasch analysis is N > 100 (De Ayala, 2013)). Reliability for the pretest and posttest (α_pre = 0.77, α_post = 0.82) was calculated with the library psych (Revelle, 2021), using 35 additional test sheets from students of the same cohort who did not work with the scaffold. To calculate the total score of a paper-pencil test, the sum of points achieved by each student was divided by the maximum number of possible points. Low- and high-score groups were formed by dividing our sample into students who achieved less than 50% of the maximum achievable points in the pretest (i.e., the low-score group) and students who achieved 50% or more (the threshold for passing a test at German universities) (i.e., the high-score group). We also compared how many students of each score group were assigned to each pattern type (further details are shown in Appendix 5).
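The three scoring steps (dropping non-differentiating answer options, computing a relative score, and splitting at 50%) can be sketched as follows. This is a minimal illustration on toy data, not the authors' R code; it assumes one point per retained answer option and 0/1 scoring, neither of which is stated explicitly in the text, and all function names are hypothetical.

```python
from typing import List


def keep_differentiating_options(responses: List[List[int]],
                                 low: float = 0.10,
                                 high: float = 0.90) -> List[int]:
    """Return indices of answer options whose proportion correct lies in [low, high].

    responses[s][i] is 1 if student s answered option i correctly, else 0.
    Options answered correctly by more than 90% or fewer than 10% are dropped.
    """
    n_students = len(responses)
    kept = []
    for i in range(len(responses[0])):
        p_correct = sum(row[i] for row in responses) / n_students
        if low <= p_correct <= high:
            kept.append(i)
    return kept


def relative_score(row: List[int], kept: List[int]) -> float:
    """Points achieved divided by maximum points (assumes one point per option)."""
    return sum(row[i] for i in kept) / len(kept)


def score_group(pretest_score: float, cutoff: float = 0.5) -> str:
    """Students at or above the 50% pass threshold form the high-score group."""
    return "high" if pretest_score >= cutoff else "low"


# Toy data: option 0 is answered correctly by every student and is dropped.
pretest = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]]
kept = keep_differentiating_options(pretest)
groups = [score_group(relative_score(row, kept)) for row in pretest]
```

Here `kept` is `[1, 2]`, and the first student, with exactly 50% of the retained points, falls into the high-score group because the split uses "at or above" the threshold.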

Building groups in both dimensions. For further analyses, groups were built along the dimensions of causal complexity and multivariateness. In the dimension of multivariateness, the students were divided into three groups according to the number of property changes they considered in their relations: (1) none (i.e., no property changes considered), (2) univariate (i.e., one property change considered), and (3) multivariate (i.e., two property changes considered).

In the dimension of causal complexity, three groups were formed as well (according to the highest merged code): (1) none (i.e., highest merged code incomplete), (2) low causal complexity (i.e., highest merged code non-causal), and (3) high causal complexity (i.e., highest merged code causal electronic).
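Both grouping variables can be derived from the merged codes a student received for the (at most two) property changes. The sketch below is an illustration under that assumption, with hypothetical labels; it counts a property change as "considered" when its merged code is anything other than incomplete, which matches the definition of incomplete as "no complete relation".

```python
# Order of the merged codes from Table 1 (hypothetical string labels).
RANK = {"incomplete": 0, "non-causal": 1, "causal electronic": 2}
COMPLEXITY_GROUPS = ["none", "low causal complexity", "high causal complexity"]
MULTIVARIATE_GROUPS = ["none", "univariate", "multivariate"]


def causal_complexity_group(merged_codes):
    """Group by the highest merged code across the property changes."""
    highest = max((RANK[c] for c in merged_codes), default=0)
    return COMPLEXITY_GROUPS[highest]


def multivariateness_group(merged_codes):
    """Group by how many property changes have at least one complete relation."""
    n_considered = sum(1 for c in merged_codes if c != "incomplete")
    return MULTIVARIATE_GROUPS[min(n_considered, 2)]
```

With Clara's codes `["causal electronic", "incomplete"]`, this yields the high causal complexity and univariate groups; Daniel's `["incomplete", "incomplete"]` lands in the none-group of both dimensions.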

Formulating hypotheses. These groups allowed us to determine whether there were significant differences in pretest and posttest scores, respectively, between the groups low vs. high score, the groups none vs. low causal complexity vs. high causal complexity, and the groups none vs. univariate vs. multivariate. We could also investigate whether the conceptual knowledge score of each group increased significantly from pre- to posttest.

To test whether these grouping variables were independent of each other, we conducted χ² tests with Monte Carlo-simulated p-values because of our small sample sizes.
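A χ² test with a Monte Carlo p-value compares the observed statistic with statistics from random tables that keep the observed row and column totals. The sketch below is an assumption-laden stand-in for what R's `chisq.test(..., simulate.p.value = TRUE)` does: it shuffles one variable's labels (which preserves both margins) instead of using R's table sampler, and it uses 2000 simulations, R's default; the source does not report the number of simulations actually used.

```python
import random
from collections import Counter


def chi_square_statistic(x, y):
    """Pearson chi-square statistic for the contingency table of x and y."""
    n = len(x)
    row_totals, col_totals = Counter(x), Counter(y)
    cells = Counter(zip(x, y))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def monte_carlo_chi_square(x, y, n_sim=2000, seed=1):
    """Return (statistic, simulated p-value) with both margins held fixed.

    Shuffling y keeps all marginal totals, so each shuffle is a random table
    under independence; p = (1 + hits) / (n_sim + 1).
    """
    rng = random.Random(seed)
    observed = chi_square_statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        # small tolerance so ties with the observed value count as hits
        if chi_square_statistic(x, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

A usage example would be `monte_carlo_chi_square(score_groups, complexity_groups)`, where both arguments are hypothetical per-student label lists of equal length. With tables as small as those in this study, the simulated p-value avoids relying on the χ² approximation, which needs larger expected cell counts.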

For the differences between the groups of the three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups) at the time of the pretest and posttest, the following hypothesis was tested for each of the three grouping variables:

There is a significant difference in terms of conceptual knowledge scores at the time of the pretest and posttest between the different groups. (H_1A)

For the score difference between pretest and posttest within each group of the three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups), the following hypothesis was tested for each group in each grouping variable respectively:

There is a significant increase in the group's score after working with a scaffolded contrasting cases task compared to before the task. (H_1B)

Testing the hypotheses. Since the number of members in each group of the three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups) was very low (e.g., N = 4), we used non-parametric tests, because normality cannot be assessed reliably in such small samples. We used Kruskal–Wallis tests for comparisons between more than two independent samples, Wilcoxon rank-sum tests for post hoc analyses and comparisons between two independent samples, and Wilcoxon signed-rank tests for comparisons between dependent samples (pre–post comparisons) (Field et al., 2012). These tests require ordinal or metric variables but do not require normally distributed data. To be able to compare the medians between the groups, the homogeneity of variances was tested with Levene's test (Field et al., 2012), as can be seen in Appendix 6, Table 4.
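For readers unfamiliar with the Kruskal–Wallis test, the statistic H can be computed from the pooled ranks. The sketch below is an illustration, not the R implementation the authors used: it assigns average ranks to ties and applies the standard tie correction. For three groups (df = 2), the upper tail of the χ² distribution reduces to exp(−H/2), which reproduces the p-values reported in the results (e.g., H(2) = 2.034 gives p ≈ 0.362).

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df) for a list of samples, with the usual tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)  # zero only if every value is tied
    return h / correction, len(groups) - 1


def p_value_df2(h):
    """Upper tail of the chi-square distribution with 2 df, i.e. exp(-H/2)."""
    return math.exp(-h / 2)
```

For other numbers of groups, the χ² tail no longer has this simple closed form, and a statistics library would be needed for the p-value.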

Since this study was exploratory and the sample size was small, we controlled the false discovery rate (FDR; Benjamini–Hochberg correction) instead of the family-wise error rate (FWER) when adjusting p-values for multiple testing. FDR control has more statistical power and is therefore more likely to discover effects that can then be validated in a confirmatory study (Benjamini and Hochberg, 1995; Groppe et al., 2011). The significance level was set to 0.05 for both unadjusted and FDR-adjusted p-values.
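The Benjamini–Hochberg adjustment is a short step-up procedure. The sketch below mirrors what R's `p.adjust(p, method = "BH")` computes; it is an illustration of the correction, not the authors' script.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order.

    Each p-value of ascending rank k is scaled by m / k; a running minimum
    taken from the largest p-value downwards keeps the adjusted values
    monotone, and starting that minimum at 1.0 caps them at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        k = m - position  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / k)
        adjusted[i] = running_min
    return adjusted
```

For the raw p-values `[0.01, 0.04, 0.03, 0.20]`, the adjusted values are about `[0.04, 0.053, 0.053, 0.20]`. Raw values of 0.04 and 0.03 would pass a 0.05 criterion unadjusted but not after correction, which illustrates why the adjustment matters when many group comparisons are run.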

Data frames were reshaped for testing with functions from the libraries reshape2 (Wickham, 2007) and tidyverse (Wickham et al., 2019); descriptive statistics were calculated with pastecs (Grosjean and Ibanez, 2018). The tests were conducted with the library psych (Revelle, 2021). Levene's tests were performed with the library car (Fox and Weisberg, 2019). Effect sizes r were calculated with a custom function based on Field et al. (2012), with the limits r ≥ 0.10 for small, r ≥ 0.30 for medium, and r ≥ 0.50 for large effects (Cohen, 1992). All other analyses used the base functions of R (R Core Team, 2021).
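Field et al. (2012) compute the effect size as r = z/√N, with z recovered from the test's p-value. The authors' custom function is not shown, so the sketch below is an assumption: it takes z from a two-sided p-value and leaves the choice of N (e.g., the total number of observations entering the test) to the caller. The labels follow the Cohen (1992) limits given above.

```python
from math import sqrt
from statistics import NormalDist


def r_from_p(p_two_sided, n_obs):
    """Effect size r = |z| / sqrt(N), with z recovered from a two-sided p-value."""
    z = NormalDist().inv_cdf(p_two_sided / 2)
    return abs(z) / sqrt(n_obs)


def cohen_label(r):
    """Classify |r| with Cohen's (1992) limits: 0.10 small, 0.30 medium, 0.50 large."""
    r = abs(r)
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "below small"
```

With these limits, the reported values r = 0.39, 0.52, and 0.16 are labeled medium, large, and small, as in the results section.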

Results and discussion

Qualitative results

What kind of patterns (in terms of causal complexity and multivariateness) can be observed in students’ written answers in the scaffold?

Based on the coding described above, we were able to qualitatively characterize patterns, which could be found in students’ written answers in the scaffold grid along the two dimensions of causal complexity and multivariateness.

Pattern I: incomplete/no relation. The first pattern (Fig. 7) was allocated to students whose relations were all coded as incomplete. Four students were assigned to this pattern type. Daniel’s answer (Fig. 6) is an example of this pattern: he reached neither the first level of causal complexity nor the first level of multivariateness. Students whose answers showed this pattern seem to have had difficulties carrying out the subtasks of the scaffold as intended. The scaffold helped these students neither to activate resources nor to integrate them.


	Fig. 7 All five identified pattern types are based on students’ answers in the scaffolded contrasting cases task. For design purposes, some patterns are shown side by side.


	Fig. 8 Process from coding to pattern classification of Alexander.

Pattern II: non-causal/univariate. Pattern II appeared three times in our data. If a student received the merged code non-causal for the relations to one of the two identified property changes and the code incomplete for the other one (or did not consider a second property change), this corresponded to pattern II (Fig. 7). Alexander (Fig. 8) was assigned to this pattern type. Students corresponding to this pattern showed the beginnings of a causal electronic response for the relations to one property change.

All students with this pattern referred to the mesomeric effect but either did not explain it electronically or related it non-causally to the property change or reaction center. This shows that the scaffold helped these students to activate resources and partially integrate them. No relation was formed to a second property change. One reason why these students did not get beyond the non-causal level and argued univariately could be that the prompt did not meet their needs for activating the resources necessary to go beyond a non-causal level and did not encourage them to look for further possible relations. Alternatively, these students may have lacked experience with this type of question, which could be addressed by reframing the task (Eckhard et al., 2021), e.g., by changing the wording of the subtask or by giving more examples.

Pattern III: non-causal/multivariate. To be assigned to pattern III (Fig. 7), which was found three times in our data, students had to form relations to both property changes, and the relations to both property changes received the merged code non-causal (e.g., Mary and Emma, Appendix 4). Two of these students addressed steric effects for one of the two considered property changes. Since the responses of these students were similar to those in pattern II, the reasons why they remained at a non-causal level of complexity may be similar.

Pattern IV: causal electronic/univariate. Pattern IV (Fig. 7) was allocated to students who received the merged code causal electronic for the relations to one of the two identified property changes and the code incomplete for the other one (or who did not recognize a second property change). With seven students, it was the most frequent pattern. Clara (Fig. 5) was assigned to this pattern type. Remarkably, all students assigned to this pattern type either described both relations to one property change causally and electronically (i.e., received the code causal electronic for both relations) or did not form a second relation to that property change at all. It seems that the scaffold was able to prompt these students to activate and integrate their resources. The lack of a second relation to the same property change could result from the absence of an explicit prompt to write down all relations between the properties of the explicit structural differences and the property changes.

Pattern V: causal electronic/multivariate. To be assigned to pattern V (Fig. 7), students had to receive the merged code causal electronic for the relations to two different property changes. This was the case for one student (Fig. 9). This pattern type is similar to pattern IV, except that two property changes instead of one were taken into account. The prompts of the scaffold were sufficient to help this student activate and integrate her resources for relations to two different property changes. The student in this pattern type therefore reached the second level in both dimensions.

Overall, we were able to categorize all students into five pattern types along the dimensions of causal complexity and multivariateness. A higher pattern type number is not necessarily equivalent to a higher quality of the answer.
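The assignment of the five pattern types can be summarized as a lookup on the merged codes of the (at most two) property changes. The sketch below is an illustrative reading of the pattern definitions, with hypothetical labels; the text does not describe a student with one causal electronic and one non-causal property change, so that combination is deliberately left unclassified rather than guessed.

```python
# Pattern types I-V keyed by the sorted merged codes of the complete
# (i.e., not "incomplete") property changes; labels are hypothetical.
PATTERNS = {
    (): "I",                                          # incomplete/no relation
    ("non-causal",): "II",                            # non-causal/univariate
    ("non-causal", "non-causal"): "III",              # non-causal/multivariate
    ("causal electronic",): "IV",                     # causal electronic/univariate
    ("causal electronic", "causal electronic"): "V",  # causal electronic/multivariate
}


def classify_pattern(merged_codes):
    """Return the pattern type, or None for a combination not defined in the text."""
    complete = tuple(sorted(c for c in merged_codes if c != "incomplete"))
    return PATTERNS.get(complete)
```

Daniel's codes `["incomplete", "incomplete"]` map to pattern I and Clara's `["causal electronic", "incomplete"]` to pattern IV, matching the assignments described above.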


	Fig. 9 Process from coding to pattern classification of Johanna.

Nevertheless, a higher pattern number serves as a good indicator of how elaborate a student's response is, both in terms of the depth of single relations (i.e., activation and integration of resources) and in terms of the number of recognized relations. Considering more than one aspect in the reasoning process has been shown to support a more differentiated justification of arguments (Watts et al., 2021; Lieber and Graulich, 2022). However, this task could be answered sufficiently with only one argument.

These five qualitative patterns, which emerged from students’ written data, illustrate how students worked with the scaffold to solve the given case comparison and allowed us to easily visualize the elaborateness of the answers in two dimensions.

Quantitative results

To what extent does students’ gain in their conceptual knowledge score, after working with the scaffold, relate to their conceptual knowledge score before working with a scaffolded contrasting cases task (i.e., low- vs. high-score group) and to their level of causal complexity and multivariateness shown in the scaffold?

To explore how students’ score gain is related to their score group and to their levels in the dimensions of causal complexity and multivariateness, we first investigated whether these three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups) were independent of each other. Subsequently, rank-based tests were performed to determine whether the groups differed significantly in conceptual knowledge score (i.e., in score gain as well as in pretest and posttest scores, respectively).

Independence of the grouping variables. χ² tests were conducted with the groups (1) low vs. high score, (2) none vs. low causal complexity vs. high causal complexity, and (3) none vs. univariate vs. multivariate. There was no statistically significant association between the score and causal complexity groups (p = 0.693) or between the score and multivariateness groups (p = 0.807). This means, for example, that a student with a high pretest score was not automatically in the high causal complexity group, and a student from the low-score group was not necessarily in the low causal complexity group. Because the none-group was defined by the same condition in both the causal complexity and the multivariateness grouping (in both cases, it consisted of the students from pattern I), the χ² test between these two variables was significant (p < 0.001) as long as the none-group was included. Without the none-group, the χ² test for the causal complexity and multivariateness groups was no longer significant (p = 0.241). The two variables can therefore be examined separately in the following analyses, with the exception of the none-group. Accordingly, we included the none-group in the between-group comparisons of both the causal complexity and the multivariateness groups for the pre- and posttest scores, but in the pre–post comparisons we considered the none-group only once, jointly for both dimensions.

Differences between the groups regarding their pre- and posttest score. Kruskal–Wallis tests with post hoc Wilcoxon rank-sum tests (FDR-adjusted p-values) were performed to find out whether the groups of the grouping variables causal complexity and multivariateness differed significantly in conceptual knowledge scores in the pretest and posttest, respectively. Since the score-group variable consisted of only two groups, only Wilcoxon rank-sum tests were used for this grouping variable.

Conceptual knowledge score differences between score groups (pre- and posttest score). We found a significant difference in conceptual knowledge score between the low- and high-score groups in the pretest (i.e., the H_1A-hypothesis was confirmed), but not in the posttest (see Fig. 10(a) and Table 2). That the two groups differ significantly in their pretest scores confirms that the separation of the groups was appropriate and that they differ sufficiently to be distinguished from each other. From pretest to posttest, the scores of the low-score group converged toward those of the high-score group (i.e., the difference between the groups was no longer significant in the posttest). This may indicate that the scaffold provided the necessary framing for the low-score group to activate resources in the posttest in a similarly productive manner as the students in the high-score group already did in the pretest.


	Fig. 10 Visual representation of the post hoc Wilcoxon rank-sum tests. Significant differences are represented as black lines with the respective effect size; non-significant differences are represented as grey lines with the respective effect size. Effect sizes were also calculated for non-significant results, as recommended by Hoyle (1999).

Table 2 Summary of the results of the post hoc Wilcoxon rank-sum tests. Significant (FDR-corrected) p- and r-values are marked with an asterisk (*). M is the median of the corresponding group; M_1st and M_2nd refer to the first and second group named in each comparison

Comparison	Pretest M_1st	Pretest M_2nd	Pretest p	Pretest r	Posttest M_1st	Posttest M_2nd	Posttest p	Posttest r
Low score vs. high score	0.341	0.598	0.012*	0.79*	0.476	0.646	0.102	0.58
None vs. low causal complexity	0.354	0.500	0.557	0.27	0.341	0.561	0.557	0.27
None vs. high causal complexity	0.354	0.354	0.854	0.08	0.341	0.537	0.176	0.46
Low vs. high causal complexity	0.500	0.354	0.557	0.24	0.561	0.537	0.965	0.03
None vs. univariate	0.354	0.415	0.676	0.18	0.341	0.585	0.131	0.52
None vs. multivariate	0.354	0.451	0.786	0.14	0.341	0.488	0.844	0.10
Univariate vs. multivariate	0.415	0.451	>0.999	0.00	0.585	0.488	0.557	0.24

Conceptual knowledge score differences between the different performances (causal complexity and multivariateness) while working with the scaffold (pre- and posttest score). We found no significant difference between the causal complexity groups in conceptual knowledge scores in the pretest (H(2) = 2.034, p = 0.362) or in the posttest (H(2) = 3.681, p = 0.159). Thus, the students who later showed different causal complexities in the scaffold grid did not have significantly different results in the pretest, and the groups with different causal complexities also did not differ significantly in the posttest.

We found similar non-significant results for the differences between the multivariateness groups in conceptual knowledge scores in the pretest (H(2) = 0.822, p = 0.663) and posttest (H(2) = 5.082, p = 0.079). In other words, students who later showed different multivariateness in the scaffold did not differ significantly in their pretest results. Furthermore, the multivariateness groups did not differ significantly in the posttest.

Fig. 10(b) and (c) show the results of the post hoc analyses (i.e., Wilcoxon rank-sum tests); all post hoc results are summarized in Table 2.

Some group differences increased from pretest to posttest, especially between the none and low causal complexity, none and high causal complexity, none and univariate, and none and multivariate groups, but not to such a degree that they became significant. These results suggest that the framing provided by the scaffold grid led 14 students to activate and/or integrate resources in the scaffolded task itself and to activate more resources in the posttest, but not to the extent that they outperformed the students for whom the framing did not lead to activation or integration of resources in the scaffolded task.

Differences within the groups regarding their increase in conceptual knowledge score from pre- to posttest. For further analysis, Wilcoxon signed-rank tests were conducted to inspect the score improvement from pre- to posttest in general and the influence of the different groups on it (Fig. 11). The p-values were adjusted using FDR-control due to multiple testing (Appendix 7, Table 5).
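The V reported for the signed-rank tests below is, in R's `wilcox.test`, the sum of the ranks of |x − y| over the pairs with a positive difference x − y. The sketch below illustrates that statistic and an exact two-sided p-value; it is not the authors' code. Which argument order (pre − post or post − pre) was used is not stated in the text, and the exact p-value is only valid without ties among the absolute differences (R then switches to a normal approximation).

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_v(x, y):
    """Return (V, n): V sums the ranks of |x - y| over positive differences.

    Zero differences are dropped before ranking, as wilcox.test does.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return v, len(diffs)


def exact_p_two_sided(v, n):
    """Exact two-sided p-value for V; valid only without ties among |x - y|."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s]: subsets of ranks 1..n summing to s
    for k in range(1, n + 1):
        for s in range(max_sum, k - 1, -1):
            counts[s] += counts[s - k]
    total = 2 ** n
    v = int(v)
    lower = sum(counts[: v + 1]) / total
    upper = sum(counts[v:]) / total
    return min(1.0, 2 * min(lower, upper))
```

For five students who all improve (x = pretest, y = posttest), V is 0 and the exact two-sided p-value is 2/32 = 0.0625, so with so few pairs, even a perfectly consistent increase cannot reach p < 0.05.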


	Fig. 11 Boxplots for pre–post comparisons (Wilcoxon signed-rank tests). Section (a) shows the overall difference between pre- and posttest as well as the progression for each score group. Section (b) shows the progression dependent on the work with the scaffold (i.e., causal complexity and multivariateness). NS: not significant, *: p < 0.05.

Overall (Fig. 11(a)), the conceptual knowledge score increased significantly from pre- (median M = 0.378) to posttest (M = 0.512) (V = 26, p = 0.036), with a medium effect size of r = 0.39. This result indicates that the scaffolded problem solving led to a general increase in the conceptual knowledge score (i.e., hypothesis H1B was confirmed).
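The reported effect sizes are consistent with a conversion described by Field et al. (2012): a two-sided p-value is turned into a z-score, which is then divided by the square root of the total number of observations. The Python sketch below assumes this conversion, the unadjusted p = 0.018 listed in Appendix 7 (Table 5), and N = 36 observations, that is, 18 students each tested twice. The sample size of 18 is inferred from the degrees of freedom of the Levene's tests in Appendix 6 and is not stated in this section.

```python
import math
from statistics import NormalDist


def r_from_p(p_two_sided, n_observations):
    """Effect size r = |z| / sqrt(N), with z recovered from a two-sided p-value."""
    z = NormalDist().inv_cdf(p_two_sided / 2)  # negative z for p < 1
    return abs(z) / math.sqrt(n_observations)


# Overall pre-post comparison: unadjusted p = 0.018, 18 students x 2 tests
print(round(r_from_p(0.018, 36), 2))  # 0.39
```

By Cohen's (1992) benchmarks, r ≈ 0.3 counts as a medium effect and r ≈ 0.5 as a large one, which matches the labels used in the text.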

Increase in conceptual knowledge score within the score groups (low- and high-score groups). Next, we tested the influence of the score group on the gain in conceptual knowledge score from pre- to posttest (Fig. 11(a)). The median score of the low-score group increased significantly from pre- (M = 0.341) to posttest (M = 0.476) (V = 6, p = 0.036), with a large effect size of r = 0.52 (i.e., hypothesis H1B was confirmed). For the high-score group, the increase from pre- (M = 0.598) to posttest (M = 0.646) was not significant (V = 5, p = 0.784), with a small effect size of r = 0.16. Thus, students in the low-score group achieved a larger, significant gain in the conceptual knowledge test, whereas the gain of students in the high-score group was not significant.

The non-significant score gain of the high-score students can be interpreted in two ways. On the one hand, high-score students most likely already had well-consolidated and linked resources; thus, the scaffold might not have added new connections between resources, which in turn did not influence the activation of resources in the posttest. On the other hand, the scaffold might have imposed a particular approach to the task that may not have matched the students’ own approaches.

Judging only by the pre–post score gain, one could conclude that students in the high-score group did not profit from the scaffold, whereas low-score students profited from it and increased their posttest scores.

As the scaffold does not provide conceptual knowledge but rather prompts students (i.e., provides a different framing) to activate and integrate resources, an increase in conceptual knowledge in the posttest is likely linked to students integrating their available resources better by working with the scaffold.

Increase in conceptual knowledge score within the groups defined by their work with the scaffold (none, low causal complexity, high causal complexity, univariate, and multivariate groups). Next, we investigated whether the way students worked with the scaffold (i.e., causal complexity and multivariateness) made a difference to the increase in conceptual knowledge scores from pre- to posttest (Fig. 11(b)). Students assigned to the none group in both dimensions, causal complexity and multivariateness, did not improve significantly from pre- (M = 0.354) to posttest (M = 0.341) (V = 3, p > 0.999), with no effect (r = 0.00).

Their median score even dropped slightly from pre- to posttest. This finding suggests that students need to use the scaffold in a meaningful way to profit from it.

Students who built relations with high causal complexity (i.e., the high causal complexity group) improved significantly from pre- (M = 0.354) to posttest (M = 0.537) (V = 0, p = 0.036), with a large effect size of r = 0.61 (i.e., hypothesis H1B was confirmed). For students with low causal complexity, the median increased from pre- (M = 0.5) to posttest (M = 0.561), but this increase was not significant (V = 7, p = 0.784), with no effect (r = 0.00).

The significant increase in conceptual knowledge score for students with high causal complexity, compared with the non-significant changes for students with no or low causal complexity, indicates that students who built relations with higher causal complexity seem to have benefited from the scaffold. To achieve higher causal complexity, learners must activate and integrate resources for the given context, which leads to more in-depth work with the scaffold prompts.

For multivariateness, students who built multivariate relations showed a non-significant increase from pre- (M = 0.451) to posttest (M = 0.488) (V = 5, p > 0.999), with no effect (r = 0.00). Students with univariate argumentation, however, improved significantly from pre- (M = 0.415) to posttest (M = 0.585) (V = 0, p = 0.036), with a large effect size of r = 0.62 (i.e., hypothesis H1B was confirmed).

These results suggest that univariate reasoning while solving the scaffolded case comparison led to more successful activation of resources afterward than multivariate reasoning did.

Further research is needed to elucidate whether this observation holds for a larger sample and more complex tasks.

Conclusions

In this study, we first investigated which patterns in terms of causal complexity (i.e., resource activation and integration) and multivariateness could be identified when students work with a written scaffolded contrasting case. For this detailed analysis, students’ answers in the scaffold grid were categorized along the dimensions of causal complexity and multivariateness, which resulted in five simplified patterns, each of which appeared at least once in the data. In a second step, we quantitatively explored to what extent students’ score gain in a conceptual knowledge test might be related to their pretest score (low- vs. high-score group) and to the level of causal complexity and multivariateness they showed in the scaffold.

Judging by their score gains, students from the low-score group seem to have benefited significantly from the scaffold, whereas students in the high-score group remained at the same high conceptual knowledge score. This is supported by the fact that the score groups differed significantly in conceptual knowledge score in the pretest but not in the posttest, which indicates a convergence of the scores. It can be assumed that the increase in conceptual knowledge score resulted from working with the scaffold, which supported students in activating and integrating their resources, as the scaffold provided no additional information besides the prompts. The fact that higher causal complexity in the work with the scaffold led to a significant increase in conceptual knowledge score afterward, independent of the score groups, indicates that creating causal electronic relations in the scaffold is positively related to students’ scores in the conceptual knowledge test. Providing causal electronic relations requires activating and integrating more resources than relations with no or low causal complexity, because students have to verbalize how a property electronically influences the change in the reaction process. This observed relation between high causal complexity and an increase in conceptual knowledge score requires more in-depth investigation to clarify how prompting relates to the activation and integration of resources in learners’ mental networks.

The findings also show that students who used univariate argumentation in the scaffold showed a greater increase in conceptual knowledge score than those who used multivariate argumentation. Multivariate reasoning does not necessarily imply more elaborate reasoning. However, since only one student in our sample used multivariate reasoning with high causal complexity, further research with a larger sample and more complex problems is required to elucidate the impact of univariate vs. multivariate reasoning.

The quantitative findings document a significant increase overall, and low-score students in our sample showed a significant increase from pre- to posttest, whereas high-score students did not. However, the detailed qualitative analysis of students’ work with the scaffold (i.e., the patterns identified) and the exploratory quantitative analysis revealed that much more is going on when students solve scaffolded case comparisons. We may not yet fully understand, or be able to capture, the nuances of scaffolding. The exploratory findings discussed here illustrate the need for a more thorough analysis of how and why scaffolds can be supportive and, in particular, for whom and for what purpose they are supportive.

Implications

Using scaffolds in teaching

The patterns and codes derived in this study could be used in teaching as a tool to diagnose students’ work with a given scaffold grid. Multivariateness can be detected visually by looking at the grid and the number of relations formed. Causal complexity can be estimated using the merged codes used in this study.
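As a rough illustration of how a filled-in grid could be screened for multivariateness in teaching, the Python sketch below counts the relations a student wrote per property change. The encoding of a relation as a (property change, structural difference) pair and the classification rule are hypothetical assumptions for illustration only. The coding rules defined earlier in this article take precedence, and estimating causal complexity would still require the merged codes.

```python
from collections import Counter


def classify_multivariateness(relations):
    """Hypothetical screening rule for one student's scaffold grid.

    relations: list of (property_change, structural_difference) pairs,
    one pair per relation the student wrote into the grid.
    Assumed rule: no relation -> "none"; at most one relation per property
    change -> "univariate"; two or more relations to the same property
    change -> "multivariate".
    """
    if not relations:
        return "none"
    per_change = Counter(change for change, _ in relations)
    return "multivariate" if max(per_change.values()) >= 2 else "univariate"


# Hypothetical grid: two relations to the same property change
grid = [("property change 1", "Cl"), ("property change 1", "OR")]
print(classify_multivariateness(grid))  # multivariate
```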

Our quantitative analysis showed a significant increase in conceptual knowledge scores for learners with high causal complexity answers. To help students achieve higher causal complexity, building cause–effect relationships could be trained, for instance, with help cards that prompt learners to put the parts of a relation in the right order, which might help them integrate resources. Such cards could also prompt learners to determine whether a relation is of low or high causal complexity, or provide resources in the form of knowledge elements that students should incorporate into their argumentation.

Many students in the sample successfully used univariate argumentation. However, teachers should explicitly and regularly create tasks that require multivariate argumentation (Lieber and Graulich, 2020), since problems in organic chemistry often involve more than one variable. Previous research shows that students tend to reason univariately even when explicitly prompted to consider multiple variables (Kraft et al., 2010). In this regard, it could also be helpful to explicitly promote the process of weighing arguments, so that arguments are not merely listed and only superficially considered (Lieber and Graulich, 2022). Giving students the opportunity to use the same framing for multiple arguments might also lead to a structurally stable frame, which helps students activate and integrate resources in other contexts without the need for repeated explicit prompting (Elby and Hammer, 2010).

To give students in the high-score group, who showed no significant increase, the opportunity to further improve their reasoning, it might be useful to let them work on tasks that stimulate their epistemological thinking so that they can optimize their approach (Hammer et al., 2005). Adaptive scaffolding could be a good way to achieve this and to tailor scaffolds to the needs of high performers (Kalyuga, 2007).

To fade the scaffold and let students solve the task more independently step by step, the grid structure could be omitted in a first step, so that students receive only the prompts for the subtasks. In a second step, the prompts could be condensed so that only the keywords “property change”, “properties of structural differences”, and “effects of properties on the property change” are presented, before the scaffold is omitted completely in a third step. This might also be a good way to test whether the scaffold grid led to a structurally stable frame for students’ resource activation and integration (Elby and Hammer, 2010).

Further implications for research

To investigate whether the patterns and dimensions identified in our sample also appear in other contexts, further research is needed, for example, on other reaction types. Likewise, the results should be compared with other samples to see whether these pattern types also apply in other settings.

Given that multivariateness and causal complexity (i.e., activation and integration of resources) affected students’ conceptual knowledge gain (i.e., the increased activation of resources in the posttest compared with the pretest) differently in our sample, it might be worth considering other types of knowledge when measuring the impact of scaffolds. Based on the findings, it might be promising to examine how procedural knowledge (Rittle-Johnson and Star, 2007; Rittle-Johnson and Star, 2009) develops through working with a scaffold, partly because the structure of the scaffold might make it particularly suitable for establishing connections between resources. Promoting this interconnectedness of knowledge is another central part of mechanistic reasoning that needs to be investigated (Bodé et al., 2019). A conceptual knowledge test, however, might not capture the effect of linking and connecting multiple aspects (and thus resources) when students are prompted to reason multivariately. Moreover, the task we presented could be solved well univariately. A comparable student sample could therefore be retested with an instrument that measures procedural and conceptual knowledge on separate scales. To investigate whether the ability to form more complex and multiple relations in the scaffold results in a better understanding of how to link components, that is, in being able to integrate resources while reasoning about mechanistic problems, the written task could be designed so that it can only be solved successfully through multivariate reasoning and the weighing of several aspects. Assessing only students’ conceptual knowledge after working with a scaffold may underestimate the potential of scaffolding and may capture too narrow a picture of students’ epistemological resources.

Besides students’ conceptual or procedural understanding, their ability to think about multiple arguments and causality should be taken into account when designing adaptive scaffolds. It might thus be important to further explore how adaptively designed scaffolds (Lieber and Graulich, 2022) can provide equally beneficial support for the high-score group and for students at the lower end of the low-score group who had problems working with the scaffold (i.e., those categorized in pattern I).

For a deeper understanding of students’ thought processes and more accurate conclusions about the impact of scaffolding on the structures of students’ epistemological resources, it would be useful to conduct a qualitative interview study similar to that of Caspari and Graulich (2019). Such data would allow an in-depth analysis at a finer grain size, from which the influence of scaffolding on students’ resources could be inferred more clearly.

Additionally, fading the scaffold should be investigated, for example, by reducing the guidance stepwise over a semester. The written scaffolds could be analyzed at multiple points to check whether students internalize the scaffold’s sequence of steps for finding causal mechanistic arguments and whether fading is meaningful for this scaffold (Belland et al., 2017b).

Limitations

For the qualitative part of this study, students’ written responses were collected; due to the overall setting, it was not possible to ask follow-up questions during data collection, which could have added another layer of interpretation. By merging and adjusting the coding system, however, we tried to ensure the most precise evaluation possible without interpreting the data to an unreasonable extent.

The grain size of the qualitative part of the study is also relatively coarse because the cells of the scaffold grid contained few and very short texts. Thus, inferences about the structures of students’ epistemological resources when using a scaffold can only be drawn roughly. It would therefore be useful to conduct further in-depth investigations in a study focusing on qualitative analysis (for example, using interviews). Nevertheless, our results can serve as a promising stepping stone for further qualitative studies with similar but more detailed data.

Although we used this type of scaffold in earlier studies and evaluated students’ understanding of the prompts in various pilot settings, we did not assess how students in this cohort framed the prompts while working with the scaffold. As framing a prompt is a context-specific act of interpreting a situation that is influenced by epistemological assumptions, the results reported here provide only a limited perspective on students’ work with a scaffold.

The total duration of the intervention was 60 minutes, which is a short period for assessing conceptual knowledge in both a pretest and a posttest and conducting a problem-solving phase. As this study was exploratory, we aimed to generate hypotheses on how and to what extent students worked with the scaffold. Further research is needed to test and verify the hypotheses generated.

As only addition–elimination mechanisms on carboxylic acid derivatives were used, we can only make statements about students’ reasoning about these types of mechanisms and cannot relate the patterns shown in the scaffold to their mechanistic reasoning in other contexts. In addition, the coding system is based on the data collected and may have gaps regarding causal links we did not consider, such as causal reasoning solely about explicit properties, which did not occur in our type of task.

Since participation in the study was voluntary, high-achieving students may have been overrepresented, as they are more likely to feel confident in such a task. However, more than half of the participants answered fewer than 50% of the pretest tasks correctly, which indicates that we covered a broad spectrum of prior knowledge.

The quantitative part of this study explored the interplay of conceptual knowledge and students’ written solutions in a scaffold in a small sample; thus, only observations made in this sample can be reported. Although the selected tests are suitable for small samples (Dwivedi et al., 2017), it remains uncertain whether these results can be generalized to other samples. Larger studies should be conducted to confirm the findings.

Conflicts of interest

There are no conflicts to declare.

Appendix 1. Translations of students’ responses

Table 3 Translations of the students’ responses used in this article

Pseudonym	German	English
Daniel	Eigenschaftsänderung: “Ausbildung einer negativen Ladung”	Property change: “formation of a negative charge”
	Expliziter struktureller Unterschied: “Cl”	Explicit structural difference: “Cl”
	Eigenschaft: “e- ziehend → elektronegativ”	Property: “electron withdrawing → electronegative”
	Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “schneller”	Effect of property on change/reaction center: “faster”

Mary	Eigenschaftsänderung: “Tetraedrischer Aufbau”	Property change: “tetrahedral structure”
	Expliziter struktureller Unterschied: “Halogen”	Explicit structural difference: “halogen”
	Eigenschaft: “elektronenschiebend”	Property: “electron pushing”
	Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “sterische Hinderung am Tetraeder”	Effect of property on change/reaction center: “steric hindrance on the tetrahedron”

Emma	Eigenschaftsänderung: “Bildung eines Oxoniumions”	Property change: “formation of an oxonium ion”
	Expliziter struktureller Unterschied: “Ester”	Explicit structural difference: “ester”
	Eigenschaft: “Mesomerieeffekt durch Ester”	Property: “mesomeric effect by ester”
	Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “Stabilisierung der negativen Ladung durch Mesomerie”	Effect of property on change/reaction center: “stabilization of the negative charge via mesomeric effect”

Alexander	Eigenschaftsänderung: “Ausbildung einer kovalenten Bindung + (Entstehung) bzw. Verlagerung einer neg. Ladung”	Property change: “formation of a covalent bond + (formation) or displacement of a negative charge”
	Expliziter struktureller Unterschied: “-Cl als Rest”	Explicit structural difference: “-Cl as residue”
	Eigenschaft: “Kann Elektronendichte durch mesomere Effekte verlagern (stellt Elektronen bereit)”	Property: “can shift electron density by mesomeric effects (provides electrons)”
	Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “kann das entestehende Carbokation gut über +M-Effekt stabilisieren”	Effect of property on change/reaction center: “can stabilize the emerging carbocation well via +M-effect”

Clara	Eigenschaftsänderung: “Addition einer zusätzlichen Hydroxylgruppe”	Property change: “addition of an additional hydroxyl group”
	Expliziter struktureller Unterschied: “Säurechlorid (Cl-)”	Explicit structural difference: “acid chloride (Cl-)”
	Eigenschaft: “stark elektronenziehender Effekt (−I)”	Property: “strongly electron-withdrawing effect (−I)”
	Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “Durch den −I-Effekt wird Elektronendichte abgezogen, weswegen die partial positive Ladung verstärkt wird und der Angriff schneller erfolgt”	Effect of property on change/reaction center: “Due to the −I effect, electron density is withdrawn, which is why the partial positive charge is amplified and the attack takes place more quickly.”

Appendix 2. Items used in the conceptual knowledge test


	Fig. 12 Three tasks as they appeared in the conceptual knowledge test. In the upper task, the students were asked to tick what applies. Correct answers are marked with a green cross.

The items shown here are examples of the items in the paper-pencil test. The main goal was to test students’ conceptual knowledge. Other types of knowledge (e.g., procedural knowledge (Rittle-Johnson and Star, 2007; Rittle-Johnson and Star, 2009) or epistemological knowledge (Hammer et al., 2005)) were not tested explicitly.

Appendix 3. Sample solution for the scaffolded contrasting case


	Fig. 13 Sample solution for the contrasting case task that students were asked to solve during the problem-solving phase, filled into the scaffold grid.

Appendix 4. Examples of students’ pattern determination

We want to give two more examples of how we coded students’ answers and transformed them into patterns. Mary (Fig. 14) received the code “explicit” for her solution. She built a relation between the explicit structural difference (i.e., halogen) and the property change (i.e., tetrahedral structure). In the cell “effect of property on change/reaction center” (i.e., steric hindrance on the tetrahedron), she did not use the property of the explicit structural difference (i.e., electron pushing). She received the code “explicit” because she did not relate the property of the explicit structural difference to a property change but built a direct relation between the explicit structural difference and the property change. The resulting merged code was “non-causal”. Since Mary received the code “non-causal” for the relations to two different property changes, the pattern corresponding to her written answer was filled halfway in the dimensions of causal complexity and multivariateness.

Emma (Fig. 15) obtained the code “non-electronic”. She recognized an implicit property (i.e., mesomeric effect) of an explicit structural difference (i.e., ester) and related the property to a property change (i.e., formation of an oxonium ion), but she described neither the property nor its effect on the property change (i.e., stabilization of the negative charge via mesomeric effect) in electronic terms; she referred only to “stabilization”. To obtain the code “electronic non-causal” or “causal electronic”, Emma would have had to describe the mesomeric effect of the ester, for example, as an effect that provides π-electrons, and to describe its effect on the negative charge as, for example, attenuation of the negative charge instead of stabilization. The merged code in the dimension of causal complexity was “non-causal” for Emma. Since Emma also received the code “non-causal” for the relations to the second property change, her pattern was the same as Mary’s.


	Fig. 14 Process from coding to pattern classification for Mary. Top: an example of an answer the student wrote into the scaffold grid, with the corresponding code for this relation. Bottom: the development from the initial codes to the pattern (the code corresponding to the written solution shown is highlighted).


	Fig. 15 Process from coding to pattern classification for Emma. Top: an example of an answer the student wrote into the scaffold grid, with the corresponding code for this relation. Bottom: the development from the initial codes to the pattern (the code corresponding to the written solution shown is highlighted).

Appendix 5. Qualitative comparison of score group and pattern

Fig. 16 gives a graphical overview of how the patterns are distributed across the two score groups. No clear trend related to students’ score group was apparent. Many students in the low-score group showed patterns I and IV. Only one student from the low-score group was assigned to pattern V, whereas another student from this group showed pattern I. In the high-score group, four students showed patterns III and IV.


	Fig. 16 Graphical overview of the distribution of patterns along the score groups. The columns represent the two score groups. The rows represent the five patterns. At the end of each column and row, the sums of the whole column/row are shown.

Appendix 6. Levene's test results

Table 4 Results of Levene's test for each combination of test score and grouping. The left column lists the combinations; the right column gives the respective F- and p-values. All tests indicated homogeneity of variances, since no result is significant.

Score by grouping	Levene's test result
Pretest score by score-groups	F(1,16) = 0.03, p = 0.855
Pretest score by causal complexity	F(2,15) = 0.07, p = 0.932
Pretest score by multivariateness	F(2,15) = 0.09, p = 0.911
Posttest score by score-groups	F(1,16) = 1.26, p = 0.278
Posttest score by causal complexity	F(2,15) = 0.24, p = 0.792
Posttest score by multivariateness	F(2,15) = 0.13, p = 0.881
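Table 4 does not state which centring was used for Levene's test. The reference list includes the R companion to applied regression, whose car package provides leveneTest with median centring by default, so the median-centred (Brown–Forsythe) variant is a plausible reading; treat this as an assumption. The Python sketch below computes that statistic as a one-way ANOVA on the absolute deviations of each score from its group median, using made-up scores.

```python
from statistics import mean, median


def levene_median(*groups):
    """Median-centred Levene statistic with its degrees of freedom.

    Returns (F, df_between, df_within): a one-way ANOVA F on the absolute
    deviations of each score from its own group median.
    """
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [mean(d) for d in devs]
    grand = mean([z for d in devs for z in d])
    k = len(groups)
    n = sum(len(d) for d in devs)
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, group_means))
    ss_within = sum((z - m) ** 2 for d, m in zip(devs, group_means) for z in d)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for two groups
f, df_b, df_w = levene_median([1, 2, 3], [2, 4, 6])
print(f"F({df_b},{df_w}) = {f:.2f}")  # F(1,4) = 0.80
```

With 18 students, the degrees of freedom become (1, 16) for the two score groups and (2, 15) for the three-group splits, matching Table 4.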

Appendix 7. FDR-adjusted p-values

Table 5 Original and FDR-adjusted p-values for the Wilcoxon signed-rank tests (pre–post comparisons). The FDR adjustments for the Wilcoxon rank-sum tests (post hoc comparisons) were calculated automatically by the R function. Significant adjusted p-values (p < 0.05) are those for the overall, low score, high causal complexity, and univariate tests.

Test	p (original)	p (adjusted)
Overall	0.018	0.036
Low score	0.011	0.036
High score	0.588	0.784
None	>0.999	>0.999
Low causal complexity	0.529	0.784
High causal complexity	0.014	0.036
Univariate	0.006	0.036
Multivariate	>0.999	>0.999
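The adjusted values in Table 5 are consistent with the Benjamini–Hochberg false discovery rate procedure (Benjamini and Hochberg, 1995) applied jointly to all eight signed-rank tests. The Python sketch below recomputes the adjusted column from the original column. It assumes that the two entries reported as >0.999 equal 1.0 and uses the same step-up rule as R's p.adjust(method = "BH").

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control).

    Each p-value is scaled by m / rank; a running minimum taken from the
    largest p-value downwards keeps the adjusted values monotone and <= 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # ascending rank of pvalues[i]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


tests = ["Overall", "Low score", "High score", "None",
         "Low causal complexity", "High causal complexity",
         "Univariate", "Multivariate"]
# Unadjusted p-values from Table 5; ">0.999" is taken as 1.0 (assumption)
p_original = [0.018, 0.011, 0.588, 1.0, 0.529, 0.014, 0.006, 1.0]

for name, p in zip(tests, benjamini_hochberg(p_original)):
    print(f"{name}: {p:.3f}")
```

The printed values match the adjusted column of Table 5, with 1.000 corresponding to the entries shown as >0.999.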

Acknowledgements

The authors would like to thank the German Research Foundation DFG (Deutsche Forschungsgemeinschaft) for funding this research (project number: 446349713). This publication is part of the first author's doctoral (Dr rer. nat.) thesis at the Faculty of Biology and Chemistry, Justus-Liebig-University Giessen, Germany. We are thankful for all students willing to participate in the study. We would especially like to thank Sascha Bernholt and Leonie Lieber as well as all members of the Graulich Group for productive discussions and their support.

References

Alfieri L., Nokes-Malach T. J. and Schunn C. D., (2013), Learning Through Case Comparisons: A Meta-Analytic Review, Educ. Psychol., 48, 87–113.
Becker N., Noyes K. and Cooper M., (2016), Characterizing Students' Mechanistic Reasoning about London Dispersion Forces, J. Chem. Educ., 93, 1713–1724.
Belland B. R., (2011), Distributed Cognition as a Lens to Understand the Effects of Scaffolds: The Role of Transfer of Responsibility, Educ. Psychol. Rev., 23, 577–600.
Belland B. R., Walker A. E. and Kim N. J., (2017a), A Bayesian network meta-analysis to synthesize the influence of contexts of scaffolding use on cognitive outcomes in STEM education, Rev. Educ. Res., 87, 1042–1081.
Belland B. R., Walker A. E., Kim N. J. and Lefler M., (2017b), Synthesizing results from empirical research on computer-based scaffolding in STEM education: A meta-analysis, Rev. Educ. Res., 87, 309–344.
Benjamini Y. and Hochberg Y., (1995), Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Stat. Soc.: Ser. B (Methodol.), 57, 289–300.
Benson B. K., (1997), Scaffolding, Engl. J., 86, 126.
Bodé N. E., Deng J. M. and Flynn A. B., (2019), Getting Past the Rules and to the WHY: Causal Mechanistic Arguments When Judging the Plausibility of Organic Reaction Mechanisms, J. Chem. Educ., 96, 1068–1082.
Caspari I. and Graulich N., (2019), Scaffolding the structure of organic chemistry students’ multivariate comparative mechanistic reasoning, Int. J. Phys. Chem. Educ., 11, 31–43.
Caspari I., Kranz D. and Graulich N., (2018), Resolving the complexity of organic chemistry students' reasoning through the lens of a mechanistic framework, Chem. Educ. Res. Pract., 19, 1117–1141.
Chin D. B., Chi M. and Schwartz D. L., (2016), A comparison of two methods of active learning in physics: inventing a general solution versus compare and contrast, Instructional Sci., 44, 177–195.
Cohen J., (1992), A power primer, Psychol. Bull., 112, 155.
Cooper M. M., Stowe R. L., Crandell O. M. and Klymkowsky M. W., (2019), Organic Chemistry, Life, the Universe and Everything (OCLUE): A Transformed Organic Chemistry Curriculum, J. Chem. Educ., 96, 1858–1872.
Crandell O. M., Kouyoumdjian H., Underwood S. M. and Cooper M. M., (2018), Reasoning about reactions in organic chemistry: starting it in general chemistry, J. Chem. Educ., 96, 213–226.
Crandell O. M., Lockhart M. A. and Cooper M. M., (2020), Arrows on the Page Are Not a Good Gauge: Evidence for the Importance of Causal Mechanistic Explanations about Nucleophilic Substitution in Organic Chemistry, J. Chem. Educ., 97, 313–327.
De Ayala R., (2013), The IRT tradition and its applications, The Oxford handbook of quantitative methods, vol. 1, pp. 144–169.
Deng J. M. and Flynn A. B., (2021), Reasoning, granularity, and comparisons in students’ arguments on two organic chemistry items, Chem. Educ. Res. Pract., 22(3), 749–771.
Deng J. M., Streja N. and Flynn A. B., (2021), Response process validity evidence in chemistry education research, J. Chem. Educ., 98, 3656–3666.
Deutsche Forschungsgemeinschaft, (2022), Guidelines for Safeguarding Good Research Practice, Code of Conduct.
Dwivedi A. K., Mallawaarachchi I. and Alvarado L. A., (2017), Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method, Stat. Med., 36, 2187–2205.
Eckhard J., Rodemer M., Langner A., Bernholt S. and Graulich N., (2021), Let's frame it differently – analysis of instructors’ mechanistic explanations, Chem. Educ. Res. Pract., 23(1), 78–99.
Elby A. and Hammer D., (2010), Epistemological resources and framing: A cognitive framework for helping teachers interpret and respond to their students’ epistemologies, Personal epistemology in the classroom: Theory, research, and implications for practice, vol. 4, pp. 409–434.
European Union, (2016), Regulation (EU) 2016/679 (General Data Protection Regulation), Official Journal of the European Union, OJ L 119, 04.05.2016; cor. OJ L 127, 23.5.2018.
Field A., Miles J. and Field Z., (2012), Discovering statistics using R, London: Sage.
Flynn A. B., (2021), Scaffolding Synthesis Skills in Organic Chemistry, Problems and Problem Solving in Chemistry Education: Analysing Data, Looking for Patterns and Making Deductions.
Fox J. and Weisberg S., (2019), An R Companion to Applied Regression, Thousand Oaks, CA, USA: Sage.
Graulich N. and Caspari I., (2021), Designing a scaffold for mechanistic reasoning in organic chemistry, Chem. Teach. Int., 3, 19–30.
Graulich N. and Schween M., (2018), Concept-Oriented Task Design: Making Purposeful Case Comparisons in Organic Chemistry, J. Chem. Educ., 95, 376–383.
Graulich N., Langner A., Vo K. and Yuriev E., (2021), Scaffolding Metacognition and Resource Activation During Problem Solving: A Continuum Perspective, Problems and Problem Solving in Chemistry Education: Analysing Data, Looking for Patterns and Making Deductions.
Groppe D. M., Urbach T. P. and Kutas M., (2011), Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review, Psychophysiology, 48, 1711–1725.
Grosjean P. and Ibanez F., (2018), pastecs: Package for Analysis of Space-Time Ecological Series, Software Library, (1.3.21).
Grove N. P., Cooper M. M. and Cox E. L., (2012), Does Mechanistic Thinking Improve Student Success in Organic Chemistry? J. Chem. Educ., 89, 850–853.
Gupte T., Watts F. M., Schmidt-McCormack J. A., Zaimi I., Gere A. R. and Shultz G. V., (2021), Students’ meaningful learning experiences from participating in organic chemistry writing-to-learn activities, Chem. Educ. Res. Pract., 22, 396–414.
Hammer D. and Elby A., (2000), Epistemological resources.
Hammer D., Elby A., Scherr R. E. and Redish E. F., (2005), Resources, framing, and transfer, Transfer of learning from a modern multidisciplinary perspective, vol. 89.
Homer B. D. and Plass J. L., (2010), Expertise reversal for iconic representations in science visualizations, Instructional Sci., 38, 259–276.
Hoyle R. H., (1999), Statistical strategies for small sample research, Thousand Oaks, Calif.: Sage.
Kalyuga S., (2007), Expertise Reversal Effect and Its Implications for Learner-Tailored Instruction, Educ. Psychol. Rev., 19, 509–539.
Keiner L. and Graulich N., (2021), Beyond the beaker: students’ use of a scaffold to connect observations with the particle level in the organic chemistry laboratory, Chem. Educ. Res. Pract., 22, 146–163.
Kraft A., Strickland A. M. and Bhattacharyya G., (2010), Reasonable reasoning: multi-variate problem-solving in organic chemistry, Chem. Educ. Res. Pract., 11, 281–292.
Lajoie S. P., (2005), Extending the Scaffolding Metaphor, Instructional Sci., 33, 541–557.
Lieber L. and Graulich N., (2020), Thinking in Alternatives—A Task Design for Challenging Students’ Problem-Solving Approaches in Organic Chemistry, J. Chem. Educ., 97, 3731–3738.
Lieber L. and Graulich N., (2022), Investigating students’ argumentation when judging the plausibility of alternative reaction pathways in organic chemistry, Chem. Educ. Res. Pract., 23, 38–54.
Lin T.-C., Hsu Y.-S., Lin S.-S., Changlai M.-L., Yang K.-Y. and Lai T.-L., (2012), A review of empirical evidence on scaffolding for science education, Int. J. Sci. Math. Educ., 10, 437–455.
Machamer P., Darden L. and Craver C. F., (2000), Thinking about mechanisms, Philos. Sci., 67, 1–25.
Maeyer J. and Talanquer V., (2010), The role of intuitive heuristics in students' thinking: Ranking chemical substances, Sci. Educ., 94, 963–984.
McNeill K. L., Lizotte D. J., Krajcik J. and Marx R. W., (2006), Supporting students' construction of scientific explanations by fading scaffolds in instructional materials, J. Learn. Sci., 15, 153–191.
Noyes K. and Cooper M. M., (2019), Investigating Student Understanding of London Dispersion Forces: A Longitudinal Study, J. Chem. Educ., 96, 1821–1832.
Nückles M., Hübner S., Dümer S. and Renkl A., (2010), Expertise reversal effects in writing-to-learn, Instructional Sci., 38, 237–258.
Oksa A., Kalyuga S. and Chandler P., (2010), Expertise reversal effect in using explanatory notes for readers of Shakespearean text, Instructional Sci., 38, 217–236.
Patil I., (2021), Visualizations with statistical detail: The ‘ggstatsplot’ approach, J. Open Source Softw., 6, 3167.
Petritis S. J., Kelley C. and Talanquer V., (2021), Exploring the impact of the framing of a laboratory experiment on the nature of student argumentation, Chem. Educ. Res. Pract., 22, 105–121.
Puntambekar S. and Hubscher R., (2005), Tools for scaffolding students in a complex learning environment: What have we gained and what have we missed? Educ. Psychol., 40, 1–12.
R Core Team, (2021), R: A language and environment for statistical computing, Computer Program, (4.1.2).
Revelle W., (2021), psych: Procedures for Personality and Psychological Research, Software Library, (2.1.9).
Rittle-Johnson B. and Star J. R., (2007), Does comparing solution methods facilitate conceptual and procedural knowledge? An experimental study on learning to solve equations, J. Educ. Psychol., 99, 561–574.
Rittle-Johnson B. and Star J. R., (2009), Compared with what? The effects of different comparisons on conceptual knowledge and procedural flexibility for equation solving, J. Educ. Psychol., 101, 529–544.
Rodemer M., Eckhard J., Graulich N. and Bernholt S., (2020), Decoding Case Comparisons in Organic Chemistry: Eye-Tracking Students’ Visual Behavior, J. Chem. Educ., 97, 3530–3539.
RStudio Team, (2022), RStudio: Integrated Development Environment for R, Computer Program, (2021.9.2.382).
Russ R. S., Coffey J. E., Hammer D. and Hutchison P., (2009), Making Classroom Assessment More Accountable to Scientific Reasoning: A Case for Attending to Mechanistic Thinking, Sci. Educ., 93, 875–891.
Russ R. S., Scherr R. E., Hammer D. and Mikeska J., (2008), Recognizing mechanistic reasoning in student scientific inquiry: A framework for discourse analysis developed from philosophy of science, Sci. Educ., 92, 499–525.
Salden R. J., Aleven V., Schwonke R. and Renkl A., (2010), The expertise reversal effect and worked examples in tutored problem solving, Instructional Sci., 38, 289–307.
Schmidt-McCormack J. A., Judge J. A., Spahr K., Yang E., Pugh R., Karlin A., Sattar A., Thompson B. C., Gere A. R. and Shultz G. V., (2019), Analysis of the role of a writing-to-learn assignment in student understanding of organic acid–base concepts, Chem. Educ. Res. Pract., 20, 383–398.
Serif (Europe), (2022), Affinity Designer, Computer Program, (1.10.4).
Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15, 10–23.
Shemwell J. T., Chase C. C. and Schwartz D. L., (2015), Seeking the General Explanation: A Test of Inductive Activities for Learning and Transfer, J. Res. Sci. Teach., 52, 58–83.
Tabery J. G., (2004), Synthesizing activities and interactions in the concept of a mechanism, Philos. Sci., 71, 1–15.
Talanquer V., (2014), Chemistry Education: Ten Heuristics To Tame, J. Chem. Educ., 91, 1091–1097.
Underwood S. M., Posey L. A., Herrington D. G., Carmel J. H. and Cooper M. M., (2018), Adapting Assessment Tasks To Support Three-Dimensional Learning, J. Chem. Educ., 95, 207–217.
Valero Haro A., Noroozi O., Biemans H. and Mulder M., (2019), First- and second-order scaffolding of argumentation competence and domain-specific knowledge acquisition: a systematic review, Technol. Pedagogy Educ., 28, 329–345.
van de Pol J., Volman M. and Beishuizen J., (2010), Scaffolding in Teacher–Student Interaction: A Decade of Research, Educ. Psychol. Rev., 22, 271–296.
Van Der Stuyf R. R., (2002), Scaffolding as a teaching strategy, Adolescent learning and development, vol. 52, pp. 5–18.
VERBI Software, (2019), MAXQDA 2020, Computer Program, (20.4.0).
Watts F. M., Schmidt-McCormack J. A., Wilhelm C. A., Karlin A., Sattar A., Thompson B. C., Gere A. R. and Shultz G. V., (2020), What students write about when students write about mechanisms: analysis of features present in students’ written descriptions of an organic reaction mechanism, Chem. Educ. Res. Pract., 21, 1148–1172.
Watts F. M., Zaimi I., Kranz D., Graulich N. and Shultz G. V., (2021), Investigating students’ reasoning over time for case comparisons of acyl transfer reaction mechanisms, Chem. Educ. Res. Pract., 22, 364–381.
Weinrich M. L. and Talanquer V., (2016), Mapping students' modes of reasoning when thinking about chemical reactions used to make a desired product, Chem. Educ. Res. Pract., 17, 394–406.
Westfall R. S., (1977), The construction of modern science: Mechanisms and mechanics, Cambridge: Cambridge University Press.
Wickham H., (2007), Reshaping Data with the reshape Package, J. Stat. Softw., 21, 1–20.
Wickham H., (2016), ggplot2: Elegant Graphics for Data Analysis, New York: Springer-Verlag.
Wickham H., Averick M., Bryan J., Chang W., McGowan L. D. A., François R., Grolemund G., Hayes A., Henry L. and Hester J., (2019), Welcome to the Tidyverse, J. Open Source Softw., 4, 1686.
Wood D., Bruner J. S. and Ross G., (1976), The role of tutoring in problem solving, J. Child Psychol. Psychiatry, 17, 89–100.
Yuriev E., Naidu S., Schembri L. S. and Short J. L., (2017), Scaffolding the development of problem-solving skills in chemistry: guiding novice students out of dead ends and false starts, Chem. Educ. Res. Pract., 18, 486–504.

Footnote

† In Germany, it is common to speak of inductive effects (I effects) for electron-pushing (+I effect) and electron-withdrawing (−I effect) effects along σ bonds, and of mesomeric effects (M effects) for resonance. The term M effect is hence used synonymously for the ability of an atom or group of atoms to push π- and n-electrons to adjacent atoms or groups of atoms (+M effect) or to pull π- and n-electrons towards itself (−M effect).
