David Kranz a, Michael Schween b and Nicole Graulich *a
a Justus-Liebig-University, Giessen, Institute of Chemistry Education, Heinrich-Buff-Ring 17, 35392 Giessen, Germany. E-mail: Nicole.Graulich@didaktik.chemie.uni-giessen.de
b Philips-University, Marburg, Faculty of Chemistry, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany
First published on 2nd November 2022
Reaction mechanisms are a core component of organic chemistry. Being able to handle these mechanisms is a central skill for students in this discipline. Diagnosing and fostering mechanistic reasoning is hence an important branch of chemistry education research. When it comes to reasoning about mechanisms, students often experience difficulties because they either lack conceptual understanding, cannot make appropriate inferences, or struggle to link representations to chemical concepts. Instructional tools to bridge this gap are thus required in organic chemistry education. Recently, scaffolds that support students in connecting properties to reaction pathways and representations to chemical concepts have been documented as helpful in fostering students’ reasoning. Although the advantage of scaffolds has been reported, the question arises of how students work with scaffolds and whether scaffolds can influence students’ scores in a conceptual knowledge test. In this study, we explored in a pre–post mixed methods approach how students recruited from an organic chemistry course work with a written scaffold. We correlated the level of causal complexity and multivariateness expressed in their scaffolds with their scores in a conceptual knowledge pretest and posttest. The task consisted of scaffolded contrasting cases of two addition reaction steps of a nucleophile on a carbonyl carbon. The paper-pencil test used as pretest and posttest covered the respective conceptual knowledge. We qualitatively identified patterns along the dimensions of causal complexity and multivariateness in students’ written responses in the scaffold and looked for relationships between students’ scores in the pre- and posttest and these two dimensions. We found five different patterns in students’ responses and were able to show that the score they achieved in the pretest influenced how effectively students were supported by the scaffold.
Thus, this exploratory study provides encouraging implications and insights into the use of scaffolds.
There is evidence that scaffolding and prompting help students increase their mechanistic reasoning abilities (Caspari and Graulich, 2019; Noyes and Cooper, 2019; Crandell et al., 2020; Watts et al., 2021) and may lead to higher success in contexts where transfer is necessary (Grove et al., 2012; Caspari et al., 2018).
However, it still remains unclear how students are making use of scaffolds when working individually and whether the use of a scaffold for mechanistic case comparisons is equally supportive for students in a heterogeneous group. To determine both, it is necessary to look, on the one hand, at how students work with the scaffold itself (e.g., how they use a scaffold to collect arguments) and, on the other hand, at the potential gain in conceptual understanding that results from using a scaffold.
In this study, we used a pre–post mixed methods approach to explore how organic chemistry students worked with a written scaffold (i.e., their levels in the dimensions of causal complexity and multivariateness), and if a gain in conceptual knowledge is related to these dimensions as well as to students’ score in a conceptual knowledge test prior (i.e., low- vs. high-score students) to the scaffolded problem-solving setting.
In their learning progression, Sevian and Talanquer (2014) for example distinguished students’ responses in terms of modes of reasoning: descriptive, relational, linear causal, and multicomponent. An argument is descriptive if it remains superficial and only includes explicit properties. An argument is relational if implicit properties are included but their effects are not justified. Linear causal arguments recognize cause-effect relationships but explain phenomena unicausally. Multicomponent means that several causes for phenomena are considered and weighed against each other. This framework has already been successfully applied (Weinrich and Talanquer, 2016). While Deng and Flynn (2021) use similar criteria regarding modes of reasoning, they additionally include the granularity of the involved concepts and examine how they are connected. Caspari et al. (2018) distinguish between different levels of complexity when students are building relations between structural differences and changes in a reaction. Student statements are classified as low complexity if their reasoning relates directly from an explicit feature to a change. To be assigned to middle complexity, a student's explanation needs to contain additionally an implicit property derived from an explicit feature. For a student's statement to be categorized as high complexity, the student must first derive an implicit property from an explicit feature. Subsequently, it needs to be recognized that this implicit property has an electronic effect that ultimately affects the change. The number of knowledge elements used thus increased with increasing complexity. Crandell et al. (2018) make further distinctions regarding causality and the mechanistic nature of an argument. In addition to the division into no answer and non-normative, the answers were divided into descriptive general (what), descriptive causal (what and why), descriptive mechanistic (what and how), and causal mechanistic (what, why, and how). 
As can be derived from the corresponding questions, descriptive general means that only a simple description is given, and descriptive causal that a causal explanation is included. For an answer to be descriptive mechanistic, a mechanistic aspect (for example, electron movement) must be included. Causal mechanistic means that the students include both causal and mechanistic elements in their reasoning.
Two central dimensions can be derived from these frameworks. One is the characterization of causal complexity, which considers the elaborateness or depth of students’ mechanistic reasoning. Causal complexity involves the depth to which learners can link identified components of a mechanism to build causal arguments. The second dimension reflected in two of the four frameworks (i.e., depending on the task used) is multivariateness, which refers to considering multiple influential factors or variables governing an organic reaction. For this exploratory study, these two dimensions are applied as a lens to characterize students’ use of a scaffold.
Scaffolding as a label for explicit guidance can take different forms. Yuriev et al. (2017), for example, developed a scaffold for guiding students through solving physical chemistry problems. Their scaffold helped students reflect metacognitively on their problem-solving strategies while engaged in the problem-solving process. Flynn (2021) presented some question types that were able to scaffold students’ synthesis skill-building and could identify which strategies have been particularly successful. Scaffolding can also help to slow down the reasoning process, for example, allowing students to reflect in-depth on the concepts used (Caspari and Graulich, 2019; Graulich et al., 2021). Scaffolding can thus support students in using or activating more of their resources when solving mechanistic problems, which they could overlook when not scaffolded (Maeyer and Talanquer, 2010; Talanquer, 2014).
The ultimate aim of scaffolding is to slowly withdraw or fade out the guidance when learners become more and more proficient in solving the respective task (Puntambekar and Hubscher, 2005; McNeill et al., 2006; Lin et al., 2012). The amount of support provided by the scaffold is then gradually reduced to allow learners to apply the learned problem-solving steps independently to solve a task without further scaffolding. Deciding at which point in the learning process one should begin fading the scaffold requires an in-depth understanding of how students are working with the scaffold.
A scaffold designed by Caspari and Graulich (2019) aimed at guiding students in solving mechanistic case comparison tasks. Students were prompted to identify the explicit structural differences between the structures of the compared mechanistic steps (e.g., different-sized hydrocarbon residues at the reaction center), to identify all property changes that take place from the reactants to the products (e.g., the formation of a positive charge in the elimination step of an SN1 reaction) and finally to explain the influence of the differences on the property changes.
One finding drawn from this study by Caspari and Graulich (2019) is that the number of detected influential factors (i.e., multivariateness) increases significantly with a high effect size (Cohen's d = 1.272) when comparing students’ problem-solving processes with and without the scaffold. Students found more relations when guided by the scaffold than when solving the tasks intuitively. Watts et al. (2021) also found that scaffolds support students in taking more resources into account than when working on a task without scaffolding. Other studies have shown that targeted prompting increases the depth of arguments (Noyes and Cooper, 2019; Crandell et al., 2020).
On the one hand, one can assume that for a student who reaches high scores in a prior conceptual knowledge test and, thus, activates multiple knowledge elements, using a scaffold that prompts to explicitly activate knowledge elements can be redundant or distracting (Kalyuga, 2007; Oksa et al., 2010). Such an expertise reversal effect has already been documented in several studies other than chemistry with participants with differing scores in a prior-knowledge test (Homer and Plass, 2010; Nückles et al., 2010; Salden et al., 2010). On the other hand, for such a student with a high conceptual understanding, further linking concepts through scaffolded guidance might be productive, if knowledge elements can be activated easily.
If a learner has a fragmented conceptual understanding (e.g., struggles to activate the necessary knowledge elements required to solve a task independently), a scaffold can build on it (Van Der Stuyf, 2002; Lajoie, 2005), supporting students to activate, organize, and integrate resources (Hammer et al., 2005). These assumptions are supported by the principle of contingency (i.e., adaptation to students’ conceptual understanding), which is a basic characteristic of scaffolding (van de Pol et al., 2010). Accordingly, the support for learners should be provided either at the current level of understanding or slightly higher. Conversely, if the learner shows less conceptual understanding, then important resources to adequately solve a given task may simply be missing and may, thus, not be activated by the scaffold. Since a scaffold is not per se designed to provide information but rather to structure the problem-solving process, the learner may not be able to profit from the scaffold's instructions and activate the knowledge resources necessary.
To aim at a meaningful use of scaffolds in the classroom, one should investigate to what extent students are activating possible knowledge resources to form causal or multivariate arguments while working with a scaffold. Additionally, one should investigate whether there are measurable indications that students’ gain in conceptual knowledge score after working with the scaffold is related to their prior knowledge (i.e., low- vs. high-score group) and/or affected by the quality of the work with the scaffold.
Investigating scaffolding through the lens of this framework frames conceptual understanding not as a stable construct but as a sum of simultaneously activated resources from a network of knowledge elements that are activated in response to tasks or prompts. A scaffold, thus, may support the activation of different resources while solving problems compared to a non-guided setting and may strengthen the simultaneous activation of resources in new contexts. In our study, we qualitatively capture students’ resources in the scaffold by the two dimensions causal complexity and multivariateness, which require a different combination of knowledge elements. Furthermore, conceptual knowledge tests can be considered as a frame to activate students’ resources regarding their conceptual understanding which leads to the assumption that conceptual knowledge scores are merely an indicator of resource activation for the same prompts and not a reflection of the entire conceptual or prior knowledge of the learners. Working with a scaffolded contrasting cases task between two conceptual knowledge tests might thus lead to an increase in students’ test scores from pre- to posttest.
(1) What kind of patterns (in terms of causal complexity and multivariateness) can be observed in students’ written answers in the scaffold?
(2) To what extent does students’ gain in their conceptual knowledge score, after working with the scaffold, relate to their conceptual knowledge score before working with a scaffolded contrasting cases task (i.e., low- vs. high-score group) and to their level of causal complexity and multivariateness shown in the scaffold?
These research questions were investigated in a mixed-methods exploratory study with chemistry undergraduate students using a conceptual knowledge paper-pencil test and a scaffolded contrasting case task.
To examine how students are working with a scaffolded task, we first qualitatively analyzed students’ written solutions in the dimensions of causal complexity and multivariateness, and, second, analyzed exploratively the interplay between the qualitative work with the scaffold and students’ score in a conceptual knowledge test.
Since meta-analyses (Belland et al., 2017a; Belland et al., 2017b), as well as a systematic review (Valero Haro et al., 2019), suggest that the effects of scaffolding on students’ conceptual understanding can be assessed in a pre–post design (showing significant effects), this study was conducted in a pre–post design as well. As we were aware of low test power as one of the main disadvantages of small sample research, we followed some strategies (e.g., reporting effect sizes of non-significant test results) recommended by Hoyle (1999). We also used the less strict false discovery rate correction (FDR-correction) instead of family-wise error rate correction (FWER-correction) to minimize the type II error probability, which is also a meaningful choice for exploratory research (Benjamini and Hochberg, 1995; Groppe et al., 2011).
The students were informed both orally and in the form of a consent form about their rights regarding their data and how the data is processed. The privacy policy was based on the regulations of the General Data Protection Regulation (GDPR) by the European Union (2016), which represents an EU-wide uniform data protection regulation. They were also informed that their anonymized writing and data would be analyzed and discussed by members of the research group and be used for publications. Institutional Review Board approval is not required at German universities; nevertheless, the data collection followed ethical guidelines (Deutsche Forschungsgemeinschaft, 2022) and allowed each participant to drop out at any point. Informed consent was obtained from all participants. Only the authors had access to the data; digital data were stored on a local hard drive, and written tests were stored in an internal archive. Data of participants who opted out would have been deleted (i.e., files would have been deleted and written tests would have been destroyed). However, no participant made use of this possibility.
The second part was a scaffolded contrasting cases task, which was used for the instructional setting.
The contrasting case used asks students to compare the rate of two addition steps of a nucleophile to carboxylic acid derivatives (Fig. 2) by using a scaffold grid (Fig. 3), which is a modified version of a previously used scaffold design (Caspari and Graulich, 2019; Graulich and Caspari, 2021). It has already been shown in qualitative studies that this scaffold grid led to students being able to identify more implicit properties (Caspari and Graulich, 2019) and that they weighed their arguments after using the scaffold in contrast to before (Watts et al., 2021). We adapted the scaffold insofar as we considered the explicit structural difference and the explicit structural difference's property in two different cells to make it unambiguous for students what is expected of them. Additionally, we analyzed the answers and videos of students who participated in a pilot study and adjusted the scaffold grid accordingly to ensure response process validity (Deng et al., 2021). Furthermore, we discussed the scaffold and task with organic chemistry experts to ensure the accuracy of the content. In reaction A, the hydroxide anion reacts with an acid chloride, and in reaction B with an ester. The activation energy is lower in reaction A because the electron density of the electrophilic center is lower compared to B. The chlorine atom attracts electrons along the σ-bond (strong negative inductive effect) and pushes electrons only weakly along the π-bond (weak positive mesomeric effect). In contrast, the methoxy group has a strong electron-withdrawing effect along the σ-bond (strong negative inductive effect) and a strong electron-pushing effect along the π-bond (strong positive mesomeric effect).
Due to the lower electron density, the carbonyl-carbon atom is more strongly partially positively charged in reaction A, which means that the electrostatic interactions between the negatively charged hydroxide anion and the partially positively charged carbonyl-carbon atom are stronger than in B and the bond can be formed more easily. Since these effects also prevail in the transition state, it is at a lower energetic level, the activation energy is, therefore, lower, and the reaction takes place faster. Students would have to activate resources regarding the explicit structural differences including their properties and their relation to at least one property change to answer this task correctly.
To solve this written task, students were asked to fill out the scaffold grid, as illustrated in Fig. 3. A possible solution for this task based on the scaffold grid can be found in Appendix 3. The students did not receive further help regarding the resources required to solve the case; they only received the prompts included in the scaffold. In the context of the framing of our study, we understood students’ answers in each cell as an activation of resources while solving the written task.
For later processing, students generated codes for anonymization (8-character strings created with information only the students themselves can know). For the presentation of the data, students were randomly given gender-specific pseudonyms so as to preserve their self-assigned gender identity.
The data of the paper-pencil test (pre- and posttest) as well as students’ written answers in the scaffold were transcribed for later processing.
Merged codes | Initial codes | Description of code | Student example |
---|---|---|---|
Incomplete | Non-existing/no effect of property on change or reaction center | The cell effect of a property of an explicit structural difference on a change/reaction center is either empty; there is no effect of a property on a change/reaction center described; no explicit structural difference is mentioned; or the property of an explicit structural difference or the change cannot be unambiguously assigned. | “the reaction is faster” (effect of property on change/reaction center); no entries in other cells for this relation |
Non-causal | Explicit | The effect of a property of an explicit structural difference on a change/reaction center is explained explicitly or the property of an explicit structural difference is only described explicitly; an explicit structural difference is mentioned; the property of an explicit structural difference and the change can be unambiguously assigned. | “the methoxy group only slightly polarizes the carbonyl carbon” (effect of property on change/reaction center); “methyl group” (explicit structural difference); “big substituent” (property of an explicit structural difference); “nucleophilic attack” (property change) |
Non-causal | Non-electronic | The explanation about the effect of a property of an explicit structural difference on a change/reaction center is made on an implicit level, while at least one part of the chain of arguments is non-electronic or an electronic property/effect is not described electronically; an explicit structural difference is mentioned; the property of an explicit structural difference and the change can be unambiguously assigned. | “mesomeric stabilization of the negative charge → greater energetic lowering of the p-orbital → effects already active in transition state → lowering of activation energy” (effect of property on change/reaction center); “Cl” (explicit structural difference); “greater +M effect” (property of an explicit structural difference); “formation of a negative charge” (property change) |
Non-causal | Non-causal electronic | All conditions but the presence of non-electronic arguments in the previous paragraph are met. All parts of the chain of arguments are described electronically. The chain of arguments is lacking at least one part or connection that is needed to formulate a completely causal explanation. If a property has already been described electronically, which is then mentioned in the cell effect of a property of an explicit structural difference on a change/reaction center, then the property is also treated as electronically described in this cell. | “by the −I effect, Cl attracts the electrons to itself, thus enabling a nucleophilic attack, whereby a negative charge enters the system” (effect of property on change/reaction center); “Cl” (explicit structural difference); “electron-withdrawing effect” (property of an explicit structural difference); “formation of a negative charge (full p-orbital)” (property change) |
Causal electronic | Causal electronic | All conditions of the previous paragraph are met except the chain of arguments is completely causal. | “pulls electrons from [carbonyl] carbon, [carbonyl] carbon ∂+ -> accelerates” (effect of property on change/reaction center); “Chloride” (explicit structural difference); “[is] strongly electronegative, draws electrons (−I)” (property of an explicit structural difference); “formation of a negative charge on the oxygen” (property change) |
The coding system was adjusted through continuous discussion among the authors until agreement was reached. Patterns were then derived from the data in the dimensions of causal complexity and multivariateness. In the second, exploratory quantitative part of our analysis, connections between students’ scores in the conceptual knowledge test before the intervention (i.e., low- vs. high-score group) as well as the dimensions of causal complexity and multivariateness with students’ gain in their conceptual knowledge score were analyzed. The software MAXQDA (VERBI Software, 2019) was used for the qualitative analysis, and RStudio (RStudio Team, 2022) with the programming language R (R Core Team, 2021) for the quantitative analysis. For graphical plotting, the R libraries ggplot2 (Wickham, 2016) and ggstatsplot (Patil, 2021) were used, with the software Affinity Designer (Serif (Europe), 2022) for post-editing.
Students could consider either none, one, or two different property changes in a relation (i.e., none, one, or both rows of the framed cells in Fig. 4(a)). If a student formed only one relation or recognized only one property change, the argumentation is considered univariate. If instead, two different property changes were taken into account, the argumentation is considered multivariate.
The initial coding system (Table 1) distinguished five codes, but by comparing the coded transcripts (Fig. 4, Section (a)) with each other, we noticed similarities in terms of causal complexity. The dimension of causal complexity reflects the degree of activation and integration of resources to answer the last subtask (d in Fig. 3). In the column description of code (Table 1), we outlined which key elements and linkages students’ answers had to contain to be categorized in a certain level of causal complexity.
The codes have been merged as shown in Fig. 4, Section b, from five initial coding categories for students’ relations to three merged codes for the causal complexity dimension. In this way, it was possible to minimize ambiguities in students’ written answers and to find more accurate labels to capture students’ use and integration of resources. If a student has not established any complete relation from an explicit structural difference to a property change, then the merged code incomplete (i.e., no resource activation or fragmented resource activation without integration) has been assigned for the relations on that property change (i.e., the row of relations regarding this property change, see Fig. 4, Section (b)). As soon as one of the relations got a code between explicit and non-causal electronic, this was merged into the code non-causal (i.e., activation of resources and partial integration) for the relation.
The initial code explicit was assigned, whenever a student remained on an explicit level while describing the effect of a property of an explicit structural difference on a property change/the reaction center. Mary (Appendix 4), for example, mentioned an electronic property (i.e., electron pushing) for the explicit structural difference (i.e., halogen), but did not relate this property to the effect on the property change/the reaction center (i.e., steric hindrance on the tetrahedron). Hence, she compared the molecules on an explicit level to explain the effect.
We used the initial code non-electronic for relations, which contained implicit properties and described their effect on a property change/the reaction center without explaining them electronically. Emma (Appendix 4) for example used an implicit property (i.e., mesomeric effect by ester) and related it to the property change (i.e., stabilization of the negative charge via mesomeric effect), but did not explain the property on an electronic level (e.g., the electrons can be shifted along the π-bond).
Relations were coded as non-causal electronic in the initial coding, whenever a student related a property that was explained electronically on the property change/the reaction center, while the argumentation was not completely causal (i.e., a part of a causal chain was missing). Alexander (Fig. 8) for example related an electronically described property (i.e., can transfer electron density by mesomeric effects) on the reaction center (i.e., can stabilize the emerging carbocation well via +M-effect, +M-effect means positive mesomeric effect which is not applicable in this context since there are no π-bonds in the products), but did not explain, why the mesomeric effect stabilizes the carbocation (he used the word carbocation here to describe the negatively charged product). Hence, the causal chain was incomplete.
As soon as at least one relation (from an explicit structural difference to a property change/the reaction center) could be categorized as causal electronic, the code causal electronic (i.e., activation and integration of resources) was assigned as the overall code for the relations on this property change. The initial code causal electronic was assigned whenever a student built a relation from a property of an explicit structural difference to a property change or the reaction center while describing the property electronically and arguing causally without leaving a gap in the causal chain. Clara for example (Fig. 5) described the property electronically (i.e., strong electron-withdrawal effect) and related it causally to the reaction center (i.e., due to the −I effect, electron density is subtracted, which is why the partial positive charge is amplified and the attack takes place more quickly).
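The merging rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure (the coding was done qualitatively in MAXQDA); the function name `merged_code` and the shorthand label `"no effect"` for the first initial code are hypothetical, and the sketch assumes that the highest merged code among all relations on one property change becomes the overall code.

```python
# Merged codes ordered from lowest to highest causal complexity (Table 1).
MERGED_ORDER = ["incomplete", "non-causal", "causal electronic"]

# Initial code -> merged code, following Table 1.
# "no effect" is a shorthand (assumption) for the initial code
# "non-existing/no effect of property on change or reaction center".
INITIAL_TO_MERGED = {
    "no effect": "incomplete",
    "explicit": "non-causal",
    "non-electronic": "non-causal",
    "non-causal electronic": "non-causal",
    "causal electronic": "causal electronic",
}


def merged_code(initial_codes):
    """Return the overall merged code for all relations on one property change.

    Assumption: the highest merged code among the relations wins, and an
    empty list of relations counts as "incomplete".
    """
    merged = [INITIAL_TO_MERGED[code] for code in initial_codes]
    if not merged:
        return "incomplete"
    return max(merged, key=MERGED_ORDER.index)
```

For example, `merged_code(["explicit", "causal electronic"])` yields `"causal electronic"`, because a single causal electronic relation is sufficient for the overall code.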
In addition, we coded whether students reasoned in a uni- or multivariate way (i.e., considered none, one, or two different property changes in at least one complete relation respectively). To get the highest code in both dimensions (i.e., causal complexity and multivariateness), a student had to form at least one relation to two different property changes on the level of causal electronic. Thus, a pattern derived from these codes would fill the respective quadrants in the dimension of causal complexity as well as multivariateness. Based on this coding, patterns for students’ written answers were formed by filling out (or not filling out) the quadrants. The different pattern types exist alongside each other and are not hierarchically organized.
Daniel’s written response in Fig. 6 illustrates the procedure. In his answer in the cell effect of property on change/reaction center, for example, it is not apparent that the property (i.e., electron withdrawing → electronegative) of an explicit structural difference (i.e., Cl) is linked to the property change (i.e., formation of a negative charge) or the reaction center (i.e., the carbonyl carbon). Daniel, therefore, got the code no effect of property on change/reaction center for his explanation. This answer resulted in the code incomplete in the causal complexity dimension. Since this code was assigned for both possibly considered property changes respectively, Daniel neither reached an appropriate level of causal complexity nor multivariateness. The pattern was left blank for his scaffold grid (Fig. 6).
Clara (Fig. 5), as mentioned above, got the merged code causal electronic for the relations to one of the two property changes. She got the code incomplete for the relations to the other property change. Thus, her pattern is filled in the causal complexity but not in the multivariateness dimension (more examples in Appendix 4).
In the dimension of causal complexity, three groups were formed as well (according to the highest merged code): (1) none (i.e., highest merged code incomplete), (2) low causal complexity (i.e., highest merged code non-causal), and (3) high causal complexity (i.e., highest merged code causal electronic).
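As a rough illustration of how the two grouping variables follow from the merged codes, the sketch below classifies one student from the merged code assigned to each property change. The function name `classify_student` and the group labels for multivariateness (`"none"`, `"univariate"`, `"multivariate"`) are assumptions made for this example; the causal complexity groups follow the three groups listed above.

```python
MERGED_ORDER = ["incomplete", "non-causal", "causal electronic"]

COMPLEXITY_GROUP = {
    "incomplete": "none",
    "non-causal": "low",
    "causal electronic": "high",
}


def classify_student(merged_codes):
    """Classify a student from the merged code of each property change.

    merged_codes: one merged code per property change (at most two in this
    task). Returns (causal complexity group, multivariateness group).
    """
    # Causal complexity: determined by the highest merged code.
    highest = max(merged_codes, key=MERGED_ORDER.index, default="incomplete")
    complexity = COMPLEXITY_GROUP[highest]

    # Multivariateness: number of property changes with at least one
    # complete relation, i.e., a merged code other than "incomplete".
    considered = sum(code != "incomplete" for code in merged_codes)
    multivariateness = ("none", "univariate", "multivariate")[min(considered, 2)]
    return complexity, multivariateness
```

Applied to the two examples discussed above, Clara's codes (`["causal electronic", "incomplete"]`) give high causal complexity with a univariate argumentation, whereas Daniel's codes (`["incomplete", "incomplete"]`) give none in both dimensions.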
To test the independence of these grouping variables from each other, χ² tests (with Monte Carlo simulations due to our small sample sizes) were conducted.
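The idea behind a simulated χ² test can be sketched as follows. The analysis in this study was run in R; the Python code below is an assumption-laden illustration rather than the implementation used. It approximates the null distribution by shuffling one grouping variable, which keeps both margins of the contingency table fixed. The function names, the number of simulations, and the seed are hypothetical.

```python
import random
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical variables."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def monte_carlo_chi_square(xs, ys, n_sim=2000, seed=0):
    """Return the observed statistic and a simulated p-value.

    The p-value is the share of shuffled data sets whose statistic is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = chi_square(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

A simulated p-value avoids relying on the asymptotic χ² distribution, which is unreliable when expected cell counts are small.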
For the differences between the groups of the three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups) at the time of the pretest and posttest, the hypothesis to be tested for the pretest and posttest was the following for each of the three grouping variables respectively:
There is a significant difference in terms of conceptual knowledge scores at the time of the pretest and posttest between the different groups. (H1A)
For the score difference between pretest and posttest within each group of the three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups), the following hypothesis was tested for each group in each grouping variable respectively:
There is a significant increase in the group's score after working with a scaffolded contrasting cases task compared to before the task. (H1B)
Since this study was exploratory and the sample size was small, FDR control (i.e., Benjamini-Hochberg correction) was used instead of FWER control to adjust the p-values for multiple testing. FDR control has more test power and is more likely to discover effects that can be validated in a confirmatory study (Benjamini and Hochberg, 1995; Groppe et al., 2011). The significance criterion was set to p < 0.05 for unadjusted and all FDR-adjusted p-values.
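For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal Python sketch of the p-value adjustment is given here. It is not the code used in this study (the analysis was run in R); the function name `benjamini_hochberg` is hypothetical, and the p-values in the usage note are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With the illustrative input `[0.01, 0.04, 0.03, 0.20]`, only the first test remains below 0.05 after adjustment (adjusted value 0.04), whereas a Bonferroni-type FWER correction would multiply every p-value by four.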
Data frames were transformed for testing with functions of the library reshape2 (Wickham, 2007) and tidyverse (Wickham et al., 2019); descriptive statistics were calculated with pastecs (Grosjean and Frederic, 2018). The tests were conducted with the library psych (Revelle, 2021). Levene's tests were performed with the library car (Fox, Sanford, 2019). Effect sizes were calculated with a custom function based on the calculation of the effect size r (Field et al., 2012), with the limits r ≥ 0.10 for small effects, r ≥ 0.30 for medium effects, and r ≥ 0.50 for large effects (Cohen, 1992). Everything else was analyzed using the basic functions included in R (R Core Team, 2021).
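The authors' custom effect-size function is not shown in the text. A plausible sketch, assuming it follows the common approach of recovering a z-score from the two-sided p-value and computing r = |z| / √N, with N as the number of observations, could look as follows in Python. The function names and the participant count of 18 (inferred from the degrees of freedom in the Levene's tests table) are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_r(p_two_sided, n_observations):
    """r = |z| / sqrt(N), with z recovered from a two-sided p-value."""
    z = NormalDist().inv_cdf(p_two_sided / 2)
    return abs(z) / sqrt(n_observations)


def cohen_label(r):
    """Classify r with the thresholds given in the text (Cohen, 1992)."""
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "none"


# Overall pre-post comparison: unadjusted p = 0.018 (appendix table).
# Assumption: 18 students, hence 36 observations (pretest plus posttest).
r_overall = effect_size_r(0.018, 36)
print(round(r_overall, 2), cohen_label(r_overall))
```

With these inputs the sketch yields r ≈ 0.39, in line with the medium overall effect reported in the results, which suggests (but does not prove) that the assumed formula matches the one used.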
Based on the coding described above, we were able to qualitatively characterize patterns, which could be found in students’ written answers in the scaffold grid along the two dimensions of causal complexity and multivariateness.
Fig. 7 All five identified pattern types are based on students’ answers in the scaffolded contrasting cases task. For design purposes, some patterns are shown side by side.
All students with this pattern included the mesomeric effect but either did not explain it electronically or related it only in a non-causal way to the property change or reaction center. This shows that the scaffold was able to help these students activate resources and partially integrate them. No relation was formed for another property change. One reason why the students did not get beyond the non-causal level and argued univariately could be that the prompt did not fit their needs: it may neither have activated the resources necessary to go beyond a non-causal level nor encouraged them to look for further possible relations. However, it could also be that these students lacked experience with this type of question, which could be resolved by reframing the task (Eckhard et al., 2021) (e.g., by changing the wording of the subtask or by giving more examples).
Overall, we were able to categorize all students into five pattern types along the dimensions of causal complexity and multivariateness. A higher pattern type number is not necessarily equivalent to a higher quality of the answer.
However, a higher pattern number serves as a good indicator to assess how elaborate a student's response is, on the one hand, in terms of the depth of single relations (i.e., activation and integration of resources) and, on the other hand, regarding the number of recognized relations. Considering more than one aspect in the reasoning process has been shown to support a more differentiated justification of the arguments (Watts et al., 2021; Lieber and Graulich, 2022). However, this task could be answered sufficiently with only one argument.
These five qualitative patterns, which emerged from students’ written data, illustrate how students worked with the scaffold to solve the given case comparison and allowed us to easily visualize the elaborateness of the answers in two dimensions.
To exploratively examine how students’ score gain is related to their score group and the levels in the dimensions of causal complexity and multivariateness, we first investigated whether these three grouping variables (i.e., score groups, causal complexity groups, and multivariateness groups) were independent of each other. Subsequently, rank sum analyses were performed to determine whether there were significant differences between the groups in terms of conceptual knowledge score (i.e., both in terms of score gain and the score in pre- and posttest respectively).
Fig. 10 Visual representation of the post hoc Wilcoxon rank-sum tests. Significant differences are represented as black lines with the respective effect size. Non-significant differences are represented as grey lines with the respective effect size. Effect sizes were also calculated for non-significant results as recommended by Hoyle (1999).
Comparison | Pretest M 1st | Pretest M 2nd | Pretest p | Pretest r | Posttest M 1st | Posttest M 2nd | Posttest p | Posttest r |
---|---|---|---|---|---|---|---|---|
Low score vs. high score | 0.341 | 0.598 | 0.012 | 0.79 | 0.476 | 0.646 | 0.102 | 0.58 |
None vs. low causal complexity | 0.354 | 0.500 | 0.557 | 0.27 | 0.341 | 0.561 | 0.557 | 0.27 |
None vs. high causal complexity | 0.354 | 0.354 | 0.854 | 0.08 | 0.341 | 0.537 | 0.176 | 0.46 |
Low vs. high causal complexity | 0.500 | 0.354 | 0.557 | 0.24 | 0.561 | 0.537 | 0.965 | 0.03 |
None vs. univariate | 0.354 | 0.415 | 0.676 | 0.18 | 0.341 | 0.585 | 0.131 | 0.52 |
None vs. multivariate | 0.354 | 0.451 | 0.786 | 0.14 | 0.341 | 0.488 | 0.844 | 0.10 |
Univariate vs. multivariate | 0.415 | 0.451 | >0.999 | 0.00 | 0.585 | 0.488 | 0.557 | 0.24 |
We found similar non-significant results for the difference between the multivariateness groups regarding the conceptual knowledge scores for the pre- (H(2) = 0.822, p = 0.663) and posttest (H(2) = 5.082, p = 0.079). In other words, students who later showed different multivariateness in the scaffold did not differ significantly in their pretest results. Furthermore, different multivariateness did not lead to significantly different results between the groups in the posttest.
Fig. 10(b) and (c) show the results of the post hoc analyses (i.e., Wilcoxon rank-sum tests). All post hoc results are summarized in Table 2.
Some group differences increased from pretest to posttest, especially between the none and low causal complexity, none and high causal complexity, none and univariate, and none and multivariate groups, but not to such a degree that they became significant. Based on these results, we can assume that framing in the form of a scaffold grid led 14 students to activate and/or integrate resources in the scaffolded task itself and to activate more resources in the posttest, but not to the extent that they outperformed students for whom framing did not lead to activation or integration of resources in the scaffolded task.
In general (Fig. 11a), the conceptual knowledge score increased significantly (V = 26, p = 0.036) from pre- (median M = 0.378) to posttest (M = 0.512) with a medium effect of r = 0.39. This result shows that the scaffolded problem-solving led to a general increase in the conceptual knowledge score (i.e., the H1B hypothesis was confirmed).
The non-significant change in score for the high-score students can be interpreted in two ways. On the one hand, high-score students most likely had well-consolidated and linked resources; thus, the scaffold might not have added new connections between resources and therefore did not influence the activation of resources in the posttest. On the other hand, the scaffold might have imposed a certain approach to the task, which may not have matched the students’ own approaches.
Just by looking at the pre–post score gain, one could conclude that the students in the high-score group did not profit from the scaffold, whereas the low-score students did profit, as their scores increased in the posttest.
As the scaffold does not provide conceptual knowledge, but rather prompts (i.e., provides a different framing) students to activate resources and integrate them, an increase in conceptual knowledge in the posttest is likely to be linked to how they could better integrate available resources by working with the scaffold.
The median score even dropped from pre- to posttest. This finding indicates that the scaffold must be used in a meaningful way to profit from it.
Students who built relations with high causal complexity (i.e., high causal complexity group) showed a significant increase (V = 0, p = 0.036) from pre- (M = 0.354) to posttest (M = 0.537), with a large effect size of r = 0.61 (i.e., the H1B hypothesis was confirmed). For the students with low causal complexity, we found an increase in the median score from pre- (M = 0.5) to posttest (M = 0.561), which was not significant (V = 7, p = 0.784) and showed no effect (r = 0.00).
The significant increase in the conceptual-knowledge score for students with high causal complexity compared to the non-significant increase for students with none or low causal complexity indicates that students who built relations with higher causal complexity seem to have benefited from the scaffold. To achieve higher causal complexity, the learner must activate and integrate resources for the given context, leading to more in-depth work with the scaffold prompts.
For multivariateness, there was a non-significant increase for the students who built multivariate relations from pre- (M = 0.451) to posttest (M = 0.488), V = 5, p > 0.999, with no effect (r = 0.00). Students with univariate argumentation, however, showed a significant increase from pre- (M = 0.415) to posttest (M = 0.585), V = 0, p = 0.036, with a large effect size of r = 0.62 (i.e., the H1B hypothesis was confirmed).
These results suggest that univariate reasoning while solving the scaffolded case comparison led to more successful activation of resources afterward than solving the task multivariately.
Further research is needed to elucidate if this observation holds for a larger sample and more complex tasks.
Students from the low-score group seem to have benefited significantly from the scaffold when considering their gain in score compared to those in the high-score group, whose conceptual knowledge scores remained at the same high level. This is supported by the fact that there is a significant difference in conceptual knowledge score between the score groups in the pretest, but not in the posttest, which indicates a convergence of the scores. It can be assumed that the increase in conceptual knowledge score is a result of working with the scaffold, which supported the students in activating and integrating their resources, as the scaffold provided no additional information besides the prompts. The fact that a higher causal complexity in the work with the scaffold was followed by a significant increase in conceptual knowledge score, independent of the score groups, indicates that creating causal electronic relations in the scaffold is positively related to students’ scores in the conceptual knowledge test. Providing causal electronic relations requires activating and integrating more resources, in order to verbalize how a property electronically influences the change in the reaction process, than is the case for none or low causal complexity. This observed relation between a high causal complexity and an increase in score in a conceptual knowledge test requires more in-depth investigation to clarify the mechanism behind prompting and the activation and integration of resources in learners’ mental networks.
However, the findings show as well that students who used univariate argumentation in the scaffold showed a greater increase in conceptual knowledge score than those who used multivariate argumentation. Multivariate reasoning does not necessarily imply more elaborate reasoning. However, since there was only one student in our sample who used multivariate reasoning with high causal complexity, further research with a larger sample and with more complex problems is required to further elucidate the impact of univariate vs. multivariate reasoning.
The quantitative findings document a significant increase overall and show that low-score students in our sample improved significantly, whereas high-score students had no significant increase in score from pre- to posttest. However, the detailed qualitative analysis of students’ work with the scaffold (i.e., the patterns identified) as well as the exploratory quantitative analysis revealed that there is much more going on when students solve scaffolded case comparisons. We might not yet fully understand and be able to capture the nuances of scaffolding. The exploratory findings discussed here illustrate the need for a more thorough analysis of how and why scaffolds can be supportive and, especially, for whom and for what.
Our quantitative analysis showed a significant increase in conceptual knowledge scores for learners with high causal complexity answers. To support students in achieving higher causal complexity, building cause-effect relationships could be trained, for instance, with helping cards that prompt learners to put a relation in the right order, which might help them to integrate resources. Such cards could also prompt learners to determine whether a relation is of low or high causal complexity, or could provide resources in the form of knowledge elements that students should incorporate in their argumentation.
Many students in the sample successfully used univariate argumentation. However, teachers should explicitly and regularly create tasks that require multivariate argumentation (Lieber and Graulich, 2020), since problems in organic chemistry often contain more than one variable. From former research, we know that students tend to reason univariately, even if prompted explicitly to consider multiple variables (Kraft et al., 2010). In this regard, it could also be helpful to explicitly promote the process of weighing arguments to avoid many arguments being listed but only superficially considered (Lieber and Graulich, 2022). Giving students the opportunity to use the same framing for multiple arguments might also lead to a structurally stable frame, which helps students to activate and integrate resources in other contexts without the need for repeated explicit prompting (Elby and Hammer, 2010).
To give students in the high-score group, who showed no significant increase, the opportunity to further improve their reasoning, it might be useful to let them work on tasks that stimulate their epistemological thinking to optimize their approach (Hammer et al., 2005). Adaptive scaffolding could represent a good way to achieve this and customize scaffolds to the needs of high performers (Kalyuga, 2007).
To fade the scaffold and let students solve the task step by step more independently, the grid structure could be omitted as a first step so that the students receive only the prompts of the subtasks. In the second step, the prompts could be further condensed so that only the keywords property change, properties of structural differences, and effects of properties on the property change are presented. In the third step, the scaffolding could be omitted completely. This might also be a good way to test whether the scaffold grid led to a structurally stable frame for students’ resource activation and integration (Elby and Hammer, 2010).
Given that multivariateness and causal complexity (i.e., activation and integration of resources) differently affected students’ conceptual knowledge gain (i.e., increase of activation of resources in the posttest compared to the pretest) in our sample, it might be worth considering other types of knowledge when measuring the impact of scaffolds. Based on the findings, it might be promising to look at how procedural knowledge (Rittle-Johnson and Star, 2007, 2009) develops through working with a scaffold, also because the scaffold, due to its structure, might be particularly suitable for establishing connections between resources. Promoting this interconnectedness of knowledge represents another central part of mechanistic reasoning that needs to be investigated (Bodé et al., 2019). However, a conceptual knowledge test might not capture the effect of linking and connecting multiple aspects (and thus resources) when prompting students to reason multivariately. Moreover, the task we presented could be solved well univariately. A comparable student sample could be retested with an instrument that measures procedural knowledge and conceptual knowledge on separate scales. The written task could be designed so that it can only be solved successfully by reasoning multivariately and weighing several aspects. This would allow investigating whether the ability to form more complex and multiple relations in the scaffold results in an improved understanding of how to link components by integrating resources while reasoning about mechanistic problems. Assessing only students’ conceptual knowledge after working with a scaffold may underestimate the potential of scaffolding and may capture too small a picture of students’ epistemological resources.
Besides acknowledging conceptual or procedural understanding, students’ ability to think about multiple arguments and causality should be taken into account when building adaptive scaffolding. It might, thus, be important to further explore how adaptively designed scaffolds (Lieber and Graulich, 2022) can be used to achieve equally beneficial support for the high-score group and for students from the lower end of the low-score group who had problems working with the scaffold (i.e., categorized in pattern I).
For a deeper understanding of students’ thought processes and more accurate conclusions about the impact of scaffolding on structures of students’ epistemological resources, it would be useful to conduct a qualitative interview study like that of Caspari and Graulich (2019). Using these data, an in-depth analysis could be conducted with a finer grain-size, allowing the influence on students’ resources to be inferred more clearly.
Additionally, fading the scaffold should be investigated, for example by reducing the guidance stepwise over a semester. The written scaffolds could be analyzed at multiple points to determine whether students can internalize the scaffold’s sequence of steps for finding causal mechanistic arguments and whether fading is meaningful for this scaffold (Belland et al., 2017b).
The grain-size of the qualitative part of the study is also relatively large due to the few and very short texts in the cells of the scaffold grid. Thus, inferences about the structures of students’ epistemological resources when using a scaffold can only be drawn roughly. It would therefore be useful to conduct further in-depth investigations in a study focusing on a qualitative analysis (for example, using interviews). However, the results we found can serve as a promising stepping stone for further qualitative studies with similar but more detailed data.
Although we used this type of scaffold in earlier studies and evaluated the understanding of the prompts in various pilot settings, how students of this cohort were framing the prompts at the moment of working with the scaffold was not assessed. As framing a prompt is a context-specific act of interpreting a situation, influenced by epistemological assumptions, the results reported herein only provide a limited perspective on students’ work with a scaffold.
The total duration of the intervention was 60 minutes, which is a short period to both diagnose conceptual knowledge in the pretest and posttest and to conduct a problem-solving phase. As this study was meant to be exploratory, we aimed at generating hypotheses on how and to what extent students were working with the scaffold. Further research is needed here to strengthen and verify the hypotheses generated.
As only addition–elimination mechanisms on carboxyl derivatives were used, we were only able to make statements about students’ reasoning concerning these types of mechanisms and cannot relate the patterns shown in the scaffold to their mechanistic reasoning in other contexts. In addition, the coding system refers to the data collected and may have gaps regarding causal links we did not consider, such as causal reasoning solely regarding explicit properties, which did not occur in our type of task.
Since students were able to participate in the study voluntarily, it is possible that more high-achieving students participated, as they are more likely to feel confident in such a task. However, we could see in the data that more than half of the participants answered less than 50% of the tasks correctly in the pretest, which indicates that we covered a broad spectrum of prior knowledge.
The quantitative part of this study aimed to explore the interplay of conceptual knowledge and students’ written solutions in a scaffold in a small sample; thus, only observations made in this sample could be reported. Although the selected tests are suitable for small samples (Dwivedi et al., 2017), uncertainty remains as to whether these results can be generalized to other samples. Larger studies should be conducted to confirm the findings.
Pseudonym | German | English |
---|---|---|
Daniel | Eigenschaftsänderung: “Ausbildung einer negativen Ladung” | Property change: “formation of a negative charge” |
Daniel | Expliziter struktureller Unterschied: “Cl” | Explicit structural difference: “Cl” |
Daniel | Eigenschaft: “e- ziehend → elektronegativ” | Property: “electron withdrawing → electronegative” |
Daniel | Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “schneller” | Effect of property on change/reaction center: “faster” |
Mary | Eigenschaftsänderung: “Tetraedrischer Aufbau” | Property change: “tetrahedral structure” |
Mary | Expliziter struktureller Unterschied: “Halogen” | Explicit structural difference: “halogen” |
Mary | Eigenschaft: “elektronenschiebend” | Property: “electron pushing” |
Mary | Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “sterische Hinderung am Tetraeder” | Effect of property on change/reaction center: “steric hindrance on the tetrahedron” |
Emma | Eigenschaftsänderung: “Bildung eines Oxoniumions” | Property change: “formation of an oxonium ion” |
Emma | Expliziter struktureller Unterschied: “Ester” | Explicit structural difference: “ester” |
Emma | Eigenschaft: “Mesomerieeffekt durch Ester” | Property: “mesomeric effect by ester” |
Emma | Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “Stabilisierung der negativen Ladung durch Mesomerie” | Effect of property on change/reaction center: “stabilization of the negative charge via mesomeric effect” |
Alexander | Eigenschaftsänderung: “Ausbildung einer kovalenten Bindung + (Entstehung) bzw. Verlagerung einer neg. Ladung” | Property change: “formation of a covalent bond + (formation) or displacement of a negative charge” |
Alexander | Expliziter struktureller Unterschied: “-Cl als Rest” | Explicit structural difference: “-Cl as residue” |
Alexander | Eigenschaft: “Kann Elektronendichte durch mesomere Effekte verlagern (stellt Elektronen bereit)” | Property: “can transfer electron density by mesomeric effects (provides electrons)” |
Alexander | Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “kann das entestehende Carbokation gut über +M-Effekt stabilisieren” | Effect of property on change/reaction center: “can stabilize the emerging carbocation well via +M-effect” |
Clara | Eigenschaftsänderung: “Addition einer zusätzlichen Hydroxylgruppe” | Property change: “addition of an additional hydroxyl group” |
Clara | Expliziter struktureller Unterschied: “Säurechlorid (Cl-)” | Explicit structural difference: “acid chloride (Cl-)” |
Clara | Eigenschaft: “stark elektronenziehender Effekt (−I)” | Property: “strong electron-withdrawal effect (−I)” |
Clara | Effekt der Eigenschaft auf die Eigenschaftsänderung/das Reaktionszentrum: “Durch den −I-Effekt wird Elektronendichte abgezogen, weswegen die partial positive Ladung verstärkt wird und der Angriff schneller erfolgt” | Effect of property on change/reaction center: “Due to the −I effect, electron density is withdrawn, which is why the partial positive charge is amplified and the attack takes place more quickly.” |
Fig. 12 Three tasks as they appeared in the conceptual knowledge test. In the upper task, the students were asked to tick what applies. Correct answers are marked with a green cross.
Fig. 13 Sample solution for the contrasting cases task, filled into the scaffold grid. The students were to solve this task during the problem-solving phase.
Pair of groups | Levene's test result |
---|---|
Pretest score by score-groups | F(1,16) = 0.03, p = 0.855 |
Pretest score by causal complexity | F(2,15) = 0.07, p = 0.932 |
Pretest score by multivariateness | F(2,15) = 0.09, p = 0.911 |
Posttest score by score-groups | F(1,16) = 1.26, p = 0.278 |
Posttest score by causal complexity | F(2,15) = 0.24, p = 0.792 |
Posttest score by multivariateness | F(2,15) = 0.13, p = 0.881 |
Test | p original | p adjusted |
---|---|---|
Overall | 0.018 | 0.036 |
Low score | 0.011 | 0.036 |
High score | 0.588 | 0.784 |
None | >0.999 | >0.999 |
Low causal complexity | 0.529 | 0.784 |
High causal complexity | 0.014 | 0.036 |
Univariate | 0.006 | 0.036 |
Multivariate | >0.999 | >0.999 |
Footnote
† In Germany it is common to speak of inductive effects (I effects) when talking about electron pushing (+I effect) and withdrawal effects (−I effect) along σ bonds and mesomeric effects (M effects) when talking about resonance. M effect is hence used synonymously for the potential of an atom or group of atoms to push π- and n-electrons to adjacent atoms or groups of atoms (+M effect) or pull π- and n-electrons towards itself (−M effect).
This journal is © The Royal Society of Chemistry 2023