Steven J. Petritis*, Colleen Kelley and Vicente Talanquer
Department of Chemistry & Biochemistry, University of Arizona, 1306 E. University Blvd., Tucson, AZ 85721, USA. E-mail: petritis@email.arizona.edu
First published on 1st December 2021
Previous research on student argumentation in the chemistry laboratory has emphasized the evaluation of argument quality or the characterization of argument structure (i.e., claims, evidence, rationale). In spite of this progress, little is known about how the wide array of factors at play in the undergraduate laboratory affects students’ argumentation. Building on our previous work involving activity framing, we analyzed student arguments produced following the eight experiments that comprise the first semester of a college organic chemistry laboratory. Arguments were characterized using a set of domain-general coding categories related to the nature and quality of student arguments. Further, we explored the impact of four laboratory factors on the quality of arguments produced across the eight experiments in the laboratory curriculum. Our analysis revealed no clear effect of experiment order or general experiment type on the quality of student arguments; however, both the amount and types of data sources and the level of scaffolding provided had an impact on argument quality. Although the undergraduate laboratory offers a ripe opportunity for students to engage in argument from evidence, laboratory activity involves a complex web of components, each with the potential to affect productive, high-quality sensemaking. Our findings highlight the importance of explicitly considering various laboratory factors and their impact on how students express their chemical reasoning through written argumentation.
Building arguments from evidence is one of the core epistemic practices of science (Crujeiras-Pérez and Jiménez-Aleixandre, 2017), and chemistry educators should strive to better understand how to create robust opportunities for undergraduate students to formulate scientific arguments in the instructional laboratory. Despite broad consensus on the need for students to develop productive argumentation skills, relatively little is known about how different aspects of experimental task design, implementation, and assessment impact student argumentation in college chemistry labs. Building on our previous work involving laboratory activity framing (Petritis et al., 2021), this paper seeks to identify and characterize major factors that affect the nature and quality of arguments built by students to communicate their laboratory findings.
Research in chemistry classrooms has often relied on Toulmin's framework for argumentation to characterize how students coordinate “evidence and theory to support or refute an explanatory conclusion, model or prediction” (Jiménez-Aleixandre and Erduran, 2008). Meanwhile, the undergraduate chemistry laboratory has seen the deployment of a variety of evidence-based instructional models to facilitate productive engagement in science practices (Abi-El-Mona and Abd-El-Khalick, 2006; Abi-El-Mona and Abd-El-Khalick, 2011). Educational research in laboratory settings has mainly focused on two areas: (1) assessment of argument quality, and (2) characterization of argument structure.
Other instructional approaches have also demonstrated efficacy at promoting increased student argument quality. For example, Katchevich et al. (2013) found that students who conducted inquiry laboratory experiments constructed higher quality arguments compared to those produced during confirmatory experiments. Results from studies in chemistry laboratories are in line with those from investigations in biology and physiology education that have shown increased engagement with argumentation and conceptual knowledge as a result of using inquiry-based laboratory curricula (Reiser et al., 2001; Colthorpe et al., 2017; Cronje et al., 2013; Carmel et al., 2019). These studies reveal clear alignment between higher argument quality and laboratory curricula that promote engagement in science practices and argumentation. However, more research is needed to identify which specific features of these laboratory implementations lead students to construct higher quality arguments.
Although students’ general abilities to build arguments have been investigated thoroughly, there are fewer reports on how learners use specific disciplinary knowledge to coordinate various argument components. For example, Stowe and Cooper (2019) found that students could analyze and integrate spectroscopic data from various sources in their arguments, but they struggled to connect their evidence to a reasonable chemical claim. In the present work, we aim to gain additional insights in this area by exploring how students’ use of specific chemical data in various types of experiments impacts both the nature and quality of their post-lab arguments.
Among the various factors that can affect the nature and quality of students’ arguments in the undergraduate organic chemistry laboratory, we investigated the impact of activity framing in our previous work (Petritis et al., 2021). The frame of a student's educational experience is defined as a “set of expectations an individual has about the situation in which she finds herself that affect what she notices and how she thinks to act” (Hammer et al., 2005). Existing research suggests that students’ perceived frames impact their engagement and participation in argumentation (Berland and Hammer, 2012). Our research helped further elucidate this effect by engaging students in a single lab activity framed in two different ways: a predict-verify frame and an observe-infer frame (Petritis et al., 2021). Through our analysis of both domain-specific and domain-general features, we discovered that framing impacted the level of integration of evidence and theory, the specificity of student claims, the alignment of arguments, and the approach to reasoning that students followed in their arguments. In the present study, we expanded our analysis to a wider set of organic chemistry experiments seeking to identify other factors that may significantly affect the nature and quality of student argumentation in undergraduate organic chemistry labs.
(1) In what ways are the eight experiments similar and different in terms of the nature and quality of student argumentation?
(2) Which factors are most impactful and how do these factors affect the nature and quality of student argumentation across each of the eight experiments of interest?
| Category | Experiment | Description | Student N | Post-lab report N | Argument N |
|---|---|---|---|---|---|
| Data collection, analysis, and interpretation experiments (DC) | Thin-layer chromatography (TLC) | Students were presented two pure substances, a mixture with these two components, and three laboratory solvents to separate and identify the two substances in the mixture. They collected data involving the movement of their substances on a silica gel TLC plate and analyzed these data by calculating retention factor (Rf) values. Interpretation of these data allowed students to investigate the relative polarities of the substances and observe differences in behavior | 59 | 59 | 177 |
| | Infrared (IR) spectroscopy | Students were tasked with preparing a sample for analysis using IR spectroscopy. Students collected spectroscopic data in the form of IR spectra and analyzed these data by identifying the wavenumbers of the peaks shown. Interpretation of these data allowed students to identify various bond types and functional groups in molecules of interest | 68 | 68 | 208 |
| Extraction and characterization experiment (EC) | Column chromatography (CC) | Students were given a hexane extract from ground, raw spinach leaves. This extract contained a mixture of several organic compounds, including the β-carotene compound that students were tasked to isolate using column chromatographic techniques. Students assessed the identity and purity of their isolate using thin-layer chromatography (comparing to a pure standard) | 99 | 99 | 198 |
| Identification and structure elucidation experiments (ISE) | Gas chromatography (GC) | Students performed a transesterification reaction to convert an unknown triglyceride into its component fatty acid methyl esters. They recorded qualitative data about the reactions and collected chromatographic data involving the composition of their product mixture. These data were then used to craft arguments that identified their unknown starting material based on comparisons to known GC standards | 88 | 59 | 161 |
| | Nuclear magnetic resonance (NMR) spectroscopy | Students were assigned an unknown compound, prepared their own NMR sample, and collected both 1H NMR data and an IR spectrum. These data were accompanied by the molecular formula and 13C NMR data that were provided to each student for their respective unknown compound. Using the provided and collected spectroscopic data, students were tasked with elucidating the structure of their unknown compound and developing an argument that rationalized their structural choice based on their data | 170 | 170 | 170 |
| Prediction-verification experiments (PV) | Substitution reactions (SR) | Students explored the behavior of eight known alkyl halide starting materials under two sets of reaction conditions: SN1-favorable conditions (AgNO3 in ethanol) and SN2-favorable conditions (NaI in acetone). They used background information about solvent environment (protic vs. aprotic) and molecular structure (methyl, 1°, 2°, or 3° compounds) to predict the reactivity of their eight starting materials under each set of reaction conditions | 126 | 56 | 162 |
| | Elimination reactions (ER) | Students performed two separate elimination reactions using known starting materials: (1) the acid-catalyzed dehydration of 2-butanol (E1), and (2) the base-catalyzed dehydrohalogenation of 2-bromobutane (E2). They used background information about solvent environment, reaction mechanisms, and the molecular structure of their expected products to predict the major and minor products in these reactions | 140 | 70 | 210 |
| Synthesis and characterization experiment (SC) | Synthesis of esters (SE) | Students performed a Fischer esterification reaction in which they refluxed acetic acid, a catalytic amount of sulfuric acid, and an alcohol starting material of their choice to produce a fragrant ester product. Students recorded qualitative data regarding the fragrance of their reaction in addition to collecting both 1H NMR and IR spectroscopic data for their observed product. Arguments were then constructed to characterize their synthesized product | 176 | 69 | 207 |
Students completed each laboratory experiment following the procedures described in their laboratory workbook and the instructions of their respective GSI. After completion of the in-lab portion of each experiment, students wrote a post-laboratory report in which they constructed arguments following a claim–evidence–rationale framework. Post-lab arguments were constructed individually, in pairs, or in groups of three to five, depending on the instructional decisions made by each GSI. Regardless of whether post-lab arguments were written by individual or multiple students, each argument was counted only once during the data collection and analysis processes. Each argument contained an individual claim, evidence, and rationale component as identified by the student(s) and was handwritten into the post-lab argumentation scaffold available to each student in their laboratory workbook (Fig. 1). Arguments collected from each of the eight laboratory experiments of interest served as the primary source of data for this research study, and our analysis of these arguments is presented herein.
Table 1 summarizes student participation, post-lab report collection, and the total number of arguments analyzed for each laboratory experiment. Post-lab arguments produced by the research participants were collected following each laboratory session. Each post-lab report was de-identified, scanned, and immediately returned to the respective GSI. Post-lab arguments were then transcribed and used for qualitative coding analysis and quantitative characterization of argument quality.
| Coding category | Description | Examples |
|---|---|---|
| Specificity | Characterized students’ claims as either “case-specific” or “class-level.” “Case-specific” claims referred to specific findings that students made based on their data. “Class-level” claims identified general inferences made about types of substances | Case-specific: “Unknown compound 4 is corn oil” (GC) |
| | | Class-level: “The most polar compounds have a high retention time” (GC) |
| Explicitness | Highlighted the clarity with which students expressed both their evidence and rationale in their arguments. The “explicit evidence” code identified when students clearly identified the laboratory data they collected. The “implicit evidence” code identified instances when students did not clearly include experimental evidence that supported their argument. The “explicit rationale” code referred to rationales that were clearly described and did not require additional inference. The “implicit rationale” code identified rationales that lacked clarity in supporting their inference | Explicit evidence: “Benzil had an Rf value = 0.74 in our 75:25 hexane:acetone mixture” (TLC) |
| | | Implicit evidence: “The solvent mixture of hexane and acetone shows that benzil is nonpolar” (TLC) |
| | | Explicit rationale: “The IR spectra found that there was one O–H bond in the molecule. This coincides with the NMR spectra, as there is a single hydrogen very close to an oxygen atom” (NMR) |
| | | Implicit rationale: “I think this structure is correct because the chemical shift and splitting patterns helped me determine which H's were next to each other” (NMR) |
| Completeness | Characterized how thoroughly students presented the necessary evidence and rationale for their argument. The “complete evidence” code identified instances where students provided a detailed account of their experimental observations. The “incomplete evidence” code highlighted when the evidence provided lacked sufficient detail. The “complete rationale” code was applied to rationales that sufficiently outlined how their experimental evidence justified the claim made in their argument. “Incomplete rationales” lacked pertinent details to make sense of the argument being presented | Complete evidence: “Bond = Wavenumber: O–H = 3331 cm−1, C=C = 1653 cm−1, C–H = 2881 cm−1” (IR) |
| | | Incomplete evidence: “The different wavenumbers at the varying percent transmittance values” (IR) |
| | | Complete rationale: “Trans-2-butene occurred the most (larger peak) in GCs for E1 and E2. Trans-2-butene is the most stable and substituted alkene so it would occur the most compared to 1-butene” (ER) |
| | | Incomplete rationale: “The least stable product will form the least and the major and minor products will form” (ER) |
| Differentiation | Identified instances in which students compared or contrasted the substances, properties, reactions, and behaviors related to their laboratory experiments. This coding category was used to characterize the claims, evidence, and rationale components. The “multiple” code identified instances where students referred to similar behaviors or properties. The “single” code was used when only individual substances or reactions were referred to | Multiple: “Esters have no hydrogen bonding, so compared to the reactants in this reaction, the ester is more volatile” (SE) |
| | | Single: “Our ester should have had a nondescript fruity smell. Since our product did have a fruity smell, we know we had an ester” (SE) |
| Integration | Characterized the level of coordination of chemical concepts and experimental observations in student rationales. The “integrated” code applied to rationales that connected student ideas and laboratory findings. The “fragmented” code highlighted when students separately discussed their experimental observations and chemical knowledge without attempt to connect their ideas | Integrated: “As shown on the TLC plate, there is only one spot for the isolated carotenes that matches the β-carotene standard. Also, the Rf value for each isolated carotene matches the Rf value for the β-carotene standard.” (CC) |
| | | Fragmented: “Because the Rf value for our isolate was the same Rf value for the β-carotene structure” (CC) |
| Alignment | Characterized arguments that had a consistent focus between the claims and rationale components. “Aligned” arguments demonstrated instances where student claims and rationale components presented a coherent focus. “Misaligned” arguments failed to demonstrate a coherent focus between the claims and rationale | Aligned: “Claim: SN2 reactions happened faster than SN1 reactions. Rationale: In general, the SN2 reactions were observed to be faster than the SN1 reactions due to the fact that it's a one-step reaction, also because SN2 usually occurs with primary carbons, which is less hindered” (SR) |
| | | Misaligned: “Claim: Tert-butyl chloride did not react as expected. Rationale: The tertiary carbon favors an SN1 reaction because it is the most stable which means it has the lowest transition state energy” (SR) |
| Approach to reasoning | Identified the line of reasoning employed by students as they rationalized their claims and was characterized as “deductive”, “inductive”, or “hybrid”. The “deductive” code highlighted students applying general chemical principles. The “inductive” code was used when students rationalized general claims involving their experiments. The “hybrid” code identified claims supported by both experimental data and general chemical rules and principles | Deductive: “When using TLC, molecules travel further in solvents with similar polarities. It is known that methanol is polar, so benzoin is polar” (TLC) |
| | | Inductive: “The Rf value for β-carotene was very close to one, meaning that the vitamin moved closely along with the nonpolar mobile phase, which means the β-carotene is also nonpolar” (TLC) |
| | | Hybrid: “Nonpolar components would result in it traveling further up the plate, which it did with an Rf value of 0.92. The pure β-carotene has C=C and C–H bonds, which makes it very hydrophobic” (TLC) |
Qualitative analysis began with the first author (a graduate student) randomly selecting post-lab reports for the eight laboratory experiments of interest. A qualitative codebook that included all domain-general coding categories was developed for each of these experiments. Coding categories and codes were first identified and applied by the graduate student researcher and discussed with the second author. The two researchers independently coded each argument component from the selected reports for a given laboratory experiment and subsequently met to discuss their respective coding decisions. Discussions continued until both researchers agreed on their coding choices for each analyzed argument. This iterative process was followed until consensus was reached on at least 25% of the collected arguments for each of the eight laboratory experiments of interest. The remaining arguments were qualitatively coded by the graduate student researcher.
To quantitatively characterize the quality of arguments from each experiment, we assigned a value of “1” to argument components characterized as “explicit”, “complete”, “integrated”, and “aligned”, and a value of “0” to argument components coded as “implicit”, “incomplete”, “fragmented”, and “misaligned”. The explicitness and completeness coding categories were each counted twice in this analysis, as they were applied to both the evidence and rationale components in our qualitative coding analysis. Thus, each argument received a score for each of six components, and these scores were summed into an overall quality score ranging from 0 to 6. Additional domain-general codes not deemed indicative of argument quality (i.e., specificity, differentiation, and approach to reasoning) were assigned arbitrary numerical values. For example, the approach to reasoning coding category had three possible codes, “deductive”, “inductive”, and “hybrid”, which were assigned a “2”, “1”, and “0”, respectively. A total of 1493 arguments (each consisting of one claim, one evidence, and one rationale component) were analyzed and categorized in this manner.
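As a concrete illustration, the scoring scheme just described can be sketched in a few lines of code. This is a hypothetical re-implementation, not the authors' actual analysis scripts, and the category keys are invented labels for the six quality-related code pairs:

```python
# Illustrative sketch of the 0-6 argument quality score described above.
# Each of the six quality-related codes contributes 1 (higher-quality code)
# or 0 (lower-quality code); explicitness and completeness appear twice
# because they were applied to both evidence and rationale.
QUALITY_CODES = {
    "explicitness_evidence": {"explicit": 1, "implicit": 0},
    "explicitness_rationale": {"explicit": 1, "implicit": 0},
    "completeness_evidence": {"complete": 1, "incomplete": 0},
    "completeness_rationale": {"complete": 1, "incomplete": 0},
    "integration": {"integrated": 1, "fragmented": 0},
    "alignment": {"aligned": 1, "misaligned": 0},
}

def quality_score(codes: dict) -> int:
    """Sum the six binary quality values for one coded argument."""
    return sum(QUALITY_CODES[cat][codes[cat]] for cat in QUALITY_CODES)

# A hypothetical coded argument: explicit but incomplete rationale, etc.
example = {
    "explicitness_evidence": "explicit",
    "explicitness_rationale": "implicit",
    "completeness_evidence": "complete",
    "completeness_rationale": "incomplete",
    "integration": "integrated",
    "alignment": "aligned",
}
print(quality_score(example))  # 4
```

Codes not counted toward quality (specificity, differentiation, approach to reasoning) would be stored alongside these but excluded from the sum.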
The R statistical software (Windows, version 4.0.3) was used to run all statistical analyses for this research study. The frequency of occurrence of each qualitative code was calculated for each of the eight laboratory experiments in our study. We used the chi-square (χ2) test for independence to investigate the association between our qualitative codes and each laboratory experiment, as well as to compare overall argument quality across our eight experiments. Statistically significant associations between each code and laboratory experiment were investigated at the α = 0.05 level. In addition, we used the R statistical software to calculate the standardized chi-square residual values for the association between each combination of the categorical variables (qualitative codes and laboratory experiment).
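The same computations can be sketched outside of R. The snippet below, with made-up counts (not the study's data), reproduces a chi-square test of independence and Pearson standardized residuals in Python using scipy; note that R's `chisq.test` additionally offers adjusted standardized residuals (`$stdres`), which divide the Pearson form by a further row/column factor:

```python
# Illustrative sketch: chi-square test of independence and standardized
# (Pearson) residuals for a code-by-experiment contingency table.
# Counts are invented for illustration only.
import numpy as np
from scipy import stats

# Rows: code levels (e.g., integrated vs. fragmented);
# columns: four hypothetical experiments.
observed = np.array([
    [45, 80, 120, 140],
    [132, 128, 78, 21],
])

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Pearson residuals: (observed - expected) / sqrt(expected).
# |residual| > 2 flags cells deviating notably from independence.
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2:.2f}, p = {p:.4g}, dof = {dof}")
print(np.round(residuals, 2))
```

With α = 0.05, a significant omnibus test would then be followed by inspecting which cells carry residuals beyond ±2, mirroring the per-code comparisons reported in Table 4.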
| Coding category | Codes | TLC (%) (N = 177) | IR (%) (N = 208) | CC (%) (N = 198) | GC (%) (N = 161) | NMR (%) (N = 170) | SR (%) (N = 162) | ER (%) (N = 210) | SE (%) (N = 207) |
|---|---|---|---|---|---|---|---|---|---|
| Specificity | Case-specific | 94.9 | 96.2 | 100.0 | 82.6 | 100.0 | 63.0 | 77.1 | 92.8 |
| | Class-level | 5.1 | 3.8 | 0.0 | 17.4 | 0.0 | 37.0 | 22.9 | 7.2 |
| Explicitness | Explicit (E) | 84.7 | 92.3 | 100.0 | 69.6 | 76.5 | 42.6 | 65.7 | 36.2 |
| | Implicit (E) | 15.3 | 7.7 | 0.0 | 30.4 | 23.5 | 57.4 | 34.3 | 63.8 |
| | Explicit (R) | 44.1 | 49.0 | 59.1 | 43.5 | 40.0 | 39.5 | 50.5 | 46.4 |
| | Implicit (R) | 55.9 | 51.0 | 40.9 | 56.5 | 60.0 | 60.5 | 49.5 | 53.6 |
| Completeness | Complete (E) | 44.1 | 88.5 | 100.0 | 26.1 | 51.8 | 48.8 | 16.2 | 4.3 |
| | Incomplete (E) | 55.9 | 11.5 | 0.0 | 73.9 | 48.2 | 51.2 | 83.8 | 95.7 |
| | Complete (R) | 9.6 | 68.3 | 24.2 | 26.1 | 24.7 | 30.9 | 4.8 | 10.1 |
| | Incomplete (R) | 90.4 | 31.7 | 75.8 | 73.9 | 75.3 | 69.1 | 95.2 | 89.9 |
| Differentiation | Multiple (C) | 47.5 | 0.0 | 1.5 | 13.0 | 0.0 | 27.8 | 58.1 | 0.0 |
| | Single (C) | 52.5 | 100.0 | 98.5 | 87.0 | 100.0 | 72.2 | 41.9 | 100.0 |
| | Multiple (E) | 25.4 | 0.0 | 0.0 | 17.4 | 0.0 | 25.9 | 59.0 | 2.9 |
| | Single (E) | 74.6 | 100.0 | 100.0 | 82.6 | 100.0 | 74.1 | 41.0 | 97.1 |
| | Multiple (R) | 33.2 | 4.8 | 83.3 | 87.0 | 30.6 | 32.1 | 83.8 | 8.7 |
| | Single (R) | 66.8 | 95.2 | 16.7 | 13.0 | 69.4 | 67.9 | 16.2 | 92.3 |
| Integration | Integrated | 25.4 | 38.5 | 60.6 | 87.0 | 32.9 | 68.5 | 17.1 | 11.6 |
| | Fragmented | 74.6 | 61.5 | 39.4 | 13.0 | 67.1 | 31.5 | 82.9 | 88.4 |
| Alignment | Aligned | 88.1 | 96.2 | 97.0 | 100.0 | 82.4 | 79.6 | 87.6 | 95.7 |
| | Misaligned | 11.9 | 3.8 | 3.0 | 0.0 | 17.6 | 20.4 | 12.4 | 4.3 |
| Approach to reasoning | Deductive | 52.5 | 26.0 | 34.8 | 34.8 | 4.7 | 72.2 | 50.5 | 11.6 |
| | Inductive | 47.5 | 69.2 | 56.1 | 65.2 | 76.5 | 13.0 | 41.9 | 88.4 |
| | Hybrid | 0.0 | 4.8 | 9.1 | 0.0 | 18.8 | 14.8 | 7.6 | 0.0 |

‘C’ indicates a code assigned to student claims, ‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale.
| Coding category | Codes | TLC (N = 177) | IR (N = 208) | CC (N = 198) | GC (N = 161) | NMR (N = 170) | SR (N = 162) | ER (N = 210) | SE (N = 207) |
|---|---|---|---|---|---|---|---|---|---|
| Specificity | Case-specific | 0.871 | 1.134 | 1.681 | −0.827 | 1.557 | −3.484a | −1.785 | 0.612 |
| Explicitness | Explicit (E) | 2.124a | 3.595a | 4.789a | −0.256 | 0.804 | −4.323a | −0.953 | −5.971a |
| | Explicit (R) | −0.560 | 0.439 | 2.493a | −0.643 | −1.323 | −1.383 | 0.745 | −0.121 |
| Completeness | Complete (E) | −0.698 | 8.515a | 10.659a | −3.969a | 0.769 | 0.198 | −6.610a | −9.030a |
| | Complete (R) | −6.487a | 13.159a | 0.134 | 0.601 | 0.248 | 1.850 | −5.651a | −4.022a |
| Differentiation | Multiple (C) | 9.002a | −6.190a | −5.542a | −1.589 | −5.596a | 2.775a | 13.397a | −6.175a |
| | Multiple (E) | 2.960a | −5.842a | −5.700a | 0.307 | −5.282a | 2.990a | 15.253a | −4.799a |
| | Multiple (R) | −2.517a | −8.626a | 8.078a | 7.971 | −2.781a | −2.428a | 8.422a | −7.771a |
| Integration | Integrated | −3.235a | −0.570 | 4.311a | 9.120a | −1.639 | 5.472a | −5.398a | −6.606a |
| Alignment | Aligned | −0.412 | 0.765 | 0.867 | 1.184 | −1.194 | −1.529 | −0.527 | 0.687 |
| Approach to reasoning | Deductive | 4.262a | −1.965 | 0.231 | 0.194 | −6.537a | 7.910a | 4.128a | −5.511a |
| | Inductive | −2.053a | 1.851 | −0.600 | 0.967 | 2.899a | −7.527a | −3.281a | 5.428a |
| | Hybrid | −3.460a | −1.085 | 1.258 | −3.300a | 6.045a | 3.992a | 0.476 | −3.742a |

‘C’ indicates a code assigned to student claims, ‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale.
a Chi-square residual value that is statistically significantly greater (>2) or less (<−2) than the expected frequency of each code for all eight experiments.
| Experiment | Claims | Evidence | Rationale |
|---|---|---|---|
| Column chromatography experiment | The isolated carotenes look greatly pure as the carotenes from the spinach matched the ones from the pure liquid | Rf value of β-carotene = 3 cm/3 cm = 1. Rf value of spinach extract = 3 cm/3 cm = 1 (included drawing with labeled TLC plate) | The extremely nonpolar carotene from the liquid traveled as a nonpolar would to the top. The isolated spinach carotenes traveled as much as the pure form did which shows that it was nicely purified to get even match the exact same Rf values of 1 as the nonpolar carotene would show. It would match the nonpolar carotenes with the high Rf value; high Rf value = very least polar |
| Substitution reaction experiment | Carbons not sp3 hybridized will not undergo substitution reactions | Bromobenzene did not undergo SN1 or SN2 reactions | AgNO3 and ethanol solution favors SN1. NaI and acetone favors SN2 – neither reaction took place |
| Elimination reaction experiment | E2 reactions happen quicker than E1 reactions | E2 reactions happen faster because they are one step and only depend on the concentration of our alkyl halide and the base | Our E2 test tubes filled with gas more quickly than our E1 tubes. We knew it would react quickly because it has a strong base reacting with our alkene |
Chemical rationales across each experiment did not vary greatly with regard to how explicitly these ideas were supported in student arguments. Arguments across most experiments were either roughly evenly split between the explicit and implicit rationale codes (IR and elimination reaction experiments) or showed a slight preference for implicit rationales (TLC, GC, NMR, and synthesis of esters experiments). Consider the following argument from the GC experiment:
Claim: “Different length fatty acid carbon chains exhibit different retention times.”
Evidence: “Carbon chains with 12, 14, 16, and 18 carbons show different peaks on GC spectra.”
Rationale: “We had peaks showing carbon chains of 12, 14, 16, and 18 carbons in our unknown fat.”
In this argument, the student supported their class-level claim about fatty acid carbon chain length and retention time with a rationale that did not clearly connect the chain lengths observed in their “unknown fat” to the “different peaks on the GC spectra” they collected. Thus, this chemical rationale implicitly rationalized the presence of “different length fatty acid carbon chains” without using their data to complete the inference. On the other hand, only the column chromatography (59.1%) experiment was characterized by chemical rationales that were more often coded as explicit than expected across all experiments. Rationales produced during the column chromatography experiment were often clearly supported by coordinating ideas related to observed reaction conditions (72.7%), chemical properties (75.8%) and molecular structures (37.9%) of the molecules with which they worked, and the chromatographic data (87.9%) collected from their experiment. As exemplified by the rationale in Table 5, the student clearly rationalized the TLC plate behavior of each β-carotene sample, as indicated by Rf values, and connected these ideas to the expected polarity of the compound.
Claim: “Limonene has alkane and alkene functional groups.”
Evidence: “The wavenumber values from the IR spectroscopy: C–H was 3010 cm−1 and C=C was 1644 cm−1.”
Rationale: “The known values for these two bonds are 3100–2900 cm−1, which matches our result of 3010 cm−1. The 1650 cm−1 for C=C compares to 1644 cm−1 experimental value.”
In this argument, the student provided evidence regarding their observed IR peaks, making specific reference to the wavenumbers collected in their spectra and the bond types identified by each peak related to their compound of interest. Conversely, the GC (73.9%), elimination reaction (83.8%), and synthesis of esters (95.7%) experiments showed a marked preference for incomplete evidence components. Arguments coded with incomplete evidence for these experiments often included only individual pieces of data and, thus, excluded key pieces of information needed to support the proposed chemical inference. For example, consider the elimination reaction experiment argument shown below.
Claim: “E1 reaction produced a greater percentage of 2-butene for its products.”
Evidence: “E1 reaction was completed with 0.4 mL of 2-butanol and 0.6 mL of an acid mixture.”
Rationale: “E1 2-butene total area was greater than 1-butene. This was not true for E2 reaction.”
In this argument, the student made a claim implicitly comparing the amount of products produced in E1 reactions to that produced in E2 reactions. The evidence referenced the reaction conditions of the E1 reaction but failed to provide experimental data that could support the claim regarding the “percentage of 2-butene.” Although arguments from the elimination reaction experiment often relied on chromatographic data (85.7% of arguments) as evidence, these arguments did not detail all the evidence necessary to support the proposed claim.
Arguments across each experiment (except the IR experiment) were also most frequently characterized as having incomplete rationale components. Incomplete rationales excluded key features of either laboratory data or conceptual knowledge needed to support the claims students made. Again, consider the elimination reaction example shown in the previous paragraph. This student rationalized that the “E1 2-butene total area was greater than 1-butene” without making specific reference to the peak areas that would support this inference. Similarly, they stated that “This was not true for E2 reaction” without detailing the basis for the rationale they employed in that argument. Incomplete rationales were most common in the synthesis of esters (89.9%), TLC (90.4%), and elimination reaction (95.2%) experiments, while the IR experiment remained the only one in which students more frequently provided complete rationales (68.3%) in their arguments. Looking at the IR experiment argument shown above, the student rationalized the claim “Limonene has alkane and alkene functional groups” by saying “The known values for these two bonds are 3100–2900 cm−1, which matches our result of 3010 cm−1. The 1650 cm−1 for C=C compares to 1644 cm−1 experimental value.” This rationale included both the wavenumbers of the collected data and the theoretical wavenumbers for each peak observed in their molecule.
Claim: “cis-3-hexen-1-ol has an O–H functional group.”
Evidence: “The IR spectroscopy results for cis-3-hexen-1-ol show a wavenumber 3335.20 cm−1 with a curved dip.”
Rationale: “On IR spectra, O–H bond is shown as a curved dip with a broad wavenumber range between 3600–2900 cm−1 (theoretically).”
In this argument, the student made a claim about the presence of the O–H functional group in their cis-3-hexen-1-ol substance, reported the wavenumber (in cm−1) of their observed peak as evidence, and, for their rationale, compared this bond type with what was theoretically expected for the O–H bond. The focus of this argument was a single substance, a single observed peak, and a single bond type and, thus, each component of the argument was coded as “single” for the differentiation coding category. Similarly, arguments from the IR, NMR, and synthesis of esters experiments all demonstrated a lower-than-expected frequency of the “multiple” differentiation code for student claims, evidence, and rationale components (Table 4). Conversely, several experiments were relatively comparison-rich. For instance, the majority of student claims, evidence, and rationale components of the elimination reaction experiment were characterized as “multiple” differentiation, including the example shown in Table 5. This student supported their claim about the difference in reaction rate between the E1 and E2 reaction pathways with evidence and rationale that further differentiated their observations and theoretical knowledge concerning the two reaction mechanisms. Additionally, both the column chromatography (83.3%) and GC (87.0%) experiments included rationale components containing predominantly “multiple” differentiation, including the column chromatography example in Table 5, which compared “the same Rf values of 1” for the isolate and pure β-carotenes observed by that student.
Claim: “The unknown sample was linseed oil.”
Evidence: “The 16 carbon peak: 7.6% (GC). The sp3 C–H bond: 3008.31 cm−1 (IR). The C=C and C=O bonds are absent from the IR spectrum.”
Rationale: “The unknown was determined as linseed oil based on the peak for 16-carbon showing ∼6% (7.6%) as expected for linseed oil (palmitic acid) and the rest being composed of other carbons. The data for IR are also coherent with the structure of the fatty acid (only sp3 C–H bonds).”
In this argument, the student made a claim about the identity of their unknown fat starting material in the GC experiment and supported this claim with the presence and absence of IR and GC peaks. They rationalized their claim by describing how both sets of data matched what was “expected for linseed oil (palmitic acid)” based on the materials provided to them about each possible unknown compound. The student coordinated the data they collected with what is known about each of their possible compounds and, thus, their rationale was coded as integrated. In contrast, the majority of laboratory experiments contained arguments whose rationales were characterized as fragmented, in which students failed to adequately coordinate data and observations with background knowledge in support of their proposed claims. Fragmented rationales were most common in the synthesis of esters (88.4%), elimination reaction (82.4%), and thin-layer chromatography (74.6%) experiments. Consider the example shown below from the synthesis of esters experiment.
Claim: “The combination of 1-hexanol (an alcohol) with acetic acid and sulfuric acid results in an ester.”
Evidence: “After the reflux reaction, we wafted the scent of our product.”
Rationale: “What we smelled was characteristic of what we would expect to smell in the ester, propyl acetate.”
In this argument, the student claimed that their Fischer esterification reaction yielded an ester product and supported this claim with the qualitative data, related to the product's smell, that they collected from their reaction. The student rationalized this claim by describing how their product characteristically smelled of propyl acetate. This student relied solely on the smell of their product to support their claim despite having collected both IR and NMR spectral data on their ester product. Because of this failure to coordinate these pieces of evidence in support of their claim, the rationale was coded as fragmented. Additionally, the student made two other arguments that focused independently on the IR and NMR data as they compartmentalized their data analysis and interpretation in support of their proposed claims about the identity of their ester product.
Claim: “cis-3-hexen-1-ol has alkene, alcohol, and alkane groups.”
Evidence: “O–H bond at 3333 cm−1, C=C bond at 1654 cm−1, and C–H bond at 3008 cm−1. These values match the known values for the functional group wavenumbers.”
Rationale: “The peaks at different wavenumbers vary with the percent transmittance values.”
In this argument, the student made a claim about the functional groups present in their cis-3-hexen-1-ol substance and listed the wavenumber values obtained from their IR spectrum as their evidence. For their rationale, the student described one aspect of IR spectral data analysis. Although their rationale was related to IR analysis, the general statement about IR spectral interpretation (“peaks at different wavenumbers vary with the percent transmittance values”) was misaligned with their claim about the functional groups present in their substance of interest. Misaligned arguments were fairly uncommon across all eight experiments, with none of the experiments having arguments coded as “misaligned” more or less frequently than expected.
Claim: “tert-Butyl chloride reacted very readily through SN1 but not through SN2.”
Evidence: “When tert-butyl chloride was added to the solvent, a precipitate formed as a sediment at the bottom of the vial.”
Rationale: “The molecular structure of this substrate contains a tertiary carbon, which react only by SN1. This explains why the reaction occurred quickly in SN1 but didn’t react through SN2.”
In this argument, the student rationalized their specific claim about the reactivity of tert-butyl chloride with conceptual knowledge about the reactivity of substrates containing tertiary carbons. Many of these arguments (76.5%) were rationalized with chemical principles related to molecular structure rather than emphasizing observations of reaction conditions (46.3%) as rationale. Accordingly, along with showing the greatest preference for deductive reasoning, arguments from the substitution reaction experiment were characterized by the lowest frequency of the inductive approach to reasoning (13.0%). Arguments such as the substitution reaction example in Table 5 demonstrate the inductive approach to reasoning, in which students support their general claims (“Carbons not sp3 hybridized will not undergo substitution reactions”) with their laboratory findings (“neither reaction took place”). This approach to reasoning was much more commonly characteristic of arguments from all other experiments, including the IR (69.2%), NMR (76.5%), and synthesis of esters (88.4%) experiments. Consider the synthesis of esters experiment example below:
Claim: “Characteristic peaks show us our verified ester product structure.”
Evidence: “There was a three hydrogen singlet peak downfield.”
Rationale: “The three hydrogen singlet peak matches the structure of isopentyl alcohol confirming our product structure connectivity.”
In this argument, representative of those in the synthesis of esters experiment, the student inductively supported their claim about the successful synthesis of their “ester product” with a rationale that emphasized the NMR spectroscopic data the student collected during the course of their experiment. In addition to showing the highest frequency of inductive reasoning, arguments from the IR, NMR, and synthesis of esters experiments showed the greatest frequencies of rationale support with spectroscopic data (98.1%, 82.4%, and 59.4%, respectively) as well as the lowest rationale frequencies for reaction conditions (2.9%, 0.0%, and 33.3%, respectively).
The hybrid approach to reasoning, in which students supported their findings with both chemical concepts and observable data, was far less frequent across all laboratory experiments. The NMR (18.8%) and substitution reaction (14.8%) experiments were the only ones with a higher-than-expected frequency of this approach to reasoning across all experiments. Consider the NMR experiment rationale example shown below:
Rationale: “Since we know there are three neighbors (since it was a quartet peak), that means there cannot be 4 hydrogens bonded to a carbon (that would be methane). So, that indicates this is including more than one identical group (the integration). Then, since the IR spectra indicates the presence of a C=O bond, it is the next logical step to assume it is what connects the two groups previously determined from the NMR. We also know that since the triple is the most shielded, which is indicated by chemical shift, we know it is furthest away from the oxygen (most electronegative atom).”
In this example, the student made a claim (not shown) in which they proposed the structure for their unknown substance. In support of this case-specific claim, the student drew from both NMR conceptual background and analysis of their own spectroscopic data. By identifying “the most shielded” peak as “indicated by chemical shift” the student applied NMR principles regarding the position of peaks in reference to their own data about “the triple” peak they observed on their spectra. Additionally, the student pieced together their proposed structure on the basis of their spectroscopic data in saying “there are three neighbors (since it was a quartet peak).” These simultaneous references to the principles guiding analysis of an NMR spectrum and presentation of their own data analysis demonstrate the hybrid approach to reasoning that was most commonly associated with arguments produced from the NMR experiment. Similarly, in the substitution reaction experiment, students produced a notable proportion of arguments following the hybrid approach to reasoning (14.8% of arguments) in which they supported their claims with rationale referencing molecular structure (76.5% of rationale components) and their observed reaction conditions (46.3% of rationale components).
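The splitting logic the student applies ("three neighbors (since it was a quartet peak)") is the standard n + 1 rule for spin–spin coupling. A minimal sketch, with the multiplicity names used purely as illustrative shorthand:

```python
# The n+1 rule applied in the student's rationale: a signal split into
# n+1 lines implies n equivalent hydrogens on adjacent atoms, so a
# quartet (4 lines) implies 3 neighboring hydrogens.
MULTIPLICITY = {"singlet": 1, "doublet": 2, "triplet": 3, "quartet": 4}

def neighboring_hydrogens(peak: str) -> int:
    """Infer the neighboring-hydrogen count from a peak's multiplicity name."""
    return MULTIPLICITY[peak] - 1

print(neighboring_hydrogens("quartet"))  # 3, as the student reasoned
print(neighboring_hydrogens("singlet"))  # 0: no adjacent hydrogens
```

The rule itself is purely empirical bookkeeping; what marks the rationale as hybrid is that the student coordinated this conceptual rule with their own observed peaks and chemical shifts.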
| Coding category | Code | TLC (DC, N = 177) | IR (DC, N = 208) | CC (EC, N = 198) | GC (ISE, N = 161) | NMR (ISE, N = 170) | SR (PV, N = 162) | ER (PV, N = 210) | SE (SC, N = 207) |
|---|---|---|---|---|---|---|---|---|---|
| Explicitness | Explicit (E) | 0.847a | 0.923a | 1.000a | 0.696 | 0.765 | 0.426a | 0.657 | 0.362a |
| Explicitness | Explicit (R) | 0.441 | 0.490 | 0.591a | 0.435 | 0.400 | 0.395 | 0.505 | 0.464 |
| Completeness | Complete (E) | 0.441 | 0.885a | 1.000a | 0.261a | 0.518 | 0.488 | 0.162a | 0.043a |
| Completeness | Complete (R) | 0.096a | 0.683a | 0.242 | 0.261 | 0.247 | 0.309 | 0.048a | 0.101a |
| Integration | Integrated | 0.254a | 0.385 | 0.606a | 0.870a | 0.329 | 0.685a | 0.171a | 0.116a |
| Alignment | Aligned | 0.881 | 0.962 | 0.970 | 1.000 | 0.824 | 0.796 | 0.876 | 0.957 |
| Argument quality | Scaled 0 to 6 | 2.96a | 4.33a | 4.41a | 3.52a | 3.08 | 3.10 | 2.42a | 2.04a |

‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale; experiment type abbreviations (DC, EC, ISE, PV, SC) correspond to those listed in Table 8. a Indicates a statistically significantly different value than expected of each code for all eight experiments.
The overall average argument score of the 1493 arguments analyzed in this research study was 3.23 out of a possible 6. The column chromatography experiment contained arguments of the highest quality (4.41 out of 6). An example from the column chromatography experiment is shown in Table 7 to demonstrate the assignment of argument quality scores for each experiment. In this example, the argument presented a clear evidence component that fully detailed the TLC data collected (as well as a drawing of the TLC plate – not shown) and how these data were used to calculate the Rf value of the observed β-carotene compound. This argument was also coded as “explicit” and “complete” for the rationale component, as well as “integrated” and “aligned”, and thus received a total quality score of 6 out of 6. The column chromatography experiment led to arguments that, on average, had the highest quality of the eight experiments analyzed in this study, including having the greatest quality for the explicitness of evidence, explicitness of rationale, and completeness of evidence as well as the second highest quality of alignment and third highest quality of integration.
|  | Claims | Evidence | Rationale |
|---|---|---|---|
| Column chromatography experiment | The β-carotene is very pure | Labeled TLC plate. Rf for both = 3.5 cm/3.8 cm = 0.92. (Drawn and labeled TLC plate included) | Since both of them have reached the same height. The pure β-carotene is very nonpolar and it has C=C and C–H bonds which makes it very hydrophobic. The more nonpolar a molecule is, the more distance (Rf) it will travel, which explains why the Rf value was 0.92 which wouldn’t be the case if the β-carotene was mixed with polar chlorophyll which would make the Rf significantly less than 0.92 since the polar molecules have a higher affinity to the silica plate |
| Overall score = 6 | Aligned = 1 | Explicit (E) = 1; Complete (E) = 1 | Explicit (R) = 1; Complete (R) = 1; Integrated = 1 |
| Synthesis of esters experiment | We got our ester | The 1H NMR has a characteristic singlet | The singlet on the 1H NMR is characteristic of an ester with acetic acid. Since we saw that singlet, we know we had an ester |
| Overall score = 2 | Aligned = 1 | Implicit (E) = 0; Incomplete (E) = 0 | Explicit (R) = 1; Incomplete (R) = 0; Fragmented = 0 |
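The scoring illustrated in Table 7 can be read as a sum of six binary codes (aligned argument, explicit and complete evidence, explicit, complete, and integrated rationale). A minimal bookkeeping sketch follows, assuming the overall score is the simple sum of those codes; the dictionary keys are our shorthand, not the authors' instrument:

```python
# Sketch of the 0-6 argument quality score, assuming (as the Table 7
# worked examples suggest) that it is the sum of six binary codes.
CODES = ("aligned", "explicit_E", "complete_E",
         "explicit_R", "complete_R", "integrated")

def quality_score(codes: dict[str, int]) -> int:
    """Sum the six binary quality codes (1 = criterion met, 0 = not met)."""
    return sum(codes.get(c, 0) for c in CODES)

# Column chromatography example: every criterion met -> 6/6.
cc_example = {c: 1 for c in CODES}
# Synthesis of esters example: only alignment and explicit rationale -> 2/6.
se_example = {"aligned": 1, "explicit_R": 1}

print(quality_score(cc_example), quality_score(se_example))  # 6 2
```

Reading the scores this way makes clear why fragmented, incomplete arguments (like the esters example) bottom out at low totals even when they are well aligned.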
The IR and GC experiments also led to arguments with higher-than-expected quality. The IR experiment was characterized by the highest quality of completeness of student rationale components as well as the second highest explicitness and completeness of evidence. Similarly, the GC experiment was associated with arguments with the highest quality in alignment and integration.
Arguments from the substitution reaction experiment (3.10) had average quality. Despite having the second highest quality in completeness of rationale and integration, this experiment included the lowest quality arguments in terms of explicitness of rationale and alignment, as well as the second lowest quality in explicitness of evidence. Similarly, the NMR experiment (3.08) was linked to arguments with an average quality in most categories.
Arguments from the TLC, elimination reaction, and synthesis of esters experiments were of lower quality than expected. The TLC experiment was the first experiment performed by students and was characterized by arguments with the third lowest overall quality (2.96). These arguments had the second lowest completeness of rationale and the third lowest integration quality. The elimination reaction experiment (2.42) had arguments with the second lowest overall quality, which included several coding categories with lower-than-expected quality compared to the other experiments.
The final experiment in the semester was the synthesis of esters experiment, which resulted in arguments of the lowest average quality. Table 7 includes an example of argument quality characterization for the synthesis of esters experiment. In this example, the student provided an aligned argument in which the focus of the claim and the rationale are both “our ester” product that “we know we had.” The unclear description of only the “characteristic singlet” observed in their NMR spectrum neglected to include description of other peaks as well as the other types of data collected about their ester product. Thus, the argument was coded as both “implicit” and “incomplete” for the evidence component. In their chemical rationale, although the student explicitly tied the presence of “the singlet on the 1H NMR” to the ester product they synthesized, they included an incomplete description of how their evidence supported their claim. The student produced a fragmented rationale in which they identified only NMR data and failed to coordinate other observations, data, and structural information about their ester product.
| Experiment (#, type) | Argument quality | Types of data sources (amount of data sources) |
|---|---|---|
| Column chromatography CC (3, EC) | 4.41a | Qualitative TLC data + known molecules (2) |
| Infrared spectroscopy IR (2, DC) | 4.33a | IR spectroscopic data + known molecules (2) |
| Gas chromatography GC (4, ISE) | 3.52a | Qualitative observations + quantitative GC data (2) |
| Substitution reactions SR (6, PV) | 3.10 | Qualitative observations + known molecules (2) |
| Nuclear magnetic resonance spectroscopy NMR (5, ISE) | 3.08 | Spectroscopic data: IR, 1H NMR, 13C NMR (3) |
| Thin-layer chromatography TLC (1, DC) | 2.96a | Qualitative TLC data + known molecules (2) |
| Elimination reactions ER (7, PV) | 2.42a | Qualitative observations + quantitative GC data + known molecules (3) |
| Synthesis of esters SE (8, SC) | 2.04a | Qualitative observations + IR, 1H NMR spectroscopic data + known molecules (4) |

a Indicates a statistically significantly different argument quality than expected across all eight experiments.
For example, access to both chromatographic data and spectroscopic data during an experiment (e.g., IR, NMR, and SE labs) often led students to engage in an inductive approach to reasoning, while the absence of these types of data was more frequently linked to arguments built using deductive reasoning (based on theoretical knowledge). This latter case is exemplified by the substitution reaction (SR) lab in which students had access to only observational data and their deductive arguments were strongly based on their content knowledge about SN1 and SN2 mechanistic paths. In contrast, in the elimination reactions (ER) experiment, in the same category (PV) as the substitution reactions lab, students gathered chromatographic data and their reliance on inductive reasoning when building arguments was significantly larger.
The amount and types of data available to students also impacted the quality of their argumentation. For example, the CC and IR experiments had the fewest data sources for students to utilize and resulted in the highest quality arguments in our analysis. On the other hand, arguments from experiments in which several types of data were available to students led to lower quality arguments, as highlighted by the elimination reaction and synthesis of esters experiments. Lower argument quality suggested that students had difficulty coordinating multiple pieces of evidence. As shown in Table 3, students’ arguments from these two labs were characterized as highly incomplete and fragmented.
For example, in the CC experiment, the workbook explicitly asked students to make claims regarding the “purity of your isolated β-carotene” and the relative non-polarity of β-carotene related to “what you would predict based on the structures.” In this case, the students in our sample constructed arguments with case-specific claims that addressed the explicit guidance given to them. Explicit directions were also given on the evidence to be provided by asking students to “draw the results from your TLC” and “calculate the Rf value of each spot.” As a result, all analyzed arguments included the requested elements and were coded as both explicit and complete in the evidence component. This explicit guidance also impacted student rationales, which were characterized by a high frequency of multiple differentiation and a slight preference for an inductive approach to reasoning. In contrast, explicit scaffolding for argumentation was missing in the TLC lab, in which students collected and analyzed the same amount and types of data sources as the CC lab (see Table 8) in addition to considering the same chemical concepts of structure, purity, and polarity. In this case, student arguments were of characteristically lower quality in their explicitness and completeness of both the evidence and rationale components. In their rationales, students relied less on comparisons between the molecules they observed and more on theoretical constructs, which led to a slight preference for deductively framed arguments. Despite several striking similarities between the resources available in these experiments, the explicit scaffolding for argumentation in the CC lab correlated with arguments of the highest quality (4.41) while a lack of scaffolding was linked to TLC arguments whose quality (2.96) was below the average for the experiments analyzed in this study.
Our analysis revealed very little variability in the specificity of student claims and the alignment of argument components. The majority of collected arguments across the eight experiments had a case-specific focus on laboratory observations and were aligned in terms of the concepts and ideas used when making claims, providing evidence, and constructing rationales. Nevertheless, the explicitness and completeness of student evidence and rationale components varied greatly across each of the eight experiments. Although the evidence presented by students in their arguments was largely characterized as explicit across most labs, the evidence component for two experiments (substitution reaction and synthesis of esters) was mostly implicit. In general, student rationales were split between being explicit and implicit in nature. While some experiments (elimination reaction and synthesis of esters) were characterized by highly incomplete evidence and rationale components alike, the column chromatography experiment was unique among student arguments in combining unanimously complete evidence with highly incomplete rationales. Several experiments (notably IR, NMR, and synthesis of esters) were characterized by very little, if any, differentiation between substances and reactivity, whereas some experiments (elimination reaction) or rationale components (column chromatography and GC) emphasized comparisons between substances and their properties. The level of integration of chemical concepts and ideas with data/observations was also highly variable within student rationales, with some experiments (column chromatography, GC, and substitution reaction) more frequently characterized by integrated rationales while the remaining experiments displayed a majority of rationales coded as fragmented. Student approach to reasoning was often either deductive or inductive depending on the experiment, and very few instances of a hybrid approach were observed.
Experiments that led to the collection of chromatographic and spectroscopic data tended to result in arguments that more often followed an inductive approach to reasoning, while experiments in which these types of data were not generated or provided (e.g., SR) led to arguments that were more frequently characterized as deductive.
We also explored the impact (or lack thereof) of various laboratory factors on the nature and quality of student reasoning as manifested in their written arguments. These factors included experiment order, experiment type, amount and types of data sources available to students, and the level of scaffolding provided. Several prior studies have reported increased argument quality over time in an academic course (Sampson et al., 2010; Walker and Sampson, 2013a; Çetin and Eymur, 2017; Hosbein et al., 2021). However, in our case we observed no noticeable trend in argument quality over the course of the academic semester. We also speculated that experiments of the same type would be characterized by similar argument quality. However, we found no relationship between type of experiment and argument quality. In our study, student arguments built while working on experiments classified within the same category (e.g., data collection, analysis, and interpretation) often exhibited very different qualities.
On the other hand, our analysis suggested that the amount and type of data sources available to students in an experiment as well as the degree of scaffolding provided greatly impacted both the nature and quality of the arguments they constructed. Less data-intensive experiments presented students with an easier task of analyzing their data to use in constructing well-aligned arguments. Although at times these arguments still lacked clarity and key details in supporting their claims, experiments that involved fewer data sources and types of data also trended towards higher argument quality. Conversely, experiments that were more data-intensive and included a greater quantity of data sources as well as different types of data tended to be characterized by arguments of lower quality, as was the case for both the ER and SE experiments. Our findings are in alignment with previous studies suggesting that more data-intensive experiments that demand the consideration of different types of evidence often lead to student arguments that lack sophistication in the coordination of empirical and theoretical elements (Sandoval, 2003; Havdala and Ashkenazi, 2007). Recent research studies in chemistry education have shown that although students may be able to collect and analyze spectroscopic evidence, they often struggle to construct arguments that are consistent with the entirety of their data (Stowe and Cooper, 2019).
As part of our analysis, we also elucidated the effect of argument scaffolding on the quality of student reasoning. We identified experiments that, despite being similar in terms of the amount and types of data available to make sense of laboratory findings, led to student arguments of quite different quality. In these cases, the amount of explicit scaffolding provided for the construction of arguments seemed to be responsible for the difference. For example, students in the column chromatography experiment were explicitly prompted to make specific types of claims as well as present distinct pieces of evidence. Meanwhile, the TLC experiment involved collection of the same data and an identical conceptual focus but included no prompting for student argument components. Overall, CC experiment arguments were characterized as having the highest quality while the TLC experiment was characterized by arguments of the third lowest quality. Previous studies have presented conflicting findings as to the impact of scaffolding on students’ arguments. In one case, simplifying instructional contexts through scaffolding facilitated more complex argumentation (Berland and McNeill, 2010), while another recent investigation showed that explicit prompt scaffolding had no significant impact on students’ data-based inferences (Stowe and Cooper, 2019). These conflicting results suggest that more research is needed to better understand how scaffolding may affect student argumentation.
In our study, the quality of written arguments was closely related to the nature and amount of the data with which students had to grapple. This result suggests that laboratory designers, managers, and instructors should pay close attention to the types of data students are expected to collect and analyze in any given experiment, and purposefully select and sequence experimental activity to gradually increase the variety and complexity of the data to be analyzed. Our findings indicate that, in terms of supporting the development of argumentation abilities, the type of experiment that students conduct is likely less important than the nature and amount of data they are expected to analyze and integrate to make sense of their results.
Our results also point to the need for purposeful support of student argumentation as learners become more familiar with this epistemic practice. Various studies suggest that students may not know how to argue in the laboratory context or may lack clarity about the goal of their argumentative task (Berland and Hammer, 2012; Garcia-Mila et al., 2013). Thus, chemistry students are likely to benefit from explicit guidance on how to coordinate experimental data with core chemical concepts and ideas in developing high-quality arguments. McNeill et al. (2006) suggested that fading written instructional scaffolds for argumentation in chemistry better prepared students to produce higher quality arguments compared to students lacking this support. Related studies propose developing learning progressions for the epistemic practice of argumentation to improve student performance across the semester (Smith et al., 2006; Berland and McNeill, 2010). These recommendations, along with our findings, highlight the need for students to be trained in argumentation and to have opportunities to learn how to argue from evidence, especially early on in their laboratory experience.
While our findings inform us on several key factors that affect student argumentation in the organic chemistry laboratory, more research is needed to characterize the effect of other variables on the development of this scientific practice, such as the nature of laboratory instruction and the interaction with peers. If we are to foster the ability of students to productively engage in scientific argumentation, it is critical that chemistry education researchers and practitioners better understand the various factors that affect students’ engagement in this epistemic practice and the mechanisms through which these factors affect student reasoning.
This journal is © The Royal Society of Chemistry 2022