Analysis of factors that affect the nature and quality of student laboratory argumentation

Steven J. Petritis *, Colleen Kelley and Vicente Talanquer
Department of Chemistry & Biochemistry, University of Arizona, 1306 E. University Blvd., Tucson, AZ 85721, USA. E-mail: petritis@email.arizona.edu

Received 6th November 2021, Accepted 29th November 2021

First published on 1st December 2021


Abstract

Previous research on student argumentation in the chemistry laboratory has emphasized the evaluation of argument quality or the characterization of argument structure (i.e., claims, evidence, rationale). In spite of this progress, little is known about the wide array of factors that affect students’ argumentation in the undergraduate laboratory. Building on our previous work involving activity framing, we analyzed student arguments produced following eight experiments that comprise the first semester of a college organic chemistry laboratory. Arguments were characterized using a set of domain-general coding categories related to the nature and quality of student arguments. Further, we explored the impact of four laboratory factors on the quality of arguments produced across the eight experiments in the laboratory curriculum. Our analysis revealed no clear trends in the effect of experiment order or general experiment type on the quality of student arguments; however, both the amount and types of data sources and the level of scaffolding provided affected student argument quality. Although the undergraduate laboratory offers a ripe opportunity for students to engage in argument from evidence, laboratory activity involves a complex web of components, each with the potential to affect productive, high-quality sensemaking. Our findings highlight the importance of explicitly considering various laboratory factors and their impact on how students express their chemical reasoning through written argumentation.


Introduction

To train the next generation of research and industry professionals, laboratory courses need to create opportunities for students to develop various epistemic practices. In particular, the undergraduate laboratory should open spaces for students to formulate their own research questions, design experiments, collect and analyze data, and engage in argumentation as they search for meaning in their laboratory findings. Experimental work should prepare chemistry students to coordinate empirical data with theoretical constructs to make sense of the observable world. By participating in these authentic scientific experiences, students can also develop a more meaningful understanding of the role of chemistry in everyday life.

Building arguments from evidence is one of the core epistemic practices of science (Crujeiras-Pérez and Jiménez-Aleixandre, 2017), and chemistry educators should strive to better understand how to create robust opportunities for undergraduate students to formulate scientific arguments in the instructional laboratory. Despite broad consensus on the need for students to develop productive argumentation skills, relatively little is known about how different aspects of experimental task design, implementation, and assessment impact student argumentation in college chemistry labs. Building on our previous work involving laboratory activity framing (Petritis et al., 2021), this paper seeks to identify and characterize major factors that affect the nature and quality of arguments built by students to communicate their laboratory findings.

Research on chemistry laboratory argumentation

Argumentation in the classroom and the laboratory is a “tool for understanding student reasoning, engagement in scientific practices, and development of conceptual and epistemic understanding” (Kelly and Takao, 2002). Of the eight science practices identified by the National Research Council (2012), “engaging in argument from evidence” has become increasingly commonplace in science education at all educational levels (Berland and Reiser, 2011). Thus, in recent years chemistry education researchers have sought to gain a deeper understanding of how students engage in this epistemic practice. Research on student argumentation in chemical contexts has been carried out in diverse educational settings, including elementary schools (Park et al., 2020; Soysal and Yilmaz-Tuzun, 2021), secondary schools (Juntunen and Aksela, 2014; Grooms et al., 2018; Çetin, 2021), and in undergraduate courses in a variety of chemistry subdisciplines (Kelly and Takao, 2002; Cruz-Ramírez de Arellano and Towns, 2014; Moon et al., 2019).

Research in chemistry classrooms has often relied on Toulmin's framework for argumentation to characterize how students coordinate “evidence and theory to support or refute an explanatory conclusion, model or prediction” (Jimenez-Aleixandre and Erduran, 2008). Meanwhile, the undergraduate chemistry laboratory has seen the deployment of a variety of evidence-based instructional models to facilitate productive engagement in science practices (Abi-El-Mona and Abd-El-Khalick, 2006; Abi-El-Mona and Abd-El-Khalick, 2011). Educational research in laboratory settings has mainly focused on two areas: (1) assessment of argument quality, and (2) characterization of argument structure.

Evaluation of argument quality

A significant body of work analyzing chemistry laboratory argument quality has been completed using the Science Writing Heuristic (SWH) and the Argument-Driven Inquiry (ADI) frameworks, which were developed to help foster and assess the efficacy of argumentation-based laboratory curricula (Keys et al., 1999; Sampson et al., 2010). The SWH framework was designed to increase student engagement and learning in the chemistry laboratory while allowing students to carry out their own investigations, make claims based on their own data, and reflect on what they learned from their laboratory experience (Burke and Greenbowe, 2006; Hand and Choi, 2010). Subsequent research under the SWH umbrella has evaluated the analytical and holistic quality of student argumentation and shown a correlation between argument quality and academic performance in the general chemistry lecture course (Choi et al., 2013). The ADI model, on the other hand, guides students to construct arguments that are shared with classmates in a peer review process (Walker et al., 2011). Researchers have observed increases in oral and written argumentation quality when following this type of argument-centered laboratory curriculum (Sampson and Walker, 2012; Walker and Sampson, 2013a; Çetin and Eymur, 2017). In both frameworks, learners’ competence in developing arguments is fostered by creating multiple opportunities for students to make claims, support their findings with evidence, and provide solid rationales to support their ideas (Walker et al., 2012; Walker et al., 2019). Overall, work within these two approaches has demonstrated increased quality of written arguments over time in both general chemistry and organic chemistry laboratory courses (Hand and Choi, 2010; Walker and Sampson, 2013a; Hosbein et al., 2021).

Other instructional approaches have also demonstrated efficacy at promoting increased student argument quality. For example, Katchevich et al. (2013) found that students who conducted inquiry laboratory experiments constructed higher quality arguments than those they produced during confirmatory experiments. Results from studies in chemistry laboratories are in line with those from investigations in biology and physiology education that have shown increased engagement with argumentation and conceptual knowledge as a result of using inquiry-based laboratory curricula (Reiser et al., 2001; Colthorpe et al., 2017; Cronje et al., 2013; Carmel et al., 2019). These studies reveal clear alignment between higher argument quality and laboratory curricula that promote engagement in science practices and argumentation. However, more research is needed to identify which specific features of these laboratory implementations lead students to construct higher quality arguments.

Characterization of argument components

Characterization of student arguments has emphasized a domain-general approach to understanding how students build their claims, select evidence, and generate rationales. For example, several studies have shown that students are more likely to construct higher quality arguments when they adequately connect their claims and evidence (Sandoval and Millwood, 2005; Grimberg and Hand, 2009; Choi et al., 2013; Katchevich et al., 2013; Walker and Sampson, 2013b). However, Kuhn (1991) showed that students often provide unsubstantiated claims that lack evidentiary support, and Brem and Rips (2000) indicated that students sometimes replace missing evidence with highly implicit reasoning. Students may also fail to explicitly link empirical evidence to their claims, present their arguments as self-evident without adequate support from their experimental findings (Brem and Rips, 2000; McNeill and Krajcik, 2007), and focus on individual pieces of laboratory data or evidence when building arguments (Bell and Linn, 2000). Nevertheless, they seem capable of properly connecting claims and evidence as well as coordinating theory with empirical data in particular contexts (Jimenez-Aleixandre, 2008; McNeill and Krajcik, 2009). Thus, more research is needed to understand when and how students build such connections, and what educational structures and strategies help them coordinate these elements properly (Carey and Smith, 1993; Driver et al., 2000; Sampson and Clark, 2008).

Although students’ general abilities to build arguments have been investigated thoroughly, there are fewer reports on how learners use specific disciplinary knowledge to coordinate various argument components. For example, Stowe and Cooper (2019) found that students could analyze and integrate spectroscopic data from various sources in their arguments, but they struggled to connect their evidence to a reasonable chemical claim. In the present work, we aim to gain additional insights in this area by exploring how students’ use of specific chemical data in various types of experiments impacts both the nature and quality of their post-lab arguments.

Factors in the chemistry laboratory

The chemistry laboratory is a complex learning environment in which students need to consider a wide array of empirical evidence and integrate their data with diverse chemical ideas and concepts to build meaning based on their laboratory findings. Adding to this intricate web, different experiments are designed with various explicit and implicit goals, outcomes, approaches, and procedures (Domin, 1999). Criswell (2012) described the chemistry laboratory as a complex combination of context, goals, actions, tools, and interactions, each of which has the potential to impact student argumentation. Existing research on argumentation in the laboratory alludes to various factors that can impact students’ experiences and ability to coordinate empirical evidence and theoretical concepts (Kadayifci et al., 2012). These factors include explicit prompts and instructional scaffolds (McNeill et al., 2006; Cooper, 2015; Cooper and Stowe, 2018), instructional complexity (Smith et al., 2006; Berland and McNeill, 2010), instructional style (Grooms, 2020), and collaborative argumentation practices (Sampson and Clark, 2011).

Among the various factors that can affect the nature and quality of students’ arguments in the undergraduate organic chemistry laboratory, we investigated the impact of activity framing in our previous work (Petritis et al., 2021). The frame of a student's educational experience is defined as a “set of expectations an individual has about the situation in which she finds herself that affect what she notices and how she thinks to act” (Hammer et al., 2005). Existing research suggests that students’ perceived frames impact their engagement and participation in argumentation (Berland and Hammer, 2012). Our research helped further elucidate this effect by engaging students in a single lab activity framed in two different ways: a predict-verify frame and an observe-infer frame (Petritis et al., 2021). Through our analysis of both domain-specific and domain-general features, we discovered that framing impacted the level of integration of evidence and theory, the specificity of student claims, the alignment of arguments, and the approach to reasoning that students followed in their arguments. In the present study, we expanded our analysis to a wider set of organic chemistry experiments, seeking to identify other factors that may significantly affect the nature and quality of student argumentation in undergraduate organic chemistry labs.

Research study

Research goals and questions

The main goal of this research study was to identify and characterize different factors that could affect the nature and quality of student written argumentation in the undergraduate organic chemistry laboratory. To accomplish this goal, this research project followed eight laboratory experiments in the first semester of a 200-level organic chemistry laboratory at a U.S. university. Specifically, we aimed to characterize various components of student arguments to better understand the factors that impacted the nature and quality of student claims, evidence, and rationales as they made sense of their laboratory findings. We used the following research questions to guide our analysis and presentation of results:

(1) In what ways are the eight experiments similar and different in terms of the nature and quality of student argumentation?

(2) Which factors are most impactful and how do these factors affect the nature and quality of student argumentation across each of the eight experiments of interest?

Laboratory setting

The research study described herein was conducted during the first semester of a two-semester sequence of organic chemistry laboratory courses at a public research-intensive university in the southwestern US. Students enrolled in the course attended a three-hour laboratory section every week. Each weekly session began with a pre-laboratory lecture and discussion designed to introduce students to the laboratory techniques and core chemical concepts and ideas relevant to each experiment. The next one to two hours were spent conducting experiments as described in Kelley's (2019) laboratory workbook. Students used the remaining time during each laboratory session to analyze data and construct arguments using a claims–evidence–rationale scaffold. Following this framework, students were guided to make claims that “describe what happened”, provide evidence in the form of “data to support [their] claim”, and provide a rationale to “connect [their] evidence to [their] claim” (Kelley, 2019). Fig. 1 shows an example of a post-lab argument constructed following this claims–evidence–rationale framework. Student reports from the eight experiments were analyzed to characterize similarities and differences in the arguments built by study participants. These eight experiments were categorized into five types based on the goals and tasks presented to students in the laboratory: (a) data collection, analysis, and interpretation; (b) extraction and characterization; (c) identification and structure elucidation; (d) prediction–verification; and (e) synthesis and characterization. Basic characteristics of the experiments in each of these categories are summarized in Table 1. Experiments were completed in lab in the same order as presented in this table.
Fig. 1 Example of a post-lab argument from the thin-layer chromatography (TLC) laboratory experiment. For their arguments, students were asked to compare the relative polarities of the compounds they separated from a mixture.
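
Because retention factor (Rf) values anchor the evidence in several of the arguments discussed below (e.g., Fig. 1 and the column chromatography example in Table 5), recall that the retention factor is computed from TLC plate measurements as

$$R_f = \frac{\text{distance traveled by the compound}}{\text{distance traveled by the solvent front}}$$

A compound that migrates with the solvent front therefore has Rf ≈ 1, as in the student calculation “3 cm/3 cm = 1” in Table 5; in a nonpolar mobile phase, a higher Rf value indicates a less polar compound.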
Table 1 Laboratory experiments performed in the targeted organic chemistry laboratory course and the number of study participants, post-lab reports, and arguments collected and analyzed. Experiments were completed in lab in the order presented in this table
Category | Experiment | Description | Student N | Post-lab report N | Argument N
Data collection, analysis, and interpretation experiments (DC) | Thin-layer chromatography (TLC) | Students were presented with two pure substances, a mixture with these two components, and three laboratory solvents to separate and identify the two substances in the mixture. They collected data involving the movement of their substances on a silica gel TLC plate and analyzed these data by calculating retention factor (Rf) values. Interpretation of these data allowed students to investigate the relative polarities of the substances and observe differences in behavior | 59 | 59 | 177
Data collection, analysis, and interpretation experiments (DC) | Infrared (IR) spectroscopy | Students were tasked with preparing a sample for analysis using IR spectroscopy. Students collected spectroscopic data in the form of IR spectra and analyzed these data by identifying the wavenumbers of the peaks shown. Interpretation of these data allowed students to identify various bond types and functional groups in molecules of interest | 68 | 68 | 208
Extraction and characterization experiment (EC) | Column chromatography (CC) | Students were given a hexane extract from ground, raw spinach leaves. This extract contained a mixture of several organic compounds, including the β-carotene compound that students were tasked to isolate using column chromatographic techniques. Students assessed the identity and purity of their isolate using thin-layer chromatography (comparing to a pure standard) | 99 | 99 | 198
Identification and structure elucidation experiments (ISE) | Gas chromatography (GC) | Students performed a transesterification reaction to convert an unknown triglyceride into its component fatty acid methyl esters. They recorded qualitative data about the reactions and collected chromatographic data involving the composition of their product mixture. These data were then used to craft arguments that identified their unknown starting material based on comparisons to known GC standards | 88 | 59 | 161
Identification and structure elucidation experiments (ISE) | Nuclear magnetic resonance (NMR) spectroscopy | Students were assigned an unknown compound, prepared their own NMR sample, and collected both 1H NMR data and an IR spectrum. These data were accompanied by the molecular formula and 13C NMR data that were provided to each student for their respective unknown compound. Using the provided and collected spectroscopic data, students were tasked with elucidating the structure of their unknown compound and developing an argument that rationalized their structural choice based on their data | 170 | 170 | 170
Prediction-verification experiments (PV) | Substitution reactions (SR) | Students explored the behavior of eight known alkyl halide starting materials under two sets of reaction conditions: SN1-favorable conditions (AgNO3 in ethanol) and SN2-favorable conditions (NaI in acetone). They used background information about solvent environment (protic vs. aprotic) and molecular structure (methyl, 1°, 2°, or 3° compounds) to predict the reactivity of their eight starting materials under each set of reaction conditions | 126 | 56 | 162
Prediction-verification experiments (PV) | Elimination reactions (ER) | Students performed two separate elimination reactions using known starting materials: (1) the acid-catalyzed dehydration of 2-butanol (E1), and (2) the base-catalyzed dehydrohalogenation of 2-bromobutane (E2). They used background information about solvent environment, reaction mechanisms, and the molecular structure of their expected products to predict the major and minor products in these reactions | 140 | 70 | 210
Synthesis and characterization experiment (SC) | Synthesis of esters (SE) | Students performed a Fischer esterification reaction in which they refluxed acetic acid, a catalytic amount of sulfuric acid, and an alcohol starting material of their choice to produce a fragrant ester product. Students recorded qualitative data regarding the fragrance of their reaction in addition to collecting both 1H NMR and IR spectroscopic data for their observed product. Arguments were then constructed to characterize their synthesized product | 176 | 69 | 207


Data collection and participants

Most students enrolled in the targeted laboratory course were non-chemistry, science majors (e.g., biology, engineering, physiology) who were concurrently enrolled in the first semester of a two-semester organic chemistry sequence. The course was structured to have up to 24 students divided into individual laboratory sections, each of which was led by a Graduate Student Instructor (GSI). A total of 13 laboratory sections (led by 11 different GSIs in total) were randomly selected and consented for participation across the Spring 2019 and Fall 2019 semesters. Recruitment of students in each of these laboratory sections was done verbally at the start of the semester before any laboratory experiments were conducted. In accordance with Institutional Review Board (IRB, protocol #1901297974) policies, all participants consented to participate in the study, and data collection, storage, and analysis were conducted following approved IRB guidelines.

Students completed each laboratory experiment following the procedures described in their laboratory workbook and the instructions of their respective GSI. After completion of the in-lab portion of each experiment, students wrote a post-laboratory report in which they constructed arguments following a claim–evidence–rationale framework. Post-lab arguments were constructed individually, in pairs, or in groups of three to five depending on the instructional decisions made by each GSI. Regardless of whether post-lab arguments were written by individual or multiple students, each argument was counted only once during the data collection and analysis processes. Each argument contained an individual claim, evidence, and rationale component as identified by the student(s) and was handwritten into the post-lab argumentation scaffold available to each student in their laboratory workbook (Fig. 1). Arguments collected from each of the eight laboratory experiments of interest served as the primary source of data for this research study, and analysis of these arguments is presented herein.

Table 1 summarizes student participation, post-lab report collection, and the total number of arguments analyzed for each laboratory experiment. Post-lab arguments produced by the research participants were collected following each laboratory session. Each post-lab report was de-identified, scanned, and immediately returned to the respective GSI. Post-lab arguments were then transcribed and used for qualitative coding analysis and quantitative characterization of argument quality.

Qualitative coding analysis

Qualitative data analysis was conducted by adapting a previously established coding scheme (Petritis et al., 2021). This coding scheme focused on domain-general components of arguments that illustrated the focus and connectivity of student claims, evidence, and rationale components (see Table 2). The same previously reported domain-general coding categories were used to characterize the arguments produced by students from each of the eight laboratory experiments analyzed in this study. They included specificity, explicitness, completeness, differentiation, integration, alignment, and approach to reasoning. These domain-general codes served as our primary method for comparing and contrasting the arguments crafted during each experiment.
Table 2 Our previously established qualitative coding scheme for the domain-general coding categories, with examples of each code from different experiments
Category | Description | Examples
Specificity | Characterized students’ claims as either “case-specific” or “class-level.” “Case-specific” claims referred to specific findings that students made based on their data. “Class-level” claims identified general inferences made about types of substances
  Case-specific: “Unknown compound 4 is corn oil” (GC)
  Class-level: “The most polar compounds have a high retention time” (GC)
Explicitness | Highlighted the clarity with which students expressed both their evidence and rationale in their arguments. The “explicit evidence” code identified when students clearly identified the laboratory data they collected. The “implicit evidence” code identified instances when students did not clearly include experimental evidence that supported their argument. The “explicit rationale” code referred to rationales that were clearly described and did not require additional inference. The “implicit rationale” code identified rationales that lacked clarity in supporting their inference
  Explicit evidence: “Benzil had an Rf value = 0.74 in our 75:25 hexane:acetone mixture” (TLC)
  Implicit evidence: “The solvent mixture of hexane and acetone shows that benzil is nonpolar” (TLC)
  Explicit rationale: “The IR spectra found that there was one O–H bond in the molecule. This coincides with the NMR spectra, as there is a single hydrogen very close to an oxygen atom” (NMR)
  Implicit rationale: “I think this structure is correct because the chemical shift and splitting patterns helped me determine which H's were next to each other” (NMR)
Completeness | Characterized how thoroughly students presented the necessary evidence and rationale for their argument. The “complete evidence” code identified instances where students provided a detailed account of their experimental observations. The “incomplete evidence” code highlighted when the evidence provided lacked sufficient detail. The “complete rationale” code was applied to rationales that sufficiently outlined how their experimental evidence justified the claim made in their argument. “Incomplete rationales” lacked pertinent details to make sense of the argument being presented
  Complete evidence: “Bond = Wavenumber: O–H = 3331 cm−1, C=C = 1653 cm−1, C–H = 2881 cm−1” (IR)
  Incomplete evidence: “The different wavenumbers at the varying percent transmittance values” (IR)
  Complete rationale: “Trans-2-butene occurred the most (larger peak) in GCs for E1 and E2. Trans-2-butene is the most stable and substituted alkene so it would occur the most compared to 1-butene” (ER)
  Incomplete rationale: “The least stable product will form the least and the major and minor products will form” (ER)
Differentiation | Identified instances in which students compared or contrasted the substances, properties, reactions, and behaviors related to their laboratory experiments. This coding category was used to characterize the claims, evidence, and rationale components. The “multiple” code identified instances where students referred to similar behaviors or properties. The “single” code was used when only individual substances or reactions were referred to
  Multiple: “Esters have no hydrogen bonding, so compared to the reactants in this reaction, the ester is more volatile” (SE)
  Single: “Our ester should have had a nondescript fruity smell. Since our product did have a fruity smell, we know we had an ester” (SE)
Integration | Characterized the level of coordination of chemical concepts and experimental observations in student rationales. The “integrated” code applied to rationales that connected student ideas and laboratory findings. The “fragmented” code highlighted when students separately discussed their experimental observations and chemical knowledge without attempting to connect their ideas
  Integrated: “As shown on the TLC plate, there is only one spot for the isolated carotenes that matches the β-carotene standard. Also, the Rf value for each isolated carotene matches the Rf value for the β-carotene standard.” (CC)
  Fragmented: “Because the Rf value for our isolate was the same Rf value for the β-carotene structure” (CC)
Alignment | Characterized arguments that had a consistent focus between the claims and rationale components. “Aligned” arguments demonstrated instances where student claims and rationale components presented a coherent focus. “Misaligned” arguments failed to demonstrate a coherent focus between the claims and rationale
  Aligned: “Claim: SN2 reactions happened faster than SN1 reactions. Rationale: In general, the SN2 reactions were observed to be faster than the SN1 reactions due to the fact that it's a one-step reaction, also because SN2 usually occurs with primary carbons, which is less hindered” (SR)
  Misaligned: “Claim: Tert-butyl chloride did not react as expected. Rationale: The tertiary carbon favors an SN1 reaction because it is the most stable which means it has the lowest transition state energy” (SR)
Approach to reasoning | Identified the line of reasoning employed by students as they rationalized their claims and was characterized as “deductive”, “inductive”, or “hybrid”. The “deductive” code highlighted students applying general chemical principles. The “inductive” code was used when students rationalized general claims with their experimental findings. The “hybrid” code identified claims supported by both experimental data and general chemical rules and principles
  Deductive: “When using TLC, molecules travel further in solvents with similar polarities. It is known that methanol is polar, so benzoin is polar” (TLC)
  Inductive: “The Rf value for β-carotene was very close to one, meaning that the vitamin moved closely along with the nonpolar mobile phase, which means the β-carotene is also nonpolar” (TLC)
  Hybrid: “Nonpolar components would result in it traveling further up the plate, which it did with an Rf value of 0.92. The pure β-carotene has C=C and C–H bonds, which makes it very hydrophobic” (TLC)


Qualitative analysis began with the first author (a graduate student) randomly selecting post-lab reports for the eight laboratory experiments of interest. A qualitative codebook, which included all domain-general coding categories, was developed for each of these experiments. Coding categories and codes were first identified and applied by the graduate student researcher and discussed with the second author. The two researchers independently coded each argument component from the selected reports for a given laboratory experiment and subsequently met to discuss their respective coding decisions. Discussions continued until both researchers agreed on their coding choices for each analyzed argument. This iterative process was followed until consensus was reached on at least 25% of the collected arguments for each of the eight laboratory experiments of interest. The remaining arguments were qualitatively coded by the graduate student researcher.

Quantitative analysis of argument quality

Following our qualitative coding analysis, we sought to characterize the quality of student arguments for each of the eight laboratory experiments. We identified four of our seven domain-general coding categories as indicative of the overall quality of an argument: the explicitness, completeness, integration, and alignment coding categories. For each of these four coding categories, one of the two possible codes was assigned a higher-quality argument score. For example, in the integration coding category for the rationale component, the code “integrated” was assigned a higher numerical value than “fragmented”. Similarly, the explicitness, completeness, and alignment coding categories each contained a more highly valued code. Arguments coded as “explicit” (evidence and rationale), “complete” (evidence and rationale), “integrated” (rationale), and “aligned” (rationale) were judged to be of higher quality.

To quantitatively characterize the quality of arguments from each experiment, we assigned a value of “1” to argument components characterized as “explicit”, “complete”, “integrated”, and “aligned”, and a value of “0” to argument components coded as “implicit”, “incomplete”, “fragmented”, and “misaligned”. The explicitness and completeness coding categories were both counted twice in this analysis, as these categories were applied to both the evidence and rationale components in our qualitative coding analysis. Thus, each argument received a score for each of the six components, yielding a total quality score in the range of 0 to 6. The remaining domain-general codes not deemed indicative of argument quality (i.e., specificity, differentiation, and approach to reasoning) were assigned arbitrary values. For example, the approach to reasoning coding category had three possible codes, “deductive”, “inductive”, and “hybrid”, which were assigned a “2”, “1”, and “0”, respectively. A total of 1493 arguments (each consisting of one claim, one evidence, and one rationale component) were analyzed and categorized in this manner.
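
To make this scoring concrete, the fragment below is a minimal sketch in R (the software used for our analysis) of how such a 0–6 quality score can be computed; the column names and the two toy arguments are hypothetical shorthand, not the study's actual variables or records.

```r
# Minimal sketch of the argument-quality scoring described above.
# Column names and the two toy arguments are hypothetical, not study data.
args_df <- data.frame(
  explicitness_evidence  = c("explicit",   "implicit"),
  explicitness_rationale = c("implicit",   "explicit"),
  completeness_evidence  = c("complete",   "incomplete"),
  completeness_rationale = c("incomplete", "complete"),
  integration_rationale  = c("integrated", "fragmented"),
  alignment_rationale    = c("aligned",    "misaligned")
)

# Codes judged to indicate a higher-quality argument score 1; all others score 0.
high_quality <- c("explicit", "complete", "integrated", "aligned")

# Sum the six binary component scores to obtain a 0-6 quality score per argument.
args_df$quality <- rowSums(sapply(args_df, function(codes) codes %in% high_quality))

args_df$quality  # 4 and 2 for the two toy arguments above
```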

The R statistical software (Windows, version 4.0.3) was used to run all statistical analyses for this research study. The frequency of occurrence of each qualitative code was calculated for each of the eight laboratory experiments in our study. We used the chi-square (χ2) test for independence to investigate the association between our qualitative codes and each laboratory experiment, as well as to compare overall argument quality across our eight experiments. Statistically significant associations between each code and laboratory experiment were investigated at the α = 0.05 level. In addition, we used the R statistical software to calculate the standardized chi-square residual values for the association between each combination of the categorical variables (qualitative codes and laboratory experiment).
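
As a sketch of this procedure, the R fragment below runs a chi-square test of independence and extracts standardized residuals for one coding category. The cell counts are reconstructed approximately from the relative frequencies in Table 3 and the argument Ns in Table 1, so they are illustrative rather than the study's raw tallies.

```r
# Sketch of the chi-square analysis for the integration coding category.
# Counts are approximate reconstructions from Table 3 percentages and Ns.
code_counts <- matrix(
  c(45, 132,   # TLC: integrated, fragmented
    80, 128,   # IR
   120,  78,   # CC
   140,  21,   # GC
    56, 114,   # NMR
   111,  51,   # SR
    36, 174,   # ER
    24, 183),  # SE
  nrow = 2,
  dimnames = list(code = c("integrated", "fragmented"),
                  experiment = c("TLC", "IR", "CC", "GC", "NMR", "SR", "ER", "SE"))
)

# Test the association between code and experiment at the alpha = 0.05 level.
test <- chisq.test(code_counts)
test$p.value

# Standardized residuals; cells with |value| > 2 differ significantly
# from the expected frequency (cf. Table 4).
round(test$stdres, 2)
```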

Results

In the following sections, we present results on: (1) the characterization of similarities and differences of written arguments for the eight experiments we explored in this research study, and (2) the analysis of several factors and their impact on the nature and quality of laboratory arguments.

In what ways are the eight experiments similar and different in terms of the nature and quality of student argumentation?

Students’ arguments in the eight laboratory experiments varied in structure depending on the coding category of interest. Our qualitative analysis of the nature of student argumentation focused on the domain-general categories described in Table 2. Table 3 displays the relative frequency of each code broken down by laboratory experiment. Table 4 summarizes the chi-square residual values for each coding category across the eight experiments.
Table 3 Relative frequency of domain-general codes for the claims, evidence, and rationale components of student arguments for each experiment
Coding category Codes TLC (%) (N = 177) IR (%) (N = 208) CC (%) (N = 198) GC (%) (N = 161) NMR (%) (N = 170) SR (%) (N = 162) ER (%) (N = 210) SE (%) (N = 207)
‘C’ indicates a code assigned to student claims, ‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale.
Specificity Case-specific 94.9 96.2 100.0 82.6 100.0 63.0 77.1 92.8
Class-level 5.1 3.8 0.0 17.4 0.0 37.0 22.9 7.2
Explicitness Explicit (E) 84.7 92.3 100.0 69.6 76.5 42.6 65.7 36.2
Implicit (E) 15.3 7.7 0.0 30.4 23.5 57.4 34.3 63.8
Explicit (R) 44.1 49.0 59.1 43.5 40.0 39.5 50.5 46.4
Implicit (R) 55.9 51.0 40.9 56.5 60.0 60.5 49.5 53.6
Completeness Complete (E) 44.1 88.5 100.0 26.1 51.8 48.8 16.2 4.3
Incomplete (E) 55.9 11.5 0.0 73.9 48.2 51.2 83.8 95.7
Complete (R) 9.6 68.3 24.2 26.1 24.7 30.9 4.8 10.1
Incomplete (R) 90.4 31.7 75.8 73.9 75.3 69.1 95.2 89.9
Differentiation Multiple (C) 47.5 0.0 1.5 13.0 0.0 27.8 58.1 0.0
Single (C) 52.5 100.0 98.5 87.0 100.0 72.2 41.9 100.0
Multiple (E) 25.4 0.0 0.0 17.4 0.0 25.9 59.0 2.9
Single (E) 74.6 100.0 100.0 82.6 100.0 74.1 41.0 97.1
Multiple (R) 33.2 4.8 83.3 87.0 30.6 32.1 83.8 8.7
Single (R) 66.8 95.2 16.7 13.0 69.4 67.9 16.2 92.3
Integration Integrated 25.4 38.5 60.6 87.0 32.9 68.5 17.1 11.6
Fragmented 74.6 61.5 39.4 13.0 67.1 31.5 82.9 88.4
Alignment Aligned 88.1 96.2 97.0 100.0 82.4 79.6 87.6 95.7
Misaligned 11.9 3.8 3.0 0.0 17.6 20.4 12.4 4.3
Approach to reasoning Deductive 52.5 26.0 34.8 34.8 4.7 72.2 50.5 11.6
Inductive 47.5 69.2 56.1 65.2 76.5 13.0 41.9 88.4
Hybrid 0.0 4.8 9.1 0.0 18.8 14.8 7.6 0.0


Table 4 Chi-square residuals of domain-general codes for the claims, evidence, and rationale components of student arguments for each experiment
Coding category Codes TLC (N = 177) IR (N = 208) CC (N = 198) GC (N = 161) NMR (N = 170) SR (N = 162) ER (N = 210) SE (N = 207)
‘C’ indicates a code assigned to student claims, ‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale.
a: Chi-square residual value that is statistically significantly greater (>2) or less (<−2) than the expected frequency of each code for all eight experiments.
Specificity Case-specific 0.871 1.134 1.681 −0.827 1.557 −3.484a −1.785 0.612
Explicitness Explicit (E) 2.124a 3.595a 4.789a −0.256 0.804 −4.323a −0.953 −5.971a
Explicit (R) −0.560 0.439 2.493a −0.643 −1.323 −1.383 0.745 −0.121
Completeness Complete (E) −0.698 8.515a 10.659a −3.969a 0.769 0.198 −6.610a −9.030a
Complete (R) −6.487a 13.159a 0.134 0.601 0.248 1.850 −5.651a −4.022a
Differentiation Multiple (C) 9.002a −6.190a −5.542a −1.589 −5.596a 2.775a 13.397a −6.175a
Multiple (E) 2.960a −5.842a −5.700a 0.307 −5.282a 2.990a 15.253a −4.799a
Multiple (R) −2.517a −8.626a 8.078a 7.971a −2.781a −2.428a 8.422a −7.771a
Integration Integrated −3.235a −0.570 4.311a 9.120a −1.639 5.472a −5.398a −6.606a
Alignment Aligned −0.412 0.765 0.867 1.184 −1.194 −1.529 −0.527 0.687
Approach to reasoning Deductive 4.262a −1.965 0.231 0.194 −6.537a 7.910a 4.128a −5.511a
Inductive −2.053a 1.851 −0.600 0.967 2.899a −7.527a −3.281a 5.428a
Hybrid −3.460a −1.085 1.258 −3.300a 6.045a 3.992a 0.476 −3.742a


Specificity. The majority of student claims across all eight experiments were characterized as case-specific, including those from the column chromatography and NMR experiments, which contained solely case-specific claims. For example, the column chromatography example in Table 5 has a case-specific claim referring to the “greatly pure” isolated “carotenes” with which the students worked. Class-level claims were sparse for most experiments, with the GC (17.4%), elimination reaction (22.9%), and substitution reaction (37.0%) experiments exhibiting the greatest frequency of such claims. Class-level claims, as exemplified by the substitution reaction claim in Table 5, were inferences about general classes of molecules, reactions, or substances (“Carbons not sp3 hybridized”). The two prediction-verification experiments were characterized by the most class-level claims, the majority of which concerned reaction pathways in both the substitution reaction (90.1%) and elimination reaction (61.9%) experiments. However, chi-square analysis, as summarized in Table 4, demonstrated that only the substitution reaction experiment had fewer case-specific claims (more class-level claims) than expected across all eight experiments. Thus, the experiments did not vary greatly with regard to the specificity of the claims students identified in their arguments.
Table 5 Examples of three student written arguments: one from the column chromatography experiment, one from the substitution reaction experiment, and one from the elimination reaction experiment
Experiment | Claims | Evidence | Rationale
Column chromatography experiment | The isolated carotenes look greatly pure as the carotenes from the spinach matched the ones from the pure liquid | Rf value of β-carotene = 3 cm/3 cm = 1. Rf value of spinach extract = 3 cm/3 cm = 1 (Included drawing with labeled TLC plate) | The extremely nonpolar carotene from the liquid traveled as a nonpolar would to the top. The isolated spinach carotenes traveled as much as the pure form did which shows that it was nicely purified to get even match the exact same Rf values of 1 as the nonpolar carotene would show. It would match the nonpolar carotenes with the high Rf value; high Rf value = very least polar
Substitution reaction experiment | Carbons not sp3 hybridized will not undergo substitution reactions | Bromobenzene did not undergo SN1 or SN2 reactions | AgNO3 and ethanol solution favors SN1. NaI and acetone favors SN2 – neither reaction took place
Elimination reaction experiment | E2 reactions happen quicker than E1 reactions | E2 reactions happen faster because they are one step and only depend on the concentration of our alkyl halide and the base | Our E2 test tubes filled with gas more quickly than our E1 tubes. We knew it would react quickly because it has a strong base reacting with our alkene


Explicitness. Six of the eight experiments were characterized by a majority of arguments coded as having an explicit evidence component. Consider the evidence component of the column chromatography argument listed in Table 5. The student clearly identified their TLC plate measurements, presented their calculation of the corresponding Rf values for both the β-carotene standard and the spinach extract, and drew a labeled TLC plate as a visual guide, leading to the assigned explicit evidence code. The TLC (84.7%), IR (92.3%), and column chromatography (100.0%) experiments were most often characterized by arguments with explicit evidence components, each of which was more frequently coded as explicit than expected across all eight laboratory experiments. Conversely, the explicit evidence code appeared least often for the substitution reaction (42.6%) and synthesis of esters (36.2%) experiments, both falling below the expected frequency of the explicit evidence code. The arguments from these experiments more often included implicit evidence, as highlighted by the evidence provided for the substitution reaction example in Table 5. In this case, the student identified that “Bromobenzene did not undergo SN1 or SN2 reactions” without clearly detailing specific observations made while running these separate sets of reactions. Because one has to infer that no precipitation, color change, or cloudiness was observed throughout either of the SN1 and SN2 reactions described, this evidence component was coded as implicit for this student's argument.

Chemical rationales across each experiment did not vary greatly with regard to how explicitly these ideas were supported in student arguments. Arguments across most experiments were either roughly evenly split between the explicit and implicit rationale codes (IR and elimination reaction experiments) or showed a slight preference for implicit rationales (TLC, GC, NMR, and synthesis of esters experiments). Consider the following argument from the GC experiment:

Claim: “Different length fatty acid carbon chains exhibit different retention times.”

Evidence: “Carbon chains with 12, 14, 16, and 18 carbons show different peaks on GC spectra.”

Rationale: “We had peaks showing carbon chains of 12, 14, 16, and 18 carbons in our unknown fat.”

In this argument, the student supported their class-level claim about fatty acid carbon chain length and retention time with a rationale that did not clearly connect the chain lengths observed in their “unknown fat” to the “different peaks on GC spectra” they collected. Thus, this chemical rationale implicitly rationalized the presence of “different length fatty acid carbon chains” without using their data to complete the inference. On the other hand, only the column chromatography experiment (59.1%) was characterized by chemical rationales that were more often coded as explicit than expected across all experiments. Rationales produced during the column chromatography experiment were often clearly supported by coordinating ideas related to observed reaction conditions (72.7%), chemical properties (75.8%), and molecular structures (37.9%) of the molecules with which students worked, as well as the chromatographic data (87.9%) collected from their experiment. As exemplified by the rationale in Table 5, the student clearly rationalized the TLC plate behavior of each β-carotene sample, as indicated by Rf values, and connected these ideas to the expected polarity of the compound.

Completeness. The degree of completeness of student evidence varied greatly depending on which experiment students performed. The IR (88.5%) and column chromatography (100.0%) experiments displayed the greatest frequency of complete evidence components, in which students identified all the necessary supporting evidence to justify their proposed claim. Again, consider the evidence component of the column chromatography experiment argument shown in Table 5. This student identified all the relevant pieces of empirical data associated with their experiment and provided an illustration of their TLC plate as support. Similarly, reflect on this argument from the IR experiment:

Claim: “Limonene has alkane and alkene functional groups.”

Evidence: “The wavenumber values from the IR spectroscopy: C–H was 3010 cm−1 and C=C was 1644 cm−1.”

Rationale: “The known values for these two bonds are 3100–2900 cm−1, which matches our result of 3010 cm−1. The 1650 cm−1 for C=C compares to 1644 cm−1 experimental value.”

In this argument, the student provided evidence regarding their observed IR peaks, making specific reference to the wavenumbers collected in their spectra and the bond types identified by each peak related to their compound of interest. Conversely, the GC (73.9%), elimination reaction (83.8%), and synthesis of esters (95.7%) experiments showed a strong preference for incomplete evidence components. Arguments coded with incomplete evidence for these experiments often included individual pieces of data and, thus, excluded key pieces of information needed to support the proposed chemical inference. For example, consider the elimination reaction argument shown below.

Claim: “E1 reaction produced a greater percentage of 2-butene for its products.”

Evidence: “E1 reaction was completed with 0.4 mL of 2-butanol and 0.6 mL of an acid mixture.”

Rationale: “E1 2-butene total area was greater than 1-butene. This was not true for E2 reaction.”

In this argument, the student made a claim implicitly comparing the amount of products produced in the E1 reaction to the E2 reaction. The evidence referenced the reaction conditions of the E1 reaction but failed to provide experimental data that could support the claim regarding the “percentage of 2-butene.” Although arguments from the elimination reaction experiment often relied on chromatographic data (85.7% of arguments) as evidence, these arguments failed to detail all the necessary evidence needed to support the proposed claim.

Arguments across each experiment (except the IR experiment) were also most frequently characterized as having incomplete rationale components. Incomplete rationales excluded key features of either laboratory data or conceptual knowledge needed to support the claims students made. Again, consider the elimination reaction example shown in the previous paragraph. This student rationalized that the “E1 2-butene total area was greater than 1-butene” without making specific reference to the area of the peaks that would support this inference. Similarly, they stated that “This was not true for E2 reaction” without detailing the basis for the rationale they employed in that argument. Incomplete rationales were most common in the synthesis of esters (89.9%), TLC (90.4%), and elimination reaction (95.2%) experiments, while the IR experiment remained the only experiment in which students more frequently provided complete rationales (68.3%) in their arguments. Looking at the IR argument shown above, the student rationalized the claim “Limonene has alkane and alkene functional groups” by saying “The known values for these two bonds are 3100–2900 cm−1, which matches our result of 3010 cm−1. The 1650 cm−1 for C=C compares to 1644 cm−1 experimental value.” This rationale included both the wavenumbers of the collected data and the theoretical wavenumbers for each peak observed in their molecule.

Differentiation. Students compared and contrasted the substances and reactions observed in each of the eight experiments. Unlike the specificity coding category, there was greater variability observed for differentiation across the claims, evidence, and rationale components of their arguments. Several experiments included arguments which were characterized by very few or no differentiations across all argument components. For instance, consider the following example from the IR experiment:

Claim: “cis-3-hexen-1-ol has an O–H functional group.”

Evidence: “The IR spectroscopy results for cis-3-hexen-1-ol show a wavenumber 3335.20 cm−1 with a curved dip.”

Rationale: “On IR spectra, O–H bond is shown as a curved dip with a broad wavenumber range between 3600–2900 cm−1 (theoretically).”

In this argument, the student made a claim about the presence of the O–H functional group in their cis-3-hexen-1-ol substance, reported the wavenumber (in cm−1) of their observed peak as evidence, and compared this bond type with what was theoretically expected for the O–H bond in their rationale. The focus of this argument was a single substance, a single observed peak, and a single bond type and, thus, each component of the argument was coded as “single” for the differentiation coding category. Similarly, arguments from the IR, NMR, and synthesis of esters experiments all demonstrated lower-than-expected frequencies of the “multiple” differentiation code for student claims, evidence, and rationale components (Table 4). Conversely, several experiments were relatively comparison-rich. For instance, the majority of student claims, evidence, and rationale components in the elimination reaction experiment were characterized as “multiple” differentiation, including the example shown in Table 5. This student supported their claim about the difference in reaction rate between the E1 and E2 reaction pathways with evidence and rationale that further differentiated their observations and theoretical knowledge concerning the two reaction mechanisms. Additionally, both the column chromatography (83.3%) and GC (87.0%) experiments included rationale components containing predominantly “multiple” differentiation, including the column chromatography example in Table 5, which compared “the exact same Rf values of 1” for the isolated and pure β-carotenes observed by that student.

Integration. Laboratory experiments also displayed great variability in the level of integration of core chemical concepts and ideas with the data/observations students detailed in their rationales. Three experiments were characterized as having more highly integrated rationales: the column chromatography (60.6%), substitution reaction (68.5%), and GC (87.0%) experiments. An argument characterized by an integrated rationale can be seen in the example below from the GC experiment.

Claim: “The unknown sample was linseed oil.”

Evidence: “The 16 carbon peak: 7.6% (GC). The sp3 C–H bond: 3008.31 cm−1 (IR). The C=C and C=O bonds are absent from the IR spectrum.”

Rationale: “The unknown was determined as linseed oil based on the peak for 16-carbon showing ∼6% (7.6%) as expected for linseed oil (palmitic acid) and the rest being composed of other carbons. The data for IR are also coherent with the structure of the fatty acid (only sp3 C–H bonds).”

In this argument, the student made a claim about the identity of their unknown fat starting material in the GC experiment and supported this claim with the presence and absence of IR and GC peaks. They rationalized their claim by describing how both sets of data matched what was “expected for linseed oil (palmitic acid)” based on the materials provided to them about each possible unknown compound. The student coordinated the data they collected with what was known about each of their possible compounds and, thus, their rationale was coded as integrated. In contrast, the majority of laboratory experiments contained arguments whose rationales were characterized as fragmented, in which students failed to adequately coordinate data and observations with background knowledge in support of their proposed claims. Fragmented rationales were most common in the synthesis of esters (88.4%), elimination reaction (82.9%), and thin-layer chromatography (74.6%) experiments. Consider the example shown below from the synthesis of esters experiment.

Claim: “The combination of 1-hexanol (an alcohol) with acetic acid and sulfuric acid results in an ester.”

Evidence: “After the reflux reaction, we wafted the scent of our product.”

Rationale: “What we smelled was characteristic of what we would expect to smell in the ester, propyl acetate.”

In this argument, the student claimed that their Fischer esterification reaction yielded an ester product and supported this claim with the qualitative data, related to the product's smell, that they collected from their reaction. The student rationalized this claim by describing how their product characteristically smelled of propyl acetate. This student relied solely on the smell of their product to support their claim despite having collected both IR and NMR spectral data on their ester product. Due to their inability to coordinate these pieces of evidence in support of their claim, this rationale was coded as fragmented. Additionally, the student made two other arguments that focused independently on the IR and NMR data as they compartmentalized their data analysis and interpretation in support of their proposed claims about the identity of their ester product.

Alignment. This characteristic was fairly consistent across each of our eight experiments, with the “aligned” code ranging in frequency of occurrence from 79.6% for the substitution reaction experiment to 100.0% for the GC experiment. Arguments coded as “aligned” had a consistent focus from claim to rationale, as seen in the column chromatography example in Table 7. In this example, the student made a claim about the “very pure” β-carotene compound they isolated. The student then rationalized this claim by noting that the “Rf value was 0.92” and that “both of them have reached the same height,” referring to the standard that was compared to their isolated β-carotene sample. The synthesis of esters example argument in Table 7 was also coded as “aligned,” as it referred to the synthesized ester in both the claims and rationale components. Arguments coded as “misaligned” were far less common across each experiment we investigated. Consider the following example from the IR experiment:

Claim: “cis-3-hexen-1-ol has alkene, alcohol, and alkane groups.”

Evidence: “O–H bond at 3333 cm−1, C=C bond at 1654 cm−1, and C–H bond at 3008 cm−1. These values match the known values for the functional group wavenumbers.”

Rationale: “The peaks at different wavenumbers vary with the percent transmittance values.”

In this argument, the student made a claim about the functional groups present in their cis-3-hexen-1-ol substance and listed the wavenumber values obtained from their IR spectrum as their evidence. For their rationale, the student described one aspect of IR spectral data analysis. Although their rationale was related to IR analysis, the general statement about IR spectral interpretation (“the peaks at different wavenumbers vary with the percent transmittance values”) was misaligned with their claim about the functional groups present in their substance of interest. Misaligned arguments were fairly uncommon across all eight experiments, with none of the experiments having arguments coded as “misaligned” more or less frequently than expected.

Approach to reasoning. Students’ arguments also varied greatly in this area for each of the eight experiments. The deductive approach to reasoning, in which students supported their specific claims by drawing from general chemical principles, appeared as a slight preference in both the elimination reaction (50.5%) and TLC (52.5%) experiments compared to the inductive approach (41.9% and 47.5%, respectively), but was most frequently employed in arguments from the substitution reaction experiment (72.2%). Consider the following example from the substitution reaction experiment:

Claim: “tert-Butyl chloride reacted very readily through SN1 but not through SN2.”

Evidence: “When tert-butyl chloride was added to the solvent, a precipitate formed as a sediment at the bottom of the vial.”

Rationale: “The molecular structure of this substrate contains a tertiary carbon, which react only by SN1. This explains why the reaction occurred quickly in SN1 but didn’t react through SN2.”

In this argument, the student rationalized their specific claim about the reactivity of tert-butyl chloride with conceptual knowledge about the reactivity of substrates containing tertiary carbons. Many of these arguments were rationalized with chemical principles related to molecular structure (76.5% of rationale components) rather than with observations of reaction conditions (46.3%). Consequently, along with showing the greatest preference for deductive reasoning, arguments from the substitution reaction experiment were characterized by the lowest frequency of the inductive approach to reasoning (13.0%). Arguments such as the substitution reaction example in Table 5 demonstrate the inductive approach to reasoning, in which students supported their general claims (“Carbons not sp3 hybridized will not undergo substitution reactions”) with their laboratory findings (“neither reaction took place”). This approach to reasoning was much more commonly characteristic of arguments from all other experiments, including the IR (69.2%), NMR (76.5%), and synthesis of esters (88.4%) experiments. Consider the synthesis of esters experiment example below:

Claim: “Characteristic peaks show us our verified ester product structure.”

Evidence: “There was a three hydrogen singlet peak downfield.”

Rationale: “The three hydrogen singlet peak matches the structure of isopentyl alcohol confirming our product structure connectivity.”

In this argument, representative of those in the synthesis of esters experiment, the student inductively supported their claim about the successful synthesis of their “ester product” with a rationale that emphasized the NMR spectroscopic data the student collected during the course of their experiment. In addition to showing the highest frequencies of inductive reasoning, arguments from the IR, NMR, and synthesis of esters experiments most often supported their rationales with spectroscopic data (98.1%, 82.4%, and 59.4%, respectively) and least often with reaction conditions (2.9%, 0.0%, and 33.3%, respectively).

The hybrid approach to reasoning, in which students supported their findings with both chemical concepts and observable data, was far less frequent across all laboratory experiments. The NMR (18.8%) and substitution reaction (14.8%) experiments were the only ones to exhibit a higher frequency of this approach to reasoning than expected across all experiments. Consider the NMR experiment rationale example shown below:

Rationale: “Since we know there are three neighbors (since it was a quartet peak), that means there cannot be 4 hydrogens bonded to a carbon (that would be methane). So, that indicates this is including more than one identical group (the integration). Then, since the IR spectra indicates the presence of a C=O bond, it is the next logical step to assume it is what connects the two groups previously determined from the NMR. We also know that since the triple is the most shielded, which is indicated by chemical shift, we know it is furthest away from the oxygen (most electronegative atom).”

In this example, the student made a claim (not shown) in which they proposed the structure for their unknown substance. In support of this case-specific claim, the student drew from both NMR conceptual background and analysis of their own spectroscopic data. By identifying “the most shielded” peak as “indicated by chemical shift,” the student applied NMR principles regarding the position of peaks in reference to their own data about “the triple” peak they observed in their spectrum. Additionally, the student pieced together their proposed structure on the basis of their spectroscopic data in saying “there are three neighbors (since it was a quartet peak).” These simultaneous references to the principles guiding analysis of an NMR spectrum and to their own data analysis demonstrate the hybrid approach to reasoning that was most commonly associated with arguments produced from the NMR experiment. Similarly, in the substitution reaction experiment, students produced a notable proportion of arguments following the hybrid approach to reasoning (14.8% of arguments) in which they supported their claims with rationales referencing molecular structure (76.5% of rationale components) and their observed reaction conditions (46.3% of rationale components).

Quality of student argumentation. Through the analysis of arguments from our eight experiments, we identified several domain-general coding categories that were associated with argument quality: explicitness, completeness, alignment, and integration. Quality scores included in Table 6 were calculated using the procedure described in the data analysis section. This table includes the average score (relative frequency) of each quality-based argument code broken down by laboratory experiment as well as the average overall argument score (from 0 to 6) of each experiment.
Table 6 Average quality (relative frequency) of the six domain-general codes associated with student argument quality for each of our eight experiments
Coding category / Codes
TLC (N = 177, DC) IR (N = 208, DC) CC (N = 198, EC) GC (N = 161, ISE) NMR (N = 170, ISE) SR (N = 162, PV) ER (N = 210, PV) SE (N = 207, SC)
‘E’ indicates a code assigned to student evidence, and ‘R’ indicates a code assigned to student rationale. a Indicates a value of that code statistically significantly different than expected across all eight experiments.
Explicitness Explicit (E) 0.847a 0.923a 1.000a 0.696 0.765 0.426a 0.657 0.362a
Explicit (R) 0.441 0.490 0.591a 0.435 0.400 0.395 0.505 0.464
Completeness Complete (E) 0.441 0.885a 1.000a 0.261a 0.518 0.488 0.162a 0.043a
Complete (R) 0.096a 0.683a 0.242 0.261 0.247 0.309 0.048a 0.101a
Integration Integrated 0.254a 0.385 0.606a 0.870a 0.329 0.685a 0.171a 0.116a
Alignment Aligned 0.881 0.962 0.970 1.000 0.824 0.796 0.876 0.957
Argument quality Scaled 0 to 6 2.96a 4.33a 4.41a 3.52a 3.08 3.10 2.42a 2.04a


The overall average argument score of the 1493 arguments analyzed in this research study was 3.23 out of a possible 6. The column chromatography experiment contained arguments of the highest quality (4.41 out of 6). An example from the column chromatography experiment is shown in Table 7 to demonstrate the assignment of argument quality scores. In this example, the argument presented a clear evidence component that fully detailed the TLC data collected (as well as a drawing of the TLC plate, not shown) and how these data were used to calculate the Rf value of the observed β-carotene compound. This argument was also coded as “explicit” and “complete” for the rationale component, “integrated,” and “aligned” and, thus, received a total quality score of 6 out of 6. The column chromatography experiment led to arguments that, on average, had the highest quality of the eight experiments analyzed in this study, including the greatest quality for the explicitness of evidence, explicitness of rationale, and completeness of evidence as well as the second highest quality of alignment and the third highest quality of integration.

Table 7 Argument quality of two student arguments: one from the column chromatography experiment and one from the synthesis of esters lab
Column chromatography experiment:
Claim: The β-carotene is very pure
Evidence: Labeled TLC plate; Rf for both = 3.5 cm/3.8 cm = 0.92 (drawn and labeled TLC plate included)
Rationale: Since both of them have reached the same height. The pure β-carotene is very nonpolar and it has C=C and C–H bonds which makes it very hydrophobic. The more nonpolar a molecule is, the more distance (Rf) it will travel, which explains why the Rf value was 0.92 which wouldn’t be the case if the β-carotene was mixed with polar chlorophyll which would make the Rf significantly less than 0.92 since the polar molecules have a higher affinity to the silica plate
Scores: Aligned = 1; Explicit (E) = 1; Explicit (R) = 1; Complete (E) = 1; Complete (R) = 1; Integrated = 1. Overall score = 6

Synthesis of esters experiment:
Claim: We got our ester
Evidence: The 1H NMR has a characteristic singlet
Rationale: The singlet on the 1H NMR is characteristic of an ester with acetic acid. Since we saw that singlet, we know we had an ester
Scores: Aligned = 1; Implicit (E) = 0; Explicit (R) = 1; Incomplete (E) = 0; Incomplete (R) = 0; Fragmented = 0. Overall score = 2
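To make the scoring scheme in Table 7 concrete, the short sketch below (our illustration, not part of the original analysis; variable names are ours) sums the six binary quality codes for the two example arguments and shows that weighting the per-experiment averages in Table 6 by their argument counts reproduces the overall mean of 3.23 reported in the text.

```python
# A minimal sketch, assuming the scoring scheme exemplified in Table 7:
# an argument's overall quality score is the sum of six binary codes.
QUALITY_CODES = ["explicit_evidence", "explicit_rationale",
                 "complete_evidence", "complete_rationale",
                 "integrated", "aligned"]

def argument_score(codes: dict) -> int:
    """Sum the six binary quality codes (0 or 1 each) for one argument."""
    return sum(codes[c] for c in QUALITY_CODES)

# Column chromatography example from Table 7: all six codes present.
cc_example = dict.fromkeys(QUALITY_CODES, 1)
assert argument_score(cc_example) == 6

# Synthesis of esters example from Table 7: only aligned + explicit rationale.
se_example = dict.fromkeys(QUALITY_CODES, 0)
se_example.update(aligned=1, explicit_rationale=1)
assert argument_score(se_example) == 2

# Experiment-level entries in Table 6 are relative frequencies of each code;
# weighting the average scores by each experiment's argument count N
# reproduces the overall mean reported in the text.
n = {"TLC": 177, "IR": 208, "CC": 198, "GC": 161,
     "NMR": 170, "SR": 162, "ER": 210, "SE": 207}
avg = {"TLC": 2.96, "IR": 4.33, "CC": 4.41, "GC": 3.52,
       "NMR": 3.08, "SR": 3.10, "ER": 2.42, "SE": 2.04}
overall = sum(n[e] * avg[e] for e in n) / sum(n.values())
print(round(overall, 2))  # 3.23
```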


The IR and GC experiments also led to arguments with higher-than-expected quality. The IR experiment was characterized by the highest quality in completeness of student rationale components as well as the second highest explicitness and completeness of evidence. Similarly, the GC experiment was associated with arguments of the highest quality in alignment and integration.

Arguments from the substitution reaction experiment (3.10) had average quality. Despite having the second highest quality in completeness of rationale and integration, this experiment included the lowest quality arguments in terms of explicitness of rationale and alignment, as well as the second lowest quality in explicitness of evidence. Similarly, the NMR experiment (3.08) was linked to arguments of average quality in most categories.

Arguments from the TLC, elimination reaction, and synthesis of esters experiments were of lower quality than expected. The TLC experiment was the first experiment performed by students and was characterized by arguments with the third lowest overall quality (2.96). These arguments had the second lowest completeness of rationale and the third lowest integration quality. The elimination reaction experiment (2.42) had arguments with the second lowest overall quality, which included several coding categories with lower-than-expected quality compared to the other experiments.

The final experiment in the semester was the synthesis of esters experiment, which resulted in arguments of the lowest average quality. Table 7 includes an example of argument quality characterization for the synthesis of esters experiment. In this example, the student provided an aligned argument in which the focus of both the claim and the rationale was the “our ester” product that “we know we had.” The vague description of only the “characteristic singlet” observed in their NMR spectrum neglected the other peaks as well as the other types of data collected about their ester product; thus, the evidence component was rated as both “implicit” and “incomplete.” In their chemical rationale, although the student explicitly tied the presence of “the singlet on the 1H NMR” to the ester product they synthesized, they included an incomplete description of how their evidence supported their claim. The student produced a fragmented rationale in which they identified only NMR data and failed to coordinate other observations, data, and structural information about their ester product.

Which factors are most impactful and how do these factors affect the nature and quality of student argumentation across each of our eight experiments of interest?

To address our second research question, we identified several factors that we thought would impact the nature and quality of student argumentation across the eight laboratory experiments. These factors included experiment order in the semester, experiment type, the amount and types of data sources available to students, and the level of scaffolding for student argumentation present in each experiment. Below we describe and exemplify the impact of these factors on the nature and quality of student arguments.
Experiment order. We hypothesized that as students became increasingly familiar with the practice of post-lab argumentation, argument quality would increase throughout the course of the semester. As shown in Table 8, where experiments are listed in order of decreasing argument quality, this was not the case. Students’ arguments from the last two experiments of the semester, ER and SE, exhibited the lowest quality of all, while arguments built for the second (IR) and third (CC) experiments in the sequence were found to have the highest quality as characterized in our study. We also analyzed the quality of each domain-general coding category in relation to experiment placement during the semester. Notably, the elimination reaction and synthesis of esters experiments were characterized by arguments of the lowest quality in the explicitness of evidence (SE), completeness of evidence (SE), completeness of rationale (ER), and integration (SE) coding categories compared to the other experiments. Overall, our data indicated that experiment order had no discernible impact on overall argument quality or on the quality of any of the domain-general coding categories of interest.
Table 8 Average argument quality (ranked from high to low), amount of data sources, and types of data sources for each laboratory experiment
Experiment (#, type) Argument quality Types of data sources (amount of data sources)
a Indicates a statistically significantly different argument quality than expected across all eight experiments.
Column chromatography CC (3, EC) 4.41a Qualitative TLC data + known molecules (2)
Infrared spectroscopy IR (2, DC) 4.33a IR spectroscopic data + known molecules (2)
Gas chromatography GC (4, ISE) 3.52a Qualitative observations + quantitative GC data (2)
Substitution reactions SR (6, PV) 3.10 Qualitative observations + known molecules (2)
Nuclear magnetic resonance spectroscopy NMR (5, ISE) 3.08 Spectroscopic data: IR, 1H NMR, 13C NMR (3)
Thin-layer chromatography TLC (1, DC) 2.96a Qualitative TLC data + known molecules (2)
Elimination reactions ER (7, PV) 2.42a Qualitative observations + quantitative GC data + known molecules (3)
Synthesis of esters SE (8, SC) 2.04a Qualitative observations + IR, 1H NMR spectroscopic data + known molecules (4)


Experiment type. We also hypothesized that the type of laboratory experiment (e.g., prediction–verification versus synthesis and characterization) would be an impactful factor on the nature and quality of student argumentation. We speculated that different types of experiments would pose varied cognitive challenges and tap into diverse conceptual resources. However, our data suggest that experiment type was not a major determinant of argument quality. For example, although the TLC and IR experiments were both characterized as data collection, analysis, and interpretation (DC) experiments, students’ arguments from these two labs differed significantly in both overall quality and specific quality for several domain-general categories of analysis. The overall quality of arguments from the IR experiment was the second highest (4.33), while arguments from the TLC lab were characterized as having the third lowest quality (2.96). Students’ arguments from the IR experiment exhibited higher completeness of both the evidence and rationale components, while arguments from the TLC lab were characterized by a greater degree of differentiation. Additionally, arguments from the IR lab were more frequently based on an inductive approach to reasoning using spectroscopic data compared to arguments from the TLC lab. Similar disparities were observed between arguments from other experiments in the same category, while experiments classified as different types led to arguments of similar overall and specific quality. This is exemplified by the IR and CC labs in the group of experiments with the highest overall argument quality and the ER and SE labs at the bottom of the list in Table 8.
Amount and types of data sources. Each of our eight experiments presented students with different pieces of data to collect, analyze, and utilize as they made sense of their laboratory findings, as summarized in Table 8. We speculated that more data-intensive experiments would present a greater challenge for students, as they had to coordinate more pieces and types of data to construct their post-lab arguments. Our analysis showed that the amount and types of data presented to students did have an impact on both the nature and quality of their post-lab arguments.

For example, access to chromatographic or spectroscopic data during an experiment (e.g., the IR, NMR, and SE labs) often led students to engage in an inductive approach to reasoning, while the absence of these types of data was more frequently linked to arguments built using deductive reasoning (based on theoretical knowledge). This latter case is exemplified by the substitution reaction (SR) lab, in which students had access to only observational data and their deductive arguments were strongly based on their content knowledge about SN1 and SN2 mechanistic paths. In contrast, in the elimination reactions (ER) experiment, in the same category (PV) as the substitution reactions lab, students gathered chromatographic data and their reliance on inductive reasoning when building arguments was significantly greater.

The amount and types of data available to students also impacted the quality of their argumentation. For example, the CC and IR experiments involved the fewest data sources for students to utilize and resulted in the highest quality arguments in our analysis. On the other hand, experiments in which several types of data were available to students led to lower quality arguments, as highlighted by the elimination reaction and synthesis of esters experiments. This lower argument quality suggested that students had difficulty coordinating multiple pieces of evidence. As shown in Table 3, students’ arguments from these two labs were characterized as highly incomplete and fragmented.
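One quick way to summarize this trend (our illustrative check, not an analysis reported in the paper) is to rank-correlate the number of data sources listed in Table 8 with each experiment’s average argument quality; the Spearman coefficient comes out clearly negative, consistent with the pattern described above.

```python
from scipy.stats import spearmanr

# Table 8, in the order listed there: number of data sources and average
# argument quality for each experiment (CC, IR, GC, SR, NMR, TLC, ER, SE).
sources = [2, 2, 2, 2, 3, 2, 3, 4]
quality = [4.41, 4.33, 3.52, 3.10, 3.08, 2.96, 2.42, 2.04]

# Rank correlation between data-source count and argument quality; a
# negative rho indicates that more data-intensive experiments tended to
# be associated with lower-quality arguments.
rho, p = spearmanr(sources, quality)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```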

Scaffolding of student argumentation. Finally, we hypothesized that the level of scaffolding of student work could impact argument quality. We characterized the level of scaffolding in terms of the amount of guidance provided to students in building their arguments through available resources, such as the laboratory workbook. We speculated that more explicit guidance would enhance the quality of students’ arguments, and our results seemed to confirm this.

For example, in the CC experiment, the workbook explicitly asked students to make claims regarding the “purity of your isolated β-carotene” and the relative non-polarity of β-carotene in relation to “what you would predict based on the structures.” In this case, the students in our sample constructed arguments with case-specific claims that addressed the explicit guidance given to them. Explicit directions were also given on the evidence to be provided by asking students to “draw the results from your TLC” and “calculate the Rf value of each spot.” As a result, all analyzed arguments included the requested elements and were coded as both explicit and complete in the evidence component. This explicit guidance also impacted student rationales, which were characterized by a high frequency of multiple differentiation and a slight preference for an inductive approach to reasoning. In contrast, explicit scaffolding for argumentation was missing in the TLC lab, in which students collected and analyzed the same amount and types of data sources as in the CC lab (see Table 8) and considered the same chemical concepts of structure, purity, and polarity. In this case, student arguments were of characteristically lower quality in the explicitness and completeness of both the evidence and rationale components. In their rationales, students relied less heavily on comparisons between their observed molecules and more on theoretical constructs, which led to a slight preference for deductively framed arguments. Despite several striking similarities between the resources available in these experiments, the explicit scaffolding for argumentation in the CC lab correlated with arguments of the highest quality (4.41), while the lack of scaffolding in the TLC lab was linked to arguments whose quality (2.96) was below the average for the experiments analyzed in this study.

Discussion

Throughout an academic semester in a college chemistry laboratory, students work on different experiments that often provide diverse scenarios from which they are expected to construct meaning. After collecting and analyzing their own data, students are likely to follow varied approaches to reasoning as they analyze and communicate their laboratory findings. In our work, we identified similarities and differences in the nature and quality of the arguments generated by students working on eight distinct experiments in an organic chemistry lab. In particular, we characterized the claims, evidence, and rationale components in students’ arguments along seven domain-general dimensions of analysis, as summarized in Table 3.

Our analysis revealed very little variability in the specificity of student claims and the alignment of argument components. The majority of collected arguments across the eight experiments had a case-specific focus on laboratory observations and were aligned in terms of the concepts and ideas used when making claims, providing evidence, and constructing rationales. Nevertheless, the explicitness and completeness of student evidence and rationale components varied greatly across the eight experiments. Although the evidence presented by students in their arguments was largely characterized as explicit across most labs, the evidence component for two experiments (substitution reaction and synthesis of esters) was mostly implicit. In general, student rationales were split between being explicit and implicit in nature. While some experiments (elimination reaction and synthesis of esters) were characterized by highly incomplete evidence and rationale components alike, the column chromatography experiment was unique in combining uniformly complete evidence with highly incomplete rationales. Several experiments (notably IR, NMR, and synthesis of esters) were characterized by little if any differentiation between substances and reactivity, whereas some experiments (elimination reaction) or rationale components (column chromatography and GC) emphasized comparisons between substances and their properties. The level of integration of chemical concepts and ideas with data/observations was also highly variable within student rationales, with some experiments (column chromatography, GC, and substitution reaction) more frequently characterized by integrated rationales while the remaining experiments displayed a majority of rationales coded as fragmented. Students’ approach to reasoning was often either deductive or inductive depending on the experiment, and very few instances of a hybrid approach were observed. Experiments that led to the collection of chromatographic or spectroscopic data (e.g., IR, NMR, and SE) tended to result in arguments that more often followed an inductive approach to reasoning, while experiments in which these types of data were not generated or provided led to arguments that were more frequently characterized as deductive.

We also explored the impact (or lack thereof) of various laboratory factors on the nature and quality of student reasoning as manifested in their written arguments. These factors included experiment order, experiment type, the amount and types of data sources available to students, and the level of scaffolding provided. Several prior studies have reported increased argument quality over time in an academic course (Sampson et al., 2010; Walker and Sampson, 2013a; Çetin and Eymur, 2017; Hosbein et al., 2021). However, in our case we observed no noticeable trend in argument quality over the course of the academic semester. We also speculated that experiments of the same type would be characterized by similar argument quality. However, we found no relationship between type of experiment and argument quality. In our study, arguments built by students while working on experiments classified within the same category (e.g., data collection, analysis, and interpretation) often exhibited very different quality.

On the other hand, our analysis suggested that the amount and types of data sources available to students in an experiment, as well as the degree of scaffolding provided, greatly impacted both the nature and quality of the arguments they constructed. Less data-intensive experiments presented students with an easier task of analyzing their data and using it to construct well-aligned arguments. Although at times these arguments still lacked clarity and key details in support of their claims, experiments that involved fewer data sources and types of data also trended towards higher argument quality. Conversely, experiments that were more data-intensive, involving a greater quantity of data sources as well as different types of data, tended to be characterized by arguments of lower quality, as was the case for both the ER and SE experiments. Our findings align with previous studies suggesting that more data-intensive experiments that demand the consideration of different types of evidence often lead to student arguments that lack sophistication in the coordination of empirical and theoretical pieces (Sandoval, 2003; Havdala and Ashkenazi, 2007). Recent research in chemistry education has shown that although students may be able to collect and analyze spectroscopic evidence, they often struggle to construct arguments that are consistent with the entirety of their data (Stowe and Cooper, 2019).

As part of our analysis, we also elucidated the effect of argument scaffolding on the quality of student reasoning. We identified experiments that, despite being similar in terms of the amount and types of data available to make sense of laboratory findings, led to student arguments of quite different quality. In these cases, the amount of explicit scaffolding provided for the construction of arguments seemed to be responsible for the difference. For example, students in the column chromatography experiment were explicitly prompted to make specific types of claims as well as to present distinct pieces of evidence. Meanwhile, the TLC experiment involved collection of the same data and an identical conceptual focus but included no prompting for student argument components. Overall, CC experiment arguments were characterized as having the highest quality while the TLC experiment was characterized by arguments of the third lowest quality. Previous studies have presented conflicting findings as to the impact of scaffolding on students’ arguments. In one case, simplifying instructional contexts through scaffolding facilitated more complex argumentation (Berland and McNeill, 2010), while another recent investigation showed that explicit prompt scaffolding had no significant impact on students’ data-based inferences (Stowe and Cooper, 2019). These conflicting results suggest that more research is needed to better understand how scaffolding may affect student argumentation.

Implications

Our first investigation of student argumentation in college organic chemistry labs (Petritis et al., 2021) allowed us to develop an analytical framework to characterize the nature and quality of student arguments at domain-general and domain-specific levels. That study showed the major impact that the framing of laboratory activity can have on student argumentation. In this contribution, we expanded upon this work to better understand how other factors may affect the nature and quality of students’ arguments. The results of both studies provide information and insights that can be used by designers of laboratory curricula, laboratory managers, and trainers of laboratory instructors to enhance the quality of the arguments that students build individually or collectively in the laboratory.

In our study, the quality of written arguments was closely related to the nature and amount of the data with which students had to grapple. This result suggests that laboratory designers, managers, and instructors should pay close attention to the types of data students are expected to collect and analyze in any given experiment, and purposefully select and sequence experimental activity to gradually increase the variety and complexity of the data to be analyzed. Our findings indicate that, in terms of supporting the development of argumentation abilities, the type of experiment that students conduct is likely less important than the nature and amount of data they are expected to analyze and integrate to make sense of their results.

Our results also point to the need for purposeful support of student argumentation as learners become more familiar with this epistemic practice. Various studies suggest that students may not know how to argue in the laboratory context or may lack clarity about the goal of their argumentative task (Berland and Hammer, 2012; Garcia-Mila et al., 2013). Thus, chemistry students are likely to benefit from explicit guidance on how to coordinate experimental data with core chemical concepts and ideas in developing high-quality arguments. McNeill et al. (2006) suggested that fading written instructional scaffolds for argumentation in chemistry better prepared students to produce higher quality arguments than providing continuous support. Related studies propose developing learning progressions for the epistemic practice of argumentation to improve student performance across the semester (Smith et al., 2006; Berland and McNeill, 2010). These recommendations, along with our findings, highlight the need for students to be trained in argumentation and to have opportunities to learn how to argue from evidence, especially early on in their laboratory experience.

While our findings inform us about several key factors that affect student argumentation in the organic chemistry laboratory, more research is needed to characterize the effect of other variables on the development of this scientific practice, such as the nature of laboratory instruction and interactions with peers. If we are to foster students’ ability to productively engage in scientific argumentation, it is critical that chemistry education researchers and practitioners better understand the various factors that affect students’ engagement in this epistemic practice and the mechanisms through which these factors affect student reasoning.

Limitations

Our findings emerged from the analysis of arguments generated by students in a single organic chemistry laboratory course at our institution and may thus not be generalizable to other contexts. Although laboratory instructors can be expected to play an important role in the laboratory environment, our analysis did not consider the effect that differences in instruction may have had on the written arguments that were collected. Additionally, some of the written arguments we analyzed were produced individually while others were constructed in groups of two to five students. Although other studies have investigated the role of student collaboration on the outcomes of argumentation (Sampson and Clark, 2009), we did not analyze differences that could exist between individually constructed and group-constructed post-lab arguments. Lastly, we acknowledge that the laboratory learning environment entails an intricate web of factors, each of which has the potential to impact student arguments. Thus, further study is warranted to investigate the interplay of these laboratory components and their impact on the nature and quality of student reasoning.

Conflicts of interest

There are no conflicts of interest to declare.

Acknowledgements

First and foremost, we would like to thank the students who participated in this research study. Their arguments have provided us invaluable insight into what laboratory factors impact their ability to argue from experimental evidence. We would also like to extend our gratitude to the graduate student instructors who cooperated with our research team and granted us access to their laboratory sections.

References

  1. Abi-El-Mona I. and Abd-El-Khalick F., (2006), Argumentation discourse in a high school chemistry course, Sch. Sci. Math., 106(8), 349–361.
  2. Abi-El-Mona I. and Abd-El-Khalick F., (2011), Perceptions of the nature and ‘goodness’ of argument among college students, science teachers, and scientists, Int. J. Sci. Educ., 33(4), 573–605.
  3. Bell P. and Linn M. C., (2000), Scientific arguments as learning artifacts: Designing for learning from the web with KIE, Int. J. Sci. Educ., 22, 797–817.
  4. Berland L. K. and Hammer D., (2012), Framing for scientific argumentation, J. Res. Sci. Teach., 49(1), 68–94.
  5. Berland L. K. and McNeill K. L., (2010), A learning progression for student argumentation: Understanding student work and designing supportive instructional contexts, Sci. Educ., 94, 765–793.
  6. Berland L. K. and Reiser B. J., (2011), Classroom communities’ adaptations of the practice of scientific argumentation, Sci. Educ., 95, 191–216.
  7. Brem S. K. and Rips L. J., (2000), Explanation and evidence in informal argument, Cogn. Sci., 24(4), 573–604.
  8. Burke K. A. and Greenbowe T. J., (2006), Implementing the science writing heuristic in the chemistry laboratory, J. Chem. Educ., 83(7), 1032–1038.
  9. Carey S. and Smith C., (1993), On understanding the nature of scientific knowledge, Educ. Psych., 28, 235–251.
  10. Carmel J. H., Herrington D. G., Posey L. A., Ward J. S., Pollock A. M. and Cooper M. M., (2019), Helping students to “do science”: Characterizing scientific practices in general chemistry laboratory curricula, J. Chem. Educ., 96, 423–434.
  11. Çetin P. S., (2021), Effectiveness of inquiry based laboratory instruction on developing secondary students’ views on scientific inquiry, J. Chem. Educ., 98, 756–762.
  12. Çetin P. S. and Eymur G., (2017), Developing students’ scientific writing and presentation skills through argument-driven inquiry: An exploratory study, J. Chem. Educ., 94, 837–843.
  13. Choi A., Hand B. and Greenbowe T., (2013), Students’ written arguments in general chemistry laboratory investigations, Res. Sci. Educ., 43, 1763–1783.
  14. Colthorpe K., Abraha H. M., Zimbardi K., Ainscough L., Spiers J. G., Chen H.-J. C. and Lavidis N. A., (2017), Assessing students’ ability to critically evaluate evidence in an inquiry-based undergraduate laboratory course, Adv. Physiol. Educ., 41, 154–162.
  15. Cooper M. M., (2015), Why ask why? J. Chem. Educ., 92, 1273–1279.
  16. Cooper M. M. and Stowe R. L., (2018), Chemistry education research–from personal empiricism to evidence, theory, and informed practice, Chem. Rev., 118, 6053–6087.
  17. Criswell B., (2012), Framing inquiry in high school chemistry: Helping students see the bigger picture, J. Chem. Educ., 89, 199–205.
  18. Cronje R., Murray K., Rohlinger S., Wellnitz T., (2013), Using the science writing heuristic to improve undergraduate writing in biology, Int. J. Sci. Educ., 35(16), 2718–2731.
  19. Crujeiras-Pérez B. and Jiménez-Aleixandre M. P., (2017), High school students’ engagement in planning investigations: Findings from a longitudinal study in Spain, Chem. Educ. Res. Pract., 18, 99–112.
  20. Cruz-Ramírez de Arellano D. and Towns M. H., (2014), Students’ understanding of alkyl halide reactions in undergraduate organic chemistry, Chem. Educ. Res. Pract., 15, 501–515.
  21. Domin D. S., (1999), A review of laboratory instruction styles, J. Chem. Educ., 76(4), 543–547.
  22. Driver R., Newton P. and Osborne J., (2000), Establishing the norms of scientific argumentation in classrooms, Sci. Educ., 84(3), 287–313.
  23. Garcia-Mila M., Gilabert S., Erduran S. and Felton M., (2013), The effect of argumentative task goal on the quality of argumentative discourse, Sci. Educ., 97(4), 497–523.
  24. Grimberg B. I. and Hand B., (2009), Cognitive pathways: Analysis of students’ written texts for science understanding, Int. J. Sci. Educ., 31(4), 503–521.
  25. Grooms J., (2020), A comparison of argument quality and students’ conceptions of data and evidence for undergraduates experiencing two types of laboratory instruction, J. Chem. Educ., 97(8), 2057–2064.
  26. Grooms J., Sampson V. and Enderle P., (2018), How concept familiarity and experience with scientific argumentation are related to the way groups participate in an episode of argumentation, J. Res. Sci. Teach., 55, 1264–1286.
  27. Hammer D., Elby A., Scherr R. E. and Redish E. F., (2005), Resources, framing, and transfer, in Mestre J. P. (ed.), Transfer of Learning from a Modern Multidisciplinary Perspective, Greenwich, CT: Information Age Publishing, pp. 89–119.
  28. Hand B. and Choi A., (2010), Examining the impact of student use of multiple model representations in constructing arguments in organic chemistry laboratory classes, Res. Sci. Educ., 40, 29–44.
  29. Havdala R. and Ashkenazi G., (2007), Coordination of theory and evidence: Effect of epistemological theories on students’ laboratory practice, J. Res. Sci. Teach., 44(8), 1134–1159.
  30. Hosbein K. N., Lower M. A. and Walker J. P., (2021), Tracking student argumentation skills across general chemistry through argument-driven inquiry using the assessment of scientific argumentation in the classroom observation protocol, J. Chem. Educ., 98, 1875–1887.
  31. Jimenez-Aleixandre M. P., (2008), Designing argumentation learning environments, in Erduran S. and Jimenez-Aleixandre M. P. (ed.), Argumentation in science education: Perspectives from classroom-based research, Dordrecht: Springer Academic Publishers, pp. 91–115.
  32. Jimenez-Aleixandre M. and Erduran S., (2008), Argumentation in science education: An overview, in Jimenez-Aleixandre M. and Erduran S. (ed.), Argumentation in science education: Perspectives from classroom-based research, Springer: Dordrecht, pp. 3–27.
  33. Juntunen M. K. and Aksela M. K., (2014), Improving students’ argumentation skills through a product life-cycle analysis project in chemistry education, Chem. Educ. Res. Pract., 15, 639–649.
  34. Kadayifci H., Atasoy B. and Akkus H., (2012), The correlation between the flaws students define in an argument and their creative and critical thinking abilities, Soc. Behav. Sci., 47, 802–806.
  35. Katchevich D., Hofstein A. and Mamlok-Naaman R., (2013), Argumentation in the chemistry laboratory: Inquiry and confirmatory experiments, Res. Sci. Educ., 43, 317–345.
  36. Kelley C., (2019), Thinking Through the Laboratory: An Organic Chemistry I Workbook, 1st edition, Dubuque, IA: Kendall Hunt.
  37. Kelly G. J. and Takao A., (2002), Epistemic levels in argument: An analysis of university oceanography students’ use of evidence in writing, Sci. Educ., 86, 314–342.
  38. Keys C. W., Hand B., Prain V. and Collins S., (1999), Using the science writing heuristic as a tool for learning from laboratory investigations in secondary science, J. Res. Sci. Teach., 36(10), 1065–1084.
  39. Kuhn D., (1991), The skills of argument, Cambridge, England: Cambridge University Press.
  40. McNeill K. L. and Krajcik J., (2007), Middle school students’ use of appropriate and inappropriate evidence in writing scientific explanations, in M. Lovett and P. Shah (ed.), Thinking with data: The proceedings of 33rd Carnegie Symposium on Cognition, Mahwah, NJ: Erlbaum.
  41. McNeill K. L. and Krajcik J., (2009), Synergy between teacher practices and curricular scaffolds to support students in using domain-specific and domain-general knowledge in writing arguments to explain phenomena, J. Learn. Sci., 18, 416–460.
  42. McNeill K. L., Lizotte D. J., Krajcik J. and Marx R. W., (2006), Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials, J. Learn. Sci., 15(2), 153–191.
  43. Moon A., Moeller R., Gere A. R. and Shultz G. V., (2019), Application and testing of a framework for characterizing the quality of scientific reasoning in chemistry students’ writing on ocean acidification, Chem. Educ. Res. Pract., 20, 484–494.
  44. National Research Council, (2012), A framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas, Washington, DC: National Academies Press.
  45. Park J., Jung D., Kim G., Jun J. and Nam J., (2020), The effects of argument-based inquiry activities on elementary school students’ claims and evidence in science writing, J. Kor. Chem. Soc., 64(6), 389–400.
  46. Petritis S. J., Kelley C. and Talanquer V., (2021), Exploring the impact of the framing of a laboratory experiment on the nature of student argumentation, Chem. Educ. Res. Pract., 22, 105–121.
  47. Reiser B. J., Tabak I., Sandoval W. A., Smith B. K., Steinmuller F. and Leone A. J., (2001), BGuILE: Strategic and conceptual scaffolds for scientific inquiry in biology classrooms, in S. M. Carver and D. Klahr (ed.), Cognition and instruction: Twenty-five years of progress, Mahwah, NJ: Erlbaum, pp. 263–305.
  48. Sampson V. and Clark D. B., (2008), Assessment of the ways students generate arguments in science education: Current perspectives and recommendations for future directions, Sci. Educ., 92, 447–472.
  49. Sampson V. and Clark D. B., (2009), The impact of collaboration on the outcomes of scientific argumentation, Sci. Educ., 93, 448–484.
  50. Sampson V. and Clark D. B., (2011), A comparison of the collaborative scientific argumentative practices of two high and two low performing groups, Res. Sci. Educ., 41, 63–97.
  51. Sampson V. and Walker J., (2012), Learning to write in the undergraduate chemistry laboratory: The impact of argument-driven inquiry, Int. J. Sci. Educ., 34(10), 1443–1485.
  52. Sampson V., Grooms J. and Walker J. P., (2010), Argument-driven inquiry as a way to help students learn how to participate in scientific argumentation and craft written arguments: An exploratory study, Sci. Educ., 95, 217–257.
  53. Sandoval W. A., (2003), Conceptual and epistemic aspects of students' scientific explanations, J. Learn. Sci., 12(1), 5–51.
  54. Sandoval W. A. and Millwood K. A., (2005), The quality of students’ use of evidence in written scientific explanations, Cogn. Instruct., 23(1), 23–55.
  55. Smith C. L., Wiser M., Anderson C. W. and Krajcik J., (2006), Focus Article: Implications of research on children's learning for standards and assessment: A proposed learning progression for matter and atomic-molecular theory, Meas.: Interdiscip. Res. Perspect., 4(1&2), 1–98.
  56. Soysal Y. and Yilmaz-Tuzun O., (2021), Relationships between teacher discursive moves and middle school students’ cognitive contributions to science concepts, Res. Sci. Educ., 51(suppl. 1), S325–S367.
  57. Stowe R. L. and Cooper M. M., (2019), Arguing from spectroscopic evidence, J. Chem. Educ., 96(10), 2072–2085.
  58. Walker J. and Sampson V., (2013a), Argument-driven inquiry: Using the laboratory to improve undergraduates’ science writing skills through meaningful science writing, peer-review, and revision, J. Chem. Educ., 90, 1269–1274.
  59. Walker J. and Sampson V., (2013b), Learning to argue and arguing to learn: Argument-driven inquiry as a way to help undergraduate chemistry students learn how to construct arguments and engage in argumentation during a laboratory course, J. Res. Sci. Teach., 50(5), 561–596.
  60. Walker J. P., Sampson V. and Zimmerman C. O., (2011), Argument-driven inquiry: An introduction to a new instructional model for use in undergraduate chemistry labs, J. Chem. Educ., 88, 1048–1056.
  61. Walker J. P., Sampson V., Grooms J., Anderson B. and Zimmerman C. O., (2012), Argument-driven inquiry in undergraduate chemistry labs: The impact on students’ conceptual understanding, argument skills, and attitudes toward science, J. Coll. Sci. Teach., 41(4), 74–81.
  62. Walker J., Van Duzor A. G. and Lower M. A., (2019), Facilitating argumentation in the laboratory: The challenges of claim change and justification by theory, J. Chem. Educ., 96, 435–444.
