Assessing high school students’ chemical thinking using an essential questions-perspective framework

Ming Chi; Changlong Zheng; Peng He

doi:10.1039/D4RP00106K

View PDF VersionPrevious ArticleNext Article

DOI: 10.1039/D4RP00106K (Paper) Chem. Educ. Res. Pract., 2024, 25, 1143-1158

Assessing high school students’ chemical thinking using an essential questions-perspective framework†

Ming Chi ^a, Changlong Zheng *^a and Peng He ^b
^aInstitute of Chemical Education, Northeast Normal University, People's Republic of China. E-mail: zhengcl@nenu.edu.cn
^bCollege of Education, Washington State University, USA

Received 6th April 2024 , Accepted 25th June 2024

First published on 29th June 2024

Abstract

Chemical thinking is widely acknowledged as a core competency that students should develop in the context of school chemistry. This study aims to develop a measurement instrument to assess students’ chemical thinking. We employed the Essential Questions-Perspectives (EQ-P) framework and Structure of Observed Learning Outcome (SOLO) classification to construct a hypothetical model of chemical thinking. This model comprises three aspects and each aspect includes five cognitive levels to assess students’ chemical thinking. Accordingly, we developed an initial instrument consisting of 27 items in multiple formats, including multiple-choice, two-tier diagnostic, and open-ended questions. We applied the partial credit Rasch model to establish the validity and reliability of measures for the final instrument. Following the process of pilot test, revision, and field test, we finalized the instrument with a refined 20-item instrument. Two hundred and twenty-one Chinese high school students (Grade 12) participated in the pilot and field tests. The results demonstrate that the final instrument effectively produces reliable and valid measures of students’ chemical thinking. Furthermore, the empirical results align well with the hypothetical model, suggesting that the SOLO classification can effectively distinguish the levels of proficiency in students’ chemical thinking.

Introduction

Promoting students’ chemical thinking has been a critical learning goal in chemistry education (Johnstone, 1982; Kraft et al., 2010; Sevian and Talanquer, 2014; Taber, 2014). Researchers primarily focus on students’ development of cognitive thinking from domain-general and domain-specific perspectives. Domain-general thinking is not limited to specific subject content and has a positive impact on student academic achievement (Lawson, 1979; Chandran et al., 1987). A significant amount of research has explored how to develop and assess students’ domain-general thinking within chemistry context, including logical thinking (Lewis and Lewis, 2007; Stamovlasis et al., 2010; Eskandar et al., 2013), critical thinking (Zoller and Pushkin, 2007; Danczak et al., 2020; Wan et al., 2023), and creative thinking (Laius et al., 2009; Tomasevic and Trivic, 2014). Differentiating from domain-general thinking, domain-specific thinking is closely related to specific disciplinary content, such as chemical thinking (Sevian and Talanquer, 2014) and mathematical thinking (Restrepo and Villaveces, 2012; NRC, 2013). In the field of chemistry education, school chemistry plays an irreplaceable role in cultivating students’ chemical thinking. Chemical thinking not only helps students solve problems in the context of chemistry, but also has broader applications in science and societal issues (Sjöström, 2006; Freire et al., 2019). Consequently, chemical thinking has gradually been regarded as a core competency for all students in school chemistry (Talanquer and Pollard, 2010; Sevian and Talanquer, 2014; Ministry of Education, P. R. China, 2017; Landa et al., 2020; Chi et al., 2023).

Chemical thinking involves students in developing and applying knowledge and practices, serving specific purposes such as chemical analysis, transformation, and synthesis (Sevian and Talanquer, 2014). Moreover, chemical thinking equips learners with domain-specific thinking to access and organize chemical knowledge to find and address questions within the discipline (Sevian and Talanquer, 2014; Landa et al., 2020; Chi et al., 2023). Many studies have been conducted to investigate students’ chemical thinking in response to specific disciplinary questions (Yan and Talanquer, 2015; Moreira et al., 2019; Macrie-Shuck and Talanquer, 2020). For instance, Ngai and Sevian (2017) applied open-ended questions to assess students’ chemical identification thinking (i.e., “What is this substance?”). Through qualitative analysis, Weinrich and Talanquer (2015) examined different thinking patterns among students while addressing questions related to chemical causality (i.e., “Why do chemical reactions happen?”), chemical mechanism (i.e., “How do these processes occur?”), and chemical control (i.e., “How can these processes be controlled?”). Cullipher and his colleagues (2015) explored students’ chemical thinking at different levels when they solved tasks associated with benefits, costs, and risks (i.e., “How to evaluate the impacts of chemically transforming matter?”). Overall, researchers have investigated students’ chemical thinking at different aspects, which contribute to a deeper understanding of students’ domain-specific cognitive processes in solving problems using chemical knowledge and practices.

However, previous research has revealed several limitations in assessing students’ chemical thinking. Firstly, assessing students' chemical thinking necessitates an explicit delineation of the types of chemical rationales expected to guide their problem-solving approaches (Talanquer, 2019). While previous studies have either implicitly or explicitly identified the chemical rationales demonstrated by students when tackling different disciplinary problems (Ngai et al., 2014; Weinrich and Talanquer, 2015; Stammes et al., 2022), there remains a lack of an explicit framework to systematically organize these findings and subsequently integrate them into the design of an instrument aimed at assessing students' chemical thinking. Moreover, while most studies have focused on mapping the types and levels of students’ chemical thinking, little evidence of the reliability and validity in assessing students’ chemical thinking was established. Therefore, this paper intends to address this gap by applying the Essential Questions-Perspectives (EQ-P) framework (Chi et al., 2023, N.D. under review) to establish an alternative understanding of chemical thinking and by developing and validating an instrument to assess students’ chemical thinking.

Theoretical foundations

Chemical thinking

Chemical thinking includes two crucial elements: “what to think” and “how to think” (Bensaude-Vincent, 2009; Talanquer, 2011). Examining the historical development of science, prolonged engagement in chemical practice has influenced chemists’ visions, determining the questions they prioritize and continuously raise (Bensaude-Vincent, 2009), such as “What is the unfamiliar substance?”. In response to these questions, chemists have formulated an array of concepts, principles, and laws (Laudan, 1992). Subsequently, those confirmed chemical knowledge have become thinking tools that facilitate problem-solving endeavours (Laudan, 1992; Crombie, 1994; Hacking, 2002).

Over the past decade, significant research efforts have investigated the characteristics of chemical thinking from these two aspects. Certain studies have focused on what chemistry can make us think, thereby identifying certain disciplinary essential questions (Hoffmann, 1995; NRC, 2003; Sevian and Talanquer, 2014). Others have concentrated on examining how chemistry guides our thought processes, highlighting perspectives as thinking tools and identifying critical chemical perspectives relevant to school chemistry (Landa et al., 2020; Ottenhofm et al., 2022). Disciplinary essential questions play a pivotal role in defining chemists’ visions and delineating the overall landscape of chemistry, encapsulating the realm within which chemistry provides explanations (Crombie, 1994; Bensaude-Vincent, 2009; Talanquer, 2011), such as “Why do the properties or behaviours of substances emerge?”. Chemical perspectives embody chemists’ valuable viewpoints on specific aspects of the world and serve as indispensable domain-specific thinking tools, such as thermodynamic perspective on explaining the extent of chemical processes (Giere, 2006; Brigandt, 2013; Landa et al., 2020).

More recently, we have established a chemical thinking framework linking consensus essential questions with perspectives, providing a comprehensive approach for researchers and practitioners to understand, teach, and assess chemical thinking in school chemistry (Chi et al., 2023, ND. under review). Our framework refers to the essential questions-perspectives (EQ-P) framework (shown in Fig. 1), encompassing three disciplinary essential questions and 12 corresponding chemical perspectives. Table 1 presents a descriptive definition of chemical thinking in our framework.


	Fig. 1 The essential questions-perspectives framework.

Table 1 Description of EQ-P framework

Essential questions	Perspectives	Definition (Chi et al., 2023)
What is the substance? (description and identification [D&I])	Physical characteristics (PC)	The fundamental assumption of chemical identification is that each material exhibits certain unique characteristics that differ from those exhibited by other substances. Substances have specific characteristics that can emerge without chemical reactions, such as colour and smell. The physical characteristics of substances are one of the important clues to describing and identifying chemical substances.
	Reaction characteristics (RC)	Substances have specific behavioural characteristics that must manifest through chemical reactions, such as redox, acid–base, thermal stability, etc. Chemical substances can be described and identified based on their representative chemical reactivities.
	Composition (C)	The material system consists of specific chemical components, such as pure substances, chemical elements, or atoms. Components of the substance system are critical information for characterizing and identifying substances.
	Structure (S)	There are interactions between the various components of a substance, giving each chemical substance its unique structure. The fundamental assumption underlying the determination of structure is that there is precisely one distinctive chemical structure for every chemical substance. Chemical substances can be characterized and identified based on their unique structures.

Why do the properties or behaviours of substances emerge? (explanation and prediction [E&P])	Particle (P)	Early chemistry relied on a reductionist view that portrayed matter as an assembly of submicroscopic particles. The properties of substances are thus associated with particles with inherent characteristics that account for the observed properties of materials in linear and additive ways. From the particle perspective, matter is a static chemical system composed of submicroscopic particles such as atoms, groups, and molecules. The types and properties of these particles or groups of such particles influence the properties of matter.
	Interaction (I)	The systematic view shows that the properties and behaviour of matter are supposed to emerge from the dynamic interactions among submicroscopic components (e.g., molecules, atoms, ions). These interactions manifest at the subatomic, atomic, molecular, and multimolecular scale, and the properties that appear at different scale result from the interactions among subscale particles. The properties or phenomenon that emerge at different scales can be explained and predicted based on the interactions among subscale particles.
	Thermodynamics (T)	As a dynamic system in a state of equilibrium, chemical substances exist under certain conditions concerning temperature, pressure, and other contextual variables at the macroscopic level. The possibility of substance change emerges in interaction with the system's surroundings or other systems. The stability and reactivity of chemical substances depend on the system state function and the changes (energetic, entropic) in that function that characterize multiple interactions between the system and its surroundings. The reaction system state function and its changes can be used to explain and predict the direction and extent of the transformation of the material system.
	Kinetics (K)	Matter systems can undergo chemical reactions by interacting with other matter systems or the surroundings. Chemical reactions involve the issue of rate. The reaction rate depends on particles’ random collision and vibration under specific conditions, activation energy, and other factors, representing the kinetic factors of the reaction. Based on those factors, the mechanism can be revealed, and the reaction rate can be explained and predicted.

How do we transform and make the substance? (transformation and synthesis [T&S])	Target molecule (TM)	The synthesis needs to identify desirable products and their molecular structures. Retrosynthetic analysis theory reflects that determining the target molecule's structure and disconnecting its bond can design the synthetic route. Starting materials, reagents, and synthetic routes can be deduced step by step by disconnecting the target molecule.
	Starting material (SM)	The starting material perspective includes the impact of starting material on material synthesis. The starting materials’ structure, nature, and degree of fit with the synthetic reagents influence the artificial design. From the perspective of starting materials, synthetic reagents and suitable chemical reactions can be selected, and synthetic routes can be designed.
	Addition-removal (AR)	Making target products often involves the addition and removal of specific chemical components. The addition-removal perspective emphasises introducing and removing certain components for a specific purpose during the synthesis process. For example, target functional groups can be protected based on the rational addition and removal of groups.
	Process control (PC)	Transformation and synthesis of substances in chemistry must employ chemical reactions. The synthetic route, reaction thermodynamics, and kinetic factors limit the synthesis yield. From the process control perspective, simplifying the reaction steps and controlling various thermodynamic and kinetic factors can improve the synthesis yield.

The EQ-P framework introduces three key aspects of chemical practice from a disciplinary standpoint: description and identification (D&I), explanation and prediction (E&P), and transformation and synthesis (T&S). Additionally, our framework offers four alternative and complementary chemical perspectives to address essential questions. Each perspective can guide students’ thought processes in solving problems related to these questions. For example, the question “Why do energy changes relate to the occurrence of a chemical reaction?” falls under the E&P category. Students can approach this question using various chemical perspectives. One approach is the interaction perspective (see Table 1), where students recognize that bond breaking and formation in a chemical reaction result in endothermic and exothermic phenomena. Alternatively, students can address the question through a thermodynamic perspective (see Table 1), considering the difference in total energy before and after a chemical reaction system, which leads to endothermic and exothermic processes. Thus, the EQ-P framework emphasizes the importance of establishing multiple chemical perspectives for understanding and analysing problems. It encourages students to develop diverse perspectives and consider specific problems from various chemical perspectives, thereby characterizing the sophisticated conceptual mode we expect students to form. Addressing these questions through different perspectives reflects the varying levels of students’ chemical thinking proficiency about conceptual sophistication.

A progression of students’ chemical thinking

We applied the structure of observed learning outcome (SOLO) taxonomy to frame the hypothetical levels of chemical thinking in developing our measurement instrument. The SOLO taxonomy, developed by Biggs and Collis (1982), serves as a robust theoretical model for delineating and examining the hierarchical levels of students’ conceptual sophistication in educational settings (Biggs and Collis, 1982; Brabrand and Dahl, 2009; Rundgren et al., 2012). Conceptual sophistication is a crucial dimension for assessing students’ proficiency in chemical thinking (Talanquer, 2019; Sevian and Talanquer, 2014; Stammes et al., 2022). The application of the SOLO classification offers a general framework for measuring students' conceptual sophistication (Rundgren et al., 2012), which can be integrated with domain-specific content frameworks (Minogue and Jones, 2009; Quinn et al., 2009; Feldman-Maggor et al., 2022). Numerous studies have demonstrated that SOLO classification can provide a reliable theoretical basis for designing student performance in terms of conceptual sophistication across diverse disciplinary fields (Quinn et al., 2009; Evangelou and Kotsis, 2019; Luo et al., 2020).

The SOLO taxonomy encompasses five levels of sophistication, namely ‘prestructural’, ‘unistructural’, ‘multistructural’, ‘relational’, and ‘extended abstract’. Prestructural responses indicate a lack of understanding, whereby students cannot provide meaningful answers to questions. Unistructural responses imply that students can only apply a single aspect of information, fact, or idea to address questions. Multistructural responses demonstrate that students can utilize information, facts, and ideas from multiple aspects; however, these aspects remain disparate and are not yet integrated into a cohesive framework. These three levels manifest the progressive complexity of student thinking through quantitative changes in performance characteristics.

In contrast, ‘relational’ and ‘extended abstract’ represent the two ‘deep level’ responses, which reflect an enhancement in the quality of student thinking and a transition from concrete to abstract reasoning (Biggs and Collis, 1982; Minogue and Jones, 2009). Relational responses involve the integration of at least two distinct aspects of information, facts, or ideas. Such integration facilitates structured thinking, where these aspects collaborate to answer a given question. ‘Extended abstract’ thinking surpasses the confines of provided information and context, enabling the extraction of more general rules or frameworks with broader applicability, involving metacognition of thought processes.

We proposed a theoretical hypothesis model that combines the EQ-P framework with SOLO classification (see Fig. 2). Fig. 2 demonstrates that the EQ-P framework offers a domain-specific foundation for chemical thinking, while SOLO classification characterizes the varying levels of conceptual sophistication students exhibit.


	Fig. 2 The hypothetical model of students’ chemical thinking.

Based on the SOLO classification, we defined Levels 0 to 2 from a quantitative perspective to assess whether students have developed one or more chemical perspectives to address relevant disciplinary questions (D&I, E&P, and T&S, see Table 1) and thus determine their levels of chemical thinking. Subsequently, Level 3 assesses whether students comprehend the connections and differences between two or more chemical perspectives and effectively apply the corresponding chemical perspectives to solve disciplinary problems. Furthermore, Level 4 measures students’ capability to transcend the given problem context and generalize a comprehensive framework for addressing essential questions. These descriptions of conceptual sophistication levels constitute the theoretical basis for developing an instrument to assess student chemical thinking.

Research questions

This study aims to develop and validate an instrument for assessing students’ chemical thinking in accordance with the EQ-P framework. The framework encompasses three key aspects of chemical thinking, namely D&I, E&P, and T&S, presenting a realistic and comprehensive picture of chemical thinking (Chi et al., 2023, N. D. under review). Using the SOLO classification (Biggs and Collis, 1982), this study has developed a hypothetical progression of students’ chemical thinking, which requires empirical evidence to validate the hypothetical levels. Therefore, this study addresses the following research questions (RQs):

RQ1: What evidence exists pertaining to the reliability and validity of the obtained data using the developed instrument for assessing students’ chemical thinking?

RQ2: What evidence exists to support the appropriateness of the hypothetical levels of chemical thinking in distinguishing students’ chemical thinking proficiencies?

Methods

According to our hypothetical model (see Fig. 2), we developed an initial measurement instrument with corresponding test items. We then administered it to high school students as a pilot test. The insights gained from the pilot test analysis led to targeted revisions of the instrument. Subsequently, we administered the revised version of our instrument to students as a field test to establish the reliability and validity of the measures. A detailed description of this process is presented below.

Item design

Various item formats, such as multiple-choice, two-tier diagnostic, and open-ended questions, can be integrated into the design of measurement instruments to assess student achievement (Park and Liu, 2016; Yang et al., 2018). Each item format serves a distinct purpose within the measurement instruments. Although multiple-choice questions have certain limitations in assessing students’ higher-order cognitive abilities, they can be effective when linked to clear assessment objectives, especially when combined with other item formats (Draper, 2009; Koretsky et al., 2016; Krell et al., 2020). This approach can quickly and effectively identify students’ weaknesses in specific areas, thereby enhancing the diagnostic power of the instrument (Nicol, 2007; Hardy et al., 2014; He et al., 2022a). This feature allows us to purposefully use multiple-choice questions to diagnose chemical thinking proficiencies in students who have not developed a chemical perspective or have overlooked specific chemical perspectives. The two-tier diagnostic format includes an explanation of reason based on multiple-choice questions, facilitating the exploration of students’ comprehension regarding the connections and distinctions between various chemical perspectives (Treagust et al., 2011). Additionally, open-ended questions have the potential to offer deeper insights into students’ structured and reflective thinking processes without suggesting specific answers, thereby presenting additional benefits for measurement instruments (Loh and Subramaniam, 2018).

Building upon the hypothetical model delineating distinct levels of chemical thinking, we incorporated a range of item formats, encompassing multiple-choice, two-tier diagnostic, and open-ended questions. Specifically, Levels 1 and 2 predominantly assess students’ mastery of chemical perspectives through multiple-choice questions, while Level 3 primarily assesses students’ structured thinking using two-tier diagnostic questions. Finally, Level 4 focuses on measuring students’ metacognitive abilities related to the problem-solving process by incorporating open-ended questions. The examples and design intentions of the four level items are shown in Table 2. To comprehensively assess each essential question of chemical thinking (see Table 1), we have developed a total of 27 items, with nine items assigned to each aspect (D&I, E&P, and T&S). A depiction of the relationship between all the items in the initial instrument and the hypothetical model can be found in Table 3.

Table 2 Examples and design intentions of the four level items

Level	Items	Design intention
Level 1	Adenosine triphosphate (ATP) is a molecule that directly provides energy for cellular processes. The process of ATP hydrolysis (breakdown) into adenosine diphosphate (ADP) releases energy and is an essential part of cellular metabolism. Which of the following statements about this process is incorrect? (Correct answer: D)	The item focuses on enthalpy and entropy changes that occur during the transformation of ATP to ADP. If a student holds a thermodynamic perspective, the student can predict the directionality and spontaneity of reactions, based on heat release, entropy increase, and Gibbs free energy decrease. Additionally, the student may recognize that catalysts are independent of reaction spontaneity. Conversely, if a student does not hold a thermodynamic perspective, the student may struggle to answer the question correctly. We used binary scores (1 and 0) to grade students’ responses, with a correct answer receiving a score of 1 (D), indicating that students have established a chemical perspective. An incorrect answer (A, B, C) was assigned a score of 0, indicating that a thermodynamic perspective had not yet been established.

	A. The hydrolysis of ATP to ADP releases energy in the form of heat and tends to drive the reaction towards ADP formation.
	B. The process of ATP hydrolysis increases the degree of entropy (disorder) in the system, contributing to a tendency towards ADP formation.
	C. The hydrolysis of ATP to ADP is an energetically favourable reaction, with a decrease in Gibbs free energy.
	D. The hydrolysis of ATP to ADP typically requires an enzyme catalyst and does not occur spontaneously under physiological conditions at 37 °C and neutral pH.

Level 2	Which of the following propositions about the exothermic reaction of hydrogen peroxide (H₂O₂) decomposition is correct? (correct answer: D)	This item assesses students’ comprehension of the decomposition reaction of hydrogen peroxide from thermodynamic and interaction perspectives. It includes four propositions, with Propositions I and II testing thermodynamic understanding. Students must utilize thermodynamic data, such as heat release and entropy increase, and apply Gibbs free energy to predict the spontaneity of the reaction at all temperatures. Proposition II, which suggests a catalyst affects reaction directionality, is incorrect. Propositions III and IV examine chemical interactions, expecting students to explain hydrogen peroxide's solubility and weak acidity through molecular interactions and bond instability. Proposition IV is incorrect as students can predict the properties of molecules by considering the strength of interactions among atoms. Scoring ranges from 0 to 2, with option D indicating mastery of both perspectives for a score of 2, option B reflecting an understanding of only one perspective for a score of 1, and any other incorrect choices resulting in a score of 0.
	I. At any temperature above absolute zero, the reaction can proceed spontaneously in the forward direction.
	II. Manganese dioxide (MnO₂) can reduce the activation energy of the reaction, leading to greater progress in the forward direction.
	III. The reason why H₂O₂ can dissolve in H₂O in any proportion may be due to the high polarity of both H₂O₂ and H₂O molecules.
	IV. Based on the molecular structure of H₂O₂, the H₂O₂ solution should not exhibit acidity.
	A. I, IV B. III C. II, III D. I, III E. IV

Level 3	In the figure, the catalytic reaction process between benzene and liquid bromine is depicted.	This item in a two-tier diagnostic question format assesses students’ ability to distinguish between thermodynamic and kinetic perspectives in chemical reactions. This approach not only gauges their understanding of each perspective through Propositions I–IV but also their capacity to integrate both in problem-solving. The format's strength lies in its exploration of cognitive dimensions, as it prompts students to justify their choices, thereby revealing their depth of thinking (Treagust, 1988; Peterson et al., 1989; Tsui and Treagust, 2003). The grading rubric reflects this, with scores ranging from 0–3 based on the extent of their perspective integration and the thoroughness of their reasoning. This method provides a comprehensive evaluation of students’ conceptual grasp and application skills in chemistry.

	(1) Please select the correct proposition about this reaction from the options below: (correct answer: C)	The grading rubric is as follows: a score of 0 is allocated to students opting for an incorrect answer, signifying an absence of a discernible chemical perspective; a score of 1 pertains to individuals solely favouring a kinetic perspective and selecting option B; a score of 3 is assigned when students comprehensively consider thermodynamic and kinetic factors, predict the possibility and reality of the reaction, and provide detailed reasoning for their answer. If the answer rationale lacks thoroughness or comprehensiveness, 2 points are awarded (associated with option C).
	I. Based on the information in the figure, the catalytic reaction between benzene and Br₂ is exothermic.
	II. Under the influence of the catalyst, the rate of transformation from to is faster than the rate of generation to .
	III. The rate of this reaction primarily depends on the step involving the transformation of to .
	IV. The main product obtained from the catalytic reaction between benzene and Br₂ is .
	A. I B. III C. II, III, IV D. I, II, IV E. I, III, IV
	(2) Please analyse the reaction in detail and explain the reason for choosing the above option?

Level 4	Solubility is an important property for chemists to consider, as different substances generally exhibit varying levels of solubility under identical conditions. For instance, phosphine displays weaker solubility in water compared to ammonia (both of which release heat upon dissolution), while calcium chloride exhibits greater solubility in water than sodium chloride (with calcium chloride causing a significant temperature increase upon dissolution, whereas sodium chloride does not). Please provide a comprehensive explanation for the reasons behind these phenomena based on different chemical perspectives. Based on the above examples, please think and answer the causes responsible for the differences in solubility observed between different substances.	This open-ended question assesses students’ ability to reflect abstractly on their problem-solving strategy, focusing on solubility as influenced by interaction and thermodynamics. Students could consider ammonia's polarity and hydrogen bonding with water. For sodium chloride and calcium chloride, students could adopt a thermodynamic perspective, considering calcium chloride's exothermic dissolution and entropy increase, based on Gibbs free energy. Scoring reflects students’ integration of chemical concepts in explaining solubility. A score of 4 indicates effective summarization and reflection on the cognitive process for explaining solubility of four substances. A score of 3 shows a comprehensive explanation for each substance, while a score of 2 means students can explain solubility for some substances. A score of 1 shows the ability to explain solubility for one substance.

Table 3 Items of the initial instrument

Level	Item format	D&I	E&P	T&S
Level 1	Multiple-choice questions	Q1, Q2	Q3, Q4	Q5, Q6
Level 2	Multiple-choice questions	Q7, Q8, Q9	Q10, Q11, Q12	Q13, Q14, Q15
Level 3	Two-tier diagnostic questions	Q16, Q17	Q18, Q19	Q20, Q21
Level 4	Open-ended questions	Q22, Q23	Q24, Q25	Q26, Q27

During the development of the instrument, both the test content and response process serve as pivotal sources of validity evidence (American Educational Research Association, 2014; Lewis, 2022). The validation of the test content aims to ensure the theoretical soundness of the instrument and its acceptability and accessibility among the target population (Deng et al., 2021; Lewis, 2022). To achieve this, an expert panel comprising three chemistry education researchers and one physics chemistry professor meticulously evaluated the appropriateness and coherence of the test content designed to assess students’ chemical thinking (Abd-El-Khalick et al., 2015; He et al., 2022b; Li et al., 2024). Additionally, four high school chemistry teachers assessed the clarity and suitability of the item content for high school students.

The response process was designed to address concerns regarding the interpretability of the items for students (Deng et al., 2021; Lewis, 2022; Schreurs et al., 2024). To this end, ten Grade 12 students participated voluntarily in this process. They were administered the instrument and then participated in interviews. The following four questions were posed to the participants:

(1) What is your understanding of this question?

(2) What chemical knowledge or principles do you believe are required to answer this question?

(3) Can you suggest any improvements to the wording of any items to enhance comprehension?

(4) Why did you decide to pick the option?

Questions 1–3 were intended to elicit feedback from students regarding their comprehension of the instrument, its effectiveness in assessing students’ chemical perspectives, and any remaining challenges in item wording (Danczak et al., 2020; Deng et al., 2021; He et al., 2021). The fourth question aimed to explore whether students’ explanations of the items aligned with our expectations. Based on the insights gained from these interviews, we revised the wording of certain items in line with feedback from the expert panel, teachers, and students. For example, when faced with Proposition IV in Example 2 of Table 2, none of the interviewed students applied the interaction perspective to consider the properties of hydrogen peroxide solution. They often believed, based on their life experience, that hydrogen peroxide solutions cannot be acidic. Consequently, we revised the wording of Proposition IV to include “Please predict properties based on the molecular structure of hydrogen peroxide” to better diagnose whether students have failed to construct an interaction perspective. After this process, it is expected that all items used for the pilot test will encourage students to apply certain chemical perspectives.

Rasch model

We employed Rasch measurement to establish the psychometric properties and empirical evidence necessary for the revision of items, thus forming our final instrument (Liu and Boone, 2006, 2023). The Rasch model is widely used to provide evidence for enhancing the quality of assessments related to teachers’ performance and students’ achievement in science education (e.g., Neumann et al., 2013; He et al., 2016, 2023; Yang et al., 2018). Instead of employing classical test theory, we opted for Rasch measurement for the following reasons. Firstly, the Rasch model reduces the problem of nonlinearity by converting the probability of success to log odds and logits (Waugh, 2007; Park and Liu, 2016). Secondly, each item difficulty and person ability are estimated on the same logit scale, and the relationship between items and person can be compared by measures (Bond and Fox, 2007; He et al., 2016; 2023). Third, the measures in the Rasch model are statistically equivalent for the items regardless of which persons are analysed and for the items regardless of which items are analysed (Linacre, 2011). After calibrating these reliability and validity indicators, the instrument can be applied to any sample of examinees to produce measures of students’ chemical thinking. The use of the Rasch model provides a robust and standardized approach to measurement.

The Rasch model offers indices that are essential in establishing the reliability and validity of the measurement. The separation index for both persons and items was utilized in Rasch model, which serves as an indicator of the reliability. Generally, a separation index exceeding two is considered adequate (Duncan et al., 2003). Additionally, the validation of an instrument based on the Rasch model typically involves testing for fit statistics, local independence and dimensionality, and the Wright map (Liu, 2010).

Item fit statistics commonly used include mean square residual (MNSQ) and standardized mean square residual (ZSTD), which evaluate the degree of the deviation between observed and expected scores based on the Rasch model. MNSQ and ZSTD can be summed over all persons for each item by two methods of Infit and Outfit, resulting in four fit statistics. An acceptable fit criterion is Infit and Outfit MNSQ within the range of 0.7 to 1.3 and Infit and Outfit ZSTD within the range of −2.0 to +2.0 (Linacre, 2011). In addition, Linacre (2022) recommends that before examining fit statistics, point-measure correlation (PTMEA) should be examined first. The PTMEA indicates how the items contribute to the measures, referring to the correlation between students’ scores on the items and their Rasch measures, which should be positively correlated, and a higher positive correlation is better (Liu, 2010). The purpose of assessing local independence and dimensionality is to determine the relevance and non-redundancy of all instrument items, as well as whether the variation among responses to an item is accounted for by a latent construct (Andrich and Marais, 2019). The criteria to confirm local independence include a ZSTD value greater than −2.0 or a correlation coefficient of residuals lower than 0.7 (Smith, 2005; Linacre, 2011; He et al., 2016). Principal Components Analysis of Rasch Residual is employed to evaluate the extent to which a data set deviates from unidimensionality (Linacre, 2011). Lastly, the Wright map aligns persons and items on a shared linear scale, providing insightful information on the ordering and spacing of items, and thus indicating the alignment between the actual difficulty and the predetermined difficulty of items (Liu and Boone, 2023).

This study utilized the partial credit Rasch models for data analysis. The partial credit Rasch model was appropriate for this study due to the instrument's inclusion of various item formats and its evaluation using multiple rating categories (Liu, 2010). The Rasch model analysis was performed using Winsteps software (Version 3.72.0).

Pilot test

A convenient sampling method was employed to recruit 83 Grade 12 students from a prominent provincial high school in Jilin Province of Mainland China. Given the acceptable level of measurement errors (less than 0.35), the minimum sample size required for conducting Rasch analysis is suggested at 50 individuals (Romine et al., 2015; Wang et al., 2017; Liu and Boone, 2023). Moreover, considering that the EQ-P framework serves as a comprehensive map delineating students’ chemical thinking anticipated to be cultivated upon completing high school chemistry courses (Chi et al., 2023), the students in the study had completed all relevant courses in high school chemistry. They were expected to pursue STEM-related majors in the future. The test was administered by the teacher during students’ designated individual study time within the classroom, with students voluntarily participating and answering using pencil and paper. No time restrictions were imposed during the test administration to ensure students had sufficient time to engage deeply and provide comprehensive responses to the items (Mandinach et al., 2005). On average, students took approximately 3 hours to complete the test. The typical allocation time for college entrance examinations in STEM subjects in Mainland China is 2.5 hours. Given the extended testing time, specific items could be deleted based on Rasch analysis to account for this practical consideration. To ensure consistent and reliable evaluation of the open-ended items in the instrument, the first author and an experienced high school teacher independently scored these items, demonstrating excellent inter-rater reliability (Cohen's kappa = 0.89).

In the pilot study, the person separation index yielded a value of 2.09, equating to a person reliability coefficient of 0.81. Moreover, the item separation index yielded a value of 4.93, indicating an item reliability coefficient of 0.96. Based on the criterion of the Rasch model, four items were found to fall outside the acceptable range for fit statistics, specifically the Infit MNSQ and Infit ZSTD. These four items simultaneously involved both substance identification (D&I) and explanation of properties (E&P), which are two aspects of chemical thinking. Therefore, we removed these four items. Furthermore, the other four items that did not meet the measure of unidimensionality were removed. The deletion was because these items involved everyday situations or socio-scientific issues that could potentially influence students’ cognitive judgments.

Through the analysis of the Wright map, we identified a gap in the item difficulty measures and addressed this concern by adding a multiple-choice question at level 2. This item was designed with the background of the synthesis of salicylic acid and involves the T&S aspect of chemical thinking. Additionally, we modified the item format of three items whose difficulty measures did not align with the expected difficulty observed in the pilot test. For example, Q2 (Level 1) was found to be close in difficulty to Level 3 and was therefore revised to utilize the format of two-tier diagnostic questions (Level 3). Furthermore, we noticed that two Level 4 items lacked corresponding ability measures, indicating that students with higher abilities should be selected for the field test. Ultimately, the revised instrument consisted of 20 items, and the new item code and a detailed description are listed in Table 4.

Table 4 Items of the revised instrument

Level	Items
Note: The annotations in parentheses are the item codes of the initial instrument in Table.
Level 1	L1A (Q1), L1B (Q8), L1C (Q5), L1D (Q3), L1E (Q4), L1F (Q6)
Level 2	L2A (Q16), L2B (Q12), L2C (Q19), L2D (Q14), L2E (new addition)
Level 3	L3A (Q2), L3B (Q17), L3C (Q18), L3D (Q21)
Level 4	L4A (Q25), L4B (Q22), L4C (Q24), L4D (Q23), L4E (Q27)

Field test

To ensure the inclusion of high-ability students, a total of 84 Grade 12 students from the top-tier high school in Jilin Province of Mainland China, voluntarily took part in the field test. Additionally, 54 students from the second-tier high school, all in Grade 12, participated in the field test. Among the participants (n = 138), there were 77 males and 61 females. Grounded in practical considerations and Rasch analysis from the pilot test, the number of items in the revised instrument has been reduced. Consequently, students completed the field test within approximately 2 hours. The separation index, fit statistics, local independence and dimensionality, and Wright map were examined using the Rasch model, which provide reliability and validity evidence for measures of the revised instrument.

Results

We utilized data from the field test to present our findings. To address RQ1, we investigated the psychometric properties of the revised instrument, providing reliability and validity evidence of measures. Regarding RQ2, the measurement of the item difficulty was used to provide empirical evidence supporting the rationality behind the hypothetical level design.

Evidence of reliability and validity

Reliability. In the revised instrument, all items were modeled using a unidimensional approach. The analysis revealed a person separation index of 3.04, corresponding to a high person reliability coefficient of 0.90. Similarly, the item separation index was found to be 7.14, indicating a high item reliability coefficient of 0.98. These coefficients both demonstrate a high level of reliability of the measurement.

Fit statistics. Table 5 provides a comprehensive overview of the summaries concerning both persons and items. In the Rasch model, the estimated ability (for persons) or difficulty (for items) serves as the key measurement. Accordingly, Table 5 reveals a remarkable proximity between person abilities and item difficulties. Additionally, the errors of person ability and item difficulty, which indicate the accuracy of parameter estimation, stand at 0.18 and 0.16 respectively, essentially approaching zero. Remarkably, all fit statistics in this field test exhibit a close resemblance to the acceptable values. Table 6 showcases the fit statistics for the 20 items encompassed in the revised instrument. Significantly, only two items, specifically L1A and L1D, deviate from the acceptable ranges across four fit statistics. All PTMEA values remain positive and reasonably substantial, falling within the range of 0.11 to 0.79.

Table 5 Summaries of persons and items of field test

	Measure	Error	Infit		Outfit
	Measure	Error	MNSQ	ZSTD	MNSQ	ZSTD
Person	0.07	0.18	0.94	−0.2	1.05	0.1
Item	0.00	0.16	0.99	−0.1	1.03	0.0

Table 6 Item fit statistics of the field test

Item	Measure	S.E.	Infit		Outfit		PTMEA
Item	Measure	S.E.	MNSQ	ZSTD	MNSQ	ZSTD	PTMEA
L1A	−1.56	0.23	1.33	2.5	1.86	2.7	0.11
L1D	0.91	0.20	1.26	3.5	1.63	3.4	0.18
L1F	−1.04	0.21	1.11	1.1	1.20	1.9	0.32
L1B	−2.21	0.27	1.22	1.3	1.18	1.0	0.16
L2D	−0.30	0.13	1.11	1.1	1.16	1.9	0.54
L2B	−0.87	0.16	1.13	1.1	1.15	1.2	0.46
L4C	1.67	0.12	1.07	0.6	1.08	0.7	0.63
L2E	−1.17	0.15	0.97	−0.2	1.05	0.3	0.54
L3C	0.08	0.11	1.05	0.4	1.02	0.2	0.67
L1C	−1.77	0.24	0.98	−0.1	1.01	0.1	0.38
L1E	−1.18	0.21	0.99	0.0	0.99	0.0	0.42
L3A	0.25	0.12	0.98	−0.1	0.97	−0.3	0.66
L4B	1.32	0.11	0.92	−0.7	0.93	−0.5	0.74
L2A	−0.48	0.13	0.90	−0.9	0.89	−0.7	0.64
L4D	1.15	0.10	0.88	−1.0	0.89	−0.7	0.74
L4A	1.74	0.16	0.83	−1.3	0.80	−1.6	0.65
L4E	2.21	0.13	0.78	−1.2	0.76	−1.3	0.65
L2C	1.05	0.15	0.75	−1.4	0.74	−1.6	0.65
L3D	0.49	0.12	0.74	−1.5	0.72	−1.8	0.77
L3B	−0.29	0.12	0.73	−1.9	0.70	−2.0	0.79

Local independence and dimensionality. Table 6 illustrates that the ZSTD values for all items exceed −2.0. Furthermore, the residual correlation coefficient between items L3A and L3B is the highest at 0.36 among all items examined. The correlation coefficients of residuals for all item pairs are consistently below 0.7. These findings suggest that the revised instrument is local independent.

The loading scatterplot of the residuals’ principal components analysis is presented in Fig. 3. The horizontal axis indicates the estimated item difficulty in the Rasch model, while the left vertical axis displays the correlation coefficient between students’ item scores and an additional latent construct unrelated to chemical thinking. Each letter in the plot represents a different item, and the right vertical axis is used to show the frequency of items with a correlation coefficient from the left vertical axis. When the items fall within the contrast loading range of −0.4 to +0.4, they do not strongly measure another construct. Based on Fig. 3, the majority of the items meet the criterion of unidimensionality, with only three items (L1C, L1F, and L4A) falling outside the −0.4 to +0.4 range. This suggests that most of the items in the instrument effectively assess students’ latent construct of chemical thinking.


	Fig. 3 Contrast loadings of revised instrument.

Wright map. Fig. 4 illustrates the distribution of person ability on the left side of the vertical line, and the distribution of item difficulty on the right. The vertical line represents the interval scale of logits. Overall, the difficulty distribution of items aligns with the distribution of person ability, with all items corresponding to individuals. The majority of items are consistent with the pre-set level. However, some items, such as L1D and L2C, exhibit higher difficulty than expected, whereas L2E and L3B show lower difficulty than anticipated. Additionally, a gap still exists between L2D and L3A, which can be purposefully addressed in future iterations of the instrument to optimize its performance.


	Fig. 4 The Wright Map of revised instrument.

Evidence of hypothetical levels design

The revised instrument demonstrates acceptable reliability and validity for assessing students’ chemical thinking, as indicated by the above indices. As a result, the instrument can be effectively utilized in larger sample sizes to differentiate students’ chemical thinking proficiencies. To facilitate this, the average difficulty for each item in the four-level range of the Wright map was defined as a threshold (He et al., 2016; Wang et al., 2017), which is presented in Table 7.

Table 7 Threshold values of four chemical thinking level

Level	Items (measures)	Threshold
Level 1	L1A (−1.56), L1B (−2.21), L1C (−1.77),	−1.48
Level 1	L1E (−1.18), L1F (−1.04), L2E (−1.17)	−1.48
Level 2	L2A (−0.48), L2B (−0.87),	−0.49
Level 2	L2D (−0.30), L3B (−0.29)	−0.49
Level 3	L3A (0.25), L2C (1.05), L1D (0.91)	0.56
Level 3	L3C (0.08), L3D (0.49)	0.56
Level 4	L4A (1.74), L4B (1.32), L4C (1.67),	1.62
Level 4	L4D (1.15), L4E (2.21)	1.62

According to Table 7, five different levels of chemical thinking among students can be identified. A student's chemical thinking proficiency is categorized as level 0 if their Rasch measurement is less than −1.48. For students whose measure falls between −1.48 and −0.49, their chemical thinking proficiency is categorized as level 1. Similarly, a measurement between −0.49 and 0.56 indicates level 2 proficiency, while a measurement between 0.56 and 1.62 indicates level 3 proficiency. Finally, the Rasch measurement above 1.62 suggests a proficiency level of 4. The range of students’ chemical thinking levels along the Rasch scale is shown in Fig. 5. This result aligns with the hypothetical model based on SOLO classification, which supports the rationale of the hypothetical levels of chemical thinking.


	Fig. 5 The five levels of the students' chemical thinking proficiencies.

Discussion

Chemical thinking has been widely acknowledged as a core competency for problem-solving among students (Sevian and Talanquer, 2014; Landa et al., 2020; Chi et al., 2023). Diagnosing the progress of students’ chemical thinking is a critical means to enhance teaching aligned with chemical thinking. In this study, we developed and validated an instrument to assess students’ chemical thinking. Specifically, we combined the EQ-P framework with SOLO classification to establish a hypothetical model for determining the developmental levels of students’ chemical thinking. Based on this model, we developed an initial instrument encompassing three chemical thinking aspects (D&I, E&P, and T&S). Utilizing the Rasch model, we conducted two rounds of testing, revised the initial instrument, and provided reliability and validity of the measurement. The data obtained from the revised instrument exhibits satisfactory reliability and validity levels, thus providing robust and insightful measures of students’ chemical thinking. The analysis results showed that students’ chemical thinking proficiencies are a good fit with the hypothesized level, which can effectively distinguish their chemical thinking proficiencies.

Our findings contribute to existing research in the following aspects. Firstly, this study presents an alternative model for assessing students' chemical thinking proficiencies in terms of conceptual sophistication. Assessing students’ chemical thinking entails identifying the specific types of thinking we expect them to exhibit (Sevian and Talanquer, 2014; Stammes et al., 2022; Talanquer, 2019). The EQ-P framework offers a novel approach to characterizing students' chemical thinking within the conceptual mode dimension. The chemical perspectives outlined by the EQ-P framework explicitly represent core chemical ideas that we aim for students to comprehend during their secondary school chemistry education (Chi et al., 2023, N.D. under review). It is also essential to recognize that various chemical perspectives can simultaneously yield valid and potentially complementary responses (Giere, 2006; Griesemer, 2011; Talanquer, 2019). Our objective is for students not only to acquire multiple chemical perspectives but also to apply and explore these perspectives in problem-solving scenarios. To this end, this study introduced the SOLO classification to characterize students' conceptual sophistication levels within the EQ-P framework. The findings provide empirical evidence that the SOLO classification can function as a theoretical framework for differentiating students’ proficiency in chemical thinking. This integration facilitates a more nuanced assessment of students’ chemical thinking proficiencies by incorporating both quantitative and qualitative aspects of their cognitive processes.

Secondly, previous studies primarily focused on characterizing students’ levels of chemical thinking performance using phenomenological methods (Yan and Talanquer, 2015; Moreira et al., 2019; Macrie-Shuck and Talanquer, 2020). However, these studies have not yet developed instruments with satisfactory reliability and validity of measurement that are suitable for large-scale measurement. To address this gap, our study hypothesized a theoretical model for assessing chemical thinking and employed the Rasch model to establish psychometric properties for the measurement. By following a rigorous development process, we finalized a high-quality instrument capable of differentiating students’ performance in the realm of chemical thinking. The instrument is expected to be utilized in testing larger samples, thus further advancing research in this chemical education.

Thirdly, the development of students’ chemical thinking depends on designing curriculum and instruction that explicitly target chemical thinking (Talanquer and Pollard, 2010), offering ample opportunities for its cultivation. Pioneering research has demonstrated the effectiveness of such curriculum implementations, revealing that, compared to traditional methods, students exhibit positive improvements in performance on the ACS conceptual exam and subsequent courses (Talanquer and Pollard, 2017). Nonetheless, evaluating the effectiveness of teaching implementations still requires a set of instruments specifically tailored to chemical thinking (Talanquer and Pollard, 2017; Talanquer, 2019), enabling targeted diagnostics of students’ development in this area. The findings of this study provide a practical and relevant instrument for assessing the efficacy of chemistry curriculum designs and instructional methods in secondary schools.

Implications

The implications of this study pertain to both teaching practice and the assessment research of students’ chemical thinking.

Implications for practice

Regarding teaching practice, the junior and senior high school chemistry curriculum standards in Mainland China explicitly emphasize the importance of enhancing teachers’ understanding of chemical ways of thinking and cultivating students’ core competencies in understanding and solving problems from the chemical perspectives (Ministry of Education, P. R. China, 2017, 2022; He et al., 2021). The EQ-P framework can serve as a guiding tool for implementing chemical thinking instruction (Chi et al., 2023). We have been conducting teaching practice projects based on “Disciplinary Essential Questions and Chemical Perspectives” since 2019. The primary method of this teaching project involves externalizing the disciplinary essential questions implicitly contained in each curriculum theme, encouraging students to develop various chemical perspectives for understanding and addressing these questions. Viewing essential questions from multiple chemical perspectives highlights the diversity of chemical thinking, and it is important for students to recognize this diversity. Encouraging the diverse development of students’ chemical thinking will prompt them to explore various potential solutions from multiple perspectives, which is valuable for fostering their systematic thinking, problem-solving and creativity (Choi et al., 2011; Broman et al., 2018; Martins Gomes and McCauley, 2021).

Implications for research

Regarding assessment research, the hypothesis model and research method of this study offer a new perspective for researchers to assess domain-specific thinking. On the one hand, the SOLO classification reflects students’ chemical thinking proficiencies through the progressive increase and organization of their chemical perspectives. This study further demonstrates that the SOLO classification is suitable as an analytical framework for understanding domain-specific thinking. This approach enriches our understanding of students’ conceptual sophistication, thereby more deeply characterizing and diagnosing the diversity of student thinking. On the other hand, although the Rasch model has been widely applied in ability assessment, this study demonstrates its further application in the field of chemical thinking. The partial credit model, applicable to various item formats, provides robust methodological guidance for developing quantitative assessment instruments for each essential question based on existing qualitative explorations (Cullipher et al., 2015; Weinrich and Talanquer, 2015; Yan and Talanquer, 2015).

Furthermore, there is a need to continuously explore and refine the chemical thinking assessment model. This study utilized the SOLO classification to investigate students’ potential to engage in sophisticated conceptual modes during chemical thinking. Previous studies have found that more advanced knowledge may lead individuals to construct less sophisticated but more targeted and productive explanations (Weinrich and Talanquer, 2016). In other words, among the various reasoning modes, such as descriptive or multicomponent (Sevian and Talanquer, 2014), students often do not need the most complex reasoning mode when solving real problems. However, this study indicates that students exhibit different levels of conceptual sophistication when asked to provide more complex argumentation. Students who have not established any chemical perspective or only apply partial perspectives may find it difficult to effectively solve all the problems at hand. In contrast, students with a sophisticated conceptual mode may have more options when facing problems and are more likely to find the most suitable path or reasoning mode to solve those problems. Nonetheless, the challenge remains in determining the most appropriate conceptual and reasoning mode for student problem-solving. Therefore, assessing students' chemical thinking requires not only recognizing conceptual sophistication and various reasoning modes but also determining which approach is most necessary for solving problems based on the nature of the problem or task (Weinrich and Talanquer, 2016; Talanquer, 2019). It is essential to develop an instrument based on a chemical thinking assessment framework that integrates ideas, reasoning, and practice to identify the most effective problem-solving strategies in future studies. (Talanquer, 2019).

Limitations

Our research has some limitations that should be addressed in future studies. Firstly, the EQ-P framework primarily took a disciplinary perspective, which caused us to overlook some important aspects of chemical thinking, such as evaluating the chemical impact on the environment (Sevian and Talanquer, 2014). In the current social context, chemistry knowledge and practice play an irreplaceable role in addressing social problems (Sevian and Talanquer, 2014; Banks et al., 2015; Cullipher et al., 2015; Zheng et al., 2019). The discipline of chemistry also provides unique perspectives to address benefit-cost-risk issues and make informed decisions, such as through green chemistry and risk assessment (Banks et al., 2015; Cullipher et al., 2015). Future research should further develop and confirm domain-specific perspectives on designing and evaluating the chemical transformation of matter to expand our framework. Additionally, it is necessary to introduce important application scenarios, such as energy, environment, and medical sciences, to develop instruments for assessing students’ chemical thinking proficiency in real-world problems.

Secondly, the majority of students who participated in the instrument administration have learned through traditional chemistry teaching methods. Our instrument still measured the chemical thinking proficiency of students who solve chemical problems by applying the content they have learned traditionally. However, a high exam performer may not necessarily exhibit higher levels of chemical thinking. It is crucial to further investigate whether our instruments can distinguish between students who excel in traditional exams and those who truly excel in chemical thinking. With the ongoing promotion of the “Disciplinary Essential Questions and Chemical Perspectives” teaching method in senior high schools in Mainland China, we plan to apply the instrument to assess students who have experienced this method in the future. We will compare their proficiency with that of students taught through traditional methods to further enhance the discriminant and concurrent validity of the measurement using the instrument.

Thirdly, the final version of the instrument required an excessive amount of time for students to complete. This may be due to the high number of items or our requirement for students to demonstrate sophisticated thinking levels as much as possible. This characteristic may render our instrument suitable for summative assessment purposes but is impractical for classroom administration by teachers. To mitigate this issue, categorizing the items into different essential questions categories (D&I, E&P, T&S) could yield three sub-instruments. However, further development is needed to create formative assessment instruments that are more suitable for classroom use by teachers.

Finally, the data were collected from a limited sample of students from two schools, which may not be representative of the broader population of students across different regions or educational settings. Future research should incorporate a more diverse sample from multiple schools and regions to enhance the generalizability of the results and provide a more comprehensive understanding of students’ chemical thinking proficiencies in various educational environments.

In conclusion, this study highlights the significance of assessing students’ development of chemical thinking and presents the development process of a high-quality measurement instrument. The research findings offer valuable insights for effective assessing and teaching chemical thinking within the context of school chemistry. Some limitations identified in this study will be addressed in future research.

Ethical statement

We conducted our research with a commitment to ethical rigor, carefully balancing the anticipated benefits of advancing educational assessment knowledge against any potential risks. Our ethical protocols included measures for maintaining participant privacy, managing researcher positionality, and ensuring the voluntary nature of participation. This approach underscores our commitment to ethical integrity and the reliability of our findings.

Data availability

The data supporting the findings of this study cannot be available due to ethical confidentiality requirements. The sensitive nature of the data and the need to protect participant privacy precludes us from sharing the data publicly. For further inquiries, please contact the corresponding author (zhengcl@nenu.edu.cn).

Conflicts of interest

The authors declare that they have no conflict of interest.

Acknowledgements

This paper is based upon work supported by several grants, such as National Social Science Fund the 14th Five-year Plan General Education Project under Grant No. HBA 210153, titled “Construction of a system of assessment indices for scientific literacy of primary and secondary school students: with the purpose of the cultivation of top innovative talents” and Northeast Normal University Teacher Education Major Project (Grant No. JSJY20220101), titled “Research on the Construction of a New System of Chemistry Teaching Theory from the Perspective of Discipline Understanding”. Additional support was provided by grants No. ZD22093 and No. 2022M720702. Any opinions, findings, and conclusions or recommendations expressed in the materials are those of the authors and do not necessarily reflect the views of those funding grants.

References

Abd-El-Khalick F., Summers R., Said Z., Wang S. and Culbertson M., (2015), Development and largescale validation of an instrument to assess Arabic-speaking students’ attitudes toward science, Int. J. Sci. Educ., 37(16), 2637–2663.
American Educational Research Association, (2014), American Psychological Association, & National Council on Measurement in Education. Standards for educational and psychological testing. American Educational Research Association.
Andrich D. and Marais I., (2019), A course in Rasch measurement theory: Measuring in the educational, social and human sciences. Springer Nature.
Banks G., Clinchot M., Cullipher S., Huie R., Lambertz J., Lewis R., Ngai C., Sevian H., Szteinberg G., Talanquer V. and Weinrich M., (2015), Uncovering chemical thinking in students’ decision making: a fuel-choice scenario. J. Chem. Educ., 92(10), 1610–1618 DOI:10.1021/acs.jchemed.5b00119.
Bensaude-Vincent B., (2009), The chemists’ style of thinking, Berichte zur Wissenschaftsgeschichte, 32(4), 365–378.
Biggs J. B. and Collis K. F., (1982), Origin and description of the SOLO taxonomy, Evaluating the Quality of Learning, Elsevier, pp. 17–31.
Bond T. G. and Fox C. M., (2007), Applying the Rasch model: fundamental measurement in the human sciences, 2nd edn, Mahwah, NJ: Lawrence Erlbaum.
Brabrand C. and Dahl B., (2009), Using the SOLO taxonomy to analyze competence progression of university science curricula, High. Educ., 58(4), 531–549 DOI:10.1007/s10734-009-9210-4.
Brigandt I., (2013), Explanation in biology: reduction, pluralism, and explanatory aims, Sci. Educ., 22(1), 69–91 DOI:10.1007/s11191-011-9350-7.
Broman K., Bernholt S. and Parchmann I., (2018), Using model-based scaffolds to support students solving context-based chemistry problems, Int. J. Sci. Educ., 40(10), 1176–1197 DOI:10.1080/09500693.2018.1470350.
Chandran S., Treagust D. F. and Tobin K., (1987), The role of cognitive factors in chemistry achievement, J. Res. Sci. Teach., 24(2), 145–160 DOI:10.1002/tea.3660240207.
Chi M., Zheng C. and He P., (2023), Reframing chemical thinking through the lens of disciplinary essential questions and perspectives for teaching and learning chemistry, Sci. Educ. DOI:10.1007/s11191-023-00438-3.
Chi M., Zheng C. and He P., (N.D.), Using the lens of essential questions-perspectives to investigate the representations of chemical thinking in Chinese secondary chemistry textbooks, Res. Sci. Educ., Under review.
Choi K., Lee H., Shin N., Kim S.-W. and Krajcik J., (2011), Re-conceptualization of scientific literacy in South Korea for the 21st century, J. Res. Sci. Teach., 48(6), 670–697 DOI:10.1002/tea.20424.
Crombie A., (1994), Styles of Scientific Thinking in the European Tradition, London: Gerald Duckworth.
Cullipher S., Sevian H. and Talanquer V., (2015), Reasoning about benefits, costs, and risks of chemical substances: mapping different levels of sophistication, Chem. Educ. Res. Pract., 16(2), 377–392 10.1039/c5rp00025d.
Danczak S. M., Thompson C. D. and Overton T. L., (2020), Development and validation of an instrument to measure undergraduate chemistry students’ critical thinking skills, Chem. Educ. Res. Pract., 21(1), 62–78 10.1039/c8rp00130h.
Deng J. M., Streja N. and Flynn A. B., (2021), Response process validity evidence in chemistry education research, J. Chem. Educ., 98(12), 3656–3666 DOI:10.1021/acs.jchemed.1c00749.
Draper S. W., (2009), Catalytic assessment: understanding how MCQs and EVS can foster deep learning, Br. J. Educ. Technol., 40(2), 285–293 DOI:10.1111/j.1467-8535.2008.00920.x.
Duncan P. W., Bode R. K., Min Lai S., Perera S. and Glycine Antagonist in Neuroprotection Americans Investigators, (2003), Rasch analysis of a new stroke-specific outcome scale: the Stroke Impact Scale, Arch. Phys. Med. Rehabil., 84(7), 950–963 DOI:10.1016/s0003-9993(03)00035-2.
Eskandar F.-A., Bayrami M., Vahedi S. and Abdollahi Adli Ansar V., (2013), The effect of instructional analogies in interaction with logical thinking ability on achievement and attitude toward chemistry, Chem. Educ. Res. Pract., 14(4), 566–575 10.1039/c3rp00036b.
Evangelou F. and Kotsis K., (2019), Real vs virtual physics experiments: comparison of learning outcomes among fifth grade primary school students. A case on the concept of frictional force, Int. J. Sci. Educ., 41(3), 330–348 DOI:10.1080/09500693.2018.1549760.
Feldman-Maggor Y., Tuvi-Arad I. and Blonder R., (2022), Development and evaluation of an online course on nanotechnology for the professional development of chemistry teachers, Int. J. Sci. Educ., 44(16), 2465–2484 DOI:10.1080/09500693.2022.2128930.
Freire M., Talanquer V. and Amaral E., (2019), Conceptual profile of chemistry: a framework for enriching thinking and action in chemistry education, Int. J. Sci. Educ., 41(5), 674–692 DOI:10.1080/09500693.2019.1578001.
Giere R. N., (2006), Scientific perspectivism, Chicago: University of Chicago Press.
Griesemer J., (2011), Philosophy and tinkering, Biol. Philos., 26(2), 269–279 DOI:10.1007/s10539-008-9131-0.
Hacking I., (2002), ‘Style’ for historians and philosophers, historical ontology, Cambridge: Harvard University Press.
Hardy J., Bates S. P., Casey M. M., Galloway K. W., Galloway R. K., Kay A. E., Kirsop P. and McQueen H. A., (2014), Student-Generated Content: enhancing learning through sharing multiple-choice questions, Int. J. Sci. Educ., 36(13), 2180–2194 DOI:10.1080/09500693.2014.916831.
He P., Liu X., Zheng C. and Jia M., (2016), Using Rasch measurement to validate an instrument for measuring the quality of classroom teaching in secondary chemistry lessons, Chem. Educ. Res. Pract., 17(2), 381–393 10.1039/c6rp00004e.
He P., Zhai X., Shin N. and Krajcik J., (2023), Applying Rasch measurement to assess knowledge-in-use in science education, in X. Liu and W. J. Boone, (ed.), Advances in Applications of Rasch Measurement in Science Education. Contemporary Trends and Issues in Science Education, vol. 57, Cham: Springer DOI:10.1007/978-3-031-28776-3_13.
He P., Zheng C. and Li T., (2021), Development and validation of an instrument for measuring Chinese chemistry teachers’ perceptions of pedagogical content knowledge for teaching chemistry core competencies, Chem. Educ. Res. Pract., 22(2), 513–531.
He P., Zheng C. and Li T., (2022a), High school students’ conceptions of chemical equilibrium in aqueous solutions: development and validation of a two-tier diagnostic instrument, J. Baltic Sci. Educ., 21(3), 428–444 DOI:10.33225/jbse/22.21.428.
He P., Zheng C. and Li T., (2022b), Development and validation of an instrument for measuring Chinese chemistry teachers’ perceived self-efficacy towards chemistry core competencies, Int. J. Sci. Math. Educ., 20(7),1337–1359 DOI:10.1007/s10763-021-10216-8.
Hoffmann R., (1995), The same and not the same, New York: Columbia University Press.
Johnstone A. H., (1982), Macro- and micro-chemistry, School Sci. Rev., 64, 377–379.
Koretsky M. D., Brooks B. J. and Higgins A. Z., (2016), Written justifications to multiple-choice concept questions during active learning in class, Int. J. Sci. Educ., 38(11), 1747–1765 DOI:10.1080/09500693.2016.1214303.
Kraft A., Strickland A. M. and Bhattacharyya G., (2010), Reasonable reasoning: multi-variate problem-solving in organic chemistry, Chem. Educ. Res. Pract., 11(4), 281–292 10.1039/c0rp90003f.
Krell M., Mathesius S., van Driel J., Vergara C. and Krüger D., (2020), Assessing scientific reasoning competencies of pre-service science teachers: translating a German multiple-choice instrument into English and Spanish, Int. J. Sci. Educ., 42(17), 2819–2841 DOI:10.1080/09500693.2020.1837989.
Laius A., Kask K. and Rannimäe M., (2009), Comparing outcomes from two case studies on chemistry teachers’ readiness to change, Chem. Educ. Res. Pract., 10(2), 142–153 10.1039/b908251b.
Landa I., Westbroek H., Janssen F., van Muijlwijk J. and Meeter M., (2020), Scientific perspectivism in secondary-school chemistry education, Sci. Educ., 29(5), 1361–1388 DOI:10.1007/s11191-020-00145-3.
Laudan L., (1992), Progress and its problems: Towards a theory of scientific growth, University of California Press.
Lawson A. E., (1979), The developmental learning paradigm, J. Res. Sci. Teach., 16(6), 501–515 DOI:10.1002/tea.3660160604.
Lewis S. E., (2022), Considerations on validity for studies using quantitative data in chemistry education research and practice, Chem. Educ. Res. Pract., 23(4), 764–767 10.1039/d2rp90009b.
Lewis S. E. and Lewis J. E., (2007), Predicting at-risk students in general chemistry: comparing formal thought to a general achievement measure, Chem. Educ. Res. Pract., 8(1), 32–51 10.1039/b6rp90018f.
Li T., He P. and Peng L., (2024), Measuring high school student engagement in science learning: an adaptation and validation study, Int. J. Sci. Educ., 46(6), 524–547 DOI:10.1080/09500693.2023.2248668.
softwareLinacre J. M., (2011), WINSTEPS (version 3.72.0) [Computer program]. Retrieved from https://www.winsteps.com.
softwareLinacre J. M., (2022), Winsteps® Rasch measurement computer program User's Guide, https://www.winsteps.com.
Liu X. F., (2010), Using and developing measurement instruments in science education: A Rasch modeling approach, Charlotte, NC: Information Age.
Liu X. and Boone W. J., (2006), Applications of Rasch measurement in science education, JAM Press.
Liu X. and Boone W. J., (2023), Advances in applications of Rasch measurement in science education, 1st edn, Springer International Publishing.
Loh A. S. L. and Subramaniam R., (2018), Mapping the knowledge structure exhibited by a cohort of students based on their understanding of how a galvanic cell produces energy, J. Res. Sci. Teach., 55(6), 777–809 DOI:10.1002/tea.21439.
Luo X., Wei B., Shi M. and Xiao X., (2020), Exploring the impact of the reasoning flow scaffold (RFS) on students’ scientific argumentation: based on the structure of observed learning outcomes (SOLO) taxonomy, Chem. Educ. Res. Pract., 21(4), 1083–1094 10.1039/c9rp00269c.
Macrie-Shuck M. and Talanquer V., (2020), Exploring students’ explanations of energy transfer and transformation, J. Chem. Educ., 97(12), 4225–4234 DOI:10.1021/acs.jchemed.0c00984.
Mandinach E. B., Bridgeman B., Cahalan-Laitusis C. and Trapani C., (2005), The impact of extended time on sat® test performance, ETS Research Report Series, 2005(2), i–35 DOI:10.1002/j.2333-8504.2005.tb01997.x.
Martins Gomes D. and McCauley V., (2021), Creativity in science: a dilemma for informal and formal education, Sci. Educ., 105(3), 498–520 DOI:10.1002/sce.21614.
Ministry of Education, P. R. China, (2017), Chemistry curriculum standards for senior high school, Beijing: People's Education Press.
Ministry of Education, P. R. China, (2022), Chemistry curriculum standards for compulsory education, Beijing: People's Education Press.
Minogue J. and Jones G., (2009), Measuring the impact of haptic feedback using the SOLO taxonomy, Int. J. Sci. Educ., 31(10), 1359–1378 DOI:10.1080/09500690801992862.
Moreira P., Marzabal A. and Talanquer V., (2019), Using a mechanistic framework to characterise chemistry students’ reasoning in written explanations, Chem. Educ. Res. Pract., 20(1), 120–131 10.1039/c8rp00159f.
National Research Council (NRC), (2003), Beyond the molecular frontier: challenges for chemistry and chemical engineering, Washington, DC: National Academy Press.
National Research Council, (2013), The Next Generation Science Standards, The National Academies Press: Washington, DC.
Neumann K., Viering T., Boone W. J. and Fischer H. E., (2013), Towards a learning progression of energy, J. Res. Sci. Teach., 50(2), 162–188 DOI:10.1002/tea.21061.
Ngai C. and Sevian H., (2017), Capturing chemical identity thinking, J. Chem. Educ., 94(2), 137–148 DOI:10.1021/acs.jchemed.6b00387.
Ngai C., Sevian H. and Talanquer V., (2014), What is this Substance? What Makes it Different? Mapping Progression in Students’ Assumptions about Chemical Identity, Int. J. Sci. Educ., 36(14), 2438–2461 DOI:10.1080/09500693.2014.927082.
Nicol D., (2007), E-assessment by design: using multiple-choice tests to good effect, J. Further High. Educ., 31(1), 53–64 DOI:10.1080/03098770601167922.
Ottenhofm K., Westbroek H., van Muijlwijk-Koezen J., Meeter, M. and Janssen, F., (2022), Enhancing ecological hierarchical problem-solving with domain-specific question agendas, Int. J. Sci. Educ., 44 (17), 2565–2588 DOI:10.1080/09500693.2022.2138728.
Park M. and Liu X., (2016), Assessing understanding of the energy concept in different science disciplines: assessing understanding of the energy, Sci. Educ., 100(3), 483–516 DOI:10.1002/sce.21211.
Peterson R. F., Treagust D. F. and Garnett P., (1989), Development and application of a diagnostic instrument to evaluate grade-11 and -12 students’ concepts of covalent bonding and structure following a course of instruction, J. Res. Sci. Teach., 26(4), 301–314 DOI:10.1002/tea.3660260404.
Quinn F., Pegg J. and Panizzon D., (2009), First-year Biology Students’ Understandings of Meiosis: an investigation using a structural theoretical framework, Int. J. Sci. Educ., 31(10), 1279–1305 DOI:10.1080/09500690801914965.
Restrepo G. and Villaveces J., (2012), Mathematical Thinking in Chemistry. HYLE – International Journal for Philosophy of Chemistry, 18(1), 3–22.
Romine W. L., Schaffer D. L. and Barrow L., (2015), Development and Application of a Novel Rasch-based Methodology for Evaluating Multi-Tiered Assessment Instruments: validation and utilization of an undergraduate diagnostic test of the water cycle, Int. J. Sci. Educ., 37(16), 2740–2768 DOI:10.1080/09500693.2015.1105398.
Rundgren C.-J., Hirsch R., Chang Rundgren S.-N. and Tibell L. A. E., (2012), Students’ communicative resources in relation to their conceptual understanding—the role of non-conventionalized expressions in making sense of visualizations of protein function, Res. Sci. Educ., 42(5), 891–913 DOI:10.1007/s11165-011-9229-2.
Schreurs D. G., Trate J. M., Srinivasan S., Teichert M. A., Luxford C. J., Schneider J. L. and Murphy K. L., (2024), Investigation into the intersection between response process validity and answer-until-correct validity: development of the repeated attempt processing issue detection (RAPID) method, Chem. Educ. Res. Pract., 25(2), 560–576 10.1039/d3rp00204g.
Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15(1), 10–23 10.1039/c3rp00111c.
Sjöström J., (2006), Beyond classical chemistry: subfields and metafields of the molecular sciences, Chem. Int., 28, 9–15.
Smith, Jr.E. V., (2005), Effect of item redundancy on Rasch item and person estimates, J. Appl. Measurement, 6(2), 147–163.
Stammes H., Henze I., Barendsen E. and de Vries M., (2022), Characterizing conceptual understanding during design-based learning: analyzing students’ design talk and drawings using the chemical thinking framework, J. Res. Sci. Teach., 60(3), 643–674 DOI:10.1002/tea.21812.
Stamovlasis D., Tsitsipis G. and Papageorgiou G., (2010), The effect of logical thinking and two cognitive styles on understanding the structure of matter: an analysis with the random walk method, Chem. Educ. Res. Pract., 11(3), 173–181 10.1039/c005466f.
Taber K. S., (2014), The significance of implicit knowledge for learning and teaching chemistry, Chem. Educ. Res. Pract., 15(4), 447–461 10.1039/c4rp00124a.
Talanquer V., (2011), School Chemistry: The Need for Transgression, Sci. Educ., 22(7), 1757–1773 DOI:10.1007/s11191-011-9392-x.
Talanquer V., (2019), Assessing for chemical thinking, Research and Practice in Chemistry Education, Springer Singapore, pp. 123–133 DOI:10.1007/978-981-13-6998-8_8.
Talanquer V. and Pollard J., (2010), Let's teach how we think instead of what we know, Chem. Educ. Res. Pract., 11(2), 74–83 10.1039/c005349j.
Talanquer V. and Pollard J., (2017), Reforming a large foundational course: successes and challenges, J. Chem. Educ., 94(12), 1844–1851 DOI:10.1021/acs.jchemed.7b00397.
Tomasevic B. and Trivic D., (2014), Creativity in teaching chemistry: how much support does the curriculum provide? Chem. Educ. Res. Pract., 15(2), 239–252 10.1039/c3rp00116d.
Treagust D. F., (1988), Development and use of diagnostic tests to evaluate students’ misconceptions in science, Int. J. Sci. Educ., 10(2), 159–169 DOI:10.1080/0950069880100204.
Treagust D. F., Chandrasegaran A. L., Zain A. N. M., Ong E. T., Karpudewan M. and Halim L., (2011), Evaluation of an intervention instructional program to facilitate understanding of basic particle concepts among students enrolled in several levels of study, Chem. Educ. Res. Pract., 12(2), 251–261 10.1039/c1rp90030g.
Tsui C.-Y. and Treagust D. F., (2003), Genetics reasoning with multiple external representations, Res. Sci. Educ., 33(1), 111–135.
Wan Y., Yao R., Li Q. and Bi H., (2023), Views of Chinese middle school chemistry teachers on critical thinking, Chem. Educ. Res. Pract., 24(1), 161–175 10.1039/d2rp00237j.
Wang Z., Chi S., Luo M., Yang Y. and Huang M., (2017), Development of an instrument to evaluate high school students’ chemical symbol representation abilities, Chem. Educ. Res. Pract., 18(4), 875–892 10.1039/c7rp00079k.
Waugh R. T., (2007), Rasch measurement model, in N. J. Salkind (ed.), Encyclopedia of measurement and statistics, Thousand Oaks, CA: Sage, pp. 820–825.
Weinrich, M. L. and Talanquer, V., (2015), Mapping students’ conceptual modes when thinking about chemical reactions used to make a desired product, Chem. Educ. Res. Pract., 16(3), 561–577 10.1039/c5rp00024f.
Weinrich M. L. and Talanquer V., (2016), Mapping students’ modes of reasoning when thinking about chemical reactions used to make a desired product, Chem. Educ. Res. Pract., 17(2), 394–406 10.1039/c5rp00208g.
Yan F. and Talanquer V., (2015), Students’ Ideas about How and Why Chemical Reactions Happen: mapping the conceptual landscape, Int. J. Sci. Educ., 37(18), 3066–3092 DOI:10.1080/09500693.2015.1121414.
Yang Y., He P. and Liu X., (2018), Validation of an instrument for measuring students’ understanding of interdisciplinary science in grades 4–8 over multiple semesters: a Rasch measurement study, Int. J. Sci. Math. Educ., 16(4), 639–654 DOI:10.1007/s10763-017-9805-7.
Zheng C., Li L., He P. and Jia M., (2019), The development, validation, and interpretation of a content coding map for analyzing chemistry lessons in Chinese secondary schools, Chem. Educ. Res. Pract., 20(1), 246–257 10.1039/c8rp00085a.
Zoller U. and Pushkin D., (2007), Matching Higher-Order Cognitive Skills (HOCS) promotion goals with problem-based laboratory practice in a freshman organic chemistry course, Chem. Educ. Res. Pract., 8(2), 153–171 10.1039/b6rp90028c.

Footnote

† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4rp00106k

Click here to see how this site uses Cookies. View our privacy policy here.