Examining the diversity of scientific methods in college entrance chemistry examinations in China

Yufeng Xu; Huinan Liu; Bo Chen; Sihui Huang; Chongyu Zhong

doi:10.1039/D2RP00235C

DOI: 10.1039/D2RP00235C (Paper) Chem. Educ. Res. Pract., 2023, 24, 494-508

Yufeng Xu ^a, Huinan Liu ^b, Bo Chen *^a, Sihui Huang ^a and Chongyu Zhong ^a
^aSchool of Chemistry and Chemical Engineering, Nantong University, Nantong, P. R. China. E-mail: njcb0128@aliyun.com
^bSchool of Teacher Education, Nantong University, Nantong, P. R. China

Received 12th August 2022 , Accepted 18th November 2022

First published on 20th November 2022

Abstract

Scientific methods have received widespread attention in recent years. Based on an analytical framework derived from Brandon's matrix, which consists of four categories of scientific methods, this paper presents a content analysis examining how the diversity of scientific methods is represented in college entrance chemistry examination papers from three exam boards in China. It was found that the percentages of the four categories of scientific methods in the examination papers varied significantly across the exam boards, highlighting an imbalanced representation of scientific methods. Furthermore, among the four categories of scientific methods, non-manipulative parameter measurement (Non-MPM) accounted for the largest proportion in each examination paper, while the proportion of manipulative hypothesis testing (MHT) was very small, indicating that the practical chemistry items in China are less experimental. At the end of this paper, the implications of the findings and suggestions for further studies are discussed.

Introduction

From the perspective of learning, science incorporates four aspects, namely, content or product, process or method, attitude, and technology. Regarded as a process or method, science is a process of acquiring knowledge. Generally speaking, in the philosophy of science, the description and cognition of the ‘scientific method’ have undergone a transformation from ‘the scientific method’ to ‘scientific methods’ (Erduran and Dagher, 2014; Woodcock, 2014). The former regards scientific research as a linear step-wise process, while the latter emphasizes the diversity of scientific methods. Nowadays, the diversity of scientific methods has received increasing attention and has been considered one of the major concerns for teaching and learning in the field of science education (Duschl et al., 2007; Hodson, 2014). Through scientific methods, it will be easier for the general public and young learners to understand what scientists do and the complexity of scientific work (Erduran and Dagher, 2014). More importantly, after internalizing scientific methods into their own way of thinking and behaving, students could develop habits of mind about knowledge acquisition and avoid the misconception that problems are solved by experience alone. It should be noted that scientific methods are different from the basic methods used in chemical experiments (such as distillation, extraction, filtration, etc.), which refer to specific steps of experimental operations. Scientific methods, by contrast, are methodological in nature.

As an important component of scientific literacy, scientific methods have been emphasized in curriculum reform documents by different countries. For instance, the National curriculum in England: science programmes of study (DFE, 2015) set clear requirements for the cultivation of scientific methods and proposed the arrangements for training students’ scientific methods at different stages. In another example, Next Generation Science Standards (NGSS Lead States, 2013) in the USA stated that scientific investigations use diverse methods to obtain data. Similarly, the newly promulgated Chemistry Curriculum Standards of Senior High School (the 2017 version) in China also focused on scientific methods, indicating that ‘scientific inquiry’ is regarded as one aspect of the chemistry core competencies, and emphasized the cultivation of method elements (e.g., question, evidence, explanation and communication) (MoE, 2018).

Given the importance of scientific methods, their evaluation deserves attention, especially in summative assessments. At present, summative assessments of scientific methods in many countries are presented in the practical items of high-stakes paper-and-pencil examinations (Cullinane et al., 2019), rather than in ‘hands-on’ evaluations of practical science skills, such as in England (Ofqual, 2015) and Canada (Hollins and Reiss, 2016). This is also the case in China. Scientific methods and practical science skills of high school students are not evaluated through activity performance assessments, but through standardized paper-and-pencil tests (i.e., college entrance examinations, called gaokao in China), which determine whether or not students can be admitted to a university. The assessment regime has a great effect on the amount and quality of practical work implemented by teachers (Erduran et al., 2019). Especially in China, the ‘culture of examination’ is prevalent and the education system is dominated by high-stakes exams (Gu, 2004). Some studies have shown that college entrance examinations influence the classroom implementation of inquiry activities in China (e.g., Chen and Wei, 2015; Zhang et al., 2003). Hence, it is necessary to examine the characteristics of scientific methods presented in Chinese college entrance chemistry examination papers. The significance of this study is to provide empirical data for revealing the features of summative assessments of scientific methods in China. Furthermore, it provides implications for realizing the diversity of scientific methods in summative assessments, especially in college entrance examinations, which ultimately promotes students’ understanding of scientific methods.

Theoretical framework

Scientific methods in science education

The notion of ‘the scientific method’ was first introduced to American science education in the late 19th century, as an emphasis on formalist laboratory methods that contributed to scientific facts (Hodson, 1996; Rudolph, 2005). Then, it evolved into a cognitive prescription after Dewey (1910) summarized the analysis of reflective thinking into a five-step ‘complete act of thoughts’. Since then, the myth of the existence of a universal method has been pervasive in science education. Specifically, ‘the scientific method’ was often employed in science curricula to set out the specific steps that scientists followed in their research, which typically include observing, making a hypothesis, experimenting, analyzing data, confirming or rejecting the hypothesis, and making conclusions (Woodcock, 2014). Subsequently, the idea of ‘the scientific method’ has also been promoted in textbooks as a unique step-wise process through which scientists could test their hypotheses and draw conclusions (Ioannidou and Erduran, 2021). This mindset was attractive in teaching because it represented a simple and reproducible process through which students could plan, conduct, and communicate scientific inquiries.

However, the above description of ‘the scientific method’ has been increasingly criticized by the science education literature (Dodick et al., 2009; Woodcock, 2014) because it implied a cognitive misunderstanding, that is, there was a unified and interdisciplinary method for scientific practice, which reflected a preference for experimental investigations (Cleland, 2001). To support the critique of ‘the scientific method’, relevant studies have proved that scientists employed a variety of methods to obtain scientific evidence, including non-experimental methods such as natural observation and historical investigations (Bauer, 1994; Gray, 2014). As emphasized by Cleland (2001), experimental scientists and historical scientists had different concerns and methods in their respective research; nevertheless, the discrepancies in methodology failed to support the claim that historical sciences are methodologically inferior to experimental sciences. In other words, scientists’ choice of methods often depends on the nature of the problem to be solved and the availability of tools and methods at a given time. For instance, astronomers often use telescopes, employing observational methods, to collect information on celestial objects. Although no experiments and no hypothesis testing were involved in these investigations, the methods utilized are valid and legitimate in gathering information and presenting evidence.

Based on the above discussions, we argued that authentic scientific methods should be presented in school science, that is, teachers and students need to be exposed to different types of scientific research methods. In this way, school science could enable both teachers and students to comprehend the various methods of obtaining data, for example, by taking into account the different scientific cultures and traditions that accounted for the evolution of the methods (Ioannidou and Erduran, 2021). Moreover, it has been explicitly advocated by the learning outcomes at each grade level to convey a pluralistic orientation to scientific methods (Erduran and Dagher, 2014). In this regard, science education ought to promote the idea that not one method or particular line of evidence, but rather the convergence from various sources, can support scientific theoretical claims, thus leading to scientific explanations. Hence, we take the diversity of scientific methods as the basic stance in this study.

Brandon's matrix

Regarding the diversity of scientific methods, many efforts have been made to represent a broader range of scientific methods by proposing alternative frameworks to ‘the scientific method’ (e.g., Lawson, 2003; Turner, 2013; Irzik and Nola, 2014). A representative framework is Brandon's matrix (Brandon, 1994), which is unique because it provides a summary of scientific methods via a taxonomy of the diversity in scientific investigations. Firstly, Brandon (1994) explicitly made the distinction between experiments and observations, and the distinction between experiments and descriptions. Then, he defined the term ‘manipulation’ in a rigorous way, as ‘the deliberate alteration of phenomena’ (p. 61). On this basis, Brandon (1994) proposed that a scientific investigation can be classified according to two criteria: (a) whether or not it involves manipulation (an investigation without manipulation relies only on observation), and (b) whether it includes hypothesis testing or parameter measurement (description). This idea was delineated by a two-by-two matrix (Table 1), which included four categories of scientific methods, representing the connection between experiments and observations. Through this matrix, Brandon (1994) illustrated that not all investigations in science relied on hypothesis testing and that not all descriptive work was non-manipulative.

Table 1 Four categories of scientific methods in Brandon's matrix

	Manipulative	Non-manipulative
Parameter measurement	Manipulative parameter measurement (MPM)	Non-manipulative parameter measurement (Non-MPM)
Hypothesis testing	Manipulative hypothesis testing (MHT)	Non-manipulative hypothesis testing (Non-MHT)

As shown in Table 1, the four categories of scientific methods are: (1) manipulative parameter measurement (MPM), (2) non-manipulative parameter measurement (Non-MPM), (3) manipulative hypothesis testing (MHT), and (4) non-manipulative hypothesis testing (Non-MHT). According to Ioannidou and Erduran (2021), in Brandon's matrix, there was no hierarchy among different categories of methods, which means that experimentation (or the manipulation of a variable) and hypothesis testing are regarded as possibilities rather than necessities for scientific investigations. As Brandon (1994) emphasized, for scientists, scientific investigations should be guided by research questions, existing resources and professional knowledge rather than a hierarchy that makes some methods more reliable than others. Moreover, Brandon (1994) pointed out that these methods can be viewed as components of two continua, which ranged from testing to not testing and from manipulation to non-manipulation. Fig. 1 illustrates this relationship, where investigations can be viewed as more (upper left corner) or less (lower right corner) experimental.
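The two criteria behind Table 1 can be expressed as a small lookup. The following is a minimal Python sketch, offered only as an illustration; the function name `classify_investigation` and its two boolean parameters are assumptions introduced here and do not come from Brandon (1994).

```python
def classify_investigation(manipulative: bool, tests_hypothesis: bool) -> str:
    """Return the Table 1 category label for one investigation.

    Both parameters are hypothetical stand-ins for the two criteria in
    Brandon's matrix: whether a variable is deliberately manipulated, and
    whether the investigation tests a hypothesis (as opposed to measuring
    a parameter).
    """
    prefix = "" if manipulative else "Non-"
    kind = "MHT" if tests_hypothesis else "MPM"
    return prefix + kind


# Enumerate the four cells of the two-by-two matrix.
for manipulative in (True, False):
    for tests_hypothesis in (False, True):
        label = classify_investigation(manipulative, tests_hypothesis)
        print(f"manipulative={manipulative}, tests_hypothesis={tests_hypothesis}: {label}")
```

Because the function treats the two criteria symmetrically, it mirrors the point that the matrix imposes no hierarchy among the four categories.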


	Fig. 1 Representation of the space of experimentality (Brandon, 1994, p. 66).

In reality, there is a close interrelation between ‘scientific methods’ and ‘science practices’, given that they both reflect the contemporary understanding of the nature of science (NOS) (e.g., Abd-El-Khalick and Lederman, 2000; Osborne et al., 2003; Erduran and Dagher, 2014; NSTA, 2020). In addition, the change from ‘scientific methods’ to ‘scientific processes’ and then to ‘science practices’ in the history of science education has been reviewed by Bybee (2011), who highlighted the development and connection between these terms. More specifically, ‘scientific processes’ usually refer to how scientific research is done, emphasizing particular skills such as the manipulation of variables. Typical science practices suggested by the National Research Council (NRC, 2012) are classified into eight categories, one of which is ‘planning and carrying out investigations’. According to Wei and Wang (2021), this category can be divided into exploratory investigations and confirmatory investigations, the former involving parameter measurement and the latter involving hypothesis testing. To sum up, the diversity of ‘scientific methods’ proposed in Brandon's matrix is to some extent consistent with the connotation of ‘science practices’, both of which involve variable manipulation, parameter measurement and hypothesis testing.

Recently, Brandon's matrix, as an analytical framework, has been investigated both theoretically (Erduran and Dagher, 2014) and empirically (e.g., Cullinane et al., 2019; Wei et al., 2022). Erduran and Dagher (2014) first proposed the application of Brandon's matrix as a useful tool to illustrate the diversity of scientific methods and provided an example (the periodicity of elements) in chemistry. After that, Cullinane et al. (2019) used the matrix to investigate what methods underlie the practical items in high-stakes chemistry examination papers in England. The results showed that non-manipulative parameter measurement dominated the exam papers, while manipulative hypothesis testing was the category with the lowest frequency of items, highlighting an imbalance in the representation of scientific methods in chemistry examinations. In addition, Wei et al. (2022) employed the matrix to examine how the diversity of scientific methods is presented in practical work in Chinese school science textbooks, and found that the distribution of the four categories of scientific methods varied across science textbooks, and that the predominant category was non-manipulative parameter measurement. The researchers concluded that an imbalance of scientific methods existed in the textbooks of each subject (chemistry, physics, and biology). Given that empirical studies have verified the reliability and validity of Brandon's matrix and shown it to be suitable both for the practical items in chemistry examination papers (Cullinane et al., 2019) and for the educational context in China (Wei et al., 2022), we adopted it as the analytical framework in this study.

The illustrations of four categories of scientific methods

When Brandon's matrix is introduced to analyze college entrance chemistry examination papers, the characteristics of the four categories of scientific methods need to be illustrated in the context of practical items. The illustrations were formulated with reference to Cullinane et al. (2019) and to the presentation form of the chemistry items of college entrance examinations in China. To establish the validity of the analytical framework, three experts were consulted for their comments on the illustrations of the four categories of scientific methods. The experts are professors in the area of chemistry education and are familiar with scientific methods and Chinese college entrance chemistry examinations. They were asked to give opinions on whether or not the illustrations were clear and whether or not the items could be classified according to them. This work helped us to make the illustrations of the four categories of scientific methods clearer and more complete. For instance, in the initial illustrations, the presentation of hypothesis testing in practical items was divided into only two cases. Two of the three experts suggested adding a third case, where the items provide the question to be explored, the prediction of the question, and relevant experimental data, and then require students to judge and explain whether or not the experimental data support the previous prediction. Hence, we followed the advice of the two experts and added the third case in the final version. The specific illustrations of the four categories of scientific methods are described below.

Manipulative parameter measurement (MPM). There are some subtle differences regarding parameter measurement in practical items and practical work. Since students are unable to actually measure parameters (e.g., phenomenon, state, and volume) in the context of paper-and-pencil tests, they can only understand the process of investigation through the steps or phenomena described in the items, and conduct data calculation and analysis according to the known conditions given in the items. This category of method is characterized by the inclusion of variable manipulation, but does not require students to test hypotheses or make predictions about the phenomenon to be explored.

Non-manipulative parameter measurement (Non-MPM). This category of method is consistent with the previous one in that both belong to parameter measurement. It mainly examines the students’ ability to analyze chemistry practical items in combination with data (such as diagrams and tables), including data calculations, phenomenon explanations, scheme improvements, etc. This category of method is characterized by the exclusion of variable manipulation, and does not require students to test hypotheses or make predictions about the phenomenon to be explored.

Manipulative hypothesis testing (MHT). The representation of hypothesis testing in practical items has generally been divided into the following three cases. The first one is that a hypothesis for the question to be explored has been put forward in the item, and students are required to design a scheme to verify it. The second is that the question to be explored and relevant methods have been articulated in the item, and the students are required to make predictions about results of the experiment and make explanations about the reasons for the prediction. The third one is that the question to be explored, the prediction of the question, and relevant experimental data have been provided in the item, and the students are required to judge and explain whether or not the experimental data support the previous prediction. In general, this category of method is characterized by the inclusion of variable manipulation, and involves the element of hypotheses or predictions.

Non-manipulative hypothesis testing (Non-MHT). This category of method is consistent with the third category in that both belong to hypothesis testing. This category mainly examines the students’ ability to make experimental hypotheses and predictions based on questions or data, including predicting phenomena, verifying hypotheses, demonstrating predictions, etc. This category of method is characterized by the exclusion of variable manipulation, but involves hypotheses or predictions about the question to be investigated.

Research questions

The present study aims to examine how the diversity of scientific methods is represented in college entrance chemistry examination papers from three exam boards in China. Specifically, the research questions of this study are posed as follows:

(1) What scientific methods underlie the practical items in college entrance chemistry examinations?

(2) How does the coverage of scientific methods compare across the three exam boards?

Methodology

Content analysis was adopted as the research methodology in this study, which aims to discover and describe the phenomena under consideration by compressing large volumes of words into fewer content categories based on explicit encoding rules (Stemler, 2001; Berg and Lune, 2017). In this study, the examination was based on the aforementioned analytical framework and aims to provide evidence for the research questions raised earlier.

Context

The college entrance examination, which is one of the most influential national exams in China, refers to the selective qualification test before higher education, which should be taken by high school graduates (usually 18 years old) or candidates with an equivalent education level at a unified time in order to be admitted into the university. Due to discrepancies in the levels of education development in different regions, college entrance examination papers in China are generally administrated in two ways. One is administrated by the education examinations authority of the Ministry of Education (called the National Paper), which is used in most regions of China; the other is autonomously administrated by the education examinations authority of a province or municipality with higher education levels, granted by the Ministry of Education, which is used locally.

Given the significance of college entrance examinations, the education examinations authority (attached to the Ministry of Education, a province or municipality) carefully selects the exam writers every year. The team of exam writers for each subject is usually composed of several subject experts and subject education experts. They are university professors and subject teaching supervisors.† The exam writers for each subject change annually. After the examination paper is compiled, it is reviewed by the education examinations authority, which is responsible for its validity. Relevant studies have confirmed that Chinese college entrance examination papers have shown good reliability and validity in recent years (Zhao et al., 2022). In order to ensure the fairness of college entrance examinations, test papers are strictly confidential before the examination. All exam writers must design the test papers together at a location arranged by the education examinations authority under closed-off management. They cannot leave until the college entrance examinations have ended, and they must sign a non-disclosure agreement to keep the team members, process and details of the test paper design secret.

Although college entrance chemistry examination papers in China are administrated by different exam boards, the assessment is based on the same guidelines, that is Chemistry Curriculum Standards of Senior High School (the 2017 version), including a total of 14 content topics (e.g., chemical science and experimental inquiry, common inorganic substances and their applications, structural basis of substances and rules of chemical reaction, simple organic compounds and their applications, chemistry and social development, etc.). In addition, the curriculum standards divide the academic quality (i.e., the academic achievements of students who have completed chemistry courses) into 4 levels, and clearly stipulate that Level 4 is the basis for the design of college entrance examination papers. Four aspects are incorporated in Level 4 (see Table 2), each of which corresponds to certain chemistry core competencies (Wei, 2019). Aspect I focuses on ‘macroscopic identification and microscopic analysis’ and ‘evidence-based reasoning and modeling’; Aspect II stresses ‘changes and equilibrium’; Aspect III highlights ‘scientific inquiry and innovation’; and Aspect IV underlines ‘scientific attitude and social responsibility’. As can be seen from Table 2, the characteristics of variable manipulation, parameter measurement and hypothesis testing are described in Level 4 of the academic quality. Finally, the curriculum standards put forward four principles for design of the examination papers: (1) the core competencies should be taken as the test purpose; (2) the real situation should be taken as the test carrier; (3) actual problems should be taken as the test task; and (4) chemical knowledge should be taken as the tool for solving problems.

Table 2 Contents for Level 4 of the academic quality

Aspect	Contents
I	Students can choose different methods according to their needs, and analyze and infer substances and their changes from different angles; they can explain or predict substances according to their types, compositions, particle structures, and inter-particle forces as well as provide reasons for their predictions; they can analyze and characterize the energy transformation in substance changes from the perspectives of macroscopic and microscopic, qualitative and quantitative, etc.; they can propose suggestions about the application of substances in production, life, science and technology based on the properties of them.
II	Students can comprehensively analyze reaction conditions from the aspects of regulating the reaction rate and improving the reaction conversion rate, and propose measures to effectively control the reaction conditions; they can choose concise and reasonable characterizations to describe and explain the nature and laws of chemical changes, predict the products of transformation based on chemical reaction laws, and determine the evidence to test the prediction made; they can put forward practical suggestions for using chemical change to realize energy storage and release based on the law of energy transformation, design the preparation of inorganic compounds and the synthesis of organic compounds based on the concept of ‘green chemistry’, and comment or optimize the schemes; they can analyze and evaluate the impact of the substance transformation processes on the environment and resource utilization.
III	Students can enumerate the experimental methods for determining the composition and structure of substances, and infer the composition and structure of simple substances according to the data and diagrams obtained from instruments; they can propose valuable experimental research topics in complex chemical problem situations, and design comprehensive experimental plans related to substance transformation, separation and purification, and applications of properties, etc.; they can use variable manipulation to explore and determine appropriate reaction conditions, and complete experiments safely and smoothly; they can use data, charts, symbols, etc., to describe experimental evidence, and analyze and reason accordingly to form conclusions; they can evaluate the experimental plan, process and conclusion, and propose further research ideas.
IV	Students can explain the important role of chemical science development in the utilization of natural resources, material synthesis, environmental protection, safeguarding of human health, and promoting the development of science and technology; they can use chemical principles and methods to make creative suggestions for solving hot issues in production and life; they can conduct analysis and risk assessments on the promotion and application of chemical technology and the use of chemicals; they can analyze the problems that exist in the production and application of certain chemical products based on the idea of ‘green chemistry’, and propose solutions for dealing with chemical problems.

Data sources

In this study, in order to conduct a more comprehensive inspection of college entrance chemistry examination papers in China, the papers administrated by the Ministry of Education and local authorities were selected as the analysis targets. For chemistry, from 2017 to 2021, there were 17–26 provincial-level administrative regions using National Papers (the regions that use National Papers vary from year to year), more than half of the total (31) in mainland China. During these five years, only five regions adopted self-determined papers, namely Zhejiang, Jiangsu, Beijing, Tianjin, and Shanghai. For the regional self-determined papers, as representatives we selected the chemistry examination papers from Zhejiang Province (a province in eastern China with a high level of education and economic development) and Beijing (the capital of China, and a municipality directly under the central government). Finally, the National Papers,‡ Zhejiang papers and Beijing papers from 2017 to 2021 were established as the analysis targets of this study. We believe that the analysis results from these three exam board papers are representative.

All examination papers in this study are composed of multiple choice and non-multiple choice items. Students completed the paper within the specified time and received the corresponding marks according to their answers. The basic information of these papers is listed in Table 3.

Table 3 Basic information of college entrance chemistry exam papers

Papers	Year	Multiple choice items	Non-multiple choice items	Total items
National Papers	2017	7	5	12
	2018	7	5	12
	2019	7	5	12
	2020	7	5	12
	2021	7	5	12

Zhejiang papers	2017	25	7	32
	2018	25	7	32
	2019	25	7	32
	2020	25	6	31
	2021	25	6	31

Beijing papers	2017	7	4	11
	2018	7	4	11
	2019	7	4	11
	2020	14	5	19
	2021	14	5	19

Data analysis

In order to classify the items in the exam papers, it is first necessary to clarify the unit of analysis in this study. For multiple choice items, each item (e.g., Item 20) was regarded as a unit of analysis; for non-multiple choice items, given that an item usually contains several sub-items, which examine different knowledge facets and involve different scientific methods, each sub-item (e.g., Item 19(2) or Item 28(1)①) was considered as a unit of analysis. Having established the unit of analysis, we coded through the following operational procedures. Firstly, we determined whether or not a unit is a practical item. If the unit is not related to practical investigation or practical work, and can be completed by recalling knowledge or simple calculations, it is not a practical item and is not classified; if the unit is a practical item, the category of scientific methods to which it belongs is determined according to the classification criteria in the analytical framework and then counted. Finally, we obtained the total number and percentage of units of analysis for each category of scientific methods in each paper. The codings for all units of analysis are presented in the Appendix.
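The coding procedure described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' actual workflow: the names `code_unit` and `summarize`, the tuple layout of each unit, and the sample units are all hypothetical, and the percentages are assumed to be taken over the classified practical units only.

```python
from collections import Counter
from typing import Optional

CATEGORIES = ("MPM", "Non-MPM", "MHT", "Non-MHT")


def code_unit(is_practical: bool, manipulative: bool, tests_hypothesis: bool) -> Optional[str]:
    """Code one unit of analysis; non-practical units are left unclassified."""
    if not is_practical:
        return None
    prefix = "" if manipulative else "Non-"
    return prefix + ("MHT" if tests_hypothesis else "MPM")


def summarize(units):
    """Return {category: (count, percentage)} over the practical units.

    Assumption: the percentage denominator is the number of classified
    (practical) units, not the total number of units in the paper.
    """
    counts = Counter()
    for unit in units:
        category = code_unit(*unit)
        if category is not None:
            counts[category] += 1
    total = sum(counts.values())
    return {
        c: (counts[c], 100.0 * counts[c] / total if total else 0.0)
        for c in CATEGORIES
    }


# Hypothetical units: (is_practical, manipulative, tests_hypothesis).
sample_units = [
    (False, False, False),  # recall-only item, not classified
    (True, True, False),    # MPM
    (True, False, False),   # Non-MPM
    (True, False, False),   # Non-MPM
    (True, False, True),    # Non-MHT
]
summary = summarize(sample_units)
for category in CATEGORIES:
    count, percentage = summary[category]
    print(f"{category}: {count} unit(s), {percentage:.1f}%")
```

Keeping the practical/non-practical decision separate from the category decision follows the two-step order of the procedure.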

To ensure the reliability of the analysis, three coders (the first, fourth and fifth authors) analyzed all items independently, whose average mutual agreement reached 88%, indicating that the coding consistency was high and the data were reliable (Miles and Huberman, 1994). For different or controversial opinions, discussions were held to reach a consensus. It should be noted that all items analyzed in this study are in Chinese and the coding was done before translation into English, and an English expert was invited to check the correctness of the translation. Table 4 shows four examples of the item analysis, which will provide readers with details of the coding process.
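One plausible reading of ‘average mutual agreement’ is the simple percent agreement between each pair of coders, averaged over the three pairs. The text does not state the exact formula, so the Python sketch below is an assumption; the function names and the sample codings are hypothetical and do not reproduce the study's data.

```python
from itertools import combinations


def percent_agreement(codes_a, codes_b):
    """Percentage of units to which two coders assigned the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same units")
    if not codes_a:
        raise ValueError("at least one coded unit is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def average_pairwise_agreement(codings):
    """Mean percent agreement over every pair of coders."""
    pairs = list(combinations(codings, 2))
    if not pairs:
        raise ValueError("at least two coders are required")
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical codings of four units by three coders.
coder_1 = ["MPM", "Non-MPM", "Non-MPM", "MHT"]
coder_2 = ["MPM", "Non-MPM", "MPM", "MHT"]
coder_3 = ["MPM", "Non-MPM", "Non-MPM", "Non-MHT"]

agreement = average_pairwise_agreement([coder_1, coder_2, coder_3])
print(f"Average pairwise agreement: {agreement:.1f}%")
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a different technique from the one sketched here.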

Table 4 Examples of the item analysis

Category	Example	Remarks
MPM	(Item 28(1)① of National Papers in 2019)	The experiment that is described in this item is to measure the equilibrium conversion ratio of HCl corresponding to different temperatures under the conditions of different reactant concentration ratios. Three lines are drawn in the graph, taking the temperature as the independent variable (x) and the conversion ratio as the dependent variable (y). This experiment involves the manipulation of two variables, namely temperature and the reactant concentration ratio. Students are required to determine which line has a reactant concentration ratio of 4:1 (i.e., temperature is the only variable at this time), and then explore how changes in temperature affect the equilibrium constant of this chemical reaction. To answer this question, students need to compare the equilibrium constants at different temperatures based on the experimental data in the figure. Given that this item does not require students to test hypotheses or predict phenomena, it is classified as the parameter measurement category. In addition, the item involves the manipulation of the variable (i.e., temperature), so it belongs to manipulative parameter measurement (MPM).
	In recent years, with the rapid development of the polyester industry, the demand for chlorine and the output of hydrogen chloride have also increased rapidly. Therefore, the technology of converting hydrogen chloride into chlorine has become a hot topic of scientific research. Answer the following questions:
	The direct oxidation method (the Deacon process) is:
	4HCl(g) + O₂(g) = 2Cl₂(g) + 2H₂O(g). The following figure shows the relationship between the equilibrium conversion ratio of HCl and the temperature when the reactant concentration ratios, namely c(HCl):c(O₂), are 1:1, 4:1 and 7:1, respectively, in a rigid container.
	It can be known that the chemical equilibrium constant K (300 °C) ____ K (400 °C) (fill in “>” or “<”).

Non-MPM	(Item 20 of Zhejiang papers in 2021)	The experiment described in this item is to measure the volume of O₂ released at different times at a certain temperature and convert it into the concentration of N₂O₅, which are listed in the table. Students need to calculate the degree of the chemical reaction and the rate of the chemical reaction according to the experimental data in the table. This item does not require students to test hypotheses or predict phenomena, so it is classified as the parameter measurement category. Besides, this item involves no variable manipulation, so it belongs to non-manipulative parameter measurement (Non-MPM).
	At a certain temperature, a decomposition reaction occurred in a solution of N₂O₅ in CCl₄ (100 mL): 2N₂O₅ ⇌ 4NO₂ + O₂. The volume of O₂ released at different times was measured and converted into the concentration of N₂O₅ as shown in the following table:

	Which of the following statements is correct:
	A. During the period of ∼600–1200 s, the average rate of generating NO₂ was 5.0 × 10⁻⁴ mol L⁻¹ s⁻¹.
	B. When the reaction reached 2220 s, the volume of O₂ released was 11.8 L (under standard conditions).
	C. When the reaction reached equilibrium, the forward reaction rate of N₂O₅ was equal to twice the reverse reaction rate of NO₂.
	D. In the above table, x can be inferred to be 3930.
MHT	(Item 19(1)③ of Beijing papers in 2021)	The question investigated in this item is why the reactants are not all consumed when MnO₂ reacts with concentrated hydrochloric acid. Step I prompts students that the ion concentration in the solution will change as the chemical reaction proceeds. Step II tells the students that the concentration of H⁺ and Mn²⁺ will affect the oxidative properties of MnO₂, and asks the students to make a hypothesis based on the second half-reaction. To answer this question, students need to put forward a reasonable hypothesis, combined with the phenomena of the supplementary experiment in Step III, and test whether or not their own hypothesis is correct. The experiment in Step III involves the manipulation of several variables (i.e., the presence or absence of H⁺, Cl⁻, and Mn²⁺) and presents the corresponding experimental results, so it is classified as the manipulative category. Unlike the above two categories, this item does not require students to measure parameters but asks them to propose a reasonable hypothesis based on the experimental phenomena and test whether or not their own hypothesis is correct. Therefore, it belongs to manipulative hypothesis testing (MHT).
	A group investigated the redox reactions involving halogens, and analyzed the change rule in the oxidation and reduction properties of substances from the perspective of electrode reactions.
	Concentrated hydrochloric acid and manganese dioxide are mixed and heated to generate chlorine. When chlorine is no longer released, hydrochloric acid and manganese dioxide still exist in the solid–liquid mixture.
	I Electrode reaction formulae:
	i. Reduction reaction: MnO₂ + 2e⁻ + 4H⁺ = Mn²⁺ + 2H₂O
	ii. Oxidation reaction: 2Cl⁻ − 2e⁻ = Cl₂↑
	II According to the electrode reaction formulae, you need to analyze the reason why hydrochloric acid and manganese dioxide still exist in the solid–liquid mixture.
	i. With the decrease in H⁺ concentration or the increase in Mn²⁺ concentration, the oxidative properties of MnO₂ are weakened.
	ii. With the decrease in Cl⁻ concentration, ____.
	III Supplementary experiments confirmed the analysis in question II.

Non-MHT	(Item 19(2) of Beijing papers in 2020)	The question described in this item is to investigate the thermal decomposition products of Na₂SO₃, and requires students to test the products through experiments. Considering that students may lack the corresponding knowledge, the item first provides supplementary information and explains some experimental operation steps. Then, the students are required to supplement the experimental steps after reflection, and analyze as well as predict the corresponding experimental phenomena, so as to confirm that SO₄²⁻ is contained in the decomposition products. The item requires the students to supplement the experimental steps and predict the experimental phenomenon according to the known conditions, so it is classified as the hypothesis testing category. Furthermore, this item involves no variable manipulation, so it belongs to non-manipulative hypothesis testing (Non-MHT).
	Explore the thermal decomposition products of Na₂SO₃ (s).
	Information:
	I
	II Na₂S + (x − 1)S = Na₂S_x, where Na₂S_x can react with acid to generate S and H₂S.
	III BaS is easily soluble in water.
	Under the conditions of isolating air, anhydrous Na₂SO₃ (s) was heated to obtain yellow A (s), and no gas was detected during the process. Then, A (s) was added to water to obtain a suspension, which was left to stand for a while to obtain colorless B (aq).
	To test the decomposition product Na₂SO₄:
	Firstly, a small amount of B (aq) is taken and added to BaCl₂ (aq), which produces a white precipitate. Secondly, HCl (aq) is added, which leads to an increase of the precipitate (which is verified to contain S) and the production of a gas (H₂S) with the smell of rotten eggs. Since the accumulating precipitate will disturb the observation, another small amount of B (aq) should be taken and sufficient HCl (aq) is added, after which we separate the solid from the liquid, ____ (fill in operations and phenomena). Hence, it can be confirmed that SO₄²⁻ is contained in the decomposition products.

Results

In this section, we first present the analysis results of the examination papers from the three exam boards, respectively, and then compare them to inspect their commonalities and differences.

National Papers

The analysis results of practical items in the National Papers are presented in Table 5. Among the four categories of scientific methods, non-manipulative parameter measurement (Non-MPM) took up a much larger proportion of units than the other three categories, with an average of 31.75%. From 2017 to 2021, the percentage of non-manipulative parameter measurement (Non-MPM) fluctuated between 18.00% and 43.48%. Although non-manipulative hypothesis testing (Non-MHT) ranked second, its five-year average percentage was only 11.11%, about one-third of that of non-manipulative parameter measurement (Non-MPM). This was followed by manipulative parameter measurement (MPM), whose proportion was slightly lower than that of non-manipulative hypothesis testing (Non-MHT), averaging 8.73% and exceeding 10% only in 2018. Manipulative hypothesis testing (MHT) ranked at the bottom, and did not appear in any item of the National Papers. The chi-squared tests showed that there were significant differences in the distribution of the four categories of scientific methods in each year's paper (see Table 5), but there was no significant difference in the distribution of scientific methods between the five years (χ² = 12.027, p > 0.05). In general, the average unit proportion of parameter measurement in the National Papers was 40.48%, while that of hypothesis testing was only 11.11%, from which we could conclude that practical items paid more attention to parameter measurement. With regard to variable manipulation, the proportion of units for manipulative parameter measurement (MPM) plus manipulative hypothesis testing (MHT) was only 8.73%, while that for non-manipulative parameter measurement (Non-MPM) plus non-manipulative hypothesis testing (Non-MHT) was as high as 42.86%, which indicated that the National Papers were lacking in assessing the students’ ability to manipulate variables.
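The combined proportions quoted above follow from adding the category counts along each dimension of the matrix. The short sketch below redoes that arithmetic from the five-year unit totals of the National Papers (summed from the yearly counts in Table 5); the variable and function names are illustrative only.

```python
# Five-year unit totals for the National Papers, summed from Table 5.
totals = {"MPM": 22, "Non-MPM": 80, "MHT": 0, "Non-MHT": 28}
all_units = 53 + 50 + 46 + 53 + 50  # total units of analysis, 2017-2021


def share(*categories):
    """Percentage of all units that fall into the given categories."""
    return round(100 * sum(totals[c] for c in categories) / all_units, 2)


parameter_measurement = share("MPM", "Non-MPM")  # goal of the inquiry
hypothesis_testing = share("MHT", "Non-MHT")
manipulative = share("MPM", "MHT")               # whether variables are manipulated
non_manipulative = share("Non-MPM", "Non-MHT")

print(parameter_measurement, hypothesis_testing)  # 40.48 11.11
print(manipulative, non_manipulative)             # 8.73 42.86
```

Computing the shares from counts rather than from already rounded percentages avoids small last-digit differences that can arise when rounded values are added.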

Table 5 Distribution of the four categories of scientific methods in the National Papers

Year	MPM	Non-MPM	MHT	Non-MHT	Total units	χ²
2017	3 (5.66%)	17 (32.08%)	0 (0.00%)	4 (7.55%)	53	12.661^a
2018	8 (16.00%)	9 (18.00%)	0 (0.00%)	9 (18.00%)	50	8.173^a
2019	4 (8.70%)	20 (43.48%)	0 (0.00%)	3 (6.52%)	46	15.664^a
2020	4 (7.55%)	17 (32.08%)	0 (0.00%)	5 (9.43%)	53	12.261^a
2021	3 (6.00%)	17 (34.00%)	0 (0.00%)	7 (14.00%)	50	12.753^a
Average	4.40 (8.73%)	16.00 (31.75%)	0.00 (0.00%)	5.60 (11.11%)	50.40	n/a
Note: ① MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing. Total units refers to the total number of units of analysis in each paper. ② ^a p < 0.05. ③ Figures outside brackets are the total number of units of analysis involved in the same scientific methods. Figures in brackets are percentages obtained by dividing the value of units by the total.
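The between-year comparison reported for the National Papers (χ² = 12.027) can be checked with a plain chi-squared test of independence on the year-by-category counts. The sketch below is not the authors' stated procedure: it assumes that the MHT column, which is zero in every year, is left out of the contingency table, since a column of zeros has zero expected counts. Under that assumption the statistic comes out at about 12.03 with 8 degrees of freedom. The function name is illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Units per year in the National Papers (columns: MPM, Non-MPM, Non-MHT).
# The all-zero MHT column is omitted; this is an assumption of the sketch.
national = [
    [3, 17, 4],  # 2017
    [8, 9, 9],   # 2018
    [4, 20, 3],  # 2019
    [4, 17, 5],  # 2020
    [3, 17, 7],  # 2021
]

statistic, dof = chi_square_independence(national)
CRITICAL_VALUE_DF8 = 15.507  # chi-squared critical value for alpha = 0.05, 8 d.f.
print(f"chi2 = {statistic:.3f}, d.f. = {dof}")
print("significant at 0.05:", statistic > CRITICAL_VALUE_DF8)
```

Because the statistic is below the critical value, the sketch agrees with the reported finding of no significant difference between the five years.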

Zhejiang papers

The analysis results of practical items in the Zhejiang papers are presented in Table 6. The distribution of the four categories of scientific methods in the Zhejiang papers was similar to that in the National Papers, with the same sequence from the highest to the lowest proportion: non-manipulative parameter measurement (Non-MPM), non-manipulative hypothesis testing (Non-MHT), manipulative parameter measurement (MPM) and manipulative hypothesis testing (MHT). In addition, no items involved manipulative hypothesis testing (MHT) in either of these two paper types. In the Zhejiang papers, the unit proportions of the top two categories, namely non-manipulative parameter measurement (Non-MPM) and non-manipulative hypothesis testing (Non-MHT), were 30.32% and 25.63%, respectively, so the difference between the two was not great. However, manipulative parameter measurement (MPM), which ranked third, took up only 5.05%, far lower than the previous two categories. From the perspective of the variation tendency during these five years, non-manipulative parameter measurement (Non-MPM) and non-manipulative hypothesis testing (Non-MHT) showed a similar trend, both first increasing, then decreasing, and then increasing again, while manipulative parameter measurement (MPM) showed a slight downward trend. The chi-squared tests showed that there were significant differences in the distribution of the four categories of scientific methods in each year's paper (see Table 6), but there was no significant difference in the distribution of scientific methods between the five years (χ² = 3.685, p > 0.05). Further analysis showed that the total proportion of units for parameter measurement was 35.37%, while that for hypothesis testing was 25.63%, indicating that practical items were slightly inclined to parameter measurement. The proportions of units with and without variable manipulation differed markedly, at 5.05% (manipulation) and 55.95% (non-manipulation); that is to say, only a few practical items involved variable manipulation.

Table 6 Distribution of the four categories of scientific methods in the Zhejiang papers

Year	MPM	Non-MPM	MHT	Non-MHT	Total units	χ²
2017	4 (7.27%)	15 (27.27%)	0 (0.00%)	14 (25.45%)	55	13.088^a
2018	4 (7.27%)	18 (32.73%)	0 (0.00%)	16 (29.09%)	55	16.201^a
2019	4 (7.02%)	18 (31.58%)	0 (0.00%)	15 (26.32%)	57	15.412^a
2020	1 (1.89%)	13 (24.53%)	0 (0.00%)	12 (22.64%)	53	14.562^a
2021	1 (1.75%)	20 (35.09%)	0 (0.00%)	14 (24.56%)	57	20.649^a
Average	2.80 (5.05%)	16.80 (30.32%)	0.00 (0.00%)	14.20 (25.63%)	55.40	n/a
Note: ① MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing. Total units refers to the total number of units of analysis in each paper. ② ^a p < 0.05. ③ Figures outside brackets are the total number of units of analysis involved in the same scientific methods. Figures in brackets are percentages obtained by dividing the value of units by the total.

Beijing papers

The analysis results of practical items in the Beijing papers are presented in Table 7. Non-manipulative parameter measurement (Non-MPM) still ranked first, with an average proportion of units of 43.13%. Non-manipulative hypothesis testing (Non-MHT) ranked second, accounting for about one-fifth (20.85%) of the units in the Beijing papers. The average percentage of manipulative hypothesis testing (MHT) was close to that of manipulative parameter measurement (MPM); more exactly, the latter (5.69%) had a higher proportion of units than the former (2.84%). It is worth mentioning that the manipulative hypothesis testing (MHT) category appeared in the Beijing papers, and even exceeded manipulative parameter measurement (MPM), but only in 2021. The chi-squared tests showed that there were significant differences in the distribution of the four categories of scientific methods in each year's paper (see Table 7), and there was a significant difference in the distribution of scientific methods between the five years (χ² = 29.272, p < 0.05). This latter result was mainly due to the emergence of six units of manipulative hypothesis testing (MHT) in 2021. In the Beijing papers, the unit proportions of parameter measurement and hypothesis testing were 48.82% and 23.69%, respectively, a disparity of more than twofold. The unit proportions for manipulative and non-manipulative methods were 8.53% and 63.98%, respectively, a more obvious gap in which the latter was more than sevenfold the former.

Table 7 Distribution of the four categories of scientific methods in the Beijing papers

Year	MPM	Non-MPM	MHT	Non-MHT	Total units	χ²
2017	1 (2.78%)	19 (52.78%)	0 (0.00%)	10 (27.78%)	36	18.102^a
2018	6 (15.00%)	13 (32.50%)	0 (0.00%)	7 (17.50%)	40	8.815^a
2019	1 (2.50%)	17 (42.50%)	0 (0.00%)	8 (20.00%)	40	15.681^a
2020	1 (2.22%)	21 (46.67%)	0 (0.00%)	10 (22.22%)	45	19.494^a
2021	3 (6.00%)	21 (42.00%)	6 (12.00%)	9 (18.00%)	50	8.714^a
Average	2.40 (5.69%)	18.20 (43.13%)	1.20 (2.84%)	8.80 (20.85%)	42.20	n/a
Note: ① MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing. Total units refers to the total number of units of analysis in each paper. ② ^a p < 0.05. ③ Figures outside brackets are the total number of units of analysis involved in the same scientific methods. Figures in brackets are percentages obtained by dividing the value of units by the total.

Overall comparison across the three exam boards

We compared the average percentages of units in chemistry examination papers from three exam boards during 2017–2021. The results are presented in Fig. 2.


	Fig. 2 Average percentages of units in examination papers of the three exam boards.

It can be seen from Fig. 2 that practical items have a relatively high proportion in the examination papers (51.59% in the National Papers; 61.00% in the Zhejiang papers; 72.51% in the Beijing papers), and a similar tendency was observed in the distribution of the four categories of scientific methods in the examination papers from different exam boards. Non-manipulative parameter measurement (Non-MPM) accounted for the largest proportion, followed by non-manipulative hypothesis testing (Non-MHT), then manipulative parameter measurement (MPM), and finally manipulative hypothesis testing (MHT). The chi-squared test also showed that there was no significant difference in the distribution of different scientific methods across the three exam boards (χ² = 5.135, p > 0.05). Another commonality was that, in papers from each exam board, only a small proportion of practical items involved variable manipulation. In terms of differences, manipulative hypothesis testing (MHT) was included in items of the Beijing papers but was absent from the papers of the other two exam boards. Furthermore, the proportion of non-manipulative hypothesis testing (Non-MHT) varied to some extent between different papers: it was largest in the Zhejiang papers (25.63%), smallest in the National Papers (11.11%), and intermediate in the Beijing papers (20.85%). Considering that all examination papers in this study were designed according to the same guidelines, that is, the senior high school chemistry curriculum standards, we argue that the differences in the papers across the three exam boards and in annual papers from the same exam board are mainly attributable to the ‘exam writers effect’. As mentioned in the previous ‘Context’ section, the exam writers of college entrance examinations vary from year to year. Differences in how exam writers understand scientific methods and the curriculum standards may lead to the differences between examination papers.

Conclusion and discussions

In this study, based on the theoretical framework derived from Brandon's matrix, we have illustrated the characteristics of four categories of scientific methods and examined how the diversity of scientific methods is represented in college entrance chemistry examination papers from three exam boards in China. Our empirical research supported the applicability of this matrix for characterizing practical chemistry items and revealed the features of summative assessments of scientific methods in China. Although this matrix has been employed in various research studies (e.g., El Masri et al., 2021; Ioannidou and Erduran, 2021), it is the first time that we have endowed four categories of scientific methods with operational illustrations and descriptions in the context of practical items and applied this tool to eastern countries, especially China, where high-stakes tests dominate the education system (Gu, 2004). In this sense, this study not only helps the international audience to learn about Chinese college entrance chemistry examination papers but also makes Brandon's matrix more operationalized and provides researchers from other countries with a novel analytical framework for the analysis of practical chemistry items. We believe that this research will make a contribution to the international literature of scientific methods in the field of chemistry education.

It was found that there is a relatively high proportion of items involving scientific methods in college entrance chemistry examination papers, which reflects that the summative assessments of scientific methods and practical science skills are valued in China. As we know, school chemistry is often thought of as a practical science, so the results are encouraging. Hence, teachers should pay more attention to the development of practical activities and the application of scientific methods by students in daily teaching, and strengthen the training of practical items when preparing for exams. Besides, the research results show that the percentages of the four categories of scientific methods differed significantly from one another in the examination papers of each exam board. The proportion of non-manipulative parameter measurement (Non-MPM) was large, while manipulative hypothesis testing (MHT) accounted for a very small proportion, and was absent altogether from the National Papers and Zhejiang papers. Based on this finding and by reference to Fig. 1, which was created by Brandon (1994), it can be seen that the practical chemistry items in China are less experimental. Furthermore, it can be concluded that there exists an imbalance in scientific methods in Chinese college entrance chemistry examination papers. This kind of imbalance has appeared several times in empirical studies on the diversity of scientific methods (Cullinane et al., 2019; El Masri et al., 2021; Wei et al., 2022), which to some extent reflects the diversity of methods being neglected in science education. As mentioned above, the design of Chinese college entrance chemistry examination papers is guided by the senior high school chemistry curriculum standards. 
Given the fact that method elements (such as variable manipulation, parameter measurement and hypothesis testing) are embedded in the description of academic quality Level 4, college entrance chemistry examinations should appropriately assess the four categories of scientific methods as discussed in Brandon's matrix. However, the imbalance in scientific methods indicates that the design of college entrance examination papers is not completely aligned with the requirements of the curriculum standards. That is to say, assessments sometimes fail to perfectly reflect the intention of the curriculum standards, which may precisely reflect the inadequacy of the compilation of Chinese college entrance chemistry examination papers. What is more, further inspection indicated that the curriculum standards do not explicitly require the diversity of scientific methods, nor do they emphasize the balance of different categories of scientific methods, which may make exam writers pay inadequate attention to this issue. We assume that this is a possible explanation for the lack of items about variable manipulation, the virtual disappearance of manipulative hypothesis testing (MHT), and the imbalance of scientific methods in examination papers. Erduran and Dagher (2014) have provided a few cautions against overlooking the diversity of scientific methods, one of which is that doing so undermines the students’ understanding of practices and the content of the discipline. According to Cullinane et al. (2019), it is necessary to design summative assessments that emphasize a more balanced representation of scientific methods in chemistry. In view of the relationship between curriculum standards and college entrance examination papers in China, we provide the following two suggestions. 
One is for the revision of future senior high school chemistry curriculum standards: the idea of the diversity of scientific methods and the balanced representation of scientific methods should be clearly put forward in the principles or requirements of designing the exam papers. Our other suggestion is for the administrators of college entrance examinations (i.e., the education examinations authority attached to the Ministry of Education, a province or municipality): Brandon's matrix (Brandon, 1994) can be employed to train the chemistry exam writers so that they can fully understand the diversity of scientific methods, and strive to achieve a balanced representation of scientific methods in the process of designing papers. As mentioned above, considering that neglecting the diversity of methods in science education seems to be a relatively common situation, we believe that the two suggestions presented here are not only applicable to China, but also have implications for chemistry education in other countries, especially for the design of high-stakes chemistry exam papers.

From the perspective of the relationship between teaching and examination, although the idea of ‘teaching to the test’ is inappropriate and has been criticized in recent years (Copp, 2018), in reality, under the pressure of summative assessments, many teachers perceive ‘teaching to the test’ as a moral duty since the stakes are high for students who do not pass such exams (Salloum and BouJaoude, 2019). According to Williams-McBean (2022), internationally, there is evidence that the format and content of high-stakes, standardized, summative assessments have pervaded and wielded influence on the content taught, as well as the teaching and assessment methodologies used in the classroom. This situation may be more pronounced in China because of the local examination culture (Gu, 2004). Specifically, given the significance of college entrance examinations, the content and format of exam items in recent years will inevitably be regarded as an important reference for teachers’ instructional practices. Based on the findings of this study, we are reasonably concerned that the representation of scientific methods in college entrance chemistry examinations will have a negative impact on implementing a diversity of scientific methods in chemistry classroom teaching. The above discussions once again illustrate the need for the balanced representation of scientific methods in chemistry high-stakes exams.

For all of the three exam boards, as found in this study, the predominant category of scientific methods in examination papers is non-manipulative parameter measurement (Non-MPM), which is similar to the analysis results of practical work in Chinese science textbooks by Wei et al. (2022). This fact showed that the assessment is consistent with curriculum materials, and the circle of science education in China places more emphasis on this category of scientific method. From another point of view, we infer that the reason for this situation may be that this type of practical item or work is easier to design and present in examination papers or science textbooks, because it involves neither manipulation nor hypothesis testing. Based on the above facts, from the perspective of preparing for the college entrance examination, chemistry teachers should ensure that this type of practical work is widely carried out in classroom teaching, and appropriately increase these types of practical item in students’ training, which would help students to better complete items of non-manipulative parameter measurement (Non-MPM) in examinations.

By contrast, manipulative hypothesis testing (MHT), which is often presented as ‘the scientific method’ in many science classrooms around the world (Erduran and Dagher, 2014), almost disappeared in Chinese college entrance chemistry examination papers, which is in line with the findings of Cullinane et al. (2019). According to Wei et al. (2022), it is not easy to formulate and test hypotheses concerning the subject matter of chemistry. This may partly explain the reason for the lower percentage of manipulative hypothesis testing (MHT) in examination papers, although the underlying reason needs to be further explored in the future. To promote a balanced representation of scientific methods, it is suggested that more items of manipulative hypothesis testing (MHT) should be designed in chemistry examination papers. We believe that the illustrations of characteristics of manipulative hypothesis testing (MHT) items in this study and the manipulative hypothesis testing (MHT) items of the Beijing papers in 2021 can be used as effective scaffolds to provide specific reference for exam writers to design these types of item. For example, one chemistry item can put forward the research hypothesis that strong acids ionize more than weak acids under the same conditions, requiring the students to design a reasonable experimental scheme to verify it. As another example, the item can provide an experimental scheme to explore the impact of different concentrations of reactants on the rate of a chemical reaction under the same temperature and pressure, requiring the students to predict the experimental results and elaborate the reasons for their prediction. Such items not only require students to understand the variable manipulation in the item stems but also to make relevant hypotheses and predictions, which are suitable for examining the high-level thinking and reasoning skills of students.

Despite the theoretical and practical implications discussed above, there are some methodological limitations in this study. Content analysis focuses on what documents convey and how they convey it, rather than why (Stemler, 2001). In other words, content analysis can be conducted to reveal the current situation of texts and the trend of content change, but it is not good at explaining these features, so the interpretability of the results is relatively weak. Therefore, although some explanations have been made in this paper, some important issues still deserve further study in the future, such as why the diversity of scientific methods is generally ignored by chemistry exam writers, and why the proportion of manipulative hypothesis testing (MHT) is so low. Furthermore, it is well known that different school science subjects (chemistry, physics, and biology) possess distinct inherent characteristics. Wei et al. (2022) have found that the distributions of these four categories of scientific methods differ among the textbooks of the three science subjects. Given that the current analysis of high-stakes exams in view of the diversity of scientific methods focuses exclusively on the chemistry discipline, future research can compare exam papers of different disciplines, which will more comprehensively demonstrate the characteristics of scientific methods in summative assessments in a country or region. In addition, although there has been a consensus that summative assessments exert an impact on classroom teaching (Erduran et al., 2019), the questions of how exactly the diversity of scientific methods is represented in teaching practice and whether it is in accord with the distribution presented in assessments or textbooks deserve exploration. 
Hence, in future studies, practical work in classroom teaching can be analyzed with the help of Brandon's 1994 framework or other models of the diversity of scientific methods, which will further deepen people's understanding of scientific methods education.

Conflicts of interest

There are no conflicts to declare.

Appendices

Specific coding data in the National Papers

	MPM	Non-MPM	MHT	Non-MHT	Non-practical item	Total units
Note: MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing.
2017 items	13,27(4)①, 36(3)①	11,26(2)②, 26(2)④, 26(3)①, 26(3)②, 26(3)③, 27(2)①, 27(2)②, 27(3)①, 27(3)②, 27(4)②, 28(2)①, 28(2)②, 28(4)②, 28(4)③, 28(4)④, 36(3)②	None	12,36(1)①, 36(5),36(6)	7,8,9,10,26(1)①, 26(1)②, 26(2)①, 26(2)③, 27(1)①, 27(1)②, 27(5),28(1),28(3),28(4)①, 28(4)⑤,35(1)①, 35(1)②, 35(1)③, 35(2)①, 35(2)②, 35(3)①, 35(3)②, 35(4), 35(5)①, 35(5)②, 36(1)②, 36(2)①, 36(2)②, 36(4)	53
2018 items	12,27(3)①, 27(3)②, 27(3)③, 27(3)④, 28(3)①, 28(3)②, 28(3)⑥	10,11,26(2)①, 26(2)②, 26(2)③, 27(2)①, 27(2)②, 27(2)③, 28(1)	None	13,26(1)①, 26(1)②, 26(1)③, 26(1)④, 36(2), 36(4),36(6),36(7)	7,8,9,26(2)④, 26(2)⑤, 27(1),28(2),28(3)③, 28(3)④, 28(3)⑤, 28(3)⑦, 35(1),35(2)①, 35(2)②, 35(3)①, 35(3)②, 35(4)①, 35(4)②, 35(5)①, 35(5)②, 36(1),36(3)①, 36(3)②, 36(5)	50
2019 items	28(1)①, 28(1)②, 36(4)①, 36(4)②	10,11,12,13,26(1)①, 26(1)②, 26(2),26(3),26(4),26(5),26(6),27(3),27(4),27(5),27(6),28(1)③, 28(3),28(4)①, 35(3),36(4)③	None	9,36(5),36(6)	7,8,26(7),27(1),27(2),28(2),28(4)②, 35(1)①, 35(1)②, 35(2)①, 35(2)②, 35(4)①, 35(4)②, 35(4)③, 35(5),36(1),36(2),36(3)①, 36(3)②	46
2020 items	28(2)①, 28(2)②, 28(2)③, 36(6)	10,12,26(2)②, 26(2)③, 26(5)①, 26(5)②, 27(1)①, 27(1)②, 27(2),27(3)①, 27(3)②, 27(4)②, 27(5),27(6),28(1)②, 28(3),28(4)	None	13,35(1)②, 36(2),36(3)②, 36(5)	7,8,9,11,26(1)①, 26(1)②, 26(2)①, 26(3)①, 26(3)②, 26(4)①, 26(4)②, 27(4)①, 28(1)①, 35(1)①, 35(2)①, 35(2)②, 35(2)③, 35(2)④, 35(3)①, 35(3)②, 35(3)③, 35(3)④, 35(4),36(1),36(3)①, 36(4)①, 36(4)②	53
2021 items	12,28(2)②, 28(2)③	13,26(1)①, 26(1)②, 26(1)⑤, 26(2),26(3)①, 26(3)②, 27(2),27(3)③, 27(3)④, 27(4),27(5), 28(1)②, 28(1)③, 28(2)④, 28(2)⑤, 28(2)⑥	None	11,26(1)③, 26(1)④, 36(3),36(4),36(6),36(7)	7,8,9,10,27(1),27(3)①, 27(3)②, 28(1)①, 28(2)①, 35(1)①, 35(1)②, 35(1)③, 35(1)④, 35(2)①, 35(2)②, 35(2)③, 35(3),35(4)①, 35(4)②, 35(4)③, 36(1),36(2),36(5)	50

Specific coding data in the Zhejiang papers

	MPM	Non-MPM	MHT	Non-MHT	Non-practical item	Total units
Note: MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing.
2017 items	21,23,30(2)①, 30(2)③	12,16,17,18,24,28(1),29(1),29(2),30(2)②, 30(3),31(2),31(3),32(4)①, 32(4)②, 32(5)	None	14,25,26(1),26(2),26(3),26(4),27(1),27(2),27(3),28(2),32(2),32(3),32(4),32(5)	1,2,3,4,5,6,7,8,9,10,11,13,15,19,20,22,28(1)②, 30(1)①, 30(1)②, 31(1)①, 31(1)②, 32(1)	55
2018 items	22,23,30(5),30(7)	10,14,17,18,21,24,28(1),28(2),29(1),29(2),30(1),30(2),30(3),30(6),31(1),31(2),31(4)②, 31(5)	None	12,25,26(1),26(2),26(3),26(4),27(1)①, 27(1)②, 27(2),28(3),31(3),32(1),32(2),32(3),32(4),32(5)	1,2,3,4,5,6,7,8,9,11,13,15,16,19,20,30(4),31(4)①	55
2019 items	30(2)②, 30(3)②, 30(3)③, 31(4)	9,11,12,17,20,21,22,24,28(2),28(3),30(1),30(3)①, 30(4)①, 30(4)②, 31(1),31(2)②, 31(3)①, 31(3)②	None	14,25,26(1)②, 26(2),26(3),27(1),27(2),27(3)①, 27(3)②, 29,32(1),32(2),32(3),32(4),32(5)	1,2,3,4,5,6,7,8, 10,13,15,16,18,19,23,26(1)①, 28(1)①, 28(1)②, 30(2)①, 31(2)①	57
2020 items	23	18,20,21,24,28(3)②, 28(3)③, 29(1)②, 29(2)①, 29(2)②, 29(2)③, 30(1)②, 30(2),30(3)③	None	16,25,28(1)①, 28(1)②, 28(2), 28(3)①, 29(1)③, 31(1),31(2),31(3),31(4),31(5)	1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,19,22,26(1),26(2),26(3),27(1),27(2),29(1)①, 30(1)①, 30(3)①, 30(3)②	53
2021 items	23	19,20,22,24,25,26(2),27(1),27(2),29(2)①, 29(2)②, 29(3)①, 29(3)②, 29(3)③, 29(4)①, 29(4)②, 30(1)②, 30(2),30(3),30(5)②, 31(4)	None	15,16,28(1)①, 28(1)②, 28(2)①, 28(2)②, 28(3)①, 28(3)②, 31(1)①, 31(1)②, 31(2),31(3),31(5),31(6)	1,2,3,4,5,6,7,8,9,10,11,12,13,14,17,18,21,26(1),29(1),30(1)①, 30(4),30(5)①	57

Specific coding data in the Beijing papers

	MPM	Non-MPM	MHT	Non-MHT	Non-practical item	Total units
Note: MPM, manipulative parameter measurement. Non-MPM, non-manipulative parameter measurement. MHT, manipulative hypothesis testing. Non-MHT, non-manipulative hypothesis testing.
2017 items	7	4,9(1)②, 9(1)③, 9(1)④, 9(1)⑤, 9(2)①, 9(2)②, 10(1)①, 10(1)②, 10(1)③, 10(1)④, 10(2)①, 10(2)②, 10(2)③, 11(2)①, 11(2)②, 11(2)③, 11(2)④, 11(3)	None	8(1)①, 8(1)②, 8(2),8(3),8(4),8(5),11(1)①, 11(1)②, 11(2)⑤, 11(2)⑥	1,2,3,5,6,9(1)①	36
2018 items	6,7,9(4),10(4)①, 10(4)②, 10(4)③	2,9(3),9(5)①, 9(5)②, 9(6),10(2)①, 10(2)②, 10(3),11(1)③, 11(2)②, 11(2)③, 11(2)④, 11(2)⑤	None	8(2),8(3),8(4),8(5),8(7),8(8),11(2)⑥	1,3,4,5,8(1),8(6),8(8)①, 9(1),9(2)①, 9(2)②, 10(1),11(1)①, 11(1)②, 11(2)①	40
2019 items	7	6,8(1)①, 8(1)②, 9(2),9(3),9(4),9(5),10(1)①, 10(1)②, 10(1)④, 10(1)⑤, 10(2)①, 10(2)②, 10(2)③, 11(3)①, 11(3)②, 11(6)	None	8(2),8(3),8(4),8(5),8(6),8(7),11(2)②, 11(3)③	1,2,3,4,5,9(1),9(6),9(7),9(8),10(1)③, 11(1)①, 11(1)②, 11(2)①, 11(4)	40
2020 items	10	7,9,11,14,15(1)①, 15(1)②, 15(1)③, 15(1)④, 15(2)①, 15(2)②, 17(3)①, 17(3)②, 18(1)②, 18(1)③, 18(2),18(3),18(4),19(1),19(3)②, 19(3)③, 19(4)	None	13,16(2),16(3),16(4),16(5),16(6),17(5)①, 17(5)②, 19(2),19(3)①	1,2,3,4,5,6,8,12,16(1),17(1),17(2),17(4),18(1)①	45
2021 items	12,14,19(4)	5,6,8,9,10,13,15(1)①, 15(1)②, 15(2)①, 15(2)③, 16(1)①, 16(1)②, 16(2)①, 16(2)②, 16(2)③, 16(2)④, 18(1)①, 18(2)②, 18(2)③, 18(3),19(5)	19(1)③, 19(1)④, 19(1)⑤, 19(2)①, 19(2)②, 19(3)	15(2)②, 15(2)④, 16(1)③, 16(1)④, 17(2),17(3),17(4),17(5),17(6)	1,2,3,4,7,11,17(1),18(1)②, 18(2)①, 19(1)①, 19(1)②	50
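
For the rows checked here, the Total units column equals the sum of the units coded in the five categories. The Python sketch below illustrates that tally for the 2017 Beijing row and is not the authors' coding procedure. The helper names `count_units` and `tally_row` are hypothetical, and taking all coded units (including non-practical items) as the percentage base is an assumption made only for illustration.

```python
# Hypothetical helpers: tally the coded units in one row of the tables above.
CATEGORIES = ["MPM", "Non-MPM", "MHT", "Non-MHT", "Non-practical item"]


def count_units(cell: str) -> int:
    """Count the comma-separated item labels in a cell; 'None' means zero."""
    cell = cell.strip()
    if not cell or cell == "None":
        return 0
    return sum(1 for label in cell.split(",") if label.strip())


def tally_row(cells):
    """Return per-category counts, their total, and each category's share (%)."""
    counts = {name: count_units(cell) for name, cell in zip(CATEGORIES, cells)}
    total = sum(counts.values())
    # Assumption: shares are taken over all coded units in the row.
    shares = {
        name: round(100 * n / total, 1) if total else 0.0
        for name, n in counts.items()
    }
    return counts, total, shares


# 2017 Beijing row, copied from the table above.
beijing_2017 = [
    "7",
    "4,9(1)②, 9(1)③, 9(1)④, 9(1)⑤, 9(2)①, 9(2)②, 10(1)①, 10(1)②, "
    "10(1)③, 10(1)④, 10(2)①, 10(2)②, 10(2)③, 11(2)①, 11(2)②, "
    "11(2)③, 11(2)④, 11(3)",
    "None",
    "8(1)①, 8(1)②, 8(2),8(3),8(4),8(5),11(1)①, 11(1)②, 11(2)⑤, 11(2)⑥",
    "1,2,3,5,6,9(1)①",
]

counts, total, shares = tally_row(beijing_2017)
print(counts)
print(total)  # 36, matching the Total units column for this row
print(shares)
```

Splitting on commas works here because item labels such as 9(1)② contain no commas themselves. The same tally can be applied to any other row to compare its sum with the reported total.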

Acknowledgements

This research was sponsored by the Humanities and Social Sciences Research Program of the Ministry of Education of China under the project entitled “Transformation Mechanism of Science Curriculum Content Facing Society-Science Issues and Its Empirical Study” (19YJA880045).

References

Abd-El-Khalick F. and Lederman N. G., (2000), Improving science teachers’ conceptions of the nature of science: a critical review of the literature, Int. J. Sci. Educ., 22(7), 665–701.
Bauer H., (1994), Scientific literacy and the myth of the scientific method, University of Illinois Press.
Berg L. B. and Lune H., (2017), Qualitative research methods for the social sciences, 9th edn, New York: Pearson.
Brandon R., (1994), Theory and experiment in evolutionary biology, Synthese, 99, 59–73.
Bybee R. W., (2011), Scientific and engineering practices in K-12 classrooms: understanding a framework for K-12 science education, Sci. Children, 49(4), 6–10.
Chen B. and Wei B., (2015), Investigating the factors that influence chemistry teachers’ use of curriculum materials: the case of China, Sci. Educ. Int., 26(2), 195–216.
Cleland C., (2001), Historical science, experimental science and the scientific method, Geology, 29, 987–990.
Copp D. T., (2018), Teaching to the test: A mixed methods study of instructional change from large-scale testing in Canadian schools, Assess. Educ.: Princ., Policy Practice, 25(5), 468–487.
Cullinane A., Erduran S. and Wooding S. J., (2019), Investigating the diversity of scientific methods in high-stakes chemistry examinations in England, Int. J. Sci. Educ., 41(16), 2201–2217.
Department for Education [DFE], (2015), National curriculum in England: Science programmes of study. Retrieved May 8, 2021, from https://www.gov.uk/government/publications/national-curriculum-in-england-science-programmes-of-study.
Dewey J., (1910), How we think, Lexington, MA: D.C. Heath.
Dodick J., Argamon S. and Chase P., (2009), Understanding scientific methodology in the historical and experimental sciences via language analysis, Sci. Educ., 18(8), 985–1004.
Duschl R. A., Schweingruber H. A. and Shouse A. W. (ed.), (2007), Taking science to school: Learning and teaching science in grades K-8, National Academies Press, vol. 500.
El Masri Y. H., Erduran S. and Ioannidou O., (2021), Designing practical science assessments in England: students’ engagement and perceptions, Res. Sci. Technol. Educ., 1–21.
Erduran S., Cullinane A. and Wooding S. J., (2019), Assessment of practical chemistry in England: An analysis of methods assessed in high stakes examinations, in Schultz M., Schmid S. and Lawrie G. (ed.), Research and practice in chemistry education: Advances from the 25th IUPAC international conference on chemistry education 2018, Dordrecht: Springer, pp. 135–147.
Erduran S. and Dagher Z., (2014), Reconceptualizing the nature of science for science education: Scientific knowledge, practices and other family categories, Dordrecht: Springer.
Gray R., (2014), The distinction between experimental and historical sciences as a framework for improving classroom inquiry, Sci. Educ., 98(2), 327–341.
Gu M., (2004), The cultural foundation to the Chinese education, Taiyuan: Shanxi Education Press (in Chinese).
Hodson D., (1996), Laboratory work as scientific method: Three decades of confusion and distortion, J. Curric. Stud., 28(2), 115–135.
Hodson D., (2014), Learning science, learning about science, doing science: Different goals demand different learning methods, Int. J. Sci. Educ., 36(15), 2534–2553.
Hollins M. and Reiss M. J., (2016), A review of the school science curricula in eleven high achieving jurisdictions, Curric. J., 27(1), 80–94.
Ioannidou O. and Erduran S., (2021), Beyond hypothesis testing: investigating the diversity of scientific methods in science teachers’ understanding, Sci. Educ., 30(2), 345–364.
Irzik G. and Nola R., (2014), New directions for nature of science research, in Matthews M. (ed.), International handbook of research in history, philosophy and science teaching, Springer, pp. 999–1021.
Lawson A., (2003), Allchin's shoehorn, or why science is hypothetico-deductive, Sci. Educ., 12, 331–337.
Miles M. B. and Huberman M., (1994), Qualitative data analysis: A sourcebook of new methods, 2nd edn, Beverly Hills, CA: Sage Publications.
Ministry of Education [MoE], (2018), The Chemistry Curriculum Standards of Senior High School (the 2017 version), Beijing: People's Education Press (in Chinese).
National Science Teachers Association [NSTA], (2020), NSTA position statement on nature of science, Retrieved May 8, 2021, from https://www.nsta.org/nstas-official-positions/nature-science.
NGSS Lead States, (2013), Next generation science standards: for states, by states, Washington, DC: The National Academies Press.
National Research Council [NRC], (2012), A framework for K-12 science education: practices, crosscutting concepts, and core ideas, Washington, DC: The National Academies Press.
Ofqual, (2015), GCSE subject level conditions and requirements for single science (biology, chemistry, physics) July 2015, Retrieved June 16, 2021 from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/600867/gcse-subject-level-conditions-and-requirements-for-single-science.pdf.
Osborne J., Collins S., Ratcliffe M., Millar R. and Duschl R., (2003), What “ideas-about-science” should be taught in school science? A Delphi study of the expert community, J. Res. Sci. Teach., 40(7), 692–720.
Rudolph J. L., (2005), Epistemology for the masses: The origins of the scientific method in American schools, Hist. Educ. Q., 45, 341–376.
Salloum S. and BouJaoude S., (2019), The use of triadic dialogue in the science classroom: a teacher negotiating conceptual learning with teaching to the test, Res. Sci. Educ., 49, 829–857.
Stemler S., (2001), An overview of content analysis, Pract. Assess., Res. Eval., 7(17), 137–146.
Turner D., (2013), Historical geology: Methodology and metaphysics, in Baker V. R. (ed.), Rethinking the fabric of geology: Geological Society of America special paper 502, Geological Society of America, pp. 11–18.
Wei B., (2019), Reconstructing school chemistry curriculum in the era of core competencies: a case from China, J. Chem. Educ., 96, 1359–1366.
Wei B., Jiang Z. and Gai L., (2022), Examining the nature of practical work in school science textbooks: Coverage of the diversity of scientific methods, Sci. Educ., 31, 943–960.
Wei B. and Wang Y., (2021), The presentation of science practice in twenty historical cases, Sci. Educ., 30, 365–380.
Williams-McBean C. T., (2022), Using school-based assessments to advance the integration of sustainable development competences by capitalising on the practice of teaching to the test, Environ. Educ. Res., 1–18.
Woodcock B. A., (2014), “The scientific method” as myth and ideal, Sci. Educ., 23, 2069–2093.
Zhang B., Krajcik J. S., Sutherland L. M., Wang L., Wu J. and Qian Y., (2003), Opportunities and challenges of China's inquiry-based education reform in middle and high schools: Perspectives of science teachers and teacher educators, Int. J. Sci. Math. Educ., 1, 477–503.
Zhao X., Zhao J., Guo X. and Wu Y., (2022), A study on the reliability and validity of Gaokao based on correlation analysis, J. Chin: Exam., 30(3), 37–43 (in Chinese).

Footnotes

† In China, the education department of each administrative region sets up a research department of teacher education staffed by teaching supervisors for the different subjects. Their regular work is to guide the teaching of teachers within the region and to organize regular teaching and research activities that promote teachers’ professional development.

‡ The National Papers in this study are comprehensive science examination papers that contain items from three subjects (chemistry, physics and biology). This study analyzed only the chemistry items in the National Papers.
