On the prevalence of alternative conceptions on acid–base chemistry among secondary students: insights from cognitive and confidence measures

Kai Yee Hoe; R. Subramaniam

doi:10.1039/C5RP00146C

DOI: 10.1039/C5RP00146C (Paper) Chem. Educ. Res. Pract., 2016, 17, 263-282

Kai Yee Hoe and R. Subramaniam *
Nanyang Technological University – National Institute of Education, Singapore. E-mail: subramaniam.r@nie.edu.sg

Received 31st July 2015 , Accepted 11th December 2015

First published on 11th December 2015

Abstract

This study presents an analysis of alternative conceptions (ACs) on acid–base chemistry harbored by grade 9 students in Singapore. The ACs were identified using a 4-tier diagnostic instrument that was developed and validated for this purpose. It is among the very few studies in the science education literature that have also examined results using measures drawn from the educational psychology literature. The results indicate that the students harbor a range of ACs of varying strengths in relation to the properties of acids and bases, strengths of acids and bases, pH, neutralization, indicators, and sub-microscopic views of acids and bases. The 25-item instrument uncovered 30 ACs. A novel insight from this study is that when students are presented with a test item in which all answer and reason responses are incorrect, but with a blank space for them to fill in their own answer and reason if they disagree with these responses, hardly any choose to do so. ACs were also identified from this question. Overall, the results reiterate the utility of the 4-tier format for identifying ACs and getting some indication of their strengths.

Introduction

Learning is a complex process in the acquisition of knowledge. Not all content taught is learnt. Gaps are bound to exist in students' understanding. The presence of alternative conceptions (ACs) can also interfere with students' understanding. These ACs are at odds with the relevant scientific concepts, and have to be identified and addressed if proper conceptual understanding is to be promoted. Not surprisingly, studies on students' understanding of topics in the sciences have been a fruitful field of research, and these have allowed for the documenting of ACs on various topics for use by instructors and researchers.

Some may view chemistry as a subject that is difficult to teach and learn effectively. Kavanaugh and Moomaw (1981) found that many students have difficulties in understanding fundamental chemistry concepts. Students' difficulties in learning chemistry have been well documented (for example, Gabel and Bunce, 1994). Contributing factors include the abstract nature of the subject (Herron, 1975), the remoteness of the language used (Glassman, 1967) and the different levels of representation used (Gabel et al., 1987; Nakhleh and Krajcik, 1994).

In order to understand chemistry, one must be familiar with the various forms of representation used, such as symbols, pictorial forms and equations. Very often, visualization at the molecular level is crucial for understanding and explaining chemical phenomena. Kozma and Russell (1997) showed that the schemas for chemistry knowledge differ between experts (teachers) and novices (students) in terms of complexity. Novices are lower in representational competence – fluency in using different representations and switching between representations. They concluded that novices normally use only one form of representation and are not quite able to transform it to other forms. Novices also tend to rely on the obvious surface features while experts have a better grasp of the underlying principles. These difficulties carry over into all areas of chemistry. A number of studies have focused on student difficulties with concepts in acid–base chemistry, the focus of our research. A summary of the literature on studies involving ACs on acids and bases can be found in Table 1. As can be seen, students and teachers harbour a range of ACs on this topic.

Table 1 Summary of studies on alternative conceptions on acids and bases

a Note: As far as possible, the phrasings in the original articles were used for most of the ACs. Not all ACs are reported.
S/N	Country/reference	Sample	Instrument	Nature of ACs/remarks^a
1	France (Cros et al., 1986)	400 (overall) university year 1 students	Free interviews followed by semi-structured interviews and then a questionnaire-based enquiry	• No heat is evolved during a reaction between an acid and a base • A solution with pH 7 ± 2 is drinkable.

2	Australia (Hand & Treagust, 1988)	60 16 year old students	Interviews	• “An acid is something which eats material away or which can burn you.” • “Testing for acids can only be done by trying to eat something away.” • “Neutralisation is the breakdown of an acid or something changing from an acid.” • “The difference between a strong and a weak acid is that strong acids eat material away faster than a weak acid.” • “A base is something which makes up an acid.” (p. 55)

3	Canada (Ross, 1989)	34 grade 12 students who had completed grade 11 advanced chemistry program.	25 items, 4-options MCQ. 8 selected students underwent 40 min. clinical interviews.	• “Acids contain hydroxide ions.” • “All acids are strong acids.” • “Acids are poisonous.” • “Acid rain is formed from water and chlorine or hydrogen gas.” • “Acids contain hydrogen in the gaseous state.” • “Acids and bases react to form a solution.” • “A strong acid has a higher pH than a weak acid.” • “A gas is released when an acid and a metal reacts because heat changes the liquid to a vapor.” • “When hydrochloric acid and magnesium react, more gas is released than (when) acetic acid reacts with magnesium because the reaction is more violent.” • “When hydrochloric acid and magnesium react, more gas is released (than) when acetic acid reacts with magnesium (as) more hydrogen bonds need to be broken.” • “A strong acid reacts more slowly than a weak acid.” (pp. 105–107)

4	Israel (Zoller, 1990)	University freshmen, number unknown. Based on 15 years of teaching experience.	Based on author's teaching experience	• “How come an aqueous solution may have a pH < 0 or pH > 14?” (p. 1058) • “Salts, which are formed in a neutralization of acids with bases, are “neutral” species. As such, their aqueous solutions must be neutral (i.e., the pH is definitely 7).” (p. 1059)

5	India (Banerjee, 1991)	162 undergraduate chemistry students and 69 school teachers	21 item test instrument (7 MCQ, 8 short answer response questions, 3 on problem solving and 3 on application) on chemical equilibrium	• “No hydrogen ions in an aqueous solution of sodium hydroxide or in distilled water.” (p. 490) • “Rainwater is neutral.” (71% of teachers, 79% of students) (p. 491) • “Teachers (76%) and students (46%) believe that for the same concentration, the pH of acetic acid will be less than or equal to that of hydrochloric acid solution in water.” (pp. 490–491)

6	Germany (Schmidt, 1991)	7500 grammar-school students (Gymnasium). Test – 154 items from 10 topics.	10 MCQ items on acids and bases.	• “In any neutralization reaction, a neutral solution is formed, even if a weak acid or base takes part in the reaction.” • “Neutralization is an irreversible reaction.” (p. 459)

7	USA (Nakhleh, 1992)	Grade 11 chemistry students	Interviews	• “When asked how a solution of an acid or a base would appear under a very powerful magnifying glass, 20% of the students drew waves, bubbles or shiny particles”

8	USA (Nakhleh & Krajcik, 1994)	Fifteen grade 11 senior high-school students	Semi-structured Interviews	• “pH is inversely related to harmful” and “bases are not harmful”. • “Acids and bases have their own particular color or color intensity (bases are colored blue, acids are colored pink), and even different pH solutions have different colors.” • “The molecules fight and combine, and phenolphthalein helps with neutralization.” • “Acids melt metals, acids are strong and bases are not strong.” • “pH was regarded as a compound called phenolphthalein, a chemical reaction and a number related to intensity.” (pp. 1087–1090)

9	Sweden (Drechsler & Schmidt, 2005)	Secondary students	Multiple choice questions from examination boards	• Many represented the key reaction between dilute hydrochloric acid and aqueous sodium hydroxide as one that does not involve the reaction of hydrogen ions and hydroxide ions to form water. • Many students did not consider water as an acid in its reaction with ammonia. • In the reaction of nitric acid with copper, many students thought that nitric acid behaves as an acid.

10	Turkey (Demircioğlu et al., 2005)	88 tenth grade secondary school students	Worksheets: 3 sections 20 item MCQ ‘Concept Achievement Test’.	• “Acids burn and melt everything” • “All acids and bases are harmful and poisonous” • “As pH increases, acids become harmless and bases are not harmful” • “Different pH solutions have different colors” • “pH is a measure of acidity” • “A strong acid doesn't dissociate in water solution, because its intra-molecular bonds are very strong” • “The only way to test a sample whether it is an acid or a base is to see if it eats something away, for example metal, plastic, animal, and us” • “All salts are neutral” • “Salts don't have a pH value” • “In all neutralization reactions, acid and base consume each other completely” • “At the end of all neutralization reactions, there are neither H⁺ nor OH⁻ in the resulting solutions” • “A strong acid is always a concentrated acid” • “Bubbles or bubbling is a sign of chemical reaction or strength of an acid or a base” • “Indicators help with neutralization” • “As the value of pH increases, acidity increases” • “While bases turn blue litmus paper into red, acids turn red litmus paper into blue” • “As the number of hydrogen atoms increases in the formula of an acid, its acidity becomes stronger” • “Species having formulas with hydrogen are acids and those having formulas with hydroxyl are bases” (p. 46)

11	USA (Sheppard, 2006)	16 students from three high school chemistry classes.	4 Interview tasks: • Introductory pH event • Neutralization • Questions about models • Acid–base titration	• “pH measures acidity only” (p. 36) • pH “measured the ‘strength’ of an acid or base or the amount of acid or base present.” (p. 36) • Students had considerable difficulty explaining “how the pH values related to the actual substances in terms of the particles present.” (p. 36) • “All indicators changed color at the same pH value and this was invariably at pH 7.” (p. 36) • “pH (is) a linear scale.” • “Acids were inherently more ‘powerful’ than bases.” (p. 38) • “Neutralization as a process of dominance of acids over bases.” (p. 38) • “The process of neutralization as the physical mixing of an acid with a base and named no products, drew no equations, and represented the process diagrammatically with unreacted chemical species.” (p. 37) • Students' “representations of sub-microscopic events of neutralization simply showed base particles attached to acid particles.” (p. 37) • In titration curve for initial non-changing value of pH, about half of the students explained that “despite the acid having been added, the reaction had not yet started.” One quarter said “no reaction was occurring.” (p. 40) • “For the sudden drop in pH value near the endpoint, approximately one third of the students described it as the reaction suddenly starting to occur.” (p. 40) • “Half the students described the leveling of the pH after endpoint as resulting from an excess of acid particles.” (p. 40)

12	Taiwan (Chiu, 2007)	A National Survey of Students' Conceptions of Chemistry in Taiwan. This was a 6-year study (Pilot: August 2000 to March 2003). For the first 2 years of the study, researchers developed two-tier diagnostic tests. After pilot testing, revising, and validating the original tests, the two-tier diagnostic items were arranged into three sets of formal paper-and-pencil tests. A nationwide test was carried out in April 2003.		As below:

	Taiwan (Huang, 2004)	400 Elementary school pupils	As part of a 20 item two-tier MCQ instrument.	• “44% thought that soapsuds are neutral because neutral materials do not harm skins and clothes.” • “36% thought that the solution of sodium bicarbonate and acetic acid to be neutral because it is a neutralisation reaction.” • “27% thought that all the acid/bases are toxic—for example, detergent and hydrochloric acid are detrimental to our health.” (p. 434)

	Taiwan (Chiu, 2004)	430 junior high school students and 240 senior high school students	As part of a 17 item two-tier MCQ instrument. 5 two-tier MCQs (Junior high) and 8 two-tier MCQs (Senior high)	• “13% of the junior high and 34% of the high school students thought that weak electrolytes are in a molecular state in a solution. They are in an ionic state after electrification.” • “When the test items dealt with sub-microscopic viewpoints of the acid/base, we did not see cognitive development in student performance. For instance, the correct answers for both-tiers were 11% junior high and 9% senior high school students when the test item asked about the distribution of the HCl particles in the water.” • “25% of junior high school students thought that the mixed solution of equal concentrations and volumes of acid (CH₃COOH) and sodium hydroxide (NaOH) is neutral because they react with each other completely.” • “Whereas the pressure increases, the degree of dissolution of gas decreases; therefore, the CO₂ is released from water to increase pH value (21% junior and 7% senior); or, the degree of dissolution of gas does not change with pressure (12% junior and 9% senior).” • “Weak electrolyte exists as a molecule in water because some molecules decompose to ions, then positive and negative ions attract with each other to combine as molecules again (19% junior and 9% senior); or, weak electrolyte exists as a molecule or ions in water because weak electrolyte can just partially decompose (13% junior and 34% senior).” (pp. 434–435)

	Taiwan (Hsu, 2004)	512 elementary school pupils, 207 junior high school students and 251 senior high school students	As part of an 18 item instrument on categorisation of matter	• “Acetic acid and water can dissolve because water has the ability to dilute things (36% elementary, 46% junior, and 40% senior).” • “Acetic acid and water cannot dissolve because acetic acid decomposes instead of dissolving in water (17% junior and 30% senior).” (p. 436)
	Taiwan (Juang, 2004)	288 junior high school students and 256 senior high school students	As part of a ‘Material Science’ test (14 items for junior high school and 19 items for senior high school)	For junior high school students, • “27% thought that C₂H₅OH is base because it contains OH in its molecular formulas.” • “26% thought that C₂H₅OH is an acid because it has the highest number of hydrogen molecules.” • “16% thought that C₂H₅OH is base because all of the organic compounds are neutral.” (p. 436) For senior high school students, • “29% thought that C₂H₅OH to be a base given the functional group (OH).” • “10% thought that ethyl alcohol is acidic because it has the highest number of hydrogen molecules.” • “58% understood that ethyl alcohol is neutral” but attributed this to different reasons. • “29% thought that C₂H₅OH is neutral because all of the organic compounds are neutral.” (p. 36)

13	Thailand (Artdej et al., 2010)	55 Grade 11 students	18 two-tier MCQs	• “CH₃COONa is a non-electrolyte.” (p. 175) • “NH₄OH is a non-electrolyte.” (p. 175) • “A strong acid could produce more bubbles upon reaction with metal than a weak acid.” (p. 175) • “A strong acid cannot dissociate in water, because of strong intramolecular hydrogen bonding.” (p. 175) • “All bases are ionic compounds.” (p. 175) • “All bases are corrosive.” (p. 175) • “CO₃²⁻ and NH₃ and HCOO⁻ are Brønsted–Lowry acids.” (p. 176) • “BH₃ is a Lewis base.” (p. 176) • “H₂SO₄ is the conjugate acid of SO₄²⁻.” (p. 176) • “HPO₄²⁻ and NH₃ are a conjugate acid–base pair in the reaction HPO₄²⁻(aq) + NH₃(aq) → PO₄³⁻(aq) + NH₄⁺(aq)” (p. 176) • “The concentration and proton attraction of H₃O⁺ influences the strength of acids.” (p. 177) • “Aqueous solution of H₂SO₄ is more acidic than that of HCl.” (p. 177) • “A weak acid can dissociate completely in solution, and that, in a weak acid solution, water is a significant contributor to the H⁺ concentration of the HF solution.” (p. 177) • “Concentrations of AOH and BOH are equal and the base dissociation constant (K_b) of AOH is less than BOH. At equilibrium, concentration of BOH is more than, or equal to, the concentration of AOH.” (p. 178) • For a base BOH (K_b of BOH = 2.6 × 10⁻⁸ mol dm⁻³, concentration of 0.1 mol dm⁻³), “concentration of OH⁻ was equal to 1.0 × 10⁻¹ mol dm⁻³.” (p. 178) • “Pure water is a good conductor.” (p. 178) • “K_w is equal to 1.0 × 10⁻⁷.” (p. 178) • “The addition of an acidic solution to water did not affect the concentration of OH⁻ or H₃O⁺ in the system, because the ionization of water depended on temperature.” (p. 179) • “When 0.1 mol of KOH was added into 1.0 L of water, the solution would be neutral.” (p. 179)

14	Turkey (Özmen et al., 2009)	59 high school students	25 MCQs (15 one-tier, 10 two-tier) followed by an intervention study.	• “The only way to test a sample whether it is an acid or a base is to see if it eats something away, for example metal, plastic, animal, and us.” • “Acids burn and melt everything.” • “All salts are neutral.” • “Salts don't have a value of pH 10.” • “All acids and bases are harmful and poisonous.” • “Strong acids can react with all metals to form H₂ gas.” • “Strength of an acid depends on the number of hydrogen atoms in an acid.” • “As the value of pH increases, acidity increases.” • “pH is only a measure of acidity.” • “A strong acid is always a concentrated acid.” • “A strong acid doesn't dissociate in water solution, because its intra-molecular bonds are very strong.” • “In all neutralization reactions, acid and base consume each other completely.” • “At the end of all neutralization reactions, there is neither H⁺ nor OH⁻ ions in the resulting solutions.” • “As concentration of H₃O⁺ in an acid solution increases, pH of the solution increases.” • “After all the neutralization reactions, the pH of solution formed is always 7.” • “All acids and bases conduct electricity the same.” (p. 14)

15	Turkey (Cetin-Dindar & Geban, 2011)	3 groups of high school students (12, 111, 156)	Interviews (12) Open-ended questions (111) 18 three-tier MCQs (156)	(Not specified) Study aimed at showing that a three-tier MCQ is more reliable in identifying alternative conceptions than conventional MCQs

16	Turkey (Saglam et al., 2011)	Two stages: Stage 1: 106 university students took a questionnaire. Stage 2: 16 students with proper understanding volunteered to be interviewed.	2 rounds of interviews.	• Most students “associated acids with red color and bases with blue color.” (p. 1407) • “Formula X is an acid, when HCl is added, the equilibrium shifts right, and similarly because Formula Y is a base, when NaOH is added, the equilibrium shifts left in order to eliminate the stress.” (p. 1404)

17	USA (McClary & Bretz, 2012)	104 second semester (in 2 sections) college organic chemistry students.	9 four-tier MCQs (in 3 sets of 3) Confidence scale: 0–100%	• Functional group determines acid strength (e.g. “Acetic acid is more acidic than phenol and 2,4-pentadione because it is a carboxylic acid.”) • Stability determines acid strength (e.g. “2,4-pentadione is more acidic than acetone and acetaldehyde because 2,4-pentadione has two carbonyl groups.”) (p. 14)

Gaps in studies reviewed

The literature review explored the range of ACs exhibited by students on the topic of acids and bases, and how these were diagnosed. It can be seen that these studies span the elementary, secondary, high school, college and university levels. Most of them focus on a single level, except for the Taiwanese National Survey (Chiu, 2007), which reported a very comprehensive and concerted effort to cover elementary to senior high schools over a six-year period of study. It should be noted, however, that this was a comprehensive survey on various topics in Chemistry, with acid–base chemistry being only a sub-set of the survey.

Except for the study by Demircioğlu et al. (2005), the number of ACs uncovered for upper secondary students in any study is relatively small (five to six). It is not likely that these studies covered sufficient breadth and depth. In addition, the Concept Achievement Test developed by Demircioğlu et al. (2005) is, strictly speaking, an achievement test rather than a diagnostic instrument, although it also served, together with interviews, to identify ACs.

It is also noticeable that, other than standard assessments and interviews, the 2-tier multiple choice (2TMC) test is one of the diagnostic formats used in most of the studies. The inclusion of the reason tier in 2TMC questions greatly reduces the probability of getting a question correct by guessing (from 0.25 to 0.0625 for 4-response options in each tier), leading to more robust conclusions in the studies. However, it does not help to shed more light on why students get an answer wrong. A correct answer accompanied by a wrong reason could mean either the existence of an AC or a lack of knowledge – and it is not possible to differentiate between these (Caleon and Subramaniam, 2010). In the recent literature, two diagnostic instruments on acids/bases have been reported. The 18 item Acid–Base Diagnostic Test (ABDT) of Artdej et al. (2010), based on the 2-tier MCQ format, elicited 19 ACs among 55 grade 11 students. An obvious limitation of the instrument is that the language used is Thai, and it is therefore not easy for others to use it. In addition, since two-tier MCQs were used, chance factors and the lack of metacognitive information also contribute to its shortcomings. The nine item ACID I, in four-tier multiple choice (4TMC) format, developed by McClary and Bretz (2012) uncovered two ACs on acid strength among 104 undergraduates. With the incorporation of confidence ratings, wrong answers and/or reasons with high confidence ratings can provide more reliable indications of ACs. The magnitude of the confidence ratings also provides some indication of the strengths of the ACs (Sreenivasulu and Subramaniam, 2013). The use of 4TMC tests as a diagnostic instrument for uncovering ACs is relatively new, with Caleon and Subramaniam (2010) having used it earlier to diagnose upper secondary students' ACs on the topic of waves.
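The guessing probabilities quoted above for one-tier and two-tier items follow from treating each tier as an independent, uniformly random choice. The short sketch below illustrates that arithmetic; the function name is hypothetical, and the independence of the tiers is an assumption of the illustration, not a claim about how students actually guess.

```python
from fractions import Fraction

def chance_of_blind_guess(options_per_tier):
    """Probability of getting every tier right by uniform random guessing.

    Assumes (for illustration only) that each tier is guessed independently
    and that exactly one option per tier is correct.
    """
    p = Fraction(1)
    for n_options in options_per_tier:
        p *= Fraction(1, n_options)
    return p

one_tier = chance_of_blind_guess([4])      # answer tier only, 4 options
two_tier = chance_of_blind_guess([4, 4])   # answer and reason tiers, 4 options each
print(float(one_tier), float(two_tier))    # prints: 0.25 0.0625
```

The same function can be applied to items with five options in a tier, for example chance_of_blind_guess([5, 4]).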

The use of the 4TMC test as a diagnostic instrument, however, is still largely unexplored – we managed to locate only a handful of references on this. In terms of the number of questions used in the studies, the literature indicates that the number of one-tier and two-tier MCQs ranges from 5 to 25 items. Nine items were used by McClary and Bretz (2012) in their 4TMC test. Further studies on the 4TMC format in different domains are needed before its utility can be unequivocally affirmed. The number of ACs identified in the studies whose coverage is approximately equivalent to that of the ‘O’ Level chemistry syllabus, the focus of our study, ranges from 1 to about 16.

The principal objectives of this study are to develop a 4-tier diagnostic instrument on acid–base Chemistry and to document the prevalence and strengths of the ACs among grade 9 students.

Methodology

Development of acid–base chemistry diagnostic instrument (ABCDI)

Research design. We used the approach of Treagust (1988) for 2-tier instruments to craft the ABCDI but with some modifications, which are elaborated below.

Ethics clearance. Approval from the university's Institutional Review Board was obtained for the conduct of this study. All students who participated in this study gave informed consent through their parents.

General information. The test instrument was crafted in English, which is also the medium of instruction in schools in Singapore. Students were taught the topic of acids and bases by their teachers at least two months before this study. They were also given at least a week to revise this topic before sitting for the diagnostic test.

Phase 1 (preliminary phase)

Development of the MCQ instrument with open-ended section for explanation. Based on the literature, conversations with secondary school chemistry teachers over the years, and the first author's teaching experience, a 25 item MCQ instrument was developed. Each MCQ comprises a stem plus four or five options. The reason tier was a blank space for students to provide their explanation for their choice of answer.

Student interviews were not used as an additional approach in this phase to uncover ACs for the following reasons:

(a) While interviews can provide better insights into students' knowledge frameworks, they are very time-consuming.

(b) Interviews have to be recorded and transcribed, and the process is tedious. This study aims to also ascertain the effectiveness of crafting 4TMC items without resorting to interviews.

(c) Interviews can only capture certain aspects of the conceptions of a small sample of students. In place of the additional ACs that could potentially be generated via interviews, the first author's teaching experience was instead used to generate a number of ACs to be used as distractors for the MCQ items.

The test was administered to secondary 3 (equivalent to Grade 9, ages of about 15–16) students (N = 113) from a mainstream government co-educational school. The students take (Pure) Chemistry as one of the subjects for the Singapore-Cambridge General Certificate of Education. The time allotted for the test was one hour.

Facility index (FI) and discrimination index (DI) were calculated for each item. Distractor analysis for the MCQ items and analysis of the open-ended responses were also done. These data were used as the main guide to modify or delete questions, and to identify potential ACs for use as distractors in framing the second tier for the pilot version of the instrument. In this way a 24 item two-tier MCQ instrument was developed. For each of the answer and reason tiers, a corresponding confidence scale was added, with the scale ranging from 1 (just guessing) to 6 (absolutely confident).
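The facility and discrimination indices mentioned above can be computed along the following lines. This is only a sketch: FI is taken as the proportion of students answering an item correctly, and DI is computed with the common upper/lower 27% group method, which is an assumption here since the exact DI procedure is not stated in this section. The function names and the sample data are hypothetical.

```python
def facility_index(item_scores):
    """Proportion of students who scored 1 on the item (0/1 scores)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower group discrimination index.

    Students are ranked by total test score; DI is the difference between
    the number correct in the top and bottom groups, divided by group size.
    The 27% group size is an assumed convention, not taken from the paper.
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    upper_correct = sum(item_scores[i] for i in upper)
    lower_correct = sum(item_scores[i] for i in lower)
    return (upper_correct - lower_correct) / k

# Hypothetical data for one item and ten students.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [20, 18, 17, 15, 14, 12, 10, 9, 8, 5]
fi = facility_index(item)
di = discrimination_index(item, totals)
print(fi, round(di, 2))
```

A low FI flags a difficult item, while a DI near zero or negative suggests that the item does not separate stronger from weaker students.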

The instrument was sent for validation to three Chemistry academics and a subject head who has taught secondary level Chemistry for more than ten years. They were requested to assess the items in the instrument holistically as well as individually. A checklist for them to complete (with Yes, No and Remarks columns) was also provided, and it focused on the following:

• The questions can be understood by the level of students being evaluated – this is to ensure that the students would not provide a wrong answer due to weak command of the language.

• The questions test students' understanding of acids and bases – this is to ensure that the questions fall within the scope of the syllabus.

• The test is appropriate for the level being evaluated – this is to avoid dependence on knowledge beyond the intended level in order to answer the items correctly.

• The questions are not ambiguous – this is to ensure that the questions would not be misinterpreted.

• Each question has only one correct response in each tier.

• The time to complete the test is reasonable (one hour)

• Each question asks for an answer on only one aspect (not double barrelled)

• The responses accommodate most of the possible answers – this is to reduce the chances of guessing or elimination of the least plausible options to answer the question.

• The questions do not lead the respondent to a desired answer – this is to avoid a situation where the stem provides clues (e.g. grammatical hints) for the answer.

• The questions do not use emotionally loaded or vaguely defined words – this serves to keep the questions “emotionally neutral”, neither appealing nor discriminating to the students' state of mind, thus decreasing any ambiguities.

Overall, the validators felt that the instrument can be used with some minor modifications. Changes to some of the options and rephrasing of a few questions were undertaken to address their concerns. In this way, a pilot version of the instrument was developed.

Pilot study

The pilot test instrument was administered to another sample of students (N = 116) from secondary 3 in three mainstream government co-educational schools who were studying (Pure) Chemistry. It was found that the one hour duration was more than adequate to complete the test.

Data from this phase were collated and analyzed. Again, FI and DI were computed and distractor analyses were done. Based on the scores for the answer tier only, FI ranged from 0.06 to 0.61, with an average value of 0.28 (SD = 0.17). For getting both-tiers correct, FI ranged from 0.00 to 0.44. As none of the questions was deemed too easy, all 24 questions were retained. Further validation indicated that no revisions were necessary.

A sample question is shown below:

Main study

In addition to the 24 questions, it was decided to include one more question to form a 25 item 4TMC instrument for the main study. This item was unique in that no combination of options from the given answer and reason tiers was correct. Students would have to exercise the option of providing their own answer and reason to score a mark if they felt that none of the options provided was satisfactory. All students were briefed about this possibility before the test but no specific reference to any question was made. This was an attempt to see whether a 4TMC question can ‘mimic’ a traditional open-ended question where students are to provide their own responses. As part of a diagnostic test, an item like this could help to surface ACs that may not be covered by the options provided. If none or only a few students opt to give their own answer and reason, it may indicate that the sample, on the whole, is relying on common test-taking strategies to answer this question.

A different sample of 141 students (71 females, 70 males) participated in the main study. As with the preliminary and pilot studies, the samples are of diverse ethnicities – predominantly Chinese followed by Malays, Indians and others.

Treatment of data

For the 4TMC questions in the pilot phase and main study, FI and DI for the answer, reason and both-tiers were computed for individual test items. The t-tests and Cronbach's alpha values were computed using Microsoft Excel.
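For readers who wish to reproduce the reliability coefficient outside a spreadsheet, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). It is not the authors' Excel workbook; the function name and the small score matrix are hypothetical, and sample variances are assumed.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = students, columns = items).

    Uses sample variances (n - 1 denominator) for both the items and the
    total scores; this choice is an assumption of the sketch.
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 scores for five students on three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

The same function can be applied separately to answer-tier scores, both-tier scores, or confidence ratings, provided each is arranged as one row per student.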

For each 4TMC item, the answer and reason responses were scored separately. When analyzing the answer tier, a score of 1 was given for a correct response and zero was given for a wrong response. When analyzing both-tiers, a score of 1 was awarded only if correct responses for both answer and reason tiers were given by the student; otherwise, zero score was assigned. In addition, the confidence ratings for each option chosen were analyzed, and used as a rough gauge to assess the strengths of the ACs. The relevant confidence measures are as follows:

(a) CF: mean confidence rating.

(b) CFC: mean confidence rating for an item with a correct response.

(c) CFW: mean confidence rating for an item with a wrong response.

(d) CAQ: Confidence Accuracy Quotient

(e) CB: Confidence Bias
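The scoring rule and the confidence measures listed above can be sketched as follows. The formulas used for CAQ and CB are assumptions, because the paper defers the details to Appendix A: CAQ is taken here as the difference between the mean confidence for correct and for wrong responses (the latter denoted CFW in this sketch) divided by the standard deviation of the confidence ratings, and CB as the mean confidence rescaled to the 0–1 range minus the proportion of correct responses. All function names and data are hypothetical.

```python
from statistics import mean, stdev

def score_item(answer, reason, key_answer, key_reason):
    """Return (answer-tier score, both-tiers score) for one student and item."""
    answer_score = 1 if answer == key_answer else 0
    both_score = 1 if (answer == key_answer and reason == key_reason) else 0
    return answer_score, both_score

def confidence_measures(correct, confidence, scale_min=1, scale_max=6):
    """Confidence measures for one item across students.

    correct    -- list of 0/1 scores
    confidence -- list of ratings on the 1 (just guessing) to 6 scale
    CAQ and CB follow assumed definitions, not necessarily Appendix A.
    """
    cf = mean(confidence)
    right = [c for ok, c in zip(correct, confidence) if ok]
    wrong = [c for ok, c in zip(correct, confidence) if not ok]
    cfc = mean(right) if right else None
    cfw = mean(wrong) if wrong else None
    sd = stdev(confidence)
    caq = (cfc - cfw) / sd if right and wrong and sd > 0 else None
    cb = (cf - scale_min) / (scale_max - scale_min) - mean(correct)
    return {"CF": cf, "CFC": cfc, "CFW": cfw, "CAQ": caq, "CB": cb}

# Hypothetical responses of six students to one item.
correct = [1, 1, 0, 0, 1, 0]
confidence = [5, 6, 4, 2, 5, 3]
measures = confidence_measures(correct, confidence)
print(measures)
```

Under these assumed definitions, a positive CAQ indicates that students are more confident when correct than when wrong, and a positive CB indicates overconfidence.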

Furthermore, the ACs obtained were categorised based on the classification developed by Caleon and Subramaniam (2010):

(a) Significant alternative conception (SiAC)

(b) Spurious alternative conception (SpAC)

(d) Moderate alternative conception (MAC)

(e) Strong alternative conception (SAC)

(Please refer to Appendix A for details.)

When data analysis was being done for the main study, it was found that Q24 was faulty. Data for this question were thus excluded from the analysis. Also, the trick question (Q25) was not included in the computation of DI as no student got it correct.

Treatment of missing data

For the diagnostic test, missing data were treated by list-wise deletion so that complete-case analysis could be done. This approach generally diminishes the power of the statistical tests conducted, owing to the reduction in sample size. The concern is largely addressed if the data can be shown to be missing completely at random (MCAR) (Heitjan and Basu, 1996). If data are MCAR, then the reduced sample can be considered a random sample of the larger sample, in which case the statistical power of the tests would be essentially preserved or reduced only minimally. To check for MCAR, the sample was split into scripts that were complete and those with one or more missing elements, and the distribution properties of the two data sets were examined. If t-tests for the normally distributed data show no significant differences, then the data can be considered to be approximately MCAR, and estimates obtained can be considered to be reasonably unbiased.
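
The comparison of complete and incomplete scripts described above might be sketched as follows. This is only an illustration under stated assumptions: the scores are hypothetical, each script is summarised by a single total, and Welch's form of the t statistic is used because this section does not say which variant was run in Excel. The function name `welch_t` is also hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances
    se2 = va / na + vb / nb                        # squared standard error
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical total scores (assumption: not data from the study).
complete_scripts = [4, 5, 3, 6, 4, 5, 4, 3]
incomplete_scripts = [5, 4, 4, 3, 5, 4]

t_stat, dof = welch_t(complete_scripts, incomplete_scripts)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A t value close to zero relative to the critical value for the computed degrees of freedom would be read as no significant difference between the two groups, which is the condition the text treats as approximate evidence for MCAR.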

Results and discussion

Missing data

While collating the data, it was found that a significant number of scripts contained missing information. Analyses of the missing information were carried out, and it was found that the data are approximately MCAR (Appendix B).

Restricting the analysis to the 92 ‘perfect’ scripts reduces the sample size, but because the data are approximately MCAR this is expected to have only a minimal impact on the statistical power of the tests, and valid statistical inferences can still be obtained. That is, the threats to internal validity are likely to be small.

The reduced sample size in the present study compares well with the reduced sample size of 89 students in the study of McClary and Bretz (2012), who faced the same issue of missing data. Out of their initial 104 participants, only 89 were included in the final data analysis; those with missing answers or confidence ratings were disregarded. The authors, however, did not provide statistical justification for using only 89 scripts, even though the number of scripts with missing data was small – 15 out of 104, or about 14.4%. As the percentage of missing data is small, it is likely that threats to internal validity are minimal.

In view of the foregoing findings, it was decided that the present study would concentrate on the reduced sample size of 92 for three reasons. Firstly, this sample size is comparable to other studies reported in this area; secondly, there is statistical justification to show that the 47 scripts with missing data can be safely omitted from the analysis; and, thirdly, there is precedent in the literature for disregarding missing information in a study using a multi-tier instrument.

Test statistics

Cronbach's alpha (α) was used to determine the internal consistency of the ABCDI, based on the test scores for the answer, reason and both tiers. It was found to be low for the answer tier (0.31) and reason tier (0.38), and rose when both answer and reason tiers were considered together (0.44). These statistics imply an advantage of complementing the answer tier with a reason tier in assessing learning and understanding. The criterion-referenced nature of the test items in the ABCDI could have reduced the variability of the test scores. This could deflate the values of alpha, as reliability measures also depend on the variability in the scores (Popham and Husek, 1969). The present study also involved various sub-topics that were based on different concepts within acid–base chemistry. The test was therefore not likely to be uni-dimensional (Schmitt, 1996): the various sub-topics tested may not form a distinctive construct, so the test measured more than one dimension, which could have lowered the value of alpha. The results are similar to those found in the study on waves using 4TMC items (Caleon and Subramaniam, 2010), where the Cronbach alphas for the three corresponding tiers were 0.40, 0.19 and 0.50 respectively. Low Cronbach alphas were also found in a recent study on undergraduates' ACs on chemical thermodynamics using a 4TMC test (Sreenivasulu and Subramaniam, 2013) and by McClary and Bretz (2012) in their 4TMC test on acid strength.
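
For readers who wish to see how such an alpha value arises from dichotomous item scores, a minimal sketch follows. It uses the standard variance form of Cronbach's alpha; the score matrix and the function name `cronbach_alpha` are hypothetical and are not taken from the study, which computed alpha in Excel.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per student, one 0/1 column per item."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # Using sample variances throughout would give the same ratio.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for 5 students on 4 items (assumption: illustrative only).
scores = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Because alpha depends on the ratio of summed item variances to total-score variance, a test whose items tap several loosely related sub-topics, or whose scores are bunched together, will tend to return a lower value, in line with the explanations offered above.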

The proportion of students who gave correct answers, as well as the values of the relevant confidence measures per item of the ABCDI, is summarized in Table 9 (Appendix C). The test is considered difficult for the students as the mean proportion of students providing correct responses is well below 0.5 (answer tier = 0.30, reason tier = 0.26 and both tiers = 0.16). The low FIs are not a major concern as the test was not meant to determine the degree of mastery of a predefined set of learning objectives for assessment purposes. Tests for assessment or placement need a ‘spread of scores’ to account for the range of concepts or skills examined. The questions also need to be reasonably ‘doable’. In contrast, for a diagnostic test, the aim is to surface ACs, and therefore the items used tend to delve rather deep into the topic matter. With two-tier items, the difficulty levels are further amplified. A low FI does not necessarily imply that an item is unreasonably difficult. Instead, it is likely to be pointing to the existence of an AC or lack of knowledge on a number of items among the students. The DIs, based on scores for both tiers correct, range from 0.00 to 0.70, with an average of 0.24 (SD = 0.20). This is again close to that of the pilot test (−0.03 to 0.69, average 0.24). Overall discrimination for the entire instrument, based on scores for both tiers, is considered fair.
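
The item statistics discussed above can be illustrated with a short sketch. It assumes that FI is the proportion of students answering an item correctly and that DI is the difference in that proportion between the top and bottom 27% of students ranked by total score. The 27% split is a common convention and an assumption here, since this section does not restate the definitions. All names and data are hypothetical.

```python
def both_tiers_score(answer_correct, reason_correct):
    """Score 1 only when both the answer and the reason tiers are correct."""
    return 1 if (answer_correct and reason_correct) else 0


def facility_index(item_scores):
    """Proportion of students scoring 1 on the item."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct (assumed 27% groups)."""
    n_group = max(1, round(len(item_scores) * fraction))
    order = sorted(range(len(item_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n_group], order[-n_group:]
    p_upper = sum(item_scores[i] for i in upper) / n_group
    p_lower = sum(item_scores[i] for i in lower) / n_group
    return p_upper - p_lower


# Hypothetical data for one item and ten students (illustrative only).
item = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
totals = [20, 18, 17, 15, 12, 11, 9, 8, 5, 3]

fi = facility_index(item)
di = discrimination_index(item, totals)
print(f"FI = {fi:.2f}, DI = {di:.2f}")
```

Under these assumptions, a DI of zero means that high and low scorers were equally likely to get the item right, which is what happens by construction for an item that no student answers correctly.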

Some patterns can be observed when examining Table 9 for students' confidence ratings. For Q13, 15, 16, 18, 19 and 21, the average confidence rating (both tiers) of students who provided the correct answers (CFC) is below 3.5. This is interpreted to mean that they are not too confident about their answers, even though they scored correctly for these questions. Overall, the CFC for the items has a mean value of 3.64 (SD = 0.49).

For about half of the questions, the average confidence ratings of students getting wrong answers (CFW) are higher than 3.5. For questions 1, 2, 4, 6, 11, 12, 14, 15, 17, 18, 19, 21, 22 and 23 (14 questions in total), the CFW is higher than the corresponding CFC in one or more of the answer, reason and both tiers, which corresponds to a negative CAQ in those tiers. For three of the questions (Q11, 15 and 21), the students are overconfident across all three tiers.

The mean CAQs are low (−0.03 for answer tier, 0.07 for reason tier and 0.16 for both tiers), suggesting that, in general, the students could not discriminate well between what they know and what they do not know.

The CB value (both-tiers) for every item is positive (varies from 0.12 to 0.56), which implies a possibility of overconfidence among the students. This is a little surprising as Asian students, with predominantly Confucian heritage, have a greater tendency to be more modest and would not rate themselves highly when asked to self-assess their own performance (Ho, 2009). A possible explanation could be that the ACs are deeply entrenched such that the students believed that they have answered correctly according to their ‘knowledge and understanding’. The 10 items with the highest CB (for both-tiers) are Q23, 6, 17, 25, 11, 15, 22, 12, 18 and 21, in descending order of CB values.

The foregoing observations did not come as a surprise as they were also flagged in the findings of the preliminary and pilot phases.

Item 25 is a ‘special’ item, where none of the options provided for the answer and reason tiers is correct. The distractor analysis is summarized in Table 2. The most popular answer–reason combination was D and B (27.17% of the respondents, with a CF of 3.46), followed by B and A. Only 2.17% of the students (with a CF of 3.75) chose the E–E combination (i.e. none of the given answer–reason combinations is correct), which ought to be the ‘correct’ set of responses provided the written comments are valid. However, after going through all the self-provided responses by the students, none of the responses was found to be acceptable. The low percentage of students choosing this free-response combination suggests a likelihood of students' fixation on the idea that the correct answers must be among the given choices (even though it was mentioned on the cover page of the instrument, as well as during the briefing before commencement of the test, that students can provide their own answer(s) and reason(s) should they find that none of the given options for an item was satisfactory). Answer option D accounted for 48.91% of the responses, nearly half of the sample; apparently, the idea that the pH of an aqueous solution of an acid depends only on its concentration (answer option D) is a very prevalent AC. The next most popular answer option was B (that the pH of an aqueous solution of an acid depends on the strength of the acid only), which accounted for 20.65%. These two options made up a total of 69.56% of the responses, which is more than two-thirds of the sample. Apparently, the students have only a partial understanding of the factors affecting the value of pH. In addition, although the term ‘only’ should have hinted that each of the options alone may not be encompassing enough, the number of students who decided that none of the answer–reason combinations is correct is far too low, at 2.17%. These observations can be interpreted in terms of the prevalence of incomplete understanding of the relevant concepts. There might also be a certain reluctance on the part of the students to venture beyond the provided options.

Table 2 Distractor analysis for item 25 (rows: reason tier; columns: answer tier)

Reason	A	B	C	D	E^a
A	0.00% (—)	17.39% (3.56)	4.35% (3.50)	1.09% (4.00)	0.00% (—)
B	1.09% (3.50)	0.00% (—)	5.43% (3.30)	27.17% (3.46)	5.43% (3.60)
C	1.09% (3.00)	3.26% (4.50)	0.00% (—)	14.13% (3.77)	0.00% (—)
D	10.87% (3.45)	0.00% (—)	0.00% (—)	2.17% (3.25)	0.00% (—)
E^a	0.00% (—)	0.00% (—)	0.00% (—)	4.35% (4.00)	2.17% (3.75)^a

Numbers in brackets are the corresponding confidence ratings. ^a Student self-provided answer or reason.
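
As a cross-check on the percentages quoted in the discussion of item 25, the answer-tier totals can be recovered by summing each answer column of Table 2 over the reason rows. The sketch below transcribes the table values; the variable and function names are hypothetical.

```python
# Percentages from Table 2: outer keys are reason options (rows),
# inner keys are answer options (columns).
table2 = {
    "A": {"A": 0.00, "B": 17.39, "C": 4.35, "D": 1.09, "E": 0.00},
    "B": {"A": 1.09, "B": 0.00, "C": 5.43, "D": 27.17, "E": 5.43},
    "C": {"A": 1.09, "B": 3.26, "C": 0.00, "D": 14.13, "E": 0.00},
    "D": {"A": 10.87, "B": 0.00, "C": 0.00, "D": 2.17, "E": 0.00},
    "E": {"A": 0.00, "B": 0.00, "C": 0.00, "D": 4.35, "E": 2.17},
}


def answer_share(option):
    """Total percentage of students choosing a given answer option."""
    return round(sum(row[option] for row in table2.values()), 2)


share_d = answer_share("D")
share_b = answer_share("B")
combined = round(share_d + share_b, 2)
print(share_d, share_b, combined)
```

The column sums reproduce the 48.91% for answer option D and the 20.65% for answer option B, and hence the combined 69.56%, which confirms that the columns of Table 2 correspond to the answer tier and the rows to the reason tier.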

The 30 ACs found are summarized in Table 10 (Appendix D). A discussion of these ACs follows.

Alternative conceptions identified

The ACs identified are discussed in terms of the groups they occur under. Where available, comparisons are also made with respect to findings from other studies.

Properties of acids and bases. Students think that acids are associated with corrosiveness (AC1), which is similar to the findings of others (Treagust, 1988; Demircioğlu et al., 2005). A significant number of students (35.87%) also considered acids as being more dangerous and reactive than bases (AC2), which is parallel to “Acids melt metals, acids are strong and bases are not strong” (Nakhleh and Krajcik, 1994) or “Acids are more powerful than bases” (Sheppard, 2006). It also partially coincides with “Acids burn and melt everything” (Özmen et al., 2009, 2012). Both AC1 and AC2 are spurious ACs (SpAC, meaning CF < 3.5), which means that they are more likely to be due to lack of knowledge or understanding. Students may have encountered more acids than bases in their daily lives (even tertiary students know more about acids than bases, Cros et al., 1986) and media reports on accidents and attacks involving acids may have led to the preconceptions that acids are rather reactive and dangerous.

The rather high percentage (52.18%) of students thinking that rain water in an unpolluted area is neutral (AC3) came as a surprise. This is comparable to Banerjee's (1991) findings that 71% of chemistry teachers and 79% of undergraduate students exhibited the same AC. Within the 52.18%, about 19.57% supported their case arguing that rain water is pure water. AC3 is a strong AC (mean CF ≥ 4.0), which warrants attention as it is likely to be quite deep-seated. We suspect that the students might have been misled by the term ‘unpolluted’ to some extent, so much so that they assumed that the precipitation from the sky would stay chemically pure in an unpolluted area even though they should be aware that naturally occurring constituents (carbon dioxide in this case) in the atmosphere will dissolve in the rain water to make it acidic.

A naïve and superficial correlation of chemical structures with acidity or basicity may explain why more than a quarter (29%) of the students believed that compounds containing H will produce H⁺ whilst compounds containing OH will produce OH⁻ (AC4). This AC was also found in other studies (Demircioğlu et al., 2005; Huang, 2004; Özmen et al., 2012). AC4 is a moderate AC (MAC). Simplistically equating compounds containing H with acids is not that surprising. However, the only weak base that students are familiar with at this level is aqueous ammonia. Perhaps the frequent encounters and familiarity with strong bases such as sodium and potassium hydroxides could have dominated their thinking.

AC5, a spurious AC, is probably due to a failure to recall the definition or may also be a language issue – misinterpretation of ‘basic’ in the term ‘basicity’, that leads students to think that the higher basicity of an acid makes it more basic! The strength of this AC is not high and it could be addressed relatively easily once the distinction between basicity and basic is made clearer.

There was also a group of students (11.96%) that have yet to understand that the mechanism of electrical conduction in aqueous solutions is different from that in metals, and believed electrons are also responsible for the conduction of electricity in acid solutions (AC6, MAC). This is similar to the finding in the study of Othman et al. (2008), which revealed students' belief that an ionic compound does not conduct in the solid state but in the aqueous state due to the production of free electrons formed in an aqueous solution. These are spurious to moderate ACs, which would require better understanding of the ions present and how they affect the conductivity of the solutions in order to be addressed.

Strengths of acids and bases. For those students asserting that pure ethanoic acid is a stronger acid than aqueous ethanoic acid (AC7), their reasoning was that pure ethanoic acid produces more hydrogen ions. This moderate AC could also be another case of language issue in that the term ‘pure’ may have given a connotation of ‘without impurity’ and so naturally ‘more’ acid is present to produce more hydrogen ions. The fact that an aqueous medium is needed for the acid to dissociate to form hydrogen ions was somehow ignored or side-lined. This would not be too difficult to address once students understand that water is needed before the typical behaviors of acids can be manifested.

The belief that a stronger acid is one that produces a higher hydrogen ion concentration (AC8) or is one with a higher initial concentration (AC9) indicates incomplete understanding of the definitions of strong and weak acids (complete versus incomplete dissociation) as well as a partial picture of the factors responsible for the quantity of hydrogen ions produced. These are moderate to strong ACs, indicating quite deeply entrenched beliefs. The same flawed argument was also extended to weak acids in AC13 (MAC). Students thought that a weak acid that produced more hydrogen ions was the stronger of the two weak acids. Having a higher concentration of hydrogen ions could be due to more than one factor, of which acid strength is just one. This is consistent with the findings for item 25, where less than 10% of the students rejected all the options where only a single factor is responsible for the value of pH (a measure of hydrogen ion concentration). These ACs are not unique to the students in this study, as they have also been identified in other studies: “Concentrated acid is strong acid” (Ross, 1989; Demircioğlu et al., 2005; Özmen et al., 2009). It did not really come as a surprise that the same flawed conceptions were also extended to bases – a stronger base is one with a higher initial concentration (AC11, SpAC) or a higher initial number of moles (AC12, SAC).

AC10 again demonstrated another instance of “the more the merrier” way of thinking and therefore, a dibasic acid is considered to be stronger than a monobasic acid. It is not surprising that this has also been reflected as a strong AC as the students think that the potential ability to produce twice the amount of hydrogen ions with the same starting concentration would ‘naturally’ imply a stronger acid. This AC is similar to that found in the sample of students in the study in Turkey, who also thought that if an acid has more H in the formula, its acidity will increase (Demircioğlu et al., 2005; Özmen et al., 2009).

For AC14 (MAC), students seem to be content with the understanding that a dibasic weak acid produces twice the amount of hydrogen ions as a weak monobasic acid. This is closely related to AC8. The medium strength of this AC indicates that it is a genuine AC. This is most probably a (wrong) extrapolation of AC8 to weak acids. In addition, a dichotomous mode of thinking may be involved, that is, either 100% dissociation or partial dissociation, and all ‘partial’ dissociations are equally partial. Unfortunately, ‘partial’ has a wide range (more than 0% to less than 100%), and the nature of the acid also matters.

pH. Almost two-thirds (65.22%) of the students possess the strong AC that pH ranges from 1 to 14 (AC16, SAC), which agrees with Zoller's (1990) findings. Illustrations and diagrams in textbooks as well as teachers' instructional materials could be the source. This could also account for the high strength of the AC as students normally do not question what they see in instructional materials. Without clear emphasis that the range shown is for illustration purposes only, students could mistake it as the absolute range that pH values would have. In view of the high strength and high proportion of students having this AC, it is one that must be addressed or, better still, preempted while teaching the topic to deter its formation. Inadequate teaching may have led to incomplete learning.

For students with AC17 (SpAC), it would be expected that they would avoid choosing an unfamiliar answer if they are not aware of the self-ionization of pure water (in the reason tier). However, the more important issue is that the students should not have the idea that pH of pure water would be increasing when exposed to air (in the answer tier). With their knowledge of the presence of carbon dioxide in air and/or acid rain, they should deduce that the water should be getting more acidic, accompanied by a decrease in pH. One possible explanation could be that they were aware that the water was getting acidic but got confused over the definition of pH and thought that an increase in acidity should be followed by a corresponding increase in pH instead of a decrease. Similar findings from others (Demircioğlu et al., 2005; Sheppard, 2006; Özmen et al., 2009) indicated that students associated pH with acidity only. Either owing to a lack of understanding of the mathematics or the “more” mentality, students could have thought that more hydrogen ions (assuming it means a higher hydrogen ion concentration to them) lead to a higher (instead of lower) pH. This type of thinking was also exhibited by some Turkish students, with the thinking “As the value of pH increases, acidity increases.” (Demircioğlu et al., 2005).

A partial understanding that more than one factor contributes to the concentration of hydrogen ions in an aqueous solution is responsible for the thinking that the pH of an aqueous acid depends only on either the concentration of the acid (AC19, 17.39%, MAC) or on the strength of the acid (AC20, 14.13%, SAC). Sheppard (2006) also found that students thought that pH measures strength (of an acid). These medium to strong ACs are likely to be the result of partial understanding of the concepts related to strengths of acids and pH. They can be expected to improve once the ACs associated with strengths of acids are addressed.

It was found that about a quarter of the students (28.26%) thought that pH is a measure of the total number of moles of hydrogen ions (AC21, MAC). This is closely related to “pH is a measure of acidity” (Demircioğlu et al., 2005; Sheppard, 2006). Another 14.13% thought that the pH of an aqueous acid is independent of its concentration (AC22, SAC), which is rather illogical. The prevalences and strengths of these two ACs can be reduced if students are formally introduced to the mathematical definition of pH and its implications rather than a cursory introduction.

Neutralization. AC29 is very prevalent (found in items 21 to 23, ranging from about 12% to 41%, and from SpAC to SAC). Other studies found the same AC (Demircioğlu et al., 2005; Özmen et al., 2009) or a closely related one such as: the solution formed after neutralization is neutral (Zoller, 1990; Schmidt, 1991; Huang, 2004). Students are well aware that neutralization is between H⁺ and OH⁻ and might have treated it as a kind of ‘cancellation reaction’ where all the H⁺ are ‘cancelled out’ by OH⁻ to form water molecules.

AC30 (SAC) is analogous to AC27 (which has probably been ‘extrapolated’ to AC30), in that the cations and anions of the salt formed after complete neutralization are thought to be paired up. This is a strong AC that needs to be addressed. The ‘explanation’ regarding electrostatic attraction given for AC27 should also be applicable to this AC. Students need to understand the difference in the environment between the solid (ionic lattice) and the aqueous solution for ionic compounds.

Indicators. A substantial proportion of the students (43.48%) believed that there are no H⁺ left to react with an indicator at the end point and so the indicator changes color (AC15). There are actually two issues concerning this moderate AC. Firstly, there is a supposition that an indicator changes color when there are no hydrogen ions to react with, which implies the notion of “something happens when nothing is present”. This is quite an unusual finding as a reasonably good chemistry student would understand that no change would happen if there is no reaction. The actual cause of this problem probably has something to do with the fundamental understanding of chemical reactions, rather than a specific issue with acids and bases. Secondly, it is linked to AC30 (discussed above), where students thought that there were neither hydrogen ions nor hydroxide ions left at the end-point (Schmidt, 1991; Demircioğlu et al., 2005; Özmen et al., 2009).

Sub-microscopic views of acids and bases. There seems to be quite widespread confusion in terms of the ions present in pure ethanoic acid. When pure ethanoic acid is compared with an equimolar amount of aqueous ethanoic acid, students thought that

(a) pure ethanoic acid produces the same number of moles of H⁺ as an equimolar amount of aqueous ethanoic acid. (AC23, 16.30%, SAC)

(b) pure ethanoic acid produces more H⁺ than aqueous ethanoic acid. (AC24, 19.57%, SAC)

There were takers for all the possible (same, more or fewer) wrong answers, which were also fairly evenly distributed. It seems that there is a fundamental lack of understanding that pure ethanoic acid, in the absence of water, is not expected to ionize to produce ions. Students have been taught that an aqueous medium is needed for the manifestations of acidic and basic behaviors.

Compartmentalized thinking involving exclusive association of hydroxide ions with alkalis, and hydrogen ions with acids could have accounted for students' thinking that there are no hydrogen ions present in an alkaline solution (AC26, SAC). Banerjee (1991) also found the same AC in his sample of students.

Students thought that the sodium and hydroxide ions in aqueous sodium hydroxide are paired up (AC27) – this is attributed to the failure to understand that although the ions are held together by electrostatic attraction in the ionic lattice (solid state), the process of dissolution would require them to separate from each other. This spurious to moderate AC is consistent with other studies (Tien et al., 2007; Nyachwaya et al., 2011; Rosenthal and Sanger, 2012), where students thought that ions from an ionic compound are paired up in aqueous solution. In addition, students have been taught that bases dissociate to form hydroxide ions when dissolved in water. The understanding that electrostatic attraction is strong might have led to the thinking that the attraction is still holding the ions in pairs even after dissolution.

For AC28 (SpAC), students (15.22%) recognized that an equilibrium is established between ammonia and ammonium ions. However, they concluded that there are more ammonium ions than ammonia molecules (answer tier), and that the stabilization of the ammonium ions by the water molecules contributed to a shift in the equilibrium to favor the formation of ammonium ions (reason tier). The students should reasonably have got the answer tier correct. Since ammonia is a weak base (partial dissociation), it should convey an idea that only a small amount of ammonia would react with water to form ammonium ions, leaving most of the ammonia molecules undissociated. A possible remedy suitable at this level would have to come in the form of informing students about the ‘facts’ and/or reasoning using the concept of ‘weak’ base. This should be effective as the strength of the AC is low.

Conclusion

The identification of several ACs on acid–base chemistry in this study suggests that the traditional teaching the students in this study have undergone has not been very effective in building a solid understanding of acid–base chemistry. It is most likely that instruction has focused on the syllabus content, which is understandable from the point of view of examinations. A more effective way would be to be proactive, that is, to ascertain the ACs on the topic and address these also during traditional teaching.

A total of 30 ACs were identified in the main study using the 25-item 4TMC test. Some examples of ACs (with their approximate prevalences) that were found in this study but not found in the reviewed literature on acid–base chemistry are:

(a) The higher the basicity of an acid, the more basic it is (11%).

(b) pH of an aqueous acid is independent of its concentration (14%).

(d) Pure ethanoic acid (in the liquid state) produces more H⁺ than aqueous ethanoic acid (20%).

(e) Equimolar amounts of aqueous and pure ethanoic acid produce the same number of moles of H⁺ (16%).

(f) There are more NH₄⁺ than NH₃ molecules in aqueous ammonia (15%).

With the exception of the ACs concerning strengths of acids (and bases) and ions present in aqueous solutions, we believe that some of the remaining ACs could be easily addressed pre-emptively during teaching. A good example is the AC concerning the range of pH, where the teacher could point out that the pH charts in textbooks just serve as illustrations and that the range of pH can go beyond those values shown to some extent.

Implications

The study has some implications:

(a) The 25-item 4TMC instrument has helped to identify 30 ACs. This is considered to be effective when compared to other methods of data acquisition (such as interviews) in terms of time needed for testing, scoring and extraction of relevant information. Teachers can use the diagnostic instrument straightaway.

(b) The 4TMC test also provides more information beyond the traditional one-tier MCQ and two-tier MCQs as the extra tiers of confidence ratings provide metacognitive information that helps to surface qualitatively students' levels of certainty for their responses and, to a first approximation, how deep-seated these ACs are.

(c) Computations of indices such as CAQ and CB provide additional insights into the students' perceived confidence of their performance.

(d) The use of a ‘trick’ question in this study, where all the stated options are incorrect and where students need to provide their own answer and reason in the blank space provided, has implications for test developers. For example, it is the norm, especially in MCQs, two-tier MCQs, three-tier MCQs and four-tier MCQs, that the correct answer is in one of the responses. Students can get this correct through correct content knowledge, partial knowledge or guesswork. It is possible to enhance the robustness of a test by having perhaps one or two questions where the given options are all wrong and where students would need to provide their own answers. Of course, space for self-provided responses then needs to be provided for each question in such tests. This can lengthen the test, but it has the additional advantage of penalizing guessing. The ‘trick’ question is a novelty that would be of interest for future research. In this study, no student answered this question correctly.

(e) The use of practitioner knowledge to craft distractors for the MCQs is also a useful approach. Though this approach is not without its drawbacks, it has been shown in this study that in place of interviews (which are time-consuming), this is also an approach worth considering. The first author's teaching experience has come in useful in this regard.

Overall, this study has provided further support for the use of the 4-tier format in diagnostic studies on ACs.

Contributions to the literature

The contribution of this study to the literature can be summarized as follows:

(a) While a number of previous studies on acid–base chemistry have documented several ACs, a number of these may not be true ACs – some of these could be due to the lack of knowledge, which are not apparent when traditional MCQs, two-tier MCQs or certain other approaches are used. In the present study, the use of confidence ratings has helped significantly to discriminate ACs from responses due to guessing or lack of knowledge.

(b) The use of a ‘trick’ question in the test items has shed some useful light – elaborated in item (d) in the section on implications. It is worth exploring this aspect further in a separate study.

Limitations

The relatively small sample size does not represent the entire chemistry student population in the country. There could also be some subjectivity on the part of students in expressing their confidence in their responses to the questions. It is also to be noted that analysis of data which are MCAR is based on a first approximation.

Appendix A

Confidence measures of ACs

Consonant with approaches used in psychology (Stankov and Crawford, 1997; Lundeberg et al., 2000), the following measures were also computed for each tier of every item, as well as for the entire test, using the confidence ratings provided by the students (see also Caleon and Subramaniam, 2010):

(a) CF: mean confidence rating.

$$\mathrm{CF} = \frac{1}{mn}\sum_{j=1}^{m}\sum_{i=1}^{n} c_{ij}$$

where m = number of items; n = number of students in the group; c_{ij} = confidence rating of student i for item j

(b) CFC: mean confidence rating for an item with a correct response.

(c) CFW: mean confidence rating for an item with a wrong response.

(d) CAQ: Confidence Accuracy Quotient, which provides an indication of whether the students can distinguish between what they know and what they do not know. It can be interpreted as a student's meta-cognitive ability (Potgieter and Davidowitz, 2011).

For an item,

(e) CB (Confidence Bias)

This is a measure of how accurate an individual is in assessing his or her own performance.

The CB for an item is calculated as follows (Caleon and Subramaniam, 2010):

$$\mathrm{CB} = \frac{1}{n}\sum_{i=1}^{n}\left(c_i - p_i\right)$$

where n = number of students in the group; c_i = a student's recoded confidence rating for the item; and p_i = a student's score for an item.

Since the scores of both-tiers were used as a measurement of performance, for the whole group, p is equivalent to B, which is the proportion correct for both-tiers. B ranges from 0 to 1. The confidence rating scale also needs to be recoded to a value range that is the same as the student's score p, that is, B.

This can be achieved by the conversion

Therefore, the whole expression becomes (Sreenivasulu and Subramaniam, 2013)

where B is the proportion correct for both-tiers.
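The measures defined in this appendix can be sketched in equation form as follows. This is a hedged reconstruction from the symbol definitions above, the 1-to-6 confidence scale and the values in Table 9, not a quotation of the typeset formulas. The double-index notation c_ij, the raw rating c_i^raw, and the use of the standard deviation of the confidence ratings (SD) as the CAQ denominator are assumptions.

```latex
\begin{align*}
CF  &= \frac{1}{mn}\sum_{j=1}^{m}\sum_{i=1}^{n} c_{ij} \\
CAQ &= \frac{CFC - CFW}{SD} \\
CB  &= \frac{1}{n}\sum_{i=1}^{n}\left(c_i - p_i\right),
       \qquad c_i = \frac{c_i^{\mathrm{raw}} - 1}{5} \\
CB  &= \frac{CF - 1}{5} - B
\end{align*}
```

As a check against Table 9, item 1 (answer tier) has CF = 3.66 and B = 0.27, which gives CB = (3.66 − 1)/5 − 0.27 ≈ 0.26, in agreement with the tabulated value.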

CB has a range from −1.00 to 1.00. It is used and interpreted as an index that reflects the match or mismatch between the students' perceived confidence in their performance (in answering the questions) and their actual performance. The underlying assumption is that, for self-aware students, perceived confidence is proportional to actual performance; that is, a higher confidence rating translates directly into higher performance. Hence, a CB value of zero depicts the ‘ideal situation’ where there is exact calibration between their confidence levels and their performance scores. A highly positive CB value would occur when CF is high and accompanied by a low B value. This means that the students felt very confident about their answers (and presumably should have scored well) but their actual performance was low; that is, they were overconfident.
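A minimal sketch of how these measures could be computed for a single item and tier is given below. It assumes a 1-to-6 confidence scale, CAQ taken as (CFC − CFW) divided by the sample standard deviation of the confidence ratings, and CB taken as the recoded mean confidence minus the proportion correct. The function name `confidence_measures` and the sample data are hypothetical.

```python
from statistics import mean, stdev

def confidence_measures(conf, correct, scale_min=1, scale_max=6):
    """Return CF, CFC, CFW, CAQ and CB for one item (hypothetical helper).

    conf    -- confidence ratings, one per student
    correct -- matching booleans, True where the response was correct
    """
    cf = mean(conf)
    right = [c for c, ok in zip(conf, correct) if ok]
    wrong = [c for c, ok in zip(conf, correct) if not ok]
    cfc = mean(right) if right else None
    cfw = mean(wrong) if wrong else None
    sd = stdev(conf) if len(conf) > 1 else 0.0
    # CAQ is undefined when one group is empty or ratings do not vary
    caq = (cfc - cfw) / sd if right and wrong and sd > 0 else None
    p = len(right) / len(conf)  # proportion correct
    # Recode mean confidence to the 0-1 range, then subtract performance
    cb = (cf - scale_min) / (scale_max - scale_min) - p
    return {"CF": cf, "CFC": cfc, "CFW": cfw, "CAQ": caq, "CB": cb}

# Illustrative data only (not from the study)
ratings = [5, 4, 6, 2, 3, 5]
is_correct = [True, False, False, True, False, False]
result = confidence_measures(ratings, is_correct)
```

In this toy example CB is positive and CAQ is negative, which corresponds to the overconfident pattern described above.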

For the purpose of identifying ACs, a combination of the percentage of students choosing an answer–reason combination and the associated confidence ratings was used to analyze responses for both-tiers. The types of ACs are categorized following the approach of Caleon and Subramaniam (2010):

(a) Significant alternative conception (SiAC) – answer–reason option selected by 10% or more of the respondents (see also Tan et al., 2002).

(b) Spurious alternative conception (SpAC) – AC with a mean confidence rating of 3.5 and below. This could be due to lack of knowledge and/or understanding as well as guessing.

(d) Moderate alternative conception (MAC) – genuine AC with an associated mean confidence rating between 3.5 and 4.0.

(e) Strong alternative conception (SAC) – AC with an associated mean confidence rating of 4.0 and above.
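The classification rules above can be sketched as a small function. This is an illustration under stated assumptions: boundary values follow the wording literally (3.5 counts as spurious, 4.0 as strong), and the function name `classify_ac` is hypothetical.

```python
def classify_ac(percent_selected, mean_confidence):
    """Label an incorrect answer-reason combination (hypothetical helper).

    Returns None when fewer than 10% of respondents chose the combination,
    i.e. when it does not count as a significant AC.
    """
    if percent_selected < 10.0:
        return None
    if mean_confidence <= 3.5:
        return "SpAC"   # spurious: low confidence, possibly guessing
    if mean_confidence < 4.0:
        return "MAC"    # moderate: between 3.5 and 4.0
    return "SAC"        # strong: 4.0 and above

# Three rows taken from Table 10
examples = {
    "Q1A(d)": classify_ac(15.22, 3.36),
    "Q4B(c)": classify_ac(17.39, 3.88),
    "Q6D(a)": classify_ac(22.83, 4.71),
}
```

Because the published means are rounded to two decimals, a tabulated value that sits exactly on a boundary (for example 4.00) may have been classified from an unrounded value.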

Appendix B

Analyses of missing data

The prevalence of missing data is likely to be due to the nature of four-tier MCQs, as students have to provide four responses for each question in a test paper that runs to a number of pages. For a 25-item instrument comprising four-tier MCQs, the total number of responses to be provided by each student would be 25 × 4 = 100. It is very likely that some students would have missed filling in the relevant information for a number of items. While reviewing studies on ACs using multi-tier MCQs, it was noted that very few authors mentioned missing data. It is likely that samples with missing data were excluded from the analyses (for example, McClary and Bretz, 2012) or that missing data were not an issue. While it is possible to analyze the data based on intact data alone, some compromises would need to be made while interpreting the findings. An appropriate solution would be to get additional students to take the test. However, as it was already near the end of the school calendar year when the data were gathered, new samples could only be obtained in the following year to make up for the numbers. Time constraints therefore ruled out this option.

A quick analysis was done to tabulate the distribution of missing data (Table 3).

Table 3 Distribution of missing data

Number of missing data	% missing out of maximum	Frequency	% of sample
0	0.00	92	65.25
1	1.09	27	19.15
2	2.17	8	5.67
3	3.26	6	4.26
4	4.35	3	2.13
7	7.61	1	0.71
10	10.87	1	0.71
13	14.13	1	0.71
28	30.43	1	0.71
30	32.61	1	0.71

The two scripts with 28 and 30 missing responses respectively (i.e. about one third of the responses were not provided) had to be discarded because too much information was missing.
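The percentages in Table 3 can be checked with a few lines of arithmetic. The sketch below assumes a maximum of 92 responses per script and 141 collected scripts; both figures are inferred from the tabulated percentages (1/92 ≈ 1.09% and 92/141 ≈ 65.25%) rather than stated in the text.

```python
# (number of missing responses, number of scripts) from Table 3
rows = [(0, 92), (1, 27), (2, 8), (3, 6), (4, 3),
        (7, 1), (10, 1), (13, 1), (28, 1), (30, 1)]

MAX_RESPONSES = 92  # assumption inferred from the "% missing" column

total_scripts = sum(freq for _, freq in rows)

table = [
    (missing,
     round(100 * missing / MAX_RESPONSES, 2),  # % missing out of maximum
     freq,
     round(100 * freq / total_scripts, 2))     # % of sample
    for missing, freq in rows
]

# Scripts kept after discarding the two with 28 and 30 missing responses
retained = total_scripts - 2
```

Under these assumptions the recomputed columns agree with Table 3, and the number of retained scripts matches the 139 scripts analysed in the remainder of this appendix.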

An alternative approach, imputation (Rubin, 1987), in which the missing information is substituted with estimated values, was also considered. This would not work well with missing responses for the answer and reason tiers but may be possible for the confidence ratings. The use of isomorphic questions (there are a few in the instrument) may be able to address this issue to a limited extent in terms of determining the consistency with which students answered similar questions. Isomorphic questions test the same concept in different contexts, and so the combinations of answer–reason pairs are expected to be somewhat similar to those with missing responses. Even so, responses from isomorphic questions cannot be unequivocally used to substitute for missing responses. Because the questions are set in different contexts to test understanding and application of the same concepts, students may perceive them differently and may end up providing different responses, including confidence ratings.

The main study required detailed analyses of the data, ranging from the correctness of the answer tier, reason tier and both tiers to the associated confidence ratings, to ascertain and confirm the ACs manifested in the students. The validity of substituting missing confidence ratings for the answer and/or reason tiers with other values requires closer examination. Missing confidence ratings could possibly be addressed in two ways:

(a) Using the assumption that the student is equally confident in choosing their answer and reason, one can consider using the other available confidence rating as a substitute. For example, if the confidence rating for the reason tier is missing, it can be assumed to be the same as the confidence rating for the answer tier. This, however, does not make sense if either or both of the answer and reason response is/are missing as well.

(b) Using the mid-point value of 3.5 as a substitute for the missing confidence rating. However, based on the findings from the pilot phase, the average confidence rating was slightly below the mid-point value. Hence, using 3.5 could have artificially inflated the confidence ratings, especially in cases where students may have resorted to guessing as they do not know the answer and/or reason.

For the 139 scripts, the Pearson correlation coefficients between the confidence rating for the answer tier and the confidence rating for the reason tier were computed, based on complete responses for both tiers (Table 4):

Table 4 Pearson correlation coefficients of confidence ratings between answer and reason tiers

Item	1	2	3	4	5	6	7	8	9
r	0.667	0.626	0.623	0.688	0.587	0.709	0.746	0.663	0.706

Item	10	11	12	13	14	15	16	17	18	19
r	0.690	0.392	0.592	0.705	0.600	0.753	0.565	0.799	0.572	0.688

Item	21	22	23	25
r	0.677	0.752	0.677	0.576

Values were computed according to complete responses for each question.

The overall Pearson correlation coefficient (based on every complete pair of responses to every item) was r(3116) = 0.68, p < 0.001. All these values suggested positive correlations between the two confidence ratings. However, it was felt that the correlations were not strong enough to justify substituting one missing confidence rating with the corresponding confidence rating from the other tier. Furthermore, students may also perceive the answer and reason tiers separately as two different multiple-choice questions (Griffard and Wandersee, 2001). To complicate matters further, as the knowledge level measured in each tier can be somewhat different (Tsai and Chou, 2002), students should reasonably indicate different levels of confidence, thus making it unsound to assume that the two tiers are of equivalent levels. This situation was observed in the study on ACs using 4TMC items (Caleon and Subramaniam, 2010), where half of the samples showed different confidence levels for the answer and reason tiers.
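A minimal sketch of a Pearson correlation restricted to complete pairs, as described above, is shown below; a pair is dropped if either confidence rating is missing. The function name and the sample ratings are hypothetical, and `None` is used here to mark a missing rating.

```python
from math import sqrt

def pearson_complete_pairs(answer_conf, reason_conf):
    """Pearson r over pairs where both ratings are present."""
    pairs = [(a, r) for a, r in zip(answer_conf, reason_conf)
             if a is not None and r is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_r = sum((r - mean_r) ** 2 for _, r in pairs)
    return cov / sqrt(var_a * var_r), n - 2  # r and degrees of freedom

# Illustrative ratings only (not from the study)
answer = [4, 5, None, 3, 6, 2]
reason = [3, 5, 4, None, 5, 2]
r, df = pearson_complete_pairs(answer, reason)
```

Since the degrees of freedom equal the number of complete pairs minus two, the reported r(3116) corresponds to 3118 complete pairs of confidence ratings.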

The means of the confidence ratings were also worked out for each item (Table 5):

Table 5 Mean confidence ratings of answer and reason tiers

Question	1	2	3	4	5	6	7	8	9	11
Mean CR(A)	3.68	3.99	3.31	3.73	3.66	4.19	3.73	3.60	2.97	4.17
Mean CR(R)	3.34	3.68	3.33	3.62	3.73	3.92	3.72	3.53	3.05	3.93

Question	12	13	14	15	16	17	18	19	20	21
Mean CR(A)	3.71	3.01	3.71	3.77	3.09	3.78	3.72	2.57	3.64	3.71
Mean CR(R)	3.71	2.94	3.36	3.71	2.80	3.65	3.38	2.27	3.41	3.48

Question	22	23	25
Mean CR(A)	3.37	3.95	3.73
Mean CR(R)	3.30	3.41	3.45

Mean CR(A) – Mean confidence rating for the answer tier for an item. Mean CR(R) – Mean confidence rating for the reason tier for an item. Values were computed according to complete responses for each question.

With respect to the answer tier, most items (17 of 23) have mean confidence ratings greater than 3.5. For the reason tier, fewer items (10 of 23) have mean confidence ratings greater than 3.5. Hence, if the mid-point value of 3.5 is used to substitute the missing confidence ratings, it would result in:

(a) deflation of confidence rating for the answer tier, and

(b) inflation of confidence rating for the reason tier
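The direction of this bias can be illustrated with a short sketch: substituting 3.5 for missing ratings pulls an item mean towards 3.5, lowering means that lie above the mid-point and raising means that lie below it. The observed means are taken from Table 5 for illustration; the counts of observed and missing ratings, and the function name, are hypothetical.

```python
MIDPOINT = 3.5  # mid-point of the 1-to-6 confidence scale

def mean_after_midpoint_fill(observed_mean, n_observed, n_missing):
    """Item mean once each missing rating is replaced by the mid-point."""
    total = observed_mean * n_observed + MIDPOINT * n_missing
    return total / (n_observed + n_missing)

# Means from Table 5; the counts (130 observed, 9 missing) are hypothetical
answer_item6 = mean_after_midpoint_fill(4.19, 130, 9)   # falls below 4.19
reason_item16 = mean_after_midpoint_fill(2.80, 130, 9)  # rises above 2.80
```

The size of the shift depends on how many ratings are missing, so items with more omissions would be distorted more.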

There were only 92 scripts without any data omission. For the remaining 47 scripts (more than one third of the sample), 21 out of the 23 items were affected (Table 6). In view of the large number of computations (FI, DI, confidence ratings, etc.) required, handling these omissions item by item would have made the analyses unnecessarily complicated and excessively time-consuming.

Table 6 Number of missing responses for each item

Item number	Missing responses	Item number	Missing responses	Item number	Missing responses
1	0	12	2	22	4
2	0	13	2	23	8
3	1	14	1	25	7
4	8	15	15
5	8	16	3
6	1	17	2
7	5	18	1
8	8	19	9
9	8	20	3
11	4	21	3

After all these considerations, the idea of concentrating the analyses on the 92 ‘perfect’ scripts was explored. To justify this approach, the respective means and standard deviations of the mean cognitive scores of the 139 ‘total’ scripts versus the 92 ‘perfect’ scripts were computed. Imputation for missing items in the response options was done by using the standard approach, whereby items with no answer or reason indicated are scored zero. This was followed by two-tailed t-tests on the mean total scores of these two sets of scripts to see whether there were any statistically significant differences.

The mean scores and standard deviations for the different sample sizes are very close to each other for the answer tier, reason tier and both tiers (Table 7). In addition, all the p-values are greater than 0.05. Hence, there was no statistically significant difference in student performance based on the 139 scripts versus the 92 scripts; that is, the impact of omitting the scripts with missing information would appear to be minimal as far as the performance of the students was concerned.

Table 7 Comparison of the means, standard deviations and t-tests of the ‘total’ and ‘perfect’ samples' cognitive scores

	Answer		Reason		Both
	N = 139^a	N = 92	N = 139^a	N = 92	N = 139^a	N = 92
Mean	30.65	30.15	27.81	26.32	16.80	16.16
Std. dev.	11.27	10.41	11.68	10.96	10.68	9.59
t-test	t(229) = 0.34, p = 0.73		t(229) = 0.97, p = 0.33		t(229) = 0.46, p = 0.65
^a Items with missing answer or reason are scored zero.
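The t statistics in Table 7 can be reproduced from the summary statistics alone. The sketch below assumes an independent-samples t-test with pooled variance, which is consistent with the reported 229 degrees of freedom (139 + 92 − 2); the function name is hypothetical and p-values are not computed.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Means and standard deviations from Table 7 (N = 139 versus N = 92)
t_answer, df = pooled_t(30.65, 11.27, 139, 30.15, 10.41, 92)
t_reason, _ = pooled_t(27.81, 11.68, 139, 26.32, 10.96, 92)
t_both, _ = pooled_t(16.80, 10.68, 139, 16.16, 9.59, 92)
```

Rounded to two decimals, these give 0.34, 0.97 and 0.46, in line with the values reported in the table.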

A somewhat more robust form of analysis was also done to explore whether the data are missing completely at random (MCAR) (Little and Rubin, 1987; Howell, 2007). If data are MCAR, then it is acceptable to use list-wise deletion followed by complete case analysis (Howell, 2007). For this, the distribution of data in the two samples (one with 92 scripts with no missing data and another with 47 scripts with missing data) for one of the variables can be examined. Two items (items 1 and 2) were selected for analysis because, in both samples, these items had no missing data. Only the confidence ratings were chosen for analysis because the values for each tier can be assumed to span a continuum (1 to 6). The cognitive scores were not used because the score for each question is dichotomous (0 or 1), which prevents the use of parametric statistics. The results are summarized in Table 8, which presents a picture similar to that shown in Table 7. The mean confidence ratings and standard deviations were reasonably close to each other for both items and all tiers. More importantly, all the p-values were again greater than 0.05. It can therefore be reasonably concluded that there are no statistically significant differences in confidence ratings between the 47 scripts with missing data and the 92 ‘perfect’ scripts for these two items.

Table 8 Comparison of means, standard deviations and t-tests of ‘missing data’ sample's and ‘perfect’ sample's confidence rating for items with no missing response

	Answer-tier		Reason-tier		Both-tiers
	N = 47	N = 92	N = 47	N = 92	N = 47	N = 92
Item 1
Mean	3.68	3.66	3.34	3.33	3.51	3.49
Std. dev.	1.00	1.12	1.01	1.15	0.93	1.03
t-test	t(137) = 0.09, p = 0.93		t(137) = 0.07, p = 0.94		t(137) = 0.09, p = 0.93

Item 2
Mean	4.00	4.00	3.57	3.75	3.79	3.88
Std. dev.	0.98	1.15	1.14	1.11	0.98	1.01
t-test	t(137) = 0.00, p = 1.00		t(137) = 0.88, p = 0.38		t(137) = 0.49, p = 0.62

In other words, the sample with 92 scripts can be considered to be a random sample of the larger sample with 139 scripts. This further supports the notion that the impact of omitting the scripts with missing data would be minimal. It can therefore be reasonably concluded that the data are approximately missing completely at random (MCAR).

Appendix C

Table 9 Proportion of students who gave correct responses and the values of relevant confidence measures per item of ABCDI (N = 92)

^a DI was computed based on both-tiers (proportion of correct responses from approximately top 25% – proportion of correct responses from approximately bottom 25%).
^b Question 24 was omitted from computation of DI as it was later found to have some shortcomings. The trick question (Q25) was also omitted from computation of DI as no student got it correct.
	DI^a	Proportion correct			A tier					R tier					B tiers
Qn	B	A	R	B	CF	CFC	CFW	CAQ	CB	CF	CFC	CFW	CAQ	CB	CF	CFC	CFW	CAQ	CB
1	0.43	0.35	0.33	0.27	3.66	3.63	3.68	−0.05	0.26	3.33	3.63	3.18	0.40	0.19	3.49	3.86	3.36	0.49	0.23
2	0.39	0.24	0.33	0.22	4.00	3.82	4.06	−0.21	0.38	3.75	4.03	3.61	0.38	0.33	3.88	3.93	3.86	0.06	0.36
3	0.70	0.30	0.40	0.30	3.42	3.75	3.28	0.49	0.18	3.46	3.92	3.15	0.76	0.19	3.44	3.93	3.23	0.81	0.18
4	0.39	0.21	0.24	0.20	3.67	3.58	3.70	−0.12	0.34	3.62	3.68	3.60	0.08	0.33	3.65	3.75	3.62	0.14	0.33
5	0.43	0.53	0.59	0.34	3.72	4.02	3.37	0.47	0.21	3.75	4.11	3.24	0.69	0.21	3.73	4.13	3.53	0.51	0.21
6	0.09	0.23	0.25	0.07	4.25	4.29	4.24	0.04	0.58	3.97	3.65	4.07	−0.35	0.53	4.11	3.58	4.15	−0.52	0.56
7	0.52	0.61	0.51	0.45	3.83	4.11	3.39	0.52	0.12	3.79	4.17	3.40	0.61	0.11	3.81	4.29	3.42	0.71	0.12
8	0.26	0.70	0.30	0.20	3.59	3.78	3.14	0.48	0.32	3.49	3.93	3.30	0.48	0.30	3.54	4.11	3.40	0.58	0.31
9	0.39	0.55	0.33	0.17	2.83	3.04	2.56	0.36	0.19	2.90	3.07	2.82	0.19	0.21	2.86	3.56	2.72	0.70	0.20
11	0.26	0.36	0.30	0.12	4.26	4.12	4.34	−0.22	0.53	4.01	3.82	4.09	−0.26	0.48	4.14	3.95	4.16	−0.25	0.51
12	0.26	0.15	0.40	0.12	3.79	3.79	3.79	−0.01	0.44	3.88	4.16	3.69	0.47	0.46	3.84	4.14	3.80	0.37	0.45
13	0.17	0.35	0.24	0.16	3.08	3.22	3.00	0.15	0.25	2.93	2.95	2.93	0.02	0.22	3.01	3.13	2.98	0.12	0.24
14	0.09	0.25	0.26	0.14	3.77	3.65	3.81	−0.14	0.41	3.35	3.50	3.29	0.18	0.33	3.56	3.69	3.54	0.16	0.37
15	0.13	0.15	0.07	0.05	3.78	3.36	3.86	−0.44	0.50	3.78	3.17	3.83	−0.49	0.50	3.78	3.40	3.80	−0.34	0.50
16	0.09	0.18	0.22	0.13	3.12	3.35	3.07	0.21	0.29	2.68	3.10	2.57	0.39	0.21	2.90	3.29	2.84	0.38	0.25
17	0.09	0.03	0.08	0.03	3.84	4.33	3.82	0.46	0.53	3.66	3.57	3.67	−0.08	0.50	3.75	4.33	3.73	0.54	0.52
18	0.00	0.34	0.16	0.11	3.80	3.81	3.80	0.00	0.45	3.39	2.67	3.53	−0.78	0.37	3.60	2.95	3.68	−0.78	0.41
19	0.17	0.23	0.22	0.12	2.45	2.33	2.48	−0.11	0.17	2.17	2.10	2.19	−0.08	0.12	2.31	2.41	2.30	0.10	0.14
20	0.57	0.65	0.43	0.36	3.59	3.73	3.31	0.39	0.16	3.36	3.48	3.27	0.19	0.11	3.47	3.56	3.42	0.13	0.14
21	0.17	0.38	0.20	0.15	3.78	3.66	3.86	−0.19	0.40	3.49	2.83	3.65	−0.67	0.35	3.64	3.00	3.75	−0.73	0.38
22	0.00	0.13	0.07	0.01	3.42	3.42	3.43	−0.01	0.47	3.26	2.67	3.30	−0.51	0.44	3.34	3.50	3.34	0.14	0.46
23	0.00	0.01	0.14	0.00	4.07	1.00	4.10	−2.82	0.61	3.51	3.38	3.53	−0.12	0.50	3.79	—	3.79	—	0.56
25	0.00	0.00	0.00	0.00	3.74	—	3.74	—	0.55	3.42	—	3.42	—	0.48	3.58	—	3.58	—	0.52

Mean	0.24	0.30	0.26	0.16	3.63	3.54	3.56	−0.03	0.36	3.43	3.44	3.36	0.07	0.33	3.53	3.64	3.48	0.16	0.34
SD	0.20	0.20	0.15	0.12	0.43	0.72	0.49	0.68	0.15	0.43	0.56	0.44	0.44	0.14	0.42	0.49	0.44	0.46	0.15
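The discrimination index described in the footnote to Table 9 can be sketched as follows. The group size of 23 students (about 25% of 92) is an assumption, though it is consistent with tabulated DI values such as 0.43 (10/23); the handling of tied total scores and the function name are also hypothetical.

```python
def discrimination_index(total_scores, item_correct, group_size):
    """Proportion correct in the top group minus that in the bottom group.

    total_scores -- each student's total both-tiers score
    item_correct -- 1 or 0 per student for the item of interest
    group_size   -- number of students in each of the top and bottom groups
    """
    # Rank students from highest to lowest total score
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    top = order[:group_size]
    bottom = order[-group_size:]
    p_top = sum(item_correct[i] for i in top) / group_size
    p_bottom = sum(item_correct[i] for i in bottom) / group_size
    return p_top - p_bottom

# Illustrative data for eight students (not from the study), two per group
scores = [20, 18, 15, 12, 10, 8, 5, 3]
correct = [1, 1, 0, 1, 0, 0, 1, 0]
di = discrimination_index(scores, correct, group_size=2)
```

A DI of zero, as tabulated for several items, would mean that the top and bottom groups answered the item correctly at the same rate.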

Appendix D

Table 10 List of alternative conceptions identified in the main study

	Alternative conception	Answer–reason	% of sample with AC	Mean CFW	Std dev.	AC Type
Properties of acids/bases
AC1	Acids are corrosive (concentration of acid is not considered).	Q1A(d)	15.22	3.36	1.28	SpAC
AC2	Acids are more dangerous than bases because acids are more reactive than bases.	Q3A(a)	35.87	3.30	0.88	SpAC
AC3	Rain water in an unpolluted area is neutral.	Q2D(a)	19.57	4.00	0.77	SAC
AC3	Rain water in an unpolluted area is neutral.	Q2D(c)	32.61	4.23	1.25	SAC
AC4	A compound containing H will produce H⁺, and a compound containing OH will produce OH⁻.	Q5A(a)	11.96	3.82	1.47	MAC
AC4		Q5B(b)	16.30	3.53	1.06	MAC
AC5	The higher the basicity of an acid, the more basic it is.	Q25A(d)	10.87	3.40	0.97	SpAC
AC6	An acid conducts electricity in solution because of free electrons.	Q8C(b)	11.96	3.64	1.29	MAC

Strengths of acids/bases
AC7	Pure ethanoic acid is a stronger acid than aqueous ethanoic acid.	Q4B(c)	17.39	3.88	1.02	MAC
AC8	An acid that produces a higher H⁺ concentration is the stronger acid (initial acid concentration not considered).	Q6C(b)	15.22	4.64	1.15	SAC
		Q6D(b)	10.37	3.70	1.16	MAC
		Q11C(c)	11.96	4.18	0.87	SAC
AC9	An acid with a higher initial concentration is the stronger acid.	Q6C(c)	10.87	4.10	1.37	SAC
AC10	A dibasic acid is a stronger acid than a monobasic acid.	Q6D(a)	22.83	4.71	0.96	SAC
AC11	The higher the initial base concentration, the stronger the base.	Q7D(c)	10.87	3.10	1.20	SpAC
AC12	The higher the initial number of moles of the base, the stronger the base.	Q7D(d)	14.29	4.38	1.33	SAC
AC13	For two weak acids, the one that produces more H⁺ is the stronger acid (without consideration of the fraction dissociated).	Q12A(d)	15.22	3.86	0.95	MAC
		Q12D(c)	14.13	3.69	1.55	MAC
		Q12D(d)	22.83	4.00	0.71	SAC
AC14	A dibasic weak acid produces twice the amount of H⁺ as compared to a monobasic weak acid (without consideration of relative strengths).	Q13B(b)	11.96	3.73	1.79	MAC

Indicators
AC15	At the end point, the indicator changes colour as it has no H⁺ to react with.	Q14C(a)	43.48	3.98	0.95	MAC

pH
AC16	pH ranges only from 1 to 14.	Q15B(d)	65.22	4.03	1.07	SAC
AC17	pH of pure distilled water increases slowly with time as it is exposed to air due to self-ionization.	Q16D(b)	25.00	3.39	1.12	SpAC
AC18	Colas have a pH of about 5.	Q17B(b)	55.43	4.14	1.00	SAC
AC19	pH of an aqueous acid depends on the concentration of the acid only.	Q25B(a)	17.39	3.69	1.01	MAC
AC20	The pH of an aqueous acid depends on the strength of the acid only.	Q25D(c)	14.13	4.31	1.32	SAC
AC21	pH is a measure of the total number of moles of H⁺.	Q25D(b)	28.26	3.54	1.03	MAC
AC22	pH of an aqueous acid is independent of its concentration.	Q25D(c)	14.13	4.31	1.32	SAC

Sub-microscopic views of acids/bases
AC23	Equimolar amounts of aqueous and pure ethanoic acid produce the same number of moles of H⁺.	Q4A(e)	16.30	4.00	0.65	SAC
AC24	Pure ethanoic acid (in the liquid state) produces more H⁺ than aqueous ethanoic acid.	Q4B(d)	19.57	4.06	0.80	SAC
AC25	Pure ethanoic acid (in the liquid state) produces fewer H⁺ than aqueous ethanoic acid.	Q4C(b)	15.22	3.36	1.01	SpAC
AC26	An alkaline solution contains no H⁺.	Q18A(a)	13.04	4.08	1.00	SAC
AC27	In an aqueous solution of NaOH, Na⁺ is paired up with OH⁻.	Q18C(a)	20.65	3.79	0.71	MAC
AC27	In an aqueous solution of NaOH, Na⁺ is paired up with OH⁻.	Q18C(b)	10.87	3.00	0.94	SpAC
AC28	There are more NH₄⁺ than NH₃ molecules in aqueous ammonia because NH₃ reacts with water to form NH₄⁺, which is stabilized by the water molecules, causing the equilibrium to shift to form even more NH₄⁺.	Q19C(b)	15.22	2.86	1.35	SpAC

Neutralisation
AC29	Upon complete neutralisation between an acid and a base, there is neither H⁺ nor OH⁻ left.	Q21C(a)	41.30	4.00	1.04	SAC
		Q21D(a)	14.13	4.00	0.91	MAC
		Q22C(a)	11.96	3.64	0.67	MAC
		Q23A(a)	11.96	3.73	0.65	MAC
		Q23B(a)	39.13	4.19	0.89	SAC
		Q23B(d)	18.48	4.41	1.00	SAC
AC30	Upon complete neutralisation between an acid and a base, the cations and anions of the salt formed are paired up.	Q23B(a)	39.13	4.19	0.89	SAC
AC30		Q23B(d)	18.48	4.41	1.00	SAC

Acknowledgements

Our thanks go to the schools and students that participated in this study. We also thank the reviewers for their constructive comments on an earlier version of this manuscript. RS thanks the Office of Education Research for the award of a research grant (OER40/08 RS). (The findings reported in this paper are based on the authors' interpretations of the results and are not to be construed as necessarily reflecting the views of any of the national agencies mentioned.)

References

Artdej R., Ratanaroutai T., Coll R. K. and Thongpanchang T., (2010), Thai Grade 11 students' alternative conceptions for acid–base chemistry, Res. Sci. & Tech. Educ., 28(2), 167–183.
Banerjee A. C., (1991), Misconceptions of students and teachers in chemical equilibrium, Int. J. Sci. Educ., 13(4), 487–494.
Caleon I. and Subramaniam R., (2010), Do students know what they know and what they don't know? Using a four-tier diagnostic test to assess the nature of students' alternative conceptions, Res. Sci. Educ., 40(3), 313–337.
Cetin-Dindar A. and Geban O., (2011), Development of a three-tier test to assess high school students' understanding of acids and bases, Procedia Soc. Behav. Sci., 15, 600–604.
Chiu M. H., (2004), An investigation of exploring mental models and causes of secondary school students' misconceptions in acids-bases, particle theory, and chemical equilibrium. Annual Report to the National Science Council in Taiwan (in Chinese) Taiwan: National Science Council. Quoted in Chiu, M. H., (2007), A national survey of students' conceptions of chemistry in Taiwan, Int. J. Sci. Educ., 29(4), 421–452.
Chiu M. H., (2007), A national survey of students' conceptions of Chemistry in Taiwan, Int. J. Sci. Educ., 29(4), 421–452.
Cros D., Maurin M., Amouroux R., Chastrette M., Leber J. and Fayol M., (1986), Conceptions of first-year university students of the constituents of matter and the notions of acids and bases, Eur. J. Sci. Educ., 8, 305–313.
Demircioğlu G., Ayas A. and Demircioğlu H., (2005), Conceptual change achieved through a new teaching program on acids and bases, Chem. Educ. Res. Pract., 6(1), 36–51.
Drechsler M. and Schmidt H. J., (2005), Textbooks' and teachers' understanding of acid–base models used in chemistry teaching, Chem. Educ. Res. Pract., 6, 19–35.
Gabel D. L. and Bunce D. M., (1994), Research on problem solving: chemistry, in Gabel D. L. (ed.) Handbook of Research on Science Teaching and Learning, New York: Macmillan, pp. 301–326.
Gabel D. L., Samuel K. V. and Hunn D., (1987), Understanding the particulate nature of matter, J. Chem. Educ., 64(8), 695–697.
Glassman S., (1967), High school students' ideas with respect to certain concepts related to chemical formulas and equations, Sci. Educ., 51, 84–103.
Griffard P. B. and Wandersee J. H., (2001), The two-tier instrument on photosynthesis: what does it diagnose? Int. J. Sci. Educ., 23(10), 1039–1052.
Hand B. M. and Treagust D. F., (1988), Application of a conceptual conflict strategy to enhance student learning of acids and bases, Res. Sci. Educ., 18, 53–63.
Heitjan D. F. and Basu S., (1996), Distinguishing “missing at random” and “missing completely at random”, Am. Stat., 50(3), 207–213.
Herron J. D., (1975), Piaget for chemists. Explaining what ‘good’ students cannot understand, J. Chem. Educ., 52(3), 146–150.
Ho S. C., (2009), Characteristics of East Asian learners: What we learned from PISA, Educ. Res. J., 24(2), 327–348.
Howell D. C., (2007), The analysis of missing data, in Outhwaite W. and Turner S. (ed.), Handbook of Social Science Methodology, London: Sage.
Hsu L. R., (2004), A study of conceptual formation on classification of chemical property, Annual Report to the National Science Council in Taiwan (in Chinese), Taiwan: National Science Council, Quoted in Chiu, M. H. (2007), A national survey of students' conceptions of chemistry in Taiwan, Int. J. Sci. Educ., 29(4), 421–452.
Huang W. C., (2004), The types and causes of misconceptions of elementary students on acids–bases. Annual Report to the National Science Council in Taiwan (in Chinese) Taiwan: National Science Council. Quoted in Chiu M. H., (2007), A national survey of students' conceptions of chemistry in Taiwan, Int. J. Sci. Educ., 29(4), 421–452.
Juang C. S., (2004), The misconceptions of secondary school students on material science and organic compounds, Annual Report to the National Science Council in Taiwan (in Chinese) Taiwan: National Science Council. Quoted in Chiu M. H., (2007), A national survey of students' conceptions of chemistry in Taiwan, Int. J. Sci. Educ., 29(4), 421–452.
Kavanaugh R. H. and Moomaw W. R., (1981), Inducing formal thought in introductory chemistry students, J. Chem. Educ., 58, 263–265.
Kozma R. and Russell J., (1997), Multimedia and understanding: expert and novice responses to different representations of chemical phenomena, J. Res. Sci. Teach., 34, 949–968.
Little R. J. A. and Rubin D. B., (1987), Statistical Analysis with Missing Data, New York: Wiley.
Lundeberg M. A., Fox P. W., Brown A. C. and Elbedour S., (2000), Cultural influences on confidence: country and gender, J. Educ. Psychol., 92(1), 152–159.
McClary L. M. and Bretz S. L., (2012), Development and assessment of a diagnostic tool to identify organic chemistry students' alternative conceptions related to acid strength, Int. J. Sci. Educ., 34(15), 2317–2341.
Nakhleh M. B., (1992), Why some students don't learn chemistry: chemistry misconceptions, J. Chem. Educ., 69(3), 191–196.
Nakhleh M. B. and Krajcik J. S., (1994), Influence of levels of understanding as presented by different technologies on students' understanding of acid, base and pH concepts. J. Res. Sci. Teach., 31, 1077–1096.
Nyachwaya J. M., Mohamed A. R., Roehrig G. H., Wood N. B., Kern A. L. and Schneider J. L., (2011), The development of an open-ended drawing tool: an alternative diagnostic tool for assessing students' understanding of the particulate nature of matter, Chem. Educ. Res. Pract., 12, 121–132.
Othman J., Treagust D. F. and Chandrasegaran A. L., (2008), An investigation into the relationship between students' conceptions of the particulate nature of matter and their understanding of chemical bonding, Int. J. Sci. Educ., 30(11), 1531–1550.
Özmen H., Demircioğlu G. and Coll R. K., (2009), A comparative study of the effects of a concept mapping enhanced laboratory experience on Turkish high school students' understanding of acid–base chemistry, Int. J. Sci. Math. Educ., 7, 1–24.
Özmen H., Demircioǧlu G., Burhan Y., Naseriazar A., Demircioǧlu H. and Test A. B. A., (2012), Using laboratory activities enhanced with concept cartoons to support progression in students' understanding of acid-base concepts, Asia-Pacific Forum on Science Learning and Teaching, 13(1), 1–29.
Popham W. J. and Husek T. R., (1969), Implications of criterion-referenced measurement, J. Educ. Meas., 6(1), 1–9.
Potgieter M. and Davidowitz B., (2011), Preparedness for tertiary chemistry: multiple applications of the Chemistry Competence Test for diagnostic and prediction purposes, Chem. Educ. Res. Pract., 12, 193–204.
Rosenthal D. P. and Sanger M. J., (2012), Student misinterpretations and misconceptions based on their explanations of two computer animations of varying complexity depicting the same oxidation–reduction reaction, Chem. Educ. Res. Pract., 13, 471–483.
Ross B. H., (1989), High-school students' concepts of acids and bases. Unpublished Master's thesis, Queen's University, Kingston, Ontario.
Rubin D. B., (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley & Sons.
Saglam Y., Karaaslan E. H. and Ayas A., (2011), The Impact of Contextual Factors On The Use Of Students' Conceptions, Int. J. Sci. Math. Educ., 9(6), 1391–1413.
Schmidt H. J., (1991), A label as a hidden persuader: chemists' neutralization concept, Int. J. Sci. Educ., 13, 459–471.
Schmitt N., (1996), Uses and Abuses of Coefficient Alpha, Psychological Assessment, 8(4), 350–353.
Sheppard K., (2006), High school students' understanding of titrations and related acid–base phenomena, Chem. Educ. Res. Pract., 7(1), 32–45.
Sreenivasulu B. and Subramaniam R., (2013), University students' understanding of chemical thermodynamics, Int. J. Sci. Educ., 35(4), 601–635.
Stankov L. and Crawford J. D., (1997), Self-confidence and performance on test of cognitive abilities, Intelligence, 25(2), 93–109.
Tan K. C. D., Goh N. K., Chia L. S. and Treagust D. F., (2002), Development and application of a two-tier multiple choice diagnostic instrument to assess high school students' understanding of inorganic chemistry qualitative analysis, J. Res. Sci. Teach., 39(4), 283–301.
Tien T. L., Teichert A. M. and Rickey D., (2007), Effectiveness of a MORE laboratory module in prompting students to revise their molecular-level ideas about solutions, J. Chem. Educ., 84, 175–181.
Treagust D. F., (1988), Development and use of diagnostic tests to evaluate students' misconceptions in science, Int. J. Sci. Educ., 10, 159–169.
Tsai C. C. and Chou C., (2002), Diagnosing students' alternative conceptions in science, J. Comput. Assist. Lear., 18, 157–165.
Zoller U., (1990), Student's misunderstandings and misconceptions in college freshmen chemistry (general and organic), J. Res. Sci. Teach., 27, 1053–1065.
