Considerations of sample size in chemistry education research: numbers do count but context matters more!

Gwendolyn Lawrie
School of Chemistry & Molecular Biosciences, The University of Queensland, St. Lucia, Qld 4072, Australia. E-mail: g.lawrie@uq.edu.au

Received 1st September 2021, Accepted 1st September 2021

Abstract

A question which often arises for chemistry education researchers, and is also frequently raised by reviewers of Chemistry Education Research and Practice (CERP) articles, is whether a research data sample size (N) is 'big enough'. However, the answer to this question is more complicated than a simple ‘yes’ or ‘no’! In fact, there is substantial discussion of this issue within the research literature, which can make it even harder for a researcher to decide.


Research that is reported in CERP articles tends to be situated in one or more of four contexts: primary school classrooms, high school classrooms, tertiary learning environments, and public engagement. Participants recruited across these contexts include primary, secondary or tertiary students, teachers (pre-service and in-service), teaching assistants and the general public. A single study may include more than one of these groups. Further, the unit of analysis in a study can extend beyond human participants to include other units of observation, such as de-identified artefacts of learning or teaching activities.

The typical process of research data collection involves ‘sampling’: the selection of a number of participants, or artefacts, to be measured that can be considered to represent a larger population or collection in the context of the study. There are no universal criteria to guide researchers on what quantity of each unit of analysis or unit of observation is ‘enough’; the ideal sample size depends on the research context and study aims. In this editorial, several considerations are presented to assist readers and future authors, informed by published articles in the field of CER along with highly regarded perspectives drawn from education research beyond our field. The purpose is to encourage authors to clearly communicate the processes they have applied to sample participants (or artefacts of teaching and learning) in their studies, while also acknowledging the potential limitations or biases that arise. This article is by no means a comprehensive overview of the topic, and readers are encouraged to consult the cited works to gain a deeper perspective.

Approaches to sampling based on the study context or research question

While a researcher can seek out guidance from a wide variety of sources to inform their research methods, exemplars from their own field often provide the best insight. The first consideration regarding sampling is to provide a clear rationale for the approach taken to data collection. Fig. 1 provides a summary of common sampling strategies that influence sample composition. Probability sampling is where each unit of analysis has a known, non-zero probability of being selected for inclusion (an equal probability, in the case of simple random sampling). A representative sample will include an appropriate number of units of analysis that reflect the properties, characteristics or traits of the whole population. Non-probability sampling is where the probability of each unit of analysis being selected is unknown, so there is greater potential for selection bias to be introduced.
Fig. 1 Overview of different sampling strategies applying probability and non-probability sampling approaches (boxes and oval frames indicate the recruitment of study participants).
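To make these categories concrete, the following minimal sketch (illustrative only; the roster data and stream labels are invented) contrasts simple random, stratified random and convenience sampling in Python using pandas:

```python
import pandas as pd

# Hypothetical population: 200 students across three course streams.
roster = pd.DataFrame({
    "student_id": range(200),
    "stream": ["majors"] * 80 + ["non_majors"] * 90 + ["bridging"] * 30,
})

# Probability sampling -- simple random: every student has the same
# known chance of selection.
simple_random = roster.sample(n=40, random_state=1)

# Probability sampling -- stratified random: 20% drawn from each
# stream, so the sample mirrors the population's composition.
stratified = roster.groupby("stream", group_keys=False).sample(
    frac=0.20, random_state=1
)

# Non-probability sampling -- convenience: e.g. the first 40 students
# to volunteer; selection probabilities are unknown, so self-selection
# bias may be introduced.
convenience = roster.head(40)

print(len(simple_random), len(stratified), len(convenience))  # 40 40 40
```

Note that the stratified sample deliberately preserves the proportion of each stream, whereas the convenience sample reflects only who happened to be available.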

Qualitative, quantitative and mixed method research paradigms each attract separate considerations with regard to sampling; the sample size and composition become very important in terms of the perceived quality of the analysis subsequently applied. Quantitative studies are usually assumed to require a minimum sample size to ensure that statistical analysis techniques provide valid results. A minimum sample size can be estimated using power analysis, based on the anticipated effect size, the chosen significance level and the desired statistical power (the probability of detecting a true effect). These calculations estimate the minimum number of units of analysis required by the chosen statistical method. However, researchers aiming to complete a confirmatory factor analysis followed by structural equation modelling are advised to think carefully about sample size, missing data, biases and latent variables, since the requirements are specific to the model under consideration (Wolf et al., 2013). An inherent challenge in recruiting participants to complete survey-based instruments is the potential introduction of non-response bias arising from low response rates (the number of usable data sets in proportion to the number of participants approached). We recommend that researchers report response rates and comment on sample quality in their articles.
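As an illustration of such a calculation, the minimal sketch below performs an a priori power analysis for a two-group comparison, assuming the Python statsmodels library; the effect size and recruitment figures are invented for illustration:

```python
from statsmodels.stats.power import TTestIndPower

# A priori power analysis for an independent-samples t-test.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,        # anticipated Cohen's d (a 'medium' effect)
    alpha=0.05,             # significance level
    power=0.80,             # probability of detecting a true effect
    alternative="two-sided",
)
print(f"Minimum participants per group: {n_per_group:.0f}")  # ~64

# Response rate as defined above: usable data sets relative to the
# number of participants approached (figures invented).
invited, usable = 250, 142
print(f"Response rate: {usable / invited:.0%}")
```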

Qualitative research methods are highly contextualised, so it is not feasible to ‘calculate’ an ideal minimum sample size. Rather, the theoretical basis of the research paradigm becomes important in guiding data collection. For example, Guba (1981) explains that trustworthiness is increased in rationalistic treatments of data when probability sampling approaches are adopted, whereas in naturalistic treatments purposive sampling is more appropriate. Herrington and Daubenmire (2014) have provided chemistry education researchers with examples of sampling approaches for different types of qualitative research studies in our field.

Onwuegbuzie and Collins (2007) provide a useful synthesis of the literature, recommending ranges of sample sizes that may apply under different research methodology approaches. They also remind readers that the false dichotomy of qualitative and quantitative approaches does not delineate sampling approaches (Onwuegbuzie and Leech, 2007). In mixed methods research, it is rare to find a combination of qualitative and quantitative data that both involve random sampling; it is much more common to find a combination of non-probability sampled quantitative and qualitative data. There are many examples where participants are sampled by applying one strategy for a quantitative phase of a study followed by a different strategy for a qualitative phase. There is no formula for calculating the minimum number of interviews required to represent the population that completed a quantitative instrument; this number depends on the nature of the research question or study aim.
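A minimal sketch of this two-phase pattern (invented scores; the tercile banding and two-per-band quota are arbitrary illustrative choices) might look like this:

```python
import numpy as np
import pandas as pd

# Quantitative phase: 60 hypothetical survey respondents.
rng = np.random.default_rng(7)
survey = pd.DataFrame({
    "participant": [f"P{i:02d}" for i in range(60)],
    "score": rng.normal(loc=60, scale=15, size=60).round(1),
})

# Qualitative phase: purposive, maximum-variation selection. Band the
# scores into terciles and invite two interviewees per band, so the
# interviews span the full range of instrument results rather than a
# random slice of them.
survey["band"] = pd.qcut(survey["score"], q=3, labels=["low", "mid", "high"])
interviewees = (
    survey.groupby("band", group_keys=False, observed=True)
          .sample(n=2, random_state=7)
)
print(interviewees)
```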

The discussion of sample size is often linked to the quality of a study and the potential generalizability of its findings, often referred to as external validity. There are three models of generalizability: statistical generalization, analytical generalization and case-to-case transfer (transferability) (Polit and Beck, 2010). In quantitative research, external validity is considered important where the findings of a study are regarded as statistically generalizable and the sampled data are representative of the whole population. However, this is an ideal that is not necessarily achievable in practice when the entire population is difficult to define or access. In qualitative research, the analytic generalization model aims to support theories or conceptualisations through rigorous inductive analysis approaches and confirmatory strategies (Polit and Beck, 2010). Replication of findings can also contribute to supporting generalizability; however, for brevity, an in-depth discussion of replication is beyond the scope of this editorial.

Data saturation informs the sample size

When considering qualitative research paradigms, there is consensus that a sample size is sufficient when data saturation, and theoretical saturation, have been achieved (Lincoln and Guba, 1985). Data saturation is reached when no new information or themes are observed in data from additional units of analysis, particularly in case studies or interviews (Taber, 2000). There is a question of whether a single 60-minute interview (or focus group) can be regarded as sufficient to achieve data saturation, but it is also recognised that a single case can provide deep, nuanced and novel insights into previously unexplored phenomena, framing future research (Lincoln and Guba, 1985). Indeed, a single case may also be measured across multiple timepoints, generating data from multiple interviews. In a chemistry education research setting, it is highly unlikely that the findings of a single case will translate naturally into other contexts without evidence of transferability, and hence they may not be regarded as generalizable (Taber, 2000).
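One way to make a saturation judgement auditable is to log the new codes contributed by each successive interview. The sketch below (with invented codes) illustrates the bookkeeping:

```python
# Codes identified in each successive interview (hypothetical data).
codes_per_interview = [
    {"misconception", "representation", "affect"},   # interview 1
    {"representation", "language"},                  # interview 2
    {"affect", "language", "prior_knowledge"},       # interview 3
    {"misconception", "language"},                   # interview 4
    {"representation", "affect"},                    # interview 5
]

seen = set()
for i, codes in enumerate(codes_per_interview, start=1):
    new = codes - seen
    seen |= codes
    print(f"Interview {i}: {len(new)} new code(s) {sorted(new)}")
# A run of interviews contributing no new codes signals saturation,
# but the judgement that saturation has been reached remains the
# researcher's, not the script's.
```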

Communicating sampling approaches in the methods section of a CERP article

The process of sampling should be clearly articulated in a methods section, along with acknowledgement of the potential limitations or biases that may arise from this process in the specific context. The recruitment of participants needs to be described in depth, along with any incentivisation of participation (gift cards, bonus marks, etc.) that might influence the composition of the sample. The number of participants who were invited and the number who consented to participate should both be included so that readers can consider the response rate or participation rate of the study. For qualitative research studies, a richer description of the participants is required to make the details of purposeful sampling in the specific context explicit.

A variety of sampling approaches are evident in the chemistry education research and evaluative studies published in our journal (Table 1 provides several recent example articles). As mentioned earlier, in our research field the units of analysis or observation, in terms of evidence of learning and teaching, may involve individuals or groups of students, teachers and institutions, singly or in combination.

Table 1 Exemplars of the application of a range of sampling approaches in recent CERP articles

Sampling approach | Context | Unit of analysis | Target population size (sample size) | Study
Random | Semester 2 general chemistry courses | Tertiary students | 1584 (1086) | Farheen and Lewis (2021)
Stratified random | Tertiary institutions | In-service teachers | 6388 (829) | Raker et al. (2021)
Quota | Multiple-level tertiary chemistry courses | Tertiary students | 23 (9) | Hosbein and Barbera (2020)
Convenience | High school classrooms | Secondary students | 78 (78) | Kadioglu-Akbulut and Uzuntiryaki-Kondakci (2021)
Purposeful and snowball | Chemistry outreach activities | Graduate students | Case 1: 5; Case 2: 4 | Santos-Díaz and Towns (2021)
Voluntary | Professional development | Doctoral students | Incoming enrolments (4) | Busby and Harshman (2021)


It is important to acknowledge that quantitative experimental research studies which aim to evaluate the effectiveness of a teaching or learning intervention in classroom settings face multiple sampling challenges. There are often insufficient participants to enable randomised sampling approaches or statistical analysis to compare treatment and control groups. Taber (2019) provides a detailed overview and advice for sampling, generalizability and replication of findings for these types of studies.
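For example, where intact classes yield small, non-randomised treatment and comparison groups, researchers sometimes fall back on nonparametric tests that make fewer distributional assumptions. A minimal sketch, assuming SciPy and invented scores:

```python
from scipy.stats import mannwhitneyu

# Post-test scores from two intact classes (hypothetical data).
treatment = [72, 68, 80, 75, 77, 83, 69, 74]   # intervention class
control   = [65, 70, 62, 71, 66, 68, 73, 64]   # comparison class

stat, p = mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
# A small sample limits statistical power, so a non-significant result
# is weak evidence of 'no effect'; effect sizes and limitations should
# be reported alongside the test.
```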

Final words of encouragement

Despite the above considerations of what might constitute a ‘big enough’ sample size, we would like to reassure authors that studies which do not meet these ideal criteria are not automatically excluded from publication in CERP. Our journal aims to disseminate well-constructed, evidence-informed practice, and we look for data collected using instruments and questions that are rationally designed, or adapted, and fit for purpose. Our chemistry education research community recognises that it is often difficult to recruit large, representative samples in certain contexts. In the real world, unpredictable circumstances often affect access to, or retention of, participants in a sample, so authors are encouraged to explicitly acknowledge any potential limitations in the relevant section of their manuscript. One example of clearly communicating the potential impact of a drop in participant numbers between multiple data collections from a single sample can be found in the Limitations section of a quantitative study of latent traits by Ferrell and Barbera (2015). That study also illustrates how authors can support the generalizability of their findings by cross-validating their instrument with samples collected from different institutions.

CERP readers will often seek out research articles that provide insight into sampling methods and outcomes, regardless of sample size, to inform their own research methods and context. Indeed, data curated across multiple cases or contexts, sourced from separate published studies, can be compared and synthesised in a meta-analysis to build a consensus picture that may become generalizable through the weight of combined evidence. In summary, our advice to authors is to invest in a detailed description of their data collection processes in their own context, including the sampling procedures used and the sample composition. They should also reflect on their sample size and study context to acknowledge the limitations or sample biases of their study before making recommendations for future research or implementation in teaching practice. We suggest that conservative claims, informed by the study's findings, will be more highly regarded by readers than unevidenced claims and generalizations that are not supported by the data collected.
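As a simple illustration of how such a synthesis pools evidence, the sketch below (with invented effect sizes) applies inverse-variance weighting, the core of a fixed-effect meta-analysis:

```python
import numpy as np

# Standardised effect sizes (e.g. Cohen's d) and their sampling
# variances from three hypothetical published studies.
d = np.array([0.42, 0.31, 0.55])
var = np.array([0.04, 0.02, 0.06])

w = 1.0 / var                           # weight more precise studies more
d_pooled = np.sum(w * d) / np.sum(w)    # pooled effect estimate
se_pooled = np.sqrt(1.0 / np.sum(w))    # standard error of the pooled effect
print(f"Pooled d = {d_pooled:.2f} (95% CI +/- {1.96 * se_pooled:.2f})")
```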

References

  1. Busby B. D. and Harshman J., (2021), Program elements’ impact on chemistry doctoral students’ professional development: a longitudinal study, Chem. Educ. Res. Pract., 22(2), 347–363.
  2. Farheen A. and Lewis S. E., (2021), The impact of representations of chemical bonding on students’ predictions of chemical properties, Chem. Educ. Res. Pract., 22, DOI: 10.1039/D1RP00070E.
  3. Ferrell B. and Barbera J., (2015), Analysis of students' self-efficacy, interest, and effort beliefs in general chemistry, Chem. Educ. Res. Pract., 16(2), 318–337.
  4. Guba E. G., (1981), Criteria for assessing the trustworthiness of naturalistic inquiries, Educ. Commun. Technol.: J. Theory, Res. Dev., 29(2), 75–91.
  5. Herrington D. G. and Daubenmire P. L., (2014), Using interviews in CER projects: Options, considerations, and limitations, Tools of chemistry education research, American Chemical Society, pp. 31–59.
  6. Hosbein K. N. and Barbera J., (2020), Alignment of theoretically grounded constructs for the measurement of science and chemistry identity, Chem. Educ. Res. Pract., 21(1), 371–386.
  7. Kadioglu-Akbulut C. and Uzuntiryaki-Kondakci E., (2021), Implementation of self-regulatory instruction to promote students’ achievement and learning strategies in the high school chemistry classroom, Chem. Educ. Res. Pract., 22(1), 12–29.
  8. Lincoln Y. S. and Guba E. G., (1985), Naturalistic inquiry, Sage.
  9. Onwuegbuzie A. J. and Collins K. M., (2007), A typology of mixed methods sampling designs in social science research, Qual. Rep., 12(2), 281–316.
  10. Onwuegbuzie A. J. and Leech N. L., (2007), A call for qualitative power analyses, Qual. Quant., 41(1), 105–121.
  11. Polit D. F. and Beck C. T., (2010), Generalization in quantitative and qualitative research: Myths and strategies, Int. J. Nursing Stud., 47(11), 1451–1458.
  12. Raker J. R., Dood A. J., Srinivasan S. and Murphy K. L., (2021), Pedagogies of engagement use in postsecondary chemistry education in the United States: results from a national survey, Chem. Educ. Res. Pract., 22(1), 30–42.
  13. Santos-Díaz S. and Towns M. H., (2021), An all-female graduate student organization participating in chemistry outreach: a case study characterizing leadership in the community of practice, Chem. Educ. Res. Pract., 22(2), 532–553.
  14. Taber K. S., (2000), Case studies and generalizability: Grounded theory and research in science education, Int. J. Sci. Educ., 22(5), 469–487.
  15. Taber K. S., (2019), Experimental research into teaching innovations: responding to methodological and ethical challenges, Stud. Sci. Educ., 55(1), 69–119.
  16. Wolf E. J., Harrington K. M., Clark S. L. and Miller M. W., (2013), Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety, Educ. Psychol. Meas., 73(6), 913–934.
