Enhancing formative and self-assessment with video playback to improve critique skills in a titration laboratory

Poh Nguk Lau

doi:10.1039/C9RP00056A

DOI: 10.1039/C9RP00056A (Paper) Chem. Educ. Res. Pract., 2020, 21, 178-188

Poh Nguk Lau
School of Applied Science, Temasek Polytechnic, Singapore. E-mail: pohnguk@tp.edu.sg

Received 25th February 2019, Accepted 13th August 2019

First published on 14th August 2019

Abstract

The rhetorical argument that laboratory courses are crucial for training skilled STEM practitioners is ill-evidenced in teaching practice. The arduous task of implementing instructor-led skill assessment in large-cohort courses and persistent student disengagement from its educative goals are some obstacles. This study emphasizes the need to equip learners to self-assess technical skills, supported by explicit performance standards and objective evidence. It trials two interventions, a self-assessment (SA) checklist and a learner-recorded video, to examine how the combination impacts appraisal ability and attitudes towards SA. The participants were first-year chemistry students from a biotechnology course and a chemical engineering course. All the participants self-assessed titration competencies against a checklist, with about half assisted by a video replay. A video critique task showed a significant main effect by intervention. SA-with-video participants scored higher than SA-only participants and the control group, although the additional video intervention did not produce any significant gains above SA alone. Qualitative analysis revealed that SA-with-video participants were more targeted in their critique responses. Differences in attitudinal responses towards SA between the video and non-video groups were not prominent. Selected SA items related to perceptions of the value of SA in skill improvement, and, as a future study strategy, goals and commitment of using SA for skill improvement, were associated with video exposure in the biotechnology course, or with the course in the video group. Improvements for future work are discussed.

Introduction

Though most STEM educators would agree that the laboratory curriculum is an integral part of science coursework, its efficacy in achieving desired student outcomes remains uncertain (Hofstein and Lunetta, 1982, 2004). At the same time, economic and industrial reforms are continuing to exert pressure on higher education (HE) institutes to produce graduates who are both scientifically literate and skilled practitioners (Parry et al., 2012; Gibbins and Perkins, 2013).

Given the decrease in government funding and high investment costs in offering laboratory courses, there has been a constant re-evaluation of whether laboratory instruction has indeed fulfilled its intrinsic value in science and engineering curricula (Gibbins and Perkins, 2013). Past studies have repeatedly shown that students’ perceptions on the ground have diverged from our visionary and lofty goals of laboratory instruction, highlighting a mismatch between faculty and student goals (Russell and Weaver, 2008; Parry et al., 2012; DeKorver and Towns, 2015; Galloway and Bretz, 2016). Students often downplay their engagement and ownership of the learning process. For example, some possess the “escapist mindset” by hastily finishing up the work as fast as possible without much thinking, while others view laboratory work as a means to obtain good grades instead of skill (Parry et al., 2012; DeKorver and Towns, 2015; Galloway and Bretz, 2016). These are pressing issues to address in order to maximize the value of laboratory curriculum, and for laboratory instructors to re-focus on the assessment of laboratory outcomes and to monitor the process of skill mastery. A widely implemented assessment practice in the chemistry laboratory is to evaluate the extent to which students “act with criteria” (Prades and Espinar, 2010, p. 453) by integrating cognitive skills (decision making, relating theory to experiment) and hands-on skills. Another common feature of laboratory assessment is that the process is not once-off, but a continuous interaction between instructors and students in-class to evaluate competency levels (Prades and Espinar, 2010).

In the institute where the author teaches, a freshman course in inorganic and physical chemistry has a total enrolment size of about 500 students. Thus, instructor-led, one-on-one skill assessment on-site is often time-consuming and practically hard to implement and sustain for such an enrolment size. The question then is whether one could turn the situation around by equipping learners with skills to discern their own competency levels objectively. For this learning experience to be meaningful, learners need to know where they are now in their competencies, what to aim for and how to get to the desired standards (Andrade and Heritage, 2017). As the old adage goes, “practice makes perfect”, but we need to define what makes perfection in technical skills, and allow multiple opportunities for learners to improve and move towards perfection. Along these lines, two well-established cornerstones of teaching practice, formative assessment and self-assessment (SA), are particularly relevant for laboratory instruction. Andrade and Heritage (2017) defined formative learning as a process where learners reflect on their competencies against performance norms, and iteratively make refinements to close their learning gaps. If provided with objective evidence, specific learning goals of the domain content, or feedback from peers, students could engage in self-examination, enabling them to self-judge their own work quality (Boud, 2003). Boud argued that SA is a valuable life-long skill to develop in graduates from higher education, moulding several soft skills such as a mindset of continuous learning, positive self-concepts and increased autonomy in the ownership of learning, to name a few. In the context of laboratory assessment, the performance criteria should thus be clearly communicated to students and instructors to calibrate standards (Prades and Espinar, 2010).

A scan of the existing literature showed that SA in a laboratory course is supported by an open-ended reflection guide (Parry et al., 2012) or a more structured skill inventory (Zhang et al., unpublished work, 2013). Zhang et al. examined the effects of using SA in food science and biology laboratory classes. Skill checklists were provided to students as exemplars of performance standards in a cell culture and food microbiology laboratory. They then engaged in self-monitoring and reflection, and also provided peer feedback to each other. A post-intervention feedback survey showed that students viewed SA positively, with over 90% of the students surveyed agreeing that the checklist increased confidence levels in skill performance, assisted in the identification of strengths and weaknesses and enabled them to make improvements to their skills. Parry et al. (2012) implemented the Critical Incident Report (CIR) in a biochemistry and molecular biology laboratory course. Students were asked to list what they thought were the critical tasks necessary to complete the task objectives and to reflect on how well they performed these tasks. They were also asked to describe the learning that they took away from these incidents and how it would affect their future performance. A pre- and post-intervention questionnaire was used to compare perception levels in the value of reflection. The results showed that the CIR enhanced participants’ awareness of the importance of tutor feedback and of adopting alternative study strategies to improve their knowledge of the subject matter.

The problem with checklists or reflection guides is the lack of objective evidence. Post-task reflection could be hindered by an inability to recall what was actually performed during the task (Dawes, 1999). Therefore, recent work in chemical education research has begun to explore the use of video as a means of assessment to evaluate skill attainment (Towns et al., 2015; Hensiek et al., 2016; Hennah and Seery, 2017). Termed the “digital badge” approach, learners prepare a self-demonstration video on laboratory skills with a set of given instructions. The skills ranged from pipetting to titration. These videos were reviewed by peers, instructors or both. If the desired competency level is attained, the student would earn a badge as a visible recognition for skill mastery, much like how experts are recognized in professional fields. The authors reported improved student outcomes in terms of perceived confidence in technical skills and also in written tests (Towns et al., 2015; Hensiek et al., 2016; Hennah and Seery, 2017).

Veal et al. (2009) combined a self-reflection intervention with a self-demonstration video to measure the extent of students’ awareness of practical skills in a general chemistry course. The experimental group was video-recorded by instructors in class, and instructor feedback was provided after a series of student recordings. The participants were immediately prompted to think of how they went about completing the task, and the areas in which they did and did not do well. Subsequently, the participants completed a survey to elicit the utility of the videotape feedback and were interviewed. The results showed that, firstly, the experimental group fared better in the laboratory and theory examinations. Secondly, the video reflection drew participants’ attention to the quality of their laboratory skills more readily, explicitly and critically. The participants unanimously supported self-critique using a video.

Aims

The literature review highlighted the effectiveness of video recording and self-reflection in the acquisition of laboratory skills. Video recordings allow students and instructors to objectively evaluate the skills demonstrated (the “where we are now”, Andrade and Heritage, 2017, p. 5). This minimizes the issues of time constraints and memory recall errors. A checklist identifies the core performance criteria (the “where are we going”, Andrade and Heritage, 2017, p. 5). The earlier studies cited were either not set in a chemistry course, or used a video intervention alone, an SA checklist alone, or open-ended self-reflection guides with no performance standards spelled out explicitly. Therefore, extrapolating from the literature to plug the gap, the research questions of the present study are to examine (1) if participants who used a well-defined skill checklist for SA would be more skilled in appraising titration skill quality in others; (2) whether scaffolding SA with a self-video would enhance participants’ critique ability and (3) whether the combined use of SA and self-video would influence participants’ attitudes towards SA.

Method

The institute of higher education in which the author currently works offers diploma level courses for post-secondary school leavers with typically 10 years of prior education. This research study was conducted as part of institutional requirements to fulfill a milestone professional development course, with the study endorsed by management. The Institutional Review Board (IRB) guidelines pertaining to educational research projects were observed. Management approval was given to implement the study.

The inorganic and physical chemistry course spanned a duration of 17 weeks from April to August 2017, with three weeks of mid-term recess. The laboratory classes were timed on alternate odd weeks of the semester. The emphasis of the tasks was on the use of a pipette and the titration technique. Fig. 1 summarizes the lesson plan in the whole project cycle. In total, the participants completed the SA form over three laboratory sessions, and self-recordings were collected over four sessions. Re-rating the SA checklist while reviewing the video took place in Labs 3 and 4.


	Fig. 1 Overview of the lesson plan over the SA and self-video phase.

The first lesson began in Week 3 (second week of May 2017). In the first session, the participants began a simple pipette task and the author demonstrated the correct techniques. The participants took turns to practice the pipetting technique, with their partner recording the process using a mobile phone. In the first session, the self-assessment checklist was not distributed as the intent was to acquaint the participants with the technique. The skill checklist was customized specifically for the chemistry tasks of titration and pipetting. It was developed by the author and colleagues in the teaching team, and also from the author's prior experience in assessing practical titration skills in a school-based assessment setting. The author also took the opportunity to advise participants on the focal points required in the video. These focal points were the critical actions corresponding to the skill checklist. The participants uploaded their video files into a Google drive folder encrypted with a password provided by the author.

In Week 5, the laboratory task involved acid–base titration to determine the unknown concentration of a basic solution. After a teacher demonstration, the first self-assessment was implemented in Week 5, together with the video recording. The participants completed the skill inventory (A1 of the Appendix) on-site after completing the hands-on work. Laboratory lessons resumed in Week 13 (mid-July 2017) after the term recess. Since a long break had occurred, the author conducted a review of good titration techniques with a briefing and a short quiz. The laboratory task involved titration, and before the video group began any hands-on work, they were requested to review the Week 5 video and to re-assess their skills on the SA checklist. The same lesson plan took place in the Week 15 class.

The participants continued with the titration work in the last practical class in Week 17, but self-video and SA were stopped. A video critique task was administered. It showed a student performing a titration experiment similar to the task performed in class. The participants were instructed to provide written comments individually, to describe the areas in which the protagonist did well and to suggest areas for improvement. The other instrument, the SA perception survey, was also distributed for completion in this session (A2 of the Appendix).

Participants

The participants were year one students in the freshman cohort of April 2017, taught by the author. The video and self-assessment intervention were implemented in two classes: one from Chemical Engineering (ChE) and the other from Biotechnology (BIO). This resulted in a total of 52 participants. For each class, participation in self-video was sought on a volunteer basis. Since laboratory work was done in pairs, both participants in the same group had to participate, for practical reasons. This was so that the participants could record the work of their own partner and vice versa, using their own mobile phones. Volunteers were credited with extra class participation points at the end of the semester. Approximately half of the participants responded to the call for self-video of laboratory work, while the other half did not. Two participants in the ChE class requested to discontinue recording midway through the study and were therefore included in the non-video cohort. All the participants completed an SA checklist at the end of wet work, except in the last session.

Two other tutorial classes taught by the author served as control. The participants in the control classes were taught by other laboratory instructors, and were not exposed to the video and SA. As part of a revision class, they were tasked to also critique the same titration video during a tutorial session. Table 1 summarizes the deployment of the intervention amongst the classes. Missing data in the survey and video critique were taken into account in the sample sizes.

Table 1 Sample sizes and administration of learning interventions and instruments

	Experimental		Control
	ChE	BIO	ChE 1	ChE 2
Self-video	10	13	—	—
SA checklist	26	27	—	—
Survey	25	27	—	—
Video critique	22	23	23	25

Instruments

The SA survey measured the extent of participants’ agreement on the usefulness of the SA or skill checklist. It is a 21-item survey to elicit perceptions on the usefulness of the SA, participants’ levels of confidence, motivation to improve work and their inclination to use SA in the future. This instrument was a pilot version designed by another project team in the institute who mounted an extensive SA project in laboratory coursework (unpublished work by Zhang et al., 2013). Three items were free response type. For the video group, two items were included: an additional scale (item 20) to elicit responses on the level of usefulness of the self-recording video in improving titration skills, followed by an open-ended item for the participants to describe how it was useful (item 21). Ratings were anchored on a Likert scale, ranging from strongly agree (coded as 1) to strongly disagree (coded as 4), with an option of not applicable (coded as 5). A total of 15 common items were tested between the groups. The only negative attribute, “Doing the self assessment is a waste of time”, was reverse coded to compute the Cronbach alpha. The Cronbach alpha values for these 15 items were above 0.909. When the responses were analysed by course and video intervention, the Cronbach alpha values were between 0.90 and 0.94. This indicates a high level of consistency in the responses when analysed under main group conditions.
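To make the reliability calculation concrete, the sketch below reverse-codes a negatively worded item on the 1 to 4 agreement scale and computes Cronbach's alpha from the item variances. The response matrix is invented for illustration and is not study data. The function names, and the choice to drop any respondent who answered "not applicable" (code 5), are assumptions, since the handling of such responses is not described here.

```python
from statistics import variance

NOT_APPLICABLE = 5  # survey code for "not applicable"


def reverse_code(score, low=1, high=4):
    """Flip a rating on the low..high agreement scale (1 <-> 4, 2 <-> 3)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumption: respondents with any 'not applicable' answer are dropped.
    """
    complete = [r for r in rows if NOT_APPLICABLE not in r]
    k = len(complete[0])
    item_vars = [variance([r[i] for r in complete]) for i in range(k)]
    total_var = variance([sum(r) for r in complete])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: three items, the last one negatively worded.
raw = [
    [1, 2, 4],
    [2, 2, 3],
    [1, 1, 4],
    [3, 3, 2],
    [2, 5, 3],  # contains "not applicable", so this respondent is dropped
    [4, 3, 1],
]
# Reverse-code the third item unless it was answered "not applicable".
coded = [r[:2] + [reverse_code(r[2])] if r[2] != NOT_APPLICABLE else r for r in raw]
alpha = cronbach_alpha(coded)
print(f"alpha = {alpha:.2f}")
```

Reverse coding before computing alpha matters because an item scored in the opposite direction would otherwise correlate negatively with the rest of the scale and depress the coefficient.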

The second data source was scores on a video critique task, administered in the last laboratory session of the course. The same video task was implemented in the two control ChE classes in the same week. The video clip was selected from the pool of participant recordings, as unscripted and imperfect videos taken by novices provide rich learning experiences (Blazeck and Zewe, 2013). Thus, it was inevitable that some critical scenes could not be captured. Participants’ written responses to the video critique exercise were graded on a 9-point rubric (see A3 in the Appendix), which captured all strengths and areas for improvements directly visible or which could be partially inferred from the video clip (Fig. 2).


	Fig. 2 Video critique task.

For actions or mistakes clearly visible from the video, one point was allocated for correct identification. Owing to the lack of visual cues, some actions in the clip could be construed as both strengths and weaknesses. Some examples were whether or not the student actor had read the liquid meniscus at eye level, had clamped the burette vertically, or had tapped the pipette tip when dispensing the solution. In such cases, benefit-of-doubt was given as partial credit (half point) if participants classified these skills either way. In some instances, participants also mentioned other non-SA checklist skills which were credible, such as using a white surface for color contrast. These responses were rewarded with a partial credit. If participants did not explicitly classify the actions as “actions done well” or “room for improvement”, semantic cues such as “did not”, “should have”, “instead of” and “should not” were used to infer the intended classification. No marks were given for overtly wrong classification, that is, when participants classified a positive action as a deficit or vice versa. No deduction was made for such instances. Other attributes not captured in the rubric were coded on an a posteriori basis during the grading process to generate coding themes. These responses were coded using NVivo version 12.
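The grading rules above can be summarised as a small scoring sketch. The category labels and the function name are hypothetical, introduced only for illustration. The half-point value for credible non-checklist skills is also an assumption, since the text describes that award only as "a partial credit".

```python
# Points per graded comment. The category labels are hypothetical names for
# the cases described in the grading rules.
POINTS = {
    "clear_correct": 1.0,     # clearly visible action, correctly classified
    "ambiguous_either": 0.5,  # action that could be read as strength or weakness
    "credible_extra": 0.5,    # credible skill outside the SA checklist (assumed half point)
    "wrong": 0.0,             # strength called a deficit or vice versa; no deduction
}


def critique_score(judgements):
    """Sum the credit for one participant's graded comments."""
    return sum(POINTS[j] for j in judgements)


example = ["clear_correct", "ambiguous_either", "wrong", "credible_extra"]
print(critique_score(example))  # 2.0
```

Because a wrong classification earns zero rather than a negative mark, a participant's score can only grow with the number of creditable comments.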

Statistical methods

The non-parametric Kruskal–Wallis test was used for group differences in the task marks due to the non-normality of the data. Pair-wise comparisons were made with adjusted significance levels reported.
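For readers unfamiliar with the test, the following is a minimal sketch of the tie-corrected Kruskal–Wallis H statistic, written in plain Python. The marks are hypothetical and are not the study data. The p-value helper relies on the large-sample chi-square approximation, which for three groups (2 degrees of freedom) reduces to exp(−H/2). Statistical packages may apply further adjustments, so this should not be read as the exact procedure used in the analysis.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction


def p_value_three_groups(h):
    """Chi-square upper tail for 2 degrees of freedom, which equals exp(-h/2)."""
    return exp(-h / 2)


# Hypothetical critique marks for three groups, not the study data.
scores = [[2.5, 3.0, 4.5, 1.5], [2.0, 1.0, 3.5, 2.0], [1.0, 0.5, 1.5, 1.0]]
h = kruskal_wallis_h(scores)
print(f"H = {h:.2f}, p = {p_value_three_groups(h):.3f}")
```

Working on ranks rather than raw marks is what makes the test insensitive to the non-normal mark distribution.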

The chi-square statistic was used to find out if there were any survey items which produced a significant association with the video intervention or course. If the assumption on the expected cell count was violated, likelihood ratio statistics were reported (Field, 2013, p. 724). All the chi-square values and the respective p-values reported are the unadjusted values, compared to a Bonferroni-adjusted critical p-value. Decisions on how and when to apply the Bonferroni correction may be quite subjective (Cabin and Mitchell, 2000). This study uses an approach similar to that of Boucek et al. (2009) to correct for multiple testing. The Bonferroni adjustment was applied after the first-level chi-square tests, set at 0.05 significance. For items which passed the first-cut critical level, gamma (γ) coefficients are checked for effect sizes against a Bonferroni-corrected p-value. This adjusted alpha level is 0.05/6 (=0.0083), because there are 6 possible pairwise comparisons between the video and course that are relevant to the research aims. They are: video–no video; ChE–BIO; ChE video–ChE no video; BIO video–BIO no video; ChE video–BIO video; ChE no video–BIO no video. Gamma (γ) coefficients are used because the data are ordinal–nominal in nature. Items which cleared both criteria are presented in this work. Using an item-level correction seeks to achieve a compromise to balance the Type 1 error rate and the power of the statistical analysis.
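As an illustration of the effect size used here, the sketch below computes the Goodman–Kruskal gamma, (C − D)/(C + D), from a contingency table by counting concordant (C) and discordant (D) pairs. The counts are hypothetical and the function name is an assumption. The significance test for γ is not shown.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table of counts with ordered rows and columns."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        # Lower row and higher column: the pair agrees in order.
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        # Lower row but lower column: the pair disagrees in order.
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows are two groups, columns run from
# "strongly agree" to "disagree".
counts = [[8, 10, 2],
          [3, 12, 5]]
print(round(goodman_kruskal_gamma(counts), 2))
```

With responses coded from 1 (strongly agree) upwards, a negative γ means the group in the second row leans towards stronger agreement than the group in the first row, and a positive γ means the reverse.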

Statistical limitations

Besides the small sample size and lack of validation studies on the SA survey, the method of applying the Bonferroni correction described in the earlier paragraph could also lead to a possibility of false positives. The alternative would be to front-load the Bonferroni correction on a scale-level using a critical p-value of 0.05/15 = 0.003 (with 15 items in the SA survey). As the subsequent section on the results unfolds, the pros and cons of front-loading the Bonferroni factor would become evident. Further discussion of this issue is presented at the end of this article.
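The two correction strategies can be compared with a short sketch. The cut-offs follow the values stated in the text (0.05/6 and 0.05/15). The function names and the example p-values are hypothetical and chosen only to show how the two rules can disagree.

```python
ALPHA = 0.05
N_COMPARISONS = 6  # pairwise video/course comparisons
N_ITEMS = 15       # common survey items

item_level_cutoff = ALPHA / N_COMPARISONS  # applied to the gamma p-value
scale_level_cutoff = ALPHA / N_ITEMS       # applied to the chi-square p-value


def passes_two_stage(chi_p, gamma_p):
    """Item-level approach: chi-square at 0.05, then gamma at 0.05/6."""
    return chi_p < ALPHA and gamma_p < item_level_cutoff


def passes_front_loaded(chi_p):
    """Alternative: chi-square judged directly against 0.05/15."""
    return chi_p < scale_level_cutoff


# Hypothetical (chi-square p, gamma p) pairs, for illustration only.
examples = {"item A": (0.002, 0.001), "item B": (0.02, 0.004), "item C": (0.04, 0.03)}
for name, (chi_p, gamma_p) in examples.items():
    print(name, passes_two_stage(chi_p, gamma_p), passes_front_loaded(chi_p))
```

An item such as the hypothetical "item B" clears the two-stage rule but not the front-loaded one, which illustrates the trade-off between statistical power and the Type 1 error rate.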

Results

SA survey

Most survey items were rated positively by participants, with at least 80% of the responses in the strongly agree and agree categories (Table 2). For the negative item 11 (“doing SA is a waste of time”), about 60% to 70% of the respondents disagreed with the statement. There were no significant differences between the video and non-video groups. About 10% of the participants adopted a neutral stance to item 4 (“I found the SA useful”) and item 11, responding with Not Applicable on these items.

Table 2 SA survey response frequencies (top: SA + video, bottom: SA only). Counts are shown in brackets

	% SA	% A	% D	% SD	% NA	% SA + A
Item 5 is a free response question. ^a Significant association with the course. See Table 3. ^b Significant association with the video groups in BIO only. See Table 4. ^c Significant association with the course in the video group only. See Table 5.
(1) Performing the self-assessment exercises has increased my confidence in the subject	26 (6)	70 (16)	4 (1)	—	—	96
	28 (8)	72 (21)	—	—	—	100

(2) I had no difficulty identifying action steps to improve work	30 (7)	48 (11)	22 (5)	—	—	78
	17 (5)	69 (20)	10 (3)	3 (1)	—	86

(3) I worked on action steps in the next session to improve work^a	44 (10)	56 (13)	—	—	—	100
	31 (9)	69 (20)	—	—	—	100

(4) I found the SA checklist useful	33 (7)	52 (11)	5 (1)	—	10 (2)	85
	32 (9)	68 (19)	—	—	—	100

(6) Doing the SA enables me to judge performance better	35 (8)	65 (15)	—	—	—	100
	31 (9)	66 (19)	3 (1)	—	—	97

(7) I am able to compare the quality of my work against standards/criteria	39 (9)	48 (11)	13 (3)	—	—	87
	24 (7)	69 (20)	7 (3)	—	—	93

(8) SA enables me to improve on my learning in areas I am not so good at	39 (9)	61 (14)	—	—	—	100
	38 (11)	59 (17)	3 (1)	—	—	97

(9) I become better aware about my learning through doing the SA	39 (9)	57 (13)	4 (1)	—	—	96
	31 (9)	66 (19)	3 (1)	—	—	97

(10) The SA helps me assess my strengths and weaknesses accurately	35 (8)	56 (13)	9 (2)	—	—	91
	28 (8)	65 (19)	7 (2)	—	—	93

(11) Doing the SA is a waste of time	9 (2)	14 (3)	32 (7)	36 (8)	9 (2)	23
	4 (1)	24 (7)	48 (14)	14 (4)	10 (3)	28

(12) I do the SA with the intention of improving my work^c	30 (7)	57 (13)	13 (3)	—	—	87
	21 (6)	79 (23)	—	—	—	100

(13) The school should continue implementing SA with subjects for me^b	35 (8)	52 (12)	13 (3)	—	—	87
	14 (4)	76 (22)	10 (3)	—	—	90

(14) I will continue using the SA in my learning	26 (6)	65 (15)	9 (2)	—	—	91
	14 (4)	82 (23)	4 (1)	—	—	96

(15) SA will better prepare me for the world of work	30 (7)	61 (14)	4 (1)	4 (1)	—	91
	24 (7)	66 (19)	10 (3)	—	—	90

(16) SA has improved my lab skills^b,c	52 (12)	44 (10)	4 (1)	—	—	96
	35 (10)	65 (19)	—	—	—	100

Item 3 (“I worked on action steps in the next session to improve my work”) produced a strong significant association with the course, χ²(1) = 8.76, p = 0.003 (Table 3). All the participants, regardless of video conditions, agreed that they took action steps to close skill deficits in the next class. The gamma coefficient (γ) of −0.74 was significant when compared to the Bonferroni-corrected critical value of 0.0083 (unadjusted p = 0.001). The large and negative effect size implied that BIO participants were more inclined to commit to improving their skills. This is seen in the proportion of strongly agree and agree responses, which were more evenly distributed in the BIO class (56% and 44%). The ChE course produced a more skewed profile (16% strongly agree, 84% agree).

Table 3 SA survey item with significant association with the course (top row: ChE; bottom row: BIO)

	% SA	% A	% D	% SD	% NA	% SA + A
χ²(1) = 8.76, unadjusted p = 0.003; γ = −0.74, unadjusted p = 0.001 (<0.0083 adjusted critical level).
(3) I worked on action steps in the next session to improve work	16 (4)	84 (21)	—	—	—	100
	56 (15)	44 (12)	—	—	—	100
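As a worked check, the counts in Table 3 can be fed into the standard closed-form expressions for a 2 × 2 table. The sketch assumes a Pearson chi-square without continuity correction, since the software options used are not stated. The function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def gamma_2x2(a, b, c, d):
    """Goodman-Kruskal gamma for a 2 x 2 table (equal to Yule's Q)."""
    return (a * d - b * c) / (a * d + b * c)


# Counts from Table 3: rows ChE, BIO; columns strongly agree, agree.
che_sa, che_a = 4, 21
bio_sa, bio_a = 15, 12

print(f"chi-square = {chi_square_2x2(che_sa, che_a, bio_sa, bio_a):.2f}")  # 8.76
print(f"gamma = {gamma_2x2(che_sa, che_a, bio_sa, bio_a):.2f}")            # -0.74
```

Both values agree with the statistics reported for item 3, which suggests the tabulated counts and the reported test results are mutually consistent under these assumptions.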

In the ChE class, no items had a significant association with the video intervention. Table 4 shows the response distribution for two items that differed significantly in the BIO class between the video groups. They were items 13 (“the school should continue to implement SA”, χ²(2) = 7.59, p = 0.02, unadjusted) and 16 (“SA has improved my lab skills”, χ²(1) = 6.31, p = 0.01, unadjusted). For item 13, the γ coefficient was 0.86 (p = 0.002). This implied that BIO video participants strongly supported the future use of SA. The profile showed that the proportion of video participants who strongly agreed to continue SA implementation was almost seven times higher (SA + video = 46%, SA-only = 7%).

Table 4 SA survey items with significant association with video conditions, BIO only (top row: video; bottom row: non-video)

	% SA	% A	% D	% SA + A
Item 13: likelihood ratio χ²(2) = 7.59, unadjusted p = 0.02; γ = 0.86, unadjusted p = 0.002 (<0.0083 adjusted critical level). Item 16: χ²(1) = 6.31, unadjusted p = 0.01; γ = 0.79, unadjusted p = 0.004 (<0.0083 adjusted critical level).
(13) The school should continue implementing SA with subjects for me	46 (6)	54 (7)	—	100
	7 (1)	79 (11)	14 (2)	86

(16) SA has improved my lab skills	77 (10)	23 (3)	—	100
	29 (4)	71 (10)	—	100

For item 16, a large and significant effect size was seen (γ = 0.79, p = 0.004). The response distribution showed that the percentage of video participants who strongly agreed that the SA had improved their hands-on skills was about three times higher (SA + video = 77%, SA-only = 29%). Similar to item 13, this observation is consistent with the magnitude and direction of the effect size.

Amongst the video participants, item 12 (“I do the SA with the intention of improving my work”, χ²(2) = 7.81, unadjusted p = 0.02) and item 16 (χ²(2) = 8.46, unadjusted p = 0.015) were significantly associated with the course (Table 5). Item 12 produced a large γ coefficient of −0.83 (p = 0.002). The γ coefficient of item 16 was −0.87 (p = 0.001). These results suggested that BIO video participants had more favourable perceptions of SA on these items compared to their video peers in the ChE course. In both items, none of the BIO video participants responded negatively. No items were associated with the course background in the non-video group.

Table 5 SA survey items with significant association with the course in video participants (top row: ChE; bottom row: BIO)

	% SA	% A	% D	% SA + A
Item 12: likelihood ratio χ²(2) = 7.81, unadjusted p = 0.02; γ = −0.83, unadjusted p = 0.002 (<0.0083 adjusted critical level). Item 16: likelihood ratio χ²(2) = 8.46, unadjusted p = 0.015; γ = −0.87, unadjusted p = 0.001 (<0.0083 adjusted critical level).
(12) I do the SA with the intention of improving my work	10 (1)	60 (6)	30 (3)	70
	46 (6)	54 (7)	—	100

(16) SA has improved my lab skills	20 (2)	70 (7)	10 (1)	90
	77 (10)	23 (3)	—	100

Video critique

SA + video participants scored, on average, higher than the SA only and control groups (M = 2.82, SD = 1.43, n = 19 for SA + video; M = 2.10, SD = 1.38, n = 26 for SA only; M = 1.20, SD = 0.82, n = 48 for the control group). Table 6 presents the other descriptive statistics, along with the mean ranks and median scores.

Table 6 Descriptive statistics and mean ranks by intervention

	SA + video	SA only	Control
Md	2.50	2.00	1.00
Mean rank	67	53.6	35.5
M	2.82	2.10	1.20
SD	1.43	1.38	0.82
n	19	26	48

A significant difference in critique marks was obtained across the three groups (χ²(2, 93) = 21.14, p < 0.001). The SA + video participants had the highest median score (Md = 2.50), followed by the SA-only group (Md = 2.00) and the control group (Md = 1.00). Pairwise comparison revealed a strong, significant difference between the SA + video group and the control group (p < 0.005). The difference between the SA-only and control groups was marginal after accounting for pairwise comparisons (p = 0.016). The difference between the SA-only and SA + video groups was not significant (p > 0.05).

Skill attribute profile

For a qualitative comparison of the strengths and weaknesses amongst the three intervention groups, participants’ answers to the critique task were classified according to rubric-graded and non-rubric graded types. Rubric-graded ones were attributes required in the marking scheme (and thus classified as crucial skills in the inventory), including those with partial credit. Non-rubric graded ones were attributes that fell outside the marking rubric and not given due credit but are used for the current analysis. These non-rubric attributes were identified a posteriori during the grading process, resulting in 10 attributes. The coding structure was made as fine-grained as possible to retain the original flavour of the responses. For example, general comments such as “good pipetting skills” or “precise and accurate” were not lumped as one code but retained as differentiated codes to capture the nuances of responses.

The profile of rubric-graded attributes is shown in Table 7, arranged in descending order of frequency. The incidence of participants who identified a positive or negative attribute was computed against the sample size per group. The results showed that the top (correctly) identified weakness was poor dropwise control, followed by swirling of contents and the removal of the funnel, both positive demonstrations. These three skills are core skills emphasized in the skill inventory. About 70% of the SA + video participants correctly identified poor dropwise control as a weakness. For “swirl flask contents” and “removed the funnel”, close to half of the SA + video participants made the correct judgment. About half of the SA-only participants cited poor dropwise control as a skill gap to close. In the control group, the attribute that attracted the highest incidence was “swirl flask”.

Table 7 Distribution of rubric-graded attributes. % are based on group size. (+)/(−) implies a correctly identified strength/weakness in the video performance. * skill inventory performance standard

Attribute	Total n	% SA + video (n = 19)	% SA only (n = 26)	% Control (n = 48)
(−) Poor drop-by-drop control*	33	68	46	17
(+) Swirl flask contents*	32	42	27	35
(+) Removed the funnel*	25	47	35	15
(+) Use a white tile	22	21	15	29
(+) Read at eye-level*	19	16	4	2
(+) Wash the burette tip*	16	21	39	4
(+) Use a funnel to add titrant	12	21	12	10
(−) Air bubbles in the burette*	12	11	15	13
(−) Failed to transfer pipette waste to a beaker	8	42	0	0
(+) Tap the pipette against a conical flask*	5	16	4	2
(−) Should lift the pipette to adjust the meniscus*	5	26	0	0
(−) Should read at eye-level*	4	0	8	4
(−) Didn’t tap the pipette against the conical flask*	2	5	0	2
(+) Clamp the burette vertically*	1	0	0	2
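Because the percentages in Table 7 are based on group size, the head-counts behind each cell can be recovered approximately and compared against the "Total n" column. The sketch below does this for the three most frequent attributes. It assumes the published percentages were rounded to whole numbers, so the recovered counts are estimates rather than the author's raw data. The names `GROUP_SIZES`, `ROWS` and `implied_counts` are illustrative.

```python
GROUP_SIZES = {"SA + video": 19, "SA only": 26, "Control": 48}

# (attribute, reported total n, reported % per group), copied from Table 7
ROWS = [
    ("Poor drop-by-drop control", 33,
     {"SA + video": 68, "SA only": 46, "Control": 17}),
    ("Swirl flask contents", 32,
     {"SA + video": 42, "SA only": 27, "Control": 35}),
    ("Removed the funnel", 25,
     {"SA + video": 47, "SA only": 35, "Control": 15}),
]


def implied_counts(percentages: dict) -> dict:
    """Convert per-group percentages back to approximate head-counts."""
    return {
        group: round(pct / 100 * GROUP_SIZES[group])
        for group, pct in percentages.items()
    }


for attribute, total, percentages in ROWS:
    counts = implied_counts(percentages)
    matches = sum(counts.values()) == total
    print(f"{attribute}: {counts} -> matches reported total: {matches}")
```

For these three rows the recovered counts add up to the reported totals, which illustrates how the group-level percentages and the overall frequency ranking relate to each other.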

A total of 118 non-rubric responses were coded. Table 8 presents the data for these attributes, arranged in descending order of incidence. The general and wrong classification categories were further broken down into sub-attributes to provide details of participants’ responses. The general category captured loose and ambiguous comments, such as “performed titration well” or “good pipetting skills”. The wrong classification category covered responses that misjudged a skill, reporting a positive (negative) demonstration in the video as a deficit (strength).

Table 8 Distribution of non-graded attributes. % are based on group size

Attribute	Total (n = 118)	% SA + video (n = 19)	% SA only (n = 26)	% Control (n = 48)
Hand control	32	16	12	54
Wrong classification	22
• Added dropwise	16	21	12	19
• Did not wash the burette tip	3	5	4	2
• No air bubbles in the burette tip	3	16	0	0
Bubbles in the pipette	18	21	46	4
General	17
• Followed procedures correctly	2	0	4	2
• Gentle with rinsing	1	0	0	2
• Good pipetting skills	6	0	4	10
• Good titration skills	4	0	8	4
• Incorrect pipette method	1	0	4	0
• Precise and accurate	2	0	0	4
• Remove the remaining solution in the pipette	1	0	4	0
Wrong color change	8	0	19	6
Did not wash glassware	7	5	19	2
Speed (too fast or too slow)	5	0	0	10
Should not add deionized water	4	0	0	8
Wear personal protective equipment	3	0	8	2
Should prepare crude reading first	2	0	8	0

The results showed that a large proportion of control group participants considered hand maneuver as a skill deficit. Examples of such comments were “keep the left hand on the stopcock to have more control” or “should use the non-dominant hand as well”. None of the SA + video participants gave general comments, such as “good titration skills” or “incorrect pipette technique”, which did not identify any specific deficiencies. However, about 16 to 21% of these participants incorrectly judged that the burette tip had no air bubbles and that drop-wise control was performed well. In fact, both were skills to be addressed. A small proportion (5%) of them also failed to notice the positive demonstration of rinsing the burette tip with deionized water during the titration. The “bubbles in the pipette” was a non-issue as the bubbles disappeared in the process of adjusting the meniscus. Approximately half of the SA-only participants (46%) and 21% of the SA + video participants thought this was an issue. The “bubbles in the burette tip” and “bubbles in the pipette” were not picked up by the SA-only and control groups, as seen from the low percentages (0% and 5% respectively).

Discussion

The purpose of the current study is to investigate the effects of self-assessment (SA) on the quality of laboratory learning. It was hypothesized that participants exposed to the SA intervention would be more proficient in identifying strengths and weaknesses in others. It was also of interest to examine if participants with an additional piece of learning evidence from a self-video would perform differently in the critique task.

Performance on the video critique task supported the hypothesis that SA implemented alongside the self-video review elevated learners’ ability to identify pitfalls and strengths in others, as compared to a control group who experienced neither intervention. This finding is consistent with that of Veal et al. (2009), who found that using video feedback to facilitate self-reflection of skills led to an improvement of student outcomes. The SA-only group also fared better than the control group, although marginally.

The nature of responses in the critique task showed some interesting trends. Control group participants tended to focus on surface attributes that were more readily observed from the video. These attributes were mainly the hand actions of the student actor, such as flask swirling, use of a white surface or trivial aspects such as speed of titration. These participants did not attend to detailed and minute deficits such as bubbles in the burette and poor drop-wise control in titrant dispensing, which were the core performance standards. At the other end, SA + video participants were more skilful in judging the micro aspects of the demonstration, and did not give unqualified statements on skill quality.

The second research question was to explore if participants who did both SA and self-video would perform differently, if not better, on the critique task. The data suggested that the self-video review did not produce any significant gains over and above what SA alone could achieve. Although the SA + video group fared the best, its difference from the SA-only group could not be distinguished from chance. As long as skill acquisition is scaffolded by explicit performance standards, a video review appears to be a “good to have” rather than a necessity. These results and the qualitative analysis on task responses lend credence to the merits of formative assessment and SA (Boud, 2003; Andrade and Heritage, 2017). On the other hand, the absence of the facilitative effects of the video should not be seen as a contradiction to the digital badge pedagogy reported in various studies (Towns et al., 2015; Hensiek et al., 2016; Hennah and Seery, 2017). This is because in the digital badge literature, the video is the dominant learning outcome and product. The goal was to work towards a perfect video for grading purposes, after iterative corrections. In contrast, the current study used video as part of the learning process.

The third aim was to investigate how differently the video and non-video groups would perceive the value of SA. At least 80% of the responses indicated favourable perceptions of SA, but none of the items differed by video intervention. These support levels were fairly consistent with those reported in previous works (Parry et al., 2012; Zhang et al., unpublished work, 2013). A possible reason why the self-video resource failed to manifest as a differentiating factor in the survey and the critique task could be that the participants perceived the SA checklist as separate from self-monitoring with the video. Thus, they might have failed to integrate the two resources effectively, negating any positive outcomes on attitudes and critique marks that might otherwise be seen. Another possible reason could be the modality of the critique task. The critique task assessed participants’ ability to identify strengths and weaknesses in others (on a video), as opposed to performing the physical titration task by oneself. The latter emphasized psychomotor performance, while the video critique task drew on cognition and embodiment. In addition, regardless of whether there was a video review or not, all the participants had already made conscious efforts to improve future skills.

Another outcome that pointed to the SA–video disconnect was seen in the low critique scores obtained by the SA + video group. On average, this group scored about 31%, indicating that core skills were still omitted. This, however, might also be due to the inability to recall the skill-set during the task or the fact that the particular skill was not readily observed from the video. Several options could be considered to improve the integration of the video and the SA checklist. For one, the video review should be undertaken immediately after practical work, instead of waiting until the next class. The participants should also be asked to write down explicit action steps to address skill gaps and to check their progress against these plans. The mechanics of recording should also be given due attention, with more explicit initial guidance so that participants capture videos that are meaningful for learning.

Two items showed a polarization between the BIO video and non-video participants in favor of the former. The first item was whether SA should continue to be implemented in their course (item 13). About 46% of the video participants strongly supported this statement, while only 7% of the non-video participants did. In fact, the video group viewed future SA implementation very positively. None of them refuted this item, while about 14% of the non-video group did not support the continuation of SA. The second item which resulted in significant differences between the BIO video and non-video participants was whether SA has improved laboratory skills (item 16). Regardless of whether a video was used or not, all the participants agreed that SA had improved their laboratory skills. However, the proportion of video participants who strongly agreed with this item was almost three times higher than that of the non-video group (77% and 29% respectively). The strong positive correlations suggested that the BIO students who used both SA and video appeared to be more supportive of self-monitoring. BIO video students were also more likely than their ChE video peers to agree that they used SA to improve their work (item 12). The BIO video participants also strongly supported that SA has helped them improve laboratory skills (item 16), the common item that differed not just from their non-video classmates but also from the corresponding ChE video participants. On another note, item 3 which elicited participants’ commitment to work on action steps for skill improvement was unexpectedly associated with the course, attracting higher levels of positive perceptions from the BIO participants. All in all, the results seemed to point to some course-based differences in how participants evaluate the merits of SA with or without a video review. It is unclear why such differences should arise. One possible reason could be student differences, such as interest and motivational levels. While it is sufficient to “persuade students of the interest and value of the exercise” (Boud, 2003, p. 183) with the BIO participants, a different incentive approach might be more suitable for the ChE course. This angle warrants deeper research.

In addition to the need for better SA and video integration, this study has three notable statistical limitations. Firstly, the sample size is small; therefore, the results should be treated with caution. Secondly, at the point of study and reporting of the results, the SA survey had not been validated. Thus, the validity of the instrument remains unchecked. Increasing the sample size would not only improve the robustness of the results, but also allow for more in-depth quantitative techniques such as factor analysis to study the underlying structure of the SA survey. It is hoped that this study could be replicated to attract more research attention to using SA and formative assessment pedagogy in STEM laboratory courses.

Thirdly, as mentioned in the beginning, decisions on application of the Bonferroni correction may be quite tricky (Cabin and Mitchell, 2000). It is arguable whether the corrected p-value should be applied at the scale level (15 items) or at the item level (second-level correction) for significant items only, as in the current analysis. It is acknowledged that results would differ, depending on the criteria used. Assuming an alpha level of 0.05/15 (≈0.003), the chi-square test results would show a marginal but significant relationship on one item only (again, item 3, worked on action steps to improve learning). The significance of the effect size of item 3 would still pass the 0.003 critical value. However, perception differences in video-versus-no video within the BIO class would go undetected, even though the significance of the effect size of some items would clear the α = 0.003 critical level. With a strict Bonferroni correction comes a possible loss of information on how learner characteristics might influence SA perceptions. This dilemma is not an easy issue to address and further research to replicate the current study is needed.
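The trade-off described above can be made concrete with a short sketch of the scale-level correction. The p-values below are invented placeholders, not results from this study. They are chosen only to show how items that pass an uncorrected 0.05 threshold can drop out once the threshold is divided by the 15 scale items. The function and item names are hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


def significant_items(p_values: dict, alpha: float) -> list:
    """Names of the items whose p-value falls below the given threshold."""
    return sorted(item for item, p in p_values.items() if p < alpha)


# Scale-level correction across the 15 items mentioned in the text.
scale_alpha = bonferroni_alpha(0.05, 15)
print(f"scale-level threshold: {scale_alpha:.4f}")

# Placeholder p-values, invented for illustration only (NOT study data).
hypothetical_p = {"item A": 0.002, "item B": 0.010, "item C": 0.020}

print("uncorrected:", significant_items(hypothetical_p, 0.05))
print("scale-level:", significant_items(hypothetical_p, scale_alpha))
```

With these placeholder values, three items are flagged without correction and only one survives the scale-level threshold, which mirrors the loss of information the paragraph describes.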

Conclusions

This study attempts to test the hypothesis that learners exposed to the SA intervention would be more proficient in identifying deficits and strengths in chemistry hands-on skills. The additional scaffold with a video replay was expected to generate different (if not enhanced) student outcomes when compared to just SA alone. This is a reasonable assumption since using a video replay of one's performance would facilitate self-monitoring. This in turn might also influence the perceptions of the long-term and short-term usefulness of SA.

The results appear to suggest that the critique task performance and perceptions towards SA did not differ significantly for participants who carried out SA only, and those with the additional video review. However, the SA + video participants did perform better in critiquing the features of a laboratory skill video demonstration when compared to the control group. They were also more critical judges of performance standards, particularly in noticing finer details and providing more targeted critiques.

In conclusion, the results obtained from the current study provided some preliminary learning points in using a video review to scaffold SA. Together with a skill inventory, the video tool also served as an explicit input for formative learning, although both resources should be more tightly intertwined in the lesson design. This would enable a more visible scaffold to guide learners to next-step improvement, and extract the value of the video resource tool in the entire learning experience.

Conflicts of interest

The author declares no conflicts of interest in this study.

Appendices

A1. Self-Assessment (SA) checklist. Each item is answered Yes/No.

Use of a pipette

1. Lift the pipette from the liquid surface to adjust the meniscus to the graduation mark

2. Gently tap or rotate the pipette tip against the base of glassware

Burette set-up

3. Clamp the burette vertically on the retort stand

4. Remove air bubbles from the tip of the burette

5. Remove the filter funnel from the burette

6. Read the meniscus at eye level

Endpoint observation

7. Continuously swirl flask contents

8. Towards the end-point, add titrant drop-wise

9. Towards the end point, completely transfer the titrant by rinsing the walls of the conical flask and burette tip

10. Obtain correct color change at the end point

Data-recording

11. Record burette readings to correct precision level (2 decimal places)

A2. SA survey items (video version), rated on a 5-point scale: Strongly Agree, Agree, Disagree, Strongly Disagree, Not Applicable. Items 5, 17, 18, 19 and 21 are free-response items. The non-video version excludes items 20 and 21.

1. Performing the Self-Assessment exercises has increased my confidence in the subject.

2. I had NO difficulty in identifying the action steps to improve my work.

3. I worked on the action steps in the next session to improve my work.

4. I found the Self-Assessment checklist useful.

5. Please explain your response in Q4 in terms of how the checklist was helpful or not helpful.

6. Doing the Self-Assessment enables me to judge my performance better.

7. I am able to compare the quality of my work against the standards or assessment criteria.

8. The Self-Assessment enables me to improve on my learning in areas where I am not so good at.

9. I become better aware about my learning through doing the Self-Assessment.

10. The Self-Assessment helps me assess my strengths and weaknesses accurately.

11. Doing the Self-Assessment is a waste of time.

12. I do the Self-Assessment with the intention of improving my work.

13. The school should continue implementing Self-Assessment with subjects for me.

14. I will continue using Self-Assessment in my learning.

15. Self-Assessment will better prepare me for the world of work.

16. Self-Assessment has improved my lab skills.

17. With reference to your response to Q16, name these lab skills.

18. In your opinion, what are the steps involved in Self-Assessment of your work?

19. Share with us your overall experience in self-assessing your work for improvement.

20. The videos I took were helpful in improving my lab skills.

21. With reference to Q20, in what ways were the videos useful?

A3. Marking rubric for the video critique task. Total maximum possible = 9 marks

Skills demonstrated, each response = 1 mark	Skills not demonstrated or to be improved, each response = 1 mark
• Burette clamped vertically	• Fail to clear air bubbles in the burette tip at the start of the titration
• Removed the funnel	• Poor drop-by-drop control
• Swirl flask contents continuously
• Wash the burette tip with a wash bottle
The following responses were awarded partial credit:	The following responses were awarded partial credit:
– Use the funnel for introducing titrant	– Should lift the pipette to adjust the liquid level
– Use of a white surface	– Should read the meniscus at eye level
– Tap the pipette against the conical flask wall	– Failed to transfer pipette waste into a waste beaker
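The rubric above can be sketched as a small scoring function. The rubric does not state the value of a partial-credit response. The sketch assumes 0.5 mark each, which is the value that makes six full-credit and six partial-credit responses sum to the stated maximum of 9 marks. The short attribute labels and the `score` function are hypothetical codes introduced for illustration, not the author's grading procedure.

```python
# Full-credit responses (1 mark each), abbreviated from the rubric.
FULL_CREDIT = {
    "burette clamped vertically",
    "removed the funnel",
    "swirl flask contents",
    "wash the burette tip",
    "air bubbles in the burette tip",
    "poor drop-by-drop control",
}

# Partial-credit responses, abbreviated from the rubric.
PARTIAL_CREDIT = {
    "use the funnel for titrant",
    "use of a white surface",
    "tap the pipette against the flask",
    "should lift the pipette",
    "should read at eye level",
    "failed to transfer pipette waste",
}

PARTIAL_MARK = 0.5  # ASSUMPTION: inferred from the 9-mark maximum, not stated


def score(responses) -> float:
    """Total marks for a set of coded responses; non-rubric codes earn nothing."""
    coded = set(responses)
    full = len(coded & FULL_CREDIT)
    partial = len(coded & PARTIAL_CREDIT)
    return full * 1.0 + partial * PARTIAL_MARK


example = ["poor drop-by-drop control", "swirl flask contents",
           "use of a white surface", "hand control"]
print(score(example))                       # two full + one partial response
print(score(FULL_CREDIT | PARTIAL_CREDIT))  # maximum under this assumption
```

Treating the responses as a set means a repeated attribute is credited once, and a non-rubric comment such as "hand control" contributes no marks, in line with the distinction between rubric-graded and non-rubric-graded attributes.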

Acknowledgements

The author would like to thank colleagues from the School of Applied Science, in particular, Zhang et al. (2013), for sharing their pilot work on self-assessment in the biology laboratory. Special thanks also go to Mr Jonathan, Wing Hong Chee, from the Learning Academy for his comments and critique on the early draft of this work.

References

Andrade H. L. and Heritage M., (2017), Using formative assessment to enhance learning, achievement, and academic self-regulation, Routledge, pp. 1–24.
Blazeck A. and Zewe G., (2013), Simulating simulation: promoting perfect practice with learning bundle–supported videos in an applied, learner-driven curriculum design, Clin. Simul. Nurs., 9(1), e21–e24.
Boucek C. D., Phrampus P., Lutz J., Dongilli T. and Bircher N. G., (2009), Willingness to perform mouth-to-mouth ventilation by health care providers: a survey, Resuscitation, 80(8), 849–853.
Boud D., (2003), Enhancing learning through self assessment, New York, NY: RoutledgeFalmer.
Cabin R. J. and Mitchell R. J., (2000), To Bonferroni or not to Bonferroni: when and how are the questions, Bull. Ecol. Soc. Am., 81(3), 246–248.
Dawes L., (1999), Enhancing reflection with audiovisual playback and trained inquirer, Stud. Contin. Educ., 21(2), 197–215.
DeKorver B. K. and Towns M. H., (2015), General chemistry students’ goals for chemistry laboratory coursework, J. Chem. Educ., 92(12), 2031–2037.
Field A., (2013), Discovering statistics using IBM SPSS statistics, SAGE publications.
Galloway K. R. and Bretz S. L., (2016), Video episodes and action cameras in the undergraduate chemistry laboratory: eliciting student perceptions of meaningful learning, Chem. Educ. Res. Pract., 17, 139–155.
Gibbins L. and Perkin G., (2013), Laboratories for the 21st century in STEM higher education: a compendium of current UK practice and an insight into future directions for laboratory-based teaching and learning, Centre for Engineering and Design Education, Loughborough University.
Hennah N. and Seery M. K., (2017), Using digital badges for developing high school chemistry laboratory skills, J. Chem. Educ., 94(7), 844–848.
Hensiek S., DeKorver B. K., Harwood C. J., Fish J., O’Shea K. and Towns M., (2016), Improving and assessing student hands-on laboratory skills through digital badging, J. Chem. Educ., 93(11), 1847–1854.
Hofstein A. and Lunetta V. N., (1982), The role of the laboratory in science teaching: neglected aspects of research, Rev. Educ. Res., 52(2), 201–217.
Hofstein A. and Lunetta V. N., (2004), The laboratory in science education: foundations for the twenty-first century, Sci. Educ., 88(1), 28–54.
Parry D., Walsh C., Larsen C. and Hogan J., (2012), Reflective practice: a place in enhancing learning in the undergraduate bioscience teaching laboratory? Biosci. Educ., 19(1), 1–10.
Prades A. and Espinar S. R., (2010), Laboratory assessment in chemistry: an analysis of the adequacy of the assessment process, Assess. Eval. Higher Educ., 35(4), 449–461.
Russell C. B. and Weaver G., (2008), Student perceptions of the purpose and function of the laboratory in science: a grounded theory study, Int. J. Scholarship Teach. Learn., 2(2), 9.
Towns M., Harwood C. J., Robertshaw M. B., Fish J. and O’Shea K., (2015), The digital pipetting badge: a method to improve student hands-on laboratory skills, J. Chem. Educ., 92(12), 2038–2044.
Veal W. R., Taylor D. T. and Rogers A. L., (2009), Using self-reflection to increase science process skills in the general chemistry laboratory, J. Chem. Educ., 86(3), 393–398.
Zhang P. C., Tan K. B., Loh G. H. and Goh K., (2013, September), Self and Peer Assessment in Laboratory Skills, Oral presentation at the Joint ‘7th SELF Biennial International Conference’ and ‘ERAS Conference’, Singapore.