Evidence-Based Effective Teaching Behaviors for Complex Psychomotor Skills Training

Introduction: Although research on operating room teaching is extensive, evidence relating surgical teachers’ behaviors to trainees’ objective complex psychomotor skills acquisition is limited. We aimed to identify evidence-based teaching behaviors in hands-on training that are associated with objectively increased complex psychomotor skills in surgical and non-surgical trainees. Methods: The MEDLINE, PsycINFO and ERIC databases were searched for relevant papers. Because their training characteristics are comparable to those of complex surgical skill acquisition, papers on sports and music training were also included. Papers were screened against the inclusion and exclusion criteria after the screeners had completed training sessions. Inter-rater reliability was determined. Data were extracted and the quality of studies was assessed with the Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E). Results: 18,337 references were identified. Seven studies were included. Teaching behaviors shown to improve trainees’ objective skills acquisition included feedback, instruction, active trainee involvement and demonstrations. Feedback and instruction with an external focus on the task and effect were supported by the strongest evidence. There was significant evidence regarding negative effects of harshly criticizing and belittling teaching


Introduction
Skills training is inseparably linked to surgical practice (Kneebone, 2003). Although simulation offers a safe training environment (Kneebone, 2003; Sutherland et al., 2006), teaching in the operating room (OR) remains central to surgical education (Agha & Fowler, 2015; De Win et al., 2016; Kneebone, 2003; Konge & Lonn, 2016; Li & George, 2017; Sutherland et al., 2006). While being taught by experienced surgeons, it is in the OR that trainees develop a set of skills, tacit knowledge, and abilities to cope with complex, stressful and variable situations (Agha & Fowler, 2015). Training trainees in the OR is a challenging task as teaching and safe surgical care have to be combined. To offer the best possible training it is important to understand how to successfully support learning in the OR. This may be even more important since the time and opportunities available for teaching in the OR are decreasing (Anderson et al., 2013; McCaskie, Kenny, & Deshmukh, 2011; Reznick & MacRae, 2006; Snyder et al., 2012).
Studies on simulation training are also extensive (Dawe et al., 2014; Gurusamy, Aggarwal, Palanivelu, & Davidson, 2009; Sturm et al., 2008; Sutherland et al., 2006); however, these studies, too, rarely report which specific hands-on teaching behaviors were used. This makes it difficult to draw firm conclusions and to make evidence-based recommendations on how to teach in the OR from a behavioral standpoint.
To address this evidence gap, we conducted a literature review study according to systematic review principles. Our aim was to identify hands-on, evidence-based teaching behaviors which have been shown to be effective for complex psychomotor skills acquisition in adult trainees. We looked for teaching behaviors that were associated with an objectively measured improvement of psychomotor skills, not a perceived improvement. By gaining more knowledge on this topic and discussing our findings, we expect to be able to formulate recommendations for surgical teachers to help them teach complex surgical psychomotor skills.

Scoping Search
A scoping search showed that research on evidence-based teaching behaviors for hands-on surgical complex psychomotor skills was scarce. We considered a skill complex if it was executed with specialized equipment, involved multiple actions and placed conscious cognitive demands on the trainee. Previous research on surgical skills teaching has turned to the fields of sports and music education due to similarities in complexity and training intensity (McCaskie et al., 2011; White et al., 2016).
In keeping with this approach, we searched for teaching behaviors in the fields of surgical, medical, sports, and music education.

Search Terms
Together with a librarian specialized in systematic reviews, we defined our search strategy and searched for studies published prior to February 14, 2018, in the MEDLINE, PsycINFO and ERIC databases (Table 1). All references were imported into EndNote X7 (Thomson Reuters, Philadelphia, PA, USA).

Screening
All references were screened on title and abstract according to a three-stage pre-screening and screening approach (Figure 1). For pre-screening, a medical student was trained using 3 sets of 100 references, which were independently screened by the student and one author (SA), achieving an inter-rater reliability of Cohen's kappa 0.7. The student then pre-screened all references using inclusion and exclusion criteria (Table 2). We took a conservative approach: in the case of any doubt, a reference was forwarded to the screening stage.
In the screening phase we applied stricter inclusion and exclusion criteria (Table 3). For training, two sets of 100 references were independently screened by two researchers (SA and JML) (Figure 1), reaching an inter-rater reliability of Cohen's kappa 0.7. All references were then independently screened by SA and JML, achieving a moderate inter-rater reliability of Cohen's kappa 0.6. Disagreements were resolved by a third researcher (CF), who was informed about the disagreement but was blinded to the individual decisions made by the other researchers. Papers that were included by CF were accepted for full text screening.
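Cohen's kappa, the agreement statistic reported above, corrects the raw proportion of matching decisions for the agreement that would be expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed and p_e the chance-expected agreement. The Python sketch below only illustrates this calculation. The function name `cohens_kappa` and the include/exclude decisions are hypothetical and are not the screening data of this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equally long lists of categorical decisions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same, non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters decide the same.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


# Hypothetical calibration set of 100 references (not data from this review):
# 20 included by both raters, 70 excluded by both, 10 disagreements.
rater_a = ["include"] * 20 + ["exclude"] * 70 + ["include"] * 5 + ["exclude"] * 5
rater_b = ["include"] * 20 + ["exclude"] * 70 + ["exclude"] * 5 + ["include"] * 5

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 90 of 100 references, yet kappa is lower than 0.9 because most references are excluded and a large share of the agreement would occur by chance alone.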
For full text screening we applied the strict inclusion and exclusion criteria (Table 3). Due to the moderate inter-rater reliability obtained during screening,
1) ((educat* or teach* or instruct* or tutor* or guid* or coach* or train* or feedback or mentor* or supervis* or pedagog* or faculty) adj3 (behavio* or activit* or practice* or interaction* or action* or characteristic* or strateg* or approach*)).mp. [mp = abstract, title, heading word, identifiers]
2) ((train* or student* or learn* or apprent* or intern or interns) adj3 (perform* or develop* or progress* or achiev* or competen* or advanc* or enhanc* or improv* or accomplish* or expert* or proficien* or skill* or gain* or grow* or outcome* or effect* or abilit*)).mp. [mp = abstract, title, heading word, identifiers]
3) (surg* or medic* or sport* or athlet* or music* or instrument* or hospital* or operating room* or operating theat* or intraoperative or intra operative
Studies in other languages than English.
*We consider psychomotor skills to be complex if 1. specialized equipment is required for their execution, and/or 2. dynamic decision making is needed to select the proper skill to execute, and/or 3. execution of the skill requires conscious attention even after training (e.g., the skill is physically or cognitively challenging).

Data Extraction, Quality Assessment and Level of Impact
For each experiment in each included study, we extracted general information (authors and field of research), study set-up (research aim and design), outcome measures (evidence-based outcome measures used in the study, data collection methods and bias risk assessment), and the teaching behaviors that were shown to be effective in the trainees' acquisition of skills.
To assess each experiment's quality and risk of bias we used the Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E) for quantitative research (Cook & Reed, 2015). The MERSQI focuses on study design, number of institutions used for sampling, response rate, subjective or objective data collection, validity evidence of the applied instruments, appropriate data analyses, and impact of the outcome measures. The NOS-E focuses on representativeness of the trainees, selection and comparability of a comparison group, likelihood of study retention, and use of blinded assessors. Because of their different focus, the two scoring systems are considered complementary (Cook & Reed, 2015).
For each experiment, individual MERSQI and NOS-E items were scored and total scores were calculated (maximum MERSQI score: 18; maximum NOS-E score: 6) and compared to the normative scores of 12.3 (MERSQI) and 3.58 (NOS-E) (Cook & Reed, 2015). If an experiment used two methods to analyze ... MERSQI and NOS-E total scores give an indication of the overall quality. However, quality assessment should also take into account the individual MERSQI and NOS-E items (Cook & Reed, 2015). Applying the MERSQI and NOS-E enabled us to compare and interpret the quality of studies in relation to the normative scores.
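The comparison of a total score with its normative score can be written out as a small calculation. The following Python sketch is illustrative only: the maximum and normative values are those cited above from Cook and Reed (2015), whereas the function name `compare_to_norm` and the example scores are hypothetical and do not describe any experiment in this review. As noted above, a total score does not replace inspection of the individual items.

```python
# Maximum and normative total scores as cited in the text (Cook & Reed, 2015).
SCALES = {
    "MERSQI": {"maximum": 18.0, "norm": 12.3},
    "NOS-E": {"maximum": 6.0, "norm": 3.58},
}


def compare_to_norm(scale, total_score):
    """Compare one total score with the normative score of its scale."""
    limits = SCALES[scale]
    if not 0 <= total_score <= limits["maximum"]:
        raise ValueError(f"{scale} total score must lie between 0 and {limits['maximum']}.")
    return {
        "scale": scale,
        "total_score": total_score,
        # Positive values lie above the normative score, negative values below it.
        "difference_from_norm": round(total_score - limits["norm"], 2),
        "at_or_above_norm": total_score >= limits["norm"],
    }


# Hypothetical experiment (illustrative values, not taken from Table 5).
for scale, score in [("MERSQI", 13.5), ("NOS-E", 3.0)]:
    print(compare_to_norm(scale, score))
```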
We independently determined the impact levels for each experiment based on recommendations for evidence in medical education from Belfield et al. (2001) who based their work on the research of Kirkpatrick (Belfield, Thomas, Bullock, Eynon, & Wall, 2001). We identified four main impact levels (Table 4).

Study Quality
Using the MERSQI and NOS-E checklists, we identified sources of bias per experiment (Table 5). Total MERSQI and mean total MERSQI scores (ranging from 11.5 to 15) were either near or above the normative score of 12.3 (Cook & Reed, 2015). Total NOS-E and mean total NOS-E scores for all experiments were below the normative score of 3.58 (ranging from 1 to 3.5). Only one experiment used a method which achieved a NOS-E score above the normative score (a score of 4; Wulf et al., 2002). For all experiments, the level of impact concerned learner outcomes (Table 4 and Table 5). This enabled us to assess each experiment's quality and interpret the strength of the results.

Research in Surgical and Medical Skills Training
In the study by Flinn et al. (2016), trainees who received harshly criticizing feedback performed worse than trainees who received encouraging positive feedback, minimal and neutral feedback, or no feedback (Table 5). Trainees who received encouraging positive feedback did not perform better than trainees who received minimal and neutral feedback or no feedback. The researchers concluded that it was not positive feedback which improved learning, but harshly negative and threatening feedback which impaired learning.

McSparron et al. (2015) analyzed feedback and instruction behaviors of teachers while they were teaching subclavian central venous catheter (S-CVC) insertion to a trainee who was instructed to show challenging learning behavior. Subsequently, the teachers' feedback and instruction behaviors were related to the performance of real novice trainees in a subsequent training session (Table 5). Positive feedback (interpreted by the researchers as constructive feedback), suggestions as to how to improve, and step-by-step demonstrations were positively correlated with trainee performance. The regular repetition of learning goals was negatively correlated with trainee performance. The researchers concluded that this behavior may be less effective for technical skills teaching.
Duke and Henninger (1998)
Wulf et al. (1999, 2002) compared the effectiveness of two types of feedback and instructions in teaching sports skills, which either externally focused on the task and effect, or internally focused on how to move (Table 5). Regarding accuracy of the trained sports skills, trainees whose teacher provided externally focused feedback and instructions performed better during training and reten-

Strength of the Evidence
The evidence for the described teaching behaviors is mostly weak and limited to improved learner outcomes in a training setting (second lowest level of impact; Table 4). We scored the strength of evidence as weak for the results and conclusions drawn by Harrison et al. (1995), Duke and Henninger (1998), Henninger et al. (2006), McSparron et al. (2015) and Flinn et al. (2016). Table 5 shows detailed information regarding the MERSQI and NOS-E bias risk assessment.
We scored the strength of the studies conducted by Wulf et al. (1999, 2002) as moderate, providing the strongest evidence of all studies included in this review. The mean total MERSQI scores are all above the normative score, and the highest of all experiments included in this review. The mean total NOS-E scores are all around, but mostly below, the normative score (Table 5). Still, important risks of bias remain.

Discussion
Our goal was to identify evidence-based teaching behaviors which improved complex psychomotor skills acquisition in a hands-on training setting, applica-
Threatening feedback was found to be harmful to trainees' skills acquisition (Flinn et al., 2016). The importance of non-threatening feedback is supported by surgical review studies (McKendy et al., 2017; Timberlake, Mayo, Scott, Weis, & Gardner, 2017). Threatening feedback causes stress in trainees (Flinn et al., 2016), which is considered harmful since it is induced by a factor outside the learning task itself (Joëls, Pu, Wiegert, Oitzl, & Krugers, 2006; Schwabe, Joëls, Roozendaal, Wolf, & Oitzl, 2012; Vogel & Schwabe, 2016), namely the teacher.
Externally focused feedback and instructions on task and effect were superior for improving accuracy and movement quality in sport skills (Wulf et al., 1999; Wulf et al., 2002), especially if intensively provided to trainees with experience, and in a setting of perceptual feedback (Wulf et al., 2002). Step-by-step demonstrations have also been shown to improve skills
Henninger et al. (2006) found that actively involving trainees was effective in skills training. Although active involvement was not clearly defined, they considered it important for teachers to know when to direct trainees and when to allow them to talk, ask questions and verbalize actions and thoughts. Surgical review studies support this finding, although their evidence is primarily based on perceptions and not objective measurements (McKendy et al., 2017; Timberlake et al., 2017). Verbalization by trainees is also considered a key step in surgical skills training because gaining insight into trainees' reasoning processes helps to teach effectively (Nicholls et al., 2016).
Interestingly, the Peyton approach (Nicholls et al., 2016), which is often used in surgical skills training, seems to be compatible with the identified evidence-based teaching behaviors: non-threatening, externally focused feedback on task and outcome, instructions, suggestions on how to improve, step-by-step demonstrations and active trainee involvement. The effectiveness of the Peyton approach may be improved by the integration of these behaviors. However, the Peyton approach requires one skill to be taught and performed at least four times in a row (Nicholls et al., 2016), which, in our view, calls into question its compatibility with teaching in the OR.

Strengths and Weaknesses of Our Study
The strength of our review is the focus on the objective measurement of effects of teaching behaviors on the acquisition of complex psychomotor skills. Our review addresses the growing need for optimal OR teaching behaviors as well as objective assessment of training quality and trainees' skill level to assure safe and effective surgery performed by surgical residents. Our research underlines that much can be gained in the field of surgical educational research. Since only one surgical paper was included, one might question the inclusion and exclusion criteria we applied. However, the small number of papers in comparable professions and sports may imply that attention to objective outcomes of different teaching behaviors is generally limited.
The teaching behaviors with the strongest evidence originated from sports research, which raises questions about the translatability of our findings to surgery. Differences exist regarding very fine motor skills training (fingers). However, the intensity and extent of training necessary to reach proficiency are comparable.

Implications for Future Research and Surgical Practice
We were surprised to find a lack of research investigating the effects of teaching
Setting: Trainees were equally divided into four groups during simulation training, based on order of sign up: a control group (with no feedback); an observation only group (teacher only observing the trainee using minimal and neutral feedback); an encouraging feedback style group (teacher providing positive feedback and acting encouraging); and a harshly criticizing feedback style group (teacher providing harsh criticism, being sarcastic and condescending). Trainees were encouraged to perform as quickly and accurately as possible.
Trainees in all groups improved significantly during training. Trainees who were provided with harshly criticizing feedback behaviors scored lower than the trainees in the other groups on overall performance scores (time and accuracy scores transformed into a performance score based on the SAGES FLS scoring system; no further description {A}). Trainees provided with positive and encouraging feedback scored similar to trainees who received minimal and neutral feedback, and trainees who received no feedback.