Application of the Objective Structured Clinical Evaluation in Evaluating Clinical Competence for a BSN Program

Given the complex clinical situations and the dynamic nature of nursing, the greatest challenge for nursing educators is performing objective evaluation of students’ clinical competence. The Objective Structured Clinical Evaluation (OSCE) is designed to address this challenge and it has been widely used in nursing education. However, its implementation in nursing education in Taiwan has been limited. Accordingly, a quasi-experimental design was utilized to examine students’ clinical performance and stress levels using a 12-station OSCE assessment. Further, we investigated the inter-rater reliability and internal consistency of the OSCE. After controlling for scores of clinical performance, overall differences in pre- and post-practicum stress in the OSCE group were significantly higher than in the control group (F (1, 89) = 4.89, p = 0.03). There was no group effect on practicum performance after controlling for grade point average (F (1, 89) = 2.69, p = 0.14). Cronbach’s alpha for 12 OSCE stations ranged from 0.22 to 0.80 and inter-rater reliability for all 12 stations showed Pearson’s r ranging between 0.76 and 1.00. Cohen’s kappa ranged from 0.70 to 1.00 (p < 0.001). Future studies should explore how OSCE can be best and most cost-effectively incorporated into the BSN curriculum.


Introduction
The nursing profession involves complex clinical situations requiring considerable adaptability, and thus most employers expect new graduates to be well prepared for a wide range of functions in addition to specific entry-level skills for providing safe care [1] [2]. To date, the literature has indicated that nursing graduates are not sufficiently competent to handle the challenges faced in daily practice [1], resulting in disappointed employers, frustrated new graduates, and dissatisfied patients. Thus, it is important for nursing educators to develop pedagogy and practicum training to enhance students' clinical competence, thereby minimizing the education-service gap.
To foster professional socialization and build confidence among nursing students, preceptorship programs in the clinical practicum have been successfully implemented to reduce anxiety and stress and to promote a smooth transition from student nurse to novice nurse [3]. To validate students' competence, numerous outcome evaluations have been investigated to demonstrate the abilities of new graduates, and these have been adapted by nursing education [4]. Over the past 20 years, however, the greatest challenge has been how to measure competence precisely and objectively, as the nature of competence is multi-faceted and influenced by students' level of confidence, comfort, and self-efficacy [5].
The Objective Structured Clinical Examination (OSCE) has been identified as a useful assessment strategy for evaluating students' learning and clinical performance in the medical and nursing disciplines [6]-[8]. Walsh et al.'s [8] integrative literature review found support for the utility of the OSCE as a strategy to measure clinical competence in nursing, but the greatest concerns included a paucity of studies documenting the suitability of the design for measuring nursing clinical competence and the psychometric properties of the OSCE. To date, no studies have examined the effects of utilizing the OSCE as a strategy to measure students' clinical competence prior to the implementation of preceptorship programs. Thus, the purpose of this study was to investigate the effects of the OSCE before the last-mile practicum, which is the final 6-week clinical practicum that usually commences at the beginning of the second semester of a nursing student's final year, when a student is assigned to a preceptor and works as a novice nurse. We also examined the reliability and validity of the OSCE at each OSCE station.

Background
Stress, which reflects a dynamic relationship between the individual and environment, is a multidimensional phenomenon [9]. Gibbons, Dempster and Moutray [10] suggested that an optimum amount of stress, "eustress," is necessary for mental and physical well-being. However, excessive stress can result in sleep disorders, restlessness, forgetfulness, abnormal fatigue, poor concentration, memory impairment, and hindered problem-solving [9], thus potentially impeding students' attainment of educational goals [11]. Previous literature has demonstrated that the principal stressors identified by student nurses include examinations, level and intensity of academic workload, the theory-practice gap, and poor relationships with clinical staff during their study period [12]. Among these, student nurses rate preceptorship as the most stressful clinical staff relationship [13]-[15]. Additionally, lack of experience, difficult patients, evaluation by faculty, and anxiety about dispensing the wrong medication or information foster stress among novice student nurses, making the practicum the most anxiety-inducing component of nursing programs [16]. Despite its considerable utility, the preceptorship is rated as the most stressful and challenging practicum experience because the preceptor and student, who are unlikely to be familiar with each other, must work together in arduous working environments [15]. The frustration associated with a difficult relationship could result in student stress, dissatisfaction in the nursing profession, and interference with learning and assimilation [15]. From a student-centric perspective, it is imperative to transform nursing education practices and reduce stressors, thereby facilitating student continuation and academic success during professional nursing development [17].
The OSCE, first described by Harden and colleagues [18], has been identified as a useful assessment strategy for evaluating students' learning and clinical performance [6] [7]. Previous research has indicated the following benefits of the OSCE in medical and nursing disciplines: 1) increased student confidence when faced with challenges in practice [19]; 2) greater objectivity in comparison to most clinical competence measures [20] [21]; 3) simultaneous testing of a large number of students on a broader range of skills and knowledge [8] [21]; 4) reduced risk of examiner bias due to a wide range of examiners [22]; 5) increased communication skills with standardized patients (SPs) [23]; 6) timely feedback regarding errors or concerns during the assessment period [24]; and 7) increased skill consistency among students [8]. Despite the anxiety and stress surrounding assessment, most students report positive perceptions of the OSCE [8]. Although studies have found anxiety to be a confounding factor for the OSCE, anxiety might also positively affect performance [8] [22] and more accurately simulate real-life emergencies [6]. Walsh et al.'s [8] integrative literature review found support for the utility of the OSCE as a strategy to measure clinical competence in medicine (n = 23 papers) and nursing (n = 18 papers). However, as mentioned in the Introduction, few studies have documented the suitability of the design for measuring nursing clinical competence and the validity and reliability of the OSCE. Given these limitations, little is currently known about the effects and quality of the OSCE when implemented in the nursing discipline. Thus, the aim of this study was to examine the effects of the OSCE prior to the last-mile practicum on student stress levels and clinical practicum performance. We also evaluated the quality of the OSCE in terms of the inter-rater reliability and internal consistency of each OSCE station.

Design, Setting, and Sample
This study employed a quasi-experimental design and was conducted in a northern Taiwanese university from October 2010 to September 2011. The university has two campuses with approximately 7600 enrolled students in 2011, including about 600 BSN students (12 classes). Participants were recruited using convenience sampling from final-year BSN students (n = 103, two classes). Participants had to have completed their fundamental, medical-surgical, obstetric, pediatric, psychiatric, and community health nursing practicums. One student was thus excluded from the study sample, leaving a total of 102. Ten then declined participation due to a lack of interest (9.8% refusal rate), and thus the final study sample consisted of 92 nursing students. Two weeks prior to the last-mile practicum, two class representatives selected one of two sealed envelopes in the presence of the researcher to determine the experimental (Group O; n = 38) and control groups (Group C; n = 54). Group O performed a 12-station OSCE 3 days before the last-mile practicum, while students in Group C received the standard last-mile practicum preparatory information and training.

Demographic Data
Demographic data included participants' gender, age, previous 3 years' grade point average (GPA), and last-mile practicum score.

Perceived Stress Scale of Nursing Students in Clinical Practice
The Perceived Stress Scale of Nursing Students in Clinical Practice questionnaire, henceforth referred to as the Practicum Stress Scale (PSS), was developed by Sheu et al. [25] to measure the types and degree of perceived stressful events. The PSS, which is widely used [13] [26] [27], contains 29 items across six dimensions of stressors: taking care of patients (eight items), school faculty and hospital staff (six items), homework and workload (five items), peers and personal life (four items), clinical environment (three items), and professional knowledge and skills (three items). Each item is scored on a 5-point Likert-type scale from 0 ("never", frequency 0%-10%) to 4 ("all the time", frequency 90%-100%). Total scores can range from 0 to 116 points, with higher scores indicating greater stress. Cronbach's alphas for individual subscales were between 0.59 and 0.81, and the overall Cronbach's alpha was 0.89. The 1-week test-retest reliability of 0.60 (p < 0.01) demonstrated the reliability of this instrument, while the content validity index of 0.94 indicated its validity. In addition, 50.7% of the total variance was accounted for by the six factors, which confirmed the construct validity of this instrument [25]. Both groups filled out a practicum-related stress questionnaire 1 week before and after the last-mile practicum.
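The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the subscale sizes and the 0-4 response range come from the description of the PSS, whereas the function name `score_pss` and the assumption that responses are ordered subscale by subscale are hypothetical and are not taken from the instrument itself.

```python
# Subscale sizes as reported for the PSS; the item ordering is an assumption.
PSS_SUBSCALES = {
    "taking care of patients": 8,
    "school faculty and hospital staff": 6,
    "homework and workload": 5,
    "peers and personal life": 4,
    "clinical environment": 3,
    "professional knowledge and skills": 3,
}


def score_pss(responses):
    """Return (subscale totals, overall total) for one respondent.

    `responses` is a list of 29 integers from 0 to 4, assumed (hypothetically)
    to be ordered subscale by subscale in the order of PSS_SUBSCALES.
    """
    n_items = sum(PSS_SUBSCALES.values())  # 29 items in total
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")

    totals = {}
    start = 0
    for name, size in PSS_SUBSCALES.items():
        totals[name] = sum(responses[start:start + size])
        start += size
    return totals, sum(totals.values())
```

With every item answered "all the time" (4), the overall total reaches the stated maximum of 116 points.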

Practicum Performance
In general, subject grades represent the most popular indicator of academic performance in university settings. Therefore, students' practicum clinical performance was measured with a department-developed evaluation form with four domains: 1) clinical performance (50%; data collection, diagnosis, skills, and interventions); 2) communication skills (15%; patients, families, and the medical team); 3) professionalism (15%; attitude, ethics, and patient-centric care); and 4) homework (20%). The clinical performance score comprised evaluation from the preceptor (weight 80%) and teacher coordinator (weight 20%).
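The weighting scheme can be illustrated with a short Python sketch. The domain weights and the 80%/20% split between preceptor and teacher coordinator are taken from the text; the 0-100 scale for each domain, the function names, and the way the two weightings are kept as separate steps are assumptions made for illustration, since the form's exact arithmetic is not described.

```python
# Domain weights from the department-developed evaluation form.
DOMAIN_WEIGHTS = {
    "clinical performance": 0.50,
    "communication skills": 0.15,
    "professionalism": 0.15,
    "homework": 0.20,
}


def domain_weighted_score(domain_scores):
    """Combine four domain scores (each assumed to be on a 0-100 scale)."""
    missing = set(DOMAIN_WEIGHTS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return sum(DOMAIN_WEIGHTS[d] * domain_scores[d] for d in DOMAIN_WEIGHTS)


def combined_evaluator_score(preceptor_score, coordinator_score):
    """Blend the two evaluators' scores (preceptor 80%, coordinator 20%)."""
    return 0.80 * preceptor_score + 0.20 * coordinator_score
```

For example, domain scores of 90, 80, 70, and 60 give a weighted score of 79.5, and evaluator scores of 80 (preceptor) and 90 (coordinator) blend to 82.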

Procedures
This study was approved by the institutional review board for studies with human subjects from Chang Gung Memorial Hospital (IRB 99-2245B). The principal investigator (PI) explained the purpose of the study to potential participants 3 months before the last-mile practicum began, after which content and procedures related to the OSCE were posted on the nursing department website. One week after the PI provided this explanation, informed consent was obtained from all potential participants.

Preparation for the Last-Mile Practicum
The nursing school participating in this study offers a 1056-hour practicum on fundamental, medical-surgical, obstetric, pediatric, psychiatric, and community health nursing, and a "last-mile" practicum. The last-mile practicum, i.e., the final 6-week clinical practicum, usually commences at the beginning of the second semester of a nursing student's final year. During the last-mile practicum, each student is assigned a preceptor (a senior unit staff member) for individual guidance; this arrangement is the preceptorship. Students and incumbents have similar roles and workloads, except for the additional preceptor supervision. Additionally, a teacher coordinates for administrative purposes, communicating with the preceptor, student, and head nurse to ensure that the practicum runs smoothly. The teacher also discusses case report homework, clinic-related problems, and problem-solving skills, and addresses other needs with students in a weekly 4-hour class. Students' final clinical performance scores are given by the preceptor and teacher coordinator.
To build students' clinical competence, the university provides nursing students with self-learning tools in fundamental and medical-surgical skills (e.g., catheterization, venipuncture, dressing changes, and intravenous injection) using high-fidelity simulators (e.g., Resusci-Anne, Laerdal™, and SimMan™) in clinical competence centers (CCC). Alternatively, students can view standard operating procedures for these skills through the multimedia-on-demand platform on the university's intranet system. Both groups received the same preparation prior to the last-mile practicum.

Implementation of the Objective Structured Clinical Evaluation
Brannick, Erol-Korkmaz and Prewett [28] conducted a meta-analysis describing the effects of number of stations, number of raters, competence dimensions, and measurement purpose. The overall (summary) mean alpha across stations was 0.66; the overall mean alpha within stations across items was 0.78. Further, more stations tended to produce higher reliability. Specifically, the unweighted mean alpha for OSCEs with <10 stations was 0.56, but 0.74 for OSCEs with >10 stations. In Taiwan, the OSCE was adopted by institutions in 2006, and it has been mandated as a national medical board examination since 2012. In addition, several healthcare disciplines, including Chinese medicine, dentistry, pharmacology, and nursing, have adapted it to objectively assess students' clinical competence.
The nursing discipline has not adopted the OSCE in its national board examination, and thus the OSCE designed for this study was adapted from the 12-station version (120 min) used for Taiwanese medical students. To fully reflect multidimensional clinical competences, this OSCE procedure included six key dimensions: knowledge application and skills practice (five stations), problem identification (three stations), patient and family education (one station), clinical documentation (one station), ethics and patient safety (one station), and communication skills (one station).
Two important core competences for nurses, communication skills and ethical issues, were unobtrusively incorporated into each OSCE station. The 12-station OSCE included eight stations using SPs and four using manikin scenarios. Each station took 10 min to complete: 1 min for reviewing the exam question, 1 min for moving around stations, and 8 min for completing task components. The OSCE was conducted 3 days prior to the last-mile practicum at the clinical competence center (CCC) in the nursing department, a setting accredited by the Taiwanese Association of Medical Education and qualified for the national medical board examination.
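The station timing can be checked with a brief Python sketch. The per-station time budget and the number of stations come from the text; the circular rotation order in `rotation` is a hypothetical scheme included only to show how 12 students could occupy 12 stations simultaneously, since the actual rotation used is not described.

```python
STATIONS = 12

# Time budget per station in minutes, as described for this OSCE.
MINUTES_PER_STATION = {
    "review exam question": 1,
    "move between stations": 1,
    "complete task components": 8,
}


def circuit_minutes(n_stations=STATIONS):
    """Total time for one student to complete every station."""
    return n_stations * sum(MINUTES_PER_STATION.values())


def rotation(start_station, n_stations=STATIONS):
    """Station order for a student who begins at `start_station` (1-based).

    Assumes (hypothetically) a simple circular rotation, so that students
    starting at different stations never occupy the same station at once.
    """
    return [((start_station - 1 + step) % n_stations) + 1
            for step in range(n_stations)]
```

Under these figures, one full circuit takes 12 × 10 = 120 min, which matches the total duration reported for the OSCE.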
Ten non-nursing students and school staff from the university, trained by experienced faculty members, were recruited as SPs. At each examination station, a checklist with 18-30 items was developed to assess desired behaviors or competences. Another two expert faculty members reviewed the checklist for accuracy and thoroughness. Prior to the formal OSCE, two junior students completed a 12-station OSCE to ensure the clarity of questions and sufficiency of allotted time at each station. Further, all 24 raters (two raters for each station) attended the pilot run to familiarize themselves with all OSCE procedures and reach consensus on checklist scoring.
The total duration of the OSCE was 120 min per student, and it was conducted in morning and afternoon sessions. Prior to the OSCE, a co-author explained the OSCE process and a staff member guided the first 12 students to their designated stations. After completing the OSCE, the students completed the PSS and a co-author shared their most common mistakes and major errors during the OSCE. To reduce contamination, students were not allowed to leave until all 38 had completed the OSCE.
Students in Group C received the standard last-mile practicum preparatory information and training, but not the 12-station OSCE. Therefore, they completed the PSS 1 week prior to the last-mile practicum. Both groups completed the PSS 1 week after the last-mile practicum.

Data Analysis
Data were analyzed using SPSS 17.0 for Windows (SPSS Inc., Chicago, IL, USA), and the threshold for statistical significance was set at p < 0.05. Prior to data analysis, we confirmed that data were appropriate for the chosen statistical analyses. Descriptive statistics, independent and paired t-tests, analyses of variance (ANOVA), and analyses of covariance (ANCOVA) were conducted to test for differences between groups and a potential main (group) effect on practicum performance after controlling for covariates. Cronbach's alpha, Pearson's r, and Cohen's kappa were used to examine the internal consistency of each station and inter-rater reliability.
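For readers unfamiliar with Cronbach's alpha, the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), can be sketched in Python as follows. The analyses in this study were run in SPSS; this stand-alone function, its name, and its input layout (one row of checklist item scores per student) are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one station.

    `scores` is a list of rows, one per student; each row holds that
    student's score on every checklist item of the station.
    """
    if len(scores) < 2:
        raise ValueError("at least two students are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two checklist items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every student needs a score for every item")

    # Sample variance of each item across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the students' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

When every item ranks the students identically, the function returns 1.0; as items agree less, the sum of item variances approaches the total variance and alpha falls toward 0.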

Demographics
All 92 participants were female, aged between 21.13 and 25.58 years (M = 22.00 ± 0.56). More than half (n = 56, 60.90%) were completing their practicum in a medical center with more than 5000 beds. The general ward was the most common practicum unit (n = 42, 45.70%), followed by emergency or intensive care (n = 39, 42.4%) and operating room (n = 11, 12.00%) (Table 1). In testing for homogeneity between groups, GPA for the previous 3 years was found to be significantly lower in Group O than in Group C (t = −2.09, p = 0.039) after applying Levene's test for equality of variances. There were no significant differences between groups on the six subscales of practicum stress before the practicum.

Practicum Stress
A univariate ANCOVA was used to examine whether there was a main group effect on the level of posttest practicum stress after the last-mile practicum. The analysis controlled for three potential confounding variables (level of pretest practicum stress, clinical performance, and GPA for the previous 3 years) to adjust for baseline and homogeneity differences between groups (Table 1). There was a significant group effect on last-mile practicum stress (F (1, 87) = 8.86, p = 0.004), with lower stress in Group O (1.36 ± 0.57) than in Group C (1.59 ± 0.54). Pretest practicum stress (F (1, 87) = 8.39, p = 0.005) and practicum performance (F (1, 87) = 20.00, p < 0.001) significantly affected posttest practicum stress. Further analysis of the six stress-inducing factors, controlling for the above three covariates, indicated that the decrease in stress from "taking care of patients", "homework and workload", and "professional knowledge and skills" for Group O was significantly greater than it was for Group C: F (1, 87) = 8.38, p = 0.005; F (1, 87) = 16.61, p < 0.001; and F (1, 87) = 7.30, p = 0.008, respectively (Table 2).

Practicum Performance
We employed a univariate ANCOVA to examine whether there was a group effect on practicum performance after controlling for GPA for the previous 3 years and level of pretest practicum stress. Results indicated there was no group effect (F (1, 87) = 2.63, p = 0.108) on practicum performance, but GPA for the previous 3 years was found to significantly affect practicum performance (F (1, 87) = 5.74, p = 0.019).

Internal Consistency of OSCE Stations and Inter-Rater Reliability
Each OSCE station can be considered analogous to a scale or domain of an inventory. Thus, Cronbach's alpha, a measure of internal consistency reliability, was appropriate for examining whether students in identical scenarios consistently produced similar results, and it was calculated for each station to ensure internal consistency within each checklist. Acceptable internal consistency is indicated by a Cronbach's alpha value between 0.60 and 0.70, good internal consistency by values between 0.70 and 0.90, and excellent by values above 0.90 [29]. Internal consistency for stations 1, 3, 4, and 7-12 was acceptable to good (0.63-0.80), but poor for stations 5 and 6 (0.53-0.59), and unacceptable for station 2 (0.22). Regarding inter-rater reliability, an independent samples t-test was conducted to examine whether there were differences between the two raters at each of the 12 stations. There were no inter-rater differences across the 12 stations (ps > 0.005). Prior to exploring the relationship between any two raters for each participant's exam score at each station, the Shapiro-Wilk test was used to determine the normality of the mean difference scores at each station. The normality assumption was found to be violated at stations 6, 7, 9, and 11, necessitating the use of a non-parametric test, Spearman's rho, to test the relationship between the two raters at these stations (Spearman's ρs = 0.97). Pearson's r was used to explore the relationship between any two raters for each participant's exam score for the remaining stations. Significant correlations were observed across the other 8 stations (rs = 0.76-1.00). We then created a four-level overall performance scoring system by regrouping the 0-100 numeric scores into "excellent" (≥90), "good" (≥80 and <90), "average" (≥60 and <80), and "failing" (<60). Cohen's kappa was used to examine inter-rater reliability between the two raters for these categorized data, and the values ranged from 0.70 to 1.00 (ps < 0.001), except for station 10, indicating substantial agreement across almost all stations (Table 3).
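The categorization step and the kappa statistic can be sketched in Python as follows. The four score bands are those given above; the function names and the plain unweighted form of Cohen's kappa are assumptions for illustration, as the study computed its statistics in SPSS.

```python
from collections import Counter


def performance_level(score):
    """Map a 0-100 station score to the four-level scale described above."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 60:
        return "average"
    return "failing"


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of students")
    n = len(rater_a)

    # Proportion of students placed in the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")

    return (observed - expected) / (1 - expected)
```

As a usage example, two raters' numeric scores for the same students would first be mapped with `performance_level` and the resulting label lists passed to `cohens_kappa`; identical labels across several categories yield a kappa of 1.0.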

Practicum Stress
The reduction in last-mile practicum stress levels was significantly greater for Group O (from 2.09 ± 0.71 to 1.36 ± 0.57) than for Group C (from 2.06 ± 0.53 to 1.59 ± 0.54). However, of the three covariates, level of pretest practicum stress and practicum performance significantly affected posttest practicum stress. Decreases in three stress-causing factors, namely, "taking care of patients", "homework and workload", and "professional knowledge and skills", were significantly greater in Group O. In the present study, "homework and workload" was rated as the most stressful dimension prior to the clinical practicum over "relationships with clinical staff," a finding that differed from most previous studies [14] [15]. However, it was consistent with Evans and Kelly [12], who indicated level and intensity of academic workload as primary stressors. Considering that the last-mile practicum (one preceptor for each student) differs completely from students' previous clinical experiences (one clinical instructor for every seven students) and offers new sources of stress, it is understandable that students from both groups might find working independently to be difficult. "Taking care of patients" and "professional knowledge and skills" were among the top three stressors before and after the practicum for both groups, consistent with three previous studies on sources of stress associated with clinical practice [26] [30] [31]. However, Group O had lower levels of stress associated with the three stress-causing factors than did Group C. This may have been facilitated by the increased confidence, communication skills, and techniques fostered by the preparatory OSCE evaluation [19] [23] [24].

Practicum Performance
There were no significant differences between the two groups in practicum performance grades after controlling for GPA and pre-practicum stress level. This might be because 1) practicum performance grades were scored by different preceptors with a wide degree of variation; 2) tasks completed during the practicum differed considerably, with different areas (e.g., operating room, intensive care unit, emergency department, general ward, rehabilitation unit, and nursing home) entailing very different tasks, workload demands, and work environments; and/or 3) the student-preceptor relationship is an important determinant of stress during one's practicum [30]-[32], and frustrating or difficult relationships might result in student stress, dissatisfaction in the nursing profession, and/or interference with learning and clinical performance [15]. The present study did not establish predictive validity; the practicum performance grades of students who performed the OSCE did not differ significantly from those of the control group. Thus, this study indicates that overall OSCE performance may not fully reflect clinical competence as measured by practicum performance, and it may not predict the transferability of nursing knowledge and skill learned from in-class settings to real clinical situations [28]. Future studies might consider employing measurements that specifically examine clinical competence with direct observation, including assessments and a portfolio using a valid nurse competency inventory [1] [5].

Quality of the OSCE
The overall mean alpha across stations was 0.64 (0.22-0.80), which was lower than the 0.78 reported by Brannick et al. [28]. The unstable internal consistency of OSCE stations might be due to an excess of checklist items (18-30 per station) and complex scoring systems, particularly at the skill practice station.
Most of the literature treats an inter-rater coefficient of γ = 0.6 as indicating good agreement [20]; accordingly, the inter-rater reliability of 0.76-1.00 for all 12 stations in this study was regarded as good. This was consistent with one study [33] (history-taking, 0.73-0.96; neurological physical examination abilities, 0.84 -0.8) and better than another (0.53-0.96) [34].
In conclusion, implementation of the OSCE prior to the beginning of the last-mile practicum in student nurses had significant effects on reducing practicum stress. However, the effects of the OSCE on clinical competence were limited. From measures of both educators and students, however, the OSCE appears to be a valid, reliable, and comprehensive instrument for assessing nursing skills in conjunction with clinical practice and evaluation. Still, OSCE evaluation cannot replace experience gained from clinical settings [7]. Genuine clinical experiences have always been a key component of healthcare education, and the OSCE can supplement evaluation in nursing education. Future studies might consider exploring cost-effective and efficient methods of incorporating the OSCE into the nursing curriculum for the improvement of evaluation.

Limitations and Suggestions
This study had a relatively small sample of nursing students and was geographically restricted to one teaching university in northern Taiwan, perhaps limiting its generalizability. Future randomized clinical trials are recommended to decrease sampling bias. Given that competence will remain key in nursing practice and education for the foreseeable future, multidimensional measurement of clinical competence is recommended. Due to the accumulative nature of clinical competence, future studies may implement a longitudinal design to lengthen the observation time period and implement the OSCE at the beginning of the BSN program rather than in the final year.
In terms of the quality of the OSCE in this study, an excess of checklist items per station, different scoring criteria among stations, and the resulting wide range of scores may have reduced indices of content validity. In future studies, we suggest establishing more widely applied standard OSCE protocols, such as <15 checklist items per station and rules for awarding and deducting points for answers. Furthermore, the cost of implementing the OSCE is considerable. Cost can vary depending on the number of stations, raters per station, and SPs, yet the OSCE remains a costly evaluation tool compared with traditional paper and pencil tests. Despite these limitations, cost-effective incorporation of the OSCE into the nursing curriculum could help build clinical competence and create links between learning and performance in student nursing assessments.

Table 2. Comparison of mean stress scores before and after last-mile practicum.

Table 3. Content validity and inter-rater reliability among the 12 stations.