Validation of the Physical Education Teacher’s Efficacy for Standards-Based Instruction (ESBI) Scale

The main purpose of this study was to determine the validity and reliability of the Efficacy for Standards-Based Instruction (ESBI) scale, developed by the current investigators, and to compare the ESBI with two other self-efficacy scales that have been used in physical education (TESPE, Chase, Lirgg, & Carson, 2001; TSES, Tschannen-Moran & Hoy, 2001). The ESBI, TESPE, and TSES were administered to 60 physical education teachers from 16 school districts in Iowa. Cronbach’s alpha (internal consistency) for the ESBI was .96, and the equal-length Spearman-Brown split-half coefficient indicated good reliability (r = .90). The ESBI demonstrated better validity and reliability than the previously developed TESPE (Cronbach’s alpha = .89; Spearman-Brown split-half coefficient = .86) and TSES (Cronbach’s alpha = .84; Spearman-Brown split-half coefficient = .79). As a test of concurrent validity for the ESBI, Pearson product-moment correlations were computed to test the extent to which the total efficacy scores and subscales were related. The ESBI, TESPE, and TSES were all significantly and positively correlated with each other (p < .01). The three self-efficacy scales were also validated against an independent measure, the ranked Physical Education Curriculum Analysis Tool (PECAT) score for each district. The ESBI produced a low but significant correlation with the PECAT (r = .28, p < .05), whereas the correlations for the TSES and TESPE were not significant. This suggests that the ESBI is more closely related to standards and benchmarks than the other two measures. These results indicate that the ESBI shows validity and reliability as good as the TESPE and better than the TSES. This work also supports Bandura’s (1986) notion of specificity for self-efficacy.


Introduction
A well-defined line of research has examined the role of self-efficacy in teaching and learning environments. Self-efficacy, a construct originated by Bandura (1977), has been defined as the "belief in one's capabilities to organize and execute the courses of action required to produce given attainments." Bandura hypothesized that self-efficacy influences how strongly individuals adhere to specific goals and how long they persist in trying to achieve them.
Although there is a growing body of research on the construct of teacher efficacy in general education settings, only limited research can be found in physical education (Martin & Kulinna, 2003). An instrument that measures teachers' efficacy beliefs within physical education settings is therefore needed. A number of factors make measuring teacher self-efficacy difficult. Researchers have questioned the validity and reliability of existing measures (Tschannen-Moran & Hoy, 2001; Henson, Kogan, & Vacha-Haase, 2001). For example, disagreement over the conceptualization of teacher efficacy has contributed to a lack of clarity in measuring the construct. Research on teacher self-efficacy has been "plagued" by methodological and conceptual shortcomings (Bandura, 1997; Woolfolk & Hoy, 1990). Ross's (1994) meta-analysis, for example, found that virtually all 87 studies he examined treated teacher efficacy as a generalized expectancy, contrary to the domain- and task-specific conceptualization of self-efficacy. Additionally, self-efficacy has been inadequately assessed with one-item scales that fail to achieve correspondence between the self-efficacy measure and the behavior of interest (Bandura, 1997). These shortcomings suggest that connecting school district curriculum maps to a valid, domain-specific measure of teacher self-efficacy could be valuable.
Definitions of teacher self-efficacy (e.g., Hoover-Dempsey, Bassler, & Brissie, 1987; Hoy & Woolfolk, 1993) have also confounded self-efficacy with outcome expectations and locus of control (Guskey & Passaro, 1994), making it difficult to substantiate conclusions in this area. Therefore, reports that teacher self-efficacy is positively related to perceptions of parental involvement (e.g., home tutoring; Hoover-Dempsey et al., 1987), administrative attention and support (Ashton & Webb, 1986; Chester & Beaudin, 1996), colleague collaboration (Chester & Beaudin, 1996; Hoy & Woolfolk, 1993), and a rigorous academic climate (Woolfolk & Hoy, 1990) must be viewed with caution. Bandura (1997) has argued that education needs sound self-efficacy measures grounded in the theoretical underpinnings of Social Cognitive Theory.
Questions remain about the extent to which teachers' efficacy is specific to a given context and the extent to which efficacy beliefs transfer across contexts. Teachers' sense of efficacy is not necessarily uniform across subjects. Bandura (1997) contends that teacher efficacy scales should be linked to the various knowledge domains. Multi-item measures are an improvement over single-item ones, but teacher efficacy scales are, for the most part, still cast in a general form rather than tailored to the domains of instructional functioning. In addition, the appropriate level of specificity for measuring teacher efficacy has been difficult to determine. Although the Gibson and Dembo (1984) measure has been the most popular teacher efficacy instrument to date, both conceptual and statistical problems remain. The unclear meaning of its factors and the instability of its factor structure make this instrument problematic for researchers (Tschannen-Moran & Hoy, 2001).
When measuring teacher efficacy, Bandura (1997) recommended including various levels of task demands, allowing respondents to indicate the strength of their efficacy beliefs in light of a variety of impediments or obstacles, and providing a broad range of response options. The Teachers' Sense of Efficacy Scale (TSES; Tschannen-Moran & Hoy, 2001) was developed with a scale similar to that of Gibson and Dembo (1984) and included portions of Bandura's scale. The factor structure, reliability, and validity of this measure have been examined in three separate studies, and the results indicate that both the long and short versions of the TSES can be considered reasonably valid and reliable. Positive correlations with other measures of personal teaching efficacy (r = .64, p < .01) provide evidence for construct validity (Tschannen-Moran & Hoy, 2001). The Gibson and Dembo (1984) instrument focuses on coping with student difficulties and disruptions and on overcoming the impediments posed by an unsupportive environment. It lacks items assessing teachers' support of student thinking, effectiveness with capable students, creativity in teaching, and flexible application of alternative assessment and teaching strategies. The TSES addresses some of these limitations by including items that assess a broader range of teaching tasks, as advocated by Bandura (2001).
In 1999, Chase and Lirgg received the AAHPERD Research Grant Program Award for their development of the Teacher Efficacy Scale in Physical Education (TESPE). The scale is based on what the researchers identify as the four dimensions of physical education teacher efficacy: motivation, analysis of skills, preparation, and communication. Although documented use of the TESPE is severely limited, these dimensions are important variables in preparing physically educated students. Chase and Lirgg theorized that teacher efficacy would affect a teacher's commitment to teaching, persistence in teaching, use of instructional time, and the quality and type of feedback provided to students. To test this model, 16 preservice teachers completed the TESPE and were videotaped teaching one physical education lesson (Chase, Lirgg, & Sakelos, 2003). Results of a one-way analysis of variance of instructional time and feedback quality indicated differences between teachers with high and low teacher efficacy. Teachers with high efficacy provided more Academic Learning Time than teachers with low efficacy (82% vs. 76%). They also gave more specific reinforcement (M = 15.20 vs. 7.00), general encouragement (M = 3.20 vs. 1.80), specific informational feedback (M = 15.20 vs. 7.60), and general organization feedback (M = 22.40 vs. 19.80), and less general punishment feedback (M = .40 vs. 2.00). Overall, teachers with high efficacy were more positive in their feedback to students than teachers with low efficacy.
Teachers are critical in determining the activities children engage in during physical education classes. They can decide to implement curricula and teach lessons that focus on social skills, sport skills, or health-related fitness. The choices teachers make about day-to-day lesson content affect children during class (Martin & Kulinna, 2003). Self-efficacy is an important teaching variable but is difficult to assess (Bandura, 1997). Appropriate assessment of self-efficacy must be context specific and include various levels of task demands (Bandura, 2001). The Efficacy for Standards-Based Instruction (ESBI) scale developed for this study is an important contribution to physical education research because it is based on Bandura's guidelines. The ESBI could offer a much-needed, theoretically sound instrument that yields valid and reliable scores for assessing physical education teachers' self-efficacy for teaching quality lessons based on the national standards. Therefore, the primary objective of this research was to test the validity of the ESBI for physical education teachers. A secondary objective was to analyze the validity of the TSES and TESPE.

Teacher Efficacy Scale in Physical Education (TESPE)
Three measures of self-efficacy were taken. The Teacher Efficacy Scale in Physical Education (TESPE) was used to assess how confident each teacher feels that he or she can positively affect students' learning (Chase, Lirgg, & Carson, 2001). The TESPE consists of 16 items on four dimensions of teacher efficacy: motivation, analysis of skills, preparation, and communication. Each item is rated on a 7-point Likert scale from 1 (no confidence) to 7 (extremely confident) and follows the stem "How sure are you in your ability to…" The TESPE was administered to physical education teachers at two time points: baseline (Time 1) and the conclusion of a six-week intervention in virtual space (Time 2). Previous research has presumed the validity and reliability of this measure but has not reported the supporting data (Chase et al., 2001; Magyar, Guivernau, Gano-Overway, Newton, Kim, Watson, & Fry, 2007).

Teacher's Sense of Efficacy Scale (TSES-Short Form)
To account for the currently unknown construct validity of the TESPE, a second measure of self-efficacy was used: the Teachers' Sense of Efficacy Scale (TSES, short form; Tschannen-Moran & Hoy, 2001). Sample items include "How much can you gauge student comprehension of what you have taught?" and "To what extent are you able to tailor your lessons to the academic level of your students?" Each item is rated on a 9-point scale with anchors at 1 (nothing), 3 (very little), 5 (some influence), 7 (quite a bit), and 9 (a great deal). Concurrent validity of the short and long forms has been established by correlating these measures with other existing measures of teacher efficacy.

Efficacy for Standards-Based Instruction (ESBI)
The third self-efficacy tool, the Efficacy for Standards-Based Instruction (ESBI) scale, was created for this study to measure physical education teachers' self-efficacy for curricular decisions, because no such measure existed. The ESBI (Figure 1) consists of 20 items devised from the specific objectives of the Physical Education Curriculum Analysis Tool (PECAT); the items rate physical educators' confidence in their ability to align district standards, benchmarks, lessons, and assessments and to relate these to the national physical education standards. Following Bandura's (1995) guidelines, the strength of teacher efficacy beliefs was recorded on a 100-point scale in 10-unit intervals, ranging from 0 ("Cannot do"), through intermediate degrees of assurance such as 50 ("Moderately certain to do"), to complete assurance at 100 ("Certain can do").
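As an illustration of the response format just described, the sketch below checks one respondent's ESBI answers against the 0-100, 10-unit scale and reduces them to a single efficacy-strength score. The function and constant names are hypothetical, and averaging the 20 items (rather than summing them) is an assumption; the scoring rule used in this study is not stated.

```python
from statistics import mean

# Response options described for the ESBI: 0 ("Cannot do") to 100 ("Certain can do")
# in 10-unit intervals.
VALID_RESPONSES = set(range(0, 101, 10))
N_ITEMS = 20  # the ESBI has 20 items


def score_esbi(responses):
    """Return one respondent's ESBI efficacy-strength score on the 0-100 metric.

    ASSUMPTION: the score is the mean of the 20 item ratings; the paper does
    not say whether items were summed or averaged.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    invalid = [r for r in responses if r not in VALID_RESPONSES]
    if invalid:
        raise ValueError(f"responses must be 0-100 in steps of 10: {invalid}")
    return float(mean(responses))


# Hypothetical respondent: "Certain can do" on 10 items, 50 on the other 10.
print(score_esbi([100] * 10 + [50] * 10))
```

Averaging keeps the total on the same 0-100 metric as the individual items, which makes scores easy to read against the response anchors; a summed score would carry the same information on a 0-2000 range.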

Procedures
District superintendents were sent a letter (Appendix A) requesting permission to contact the physical education teachers in their district and invite them to attend a physical education workshop. After each district superintendent granted approval, the university's Institutional Review Board approved the study. All elementary, middle school, and high school physical education teachers in each district with administrator approval were invited to complete the series of questionnaires. The three self-efficacy instruments (TSES, TESPE, and ESBI) were administered at the beginning of a professional development workshop for physical education teachers. All participants completed an informed consent form.

Statistical Analysis
The data were analyzed using the Statistical Package for the Social Sciences (SPSS, version 17; Chicago). Spearman's correlation coefficients were used to determine the concurrent validity of ESBI scores, with the TESPE and TSES as criterion measures. Baseline ESBI scores were used to exclude possible bias resulting from increased awareness of self-efficacy or a learning effect.
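The analysis plan names Spearman's coefficients, while the Results report Pearson product-moment correlations. The sketch below computes both on the same hypothetical total scores (not data from this study), showing that Spearman's rho is simply Pearson's r applied to ranks, with tied scores given their average rank. The function names are illustrative, not SPSS procedures.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores for five teachers (not data from this study).
esbi = [62, 75, 80, 55, 90]
tespe = [4.8, 5.5, 5.1, 4.2, 6.3]
print(round(pearson(esbi, tespe), 3), round(spearman(esbi, tespe), 3))
```

Because Spearman's rho depends only on rank order, it is less sensitive than Pearson's r to skewed totals, which is one reason to inspect the skewness and kurtosis statistics before choosing between them.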
Internal consistency was examined with Cronbach's alpha, a widely used method based on the variance among parts of a test. Reliability coefficients for each dimension were calculated using two-way analysis of variance. Discriminant validity was tested by comparing each item's correlation with its own scale to its correlations with the other scales. A Pearson product-moment correlation matrix was used to analyze inter-item correlations, item-total correlations, and correlations between subscales.
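The reliability and discriminant-validity checks described above can be sketched as follows. Cronbach's alpha is computed from item and total-score variances; the two-way (persons x items) ANOVA coefficient, often attributed to Hoyt, is algebraically identical to alpha, which is one reading of "reliability coefficients ... calculated by two way analysis of variance." The discriminant check compares an item's corrected item-total correlation with its correlation to another scale's total. All names and data are hypothetical, and this is not the SPSS implementation.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(data):
    """data: one list of k item scores per respondent."""
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def hoyt_reliability(data):
    """Reliability from a persons x items two-way ANOVA: 1 - MS_residual / MS_persons."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    ss_persons = k * sum((mean(row) - grand) ** 2 for row in data)
    ss_items = n * sum((mean(r[j] for r in data) - grand) ** 2 for j in range(k))
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons


def corrected_item_total(data, j):
    """Correlation of item j with the sum of the other items in its own scale."""
    return pearson([row[j] for row in data], [sum(row) - row[j] for row in data])


def discriminant_ok(own_scale, other_scale, j):
    """True if item j correlates more with its own scale than with another scale."""
    other_totals = [sum(row) for row in other_scale]
    item = [row[j] for row in own_scale]
    return corrected_item_total(own_scale, j) > pearson(item, other_totals)


# Hypothetical 7-point responses: 6 respondents x 4 items on one subscale,
# and the same respondents' answers on a 2-item second subscale.
scale_a = [[5, 6, 5, 6], [3, 4, 3, 3], [6, 7, 6, 7],
           [4, 4, 5, 4], [2, 3, 2, 3], [5, 5, 6, 5]]
scale_b = [[3, 2], [5, 6], [2, 4], [6, 5], [4, 4], [1, 3]]
print(round(cronbach_alpha(scale_a), 3), round(hoyt_reliability(scale_a), 3))
print(discriminant_ok(scale_a, scale_b, 0))
```

Both reliability functions should print the same value, since 1 - MS_residual/MS_persons reduces to the same expression as the variance-based alpha formula.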

Results
The descriptive statistics (means, standard deviations, skewness, and kurtosis) for the ESBI, TESPE, and TSES instruments are shown in Table 1.

ESBI
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy for the ESBI was .87, and Bartlett's test of sphericity was significant (χ² = 1239.69, df = 190, p < .01). These results indicate that the data were suitable for factor analysis: the KMO value shows that the items share substantial common variance, and the significant Bartlett's test shows that the item correlation matrix differs from an identity matrix (i.e., the items are not uncorrelated). Cronbach's alpha was .96. Guttman's split-half coefficient and the equal-length Spearman-Brown split-half coefficient both indicated good reliability (r = .90). These coefficients indicate that the two test halves are highly consistent and that the questionnaire has good internal consistency.
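The two factorability checks reported above can be sketched with NumPy under the standard textbook formulas: Bartlett's statistic uses the log-determinant of the item correlation matrix R, and the overall KMO compares squared correlations with squared partial (anti-image) correlations taken from the inverse of R. The data here are simulated, not the study's responses, and the function names are hypothetical. A p-value can then be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(chi_square, df)`.

```python
import numpy as np


def bartlett_sphericity(data):
    """Bartlett's test of H0: the item correlation matrix is an identity matrix.

    data: array of shape (n_respondents, p_items). Returns (chi_square, df).
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log|R|, numerically stable
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi_square, df


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # anti-image (partial) correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Simulated data: 60 respondents, 5 items driven by one shared factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(60, 1))
items = factor + 0.7 * rng.normal(size=(60, 5))
print(bartlett_sphericity(items), round(kmo(items), 2))
```

KMO approaches 1 when partial correlations are small relative to the raw correlations, that is, when the items' shared variance is largely common to the whole set rather than specific to pairs of items.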

TESPE
The KMO measure of sampling adequacy for the TESPE was .81, and Bartlett's test of sphericity was significant (χ² = 528.11, df = 120, p < .01), indicating that the data were suitable for factor analysis. Cronbach's alpha was .89 (see Table 2). Guttman's split-half coefficient indicated good reliability (r = .86), as did the equal-length Spearman-Brown split-half coefficient (r = .87).
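The two split-half coefficients can differ, as they do for the TESPE (.86 vs. .87). The sketch below computes both from the same item split. It assumes a first-half versus second-half split with any extra item in the first half, which may not match the split SPSS used here. The equal-length Spearman-Brown coefficient steps up the correlation between halves, 2r/(1 + r). Guttman's coefficient, 2[1 - (var_a + var_b)/var_total], uses the half variances directly. The function names and data are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half(data):
    """Return (r_halves, spearman_brown, guttman) for a first/second-half split.

    ASSUMPTION: items are split into first and second halves; with an odd
    item count the first half gets the extra item.
    """
    k = len(data[0])
    cut = (k + 1) // 2
    a = [sum(row[:cut]) for row in data]
    b = [sum(row[cut:]) for row in data]
    r = pearson(a, b)
    spearman_brown = 2 * r / (1 + r)  # equal-length Spearman-Brown step-up
    totals = [x + y for x, y in zip(a, b)]
    guttman = 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(totals))
    return r, spearman_brown, guttman


# Hypothetical 4-item responses in which the two halves have equal variance.
data = [[1, 2, 2, 2], [3, 3, 4, 5], [4, 5, 3, 3], [2, 2, 1, 2]]
print(split_half(data))
```

When the two halves have equal variances, the coefficients coincide. Otherwise, with a positive covariance between halves, Guttman's value is the smaller of the two, which is consistent with the small gap reported for the TESPE.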

TSES
The KMO measure of sampling adequacy for the TSES was .80, and Bartlett's test of sphericity was significant (χ² = 268.82, df = 66, p < .01), indicating that the data were suitable for factor analysis. Cronbach's alpha was .84. Guttman's split-half coefficient and the equal-length Spearman-Brown split-half coefficient both indicated good reliability (r = .79).
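The degrees of freedom reported for the three Bartlett's tests follow directly from the number of items p in each scale: 20 for the ESBI, 16 for the TESPE, and 12 for the TSES short form. In the statistic, n is the number of respondents and |R| is the determinant of the item correlation matrix.

```latex
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad \mathrm{df} = \frac{p(p-1)}{2}

\text{ESBI: } \frac{20 \cdot 19}{2} = 190, \quad
\text{TESPE: } \frac{16 \cdot 15}{2} = 120, \quad
\text{TSES (short form): } \frac{12 \cdot 11}{2} = 66
```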

Pearson Correlation Analyses
Pearson product-moment correlations were computed to test the extent to which the total efficacy scores were related (see Table 3). The ESBI, TESPE, and TSES were all significantly and positively correlated with each other (p < .01). Pearson product-moment correlations were also computed among the subscale scores of the ESBI, TESPE, and TSES. The subscales of each of the three self-efficacy measures were significantly intercorrelated (see Tables 4-6). The ESBI, TESPE, and TSES were also validated against an independent measure, the ranked Physical Education Curriculum Analysis Tool (PECAT) score for each district (see Table 7). The PECAT was developed by the Centers for Disease Control and Prevention (CDC, 2006) to assess how closely district physical education curricula align with national standards. The ESBI produced a low but significant correlation with the PECAT (r = .28, p < .05), whereas the TSES and TESPE correlations were not significant.
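Whether a correlation as small as r = .28 reaches significance depends on the number of cases it is computed over. The sketch below assumes the ESBI-PECAT correlation was computed across all 60 teachers, each assigned their district's ranked PECAT score. The paper does not state the unit of analysis, so this is an assumption. The p-value uses Fisher's z approximation rather than the exact t distribution, and the function names are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def approx_p_from_r(r, n):
    """Approximate two-tailed p-value using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# ASSUMPTION: the ESBI-PECAT correlation was computed over all 60 teachers,
# each assigned the ranked PECAT score of their district.
print(round(t_from_r(0.28, 60), 2), round(approx_p_from_r(0.28, 60), 3))
# For comparison, the same r computed over only the 16 districts.
print(round(approx_p_from_r(0.28, 16), 3))
```

Under the 60-teacher assumption the result falls below .05, consistent with the reported p < .05. If the same r had been computed across only the 16 districts, it would not reach significance.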

Discussion
The purpose of this study was to test the internal consistency and validity of the ESBI self-efficacy instrument for physical education teachers and to compare it with previous self-efficacy scales used in physical education. This work shows that the ESBI performs as well as the TESPE and better than the TSES in validity and reliability. Based on the current findings, researchers can be confident that the ESBI produces valid and reliable scores when used to assess physical education teachers' self-efficacy for standards-based instruction. These results strongly support the validity and reliability of the ESBI and provide some, though weaker, support for the use of the TESPE and TSES.
The ESBI subscales tended to maintain stronger score integrity than the TESPE and TSES subscales. This finding suggests that the TESPE and TSES may be more susceptible to measurement error (Tschannen-Moran et al., 1998). Accordingly, the use of the TSES as a measure of physical education teacher efficacy is particularly questionable. The ESBI and TESPE, both developed specifically for physical education teachers, produced better outcomes than the TSES, which supports Bandura's (1997) proposal that efficacy measures be context specific.
Of particular interest was that both the ESBI and TESPE featured a Preparation (or Planning) subscale. Intuitively, these two subscales might assess self-efficacy toward the same specific skills. An additional Pearson product-moment correlation indicated that the ESBI Preparation and TESPE Planning subscales were significantly and positively related (r = .54, p = .01), more strongly than any other pair of subscales from these instruments. Although phrased differently, both subscales include items reflecting (1) preparation of lessons that promote learning, (2) use of lesson plans with behavioral objectives, and (3) developmental appropriateness. ESBI items referred to key terms such as "alignment" and "standards and benchmarks," whereas TESPE items did not.
The analyses consistently yielded lower score reliabilities for the TSES subscales (Table 6). If the TSES is to see continued use in the study of physical education teacher efficacy, it should likely be revised with attention to measurement integrity, or be replaced by efforts to measure the outcome expectancy dimension of Bandura's (1997) Social Cognitive Theory more reliably. Validation is, however, a continuous process, so future research should examine other psychometric properties of the ESBI with larger and more diverse samples.

Table 3. Mean scores, standard deviations, and Pearson correlations for ESBI, TESPE, and TSES total efficacy scores.

Table 4. Mean scores, standard deviations, and Pearson correlations for ESBI subscales.

Table 5. Mean scores, standard deviations, and Pearson correlations for TESPE subscales.

Table 6. Mean scores, standard deviations, and Pearson correlations for TSES subscales.

Table 7. Validation of ESBI, TESPE, and TSES using PECAT scores as an independent measure.