Learning to Improve: Report of a Three-Year Capacity-Building Project Leveraging Professional Development + Coaching to Improve Third-Grade Reading Outcomes

This study reports the results of a three-year capacity-building effort to improve core reading knowledge and practice among 165 third-grade teachers working in 63 urban schools, and its effects on student reading outcomes. Teachers volunteered to participate in one or two years of professional development lasting from 90 to 180 hours. Teachers' core reading knowledge showed statistically significant growth with generally large effect sizes. Three cohorts of third-grade students taught by participating teachers were assessed on multiple measures of reading at the beginning and end of each school year. Results for within-year improvement showed large effects on all student outcomes. Analysis of the magnitude of student gains across the three years found that, for two of the four measures, gains in year one were exceeded in years two and three. Implications for professional training to facilitate improved reading outcomes are discussed.


Introduction
Calls for reading improvement have echoed for decades and include those from Flesch (1955), Anderson, Hiebert, Wilkinson, & Scott (1985), Snow, Burns, & Griffin (1998), the National Reading Panel (2000), Foorman et al. (2016), and Seidenberg (2017). Accompanying these calls are reading achievement scores The Common Core standards (2010) identify foundational skills as the reading sub-skills involved in converting print to speech and the fluent reading skills that are important to comprehension. Extending the link from language to comprehension, a recent study has found that foundational skills are critical to third-grade achievement on end-of-year state accountability assessments (Paige et al., 2019). The authors reported that students with appropriate foundational skills were seven times more likely to score proficient or better on the state reading test. Further, only one-third of the over 1000 students in the study had attained appropriate foundational skills. Using professional development and coaching to build capacity for teaching reading, the present study reports on an initiative to improve third-grade reading outcomes. This study contributes to the knowledge base of educational change through a description of the teacher training process and the measurement of the student outcome measures that detect improvement in fundamental reading processes.
The structure of this article proceeds with a review of the applicable literature including the role of teacher core and pedagogical knowledge, attempts to change and build teacher practice, and the role of coaching. The study continues with a description of the methods including details of the study context and the curriculum used to improve teacher knowledge and practice, as well as the instruments used to measure reading. In the results section, we address each of the three research questions with details of the quantitative analysis and the findings. In the discussion section, we provide our interpretations of the study findings and the contribution this study makes to the literature base.

Teacher Knowledge and Practice
The foundational reading knowledge imparted by teacher educators to their students leaves a significant imprint on how these aspiring teachers view reading education. Teacher educators also equip these students with an initial instructional toolkit that is carried with them into the classroom after graduation. However, for too many of these future teachers this toolkit is woefully inadequate.
Binks-Cantrell, Washburn, Joshi, & Hougen (2012) assessed what teacher educators understand about foundational reading knowledge. After grouping teacher educators into higher- and lower-scoring groups, the authors reported that those in the more knowledgeable group produced teacher candidates who outscored those taught by teacher educators who knew less. The authors concluded that students cannot learn what their teacher does not know, and they join others who have proposed this condition as a major contributor to poor reading outcomes in the United States (Applegate & Applegate, 2004; Seidenberg, 2017). Unfortunately, changing what is taught by teacher educators in the over 1200 schools of education in the US is more than a challenging task. For example, in a state-wide analysis of teacher data in Florida, Harris & Sass (2007) found no evidence that either teachers' undergraduate training or their academic achievement had any effect on the academic outcomes of their future students.

Changing Teacher Practice
What teachers do in the classroom matters because reading is a learned skill that must be taught, and so it follows that teacher quality impacts student outcomes (Blair, Rupley, & Nichols, 2007; Wenglinsky, 2000; Wharton-McDonald, Pressley, & Hampston, 1998). In order to be effective, reading instruction must be guided by content knowledge and efficacious instructional practices (Kennedy, 2016; Sparks & Loucks-Horsley, 1990). As in subject areas such as biology or history, there exists a core body of content knowledge that teachers must know in order to be effective reading teachers (Snow & Griffin, 2007). Reading core content includes deep knowledge of phonemic awareness, phonics, fluency, vocabulary and comprehension, as well as the fundamentals of language and its development (McCardle & Chhabra, 2004; NRP, 2000; Snow, Burns, & Griffin, 1998). In order to provide evidence-based reading instruction, teachers must not only possess core content knowledge, they must also have the ability to effectively apply that knowledge to classroom practice (Goldhaber & Anthony, 2007; McCardle & Chhabra, 2004; Moats, 2004; NRP, 2000).
An initiative to improve teacher core reading knowledge must be intentional.
After identifying what knowledge and which instructional practices best result in improved reading outcomes, the question becomes how to effectively 1) transfer this knowledge to teachers and then 2) convert that knowledge into instructional change that results in improvement (Shulman, 1986, 1987). Knowledge-to-practice transfer is not an inconsequential problem, as greater teacher knowledge is not necessarily accompanied by better practice (Reutzel, Dole, Fawson, Jones, Read et al., 2009). A compounding problem is that teachers report that 90% of professional development is not useful as some suggest it too often consists of ineffec- (Darling-Hammond et al., 2009). It has been estimated that about 15 percent of traditional "sit and get" professional development is actually implemented in the classroom, a transfer ratio that provides less than the necessary capacity to effect change (Meyer, 1988). Bush (1984) found that training describing instructional practices could be successfully adopted by just 10% of teachers; in other words, 90% gained no benefit at all. This suggests that an effective model must provide considerably more support over time as teachers struggle to implement new instructional practices (Ermeling, 2009; Fullan, 2001).
However, an ineffective delivery model may not be the single root cause of the poor return on PD. It may be, as Elmore (2000) points out, that PD does not target the content most likely to result in change to student outcomes. This may be a problem that both precedes and interacts with complaints of ineffective delivery models, as improvement experts are clear that capacity training must address the processes that will actually result in change (Bryk, 2014; Elmore, 2002; Deming, 2000).

Building Teacher Capacity
PD directly addresses the issue of capacity, which Cohen, Raudenbush, & Ball (2000) define as the teacher's knowledge, instructional skill, and material resources that combine to create the interaction among students, the content, and the teacher to result in learning. Desimone (2009) posits that effective professional development (PD) increases teacher knowledge and skill, which then leads to change in instruction that results in greater student learning. While this seems a reasonable theory of action, it has seldom been shown to actually unfold. A review of 1343 PD studies (Yoon et al., 2007) found just nine meeting the requirements of What Works Clearinghouse that resulted in significant student gains. This suggests that connecting the links recommended by Desimone is extremely difficult. Looking further into recommendations, Lewis (2009) says that PD must connect what teachers learn directly to their practice. For example, Garet et al. (2001) report that effective PD must focus not only on content knowledge, but also include opportunities for active learning integrated with instruction. Despite these recommendations, researchers have found teacher practice to be surprisingly resilient to change (Cohen, 1990; Peterson & Comeaux, 1990; Spillane & Zeuli, 1999). Unfortunately, inadequate teacher knowledge is not limited to reading, as insufficiencies have been noted across other content areas including science (Dorph et al., 2007; Luft & Hewson, 2014) and mathematics (National Council of Teachers of Mathematics, 1991). Gulamhussein (2013) recommends five criteria for effective professional development, three of which overlap with those of Desimone (2009) and two that do not. Duration of professional development is critical and should emphasize distributed practice over time. While programs providing greater duration have been found to be more successful, a question is how much is enough (Darling-Hammond, Wei, Andree, Richardson, & Orphanos, 2009 (2003) found that programs providing 80 hours of instruction were more likely to be successful than those providing less. French (1997), on the other hand, found that 50 hours of instruction, practice, and coaching was sufficient to transfer learning to instruction. Teachers must be supported during the critical process of applying new learning to the classroom. Truesdale (2003), Cornett & Knight (2009), and Atteberry & Bryk (2011) report that during the confusion and frustration that accompanies the implementation of new teaching strategies and routines, coaching can provide teachers with critical support. Active learning involves teachers in a variety of learning approaches to new concepts (Richardson, 1998; Roy & Chi, 2005). Such activities include implementation videos, role playing, reading, discussion, and modeling.
Of these activities, modeling has been viewed as most effective (Desimone et al., 2002; Garet et al., 2001; Penuel, Fishman, Gallagher, Korbak, & Lopez-Prado, 2009). The final principle states that professional development should focus on content-specific curriculum, as it is most effective at improving teacher practice and student achievement (Blank & de las Alas, 2009; Cohen & Hill, 2001; Kennedy, 1998).
In their What Works Clearinghouse review, Yoon et al. (2007) arrived at the following conclusions about what drives effective PD. First, while workshops have garnered a poor reputation for effectiveness, surprisingly, all 9 of the studies found to be effective involved workshops of some kind. Second, within-school expertise is often insufficient to facilitate and lead teachers in capacity-building initiatives aimed at student improvement. Professional development is more likely to be successful when it involves content experts from outside the building. Third, none of the 9 successful studies employed a train-the-trainer approach to professional development, which may hold potential for success but has no supporting evidence. Fourth, professional development must be distributed over time, as educators cannot quickly absorb new learning. Effective PD was found to take 30 or more hours, while implementations of shorter duration yielded no positive results. The fifth finding suggests that sustained follow-up after professional development is necessary to leverage its potential for effectiveness. Finally, there is no set of best practices for PD; rather, effective PD is constructed from a carefully considered mix of practices customized by content, process, and the context of the particular school building.

Coaching
An element now recognized across education as critical to successful adoption of new skills is teacher coaching. While there has been a considerable amount written on what authors consider to be the important characteristics and responsibilities of coaches, reports on the effectiveness of coaching have been slower to emerge (Bean, Swan, & Knaub, 2003; Dole, 2004; Vanderburg & Stephens, 2010). However, unlike the case for PD, empirical findings increasingly support the notion that coaching has a measurable, positive effect on teacher performance (Gamse, Jacob, Horst, Boulay, & Unlu, 2008; Garet et al., 2008). In a  Bryk, & Dexter (2010). Beginning with a baseline of student reading outcomes, the authors compared growth over four years and found that increases in reading achievement could be attributed to coaching, with statistically significant effect sizes of 0.22, 0.37, and 0.43 across the three years following the baseline year.
Finally, Davis, McPartland, Pryseski, and Kim (2018) found that the use of literacy coaches to assist ninth-grade teachers in the use and implementation of literacy strategies resulted in improved student reading comprehension with an effect size of 0.19.

Research Questions
The present study is part of a three-year professional development initiative to improve end-of-third-grade reading outcomes by improving teacher capacity for reading instruction from kindergarten through third grade. This study investigates changes in third-grade teacher reading knowledge as a result of PD and the resulting student reading outcomes through a focus on three research questions: reveals that well over half (53.6%) of JCPS students achieve at less-than-proficient levels. When these scores are broken out by ethnicity, nearly 60% of European-American children achieve proficiency compared to 28.9% of African-American children. This disparity is important, as the present study is conducted in schools largely attended by African-American children and others from disadvantaged backgrounds.

Project Background
The Jefferson County Public Schools Literacy Project (Project) was a university-district initiative between JCPS and literacy educators from Bellarmine University with a goal of increasing end-of-third-grade reading outcomes. The theory of action adopted by the Project was that of Desimone (2009) where improving teachers' core reading knowledge and pedagogical skill with the help of literacy coaches, improves core (tier 1) instruction and results in improved student reading outcomes. The Project adopted the fundamental idea that to substantially improve reading outcomes teachers must be deeply knowledgeable about how printed words are transferred into sound and meaning by the reader.
Teachers must also be highly skilled in the pedagogy that facilitates letter-sound correspondence and the transfer of that knowledge into appropriate reading fluency with comprehension. As such, the Project took the approach that everyone involved in reading instruction must learn to improve, and that this learning is not to a criterion, but rather, grows on a continuous improvement continuum.
The district had in place a "Third-Grade Reading Pledge," an aspirational goal that all end-of-third-grade students would be reading on grade-level, although grade-level was left undefined. In the fall of 2013, the district's Chief Academic Officer invited area schools of education to propose initiatives to facilitate achievement of the third-grade reading pledge. The proposal from Bellarmine was based on the design of prior reading academies initiated in Dallas and Memphis (Manzo, 2000;Feldman, Schneck, Feighan, Coffey, & Rui, 2011). The Project was reviewed by the District and ultimately approved by the JCPS Board of Education. Project funding came primarily from Title 1 and general funds to pay delivery costs to Bellarmine. Deliverables included the design and delivery of a one-year capacity-building curriculum for kindergarten through third-grade teachers, ESL and Special Education teachers, the training of literacy coaches, designing a student outcome assessment system, collecting and analyzing data, and generally overseeing the Project in conjunction with district administrators. The first-year success of the Project resulted in the annual renewal of the project over the next two years. Total expenditures by the district for the three years amounted to approximately $2.5 million.

School and Teacher Participation
In the spring of 2014, the now Board-approved Project was presented to prin- Teachers received no monetary compensation for participation in the Project.
However, teachers did receive a total of six hours (3 hours per semester) of graduate level credit at no cost to them and were provided the books required for class. Graduate credit was granted by Bellarmine University and could be applied toward a degree at Bellarmine or transferred to another institution. Classes were delivered weekly in elementary schools that were in proximity to participating schools to ease travel for teacher participants. One year of classes resulted in 90 hours of face-to-face training over the two courses.
By the end of Year 1 many teachers were requesting a second year of training to better extend what they had learned. This resulted in the design of a third and fourth course available to teachers who had completed the initial foundational year of training. For participation in the second year of advanced training, teachers received an additional six hours of graduate credit, again at no cost to them, bringing the total of earned graduate credit to 12 hours for those completing two years of training. This second year of face-to-face training provided an additional 90 hours of training. Teachers participating in both years of training received a total of 180 hours of professional development.
Project training was open to teachers from K-3, special education, and ESL classrooms. Across the three project years a total of 162, 224, and 200 teachers enrolled in training in years 1, 2, and 3 respectively for a total enrollment of 586 teachers.

Course Content
The theory of action (Figure 1) adopted by the Project is one hypothesized by Desimone (2009) where professional development and literacy coaching improve teacher knowledge and skill, which then leads to improved classroom teaching and ultimately to improved student reading outcomes. This put the primary focus of the Project on the improvement of Tier 1 or core instruction.  in teaching letter-feature analysis skills as well as oral reading fluency and comprehension instruction. Also emphasized was development of a multi-tier support structure (MTSS) for students who were struggling. Throughout the Project a formative approach to curriculum was maintained that allowed the training curriculum to be adjusted in response to the learning of teacher-participants (Jimenez, 1997; Reinking & Bradley, 2004).

Literacy Coaches
In conjunction with the district, coaches were selected and then trained in the Project curriculum during a 2-week, 80-hour long summer workshop. During the school year coaches met monthly as a group with Project leaders to share insights, discuss logistics of the Project, how best to assist teachers, refine coaching skills, and continually enhance subject matter knowledge. Coaches were also trained to develop trust and establish rapport with each teacher in order to provide useful suggestions based on best-practice for improved student outcomes.
For each CAP, coaches engaged the participating teacher in a coaching cycle to provide support in the implementation of a new teaching strategy and to ensure continued use of the teaching strategy based on student need. As part of the coaching cycle the coaches held a pre-conference, observed an implementation of the strategy, and then held a post-conference with their respective Project teachers. Each pre- and post-conference session lasted up to 30 minutes. Additionally, and on an as-needed basis, coaches modeled strategies in participating classrooms.
Instructional coaching for elementary schools was administered by individuals with the title of Goal Clarity Coach. The scope of responsibilities for a Goal Clarity Coach was to provide support, assistance, and advice to the district-wide service center and/or the school faculty in the content area of need. Subject matter expertise of individual coaches tended to be wide-ranging, from math to science to literacy across the elementary, middle, and high school levels. During Year 1 of the Project, the responsibilities of literacy coaches were assigned by the district to the Goal Clarity Coach; when this was not possible, they were given to a teacher leader. Initially, Project literacy coaches were not compensated for these responsibilities. Over the course of the three years, Project literacy coaches were chosen with specific subject matter expertise in elementary reading and eventually 50 percent of a coach's job responsibility was compensated by the

Student Participants
The unit of analysis for reading outcomes is the student. The student sample in the present study consists of third-grade students instructed by teachers participating in foundational and advanced training across the three years of the study. As the primary concern of district leaders was making the Project available on a wide basis, selection of a control group was not possible.  Henderson & Templeton (1986) and provides a measure of the child's orthographic knowledge (Ehri, 1993; Ganske, 1994, 2014).
The test is administered one word at a time where the teacher pronounces the word, uses it in a sentence, and then pronounces it again. The student writes the word on their answer sheet and then waits for the teacher to say the next word.

Reading Fluency
The assessment of reading fluency consisted of students individually reading aloud a curriculum-based measure (CBM). Students read the narrative passage for 3 minutes while being scored for reading miscues (omissions, insertions, mispronounced words, reversals and skipping a line) by the test administrator. If after 3 seconds students were unable to read a word it was counted as an error, and the student was told the word and directed to continue reading. Total time spent reading was recorded for those who finished in less than 3 minutes. Passages were administered in the fall and spring, ranged between 332 and 358 words in length, and were measured for Lexile complexity using the Lexile Analyzer (MetaMetrics, 2016). All passages measured in the 700L to 800L range and are within the text complexity grade-bands identified by the Common Core. Curriculum-based measures of this kind have been reported to possess adequate reliability (Deno, 1985; Deno, Mirkin, & Chiang, 1982; McGlinchey & Hixson, 2004). The range of reading fluency scores for this group of students was 0 to 200 words-correct-per-minute. Reliability of the present data was determined using a split-half reliability test resulting in Pearson's r ranging between 0.982 and 0.991 depending on the text.
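The fluency score described above can be illustrated with a short calculation. This is a minimal sketch assuming the conventional definition of words-correct-per-minute (words read minus errors, scaled to one minute of actual reading time); the function name and the student figures are hypothetical and are not taken from the Project data.

```python
def words_correct_per_minute(words_read: int, errors: int, seconds: float) -> float:
    """Return words read correctly, scaled to one minute of reading time.

    Assumes the conventional WCPM definition; names are illustrative only.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if errors < 0 or errors > words_read:
        raise ValueError("errors must be between 0 and words_read")
    return (words_read - errors) * 60.0 / seconds


# Hypothetical student who read for the full 3 minutes (180 seconds).
full_time = words_correct_per_minute(words_read=300, errors=12, seconds=180)

# Hypothetical student who finished a 340-word passage early, in 150 seconds,
# so the recorded reading time replaces the 3-minute limit.
early_finish = words_correct_per_minute(words_read=340, errors=5, seconds=150)

print(full_time, early_finish)
```

Using the recorded time for early finishers keeps their scores on the same per-minute scale as students who read for the full period.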

Assessment Administration
All assessments were individually administered to students by their Project teacher. Teachers and coaches were instructed on the administration and scoring of each instrument early in foundational training. Assessments were introduced one at a time followed by in-class administration practice. Teachers were then required to administer the assessments to two students and then bring the completed assessments to class. Assessments were then blindly scored by both the instructor and the teacher and compared for reliability. Teachers whose scoring was not in complete agreement with that of the instructor received immediate remediation to correct the scoring error. Those teachers were then required to bring to class an additional set of assessments from two different students the following week to repeat the scoring procedure under the auspices of the instructor. After 100% agreement with the instructor was reached, a sample of blind scores from both raters was returned to the researchers for another round of reliability checking. After training and reliability checking, teachers then administered all assessments to their remaining students. Because of the temporal distance between the assessment periods, the administration training protocol was repeated in April as preparation for the May assessment period.

Results
This study reports first, the results of a project to improve teacher capacity of core reading content and second, changes in third-grade reading outcomes over a three-year period as measured by developmental spelling knowledge, pseudo- and sight-word reading, and reading fluency. We begin by analyzing growth in teacher knowledge as measured by the LIKS.

Research Question One
Research question one asks if teachers' reading knowledge improved after participating in foundational training provided by the Project. Note that the LIKS data reflect teachers participating in foundational training classes only and do not include those in advanced training. Bonferroni correction between pre- and posttest LIKS results to determine the statistical significance of change with effect size measured using Cohen's d (Cohen, 1988). Results in Table 3
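For readers less familiar with the two statistics named here, the following is a minimal sketch of how Cohen's d and a Bonferroni-adjusted significance threshold are typically computed. It assumes the pooled-standard-deviation form of d; the scores, the number of comparisons, and the function names are hypothetical and are not drawn from the LIKS results.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(pre, post):
    """Cohen's d for pre/post scores, using the pooled standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(post) - mean(pre)) / pooled_sd


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical pre- and posttest scores for six teachers on one subscale.
pre_scores = [10, 12, 9, 14, 11, 13]
post_scores = [15, 16, 13, 18, 14, 17]

d = cohens_d(pre_scores, post_scores)

# Assumed example: five subscales tested at a familywise alpha of 0.05.
threshold = bonferroni_alpha(0.05, 5)

print(round(d, 2), threshold)
```

Under Cohen's (1988) conventional benchmarks, values of d near 0.2, 0.5, and 0.8 are read as small, medium, and large effects, so a d above 0.8 corresponds to the "large" label.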

Research Question Two
For each study year we measured spelling development, sight-word reading, pseudo-word reading, and reading fluency with each year representing an independent sample of students. Table 4 shows the means and standard deviations for the measured variables by year while Table 5 shows the bi-variate correlations. A close inspection of the study variables indicates some differences in the levels of fall achievement between years while spring scores appear to increase in years two and three beyond that of year one. Bi-variate correlations reveal moderate to large relationships between variables for each of the three years with relationships in years two and three appearing generally larger than those in year one.
Research question two asks the extent to which student reading outcomes changed over the three years of the Project. The Figure 3 bar graph shows the fall and spring means for each variable across the three study years. A visual inspection of the means shows first, that growth occurred in each of the four variables between fall and spring of each year. Developmental spelling means increased from 3.1 to 5.
Utilizing the fall measure as a covariate in each of the models controls for any variability between the years resulting from the pretest (Fall measure). The ANCOVA controls for any differences in the outcomes (spring measure) that may be attributable to the fall measure. ANCOVA is an efficient method for isolating a treatment effect and the use of pretest scores is an effective covariate when the purpose of the model is to examine post-test variability (Yang & Tsiatis, 2001). In practical terms, the ANCOVA adjusts the data such that the different starting points (fall measures) do not impact the observed differences in the spring measures. The slopes shown in Table 6 represent the within year comparison (fall to spring). In all years the slopes are statistically significant (p < 0.001) indicating a significant increase in the spring scores compared to those from the fall.
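The adjustment described above can be made concrete with a small sketch. It assumes the textbook ANCOVA form in which each group's spring mean is shifted along a pooled within-group slope to the grand mean of the fall covariate; the data and names are hypothetical, and this is an illustration of the technique rather than the Project's actual analysis.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """Return (pooled within-group slope, adjusted spring means by group).

    `groups` maps a label to a list of (fall, spring) score pairs.
    Adjusted mean = spring mean - slope * (group fall mean - grand fall mean).
    """
    grand_fall = mean(f for pairs in groups.values() for f, _ in pairs)
    sum_xy = 0.0
    sum_xx = 0.0
    group_means = {}
    for label, pairs in groups.items():
        fall_bar = mean(f for f, _ in pairs)
        spring_bar = mean(s for _, s in pairs)
        group_means[label] = (fall_bar, spring_bar)
        # Accumulate within-group cross-products and covariate sums of squares.
        sum_xy += sum((f - fall_bar) * (s - spring_bar) for f, s in pairs)
        sum_xx += sum((f - fall_bar) ** 2 for f, _ in pairs)
    slope = sum_xy / sum_xx
    adjusted = {
        label: spring_bar - slope * (fall_bar - grand_fall)
        for label, (fall_bar, spring_bar) in group_means.items()
    }
    return slope, adjusted


# Hypothetical cohorts: year2 starts higher in the fall than year1.
cohorts = {
    "year1": [(1, 3), (2, 5), (3, 7)],
    "year2": [(3, 8), (4, 10), (5, 12)],
}
slope, adjusted = ancova_adjusted_means(cohorts)
print(slope, adjusted)
```

In this toy data the raw spring means differ by 5 points, but after adjusting for the different fall starting points the difference shrinks to 1 point, which is the sense in which the covariate keeps fall differences from driving the spring comparison.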

Research Question Three
Research question three asks if the rate of student learning on the measured variables changed by year? In other words, for each of the measured variables did students acquire the same amount of learning each year or were some years more productive than others? It may be inferred that the greater the value of the slope estimate the greater the rate or magnitude of learning. Equality of slopes by year would indicate an equal amount of learning took place while statistically significant differences between the slopes would indicate student learning differed. Figure 4 plots the mean growth by variable while Figure 5 plots the slope coefficient estimates by year for each of the four student outcomes.
To test the hypothesis of equality of slopes by year:   When the results indicated overall model significance (F-test), pairwise comparisons were estimated to investigate the simple factor level effects. The pairwise comparisons were estimated in a method similar to the overall model, but the dummy term was limited to two years rather than three. When the interaction term composed of the two years and the fall measure was significant, it is reported as a significant simple effect (t-statistic). Table 6 reports the results of this statistical testing to determine differences between years (rate or magnitude of yearly increase). Figure 5
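A test of equal slopes of this kind is commonly written as a regression with year dummies and a year-by-fall interaction. The form below is an assumption consistent with the description here (a dummy term for year and an interaction between year and the fall measure), not necessarily the authors' exact specification; all symbols are introduced for illustration.

```latex
\begin{align}
\text{Spring}_{i} &= \beta_0 + \beta_1\,\text{Fall}_{i}
  + \sum_{k=2}^{3} \gamma_k D_{ik}
  + \sum_{k=2}^{3} \delta_k \left( D_{ik} \times \text{Fall}_{i} \right)
  + \varepsilon_{i}, \\
H_0 &: \delta_2 = \delta_3 = 0 .
\end{align}
```

Here $D_{ik}$ equals 1 when student $i$ belongs to year $k$ and 0 otherwise, with year one as the reference category. Under $H_0$ every year shares the slope $\beta_1$; rejecting it with the overall F-test indicates that at least one year's slope differs. A pairwise comparison restricts the data to two years, leaving a single interaction coefficient $\delta$ that is tested with a t-statistic.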

Discussion
The Project was guided by a learning to improve framework suggesting intensive Throughout the Project curriculum designers used a formative approach which allowed for carefully considered adjustments to enhance the learning and utility of the training content. This approach provided curriculum designers the space to learn and improve based on teacher and instructor feedback and to make use of information gained from student outcomes. In Year 2 for example, the curri- a direct measure of classroom reading instruction, we take the year 2 and 3 increases across the measured variables as indirect evidence that instruction improved. We think it unlikely, given the large sample sizes across the three years, that students independently improved with no instructional input.
Beyond quantifying the descriptive changes occurring in all four of the reading outcomes, our third research question explored whether the magnitude of learning differed by year. Our analysis of covariance (ANCOVA) results revealed that the regression slopes were significantly different across the three years for two of the reading outcome variables. While the plots in Figure 4 for spelling development and pseudo-word reading show clear differences in the magnitude of growth between years, those for sight-word reading and fluency clearly do not. We interpret the between-year increases in the magnitude of spelling knowledge growth as evidence that students learned at increasingly faster rates. While we cannot make a causal claim, we interpret this as suggesting teachers became increasingly proficient with instructional practices that encourage letter-feature development in students. As pseudo-word reading reflects the ability of students to apply their letter-feature knowledge to decode words, the increases in 2017 over 2015 suggest growth in the magnitude of student learning. The 2014 to 2015 reduction in the regression coefficient for pseudo-word learning is difficult to explain, as there could be numerous reasons. The regression coefficients for sight-word reading and fluency also show significant growth for each year, although differences suggesting increasingly faster between-year growth were not found.
Our perspective of learning to improve emerges from a quality improvement paradigm suggesting that a process for improvement of core reading instruction can ultimately lead to enhanced instruction and predictable growth in student outcomes. Quality improvement (Deming, 2000; Shewhart, 1980) is a system that begins with the identification of quality measures, that is, the activities that occur within the instructional process that contribute to its ultimate quality. For example, one quality measure is teacher core knowledge of reading that was addressed in the present study. Other quality indicators likely exist which act to produce improved student outcomes. Some of these may include the amount of time teachers are actually engaged in teaching reading, the efficient use of instructional time, word-work quality, the extent to which instruction is differentiated to account for learner differences, and the regular use of formative and diagnostic assessments to measure growth of critical reading sub-skills (Black & Wiliam, 1998). Other indicators include the amount of time students spend ac- are asked to read, the amount and quality of teacher feedback provided to students, the materials teachers use to implement instruction, and the fidelity with which teachers implement a teaching and learning cycle. We suggest it is reasonable to expect that teachers differ in the quality with which they implement these and other instructional indicators and that these differences account for common variation that affects student outcomes. It follows then that determining which indicators account for the greatest variation in student learning, and then bringing them into statistical control, may lead to reading achievement gains. We posit that a continuous quality improvement process (which implies it is guided by appropriate measurement) can provide a school with a proven, reliable, and predictable process that puts it in control of instructional improvement and student outcomes.

Conclusion
It remains an open question whether additional or different Project training content would have improved reading outcomes beyond those found in the present study. Also unknown is the effect of the advanced year of training on teacher practice and student outcomes. From an anecdotal perspective, teachers participating in the second year of the Project reported an increased understanding of the diagnostic assessments and how to leverage those results to improve and differentiate their instruction. From a teacher preparation perspective, the Project results suggest that improvement of reading instruction is intensive, hard work that must have at its foundation the correct curriculum that teachers perceive to be worth learning. Improvement must also involve knowledgeable individuals in the form of training instructors and literacy coaches to support and guide teacher learning and classroom implementation. What is critical is that at some point teachers begin to see improvement in their students that suggests their effort is worth the trouble. It is in these moments when teachers be-  Deming (1980) suggests, that first knowing what to do and then doing it well is critical to helping their students become better readers.
The reviewed research suggesting that teachers are poorly prepared to teach reading to students at-risk for reading failure, combined with data showing that too many students are underachieving in reading, leads to the consideration that the current reading teacher preparation model is insufficient (Licklider, 1997). Much as a medical student who has just received an M.D. degree is not ready to practice without several years of residency training, graduation from a teacher preparation program can provide, at best, a start at becoming a skilled reading teacher. It may be that becoming competent in the practice of reading instruction requires much more than preparation programs can provide under the current model.
Long-term and consistently poor national and state-level reading results support the notion that post-certification PD is not improving reading outcomes. As Project implementation began, we were surprised at the poor level of core reading knowledge across one of the country's great city school districts. From the central office and senior administrator level down to the building level, deep literacy knowledge was universally absent. Even more problematic was the presence of instructional ideas that were at odds with what we know about how humans read and how reading is best taught. Our efforts suggest to us that teachers of students at-risk for reading failure are in need of long-term, high-level "residency" training under highly knowledgeable coaches employing best practices within a proven quality improvement system. While the question remains of how best to deliver such training, we suggest that the model presented in the present study is a beginning.

Limitations
Our results are limited by the absence of a randomized controlled trial to control for possible confounds and alternative explanations for the study results. This leaves open the possibility that other factors could explain both teacher training outcomes and the increases in student outcomes. This study is also limited by an inability to measure the incremental contribution of the second year of teacher training in the Project, which we think may contribute to the increase seen in student outcomes. The study design involved three independent cross sections of students, which prohibited the tracking of within-student results across the three years. The study design was also not able to account for third-grade students in the study sample who had been previously instructed by Project teachers in first grade, second grade, or both. It is entirely possible that an enhanced effect of the Project was experienced by students who received prior instruction from one or two Project teachers. Our study design only allowed the gathering of data from teachers enrolled in Project training. This meant we were unable to track individual Project teachers across the three years of the study, which could have provided valuable insight into teacher growth. We were also unable to adequately document changes in classroom instruction. Such data would have allowed the measurement of change in teacher practice and modeling of its effect on student outcomes. In all, our study reflects the challenges of working within school districts where the desire for quickly improved outcomes on state assessments can be intense and the will and discipline to implement well-designed studies that can rigorously answer important questions are often lacking.

Future Research
Our results suggest that research is needed into the development of a continuous improvement system that can measure, analyze, and improve the indicators found to predict significant variance in reading instruction. Much of the focus of reading research has been on the specification of the cognitive processes involved in reading and on instructional strategies that facilitate growth in sub-processes such as phonological awareness, letter-sound learning, fluency, vocabulary, and comprehension. Much less is known about how these strategies work coherently within a system of instruction whose objective is to get every student to at least minimum levels of reading achievement that can facilitate academic success. This is an ambitious task that has yet to resonate on a general basis across the research community and school districts. If NAEP and state accountability results are accepted as evidence of poor reading, we suggest it is time to move in the direction of quality improvement of reading instruction.