Impact of the Flipped Classroom on Students’ Learning Performance via Meta-Analysis

The flipped classroom is a teaching strategy that reconstructs conventional teaching methods and pays attention to students’ active learning. Recently, there have been many studies comparing the effects of flipped and traditional classrooms on students’ learning outcomes, but which is more suitable remains an open issue. This study explored the effect of flipped classrooms on student learning performance compared to traditional classrooms via meta-analysis. Using predefined eligibility criteria to screen the literature, WoS databases were searched for the relevant articles, and 63 experimental articles were included in the meta-analysis. STATA was used to conduct the current meta-analysis. The results indicated that the flipped classroom can improve students’ academic performance. The subgroup analysis showed that the heterogeneity of each subgroup was relatively large, and the sensitivity analysis found that the source of heterogeneity might be caused by the different experimental designs and the specific implementations of the flipped classrooms. The results provide a broad perspective for educators to implement flipped classrooms in the future.


Introduction
The flipped classroom is a hybrid approach that combines online learning with face-to-face classroom activities (Graham, Woodfield, & Harrison, 2013). It inverts the traditional instruction strategy. Before the class, the teacher provides videos or other resources and the students choose a suitable time and place to learn based on their personal learning rhythm; in the class, the students participate in collaborative and interactive learning activities to make good use of the class time (Bergmann & Sams, 2012; Fulton, 2012; Mok, 2014). With the development of digital resources, particularly the open online instructional videos created by Khan Academy and the massive open online courses (MOOCs), the flipped classroom has become increasingly popular in modern education (Sun, Xie, & Anderman, 2018). In line with this, this study aims to investigate whether flipped classrooms can promote students' learning performance in different subject areas and learning stages, and whether students' academic performance in flipped classrooms differs according to the type of knowledge taught.
In addition, compared with regular courses, flipped courses have been found to have no positive effect on students' learning motivation (Tse et al., 2017). To sum up, it is hard to reach a consistent conclusion regarding whether the flipped classroom instructional strategy has a significant impact on students. Therefore, the present study aimed to expand the perspective of previous meta-analyses and reviews by investigating the impact of flipped classroom interventions, compared with the traditional classroom, on students' learning performance. Furthermore, the present study explored whether the flipped classroom's specific implementation characteristics moderate the learning outcomes.

Previous Meta-Analyses on Flipped Classroom
Existing studies have discussed the impact of implementing the flipped classroom on students in the medical and health profession disciplines. Chen et al. (2018) compared the efficacy of the flipped classroom with traditional lecture-based learning with 46 studies in the field of medicine, and found that the students who experienced flipped classrooms had higher learning performance than those who learned in traditional classrooms. Hew and Lo (2018) found that the flipped classroom strategy in health profession education produces a significant improvement in student learning performance compared with regular courses according to 28 eligible comparative studies. Hu et al. (2018) analyzed the effectiveness of flipped classroom teaching in nursing courses according to 11 articles, and found that it was effective in terms of improving students' scores of theoretical knowledge and skills. Xu et al. (2019) also conducted a meta-analysis to explore the effectiveness of inverted classrooms on the development of Chinese nursing students' skill competence through 22 eligible studies. The meta-analysis results of Gillette et al. (2018) reached different conclusions, as they found that there was no significant difference in the student performance of traditional and flipped classrooms by conducting a meta-analysis of five articles published in the field of pharmacy education between 2000 and 2017. Ang, Zaid, and Harun (2015) conducted a meta-analysis on 10 articles published since 2010 related to flipped classrooms, social collaboration knowledge construction (SCKC), and information and communication technology (ICT) courses. The study of Ang et al. (2015) was concerned with the effectiveness of flipped classrooms on college students' scientific production and found that the flipped classroom model is conducive to promoting the construction of students' social collaboration knowledge and ICT skills.
Martínez, Díaz, Rodríguez and García (2019) evaluated the effect of the flipped classroom teaching method on college students' learning performance through the research included in the WOS and Scopus databases, and the results showed that students' learning performance was improved under the flipped classroom.
There are also studies which are not limited to certain subject areas or learning stages. Rahman, Aris, Mohamed and Zaid (2014) reviewed 15 relevant articles and found that the flipped classroom instructional strategy is suitable for integrating into mathematics, science, engineering, technology, social science and other disciplines, and has a positive effect on students' test scores. However, Cheng, Ritzhaupt and Antonenko (2019) found that the flipped course had a negative effect (g = −0.081) on student outcomes in the context of engineering education. Cheng et al. (2019) analyzed the impact of flipped classrooms on students' learning outcomes by a set of moderating variables including students' learning stage, subject area, duration of study, and publication type, and found that the effect sizes were significantly moderated by subject area. Lag and Saele (2019) meta-analyzed eligible papers published after 2010 to evaluate the impact of flipped classrooms compared with traditional classrooms on learning outcomes and student satisfaction, and the results showed that flipped classrooms had only a small positive impact on learning (g = 0.35). Briefly, there are several meta-analyses of the effectiveness of flipped classrooms in certain disciplines or involving certain groups of students. However, they did not explore whether the types of knowledge taught in the flipped classrooms also affect students' academic performance.

Research Questions
Previous meta-analyses have covered a broad range of overall effects, but most studies are limited to a certain subject area (e.g., Chen et al., 2018; Gillette et al., 2018; Hu et al., 2018), discipline (e.g., Ang et al., 2015; Martínez et al., 2019; Xu et al., 2019) or educational level (e.g., Martínez et al., 2019). Only a few articles (e.g., Cheng et al., 2019; Lag & Saele, 2019) have examined the potential moderating effects of flipped classrooms and explored whether specific characteristics of the implementation moderate the impact. Wasserman, Quint, Norris, and Carr (2015) found that the different proportions of conceptual knowledge and procedural knowledge in the learning content affect the outcomes of students under the flipped instructional model. Therefore, based on previous researchers' findings, subject area and knowledge type are also included among the moderating variables in the present study to explore the differences in the learning effects of the flipped classroom across all disciplines. The current meta-analysis aims to answer the following two research questions: RQ1: Can the flipped classroom instructional strategy effectively improve students' learning performance? RQ2: Is the students' performance under the flipped classroom instructional strategy moderated by specific characteristics, such as RQ2-1: study design, RQ2-2: sample size, RQ2-3: learning stage, RQ2-4: subject area, RQ2-5: knowledge type, RQ2-6: instructor equivalence, and RQ2-7: intervention duration?

Search Strategy and Selection Criteria
To ensure that the literature search that produced the study sample was rigorous, this study followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) statement of the eligibility criteria and quality criteria of research selection (Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009). The retrieval took place on January 1, 2019. To capture a broad range of high-quality, potentially eligible papers, we used the following search strings with Boolean operators to identify the relevant papers in the core collection of the WOS (Web of Science) database. First step: TS = ("flipped classroom*" OR "inverted classroom*" OR "flipped learning"); second step: TS = ("academic performance*" OR "academic achievement*" OR "learning performance*" OR effect* OR impact* OR efficacy OR performance* OR achievement*) AND #1, where #1 represents the results obtained in the first retrieval step. The search returned 595 articles written in English during the time span of 2012-2018.
The following inclusion criteria were used to decide which papers could be included in this meta-analysis. According to the inclusion criteria, 595 papers were screened, and the flowchart is shown in Figure 1.
Inclusion Criterion 1: The papers utilized comparative studies such as quasi-experiments and randomized controlled trials.
Inclusion Criterion 2: The experimental group adopted the flipped classroom instructional strategy, while the control group adopted the conventional classroom instructional strategy.
Inclusion Criterion 3: The papers reported the learning performance of the experimental group and the control group on similar course topics using the identical assessment instruments.
Inclusion Criterion 4: The papers provided enough data to calculate effect size, such as mean and standard deviation.

Coding of the Outcome Variables
In the current study, 63 sample papers were coded according to the coding methods of Chen et al. (2018) and Cheng et al. (2019). In addition, we also wanted to explore the impact of knowledge type on students' learning performance in flipped classrooms. Learning content was classified according to the dimensions of knowledge structure (Krathwohl, 2002). Thus, the 63 eligible papers were coded according to basic information, characteristics of the learners, and content attributes of the learning materials. The specific coding content is as follows: 1) the basic information includes author, publication time, sample size, and type of experiment; 2) the characteristics of learners include the learning stage of the participants, such as elementary school, middle school, high school, or university; and 3) the content attributes cover the type of knowledge and the subject area.
To ensure the reliability of the coding, each of the authors coded five papers and analyzed the inconsistent codes to form a consistent understanding of the coded content. Then, each of us coded all of the sample papers separately. After coding, all the authors discussed the different opinions to form a consensus and correct the results. The coding results of the 63 papers are shown in Table A1 of the Appendix. Among them, Lo et al. (2018) conducted four comparative experiments to explore the effect of flipped classrooms on students' outcomes in secondary school. Since the four experiments were performed on four different subject areas, those four sets of data were all included in the meta-analysis. Therefore, the number of effect sizes from 63 sample papers is 66 (k = 66).

Effect Size Extractions and Calculations
The effect size (sometimes referred to as correlation or standardized mean difference) is the unit of currency in a meta-analysis, which quantifies the magnitude of the difference between the control group and the treated group or the strength of a relationship between two variables (Borenstein, Hedges, Higgins, & Rothstein, 2010). In a meta-analysis, both Hedges' g and Cohen's d can be used to standardize the effect size. However, when the sample size is small, the calculation method of Hedges' g is more accurate (Borenstein et al., 2010). Hedges' g, its standard error, and its confidence interval (CI) are calculated from the means, sample sizes, and standard deviations of the two groups. Note. M_E and M_C represent the mean of the learning performance of the experimental group and the control group respectively. N_E and N_C represent the sample size of the experimental group and the control group respectively. SD_E and SD_C represent the standard deviation of the experimental group and the control group respectively.
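The calculation described above can be sketched in Python. This is a minimal illustration that assumes the standard formulation in Borenstein et al. (pooled standard deviation, small-sample correction factor J, and a normal-approximation 95% CI); the function name and the example numbers are hypothetical and do not come from the sample papers.

```python
import math

def hedges_g(m_e, sd_e, n_e, m_c, sd_c, n_c, z_crit=1.96):
    """Return (g, se, ci_low, ci_high) for one experimental/control comparison."""
    df = n_e + n_c - 2
    # Pooled within-group standard deviation.
    s_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / df)
    # Cohen's d: standardized mean difference before correction.
    d = (m_e - m_c) / s_pooled
    # Large-sample variance of d.
    var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
    # Small-sample correction factor J turns d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    se = math.sqrt(j**2 * var_d)
    return g, se, g - z_crit * se, g + z_crit * se

# Hypothetical study: flipped group M = 80, control M = 75, SD = 10, 30 students each.
g, se, low, high = hedges_g(80, 10, 30, 75, 10, 30)
print(f"g = {g:.3f}, SE = {se:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With small groups the correction J is noticeably below 1, which is why g is preferred over d when sample sizes are small.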

Quality Assessment and Heterogeneity Test
Meta-analysis is a quantitative method for comprehensive analysis of published research results (Hedges & Olkin, 1985). Therefore, the articles included in the analysis will have a great impact on the results. To ensure the reliability of the 63 papers included in the meta-analysis, the Begg's rank correlation method and funnel plot were used to comprehensively measure the publication bias of the 63 sample articles. The funnel plot allows people to visually judge the publication bias, while the Begg's rank correlation method complements the funnel plot from a quantitative perspective. If Z > 1.96, p < 0.05, there may be a publication bias; Z < 1.96, p > 0.05 indicates no publication bias (Begg & Mazumdar, 1995).
As depicted in Figure 2, the effect sizes of the study sample were evenly distributed on both sides of the average effect size, and most were distributed within the confidence interval. In addition, the results of Begg's test showed that Z = 0.51 < 1.96 and p = 0.607 > 0.05. Thus, there was no evidence of publication bias in the sample literature.
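The rank correlation test can be sketched as follows. This is a simplified version of the Begg and Mazumdar procedure that assumes no tied values and applies no continuity correction, so its output may differ slightly from STATA's; the function name and the example data are hypothetical.

```python
import math

def begg_test(effects, variances):
    """Rank correlation test for funnel plot asymmetry; returns (z, two-sided p)."""
    k = len(effects)
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect pooled estimate used as the reference point.
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum_w
    # Standardized deviation of each study from the pooled estimate.
    std_effects = [
        (t - pooled) / math.sqrt(v - 1 / sum_w)
        for t, v in zip(effects, variances)
    ]
    # Kendall's S: concordant minus discordant pairs (ties contribute zero).
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            product = (std_effects[i] - std_effects[j]) * (variances[i] - variances[j])
            s += (product > 0) - (product < 0)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical data in which less precise studies report larger effects.
z, p = begg_test([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                 [0.01, 0.02, 0.03, 0.04, 0.05, 0.06])
print(f"Z = {z:.2f}, p = {p:.4f}")
```

Under the same normal approximation, a Z of 0.51 corresponds to a two-sided p of roughly 0.61, which is in line with the reported p = 0.607.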
Heterogeneity refers to the inconsistencies in research results among samples. I-squared (I²) and the Q-statistic can measure the heterogeneity in a meta-analysis. STATA 12.0 was used to test the heterogeneity of the 63 sample papers, and the result was I² = 92% > 50%, which indicates a high level of heterogeneity (Higgins & Thompson, 2002). Therefore, the random-effects model was adopted for the meta-analysis in the present study. The Q-statistic test is a test of the null hypothesis that all studies in the meta-analysis have an identical effect size (Borenstein et al., 2010). In this study, the Q-value is 815.48 with 65 degrees of freedom and p < 0.001. Thus, we can reject the null hypothesis that the true effect size is common to all the studies (Borenstein et al., 2010). Apart from sampling error, there may therefore be several moderators that produce the heterogeneity (Borenstein et al., 2010), so subgroup analysis was also performed to determine their impact. These moderating variables are: 1) study design, 2) sample size, 3) learning stage, 4) subject area, 5) knowledge type, 6) instructor equivalence, and 7) intervention duration of the flipped classroom.
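The relationship between Q, I², and the random-effects estimate can be illustrated with a short Python sketch. It assumes the DerSimonian-Laird estimator of the between-study variance, which the text does not name explicitly, and it needs at least two studies; the function names and the three-study example are hypothetical.

```python
import math

def i_squared(q, df):
    """Share of total variation attributable to heterogeneity, in percent."""
    return max(0.0, (q - df) / q) * 100

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling for two or more studies."""
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"q": q, "df": df, "i2": i_squared(q, df) if q > 0 else 0.0,
            "tau2": tau2, "pooled": pooled, "se": se, "z": pooled / se}

# Consistency check on the reported values: Q = 815.48 with 65 degrees of freedom.
print(f"I^2 = {i_squared(815.48, 65):.1f}%")

# Hypothetical three-study example.
print(random_effects([0.2, 0.5, 0.8], [0.04, 0.04, 0.04]))
```

The first line of output reproduces the reported I² of about 92% from the reported Q and degrees of freedom.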

Descriptive Statistics Information
The total sample size is N = 9163, comprising students who experienced flipped classrooms (N = 4716) and traditional classrooms (N = 4447) (Table A1 of the Appendix). Most of the studies took place in higher education, and the number of participants in each experiment was generally more than 60. The sample literature covers a wide range of subject areas, while the papers relating to humanities and engineering techniques are relatively few. This phenomenon is the same as that reported by Cheng et al. (2019). Most of the course knowledge selected in the experiments was conceptual knowledge.

Overall Effect Size of the Flipped Classroom for Learning Performance
Figure 4 depicts the effect size and confidence interval for each of the 63 studies, and the average effect size of all studies. Each small box in the figure corresponds to the effect size of one study, and the horizontal line through each small box signifies its confidence interval. The vertical solid line in the middle is the line of no effect, indicating that the factor studied is not statistically related to the outcome. The diamond at the bottom of the forest plot, which is crossed by a dotted line, describes the combined effect size and its confidence interval. As shown in Figure 4, most studies fall to the right of the line of no effect, indicating a statistically significant advantage for the flipped classroom. Cohen (1988) considers an effect size greater than 0.8 to be large, between 0.2 and 0.8 to be medium, and less than 0.2 to be small and of little practical meaning. In the sample literature, the effect size of 22 studies was less than 0.2, which was a small effect, and the effect size of 20 studies was between 0.2 and 0.8, which was a medium effect. A total of 24 studies have an effect size greater than 0.8, which suggests that the flipped classroom can clearly promote students' learning performance. However, the calculation results show that the average effect size of the 63 studies is 0.621 (95% confidence interval, CI = 0.464 to 0.778, Z = 7.74, p < 0.001); since 0.2 ≤ 0.621 ≤ 0.8, it is a medium effect size. Finally, the result of Rosenthal's fail-safe N test determined that 16,585 additional studies with null results would be required to nullify the current overall effect size (Rosenthal, 1979). Therefore, the flipped classroom instructional mode can improve students' learning performance. This conclusion is the same as those of Lag and Saele (2019) and Cheng et al. (2019).
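The reported summary figures can be checked with a few lines of Python. The sketch below recovers the standard error and Z from the reported confidence interval, applies the effect size thresholds quoted in the text, and shows the usual form of Rosenthal's fail-safe N; the function names are hypothetical, taking the absolute value of g is an assumption, and the Z scores passed to `fail_safe_n` are made-up illustrations because the per-study values are not given here.

```python
def z_from_ci(estimate, ci_low, ci_high, z_crit=1.96):
    """Recover the Z statistic from a point estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * z_crit)
    return estimate / se

def classify_effect(g):
    """Label an effect size using the thresholds quoted in the text."""
    size = abs(g)  # assumption: the sign only gives the direction of the effect
    if size > 0.8:
        return "large"
    if size >= 0.2:
        return "medium"
    return "small"

def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: null studies needed to lose one-tailed significance."""
    k = len(z_scores)
    return (sum(z_scores) / z_alpha) ** 2 - k

# Reported pooled result: g = 0.621, 95% CI = 0.464 to 0.778.
print(f"Z = {z_from_ci(0.621, 0.464, 0.778):.2f}")
print(classify_effect(0.621))

# Hypothetical per-study Z scores, for illustration only.
print(f"fail-safe N = {fail_safe_n([2.0, 2.5, 3.0]):.1f}")
```

The recovered Z is about 7.75, which agrees with the reported Z = 7.74 up to rounding of the interval bounds.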

Effect Sizes of Academic Achievement by Moderator Variables
To investigate the impact of flipped classrooms on student academic achievement in different contexts, several moderator variables were analyzed using the random-effects model in STATA, and the results are shown in Table 1. As shown in Table 1, the heterogeneity between subgroups was large, and the source of heterogeneity could not be found. To evaluate the statistical stability of the results, a leave-one-out sensitivity analysis was performed: each study was excluded from the meta-analysis in turn to reveal its effect on the pooled result. The sensitivity analysis showed that the study of Kim and Jang (2017) had the greatest impact on the pooled effect. When this study was removed, the overall effect changed considerably, but it remained within the 95% CI. After carefully reading Kim and Jang (2017), we found that the effect size of this study (g = 4.452) is considerably larger than those of the other studies. Kim and Jang (2017) designed a randomized controlled trial to assess the impact of a flipped classroom on nursing students at a university in South Korea, and two separate experiments were implemented in two different years to prevent diffusion and imitation effects between the control and the experimental groups. Most of the other 62 studies included in the meta-analysis focused on naturally occurring classes in schools, and rarely separated the experimental group from the control group by 2 years, so the study design may be one of the sources of heterogeneity. Due to the characteristics of nursing education, each class intervention lasted 100 minutes. The assessment of students' academic performance also covered both learning content and practical performance, so differences in regions, schools' teaching methods, and evaluation methods may have caused the heterogeneity.
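The leave-one-out procedure can be sketched in Python as follows. The pooling step assumes a DerSimonian-Laird random-effects model and at least three studies overall; the function names are hypothetical, and the example effect sizes are illustrative apart from g = 4.452, which is the value reported for Kim and Jang (2017).

```python
def pool_dl(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate (two or more studies)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    return sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)

def leave_one_out(effects, variances):
    """Re-pool the data once per study, omitting that study each time."""
    results = []
    for i in range(len(effects)):
        rest_g = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append((i, pool_dl(rest_g, rest_v)))
    return results

def most_influential(effects, variances):
    """Index and re-pooled estimate of the study whose removal shifts the result most."""
    full = pool_dl(effects, variances)
    return max(leave_one_out(effects, variances), key=lambda r: abs(r[1] - full))

# Illustrative data: three ordinary studies plus one outlier with g = 4.452.
effects = [0.3, 0.4, 0.5, 4.452]
variances = [0.04, 0.04, 0.04, 0.04]
index, pooled_without = most_influential(effects, variances)
print(index, round(pooled_without, 3))
```

In this toy example the outlier is flagged because removing it moves the pooled estimate far more than removing any other study.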

Study Design and Sample Size
As can be seen from Table 1 (Table A1 of the Appendix). The flipped classroom does not have a great advantage in these four groups of samples. Wasserman, Quint, Norris, and Carr (2017) suggest that this study design may not control some unrelated variables well. Even if the control and experimental groups have the same teacher, the teacher's ability and experience will improve over time. Therefore, in future experimental designs, the experimental group and the control group should be taught in the same time period to better control such variables, while attention should also be paid to diffusion and imitation effects between the experimental and control groups.

Learning Stage
As can be gleaned from Table 1, the effect sizes for all learning stages are greater than zero, indicating that students who experienced the flipped classroom had better learning performance than those who experienced the traditional classroom. However, the p-value of the high school stage experiments was 0.34 and the confidence interval included zero, so this effect was not statistically significant. Most of the empirical research on flipped classrooms was carried out in the higher education stage. This result is consistent with Cheng et al. (2019) and Tucker (2013). The flipped classroom is mainly used in the field of higher education to innovate the conventional teaching model, and it is also attracting increasing attention at the K-12 stage (Johnson et al., 2013; Tucker, 2013). Most of the researchers are from higher education institutions, where they can easily collect data from university students. On the other hand, some students think it is not suitable to carry out flipped classrooms in lower grades, because the students in lower grades may not have the academic maturity needed to be successful in the flipped setting, or may be less prepared for the flipped classroom, with half of the students who experienced the flipped class saying they would not choose another flipped class (Mason, Shuman, & Cook, 2013; Strayer, 2007; Tomas, Evans, Doyle, & Skamp, 2019). The flipped classroom places higher demands on students' self-regulated learning than the traditional classroom, and students' self-regulated learning abilities affect the effectiveness of flipped classrooms (Rodrigues, Sedraz, Ramos, de Souza, & Gomes, 2016). Moreover, the design of flipped courses is dependent on online resources; the more often students visit the course website, the higher their test scores (Hotle & Garrow, 2016). In fact, if there is a lack of supervision, students often procrastinate or just open a video file without understanding the content (Beatty, Merchant, & Albert, 2019).
In this situation, teachers need to reinforce interaction with students or assign assistants to supervise them; that is, they can conduct flipped learning on a continuum that develops different levels of student-oriented learning and autonomy based on students' learning needs and their preparation for the flipped classroom (Beatty et al., 2019; Sun & Wu, 2016; Tomas et al., 2019).

Subject Area and Knowledge Type
Table 1 shows the subgroup analysis by subject area and knowledge type for the flipped classroom versus the conventional classroom. The flipped classroom was used in various professional fields, but the samples for the humanities (k = 10, N = 545) and engineering sciences (k = 6, N = 475) are relatively small compared to those for the social sciences (k = 19, N = 3342), natural sciences (k = 16, N = 2343), and medicine (k = 16, N = 2458). The social sciences (g = 0.579), natural sciences (g = 0.67), and medicine (g = 0.648) have moderate effect sizes, and the humanities has the largest effect size (g = 1.024), while the effect size of engineering and technology is −0.118. Therefore, in four of the five subject areas, the exception being engineering and technology, students performed better in the flipped classrooms than in traditional classrooms. The p-value of engineering and technology is 0.593, indicating that this effect is not statistically significant. The humanities (p < 0.001) show a large effect for the flipped classroom teaching method; 10 groups of experiments are included in the humanities classification, most of which are associated with language learning. Five groups are English (Ekmekci, 2017; Lin, Hwang, Fu, & Chen, 2018; Yu & Wang, 2016), two are Chinese (Wang, An, & Wright, 2018b; Lo et al., 2018), and one is Korean (Kim, Park, Jang, & Nam, 2017). Only one of these studies has a negative effect size (−0.018), indicating that the student outcomes in the traditional classroom were marginally better than those in the flipped classroom. Therefore, the flipped classroom instructional strategy may be suitable for language learning.
From the perspective of knowledge type, conceptual knowledge (g = 0.621) and procedural knowledge (g = 0.696) have moderate effects, and students perform better in flipped classrooms than in traditional classrooms, while the difference between the two types of knowledge is not obvious. After the experiment, there is also no difference between flipped classrooms and traditional classrooms in the retention of knowledge (Bouwmeester et al., 2019).
Not all course materials are suitable for students to learn autonomously through instructional videos (Scott, Green, & Etheridge, 2016). Braun, Ritter and Vasko (2014) also found that the flipped classroom method may not be applicable to all topics. However, classroom presentation of new knowledge and problem-based teaching did not receive enough attention in some flipped classrooms. In some studies (e.g., Kostaris, Sergis, Sampson, Giannakos, & Pelliccione, 2017), teachers used the instructional video to invert the classroom completely, leaving little new content for the classroom. However, some topics are easy to understand through video, whereas others are too complex for students to learn this way (Scott et al., 2016). Even if some students watch the video repeatedly, they still cannot understand the knowledge presented in the video course. Therefore, during the experimental intervention, the teacher had to spend extra classroom time going over the concepts. To overcome this cognitive difficulty, the classroom duration of the flipped model was extended from 50 minutes to 100 minutes per day; however, this caused overload for the students and teachers (Anderson et al., 2017; Bouwmeester et al., 2019).

Instructor Equivalence and the Duration of Intervention
As depicted in Table 1, in 53 studies the control group had the same teacher as the experimental group, while in nine studies the two groups had different teachers. Whether the teacher is the same or different, the students' learning performance in the flipped classroom is better than in the regular class, but the effect size with the same teacher (g = 0.72) is greater than that with different teachers (g = 0.346). This suggests that the instructor has little influence on the direction of the effect, but using two different instructors may lead to an instructor bias (Webster & Majerich, 2014). Therefore, in future experiments, researchers should ensure that the teachers in the experimental group are the same as those in the control group. In terms of intervention duration, interventions of 6-10 weeks (N = 1291, g = 1.199) showed a larger effect than those of 1-5 weeks (N = 590, g = 0.632) and more than 10 weeks (N = 7282, g = 0.441). There were 45 studies with one semester as the experimental time, but the effect was not as good as that for 6-10 weeks. Most of the experiments of 1-5 weeks were based on unit learning. In these short interventions, students' learning in the flipped classroom was better than in the traditional classroom (p = 0.004).
The effect size of medium intervention duration was found to be the largest. Anderson et al. (2017) also found a small to moderate effect of flipped classrooms on students' performance after intervention of about 6 months, although long-term gains failed to reach statistical significance. If the intervention time of the flipped classroom is too short, students may not quickly adapt to this teaching method. Some studies (e.g., Hotle & Garrow, 2016;Mason et al., 2013) concluded that when a flipped classroom is implemented, students usually have an adaptation period of about 3 weeks. When they realize that their original learning habits are inconsistent with the current learning mode, they will self-adjust their learning habits. In the first few weeks of the course, students who experienced flipped classes spent the same amount of time on homework activities as the students who experienced traditional classes, and spent less time than the students who experienced traditional classes in a week before the final examination (Bouwmeester et al., 2019). On the other hand, students who learned in a flipped classroom spent more time on homework activities on average than students in a traditional classroom (Bouwmeester et al., 2019). In order to minimize the extra workload bias, Blazquez et al. (2019) reduced the time spent in the flipped classroom by 2 hours, but the score of the experimental group was lower than that of the control group during the short-term intervention, while there was no significant difference in the learning effect of the two groups in the long-term intervention. In addition, the flipped class session was too short, and the students did not have enough time for in-depth discussion. 
However, extending the class time will increase students' learning load, and even reduce the self-efficacy of the students who experienced the flipped classroom to the same level as the students in the traditional classroom at the end of the course (Bouwmeester et al., 2019;Rodriguez et al., 2019).

Conclusion
In this meta-analysis, we identified and extracted the eligible papers in the core collection of the Web of Science database, then coded and analyzed the 63 papers included in the meta-analysis. The overall impact of flipped and traditional classrooms on student academic achievement was analyzed using STATA (Version 12.0). According to the results of the analysis, students' performance in flipped classrooms was better than in traditional classrooms, with an average effect size of 0.621. This indicates that the flipped classroom instructional strategy can effectively improve students' learning performance, which answers the first research question.
The results of this study indicated that learning outcomes varied with specific characteristics: 2) Sample size: the experimental results of samples below 120 were better than those of large samples (greater than 120); 3) Learning stage: flipped classrooms have been applied in K-12 education and higher education, but according to the number of studies, flipped classrooms are more commonly applied in higher education. The impact at the high school stage is lower than that at the elementary school, junior high school, and university stages; 4) Subject area: the flipped classroom teaching effect in the humanities is better than that in the social sciences, natural sciences, and medical education, while the students of engineering education perform better in traditional classrooms; 5) Knowledge type: the influence of knowledge type on the effect of flipped classroom learning is not obvious; 6) Instructor equivalence: the experimental results with the same teacher were better than those with different teachers; 7) Intervention duration: short-term and medium-term interventions are better than long-term interventions (more than 10 weeks).

Implications
When implementing flipped classrooms, teachers have to consider whether students can adapt to and accept the curriculum reform and whether the content is suitable for a flipped classroom. The conclusions of this study provide a broad perspective for relevant researchers and educators to study or implement flipped classrooms. Teachers should consider the number of students (samples below 120 performed better than larger samples), the learning stage (higher education is better than K-12), the subject area (the humanities performed best), and the intervention duration (short-term and medium-term interventions are better).

Research Limitations and Future Research
This meta-analysis has a large heterogeneity in terms of both total analysis and subgroup analysis, like the study of Xu et al. (2019). This phenomenon may be caused by the following factors: 1) the different evaluation methods of students' learning performance; 2) different cases of flipped instruction in schools have different teaching objectives, different content, and different teaching methods in the implementation of the flipped classroom. This study collected as much of the eligible literature as possible in the core collection of the Web of Science database, but more articles would have been found if other databases had also been searched. In this study, only one study conducted in elementary school and two studies conducted in high school were found. Therefore, whether the implementation of flipped classrooms in elementary and high schools is effective remains to be further explored. In the process of coding the articles, we found that most empirical research on the flipped classroom only discussed the students' learning performance, with few papers also discussing the students' learning motivation and learning attitude. In addition, most of the research describes the experimental process and the posttest score, but there is no specific description of the learning materials and the implementation process of the flipped classroom. Therefore, it is difficult to determine whether these factors have an important influence on the students' learning performance. Future research on flipped classrooms can also explore the effects on students' outcomes of the type of learning resources provided by teachers before class, the length or style of videos, whether tests are provided before class, and the form of teacher-student interaction in class. In flipped classrooms, students play the main role while teachers play the secondary role.
However, the transfer of learning responsibility makes students feel that the workload is too heavy, which is also a great challenge for teachers. How to reduce the burden on students and teachers is a problem worth paying attention to in further studies.