Interactive Graphics for Presentation and Exploration of Student Performance Profiles in IGLU / TIMSS 2011 Educational Surveys

This paper advocates the application of interactive graphics as a qualitative research method for comparative large-scale assessments that supplements and extends analytical techniques. Graphics can be easily generated for effective display of information, and interaction is essential for exploring data in a flexible and controlled manner. The national results of the common sample of the large-scale educational assessment studies Internationale Grundschul-Lese-Untersuchung (IGLU)/Trends in International Mathematics and Science Study (TIMSS) 2011 are used to demonstrate the effectiveness of interactive graphics. The performance profiles of elementary school students in the competency domains of reading, mathematics, and science are analyzed. Furthermore, spineplots are used to investigate the background characteristics of students of different performance types in order to identify possible educational disadvantage. The study aims to simplify data exploration before reporting, and to present the findings obtained from comparison studies on school achievement more effectively.

Interactive graphics make it possible to analyze complex data sets and to illustrate the data structure in a flexible and efficient manner. Information can be extracted from the data effectively and with little effort via interactive graphics [8]. In addition, these graphics can be easily generated by various software packages currently available. For this purpose, the free, general-purpose visualization software Mondrian (http://stats.math.uni-augsburg.de/mondrian) is recommended and used in this paper ([9] [10]; see also [11]).
Ref. [12] outlines the following chief advantages and functions of an interactive graphical system: 1) Queries: Interactive graphics retrieve exact or hidden information from a graphic in different ways; 2) Selection: Interactive graphics can effectively compare groups and select data in a variety of ways with a wide range of tools/methods; 3) Highlighting: Through interactive graphics, the user can "link" each selection to all representations of the data and make direct comparisons; 4) Modification of graphic parameters: The parameters of interactive graphics can be varied and adjusted quickly and efficiently.
Based on the reported national (German) scaling results of the common sample of the IGLU/TIMSS 2011 educational surveys, interactive graphics are used in this study to visualize and explore data. To this end, appropriate interactive graphics are used to analyze the main educational aspects or research questions. To the best of our knowledge, publications on this topic for large-scale assessments are lacking, and the present article attempts to fill this gap. The software package Mondrian is primarily used to visualize and explore the educational research data; one plot was produced in R [13].
The software program Mondrian helps visualize several data types in various forms. Categorical data can be easily visualized as interactive bar charts, spineplots, and mosaic plots. Continuous data can be displayed as interactive histograms, spinograms, scatterplots, parallel coordinate plots, and box plots. These graphics are described in the following sections. The key benefits of Mondrian are its high level of interactivity and problem-free handling of large data sets, as well as its capability to interact with R to allow for numerical computations from within Mondrian ([9] [10] [14]).

General Information
The International Association for the Evaluation of Educational Achievement (IEA) evaluates the reading comprehension of fourth-grade students once every five years and their mathematical and scientific competencies once every four years. The PIRLS/IGLU and TIMSS surveys are used for these evaluations, respectively. Both the IGLU and TIMSS surveys were implemented simultaneously in Germany for the first time in 2011. The representative German sample comprised 3928 students from 197 elementary schools. Domain-specific student performances were classified based on a scale of five competency levels.
For analyzing particular competencies across domains, the performance test values were scaled jointly with a multidimensional item response model (mixed-coefficients multinomial logit model; for details, see [15]). In addition, the national performance test values were scaled on a standardized metric (mean of 300 and standard deviation of 100). In an international context, German students performed in the upper third in both surveys. Hence, Germany's performance was higher than the average of the participating Organisation for Economic Co-operation and Development (OECD) member states [5] [6].
However, Germany showed no significant improvements in IGLU 2011, compared with the 2001 and 2006 surveys. Although the social background significantly affected reading comprehension, students with migrant backgrounds did not show a significantly higher reading performance in the 2011 survey [5]. A similar trend can also be observed in the 2011 TIMSS survey. Students showed no significantly improved competencies in either the mathematics or the science domain, compared with the previous surveys. However, social background did appear to have a significant negative effect, as in the case of the IGLU surveys. Nevertheless, it is worth noting that the performance range between weaker and stronger students was significantly reduced. Furthermore, the three domains did not show significant disparities in terms of gender [6].
This paper highlights the advantages of using interactive graphics to analyze comparison studies on school achievement. The aim is to simplify data exploration before the actual reporting, and to graphically present the results.
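The standardized metric mentioned above can be illustrated with a simple linear rescaling. The following Python sketch assumes that the transformation amounts to a z-standardization followed by a shift to a mean of 300 and a stretch to a standard deviation of 100; the function name `to_national_metric`, the use of the population standard deviation, and the example values are assumptions made for illustration. The surveys themselves derive the reported values from the item response model, not from a handful of raw scores.

```python
from statistics import fmean, pstdev


def to_national_metric(scores, target_mean=300.0, target_sd=100.0):
    """Linearly rescale scores to a given mean and standard deviation."""
    mean = fmean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("scores must not all be identical")
    return [target_mean + target_sd * (x - mean) / sd for x in scores]


# Hypothetical ability estimates for five students (not survey data).
raw = [-1.2, -0.4, 0.0, 0.5, 1.1]
scaled = to_national_metric(raw)
```

Because the transformation is linear, the rank order of the students and the shape of the distribution are unchanged; only the unit of the scale differs.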

Results of Latent Profile Analysis
In addition to the identified performance test values in the three domains, a latent profile analysis (LPA) was conducted. LPA was performed based on the estimated plausible value performance results of the students, with the software Latent GOLD (http://statisticalinnovations.com/products/latentgold.html).
Here, the relationship among the different scales and subscales of IGLU and TIMSS 2011, as well as student groups with similar/different performance or competency profiles, can be determined [16] [17]. Hence, students with similar types of competencies are grouped homogeneously via LPA. The Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) were computed for LPA models with up to ten performance profiles. An LPA model with seven performance profiles was selected based on these criteria (for details, see [18] [19]).
The proportions and performance means of the seven identified performance profiles are presented in Table 1 and Figure 1. The performance means of the competency domains show a somewhat homogeneous pattern within the performance types (cf. [20]).
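The model selection step can be sketched in a few lines of Python. The sketch uses the standard definitions BIC = -2 log L + k ln(n) and CAIC = -2 log L + k (ln(n) + 1), where k is the number of free parameters and n the sample size. The log-likelihoods and parameter counts below are hypothetical placeholders, not the values obtained with Latent GOLD, and the helper names are chosen for illustration only; the sample size of 3928 is taken from the text.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def caic(log_likelihood, n_params, n_obs):
    """Consistent Akaike information criterion (smaller is better)."""
    return -2.0 * log_likelihood + n_params * (math.log(n_obs) + 1.0)


def select_model(fits, n_obs, criterion=bic):
    """Return the number of profiles whose model minimizes the criterion.

    fits maps the number of profiles to (log_likelihood, n_params).
    """
    return min(fits, key=lambda k: criterion(*fits[k], n_obs))


# Hypothetical fit results for models with one to four profiles.
fits = {
    1: (-71000.0, 6),
    2: (-69500.0, 13),
    3: (-69100.0, 20),
    4: (-69095.0, 27),
}
best_by_bic = select_model(fits, n_obs=3928)
best_by_caic = select_model(fits, n_obs=3928, criterion=caic)
```

In this toy example the fourth profile improves the log-likelihood only marginally, so the penalty terms of both criteria favor the three-profile model.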

Education-Related Questions
The chief educational aspects or questions are explored and subsequently analyzed with appropriate interactive graphics.
"Educational disadvantage" is the main factor analyzed in comparison studies on school achievement. Although ideally every person has access to the same opportunities towards his or her educational goal [21]-[23], students with migrant background, in particular, are disadvantaged in the German education sector according to comparison studies on school achievement such as PIRLS, TIMSS, or PISA [5] [6].
Table 1. Proportions and performance means of the seven performance profiles identified over the competency domains.
The present article examines the potential educational disadvantage for students in light of their relevant cultural and socioeconomic characteristics, as well as their migration background. In this context, the relationships among student performances and several relevant background characteristics are graphically analyzed. Moreover, homogeneity within the particular identified performance profiles is analyzed, which depends on the students' abilities jointly over all three competency domains. On the one hand, the degree of homogeneity within the particular performance profiles can be illustrated and analyzed efficiently through interactive graphics. On the other hand, students with relative performance strengths or weaknesses can be easily identified via linked graphics. This straightforward procedure can, for instance, identify students with individual learning disabilities or excellent performances in only one domain [24] [25]. In practice, such "special cases" require tailored educational programs. Finally, the domains and the performance profiles are compared, considering the relevant background characteristics.
The central educational aspects and research questions are summarized in Table 2. They are addressed with the appropriate interactive graphics in the subsequent sections.

Identification of Educational Disadvantage by Means of Background Characteristics
Considerable data on background characteristics are available, in addition to the performance test values assessed in the IGLU/TIMSS 2011 surveys. These comprise, inter alia, cultural and social background characteristics, as well as subject-specific attitudes and self-concepts.
This section focuses on the degree of difference among the seven identified performance types, relative to certain characteristics such as gender or cultural and social properties. In particular, the potential educational disadvantage for students can be graphically explored in terms of their relevant background information [21]-[23]. All comparison studies on school achievement carried out so far in Germany have found that some students do not master basic skills in reading, writing, and arithmetic. In an ideal education system, every person has the same opportunities to achieve his or her educational goal. In reality, however, some are disadvantaged in this respect, in particular students with fewer personal, social, financial, or cultural resources (see also [26]).
This section aims to graphically determine the potential educational disadvantage for students based on their relevant background characteristics. The spineplot, produced easily in Mondrian, can be used to this end.

Table 2. Central educational aspects and research questions, with the interactive graphics used to explore them.
Characterization of performance types and identification of educational disadvantage. Questions: Can educational disadvantage for students be explored using relevant background characteristics? Is there a relationship between missing values in the background variables and performance profiles? Graphic: spineplot.
Homogeneity. Question: Can homogeneous groups of students be identified with similar performance patterns within the performance profiles?
Misallocations. Question: Can misallocations to the performance profiles be identified?
Relative performance strengths and weaknesses. Question: Can relative performance strengths and weaknesses be identified for individual students? Graphics: histograms, parallel coordinate plot.
Comparison of domains and performance profiles. Question: Can differences between the domains and performance profiles be identified regarding relevant background characteristics? Graphic: trellis plot.

Spineplots are similar to conventional bar charts. However, because the length of the bars is uniformly scaled in a spineplot, the conditional probabilities of interest can be compared directly [9] [27].
The graphical results of the descriptive relationships among student performances and selected background characteristics are summarized in Figure 2.

Risk of poverty of family: The educational disadvantage of students at risk of poverty can clearly be explored with the variable risk of poverty of family. In particular, a comparison of the estimated probabilities P(no risk of poverty | performance profile X_i) and P(risk of poverty | performance profile X_i) across the profiles shows a consistent, monotone relationship. Families of students with higher performance profiles are at a lesser or no risk of poverty (Figure 2). For example, P(no risk of poverty | performance profile 1) = 0.22 versus P(no risk of poverty | performance profile 7) = 0.79. Similar results are obtained when the performance profile is considered conditional on the students' risk of poverty. The results indicate that the probability of belonging to a higher performance profile is greater for students not exposed to the risk of poverty (Figure 3).
Migration background of parents: Similar patterns of educational disadvantage were also identified based on the variable migration background of parents. In particular, both parents of students with a higher performance profile were significantly more likely to have not been born abroad (Figure 2). As an example, the following estimated probabilities can be considered: P(both parents not born abroad | performance profile 1) = 0.23 versus P(both parents not born abroad | performance profile 7) = 0.86. Moreover, an educational disadvantage can be identified when the probabilities of belonging to a performance profile are analyzed in terms of the migration background of both parents. The results show that students were more likely to belong to a higher performance profile if both parents were not born abroad (Figure 3).
The graphical results for further background variables can be found in Appendix A.1 (see Figure A1 and Figure A2).
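The conditional probabilities that a spineplot displays can also be computed numerically as a cross-check. The following Python sketch estimates P(category | performance profile) as relative frequencies within each profile. The function name and the toy records are hypothetical and serve only to illustrate the computation; they do not reproduce the IGLU/TIMSS 2011 data.

```python
from collections import Counter, defaultdict


def conditional_proportions(records):
    """Return {profile: {category: P(category | profile)}}.

    records is an iterable of (profile, category) pairs.
    """
    counts = defaultdict(Counter)
    for profile, category in records:
        counts[profile][category] += 1
    return {
        profile: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for profile, cats in counts.items()
    }


# Hypothetical toy records: (performance profile, risk of poverty of family).
records = (
    [(1, "risk")] * 3 + [(1, "no risk")] * 1
    + [(7, "risk")] * 1 + [(7, "no risk")] * 4
)
proportions = conditional_proportions(records)
```

Swapping the order of the two elements in each pair yields the reverse direction, that is, the probabilities of belonging to a performance profile given a background characteristic, which corresponds to the view in Figure 3.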

Response Tendencies: Missing Values within Background Variables and Profiles
Although not considered previously, missing values within background variables have been found to be very important in properly evaluating large-scale assessments such as IGLU, TIMSS, or PISA, as they may generally lead to distorted results. Presently, several statistical methods are available to overcome this limitation. The multiple imputation of missing values is most frequently used in educational sciences (e.g., [28] [29]). However, the effects and implications of different imputation methods for large-scale assessments have been rarely studied in depth. With the present study's aim of visualization, plots to explore the structure of missing values are presented, which may aid in similar statistical modeling.
In this respect, the structure of the missing values and their proportions within particular variables must be studied. The missing value plot can be produced with the software Mondrian for this purpose. The key advantage of the missing value plot is its ease of use and interpretation, which is addressed in this study. Simple graphical representations of the proportions of missing values within background variables of interest are presented in Figure 4.
The various factors giving rise to missing values are not discussed in this paper. Instead, the present paper aims to identify the potential response tendencies for selected background variables, especially in terms of classifying students into the specified performance profiles. In this respect, spineplots were used to explore the probabilities of a missing value appearing under a particular performance profile, that is, P(missing value [in a variable] | performance profile X_i). The corresponding graphical explorations for the background variables risk of poverty of family and migration background of parents are reported in Figure 5.
Comparable tendencies with respect to the response behaviors of the students can be identified. Students with lower performance profiles are more likely to have missing values within the two background characteristics. For example, P(missing value | performance profile 1) is greater than P(missing value | performance profile 7). The probability of occurrence of a missing value steadily decreases with higher performance profiles. Similar results were observed with other background variables as well. Therefore, missing values do not seem to be random in this context. Further, the obtained information can be useful in analytical modeling approaches to dealing with missing values, for example.
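The response tendency described above can be checked with a short computation of P(missing value | performance profile). The Python sketch below assumes that a missing value is coded as `None`; the function names and the toy records are hypothetical, and the monotonicity check is a descriptive aid, not a statistical test of the missing-data mechanism.

```python
def missing_rate_by_profile(records):
    """Return {profile: share of missing values}, ordered by profile.

    records is an iterable of (profile, value); value is None if missing.
    """
    totals, missing = {}, {}
    for profile, value in records:
        totals[profile] = totals.get(profile, 0) + 1
        if value is None:
            missing[profile] = missing.get(profile, 0) + 1
    return {p: missing.get(p, 0) / totals[p] for p in sorted(totals)}


def is_non_increasing(rates):
    """True if the missing rate never rises from one profile to the next."""
    values = [rates[p] for p in sorted(rates)]
    return all(a >= b for a, b in zip(values, values[1:]))


# Hypothetical toy records: (performance profile, background variable value).
records = [
    (1, None), (1, None), (1, "risk"), (1, "no risk"),
    (4, None), (4, "risk"), (4, "no risk"), (4, "no risk"),
    (7, "no risk"), (7, "no risk"), (7, "risk"), (7, "no risk"),
]
rates = missing_rate_by_profile(records)
decreasing = is_non_increasing(rates)
```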

Homogeneity of Performance Profiles
Via LPA, students with similar abilities in the IGLU/TIMSS 2011 surveys were clustered and classified into highly homogeneous groups (e.g., [16]). Subsequently, homogeneity or deviations thereof within the computed performance profiles are presented graphically.
The degree of homogeneity within profiles can be analyzed in various ways with the help of interactive graphics. The parallel coordinate plot [30]-[32] is used to display the student performance test values over all three domains. Homogeneity within a particular performance profile is then investigated by selecting and highlighting the target profile. On the one hand, multidimensional data can be visualized in two-dimensional space, where each of the domains is represented by a parallel vertical axis and the values of the cases/students are joined by a sequence of line segments. On the other hand, the selected profiles can be easily linked to the parallel coordinate plot.
The performance test values of the students over all three domains with respect to the selected performance profiles 1, 3, 5, and 7 (left: reading dimension; center: mathematics dimension; right: science dimension) are presented in Figure 6. The performance test values related to the seven performance profiles are visualized in Appendix A.2 in more detail (see Figure A3).
Performance profile 1 is particularly striking. Some cases that are not homogeneous within that profile can be detected graphically. One extreme case, marked by an arrow in Figure 6, can be identified within performance profile 1. Given its higher test values in the mathematics and science domains, this case arguably should not have been allocated to this profile. Such cases indicate that the LPA classification allocated some students to the wrong profile. Further research is warranted on alternative methods or a more fine-grained LPA based on higher dimensions, such as an eight- or ten-dimensional competency model underlying the student responses. Interestingly, the relative performance strengths or weaknesses of students lead to their misclassification. This issue is addressed in the following section.

Misallocation in Performance Profiles
Misallocations within the performance profiles must be identified to maintain the high degree of homogeneity and to counteract a heterogeneous pattern within particular profiles. The parallel box plot can be used for this purpose (e.g., [9]), as displayed in Figure 7. The performance test values of the students are presented as parallel box plots, and the individual performance profiles are selected and highlighted therein.
The parallel box plots over the three competency domains for the corresponding profiles 1, 3, 5, and 7 are presented in Figure 7 (all seven performance profiles are visualized in detail in Figure A4 of Appendix A.2). As can be seen, misallocations can be identified individually within each performance profile, albeit to different degrees. For example, the arrows in Figure 7 indicate two misallocated cases.
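Candidate misallocations that stand out in a box plot can also be flagged numerically. The Python sketch below applies the common box plot convention of marking values more than 1.5 interquartile ranges beyond the quartiles; whether Mondrian uses exactly this whisker rule and quartile definition is an assumption here, and the function name and the test values are hypothetical. A flagged case is only a candidate for closer inspection, not proof of a misallocation.

```python
from statistics import quantiles


def boxplot_outliers(values, whisker=1.5):
    """Return the indices of values outside the box plot whiskers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical mathematics test values of eight students in one profile.
math_profile_1 = [180, 190, 200, 205, 210, 215, 220, 420]
flagged = boxplot_outliers(math_profile_1)
```

Applying the function to each domain within a selected profile mirrors the selection and highlighting of one profile across the parallel box plots.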

Identification of Relative Performance Strengths and Weaknesses
Interestingly, homogeneity within profiles may be distorted by the presence of students with relative performance strengths or weaknesses, such as students with learning difficulties or excellent performances in merely one domain (e.g., see [24] [25]). These students may significantly distort homogeneity within performance profiles. Students with relative strengths or weaknesses require a tailor-made education program that effectively targets their specific domains (e.g., see [33]). Therefore, such "special individual cases" must be accurately identified in educational contexts.
The relative performance strength for the reading domain is graphically illustrated using the software Mondrian (see Figure 8). Similarly, the competency domains of mathematics and science can be visualized and the relative performance weaknesses of students graphically identified as well. This is based on the histograms of the performance test values on the one hand and the parallel coordinate plot of the test values over the three domains on the other [30] [32]. The interactive nature is beneficially employed for the so-called stepwise selection: First, a relatively high performance test value in the reading domain is selected for the histogram display (Figure 8, Step 1). Next, a corresponding and comparatively low performance test value in the mathematics domain is selected via the and-function implemented in Mondrian (Figure 8, Step 2). This is repeated for the science domain as the last step (Figure 8, Step 3). As a result, the selected relative performance strengths can be read from the parallel coordinate plot using the linking-function of the software Mondrian.
As presented, the graphical procedure identifying students with relative performance strengths or weaknesses may facilitate subsequent and more advanced analyses. For instance, the possible causes for the detected relative performance strengths and weaknesses can be examined in depth with additional linked graphics for interesting background characteristics.
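The stepwise selection can be mimicked outside Mondrian as a chain of filters combined with a logical "and". The Python sketch below is only an analogue of the interactive procedure: the thresholds `high` and `low`, the function name, and the student records are hypothetical, whereas in Mondrian the selection ranges are chosen interactively on the histograms.

```python
def select_relative_strength(students, high=400.0, low=300.0):
    """Select students who are strong in reading but weak elsewhere."""
    # Step 1: relatively high test value in the reading domain.
    step1 = [s for s in students if s["reading"] >= high]
    # Step 2: and a comparatively low value in the mathematics domain.
    step2 = [s for s in step1 if s["mathematics"] <= low]
    # Step 3: and a comparatively low value in the science domain.
    return [s for s in step2 if s["science"] <= low]


# Hypothetical student records (not survey data).
students = [
    {"id": "A", "reading": 450.0, "mathematics": 280.0, "science": 290.0},
    {"id": "B", "reading": 460.0, "mathematics": 440.0, "science": 430.0},
    {"id": "C", "reading": 250.0, "mathematics": 260.0, "science": 240.0},
    {"id": "D", "reading": 420.0, "mathematics": 295.0, "science": 380.0},
]
selected_ids = [s["id"] for s in select_relative_strength(students)]
```

Reversing the comparisons (low reading, high mathematics and science) would select relative weaknesses in the reading domain in the same way.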

Comparison of Domains and Performance Profiles
The reading, mathematics, and science domains and the identified performance profiles were compared, considering the relevant background characteristics.
The so-called trellis plot, which can be produced with the R package lattice [34], can efficiently display this comparison graphically. Trellis plots can be used to represent multivariate data. The trellis plot is based on visualizing conditional plots [34]-[36], such that a single graphic displays different information. For instance, the competency domains and performance values can be compared with each other. The distributions of performance test values with respect to particular background characteristics can be contrasted, or the degree of homogeneity within the performance profiles can be explored. In the example of this paper (see Figure 9), the trellis plot is used to visualize and interpret the domains and performance profiles relative to the background variable risk of poverty of family. In Figure 9, the performance profiles are represented as the columns and the domains as the rows.
Students exposed to the risk of poverty were found to be clearly disadvantaged educationally with respect to the background variable risk of poverty of family, as derived from the graphic (cf. also Section 4.1). Educational disadvantage can be identified within all three domains, as well as over most of the performance profiles, with few exceptions. In particular, students not exposed to the risk of poverty achieve comparatively higher performance test values than those who are. The exceptions were students exposed to the risk of poverty within lower performance profiles. Some of those students had higher test values in the reading and mathematics domains. This may be partly explained by the comparatively higher proportions of students exposed to the risk found in the lower performance profiles. However, students within the upper performance profiles who are not exposed to the risk of poverty achieved higher performance test values.
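The conditioning that underlies the trellis plot can be sketched as a grouping of the test values into panels, with the domains as rows and the performance profiles as columns, and a split by the background variable within each panel. The following Python sketch computes panel medians as a numerical companion to the graphic; it does not reproduce the lattice plot of Figure 9, and the function names and student records are hypothetical.

```python
from collections import defaultdict
from statistics import median

DOMAINS = ("reading", "mathematics", "science")


def trellis_panels(students):
    """Return {(domain, profile): {risk group: [test values]}}."""
    panels = defaultdict(lambda: defaultdict(list))
    for s in students:
        for domain in DOMAINS:
            panels[(domain, s["profile"])][s["risk"]].append(s[domain])
    return panels


def panel_medians(panels):
    """Summarize each panel by the median test value per risk group."""
    return {
        key: {group: median(vals) for group, vals in groups.items()}
        for key, groups in panels.items()
    }


# Hypothetical student records (not survey data).
students = [
    {"profile": 1, "risk": "risk", "reading": 200.0, "mathematics": 210.0, "science": 190.0},
    {"profile": 1, "risk": "no risk", "reading": 220.0, "mathematics": 215.0, "science": 205.0},
    {"profile": 1, "risk": "no risk", "reading": 240.0, "mathematics": 225.0, "science": 215.0},
    {"profile": 7, "risk": "no risk", "reading": 430.0, "mathematics": 440.0, "science": 450.0},
]
medians = panel_medians(trellis_panels(students))
```

Comparing the two risk groups within each panel corresponds to reading one cell of the trellis display; panels that contain only one group indicate where the comparison rests on little or no data.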

Discussion
This paper presents the use of interactive graphics to empirically analyze the scaling results obtained from large-scale educational assessment studies. The efficiency of interactive graphics has been demonstrated using empirical data from the IGLU/TIMSS 2011 surveys. Interactive graphics offer an easy, intuitive, and effective method of investigating substantial educational questions or hypotheses.
It is important to note that interactive graphics do not replace analytical scaling procedures such as item response theory. Graphical and numerical/analytical approaches are complementary, rather than opposing. The information obtained from visual data exploration may serve as a more qualitative reference for assessing the plausibility of results derived from inferential statistical methods. In this sense, visualization cannot be considered in the same vein as statistical confirmatory procedures. Nevertheless, and probably because of this, interactive graphics enable one to "recognize or see" relatively complex questions effectively. This method holds future research prospects. Interactive graphics can be very useful in large-scale assessments, where large quantities of data are available and collected over several survey cycles. For example, data can be analyzed more completely with a combination of item response theory and graphics through mixed-methods strategies. Qualitative and quantitative evidence must be combined systematically, so that they supplement each other (e.g., [37]-[39]). To date, a mixed-methods evaluation program combining interactive graphics and item response theory for large-scale assessments has not been developed; this can be considered in further research.
Classification may also be addressed in future research. In this regard, several available methods of classification have not yet been applied in large-scale assessments, where the complementary use of interactive graphics can prove valuable. From an educational perspective, students with relative performance strengths or weaknesses must be systematically identified with specifically designed procedures for determining these "rare or extreme cases." In addition, a more fine-grained subdivision or granularity of the competency domains may be necessary for a more accurate search for peculiar performance patterns. In this respect, the comprehension processes and cognitive demands of the students within the individual domains could also be considered during the analyses. These investigations can certainly benefit from the use of interactive statistical graphics.

Conclusion
In conclusion, graphics have the following features: they are generated rapidly, intuitive and relatively simple to understand, and capable of developing their own dynamics during data analysis, especially due to their nonnormative/descriptive nature. The effectiveness of graphics is well known. Results are often presented or expressed visually. It follows then that graphics can be systematically and profitably integrated into educational science and research, in addition to complementing previous numerical methods. It is hoped that the present paper can contribute to a research basis in this direction.

Figure 1. Performance profiles of students in Germany with regard to the three competency domains.
[…] Evaluations of the […] PISA survey and the […] IGLU survey from the year 2006 clarify the persistent link between education and educational opportunities, in which particularly children with migration background are disadvantaged ([22], p. 92).

Figure 2. Identification of proportions of background characteristics within performance profiles by means of bar charts and spineplots (from characteristic values of the variables risk of poverty of family and migration background of parents to performance profiles).

Figure 3. Comparison of profile assignments based on the background variable values by means of bar charts and spineplots (from performance profiles to characteristic values of the variables risk of poverty of family and migration background of parents).

Figure 4. Missing value plot of selected background variables.

Figure 5. Exploration of response tendencies by means of bar charts and spineplots (from missing values of the variables risk of poverty of family and migration background of parents to performance profiles).

Figure 6. Exploration of homogeneity within the performance profiles by means of parallel coordinate plots (selection of performance test values over the performance profiles 1, 3, 5, and 7).

Figure 7. Exploration of misallocations within the performance profiles by means of parallel box plots (selection of performance test values over the performance profiles 1, 3, 5, and 7).

Figure 8. Graphical procedure for identifying relative performance strength in the reading domain by means of histograms and parallel coordinate plot (stepwise selection of performance test values).

Figure 9. Comparison of domains and performance profiles using the trellis plot.