Anti-Robinson Structures for Analyzing Three-Way Two-Mode Proximity Data

Multiple discrete (non-spatial) and continuous (spatial) structures can be fitted to a proximity matrix to increase the information extracted about the relations among the row and column objects vis-à-vis a representation featuring only a single structure. However, using multiple discrete and continuous structures often leads to ambiguous results that make it difficult to determine the most faithful representation of the proximity matrix in question. We propose to resolve this dilemma by using a nonmetric analogue of spectral matrix decomposition, namely, the decomposition of the proximity matrix into a sum of equally-sized matrices, restricted only to display an order-constrained patterning, the anti-Robinson (AR) form. Each AR matrix captures a unique amount of the total variability of the original data. As our ultimate goal, we seek to extract a small number of matrices in AR form such that their sum allows for a parsimonious, but faithful reconstruction of the total variability among the original proximity entries. Subsequently, the AR matrices are treated as separate proximity matrices. Their specific patterning lends them immediately to the representation by a single (discrete non-spatial) ultrametric cluster dendrogram and a single (continuous spatial) unidimensional scale. Because both models refer to the same data base and involve the same number of parameters, estimated through least-squares, a direct comparison of their differential fit is legitimate. Thus, one can readily determine whether the amount of variability associated which each AR matrix is most faithfully represented by a discrete or a continuous structure, and which model provides in sum the most appropriate representation of the original proximity matrix. We propose an extension of the order-constrained anti-Robinson decomposition of square-symmetric proximity matrices to the analysis of individual differences of three-way data, with the third way representing individual data sources. An application to judgments of schematic face stimuli illustrates the method.


Introduction
The structural representation of proximity data has always been an important topic in multivariate data analysis (see, for example, the monograph by Hubert, Arabie, and Meulman [1]).The term proximity refers to any numerical measure of relation between the elements of two (possibly distinct) sets of objects.Proximities are usually collected into a matrix, with rows and columns representing the objects.Our concern here is only with square-symmetric proximity matrices, within the taxonomy of Carroll and Arabie [2] referred to as two-way one-mode proximity data; the way relates to the dimensionality of the matrix, and the mode to the number of sets of entities the ways correspond to.Proximities are assumed to be nonnegative, and WLOG, are henceforth interpreted as dissimilarities such that larger numerical values pertain to less similar pairs of objects.
The (casual) meaning of structural representation is captured by colloquialisms like "what goes with what" or "what is more or less than".Mathematically, structural representation is defined here either in terms of a discrete non-spatial model or a continuous spatial model.
The former displays the relation between the row and column objects of the proximity matrix as a graph, with objects represented by nodes and dissimilarities fitted by edges of different length connecting the object nodes.The focus here is on the ultrametric tree structure, with objects arranged into nested categories-a display that is often referred to as a hierarchical cluster diagram.The continuous spatial model used here is the unidimensional scale: the interobject relations are translated into an ordering of objects along a line, with the dissimilarities fitted by the distances between objects.Both models are directly comparable for a given proximity matrix because they involve the same number of parameters estimable through least-squares minimization.Substantively, the two models address distinct perspectives on how objects are mentally represented.
Carroll [3] introduced the idea of fitting a proximity matrix by multiple continuous and discrete structures to increase the information extracted about the relations among the row and column objects vis-à-vis a representation featuring only a single structure (see also [1]- [6]).However, it is often difficult to determine which particular set of multiple continuous and discrete structures provides the most faithful representation of a given proximity matrix.To illustrate, assume that for a given proximity matrix two (continuous) unidimensional scales provide superior overall fit as compared to a representation by two (discrete) ultrametric tree structures.At the same time, the first ultrametric tree structure attains substantially higher fit than the first unidimensional scale.We can only directly compare those two first structures.The second structures refer to different data bases.So, should we ultimately choose the discrete or the continuous representation?
We propose to resolve this ambiguity by using a nonmetric analogue of spectral matrix decomposition, namely, the additive decomposition of a proximity matrix into equally-sized matrices, constrained only to display a specific patterning among the cell entries called the anti-Robinson (AR) form.The AR form of a matrix is characterized by an ordering of its cells such that their entries are never-decreasing when moving away from the main diagonal [7].As our ultimate goal, we seek to extract a small number of matrices in AR form such that their sum allows for a parsimonious, but faithful reconstruction of the total variability among the original proximity entries.
But why use the AR decomposition of the proximity matrix instead of the well-known spectral decomposition?A remarkable result in combinatorial data analysis states that fitting a unidimensional scale or an ultrametric tree structure to a square-symmetric proximity matrix generally produces matrices of distance and path length estimates that can be permuted into a perfect AR form [1]. Within this context, order-constrained AR matrix decomposition attains the status of a combinatorial data analytic meta-technique.Given their specific pattering, each extracted AR component relates immediately to the representation by a single unidimensional scale and a single ultrametric tree.As both models are directly comparable for each AR matrix, we can determine which structure provides a superior representation of the associated amount of variability-a feature not available when fitting multiple structures to a data matrix without order-constrained AR decomposition.
The extension of order-constrained AR matrix decomposition to accommodate three-way data for studying individual differences provides the analyst with an instrument to explore complex hypotheses concerning the appropriateness of continuous or discrete stimuli representations from an interindividual as well as an intraindividual perspective.We advocate an approach guided by a principle common in statistics as well as of immediate intuitive appeal, namely, to analyze individual variability within a deviation-from-the-mean framework.First, the individual proximity matrices are aggregated across sources into a single average matrix that is then fit by a small number of AR components (say, two or three).The identified AR decomposition serves as a frame of ref-erence against which the individual proximity matrices are fit in a confirmatory manner, informing us how well the various individuals are represented by the reference decomposition.In addition, for each individual source, the obtained (comfirmatory) AR components themselves can be fit by unidimensional and ultrametric structures.Thus, we obtain (1) an interindividual assessment to what extent the various sources conform to the strucutral reference representation, and (2) an intraindividual comparison whether the distinct (confirmatory) AR matrix components are better represented in terms of a (continuous) unidimensional or (discrete) ultrametric model, with possible implications about the underlying individual cognitive representations of the stimuli.
The next section provides a summary of order-constrained AR matrix decomposition, while its generalization to accommodate three-way two-mode data is introduced in the third section.We conclude with an illustrative application of three-way order-constrained AR matrix decomposition for analyzing the structure of individual differences in judgments of schematic face stimuli.O O ∈ .Thus, in the following, we will consider solely the anti-Robinson (AR) form of proximity matrices, and their representation through sums of matrices of the according pattern.A square-symmetric matrix

Order-Constrained AR Matrix Decomposition
for 1 1 and for 2 for Verbally stated, moving away from the main diagonal of Q , within each row or column, the entries never decrease.
We seek a decomposition of P into the sum of matrices 1 , with K at most equall to N , such that the least-squares loss function or 2 L -norm is minimized: subject to the constraint of all k Q having AR form, where tr denotes the trace function, defined for an arbitrary The task of identifying an additive decomposition of P is addressed by fitting − − P Q Q , and so on.However, the given proximity data matrix P initially does not have AR form.Therefore, the rows and columns of P need first to be re-arranged into a form that matches the desired AR pattern as closely as possible through identifying a suitable permutation ( ) ∑ Q represents a two-fold (constrained) least-squares minimization problem; first, an operation to find a collection of permutations 1 , , K ρ ρ  , for attempting to reorder the matrices into AR form, and second, an estimation step to numerically identify the desired AR components conforming to the order constraints as established through the respective permutations.

Optimal AR Decomposition: Algorithmic and Computational Details
The first task of searching for an optimal collection of permutations, 1 , , K ρ ρ


, represents an NP-hard combinatorial optimization problem, solvable by optimal solution strategies for object sets  of small size, but calling for heuristic approaches for larger N , yet with no guarantee of identifying the globally-optimal permutation.Note that for each component k the number of distinct permutations equals ! 2 N because symmetry of P precludes the necessity of considering reverse permutations.The search for the collection of optimal permutations, 1 , , K ρ ρ * *  , can be carried out through a quadratic assignment (QA) heuristic.In general, solving the QA problem for two given N N × matrices , requires finding a permutation ρ of the rows and columns of A for fixed B such that the sum of the products of corresponding cells is maximized: , with 1 i j N ≤ ≤ ≤ , which yields a perfect and equally spaced AR structure, and let The fixed matrix B serves as the AR target: As the algorithm iteratively tries to maximize ( )


, are also at least locally-optimal.Alternative strategies for solving the QA problem include dynamic programming [10] and branch-and-bound algorithms [11] [12].
The second optimization task of estimating the least-squares approximation Q (in perfect AR form) to ρ P (denoting the permuted matrix), is carried out through iterative projection (IP) as proposed by Dykstra [13] [14].In general, IP offers a computational strategy for solving constrained least-squares minimization problems.Let n X ∈  (with n ∈  ) denote a complete inner product space, also called a Hilbert space.The least-squares approximation to a vector X ∈ x is sought from a closed convex set C X ∈ in the form of a vector ˆC * ∈ x minimizing the least-squares criterion, ( ) ( ) ˆ′ − − x x x x , subject to linear constraints imposed on x as defined by C .Conceptually, ˆ * x can be found directly by projecting x onto C ; in practice, however, this may pose extreme computational demands.Dykstra's solution is as simple as it is elegant: based on the decomposition of the constraint set x is initialized by setting ( ) 0 = x x, followed by the projection of ( ) x , in turn projected onto 2 C , yielding ( ) x to be projected onto 3 C , and so on.The difference between consecutive projections ( ) x is termed the increment, or residual.The algorithm concludes its first cycle of projections onto sets 1 , , W C C  , with the projection of ( ) x , the input vector for the second projection cycle through . However, from the second cycle on, each time 1 , , W C C  are revisited in subsequent cycles, the increment from the previous cycle associated with that particular set has to be removed from the vector before actually proceeding with the projection.
Recall that, given the actual numerical entries in P , ρ * indicates the best possible reordering of the N rows and columns of P into a form matching an AR pattern.Our goal is to estimate an N N × least-squares approximation P , which has perfect AR form.Notice that the arrangement of the N objects in Q has already been determined by ρ * .Thus, we can expand Equation (1) into a collection of ( )( ) , which the desired numerical estimates ij q have to satisfy.Each w C represents a single inequality of the form ( ) Observe that the traceform of the loss function is equivalent to the earlier presented generic form ( ) ( ) p and q by stacking the matrix columns into a vector, yielding the loss function ( ) ( ) , the IP algorithm proceeds by checking for each adjacent pair of row and column objects whether the involved proximities conform to the respective constraints in w C .If at iteration 1 t − a violation is encountered, the particular proximities are projected onto w C and replaced by their projections in iteration t [13].For example, if we observe ( ) < the projections are given by: ( ) ( ) through successive residualizations of P relies on engineering a complex interplay of the two optimization routines, QA and Dykstra's IP.The algorithm is initialized by subjecting P to a QA-based search for a permutation 1 ρ providing a rearrangement of the rows and columns of P , matching the desired AR form as closely as possible, denoted by 1 ρ P .Based on the order of row and column objects, as imposed by 1 ρ , the least-squares optimal AR matrix 1 ρ as defined by (1.1).By updating the target matrix, B , through 1 Q , we initiate a second QA search for a possibly superior permutation 1 ρ of P (of course, 1 = B Q then remains fixed throughout the QA routine).The resulting 1 ρ P will potentially even be closer to optimal AR patterning.Subsequently, 1 Q is refit through IP.The algorithm cycles through QA and IP until convergence (i.e., updating B by 1 Q does not result in any changes of 1 ρ ).
The residual matrix − is submitted to the search for the second AR component, 2 Q .The algorithm switches back and forth between QA and IP, until 2 ρ and 2 Q have been identified, yielding the residual matrix to be forwarded to a third QA-IP search bearing 3 ρ and 3 Q .The algorithm continues until a complete decomposition of P has been attained, usually with K N  .
Evidently, the ultimate solution depends on the initial arrangement of rows and columns of P (i.e., the particular order, in which P is passed to the first QA search cycle); therefore, to reduce the risk of detecting a purely locally-optimal solution, a common nuisance to any heuristic procedure, multiple starts with random permutations are strongly recommended.

Low-AR-Rank Approximation to a Proximity Matrix
Spectral decomposition in linear algebra allows us to factor a given N N × proximity matrix into a sum of equally-sized matrices.Low-rank approximation refers to reconstructing the proximity matrix by a small number, r N  , of those matrices.In an analogous manner, we can state our ultimate analytic objective so as to identify a smallest AR rank decomposition of P such that . When only r components in AR form are retrieved, the algorithm capitalizes on repetitively refitting the residuals of the different k Q .Explicitly, assume that r extracted AR components yield the residual matrix . In attempting to additionally improve the fit of the decomposition obtained, the residuals are added back to 1 Q , succeeded by another run through the QA-IP fitting cycle, very likely detecting a more effective permutation 1 ρ , producing a revised 1 that in turn are restored to the previously estimated 2 Q component, which is then subjected to a new search for updating 2 Q , and so on.

Fit Measure
The quality of a specific lower-AR-rank approximation to P is assessed by the variance-accounted-for (VAF) criterion.VAF is a normalization of the least-squares loss criterion by the sum of squared deviations of the proximities from their mean defined as where p denotes the mean of the off-diagonal entries in , and ( ) q the fitted values of the th k AR component.

The Representation of AR Matrix Components
Each of the AR components 1 , , K Q Q  lends itself immediately to a more restricted representation, either in the form of a (continuous) unidimensional scale [16]- [18] of the elements in  along a single axis, or through a (discrete) ultrametric tree diagram [19].
For a specific AR component matrix , the unidimensional scale representation of its row/column objects can be constructed by estimating object coordinates ( ) on the line minimizing the leastsquares loss function (by including an additive constant k c -see justification below): , subject to the triangle (in)equality constraints on the coordinate values as implied by permutation k ρ * associated with k Q : An ultrametric tree structure can be characterized as a weighted acyclic connected graph with a natural root, defined as the node equidistant to all leaves or terminal nodes.The terminal nodes of an ultrametric tree represent a set of N objects . As a necessary and sufficient condition for a unique ultrametric tree representation, the weights along the paths connecting objects , O O , typically with a distance interpretation and collected into an , must satisfy the three-point condition or ultrametric inequality: for any distinct object triple i O , j O , and l O , the largest two path length distances among ij u , il u , and jl u must be equal.Fitting an ultrametric structure to k Q also rests on minimizing a least-squares loss function: subject to the constraints as defined by the ultrametric inequality: For a detailed technical description of how to construct a unidimensional scale or an ultrametric tree representation, the reader is encouraged to refer to the above references.The choice of these two models is justified by a remarkable result in combinatorial data analysis [1] [8], which links them directly to the AR form of a square-symmetric proximity matrix.Specifically, the matrices of fitted interobject (unidimensional scale) distances and of estimated (ultrametric tree) path lengths can be permuted into perfect AR form.More to the point, optimal unidimensional scale and ultrametric tree structures induce the AR form of the matrices of fitted values.In other words, the AR form is a necessary condition for identifying these two more restricted models, as defined by the order constraints or the ultrametric inequality.We should re-emphasize that, for a given k Q , both models involve the same number of parameters, estimable through least-squares.Thus, their VAF scoresobtainable through normalizing the respective loss function by -are on an equal scale and directly comparable as to which structure provides a superior representation of k Q .A brief technical note will be helpful to elaborate this claim.Recall that fitting the undimensional scale includes estimating an additive constant, c .Employing a least-squares model without an additive constant would amount to fitting a regression model through the origin (in familiar generic notation: i i i y x β = +  ), which-even though justifiable on theoretical grounds in certain instances-in general, has serious disadvantages.First, the residuals usually do not sum to zero; second, the sum of the squared residuals, SSE, might exceed the total sum of squares, SSTO, hence the coefficient of determination, 2 1 , might turn out to be negative, thus becoming essentially meaningless.Lastly, using uncorrected SS as a remedy will avoid a negative 2  R , but 2 R will no longer be bounded between zero and one, creating another interpretation problem.Yet, by including an additive constant 0 β in the model, = + + , we ensure that the obtained VAF fit score is equivalent to the (bounded) 2 R measure in regression.In the unidimensional scaling model, the additive constant is represented by c .
"Two interpretations exist for the role of the additive constant c .We could consider j i x x − to be fitted to the translated proximities ij q c + , or alternatively, j i x x c − − to be fitted to the original proximities ij q , where the constant c becomes part of the actual model" to be fitted to the untransformed proximities ij q [1].Once c is established, constructing a unidimensional scale can be regarded as approximating the transformed proximities ˆij − ; in other words, we are fitting a metric model (if the ij q are subjected to an optimal monotone transformation, then the model represents nonmetric combinatorial unidimensional scaling).Fitting a least-squares intercept model implies the conjecture that the proximities have interval (and not ratio) scale properties.
One might object that the unidimensional scale model now has an additional parameter as compared to the ultrametric tree model.Notice, however, that in the special case of fitting a least-squares ultrametric structure, regression models with or without intercept are identical due to the particular structural side constraints imposed by the ultrametric inequality (thus, not necessitating the explicit inclusion of an additive constant, c):

AR Matrix Decomposition for Three-Way Data
Three-way data, with the third way representing different data sources, such as subjects, experimental conditions, scenarios, or time points, represent a cube having multiple data matrices stacked as slices along the third dimension.Our concern here will be three-way two-mode data, so the slices will consist of square-symmetric proximity matrices of identical dimensions.By introducing the source index 1, , s S =  , the entire data cube is denoted by ( ) { } .The approach for modeling three-way data adopted in this paper is guided by a principle common in statistics as well as of immediate intuitive appeal, namely, to analyze individual variability within a deviation-from-themean framework: based on the individual proximity matrices aggregated across sources, denoted by a P , a best fitting reference AR decomposition is generated into r average AR components, against which the individual data are fit.Concretely, assume that for a P the low AR rank approximation 1 , , r Q Q  was chosen, with each component characterized by a unique ordering of as expressed by the associated permutations 1 , , r ρ ρ  obtained from successive residualizations of a P .The componentwise object arrangements embodied in k ρ serve as a blueprint for (re)constructing the set of IP-constraints, ( ) , determining the H.-F. Köhn specific AR form of k Q , thereby providing a frame of reference for the AR decomposition of the S individual proximity matrices in a confirmatory manner: as for each of the r components the AR-optimal object order is fixed, the confirmatory fitting of the individual sources skips the QA step, and instead proceeds directly with estimating the r source-specific AR components ( )

Q
through IP, employing the r sets of constraints . The VAF criterion computed for each source separately serves as a fit index quantifying how closely the fitted proximities of the ∑ Q of the individual AR decompositions reflect the properties of the reference structure: A source-specific unidimensional or ultrametric representation of the estimated AR components can be used for a direct evaluation of individuals' fit against the reference structure, as well as for a more in-depth inter-and intra-subject analysis, be that by comparing the overall fit scores between sources or by specific fit of components ( )

Q
within a given source s , to determine whether the continuous or discrete structure provides superior representation, with possible implications about the underlying individual cognitive processes.

Application: Judgments of Schematic Face Stimuli
A total of 22 S = graduate students in the Psychology department of the University of Illinois provided pairwise dissimilarity ratings of 12 schematic faces, displayed on a computer screen, on a nine-point scale.Not too different from previous work by Tversky and Krantz [20], the twelve face stimuli were generated by completely crossing the three factors of "Facial Shape", "Eyes", and "Mouth" (see Figure 1).
The 22 individual matrices of dissimilarity ratings s P were aggregated into a single matrix a P by first converting each individual's ratings into z-scores (i.e., subtracting a person's mean rating from each of his or her 66 dissimilarity ratings, and dividing these differences by his or her standard deviation), then averaging the z-scores across the 22 respondents for each pair of faces.Lastly, to eliminate negative numbers, and to put the aggregated dissimilarities back onto a scale similar to the original nine-point scale, the average z-score ratings were multiplied by 2, and a constant value of 4 was added.
The matrix a P was subjected to a QA-IP-based search for an optimal AR decomposition with { } 1, 2,3, 4 K = using multiple runs with initial random permutations of the input proximity matrix.For each decomposition into K components, 10,000 random starts were used.The frequency distributions of the VAF scores obtained for each K are reported in Table 1; the number of decimal places used may seem excessive, but is done here to make the distinct locally optimal solutions apparent.Figure 2 presents the orders of facial stimuli as discovered for the k Q belonging to the AR decomposition of a P into K components with maximum VAF.Both the distribution results in Table 1 as well as the displays of the stimulus orderings in Figure 2 clearly suggest that the aggregated proximity matrix a P is of AR rank 2 r = : first, the VAF increment for 3 K > is minimal; second, while the order of stimuli associated with the first two components reveals an obvious pattern-1 Q : emotional appearance as captured by factor "Mouth"; 2 Q : "circle" shaped versus "solid" eyes-the stimulus arrangements along the third and fourth AR components, 3 Q and 4 Q , lack such pattern.Lastly, across all four decompositions the first two AR components maintain a stable order of the stimuli (apart from a minimal inversion occurring on the second AR component concerning the two right-most faces), which does not hold for the third and fourth AR components.Thus, the biadditive solution with the largest VAF score (0.9593) was chosen as reference AR decomposition, against which the 22 individual rating matrices were fit.
The graphs of the unidimensional scale and ultrametric tree representation of the biadditive AR reference components 1 Q and 2 Q are shown in Figure 3.The interpretation is straightforward: the unidimensional scale constructed for 1 Q orders the face stimuli along the continuum negative to positive expression of emotion, grouping them into three well-defined categories formed by the levels "frowning", "flat", and "smiling" of factor "Mouth" (the exact position of each face stimulus is at its left eye).The corresponding ultrametric tree structure of 1 Q produces an almost perfectly balanced threefold segmentation of the face stimuli, also based on the primary criterion emotional appearance as expressed by the levels of factor "Mouth".Notice that the three categories are not perceived as equally distinct; rather, "flat" and "smile" are merged, while "frown" is still set apart.Within each group, a secondary classification into faces according to "Facial Shape" can be observed, whereas "open" and "solid" circled eyes differentiate between stimuli at the tertiary level (note that this pattern is slightly violated by the "smile" segment).The unidimensional scale representing 2 Q separates faces with "circle" shaped from "solid" eyes, with "oval" faced stimuli placed at the extreme scale poles-an arrangement of stimuli accurately reflected by the ultrametric tree structure.The latter also implies that "Facial Shape" and emotional appearance serve as subsequent descriptive features to group the stimuli.The tied triplet of "circle" shaped faces having "solid" eyes indicates a compromised fit of the ultrametric structure such as its imposition could only be accomplished by equating the respective stimulus distances.One might be tempted to conclude that the unidimensional scaling of the second AR component is degenerate because the twelve stimuli are lumped into four locations only.Upon closer inspection, however, this appears as an absolutely legitimate representation: the scale appropriately discriminates between "solid" and "circle" shaped eyes, while incorporating a secondary distinction based on facial shape.Thus, the retrieved scale accurately reflects a subset of constituting facial features ordering the stimuli in perfect accordance.
Each of the 22 individual source matrices, s P , was then fitted against this biadditive AR reference structure -that is, the confirmatory biadditive AR decomposition of each of the individual proximity matrices was carried out, with the IP constraints derived directly from the object orderings associated with the two AR  2) and ( 5), however, with minor modifications to adjust for the respective residualization; for example: VAF1 1 ( ) • In terms of the overall VAF score, subjects 11, 19, and 21 form the top three group, while the bottom three comprise subjects 1, 4, and 6, with the top and bottom three subjects marked by " • ", and by "  ", respectively.For ease of legibility, all original fit scores were multiplied by 1000, thereby eliminating decimal points and leading zeros.
Table 3 reports the VAF scores obtained for the 22 sources when fitting unidimensional scales and ultrametric tree structures to the individual AR components ( ) The VAF scores correspond to normalizations of the least-squares loss criteria in EquationS ( 3) and ( 4) by . For all sources, except subject 1, the unidimensional scaling of ( )1 s Q results in a much better fit than the corresponding ultrametric tree structure; however, for the second AR component, the ultrametric tree representation attains superior fit.
The most instructive course is to focus on the extremes and for further inspection only individuals who are exceptionally poorly or well represented: the top and bottom three subjects.For succinctness, from the top three group only the best fitting source, subject 11, was chosen for closer inspection.However, all three members of the bottom group, subjects 1, 4, and 6, were completely documented because they promised to provide deeper diagnostic insight into how sources with data of apparent ill-fit were handled by the AR Q .For the ultrametric dendrograms the face stimuli have been arranged to match their order associated with the respective AR components; recall that for fixed ultrametric structure there exist ( ) different ways of positioning the terminal object nodes of the tree diagram.Not too surprising, the unidimensional scale as well as the non-spatial tree representations of 1 Q and 2 Q for subject 11 are almost identical to those of the reference structure, which by and large also applies to the unidimensional scales constructed for subjects 6 and 4.However, the findings for the latter two sources are put into perspective by the their ultrametric dendrograms: wildly dispersed branches, accompanied by several misallocations of stimuli, particularly for the representations of the second AR component, indicate an at best mediocre fit of the data from respondents 6 and 4 to the imposed AR structures.Such ambiguities do not pertain to subject 1, who is an obvious misfit: first, notice the equidistant alignment of stimuli along the unidimensional scale representing the first AR component, indicating a presumable lack of differentiation in the judgments of this subject.Moreover, inspecting the remaining three graphs corroborates the notion of the problematic nature of the data contributed by source 1 as the face stimuli appear to be arranged more or less in random order.
As an additional test of the conclusions from the confirmatory analysis, the individual data matrices of respondents 11, 6, 4 and 1 were reanalyzed through independent biadditive AR decompositions.Based on the logic that the confirmatory fitting could lead to a compromised structural representation of the individual data by imposing constraints inappropriate for a particular source, one might consider the obtained confirmatory configurations-in a transferred sense-as a lower bound representation.In other words, we would expect a well-fitting subject to profit from an independent analysis with the prospect of an only slightly better repressentation.For sources with poor fit, however, the independent analysis will either yield a substantial increase in fit because their idiosyncratic perspective, in disagreement with the majority of the sample, will no longer be distorted by an inadequate confirmatory reference frame, or these subjects will qualify as having provided simply inconsistent or random judgments not amenable to any improved data representation.Each of the four individual proximity matrices was was analyzed using 10,000 random starts.The solutions with the highest VAF score were chosen as final representations.The key diagnostics for subjects 11, 6, 4, and 1 are reported in Table 4.For all sources the overall VAF scores imply that their data are exhaustively approximated by an AR decomposition of rank 2 r = .
The VAF scores for the unidimensional scale and ultrametric tree representations fit subsequently to the two AR components are presented in Table 5.As before, the unidimensional scale representations of the ( )1 s Q attain much higher VAF values across all four sources than the ultrametric tree structures, while the latter provide superior representations of the second AR components ( )2 s Q .For sources 11, 6, 4 and 1, graphs of the unidimensional scales and ultrametric tree structures for ( )1 s Q and ( )2 s Q are given in Figure 6 and Figure 7, respectively.The scale and dendrogram displays obtained for subject 11 afford an immediate interpretation: the arrangement of facial stimuli associated with the first AR component is dominated by the factor "Eyes", with factors "Facial Shape" as secondary and "Mouth" as tertiary perceptual criteria; the ordering of faces identified with the second AR component is determined by the feature hierarchy "Mouth", "Facial Shape" and "Eyes".So, in comparison with the reference structure used for the confirmatory fitting, the independent analysis reveals that subject 11 employs the identical feature hierarchies, but reverses their association with the two AR components.For subject 6, the results are far less consistent.The independent analysis identifies "Mouth" as the primary factor, but in obvious disagreement with the first confirmatory AR component, the faces are no longer ordered along the meaningful emotional continuum "frown"-"flat"-"smile".In addition, the secondary and tertiary criteria "Facial Shape" and "Eyes" are used inconsistently within these three groups-a finding also evidenced by the ultrametric tree.The branching pattern of the dendrogram indicates further inconsistencies in the grouping of stimuli as exemplified by the misclassifications of the two stimuli with "oval" face, "solid" eyes and "flat" mouth, or "circle" shaped face, "solid" eyes and "frowning" mouth line, respectively.The displays for the second AR component confirm these conclusions because both the unidimensional scale as well as the dendrogram are hardly interpretable and, rather resemble a non-systematic, ad hoc mix of factors "Facial Shape" and "Eyes".Similar findings can be stated for subject 4: the scale as well as the tree representation of the first AR component appear to be determined by an inconsistent combination of factors "Eyes" and "Facial Shape", while the arrangement of stimuli in the representations of the second AR component, seems to be governed by "Mouth", but also in an inconsistent manner.In summarizing, remarkably and contrary to our expectations, respondents 6 and 4, do not emerge as subjects with a deviant, nevertheless  ( ) well interpretable pattern of perceptual criteria, but as sources with profound inconsistencies in their judgments.The fuzzy results for source 1 allow for only an obvious explanation: the arrangement of the face stimuli reveals no discernable relation, implying that subject 1 merely contributed random responses.

Conclusions
In contrast to (traditional) calculus-based approaches, combinatorial data analysis attempts to construct optimal continuous spatial or discrete non-spatial object representations through identifying optimal object orderings, where "optimal" is operationalized within the context of a specific representation.For example, Defays [21] demonstrated that the task of finding a best-fitting unidimensional scale for given inter-object proximities can be solved solely by permuting the rows and columns of the data matrix such that a certain patterning among cell entries is satisfied.The desired numerical scale values can be immediately deduced from the reordered matrix.Similarly, for hierarchical clustering problems, in their more refined guise as searches for ultrametric or additive tree representations, optimal solutions are directly linked to particular permutations of the set of objects.Within this context, order-constrained AR matrix decomposition attains the status of a combinatorial data analytic meta technique, essentially pre-processing the observations: the total variability of a given proximity matrix is decomposed into a minimal number of separate AR components, each associated with a unique optimal object ordering related to distinct aspects of variation in the data.Based on the identified object permutations, the different AR components can be directly translated into continuous or discrete representations of the structural properties of the interobject relations.As the two models are fit through least-squares, they are directly comparable formally in terms of fit as well as interpretability-a feature not available when fitting multiple structures to a data matrix directly without initial decomposition.The extension of ordinal AR matrix decomposition to accommodate three-way data provides the analyst with an instrument to explore complex hypotheses concerning the appropriateness of continuous or discrete stimuli representations from an interindividual as well as intraindividual perspective.From a substantive point of view, the results suggest that the set of schematic face stimuli is best represented by a combination of continuous spatial and discrete non-spatial structures: the first AR component, associated with factor "Mouth" and interpretable as emotional appearance, receives superior representation through a unidimensional scale, whereas the second AR component, reflecting a mix of "Facial Shape" and "Eyes", is better represented by a discrete non-spatial structure-from hindsight not too surprising, given the specific manner in  which the stimuli were generated.The most remarkable finding concerns the independent analysis of subjects 6 and 4, who, despite numerically satisfactory fit scores, do not attain a more meaningful stimulus representation.Our initial hypothesis attributed their mediocre confirmatory fit to the biadditive AR reference structure as too restrictive for their distinctive perception.Contrary, the outcome of the independent analysis rather confirms the notion that these subjects simply entertained a weakly elaborated, inconsistent mix of criteria.The instantaneous question posed by these results-and definitely deserving future study-is whether an inconsistent, incomprehensible representation observed with a confirmatory AR decomposition provides sufficient diagnostic evidence to generally discredit the respective data set.More succinctly, is a questionable representation obtained from a confirmatory AR decomposition the incidental effect of an inappropriate structural frame of reference, or does it generally hint at data of poor quality, notwithstanding the context of a confirmatory or independent analysis?We may conjecture that the analysis of individual structural differences through AR decomposition is far less restrictive in its reliance on purely ordinal constraints, and, therefore, is not so susceptible to masking actual inconsistencies hidden in the data by the imposition of a more rigid (continuous) reference configuration.
As a final comment, given the extreme importance of the VAF criterion in selecting the AR reference decomposition as well as in assessing source-specific fit, conducting a large-scale simulation to study the behavior of the VAF measure deserves high priority.Of particular importance seems the question whether different object orders can result in identical VAF scores and to what extent different stimuli orders effect substantial alterations in VAF values.
Formally, we need to find a perfect AR matrix * Q minimizing the least-squares loss function, to the ij p as the triangle (in)equality constraints will be violated: let il

Figure 1 .
Figure 1.The construction of schematic face stimuli.

Figure 2 .
Figure 2. The orders of schematic face stimuli obtained for the various k Q components of the AR decompositions into K components with maximum VAF score.

Figure 3 .Q and ( ) 2 sQ
Figure 3. Biadditive reference AR decomposition of a P (VAF = 0.959): unidimensional scale and ultrametric tree representations.reference components.Table 2 presents the results for the 22 sources, sorted according to their VAF scores as defined by Equation (5), indicating how closely the various individuals actually match the biadditive reference AR structure.A variety of additional diagnostic measures is provided for assessing the differential contribution of the AR components ( )1 s Q and ( )2 s Q .The VAF1 and VAF2 scores indicate how well the two AR components actually fit the corresponding residualizations of s P .They are defined in a manner similar to the overall or individual VAF measures in Equations (2) and (5), however, with minor modifications to adjust for the respective residualization; for example:

Figure 4 .Figure 5 .
Figure 4. Confirmatory biadditive AR decomposition for selected sources: Unidimensional scale and ultrametric tree representations for the first AR component ( )1 s Q .

Figure 6 .
Figure 6.Independent biadditive AR decomposition for selected sources: Unidimensional scale and ultrametric tree representations for the first AR component ( )1 s Q .

Figure 7 .
Figure 7. Independent biadditive AR decomposition for selected sources: Unidimensional scale and ultrametric tree representations for the second AR component ( )2 s Q .

Table 1 .
Frequency distributions of the VAF scores as obtained from 10,000 random starts for

Table 2 .
Ranking of Individual VAF scores plus various fit measures quantifying the separate contributions of the two AR components ( )1 s Q .

Table 3 .
Individual VAF scores for the unidimensional scale (US) and ultrametric tree (UT) representations of the two AR components ( )1 Figure4and Figure5present the unidimensional scales and ultrametric tree graphs fitted to the two individual AR components ( )1

Table 4 .
Independent biadditive AR decomposition of selected sources: Individual VAF scores plus various fit measures quantifying the separate contributions of the two AR components ( )1 s Q .

Table 5 .
Individual VAF scores for the unidimensional scale (US) and ultrametric tree (UT) representations of the two AR components ( )1