Uncovering and Displaying the Coherent Groups of Rank Data by Exploratory Riffle Shuffling

Let n respondents rank order d items, and suppose that d << n. Our main task is to uncover and display the structure of the observed rank data by an exploratory riffle shuffling procedure which sequentially decomposes the n voters into a finite number of coherent groups plus a noisy group, where the noisy group represents the outlier voters and each coherent group is composed of a finite number of coherent clusters. We consider exploratory riffle shuffling of a set of items to be equivalent to an optimal two-block seriation of the items with crossing of some scores between the two blocks. A riffle shuffled coherent cluster of voters within its coherent group is essentially characterized by the following facts: a) voters have an identical first TCA factor score, where TCA designates taxicab correspondence analysis, an L1 variant of correspondence analysis; b) any preference is easily interpreted as a riffle shuffling of its items; c) the nature of the different riffle shufflings of the items can be seen in the structure of the contingency table of the first-order marginals constructed from the Borda scorings of the voters; d) the first TCA factor scores of the items of a coherent cluster are interpreted as the Borda scale of the items. We also introduce a crossing index, which measures the extent of crossing of scores of voters between the two-block seriation of the items. The novel approach is explained on the benchmark SUSHI data set, where we show that this data set has a very simple structure, which can also be communicated in tabular form.

Vartan Choulakian and Jacques Allard, Université de Moncton, Canada. Email: vartan.choulakian@umoncton.ca, jacques.allard@umoncton.ca

November 2020

Introduction
Ordering the elements of a set is a common decision-making activity: voting for a political candidate, choosing a consumer product, and so on. There is accordingly a huge literature on the analysis and interpretation of preference data, scattered across different disciplines. Rank data are often heterogeneous, that is, composed of a finite mixture of components. The traditional methods for finding the mixture components of rank data are mostly based on parametric probability models, distance models or latent class models, and they are useful for sparse data but not for diffuse data.
Rank data are sparse if at most a small finite number of permutations capture the majority of the preferences; otherwise they are diffuse. As a running example in this paper, we consider the famous benchmark SUSHI data set enumerating n = 5000 preferences of d = 10 sushis, see [1]. The SUSHI data set is diffuse, because any observed permutation occurs at most three times. It has been analyzed, among others, by [2,3,4].
A second data set that we shall also analyze is the APA data set of size n = 5738 by d = 5, see [5]. The APA data set is also considered non-sparse by [2], because all 120 permutations occur with positive probability.
For a general background on statistical methods for rank data, see the excellent monograph by [6] and the book [7].

Riffle shuffle
The riffle shuffle, see [8], is considered the most popular method of card shuffling: one cuts a deck of d cards (aka items) into two piles of sizes d_1 and d_2, respectively, and successively drops the cards, one by one, so that the piles are interleaved into one deck again.
Let V, named a voting profile, be a set of n preferences on d items. Based on riffle shuffling ideas, [2] proposed the notion of riffled independence to model the joint probability distribution of the observed preferences of V. Independently, [9] used it for visual exploration of V, naming it a two-block partition of the Borda scored items with crossing of some scores between the blocks; this idea is further developed here under the following important assumption: d << n. This means that the sample size n is quite large compared to the number of items d.
The SUSHI and APA data sets satisfy this assumption. The most important first step in the application of a riffle shuffling procedure is how to partition the d items into two disjoint subsets. In the probabilistic riffle shuffling approach of [2], the partitioning step is essentially done by an ad hoc approach in the case of the SUSHI data set, or based on background second-order information on the items in the case of the APA data set. In the exploratory riffle shuffling approach of this paper, by contrast, an optimal partition is obtained by maximizing the cut norm of the row-centered data, or equivalently by taxicab correspondence analysis of the nega coded data.
We compare the two formulations of riffle shuffle, probabilistic and exploratory, in section 10.

Aim
Our aim is to explore and display a given voting profile V by sequentially partitioning it into G coherent groups plus a noisy group; that is,

V = ∪_{g=1}^{G} cohG(g) ∪ noisyG, (1)

where G represents the number of coherent groups and cohG(g) is the gth coherent group. Furthermore, each coherent group is partitioned into a finite number of coherent clusters; that is,

cohG(g) = ∪_{α=1}^{c_g} cohC_g(α), (2)

where c_g represents the number of coherent clusters in the gth coherent group. So the coherent clusters are the building blocks of the coherent groups. We note the following facts: Fact 1: The assumption d << n induces the new notion of coherency for the clusters, and consequently for the groups; it is a stronger characterization than the notion of interpretability for groups discussed in [9].
Fact 2 : Each coherent group and its clusters have the same latent variable summarized by the Borda scale.
Fact 3 : Given that the proposed method sequentially peels the data like Occam's razor, the number of groups G is calculated automatically. Furthermore, outliers or uninformative voters belonging to the noisyG are easily tagged.
Fact 4 : The approach is exploratory, visual, data analytic and is developed within the framework of taxicab correspondence analysis (TCA). TCA is an L 1 variant of correspondence analysis developed by [10]. TCA is a dimension reduction technique similar to principal component analysis. In this paper, we shall use only the first TCA factor scores of the items and of the voters.
Two major advantages of our method are the following. First, we can easily identify outliers: for the SUSHI data, our method tags 12.36% of the voters as outliers, which form the noisy group, while no outliers in the SUSHI data were identified in [3,4]. Second, it provides a tabular summary of the preferences which compose a coherent group. For instance, consider the first mixture component of the SUSHI data given in [4], where the modal ordering is almost the same as the Borda scale ordering of the ten sushis in cohG(1) obtained by our method, see Table 14 in this paper. The sample size of their first mixture component is 27.56%, which is much smaller than 48.36%, the sample size of our cohG(1), see Table 14. Moreover, Table 13 of this paper provides a tabular-visual summary of the 2418 preferences which form cohG(1). The visual summary describes the different kinds of equivalent similar riffle shufflings of the 2418 preferences, and it provides further insight into the structure of the data. Such visual summaries are missing in [3,4].

Highlights of a coherent cluster
A coherent cluster of voters has interesting mathematical properties and is essentially characterized by the following facts: a) Voters have an identical unique first TCA factor score. b) Any voter preference is easily interpreted as a particular riffle shuffling of its items. c) The nature of the riffle shuffling of the items can be observed in the structure of the contingency table of the first-order marginals constructed from the Borda scorings of the voters belonging to the coherent cluster. d) The first TCA factor scores of the items of a coherent cluster are interpreted as the Borda scale of the items. e) We also introduce the crossing index, which measures the extent of the interleaving or crossing of scores of voters between the two-block seriation of the items in a coherent cluster.

Organization
This paper has eleven sections and its contents are organized as follows: section 2 presents an overview of TCA; section 3 presents some preliminaries on the Borda coding of the data and related tables and concepts; section 4 presents Theorem 1, which shows that the first principal dimension of TCA clusters the voters into a finite number of clusters; section 5 discusses coherent clusters and their mathematical properties; section 6 discusses riffle shuffling in a coherent cluster; section 7 introduces the crossing index; section 8 introduces the coherent groups; section 9 presents the analysis of the APA data set; section 10 compares the two formulations of riffle shuffle, probabilistic and exploratory; and finally we conclude in section 11.
All mathematical proofs are relegated to the appendix. Details of the computation are shown only for the first coherent group of SUSHI data set.

An overview of taxicab correspondence analysis
Consider an n × p matrix X with X_ij ≥ 0 and grand total X_** = Σ_{j=1}^p Σ_{i=1}^n X_ij. Let P = X/X_** be the correspondence matrix associated with X; as usual, define p_{i*} = Σ_{j=1}^p p_ij and p_{*j} = Σ_{i=1}^n p_ij. Let D_n = Diag(p_{i*}) be the diagonal matrix with diagonal elements p_{i*}; similarly, D_p = Diag(p_{*j}). Let k = rank(P) − 1.
In TCA the calculation of the dispersion measures (δ_α), principal axes (u_α, v_α), principal basic vectors (a_α, b_α), and principal factor scores (f_α, g_α) for α = 1, ..., k is done in a stepwise manner. We put P_1 = (p_ij − p_{i*} p_{*j}), and let P_α be the residual correspondence matrix at the α-th iteration.
The variational definition of TCA at the α-th iteration is

δ_α = max_u ||P_α u||_1 / ||u||_∞ = max_v ||P'_α v||_1 / ||v||_∞, (3)

where the maxima are attained at the α-th principal axes u_α ∈ {−1, 1}^p and v_α ∈ {−1, 1}^n. The α-th principal basic vectors are

a_α = P_α u_α and b_α = P'_α v_α, (4)

and the α-th principal factor scores are

f_α = D_n^{−1} a_α and g_α = D_p^{−1} b_α; (5)

furthermore the following transition relations are also useful:

u_α = sgn(b_α) = sgn(g_α) and v_α = sgn(a_α) = sgn(f_α), (6)

where sgn(.) is the coordinatewise sign function, sgn(x) = 1 if x > 0 and sgn(x) = −1 if x ≤ 0. The α-th taxicab dispersion measure δ_α can be represented in many different ways, among them

δ_α = ||a_α||_1 = ||b_α||_1 = v'_α P_α u_α. (7)

The (α + 1)-th residual correspondence matrix is

P_{α+1} = P_α − D_n f_α g'_α D_p / δ_α. (8)

An interpretation of the term D_n f_α g'_α D_p / δ_α in (8) is that it represents the best rank-1 approximation of the residual correspondence matrix P_α in the sense of the taxicab norm.
In CA and TCA, the principal factor scores are centered; that is,

Σ_{i=1}^n p_{i*} f_α(i) = 0 = Σ_{j=1}^p p_{*j} g_α(j). (9)

The reconstitution formula in TCA and CA is

p_ij = p_{i*} p_{*j} (1 + Σ_{α=1}^k f_α(i) g_α(j) / δ_α). (10)

In TCA, the calculation of the principal component weights, u_α and v_α, and the principal factor scores, g_α and f_α, can be accomplished by two algorithms. The first is complete enumeration based on equation (3). The second iterates the transition formulae (4, 5, 6); this is an ascent algorithm, that is, it increases the value of the objective function at each iteration, see [11]. The iterative algorithm may converge to a local maximum, so it should be restarted from several initial configurations; the rows or the columns of the data can be used as starting values.
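As a concrete illustration of the ascent algorithm just described, here is a minimal pure-Python sketch of the first TCA step: form the residual matrix P_1, then alternate the transition formulae (4)-(6) until the L1 objective stops increasing, restarting from each column to guard against local maxima. The function name and the restart scheme are ours, not the authors' software.

```python
def tca_first_factor(X):
    """First taxicab dispersion and factor scores of a nonnegative matrix X."""
    n, p = len(X), len(X[0])
    total = sum(sum(row) for row in X)
    P = [[x / total for x in row] for row in X]
    r = [sum(row) for row in P]                                   # row masses p_i*
    c = [sum(P[i][j] for i in range(n)) for j in range(p)]        # column masses p_*j
    # first residual correspondence matrix P1 = (p_ij - p_i* p_*j)
    P1 = [[P[i][j] - r[i] * c[j] for j in range(p)] for i in range(n)]

    def sgn(x):
        return 1.0 if x > 0 else -1.0

    best_delta, best_f, best_g = -1.0, None, None
    for start in range(p):                    # restarts guard against local maxima
        u = [1.0 if j == start else -1.0 for j in range(p)]
        delta = -1.0
        while True:                           # ascent on the L1 objective
            a = [sum(P1[i][j] * u[j] for j in range(p)) for i in range(n)]
            v = [sgn(x) for x in a]
            b = [sum(P1[i][j] * v[i] for i in range(n)) for j in range(p)]
            u = [sgn(x) for x in b]
            new_delta = sum(abs(x) for x in b)    # ||P1' v||_1
            if new_delta <= delta + 1e-12:
                break
            delta = new_delta
        if delta > best_delta:
            a = [sum(P1[i][j] * u[j] for j in range(p)) for i in range(n)]
            f = [a[i] / r[i] for i in range(n)]   # row (voter) factor scores
            g = [b[j] / c[j] for j in range(p)]   # column (item) factor scores
            best_delta, best_f, best_g = delta, f, g
    return best_delta, best_f, best_g
```

On a rank-1 (independent) table the residual vanishes and the dispersion is zero, while a diagonal table yields factor scores of opposite signs, as the centering relation (9) requires.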

Preliminaries
In this section we review: a) the Borda scoring of a voting profile V into R, and the Borda scale; b) the contingency table of the first-order marginals of R; c) the coded tables R_double and R_nega.

Borda scorings and Borda scale
Let A = {a_1, a_2, ..., a_d} denote a set of d alternatives/candidates/items, and V a set of n voters/individuals/judges. In this paper we consider linear orderings/rankings/preferences, in which all d objects are rank-ordered according to their levels of desirability by the n voters. We denote a linear order by a sequence s = (a_{k_1} ≻ a_{k_2} ≻ ... ≻ a_{k_d}), where a_{k_1} ≻ a_{k_2} means that the alternative a_{k_1} is preferred to the alternative a_{k_2}. The Borda scoring of s, see [12], is the vector b(s) in which the element a_{k_j} is assigned the score (d − j), because a_{k_j} is preferred to (d − j) other alternatives; equivalently, it is the jth most preferred alternative. Let R = (r_ij) be the matrix having n rows and d columns, where r_ij designates the Borda score of the ith voter's preference for the jth alternative. We note that the ith row of R is an element of S_d, the set of permutations of the elements of the set {0, 1, 2, ..., d − 1}. A toy example of R is presented in Table 1 for n = 4 and d = 3.
The Borda scale of the elements of A is β = 1'_n R/n, where 1_n is a column vector of 1's having n coordinates. The Borda scale seriates/orders the d items of the set A according to their average scores: β(j) > β(i) means item j is preferred to item i, and β(j) = β(i) means the two items (a_i, a_j) are equally preferred. In the toy example of Table 1, the Borda scale seriates the three items accordingly. Similarly, we define the reverse Borda score of s to be the vector b̄(s), which assigns to the element a_{k_j} the score (j − 1). We denote by R̄ = (r̄_ij) the matrix having n rows and d columns, where r̄_ij designates the reverse Borda score of the ith judge's nonpreference of the jth alternative. The reverse Borda scale of the d items is β̄ = 1'_n R̄/n. We note that r_ij + r̄_ij = d − 1, so that β(j) + β̄(j) = d − 1 for each item j.

Table 1: Toy example with n = 4 preferences of d = 3 items.
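The Borda coding above can be sketched in a few lines of Python. The toy profile here is our own, not necessarily the paper's Table 1: the j-th most preferred of d items receives the score d − j, the Borda scale is the vector of column means, and the reverse scores are d − 1 minus the direct scores.

```python
def borda_scores(order, items):
    """order lists the items from most to least preferred."""
    d = len(order)
    position = {item: j for j, item in enumerate(order, start=1)}
    return [d - position[item] for item in items]   # j-th preferred gets d - j

items = ["A", "B", "C"]
prefs = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C"), ("C", "A", "B")]
R = [borda_scores(s, items) for s in prefs]          # n x d Borda score matrix
n, d = len(R), len(R[0])
beta = [sum(row[j] for row in R) / n for j in range(d)]   # Borda scale
R_bar = [[d - 1 - x for x in row] for row in R]      # reverse Borda scores
```

Here beta seriates A ≻ B ≻ C, and β(j) + β̄(j) = d − 1 for every item, as noted above.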

Contingency table of first-order marginals
The contingency table of the first-order marginals of an observed voting profile V on d items is a square d × d matrix M, where M(i, j) stores the number of times that item j has Borda score i, for i = 0, ..., d − 1, see [6, p.17]. Table 2 displays the matrix M for the toy example R displayed in Table 1. We note the following facts: a) It has uniform row and column marginals equal to the sample size. b) We can compute the Borda scale β from it. c) It reveals the nature of the crossing of scores attributed to the items for a given binary partition of the items. For the toy example, consider the partition {C} and {B, A} with attributed scores of {0} and {1, 2}, respectively (this is the first step in a riffle shuffle). Then the highlighted cells (marked in bold) in Table 2 show that there are two crossings of scores, transpositions of the scores 0 and 1, between the sets {C} and {B, A} (this is the second step in a riffle shuffle). Furthermore, the third row of Table 2 shows that the score 2 is equally attributed to both items of the set {B, A} and never crossed to {C}.
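The construction of M can be sketched as follows, on a toy Borda matrix of our own: M[i][j] counts the voters who gave item j the score i, so both the row and the column sums of M equal the sample size, and the Borda scale is recoverable as β(j) = Σ_i i·M[i][j]/n.

```python
def first_order_marginals(R):
    """d x d table M with M[i][j] = number of voters giving item j score i."""
    d = len(R[0])
    M = [[0] * d for _ in range(d)]
    for row in R:
        for j, score in enumerate(row):
            M[score][j] += 1
    return M

# toy 4 x 3 Borda score matrix (ours, not the paper's Table 1)
R = [[2, 1, 0], [2, 1, 0], [1, 2, 0], [1, 0, 2]]
M = first_order_marginals(R)
```

Both marginal-uniformity properties and the recovery of the Borda scale from M are easy to check numerically on this toy table.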

Coded tables R double and R nega
Our methodological approach is based on Benzécri's platform, see [13, p.1113], which we quote: "the main problem inductive statistics has to face is to build tables that, through appropriate coding and eventual supplementation, give to the available data such a shape that the analysis is able to extract from it the answer to any question that we are allowed to ask". Italics are ours.
There are three elements in Benzécri's platform: a) coding, a kind of preprocessing of the data, discussed in the following paragraphs; b) eventual supplementation, which here consists in applying TCA and not correspondence analysis (CA), because in the CA case we do not have a result similar to Theorem 1; c) the question that we are allowed to ask, which is to explore and visualize rank data.
Within the CA framework, there are two codings of rank data: R_double and R_nega.

R double
The first coding is the doubled table R_double of size (2n) × d, proposed independently by [14,15], who showed that CA of R_double is equivalent to the dual scaling of Nishisato's coding of rank data, see [16]. CA of R_double is equivalent to CA of its first residual correspondence matrix, whose structure shows that each row is centered, as in Carroll's multidimensional preference analysis procedure, MDPREF, exposed in Alvo and Yu (2014, p.15). In TCA the objective function to maximize is a combinatorial problem, see equation (3); the first iteration of TCA applied to R_double corresponds to computing

δ_1^double = max v' P_1^double u over u ∈ {−1, 1}^d and v ∈ {−1, 1}^{2n}, (11)

where P_1^double is the first residual correspondence matrix of R_double.

R nega
In the second approach, we summarize R̄, the reverse Borda table, by its column total; that is, we create a row named nega = nβ̄ = 1'_n R̄, then we vertically concatenate nega to R, thus obtaining R_nega = (R; nega) of size (n + 1) × d.
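The nega coding can be sketched in a few lines, assuming (as the resulting uniform column margins suggest) that the nega row holds the column totals of the reverse Borda table, nega = n·β̄ = 1'_n R̄; the toy data are ours.

```python
# toy 4 x 3 Borda score matrix (ours, not the paper's Table 1)
R = [[2, 1, 0], [2, 1, 0], [1, 2, 0], [1, 0, 2]]
n, d = len(R), len(R[0])
# reverse Borda scores are d - 1 - r_ij, so their column totals are
# n*(d - 1) minus the column totals of R
nega = [n * (d - 1) - sum(col) for col in zip(*R)]
R_nega = R + [nega]                     # (n + 1) x d coded table
```

Every column of R_nega then sums to n(d − 1), so the correspondence-analysis column margins are uniform across items, which is what makes this coding work.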
[17] discussed the relationship between TCA of R_double and TCA of R_nega: TCA of R_nega can be considered as a constrained TCA of R_double, because the subvector of v in (11) attached to the reverse-scored rows is constrained to equal −1_n; the objective function to maximize then corresponds to computing

δ_1 = max v' P_1^nega u over u ∈ {−1, 1}^d and v ∈ {−1, 1}^{n+1} with v(nega) = −1, (12)

where P_1^nega is the first residual correspondence matrix of R_nega. So we see that if in (11) the optimal value of the constrained subvector is indeed −1_n, then δ_1^double = δ_1. Define the set of indices I_+ = {i | v_1(i) = 1}; the resulting expression (13) for δ_1 restricts the summation to the subset of assessors that belong to I_+. The subset I_+ indexes the voters having the same direction in their votes. Given that we are uniquely interested in the first TCA dimension, all the necessary information is encapsulated in I_+, as discussed in [17,9] using other arguments. Furthermore, δ_1 in (13) equals four times the cut norm of P_1 = (p_ij − p_{i*} p_{*j}), where the cut norm is the maximum of |Σ_{i∈T} Σ_{j∈S} p_1,ij| over subsets S ⊆ {1, ..., d} and T ⊆ I; it shows that the subsets I_+ and S_+ are positively associated; for further details, see for instance [18,19].
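The cut norm used above can be illustrated by brute force on a tiny matrix: maximize |Σ_{i∈T, j∈S} A[i][j]| over all row subsets T and column subsets S (exponential cost, toy sizes only; the function is our sketch).

```python
from itertools import chain, combinations

def cut_norm(A):
    """Brute-force cut norm: max over row/column subsets of |rectangle sum|."""
    n, p = len(A), len(A[0])

    def subsets(k):
        return chain.from_iterable(combinations(range(k), m) for m in range(k + 1))

    best = 0.0
    for T in subsets(n):
        for S in subsets(p):
            best = max(best, abs(sum(A[i][j] for i in T for j in S)))
    return best
```

For the 2 × 2 residual [[0.25, −0.25], [−0.25, 0.25]], the cut norm is 0.25, and 4 × 0.25 = 1.0 matches the first taxicab dispersion of that residual, consistent with the factor-of-four identity quoted above.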
In the sequel, we will consider only the application of TCA to R nega .

First TCA voter factor scores of R nega
We show the results on the SUSHI data set enumerating n = 5000 preferences of d = 10 sushis, see [1]. Even though our interest concerns only the first TCA voter factor scores of a voting profile V_1, it is common practice in CA circles to present the principal map of the row and column projections. Figures 1 and 2 display the principal maps obtained from CA and TCA of R_nega of the SUSHI data, denoted by V_1. We observe that TCA clusters the voters into a finite number of discrete patterns, while CA does not: this is the main reason that we prefer TCA to the classical, well-known dimension reduction technique CA.
We have the following theorem concerning the first TCA principal factor scores f_1(i), i = 1, ..., n, of the voters belonging to a profile V_1.
Theorem 1: The maximum number of distinct clusters of the n voters belonging to V_1, as determined by their distinct values of f_1(i), is d_1 d_2 + 1, where (d_1, d_2) is the binary partition of the d items produced by the first TCA axis.
Remark 1: a) We fix f_1(nega) < 0 to eliminate the sign indeterminacy of the first bilinear term in (10). b) We write V_1 = ∪_{α=1}^{d_1 d_2 + 1} V_{1,α}, where the voters of the αth cluster V_{1,α} are characterized by the common value of their first TCA factor score.
A cluster of voters defined in Remark 1b, V_{1,α} for α = 1, ..., d_1 d_2 + 1, can be classified as coherent or incoherent; this will be discussed in the next section.
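The clustering step implied by Theorem 1 amounts to grouping voters by their first factor score. A small sketch (our own helper, with a rounding tolerance of our choosing to guard against floating-point noise):

```python
from collections import defaultdict

def cluster_by_score(f1, ndigits=9):
    """Group voter indices by their (rounded) first TCA factor score."""
    clusters = defaultdict(list)
    for voter, score in enumerate(f1):
        clusters[round(score, ndigits)].append(voter)
    return dict(clusters)
```

For example, four voters with scores [0.5, 0.5, -0.25, 0.5] fall into two clusters, one per distinct score value.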

Coherent cluster
The following definition characterizes a coherent cluster.
Definition 1: A cluster V_{1,α} is coherent if f^{V_{1,α}}_1(i), the first TCA factor score of the voter i ∈ V_{1,α} obtained from TCA of the subprofile V_{1,α}, takes a single common value over all voters i ∈ V_{1,α}.
Remark 2: Definition 1 implies that a cluster V_{1,α} is coherent when for all voters i ∈ V_{1,α} the first TCA factor score f^{V_{1,α}}_1(i) does not depend on the voter i, but only on (α, d_1, d_2).
Corollary 1: It follows from Remark 1a and equation (13) that a necessary, but not sufficient, condition for a cluster V_{1,α} to be coherent is that its first TCA factor score obtained from TCA of V_1 is strictly positive; that is, 0 < f^{V_1}_1(i) for i ∈ V_{1,α}. Example 2: Figures 3 through 9 show the coherency of the clusters of voters V_{1,α} for α = 1, ..., 7, where dots represent clusters of voters, while Figure 10 shows the incoherence of the cluster V_{1,8}. Further, the first three columns of Table 3 display the mathematical formulation of the 7 coherent clusters cohC_1(α) = V_{1,α} for α = 1, ..., 7, as defined in Remark 1b, and their sample sizes |V_{1,α}|.
Proposition 1 gives an upper bound on the first TCA factor scores of the voters, where δ_1(V) is the first TCA dispersion value obtained from TCA of V and f_1(nega) is the first TCA factor score of the row nega. The equality in Proposition 1 is attained only for coherent clusters, as shown in the following result.

Interpretability of a coherent cluster
The following result shows that for coherent clusters the first TCA dimension can be interpreted as a Borda scaled factor. Proposition 3: The first TCA column factor score of item j, g_1(j), is an affine function of the Borda scale β(j); that is, g_1(j) = (2/(d − 1)) β(j) − 1 for j = 1, ..., d. Equivalently, corr(g_1, β) = 1.
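A quick numerical illustration of Proposition 3's affine map, on hypothetical Borda scales of our own: g(j) = 2β(j)/(d − 1) − 1 sends scales in [0, d − 1] to [−1, 1] while preserving the ordering of the items.

```python
d = 10
beta = [7.5, 2.0, 4.5, 9.0, 0.0]            # hypothetical Borda scales in [0, d-1]
g = [2 * b / (d - 1) - 1 for b in beta]     # Proposition 3's affine map
```

Since the map is affine with positive slope, it has correlation 1 with beta whenever beta is non-constant, which is the content of the proposition.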

Remark 3: By Proposition 3, the first TCA principal factor score of item j is bounded: −1 ≤ g_1(j) ≤ 1 for j = 1, ..., d.
Example 4: Table 4 displays the Borda scales of the items (sushis) in the seven coherent clusters cohC_1(α) = V_{1,α} for α = 1, ..., 7. To identify the sushi type, one has to refer to Figure 2; for instance, j10 corresponds to sushi 10 (cucumber roll) in Figure 2. We observe the following main fact: for each of the seven coherent clusters, the first TCA principal axis produced the same binary partition of the items, J_1 = {j10, j7, j4, j9}, characterized by β(j_1) < 4.5 for j_1 ∈ J_1, and J_2 = {j3, j1, j2, j6, j5, j8}, characterized by β(j_2) > 4.5 for j_2 ∈ J_2. The six sushis in J_2 have Borda scales above the average score of 4.5 = (0 + 9)/2, while the four sushis in J_1 have Borda scales below it. Now we ask: what are the differences among the seven coherent clusters? The answer is the riffle shuffling of the scores of the items, which we discuss next.
Exploratory riffle shuffling

[8] is the seminal reference on riffle shuffling of cards. [2] generalized the notion of independence of two subsets of items to riffled independence, to uncover the structure of rank data. Within the framework of data analysis of preferences, exploratory riffle shuffling can be described in the following way. We have two sets: J, a set of d distinct items, and S, a set of d Borda scores. We partition both sets into two disjoint subsets of sizes d_1 and d_2 = d − d_1: J = J_1 ∪ J_2 and S = S_1 ∪ S_2. Riffle shuffling consists of two steps. In the first step, we attribute the scores of S_1 to J_1 and the scores of S_2 to J_2. In the second step, we permute some scores attributed to J_1 with the same number of scores attributed to J_2. The second step can be mathematically described as the application of a permutation τ such that τ_J(S_1, S_2) = (τ_{J_1}(S_1), τ_{J_2}(S_2)). We interpret τ_{J_1}(S_1) as the set of scores attributed to J_1, and τ_{J_2}(S_2) as the set of scores attributed to J_2.
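The two-step shuffle just described can be sketched as follows, assuming S_1 holds the d_1 lowest Borda scores and S_2 the d_2 highest, which is consistent with the paper's examples; the function and its names are ours.

```python
import random

def riffle_shuffle_scores(J1, J2, n_swaps=0, seed=0):
    """Return the score assignments (tau_J1(S1), tau_J2(S2)) as two dicts."""
    rng = random.Random(seed)
    d1, d2 = len(J1), len(J2)
    S1 = list(range(d1))                  # step 1: the d1 lowest scores to J1 ...
    S2 = list(range(d1, d1 + d2))         # ... and the d2 highest scores to J2
    for _ in range(n_swaps):              # step 2: cross scores between the blocks
        i, j = rng.randrange(d1), rng.randrange(d2)
        S1[i], S2[j] = S2[j], S1[i]
    rng.shuffle(S1)                       # arbitrary assignment within each block
    rng.shuffle(S2)
    return dict(zip(J1, S1)), dict(zip(J2, S2))
```

Returning the two blocks separately mirrors the pair (τ_{J_1}(S_1), τ_{J_2}(S_2)); with n_swaps = 0 there is no crossing of scores at all.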
Example 5: Table 5 displays a small voting profile and its riffle shuffles. We denote by |τ_{J_1}(S_1)| the number of voters who have performed the riffle shuffle (τ_{J_1}(S_1), τ_{J_2}(S_2)). So |τ_{J_1}(S_1) = {0, 1, 2, 3}| = 4, |{0, 1, 2, 5}| = 2 and |{0, 1, 4, 5}| = 1. The permuted scores between the two blocks of items are in bold in Table 5. Remark 4: A useful observation from Example 5 is that we can concentrate our study either on J_1 or on J_2: if we know τ_{J_1}(S_1), the scores attributed to J_1, we can deduce τ_{J_2}(S_2), the scores attributed to J_2, because of the mutual exclusivity constraints ensuring that no two items, say a and b, are ever mapped to the same rank by a voter.
A simple measure of the magnitude of a (d_1, d_2) riffle shuffle for voter i is the sum of the Borda scores attributed to the items in J_1; that is, T_i(τ_{J_1}(S_1)) = Σ_{j∈J_1} r_ij, where r_ij is the Borda score attributed to item j by voter i. In Table 5, for the first four voters T_i(τ_{J_1}(S_1)) = 6 for i = 1, ..., 4, which is the minimum attainable sum of scores; it implies that for these voters there is no crossing of scores between the two blocks J_1 and J_2. For voters 5 and 6, T_i(τ_{J_1}(S_1)) = 8, and for voter 7, T_7(τ_{J_1}(S_1)) = 10. These values show that the crossing of scores between the two blocks J_1 and J_2 is at a lower level for voters 5 and 6 than for voter 7.
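The magnitude statistic T_i is a one-liner; the toy rows below are our own, with J_1 taken as the first three of six items so that the minimum d_1(d_1 − 1)/2 = 3 signals no crossing.

```python
def T(row, J1_idx):
    """Sum of a voter's Borda scores over the items indexed by J1_idx."""
    return sum(row[j] for j in J1_idx)

J1_idx = [0, 1, 2]                # J1 = first three items, so d1 = 3
no_cross = [0, 2, 1, 3, 5, 4]     # the scores {0, 1, 2} stay inside J1
one_cross = [0, 1, 3, 2, 5, 4]    # the score 3 has crossed into J1
```

T(no_cross) attains the minimum 3 = 3·2/2, while T(one_cross) = 4 exceeds it, flagging one crossed score.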
For relatively small sample sizes, it is easy to enumerate the different types of (d 1 , d 2 ) riffle shuffles. For relatively large sample sizes, we use the contingency table of first-order marginals, that we discuss next.

Types of (d 1 , d 2 ) riffle shufflings in a coherent cluster
The contingency table of the first-order marginals of an observed voting profile V on d items is a square d × d matrix M, where M(i, j) stores the number of times that item j has Borda score i, for i = 0, ..., d − 1, see subsection 3.2. It helps us observe the types of (d_1, d_2) riffle shufflings in a coherent cluster, as we explain in Example 6. Example 6: Tables 6 to 12 display M_{1,α} for α = 1, ..., 7, the contingency tables of first-order marginals of the seven coherent clusters of the SUSHI data, respectively. We observe the following: each one reveals the nature of the riffle shuffles of its coherent cluster, which are summarized in Table 13. The number of observed (4, 6) blocks of scores (τ_{J_1}(S_1), τ_{J_2}(S_2)) for the seven coherent clusters is only 27 in Table 13, out of the possible total number of 10!/(4!6!) = 210. The counts of the observed (4, 6) blocks do not seem to be uniformly distributed in Table 13. Furthermore, we observe that as α increases from 1 to 7, the magnitude of the riffle shuffles, T_v(τ_{J_1}(S_1)), increases in the coherent clusters from 6 to 12. Integers in bold in Table 13 are the shuffled-crossed scores. The unknown counts s, t, u, w, x, y, z appearing in Table 13 satisfy the following relations derived from M_{1,7}:
x + y = 147, the number of 0s not attributed to J_1 in M_{1,7};
u + w = 72, the number of 7s attributed to J_1 in M_{1,7};
s + z + x = 169, the number of 6s attributed to J_1 in M_{1,7};
t + z + y = 157, the number of 5s attributed to J_1 in M_{1,7};
t + s + w + y = 185, the number of 4s attributed to J_1 in M_{1,7};
u + t + x = 158, the number of 3s attributed to J_1 in M_{1,7};
u + s + x + y = 218, the number of 2s attributed to J_1 in M_{1,7}.

Crossing index
The following (d 1 , d 2 ) crossing index is based on the internal dispersion of a voting profile.
Definition 3: For a voting profile V, we define its crossing index Cross(V) by comparing δ_1(V_{d_1,d_2}) with the maximal dispersion attainable in the absence of crossing, given by Proposition 2, where δ_1(V_{d_1,d_2}) is the first taxicab dispersion obtained from TCA of V and (d_1, d_2) represents the optimal TCA binary partition of the d items of V, with d = d_1 + d_2.

Proposition 4: The crossing index of a coherent cluster cohC(α) is an increasing linear function of 2(α − 1).
Example 7: The last column of Table 3 contains the values of the crossing indices of the seven coherent clusters of the first iteration of the SUSHI data. We observe: a) Cross(cohC_1(1)) = 0, because the structure of its matrix of first-order marginals, M_{1,1}, is block diagonal; this means that the permutation τ is the identity permutation, so there is no crossing of scores between the two subsets of items J_1 and J_2 in cohC_1(1). b) Cross(cohC_1(α)) for α = 1, ..., 7 is a uniformly increasing function of α, similar in spirit to the T_v(τ_{J_1}(S_1)) statistic. c) For the incoherent cluster V_{1,8}, the corresponding value is also reported in Table 3.

Coherent group
Our aim is to explore a given voting profile V by uncovering its coherent mixture groups, see equation (1); that is, V = ∪_{g=1}^G cohG(g) ∪ noisyG, where G represents the number of coherent groups and cohG(g) is the gth coherent group. The computation is done by an iterative procedure in n_G steps, for n_G ≥ G, that we describe as follows. For g = 1: compute cohG(1) from V_1 = V, then partition V_1 = V_2 ∪ cohG(1). For g = 2: compute cohG(2) from V_2, then partition V_2 = V_3 ∪ cohG(2). Continuing this procedure, after n_G steps we get V = ∪_{g=1}^{n_G} cohG(g).
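The peeling loop can be outlined at a high level, with the TCA-based group extraction abstracted as a hypothetical callback `extract` that returns (coherent part, remainder); small leftover groups are lumped into a noisy remainder, which is our simplification of the stopping behavior.

```python
def peel(V, extract, min_size=2):
    """Sequentially peel coherent groups off profile V.

    extract: callable returning (coherent_part, remainder) for a profile;
    returns (list of coherent groups, noisy remainder)."""
    groups, rest = [], list(V)
    while rest:
        coherent, rest = extract(rest)
        if len(coherent) < min_size:       # too small: treat as outliers
            rest = coherent + rest
            break
        groups.append(coherent)
    return groups, rest
```

With a dummy extractor that pulls out all entries equal to the first one, the loop peels groups in order and leaves the tail as noise.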
However, some of the higher-order coherent groups may have relatively small sample sizes; considering these as outliers, we lump them together, thus forming the noisy group denoted by noisyG in equation (1).
Let us recall the definition of a coherent group given in equation (2): a coherent group is the union of its coherent clusters. This implies that the sample size of cohG(g) equals the sum of the sample sizes of its coherent clusters: |cohG(g)| = Σ_{α=1}^{c_g} |cohC_g(α)|.
As an example, for the SUSHI data, from the 2nd column of Table 3 we can compute the sample size of the first coherent group: |cohG(1)| = Σ_{α=1}^7 |V_{1,α}| = 2418. Furthermore, cohG(1) is composed of 27 observed riffle shuffles summarized in Table 13, which provides quite a detailed view of its inner structure. The next result states important characteristics of a coherent group inherited from its coherent clusters.
Theorem 2 (properties of a coherent group cohG(g)): a) The first principal column factor score g_1 of the d items in a coherent group is the weighted average of the first principal column factor scores of the d items in its coherent clusters; that is,

g_1(cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)| / |cohG(g)|) g_1(cohC_g(α)),

and corr(g_1(cohG(g)), β(cohG(g))) = 1.
b) The first TCA dispersion value of a coherent group is the weighted average of the first TCA dispersion values of its coherent clusters; that is,

δ_1(cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)| / |cohG(g)|) δ_1(cohC_g(α)).

c) The crossing index of a coherent group is the weighted average of the crossing indices of its coherent clusters; that is,

Cross(cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)| / |cohG(g)|) Cross(cohC_g(α)).
We can discern a grouped seriation (bucket ranking) of the items; the groupings are based on the standard 95% confidence intervals of the Borda scale of the items.
The 2nd coherent group cohG(2), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it is composed of 19.0% of the sample with a crossing index of 35.38%. The voters in this coherent group disapprove of {uni (sea urchin), sake (salmon roe)}, which are considered the more "daring" sushis.
The third coherent group cohG(3), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it is composed of 13.24% of the sample with a crossing index of 27.3%. The voters in this coherent group prefer the three types of tuna sushis together with the sea urchin sushi.
The fourth coherent group cohG(4), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it is composed of 6.94% of the sample with a crossing index of 35.27%. The voters disapprove of the three types of tuna sushis.
b) The four coherent groups summarized in Table 14 can also be described by two bipolar latent factors: the only major difference between the first two coherent groups is that (5. uni (sea urchin), 6. sake (salmon roe)) are swapped with (7. tamago (egg), 4. ika (squid)), while the only major difference between the third and fourth coherent groups is that the three tunas are swapped with (4. ika (squid), 5. uni (sea urchin), 1. ebi (shrimp)). c) We consider the fifth group as noisy (outliers, not shown), composed of 12.36% of the remaining sample; it contains cohG(5) = ∪_{α=1}^2 cohC_5(α), whose sample size is 38, a very small number. For completeness we also provide the sample sizes of its two coherent clusters: |cohC_5(1)| = 22 and |cohC_5(2)| = 16.

APA data set
The 1980 American Psychological Association (APA) presidential election had five candidates: {A, C} were research psychologists, {D, E} were clinical psychologists, and B was a community psychologist. In this election, voters ranked the five candidates in order of preference. Among the 15449 votes, 5738 ranked all five candidates. We consider the data set which records these 5738 complete votes; it is available in [20, p.96] and [5, Table 1]. The winner was candidate C. Table 15 compares the results obtained by our method and the best distance-based mixture model given in [21]. Distance-based models have two parameters, a central modal ranking and a precision parameter; the precision parameter measures the peakedness of the distribution. [21] found that the Cayley distance produced better results than the Kendall and Spearman distances using the BIC (Bayesian information criterion) and ICL (integrated complete likelihood) criteria. Parts a and b of Table 15 are reproduced from [21, Tables 4 and 5].
Part c of Table 15 summarizes the results of our approach, where we describe only the first four coherent groups. We find only the first two coherent groups meaningfully interpretable based on a priori knowledge of the candidates. Voters in cohG(1), comprising 31% of the sample, prefer the research-oriented psychologists {A, C} over the rest. Voters in cohG(2), comprising 23.7% of the sample, prefer the clinical psychologists {D, E} over the rest. We interpret cohG(3) and cohG(4) as mixed B, with 14.23% and 12.% of the voters, respectively. Additionally, there is a noisyG making up 19.1% of the sample, which comprises cohG(5) displayed in Table 15.
[5] discussed this data set in considerable detail; remarkably, our results confirm his observations: a) There are two groups of candidates, {A, C} and {D, E}. The voters line up behind one group or the other; b) The APA divides into academicians and clinicians who are on uneasy terms. Voters seem to choose one type or the other, and then choose within; but the group effect predominates; c) Candidate B seems to fall in the middle, perhaps closer to D and E.
The following important observation emerges from the comparison of results in Table 15. We have two distinct concepts of groups for rank data: category based and latent variable based. To see this, consider groups 3 and 4 in part a of Table 15: group 3 is based on the modal ranking B ≻ C ≻ A ≻ D ≻ E and group 4 on the modal ranking B ≻ C ≻ A ≻ E ≻ D. The only difference between these two modal rankings is the permutation of the two least ranked clinical psychologist candidates {D, E}; this difference is not important, and it does not appear in our approach, which is latent variable based.
Table 15, part b: Parameters of the best mixture model selected (Cayley-based, using ICL): group, sample %, modal ordering, precision.
Table 15, part c: The first five coherent groups, each composed of two coherent clusters.

Description
The eight coherent clusters of the first four coherent groups can simply be described as follows. In this case, we can also visualize all the orderings belonging to a coherent group: Figures 11 and 12 display all the preferences belonging to the two coherent clusters of the first coherent group. The label CAEBD162 in Figure 11 should be interpreted as the preference C ≻ A ≻ E ≻ B ≻ D repeated 162 times.

Riffle independence model
Riffle independence is a nonparametric probabilistic modelling method for preferences developed by [2], which generalizes the independence model. It can be described in the following way: (a) Partition the set J of d distinct items into two disjoint subsets J_1 of size d_1 and J_2 of size d_2. Then generate an ordering of the items within each subset according to a certain ranking model. This implies that any ordering of the d items can be written as a direct product of two disconnected orderings, which in turn implies the independence of the two subsets J_1 and J_2. So the model complexity of this step is of order d_1! + d_2!.
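The complexity counts of the two steps are easy to verify numerically. The following minimal sketch (our own illustration) uses the subset sizes d_1 = 2 and d_2 = 4 from the worked example below, and shows how much smaller the combined count is than the full d! = 720 orderings:

```python
from math import factorial, comb

d1, d2 = 2, 4                 # sizes of J1 = {A, C} and J2 = {B, D, E, F}
d = d1 + d2

within = factorial(d1) + factorial(d2)   # orderings within the two blocks
interleavings = comb(d, d1)              # d!/(d1! d2!) ways to interleave
print(within, interleavings, factorial(d))  # 26 15 720
```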
(b) Interleave the two independent orderings for these two subsets using a riffle shuffle to form a combined ordering. An interleaving is a binary mapping from the set of orderings to {J_1, J_2}. The model complexity of this step is of order d!/(d_1! d_2!). The interleaving step generates the riffled independence of the two subsets J_1 and J_2. So the combined model complexity of both steps is of order d_1! + d_2! + d!/(d_1! d_2!).
For example, consider an ordering of the items in the set J = {A, B, C, D, E, F} built from its two subsets J_1 = {A, C} and J_2 = {B, D, E, F}. In the first step, relative orderings of the items in J_1 and J_2 are drawn independently. Suppose we obtain the relative ordering ϕ(J_1) = (C ≻ A) in J_1, and the relative ordering ϕ(J_2) = (B ≻ D ≻ F ≻ E) in J_2. Then, in the second step, the two relative orderings are combined by interleaving the items in the two subsets. For instance, if the interleaving is ω(J_1, J_2) = (J_1, J_2, J_2, J_1, J_2, J_2), where the relative ordering of the items in each subset remains unchanged, the combined ordering determined by the composition is C ≻ B ≻ D ≻ A ≻ F ≻ E.
Given the two subsets J_1 and J_2 with their orderings ϕ(J_1) and ϕ(J_2) and interleaving ω(J_1, J_2), generated from models with probability distributions f_{J_1}, g_{J_2} and m_ω, respectively, the probability of the observed ordering π under the riffle independence model is P(π) = m_ω(ω(J_1, J_2)) f_{J_1}(ϕ(J_1)) g_{J_2}(ϕ(J_2)).
There are two formulations of riffle shuffle for rank data in statistics: probabilistic and exploratory. In the riffled independence model the set of items is partitioned recursively, while in our exploratory approach the set of voters is partitioned recursively.
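The interleaving step of the worked example above can be sketched as follows. This is an illustrative snippet of our own (the function name `interleave` is not from [2]): it consumes each within-block ordering in order, placing the next item from whichever block the interleaving pattern names.

```python
def interleave(phi1, phi2, omega):
    """Merge two within-block orderings according to an interleaving
    pattern omega, a sequence of block labels 1 and 2."""
    it1, it2 = iter(phi1), iter(phi2)
    return [next(it1) if w == 1 else next(it2) for w in omega]

phi1 = ["C", "A"]                    # relative ordering drawn in J1
phi2 = ["B", "D", "F", "E"]          # relative ordering drawn in J2
omega = [1, 2, 2, 1, 2, 2]           # the interleaving from the text
print(interleave(phi1, phi2, omega))  # ['C', 'B', 'D', 'A', 'F', 'E']
```

Note that the relative order within each block is preserved by construction, which is exactly the defining property of a riffle shuffle.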

Conclusion
The main contribution of this paper is the introduction of an exploratory riffle shuffling procedure to reveal and display the structure of diffuse rank data for large sample sizes. The new notion of a coherent cluster that we developed is based simply on the geometric notion of taxicab projection of points on the first TCA axis, globally and locally; furthermore, it has nice mathematical properties. The coherent clusters of a coherent group represent the same latent variable opposing preferred items to disliked items, and can easily be interpreted and displayed.
Like Occam's razor, step by step, our procedure peels the essential structural layers (coherent groups) of rank data.
Our method was able to discover some other aspects of the rank data, such as outliers or small groups, which are eclipsed or masked by well established methods, such as distance based or random utility based methods. The major reason for this is that in random utility based methods the multivariate nature of a preference is reduced to binary preferences (paired comparisons), and in Mallows distance related methods the distances between any two preferences are bounded.
We presented a new index, Cross, that quantifies the extent of crossing of scores between the two blocks of the optimal binary partition of the items resulting from TCA. The crossing index of a group is based on the first taxicab dispersion measure; it takes values between 0 and 100%, so it is easily interpretable.
The proposed approach can easily be generalized to the analysis of rankings with ties and partial rankings.
The R package TaxicabCA, available on CRAN, can be used to perform the calculations.
The first residual correspondence matrix follows. Consider the nontrivial binary partition of the set S = {0, 1, ..., d − 1} into S = S_1 ∪ S_2, where |S_1| = d_1, |S_2| = d_2 and d = d_1 + d_2. To eliminate the sign indeterminacy in the first TCA principal axis, we fix v_1(nega) = v_1(n + 1) = −1; and we designate by S_1 the set of item indices for which the first TCA principal axis coordinates are negative, that is, u_1(j) = −1 for j ∈ S_1. It follows that u_1(j) = 1 for j ∈ S_2. Now, by (4) and (5), we deduce for i = 1, ..., n that
f_{i1} = a_{i1} = 2d_1/d − [4/(d(d−1))] Σ_{j∈S_1} r_{ij}.
We have the following result concerning the first TCA principal factor scores f_{i1} of the respondents, for i = 1, ..., n. The sum Σ_{j∈S_1} r_{ij} is an integer and can take at most the value d_1(d_2 + d − 1)/2. Then f_{i_1 1} = 2d_1/d − [4/(d(d−1))](−1 + Σ_{j∈S_1} r_{ij}) is the contiguous higher value to f_{i1}; similarly, f_{i_2 1} = 2d_1/d − [4/(d(d−1))](1 + Σ_{j∈S_1} r_{ij}) is the contiguous lower value to f_{i1}; and the required result follows.
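As a numerical sanity check of the score formula f_{i1} = 2d_1/d − [4/(d(d−1))] Σ_{j∈S_1} r_{ij} appearing in the proof above, the following sketch (our own verification, assuming Borda scores r_{ij} ∈ {0, ..., d − 1}) enumerates all rankings for d = 5, d_1 = 2; it confirms that the attainable scores form a contiguous grid with spacing 4/(d(d−1)), and that the block sum attains the maximum d_1(d_2 + d − 1)/2.

```python
from itertools import permutations
from fractions import Fraction

d, d1 = 5, 2
d2 = d - d1
S1 = range(d1)  # assume the first d1 items form the block S1

def f_score(r):
    # f = 2*d1/d - [4/(d(d-1))] * sum_{j in S1} r_j  (exact arithmetic)
    return Fraction(2 * d1, d) - Fraction(4, d * (d - 1)) * sum(r[j] for j in S1)

scores = sorted({f_score(r) for r in permutations(range(d))})
gaps = {b - a for a, b in zip(scores, scores[1:])}
print(gaps == {Fraction(4, d * (d - 1))})  # True: contiguous grid of scores
max_sum = max(sum(r[j] for j in S1) for r in permutations(range(d)))
print(max_sum == d1 * (d2 + d - 1) // 2)   # True: maximum of the block sum
```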
Proof : Easily shown by using Definition 3 and Proposition 2.
The proof of Theorem 2a easily follows from Theorem 3. The proof of Theorem 2b is similar to the proof of Proposition 1. The proof of Theorem 2c is similar to the proof of Proposition 4.