Singular Value Decomposition and Principal Component Analysis to Compare Market Indexes

In this paper, we use the Singular Value Decomposition (SVD) to find relationships in the fluctuations of six market indexes, CAC 40, DAX, DOW JONES 30, FTSE 100, IBEX 35 and NIKKEI 225, during the year 2018. This technique allows relating several indexes in a way very similar to classical Principal Component Analysis (PCA). In fact, we will use the statistical software only to confirm some results.


Introduction
It is assumed that there are six indexes, CAC 40, DAX, DOW JONES 30, FTSE 100, IBEX 35 and NIKKEI 225, with $n = 254$ trading days; in fact, not all indexes have the same number of days, and when a value was missing the value of the index was repeated. In this order, let

$$Q = (q_{ij}) = (\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3, \mathbf{q}_4, \mathbf{q}_5, \mathbf{q}_6) \in \mathbb{R}^{254 \times 6}. \qquad (1)$$

In Figure 1, you can see these values, although the difference in scale prevents any idea of their possible relationships; in other words, the data need to be normalized. The first step is centering the values in each column using the mean value of that column, i.e. $p_{ij} = q_{ij} - \bar{q}_j$, with $\bar{q}_j = \frac{1}{254}\sum_{i=1}^{254} q_{ij}$, for $i = 1, \ldots, 254$; $j = 1, \ldots, 6$. In Figure 2, we have plotted the columns of the matrix $P$. In this graphic, possible relations between the variations in the indices can already be detected.
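The centering step just described can be sketched as follows. This is a NumPy sketch (the paper's computations were done in Matlab), and the matrix `Q` below is random stand-in data playing the role of the 254 × 6 matrix of daily index values:

```python
import numpy as np

# Stand-in for the 254 x 6 matrix Q of daily index values
# (the real market data files are not reproduced here).
rng = np.random.default_rng(0)
Q = rng.normal(loc=100.0, scale=10.0, size=(254, 6))

# Center each column with its own mean: p_ij = q_ij - mean_j
P = Q - Q.mean(axis=0)

# Every column of P now has (numerically) zero mean.
print(np.allclose(P.mean(axis=0), 0.0))  # True
```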
The question considered is: is there a connection between the movements in the market indexes? And if so, which relationships hold and which do not?
This paper is organized as follows. In Section 2, we describe the principal components and possible linear approaches. In Sections 3 and 4, we study the one- and two-dimensional approximations respectively, and we compare our results with those obtained with Matlab. In Section 5, we consider the five-dimensional approximation, with a perhaps somewhat surprising result. Finally, in Section 6, we analyze the numerical results and draw the main conclusions.
Our numerical methods were implemented in Matlab; the codes are available on request. The experiments were carried out on an Intel(R) Core(TM)2 Duo CPU U9300 @ 1.18 GHz with 1.91 GB of RAM.

Principal Components
The goal of Principal Component Analysis is to find the principal directions of the normalized data matrix $P$. This technique has been widely used in computer science for data reduction, as it summarizes the main directions of a data set. However, we will use an alternative to Principal Component Analysis called the singular value decomposition (SVD). Before properly entering the data processing, we split our data set into a training set $P_1$ and a testing set $P_2$: $P_1$ is used to find the principal components, while $P_2$ is used to test the accuracy of the approximations. In practice, we select the even rows of the data set for training and the odd rows for testing.
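The even/odd split can be sketched as follows in NumPy. Note that the paper works in Matlab, whose rows are 1-indexed, so Matlab's even rows 2, 4, ..., 254 correspond to 0-based indices 1, 3, ...:

```python
import numpy as np

# Stand-in for the 254 x 6 centered data matrix P.
P = np.arange(254 * 6, dtype=float).reshape(254, 6)

# Matlab's even rows 2, 4, ..., 254 -> 0-based indices 1, 3, ...
P1 = P[1::2, :]  # training set
P2 = P[0::2, :]  # testing set

print(P1.shape, P2.shape)  # (127, 6) (127, 6)
```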
In particular, the SVD can be seen as a basis transformation that takes us from the initial six-dimensional space of the data basis into a second six-dimensional space with an orthogonal basis; $U$ and $V$ are the transformation matrices. In Matlab the relevant command is svd. In this particular case, the singular values of $P_1$ are the diagonal elements of the matrix $\Sigma$. Because we want to determine the best linear fit of the normalized data, the first choice (4) corresponds to a one-dimensional approximation or linear case, (5) is a two-dimensional approximation, etc.
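The decomposition and the successive low-rank fits can be sketched in NumPy (the paper uses Matlab's `svd`; the matrix `P1` here is random stand-in data):

```python
import numpy as np

rng = np.random.default_rng(0)
P1 = rng.normal(size=(127, 6))  # stand-in for the centered training matrix

# Thin SVD: P1 = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(P1, full_matrices=False)

def rank_k(k):
    """Best rank-k approximation of P1 in the least-squares sense (Eckart-Young)."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The approximation error decreases as more components are kept.
err = [np.linalg.norm(P1 - rank_k(k)) for k in range(1, 7)]
```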

Linear Case
In this section, we focus only on the first linear relation defined in the previous section. Our goal is to assess the predictive power of this first approximation; in other words, if we knew one index, could we predict another? For example, if we select the French market index (Cac), $j = 1$, we take $\alpha_i = p_{i1}/v_{11}$; then the best fits for the other indexes are $\hat{p}_{ij} = \alpha_i v_{j1}$. This procedure, applied to each day $i = 1, \ldots, 127$, gives the prediction. The resulting lines are shown in the corresponding figure.
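The one-index prediction under the rank-1 model can be sketched as follows (NumPy, with random stand-in data in place of the training and testing sets; column 0 plays the role of the Cac):

```python
import numpy as np

rng = np.random.default_rng(0)
P1 = rng.normal(size=(127, 6))  # stand-in training set
P2 = rng.normal(size=(127, 6))  # stand-in testing set

U, s, Vt = np.linalg.svd(P1, full_matrices=False)
v1 = Vt[0, :]  # first principal direction (6 components)

# Knowing column j0 on each test day, fit alpha_i = p_{i,j0} / v1[j0],
# then predict every index as alpha_i * v1[j].
j0 = 0
alpha = P2[:, j0] / v1[j0]
P2_pred = np.outer(alpha, v1)  # 127 x 6 predictions
```

By construction the prediction reproduces the known column exactly; the quality of the other five columns measures the predictive power of the linear case.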

Two Dimensional Approximation
In this subsection we will try to answer the following question: knowing two indexes, can we predict the remaining four? If two indexes $j_1$ and $j_2$ are known, the coefficients $\alpha_i$ and $\beta_i$ are obtained from the corresponding $2 \times 2$ linear system, and the $i$-th prediction for index $j$ is $\hat{p}_{ij} = \alpha_i v_{j1} + \beta_i v_{j2}$. In Figure 6 and Figure 7 we have plotted the four cases. For each one, the red line is what would be obtained if the predicted and real values were equal, and the blue points are the real values versus the values predicted using the training set. Of these four graphs, two predictions are reasonable (Ftse and Ibex) and the other two are quite poor (Dow Jones and Nikkei).
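A sketch of this two-index prediction, assuming the daily coefficients solve the 2 × 2 system given by the two known indexes (random stand-in data; the actual figures use the market series):

```python
import numpy as np

rng = np.random.default_rng(1)
P1 = rng.normal(size=(127, 6))  # stand-in training set
P2 = rng.normal(size=(127, 6))  # stand-in testing set

_, _, Vt = np.linalg.svd(P1, full_matrices=False)
V2 = Vt[:2, :].T  # 6 x 2: first two principal directions as columns

known = [0, 1]    # suppose the first two indexes are observed
A = V2[known, :]  # 2 x 2 system matrix, shared by every day

# For each test day solve for (alpha_i, beta_i), then predict all six indexes.
coefs = np.linalg.solve(A, P2[:, known].T).T  # 127 x 2
P2_pred = coefs @ V2.T                        # 127 x 6
```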
On the other hand, using the aforementioned Matlab software we can represent three components by typing: • >> biplot(coefs(:,1:3),'Scores',score(:,1:3),'VarLabels',vbls) with the result in Figure 8, a picture similar to the previous Figure 5 but with three principal components. Perhaps the most remarkable detail is that the vectors for the indexes Cac, Dax, Ftse and Ibex lie almost on a single plane.

Five Dimensional Approximation
The question that we now consider is: knowing five indices, how well can we predict the sixth? Now the fit uses the first five principal directions, i.e. $\hat{p}_{ij} = \sum_{k=1}^{5} \alpha_{ik} v_{jk}$; if five of the six entries of the $i$-th row are known, the coefficients $\alpha_{i1}, \ldots, \alpha_{i5}$ are obtained, in matrix form, by solving the corresponding $5 \times 5$ linear system on each day.
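This five-index case can be sketched the same way as before, now with a 5 × 5 system per day (NumPy, random stand-in data):

```python
import numpy as np

rng = np.random.default_rng(2)
P1 = rng.normal(size=(127, 6))  # stand-in training set
P2 = rng.normal(size=(127, 6))  # stand-in testing set

_, _, Vt = np.linalg.svd(P1, full_matrices=False)
V5 = Vt[:5, :].T  # 6 x 5: first five principal directions as columns

known = [0, 1, 2, 3, 4]  # five observed indexes; predict the sixth
A = V5[known, :]         # 5 x 5 system matrix

coefs = np.linalg.solve(A, P2[:, known].T).T  # 127 x 5 daily coefficients
P2_pred = coefs @ V5.T                        # 127 x 6 predictions
sixth_pred = P2_pred[:, 5]                    # the predicted missing index
```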

Conclusions and Discussion
In Figure 12 we have represented the ratios $R_k$ for $k = 1, \ldots, 5$. Usually, this information is used to decide how many components to use in a PCA; the most common criterion is based on a noticeable change in this plot, and applying it to Figure 12 one would again decide on taking one or two columns and no more. Some authors use a criterion of the form $R_k < tol$, where $tol$ is a number chosen somewhere between 0.25 and 0.05; here the result is similar. Moreover, after the results of Figure 5 and Figure 8, it would seem that in this research the linear case with one column is the best.
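As an illustration, assuming $R_k$ denotes the ratio of the $(k+1)$-th singular value to the largest one (this definition is an assumption, chosen to match the range of $tol$ quoted above), the criterion can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
P1 = rng.normal(size=(127, 6))  # stand-in training data
_, s, _ = np.linalg.svd(P1, full_matrices=False)

# Assumed definition: R_k = sigma_{k+1} / sigma_1, for k = 1, ..., 5.
R = s[1:] / s[0]

# Keep components while R_k >= tol; tol is usually chosen in [0.05, 0.25].
tol = 0.25
n_components = 1 + int(np.sum(R >= tol))
```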
This small academic exercise does not yield major results; however, we believe that it is quite easy to extend this kind of analysis to longer series of data, or to apply it to other indexes: for example, Investing.com lists 45 indexes. This would require preparing the data files, a routine but important job with little academic interest. The main philosophy of all this is expressed by Yuval Noah Harari in the introduction of [21]: in a world deluged by irrelevant information, clarity is power.

Funding
This work was supported by the Spanish Ministry of Science, Innovation and Universities through the project PGC2018-094522-B-100 and by the Basque Government through the project IT1247-19.