
Topological features in high-dimensional time series are used to characterize changes in stock market dynamics over time. We explored the daily log returns of four major US stock market indices and 10 ETF sectors between January 2010 and June 2020. Topological data analysis and persistent homology were applied to two sequences of point cloud data sets, one for the stock indices and one for the ETF sectors. For these sequences, the daily log returns, persistence diagrams, persistence landscapes, and mean landscapes were used to quantify topological patterns in the multidimensional time series. For example, norms of the persistence landscapes were generated to detect critical transitions in the daily log returns. To measure statistical significance, we implemented three permutation tests with a significance level α = 0.05 to determine whether topological features change within a particular time frame by comparing sliding windows in the sequence of point cloud data sets. We found evidence of changing structure in the US stock market between July 1, 2019 and July 1, 2020. Critical transitions are identified by the statistical properties of the norms of the persistence landscape between contiguous daily sliding windows of the stock indices and ETF sector series.

Topological data analysis (TDA) extracts topological features by examining the shape of the data through persistent homology to produce topological summaries. Two topological summaries, the persistent barcode [

In Bubenik [

With this alternative topological summary and the ability to conduct statistical inference, we bring our focus to critical transitions in complex dynamical systems, in particular, the financial market. Scheffer et al. [

Ensor and Koev [

While Ensor and Koev [^{1}-norms of persistence landscapes. While the paper's analysis is valid, our interest is in stocks and ETF sectors rather than cryptocurrencies.

Alternatively, Gidea and Katz [

These papers provide insightful groundwork for TDA in financial markets and cryptocurrencies: they show how to use cross-correlation networks to track topological changes, how to combine TDA with machine learning to understand what happens before a critical transition, and how to use the norms of persistence landscapes to indicate an approaching critical transition. However, these financial papers lack statistical inference. We are motivated to explore how the topological features change within a given time period for stocks and ETF sectors and to test for statistical significance using a permutation test [

While we acknowledge the previously cited authors, we regard our contribution as an empirical framework that adapts their analytical models to new data sets and extends them by conducting statistical inference. Similar to Gidea and Katz [

In Section 2, we provide background information on algebraic topology, homology, the construction of the Vietoris-Rips complex, persistent homology, topological summaries, norms of persistence landscapes, and statistical inference. In Section 3, we outline our methods for obtaining the data, constructing a sequence of point cloud data sets, applying persistent homology to that sequence, generating topological summaries, and performing statistical inference. In Section 4, we present our findings. In Section 5, we discuss and interpret our results. In Section 6, we conclude the paper.

This study presents a topological data analysis of financial time series data. Here we provide background material about four relevant areas: algebraic topology, homology, topological summaries, and norms of persistence landscapes. We apply topological data analysis to a sequence of point cloud data sets to examine their topological properties, where each point cloud is a matrix built from $d$ one-dimensional time series. For our analysis, a sequence of point cloud data sets denoted $X_n$ is shown below:

$$
\begin{aligned}
X_1 &= \begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_w) \end{bmatrix}
= \begin{bmatrix} x_1^1 & \cdots & x_1^d \\ x_2^1 & \cdots & x_2^d \\ \vdots & \ddots & \vdots \\ x_w^1 & \cdots & x_w^d \end{bmatrix} \\
&\;\;\vdots \\
X_q &= \begin{bmatrix} x(t_q) \\ x(t_{q+1}) \\ \vdots \\ x(t_{q+w-1}) \end{bmatrix}
= \begin{bmatrix} x_q^1 & \cdots & x_q^d \\ x_{q+1}^1 & \cdots & x_{q+1}^d \\ \vdots & \ddots & \vdots \\ x_{q+w-1}^1 & \cdots & x_{q+w-1}^d \end{bmatrix},
\end{aligned} \tag{1}
$$

where each point is expressed as $x(t_j) = (x_j^1, x_j^2, \cdots, x_j^d) \in \mathbb{R}^d$, $d$ is the number of columns (one column per one-dimensional time series), $w$ is the sliding window size in trading days with a sliding step of one day, and $n = 1, 2, \cdots, q$ indexes the point clouds. To obtain $q$, we take the difference between the total number of daily log returns ($n_{dlr}$) and one less than the sliding window size, $w - 1$, so that $q = n_{dlr} - (w - 1)$, or $q = n_{dlr} - w + 1$. The total number of daily log returns is the total number of trading days ($n_{td}$) minus 1, that is, $n_{dlr} = n_{td} - 1$. The formula used to approximate the daily log returns is given in Section 3.1. Every point cloud therefore consists of $w$ points in $\mathbb{R}^d$, arranged as the $w \times d$ matrix displayed in Equation (1), where $w > d$ [
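The window construction in Equation (1) can be illustrated with a short sketch. This is not the authors' R code: the function name `sliding_windows` and the toy data are hypothetical, and the sketch only assumes that the daily log returns are available as a list of $d$-dimensional points, one per day.

```python
def sliding_windows(returns, w):
    """Split a list of d-dimensional daily log returns into overlapping
    windows of w consecutive days with a sliding step of one day.

    Returns q = n_dlr - w + 1 point clouds, each a list of w points.
    """
    n_dlr = len(returns)
    if w > n_dlr:
        raise ValueError("window size w exceeds the number of daily log returns")
    return [returns[start:start + w] for start in range(n_dlr - w + 1)]


# Toy example (made-up values): 6 days of 2-dimensional returns, w = 4.
toy_returns = [(0.01 * i, -0.02 * i) for i in range(6)]
toy_windows = sliding_windows(toy_returns, w=4)

# Sizes used later in the paper: n_dlr = 1000 and w = 50.
paper_sized = sliding_windows([(0.0, 0.0, 0.0, 0.0)] * 1000, w=50)
```

With the toy data the sketch yields $q = 6 - 4 + 1 = 3$ windows, and with $n_{dlr} = 1000$ and $w = 50$ it yields the 951 point clouds used in Section 3.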

To produce topological summaries, we must first construct a Vietoris-Rips filtration for each point cloud in a sequence of point cloud data sets, which requires understanding simplices and simplicial complexes, which are defined below [

Definition 2.1 Let $\{a_0, \cdots, a_n\}$ be a geometrically independent set in $\mathbb{R}^N$. We define the $n$-simplex $\sigma$ spanned by $a_0, \cdots, a_n$ to be the set of all points $x$ of $\mathbb{R}^N$ such that:

$$
x = \sum_{i=0}^{n} t_i a_i, \quad \text{where} \quad \sum_{i=0}^{n} t_i = 1, \tag{2}
$$

and $t_i \ge 0$ for all $i$.

Definition 2.2 A simplicial complex $K$ in $\mathbb{R}^N$ is a collection of simplices in $\mathbb{R}^N$ such that:

· Every face of a simplex of $K$ is in $K$.

· The intersection of any two simplices of $K$ is a face of each of them.

In homology, we associate a vector space $H_i(X)$ to a space $X$ for each natural number $i \in \{0, 1, 2, \cdots\}$, because the dimension of $H_i(X)$ counts the number of $i$-dimensional holes in $X$. For example, $H_0(X)$ counts the number of 0-dimensional holes, or connected components, in $X$, while $H_1(X)$ counts the number of 1-dimensional holes, or loops, in $X$. Furthermore, these algebraic structures must be homotopy invariant, meaning they must not change under continuous deformations. Determining the homology of arbitrary topological spaces is computationally inefficient, so instead we approximate spaces using simplicial complexes.

Now that simplicial complexes have been defined, we introduce the $p^{th}$ homology of a simplicial complex $K$. First, we denote the field with two elements by $\mathbb{F}_2$. Second, for a given simplicial complex $K$, we let $C_p(K)$ denote the $\mathbb{F}_2$-vector space with basis given by the $p$-simplices of $K$. Third, for any $p \in \{1, 2, \cdots\}$, we define the linear map:

$$
\partial_p : C_p(K) \to C_{p-1}(K), \quad \sigma \mapsto \sum_{\tau \subset \sigma,\; \tau \in K_{p-1}} \tau, \tag{3}
$$

where $K_{p-1}$ denotes the set of $(p-1)$-simplices of $K$. The kernel of $\partial_p : C_p(K) \to C_{p-1}(K)$ is the subgroup $\partial_p^{-1}(0)$ of $C_p(K)$ and is called the group of $p$-cycles. The image of $\partial_{p+1} : C_{p+1}(K) \to C_p(K)$ is the subgroup $\partial_{p+1}(C_{p+1}(K))$ of $C_p(K)$ and is called the group of $p$-boundaries [

Definition 2.3 For any $p \in \{0, 1, 2, \cdots\}$, the $p^{th}$ homology of a simplicial complex $K$ is the quotient vector space defined as:

$$
H_p(K) = \operatorname{kernel}(\partial_p) / \operatorname{image}(\partial_{p+1}). \tag{4}
$$

Its dimension is defined by:

$$
\beta_p(K) := \dim H_p(K) = \dim \operatorname{kernel}(\partial_p) - \dim \operatorname{image}(\partial_{p+1}), \tag{5}
$$

which is called the $p^{th}$ Betti number of $K$ [
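As a worked illustration of Equations (3)-(5), the following sketch computes Betti numbers over $\mathbb{F}_2$ for two tiny complexes. It is an illustrative sketch only: the helper names (`rank_gf2`, `boundary_rows`, `betti_numbers`) are hypothetical, simplices are assumed to be given as sorted vertex tuples grouped by dimension, and $\partial_0$ is taken to be the zero map.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are stored as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Clear that bit from every remaining row (row addition mod 2).
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_rows(p_simplices, faces):
    """Rows of the boundary map of Equation (3): each p-simplex is sent to
    the sum of its (p-1)-dimensional faces."""
    index = {face: i for i, face in enumerate(faces)}
    rows = []
    for simplex in p_simplices:
        mask = 0
        for face in combinations(simplex, len(simplex) - 1):
            mask |= 1 << index[face]
        rows.append(mask)
    return rows


def betti_numbers(simplices_by_dim):
    """Equation (5): beta_p = dim kernel(d_p) - dim image(d_{p+1})."""
    top = len(simplices_by_dim)
    ranks = [0] * (top + 1)  # ranks[p] = rank of d_p, with d_0 = 0
    for p in range(1, top):
        ranks[p] = rank_gf2(
            boundary_rows(simplices_by_dim[p], simplices_by_dim[p - 1])
        )
    return [
        len(simplices_by_dim[p]) - ranks[p] - ranks[p + 1] for p in range(top)
    ]


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow_triangle = betti_numbers([vertices, edges])
filled_triangle = betti_numbers([vertices, edges, [(0, 1, 2)]])
```

The hollow triangle has one connected component and one loop, so its Betti numbers are $\beta_0 = 1$ and $\beta_1 = 1$; filling in the 2-simplex makes the loop a boundary, so $\beta_1$ drops to 0.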

The $p$-cycles that are not boundaries represent $p$-dimensional holes, which the $p^{th}$ Betti number counts. For a filtered simplicial complex $K$, we build on Definition 2.3 and define the $p^{th}$ persistent homology as follows:

Definition 2.4 Let $K$ be a finite simplicial complex, and let $K_1 \subset K_2 \subset K_3 \subset \cdots \subset K_l = K$ be a finite sequence of nested subcomplexes of $K$. The simplicial complex $K$ with such a sequence of subcomplexes is called a filtered simplicial complex. The $p^{th}$ persistent homology of $K$ is the pair

$$
\left( \{H_p(K_i)\}_{1 \le i \le l},\; \{f_p^{i,j}\}_{1 \le i \le j \le l} \right),
$$

where, for all $i, j \in \{1, \cdots, l\}$ with $i \le j$, $f_p^{i,j} : H_p(K_i) \to H_p(K_j)$ are the linear maps induced by the inclusion maps $K_i \to K_j$ [

The $p^{th}$ persistent homology of a filtered simplicial complex provides more information than the homologies of the individual subcomplexes, because it also records the maps between them; this is explained further in Section 2.2.2. While there are several filtered simplicial complexes, such as the Čech, Alpha, and Delaunay complexes, we chose the Vietoris-Rips complex because it is computationally efficient [

Definition 2.5 Let $X = \{x_1, \cdots, x_n\}$ be a collection of points in $\mathbb{R}^d$. Given a distance $\epsilon > 0$, $R(X, \epsilon)$ denotes the simplicial complex on the $n$ vertices $x_1, \cdots, x_n$, where an edge between the vertices $x_i$ and $x_j$ with $i \ne j$ is included if and only if $d(x_i, x_j) \le \epsilon$. More generally, a $k$-simplex with vertices $x_{i_0}, \cdots, x_{i_k}$ is included if and only if all of the pairwise distances are at most $\epsilon$. This type of simplicial complex is called a Vietoris-Rips complex [
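Definition 2.5 translates almost directly into code. The brute-force sketch below is for illustration only and is not how the R package "TDA" computes the complex; the function name `vietoris_rips` and the `max_dim` cutoff are assumptions, and the Euclidean distance is assumed for $d(x_i, x_j)$.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim=2):
    """Return the simplices (as tuples of vertex indices) of R(X, eps),
    up to dimension max_dim, by checking all pairwise distances."""
    n = len(points)
    close = [
        [dist(points[i], points[j]) <= eps for j in range(n)] for i in range(n)
    ]
    simplices = []
    for k in range(max_dim + 1):
        for verts in combinations(range(n), k + 1):
            # A k-simplex is included iff every pair of its vertices is close.
            if all(close[i][j] for i, j in combinations(verts, 2)):
                simplices.append(verts)
    return simplices


# Four corners of the unit square: sides have length 1, diagonals sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rips_small = vietoris_rips(square, eps=1.0)   # 4 vertices + 4 edges
rips_large = vietoris_rips(square, eps=1.5)   # diagonals and triangles appear
```

At $\epsilon = 1$ the complex is a hollow square with a single loop; at $\epsilon = 1.5$ the diagonals enter and the loop is filled in by triangles. Every simplex present at the smaller scale is still present at the larger one, which is the nesting property used to build the filtration.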

When $\epsilon < \epsilon'$, the Vietoris-Rips complex forms a filtration, $R(X, \epsilon) \subseteq R(X, \epsilon')$, which by Definition 2.4 is a filtered simplicial complex. While there is no clear criterion for selecting $\epsilon'$, [

$$
\begin{aligned}
R(X_1, \epsilon) &\subseteq R(X_1, \epsilon') \\
&\;\;\vdots \\
R(X_q, \epsilon) &\subseteq R(X_q, \epsilon'),
\end{aligned} \tag{6}
$$

where $q$ is the difference between the number of daily log returns and one less than the sliding window size, $w - 1$, that is, $q = n_{dlr} - w + 1$. By Definition 2.4, $R(X_n, \epsilon)$ is a filtered simplicial complex.

Using Definition 2.4 and Definition 2.5, it is possible to find the $p$-dimensional homology of the Vietoris-Rips complex of $X_n$, labelled $H_p(R(X_n, \epsilon))$, with coefficients in the field $\mathbb{Z}/2\mathbb{Z}$ for small values of $p$ and for different values of $\epsilon$ [

$$
\begin{aligned}
H_p(R(X_1, \epsilon)) &\to H_p(R(X_1, \epsilon')) \\
&\;\;\vdots \\
H_p(R(X_q, \epsilon)) &\to H_p(R(X_q, \epsilon')),
\end{aligned} \tag{7}
$$

where $q = n_{dlr} - w + 1$. Each $H_p(R(X_n, \epsilon))$ is a vector space whose generators correspond to holes in $R(X_n, \epsilon)$, and the linear maps $f_p^{i,j}$ allow us to track the generators from $H_p(R(X_n, \epsilon))$ to $H_p(R(X_n, \epsilon'))$. A suitable basis is selected by applying the Fundamental Theorem of Persistent Homology.

Theorem 2.1 (Fundamental Theorem of Persistent Homology) There is a choice of basis vectors of $H_p(K_i)$ for each $i \in \{1, \cdots, l\}$ such that each map is determined by a bipartite matching of basis vectors [

Given Theorem 2.1, there is a choice of basis vectors of $H_p(R(X_n, \epsilon))$ such that one may construct a well-defined and unique collection of disjoint half-open intervals, where a generator $x \in H_p(R(X_n, \epsilon))$ corresponds to a half-open interval $[b_i, d_i)$, which represents the lifetime of $x$. The endpoints $b_i$ and $d_i$ refer to $x$ first appearing and finally disappearing, respectively, in $R(X_n, \epsilon)$. Specifically, if $x \ne 0$ is not in the image of $f_p^{b_i - 1, b_i}$, then $x$ is born in $H_p(R(X_n, \epsilon))$ at $b_i$. Conversely, if $d_i > b_i$ is the smallest index for which $f_p^{b_i, d_i}(x) = 0$, then $x$ dies in $H_p(R(X_n, \epsilon))$ at $d_i$. Persistence is determined by the length of a generator's lifetime: the longer the half-open interval, the more persistent the generator. If $f_p^{b_i, d_i}(x) \ne 0$ for all $d_i > b_i$, then $x$ lives forever, and its lifetime is represented by the interval $[b_i, \infty)$ [

To construct and visualize topological summaries, Theorem 2.1 is used to select the basis vectors of $H_p(R(X_n, \epsilon))$ and the corresponding linear maps $f_p^{b_i, d_i}$; all topological summaries are then derived from the resulting persistence modules.

Definition 2.6 A persistence module $M$ consists of a vector space $M_a$ for all $a \in \mathbb{R}$ and linear maps $M(a \le b) : M_a \to M_b$ for all $a \le b$ such that:

1) $M(a \le a)$ is the identity map;

2) For all $a \le b \le c$, $M(b \le c) \circ M(a \le b) = M(a \le c)$.

For additional information about the construction of a persistence module, see [^{th} barcode is denoted by $B_p = \{I_j\}$. A topological feature's survival, or persistence, is represented by the interval's length. The second type of topological summary is the $p^{th}$ persistence diagram, which is denoted as $D_p = \{(b_i, d_i)\}_{i \in I_j}$, where $b_i$ and $d_i$ are the endpoints of the barcode intervals and $-\infty < b_i < d_i < \infty$.

Unfortunately, the geometric properties of barcodes and persistence diagrams make it difficult to calculate means and variances, since a set of barcodes or persistence diagrams may not have a unique Fréchet mean, which obstructs statistical inference. While the barcode and the persistence diagram are conventional topological summaries, Bubenik [

Bubenik and Dlotko [

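$$
f_{(b,d)}(x) =
\begin{cases}
0 & \text{if } x \notin (b, d) \\
x - b & \text{if } x \in \left(b, \frac{b+d}{2}\right] \\
-x + d & \text{if } x \in \left(\frac{b+d}{2}, d\right)
\end{cases} \tag{8}
$$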

Second, given a persistence module $M$, the persistence landscape may be defined as the function $\lambda : \mathbb{N} \times \mathbb{R} \to \mathbb{R}$ given by:

$$
\lambda(k, t) = \sup\left( h > 0 \mid \operatorname{rank} M(t - h \le t + h) > k \right). \tag{9}
$$

Third, given a persistence diagram $D_p = \{(b_i, d_i)\}_{i \in I}$, for $b < d$ let $f_{(b,d)}(t) = \max(0, \min(t - b, d - t))$, which agrees with Equation (8). The persistence landscape is then defined as follows:

$$
\lambda(k, t) = \operatorname{kmax}\left\{ f_{(b_i, d_i)}(t) \mid (b_i, d_i) \in D_p \right\}_{i \in I}, \tag{10}
$$

where $\operatorname{kmax}$ denotes the $k^{th}$ largest element. Using Equation (10) for $X_n$, the persistence landscape of $X_n$, denoted by $\lambda(X_n)$, is the following:

$$
\begin{aligned}
\lambda(X_1) &= \operatorname{kmax}\left\{ f_{(b_i, d_i)}(X_1) \mid (b_i, d_i) \in D_p(X_1) \right\}_{i \in I} \\
&\;\;\vdots \\
\lambda(X_q) &= \operatorname{kmax}\left\{ f_{(b_i, d_i)}(X_q) \mid (b_i, d_i) \in D_p(X_q) \right\}_{i \in I},
\end{aligned} \tag{11}
$$

where q = n d l r − w + 1 . This results in the following lemma from [

Lemma 2.2

The persistence landscape has the following properties:

1) $\lambda_k(t) \ge 0$,

2) $\lambda_k(t) \ge \lambda_{k+1}(t)$, and

3) $\lambda_k(t)$ is 1-Lipschitz.
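To make Equation (10) and the properties in Lemma 2.2 concrete, here is a small sketch that evaluates a persistence landscape from a persistence diagram given as a list of $(b_i, d_i)$ pairs. The function names `tent` and `landscape` and the toy diagram are hypothetical, and layers are assumed to be indexed from $k = 1$.

```python
def tent(b, d, t):
    """Piecewise linear function f_(b,d)(t) = max(0, min(t - b, d - t))."""
    return max(0.0, min(t - b, d - t))


def landscape(diagram, k, t):
    """lambda(k, t): the k-th largest tent value at t (k = 1, 2, ...).
    Returns 0 when the diagram has fewer than k points."""
    values = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


# Toy diagram (made-up values) with two nested intervals.
toy_diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.25 * i for i in range(-4, 21)]  # t from -1 to 5 in steps of 0.25

layer1 = [landscape(toy_diagram, 1, t) for t in grid]
layer2 = [landscape(toy_diagram, 2, t) for t in grid]
```

At $t = 2$ the two tent functions take the values 2 and 1, so $\lambda_1(2) = 2$ and $\lambda_2(2) = 1$. On the grid, both layers are non-negative, the first layer dominates the second, and consecutive values differ by at most the grid spacing, in line with Lemma 2.2.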

From Equation (10), the persistence landscape is obtained and used to calculate the mean landscape, which is defined below:

Definition 2.7 Let $Y_1, \cdots, Y_n$ be independent and identically distributed copies of $Y$, and let $\Lambda_1, \cdots, \Lambda_n$ be the corresponding persistence landscapes. The mean landscape $\overline{\Lambda}_n$ is given by the pointwise mean, in particular, $\overline{\Lambda}_n(\omega) = \overline{\lambda}_n$, where

$$
\overline{\lambda}_n(k, t) = \frac{1}{n} \sum_{i=1}^{n} \lambda_i(k, t). \tag{12}
$$

Using Equation (12) for $X_n$, we have the following:

$$
\begin{aligned}
\overline{\lambda}_n(X_1) &= \frac{1}{n} \sum_{i=1}^{n} \lambda_i(X_1) \\
&\;\;\vdots \\
\overline{\lambda}_n(X_q) &= \frac{1}{n} \sum_{i=1}^{n} \lambda_i(X_q),
\end{aligned} \tag{13}
$$

where $q = n_{dlr} - w + 1$. The mean landscape is used in Section 2.5 and Section 2.6.

Gidea and Katz [

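Therefore, for real-valued functions on $\mathbb{R} \times \mathbb{R}$ and for $1 \le p < \infty$, the $p$-norms of persistence landscapes are defined as:

$$
\|\lambda\|_p = \sum_{k=1}^{\infty} \left[ \int_{-\infty}^{\infty} \lambda_k(t)^p \, dt \right]^{\frac{1}{p}}, \tag{14}
$$

and for $p = \infty$,

$$
\|\lambda\|_\infty = \sup_{k, t} \lambda_k(t). \tag{15}
$$

Applying Equation (14) to our sequence of point cloud data sets $X_n$ results in:

$$
\begin{aligned}
\|\lambda(X_1)\|_p &= \sum_{k=1}^{\infty} \left[ \int_{-\infty}^{\infty} \lambda_k(X_1)^p \, dt \right]^{\frac{1}{p}} \\
&\;\;\vdots \\
\|\lambda(X_q)\|_p &= \sum_{k=1}^{\infty} \left[ \int_{-\infty}^{\infty} \lambda_k(X_q)^p \, dt \right]^{\frac{1}{p}},
\end{aligned} \tag{16}
$$

where $q = n_{dlr} - w + 1$.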
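The mean landscape of Equation (12) and the norms of Equation (14) can be approximated numerically on a grid. The sketch below is a rough illustration rather than the authors' implementation: all function names are hypothetical, the integral is approximated with the trapezoidal rule, and the norm follows Equation (14) as written above (a sum over layers $k$ of the $L^p$ norm of each layer).

```python
def landscape_layer(diagram, k, grid):
    """Values of the k-th landscape layer (k = 1, 2, ...) on a grid of t."""
    layer = []
    for t in grid:
        vals = sorted(
            (max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True
        )
        layer.append(vals[k - 1] if k <= len(vals) else 0.0)
    return layer


def trapezoid(values, step):
    """Trapezoidal rule on an evenly spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def landscape_norm(diagram, p, grid, step):
    """Equation (14) as written: sum over k of the L^p norm of layer k.
    Layers beyond the number of diagram points are identically zero."""
    total = 0.0
    for k in range(1, len(diagram) + 1):
        layer = landscape_layer(diagram, k, grid)
        total += trapezoid([v ** p for v in layer], step) ** (1.0 / p)
    return total


def mean_landscape_layer(diagrams, k, grid):
    """Equation (12): pointwise mean of the k-th layers of several landscapes."""
    layers = [landscape_layer(dgm, k, grid) for dgm in diagrams]
    return [sum(column) / len(layers) for column in zip(*layers)]


step = 0.01
grid = [-1.0 + step * i for i in range(401)]  # t from -1 to 3
single_bar = [(0.0, 2.0)]                     # toy diagram, made-up values
l1_norm = landscape_norm(single_bar, 1, grid, step)
l2_norm = landscape_norm(single_bar, 2, grid, step)
mean_layer = mean_landscape_layer([[(0.0, 2.0)], [(0.0, 4.0)]], 1, [1.0, 2.0])
```

For the single interval $[0, 2)$ the landscape is a triangle of base 2 and height 1, so the $L^1$ norm is approximately 1 and the $L^2$ norm is approximately $\sqrt{2/3}$. For $p = 1$ the result does not depend on whether the root in Equation (14) is taken inside or outside the sum.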

To compare the topological features between two groups, the persistence landscape is used to conduct a hypothesis test and statistical inference, which require several assumptions provided by [

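$$
Y = f(\lambda(k, t)) = \sum_{k} \int_{\mathbb{R}} \lambda_k(t) \, dt, \tag{17}
$$

where $f \in L^b(S)$ is a continuous linear functional, $\frac{1}{a} + \frac{1}{b} = 1$, and $Y$ satisfies the strong law of large numbers (SLLN) and the central limit theorem (CLT) as seen in [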

The statistical properties and definitions above are used to conduct hypothesis tests, with the corresponding p-value based on a permutation test. We compare the topological features of two groups, $Y_1$ and $Y_2$, where $k_1$ and $k_2$ are the numbers of samples taken from these groups, respectively, and $\Lambda_1$ and $\Lambda_2$ are the corresponding landscapes. The associated sample values of $Y_1$ and $Y_2$ are denoted as $y_1^1, \cdots, y_1^{k_1}$ and $y_2^1, \cdots, y_2^{k_2}$, and the corresponding landscapes of these sample values are labelled as $\lambda_1^1, \cdots, \lambda_1^{k_1}$ and $\lambda_2^1, \cdots, \lambda_2^{k_2}$. We apply Equation (17) to $Y_1$ and $Y_2$, so the functionals of $Y_1$ and $Y_2$ are as follows:

$$
\begin{aligned}
Y_1 &= f(y_1^1), \cdots, f(y_1^{k_1}) = f(\lambda_1^1(k, t)), \cdots, f(\lambda_1^{k_1}(k, t)) = \sum_{i=1}^{k_1} \int_{\mathbb{R}} \lambda_1^i(k, t) \, dt \\
Y_2 &= f(y_2^1), \cdots, f(y_2^{k_2}) = f(\lambda_2^1(k, t)), \cdots, f(\lambda_2^{k_2}(k, t)) = \sum_{i=1}^{k_2} \int_{\mathbb{R}} \lambda_2^i(k, t) \, dt.
\end{aligned} \tag{18}
$$

Recall that the sample mean is $\overline{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, so the sample means of $Y_1$ and $Y_2$ are the following:

$$
\begin{aligned}
\overline{Y}_1 &= \frac{1}{k_1} \sum_{i=1}^{k_1} f(y_1^i) = \frac{1}{k_1} \sum_{i=1}^{k_1} f(\lambda_1^i(k, t)) \\
\overline{Y}_2 &= \frac{1}{k_2} \sum_{i=1}^{k_2} f(y_2^i) = \frac{1}{k_2} \sum_{i=1}^{k_2} f(\lambda_2^i(k, t)),
\end{aligned} \tag{19}
$$

where again $k_1$ and $k_2$ are the numbers of samples taken from $Y_1$ and $Y_2$. We let $\mu_1$ and $\mu_2$ denote the expectations, that is, the population means, of $Y_1$ and $Y_2$. Therefore, the statistical hypotheses are:

$$
H_0 : \mu_1 = \mu_2 \qquad H_a : \mu_1 \ne \mu_2. \tag{20}
$$

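To test the null hypothesis, we use a two-sample permutation test. Let

$$
t = \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}}. \tag{21}
$$

Using Equation (21), values $t_1, \cdots, t_m$ of the test statistic are calculated for permutations $s = 1, \cdots, m$ of the group labels. The observed value of the test statistic is denoted $t_{observed}$. The p-value is calculated by comparing $t_{observed}$ with each $t_s$ and taking the proportion of permutations for which $t_{observed} \le t_s$. Thus, Equation (21) becomes:

$$
\begin{aligned}
t_{\{1, Y_1, Y_2\}} &= \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}} \\
&\;\;\vdots \\
t_{\{m, Y_1, Y_2\}} &= \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}},
\end{aligned} \tag{22}
$$

where in the $s^{th}$ line the means and variances are computed from the $s^{th}$ permuted assignment of the samples to the two groups. A general form of Equation (22) is:

$$
t_{\{s, Y_1, Y_2\}} = \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}}. \tag{23}
$$

Hence, using Equation (22) and counting every instance where $t_{observed} \le t_s$, the p-value is obtained as:

$$
p\text{-value}_{\{Y_1, Y_2\}} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\left\{ t_{observed} \le t_{\{i, Y_1, Y_2\}} \right\}. \tag{24}
$$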
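The permutation procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names, the number of permutations `m`, the fixed `seed`, and the toy samples are all hypothetical, the sample variance is assumed for $\operatorname{Var}$, and the p-value is taken as the proportion of permutations with $t_{observed} \le t_s$.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(y1, y2):
    """Test statistic of Equation (21), using sample variances."""
    diff = abs(mean(y1) - mean(y2))
    denom = sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    if denom == 0.0:  # degenerate split: all values equal within both groups
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def permutation_p_value(y1, y2, m=1000, seed=0):
    """Proportion of m random relabelings whose statistic is at least as
    large as the observed one."""
    rng = random.Random(seed)
    t_observed = t_stat(y1, y2)
    pooled = list(y1) + list(y2)
    hits = 0
    for _ in range(m):
        rng.shuffle(pooled)
        t_s = t_stat(pooled[:len(y1)], pooled[len(y1):])
        if t_observed <= t_s:
            hits += 1
    return hits / m


# Toy samples (made-up values) standing in for the functionals Y_1 and Y_2.
group_a = [1.00, 1.10, 0.90, 1.05, 0.95, 1.02, 0.98, 1.08, 0.92, 1.01]
group_b = [5.00, 5.10, 4.90, 5.05, 4.95, 5.02, 4.98, 5.08, 4.92, 5.01]
p_separated = permutation_p_value(group_a, group_b)
p_identical = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

For clearly separated groups almost no relabeling reproduces a statistic as extreme as the observed one, so the p-value falls below $\alpha = 0.05$; for two identical samples the observed statistic is 0 and every permutation counts, giving a p-value of 1.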

To measure the statistical significance, [

Instead of conducting one hypothesis test, multiple hypothesis tests are conducted to determine how the topological features in our sequence of point cloud data sets $X_n$ change within a particular time frame. The hypothesis tests are performed on all the sliding window matrices within $X_n$. In particular, two adjacent sliding window matrices are compared, where adjacent means that the sliding window matrices differ by a sliding step of one day. For example, the sliding window matrices $X_1$ and $X_2$ would be compared, while $X_1$ and $X_3$ would not. Therefore, the assumptions, equations, and definitions from Section 2.5 are applied to $X_n$. When the hypothesis tests are performed, there are $q = n_{dlr} - w + 1$ random variables (see Equation (1)), which is also the size of the sequence of point cloud data sets $X_n$.

So, we let $Y_1, Y_2, \cdots, Y_q$ be random variables, where $k_1, k_2, \cdots, k_q$ are the numbers of samples taken from these groups, respectively, and $\Lambda_1, \Lambda_2, \cdots, \Lambda_q$ are the corresponding landscapes. The associated sample values of $Y_1, Y_2, \cdots, Y_q$ are denoted as $y_1^1, \cdots, y_1^{k_1}, y_2^1, \cdots, y_2^{k_2}, \cdots, y_q^1, \cdots, y_q^{k_q}$, and the corresponding landscapes of these sample values are labelled as $\lambda_1^1, \cdots, \lambda_1^{k_1}, \lambda_2^1, \cdots, \lambda_2^{k_2}, \cdots, \lambda_q^1, \cdots, \lambda_q^{k_q}$. The functional in Equation (17) is used to define the following for $Y_1, Y_2, \cdots, Y_q$:

$$
\begin{aligned}
Y_1 &= \sum_{i=1}^{k_1} \int_{\mathbb{R}} \lambda_1^i(X_1) \, dt \\
&\;\;\vdots \\
Y_q &= \sum_{i=1}^{k_q} \int_{\mathbb{R}} \lambda_q^i(X_q) \, dt,
\end{aligned} \tag{25}
$$

where $q = n_{dlr} - w + 1$. Recall that the sample mean is $\overline{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, so the sample means of $Y_1, Y_2, \cdots, Y_q$ are as follows:

$$
\begin{aligned}
\overline{Y}_1 &= \frac{1}{k_1} \sum_{i=1}^{k_1} f(\lambda_1^i(X_1)) \\
&\;\;\vdots \\
\overline{Y}_q &= \frac{1}{k_q} \sum_{i=1}^{k_q} f(\lambda_q^i(X_q)),
\end{aligned} \tag{26}
$$

where $q = n_{dlr} - w + 1$. We let $\mu_1, \mu_2, \cdots, \mu_q$ denote the expectations, that is, the population means, of $Y_1, Y_2, \cdots, Y_q$, and the statistical hypotheses are:

$$
\begin{aligned}
H_0 : \mu_1 = \mu_2 &\qquad H_a : \mu_1 \ne \mu_2 \\
&\;\;\vdots \\
H_0 : \mu_{q-1} = \mu_q &\qquad H_a : \mu_{q-1} \ne \mu_q,
\end{aligned} \tag{27}
$$

where $q = n_{dlr} - w + 1$. To test each null hypothesis, we use a two-sample permutation test with the statistics

$$
\begin{aligned}
t_{\{Y_1, Y_2\}} &= \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}} \\
&\;\;\vdots \\
t_{\{Y_{q-1}, Y_q\}} &= \frac{\left| \overline{Y}_{q-1} - \overline{Y}_q \right|}{\sqrt{\frac{\operatorname{Var}(Y_{q-1})}{k_{q-1}} + \frac{\operatorname{Var}(Y_q)}{k_q}}},
\end{aligned} \tag{28}
$$

where $q = n_{dlr} - w + 1$. Using Equation (28), values $t_1, \cdots, t_m$ of the test statistic are calculated for permutations $s = 1, \cdots, m$. The observed value of the test statistic is denoted $t_{observed}$. The p-value is calculated by comparing $t_{observed}$ with each $t_s$ and taking the proportion of permutations for which $t_{observed} \le t_s$. Using Equation (23), Equation (28) becomes:

$$
\begin{aligned}
t_{\{s, Y_1, Y_2\}} &= \frac{\left| \overline{Y}_1 - \overline{Y}_2 \right|}{\sqrt{\frac{\operatorname{Var}(Y_1)}{k_1} + \frac{\operatorname{Var}(Y_2)}{k_2}}} \\
&\;\;\vdots \\
t_{\{s, Y_{q-1}, Y_q\}} &= \frac{\left| \overline{Y}_{q-1} - \overline{Y}_q \right|}{\sqrt{\frac{\operatorname{Var}(Y_{q-1})}{k_{q-1}} + \frac{\operatorname{Var}(Y_q)}{k_q}}},
\end{aligned} \tag{29}
$$

where $q = n_{dlr} - w + 1$. Hence, using Equation (29) and counting every instance where $t_{observed} \le t_s$, the p-values are obtained as:

$$
\begin{aligned}
p\text{-value}_{\{Y_1, Y_2\}} &= \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\left\{ t_{observed} \le t_{\{i, Y_1, Y_2\}} \right\} \\
&\;\;\vdots \\
p\text{-value}_{\{Y_{q-1}, Y_q\}} &= \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\left\{ t_{observed} \le t_{\{i, Y_{q-1}, Y_q\}} \right\},
\end{aligned} \tag{30}
$$

where $q = n_{dlr} - w + 1$. In our study, we also conduct hypothesis tests between two sequences of point cloud data sets, $X_n^1$ and $X_n^2$, within the same sliding window, using the same assumptions, definitions, and results from this section. The only difference is a change in subscripts and superscripts. This case is presented in Section 3.4.

In this section, we describe the methods to obtain the data and analyze the financial time series using topological data analysis, statistical inference, and RStudio [

$$
\log_{10}\left( \frac{x_t}{x_{t-1}} \right) = \log_{10}(x_t) - \log_{10}(x_{t-1}) \approx r_t,
$$

which is an approximation of a return [
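The log-return formula above can be checked on a few numbers. The sketch below is a minimal illustration with made-up prices; the function name `daily_log_returns` is hypothetical, and the input is assumed to be a list of positive closing prices ordered by trading day.

```python
from math import log10


def daily_log_returns(prices):
    """Base-10 log return log10(x_t / x_{t-1}) for each pair of consecutive
    trading days; n_td prices give n_dlr = n_td - 1 returns."""
    return [log10(curr / prev) for prev, curr in zip(prices, prices[1:])]


toy_prices = [100.0, 110.0, 99.0]  # made-up closing prices
toy_log_returns = daily_log_returns(toy_prices)
```

Three prices produce two returns, matching $n_{dlr} = n_{td} - 1$, and a 10% rise gives $\log_{10}(1.1)$, which equals $\log_{10}(110) - \log_{10}(100)$ as in the formula.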

After approximating the daily log returns, we designed two sequences of point cloud data sets, each with a sliding window of w = 50 and a sliding step of one day, which is based on the same method found in [

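$$
\begin{aligned}
X_1^{SI} &= \begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_{50}) \end{bmatrix}
= \begin{bmatrix} x_1^1 & \cdots & x_1^4 \\ x_2^1 & \cdots & x_2^4 \\ \vdots & \ddots & \vdots \\ x_{50}^1 & \cdots & x_{50}^4 \end{bmatrix} \\
&\;\;\vdots \\
X_{951}^{SI} &= \begin{bmatrix} x(t_{951}) \\ x(t_{952}) \\ \vdots \\ x(t_{1000}) \end{bmatrix}
= \begin{bmatrix} x_{951}^1 & \cdots & x_{951}^4 \\ x_{952}^1 & \cdots & x_{952}^4 \\ \vdots & \ddots & \vdots \\ x_{1000}^1 & \cdots & x_{1000}^4 \end{bmatrix}.
\end{aligned} \tag{31}
$$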

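The second sequence of point cloud data sets, denoted by $X_n^{ETF}$, examined the 10 ETF sectors ($d = 10$), which yielded a $50 \times 10$ matrix for each point cloud, for a total of $q = n_{dlr} - w + 1 = 951$ point clouds, as obtained from Equation (1):

$$
\begin{aligned}
X_1^{ETF} &= \begin{bmatrix} x(t_1) \\ x(t_2) \\ \vdots \\ x(t_{50}) \end{bmatrix}
= \begin{bmatrix} x_1^1 & \cdots & x_1^{10} \\ x_2^1 & \cdots & x_2^{10} \\ \vdots & \ddots & \vdots \\ x_{50}^1 & \cdots & x_{50}^{10} \end{bmatrix} \\
&\;\;\vdots \\
X_{951}^{ETF} &= \begin{bmatrix} x(t_{951}) \\ x(t_{952}) \\ \vdots \\ x(t_{1000}) \end{bmatrix}
= \begin{bmatrix} x_{951}^1 & \cdots & x_{951}^{10} \\ x_{952}^1 & \cdots & x_{952}^{10} \\ \vdots & \ddots & \vdots \\ x_{1000}^1 & \cdots & x_{1000}^{10} \end{bmatrix}.
\end{aligned} \tag{32}
$$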

Next, we constructed Vietoris-Rips complexes and filtration for each point cloud in X n S I and X n E T F from definition 2.4, definition 2.5, and Equation (6) and R-package “TDA” [

$$
R(X_n^{SI}, \epsilon) = R(X_n^{SI}, 0) \subset \cdots \subset R(X_n^{SI}, 0.055), \tag{33}
$$

$$
R(X_n^{ETF}, \epsilon) = R(X_n^{ETF}, 0) \subset \cdots \subset R(X_n^{ETF}, 0.08), \tag{34}
$$

where $n = 1, \cdots, 951$. Based on Equations (6), (33), and (34), we computed only the $p = 1$ dimensional homology $H_1(R(X_n, \epsilon))$ with coefficients in the field $\mathbb{Z}/2\mathbb{Z}$ from Equation (7) as follows:

$$
H_1(R(X_n^{SI}, 0)) \to H_1(R(X_n^{SI}, 0.055)), \tag{35}
$$

$$
H_1(R(X_n^{ETF}, 0)) \to H_1(R(X_n^{ETF}, 0.08)), \tag{36}
$$

where $n = 1, \cdots, 951$. We computed only the first-dimensional homology because we are interested solely in the persistence of loops as they appear in each point cloud during the transition states of the market. By Definition 2.4, the filtrations in Equations (33) and (34) induce sequences of linear maps $f_1^{b_i, d_i, SI} : H_1(R(X_n^{SI}, 0)) \to H_1(R(X_n^{SI}, 0.055))$ and $f_1^{b_i, d_i, ETF} : H_1(R(X_n^{ETF}, 0)) \to H_1(R(X_n^{ETF}, 0.08))$. The images of these maps are the persistent homology groups. The collection of vector spaces $H_1(R(X_n^{SI}))$ and $H_1(R(X_n^{ETF}))$, along with the corresponding linear maps, is a persistence module, which leads us to the topological summaries.

By modifying the R script in [

where

where

While the topological summaries were useful for examining topological features, we were also interested in testing the statistical significance of any changes in these topological features over time. The time period of interest is July 1, 2019 to July 1, 2020, which has

We make the same assumptions from Section 2.5 and Section 2.6. Our random variables will derive from our two sequences of point cloud data sets,

For all the stock indices and all the ETF sectors, we have

The functional in Equation (25) is used to define the random variables for all the stock indices and all the ETFs as follows:

where

where

For our third set of statistical hypotheses, we also wish to compare all the stock indices against all the ETF sectors within the same sliding windows. Our statistical hypotheses will determine for two groups at a time if the means of topological features are the same within the same sliding window as shown below:

where in Equations (45)-(47),

where

where

Similarly, Equation (50) becomes:

where in Equations (51)-(53), where

Similarly, using Equation (53) and every instance where

where in Equations (54)-(56),

The goal of this study is to detect a statistically relevant critical transition and characterize any changes in topological features over time. To assess the statistical significance of observed differences in the topological features that change over time, we used a permutation test. For degree 1, we obtained ten sample values of the random variables

Using Equations (44), (46), (49), (52), and (55), the permutation test is conducted with a significance level

In order to understand these results, we will review the daily log returns, the norms of the persistence landscapes, and the topological summaries of all the stock indices and all the ETF sectors. When reviewing the daily log returns for DJIA, the S&P 500, NASDAQ, and Russell 2000 between January 5, 2010 and June 30, 2020 (see

When we examine the daily log returns of all of the stock indices between January 5, 2010 and June 1, 2020, the minimum daily log return occurs on March 16, 2020, where the Russell 2000 had a return of −0.154, the S&P 500 had a return of −0.1277, and the other stock indices were in between these values. When reviewing the daily log returns for all of the ETF sectors for the same time period, the minimum daily log return also occurs on March 16, 2020, where Information Technology (XLK) had a return of −0.1487, Consumer Staples (XLP) had a return of −0.0702, and the other ETF sectors were in between these values. While March 16, 2020 is not recognized as an official financial crash or meltdown, this date is noteworthy and warrants closer examination for potential critical transitions prior to it. Focusing on when the peaks occur, we include summary statistics for July 1, 2019 to July 1, 2020 for all the stock indices and all the ETF sectors in

The norms of the persistence landscapes in homology degree 1 presented in ^{1} distances are less than 0.01 between 2017 and 2018, less than 0.02 between 2018 and 2020, but the greatest L^{1} distance occurs in 2020 at approximately 0.08 as seen in ^{1} distance for all of the ETFs, have more spikes than the L^{1} distances of the stock indices, especially between 2018 and 2020, but the greatest L^{1} distance occurs in March 2020 at approximately 0.14 as seen in ^{2} norms for all of the stock indices and all of the ETFs have similar distances, there is a noticeable spike in 2020. However, the distances in L^{2} are not as great as in L^{1}, as shown in

Stock Name | Mean | Variance | Standard Deviation | Skewness | Kurtosis |
---|---|---|---|---|---|
Dow Jones | −1e−04 | 5e−04 | 0.0228 | −0.8479 | 13.1333 |
S&P 500 | 2e−04 | 5e−04 | 0.0213 | −0.8691 | 12.6087 |
NASDAQ | 9e−04 | 5e−04 | 0.0213 | −1.0494 | 12.6029 |
Russell 2000 | −3e−04 | 7e−04 | 0.0263 | −1.3226 | 11.2358 |

Stock Symbol | Mean | Variance | Standard Deviation | Skewness | Kurtosis |
---|---|---|---|---|---|
XLY | 0.0003 | 0.0004 | 0.0210 | −1.3141 | 14.1179 |
XLP | 0.0001 | 0.0003 | 0.0173 | −0.2487 | 12.6118 |
XLE | −0.0017 | 0.0012 | 0.0348 | −1.3392 | 13.4664 |
XLF | −0.0006 | 0.0008 | 0.0276 | −0.6256 | 10.4510 |
XLV | 0.0004 | 0.0004 | 0.0187 | −0.4299 | 10.1362 |
XLI | −0.0004 | 0.0006 | 0.0243 | −0.5575 | 10.0397 |
XLB | −0.0001 | 0.0005 | 0.0232 | −0.7113 | 10.0980 |
XLK | 0.0012 | 0.0006 | 0.0242 | −0.6869 | 12.5942 |
XLU | −0.0001 | 0.0006 | 0.0235 | −0.0751 | 11.9571 |
SPY | 0.0002 | 0.0004 | 0.0207 | −0.8911 | 11.7189 |

sectors respectively. In particular,

Aside from the norms of the persistence landscapes, we produce topological summaries to represent the persistence of topological features for all the stock indices and for all the ETF sectors between January 3, 2020 and June 30, 2020. Along with these topological summaries (the persistence diagram, the persistence landscape, the mean landscape), we plotted the daily log returns for the corresponding sliding window of 50 trading days shown in Figures 7-12.

the stock indices and from March 1, 2020 to May 1, 2020 for all of the ETF sectors. Significant persistence is apparent in the persistence diagram and more spikes appear in the persistence landscape and mean landscape.

From reviewing the norms of the persistence landscape, the daily log returns, persistence diagrams, persistence landscapes, and mean landscapes for all of the selected dates, it is clear that the loops in the relevant point clouds become more pronounced, resulting in more persistence, which signifies that the stock market is transitioning from a stable state to a more unpredictable, volatile state. Moreover, the ETF sectors demonstrate more volatility than the stock indices. These findings for the stock indices coincide with the findings for the 2000 and 2008 market crashes in [^{1} distances that confirm the critical thresholds prior to the 2020 peak and exhibit more than the L^{2} norm. In other words, the L^{p}-norms exhibit strong growth around the emergence of the primary peak.

The highest peak in the L^{p} norms occurred on February 21, 2020 for all of the stock indices and on March 3, 2020 for all of the ETF sectors. The coronavirus disease (COVID-19) broke out in 2019 in Wuhan, China, and the first US case was confirmed on January 21, 2020. The most important dates are March 13, 2020, when President Trump declared a national emergency; March 15, 2020, when the Centers for Disease Control and Prevention warned against large gatherings; and March 17, 2020, when COVID-19 was present in all 50 states. The daily log returns for all of the stock indices and for all of the ETF sectors do not include negative values. Yet, there are other dates that could have led to the market decline on March 16, 2020, for example, January 30, 2020, when the World Health Organization (WHO) declared a global health emergency, or the period between February 5, 2020 and February 29, 2020, when the outbreak became an epidemic. While we acknowledge that it is quite difficult to predict a market crash, the norms of the persistence landscape performed well as an indicator for detecting critical transitions, and the topological summaries corroborated the volatility through the increasing number of loops.

Our hypothesis tests aimed to determine how topological features change over time, notably between July 1, 2019 and July 1, 2020. Our hypothesis tests for all of the stock indices found evidence of a difference in topological features when comparing adjacent sliding windows with a sliding step of one day. In particular, we found for the chosen time frame that the daily log returns of all the stock indices differ significantly in the number of loops. Similarly, our hypothesis tests for all of the ETF sectors found evidence of a difference in topological features when comparing adjacent sliding windows with a sliding step of one day. Specifically, we found for the selected time frame that the daily log returns of all the ETF sectors differ significantly in the number of loops. Our last hypothesis tests, between all of the stock indices and all of the ETF sectors within the same sliding window, found inconclusive evidence of a difference in topological features for the entire time frame.

In this paper, we investigated the topological features of four major indices and 10 ETF sectors for January 4, 2010-July 1, 2020. We used two sequences of point cloud data sets, one for all the stock indices and the other for all the ETFs with a sliding window

We conclude with possible future research goals. Further work could analyze persistence landscapes for homology in degree two, and more generally study topological features based on higher-degree persistence. The analysis could also be extended to commodities, futures, and other financial time series. Finally, it would be valuable to extend topological data analysis beyond statistical inference and use it for predictive modeling with machine learning.

This table presents summary statistics for all the stock indices. We estimated the mean (

This table presents summary statistics for all the ETF sectors. We estimated the mean (

The authors would like to thank Tracy Volz for helpful discussions in the editing process. The authors would also like to thank the Center for Computational Finance and Economic Systems (https://cofes.rice.edu).

The authors declare no conflicts of interest regarding the publication of this paper.

Aguilar, A. and Ensor, K. (2020) Topology Data Analysis Using Mean Persistence Landscapes in Financial Crashes. Journal of Mathematical Finance, 10, 648-678. https://doi.org/10.4236/jmf.2020.104038