During the last few decades, many statistical physicists have devoted research efforts to the study of the problem of earthquakes. The purpose of this work is to apply methods of Statistical Physics and neuron-based network systems to the study of seismological events. Data from the Advanced National Seismic System (ANSS) of Southern California were used to analyze, through neural-network modeling, the time differences between consecutive seismic events with magnitudes greater than 3.0, 3.5, 4.0 and 4.5. The problem we analyze is the sequence of time differences between seismological events and how these data can be treated as a time series with non-linear characteristics. We therefore use a multilayer perceptron neural network with a backpropagation learning algorithm, because its characteristics allow for the analysis of non-linear data, in order to obtain statistical results regarding the probabilistic forecast of tremor occurrence.

Earthquakes, one of nature’s many phenomena, cause huge catastrophes in their places of occurrence. Such catastrophes are characterized by the physical destruction of cities (houses, buildings, urban roads, etc.) and, consequently, large numbers of human victims. The damage can affect thousands of people and many cities around where the tremor occurred, extending over thousands of square kilometers. Interestingly, neural networks have been shown to be useful when applied in different areas such as recognition of word patterns [

Much research is being done today in Seismology to better understand the dynamics of earthquakes, such as: the study of volcanic activity as a precursor of tremors [

Several prediction-related works have been carried out throughout history with the aim of relating earthquakes to their probability of occurrence [

Considering the relationship between earthquakes and neural networks we have some work related to the modeling of neural networks oriented to the understanding of earthquakes [

However, the prediction of earthquakes continues to be difficult, and much effort will certainly be devoted to solving this problem. For this paper, in order to estimate the possibility of a quake occurrence and time differences between events, the seismological data for analysis was taken from the Advanced National Seismic System (ANSS) catalog in Southern California.

The rest of the paper is organized as follows: in Section 2 we describe the study area; Section 3 describes the database used and the way in which the data were prepared; in Section 4, we present the type of network we use and in Section 5, the learning process. In Section 6, the results are described and, finally, Section 7 presents the conclusions.

to 2013, the data for which was taken from the Advanced National Seismic System (ANSS) catalog in Southern California. The choice of this region is related to the fact that the San Andreas Fault is a major cause of tremors and to the amount of existing tremor data available measured at the site. The San Andreas fault system in the region of San Francisco is a complex of faults and part of an isolated system where the Pacific plate meets the North American plate [. One tremor there reached magnitude 7.8, one of the largest in the region.

Using neural networks for a better analysis of the seismological data, the ANSS Catalog was modified and only the time differences between the seismological events in decimal time were considered.
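As a minimal sketch of this preprocessing step (the timestamps below are made up; a real run would parse the full ANSS catalog), the conversion from event times to decimal-time differences can be written as:

```python
from datetime import datetime

# Hypothetical sample of event timestamps in ISO format; the actual
# catalog covers Southern California events up to 2013.
events = [
    "2010-01-01T00:00:00", "2010-01-01T06:00:00",
    "2010-01-02T12:30:00", "2010-01-03T00:30:00",
]

times = [datetime.fromisoformat(t) for t in events]

# Time differences between consecutive events, expressed in decimal days.
dt_days = [(b - a).total_seconds() / 86400.0 for a, b in zip(times, times[1:])]
print(dt_days)
```

Only these inter-event differences, not the raw timestamps, are fed to the network.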

The multilayer perceptron network has played an important role in solving complex non-linear problems, such as voice recognition [

The algorithm called error backpropagation, or simply backpropagation, is widely used in multilayer neural networks containing one or more hidden layers. The algorithm consists of two steps: propagation and backpropagation [

In backpropagation, all the synaptic weights are adjusted from the output layer back to the input layer through what is called the error signal, which is based on the difference between the output generated by the network and the desired output. This signal is propagated back through the network, from the output layer to the hidden layer, and the weights that interconnect these layers are adjusted so that the response generated by the network approximates the desired response.
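The two passes can be illustrated with a tiny numerical sketch. This is not the network used in the paper: the sizes, data and linear output are illustrative assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative MLP: 2 inputs -> 2 hidden (sigmoid) -> 1 linear output.
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=2), 0.0
eta = 0.1  # learning rate

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x, d = np.array([0.5, -0.3]), 0.2  # a single training pattern

for _ in range(200):
    # Forward pass: propagate the input through the layers.
    h = sigmoid(W1 @ x + b1)
    y = W2 @ h + b2

    # Backward pass: generate the error signal at the output and
    # propagate it back to adjust all synaptic weights.
    delta2 = d - y                         # local gradient, linear output
    delta1 = (W2 * delta2) * h * (1 - h)   # hidden-layer local gradients

    W2 += eta * delta2 * h
    b2 += eta * delta2
    W1 += eta * np.outer(delta1, x)
    b1 += eta * delta1

final_error = abs(d - (W2 @ sigmoid(W1 @ x + b1) + b2))
print(final_error < 1e-3)
```

After repeated passes the network output approaches the desired output, which is exactly the behavior the error signal is designed to produce.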

The learning type of the network is supervised and its input and output values can be binary or continuous (limited by computer precision). Its propagation rule in each neuron is shown in Equation (1)
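For reference, the standard propagation rule of a neuron $j$ in a multilayer perceptron, which Equation (1) presumably expresses, can be written as:

```latex
v_j = \sum_{i=1}^{n} w_{ji}\, x_i + b_j, \qquad y_j = \varphi(v_j)
```

where $w_{ji}$ are the synaptic weights, $b_j$ the bias and $\varphi$ the activation function.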

The learning of backpropagation is based on the updating of the synaptic weights of the network by minimizing the mean squared error using the gradient descent method [

where

where p is the number of patterns presented to the network, k is the number of neurons in the network output, $d_i$ is the desired output of the network and $y_i$ is the output obtained by the network for a given pattern presented to it. For each pattern, the mean squared error can be minimized, which in general also leads to the minimization of the total mean squared error. Thus, the error can be defined by Equation (4).

$E = \frac{1}{2} \sum_{i=1}^{k} (d_i - y_i)^2$ (4)

In minimizing the mean square error, we determine the error gradient in relation to the weight

To continue the calculation of the gradient of the mean square error, we have two possibilities: the calculation of the error in the output layer and the indirect calculation of the error in the hidden layer based on errors of the output layer.
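In the standard form of this derivation (symbols as defined above; the intermediate equations are assumed to follow the usual chain-rule decomposition), the gradient with respect to a weight of an output neuron factors as:

```latex
\frac{\partial E}{\partial w_{ji}}
  = \frac{\partial E}{\partial e_j}\,
    \frac{\partial e_j}{\partial y_j}\,
    \frac{\partial y_j}{\partial v_j}\,
    \frac{\partial v_j}{\partial w_{ji}}
```

The first two factors come directly from the error definition, the third from the activation function, and the fourth from the propagation rule.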

where v is the total number of inputs applied to the neuron j, its respective activation being given by Equation (6)

In addition, we define, according to Equation (7), the error $e_j$ as the difference between the desired output and the output generated by the network,

Through Equation (7), Equation (4) is represented in the following way

Therefore, using Equation (8), we calculate the gradient of the mean square error with respect to weight:

The term

Thus, for the output neuron, Equation (9) is represented as follows:

By calculating the derivatives of Equation (10) we get:

Equation (11) represents the derivative of the mean square error with respect to the synaptic weight

Thus, the update of the weights of the neurons of the output layer is given by Equation (13)

where η represents the learning rate of the neural network.
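Collecting the derivatives above, the standard form of the output-layer local gradient and weight update (assumed to match Equations (12) and (13)) is:

```latex
\delta_j = e_j\,\varphi'(v_j), \qquad
\Delta w_{ji} = \eta\, \delta_j\, y_i
```

where $y_i$ is the output of the neuron feeding weight $w_{ji}$.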

When we consider the neuron j as a neuron of the hidden layer, there is no desired output for this neuron, and its respective error signal must be calculated based on the error signals of all the neurons connected to this hidden neuron.

From Equation (12), we can redefine the local gradient $\delta_j$ for the hidden-layer neuron j

For the neuron k shown in

from Equation (15), we get the value of $\partial E / \partial y_j$

By extending Equation (16) a little more, we obtain Equation (17)

The output neuron error k and its respective derivative are given by Equation (18) and Equation (19).

In addition, as seen in

Thus, from the results of the derivatives in Equation (17), we get

The term

Thus, by replacing Equation (22) in Equation (14), we obtain the expression of the local gradient

With this, the update of the weights of the neurons of the hidden layer is given by Equation (24)
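In the standard derivation, the hidden-layer local gradient and update (assumed to match Equations (23) and (24)) take the form:

```latex
\delta_j = \varphi'(v_j) \sum_{k} \delta_k\, w_{kj}, \qquad
\Delta w_{ji} = \eta\, \delta_j\, y_i
```

where the sum runs over all output neurons $k$ connected to the hidden neuron $j$.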

The learning algorithm works by minimizing the mean square error as a function of the synaptic weights, driving the error toward a global minimum throughout the iterations. The main parameters that directly affect the learning process of the network are the learning rate and the momentum term.

The learning rate (η) is a constant parameter in the interval [0, 1] that influences the convergence of the learning process by scaling the change of the synaptic weights. A small learning rate generates only slight changes in the weights; however, it requires a very long training time, with the added risk that the error becomes trapped in a local minimum [

If the learning rate is very large, for example near the maximum value of 1, the changes in the weights are larger, which can cause instabilities around the global minimum. A suitable learning rate should be as large as possible without causing oscillations in the minimization, so that learning is as fast as possible [

An alternative that can be used to increase the learning rate without creating oscillations around the global minimum is to modify Equation (13) or Equation (24) to include the momentum term, which carries information about past weight changes into the direction of the new update. Equation (25) shows how the updating of the weights is modified with the inclusion of the momentum term.
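A minimal sketch of the momentum-augmented update, assuming the usual form $\Delta w(n) = \alpha\,\Delta w(n-1) + \eta\,\delta\,y$ for Equation (25); the gradient terms below are made-up numbers, not values from the paper:

```python
# Learning rate and momentum term (illustrative values).
eta, alpha = 0.1, 0.9
w, prev_dw = 0.0, 0.0

# Hypothetical sequence of gradient-derived terms (delta * y) per iteration.
grads = [1.0, 1.0, 1.0, 1.0]
for g in grads:
    dw = alpha * prev_dw + eta * g   # past change steers the new update
    w += dw
    prev_dw = dw
print(round(w, 4))
```

With a constant gradient, the momentum term accumulates the past changes and the effective step grows, which is exactly why it can accelerate learning without raising η itself.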

where

In this analysis, in order to verify the relationship between the number of inputs of the network and the distribution of the data tested, we consider the input data as the time differences between all the seismological events occurring sequentially, without filtering by the measured magnitude. Thus, through the network structure of
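The construction of the input patterns can be sketched as a sliding window over the series of time differences: 100 consecutive differences form the inputs and the following difference is the target. The series below is a synthetic placeholder, not ANSS data.

```python
# Placeholder series of inter-event time differences.
dts = list(range(300))

window = 100  # number of network inputs
X = [dts[i:i + window] for i in range(len(dts) - window)]
y = [dts[i + window] for i in range(len(dts) - window)]
print(len(X), len(X[0]), y[0])
```

Each pattern pairs a window of past inter-event times with the next one, which is what lets the network learn a probabilistic forecast of the time to the next event.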

From the results of the 100 training data and 100 test data (

values generated by the network and thus obtained better results for the configuration of the network with 100 inputs, shown in

The structure represented by

Since the previous analysis used all magnitude values, and since the tests indicated promising results (the peak of the distribution being found around zero), we performed the same type of analysis filtering the time differences for events of magnitude greater than 3.0, 3.5, 4.0 and 4.5. For the data below magnitude 4.0 it was necessary to remove the events labeled “quarry blast” and “sonic boom”. For magnitudes greater than 3.0 we found a total of 19,984 events; for magnitudes greater than 3.5, 6809 events; for magnitudes greater than 4.0, 2240 events; and for magnitudes greater than 4.5, 714 events.

In order to improve the results, we applied a criterion for stopping the training based on the convergence of the training error and the minimum prediction error on the test data. We obtained good results for magnitudes greater than 4.0 and 4.5. The graph in
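The stopping criterion can be sketched as choosing the epoch where the prediction error is minimal, even if the training error keeps falling. The error sequences below are synthetic illustrations, not the paper's measurements.

```python
# Synthetic per-epoch errors: training error keeps decreasing while the
# prediction (test) error reaches a minimum and then rises (overfitting).
train_err = [0.9, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10, 0.09]
pred_err  = [1.0, 0.6, 0.4, 0.35, 0.33, 0.36, 0.40, 0.45]

# Stop at the epoch with the minimum prediction error.
best_epoch = min(range(len(pred_err)), key=pred_err.__getitem__)
print(best_epoch, pred_err[best_epoch])
```

Training past this point would keep reducing the training error while degrading the forecast, which is why the criterion combines both error curves.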


The peaks around zero in the distribution obtained using all magnitude values in our network proved to be a good indicator for seismological prediction. Applying the same procedure to the magnitude-filtered data, we obtained good results for magnitudes greater than 4.0 and 4.5. These results become promising when we introduce the training stop criterion at the moment the minimum prediction error of the data reaches the global minimum. Therefore, we can verify that our model has a response to the data forecast. The prediction estimate was calculated roughly from the width of the bins of the histograms. A better idea of the relationship between the prediction interval and the mean time between events will be obtained by fitting the data to a statistical distribution that allows quantitative calculation of the half-life of the distribution. This study is in progress and will be published elsewhere.

V. H. A. D. thanks CAPES (Brazilian Education Funding Agency) for a fellowship. A. R. R. P. thanks CNPq (Brazilian Research Funding Agency) for a fellowship.

Dias, V.H.A. and Papa, A.R.R. (2018) Application of Neural Networks in Probabilistic Forecasting of Earthquakes in the Southern California Region. International Journal of Geosciences, 9, 397-413. https://doi.org/10.4236/ijg.2018.96025