A Novel Method for Solving Ordinary Differential Equations with Artificial Neural Networks

This research work investigates the use of Artificial Neural Network (ANN) based models for solving first and second order linear constant coefficient ordinary differential equations with initial conditions. In particular, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but bypass the standard back-propagation algorithm for updating the intrinsic weights. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the initial or boundary conditions and contains no adjustable parameters. The second part involves a feed-forward neural network to be trained to satisfy the differential equation. Numerous works have appeared in recent times regarding the solution of differential equations using ANN; however, the majority of these employ a single hidden layer perceptron model incorporating a back-propagation algorithm for weight updates. For the homogeneous case, we assume a solution in exponential form and compute a polynomial approximation using statistical regression. From this approximation we pick the unknown coefficients as the weights from the input layer to the hidden layer of the associated neural network trial solution. To get the weights from the hidden layer to the output layer, we form algebraic equations incorporating the default signs of the differential equation, and then apply the Gaussian Radial Basis Function (GRBF) approximation model. The weights obtained in this manner need not be adjusted. We proceed to develop a neural network algorithm using MathCAD software, which enables us to slightly adjust the intrinsic biases. We compare the convergence and accuracy of our results with analytic solutions, as well as with well-known numerical methods, and obtain satisfactory results for our example ODE problems.


Introduction
The beginning of neuro-computing is often taken to be the 1943 research article of McCulloch and Pitts [1], which showed that even simple types of neural networks could, in principle, compute any arithmetic or logical function; the paper was widely read and had great influence. Other researchers followed, principally von Neumann, whose book [2] suggested that research into the design of brain-like or brain-inspired computers might be of great interest and benefit to scientific and technological knowledge.
We present a new perspective for obtaining solutions of initial value problems using Artificial Neural Networks (ANN). We find that a neural network based model for the solution of ordinary differential equations (ODEs) provides a number of advantages over standard numerical methods. First, the neural network based solution is differentiable and is in closed analytic form, whereas most other techniques offer a discretized solution or a solution with limited differentiability. Second, the neural network based method provides a solution with very good generalization properties. The major advantage of the present approach is that it considerably reduces the computational complexity involved in weight updating, while maintaining satisfactory accuracy.

Neural Network Structure
A neural network is an inter-connection of processing elements, units or nodes, whose functionality resembles that of the human neuron [3]. The processing ability of the network is stored in the connection strengths, simply called weights, which are obtained by a process of adaptation to a set of training patterns.
Neural network methods can solve both ordinary and partial differential equations. They rely on the function approximation property of feed-forward neural networks, which yields a solution written in closed analytic form, with a feed-forward neural network as the basic approximation element [4] [5]. Training of the neural network can be done by any optimization technique, which in turn requires the computation of the gradient of the error with respect to the network parameters, by a regression based model, or by basis function approximation. In any of these methods, a trial solution of the differential equation is written as a sum of two parts, as proposed by Lagaris [6].
The first part satisfies the initial or boundary conditions and contains no adjustable parameters. The second part contains the adjustable parameters, involves the feed-forward neural network, and is constructed in a way that does not affect the initial or boundary conditions. By this construction, the trial solution satisfies the initial or boundary conditions, and the network is trained to satisfy the differential equation. The general flowchart for neural network training (or learning) is given below in Figure 1.

Neural Networks as Universal Approximators
Artificial neural networks can realize a nonlinear mapping from the inputs to the outputs of the corresponding system of neurons, which is suitable for analyzing problems defined by initial/boundary value problems that have no analytical solution or cannot easily be computed. One application of the multilayer feed-forward neural network is the global approximation of real-valued multivariable functions in closed analytic form; that is, such neural networks are universal approximators. It has been shown in the literature that multilayer feed-forward neural networks with one hidden layer, using arbitrary squashing functions, are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy. This is made precise in the following theorem.

Universal Approximation Theorem
The universal approximation theorem for the MLP was proved by Cybenko [7] and Hornik et al. [8] in 1989. Let I_n denote the n-dimensional unit cube containing all possible input samples x. Then finite sums of the form

y(x, w) = Σ_{j=1}^{m} v_j σ(w_j · x + u_j)

are dense in C(I_n), the space of continuous functions on I_n. This simply means that, given any function f ∈ C(I_n) and ε > 0, there is a sum y(x, w) of the above form that satisfies

|y(x, w) − f(x)| < ε for all x ∈ I_n.
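The theorem can be illustrated numerically. The Python/NumPy sketch below is our own illustration, not part of the original formulation: it fits the output weights v_j of a sum of randomly placed sigmoids to the target sin(2πx) by least squares; the target function, the hidden layer size m and the random weight scales are all arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m = 50                                # number of hidden units (arbitrary)
w = rng.normal(0.0, 10.0, size=m)     # random input weights w_j
u = rng.uniform(-10.0, 10.0, size=m)  # random biases u_j

x = np.linspace(0.0, 1.0, 200)
f = np.sin(2.0 * np.pi * x)           # target function (arbitrary)

Phi = sigmoid(np.outer(x, w) + u)     # hidden activations, shape (200, m)
v, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # output weights by least squares

err = np.max(np.abs(Phi @ v - f))     # sup-norm error on the grid
```

With only the output weights fitted, the error on the grid is already small, in line with the density statement of the theorem.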

Gradient Computation
Minimization of the error function can also be considered as a procedure for training the neural network [9], where the error corresponding to each input vector x is the value f(x), which has to become zero. Computing this error value requires the network output as well as the derivatives of the output with respect to the input vectors. Therefore, in computing the error with respect to the network parameters, we need not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs. This process can be quite tedious computationally, and we briefly outline it in what follows.

Gradient Computation with Respect to Network Inputs
The next step is to compute the gradient with respect to the input vectors. For this purpose, consider a multilayer perceptron (MLP) neural network with n input units, a hidden layer with m sigmoid units and a linear output unit. For a given input vector x = (x_1, …, x_n) the network output is written

N(x) = Σ_{j=1}^{m} v_j σ(z_j),  z_j = Σ_{i=1}^{n} w_{ji} x_i + u_j,

where w_{ji} is the weight from input unit i to hidden unit j, u_j is the bias of hidden unit j, v_j is the weight from hidden unit j to the output, and σ is the sigmoid activation function. Similarly, the k-th derivative of N with respect to input x_i is computed as

∂^k N / ∂x_i^k = Σ_{j=1}^{m} v_j w_{ji}^k σ^{(k)}(z_j),

where σ^{(k)} denotes the k-th order derivative of the sigmoid activation function.
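For the single-input case (n = 1), the formulas above can be checked directly. In the following sketch the weights are arbitrary sample values, and the analytic first derivative is compared against a central finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -0.7, 2.0])   # input-to-hidden weights w_j (arbitrary)
u = np.array([0.1, 0.4, -0.3])   # hidden biases u_j
v = np.array([0.8, -1.2, 0.5])   # hidden-to-output weights v_j

def N(x):
    # network output N(x) = sum_j v_j sigma(w_j x + u_j)
    return np.dot(v, sigmoid(w * x + u))

def dN_dx(x):
    # analytic first derivative: sum_j v_j w_j sigma'(z_j),
    # with sigma'(z) = sigma(z) (1 - sigma(z))
    s = sigmoid(w * x + u)
    return np.dot(v * w, s * (1.0 - s))

x0 = 0.37
fd = (N(x0 + 1e-6) - N(x0 - 1e-6)) / 2e-6   # central finite difference
assert abs(dN_dx(x0) - fd) < 1e-6           # formulas agree
```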

Gradient Computation with Respect to Network Parameters
The network's derivative with respect to any of its inputs is thus equivalent to a feed-forward neural network N_k(x) with one hidden layer, having the same values for the weights w_{ji} and thresholds u_j, and with each weight v_j replaced by v_j w_{ji}^k. Moreover, the transfer function of each hidden unit is replaced by the k-th order derivative of the sigmoid function. Therefore, the gradient of N_k with respect to the parameters of the original network can easily be obtained as

∂N_k/∂v_j = w_{ji}^k σ^{(k)}(z_j),
∂N_k/∂u_j = v_j w_{ji}^k σ^{(k+1)}(z_j),
∂N_k/∂w_{ji} = v_j w_{ji}^k x_i σ^{(k+1)}(z_j) + k v_j w_{ji}^{k−1} σ^{(k)}(z_j).

Network Parameter Updation
Once the derivative of the error with respect to the network parameters has been defined, the network parameters v_j, u_j and w_{ji} are updated according to

v_j ← v_j − μ ∂E/∂v_j,
u_j ← u_j − η ∂E/∂u_j,
w_{ji} ← w_{ji} − γ ∂E/∂w_{ji},

where μ, η and γ are the learning rates, i = 1, 2, …, n and j = 1, 2, …, m. With the derivative of the error with respect to the network parameters defined, it is then straightforward to employ any optimization technique to minimize the error function.
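A single update step under these rules can be sketched as follows; the gradient values here are placeholders standing in for ∂E/∂v, ∂E/∂u and ∂E/∂w, and all numbers are arbitrary.

```python
import numpy as np

mu, eta, gamma = 0.05, 0.05, 0.05     # learning rates (assumed values)

v = np.array([0.8, -1.2, 0.5])        # hidden-to-output weights
u = np.array([0.1, 0.4, -0.3])        # hidden biases
w = np.array([1.5, -0.7, 2.0])        # input-to-hidden weights

dE_dv = np.array([0.2, -0.1, 0.3])    # placeholder gradient values
dE_du = np.array([0.05, 0.02, -0.04])
dE_dw = np.array([-0.3, 0.1, 0.2])

# one gradient descent step per parameter group
v = v - mu * dE_dv
u = u - eta * dE_du
w = w - gamma * dE_dw
```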

General Formulation for Differential Equations
Let us consider the following general differential equation, which represents both ordinary and partial differential equations, Majidzadeh [10]:

G(x, ψ(x), ∇ψ(x), ∇²ψ(x)) = 0,  x ∈ D,   (5)

where ψ(x) is the unknown scalar-valued solution to be computed, G is the function which defines the structure of the differential equation, and ∇ is a differential operator.

Let ψ_t(x, p) denote the trial solution with parameters (weights, biases) p. Tian Qi et al. [11] gave the following as the general formulation for the solution of differential equations (5) using ANN. The trial solution ψ_t(x, p) may be written as the sum of two terms,

ψ_t(x, p) = A(x) + F(x, N(x, p)),

where A(x) satisfies the initial or boundary conditions and contains no adjustable parameters, whereas the second term F(x, N(x, p)) involves the feed-forward neural network N(x, p) and is constructed so as not to affect the initial or boundary conditions.

Neural Network Training
The neural network weights determine the closeness of the predicted outcome to the desired outcome. If the neural network weights obtained by our method are not able to make the correct prediction, then only the biases need to be adjusted. The basis function we shall apply in this work in training the neural network is the sigmoid activation function, given by

σ(x) = 1 / (1 + e^{−x}).

Method for Solving First Order Ordinary Differential Equation
Let us consider the first order initial value problem

dy/dx = f(x, y),  y(0) = A.

In this case, the ANN trial solution may be written as

ψ_t(x, p) = A + x N(x, p),

where N(x, p) is the output of a feed-forward network with one input x and parameters p, the weights and biases. The trial solution satisfies the initial condition by construction.
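A quick check that this construction enforces the initial condition for any network parameters; the weights and the value A below are arbitrary sample values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def N(x, w, u, v):
    # feed-forward network with one input, one sigmoid hidden layer
    return np.dot(v, sigmoid(w * x + u))

A = 1.0                               # initial condition y(0) = A (assumed)
w = np.array([2.413, 0.115, 3.860])   # sample weights
u = np.array([0.2, -0.5, 0.9])
v = np.array([1.1, -0.4, 0.7])

def psi_t(x):
    # trial solution: the factor x kills the network term at x = 0,
    # so psi_t(0) = A for ANY weights and biases
    return A + x * N(x, w, u, v)

assert psi_t(0.0) == A
```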
To solve this problem using a neural network, we shall employ a neural network architecture with three layers: an input layer with one neuron, a hidden layer with three neurons, and an output layer with one output unit, as depicted below in Figure 2.
Each neuron is connected to the neurons of the previous layer through adaptable synaptic weights w_j and biases u_j. In solving ordinary differential equations, we assume a solution to the homogeneous part and approximate that function using the SPSS model, which estimates the regression coefficients in the multiple regression mode. These coefficients are what we use as the weights from the input layer to the hidden layer. Any exponential function y_a(x) = e^{αx}, where α ∈ R, is part of the solution of a first order ordinary differential equation. We regress this function using an Excel spreadsheet and the SPSS model as follows. Assuming y_a(x) is a solution of a given first order ordinary differential equation on a given interval, we divide the interval into 10 equidistant points and use the Excel spreadsheet to find the values of y_a(x) at all the points, shown in Table 1; we then use SPSS to perform the regression and obtain the weights, which we designate the weights from the input layer to the hidden layer.
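The regression step can be reproduced outside SPSS. The sketch below fits a cubic to y_a(x) = e^x (the α = 1 case) at 10 equidistant points of [0, 1] with NumPy's least squares polynomial fit; the non-constant fitted coefficients then play the role of the input-to-hidden weights, though they need not match the SPSS table values exactly.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10)            # 10 equidistant points
y = np.exp(x)                            # assumed solution y_a(x) = e^x

# cubic least squares fit y ~ b0 + b1*x + b2*x^2 + b3*x^3
b3, b2, b1, b0 = np.polyfit(x, y, 3)     # polyfit returns highest degree first

weights_in_to_hidden = (b1, b2, b3)      # candidate input-to-hidden weights
resid = np.max(np.abs(np.polyval([b3, b2, b1, b0], x) - y))  # fit residual
```

The cubic fit of e^x on [0, 1] is extremely close, which matches the perfect goodness of fit reported for the cubic curve in the SPSS output.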
The above is followed by the SPSS 20 output of the data, displayed in Table 2, from which we pick the weights.
Looking at Table 2, we see that the cubic curve fits the assumed solution perfectly. Therefore, we pick the coefficients 2.413, 0.115 and 3.860 as the weights from the input layer to the hidden layer.
The next task is to obtain the weights from hidden layer to the output layer.
We shall find coefficients v_i such that

f(x) ≈ Σ_{i=1}^{N} v_i φ_i(x)

is satisfied, where the v_i are real-valued constants and the φ_i are elementary basis functions. When one can find coefficients v_i that make the error ε arbitrarily small for any function f, the set of elementary functions is said to possess the property of universal approximation. Evaluating the sum at N points in the domain leads to a linear system whose solution is

v = φ^{−1} f,

where v is the vector of coefficients, f is the vector composed of the values of the function at the N points, and φ is the matrix with entries given by the values of the elementary functions at each of the N points in the domain. An important condition that must be placed on the elementary functions is that the inverse of φ must exist. In general, there are many sets with the property of universal approximation for a class of functions. We would prefer a set that provides a smaller error ε for a pre-set value of N; this means that the speed of convergence of the approximation is also an important factor in the selection of the basis. As mentioned earlier, there are many possible elementary basis functions to choose from. In neuro-computing, the most popular choice for elementary functions is the radial basis function (RBF); a typical example is the Gaussian basis specified by

φ_i(x) = exp(−(x − x_i)² / (2σ²)),

where the x_i are the centres, σ is the width parameter, and v, f, φ are as defined before. How we compute f(x) depends on the nature of the given differential equation. For first, second or higher order homogeneous ordinary differential equations, we form linear, quadratic or higher order polynomial equations incorporating the default signs of the terms in the differential equations. For non-homogeneous ordinary differential equations, we use the forcing functions. When the weights of the neural network are obtained in these systematic ways, there is no need to adjust all the parameters in the network, as postulated by previous researchers, in order to achieve convergence. All that is required is a slight adjustment of the biases, which are fixed to lie in a given interval, and convergence to a solution with an acceptably small error is achieved.
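The v = φ^{−1} f step can be sketched as follows; the sample points, the target values and the Gaussian width σ below are illustrative assumptions, not values from the paper.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)           # N = 5 sample points, also the centres
f = 2.0 * x + 1.0                      # assumed target values f(x_i)
sigma = 0.5                            # Gaussian width (assumed)

# phi[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2)), the GRBF design matrix
phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))

v = np.linalg.solve(phi, f)            # v = phi^{-1} f; phi must be invertible

assert np.allclose(phi @ v, f)         # the fit is exact at the sample points
```

With distinct centres the Gaussian matrix φ is positive definite, so the required inverse exists.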
When the problem of obtaining the parameters is settled, it becomes easy to solve any first, second or higher order ordinary differential equation using the neural network model appropriate to it. We shall restrict this study to first and second order linear homogeneous ordinary differential equations. To compute the prediction error we use the squared error function defined as

E = (ψ_d − ψ_p)²,

where ψ_d represents the desired output and ψ_p the predicted output. The same procedure can be applied to second and third order ODEs. For the second order initial value problem (IVP) with y(0) = A and y′(0) = A′, the trial solution is written

ψ_t(x, p) = A + A′x + x² N(x, p),

and for the two-point boundary conditions y(0) = A, y(1) = B, the trial solution is written

ψ_t(x, p) = A(1 − x) + Bx + x(1 − x) N(x, p).

Now, we first demonstrate the computational complexity involved in adjusting all parameters in order to update the weights and obtain a close approximation to the desired result. Subsequently, we proceed to our main results and analysis, which display the ease of computation achieved by our novel method of adjusting only the biases. The former adjustment, or weight updating, is done using the back-propagation algorithm; we therefore need to train the network in order to apply it. The basis function we shall apply in this work in training the neural network is the sigmoid activation function given by Equation (9). A simple neural model, with desired output y_d and predicted output y_p, is depicted in Figure 3; the diagram shows the neural network training model for the sample data we are considering. We now proceed to train the network to get the predicted output.
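Both second order trial solutions can be verified to satisfy their conditions for arbitrary network weights; all numerical values below are arbitrary sample choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.488, 1.697, 3.338])   # sample weights (arbitrary here)
u = np.array([-0.2, 0.3, 0.7])
v = np.array([0.9, -0.6, 0.4])

def N(x):
    return np.dot(v, sigmoid(w * x + u))

A, A1, B = 1.0, 0.5, 2.0              # assumed conditions y(0), y'(0), y(1)

def psi_ivp(x):
    # IVP trial solution: psi(0) = A, psi'(0) = A1 by construction
    return A + A1 * x + x ** 2 * N(x)

def psi_bvp(x):
    # BVP trial solution: psi(0) = A, psi(1) = B by construction
    return A * (1.0 - x) + B * x + x * (1.0 - x) * N(x)

# the conditions hold regardless of the network weights:
fd = (psi_ivp(1e-6) - psi_ivp(-1e-6)) / 2e-6   # approximates psi_ivp'(0)
assert psi_ivp(0.0) == A and abs(fd - A1) < 1e-6
assert psi_bvp(0.0) == A and psi_bvp(1.0) == B
```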

Network Training
First we compute the predicted output. We have seen that the predicted output does not correspond to the desired output; therefore we have to train the network to reduce the prediction error. To compute the prediction error, we use the squared error function (21) defined above. Considering the predicted output calculated above, we observe that the prediction error is huge, so we must attempt to minimize it. We noted previously that the weights determine how close a predicted output is to the desired output; therefore, to minimize the error we have to adjust the weights. This can be achieved using the formula

w^{(r+1)} = w^{(r)} − η ∂E/∂w,

where η is the learning rate and r is the iteration step.

Computation of the Gradient
The error computation involves not only the outputs but also the derivatives of the network output with respect to its inputs, so it requires computing the gradient of the network derivatives with respect to its inputs. Let us now consider a multilayer perceptron with one input node, a hidden layer with m nodes, and one output unit. The derivative of the network output N with respect to the other parameters may be obtained as in the gradient computation section above. After obtaining all the derivatives, we can find the gradient of the error, and using the general learning method for supervised training we can minimize the error to the desired accuracy. We illustrate the above using the first order ordinary differential equation considered earlier; in this case, the ANN trial solution may be written ψ_t(x, p) = A + x N(x, p). For evaluating the derivative term on the right hand side of (32), we use the network derivative formulas. The weights from the input to the hidden layer are modified according to the rule

w_j^{(r+1)} = w_j^{(r)} − η ∂E/∂w_j,

where η is the learning rate and r is the iteration step. The weights from the hidden to the output layer may be updated in a similar formulation as done for input to hidden. Returning to Equation (27), if the neural network model is a simple one, as we saw in Section 3.3, the same procedure applies, where x is the input to the neural network model and p represents the parameters, the weights and biases.
To solve this problem using a neural network (NN), we shall employ the NN architecture given in Figure 3. If the neural network model is not able to predict the solution of the differential equation correctly with the given initial parameters (weights and biases), that is, if the prediction error does not satisfy an acceptable threshold, then the parameters need to be adjusted using the update equations recalled above for the weights and biases; the superscript (r) denotes the r-th iteration.
It is important to note that the foregoing procedure must be repeated for the required number of iterations; sometimes it may take 30 or more iterations for the solution to converge within an acceptable threshold. It is the complex derivatives involved in solving ODEs with neural networks, especially of higher order, together with the many iterations required for convergence, that motivated our search for a more efficient and accurate way of solving the given problem while avoiding the inherent computational complexity.
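To make this computational burden concrete, the following sketch follows the conventional gradient descent route for an assumed model problem y′ + y = 0, y(0) = 1 (exact solution e^{−x}); for brevity the error gradient is taken numerically rather than through the analytic formulas above, and the number of iterations is deliberately large.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

xs = np.linspace(0.0, 1.0, 10)   # training points
h = 1e-5

def psi(x, p):
    # trial solution psi(x) = 1 + x * N(x, p) for y(0) = 1
    w, u, v = p.reshape(3, -1)
    return 1.0 + x * (sigmoid(np.outer(x, w) + u) @ v)

def error(p):
    # squared residual of y' + y = 0 over the training points;
    # the x-derivative is taken by central differences for brevity
    dpsi = (psi(xs + h, p) - psi(xs - h, p)) / (2.0 * h)
    return np.sum((dpsi + psi(xs, p)) ** 2)

rng = np.random.default_rng(1)
p = rng.normal(0.0, 0.5, size=9)   # 3 hidden units: (w, u, v)
E0 = error(p)                      # initial error

eta = 0.01
for _ in range(2000):              # many iterations are typically needed
    # numerical gradient of the error over all 9 parameters
    g = np.array([(error(p + h * e) - error(p - h * e)) / (2.0 * h)
                  for e in np.eye(9)])
    p = p - eta * g

max_err = np.max(np.abs(psi(xs, p) - np.exp(-xs)))  # vs exact e^{-x}
```

Every iteration requires the full set of parameter gradients, which is exactly the per-step cost our method is designed to avoid.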

Results
We begin with a couple of examples on first and second order differential equations.

Example
Consider the initial value problem solved by Mall and Chakraverty [13]. As discussed in the previous section, the trial solution is given by ψ_t(x, p) = A + x N(x, p), and applying the initial condition fixes the first term. To obtain the weights from the input layer to the hidden layer, it is natural to assume y_a(x) = e^x. We approximate this function using regression in SPSS. We shall train the network for 10 equidistant points in [0, 1], and employ an Excel spreadsheet to find the values of y_a(x) = e^x at all the points. This leads to the data in Table 3.
We then perform the curve-fit polynomial regression built into IBM SPSS 23 [14] for the function y_a(x) = e^x. The output is displayed in Table 4, which shows the linear, quadratic and cubic regressions of y_a(x) = e^x. The quadratic and cubic curves show perfect goodness of fit, R² = 1. Using the cubic curve, we pick our weights from the input layer to the hidden layer. We now form a linear function based on the default signs of the differential equation, f(x) = ax + b, where a is the coefficient of the derivative of y and b is the coefficient of y. The neural architecture for the network is shown in Figure 2, so we let N = 3. Substituting the given values of the vectors x and f into v = φ^{−1} f, we obtain the weights from the hidden layer to the output layer. The biases are fixed between −1 and 1. We now train the network with the available parameters using the MathCAD 14 software [15]. Here y_d and y_p are respectively the desired output (exact solution) and the predicted output (trial solution). From the indicated error value, this is an acceptable accuracy. We compare our results with the neural network results obtained by Otadi and Mosleh [16] and find them to be in reasonable agreement; this is depicted in Table 5, and the near perfect accuracy is evident in the graphical profile in Figure 4.
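By contrast with the back-propagation route, the final training step here adjusts only the biases, fixed in [−1, 1]. The sketch below is a hedged stand-in for the MathCAD 14 routine, under our reading of the method: the input-to-hidden weights are the Table 2 values, the hidden-to-output weights shown are placeholders, the model problem y′ + y = 0, y(0) = 1 is an assumed example, and a coarse random search over the bias cube replaces the actual MathCAD adjustment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.413, 0.115, 3.860])   # input-to-hidden weights (Table 2)
v = np.array([0.5, -0.3, 0.8])        # hidden-to-output weights (placeholder)
xs = np.linspace(0.0, 1.0, 10)
h = 1e-5

def psi(x, u):
    # trial solution for the assumed IVP y' + y = 0, y(0) = 1
    return 1.0 + x * (sigmoid(np.outer(x, w) + u) @ v)

def residual(u):
    # squared ODE residual over the training points
    d = (psi(xs + h, u) - psi(xs - h, u)) / (2.0 * h)
    return np.sum((d + psi(xs, u)) ** 2)

rng = np.random.default_rng(2)
best_u, best_e = np.zeros(3), residual(np.zeros(3))
for _ in range(2000):                 # coarse search, biases kept in [-1, 1]
    u = rng.uniform(-1.0, 1.0, size=3)
    e = residual(u)
    if e < best_e:
        best_u, best_e = u, e
```

No weight gradients are computed at any point; only the three biases move, which is the source of the reduction in computational complexity claimed above.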

Remark
In what follows, we consider a non-homogeneous second order linear differential equation. It is important to recall that for any second order non-homogeneous differential equation of the form y″ + a(x)y′ + b(x)y = f(x), the non-homogeneous term f(x) is termed the forcing function. In this section, we shall employ the forcing function to compute the weights from the hidden layer to the output layer. This is made clear in the following example.

Example
Consider the initial value problem whose corresponding data are given in Table 6.
Using regression and the SPSS model we find the weights from the input layer to the hidden layer. From Table 7, using the cubic curve fit with R² = 1, we pick our weights from the input layer to the hidden layer as w_11 = 0.488, w_12 = 1.697, w_13 = 3.338.
Now, to compute the weights from the hidden layer to the output layer, we use the forcing function. The biases are fixed between −1 and 1. We then train the network with the available parameters using our MathCAD 14 algorithm. We compare the exact and approximate solutions in Table 8; the accuracy is clearly depicted graphically in Figure 5.

Conclusion
In this paper, we have presented a novel approach for solving first and second order linear ordinary differential equations with constant coefficients. Specifically, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but avoid the standard back-propagation algorithm for updating the intrinsic weights.
This greatly reduces the computational complexity of the given problem. Our results are validated by the near perfect approximations achieved in comparison with the exact solutions, and they demonstrate the function approximation capabilities of ANNs. This demonstrates the efficiency of our neural network procedure. We employed an Excel spreadsheet, IBM SPSS 23, and a MathCAD 14 algorithm to accomplish this task.