A Double-Weighted Deterministic Extreme Learning Machine Based on Sparse Denoising Autoencoder and Its Applications

Abstract

Extreme learning machine (ELM) is a feedforward neural network-based machine learning method with short training times, strong generalization capability, and no risk of falling into local minima. However, because of its shallow architecture, the traditional ELM requires a large number of hidden nodes to maintain classification performance on high-dimensional data sets. In addition, its classification performance degrades easily under interference from noisy data. To address these problems, this paper proposes a double pseudo-inverse extreme learning machine (DPELM) based on the Sparse Denoising AutoEncoder (SDAE), namely SDAE-DPELM. The algorithm determines both the input weights and the output weights of the network directly by the pseudo-inverse method. As a result, it requires only a few hidden layer nodes to produce superior classification results. The combination with SDAE further improves the classification performance and noise resistance. Extensive numerical experiments show that the algorithm achieves high classification accuracy and good robustness on both high-dimensional noisy data and high-dimensional noiseless data. Furthermore, applying the algorithm to Miao character recognition substantiates its excellent performance and further illustrates its practicability.

1. Introduction

Extreme Learning Machine (ELM), a single hidden layer feedforward neural network learning algorithm, was first proposed by Professor Huang of Nanyang Technological University in Singapore in 2004 [1]. Unlike conventional neural network algorithms (such as the BP neural network algorithm [2]), ELM has the advantages of short training times and excellent generalization performance, and it has therefore received extensive attention and investigation from many scholars in fields such as pattern classification and data prediction. The fundamental idea of ELM is to randomly generate the input weights and hidden layer biases of the network during the training phase and keep them unchanged. The output weights of ELM are then obtained by a pseudo-inverse operation [3]. The whole training process requires no iteration; only the number of hidden layer neurons and the activation function need to be set [4]. At present, ELM is utilized extensively in disease detection [5], intelligent decision-making [6], and traffic sign recognition [7].

However, it is worth pointing out that the input weights, hidden layer biases, and number of hidden layer neurons of the ELM algorithm are uncertain, leading to high complexity and low stability of the algorithm [8]. Many scholars have carried out research and experiments on this problem. In [9], a Memetic Algorithm (MA) is combined with ELM to embed a local search strategy into a global optimization framework and seek the optimal parameters of the network structure. Moreover, Rong et al. proposed the pruned-ELM algorithm [10], which uses statistical methods to measure the relevance of hidden nodes and optimizes the hidden layer by pruning. In [11], a generalized interval type-2 fuzzy neural network builds hidden nodes on the basis of interval type-2 multivariate Gaussian functions, optimizing the ELM hidden layer nodes and achieving good results. Although the aforementioned algorithms can optimize the structure of the hidden layer, they introduce a nonnegligible number of hyperparameters. These hyperparameters are obtained by iterative optimization during neural network training, which increases the computational complexity and reduces the real-time performance and stability of the algorithm. To surmount this problem, a double pseudo-inverse extreme learning machine (DPELM) algorithm was proposed in [12] to directly determine the input and output weights of the neural network, improving computational efficiency and stability.

On the other hand, when facing high-dimensional or noisy data, a single hidden layer ELM cannot identify data features well, which affects the final classification results. Many researchers address this problem by combining the ELM algorithm with an autoencoder, which can reduce both the computational complexity of deep networks and the training time. In [13], Cambria et al. proposed ELM-AE by combining the advantages of ELM and the autoencoder; it has good feature representation ability and classification performance. On this basis, Sun et al. proposed the generalized extreme learning machine autoencoder (GELM-AE), which incorporates graph regularization, to improve ELM-AE [14]; it can extract more abstract high-level features and improve the overall performance of the network model. Inspired by the above research, this paper combines SDAE and DPELM and proposes a double pseudo-inverse extreme learning machine based on the sparse denoising autoencoder (SDAE-DPELM) [15]. SDAE is an improved autoencoder deep learning algorithm: it adds sparsity constraints to the autoencoder to optimize the network structure, better extracts the deep characteristics of the sample data, and enhances the robustness and regulation ability of the algorithm. The proposed algorithm uses the sparse denoising autoencoder to extract features from the input data, takes the extracted features as the input data of DPELM, and then carries out network training. This guarantees the network training speed while improving the feature extraction capability and noise immunity of the network.

2. Related Work

The algorithm proposed in this paper is an extension of the ELM algorithm. For comparison and context, this section briefly reviews the relevant concepts of ELM and the necessary theory of SDAE, including the combination of the original ELM and SDAE.

2.1. Traditional Extreme Learning Machine

Consider $P$ arbitrary distinct samples $(x_n, y_n)$, where $x_n = (x_{n1}, x_{n2}, \ldots, x_{nN})^{\mathrm{T}} \in \mathbb{R}^N$ is the input (feature) vector and $y_n = (y_{n1}, y_{n2}, \ldots, y_{nM})^{\mathrm{T}} \in \mathbb{R}^M$ is the matching sample label vector. The ELM network has $N$ input neurons, $K_1$ hidden layer neurons, and $M$ output neurons, and $G(\cdot)$ denotes the activation function of the hidden layer neurons. The ELM output can be expressed as

$$y_n = \sum_{k_1=1}^{K_1} \lambda_{k_1} G(\zeta_{k_1}, \phi_{k_1}, x_n), \qquad (1)$$

where $n = 1, 2, \ldots, P$, $\zeta_{k_1} = (\zeta_{k_1 1}, \zeta_{k_1 2}, \ldots, \zeta_{k_1 N})$ is the weight vector connecting the input layer to the $k_1$th hidden layer neuron, $\phi_{k_1}$ is the bias of the $k_1$th hidden neuron, and $\lambda_{k_1} = (\lambda_{k_1 1}, \lambda_{k_1 2}, \ldots, \lambda_{k_1 M})$ is the weight vector connecting the $k_1$th hidden layer neuron to the output layer. The above linear equations can be written in the matrix form

$$H \Lambda = Y, \qquad (2)$$

where $Y$ denotes the desired output matrix of the training samples, and

$$H = \begin{bmatrix} G(\zeta_1, \phi_1, x_1) & \cdots & G(\zeta_{K_1}, \phi_{K_1}, x_1) \\ \vdots & \ddots & \vdots \\ G(\zeta_1, \phi_1, x_P) & \cdots & G(\zeta_{K_1}, \phi_{K_1}, x_P) \end{bmatrix}$$

is the random hidden layer mapping matrix. The hidden layer neuron parameters $(\zeta_{k_1}, \phi_{k_1})$ are randomly generated and remain constant throughout the training procedure. Given the sample data and the known matrix $H$, the least-squares solution $\tilde{\Lambda}$ of Equation (2) is obtained as $\tilde{\Lambda} = H^{\dagger} Y$, where $H^{\dagger}$ denotes the pseudo-inverse of the matrix $H$.
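For concreteness, the following is a minimal Python/NumPy sketch of this classic ELM training procedure. The function names, the sample-per-row shapes, the sigmoid activation, and the uniform [-1, 1] initialization are assumptions of the sketch rather than details taken from the paper.

```python
import numpy as np

def elm_train(X, Y, K1, rng=None):
    """Classic ELM (Section 2.1): random input weights and biases,
    output weights by a single pseudo-inverse.
    Assumed shapes: X is (P, N), one sample per row; Y is (P, M) one-hot labels.
    """
    rng = np.random.default_rng(rng)
    P, N = X.shape
    zeta = rng.uniform(-1.0, 1.0, size=(K1, N))    # random input weights
    phi = rng.uniform(-1.0, 1.0, size=K1)          # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ zeta.T + phi)))  # (P, K1) hidden layer output
    Lam = np.linalg.pinv(H) @ Y                    # least-squares output weights
    return zeta, phi, Lam

def elm_predict(X, zeta, phi, Lam):
    H = 1.0 / (1.0 + np.exp(-(X @ zeta.T + phi)))
    return H @ Lam                                 # (P, M) network outputs
```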

2.2. Sparse Denoising Autoencoder

The traditional autoencoder (AE) has limited ability to reconstruct input data and poor ability to extract data features. The sparse autoencoder (SAE) improves on the traditional AE: it adds a sparsity constraint to the error function so that most hidden layer activations are driven toward 0 and only a few remain non-zero, making the network sparser. In this way, the SAE has good adjustment ability, which makes the learning process of the model more similar to that of the human brain and is conducive to extracting more representative features and improving the classification accuracy of the algorithm. The SDAE builds on the SAE by degrading the original sample data; its goals are to reduce noise interference, enhance data reconstruction, and increase the robustness of the algorithm. Figure 1 depicts the structure of the SDAE network.

Degradation, sparse coding, and decoding are the three stages of the SDAE training process. To construct the degraded data $\ddot{X}$, elements of the original input data $X$ are first set to 0 at the specified degradation rate $v$. Then the degraded data $\ddot{X}$ are sparsely encoded to obtain the encoded data $h$. Finally, the encoded data $h$ are decoded by the SDAE to obtain the reconstructed data $Y$. Minimizing the loss function $L(X, Y)$ yields the best feature representation of the original data.

The sparse coding and decoding processes are computed as shown in Equation (3):

$$h = g(\ddot{X}) = s_g(w \ddot{X} + b), \quad y = f(h) = s_f(w' h + b'), \qquad (3)$$

where $w$ and $b$ are the sparse coding (encoding) weight matrix and bias vector, respectively, $s(\cdot)$ is the activation function (typically a sigmoid function), and $w'$ and $b'$ are the decoding weight matrix and bias vector, respectively. Given the training set $D = \{x^{(i)}, y^{(i)}\}_{i=1}^{N}$, the SDAE's overall loss function is:

Figure 1. Network structure diagram of SDAE.

$$L(x, y) = \frac{1}{N} \sum_{D} J(x, y) + \eta \sum_{k_2=1}^{K_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_{k_2}). \qquad (4)$$

In Equation (4), the first term $J(x, y) = \frac{1}{2}\|x - y\|_2^2$ is the squared-error term, $\eta$ controls the weight of the sparsity penalty term and can take any value between 0 and 1, and $K_2$ is the number of hidden layer nodes.

The Kullback-Leibler (KL) divergence is used as the sparsity penalty, as shown in Equation (5):

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_{k_2}) = \rho \lg\left(\frac{\rho}{\hat{\rho}_{k_2}}\right) + (1 - \rho)\lg\left(\frac{1 - \rho}{1 - \hat{\rho}_{k_2}}\right), \qquad (5)$$

where $\hat{\rho}_{k_2} = \frac{1}{N}\sum_{i=1}^{N} a_{k_2}(x_i)$ is the average activation value of hidden layer neuron $k_2$ over all training samples, and $a_{k_2}$ is the activation value of hidden layer neuron $k_2$. To ensure that most of the neurons are inhibited, $\rho$ is generally set to a value close to 0. For example, if $\rho$ takes the value 0.03, then under this constraint the average activation value of every hidden layer neuron $k_2$ of the autoencoder will be close to 0.03.
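To make Equations (3)-(5) concrete, the sketch below performs one SDAE forward pass and evaluates the loss. The parameter names, shapes, default values, and the use of the natural logarithm in the KL term are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def sdae_loss(X, w, b, w2, b2, v=0.3, rho=0.03, eta=0.1, rng=None):
    """One SDAE forward pass with its loss (Eqs. (3)-(5)).
    Assumed shapes: X is (d, P), one sample per column; w is (K2, d),
    b is (K2, 1), w2 is (d, K2), b2 is (d, 1).
    """
    rng = np.random.default_rng(rng)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))

    # Degradation: set a fraction v of the input entries to zero.
    X_deg = X * (rng.random(X.shape) >= v)

    # Sparse coding and decoding, Eq. (3).
    h = sig(w @ X_deg + b)      # (K2, P) hidden code
    y = sig(w2 @ h + b2)        # (d, P) reconstruction

    # Squared-error term J(x, y), averaged over the P samples.
    J = 0.5 * np.sum((X - y) ** 2) / X.shape[1]

    # KL sparsity penalty, Eqs. (4)-(5): rho_hat is the average activation
    # of each hidden neuron over the batch.
    rho_hat = np.clip(h.mean(axis=1), 1e-6, 1.0 - 1e-6)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return J + eta * kl, h
```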

3. SDAE-DPELM Algorithm Design

3.1. DPELM Learning Algorithm

This section focuses on the weight determination method of the improved ELM. Like the classic ELM, DPELM consists of $N$ input neurons, $K_1$ hidden layer neurons, and $M$ output neurons. The DPELM network structure is shown in Figure 2. By further analyzing the traditional ELM principle, Equation (1) can be rewritten as:

$$Y = \Lambda G(ZX - \Phi), \qquad (6)$$

where $Y = [y_1, y_2, \ldots, y_P]_{M \times P}$, $X = [x_1, x_2, \ldots, x_P]_{N \times P}$, and $\Phi = [\phi_1, \phi_2, \ldots, \phi_P]_{K_1 \times P}$; $\Lambda$ is the output weight matrix and $Z$ is the input weight matrix,

$$\Lambda = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1 K_1} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2 K_1} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{M1} & \lambda_{M2} & \cdots & \lambda_{M K_1} \end{bmatrix}_{M \times K_1}, \quad Z = \begin{bmatrix} \zeta_{11} & \zeta_{12} & \cdots & \zeta_{1N} \\ \zeta_{21} & \zeta_{22} & \cdots & \zeta_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \zeta_{K_1 1} & \zeta_{K_1 2} & \cdots & \zeta_{K_1 N} \end{bmatrix}_{K_1 \times N}.$$

Theorem 1. Assume that the activation function $G(\cdot)$ is strictly monotonic. If the bias $\Phi$ and the output weights $\Lambda$ are randomly generated from an interval, then the ideal input weights are $Z = (G^{-1}(\Lambda^{\dagger} Y) + \Phi) X^{\dagger}$.

Figure 2. DPELM network structure diagram.

Proof: Left-multiplying both sides of Equation (6) by $\Lambda^{\dagger}$ gives

$$\Lambda^{\dagger} Y = \Lambda^{\dagger} \Lambda G(ZX - \Phi) = G(ZX - \Phi). \qquad (7)$$

Then, taking the inverse function on both sides of the above equation, we have

$$G^{-1}(\Lambda^{\dagger} Y) = ZX - \Phi,$$

where $G^{-1}(\cdot)$ is the inverse function of $G(\cdot)$. The above equation can be rewritten as follows

$$ZX = G^{-1}(\Lambda^{\dagger} Y) + \Phi. \qquad (8)$$

Finally, right-multiplying both sides of the above equation by $X^{\dagger}$, we obtain

$$ZXX^{\dagger} = (G^{-1}(\Lambda^{\dagger} Y) + \Phi) X^{\dagger},$$

that is,

$$Z = (G^{-1}(\Lambda^{\dagger} Y) + \Phi) X^{\dagger}.$$

The proof is finished.

Once the optimal $Z$ has been found, the best output weights are determined using the pseudo-inverse approach once more: the value of $\tilde{\Lambda}$ is given by $\tilde{\Lambda} = Y (G(ZX - \Phi))^{\dagger}$.
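As an illustration of the double pseudo-inverse weight determination, the following minimal NumPy sketch follows Theorem 1 and the formula above. The sigmoid activation, the uniform initialization interval, and the single broadcast bias column (the paper writes $\Phi$ as a $K_1 \times P$ matrix) are assumptions of the sketch, not a prescribed implementation.

```python
import numpy as np

def _sig(a):
    return 1.0 / (1.0 + np.exp(-a))

def _sig_inv(a):
    a = np.clip(a, 1e-6, 1.0 - 1e-6)   # keep the logit argument inside (0, 1)
    return np.log(a / (1.0 - a))

def dpelm_train(X, Y, K1, rng=None):
    """DPELM weight determination via two pseudo-inverses (Section 3.1).
    Assumed shapes: X is (N, P), one sample per column; Y is (M, P).
    """
    rng = np.random.default_rng(rng)
    M = Y.shape[0]

    # Randomly generate the bias Phi and the output weights Lambda.
    Phi = rng.uniform(-1.0, 1.0, size=(K1, 1))
    Lam = rng.uniform(-1.0, 1.0, size=(M, K1))

    # Eq. (8): Z = (G^{-1}(Lambda^+ Y) + Phi) X^+
    Z = (_sig_inv(np.linalg.pinv(Lam) @ Y) + Phi) @ np.linalg.pinv(X)

    # Second pseudo-inverse: Lambda = Y (G(Z X - Phi))^+
    H = _sig(Z @ X - Phi)
    Lam = Y @ np.linalg.pinv(H)
    return Z, Phi, Lam

def dpelm_predict(X, Z, Phi, Lam):
    return Lam @ _sig(Z @ X - Phi)     # (M, P) network outputs
```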

3.2. Combination Method of SDAE and DPELM

The DPELM algorithm's network structure is straightforward, and it can classify data quickly. However, its classification performance degrades considerably when it works with high-dimensional or noisy data. SDAE is capable of capturing characteristic representations of the input data; owing to the added sparsity constraint, the network can better learn the structural features of the input data and thus describe it more faithfully. Therefore, we combine SDAE with DPELM: SDAE extracts the input features required by DPELM, and DPELM then classifies the sample data. Besides effectively reducing noise interference and enhancing the robustness of the algorithm, this combination also simplifies the network topology and boosts classification performance. Figure 3 depicts the SDAE-DPELM network structure.

Below is a description of the SDAE-DPELM training process in full.

● Step 1: The sample data $X$ is degraded into $\ddot{X}$. The SDAE is trained by gradient descent so that the output of the network matches its input. When the error function $L$ falls below the set threshold, SDAE training is finished.

● Step 2: After the SDAE training process, we obtain the hidden layer data $h$ from the SDAE network. $h$ is a highly abstract representation of the original input samples produced by the network parameters. Because the original input is degraded and sparsity constraints are added to the network, $h$ better reflects the essential characteristics of the input data and makes the algorithm more robust.

● Step 3: The extracted $h$ is taken as the input layer sample data of the DPELM classification algorithm, which then performs the classification. Supervised training is conducted on these labeled sample data to obtain the final classification results, at which point the entire SDAE-DPELM training is finished. A minimal sketch of this two-stage pipeline is given after Figure 3 below.

Figure 3. SDAE-DPELM network structure diagram.
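Under the assumptions of the sketches above, the three steps can be chained as follows; dpelm_train refers to the hypothetical function from the Section 3.1 sketch, and the encoder weights w and b are assumed to come from a separately trained SDAE.

```python
import numpy as np

def sdae_dpelm_fit(X, Y, w, b, K1):
    """Chain a trained SDAE encoder (w, b) with DPELM (Steps 2 and 3).
    Assumed shapes: X is (d, P), one sample per column; Y is (M, P);
    w is (K2, d); b is (K2, 1).  Requires dpelm_train from the earlier sketch.
    """
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    h = sig(w @ X + b)                    # Step 2: deep features from the SDAE
    Z, Phi, Lam = dpelm_train(h, Y, K1)   # Step 3: supervised DPELM training
    return Z, Phi, Lam

def sdae_dpelm_predict(X, w, b, Z, Phi, Lam):
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    h = sig(w @ X + b)                    # extract features for new samples
    return Lam @ sig(Z @ h - Phi)         # DPELM classification outputs
```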

4. Experimental Results and Analysis

4.1. Introduction of Experimental Environment and Experimental Dataset

In this section, to examine the performance of the proposed algorithm, we select the MNIST dataset from UCI for multiple experiments. The experimental hardware is an Intel(R) Core(TM) i9-10850K CPU @ 3.60 GHz with 16 GB of memory running 64-bit Windows 10, and MATLAB R2016a is the experimental software.

4.2. Analysis and Determination of Network Structure

The numbers of input layer nodes N and output layer nodes M are fixed during neural network training; they correspond to the dimensionality of the training data and the number of classes, respectively. Therefore, the key to determining the network structure lies in the number of hidden layer nodes K1. To determine K1, the trial-and-error method is usually used: the network structure is changed according to certain criteria in repeated experiments, and the structure with the best performance is adopted. This paper needs to conduct comparative performance experiments, and different numbers of hidden layer nodes would affect the objectivity of the model comparison. The SDAE-DPELM neural network is divided into two stages, SDAE feature extraction and DPELM classification; the number of hidden nodes differs between the stages, so it affects the performance of the network structure differently. Since it is challenging to determine the hidden layer nodes K1 and K2 of both phases at once, the following experiments determine the hidden layer nodes of each phase incrementally.

Firstly, the MNIST dataset is selected to train ELM and to observe the relationship between the ELM hidden layer nodes K1 and the classification performance on the training set and test set. The value range of K1 is [100, 2000]. The performance trend is shown in Figure 4(a).

It can be seen from Figure 4(a) that the classification accuracy tends to increase as the number of hidden layer nodes K1 increases. Considering the compactness of the ELM network structure and the training time burden, 1500 is taken as the number of ELM hidden layer nodes K1. Then, the number of SDAE hidden layer nodes K2 is determined by the trial-and-error method in the same way. The variation of the classification accuracy of SDAE-ELM with the number of SDAE hidden layer nodes K2 is shown in Figure 4(b).

Figure 4. Influence of the number of hidden layer nodes on classification performance. (a) ELM hidden layer nodes K1; (b) SDAE hidden layer nodes K2; (c) DPELM hidden layer nodes K1.

In Figure 4(b), as the number of SDAE hidden layer nodes K2 increases, the classification accuracy of SDAE-ELM gradually increases. When K2 reaches 800 the classification accuracy is high, and beyond this point it only fluctuates slightly. While ensuring high classification accuracy, the time and space complexity and the convergence of the network also need to be considered. Therefore, K2 is set to 800.

As can be seen from Figure 4(c), when the number of DPELM hidden layer nodes K1 is less than 400, the classification accuracy of DPELM increases rapidly, reaching its highest point at 400. However, as K1 increases further, the classification accuracy of DPELM begins to decline. Accordingly, in the following experimental analysis and comparison, the optimal number of DPELM hidden layer nodes K1 is set to 400.

Finally, the optimal network structure of SDAE-DPELM is determined to be 784-800-400-M, where 784 is the number of input layer nodes N of the dataset and M is the number of categories, i.e., the dimension of the output vector.

4.3. SDAE-DPELM Numerical Comparison Experiment

SDAE-DPELM is compared with the existing DPELM and SDAE-ELM to validate the overall performance of the algorithm. Concretely, SDAE-DPELM is compared with DPELM to verify the impact of the SDAE (sparse denoising autoencoder) on model accuracy; the relevant results are shown in Figure 5(a). Similarly, SDAE-ELM is used to verify the superiority of the double pseudo-inverse weight determination, and the related results are depicted in Figure 5(b). Notably, the four algorithms all have N = 784 nodes in the input layer and M = 10 nodes in the output layer, and their network structures are 784-K1-10, 784-800-K1-10, 784-K1-10, and 784-800-K1-10, respectively, where 800 is the number of SDAE hidden layer nodes K2.

Figure 5. Curve of classification accuracy of DPELM, SDAE-ELM and SDAE-DPELM with hidden layer nodes K1.

From Figure 5(a), SDAE-DPELM is clearly more accurate than DPELM for any number of hidden nodes. As can be seen from Figure 5(b), when K1 is large enough, the recognition accuracy of SDAE-ELM on the MNIST dataset approaches that of SDAE-DPELM, but too many hidden nodes lead to a more complex network structure and increase the training time burden. This shows that the altered weight calculation approach used by SDAE-DPELM is effective.

As shown in Figure 6(b), when Gaussian noise (G.N.) is added to the MNIST dataset, the classification accuracy of the four models does not decrease significantly; the accuracy of SDAE-DPELM remains around 96%. At the same time, even with a small number of nodes, SDAE-DPELM quickly reaches a high classification accuracy. However, when salt-and-pepper noise (S&P.N.) is added to the data, the classification accuracy of the four models decreases to different degrees, with that of SDAE-DPELM dropping by about 3%. Nevertheless, no matter which noise is added, the classification accuracy of SDAE-DPELM is better than that of the other three models for any number of hidden nodes between 0 and 1500. This indicates that SDAE-DPELM has better noise immunity than the other three models.

Figure 6. Influence of hidden layer nodes K1 on classification accuracy of four algorithms under different noise conditions. (a) No noise. (b) G.N. (c) S&P.N. (d) G.N. and S&P.N.

4.4. Robustness Analysis of SDAE-DPELM Algorithm

SDAE-DPELM degrades the original data samples in order to eliminate noise interference and extract more essential features, thereby improving the robustness of the model. To verify the robustness of the SDAE-DPELM algorithm, different proportions of white Gaussian noise are added to the MNIST dataset and the classification accuracy of the algorithm is compared. The added Gaussian noise follows N(0, 0.01), N(0, 0.02), N(0, 0.03), N(0, 0.04), and N(0, 0.05). To strengthen the persuasiveness of the experiment, it is carried out with the sparsity parameter ρ set to 0.03, 0.05, and 0.07. The experimental results are shown in Figure 7.
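The paper does not give its preprocessing code; as a hedged illustration, noise of the listed variances can be injected as in the following sketch, which assumes the images are scaled to [0, 1] and clipped after corruption.

```python
import numpy as np

def add_white_gaussian_noise(X, var, rng=None):
    """Add zero-mean white Gaussian noise N(0, var) to [0, 1]-scaled data."""
    rng = np.random.default_rng(rng)
    return np.clip(X + rng.normal(0.0, np.sqrt(var), size=X.shape), 0.0, 1.0)

# Example: build the five noisy test sets used for the comparison
# (X_test is a placeholder for the MNIST test matrix).
# noisy_sets = [add_white_gaussian_noise(X_test, v)
#               for v in (0.01, 0.02, 0.03, 0.04, 0.05)]
```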

It can be seen from Figure 7 that the classification accuracy of the algorithm decreases only slightly when different proportions of white Gaussian noise are added to the dataset. Moreover, when the sparsity parameter ρ takes different values, the variation in classification accuracy remains less than 2%, which indicates that data with a certain amount of noise will not significantly affect the classification accuracy of SDAE-DPELM.

4.5. Stability Analysis of SDAE-DPELM Algorithm

To verify the stability of the SDAE-DPELM algorithm, this section compares the mean, variance, and range of the classification accuracy of SDAE-DPELM and ELM. The number of hidden layer neurons is increased gradually, 50 neurons at a time, and the change in classification accuracy is observed. Because the random initialization of the weights makes the results of traditional ELM differ from run to run, and to keep the comparison objective and free of interference from random values, the ELM and SDAE-DPELM algorithms are each run 10, 20, and 50 times every time 50 hidden layer neurons are added. The average accuracy on the test set is taken as the final classification result of each algorithm. The experimental results are shown in Table 1.

Figure 7. Comparison of SDAE-DPELM accuracy on the MNIST dataset with different proportions of white Gaussian noise.

It can be seen from Table 1 that the classification performance of the SDAE-DPELM algorithm is better than that of the original ELM whether the test is run 10, 20, or 50 times. To further analyze the stability of the SDAE-DPELM algorithm, range and variance comparison experiments between SDAE-DPELM and ELM are conducted again according to the above procedure.

The variance and range reflect the degree of dispersion of the output results of the two algorithms and can be used to judge the stability of the algorithms' outputs: the greater the dispersion, the more unstable the algorithm; conversely, the smaller the dispersion, the better the stability. As can be seen from Figure 8, the variance and range of the proposed algorithm on the MNIST dataset are smaller than those of the traditional ELM algorithm, indicating that the stability of the proposed algorithm is better than that of the traditional ELM algorithm.

Table 1. Comparison of ELM and SDAE-DPELM classification average accuracy (%).

Figure 8. Variation of the range and variance of the ELM and SDAE-DPELM algorithms. (a) t = 10. (b) t = 20. (c) t = 50. (d) t = 10. (e) t = 20. (f) t = 50.
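The three statistics compared in this section can be computed as in the short sketch below; the accuracy values in the usage comment are hypothetical placeholders, not figures from the paper.

```python
import numpy as np

def stability_stats(accuracies):
    """Mean, variance, and range of test accuracies from repeated runs."""
    a = np.asarray(accuracies, dtype=float)
    return {"mean": a.mean(), "variance": a.var(), "range": a.max() - a.min()}

# Example with hypothetical accuracies (%) from t = 10 runs:
# stability_stats([97.1, 96.8, 97.0, 96.9, 97.2, 96.7, 97.1, 96.9, 97.0, 96.8])
```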

4.6. Recognition of Xiangxi Block Miao Characters Based on SDAE-DPELM

The Xiangxi block Miao script is a writing system created by Miao literati in the late Qing Dynasty to record, organize, and compose Miao songs, and it is still widely used in the Xiangxi Miao area. Xiangxi block Miao character recognition aims to automatically recognize Miao characters written on paper using computer and image processing technology, which is of great practical significance for the digital preservation of Miao characters. This section applies the SDAE-DPELM algorithm proposed in this paper to further explore Xiangxi block Miao character recognition.

In this paper, we randomly select some characters from the Miao character library for the experiment. For the reader's convenience, a portion of the original dataset is shown in Figure 9.

To validate the performance of SDAE-DPELM in Miao character recognition, this experiment uses the chosen Miao character dataset to examine the highest correct recognition rate that each algorithm can obtain with the number of hidden layer neurons between 1 and 1000.

The experimental results are shown in Table 2. The network structure of the SDAE-DPELM algorithm is more compact than that of the original ELM algorithm, and it achieves superior classification performance even with fewer hidden layer nodes. It is worth mentioning that, in the SDAE-DPELM network structure given in Table 2, the 2704-dimensional input vector of the first layer is the feature vector obtained by scaling the original Xiangxi block Miao character image and extracting its features with the LGS operator. The second layer of 2375 neurons is a deep feature vector obtained by further feature extraction with the SDAE. The following 30 (1000 in the SDAE-ELM and ELM algorithms) is the number of neurons in the hidden layer required by the classifier, and 10 is the dimension of the output vector, so ten different classes can be represented.

Figure 9. Some Miao characters in the original sample.

Table 2. Performance comparison of two algorithms in Miao character recognition.

Figure 10. Comparison of recognition results of two algorithms. (a) ELM algorithm. (b) SDAE-DPELM algorithm.

As shown in Figure 10, the algorithm proposed in this paper achieves higher accuracy than the original ELM algorithm with fewer hidden layer nodes. This provides an effective and practical tool for Xiangxi block Miao character recognition.

5. Conclusion

In this paper, a new deep learning algorithm, SDAE-DPELM, is proposed by combining SDAE with DPELM. SDAE is used to extract more representative deep abstract features of the data, the extracted features are taken as the input data of DPELM, and network training is then carried out. The algorithm not only overcomes the complex network structure and weak robustness of ELM caused by the random assignment of hidden layer parameters, but also retains ELM's fast operation speed. The experimental results show that the SDAE-DPELM algorithm has good robustness and stability, and its practicality is also verified in the application of Miao character recognition. At present, the double pseudo-inverse weight determination method is only applied to the traditional ELM. In future work, we will consider combining this weight determination method with ensemble learning and applying it to data prediction and image recognition.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62066015 and the Natural Science Foundation of Hunan Province of China under Grant No. 2020JJ4511.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Huang, G.B., Zhu, Q.Y. and Siew, C.K. (2004) Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks. 2004 IEEE International Joint Conference on Neural Networks, Vol. 2, 985-990.
https://doi.org/10.1109/IJCNN.2004.1380068
[2] Jin, W., Li, Z.J., Wei, L.S. and Zhen, H. (2000) The Improvements of BP Neural Network Learning Algorithm. 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000, Vol. 3, 1647-1649.
[3] Xu, R., Liang, X., Qi, J.S., Li, Z.Y. and Zhang, S.S. (2019) Advances and Trends in Extreme Learning Machine. Chinese Journal of Computers, 42, 1640-1670.
[4] Huang, G.B., Wang, D.H. and Lan, Y. (2011) Extreme Learning Machines: A Survey. International Journal of Machine Learning and Cybernetics, 2, 107-122.
https://doi.org/10.1007/s13042-011-0019-y
[5] Chyzhyk, D., Savio, A. and Graa, M. (2015) Computer Aided Diagnosis of Schizophrenia on Resting State fMRI Data by Ensembles of ELM. Neural Networks, 68, 23-33.
https://doi.org/10.1016/j.neunet.2015.04.002
[6] Wang, G.H., Li, S.M., Zhu, D. and Yang, J. (2014) Application of Extreme Learning Machine in Objective Stereoscopic Image Quality Assessment. Journal of Optoelectronics Laser, 25, 1837-1842.
[7] Huang, Z., Yu, Y. and Liu, H. (2016) An Efficient Method for Traffic Sign Recognition Based on Extreme Learning Machine. IEEE Transactions on Cybernetics, 47, 920-933.
https://doi.org/10.1109/TCYB.2016.2533424
[8] Zhang, W.B., Ji, H.B., Wang, L. and Zhu, M.Z. (2014) Multiple Hidden Layer Output Matrices Extreme Learning Machine. Systems Engineering and Electronics, 36, 1656-1659.
[9] Zhang, Y., Wu, J., Cai, Z., Zhang, P. and Chen, L. (2016) Memetic Extreme Learning Machine. Pattern Recognition, 58, 135-148.
https://doi.org/10.1016/j.patcog.2016.04.003
[10] Rong, H.J., Ong, Y.S., Tan, A.H. and Zhu, Z. (2008) A Fast Pruned-Extreme Learning Machine for Classification Problem. Neurocomputing, 72, 359-366.
https://doi.org/10.1016/j.neucom.2008.01.005
[11] Pratama, M., Zhang, G., Er, M.J. and Anavatti, S. (2016) An Incremental Type-2 Meta-Cognitive Extreme Learning Machine. IEEE Transactions on Cybernetics, 47, 339-353.
https://doi.org/10.1109/TCYB.2016.2514537
[12] Chen, L., Huang, Z., Li, Y., Zeng, N., Liu, M., Peng, A. and Jin, L. (2019) Weight and Structure Determination Neural Network Aided with Double Pseudoinversion for Diagnosis of Flat Foot. IEEE Access, 7, 33001-33008.
https://doi.org/10.1109/ACCESS.2019.2903634
[13] Cambria, E. and Huang, G. (2013) Extreme Learning Machines-Representational Learning with ELMs for Big Data. IEEE Intelligent Systems, 28, 30-59.
https://doi.org/10.1109/MIS.2013.140
[14] Sun, K., Zhang, J., Zhang, C. and Hu, J. (2017) Generalized Extreme Learning Machine Autoencoder and a New Deep Neural Network. Neurocomputing, 230, 374-381.
https://doi.org/10.1016/j.neucom.2016.12.027
[15] Ng, A. (2011) Sparse Autoencoder. CS294A Lecture Notes, 72, 1-19.
