Semantics Interaction Control for Constructing the Intelligent Ecology of the Internet of Things and Critical Component Research

Intelligent equipment is a class of device characterized by intelligent sensor interconnection, big data processing, new types of displays, human-machine interaction and other features of the new generation of information technology. In this paper, we first present a novel intelligent deep hybrid neural network algorithm based on a deep bidirectional recurrent neural network integrated with a deep backward propagation neural network. It realizes acoustic analysis, speech recognition and natural language understanding, which jointly constitute human-machine voice interaction. Second, we design a voice control motherboard with an embedded ARM-series chip as the core; the onboard components include ZigBee, RFID, WIFI, GPRS, an RS232 serial port, USB interfaces and so on. Third, we combine the algorithm, software and hardware to make machines "understand" human speech and "think" and "comprehend" human intentions, thereby structuring critical components for intelligent vehicles, intelligent offices, intelligent service robots, intelligent industries and so on, which furthers the construction of the intelligent ecology of the Internet of Things. Finally, the experimental results show that the embedded semantics interaction control achieves a very good effect, fast speed and high accuracy, consequently realizing the intelligent ecology construction of the Internet of Things.


Introduction
With the vigorous development of sensor technology, network transmission technology, intelligent information processing technology and so on, the Internet of Things, with its intelligent "thing to thing" interconnections, is believed to be the third wave of world information industry development (following the computer and the Internet). People have a great deal of information that needs to be communicated via computers every day. Traditional human-machine interaction modes such as the keyboard, mouse and touch screen have increasing difficulty meeting people's growing needs for intelligent computing and control. Especially with mobile terminals (for example, palm computers, PADs, mobile phones and so on) and various kinds of intelligent devices being extensively used in mobile computing environments, the requirements for implementing voice interaction have become increasingly urgent. Speech recognition technology can be applied to indoor equipment control, voice-controlled telephone exchanges, intelligent toys, industrial control, home services, hotel services, banking services, ticketing systems, web information queries, voice communication systems, voice navigation and so on in all kinds of voice control systems and self-help customer service systems [1] [2] [3]. In particular, with the vigorous development of artificial intelligence technology, and in contrast to traditional man-machine interaction modes (mainly keyboards, mice and so on), people naturally expect machines to have highly intelligent voice communication abilities, i.e., intelligent machines that can "understand" human speech, "think" and "comprehend" human intentions, and finally respond with speech or actions. This has always been one of the ultimate goals of artificial intelligence and is one of the critical components for structuring the intelligent interconnections of the Internet of Things.
Intelligent voice interaction technology has thus naturally become one of the current research hotspots [4] [5] [6] [7] [8].
For this purpose, we first present a novel intelligent deep hybrid neural network algorithm to realize voice signal processing based on efficient embedded automatic speech recognition (EASR), speech understanding (SU) and semantics control. Second, we design a voice control motherboard with an embedded ARM-series chip as the core. Finally, on this basis, we present a model to provide critical components for constructing intelligent vehicles, intelligent service robots, intelligent offices, intelligent industries and so on and to realize the intelligent ecology of the Internet of Things [9] [10] [11]. The model is shown in Figure 1. Other key technologies were developed based on the HMM system for automatic speech recognition; for example, the maximum a posteriori (MAP) probability estimation criterion [12] and maximum likelihood linear regression (MLLR) [13] were used to solve the parameter-adaptation problem of the HMM model [28]. At the same time, other neural networks were proposed based on these models, for example, the sparse deep belief network (SDBN) [29], sparse stack automatic encoders (SSAE) [30], the deep convolution generative adversarial network (DCGAN) [31] and so on. All of these have become main constituent models of deep neural networks, namely, deep learning [32] [33].

Previous Foreign and Domestic Studies
The concept of the Internet of Things (IOT) was first proposed by Professor Ashton of the Auto-ID Center of the Massachusetts Institute of Technology in 1999 [34]. He presented the "intelligent interconnection of thing to thing", which uses information sensor equipment to collect information in real time and constitutes a huge network combined with the Internet [35]- [40]. An embedded system is a kind of dedicated computer system with an application as the centre. It is based on computer technology, can tailor its software and hardware, and can adapt to application systems that have stringent requirements on functionality, reliability, cost, volume, power consumption and so on [42] [43]. An embedded processor is the core of an embedded system; it is the hardware unit that controls and assists the system's operations. At present, there are more than 1000 kinds of embedded processors in the world. The popular system architectures include four kinds: the embedded microprocessor unit (EMP), the embedded microcontroller unit (MCU), embedded digital signal processors (DSP) and embedded systems on chip (SOC) [44].

Principle of Speech Recognition Control and Mathematical Theory Model
Although the recognition principle of all languages is similar, different languages have different recognition processes. The speech recognition control in this paper is based on Chinese, as shown in Figure 2 and Figure 3. Speech recognition control can be seen as the following process. Suppose the source signal is a sequence of words W uttered by a speaker, which is converted into a speech signal O through a noisy channel. Speech recognition then involves speech decoding, which can be considered as the problem of solving for the maximum a posteriori probability (MAP) [12]. It is assumed that the speech signal has been expressed as a sequence of observation vectors, namely, the speech feature vectors O.
To find the word sequence with the maximum a posteriori probability, the posterior probabilities of all possible word sequences are calculated and the sequence with the maximum probability, denoted \(W^*\), is selected, as shown in formula (1):

\[ W^* = \arg\max_{W \in \tau} P(W \mid O) = \arg\max_{W \in \tau} \frac{P(O \mid W)\,P(W)}{P(O)} = \arg\max_{W \in \tau} P(O \mid W)\,P(W) \tag{1} \]

where \(\tau\) is the collection of all word sequences; because \(P(O)\) does not depend on \(W\), it can be omitted from the maximization. The (stochastic) language model can be expressed as the occurrence probability \(P(W)\) of the word string \(W\), which can be decomposed into:

\[ P(W) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \tag{2} \]

where \(w_i\) is the \(i\)th word of the word string and \(n\) is the number of words in \(W\).

It is unrealistic to estimate the conditional probability \(P(w_i \mid w_1, \ldots, w_{i-1})\) for all vocabularies and word sequences, so a simplified model is used. The n-gram model (n-element grammar model) is the most successful and most widely used language model to date. It assumes that the conditional probability \(P(w_i \mid w_1, \ldots, w_{i-1})\) is only related to the preceding \(n-1\) words, so it can be simplified as:

\[ P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) \tag{3} \]

Thus, \(P(W)\) is approximated as follows by using the binary grammar model, namely, the 2-gram:

\[ P(W) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1}) \tag{4} \]
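As an illustration of 2-gram estimation, the following minimal sketch trains a bigram model by maximum-likelihood counting over a hypothetical toy corpus of voice commands (no smoothing, which a practical system would add):

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences (with a <s> start mark)."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
    return uni, bi

def sentence_prob(sent, uni, bi):
    """P(W) under the 2-gram model: product of maximum-likelihood P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sent
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bi[(prev, cur)] / uni[prev]
    return p

# Hypothetical toy corpus of two voice commands
corpus = [["turn", "on", "light"], ["turn", "off", "light"]]
uni, bi = train_bigram(corpus)
print(sentence_prob(["turn", "on", "light"], uni, bi))  # 0.5: "on" follows "turn" half the time
```

The unseen-bigram probability here is simply zero; real recognizers replace these raw counts with smoothed estimates.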

Backward Propagation Neural Network
The mean square error of the neural network training model can be expressed as:

\[ J(W, b) = \frac{1}{2} \sum_{i} \big( y_i - a_i^{(L)} \big)^2 \]

To obtain the optimal parameters, the gradient descent method is used to minimize this function. The partial derivative being calculated is called the "residual" of each unit and is denoted as \(\delta_i^{(l)}\). Thereby, the residuals of all units in the last layer (the output layer) can be obtained as:

\[ \delta_i^{(L)} = -\big( y_i - a_i^{(L)} \big)\, f'\big( z_i^{(L)} \big) \]

Next, the residual of each individual unit in every earlier layer can be computed layer by layer:

\[ \delta_i^{(l)} = \Big( \sum_{j} W_{ji}^{(l)}\, \delta_j^{(l+1)} \Big) f'\big( z_i^{(l)} \big) \]

where \(W\) denotes the weights, \(b\) denotes the biases, \(f(\cdot)\) is the activation function, \(z_i^{(l)}\) is the weighted input of unit \(i\) in layer \(l\), and \(a_i^{(l)} = f(z_i^{(l)})\) is its activation.
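The residual computation can be sketched as a single gradient-descent step for a one-hidden-layer network with sigmoid activations under the mean-square-error loss; the layer sizes and learning rate below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent step for a one-hidden-layer sigmoid network, MSE loss.

    Updates W1, b1, W2, b2 in place and returns the loss before the update.
    """
    # Forward pass
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    # Output-layer residual: delta^(L) = -(y - a^(L)) * f'(z^(L))
    d2 = -(y - a2) * a2 * (1.0 - a2)
    # Hidden-layer residual: delta^(l) = (W^T delta^(l+1)) * f'(z^(l))
    d1 = (W2.T @ d2) * a1 * (1.0 - a1)
    # Gradient-descent updates (in place)
    W2 -= lr * np.outer(d2, a1); b2 -= lr * d2
    W1 -= lr * np.outer(d1, x);  b1 -= lr * d1
    return 0.5 * np.sum((y - a2) ** 2)
```

Calling this repeatedly on the same example drives the loss toward zero, which is a quick sanity check on the residual formulas.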

Deep Bidirectional Recurrent Neural Network (DBRNN)
Based on the consistency and causality characteristics of speech signals, given an input sequence of sequential speech signals, the network can infer the corresponding output sequence by exploiting both past and future context through its forward and backward hidden layers.

Forward Propagation Algorithm
For a single-hidden-layer one-way RNN with input sequence \(X = (x_1, x_2, \ldots, x_T)\), let \(W_{ih}\) denote the connection matrix from the input layer to the hidden layer, \(W_{hh}\) the recurrent connection matrix of the hidden layer, and \(W_{ho}\) the connection matrix from the hidden layer to the output layer. The hidden state and the output at moment \(t\) can respectively be obtained as:

\[ h_t = f\big( W_{ih} x_t + W_{hh} h_{t-1} + b_h \big), \qquad y_t = g\big( W_{ho} h_t + b_o \big) \]

For the DBRNN, a backward hidden layer traverses the sequence in reverse, so its input and output can respectively be obtained as follows:

\[ h'_t = f\big( W'_{ih} x_t + W'_{hh} h'_{t+1} + b'_h \big) \tag{14} \]

and the output of each layer of the DBRNN can also be obtained in turn by combining the two directions, for example \( y_t = g\big( W_{ho} h_t + W'_{ho} h'_t + b_o \big) \).
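A minimal sketch of the bidirectional forward pass, assuming tanh hidden units and illustrative weight shapes (the DBRNN in the paper is deeper and jointly trained with the backward propagation network):

```python
import numpy as np

def brnn_forward(X, Wih, Whh, Wihb, Whhb, Who, Whob, bo):
    """Forward pass of a single-layer bidirectional RNN with tanh hidden units."""
    T, H = len(X), Whh.shape[0]
    hf = np.zeros((T, H))  # forward hidden states
    hb = np.zeros((T, H))  # backward hidden states
    h = np.zeros(H)
    for t in range(T):                       # left to right
        h = np.tanh(Wih @ X[t] + Whh @ h)
        hf[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):             # right to left
        h = np.tanh(Wihb @ X[t] + Whhb @ h)
        hb[t] = h
    # Output at each moment combines both directions
    return np.array([Who @ hf[t] + Whob @ hb[t] + bo for t in range(T)])
```

Because the output at each moment mixes both hidden layers, a frame's classification can depend on speech frames that come after it, which is what motivates the bidirectional structure for acoustic modeling.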

Time Domain Backward Propagation Algorithm
There are two sources of error signals being propagated to the hidden layer at moment t. One is the error signal \(e_t^o\) from the output layer at moment t, and the other is the error signal \(e_{t+1}^h\) propagated back from the hidden layer at moment t+1. From formula (17), it can be seen that the error signals of the neural network are propagated along the inverse time axis from moment T back to moment 1. The algorithm is therefore also named backward propagation through time (BPTT).
Therefore, the gradient of the RNN can be obtained by accumulating the error signals over all time steps:

\[ \frac{\partial J}{\partial W_{hh}} = \sum_{t=1}^{T} \delta_t^h \, h_{t-1}^{\top}, \qquad \frac{\partial J}{\partial W_{ih}} = \sum_{t=1}^{T} \delta_t^h \, x_t^{\top} \]

where \(\delta_t^h\) is the total error signal of the hidden layer at moment t. With these gradient computations, the model's parameters can be updated. On this basis, the learning and training of the DBRNN can be further implemented.
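The BPTT procedure can be sketched for a plain single-layer tanh RNN under MSE loss; the forward pass caches every hidden state, and the backward pass combines the two error sources while accumulating gradients from moment T back to moment 1 (dimensions and loss are illustrative assumptions):

```python
import numpy as np

def rnn_bptt(X, Y, Wih, Whh, Who):
    """BPTT for a plain tanh RNN under MSE loss; returns loss and gradients."""
    T, H = len(X), Whh.shape[0]
    hs = np.zeros((T + 1, H))   # hs[t + 1] is the hidden state at moment t
    ys = np.zeros((T, Who.shape[0]))
    for t in range(T):          # forward pass, caching hidden states
        hs[t + 1] = np.tanh(Wih @ X[t] + Whh @ hs[t])
        ys[t] = Who @ hs[t + 1]
    loss = 0.5 * np.sum((ys - Y) ** 2)
    dWih, dWhh, dWho = (np.zeros_like(M) for M in (Wih, Whh, Who))
    dh_next = np.zeros(H)       # error arriving from moment t + 1
    for t in reversed(range(T)):            # backward from moment T to moment 1
        dy = ys[t] - Y[t]                   # output-layer error at moment t
        dWho += np.outer(dy, hs[t + 1])
        dh = Who.T @ dy + dh_next           # the two error sources combined
        dz = dh * (1.0 - hs[t + 1] ** 2)    # back through tanh'
        dWih += np.outer(dz, X[t])
        dWhh += np.outer(dz, hs[t])
        dh_next = Whh.T @ dz
    return loss, dWih, dWhh, dWho
```

The analytic gradients can be verified against finite differences of the loss, which is a standard correctness check for a BPTT implementation.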

Experimental Environment
The relevant experimental equipment is shown below. Hardware: 1) The core processing unit of the module adopts a Samsung ARM Cortex™-A8 chip.

Experimental Process and Results
The implementation process of speech recognition semantics control is shown below. First, speech recognition can be divided into two parts, namely, speech training and recognition. In the speech signal training process, input devices (for example, microphones) are used to obtain speech signals, make A/D conversions, and encode and decode the digital signals. The hybrid neural networks presented by us are used to conduct learning and training, and the training results are burned into the Flash so that recognition can be achieved in the subsequent speech recognition stage. Second, in the speech recognition phase, after the input speech signal is processed by the audio digital signal encoder-decoder, the system notifies the embedded Linux operating system based on the ARM Cortex™-A8 and matches the signal against the reference samples stored in the Flash. Thus, the best identification results are obtained and mapped to the corresponding semantic vocabularies. Finally, the system achieves the corresponding I/O output controls through the system call functions of the embedded Linux operating system, based on the semantic results. For example, it can turn LED lights on and off in intelligent furniture, other industrial equipment, and so on. The Linux kernel controls the ARM Cortex™-A8 and calls its drivers, which must implement at least the open, read, write, close and other system calls [50]. In the experiment, we also refer to the development boards of YueQian and the phonetic components of Hkust XunFei (iFLYTEK) [51]. The experimental results are as follows.
After connecting the power of the development board and the serial port line (one end to the PC, and the other end to the development board), we use the SecureCRT software to download the programs to the ARM Cortex™-A8 board and conduct the cross-compilations. Voice data are obtained through recording devices, and the results are shown in Figure 6.
We use the ESP8266 tool developed by us to burn and write the hybrid neural networks and other algorithms presented by us to the storage of the ARM Cortex™-A8 board for embedded speech recognition processing. The results are shown in Figure 7.
The speech recognition semantics control system implemented in this paper provides rich functionality. It can recognize voice data from audio files as well as voice data captured directly from a microphone or other input devices. The results are shown in (a) and (b) of Figure 8.
It has also realized the recognition of voice data captured directly from the microphone and other input devices, for example, the voice commands "开灯" (turn on the light) and "关灯" (turn off the light). In the experiment, we used six lights with ID numbers from 1 to 6 and realized the switching of any individual light, such as light No. 3. Figure 10 shows the control of the lights of the two kinds of circuit boards (the speech recognition control in this paper is based on Chinese).
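The light-switching control ultimately reduces to the open/read/write/close system calls on the LED device node exposed by the Linux driver. A minimal sketch follows; the sysfs path is a hypothetical example, since the actual node depends on the board's LED driver:

```python
import os

# Hypothetical sysfs node; the real path depends on the board's LED driver.
LED_PATH = "/sys/class/leds/led3/brightness"

def set_led(on, path=LED_PATH):
    """Switch an LED by writing "1"/"0" through the open/write/close system calls."""
    fd = os.open(path, os.O_WRONLY)         # open(2)
    try:
        os.write(fd, b"1" if on else b"0")  # write(2)
    finally:
        os.close(fd)                        # close(2)

def get_led(path=LED_PATH):
    """Read the LED state back through open/read/close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 16).strip() == b"1"  # read(2)
    finally:
        os.close(fd)
```

Once the recognizer maps "开灯" to the turn-on semantics for a given light number, the controller would simply call `set_led(True)` on that light's node.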

Summary and Prospect
The purpose of this paper was to assess semantics interaction control for constructing the intelligent ecology of the Internet of Things and to conduct critical component research. First, we presented a novel intelligent deep hybrid neural network algorithm based on a deep bidirectional recurrent neural network integrated with a deep backward propagation neural network. It realizes acoustic analysis, speech recognition and natural language understanding, which jointly constitute human-machine voice interaction. Second, we designed a voice control motherboard with an embedded ARM-series chip as the core; the onboard modules include ZigBee, RFID, WIFI, GPRS, an RS232 serial port, a USB interface and others. Third, we combined the algorithm, software and hardware to make machines "understand" human speech and "think" and "comprehend" human intentions in order to structure critical components for intelligent vehicles, intelligent offices, intelligent service robots, intelligent industries and so on, and thereby the intelligent ecology of the Internet of Things. Finally, the experimental results show that the embedded semantics interaction control achieves a very good effect, fast speed and high accuracy, consequently realizing the intelligent ecology construction of the Internet of Things. On this basis, we will further pursue commercialization and large-scale promotion, which are the directions of our future efforts.

Journal of Computer and Communications

Conflicts of Interest
Hai-jun Zhang declares that he has no conflict of interest. Ying-hui Chen declares that she has no conflict of interest.