Detection and Selection of Moving Objects in Video Images Based on Impulse and Recurrent Neural Networks

The purpose of the article is to develop a methodology for automating the detection and selection of moving objects. The detection and separation of moving objects are based on the simulation of impulse and recurrent neural networks. The result of the work is a motion detector based on impulse and recurrent neural networks, together with an automated system built on this detector for detecting and separating moving objects, which is ready for practical application. The feasibility of integrating the developed motion detector with the Emgu CV (OpenCV) image processing package, multimedia framework functions, and the DirectShow application programming interface was investigated. The proposed approach and software for detecting and separating moving objects in video images using neural networks can be integrated into more sophisticated specialized computer-aided video surveillance systems, IoT (Internet of Things), IoV (Internet of Vehicles), etc.


Introduction
The continuous growth of the population, coupled with the development of the automotive industry, has made the implementation of Intelligent Transport Systems (ITS) necessary. ITS is a core part of intelligent logistics and the IoV (Internet of Vehicles). Earlier approaches, such as widening roads and building more of them, have become ineffective. This is especially evident in the large cities of the [...] recognition [10] is related to the fact that the way the human brain processes information is fundamentally different from the way conventional digital computers do [11]. The brain is an extremely complex, nonlinear, parallel computer (information processing system). It can organize its structural components, called neurons, so that they perform specific tasks (such as pattern recognition, sensory signal processing, and motor control) many times faster than the fastest modern computers. Ordinary vision is an example of such an information processing task. The function of the visual system is to create a representation of the environment that enables interaction with it. More precisely, the brain successively performs a number of recognition tasks (such as recognizing a familiar face in an unfamiliar environment). It takes about 100 to 200 milliseconds to do this, whereas tasks of much lower complexity can take a computer several days.
The objective of this work is to develop a methodology for automating the detection and selection of moving objects, a motion detector based on impulse and recurrent neural networks, and an automated system built on this detector, so that it can be integrated with conventional video surveillance systems used to record traffic violations.

Impulse Neural Network Model for Selecting Moving Objects
The general structure of the impulse neural network used to isolate moving objects in the detector being developed is shown in Figure 1 [12] [13]. The input layer of neurons is an analog of the retinal photoreceptor layer, so we call the first-layer neurons receptors. Each pixel $(x, y)$ of the input video frame corresponds to its own receptor $N_r(x, y)$. The hidden layer is an analog of the intermediate layer of the retina. It consists of two independent arrays of neurons, $N_1$ and $N_2$, which have the same size as the first layer and are connected by synaptic connections both to the input layer of neurons (receptors) $N_r$ and to the output layer of neurons $N_{out}$. [...] neuron $N(x, y)$ starts to generate impulses, since the signal from the excitatory synapse, delayed by $\Delta t$, is stronger than the signal from the inhibitory synapse. In other words, the neural network begins to react to changes in pixel brightness, such as those caused by objects moving over a static background. The simulation result of the detector based on the impulse network model is given in Figure 2. The output layer $N_{out}$ of the neural network has the same dimensions as the input and hidden layers. Each neuron of this layer [...]
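The firing condition described above (a neuron emits impulses only when the delayed excitatory signal outweighs the undelayed inhibitory one) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The leaky integrate-and-fire dynamics, the `threshold` and `leak` values, the delay expressed as a whole number of frames, and the assignment of mirrored synapse polarities to the hidden arrays $N_1$ and $N_2$ (so that both darkening and brightening pixels are detected) are all assumptions made for illustration.

```python
from collections import deque


class ImpulseMotionDetector:
    """Illustrative per-pixel impulse (spiking) motion detector.

    Assumptions (not taken from the paper): frames are 2-D lists of grayscale
    values in [0, 1]; the delay dt is a whole number of frames; hidden neurons
    are leaky integrate-and-fire units with a fixed threshold.
    """

    def __init__(self, width, height, delay=1, threshold=0.2, leak=0.5):
        self.w, self.h = width, height
        self.delay = delay
        self.threshold = threshold
        self.leak = leak
        # History of receptor-layer (N_r) outputs, long enough for the delay.
        self.history = deque(maxlen=delay + 1)
        # Membrane potentials of the two hidden arrays N_1 and N_2.
        self.v1 = [[0.0] * width for _ in range(height)]
        self.v2 = [[0.0] * width for _ in range(height)]

    def _integrate(self, v, x, y, drive):
        """Leaky integration of positive drive; fire (return 1) and reset at threshold."""
        v[y][x] = self.leak * v[y][x] + max(drive, 0.0)
        if v[y][x] >= self.threshold:
            v[y][x] = 0.0
            return 1
        return 0

    def step(self, frame):
        """Feed one frame and return the binary impulse map of N_out."""
        self.history.append(frame)
        out = [[0] * self.w for _ in range(self.h)]
        if len(self.history) <= self.delay:
            return out  # the delayed excitatory signal is not available yet
        delayed = self.history[0]  # frame from `delay` steps ago
        for y in range(self.h):
            for x in range(self.w):
                # N_1: excitatory synapse delayed by dt, inhibitory synapse undelayed.
                s1 = self._integrate(self.v1, x, y, delayed[y][x] - frame[y][x])
                # N_2: mirrored synapses, so it responds to brightening pixels.
                s2 = self._integrate(self.v2, x, y, frame[y][x] - delayed[y][x])
                out[y][x] = 1 if (s1 or s2) else 0
        return out
```

On a static scene the excitatory and inhibitory inputs cancel and no impulses are produced; a pixel whose brightness changes fires in the frame in which the change arrives.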

Recurrent Neural Network Model
Recurrent neural networks are a class of neural networks with feedback between different layers of neurons [14]. Their distinguishing feature is that signals are transmitted from the output (hidden) layer back to the input layer. Of particular interest are multilayer recurrent networks, which extend unidirectional (feedforward) perceptron networks by adding feedback links. Each feedback link passes through a unit delay element, so within a single time step the signal flow can be treated as unidirectional, and the recurrent network then operates like a feedforward perceptron network. The learning algorithm of such a network is more complex, because the signals at time t depend on their values at previous moments [15].
There are different models of recurrent networks, such as the Recurrent Multilayer Perceptron and the Real-Time Recurrent Network. The most interesting for this work is the RTRN (Real-Time Recurrent Network) proposed by R. Williams and D. Zipser and intended for real-time signal processing [15]. The RTRN network is a special case of the Elman network. Its structure is shown in Figure 3.
The network contains N input nodes, K hidden neurons, and K corresponding nodes of the context layer. Of the K hidden neurons, only M form the output of the network. The output signals of all hidden neurons are fed back as network inputs through the unit delay elements $z^{-1}$. Denote the weighted sum of the i-th hidden neuron by $u_i$ and the output of this neuron by $y_i$. The input vector $\mathbf{x}(t)$ and the delayed output vector $\mathbf{y}(t-1)$ form an extended activation vector, also denoted $\mathbf{x}(t)$, that excites the network neurons:

$$\mathbf{x}(t) = [1, x_1(t), \dots, x_N(t), y_1(t-1), \dots, y_K(t-1)]^T \qquad (1)$$

Once the input vector of the network at time t is known, the state of all neurons can be determined from

$$u_i(t) = \sum_{j=0}^{N+K} w_{ij}\, x_j(t), \qquad (2)$$

$$y_i(t) = f(u_i(t)), \quad i = 1, \dots, K. \qquad (3)$$

Here $f(\cdot)$ denotes a continuous (sigmoidal) neuron activation function.
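We also define $\mathbf{w}_i = [w_{i0}, w_{i1}, \dots, w_{i,N+K}]^T$ as the weight vector of the i-th neuron and $\mathbf{W} = [w_{ij}]$ as the weight matrix of all neurons. The Williams-Zipser learning algorithm is used to train the RTRN network, namely to minimize the criterion

$$E(t) = \frac{1}{2} \sum_{k=1}^{M} e_k^2(t), \qquad (4)$$

where $e_k(t) = y_k(t) - d_k(t)$ is the error of the k-th output neuron with respect to its desired value $d_k(t)$. Minimizing criterion (4) by gradient descent gives the update of each weight $w_{\alpha\beta}$ of the matrix $\mathbf{W}$:

$$w_{\alpha\beta}(t+1) = w_{\alpha\beta}(t) - \eta \sum_{k=1}^{M} e_k(t) \frac{\partial y_k(t)}{\partial w_{\alpha\beta}}, \qquad (5)$$

$$\frac{\partial y_k(t)}{\partial w_{\alpha\beta}} = f'(u_k(t)) \left[ \delta_{k\alpha}\, x_\beta(t) + \sum_{j=1}^{K} w_{k,N+j} \frac{\partial y_j(t-1)}{\partial w_{\alpha\beta}} \right], \qquad (6)$$

where $\eta > 0$ is the learning rate and $\delta_{k\alpha}$ is the Kronecker delta.

The proposed recurrent network model can be used to detect moving objects in a video image. A video image is a sequence of frames, and each frame is a set of pixels described by the corresponding parameters. We assume that the network receives a sequence of frames at its input and is expected to produce a processed sequence of frames in which the moving object is separated from the rest of the image.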
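The forward pass of the RTRN and the Williams-Zipser gradient rule that minimizes criterion (4) can be made concrete with a small Python sketch of real-time recurrent learning. It is an illustration under assumptions, not the authors' code: the logistic sigmoid as $f$, uniform random initial weights in [-0.1, 0.1], the learning rate, and all class and parameter names (`RTRN`, `step`, `lr`, `seed`) are chosen here for demonstration.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


class RTRN:
    """Sketch of a Williams-Zipser real-time recurrent network.

    N input nodes, K fully recurrent hidden neurons; the first M hidden
    neurons are the network outputs. W[i][j] connects element j of the
    extended vector [1, x_1..x_N, y_1(t-1)..y_K(t-1)] to hidden neuron i.
    """

    def __init__(self, n_in, k_hidden, m_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.N, self.K, self.M, self.lr = n_in, k_hidden, m_out, lr
        self.D = 1 + n_in + k_hidden  # length of the extended activation vector
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(self.D)]
                  for _ in range(k_hidden)]
        self.y = [0.0] * k_hidden  # context layer: y(t-1)
        # p[k][a][b] = dy_k / dw_ab, propagated forward in time.
        self.p = [[[0.0] * self.D for _ in range(k_hidden)]
                  for _ in range(k_hidden)]

    def step(self, x, d=None):
        """Advance one time step; if targets d are given, update W online."""
        z = [1.0] + list(x) + self.y              # extended activation vector
        u = [sum(w * zj for w, zj in zip(row, z)) for row in self.W]
        y = [sigmoid(ui) for ui in u]             # y_i(t) = f(u_i(t))
        fp = [yi * (1.0 - yi) for yi in y]        # f'(u_i) for the sigmoid
        # Sensitivity recursion, computed with the current (not yet updated) W:
        # dy_k/dw_ab = f'(u_k) [delta_ka z_b + sum_j w_{k,N+j} dy_j(t-1)/dw_ab]
        p = [[[0.0] * self.D for _ in range(self.K)] for _ in range(self.K)]
        for k in range(self.K):
            for a in range(self.K):
                for b in range(self.D):
                    s = z[b] if k == a else 0.0
                    for j in range(self.K):
                        s += self.W[k][1 + self.N + j] * self.p[j][a][b]
                    p[k][a][b] = fp[k] * s
        if d is not None:
            e = [y[k] - d[k] for k in range(self.M)]  # e_k(t) = y_k(t) - d_k(t)
            for a in range(self.K):
                for b in range(self.D):
                    grad = sum(e[k] * p[k][a][b] for k in range(self.M))
                    self.W[a][b] -= self.lr * grad
        self.p, self.y = p, y
        return y[:self.M]
```

The nested loops cost on the order of $K^3(N + K)$ operations per time step, which illustrates why a fully connected network with K equal to the number of pixels quickly becomes impractical.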
The pixels of each input frame form the input signal flow for the input nodes of the network. The number of input nodes N is therefore set equal to the number of pixels per frame.
Since the network is expected to output a processed frame, the number of hidden neurons is set to K = N. The output of the hidden layer is the output of the neural network (so all K hidden neurons are outputs, M = K) and also provides the input signals for the delay elements $z^{-1}$ (the context layer).
Each hidden neuron receives a signal from every input node and every delay element of the context layer. Thus, at time t each hidden neuron has information about the pixels of the current frame and of the processed (previous) frame from time $t-1$. Based on the pixel information of the current and previous (processed) frames, each hidden neuron emits a signal.
At the initial time $t_0$, when the delay elements do not yet hold a processed frame, the first frame of the video image enters the network. At the same time, a learning process (adjustment of the network weights) takes place, which aims to bring the network into a state of equilibrium.
Equilibrium is achieved when input and processed frames are identical, i.e. there is no movement in the video image.
When motion appears in the video image (that is, two consecutive frames differ), the trained neural network leaves equilibrium, and this is reflected in the processed frame. In the next step, the network attempts to return to equilibrium based on the processed frame. This process continues as long as movement is present in the video.
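The equilibrium behavior described above can be imitated with a much simpler, non-learned stand-in, shown here only to make the mechanism tangible. The sketch replaces the trained RTRN with an exponential running average, the classic background-subtraction baseline: each output value relaxes toward its input pixel at rate `alpha`, and a pixel is marked as moving while the input departs from the previous output by more than `tau`. Both parameters and the function name are hypothetical.

```python
def detect_motion(frames, alpha=0.5, tau=0.1):
    """Yield a binary motion mask for each grayscale frame (2-D lists in [0, 1]).

    Simplified stand-in (assumption): instead of a trained recurrent network,
    each output value y relaxes toward its input x with rate alpha; a pixel is
    marked as moving while |x(t) - y(t-1)| > tau, i.e. while the "network" is
    out of equilibrium.
    """
    state = None
    for frame in frames:
        if state is None:
            # Initial time t0: the output equals the input (equilibrium).
            state = [row[:] for row in frame]
        mask = [[1 if abs(x - s) > tau else 0 for x, s in zip(frow, srow)]
                for frow, srow in zip(frame, state)]
        # Relax toward the new equilibrium defined by the current frame.
        state = [[s + alpha * (x - s) for x, s in zip(frow, srow)]
                 for frow, srow in zip(frame, state)]
        yield mask
```

After a change, the mask stays set for a few frames because the state needs several steps to reach the new equilibrium; once input and output agree again, the pixel is no longer reported as moving.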
One possible improvement of the proposed recurrent network model for a motion detector is to reduce the redundancy of the connections between the hidden layer of neurons and both the input layer and the context layer. This redundancy makes it difficult to train the network and to process high-resolution video images. It is proposed to reduce the number of connections by binding each hidden neuron only to the corresponding input element (pixel) and to the elements adjacent to that pixel. The neighborhood can contain 4 or 8 adjacent pixels, corresponding to 4-connectivity or 8-connectivity (Figure 4).
This reduction in redundancy would decrease the learning time of the network. With fewer connections, fewer computational operations are required, which lowers the demands on the computational resources of the motion detector. The processing time of each frame and the detector's reaction time to movement are also reduced, which makes the proposed technique applicable to high-frame-rate video.
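The effect of the reduced connectivity can be checked with a short count. The Python sketch below, with hypothetical function names, lists the 4- or 8-connected neighborhood of a pixel (including the pixel itself, clipped at the frame border) and counts the hidden-layer connections to the input and context layers, ignoring bias weights. It assumes the same neighborhood is used in both layers.

```python
def neighbours(x, y, width, height, connectivity=4):
    """Return the pixel itself plus its 4- or 8-connected neighbours inside the frame."""
    if connectivity == 4:
        offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    elif connectivity == 8:
        offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    return [(x + dx, y + dy) for dx, dy in offsets
            if 0 <= x + dx < width and 0 <= y + dy < height]


def connection_count(width, height, connectivity=None):
    """Count hidden-neuron connections to the input and context layers (no bias).

    connectivity=None means full connectivity with K = N, as in the original model.
    """
    n = width * height
    if connectivity is None:
        return n * (n + n)  # each of K = N neurons sees N inputs and K context elements
    local = sum(len(neighbours(x, y, width, height, connectivity))
                for y in range(height) for x in range(width))
    return 2 * local  # same neighbourhood in the input and context layers
```

As an illustrative example, a 640x480 frame has N = 307,200 pixels, so full connectivity needs $2N^2 \approx 1.9 \times 10^{11}$ connections, while the 8-connected variant needs at most $18N \approx 5.5 \times 10^6$.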

Structure and Operation Algorithm of the Automated System
The automated system was designed and developed according to the principle of modularity: it consists of several software modules that are relatively independent and easy to integrate with each other. This allows changes to be made to one module without affecting the others. The structure of the automated system is presented in Figure 5. The first module is a graphical interface for setting the system parameters and displaying the results. It was created on the Microsoft .NET Core 3.1 platform using Windows Forms technology for the user interface and for interaction with data models. The user selects and changes the source of the video sequence and specifies the output format (display on screen or writing to a file). As a result, this module forms a set of parameters that are used by the other modules of the software complex.

The second module performs pre- and post-processing of the video sequence and interacts closely with the other modules of the system. Pre-processing consists of obtaining a video image from the source and preparing it for transmission to the recurrent neural network module. Post-processing consists of displaying the processed data on the user's screen or writing it to a file.
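The modular structure can be summarized as three narrow interfaces: a frame source, a frame processor, and a frame sink. The sketch below is written in Python rather than the system's C#, and every interface and class name in it (`FrameSource`, `FrameProcessor`, `FrameSink`, `ListSource`, `FrameDifference`, `ListSink`) is hypothetical. The simple frame-difference processor is only a placeholder for the recurrent network module.

```python
from typing import Iterable, List, Protocol

Frame = List[List[float]]  # hypothetical grayscale frame representation


class FrameSource(Protocol):
    def frames(self) -> Iterable[Frame]: ...


class FrameProcessor(Protocol):
    def process(self, frame: Frame) -> Frame: ...


class FrameSink(Protocol):
    def write(self, frame: Frame) -> None: ...


def run_pipeline(source: FrameSource, processor: FrameProcessor, sink: FrameSink) -> int:
    """Pull frames from the source, process each one, push it to the sink; return the count."""
    count = 0
    for frame in source.frames():
        sink.write(processor.process(frame))
        count += 1
    return count


class ListSource:
    """Hypothetical in-memory source standing in for a camera or a video file."""

    def __init__(self, frames: List[Frame]):
        self._frames = frames

    def frames(self) -> Iterable[Frame]:
        return iter(self._frames)


class FrameDifference:
    """Hypothetical placeholder processor: absolute difference to the previous frame."""

    def __init__(self):
        self._prev = None

    def process(self, frame: Frame) -> Frame:
        prev = self._prev if self._prev is not None else frame
        self._prev = frame
        return [[abs(a - b) for a, b in zip(r, p)] for r, p in zip(frame, prev)]


class ListSink:
    """Hypothetical sink standing in for the screen or an output file."""

    def __init__(self):
        self.frames: List[Frame] = []

    def write(self, frame: Frame) -> None:
        self.frames.append(frame)
```

Because each module depends only on these interfaces, a camera source, a file source, or a different detector can be swapped in without touching the others, which is the property the modular design aims for.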
The third module implements the recurrent neural network. Its input receives a sequence of frames of the original video, and its output is the sequence of processed frames of the original video showing the moving objects.
As Figure 5 shows, the system makes extensive use of the Emgu CV (OpenCV) libraries [16], because this computer vision package provides a wide range of well-developed and well-organized image processing classes and methods. The system also uses the DirectShowNet library to work with video input devices and the Tiger.Video.VFW library to work with video files.

Recurrent Neural Network Module
The recurrent neural network module is one of the components of the automated system being developed. Its task is to process the incoming video sequence to select moving objects. Processing is performed by a recurrent neural network [14] [16].
The simulation result of the detector based on the recurrent network model is given in Figure 6. The recurrent neural network processes each incoming frame according to the algorithm presented in Figure 7. After a frame is received, a signal flow is formed from the brightness values of its pixels.
The input signal flow is processed after a delay signal is received indicating that all signals have arrived and can be processed. Processing takes as input both the input signal flow and the output signal flow produced while processing the previous frame of the sequence. At the initial time, the output flow is set equal to the input flow.
During processing, the recurrent neural network forms an output signal flow, which is fed back to the network input (through the context layer) and also converted into an output frame. The converted frame is passed to the pre- and post-processing module for further use.
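The per-frame algorithm above can be sketched in Python as follows. The grayscale conversion uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which OpenCV also applies in its RGB-to-gray conversion; whether the system uses exactly these weights is an assumption. The `network_step` callable is a hypothetical placeholder for the recurrent network: it receives the current input flow and the previous output flow and returns the new output flow.

```python
def to_signal_flow(rgb_frame):
    """Flatten an RGB frame (rows of (r, g, b) tuples in 0..255) into brightness signals in [0, 1]."""
    return [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0
            for row in rgb_frame for (r, g, b) in row]


def to_frame(flow, width):
    """Reshape a flat signal flow back into the rows of an output frame."""
    return [flow[i:i + width] for i in range(0, len(flow), width)]


class FrameLoop:
    """Per-frame processing loop.

    network_step is a hypothetical callable (input_flow, previous_output_flow)
    -> output_flow standing in for the recurrent neural network.
    """

    def __init__(self, network_step):
        self.network_step = network_step
        self.prev_out = None

    def process(self, rgb_frame, width):
        flow = to_signal_flow(rgb_frame)
        if self.prev_out is None:
            self.prev_out = flow  # initial time: output flow equals the input flow
        out = self.network_step(flow, self.prev_out)
        self.prev_out = out  # fed back as the context for the next frame
        return to_frame(out, width)
```

For example, with a placeholder step that returns the absolute difference between the two flows, the first frame yields an all-zero output, and later frames show the pixels that differ from the previous output.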
The network receives its input frame from the video sequence pre- and post-processing module and returns the output frame to that module after processing by the recurrent neural network.
Input and output frames are objects of type System.Drawing.Bitmap. All modules are written in C# in Microsoft Visual Studio Enterprise 2019.
The output of the module is a frame sequence in which only the selected moving objects are shown. An example of the module's results is presented in Figure 8.

Video Sequence Pre- and Post-Processing Module
This module obtains the input video sequence from different sources (camera or file). Functions of the DirectShow multimedia framework and application programming interface (Figure 9) are used to obtain a video sequence from a capture device (camera). DirectShow allows Windows applications to manage a wide range of audio/video input/output devices, including DV and web cameras, DVD devices, TV tuners, etc.
Capture filters are used to obtain a video sequence from capture devices. They inject multimedia data into the processing stream from various physical devices, which can be video devices (camcorders, webcams, TV tuners) or audio devices (microphone, modem line); data can also be obtained from files (AVI, MPEG, MP3). If a capture device is selected as the video sequence source, this information is passed via API functions to the DirectShow filter core, so DirectShow must be installed on the computer where the developed system is used.
If an existing file is selected as a video sequence source, this information is transmitted via API functions to the Tiger.Video.VFW package. The Tiger.Video.VFW package is intended for reading and writing AVI files. The package has a set of features implemented through low-level access to the Windows file system. This makes it possible to quickly work with video files as a source of a video sequence for further processing by the system.

Conclusions
The paper describes promising research directions in motion detection using artificial neural networks.
The models of impulse and recurrent neural networks for the detection and [...] The processing time of each frame and the detector's reaction time to motion will also be reduced, which supports applying the proposed detector and methodology to automate the detection and selection of moving objects in high-frame-rate video.
The suggested approach for detecting and separating moving objects is an attempt to simulate the ability of the human eye to quickly isolate moving objects and to surpass existing deterministic methods in the speed of selecting moving objects and in the economy of computing resources. The motion detector developed on the basis of this approach, as a software module, can find application in digital video processing. It is intended to be used in automated traffic management systems [17] [18] [19] as an alternative to existing detectors, even taking into account the possible improvement of the latter through parallel computation, i.e., simultaneous processing of segments of the video image and selection of moving objects within each of them [20].
The disadvantage of the proposed approach to motion detection in video is that the number of neurons grows in proportion to the image resolution, and with full connectivity the number of connections grows quadratically, which causes noticeable delays in processing the signal flow of the neurons.
Impulse neural network elements can be implemented in hardware [21] or in software using modern parallel computing technologies based on graphics processors (GPUs). This can significantly accelerate the selection of moving objects in a video image.
As a result of the work, an automated system for separating moving objects in a video sequence based on a recurrent neural network has been created and is ready for practical use. The system consists of three independent modules: the graphical interface, the recurrent neural network module, and the video sequence pre- and post-processing module. Its structure follows the principle of modularity, which allows the modules to be developed independently of each other and allows future changes to any module without affecting the others.
The paper explored the possibility of integrating programs written in object-oriented programming languages (in particular C#) with the Emgu CV image processing package and with the functions of the DirectShow multimedia framework and programming interface, and identified the most efficient Emgu CV tools for image processing.
The developed automated system for selecting moving objects uses a graphics core for video stream processing based on the Emgu CV package, namely its functions for converting images to grayscale. The drawback of this approach is that the Emgu CV libraries must be available on the computer where the automated system is used [16]. The proposed automated system is a software product that separates moving objects in a video stream using recurrent neural networks. It can be upgraded to improve the detection of moving objects by modifying the recurrent neural network module.
The proposed approach and software for the detection of moving objects in video images using neural networks can be incorporated into more sophisticated specialized computer-aided video surveillance systems, IoT, IoV, etc.
Future work may extend the proposed approach and motion detector to global traffic management, including the following tasks: providing city services with up-to-date information on the current state of the transport infrastructure through a measurement system based on IoT, Industrial Internet of Things (IIoT), and IoV technologies; implementing adaptive control of traffic lights, bollards, and other road infrastructure devices based on traffic congestion data, in order to redirect and synchronize traffic flows and prevent traffic jams; implementing intelligent systems for forecasting the transport situation in the city and for optimizing infrastructure development costs by predicting the outcomes of decisions; using intelligent systems to model the transport situation when designing roads, intersections, crossings, traffic lights, and other facilities; implementing monitoring systems that track the location and current state of urban transport, and creating "smart" stops; creating systems, including mobile applications, that inform citizens about current road congestion and parking availability (free spaces); synchronizing different types of transport (buses, subways, trams, etc.) to reduce the time passengers spend on transfers; introducing unmanned road and rail transport, including the metro; automating the management and accounting of contractors engaged in snow removal on roads, crossings, and stops, with adaptive scheduling of cleaning that takes the current weather and traffic situation into account; using unmanned aerial vehicles (drones) to monitor traffic, the quality of snow removal on roads and crossings, road surface quality, etc.; and moving to wireless technologies (communications, power supply).