Parallel Cascade Correlation Neural Network Methods for 3D Facial Recognition: A Preliminary Study

This paper explores the possibility of using a multi-core programming model that implements the cascade correlation neural network (CCNN) technique to enhance the classification phase of a 3D facial recognition system, after robust and distinguishable features have been extracted. It provides a comprehensive summary of 3D facial recognition systems, as well as the state of the art in parallel cascade correlation neural network (PCCNN) methods. Moreover, it highlights the lack of literature combining the distributed and shared memory models, which opens up the possibility of exploiting the strengths of both approaches in order to construct an efficient parallel computing system for 3D facial recognition.


Introduction
Parallel computing is the simultaneous execution of the same task on multiple processors in order to obtain results quickly [1]. Its advantage is that a task is divided into many sub-tasks that are processed by more than one processor simultaneously, resulting in a significant reduction in response time. Parallel computing plays a critical role in many application areas, especially those that require processing large amounts of data, such as image processing, filtering, data visualization, and segmentation [2]. On the other hand, automatic 3D pattern recognition is considered a very hard problem because of its non-linearity [3]. In particular, it is posed as a template matching challenge in which recognition must be performed in a high-dimensional space [3]. More computation is therefore needed to find a match, which can be addressed by using a dimensionality reduction technique to project the problem into a lower-dimensional space. Facial recognition based on 3D information is relatively new in terms of literature, algorithms, commercial applications, and datasets used for experimentation [3]. In addition, comparing different 3D facial recognition techniques is very challenging for a number of reasons. Firstly, there are very few standardized 3D facial databases that are used for benchmarking purposes. Secondly, there are differences in the experimental setups and in the metrics used to evaluate the performance of pattern recognition techniques [3]. Cascade correlation neural networks (CCNNs) can be considered a good solution to the computational problems of pattern recognition; one of their advantages is that they can reduce misclassifications among neighboring classes.
The rest of the paper is organized as follows. Section 2 looks at the state of the art and background of 3D feature extraction and facial recognition. Section 3 presents the approaches to parallel processing and the related work on facial recognition systems. Section 4 introduces the literature on cascade correlation neural networks and their application in parallel systems. Section 5 summarizes this research and gives future directions for our work.

3D Facial Recognition Studies
According to Al-Qatawneh [3], research on 3D facial recognition is limited in comparison with the wealth of research on 2D facial recognition, even though a number of investigations have demonstrated how geometric pattern structure can be used to aid recognition [3]. Nevertheless, 3D facial recognition has attracted more attention in recent years because of two major factors. The first is the inherent problems of 2D facial recognition systems, which appear to be very sensitive to facial pose variation, varying facial expressions, lighting and illumination [3]. For example, Xu et al. [4] compared 2D intensity images against depth images and concluded that depth maps give a more robust face representation, because intensity images are significantly affected by changes in illumination [3]. The second is the recent development of 3D acquisition techniques, such as 3D scanners, infrared and other technologies, which have made obtaining 3D data much easier than before. For 3D face recognition applications, two main requirements have to be met. The first is a powerful representation modelling technique for 3D facial images. The second is a matching algorithm or criterion to recognize and distinguish between these models. While the second requirement has been the subject of extensive investigation and research, the first is still considered an open research area [3]. Bowyer et al. [5] covered this topic in detail by presenting a comparative survey of 3D face recognition algorithms. They concluded that 3D face recognition has the potential to overcome the limitations of its 2D counterpart. In particular, the 3D shape data of a face could be used to correct the corresponding 2D facial image, taken with a non-standard pose, to a standard pose [3].
As explained in our previous work [3], Nagamine et al. [6] tackled face recognition by exploring facial profiles. They used horizontal sections (extracted as the intersection of the face surface with a plane parallel to the X-Z plane), vertical sections (extracted as the intersection of the face surface with a plane parallel to the Y-Z plane) and circular cross sections (extracted as the intersection of the face surface with a cylinder whose axis lies on the Y-Z plane and is parallel to the Z-axis). They extracted five feature points and used them to standardize the face pose. Faces were compared by Euclidean distance matching between the feature vectors of different faces. It was concluded that vertical profiles that pass through the central region of the face give better recognition rates, and that circular sections which cross near the eyes and part of the nose also show some distinctiveness, while the distinctiveness of the horizontal profiles is not remarkable in itself [3].
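The Euclidean distance matching step mentioned above can be illustrated with a minimal sketch. It is not the implementation of Nagamine et al.; the function names, the gallery layout and the feature values are hypothetical and chosen only to show nearest-neighbour matching between feature vectors.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_identity(probe, gallery):
    """Return the (label, distance) pair of the gallery vector closest to the probe."""
    return min(
        ((label, euclidean_distance(probe, vec)) for label, vec in gallery.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical profile features, invented purely for illustration.
gallery = {
    "subject_a": [0.0, 1.0, 2.0],
    "subject_b": [3.0, 4.0, 5.0],
}
label, dist = nearest_identity([0.0, 1.0, 3.0], gallery)
```

With these invented values the probe is matched to `subject_a`, since its vector differs from the probe in only one component.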
In addition, Hasher et al. [7] [8] used PCA and Independent Component Analysis (ICA) to analyze range images in a similar way to 2D intensity images and estimated probability models for the coefficients. For registration and pose standardization they used the nose tip and the nose bridge [3]. They used a database of 37 individuals with images of 6 different facial expressions for each [7].
Elyan and Ugail [9] presented a method to determine the symmetry profile of the face. They computed the intersection between the symmetry plane and the facial mesh and then computed a few feature points along the symmetry profile in order to locate the central region of the face and extract a set of profiles from that region. In this approach, they assume that the symmetry profile passes through the tip of the nose. To locate the tip of the nose they fit a bilinearly blended Coons surface patch. A Coons patch is simply a parametric surface defined by four given boundary curves [9]. The four boundaries of the Coons patch are determined from a boundary curve that encloses an approximated central region of interest, which is simply the region of the face that contains or is likely to contain the nose area [3].
Considering all this prior work, [3] asserted that there remain a number of areas that 3D face recognition research needs to address. For registration, automatic landmark localization, artefact removal, scaling, and the elimination of errors due to occlusions, glasses, beards, etc. need to be worked out. Additionally, methods of deforming the face without loss of discriminative information would also be beneficial. It is likely that information fusion is the future of 3D face recognition [3]. There are many ways of representing and combining texture and shape information. Publicly available 3D datasets are necessary to encourage further research on these topics [3].
Table 1 gives a comparison of selected elements of algorithms that use 3D facial data to recognize faces.

Parallel Processing Studies
Parallel computing means the simultaneous execution of the same task on multiple processors in order to obtain results quickly [1]. The benefit of this method is that a task is divided into many subtasks that are processed by more than one processor simultaneously, resulting in a significant reduction in response time. Parallel computing plays a critical role in many application areas, especially those that require processing large amounts of data, such as weather forecasting, data visualization, biology and engineering [29].
Parallel processing computers can be classified from many perspectives, such as implicit and explicit parallelism. Implicit parallelism is built into parallel languages and parallelizing compilers; the programmer does not specify or control the scheduling of calculations. With explicit parallelism, by contrast, the programmer is responsible for tasks such as task decomposition, synchronization and communication. Another classification, proposed by Flynn in 1966, is based on instruction streams and data streams. Flynn proposed a four-way classification of parallel computers: SISD (Single Instruction Single Data), SIMD (Single Instruction Multiple Data), MISD (Multiple Instruction Single Data), and MIMD (Multiple Instruction Multiple Data) [1].
In 1988, E. E. Johnson proposed a new taxonomy based on memory structure, such as shared/global memory or distributed memory [1]. Parallel computation inherently involves communication between processors, and a common mechanism used for communication and synchronization is referred to as message passing. Fortunately, many message-passing libraries have been developed to provide routines that initiate and configure the messaging environment as well as send and receive packets of data between processors. The two most popular message-passing libraries are Parallel Virtual Machine (PVM) [30] and Message Passing Interface (MPI) [30], while the most popular shared address space paradigms are POSIX Threads [30] and OpenMP [30], as illustrated in Figure 1.
Recently, researchers have tried to use general-purpose computation on graphics processing units (GPGPU) as a parallel programming approach. GPGPU refers to techniques for programming GPU chips through application programming interfaces (APIs) such as OpenGL, Direct3D and CUDA [31] in order to obtain results quickly. GPUs are highly threaded streaming multiprocessors with very high computation and data throughput [32]. In 2006, NVIDIA created CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model implemented by its GPUs. CUDA has been widely deployed in thousands of applications and published research papers in areas such as astronomy, biology, chemistry, physics, and data mining, supported by an installed base of over 300 million CUDA-enabled GPUs in notebooks, workstations, compute clusters and supercomputers [33].
There are several mechanisms for parallelizing the processing of large amounts of data: function decomposition, data decomposition, or both. Data decomposition consists of horizontal distribution, vertical distribution, or both. Hybrid parallelism combines data parallelism, with horizontal or vertical distribution, and function decomposition [1], as shown in Figure 2. MATLAB [34] is a widely used programming language and environment for analyzing data, developing algorithms, and creating models and applications. In recent times, MATLAB has gained much popularity and has been applied in many fields, including image processing, bioinformatics, engineering, medicine, signal processing, communications and parallel computing. Additionally, MATLAB has built-in parallel computing functions that let the researcher solve computationally intensive and data-intensive problems using multi-core processors, GPUs, and computer clusters [34]. MATLAB multi-core programming is implemented on the basis of threads. Threads are lightweight processes, and threaded programs are comparatively easy to write because threaded applications that run on a single machine can also run on multiple machines without changes. This ability to migrate programs between different platforms is a great benefit of threaded APIs. Additionally, threads that run on the same processor help to hide the latency of memory access, I/O and communication.
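The two forms of data decomposition can be sketched as follows. This is a minimal illustration that assumes horizontal distribution means splitting a data matrix by rows (records) and vertical distribution means splitting it by columns (attributes); the source does not define the terms at this level, and the function names are hypothetical.

```python
def _block_sizes(total, parts):
    """Sizes of `parts` contiguous blocks covering `total` items as evenly as possible."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def horizontal_split(rows, parts):
    """Assumed horizontal distribution: each worker gets a block of whole rows."""
    blocks, start = [], 0
    for size in _block_sizes(len(rows), parts):
        blocks.append(rows[start:start + size])
        start += size
    return blocks


def vertical_split(rows, parts):
    """Assumed vertical distribution: each worker gets a block of columns of every row."""
    n_cols = len(rows[0]) if rows else 0
    blocks, start = [], 0
    for size in _block_sizes(n_cols, parts):
        blocks.append([row[start:start + size] for row in rows])
        start += size
    return blocks


data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
by_rows = horizontal_split(data, 2)
by_cols = vertical_split(data, 2)
```

Under these assumptions, a hybrid scheme would apply one of the two splits and additionally assign different functions (processing stages) to different workers.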
The parallel performance of a given application can be measured in a number of ways, such as Amdahl's law, speedup ($S$), efficiency ($E$) and overhead. The serial runtime of a program ($T_s$) is the time elapsed between the beginning and the end of its execution on a sequential machine. The parallel runtime ($T_p$) is the time that elapses from the moment a parallel computation starts to the moment the last processor finishes execution. The overhead function ($T_o$), or total overhead of a parallel system, is the total time collectively spent by all $p$ processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element, as given by Equation (1):

$$T_o = p\,T_p - T_s \qquad (1)$$

The measure that captures the relative benefit of solving the problem in parallel is called speedup ($S$). It is defined as the ratio of the time taken to solve a problem on a single processing element to the time required to solve the same problem on a parallel computer with $p$ identical processing elements, as given by Equation (2):

$$S = \frac{T_s}{T_p} \qquad (2)$$

The efficiency ($E$) is a measure of the fraction of time for which a processing element is usefully employed; it is defined as the ratio of speedup to the number of processing elements, as given by Equation (3):

$$E = \frac{S}{p} \qquad (3)$$
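The performance measures defined above translate directly into code. The sketch below is illustrative only: the timings are invented, and the Amdahl's law helper uses the standard form of the law with an assumed symbol, `parallel_fraction`, for the share of the serial runtime that can be parallelized.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial runtime to parallel runtime."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Speedup divided by the number of processing elements p."""
    return speedup(t_serial, t_parallel) / p


def total_overhead(t_serial, t_parallel, p):
    """Time spent by all p processing elements beyond the serial runtime."""
    return p * t_parallel - t_serial


def amdahl_speedup(parallel_fraction, p):
    """Upper bound on speedup when only `parallel_fraction` of the work scales with p."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)


# Invented timings (seconds) for a run on four processing elements.
t_s, t_p, p = 100.0, 30.0, 4
s = speedup(t_s, t_p)
e = efficiency(t_s, t_p, p)
t_o = total_overhead(t_s, t_p, p)
bound = amdahl_speedup(0.9, p)
```

For these made-up numbers the speedup is about 3.33, the efficiency about 0.83 and the total overhead 20 seconds, while Amdahl's law with a 90% parallel fraction caps the speedup on four processing elements at about 3.08.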

Parallel Neural Networks for Facial Recognition
The equipment and capability needed for research on automatic face recognition systems have emerged over the last few decades. One reason for the growing interest in this topic is the wide range of possible applications for face recognition systems. Neural networks can be trained for this task, but training is time consuming. Substantial effort has therefore been dedicated to reducing the training time in different fields and on different hardware architectures. For example, Altaf et al. [35] proposed two techniques for parallelizing back-propagation neural networks: multi-core programming based on multithreading, and general-purpose computation on graphics processing units (GPGPU). Xavier et al. [36] presented the parallel training of a back-propagation neural network using CUDA on a GPU; the implementation was tested with two standard benchmark data sets provided by PROBEN1. Pethick et al. [37] presented two parallelization strategies on a cluster computer, training-example parallelism and node parallelism, using MPI. Lyle et al. [38] proposed scalable, massively parallel artificial neural networks for pattern recognition applications. In this approach, MPI is used to parallelize the C++ code, and each layer is distributed equally over all processors, which means that the data decomposition approach is used to parallelize the algorithm. Dahl et al. [39] presented a further version of a parallel neural network that uses the task decomposition paradigm on a cluster system, duplicating the full neural network at each cluster node; their system was implemented using the MPI library. Schuessler et al. [40] proposed a method for parallelizing neural network training based on the back-propagation algorithm and implemented it using two different multithreading techniques (OpenMP and POSIX threads) applicable to the current and next generation of multithreaded and multi-core CPUs.
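Training-example parallelism, in which every worker holds a full copy of the network and processes part of the training set, can be sketched as below. This is a deliberately small illustration and not the method of any of the cited works: it assumes a single linear unit with squared error, and it uses Python threads where the cited systems use MPI, OpenMP or POSIX threads. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(weights, shard):
    """Summed squared-error gradient of a linear unit over one shard of (x, target) pairs."""
    grad = [0.0] * len(weights)
    for x, target in shard:
        error = sum(w * xi for w, xi in zip(weights, x)) - target
        for i, xi in enumerate(x):
            grad[i] += error * xi
    return grad


def parallel_gradient(weights, examples, n_workers):
    """Split the examples across workers, compute partial gradients, then sum them."""
    shards = [examples[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(weights, s), shards))
    # Reduction step: add the partial gradients component by component.
    return [sum(component) for component in zip(*partials)]


# Invented training examples: (input vector, target).
examples = [([1.0, 2.0], 1.0), ([0.0, 1.0], 0.0), ([2.0, 0.0], 3.0)]
weights = [0.5, -0.5]
grad = parallel_gradient(weights, examples, n_workers=2)
```

Because the partial gradients are simply added, the result matches the gradient computed over the whole training set on one worker. In CPython, threads running pure Python code do not execute on several cores at once, so a real speedup would need processes or one of the libraries named above.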

Cascade Correlation Neural Networks
The training of back-propagation neural networks is considered to be a slow process because of the step-size and moving-target problems [3] [41]. Cascade correlation neural networks were developed to overcome these problems. These are "self-organizing" networks [3] [41] whose topologies are not fixed. Supervised training begins with a minimal network topology, and new hidden nodes are added incrementally to create a multi-layer structure. Each new hidden node is trained to maximize the correlation between its output and the residual error signal that the system is trying to eliminate. The weights of a new hidden node are then fixed and not changed later, which makes the node a permanent feature detector in the network. This feature detector can then be used to generate outputs or to create other, more complex feature detectors [3] [41].
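The cascaded topology, in which each frozen hidden node can feed later nodes as well as the outputs, can be sketched as a forward pass. The sketch assumes tanh hidden units and linear output units, which the text does not specify, and all names and weights are hypothetical.

```python
import math


def cascade_forward(inputs, hidden_weights, output_weights):
    """Forward pass through a cascade topology.

    hidden_weights[k] lists the weights of hidden node k: one per input,
    one per earlier hidden node, then a bias (len(inputs) + k + 1 values).
    output_weights[j] lists one weight per input, one per hidden node, then a bias.
    """
    activations = list(inputs)
    for k, weights in enumerate(hidden_weights):
        if len(weights) != len(inputs) + k + 1:
            raise ValueError("hidden node %d has the wrong number of weights" % k)
        # zip stops at the last activation; the trailing weight is the bias.
        net = sum(w * a for w, a in zip(weights, activations)) + weights[-1]
        activations.append(math.tanh(net))  # assumed activation function
    outputs = []
    for weights in output_weights:
        if len(weights) != len(activations) + 1:
            raise ValueError("output node has the wrong number of weights")
        outputs.append(sum(w * a for w, a in zip(weights, activations)) + weights[-1])
    return outputs


# One input, two cascaded hidden nodes, one output (weights chosen for illustration).
out = cascade_forward(
    [0.5],
    hidden_weights=[[1.0, 0.0], [0.0, 1.0, 0.0]],
    output_weights=[[0.0, 0.0, 1.0, 0.0]],
)
```

In this example the second hidden node sees only the output of the first, so the network output equals tanh(tanh(0.5)), which shows how later nodes build on earlier feature detectors.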
In a CCNN, the number of input nodes is determined by the input features, while the number of output nodes is determined by the number of different output classes [3]. The training of a CCNN starts with no hidden nodes. The direct input-output connections are trained using the entire training set with the aid of the back-propagation learning algorithm [3]. Hidden nodes are then added gradually, and every new node is connected to every input node and to every pre-existing hidden node. The input weights of a candidate node are adjusted to maximize $S$, the sum over all output units $o$ of the magnitude of the correlation between $V$, the candidate unit's value, and $E_o$, the residual output error observed at unit $o$. $S$ can be defined as:

$$S = \sum_{o} \left| \sum_{p} \left(V_p - \bar{V}\right)\left(E_{p,o} - \bar{E}_o\right) \right|$$

where $o$ is the network output at which the error is measured and $p$ is the training pattern. The quantities $\bar{V}$ and $\bar{E}_o$ are the values of $V$ and $E_o$ averaged over all patterns. Training is carried out using the training vector, and the weights of the new hidden nodes are adjusted after each pass [41]. Cascade correlation networks have a number of attractive features, including a very fast training time, often a hundred times faster than a perceptron network [41]. This makes cascade correlation networks suitable for use with large training sets.
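The correlation measure S described above can be computed directly from a candidate's values and the residual errors. The following is a minimal sketch in plain Python; the argument layout (one candidate value per pattern, one row of residual errors per pattern) and the sample numbers are assumptions made for illustration.

```python
def correlation_score(candidate_values, residual_errors):
    """Sum over outputs o of |sum over patterns p of (V_p - mean V)(E_po - mean E_o)|.

    candidate_values[p] is the candidate's value for pattern p.
    residual_errors[p][o] is the residual error at output o for pattern p.
    """
    n_patterns = len(candidate_values)
    if n_patterns == 0 or len(residual_errors) != n_patterns:
        raise ValueError("need one row of residual errors per pattern")
    v_mean = sum(candidate_values) / n_patterns
    score = 0.0
    for o in range(len(residual_errors[0])):
        e_mean = sum(row[o] for row in residual_errors) / n_patterns
        covariance = sum(
            (v - v_mean) * (row[o] - e_mean)
            for v, row in zip(candidate_values, residual_errors)
        )
        score += abs(covariance)  # the magnitude counts, not the sign
    return score


# Three patterns, two outputs (invented values).
values = [0.0, 1.0, 2.0]
errors = [[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
s = correlation_score(values, errors)
```

Here the candidate is positively correlated with the error at the first output and negatively correlated with the error at the second; both contribute equally to S because only the magnitude is summed.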
Depending on the application and number of input nodes, cascade correlation networks are fairly small, often having fewer than a dozen neurons in the hidden layer [42] [43].This can be contrasted with probabilistic neural networks which require a hidden-layer neuron for each training case.Also, the training of CCNNs is quite robust, and good results can usually be obtained with little or no adjustment of parameters [41].

Parallel Cascade Correlation Neural Networks
Fahlman et al. [41] claim that the cascade correlation algorithm is attractive for parallel implementation because the candidate units do not need to interact, which means that they can be trained independently. Few research efforts, however, have reported parallel cascade correlation neural networks, and those that exist come from different fields. David German [31] put forward a project proposal on computing hardware for the accelerated training of cascade correlation neural networks by parallelizing multiply-accumulate operations on a field-programmable gate array (FPGA) in communication with a host PC. However, the proposal does not describe the parallel technique in detail, nor does it state the benchmarks, such as speedup and overhead, used to evaluate the method.
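The independence of candidate units can be illustrated with a small sketch in which each candidate is trained by its own worker and the best one is kept. Several assumptions apply: candidates are tanh units, training is replaced by a simple seeded hill climb rather than the gradient-based procedure of [41], and Python threads stand in for a real parallel backend. All names and data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def correlation_score(values, errors):
    """Sum over outputs of the magnitude of the covariance with the residual error."""
    n = len(values)
    v_mean = sum(values) / n
    total = 0.0
    for o in range(len(errors[0])):
        e_mean = sum(row[o] for row in errors) / n
        total += abs(sum((v - v_mean) * (row[o] - e_mean)
                         for v, row in zip(values, errors)))
    return total


def candidate_outputs(weights, patterns):
    """Value of a tanh candidate unit for every training pattern."""
    return [math.tanh(sum(w * x for w, x in zip(weights, p))) for p in patterns]


def train_candidate(seed, patterns, errors, steps=200, step_size=0.1):
    """Hill-climb one candidate's weights; no state is shared with other candidates."""
    rng = random.Random(seed)  # private generator keeps the run reproducible
    weights = [rng.uniform(-1.0, 1.0) for _ in patterns[0]]
    best = correlation_score(candidate_outputs(weights, patterns), errors)
    for _ in range(steps):
        trial = [w + rng.gauss(0.0, step_size) for w in weights]
        score = correlation_score(candidate_outputs(trial, patterns), errors)
        if score > best:
            weights, best = trial, score
    return best, weights


def train_pool(patterns, errors, n_candidates=4, max_workers=4):
    """Train all candidates concurrently and return the (score, weights) of the best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(
            lambda seed: train_candidate(seed, patterns, errors),
            range(n_candidates),
        ))
    return max(results, key=lambda result: result[0])


# Invented patterns (last component is a constant bias input) and residual errors.
patterns = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
errors = [[0.5], [-0.5], [0.2], [-0.2]]
best_score, best_weights = train_pool(patterns, errors)
```

Since the workers never exchange data during training, the only synchronization point is the final selection of the best candidate. In CPython, pure Python threads do not run on several cores at once, so an actual speedup would require processes, MPI or a GPU backend.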
Moreover, Ingrid et al. [44] proposed parallel training of the recurrent cascade correlation (RCC) learning architecture to recognize Japanese phonemes using a method called time-slicing. In their parallel RCC, however, a large training set is divided into smaller chunks that are trained separately and sequentially, progressing from the simplest to the most complicated. The authors therefore did not use standard parallel computing concepts and models such as shared memory or distributed memory. Furthermore, they did not use parallel programming approaches such as multithreading, MPI or GPGPU.

Summary and Future Work
The purpose of this paper was to give a brief assessment of existing 3D facial recognition techniques and to highlight the potential of using parallel CCNN methods in 3D facial recognition systems. The automatic processing and characterization of 3D scanned images is still considered a challenging problem, with relatively few papers in the public domain addressing it. Semi-automatic approaches, in which initial assumptions are made about the pose of the face, are often used to simplify the problem. On the other hand, extracting facial feature points is considered an essential stage in any facial recognition system; these points can be used to locate the central region of the face and to extract a set of effective profiles from that region, which can then be used with neural network algorithms for recognition and classification. Neural networks have been applied in many fields, such as face recognition, speech recognition, and pattern recognition. Parallel neural networks have been implemented using different parallel techniques such as GPGPU [35] [36], MPI [37]-[39] and multithreading [40]. However, only two research efforts have reported parallel cascade correlation neural networks, in different fields, including an FPGA-based proposal. From the review, the authors noted that the previous works did not implement the algorithm using standard parallel approaches. Moreover, no work combines the distributed and shared memory models or uses both GPU and CPU methods. Future work includes developing and implementing a compact representation of facial data by reconstructing a 3D triangulated human face containing the coordinate and connectivity information, to simplify the process of recognition and to extract robust and distinguishable features. After that, we will propose an efficient parallel computing system for 3D facial recognition using a multi-core programming model that implements the cascade correlation neural network (CCNN) technique, which is widely recognized as an appropriate and efficient method, as explained above.

Figure 2. A mechanism to parallelize large amounts of data.

Table 1. Summary of recognition algorithms using 3D facial data.