FPGA Simulation of Linear and Nonlinear Support Vector Machine

A simple hardware architecture for implementing pairwise Support Vector Machine (SVM) classifiers on FPGA is presented. The training phase of the SVM is performed offline, and the extracted parameters are used to implement the testing phase of the SVM on hardware. In the architecture, the vector multiplication operations and the pairwise classifiers run in parallel and simultaneously. To validate the design, a dataset of Persian handwritten digits in three different classes is used for training and testing the SVM. The graphical simulator System Generator has been used to simulate the proposed hardware design. Implementation of linear and nonlinear SVM classifiers using simple blocks and functions, no limitation on the number of samples, generalization to multiple simultaneous pairwise classifiers, and low complexity in the hardware design are the notable characteristics of this research. According to the simulation results, a maximum frequency of 202.840 MHz in linear classification and a classification accuracy of 98.67% in nonlinear classification have been achieved, which demonstrates the strong performance of the designed hardware architecture.


Introduction
Today, many algorithms in the fields of signal processing, including speech processing, image processing, machine vision, data mining and pattern recognition, are developed in software, and new progress in their development is achieved every day. Many of these algorithms run at low frame rates because of their computational complexity. The requirement for fast, real-time processing motivates hardware implementation of these algorithms. However, implementing these algorithms on hardware such as microprocessors, DSPs or FPGAs remains an important challenge [1].
Recently, FPGAs have been introduced as one of the hardware platforms used in the field of digital signal processing and have provided acceptable results in real-time applications by using parallel processing [2].
SVM is one of the trusted, high-performance algorithms and has been widely used in linear and nonlinear classification and regression problems [3]. Since the structure of SVM training includes a quadratic optimization process, practical applications of this classifier are limited by the computational complexity and power consumption of the training phase.
Obviously, useful applications of online SVM require a hardware implementation suitable for both the training and testing phases. However, depending on the type of application, the training phase of the SVM can be performed offline in software on a computer. In these cases, only the testing phase of the SVM, which consists of a decision function for classification and runs very fast, is implemented on hardware [4].
Anguita [5] proposed a fully digital hardware structure for implementing the testing and training phases of SVM using linear and RBF kernel functions. The proposed structure consumes too many hardware resources and does not reach an acceptable processing speed; the maximum frequency obtained by Anguita was 35.3 MHz. Faisal Khan [6] avoided fixed-point computations and used the logarithmic number system to implement the testing phase of a linear SVM. Although the proposed structure is suitable for hardware, there may always be difficulties in converting real numbers to their logarithmic equivalents. Ramirez [4] implemented a linear SVM for classification of three-dimensional MRI images. The classification accuracy and processing time were 95% and 109.7 s respectively, which were 5% less and 1.84 times more than on a 550 MHz PC. In this research, a maximum frequency of 202.840 MHz in linear classification and a classification accuracy of 98.67% in nonlinear classification have been achieved; the processing time is much longer than on a PC but comparable to the literature [1,7,8].
The rest of this paper is organized as follows: after the introduction, Section 2 describes the basics of linear, nonlinear and multiclass SVM. Section 3 describes the proposed algorithm for SVM hardware implementation, including the software implementation and hardware architecture design. Finally, simulation results are given in Section 4, followed by conclusions and suggestions in Section 5.

Support Vector Machine
SVM is a classification and regression technique introduced in the 1990s by Vapnik [9,10]. SVM is essentially a binary classifier, but it is possible to classify samples of multiple classes with it. It can be used to solve both linear and nonlinear classification and regression problems.

Linear SVM
Consider a linear classification problem aiming to find the optimal separating hyperplane with maximum margin. Suppose x_i is the feature vector set of training samples that are linearly separable and have been labeled into classes 1 and 2, as in Figure 1. There may be some data that fall inside the margin, handled by slack variables.
So the decision function is defined as follows:

d(x) = sign(w^T x + b)   (1)

The optimal separating hyperplane can be obtained by solving the following dual Lagrange problem:

max_α  Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j,
subject to  0 ≤ α_i ≤ C  and  Σ_{i=1}^{N} α_i y_i = 0   (2)

in which C is a trade-off parameter between maximizing the margin and minimizing the error, and is determined by the user [11].
By solving the dual Lagrange problem, α is obtained. Subsequently, w and b are derived according to the following:

w = Σ_{i=1}^{SV} α_i y_i x_i   (3)

b = y_j − w^T x_j,  for any support vector x_j   (4)

where SV is the number of support vectors. Now we can classify an unknown data point x by substituting (3) and (4) into the decision function (1), which yields the following equation:

d(x) = sign( Σ_{i=1}^{SV} α_i y_i x_i^T x + b )   (5)
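As a concrete illustration, the decision rule of Equation (5) can be sketched in a few lines. The support vectors, labels, multipliers and bias below are hypothetical toy values for illustration only, not the parameters extracted from the paper's dataset.

```python
import numpy as np

def linear_svm_decision(x, sv, y_sv, alpha_sv, b):
    """Linear SVM test phase: sign(sum_i alpha_i * y_i * <x_i, x> + b)."""
    s = sum(a * y * np.dot(xi, x) for a, y, xi in zip(alpha_sv, y_sv, sv))
    return 1 if s + b >= 0 else -1

# Hypothetical toy parameters (two support vectors in 2-d):
sv = [np.array([1.0, 2.0]), np.array([-1.0, -1.5])]
y_sv = [1, -1]
alpha_sv = [0.5, 0.5]
b = 0.1
print(linear_svm_decision(np.array([2.0, 2.0]), sv, y_sv, alpha_sv, b))  # → 1
```

Only the support vectors (the x_i with nonzero α_i) enter the sum, which is why the hardware only needs to store SV rows rather than the whole training set.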

Nonlinear SVM
A question that arises here is: what should be done if the data are not linearly separable? Is it possible to generalize the idea of the linear SVM to nonlinear problems? Extensive studies in this area resulted in Mercer's theorem [12]. The idea behind this theorem is to transfer the vector x from the limited input space to a higher-dimensional feature space using a Hilbert-space mapping, as in Figure 2. In this context, a vector x in the feature space is represented by φ(x).
According to this theorem, the decision function of the nonlinear SVM can be written as:

d(x) = sign( Σ_{i=1}^{SV} α_i y_i φ(x_i)^T φ(x) + b )   (6)

in which the term φ(x_i)^T φ(x) can be replaced with a kernel function defined as:

K(x_i, x) = φ(x_i)^T φ(x)   (7)

so that the decision function becomes:

d(x) = sign( Σ_{i=1}^{SV} α_i y_i K(x_i, x) + b )   (8)

Well-known kernel functions have been introduced in the literature [13-15].
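The kernelized decision rule above can be sketched as follows. The Gaussian kernel is used here because it is the one adopted later in this paper; all numeric values in the usage are hypothetical.

```python
import math

def rbf_kernel(x_i, x, gamma):
    """Gaussian kernel: K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-gamma * d2)

def nonlinear_svm_decision(x, sv, y_sv, alpha_sv, b, gamma):
    """Nonlinear SVM test phase: sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf_kernel(xi, x, gamma)
            for a, y, xi in zip(alpha_sv, y_sv, sv))
    return 1 if s + b >= 0 else -1
```

Note that φ(x) is never computed explicitly; only kernel evaluations between the test sample and the stored support vectors are needed, which is what makes the testing phase hardware-friendly.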

Multi Class SVM
SVM is a binary classifier, but several methods can be used to solve multiclass problems with it; four such methods exist, and two are described here. In the one-against-all method, if there are n classes, n binary SVM classifiers are formed, so that the i-th classifier separates class i from all the others. But in this case there will still be unclassified regions, as shown in Figure 3 [11].
To solve this problem, Krebel [16] proposed the pairwise classifiers method, in which, to solve an n-class problem, n(n−1)/2 binary classifiers are formed. This method is better than one-against-all, but there will still be an unclassified region, as shown in Figure 4.
In the pairwise classifiers method, to classify an input data point x, each binary decision function d_ij(x) votes for one class, and the class with the most votes is selected as the result.
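The voting scheme described above can be sketched as follows. The `decide` rule here is a placeholder standing in for the n(n−1)/2 binary decision functions d_ij(x); any real implementation would plug in the trained classifiers instead.

```python
from collections import Counter
from itertools import combinations

def pairwise_classify(x, classes, decide):
    """Pairwise (one-against-one) voting: decide(i, j, x) returns the winner
    (i or j) of binary classifier d_ij; the most-voted class is selected."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[decide(i, j, x)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy decision rule, for illustration only:
decide = lambda i, j, x: i if x % 2 else j
print(pairwise_classify(3, [1, 2, 3], decide))  # → 1
```

For the 3-class case in this paper, only n(n−1)/2 = 3 binary classifiers (d_12, d_13, d_23) are needed, which is why they can all run simultaneously in hardware.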

The Proposed Algorithm for SVM Hardware Implementation
To implement the SVM classifier architecture on hardware, we must first perform the learning phase of the SVM, which has a complicated algorithm, offline in software, and then use the extracted parameters to implement the testing phase on hardware. In this research, the training phase of the SVM is implemented in the MATLAB environment using the libsvm [17] package, and the graphical hardware simulator, System Generator, is used to design and implement the testing phase.
To validate the designed architecture, three different classes of data related to Persian handwritten digits, 1, 4 and 8, are used as training and testing samples for the SVM [18]. We name class 1 for digit 1, class 2 for digit 4, and class 3 for digit 8, so the architecture is designed for a 3-class SVM according to the pairwise method. 800 samples from each class (2400 in total) are provided; from each class, 600 samples are used for training and 200 for testing. The data have feature vectors of dimension 7 and 24, extracted using the geometric moments approach [19]. We use the 7-d data for the linear SVM and the 24-d data for the nonlinear SVM.

Software Implementation of SVM
Before implementing the SVM architecture in hardware, it is necessary to run both the training and testing phases in MATLAB. The target is to get the lowest classification error rate by varying the parameters; at that lowest error rate, the parameters needed for offline classification on hardware are extracted. To implement the SVM in software, besides the SVM toolbox of MATLAB, many other codes written by researchers are available. The libsvm package has been introduced as one of the best and most trusted tools for SVM implementation and is used in most academic papers and research, so we use libsvm as well.
In general, before SVM implementation, the type of kernel function must be specified. We use linear and Gaussian kernel functions to implement the SVM.

SVM Software Implementation with Linear Kernel
The linear kernel is the simplest type of kernel function. Because of its structural simplicity and fast computation, it is a very good choice for hardware implementation. In this research, experiments show that changing the parameter C also changes the classification error rate. To achieve the lowest error rate, we plot the error rate versus the parameter C of the SVM. Figure 5 shows this diagram for classification of Persian handwritten digits with 7-d data.
Figure 5 shows that the best classification error rate is obtained at C = 5500. But if that value of C were used for the hardware implementation, many bits would have to be assigned. With C = 100, the desired hardware design can be implemented with a low number of bits, while the increase in error rate is not very significant. Table 1 shows the numerical results of linear SVM classification using C = 100, in which SV and b indicate the number of support vectors and the bias term, respectively.

SVM Software Implementation with Nonlinear Kernel
The Gaussian kernel function is one of the kernel functions most used in support vector machines, and it has provided much better results than the others [20]. In this research, the Gaussian kernel function is used for training and testing the nonlinear SVM. Here the results show that changing the parameter C no longer changes the classification error rate. Instead, there is a parameter γ in the Gaussian kernel that affects the classification error rate, defined as follows:

K(x_i, x) = exp( −γ ‖x_i − x‖^2 )   (9)

Choosing an appropriate γ value is very important for finding the optimal classification error rate. No exact formula exists to calculate an appropriate value of γ, and trial and error is often used instead. To obtain that value for nonlinear classification of Persian handwritten digits with 24-d samples, the error rate versus γ is plotted in Figure 6, which shows that γ = 0.95 gives the lowest error rate. So in the next step we use γ = 0.95 and run the training and testing phases of the SVM again. Table 2 shows the numerical results of this classification.

Hardware Architecture Design
Figure 7 shows the block diagram of the hardware architecture for 3-class SVM classification according to the simultaneous pairwise classifiers method. This design is the same for the linear and nonlinear SVM; only the kernel block is designed differently.
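The trial-and-error selection of γ described above can be sketched as follows. This is a minimal sketch using scikit-learn's `SVC` as a stand-in for libsvm (both wrap the same underlying library), and the dataset here is synthetic; the paper itself sweeps γ over the 24-d Persian handwritten digit samples.

```python
# Hypothetical gamma sweep: train at each candidate gamma, keep the lowest
# test error. Synthetic data stands in for the paper's 24-d digit features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 24)); y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(100, 24));  y_test = (X_test[:, 0] > 0).astype(int)

best_gamma, best_err = None, 1.0
for gamma in [0.05, 0.25, 0.5, 0.95, 2.0]:
    clf = SVC(kernel="rbf", C=100, gamma=gamma).fit(X_train, y_train)
    err = 1.0 - clf.score(X_test, y_test)          # test error rate at this gamma
    if err < best_err:
        best_gamma, best_err = gamma, err
print(best_gamma, best_err)
```

On the real dataset this sweep is what produces the curve of Figure 6, whose minimum fixes the γ used for the hardware.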

System Generator Design of Linear SVM
Before FPGA implementation, in order to simplify the computational operations, it is better to merge the element-wise products of vectors α and y into a new vector called yα. Figure 8 shows the matrix form of the linear SVM decision function for classification of Class 1 & 2, i.e. d_12(x). To design the hardware architecture of this function, a combination of serial and parallel methods is used. The vector yα is located in the YAlpha ROM, whose counter is the same as that of the SV ROMs. Blocks Mult1 to Mult7 and Addsub1 to Addsub6 perform the inner product between the test data and the support vectors; this result is multiplied by the corresponding element of the yα vector in Mult9. The remaining operation is the summation of these values, which is done serially by the Accumulator and +b blocks.
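A small software model may clarify this dataflow. The block names in the comments refer to Figure 8; the split into a parallel inner product and a serial accumulation mirrors the hardware, and all values are illustrative.

```python
import numpy as np

def d12(x, SV, ya, b):
    """Software model of the d_12(x) datapath: ya[i] pre-merges y_i * alpha_i."""
    acc = 0.0
    for i in range(SV.shape[0]):        # serial sweep over support vectors
        dot = float(np.dot(SV[i], x))   # Mult1..Mult7 + Addsub tree, in parallel
        acc += ya[i] * dot              # Mult9, then Accumulator
    return acc + b                      # +b block; the sign gives the class
```

Merging y and α offline removes one multiplier per term from the hardware, since only the single product y_i·α_i ever appears in the decision function.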

System Generator Design of Nonlinear SVM
For implementation of the nonlinear SVM, 24-d data from the Persian handwritten digit database are used. The decision function of the nonlinear SVM with Gaussian kernel is as follows:

d(x) = sign( Σ_{i=1}^{SV} α_i y_i exp( −γ ‖x_i − x‖^2 ) + b )   (10)

Due to structural limitations of the FPGA, it is better to simplify this function before implementing it. Suppose that A is a vector as follows:

A = x_i − x   (11)

The norm of A is defined as:

‖A‖ = sqrt( Σ_k A_k^2 )   (12)

and the square of the norm of A is:

‖A‖^2 = Σ_k A_k^2   (13)

so the exponent can be computed more simply, without a square root, as:

‖x_i − x‖^2 = Σ_k ( x_{i,k} − x_k )^2   (14)

So we can easily implement the kernel by using (14). For implementation of the exp function, there is a CORDIC block in System Generator that produces sinh and cosh outputs. Knowing that exp(z) = cosh(z) + sinh(z), the exp function can be implemented using one CORDIC block and an adder.
Figure 10 shows the FPGA architecture for computation of α_i y_i K(x_i, x) in the nonlinear Gaussian SVM, while the other parts of the design are the same as in the linear one, except that here the data dimension is 24. The value of γ is located in the constant2 block.
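A software model of one kernel term in Figure 10, under the simplifications above (squared norm as a plain sum of squared differences, and exp obtained from the CORDIC's sinh and cosh outputs), could look like this; the numeric inputs are hypothetical.

```python
import math

def kernel_term(alpha_i, y_i, x_i, x, gamma):
    """One alpha_i * y_i * K(x_i, x) term of the Gaussian SVM datapath."""
    norm2 = sum((a - b) ** 2 for a, b in zip(x_i, x))  # squared norm, no sqrt
    z = -gamma * norm2
    k = math.cosh(z) + math.sinh(z)                    # identical to exp(z)
    return alpha_i * y_i * k
```

The cosh + sinh identity is exact, so the CORDIC-based exp introduces no extra algorithmic error beyond the fixed-point quantization of the block itself.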

Simulation Results
The first step in simulating the designed architecture is to choose the FPGA part number. We use a Xilinx Virtex-4 xc4vsx35 device for the linear and nonlinear SVM simulations. Fixed-point numbers are used for quantization: the Q24.16 format for the linear simulation and Q24.14 for the nonlinear one. To avoid growth of the word length in serial computations, the truncation method is used. Table 4 shows the hardware resources used in the Virtex-4 device for implementation of the above classifier.
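The fixed-point behavior can be sketched as follows. This assumes Q24.16 denotes a 24-bit word with 16 fractional bits (the usual Xilinx Fix24_16 convention), and models truncation as floor, i.e. rounding toward minus infinity.

```python
import math

def quantize(value, total_bits=24, frac_bits=16):
    """Model of Q24.16-style fixed point with truncation and saturation."""
    scale = 1 << frac_bits
    q = math.floor(value * scale)                       # truncation, not rounding
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))                             # saturate to word length
    return q / scale

print(quantize(0.123456789))
```

Truncating after each serial accumulation step keeps the word length fixed at the cost of a small, one-sided rounding error, which is the trade-off the design accepts.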

Nonlinear SVM Simulation
The simulation result of the designed hardware architecture of the nonlinear SVM is shown in Table 5. In this case, the classification error rate of 1.33% and total computation time of 0.27 ms are notable.
Table 6 shows the hardware resources used in the Virtex-4 device for implementation of the nonlinear SVM classifier.

Conclusions
In this research, an FPGA architecture of linear and nonlinear pairwise SVM classifiers for detection of three classes of Persian handwritten digits has been proposed. This design can be generalized to other SVM classification applications with no limitation on the number of training and testing data, and it is also possible to increase the number of pairwise classifiers. The only restriction is the FPGA hardware resources; this problem can be solved by utilizing multiple FPGAs or external memory devices.
To continue this research, proposing a method for implementation of other nonlinear kernel functions is recommended. Since for a given application one kernel function may give better classification results than the others, if solutions to implement all these functions existed, the designer could pursue the best classification accuracy. Another suggestion is to implement the entire SVM process, including the training phase, on the FPGA. The training phase of the SVM includes the optimization problem of Equation (2), whose solution is very time-consuming and complicated for hardware implementation. If an appropriate, hardware-friendly way to simplify the problem were found, the whole SVM process could be implemented on the FPGA, so that there would be no need to implement the training phase in software, which may cause quantization errors in the hardware implementation of the testing phase.

Figure 1. Optimal separating hyperplane with maximum margin and slack variables.

Figure 2. Transferring a data point x from the input space (left) into the feature space (right).

Figure 3. Unclassified regions in the one-against-all method.
Figure 4. Unclassified region in the pairwise classifiers method.

Figure 5. Linear SVM classification error rate versus parameter C.
Figure 6. Nonlinear classification error rate versus parameter γ.

Figure 7. Block diagram of the 3-class SVM hardware architecture design.

Figure 9 shows part of the designed architecture in System Generator. Each row element of the matrix SV is located at the outputs of SV ROM 1 to SV ROM 7, addressed by Counter 1. The test data are in blocks X_Test ROM 1 to X_Test ROM 7, addressed by Counter 2 with a period of 70 (the number of support vectors of Class 1 & 2).