Wireless Sensor Network, 2010, 2, 48-52
doi:10.4236/wsn.2010.21007 Published Online January 2010 (http://www.SciRP.org/journal/wsn/)
Copyright © 2010 SciRes. WSN
Signal Classification Method Based on Support Vector
Machine and High-Order Cumulants
Xin ZHOU, Ying WU, Bin YANG
Zhengzhou Information Science and Technology Institute, Zhengzhou, China
Email: zx007_0_0@126.com
Received September 15, 2009; revised November 13, 2009; accepted November 18, 2009
Abstract
In this paper, a classification method based on the Support Vector Machine (SVM) is presented for digital modulation signal classification. The second-, fourth- and sixth-order cumulants of the received signals are first used as classification feature vectors; the kernel trick is then used to map the feature vectors into a high-dimensional feature space, where the optimum separating hyperplane is constructed to realize signal recognition. In order to build an effective and robust SVM classifier, the radial basis kernel function is selected, one-against-one and one-against-rest multi-class classifiers are designed, and a parameter selection method based on cross-validation grid search is adopted. The experiments show that the classifier based on SVM achieves high performance and good robustness.
Keywords: High-Order Cumulants, Support Vector Machine, Kernel Function, Signal Classification
1. Introduction
Automatic modulation classification (MC) is an interme-
diate step between signal detection and demodulation,
and plays a key role in various civilian and military ap-
plications. It is also one of many key technologies in
software radio and cognitive radio.
Early recognition methods are mainly based on the signal waveform, frequency, transient amplitude and transient phase [1]. The performance of these methods degrades quickly at low SNR. In recent years, statistical decision theory and statistics-based pattern recognition have been the two main approaches to the MC problem [2]. The first approach is based on hypothesis testing; its difficulty is that proper hypotheses and rigorous data analysis are required to obtain the correct decision threshold. Reference [3] uses a neural network to solve the MC problem with good results, but because the sample length is limited, the neural network is prone to over-fitting and to local minima. Some researchers have used the support vector machine (SVM) to solve the MC problem and obtained higher classification accuracy [4,5], but neither reference explains how to select the optimal parameters of the SVM classifier or how to construct a multi-class SVM. In this paper, we first introduce the support vector machine, then investigate the selection of the kernel function and its parameters, study multi-class classification methods, and apply them to digital signal classification. We also compare the SVM with other common classifiers.
The paper is organized as follows. In Section 2, robust feature extraction based on high-order cumulants is presented. In Section 3, the multi-class classifier based on SVM is designed: the principle of SVM is introduced first, then the kernel and parameter selection are given, and the method for decomposing the multi-class classifier is described. In Section 4, the signal features are fed into the multi-class SVM classifier for the experiments. Section 5 concludes the paper.
2. Feature Extraction Based on High-Order
Cumulants
The high-order cumulant is a mathematical tool that describes the high-order statistical characteristics of a random process. It not only removes the influence of Gaussian noise, but is also robust to rotation and offset of the constellation diagram.
We suppose the classifier works in a coherent and synchronized environment: the received signal has undergone carrier frequency synchronization and timing synchronization, but an unknown reference phase offset remains.
Table 1. The theoretical cumulants of the signals.

Signal       |C40|     |C42|     |C63|     |C40|/|C42|   |C63|²/|C42|³
4ASK         1.36E²    1.36E²    9.16E³    1             33.36
2PSK/2ASK    2E²       2E²       13E³      1             21.125
4PSK         E²        E²        4E³       1             16
2/4/8FSK     0         E²        4E³       0             16
16QAM        0.68E²    0.68E²    2.08E³    1             13.76
Table 2. The cumulants of the FSK signals after differencing.

Signal   |C21|     |C42|      |C21|²/|C42|
2FSK     EΔ²       E²Δ⁴       1
4FSK     5EΔ²      9E²Δ⁴      2.78
8FSK     21EΔ²     105E²Δ⁴    4.2
The output signal of the receiver can be expressed as:

$$r(i) = s(i) + n(i) = \sqrt{E}\, e^{j\theta} \sum_{k=1}^{L_c} a_k\, p(i-k) + n(i) \qquad (1)$$

where $a_k$ is the transmitted symbol sequence, $L_c$ is the number of observed symbols, $E$ is the average signal power, $\theta$ is the reference phase, $p(i)$ is the residual channel response, and $n(i)$ is assumed to be complex white Gaussian noise with power $\sigma^2$.
Suppose the transmitted symbol sequence is independent and identically distributed and that the average power of every signal has been normalized to 1; the ideal high-order cumulants of these signals are then given in Table 1 [6].
Because these high-order cumulants cannot distinguish the 2FSK, 4FSK and 8FSK signals directly, the ratio of $|C_{21}|^{2}$ to $|C_{42}|$, computed for each signal after differencing and median filtering, as listed in Table 2, is used to classify the FSK signals, where $\Delta$ is the frequency offset.
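As an illustration, the feature extraction of this section can be sketched in a few lines of NumPy. This is our own sketch, not the authors' code: it estimates the sample moments $M_{pq} = E[x^{p-q}(x^{*})^{q}]$ and converts them to $C_{40}$, $C_{42}$ and $C_{63}$ with the standard zero-mean moment-to-cumulant relations (third-order moments are taken as zero for the symmetric constellations considered); the constant factors in $C_{63}$ depend on the convention used, so the numbers may differ slightly from those quoted from [6] in Table 1. The function name `cumulant_features` is hypothetical.

```python
import numpy as np

def cumulant_features(x):
    """Estimate power-normalized |C40|, |C42|, |C63| from complex baseband samples x."""
    x = np.asarray(x, dtype=complex)
    def m(p, q):                       # sample moment M_pq = E[x^(p-q) * conj(x)^q]
        return np.mean(x ** (p - q) * np.conj(x) ** q)
    M20, M21 = m(2, 0), m(2, 1)
    M40, M41, M42, M43 = m(4, 0), m(4, 1), m(4, 2), m(4, 3)
    M63 = m(6, 3)
    C40 = M40 - 3 * M20 ** 2
    C42 = M42 - np.abs(M20) ** 2 - 2 * M21 ** 2
    # zero-mean moment-to-cumulant relation; third-order moments assumed zero
    # (symmetric constellations); the C63 constants vary with the convention used
    C63 = (M63 - 3 * M20 * M43 - 9 * M21 * M42 - 3 * np.conj(M20) * M41
           + 18 * np.abs(M20) ** 2 * M21 + 12 * M21 ** 3)
    E = np.abs(M21)                    # average power estimate
    return np.array([np.abs(C40) / E ** 2, np.abs(C42) / E ** 2, np.abs(C63) / E ** 3])
```

For the FSK signals, the same moment estimates applied to the median-filtered differenced signal give the $|C_{21}|^{2}/|C_{42}|$ ratio of Table 2.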
3. The Classifier Based on Support Vector
Machine
3.1. Support Vector Machine (SVM)
SVM is basically a two-class classifier based on the ideas of a "large margin" and of "mapping data into a higher dimensional space" [7]. The principle of SVM is to minimize the structural risk: in the high-dimensional feature space it finds an optimal discriminant hyperplane with low VC dimension, such that the margin between the data of the two classes is as large as possible. When the feature space is not linearly separable, SVM maps the data into a high-dimensional feature space with a non-linear mapping and finds the optimal classification hyperplane in that space.
Based on the principle of structural risk minimization, suppose that two separable classes of samples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\}$, $\mathbf{x}_i \in R^n$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, n$, exist in an inner product space $F$, where $-1$ and $+1$ denote the two classes. The optimal classification hyperplane can be expressed as:

$$F: \ (\mathbf{w} \cdot \mathbf{x}) + b = 0 \qquad (2)$$

where $\mathbf{w}$ is the weight vector and $b$ is the translation (bias). In order to make the classification hyperplane and $(\mathbf{w}, b)$ correspond one to one, we normalize the hyperplane so that the distance from the nearest sample to it is $1/\|\mathbf{w}\|$. The normalized hyperplane therefore satisfies:

$$\min_{i=1,2,\ldots,n} \left| (\mathbf{w} \cdot \mathbf{x}_i) + b \right| = 1 \qquad (3)$$
Solving for the optimal classification hyperplane can be transformed into the quadratic optimization problem:

$$\min\ \Phi(\mathbf{w}) = \frac{1}{2}(\mathbf{w} \cdot \mathbf{w}) \quad \text{s.t.}\ \ y_i\big((\mathbf{w} \cdot \mathbf{x}_i) + b\big) - 1 \ge 0 \qquad (4)$$
The optimal hyperplane above assumes that the samples are linearly separable. If they are not, slack variables $\xi_i \ge 0$ and a penalty factor $C$ are introduced to obtain the generalized optimal classification hyperplane (which separates the samples as well as possible while keeping the classification margin as large as possible):

$$\min\ \Phi(\mathbf{w}, \xi) = \frac{1}{2}(\mathbf{w} \cdot \mathbf{w}) + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.}\ \ y_i\big((\mathbf{w} \cdot \mathbf{x}_i) + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \qquad (5)$$

where $i = 1, 2, \ldots, n$ and $C$ is a constant that controls the penalty on misclassified samples; it is a compromise between the proportion of misclassified samples and the algorithm complexity.
According to the equations above and the Lagrange theory, using the Kuhn-Tucker conditions, (5) can be transformed into the dual problem:

$$\max\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)$$
$$\text{s.t.}\ \ \sum_{i=1}^{n} \alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, 2, \ldots, n \qquad (6)$$
Using the kernel function $k(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$, the quadratic problem can be rewritten as [8]:

$$\max\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)$$
$$\text{s.t.}\ \ \sum_{i=1}^{n} \alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, 2, \ldots, n \qquad (7)$$
The classification threshold $b$ can be obtained from any support vector using (8):

$$y_i\big((\mathbf{w} \cdot \mathbf{x}_i) + b\big) - 1 = 0, \qquad \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i \qquad (8)$$
The optimal classification discriminant function expressed with the kernel function is:

$$f(\mathbf{x}) = \operatorname{sgn}\big((\mathbf{w} \cdot \mathbf{x}) + b\big) = \operatorname{sgn}\Big(\sum_{\mathbf{x}_i \in SV} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b\Big) \qquad (9)$$

where $\mathbf{w} = \sum_{\mathbf{x}_i \in SV} \alpha_i y_i\, \varphi(\mathbf{x}_i)$.
According to the optimization problem (7), the complexity of the SVM does not depend on the dimension of the features, but is governed by the number of samples: the SVM must evaluate the kernel function between every pair of training samples, generating a kernel matrix with $n \times n$ elements, where $n$ is the number of training samples.
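To make (9) and the kernel-matrix cost concrete, a minimal NumPy sketch is given below. It is an illustration of ours (the names `rbf_kernel_matrix` and `svm_decision` are hypothetical), assuming the dual coefficients $\alpha_i$, the support vectors, their labels and the bias $b$ have already been obtained from (7) and (8).

```python
import numpy as np

def rbf_kernel_matrix(X, Z, sigma2):
    """K[i, j] = exp(-||X[i] - Z[j]||^2 / sigma^2); with X = Z this is the n x n training kernel matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)

def svm_decision(x, support_vectors, alpha, y_sv, b, sigma2):
    """Equation (9): f(x) = sgn( sum_i alpha_i y_i k(x_i, x) + b ) over the support vectors."""
    k = rbf_kernel_matrix(support_vectors, x[np.newaxis, :], sigma2)[:, 0]
    return np.sign(np.dot(alpha * y_sv, k) + b)
```

Calling `rbf_kernel_matrix(X, X, sigma2)` on the training set produces exactly the $n \times n$ kernel matrix mentioned above.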
3.2. The Selection of Kernel Function
In fact, changing the kernel parameter implicitly changes the mapping function and thus the complexity of the sample distribution in the feature space, so the selection of the kernel function and its parameters is very important. Three kinds of kernels are commonly used [8]:
1) Polynomial kernel of degree $d$:

$$k(\mathbf{x}, \mathbf{y}) = \big[(\mathbf{x} \cdot \mathbf{y}) + p\big]^{d} \qquad (10)$$

where $p$ and $d$ are user-defined parameters. If $p = 0$ and $d = 1$, it is called the linear kernel, which is fast to evaluate.
2) Radial basis function (RBF) kernel:

$$k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^{2}}{\sigma^{2}}\right) \qquad (11)$$

where $\sigma^{2} > 0$ controls the width of the kernel and needs to be determined.
3) Neural network (sigmoid) kernel:

$$k(\mathbf{x}, \mathbf{y}) = \tanh\big(v(\mathbf{x} \cdot \mathbf{y}) + c\big) \qquad (12)$$

where $v$ and $c$ are parameters; only certain values satisfying the Mercer condition can be used.
Because the feature space of the radial basis function kernel is infinite-dimensional, a finite number of samples is always linearly separable in that feature space, so it is the most commonly used kernel for classification. In this paper we also select the radial basis function kernel.
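For concreteness, the three kernels (10)–(12) can be written as one-line functions; this is our own illustration, with the parameter names `p`, `d`, `sigma2`, `v` and `c` matching the symbols above.

```python
import numpy as np

def poly_kernel(x, y, p, d):       # Equation (10)
    return (np.dot(x, y) + p) ** d

def rbf_kernel(x, y, sigma2):      # Equation (11)
    return np.exp(-np.sum((x - y) ** 2) / sigma2)

def sigmoid_kernel(x, y, v, c):    # Equation (12)
    return np.tanh(v * np.dot(x, y) + c)
```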
3.3. The Parameter Selection of SVM
In the SVM classifier, the selection of the kernel function parameters and of the penalty factor $C$ is very important. The penalty factor $C$ is the optimal compromise between making the distance from the hyperplane to the nearest training point as large as possible and making the classification error as small as possible. The parameters of the kernel function determine how the data are mapped into the higher dimensional space.
There are many parameter selection methods, such as grid search, the GD (gradient descent) algorithm, genetic algorithms, simulated annealing and so on. Possible parameter evaluation criteria include k-fold cross-validation, leave-one-out (LOO), generalized approximate cross-validation (GACV), the approximate span bound, the margin-radius bound and so on.
In this paper, we use k-fold cross-validation to select the parameters $(C, \sigma^{2})$ of the RBF-SVM. Suppose we have $n$ known samples forming the sample set $\{(\mathbf{x}_i, y_i)\}$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, n$. To avoid confusion with the kernel function $k$, we use $l$ to denote the number of folds of the cross-validation. The steps of the $l$-fold cross-validation are as follows (a code sketch of the whole procedure is given after step 8):
1) Divide the sample set containing $n$ samples into $l$ subsets of equal size, each containing $n/l$ samples.
2) Take the samples of the first to the $(l-1)$-th subsets, i.e. $(l-1)n/l$ samples, as training samples, assign a small initial value to the parameters $(C, \sigma^{2})$, substitute them into (7) and solve for the Lagrange multipliers $\alpha_i^{*}$; the samples whose $\alpha_i^{*}$ are greater than zero are the support vectors.
3) Substitute the $\alpha_i^{*}$ into:

$$\mathbf{w}^{*} \cdot \mathbf{w}^{*} = \sum_{i,j=1}^{(l-1)n/l} \alpha_i^{*} \alpha_j^{*} y_i y_j \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^{2}}{\sigma^{2}}\right)$$

$$b^{*} = \frac{1}{sv}\sum_{k \in SV}\left(y_k - \sum_{i=1}^{(l-1)n/l} \alpha_i^{*} y_i \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_k\|^{2}}{\sigma^{2}}\right)\right)$$

where $sv$ is the number of support vectors; the classification threshold $b^{*}$ is computed as the mean over the support vectors.
4) Put $\alpha_i^{*}$, $b^{*}$ and the test samples $\mathbf{x}_u$, $u = (l-1)n/l + 1, \ldots, n$, into the classification function (9) to obtain the output $f(\mathbf{x}_u)$ for each sample, and check whether $f(\mathbf{x}_u)$ agrees with the true output $y_u$.
5) Take the samples of the second to the $l$-th subsets as the training ones and the first subset as the test samples, and repeat steps 2)–4). Proceed in this manner until every subset has been used as the test set, i.e. the above steps are repeated $l$ times, and compute the cross-validation accuracy.
6) Fix the parameter $C$ and increase $\sigma^{2}$ gradually, repeating steps 2)–5); then increase $C$ gradually and repeat steps 2)–5), so that the accuracy is obtained for the different parameter pairs $(C, \sigma^{2})$. Experience shows that increasing the parameters exponentially is more effective.
7) Find the maximum cross-validation accuracy and the corresponding $(C, \sigma^{2})$. If the accuracy meets the requirement, go to step 8); otherwise, continue searching in the range between the $(C, \sigma^{2})$ of the maximum validation accuracy and the $(C, \sigma^{2})$ of the second-largest validation accuracy, repeating steps 2)–6) until the accuracy requirement is satisfied.
8) Use the satisfactory parameters $(C, \sigma^{2})$ to train on all the training samples, obtain the final optimal parameters $\{\alpha_i^{*}, \mathbf{x}_i \in SV\}$ and $b^{*}$, and then determine the optimal classification function.
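The grid-search procedure of steps 1)–8) can be sketched with scikit-learn as follows. This is an assumed implementation, not the authors' code; note that scikit-learn parameterizes the RBF kernel as $\exp(-\gamma\|\mathbf{x}-\mathbf{y}\|^{2})$, so $\gamma = 1/\sigma^{2}$, and the helper name `select_rbf_svm` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def select_rbf_svm(X_train, y_train, C0=10.0, sigma2_0=10.0, l=5):
    """l-fold cross-validation grid search over exponentially spaced (C, sigma^2)."""
    Cs = C0 * np.float_power(10.0, np.arange(-3, 4))                    # log C0 - 3 ... log C0 + 3
    gammas = 1.0 / (sigma2_0 * np.float_power(10.0, np.arange(-3, 4)))  # gamma = 1 / sigma^2
    grid = GridSearchCV(SVC(kernel="rbf"), {"C": Cs, "gamma": gammas}, cv=l)
    grid.fit(X_train, y_train)          # steps 2)-6): train and validate on every fold
    return grid.best_estimator_, grid.best_params_, grid.best_score_   # steps 7)-8)
```

GridSearchCV carries out the $l$-fold splitting, the exhaustive search over the exponentially spaced grid and the final refit on all training samples, which corresponds to the steps listed above.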
3.4. The Design of Multi-Classifier
There are two ideas for solving the multi-class classification problem with SVM [9]. One is to modify the original optimization problem so that all the multi-class discriminant functions are computed at the same time; the other is to divide the multi-class problem into a series of binary problems that can be solved directly, and to obtain the final discriminant result from their outputs.
The first idea seems simpler, but its computation is complex and costly and it is hard to implement, so it is not widely used. There are five kinds of multi-class methods based on the second idea: One-Against-Rest (OAR), One-Against-One (OAO), Binary Tree (BT), Error-Correcting Output Codes (ECOC) and the Directed Acyclic Graph (DAG). The OAO and OAR methods are the most frequently used.
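As an illustration of the decomposition idea, the sketch below (ours, not the paper's code) builds OAO and OAR classifiers around a binary RBF-SVM with scikit-learn; for $k$ classes the former trains $k(k-1)/2$ binary machines and the latter trains $k$.

```python
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

def build_multiclass_svms(C, sigma2):
    base = SVC(kernel="rbf", C=C, gamma=1.0 / sigma2)
    oao = OneVsOneClassifier(base)    # OAO: k(k-1)/2 binary SVMs for k classes
    oar = OneVsRestClassifier(base)   # OAR: k binary SVMs
    return oao, oar
```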
4. Computer Simulation and Performance
Analysis
4.1. Experiment Steps
The steps of the SVM signal classification based on grid-search parameter selection are as follows (a sketch of the whole pipeline is given after step 4):
1) Extract the cumulant features of the received signals and divide the feature vectors equally into training samples and test samples.
2) Select the RBF kernel function and a multi-class classifier design method; initialize $\sigma^{2}$ and $C$, give the parameter search range, and use $l$-fold cross-validation to obtain the optimal parameters of the SVM.
3) Set the optimal parameters of the RBF-SVM according to step 2) and train it with the training samples.
4) After training, input the features of the signals awaiting classification to classify them.
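A compact, self-contained sketch of this pipeline under the scikit-learn assumptions used earlier might look as follows; the feature matrix is assumed to contain the cumulant vectors of Section 2, and the grid bounds are only illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def classify_signals(features, labels, l=5):
    """features: cumulant feature vectors (Section 2); labels: signal classes."""
    # Step 1: split the feature vectors equally into training and test samples
    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.5)
    # Step 2: l-fold cross-validation grid search over exponentially spaced (C, sigma^2)
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": np.logspace(-2, 4, 7), "gamma": np.logspace(-4, 2, 7)},
                        cv=l)
    # Steps 3-4: train with the optimal parameters, then classify the test signals
    grid.fit(X_tr, y_tr)
    return grid.predict(X_te), grid.score(X_te, y_te)
```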
4.2. Classification Experiment
Parameter selection experiment: we generate 200 realizations of each digital signal at every 2 dB from 0 to 20 dB in an AWGN channel, extract the cumulant features and obtain a new sample set. The samples of each class are separated randomly into training samples and test samples. We use the one-against-one SVM decomposition, choose the RBF kernel, initialize $C_0 = 10$ and $\sigma_0^{2} = 10$, and discretize the parameters logarithmically to obtain the grid values $(\log C, \log \sigma^{2})$, where $\log C \in \{\log C_0 - 3, \log C_0 - 2, \ldots, \log C_0 + 3\}$ and $\log \sigma^{2} \in \{\log \sigma_0^{2} - 3, \log \sigma_0^{2} - 2, \ldots, \log \sigma_0^{2} + 3\}$. The isolines of the classification accuracy are shown in Figure 1. The maximum accuracy is 99.1%, obtained at the corresponding optimal parameter pair $(C, \sigma^{2})$.
Figure 1. The cross-validation accuracy isolines of the OAO-RBF-SVM over the (log C, log σ²) grid.
Table 3. The simulation results at 5 dB (classification accuracy, %).

Input        4ASK   2PSK/2ASK   4PSK   2FSK   4FSK   8FSK   16QAM
4ASK         92.2   7.8         0      0      0      0      0
2PSK/2ASK    0      100         0      0      0      0      0
4PSK         0      0.6         99.4   0      0      0      0
2FSK         0      0           0      100    0      0      0
4FSK         0      0           0      2.6    97.4   0      0
8FSK         0      0           0      0.4    2.6    97.0   0
16QAM        0.8    4.2         5.8    0      0      0      89.2
Table 4. Comparison of different classification methods.

Classifier                       Classification accuracy (%)
Nearest distance classifier      80.2
Neural network                   85.6
OAR-SVM (σ² = 0.01, C = 1)       90.4
OAO-SVM (σ² = 0.01, C = 0.1)     92.2
Test 1: In this experiment we obtain the classification accuracy of the different signals in an AWGN channel. The sampling frequency is 40 kHz and the carrier frequency is 8 kHz. The signal length is 1200 samples, the symbol rate is 2000 Bd, and 500 Monte Carlo experiments are run at 5 dB. The OAO-SVM is used, and the resulting classification accuracy is given in Table 3.
From Table 3 we can see that the SVM classifier achieves high classification accuracy at 5 dB. The 16QAM classification accuracy is the lowest, 89.2%, because the extracted features of 16QAM are close to those of 2PSK and 4PSK, so it is easily misjudged as one of these two signals.
Test 2: In this experiment we compare the SVM, the neural network and the nearest distance classifier. The simulation assumptions are the same as in Test 1. We calculate the classification accuracy of 4ASK at 5 dB. The SVM uses the RBF kernel with the OAR and OAO decomposition algorithms, and 500 Monte Carlo experiments are run. The classification accuracies are shown in Table 4.
From Test 2 we can see that the nearest distance classifier has the lowest classification accuracy, followed by the neural network, while the SVM has the highest accuracy.
5. Conclusions
In this paper, we draw on the kernel idea of statistical learning theory and use multi-class decomposition methods together with parameter selection by cross-validation grid search to build effective and robust SVM classifiers. We use the fourth- and sixth-order cumulants of the received signals as classification feature vectors to realize digital signal classification. From the computer simulations and analysis, we can draw the following conclusions:
1) The cumulant feature vector removes the influence of Gaussian noise; it is robust and gives high performance.
2) Classification methods based on kernel functions are little affected by the dimension of the input data. The classification capability of a kernel classifier is affected by the kernel function and its parameters, and fine classification precision is obtained only when the kernel parameters lie in a suitable range. The classification stability can be effectively improved by parameter selection via the cross-validation grid search method; when proper parameters are chosen, the classification accuracy of the SVM is high.
6. References
[1] A. K. Nandi and E. E. Azzouz, “Automatic modulation recognition [J],” Signal Processing, Vol. 46, No. 2, pp. 211–222, 1995.
[2] O. A. Dobre, A. Abdi, Y. Bar-Ness, et al., “Survey of automatic modulation classification techniques: Classical approaches and new trends [J],” IET Communications, Vol. 1, No. 2, pp. 137–156, 2007.
[3] W. C. Han, H. Han, L. N. Wu, et al., “A 1-dimension
structure adaptive self-organizing neural network for
QAM signal classification [C],” Third International Con-
ference on Natural Computation (ICNC 2007), HaiKou,
August 24–27, 2007.
[4] X. Z. Feng, J. Yang, F. L. Luo, J. Y. Chen, and X. P.
Zhong, “Automatic modulation recognition by support
vector machines using wavelet kernel [J],” Journal of
Physics, International Symposium on Instrumentation
Science and Technology, pp. 1264–1267, 2006.
[5] H. Mustafa and M. Doroslovacki, “Digital modulation
recognition using support vector machine classifier [C],”
Proceedings of The Thirty-Eighth Asilomar Conference
on Signals, Systems & Computers, November 2004.
[6] O. A. Dobre, Y. B. Ness, and S. Wei, “Higher-order cyclic
cumulants for high order modulation classification [C],”
IEEE MILCOM, pp. 112–115, 2003.
[7] Z. L. Wu, X. X. Wang, Z. Z. Gao, and G. H. Ren, “Auto-
matic digital modulation recognition based on support
vector machine [C],” IEEE Conference on Neural Net-
works and Brain, pp. 1025–1028, October 2005.
[8] V. Vapnik, “Statistical learning theory [M],” Wiley, 1998.
[9] B. Gou and X. W. Huang, “SVM multi-class classifica-
tion [J],” Journal of Southern Yangtze University, Vol. 21,
pp. 334–339, September 2006.