Secure Cognitive Radio Communication for Internet-of-Things : Anti-PUE Attack Based on Graph Theory

Internet of Things (IoT) paradigm with strong impact on future life will be interconnected through Cognitive Radio Networks (CRNs). CRNs with Ubiquitous trait are highly promising to achieve interference-free and on-demand services. CRs are able to sense the spectral environment, to detect unoccupied bands, and to use them for signal transmissions. This opportunity encourages malicious Users to surpass CRs by Primary User Emulation (PUE) attack and use vacant spectrums. This paper proposes an unsupervised algorithm to distinguish CRs from PUs regardless of static and mobile user. Employing K-means and graph theory are coincident in our algorithm to improve detection outcomes. The edge of graph corresponding to the relation between signals is used and the result of comparison the signal properties is exposed to different clusters. The Receiver Operating Characteristic (ROC) and Detection Error Tradeoff (DET) of our proposed algorithm prove our claim.


Introduction
Internet of Things can provide a ubiquitous network of connected devices and intelligent sensors for smart communications, and big data analytics has the potential to enable the movement for real-time control.IoT capable objects will be interconnected through wired and wireless communication technologies.However, cost-effectiveness issues and accessibility to remote users make wireless communication as a feasible solution [1].The continuous growth of objects and smart sensors led to emerge cognitive capabilities.This emergence may be a suitable option for mitigating the fixed spectrum allocation, the interference in different bands, accessibility problems in every place, large volumes of data collection and transmission, license costs and power consumption.
A Cognitive radio Network (CRN) is a network comprising of CR devices that are equipped with cognitive capability and reconfigurability and built on the software-defined radio (SDR) [2].They can adjust their own transmitter parameters such as power output, frequency, modulation and identify potential impairments to communicate quality in interference, path loss, shadowing and multipath fading.A CR is an intelligent wireless communicator that learns from its radio environment and adapts its internal states for starting or not starting radio transmission depending on the absence of the PU in the band.There are four major tasks for a cognitive radio: spectrum sensing to find vacant channels, spectrum sharing, spectrum mobility and spectrum management.
Due to the aforementioned tasks, a CR requires following enabling techniques: • Bayesian signal processing (e.g.cognitive radar with the availability of a priori information), • Dynamic programming, • Learning machines with feedback (e.g.neural networks), • Game-theoretic models, • Cross layer protocol design.
Nowadays, PU detection is performed using one of the following three methods: transmitter detection, cooperative and interference detection.Four approaches are shown in Figure 1 for transmitter detection, based on the knowledge on the transmitter signals, which are obtained through spectrum sharing.
Cooperative is one of the methods for detecting PUs, in which information from CRs are sent to a common center for decision.The third method is interference detection between transmitter and receiver.Four ways are followed to identify the Malicious User (MU) by transmitter technique: Firstly, Matched Filter [3], the received signal in matched filter is processed by prior knowledge like mod-  ulation.Secondly, Energy Detection [3] is highly recommended for distinguishing the known users, since unknown user utilizes various parameters which can cause errors in real signal detection such as mobility parameters and emulation attacks.Thirdly, Cyclostationary Features [3] introduce the signal features and their effects on signal processing.Finally, Information Theory Criterion (ITC) [4], which uses information theory to detect the PU transmitter.This paper emphasizes graph theory to enhance the accuracy of clustering based on the unsupervised learning methods, while MUs' traits lead to misclustering.
The remainder of this paper is organized as follows.Section 2 describes the K-means algorithm and its challenges for detecting MU.In Section 3, Graph theory is discussed.Section 4 introduces the dependence of signals and then details of the proposed algorithm are explained in Section 5.In Section 6, we survey the accuracy of the proposed algorithm to recognize a malicious user.In Section 7, we analytically evaluate the performance of the K-means method, and our proposed algorithm.Summary of the work and conclusions are presented in Section 8.

Enhanced K-Means Algorithm
In the Equation (1), n opt is the minimum distance among every two distances between two rms values.So, k-means algorithm assigns the received signal to cluster n opt .This procedure is done for any received signal and it is assumed that the features (e.g.rms value) of cluster centers are known by k-means algorithm.
Noise, interference and Rayleigh fading channel may change features of the received signal.To combat this problem, we propose that k-means employs randomly three features of the received signal and find its distance from the three corresponding features of the cluster centers, then decides to which cluster the signal should be assigned.Figure 2 presents clustering of I = 30 received signals into N = 3 clusters, using three features: kurtosis of square of signal, kurtosis of d/dt of the signal, and autocorrelation of the signal [7].In this example, PUn signal is BPSK, 16QAM and GMSK for n = 1, 2, 3, respectively.Cluster centers are shown by "o" in the Figure 2.
As stated by three features, all other kinds of received signals can be clustered.If one of these features causes the same value between clusters, this feature will be replaced randomly with other features of the signal.This procedure may be repeated a number of times until a received signal clustered into one of existing clusters.

Graph Theory
Wide range of wireless standards and relevant devices to communicate in IoT environment [8] can cause complexity in calculation and performance reduc- , , which represents the relation between channels (Figure 3).An edge ).Next section is focused on the edge ε as the relevance of signals.
We consider three standard modulations, namely, BPSK, 16-QAM and GMSK, but any other modulation can be easily included in the scheme.These modulations are representative of CDMA, WiFi and GPRS IoT standards respectively.Other related standards as well as their application in internet of things are exposed in Table 1.

Signals Relevance
The continuous growth of incoming signals means a continuous growth of features.Besides mentioned features which have been implied earlier, some other features like carrier frequency, signal bandwidth, symbol rate, modulation scheme and propagation channel exist as extra features of unknown incoming signals and channels.Increasing these features to the calculation led us to exceedingly complicated analysis.Hence, these recent features are ignored in "blind" channel modeling to cluster signals.In previous section, we discussed about edgeԑ that shows the relevance of two points and consequently relation between two clusters.In this paper, we use common features between two signals like signal to noise amplitude ratio (SNAR), bit error rate (BER) and others that shown in Table 2 as signals relevance.According to the Figure 3, every two signals may have common area which called signal relevance.Good exemplar of this relationship could include signal to interference plus noise ratio (SINR), SNAR, BER, power, data rate, and others in Table 2.A few of these signal relevance are listed in the first column of this Table .For instance, two signals by their own mean (µ) and standard deviation (σ) have common value like SNAR that is calculated at second column in second line.On the other hand, SNAR may be related to the other signal relevance such as BER.Therefore BER can be calculated by SNAR which inserted in third column as Relation.To sum it up the third column is the result of relation between previous signal relevance.These computations with details in deeply manner about relationships can amend the discernment of clustering and mitigate suspicious analyses.
SNAR, BER and Power features can be used for the initial calculation and seeking the degree of membership.: the number of bytes to maximum bytes in a packet; N: Noise Power; m: modulation order; c: carrier power; µ i : the mean of signal i; i σ : standard deviation of signal i.

Proposed Algorithm
Advent of cognition ability will change the future of communications.And bring new requirements of sources.The most proportion of sources will be allocated to data storage and data analyzing.These two factors should be handled by optimized algorithm of analyzing and organized information like that described in.Machine learning and associated branches have a significant role in computing.In this paper, we try to step into optimal modes to reduce the adventure of sophisticated analyzing and heighten the accuracy of malicious detection by k-means clustering.K-means restricted to distance and threshold radius of cluster.But our algorithm by exhibiting a new way tried to escape from k-means' restrictions and its imprecise clustering.Initially, it alludes to thirteen features of signal which encompass Mean, The Standard Deviation (StdDev), Variance-Covariance (VAR), Kurtosis, Kurtosis of Square signal, Kurtosis of Derivation of signal, Skewness, Power, Average Power, StdDev of amplitude, StdDev of angle, Max Autocorrelation of signal and StdDev of absolute value of Phase change (StdDev_abs_PhaseChange).These features are momentous for stochastic chosen name for each axis which their impact have been explained in Section 2. First of all it uses Euclidian metrics (Figure 4) to follow k-means rule until facing with inefficient measure of the distance criterion.Then, the procedure switches to compare the signal features and relevant features respectively.Aforementioned argument the Power of unknown signal will be compared with the average power of nearest cluster as a second stage.Third step dedicated to utilize the comparative features such as Signal-to-Noise Amplitude Ratio (SNAR) and Bit Error Rate (BER) and exploit their privileges to determine malicious user (extra explanation brought in Sections 3 and 4).

Verifing the Proposed Algorithm
Our algorithm attempts to discover patterns of malicious users.In this section we try to confirm whether this algorithm reflects the properties of MUs.The ROC curve is a fundamental tool for evaluating the algorithm's result.In a ROC curve the true positive rate is plotted in function of false rate for different cut-off points.Each possible cut-off point represents the discrimination between malicious cases and normal cases, there will be some cases with the malicious correctly detected as positive (TP = True Positive), but some cases with the malicious will be detected negative (FN = False Negative).On the other hand, some cases without the malicious will be correctly detected as negative (TN = True Negative), but some cases without the malicious will be detected as positive (FP = False Positive).
Our assumption about TN, FP, TP and FN is defined as follows: ( ) In Equation ( 2  the probability of threatening points of cluster n for cluster q.These probabilities can be calculated with following equations: FP : 1 TP : Finally, False Negative fraction: Figure 4 exhibits group of points according to their modulation on three-dimensional coordinate axes (accidentally chosen features) to clarify threatening point.Each group of points has own average which far from other points.If these distances have been introduced by average distance (dn) we can choose threatening point which is farther than dn.
To verify this idea, a point should be compared with others in Power, SNAR, BER, SINR and other signals relevance like what was stated in Algorithm 1.

Simulation Results
We generate 100 random signals corresponding to 100 trials for three clusters with three modulations, BPSK, 16QAM and GMSK.While the properties of these signals are: 1) α = 0.5, 2) R s = 256 kb/s for one level modulation, 3) E b /N 0 = 10 and AWGN fading channel with 320 max doppler frequency.The ROC curve describes the detection performance of proposed and k-means algorithm.As can be seen from Figure 5, Proposed bipartite graph moved cut-off point (blue circle point) from (79, 21) to (83,17).Validation results in this point show that proposed algorithm can detect PUE attack nearly 15 percent more efficient than the traditional K-means.
Two types of errors may be occurred during detection process.The first type is related to inaccurate sense occupied bandwidth by CR when PU is present.In this case, the subsequent transmissions of CRs will cause interference to the PU.The second type arises from inaccurate sense occupied bandwidth by CR when PU is absent.In addition to exploiting the transmission opportunities and use spectrum more efficiently, the cognitive structure needs to minimize the risk of malicious activities and restrict the probability of incorrect attack detection.
sig + and do stage 4 to 6 then continue.

T d d
then go to stage 28.

24) Else
( ) An alternative to the ROC curve is the Detection Error Tradeoff (DET) graph, which plots the false negative rate (missed detections) vs. the false positive rate (false alarms) on non-linearly transformed x-and y-axes.Figure 6 demonstrates (1) that the proposed algorithm can identify malicious user with lower error probability in detection.

Conclusion
The new field of unsupervised learning in cognitive radios with a special emphasis on unique aspect of these radios-spectrum sensing provides novel opportunities to cluster.We strengthen the accuracy clustering while a malicious user wants to be a PU and opportunistically occupies licensed bandwidth.As claimed by the rules of graph-based, the edge of each cluster is defined by the relation between signals (Table 1).In each iteration or adding new signal as a point, besides comparison the distance between a point and the center of clusters, the value of signal properties like SNAR, BER and etc. should be compared with other points simultaneously.Despite highly accuracy, cognitive system may encounter data flood during reconfiguration.This problem has a fundamental role in operation phase and causes utilizing elaborate data center or separate layer for analyzing data and algorithm.The future research of this paper can focuses on optimizing algorithm and accelerating procedure.

Figure 2 .
Figure 2. K-means clustering exhibition of signals with three modulations, cluster centers are shown by blue "o".
). SIG1 is the set of 1 I′ signals that belong to cluster "1" ( ) and SIG2 is the set of 2 I′ signals that belong to cluster "2" (

R
S : Symbol Rate; T 0 : Symbol Period (1/R S ); K: number of bits per symbol ( 2 log M ); α: pulse shape filter roll-off factor (in this scheme, α = 0.5); TX RX T T : amount of transmitting time to receiving in a TDD sys; max L L

Figure 4 .
Figure 4. Defined distance between points and average in one cluster.
), L presents legal point and Tr is a representative of the threatening point.Each point falls into two categories, firstly, legal affiliation to CR, secondly, dependence on malicious users.Both of aspects have at least one acceptance value for signals relevance.
is the probability of threatening points for cluster n and ( ) nq P N′ is go to stage 15.27) Announce1 i sig + is a malicious user.28) Input another signal 1 i sig + = SIG go to stage 11.

False
Alarm probability (in %)Miss probability (in %) Our paper is based on feature extraction for clustering received signals to detect MUs.In this scheme, we mainly focus on the security problem from PUE attacks in CRN.In this attack MUs emulate the PUs' transmitting signal and mislead the legitimate SUs to give up the spectrum band.
For instance, if a MU is to be recognized as a PU, a TV broadcaster with specific modulation, rate and power, should transmit the same signal with exactly PU's features.This scenario would not be endured for a long time because MU wants to transmit its own data and features.In recent approaches, MUs have been de- For instance, k-means algorithm considers rms values of Journal of Computer and Communicationseach received signal denoted by rms s for clustering them.Imagine that it declares the rms values of cluster centers by rms n

Table 1 .
Application of modulations in IoT.

Table 2 .
Ratio features and their relevance.