Investigation on Analog and Digital Modulations Recognition Using Machine Learning Algorithms
1. Introduction
Automatic Modulation Recognition or Classification (AMR/AMC) is a technique for classifying or identifying the type of modulation of a detected signal and for estimating certain signal parameters such as power, symbol rate, and carrier frequency.
In the last decade, advanced analog and digital modulation techniques have been used in communication systems with the aim of improving spectrum efficiency and transmission reliability [1]. In analog communication systems, a transmission signal is encoded using analog modulations such as amplitude modulation (AM), phase modulation (PM), and frequency modulation (FM). Typically, an analog modulation technique encodes an analog baseband signal onto a high-frequency periodic waveform. In digital communication, the signals are digitized, and the information is then transferred through these digitized signals from source to destination. Digital modulations are preferable in practice, thanks to better compatibility with digital data and stronger robustness against interference compared with analog modulations [2]. Different types of digital modulation are employed; these include amplitude shift keying (ASK), phase-shift keying (PSK), frequency shift keying (FSK), pulse amplitude modulation (PAM), amplitude and phase-shift keying (APSK), and quadrature amplitude modulation (QAM). Given the growing number of analog and digital modulations used in civil and military applications, modulation recognition is important: recognition of the modulation type is an intermediate step between signal detection and signal demodulation [2]-[4]. It is a critical step at the receiver, as it allows selection of the corresponding demodulator. The use of machine learning techniques has made the recognition process easier as well as more reliable. Hence, many works have attempted to apply Machine Learning (ML) techniques to AMC [2]. Hong and Ho [5] used a Bayes method to classify BPSK and QPSK without a priori information about the received signal level. Wong et al. [6] introduced a minimum distance classifier to reduce the complexity of machine learning classifiers; they also used a blind source separation algorithm. Swami and Sadler [7] proposed a method for digital modulation classification using fourth-order cumulants; their method was robust to the presence of carrier phase and frequency offsets. The authors in [8] used higher-order statistics for blind channel estimation and pattern recognition, and presented modulation classification results in the presence of a fading channel. Headley et al. [9] used a two-stage process for AMC: in the first stage, local radios make individual decisions, which are then sent to a fusion center that makes a global decision about the modulation type. Wong and Nandi [10] used artificial neural networks (ANN) and genetic algorithms for the recognition of various digital modulation types, employing multi-layer perceptrons (MLP) trained with back-propagation and with resilient back-propagation (RPROP). They also used a Bayes method with higher-order cumulants for classification of BPSK, QPSK, 16QAM and 64QAM.
Hassanpour et al. [11] proposed a pattern-based AMC approach. The authors focused on feature extraction blocks and digital modulation pattern identification under AWGN, considering BASK, BFSK, BPSK, 4-ASK, 4-FSK, QPSK, and 16-QAM. To extract the attributes, the signals are analyzed in the time, frequency, and wavelet domains, and a binary SVM-based hierarchical structure is used to resolve the multi-class problem. Simulations show how the suggested characteristics enhance digital signal differentiation in a noisy, low-SNR environment: the authors obtained a 98.15 percent accuracy rate at an SNR of −10 dB, which they report as the minimum SNR required for reliable identification.
In their studies, Ansari et al. [12] and Jajoo et al. [13] proposed a modulation recognition and separation method that selects useful features from the modulated signal to identify the modulation type using decision tree and probabilistic neural network methods. The studies were conducted using MATLAB simulations over an additive white Gaussian noise channel at various signal-to-noise ratios. The results showed that picking useful features and establishing tuning parameters significantly improved modulation type recognition accuracy and speed.
With the development of machine learning, more and more researchers have begun to use supervised learning to improve the efficiency of signal recognition [14]. Daldal et al. used a deep LSTM model to identify six modulation types of signal sequences and achieved significant results [15]. Cheol-Sun Park used a Support Vector Machine (SVM) to improve recognition performance [16]. Timothy J. O’Shea [17] proposed a convolutional neural network to solve the modulation recognition problem.
In several recent works, many types of datasets have been used to evaluate the performance of machine-learning algorithms for modulation classification. A. Jagannath and J. Jagannath [18] proposed a synthetic wireless waveform dataset suited for modulation recognition and wireless signal (protocol) classification tasks, separately as well as jointly. Cheng et al. [19] shared a dataset consisting of four different modulation schemes and varying levels of AWGN and fading. They evaluated the performance of several deep learning algorithms on this dataset and found that Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) achieved the highest classification accuracy. Alam et al. [20] used a dataset involving eight modulation schemes and varying levels of Additive White Gaussian Noise (AWGN); the performance of several machine learning and deep learning algorithms was investigated on this dataset, and deep learning algorithms were found to outperform traditional machine learning algorithms in high-noise environments. Table 1 presents a summary of related works.
Table 1. Summary of related works.
Reference | Modulations Studied | Method Used | Limitations
[5] Hong and Ho | BPSK, QPSK | Bayes method | Lacks consideration of varying SNR conditions.
[6] Wong et al. | Digital modulations: ASK, PSK, FSK, QAM | Minimum distance classifier, blind source separation | Complexity reduction impacts recognition accuracy in noisy environments.
[7] Swami and Sadler | Digital modulations: ASK, PSK, FSK, QAM | Fourth-order cumulants | Susceptible to carrier phase and frequency offsets in high-noise environments.
[9] Headley et al. | Multiple digital modulations | Two-stage AMC process | Relies heavily on local decisions, potentially reducing robustness in diverse conditions.
[10] Wong and Nandi | BPSK, QPSK, 16QAM, 64QAM | ANN, genetic algorithms, Bayes method | Can be computationally intensive, impacting real-time application feasibility.
[11] Hassanpour et al. | BASK, BFSK, BPSK, 4-ASK, 4-FSK, QPSK, 16-QAM | Binary SVM-based hierarchical structure | May struggle with feature extraction in extremely low SNR environments.
[12] Ansari et al. | Various digital modulations | Decision tree, probabilistic neural network | Performance highly dependent on feature selection and tuning parameters.
[15] Daldal et al. | Six modulation types | Deep LSTM model | Requires extensive training data; computational resources may limit deployment.
[16] Cheol-Sun Park | Various digital modulations | Support Vector Machine (SVM) | May not generalize well across all modulation types without extensive tuning.
[17] Timothy J. O’Shea | Various digital modulations | Convolutional Neural Network (CNN) | High computational complexity may limit real-time application, especially in noisy channels.
In this paper, we therefore analyze modulation schemes and experiment with their recognition using the KNN and ANN automatic approaches to classify ten types of modulation. The studied modulations include: AM_DSB_FC, AM_DSB_SC, AM_USB, AM_LSB, FM, MPSK, 2PSK, MASK, 2ASK and MQAM.
This study is a continuation of the works mentioned above on AMR using machine learning algorithms. Its objective is to contribute to the development and implementation of reliable and robust classification algorithms for both digital and analog modulations.
2. Approach Description
The aim of this paper is to design and implement an automatic signal classifier according to the type of modulation. This implies a system compatible with communication intelligence systems, where the receiver is able to automatically detect the modulation pattern of the signal it receives using Modulation Recognition Algorithms (MRA), without any prior knowledge of the transmitted signal. The purpose of automating modulation recognition is to analyze signals more quickly and to extract information that the algorithm can use to distinguish the modulation type. The approach used in this study has two main steps: the extraction of characteristics of the received or intercepted signal, which forms the input of the classifier, and the classification process itself, involving a set of data for training, validation and testing of the classifier.
2.1. Features Extraction for AMC
The recognition system is based on pattern recognition, so it requires suitable features for the process. Regarding feature extraction, Nandi and Azzouz used instantaneous features for the classification of both analog and digital signals in [21], which is the most representative work in the field of AMC. In this work, nine features are taken into consideration, derived from the instantaneous phase φ(t), the instantaneous amplitude a(t) and the instantaneous frequency f(t) of the signal. They are divided into two categories: spectral characteristics and statistical characteristics. Most of them, proposed by Nandi and Azzouz in [21], are described as follows:
Maximum power spectral density value (γmax):
$\gamma_{\max} = \max \mathrm{PSD}\left( A_{cn}[n] \right)$ (1)
$\mathrm{PSD}\left( A_{cn}[n] \right) = \frac{1}{N}\left| \mathrm{FFT}\left( A_{cn}[n] \right) \right|^{2}$ (2)
where:
N is the number of samples and $A_{cn}[n]$ is the centred-normalized instantaneous amplitude at the time instants $t = n/f_e$, defined by:
$A_{cn}[n] = \frac{A[n]}{\mu_A} - 1$ (3)
$\mu_A$ is the average value of the instantaneous amplitude evaluated on a segment:
$\mu_A = \frac{1}{N}\sum_{n=1}^{N} A[n]$ (4)
knowing that $A[n] = a[n]$, the sampled instantaneous amplitude.
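As an illustration, this feature can be computed directly from Equations (1)-(4). The MATLAB sketch below uses our own variable names (x is an arbitrary received signal segment) and the hilbert function from the Signal Processing Toolbox cited in Section 3.1; it is a minimal sketch, not the original implementation:

```matlab
% Minimal sketch: gamma_max from a signal segment x (real-valued vector).
a    = abs(hilbert(x));                    % instantaneous amplitude a[n]
muA  = mean(a);                            % Equation (4)
acn  = a / muA - 1;                        % Equation (3): centred-normalized
gmax = max(abs(fft(acn)).^2) / numel(acn); % Equations (1)-(2)
```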
Standard deviation of the absolute value of the instantaneous phase (σap):
$\sigma_{ap} = \sqrt{ \frac{1}{N_C}\sum_{A_n[n] > A_t} \phi_{NL}^{2}[n] - \left( \frac{1}{N_C}\sum_{A_n[n] > A_t} \left| \phi_{NL}[n] \right| \right)^{2} }$ (5)
where:
$\phi_{NL}[n]$ is the nonlinear component of the instantaneous phase of the n-th sample at the time instants t = n/fe (n = 1, 2, 3, …, N);
$N_C$ is the number of samples meeting the condition $A_n[n] > A_t$;
$A_n[n]$ is the normalized instantaneous amplitude, and $A_t$ is the threshold value for A[n] below which the phase evaluation is very sensitive to noise; it filters out low-amplitude signal samples.
Standard deviation of the direct value of the instantaneous phase (σdp):
$\sigma_{dp} = \sqrt{ \frac{1}{N_C}\sum_{A_n[n] > A_t} \phi_{NL}^{2}[n] - \left( \frac{1}{N_C}\sum_{A_n[n] > A_t} \phi_{NL}[n] \right)^{2} }$ (6)
Spectral symmetry (P)
This feature is based on the spectral powers of the lower and upper sidebands of the modulated signal. The characteristic key is defined as:
$P = \frac{P_L - P_U}{P_L + P_U}$ (7)
with
$P_L = \sum_{n=1}^{f_{cn}} \left| X_c(n) \right|^{2}$ (8)
$P_U = \sum_{n=1}^{f_{cn}} \left| X_c(n + f_{cn} + 1) \right|^{2}$ (9)
$P_U$: upper sideband spectral power;
$P_L$: lower sideband spectral power;
$X_c(n)$: the Fast Fourier Transform of the modulated signal;
$(f_{cn} + 1)$ is the sample number corresponding to the carrier frequency $f_c$; for N samples and a sampling rate $f_s$, it is defined by:
$f_{cn} = \frac{f_c N}{f_s} - 1$ (10)
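Assuming the carrier frequency fc and sampling rate fs are known, as Equation (10) requires, a minimal MATLAB sketch of this computation could read as follows; variable names are illustrative, and the segment is assumed long enough that 2*fcn + 1 does not exceed its length:

```matlab
% Sketch: spectral symmetry P of a signal segment x (Equations (7)-(10)).
Xc  = fft(x);                            % spectrum of the modulated signal
fcn = round(fc * numel(x) / fs) - 1;     % Equation (10)
PL  = sum(abs(Xc(1:fcn)).^2);            % lower sideband power, Eq. (8)
PU  = sum(abs(Xc(fcn+2:2*fcn+1)).^2);    % upper sideband power, Eq. (9)
P   = (PL - PU) / (PL + PU);             % Equation (7)
```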
Standard deviation of the absolute value of the centred-normalized instantaneous amplitude (σaa):
$\sigma_{aa} = \sqrt{ \frac{1}{N}\sum_{n=1}^{N} A_{cn}^{2}[n] - \left( \frac{1}{N}\sum_{n=1}^{N} \left| A_{cn}[n] \right| \right)^{2} }$ (11)
Standard deviation of the absolute value of the centred-normalized instantaneous frequency (σaf):
$\sigma_{af} = \sqrt{ \frac{1}{N_C}\sum_{A_n[n] > A_t} f_{N}^{2}[n] - \left( \frac{1}{N_C}\sum_{A_n[n] > A_t} \left| f_{N}[n] \right| \right)^{2} }$ (12)
where $f_N[n]$ is the centred-normalized instantaneous frequency.
Standard deviation of the centred-normalized instantaneous amplitude (σa):
$\sigma_{a} = \sqrt{ \frac{1}{N}\sum_{n=1}^{N} A_{cn}^{2}[n] - \left( \frac{1}{N}\sum_{n=1}^{N} A_{cn}[n] \right)^{2} }$ (13)
2.1.1. Statistical Characteristics
Two statistical parameters are used in this study: the kurtosis of the normalized-centered instantaneous amplitude and the kurtosis of the normalized-centered instantaneous frequency.
Kurtosis of the normalized-centered instantaneous amplitude ($\mu_{42}^{a}$), defined as:
$\mu_{42}^{a} = \frac{E\left\{ A_{cn}^{4}[n] \right\}}{\left( E\left\{ A_{cn}^{2}[n] \right\} \right)^{2}}$ (14)
Kurtosis of the normalized-centered instantaneous frequency ($\mu_{42}^{f}$):
$\mu_{42}^{f} = \frac{E\left\{ f_{N}^{4}[n] \right\}}{\left( E\left\{ f_{N}^{2}[n] \right\} \right)^{2}}$ (15)
2.1.2. Process of Feature Extraction
The process of feature extraction uses transformations such as the Hilbert transform and the forward and inverse Fourier transforms. The signal is filtered before feature extraction by means of a convolution. The alternative, filtering by multiplication in the Fourier domain, would require recovering the phase and magnitude at the output of the IFFT (Inverse Fast Fourier Transform); the magnitude found there is equivalent to that of the filtered input of the FFT (Fast Fourier Transform).
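A minimal MATLAB sketch of this step, assuming a real-valued segment x sampled at fs (our notation, not the original implementation), is:

```matlab
% Analytic signal via the Hilbert transform, then the three instantaneous
% quantities from which the nine features of Section 2.1 are derived.
z   = hilbert(x);               % analytic signal
a   = abs(z);                   % instantaneous amplitude a[n]
phi = unwrap(angle(z));         % unwrapped instantaneous phase
f   = diff(phi) * fs / (2*pi);  % instantaneous frequency f[n]
% Example feature: kurtosis of the centred-normalized amplitude, Eq. (14).
acn   = a / mean(a) - 1;
mu42a = mean(acn.^4) / mean(acn.^2)^2;
```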
2.2. Classification Algorithm Analysis
In this work, two classification algorithms were used: the KNN and ANN. The K-Nearest Neighbor (KNN) is known as a simple but robust classifier and is capable of producing high performance results even for complex applications [22]. In the KNN algorithm, the training phase corresponds to a knowledge induction of several modulation samples, stored in a database called training data.
For the test and validation phases, when a new signal from the test sample is presented, it is compared with all signals from the training phase. Logically, the smaller the distance between the features of two modulations, the more similar they are. Thus, the recognition result is the modulation from the training base that is closest to the newly presented modulation. The comparison is done by calculating the Euclidean distance between the test modulation (vector) and the modulations (vectors) from the training base using Equation (16) below.
$d = \sqrt{ \sum_{j} \left( x_j - y_j \right)^{2} }$ (16)
The index j runs over the nine feature positions of the test vector x and the training vector y.
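A hedged MATLAB sketch of this decision rule follows; trainX (one nine-feature row per training sample), trainY (class labels 0 to 9) and the test vector v are placeholder names, not objects from the original implementation:

```matlab
% 1-nearest-neighbour decision by Euclidean distance (Equation (16)).
d = sqrt(sum((trainX - v).^2, 2));   % distance to every training vector
[~, idx]  = min(d);
predicted = trainY(idx);

% Equivalent decision for arbitrary K with the built-in classifier:
mdl       = fitcknn(trainX, trainY, 'NumNeighbors', 1, 'Distance', 'euclidean');
predicted = predict(mdl, v);
```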
The artificial neural network (ANN) used for this classification problem is a Multi-Layer Perceptron (MLP). The MLP algorithm is a computational system inspired by biological neural networks that use a network of functions to understand and transform a wide range of input data into the desired output [23]. ANN minimizes errors for nonlinear inputs and can obtain relationships between inputs and outputs without complex mathematical equations.
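As a sketch only, a 9-60-10 MLP consistent with the configuration reported in Section 4 can be assembled with the pattern-recognition tooling listed in Section 3.1 (toolbox defaults are kept for the activation functions; X and T are assumed to be a 9 x N feature matrix and a 10 x N one-hot target matrix):

```matlab
net = patternnet(60);               % one hidden layer with 60 neurons
net.divideParam.trainRatio = 0.70;  % 70/15/15 split, cf. Section 3.2
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net = train(net, X, T);             % supervised training
Y   = net(X);                       % class scores; [~, c] = max(Y) gives labels
```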
3. Materials and Methods
In this section, we outline the tools, datasets, machine learning model architectures, and hyperparameters used in our investigation of analog and digital modulation recognition.
3.1. Tool
Experiments and simulations of the different algorithms were carried out using MATLAB software, version R2019a. In addition to the basic part of the software, several toolboxes were used, such as the Neural Network Toolbox (nntool), the Neural Pattern Recognition Toolbox (nprtool), nntraintool and the Signal Processing Toolbox.
3.2. AMC_Master Dataset
The AMC_Master dataset, introduced by J. Visser and A. Farquharson [24], was used for this study. The distribution of the AMC_Master dataset across training, testing and assessment is as follows: 70% of the dataset corresponds to training data, 15% to test data and the remaining 15% to system assessment. The sample distribution is depicted in Table 2.
Table 2. Distribution of modulations in the AMC_Master evaluation set.
Modulations | Class | Number | Percentage (%)
AM_DSB_FC | 0 | 3602 | 6.1781
AM_DSB_SC | 1 | 3733 | 6.4028
AM_USB | 2 | 4111 | 7.0512
AM_LSB | 3 | 4785 | 8.2072
FM | 4 | 7294 | 12.5107
MPSK | 5 | 5554 | 9.5262
2PSK | 6 | 7159 | 12.2779
MASK | 7 | 7284 | 12.4936
2ASK | 8 | 7374 | 12.6479
16QAM | 9 | 7406 | 12.7028
Total | 10 | 58,302 | 100
3.3. Model Architectures and Hyperparameter Tuning
We explore K-Nearest Neighbors (KNN) and Artificial Neural Networks (ANN) to determine the most effective approach for our task. For KNN, we optimize the number of neighbors and distance metrics to enhance classification accuracy. For ANN, we adjust hyperparameters such as the number of hidden layers, neurons per layer, learning rate, and batch size. Through systematic tuning and cross-validation, we aim to achieve a model configuration that provides both high accuracy and computational efficiency. Table 3 provides a concise overview of key parameters for both KNN and ANN models, along with example values for each.
Table 3. Summary of model architectures and hyperparameters for KNN and ANN.
Model | Parameter | Description | Values
KNN | Number of Neighbors | The number of neighbors considered for classification | 1, 2, 3, 4, 5
KNN | Distance Metric | The distance metric used to calculate similarity | Euclidean
ANN | Neurons per Layer | The number of neurons in the hidden layer | 50, 60, 70, 120, 160
ANN | Activation Function | The function used to introduce non-linearity | ReLU, Softmax
ANN | Learning Rate | The rate at which the model learns during training | 0.001
ANN | Batch size/Data division | The number of samples processed before updating the model | Random
ANN | Epochs | The number of times the entire dataset is passed through the network | 1000
For KNN, the number of neighbors is tested from 1 to 5, using Euclidean distance. For the ANN, neurons per layer range from 50 to 160, with activation functions including ReLU and Softmax, and a fixed learning rate of 0.001 over 1000 epochs.
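The K sweep, for example, can be expressed as the following hedged sketch; trainX/trainY and valX/valY are assumed feature/label splits, and the retained K is the one with the best validation accuracy:

```matlab
bestAcc = 0;
for K = 1:5
    mdl = fitcknn(trainX, trainY, 'NumNeighbors', K, 'Distance', 'euclidean');
    acc = mean(predict(mdl, valX) == valY);  % validation accuracy for this K
    if acc > bestAcc, bestAcc = acc; bestK = K; end
end
```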
3.4. Performance Metrics
The performance of the ANN and KNN models was evaluated using ROC (Receiver Operating Characteristic), Accuracy, and Precision [25]-[27]. ROC assesses the model’s ability to distinguish between classes, providing insights into the trade-off between true positive and false positive rates [28]. Accuracy measures the overall correctness of the model, indicating the proportion of correctly classified instances. Precision focuses on the model’s ability to correctly identify positive instances, reflecting its reliability in positive class predictions [29].
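For reference, accuracy and per-class precision can be read directly off a confusion matrix; the sketch below assumes numeric label vectors trueY and predY and follows the MATLAB convention that rows index true classes and columns index predicted classes:

```matlab
C         = confusionmat(trueY, predY);  % K x K confusion matrix
accuracy  = sum(diag(C)) / sum(C(:));    % overall proportion correct
precision = diag(C)' ./ sum(C, 1);       % per-class precision (column-wise)
```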
To complete our investigation, a number of signals were generated and simulated at different Signal-to-Noise Ratio (SNR) levels. The key characteristics extracted by the algorithm are represented as vectors that form the inputs of the KNN and ANN classifiers. Each model is first created and trained on a set of training data, and is finally tested and used with noisy signal segments. Through repeated experiments, we observed how classifier performance scales across the different approaches and validated the robustness of the key characteristics with respect to noise.
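A compact sketch of this evaluation loop is given below; extract_features, knnModel and annModel are placeholders for the feature extractor of Section 2.1 and the two trained classifiers, and the SNR grid is purely illustrative:

```matlab
for snr = 0:5:20
    xn = awgn(s, snr, 'measured');  % noisy copy of a generated signal s
    v  = extract_features(xn);      % 1 x 9 feature vector (Section 2.1)
    yK = predict(knnModel, v);      % KNN decision
    yA = annModel(v');              % ANN class scores
end
```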
4. Results and Discussions
In this section, the performance of both KNN and ANN algorithms is evaluated by presenting the confusion matrix with corresponding plots and analyzing the extracted features from the signal. Finally, the implementation complexity and processing speed of these methods are compared.
Figure 1 displays the confusion matrix, also known as the percentage matrix, for the KNN model. From the confusion matrix, it can be seen that more than 7 classes are classified with more than 50% accuracy. The best result gives a precision of 73.98% and an accuracy of 73.17%, obtained for K = 1. Varying the number of neighbors shows that as the number of neighbors increases, the performance of the model weakens: the accuracy (percentage of good detection overall) and the precision (percentage of detection for a specific class) both decrease, as depicted in Table 4. The best recognition rate obtained with KNN is thus close to 74%.
Figure 1. Confusion matrix of the KNN model.
Table 4. Variations in the performance of the KNN model as a function of K.
K | 1 | 2 | 3 | 4 | 5
Precision | 73.98 | 60.21 | 52.01 | 47.13 | 45.25
Accuracy | 73.17 | 59.14 | 51.4 | 47.18 | 45.44
Figure 2 displays the confusion matrix of the ANN algorithm. For optimizing the number of neurons in the hidden layer, a neural network with 9 input layer neurons, 60 hidden layer neurons and 10 output layer neurons was used. The training time was 3 minutes 41 seconds.
Figure 2. Confusion matrix of the ANN model.
After several experiments on the variation of the number of neurons, Table 5 shows that the best rate, 90.5%, is obtained with 60 neurons.
Table 5. Variations in the performance of the ANN model as a function of the number of neurons N.
N | 15 | 50 | 60 | 70 | 120 | 160
Precision | 89.3 | 90.2 | 90.5 | 89.6 | 88.7 | 83.2
The best recognition rate obtained with KNN is 74%. The KNN learning time was always long, but short compared with the neural method. The best recognition rate, on the other hand, goes to the ANN with 90.5%. The ANN is therefore the preferred method for deriving the modulation type, compared with the KNN method.
Classification by the K-nearest neighbor (KNN) algorithm is a simple and efficient approach, but improvement in classifier performance is always attractive. Its learning time is sometimes long, yet always short compared with the neural method. Using neural networks has made the recognition process easier and more reliable, and combining multiple classifiers is an effective technique for improving accuracy.
4.1. Using the Model Obtained
As described above, in order to exploit the trained machine learning models, some modulated signals were generated with added white Gaussian noise. Using the parameter extraction algorithm, we computed the features representing the input elements of the classifiers. Once the vector of 9 features is obtained, it is injected into the classifiers and the results are checked.
4.1.1. Generation of Modulated Signals
The parameters below can be used to generate an analogue signal:
Fm: Frequency of the modulating signal;
Am: Amplitude of the modulating signal;
Fp: Carrier frequency;
Ap: Amplitude of the carrier;
Fe: Sampling frequency.
The following values are considered: Fm = 1; Fc = 20; Fp = 200; Am = 15; Ap = 10; Fe = 4 * Fp * Fm.
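For illustration, signal 1 (FM) can be generated with these parameters as in the sketch below; the signal duration, the frequency sensitivity kf and the use of the Communications Toolbox function awgn are our assumptions:

```matlab
Fm = 1; Fp = 200; Am = 15; Ap = 10; Fe = 4 * Fp * Fm;
t  = 0:1/Fe:2;                                    % 2 s of signal (assumed)
m  = Am * cos(2*pi*Fm*t);                         % modulating signal
kf = 5;                                           % frequency sensitivity (assumed)
s  = Ap * cos(2*pi*Fp*t + 2*pi*kf*cumsum(m)/Fe);  % FM signal
x  = awgn(s, 10, 'measured');                     % add noise at SNR = 10 dB
```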
Figure 3 shows the modulated and noisy FM signal with SNR = 10 dB; the single-sideband modulated signal is shown similarly in Figure 4. The QAM modulated signal (signal 3) with M = 4 and SNR = 10 dB is displayed in Figure 5.
4.1.2. Extraction of Features
Once the signals have been generated, the characteristic parameters are extracted. Table 6 depicts the features extracted from signals 1, 2 and 3.
From these data, we can see that for signal 1 (FM) the amplitude variations σa and σaa are zero, while for signal 3 the first element indicates a strong amplitude compactness. Signal 2 (AM_LSB) is better positioned here, as its value of γmax indicates a strong amplitude membership.
Figure 3. Signal 1: FM modulated signal with SNR = 10 dB.
Figure 4. Signal 2: AM_LSB modulated signal with SNR = 10 dB.
Figure 5. Signal 3: QAM modulated signal with M = 4 and SNR = 10 dB.
Table 6. Result of extracting features from signals 1, 2 and 3.
Features | Signal 1 | Signal 2 | Signal 3
$\mu_{42}^{a}$ | 1.94 | 1.19 | 1456.97
σaf | 9.7e−11 | 6.7e−10 | 0.043
σdp | 3.97 | 4.29 | 4.01
σap | 6.5 | 7.08 | 6.656
γmax | 508.9 | 4.06e5 | 3605.9
P | −0.9 | −0.99 | −0.998
σa | 0 | 0.16 | 21.80
$\mu_{42}^{f}$ | 1.74 | 1.83 | 2.403
σaa | 0 | 0.11 | 21.808
4.1.3. Classification
The parameters thus extracted are injected into various classifiers and the results obtained are shown in Figures 6-8 below.
Figure 6. Classification result for signal 1 with KNN.
Figure 7. Classification result for signal 2 with ANN.
Figure 8. Classification result for signal 3 with ANN.
From these results, we can observe that signal 1 (FM) was very well classified, at 100%; note, however, that the outputs of the KNN classifier are bivalent (0 or 1), so the probability of error is very high.
Signal 2 LSB is classified as such with a dominant percentage of more than 42%.
Signal 3 (QAM), on the other hand, is misclassified as ASK. Going back to the characteristics, this result is predictable, as its γmax element is very dominant.
4.1.4. Impact of SNR Conditions on Model Robustness
The current study’s findings indicate that higher noise levels make modulation detection more challenging, which impacts model performance. This observation aligns with previous research that underscores the detrimental effects of noise on modulation recognition. For instance, Zhou demonstrated that noise significantly degrades the accuracy of modulation classifiers, highlighting the necessity for robust algorithms capable of handling noisy environments [30]. Liu et al. [31] showed that the recognition rate is not ideal when the signal-to-noise ratio is low. Similarly, Hu emphasized that effective modulation detection requires strategies to mitigate the impact of noise and interference [32]. Jader [33] examined how various machine learning algorithms, including Decision Trees, Random Forests, SVM and KNN, perform in recognizing four different modulation types (PSK, QPSK, AM, Morse) under Signal-to-Noise Ratios (SNRs) ranging from −10 dB to +25 dB, and demonstrated that as the SNR increases, the performance of machine learning models in signal modulation recognition generally improves. This trend highlights the models’ ability to better distinguish signal features when the noise is reduced. However, in low SNR conditions, where noise dominates, model performance declines, particularly for models less robust to noise, such as Random Forest. When the SNR is lower than 10 dB, and particularly around 5 dB, model performance tends to stabilize, showing little improvement as the noise remains dominant.
4.1.5. Comparison of Models Obtained
To evaluate the performance of the proposed system, a training base of 58,200 signal samples was used, each characterized by nine attributes, with two training techniques: KNN and ANN. The hold-out method is applied when evaluating the machine learning models, with the training database corresponding to 70% of the global dataset.
In this work, the KNN model was trained on the same data under different K values, and the K value giving the best performance on the validation data was retained. The best recognition rate obtained with KNN is 74%. Its learning time was still long, but short compared with the neural method. The best recognition rate, on the other hand, was achieved by the ANN with 90.5%. The ANN is therefore a more reliable method than KNN for predicting the type of modulation. The performance of each approach is analyzed in Table 7.
Table 7. Analysis of the performance of the different approaches (best recognition results obtained).
Models | Settings/Features | Recognition rate (%) | Learning time | Testing time | Complexity
KNN | 9 | 74.7 | Long (seconds) | Very short (a few milliseconds) | Low
ANN | 9 | 90.5 | Very long (minutes/hours) | Very short (a few milliseconds) | High
5. Conclusion
The contribution of this work is the development of analog and digital modulation recognition based on ANN and KNN, which can automatically recognize a broad range of modulation schemes. Using the AMC_Master dataset, we chose a characteristic-based automatic modulation classification scheme for ten analog and digital modulation signals: AM_DSB_FC, AM_DSB_SC, AM_USB, AM_LSB, FM, MPSK, 2PSK, MASK, 2ASK and MQAM. From a statistical and spectral analysis of the signals, nine key differentiating characteristics are extracted and used as input vectors for each trained model. Simulation results, including a recognition rate of 90.5% for the ANN method with one hidden layer (sixty neurons), highlight this algorithm's effectiveness in accurately predicting the modulation type on the test dataset. These results indicate that the proposed ANN algorithm has successfully captured the distinguishing features and patterns of the different modulation schemes, enabling it to make precise predictions. To improve on this result, future work should explore more advanced techniques, such as convolutional neural networks and reinforcement learning, as potential methods to address the current study's limitations and improve model performance across a wider range of real-world modulation scenarios. Our next step is to develop and evaluate methods that enhance recognition accuracy under high-noise conditions to improve overall robustness. Various channel types, including Rayleigh and Rician fading, can also be investigated in the future to extend this work.