A Hybrid Intrusion Detection System for Smart Home Security Based on Machine Learning and User Behavior

With technology becoming ever more present in people’s lives, smart homes are increasing in popularity. A smart home system controls lighting, temperature, security camera systems, and appliances. These devices and sensors are connected to the internet, so they can easily become the target of attacks. To mitigate the risk of using smart home devices, their security and privacy mechanisms must be intelligent enough to adapt to user behavior and environments. The security and privacy systems must accurately analyze all actions and predict future actions to protect the smart home system. We propose a Hybrid Intrusion Detection (HID) system that combines machine learning algorithms, including random forest, XGBoost, decision tree, and K-nearest neighbors, with a misuse detection technique.


Introduction
The Internet of Things (IoT) is a commonly used term for a concept that incorporates technology and devices for networking. This idea encompasses Machine-to-Machine (M2M), Wireless Sensor Networks (WSN), and Low Power Wireless Personal Area Networks (LoWPAN) communications, as well as technologies such as Radio-Frequency Identification (RFID) [1] [2]. Ultimately, the goal of the IoT is to enable these devices to communicate with other devices using Internet communication protocols. However, despite having limited resources, most developers of IoT devices such as smart TVs, smart watches, and smart lights attempt to add additional capabilities. IoT technology has quickly been incorporated into the development of smart home systems. Smart home systems are designed using sensor technologies; several devices are linked to a specific network where they can be easily operated and monitored [1]. In addition to personal computers and smartphones, objects such as coffee makers and air conditioners have recently begun linking to the internet, hence the term IoT. Customers can access relevant data from embedded applications while using a smartphone, tablet, or AI speaker to operate IoT devices. One major example is Google Home [4]. The likelihood of these products becoming the target of cyber attacks grows as the variety of devices connected to the network increases [5] [6]. In fact, direct attacks and viruses targeting IoT devices have already been identified [4] [7]. These threats may be detected using techniques that compare hacking behavior with valid use [8].
Because smart home systems allow various electronic devices, such as security cameras, to be accessed remotely through the internet, attackers can exploit their vulnerabilities to steal personal information and breach the privacy of smart home users. These security violations include eavesdropping on communication inside and outside the house through the wireless and Internet technologies involved, while security cameras may be compromised to expose the activities of a smart home user [9]. Such violations of security and privacy can threaten the safety of smart home customers, and the stolen data can be used to commit serious crimes.
The majority of mainstream attacks targeting connected technologies are intended to undermine the growth of IoT systems [10]. However, because IoT devices are intertwined with everyday life, attacks can have an immediate and direct effect on users [11]. For example, hacking into commercial air conditioning units could allow an attacker to change the temperature range in medical centers, thereby compromising the safety of the healthcare environment. Tools to detect and eradicate attacker-initiated activities are therefore essential. Traditionally, security tools such as intrusion prevention systems are used to identify cyber threats and attackers. Using pattern recognition, these tools normally recognize threats by comparing packets with a set of rules.
The conventional IDS is not very accurate at detecting anomalous trends because it operates on a fixed set of rules. In smart homes, these rules cannot be updated as new anomalous patterns emerge [12]. In a smart home environment, modern wireless networks, computers, and sensors face various security threats, and machine learning is seen as an ideal solution to this problem. Machine learning technology takes advantage of artificial intelligence, using various learning algorithms to train sensors and devices without any explicit programming [13] [14] [15]. This paper introduces a Hybrid Intrusion Detection (HID) system with two tiers of intrusion detection, as shown in Figure 1. The first tier contains the machine learning technique, which is trained on the smart home's network traffic. The second tier examines all requests sent to the system against the patterns in a user behavior profile. The reason for having a two-tiered intrusion detection system is to increase system security and reduce the error rate, since more than one user can control and monitor a smart home [1] [13]. The remainder of this paper is organized as follows: Section 2 briefly discusses smart home technology. Section 3 presents the problem statement. Next, the evaluation is demonstrated in Section 4. Section 5 shows the results. Finally, Section 6 contains the conclusion.

Smart Home Technology
The smart home architecture consists of four main layers: the physical layer, communications layer, information layer, and decision layer [16]. The physical layer contains the essential hardware of the smart home, such as devices, sensors, routers, and any other equipment involved in the smart home network. The communications layer comprises the software that is mainly used to format and route data between users, agents, and the house. The information layer captures and stores data, which is later used to identify the patterns that support decision-making. The decision layer is structured to determine the type of behavior obtained from or stored in the information layer. As such, all four layers work closely together, in the sense that the activities associated with one layer support the others [13] [17].
Actuators are used to manipulate a physical component; they are devices that receive a specific input telling them what to act on and which motion to perform. A physical feature, such as a temperature control valve mounted in a smart home, is manipulated by actuators [1] [19]. A sensor gathers information about the physical environment and sends it to systems and devices for action. Sensors detect, measure, and indicate physical quantities such as light, motion, heat, pressure, and moisture by converting them into electrical signals [1] [20]. Gateways serve as the bridge between the actuator and the sensor. They collect data from the sensors and send the processed data to the actuator for action. Gateways are effectively the control centers that give users access to their smart home devices [1] [21].

Services
A service is a software program that can operate in a smart home system in two ways. In the first, a cloud provider hosts the program and takes responsibility for maintaining it; in the second, the service is provided within the home environment. However, hosting the service inside the home means that users are responsible for tracking and upgrading the software components themselves [1] [22].

Problem Statement
Today, architects are incorporating smart home technology into new construction designs by adopting wired and wireless network infrastructures, paving the way for a seamless transition to this technology in the future. Many users are unaware of the threats to their privacy and security posed by a potential breach of the information collected by smart home devices (Figure 2). Every year the sophistication and number of cyber threats increase, with millions of identities and billions of dollars being stolen.
Hardware limitations are a major issue for smart home devices and for IoT devices in general. These limitations also make it difficult to adapt security features to IoT devices over time. Since encryption and decryption are complex operations that involve a lot of computation, security approaches that rely heavily on encryption are not a good match for these resource-constrained devices. Most researchers agree that there are two major drawbacks to smart home devices: battery power and hardware computing [1] [23] [24]. The second major dilemma is that heterogeneous protocols and weak encryption schemes can also affect the dynamic features of smart home devices. Both expose the smart home network to many security problems [1] [22]. Smart home providers often try to deploy secure services by meeting the essential security and privacy requirements, which include confidentiality, integrity, and availability. All these implementations depend on factors such as device capabilities, mode of operation, and the manufacturer [1] [22].
Network attacks, which can occur at any time, might be detectable by applying a technique that studies smart home network traffic. However, because smart home devices are used closely by the user every day, there is also a risk of attacks at the user behavior tier [25]. For example, if a request appears legitimate and passes the network tier, the only way to determine whether it comes from the legitimate user is to compare it with a known set of patterns.
Therefore, user behavior needs to be studied and identified so that the system can verify that the right user sends the proper request at the right time and that the sensor receives the correct request.

System Description
In the context of a sensor network, the smart home as a distributed environment shows the generic features of unreliability, which creates problems for behavior prediction. Security methods that rely heavily on encryption are not standard on these resource-constrained devices because encryption and decryption are complicated operations that require many computations [1] [26]. Even when activated correctly, a malfunctioning sensor may not produce a trigger event. A single IDS is therefore not enough to secure the smart home and classify all requests that might occur in it. We propose a HID system that uses a two-tiered IDS to detect such attacks based on a profile of user behavior.
The first tier is an intrusion detection system that uses machine learning algorithms. Machine learning provides efficient data mining algorithms that can be used for real-time network intrusion detection [1] [26]. The second tier is the misuse detection technique, which applies a known set of user activity patterns.
The user behavior profile will ask questions to determine the normal behavior of a user, thereby allowing anomalies to be identified [1].
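The flow through the two tiers can be illustrated with a minimal sketch. It assumes that a request is checked by the network tier first and reaches the user behavior tier only if it passes. The function `hybrid_decision`, the request fields, the packet-rate threshold, and the pattern set are all hypothetical names and values introduced here for illustration; they are not taken from the paper's implementation.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]

def hybrid_decision(
    request: Request,
    network_tier: Callable[[Request], bool],
    behavior_tier: Callable[[Request], bool],
) -> str:
    """Run a request through tier 1, then tier 2, and report where it stops."""
    if not network_tier(request):
        return "blocked: network tier"
    if not behavior_tier(request):
        return "blocked: user behavior tier"
    return "allowed"

# Toy stand-ins (assumptions): a real first tier would call a trained classifier.
def toy_network_tier(request: Request) -> bool:
    # Hypothetical rule: treat very high packet rates as an attack.
    return request["packets_per_second"] < 1000

# Hypothetical known patterns of (user, sensor) pairs.
KNOWN_PATTERNS = {("alice", "coffee_machine"), ("alice", "light")}

def toy_behavior_tier(request: Request) -> bool:
    # Misuse detection stand-in: accept only requests that match a known pattern.
    return (request["user"], request["sensor"]) in KNOWN_PATTERNS

request = {"packets_per_second": 12, "user": "alice", "sensor": "light"}
print(hybrid_decision(request, toy_network_tier, toy_behavior_tier))
```

Passing the tiers in as callables keeps the sketch independent of any particular classifier or profile format, so either tier can be swapped without touching the decision logic.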
This paper reports two experiments on two datasets. The first uses CSE-CIC-IDS2018 and the second uses NSL-KDD, as shown in Table 1.
The experiments were carried out in Python using Jupyter Notebook. The libraries that we used are pandas and scikit-learn (sklearn). The operating system is Windows, running on an Intel Core i7 processor. Figure 3 provides an overview of the first tier of the HID smart home system, which scans the network requests that come from the user side. This phase aims to examine all requests coming to the smart home system using machine learning. We used and compared four types of machine learning algorithms [26].

System Model
The four algorithms are random forest, XGBoost, decision tree, and K-nearest neighbors. The results show that our model for each algorithm achieves satisfactory classification accuracy with a low false positive rate [26].
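To make the train-and-evaluate loop concrete, here is a minimal sketch of one of the four algorithms, K-nearest neighbors, evaluated on an 80/20 split like the one reported in the Parameter Setting section. It is a hand-written stand-in using only the Python standard library, not the scikit-learn models used in the experiments, and the two-feature toy traffic generated by `make_toy_traffic` is hypothetical data, not CSE-CIC-IDS2018 or NSL-KDD.

```python
import math
import random
from collections import Counter

def split_80_20(samples, labels, seed=0):
    """Shuffle the data and hold out 20% of it for testing."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = len(order) * 8 // 10
    train, test = order[:cut], order[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])

def knn_predict(train_x, train_y, sample, k=5):
    """Label a sample by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], sample))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def make_toy_traffic(n_per_class=100, seed=1):
    """Hypothetical two-feature flows: benign near (1, 1), attack near (5, 5)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, center in (("benign", 1.0), ("attack", 5.0)):
        for _ in range(n_per_class):
            samples.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
            labels.append(label)
    return samples, labels

samples, labels = make_toy_traffic()
train_x, train_y, test_x, test_y = split_80_20(samples, labels)
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.3f}")
```

The toy classes are deliberately well separated, so the accuracy printed here says nothing about performance on real intrusion datasets; the sketch only shows the shape of the procedure.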
Before training the models, we preprocessed the CSE-CIC-IDS2018 and NSL-KDD datasets in the following steps:
1) Preparing the dataset by clearing noisy and missing data.
2) Representing the data as a data frame with the pandas library.

2) When users are not sending requests to sensors, the system is in static or fixed mode.
3) Each sensor is programmed to expect requests from certain users during predetermined times each day.

Evaluation
Most similar research work implemented only one tier of IDS, focused on either network behavior or user behavior. To summarize these methods, Table 2 presents the current IDSs for the IoT network tier. Current IDS ideas for the IoT environment are still at an early stage of development. Some experiments have used data from network simulations or datasets that may differ dramatically from a realistic setting.
Amouri et al. [27] developed an IDS for IoT networks using machine learning. Their idea was to create a list of the benign behavior of each sensor and detect any irregularities in network traffic. However, the experiment was evaluated using a simulated network rather than a real testbed. Doshi et al. [28] also applied machine learning algorithms in IoT networks to detect one particular attack, Distributed Denial of Service (DDoS). However, the study relies exclusively on learning one attack behavior. Lotfi et al. [29] used neural networks to identify unusual short-term and long-term activities in a smart home environment. Their results show that the system produced many false positives, as can occur when analyzing the security of a given network. Yamauchi [4] developed an IDS for the smart home system by applying a method that learned sequences of events for a

Parameter Setting
In this paper, we tried different parameter settings to maximize accuracy for all the implemented algorithms. The data were divided into 80% for training and 20% for testing. We used random forest, XGBoost, decision tree, and K-nearest neighbors classifiers. Accuracy is the percentage of normal and attack records that are classified correctly, and it can be calculated using Equation (1). The other metrics, precision and recall, can be calculated using Equations (2) and (3). Precision, also called the positive predictive value, is the fraction of records flagged as attacks that really are attacks [13] [26]. Recall, also called the true positive rate or sensitivity, indicates how many of the anomalous requests the model detects. Accuracy, recall, and precision are the most widely used metrics for comparing the performance of algorithms in intrusion detection systems. Other metrics, such as F1, should also be considered. F1 reflects how discriminative the model is, and it can be calculated using Equation (4).
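Equations (1) to (4) are not reproduced in this text. Assuming the paper uses the standard confusion-matrix definitions, they can be written as follows, where TP, TN, FP, and FN (symbols introduced here) denote the numbers of true positives, true negatives, false positives, and false negatives, with attack records as the positive class:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3}\\
F1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```

As a worked example with made-up counts, TP = 90, TN = 95, FP = 5, and FN = 10 give an accuracy of 185/200 = 0.925, a precision of 90/95 ≈ 0.947, a recall of 90/100 = 0.900, and an F1 of about 0.923.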

Network Behavior
For the first-tier experiment on CSE-CIC-IDS2018 (Figure 5), K-nearest neighbors was the most successful algorithm, with an average accuracy of 95.9% [26]. Random forest was the second most accurate, with an average of 95.7% [26]. The other algorithms also achieved strong accuracy relative to K-nearest neighbors and random forest. For the second experiment, on NSL-KDD (Figure 6), random forest was the most successful algorithm, with an average accuracy of 98.6%. XGBoost was the second most accurate, with an average of 98.5%. The other algorithms likewise achieved strong accuracy, close to that of random forest and XGBoost [26].

User Behavior
Pattern matching algorithms are a crucial part of several pattern matching methods, in which patterns are represented within a body of information as tree structures. The pattern matching method is most commonly used to examine and detect any suspicious request that arrives at the smart home and does not correspond to the pattern model. Building on our previous work [1] [13] on the smart home network, and using machine learning as an overarching framework, we added user behavior profile patterns for the smart connected home, described in Algorithm 1, where S is the sensor number (each sensor is in a different location), t is the time, and u is the user.
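Algorithm 1 itself is not reproduced in this text. The following is a minimal sketch of one way such a profile check could work, assuming a request is the triple (S, t, u) and the profile lists, per sensor, which user may send a request within which time window. The names `PROFILE` and `is_legitimate`, the sensor labels, the user names, and the time windows are all hypothetical.

```python
from datetime import time

# Hypothetical profile: sensor S -> list of (user u, window start, window end).
PROFILE = {
    "S1_light": [("alice", time(5, 0), time(5, 30))],
    "S2_coffee_machine": [("alice", time(5, 10), time(6, 0))],
}

def is_legitimate(sensor, t, user, profile=PROFILE):
    """Return True when the request (S, t, u) matches an entry in the profile."""
    for allowed_user, start, end in profile.get(sensor, []):
        if user == allowed_user and start <= t <= end:
            return True
    # Unknown sensor, unknown user, or a time outside every window.
    return False

print(is_legitimate("S1_light", time(5, 15), "alice"))   # inside the window
print(is_legitimate("S1_light", time(23, 0), "alice"))   # outside the window
```

A request is rejected by default, so a sensor that has no profile entry never accepts a request. This is a design choice of the sketch rather than something stated in the paper.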
We used a dataset from the CASAS project [31]. CASAS is a project that creates real smart homes for researchers in this field. A simple and lightweight toolkit called "smart home in a box" has been developed. To provide smart tasks, the components of this toolkit are packaged in a single small box and conveniently mounted in a home. The toolkit was installed in 32 smart homes and produced several datasets [31]. We employed one of the CASAS datasets for this study. The file that we employed has four features: date, time, sensor number, and status. We used the time column and the sensor number column to create a scenario, as described in the case study part.
Human behavior varies and is hard to reduce to a single lifestyle, since each person differs from another. Therefore, to improve user protection with a data-driven approach to human behavior around smart home sensors, feature extraction is one of the most important steps. This refers to the process of learning, from the sensor data, how many times a user sends a request to the smart home system using smart devices such as a smartphone or tablet. To generalize static user behavior to a normal level that applies to more than one individual, static user behavior is represented as a stable use of smart home sensors.
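As an illustration of this feature extraction step, the sketch below counts how often each sensor receives a request in each hour of the day. It assumes events arrive as whitespace-separated rows in the date, time, sensor number, status layout described for the dataset. The sample rows in `RAW_EVENTS`, the sensor labels, and the function name `count_requests` are hypothetical; the actual CASAS files may differ in detail.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows in a "date time sensor status" layout.
RAW_EVENTS = """\
2011-06-15 05:01:12 S1 ON
2011-06-15 05:03:40 S2 ON
2011-06-15 05:20:05 S1 OFF
2011-06-16 05:02:47 S1 ON
2011-06-16 05:45:10 S3 ON
"""

def count_requests(raw_text, status="ON"):
    """Count events per (sensor, hour of day) for one status value."""
    counts = Counter()
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        date_part, time_part, sensor, event_status = line.split()
        if event_status != status:
            continue  # keep only the requested status
        stamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        counts[(sensor, stamp.hour)] += 1
    return counts

features = count_requests(RAW_EVENTS)
for (sensor, hour), n in sorted(features.items()):
    print(sensor, hour, n)
```

Counts like these could then feed the user behavior profile, for example by treating hours with consistently repeated requests as the expected windows for a sensor.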
We created a scenario to highlight task-related patterns in user behavior. The task model of this use case starts with the early morning routine of a user who wakes around 5:00 AM. The user always turns on the light, runs the water, and turns the coffee machine on to get ready before leaving for work. The user also turns on the TV and watches it while eating. Table 3 shows the times at which each sensor can receive a request from the user side. Table 4 shows the anomaly events, which include routine attacks that may cause immediate and personal harm to users.
In this experiment, we created 8 types of sensors that connect to a smart home. We assume that these smart home devices are connected to the Internet and that users can command them.
We analyzed the packets sent to and from the devices over a period of time. The results showed which devices were in use when the user controlled them, and showed that the system can identify the status of each sensor when devices are operated. Figure 7 shows how the system accurately classifies all requests that come to the smart home. To test the efficacy of the user behavior system, we generated random requests with timestamps and ran them through the system to see how it classified them; Figure 8 shows the results. We observe that there are a limited number of legitimate requests that the user entered into the user behavior profile. We added 2 anomalous requests that resembled the legitimate request for turning on each sensor and attempted to detect them. The evaluation shows that the system can determine whether a request is legitimate by matching it against the user behavior profile.

Conclusions
Theoretically, the smart home system would be part of overall smart living, such as entire smart cities, and would connect to various networks at any time and anywhere. The smart home system has two divisions: network behavior and user behavior. However, this two-part design makes the system more vulnerable. This paper proposed a novel hybrid model that combines intrusion detection methods tailored for smart homes, a machine learning-based prevention technique, and misuse detection methods based on user behavior profile patterns.
For the first tier, the proposed approach can be used for controlling data and monitoring systems that have specifications for individual smart home devices. The method is a scalable model that is compatible with big data. We analyzed the model with the CSE-CIC-IDS2018 and NSL-KDD datasets; it can still be applied to relatively small datasets with a low ratio of anomalous requests. For the second tier, we focused on adding an anomaly detection method that offers more protection to smart home systems and supports the network tier. This approach examines all requests that come from the network tier and detects anomalies relative to user profiles. Anomalies are identified and analyzed by monitoring the number of requests for specific events and the time duration of an activity. By doing this, the system becomes more effective and secure [1].

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.