The development of a public, packet-oriented communication network (the Internet), accompanied by growth in the number of users and of information and communication (IC) services, has increased the amount of data transferred. Data stored, processed and transmitted through an IC system are often the target of illegitimate users whose goal is to gain unauthorized access or to prevent legitimate users from accessing IC system resources. As a result, the need for research in the field of IC system protection has grown in recent decades.
The goal of protecting an IC system is to achieve and maintain the required level of the basic security principles. These principles are represented by the CIA model, which comprises the confidentiality, integrity and availability of IC resources. The availability principle is defined as the probability that the requested service (or other IC system resource) will be available to a legitimate user at the required time. Several factors negatively impact the availability of IC resources. They can be classified according to the source of action (internal and external) and the executor (human, environment and technology). One such factor, with a steadily increasing trend over the last ten years, is the network-oriented Distributed Denial of Service (DDoS) attack, with DDoS traffic as the means of conducting it. The traffic generated by a DDoS attack aims to exploit the deficiencies of the IC system elements in charge of processing and transmitting data, such as communication links, active network equipment (routers, switches, firewalls, etc.) and devices intended for processing user requests and delivering services (servers). The primary weakness that a DDoS attack exploits is the limited capacity of the communication link, network equipment, or server.
Congestion can result from an increase in the intensity of legitimate inbound traffic that exceeds the total server and queue capacity, which negatively affects the quality of service (QoS). In such cases, it is necessary to apply traffic flow control methods that determine which of the traffic flows of equal importance will be processed first.
Congestion in a communication network may also be the result of deliberately generated DDoS traffic. Such traffic has the characteristics of legitimate user traffic, and its primary objective is to exploit the previously identified shortcomings of IC resources and cause congestion that degrades quality or makes the IC resources completely inaccessible to legitimate users. Traffic flow control and congestion management methods are not appropriate for solving DDoS traffic problems. The reason is that, in this case, the traffic flows are not of equal importance; it is therefore necessary to detect illegitimate traffic, which constitutes a network traffic anomaly at the level of individual network packets or traffic flows.
Network traffic anomaly detection is a dynamic and broad area of research. Any network traffic pattern that deviates from a previously defined profile of legitimate (normal) traffic and has the potential to disrupt the normal operation of the IC system is considered an anomaly. The legitimate traffic profile is defined by the values of traffic features recorded over a period of time in which the terminal device generating the traffic is not compromised and operates in the manner defined by the manufacturer. The root causes of network traffic anomalies may be related to performance or to IC system security. One of the growing causes of security-related network traffic anomalies is the DDoS attack. This type of attack uses a number of compromised terminal devices to generate seemingly legitimate DDoS traffic toward the destination. The consequences of DDoS attacks are degraded quality or complete unavailability of IC services to legitimate users.
The emergence of the Internet of Things (IoT) concept, as a new direction of technological development and a new communication paradigm that brings together billions of new devices connected to the Internet, creates a new space of security vulnerabilities that can be exploited for unauthorized and malicious activities. The continuous growth in the number of such devices, their inadequate protection and their ability to generate network traffic make them ideal candidates for building a botnet that generates DDoS traffic of unprecedented intensity. The smart home, as one of the fastest growing application areas of the IoT concept, is becoming one of the most heterogeneous application areas in terms of the number of IoT device manufacturers. Such devices are often delivered with minimal or no protection, and their security is further reduced by the ease of use required by end users, who often lack the knowledge needed to install and operate them. For all of these reasons, smart home devices are among the most vulnerable to a number of security threats, in particular to being used to generate DDoS traffic.
The subject of this doctoral research is the traffic characteristics generated by IoT devices in a smart home environment as a basis for detecting anomalies resulting from DDoS attacks. Based on the research problems and the existing shortcomings, the following scientific hypotheses of the research were put forward:
(1) Based on the traffic features generated by IoT devices in a smart home environment, it is possible to define classes of IoT devices and associated profiles of legitimate traffic.
(2) Based on the defined profile of legitimate traffic of a particular class of IoT devices in a smart home environment, it is possible to detect with high accuracy the illegitimate traffic generated by such devices.
The concept of IoT offers numerous benefits in different fields of application, but from a security point of view it also raises a number of challenges that need to be adequately addressed. The research within this doctoral thesis considers the smart home environment as one of the fastest growing application areas within the IoT concept. Devices within this environment have many limitations and disadvantages that make them potential generators of DDoS traffic.
Despite the identified shortcomings, the communication of such devices generates traffic with specific features that differ from those of traffic generated by conventional devices. This research seeks to analyze the possibilities of applying such features to classify devices regardless of their functionality or purpose. This kind of classification is necessary in a dynamic and heterogeneous environment such as a smart home, where the number and types of devices grow daily, because the classification depends solely on the traffic features that such devices generate.
Device classification allows defining the legitimate traffic profile of a particular class, based on which it is possible to determine deviations in the form of anomalies caused by the DDoS traffic generation of an individual device. Consequently, the aim of this research is to develop a model for detecting illegitimate DDoS traffic generated by IoT devices in a smart home environment based on specific traffic features and class affiliation of IoT devices.
Based on the above, the scientific contributions of the doctoral research are as follows:
(1) Identification of traffic features by which it is possible to classify IoT devices in a smart home environment for the purpose of detecting illegitimate DDoS traffic.
(2) Defining legitimate traffic profiles for each class of IoT device in a smart home environment.
(3) DDoS traffic detection model based on traffic features and class affiliation of IoT devices.
Despite the high accuracy of detection and the advantages shown by the methods used, there are some shortcomings in the research of DDoS traffic detection problems to date. The first drawback concerns the datasets used, that is, the traffic records that are the basis for developing the detection model. Datasets containing traffic are often outdated, which reduces the accuracy of detection because they do not reflect the characteristics of current traffic, which change as new IC devices, concepts and services are developed. Previous research considers DDoS traffic generated solely through conventional terminal devices, without considering devices that do not require human interaction in order to communicate. The latter devices are unified under the IoT paradigm.
According to predictions, approximately 31 billion IoT devices will exist globally by the end of 2020, and 75 billion by 2025. Of these devices, 41%, or 12.86 billion, will be installed within the smart home (SH) concept. The limitations of IoT devices in general, and thus of SHIoT (smart home IoT) devices, are described in previous research and cover hardware constraints, high autonomy requirements and low production cost, which reduce the ability to implement advanced security methods and increase the risk of numerous threats. Traffic generated by SHIoT devices, or MTC (Machine Type Communication) traffic, differs from HTC (Human Type Communication) traffic generated through conventional devices. Although SHIoT devices are characterized by heterogeneity, MTC traffic is homogeneous in contrast to HTC traffic, which means that devices of the same or similar purpose behave approximately equally, that is, they generate traffic of similar characteristics.
The planned research seeks to remedy the identified shortcomings of previous research, namely the lack of consideration of SHIoT traffic features when detecting DDoS traffic, the lack of consideration of classes of SHIoT devices that generate roughly equal values of traffic features, and the limited number of devices used in the studies.
The importance of this research is also evident from the increasing number of studies and projects in this field. One example is the project Mitigating IoT-Based Distributed Denial of Service (DDoS), implemented by NIST (National Institute of Standards and Technology) and NCCoE (National Cybersecurity Center of Excellence), which addresses the issue of generating DDoS traffic through IoT devices.
As part of the research within this doctoral thesis, a laboratory smart home environment was formed. The environment comprises a variety of SHIoT devices, along with the accompanying communication infrastructure and a software-hardware platform that enables traffic collection and the creation of a dataset to be applied in the later stages of research and in the development of network traffic anomaly detection models. In addition to the primary data collected in this way, the research also included secondary data encompassing a greater variety of SHIoT devices. The reason for this is the heterogeneity of devices that can exist in the observed environment.
A total of 41 devices in a smart home environment were used for this doctoral research. Statistical estimates of the average number of SHIoT devices per household with some form of smart home implemented differ, ranging from 6.53 to 14 SHIoT devices per household. In the Republic of Croatia, smart home adoption is still low, and telecom operators are assuming the role of smart home provider by offering SHIoT devices to end users. For example, the Internet service provider Iskon offers customers a smart home package comprising four SHIoT devices, while the telecom operator A1 allows users to deploy a total of five SHIoT devices in a smart home environment. Nevertheless, this research sought to achieve the greatest possible variety of SHIoT devices because of the need to define device classes based on the characteristics of the traffic generated. Therefore, the number of devices used is higher than the current statistical estimates of the average number of SHIoT devices per smart home in the Republic of Croatia and worldwide.
The predictability of IoT device behavior is a phenomenon observed in the communication activities of IoT devices in numerous studies. Since SHIoT devices have a limited number of functionalities, a given device behaves approximately the same over time with respect to the values of the observed traffic features. In contrast, conventional devices (smartphones, desktops, laptops, etc.) support the installation of a large number of applications, so their communication activity depends on the end user and on how the device is used. Accordingly, the index of the predictability level of IoT device behavior (the Cu index), expressed as the coefficient of variation of the received and sent amount of data, is a measure by which the behavior of a SHIoT device over a period of time can be determined. The closer the Cu index is to 0, the smaller the deviation in the amount of data the device receives and sends, and the higher the predictability of its behavior compared with a device whose Cu index is larger.
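As an illustration of the index described above, the following minimal Python sketch computes a coefficient of variation (standard deviation divided by the mean) over a series of traffic volumes. The function name, the use of the population standard deviation, the choice of per-interval byte totals as input and the sample values are all assumptions made for illustration; the exact quantity and observation window used for the Cu index are not specified in this summary.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / avg


# Hypothetical per-interval totals of received and sent bytes for two devices.
steady_device = [1000, 1000, 1000, 1000]   # identical volume in every interval
bursty_device = [100, 4000, 50, 2500]      # volume varies strongly

cu_steady = coefficient_of_variation(steady_device)
cu_bursty = coefficient_of_variation(bursty_device)
print(cu_steady, cu_bursty)
```

In this sketch the steady device yields an index of 0, while the bursty device yields a larger value, which corresponds to lower predictability.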
For the purpose of developing a classification model based on the logistic regression method enhanced by the concept of supervised machine learning, a dataset was created containing the values of the extracted features of SHIoT device traffic flows and, for each traffic flow in the set, the class of the device that generated it. Model development, testing and validation were performed using the WEKA software tool, with the support of MS Excel 2016 during the preparation of the model development dataset. A total of 59 features were selected using the information gain method; during model development, the number of features was gradually reduced and the validation measures of the resulting models were compared. The aim of this procedure is to develop a model that uses the smallest possible number of independent features without significantly degrading its performance. Each model was validated by k-fold cross-validation with k = 10. This method is used to evaluate the behavior of the model on data not used in the learning phase. The dataset is divided into k parts, and the model is trained and validated iteratively k times. In each iteration, a different part of the set is used to validate the model, while the remaining k-1 parts are combined into a model learning subset.
In order to develop DDoS traffic detection models based on predefined classes of SHIoT devices, it is necessary to define the legitimate traffic profile of each device class. When developing any anomaly detection model based on supervised machine learning methods, it is necessary to have a dataset that represents legitimate traffic and a dataset that represents illegitimate traffic. The defined classes of SHIoT devices allow a legitimate traffic profile to be established for each device class, which is important in the later development of anomaly detection models.
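The k-fold cross-validation procedure mentioned above (k = 10) can be sketched in plain Python as follows. This is only an illustration of how the dataset is partitioned, not the WEKA implementation; the function name `k_fold_indices` is hypothetical, and no shuffling or stratification is applied here, which a real tool would typically add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        # The remaining k-1 parts form the learning subset.
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Example with a hypothetical dataset of 25 traffic flows.
folds = list(k_fold_indices(25, k=10))
for train, validation in folds:
    print(len(train), len(validation))
```

Each flow appears in exactly one validation part, so every flow is evaluated once by a model that did not see it during learning.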
In this way, the SHIoT device traffic feature values become part of the legitimate profile of the observed device class. The legitimate traffic profile of a particular class of SHIoT devices is defined by the feature values of those traffic flows that the classification model assigns to that class. The Logistic Model Trees (LMT) method was used to develop a model for detecting illegitimate DDoS network traffic. The WEKA software tool was used to implement the method and process the data, together with datasets that represent the profiles of normal traffic resulting from the SHIoT device classification model and datasets of illegitimate DDoS traffic.
The developed model for detecting illegitimate DDoS traffic operates in two phases. The first phase is a prerequisite for the detection of DDoS traffic in the second phase and consists of classifying the SHIoT device based on the generated traffic flow. Two of the basic metrics that indicate model performance are classification accuracy and the kappa statistic. According to classification accuracy, all models show high performance, which means that, based on the observed flow, they can determine with high accuracy whether the traffic flow is the result of legitimate device communication or whether the device is generating DDoS traffic. The LMT model for the C1 device class shows an accuracy of 99.9216%, that is, 56092 traffic flows correctly classified as either DDoS traffic or legitimate traffic of a SHIoT device in class C1. A total of 44 traffic flows, or 0.0784% of the set of 56136, were misclassified. In addition to high accuracy, the LMT model for the C1 device class also exhibits a kappa coefficient (κ = 0.9984) that indicates high model performance. The version of the LMT model developed for the C2 class shows an accuracy of 99.9966%, that is, 59660 correctly classified traffic flows in a set of 59662. The classification error is 0.0034%, or two traffic flows. The kappa coefficient is 0.9999, which indicates the high performance of this LMT model. The LMT model developed for the C3 class provides 99.9744% accuracy: out of a total of 58661 traffic flows, 58646 were classified correctly and 15, or 0.0256%, were misclassified. The kappa coefficient of 0.9995, as in the previous versions of the LMT model, indicates high performance. The last version of the LMT model, developed for the C4 class, shows an accuracy of 99.9583%, that is, 59879 correctly classified traffic flows, while a total of 25 traffic flows were misclassified. The kappa coefficient of this model is 0.9992.
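The reported accuracy values follow directly from the counts of correctly and incorrectly classified traffic flows. The short Python check below recomputes them; the counts are taken from the text, while the helper name `accuracy_percent` is introduced here only for illustration, and the size of the C4 set is assumed to be the sum of the two reported counts.

```python
# Correctly classified and misclassified traffic flows per device class,
# as reported in the text.
reported = {
    "C1": (56092, 44),
    "C2": (59660, 2),
    "C3": (58646, 15),
    "C4": (59879, 25),
}


def accuracy_percent(correct, misclassified):
    """Share of correctly classified flows, in percent."""
    return 100.0 * correct / (correct + misclassified)


for device_class, (correct, wrong) in reported.items():
    print(f"{device_class}: {accuracy_percent(correct, wrong):.4f}%")
```

Rounded to four decimal places, the recomputed values agree with the accuracies stated for all four classes.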
Research has shown that it is possible to define device classes based on the variation of the received and sent traffic ratio, and it is possible to classify devices into defined classes based on the traffic flow features such devices generate. Finally, depending on the affiliation of an individual device to a defined class, it is possible to determine whether the traffic flow that the device generates is an anomaly in the form of DDoS traffic or legitimate traffic.
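The two-phase operation summarized above (device classification followed by class-specific DDoS detection) can be outlined in Python as below. This is a structural sketch only: the function `detect_ddos`, the toy classifier, the toy detectors, the feature names `cu` and `packets_per_second`, and all thresholds are hypothetical stand-ins and do not represent the trained logistic regression or LMT models.

```python
def detect_ddos(flow, classify_device, detectors):
    """Run the two phases on one traffic flow.

    Phase 1 assigns the flow to a device class; phase 2 applies the
    detection model associated with that class.
    """
    device_class = classify_device(flow)
    is_ddos = detectors[device_class](flow)
    return device_class, is_ddos


# Hypothetical stand-ins for the trained models; the thresholds and the
# feature names "cu" and "packets_per_second" are invented for illustration.
def toy_classifier(flow):
    return "C1" if flow["cu"] < 0.5 else "C2"


toy_detectors = {
    "C1": lambda flow: flow["packets_per_second"] > 100,
    "C2": lambda flow: flow["packets_per_second"] > 1000,
}

normal_flow = {"cu": 0.1, "packets_per_second": 5}
flood_flow = {"cu": 0.1, "packets_per_second": 5000}

print(detect_ddos(normal_flow, toy_classifier, toy_detectors))
print(detect_ddos(flood_flow, toy_classifier, toy_detectors))
```

Keeping one detector per device class mirrors the idea that a flow is judged against the legitimate traffic profile of its own class and not against a single global profile.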